Weekly Report 2/13/05
Yan

Work 1: Perform experiments for TVLSI Journal revision

VPR and the cycle-accurate power simulator have been run for FPGA architecture Class3, i.e., FPGAs using Vdd-programmable CLBs + Vdd-gateable interconnects.

VPR and the cycle-accurate power simulator have been run for FPGA architecture Class2, i.e., FPGAs using Vdd-programmable CLBs + Vdd-programmable interconnects w/o Vdd-level converters. Tree-based assignment is used.

VPR is currently running for FPGA architecture Class1, i.e., FPGAs using Vdd-programmable CLBs/interconnects w/ Vdd-level converters.

Ongoing work includes finishing the VPR runs for Class1, running the cycle-accurate simulator for Class1, and collecting and processing the results. The experiments are expected to finish early next week, and writing will be the next step.

Ongoing work also includes preparing slides for FPGA'05.

---------------------------------------------------------------------------

Work 2: Literature search on FPGA interconnect switch/switch block design

I found quite a few papers in this area. They fall into two categories: switch block topology design and circuit design for routing switches.

I. Studies on switch block topology

[1] Y.W. Chang, D. F. Wang and C.K. Wong, "Universal Switch-Module Design for Symmetric-Array-Based FPGAs", ACM TODAES 1996.

This paper presents an algorithm to generate universal switch modules. A switch module M with W terminals on each side is said to be universal if every set of 2-pin nets satisfying the dimensional constraint (the number of nets on each side of M is at most W) can be routed through M. The paper proves that 6W is a lower bound on the number of switches for any universal switch module. Compared to the SUBSET switch module, the universal switch module is shown to accommodate up to 25% more routing instances.

[2] H. Fan, J. Liu and Y-L. Wu, "General Models for Optimum Arbitrary-Dimension FPGA Switch Box Designs", ICCAD 2000
[3] H. Fan, J. Liu and Y-L. Wu, "On Optimum Switch Box Designs for 2-D FPGAs", DAC 2001

These two papers extend [1] to hyper-universal switch modules. A hyper-universal switch module can detail-route all possible surrounding multi-pin net topologies satisfying the global routing density constraints. [2] gives an upper bound of 6.7W on the number of switches for a hyper-universal switch module, and [3] tightens the upper bound to 6.34W.

[4] M. Imran and S. Wilton, "A New Switch Block for Segmented FPGAs", FPL, 1999

The universal switch block and the Wilton block are shown to be effective in improving routability for FPGAs with uniform length-1 wires. However, recent FPGAs use wire segments of length 4 or even 8 for smaller device area and higher performance. For FPGAs using length-4 wires, the universal and Wilton switch blocks consume more device area than the SUBSET switch block, because these two switch blocks have more possible connections for the internal population of wire segments. The idea of this paper is simple: a new switch block, named the Imran switch block, combines the SUBSET and Wilton blocks. It uses the Wilton topology for wire segments ending at a switch block to provide more flexible routing, and the Subset topology for wire segments crossing through a switch block to save area. Experimental results show that for a 0.35um technology node and length-4 wires, the Imran switch block reduces interconnect area by 13% at the cost of a 1.5% performance loss compared to the Subset switch block.

[5] G. Lemieux and D. Lewis, "Analytical Framework for Switch Block Design", FPL 2002.

This paper presents an analytical framework for designing switch block topologies. The shifty, universal-TG, diverse and diverse-clique switch blocks are presented.
Although the new switch blocks have up to 100% diversity improvement compared to the Subset switch block, the channel width/area reduction verified by VPR is not so promising. The best switch block reduces channel width and area by only approximately 2.5% and 3%, respectively.

II. Circuit/Architecture Design

[6] G. Lemieux and D. Lewis, "Circuit Design of Routing Switches", FPGA'02.

The first part of this paper focuses on transistor-level circuit design for a typical buffer-based routing switch. A keeper, instead of gate boosting, is used to avoid excessive leakage due to NMOS pass transistor signal degradation. This part studies buffer/pass transistor sizing, buffer construction, the impact of slow input slew rate, buffer selection, etc.

The second part, which is the most interesting, presents two new fanin-based switches: a mux-based design and a flattened pass-transistor-based design. First, with fanin-based switches, a min-size NMOS transistor can be used as the pass transistor, whereas fanout-based switches require a large NMOS transistor, so area can potentially be reduced. Second, with the mux-based design, a large fan-in is affordable in terms of SRAMs, whereas a fanout-based switch cannot afford a large fanout because each possible fanout needs an SRAM. Third, fanin-based switches are ideal for the output pin merging technique. A typical output connection block uses one shared large buffer driving a large pass transistor with an SRAM for each possible fanout. With fanin-based routing switches, a logic block output pin can be connected directly to the fan-in of the routing switches. For the mux-based switch, only two min-size pass transistors are needed for each possible logic block fanout, and no extra SRAM is needed when the routing switch block's Fs is 3. For the pass-transistor-based switch, one min-size pass transistor with one SRAM is needed for each possible fanout.
The pass-transistor-based design is not as attractive as the mux-based design in terms of output pin merging.

The third part of the paper studies the effect of replacing some pass transistors with the new switches. The baseline uses a mix of pass transistors and buffers in the interconnect. However, the follow-up work [7] shows that 100% direct-drive mux switches can lead to the smallest area and highest performance.

[7] D. Lewis et al, "The Stratix Routing and Logic Architecture", FPGA'03.

This paper presents the Altera Stratix routing and logic architecture. The interesting part is that using direct-drive mux switches leads to smaller area and higher performance. However, it is unclear whether this would still hold against fully buffered interconnects, as the baseline used in this paper is a mixed buffer and pass transistor structure.

[8] A. Rahman and V. Polavarapuv, "Evaluation of Low-Leakage Design Techniques for Field Programmable Gate Arrays", FPGA'04.

This paper assumes the same direct-drive mux switch as presented in [7]. The fan-in multiplexer might leak depending on the input vectors. The paper presents techniques including redundant SRAMs for the multiplexer, dual-Vt, body biasing and gate biasing to reduce leakage. This is interesting because the same situation might arise in our multiplexer-based input connection block: leakage might still be consumed by the unused buffers in front of the multiplexer even if they are power-gated. However, our power model cannot handle this situation, as multiplexers are modeled as lumped capacitance.

[9] J. Anderson and F. Najm, "Low-power Programmable Routing Circuitry for FPGAs", ICCAD 04.

This paper presents programmable dual-Vdd techniques for the direct-drive mux routing switch in [7]. A used switch can be programmed as high-speed or low-power, and an unused one can be power-gated. Only one power rail is required, as the low Vdd is obtained by dropping Vdd by Vt through an NMOS power transistor.
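
Side note on the universality definition in [1]: the small sketch below is my own illustration, not the algorithm from the paper. It assumes the usual Subset structure (track i on one side connects only to track i on the other three sides) and uses a brute-force check to show a set of 2-pin nets that satisfies the dimensional constraint but cannot be routed through a Subset block. The function names and the 3-net example are hypothetical and chosen only for illustration.

```python
from itertools import combinations, product

SIDES = ("left", "top", "right", "bottom")


def subset_switches(W):
    """Switches of a Subset block (assumed structure): track i on a side
    connects only to track i on each of the other three sides."""
    return {(a, b, i) for a, b in combinations(SIDES, 2) for i in range(W)}


def meets_dimensional_constraint(nets, W):
    """True if no side of the module carries more than W nets."""
    return all(sum(side in net for net in nets) <= W for side in SIDES)


def routable_in_subset(nets, W):
    """Brute-force check: each 2-pin net must take a single track index,
    and two nets touching the same side cannot share that index."""
    for tracks in product(range(W), repeat=len(nets)):
        clash = any(
            tracks[p] == tracks[q] and set(nets[p]) & set(nets[q])
            for p, q in combinations(range(len(nets)), 2)
        )
        if not clash:
            return True
    return False


W = 2
# Three nets that pairwise share a side (illustrative example).
nets = [("left", "top"), ("top", "right"), ("right", "left")]

print(len(subset_switches(W)))                 # 12, i.e. 6 * W switches
print(meets_dimensional_constraint(nets, W))   # True
print(routable_in_subset(nets, W))             # False
```

With W = 2 each side carries at most two nets, so the dimensional constraint holds, but the three nets pairwise share a side and would need three distinct track indices in a Subset block. By the definition in [1], a universal switch module with the same W must be able to route this instance.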