Weekly report 	Yan Lin 2/5/05

Work 1 on direction biased routing

I read one related paper, "The Stratix Routing and Logic Architecture", from
Altera. The paper presents the idea of direct-driven muxes, which lead to a
faster and smaller architecture. By replacing some pass transistors with
direct-driven muxes, the system performance can be improved. Although
using direct-driven muxes may increase channel width due to more routing stress,
it may still reduce routing area due to smaller pass transistor size.
However, the authors do not discuss the details of sizing, nor
the detailed routing architecture, including the distribution of wire directions,
switch block pattern, etc. I went through the Stratix Device Handbook and
Stratix II Device Handbook and found that Altera uses MultiTrack and
DirectDrive technology, but neither is discussed in detail. I also
searched both the literature and the US Patent & Trademark Office but could not
find MultiTrack or DirectDrive. It seems that they are not open to the public.

Work 2 on TVLSI revision

(1) Study Vdd-programmable/Vdd-gateable connection block using HSPICE (with Phoebe)
For a 4X connection switch with a 1X PMOS sleep transistor, the delay is
reduced by 28% compared to a typical connection block using a multiplexer.
The switching energy is also smaller with the Vdd-programmable/Vdd-gateable
connection block. Therefore, we do not need to resize the buffers in the connection
block.

(2) Study Vdd-programmable/Vdd-gateable routing switch (with Phoebe)
For a 7X routing switch with a 4X PMOS sleep transistor, the delay overhead
is 16%. The delay overhead is 11% with a 7X PMOS sleep transistor.
Considering that the Vdd-programmable/Vdd-gateable connection switch can reduce
delay, the 4X PMOS sleep transistor is adopted to reduce area overhead.
The overall performance degrades by 3.3%.

(3) Study area overhead breakdown
For the FPGA architecture (N,k) = (10, 4) using Vdd-programmable CLBs and Vdd-gateable
interconnects (Class2), the overall area overhead for circuit alu4 is 20%, with the
following breakdown:

Overall 20%
Level converters 1.5%
Power Transistors/SRAMs for Vdd-programmable CLBs 2.5%
Connection blocks 12%
-          SRAMs 1.5%
-          Power Transistors 3.5%
-          Control logic including decoder and Nand2 gates 8%
Routing switches 4%
-          Power Transistors 4%
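
As a quick consistency check on the figures above, the following Python sketch sums the listed items. The dictionary layout and variable names are illustrative assumptions; only the percentages come from the breakdown. The four top-level items add up to the overall 20%, while the three connection block sub-items add up to 13% against the listed 12%, which presumably reflects rounding of the individual entries.

```python
# Area overhead figures (percent) copied from the Class2 breakdown above.
# The grouping into two dictionaries is an illustrative assumption.
class2_top_level = {
    "Level converters": 1.5,
    "Power transistors/SRAMs for Vdd-programmable CLBs": 2.5,
    "Connection blocks": 12.0,
    "Routing switches": 4.0,
}
connection_block_parts = {
    "SRAMs": 1.5,
    "Power transistors": 3.5,
    "Control logic (decoder and Nand2 gates)": 8.0,
}

# Sum each group so it can be compared against the listed totals.
top_level_total = sum(class2_top_level.values())
connection_block_total = sum(connection_block_parts.values())

print(f"Top-level items sum to {top_level_total:.1f}% (listed overall: 20%)")
print(f"Connection block sub-items sum to {connection_block_total:.1f}% (listed: 12%)")
```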

For the Vdd-programmable FPGA with fine-grained level converters inserted (Class1), the overall
area overhead is 143%. For the Vdd-programmable FPGA without LCs (Class3), the area overhead is 62%.
As a reminder, the area overheads for Class1, Class2 and Class3 in our submission are 178%, 48%
and 124%, respectively.
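
To make the comparison with the submission concrete, here is a minimal Python sketch that computes how many percentage points each class improves by. The variable names are illustrative assumptions; the numbers are the ones quoted above.

```python
# Area overhead (percent) per FPGA class, taken from the text above.
submission = {"Class1": 178, "Class2": 48, "Class3": 124}
current = {"Class1": 143, "Class2": 20, "Class3": 62}

# Reduction in percentage points relative to the submitted numbers.
reduction = {name: submission[name] - current[name] for name in submission}

for name, points in reduction.items():
    print(f"{name}: {submission[name]}% -> {current[name]}% ({points} points lower)")
```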

For Class3, the area overhead breakdown is as follows:

Overall 62%
Level converters 1.5%
Power Transistors/SRAMs for Vdd-programmable CLBs 2.5%
Connection blocks 41%
-          SRAMs 1.8%
-          Power Transistors 7.5%
-          Control logic including decoder and Nand2 gates 31%
Routing switches 17%
-          SRAMs 6.6%
-          Power Transistors 8%
-          Nand2 gates 2.3%

Another insight is that by using region-based Vdd-programmable/Vdd-gateable routing switches/connection blocks,
we can reduce only the area overhead due to SRAMs and power transistors, not that due to decoders, which introduce a large
area overhead (8% out of 20% for Vdd-gateable interconnects and 31% out of 62% for Vdd-programmable interconnects).
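
The share of the overall overhead taken by the decoders can be worked out directly from these figures. The Python sketch below does that arithmetic; the names are illustrative assumptions, and the inputs are the percentages quoted above.

```python
# (decoder overhead, overall overhead) in percent, from the text above.
cases = {
    "Vdd-gateable interconnects": (8, 20),
    "Vdd-programmable interconnects": (31, 62),
}

# Fraction of the overall area overhead that region-based switches cannot remove.
decoder_share = {name: dec / total for name, (dec, total) in cases.items()}

for name, share in decoder_share.items():
    print(f"{name}: decoders account for {share:.0%} of the overall overhead")
```

Under these numbers the decoders make up roughly 40% and 50% of the overall overhead, respectively.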

(4) Ongoing work
-	Try to reduce area overhead due to control logic, i.e., decoders and NAND2 gates. 
-	Run VPR and PSim for each FPGA Class.
-	Start to work on the writing part