Interconnect-centric Physical Synthesis for Performance, Power and Variation Tolerance
Clock synthesis considering time-variant temperature: Existing temperature-aware clock embedding assumes a time-invariant temperature gradient, and how to find the worst-case temperature gradient that leads to the worst-case skew remains unsolved. In [C100] and [C106], we develop PErturbation-based Clock Optimization (PECO), which considers the time-variant temperature gradient. For a given clock topology, we minimize the worst-case skew without requiring the worst-case temperature map. We decide the merging points level by level based on the sensitivity of the skew to a change in the merging point. This sensitivity is calculated using a parameterized model, which is compressed by singular value decomposition (SVD) and K-means clustering that account for the temperature correlation. Experimental results show that our algorithm reduces the worst-case skew by up to 5x compared to the existing zero-skew ZST/DME method, with a small (up to 1%) wirelength overhead.
Temperature-aware microprocessor floorplanning: We study microprocessor floorplanning for thermal and throughput optimization [C112]. We first develop a stochastic heat diffusion model for thermal analysis that takes the application-dependent power load into account. We then design the floorplanning algorithm based on this model. Experimental results show that, compared with the deterministic heat diffusion model, our model obtains up to a 3.2°C reduction of the on-chip peak temperature, a 1.25% reduction of the area, and 1.125x better CPI (cycles per instruction) performance. Compared with the temperature-aware floorplanning in the HOTSPOT tool set, which ignores interconnect pipelining, our algorithm is up to 27x faster, reduces the peak temperature by up to 3°C, and also reduces CPI significantly with a negligible area overhead.
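As a rough illustration of the clock synthesis idea above (choosing a merging point by its effect on worst-case skew), the following minimal sketch merges two subtrees under an Elmore delay model with temperature-scaled wire resistance. It is not the PECO algorithm: it replaces the sensitivity-driven update with a brute-force search over candidate merging points and a small set of sampled temperature scenarios. All constants, function names, and scenario values are assumptions made for illustration.

```python
# Illustrative sketch only; not the PECO algorithm from [C100]/[C106].
R0 = 0.1      # assumed nominal wire resistance per unit length
C0 = 0.2      # assumed wire capacitance per unit length
BETA = 0.004  # assumed resistance temperature coefficient per degree C


def branch_delay(length, load_cap, sub_delay, delta_t):
    """Elmore delay from the merging point through one branch to its sinks."""
    r = R0 * (1.0 + BETA * delta_t)  # resistance rises with temperature
    return r * length * (C0 * length / 2.0 + load_cap) + sub_delay


def skew(x, total_len, left, right, temps):
    """Signed skew (left minus right); x is the distance from the left child.

    left/right are (load_cap, sub_delay); temps is (left_rise, right_rise).
    """
    d_left = branch_delay(x, left[0], left[1], temps[0])
    d_right = branch_delay(total_len - x, right[0], right[1], temps[1])
    return d_left - d_right


def worst_case_skew(x, total_len, left, right, scenarios):
    """Largest absolute skew over the sampled temperature scenarios."""
    return max(abs(skew(x, total_len, left, right, t)) for t in scenarios)


def zero_skew_point(total_len, left, right):
    """Nominal zero-skew merging point, clamped to the wire segment."""
    num = (right[1] - left[1]) + R0 * total_len * (right[0] + C0 * total_len / 2.0)
    den = R0 * (C0 * total_len + left[0] + right[0])
    return min(max(num / den, 0.0), total_len)


def robust_merge_point(total_len, left, right, scenarios, steps=200):
    """Brute-force stand-in: pick the candidate with the smallest worst-case skew."""
    candidates = [total_len * i / steps for i in range(steps + 1)]
    candidates.append(zero_skew_point(total_len, left, right))
    return min(
        candidates,
        key=lambda x: worst_case_skew(x, total_len, left, right, scenarios),
    )


# Hypothetical example data: (load capacitance, downstream delay) per subtree.
LEFT, RIGHT, LENGTH = (1.0, 0.5), (2.0, 0.3), 10.0
SCENARIOS = [(0.0, 0.0), (40.0, 0.0), (0.0, 40.0), (20.0, 10.0)]
x_zero = zero_skew_point(LENGTH, LEFT, RIGHT)
x_robust = robust_merge_point(LENGTH, LEFT, RIGHT, SCENARIOS)
```

Because the nominal zero-skew point is included among the candidates, the point returned by `robust_merge_point` never has a larger worst-case skew over the sampled scenarios than the nominal zero-skew point does.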
Interconnect modeling for signal integrity: We proposed to control the noise between inductively coupled interconnects by simultaneous shield insertion and net ordering (SINO) [C13], [J12]. Specifically, for net segments within a routing region, we characterize the inductive coupling between two net segments by their inductive coupling coefficient (the K model), and we use the effective K model (Keff model) as the figure of merit for the total amount of inductive noise induced on a net segment. Compared to the three-dimensional field solver FastHenry, the K model is reasonably accurate (within a 20% to 10% error range) and tends to be conservative. The Keff model is shown to have high fidelity when compared to the SPICE-calculated RLC noise for a SINO solution with a fixed wire length. We also developed an efficient yet accurate shielding estimation formula for any SINO solution that does not require actually carrying out the SINO algorithm. To consider full-chip inductive coupling effects, we proposed a length-scaled Keff model (LSK model) [C28], [J10] to model the worst-case noise for routing solutions with simultaneous shield insertion and net ordering structures. Moreover, shields are interconnects connected to the power network directly through vias rather than through devices, and are therefore part of the power network. We thus proposed to include shielding in the power network design loop so that the scarce routing resources are managed simultaneously for both power integrity and signal integrity [C43], [C46]. Key to this approach is a simple yet accurate power net estimation formula that decides the minimum number of power nets needed to satisfy both power and signal integrity constraints prior to detailed layout. A probabilistic routing congestion estimation model that considers the shielding effect was also proposed in [C57] to help achieve design closure faster by accounting for crosstalk in the early routing stage.
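The Keff and LSK figures of merit described above can be illustrated with a small sketch. It assumes that Keff for a victim segment is the sum of the pairwise coupling coefficients from the aggressors it is sensitive to, and that LSK for a net is the sum of segment length times Keff over its segments. The text above does not spell out either formula, so both are assumptions here, and the coupling values, net names, and helper names are hypothetical.

```python
# Illustrative sketch under stated assumptions; coupling values are made up.

def keff(victim, coupling, sensitive):
    """Assumed Keff: sum of coupling coefficients from sensitive aggressors."""
    return sum(coupling[victim][aggressor] for aggressor in sensitive[victim])


def lsk(segments):
    """Assumed LSK: sum of (segment length * segment Keff) along a net."""
    return sum(length * k for length, k in segments)


# Hypothetical pairwise coupling coefficients inside one routing region.
COUPLING = {
    "a": {"b": 0.3, "c": 0.1},
    "b": {"a": 0.3, "c": 0.25},
    "c": {"a": 0.1, "b": 0.25},
}
# Hypothetical sensitivity relation: which aggressors can disturb each victim.
SENSITIVE = {"a": ["b", "c"], "b": ["a"], "c": []}

keff_a = keff("a", COUPLING, SENSITIVE)
# A net crossing two regions: (length, Keff in that region).
lsk_net = lsk([(100.0, keff_a), (50.0, 0.2)])
```

A precomputed coupling table keeps the figure of merit cheap to evaluate, which matches the stated goal of models that are easy to compute during physical design.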
Optimization of inductively coupled interconnects: We studied the interconnect optimization problem for both signal integrity and power integrity. A key aspect of these projects is the use of models that have high accuracy or fidelity yet remain easy to compute during physical design, such as the Keff model, the length-scaled Keff model, the closed-form shield estimation, and the probabilistic congestion estimation. We formulated a full-chip routing optimization problem with RLC crosstalk budgeting and solved it with a multiphase algorithm. In phase I, we solved an optimal RLC crosstalk budgeting problem based on linear programming to partition the crosstalk bounds at sinks into bounds for net segments in routing regions. In phase II, we performed simultaneous shield insertion and net ordering to meet the partitioned crosstalk bounds in each region. In phase III, we carried out a local refinement procedure to reduce the total number of shields. Compared with the best alternative approach in experiments, the proposed algorithm reduces the total routing area by up to 5.71% and uses less runtime [C28], [J10]. We also studied an extended global routing problem with RLC crosstalk constraints. The key algorithm phase is global routing synthesis with shield reservation and minimization based on prerouting shield estimation. Experiments using large industrial benchmarks show that, compared to the best alternative with postrouting shield insertion and net ordering, the proposed algorithm with shield reservation and minimization reduces the congestion by 18.4% with a smaller runtime [J15]. We also proposed an extended probabilistic congestion model that considers shielding for crosstalk reduction.
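The phase I idea of partitioning sink-level crosstalk bounds into per-segment bounds can be sketched as follows. This is not the linear program used in the work above. It is a simple feasible heuristic that assumes the noise at a sink is bounded through a length-weighted sum of per-segment values along the source-to-sink path. Each sink's bound is spread uniformly per unit length, and a segment shared by several sinks takes the tightest value. All names and numbers are hypothetical.

```python
# Illustrative stand-in for LP-based budgeting; a uniform per-length heuristic.

def budget_segments(paths, lengths, sink_bounds):
    """Return a per-segment bound so every sink's length-weighted sum is met.

    paths: sink -> list of segment ids on its source-to-sink path
    lengths: segment id -> wire length
    sink_bounds: sink -> bound on the length-weighted sum along its path
    """
    seg_bound = {}
    for sink, segs in paths.items():
        path_len = sum(lengths[s] for s in segs)
        per_unit = sink_bounds[sink] / path_len  # uniform share per unit length
        for s in segs:
            # A shared segment must satisfy its most demanding sink.
            seg_bound[s] = min(seg_bound.get(s, float("inf")), per_unit)
    return seg_bound


def path_total(sink, paths, lengths, seg_bound):
    """Length-weighted sum of the segment bounds along one sink's path."""
    return sum(lengths[s] * seg_bound[s] for s in paths[sink])


# Hypothetical two-sink net sharing segment s0.
LENGTHS = {"s0": 100.0, "s1": 50.0, "s2": 150.0}
PATHS = {"t1": ["s0", "s1"], "t2": ["s0", "s2"]}
SINK_BOUNDS = {"t1": 30.0, "t2": 100.0}

bounds = budget_segments(PATHS, LENGTHS, SINK_BOUNDS)
```

In this example the shared segment `s0` inherits the tighter requirement of sink `t1`, which leaves slack on the path to `t2`. An optimizing formulation such as the LP described above could redistribute that slack.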
We showed that our extended probabilistic congestion model considering shielding enables shielding reservation and minimization for routing and achieves a routing congestion (or area) reduction of 47.7% (or 31.0%) on average under the given routing area (or congestion) constraints, all compared to the existing deterministic congestion model [C57]. The conventional physical design flow separates the design of the power network from that of the signal network. Such a separated approach results in slow design convergence for wire-limited deep sub-micron designs. Therefore, we proposed a novel design methodology that simultaneously considers global signal routing and power network design under integrity constraints in [C43], [C46]. The proposed design methodology is a one-pass solution to the co-design of power and signal networks in the sense that no iteration between them is required to reach design closure. Experimental results using large industrial benchmarks show that, compared to the state-of-the-art alternative design approach, the proposed method reduces the power network area by 19.4% on average under the same signal and power integrity constraints, with better routing quality and less runtime.
Interconnect modeling and design with process variations: We demonstrated how much systematic CMP variation and random device variation can impact interconnect performance in [C55] (an invited talk) and [C63]. We further presented buffered interconnect synthesis with simultaneous fill insertion considering these two types of variations at ISPD’05 [C67] and submitted the completed result to TCAD, where it is currently under review. [J31] and [C82] revealed a few properties of comparing random variables and then developed an efficient pruning algorithm for random variables.
The algorithm has linear time complexity, the same as pruning deterministic variables, in contrast to the earlier work, which aimed at accurate pruning using surface integrals at the expense of super-linear complexity [C64]. We further solved the buffer insertion problem considering device variations with spatial correlation, and can handle designs 100x larger than those handled by the existing non-deterministic methods.
Dual-Vdd interconnect synthesis for power reduction: The dual-Vdd technique has been used to reduce logic power, but had not been used in interconnect synthesis because of the power overhead associated with Vdd level converters. By enforcing that no low-Vdd buffer drives a high-Vdd buffer within a routing tree, we showed that dual-Vdd can be used inside a tree without level converters. Compared to the existing power-optimal single-Vdd buffer insertion for the same target delay, our algorithm reduces power by 23% and runs 17x faster. Considering routing obstacles and preserved buffer stations, we further apply dual-Vdd buffering to routing construction to gain extra design freedom for power reduction. The result will appear as the first paper in session 30 at DAC’05 [C69]. In addition, our newest buffering and routing algorithms [C98] achieve a 50x speedup compared to those in [C69]. Furthermore, we have studied simultaneous FF and buffer insertion for power reduction [C47].
Chip-level interconnect power estimation: We derived an analytical repeater insertion method that finds the optimal repeater insertion lengths, repeater sizes, and Vdd and Vt levels for a net with a delay target. It reduces power by more than 50% compared with previous work that does not consider Vdd and Vt optimization. We then studied the impact of using multiple Vdd and Vt levels at the full-chip level and presented how to select multiple Vdd and Vt levels for interconnect power reduction.
Compared to the case with the single Vdd and Vt levels suggested by ITRS, optimized dual-Vdd and dual-Vt can reduce overall global interconnect power by up to 47%, but extra Vdd or Vt levels give only marginal improvement. We also showed that an optimized single Vt can reduce interconnect power almost as effectively as dual-Vt does, in contrast to the need for dual-Vt in logic claimed by previous work. This work was presented at ISLPED’05 [C73].
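The dual-Vdd buffering constraint described earlier in this section (no low-Vdd buffer drives a high-Vdd buffer within a routing tree) lends itself to a simple check. The following minimal sketch assumes a hypothetical tree encoding in which each node maps to a pair of its Vdd level (`"high"`, `"low"`, or `None` for an unbuffered node such as a sink) and its list of children. It only verifies the constraint; it is not the buffer insertion algorithm of [C69].

```python
# Illustrative checker for the level-converter-free constraint; the tree
# encoding and node names are hypothetical.

def is_converter_free(tree, node, driver_vdd="high"):
    """Return True if no low-Vdd buffer drives a high-Vdd buffer below node."""
    vdd, children = tree[node]
    if vdd is None:
        vdd = driver_vdd  # unbuffered node: signal stays at the driver's level
    elif driver_vdd == "low" and vdd == "high":
        return False  # this transition would need a level converter
    return all(is_converter_free(tree, child, vdd) for child in children)


# High-Vdd buffers near the source, low-Vdd buffers toward the sinks.
OK_TREE = {
    "src": ("high", ["b1"]),
    "b1": ("high", ["b2", "b3"]),
    "b2": ("low", ["sink1"]),
    "b3": ("high", ["sink2"]),
    "sink1": (None, []),
    "sink2": (None, []),
}
# Same tree, but b1 is low-Vdd and still drives the high-Vdd buffer b3.
BAD_TREE = dict(OK_TREE, b1=("low", ["b2", "b3"]))

ok_result = is_converter_free(OK_TREE, "src")
bad_result = is_converter_free(BAD_TREE, "src")
```

Under this constraint, every root-to-sink path switches from high Vdd to low Vdd at most once, which is why no level converter is needed inside the tree.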