**Interconnect Optimization for Deep-Submicron and Giga-Hertz ICs** 

> Lei He he@ece.wisc.edu http://eda.ece.wisc.edu

# Outline

n Background and overview

#### n LR-based STIS optimization

- u LR -- local refinement
- **u** STIS -- simultaneous transistor and interconnect sizing

#### n Conclusions and future works

### **Upcoming Design Challenges**

#### **n** Microprocessors for server computers

- u 1998 -- 0.25um, 7.5M FETs, 450MHz
- u 2001 -- 0.18um, ~100M FETs, >1GHz
  - ▶ close to tape-out
- u 2005 -- 0.10um, ~200M FETS, ~3.5GHz
  - ▶ launch design in 2003
  - ▶ begin developing design tools in 2001
  - ▶ start research right now

#### We are moving faster than Moore's Law

#### **Critical Issue: Interconnect Delay**



- n Starting from 0.25um generation, circuit delay is dominated by interconnect delay
- **n** Efforts to control interconnect delay
  - u Processing technology: Cu and low K dielectric
  - u Design technology:

interconnect-centric design

# Layout Design: Device-Centric versus Interconnect-Centric



### **Interconnect Optimization**

# **Device locations and constraints:**

- Delay
- Power
- Signal integrity
- Skew

...



- Other critical optimizations: buffer insertion, simultaneous device and interconnect sizing ...
- Automatic solutions guided by accurate interconnect and device models

# **UCLA TRIO Package**

#### n Integrated system for interconnect design

u http://cadlab.cs.ucla.edu/~trio

#### **n** Efficient polynomial-time optimal/near-optimal algorithms

- u Interconnect topology optimization
- u Optimal buffer insertion
- u Optimal wire sizing
- u Wire sizing and spacing considering Cx
- u Simultaneous device and interconnect sizing
- u Simultaneous topology generation with buffer insertion and wiresizing

#### n Accurate interconnect models

- u 2 -1/2 D capacitance model
- u 2 -1/2 D inductance model
- **u** Elmore delay and higher-order delay models
- **•** Improve interconnect performance by up to 7x !
  - u Used in industry, e.g., Intel

# **Contributions to UCLA TRIO Package**

#### n Integrated system for interconnect design

u http://cadlab.cs.ucla.edu/~trio

#### **n** Efficient polynomial-time optimal/near-optimal algorithms

- u Interconnect topology optimization
- u Optimal buffer insertion
- u Optimal wire sizing [ICCAD'95, TODAES'96]
- u Wire sizing and spacing considering Cx [ICCAD'97, TCAD'99]
- **U** Simultaneous device and interconnect sizing [ICCAD'96, ISPD'98, TCAD'99]
- u Simultaneous topology generation with buffer insertion and wiresizing
- n Accurate interconnect models
  - u 2 -1/2 D capacitance model [DAC'97] (with Cadence)
  - u 2 -1/2 D inductance model [CICC'99] (with HP Labs)
  - **U** Elmore delay and higher-order delay models
- **•** Improve interconnect performance by up to 7x !
  - u Used in industry, e.g., Intel

# Outline

n Background and overview

n LR-based STIS optimization u Motivation for LR-based optimization

**n** Conclusions and future works

# **Discrete Wiresizing Optimization**

[Cong-Leung, ICCAD'93]

- n Given: A set of possible wire widths {  $W_1, W_2, ..., W_r$  }
- n Find: An optimal wire width assignment to minimize weighted sum of sink delays



### **Dominance Relation and Local Refinement**



**Dominance Relation** u For all  $E_j$ ,  $w(E_j)^{\mathfrak{s}}w'(E_j)$  **B** *W* dominates *W'* (*i.e.*,  $W^{\mathfrak{s}}W'$ )

#### n Local refinement (LR)

- LR for E<sub>1</sub> to find an optimal width for E<sub>1</sub>, assuming widths for other wires are fixed with respect to current width assignment
- U Single-variable optimization can be solved efficiently

#### **Dominance Property for Discrete Wiresizing** [Cong-Leung, ICCAD'93]

n If solution W dominates optimal solution W\* W' = local refinement of W Then, W' dominates W\*

n If solution W is dominated by optimal solution W\* W' = local refinement of W Then, W' is dominated by W\*

A highly efficient algorithm to compute tight lower and upper bounds of optimal solution

# **Bound Computation based on Dominance Property**

- **n** Lower bound computed starting with minimum widths
  - **u** LR operations on all wires constitute a pass of bound computation
  - u LR operations can be in an arbitrary order
  - **u** New solution is wider, but is still dominated by the optimal solution



- O Upper bound is computed similarly, but beginning with maximum widths
- **n** We alternate lower and upper bound computations
  - **u** Total number of passes is linearly bounded by size of solution space
- **n** Optimal solution is often achieved in experiments

#### **Other Problems Solved by LR operation**

#### Why does LR operation work?

- n Multi-source discrete wiresizing [Cong-He, ICCAD'95]/
  - u Bundled-LR is proposed to speed up LR by a factor of 100x
- n Continuous wiresizing [Chen-Wong, ISCAS'96]
  - **u** Linear convergence is proved [Chu-Wong, TCAD'99]
- n Simultaneous buffer and wire sizing [Chen-Chang-Wong, DAC'96]
  - **u** Lagrangian relaxation is proposed to minimize max delay
    - **via a sequence of weighted delay minimizations**
  - u Extended to general gates and multiple nets [Chu-Chen-Wong, ICCAD'98]

# Outline

**n** Background and overview

n LR-based STIS optimization
 u Motivation of LR-based optimization
 u Simple CH-program and application to STIS problem

**n** Conclusions and future works

#### Simple CH-function [Cong-He, ICCAD'96, TCAD'99]

n 
$$f(X) = \sum_{p=0}^{m} \sum_{q=0}^{m} \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} \left(\frac{a_{pi}}{x_{i}^{p}}\right) \cdot \left(b_{qj} \cdot x_{j}^{q}\right)$$

#### is a simple CH-function

- **U** Variables  $x_i$  and  $x_j$  are positive, either continuous or discrete
- U Coefficients  $a_{pi}$  and  $b_{qi}$  are positive constants
- **n** Examples:

$$= ax^{2} + \frac{b}{x} + cy^{3}, g = \sum (a_{ij} \cdot \frac{x_{i}}{x_{j}})$$

**n** It includes the objective functions for a number of works

- U Discrete or continuous wire sizing [Cong-Leung, ICCAD'93][Cong-He, ICCAD'95][Chen-Wong,ISCAS'96]
- U Simultaneous device and wire sizing [Cong-Koh, ICCAD'94][Chen-Chang-Wong, DAC'96][Cong-Koh-Leung, ILPED'96][Chu-Chen-Wong, ICCAD'98]

### **Dominance Property for Simple CH-Program**

- Optimization problem to minimize a simple CHfunction is a simple CH-program.
- n The dominance property holds for simple CH-program w.r.t. the LR operation.
  - If X dominates optimal solution X\*
     X' = local refinement of X
     Then, X' dominates X\*
  - If X is dominated by optimal solution X\*
     X'=local refinement of X
     Then, X' is dominated by X\*
- **n** LR operation can be used for all simple CH-programs

### Simple CH-function [Cong-He, ICCAD'96, TCAD'99]

n 
$$f(X) = \sum_{p=0}^{m} \sum_{q=0}^{m} \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} \left(\frac{a_{pi}}{x_i^p}\right) \cdot \left(b_{qj} \cdot x_j^q\right)$$

#### is a simple CH-function

- U Variables  $x_i$  and  $x_j$  are positive
- U Coefficients  $a_{pi}$  and  $b_{qj}$  are positive constants

#### **n** Examples:

$$U \quad f = ax^{2} + \frac{b}{x} + cy^{3}, g = \sum \left( \frac{a_{ij} \cdot \frac{x_{i}}{x_{j}}}{x_{j}} \right)$$

**Unified and efficient solution** 

#### **n** It includes the objective functions for a number of works

- U Discrete or continuous wire sizing [Cong-Leung, ICCAD'93][Cong-He, ICCAD'95][Chen-Wong,ISCAS'96]
- U Simultaneous device and wire sizing [Cong-Koh, ICCAD'94][Chen-Chang-Wong, DAC'96][Cong-Koh-Leung, ILPED'96][Chu-Chen-Wong, ICCAD'98]

# **General Formulation: STIS Simultaneous Transistor and Interconnect Sizing**



- Given: Circuit netlist and initial layout design
- **Determine:** Discrete sizes for devices/wires
- Minimize: **a** Delay + **b** Power + **g**Area
- It is the first publication to consider simultaneous device and wire sizing for complex gates and multiple paths

#### **STIS Objective for Delay Minimization**

$$f(X) = \sum_{i,j} F(i,j) \bullet \frac{R_0(i)}{x_i} \bullet C_0(j) \bullet x_j + \sum_{i,j} F(i,j) \bullet \frac{R_0(i)}{x_i} \bullet C_1(j)$$

$$+ \sum_i G(i) \bullet \frac{R_0(i)}{x_i} + \sum_i H(i) \bullet \frac{R_0(i)}{x_i} \bullet C_1(i)$$

$$u R_0: \qquad \text{unit-width resistance}$$

$$u C_0: \qquad \text{unit-width area capacitance}$$

$$u C_1: \qquad \text{effective-fringing capacitance}$$

$$u X = \{x_1, x_2, ..., x_n\}: \quad \text{discrete widths and variables for optimization}$$

- **n** It is a simple CH-function under simple model assuming  $R_0, C_0$  and  $C_1$  are constants
- n STIS can be solved by computing lower and upper bounds via LR operations
  - u Identical lower and upper bounds often achieved

# **SPICE-Delay reduction of LR-Based STIS**

- n STIS optimization versus manual optimization for clock net [Chien-et al.,ISCC'94]:
  - u 1.2um process, 41518.2 um wire, 154 inverters
- **n** Two formulations for LR-based optimization
  - u sgws simultaneous gate and wire sizing
  - u stis simultaneous transistor and interconnect sizing

|                 | manual | sgws          | stis          |
|-----------------|--------|---------------|---------------|
| max delay (ns)  | 4.6324 | 4.34 (-6.2%)  | 3.96 (-14.4)  |
| power(mW)       | 60.85  | 46.1 (-24.3%) | 46.3 (-24.2%) |
| clock skew (ps) | 470    | 130 (-3.6x)   | 40 (-11.7x)   |

- n Runtime (wire segmenting: 10um)
  - uLR-basedsgws 1.18s, stis 0.88suHSPICE simulation~2100s in total

#### **STIS Objective for Delay Minimization**

$$t(X) = \sum_{i,j} F(i,j) \bullet \frac{R_0(i)}{x_i} \bullet C_0(j) \bullet x_j + \sum_{i,j} F(i,j) \bullet \frac{R_0(i)}{x_i} \bullet C_1(j)$$

$$+ \sum_i G(i) \bullet \frac{R_0(i)}{x_i} + \sum_i H(i) \bullet \frac{R_0(i)}{x_i} \bullet C_1(i)$$

$$U = R_0:$$

$$U = C_0:$$

$$U = C_0:$$

$$U = \{x_1, x_2, ..., x_n\}:$$

#### **Over-simplified for DSM (Deep Submicron) designs**

**It is a simple CH-function under simple model assuming**  $R_0$ ,  $C_0$  and  $C_1$  are constants

### **R**<sub>0</sub> is far away from a Constant!

| effective-resistance | R | for unit-width n-transistor |
|----------------------|---|-----------------------------|
|                      |   |                             |

| size = 100x                                                 |        |        | size = 400x  |                                                             |        |        |               |
|-------------------------------------------------------------|--------|--------|--------------|-------------------------------------------------------------|--------|--------|---------------|
| $\mathbf{c}_{\mathbf{l}} \setminus \mathbf{t}_{\mathbf{t}}$ | 0.05ns | 0.10ns | 0.20ns       | $\mathbf{c}_{\mathbf{l}} \setminus \mathbf{t}_{\mathbf{t}}$ | 0.05ns | 0.10ns | <b>0.20ns</b> |
| 0.225pf                                                     | 12200  | 12270  | <b>19180</b> | 0.501pf                                                     | 12200  | 15550  | 19150         |
| 0.425pf                                                     | 8135   | 9719   | 12500        | 0.901pf                                                     | 11560  | 13360  | 17440         |
| 0.825pf                                                     | 8124   | 8665   | 10250        | 1.701pf                                                     | 8463   | 9688   | 12470         |

- N Using more accurate model like the table-based device model has the potential of further delay reduction.
  - U But easy to be trapped at local optimum, and to be even worse than using simple model [Fishburn-Dunlop, ICCAD'85]

### Neither C<sub>0</sub> nor C<sub>1</sub> is a Constant

- **n** Both depend on wire width and spacing
  - **u** Especially  $C_1 = C_f + C_x$  is highly sensitive to spacing



- n C<sub>x</sub> accounts for >50% capacitance in DSM
  - **u** Proper spacing may lead to extra delay reduction
  - **But no existing algorithm for optimal spacing**



### **STIS-DSM Problem to Consider DSM Effects**



#### n STIS-DSM problem

uFind:device sizing, and wire sizing and spacing solutionoptimal w.r.t. accurate device model and multiple nets

#### n Easier but less appealing formulation: single-net STIS-DSM

- u Find: device sizing, and wire sizing and spacing solution optimal w.r.t. accurate device model and <u>a single-net</u>
- u Assume: its neighboring wires are fixed

# Outline

**n** Background and overview

### n LR-based STIS optimization

- u Motivation: LR-based wire sizing
- **u** Simple CH-program and application to STIS problem
- u Bounded CH-program and application to STIS-DSM problem
- **n** Conclusions and future works

# **Go beyond Simple CH-function**

$$f(X) = \sum_{p=0}^{m} \sum_{q=0}^{m} \sum_{i=1}^{n} \sum_{j=1}^{n} \left(\frac{a_{pi}(X)}{x_i^p}\right) \cdot \left(b_{qj}(X) \cdot x_j^q\right)$$

#### n It is a simple CH-function if $a_{pi}$ and $b_{qj}$ are positive constants

#### **n** It is a bounded CH-function if

- u  $a_{pi}$  and  $b_{qj}$  are arbitrary functions of X
- u *a<sub>pi</sub>* and *b<sub>qj</sub>* are positive and bounded

$$a_{pi}^{L} \leq a_{pi}(X) \leq a_{pi}^{U} \text{ and } b_{qj}^{L} \leq b_{qj}(X) \leq b_{qj}^{U}$$

#### n Examples:

$$u \quad f(x_1, x_2) = \frac{1}{\ln x_1} \cdot x_1 + \frac{x_2}{x_1}, x_1 > e$$

**u** Objective function for STIS-DSM problem

# **Extended-LR Operation**

**n** Extended-LR (ELR) operation is a relaxed LR operation

- u Replace  $a_{pi}$  and  $b_{qj}$  by its lower or upper bound during LR operation.
- u Make sure that the resulting lower or upper bound is always correct

**> but might be conservative.** 

- n **Example:**  $f(x) = a(x) \cdot x + \frac{b(x)}{x}$ 
  - U ELR for a lower bound is  $x_{ELR}^L = \sqrt{\frac{b^L}{a^U}}$

U ELR for an upper bound is  $x_{ELR}^U = \sqrt{b^U/a^L}$ 

$$x_{ELR}^{L} \le x^{*} \le x_{ELR}^{U}$$

ົ

## **General Dominance Property**

#### n Theorem ([Cong-He, ISPD'98, TCAD'99])

u Dominance property holds for bounded CH-program with respect to ELR operation

#### **General Dominance Property**

n Theorem ([Cong-He, ISPD'98, TCAD'99])

u Dominance property holds for bounded CH-program with respect to ELR operation

n **To minimize** 
$$f(X) = \sum_{p=0}^{m} \sum_{q=0}^{m} \sum_{i=1}^{n} \sum_{j=1}^{n} (\frac{a_{pi}(X)}{x_i^p}) \cdot (b_{qj}(X) \cdot x_j^q)$$

u If X dominates optimal solution X\*
 X' = Extended-LR of X
 Then, X' dominates X\*

If X is dominated by optimal solution X\*
 X'= Extended-LR of X
 Then, X' is dominated by X\*

### **Solution to STIS-DSM Problem**

#### n STIS-DSM can be solved as a bounded CH-program

- u Lower bound computed by ELR starting with minimum sizes
- **u** Upper bound computed by ELR starting with maximum sizes
- u Lower- and upper-bound computations are alternated to shrink solution space
- **u** Up-to-date lower and upper bounds of  $\mathbf{R}_0$ ,  $\mathbf{C}_0$  and  $\mathbf{C}_1$  are used
  - ▶ Uncertainty of R<sub>0</sub>, C<sub>0</sub> and C<sub>1</sub> is reduced when the solution space is shrunk
- n There exists an optimal solution to the STIS-DSM problem between final lower and upper bounds
  - u How large is the gap?

# **Gaps between Lower and Upper Bounds**

- **n** Two nets under 0.18um technology: *DCLK* and *2cm line* 
  - u STIS-DSM uses table-based device model and ELR operation
  - u STIS uses simple device model and LR operation
- **n** We compare average lower-bound width / average gap

|          | Tra       | nsistors  | Wires      |            |  |
|----------|-----------|-----------|------------|------------|--|
| DCLK     | STIS      | STIS-DSM  | STIS       | STIS-DSM   |  |
| sgws     | 5.39/0.07 | 13.0/1.91 | 2.50/0.003 | 2.78/0.025 |  |
| stis     | 17.2/1.53 | 21.6/2.36 | 2.69/0.017 | 2.82/0.030 |  |
| 2cm line | STIS      | STIS-DSM  | STIS       | STIS-DSM   |  |
| sgws     | 108/0.108 | 112/0.0   | 4.98/0.004 | 4.99/0.106 |  |
| stis     | 126/0.97  | 125/1.98  | 5.05/0.032 | 5.11/0.091 |  |

- n Gap is almost negligible
  - **u** About 1% of lower-bound width in most cases

### **Delay Reduction by Accurate Device Model**

#### n STIS-DSM versus STIS

- **u** STIS-DSM uses table-based device model and ELR operation
- u STIS uses simple device model and LR operation

| DCLK             | STIS                | STIS-DSM                 |
|------------------|---------------------|--------------------------|
| sgws             | 1.16 (0.0%)         | 1.08 (-6.8%)             |
| stis             | 1.13 (0.0%)         | 0.96 (-15.1%)            |
|                  |                     |                          |
| 2cm line         | STIS                | STIS-DSM                 |
| 2cm line<br>sgws | STIS<br>0.82 (0.0%) | STIS-DSM<br>0.81 (-0.4%) |

- **n** STIS-DSM achieves up to 15% extra reduction
- n Runtime is still impressive
  - Total optimization time ~10 seconds

# **Delay Reduction by Wire Spacing**

- n Multi-net STIS-DSM versus single-net STIS-DSM
  - u Test case:
    - ▶ 16-bit bus

▶ each bit is 10mm-long with 500um per segment

| pitch-spacing |            | runtime      |              |
|---------------|------------|--------------|--------------|
|               | single-net | multi-net    | multi-net    |
| <b>1.10um</b> | 1.31       | 0.79 (-39%)  | <b>2.0s</b>  |
| <b>1.65um</b> | 0.72       | 0.52 (-27%)  | <b>2.4</b> s |
| <b>2.20um</b> | 0.46       | 0.42 (-8.7%) | <b>2.3</b> s |
| 2.75um        | 0.38       | 0.36 (-5.3%) | <b>4.9</b> s |
| <b>3.30um</b> | 0.35       | 0.32 (-8.6%) | 7.7s         |

#### Multi-net STIS-DSM achieves up to 39% delay reduction

**E** Single-net STIS-DSM has a significant delay reduction if we compare it with previous wire sizing formulations

# Conclusions

- n Interconnect-centric design is the key to DSM and GHz IC designs
- Interconnect optimization is able to effectively control interconnect delay Valid for general problems
   Problem formulations should consider DSM effects
   I.e.g., LR-based optimization for STIS-DSM problem
- n More is needed to close the loop of interconnectcentric design
  - u Interconnect planning
  - u Interconnect optimization for inductance and noise
  - Interconnect verification, especially for patterndependent noise and delay

U ....