## 1. TIME SLACK ALLOCATION WITH WIRE SEGMENTS OF DIFFERENT LENGTHS

we use the net-based formulation which partitions the constraints on path delay into constraints on delay across circuit elements or routing. Let $a(v)$ be the arrival time for vertex $v$ in $\mathcal{G}$ and the timing constraints for PI and PO are as below,

$$
\begin{array}{cl}
a(v) \leq T_{\text {spec }} & \forall v \in \mathcal{P O} \\
a(v)=0 & \forall v \in \mathcal{P} \mathcal{I} \tag{2}
\end{array}
$$

For the edges corresponding to routing in $\mathcal{G}$, the constraints considering slack can be expressed as

$$
\begin{gather*}
a\left(p_{i 0}\right)+d\left(p_{i 0}, p_{i k}\right)+S_{i k} \leq a\left(p_{i k}\right) \\
0 \leq i<N_{r} \wedge \forall p_{i k} \in \mathcal{F} \mathcal{O}_{p_{i 0}} \tag{3}
\end{gather*}
$$

where vertex $p_{i 0}$ is the source of $\mathcal{R}_{i}$ in $\mathcal{G}$, vertex $p_{i k}$ is $k^{\text {th }}$ sink of $\mathcal{R}_{i}$ in $\mathcal{G}, S_{i k}$ is the slack allocated to $k^{t h} \operatorname{sink}$ in $\mathcal{R}_{i}$ and $d\left(p_{i 0}, p_{i k}\right)$ is the delay from $p_{i 0}$ to $p_{i k}$ in $\mathcal{R}_{i}$ using VddH. For the edges other than routing in $\mathcal{G}$, the constraints can be expressed as

$$
\begin{equation*}
a(u)+d(u, v) \leq a(v) \quad \forall u \in \mathcal{V} \wedge u \notin \mathcal{S R C} \wedge v \in \mathcal{F} \mathcal{O}_{u} \tag{4}
\end{equation*}
$$

where $\mathcal{S R C}$ is a subset of $\mathcal{V}$ and gives the set of vertices corresponding to routing tree sources.

There is an upper bound for slack, which is the delay increase when VddL is assigned to all the switches in a tree. Clearly, slack more than the upper bound cannot lead to more VddL switches. We define the useful slack of each routing tree sink as the slack less than this upper bound. For the rest part of the paper, we use slack to represent the useful slack. The slack upper bound constraints can be expressed as

$$
\begin{equation*}
0 \leq S_{i k} \leq D_{i k} \quad 0 \leq i<N_{r} \wedge 1 \leq k \leq N_{k}(i) \tag{5}
\end{equation*}
$$

where $N_{k}(i)$ is the number of sinks in $\mathcal{R}_{i}$ and $D_{i k}$ is the delay increase of the path from the source to $k^{t h} \operatorname{sink}$ in $\mathcal{R}_{i}$ when VddL is assigned to all the switches in that path.

Given a routing tree with arbitrary topology and allocated slack for each sink, we need to estimate power reduction that can be achieved. We use $l_{i k}$ represent the number of switches in the path from the source to $k^{t h}$ sink in $\mathcal{R}_{i}$. We first transform slack $S_{i k}$ into $s_{i k}$, which is expressed in number of switches as follows,

$$
\begin{equation*}
s_{i k}=\frac{S_{i k}}{D_{i k}} \cdot l_{i k} \tag{6}
\end{equation*}
$$

We then estimate the VddL switch number that can be achieved using $s_{i k}$. We use $C_{i k}$ to represent the total load capacitance of the switches in the path from the source to $k^{t h} \operatorname{sink}$ in $\mathcal{R}_{i}$. We use $c_{i j}$ to represent the load capacitance of $j^{t h}$ switch in $\mathcal{R}_{i}$. We define sink list $\mathcal{S} \mathcal{L}_{i j}$ as the set of sinks in the fanout cone of $j^{\text {th }}$ switch
in $\mathcal{R}_{i}$. We then estimate the number of VddL switches that can be achieved given the allocated slack as

$$
\begin{equation*}
F_{n}(i)=\sum_{j=0}^{N_{s}(i)-1} \min \left(\frac{s_{i k}}{C_{i k}} \cdot c_{i j}: \forall k \in \mathcal{S} \mathcal{L}_{i j}\right) \tag{7}
\end{equation*}
$$

To estimate the number of VddL switches that can be achieved in tree $\mathcal{R}_{i}$, we first deliberately distribute the slack $s_{i k}$ to the switches in the path from source to $k^{t h}$ sink in $\mathcal{R}_{i}$ considering the load capacitance of each switch. For a switch with multiple sinks in its fanout cone, we choose the minimum $s_{i k} c_{i j} / C_{i k}$ as the slack distributed to the switch. We then add the slack distributed to all the switches in $\mathcal{R}_{i}$. and get the estimated number of VddL switches. The rationale is that we consider $k^{t h}$ sink with minimum $s_{i k} c_{i j} / C_{i k}$ in sink list $\mathcal{S}_{i j}$ as the most critical sink to $j^{t h}$ switch in $\mathcal{R}_{i}$.

We then estimate the power reduction of $\mathcal{R}_{i}$. The dynamic power reduction of the tree $\mathcal{R}_{i}$ is estimated as the sum of the dynamic power reduction of each switch in $\mathcal{R}_{i}$ and can be expressed as,
$P_{d r}(i)=0.5 f_{c l k} \cdot \Delta V d d^{2} \cdot f_{s}(i) \sum_{j=0}^{N_{s}(i)-1}\left[\min \left(\frac{s_{i k}}{C_{i k}} \cdot c_{i j}: \forall k \in \mathcal{S} \mathcal{L}_{i j}\right) \cdot c_{i j}\right]$
The leakage power reduction of $\mathcal{R}_{i}$ is also the sum of the leakage power reduction of each switch in $\mathcal{R}_{i}$ and can be expressed as,

$$
\begin{equation*}
P_{l r}(i)=\sum_{j=0}^{N_{s}(i)-1}\left[\min \left(\frac{s_{i k}}{C_{i k}} \cdot c_{i j}: \forall k \in \mathcal{S} \mathcal{L}_{i j}\right) \cdot \Delta P_{s}(i, j)\right] \tag{9}
\end{equation*}
$$

where $\Delta P_{s}(i, j)$ is the leakage power difference of $j^{\text {th }}$ switch in $\mathcal{R}_{i}$ between VddH and VddL. Wire segments with different lengths might be driven by switches with different sizes. The objective is to maximize power reduction given by the sum of (8) and (9). To incorporate (8) and (9) into mathematical programming, we introduce a variable $f_{n}(i, j)$ for $j^{\text {th }}$ switch in $\mathcal{R}_{i}$ and some additional constraints. The new objective function after transformation plus the additional constraints can be expressed as

$$
\begin{array}{r}
\operatorname{Maximize} \sum_{i=0}^{N_{r}-1} 0.5 f_{c l k} \Delta V d d^{2} \sum_{j=0}^{N_{s}(i)-1} f_{n}(i, j) f_{s}(i, j) c_{i j} \\
+\sum_{i=0}^{N_{r}-1} \sum_{j=0}^{N_{s}(i)-1} f_{n}(i, j) \Delta P_{s}(i, j) \tag{10}
\end{array}
$$

s.t.
$f_{n}(i, j) \leq \frac{s_{i k}}{C_{i k}} c_{i j} \quad 0 \leq i<N_{r} \wedge 0 \leq j<N_{s}(i) \wedge \forall k \in \mathcal{S L}(\nmid \nmid 1)$
We formulate the time slack allocation problem using objective function (10), additional constraints (11), slack upper bound constraints (5), plus timing constraints (1), (2), (3) and (4). It is
easy to verify that (1), (2), (5) ~(4) and (11) are linear, and the objective function (10) is linear too. Hence we have the following theorem.

Theorem 1. The time slack allocation problem is a linear programming (LP) problem.

Therefore, we can first solve the extended time slack allocation problem and then perform net-level assignment and refinement for the interconnects using wire segments of different lengths.

## 2. STATISTICAL DELAY AND POWER MODELS

We consider the variation in threshold voltage ( $V_{t h}$ ), effective channel length ( $L_{e f f}$ ), and gate oxide thickness ( $T_{o x}$ ). Similar to [1] where ASIC is assumed, each variation $(\Delta P)$ is decomposed into global (die-to-die) variation ( $\Delta P_{g}$ ) and local (within-die) variation $\left(\Delta P_{l}\right)$. Both global and local variations are modeled as normal random variables. To make presentation simple, we denote $\Delta L_{e f f}$, $\Delta V_{t h}$ and $\Delta T_{o x}$ as $L, V$ and $T$, respectively. $L, V$ and $T$ can be decomposed into local ( $L_{l}, V_{l}, T_{l}$ ) and global ( $L_{g}, V_{g}, T_{g}$ ) components.

Power reduction of an interconnect switch by changing the Vddlevel from VddH to VddL considering process variation is presented as below,

$$
\begin{array}{r}
\Delta P=\Delta P_{0} e^{-c_{1} L} e^{-c_{2} V} e^{-c_{3} T} \\
\Delta E=\Delta E_{0}+\frac{\partial \Delta E}{\partial V} \Delta V=\Delta E_{0}+c_{4} \Delta V \tag{13}
\end{array}
$$

$\Delta P$ is leakage power reduction, $\Delta E$ is dynamic energy per signal switch reduction when driving unit load capacitance. $c_{1}, c_{2}, c_{3}$ and $c_{4}$ are curve fit parameters. $\Delta P_{0}$ and $\Delta E_{0}$ are the nominal values. $\Delta P$ exponentially depends on $L, V$ and $T$, and it can be modeled as a lognormal (LN) random variable. We assume a first-order Taylor expansion of dynamic energy reduction $\Delta E$ is adequate, which therefore can be modeled as a normal random variable. A firstorder Taylor expansion of circuit element delay is also assumed as adequate,

$$
\begin{equation*}
d=d_{0}+\frac{\partial d}{\partial L} \Delta L+\frac{\partial d}{\partial V} \Delta V=d_{0}+c_{5} L+c_{6} V \tag{14}
\end{equation*}
$$

where $d_{0}$ is the nominal value of circuit element delay, $c_{4}$ and $c_{5}$ are curve fit parameters. Delay $d$ is a normal random variable under this model.

## 3. STATISTICAL POWER OPTIMIZATION

### 3.1 Chip-level Time Slack Allocation

To handle the variability of process parameters, we reformulate the time slack allocation problem as a robust linear program problem which can be solved efficiently using interior-point convex methods [2]. We first the objective function (10) into an equivalent formulation as below, which places the power reduction into the constraint set

$$
\begin{equation*}
\text { Miminize } \sum_{i=0}^{N_{r}-1} \sum_{k=0}^{N_{k}(i)-1} S_{i k} \tag{15}
\end{equation*}
$$

s.t.

$$
\begin{array}{r}
\sum_{i=0}^{N_{r}-1} \sum_{j=0}^{N_{s}(i)-1} \Delta P_{d}(i, j) f_{n}(i, j) \geq \Delta \hat{P}_{d y n} \\
\Delta P_{d}(i, j)=\Delta E c_{i j} f_{c l k} f_{s} i, j \\
\sum_{i=0}^{N_{r}-1} \sum_{j=0}^{N_{s}(i)-1} \Delta P_{s}(i, j) f_{n}(i, j) \geq \Delta \hat{P_{l e a k}} \tag{17}
\end{array}
$$

where $\Delta \hat{P}_{d y n}$ and $\Delta \hat{P_{l e a k}}$ are the optimal dynamic and leakage power reduction achieved by the time slack allocation problem without considering process variation.

### 3.2 Net-level Bottom-up Assignment

## 4. REFERENCES

[1] R. Rao, A. Devgan, D. Blaauw, and D. Sylvester, "Parametric yield estimation considering leakage variability," in Proc. Design Automation Conf., June 2004.
[2] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, 2004.

