# Analytical Bound for Unwanted Clock Skew due to Wire Width Variation

Anand Rajaram, Bing Lu<sup>†</sup>, Wei Guo<sup>‡</sup>, Rabi Mahapatra<sup>\*</sup>, Jiang Hu

Dept. of EE, Texas A&M University, College Station, TX 77843

<sup>†</sup>Cadence Design Sys. Inc., 35 Spring Street, New Providence, NJ 07974. <sup>‡</sup>Institut Français du Pétrole, 1 et 4, avenue de Bois-Préau, 92852 Rueil-Malmaison - Cedax France \*Dept. of CS, Texas A&M University, College Station, TX 77843

(anandr,rabi,jianghu)@tamu.edu, bingl@cadence.com, wei.guo@ifp.fr

# ABSTRACT

Under modern VLSI technology, process variations greatly affect circuit performance, especially clock skew which is very timing sensitive. Unwanted skew due to process variation forms a bottleneck preventing further improvement on clock frequency. Impact from intra-chip interconnect variation is becoming remarkable and is difficult to be modeled efficiently due to its distributive nature. Through wire shaping analysis, we establish an analytical bound for the unwanted skew due to wire width variation which is the dominating factor among interconnect variations. Experimental results on benchmark circuits show that this bound is safer, tighter and computationally faster than similar existing approach.

### **1. INTRODUCTION**

When the VLSI feature size becomes progressively smaller, previously negligible process variations start to affect circuit performance significantly. These process variations result from many non-ideal conditions such as etching error, mask mis-alignment and defects. For clock networks, these variations cause clock skew variations or unwanted skews. Since clock skew is a lower bound for clock period, the unwanted skew due to process variations form a bottleneck preventing improvement on clock frequency. For modern high performance circuit designs, process variation induced clock skew is taking a greater and greater portion of clock period time. Therefore, it is very important to model the impact of process variations on clock network performance so that it can be considered during circuit/clock design.

Realizing the great importance of process variations, many works have been carried out to model their effect [10, 7, 2, 9, 15, 1, 17, 14]. One major objective of these modeling works is to find a reliable estimation on the worst case timing performance induced by process variations, either the worst delay along timing critical paths in timing analysis or the worst skew in a clock network. The estimation results can be applied as a feedback to guide further design iterations.

One common approach to find the worst case performance is to run Monte Carlo simulations for a certain number of iterations and pick the worst case performance among the results. In order to obtain a reliable estimation, the number of iterations is generally very large and consequently the high computational cost makes this method impractical. Another straightforward technique is to estimate the performance only at corner points of process variations. Even though this technique is computationally fast, it is overly pessimistic in estimating the worst case delay due to gate process variations as correlations among the gates are neglected. Therefore, the corner-point technique is useful only when a loose bound needs to be found quickly. In seeking a good balance between estimation quality and runtime, probability based approaches [7, 14] are developed. The interval analysis method [10] is proposed to establish a bound for the worst case timing performance.

The effect of interconnect process variations is becoming more and more remarkable and it is found in [11] that interconnect variations may cause 25% clock skew variations. The impact of interconnect variation is intrinsically hard to be modeled efficiently, since the worst case interconnect delay does not always occur at the process variation corner points. This fact makes the corner-point technique not applicable for interconnect variations. Furthermore, interconnect variations is distributive in nature in contrast to the localized variations for transistors. Since an interconnect wire may span a long distance, using a single variable to model each process parameter such as wire width or thickness is inadequate to capture the different process variation levels in different regions. This is especially true when the intra-chip variations start to dominate the inter-chip variations[12]. A naive approach to solve this problem is to segment a long wire into smaller pieces, and consider the variation of each piece individually. However, this approach may increase the number of variables considerably and thereby slow down the estimation speed.

Our main objective in this paper is to obtain a fast skew bound estimation, that can be used in applications such as process variation driven clock network design and chip level design planning. Note that, in the above applications, the speed and fidelity of the estimation is more important that the numerical accuracy, so that it can be used in the inner loop of optimization. This requirement is quite different from the requirements of the post optimization estimation where probabilistic approach is better. In this work, we concentrate on the effect of wire width variation which is the dominating factor compared to other wire parameters such as wire thickness, since wire width shrinks faster under the non-uniform technology scaling. We have derived an analytical bound for unwanted clock skew between any two clock sinks due to wire width variation. Our proposed technique is based on the observation that estimating the worst case skew due to wire width variation is closely related to the non-uniform wire sizing problem in physical optimizations. The minimum delay non-uniform wire sizing problem for a 2-pin wire(single load path) has been solved in [8, 3]. We derive the maximum wire shaping function for both single load path and multi-pin trees. The minimum delay wire shaping formula for multi-pin trees is also obtained in our work. Previously, a more general version of this problem was solved through an iterative algorithm [4]. Similar to the works of [8, 3, 4], we employ Elmore delay model for our derivations because of its high fidelity and ease of computation. Besides the bound, we discovered that the bound



Figure 1: When estimating the worst case skew between sink  $s_1$  and  $s_4$  in (a), the clock tree can be reduced to a simpler model in (b).

for skew between two clock sinks depends on their common upstream path, even though the skew between them is independent of the common path. This dependence is analyzed and found to be monotone in practice. These results establish an analytical bound for the unwanted skews due to wire width variation. Since the analytical bound can be computed very quickly, it can be applied to process variation driven clock network design as well as chip level design planning. It is to be noted that, for the above applications, speed and fidelity of the the estimation is more important than numerical accuracy. This is because the estimation will be queried frequently in the inner loop of optimization and design planning. Thus, the analytical bound fits the requirement precisely.

### 2. PROBLEM ANALYSIS

Given a pair of sinks  $s_1$  and  $s_2$  in a clock routing tree, our objective is to find a bound for the worst case skew due to wire width variation. The skew between the two sinks can be expressed as  $q_{12} = t_1 - t_2$  where  $t_1$  and  $t_2$  are the delay from clock driver to  $s_1$ and  $s_2$ , respectively. When there is wire width variation for the paths from driver to  $s_1$  and  $s_2$ , the delay  $t_1$  and  $t_2$  vary in certain ranges of  $[t_{1,min}, t_{1,max}]$  and  $[t_{2,min}, t_{2,max}]$ , respectively. Evidently, the worst case skew occurs at  $q_{12,max} = t_{1,max} - t_{2,min}$  or  $q_{12,min} = t_{1,min} - t_{2,max}$ . In previous work, sometimes the skew is defined as  $\max(|q_{12,max}|, |q_{12,min}|)$  or the maximum absolute value of skews among all clock sink pairs. The latter definition of global skew is usually for traditional zero-skew clock network. For modern aggressive VLSI designs, useful skews [16] are applied more frequently, thus, we use the concept of local pair-wise skew instead of the single global skew. In handling process variations and other delay uncertainties, target skews are usually specified as a set of permissible ranges [13] instead of a set of single values. Since there might be skew violation on both the upper-bound side and the lower-bound side of a permissible range, we consider the maximum and the minimum skew separately. It can be seen that a bound for the worst case skew can be obtained by estimating the maximum and the minimum delays under process variations.

In order to estimate the maximum and the minimum delays due to wire width variation, we reduce the routing tree into a simplified model demonstrated in Figure 1. In Figure 1(a), we wish to estimate the worst case skew between sink  $s_1$  and  $s_4$ . Since the *common upstream path*  $s_0 \rightarrow v_7$  for  $s_1$  and  $s_4$  does not contribute to the skew between them, we lump its wire resistance together with the driver resistance into a virtual driving resistor *R* at the *nearest common parent node*  $v_7$  for  $s_1$  and  $s_4$  in Figure 1(b). We



Figure 2: The worst case skew estimation is equivalent to finding wire shaping to maximize/minimize the delay.



Figure 3: The branch load function.

call the branches  $v_7 \rightarrow s_1$  and  $v_7 \rightarrow s_4$  as *critical branches*. For wires off the critical branches such as  $v_5 \rightarrow s_2$  and  $v_6 \rightarrow s_3$  in Figure 1(a), their capacitance can be lumped to their load capacitance to get  $C_2$  and  $C_3$  at node  $v_5$  and  $v_6$ , respectively. If we attempt to estimate the maximum(minimum) delay for sink  $s_1$ , the width of wire  $v_5 \rightarrow s_2$  should be the maximum(minimum) in order to maximize(minimize) the load  $C_2$  for branch  $v_7 \rightarrow s_1$ .

After the transformation in Figure 1, a bound for the worst case skew estimation between two sinks is reduced to estimating the maximum and the minimum delay of a path as in Figure 2 considering wire width variations. When the wire width w varies in the range of  $[w_{lo}, w_{hi}]$ , we need to find a wire shaping function w(x)such that the delay from the virtual driver R to the sink is maximized or minimized. The minimum delay wire shaping function for a path without branch loads is solved in [8, 3]. An iterative wire shaping algorithm is provided in [4] to minimize a weighted sum of sink delays in a routing tree. Though this algorithm can guarantee the optimal wire shaping solution and can be adopted directly in our case, the convergence rate of this algorithm has not been proved. Since we wish to minimize the delay to only one particular sink instead of a weighted sum of sink delays, we are able to derive an analytical formula of wire shaping in Section 3. The formula for the maximum delay wire shaping is introduced in Section 4.

The wire sheet resistance is denoted as r and the wire capacitance per unit length and unit width is represented as c. If there are k branch loads as indicated in Figure 2, we define the lumped downstream branch load capacitance as:

$$C_L(x) = \begin{cases} C_0 & : & 0 \le x < l_1 \\ C_0 + C_1 & : & l_1 \le x < l_2 \\ \vdots & : & : \\ C_0 + C_1 + \dots + C_k & : & l_k \le x \le l_{k+1} \end{cases}$$
(1)

This *branch load function* is depicted in Figure 3. Note that  $l_0 = 0$  and  $l_{k+1}$  is same as the path length *L*. Then the downstream wire

capacitance C(x) at position x is  $C_w(x) = c \int_0^x w(x) dx$ . The total downstream capacitance can be written as  $C(x) = C_L(x) + C_w(x)$ The Elmore delay of the path in Figure 2 can be expressed as:

$$t = RC(L) + r \int_0^L \frac{C(x)}{w(x)} dx$$
<sup>(2)</sup>

We can define the upstream resistance at position x as R(x) = R + $\int_{x}^{L} \frac{r}{w(x)} dx$  Fix the wire shaping function w(x) except an infinitesimal strip of width  $\delta$  at z and let w(z) to be a variable y. Then we may obtain the first order derivative as:

$$\frac{dt}{dy} = R(z - \frac{\delta}{2})c\delta - \frac{r\delta}{y^2}C(z + \frac{\delta}{2})$$
(3)

Same conclusion is derived in [3] for the single load case, thus we omit the derivation here. From the above equation we can obtain  $\frac{d^2t}{dy^2} = \frac{2r\delta}{y^3}C(z+\frac{\delta}{2}) > 0$ . Thus the delay function is convex with respect to y or wire width.

#### THE MINIMUM DELAY WIRE SHAP-3. **ING FOR PATH WITH BRANCH LOAD**

The minimum delay wire shaping for a path with single load (k = 0) is shown in [8, 3]. In this section, we describe the minimum delay wire shaping for a path with multiple branch loads. Letting  $q(x) = \frac{L}{2} \sqrt{\frac{rc}{RC_L(x)}}$ , we first obtain the minimum delay wire shaping function when the wire width variation bound is not considered. (The proof is similar to [3] and is omitted here.)

**Theorem 1**: The unconstrained minimum delay wire shaping function for a path with multiple branch loads is:

$$w(x) = \frac{2C_L(x)}{cL} W(q(x)) e^{2W(q(x))}$$
(4)

where  $W(x) = \sum_{n=1}^{\infty} \frac{(-n)^{n-1}}{n!} x^n$  is the Lambert's W function. For each wire segment between  $l_i \le x < l_{i+1} (0 \le i \le k)$ , the wire shaping w(x) is an exponential function. The overall wire shaping function is a piecewise exponential function which may be discontinuous at each branch point, since it depends on  $C_L(x)$  which is a piecewise constant function whose value changes at the branch points. Even though there might be discontinuity, this wire shaping is monotonously increasing with respect to x [6].

When the bound on wire width variation  $[w_{lo}, w_{hi}]$  is considered, the situation is more complicated. For wire segment between  $l_i \leq$  $x < l_{i+1} (0 \le i \le k)$ , there are six cases that may occur:

- Case 1: The shaping of entire segment follows the exponential function as in Equation(4).
- Case 2: The width is uniformly w<sub>hi</sub>.
- Case 3: The width is uniformly *w*<sub>lo</sub>.
- Case 4: The width is  $w_{lo}$  when x is smaller than a value g, and the wire shaping follows exponential function from g to  $l_{i+1}$ .
- Case 5: The width is  $w_{hi}$  when x is greater than a value h, and the wire shaping follows exponential function from  $l_i$  to h.
- Case 6: The width is  $w_{lo}$  when x is smaller than a value g,  $w_{hi}$  when x is greater than a value h, and the wire shaping follows exponential function from g to h.



Figure 4: The delay function w.r.t. wire width is a convex function. The maximum/minimum delay wire width depends on the overlap between this function and wire width range  $[w_{lo}, w_{hi}]$ .



Figure 5: The maximum delay wire shaping.

We call the position of x = g and x = h as switching points. The method to decide the switching points is very complicated as shown in [3]. Moreover, it is possible that all six cases need to be evaluated to find the exact minimum delay wire shaping. In practice, one can take the wire shaping according to Equation (4) without considering the wire width bound, and round the width to either  $w_{lo}$  or  $w_{hi}$  at where the width from Equation (4) exceeds the bound. The switching points can be found in the rounding process and the wire shaping in the exponential segment between x = g and x = h can be recomputed according to the updated upstream resistance and downstream capacitance. Note that this is slightly different from the rounding-alone mentioned in [3]. Even though this heuristic may result in suboptimal solutions, the computation becomes much easier and we observed that the error due to this approximation is negligible in practice.

#### THE MAXIMUM DELAY WIRE SHAP-4. ING

In this section, we will derive the maximum delay wire shaping for both single load path and path with multiple branch loads. For the ease of presentation, we start with the single load situation where k = 0. It has been shown in Section 2 that the delay is a convex function with respect to w(x). This is illustrated in Figure 4. Therefore, w(x) has to be either  $w_{lo}$  or  $w_{hi}$  to maximize the delay. Because of Equation (3), delay function with respect to w(x) is monotonously decreasing when x is large or the position is closer to the driver. Similarly, the delay function is monotonously increasing when the position is closer to the sink side. This fact can be translated to the effect that w(x) will be  $w_{lo}$  when x is large and  $w_{hi}$  when x is small. When the value of x increases, the delay function with respect to wire width at x changes in the direction from (a) to (b) to (c) in Figure 4.

Therefore, the maximum delay wire shaping looks like the example in Figure 5. The next problem is to find the partitioning point x = p where the wire width is switched from  $w_{lo}$  to  $w_{hi}$ . When k = 0, the Elmore delay for Figure 5 can be written as:

$$t = \alpha p^{2} + \beta p + \gamma$$

$$\alpha = -\frac{rc(w_{hi} - w_{lo})}{w_{lo}}$$

$$\beta = (w_{hi} - w_{lo})(Rc + \frac{rcL}{w_{lo}} - \frac{rC_{0}}{w_{lo}w_{hi}})$$

$$\gamma = RLcw_{lo} + RC_{0} + \frac{1}{2}rcL^{2} + \frac{rLC_{0}}{w_{lo}}$$
(5)

In order to find the *p* that maximize the delay, we first find the derivative:

$$\frac{dt}{dp} = -2\alpha p + \beta \tag{6}$$

Since  $\alpha$  is always negative,  $\frac{d^2t}{dp^2} < 0$  and the maximum delay is reached when

$$p = \frac{1}{2} \left( \frac{Rw_{lo}}{r} + L - \frac{C_0}{cw_{hi}} \right)$$
(7)

by letting  $\frac{dt}{dn} = 0$ .

In other words, p satisfies the following equation:

$$\frac{R}{r/w_{lo}} + (L-p) = p + \frac{C_0}{cw_{hi}}$$
(8)

If we transform the driver into a piece of wire with width  $w_{lo}$  whose resistance is same as the driver resistance and treat the load  $C_0$  as a wire of width  $w_{hi}$  with same value of capacitance  $C_0$ , then the above equation shows that p makes length of the fat segment the same as the length of the thin segment.

When we consider the situation with branch loads, i.e., k > 0, the properties on  $\frac{dt}{dy}$  for Equation (3) do not change and the wire shape is same as in Figure 5 and the value of p is determined by:

$$\frac{R}{r/w_{lo}} + (L-p) = p + \frac{C_L(p)}{cw_{hi}} \tag{9}$$

Even though the wire shape in Figure 5 looks strange, it may happen in reality. If the path is routed in an L-shape with a horizontal and a vertical segment, the two segments are usually on different metal layers. Therefore, it is likely that the width variation has a abrupt change at layer switching.

#### THE SKEW BOUND DEPENDS ON THE 5. **COMMON PATH**

It is easy to see that the variation along the upstream common path for a pair of sinks does not contribute to the skew between them. For the example in Figure 1, the variation along path  $s_0 \sim v_7$ has nothing to do with the skew value between sink  $s_1$  and sink  $s_4$ . However, when the wire width at the common path changes, the resistance R in Figure 1 changes and the maximum/minimum wire shaping from  $v_7$  to  $s_1$  and  $s_4$  will change as well. Therefore, the skew bound is affected by the variation of R or their common path.

**Theorem 2:** Given fixed switching points, the skew upper(lower) bound between two sinks is a convex(concave) function with respect to their common driving resistance.

Proof: Let us consider the case where skew upper bound between two sinks  $s_1$  and  $s_4$  need to be computed. Let the distance from  $s_1$  and  $s_4$  to their nearest parent node  $v_7$  be  $L_1$  and  $L_4$ , respectively. The load capacitance at  $s_1$  and  $s_4$  are  $C_1$  and  $C_4$ , respectively. The branch loads are temporarily ignored, as the conclusion



Figure 6: Case 6 for the minimum delay wire shaping, the wire width is  $w_{lo}$  in [0,g] and  $w_{hi}$  in [h,L]. Between g and h, the wire shape follows exponential function.

for path with branch loads is the same and can be extended from the single load case easily. The skew upper bound between them can be expressed as  $q_{14,max} = t_{1,max} - t_{4,min}$ . From Section 3 and 4, we know that both  $t_{1,max}$  and  $t_{4,min}$  depend on the value of R. We will analyze  $\frac{dq_{14,max}}{dR}$  by evaluating  $\frac{dt_{1,max}}{dR}$  and  $\frac{dt_{4,min}}{dR}$ . By plugging Equation (7) and the expression of  $\beta$  into  $\beta p$ , we

have  $\beta p = -\alpha p^2$ . Then the maximum delay Equation (5) can be transformed to:

$$t_{1,max}(R) = -\alpha p^2(R) + \gamma(R)$$

Then we can obtain the derivative:

$$\frac{dt_{1,max}}{dR} = -2\alpha p(R) \frac{dp}{dR} + \frac{d\gamma}{dR}$$
(10)
$$= c(w_{hi} - w_{lo})p(R) + L_1 cw_{lo} + C_1$$

$$= \frac{cw_{lo}(w_{hi} - w_{lo})}{2r}R$$
(11)
$$+ \frac{1}{2} \left( (w_{lo} + w_{hi})(L_1 c + \frac{C_1}{w_{hi}}) \right)$$

The derivative of  $t_{4,min}$  depends on the six different cases described in Section 3. We will show the derivations for the most basic case 1 and the most complex case 6. Derivations for other cases are similar, and all these cases lead to the same conclusion.

For case 1, the equation for the minimum delay (for single load case) is given as [8]

$$t_{4,min} = \frac{1}{4} cr L_4^2 (1 + 2W(\hat{x})) / W(\hat{x})^2$$
(12)

where  $W(\hat{x})$  is the Lambert's W function and  $\hat{x} = \frac{L_4}{2} \sqrt{\frac{cr}{C_4 R}}$ . Since the Lambert's W function satisfies  $W(\hat{x})e^{W(\hat{x})} = \hat{x}$ , we can obtain its derivative as

$$W'(\hat{x}) = \frac{W(\hat{x})}{\hat{x}(1+W(\hat{x}))}$$
(13)

Then we have the derivative of  $t_{4,min}$  with respect to R:

$$\frac{dt_{4,min}}{dR} = \frac{rcL_4^2}{4W^2(\hat{x})R}$$
(14)

Combining Equation (10) and (14), we can get the second order derivative of  $q_{14,max}$ :

$$\frac{d^2 q_{14,max}}{dR^2} = \frac{c w_{lo}(w_{hi} - w_{lo})}{2r} + \frac{l_4^2 r^2 c}{4r R^2 W(\hat{x})(1 + W(\hat{x}))}$$
(15)

Evidently,  $\frac{d^2q_{14,max}}{dR^2} > 0$  and  $q_{14,max}(R)$  is a convex function. Now we consider case 6 which is more complex and general.

The wire shaping for case 6 is depicted in Figure 6. We can divide

this wire into three segments: (i) the thin segment from x = 0 to x = g, (ii) the exponential segment between x = g and x = h and (iii) the fat segment from x = h to  $x = L_4$ . We use  $C_{thin} = cw_{lo}g$ ,  $C_{exp} = \int_g^h cw(x)dx$  and  $C_{fat} = cw_{hi}(L_4 - h)$  to represent the wire capacitance for each segment. Similarly, the wire resistance for each segment can be defined as  $R_{thin} = \frac{rg}{w_{lo}}$ ,  $R_{exp} = \int_g^h \frac{r}{w(x)} dx$  and  $p = \frac{r(L_4 - h)}{r(L_4 - h)}$ 

 $R_{fat} = \frac{r(L_4 - h)}{w_{hi}}.$ We can find the wire delay for the exponential segment itself as:

$$\tilde{t} = \int_{g}^{h} \frac{r}{w(x)} \left( \int_{g}^{x} cw(z) dz \right) dx$$
(16)

For the exponential segment, we can treat the fat segment as part of its driving resistance and the thin segment as part of its load capacitance. Thus we can express the delay from x = h to x = g as:

$$t_{exp} = (R + R_{fat})(C_{exp} + C_{thin} + C_4) + \tilde{t} + R_{exp}(C_{thin} + C_4)$$

The value of  $t_{exp}$  can be obtained through Equation (12) except that the *R* is replaced by  $\tilde{R} = R + \frac{r(L_4-h)}{w_{hi}}$  and the  $C_4$  is replaced by  $\tilde{C}_4 = C_4 + gcw_{lo}$ .

The total path delay from  $x = L_4$  to x = 0 can be written as:

tл

$$\begin{array}{lll} \min & = & R(C_{fat} + C_{exp} + C_{thin} + C_4) \\ & & + R_{fat}(\frac{1}{2}C_{fat} + C_{exp} + C_{thin} + C_4) \\ & & + \tilde{\iota} + R_{exp}(C_{thin} + C_4) \\ & & + R_{thin}(\frac{1}{2}C_{thin} + C_4) \\ & = & t_{exp} + RC_{fat} + \frac{1}{2}R_{fat}C_{fat} + R_{thin}(\frac{1}{2}C_{thin} + C_4) \\ & = & \frac{1}{4}cr(h - g)^2(1 + 2W(\tilde{x}))/W(\tilde{x})^2 \\ & & + RC_{fat} + \frac{1}{2}R_{fat}C_{fat} + R_{thin}(\frac{1}{2}C_{thin} + C_4) \end{array}$$

where  $\tilde{x} = \frac{h-g}{2} \sqrt{\frac{rc}{RC_4}}$ . Comparing the above equation and Equation (12), the differences are (i) there are additional terms with at most linear dependence on *R* and (ii) *R* is replaced by  $\tilde{R}$  through a linear transformation. Since both differences have only linear dependence on *R*, they do not change the property of  $\frac{d^2t_{4,min}}{dR^2} > 0$  and  $q_{14,max}(R)$  is still a convex function for case 6. Other cases can be proved in the same way.

For a path with multiple branch loads, the difference is that the constant load  $C_4$  is replaced by the branch load function  $C_L(x)$ . Since the branch load function is piecewise constant, it does not change the property of  $\frac{d^2t_{4,min}}{dR^2} > 0$  either. Since  $q_{14,min} = t_{1,min} - t_{4,max}$  is symmetric to  $q_{14,max} = t_{1,max} - t_{4,min}$ , we can conclude that  $q_{14,min}$  is a concave function with respect to *R*. **Q.E.D.** 

According to Theorem 2, we need to compute the wire shaping for  $q_{14,max}$  twice, one with the minimum *R* by setting the wire width along the common path  $s_0 \rightarrow v_7$  to  $w_{hi}$ , the other with the maximum *R* by letting the wire width along the common path be  $w_{lo}$ . The maximum of the two skew results is finally selected as the worst case bound.

## 6. EXPERIMENTAL RESULTS

In order to validate the bound we derived, we implemented our formulas, Monte Carlo and the interval analysis [10] for comparisons. Even though Monte Carlo method is not efficient, it can generate a reliable estimation on the worst case performance if the number of iterations is sufficiently large. Therefore, the result of Monte Carlo serves as an ideal baseline for comparisons. The reason to compare with interval analysis method is because its objective is very close to ours: to establish a bound efficiently.



Figure 7: Histogram of difference compared with Monte Carlo simulation for the maximum skew.



Figure 8: Histogram of difference compared with Monte Carlo simulation for the minimum skew.

The test cases are the r1 - r5 which are applied in the bounded skew clock routing (BST) work [5]. We downloaded the bounded skew clock routing code from the GSRC bookshelf (*htt p* : //vlsicad.ucsd.edu/GSRC/bookshelf/Slots/BST/) and generated clock routing trees for r1 - r5 by running the BST code. The global skew bound is set to be 100*ps*. We assume  $\pm 30\%$ variations on wire width. We implemented our formulas, Monte Carlo and the interval analysis method in C language. The experiments are performed on a PC with Pentium III, 655 MHz processor and 512 MB memory.

For each clock tree, we calculated the skew bound of one of the sinks with respect to every other sink in the clock tree. Thus for a clock tree with *n* sinks, we will have n - 1 pairs. We evaluated the skew bound due to wire width variations for the 6725 pair of sinks in the five test cases of r1 - r5 by all of the three methods. In order to obtain a meaningful estimation, we segment long wires into small pieces of about  $50\mu m$  for the Monte Carlo and interval anal-

ysis. Therefore, the wire width for each piece is an individual variable. For each sink pair, we run Monte Carlo simulation for 50,000 trials when the width for each wire piece is selected randomly in the range of  $[w_{lo}, w_{hi}]$ . When estimating the minimum delay wire shaping we applied the heuristic described in Section 3 to decide the switching points and the optimal wire shaping for them. This heuristic brings great implementation convenience with negligible quality penalty.

We take the result from Monte Carlo simulation as baseline. The result from our bound is evaluated by taking the difference between them. In other words, we evaluate the maximum/minimum skew from our bound minus the maximum/minimum skew from Monte Carlo simulation for each pair of sinks. The bound result from the interval analysis is evaluated in the same way. Figure 7 and 8 show the histograms of the difference for the maximum skew and the minimum skew respectively. The horizontal axis represents the skew difference in ns and the vertical axis represents the number of sink pairs with a particular value of skew difference. In Figure 7. the difference from our bound is always greater than zero. Meanwhile, the difference from our bound for the minimum skew in Figure 8 is always non-positive. This fact tells that our method provides a bound for the worst case skew in practice, even though we applied heuristic in the implementation. Some sink pairs have similar path lengths to their nearest common parent node and the driver, thus they have similar skew variation behaviors. This fact results in serval peaks in the histograms. Figure 7 and 8 show that the peaks from our method are closer to zero difference compared to the peaks from the interval analysis. Thus, our method provides a tighter bound than the interval analysis.

Table 1: Comparison on computation time in seconds.

| Testcase | #sinks | #pairs | Monte Carlo | Interval | Ours  |
|----------|--------|--------|-------------|----------|-------|
| r1       | 267    | 266    | 8277        | 3        | 0.46  |
| r2       | 598    | 597    | 35684       | 7        | 4.72  |
| r3       | 862    | 861    | 70288       | 14       | 9.65  |
| r4       | 1903   | 1902   | 180270      | 90       | 38.31 |
| r5       | 3100   | 3099   | 408750      | 277      | 77.5  |

The computation time for each method is shown in Table 1. In Table 1, column 2 and column 3 show the number of sinks and the number of sink pairs whose skew are evaluated. We can see that the runtime from the Monte Carlo simulation is impractical. Compared with interval analysis, the runtime of our method is always shorter.

## 7. CONCLUSION AND FUTURE WORK

Through wire shaping analysis, an analytical bound for the unwanted skew due to wire width variation is established. Since this bound can be obtained very quickly, it can be applied to interconnect variation driven design and design planning. Experimental results show that our method is safer, faster and more accurate than the interval analysis. This result can be extended to consider the effect of buffer variations on clock skew and can be used to enhance the toolkit to harness the progressively severe process variation effect.

### 8. **REFERENCES**

 E. Acar, S. R. Nassif, Y. Liu, and L. T. Pileggi. Assessment of true worst case circuit performance under interconnect parameter variations. In *Workshop Notes, International* Workshop on Timing Issues in the Specification and Synthesis of Digital Systems, pages 45–49, 2000.

- [2] N. Chang, V. Kanevsky, O. S. Nakagawa, K. Rahmat, and S.-Y. Oh. Fast generation of statistically-based worst-case modeling of on-chip interconnect. In *Proc. of ICCD*, pages 720–725, 1997.
- [3] C.-P. Chen and D. F. Wong. Optimal wire-sizing function with fringing capacitance consideration. *Technical Report TR96-28*, Department of Computer Science, University of Texas, November, 1996.
- [4] C.-P. Chen, H. Zhou, and D. F. Wong. Optimal non-uniform wire-sizing under the Elmore delay model. In *Proc. of ICCAD*, pages 38–43, 1996.
- [5] J. Cong, A. B. Kahng, C.-K. Koh and C.-W. A. Tsao. Bounded-skew clock and Steiner routing under Elmore delay. In *Proc. of ICCAD*, pages 66–71, 1995.
- [6] J. Cong and K.-S. Leung. Optimal wiresizing under the distributed Elmore delay. *IEEE Tran. on CAD*, 14(3):321–336, June 1995.
- [7] A. D. Fabbro, B. Franzini, L. Croce, and C. Guardiani. An assigned probability technique to derive realistic worst-case timing models of digital standard cells. In *Proc. of DAC*, pages 702–706, 1995.
- [8] J. P. Fishburn and C. A. Schevon. Shaping a distributed RC line to minimize Elmore delay. *IEEE Tran. on Circuits and Systems*, 42(12):1020–1022, December 1995.
- [9] P. Zarkesh-Ha, T. Mule, and J. D. Meindl. Characterization and modeling of clock skew with process variations. In *Proc.* of CICC, pages 441–444, 1999.
- [10] C. L. Harkness and D. P. Lopresti. Interval methods for modeling uncertainty in RC timing analysis. *IEEE Tran. on CAD*, 11(11):1388–1401, November 1992.
- [11] Y. Liu, S. R. Nassif, L. T. Pileggi, and A. J. Strojwas. Impact of interconnect variations on the clock skew of a gigahertz microprocessor. In *Proc. of DAC*, pages 168–171, 2000.
- [12] S. R. Nassif. Modeling and analysis of manufacturing variations. In *Proc. of CICC*, pages 223–228, 2001.
- [13] J. L. Neves and E. G. Friedman. Optimal clock skew scheduling tolerant to process variations. In *Proc. of DAC*, pages 623–628, 1996.
- [14] M. Orshansky and K. Keutzer. A general probabilistic framework for worst case timing analysis. In *Proc. of DAC*, pages 556–561, 2002.
- [15] D. Sylvester, O. S. Nakagawa, and C. Hu. Modeling the impact of back-end process variation on circuit performance. In Proc. of the International Symposium on VLSI Technology, Systems and Applications, pages 58–61, 1999.
- [16] C.-W. A. Tsao and C.-K. Koh. UST/DME: a clock tree router for general skew constraints. In *Proc. of ICCAD*, pages 400–405, 2000.
- [17] S. Zanella, A. Nardi, A. Neviani, M. Quarantelli, S. Saxena, and C. Guardiani. Analysis of the impact of process variations on clock skew. *IEEE Tran. on Semiconductor Manufacturing*, 13(4):401–407, November 2000.