# PAPER A Fast Delay Computation for the Hybrid Structured Clock Network\*

# Yi ZOU<sup>†a)</sup>, Student Member, Yici CAI<sup>†b)</sup>, Qiang ZHOU<sup>†c)</sup>, Xianlong HONG<sup>†d)</sup>, and Sheldon X.-D. TAN<sup>††e)</sup>, Nonmembers

**SUMMARY** This paper presents a novel approach to reducing the complexity of the transient linear circuit analysis for a hybrid structured clock network. Topology reduction is first used to reduce the complexity of the circuits and a preconditioned Krylov-subspace iterative method is then used to perform the nodal analysis on the reduced circuits. By proper selection of the simulation time step and interval based on Elmore delays, the delay of the clock signal between the clock source and the sink node as well as the clock skews between the sink nodes can be computed efficiently and accurately. Our experimental results show that the proposed algorithm is two orders of magnitude faster than HSPICE without loss of accuracy and stability. The maximum error is within 0.4% of the exact delay time. *key words:* clock networks, simulation, analysis, delay, Elmore delay

## 1. Introduction

Clock network distributes the clock signal from the clock source to the local sink nodes on a chip. The design of the clock networks has huge impacts on the behavior of the synchronized circuits on a chip. In deep sub-micron VLSI technology, clock distribution has become an increasingly challenging problem for VLSI designs. The high-performance VLSI circuit designs require accurate and yet fast analysis of clock networks for efficient synthesis of clock networks.

As the process variation becomes an important factor in the design of clock distribution networks in deep sub-micron technology, the portion of the clock skew introduced by the process variations on the wire width and the clock buffer size can no longer be ignored. Hybrid structures, which consists of both tree and mesh structures, are more tolerant

<sup>†</sup>The authors are with the Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.

<sup>††</sup>The author is with the Department of Electrical Engineering, University of California, Riverside, CA 92521, USA.

\*Some preliminary results of this paper was presented at IEEE International Conference on Computer Design (ICCD), San Jose, CA, Oct. 2004 [12]. This work is supported by National Natural Science Foundation of China (NSFC) 60476014, Hi-Tech Research & Development (863) Program of China No.2004AA1Z1460. The work of Sheldon X.-D. Tan is supported by NSF CAREER Award CCF-0448534 and University of California Regent's Faculty Fellowship (04–05).

c) E-mail: zhouqiang@mail.tsinghua.edu.cn

of process variations and thus suitable for high-performance rubust VLSI designs. Another major advantage of hybrid structured topologies is that even very non-uniform load distributions have very small impacts on the local skew so that changes in clock loads or locations cause little change on clock timing. Therefore, re-sizing of the grid wires is rarely necessary [9]. As a result, mesh-based hybrid structured clock networks are becoming more popular in the topology design of high-performance clock networks. However, compared with tree structured clock networks, hybrid structured clock networks are difficult for analysis and optimization. The root of the problem stems from the more complicated topologies as more interconnects are present at cross nodes and a large number of loops exist, which cause more dense circuit matrices and thus makes traditional analysis method inefficient.

As the clock frequency climbs to GHz range, the inductance effect can no longer be ignored especially when the chip has faster on-chip rise times and long transmission wire length. To consider increasing capacitive, inductive and other high-frequency effects, complicated RLCM circuit models like partial element equivalent circuit (PEEC) model [13] are typically used by analysis tools to obtain better accuracy. The resulting clock networks due to extracted parasitics may become too large to be analyzed efficiently by current transistor-level SPICE-like simulation tools. A fast yet accurate analysis method using high order model for the hybrid structured topology is urgently needed for the efficient design, analysis and optimization of high-performance clock networks.

Mesh-based or hybrid structured clock network design [1], [11] use the Elmore delay model to calculate delay and skew. Elmore delay model is widely used for fast delay approximation during the physical synthesis processes. One shortcoming of this model is that it does not include the inductive effects and is a first order moment approximation. Higher order propagation delays can be obtained through moment matching methods such as AWE [3], PRIMA [4], Multi-node Moment Matching [5]. But they are efficient only for handling the tree and tree-like topologies. The path tracing algorithm, which is linear and is the core of the Rapid Interconnect Evaluator [6], is responsible for efficiency of methods in [3]–[5]. Those methods, however, become less efficient as circuit matrices with coupling loops require more CPU time to solve. Therefore, a fast yet accurate algorithm for the timing analysis of the hybrid struc-

Manuscript received October 26, 2004.

Manuscript revised March 2, 2005.

Final manuscript received April 8, 2005.

a) E-mail: zou\_yi00@mails.tsinghua.edu.cn

b) E-mail: caiyc@mail.tsinghua.edu.cn

d) E-mail: hxl-dcs@mail.tsinghua.edu.cn

e) E-mail: stan@ee.ucr.edu

DOI: 10.1093/ietfec/e88-a.7.1964

tured clock networks is still urgently needed for topology design and post-design verification of clock networks.

In this paper, we propose an efficient method to analyze the mesh-tree (hybrid) structured clock networks. Clock networks modeled as RLC circuits are used in our method for high accuracy. We target at very large clock networks from real designs and use a divide-and-conquer scheme to perform the complexity reduction. The resulting algorithm can give faster yet more accurate results for hybrid structured clock network than existing methods [1], [4], [6]. We demonstrated the performance through a comparison with the widely used simulator HSPICE.

Our algorithm is based on the observation that there are many underlying tree structured wires driven by the overlying mesh, each tree branch structure can be reduced into a transient circuit source with an equivalent conductor. The nodes on the mesh structure just connecting to resistors and inductors can also be eliminated. The use of the lumped transmission RLC models further increases the problem complexity. But the RLC chains originate from the lumped transmission model can also be simplified via topology reduction.

After the order reduction based on topology is finished, the system equations formulated by Nodal Analysis (NA) are solved by a preconditioned Krylov-subspace iterative method [7], which shows an almost linear performance in practice. With the simulation results, the delay time between the source and sink nodes can be obtained by interpolation. Since the delay times of the sink nodes span a small period in the transient waveforms compared to clock periods, we can further decrease the steps to get more precise simulation results based on Elmore delay estimation for simulation intervals. Experiments show that our method is two orders of magnitude faster than HSPICE without introducing any significant error.

We organize this paper as follows: Sect. 2 provides our simulation model and the formulation of Nodal Analysis in the time domain. Section 3 presents the detailed analysis of the structure reduction via equivalent circuit model. Section 4 describes the delay analysis through hierarchical simulation step tuning. The overall algorithm is also given in this section. At the end of this paper, experimental results and conclusion are presented.

# 2. Description of the Simulation Model

# 2.1 Hybrid Structured Clock Network

The topology under consideration is a tree-mesh-tree hybrid structure that is very suitable in high-performance system on a chip (SoC) technology [8] and is preliminarily implemented in the clock network design in real CPUs [2], [11].

In our simplified topology for timing analysis, a clock source, which is treated as a voltage source in transient simulation, drives a global H or X tree that in turn directly drives the mesh structure. And the local trees try to get the clock signal from the mesh structure and transmit the signal to the



Fig. 1 Tree-mesh-tree hybrid clock network.



Fig. 2 Mesh structure in hybrid clock network.



Fig. 3 Tree structures in clock network.

sink nodes. Figure 1 shows one example of the hybrid clock structure that we use in our analysis and simulation [8].

### 2.2 RLC Circuits for Clock Network Modeling

We use the RLC model to describe the electromagnetic behavior of clock networks. The distributed transmission wire between any two nodes are modeled as a chain of connected resistors and inductors with capacitors between the wire and ground as shown in Fig. 2 where the mesh structure consists of both R and L, C parameters of metal wires.

Also, the wires in the tree structures are also modeled as RLC circuits just like the wires in the mesh structures which is shown in Fig. 3 below:

In order to efficiently simulate the timing characteris-

1966





Fig. 5 Linear model of L and C component.

tic of each wire, we further divide each wire into several RLC sections to get a lumped transmission line model. Figure 4 shows the lumped transmission line model [10]. We incorporate the lumped transmission line modeling in our test case generations.

The bottom-level local tree structures can exist at any inner nodes of the mesh networks and at the transmission line inner nodes. The positions of tree roots, the depths of trees and the branch numbers are randomly generated.

Similar to the models proposed before, we only consider the load capacitance in the sink node since the load resistance of the clock pin of MOSFETs is very large. The whole model for the clock networks is adequate and accurate.

### 2.3 Nodal Analysis for General RLC Circuits

Modified Nodal Analysis provides a good solution for general circuits. However, it may lead to circuit matrices that are non-symmetric positive definite, which may cause the iterative method to converge slowly. As a result, we use the nodal analysis to avoid the problem. The voltage source at the clock sources can also be modeled as current sources to avoid the introduction of extra current variables.

We first transform the differential equations into algebraic equations by using trapezoidal difference method: Suppose h be the analysis time step. For capacitors, this finite difference equation will have

$$I_{c,k+1} = \frac{2C}{h} V_{c,k+1} - \left(\frac{2C}{h} V_{c,k} + I_{c,k}\right)$$
(1)

where  $V_{c,k}$ ,  $I_{c,k}$ ,  $V_{c,k+1}$ ,  $I_{c,k+1}$  denote the branch voltage and branch current of the capacitor on step k and step k + 1 respectively. C is the value of the capacitor. Similarly the current through an inductor L at step k + 1 is:

$$I_{L,k+1} = \frac{h}{2L} V_{L,k+1} - \left(\frac{h}{2L} V_{L,k} + I_{L,k}\right)$$
(2)

The companion models of L and C component are shown in Fig. 5 where  $G_c$ ,  $G_L$  stands for  $\frac{2C}{h}$ ,  $\frac{h}{2L}$  respectively;

 $IS_c$ ,  $IS_L$  stand for  $G_cV_{c,i}+I_{c,i}$  and  $G_LV_{L,i}+I_{L,i}$  in (1) and (2) respectively.

After replacing all the capacitances and inductances in the network with their companion models, we obtain a pure resistor equivalent network. Once the topology of the clock network and parameters of its components are given, resistors in the equivalent network remain the same at any analysis time point under the fixed time step, only the equivalent current sources in the network change from time to time.

The equivalent resistor network can be described by the Nodal Analysis equation below:

$$G_{n \times n} V_n^t = I S_n^t \tag{3}$$

$$g_{ii} = -\sum_{j=1, j \neq i}^{n} g_{ij}$$
 (4)

where  $G_{n \times n}$  is a  $n \times n$  symmetrical matrix with element  $g_{ij}$  $(i \neq j)$  standing for the conductance between node *i* and *j*.  $V_n^t$  is the vector of node voltage at time point *t*.  $IS_n^t$  is the current vector at time point *t* with each element standing for the equivalent current that flows from a node to the ground.

## 3. Structure Complexity Reduction

The biggest challenge in performing nodal analysis of a RLC modeled clock network is its huge circuit matrix size. Reduced order transfer function methods can significantly reduce the order of the problem but itself is also a time intensive algorithm. We want to demonstrate that through structure or topology complexity reduction, we can also reduce circuit complexity while such a reduction is simple and easy for implementation.

# 3.1 Reductions of Series or Parallel Connected Elements

Circuits that have parallel or series connected elements named type 1 and 2 in Fig. 6 can be simplified to the equivalent one via simple algebraic operations (5) (6), and type 1 and type 2 circuits can also change back and forth by (7).

$$G_E = G1 + G2 , \ I_E = I1 + I2 \tag{5}$$

$$G_E = \frac{G1G2}{G1+G2}, \ V_E = V1+V2$$
 (6)

$$V_E = \frac{I_E}{G} \tag{7}$$

# 3.2 Simplification of Tree Structures

In order to simplify the tree structures at the bottom of the whole topology, the equivalent conductance and current source of the tree must be computed first. Because the branches would have the same parent node (the number of children it has is called its degree), we must start at the bottom of the tree. We reduce leaf branches first (node with degree 0). Once a branch has been reduced, we subtract its parent node's degree by 1, and add the address information



Fig. 6 Three types of equivalent circuits.



Fig. 7 Simplify a tree branch.

of its parent node into a root chain while the coefficients information of this branch is added into a branch chain. Then we search the new root chain made to find whether new leaf nodes exist. If they do, we repeat this process until no new leaf nodes can be found. When the tree is reduced completely, we obtain the equivalent conductance of the tree. Figure 7 shows how to simplify a branch. In Eq. (8),  $G^*$  is used in the recursive simplification process to represent  $G_1$ ,  $G_2 \dots G_k$  as shown in Fig. 7.

$$G_{bottom} = \sum_{i=1}^{k} G_i + G_c,$$
  

$$G_{middle} = \frac{G_L G_{bottom}}{G_L + G_{bottom}},$$
  

$$G^* = \frac{G_{top} G_{middle}}{G_{top} + G_{middle}}$$
(8)

If we store coefficients in Eq. (9) during the reduction process, the total current of a tree can be expressed by a linear combination of currents on capacitances and inductances:

$$a = G_L d$$
,  $b = 1 - a$ ,  $c = G_{top} e$ ,

$$d = \frac{1}{G_L + G_{bottom}},$$
  

$$e = \frac{1}{G_{top} + G_{middle}}$$
(9)

Now we have the branch chains we have made during the reduction process that have recorded all the information of the branches from the bottom of the tree to the root. We use them to compute the linear combination like this: first, the equivalent current of each branch is computed, then we add them to their parent nodes. When we reach the end of the branch chain, the total current of the tree is obtained. Equations below can be proved using the equivalent circuits in Fig. 7,

$$I_{bottom}^{t} = \sum_{i=1}^{p} I_{j}^{t} - IS_{c}^{t} + I_{node}^{t}$$
(10)

$$I_{middle}^{t} = aI_{bottom}^{t} + bIS_{L}^{t}$$
(11)

$$I_{top}^{t} = cI_{middle}^{t} \tag{12}$$

Here  $I_{bottom}^{t}$ ,  $I_{middle}^{t}$ ,  $I_{top}^{t}$  stand for the equivalent currents of the bottom, middle, and top nodes in a branch;  $I_{node}^{t}$  is the current absorbed by cell block at time point *t*, *a*, *b*, *c* are defined in Eq. (9),  $IS_{L}^{t}$ ,  $IS_{c}^{t}$  are the currents of inductance and capacitance. Details of these formulas are shown in Fig. 7. Here we store  $I_{bottom}^{t}$ ,  $I_{middle}^{t}$  of each branch in order to recover the voltage of the tree node.  $I_{top}^{t}$  will be added into the current of the top node later.

After reducing all the tree nodes in the network, the size of linear equation set is reduced. By solving the new equation set, the root node's voltages are computed. Then we can compute voltages of the internal nodes in the tree from the root down to the tree bottom along the branch chain inversely. Suppose we get the voltage value of the top node in a branch, then the voltages of the middle and the bottom node can be computed using Eq. (13) and Eq. (14).

$$V_{middle}^{t} = cV_{top}^{t} - eI_{middle}^{t}$$
(13)

$$V_{bottom}^{t} = aV_{middle}^{t} + d(I_{L}^{t} - I_{bottom}^{t})$$
(14)

# 3.3 Reduction of R-L Nodes

The node just connecting to a conductor and an inductor is called RL node. All the RL nodes also can be suppressed to further reduce the size of the equation set. Current of these branches can be treated as a source flowing from G node to L node directly as shown in Fig. 8. Because the number of the RL nodes can be up to the half of the total number of mesh nodes, even if the overall topology contains no tree structures, the dimension of equations can be reduced by half by the structure reduction.

When backing solve the voltage of the RL node, we can use formula

$$V_{RLnode}^{t} = mV_{Gnode}^{t} + nV_{Lnode}^{t} - oIS_{L}^{t}$$
<sup>(15)</sup>

where m, n, o in Eq. (15) are computed and stored using



Fig. 8 Simplify a RL node.



**Fig.9** Y model circuit to  $\pi$  model circuit.

Eq. (16) during the reduction process.

$$m = oG , n = 1 - m , o = \frac{1}{G + G_L}$$
 (16)

# 3.4 Reduction of Transmission RLC Chains

The presence of lumped RLC chains in our topology for clock network is coming from the transmission line model as has been used in [10]. Each wire of trees or meshes is divided into several RLC sections modeled as a RLC chain circuit.

In this subsection, we show how a RLC chain circuit can be further reduced by an equivalent circuit consisting of only the cross nodes. This is done by repeatedly transforming a **Y** model circuit to a  $\pi^{\dagger}$  model circuit as shown in Fig. 9:

In Fig. 9,  $E_a$  and  $E_b$  are the currents flowing into node a and node b respectively, and the corresponding arrowheads show the current directions. To compute the equivalent currents and resistances shown in the right-hand side of Fig. 9, we first look at node a and we have:

$$V_{b} - V_{a} = R_{a}(E_{a} - I_{a}) + R_{b} \left( E_{a} + I_{c} + \frac{V_{a} + R_{a}(E_{a} - I_{a})}{R_{c}} + I_{b} \right)$$

We define the following equivalent resistors:

$$R_{ab} = \frac{R_a R_b + R_a R_c + R_b R_c}{R_c} \tag{17}$$

$$R_{ac} = \frac{R_a R_b + R_a R_c + R_b R_c}{R_b}$$
(18)

$$R_{bc} = \frac{R_a R_b + R_a R_c + R_b R_c}{R_a} \tag{19}$$

As a result, we have

$$E_a = \frac{V_b}{R_{ab}} - \frac{V_a}{R_{ac}} + \frac{R_a I_a - R_b I_b}{R_{ab}} + \frac{R_c I_c - R_a I_a}{R_{ac}}$$

Similar result can be obtained for node *b*:

$$E_{b} = \frac{V_{a} - V_{b}}{R_{ab}} - \frac{V_{b}}{R_{bc}} - \frac{R_{a}I_{a} - R_{b}I_{b}}{R_{ab}} - \frac{R_{c}I_{c} - R_{b}I_{b}}{R_{bc}}$$

We further define the following equivalent currents:

$$I_{ab} = \frac{R_a I_a - R_b I_b}{R_{ab}} \tag{20}$$

$$I_{ac} = \frac{R_c I_c - R_a I_a}{R_{ac}} \tag{21}$$

$$I_{bc} = \frac{R_c I_c - R_b I_b}{R_{bc}}$$
(22)

Finally we can represent  $E_a$  and  $E_b$  in terms of those equivalent currents and resistances as follows:

$$E_{a} = \frac{V_{b} - V_{a}}{R_{ab}} - \frac{V_{a}}{R_{ac}} + I_{ab} - I_{ac}$$
(23)

$$E_{b} = \frac{V_{a} - V_{b}}{R_{ab}} - \frac{V_{b}}{R_{bc}} - I_{ab} - I_{bc}$$
(24)

Equations (23) and (24) essentially give us the  $\pi$  model representation of the same circuit as shown in the right-hand side of Fig. 9.

# 3.5 Topology Reduction and Reconstruction

First, we suppress all the tree structures as in Sect. 3.2; after this we reduce all the RL nodes as in Sect. 3.3; then we repeatedly apply the **Y** model to  $\pi$  model transformation as shown in Sect. 3.4 starting from the node at one end of the chain until we reach the node at another end of the chain. After the topology reduction of the hybrid network, we proceed to further solve the linear Eq. (3) of the reduced system by preconditioned conjugate gradients methods [7].

Topology reconstruction does the inverse steps. After the voltages at the nodes of the reduced system are computed,  $E_a$  and  $E_b$  can then be computed via Eq. (23) Eq. (24). The voltages at the intermediate nodes of the chain can be obtained iteratively by the voltage at one end of the reconstructed chain plus the voltage drop on the equivalent resistor. Then the voltages at the RL nodes and node inside tree structures can be simply computed via equation and Eq. (13) Eq. (14) and Eq. (15).

## 4. Simulation Time-Step Tuning

After we finished the structure reduction, a hierarchical simulation time-step tuning scheme is used to get the delay time between sink nodes. By cubic spine interpolation of the nodal voltages at each sink node, the ramp response or the step response for each node is plotted. Delay time can be obtained by a one-dimensional root finding method such as Bisection method or Brent method [14] via the evaluation of the spine after the interpolation.

In order to better tune the simulation step, the maximum delay of the measured nodes, which can be taken as the

<sup>&</sup>lt;sup>†</sup>Also called Y to  $\Delta$  transformation.

end time for simulation, must be approximately estimated. This can be archived by using Elmore delay model as Elmore model gives the upper bound on the worst-case propagation delay for interconnects [11]. After this, we can split the maximum Elmore delay into several hundreds or thousands simulation steps depending on the accuracy needed.

The overall algorithm can be described as in Fig. 10.

Here *refresh* means the computation of the equivalent circuit currents based on the aforementioned structure reduction equations.

### 5. Experimental Results

Our algorithm is implemented in C language. The number of nodes in our test cases ranges from 1 thousand to 0.2 millions. Statistics information is shown in Table 1. The using of transmission line RLC chains are incorporated in the test circuit generation. In our implementation, they are put in some wires of the mesh structures. So the number of RL-

| Solver {                                                 |
|----------------------------------------------------------|
| compute the maximum Elmore delay                         |
| determine the simulation step from the maximum delay     |
| build node chain from the netlist file ;                 |
| build capacitance and inductance chain ;                 |
| build branch and RL node chain ;                         |
| simplify tree structure ;                                |
| simplify RL nodes ;                                      |
| simplify transmission RLC chains                         |
| build $G_{n \times n}$ matrix ;                          |
| while timepoint < simulatetime                           |
| {                                                        |
| refresh current on local trees;                          |
| refresh current on RL branch ;                           |
| refresh current on transmission RLC chains               |
| refresh $IS_n^t$ vector;                                 |
| solve (3) using PCG;                                     |
| reconstruct volages of transmission RLC chains           |
| reconstruct voltages of RL nodes ;                       |
| reconstruct voltages of tree nodes ;                     |
| if all the voltages of sink nodes overpass the threshold |
| print the delay time via interpolation                   |
| refresh current of inductances ;                         |
| refresh current of capacitances ;                        |
| timepoint+=timestep;                                     |
| }                                                        |
| }                                                        |

Fig. 10 The main hyper-structured clock network analysis flow.

nodes and transmission chains are proportional to the mesh size which could be seen from the table. All the experimental results are obtained on Sun Ultra Sparc workstation with 4 250 MHz CPUs, 8 GB memory. Simulation steps are controlled for the desired accuracy. We analyze the circuit until the delay times of all the sink nodes are obtained.

From the results, we can clearly see that, three types of topology reduction all contribute to the speedup. When no local trees are present, the speedup is not as large as cases with many local trees. However, the reduction of RL-nodes and RLC chain still lead to a decent speedup. Note that the performance of the preconditioned Krylov subspace iterative method is illustrated in [7] and the speedup of this method is roughly about 10-20X over Spice when no reduction is performed.

Note that in order to make a fair comparison, the CPU times listed in Table 1 include the initial build-up time such as the topology reduction and matrix building.

The maximum relative error of the 50% delay measured is within 0.4% of the given by HSPICE. The accuracy can be further improved by decreasing the simulation steps. It is clear that the proposed method is fast while also accurate for hybrid structured clock networks.

#### 6. Conclusion and Future Work

This paper has proposed an efficient technique for analysis of hybrid structured clock networks modeled as general RLC networks with trees and meshes in time-domain. At each time step, after integration approximation by Norton-form companion models, we first perform the topology/structure reduction for many RLC trees, RL nodes and transmission RLC chain circuits to reduce complexities of the original network. Then precondition conjugate gradient (PCG) based iterative method is used to solve the reduced resistor-only networks using nodal analysis formulation. Experimental results have demonstrated that the node reduction technique contributes at least one order of magnitude speedup over methods without node reduction. The resulting new algorithm is two orders of magnitude faster than HSPICE at almost no accuracy loss.

The topology reduction is first step toward efficient extraction of the timing characteristic of clock networks. We

| Total Nodes | Mesh Size | Number of<br>Tree Branches | HSPICE time (sec.) | Solver time (sec.) | Speedup over<br>HSPICE | Step time | Max relative error |
|-------------|-----------|----------------------------|--------------------|--------------------|------------------------|-----------|--------------------|
| 1451        | 10*10     | 54                         | 23.49              | 0.36               | 65.25                  | 0.1ns     | 0.37%              |
| 2531        | 16*19     | 192                        | 50.10              | 0.16               | 313.125                | 0.1ns     | 0.22%              |
| 6561        | 20*20     | 400                        | 208.85             | 1.89               | 110.50                 | 0.1ns     | 0.27%              |
| 7739        | 14*13     | 96                         | 259.74             | 3.73               | 69.63                  | 0.1ns     | 0.37%              |
| 19901       | 100*100   | 0                          | 2039               | 49.75              | 40.98                  | 1ns       | 0.29%              |
| 29021       | 20*20     | 306                        | 2159               | 35.91              | 60.12                  | 1ns       | 0.35%              |
| 189091      | 30*30     | 720                        | NA                 | 177.10             | NA                     | 5ns       | NA                 |

Table 1 Experiment results of the speed and accuracy.

are working on topology reduction via hierarchically partitioning the mesh network to further increase the speedup and accuracy of the whole algorithm.

#### References

- H. Su and S.S. Sapatnekar, "Hybrid structured clock network construction," Proc. IEEE Int. Conf. Computer-Aided Design (ICCAD), pp.333–336, 2001.
- [2] P.J. Restle, T.G. McNamara, D.A. Webber, P.J. Camporese, K.F. Eng, K.A. Jenkins, D.H. Allen, M.J. Rohn, M.P. Quaranta, D.W. Boerstler, C.J. Albert, C.A. Carter, R.N. Bailey, J.G. Petrovick, B.L. Krauter, and B.D. McCredie, "A clock distribution network for microprocessors," IEEE J. Solid-State Circuits, vol.36, no.5, pp.792– 799, May 2001.
- [3] L.T. Pillage and R.A. Rohrer, "Asymptotic waveform evaluation for timing analysis," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol.9, no.4, pp.352–366, April 1990.
- [4] A. Odabasioglu, M. Celik, and L.T. Pilleggi, "PRIMA: Passive reduced-order interconnect macromodeling algorithm," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol.17, no.8, pp.645– 654, Aug. 1998.
- [5] Y.I. Ismail, "Improved model-order reduction by using spatial information in moments," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol.11, no.5, pp.900–908, Oct. 2003.
- [6] C.L. Ratzlaff and L.T. Pillage, "RICE: Rapid interconnect circuit evaluation using AWE," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol.13, no.6, pp.763–776, June 1994.
- [7] T. Chen and C.C. Chen, "Efficient large-scale power grid analysis based on preconditioned Krylov-subspace iterative method," Proc. IEEE/ACM Design Automation Conf., pp.559–562, June 2001.
- [8] C-C.P. Chen and E. Cheng, "Future SOC design challenges and solutions," Proc. International Symposium on Quality Electronic Design, pp.534–537, March 2002.
- [9] P.J. Restle and A. Deutsch, "Design the best clock network," Proc. IEEE VLSI Circuit Symposium, pp.2–5, 1998.
- [10] D. Kucar and A. Vannelli, "Interconnection modeling using distributed RLC models," Proc. IEEE International Workshop on System-on-Chip for Real-Time Applications, pp.32–35, July 2003.
- [11] M.P. Desai, R. Cvijetic, and J. Jensen, "Sizing of clock distribution networks for high performance CPU chips," Proc. Design Automation Conference, pp.389–394, June 1996.
- [12] Y. Zou, Y. Cai, Q. Zhou, X. Hong, and S.X.-D. Tan, "A fast delay analysis algorithm for the hybrid structured clock network," Proc. Int. Conf. Computer Design (ICCD), pp.344–349, Oct. 2004.
- [13] A.E. Ruehli, "Equivalent circuits models for three dimensional multiconductor systems," IEEE Trans. Microw. Theory Tech., vol.22, no.3, pp.216–220, 1974.
- [14] R.P. Brent, "An algorithm with guaranteed convergence for finding a zero of a function," Comput. J., no.14, pp.422–425, 1971.





Yici Cai received B.S. degree in Electronic Engineering from Tsinghua University in 1983 and received M.S. degree in Computer Science and Technology from Tsinghua University in 1986. She has been an associate professor in the Department of Computer Science & Technology, Tsinghua University, Beijing, China. Her research interests include physical design automation for VLSI integrated circuits algorithms and theory.

Qiang Zhou received B.S. degree in computer science from University of Science and Technology of China in 1983, M.S. degree in computer science from Tsinghua University in 1986, and Ph.D. degree from Chinese University of Mining and Technology, Beijing in 2002. He has been an associate professor in the Department of Computer Science & Technology, Tsinghua University. Beijing, China. His research interests include VLSI layout algorithms and systems.



tute of Electronics.



University, Beijing, China in 1964. He was a Visiting Scholar at the University of California, Berkeley, and worked in the research group of Prof. E.S. Kuh form April 1991 to October 1992 and from June to September 1993. Since 1988, he has been a professor in the Department of Computer Science & Technology, Tsinghua University. His research interests include VLSI layout algorithms and DA systems. He is IEEE Follow and the Senior Member of Chinese Insti-

graduated from Tsinghua

Xianlong Hong

**Sheldon X.-D. Tan** received his B.S. and M.S. degrees in electrical engineering from Fudan University, Shanghai, China in 1992 and 1995, respectively and the Ph.D. degree in electrical and computer engineering from the University of Iowa, Iowa City, in 1999. He was a faculty member in the electrical engineering Department of Fudan University from 1995 to 1996. Now he is with the Department of Electrical Engineering, University California at Riverside as an Assistant Professor. His research in-

terests include several aspects of design automation for VLSI integrated circuits—signal integrity issues in VLSI physical design, high performance power/ground distribution network design and optimization, symbolic analysis and modeling of mixed signal/RF/analog circuits.



Yi Zou received B.S. degree from Computer Science and Technology in Tsinghua University, Beijing, P.R. China in 2004. Currently, he is a master candidate at EDA lab of computer science and technology of Tsinghua University. His research interests include clock synthesis, circuit simulation and verification and low power clock network.