# Using Rotary Clock For Low-Power Clock Distribution

# ABSTRACT

Rotary clock is a recently proposed clock distribution technique based on wave propagation in transmission lines. In this paper, we investigate the problem of power minimization of rotary clock designs. Specifically, we have developed a software tool based on the method of partial element equivalent circuit (PEEC) that is capable of extracting the SPICE netlist from the layout specification of a rotary clock design. Using our tool, we have performed extensive analysis that links various design parameters of a rotary clock design to its oscillation frequency and power dissipation. Based on the results of our analysis, we then propose a power minimization algorithm. Our algorithm derives a rotary clock structure that dissipates minimal power while satisfying the clock dimension requirement and oscillating at the target frequency with the given clock load. Experimental results have demonstrated that, for target operating frequencies ranging from 0.5 to 5 Gigahertz, rotary clock designs can achieve power savings of up to 69% in comparison with conventional clock tree implementations.

# 1. INTRODUCTION

In most synchronous CMOS ICs, global clock signals are transmitted from the clock source to individual registers using a network of interconnects and buffers. Electric charge is drawn from the power grid onto the clock network during the charging phase and drained to ground during the discharging phase. Consequently, the power dissipation of clock networks increases significantly with the operating frequencies and chip dimensions of modern VLSI systems. The design of low-power global clock distribution networks has become one of the major challenges for the IC industry.

Clock distribution networks can be designed as LC oscillators since clock signals simply oscillate at a given frequency. These oscillators store the energy in the inductors during the discharging phase and reuse it during the charging phase, potentially achieving low-power clock distribution. In particular, the rotary clock technique proposed in [23] is one such approach. However, no design methodology has been presented to minimize the power dissipation of rotary clock designs. As a result, the true power saving capability of the rotary clock scheme remains unknown. In this paper, we investigate the low-power synthesis of rotary clock structures. Two major contributions are presented. First, we have developed a layout extraction tool for rotary clock designs. Our tool is based on the PEEC method [18] and is therefore highly accurate. Using our extraction tool, we are able to perform an extensive analysis that links various design parameters, e.g., interconnect width and separation, to the oscillation frequency and power dissipation of a rotary clock design. Second, based on our analysis, we propose a novel low-power rotary clock design methodology. Specifically, given a design specification including the target clock frequency, the clock structure dimension, and the total clock load, our scheme searches for a rotary design with the lowest power consumption while satisfying the design specification. To our knowledge, our work is the first attempt at rotary clock power minimization.

We have applied our power minimization algorithm to rotary clock designs with a range of target frequencies, various dimensions, and different clock loads. Experimental results demonstrate that, in comparison to the power-unaware approaches, our scheme achieves an average power reduction of 24.3%. Our data also reveal that the rotary clock scheme can reduce the power consumption by 69% in comparison to clock tree designs.

The remainder of this paper is organized as follows. Section 2 briefly reviews previous clock distribution schemes. It also provides the background material on the rotary clock design. In Section 3, our layout extraction tool is described. In Section 4, detailed analysis is given that relates various design parameters to the frequency and power of a rotary clock design. Our power minimization methodology is proposed in Section 5. Section 6 gives the experimental results. Section 7 summarizes our paper.

# 2. BACKGROUND

#### 2.1 Previous Clocking Schemes

Extensive efforts have been devoted to global clock distribution [3, 4]. Due to space limitations, this section gives only a very brief review of the related research.

Early clock distribution networks were implemented using clock trees. For modern VLSI designs, however, the combination of high target frequencies and significant physical variations makes it very difficult, if not impossible, for even balanced clock trees, e.g., H-trees, to satisfy the strict clock skew requirements. Consequently, non-tree topologies, e.g., meshes [16] and spines [9], are widely adopted. In addition, active devices such as PLLs are often added to further reduce clock skews [7, 21]. As a result, clock networks become highly complex, leading to large power consumption. Many techniques have been proposed to reduce the clock power, including clock signal activity reduction, clock swing reduction, and clock load reduction [8, 13].

Conventional clocking schemes based on capacitance charging are ill-suited to low-power clock distribution in future SoCs because of the high dynamic power consumed by the large clock load. Clocking techniques based on LC resonant oscillation have therefore been proposed. Such oscillators have energy recovery capabilities and can thus provide stable clock signals while dissipating a small amount of power. In particular, a clock network based on standing-wave oscillators is presented in [12]. In [24], a single-phase resonant clock structure for adiabatic circuits is described. However, both schemes produce sinusoidal clock waveforms and require sine-to-square wave converters or special logic, which leads to power overhead.

#### 2.2 Rotary Clock

Rotary clock is a resonant clock distribution scheme that produces square-wave clock signals. The circuit structure of rotary clock and its operating principle can be explained using Figure 1. Specifically, a rotary ring is a double loop made of interconnects, as shown in Figure 1(a). A voltage wave can propagate in the transmission line formed by the parallel interconnects of the inner and outer loops. Since the transmission line inverts at point A, the voltage wave changes its polarity on consecutive rotations. As a result, every location along the ring provides a square-wave clock signal. Registers can be connected to the rotary ring by attaching their clock inputs to the interconnects. Inverter pairs connected back-to-back are attached to the clock ring to compensate for the energy loss due to the resistance of the interconnects so that the clock wave can be sustained indefinitely. Multiple rotary rings can be connected to form a rotary array as shown in Figure 1(b). The adjacent rings are joined together by two nodes, one in the inner loop and one in the outer loop. The clock phases of different rings are locked at the merge points so that the rings synchronize themselves at a single frequency.



Figure 1: Rotary clock structure (a) ring (b) array.

The design and optimization of rotary clock arrays are much more challenging than those of conventional clock networks. A rotary array has many design parameters that need to be determined, including the number of rings, the dimension of each ring, the interconnect width, the separation between the inner and outer interconnect loops, the number of inverter pairs, and the location and size of each inverter. Moreover, the selection of these parameters must satisfy two constraints. First, unlike conventional clock networks, a rotary clock structure does not contain a clock source. As a result, the design parameters must be selected so that the resulting rotary array oscillates at the target frequency with the given clock load. Second, since the registers are attached to the rotary clock interconnects, the rotary array must cover the entire chip to facilitate the register placement. In addition to satisfying the above constraints, the rotary clock array should dissipate minimal power.

In this paper, we introduce some restrictions on the selection of rotary array parameters to simplify the design procedure. First, the number of inverter pairs per rotary ring is fixed. Moreover, all the inverters are identical and distributed evenly along their corresponding rotary rings. As a result, we can use the inverter width to describe the inverter design. Second, we assume that both the rotary array and the rotary rings are square. Therefore, we can use the width of a ring or an array to represent its dimension. Third, we make all the rings in an array identical and assume that the clock load can be evenly distributed to all the rings. Consequently, once one rotary ring is designed, we can connect multiple rings in a grid fashion to form the array as shown in Figure 1(b). Finally, we assume that all interconnects have the same width and that the separation between the inner and outer interconnect loops is uniform. With these simplifications, the problem of low-power rotary clock (LPRC) design can be described as follows:

**Problem LPRC:** Given a target frequency *F*, the total capacitive clock load *C*, and clock array width *D*, compute the interconnect width *w*, the interconnect separation *s*, the width of the inverters *W*, and the width of a single rotary ring *d* so that $n = D/d$ is an integer, and the rotary clock array made of $\lfloor (n^2 + 1)/2 \rfloor$ rings oscillates at *F* with *C* evenly distributed to all the individual rings while dissipating minimal power.
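The array bookkeeping in Problem LPRC can be illustrated with a minimal Python sketch. It only evaluates the ring count $\lfloor (n^2 + 1)/2 \rfloor$ and the per-ring load for a candidate ring width; the function names and the numeric values are illustrative assumptions and are not taken from our tool.

```python
def ring_count(n: int) -> int:
    """Number of rings in an n-by-n rotary array: floor((n^2 + 1) / 2)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return (n * n + 1) // 2


def per_ring_load(total_load: float, array_width: float, ring_width: float):
    """Return (n, ring count, load per ring) for array width D and ring width d.

    Raises ValueError when D/d is not a positive integer, as Problem LPRC requires.
    """
    ratio = array_width / ring_width
    n = round(ratio)
    if n < 1 or abs(ratio - n) > 1e-9:
        raise ValueError("D/d must be a positive integer")
    m = ring_count(n)
    return n, m, total_load / m


# Hypothetical example: D = 12 mm, d = 4 mm, C = 600 pF.
n, m, c = per_ring_load(total_load=600.0, array_width=12.0, ring_width=4.0)
print(n, m, c)  # 3 rings per side, 5 rings in total, 120.0 pF per ring
```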

# 3. PEEC BASED EXTRACTION TOOL

To derive an efficient algorithm to solve Problem LPRC, it is critical to understand the relations between various design parameters and the frequency or power dissipation of a rotary clock structure. To this end, we have designed a layout extraction tool based on the PEEC method. Our tool is capable of converting the geometric specification of a rotary ring into a SPICE netlist. Consequently, we can accurately derive the oscillation frequency and power dissipation of the rotary ring using SPICE simulations. In this section, we first describe the PEEC method. We then present our PEEC-based extraction tool.

#### **3.1 PEEC Method**

The PEEC method was proposed to derive the equivalent circuit for a design that consists of an arbitrary number of conductors of any shape [17, 18, 19]. It is considered a highly accurate scheme and has been widely used in applications such as RF system design [10, 14] and spiral inductance extraction [11, 22]. Specifically, the PEEC method is based on Maxwell's equations, from which the electric field $\vec{E}$ at any 3-dimensional point $\vec{r}$ and time *t* can be written as

$$\vec{E}(\vec{r},t) = \frac{\vec{J}(\vec{r},t)}{\sigma} + \frac{\partial \vec{A}(\vec{r},t)}{\partial t} + \nabla \Phi(\vec{r},t) , \qquad (1)$$

where $\vec{J}$, $\sigma$, $\vec{A}$, and $\Phi$ are the current density, conductivity of the conductor, magnetic vector potential, and electric scalar potential, respectively. The magnetic vector potential $\vec{A}$ and electric scalar potential $\Phi$ can be calculated by integrating the current density $\vec{J}$ and charge density $q$ within all conductors, respectively:

$$\vec{A}(\vec{r},t) = \sum_{k=1}^{K} \frac{\mu}{4\pi} \int_{v_k} \frac{\vec{J}(\vec{r}',t')}{|\vec{r}-\vec{r}'|} dv' , \qquad (2)$$

$$\Phi(\vec{r},t) = \sum_{k=1}^{K} \frac{1}{4\pi\varepsilon} \int_{v_k} \frac{q(\vec{r}',t')}{|\vec{r}-\vec{r}'|} dv' , \qquad (3)$$

where *K* is the number of conductors, $\vec{r}'$ is the location of $dv'$, and $v_k$ represents the volume of conductor *k*. The parameters $\mu$ and $\varepsilon$ are the permeability and permittivity of the medium at $\vec{r}$. The variable $t'$ is the retarded time and can be calculated as:

$$t' = t - \frac{|\vec{r} - \vec{r}'|}{c(\vec{r}, \vec{r}')} , \qquad (4)$$

where  $c(\vec{r}, \vec{r}')$  is the average speed of the electromagnetic wave between  $\vec{r}$  and  $\vec{r}'$ .

Although Equation (1) accurately captures the electric field, its application to real circuits is limited by its high computational complexity. The PEEC method simplifies Equation (1) by introducing several approximations. Specifically, since the dimension of the target circuit is often small, the retarded time $t'$ at any location $dv'$ is set equal to $t$. Furthermore, each conductor is divided into several elements so that the current density is constant within each element. As a result, the electric field can be derived by combining Equations (1)–(3) as follows:

$$\vec{E}(\vec{r},t) = \frac{\vec{J}(\vec{r},t)}{\sigma} + \sum_{k=1}^{K} \sum_{n=1}^{N_k} \frac{\mu}{4\pi} \left[ \int_{v_{nk}} \frac{1}{|\vec{r} - \vec{r}'|} dv' \right] \frac{\partial \vec{J}_{nk}(t)}{\partial t} + \sum_{k=1}^{K} \nabla \left[ \frac{1}{4\pi\varepsilon} \int_{v_k} \frac{q(\vec{r}',t)}{|\vec{r} - \vec{r}'|} dv' \right] , \qquad (5)$$

where $N_k$ is the number of elements in conductor *k*, and $v_{nk}$ and $\vec{J}_{nk}$ are the volume and current density of the *n*th element in conductor *k*.

The first and third terms on the right-hand side of Equation (5) are related to the resistance and capacitance of the conductors, which can be extracted from the geometric description of the design. The second term is related to the inductance effect, which is modeled as follows. Without loss of generality, all elements $v_{nk}$ are assumed to be cylinders with current flowing parallel to their own axes. Furthermore, let $a_{nk}$ be the cross-sectional area of the cylinder $v_{nk}$. Given a cylinder *s*, the voltage $v_s^L$ between the top and bottom surfaces contributed by the inductance effect can be derived by integrating the second term on the right-hand side of Equation (5) along the axis of *s*:

$$v_{s}^{L} = \sum_{k=1}^{K} \sum_{n=1}^{N_{k}} \left[ \frac{\mu}{4\pi} \frac{1}{a_{nk}} \int_{l_{s}^{b}}^{l_{s}^{e}} \int_{v_{nk}} \frac{\cos\theta}{|\vec{r} - \vec{r}'|} dv' dl \right] \frac{dI_{nk}(t)}{dt} , \qquad (6)$$

where $\theta$ is the angle between the axes of *s* and $v_{nk}$, $I_{nk}$ is the current through $v_{nk}$, and $l_s^b$ and $l_s^e$ are the two end points of the axis of *s*. If we denote

$$L_{s,nk} = \frac{\mu}{4\pi} \frac{1}{a_{nk}} \int_{l_s^b}^{l_s^e} \int_{v_{nk}} \frac{\cos\theta}{|\vec{r} - \vec{r}'|} dv' dl , \qquad (7)$$

Equation (6) can be rewritten as

$$v_s^L = \sum_{k=1}^K \sum_{n=1}^{N_k} L_{s,nk} \frac{dI_{nk}(t)}{dt} . \qquad (8)$$

Clearly, $L_{s,nk}$ can be considered as the partial inductance between elements *s* and $v_{nk}$. The inductance effect of all the conductors can therefore be represented by a set of partial inductors. Together with the other extracted circuit components, e.g., resistors and capacitors, these inductors can be simulated in SPICE to derive the electrical behavior of the given 3-dimensional design.
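Equation (8) is a matrix-vector product once the elements are flattened to a single index. The following Python sketch illustrates this under that assumption; the function name and the two-element partial inductance matrix are hypothetical values chosen only for illustration.

```python
def inductive_voltages(L, dI_dt):
    """Evaluate Equation (8): v[s] = sum over j of L[s][j] * dI_dt[j].

    L is a square matrix of partial inductances in henries, with self terms on
    the diagonal and mutual terms off the diagonal. dI_dt holds the current
    derivatives of the elements in amperes per second.
    """
    return [sum(l_sj * di for l_sj, di in zip(row, dI_dt)) for row in L]


# Hypothetical two-element example: a pair of parallel filaments that carry
# equal and opposite currents, as in a differential transmission line.
L = [[2.0e-9, 0.5e-9],
     [0.5e-9, 2.0e-9]]
dI_dt = [1.0e6, -1.0e6]

v = inductive_voltages(L, dI_dt)
print(v)  # each magnitude is (2.0 - 0.5) nH * 1e6 A/s, about 1.5 mV
```

With opposite currents the mutual term subtracts from the self term, so the voltage across each filament is smaller than the self inductance alone would give.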

#### **3.2 Rotary Clock Extraction**

In this subsection, we present our layout extraction tool based on the PEEC method for rotary clock designs. Figure 2 illustrates how we partition a rotary clock structure. As can be seen, except for the region where the inner and outer clock rings switch, the interconnects are either horizontal or vertical. To apply the PEEC method, we first partition the interconnect into pairs of parallel metal lines along the direction of current flow. We then divide each metal line within every parallel interconnect pair into several filaments of equal width as shown in Figure 2. In the current version of our tool, we do not divide the interconnect along the up-down direction. As a result, we only need to determine two segmentation parameters: the length of the interconnect pairs $l_p$ and the width of the filament $w_f$.

Two important issues need to be addressed in the selection of $l_p$ and $w_f$. First, their values must be small enough so that the extracted circuit netlists reflect the transmission line characteristics, ensuring the accuracy of the ensuing circuit analysis. On the other hand, since the number of mutual inductances grows quadratically with the number of elements, the values of $l_p$ and $w_f$ must be large enough so that the number of elements generated is limited, avoiding prohibitively long circuit simulation times. In transmission line analysis, a rule of thumb for maintaining high accuracy is [20]

$$t_{prop} < t_r , \qquad (9)$$

where  $t_{prop}$  is the time-of-flight derived using the element length and  $t_r$  is the signal rising or falling time, whichever is smaller. In the rotary clock, the velocity of the traveling wave is  $2lf_c$ , where *l* is the perimeter of the rotary clock structure and  $f_c$  is the clock frequency. As a result,  $t_{prop}$  can be calculated as

$$t_{prop} = l_p / (2lf_c) . \qquad (10)$$

Combining Inequality (9) and Equation (10), we have

$$l_p < 2lf_c t_r . \qquad (11)$$

In addition, since each interconnect filament will be converted into a lumped inductor as shown in Figure 2, any inverter pair can only be connected to the two endpoints of the filament. Therefore, we must have

$$l_p \le d_{inv} , \qquad (12)$$

where $d_{inv}$ is the distance between adjacent inverter pairs. Consequently, we set $l_p$ to the maximum value that satisfies Inequalities (11) and (12). To determine the width of the filament $w_f$, we follow the empirical rule below, based on the experimental data in [15]:

$$w_f \le 0.1s , \qquad (13)$$

where s is the space between the inner and outer interconnects of the rotary clock structure.
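The selection of the two segmentation parameters can be sketched in Python as follows. This is a minimal illustration of Inequalities (11)–(13), not the code of our tool. Two choices in it are assumptions: a safety factor `margin` is used to back off from the strict bound (11), and each gap between inverter pairs is split into a whole number of equal pieces so that inverters always land on segment endpoints.

```python
import math


def segment_length(perimeter, f_clk, t_rise, d_inv, margin=0.9):
    """Pick l_p so that l_p < 2*l*f_c*t_r (11) and l_p <= d_inv (12).

    All lengths are in meters, f_clk in hertz, t_rise in seconds.
    """
    limit = margin * 2.0 * perimeter * f_clk * t_rise  # backed-off bound (11)
    pieces = max(1, math.ceil(d_inv / limit))          # pieces per inverter gap
    return d_inv / pieces                              # also satisfies (12)


def filament_count(wire_width, separation):
    """Smallest number of equal-width filaments with w_f <= 0.1 * s (13)."""
    w_f_max = 0.1 * separation
    # The small offset guards against floating-point noise at exact multiples.
    return max(1, math.ceil(wire_width / w_f_max - 1e-9))


# Hypothetical ring: 4 mm perimeter, 1 GHz clock, 32 inverter pairs.
d_inv = 4e-3 / 32
print(segment_length(4e-3, 1e9, 50e-12, d_inv))  # slow edge: one piece per gap
print(segment_length(4e-3, 1e9, 10e-12, d_inv))  # fast edge: two pieces per gap
print(filament_count(wire_width=18.0, separation=40.0))  # widths in micrometers
```

A faster signal edge tightens bound (11), which shortens the segments and increases the element count.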

Once a rotary clock design is partitioned into multiple segments, its equivalent circuit is extracted. Specifically, the self inductance of each element is derived in the same way as in [15]. The mutual inductance between any two elements depends on the orientations and locations of both elements. For rotary clock designs, there are four different cases as shown in Figure 3. When two elements are perpendicular to each other, as shown in Figure 3(a), their mutual inductance is often negligible and is therefore set to zero in our scheme. For the other three cases, in which the elements are parallel, the mutual inductance values are calculated using the analytical formulas in [6].
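As an illustration of the kind of closed-form expression involved, the Python sketch below evaluates the classical mutual inductance of two parallel, equal-length, fully aligned filaments, $M = \frac{\mu_0 l}{2\pi}\left[\ln\left(\frac{l}{d} + \sqrt{1 + \frac{l^2}{d^2}}\right) - \sqrt{1 + \frac{d^2}{l^2}} + \frac{d}{l}\right]$. It covers only this simplest aligned case and assumes free-space permeability; the offset parallel cases need additional terms that are not shown here, and the function name is hypothetical.

```python
import math

MU0 = 4.0e-7 * math.pi  # permeability of free space in H/m (assumed medium)


def mutual_inductance_parallel(length, distance):
    """Mutual inductance (H) of two aligned parallel filaments.

    length is the common filament length and distance is the center-to-center
    spacing, both in meters. ln(u + sqrt(1 + u^2)) is computed as asinh(u).
    """
    u = length / distance
    return (MU0 * length / (2.0 * math.pi)) * (
        math.asinh(u) - math.sqrt(1.0 + 1.0 / (u * u)) + 1.0 / u
    )


# Hypothetical example: 1 mm filaments spaced 10 um and 20 um apart.
m_near = mutual_inductance_parallel(1e-3, 10e-6)
m_far = mutual_inductance_parallel(1e-3, 20e-6)
print(f"{m_near * 1e9:.3f} nH, {m_far * 1e9:.3f} nH")
```

The coupling falls off only logarithmically with spacing, which is why mutual terms between distant parallel elements cannot simply be dropped.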

The modeling of resistance, capacitance, and transistors is straightforward. Specifically, the resistance of each element is derived using the element dimensions and the sheet resistance. The capacitance is calculated by scaling the values extracted by *icfb*, a Cadence layout design tool. The transistors are represented by the corresponding SPICE models from TSMC.

Figure 2: Segmentation of rotary clock.

Figure 3: Four cases in the mutual inductance derivation.

# 4. FREQUENCY AND POWER ANALYSIS OF ROTARY CLOCK DESIGNS

Our PEEC-based extraction tool can convert the layout of a rotary clock design into a SPICE netlist. As a result, we can accurately derive the timing and power information of the rotary clock design using SPICE simulations. We next perform a detailed analysis on rotary clock designs.

We first investigate the selection of interconnect width w and separation s. Specifically, we have designed and simulated a set of rotary rings with constant length and capacitive load. Figure 4 shows the solution space with contours of constant frequency and power. The constant-frequency contours close to the horizontal axis represent low frequencies. As a result, oscillation frequency is a monotonic function of w. The solutions with a constant power $P_u$ form a closed curve u. The solutions inside u dissipate less power than $P_u$. It is clear that the power surface is convex with only a single minimum. Note that the separation s is defined as the distance between the centerlines of the inner and outer interconnects, so it cannot be smaller than the interconnect width w. As a result, the top-left corner of the solution space is not realizable.



Figure 4: (a) Frequency and (b) power contours of a rotary ring with respect to interconnect width and separation.

Similarly, we plot the frequency and power contours with respect to the clock load c and inverter width W in Figure 5. (The width of the NMOS transistor is used to represent W and the PMOS-NMOS transistor width ratio is kept constant.) Both c and W affect the frequency by changing the equivalent capacitance of the transmission lines in the clock ring. As a result, the frequency contours are straight lines. The gradient of the frequency surface is nearly horizontal, indicating that the passive capacitive load has a larger impact than the inverters, whose gate and drain capacitances are relatively small. The power contours are also close to straight lines. However, the power impact of the inverters is comparable to that of the clock load. It is worth pointing out that there is an empty zone at the lower-right corner of the figures. This is explained by the fact that when the inverters are small and the capacitive load is large, the rotary ring stops oscillating. Consequently, a lower bound on the W-to-c ratio needs to be maintained.



Figure 5: (a) Frequency and (b) power contours of a rotary ring with respect to clock load and inverter width.

Figure 6 shows the frequency and power contours with respect to the ring width d and clock load c. The interconnect width w, interconnect separation s, and inverter width W are kept constant. It can be seen that the ring width has the largest impact on the frequency among all the parameters since the slope of the frequency surface in the vertical direction is steep. The oscillation frequency decreases monotonically with the increase of d. The ring width d does not affect the power dissipation significantly, however, especially when d is large. The empty region near the bottom indicates that the rotary ring stops functioning when the clock load per transmission line length is too large.

# 5. LOW-POWER ROTARY CLOCK DESIGN

This section presents our heuristic algorithm that solves Problem LPRC. Our algorithm first derives a solution that satisfies the target frequency. It then conducts a search in the solution space to minimize the power dissipation.

The procedure to calculate the initial solution is given in Figure 7.



Figure 6: (a) Frequency and (b) power contours of a rotary ring with respect to interconnect length and clock load.

**INITIALIZATION** $(F, D, C)$

1. choose $d_{min}$ and $d_{max}$ based on empirical data
2. **while** (1)
3. $\quad d = (d_{min} + d_{max})/2$
4. $\quad s = \beta \cdot d,\ n = \lfloor D/d \rfloor$
5. $\quad w = \gamma \cdot s,\ m = \lfloor (n^2 + 1)/2 \rfloor$
6. $\quad c = C/m$
7. $\quad W = \alpha \cdot c$
8. $\quad f = \text{SPICE}(d, s, w, W, c)$
9. $\quad$ **if** $(|F - f|/F \le \varepsilon_f)$ **break**
10. $\quad d_{max} = (F > f)\,?\,d : d_{max}$
11. $\quad d_{min} = (F < f)\,?\,d : d_{min}$
12. **return** $(d, s, w, W)$

Figure 7: Subroutine INITIALIZATION.

It performs a binary search on the rotary ring width *d*, since *d* has the largest impact on the oscillation frequency based on our analysis. Specifically, the search range of *d* is set in Line 1. The **while** loop in Lines 2 to 11 is repeated until the simulated frequency $f$ is within a relative tolerance $\varepsilon_f$ of the target frequency *F* (Line 9). During each iteration, all other design parameters are calculated from *d* in Lines 3–7. The constants $\beta$, $\gamma$, and $\alpha$ used in the procedure are derived from the case studies in [23] to ensure that the resulting designs can oscillate. SPICE simulations are used to derive the oscillation frequencies in all iterations. The final solution is returned in Line 12.
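The binary search of Figure 7 can be sketched in Python as follows. The sketch is illustrative only: `simulate` is a hypothetical callback standing in for the SPICE run, `toy_simulate` is a made-up monotonic model with no physical meaning, and the iteration cap and the guard that keeps $n \ge 1$ are additions that do not appear in Figure 7.

```python
def initialization(F, D, C, simulate, d_min, d_max,
                   alpha, beta, gamma, eps_f=0.01, max_iter=60):
    """Binary search on ring width d until |F - f| / F <= eps_f."""
    for _ in range(max_iter):
        d = 0.5 * (d_min + d_max)
        n = max(1, int(D // d))        # rings per array side (guarded)
        s = beta * d                   # separation tied to ring width
        w = gamma * s                  # wire width tied to separation
        m = (n * n + 1) // 2           # number of rings in the array
        c = C / m                      # load per ring
        W = alpha * c                  # inverter width tied to load
        f = simulate(d, s, w, W, c)
        if abs(F - f) / F <= eps_f:
            return d, s, w, W
        if F > f:
            d_max = d                  # too slow: try a smaller ring
        else:
            d_min = d                  # too fast: try a larger ring
    raise RuntimeError("no ring width in the range meets the target frequency")


def toy_simulate(d, s, w, W, c):
    """Made-up stand-in for SPICE: frequency falls as 1/d (d in mm)."""
    return 2.0e9 / d


d, s, w, W = initialization(F=1.0e9, D=10.0, C=680.0, simulate=toy_simulate,
                            d_min=0.5, d_max=5.0,
                            alpha=0.1, beta=0.05, gamma=0.5)
print(f"d = {d:.4f} mm")
```

The update rule relies on the observation from Section 4 that the oscillation frequency decreases monotonically as *d* increases.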

**TUNE\_WIRE** $(F, c, d, W)$

1. set $s_{max}$ and $s_{min}$ based on $d$
2. **while** $(s_{max} - s_{min})/s_{min} > \varepsilon_s$
3. $\quad s_1 = (s_{min} + s_{max})/2 - \delta,\ s_2 = (s_{min} + s_{max})/2 + \delta$, where $\delta > 0$ is a small real number
4. $\quad w_1 = \gamma \cdot s_1,\ w_2 = \gamma \cdot s_2$
5. $\quad (f_1, p_1) = \text{SPICE}(d, s_1, w_1, W, c)$
6. $\quad$ **while** $(|F - f_1|/F > \varepsilon_f)$
7. $\quad\quad$ tune $w_1$ and re-perform SPICE for new $(f_1, p_1)$
8. $\quad (f_2, p_2) = \text{SPICE}(d, s_2, w_2, W, c)$
9. $\quad$ **while** $(|F - f_2|/F > \varepsilon_f)$
10. $\quad\quad$ tune $w_2$ and re-perform SPICE for new $(f_2, p_2)$
11. $\quad$ **if** $(p_1 > p_2)$ $s_{min} = s_1$
12. $\quad$ **else if** $(p_1 < p_2)$ $s_{max} = s_2$
13. $\quad$ **else** $s_{min} = s_1,\ s_{max} = s_2$
14. **return** $(p_1, s_1, w_1)$

Figure 8: Subroutine TUNE\_WIRE.

**ROTARYPMIN** $(F, D, C)$

1. $(d_0, s_0, w_0, W_0) = \text{INITIALIZATION}(F, D, C)$
2. $n_0 = \lfloor D/d_0 \rfloor$
3. **for** $(n = n_0 - 1,\ n_0 + 1,\ 1)$
4. $\quad d = D/n$
5. $\quad$ **for** $(W = 1.2W_0,\ 0.2W_0,\ 0.2W_0)$
6. $\quad\quad m = \lfloor (n^2 + 1)/2 \rfloor$
7. $\quad\quad c = C/m$
8. $\quad\quad (p, s, w) = \text{TUNE\_WIRE}(F, c, d, W)$
9. $\quad\quad$ **if** $(p_{min} > p)$
10. $\quad\quad\quad p_{min} = p,\ (d_{min}, W_{min}, s_{min}, w_{min}) = (d, W, s, w)$
11. **return** $(d_{min}, W_{min}, s_{min}, w_{min})$

Figure 9: Algorithm ROTARYPMIN.

Once the initial solution is computed, our heuristic analyzes a sequence of rotary clock designs to minimize power dissipation. Specifically, the rotary ring width *d* and inverter width *W* are first determined. For each (d, W) pair, the optimal interconnect width *w* and separation *s* are computed so that the resulting rotary array oscillates at frequency *F* and dissipates the minimal power. The procedure TUNE\_WIRE for the calculation of (s, w) is shown in Figure 8. It is a modified bisection algorithm. Intuitively, an interval of *s* is first created with the optimal *s* inside. During each iteration, the length of the interval is reduced by about half while still containing the optimum. The corresponding *w* values are adjusted so that the rotary ring oscillates at the target frequency. The optimum (s, w) is obtained when the length of the interval is less than a threshold. Since, from Figure 4, there is only one minimum in the (s, w) solution space, TUNE\_WIRE is guaranteed to converge. (Proof is omitted due to space limitations.)
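The interval-shrinking idea behind TUNE\_WIRE can be sketched in Python as below. This is a simplified illustration rather than the procedure of Figure 8 itself: the inner frequency-matching loops are hidden inside a hypothetical `evaluate` callback, the probe offset is scaled with the interval (an assumption, so that the loop always terminates), and the midpoint of the final interval is returned. `toy_evaluate` is a made-up convex power curve, not simulation data.

```python
from typing import Callable, Tuple


def tune_wire(evaluate: Callable[[float], Tuple[float, float]],
              s_min: float, s_max: float,
              eps_s: float = 1e-3,
              delta_frac: float = 0.05) -> Tuple[float, float, float]:
    """Shrink [s_min, s_max] around the minimum-power separation.

    evaluate(s) returns (power, width) for separation s after the width has
    been tuned so that the ring oscillates at the target frequency.
    """
    while (s_max - s_min) / s_min > eps_s:
        mid = 0.5 * (s_min + s_max)
        delta = delta_frac * (s_max - s_min)  # assumption: offset scales with interval
        s1, s2 = mid - delta, mid + delta
        p1, _ = evaluate(s1)
        p2, _ = evaluate(s2)
        if p1 > p2:
            s_min = s1           # the minimum lies to the right of s1
        elif p1 < p2:
            s_max = s2           # the minimum lies to the left of s2
        else:
            s_min, s_max = s1, s2
    s = 0.5 * (s_min + s_max)
    p, w = evaluate(s)
    return p, s, w


def toy_evaluate(s: float) -> Tuple[float, float]:
    """Made-up single-minimum power curve with its minimum at s = 30."""
    return (s - 30.0) ** 2 + 5.0, 0.5 * s


p, s, w = tune_wire(toy_evaluate, s_min=10.0, s_max=100.0)
print(f"s = {s:.2f}, w = {w:.2f}, p = {p:.3f}")
```

Each iteration keeps the side of the interval that must contain the minimum of a single-minimum curve, so the interval shrinks to a little over half its previous length.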

Figure 9 gives the pseudocode of our algorithm, called ROTARYPMIN, to solve Problem LPRC. Specifically, ROTARYPMIN derives the initial solution in Line 1. It selects the ring width *d* and inverter width *W* exhaustively in a region around the initial solution using the nested **for** loops in Lines 3–5. For each (d, W) pair, it uses TUNE\_WIRE to compute the optimum (s, w) and the corresponding power dissipation *p*. It then updates the current best solution in Lines 9–10. Algorithm ROTARYPMIN returns the best design parameters when all (d, W) pairs have been visited.
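The outer enumeration of ROTARYPMIN can be sketched in Python as follows. The sketch assumes that the loop header for *W* in Figure 9 means "from $1.2W_0$ down to $0.2W_0$ in steps of $0.2W_0$", takes the initial solution as input instead of calling INITIALIZATION, and uses a hypothetical `tune` callback in place of TUNE\_WIRE. `toy_tune` is a made-up cost function for demonstration only.

```python
def rotary_pmin(D, C, d0, W0, tune):
    """Enumerate (d, W) pairs around the initial solution and keep the best.

    tune(c, d, W) returns (p, s, w) for the given per-ring load, ring width,
    and inverter width. The result is (p, d, W, s, w) with minimal power p.
    """
    n0 = int(D // d0)
    best = None
    for n in (n0 - 1, n0, n0 + 1):
        if n < 1:
            continue                 # a ring cannot be wider than the array
        d = D / n
        m = (n * n + 1) // 2         # number of rings for this n
        c = C / m                    # load per ring
        for k in range(6, 0, -1):    # W = 1.2*W0, 1.0*W0, ..., 0.2*W0
            W = 0.2 * k * W0
            p, s, w = tune(c, d, W)
            if best is None or p < best[0]:
                best = (p, d, W, s, w)
    return best


def toy_tune(c, d, W):
    """Made-up power model with its minimum at d = 2.0 and W = 0.6."""
    return (d - 2.0) ** 2 + (W - 0.6) ** 2, 0.1 * d, 0.05 * d


p, d, W, s, w = rotary_pmin(D=10.0, C=680.0, d0=2.0, W0=1.0, tune=toy_tune)
print(f"d = {d}, W = {W:.1f}, p = {p:.2e}")
```

Only 18 (d, W) pairs are visited, so the cost of the whole search is dominated by the simulations performed inside each TUNE\_WIRE call.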

# 6. EXPERIMENTAL RESULTS

We applied our scheme to designs of different specifications. Specifically, we selected the target frequency, in gigahertz, from the set {0.5, 1.0, 2.0, 3.0, 4.0, 5.0}. The clock array width ranged from 7 mm to 20 mm. We set the total clock load proportional to the chip area, i.e., $C = \theta D^2$, where $\theta$ was chosen from {6.8, 9.6, 12.4} in pF/mm<sup>2</sup> according to industrial microprocessor designs [1, 2, 5]. For all clock rings designed, we inserted 32 evenly distributed inverter pairs. Our rotary arrays were designed in a 0.18 $\mu$m technology with a 1.8 V power supply.

Two comparisons were made. First, results from our scheme were compared with those of Subroutine INITIALIZATION. Since INITIALIZATION only generates solutions with the required frequencies, i.e., it is power-unaware, such a comparison indicates the effectiveness of our scheme, which is independent of the rotary clock technique. The second comparison was between the results of our approach and those using conventional clock trees. The power dissipation values of the clock trees were calculated as  $CV_{dd}^2 F$ . This comparison reveals the true power saving potential of the rotary clock technique.
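The clock tree baseline is simple enough to evaluate directly. The Python sketch below computes $C = \theta D^2$ and $C V_{dd}^2 F$ for one hypothetical combination of the parameters used in our experiments; the function names and the rotary power figure in the usage lines are placeholders, not measured results.

```python
def clock_load_pf(theta_pf_per_mm2, width_mm):
    """Total clock load C = theta * D^2 in picofarads."""
    return theta_pf_per_mm2 * width_mm ** 2


def clock_tree_power_w(load_pf, vdd, freq_hz):
    """Baseline clock tree power C * Vdd^2 * F in watts."""
    return load_pf * 1e-12 * vdd ** 2 * freq_hz


def power_saving(p_rotary_w, p_tree_w):
    """Fractional saving of a rotary design relative to the clock tree."""
    return 1.0 - p_rotary_w / p_tree_w


# Hypothetical case: theta = 6.8 pF/mm^2, D = 10 mm, Vdd = 1.8 V, F = 1 GHz.
C = clock_load_pf(6.8, 10.0)
P_tree = clock_tree_power_w(C, 1.8, 1.0e9)
print(f"C = {C:.1f} pF, P_tree = {P_tree:.4f} W")

# Placeholder rotary power, used only to show how the saving is computed.
print(f"saving = {power_saving(1.0, P_tree):.1%}")
```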

Figure 10 shows the experimental results. The first column gives the power savings of our scheme over INITIALIZATION. Our approach achieves 24.3% power savings on average compared to the power-unaware rotary clock design approach. The maximal power reduction is 60%. Better power savings are achieved at lower frequencies. As shown by the figures in the second column, our low-power rotary design method achieves a power reduction of up to 69% in comparison with conventional clock trees. The average power reduction is 47.5%. It is worth mentioning that the power values of conventional clock tree designs are underestimated since we ignore the capacitance of clock tree interconnects and buffers. A comparison under realistic scenarios would result in even larger power savings.

Figure 10: Power savings vs (a) power-unaware rotary clock scheme (b) clock tree scheme.

# 7. CONCLUSION

In this paper, we propose a rotary clock design methodology for low-power clock distribution. Specifically, we first develop a PEEC-based extraction tool so that we can simulate a given rotary clock structure using SPICE. Using our tool, we then perform extensive simulations to analyze the impacts of various design parameters on the frequency and power dissipation of rotary clock rings. Based on the analysis results, we propose the first power minimization algorithm for rotary clock designs. Experimental results have demonstrated that rotary clock arrays generated using our proposed scheme achieve up to 69% power reduction, and 47.5% on average, in comparison to conventional clock trees.

# 8. REFERENCES

- [1] F. E. Anderson, J. S. Wells, and E. Z. Berta. The core clock system on the next generation Itanium microprocessor. In *Inter. Solid-State Circuits Conf.*, Feb. 2002.
- [2] D. E. Duarte, N. Vijaykrishnan, and M. J. Irwin. Impact of technology scaling in the clock system power. In *Proc. of the IEEE Computer Society Annual Symp. on VLSI*, Apr. 2002.
- [3] E. G. Friedman. Clock Distribution Networks in VLSI Circuits and Systems. IEEE Press, 1995.
- [4] E. G. Friedman. Clock distribution networks in synchronous digital integrated circuits. *Proceedings of the IEEE*, 89(5):665–692, May 2001.
- [5] P. E. Gronowski, W. J. Bowhill, R. P. Preston, M. K. Gowan, and R. L. Allmon. High-performance microprocessor design. *Journal of Solid-State Circuits*, 33(5):676–686, May 1998.
- [6] F. W. Grover. Inductance Calculations: Working Formulas and Tables. Dover Publications Inc., 180 Varick Street, New York, N.Y. 10014, 1973.
- [7] V. Gutnik and A. P. Chandrakasan. Active GHz clock network using distributed PLLs. *Journal of Solid-State Circuits*, 35(11):1553–1560, Nov. 2000.
- [8] A. Hemani et al. Lowering power consumption in clock by using globally asynchronous locally synchronous design style. In *Design Automation Conference*, June 1999.
- [9] N. A. Kurd et al. A multigigahertz clocking scheme for the Pentium 4 microprocessor. *Journal of Solid-State Circuits*, 36(11):1647–1653, Nov. 2001.
- [10] A. F. Milsom, K. J. Scott, G. Clark, J. C. McEntegart, S. Ahmed, and F. N. Soper. FACET: a CAE system for RF analogue simulation including layout. In *IEEE Design Automation Conf.*, pages 622–625, June 1989.
- [11] A. M. Niknejad and R. G. Meyer. Analysis, design, and optimization of spiral inductors and transformers for Si RF IC's. *Journal of Solid-State Circuits*, 33(10):1470–1481, Oct. 1998.
- [12] F. O'Mahony, C. P. Yue, M. A. Horowitz, and S. S. Wong. A 10-GHz global clock distribution using coupled standing-wave oscillators. *Journal of Solid-State Circuits*, 38(11):1813–1820, Nov. 2003.
- [13] J. Pangjun and S. Sapatnekar. Low-power clock distribution using multiple voltages and reduced swings. *IEEE Trans. VLSI Systems*, 10(3):309–318, Mar. 2002.
- [14] J. O. Plouchart, H. Ainspan, M. Soyuer, and A. E. Ruehli. A fully-monolithic SiGe differential voltage-controlled oscillator for 5 GHz wireless applications. In *IEEE Radio Frequency Integrated Circuits Symp.*, June 2000.
- [15] R. B. Wu, C. N. Kuo, and K. K. Chang. Inductance and resistance computations for three-dimensional multiconductor interconnection structures. *IEEE Trans. Microwave Theory and Techniques*, 40(2), Feb. 1992.
- [16] P. J. Restle et al. A clock distribution network for microprocessors. *Journal of Solid-State Circuits*, 36(5):792–799, May 2001.
- [17] A. E. Ruehli. Inductance calculations in a complex integrated circuit environment. *IBM Journal of Research and Development*, pages 470–481, Sept. 1972.
- [18] A. E. Ruehli. Equivalent circuit models for three-dimensional multiconductor systems. *IEEE Transactions on Microwave Theory and Techniques*, MTT-22(3), Mar. 1974.
- [19] A. E. Ruehli and P. A. Brennan. Efficient capacitance calculations for three-dimensional multiconductor systems. *IEEE Transactions on Microwave Theory and Techniques*, MTT-21(2):76–82, Feb. 1973.
- [20] S. H. Hall, G. W. Hall, and J. A. McCall. High-Speed Digital System Design: A Handbook of Interconnect Theory and Design Practices. New York: Wiley, 2000.
- [21] M. Saint-Laurent, M. Swaminathan, and J. D. Meindl. On the micro-architectural impact of clock distribution using multiple PLLs. In *Inter. Conf. Computer Design*, Sept. 2001.
- [22] N. Talwalkar, C. P. Yue, and S. S. Wong. Compact modeling of high frequency phenomena for on-chip spiral inductors. In *Workshop on Compact Modeling*, Feb. 2003.
- [23] J. Wood, T. C. Edwards, and S. Lipa. Rotary traveling-wave oscillator arrays: A new clock technology. *Journal of Solid-State Circuits*, 36(11):1654–1665, Nov. 2001.
- [24] C. H. Ziesler, S. Kim, and M. C. Papaefthymiou. A power-clock generator for true single-phase adiabatic logic. In *Proceedings of International Symposium on Low-Power Electronics and Design*, 2001.