# Noise Driven In-Package Decoupling Capacitor Optimization for Power Integrity

Jun Chen Electrical Engineering Department University of California, Los Angeles jchen@ee.ucla.edu

## ABSTRACT

Existing decoupling capacitor optimization approaches impose constraints on the input impedance of the package. In this paper, we show that using impedance as the constraint leads to large overdesign, and we develop a noise driven optimization algorithm for in-package decoupling capacitors for power integrity. Our algorithm uses simulated annealing to minimize the total cost of decoupling capacitors under a constraint on the worst-case noise in the power delivery system. The key enabler for efficient optimization is an incremental worst-case noise computation based on FFT over incremental impedance matrix evaluation. Compared to existing impedance based approaches, our algorithm reduces the decoupling capacitor cost by $3 \times$ and is also more than $10 \times$ faster, even with explicit noise computation.

## **Categories and Subject Descriptors**

B.7.2 [Hardware]: Integrated Circuits—Design Aids; B.8.2 [Hardware]: Performance and Reliability—Performance Analysis and Design Aids

## **General Terms**

Design, performance, reliability

#### Keywords

Decoupling capacitor, power distribution system, IC package, power integrity, resonance, noise, modeling

## 1. INTRODUCTION

Power integrity is very important for the performance of integrated circuits. Compromising it may lead to logic errors and slow transitions. Nowadays, chips operate at very high frequencies and consume a large amount of power, and the number of I/Os is ever increasing. High power leads to large current flows in the power delivery system (PDS), which cause large IR drop and di/dt noise. High frequencies cause inductive effects and may trigger resonance, which presents a large impedance in the PDS. A large number of I/Os leads to serious simultaneous switching noise (SSN). All of these may lead to power rail collapse and affect the operation of circuits. Power integrity has to be guaranteed in the entire PDS, from the voltage regulator module (VRM) to the on-chip power grid. In this paper, we focus on decoupling capacitor optimization for the power integrity of the IC package, especially the SSN problem. However, our method can also be used for decoupling capacitor optimization in other parts of the power delivery system.

Lei He Electrical Engineering Department University of California, Los Angeles lhe@ee.ucla.edu

Copyright 2006 ACM 1-59593-299-2/06/0004 ...\$5.00.

Decoupling capacitors, which act as temporary current sources and low-impedance paths for AC signals, are essential to reduce the voltage fluctuation in the PDS. For package decoupling, discrete decoupling capacitors are used. These decoupling capacitors are not ideal. Their frequency responses can be modeled with an equivalent series capacitance (ESC), an equivalent series inductance (ESL) and an equivalent series resistance (ESR). Different types of decoupling capacitors have different prices, different ESC, ESL and ESR, and therefore different effective frequency ranges. As pointed out in [1], expensive decoupling capacitors may not be the best choice for electrical performance. Also, the effectiveness of a decoupling capacitor depends on its electrical environment and therefore varies with location. Unlike on-chip decoupling capacitors, in-package decoupling capacitors can be put almost anywhere in the package. Therefore, the types and locations of the decoupling capacitors have to be optimized for the most effective design with minimal cost.

The problem of decoupling capacitor optimization has already been presented in the literature. In [2, 3, 4, 5, 6, 7], the on-chip decoupling capacitor optimization problem has been studied for different objective functions. However, on-chip decoupling capacitors normally have negligible ESL and ESR and can take continuous values. Unfortunately, neither is true for in-package decoupling capacitors.

In-package and on-board decoupling capacitor optimization has also been studied, but the majority of existing work relies on trial-and-error methods, such as [8] and [9], both of which are manual processes. Automatic optimization methods also exist. For example, the authors of [10] use the PEEC model and model order reduction techniques to compute the input impedance and then search for the optimal locations of the decoupling capacitors to minimize the impedance by gradient based search. In [11], the authors use FDTD and FFT to obtain the frequency dependent Poynting vector, and decoupling capacitors are iteratively placed at the port with the maximum Poynting vector. However, in both papers the decoupling capacitor value is fixed and neither ESL nor ESR is considered.

Figure 1: Impedance and noise waveform

<sup>\*</sup>This paper is partially supported by NSF CAREER award CCR-0093273/0401682 and a UC MICRO grant sponsored by Analog Devices, Intel, Mindspeed and RIO Design Automation. Address comments to lhe@ee.ucla.edu.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

ISPD'06, April 9-12, 2006, San Jose, California, USA.

The most comprehensive work on automatic optimization of package decoupling capacitors is [1]. In this work, the authors model the inductive effect of packages with susceptance (inverse of inductance) instead of inductance, and extract a resistance-capacitance-susceptance (RCS) model of the package. Based on this model, a macromodel is built with a model order reduction technique. Then, based on the macromodel, a simulated annealing algorithm is developed to search for the optimal types of decoupling capacitors at given locations to minimize the cost under the constraint of a target impedance at the chip I/O ports. Different types of decoupling capacitors with different ESC, ESL and ESR are considered.

However, the approach is based on impedance metrics, which leads to significant overdesign. For example, in Fig. 1 we show a case where the noise bound is met but the impedance bound is not, for a signal with an effective frequency range up to 10GHz. Fig. 1(a) shows that the target impedance is not met in most of the frequency band. However, the noise bound has been met, as shown in Fig. 1(b). It is clear that the target impedance cannot capture the noise accurately and may cause overdesign.

In this work, we directly use noise as the metric of SSN and develop an efficient noise model to optimize the locations and types of decoupling capacitors. We consider a large number of ports to search for the optimal locations for decoupling capacitor insertion. We assume the impedance matrix is given and develop an efficient model to compute the new impedance matrix with one decoupling capacitor inserted or removed. The time complexity of this impedance update is $O(n^2)$ in the number of ports $n$, compared to $O(n^3)$ in the state-of-the-art existing work [12]. With the impedance matrix and pre-characterized switching current waveforms, we use FFT to compute the noise waveform and obtain the worst case noise. Based on these models, we develop a simulated annealing algorithm to minimize the cost subject to the maximum noise constraint. The algorithm demonstrates good efficiency with a large number of ports. It finished a case with 93 ports in less than 7 minutes with 5881 iterations, which is more than $10 \times$ faster than previous work. We also compare our approach with an impedance based approach and show that impedance is not a good metric for noise and that the impedance based approach leads to overdesign. Compared to our noise based approach, the impedance based solution has a $3 \times$ larger cost.

The rest of the paper is organized as follows. In section 2, we discuss the electrical models for the package system. In section 3, we present the method to incrementally compute the impedance matrix. In section 4, we discuss the noise metric for optimization. In section 5, we use a simulated annealing algorithm to optimize the decoupling capacitor insertion. We conclude the paper in section 6.

## 2. ELECTRICAL MODELS

#### 2.1 Package model

As shown in Fig. 2, packages for semiconductor chips often consist of multiple signal layers, power planes and ground planes with dielectric in between. Metal signal traces connecting the chip I/O cells to the PCB traces are routed between planes, and package planes are stapled together with vias and connected to the PCB by balls. We assume the locations of the chip I/O ports are known and the possible locations for the decoupling capacitors are predefined. We can prebuild the macromodel of the package with the specified ports for I/Os and decoupling capacitors before the optimization process. This macromodel not only includes the power and ground planes, but can also include vias and traces. Other parts of the PDS, such as the on-chip power grid, PCB and VRM, can also be included. Specifically, for the macromodel we obtain the impedance matrix $Z(f_k)$ for the specified ports at a number of sample frequencies $f_k$ beforehand. The matrix element $Z_{ij}(f_k)$ of $Z(f_k)$ is the transfer impedance from port j to port i at frequency $f_k$. The frequency dependent impedance Z can be obtained by various methods, such as 3D field solvers, model reduction, or measurement, depending on the time and accuracy requirements and the design stage. Our method can be used with any of these methods. With the macromodel, the efficiency of the subsequent optimization process no longer depends on the size of the original circuit, but only on the number of ports defined. This allows a very complex package to be optimized in a very short time.

In this paper, we first extract a detailed RLCK circuit of the package, and then use a model order reduction technique to obtain the impedance matrix. For the detailed RLCK circuit, the planes are partitioned into grids and the traces are divided into small segments. Then, we extract the resistance, self inductance and grounding capacitance of each segment, the coupling inductance between each pair of segments and the coupling capacitance between adjacent segments.



Figure 2: IC package

#### 2.2 Decoupling capacitor model



Figure 3: Model of decoupling capacitors

As discussed in the introduction, the decoupling capacitors for the package are discrete elements. Each type of decoupling capacitors has different frequency domain response and can be characterized by ESC, ESL and ESR as shown in Fig. 3. We assume there are multiple types of decoupling capacitors and their ESC, ESL and ESR are given. For efficient optimization, we pre-compute the frequency dependent impedance of each type at the sample frequencies as

$$Z_d(\omega) = ESR + \frac{1}{j\omega ESC} + j\omega ESL \tag{1}$$
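As a minimal sketch of this pre-computation, the snippet below evaluates (1) for the four capacitor types of Table 1. The frequency grid, the function names and the self-resonance check are illustrative assumptions and are not taken from the original implementation.

```python
import numpy as np

# Capacitor types from Table 1: type -> (ESC [F], ESR [ohm], ESL [H], price).
CAP_TYPES = {
    1: (50e-9, 0.06, 100e-12, 1),
    2: (100e-9, 0.06, 100e-12, 2),
    3: (50e-9, 0.03, 40e-12, 2),
    4: (100e-9, 0.03, 40e-12, 4),
}

def decap_impedance(freqs_hz, esc, esr, esl):
    """Eq. (1): Z_d = ESR + 1/(j*w*ESC) + j*w*ESL, valid for f > 0."""
    w = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float)
    return esr + 1.0 / (1j * w * esc) + 1j * w * esl

def self_resonant_frequency(esc, esl):
    """Frequency at which the capacitive and inductive terms cancel."""
    return 1.0 / (2.0 * np.pi * np.sqrt(esl * esc))

# Hypothetical sample grid (DC excluded because 1/(j*w*ESC) is singular there).
freqs = np.linspace(50e6, 10e9, 200)
zd_table = {
    cap_type: decap_impedance(freqs, esc, esr, esl)
    for cap_type, (esc, esr, esl, _price) in CAP_TYPES.items()
}

# At self-resonance the impedance magnitude reduces to ESR.
z_at_resonance = {
    cap_type: decap_impedance([self_resonant_frequency(esc, esl)], esc, esr, esl)[0]
    for cap_type, (esc, esr, esl, _price) in CAP_TYPES.items()
}
```

Because the table is computed once per type, the optimizer only needs a lookup of `zd_table[cap_type]` at each sample frequency when a capacitor is added or removed.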

#### 2.3 Model of I/O cells



Figure 4: Switching current model

Normally each I/O cell drives a transmission line in the package. When an I/O cell switches, it draws a large current from the power delivery system and causes voltage fluctuation (SSN). The electrical behavior of I/O cells can be modeled in various ways, for example with a physical model such as the BSIM model [13] or a behavioral model such as the IBIS model [14]. With a given load, we can pre-characterize the I/O cell and obtain the time dependent current waveform, similar to the IBIS model, by simulation. We further transform the obtained time-domain waveform to the frequency domain and obtain the frequency components of the current to be used in the following optimization process. In this work, we model the time domain waveform as a piece-wise linear waveform. Its frequency components can be computed analytically as a sum of a series of ramp functions in the following form,

$$I(\omega) = \sum_{i} \frac{b_i}{s^2} e^{-sT_i} \tag{2}$$

Similarly to [6], for simplicity we model the current waveform as a two-segment piece-wise linear waveform (triangular waveform) as shown in Fig. 4. With the parameters defined in the figure and $s = j\omega$, the frequency components are computed as,

$$I(\omega) = \frac{a}{s^2} e^{-sT_d} + \frac{b-a}{s^2} e^{-s(T_d+T_r)} - \frac{b}{s^2} e^{-s(T_d+T_r+T_f)} \tag{3}$$

where,

$$a = \frac{A}{T_r} \tag{6}$$

$$b = -\frac{A}{T_f} \tag{7}$$

In this model, each I/O cell can have a different amplitude, rise time and fall time. Note that the methods discussed in the rest of this paper are not limited to such a waveform but can be applied to any waveform. A more accurate and complex current model can be used, and its frequency components can be obtained either numerically or analytically beforehand without affecting the optimization process.
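The closed form (3) can be checked against a direct numerical Fourier integral of the triangular pulse. The sketch below does this for a 50mA pulse with 100ps rise and fall times. The 50ps delay, the time grid and the function names are assumptions chosen for illustration only.

```python
import numpy as np

def triangle_spectrum(freqs_hz, amp, t_d, t_r, t_f):
    """Eq. (3) evaluated at s = j*2*pi*f (f > 0), with a = A/T_r, b = -A/T_f."""
    s = 1j * 2.0 * np.pi * np.asarray(freqs_hz, dtype=float)
    a = amp / t_r
    b = -amp / t_f
    return (a * np.exp(-s * t_d)
            + (b - a) * np.exp(-s * (t_d + t_r))
            - b * np.exp(-s * (t_d + t_r + t_f))) / s**2

def triangle_waveform(t, amp, t_d, t_r, t_f):
    """Time-domain triangular pulse: zero, linear rise, linear fall, zero."""
    t = np.asarray(t, dtype=float)
    rise = amp * (t - t_d) / t_r
    fall = np.clip(amp * (1.0 - (t - t_d - t_r) / t_f), 0.0, None)
    return np.where(t < t_d, 0.0, np.where(t < t_d + t_r, rise, fall))

# Hypothetical parameters: 50 mA peak, 100 ps edges, 50 ps delay.
AMP, T_D, T_R, T_F = 50e-3, 50e-12, 100e-12, 100e-12
check_freqs = np.array([1e9, 5e9])

analytic = triangle_spectrum(check_freqs, AMP, T_D, T_R, T_F)

# Numerical Fourier integral on a fine grid (the pulse is zero at both ends).
t = np.linspace(0.0, 400e-12, 4001)
dt = t[1] - t[0]
i_t = triangle_waveform(t, AMP, T_D, T_R, T_F)
numeric = np.array(
    [np.sum(i_t * np.exp(-1j * 2.0 * np.pi * f * t)) * dt for f in check_freqs]
)
```

The two results should agree closely, which confirms that the three ramp terms in (3) reproduce the triangular pulse.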

## 3. INCREMENTAL COMPUTATION OF IMPEDANCE



Figure 5: Connection of a decoupling capacitor

With a given current injection, the noise at a port depends on the impedance. With the insertion or removal of decoupling capacitors, the impedance matrix of the system will change and affect the noise value. Therefore, the impedance matrix has to be updated with changes of decoupling capacitor distribution. In [1], this is done by  $n_{io}$  AC sweeps, where the  $n_{io}$  is the number of I/O ports. Another method is presented in [12]. Assuming the macromodel without decoupling capacitors is given in terms of admittance matrix  $Y(\omega)$ , the impedance with decoupling capacitors is computed as,

$$Z(\omega) = (Y(\omega) + \tilde{Y}(\omega))^{-1} \tag{8}$$

where $\tilde{Y}(\omega)$ is a diagonal matrix with $\tilde{Y}_{ii}$ equal to the admittance of the decoupling capacitor at port *i* at frequency $\omega$. Both of these methods need at least one matrix inversion, which dominates the computation time. Because Y is a macromodel, it is usually a dense matrix and the time complexity of the matrix inversion is roughly $O(n_p^3)$, where $n_p$ is the number of ports including the I/O ports and the ports for the decoupling capacitors.

The approach above is good for computing the impedance when simultaneously inserting or removing a large number of decoupling capacitors. However, in an iterative optimization process such as the one presented later in this paper, we normally add or remove one or a small number of decoupling capacitors each time. In this case, matrix inversion is not necessary for impedance computation. We propose to compute the impedance matrix incrementally.



Figure 6: Thevenin equivalent circuit

Assume that at a certain frequency the impedance matrix before inserting the decoupling capacitor is Z, and that we insert one decoupling capacitor at port k as shown in Fig. 5. We need to solve for the new impedance $\hat{Z}$. $\hat{Z}_{ij}$, which is the transfer impedance from port j to port i, is equal to the voltage at i when applying a 1A current source at port j. Because the system is linear, we can replace the rest of the package except the decoupling capacitor with a Thevenin equivalent circuit as shown in Fig. 6. The voltage source is equal to $Z_{kj}$ and the source impedance is equal to $Z_{kk}$. Therefore the current running through the decoupling capacitor is $Z_{kj}/(Z_{kk}+Z_d)$, where $Z_d$ is the impedance of the decoupling capacitor. Replacing the capacitor with a current source of the same current, as shown in Fig. 7, will not change the voltage or current in the rest of the circuit. According to the superposition principle, the change of $Z_{ij}$ is equal to $-Z_{ik}Z_{kj}/(Z_{kk}+Z_d)$ and

$$\hat{Z}_{ij} = Z_{ij} - \frac{Z_{ik}Z_{kj}}{Z_{kk} + Z_d} \tag{9}$$

where $Z_{ij}$ is the transfer impedance from port j to port i before inserting the decoupling capacitor. We can see that the change of $Z_{ij}$ only depends on $Z_{ik}$, $Z_{kj}$, $Z_{kk}$ and $Z_d$. Therefore, the overall impedance matrix with the decoupling capacitor added at port k at a given frequency is

$$\hat{Z} = Z - \frac{b_k a_k}{Z_{kk} + Z_d} \tag{10}$$

where  $a_k$  is the kth row of Z and  $b_k$  is the kth column of Z. The computation time of this process is mainly determined by computing  $b_k a_k$  which is an  $O(n_p^2)$  process. Removing a decoupling capacitor from port k is equivalent to adding a negative admittance of the same value at port k. Therefore, the overall impedance matrix with the decoupling capacitor removed from port k at a given frequency is

$$\hat{Z} = Z - \frac{b_k a_k}{Z_{kk} - Z_d} \tag{11}$$
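The updates (10) and (11) are rank-one corrections of Z and can be checked against the inversion in (8). The following sketch does so on a small random test matrix. The matrix, the port index and the capacitor impedance are hypothetical values chosen only to exercise the formulas.

```python
import numpy as np

def add_decap(Z, k, z_d):
    """Eq. (10): impedance matrix after connecting impedance z_d at port k."""
    b_k = Z[:, k]  # k-th column of Z
    a_k = Z[k, :]  # k-th row of Z
    return Z - np.outer(b_k, a_k) / (Z[k, k] + z_d)

def remove_decap(Z, k, z_d):
    """Eq. (11): impedance matrix after removing impedance z_d from port k."""
    return Z - np.outer(Z[:, k], Z[k, :]) / (Z[k, k] - z_d)

# Hypothetical symmetric admittance macromodel at one frequency.
rng = np.random.default_rng(0)
n_p = 6
A = rng.standard_normal((n_p, n_p)) + 1j * rng.standard_normal((n_p, n_p))
Y = A + A.T + 4.0 * n_p * np.eye(n_p)
Z = np.linalg.inv(Y)

k = 2               # port receiving the capacitor (assumed)
z_d = 0.06 + 0.4j   # capacitor impedance at this frequency (assumed)

# Reference result from Eq. (8): add 1/z_d on the diagonal and invert.
Y_tilde = np.zeros((n_p, n_p), dtype=complex)
Y_tilde[k, k] = 1.0 / z_d
Z_reference = np.linalg.inv(Y + Y_tilde)

Z_added = add_decap(Z, k, z_d)               # O(n_p^2) update
Z_restored = remove_decap(Z_added, k, z_d)   # undoes the insertion
```

Here `Z_added` should match `Z_reference`, and `Z_restored` should recover the original `Z`, without any matrix inversion in the update itself.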



Figure 7: Equivalent current source

Compared to (8), this method is more efficient and scales better with the number of ports when only one decoupling capacitor is added or removed. It is especially suitable for iterative optimization or trial-and-error processes, in which one or a small number of decoupling capacitors are changed and the impedance matrix needs to be reevaluated in each iteration. Another advantage of this method is that, to obtain the impedance of certain ports, we only need to compute those entries with (9) without computing the impedance of other ports. This again is good for trial-and-error methods. For example, in simulated annealing, we can first compute only the impedance of the I/O ports. If the solution is accepted, we further compute the impedance of the other ports. Otherwise, we can move to the next iteration without any further computation. Since I/O ports are only a fraction of the total ports, we can save significant computation time.
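A minimal sketch of this selective evaluation is given below: only the I/O-port block of $\hat{Z}$ is formed with (9), and it is compared with the same block of the full update (10). The matrix values, port indices and function names are illustrative assumptions.

```python
import numpy as np

def add_decap_full(Z, k, z_d):
    """Eq. (10): full updated matrix, O(n_p^2) work."""
    return Z - np.outer(Z[:, k], Z[k, :]) / (Z[k, k] + z_d)

def add_decap_io_block(Z, k, z_d, io_ports):
    """Eq. (9) applied only to entries (i, j) with i and j both I/O ports."""
    io = np.asarray(io_ports)
    block = Z[np.ix_(io, io)]
    return block - np.outer(Z[io, k], Z[k, io]) / (Z[k, k] + z_d)

# Hypothetical 8-port impedance matrix with 3 I/O ports.
rng = np.random.default_rng(1)
n_p = 8
Z = rng.standard_normal((n_p, n_p)) + 1j * rng.standard_normal((n_p, n_p))
Z = Z + Z.T + 10.0 * np.eye(n_p)

io_ports = [0, 1, 2]
k, z_d = 5, 0.03 + 0.2j

Z_io_trial = add_decap_io_block(Z, k, z_d, io_ports)  # cheap trial evaluation
Z_full = add_decap_full(Z, k, z_d)                    # only needed on acceptance
```

In an annealing loop, `Z_io_trial` would be enough to evaluate the noise at the I/O ports for a trial move, and the full update would be performed only if the move is accepted.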

If n decoupling capacitors are changed, the computation in (10) needs to be repeated n times. When $n \ll n_p$, it is still more efficient than (8). The worst case is $n = n_p$, which means the distribution of decoupling capacitors changes at all the ports, and the complexity becomes $O(n_p^3)$, the same as [12]. Fortunately, this case never happens in one iteration of our optimization.

## 4. NOISE METRIC

#### 4.1 Impedance metric

Traditionally, for the integrity of power delivery system, the impedance at given ports is required to be lower than a computed target impedance in the entire frequency bandwidth of interest. According to [15], the target impedance can be computed as follows,

$$Z_t = \frac{\delta V dd}{I} \tag{12}$$

where $\delta$ is the tolerable variation of Vdd and I is the switching current at the given ports. In [1], the authors also proposed a weighted combined impedance to consider the coupling between ports. However, the impedance is not directly proportional to the noise, and this kind of approach is pessimistic. Physically, it assumes that all the frequency components see the same impedance with the same phase and add up to the total noise.

In fact, the current is not uniformly distributed over the entire frequency band, and the impedance varies with frequency. Also, different frequency components have different amplitudes and phases, and may cancel each other. The impedance therefore need not be very small over the entire frequency band. In Fig. 8 we show an excitation current waveform and its spectrum up to 10GHz. It is a triangular waveform with rise and fall times both equal to 100ps and an amplitude of 50mA. We can see that the current is mostly distributed from 0 to 10GHz, but the amplitude of the frequency components gradually decreases as the frequency increases. The time domain noise is the convolution of the current with the impulse response of the impedance, which corresponds to the product of their spectra in the frequency domain. Therefore, a large impedance at a lower frequency may cause large time domain noise, whereas the same impedance at a higher frequency may not cause a problem. One such case has been shown in Fig. 1.

Figure 8: Transient current waveform and its spectrum

#### 4.2 Time domain metric

In this paper, we directly consider the noise in the power delivery system at each port of interest. We can easily compute the impedance at different sampling frequencies and also pre-compute the spectrum of the switching current of each port. For the noise at port i induced by the switching activity at port j, the noise component at the kth frequency sampling point can be computed as,

$$V_{ij}(f_k) = Z_{ij}(f_k)I_j(f_k) \tag{13}$$

We then use the inverse Fast Fourier Transform (FFT) to compute the time domain waveform, which is the noise waveform induced by port j at port i. The time complexity of the FFT is $O(n \log n)$, where n is the number of sampling points. For the signal shown in Fig. 8, 512 sampling points from 0 to 50GHz are used. For a signal with a shorter rise or fall time, more sampling points at higher frequencies are needed.
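The frequency-domain product (13) followed by an inverse FFT can be sketched as follows. As a simplification, the current spectrum here is obtained by an FFT of the sampled waveform rather than from the analytic form (3), and the series R-L transfer impedance is a hypothetical stand-in for a macromodel entry $Z_{ij}(f_k)$. The sampling choices (1024 points at 10ps, giving samples up to 50GHz) are also assumptions.

```python
import numpy as np

def noise_waveform(i_t, dt, z_of_f):
    """Noise v(t) at a port: inverse FFT of Z(f_k) * I(f_k), as in Eq. (13).

    i_t    : sampled switching current (real array)
    dt     : sample spacing in seconds
    z_of_f : callable returning the transfer impedance at the given frequencies
    """
    i_t = np.asarray(i_t, dtype=float)
    n = i_t.size
    freqs = np.fft.rfftfreq(n, d=dt)   # sample frequencies f_k, DC to Nyquist
    i_f = np.fft.rfft(i_t)             # current spectrum I(f_k)
    v_f = z_of_f(freqs) * i_f          # Eq. (13)
    return np.fft.irfft(v_f, n=n)      # back to the time domain

# Hypothetical sampling: 1024 points at 10 ps (Nyquist frequency 50 GHz).
dt = 10e-12
n = 1024
t = np.arange(n) * dt

# Triangular pulse: 50 mA peak, 100 ps rise and fall, 100 ps delay (assumed).
i_t = np.interp(t, [0.0, 100e-12, 200e-12, 300e-12, t[-1]],
                [0.0, 0.0, 50e-3, 0.0, 0.0])

def z_series_rl(f, r=0.5, l=100e-12):
    """Hypothetical transfer impedance used only for this illustration."""
    return r + 1j * 2.0 * np.pi * f * l

v_t = noise_waveform(i_t, dt, z_series_rl)
peak_noise = np.max(np.abs(v_t))
```

Note that the FFT treats the waveform as periodic, so the time window should be long enough for the response to settle before it wraps around.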

Table 1: Decoupling capacitors [1]

| Type | 1 | 2 | 3 | 4 |
|------|------|------|------|------|
| ESC (nF) | 50 | 100 | 50 | 100 |
| ESR ($\Omega$) | 0.06 | 0.06 | 0.03 | 0.03 |
| ESL (pH) | 100 | 100 | 40 | 40 |
| Price | 1 | 2 | 2 | 4 |

At a given port, we consider both the noise induced by the I/O cells connected to the port and the noise induced by the switching activity of other I/O cells connected at other ports. Because the switching of the I/O cells is random and the system is linear, the worst case noise at one port is the sum of the maximum noises induced by all the cells. Each of the maximum noises can be computed with the proposed method.
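The worst-case rule above can be sketched as follows: each source's noise waveform is computed separately and the per-source peaks are summed at each observation port. Taking the peak of the absolute value, as well as the array layout and the resistive test impedances, are assumptions made for this illustration.

```python
import numpy as np

def worst_case_noise(Z, I, n_time):
    """Sum over sources of the peak noise each source induces at every port.

    Z : (n_freq, n_ports, n_src) transfer impedances Z_ij(f_k)
    I : (n_freq, n_src) current spectra I_j(f_k) from np.fft.rfft
    """
    V = Z * I[:, None, :]                     # Eq. (13) for all (i, j) pairs
    v = np.fft.irfft(V, n=n_time, axis=0)     # time waveforms v_ij(t)
    return np.abs(v).max(axis=0).sum(axis=1)  # sum of per-source peaks

# Hypothetical example: two sources that switch at different times.
n_time = 64
t = np.arange(n_time)
i1 = np.interp(t, [0, 5, 10, 15, 63], [0.0, 0.0, 1.0, 0.0, 0.0])
i2 = np.interp(t, [0, 25, 30, 35, 63], [0.0, 0.0, 0.5, 0.0, 0.0])
I = np.stack([np.fft.rfft(i1), np.fft.rfft(i2)], axis=1)

# Frequency-independent (resistive) transfer impedances, for easy checking.
R = np.array([[1.0, 0.2],
              [0.3, 2.0]])
Z = np.broadcast_to(R, (I.shape[0], 2, 2)).astype(complex)

worst = worst_case_noise(Z, I, n_time)
```

With resistive impedances the peaks are simply `R[i, j]` times each current peak, so `worst` should equal `[1.1, 1.3]` here even though the two pulses do not overlap in time.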

## 5. NOISE DRIVEN OPTIMIZATION OF DECOUPLING CAPACITORS

#### 5.1 Settings



Figure 9: An IC package

In this section, we use the developed impedance and noise models to minimize the cost of the decoupling capacitors in a package under the constraint of noise in the power delivery system. Fig.9 shows a sketch of the IC package we considered with I/O cells located on a ring structure along the chip boundary and decoupling capacitors located around the chip. The package is often cut into different domains for different supply voltages. Each voltage domain can be optimized separately. As an example, in this paper we only consider one side of the package, which can be considered as one voltage domain of the package.

Similar to [1], we also try to minimize the total decoupling capacitor cost. We consider different types of decoupling capacitors with different prices. We assume the same set of decoupling capacitors as in [1], which are summarized in Table 1.

However, different from [1], we do not apply the target impedance constraint. Instead, we directly require the worst case noise to be less than the given noise bound. We assume that Vdd is 2.5V and require the noise at each port to be less than 0.35V, which is 14% of Vdd.

#### 5.2 Algorithm

We use the simulated annealing algorithm to optimize the types and locations of the decoupling capacitors so that the total cost is minimized and the noise in the power/ground plane is smaller than a given bound. The objective function is defined as

$$F(p_i, c_j) = \alpha \sum_{i \in IO} p_i + \beta \sum_j c_j \tag{14}$$

where $\alpha$ and $\beta$ are weights for the noise and cost respectively. $\alpha$ is chosen to be much larger than $\beta$ so that the noise constraint can be achieved. $p_i$ is the penalty function for violation of the noise constraint and is defined as

$$p_i = \begin{cases} 0 & (V_i \le \bar{V}) \\ V_i - \bar{V} & (V_i > \bar{V}) \end{cases} \tag{15}$$

where  $\overline{V}$  is the noise upper bound and  $V_i$  is the worst case noise at port *i*, which is computed by the method proposed in section 4.2.

There are two types of moves in our simulated annealing (SA) scheme: (1) adding a decoupling capacitor of a random type at a randomly picked port, and (2) removing a decoupling capacitor. At most one decoupling capacitor is allowed at one port. After each move, we compute the new impedance according to (10) or (11) and the noise according to (13). We start the SA with an initial temperature of 20 and terminate it at 0.001. The temperature is decreased by a factor of 0.95 at each step, and the number of moves at a particular temperature is 100.
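The annealing scheme can be sketched as below, using the objective (14), the penalty (15) and the temperature schedule stated above. Several parts are assumptions made for illustration: the weights `alpha` and `beta`, the rule that a randomly picked port is toggled (add if empty, remove if occupied), and in particular `toy_noise`, which is a made-up stand-in for the impedance and FFT noise evaluation and has no physical meaning.

```python
import math
import random

PRICES = {1: 1, 2: 2, 3: 2, 4: 4}  # price per capacitor type (Table 1)

def objective(noise, costs, v_bar, alpha=1000.0, beta=1.0):
    """Eq. (14) with the penalty of Eq. (15); alpha >> beta (values assumed)."""
    penalty = sum(max(0.0, v - v_bar) for v in noise)
    return alpha * penalty + beta * sum(costs)

def anneal(n_ports, noise_fn, v_bar, t_start=20.0, t_end=0.001,
           cooling=0.95, moves_per_temp=100, seed=0):
    """Simulated annealing over capacitor types per port (0 means empty)."""
    rng = random.Random(seed)
    state = [0] * n_ports

    def score(s):
        return objective(noise_fn(s), [PRICES[c] for c in s if c], v_bar)

    current = score(state)
    best, best_score = list(state), current
    temp = t_start
    while temp > t_end:
        for _ in range(moves_per_temp):
            port = rng.randrange(n_ports)
            old = state[port]
            # Move (1): add a random type at an empty port; move (2): remove.
            state[port] = rng.choice(list(PRICES)) if old == 0 else 0
            new = score(state)
            # Metropolis rule: always accept improvements, sometimes accept worse.
            if new <= current or rng.random() < math.exp((current - new) / temp):
                current = new
                if current < best_score:
                    best, best_score = list(state), current
            else:
                state[port] = old  # reject the move and restore the port
        temp *= cooling
    return best, best_score

def toy_noise(state):
    """Made-up noise model: one observed port, noise falls as capacitors are added."""
    strength = {1: 1.0, 2: 1.5, 3: 2.0, 4: 3.0}
    return [2.5 / (1.0 + sum(strength[c] for c in state if c))]

V_BAR = 0.35
best, best_score = anneal(n_ports=12, noise_fn=toy_noise, v_bar=V_BAR)
```

In the actual flow, `noise_fn` would be replaced by the incremental impedance update followed by the FFT-based worst-case noise computation, so that each move costs $O(n_p^2)$ rather than a full re-evaluation.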

#### 5.3 Results

#### 5.3.1 Case 1





Our model and algorithm can be applied to any package configuration with any number of layers. In this case, we assume $1 \text{cm} \times 2 \text{cm}$ rectangular planes with a power plane and a ground plane, as shown in Fig. 10. I/O cells are located at one edge of the structure. We assume that there are 30 I/O cells, each of which draws the current shown in Fig. 8. Since cells close to each other have similar impedance and are strongly coupled, we partition the 30 I/O cells into 3 groups and define 3 I/O ports. Each cell is connected to the closest I/O port, and each port is connected to 10 I/O cells. Note that for higher accuracy, more ports can be defined if necessary. We allow the decoupling capacitors to be distributed across the plane, and therefore define 90 uniformly distributed ports on the package. In total, there are 93 ports in our macromodel.

Our noise based algorithm found a valid solution where all the ports meet the noise constraint. The worst case noise of each port is listed in Table 2. The total cost of the decoupling capacitors is 20. In Fig. 11, we show the distribution of the decoupling capacitors in a uniform grid. In this figure, the numbers stand for the type of decoupling capacitor, and '0' means no decoupling capacitor. We can see that the decoupling capacitors are concentrated along the I/O rings and located in two rings around the chip, which shows that in this simple structure the decoupling capacitors should be placed as close as possible to the I/O cells to minimize the noise at the I/O cells.

Table 2: Worst-case noise at ports

| port | 1 | 2 | 3 |
|---------------------|--------|--------|--------|
| before optimization | 2.52V | 2.49V | 2.48V |
| after optimization | 0.344V | 0.343V | 0.344V |

| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 3 |
| 1 | 0 | 0 | 1 | 0 | 4 | 0 | 2 | 3 | 0 | 1 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Chip |   |   |   |   |   |   |   |   |   |   |

Figure 11: Optimal distribution of decoupling capacitors from noise driven approach

We further compare our results with an impedance based approach. In this approach, for the objective function we substitute the noise with the maximum impedance and replace the noise bound with the target impedance. Because we require the noise to be less than 0.35V and the total peak current of the 10 I/Os connected to one port is 500mA, the target impedance for each port is calculated to be $0.7\Omega$. The distribution of the decoupling capacitors in the best solution given by the impedance approach is shown in Fig. 12. We can see that the decoupling capacitors still concentrate around the chip, but they spread more across the planes than in the noise driven approach. The total cost is 72, which is more than $3 \times$ larger than the result of the noise driven approach. In Table 3, we show the maximum impedance and the noise at each port. We can see that the target impedance cannot be reached, yet the noise is already well below the noise bound. This shows that using impedance as a noise metric leads to large overdesign.
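Spelled out with (12), and taking the tolerable voltage variation $\delta V_{dd}$ to be the 0.35V noise bound and the 50mA per-cell amplitude from Fig. 8, the target impedance arithmetic is as follows (the symbol $I_{\text{peak}}$ is introduced here only for illustration):

```latex
I_{\text{peak}} = 10 \times 50\,\text{mA} = 500\,\text{mA},
\qquad
Z_t = \frac{\delta V_{dd}}{I_{\text{peak}}}
    = \frac{0.35\,\text{V}}{0.5\,\text{A}}
    = 0.7\,\Omega
```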

| 0 | 0 | 0    | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|---|---|------|---|---|---|---|---|---|---|---|
| 0 | 0 | 0    | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0    | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 |
| 0 | 0 | 0    | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 |
| 1 | 0 | 0    | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0    | 0 | 0 | 0 | 0 | 1 | 0 | 4 | 1 |
| 0 | 0 | 0    | 0 | 0 | 2 | 0 | 0 | 2 | 3 | 4 |
| 0 | 0 | 2    | 4 | 1 | 2 | 0 | 4 | 2 | 2 | 1 |
| 2 | 4 | 3    | 3 | 1 | 1 | 0 | 1 | 4 | 1 | 4 |
| 0 | 0 | 0    | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|   |   |      |   |   |   |   |   |   |   |   |
|   |   | Chip |   |   |   |   |   |   |   |   |
|   |   |      |   |   |   |   |   |   |   |   |
|   |   |      |   |   |   |   |   |   |   |   |

Figure 12: Optimal distribution of decoupling capacitors from impedance driven approach

#### 5.3.2 Case 2

Table 3: Impedance and noise at ports

| port              | 1            | 2            | 3            | bound       |
|-------------------|--------------|--------------|--------------|-------------|
| maximum impedance | $5.31\Omega$ | $5.59\Omega$ | $7.12\Omega$ | $0.7\Omega$ |
| worst-case noise  | 0.256V       | 0.302V       | 0.284V       | 0.35V       |

In case 2, we assume a domain on one side of the chip, as shown in Fig. 13. The package is assumed to have four layers of power or ground planes. All the planes are stapled together with uniformly distributed vias, and the bottom power/ground plane is grounded at several locations in the plane. We defined 70 ports for the decoupling capacitors and 3 ports for noise optimization. The capacitor distribution of the best solution is shown in Fig. 14. We can see that the capacitors are distributed around the chip and also across the planes. This is because the vias and grounding connections change the electrical environment at different locations, so the best location for a decoupling capacitor may not simply be the one closest to the chip.

Figure 13: One package domain

| 0 | 0    | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|---|------|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0    | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0    | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 3 | 0 |
| 0 | 0    | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 0 | 0    | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0    | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 0 |
| 0 | 2    | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0    | 0 | 0 | 0 | 0 | 4 | 0 | 3 | 0 | 0 | 1 | 0 |
| 0 | 0    | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0    | 0 | 2 | 2 | 0 | 1 | 2 | 1 | 0 | 2 | 2 | 1 |
| 0 | 0    | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0    | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|   |      | 1 |   |   |   |   |   |   |   |   |   |   |
|   | Chip |   |   |   |   |   |   |   |   |   |   |   |
|   |      |   |   |   |   |   |   |   |   |   |   |   |
|   |      |   |   |   |   |   |   |   |   |   |   |   |

Figure 14: Optimal distribution of decoupling capacitors from noise driven approach

#### 5.4 Runtime

We implemented the algorithms in Matlab and conducted the experiments on a 2.8 GHz Xeon system. For comparison, we also implemented the method of (8). Table 5<sup>1</sup> shows the runtime of case 1 for the methods listed in Table 4. Method 1 is the proposed method, which uses the proposed incremental impedance computation and FFT for noise computation. Method 2 uses the impedance computation method from [12] and FFT for noise computation. Method 3 is from [1]. Comparing methods 1 and 2, we can see that the incremental impedance computation is $11 \times$ faster than the matrix inversion based approach. Comparing methods 1 and 3, our method is significantly faster than method 3, even considering the speed difference between the computing platforms and the larger number of ports in our experiment. As we pointed out earlier, once the macromodel is obtained, the runtime depends only on the number of ports. For packages, the number of I/O ports and candidate locations for decoupling capacitors is often fewer than a few hundred. The results show that the models and algorithm can handle such a large number of ports and can be readily used to optimize decoupling capacitors in real designs.

Table 4: Approaches

| Approach | Description                             |
|----------|-----------------------------------------|
| 1        | incremental impedance + noise objective |
| 2        | matrix inversion [12] + noise objective |
| 3        | ref. [1]                                |

Table 5: Runtime

| approach                         | 1      | 2      | 3      |
|----------------------------------|--------|--------|--------|
| ports                            | 93     | 93     | 20     |
| iterations                       | 5881   | 5403   | 1920   |
| run time (s)                     | 389.5  | 4156.1 | 2916.0 |
| avg. run time per iteration (s)  | 0.0662 | 0.7692 | 1.519  |
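The averages in Table 5 and the speedup of method 1 over method 2 can be recomputed from the total runtimes and iteration counts. The following is a minimal sketch in Python; the variable and function names are illustrative and are not taken from the original Matlab implementation. Method 3 is left out of the speedup because its runtime was measured on a different platform.

```python
# Totals copied from Table 5: approach -> (iterations, total runtime in seconds).
runs = {1: (5881, 389.5), 2: (5403, 4156.1), 3: (1920, 2916.0)}


def avg_runtime(approach):
    """Average runtime per iteration, in seconds, for one approach."""
    iterations, total_seconds = runs[approach]
    return total_seconds / iterations


per_iter = {approach: avg_runtime(approach) for approach in runs}

# Methods 1 and 2 ran on the same machine, so their ratio is meaningful.
speedup_per_iter = per_iter[2] / per_iter[1]
speedup_total = runs[2][1] / runs[1][1]

for approach, seconds in per_iter.items():
    print(f"approach {approach}: {seconds:.4f} s per iteration")
print(f"method 1 vs. method 2: {speedup_per_iter:.1f}x per iteration, "
      f"{speedup_total:.1f}x in total runtime")
```

Both ratios land between 10 and 12, which is consistent with the roughly $11 \times$ speedup quoted in the text.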

## 6. CONCLUSION

In this paper, we studied the optimization of decoupling capacitors for package power integrity. Unlike the traditional frequency domain impedance based approaches, we directly used time domain noise as the metric to guide the optimization. To do this, we developed an efficient worst-case noise model. We first developed an efficient method to compute the port impedance incrementally as the decoupling capacitor configuration changes. The complexity of this method is only $O(n^2)$, compared to the $O(n^3)$ complexity of previous work. Based on the impedance, we then computed the noise with FFT. We further developed a simulated annealing algorithm to minimize the cost of the decoupling capacitors under the constraint of worst-case noise. Experiments showed that our algorithm is efficient with a large number of ports: compared to previous work, we gained more than $10 \times$ speedup. We also showed that the impedance based approach leads to large overdesign. The cost of the solution from our noise based approach is $3 \times$ smaller than that of the solution from the impedance based approach.
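One way to see where an $O(n^2)$ versus $O(n^3)$ gap can come from is the Sherman-Morrison identity. This is offered as an assumption for illustration and may differ from the exact formulation used in this paper. Attaching a shunt admittance $y$ at port $k$ adds a rank-one term to the port admittance matrix, so the impedance matrix becomes $Z' = Z - \frac{y}{1 + y Z_{kk}} Z_{:,k} Z_{k,:}$ without any matrix inversion. The sketch below compares this update against the invert, modify, invert route on a hypothetical 3-port matrix; the port values, capacitor parameters, and function names are all made up for the example.

```python
import math


def capacitor_admittance(freq_hz, c, esr, esl):
    """Admittance of a series R-L-C capacitor model at one frequency."""
    w = 2.0 * math.pi * freq_hz
    return 1.0 / complex(esr, w * esl - 1.0 / (w * c))


def add_shunt_incremental(Z, k, y):
    """Impedance matrix after adding shunt admittance y at port k.

    Sherman-Morrison rank-one update: O(n^2) operations, no inversion.
    """
    n = len(Z)
    scale = y / (1.0 + y * Z[k][k])
    return [[Z[i][j] - scale * Z[i][k] * Z[k][j] for j in range(n)]
            for i in range(n)]


def invert(M):
    """Gauss-Jordan inverse with partial pivoting, O(n^3) operations."""
    n = len(M)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def add_shunt_by_inversion(Z, k, y):
    """Reference route: invert Z, add y to Y[k][k], invert back."""
    Y = invert(Z)
    Y[k][k] += y
    return invert(Y)


# Hypothetical 3-port impedance matrix (ohms) at 100 MHz, symmetric by reciprocity.
Z = [[0.50 + 0.20j, 0.10 + 0.05j, 0.08 + 0.02j],
     [0.10 + 0.05j, 0.40 + 0.30j, 0.12 + 0.04j],
     [0.08 + 0.02j, 0.12 + 0.04j, 0.60 + 0.10j]]
# Hypothetical 10 nF capacitor with 50 mOhm ESR and 0.5 nH ESL, placed at port 1.
y = capacitor_admittance(100e6, c=10e-9, esr=0.05, esl=0.5e-9)

Z_fast = add_shunt_incremental(Z, 1, y)
Z_ref = add_shunt_by_inversion(Z, 1, y)
max_diff = max(abs(Z_fast[i][j] - Z_ref[i][j])
               for i in range(3) for j in range(3))
print(f"max |Z_fast - Z_ref| = {max_diff:.2e}")
```

In an optimization loop that changes one capacitor per move, an update of this kind would be applied at every frequency point, so the cost per move grows quadratically rather than cubically with the number of ports.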

<sup>1</sup>The runtime of method 3 in Table 5 is taken from [1]. The computing platform was a 1 GHz Pentium III, and the implementation language is unknown.

## 7. REFERENCES

- [1] H. Zheng, B. Krauter, and L. Pileggi, "On-package decoupling optimization with package macromodels," in *Custom Integrated Circuits Conference*, 2003.
- [2] H. H. Chen and S. E. Schuster, "On-chip decoupling capacitor optimization for high-performance VLSI design," in *International Symposium on VLSI Technology, Systems, and Applications*, 1995.
- [3] H. H. Chen, J. S. Neely, M. F. Wang, and G. Co, "On-chip decoupling capacitor optimization for noise and leakage reduction," in *IEEE International Symposium on Integrated Circuits and Systems Design*, 2001.
- [4] M. D. Pant, P. Pant, and D. S. Wills, "On-chip decoupling capacitor optimization using architectural level prediction," *IEEE Transactions on Very Large Scale Integration Systems*, vol. 10, pp. 319–326, 2002.
- [5] S. Zhao, K. Roy, and C.-K. Koh, "Power supply noise aware floorplanning and decoupling capacitance placement," in *Proc. Asia South Pacific Design Automation Conf.*, 2002.

- [6] H. Su, S. S. Sapatnekar, and S. R. Nassif, "Optimal decoupling capacitor sizing and placement for standard-cell layout designs," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 22, pp. 428–436, 2003.
- [7] J. Fu, Z. Luo, X. Hong, Y. Cai, S. X.-D. Tan, and Z. Pan, "A fast decoupling capacitor budgeting algorithm for robust on-chip power delivery," in *Proc. Asia South Pacific Design Automation Conf.*, 2004.
- [8] Y. Chen, Z. Chen, and J. Fang, "Optimum placement of decoupling capacitors on packages and printed circuit boards under the guidance of electromagnetic field simulation," in *Electronic Components and Technology Conference*, 1996.
- [9] X. Yang, Q. Chen, and C. Chen, "The optimal value selection of decoupling capacitors based on FDFD combined with optimization," in *IEEE Topical Meeting on Electrical Performance of Electronic Packaging*, 2002.
- [10] A. Kamo, T. Watanabe, and H. Asai, "An optimization method for placement of decoupling capacitors on printed circuit board," in *IEEE Topical Meeting on Electrical Performance of Electronic Packaging*, 2000.
- [11] I. Hattori, A. Kamo, T. Watanabe, and H. Asai, "A searching method for optimal locations of decoupling capacitors based on electromagnetic field analysis by FDTD method," in *IEEE Topical Meeting on Electrical Performance of Electronic Packaging*, 2002.
- [12] J. Zhao and O. P. Mandhana, "A fast evaluation of power delivery system input impedance of printed circuit boards with decoupling capacitors," in *IEEE Topical Meeting on Electrical Performance of Electronic Packaging*, 2004.
- [13] http://www-device.EECS.Berkeley.EDU/~ptm/.
- [14] http://www.eda.org/pub/ibis/.
- [15] L. Smith, R. Anderson, D. Forehand, T. Pelc, and T. Roy, "Power distribution system design methodology and capacitor selection for modern CMOS technology," *IEEE Transactions on Advanced Packaging*, vol. 22, pp. 284–291, 1999.