# Weekly Report for Yu Hu's work from Aug. 29 to Sep. 4

September 4, 2005

## 1 Prove the line case of level converter free formulation

#### 1.1 Model and formula

The line case (see Figure 1) in King Ho's DAC'05 paper is used in our proof. The idea of the proof is that whenever the first level converter is used (a low $V_{dd}$ buffer drives a high $V_{dd}$ buffer), we can always swap the high $V_{dd}$ buffer and the low $V_{dd}$ buffer while preserving the optimal power under the given timing constraint.



Figure 1: Comparison cases for the proof

In Figure 1, we assume $l_1 + l_2 = p_1 + p_2 = L$. For a given technology node, we can assume that the parameters of the min-size buffer (input capacitance $C_{in}^0$, intrinsic delay $D_{int}^0$ and output resistance $R_d^0$) are given. We denote the sizes of the low $V_{dd}$ buffer, the high $V_{dd}$ buffer and the level converter in case (A) by x, y and z, and the sizes of the two buffers in case (B) by u and w. Based on these parameters, we calculate the Elmore delay of cases (A) and (B) as follows.

$$\begin{aligned}
Delay_{A} = {} & R_{in}C_{L} + D_{L} + R_{L}(c_{w}l_{1} + C_{LC}) + r_{w}l_{1}(c_{w}l_{1}/2 + C_{LC}) \\
& + D_{LC} + R_{LC}C_{H} + D_{H} + R_{H}(c_{w}l_{2} + C_{load}) + r_{w}l_{2}(c_{w}l_{2}/2 + C_{load})
\end{aligned} \tag{1}$$

$$\begin{aligned}
Delay_{B} = {} & R_{in}C'_{H} + D'_{H} + R'_{H}(c_{w}p_{1} + C'_{L}) + r_{w}p_{1}(c_{w}p_{1}/2 + C'_{L}) \\
& + D'_{L} + R'_{L}(c_{w}p_{2} + C_{load}) + r_{w}p_{2}(c_{w}p_{2}/2 + C_{load})
\end{aligned} \tag{2}$$

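where $r_w$ and $c_w$ are the wire resistance and capacitance per unit length, $R_{in}$ is the upstream resistance and $C_{load}$ is the downstream load capacitance. $R$, $C$ and $D$ with subscripts $L$, $H$ and $LC$ denote the output resistance, input capacitance and intrinsic delay of the low $V_{dd}$ buffer, the high $V_{dd}$ buffer and the level converter, and primed symbols refer to case (B).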
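As a cross-check of the two delay expressions, the following is a minimal Python sketch that evaluates them stage by stage. It assumes every stage has the usual Elmore form (driver resistance times driven capacitance, plus the distributed wire term). The names `Stage`, `scaled_stage`, `wire_delay`, `delay_a` and `delay_b` are hypothetical, and the scaling rule in `scaled_stage` (resistance divided by size, capacitance multiplied by size, intrinsic delay unchanged) is a common modelling assumption rather than something stated in this report.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One gate: output resistance R, input capacitance C, intrinsic delay D."""
    R: float
    C: float
    D: float


def scaled_stage(size, R0, C0, D0):
    """Assumed sizing model: R0/size, size*C0, size-independent delay D0."""
    return Stage(R=R0 / size, C=size * C0, D=D0)


def wire_delay(r_w, c_w, length, c_next):
    """Elmore delay of a distributed RC wire of `length` driving c_next."""
    return r_w * length * (c_w * length / 2 + c_next)


def delay_a(R_in, low, lc, high, l1, l2, r_w, c_w, C_load):
    """Case (A): low-Vdd buffer, wire l1, level converter, high-Vdd buffer, wire l2."""
    return (R_in * low.C + low.D + low.R * (c_w * l1 + lc.C)
            + wire_delay(r_w, c_w, l1, lc.C)
            + lc.D + lc.R * high.C
            + high.D + high.R * (c_w * l2 + C_load)
            + wire_delay(r_w, c_w, l2, C_load))


def delay_b(R_in, high, low, p1, p2, r_w, c_w, C_load):
    """Case (B): high-Vdd buffer, wire p1, low-Vdd buffer, wire p2."""
    return (R_in * high.C + high.D + high.R * (c_w * p1 + low.C)
            + wire_delay(r_w, c_w, p1, low.C)
            + low.D + low.R * (c_w * p2 + C_load)
            + wire_delay(r_w, c_w, p2, C_load))
```

With identical unit stages and unit wire segments, the two functions differ only by the level converter contribution, which gives a quick sanity check of the bookkeeping.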

The power dissipation of cases (A) and (B) can be calculated as follows.

$$Power_{A} = x \cdot E_{L} + y \cdot E_{L} + z \cdot E_{LC} + 0.5 \cdot c_{w} \cdot l_{2} \left(V_{dd}^{H}\right)^{2} + 0.5 \cdot c_{w} \cdot l_{1} \left(V_{dd}^{L}\right)^{2} \tag{3}$$

$$Power_B = u \cdot E_L + w \cdot E_L + 0.5 \cdot c_w \cdot p_1 \left(V_{dd}^{H}\right)^{2} + 0.5 \cdot c_w \cdot p_2 \left(V_{dd}^{L}\right)^{2} \tag{4}$$
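The two power expressions can be evaluated with a short Python sketch like the one below. The function names are hypothetical. The per-unit-size energies are passed in as separate arguments (`E_x`, `E_y`, `E_u`, `E_w`), which is an assumption made here so the caller can decide which energy applies to each buffer; setting them all to the same value reproduces Eq. 3 and Eq. 4 as written.

```python
def wire_energy(c_w, length, vdd):
    """Switching energy term 0.5 * c_w * length * Vdd^2 of one wire segment."""
    return 0.5 * c_w * length * vdd ** 2


def power_a(x, y, z, l1, l2, E_x, E_y, E_LC, c_w, V_H, V_L):
    """Eq. 3: wire l1 sits in the low-Vdd domain, wire l2 in the high-Vdd domain."""
    return (x * E_x + y * E_y + z * E_LC
            + wire_energy(c_w, l2, V_H) + wire_energy(c_w, l1, V_L))


def power_b(u, w, p1, p2, E_u, E_w, c_w, V_H, V_L):
    """Eq. 4: wire p1 sits in the high-Vdd domain, wire p2 in the low-Vdd domain."""
    return (u * E_u + w * E_w
            + wire_energy(c_w, p1, V_H) + wire_energy(c_w, p2, V_L))
```

All numeric inputs are left to the caller; no technology values are implied by this sketch.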

The power dissipation of cases (A) and (B) may be sensitive to the min-size buffer settings, the buffer sizes, the $V_{dd}$ levels, the timing constraint, and so on. In the following I show how these parameters affect the power difference between (A) and (B). A typical global wire length is used in the analysis, since the results scale with the wire length.

#### 1.2 Effect of timing constraints

First, I'll show the effect of the timing constraint on power dissipation. Under typical min-size buffer and $V_{dd}$ settings for the 65 nm technology node, Figure 2 shows the power of (A) and (B) for timing constraints ranging from the optimal delay to 110% of the optimal delay. $Power_A$ is much larger than $Power_B$ when the timing constraint is tight, and it approaches $Power_B$ as the constraint is loosened. Ultimately the difference between $Power_A$ and $Power_B$ becomes a constant, because under a sufficiently loose timing constraint the power is determined entirely by the $V_{dd}$ levels and the min-size buffer. Most importantly, $Power_A$ is always larger than $Power_B$ under the same timing constraint. To rule out a dependence of this result on the $V_{dd}$ levels, I tested two groups of $V_{dd}$ settings; Figure 2 shows that the results are consistent for both.



Figure 2: Slack vs. Power

#### 1.3 Effect of upstream resistance and downstream capacitance

Next, I'll show the effect of the upstream resistance $R_{in}$ and the downstream load capacitance $C_{load}$ on the power dissipation of (A) and (B). $R_{in}$ and $C_{load}$ are swept from 1x to 1000x the resistance and capacitance of the min-size buffer, respectively. Three timing constraints are used: the optimal delay, 10% slack and 50% slack. Figure 3 shows the results. $Power_A$ is always larger than $Power_B$ when $C_{load}$ is less than 700x the capacitance of the min-size buffer. This is a reasonable assumption in our design, since downstream buffering bounds the load capacitance well.
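The sweep described above can be set up as in the following Python sketch. The number of sample points and the logarithmic spacing are assumptions, since the report only gives the 1x to 1000x range. Reading "10% slack" as a timing budget of 1.1 times the optimal delay is also an assumption, chosen to match the "110% optimal delay" wording of Section 1.2. The names `log_sweep` and `timing_constraint` are hypothetical.

```python
import itertools


def log_sweep(lo, hi, points):
    """Return `points` logarithmically spaced values from lo to hi inclusive."""
    ratio = (hi / lo) ** (1.0 / (points - 1))
    return [lo * ratio ** i for i in range(points)]


def timing_constraint(optimal_delay, slack):
    """Timing budget for a fractional slack, e.g. 0.10 for 10% slack."""
    return optimal_delay * (1.0 + slack)


# Multipliers applied to the min-size buffer resistance and capacitance.
R_MULT = log_sweep(1.0, 1000.0, 4)
C_MULT = log_sweep(1.0, 1000.0, 4)
SLACKS = [0.0, 0.10, 0.50]  # optimal delay, 10% slack, 50% slack

# Every (R_in multiplier, C_load multiplier, slack) combination to evaluate.
GRID = list(itertools.product(R_MULT, C_MULT, SLACKS))
```

Each grid entry would then be passed to whatever sizing routine produces the minimum power for cases (A) and (B); that routine is not part of this sketch.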

Figure 3:  $R_{in}$  and  $C_{load}$  vs. Power

#### 1.4 Effect of min-size buffer settings

Finally, I'll show the effect of the output resistance $R_d^0$ and the intrinsic delay $D_{int}^0$ of the min-size buffer (in practice, the input capacitance of the min-size buffer can be treated as a constant). $R_d^0$ is swept from 50 $\Omega$ to 10000 $\Omega$, and $D_{int}^0$ from 10 ps to 1000 ps. The optimal delay, 10% slack and 50% slack timing constraints are tested. The results are shown in Figure 4, whose z axis is $\frac{Power_A - Power_B}{Power_B}$ and whose x and y axes are $R_d^0$ and $D_{int}^0$, respectively. $Power_A$ is larger than $Power_B$ over this entire range.



Figure 4:  $R_d^0$  and  $D_{int}^0$  vs. Power

#### 1.5 A sufficient condition for the level converter free formulation

**Theorem** Given an x-size low $V_{dd}$ buffer and a y-size high $V_{dd}$ buffer in case (A), suppose we can always find segment lengths $p_1$ and $p_2$ that satisfy the timing constraint when these two buffers are simply swapped (without changing the buffer sizes). Then $Power_A > Power_B$ if $\beta \ge \frac{\alpha y+x}{2}$, where $\beta$ is the ratio of the size of the considered wire $L$ to that of the min-size buffer.

**Proof** We can express all resistances and capacitances in terms of those of the min-size buffer. Without loss of generality, we ignore the constant coefficients, so the resistance and capacitance of the wire $L$ can be written as $\beta R_d^0$ and $\beta C_{in}^0$. The level converter is simply treated as $\alpha \times$ the high $V_{dd}$ buffer. On the other hand, since power is minimized when the timing constraints are tight, the constraints $Delay_A \leq T$ and $Delay_B \leq T$ can be replaced by the equalities $Delay_A - T = 0$ and $Delay_B - T = 0$. We substitute $\beta R_d^0$, $\beta C_{in}^0$ and the buffer sizes into Eq. 1 and Eq. 2, divide both sides by $C_{in}^0 \cdot R_d^0$, ignore all constant coefficients, and rearrange to get the following equations.

$$\beta_1^2 + \beta_1 \cdot (1/x + \alpha y - \beta - 1) + (1 + x + \alpha y/x + 1/\alpha + (\beta + 1)/y + \beta^2/2 + \beta - T) = 0$$
(5)

$$\gamma_2^2 + \gamma_2 \cdot (1/x - \beta + 1 - 1/y - x) + ((\beta + x)/y + \beta x + y + 1/x + \beta^2/2 - T) = 0$$
(6)

 $\Rightarrow$ 

$$\beta_{1} = \frac{-(1/x + \alpha y - \beta - 1) + \sqrt{(1/x + \alpha y - \beta - 1)^{2} - 4(1 + x + \alpha y/x + 1/\alpha + (\beta + 1)/y + \beta^{2}/2 + \beta - T)}}{2}$$

$$= \frac{B + \sqrt{B^{2} - 4C}}{2}$$
(7)

$$\gamma_{2} = \frac{-(1/x - \beta + 1 - 1/y - x) + \sqrt{(1/x - \beta + 1 - 1/y - x)^{2} - 4((\beta + x)/y + \beta x + y + 1/x + \beta^{2}/2 - T)}}{2}$$

$$= \frac{B' + \sqrt{B'^{2} - 4C'}}{2}$$
(8)

where $\beta_1$ and $\gamma_2$ are defined similarly to $\beta$ (for the wire segments $l_1$ and $p_2$, respectively) and $T$ is the timing constraint normalized by a constant. A sufficient condition for $\beta_1 < \gamma_2$ is that the following two inequalities hold.

$$B - B' = -(1/x + \alpha y - \beta - 1) - (-(1/x - \beta + 1 - 1/y - x)) = 2 - 1/y - x - \alpha y < 0$$
(9)

$$\begin{aligned}
(B^{2} - 4C) - (B'^{2} - 4C') = {} & \alpha^{2}y^{2} + 2\alpha y/x - 2\alpha y\beta - 2\alpha y + 2\beta - 4(1 + x) - 4\alpha y/x - 4/\alpha \\
& - 4(\beta + 1)/y - x^{2} - 1/y^{2} + 2/(xy) + 2 + 2\beta + 2(\beta + 1)/y + 2(\beta + 1)x + 2x/y + 4y \\
\approx {} & \alpha^{2}y^{2} - 2\alpha y\beta + 2\beta - x^{2} + 2\beta + 2(\beta + 1)x \\
= {} & \alpha^{2}y^{2} - x^{2} - (2\alpha y - 4 - 2x)\beta < 0 \\
\Rightarrow \beta > {} & \frac{\alpha^{2}y^{2} - x^{2}}{2\alpha y - 2x - 4} > \frac{(\alpha y - x)(\alpha y + x)}{2(\alpha y - x)} = \frac{\alpha y + x}{2}
\end{aligned} \tag{10}$$
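For a concrete parameter set, the two sufficient conditions can also be checked numerically without any approximation. The Python sketch below takes the coefficients directly from Eq. 5 and Eq. 6, computes the roots of Eq. 7 and Eq. 8, and evaluates the left-hand sides of Eq. 9 and Eq. 10 exactly. All function names are hypothetical, and all quantities are the dimensionless normalized values used in this proof.

```python
import math


def case_a_coeffs(x, y, alpha, beta, T):
    """Linear and constant coefficients of the quadratic in beta_1 (Eq. 5)."""
    b = 1 / x + alpha * y - beta - 1
    c = (1 + x + alpha * y / x + 1 / alpha + (beta + 1) / y
         + beta ** 2 / 2 + beta - T)
    return b, c


def case_b_coeffs(x, y, beta, T):
    """Linear and constant coefficients of the quadratic in gamma_2 (Eq. 6)."""
    b = 1 / x - beta + 1 - 1 / y - x
    c = (beta + x) / y + beta * x + y + 1 / x + beta ** 2 / 2 - T
    return b, c


def larger_root(b, c):
    """Larger root of t^2 + b*t + c = 0, as selected in Eq. 7 and Eq. 8."""
    disc = b * b - 4 * c
    if disc < 0:
        raise ValueError("no real root for this normalized timing constraint")
    return (-b + math.sqrt(disc)) / 2


def sufficient_conditions_hold(x, y, alpha, beta):
    """Evaluate Eq. 9 and the exact left-hand side of Eq. 10.

    T appears as +4T in both discriminants, so it cancels in their
    difference and any value (here 0) can be used.
    """
    b_a, c_a = case_a_coeffs(x, y, alpha, beta, 0.0)
    b_b, c_b = case_b_coeffs(x, y, beta, 0.0)
    eq9 = (-b_a) - (-b_b) < 0            # B - B' < 0
    eq10 = (b_a ** 2 - 4 * c_a) - (b_b ** 2 - 4 * c_b) < 0
    return eq9 and eq10
```

For example, with the illustrative values `x = 2`, `y = 2`, `alpha = 1`, `beta = 10` and `T = 100`, calling `larger_root(*case_a_coeffs(2, 2, 1, 10, 100))` and `larger_root(*case_b_coeffs(2, 2, 10, 100))` gives the two roots to compare, and `sufficient_conditions_hold(2, 2, 1, 10)` reports whether both inequalities hold for that point.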

Based on Eq. 3 and Eq. 4, and again ignoring all constant coefficients, the power of (A) and (B) can be written as follows.

$$pwr_A = x + (1+\alpha)y + V_H\beta_2 + V_L\beta_1 \tag{11}$$

$$pwr_B = x + y + V_H \gamma_1 + V_L \gamma_2 \tag{12}$$

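Subtracting Eq. 12 from Eq. 11 gives

$$pwr_A - pwr_B = \alpha y + V_H(\beta_2 - \gamma_1) + V_L(\beta_1 - \gamma_2) \tag{13}$$

Since $l_1 + l_2 = p_1 + p_2 = L$ implies $\beta_1 + \beta_2 = \gamma_1 + \gamma_2 = \beta$, we have $\beta_2 - \gamma_1 = -(\beta_1 - \gamma_2)$, so

$$pwr_A - pwr_B = \alpha y + (V_L - V_H)(\beta_1 - \gamma_2) \tag{14}$$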

Since $V_L < V_H$ and $\alpha y > 0$, if $\beta_1 < \gamma_2$ then both terms on the right-hand side of Eq. 14 are positive, and hence $pwr_A > pwr_B$.
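The last step can be sanity-checked numerically with the small Python sketch below. It evaluates the normalized power expressions of Eq. 11 and Eq. 12 and compares their difference with the closed form of Eq. 14, using $\beta_2 = \beta - \beta_1$ and $\gamma_1 = \beta - \gamma_2$. The function names and the numeric values in the usage lines are hypothetical and chosen only for illustration.

```python
def pwr_a(x, y, alpha, beta1, beta2, V_H, V_L):
    """Normalized power of case (A), Eq. 11."""
    return x + (1 + alpha) * y + V_H * beta2 + V_L * beta1


def pwr_b(x, y, gamma1, gamma2, V_H, V_L):
    """Normalized power of case (B), Eq. 12."""
    return x + y + V_H * gamma1 + V_L * gamma2


def pwr_gap(alpha, y, beta1, gamma2, V_H, V_L):
    """Closed-form difference pwr_A - pwr_B, Eq. 14."""
    return alpha * y + (V_L - V_H) * (beta1 - gamma2)


# Illustrative values only: total normalized wire size beta = 10.
x, y, alpha, beta = 2.0, 2.0, 1.0, 10.0
beta1, gamma2 = 3.0, 5.0              # low-Vdd segments, with beta1 < gamma2
V_H, V_L = 1.2, 0.9
direct = (pwr_a(x, y, alpha, beta1, beta - beta1, V_H, V_L)
          - pwr_b(x, y, beta - gamma2, gamma2, V_H, V_L))
closed = pwr_gap(alpha, y, beta1, gamma2, V_H, V_L)
```

Here `direct` and `closed` agree up to floating-point rounding, and both are positive because `beta1 < gamma2` and `V_L < V_H`.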