## A min-cost flow approach for FPGA Vdd assignment for power minization with retiming and budgeting

October 29, 2005

## 1 Timing constraints considering retiming

The timing graph of the whole circuit is represented by a edge-weighted DAG. The nodes are the primary inputs, primary outputs, and input/out of the logic gates (i.e. LUTs in our study). Note the difference of this timing graph from the one we used in combinational budgeting (ispd06 FGPA submission) is that FFs are not included in the timing graph. Instead, we associate a weight, FF(e), of an edge e, which denotes the number of the FF inserted in it. We can get the initial weight of each edge by the number of FF in each edge after placement and routing.

In the timing graph, there are two cases for FF insertion, with and without feedback loop in the circuits, which are shown in Figure 1, where we suppose there is only one basic logic block in a cluster for simplicity. The two red edges are initially assigned positive weight, i.e. FFs are inserted in these edges.



Figure 1: Two cases for FF insertion in VPR

The retiming constraints and timing constraints considering both edge delay and FF delay  $D_{ff}$  can be expressed as [Yeh:ICCAD'03]

$$x_u - x_v \le FF(e) \cdot (P - D_{ff}) - d_{uv} \quad \forall e(u, v) \in E$$

$$x_k \le P \qquad k \in PO$$

$$x_k = 0 \qquad k \in PI$$
(1)

where  $d_{uv}$  is the lower bound of the delay of edge (u, v), we use this lower bound to control the budget and also satisfy the retiming constraints. This lower bound can be calculated by running combinational budgeting.

Now we derive the denotation of the slack assigned to sink j of routing tree  $\mathcal{R}_i$ . In timing graph, a routing tree  $\mathcal{R}_i$  with k sinks is combined with the output of the upstream LUT of the source. Suppose the corresponding node of the output of this LUT in timing graph is  $x_l$ , then the routing tree  $\mathcal{R}_i$  is denoted as k edges in timing graph, each edge is from this node  $x_l$  to sink  $x_j$ . We define these edges as *TreeEdge*. The set of all Tree Edges is *TE*. Hence, we need to assign a slack  $S_{ij}$  on each of such edges corresponding to each sink j in routing tree  $\mathcal{R}_i$ , respectively.  $S_{ij}$  can be expressed as

$$S_{ij} = FF(e(x_l, x_j)) \cdot (P - D_{ff}) + x_j - x_l - d_{lj}$$
<sup>(2)</sup>

where  $FF(e(x_l, x_i))$  is equal to 0 if the FF associated with LUT  $x_l$  is not used. Otherwise, it is equal to 1.

Besides timing constraints (1), the slack upper bound constraint,  $S_{ij} \leq D_{ij}$ , can be written as

$$x_v - x_u \le D_{ij} - FF(e) \cdot (P - D_{ff}) + d_{uv} \qquad \forall e(u, v) \in TE$$
(3)

We can calculate the number of FFs in edge e(u, v),  $FF_{new}$ , for the retimed circuit as follows

$$FF_{new}(e) = FF(e) + r(v) - r(u)$$

$$\tag{4}$$

$$r(v) = \begin{cases} 0 & v \in PI/PO\\ \left\lceil \frac{x_v}{P} \right\rceil - 1 & otherwise \end{cases}$$
(5)

Following the FPGA structure used in VPR, the input of a FF is always from the output of a LUT, therefore, there is no more than one FF that can be inserted in an edge in the timing graph. In the other word,  $FF_{new}(e)$  can not larger than 1. Hence, we need to add the following extra retiming constraints for our FPGA structure

$$FF_{new}(e) = FF(e) + r(v) - r(u) \le 1$$
 (6)

$$\Rightarrow \qquad \left\lceil \frac{x_v}{P} \right\rceil - \left\lceil \frac{x_u}{P} \right\rceil \le 1 - FF(e) \tag{7}$$

$$\Leftrightarrow \qquad x_v - x_u < P(2 - FF(e)) \tag{8}$$

$$\Leftrightarrow \quad x_v - x_u < P \qquad if \qquad FF(e) = 1 \tag{9}$$

$$x_v - x_u < 2P \qquad if \qquad FF(e) = 0 \tag{10}$$

## 2 Objective function

When we consider mixed interconnect wire length in FPGA, the dynamic power reduction and leakage power reduction for routing tree  $\mathcal{R}_i$  can be rewritten as follows,

$$P_{dr}(i) = 0.5f_{clk} \cdot \Delta V dd^2 \cdot \sum_{j=0}^{N_s(i)-1} S_{ik} \cdot [f_s(i,j) \cdot \frac{l_{ik} \cdot c_{ij}^2}{D_{ik} \cdot C_{ik}}]$$
(11)

$$P_{lr}(i) = \sum_{j=0}^{N_s(i)-1} S_{ik} \cdot \left[ \left( \frac{l_{ik} \cdot c_{ij}}{D_{ik} \cdot C_{ik}} \right) \cdot \Delta P_s(i,j) \right]$$
(12)

The objective function can be rewritten as following by merging the coefficient of the slack  $S_{ik}$  of  $k^{th}$  sink in  $\mathcal{R}_i$  as follows,

$$Maximize \qquad \sum_{i=0}^{N_r-1} \sum_{k=0}^{N_k(i)-1} W_{ik} \cdot S_{ik} = \sum_{\forall Sink} W_{ik} \cdot S_{ik}$$
(13)

$$W_{ik} = \sum_{\forall j \in \mathcal{UBC}_{ik}} \left[ 0.5 f_{clk} \Delta V dd^2 c_{ij} f_s(i,j) + \Delta P_s(i,j) \right] \cdot \frac{c_{ij} \cdot D_{ik}}{(C_{ik} \cdot l_{ik})}$$
(14)

where set  $\mathcal{UBC}_{ik}$  include all switches with  $k^{th}$  sink as the critical sink in  $\mathcal{R}_i$ .

## 3 Min-cost flow formulation

The power optimal vdd assignment with retiming can be formulated as objective function (13) and constraints (1) and (3). Obviously, the dual of this problem is a min-cost flow problem.

The network is still the timing graph and the cost associated with each edge e is  $FF(e) \cdot (P - D_{ff}) - d_{uv}$ . (3) and (7) will introduce backward edges with cost  $D_{ij} - FF(e) \cdot (P - D_{ff}) + d_{uv}$  and P(2 - FF(e), respectively in the timing graph. Note that these backward edges can not introduce negative circles to the timing graph. For the backward edges introduced by feedback path in the first case in Figure 1, it won't introduce negative circles in the circuit neither as we always assume the circuit is well designed.

After solving the flow problem and shortest path problem, we can get  $x_u$  for each node in the timing graph. With constraint (7), we can make sure that each edge in timing graph can be assigned either 0 or 1 FF, which means we use only the FF in 1-weight edge. Finally, we can calculate slack based on (2) and update the timing graph, and perform bottom-up vdd assignment and refinement.