### On Optimum Switch Box Designs for 2-D FPGAs

Hongbing Fan

Department of Computer Science University of Victoria BC. Canada V8W 3P6 hfan@csr.csc.uvic.ca Jiping Liu

Department of Mathematics and Computer Science The University of Lethbridge AB. Canada T1K 3M4 liu@cs.uleth.ca Yu-Liang Wu<sup>†</sup> Chak-Chung Cheung

Department of Computer Science and Engineering The Chinese University of HK Shatin, N. T., Hong Kong

{ylw,ccheung2}@cse.cuhk.edu.hk

### ABSTRACT

An FPGA switch box is said to be universal (hyper-universal) if it can detailed route all possible surrounding 2-pin (multi-pin) net topologies satisfying the global routing density constraints. A switch box is optimum if it is hyper-universal and the switches inside is minimum. It has been shown that if the net topology is restricted to 2-pin nets, then a 2-D (4-way) switch box can be built to be universal with only 6W switches, where W is the global routing channel density. As the routing resource is relatively expensive in FPGA chips, study of the optimum switch box designs is clearly a topic with theoretical and commercial value of reducing silicon cost. A previous work has constructed a formal mathematical model of this optimum design problem for switch boxes with arbitrary dimensions, and gave a scheme to produce hyper-universal designs with less than 6.7W switches for 4-way FPGA switch boxes. In this paper, we will further investigate this most common 4-way switch box case, and will give new theoretical results followed by extensive experimental justification. The results seem to be quite attractive. We show that such an optimum switch box can be built with a very low number of additional switches beyond 6W for today's practical range of low W's (e.g. just 6W plus 1 or 2 additional switches for W's up to 7). Even for arbitrary large W's, the bound can be shown to be under 6.34W. To make experimental comparison, we run today's published best FPGA router VPR on large benchmarks for the popular Disjoint structure and our proposed designs. The results are quite encouraging.

DAC 2001, June 18-22, 2001, Las Vegas, Nevada, USA

Copyright 2001 ACM 1-58113-297-2/01/0006 ...\$5.00.

### 1. INTRODUCTION

Field Programmable Gate Array (FPGA), a kind of Very Large Scale Integrated (VLSI) circuit, consists of an array of pre-fabricated functional blocks and wire segments with user-programmability of logic and routing resources. Because of their fast turn-around time and economic manufacturing cost for low volume designs, FPGAs have been used in a great amount of digital equipments. FPGA technologies are commonly classified into three major categories: (1) Look-Up-Table (LUT), SRAM based (2) multiplexer, channel organized and anti-fused, and (3) PLD, EPROM based. In this paper, we will study the optimum routing structure problems for the popular LUT and SRAM based two-dimensional (2-D) FPGAs. The architecture of an industrial product of this type is described in [1, 2, 4, 6].

The importance of routing resource issues in FPGAs is never over-emphasized. In commercial FPGA products, the routing resource consumes most of the chip area, and is responsible to most of the circuit delay. A typical 2D FPGA architecture is shown in Fig. 1. The functional blocks (or logic cells) are marked by L, which are separated by vertical and horizontal channels. There are W (called channel density) prefabricated parallel wire segments running between each pair of adjacent L-cells in both vertical and horizontal channels. The wire segments in a vertical (or horizontal) channel are aligned into W vertical (or horizontal) tracks; each track within a channel is assigned an integer in  $\{1, \ldots, W\}$  as its track ID. There are C-boxes in the channel between adjacent Lcells. A Switch Box (S-box), located at each intersection of a vertical and horizontal channels, contains programmable switches to connect wire segments running from its surrounding C-boxes.

When an FPGA is used to realize a specified Boolean function, the pins used to realize the Boolean function are partitioned into groups (called nets). Then the pins in each group are connected together to form a real net by using available wire segments and switches in both C-boxes and S-boxes; different nets are disconnected. The latter process is referred to as a routing. Conventionally, the routing process is divided into two subsequent steps, global routing and detailed routing, although there is no absolute need for doing routing in these two phases. In this paper, we simply use the term of global routing to specify the connection topologies for all nets. The detailed routing decides the exact assignment of wire segments and switches used to materialize the complete routing. As the connectivity within C-boxes is complete, the routability of

<sup>\*</sup>This research was partially supported by the Natural Sciences and Engineering Research Council of Canada and the Alberta Research Excellence Envelope.

<sup>&</sup>lt;sup>†</sup>Research partially supported by a Hong Kong Government RGC Earmarked Grant, Ref. No. CUHK4163/99E, and Direct Grant CUHK2050199.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.



Figure 1: The architecture of a 2D FPGA.

the entire chip is fully dependent on the structure and connectivity of the S-boxes [1, 4, 5, 6, 9, 10, 11, 12, 13, 14]. As the routing resource is relatively expensive in FPGA chips, it is clearly desirable to design switch boxes (S-boxes) with maximized routability and minimum number of programmable routing switches.

In [6] a so called Universal Switch Module (USM) structure, which is a 4-way S-box of density W with 6W switches, has been proposed. However, this model only accommodates 2-pin nets, therefore designing a general S-box for all kinds of nets is important and necessary. An FPGA switch box is said to be hyperuniversal if it can detailed route all possible multi-pin net topologies satisfying the global routing density constraints. It is optimum if it is hyper-universal and the number of switches inside is minimum.

It has been shown that if the net topology is restricted to 2-pin nets, then a 2-D (4-way) switch box can be built to be universal with only 6W switches, where W is the global routing channel density. As the routing resource is relatively expensive in FPGA chips, searching for optimum switch box designs is clearly a topic with both theoretical challenges and immense commercial silicon reduction values. In [8] a formal mathematical model of this optimum switch box design problem for arbitrary dimensions has been constructed. It gave a scheme to produce hyper-universal designs with less than 6.7W switches for 4-way FPGA switch boxes. In this paper, we will further investigate this most common 4-way switch box case, and give new theoretical results followed by extensive experimental justifications. The results seem to be quite attractive. We show that such an optimum switch box can be built with a very low number of additional switches beyond 6W for today's practical range of low W's. If we represent the number of switches of an optimum 4-way switch box with a density of W to be e(4, W), then e(4, W) = 6W for W being 2, 3, or 5; e(4, W) = 6W + 1 for W = 4;  $e(4, 6) \le 6W + 2$ , and  $e(4, 7) \le 6W + 1$ . For arbitrary large W's, the bound can be shown to be under 6.34W. To make an even experimental comparison, we run VPR [15], today's published best FPGA router, for large benchmarks on the popular Disjoint FPGA switch box structure and our proposed designs. The results are quite encouraging.

This paper is organized as follow. Section 2 gives definitions and various graph design problems associated with the switch box designs. For completeness, some basis previous results on S-box designs and a decomposition property of global routings will also be mentioned. In Section 3, we provide our new results for optimum or hyper-universal S-boxes. Section 4 shows our experimental results and we give our conclusions in Section 5.

### 2. PRELIMINARIES

The terminology and symbols of graphs are referred to [3]. Let G = (V(G), E(G)) be a simple graph with vertex set V(G) and edge set E(G). We denote by |V(G)| (or |G|) and |E(G)| the number of vertices and edges in G, respectively. Let  $S \subset V(G)$ . G[S] denotes the induced subgraph of G by S. We use  $v_{i_1}v_{i_2} \ldots v_{i_l}$  to denote the path with consecutive vertices  $v_{i_1}, v_{i_2}, \ldots, v_{i_l}$ .

Let W and k be positive integers with  $k \ge 2$ . In [8], we have represented a (k, W)-global routing as a collection  $GR = \{N_i | i = 1, \ldots, l\}$  of non-empty subsets of  $\{1, \ldots, k\}$  such that each element of  $\{1, 2, \ldots, k\}$  belongs to exactly W subsets of GR. W is called the **density** of the global routing, and a (k, W)-global routing is also called a k-way global routing with density W. Each  $N_i$  in GR is referred to as a **net** of the global routing. We note that a global routing GR is a multiple set; two equal sets in GR represent two different nets in the global routing. Note also that a net of cardinality n corresponds to an n-pin net. For simplicity, 1-pin nets are allowed to ensure that each element of  $\{1, 2, \ldots, k\}$  appears exactly W subsets of GR.

Let  $GR_1$  and  $GR_2$  be global routings and m a positive integer. The union of  $GR_1$  and  $GR_2$  as multiple set is denoted by  $GR_1 + GR_2$ , and  $mGR_1$  is the union of  $m GR_1$ 's.

Having given a global routing a local and mathematical view, we further view the track with ID j on the *i*-th side of an S-box as a vertex  $v_{i,j}$  and a switch connecting the track with ID j on the *i*-th side and the track with ID m on the *h*-th side is an edge  $v_{i,j}v_{h,m}$ . Therefore, any k-way S-box of density W can be represented as a k-partite graph G on  $\bigcup_{i=1}^{k} V_i$ , where  $V_i = \{v_{i,j} | j = 1, \ldots, W\}$  and each  $V_i$  is an independent set in G for  $i = 1, \ldots, k$ . We call such a graph G a (k, W)-design. In particular, a 4-way S-box of density W is a (4, W)-design.

Let G be a (k, W)-design on  $(V_1, \ldots, V_k)$ . A detailed (k, W)routing (or shortly detailed routing) of a (k, W)-global routing  $\{N_i | i = 1, \ldots, l\}$  in G is a set of mutually vertex disjoint subgraphs  $\{T(N_i) | i = 1, \ldots, l\}$  of G satisfying: (1)  $T(N_i)$  is a tree of  $|N_i|$  vertices, and (2)  $|V_j \cap V(T(N_i))| = 1$  if  $j \in N_i$ , for  $i = 1, \ldots, l$ .  $T(N_i)$  is called a **detailed routing** of  $N_i$ .

A hyper-universal (k, W)-design on  $(V_1, \ldots, V_k)$  is a (k, W)design on  $(V_1, \ldots, V_k)$  such that it contains a detailed routing for each (k, W)-global routing. For example, the complete k-partite graph on  $(V_1, \ldots, V_k)$  (in which, there is an edge joining each pair of vertices  $v_{i,j}$  and  $v_{i_1,j_1}$  with  $i_1 \neq i$ ) is a hyper-universal (k, W)design. A hyper-universal (k, W)-design represents a k-way S-box of density W (also called a (k, W) S-box) which can accommodates any (k, W)-global routings.

An optimum (k, W)-design is a hyper-universal (k, W)-design with the minimum number of edges. Clearly, the number of edges in an optimum (k, W)-design is uniquely determined by k and W, which is denoted by e(k, W).

A global routing is called **primitive** if it does not contain two unequal nets of size 1. If a (k, W)-global routing GR is not primitive, then we can combine the unequal nets of size 1 into nets of size 2 to obtain a primitive (k, W)-global routing GR'. Any detailed global routing of GR' will induce a detailed global routing of GR by simply deleting the edges of those one edge trees representing the nets of size two in GR' which are obtained by combining the unequal nets of size 1 in GR. Therefore, to verify that a (k, W)-design is hyper-universal, we need only to show that each primitive global routing is detailed routable. Our approach depends on a very nice decomposition property of global routings. Let GR be a (k, W)-global routing and GR'be a sub-collection of GR. If GR' is a (k, n)-global routing with n < W, GR' is called a **sub-global routing** of GR. GR is said to be **minimal** if it does not contain subglobal routings. The following result was proved in [7].

LEMMA 1. For any integer k with  $k \ge 2$ , there exists an integer f(k) such that any k-way global routing GR could be decomposed into minimal k-way subglobal routings with densities at most f(k). Moreover, f(k) = k - 1 for k = 2, 3, 4.

In [8], we have developed a general reduction technique for designing (k, W) S-boxes.

- **I.** Find f(k) and all k-way minimal global routings. The existence of f(k) and the finiteness of the number of minimal k-way global routings are guaranteed by Lemma 1.
- **II.** Determine p and r such that W = pq + r, p and r are as small as possible so that any (k, W)-global routing is a union of q subglobal routings of density p and a subglobal routing of density r. (Note that as k is fixed, each W corresponds to a unique r, we may have more than one r as W varies but there are finitely many such r's for all W).
- **III.** Design a hyper-universal (k, p) S-box  $S_1$  and a (k, r) S-box  $S_2$  with the number of switches as small as possible. Then a disjoint union of q copies of  $S_1$ s and an  $S_2$  is a hyper-universal (k, W) S-box.
- **IV.** Design an efficient detailed routing algorithm to detailed-route any k-way global routing in the S-boxes designed in III.

We note that we have given an efficient algorithm for IV. Now we focus on 4-way S-box designs.

For Step 1, Lemma 1 shows that f(4) = 3. For Step 2, [8] showed that any (4, W)-global routing can be decomposed as a union of (4, 6)-global routings and at most 1 (4, r)-global routing for some r = 1, 2, 3, 4, 5 and 7.

The following are all primitive minimal (4, W)-global routings.  $GR_1^1 = \{\{1, 2, 3, 4\}\}.$ 

```
GR_{2,1}^{\hat{1}} = \{\{1,2\},\{3,4\}\}, GR_{2,2}^{\hat{1}} = \{\{1,3\},\{2,4\}\}, GR_{2,3}^{\hat{1}} = \{\{1,4\},\{2,3\}\},
GR_{3,1}^{1} = \{\{1\}, \{2, 3, 4\}\}, GR_{3,2}^{1} = \{\{2\}, \{1, 3, 4\}\},\
GR_{3,3}^1 = \{\{3\}, \{1,2,4\}\}, \ GR_{3,4}^1 = \{\{4\}, \{1,2,3\}\}.
GR_{1,1}^2 = \{\{1,2,3\},\{1,2,4\},\{3,4\}\}, GR_{1,2}^2 = \{\{1,2,3\},\{2,3,4\},\{1,4\}\},
GR_{1,3}^2 = \{\{1, 2, 4\}, \{2, 3, 4\}, \{1, 3\}\}, GR_{1,4}^2 = \{\{1, 3, 4\}, \{2, 3, 4\}, \{1, 2\}\},\
GR_{1,5}^2 = \{\{1,2,3\},\{1,3,4\},\{2,4\}\}, \ GR_{1,6}^2 = \{\{1,2,4\},\{1,3,4\},\{2,3\}\},
GR_{2,1}^2 = \{\{1,2,3\},\{1,4\},\{2\},\{3,4\}\}, GR_{2,2}^2 = \{\{1,2,4\},\{1,3\},\{2\},\{3,4\}\},
GR_{2,3}^{2} = \{\{2,3,4\},\{1,4\},\{2\},\{1,3\}\}, GR_{2,4}^{2} = \{\{1,2,3\},\{1,4\},\{3\},\{2,4\}\},
GR_{2,5}^2 = \{\{1,4,3\},\{1,2\},\{3\},\{2,4\}\}, GR_{2,6}^2 = \{\{2,4,3\},\{1,2\},\{3\},\{1,4\}\}.
GR_{2,7}^2 = \{\{2,4,3\},\{1,2\},\{4\},\{1,3\}\}, GR_{2,8}^2 = \{\{1,4,3\},\{1,2\},\{4\},\{2,3\}\},
GR_{2,9}^2 = \{\{1, 2, 4\}, \{1, 3\}, \{4\}, \{2, 3\}\}, GR_{2,10}^2 = \{\{1, 2, 4\}, \{4, 3\}, \{1\}, \{2, 3\}\}.
GR_{2,11}^2 = \{\{1,4,3\},\{4,2\},\{1\},\{2,3\}\}, GR_{2,12}^2 = \{\{2,1,3\},\{4,2\},\{1\},\{4,3\}\},
GR_{3,1}^2 = \{\{1,2\},\{3,1\},\{2,3\},\{4\},\{4\}\}, GR_{3,2}^2 = \{\{1,2\},\{4,1\},\{2,4\},\{3\},\{3\}\},
GR_{3,3}^2 = \{\{1,3\},\{4,1\},\{3,4\},\{2\},\{2\}\}, GR_{3,4}^2 = \{\{3,2\},\{4,3\},\{2,4\},\{1\},\{1\}\},
GR_1^3 = \{\{1, 2, 3\}, \{1, 2, 4\}, \{3, 4, 1\}, \{2, 3, 4\}\},\
GR_{2,1}^3 = \{\{1,2,3\},\{1,4\},\{2,4\},\{3,4\},\{1,2,3\}\},\
GR^3_{2,2} = \{\{2,3,4\},\{1,2\},\{1,3\},\{1,4\},\{2,3,4\}\}.
GR^3_{2,3} = \{\{3,4,1\},\{2,1\},\{2,3\},\{2,4\},\{3,4,1\}\},
GR_{2,4}^3 = \{\{4,1,2\},\{3,1\},\{3,2\},\{3,4\},\{4,1,2\}\}.
```

Table 1. Primitive minimal 4-way global routings

For Step III, [8] gave a hyper-universal (4, W)-design F(W) with less than 6.7W switches.

Our goal in this paper is to further investigate Step III to obtain better (4, i)-designs for i = 3, 4 and 6 and hence obtain a better (4, W)-design than the (4, W)-design F(W) constructed in [8].

The following result was proved in [8] which will be used in this paper.

LEMMA 2. Let  $G = ((V_1, V_2, V_3, V_4), E)$  be a hyper-universal (4, W)-design. Then  $|E| \ge 6W$ . G restricting on any two parts gives a hyper-universal (2, W)-design, and G restricting on any three parts gives a hyper-universal (3, W)-design. The optimum (2, W)-design is a perfect matching, and an optimum (3, W)-design is a Hamilton cycle. Moreover, the optimum (3, 4)-design must be a Hamiltonian cycle.

### 3. AN HYPER-UNIVERSAL (4, W)-DESIGN

To design a hyper-universal (4, W)-design, we need to proceed Step III of the reduction design technique mentioned in Section 2. Fig. 3 provides a family of (4, i)-designs  $H_i$  for i = 1, 2, 3, 4, 5, 6, 7in which the label  $V_{i,j}$  follows the labeling rule of S-box in Fig.1. These  $H_i$ 's are needed in the designing of a (4, W)-design for all W according to our technique. In [8], we have shown that  $H_i$ is an optimum (4, i)-design for i = 1, 2. The following lemmas claim that  $H_i$ 's are hyper-universal or optimum (4, i)-designs for i = 3, 4, 5, 6, 7. We will prove these lemmas later in this section.

LEMMA 3.  $H_i$  is an optimum (4, i)-design for i = 3, 4, 5.

LEMMA 4.  $H_i$  is a hyper-universal (4, i)-design for i = 6, 7.

Now define G(W) as the following graph:

| G(W) = | disjoint union of $h H_6$ 's<br>disjoint union of $(h - 1) H_6$ 's and a $H_7$<br>disjoint union of $h H_6$ 's and a $H_2$<br>disjoint union of $h H_6$ 's and a $H_3$<br>disjoint union of $h H_6$ 's and a $H_3$ | if $W = 6h$ ,<br>if $W = 6h + 1$ ,<br>if $W = 6h + 2$ ,<br>if $W = 6h + 3$ ,<br>if $W = 6h + 4$ ,<br>if $W = 6h + 4$ , |
|--------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------|
| L L    | disjoint union of h H6's and a H5                                                                                                                                                                                  | if $W = 6h + 5$ .                                                                                                      |

By the definition of  $G_i$ , i = 1, ..., 7, it is easy to see that the number of edges of F(W) for W > 1 is

|          | 19 W                           | if W = 0 (mod 6)           |
|----------|--------------------------------|----------------------------|
|          | 1 <u>9</u> W - 4               | if $W \equiv 1 \pmod{6}$ , |
| 0000     | $\frac{19}{3}W = \frac{2}{3}$  | if $W \equiv 2 \pmod{6}$ , |
| G(W) = 1 | $\frac{19}{3}W = 1$            | if $W \equiv 3 \pmod{6}$ , |
|          | $\frac{19}{3}W = \frac{1}{3}$  | if $W \equiv 4 \pmod{6}$ , |
|          | $\frac{19}{19}W = \frac{5}{2}$ | if $W \equiv 5 \pmod{6}$ . |

THEOREM 5. For W > 1, G(W) is a hyper-universal (4, W)-design.

PROOF. If W = 6h + 1, then  $h \ge 1$  and any (4, 6h + 1)-global routing GR can be decomposed into a union of (h - 1) (4, 6)global routings and a (4, 7)-global routing. (sometimes, GR can be decomposed into h (4, 6)-global routings and a (4, 1)-design. But this does not always happen.) Now it is easy to see that G(6h + 1)contains a detailed routing of GR as G(W) is a disjoint union of (h-1)  $H_6$ 's and an  $H_7$ , and  $H_6$  and  $H_7$  are hyper-universal (4, 6)design and (4, 7)-design, respectively by Lemma 4.

Let W = 6h + i where  $i \neq 1$ . Since the densities of minimal 4-way global routings are 1, 2 or 3, any (4, 6h + i)-global routing GR can be decomposed into h(4, 6)-global routings and a (4, i)global routing for  $2 \leq i \leq 5$ . Since G(6h + i) is a disjoint union



Figure 2: A family of hyper-universal 4-way designs.

of h  $H_6$ 's and an  $H_i$ , and  $H_6$  and  $H_i$  are hyper-universal (4, 6)design and (4, i)-design, respectively by Lemma 3, then G(6h+i)contains a detailed routing of GR.  $\Box$ 

Next we divide the proofs of the lemmas 3 and 4 into subsections. For simplicity, we will label each vertex of  $H_i$  by only the side label, where the corresponding track belongs to.

### **3.1** $H_3$ is an optimum (4,3)-design

We note that  $H_3$  has 18 edges which is the lower bound. Therefore, we need only to show that  $H_3$  is hyper-universal. It is suffice to show that  $H_3$  contains all possible (4, 3)-global routings obtained by combining the primitive minimal global routings in Table 1. Notice that the permutation  $\sigma = (1, 4)(2, 3)$  is an automorphism of  $H_3$ , and therefore, we only need to detailed-route GR in  $H_3$  where GR is a union of those global routings in Table 1 which are not isomorphic to each other under  $\sigma$ . These global routings are

 $\begin{array}{c} GR_{1}^{1}, GR_{2,1}^{1}, GR_{2,2}^{1}, R_{2,3}^{1}, GR_{3,1}^{1}, GR_{3,2}^{1}, \\ GR_{1,1}^{2}, GR_{1,2}^{2}, GR_{1,3}^{2}, GR_{1,6}^{2}, GR_{2,1}^{2}, GR_{2,2}^{2}, \\ GR_{2,3}^{2}, GR_{2,7}^{2}, GR_{2,8}^{2}, GR_{2,9}^{2}, GR_{3,1}^{2}, GR_{3,2}^{2}, \\ GR_{1}^{3}, GR_{2,1}^{3}, GR_{2,3}^{3}. \end{array}$ 

## Table 2. Primitive minimal 4-way global routings which are not $\sigma$ isomorphic.

Let GR be a (4, 3)-global routing which is a union of global routings from Table 2.

**Case 1.**  $GR \in \{GR_1^3, GR_{2,1}^3, GR_{2,3}^3\}$ . Fig. 3(1)-(3) show the detailed routings of GR.

**Case 2.** GR consists of three density 1 global routings. Suppose GR contains h copies of  $GR_{2,3}^1$ . Then  $0 \le h \le 3$ .

When h = 0. We partition  $H_3$  into three subgraphs  $G_1$ ,  $G_2$ and  $G_3$ :  $G_1$  consists of the lower level,  $G_2$  consists of the left of the top and second levels and  $G_3$  consists of the right of top and second level. Each of these subgraphs can detailed-route any of  $GR_1^1$ ,  $GR_{2,1}^1$ ,  $GR_{2,2}^1$ ,  $GR_{3,1}^1$  and  $GR_{3,2}^1$ . Therefore, we can detailed-route GR in  $H_3$ .

When h = 3.  $GR = 3GR_{2,3}^1$ . Fig. 3(4) gives the detailed routing of GR.

When h = 1. If GR contains  $GR_{2,3}^1 + GR_{2,2}^1$ , then we detailed-route it as in Fig. 3(5), and if GR contains  $GR_{2,3}^1 + GR_{3,2}^1$ , we



Figure 3: Detailed routings for  $H_3$  and  $H_4$ .

detailed-rout  $GR_{2,3}^1 \cup GR_{3,2}^1$  as shown in Fig. 3(6). We note that the unused part in  $H_3$  can detailed-rout any

 $R \in \{GR_1^1, GR_{2,1}^1, GR_{2,2}^1, GR_{3,1}^1, GR_{3,2}^1\}$ . This proves that GR is routable in  $H_3$ .

For those GR's which do not contain  $GR_{2,2}^1$  and  $GR_{3,2}^1$ , detailed routings in  $H_3$  are shown in Fig. 3(7)-(9).

Finally when h = 2. If we detailed-route  $2GR_{2,3}^1$  as shown in Fig. 3(10), then the unused top level can detailed-rout any of  $\{GR_1^1, GR_{2,1}^1, GR_{3,1}^1\}$ . For  $GR = 2GR_{2,3}^1 + GR_{2,2}^1$  or  $GR = 2GR_{2,3}^1 + GR_{3,2}^1$ , a detailed routing is given by Fig. 3(11),(12). **Case 3.** GR contains a minimal global routing of density 2.

Fig. 3(13)-(24) show all detailed routings of density 2 global routings in  $H_3$ . Then the lower level of  $H_2$  which is not used in all this diagrams can be used to detailed-route and  $B \in [CP^1, CP^1, CP^1,$ 

 $R \in \{GR_1^1, GR_{2,1}^1, GR_{2,2}^1, GR_{3,1}^1, GR_{3,2}^1\}.$ 

When GR is a union of density 2 global routing and  $GR_{2,3}^1$ . A detailed routings of GR in  $H_3$  are given in Fig. 3(25)-(36).

This completes the verification that it is detailed routable in  $H_3$  for all (4, 3)-global routings, and hence  $H_3$  is hyper-universal.

We note that only when  $GR \in \{2GR_{2,3}^1 + GR_{2,2}^1, 2GR_{2,3}^1 + GR_{3,2}^1, GR_{2,3}^1\}$  we need the top level edge  $\{1, 4\}$  in our detailed routing. This fact will be used in the verification of  $H_6$ .

### **3.2** $H_4$ is an optimum (4, 4)-design

We first show that  $H_4$  is hyper-universal. Let GR be any (4, 4)global routing which is a union of global routings from Table 1. If GR is a union of a minimal (4, 3)-global routing and a (4, 1)global routing from Table 1, we can detailed-route the five minimal (4, 3)-global routings as in Fig. 3(a). Note that the unused part in  $H_4$  is a cycle 1, 2, 4, 3 which can be used to detailed-route all the (4, 1)-global routings except  $GR_{2,3}^1$  from Table 1. If GR contains a  $GR_{2,3}^2$ , a detailed routing of GR in  $H_4$  is given in Fig. 3(b).

Now assume that GR is a union of two (4, 2)-global routings, then GR is routable in  $H_4$  as  $H_4$  contains two disjoint  $H_2$ 's.

This proves that  $H_4$  is hyper-universal. The fact that  $H_4$  is an optimum design follows from the following result which also indicates that the lower bound 6W given in Lemma 2 is not achievable in general.

THEOREM 6. There is no 3-regular hyper-universal (4, 4)-design. An optimum (4, 4)-design must have at least 25 edges.

**Proof:** Suppose that there is a 3-regular hyper-universal (4, 4)-design G. Then G is a 4-partite graph on  $(V_1, V_2, V_3, V_4)$ ; there is a 4-matching between each pair of  $V_i, V_j$  for  $i \neq j$  and the induced subgraph of G on each set  $V_i \cup V_j \cup V_m$   $(i \neq j \neq m)$  is a cycle (see Lemma 2). Based on these facts, we construct all such 3-regular graphs and verify that they are not hyper-universal. We delete the detailed verification here.

# **3.3** $H_i$ is a hyper-universal (4, i)-design for i = 5, 6 and 7

Note that any (4, 5)-global routing is a union of a (4, 3)-global routing and a (4, 2)-global routing, and  $H_5$  contains an  $H_3$  and a disjoint  $H_2$ , therefore,  $H_5$  is hyper-universal. Also note that  $H_5$  has 30 edges which is the lower bound of an hyper-universal (4, 5)-design. Therefore,  $H_5$  is an optimum design.

 $H_7$  is a hyper-universal (4, 7)-design is similarly proved.

To show that  $H_6$  is a hyper-universal S-box, we need to show that  $H_6$  contains a detailed routing for every (4, 6)-global routing.

Again, we note that the permutation  $\sigma = (1, 4)(2, 3)$  is an automorphism of  $H_6$ , and therefore, we only need to check those global routings which are union of global routings from Table 2. Let *GR* be such a global routing.

**Case 1**  $GR = GR_1 + GR_2$ , where both  $GR_1$  and  $GR_2$  are (4,3)-global routings.

It is easy to see that  $H_6$  contains a detailed routing of GR as  $H_6$  contains two disjoint  $H_3$ 's.

**Case 2**  $GR = GR_1 + GR_2 + GR_3$ , where each  $GR_i$  (i = 1, 2, 3) is a primitive minimal global routings from Table 2.

Let K(i, i+1) be the subgraph of  $H_6$  which consists of the levels i and i + 1. We have three disjoint subgraphs K(1, 2), K(3, 4) and K(5, 6) of  $H_6$ . By Fig. 3(13)-(24), we see that K(1, 2) and K(5, 6) can detailed-route any minimal (4, 2)-global routings in Table 2. All detailed routing of minimal global routing in K(3, 4) are routable. Therefore, there is a detailed routing for GR in  $H_6$ , and hence  $H_6$  is hyper-universal.

### 4. EXPERIMENTAL RESULTS

From our combinatorial analysis shown above, it seems quite surprising that an optimum (hyper-universal) S-box can actually be built by using only very few more switches beyond the widely believed lower bound of 6W. It is also interesting to see that there exist HUSBs with switch density of only 6W for some W's and the construction of optimum HUSBs seem to hardly possess regular scalability which can be observed in the construction of some 4way USB family. Besides the theoretical analysis, in order to get some experimental justification, we choose to adopt the currently known best FPGA router VPR [15], which is available on the Web, for our experiment. The logic block structure for our VPR runs is set to consist of one 4-input LUT and one flip-flop. The input or output pin of the logic block is able to connect to any track in the adjacent channels,  $F_c = W$ . Inside the switch box, each input wire segment can connect to three other output wire segments of other channels,  $F_s = 3$ .

In order to have an even comparison (partially is also due to the limitation of VPR router limiting  $F_s$  to 3) with the well-known Disjoint structure, we deliberately eliminate the "additional" switches of our HUSBs to make our H'USBs have density of 6W, which is the same as Disjoint S-boxes. Fig. 4(b) and Fig. 5 show the structure of a hyper-universal S-box and a routing result of our experiments.

In table 3, we show the comparing results of the number of tracks required to route some larger MCNC benchmark circuits [16] by Disjoint and our H'USB FPGAs. Overall, the H'USB FPGAs use about 10% less tracks than the Disjoint FPGAs. (Beware that since the VPR is a simulated annealing based non-deterministic router, the results we produced for Disjoint FPGAs could be a bit different to their other reported results.)

### 5. CONCLUSION

In this paper, we have again applied the reduction design technique developed in [8] in 4-way S-box designs. The new (4, W)S-box has at most  $6.\overline{3}W$  switches compared with our previous one which has about  $6.\overline{6}W$  switches. We note that, according to our new design, we could have a complete database for detailed routings in  $H_i$  for i = 2, 3, 4, 5, 6 and 7 and hence have an efficient detailed routing algorithm to detailed-route any (4, W)-global routing in the S-box G(W).

|          | Disjoint | H'USB        |
|----------|----------|--------------|
| alu4     | 12       | 10           |
| apex2    | 12       | 11           |
| apex4    | 15       | 13           |
| bigkey   | 8        | 7            |
| des      | 9        | 8            |
| diffeq   | 9        | 8            |
| dsip     | 7        | 7            |
| elliptic | 11       | 11           |
| ex5p     | 15       | 13           |
| misex3   | 13       | 12           |
| seq      | 12       | 12           |
| spla     | 16       | 14           |
| tseng    | 8        | 7            |
| e64      | 9        | 8            |
| Total    | 156      | 141 (-9.62%) |

Table 3. Channel widths required for different benchmark circuits  $F_C = W$ ,  $F_S = 3$ 



Figure 4: Structures of S-boxes.



Figure 5: Routing result of e64 by using H'USB S-Box, W=8

We have also provided one example to show that 6W is not a lower bound of the number of switches in an optimum (4, W) Sbox for some W while which is widely believed to be. This suggests that our newly design G(W) is very close to an optimum. We are currently working on applying our combinatorial analysis models for other FPGA routing architecture designs.

### 6. **REFERENCES**

- M.J. Alexander and Gabriel Robins, "New Performance FPGA Routing Algorithms," Proceedings DAC, pp. 562-567. 1995.
- [2] Altera Corp., The Maximalist Handbook, 1990.
- [3] J.A. Bondy and U.S.R. Murty, Graph Theory with Applications. London: Macmillan Press 1976.
- [4] S. Brown, R. J. Francise, J. Rose and Z.G. Vranesic, *Field-Programmable Gate Arrays*, Kluwer-Academic Publisher, Boston MA, 1992.
- [5] S. Brown, J. Rose and Z.G. Vranesic, "A detailed router for field-programmable gate arrays", IEEE Trans. Computer-Aided Design, vol.11, pp.620-628, May 1992.
- [6] Y. W. Chang, D. F. Wong and C. K. Wong, "Universal switch modules for FPGA design", ACM Trans. on Design Automation of Electronic Systems, 1(1):80-101, January 1996.
- [7] H. Fan, P. Haxell, and J. Liu, "The global routing—a combinatorial design problem," (submitted).
- [8] H. Fan, J. Liu, and Y. L. Wu, "General Models for Optimum Arbitrary-Dimension FPGA Switch Box Designs," Proc. IEEE International Conference on Computer-Aided Design (ICCAD). pp. 93-98, Nov. 2000, San Jose.
- [9] Y.S. Lee and Allen C.H. Wu, "A Performance and Routability Driven Router for FPGAs Considering Path Delays," Proceedings DAC, pp. 557-561, 1995.
- [10] J.F. Pan, Y.L. Wu, C. K. Wong and G. Yan, "On the Optimal Four-Way Switch Box Routing Structures of FPGA Greedy Routing Architectures," Integration, the VLSI Journal. Vol. 25, pp. 137-159, 1998.
- [11] Y.L. Wu and D. Chang, "On NP-Completeness of 2-D FPGA Routing Architectures and a Novel Solution," Proceedings of International Conference on Computer-Aided-Design 1994, pp. 362-366.
- [12] Y.L. Wu, and M. Marek-Sadowska, "Routing for Array Type FPGAs," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, Vol. 16, No. 5, pp. 506-518, May 1997.
- [13] Y.L. Wu, S. Tsukiyama and M. Marek-Sadowska, "On computational complexity of a detailed routing problem in two-dimensional FPGA's", in Proc. 4th Great Lakes Symp. VLSI, Mar. 1994.
- [14] Y.L. Wu, M. Tsukiyama and M. Marek-Sadowska, "Graph based analysis of 2-D FPGA routing," IEEE Trans. Computer-Aided Design 15(1)(1996) 33-44.
- [15] Vaughn Betz and Jonathan Rose, "A New Packing, Placement and Routing Tool for FPGA Research," Seventh International Workshop on Field-Programmable Logic and Applications (Available for download from http://www.eecg.toronto.edu/~jayar/software.html), 1997, pp. 213-222.
- [16] S. Yang, "Logic Synthesis and Optimization Benchmarks, Version 3.0" Tech. Report, Microelectronics Centre of North Carolina, 1991