# $M^2R$ : Multilayer Routing Algorithm for High-Performance MCMs

Jun Dong Cho, Kuo-Feng Liao, Salil Raje, and Majid Sarrafzadeh

Abstract-We introduce a new multilayer routing strategy for high-performance MCMs whose objective is to route all nets optimizing routing performance and to satisfy various design constraints (e.g., minimizing coupling between vias as well as between signal lines and minimizing discontinuities such as vias and bends). First we introduce the Pin Pre-wiring and Redistribution Problem, which redistributes the pins or prewired subnets uniformly over the MCM substrate using pin redistribution layers. Pin redistribution is very important in MCM design. Our experience shows that it not only provides a global distribution for the pins congested in the chip site over the chip layer so as to ease the future routing difficulty, but also reduces the capacitive coupling between vias induced by many layers (up to 63 layers) by separating the pins far apart. The goal of the problem is to minimize the number of layers required to redistribute the entire set. An effective approach is proposed for solving this problem. Next we develop four effective algorithms for signal distribution, i.e., two variations on both single-layer routing and xy plane-pair routing paradigms. Based on these algorithms, a mixed version of single-layer routing and xy plane-pair routing techniques is proposed to establish a good trade-off between them to favor circuit performance and/or design objective instead of overemphasizing on the area minimization. One strategy is to apply single-layer routing iteratively until  $\alpha$ % of the nets are routed, then route the remaining  $(100 - \alpha)\%$  nets by xy planepair routing process. This provides the designer with a trade-off (e.g., between the number of layers and total number of vias) and shows the versatility of the proposed techniques. Various strategies are compared using practical MCM examples (each MCM has 25-100 ICs, 25-60 I/Os per IC, and 50-1,200 nets).

#### I. INTRODUCTION

**P**ACKAGING is becoming a limiting factor in translating semiconductor speed into system performance. In highend systems such as supercomputers, mainframes and military electronics, 50% of the total system delay is usually due to packaging, and by the year 2000, the share of packaging delay is expected to rise to 80% [1]. Moreover, increasing circuit count and density in circuits continued to place further demands on packaging. In order to minimize the delay, chips must be placed close together. Thus, the *Multi-Chip Module* 

Manuscript received February 11, 1993. This work was supported in part by the National Science Foundation under Grants MIP-8921540 and MIP-9207267. The Associate Editor responsible for recommending this paper was Dieter Mlynski. A preliminary version of this paper appeared in the *Proceedings of the Fourth Annual IEEE International ASIC Conference and Exhibit*, September 1991, pp. 9-2.1-9-2.4.

J. D. Cho is now with Samsung Electronics, Buchun, Kyunggi-Do, Korea. K.-F. Liao is now with IBM East Fishkill, Rt. 52, Bldg. 306, Zip 3A1, Hopewell Junction, NY 12533.

M. Sarrafzadeh is with the Department of Electrical Engineering and Computer Science Northwestern University, Evanston, IL 60208. IEEE Log Number 9400559. (MCM) [2] technology has been introduced to improve system performance significantly by virtue of the elimination of an entire level of interconnection. An MCM is a packaging technique that places several semiconductor chips, interconnected in a high-density substrate, into a single package. This innovation led to major advances in interconnection density at the chip level of packaging. Compared with single chip packages or surface mount packages, MCMs reduce circuit board area by five to 10 times and improve system performance by 20% or more. Therefore, MCM is used in a large percentage of today's mainframe computers as a replacement for the individual packages. The size of MCMs varies widely [8]: 10-150 ICs, 40-1000 I/Os per IC, 1.000 -10,000 nets, where the low end is ceramic/wire-bond, the high end is thin film/flip chip (maximum linear dimension is now up to 4-6 inches for thin-film MCMs, up to around 8.5 inches for ceramic MCMs, and up to 18 inches for laminated MCMs).

An example of the early MCM is the thermal conduction module (TCM) developed for the IBM 3081 mainframe computer [2]. The TCM substrate compresses a wiring network of extraordinary complexity into  $9 \times 9 \ cm^2$  square with roughly 5.5 mm thickness. On the top surface are sites for between 100 and 133 high-speed chips, with a total of more than 12,000 chip contact pads. The substrate itself has 33 molybdenum metallized alumina layers that are interconnected by more than 350,000 vias. A typical substrate has 130 meters of wires on these planes. A key design feature is the routing of all signal connections from the chip, through the uppermost five layers, to an array of surface pads, which in turn are connected to internal wiring layers. At the top, there is a bonding level on which chips are flip-mounted via solder bumps. The next five layers are for redistribution and have a 0.25 mm line pitch that matches the chip contact pad spacing. The redistribution layers are required to connect the chip bonding pads through the surface pads to the signal wires, because the x-y signal wires are not sufficiently dense to connect directly to the bonding pads in the small area in which they are concentrated. This is referred to as the escape problem. The two rows of surface pads, so-called engineering change pads (EC pads) surrounding the chip, can be also used for engineering changes. If a wiring change is required in the module, either because of design changes or because of wiring defects, EC pads are provided to allow surface wires to be routed from one chip site to another (see [2] for more details on engineering changes of TCM). Sixteen of the following layers in the substrate are x or y signal distribution planes with a line pitch of 0.5 mm. Between each pair of signal planes is a voltage reference plane

1057-7122/94\$04.00 © 1994 IEEE



Fig. 1. The chip layer, pin redistribution layers, and signal distribution layers in MCM

(total of eight) to control the impedance of the signal lines. Three following power distribution planes (for two supply levels and ground) bring the number of layers up to 33 (1 + 5 + 16 + 8 + 3 = 33).

Recently, [14] introduced the S/390 alumina TCM which is used in an intermediate-performance processor in IBM  $ES/9000^{TM}$ . The new TCM utilizes a new multilayer ceramic substrate, top-surface thin-film redistribution wiring, and a new cooling technology which allows the package to dissipate 600 W, and uses 2772 pins to connect with the second-level package. In this module, CMOS and bipolar chip technologies are packaged together on a single TCM for the first time.

Fig. 1 shows an example of a typical MCM, where chips are placed and bonded on a surface at the top layer (called *chip layer*). Below the *chip layer*, a set of *pin redistribution layers* is provided for distributing chip I/O pins to the *signal distribution layers*.

In this paper, we develop a new type of multilayer router for high-performance MCMs, called  $M^2R$ .  $M^2R$  is based on two "previously well-established" routing paradigms, *singlelayer routing* and *xy plane-pair routing*. Our research on the multilayer router is motivated by a number of performancedriven routing requirements of designing MCMs (as we will show in Section 3). Thus, our routing algorithm is also different from the conventional routing approaches that usually do not take those performance issues into full consideration.

This paper is organized as follows. Section 2 reviews the previous works. We investigate the routing requirements on MCM layout in Section 3. Next, in Section 4, we define and formulate the solution to the multilayer routing problem for MCMs. In Section 5, we present an overall routing strategy. A pin redistribution strategy is presented in Section 6. An effective routing strategy based on *single-layer routing* and *xy plane-pair routing* is proposed in Section 7. Finally, experimental results are shown in Section 8, followed by the conclusion of this paper.

### II. PREVIOUS WORKS

There are several new and interesting MCM routers associated with this new packaging environment [18], [9], [6], [20], [3].

Automatic layout of silicon-on-silicon hybrid packages developed at Xerox PARC was presented in [18]. Placement determines the relative positions of the ICs automatically or interactively and organizes the routing areas into channels. The hybrid routing uses the topological model that reduces the complexity of hybrid routing by abstracting away the geometrical information. Computation of geometry is deferred until needed. Global routing attempts to find a minimum Steiner tree based on symmetric expansion from all pins. An improvement phase follows to minimize hybrid area by selecting and rerouting critical nets. Detailed routing is based on the enhanced dogleg router that guarantees routing completion even in the presence of constraint cycles. Once two-dimensional global paths for all nets are defined, the next problem is to assign wires to specific layers, such that some objective function is minimized and the specific constraints are satisfied. This problem is referred to as the constrained layer assignment problem. [9], [20] investigated the layer assignment problem that arises in MCM and presented approximation algorithms on the number of xy plane-pairs. A multilayer router for generating rubber band sketches is described in [6]. The router uses hierarchical top-down partitioning to perform global routing for all nets simultaneously. Layer assignment is performed during the partitioning process to generate routing that has fewer vias and is not restricted to one-layer one-direction. The local router generates shortest-path rubber band routing.

Another version of layer assignment is to generate a graph based on interference analysis without actually performing the routing in the layer assignment stage. Some metric of pairwise interference between nets can be used to generate a weighted graph. Given a net interference graph (NIG), the problem is to assign vertices (nets) to minimum number of colors (layers) so that the total edge weight (interference between nets) is minimized. We refer to this problem as topological multilayer assignment [11], [16], [19], [21]. Chen and Wong [5] proposed a channel-based thin-film wiring methodology (using two layers) considering cross talk minimization between adjacent transmission lines. Recently, [3] proposed a topological multilayer routing strategy for high-performance MCMs, focusing on cross talk minimization between nets, while simultaneously minimizing the number of vias and layers. In that paper, a net interference measure based on potential cross talk and planarity is used to construct a net interference graph, and a new graph coloring and linear ordering algorithm is used to find an interference-minimized subset in each layer and a minimum cross talk between layers.

Another type of the performance-driven multilayer routing for MCMs is shown in [13]. The routing problem is formulated as the k-layer planar subset problem which is to choose a maximal subset of nets such that each net in the subset can be routed in one of k "preferred" layers.

# **III. MCM ROUTING REQUIREMENTS**

The primary goal of MCM routing is to meet the high performance requirements and design objective, rather than overconstraining the layout area minimization. There are two different types of requirements on MCM routers which are as follows.

# A. Generic Requirements for Performance-Driven Routers

In general, an efficient MCM multilayer routing algorithm for high-speed systems should satisfy the following features [1].

- Propagation delays associated with discontinuities: Unlike conventional approaches, the electrical characteristics of the packages require the signal lines to be treated as transmission lines. In practice, transmission lines are not perfectly uniform. That is, in the package level, significant reflections can be generated from capacitive and inductive discontinuities along the transmission lines. Moreover, in a multilayer ceramic substrate of MCM, wires at different levels do not have exactly the same impedance. Such mismatches of line impedance can cause reflections from the junction points such as vias and bends. These discontinuities must be controlled in order to keep the resulting reflections to a minimum. Therefore, producing a via/bend-minimum layout for MCM is very important. Furthermore, to reduce the voltage drops at the voltage (or ground) vias, the vias have to be placed far apart from each other. Note that the redistribution vias are neglected in the calculation of the voltage drop because they are five times shorter than the signal vias.
- Cross-talk between signal lines: Cross-talk noise is a parasitic coupling (i.e., mutual capacitances and inductances) phenomenon between neighboring signal lines. This noise becomes serious in a high-density wiring substrate with narrow line pitch. The closer the lines, the higher they are from the ground plane, and the longer they are adjacent - the larger the amount of coupling is. The couplings between the lines can be minimized by making sure that no two lines are laid out in parallel or next to each other for longer than a maximum length. Noise can be reduced by keeping the lines far apart. The couplings can also be reduced by placing ground lines between signals. For the signal lines running in parallel in adjacent planes, we can control the impedance of the signal lines by placing a ground plane between the planes. We can also eliminate the problem by forcing wires of one layer to be laid out orthogonally (0 and 90 degrees) and wires of its adjacent layers to be laid out diagonally (45 and 135 degrees).
- The skin effect in thin-film interconnections for ULSI/VLSI packages: As the rise time of digital pulses is reduced to the subnanosecond range, the skin effect becomes an important issue in high-speed digital systems. The skin effect is defined as follows: an effect characteristic of current distribution in a conductor at high frequencies by virtue of which the current density is greater near the surface of the conductor than in its interior. As the frequency, the conductivity, and the permeability of the conductor is increased, the current concentration is increased. It results in increasing resistance and decreasing internal inductance at frequencies for which this effect is significant. The conductor loss caused by the skin effect is

the most significant contributor to losses in a micro-strip line at high frequencies. It was shown in [10] that the maximum length of interconnections in an ULSI/VLSI package is limited by the skin effect. Therefore, the maximum length (upper-bound of wire pattern length plus via length) can no longer be ignored.

All kinds of noises mentioned above increase delay or cause inadvertent logic transitions, so that they should be minimized through careful design.

Besides those performance issues, there are a number of routing requirements for MCMs [17]. Among them are to:

- 1) Minimize delay, given a priority weight and maximum/fixed delay assignment
- 2) Equalize delay for signal groups, with specified tolerance
- 3) Produce minimum bends
- Handle stacked and unstacked vias and to control the number of vias
- 5) Assign layers constrained with maximum number of vias allowed to a net
- 6) Regulate lengths and delays to meet timing and noise margins
- 7) Be easily extensible as technologies change
- 8) Constrain routes to specified layers (e.g., designated power, ground or signal layers)
- Pick the right kind of via based on the layer change and technology used
- 10) Specify route ordering.
- 11) Accept preferred directions on specific layers or nets

Note that typical wiring rules include topology choices [7], [12] (e.g., distributed or clustered load by stub length control and load ordering). However, in this paper, we propose a flexible routing strategy that best meets most of routing requirements listed above.

# B. Specific Requirements for Different MCM Technologies

These includes handling via types unique to each MCM type (-D, -L, -C) and handling redistribution layers (even if currently few MCMs use them). The comparisons of routing results both on different via types and on different strategies of without/with using pin redistribution layers are shown in Section 8.

Also, design rules differ between technologies. The minimum spacing between signal wires in signal distribution layers is usually integral multiples of the chip contact pad spacing (i.e., in *TCM* [2], 0.25 *mm* line pitch for the chip contact pad spacing and 0.5 *mm* line pitch for signal distribution layers is used). Thus, the pin redistribution layers are used for providing a minimum spacing between signal wires in *signal distribution layers* in some MCMs.

However, in our multilayer routing model for MCMs, we generalize the model by allowing multiple wiring tracks between adjacent vias. The generalized model has been used for *Thin-Film Multi-Chip Module (TFMCM)* [14].

## IV. PROBLEM FORMULATION

To define the routing problem for an MCM, we adopt the conventional multilayer routing environment involving multi-



Fig. 2. An instance of MCM routing,  $O \in S$ ,  $X \in T$ .

terminal nets. We assume that a path goes from cell to cell rather than from grid-point to grid-point. Each plane of an *MCM* consists of a two-dimensional  $m \times m$  grid (called a *basic-grid*), being a square tessellation of the plane, with  $1 \times 1$  being the basic cell-grid size. Therefore, each cell contains at most one pin.

Formally, an instance of the routing problem is a 6-tuple,  $(k, m, S, T, \lambda, \sigma)$  (see Fig. 2), where:

- 1) k: In our model, the routing environment is a collection of k layers, including a chip layer, a number of pin redistribution layers and signal distribution layers.
- 2) m: The number of rows (or columns) of the tessellation.
- 3) S: The set of grid pins (or contact pads) in the basic-grid of the chip layer (i.e., the top layer of MCM, where each pin in S is marked with "O"). The set is denoted by S = {(i,j)|i, j ∈ {0,...,m-1}}, where (i, j) is the cell containing a terminal of S. Pins are labeled with 1, 2, ..., n. Two pin configurations [2] are used: square package with pins at the perimeter and a full two-dimensional array of pins (the densest configuration for a chip carrier)<sup>1</sup>.
- 4) T: The set of grid points of the imposed uniform via-grid (as marked with "X" in Fig. 2), denoted by  $T = \{(i, j) | i = 0 \pmod{\sigma} \text{ and } j = 0 \pmod{\sigma}, i, j \in \{0, \ldots, m-1\}\}$ .  $|T| \geq |S|$  so that it is possible to redistribute all pins  $\in S$  over the uniform via-grid. Note again that the uniform via-grid is used for pin redistribution.
- λ: Intersecting wires must not be placed on the same layer, and adjacent wires on the same layer must be separated by a specified minimum spacing λ.
- 6)  $\sigma$ : The distance between two adjacent via-grid points in the pin redistribution layers. We refer to it as the *separation*. After pin redistribution, signal distribution

layers contain vias to be placed on grid points with a spacing of  $\sigma$ . The separation  $\sigma$  determines the minimum spacing for *signal distribution layers* in [2]. However, we allow multiple wiring tracks between adjacent vias. Thus, in each signal distribution layer,  $(\lfloor (\sigma - 2)/\lambda \rfloor + 1)$  signal lines (tracks) are allowed to run between adjacent via-grid points. Note that  $\sigma$  should be greater than or equal to  $\lambda$  so that the separation  $\sigma$  does not violate the minimum spacing in signal distribution layers.

Here we call the set of pins in S and the set of grid points in T pin terminals and via terminals, respectively. We refer to a *net* as the set of electrically equivalent pin terminals. The model does not assume that via holes are drilled through all the k layers. Thus, a path on layer l can either change layers at any via terminal which is not occupied by any other net or cross over any empty via terminal. This allows the same x-y location to be used as a via terminal by different layers, thus reflecting the availability of segmented vias in the MCM fabrication technology [20]. Each of pin redistribution layers and signal distribution layers consists of a via-grid superimposed uniformly on the basic-grid. The cell-grid size of via-grid for pin redistribution layers is  $\sigma \times \sigma, \sigma > 1$ . Note that the cell-grid size of via-grid for signal distribution layers is  $\sigma \times \sigma$  or  $\lambda \times \lambda$  (we shall discuss in detail in Section 7). Thus, vias are only allowed in such via-grid points.

A single-layer routing is to assign a net entirely to a specific layer and an xy plane-pair routing is to assign a net to layers by splitting a net into one or more layers, i.e., to allow preferred wiring directions, horizontally or vertically, to layers.

# V. OVERALL ROUTING STRATEGY

In MCM routing, first, we do the pin redistribution, which aims to distribute pins uniformly using the pin redistribution layers. The problem of pin redistribution is to assign  $s \in S$ into the closest unique via-grid point  $t \in T$  in the uniform grid. We refer to it as the pin redistribution (PR) problem (PRP). As mentioned earlier, the pin redistribution layers have been used for providing a minimum spacing between signal wires in signal distribution layers in some MCMs [2]. As also emphasized in [2], PRP has also been used for engineering changes purpose. The purpose of an engineering change action is to correct design errors, enhance design performance or modify the logic due to changing design specifications. Our experience showed that it also provides a global distribution for the pins congested in the chip site over the chip layer so as to yield less signal distribution layers and less via counts required to interconnect all nets as we shall show in Section 8.3. The pin redistribution can also be used to reduce the crosstalk problem caused by long parallel vias and signal lines by separating them far apart as we shall discuss in Section 4. Therefore, the pin redistribution is effective in practical MCM routing.

For the pin redistribution problem, we prefer single-layer routing model. Otherwise, if we use the xy plane-pair routing technique for pin redistribution, many vias between xy planepairs are introduced due to multiple bends in a net. We shall show that planar routing is necessary and sufficient.

<sup>&</sup>lt;sup>1</sup>One reason packaging has become so important is the imperative to make the central elements of a computing system exceedingly compact. Thus, the most efficient (densest) contact pad configuration for a chip carrier of MCM is a full two-dimensional array. The design significantly reduces space requirements. The advantage of the two-dimensional array format becomes even greater as the number of contact pads increases.

Minimizing the number of layers is important because it helps to increase the processing yields and improves the package performance. Therefore, in *PRP*, we aim to minimize the number of layers k such that  $\sum_{i=1}^{k} |P_i| = n$ , where  $P_i$  is a *planar subset* of n nets in the *i*-th layer and n is the number of nets. As an output of *PRP*, an S-T assignment A is specified by a set of pairs such that  $A = \{(s, t) | s \in S, t \in T\}$ . Fig. 2 presents an instance of a *PRP* with 8 pins in a  $5 \times 5$  array. In this example, one layer is necessary and sufficient for pin redistribution as shown in Fig. 5.

Finally, we use the redistributed pins as an input to wire the nets in the signal distribution layers. That is, the signal distribution layers are provided for distributing the redistributed chip I/O pins to the various wiring layers. We refer to it as the signal distribution (SD) problem (SDP). Fig. 6 illustrates three proposed strategies for signal distribution (using the redistributed pins shown in Fig. 5 as an input). To solve the SDP, we will first propose two basic signal distribution techniques: single-layer routing and xy plane-pair routing. Then, one derivative from each model will be proposed. In each signal distribution layer, either single-layer routing or xy plane-pair routing will be performed based on the various wiring rules as required for specific design applications.

Our major goal is to guarantee higher circuit performance. Thus, one good strategy is to assign the critical nets near the top layers so that the number of vias as well as via length required for those critical nets will be reduced. Since the routing procedures are being biased by the critical net data, some trade-off should be made to favor circuit performance compared with the processing cost. One effective strategy to solve *SDP* is to use the single-layer routing (planar routing) technique exclusively for the critical nets. For the remaining noncritical nets, xy plane-pair routing is performed, producing smaller number of layers compared with when using singlelayer routing. Our experimental results will show the success of the proposed approach.

If we want only to take advantage of single-layer routing, then the the stacked vias (see Fig. 3 for two via types) that connect the pins on the chip layer to the signal distribution layers are required. We refer to the via type as the primary via. That is, guaranteeing the minimum number of staircase vias (or avoiding them), the number of layers required for wiring entire nets will be greatly increased (thus, the number of stacked vias will be also increased). Otherwise, if we restrict ourself to using xy plane-pair routing, then the routing model introduces many staircase vias per net between two layers of each xy plane-pair. We refer to the via type as the secondary via. The delay difference between two via types in thin-film MCMs is minuscule at less than 10 GHz [8]. However, in certain high-speed applications, we cannot ignore the delay effect on the z-direction bends introduced by staircase vias. Thus, a stacked via may be preferred to a staircase via (i.e., as illustrated in Fig. 3, one staircase via introduces two zdirection bends). We shall show the comparison of using two via types in Section 8 (also illustrated in Fig.  $6 \sim 9$ ).

The following equation summarizes the cost of MCM routing: Cost =  $w_k \times$  number of layers + $w_{strv} \times$  number of staircase vias + $w_{strk} \times$  number of stacked vias + $w_l \times$  total



Fig. 3. Two via types: a *stacked via* is referred to as the via when a net changes a layer at the same grid points, whilea *staircase via* is referred to as the via introduced when a net changes layers at the new position.

wire length, where,  $w_k$ ,  $w_{strv}$ ,  $w_{stkv}$  and  $w_l$  are constants that control the relative importance of the number of layers, number of staircase/stacked vias and the wire length, respectively. Based on the above cost function, one can choose the best MCM routing strategy for her/his own application.

# VI. PIN PRE-WIRING AND REDISTRIBUTION

As we mentioned before, in MCM routing, we first redistribute pins attached on each chip in chip layer over the uniform grid of *pin redistribution layers*. We emphasized in the previous section that separating vias far apart is important in MCM design. Pin redistribution strategy can provide a global distribution for the pins congested in the chip site over the chip layer so as to ease the routing difficulty in the successive layers. It also reduces the cross-talk problem caused by both long parallel vias and long parallel signal lines. The uniform grid in every pin redistribution layer is formed based on the following design requirements. Note that the separation  $\sigma$ exists in the following range:

$$\lambda < \sigma < m/(\lceil \sqrt{|S|} \rceil - 1). \tag{1}$$

The value of  $\sigma$  is determined by the designer. That is, if we want to prefer a redistribution with shorter nets rather than one with wider separation, we set  $\sigma = \lambda$  (i.e., the case where |T| is the largest); otherwise, if we want to maximize  $\sigma$ , we set  $\sigma$  to be the upper bound of the above inequality (i.e., the case where |T| is the smallest). Note that the separation  $\sigma$  may affect the number of layers required [4].

In [4], various approaches to PRP have been introduced. We briefly describe one approach – the Concurrent Maze Router (CMR) — as follows. Note again that in PRP we aim to minimize the number of layers k such that  $\sum_{i=1}^{k} |P_i| = n$ , where  $P_i$  is a planar subset of n nets in the *i*-th layer and n is the number of nets. For the balanced wire distribution of nets to utilize the routing area maximally, an algorithm should process all nets simultaneously. Thus, one idea is to allow the routing procedures dictate the best choices by giving them equal priorities. We consider a two-dimensional array, which consists of  $m \times m$  identical cells, as defined in Section 4. In each layer, we process the  $m \times m$  plane grid, cell by cell, from the upper-left corner to the lower-right corner in the array. We classify the cells into even and odd cells. Every iteration

consists of two phases: in the first phase, we scan all even cells expanding the corresponding nets into the adjacent odd cells; in the second phase, all odd cells are scanned. The sum of x and y coordinates of each even square is even, and odd for each odd square. This ordering scheme is referred to as even-odd ordering. In each step of the concurrent expansion process, we propagate the cells concurrently based on the ordering scheme and perform the following two tasks: Phase 1. wave propagation, Phase 2. backtrace. During the wave propagation, the pairs of points that are closest together, are connected first. Once a pin terminal reaches to any via terminal, the two terminals are interconnected using backtrace algorithm. We continue this concurrent expansion process for all nets until one of the following situations occurs: either nets that have not yet been able to find their via terminals have been blocked by existing nets or all nets have been assigned to via terminals to the current layer. There might exist some pin terminals yet-unassigned to via terminals in the current layer. In this case, they are brought to the next layer, where the same greedy strategy would be applied again for them to be assigned to the closest unoccupied via terminals in the next layers. This process is repeated until every pin terminal is assigned to its unique via terminal. We refer to the above algorithm as greedy-PR.

However, the technique described above is only preliminary and does not take any timing issues into consideration. The new approach to PRP is as follows. We refer to a *s*-net s' as the set of pin terminals of partially connected net. We denote by S' the set of s'. Instead of assigning a single pin terminal to a via terminal as in PRP, an s-net is to be assigned to a via terminal in the new problem. We refer to the assignment (s', t) as a *u*-net  $\in U$ . We refer to the problem as the Pin Pre-Wiring and Redistribution (PWR) Problem (PWRP). As described earlier, we aim to minimize k such that  $\sum_{i=1}^{k} |P_i| = |U|$ , where  $P_i$  is a planar subset of an  $S' \to T$  assignment U in the *i*-th layer. As an output of PWRP, an  $S' \to T$  assignment U is specified by a set of pairs such that  $U = \{(s', t)|s' \in S', t \in T\}$ .

Given a routing plane with obstacles, we formulate the PWRP as a constrained rectilinear Steiner minimum forest (CRSMF). Given a set of points on the plane, the rectilinear Steiner minimum tree (RSMT) problem is to find the rectilinear tree in the plane, of minimum total length, which connects the given set of points. The CRSMF is either a RSMT of entire point set S or a set of disjoint RSMTs of point set  $S \cup T$ . For the latter case, we impose a constraint that each component should contain exactly one point in T. Thus, the PWRP is equivalent to the problem of finding CRSMT for each net, minimizing the number of layers.

We employ the greedy-PR with some extensions. This approach is also considered to be effective in PWRP, because the PWRP has a particular property that the via terminal for each pin terminal is undefined. That is, each s-net can be assigned to any closest via terminal (the PWRP is a less constrained version of the traditional multi-terminal routing problems). Our approach to PWRP is as follows. We define the bounding box for each net as the smallest rectangle enclosing all terminals belonging to the same net. The underlying idea is to choose via terminals inside a net's bounding box rather than ones outside. By establishing u-nets inside the bounding box, we can reduce wire length further compared with when applying PRP (illustrated in Fig. 2). If all via terminals inside the bounding box of net i are already occupied by any other u-net, then the net is allowed to find the closest via terminal outside the bounding box.

In each layer, three steps are considered: Step 1. Setup, Step 2. Interior Bounding Box Routing, Step 3. Exterior Bounding Box Routing. In the setup stage, every cell is initialized with its cell-type and cost. In Step 2, during the concurrent expansion process, we try to find a CRSMF by successively combining two s-nets into a larger s-net until every s-net becomes a u-net. In this step, the concurrent expansion for a s-net is only allowed inside the corresponding bounding box. Once two s-nets belonging to the same net are to be merged into a larger s-net all the relevant information is updated along paths of each s-net.

If there remain s-nets unassigned to any via terminal in current layer, Exterior Bounding Box Routing procedure is performed to find more planar subset in the current layer. Here, the s-nets, which failed to find their via terminals inside the Interior Bounding Box region, are allowed to expand toward the closest via terminals outside the corresponding bounding boxes. If s-nets unassigned to via terminals still remain in the current layer, then they are brought to the next layer, where the same greedy strategy would be applied again for them to be assigned to the closest unoccupied via terminals in the next layers. This process is repeated until every s-net becomes a u-net. We refer to the above algorithm as greedy-PWR.

Fig. 3 illustrates the above process step by step using the instance shown in Fig. 2, as follows: (a) first iteration : all even cells are activated. Net 2 in (0,0) in chip layer is coincident with Target terminal (0,0) in pin redistribution layer, thus they are interconnected using a via. Net 4 in (1,1)are expanded into their two empty adjacent cells. (b) second iteration : the scanner activates the odd wave-front cells at this time, such as Net 1 in (1,0), Net 3 in (0,1), Net 4 in (2,1)and (1,2), Net 3 in (3,2) and (2,3), Net 1 in (4,3) and Net 4 in (3,4) in Fig. 3(b). All nets except Net 3 have been assigned to its closest via-grid points, to be an obstacle to future wiring by assigning a higher cost. (c) third iteration : note that Net 3 in (1,3) is not allowed being expanded to its neighboring cells outside the net's bounding box (e.g., (1,4)) during Interior Bounding Box Routing stage. (d) fourth iteration : Net 3 successfully found its via terminal (0,2) inside the interior bounding box.

The complexity of the algorithm  $greedy_PWR$  is  $O(km^3)$  time. However, indeed we can show that the typical run time is  $O(km^2)$  since the average length of nets is very short in the most instances of PWRP.

#### VII. SIGNAL DISTRIBUTION

As we defined before, the *SDP* is given a set of nets  $N = \{N_1, \ldots, N_n\}$  with terminals redistributed over *viagrid* during the pin redistribution process, to route all nets optimizing MCM performance. Our strategy is to do *single-layer routing* first for more critical nets, followed by xy



Fig. 4. A comparison between PRP and PWRP: (a) PRP; (b) PWRP.



Fig. 5. A step-by-step solution to PWRP using greedy PWR.

plane-pair routing for less critical nets. That is, we first iterate single-layer routing until  $\alpha$ % of the nets have been routed, then route other  $(100 - \alpha)$ % nets by applying xy plane-pair routing process iteratively. Each of signal distribution layers consist of a via-grid superimposed uniformly on the basic-grid as we defined in Section 4. Note that vias are allowed in via-grid points is  $\sigma$ .

Let the *i*th layer of signal distribution layers be  $\Gamma_i$ , and  $\alpha_i$  be the percentage of nets routed using layers  $\Gamma_1$  to  $\Gamma_i$ . We apply iterative single-layer routing for each layer  $\Gamma_i$ ,  $1 \leq i \leq k$ . Then, xy plane-pair routing is employed to route those unrouted nets, where  $\Gamma_{k+2j-1}$  and  $\Gamma_{k+2j}$  (j > 0) form xy plane-pairs. In this section, we will propose various routing approaches based on the two routing paradigms. Fig. 4 compares three strategies to SDP.

#### A. Iterative Single-Layer Routing

The proposed single-layer routing works in the following way. In each layer, we first employ *Force-Routing* – an extension of the technique, *Density Algorithm* [15] – whose



(c)
Fig. 6. Three different approaches to signal distribution using the redistributed pins shown in Fig. 5: (a) Single layer routing (vias=6, length=20);
(b) XY-reserved routing (vias=7, length=24); (c) XY-free routing (vias=3, length=24).

goal is to maximize the number of routed nets in each layer.

The density algorithm first determines the routing sequence based on a linear combination of the estimated congestion, net length and priority. Then, it finds a routing path, being a sequence of cells, for each net (one net at a time), avoiding "congested" areas. The density algorithm works as follows.

- Step 1: obtain a square tessellation of the routing plane with each tile's size being  $w \times w$ , where w is an input parameter to Density Algorithm.
- Step 2: establish the information of estimated congestion, which is stored in a data structure, *Congestion-Map* (CM).
- Step 3: determine the sequence of nets to be processed according to the value  $f(N_i)$ , which is a linear combination of  $L(N_i)$ ,  $G(N_i)$  and  $P(N_i)$ , where  $L(N_i)$  is the half-perimeter of the bounding box (in  $L_1$  metric),  $G(N_i)$  is the estimated congestion, and  $P(N_i)$  is priority of the net.
- Step 4: find the shortest routing path, one net at a time, that (1) avoids passing through "congested" areas (according to the information in CM), (2) satisfies the capacity constraint on each tile's boundary, and (3) maintains the planarity in each tile.

The variable w in density algorithm determines how much information is provided for detailed routing. As w gets smaller, more precise information are provided; when w = 1, the solution is indeed a detailed routing. In force-routing, we set the value of w to be one. Note that in this model entire net is assigned to a layer without splitting it into different layers.

We also propose an optional operation, *shrink-box algorithm* (SBA), as the post-processing of *Force-Routing*. The objective of shrink-box algorithm is to utilize the unused space left

by force-routing to reduce the size of the bounding box for each unrouted net. The bounding-box of a net  $N_i$  (denoted by  $Box(N_i)$ ) is the smallest rectangle that encloses all terminals of  $N_i$ . An outline of SBA is as follows.

- Step 1: Select one unrouted net  $N_i$  (according to the sequence of routing determined in density algorithm).
- Step 2: Let  $T_{Box}^i$  be the terminals of  $N_i$  that are in  $Box(N_i)$ , and  $GC_i$  be the geometric center of all terminals of  $N_i$ .
  - Step 2.1: For each terminal t<sub>j</sub> ∈ T<sup>i</sup><sub>Box</sub>, move t<sub>j</sub> toward GC<sub>i</sub> using one "L" shape path (i.e., only one horizontal and one vertical segment) as close as possible.
  - Step 2.2: Update the positions of moved terminals and  $Box(N_i)$ , and recalculate  $GC_i$  for  $Box(N_i)$ .
- Step 3: Repeat Step 2 until  $Box(N_i)$  cannot be shrunk anymore.

The SBA is effective, since it improves the routability (using free space left in current layer) in the next layers. Note that in this model a partial net can be assigned to a layer by splitting it into different layers.

The experimental results show that SBA indeed reduces the number of layers needed to complete routing. Note that SBA may introduce more staircase vias than stacked vias. Thus, in the case where stacked vias are preferred, the use of this technique should be restricted.

To incorporate various routing requirements, the proposed force-routing has been extended (from density algorithm) as follows:

1. Skin Effect Problem (with respect to the limitation [maximum] on wire length): Initially, nets with the halfbounding-box length that are near their threshold (i.e., net's length limitation) are given higher priority in the routing sequence. The priority of each net is adjusted at each layer  $\Gamma_k$ , for each net  $N_i$  with  $allowance(N_i, k) =$  $((maximum(N_i) - accumulated(N_i, k-1))/L1(N_i, k)$  less than *critical\_ratio* (a parameter specified by layout designer). Here,  $maximum(N_i)$  is the maximum wire length allowed for net  $N_i$ ,  $accumulated(N_i, k)$  is the sum of wire length for  $N_i$  spanning from layer  $\Gamma_1$  to  $\Gamma_k$  and  $L1(N_i, k)$  is the lowerbound of  $N_i$  in layer  $\Gamma_k$ . The intuition behind the approach is that when the wire length of a net is about to exceed its threshold, it should be given higher priority during the routing process. Note that when the number of vias is considered as an important factor in delay, this method can be easily modified to accommodate the situation.

2. Cross-Talk Problem (with respect to a linear cross talk): The linear cross talk is defined by  $parallel\_allowed(i, j)/\lambda$ , where two different nets *i* and *j* can be parallel to each other with a spacing  $\lambda$  This problem can be resolved by modifying the path-finding process of density algorithm. When a wire has more than one option (at certain point) during the process, the directions that cause the wire to be parallel with another wire for larger than the accumulated linear cross-talk between two nets are charged with a higher cost. Thus, once the "bad" direction is charged with a higher cost, then the situation that two nets run parallel to each other is avoided.

#### B. XY Plane-Pair Routing

Next, we propose an xy plane-pair routing process, which is a direct extension from force-routing with following modifications.

- Two layers of routing plane,  $\Gamma_{k+2j-1}$  and  $\Gamma_{k+2j}$ , are used in each iteration of the process.
- In the upper layer  $\Gamma_{k+2j-1}$  (X plane), we set the cost to take vertical expansion to be higher than the horizontal direction; a symmetric cost assignment is used in the lower layer (Y plane).
- The cost for a path to change layer is determined by the allowance for vias to be used (for example, if via is not preferred, we set the cost of changing layer to be much higher than expanding in other directions).

The unrouted nets in a plane-pair  $\Gamma_{k+2j-1}$  and  $\Gamma_{k+2j}$  are brought to the next plane-pair ( $\Gamma_{k+2j+1}$  and  $\Gamma_{k+2j+2}$ ), where xy plane-pair routing process is again employed. This process is repeated until all nets are routed. Note that using this routing process each plane-pair may contain staircase vias for each net that travels the two layers up and down.

The following two models are employed for xy plane-pair routing:

- **XY-reserved model** In each plane-pair, one layer permits only x-direction wiring, and the other permits only Y-direction wiring. Every bend in a net introduces vias between one plane and the other;
- **XY-free model** On every layer, both x- and y-direction wiring is allowed. In this case, bends in nets do not necessarily introduce vias.

In each of those strategies, if there remain nets that could not be laid out entirely in each plane-pair, then they are extended to the next plane-pair by introducing vias. The proposed xy plane-pair routing process can be applied at any stage during the interconnection phase. For example, we can apply it after (or before) the iterative single-layer routing process as decided by the designers.

Note that xy plane-pair routing process not only inherits the performance-driven characteristics (the capability to deal with skin effect and cross-talk problems) from force-routing, but also incorporates the delay associated with discontinuities (vias) by charging higher cost when a wire is trying to change from one plane to another.

# VIII. EXPERIMENTAL RESULTS

The devised multilayer routing algorithm has been implemented in C on a SUN SPARC station IPC, under UNIX. As shown in Table I, as a test of effectiveness, a number of practical MCM examples were used for experimentation. In real MCM circuits, high-speed nets suffer if there are more than four loads. However, a lot of nets are not high speed or need to "talk" to multiple ICs. Thus, the multiplicity of nets (number of terminals per net) in MCMs varies from 1 to 20 [8]. The multiplicity varies from 1 to 10 in our experimentation.

Two pin configurations [2] are used in our experimentation; that is, a square package with pins at the perimeter (MCM1, MCM2, and MCM3) and a full two-dimensional array of pins

| MCM ROUTING INSTANCES |                |   |                     |                     |  |  |
|-----------------------|----------------|---|---------------------|---------------------|--|--|
| Sample                | # chips        | σ | size $(m \times m)$ | # nets (n)<br>/pins |  |  |
| MCM1                  | 7 × 7          | 2 | $100 \times 100$    | 659/2,179           |  |  |
| MCM2                  | 8 × 8          | 2 | 120 × 120           | 865/2,842           |  |  |
| MCM3                  | 8 × 8          | 2 | 180 × 180           | 1109/4,160          |  |  |
| MCM4                  | 5 × 5          | 2 | 78 × 78             | 213/694             |  |  |
| MCM5                  | $10 \times 10$ | 2 | 149 × 149           | 848/2,772           |  |  |

TABLE I MCM ROUTING INSTANCES

TABLE II Test Results of greedy-CMR ( the Numbers in Parenthesis Show the Result of Applying Pin Redistribution without Prewiring)

| Sample | layers | %<br>redistributed<br>(2 layers) | length        | time (sec). |
|--------|--------|----------------------------------|---------------|-------------|
| MCM1   | 5      | 93.5                             | 4,204         | 124.5       |
| MCM2   | 6      | 92.5                             | 5,646         | 187.9       |
| MCM3   | 3      | 99.7                             | 4,904         | 463.2       |
| MCM4   | 3(3)   | 91.4 (96.4)                      | 1,970 (1,685) | 20.3 (1.7)  |
| MCM5   | 3(3)   | 94.6 (96.3)                      | 7,753 (6,736) | 186.6 (6.6) |

(the densest configuration for a chip carrier) (MCM4 and MCM5).

# A. Results on Pin Redistribution

The greedy-CMR successfully assigns about 92% or more of the pins to the uniform grid using two pin redistribution layers as shown in Table II. The remaining nets are now sparsely distributed over the chip layer because pins are redistributed with short nets using the first two layers and the local congestion is reduced. Therefore, the remaining pins can be redistributed using no more than three layers in most instances, as shown in Table II.

Furthermore, we could observe that the 8% of pins unassigned to the uniform grid (after two successive layers of pin redistribution) are already separated from the pins redistributed over the two layers. Thus, the pins do not have to find its via terminals in a long distance in the kth layer,  $k \ge 3$ . Therefore, the number of layers required for the pin redistribution can be practically limited by two. The remaining pins can be assigned directly to the signal distribution layers without redistributing them.

Note again that the PWRP is a more restricted version of PRP such that an s-net unassigned to any via terminal attempts to find the closest via-terminal lying inside the net's bounding box. However, the closest via-terminal may reside outside the bounding box. Therefore, we could observe that applying PWR increases the total wire length in pin redistribution layers with more processing time compared with when applying PR. However, the strategy significantly reduces nets' total wire length, number of vias and number of layers after signal distribution as we shall show in Section 8.3 (as shown in Tables III and IV ).

# B. Results on Various Routing Strategies

In this section, to provide a trade-off between the various performance parameters (i.e., number of layers, number of stacked/staircase vias, total wire length and delay), we com-

pare the results on various signal distribution strategies as shown in Figs. 7-9:

- Model 1. SLR $\alpha$  + XY-reserved with SBA;
- Model 2. SLR $\alpha$  + XY-reserved without SBA;
- Model 3. SLR $\alpha$  + XY-free with SBA;
- Model 4. SLR $\alpha$  + XY-free without SBA;

where SLR $\alpha$  denotes the strategy of using iterative singlelayer routing until  $\alpha$ % of the nets are routed, then routing other  $(100 - \alpha)$ % nets by xy plane-pair routing process. Here, for the case of  $\alpha = 100$ , we use only single-layer routing iteratively until all nets are routed, Whereas, for the case of  $\alpha = 0$ , we use only xy plane-pair routing iteratively until all nets are routed. The running time of the algorithm for MCM3 (the largest size) was the minimum (5 hrs) when applying SLR100 and the maximum (9 hrs) when applying SLR0.

First, we compare the routing results with/without applying SBA. Fig. 7 shows that when SBA is applied, the number of layers and vias (more staircase vias, but less stacked vias) are significantly reduced at the cost of wire length increase. When the number of vias are dominating factors in delay estimation as in high performance MCMs (above 1 GHz), the method will be particularly desirable.

Next, we compare the routing results on various strategies as in the following subsections.

Lower Bound Estimation: The average routing area utilization (RAU) achieved in each layer is calculated for each strategy. Notice that the RAU reflects the routing difficulty (e.g., depending on pin population and net congestion). SLR100 maximized the RAU in the 1st layer, i.e., 59% (30%) using (without using) SBA in the first layer, and the RAU is decreased in the next layer as the number of routing layers increases. Whereas, in both SLR0 + XY-reserved and SLR0 + XY-free model, the RAU is almost constant for every layer (75% on the average)<sup>2</sup>. Thus, the maximum routing density achievable in a single layer occurs when we use those two models. Note that the two models can be best used for the case where producing uniform density on every layer is imperative.

We estimate the lower bounds on various parameters as follows. We estimate the lower bound length of net i as half perimeter of smallest bounding box containing all the terminals in the net *i*. However, note that the lower bound length is loosely estimated for the following reason. Let us assume that we want to find an optimum global routing (whose goal is to minimize the density of global cells) in a two-dimensional array, constrained to allowing each net to be wired using the estimated lower bound length. It is clear that the global density, i.e., the maximum density of global cells, is a lower bound on the number of lavers required. Even if it is often true that global density will be lower if we route the nets with shortest routes, there exists the case where the shortest routes do not necessarily lead to the minimum density. In other words, if we route nets using the estimated lower bound length for each net, then the number of layers required may be greatly increased, because of nets' congestion. Thus, we may allow some non-

 $^{2}$  [20] also showed that the maximum routing density achievable on a single layer is about 75%, on the average.

critical nets to be routed using "detour" to arrive at a desirable multilayer routing solution.

We estimate the lower bound on the number of layers,  $\Gamma_{low}$ , as follows. Note that the lower bound  $\Gamma_{low}$  is a function of congestion, average/total lower bound length and RAU. Since we have not yet been able to find a "tight" lower bound on the number of layers using all those metrics, we consider only two metrics,  $L_{low}$  and A, where  $L_{low}$  is the total lower bound length of all nets and A is the area of MCM substrate, as follows. If we assume, without loss of generality, that we could maximize the routing area utilization up to 75% in each layer (as shown in the previous paragraph).

$$\Gamma_{low} = L_{low} / (0.75 \times A). \tag{2}$$

Note that the loosely estimated  $\Gamma_{low}$  is far below an optimal one.

We also assume without loss of generality that RAU is constant for every layer as we observed that when we use either SLR0 + XY-reserved or SLR0 + XY-free model, then, the estimated lower bound on the number of vias is:

$$V_{low} = U \times \Gamma_{low}/2 \tag{3}$$

where U is the total number of pins. Only primary vias are involved in calculating the lower bound on the number of vias; i.e., secondary vias are not included in the lower bound. As we pointed out, in xy plane-pair routing model, each planepair may contain staircase vias for each net that travels up and down between two layers. Thus, note that the value  $V_{low}$  is a rather loose lower bound. However, it does provide some intuition on how well we are doing.

The delay model (only concerned with geometric parameters) used for the experimentation is as follows. The geometric delay for each net *i* can be estimated as  $d_i^G = k_0 \times l_i + k_1 \times$  $stk_i + k_2 \times str_i$ , where  $l_i$  is the actual routing length,  $stk_i$  $(str_i)$  is the number of stacked vias (staircase vias) used for connecting net *i* and  $k_0$  is the delay constant of unit net length,  $k_1 (k_2)$  is a delay constant per stacked via (staircase via). Then, we can estimate the lower bound on the geometric delay as follows:

$$D_{low} = k_0 \times L_{low} + (k_1 + k_2)/2 \times V_{low}.$$
 (4)

Distributions on Performance Ratios as a Function of Five Mixed Strategies: What is most likely to produce an unacceptable (desirable) solution, i.e., solutions producing the largest (smallest) values of wire length, vias, layers and the estimated geometric delay? We provide four types of distributions on the performance ratios (i.e., actual bound /lower bound) based on the estimated lower bounds, over the five mixed strategies  $(\alpha = (100, 75, 50, 25, 0))$  with/without using SBA. That is, in Fig. 7, the estimated lower bounds performing four basic algorithms (Models 1, 2, 3, and 4) is plotted as a function of  $(\alpha = (100, 75, 50, 25, 0))$ . Based on the lower bound estimation, our experiments showed that for every routing model the performance ratio curves on both total wire length and number of layers shift monotonically as  $\alpha$  decreases. Whereas, the performance ratios on number of vias for Models 1, 2, 3, and 4 were at the minimum when  $\alpha = 100, 50, 50$  and 0, respectively.

Based on the above lower bound estimation, our experiments showed that the performance ratios on the number of vias, total wire length, number of layers and delay are, as shown in Fig. 7, only about 2.7, 1.2, 2.8, and 1.4, respectively, on the average, when we select the best strategy in terms of those parameters. Note that lower bounds on those metrics are estimated far below the optimal one.

For results on various signal distribution strategies we learned that as the number of iterations for single-layer routing increased, the total wire length is further increased, and the number of layers also increased.

It is interesting to know that among all mixed strategies of  $SLR\alpha$  + XY-reserved model the strategy of SLR50 without SBA (i.e., Model 2 with  $\alpha = 50$ ) is the best in terms of the estimated delay (where  $k_0 = 1, k_1 = 1.2$  and  $k_2 = 1.2$ ). The reason is that the wire length difference between maximum and minimum value is small compared with the via-count difference between maximum and minimum value as shown in Figs. 7(a) and 7(b). That is, the strategy of  $\alpha = 50$  showed the minimum via-counts in the V-shaped curve of Model 2. The strategy of SLR0 + XY-free model without using SBA were performed best (with little difference from the strategy of SLR50 + XY-reserved model) among all strategies in terms of the estimated delay (where  $k_0 = 1.0, k_1 = 1.2$  and  $k_2 = 1.2$ ). Distribution on Stacked/Staircase Via-Count as a Function of Five Mixed Strategies: What is most likely to produce an unacceptable (desirable) solution, i.e., solutions producing the largest (smallest) number of stacked/staircase vias? In Fig. 8, we also provided the distribution on the number of stacked (staircase) vias as a function of the five routing strategies (only the result on Sample MCM2 is shown; the others showed similar results). Note that the more (less) iterations of singlelayer routing produces fewer staircase (stacked) vias except for applying Model 3.

Distribution on Delay Performance with Different Weights on Two Via Types: As we mentioned before, we cannot ignore the delay effect on the z-direction bends introduced by staircase vias at 10 GHz or more. In that case, the stacked via should be preferred to the staircase via. Therefore, we performed another test by setting  $k_0 = 1.0, k_1 = 1.2$  and  $k_2 = 2.0$ . Fig. 7(c) shows that SBA indeed reduces the number of layers needed to complete the routing compared with when we do not use the procedure. However, applying SBA introduced more staircase vias as shown in Fig. 8. With higher weight on staircase vias than on stacked vias (e.g.,  $k_0 = 1.0, k_1 = 1.2$  and  $k_2 = 2.0$ ), an experiment showed that there is a larger delay difference between two models with/without applying SBA, as shown in Fig. 9(b), compared with the case of Fig. 9(a). Thus, in the case where stacked vias are preferred, the use of SBA should be restricted.

Since producing via/bend-minimum layout is very important, especially for high-performance MCMs, we next impose higher weight on both vias such as  $k_0 = 1.0, k_1 = 2.0$  and  $k_2 = 2.0$ , as shown in Fig. 9(c). Since xy-reserved model produced more vias than xy-free model (when  $\alpha \leq 50$  as shown in Fig. 7(b)), Model 4 were performed best among all strategies in terms of the estimated delay (where  $k_0 =$  $1.0, k_1 = 2.0$  and  $k_2 = 2.0$ ). Thus, for the case where the



Fig. 8. Stacked/staircase via-count distributions as a function of  $\alpha$ .

(b)

(a)

delay effect on vias is more critical than one on wire length (e.g.,  $k_0 = 1.0, k_1 \ge 1.2$  and  $k_2 \ge 2.0$ ), the use of xy-reserved model should be restricted.

Distribution on the Number of Nets as a Function of Wire Lengths/Stacked Vias/Staircase Vias: In Fig. 10, we also provided three types of distributions: distributions on the number of nets versus wire lengths/stacked vias/staircase vias. The stick diagram represents nine strategies such that sticks lie between two intervals if their corresponding wire lengths lie between two intervals. The thickest solid line represents SLR0 model without SBA. Except for the model, the thinner solid (dotted) stick represents more iterations of single-layer routing with using SBA (without using SBA). On the average, the percentage of nets routed using less than 50 unit length is about 60%. In both XY-reserved and XY-free model, the more (less) iterations of single-layer routing lengthens (shortens) the tail of the wire length distributions as shown in Fig.10(a).

As shown in Fig. 10(b) (the result on Sample MCM2 is shown only; the others showed similar results), we provided the distribution on the number of nets versus staircase vias over the nine routing strategies. The approaches without SBA shortens the tail of the staircase via distributions in both XYreserved and XY-free model. The five strategies of SLR $\alpha$  + XY-free, where  $\alpha = (100,75,50,25,0)$ , yielded the distribution such that 50% ~ 65% of nets are routed using less than five staircase vias. In that case, the maximum (minimum) case occurs when applying SLR0 (SLR50). Whereas, as shown in Fig. 10(c), the five strategies of SLR $\alpha$  + XY-reserved, where  $\alpha = (100,75,50,25,0)$ , yielded the distribution such that 40% ~ 70% of nets are routed using less than five staircase vias. In that case, the maximum (minimum) case occurs when applying SLR0 (SLR25).

(d)

(c)

In Fig. 10(d), we provided the distribution on the number of nets versus stacked vias over the nine routing strategies. Both XY-reserved and XY-free model "with SBA" yielded the uniform distribution such that  $60\% \sim 70\%$  of nets are routed using less than five stacked vias. The "maximum (minimum)" case occurs when applying SLR $\alpha$ ,  $\alpha > 25$  (SLR0).



Fig. 9. Distribution on delay performance with different weights on two via types.

TABLE III Comparison on Two MCM Routing Approaches with PR/PWR. The Strategy of ''SLR0 + XY-Free'' Is Used for SD

|         |      | PWR   | SD     | total  | PR             | SD     | total  |
|---------|------|-------|--------|--------|----------------|--------|--------|
| wire    | MCM4 | 1,970 | 12,649 | 13,819 | 1,685          | 12,763 | 14,448 |
| length  | MCM5 | 7,753 | 66,951 | 74,704 | 6 <u>,73</u> 6 | 67,991 | 74,727 |
| number  | MCM4 | 361   | 1,802  | 2,163  | 275            | 1,902  | 2,177  |
| of vias | MCM5 | 935   | 10,160 | 11,095 | 1,100          | 10,432 | 1,1532 |
| number  | MCM4 | 3     | 6      | 9      | 3              | 8      | 11     |
| of      | MCM5 | 3     | 10     | 13     | 3              | 10     | 13     |
| layers  |      |       | l      |        |                |        |        |

We finally provided the distribution on the number of nets versus stacked vias over the nine routing strategies. Both XY-reserved and XY-free model "without SBA" yielded the uniform distribution such that  $60\% \sim 70\%$  of nets are routed using less than five stacked vias. The "minimum (maximum)" case occurs when applying SLR $\alpha$ ,  $\alpha > 25$  (SLR0). In all, we learned that both XY-reserved and XY-free model + SLR0 without SBA produced the largest number of nets that use less than (5 stacked vias + 5 staircase vias).

## C. Results on PR + SD

Finally, we tested the routing effect on pin redistribution after performing signal distribution. We first compared the routing result of using PR/PWR. Applying PWR increases the total wire length in *pin redistribution layers* compared with applying *PR*. However, after signal distribution, the strategy significantly reduces nets' total wire length, number of vias and number of layers as shown in Tables III and IV.

Next, we compared the routing result with/without applying PWR. As shown in Table V, the routing result achieved using pin redistribution was better than that of without using pin redistribution.

Table V shows that when PWR is applied the number of layers and vias are significantly reduced at the cost of greater total wire length. PWR cut down the number of layers by 24% and the number of vias by 45%, which is desirable for high performance MCMs. However, there was a 15%

TABLE IV COMPARISON ON TWO MCM ROUTING APPROACHES WITH PR/PWR. THE STRATEGY OF "SLR0 + XY-RESERVED " IS USED FOR SD

|         |      | PWR   | SD     | total  | PR    | SD     | total  |
|---------|------|-------|--------|--------|-------|--------|--------|
| wire    | MCM4 | 1,970 | 1,2490 | 1,4460 | 1,685 | 12,795 | 14,480 |
| length  | MCM5 | 7,753 | 64,652 | 72,405 | 6,736 | 72,375 | 79,111 |
| number  | MCM4 | 361   | 2,807  | 3,167  | 275   | 2,698  | 2,973  |
| of vias | MCM5 | 935   | 13,714 | 1,4649 | 1,100 | 13,940 | 1,4940 |
| number  | MCM4 | 3     | 8      | 11     | 3     | 8      | 11     |
| of      | MCM5 | 3     | 8      | 11     | 3     | 10     | 13     |
| layers  |      |       |        |        |       |        |        |

TABLE V COMPARISON ON TWO MCM ROUTING APPROACHES WITH/WITHOUT PWR. THE STRATEGY OF "SLR0 + XY-FREE" IS USED FOR SD

|                  |      | w/o PWR | w/ PWR |
|------------------|------|---------|--------|
| and a largeth    | MCM4 | 11939   | 13,819 |
| wire length      | MCM5 | 64760   | 74,704 |
|                  | MCM4 | 3843    | 2163   |
| number of vias   | MCM5 | 17707   | 11,095 |
| mumber of laws   | MCM4 | 14      | 10     |
| number of layers | MCM5 | 16      | 13     |
| actual delay     | MCM4 | 1.59    | 1.54   |
| $D_{low}$        | MCM5 | 1.46    | 1.46   |

increase in total wire length, and so performance ratio on the estimated delay ( $k_0 = 1.0, k_1 = 1.2$  and  $k_2 = 1.2$ ) was not much improved when PWR is applied. If the number of vias is a dominating factor in delay estimation, PWR will reduce the estimated delay significantly as the values  $k_1$  and  $k_2$  is increased. The experimental results showed that pin redistribution strategy provides a global distribution for the pins congested in the chip site over the chip layer so as to yield less signal distribution layers and less via counts required to interconnect all nets.

We provide the graphic printout generated by running the proposed algorithms in Figs. 11 - 13.

# IX. CONCLUSION

We proposed a new multilayer MCM routing system, called  $M^2R$ , to deal with new design features in high performance

MCMs. The cross-talk noise between vias (one of the major problem in MCM) induced by many layers (generally, up to 63 layers), was reduced by distributing pins evenly over the MCM substrate using pin redistribution layers. The experimental results verified that pin redistribution strategy provides a global distribution for the pins congested in the chip site over the chip layer so as to yield less signal distribution layers and less via counts required to interconnect all nets. For signal distribution, an innovative approach, which combines both iterative single-layer routing and xy plane-pair routing, was presented to provide a trade-off between the various performance parameters (i.e., number of layers, number of stacked/staircase vias, total wire length and delay).

Future works are as follows: (1) A flexible data structure for engineering change is needed to provide a short design cycle; (2) more realistic transmission line delay models should be incorporated with the proposed routing model.

# REFERENCES

- [1] H. B. Bakoglu, Circuits, Interconnections, and Packaging for VLSI.
- Reading, MA: Addison-Wesley Publishing Co., pp. 81–112, 1990.
  [2] A. J. Blodgett, "Microelectronic packaging," *Scientific Amer.*, pp. 86-96,
- July 1983.
- [3] J. D. Cho, S. Raje, M. Sarrafzadeh, M. Sriram, and S. M. Kang, "Crosstalk minimum layer assignment," in *Proc. IEEE Custom Integr.* Circuits Conf., pp. 29.7.1-29.7.4, 1993; also in IEEE Design and Test., Dec. 1993, pp. 27-37.
- [4] J. D. Cho and M. Sarrafzadeh, "The pin redistribution problem in multichip modules," in Proc. Fourth Annu. IEEE Int. ASIC Conf. and Exhibit, pp. 9-2.1-9-2.4., 1991; also to appear in Math. Programming, Series R
- [5] H. H. Chen and C. K. Wong, "Wiring and crosstalk avoidance in multi-chip module design," in *IEEE Custom Integrated Circ. Conf.*, IEEE, 1992
- W. W.-M. Dai, "Topological routing in SURF: Generating a rubber-[6] band sketch," in Proc. IEEE Design Automat. Conf., pp. 39-48, 1991;
- also in *IEEE Design and Test.*, pp. 18–26, Dec. 1993. [7] E. E. Davidson and G. A. Katopis, "Package electrical design," in Microelectronics Packaging Handbook, R. R. Tummala and E. J. Rymaszewski, Eds., New York: Van Nostrand Reinhold, 1989.
- [8] P. D. Franzon, Private communication, Electronic MCM Clearing House in North Carolina State University, 1992.
- [9] J. M. Ho, M. Sarrafzadeh, G. Vijayan, and C. K. Wong, "Layer assignment for multi-chip modules," *IEEE Trans. Comput. Aided Design*, vol.
- pp. 1272-1277, Dec. 1990.
   L.-T. Hwang and I. Turlik, "The skin effect in thin-film interconnections for ULSI/VLSI packages," in MCNC Technical Reports, Technical Report Series TR91-13, 1991.
- [11] L. Hwang, I. Turlik, and A. Reisman, "A thermal module design for advanced packaging," J. Electron. Mats., vol. 16, pp. 347-355, 1987.
- [12] W. R. Blood Jr., MECL System Design Handbook. Motorola Inc, 1983. [13] K. Y. Khoo and J. Cong. "A fast multilayer general area router for
- mcm and dense PCB designs," IEEE Trans. Circ. and Syst.,, vol. 39, pp. 841–851, Nov. 1992.. [14] J. U. Knickerbocker, G. B. Leung, W. R. Miller, S. P. Young, S. A.
- Sands, and R. F. Indyk, "IBM System/390 air-cooled alumina thermal conduction module," IBM J. Res. and Develop., vol. 35, pp. 330-340,
- May 1991.
  [15] K. F. Liao, M. Sarrafzadeh, and C. K. Wong, "Single-layer global routing," in *Proc. Fourth Annu. IEEE Int. ASIC Conf. and Exhibit*, pp. 14-4.1-14-4.4., 1991; also in IEEE Trans. Comput.-Aided Design, vol. 13, no. 1, pp. 38–47.[16] M. Marek-Sadowska, "An unconstraint topological via minimization,"
- IEEE Trans. Comput. Aided Design, vol. CAD-3, pp. 184-190, July 1984. [17] H. Moreno, "OSS physical design cad tools specifications," MCC Technical Report P/I-250-90, 1990.
- [18] B. Preas, M. Pedram, and D. Curry, "Automatic layout of siliconon-silicon hybrid packages," in Design Automat. Conf., pp. 394-399, 1989.

- [19] M. Stallmann, T. Hughes, and W. Liu, "Unconstrainted via minimization for topological multilayer routing," IEEE Trans. Comput. Aided
- [20] M. Sriram and S. M. Kang, "Detailed layer assignment for MCM routing," in *Int. Conf. on Comput.-Aided Design*, pp. 386-389, 1992.
  [21] M. Sarrafzadeh and D. T. Lee, "A new approach to topological via the statement of the
- minimization," IEEE Trans. Comput.-Aided Design, vol. 8, pp. 890-900, Aug. 1989.



Jun-Dong Cho received the B.S. degree in Electronics from Sung Kyun Kwan University in Seoul, Korea in 1980. He received an M.S. in computer science at Polytechnic University, New York, in 1989, and his Ph.D. in computer science at Northwestern University, Evanston, Illinois in 1993

He works with the CAE team in Micro Devices Business, at Samsung Electronics Co., Ltd., Korea. His research interests include electronic computeraided design, VLSI layout algorithms, graph theory and computational geometry/complexity

He received the Best paper award at the 1993 Design Automation Conference in the area of physical design.

Kuo-Feng Liao received the B.S. degree in computer science engineering from the National Chiao-Tung University, Taiwan, in 1983, and the M.S. and Ph.D. degrees in electrical engineering and computer science from Northwestern University, Evanston, IL, in 1988 and 1992, respectively

He joined the IBM Microelectronics Division, East Fishkill facility, Hopewell Junction, NY, in 1992. His main interests are in the area of computer-aided design system, early design analysis, physical layout design, design and analysis of algorithm, and application of graph theory.

Salil Raje received his B.Tech. in electrical engineering from the Indian Institute of Technology, Madras, in 1991, and the M.S. in electrical engineering from Northwestern University, Evanston, IL, in 1993. He is currently working on his Ph.D. in computer science in Northwestern University, Evanston, IL.

He was a guest scientist at the German National Laboratory for Computer Science, Sankt Augustin, Germany, in the summer of 1993. His main interests include behavioral synthesis, physical design, design, and analysis of algorithms and graph theory,



Majid Sarrafzadeh (M'87, SM'92) received his B.S., M.S. and Ph.D. in 1982, 1984, and 1987, respectively, all from the University of Illinois at Urbana-Champaign, Department of Electrical and Computer Engineering. He joined Northwestern University as an As-

sistant Professor in 1987. Since 1991 he has been Associate Professor of Electrical Engineering and Computer Sciene at Northwestern University, His research interests lie in the area of design and analysis of algorithms and computational complexity, with emphasis in VLSI.

Dr. Sarrafzadeh is a member of the IEEE Computer Society. He received an NSF Engineering Initiation award in 1987, two distinguished paper awards in ICCAD-91, and the best paper award in DAC-93 in the area of physical design. He has served on the technical program committee of various conferences, for example, ICCAD-92, ISCAS-93, and ICCAD-93. He is a co-editor of the book Algorithmic Aspects of VLSI Layout, and an Associate Editor of IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN.