ieee transactions on components and ...ieee transactions on components and packaging technologies,...

IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL. ??, NO. ??, JANUARY(?) 2005 1

Placement and Routing for3D System-On-Package Designs

Jacob Minz, Eric Wong, Mohit Pathak, and Sung Kyu Lim,Member, IEEE

Abstract— 3D packaging via System-On-Package (SOP) is aviable alternative to System-On-Chip (SOC) to meet the rigorousrequirements of today’s mixed signal system integration. Inthis article, we present the first physical design algorithmsfor thermal and power supply noise-aware 3D placement andcrosstalk-aware 3D global routing. Existing approaches considerthe thermal distribution, power supply noise, and crosstalk issuesas an afterthought, which may require an expensive coolingscheme, more decoupling capacitors (= decap), and additionalrouting layers. Our goal is to overcome this problem withour thermal/decap/crosstalk-aware 3D layout automation tools.The traditional design objectives such as performance, area,wirelength, and via are considered simultaneously to ensure highquality results. The related experimental results demonstrate theeffectiveness of our approaches.

Index Terms— System-On-Package, mixed-signal CAD, 3Dpackaging, placement and routing, thermal distribution, powersupply noise, crosstalk.

I. I NTRODUCTION

SEMICONDUCTOR industry is beginning to question theviability of System-On-Chip (SOC) approach due to its

low-yield and high-cost problem. Recently, 3D packaging viaSystem-On-Package (SOP) [1], [2], [3] has been proposed asan alternative solution to meet the rigorous requirements oftoday’s mixed signal system integration.1 The SOP is about3D integration of multiple functions in a miniaturized packageachieved by thin film embedding. The 3D SOP conceptoptimizes ICs for transistors and the package for integration ofdigital, RF, optical, sensor and others. It accomplishes this byboth build-up SOP, similar to IC fabrication, and by stackedSOP, similar to parallel board fabrication. The uniquenessof 3D SOP is in the highly integrated or embedded RF,optical or digital functional blocks, and sensors, in contrast tostacked ICs and stacked package. Due to the high complexityin designing large-scale SOP under multiple objectives andconstraints, computer-aided design (CAD) tools have becomeindispensable. Thus, innovative ideas on CAD tools for SOPtechnology are crucial to fully exploit the potential of this newemerging technology. However, there exists very few tools, ifnot none, that handle the complexity of automatic 3D SOPlayout generation.

This article tackles three most crucial issues that threatenthe reliability of SOP paradigm: thermal distribution, power

Jacob Minz, Eric Wong, Mohit Pathak, and Sung Kyu Lim are withthe School of Electrical and Computer Engineering, Georgia Institute ofTechnology.

Publisher Item Identifier ???.1A special issue on SOP (vol. 27, issue 2, May 2004) provides a compre-

hensive survey of the state-of-the-art in SOP technology.

A

D

P

MA

D

P

M

D

D A

M D

DA

P

M

P

PM

P

MMD

D

A

M A D P

A

P

D

top-down view cross-section view

Fig. 1. Illustration of 3D System-On-Package, where A, D, M, and Prespectively node analog chip, digital chip, memory module, and embeddedpassive blocks.

supply noise, and crosstalk. First, thermal issues can no longerbe ignored in high performance 3D packages due to higherpower densities and other issues. High temperatures not onlyrequire more advanced heat sinks, they also degrade circuitperformance. Interconnect delay increases with temperature,which degrades circuit timing. If timing deteriorates enough,logic faults can occur. Hence thermal issues must be consid-ered early-on in the design process. Second, the continuingtrend of reducing power supply voltage has resulted in reducednoise margin, which effects reliability and may even causefunctional failures due to spurious transitions. Active devicesin 3D packaging draw a large volume of instantaneous currentduring switching, which causes simultaneous switching noise(SSN). Third, due to the scaling down of device geometryin deep-submicron technologies, the crosstalk noise betweenadjacent nets has become a major concern in high performancepackaging design. Increased coupling noise can cause signaldelays, logic hazards and even malfunctioning of the designs.Thus controlling the level of crosstalk noise in 3D packageschip is an important task for the designers.

However, existing approaches consider these issues as anafterthought, which may require expensive cooling schemes,more decoupling capacitors (= decap), and more routinglayers. In addition, many time-consuming iterations are re-quired between full-length thermal/SSN/crosstalk simulationand manual layout repair until the convergence to a satisfactoryresult. We note that the placement of modules in 3D SOPdesign has huge impact on thermal distribution and SSNwhile the routing of the signal nets has direct impact oncrosstalk. Therefore, the goal of this article is to present the

0000–0000/00$00.00c© 2005 IEEE


first physical layout algorithm for 3D SOP that combinesthermal/decap-aware 3D placement and crosstalk-aware 3Dglobal routing algorithm. The traditional design objectivessuch as performance, area, wirelength, and via costs areconsidered simultaneously to ensure high quality results. Therelated experimental results demonstrate the effectiveness ofour approach.

The remainder of this paper is organized as follows. SectionII discusses previous works. Section III presents the problemformulation and an overview of our 3D algorithms. Section IVpresents the SOP placement algorithm. Section V presents theSOP routing algorithm. Section VI presents the experimentalresults. We conclude in Section VII.

II. RELATED WORKS

The physical design for 3D integration has drawn interestfrom both academia and industry recently. Unlike the activephysical design research effort in mixed signal System-On-Chip (SOC) [4], [5] and 3D stacked die technology [6], [7],[8], however, the research for 3D System-On-Package (SOP)considerably lacks behind due to its short history. The physicaldesign research for SOP has been recently pioneered by theauthors [9], [10], [11], [12], [13], [14]. Decoupling capacitorplacement is done for 2D printed circuit board (PCB) designs[15], [16], [17]. Thermal-aware module placement algorithmsare proposed for PCB designs [18], [19]. Many multi-chipmodule (MCM) routing algorithms have been proposed in theliterature [20], [21], [22], [23], [24], [25]. Several works onMCM pin redistribution include [26], [27], [28].

There exists several similarities and differences betweenMCM and SOP routing. In general, the design objectives andconstraints in both MCM and SOP routing are quite similar,where performance, wirelength, via, layer, and crosstalk opti-mization are crucial. In addition, the routing process is dividedinto multiple steps: pin distribution, topology generation, layerassignment, and detailed routing. A notable difference betweenSOP and MCM routing lies in the fact that there existsmultipledevice layers in SOP, whereas in MCM there is only onedevice layer. Therefore, nets are now connecting pins locatedin all intermediate layers in SOP, and the blocks in each layerbehave as obstacles. In MCM, however, all pins are locatedonly at the top layer, and there exists no obstacles except forthe wires themselves. This makes SOP or 3D package routingproblem more general than MCM routing.

In SOP designs, some pins in each device layer can bedistributed either to the layer above or below. In addition,some nets that connect blocks in non-neighboring device layersneed to penetrate intermediate device layers. Since the blocksin these intermediate layers become obstacles, designers userouting channels in each device layer to insert “feed-throughvias”. Thus, we have to determine which routing channel touse for feed-through vias. These routing channels are alsoused for intra-layer connection. Thus, after feed-through viainsertion is done, we need to decide which remaining channelsto use for the intra-layer connection. We then need to finishconnection between the original and distributed pins. In casethe pin location is not known for some “soft blocks”, we

x-y routing lyr1, Lr(1)

bot pin distr lyr1, Lb(1)

placement lyr2, Lp(2)

top pin distr lyr1, Lt(1)


x-y routing lyr2, Lr(2)

bot pin distr lyr2, Lb(2)


top pin distr lyr2, Lt(2)

routing

interval 1

RI(1)

routing

interval 2

RI(2)

Fig. 2. Illustration of the layer structure and routing resource in SOP. Theblock and white dots respectively denote the original and redistributed pins.The “x” denotes a feed-through pin for an x-net to pass through a placementlayer using a routing channel. The solid, dotted, and arrowed lines respectivelydenote signal wires, vias, and feed-through vias.

need to determine their location either along the boundary orunderneath the block during our local routing step.

III. PROBLEM FORMULATION

A. SOP Layer Structure

The layer structure in multi-layer SOP is illustrated inFigure 2. Theplacement layers2 contain the blocks (suchas ICs, embedded passives, opto-electric components, etc),which from the point of view of physical design are justrectangular blocks with pins along the boundary. The intervalbetween two adjacent placement layers is called theroutinginterval. A routing interval contains a stack ofrouting layerssandwiched betweenpin distribution layers. These layers areactually x-y routing layer pairs so that the rectilinear partialnet topologies may be assigned to them. The pin distributionlayers in each routing interval are used to evenly distributepins from the nets that are assigned to this interval. Then theseevenly distributed pins are connected using the routing layerpairs. Each placement layer consists of a pair ofx-y routinglayers, so routing is permitted. Afeed-through viais usedto connect two pin distribution layers from different routingintervals. Thus, the routing channels in each placement layerare used for two purposes: (i) accommodate feed-through vias,and (ii) perform local routing, where limited number of intra-layer connections are made.

In the SOP model the nets are classified into two categories.The nets which have all their terminals in the same placementlayer are calledi-nets, while the ones having terminals indifferent placement layers arex-nets. The i-nets can be routedin a single routing interval or indeed within the placementlayer itself. On the other hand, the x-nets may span more thanone routing interval.

We use the Floor Connection Graph (FCG) [29], illustratedin Figure 3, to model the placement layer, where each routingchannel becomes an edge and each channel intersection pointbecomes a node. In addition, each soft block becomes a node,and pin assignment edgesare added to this node to connectto all adjacent channels. This model allows us to determinewhich boundary to use for the pins with unknown location.Each channel edge is associated with (i) via capacity for

2We use placement layer and device layer interchangeably.


b1b2

b3

b4

b1b2

b3

b4

(a) (b)

Fig. 3. Graph-based modelling of placement layer. (a) a connection betweentwo hard blocks, (b)b2 is a soft block, and the new pin is located on thebottom boundary. Dotted lines denote pi assignment edges.

feed-through vias, and (ii) wire capacity for local routing.In addition, pin assignment edges have pin capacity for eachboundary of a soft block. The routing and pin distributionlayers in each routing interval are modelled with a standardx×y×z 3D grid, where each node represents a routing regionand eachx/y-direction edge represents each horizontal/verticalboundary among the regions. Eachz-direction edge representsa group of vias each region can accommodate. Thus, all edgesin this 3D grid are associated with capacity:x and y edgesfor wire capacity, andz edges for via capacity.

B. SOP Placement Problem

The following are given as the input to our 3D SOPplacement problem: (i) a set of blocksB = B1, B2, · · · , Bmthat represent the various active and passive components in thegiven SOP design, (ii) width, height, and maximum switchingcurrents for each block, (iii) a netlistNL = n1, n2, · · · ,nk that specifies how the blocks are connected via electricalwires, (iv) K, the number of placement layers (=|Lp|) inthe 3D packaging structure, (v) the number of power/groundsignal layers along with the location of the power/ground pins.

For each netn from a given netlistNL, let wln denote thewirelength ofn. The wirelengthwln is the sum of Manhattandistance inx, y, and z directions, where thez directionis the height of the associated vias.3 Let w(i), h(i), andA(i) respectively denote the width, height, and area of theplacement layerLp(i). Let At = maxw(i) · maxh(j),i.e., the product of the maximum width and the maximumheight among all placement layers. LetAs =

∑i A(i). Let

wave and have denote the average width and height amongall placement layers. LetAd =

∑i|wave − w(i)| + |have −

h(i)|, i.e., the sum of the deviation of the dimension amongall placement layers. Lastly, the area objective of 3D SOPplacement, denotedAtot, is the weighted sum ofAt, As, andAd.

Let T tot denote the maximum temperature among allblocks. LetDtot denote the total amount of decoupling capac-itance required to suppress the simultaneous switching noise(SSN) under the given tolerance value. Lastly, the goal of theSOP Placement Problemis to find the location of each block

3We assume that the height of vias connecting two x-y layers in a routingand pin distribution layer pair is of one grid, whereas the height of feed-through vias that penetrate a placement layer is of 6 grid.

4

3

(a) (b)

Fig. 4. Illustration of crosstalk computation between two Steiner trees.The numbers denote the coupling length. (a) crosstalk is 7, (b) crosstalk-freerouting.

in the placement layers such that the following cost functionis minimized while SSN constraint is satisfied:4

w1 ·Atot + w2 ·∑

n∈NL

wln + w3 ·Dtot + w4 · T tot (1)

In addition, decaps are required to be placed adjacent to theblocks that require them.

C. SOP Routing Problem

For each netn from a given netlistNL, let xtn and vn

respectively denote the amount of crosstalk and via associatedwith n. Let cl(n,m) denote the coupling length betweennandm as illustrated in Figure 4. We definextn as follows:

xtn =∑

m∈NL, m 6=n

cl(n,m)|z(n)− z(m)|

wherez(n) denote the routing layer that contains netn. Foreach netn, let dn(i) denote the Elmore delay [30] at sinki. Then, the maximum sink delay of netn, denoteddn, ismaxdn(i)|i ∈ n. The performance of a SOP global routingis to minimize the following cost function:

P x = maxdn|n ∈ NLFor each routing intervali in a 3D package withK place-

ment layers, letLt(i), Lr(i), and Lb(i) respectively denotethe top pin distribution layer pairs, routing layer pairs, andbottom pin distribution layer pairs. There exists three kinds ofconnections in each routing intervali: top (connection betweenLp(i) and Lt(i)), middle (connection betweenLt(i) andLb(i)), and bottom (connection betweenLb(i) andLp(i+1)).We use the routing layer pairs inLt(i), Lr(i), and Lt(i)respectively for the top, middle, and bottom connections. Weconstruct therouting grid, denotedG(i), that contains all thedistributed pins fromLt(i) andLb(i) in am×n 2D grid. Thesepins are connected by a graph-based Steiner tree topologygeneration algorithm (we use an existing heuristic [31] forthis purpose). The total layers used in a SOP global routingsolution is given by:

Ltot =∑

1≤i≤K

(|Lt(i)|+ |Lr(i)|+ |Lb(i)|)

4In this weighted cost function, the area termAtot is normalized toAtot/A0, where A0 is the footprint area of the initial solution used forSimulated Annealing. This will ensure the area term stays close to one duringthe annealing process. The same normalization is done for the other objectivesas well—wirelength, decap, and thermal.


noise/thermal

analysis

SOP 3D Global Routing

1. pin redistribution

2. topology generation

3. layer assignment

4. channel assignment

5. local routing

simulated

annealing

module netlist

SOP 3D Module Placement

3D SOP layout

Fig. 5. Overall design flow

Section V-D discusses how to compute|Lr(i)| and SectionV-E discusses how to compute|Lt(i)| and |Lb(i)|.

Lastly, the formal definition ofSOP Global Routing Prob-lem is as follows: Given a 3D placement and netlist, generatea routing topology for each netn, assign n to a set ofrouting layers and assign all pins ofn to legal locations. Allconflicting nets are assigned to different routing layers whilesatisfying various wire/via capacity constraints. The objectiveis to minimize the following cost function:

w5 ·Ltot +w6 ·P x +∑

n∈NL

(w7 ·xtn +w8 ·wln +w9 ·vn) (2)

We minimize |Lr| during our layer assignment step and|Lt|+|Lb| during our channel assignment step.P x is the focusduring our topology generation step. The wirelength, via, andcrosstalk minimization are addressed in all steps of our globalrouter.

D. Overview of the Approach

An overview of our 3D SOP physical design tool is shownin Figure 5. The input to the flow is a module-level netlist thatspecifies how the SOP modules such as digital die, analog die,embedded passives, opto/MEMS modules, etc, are connected.Our 3D module placement is based on Simulated Annealingapproach [32], where we examine many candidate 3D moduleplacement solutions and evaluate each of them using ourthermal and power supply noise analyzers. More details arepresented in Section IV. Our 3D global routing is decomposedinto five steps, where the IO pins from the blocks are firstevenly distributed using pin distribution layers to alleviatecongestion problem. We then construct routing tree topologyfor all nets and assign them to the routing layers to minimizecrosstalk. We place the feedthrough-vias for x-nets and finishconnection in each placement layer. More details are presentedin Section V.

IV. SOP PLACEMENT ALGORITHM

A. Overview of the Algorithm

Simulated Annealing is a very popular approach for moduleplacement due to its high quality solutions and flexibility

dy

dz

dx

zpxn

yp

zn

yn

xp

Fig. 6. Top: 3D grid of a SOP for thermal modelling

in handling various constraints. We extend the existing 2DSequence Pair scheme [33] to represent our 3D module place-ment solutions. Simulated Annealing procedure starts with aninitial multi-layer placement along with its cost in terms ofarea, wirelength, decap, and maximum temperature. We thenmake a random perturbation (move) to the initial solutionto generate a new 3D placement solution and measure itscost. We perform 3D SSN analysis to compute the amountof decap needed. The algorithm does a one time set-up of thethermal matrices. These matrices are used during incrementaltemperature calculations to evaluate the thermal cost. Thethermal modelling and evaluation is explained in section IV-B. White space insertion (discussed in Section IV-D) is doneand the increase in footprint area is estimated. If the new costis lower than the old one, the solution is accepted; otherwisethe new solution is accepted based on some probability thatis dependent on temperature of the annealing schedule. Weexamine a pre-determined number of candidate solutions ateach temperature. The temperature is decreased exponentially,and the annealing process terminates when the freezing tem-perature is reached. The actual decap allocation is done afteroptimization using LP solver discussed in Section IV-D.

B. 3D Thermal Analysis

We use a 3D thermal resistance mesh, as shown in Figure6, for thermal analysis. Each node models a small volume ofthe 3D SOP substrate, and each edge denotes the connectivitybetween two adjacent regions. This is equivalent to using adiscrete approximation of the steady state thermal equation−k∇2T = P , where k is thermal conductivity,T is tem-perature, andP is power. This results in the matrix equationR · P = T .

Since the matrix computation involved with thermal analysisfor each candidate 3D placement solution is expensive, wetackle this problem with the following scheme. Assuming thatthe thermal conductivity of the modules are similar, swappingthe module locations would not change the thermal resistancematrixR too much. This means that matrixR only needs to becomputed once in the beginning. To calculate the temperatureprofile of a new block configuration, the power profileP needsto be updated and then multiplied byR. Alternatively, a changein power profile∆P can be defined. MultiplyingR and∆P


POWER 1

POWER 2

POWER 3

(a) (b)

Fig. 7. Illustration of 3D power supply modelling. (a) multi-layer powersupply network, where multiple sets of device, routing, and power supplylayers are stacked together, (b) 3D-grid modelling. where black and graynodes respectively denote power supply and consumption nodes.

will give change in temperature profile∆T . Adding∆T to theold temperature profile will give the new temperature profile.These equations summarize the two methods:Tnew = R ·Pnew, ∆P = Pnew−Pold, ∆T = R ·∆P , Tnew = Told+∆T .Swapping two blocks usually has a small effect on the powerprofile, so∆P should be sparse. This reduces the number ofmultiplications used by the second method at the expense ofdoing extra additions and subtractions.

C. 3D Power Supply Noise Modelling

We model the P/G network for 3D SOP as a 3D gridgraph as shown in Figure 7. The edges in the grid-graphhave inductive and resistive impedances. The mesh containspower-supply points and connection points, which supply andconsume currents. The current distibution for the blocks isinversely proportional to the impedance of path from source.

The dominant current sourcefor a block is defined as thevoltage source supplying significantly more power to the blockthan any other neighboring sources. Thedominant pathfor ablock is the path from the dominant supply to the block caus-ing the most drop in voltage. It has been shown experimentallyin [34] that the shortest path between the dominant currentsource (nearest Vdd pins) and the block offers highly accurateSSN estimation within reasonable runtime. In our 3D SSNanalysis engine, we compute dominant paths for all blockswhich is dynamically updated whenever a new placementsolution is evaluated in terms of SSN. LetPk be a dominantcurrent path for blockk. Then T k = Pj : Pj ∩ Pk 6= ∅denotes the set of dominating paths overlapping withPk

(T k includesPk itself). Let Pjk be the overlapping segmentsbetween pathPj and Pk. Let RPjk

and LPjkdenote the

resistance and inductance ofPjk. After the current paths andtheir values have been determined for all blocks, the SSN forBk is given by

V knoise =

∑

Pj∈T k

(ij ·RPjk+ LPjk

dijdt

)

whereij is the current in the pathPj , which is the sum of allcurrents through this path to various consumers. An illustrationis shown in Figure 8. The weight ofij and its rate of change

s3

AB

C

s2

s1

p1

p2

p3p

4

p5

p0

Fig. 8. Illustration of SSN calculation. The dominant current source for blockA is s1, which is not located in the same layer. The dominant (shortest) pathp0 carriesIA/6 amount of current, whereIA denotes the current demand ofA. The blockC draws current froms2 and s3 using p1, p2, andp3 (eachof these carriesIC/3 amount of current). The resistance ofp34, the overlapbetweenp3 andp4, contributes to the SSN atB andC.

3D Decap Allocation

Maximize S =

HXk=1

Xj∈Nk

βkjx(j)k (3)

Subject toXj∈Nk

x(j)k ≤ Ak, k = 1, 2, · · · , H (4)

k=HXk=1

x(j)k ≤ S(j), j = 1, 2, · · · , M (5)

x(j)k ≥ 0, ∀k, ∀j (6)

Fig. 9. LP-based 3D decap allocation

are the resistive and inductive components of the path. LetQk

denote the maximum charge drawn from the power supply byblock Bk. If θ = max(1, V k

noise/V limnoise), whereV lim

noise is thenoise tolerance, the decap allocated to blockBk is given by

Dk =(1− 1/θ)Qk

V limnoise

, 1 ≤ k ≤ M

where M denotes the total number of blocks to be placed.Finally, the decap cost is given byDtot =

∑Mk=1 Dk.

D. 3D Decoupling Capacitor Placement

In our LP-based 3D decap allocation shown in Figure 9,x

(j)k denotes the amount of decap allocated from white space

k to block j. The objective is to maximize the utilization ofwhite spaces for decap allocation. The constraint (4) limitsthe total allocation of a white space to its total area, whereAk denotes the area of white spacek, and Nk denotes theneighboring blocks ofk. The constraint (5) ensures that thetotal amount of decap allocated to each block does not exceedits demand, whereS(j) denotes the demand.

Unlike the 2D decap assignment as done in [34], theneighboring blocks in 3D case are the adjacent blocks eitherfrom the same placement layer or from neighboring layers.The assumption here is that the white space from differentlayers can be used to allocate decap. In order to facilitate this,we introduce predetermined parametersβkj to control decapallocation to blockj from white space modulek. βkj evaluatesthe usefulness of whitespacek to be used as a decap for


(a) (b) (c)

ws

Fig. 10. Illustration of 3D decap allocation. (a) 3D placement, (b) X-expansion, (c) XY-expansion, where the darker blocks denote the neighboringblocks of the decap (= white space) inserted. Note that blocks from otherlayers can utilize the white space for decap insertion.

block j. The value ofβkj decreases with increasing distanceof whitespace from the module that requires decap. Henceallocations of decap from far located white spaces will beavoided.

The second aspect of the formulation deals with the gen-eration of Ak, the optimal area that will accommodate alldecap without too much penalty on the final footprint area. Thedecap budgetS(j) for each block is determined throughSSNanalysis of a particular 3D placement. The decap allocationinvolves several iteration between white space insertion and LPsolving to reach a good solution both in terms of completenessof decap allocated and the additional increase in area. Thewhite space is generated by expanding the floorplan in theX and Y direction as illustrated in Figure 10. In a multi-layerplacement, however, we have to take additional care to balancethe expansion in each layer, so as to minimize the expansionof the total footprint area. The expansion is proportional to thedecap demand of each module. In our Sequence Pair-based 3Dplacement, we modify the horizontal and vertical constraintgraphs to expand the placement into X and Y directions,respectively.

Note that we may have to iterate between white space in-sertion and allocation if the current expansion does not satisfyall the decap needs. In order to prevent from iterating betweenLP and white space insertion, we start by adding whitespaces generously in the floorplan then use LP to performcompaction. In our scheme, the XY-expansion is controlled byβ parameters discussed earlier, where the existing white spaceshave higher ranks than the newly inserted ones. The actualamount of expansion is determined by a parameterα ≥ 1,where we increase the amount of expansion by a factor ofαfor each block that needs decap. This ensures that the initialexpansion satisfies the LP constraint. Since LP determinesthe individualx(j)

k , we use these assignment values to decidewhich white space insertion was not necessary and thus canbe removed.

As exact decap allocation is a time-consuming process, it isimpractical to solve an LP problem for every candidate solu-tion. However, our white space insertion takesO(n2), which iscomputationally equivalent to computing block location usinglongest path computation for area. Therefore, we can affordwhite space insertion (and the calculation of area increase dueto decap insertion) at every move. We perform the actual LP-based decap allocation at the end of the annealing process as

Coarse Pin Distribution1: CP = m× n grid;2: for (each pinp ∈ NL)3: s = slot closest top and under-utilized;4: placep in s;5: C = restricted multi-level clustering of pins;6: hgt = height ofC;7: B(hgt) = initial m× n placement at top level;8: for (i = hgt downto 0)9: move clusters inC(i) to optimize cost;10: B(i) = new m× n placement at leveli;11: projectB(i) to B(i− 1);12: return B(0);

Fig. 11. Pseudocode for coarse pin distribution algorithm

a compaction stage.

V. SOP GLOBAL ROUTING

A. Overview of the Approach

Our 3D global router is divided into the following five steps:

1) pin redistribution: we first determine which set of i-netsand x-net segments are assigned to each routing interval.The pins from these nets are then evenly distributedin the top and bottom pin distribution layers. Our pinredistribution step is further divided into three steps:coarse pin distribution, net distribution, and detailed pindistribution.

2) topology generation: Steiner trees are generated for allnets in each routing interval so that the performance ofthe routed design is optimized.

3) layer assignment: the routed nets are assigned to aunique routing pair in the routing layer so that the totalnumber of layers used is minimized.

4) channel assignment: for each x-net, its location of feed-through via in the routing channel is determined. Wealso assign channels and finish the connections for thei-nets that are to be routed in each placement layer.

5) local routing: we finish connections between the pinsnow located in the routing channels and the pins on theblock boundaries. We also determine the location of pinsfrom soft blocks.

We use an existing max-cut partitioning heuristic [35] for thenet distribution in step 1. For step 2, we use an existing RSA/Gheuristic [31] to generate the net topologies. In addition, weuse the congestion-driven rip-up-and-reroute [36] for step 5.Therefore, the focus of this paper is to develop heuristics forthe remaining steps in 1, 3, and 4.

B. Coarse Pin Distribution

A pseudocode for our coarse pin distribution algorithm isshown in Figure 11. First, we assign all pins in the placementlayers to a nearby grid point inCP , a m× n 2D grid, whiletrying to balance the number of pins assigned to each gridpoint (line 1-4). An illustration is shown in Figure 12. Weimpose pin capacity for each grid point so that the pins areevenly distributed inCP . Our approach is to visit the pins in


Fig. 12. Illustration of coarse pin distribution. Pins along the externalboundary are not shown for simplicity.

a random order and find the best grid for each pin. For eachpin p, a grid point that is closest to the original location ofpand has not violated the pin capacity constraint is chosen (line3). After this process is finished,CP serves the starting pointof our min-cut placement based algorithm, where each gridpoint corresponds to a partition. We then iteratively improvethe quality of this initial solution via move-based approach.In our new heuristic algorithm, our cost function is based on(i) how far the new pin location is from the initial location,(ii) total wirelength, and (iii) how evenly distributed the inter-partition connections are.

For each pinp, we define thedisplacement gain, denotedgd(p), to represent how much the distance between the originaland new location is reduced ifp is moved to another partition.We define thewirelength gain, denotedgw(p), to representhow much the length of the nets that containp (estimatedby the half-perimeter of the bounding box) is reduced ifp ismoved to another partition. For a partitionP , let deg(P ) de-note the number of nets that have connections toP . Then, thecutsize balance factoris defined to be the difference betweenthe maximum and minimumdeg(P ) among all partitions. Wedefine thebalance gain, denotedgb(p), to represent how muchthe cutsize balance factor is reduced. Our move-based multi-level mincut partitioning algorithm performs cell move basedon the combined gain function:

g(p) = w1 · gd(p) + w2 · gw(p) + w3 · gb(p)

Figure 13 shows an illustration of the gain computation. In ourmulti-level approach, we first performrestricted multi-levelclustering(line 5) that preserves the initialm× n placementresult, where two pins that are in different partitions initiallyare not clustered together. At each level of the cluster hierarchyfrom top to bottom (line 8), we compute the combined gaing(p) for each cluster and perform cluster moves. In order tocompute the displacement and balance gain of a group of pins(= cluster), we add the individual displacement and balancegain of all pins in this cluster. When there is no gain at acertain level, we decompose the clusters into next lower leveland perform refinement. This process continues until we obtaina solution at the bottom level (line 12).

C. Detailed Pin Distribution

After coarse pin distribution and net distribution are fin-ished, we know which set of nets are assigned to each routinginterval as well as their (evenly distributed) entry/exit points inpin distribution layers. However, the coarse pin distribution is

P1

P2

Fig. 13. Illustration of the gain computation for coarse pin distribution. A netn indicated by the black node is moved fromP1 to P2. Then,gd(n) = −2,gw = 0, andgb = −1, wheredeg(P2) = 3 becomes the maximum-degreepartition.

Detailed Pin Distribution1: for (each routing intervali)2: mi = ni = d

ptotal pin(i)e;

3: DP [i] = mi × ni grid;4: for (each pinp ∈ i)5: assignp to DP [i] usingCP result;6: for (each routing intervali)7: for (each netn ∈ DP [i])8: Fn = displacement force onn;9: obtain new location inDP [i] based onFn;10: Li = sort nets inDP [i] in lexicographic order;11: split Li and form columns inDP [i];

Fig. 14. Pseudocode for SOP detailed pin distribution algorithm

done based on the 2D grid that merged all multiple placementlayers into one. The even pin distribution in this 2D grid offersa good enough reference points for net distribution. But, it doesnot consider even pin distribution in each individual routinginterval. In addition, it is also possible that pin capacity foreach routing region in each routing interval may be violated.Therefore, the goal of detailed pin distribution is to addressthese problems in each routing interval so that the subsequenttopology generation and layer assignment truly benefit fromthis even pin distribution. In addition, we use a grid largeenough for each routing interval tolegalizepin location, i.e.,each grid point contains only one pin. Since the crosstalkminimization are addressed during the prior steps, the majorfocus of detailed pin distribution step is on (i) how far the newlocation is from the original location obtained from coarse pindistribution, and (ii) the total wirelength.

A pseudocode for our net distribution algorithm is shown inFigure 14. Our force-directed heuristic algorithm encouragesall pins from the same net to be placed closer to the center ofmass while minimizing the distance between the old and newpin location. The size of the grids for detailed pin distribution(= DP [i]) is determined for each routing interval so that eachpin can be assigned to a unique grid point (line 1-3). Inaddition, we project the coarse pin distribution result (=CP )to this new set of grids (line 5). Note that there still existsoverlap among the pins inDP [i] at this point even thoughDPis usually finer thanCP . In order to remove this overlap, weapply an additional force that slightly pulls each pin towardsthe center of mass. For each pinp in a netn, thedisplacementforce (line 8) for x-direction is defined as follows:

Fx(p) =x(Mp)− x(p)

width(np)


Fig. 15. Illustration of layer assignment

where x(Mp) denotes thex-coordinate ofM , the center ofmass ofn, andwidth(np) denotes the width of the boundingbox of n. We computeFy(p) using they-coordinates. Notethat −1 ≤ Fx(p), Fy(p) ≤ 1. The vector(Fx(p), Fy(p)) isthen added to(xp, yp) (line 9). This minor change on theoriginal pin location helps to remove most of the overlapin DP [i] while not increasing the wirelength too much. Wethen sort the pins based on the lexicographic order of newlocations and assign each pin starting from the top-most rowin the left-most column (line 10-11). Due to its simplicity,this deterministic algorithm is quite efficient and effective inreducing the additional wirelength required for pin distributionas well as the total wirelength among all nets as shown inSection VI.

D. SOP Layer Assignment

For each routing intervali, the routing gridGi containsall the (redistributed) pins from the top and bottom pindistribution layers (Lt(i) and Lb(i)). We generate Steiner-tree based routing topology using the RSA/G heuristic [31] toconnect these pins. The goal is to minimize the maximum sinkdelayDmax as discussed in Section III-C. The routing layerLr(i) in each routing intervali consists of several layer pairs,where each pair consists of one layer for horizontal wires andanother layer for vertical wires. Thus, we can assign a entirerectilinear routing tree to a routing pair. In addition, two treesthat are intersecting can also be assigned to the same routingpair provided that they do not violated the wire capacity ofthe routing regions involved. An illustration is shown in Figure15. The SOP layer assignment problem is to assign each netto a routing layer pair so that the wire capacity constraintis satisfied and the total number of layer pairs used for allrouting intervals is minimized. Our approach is to performlayer assignment for each routing interval independently.

For each routing intervali, we construct a Layer ConstraintGraph (LCG) [37], denotedLCGi, as follows: correspondingto each net inn ∈ RI(i) we have a node inLCGi. Two nodesx, y ∈ LCGi have an edgee = (x, y) between them if a netsegmentsx ∈ x and sy ∈ y are sharing the same edge inGi, i.e.,sx andsy are sharing the same boundary of a routingregion. Then we use a node coloring algorithm to assign colorsto the nodes inLCGi such that no two nodes sharing an edgeare assigned the same color. LetCtot denote the total numberof colors used during the node coloring, and letw denotethe wire capacity of the boundary in routing region. Then,the total number of layer pairs used in this routing interval iscomputed as follows:|Lr(i)| = dCtot/we. Lastly, a node withcolor q · w + r (r < w) is assigned to layer pairq. Let hmax

SOP Layer Assignment1: for (each routing intervali)2: LCGi = layer constraint graph fori;3: S = sort all nodes inLCGi;4: color = 0;5: for (nodej ∈ S)6: fin[j] = colors used by neighbors ofj;7: tmp = lowest used color/∈ fin[j];8: if (tmp = ∅)9: color = color + 1;10: j ← color;11: else12: j ← tmp;13: assign a layer pair to each net;

Fig. 16. Pseudocode for SOP layer assignment algorithm

andvmax respectively denote the maximum number of wiresused among all horizontal and vertical edges inGi. Then, thefollowing is a lower-bound on the number of layers used inLr(i): |Lr(i)| ≥ maxhmax, vmax/w.

Figure 16 shows the SOP layer assignment algorithm thatincludes our coloring heuristic. We first sort all nodes inLCGi

in a decreasing order of the number of their neighbors (line3). Let fin[n] denote the set of colors used by the neighborsof n (line 6). We visit the nodes in the sorted order (line5). In case there exists an used color that is not included infin[n] (line 7), we assign this color ton (line 12). In casethere exist multiple colors that satisfy this condition, we usethe lowest color. Otherwise, we introduce a new color andassign it ton (line 9-10). Lastly, we assign a layer pair toeach net based on its color (line 13). In spite of its simplicity,this greedy algorithm provides results that are very close to thelower bound on total number of layers used as demonstratedin Section VI.

E. SOP Channel Assignment

Our prior topology generation and layer assignment stepsfocus on the connections among the distributed pins in thepin distribution layers. after these steps are finished, thereremain two kinds of connections: (i) connections betweenthe original and distributed pins, and (ii) feed-through viainsertion. For both types of connections, the routing channelsin the placement layers are used. During the SOP channelassignment step, each pin in the pin distribution layer ismapped to a routing channel in the neighboring placementlayer. In addition, each pin from an x-net that needs topenetrate a placement layer is also mapped to a routing channelin the placement layer. Lastly, we generate routing topologyfor each pin-to-channel connection and assign it to a routinglayer pair in the pin distribution layer. Each placement layeris modelled with the Floor Connection Graph (FCG) [29] asillustrated in Figure 3. During the SOP local routing step,we use the congestion-driven rip-up-and-reroute [36] to (i)finish the routing for the given set of two-pin connectionswhile satisfying the wire capacity constraint, and (ii) decidethe location of pins for soft blocks along their boundaries. Anillustration is shown in Figure 17.


(a) (b)

Fig. 17. Illustration of (a) channel assignment, (b) local routing

For each placement layerLp(i), let X(i) denote the setof pins that need to be mapped to a routing channel inLp(i).X(i) contains pins fromLb(i−1) andLt(i). The pins inX(i)are grouped into two sets:terminal pin setPt(i) for the pinsthat have terminals inLp(i) and feed-through pin setPf (i)for the pairs of pins that require feed-through vias to penetrateLp(i). Let C(i) denote the set of routing channels inLp(i).Each channelc ∈ C(i) is associated with the pin capacityconstraint. The goal ofSOP Channel Assignment Problemisto map each pinp ∈ Xi to a routing channelc ∈ Ci for1 ≤ i ≤ K and finishp-to-c connection while satisfying thepin capacity constraintA, i.e., |Ci| < A. For each pair ofpins (p1, p2) ∈ Pf (i), we mapp1 andp2 to the same channelc ∈ Ci. Let |Lt(i)| denote the number of layers used to finishconnection between pins fromLp(i) andLt(i), and let|Lb(i)|denote the number of layers used to finish connection betweenpins from Lp(i+1) andLb(i). The objective of SOP channelassignment is to minimize the following cost function:

w5 ·∑

1≤i≤K

(|Lt(i)|+ |Lb(i)|) +∑

n∈NL

(w8 · wln + w9 · vn)

A pseudocode for SOP channel assignment algorithm isshown in Figure 18. We visit each placement layer and assignthe feed-through pins and terminal pins to the channels. Inaddition, an L-shaped routing topology for each pin-to-channelis constructed in a 2D gridG(i) (line 3). The signal delayof feed-through vias is larger than that of other types ofvias. Since each channel is under a capacity constraint, itis important to assign the feed-through pins to the nearestchannels first. Therefore, our strategy is to perform channelassignment for the feed-through pins first to minimize thedelay of x-nets that require feed-through vias. In addition,we give priority to the pins that are included in long nets.Thus, we sort the pins based on the wirelength (line 4). Ourheuristic algorithm assigns pins to channels based on the costof mapping—we seek a channel with the best mapping costfor a given pin (line 6-8). We compute the cost of mappingfor a given pinp and a channelc as follows:

cost(p, c) =room(c)

dist(p, c) · bend(p, c) · cong(p, c)

whereroom(c) denotes the number of pinsc can accommo-date until it violates the capacity constraint,dist(p, c) is theManhattan distance betweenp andc, bend(p, c) is the numberof bends in the connection betweenp and c, and cong(p, c)

SOP Channel Assignment1: for (each placement layeri)2: C(i) = channels ini;3: G(i) = 2D routing grid;4: S = sorted feed-through pins;5: for (each pinp ∈ S)6: best = ∅;7: for (each channelc ∈ C(i));8: computecost(p, c) and updatebest;9: Tp = p-to-best routing topology inG(i);10: update channel and edge usage inG(i);11: repeat line 4-10 for terminal pin set;12: perform layer assignment onG(i);

Fig. 18. Pseudocode for SOP channel assignment algorithm

denotes the total number of existing connections along theproposed L-shaped route. We choose the channel with themaximumcost(p, c). Note that a channel is represented witha line instead of a point. Thus,distance(p, c) andbend(p, c)are based on the shortest connection betweenp and any pointon c. Upon a pin to channel mapping, we update the usageof channel inC(i) and edges inG(i) (line 10). After thechannel assignment and topology generation for all pin-to-channel connections are finished, we perform layer assignmentusing our coloring heuristic presented in Section V-D andcompute the total number of layers used.

VI. EXPERIMENTAL RESULTS

We implemented our algorithms in C++/STL and ran our ex-periments on Linux Beowulf clusters. We tested our algorithmswith two sets of benchmarks. The first set is from the standardGSRC floorplan circuits [38]. The second set, named the GTbenchmark, was synthesized from the IBM circuits [39], wherewe use our multi-level partitioner [40] to divide the gate-levelnetlist into multiple blocks. The GSRC benchmarks are smallto medium-sized in terms of both the number of blocks andnets.5 The GT benchmarks contain medium to large numberof blocks withdenseconnections.

A. SOP Placement Results

The dimension of the area was assumed to be inmm2

and area per unit decap cost was chosen to be50mm2. Thenumber of placement layers is fixed tofour for all experiments.In our experiments, we used the same parameters acrossthe benchmarks. The technology parameters used were 0.01Ω/µm for wire resistance per unit length,1pH/µm for wireinductance per unit length, and10fF/µm for wire capacitanceper unit length. We report the average runtime for eachbenchmark measured in seconds. For our thermal analysis weused a grid size of 10 x 10 x 7 in the x, y and z direction. Thenumber of active layers was four with three passive layers inbetween them. The conductivity of the substrate was chosen tobe that of silicon (0.1W/mmC). The conductivity of the sidesof the package was fixed at0.01W/mmC. The heat sink was

5The GT benchmark circuits are available for download at our website:www.gtcad.gatech.edu .


assumed to be at the top of the package with a conductivityof 0.5W/mmC and the conductivity of the bottom of thepackage was chosen to be0.2W/mmC. The power density ofthe blocks varied from10W/mm2 to 300W/mm2 dependingon the switching current demands. The switching currentdemands were generated using a formula using a randomnumber and the size of the block.

Table I shows a comparison among 3D SOP placementalgorithms with three different objectives: (1) area/wire-driven(w1 = w2 = 1 and w3 = w4 = 0 in Equation (1), SectionIII-B), (2) decap-driven (w1 = w2 = w3 = 1 and w4 = 0)and (3) thermal-driven (w1 = w2 = w4 = 1 andw3 = 0). Ourbaseline algorithm is the area/wirelength-driven algorithm.• We achieved a 21% improvement over baseline in decap

cost using our decap-driven algorithm, with a 7% increasein final area and 15% increase in wirelength. The tem-peraturedecreasesby 3%.

• Our thermal-driven floorplanning achieved a 21% im-provement over baseline with a dramatic increase in totalarea of 76%. Wirelength increased by 28% and decaprequirementincreasesby 19%. We note that it may bepossible to reduce area and wirelength costs by fine-tuning parameters.

Table II shows the result of our final 3D placement al-gorithm that considers all four objectives: area, wirelength,decap, and thermal (w1 = w2 = w3 = w4 = 1 in Equation(1), Section III-B). We note that our area/wire/decap/thermal-driven algorithm achieves an improvement inboth decap andtemperature over baseline by 13% and 9% respectively. Thereis a reasonable increase in final area and wirelength by 19%and 15% respectively. We also tested our area/wire/decapalgorithm under a thermal constraint. In this case, all candidatesolutions during the simulated annealing are discarded if themaximum temperature is above a certain threshold (100C inour case). We observe that the area, wirelength, and decapresults improve simultaneously at the cost of slight increasein temperature. The CPU times of the algorithm depend on thenumber of candidate solutions evaluated. The times reported inthe table directly reflect the subset of the configuration spacespanned by the algorithm. The constrained algorithm ran muchfaster, making a small number of moves because fewer moveswere accepted.

Figure 19 shows the temperature and decap requirement ofeach block of the final floorplan of the n300 benchmark. Thefigure clearly shows that there is little correlation between tem-perature and decap. This shows that minimizing one objectivedoes not necessarily minimize the other objective as a positivecorrelation would indicate. It also means that minimizing bothobjectives is not mutually exclusive, as a negative correlationwould indicate. This matches with the block placement results.Table III shows the correlation constants.

Table IV shows power supply noise simulation results forthree 3D placement schemes–no-decap aware, decap-aware,and decap-aware+decap-placement. The P/G plane structuresize is 246mm × 254mm, and the top P/G plane pair wasmodelled using cavity resonator model [41] and simulated inHSPICE. The placement layer that uses this P/G plan includes14 active devices. The DC5V sources are located at four

Decap Requirement vs Block Temperature

R2 = 0.0003

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

0 20 40 60 80 100

Temperature (C)

Decap n300

Trend Line

Fig. 19. Decapvs temperature correlation plot for n300

TABLE III

DECAP vs TEMPERATURE CORRELATION CONSTANTS

ckt correlation ckt correlationn50 0.0350 n50b 0.0329n50c 0.0441 n100 0.0915

n100b 0.0570 n100c 0.0092n200 0.0001 n200b 0.0177n200c 0.0052 n300 0.0003gt100 0.0007 gt300 0.0458gt400 0.0000 gt500 0.0345gt600 0.0135 - -

edges in the plane pair and fourteen current sources exist inthe plane pair. As can be seen from the Table IV, the SSN forthe decap-aware algorithm is lower compared to non-decap-aware algorithm. The SSN of the noisiest block blk4 is1.58V ,which is reduced to1.42V by our decap-aware scheme. Withthe insertion of decap, the noise is suppressed to0.22V . Inaddition, the total amount of decap required for non-decap-aware algorithm is 26.7, which is reduced to 19.9 with ourdecap-aware scheme. The largest amount of decap is usedfor blk5 (0.50nF ), because of an increase in its SSN afteroptimization. The numbers show that the SSN was efficientlysuppressed and the amount of decap reduced by using ouralgorithms.

B. SOP Routing Results

Table V shows the characteristics of the GSRC and GTbenchmark designs used during routing. In Table VII we com-pare pin redistribution results. Under the DPD columns, weperform DPD (= detailed pin distribution presented in SectionV-C) only, where we skip CPD (= coarse pin distributionpresented in Section V-B) and assign all i-nets to the routinginterval below for net distribution. Under the CPD+DPD, weperform CPD using the algorithm, assign all i-nets to therouting interval below for net distribution, and perform DPD.DPD serves as our baseline, where CPD+DPD respectivelydemonstrate the impact of our coarse pin distribution. In allcases, we perform detailed pin distribution to legalize thepin location, i.e., remove overlaps among the pins. We usethe following metrics to evaluate our solutions: wirelength


TABLE I

AREA/WIRE-DRIVEN VS DECAP-DRIVEN VS THERMAL-DRIVEN ALGORITHMS

ckts area/wire-driven decap-driven thermal-drivenname size area wire decap temp area wire decap temp area wire decap tempn50 50 22107 2.66E+04 18.08 87.2523269 3.05E+04 5.28 85.25 37710 3.59E+04 29.73 68.90n50b 50 25509 2.51E+04 13.18 75.5326263 2.76E+04 4.10 74.44 38121 3.42E+04 15.06 63.25n50c 50 21233 2.35E+04 15.97 80.6622940 2.92E+04 2.23 70.97 31406 3.01E+04 19.54 65.11n100 100 31551 6.66E+04 78.23 86.5634307 7.31E+04 69.26 81.79 49376 8.41E+04 93.69 69.83n100b 100 28017 4.83E+04 73.04 104.8630301 5.61E+04 59.18 98.43 50151 6.95E+04 98.75 78.23n100c 100 32845 6.21E+04 80.37 90.9434217 6.76E+04 67.56 90.43 52908 8.43E+04 94.94 73.87n200 200 56012 1.71E+05 226.32 96.4669395 2.05E+05 223.22 96.21107736 2.45E+04 243.68 76.21n200b 200 55624 1.82E+05 233.20 97.7765399 2.16E+05 221.59 90.17103160 2.56E+05 251.29 76.25n200c 200 54259 1.69E+05 237.45 100.9661356 1.96E+05 228.60 103.12113549 2.43E+05 261.37 76.16n300 300 84610 2.86E+05 393.87 100.1684368 2.86E+05 393.87 100.16131052 3.88E+05 405.30 86.65gt100 100 19156 1.32E+06 60.85 71.0020743 1.68E+06 42.55 70.90 47466 2.04E+06 92.74 52.34gt300 300 23898 1.96E+06 342.52 93.2124858 2.23E+06 334.99 99.55 52809 2.80E+06 392.13 72.19gt400 400 27017 2.81E+06 493.14 114.0526851 3.25E+06 482.05 111.63 36290 3.70E+06 512.03 89.25gt500 500 31665 3.03E+06 645.35 99.7832169 3.54E+06 632.41 98.07 54125 3.85E+06 684.46 80.34gt600 600 47541 6.08E+06 806.50 115.6250903 7.11E+06 794.80 106.84 77795 7.65E+06 835.03 84.60

RATIO 1.00 1.00 1.00 1.00 1.07 1.15 0.79 0.97 1.76 1.28 1.19 0.79TIME 563 304 512

TABLE II

THERMAL /DECAP-DRIVEN AND THERMAL -CONSTRAINT/DECAP-DRIVEN ALGORITHMS (THE RATIOS ARE RELATIVE TO THE BASELINE)

ckts area/wire/decap/thermal thermal constraintname area wire decap temp area wire decap temp

n50 29492 3.55E+04 9.33 76.2922020 2.97E+04 10.31 81.82n50b 27939 2.78E+04 6.58 67.7326768 2.74E+04 5.67 73.70n50c 23355 2.82E+04 5.48 72.2621942 2.86E+04 2.27 71.71n100 41082 7.70E+04 77.91 78.5337142 7.50E+04 71.05 83.78n100b 31158 5.42E+04 64.90 100.6727867 4.83E+04 73.04 104.86n100c 42169 7.79E+04 76.66 81.7237561 7.43E+04 73.25 87.67n200 82474 2.13E+05 229.12 85.4155802 1.72E+05 225.29 96.47n200b 55560 1.82E+05 233.20 97.7767203 2.14E+05 226.33 96.11n200c 54197 1.69E+05 237.45 100.9654174 1.69E+05 237.45 100.96n300 84446 2.86E+05 393.87 100.1284263 2.86E+05 393.88 100.12gt100 26474 1.86E+06 55.25 59.2720743 1.68E+06 42.55 70.90gt300 25617 2.23E+06 343.96 85.3224487 2.26E+06 329.64 90.43gt400 28279 3.46E+06 492.69 91.1626665 2.81E+06 493.14 114.05gt500 32156 3.48E+06 635.84 95.8232926 3.68E+06 620.78 99.48gt600 52716 6.85E+06 794.27 96.5747268 6.08E+06 806.50 115.62RATIO 1.15 1.16 0.87 .91 1.05 1.10 0.84 0.98TIME 367 191

between the original and the new location (dw), total wire-length (wl), crosstalk (xt) and total number of layer pairsused (lyr) in the routing layers. The displacement (dw) andwirelength (wl) results are scaled by106, and the time reportedis the average runtime among the GSRC/GT circuits. Fromthe comparison between DPD and CPD+DPD, we note thatthe displacement result (dw) increases by an average of 50%.However, CPD lowers the total wirelength (wl) consistentlyby 10% on average and the number of layers (lyr) by 10% onaverage.

In table VIII we show our topology generation (RSA/G) andlayer assignment (LAYER) results. We used the technologyparameters for0.13µ process for Elmore delay computation.Specifically, the driver resistance of29.4kΩ, input capacitanceof 0.050fF , unit-length resistance of0.82Ω/µm and unit-length capacitance of0.24fF/µm are used. We report thetotal wirelength (wl), Elmore delay of the nets with maximumsink delay (dly), and the lower bound and the actual numberof layers used for the top-most routing interval. In general,

GSRC benchmarks have bigger delay than GT benchmarksdue to the larger average wirelength. Our layer assignmentalgorithm presented in Section V-D is able to achieve resultsvery close to the lower bounds discussed in Section V-D. Forthe GT circuits, the layer assignment results are within 10%of the lower bound. For the GSRC circuits, we were able toachieve the results equal to the lower bound.

Our channel assignment results are shown in Table IX. Thebaseline case is where we optimize the wirelength only. Wethen compare it to our multi-objective channel assignmentalgorithm that simultaneously optimizes the wirelength, via,and layer usage. Our comparison indicates that the number oflayers is consistently and significantly reduced especially forthe bigger GT benchmarks, where an average improvementof 48% is observed. In case of the second largest benchmarkgt1000, we achieved 57% improvement. This saving on thelayer usage comes at the cost of increase in wirelength andvias. The average increase in wirelength is 12% and 30% forGSRC and GT benchmarks, respectively. The average increase


TABLE IV

POWER SUPPLY NOISE SIMULATION RESULTS. WE REPORTSSNNOISE

FOR EACH BLOCK PLACED IN THE TOP PLACEMENT LAYER. (A)

NON-OPTIMIZED CASE WITHOUT ANY DECOUPLING CAPACITOR. FROM

TABLE I WE NEED 26.7DECAP UNITS TO SUPPRESSSSNNOISE. (B)

OPTIMIZED CASE WITHOUT ANY DECOUPLING CAPACITOR. FROM TABLE

I WE NEED 19.9DECAP UNITS. (C) OPTIMIZED CASE WITH DECOUPLING

CAPACITORS INSERTED.

no decap decap no decap decapblk decap aware used blk decap aware usedblk1 1.21 1.01 0.16 blk8 1.33 1.21 0.36blk2 1.31 1.15 0.17 blk9 1.44 1.16 0.32blk3 1.33 1.18 0.20 blk10 1.34 1.38 0.32blk4 1.58 1.42 0.22 blk11 1.43 1.44 0.20blk5 1.26 1.41 0.50 blk12 1.41 1.35 0.33blk6 1.54 1.27 0.31 blk13 1.37 1.27 0.27blk7 1.47 1.33 0.43 blk14 1.05 1.15 0.22

TABLE V

BENCHMARK CHARACTERISTICS OFGSRCAND GT CIRCUITS. WCAP

DENOTES THE WIRE CAPACITY OF EACH BOUNDARY IN THE ROUTING

TILES. CPAND DP RESPECTIVELY DENOTE THE DIMENSION OF2D GRIDS

USED DURING OUR COARSE AND DETAILED PIN DISTRIBUTION STEPS.

ckts blk i-net x-net pin wcap CP DPn30 30 97 252 723 10 3x3 24x22n50 50 76 409 1050 10 4x4 28x26n100 100 189 696 1873 10 5x5 34x35n200 200 297 1288 3599 10 6x6 46x48n300 300 339 1554 4358 10 8x8 51x50gt50 50 2102 7317 20835 20 4x4 110x101gt100 100 4361 11823 36033 20 6x6 138x140gt300 300 4534 15538 45901 20 8x8 163x165gt1000 1000 6908 25561 79830 20 15x15 218x219gt1500 1500 6554 28171 87785 20 15x15 225x227

in via usage is 63% and 79% for GSRC and GT benchmarks,respectively. We noted that the channel assignment result isvery sensitive to the weighting constants among the objectivesused in our cost function. This indicates that the solution spaceof the channel assignment problem offers many useful tradeoffpoints.

Table X reports our local routing results. We report the wire-length, maximum and average routing demand as well as thestandard deviation. Our baseline is the local routing optimizedfor wirelength only. We then compare it against our multi-objective local routing algorithm that simultaneously optimizeswirelength and routing demands. In both cases, the same pindemand constraint is imposed. We note that the improvementof our multi-objective algorithm over the baseline is signifi-cant, especially for GSRC circuits—the routing demands werereduced by 33% on average while the wirelength increasedby only 10%. In addition, we reduced the routing demandsfor the GT benchmarks by 41% on average, with wirelengthincrease by 20%. In our biggest benchmarks (gt1500), ourrouting demand reduction is the largest (53%), which comeswith the maximum increase in wirelength (23%). This againindicates that the local routing result is very sensitive to theweighting constants among the objectives used in our costfunction. The lower standard deviation of our multi-objectivealgorithm indicates that the routing demand is more evenly

TABLE VI

AREA INFORMATION OF THE 4-LAYER SOPFLOORPLANNING FORGSRC

AND GT BENCHMARK DESIGNS. THE MIN AND MAX RESPECTIVELY

DENOTE THE DIMENSION OF THE MINIMUM AND MAXIMUM FLOORPLAN

AREA, WHEREAS THE FINAL COLUMN DENOTES THE DIMENSION OF THE

OVERALL FOOTPRINT.

ckts min max finaln30 239x218 251x233 251x237n50 226x212 247x228 247x230n100 204x211 221x231 221x231n200 199x210 218x225 218x227n300 262x252 289x274 289x274gt50 88x108 76x139 97x139gt100 111x117 131x143 131x143gt300 125x124 127x161 127x161gt1000 128x128 122x172 128x172gt1500 132x143 136x183 137x183

TABLE VII

SOPPIN REDISTRIBUTION RESULTS USING OUR COARSE AND DETAILED

PIN DISTRIBUTION ALGORITHMS. WE REPORT THE WIRELENGTH

BETWEEN THE ORIGINAL AND THE NEW LOCATION (DW), TOTAL

WIRELENGTH (WL), CROSSTALK (XT) AND TOTAL NUMBER OF LAYER

PAIRS USED(LYR) IN THE ROUTING LAYERS.

DPD CPD+DPDckt dw wl xt lyr dw wl xt lyr

n30 0.10 0.03 0 5 0.15 0.04 0 5n50 0.15 0.04 5 7 0.24 0.05 5 6n100 0.26 0.07 10 10 0.41 0.07 10 8n200 0.51 0.16 51 15 0.80 0.15 52 14n300 0.69 0.21 219 14 1.17 0.21 220 16gt50 1.50 0.41 92 38 2.19 0.38 90 33gt100 3.01 0.84 441 73 4.34 0.78 446 57gt300 3.94 1.19 2200 74 6.14 1.07 2186 66gt1000 7.45 2.21 13365 89 11.46 2.06 13496 87gt1500 8.76 2.67 14296 87 13.38 2.33 14366 94TIME 579 588

distributed (= lower congestion) compared to the wirelength-only case. Lastly, Figure 20 shows a snapshot of a 4-layerSOP with 3 routing interval for n200 benchmark.

VII. C ONCLUSIONS ANDONGOING WORKS

In this article, we presented the first physical layout algo-rithm for 3D SOP designs that includes thermal/decap-aware3D placement and crosstalk-aware 3D routing algorithm. Weextended the thermal and SSN models to 3D and used themto guide our 3D module placement. In addition to estimatingthe amount of decap needed to keep the SSN at the circuitstolerance level, we also efficiently used near-by white spacesto allocate decap if necessary. Our routing process is dividedinto pin redistribution, topology generation, layer assignment,channel assignment, and local routing steps. Our major ob-jective is to reduce the amount of crosstalk and layers whilesatisfying various constraints on routing resource. Our ongoingwork includes SOP detailed routing.

ACKNOWLEDGMENT

This research is partially supported by the grants fromNational Science Foundation and the Packaging ResearchCenter at Georgia Institute of Technology.


TABLE X

SOPLOCAL ROUTING RESULTS FOR THE BASELINE(WIRELENGTH MINIMIZATION ONLY ) AND MULTI -OBJECTIVE ALGORITHMS. WE REPORT THE

WIRELENGTH (WL), MAXIMUM (MAX ) AND AVERAGE (AVE) ROUTING DEMAND AS WELL AS THE STANDARD DEVIATION (DEV).

wl-only wl+rdckt wl max avg dev wl max avg dev

n30 0.080 47 7.85 8.82 0.082 38 7.69 7.63n50 0.129 66 10.29 11.61 0.133 64 10.10 9.63n100 0.247 183 14.83 19.21 0.276 112 14.77 14.73n200 0.512 305 22.25 28.68 0.593 144 22.82 19.45n300 0.786 297 22.50 29.59 0.920 145 23.23 19.28gt50 1.417 3423 296.39 456.73 1.684 2089 311.30 356.83gt100 2.803 4564 343.74 611.19 3.338 3335 367.08 465.06gt300 5.069 6504 327.32 596.98 6.197 3629 358.67 435.62gt1000 8.162 6492 263.65 500.58 9.698 3730 304.08 382.46gt1500 9.666 9221 256.60 506.9311.899 4313 294.06 378.35TIME 547 573

TABLE VIII

TOPOLOGY GENERATION AND LAYER ASSIGNMENT RESULTS. WE REPORT

THE TOTAL WIRELENGTH (WL), ELMORE DELAY OF THE NETS WITH

MAXIMUM SINK DELAY (DLY ), AND THE LOWER BOUND (LOW) AND THE

ACTUAL NUMBER (LYR) OF LAYERS USED.

RSA/G LAYERckt wl dly low lyr

n30 391.8 2.779 2 2n50 429.3 3.045 3 3n100 384.7 2.728 7 7n200 400.5 2.841 7 7n300 506.7 3.599 9 9gt50 202.7 1.436 16 16gt100 232.3 1.645 29 29gt300 259.2 1.837 27 31gt1000 291.4 2.064 32 37gt1500 289.5 2.054 40 49TIME 90 150

REFERENCES

[1] R. Tummala and V. Madisetti, “System on chip or system on package?”IEEE Design & Test of Computers, pp. 48–56, 1999.

[2] R. Tummala, “SOP: What is it and why? a new microsystem-integrationtechnology paradigm-moore’s law for system integration of miniaturizedconvergent systems of the next decade,”IEEE Transactions on AdvancedPackaging, 2004.

[3] R. T. et al, “The SOP for miniaturized, mixed-signal computing, commu-nication, and consumer systems of the next decade,”IEEE Transactionson Advanced Packaging, 2004.

[4] K. Kundert, H. Chang, D. Jefferies, G. Lamant, E. Malavasi, andF. Sendig, “Design of mixed-signal systems-on-a-chip,”IEEE Trans. onComputer-Aided Design of Integrated Circuits and Systems, 2000.

[5] G. Gielen and R. Rutenbar, “Computer-aided design of analog andmixed-signal integrated circuits,”Proceedings of the IEEE, 2000.

[6] Y. Deng and W. Maly, “Interconnect characteristics of 2.5-D systemintegration scheme,” inProc. Int. Symp. on Physical Design, 2001.

[7] S. Das, A. Chandrakasan, and R. Reif, “Design tools for 3-D integratedcircuits,” in Proc. Asia and South Pacific Design Automation Conf., 2003.

[8] B. Goplen and S. Sapatnekar, “Efficient thermal placement of standardcells in 3D ICs using a force directed approach,” inProc. IEEE Int.Conf. on Computer-Aided Design, 2003.

[9] J. Minz and S. K. Lim, “Layer assignment for system on packages,” inProc. Asia and South Pacific Design Automation Conf., 2004.

[10] J. Minz, M. Pathak, and S. K. Lim, “Net and pin distribution for3D package global routing,” inProc. Design, Automation and Test inEurope, 2004.

[11] R. Ravichandran, J. Minz, M. Pathak, S. Easwar, and S. K. Lim, “Phys-ical layout automation for system-on-packages,” inIEEE ElectronicComponents and Technology Conference, 2004.

TABLE IX

SOPCHANNEL ASSIGNMENT RESULTS. WE REPORT THE LAYER USAGE,

WIRELENGTH, AND VIA FOR THE BASELINE (WIRELENGTH MINIMIZATION

ONLY) AND MULTI -OBJECTIVE ALGORITHMS.

wl-only lyr+wl+viackt lyr wl via lyr wl via

n30 9 0.023 103 8 0.033 132n50 10 0.035 167 8 0.039 239n100 12 0.039 317 9 0.049 484n200 13 0.058 603 10 0.081 1244n300 12 0.066 686 12 0.086 1314gt50 11 0.710 7033 9 0.833 11538gt100 18 1.320 11770 14 1.503 21249gt300 26 1.925 17938 17 2.087 34418gt1000 67 4.048 43029 29 4.448 77237gt1500 74 4.517 48299 35 5.033 87506TIME 194 210

[12] J. Minz, S. K. Lim, J. Choi, and M. Swaminathan, “Module placementfor power supply noise and wire congestion avoidance in 3D packaging,”in Proc. IEEE Electrical Performance of Electronic Packaging, 2004.

[13] J. Minz and S. K. Lim, “A global router for system-on-package targetinglayer and crosstalk minimization,” inProc. IEEE Electrical Performanceof Electronic Packaging, 2004.

[14] J. Minz, E. Wong, and S. K. Lim, “Thermal and congestion-aware phys-ical design for 3d system-on-pacakge,” inIEEE Electronic Componentsand Technology Conference, 2005.

[15] J. Choi, S. Chun, N. Na, M. Swaminathan, and L. Smith, “A method-ology for the placement and optimization of decoupling capacitors forgigahertz systems,” inVLSI Design Symposium, 2000.

[16] A. Kamo, T. Watanabe, and H. Asai, “An optimization method forplacement of decoupling capacitors on printed circuit board,” inProc.IEEE Electrical Performance of Electronic Packaging, 2000.

[17] Y. Chen, Z. Chen, and J. Fang, “Optimum placement of decouplingcapacitors on packages and printed circuit boards under the guidance ofelectromagnetic field simulation,” inIEEE Electronic Components andTechnology Conference, 1996.

[18] J. Lee, “Thermal placement algorithm based on heat conduction anal-ogy,” IEEE Trans. on Components and Packaging Technologies, 2003.

[19] K. Deb, P. Jain, N. Gupta, and H. Maji, “Multiobjective placement ofelectronic components using evolutionary algorithms,”IEEE Trans. onComponents and Packaging Technologies, 2004.

[20] K. Khoo and J. Cong, “An efficient multilayer MCM router based onfour-via routing,” IEEE Trans. on Computer-Aided Design of IntegratedCircuits and Systems, pp. 1277–1290, 1995.

[21] J. D. Cho, K. Liao, S. Raje, and M. Sarrafzadeh, “M2R: Multilayerrouting algorithm for high-performance MCM,”IEEE Trans. on Circuitsand Systems Part I, 1994.

[22] W. W. Dai, “Topological routing in surf: Generating a rubberbandsketch,” inProc. ACM Design Automation Conf., 1991, pp. 39–48.


Fig. 20. A snapshot of 4-layer SOP with 3 routing interval for n200benchmark. The routing-tile boundaries with heavy usage are shown withthicker lines.

[23] D. Wang and E. Kuh, “A new timing-driven multilayer MCM/IC routingalgorithm,” in Proc. IEEE Multi-Chip Module Conf., 1997.

[24] Y. Cha, C. Rim, and K. Nakajima, “A simple and effective greedymultilayer router for MCMs,” inProc. Int. Symp. on Physical Design,1997.

[25] J. Cong and P. Madden, “Performance driven multi-layer general arearouting for PCB/MCM designs,” inProc. ACM Design AutomationConf., 1998.

[26] J. Cho and M. Sarrafzadeh, “The pin redistribution problem in multi-chip modules,” inProc. Int. ASIC Conf., 1991.

[27] D. Chang, T. Gonzales, and O. Ibarra, “A flow based approach to thepin redistribution problem for multi-chip modules,” inProc. Great LakesSymposum on VLSI, 1994.

[28] J. D. Cho, “An optimum pin redistribution for multichip modules,” inProc. IEEE Multi-Chip Module Conf., 1996.

[29] J. Cong, “Pin assignment with global routing for general cell design,”IEEE Trans. on Computer-Aided Design of Integrated Circuits andSystems, pp. 1401–1412, 1991.

[30] W. Elmore, “The transient response of damped linear networks withparticular regard to wideband amplifiers,”Journal of Applied Physics,pp. 55–63, 1948.

[31] J. Cong, A. B. Kahng, and K.-S. Leung, “Efficient algorithms for theminimum shortest path steiner arborescence problem with applicationsto VLSI physical design,”IEEE Trans. on Computer-Aided Design ofIntegrated Circuits and Systems, pp. 24–39, 1999.

[32] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization bysimulated annealing,”Science, pp. 671–680, 1983.

[33] H. Murata, K. Fujiyoshi, S. Nakatake, and Y. Kajitani, “Rectangle pack-ing based module placement,” inProc. IEEE Int. Conf. on Computer-Aided Design, 1995, pp. 472–479.

[34] S. Zhao, C. Koh, and K. Roy, “Decoupling capacitance allocation and itsapplication to power supply noise aware floorplanning,”IEEE Trans. onComputer-Aided Design of Integrated Circuits and Systems, pp. 81–92,2002.

[35] J. D. Cho, S. Raje, M. Sarrafzadeh, M. Sriram, and S. M. Kang,“Crosstalk-minimum layer assignment,” inIEEE Custom IntegratedCircuits Conference, 1993.

[36] L. McMurchie and C. Ebeling, “Pathfinder: A negotiation basedperformance-driven router for FPGAs,” inACM Intl Symposium onField-Programmable Gate Arrays, 1995, pp. 111–117.

[37] M. Ho, M. Sarrafzadeh, G. Vijayan, and C. Wong, “Layer assignmentfor multichip modules,” IEEE Trans. on Computer-Aided Design ofIntegrated Circuits and Systems, pp. 1272–1277, 1990.

[38] GSRC, http://www.cse.ucsc.edu/research/surf/GSRC/GSRCbench.html.[39] C. J. Alpert, “The ISPD98 circuit benchmark suite,” inProc. Int. Symp.

on Physical Design, 1998, pp. 80–85.[40] J. Cong and S. K. Lim, “Edge separability based circuit clustering

with application to multi-level circuit partitioning,”IEEE Trans. onComputer-Aided Design of Integrated Circuits and Systems, vol. 23,no. 3, pp. 346–357, 2004.

[41] N. Na, J. Choi, M. Swaminathan, J. P. Libous, and D. P. O’Connor,

“Modeling and simulation of core switching noise for asics,”IEEETrans. Advanced Packaging, pp. 4–11, 2002.

Jacob Rajkumar Minz (S’05) received his B. Techdegree in Computer Science and Engineering fromthe Indian Institute of Technology, Kharagpur in2001. He was with the Advanced VLSI Design Lab,Indian Institute of Technology, Kharagpur for a year,where he was involved in the design of digital chips.He joined the School of Electrical and ComputerEngineering, Georgia Institute of Technology in Au-gust 2002, where he is pursuing research towardshis PhD. His areas of interest are physical designautomation and algorithms for electronic CAD.

Eric Wong (S’05) received his BS degree in elec-trical engineering from the State University of NewYork, Binghamton in 2004. He is currently studyingfor his PhD in electrical and computer engineeringat Georgia Institute of Technology. His researchinterests include physical VLSI design and designautomation for VLSI circuits.

Mohit Pathak (S’05) is a Ph.D. candidate and agraduate research assistant in the school of Electricaland Computer Engineering at Georgia Institute ofTechnology. He received his B.Tech degree in Com-puter Science and Engineering from Indian Instituteof Technology, Kharagpur in 2004. He was withMagma Design Automation, India for a year, wherehe worked as a member of technical staff. His areasof interest are physical design automation and VLSIdesign.

Sung Kyu Lim (S’94-M’00) received B.S. (1994),M.S. (1997), and Ph.D. (2000) degrees all fromthe Computer Science Department of Universityof California at Los Angeles. During 2000-2001,He was a post-doctoral scholar at UCLA, and asenior engineer at Aplus Design Technologies, Inc.He joined the School of Electrical and ComputerEngineering at Georgia Institute of Technology as anassistant professor in August 2001 and the Collegeof Computing as an adjunct assistant professor inSeptember 2002. He is currently the director of the

Georgia Tech Computer Aided Design Laboratory. Dr. Lim’s research focus ison the physical design automation for 3D circuits, 3D System-On-Packages,microarchitectural physical planning, Field Programmable Analog Arrays, andQuantum Cell Automata.

He has been on the advisory board of ACM/SIGDA since 2003. He iscurrently serving the technical program committee of IEEE InternationalSymposium on Circuits and Systems, ACM Great Lakes Symposium on VLSI,IEEE International Conference on Computer Design, ACM InternationalSymposium on Physical Design, and ACM/IEEE Asia and South PacificDesign Automation Conference. He has been awarded the ACM/SIGDA DACGraduate Scholarship in June 2003.

ieee transactions on components and ...ieee transactions on components and packaging technologies,...

Documents