Analytical Minimization of Signal Delay in VLSI Placement
Andrew B. Kahng and Igor L. Markov
UCSD, Univ. of Michigan
http://www.eecs.umich.edu/~imarkov
IBM technical contact: Paul Villarrubia
Outline
• Background: Global Placement for VLSI
– wirelength minimization
– delay minimization
• Contribution
– minimization objective
– “generic” minimization algorithm: outer loop and inner loop
– empirical results
• Futures
VLSI Global Placement
• Find locations for standard cells
• Standard cells placed in rows, without overlap
• Minimize wirelength, “routing congestion”
• Minimize clock cycle
• Key abstractions:
– standard cells → rectangular outlines
– netlist → weighted hypergraph (signal nets → hyperedges)
– signal delay → function of cell locations (interconnect dominates)
A VLSI Global Placement Example
bad placement good placement
Netlist Hypergraph and Timing Graph
• Two signal nets: 3 pins (light blue), and 4 pins (light green)
• Ovals: hyperedges
• Red edges: timing graph edges
Top-Down Global Placement
• Placement blocks represent cells and layout area
– single block at the start, driven by recursive (min-cut) bipartitioning
– each pass: number of blocks doubles, size of blocks halves
– end case: several cells in a tiny region
• Intuition: many cells can operate in parallel. Partitioning finds “independent” groups of cells
Analytical Global Placement
• Find a continuous placement (locations == reals)
• Efficient optimizations when nonconvex constraints are relaxed (e.g., cells are allowed to overlap)
• Represent multi-pin hyperedges by sets of edges
– minimize total weighted “wirelength” of all edges
Popular objectives:
• Linear (Manhattan) WL = $w_{12} ( |x_1-x_2| + |y_1-y_2| )$
• Quadratic “squared” WL = $w_{12} ( (x_1-x_2)^2 + (y_1-y_2)^2 )$
Constraints: fixed vertices and/or “region constraints”
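The two objectives above can be evaluated directly once each hyperedge has been expanded into two-pin edges. The following is a minimal sketch, not the authors' code: the function names, the data layout, and the clique weighting w/(k-1) are assumptions chosen for illustration.

```python
from itertools import combinations

def clique_edges(pins, weight):
    """Expand a k-pin hyperedge into k*(k-1)/2 two-pin edges.

    Assumption: each edge gets weight / (k - 1), one common convention;
    the slides do not say which edge model is used.
    """
    k = len(pins)
    return [(i, j, weight / (k - 1)) for i, j in combinations(pins, 2)]

def linear_wl(edges, pos):
    """Total weighted Manhattan wirelength: sum of w * (|dx| + |dy|)."""
    return sum(w * (abs(pos[i][0] - pos[j][0]) + abs(pos[i][1] - pos[j][1]))
               for i, j, w in edges)

def quadratic_wl(edges, pos):
    """Total weighted squared wirelength: sum of w * (dx^2 + dy^2)."""
    return sum(w * ((pos[i][0] - pos[j][0]) ** 2 + (pos[i][1] - pos[j][1]) ** 2)
               for i, j, w in edges)

# Hypothetical three-cell example.
pos = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (3.0, 0.0)}
edges = [("a", "b", 1.0), ("b", "c", 2.0)]
print(linear_wl(edges, pos), quadratic_wl(edges, pos))
```

The quadratic form is smooth, which is what makes it convenient for continuous solvers; the linear form tracks routed wirelength more closely.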
Analytical Placement Alone is Not Enough
• Many cells overlap
• Must “spread” the placement
• IBM CPlace and XQ
– Remove overlap (comp. geometry)
– CPlace combines min-cut with analytical techniques
Timing-Driven Placement
• Cycle time is determined by maximum path delay, not total path delay (!)
– max(x,y,...) is not differentiable
– framework: pin-based timing graph
• Analytical approaches allow cell overlaps
– cell overlaps are resolved later
• Main difficulty: cannot enumerate signal paths
• Signal paths implicitly defined by device types
– signal path sources, sinks == I/O pins and storage elements
• Timing constraints also implicitly defined
– “actual arrival times” (AATs) at sources
– “required arrival times” (RATs) at sinks
– source-sink path constraint: path delay ≤ RAT@sink - AAT@source
Implicit Analysis of Path Constraints
• Static Timing Analysis (STA) methodology
– forward topological traversal in timing graph → AAT@every_pin
– similar backward traversal → RAT@every_pin
– slack@pin is given by RAT@pin - AAT@pin
– negative slacks ⇔ violated timing constraints
• STA-based and STA-inspired placement methods
– slacks → net weights for HPWL minimization
• top-down placement to maximize negative slack (Marek-Sadowska/Lin 86)
– note: STA requires edge delays (e.g., from placement)
– delay budgets
• zero-slack (Hauge, Nair and Yoffa 86)
• iterative min-max (Shragowitz et al. 90/92)
• limit-bumping (Frankle 92)
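The forward and backward traversals of STA can be sketched on a small timing graph as follows. This is an illustrative sketch only: the function name `sta`, the dictionary-based graph layout, and the example delays are assumptions, and the graph is assumed to be acyclic with AATs given at sources and RATs given at sinks.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def sta(edges, aat_src, rat_sink):
    """edges: {(u, v): delay}; aat_src: {source: AAT}; rat_sink: {sink: RAT}.

    Returns (aat, rat, slack) dictionaries keyed by vertex.
    """
    preds, succs = {}, {}
    for (u, v), d in edges.items():
        preds.setdefault(v, []).append((u, d))
        succs.setdefault(u, []).append((v, d))
        preds.setdefault(u, [])
        succs.setdefault(v, [])
    # TopologicalSorter takes node -> predecessors and yields predecessors first.
    order = list(TopologicalSorter(
        {v: [u for u, _ in ps] for v, ps in preds.items()}).static_order())
    aat = {}
    for v in order:  # forward traversal: latest arrival over all fan-in edges
        aat[v] = aat_src[v] if not preds[v] else max(aat[u] + d for u, d in preds[v])
    rat = {}
    for v in reversed(order):  # backward traversal: earliest requirement over fan-out
        rat[v] = rat_sink[v] if not succs[v] else min(rat[w] - d for w, d in succs[v])
    slack = {v: rat[v] - aat[v] for v in order}
    return aat, rat, slack

# Hypothetical graph: two paths from s to t, with RAT@t = 4 and AAT@s = 0.
edges = {("s", "a"): 2, ("a", "t"): 3, ("s", "b"): 1, ("b", "t"): 1}
aat, rat, slack = sta(edges, {"s": 0}, {"t": 4})
print(slack)
```

In this example the path s, a, t has delay 5 against a constraint of 4, so every pin on it carries slack -1, while b keeps positive slack.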
Motivations For Novelty
• Many promising techniques available
– net reweighting
– delay budgeting
– others
• Existing frameworks have weaknesses
– speed/scalability
– loss or ignorance of input information
• delay budgeting algorithms tend to ignore fixed locations, obstacles
– optimization of “wrong” global objectives (e.g., average wirelength)
The Dimensionless Path-Timing Objective
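• For path $\pi$ consider edges $e \in \pi$
• Dimensionless Path-Timing Objective (DPO)
$\mathrm{DPO} = \max_\pi \{ t_\pi / c_\pi \} = \max_\pi \{ (\sum_{e \in \pi} d_e) / c_\pi \}$
• Where
– $c_\pi$ is path constraint
– $t_\pi$ is path delay
– $d_e = d_{ij}(x_i,y_i,x_j,y_j)$ is edge delay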
DPO: Properties
$\mathrm{DPO} = \max_\pi \{ t_\pi / c_\pi \} = \max_\pi \{ (\sum_{e \in \pi} d_e) / c_\pi \}$
• DPO ≤ 1 ⇔ all timing constraints are satisfied
• Convex when edge delay models are convex
• Min DPO ⇔ max slack when all $c_\pi$ are equal
• Max slack can be reduced to min DPO
– add two new vertices: the source and the sink
– connect the source to former sources
– connect the sink to former sinks
– use constant edge delay models
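The equivalence between minimizing DPO and maximizing slack under equal constraints follows from a one-line calculation. The derivation below is a sketch under stated assumptions: every path $\pi$ has the same constraint $c > 0$, and path slack is taken to be $s_\pi = c - t_\pi$ (notation introduced here for illustration).

```latex
% Assumption: all paths share one constraint c > 0.
\begin{align*}
s_\pi &= c - t_\pi && \text{(slack of path } \pi\text{)} \\
\min_\pi s_\pi &= c - \max_\pi t_\pi
  = c\,\bigl(1 - \max_\pi \{ t_\pi / c \}\bigr)
  = c\,(1 - \mathrm{DPO})
\end{align*}
```

Since $c$ is a positive constant, the worst slack is a decreasing function of DPO, so a placement that minimizes DPO also maximizes the worst slack.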
Criticalities: “Multiplicative Slacks”
• By analogy with slack, define criticalities
$\kappa_i = \max_{\pi \ni v} \{ t_\pi / c_\pi \}$ for vertex $v = v_i$
$\kappa_{ij} = \max_{\pi \ni e} \{ t_\pi / c_\pi \}$ for edge $e = e_{ij}$
• Criticalities are multiplicative versions of slack
• DPO and criticalities quickly computable
– STA + postprocessing
• Vertex criticalities → cells on critical paths
– can be used by the proposed top-down timing-driven placement flow
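To make the definitions concrete, the sketch below computes DPO and the vertex and edge criticalities by brute-force path enumeration. This is deliberately not the STA-based computation the slides refer to, and it is only usable on tiny acyclic graphs, since paths cannot be enumerated in practice. All names (`enumerate_paths`, `criticalities`, the `constraint` callback) are hypothetical.

```python
def enumerate_paths(succs, sources, sinks):
    """Yield every source-to-sink path as a list of vertices (DAG assumed)."""
    def walk(v, path):
        path = path + [v]
        if v in sinks:
            yield path
        for w in succs.get(v, []):
            yield from walk(w, path)
    for s in sources:
        yield from walk(s, [])

def criticalities(delay, constraint, sources, sinks):
    """delay: {(u, v): d}; constraint(path) -> c > 0.

    Returns (dpo, vertex_crit, edge_crit), where each criticality is the
    largest ratio t/c over all paths through that vertex or edge.
    """
    succs = {}
    for u, v in delay:
        succs.setdefault(u, []).append(v)
    dpo, vcrit, ecrit = 0.0, {}, {}
    for path in enumerate_paths(succs, sources, sinks):
        path_edges = list(zip(path, path[1:]))
        ratio = sum(delay[e] for e in path_edges) / constraint(path)
        dpo = max(dpo, ratio)
        for v in path:
            vcrit[v] = max(vcrit.get(v, 0.0), ratio)
        for e in path_edges:
            ecrit[e] = max(ecrit.get(e, 0.0), ratio)
    return dpo, vcrit, ecrit

# Hypothetical graph: two paths from s to t, both with constraint 4.
delay = {("s", "a"): 2, ("a", "t"): 3, ("s", "b"): 1, ("b", "t"): 1}
dpo, vcrit, ecrit = criticalities(delay, lambda path: 4, {"s"}, {"t"})
print(dpo, vcrit, ecrit)
```

Here the path through a has ratio 5/4, so DPO exceeds 1 and the constraint is violated; cell b only lies on a path with ratio 2/4 and is therefore not critical.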
Generic Minimization of DPO
• Reduce DPO to a simpler objective: $\max_{ij} w_{ij} d_{ij}$
– maximal weighted edge delay
– use “reweighting iterations”
• One reweighting iteration
– assume a placement
– compute edge criticalities
– compute new edge weights $w_{ij}$
– minimize $\max_{ij} w_{ij} d_{ij}$
• (New weights: $w'_{ij} = \mu \kappa_{ij} / d_{ij}$ where $\mu = \max_{ij} w_{ij} d_{ij}$)
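One reweighting iteration can be traced by hand on a toy instance. The sketch below assumes the weight update $w'_{ij} = \mu \kappa_{ij} / d_{ij}$ with $\mu = \max_{ij} w_{ij} d_{ij}$ (the symbols $\mu$ and $\kappa$ are notation assumed here), a linear delay model, and a single movable cell at position x between two fixed pads at a and b in one dimension. Each of the two paths is a single edge, so edge criticality is simply d/c. Function names are hypothetical, and the closed-form inner minimizer only applies to this toy case.

```python
def place_cell(a, b, w_left, w_right):
    """Minimize max(w_left*(x-a), w_right*(b-x)) over a <= x <= b.

    One term grows with x and the other shrinks, so the minimum
    is where the two weighted delays are equal.
    """
    return (w_left * a + w_right * b) / (w_left + w_right)

def reweighting_iteration(x, a, b, c_left, c_right, w_left, w_right):
    """One iteration: placement -> criticalities -> new weights -> new placement.

    Requires a < x < b so that both edge delays are nonzero.
    """
    d_left, d_right = x - a, b - x                        # edge delays (linear model)
    k_left, k_right = d_left / c_left, d_right / c_right  # edge criticalities
    mu = max(w_left * d_left, w_right * d_right)          # current max weighted delay
    w_left = mu * k_left / d_left                         # assumed update rule
    w_right = mu * k_right / d_right
    x_new = place_cell(a, b, w_left, w_right)
    mu_new = max(w_left * (x_new - a), w_right * (b - x_new))
    dpo_new = max((x_new - a) / c_left, (b - x_new) / c_right)
    return x_new, mu, mu_new, dpo_new

# Hypothetical instance: pads at 0 and 10, constraints 2 and 8, unit weights.
x_new, mu, mu_new, dpo_new = reweighting_iteration(5.0, 0.0, 10.0, 2.0, 8.0, 1.0, 1.0)
print(x_new, mu, mu_new, dpo_new)
```

Starting from the midpoint, DPO is 2.5; after one iteration the cell moves to the point where both constraints are met exactly, and the maximal weighted delay does not increase.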
Properties of Reweighting
• Theorem 1. If $\mu = \max_{ij} w_{ij} d_{ij}$ does not increase at a particular iteration, all timing constraints must be satisfied.
• Theorem 2. A reweighting iteration either decreases DPO, or leaves it unchanged.
• Reweighting upper-bounds $d_{ij}$ because $w_{ij} d_{ij} \le \mu$ ⇒ can interpret reweighting as delay rebudgeting
• Youssef and Shragowitz used $w_{ij} = \kappa_{ij}$ in 1990/92
– [interpretation of their iterative MiniMax]
– no iterations with placement: ignore fixed pad locations
Optimization of Maximal Edge Delay
• Must consider particular edge delay models
– popular choices: linear and quadratic
• Theorem 3. 2-dim max edge delay can be reduced to 1-dim case with double #vertices
• [“Inlined” implementation: no new graph]
linear: $\max_{km} a_{km} |t_k - t_m|$
quadratic: $\max_{km} b_{km} (t_k - t_m)^2$
• Theorem 4. Let $b_{km} = a_{km}^2$ ⇒ minimizers coincide
Linear and quadratic WL are numerically equivalent!
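The reason the minimizers coincide can be seen from monotonicity of squaring. The following is a sketch of the argument, assuming nonnegative coefficients $a_{km} \ge 0$ and $b_{km} = a_{km}^2$; it is not taken from the authors' proof.

```latex
% Assumption: a_{km} >= 0 and b_{km} = a_{km}^2.
\begin{align*}
\max_{km} b_{km} (t_k - t_m)^2
  &= \max_{km} \bigl( a_{km} |t_k - t_m| \bigr)^2 \\
  &= \Bigl( \max_{km} a_{km} |t_k - t_m| \Bigr)^2
\end{align*}
```

The second step holds because $u \mapsto u^2$ is increasing for $u \ge 0$. The quadratic objective is therefore an increasing function of the linear one, so any placement that minimizes one also minimizes the other. Note that this relies on the max; it does not carry over to sums of edge lengths.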
Top-Down Placement Framework
• Top-down placement done in passes
• In one pass
– split every previously existing block
• Cell-to-block assignments
– viewed as region constraints
– gradually refine, converge to cell locs
• Assume we analytically minimized signal delay
⇒ have cell locations
⇒ can compute edge delays
⇒ can perform Static Timing Analysis
⇒ know which cells lie on critical paths
• Use delay-minimizing cell locs when splitting blocks
Empirical Validation
• We combined min-max placement with recursive min-cut bisection (Capo → CapoT)
• Implemented minimization of edge delay objectives:
– Length as delay
– Squared length as delay
– Quadratic RC delay
– MST-based Elmore delay (using
• Evaluated
– Internal evaluators (after placement): sanity check
– Industry timing analyzer
• Compared to an industry placer on 4 test-cases
– Won on three test-cases (by slack computed with industry STA)
Results of Quadratic, Linear and Min-Max Placement
Conclusions and Ongoing Work
• New timing-driven placement framework
– can potentially be combined with budgeting or reweighting
– expected to be successful enough on its own
– leverages min-cut placement
– relies on a novel analytical delay minimization
• Dimensionless Path-Timing Objective (DPO)
– novel global timing objective; generalizes slack optimization
• New minimization algorithms
– reweighting iteration: reduction to simpler MAX-based objective
– MAX-based objective can be minimized very quickly
• Ongoing work in the context of timing-driven flows
Future Work
• Observation (how the proposed method works)
– a classic placement approach is split into stages
– a new timing optimization is performed between those stages
– most critical wires/gates are found first
(traditionally: placement is found first)
• Try other types of optimizations during placement
– routing of timing-critical nets
• better delay estimation
• early cross-talk detection?
– sizing of timing-critical drivers
– buffer insertion for timing-critical nets
– early detection of dangerous cross-talk
Faster and cheaper ICs