Partitioning and Clustering
Professor Lei He, lhe@ee.ucla.edu
Post on 20-Dec-2015
![Page 2: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/2.jpg)
Outline
Circuit Partitioning Formulation
Importance of Circuit Partitioning
Partitioning Algorithms
Circuit Clustering Formulation
Clustering Algorithms
![Page 3: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/3.jpg)
Partitioning Formulation
Bi-partitioning formulation:
Minimize interconnections between partitions
Minimum cut: min c(X, X’)
Minimum bisection: min c(X, X’) with |X| = |X’|
Minimum ratio-cut: min c(X, X’) / (|X| |X’|)
[Figure: a network split into X and X’, with cut c(X, X’) between them]
![Page 4: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/4.jpg)
A Bi-Partitioning Example
Min-cut size = 13
Min-bisection size = 300
Min-ratio-cut size = 19
[Figure: example graph on nodes a to f with edge weights 4, 9, 10 and 100, marking the min-cut, min-bisection and min-ratio-cut lines]
Ratio-cut helps to identify natural clusters
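To make the three objectives concrete, here is a small brute-force sketch in Python. The graph is a hypothetical one (two tightly connected triangles, a weak bridge, and a pendant node g), not the graph from the figure, and the function names are illustrative only. It enumerates every bipartition, so it is only meant for tiny examples.

```python
from itertools import combinations

def cut_size(edges, X):
    """c(X, X'): total weight of edges with exactly one endpoint in X."""
    return sum(w for u, v, w in edges if (u in X) != (v in X))

def ratio_cut(edges, nodes, X):
    """c(X, X') / (|X| |X'|)."""
    return cut_size(edges, X) / (len(X) * (len(nodes) - len(X)))

def bipartitions(nodes):
    """Yield every proper subset X that contains nodes[0] (avoids mirror duplicates)."""
    first, rest = nodes[0], nodes[1:]
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            yield frozenset((first,) + extra)

# Hypothetical example: triangles {a,b,c} and {d,e,f}, bridge c-d, pendant g on a.
nodes = ["a", "b", "c", "d", "e", "f", "g"]
edges = [("a", "b", 10), ("a", "c", 10), ("b", "c", 10),
         ("d", "e", 10), ("d", "f", 10), ("e", "f", 10),
         ("c", "d", 3), ("a", "g", 2)]

min_cut = min(bipartitions(nodes), key=lambda X: cut_size(edges, X))
min_ratio = min(bipartitions(nodes), key=lambda X: ratio_cut(edges, nodes, X))
print(sorted(min_cut), cut_size(edges, min_cut))
print(sorted(min_ratio), ratio_cut(edges, nodes, min_ratio))
```

In this example the plain min-cut peels off the single node g, while the ratio-cut separates the two triangles, which is the "natural cluster" behaviour the slide refers to.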
![Page 5: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/5.jpg)
Circuit Partitioning Formulation (Cont’d)
General multi-way partitioning formulation:
Partitioning a network N into N1, N2, …, Nk such that
Each partition has an area constraint: $\sum_{v \in N_i} a(v) \le A_i$
Each partition has an I/O constraint: $c(N_i, N - N_i) \le I_i$
Minimize the total interconnection: $\sum_{N_i} c(N_i, N - N_i)$
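The constraints and objective above can be checked mechanically for a given k-way partition. The following Python sketch assumes a graph model with weighted two-pin edges (the slide does not fix a net model), and the names `evaluate_partition`, `max_area` and `max_io` are hypothetical.

```python
def evaluate_partition(edges, area, parts, max_area, max_io):
    """Return (feasible, total) for a k-way partition.

    edges: list of (u, v, weight); area: dict node -> a(v);
    parts: list of node sets N_1..N_k; max_area[i] = A_i; max_io[i] = I_i.
    total is sum_i c(N_i, N - N_i), so each cut edge is counted once per side.
    """
    part_of = {v: i for i, p in enumerate(parts) for v in p}
    io = [0] * len(parts)
    for u, v, w in edges:
        if part_of[u] != part_of[v]:
            io[part_of[u]] += w
            io[part_of[v]] += w
    areas = [sum(area[v] for v in p) for p in parts]
    feasible = (all(a <= A for a, A in zip(areas, max_area))
                and all(c <= I for c, I in zip(io, max_io)))
    return feasible, sum(io)

edges = [("a", "b", 1), ("b", "c", 2), ("c", "d", 1)]
area = {"a": 1, "b": 1, "c": 1, "d": 1}
parts = [{"a", "b"}, {"c", "d"}]
print(evaluate_partition(edges, area, parts, max_area=[2, 2], max_io=[2, 2]))
```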
![Page 6: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/6.jpg)
Importance of Circuit Partitioning
Divide-and-conquer methodology
The most effective way to solve problems of high complexity
E.g.: min-cut based placement, partitioning-based test generation,…
System-level partitioning for multi-chip designs or 3D
inter-chip interconnection delay dominates system performance
inter-layer wire pitch is much larger
Circuit emulation/parallel simulation
partition large circuit into multiple FPGAs (e.g. Quickturn), or multiple special-purpose processors (e.g. Zycad).
Parallel CAD development
Task decomposition and load balancing
![Page 7: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/7.jpg)
Partitioning Algorithms
Iterative partitioning algorithms
Multi-way partitioning
Multi-level partitioning (to be discussed after clustering)
![Page 8: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/8.jpg)
Iterative Partitioning Algorithms
Greedy Iterative improvement method
[Kernighan-Lin 1970]
[Fiduccia-Mattheyses 1982]
[Krishnamurthy 1984]
Simulated Annealing
[Kirkpatrick-Gelatt-Vecchi 1983]
[Greene-Supowit 1984]
(SA will be formally introduced in the Floorplan chapter)
![Page 9: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/9.jpg)
Kernighan-Lin’s Algorithm
Pair-wise exchange of nodes to reduce cut size
Allow cut size to increase temporarily within a pass
Compute the gain of a swap
Repeat
Perform a feasible swap of max gain
Mark swapped nodes “locked”;
Update swap gains;
Until no feasible swap;
Find max prefix partial sum in gain sequence g1, g2, …, gm
Make corresponding swaps permanent.
Start another pass if the current pass reduces the cut size (usually converges after a few passes)
[Figure: nodes u and v exchanged across the cut and then marked locked]
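One pass of the procedure above can be sketched in Python as follows. This is a minimal, unoptimized version under stated assumptions: a weighted graph with two-pin edges, every swap between unlocked nodes treated as feasible, and the usual D-value bookkeeping (external minus internal cost) to compute swap gains. The helper names `weights` and `kl_pass` are illustrative.

```python
from collections import defaultdict

def weights(edges):
    """Build a symmetric dict-of-dicts weight map from (u, v, w) triples."""
    w = defaultdict(dict)
    for u, v, c in edges:
        w[u][v] = c
        w[v][u] = c
    return w

def kl_pass(w, A, B):
    """One Kernighan-Lin pass; returns (A, B, cut reduction)."""
    A, B = set(A), set(B)
    side = {v: 0 for v in A}
    side.update({v: 1 for v in B})
    # D(v) = external cost - internal cost
    d = {v: sum(c if side[u] != side[v] else -c for u, c in w[v].items())
         for v in side}
    free_a, free_b = sorted(A), sorted(B)
    gains, swaps = [], []
    while free_a and free_b:
        # swap of max gain among unlocked pairs: g = D(a) + D(b) - 2 w(a, b)
        g, a, b = max(((d[a] + d[b] - 2 * w[a].get(b, 0), a, b)
                       for a in free_a for b in free_b), key=lambda t: t[0])
        gains.append(g)
        swaps.append((a, b))
        free_a.remove(a)  # lock the swapped nodes
        free_b.remove(b)
        for x in free_a:  # update D-values as if a and b had been exchanged
            d[x] += 2 * w[x].get(a, 0) - 2 * w[x].get(b, 0)
        for y in free_b:
            d[y] += 2 * w[y].get(b, 0) - 2 * w[y].get(a, 0)
    # max prefix partial sum of the gain sequence g1, g2, ..., gm
    best_k, best_sum, running = 0, 0, 0
    for k, g in enumerate(gains, 1):
        running += g
        if running > best_sum:
            best_k, best_sum = k, running
    for a, b in swaps[:best_k]:  # make the corresponding swaps permanent
        A.remove(a); B.add(a)
        B.remove(b); A.add(b)
    return A, B, best_sum

# Two triangles joined by edge 2-3, started from a poor bisection.
edges = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 1)]
print(kl_pass(weights(edges), {0, 1, 3}, {2, 4, 5}))
```

A full run would call `kl_pass` repeatedly until the returned cut reduction is zero.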
![Page 10: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/10.jpg)
Fiduccia-Mattheyses’ Improvement
Each pass in the KL algorithm takes O(n^3) or O(n^2 log n) time (n: #modules)
Choosing the swap with max gain and updating swap gains take O(n^2) time
The FM algorithm takes O(p) time per pass (p: #pins)
Key ideas in the FM algorithm:
Each move affects only a few moves → constant time gain updating per move (amortized)
Maintain a list of gain buckets → constant time selection of the move with max gain
Further improvement by Krishnamurthy: look-ahead in gain computation
[Figure: gain bucket arrays for sides V1 and V2, indexed from -gmax to gmax, holding cells such as u1 and u2]
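As a rough illustration of the gain and bucket ideas, the Python sketch below computes initial single-cell move gains on a hypergraph and groups cells by gain. It is a simplification, not the full FM data structure: the bucket "array" is a dictionary and the max gain is found by scanning its keys (bounded by the number of distinct gains) rather than by a maintained max pointer. The names `fm_gains`, `build_buckets` and `pop_max` are hypothetical.

```python
from collections import defaultdict

def fm_gains(nets, side):
    """gain(c) = (#nets where c is alone on its side) - (#nets entirely on c's side)."""
    gain = {c: 0 for c in side}
    for net in nets:
        count = [0, 0]
        for c in net:
            count[side[c]] += 1
        for c in net:
            s = side[c]
            if count[s] == 1:      # moving c removes the net's last cell on this side
                gain[c] += 1
            if count[1 - s] == 0:  # net is uncut now; moving c would cut it
                gain[c] -= 1
    return gain

def build_buckets(gain):
    """Group cells by gain value."""
    buckets = defaultdict(set)
    for c, g in gain.items():
        buckets[g].add(c)
    return buckets

def pop_max(buckets):
    """Remove and return (gain, cell) for some cell of maximum gain."""
    g = max(k for k, cells in buckets.items() if cells)
    return g, buckets[g].pop()

nets = [{"a", "b"}, {"b", "c"}, {"c", "d"}, {"b", "c", "d"}]
side = {"a": 0, "b": 0, "c": 1, "d": 1}
gain = fm_gains(nets, side)
print(gain)
print(pop_max(build_buckets(gain)))
```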
![Page 11: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/11.jpg)
Simulated Annealing
Local Search
[Figure: cost function plotted over the solution space, with candidate solutions marked]
![Page 12: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/12.jpg)
Statistical Mechanics vs. Combinatorial Optimization
State $\{r_i\}$ (configuration: a set of atomic positions)
Weight $e^{-E(\{r_i\})/k_B T}$ (Boltzmann distribution)
$E(\{r_i\})$: energy of configuration
$k_B$: Boltzmann constant; $T$: temperature
Low temperature limit?
![Page 13: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/13.jpg)
Analogy
| Physical System | Optimization Problem |
| --- | --- |
| State (configuration) | Solution |
| Energy | Cost function |
| Ground state | Optimal solution |
| Rapid quenching | Iterative improvement |
| Careful annealing | Simulated annealing |
![Page 14: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/14.jpg)
Generic Simulated Annealing Algorithm
1. Get an initial solution S
2. Get an initial temperature T > 0
3. While not yet “frozen” do the following:
3.1 For 1 ≤ i ≤ L, do the following:
3.1.1 Pick a random neighbor S’ of S.
3.1.2 Let Δ = cost(S’) - cost(S)
3.1.3 If Δ ≤ 0 (downhill move), set S = S’
3.1.4 If Δ > 0 (uphill move), set S = S’ with probability $e^{-\Delta/T}$
3.2 Set T = rT (reduce temperature)
4. Return S
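The generic loop above maps almost line by line onto code. The Python sketch below is one possible rendering; the "frozen" test (temperature below `t_min`) and all default parameter values are assumptions made for the example, since the slide leaves them open.

```python
import math
import random

def simulated_annealing(initial, neighbor, cost,
                        t0=10.0, r=0.9, L=100, t_min=1e-3, rng=random):
    """Generic SA: neighbor(s, rng) returns a random neighbor, cost(s) a number."""
    s, t = initial, t0
    while t > t_min:                      # assumed "frozen" criterion
        for _ in range(L):
            s_new = neighbor(s, rng)
            delta = cost(s_new) - cost(s)
            # downhill moves are always taken, uphill with probability e^(-delta/T)
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                s = s_new
        t *= r                            # reduce temperature
    return s

# Toy problem: minimize (x - 3)^2 over the integers, moving by +1 or -1.
rng = random.Random(0)
result = simulated_annealing(
    20,
    neighbor=lambda s, rnd: s + rnd.choice((-1, 1)),
    cost=lambda s: (s - 3) ** 2,
    rng=rng,
)
print(result)
```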
![Page 15: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/15.jpg)
Basic Ingredients for S.A.
Solution space
Neighborhood Structure
Cost Function
Annealing Schedule
![Page 16: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/16.jpg)
SA Partitioning
“Optimization by Simulated Annealing” - Kirkpatrick, Gelatt, Vecchi.
Solution space=set of all partitions
Neighborhood Structure
[Figure: example solutions (two-way partitions of cells a to f), with neighboring solutions connected by a move]
Randomly move one cell to the other side
![Page 17: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/17.jpg)
SA Partitioning
Cost function: f = C + λB
C is the partitioning cost as used before
B is a measure of how balanced the partitioning is
λ is a constant.
Example of B:
B = (|S1| - |S2|)^2
[Figure: two sides S1 = {a, b, ...} and S2 = {c, d, ...}]
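A minimal Python sketch of this cost function and of the single-cell move used as the neighborhood, assuming a graph with weighted two-pin edges and the squared size difference as the balance term. The names `sa_partition_cost`, `move_cell` and `lam` are hypothetical.

```python
def sa_partition_cost(edges, side, lam=1.0):
    """f = C + lam * B with C = cut weight and B = (|S1| - |S2|)^2."""
    cut = sum(w for u, v, w in edges if side[u] != side[v])
    s1 = sum(1 for s in side.values() if s == 0)
    s2 = len(side) - s1
    return cut + lam * (s1 - s2) ** 2

def move_cell(side, cell):
    """Neighbor solution: move one cell to the other side."""
    return {**side, cell: 1 - side[cell]}

edges = [("a", "b", 1), ("b", "c", 1), ("c", "d", 1)]
side = {"a": 0, "b": 0, "c": 1, "d": 1}
print(sa_partition_cost(edges, side, lam=0.5))
print(sa_partition_cost(edges, move_cell(side, "c"), lam=0.5))
```

The second call shows how an unbalanced move is penalized even though the cut weight is unchanged.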
![Page 18: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/18.jpg)
SA Partitioning
Annealing schedule:
$T_n = (T_1/T_0)^n T_0$, ratio $T_1/T_0 = 0.9$
At each temperature, either
1. There are 10 accepted moves on the average;
or
2. # of attempts ≥ 100 × total # of cells
The system is “frozen” if there are very low acceptances at 3 consecutive temperatures.
![Page 19: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/19.jpg)
Graph Partition Using Simulated Annealing Without Rejections
Greene and Supowit, ICCD-88 pp. 658-663
Motivation:
At low temperature, most moves are rejected!
e.g. 1/100 acceptance rate for 1,000 vertices
![Page 20: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/20.jpg)
Graph Partition Using Simulated Annealing Without Rejections (Cont’d)
Key Idea
(I) Biased selection
If a move i has probability $\pi_i$ of being accepted, generate move i with probability $\pi_i / \sum_{j=1}^{N} \pi_j$
N: size of neighborhood
In general, $\pi_i = \min\{1, e^{-\Delta_i/T}\}$
In the conventional model, each move has probability 1/N of being generated.
(II) If a move is generated, it is always accepted
![Page 21: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/21.jpg)
Graph Partition Using Simulated Annealing Without Rejections (Cont’d)
Main Difficulty
(1) $\pi_i$ is dynamic (since $\Delta_i$ is dynamic)
It is too expensive to update the $\pi_i$’s ($\Delta_i$’s) after every move
(2) Weighted selection problem
How to select move i with probability $\pi_i / \sum_{j=1}^{N} \pi_j$?
![Page 22: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/22.jpg)
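Solution to the Weighted Selection Problem (general solution to several problems)
[Figure: binary tree whose leaves hold the weights $w_1, \dots, w_7$ (plus a padding leaf 0) and whose internal nodes hold subtree sums: $w_1+w_2$, $w_3+w_4$, $w_5+w_6$, $w_7$ on one level, $w_1+\dots+w_4$ and $w_5+w_6+w_7$ above them, and $w_1+\dots+w_7$ at the root]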
![Page 23: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/23.jpg)
Solution to the Weight Selection Problem (Cont’d)
Let $W = w_1 + w_2 + \dots + w_n$. How to select i with probability $w_i / W$?
Equivalent to choosing x such that $w_1 + \dots + w_{i-1} < x \le w_1 + \dots + w_i$
v ← root
x ← random(0, 1) * w(v)
while v is not a leaf do
    if x < w(left(v)) then v ← left(v)
    else x ← x - w(left(v)), v ← right(v)
end
Probability of ending up at leaf i:
$\mathrm{Prob}\left(\sum_{j=1}^{i-1} w_j < x \le \sum_{j=1}^{i} w_j\right) = w_i / \sum_{j=1}^{n} w_j = w_i / W$
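The tree walk above can be sketched in Python with the sums stored in an implicit (array-based) complete binary tree. The class name `SumTree`, the padding to a power of two, and the optional `x` argument (which makes the selection deterministic for testing) are choices made for this example rather than anything fixed by the slides.

```python
import random

class SumTree:
    """Leaves hold weights; each internal node holds the sum of its subtree."""

    def __init__(self, weights):
        n = 1
        while n < len(weights):
            n *= 2                      # pad leaf count to a power of two
        self.n = n
        self.tree = [0.0] * (2 * n)     # tree[1] is the root, tree[n + i] is leaf i
        for i, w in enumerate(weights):
            self.tree[n + i] = w
        for v in range(n - 1, 0, -1):
            self.tree[v] = self.tree[2 * v] + self.tree[2 * v + 1]

    def update(self, i, w):
        """Change weight i and repair the sums on the path to the root, O(log n)."""
        v = self.n + i
        self.tree[v] = w
        v //= 2
        while v >= 1:
            self.tree[v] = self.tree[2 * v] + self.tree[2 * v + 1]
            v //= 2

    def select(self, x=None):
        """Return leaf index i with probability w_i / W, O(log n)."""
        if x is None:
            x = random.random() * self.tree[1]
        v = 1
        while v < self.n:               # v is not a leaf
            left = 2 * v
            if x < self.tree[left]:
                v = left
            else:
                x -= self.tree[left]
                v = left + 1
        return v - self.n

tree = SumTree([1, 2, 3, 4, 5, 6, 7])
print(tree.tree[1], tree.select(0.5), tree.select(9.99), tree.select(27.9))
```

Because `update` only touches one root-to-leaf path, a weight that changes after a move can be refreshed without rebuilding the whole structure.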
![Page 24: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/24.jpg)
Application to Partitioning: Special Solution to the First Problem
Given a partition (A, B)
Cost $F(A,B) = F_C(A,B) + F_I(A,B)$
$F_C(A,B)$ = net-cut between A and B
$F_I(A,B) = C(|A|^2 + |B|^2)$ (min when |A| = |B| = n/2)
For move i, $\Delta_i = F(A',B') - F(A,B)$
$\Delta_i^C = F_C(A',B') - F_C(A,B)$
$\Delta_i^I = F_I(A',B') - F_I(A,B)$
After a move:
A few $\Delta_i^C$ change.
All $\Delta_i^I$ change.
![Page 25: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/25.jpg)
Application to Partitioning: Special Solution to the First Problem (Cont’d)
Solution:
$\Delta_i^I \to \pi(\Delta_i^I, T)$, $\Delta_i^C \to \pi(\Delta_i^C, T)$, $P_i = \pi(\Delta_i^I, T) \cdot \pi(\Delta_i^C, T)$
Two-step biased selection:
(i) choose A or B based on $\pi(\Delta_i^I, T)$
(ii) choose move i within A or B based on $\pi(\Delta_i^C, T)$
Note: the $\Delta_i^I$’s are the same for each i in A or B.
So we keep one copy of $\Delta_i^I$ for A and one copy of $\Delta_i^I$ for B
Choose the moves within A or B using the tree algorithm
![Page 26: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/26.jpg)
More Partitioning Techniques
Spectral based partitioning algorithms [Hagen-Kahng 1991] [Cong-Hagen-Kahng 1992]
Module replication in circuit partitioning [Kring-Newton 1991; Hwang-ElGamal 1992; Liu et al., TCAD’95; Enos et al., TCAD’99]
Generating uni-directional partitioning [Iman-Pedram-Fabian-Cong 1993] or acyclic partitioning [Cong-Li-Bagrodia, DAC94] [Cong-Lim, ASPDAC2000]
Logic restructuring during partitioning [Iman-Pedram-Fabian-Cong 1993]
Communication based partitioning [Hwang-Owens-Irwin 1990; Beardslee-Lin-Sangiovanni 1992]
![Page 27: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/27.jpg)
Multi-Way Partitioning
Recursive bi-partitioning [Kernighan-Lin 1970]
Generalization of Fiduccia-Mattheyses’ and Krishnamurthy’s algorithms [Sanchis 1989] [Cong-Lim, ICCAD’98]
Generalization of ratio-cut and spectral method to multi-way partitioning [Chan-Schlag-Zien 1993]
Generalized ratio-cut value = sum of flux of each partition
Generalized ratio-cut cost of a k-way partition ≥ sum of the k smallest eigenvalues of the Laplacian matrix
![Page 28: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/28.jpg)
Circuit Clustering Formulation
Motivation:
Reduce the size of flat netlists
Identify natural circuit hierarchy
Objectives:
Maximize the connectivity of each cluster
Minimize the size, delay (or simply depth), and density of clustered circuits
![Page 29: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/29.jpg)
Lawler’s Labeling Algorithm [Lawler-Levitt-Turner 1969]
Assumption: cluster size ≤ K; intra-cluster delay = 0; inter-cluster delay = 1
Objective: find a clustering of minimum delay
Algorithm:
Phase 1: Label all nodes in topological order
For each PI node v, L(v) = 0
For each non-PI node v:
p = maximum label of predecessors of v
Xp = set of predecessors of v with label p
if |Xp| < K then L(v) = p else L(v) = p + 1
Phase 2: Form clusters
Start from the POs to generate necessary clusters
Nodes with the same label form a cluster
[Figure: node v, its predecessor set Xp with label p, and earlier predecessors with label p-1]
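The two phases can be sketched in Python as below. One assumption is made explicit here: "predecessors of v" is read as all transitive predecessors (the whole fan-in cone), which is what makes the size test |Xp| < K meaningful. The circuit is given as a hypothetical `preds` map from each node to its direct fan-ins, and the function names are illustrative. The ancestor sets are recomputed per node for clarity, not efficiency.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def ancestors(preds, v):
    """All transitive predecessors of v."""
    seen, stack = set(), list(preds.get(v, ()))
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(preds.get(u, ()))
    return seen

def lawler_labels(preds, K):
    """Phase 1: label nodes in topological order."""
    label = {}
    for v in TopologicalSorter(preds).static_order():
        if not preds.get(v):            # primary input
            label[v] = 0
            continue
        anc = ancestors(preds, v)
        p = max(label[u] for u in anc)
        xp = [u for u in anc if label[u] == p]
        label[v] = p if len(xp) < K else p + 1
    return label

def lawler_clusters(preds, K, outputs):
    """Phase 2: grow clusters backwards from the POs; nodes may be duplicated."""
    label = lawler_labels(preds, K)
    clusters, roots = {}, list(outputs)
    while roots:
        r = roots.pop()
        if r in clusters:
            continue
        members = {r} | {u for u in ancestors(preds, r) if label[u] == label[r]}
        clusters[r] = members
        for m in members:               # cluster inputs become new roots
            roots.extend(u for u in preds.get(m, ()) if u not in members)
    return label, clusters

# Chain a -> b -> c -> d with K = 2.
preds = {"b": ["a"], "c": ["b"], "d": ["c"]}
print(lawler_clusters(preds, 2, ["d"]))
```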
![Page 30: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/30.jpg)
Lawler’s Labeling Algorithm (Cont’d)
Performance of the algorithm
Efficient run-time
Minimum delay clustering solution
Allow node duplication
No attempt to minimize the number of clusters
Extension to allow arbitrary gate delays
Heuristic solution [Murgai-Brayton-Sangiovanni 1991]
Optimal solution [Rajaraman-Wong 1993]
![Page 31: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/31.jpg)
Maximum Fanout Free Cone (MFFC)
Definition: for a node v in a combinational circuit,
cone of v ($C_v$): v and all of its predecessors such that any path connecting a node in $C_v$ and v lies entirely in $C_v$
fanout free cone at v ($FFC_v$): cone of v such that for any node $u \ne v$ in $FFC_v$, $output(u) \subseteq FFC_v$
maximum FFC at v ($MFFC_v$): FFC of v such that for any non-PI node w, if $output(w) \subseteq MFFC_v$, then $w \in MFFC_v$
![Page 32: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/32.jpg)
Properties of MFFCs
If $w \in MFFC_v$, then $MFFC_w \subseteq MFFC_v$ [CoDi93]
Two MFFCs are either disjoint or one contains another [CoDi93]
![Page 33: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/33.jpg)
Maximum Fanout Free Subgraph (MFFS)
Definition: for a node v in a sequential circuit,
$FFS_v = \{u \mid \text{every path from } u \text{ to some PO passes through } v\}$
$MFFS_v = \{u \mid u \in FFS_v, \text{ for all } FFS_v\}$
Illustration
[Figure: MFFCs compared with an MFFS]
![Page 34: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/34.jpg)
MFFS Construction Algorithm for a Single MFFS at Node v
select root node v and cut all its fanout edges
mark all nodes reachable backwards from all POs
MFFSv = {unmarked nodes}
complexity: O(|N| + |E|)
[Figure: circuit graph with root node v highlighted]
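The marking procedure above amounts to one backward graph search. The Python sketch below assumes the circuit is given as a hypothetical `preds` map (node to direct fan-ins) plus a list of POs, models "cutting the fanout edges of v" by never stepping into v, and assumes every node reaches some PO.

```python
def mffs(preds, outputs, v):
    """MFFS at v: nodes not reachable backwards from the POs once v's fanouts are cut."""
    marked = set()
    stack = [po for po in outputs if po != v]
    while stack:
        u = stack.pop()
        if u == v or u in marked:       # edges out of v are cut, so never enter v
            continue
        marked.add(u)
        stack.extend(preds.get(u, ()))
    nodes = set(preds) | {p for ps in preds.values() for p in ps}
    return nodes - marked

# c = f(a, b), d = f(c), e = f(c, b); POs are d and e.
preds = {"c": ["a", "b"], "d": ["c"], "e": ["c", "b"]}
print(sorted(mffs(preds, ["d", "e"], "c")))
```

Here b is excluded from the MFFS at c because it also drives e directly, so it still reaches a PO without passing through c.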
![Page 38: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/38.jpg)
MFFS Clustering Algorithm: Clusters the Entire Netlist
construct MFFS at a PO and remove it from the netlist
include its inputs as new POs
repeat until all nodes are clustered
complexity: O(|N| · (|N| + |E|))
[Figure: netlist with the MFFS at PO v highlighted]
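A minimal Python sketch of this loop follows, under the same assumptions as before: a hypothetical `preds` map from each node to its direct fan-ins, a list of POs, and every node reaching some PO. POs are processed in first-in, first-out order, which the slides do not specify.

```python
def mffs_in(preds, nodes, outputs, v):
    """MFFS at v restricted to the remaining netlist `nodes`."""
    marked = set()
    stack = [po for po in outputs if po != v]
    while stack:
        u = stack.pop()
        if u == v or u in marked or u not in nodes:
            continue
        marked.add(u)
        stack.extend(preds.get(u, ()))
    return nodes - marked

def mffs_clustering(preds, outputs):
    """Return a list of (root, cluster) pairs covering the netlist."""
    nodes = set(preds) | {p for ps in preds.values() for p in ps}
    pos, clusters = list(outputs), []
    while nodes and pos:
        v = pos.pop(0)
        if v not in nodes:
            continue
        cluster = mffs_in(preds, nodes, pos, v)
        clusters.append((v, cluster))
        nodes -= cluster                # remove the MFFS from the netlist
        for m in cluster:               # its inputs become new POs
            for u in preds.get(m, ()):
                if u in nodes and u not in pos:
                    pos.append(u)
    return clusters

# c = f(a, b), d = f(c), e = f(c, b); POs are d and e.
preds = {"c": ["a", "b"], "d": ["c"], "e": ["c", "b"]}
print(mffs_clustering(preds, ["d", "e"]))
```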
![Page 42: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu](https://reader037.vdocument.in/reader037/viewer/2022103022/56649d445503460f94a20aa2/html5/thumbnails/42.jpg)
Summary
Partitioning is key for applying divide-and-conquer methodology (for complexity management)
Partitioning also defines global/local interconnects and greatly impacts circuit performance
Growing importance of interconnect design has introduced many new partitioning formulations
Clustering is effective in reducing circuit size and identifying natural circuit hierarchy
Multi-level circuit clustering + iterative improvement based methods produce the best partitioning results