partitioning and clustering professor lei he lhe@ee.ucla.edu
Post on 20-Dec-2015
Partitioning and Clustering
Professor Lei He
lhe@ee.ucla.edu
http://eda.ee.ucla.edu/
Outline
Circuit Partitioning formulation
Importance of Circuit Partitioning
Partitioning Algorithms
Circuit Clustering Formulation
Clustering Algorithms
Partitioning Formulation
Bi-partitioning formulation:
Minimize interconnections between partitions
Minimum cut: min c(X, X’)
Minimum bisection: min c(X, X’) with |X| = |X’|
Minimum ratio-cut: min c(X, X’) / (|X| |X’|)
[Figure: node sets X and X’ separated by a cut of size c(X, X’)]
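The three objectives above differ only in how the cut weight is normalized or constrained. A minimal sketch, assuming the circuit is given as a list of weighted two-pin edges; the helper names `cut_size` and `ratio_cut` and the 4-node graph are hypothetical and not taken from the slides:

```python
def cut_size(edges, part_x):
    """c(X, X'): total weight of edges with exactly one endpoint in X."""
    return sum(w for u, v, w in edges if (u in part_x) != (v in part_x))


def ratio_cut(edges, nodes, part_x):
    """c(X, X') / (|X| * |X'|) for the bipartition (X, X')."""
    size_x = len(part_x)
    size_rest = len(nodes) - size_x
    return cut_size(edges, part_x) / (size_x * size_rest)


# Hypothetical example graph (an assumption for illustration only).
nodes = {"a", "b", "c", "d"}
edges = [("a", "b", 5), ("b", "c", 1), ("c", "d", 5), ("a", "d", 1)]
x = {"a", "b"}

print(cut_size(edges, x))          # weight crossing between {a, b} and {c, d}
print(ratio_cut(edges, nodes, x))  # same cut, normalized by |X| * |X'|
```

A bisection additionally requires `len(x) == len(nodes) - len(x)`, which holds for this choice of `x`.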
A Bi-Partitioning Example
Min-cut size = 13
Min-bisection size = 300
Min-ratio-cut size = 19
[Figure: example graph on nodes a, b, c, d, e, f with edge weights 4, 9, 10, and 100; the min-cut, min-ratio-cut, and min-bisection cut lines are marked]
Ratio-cut helps to identify natural clusters
Circuit Partitioning Formulation (Cont’d)
General multi-way partitioning formulation:
Partitioning a network N into N1, N2, …, Nk such that
Each partition has an area constraint: $\sum_{v \in N_i} a(v) \le A_i$
Each partition has an I/O constraint: $c(N_i, N - N_i) \le I_i$
Minimize the total interconnection: $\sum_{i} c(N_i, N - N_i)$
Importance of Circuit Partitioning
Divide-and-conquer methodology
The most effective way to solve problems of high complexity
E.g.: min-cut based placement, partitioning-based test generation,…
System-level partitioning for multi-chip designs or 3D
inter-chip interconnection delay dominates system performance
inter-layer wire pitch is much larger
Circuit emulation/parallel simulation
partition large circuit into multiple FPGAs (e.g. Quickturn), or multiple special-purpose processors (e.g. Zycad).
Parallel CAD development
Task decomposition and load
Partitioning Algorithms
Iterative partitioning algorithms
Multi-way partitioning
Multi-level partitioning (to be discussed after clustering)
Iterative Partitioning Algorithms
Greedy Iterative improvement method
[Kernighan-Lin 1970]
[Fiduccia-Mattheyses 1982]
[Krishnamurthy 1984]
Simulated Annealing
[Kirkpatrick-Gelatt-Vecchi 1983]
[Greene-Supowit 1984]
(SA will be formally introduced in the Floorplan chapter)
Kernighan-Lin’s Algorithm
Pair-wise exchange of nodes to reduce cut size
Allow cut size to increase temporarily within a pass
Compute the gain of a swap
Repeat
Perform a feasible swap of max gain
Mark swapped nodes “locked”;
Update swap gains;
Until no feasible swap;
Find max prefix partial sum in gain sequence g1, g2, …, gm
Make corresponding swaps permanent.
Start another pass if current pass reduces the cut size (usually converge after a few passes)
[Figure: nodes u and v exchange sides of the partition and are then marked locked]
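The pass described above can be sketched as follows. This is an illustrative version, assuming a graph with two-pin weighted edges stored as a dict of dicts; the names `build_adj` and `kl_pass` and the two-triangle example are hypothetical. It picks the best swap by scanning all unlocked pairs, so it is the simple cubic-time variant rather than a tuned implementation.

```python
def build_adj(edges):
    """Symmetric adjacency map {node: {neighbor: weight}} from (u, v, w) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def kl_pass(adj, part_a, part_b):
    """One Kernighan-Lin pass; returns (new_a, new_b, cut reduction)."""
    a, b = set(part_a), set(part_b)
    side = {v: 0 for v in a}
    side.update({v: 1 for v in b})
    # D(v) = external cost - internal cost of v
    d = {v: sum(w if side[u] != side[v] else -w for u, w in adj[v].items())
         for v in adj}
    free_a, free_b = set(a), set(b)
    swaps, gains = [], []
    while free_a and free_b:
        # Feasible swap of maximum gain among unlocked nodes.
        gain, u, v = max(
            ((d[u] + d[v] - 2 * adj[u].get(v, 0), u, v)
             for u in sorted(free_a) for v in sorted(free_b)),
            key=lambda t: t[0])
        free_a.remove(u)          # lock both swapped nodes
        free_b.remove(v)
        swaps.append((u, v))
        gains.append(gain)
        # Update gains of unlocked nodes as if u and v had been exchanged.
        for x in free_a:
            d[x] += 2 * adj[x].get(u, 0) - 2 * adj[x].get(v, 0)
        for y in free_b:
            d[y] += 2 * adj[y].get(v, 0) - 2 * adj[y].get(u, 0)
    # Maximum prefix partial sum of the gain sequence g1, g2, ..., gm.
    best_k, best_sum, running = 0, 0, 0
    for k, g in enumerate(gains, 1):
        running += g
        if running > best_sum:
            best_k, best_sum = k, running
    for u, v in swaps[:best_k]:   # make the corresponding swaps permanent
        a.remove(u); a.add(v)
        b.remove(v); b.add(u)
    return a, b, best_sum


# Hypothetical example: two triangles joined by edge 3-4, started from a bad split.
adj = build_adj([(1, 2, 1), (1, 3, 1), (2, 3, 1),
                 (4, 5, 1), (4, 6, 1), (5, 6, 1), (3, 4, 1)])
new_a, new_b, reduction = kl_pass(adj, {1, 2, 4}, {3, 5, 6})
print(new_a, new_b, reduction)
```

A full run would call `kl_pass` repeatedly while the returned reduction is positive.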
Fiduccia-Mattheyses’ Improvement
Each pass in KL-algorithm takes $O(n^3)$ or $O(n^2 \log n)$ time (n: #modules)
Choosing swap with max gain and updating swap gains take $O(n^2)$ time
FM-algorithm takes $O(p)$ time per pass (p: #pins)
Key ideas in FM-algorithm
Each move affects only a few moves, so gain updating takes constant time per move (amortized)
Maintain a list of gain buckets, so the move with max gain is selected in constant time
Further improvement by Krishnamurthy
Look-ahead in gain computation
[Figure: gain-bucket array indexed from -gmax to gmax; cells u1 and u2 lie in partitions V1 and V2]
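The gain-bucket idea can be illustrated with a small sketch. It assumes a hypergraph given as a list of nets (tuples of cells) and a `side` map from cell to 0 or 1; `fm_gains` and `GainBuckets` are hypothetical names, and the sketch covers only initial gain computation and max-gain selection, not the incremental per-net updates or balance checks of a full pass.

```python
from collections import defaultdict


def fm_gains(nets, side):
    """Gain of moving each cell: nets it would uncut minus nets it would cut."""
    gain = defaultdict(int)
    for net in nets:
        count = [0, 0]
        for c in net:
            count[side[c]] += 1
        for c in net:
            if count[side[c]] == 1:
                gain[c] += 1   # c is alone on its side: moving it uncuts the net
            if count[1 - side[c]] == 0:
                gain[c] -= 1   # net is currently uncut: moving c cuts it
    return {c: gain[c] for c in side}


class GainBuckets:
    """Array of buckets indexed by gain in [-pmax, pmax] with a max pointer."""

    def __init__(self, gain, pmax):
        self.pmax = pmax
        self.gain = dict(gain)
        self.buckets = [set() for _ in range(2 * pmax + 1)]
        for c, g in gain.items():
            self.buckets[g + pmax].add(c)
        self.top = 2 * pmax

    def update(self, cell, delta):
        """Move an unlocked cell to the bucket for its new gain."""
        g = self.gain[cell]
        self.buckets[g + self.pmax].remove(cell)
        self.gain[cell] = g + delta
        self.buckets[g + delta + self.pmax].add(cell)
        self.top = max(self.top, g + delta + self.pmax)

    def pop_max(self):
        """Remove and return (cell, gain) for a cell of maximum gain, or None."""
        while self.top >= 0 and not self.buckets[self.top]:
            self.top -= 1
        if self.top < 0:
            return None
        cell = self.buckets[self.top].pop()
        del self.gain[cell]
        return cell, self.top - self.pmax


# Hypothetical 4-cell example (an assumption for illustration only).
nets = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
side = {"a": 0, "b": 0, "c": 1, "d": 1}
gains = fm_gains(nets, side)
pmax = max(sum(c in net for net in nets) for c in side)  # max nets on one cell
buckets = GainBuckets(gains, pmax)
best = buckets.pop_max()
print(gains, best)
```

Because gains are bounded by the maximum number of nets on a cell, the bucket array has fixed size and the max pointer only needs occasional downward scans.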
Simulated Annealing
Local Search
[Figure: cost function plotted over the solution space, with candidate solutions marked]
Statistical Mechanics vs. Combinatorial Optimization
State $\{r_i\}$ (configuration: a set of atomic positions)
Weight $e^{-E(\{r_i\})/k_B T}$ (Boltzmann distribution)
$E(\{r_i\})$: energy of configuration
$k_B$: Boltzmann constant; T: temperature.
Low Temperature Limit??
Analogy
Physical System | Optimization Problem
State (configuration) | Solution
Energy | Cost function
Ground State | Optimal solution
Rapid Quenching | Iterative Improvement
Careful Annealing | Simulated Annealing
Generic Simulated Annealing Algorithm
1. Get an initial solution S
2. Get an initial temperature T>0
3. While not yet “frozen” do the following:
3.1 For $1 \le i \le L$, do the following:
3.1.1 Pick a random neighbor S’ of S.
3.1.2 Let $\Delta = \mathrm{cost}(S') - \mathrm{cost}(S)$.
3.1.3 If $\Delta \le 0$ (downhill move), set S = S’.
3.1.4 If $\Delta > 0$ (uphill move), set S = S’ with probability $e^{-\Delta/T}$.
3.2 Set T = rT (reduce temperature)
4. Return S
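The generic loop above translates almost line by line into code. A minimal sketch, assuming the caller supplies `cost` and `neighbor` callbacks and that "frozen" is approximated by a temperature floor `t_min`; these parameter names and the toy one-dimensional problem are assumptions for illustration, not part of the slides.

```python
import math
import random


def simulated_annealing(initial, cost, neighbor, t0, r, moves_per_temp, t_min, rng):
    """Generic SA loop; 'frozen' is assumed to mean T has dropped to t_min."""
    s, t = initial, t0
    while t > t_min:                       # step 3: not yet frozen
        for _ in range(moves_per_temp):    # step 3.1: L trials per temperature
            s_new = neighbor(s, rng)       # step 3.1.1: random neighbor
            delta = cost(s_new) - cost(s)  # step 3.1.2
            # Steps 3.1.3 and 3.1.4: always accept downhill moves,
            # accept uphill moves with probability exp(-delta / T).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                s = s_new
        t *= r                             # step 3.2: reduce temperature
    return s


# Toy problem (hypothetical): minimize (x - 3)^2 over the integers.
rng = random.Random(0)
result = simulated_annealing(
    initial=20,
    cost=lambda x: (x - 3) ** 2,
    neighbor=lambda x, rng: x + rng.choice((-1, 1)),
    t0=10.0, r=0.9, moves_per_temp=50, t_min=1e-3, rng=rng)
print(result)
```

Practical implementations often also remember the best solution seen so far; the sketch returns the final S to stay close to the listed steps.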
Basic Ingredients for S.A.
Solution space
Neighborhood Structure
Cost Function
Annealing Schedule
SA Partitioning
“Optimization by Simulated Annealing”, Kirkpatrick, Gelatt, Vecchi.
Solution space=set of all partitions
Neighborhood Structure
[Figure: three example solutions (two-way partitions of cells a to f), each reached from a neighboring solution by one move]
A move: randomly move one cell to the other side
SA Partitioning
Cost function: $f = C + \lambda B$
C is the partitioning cost as used before
B is a measure of how balanced the partitioning is
$\lambda$ is a constant.
Example of B: for a partition into S1 = {a, b, ...} and S2 = {c, d, ...},
$B = (|S_1| - |S_2|)^2$
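The cost function for SA partitioning combines the cut cost with a balance penalty scaled by a constant. A small sketch, assuming two-pin weighted edges and writing the constant as `lam`; the function names and the 4-cell chain are hypothetical.

```python
def sa_partition_cost(edges, s1, s2, lam):
    """f = C + lam * B with C = cut weight and B = (|S1| - |S2|)^2."""
    cut = sum(w for u, v, w in edges if (u in s1) != (v in s1))
    balance = (len(s1) - len(s2)) ** 2
    return cut + lam * balance


def move_cell(s1, s2, cell):
    """Neighbor solution: move one cell to the other side (returns new sets)."""
    if cell in s1:
        return s1 - {cell}, s2 | {cell}
    return s1 | {cell}, s2 - {cell}


# Hypothetical chain a - b - c - d with unit weights.
edges = [("a", "b", 1), ("b", "c", 1), ("c", "d", 1)]
unbalanced = sa_partition_cost(edges, {"a", "b", "c"}, {"d"}, lam=0.5)
s1, s2 = move_cell({"a", "b", "c"}, {"d"}, "c")
balanced = sa_partition_cost(edges, s1, s2, lam=0.5)
print(unbalanced, balanced)
```

Both partitions cut one edge, so the difference in cost comes entirely from the balance term, which is how the penalty steers the search toward even splits without forbidding uneven intermediate states.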
SA Partitioning
Annealing schedule:
$T_n = (T_1/T_0)^n T_0$, with ratio $T_1/T_0 = 0.9$
At each temperature, either
1. There are 10 accepted moves on the average;
or
2. # of attempts ≥ 100 × total # of cells
The system is “frozen” if acceptance is very low at 3 consecutive temperatures.
Graph Partition Using Simulated Annealing Without Rejections
Greene and Supowit, ICCD-88 pp. 658-663
Motivation:
At low temperature, most moves are rejected!
e.g. 1/100 acceptance rate for 1,000 vertices
Key Idea
(I) Biased selection
If a move i has probability $\pi_i$ to be accepted, generate move i with probability $\pi_i / \sum_{j=1}^{N} \pi_j$
N: size of neighborhood
In general, $\pi_i = \min\{1, e^{-\Delta_i/T}\}$
In conventional model, each move has probability 1/N to be generated.
(II) If a move is generated, it is always accepted
Graph Partition Using Simulated Annealing Without Rejections (Cont’d)
Main Difficulty
(1) $\pi_i$ is dynamic (since $\Delta_i$ is dynamic)
It is too expensive to update the $\pi_i$’s ($\Delta_i$’s) after every move
(2) Weighted selection problem
how to select move i with probability $\pi_i / \sum_{j=1}^{N} \pi_j$ ??
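Solution to the Weight Selection Problem (general solution to the second problem)
[Figure: binary tree with leaves $\pi_1, \ldots, \pi_7$ and a padding leaf 0; each internal node stores the sum of its children, e.g. $\pi_1+\pi_2$, $\pi_3+\pi_4$, $\pi_5+\pi_6$, $\pi_1+\cdots+\pi_4$, $\pi_5+\pi_6+\pi_7$, with $\pi_1+\cdots+\pi_7$ at the root]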
Solution to the Weight Selection Problem (Cont’d)
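Let $W = \pi_1 + \pi_2 + \cdots + \pi_n$; how to select i with probability $\pi_i / W$?
Equivalent to choosing x at random in (0, W) and finding i such that $\pi_1 + \cdots + \pi_{i-1} < x \le \pi_1 + \cdots + \pi_i$
v ← root
x ← random(0, 1) * π(v)
while v is not a leaf do
    if x < π(left(v)) then v ← left(v)
    else x ← x − π(left(v)), v ← right(v)
end
Probability of ending up at leaf i:
$\mathrm{Prob}\left(\sum_{j=1}^{i-1} \pi_j < x \le \sum_{j=1}^{i} \pi_j\right) = \pi_i / W = \pi_i / \sum_{j=1}^{N} \pi_j$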
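The tree descent above can be sketched with an array-backed complete binary tree. The class name `WeightTree`, the padding of the leaf level to a power of two with zero weights, and the 0-based leaf indices are assumptions for illustration. Each internal node stores the sum of its two children, so both selection and a single weight update take O(log n) time.

```python
import random


class WeightTree:
    """Binary tree of partial sums over leaf weights (leaves are 0-indexed)."""

    def __init__(self, weights):
        size = 1
        while size < len(weights):
            size *= 2                      # pad leaf level to a power of two
        self.size = size
        self.tree = [0.0] * (2 * size)     # tree[1] is the root
        for i, w in enumerate(weights):
            self.tree[size + i] = w
        for v in range(size - 1, 0, -1):
            self.tree[v] = self.tree[2 * v] + self.tree[2 * v + 1]

    def total(self):
        return self.tree[1]

    def update(self, i, w):
        """Change the weight of leaf i and repair the sums on its root path."""
        v = self.size + i
        self.tree[v] = w
        v //= 2
        while v >= 1:
            self.tree[v] = self.tree[2 * v] + self.tree[2 * v + 1]
            v //= 2

    def select(self, x):
        """Descend from the root; x is assumed to satisfy 0 <= x < total()."""
        v = 1
        while v < self.size:               # v is not a leaf
            left = 2 * v
            if x < self.tree[left]:
                v = left
            else:
                x -= self.tree[left]
                v = left + 1
        return v - self.size

    def sample(self, rng):
        """Pick leaf i with probability weight_i / total()."""
        return self.select(rng.random() * self.total())


# Seven weights, mirroring the seven-leaf tree in the figure (values are made up).
tree = WeightTree([1, 2, 3, 4, 5, 6, 7])
picks = [tree.select(x) for x in (0, 0.99, 1, 3, 27.5)]
tree.update(6, 0)                          # the last move is no longer acceptable
print(picks, tree.total(), tree.sample(random.Random(1)))
```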
Application to Partitioning
Special solution to the first problem
Given a partition (A, B)
Cost $F(A,B) = F_C(A,B) + F_I(A,B)$
$F_C(A,B)$ = net-cut between A and B
$F_I(A,B) = C(|A|^2 + |B|^2)$ (min when |A| = |B| = n/2)
For move i, $\Delta_i = F(A',B') - F(A,B) = \Delta_i^C + \Delta_i^I$, where
$\Delta_i^C = F_C(A',B') - F_C(A,B)$
$\Delta_i^I = F_I(A',B') - F_I(A,B)$
After a move:
All $\Delta_i^I$ change.
A few $\Delta_i^C$ change.
Solution:
Two-step biased selection:
(i) choose A or B based on $e^{-\Delta_i^I/T}$
(ii) choose move i within A or B based on $e^{-\Delta_i^C/T}$
Note, the $\Delta_i^I$’s are the same for each move i in A or B.
So we keep one copy of $\Delta_i^I$ for A
one copy of $\Delta_i^I$ for B
choose the moves within A or B using the tree algorithm
Since $\Delta_i = \Delta_i^C + \Delta_i^I$, the weight factors as $e^{-\Delta_i/T} = e^{-\Delta_i^I/T} \cdot e^{-\Delta_i^C/T}$
More Partitioning Techniques
Spectral based partitioning algorithms [Hagen-Kahng 1991] [Cong-Hagen-Kahng 1992]
Module replication in circuit partitioning [Kring-Newton 1991; Hwang-ElGamal 1992; Liu et al., TCAD’95; Enos et al., TCAD’99]
Generating uni-directional partitioning [Iman-Pedram-Fabian-Cong 1993] or acyclic partitioning [Cong-Li-Bagrodia, DAC94] [Cong-Lim, ASPDAC2000]
Logic restructuring during partitioning [Iman-Pedram-Fabian-Cong 1993]
Communication based partitioning [Hwang-Owens-Irwin 1990; Beardslee-Lin-Sangiovanni 1992]
Multi-Way Partitioning
Recursive bi-partitioning [Kernighan-Lin 1970]
Generalization of Fiduccia-Mattheyses’ and Krishnamurthy’s algorithms [Sanchis 1989] [Cong-Lim, ICCAD’98]
Generalization of ratio-cut and spectral method to multi-way partitioning [Chan-Schlag-Zien 1993]
generalized ratio-cut value = sum of flux of each partition
generalized ratio-cut cost of a k-way partition ≥ sum of the k smallest eigenvalues of the Laplacian matrix
Circuit Clustering Formulation
Motivation:
Reduce the size of flat netlists
Identify natural circuit hierarchy
Objectives:
Maximize the connectivity of each cluster
Minimize the size, delay (or simply depth), and density of clustered circuits
Lawler’s Labeling Algorithm [Lawler-Levitt-Turner 1969]
Assumption: Cluster size ≤ K; Intra-cluster delay = 0; Inter-cluster delay = 1
Objective: Find a clustering of minimum delay
Algorithm:
Phase 1: Label all nodes in topological order
For each PI node v, L(v) = 0;
For each non-PI node v
p = maximum label of predecessors of v
Xp = set of predecessors of v with label p
if |Xp| < K then L(v) = p else L(v) = p+1
Phase 2: Form clusters
Start from PO to generate necessary clusters
Nodes with the same label form a cluster
[Figure: node v with predecessors labeled p-1 and p; Xp is the set of predecessors with label p]
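The two phases can be sketched as follows. The sketch assumes the circuit is a DAG given as a map from each node to its direct fanins, that "predecessors" means all transitive predecessors, and that the cluster rooted at v is v plus its predecessors carrying the same label; the function names and the small example are hypothetical. It stores a full predecessor set per node, which is simple but not memory-efficient.

```python
from graphlib import TopologicalSorter


def lawler_labels(fanins, k):
    """Phase 1: label nodes in topological order; returns (labels, predecessor sets)."""
    preds, label = {}, {}
    for v in TopologicalSorter(fanins).static_order():
        preds[v] = set()
        for u in fanins.get(v, ()):
            preds[v] |= preds[u] | {u}       # transitive predecessors of v
        if not preds[v]:
            label[v] = 0                     # primary input
            continue
        p = max(label[u] for u in preds[v])
        x_p = {u for u in preds[v] if label[u] == p}
        label[v] = p if len(x_p) < k else p + 1
    return label, preds


def lawler_clusters(fanins, k, outputs):
    """Phase 2: start from the POs and generate only the clusters that are needed."""
    label, preds = lawler_labels(fanins, k)
    clusters, roots = {}, list(outputs)
    while roots:
        v = roots.pop()
        if v in clusters:
            continue
        cluster = {v} | {u for u in preds[v] if label[u] == label[v]}
        clusters[v] = cluster
        for w in cluster:                    # drivers outside the cluster become roots
            roots.extend(u for u in fanins.get(w, ()) if u not in cluster)
    return label, clusters


# Hypothetical circuit: g3 = f(g1, g2), g1 = f(a, b), g2 = f(c, d), with K = 3.
fanins = {"g1": ["a", "b"], "g2": ["c", "d"], "g3": ["g1", "g2"]}
label, clusters = lawler_clusters(fanins, k=3, outputs=["g3"])
print(label)
print(clusters)
```

A node can appear in more than one cluster when several roots share predecessors, which corresponds to node duplication.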
Lawler’s Labeling Algorithm(Cont’d)
Performance of the algorithm
Efficient run-time
Minimum delay clustering solution
Allow node duplication
No attempt to minimize the number of clusters
Extension to allow arbitrary gate delays
Heuristic solution [Murgai-Brayton-Sangiovanni 1991]
Optimal solution [Rajaraman-Wong 1993]
Maximum Fanout Free Cone (MFFC)
Definition: for a node v in a combinational circuit,
cone of v ($C_v$): v and all of its predecessors such that any path connecting a node in $C_v$ and v lies entirely in $C_v$
fanout free cone at v ($FFC_v$): cone of v such that for any node $u \ne v$ in $FFC_v$, $output(u) \subseteq FFC_v$
maximum FFC at v ($MFFC_v$): FFC of v such that for any non-PI node w, if $output(w) \subseteq MFFC_v$, then $w \in MFFC_v$
Properties of MFFCs
If $w \in MFFC_v$, then $MFFC_w \subseteq MFFC_v$ [CoDi93]
Two MFFCs are either disjoint or one contains another [CoDi93]
Maximum Fanout Free Subgraph (MFFS)
Definition: for a node v in a sequential circuit,
$FFS_v = \{u \mid \text{every path from } u \text{ to some PO passes through } v\}$
$MFFS_v = \{u \mid u \in FFS_v, \text{ for all } FFS_v\}$
[Figure: illustration relating MFFCs to an MFFS]
MFFS Construction Algorithm
For Single MFFS at Node v
select root node v and cut all its fanout edges
mark all nodes reachable backwards from all POs
MFFSv = {unmarked nodes}
complexity: O(|N| + |E|)
[Figure: example netlist with root node v]
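The construction steps above amount to one backward traversal. A minimal sketch, assuming the netlist is a map from every node to its list of fanouts plus a set of POs; the name `mffs` and the example netlist are hypothetical. One further assumption: if v is itself a PO, that output is treated as cut along with its fanout edges, so v is never marked.

```python
from collections import deque


def mffs(fanouts, pos, v):
    """Nodes left unmarked after cutting v's fanouts and marking back from the POs."""
    # Build fanin lists, skipping every edge that leaves v (the cut).
    fanins = {u: [] for u in fanouts}
    for u, outs in fanouts.items():
        if u == v:
            continue
        for w in outs:
            fanins[w].append(u)
    # Mark all nodes reachable backwards from the POs other than v.
    marked = {p for p in pos if p != v}
    queue = deque(marked)
    while queue:
        w = queue.popleft()
        for u in fanins[w]:
            if u not in marked:
                marked.add(u)
                queue.append(u)
    return set(fanouts) - marked


# Hypothetical netlist: c = f(a, b), d = f(b), e = f(c); POs are d and e.
fanouts = {"a": ["c"], "b": ["c", "d"], "c": ["e"], "d": [], "e": []}
pos = {"d", "e"}
at_e = mffs(fanouts, pos, "e")
at_d = mffs(fanouts, pos, "d")
print(at_e, at_d)
```

Node b is excluded from the set at e because it still reaches PO d without passing through e. Each node and edge is visited at most once, in line with the O(|N| + |E|) bound.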
MFFS Clustering Algorithm
Clusters Entire Netlist
construct MFFS at a PO and remove it from netlist
include its inputs as new POs
repeat until all nodes are clustered
complexity: O(|N| · (|N| + |E|))
[Figure: example netlist with the current root node v]
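The clustering loop can be sketched by repeating a single-MFFS construction on a shrinking netlist. The netlist representation (map from node to fanout list, plus a PO set), the helper `unmarked_after_cut`, and the rule of picking the smallest PO name first are all assumptions for illustration; any PO choice fits the steps listed above.

```python
def unmarked_after_cut(fanouts, pos, v):
    """Single MFFS at v: cut v's fanouts, mark backwards from the other POs."""
    fanins = {u: [] for u in fanouts}
    for u, outs in fanouts.items():
        if u != v:
            for w in outs:
                fanins[w].append(u)
    marked = {p for p in pos if p != v}
    stack = list(marked)
    while stack:
        w = stack.pop()
        for u in fanins[w]:
            if u not in marked:
                marked.add(u)
                stack.append(u)
    return set(fanouts) - marked


def mffs_clustering(fanouts, pos):
    """Repeatedly peel off the MFFS at a PO until every node is clustered."""
    fanouts = {u: list(ws) for u, ws in fanouts.items()}
    pos = set(pos)
    clusters = []
    while fanouts:
        if not pos:                        # defensive: no PO left, cluster the rest
            clusters.append(set(fanouts))
            break
        v = sorted(pos)[0]                 # assumed rule: smallest PO name first
        cluster = unmarked_after_cut(fanouts, pos, v)
        clusters.append(cluster)
        # Remaining nodes that drive the removed cluster become new POs.
        new_pos = {u for u, ws in fanouts.items()
                   if u not in cluster and any(w in cluster for w in ws)}
        fanouts = {u: [w for w in ws if w not in cluster]
                   for u, ws in fanouts.items() if u not in cluster}
        pos = (pos - cluster) | new_pos
    return clusters


# Hypothetical netlist: c = f(a, b), d = f(b), e = f(c); POs are d and e.
netlist = {"a": ["c"], "b": ["c", "d"], "c": ["e"], "d": [], "e": []}
result = mffs_clustering(netlist, {"d", "e"})
print(result)
```

Every iteration removes at least the chosen PO, so the loop runs at most |N| times, each costing one linear-time traversal.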
Summary
Partitioning is key for applying the divide-and-conquer methodology (for complexity management)
Partitioning also defines global/local interconnects and greatly impacts circuit performance
Growing importance of interconnect design has introduced many new partitioning formulations
Clustering is effective in reducing circuit size and identifying natural circuit hierarchy
Multi-level circuit clustering + iterative improvement based methods produce the best partitioning results