Minimal Skew Clock Embedding Considering Time-Variant Temperature Gradient
Hao Yu, Yu Hu, Chun-Chen Liu and Lei He
EE Department, UCLA
Presented by Yu Hu
Partially supported by NSF and UC MICRO funds.
Outline
Backgrounds and Motivations
Modeling and Problem Formulation
Algorithms
Experimental Results
Conclusions
Clock Tree Synthesis in Synchronous Circuits
Clock signals synchronize data transfer between functional elements in a synchronous design
Different clock structures exist (tree, mesh, hybrid, etc.)
Clock skew is the delay difference between two sinks of a clock tree
Clock skew has become one of the most significant concerns in clock tree synthesis for high-performance designs
[Figure: chip with clock domains PLL, MEM-ctrl, Sys, Disp, AUDIO, VIDEO. Source: Intel]
Methodologies for Clock Skew Minimization
The sources of skew
  Unbalanced clock distribution
  Process, supply voltage and temperature (PVT) variation
  Uncertainty from loading
Methodologies
  Active de-skew circuit using a micro-controller [Rusu'00]
  Passive balanced embedding by CAD algorithms [Tsay'91] [Edahiro'91] [Chao'92] [Boese'92] [Cong'98]
[Figure: topology generation (Topo-Gen) followed by embedding, for a clock tree with nodes s0 to s4, a, b and v]
Variation-induced skew needs to be considered!
Existing work and Our Contributions
This work focuses on reducing the skew induced by temperature variation
The existing work on temperature-aware clock skew minimization [Cho:ICCAD'05]
  Considered only spatial temperature variations
  Ignored the time-variant temperature variation
  Assumed the worst-case temperature map was given
The major contributions of this work
  1. Build a parameterized macromodel for temperature variations
  2. Present an effective algorithm, PECO, which considers the time-variant temperature variation with correlation
  3. PECO reduces the worst-case skew by up to 5x compared with the ZST/DME algorithm
Outline
Backgrounds and Motivations
On-chip Temperature Variation Modeling
  Variation Sources: Spatial & Temporal
  Temperature Correlations
Algorithms
Experimental Results
Conclusions
Spatial Temperature Variation Induced Skew
Spatial variation: non-uniform power density generates an on-chip temperature gradient
Clock tree embedding considering the spatial temperature variation: TACO [Cho:ICCAD'05]
  Ignores the time-variant temperature under different workloads
Temporal Temperature Variation Induced Skew
Significantly different temperature maps from two SPEC2000 applications: Ammp and Gzip
[Figure: source S driving sinks A and B under the two temperature maps. First map: D_SA = 7 ns, D_SB = 7 ns, skew = 0 ns. Second map: D_SA = 2 ns, D_SB = 6 ns, skew = 4 ns.]
Dilemma: optimizing skew for one application hurts the other.
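The two-sink example on this slide can be reproduced with a few lines of Python. This is a minimal sketch: the function names and the map labels "map_1" and "map_2" are placeholders introduced here, and only the four delay values come from the slide.

```python
def skew(sink_delays):
    """Skew under one temperature map: largest minus smallest sink delay."""
    delays = list(sink_delays.values())
    return max(delays) - min(delays)


def worst_case_skew(delays_per_map):
    """Largest skew observed over all temperature maps."""
    return max(skew(d) for d in delays_per_map.values())


# Source-to-sink delays in ns, taken from the slide's example.
# "map_1" and "map_2" stand for the two temperature maps (labels are hypothetical).
delays_per_map = {
    "map_1": {"A": 7.0, "B": 7.0},
    "map_2": {"A": 2.0, "B": 6.0},
}

for name, delays in delays_per_map.items():
    print(name, "skew =", skew(delays), "ns")
print("worst-case skew =", worst_case_skew(delays_per_map), "ns")
```

An embedding tuned for only one map can look perfect there and still carry the full 4 ns skew under the other, which is why the worst case over all maps is the quantity to track.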
Problem Formulation
Given:
  The source, sinks and an initial embedding of the clock tree
  Each region is modeled by the mean and variance of its temperature, and by the correlation between variations
To find: a re-embedding of the clock tree
To minimize: the worst-case skew under all temperature variations
Correlations in Temperature Variation
Spatial and temporal correlation:
  Strong correlations exist between temperatures for different workloads and different regions on chip
  Resource sharing between workloads causes temporal correlation
Considering temperature correlations during optimization can compress the search space!
[Figure: correlation map; entry (i, j) is the correlation between areas i and j]
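As a rough illustration of the per-region statistics used in the problem formulation, the sketch below estimates the mean, variance and region-to-region correlation from a set of temperature profiles. The data is synthetic and the function name `region_statistics` is hypothetical; the talk obtains its profiles from a simulator, not from this generator.

```python
import numpy as np


def region_statistics(profiles):
    """profiles: array of shape (num_profiles, num_regions), temperatures in deg C.

    Returns the per-region mean, per-region sample variance, and the
    region-by-region correlation matrix.
    """
    profiles = np.asarray(profiles, dtype=float)
    mean = profiles.mean(axis=0)
    variance = profiles.var(axis=0, ddof=1)
    corr = np.corrcoef(profiles, rowvar=False)
    return mean, variance, corr


# Synthetic example (assumption): regions 0 and 1 share one workload-driven
# component, region 2 varies independently.
rng = np.random.default_rng(0)
workload = rng.normal(size=200)
local = rng.normal(size=(200, 3))
profiles = np.column_stack([
    70.0 + 5.0 * workload + 0.5 * local[:, 0],
    65.0 + 4.0 * workload + 0.5 * local[:, 1],
    60.0 + 3.0 * local[:, 2],
])

mean, variance, corr = region_statistics(profiles)
print(np.round(corr, 2))
```

Regions driven by the same workload component show a correlation close to 1, while the independent region stays near 0. Entries like these are what a correlation-based grouping would later exploit.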
Outline
Backgrounds and Motivations
Modeling and Problem Formulation
Re-embedding Algorithm
Experimental Results
Conclusions
Re-embedding Process (An example)
[Figure: clock tree with nodes a, b, c, d, x, y and v before re-embedding. Legend: sink, original merging point, perturbation option.]
[Figure: the same tree after re-embedding. Legend: new merging point.]
Delay, Skew Calculation for Clock Tree
The clock tree is a single-input multiple-output (SIMO) linear system
  The quantity of interest is the impulse response at each sink
Perturbed Modified Nodal Analysis (MNA)
  x holds the state at the source, sinks and merging points
  L selects the sink responses
  A new state variable is defined that combines the nominal (x) and perturbed (Δx) state variables
  The resulting state matrix is structured and parameterized
The number of perturbation configurations, I = 5^N, is huge! (N is the number of merging points)
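The idea of stacking nominal and perturbed state variables can be sketched on a static toy problem. This assumes a first-order perturbation, (G0 + dG)(x + dx) = b with the dG·dx term dropped, which gives G0·x = b and G0·dx = -dG·x. The block layout [[G0, 0], [dG, G0]], the helper `augmented_solve`, and the 3-node matrices are illustrative assumptions, not the exact formulation from the talk.

```python
import numpy as np


def augmented_solve(G0, dG, b):
    """Solve for nominal x and first-order correction dx in one structured system.

    [[G0, 0 ],   [x ]   [b]
     [dG, G0]] @ [dx] = [0]
    """
    n = G0.shape[0]
    A = np.block([[G0, np.zeros((n, n))], [dG, G0]])
    rhs = np.concatenate([b, np.zeros(n)])
    z = np.linalg.solve(A, rhs)
    return z[:n], z[n:]


# Toy 3-node conductance matrix and a small perturbation (both hypothetical).
G0 = np.array([[2.0, -1.0, 0.0],
               [-1.0, 2.0, -1.0],
               [0.0, -1.0, 1.5]])
dG = 0.01 * np.array([[1.0, -1.0, 0.0],
                      [-1.0, 1.0, 0.0],
                      [0.0, 0.0, 0.0]])
b = np.array([1.0, 0.0, 0.0])
L = np.array([0.0, 0.0, 1.0])  # selects the response at the last node (the "sink")

x, dx = augmented_solve(G0, dG, b)
exact = np.linalg.solve(G0 + dG, b)
print("first-order sink response:", L @ (x + dx))
print("exact sink response:      ", L @ exact)
```

For a small perturbation the stacked solve matches the exact perturbed solve closely, and the nominal block G0 is reused unchanged, which is what makes the structure attractive when many perturbations share one nominal tree.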
Compressing State Matrix by Temperature Correlation
Motivations
  Spatial and temporal correlation of the temperature values removes the need to exhaustively evaluate all perturbation combinations
  Highly correlated merging points should be perturbed in the same fashion
Solution
  Cluster merging points based on correlation strength
  Perform the same perturbation for all points within one cluster
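A short sketch of why tying perturbations within a cluster shrinks the search. It reads the configuration count in the talk as 5^N, that is, five perturbation options per merging point; the cluster labels and function names below are hypothetical.

```python
from itertools import product

OPTIONS_PER_POINT = 5  # five perturbation options per merging point (assumed reading of the slide)


def num_configurations(num_groups, options=OPTIONS_PER_POINT):
    """Number of configurations when each group chooses its option independently."""
    return options ** num_groups


def cluster_configurations(labels, num_clusters, options=OPTIONS_PER_POINT):
    """Yield per-point option lists where all points in a cluster share one option."""
    for choice in product(range(options), repeat=num_clusters):
        yield [choice[c] for c in labels]


# Hypothetical example: 6 merging points grouped into 3 clusters.
labels = [0, 0, 1, 1, 1, 2]
configs = list(cluster_configurations(labels, num_clusters=3))
print(len(configs), "clustered vs", num_configurations(len(labels)), "unclustered")
```

Every generated configuration still assigns an option to each merging point, so it can be evaluated like any other candidate. The search simply never visits configurations in which strongly correlated points move differently.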
Merging Points Clustering by Temperature Correlation
Objective
  Given the correlation matrix C of the N merging points, which is low-rank (rank K, with N >> K)
  Partition the N merging points into K clusters
  Maximize the correlation strength within each of the K clusters
Decide the number of clusters K
  Singular value decomposition (SVD) reveals the effective rank K of C
Partition the merging points into K clusters
  The K-means clustering algorithm is employed
[Figure: low-rank approximation of C by a rank-K matrix]
Example: K = 4, N = 70; the number of perturbation configurations is reduced from 5^70 to 5^4
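The two steps, rank estimation by SVD and K-means partitioning, might look roughly like the sketch below. Several choices here are assumptions made for illustration: the energy threshold used to pick K, clustering on the rows of C, the farthest-first initialization, and the block-structured test matrix. The talk does not specify these details.

```python
import numpy as np


def effective_rank(C, energy=0.9):
    """Smallest K whose leading singular values carry `energy` of the total."""
    s = np.linalg.svd(C, compute_uv=False)
    cumulative = np.cumsum(s) / np.sum(s)
    return int(np.searchsorted(cumulative, energy) + 1)


def farthest_first_centers(points, k):
    """Deterministic initialization: repeatedly pick the point farthest from chosen centers."""
    centers = [points[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    return np.array(centers)


def kmeans(points, k, iterations=50):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = farthest_first_centers(points, k)
    for _ in range(iterations):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Hypothetical correlation matrix: three groups of merging points (sizes 3, 2, 3),
# correlation 0.95 inside a group and 0 across groups.
sizes = [3, 2, 3]
C = np.zeros((sum(sizes), sum(sizes)))
start = 0
for size in sizes:
    C[start:start + size, start:start + size] = 0.95
    start += size
np.fill_diagonal(C, 1.0)

K = effective_rank(C)
labels = kmeans(C, K)
print("K =", K, "labels =", labels.tolist())
```

Each row of C describes how one merging point correlates with all the others, so points with similar rows end up in the same cluster and can then share one perturbation.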
Structural Reduction & Transient Time Analysis
[Figure: block state matrix built from G0 (MxM) blocks, perturbation blocks DG1 ... DGN (MxM) and zero blocks]
Cluster-based reduction (SVD + K-Means): perturbation blocks DG1 ... DGN reduced to DG1 ... DGK
Structural reduction [Hao Yu, DAC'06]: blocks reduced from MxM to mxm
Transient time analysis (backward Euler): time-domain voltage response
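Backward Euler transient analysis of a system C·dx/dt + G·x = B·u(t) advances one step by solving (G + C/h)·x[n+1] = (C/h)·x[n] + B·u(t[n+1]). The sketch below applies this to a toy circuit with two sinks, each behind its own resistor, and reads the 50% delay of each from the voltage response. The circuit values and helper names are assumptions for illustration, not the reduced model from the talk.

```python
import numpy as np


def backward_euler(G, C, B, u, h, steps):
    """Integrate C dx/dt + G x = B u(t) from x(0) = 0 with fixed step h."""
    A = G + C / h
    x = np.zeros(G.shape[0])
    history = [x.copy()]
    for n in range(1, steps + 1):
        x = np.linalg.solve(A, (C / h) @ x + B * u(n * h))
        history.append(x)
    return np.array(history)


def crossing_time(v, h, level=0.5):
    """First time v reaches `level`, linearly interpolated between steps.

    Assumes v starts below `level` and does cross it within the simulated window.
    """
    n = int(np.argmax(v >= level))
    return h * (n - 1 + (level - v[n - 1]) / (v[n] - v[n - 1]))


# Toy circuit (assumption): sink A behind 1 kOhm, sink B behind 2 kOhm, 1 pF each,
# both driven by an ideal unit step at the source.
R_a, R_b, cap = 1e3, 2e3, 1e-12
G = np.diag([1.0 / R_a, 1.0 / R_b])
C = np.diag([cap, cap])
B = np.array([1.0 / R_a, 1.0 / R_b])
h = 1e-11  # 10 ps time step

v = backward_euler(G, C, B, lambda t: 1.0, h, steps=500)
delay_a = crossing_time(v[:, 0], h)
delay_b = crossing_time(v[:, 1], h)
print("delay A = %.3f ns, delay B = %.3f ns, skew = %.3f ns"
      % (delay_a * 1e9, delay_b * 1e9, (delay_b - delay_a) * 1e9))
```

For these single-pole branches the exact 50% delay is R·C·ln 2, so the computed values can be checked against about 0.693 ns and 1.386 ns. Backward Euler is unconditionally stable, and a smaller step h brings the result closer to the exact values.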
Experimental Settings
Temperature variation profiles are obtained from a micro-architecture level power-temperature transient simulator [Liao,TCAD'05] with 6 SPEC2000 applications
100 temperature profiles are collected, one every 10 million clock cycles
Two algorithms are compared:
  DME method: minimizes wire-length for zero skew under the Elmore delay model with nominal temperature
  Our PECO: minimizes skew under a more accurate high-order macromodel with temperature variations
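To see why a tree balanced at nominal temperature can lose its balance, here is a small sketch of the Elmore delay along one source-to-sink path whose segments sit at different temperatures. It assumes a linear resistance model R(T) = R0·(1 + beta·(T - T_ref)) with an illustrative coefficient, and the segment values are made up. This is not the higher-order macromodel that PECO uses.

```python
def elmore_delay(segments, load_cap, temps, beta=0.0039, t_ref=25.0):
    """Elmore delay of an unbranched wire path from source to sink.

    segments: list of (r0, c) per wire segment, ordered from source to sink,
              where r0 is the resistance at t_ref (ohm) and c the capacitance (F).
    temps:    temperature of each segment in deg C.
    beta:     assumed linear temperature coefficient of resistance (per deg C).
    """
    delay = 0.0
    downstream = load_cap + sum(c for _, c in segments)
    for (r0, c), temp in zip(segments, temps):
        r = r0 * (1.0 + beta * (temp - t_ref))
        downstream -= c  # capacitance strictly beyond this segment
        # Distributed segment: half of its own capacitance plus everything beyond it.
        delay += r * (c / 2.0 + downstream)
    return delay


# Hypothetical path: two identical segments and a 0.2 pF sink load.
segments = [(100.0, 1e-13), (100.0, 1e-13)]
nominal = elmore_delay(segments, load_cap=2e-13, temps=[25.0, 25.0])
hot = elmore_delay(segments, load_cap=2e-13, temps=[125.0, 25.0])
print("nominal: %.2f ps, first segment hot: %.2f ps" % (nominal * 1e12, hot * 1e12))
```

Two paths with equal nominal delay no longer match once one of them crosses a hotter region, which is the effect a nominal-temperature zero-skew embedding cannot account for.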
Skew Distribution
Under 100 temperature maps, PECO reduces both the worst-case skew and the mean skew
Experimental Results (cont.)
PECO reduces the worst-case skew by up to 5X (i.e., for net r5)
  Skew is measured in a higher-order delay model considering temperature variations for all applications
  Skew reduction increases for larger clock nets
PECO increases wire-length by less than 1%
Runtime
  Optimization time of PECO is less than that of DME
  Model building time is still long, but the model is more accurate
Conclusions
Studied clock optimization for workload-dependent temperature variation
Reduced the worst-case skew by up to 5X with only 1% wire-length overhead compared to the best existing method
The methodologies can be extended to handle
  PVT variations with spatial correlations
  Other design freedoms such as floorplanning, power/ground optimization, etc.
Thank you!
ACM International Symposium on Physical Design 2007
Hao Yu (graduated), Yu Hu, Chun-Chen Liu and Lei He
Minimal Skew Clock Embedding Considering Time-Variant Temperature Gradient