Thermal Via Allocation for 3D ICs Considering Temporally and Spatially Variant Thermal Power
Hao Yu, Yiyu Shi and Lei He
Electrical Engineering Dept., UCLA, USA
Tanay Karnik
Circuit Research Lab, Intel, USA
Partially supported by NSF and UC-MICRO fund from Intel
2. New Solution for High-performance Integration
2D SoC design has limited density and interconnect performance
Potential solution: 3D integration [Banerjee-Saraswat:IEEE’01]
  Fabrication technologies: chip-level wafer bonding or die-level silicon epitaxial growth
  Inter-layer vias play a crucial role in signaling, power delivery, and heat removal
[Figure: cross-section of a 3D stack with labels: heat sources, active layer, inter-layer, vias, heat-sink]
3. Thermal Challenges in 3D ICs
Temperature increases along the third dimension
  Inter-layer dielectric layers are poor thermal conductors
[Figure: temperature scale with marks at 150 C, 135 C, 100 C, 70 C, and 40 C]
High temperature degrades interconnect and device reliability and causes timing variation
Thermal analysis and thermal-aware design for 3D ICs become necessary
4. Via Planning Problem
Motivation
  Inter-layer vias are good thermal conductors for removing heat
  Inter-layer vias take additional chip area and routing resources
Previous work
  Iterative via planning during placement [Goplen-Sapatnekar:ISPD’05]
  Multilevel alternating-direction via planning during routing [Zhang-Cong:ICCAD’05]
  Both use steady-state analysis and assume a maximum thermal power, which may lead to over-design
Primary contributions of our work
  Minimize a thermal-violation integral that accounts for transient temperature
  Develop an efficient sensitivity-driven sequential programming method that uses a macromodel
5. Outline
Background and Problem Formulation
Structured and Parameterized Macromodel
Sequential Optimization
Experimental Results
Conclusions
6. Thermal Model Overview
Temperature <-> Voltage state variables (x(t))
Thermal-power input <-> Current sources (u(t))
Thermal conductance <-> Electrical conductance (G)
Thermal capacitance <-> Electrical capacitance (C)
Electric and thermal systems can both be described by the MNA (modified nodal analysis) equation
Time domain:
$$G\,x(t) + C\,\frac{dx(t)}{dt} = B\,u(t), \qquad y(t) = L^T x(t)$$
Frequency domain:
$$(G + sC)\,x(s) = B\,u(s), \qquad y(s) = L^T x(s)$$
B and L are multi-input/output port matrices; y is the selected output response
Via conductance g_i and capacitance c_i are both proportional to via size A_i or density (A_i/a), where a is the unit via area
  They can be added parametrically into the MNA equation
Electric and thermal duality
7. Steady State Model and Analysis
Steady-state temperature can be obtained by directly solving a time-invariant linear equation
Active-device and inter-dielectric layers are discretized into tiles
Tiles connected by thermal resistance (R)
Heat sources modeled as time-invariant current sources
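As a minimal sketch of the steady-state analysis above, assume a hypothetical vertical chain of three tiles tied to a heat sink. The function name `build_chain_conductance` and the values of `g_tile`, `g_sink`, and the injected power are illustrative assumptions, not taken from the slides. The temperature rise follows from a single solve of G x = B u:

```python
import numpy as np

def build_chain_conductance(n_tiles, g_tile, g_sink):
    """Thermal conductance matrix of a vertical chain of tiles.

    Tile 0 is tied to the heat sink (ambient) through g_sink; neighbouring
    tiles are connected through g_tile. Values are in W/K (made up).
    """
    G = np.zeros((n_tiles, n_tiles))
    for k in range(n_tiles - 1):
        # Stamp a two-terminal conductance between tile k and tile k + 1.
        G[k, k] += g_tile
        G[k + 1, k + 1] += g_tile
        G[k, k + 1] -= g_tile
        G[k + 1, k] -= g_tile
    G[0, 0] += g_sink  # path from the bottom tile to ambient
    return G

G = build_chain_conductance(n_tiles=3, g_tile=2.0, g_sink=4.0)
u = np.array([0.0, 0.0, 8.0])   # 8 W injected at the tile farthest from the sink
x = np.linalg.solve(G, u)       # temperature rise above ambient, in K
print(x)
```

In this toy stack the rise grows with distance from the heat sink (2 K, 6 K, 10 K), which mirrors the earlier observation that temperature increases along the third dimension.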
8. Transient Model and Analysis
Transient temperature can be obtained by solving a time-variant linear equation
Tiles connected by thermal resistance and thermal capacitance (RC)
Heat sources modeled as time-variant current sources
9. Thermal Power Variation and Analysis
Different workloads and dynamic power management introduce temporal and spatial power variations
  Thermal power is the runtime average of cycle-accurate power, and it is constant neither in space nor in time
[Figure: power versus CPU cycles, with curves for cycle-accurate power, transient thermal power, and maximum thermal power; time scales marked ns, ms, s]
Steady-state analysis must assume that every region draws its maximum thermal power simultaneously
  The regions of a chip seldom reach their maxima at the same time, so this assumption can result in over-design
Transient analysis is accurate but time-consuming
  Design automation calls for a transient thermal simulation that is both accurate and efficient
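The over-design argument above can be illustrated with a small sketch. It assumes thermal power is a running average of cycle-accurate power over a fixed window; the helper `thermal_power`, the window length, and the two power traces are made-up examples, not data from the slides.

```python
import numpy as np

def thermal_power(cycle_power, window):
    """Running average of cycle-accurate power over `window` cycles."""
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the full window fits.
    return np.convolve(cycle_power, kernel, mode="valid")

# Two regions that are busy at different times (made-up numbers, in W).
region_a = np.array([4.0, 4.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0])
region_b = np.array([0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 4.0, 4.0])

avg_a = thermal_power(region_a, window=4)
avg_b = thermal_power(region_b, window=4)

# Steady-state style assumption: every region at its own maximum at once.
worst_case_total = avg_a.max() + avg_b.max()
# Largest total thermal power that actually occurs at any one time.
actual_peak_total = (avg_a + avg_b).max()
print(worst_case_total, actual_peak_total)
```

Here the simultaneous-maximum assumption budgets for 8 W while the averaged total never exceeds 4 W, which is the kind of gap that motivates transient analysis.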
10. Thermal Violation Integral
[Figure: temperature (C) versus time (s) from t0 to tp, with ceiling temperature T_ceiling, peak T_max, and two violation intervals (ts1, te1) and (ts2, te2)]
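$$f_k(\mathbf{A}) = \int_{t_0}^{t_p} \max[\, y_k(\mathbf{A},t) - T_{ceiling},\ 0 \,]\, dt = \int_{t_s}^{t_e} [\, y_k(\mathbf{A},t) - T_{ceiling} \,]\, dt$$
where [t_s, t_e] runs over the intervals in which y_k exceeds T_ceiling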
Thermal violation is a temperature overshoot that lasts long enough, so maximum temperature alone is not a good figure of merit (FOM)
The thermal-violation integral f_k(A) is a more accurate FOM
  It integrates the time-domain transient temperature (y) above a defined ceiling temperature (T_ceiling) over a long enough period (t0 ~ tp) at the kth tile
FOM f(A) for a group (K) of critical tiles
  A is the via density vector
$$f(\mathbf{A}) = \sum_{k=1}^{K} f_k(\mathbf{A})$$
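A minimal numerical sketch of the thermal-violation integral, assuming the temperature of each tile is available as samples on a time grid. The function `violation_integral`, the two waveforms, and the ceiling value are hypothetical and only illustrate the definition.

```python
import numpy as np

def violation_integral(t, y, t_ceiling):
    """Trapezoidal estimate of the integral of max(y - t_ceiling, 0) over t."""
    excess = np.maximum(y - t_ceiling, 0.0)
    return float(np.sum(0.5 * (excess[1:] + excess[:-1]) * np.diff(t)))

t = np.linspace(0.0, 3.0, 3001)   # time in s
# Tile that ramps 40 C -> 60 C, holds for 1 s, then cools back to 40 C.
y_hot = np.interp(t, [0.0, 1.0, 2.0, 3.0], [40.0, 60.0, 60.0, 40.0])
# Tile that stays below the ceiling the whole time.
y_cool = np.full_like(t, 45.0)

T_CEILING = 50.0
f_k = [violation_integral(t, y, T_CEILING) for y in (y_hot, y_cool)]
f_total = sum(f_k)                # FOM for the group of critical tiles
print(f_k, f_total)
```

Only the portion of the waveform above the ceiling contributes, so the cool tile adds nothing and the hot tile contributes about 15 C·s.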
11. Problem Formulation
Find a via density vector A that minimizes the thermal-violation integral under global and local routing-congestion constraints
Two keys to solving this problem efficiently
  Efficient models of the transient response and of its first-order and second-order sensitivities with respect to via density
  Efficient yet effective mathematical programming
$$\min\ f(\mathbf{A})$$
$$\text{s.t.}\quad \sum_{i=1}^{K} A_i \le A_{max} \qquad \text{(global constraint)}$$
$$0 \le A_i \le (A_{max})_i, \quad i = 1,\dots,K \qquad \text{(local constraint)}$$
13. Macromodel by Moment Matching
[Figure: a large linear network reduced to a small linear network]
Krylov-subspace-based projection can reduce model size and preserve accuracy by matching moments of inputs [Odabasioglu-Celik-Pileggi:TCAD’98]
  Flat projection does not preserve block matrix structure such as sparsity
  The reduced macromodel does not contain the sensitivity information needed for design automation
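The flat projection mentioned above can be sketched as a one-sided Arnoldi process followed by a congruence transform. This is a generic illustration for a single-input, single-output RC chain, not the structured method of the slides; the function `krylov_basis`, the chain values, and the reduced order q = 3 are assumptions.

```python
import numpy as np

def krylov_basis(G, C, b, q):
    """Orthonormal basis of span{r, A r, ..., A^(q-1) r}, A = -G^-1 C, r = G^-1 b."""
    v = np.linalg.solve(G, b)
    cols = []
    for _ in range(q):
        for w in cols:                      # modified Gram-Schmidt
            v = v - (w @ v) * w
        v = v / np.linalg.norm(v)           # sketch only: no breakdown check
        cols.append(v)
        v = -np.linalg.solve(G, C @ v)      # next Krylov direction
    return np.column_stack(cols)

# Hypothetical 10-tile RC chain; tile 0 is tied to the heat sink.
n = 10
G = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
G[-1, -1] = 1.0                             # last tile has only one neighbour
C = 0.5 * np.eye(n)
b = np.zeros(n)
b[-1] = 1.0                                 # heat injected at the far tile
l_out = b.copy()                            # temperature observed at the same tile

V = krylov_basis(G, C, b, q=3)
Gr, Cr, br, lr = V.T @ G @ V, V.T @ C @ V, V.T @ b, V.T @ l_out

# Zeroth and first moments of the transfer function, full versus reduced.
m0_full = l_out @ np.linalg.solve(G, b)
m0_red = lr @ np.linalg.solve(Gr, br)
m1_full = -l_out @ np.linalg.solve(G, C @ np.linalg.solve(G, b))
m1_red = -lr @ np.linalg.solve(Gr, Cr @ np.linalg.solve(Gr, br))
print(m0_full, m0_red, m1_full, m1_red)
```

The 3-by-3 reduced model reproduces the leading moments of the 10-by-10 model, but `Gr` is dense and carries no separate sensitivity blocks, which is the limitation the slide points out.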
14. Parameterization (I)
The insertion location is described by an adjacency matrix X
The via density (A_i) is parameterized and added into the MNA equation
[Figure: example network with nodes 1 to 8 and the matrix X(2,6) for a via between nodes 2 and 6; its nonzero entries are 1 and -1]
$$\Big[\, G_0 + sC_0 + \sum_{i=1}^{K} A_i\,(g_i + s c_i) \Big]\, x(A_1,\dots,A_K,s) = B\,u(s)$$
$$y(A_1,\dots,A_K,s) = L^T x(A_1,\dots,A_K,s)$$
where $g_i = g X_i$ and $c_i = c X_i$
Need to separate sensitivity from nominal response
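A sketch of how a via could be stamped into the conductance matrix, assuming X is the usual two-terminal nodal stamp (+1 on the two diagonal entries, -1 on the two off-diagonal entries). That form of X, the helper names, the placeholder `G0`, and the numeric values are assumptions for illustration only.

```python
import numpy as np

def via_stamp(n_nodes, a, b):
    """Stamp matrix X for a two-terminal element between nodes a and b (0-based)."""
    e = np.zeros(n_nodes)
    e[a], e[b] = 1.0, -1.0
    return np.outer(e, e)   # +1 at (a,a) and (b,b); -1 at (a,b) and (b,a)

def parameterized_conductance(G0, densities, stamps, g_unit):
    """G(A) = G0 + sum_i A_i * g_unit * X_i."""
    G = G0.copy()
    for A_i, X_i in zip(densities, stamps):
        G += A_i * g_unit * X_i
    return G

X_26 = via_stamp(8, 1, 5)   # nodes 2 and 6 of the slide, written 0-based
G0 = np.eye(8)              # placeholder nominal matrix, not from the slides
G = parameterized_conductance(G0, [0.3], [X_26], g_unit=2.0)
print(G[1, 1], G[1, 5])
```

Because the via density enters linearly, the same stamp can be reused for every value of A_i during optimization.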
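15. Parameterization (II)
Expand the state variables x(A_1,…,A_K,s) by Taylor expansion with respect to A_i (up to second order)
$$x(A_1,\dots,A_K,s) = x^{(0)}(s) + \sum_{i=1}^{K} \frac{\partial x}{\partial A_i}(s)\, A_i + \cdots$$
  x^(0), x^(1), and x^(2) are the nominal values, first-order sensitivities, and second-order sensitivities
The expanded system has a lower-triangular structure
$$(G_{ap} + sC_{ap})\, x_{ap} = B_{ap}\, u(s), \qquad y_{ap} = L_{ap}^T x_{ap}$$
$$x_{ap} = [\, x_0^{(0)},\ x_1^{(1)},\ \dots,\ x_K^{(1)},\ x_{1,1}^{(2)},\ \dots,\ x_{K,K}^{(2)} \,]$$
$$G_{ap} = \begin{bmatrix}
G_0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
A_1 g_1 & G_0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
\vdots & & \ddots & & & & & \vdots \\
A_K g_K & 0 & \cdots & G_0 & 0 & 0 & \cdots & 0 \\
0 & A_1 g_1 & \cdots & 0 & G_0 & 0 & \cdots & 0 \\
0 & A_2 g_2 & A_1 g_1 & 0 & 0 & G_0 & \cdots & 0 \\
\vdots & & & \ddots & & & \ddots & \vdots \\
0 & 0 & \cdots & A_K g_K & 0 & 0 & \cdots & G_0
\end{bmatrix}$$
C_ap has a similar structure
The system size is enlarged and needs to be reduced by projection
  Traditional flat projection cannot separate the nominal state variables from their sensitivities [Li-Pileggi:ICCAD’04]
  This can be solved by a structure-preserved projection [Yu-He-Tan:BMAS’05]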
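The lower-triangular structure can be checked numerically at DC (s = 0). The sketch below assumes a toy 4-node network with two candidate vias and reads each block row as "G_0 times the new term equals minus A_i g_i times a lower-order term"; the network, the densities, and all names are hypothetical. Ordered pairs (i, j) are kept separately, so the slide's x_{1,2} corresponds to the sum of the (1,2) and (2,1) terms here.

```python
import numpy as np

def stamp(n, a, b):
    """Two-terminal nodal stamp between nodes a and b (0-based)."""
    e = np.zeros(n)
    e[a], e[b] = 1.0, -1.0
    return np.outer(e, e)

n = 4
G0 = np.eye(n)                          # each node leaks 1 W/K to ambient (toy value)
for k in range(n - 1):
    G0 += stamp(n, k, k + 1)            # unit conductance between neighbours
g = [stamp(n, 0, 2), stamp(n, 1, 3)]    # two candidate vias, unit conductance
A = [0.1, 0.2]                          # via densities
b = np.array([0.0, 0.0, 0.0, 1.0])      # B u: 1 W injected at node 3

# Forward substitution through the block rows; every solve uses the same G0.
x0 = np.linalg.solve(G0, b)
x1 = [-np.linalg.solve(G0, (A[i] * g[i]) @ x0) for i in range(2)]
x2 = [[-np.linalg.solve(G0, (A[i] * g[i]) @ x1[j]) for j in range(2)]
      for i in range(2)]

exact = np.linalg.solve(G0 + sum(A[i] * g[i] for i in range(2)), b)
order0 = x0
order1 = order0 + sum(x1)
order2 = order1 + sum(x2[i][j] for i in range(2) for j in range(2))
err = [float(np.linalg.norm(exact - x)) for x in (order0, order1, order2)]
print(err)
```

Each added order brings the expansion closer to the exact perturbed solution, and only the nominal matrix G0 ever has to be solved against.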
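16. Structured Projection (I)
Partition the projection matrix block-diagonally according to the sizes of the nominal state variables, first-order sensitivities, and second-order sensitivities
$$V = \begin{bmatrix} V_0 \\ V_1 \\ \vdots \\ V_K \\ V_{K+1} \\ \vdots \end{bmatrix} \;\rightarrow\; \mathcal{V} = \mathrm{diag}[\, V_0,\ V_1,\ \dots,\ V_K,\ V_{K+1},\ \dots \,]$$
Structured projection yields a reduced triangular system in which the nominal values and the sensitivities can be solved independently
$$\tilde{G}_{ap} = \begin{bmatrix}
\tilde{G}_0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
A_1 \tilde{g}_1 & \tilde{G}_0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
\vdots & & \ddots & & & & & \vdots \\
A_K \tilde{g}_K & 0 & \cdots & \tilde{G}_0 & 0 & 0 & \cdots & 0 \\
0 & A_1 \tilde{g}_1 & \cdots & 0 & \tilde{G}_0 & 0 & \cdots & 0 \\
0 & A_2 \tilde{g}_2 & A_1 \tilde{g}_1 & 0 & 0 & \tilde{G}_0 & \cdots & 0 \\
\vdots & & & \ddots & & & \ddots & \vdots \\
0 & 0 & \cdots & A_K \tilde{g}_K & 0 & 0 & \cdots & \tilde{G}_0
\end{bmatrix}$$
$$\tilde{x}_{ap} = [\, \tilde{x}_0^{(0)},\ \tilde{x}_1^{(1)},\ \dots,\ \tilde{x}_K^{(1)},\ \tilde{x}_{1,1}^{(2)},\ \dots,\ \tilde{x}_{K,K}^{(2)} \,]$$
$\tilde{C}_{ap}$ has a similar structure to $\tilde{G}_{ap}$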
17. Structured Projection (II)
Nominal response and sensitivities can be solved separately and efficiently
  The reduced model is sparse
  Only one LU factorization is needed, that of the reduced diagonal block G0+(1/h)C0
Time-domain transient response can be solved using the Backward-Euler method
$$\Big( \tilde{G}_{ap} + \frac{1}{h}\tilde{C}_{ap} \Big)\, \tilde{x}_{ap}(t) = \frac{1}{h}\tilde{C}_{ap}\, \tilde{x}_{ap}(t-h) + \tilde{B}_{ap}\, u(t)$$
$$\tilde{y}_{ap}(t) = \tilde{L}_{ap}^T\, \tilde{x}_{ap}(t)$$
Generated sensitivities can be used in any gradient-based optimization
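The Backward-Euler update can be sketched as follows for a small unreduced RC network; the same loop would apply to the reduced matrices. The function `backward_euler`, the two-node network, the step size, and the step input are assumptions for illustration. The left-hand matrix is formed and inverted once, standing in for the single LU factorization mentioned above.

```python
import numpy as np

def backward_euler(G, C, B, u, h, n_steps):
    """Integrate G x + C dx/dt = B u(t) from x(0) = 0 with a fixed step h."""
    # Formed once and reused at every step; at scale one would keep an LU
    # factorization instead of an explicit inverse.
    lhs_inv = np.linalg.inv(G + C / h)
    x = np.zeros(G.shape[0])
    trace = [x.copy()]
    for k in range(1, n_steps + 1):
        rhs = (C / h) @ x + B * u(k * h)
        x = lhs_inv @ rhs
        trace.append(x.copy())
    return np.array(trace)

# Two tiles: tile 0 tied to the sink, tile 1 stacked on tile 0 (made-up values).
G = np.array([[2.0, -1.0], [-1.0, 1.0]])
C = np.eye(2)
B = np.array([0.0, 1.0])            # 1 W step applied at tile 1
trace = backward_euler(G, C, B, u=lambda t: 1.0, h=0.05, n_steps=800)
print(trace[-1])                    # settles near the steady state
```

After 40 s the response has settled to the steady-state solution of G x = B, about 1 K at tile 0 and 2 K at tile 1.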
19. Sequential Approximation of Objective Function
The objective function f(A) can be approximated by
  a first-order expansion: sequential linear programming (SLP)
$$f_{lp} = f_0 + \sum_{i=1}^{K} \frac{\partial f}{\partial A_i}\, \Delta A_i$$
  a second-order expansion: sequential quadratic programming (SQP)
$$f_{qp} = f_0 + \sum_{i=1}^{K} \frac{\partial f}{\partial A_i}\, \Delta A_i + \frac{1}{2} \sum_{i=1}^{K} \sum_{j=1}^{K} \frac{\partial^2 f}{\partial A_i\, \partial A_j}\, \Delta A_i\, \Delta A_j$$
where
$$f(\mathbf{A}) = \sum_{i=1}^{K} \int_{t_s}^{t_e} [\, y_i(\mathbf{A},t) - T_{ceiling} \,]\, dt$$
Find ΔA that minimizes f_lp or f_qp during each step
The objective function becomes semi-definite when the integration is approximated by a discretized summation [Visweswariah:TCAD’00]
  Sequential programming converges for convex programming problems and still converges well for semi-definite problems
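To see what the two local models buy, the sketch below compares a first-order and a second-order model against a toy objective. The objective f(A) = sum of exp(-A_i) is a stand-in chosen only because its gradient and Hessian are known in closed form; it is not the thermal-violation integral, and the second-order model uses the standard Taylor form with the factor 1/2.

```python
import numpy as np

def f_true(A):
    """Toy convex objective that decreases as via density grows (stand-in only)."""
    return float(np.sum(np.exp(-A)))

def local_models(A0, dA):
    """First-order (SLP-style) and second-order (SQP-style) models around A0."""
    f0 = f_true(A0)
    grad = -np.exp(-A0)                 # df/dA_i
    hess = np.diag(np.exp(-A0))         # d2f/dA_i dA_j (diagonal for this toy f)
    f_lp = f0 + grad @ dA
    f_qp = f_lp + 0.5 * dA @ hess @ dA
    return float(f_lp), float(f_qp)

A0 = np.array([0.5, 1.0])
dA = np.array([0.1, 0.1])
f_lp, f_qp = local_models(A0, dA)
f_exact = f_true(A0 + dA)
print(abs(f_lp - f_exact), abs(f_qp - f_exact))
```

For the same step, the quadratic model tracks the true objective more closely than the linear one, which is why SQP typically needs fewer iterations at the cost of second-order sensitivities.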
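20. Sensitivity Calculation
Direct sensitivity calculation for the objective function
first-order:
$$\frac{\partial f}{\partial A_i} = \sum_{k=1}^{K} \int_{t_s}^{t_e} \frac{\partial y_k}{\partial A_i}\, dt = \sum_{k=1}^{K} \int_{t_s}^{t_e} L_k^T \frac{\partial x}{\partial A_i}\, dt = \sum_{k=1}^{K} \int_{t_s}^{t_e} L_k^T x_i^{(1)}\, dt$$
second-order:
$$\frac{\partial^2 f}{\partial A_i\, \partial A_j} = \sum_{k=1}^{K} \int_{t_s}^{t_e} \frac{\partial^2 y_k}{\partial A_i\, \partial A_j}\, dt = \sum_{k=1}^{K} \int_{t_s}^{t_e} L_k^T \frac{\partial^2 x}{\partial A_i\, \partial A_j}\, dt = \sum_{k=1}^{K} \int_{t_s}^{t_e} L_k^T x_{i,j}^{(2)}\, dt$$
Structured and parameterized reduction provides an efficient calculation of both nominal values and sensitivities
  The via density vector A can be updated efficiently during each iteration
The computation cost could be further reduced by using an adjoint Lagrangian method to calculate sensitivity [Visweswariah:TCAD’00]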
22. Experiment Settings
A modest 3D stack with 1 heat sink, 2 die layers, and 2 dielectric layers is assumed; each is extracted as an RC mesh, with an RC pair for each via
Clock gating is assumed with a period of 250 ms
The reduction algorithm uses SIMO (single-input-multiple-output) reduction when the number of inputs is large
Our method (SP-MACRO) is compared with the steady-state solution
23. Accuracy of Reduced Macromodel
Transient temperature responses of the exact and SP-MACRO models at ports 3, 18, and 58 of the top layer with a step-response input
  The responses of the macromodels are visually identical to those of the exact models
24. Optimization Profile by SQP
Temperature reduction at a selected location during via allocation by SQP
  The allocated vias result in a transient temperature that meets the targeted ceiling temperature of 52 C
25. Temperature Map
Temperature maps of the top layer before and after via allocation
  The maximum temperature before allocation is about 150 C
  The temperature after allocation meets the targeted ceiling temperature of 52 C
26. Allocated-via and Runtime Comparison
Compared to the steady-state solution
  SP-MACRO needs less simulation and planning time as circuit size increases
    It reduces the runtime by 126X
  SP-MACRO predicts via insertion more accurately
    It reduces the number of inserted vias by up to 2.04X (largest case in the table)
                    |                      |                    | Steady-state by direct solution          | Transient by SP-MACRO
Total/critical tile | Total via constraint | Original/ceiling T | Solve-dc (s) | Solve-tran (s) | Allo-via | Redu-ckt (s) | Solve-sens (s) | Qp/lp-plan (s) | Allo-via
256/30              | 704                  | 120/40             | 1.64         | 10.27          | 440      | 0.12         | 0.19           | 0.15           | 360
1024/60             | 2818                 | 120/40             | 12.62        | 130.12         | 2281     | 1.08         | 0.96           | 0.42           | 1609
4096/80             | 5980                 | 140/50             | 341.13       | 3872.98        | 5620     | 12.92        | 6.28           | 1.92           | 3217
8192/100            | 8218                 | 140/50             | 7809.12      | NA             | 8021     | 46.27        | 16.92          | 8.98           | 4382
16384/120           | 18000                | 160/60             | NA           | NA             | 17600    | 120.89       | 101.23         | 23.65          | 9280
32768/200           | 24000                | 160/60             | NA           | NA             | 23800    | 262.12       | 257.21         | 42.75          | 11660
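The via-count reduction can be recomputed directly from the two Allo-via columns of the table. The snippet below only restates those table values; the variable names are arbitrary.

```python
# (total tiles, Allo-via from steady-state planning, Allo-via from SP-MACRO)
rows = [
    (256, 440, 360),
    (1024, 2281, 1609),
    (4096, 5620, 3217),
    (8192, 8021, 4382),
    (16384, 17600, 9280),
    (32768, 23800, 11660),
]

# Ratio of steady-state via count to SP-MACRO via count, rounded to 2 decimals.
ratios = [round(via_ss / via_sp, 2) for _, via_ss, via_sp in rows]
for (tiles, _, _), ratio in zip(rows, ratios):
    print(tiles, ratio)
```

The ratio grows with circuit size and reaches 2.04 for the 32768-tile case, which is where the 2.04X figure comes from.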
27. Conclusions
Via planning based on transient thermal analysis reduces the via number by up to 2.04X compared to planning based on steady-state thermal analysis
An efficient via planning algorithm is developed
  Structured and parameterized model reduction provides both nominal values and sensitivities
  Sequential linear/quadratic programming minimizes the thermal-violation integral
SP-MACRO is further extended for
  Simultaneous power and thermal integrity driven via planning [Yu-Ho-He:ICCAD’06]