Thermal Via Allocation for 3D ICs Considering Temporally and Spatially Variant Thermal Power

Hao Yu, Yiyu Shi and Lei He

Electrical Engineering Dept., UCLA, USA

Partially supported by NSF and UC-MICRO fund from Intel

Tanay Karnik

Circuit Research Lab, Intel, USA

New Solution for High-performance Integration

2D SoC design has limited density and interconnect performance

Potential solution: 3D integration [Banerjee-Saraswart:IEEE’01]

Fabrication technologies: chip-level wafer bonding or die-level silicon epitaxial growth

Inter-layer vias play a crucial role in signaling, power delivery, and heat removal

[Figure: 3D IC stack showing heat sources, active layers, inter-layer dielectric, vias, and the heat sink]

Thermal Challenges in 3D ICs

Temperature increases along the third dimension

Inter-layer dielectric layers are poor thermal conductors

[Figure: temperature map with a color scale from 40 °C to 150 °C]

High temperature degrades interconnect and device reliability and causes timing variations

Thermal analysis and thermal-aware design are therefore needed for 3D ICs

Via Planning Problem

Motivation

Inter-layer vias are good thermal conductors for removing heat

Inter-layer vias take additional chip area and routing resources

Previous work

Iterative via planning during placement [Goplen-Sapatnekar:ISPD’05]

Multilevel alternating-direction via planning during routing [Zhang-Cong:ICCAD’05]

Both use steady-state analysis and assume a maximum thermal power, which may lead to over-design

Primary contributions of our work

Minimize a thermal-violation integral that considers transient temperature

Develop an efficient sensitivity-driven sequential programming method that uses a macromodel

Outline

Background and Problem Formulation

Structured and Parameterized Macromodel

Sequential Optimization

Experimental Results

Conclusions

Thermal Model Overview

Temperature <-> Voltage, state variables (x(t))

Thermal-power input <-> Current sources (u(t))

Thermal conductance <-> Electrical conductance (G)

Thermal capacitance <-> Electrical capacitance (C)

Electric and thermal systems can be described in MNA (modified nodal analysis) equation

time domain:

$$ G\,x(t) + C\,\frac{dx(t)}{dt} = B\,u(t), \qquad y(t) = L^T x(t) $$

frequency domain:

$$ (G + sC)\,x(s) = B\,u(s), \qquad y(s) = L^T x(s) $$

B and L are multi-input/output port matrices

y is the selected output response

Via conductance g_i and capacitance c_i are both proportional to the via size A_i or the via density A_i/a, where a is the unit via area

They can therefore be added parametrically into the MNA equation

Electric and thermal duality

Steady State Model and Analysis

Steady-state temperature can be obtained by directly solving a time-invariant linear equation

Active-device and inter-dielectric layers are discretized into tiles

Tiles are connected by thermal resistances (R)

Heat sources are modeled as time-invariant current sources
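The steady-state solve described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the vertical chain of four tiles, the conductance values, and the helper names `solve` and `chain_conductance` are all assumptions chosen so the result can be checked by hand (temperature rise at tile k is P * (1/g_sink + k/g_tile) when power P enters at the top tile).

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def chain_conductance(n_tiles, g_tile, g_sink):
    """Conductance matrix G of a hypothetical vertical chain of tiles.

    Tile 0 touches the heat sink (the reference node) through g_sink;
    neighbouring tiles are joined by a thermal conductance g_tile.
    """
    G = [[0.0] * n_tiles for _ in range(n_tiles)]
    G[0][0] += g_sink
    for k in range(n_tiles - 1):
        G[k][k] += g_tile
        G[k + 1][k + 1] += g_tile
        G[k][k + 1] -= g_tile
        G[k + 1][k] -= g_tile
    return G


# Assumed values: 4 tiles, 1 W injected at the tile farthest from the sink.
G = chain_conductance(n_tiles=4, g_tile=2.0, g_sink=4.0)
power = [0.0, 0.0, 0.0, 1.0]   # time-invariant heat sources (current sources)
rise = solve(G, power)          # temperature rise above the heat sink
```

The rise grows with distance from the heat sink, which is the third-dimension temperature increase the slides describe.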

Transient Model and Analysis

Transient temperature can be obtained by solving a time-variant linear equation

Tiles are connected by thermal resistances and thermal capacitances (RC)

Heat sources are modeled as time-variant current sources

Thermal Power Variation and Analysis

Different workloads and dynamic power management introduce temporal and spatial power variations

Thermal power is the runtime average of cycle-accurate power, and is constant neither spatially nor temporally

[Figure: power versus CPU cycles, showing cycle-accurate power, transient thermal power, and maximum thermal power, with time scales marked ns, ms, and s]

Steady-state analysis needs to assume a maximum thermal power simultaneously for all regions

It seldom happens that all parts of the chip reach their maximum power simultaneously, so this assumption can result in over-design

Transient analysis is accurate but time-consuming

Design automation therefore calls for a transient thermal simulation that is both accurate and efficient

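Thermal Violation Integral

[Figure: temperature (°C) versus time (s) from t0 to tp, with the levels T_ceiling and T_max marked and two violation intervals, ts1 to te1 and ts2 to te2, during which the temperature exceeds T_ceiling]

$$ f_k(A) = \int_{t_0}^{t_p} \max\left[\,y_k(A,t) - T_{ceiling},\; 0\,\right] dt = \sum_{m} \int_{t_{sm}}^{t_{em}} \left[\,y_k(A,t) - T_{ceiling}\,\right] dt $$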

A thermal violation is a temperature overshoot that lasts for a long enough period, so maximum temperature alone is not a good figure of merit (FOM)

The thermal-violation integral f_k(A) is a more accurate FOM

It is the integral of the time-domain transient temperature (y) above the defined ceiling temperature (T_ceiling) over a long enough period (t0 to tp) at the kth tile

FOM f(A) for a group of K critical tiles

A is a via density vector

$$ f(A) = \sum_{k=1}^{K} f_k(A) $$
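As a rough sketch of how this figure of merit could be evaluated on sampled waveforms, the Python below approximates the integral with a rectangle rule. The function names, the sample values, and the 0.5 s time step are illustrative assumptions; only the 52 °C ceiling is taken from the experiments reported later in the slides.

```python
def violation_integral(temps, dt, t_ceiling):
    """Rectangle-rule estimate of the integral of max(y(t) - T_ceiling, 0)."""
    return sum(max(y - t_ceiling, 0.0) for y in temps) * dt


def total_violation(waveforms, dt, t_ceiling):
    """f(A): sum of the per-tile violation integrals over the critical tiles."""
    return sum(violation_integral(w, dt, t_ceiling) for w in waveforms)


# Hypothetical temperature samples (degrees C) for two critical tiles.
tile_a = [40.0, 50.0, 56.0, 54.0, 48.0]   # exceeds the ceiling in two samples
tile_b = [45.0, 51.0, 50.0, 49.0, 47.0]   # never exceeds the ceiling

f_a = violation_integral(tile_a, dt=0.5, t_ceiling=52.0)
f_b = violation_integral(tile_b, dt=0.5, t_ceiling=52.0)
f_total = total_violation([tile_a, tile_b], dt=0.5, t_ceiling=52.0)
```

Tile b peaks at 51 °C, below the ceiling, so it contributes nothing. Under a maximum-temperature FOM it would still look nearly as critical as tile a.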

Problem Formulation

Find a via density vector A that minimizes the thermal-violation integral under global and local routing congestion constraints

Two keys to solving this problem efficiently

Efficient models of the transient response and of its first-order and second-order sensitivities with respect to via density

Efficient yet effective mathematical programming

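$$
\begin{aligned}
\min:\quad & f(A) \\
\text{s.t.}\quad & \sum_{i=1}^{K} A_i \le A_{max} && \text{(global constraint)} \\
& 0 \le A_i \le A_i^{max} \quad (i = 1,\ldots,K) && \text{(local constraint)}
\end{aligned}
$$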

Macromodel by Moment Matching

[Figure: a large linear network reduced to a small linear network]

Krylov-subspace based projection can reduce model size and preserve accuracy by matching moments of inputs [Odabasioglu-Celik-Pileggi:TCAD’98]

Flat projection does not preserve block matrix structure such as sparsity

The reduced macromodel does not contain sensitivity information for design automation

Parameterization (I)

The inserted location is described by an adjacency matrix X

The via density (A_i) is parameterized and added into the MNA equation

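[Figure: 8-node thermal network, nodes numbered 1 to 8, with a via inserted between nodes 2 and 6]

$$
X(2,6) =
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$

Rows and columns are indexed by nodes 1 to 8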

$$ \left[ G_0 + sC_0 + \sum_{i=1}^{K} A_i\,(g_i + s c_i) \right] x(A_1,\ldots,A_K,s) = B\,u(s) $$

$$ y(A_1,\ldots,A_K,s) = L^T x(A_1,\ldots,A_K,s) $$

where $g_i = g_0 X_i$ and $c_i = c_0 X_i$

Need to separate sensitivity from nominal response
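The stamping of a via into the conductance matrix can be sketched as follows. This is an illustrative Python fragment under stated assumptions: `via_stamp` and `stamp_vias` are hypothetical helper names, the unit conductance of 2.0 and density of 0.5 are made-up values, and building X as an outer product of an incidence vector is one convenient way to obtain the pattern shown for X(2,6), not necessarily how the authors construct it.

```python
def via_stamp(n_nodes, node_a, node_b):
    """Return X(node_a, node_b), using the 1-based node numbers of the slide.

    X is the outer product e e^T of the incidence vector e, which holds
    +1 at node_a and -1 at node_b.
    """
    e = [0] * n_nodes
    e[node_a - 1] = 1
    e[node_b - 1] = -1
    return [[ei * ej for ej in e] for ei in e]


def stamp_vias(G0, vias, g_unit):
    """Return G0 + sum_i A_i * g_unit * X_i.

    vias is a list of (density A_i, node_a, node_b) tuples and g_unit is
    the conductance of a unit-density via (an assumed scalar g_0).
    """
    n = len(G0)
    G = [row[:] for row in G0]
    for density, a, b in vias:
        X = via_stamp(n, a, b)
        for r in range(n):
            for c in range(n):
                G[r][c] += density * g_unit * X[r][c]
    return G


X26 = via_stamp(8, 2, 6)                       # pattern of the slide's X(2,6)
G0 = [[0.0] * 8 for _ in range(8)]             # empty nominal matrix, for clarity
G = stamp_vias(G0, [(0.5, 2, 6)], g_unit=2.0)  # one via of density 0.5
```

Because the density A_i enters only as a scalar multiplier of a fixed pattern, it can be treated as a design parameter when sensitivities are derived.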

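Parameterization (II)

$$ (G_{ap} + sC_{ap})\,x_{ap} = B_{ap}\,u(s), \qquad y_{ap} = L_{ap}^T x_{ap} $$

$C_{ap}$ has a similar structure to $G_{ap}$

$$ x(A_1,\ldots,A_K,s) = x^{(0)}(s) + \sum_{i=1}^{K} x_i^{(1)}(A_i,s) + \sum_{i=1}^{K}\sum_{j=1}^{K} x_{i,j}^{(2)}(A_i,A_j,s) $$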

Expand the state variables x(A1,…,AK,s) by Taylor expansion w.r.t. Ai (up to second order)

x^(0), x^(1), and x^(2) are the nominal values, first-order sensitivities, and second-order sensitivities

The expanded system has a lower-triangular structure

The system size is enlarged and needs to be reduced by projection

Traditional flat projection cannot separate the nominal state variables from their sensitivities [Li-Pileggi:ICCAD’04]

This can be solved by a structure-preserved projection [Yu-He-Tan:BMAS’05]

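$$
G_{ap} =
\begin{bmatrix}
G_0 & & & & & & & & \\
A_1 g_1 & G_0 & & & & & & & \\
A_2 g_2 & & G_0 & & & & & & \\
\vdots & & & \ddots & & & & & \\
A_K g_K & & & & G_0 & & & & \\
 & A_1 g_1 & & & & G_0 & & & \\
 & A_2 g_2 & A_1 g_1 & & & & G_0 & & \\
 & & & \ddots & & & & \ddots & \\
 & & & & A_K g_K & & & & G_0
\end{bmatrix}
$$

Blank entries are zero blocks

$$ x_{ap} = \left[\, x^{(0)},\; x_1^{(1)},\; \ldots,\; x_K^{(1)},\; x_{1,1}^{(2)},\; \ldots,\; x_{K,K}^{(2)} \,\right]^T $$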
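The lower-triangular structure means each block row can be solved in turn with the same nominal matrix G0. The Python sketch below illustrates this for a single via in a two-node network at steady state (s = 0). The 2x2 network, the numeric values, and the helper names `solve2` and `matvec` are assumptions made for the illustration. The three solves correspond to the nominal row, the first-order row, and the second-order row of the augmented system.

```python
def solve2(M, b):
    """Solve a 2x2 system M x = b by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(b[0] * M[1][1] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - b[0] * M[1][0]) / det]


def matvec(M, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]


# Assumed nominal network: node 0 reaches the heat sink through conductance 2,
# and nodes 0 and 1 are joined by a dielectric conductance 1.
G0 = [[3.0, -1.0], [-1.0, 1.0]]
X = [[1.0, -1.0], [-1.0, 1.0]]       # via pattern between the two nodes
g_unit, A = 0.5, 0.2                 # assumed unit via conductance and density
dG = [[A * g_unit * v for v in row] for row in X]   # the block A_1 g_1
Bu = [0.0, 1.0]                      # 1 W injected at node 1

# Block rows of the lower-triangular system; every solve reuses G0.
x0 = solve2(G0, Bu)                                  # nominal response
x1 = solve2(G0, [-v for v in matvec(dG, x0)])        # first-order term
x2 = solve2(G0, [-v for v in matvec(dG, x1)])        # second-order term
approx = [a + b + c for a, b, c in zip(x0, x1, x2)]

# Reference: solve the perturbed system directly.
G_exact = [[G0[r][c] + dG[r][c] for c in range(2)] for r in range(2)]
exact = solve2(G_exact, Bu)
first_order_error = abs(x0[1] + x1[1] - exact[1])
second_order_error = abs(approx[1] - exact[1])
```

With these values the second-order expansion is about ten times closer to the direct solve than the first-order one, and it never factors the perturbed matrix.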

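Structured Projection (I)

Block-diagonally partition the projection matrix by the sizes of the nominal state variables, the first-order sensitivities, and the second-order sensitivities

$$
V =
\begin{bmatrix}
V_0 \\ V_1 \\ \vdots \\ V_K \\ V_{K+1} \\ \vdots \\ V_{K^2+K}
\end{bmatrix}
\;\rightarrow\;
\mathcal{V} = \mathrm{diag}\left( V_0,\; V_1,\; \ldots,\; V_K,\; V_{K+1},\; \ldots,\; V_{K^2+K} \right)
$$

Structured projection results in a reduced lower-triangular system in which the nominal values and the sensitivities can be solved independently

$$
\tilde{G}_{ap} =
\begin{bmatrix}
\tilde{G}_0 & & & & & & & & \\
A_1 \tilde{g}_1 & \tilde{G}_0 & & & & & & & \\
A_2 \tilde{g}_2 & & \tilde{G}_0 & & & & & & \\
\vdots & & & \ddots & & & & & \\
A_K \tilde{g}_K & & & & \tilde{G}_0 & & & & \\
 & A_1 \tilde{g}_1 & & & & \tilde{G}_0 & & & \\
 & A_2 \tilde{g}_2 & A_1 \tilde{g}_1 & & & & \tilde{G}_0 & & \\
 & & & \ddots & & & & \ddots & \\
 & & & & A_K \tilde{g}_K & & & & \tilde{G}_0
\end{bmatrix}
$$

Blank entries are zero blocks

$$ \tilde{x}_{ap} = \left[\, \tilde{x}^{(0)},\; \tilde{x}_1^{(1)},\; \ldots,\; \tilde{x}_K^{(1)},\; \tilde{x}_{1,1}^{(2)},\; \ldots,\; \tilde{x}_{K,K}^{(2)} \,\right]^T $$

$\tilde{C}_{ap}$ has a similar structure to $\tilde{G}_{ap}$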

Structured Projection (II)

The nominal response and the sensitivities can be solved separately and efficiently

The reduced model is sparse

Only one LU factorization is needed, that of the reduced diagonal block $\tilde{G}_0 + (1/h)\tilde{C}_0$

$$ \left( \tilde{G}_{ap} + \frac{1}{h}\tilde{C}_{ap} \right) \tilde{x}_{ap}(t) = \frac{1}{h}\tilde{C}_{ap}\,\tilde{x}_{ap}(t-h) + \tilde{B}_{ap}\,u(t), \qquad \tilde{y}_{ap}(t) = \tilde{L}_{ap}^T\,\tilde{x}_{ap}(t) $$

Generated sensitivities can be used in any gradient based optimization

Time-domain transient response can be solved using Backward-Euler method
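A minimal Python sketch of the Backward-Euler time stepping, assuming a two-node RC network with made-up values and hypothetical helper names (`inverse2`, `matvec`, `backward_euler`). The matrix G + C/h is inverted once before the loop, which stands in for the single LU factorization. A real implementation would keep sparse LU factors rather than an explicit inverse.

```python
def inverse2(M):
    """Inverse of a 2x2 matrix (stands in for a one-time LU factorization)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]


def matvec(M, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]


def backward_euler(G, C, power, h, n_steps):
    """Step (G + C/h) x(t) = (C/h) x(t - h) + u(t) from a zero initial rise.

    power is a function of time returning the input vector u(t).
    """
    lhs_inv = inverse2([[G[r][c] + C[r][c] / h for c in range(2)]
                        for r in range(2)])          # computed once, reused
    C_over_h = [[C[r][c] / h for c in range(2)] for r in range(2)]
    x = [0.0, 0.0]
    history = [x]
    for n in range(1, n_steps + 1):
        u = power(n * h)
        rhs = [a + b for a, b in zip(matvec(C_over_h, x), u)]
        x = matvec(lhs_inv, rhs)
        history.append(x)
    return history


# Assumed network: node 0 to heat sink (conductance 2), node 0 to node 1
# (conductance 1), unit thermal capacitance per node, 1 W step at node 1.
G = [[3.0, -1.0], [-1.0, 1.0]]
C = [[1.0, 0.0], [0.0, 1.0]]
history = backward_euler(G, C, power=lambda t: [0.0, 1.0], h=0.01, n_steps=2000)
final = history[-1]
```

For a step input the response rises monotonically toward the steady-state values of 0.5 and 1.5 for this network.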

Sequential Approximation of Objective Function

The objective function f(A) can be approximated by

a first-order expansion, giving sequential linear programming (SLP)

$$ f_{lp} = f_0 + \sum_{i=1}^{K} \frac{\partial f}{\partial A_i}\,\Delta A_i $$

a second-order expansion, giving sequential quadratic programming (SQP)

$$ f_{qp} = f_0 + \sum_{i=1}^{K} \frac{\partial f}{\partial A_i}\,\Delta A_i + \frac{1}{2} \sum_{i=1}^{K}\sum_{j=1}^{K} \frac{\partial^2 f}{\partial A_i\,\partial A_j}\,\Delta A_i\,\Delta A_j $$

Find ΔA to minimize f_lp or f_qp during each step

$$ f(A) = \sum_{k=1}^{K} \int_{t_s}^{t_e} \left[\,y_k(A,t) - T_{ceiling}\,\right] dt $$

The objective function becomes semi-definite when the integration is approximated by a discretized summation [Visweswariah:TCAD’00]

Sequential programming converges for convex programming problems, and still has good convergence in semi-definite problems
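One SLP step under the global and local via constraints can be sketched as below. This is an assumption-laden illustration rather than the authors' solver: it adds a per-variable step bound `trust` (not mentioned in the slides) to keep the linear model valid, and it exploits the fact that every A_i has unit weight in the global constraint, so the resulting linear program can be solved exactly by a greedy fill in order of the most negative gradient. All names and numbers are hypothetical.

```python
def slp_step(grad, A, A_local_max, A_total_max, trust):
    """Solve one SLP subproblem: minimize sum_i grad[i] * dA[i] subject to

        sum_i (A[i] + dA[i]) <= A_total_max      (global constraint)
        0 <= A[i] + dA[i] <= A_local_max[i]      (local constraints)
        |dA[i]| <= trust                         (assumed step bound)

    Assumes the current A is feasible.
    """
    n = len(A)
    lo = [max(-trust, 0.0 - a) for a in A]
    hi = [min(trust, m - a) for a, m in zip(A, A_local_max)]
    dA = lo[:]                                   # start from the lower bounds
    budget = A_total_max - sum(A) - sum(lo)      # slack in the global constraint
    for i in sorted(range(n), key=lambda k: grad[k]):
        if grad[i] >= 0 or budget <= 0:
            break                                # no further decrease possible
        inc = min(hi[i] - lo[i], budget)
        dA[i] += inc
        budget -= inc
    return dA


# Hypothetical gradient of the violation integral for three via locations.
grad = [-3.0, -1.0, -2.0]
A = [0.0, 0.0, 0.0]
dA = slp_step(grad, A, A_local_max=[1.0, 1.0, 1.0], A_total_max=1.5, trust=1.0)
A_new = [a + d for a, d in zip(A, dA)]
```

In a full flow this step would be repeated, recomputing the gradient from the macromodel after each update until the violation integral stops decreasing.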

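Sensitivity Calculation

first-order:

$$ \frac{\partial f}{\partial A_i} = \sum_{k=1}^{K} \int_{t_s}^{t_e} \frac{\partial y_k}{\partial A_i}\,dt = \sum_{k=1}^{K} \int_{t_s}^{t_e} L_k^T\,\frac{\partial x}{\partial A_i}\,dt = \sum_{k=1}^{K} \int_{t_s}^{t_e} L_k^T\,x_i^{(1)}\,dt $$

second-order:

$$ \frac{\partial^2 f}{\partial A_i\,\partial A_j} = \sum_{k=1}^{K} \int_{t_s}^{t_e} \frac{\partial^2 y_k}{\partial A_i\,\partial A_j}\,dt = \sum_{k=1}^{K} \int_{t_s}^{t_e} L_k^T\,\frac{\partial^2 x}{\partial A_i\,\partial A_j}\,dt = \sum_{k=1}^{K} \int_{t_s}^{t_e} L_k^T\,x_{i,j}^{(2)}\,dt $$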

Direct sensitivity calculation for the objective function

Structured and parameterized reduction provides an efficient calculation of both the nominal values and the sensitivities

The via density vector A can be efficiently updated during each iteration

The computation cost could be further reduced when an adjoint Lagrangian method is used to calculate sensitivity [Visweswariah:TCAD’00]
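The direct sensitivity calculation can be sketched on a single RC tile. The Python below advances the nominal temperature and its derivative with respect to via density in the same Backward-Euler loop, then accumulates the first-order sensitivity only over the time steps where the ceiling is violated. The single-tile model, every parameter value, and the choice to ignore via capacitance are assumptions for this illustration. The gradient is compared with a central finite difference as a sanity check.

```python
def violation_and_gradient(A, g0=1.0, g_via=0.5, c=1.0, power=2.0,
                           t_ceiling=1.0, h=0.01, n_steps=500):
    """Return (f, df/dA) for one tile with conductance g0 + A * g_via.

    Temperatures are rises above ambient. The sensitivity recursion is the
    derivative of the Backward-Euler update and reuses the same factor
    (g + c/h), mirroring the lower-triangular structure of the macromodel.
    """
    g = g0 + A * g_via
    lhs = g + c / h
    x = 0.0          # nominal temperature rise
    s = 0.0          # first-order sensitivity dx/dA
    f = 0.0
    df = 0.0
    for _ in range(n_steps):
        x = ((c / h) * x + power) / lhs
        s = ((c / h) * s - g_via * x) / lhs      # uses the updated x
        if x > t_ceiling:                        # inside a violation interval
            f += (x - t_ceiling) * h
            df += s * h
    return f, df


A0 = 0.4
f0, grad = violation_and_gradient(A0)

# Central finite difference as an independent check of the gradient.
eps = 1e-6
f_plus, _ = violation_and_gradient(A0 + eps)
f_minus, _ = violation_and_gradient(A0 - eps)
fd_grad = (f_plus - f_minus) / (2 * eps)
```

The gradient is negative, as expected: raising the via density lowers the violation integral. The finite-difference check needs two extra transient simulations per via, whereas the direct method gets the gradient from the same time-stepping pass.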

Experiment Settings

A modest 3D stack with one heat sink, two die layers, and two dielectric layers is assumed; each layer is extracted as an RC mesh, and the layers are interconnected by RC pairs that model the vias

Clock gating is assumed with a period of 250 ms

The reduction algorithm uses SIMO (single-input-multiple-output) reduction when the number of inputs is large

Our method (SP-MACRO) is compared with the steady-state solution

Accuracy of Reduced Macromodel

Transient temperature responses of the exact and SP-MACRO models at ports 3, 18, and 58 of the top layer with a step-response input

The responses of the macromodels are visually identical to those of the exact models

Optimization Profile by SQP

Temperature reduction at a selected location during via allocation by SQP

The allocated vias result in a transient temperature that meets the targeted ceiling temperature of 52 °C

Temperature Map

Temperature maps before and after the via allocation at the top layer

The maximum temperature before allocation is about 150 °C

The temperature after allocation meets the targeted ceiling temperature of 52 °C

Allocated-via and Runtime Comparison

Compared to the steady-state solution

SP-MACRO has smaller simulation and planning time as circuit size increases

It reduces the runtime by 126X

SP-MACRO predicts the required via insertion more accurately

It reduces the inserted via number by 2.04X

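Direct = steady-state by direct solution; SP-MACRO = transient by SP-MACRO

| Total/critical tiles | Total via constraint | Original/ceiling T (°C) | Direct: Solve-dc (s) | Direct: Solve-tran (s) | Direct: Allo-via | SP-MACRO: Redu-ckt (s) | SP-MACRO: Solve-sens (s) | SP-MACRO: QP/LP-plan (s) | SP-MACRO: Allo-via |
|---|---|---|---|---|---|---|---|---|---|
| 256/30 | 704 | 120/40 | 1.64 | 10.27 | 440 | 0.12 | 0.19 | 0.15 | 360 |
| 1024/60 | 2818 | 120/40 | 12.62 | 130.12 | 2281 | 1.08 | 0.96 | 0.42 | 1609 |
| 4096/80 | 5980 | 140/50 | 341.13 | 3872.98 | 5620 | 12.92 | 6.28 | 1.92 | 3217 |
| 8192/100 | 8218 | 140/50 | 7809.12 | NA | 8021 | 46.27 | 16.92 | 8.98 | 4382 |
| 16384/120 | 18000 | 160/60 | NA | NA | 17600 | 120.89 | 101.23 | 23.65 | 9280 |
| 32768/200 | 24000 | 160/60 | NA | NA | 23800 | 262.12 | 257.21 | 42.75 | 11660 |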
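The via-count reduction can be recomputed from the two Allo-via columns of the table. The short Python check below uses only numbers copied from the table; the variable names are illustrative.

```python
# (total/critical tiles, vias by direct steady-state solution, vias by SP-MACRO)
rows = [
    ("256/30", 440, 360),
    ("1024/60", 2281, 1609),
    ("4096/80", 5620, 3217),
    ("8192/100", 8021, 4382),
    ("16384/120", 17600, 9280),
    ("32768/200", 23800, 11660),
]

# Ratio of vias allocated by the steady-state flow to vias allocated by SP-MACRO.
ratios = {tiles: direct / macro for tiles, direct, macro in rows}
largest_case = ratios["32768/200"]

for tiles, ratio in ratios.items():
    print(f"{tiles:>10}: {ratio:.2f}X fewer vias with SP-MACRO")
```

The ratio grows with circuit size, and the 2.04X figure quoted in the slides corresponds to the largest case (32768 tiles).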

Conclusions

Via planning based on transient thermal analysis reduces the via number by 2.04X compared to planning based on steady-state thermal analysis

An efficient via planning algorithm is developed

Structured and parameterized model reduction provides both nominal values and sensitivities

Sequential linear/quadratic programming minimizes the thermal-violation integral

SP-MACRO is further extended for simultaneous power and thermal integrity driven via planning [Yu-Ho-He:ICCAD’06]
