02-thermalmodelmanag3dmpsocs-atienza [modo de …ecofac2010.irisa.fr/cours-atienza-3.pdf · prof....
Post on 25-Apr-2020
1 Views
Preview:
TRANSCRIPT
12/03/2010
1
Thermal Modeling and Management for 3D MPSoCs with Active Cooling
P f D id Ati Al
© ESL/EPFL 2010
Prof. David Atienza Alonso
ECOFAC 2010, Lannion, 29/03 – 2/04 2010
Embedded Systems Laboratory (ESL) Institute of EE, Faculty of Engineering
Advantages of 3D vs. 2D Chips
Promises • Reduce average length of on-chip global wires• Increase number of devices
reachable in given time budget
• Greatly facilitate heterogeneous integration (e g logic-DRAM stacks)
© ESL/EPFL 20102
(e.g. logic DRAM stacks)
Samsung Wafer Stack Package (WSP) memory
[Figures: Ray Yarema, Fermilab]
Latest chips increase power density
Non-uniform hot-spots in 2D chips [Sun, 1.8 GHz Sparc v9
Microproc] In 3D chips, heat affects several
layers! (even more “cool” components)
Thermal-Reliability Issues in 3D Chips
© ESL/EPFL 2010
[Sun, Niagara
BroadbandProcessor]
Courtesy:[IBM
and Irvine Sens.]
3
Latest chips increase power density
Non-uniform hot-spots in 2D chips [Sun, 1.8 GHz Sparc v9
Microproc] In 3D chips, heat affects several
layers! (even more “cool” components)
Thermal-Reliability Issues in 3D Chips
© ESL/EPFL 2010
[Sun, Niagara
BroadbandProcessor]
Courtesy:[IBM
and Irvine Sens.]
4
Higher chances of thermal wear-outs
and very short lifetimes!
12/03/2010
2
0 2000 4000 6000 8000 100000
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000 Layer 2
length (um)
wid
th (
um)
400
405
410
415
420
7000
8000
9000
10000 Layer 3
404
406
2nd Tier
Run-Time Heat Spreading in 3D Chips
5-tier 3D stack: 10 heat sources and sensors
Inject between 4W – 1.5W
© ESL/EPFL 2010
0 2000 4000 6000 8000 100000
1000
2000
3000
4000
5000
6000
7000
length (um)
wid
th (
um)
394
396
398
400
402
0 2000 4000 6000 8000 100000
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000 Layer 4
length (um)
wid
th (
um)
393
394
395
396
397
398
399
400
401
0 2000 4000 6000 8000 100000
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000 Layer 5
length (um)
wid
th (
um)
390
391
392
393
394
395
396
397
3rd Tier
4th Tier
5th Tier
Large and non-uniform heat propagation!
(up to 130º C on top tier)5
NanoTera CMOSAIC Project: Design of 3D MPSoCs with Advanced Cooling
3D systems require novel electro-thermal co-design• Academic partners: EPFL and ETHZ• Industrial: IBM Zürich
© ESL/EPFL 20106
NanoTera CMOSAIC Project: Design of 3D MPSoCs with Advanced Cooling
3D stacked MPSoC chips: microchannels etched on back side to circulate liquid coolant
S t
3D systems require novel electro-thermal co-design• Academic partners: EPFL and ETHZ• Industrial: IBM Zürich
© ESL/EPFL 2010
System Level Active
Cooling Manager
(3D heat flow
prediction)
adjustmentof coolant flux
task scheduling andexecution control
7
Outline
Introduction
3D chip thermal modeling framework
Validation of 3D thermal model
Liquid cooling modeling
Li id li d l lid ti
© ESL/EPFL 2010
Liquid cooling model validation
Close-loop 3D MPSoCs thermal management with active cooling
Experiments and conclusions
8
12/03/2010
3
Compact RC-Based Tier Thermal Model
q qb_right
qb_left
qb_top qb_back
Gate-level thermal model
RC Network of
Si/metal layer cells
qbTbq jf
ffjbj
0
6
1
© ESL/EPFL 20109
qb_bottomqb_front
qb_top = htopA(Ta-Ttop)
qb_bottom = hbottomA(Ta-Tbottom)
I•
(qbi)I
I+1•
(qbi)I+1
face i
I-1•
2D tier modeled as heat flux moving between adjacent cells
Convective boundary conditions between layers in tier
cells
[Atienza et al., TODAES 2007]
Multi-level execution for thermal convergence in 3D
• Local (2D-tier), liquid channels and global (3D) propagation
Evaluate local temperature for each cell
erg
ence
ns
Complete 3D Chip Thermal Modeling
© ESL/EPFL 2010
Tie
r-le
vel c
onve
N it
erat
ion
Feedbacktemperature Update with neighbour
temperature difusion
10
Go to next tier or microchannel
[Ayala et al., NanoNet 2009]
3D Chip Thermal Library Validation
Extensible set of layers in 3D stack• up to 9 tiers and heat spreader
• Pre-defined layers:
Silicon, copper (10 layers), glue, overmold, interposer, bump
Configurable nr. of cells and iterations per tier
© ESL/EPFL 2010
• Initially 10ms thermal interval (1000 iterat./tier)
Multi-tier test chip manufactured at EPFL:
11
3D Chip Thermal Library Validation
Extensible set of layers in 3D stack• up to 9 tiers and heat spreader
• Pre-defined layers:
Silicon, copper (10 layers), glue, overmold, interposer, bump
Configurable nr. of cells and iterations per tier
© ESL/EPFL 2010
• Initially 10ms thermal interval (1000 iterat./tier)
Multi-tier test chip manufactured at EPFL:• Three types of tiers
12
12/03/2010
4
3D Thermal Library Validation: Creating Various 3D Thermal Maps
Flexibility for thermal characterization
© ESL/EPFL 2010 13
Flexibility for thermal characterization
3D Thermal Library Validation: Creating Various 3D Thermal Maps
© ESL/EPFL 2010 14
3D Thermal Library Validation: Creating Various 3D Thermal Maps
Flexibility for thermal characterization
© ESL/EPFL 2010
10 heat sources and sensors per layer, accesible to be simultaneously activated
15
3D Thermal Library Validation: Correlation with 5-Tier 3D Stack
242526272829303132
0 200 400 600 800 1000 1200
Se
ns
or V
olt
ag
e (
mV
)
3D Chip, EPFL, Layer 3 characterizationBlue Curve: 3D current -heat model for D8Pink curve: Heater current measured in D8
Dev8
D7HD8S
© ESL/EPFL 2010
0 200 400 600 800 1000 1200
Heater Current (mA), applied to Dev 7
24
26
28
30
32
34
36
38
0 200 400 600 800 1000 1200 1400
Sen
sor
Vo
ltag
e (m
V)
Heater Current (mA), applied to Dev 7
3D Chip, EPFL, multi-tier characterizationBue/Pink Curve: D7 (tier 1) and D8 (tier 4)Red Curve: 3D current-heat model for D8
Dev6
Dev7
Div6_Iheat7
16
12/03/2010
5
3D Thermal Library Validation: Correlation with 5-Tier 3D Stack
242526272829303132
0 200 400 600 800 1000 1200
Se
ns
or V
olt
ag
e (
mV
)
3D Chip, EPFL, Layer 3 characterizationBlue Curve: 3D current -heat model for D8Pink curve: Heater current measured in D8
Dev8
D7HD8S
© ESL/EPFL 2010
0 200 400 600 800 1000 1200
Heater Current (mA), applied to Dev 7
24
26
28
30
32
34
36
38
0 200 400 600 800 1000 1200 1400
Sen
sor
Vo
ltag
e (m
V)
Heater Current (mA), applied to Dev 7
3D Chip, EPFL, multi-tier characterizationBue/Pink Curve: D7 (tier 1) and D8 (tier 4)Red Curve: 3D current-heat model for D8
Dev6
Dev7
Div6_Iheat7
17
Variations of less than 1.5% between 3D stack measurements and new 3D thermal model
Modeling Through Silicon Vias (TSVs) in 3D Stacks
TSVs:• Size: 5-10um x 10-100um• TSVs change resistivity of interlayer material (IM)
Figure: LSM-EPFL
© ESL/EPFL 2010
Modeling Granularities:1. Homogeneous
distribution, one R value for the IM
2. Different R value per unit (core, cache, etc.)
3. Exact locations of TSVs
• Higher accuracy• Higher complexity
18
Source: IBM Zürich and Y.Heights
TSV Modeling Accuracy in 3D Stacks
© ESL/EPFL 2010 19
Chosen to model TSV groups in localized positions of 3D MPSoCs
Liquid Flux Model for Laminar Flow
Local junction temperature modeled as RC network:
Rtot = Rcond + Rconv + Rheat
Rtot = 1/(Gsi/t + 1/Rb) + A/(bAt) + A/(VPcp)
Thermal resist. of Si
Chip back-side temperature
Heat source
Si b Heating Fl t d
© ESL/EPFL 2010
Thermal resistance of
wiring
temperature
T1
q1
q2
P1
P2
Si base thickness
Heating area Total area
Flow rate and density
Dependence of thermal resistance in liquid flux modeled as a quadratic form• Variable value of coolant flux (Φ)
ΔRheat ≈ aΦ + bΦ2 ; b << a
[Atienza et al., THERMINIC ’09 and DATE ‘10] 20
12/03/2010
6
3D Thermal Model with Liquid Cooling
New set of layers in 3D stack• 3D stack (up to 9 tiers)
• 1 microchannel and coolant flow per tier
5-tier stack with microchannels and manifold cooling seal manufactured at IBM/EPFL• Enables different multi-tier liquid flux injection
© ESL/EPFL 2010
PCB
Micro-HeaterLiquid
Micro-Channels
q j
21
[Courtesy: IBM Zürich Labs]
Front-side
Manufacturing of 5-Tier 3D Test Chip with Liquid Channels in Multiple Tiers
Back-side
© ESL/EPFL 2010 22
Adding multi-tier liquid cooling in-/out-lets
[Courtesy: IBM Zürich Labs]
Multi-tier active cooling technology feasible for
3D-stacked chips
Correlation Results: Liquid Cooling and 3D Heat Transfer
Temperature evolution at the junction (Tj)• Tested range: 0.015 to 0.15 L/min
• Similar accuracy results at different channels
q1q2 P
2
© ESL/EPFL 2010
Avg Max temp Error= 0.6%
T1 P1
23
q1q2 P
2Correlation Results:
Liquid Cooling and 3D Heat Transfer
Temperature evolution at the junction (Tj)• Tested range: 0.015 to 0.15 L/min
• Similar accuracy results at different channels
© ESL/EPFL 2010
T1 P1
Avg Max temp Error= 0.6%
Variations of less than 1% between measurements and RC-based 3D thermal model with liquid cooling
24
12/03/2010
7
Complete 3D Chip Thermal Modeling Flow with Liquid Cooling
Scheduler (Reactive Proactive)
Inputs: • Workload information
• Floorplan, TSV areas, package• Temperature (for dynamic policies) Power Manager (DPM)
Inputs: • Workload information
• Activity of cores
Inputs: • Power trace for each unit
© ESL/EPFL 2010
Scheduler (Reactive, Proactive)
3D Thermal Simulator w. Liquid Cooling based on EPFL-IBM 3D chips
(Integrated within internal HotSpot tool version)
Power trace for each unit• Floorplan, package and die properties
(Niagara-1), TSV area percentage/distribution• Flow rate
Transient Temperature Response for Each Unit
25
sniffer
sniffersniffer
iff
sniffer
I/O
SRAM SRAM
I/O
CPU
SRAM
CPU
CPUCPU
Multi-Proc. OS + DVFS + Task Migration
Sw app 1 ... Sw app NZeroZero--delaydelayMPSoCMPSoC
architecturearchitecturesimulationsimulation
Run-Time HW/SW Thermal Modeling Framework for 3D Chips
Exploitation of both hardware and software benefits
Energy of 2D components
© ESL/EPFL 2010
sniffer
sniffer
sniffersniffer
sniffersnifferMPSoC Behavior
Emulation on FPGA
Temp. (T) of 2D components
Software Thermal
Model
Host PC
si sisi
sisi
sisi
sisi
cu cucucucu
standardstandard Ethernet Ethernet connectionconnection & & dedicateddedicated HW monitorHW monitorDetailedDetailed
thermalthermalanalysisanalysis of of 2D 2D MPSoCMPSoC
layoutlayout
simulationsimulation p
26[D. Atienza et al., TODAES 2007]
sniffer
sniffersniffer
iff
sniffer
I/O
SRAM SRAM
I/O
CPU
SRAM
CPU
CPUCPU
Run-Time HW/SW Thermal Modeling Framework for 3D Chips
Multi-Proc. OS + DVFS + Task Migration
Sw app 1 ... Sw app N
Energy of 3D components
Exploitation of both hardware and software benefits
ZeroZero--delaydelayMPSoCMPSoC
architecturearchitecturesimulationsimulation
© ESL/EPFL 2010
3D StackThermal Model
sniffer
sniffer
sniffersniffer
sniffersnifferMPSoC Behavior
Emulation on FPGA
standardstandard Ethernet Ethernet connectionconnection & & dedicateddedicated HW monitorHW monitor
1st Tier
Nth Tier
Host PC
Temp. of3D components
componentssimulationsimulation
[D. Atienza, THERMINIC 2009] 27
Thermal Management for 3D-MPSoCs with Liquid Cooling
Active-Adapt3D: Combined policy manager (Best-Paper Award at IEEE/IFIP VLSI-SoC 2009)
• Predictive, floorplan-based task assignment and DVFS
• Close-loop variable liquid cooling control T ≥ 80°C Increment flow rate ; T < 80°C Decrement
P li b li d ti l ti l
© ESL/EPFL 2010
Policy can be applied reactively or proactively
28
ARMA‐Based
Predictor
Flow Rate Tuner
Thermal Sensors
Temperature Measurements
Temperature Forecast
System Temperature Dynamics
REACTIVE
PROACTIVE
12/03/2010
8
Adaptive Thermal-Aware Task Assignment Policy for 3D MPSoCs
Cores on layers closer to the heat sink can be cooled faster in comparison to cores further away
Adapt-3D assigns a thermal index ( ) to each core in order to distinguish the location of the cores• Higher Core more prone to hot spotsi
© ESL/EPFL 2010
Higher Core more prone to hot spotsi
321
For cores at locations 1, 2 and 3:
Chip‐1
Chip‐2 2
3
29[Coskun and Atienza, DATE ‘09]
Adaptive Thermal-Aware Task Assignment Policy for 3D MPSoCs
WPP tt 1
preferredavginitinc TTW if
1
Probability of receiving workload at time t:
Weight:
Cool core
© ESL/EPFL 2010
avgpreferredinit TTW
preferredavgiinitdec
preferredavgi
initinc
TTW
WW
if
Hot core
E.g., 80oC
For each core
Measured by sensors
Empirical constants
30
Target 3D systems based on 3D version Sun UltraSPARC T1 • Power values and workloads from real traces measured in Sun
platforms (multimedia players, web servers, databases, etc.)
Cores and caches in separate layers
Experiments 3D Thermal Management: 3D MPSoCs with Microchannels
© ESL/EPFL 2010
Channels:
Width 400um, Depth 250um.
Four flow rate settings, default
at 15ml/min.31
(EXP1-2) (EXP3) (EXP4)
Thermal Management for 3D Chips: Active-Adapt3D Comparisons
Predictive task scheduling, active cooling and floorplan-aware DVFS achieves less than 5% hotspots
© ESL/EPFL 2010
Promising figures for thermal control in 3D-MPSoCs32
12/03/2010
9
Thermal Management in 3D Chips: Active-Adapt3D Comparisons
Variable multi-tier flow control useful for 3D systems with 3+ layers.
Proactive thermal management achieves:• 75% reduction in spatial gradients on average -- for fixed flow rate
• 97% reduction in spatial gradients on average -- for variable flow rate
© ESL/EPFL 2010
*LC: Multi-tier variable liquid
cooling
33
Cooling power savings up to 67% to worst-case flux
Complexity of coming 3D MPSoC chips requires novelthermal modeling approaches• Application of simple RC-based methods demonstrated,
validated with 3D test chip
• Initial model of liquid cooling channels in 3D chips Simple RC laminar flow model, works well with variable liquid
fluxes (errors of less than 2%)
Conclusions
© ESL/EPFL 2010
fluxes (errors of less than 2%)
Integrated the compact model into custom HotSpot tool
New thermal management: feedback controller adjusts flow rate to allowed temperature with job assignment and DVFS• Proactive control improves the hot spot reduction to 95% for
systems with variable flow rates, and reduces thermal variations
• Dynamic flow rate adjustment is helpful in reducing the energy cost of the pump and overall system (67% power savings)
34
Key References and Bibliography
Thermal modeling and FPGA-based emulation• “Emulation-Based Transient Thermal Modeling of 2D/3D Systems-on-Chip with
Active Cooling”, David Atienza, Proc. of the 15th International Workshop on Thermal Investigations of ICs and Systems (THERMINIC ‘09), IEEE Press, Belgium, October, 2009.
Thermal management for 3D MPSoCs• “Energy-Efficient Variable-Flow Liquid Cooling in 3D Stacked Architectures”,
Ayse K. Coskun, David Atienza, et al., Proc. of Design, Automation and Test in
© ESL/EPFL 2010
y C , , , g ,Europe Conference (DATE ‘10), Dresden, Germany, March 2010,
• “Modeling and Dynamic Management of 3D Multicore Systems with Liquid Cooling”, Ayse K. Coskun, et al., Proc. of 17th Annual IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC ‘09), Brazil, October 2009.(Best Paper Award)
• “Through Silicon Via-Based Grid for Thermal Control in 3D Chips”, Jose L. Ayala, et al., Proc. of the 4th International ICST Conference on Nano-Networks (Nano-Net ‘09), Switzerland, October 2009.
• “Dynamic Thermal Management in 3D Multicore Architectures”, Ayse K. Coskun, et al., Proc. of Design, Automation and Test in Europe (DATE ‘09), France, April 2009.
Nano-Tera.ch Swiss Engineering Programme
© ESL/EPFL 2010
QUESTIONS ?
European Commission
Swiss National Science Foundation
top related