1
ISCA 2004 Tutorial
Thermal Issues for Temperature-Aware Computer Systems
Saturday, June 19th
8:00am - 5:00pm
© Mircea Stan, Kevin Skadron, David Brooks, 2002
2
Presenters:
Kevin Skadron, CS Department, University of Virginia
Mircea Stan, ECE Department, University of Virginia
David Brooks, CS Department, Harvard University
Antonio Gonzalez, UPC-Barcelona, and Intel Barcelona Research Center
Lev Finkelstein, Intel Haifa
3
Overview
1. Motivation (Kevin)
2. Thermal issues (Kevin)
3. Power modeling (David)
4. Thermal management (David)
5. Optimal DTM (Lev)
6. Clustering (Antonio)
7. Power distribution (David)
8. What current chips do (Lev)
9. HotSpot (Kevin)
5
Motivation
• Power consumption: first-order design constraint
– unconstrained power is a theoretical max
– peak (inst.) power is limiting power delivery (dI/dt)
– sustained power limits thermal design/packaging
– max sustained power: thermal “virus”
– same as thermal design power
– average active power and idle power limit mobile battery life, etc.
– Common fallacy: treating instantaneous power as if it were temperature
• Power density is increasing even faster: thermal effects become more problematic.
– Moore’s Law: exponential increase
– Need power/temperature-aware computing!
6
Power density
From PACT 2000 keynote; source: Intel website
But this curve is flattening
7
Power-aware figures of merit
• Power (P): battery time (mobile), packaging (high-performance)
• Energy (PD): battery life (mobile), fundamental limits (kT)
• Energy-delay (PD^2): performance and low power
• Energy-delay^2 (PD^3): emphasis on performance
Power-aware ≠ low power
Similar to “old” VLSI complexity (A, AD, AD^2)
None of these are appropriate for thermal
Refs:
R. Gonzalez et al., “Supply and threshold voltage scaling for low power CMOS”, JSSC, Aug. 1997
A. Martin et al., “Design of an Asynchronous MIPS R3000”, ARVLSI’97
J. Ullman, “Computational Aspects of VLSI”, CS Press, 1984
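The following is a minimal sketch of how the four figures of merit can rank the same two designs differently. The two design points and their numbers are invented purely for illustration, and D is assumed to be delay in seconds.

```python
def figures_of_merit(power_w, delay_s):
    """Return the four metrics from the slide for one design point."""
    return {
        "P": power_w,                    # power
        "PD": power_w * delay_s,         # energy
        "PD^2": power_w * delay_s ** 2,  # energy-delay
        "PD^3": power_w * delay_s ** 3,  # energy-delay^2
    }

# Two hypothetical design points (values are assumptions, not measurements).
fast = figures_of_merit(power_w=80.0, delay_s=1.0)
slow = figures_of_merit(power_w=30.0, delay_s=1.5)

for name in ("P", "PD", "PD^2", "PD^3"):
    winner = "fast" if fast[name] < slow[name] else "slow"
    print(f"{name}: fast={fast[name]:.2f} slow={slow[name]:.2f} -> {winner} design is better")
```

With these made-up numbers the slower design is better on P, PD, and PD^2, while the faster design is better on PD^3, which is the sense in which the higher-order metrics put more emphasis on performance.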
8
Cooking-aware computing
Boiling water will come soon
9
Power and temperature are BAD
• and can be EVIL
Source: Tom’s Hardware Guide, http://www6.tomshardware.com/cpu/01q3/010917/heatvideo-01.html
11
Thermal issues
Temperature affects:
• Circuit performance
• Circuit power (leakage)
• IC reliability
• IC and system packaging cost
• Environment
12
Performance and leakage
Temperature affects:
• Transistor threshold and mobility
• Subthreshold leakage, gate leakage
• Ion, Ioff, Igate, delay
• ITRS: 85°C for high-performance, 110°C for embedded!
[Figure: NMOS Ion and Ioff]
13
Temperature-aware circuits
• Robustness constraint: sets Ion/Ioff ratio
• Robustness and reliability: Ion/Igate ratio
Idea: keep ratios constant with T: trade leakage for performance!
Refs:
Ghoshal et al., “Refrigeration Technologies…”, ISSCC 2000
Garrett et al., “T3…”, ISCAS 2001
14
Resulting performance
25% - 30% extra performance (110°C to 0°C)
[Figure legend: regular vs. TAC]
15
Reliability
The Arrhenius Equation: MTF = A · exp(Ea / (K · T))
• MTF: mean time to failure at T
• A: empirical constant
• Ea: activation energy
• K: Boltzmann’s constant
• T: absolute temperature
Failure mechanisms:
• Die metallization (corrosion, electromigration, contact spiking)
• Oxide (charge trapping, gate oxide breakdown, hot electrons)
• Device (ionic contamination, second breakdown, surface-charge)
• Die attach (fracture, thermal breakdown, adhesion fatigue)
• Interconnect (wirebond failure, flip-chip joint failure)
• Package (cracking, whisker and dendritic growth, lid seal failure)
Most of the above increase with T (Arrhenius)
Notable exception: hot electrons are worse at low temperatures
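To make the temperature sensitivity of the Arrhenius equation concrete, here is a minimal sketch that compares mean time to failure at two temperatures. The activation energy of 0.7 eV is an assumed illustrative value (real values depend on the failure mechanism), and the function names are hypothetical. The empirical constant A cancels in the ratio.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann's constant K in eV/K

def mtf(temp_c, ea_ev, a=1.0):
    """MTF = A * exp(Ea / (K * T)), with T converted to absolute temperature."""
    temp_k = temp_c + 273.15
    return a * math.exp(ea_ev / (BOLTZMANN_EV_PER_K * temp_k))

def mtf_ratio(cool_c, hot_c, ea_ev):
    """How many times longer the part is predicted to last at cool_c than at hot_c."""
    return mtf(cool_c, ea_ev) / mtf(hot_c, ea_ev)

# Assumed activation energy; 85°C and 110°C are the ITRS limits quoted earlier.
ratio = mtf_ratio(cool_c=85.0, hot_c=110.0, ea_ev=0.7)
print(f"Predicted MTF at 85°C is {ratio:.1f}x the MTF at 110°C")
```

Under these assumptions a 25°C reduction changes the predicted lifetime by a factor of several, which is why the equation, contested or not, drives so much attention to operating temperature.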
16
Arrhenius or Erroneous?
“Hot” issue in thermal community: is the Arrhenius equation correct/relevant?
C. Lasance (Philips): “Erroneous” equation
• Claim: what really matters are thermal gradients in space and time, thermal cycling
• Will not solve the dispute here!
• Agreement: thermal issues are key for reliability, whether static or dynamic
Another famous quote: “We have a headache with Arrhenius” (T. Okada, Sony, when asked about reliability prediction methods)
17
Packaging cost
From Cray (local power generator and refrigeration)…
Source: Gordon Bell, “A Seymour Cray perspective”, http://www.research.microsoft.com/users/gbell/craytalk/
18
Packaging cost
To today…
• Grid computing: power plants co-located near compute farms
• IBM S/390: refrigeration
Source: R. R. Schmidt, B. D. Notohardjono, “High-end server low temperature cooling”, IBM Journal of R&D
19
IBM S/390 refrigeration
• Complex and expensive
Source: R. R. Schmidt, B. D. Notohardjono, “High-end server low temperature cooling”, IBM Journal of R&D
20
IBM S/390 processor packaging
Processor subassembly: complex!
C4: Controlled Collapse Chip Connection (flip-chip)
Source: R. R. Schmidt, B. D. Notohardjono, “High-end server low temperature cooling”, IBM Journal of R&D
21
Intel Itanium packaging
Complex and expensive (note heatpipe)
Source: H. Xie et al., “Packaging the Itanium Microprocessor”, Electronic Components and Technology Conference 2002
22
P4 packaging
• Simpler, but still…
Source: Intel web site
23
Environment
• Environmental Protection Agency (EPA): computers account for 10% of commercial electricity consumption
– This incl. peripherals, possibly also manufacturing
– A DOE report suggested this percentage is much lower
– No consensus, but it’s still a lot
• Equivalent power (with only 30% efficiency) for AC
• CFCs used for refrigeration
• Lap burn
• Fan noise
24
Heat mechanisms
• Conduction
• Convection
• Radiation
• Phase change
• Heat storage
25
Conduction
• Similar to electrical conduction (e.g. metals are good conductors)
• Heat flow from high energy to low energy
• Microscopic (vibration, adjacent molecules, electron transport)
• No major displacement of molecules
• Need a material: typically in solids (fluids: distance between molecules)
• Typical example: thermal “slug”, spreader, heatsink
Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001
26
Conduction
Different materials (not a strong function of temperature)
Si: more variation
Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001
27
Convection
• Macroscopic (bulk transport, mix of hot and cold, energy storage)
• Need material (typically in fluids, liquid, gas)
• Natural vs. forced (gas or liquid)
• Typical example: heatsink (fan), liquid cooling
Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001
28
Radiation
• Electromagnetic waves (can occur in vacuum)
• Negligible in typical applications
• Sometimes the only mechanism (e.g. in space)
Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001
29
Surface-to-surface contacts
• Not negligible, heat crowding
• Thermal greases (can “pump-out”)
• Phase Change Films (undergo a transition from solid to semi-solid with the application of heat)
Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001
30
Phase-change
Thermal solutions evolution:
• Natural air cooling
• Forced-air cooling
• Liquid cooling
• Phase change (e.g. heat pipe)
• Refrigeration
Phase change:
a. Solid changing to a liquid: fusion, or melting
b. Liquid changing to a vapor: evaporation, also boiling
c. Vapor changing to a liquid: condensation
d. Liquid changing to a solid: crystallization, or freezing
e. Solid changing to a vapor: sublimation
f. Vapor changing to a solid: deposition
31
Thermal capacitance
• Example:
ρ(Aluminum) = 2,710 kg/m³
Cp(Aluminum) = 875 J/(kg·°C)
V = t·A = 0.000025 m³
Cbulk = V·Cp·ρ = 59.28 J/°C
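The arithmetic in this example can be checked with a few lines of code. This is a minimal sketch using the aluminum values quoted on the slide; the function and parameter names are hypothetical.

```python
def bulk_thermal_capacitance(volume_m3, specific_heat_j_per_kg_c, density_kg_per_m3):
    """Cbulk = V * Cp * rho, in J/°C."""
    return volume_m3 * specific_heat_j_per_kg_c * density_kg_per_m3

# Values from the slide: V = t*A, Cp and density of aluminum.
c_bulk = bulk_thermal_capacitance(
    volume_m3=0.000025,
    specific_heat_j_per_kg_c=875.0,
    density_kg_per_m3=2710.0,
)
print(f"Cbulk = {c_bulk:.2f} J/°C")  # Cbulk = 59.28 J/°C
```

In other words, this block of aluminum absorbs about 59 J for every degree Celsius it warms, which is what lets a heat spreader or heatsink smooth out short power bursts.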
32
Refrigeration
“Conventional” vs. thermo-electric (TEC)
• Can get T < T_amb (“negative” Rth!)
• TEC: Peltier effect (can use for local cooling)
33
TEC electro-thermal model
34
Simplistic steady-state model
All thermal transfer: R = k/A (resistance is inversely proportional to the area A)
Power density matters!
Ohm’s law for thermals (steady-state)
V = I · R -> T = P · R
T_hot = P · Rth + T_amb
Ways to reduce T_hot:
- reduce P (power-aware)
- reduce Rth (packaging)
- reduce T_amb (Alaska?)
- maybe also take advantage of transients (Cth)
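The steady-state relation and the first three ways to reduce T_hot can be sketched as follows. All numeric values (50 W, 0.8 °C/W, 45°C ambient) are assumptions chosen only to make the comparison easy to read.

```python
def steady_state_temp(power_w, rth_c_per_w, t_amb_c):
    """T_hot = P * Rth + T_amb (thermal analogue of Ohm's law)."""
    return power_w * rth_c_per_w + t_amb_c

# Hypothetical baseline operating point.
baseline = steady_state_temp(power_w=50.0, rth_c_per_w=0.8, t_amb_c=45.0)

# The three steady-state knobs listed above, each changed in isolation.
lower_power = steady_state_temp(power_w=40.0, rth_c_per_w=0.8, t_amb_c=45.0)
better_package = steady_state_temp(power_w=50.0, rth_c_per_w=0.6, t_amb_c=45.0)
cooler_ambient = steady_state_temp(power_w=50.0, rth_c_per_w=0.8, t_amb_c=35.0)

print(f"baseline:       {baseline:.1f} °C")
print(f"reduce P:       {lower_power:.1f} °C")
print(f"reduce Rth:     {better_package:.1f} °C")
print(f"reduce T_amb:   {cooler_ambient:.1f} °C")
```

The fourth option, exploiting transients through Cth, cannot be captured by a steady-state formula and needs a time-stepped model instead.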
35
Simplistic dynamic thermal model
Electrical-thermal duality
• V ↔ temp (T)
• I ↔ power (P)
• R ↔ thermal resistance (Rth)
• C ↔ thermal capacitance (Cth)
• RC ↔ time constant
KCL
• differential eq.: I = C · dV/dt + V/R
• difference eq.: ΔV = (I/C) · Δt − (V/(R·C)) · Δt
• thermal domain: ΔT = (P/Cth) · Δt − (T/(Rth·Cth)) · Δt, where T = T_hot – T_amb
One can compute stepwise changes in temperature for any granularity at which one can get P, T, R, C
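A stepwise computation of this kind might look like the sketch below. It starts from the KCL equation I = C · dV/dt + V/R, solves for dV/dt, and applies a forward-Euler step in the thermal domain, so the heat-loss term through Rth enters with a minus sign. The single-node lumped model, the function name, and all numeric values are assumptions for illustration.

```python
def simulate_temperature(power_trace_w, rth, cth, t_amb, dt, t_start=None):
    """Step a single-node thermal RC model through a power trace.

    power_trace_w: one power sample (W) per time step of length dt (s)
    rth: thermal resistance (°C/W); cth: thermal capacitance (J/°C)
    Returns the temperature (°C) after each step.
    """
    temp = t_amb if t_start is None else t_start
    history = []
    for power in power_trace_w:
        rise = temp - t_amb  # T = T_hot - T_amb
        # Heat added by the power source minus heat leaving through Rth.
        temp += (power / cth) * dt - (rise / (rth * cth)) * dt
        history.append(temp)
    return history

# Hypothetical values: 50 W constant load, Rth = 0.8 °C/W, Cth = 60 J/°C.
# The time constant is Rth * Cth = 48 s, so dt = 0.1 s is a small step.
trace = [50.0] * 10000  # 1000 s of constant power
temps = simulate_temperature(trace, rth=0.8, cth=60.0, t_amb=45.0, dt=0.1)
print(f"after 48 s:   {temps[479]:.1f} °C")
print(f"after 1000 s: {temps[-1]:.1f} °C")
```

With constant power the trace settles at the steady-state value P · Rth + T_amb, and the step size only needs to be small relative to the RC time constant for the update to remain stable.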
36
Combined package model
Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001
Steady-state
Tj – junction temperature
Tc – case temperature
Ts – heatsink temperature
Ta – ambient temperature
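As a rough illustration of how the four temperatures relate in steady state, the sketch below assumes a single series heat path from junction to case to heatsink to ambient. The resistance names (r_jc, r_cs, r_sa) and all numeric values are assumptions, not taken from the source figure.

```python
def package_temperatures(power_w, r_jc, r_cs, r_sa, t_ambient_c):
    """Walk a series thermal-resistance chain from ambient up to the junction.

    r_jc: junction-to-case, r_cs: case-to-heatsink, r_sa: heatsink-to-ambient,
    all in °C/W (hypothetical names for the three stages).
    """
    t_sink = t_ambient_c + power_w * r_sa
    t_case = t_sink + power_w * r_cs
    t_junction = t_case + power_w * r_jc
    return {"Ta": t_ambient_c, "Ts": t_sink, "Tc": t_case, "Tj": t_junction}

# Hypothetical package: 50 W dissipated, 45°C ambient.
temps_pkg = package_temperatures(power_w=50.0, r_jc=0.3, r_cs=0.1, r_sa=0.4, t_ambient_c=45.0)
for name in ("Ta", "Ts", "Tc", "Tj"):
    print(f"{name} = {temps_pkg[name]:.1f} °C")
```

Because the same power flows through every stage, the largest resistance in the chain contributes the largest temperature drop, which shows where a packaging improvement would help most.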
37
Itanium package model
Example: processor + 4 cache modules
Source: H. Xie et al., “Packaging the Itanium Microprocessor”, Electronic Components and Technology Conference 2002
38
Thermal issues summary
• Performance, power, reliability
• Architecture-level: conduction only
• Convection: too complicated
• Radiation: can be ignored
• Use compact models for package
• Power density is key