scaling, power and the future of cmos - semantic scholar › ecb5 › b230c47f5109aafce... ·...
Post on 27-Jun-2020
0 Views
Preview:
TRANSCRIPT
Scaling, Power
and the Future of CMOS
Mark Horowitz, Elad Alon, Dinesh Patil, Stanford
Samuel Naffziger, Rajesh Kumar, Intel
Kerry Bernstein, IBM
Slide 2
A Long Time Ago
In a building far away
A man made a prediction
On surprisingly little data
That has defined an industry
Slide 3
Moore’s Law
Slide 4
CMOS Computer Performance
1.00
10.00
100.00
1000.00
10000.00
85 87 89 91 93 95 97 99 01 03 05 07
intel 386
intel 486intel pentium
intel pentium 2
intel pentium 3
intel pentium 4intel itanium
Alpha 21064
Alpha 21164
Alpha 21264Sparc
SuperSparc
Sparc64Mips
HP PA
Power PC
AMD K6AMD K7
AMD x86-64
Slide 5
Moore’s Original Issues
• Design cost
• Power dissipation
• What to do with all the functionality possible
ftp://download.intel.com/research/silicon/moorespaper.pdf
Slide 6
Outline
• How designers will deal with poor power scaling
• Origins of the power problem
• An optimization perspective
• Low power circuits and architectures
• Cost of variability
• Future scenarios
• What device characteristics matter (to me)
Slide 7
The 80’s Power Problem
• Until mid 80s technology was mixed
• nMOS, bipolar, some CMOS
• Supply voltage was not scaling / power was rising
• nMOS, bipolar gates dissipate static power
From Roger Schmidt, IBM Corp
Slide 8
Solution: Move to CMOS
• And then scale Vdd
0.1
1
10
Jan-85 Jan-88 Jan-91 Jan-94 Jan-97 Jan-00 Jan-03
Feat Size (um)
Vdd
Slide 9
Scaling MOS Devices
• In this ideal scaling• V scales to αV, L scales to αL
• So C scales to αC, i scales to αi (i/µ is stable)
• Delay = CV/I scales as α
• Energy = CV2 scales as α3
JSSC Oct 74, pg 256
Slide 10
Processor Power
• Continued to grow, even when Vdd was scaled
1
10
100
85 87 89 91 93 95 97 99 01 03
Slide 11
Why Power Increased
• Growing die size, fast frequency scaling
Clock Frequency (MHz)
10
100
1000
10000
85 87 89 91 93 95 97 99 01 03 05
Slide 12
Good News
• Die growth & super frequency scaling have stopped
Cycle in FO4
10
100
85 87 89 91 93 95 97 99 01 03 05
Slide 13
Processor Power
• They were high power too
1
10
100
85 87 89 91 93 95 97 99 01 03
Slide 14
Bad News
• Voltage scaling has
stopped as well
• kT/q does not scale
• Vth scaling has
power consequences
• If Vdd does not scale
• Energy scales slowlyEd Nowak, IBM
Slide 15
Energy – Performance Space
• Every design is a point on a 2-D plane
Performance
Energy
Slide 16
Energy – Performance Space
• Every design is a point on a 2-D plane
Performance
Energy
Slide 17
Energy – Performance Space
• Every design is a point on a 2-D plane
Performance
Energy
Slide 18
Trade-offs for an Adder
101
102
103
104
105
Stat.Carr.Chain Stat.Carr.Sel
Stat. KS
Dom. BK
Dom. KS
Dom. LF
Stat. LF
Stat. BK
Stat. 84421
Slide 19
( )
*dd dd
dd
dd
dd V V
EV
Sens VDV
=
∂∂
= −∂
∂
Key Observation:
• Define the Energy/Delay sensitivity of parameter
• For example Vdd:
• At optimal point, all sensitivities should be the same
• Must equal the slope of the Pareto optimal curve
Slide 20
What This Means
• Vdd and Vth are not directly set by scaling
• Instead set by slope of Pareto optimal curve
• Leakage rose to lower total system power!
101
102
103
104
105
[Vdd=0.68, Vt
N=0.31]
[Vdd=1 Vt
N=0.30]
[Vdd=1.2 Vt
N=0.26]
[Vdd=1.3 Vt
N=0.22]
Slide 21
Low Power Design Techniques
Three main classes of methods to reduce energy:
• Cheating
• Reducing the performance of the design
• Reducing waste
• Stop using energy for stuff that does not produce results
• Stop waiting for stuff that you don’t need (parallelism)
• Problem reformulation
• Reduce work (less energy and less delay)
Slide 22
Cheating
• Many low-power papers talk only about energy
• Don’t consider performance
• Reducing performance can always reduce energy
• But there are many ways to reduce performance
• Good technique must lower the optimal curve
• “Sensitivity” of technique
• Must be better than current curve
• This depends on location on the curve
Slide 23
Reducing Energy Waste
• Clock gating
• If a section is idle, remove clock
• Removes clock power
• Prevents any internal node from transitioning
• Create system power states
• Turn on subsystems only when they are needed
• Can have different “off” states
• Power vs. wakeup time
• Disk (do you stop it from spinning?)
Slide 24
Embedded Power Gating
• Can reduce leakage• 250x reported
• But costs• Performance
• Drop in Vdd, Gnd
Embedded
Power
Switches
Rows of
Standard
Cells
Power Switch
Control Signals
• Since transistors still leak when power is off
Royannez, et al, 90nm Low Leakage SoC Design Techniques
for Wireless Applications, ISSCC 2005
Slide 25
Range of Applicability
• Power supply gating
• Done to remove leakage power
• But slows down the circuit
• Adds series resistance to the supply
Performance
Energy/op
Makes circuit worse
when energy
sensitivity is high
Slide 26
Parallelism
• If the application has data parallelism
• Parallelism is a way to improve performance
• With low additional energy cost
Slide 27
Existing Processors
0.01
0.1
1
1 10 100 1000Spec2000*L
Watts/(Spec*L*Vdd2)
10 processors
working in parallel
Slide 28
Parallel Server Chip
• Power 5 from IBM
Slide 29
Problem Reformulation
• Best way to save energy is to do less work
• Energy directly reduced by the reduction in work
• But required time for the function decreases as well
• Convert this into extra power gains
• Shifts the optimal curve down and to the right
User Performance
Energy/op
Slide 30
Cost of Variation
• Variability changes position of the optimal curves
• Need to margin Vth, Vdd to ensure circuit always works
10-1
100
10-1
100
101
Performance
Energy
∆ Vth = 0 mV
∆ Vth = 120 mV
Slide 31
Partial Compensation
• Adjust Vdd after you get part back
• Compensates very well for small deviations in Vth
10-1
100
10-1
100
101
Performance
Energy
∆ Vth = 0 mV
∆ Vth = 120 mV
Slide 32
Reducing Voltage Margins
• At test time determine Vdd for that part
• Have private DC-DC converter already
20%
Slide 33
Variable Application Demands
• Try to provide a couple of operating points
• Application can control speed and energy
• Hard question is what are valid Vdd, F pairs
• Usually determined during test
• Dynamic voltage scaling
• Intel Speed Step in laptop processors
• 2 performance/power points
• Transmeta Long Run Technology
• Many operating points. Test data + formula
Slide 34
Constant Power Scaling
• Foxton controller on next-gen. Itanium II
• Raises Vdd/boosts F when most units idle
• Lowers Vdd for parallel code to stay in budget
Slide 35
Self Checking Hardware
• Razor (Austin/Blaauw, U of Mich)
• Use the actual hardware to check for errors
• Latch the input data twice
• Once on the clock edge, and then a little later
• If the data is not the same, you are going too fast
Din 0
clk_del
Q[0]Din 1Din 2
error
clk
Error_L
comparator
RAZOR FF
Main
Flip-Flop
Shadow
Latch
01
Slide 36
With Error Recovery
• Can run the chip so it makes some errors
• Chip gets right answer 99.9% of the time
• 0.1% of the time, the chip must rerun operation
1.4 1.5 1.6 1.7 1.8
1.4
1.5
1.6
1.7
1.8 Chips
Linear Fit
y=0.78685x + 0.22117
Voltage at First FailureVoltage at 0.1%Error Rate
Slide 37
Adjusting Vth
• In theory want to adjust Vth too
• Very hard to do with modern transistors
0.9 0.95 1 1.05 1.1 1.15 1.2 1.25 1.35
10
1520
2530
3540
4550
5560
6570
Leakage vs Vdd with Different Body Bias
Vbb=0.8
Vbb=0.7
Vbb=0.6
Vbb=0.5
Vbb=0.4
Vbb=0.3
Vbb=0.2
Vbb=0.1
Vbb=0
Vdd (V)
Leakage (
A)
Slide 38
Future Systems
• Some simple math
• Assume scaling continues
• Dies don’t shrink in size
• Average power/gate must decrease by 2x / generation
• Since gates are shrinking in size
• Get 1.4x from capacitive reduction
• Where is the other factor of 1.4x ?
Slide 39
Exploit Parallelism / Scale Vdd
• If you have parallelism
• Add more function units
• Fill up new die (2x)
• Lower energy/op
• ∆E/∆P will decrease
• Vdd, sizes, etc will reduce
• Build simpler architectures
• Works well when ∆E/∆P is large
• Per unit performance decrease is small
Performance
Energy/op
Slide 40
Exploit Specialization
• Optimize execution units for different applications
• Reformulate the hardware to reduce needed work
• Can improve energy efficiency for a class of applications
• Stream / Vector processing is a current example
• Exploit locality, reuse
• High compute densityµ-controller
Clusters
SRFMem
ory System
HISC NI
Bill Dally et al, Stanford
Imagine
Slide 41
Exploit Integration
• If both those techniques don’t work
• Still can increase integration by at least 1.4x
• Moving units onto one chip
• Reduces the number of I/Os on system
• I/O can take significant power today
• Allows even larger integration
Slide 42
TI - OMAP2420
• Specialization
• And power domains• Most units are off
• OMAP 2420 • 5 Power Domains
• #1: MCU Core
• #2: DSP Core
• #3: Graphic Accelerator
• #4: Core + Periph.
• #5: Always On logicRoyannez, et al, 90nm Low Leakage SoC Design Techniques
for Wireless Applications, ISSCC 2005
Slide 43
Low-Power PowerPC
Nowka et al., Low-
power PowerPC,
ISSCC
Slide 44
What All This Means
• As long as $/function and cap continue to scale
• Moving to the new technology will be profitable
• And will allow designs to be better systems
• In the worst case, active die area will decrease
• Scale gates by the decrease in gate capacitance
• In most cases, we will do much better
• But how to optimize devices in this new domain?
Slide 45
Radical Idea:
• Scaling channel length may no longer be critical
• I still want small (i.e. dense) devices
• But I also want lower variations & external control of Vth
• Longer Leff may actually improve energy efficiency
• Less variability � lower energy penalty
• Especially as move to lower performance (parallelism)
10-1
100
100
101
Performance
Energy
120 mV110 mV
55 mV60 mV
Lnom
Lnom
+ 10%
Slide 46
Conclusions
• Unfortunately power is an old problem
• Magic bullets have mostly been spent
• Power will be addressed by application-level optimization, parallelism/specialized functional units, and more adaptive control
• Need to rethink scaling
• Still makes things cheaper
• But what do we want from scaled transistors?
Slide 47
1.5
Technology Scaling
Seems simple,
• Every 1 years
• Number of transistors double
• Transistors get faster
• Gates become lower power (CMOS)
• Life just gets better and better
2
Slide 48
Reality is a Little Different
• While scaling has been smooth
• Almost nothing else has been
• Device and circuit technology has changed
• DTL, ECL, TTL, pMOS, nMOS, CMOS
• Power periodically becomes a critical issue
• It is critical again
Slide 49
nMOS, TTL, ECL Were King
• 1978 – Started in VLSI
• First design was bipolar/ECL
• 3µm nMOS was hot
Intel 8086 Intel 286
DEC µVaxBIT Sparc HP Focus
Slide 50
MOS Scaling Was Understood
• MOS devices operate on electric fields
• If E fields are the same
• Relation between E and J is the same
• So if all voltages and lengths scale
• iV curve retains the same shape, scaled in V
Bob Dennard worked all the math in 74JSSC Oct 74, pg 256
Slide 51
Dilemma
• Processors today are power limited
• As are many other chips
• Technology scaling will not save us
• With Vdd fixed, energy scaling will be modest
• How does one build more powerful processors?
• Or other types of chips
When constrained, optimize!
Slide 52
Optimizing the Right Thing
• Given systems are power limited
• Highest performance system is not interesting
• Will dissipate too much power
• Lowest energy solution is also not interesting
• Will not have enough performance
• Want constrained optimization
• Highest performance for 20 Watts
• Lowest power for 100 SPEC
Slide 53
Leakage Trends
Gate Length (um)
Active Power Density
Subthreshold Power Density
Slide 54
Performance
Energy/op
Design Parameters To Adjust
• Circuit
(sizing, supply, threshold)
• Circuit topology
(adder: CLA, CSA, …)
• Logic style
(domino, pass-gate, …)
• Micro-architecture
(pipelining, cache design, branch architecture, etc)
Slide 55
Energy Efficient Designs
• Are on the Pareto optimal curve
• On this curve design parameters are constrained
Performance
Energy/op
infeasible
wasting energy
Slide 56
Leakage Energy
• Matching marginal costs for Vdd and Vth
10-2
10-1
0
0.1
0.2
0.3
0.4
0.5
Activity Factor
Leakage Ratio
10-2
10-1
0
0.2
0.4
0.6
0.8
1
1.2
Voltage
Vdd
Vth
Leakage Ratio
Slide 57
Measured Leakage Data
0
0.1
0.2
0.3
0.4
0.5
Leakage R
atio
Slide 58
IBM Cell Processor
Slide 59
Vth Variation
• Since leakage is exponential on Vth
• Average Vth for leakage is not the expected Vth
5 10 15 20 25 30 35 400
0.2
0.4
0.6
0.8
1
Relative Leakage
Cumulative Probability
0.2 0.3 0.4 0.50
0.5
1
1.5
2
2.5
Vth
Relative Leakage Contribution
Leakage
Vth
Slide 60
How Else to Save Energy?
• Running faster than needed wastes energy
• Forces you to run higher on performance curve
• Why do you run faster than needed?
• Need margins to account for variability
• From application, environment, or technology
Variations cause waste
Slide 61
Dynamic Voltage Scaling
Burd et al ISSCC 2000
Slide 62
Dynamic Voltage Scaling
• Dynamic voltage scaling
• Adjusts Vdd to the “right” value for desired performance
• Big problem is how to find the “right” Vdd
• Need to know the relationship between Vdd and F
• Need to have a circuit that matches the critical path
• How do you do this with variations?
top related