Scaling, Power, and the Future of CMOS

Mark Horowitz, Elad Alon, Dinesh Patil, Stanford

Samuel Naffziger, Rajesh Kumar, Intel

Kerry Bernstein, IBM

A Long Time Ago

In a building far away

A man made a prediction

On surprisingly little data

That has defined an industry

Moore’s Law

CMOS Computer Performance

[Figure: processor performance on a log scale (100 to 10,000) vs. year, 1985 to 2007. Processors plotted: Intel 386, 486, Pentium, Pentium 2, Pentium 3, Pentium 4, Itanium; Alpha 21064, 21164, 21264; Sparc, SuperSparc, Sparc64; Mips; Power PC; AMD K6, K7, x86-64]

Moore’s Original Issues

• Design cost

• Power dissipation

• What to do with all the functionality possible

ftp://download.intel.com/research/silicon/moorespaper.pdf

Outline

• How designers will deal with poor power scaling

• Origins of the power problem

• An optimization perspective

• Low power circuits and architectures

• Cost of variability

• Future scenarios

• What device characteristics matter (to me)

The 80’s Power Problem

• Until mid 80s technology was mixed

• nMOS, bipolar, some CMOS

• Supply voltage was not scaling / power was rising

• nMOS, bipolar gates dissipate static power

From Roger Schmidt, IBM Corp

Solution: Move to CMOS

• And then scale Vdd

[Figure: feature size (µm) vs. time, Jan-85 to Jan-03]

Scaling MOS Devices

• In this ideal scaling

• V scales to αV, L scales to αL

• So C scales to αC, i scales to αi (i/µ is stable)

• Delay = CV/I scales as α

• Energy = CV² scales as α³

JSSC Oct 74, pg 256
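The scaling rules on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `ideal_scaling` and the choice α = 0.7 are assumptions, and the per-gate power and power-density lines are the standard consequences of the listed rules rather than something stated on the slide.

```python
def ideal_scaling(alpha: float) -> dict[str, float]:
    """Return per-gate scale factors for ideal scaling by a factor alpha (< 1)."""
    voltage = alpha          # V -> alpha * V
    capacitance = alpha      # C -> alpha * C
    current = alpha          # I -> alpha * I (current per micron of width is stable)

    delay = capacitance * voltage / current    # CV/I scales as alpha
    energy = capacitance * voltage ** 2        # CV^2 scales as alpha^3
    power = energy / delay                     # derived: per-gate power scales as alpha^2
    power_density = power / alpha ** 2         # derived: gate area also scales as alpha^2
    return {
        "delay": delay,
        "energy": energy,
        "power": power,
        "power_density": power_density,
    }


factors = ideal_scaling(0.7)  # 0.7 is an assumed per-generation shrink
for name, value in factors.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the power density stays constant, which is why ideal scaling kept chip power in check for as long as Vdd could follow L.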

Processor Power

• Continued to grow, even when Vdd was scaled

[Figure: processor power vs. year, 1985 to 2003]

Why Power Increased

• Growing die size, fast frequency scaling

[Figure: clock frequency (MHz) vs. year, 1985 to 2005]

Good News

• Die growth & super frequency scaling have stopped

[Figure: cycle time in FO4 vs. year, 1985 to 2005]

Processor Power

• They were high power too

[Figure: processor power vs. year, 1985 to 2003]

Bad News

• Voltage scaling has stopped as well

• kT/q does not scale

• Vth scaling has power consequences

• If Vdd does not scale

• Energy scales slowly

Ed Nowak, IBM

Energy – Performance Space

• Every design is a point on a 2-D plane

[Figure: design points plotted on energy vs. performance axes]

Trade-offs for an Adder

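[Figure: energy vs. delay trade-off curves for adder designs. Legend: Stat. Carr. Chain, Stat. Carr. Sel, Stat. KS, Dom. BK, Dom. KS, Dom. LF, Stat. LF, Stat. BK, Stat. 84421]

Key Observation:

• Define the Energy/Delay sensitivity of parameter

• For example Vdd:

$$\mathrm{Sens}(V_{dd}) = -\left.\frac{\partial E / \partial V_{dd}}{\partial D / \partial V_{dd}}\right|_{V_{dd} = V_{dd}^{*}}$$

• At optimal point, all sensitivities should be the same

• Must equal the slope of the Pareto optimal curve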
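To make the sensitivity idea concrete, the sketch below estimates the energy/delay sensitivity of Vdd numerically, taking sensitivity as minus the ratio of the marginal energy change to the marginal delay change. The device model is an assumption, not taken from the slides: energy is normalized to Vdd², delay follows an alpha-power-law form Vdd / (Vdd - Vth)^a, and the values Vth = 0.3 and a = 1.3 are hypothetical.

```python
def energy(vdd: float) -> float:
    """Switching energy per operation, normalized so that C = 1 (assumed model)."""
    return vdd ** 2


def delay(vdd: float, vth: float = 0.3, a: float = 1.3) -> float:
    """Gate delay from an alpha-power-law current model, arbitrary units (assumed model)."""
    return vdd / (vdd - vth) ** a


def sensitivity(vdd: float, step: float = 1e-5) -> float:
    """Sens(Vdd) = -(dE/dVdd) / (dD/dVdd), estimated with central differences."""
    d_energy = (energy(vdd + step) - energy(vdd - step)) / (2 * step)
    d_delay = (delay(vdd + step) - delay(vdd - step)) / (2 * step)
    return -d_energy / d_delay


for vdd in (0.8, 1.0, 1.2):
    print(f"Vdd={vdd:.1f} V  Sens={sensitivity(vdd):.2f}")
```

In this toy model the sensitivity grows with Vdd: each further increment of speed bought with supply voltage costs more energy. A design is balanced when every tunable parameter (Vdd, Vth, sizing) shows the same value.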

What This Means

• Vdd and Vth are not directly set by scaling

• Instead set by slope of Pareto optimal curve

• Leakage rose to lower total system power!

[Figure: annotated operating points [Vdd=0.68, VtN=0.31], [Vdd=1, VtN=0.30], [Vdd=1.2, VtN=0.26], [Vdd=1.3, VtN=0.22]]

Low Power Design Techniques

Three main classes of methods to reduce energy:

• Cheating

• Reducing the performance of the design

• Reducing waste

• Stop using energy for stuff that does not produce results

• Stop waiting for stuff that you don’t need (parallelism)

• Problem reformulation

• Reduce work (less energy and less delay)

Cheating

• Many low-power papers talk only about energy

• Don’t consider performance

• Reducing performance can always reduce energy

• But there are many ways to reduce performance

• Good technique must lower the optimal curve

• “Sensitivity” of technique

• Must be better than current curve

• This depends on location on the curve

Reducing Energy Waste

• Clock gating

• If a section is idle, remove clock

• Removes clock power

• Prevents any internal node from transitioning

• Create system power states

• Turn on subsystems only when they are needed

• Can have different “off” states

• Power vs. wakeup time

• Disk (do you stop it from spinning?)

Embedded Power Gating

• Can reduce leakage

• 250x reported

• But costs

• Performance

• Drop in Vdd, Gnd

[Figure: rows of standard cells with embedded switches and power switch control signals]

• Since transistors still leak when power is off

Royannez, et al, 90nm Low Leakage SoC Design Techniques for Wireless Applications, ISSCC 2005

Range of Applicability

• Power supply gating

• Done to remove leakage power

• But slows down the circuit

• Adds series resistance to the supply

[Figure: energy/op vs. performance. Annotation: makes circuit worse when energy sensitivity is high]

Parallelism

• If the application has data parallelism

• Parallelism is a way to improve performance

• With low additional energy cost
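One way to see why parallelism is cheap in energy is to hold throughput fixed: N units can each run at 1/N of the original frequency, which allows a lower Vdd. The sketch below is a rough illustration under stated assumptions, none of which come from the slides: frequency follows an alpha-power law with hypothetical Vth = 0.3 and a = 1.3, energy/op scales as Vdd², and leakage plus the overhead of splitting and merging work are ignored.

```python
def relative_frequency(vdd: float, vth: float = 0.3, a: float = 1.3) -> float:
    """Achievable clock frequency in arbitrary units (assumed alpha-power-law model)."""
    return (vdd - vth) ** a / vdd


def vdd_for_frequency(target: float, vth: float = 0.3, a: float = 1.3,
                      vmax: float = 2.0) -> float:
    """Lowest Vdd that reaches the target frequency, found by bisection."""
    lo, hi = vth + 1e-9, vmax
    for _ in range(60):
        mid = (lo + hi) / 2
        if relative_frequency(mid, vth, a) < target:
            lo = mid   # too slow: raise the supply
        else:
            hi = mid   # fast enough: try a lower supply
    return hi


def parallel_energy_ratio(n_units: int, vdd_nominal: float = 1.2) -> float:
    """Energy/op of n slower units relative to one unit at vdd_nominal, same throughput."""
    f_nominal = relative_frequency(vdd_nominal)
    vdd_slow = vdd_for_frequency(f_nominal / n_units)
    return (vdd_slow / vdd_nominal) ** 2


for n in (1, 2, 4):
    print(f"{n} unit(s): relative energy/op = {parallel_energy_ratio(n):.2f}")
```

The savings shrink in practice once leakage and communication overhead are included, so the numbers should be read as an upper bound for this toy model.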

Existing Processors

[Figure: existing processors, Watts/(Spec*L*Vdd²) vs. Spec2000*L on a log scale from 1 to 1000. Annotation: 10 processors working in parallel]

Parallel Server Chip

• Power 5 from IBM

Problem Reformulation

• Best way to save energy is to do less work

• Energy directly reduced by the reduction in work

• But required time for the function decreases as well

• Convert this into extra power gains

• Shifts the optimal curve down and to the right

[Figure: energy/op vs. user performance]

Cost of Variation

• Variability changes position of the optimal curves

• Need to margin Vth, Vdd to ensure circuit always works

[Figure: energy vs. performance curves for ∆Vth = 0 mV and ∆Vth = 120 mV]

Partial Compensation

• Adjust Vdd after you get part back

• Compensates very well for small deviations in Vth

[Figure: energy vs. performance curves for ∆Vth = 0 mV and ∆Vth = 120 mV]

Reducing Voltage Margins

• At test time determine Vdd for that part

• Have private DC-DC converter already

Variable Application Demands

• Try to provide a couple of operating points

• Application can control speed and energy

• Hard question is what are valid Vdd, F pairs

• Usually determined during test

• Dynamic voltage scaling

• Intel Speed Step in laptop processors

• 2 performance/power points

• Transmeta Long Run Technology

• Many operating points. Test data + formula
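A small table of validated (Vdd, F) pairs is the simplest way to expose operating points to software. The sketch below assumes such a table was filled in at test time; the voltages, frequencies, and the name `select_operating_point` are all hypothetical and do not describe SpeedStep or LongRun.

```python
# (Vdd in volts, maximum validated frequency in MHz), sorted by rising Vdd.
# All values are hypothetical placeholders for per-part test data.
OPERATING_POINTS = [
    (0.8, 300),
    (1.0, 600),
    (1.2, 900),
    (1.3, 1000),
]


def select_operating_point(required_mhz: float) -> tuple[float, int]:
    """Return the lowest-Vdd pair whose validated frequency meets the demand."""
    for vdd, fmax in OPERATING_POINTS:
        if fmax >= required_mhz:
            return vdd, fmax
    raise ValueError(f"no operating point reaches {required_mhz} MHz")


print(select_operating_point(500))
print(select_operating_point(900))
```

Choosing the lowest qualifying Vdd keeps the part from running faster than the application needs.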

Constant Power Scaling

• Foxton controller on next-gen. Itanium II

• Raises Vdd/boosts F when most units idle

• Lowers Vdd for parallel code to stay in budget

Self Checking Hardware

• Razor (Austin/Blaauw, U of Mich)

• Use the actual hardware to check for errors

• Latch the input data twice

• Once on the clock edge, and then a little later

• If the data is not the same, you are going too fast

[Figure: Razor FF schematic. Labels: clk_del, Din 1, Din 2, Q[0], Error_L, comparator, flip-flop, shadow]

With Error Recovery

• Can run the chip so it makes some errors

• Chip gets right answer 99.9% of the time

• 0.1% of the time, the chip must rerun operation
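The cost of tolerating occasional errors can be estimated with simple arithmetic. The sketch below is an assumption-laden illustration: it supposes each error costs a fixed number of replay cycles, that a replay always succeeds, that a replay cycle costs the same energy as a normal one, and that energy per cycle scales as Vdd². The replay cost of 10 cycles and the 0.9 voltage ratio are hypothetical.

```python
def relative_throughput(error_rate: float, replay_cycles: float) -> float:
    """Throughput relative to an error-free chip when each error adds replay_cycles."""
    return 1.0 / (1.0 + error_rate * replay_cycles)


def net_energy_ratio(vdd_ratio: float, error_rate: float, replay_cycles: float) -> float:
    """Energy per useful operation relative to running at the safe, error-free Vdd."""
    cycles_per_op = 1.0 + error_rate * replay_cycles
    return vdd_ratio ** 2 * cycles_per_op


# 0.1% error rate as on the slide; replay cost and voltage ratio are assumed.
print(f"throughput: {relative_throughput(0.001, 10):.4f}")
print(f"energy/op:  {net_energy_ratio(0.9, 0.001, 10):.4f}")
```

With numbers in this range the replay overhead is about 1%, far smaller than the energy recovered by dropping the voltage margin.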

[Figure: voltage at first failure and voltage at 0.1% error rate for measured chips (axis range 1.4 to 1.8), with linear fit y = 0.78685x + 0.22117]

Adjusting Vth

• In theory want to adjust Vth too

• Very hard to do with modern transistors

[Figure: Leakage vs Vdd with Different Body Bias. Leakage vs. Vdd (V) from 0.9 to 1.35, one curve per body bias from Vbb=0.1 to Vbb=0.8]

Future Systems

• Some simple math

• Assume scaling continues

• Dies don’t shrink in size

• Average power/gate must decrease by 2x / generation

• Since gates are shrinking in size

• Get 1.4x from capacitive reduction

• Where is the other factor of 1.4x ?
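The "simple math" can be written out directly. The sketch below assumes the 1.4x figures are roundings of √2 and, as one hypothetical way to close the gap, asks how much Vdd would have to drop if the remainder came only from supply scaling with energy proportional to C·Vdd² at fixed frequency. That last step is an illustration, not a claim from the slides.

```python
import math

GATES_PER_GENERATION = 2.0       # same die area, twice as many gates
CAP_REDUCTION = math.sqrt(2.0)   # about 1.4x lower capacitance per gate (assumed exact sqrt(2))

# Holding chip power constant means power per gate must fall as fast as gate count rises.
required_power_reduction = GATES_PER_GENERATION
remaining_factor = required_power_reduction / CAP_REDUCTION

# Hypothetical: if only Vdd closed the gap, with energy ~ C * Vdd^2 at fixed frequency.
vdd_scale = 1.0 / math.sqrt(remaining_factor)

print(f"remaining factor per generation: {remaining_factor:.2f}x")
print(f"Vdd would need to scale by:      {vdd_scale:.2f}")
```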

Exploit Parallelism / Scale Vdd

• If you have parallelism

• Add more function units

• Fill up new die (2x)

• Lower energy/op

• ∆E/∆P will decrease

• Vdd, sizes, etc will reduce

• Build simpler architectures

• Works well when ∆E/∆P is large

• Per unit performance decrease is small

[Figure: energy/op vs. performance]

Exploit Specialization

• Optimize execution units for different applications

• Reformulate the hardware to reduce needed work

• Can improve energy efficiency for a class of applications

• Stream / Vector processing is a current example

• Exploit locality, reuse

• High compute density

[Figure: Imagine stream processor block diagram with µ-controller, clusters, SRF, memory system, HI, SC, and NI. Bill Dally et al, Stanford]

Exploit Integration

• If neither of those techniques works

• Still can increase integration by at least 1.4x

• Moving units onto one chip

• Reduces the number of I/Os on system

• I/O can take significant power today

• Allows even larger integration

TI - OMAP2420

• Specialization

• And power domains

• Most units are off

• OMAP 2420

• 5 Power Domains

• #1: MCU Core

• #2: DSP Core

• #3: Graphic Accelerator

• #4: Core + Periph.

• #5: Always On logic

Royannez, et al, 90nm Low Leakage SoC Design Techniques for Wireless Applications, ISSCC 2005

Low-Power PowerPC

Nowka et al., Low-power PowerPC,

What All This Means

• As long as $/function and cap continue to scale

• Moving to the new technology will be profitable

• And will allow designs to be better systems

• In the worst case, active die area will decrease

• Scale gates by the decrease in gate capacitance

• In most cases, we will do much better

• But how to optimize devices in this new domain?

Radical Idea:

• Scaling channel length may no longer be critical

• I still want small (i.e. dense) devices

• But I also want lower variations & external control of Vth

• Longer Leff may actually improve energy efficiency

• Less variability → lower energy penalty

• Especially as we move to lower performance (parallelism)

[Figure: energy vs. performance curves labeled 120 mV, 110 mV, 55 mV, and 60 mV]

Conclusions

• Unfortunately power is an old problem

• Magic bullets have mostly been spent

• Power will be addressed by application-level optimization, parallelism/specialized functional units, and more adaptive control

• Need to rethink scaling

• Still makes things cheaper

• But what do we want from scaled transistors?

Technology Scaling

Seems simple,

• Every 1 years

• Number of transistors doubles

• Transistors get faster

• Gates become lower power (CMOS)

• Life just gets better and better

Reality is a Little Different

• While scaling has been smooth

• Almost nothing else has been

• Device and circuit technology has changed

• DTL, ECL, TTL, pMOS, nMOS, CMOS

• Power periodically becomes a critical issue

• It is critical again

nMOS, TTL, ECL Were King

• 1978 – Started in VLSI

• First design was bipolar/ECL

• 3µm nMOS was hot

[Figure: Intel 8086, Intel 286, DEC µVax, BIT Sparc, HP Focus]

MOS Scaling Was Understood

• MOS devices operate on electric fields

• If E fields are the same

• Relation between E and J is the same

• So if all voltages and lengths scale

• iV curve retains the same shape, scaled in V

Bob Dennard worked all the math in 74

JSSC Oct 74, pg 256

Dilemma

• Processors today are power limited

• As are many other chips

• Technology scaling will not save us

• With Vdd fixed, energy scaling will be modest

• How does one build more powerful processors?

• Or other types of chips

When constrained, optimize!

Optimizing the Right Thing

• Given systems are power limited

• Highest performance system is not interesting

• Will dissipate too much power

• Lowest energy solution is also not interesting

• Will not have enough performance

• Want constrained optimization

• Highest performance for 20 Watts

• Lowest power for 100 SPEC
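The two constrained questions on this slide can be phrased as simple filters over candidate designs. The sketch below uses a made-up list of design points; the names, SPEC scores, and wattages are hypothetical and serve only to show the shape of the selection.

```python
# (name, performance in SPEC, power in watts); all values are hypothetical.
DESIGNS = [
    ("small", 60, 8),
    ("medium", 100, 18),
    ("large", 130, 35),
    ("huge", 150, 70),
]


def fastest_within_power(designs, power_budget):
    """Highest-performance design that fits the power budget, or None."""
    feasible = [d for d in designs if d[2] <= power_budget]
    return max(feasible, key=lambda d: d[1]) if feasible else None


def lowest_power_meeting(designs, min_performance):
    """Lowest-power design that reaches the performance target, or None."""
    feasible = [d for d in designs if d[1] >= min_performance]
    return min(feasible, key=lambda d: d[2]) if feasible else None


print(fastest_within_power(DESIGNS, 20))    # highest performance for 20 Watts
print(lowest_power_meeting(DESIGNS, 100))   # lowest power for 100 SPEC
```

Neither the fastest design nor the most frugal one is selected; the constraint decides which point on the trade-off is interesting.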

Leakage Trends

[Figure: active power density and subthreshold power density vs. gate length (µm)]

[Figure: energy/op vs. performance]

Design Parameters To Adjust

• Circuit (sizing, supply, threshold)

• Circuit topology (adder: CLA, CSA, …)

• Logic style (domino, pass-gate, …)

• Micro-architecture (pipelining, cache design, branch architecture, etc)

Energy Efficient Designs

• Are on the Pareto optimal curve

• On this curve design parameters are constrained

[Figure: Pareto optimal curve on energy/op vs. performance axes, with regions labeled "infeasible" and "wasting energy"]
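Finding the Pareto optimal curve from a set of candidate designs amounts to discarding every design that another design beats on both axes. The sketch below is a minimal illustration with hypothetical (performance, energy/op) points; it uses a straightforward quadratic scan, which is fine for small design sets.

```python
# (name, performance, energy per op); all values are hypothetical.
DESIGNS = [
    ("A", 1.0, 1.0),
    ("B", 2.0, 2.5),
    ("C", 2.0, 3.0),   # same performance as B but more energy: wasting energy
    ("D", 3.0, 6.0),
    ("E", 2.5, 7.0),   # slower than D and more energy: wasting energy
]


def pareto_front(designs):
    """Keep designs that no other design matches or beats on both axes."""
    front = []
    for name, perf, energy in designs:
        dominated = any(
            p >= perf and e <= energy and (p > perf or e < energy)
            for _, p, e in designs
        )
        if not dominated:
            front.append((name, perf, energy))
    return sorted(front, key=lambda d: d[1])


for name, perf, energy in pareto_front(DESIGNS):
    print(name, perf, energy)
```

Designs left off the front are the ones "wasting energy"; points below the front do not exist in the candidate set, which is what makes that region infeasible.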

Leakage Energy

• Matching marginal costs for Vdd and Vth

[Figure: leakage ratio vs. activity factor, and leakage ratio vs. voltage]

Measured Leakage Data

[Figure: measured leakage for the IBM Cell Processor]

Vth Variation

• Since leakage is exponential on Vth

• Average Vth for leakage is not the expected Vth
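Because leakage is exponential in Vth, low-Vth devices dominate the total, so the average leakage is higher than the leakage at the average Vth. The sketch below illustrates this under assumptions not stated on the slide: Vth is Gaussian, leakage is proportional to exp(-Vth / (n·kT/q)), kT/q is 0.0259 V, the subthreshold factor n is 1.5, and σ(Vth) is 30 mV. The closed form is the standard log-normal mean, checked against a Monte Carlo estimate.

```python
import math
import random

KT_OVER_Q = 0.0259  # thermal voltage in volts near room temperature (assumed)


def mean_leakage_ratio(sigma_vth: float, n: float = 1.5) -> float:
    """E[leakage] divided by the leakage at the mean Vth, for Gaussian Vth."""
    slope = n * KT_OVER_Q
    return math.exp(sigma_vth ** 2 / (2 * slope ** 2))


def effective_vth_shift(sigma_vth: float, n: float = 1.5) -> float:
    """How far below the mean Vth a variation-free chip must sit to leak the same."""
    return sigma_vth ** 2 / (2 * n * KT_OVER_Q)


def monte_carlo_ratio(sigma_vth: float, n: float = 1.5,
                      samples: int = 200_000, seed: int = 0) -> float:
    """Sampled estimate of the same ratio, as a cross-check on the closed form."""
    rng = random.Random(seed)
    slope = n * KT_OVER_Q
    total = sum(math.exp(-rng.gauss(0.0, sigma_vth) / slope) for _ in range(samples))
    return total / samples


sigma = 0.030  # 30 mV of Vth variation, hypothetical
print(f"closed form:  {mean_leakage_ratio(sigma):.3f}")
print(f"monte carlo:  {monte_carlo_ratio(sigma):.3f}")
print(f"Vth shift:    {effective_vth_shift(sigma) * 1000:.1f} mV")
```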

[Figure: cumulative probability vs. relative leakage, and relative leakage contribution]

How Else to Save Energy?

• Running faster than needed wastes energy

• Forces you to run higher on performance curve

• Why do you run faster than needed?

• Need margins to account for variability

• From application, environment, or technology

Variations cause waste

Dynamic Voltage Scaling

Burd et al ISSCC 2000

Dynamic Voltage Scaling

• Dynamic voltage scaling

• Adjusts Vdd to the “right” value for desired performance

• Big problem is how to find the “right” Vdd

• Need to know the relationship between Vdd and F

• Need to have a circuit that matches the critical path

• How do you do this with variations?
