Lecture 5RAS 1
Lecture 5
Power and Low-Power Design
R. Saleh, Dept. of ECE
University of British Columbia
Overview
• Today, the key issues in integrated circuits revolve around low-power design. Timing has always been a design priority, but power is quickly becoming the most critical design goal.
• We will look at the trends that have made power the leading issue for the foreseeable future, and then examine the components of power and the metrics for power.
• We will look at many techniques for dynamic power reduction that are currently in use.
References: HJS Chapter 5, Sections 5.8, 5.9
A variety of books and papers on low-power design.
All Applications require Low-Power Design
Power range: typical applications (all need high performance); concerns
• < 0.1 W: portable (PDA, communications). Concern: battery life (about 120 Wh/kg today, 400 Wh/kg expected in 10 years).
• ~1 W: consumer (set-top box, audio-visual). Concerns: inexpensive package limit; system heat (10 W per box).
• > 10 W (up to about 200 W): processor (high-end MPUs, multimedia DSPs). Concerns: ceramic package limit; IR drop of power lines; 50 W/cm² limit for forced-air cooling (the Itanium chip is 130 W on 3.75 cm²); 10 W/cm² limit for convection cooling.
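To see where a chip sits against these cooling limits, divide its power by its die area. A minimal Python sketch using the Itanium figures quoted above (130 W on 3.75 cm²); the function names are illustrative, not from any tool.

```python
# Power density vs. the cooling limits quoted in the table above.
FORCED_AIR_LIMIT = 50.0  # W/cm^2, forced-air cooling limit
CONVECTION_LIMIT = 10.0  # W/cm^2, convection cooling limit


def power_density(power_w: float, area_cm2: float) -> float:
    """Return power density in W/cm^2."""
    return power_w / area_cm2


def cooling_needed(density: float) -> str:
    """Name the mildest cooling option whose limit covers this density."""
    if density <= CONVECTION_LIMIT:
        return "convection"
    if density <= FORCED_AIR_LIMIT:
        return "forced-air"
    return "beyond forced-air"


itanium = power_density(130.0, 3.75)  # figures quoted for the Itanium chip
print(f"Itanium: {itanium:.1f} W/cm^2 -> {cooling_needed(itanium)}")
```

For the quoted numbers this gives about 34.7 W/cm², above the convection limit but within the forced-air limit.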
Ever Increasing Chip Power
[Figure: power per chip (W, log scale from 0.01 to 1000) vs. year (1980 to 2000) for processors published in ISSCC. MPU power grows ×4 every 3 years; DSP power grows ×1.4 every 3 years.]
Subthreshold Leakage Effects will Dominate
[Figure: the chip-power plot for processors published in ISSCC extended to 2015 (log scale up to 10000 W), with the MPU (×4 / 3 years) and DSP (×1.4 / 3 years) trends, the ITRS requirement of ×1.1 / 3 years, and separate dynamic and leakage power curves.]
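The trend lines are quoted as growth factors per 3 years. A minimal sketch that converts them to per-year factors and projects them over a span; the 15-year horizon is an arbitrary choice for illustration.

```python
# Convert "x factor every 3 years" trend lines into yearly rates and projections.
TRENDS = {"MPU": 4.0, "DSP": 1.4, "ITRS requirement": 1.1}  # growth per 3 years
PERIOD_YEARS = 3.0


def annual_factor(factor: float, period: float = PERIOD_YEARS) -> float:
    """Equivalent per-year growth factor."""
    return factor ** (1.0 / period)


def growth_over(factor: float, years: float, period: float = PERIOD_YEARS) -> float:
    """Total growth after `years` at `factor` per `period` years."""
    return factor ** (years / period)


for name, factor in TRENDS.items():
    print(f"{name}: x{annual_factor(factor):.2f}/year, "
          f"x{growth_over(factor, 15):.1f} over 15 years")
```

At ×4 per 3 years, MPU power compounds to 4⁵ = 1024× over 15 years, versus about 1.6× for the ×1.1 per 3 years ITRS requirement.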
Old Static and Dynamic Power Trends
[Figure, left: technology node (nm, 0 to 120) and voltage (V, 0 to 1.2) vs. year (2002 to 2016), showing VDD and VT. Right: power per gate (µW/gate, 0 to 2) vs. year, showing P_DYNAMIC and P_LEAK (subthreshold leakage).]
New VDD Trends – back to constant VDD
[Figure: supply voltage (V, 0 to 5) vs. year (1995 to 2015) across the 350 nm, 250 nm, 180 nm, 130 nm, 90 nm, 65 nm, 45 nm, and 32 nm nodes; the supply steps down from 5.0 V through 3.3 V, 2.5 V, 1.8 V, and 1.2 V to 1.0 V. Source: P. Urard, ST Microelectronics.]
Problems Found on First “Spin” of Silicon in 0.18 µm/0.13 µm
• Functional Logic Error: 43%
• Analog Tuning Issue: 20%
• Signal Integrity: 17%
• Clock Scheme Error: 12%
• Reliability Problem: 11%
• Mixed-Signal Problem: 11%
• Power Problem: 10%
• Long Path Error: 10%
• Short Path Error: 7%
• IR Drop: 4%
• Firmware: 3%
• Other: 14%
Power in CMOS Inverter
$$P_{total} = \alpha C V_{dd}\,\Delta V f_{clk} + I_{short\text{-}circuit} V_{dd} + I_{leakage} V_{dd}$$
(dynamic + short-circuit + leakage; ΔV is the output swing, so full-swing logic with ΔV = V_dd gives the familiar CV²f term)
• Power in an inverter is governed by the three-part equation above:
– Dynamic CV²f (switching) power
  • Currently the largest part, but its share is getting smaller
– Leakage power
  • Subthreshold conduction: getting bigger due to aggressive scaling, temperature, etc. (DIBL, GIDL)
  • Reverse leakage of diodes
  • Gate tunneling current in 90 nm and 65 nm technology
– Short-circuit (crowbar) current
  • Both pull-up and pull-down devices are partially conducting for a small but finite amount of time
  • Can be modeled as some fraction of dynamic current
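The three-part equation maps directly onto a few lines of code. A minimal sketch; the numbers in the example call are hypothetical, chosen only to show the relative size of the three terms.

```python
def inverter_power(alpha, c_load, vdd, dv, f_clk, i_sc, i_leak):
    """Return (dynamic, short-circuit, leakage) power in watts.

    alpha  : activity factor (fraction of cycles in which the node switches)
    c_load : switched capacitance (F)
    vdd    : supply voltage (V)
    dv     : output swing (V); dv == vdd for full-swing CMOS
    f_clk  : clock frequency (Hz)
    i_sc   : average short-circuit (crowbar) current (A)
    i_leak : average leakage current (A)
    """
    p_dynamic = alpha * c_load * vdd * dv * f_clk
    p_short = i_sc * vdd
    p_leak = i_leak * vdd
    return p_dynamic, p_short, p_leak


# Hypothetical values: 10 fF load, 1.2 V full swing, 1 GHz, alpha = 0.1
p_dyn, p_sc, p_lk = inverter_power(0.1, 10e-15, 1.2, 1.2, 1e9, 0.1e-6, 10e-9)
print(f"dynamic {p_dyn * 1e6:.2f} uW, short-circuit {p_sc * 1e6:.2f} uW, "
      f"leakage {p_lk * 1e6:.3f} uW")
```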
A Closer Look at Average Dynamic Power
• Capacitance (C_L)
– Gate and parasitic source/drain capacitances
– Wires or interconnects
• V² = V_dd × V_swing
– Power supply voltage (V_dd) and output swing (V_swing)
• f_CLK: frequency of operation
• Activity factor (α)
– Not all nodes switch every cycle; switching is data dependent
– Internal nodes can switch without changing the output
$$C_L = \sum C_{drain} + \sum C_{wire} + \sum C_{gate}$$
$$P_{dynamic} = \alpha \cdot C_{switched} \cdot V_{dd} \cdot V_{swing} \cdot f_{CLK}$$
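A minimal sketch of the average dynamic-power expression, with C_L built from its drain, wire, and gate components. The capacitances, activity, and frequency are hypothetical; the comparison shows that, with V_dd held fixed, halving V_swing halves dynamic power.

```python
def load_capacitance(c_drain, c_wire, c_gate):
    """C_L = sum of drain, wire, and gate capacitances (each a list, in farads)."""
    return sum(c_drain) + sum(c_wire) + sum(c_gate)


def dynamic_power(alpha, c_switched, vdd, v_swing, f_clk):
    """P_dynamic = alpha * C_switched * Vdd * Vswing * f_CLK (watts)."""
    return alpha * c_switched * vdd * v_swing * f_clk


# Hypothetical node: two drains, one wire segment, three fanout gates
c_l = load_capacitance([1.0e-15, 1.0e-15], [4.0e-15], [1.5e-15] * 3)
full = dynamic_power(0.15, c_l, 1.0, 1.0, 500e6)     # full-swing output
reduced = dynamic_power(0.15, c_l, 1.0, 0.5, 500e6)  # half-swing output
print(f"C_L = {c_l * 1e15:.1f} fF, full swing {full * 1e6:.3f} uW, "
      f"half swing {reduced * 1e6:.3f} uW")
```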
Glitch Power
• If the arrival times of two signals are such that the output switches for a short period, a glitch occurs
– The undesired switching dissipates both dynamic and short-circuit power
– The amount of power depends on signal timing
• Typically less than 5% of total dynamic power
• Ways to mitigate this problem?
– Change the logic design
– Ensure that signal arrival times are roughly the same
$$P_{Glitch} = I_{Glitch} V_{DD}$$
[Figure: output waveform V_out with a glitch and the resulting current I_Glitch.]
Short-Circuit Power
• Both pMOS and nMOS transistors conduct simultaneously
– Occurs only when the gate switches (related to α)
– Due to finite input rise and fall times; duration = t_rise + t_fall = Δt_SC
• Typically less than 10% of total dynamic power; depends on input/output time constants
• Ways to mitigate this problem?
– Sharper edge transitions?
  • Just pushes the power elsewhere: it means larger transistor sizes for the preceding circuit
– Non-overlapping pMOS and nMOS transitions?
  • Requires extra circuitry (delay, power) to generate separate signals
– Higher VT? Slows down circuits
$$P_{SC} = I_{SC,mean} V_{DD}$$
[Figure: short-circuit current I_SC flowing during the input rise time t_rise and fall time t_fall, with output V_out.]
Leakage Power
[Figure: CMOS inverter cross-section (p-substrate, n-well, n+ and p+ diffusions) with V_out = 0, showing leakage paths from V_dd.]
• Mostly depends on processing parameters and operating conditions (i.e., temperature and voltage)
• Reverse leakage current of reverse-biased pn-junctions
• Subthreshold conduction in devices even when OFF
• Getting to be a larger percentage due to aggressive scaling
– Leakage in logic and memory arrays is a big issue today
$$I_{reverse} = A \cdot J_S \left(e^{qV_{bias}/kT} - 1\right)$$
$$I_{subthreshold} = I_0\, e^{q(V_{gs} - V_T)/(\alpha kT)}$$
$$P_{leakage} = V_{dd} \cdot \left(I_{reverse} + I_{subthreshold}\right)$$
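A minimal sketch of the two leakage mechanisms and the resulting leakage power. Device values are hypothetical, I_0 is held fixed as temperature changes (a simplification), and the reverse current is taken in magnitude because the formula gives a negative value under reverse bias.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant (J/K)
Q_E = 1.602176634e-19  # elementary charge (C)


def reverse_current(area, j_s, v_bias, temp_k=300.0):
    """I_reverse = A * J_S * (exp(q*V_bias/kT) - 1); V_bias < 0 under reverse bias."""
    return area * j_s * (math.exp(Q_E * v_bias / (K_B * temp_k)) - 1.0)


def subthreshold_current(i0, vgs, vt, alpha, temp_k=300.0):
    """I_subthreshold = I_0 * exp(q*(Vgs - VT) / (alpha*kT)); I_0 held fixed."""
    return i0 * math.exp(Q_E * (vgs - vt) / (alpha * K_B * temp_k))


def leakage_power(vdd, i_reverse, i_sub):
    """P_leakage = Vdd * (I_reverse + I_subthreshold), using |I_reverse| (assumption)."""
    return vdd * (abs(i_reverse) + i_sub)


# Hypothetical OFF transistor (Vgs = 0) with alpha = 1.5
i_hi_vt = subthreshold_current(1e-6, 0.0, 0.4, 1.5)
i_lo_vt = subthreshold_current(1e-6, 0.0, 0.3, 1.5)
i_hot = subthreshold_current(1e-6, 0.0, 0.4, 1.5, temp_k=375.0)
i_rev = reverse_current(1e-12, 1e-3, -1.0)
print(f"lowering VT by 0.1 V multiplies leakage by {i_lo_vt / i_hi_vt:.1f}")
print(f"heating to 375 K multiplies leakage by {i_hot / i_hi_vt:.1f}")
print(f"P_leakage = {leakage_power(1.0, i_rev, i_hi_vt):.3e} W")
```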
Expression for Total Chip Power
$$P = \alpha_t (1+\gamma) N C_L V_{DD}^2 f_{CLK} + (1-\alpha_t) N I_{LEAK} V_{DD}$$
• First term: charging and discharging, plus crowbar and glitch current
• Second term: subthreshold leakage current
where
• α_t: switching probability, including glitches
• γ: crowbar and glitch components
• C_L: average load capacitance per gate
• V_DD: supply voltage
• f_CLK: clock frequency
• I_LEAK: subthreshold leakage current
• N: total number of gates on chip
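A minimal sketch of the chip-level expression, assuming the α in the leakage weight is the same switching probability α_t. The gate count, capacitance, and leakage per gate are hypothetical.

```python
def total_chip_power(alpha_t, gamma, n_gates, c_l, vdd, f_clk, i_leak):
    """Return (switching, leakage) chip power in watts.

    switching = alpha_t * (1 + gamma) * N * C_L * VDD^2 * f_CLK
    leakage   = (1 - alpha_t) * N * I_LEAK * VDD
    (assumes the leakage weight uses the same switching probability alpha_t)
    """
    p_switching = alpha_t * (1.0 + gamma) * n_gates * c_l * vdd ** 2 * f_clk
    p_leakage = (1.0 - alpha_t) * n_gates * i_leak * vdd
    return p_switching, p_leakage


# Hypothetical chip: 10 M gates, 2 fF per gate, 1.0 V, 1 GHz, 50 nA leakage per gate
p_sw, p_lk = total_chip_power(0.1, 0.1, 10e6, 2e-15, 1.0, 1e9, 50e-9)
print(f"switching {p_sw:.2f} W, leakage {p_lk:.2f} W, "
      f"leakage share {p_lk / (p_sw + p_lk):.0%}")
```

With these values leakage is about 17% of the total.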
Power & Delay Dependence on VDD & VT
$$P = \alpha_t f_{CLK} C_L V_{DD}^2 + I_0\, e^{(V_{gs} - V_T)/(n V_{th})} V_{DD}$$
$$\text{Delay} = \frac{k Q}{I} = \frac{k C_L V_{DD}}{(V_{DD} - V_T)^{\alpha}}$$
[Figure: two surface plots over VT (−0.4 V to 0.8 V) and VDD (1 V to 4 V): power (W, scale ×10⁻⁴) and delay (s, scale ×10⁻¹⁰), with design points A and B marked on each.]
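A minimal sketch of the power and delay expressions, evaluated at two hypothetical design points where V_DD and V_T are scaled down together. Assumptions: the leakage term uses V_gs = 0 (an OFF device), V_th is the thermal voltage kT/q ≈ 0.0259 V, and k, C_L, I_0, n, and α are illustrative values.

```python
import math

V_THERMAL = 0.0259  # kT/q near 300 K (V)


def power_terms(vdd, vt, alpha_t=0.1, f_clk=100e6, c_l=1e-12, i0=1e-6, n=1.5):
    """Return (dynamic, leakage) power in watts; leakage assumes Vgs = 0 (OFF)."""
    dynamic = alpha_t * f_clk * c_l * vdd ** 2
    leakage = i0 * math.exp((0.0 - vt) / (n * V_THERMAL)) * vdd
    return dynamic, leakage


def delay(vdd, vt, k=1e3, c_l=1e-12, alpha=1.3):
    """Delay = k * C_L * VDD / (VDD - VT)^alpha; only valid for VDD > VT."""
    if vdd <= vt:
        raise ValueError("VDD must exceed VT")
    return k * c_l * vdd / (vdd - vt) ** alpha


# Two hypothetical design points: scale VDD and VT down together.
for label, vdd, vt in [("A", 1.2, 0.4), ("B", 0.8, 0.2)]:
    dyn, leak = power_terms(vdd, vt)
    print(f"{label}: delay {delay(vdd, vt) * 1e9:.2f} ns, "
          f"dynamic {dyn * 1e6:.2f} uW, leakage {leak * 1e9:.3f} nW")
```

Scaling both voltages keeps delay roughly constant and cuts dynamic power, but leakage grows by roughly two orders of magnitude as V_T drops from 0.4 V to 0.2 V.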
Metrics for Power
• How do we compare the power dissipation of two different circuits? Can we just use the power in watts? Consider the two designs below, which perform the same computation.
– The two designs have different peak power levels for the same computation, so design A seems worse than B
– But design B takes longer to perform the computation
– So power alone is not a good metric for design efficiency
[Figure: power (W) vs. time for designs A and B; A has the higher peak, B takes longer.]
Power Metrics - Energy
• How about just the area under the power curve?
• We can get this by multiplying power × delay:
$$\text{Power} \times \text{Delay/Op} = \alpha n C V_{dd}\,\Delta V f \cdot \frac{1}{f} = k C V_{dd}\,\Delta V$$
• This is the classic power-delay product = energy/operation
• Useful, but you can still reduce the energy per operation by reducing the supply voltage or by using smaller transistors in a given technology, both of which effectively slow down the circuit
[Figure: power (W) vs. time.]
Energy-Delay Product vs. Supply Voltage
[Figure: normalized delay, energy, and energy × delay (0 to 1.0) vs. supply voltage (1.0 V to 6.6 V). Delay decreases as the supply increases; energy increases as the supply increases.]
• How about taking the product of energy and delay? That is, use E × D = power × delay × delay as a metric of design quality.
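The shape of the E × D curve can be reproduced with simple models. A minimal sketch, assuming energy per operation ∝ V_dd², delay ∝ V_dd/(V_dd − V_T)^α, and hypothetical values V_T = 0.7 V and α = 2; all quantities are normalized.

```python
VT = 0.7     # threshold voltage (V), hypothetical
ALPHA = 2.0  # velocity-saturation index; 2 = long-channel square law


def energy(vdd):
    """Energy per operation ~ C * Vdd^2 (normalized, C = 1)."""
    return vdd ** 2


def delay(vdd):
    """Delay ~ Vdd / (Vdd - VT)^ALPHA (normalized)."""
    return vdd / (vdd - VT) ** ALPHA


def edp(vdd):
    """Energy-delay product E * D."""
    return energy(vdd) * delay(vdd)


supplies = [1.0 + 0.01 * i for i in range(561)]  # 1.0 V to 6.6 V, as in the figure
v_best = min(supplies, key=edp)
# Setting d(ln EDP)/dVdd = 3/Vdd - ALPHA/(Vdd - VT) = 0 gives Vdd = 3*VT/(3 - ALPHA).
v_analytic = 3 * VT / (3 - ALPHA)
print(f"EDP minimum near {v_best:.2f} V (analytic {v_analytic:.2f} V)")
```

Under these assumptions the minimum falls at V_dd = 3V_T/(3 − α), here 2.1 V; a smaller α moves the optimum closer to V_T.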
Power-Related Metrics
• Watts (power dissipation)
• Power-Delay/Op (energy per operation): P × D
• Energy-Delay/Op (energy-delay per operation): P × D²
• Energy-Power/Op (energy-power per operation): P² × D
• MOPs/mW: millions of operations per milliwatt
• W/cm² (power density)
• Watt-hours/kg (battery life per kilogram)
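A minimal sketch comparing two designs under each metric. The power and delay numbers are hypothetical, chosen to mirror the A/B example: A draws more power but finishes each operation sooner.

```python
from dataclasses import dataclass


@dataclass
class Design:
    name: str
    power_w: float  # average power while computing (W)
    delay_s: float  # time to complete one operation (s)

    def energy(self) -> float:        # P x D, joules per operation
        return self.power_w * self.delay_s

    def energy_delay(self) -> float:  # P x D^2
        return self.power_w * self.delay_s ** 2

    def energy_power(self) -> float:  # P^2 x D
        return self.power_w ** 2 * self.delay_s

    def mops_per_mw(self) -> float:   # millions of operations per milliwatt
        return (1.0 / self.delay_s / 1e6) / (self.power_w * 1e3)


# Hypothetical designs: A has the higher power, B takes longer per operation.
designs = [Design("A", 2.0, 1.0e-9), Design("B", 1.0, 2.5e-9)]
for d in designs:
    print(f"{d.name}: P={d.power_w} W, PxD={d.energy():.2e} J, "
          f"PxD^2={d.energy_delay():.2e}, P^2xD={d.energy_power():.2e}, "
          f"{d.mops_per_mw():.2f} MOPs/mW")
```

With these numbers B wins on watts and on P² × D, while A wins on P × D, P × D², and MOPs/mW (which is the reciprocal of energy per operation in nJ), so the ranking depends on the metric chosen.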
Need to Address Power at Many Levels
Controlling Dynamic Power
Dynamic power is controlled by activity, supply voltage, signal swing, capacitance, and frequency.
Some circuit techniques for power reduction:
• Pipelining
• Reduce glitches
• Reduce activity
• Voltage islands (MSMV)
• Clock gating
• Power shutoff
• Dynamic frequency and voltage scaling
Using Pipelining for Power Reduction
[Figure: a pipeline stage's clock period divided into clock-to-Q delay (Clk-Q), propagation delay, and setup time, with the remaining time slack; supply Vdd.]
Multiple Supply Multiple Voltages (MSMV)
• All non-critical cells are assigned VddL
• The remaining cells are assigned VddH
• Useful approach, but perhaps too fine-grained for general implementation in an ASIC flow
VddL = 1.2 V, VddH = 1.5 V (IBM)
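A minimal sketch of the MSMV rule: keep critical cells on VddH and move a cell to VddL only if its extra delay fits in its slack. The voltages 1.2 V and 1.5 V come from the slide; the alpha-power delay model, its VT and α, and the cell data are hypothetical.

```python
def alpha_power_delay(vdd, vt=0.35, alpha=1.3):
    """Relative gate delay ~ Vdd / (Vdd - VT)^alpha (hypothetical VT and alpha)."""
    return vdd / (vdd - vt) ** alpha


def assign_supplies(cells, v_low=1.2, v_high=1.5):
    """Assign VddL to each cell whose slowed-down delay still fits in its slack.

    cells: dict of cell name -> (delay at VddH in ps, timing slack in ps).
    Simplification: slacks are treated as independent. In a real flow, cells on
    the same path share that path's slack, so timing must be re-checked after
    each reassignment.
    """
    slowdown = alpha_power_delay(v_low) / alpha_power_delay(v_high)
    assignment = {}
    for name, (delay_ps, slack_ps) in cells.items():
        extra_ps = delay_ps * (slowdown - 1.0)
        assignment[name] = v_low if extra_ps <= slack_ps else v_high
    return assignment


# Hypothetical cells: u1 is critical (no slack), u2 has ample slack, u3 has little.
cells = {"u1": (50.0, 0.0), "u2": (50.0, 20.0), "u3": (100.0, 5.0)}
print(assign_supplies(cells))
```

Here the slowdown at 1.2 V is roughly 18%, so only u2 moves to VddL.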
Voltage Island Partitioning
Voltage assignment table (block name: voltage choices)
• IP 1: 1.2 V, 1.5 V
• IP 2: 0.9 V, 1.2 V, 1.5 V
• IP 3: 1.2 V, 1.5 V
• IP 4: 0.9 V, 1.2 V, 1.5 V
• IP 5: 0.9 V, 1.2 V, 1.5 V
• IP 6: 1.2 V, 1.5 V
• IP 7: 1.2 V, 1.5 V
• IP 8: 0.9 V, 1.2 V, 1.5 V
• IP 9: 1.2 V, 1.5 V
• IP 10: 1.5 V
• IP 11: 0.9 V, 1.2 V, 1.5 V
• IP 12: 0.9 V, 1.2 V, 1.5 V
• IP 13: 0.9 V, 1.2 V, 1.5 V
• IP 14: 0.9 V, 1.2 V, 1.5 V
• IP 15: 0.9 V, 1.2 V, 1.5 V
[Figure: two floorplan views of blocks IP 1 to IP 15.]
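One simple way to form islands from the voltage assignment table is to group each block by the lowest voltage it allows. A minimal sketch of that heuristic (an assumption, not necessarily the partitioning method behind the slide).

```python
from collections import defaultdict

# Voltage choices per block, transcribed from the voltage assignment table (volts).
VOLTAGE_CHOICES = {
    "IP 1": (1.2, 1.5), "IP 2": (0.9, 1.2, 1.5), "IP 3": (1.2, 1.5),
    "IP 4": (0.9, 1.2, 1.5), "IP 5": (0.9, 1.2, 1.5), "IP 6": (1.2, 1.5),
    "IP 7": (1.2, 1.5), "IP 8": (0.9, 1.2, 1.5), "IP 9": (1.2, 1.5),
    "IP 10": (1.5,), "IP 11": (0.9, 1.2, 1.5), "IP 12": (0.9, 1.2, 1.5),
    "IP 13": (0.9, 1.2, 1.5), "IP 14": (0.9, 1.2, 1.5), "IP 15": (0.9, 1.2, 1.5),
}


def islands_by_lowest_voltage(choices):
    """Group blocks into candidate islands keyed by the lowest voltage each allows.

    Heuristic only: it ignores floorplan adjacency, level-shifter cost, and any
    limit on the number of islands.
    """
    islands = defaultdict(list)
    for block, volts in choices.items():
        islands[min(volts)].append(block)
    return dict(islands)


for vdd, blocks in sorted(islands_by_lowest_voltage(VOLTAGE_CHOICES).items()):
    print(f"{vdd} V island: {', '.join(blocks)}")
```

This yields nine blocks at 0.9 V, five at 1.2 V, and IP 10 alone at 1.5 V.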
Level Shifters
• Level shifters are circuits that handle the voltage differences that can occur between the two sides of an island boundary.
• They provide reliable voltage-level shifting across islands with minimal impact on signal delay or duty cycle.
• Level shifters should be used for VddL-to-VddH transfers.
– If one is not used, a VddL signal may not be high enough to make the VddH gate switch.
• Gates tied to VddL can safely be driven from VddH without issue; therefore, down level shifters are not required.
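The level-shifter rule reduces to a voltage comparison on every net that crosses an island boundary. A minimal sketch; the island names, voltages, and nets are hypothetical.

```python
def needs_level_shifter(src_vdd: float, dst_vdd: float) -> bool:
    """Up-shifting (VddL driving VddH) needs a level shifter; down-shifting does not."""
    return src_vdd < dst_vdd


def nets_needing_shifters(nets, island_vdd):
    """Return names of nets that cross from a lower- to a higher-voltage island.

    nets: iterable of (net name, driver island, receiver island); hypothetical netlist
    island_vdd: dict of island name -> supply voltage (V)
    """
    return [name for name, src, dst in nets
            if needs_level_shifter(island_vdd[src], island_vdd[dst])]


island_vdd = {"A": 0.9, "B": 1.2, "C": 1.5}
nets = [("n1", "A", "B"), ("n2", "C", "A"), ("n3", "B", "B"), ("n4", "B", "C")]
print(nets_needing_shifters(nets, island_vdd))
```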
Simple Level Shifter
When the input is high:
• If VddL < VddH/2, the inverter will not switch.
• The VT of the n1 device is set to a value such that VddL > Vx/2.
• This makes the first inverter switch to a low value.
• Once the first inverter switches, the second inverter switches high.
When the input is low:
• p4 is used to pull Y to VddH in order to stabilize the gate and set Y to a valid “1” for the next stage.
[Figure: level-shifter schematic with input X and output Y.]
Floorplan Solutions
[Figure: two floorplans of cells C1 to C25 assigned to voltage islands 1, 2, and 3, with empty space: (a) a placement with an IR-drop violation; (b) a placement with no IR-drop violation. Extra decap is also marked.]
Power Shut-off (PSO)
• Standard approach: fixed, independent voltage islands working at the minimum voltage under a given performance constraint.
• Blocks in any island can optionally be shut down to save power.
Voltage and Frequency Scaling
[Figure, left: normalized power P ∝ fV² (0 to 1) vs. required speed ∝ f (0 to 1); the dependence is super-linear. Right: software passes the required speed to a controller, which sets the clock and VDD of the processor hardware.]
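A minimal sketch of why the power curve is super-linear, assuming the supply can be scaled in proportion to frequency (V ∝ f), which makes P ∝ f³. The optional v_floor is a hypothetical minimum supply, below which only the frequency drops and the saving becomes linear again.

```python
def power_dvfs(speed: float, v_floor: float = 0.0) -> float:
    """Normalized P ~ f * V^2 when V is scaled in proportion to f.

    speed   : required speed, normalized so that 1.0 is full speed (f ~ speed)
    v_floor : hypothetical minimum normalized supply; below it only f is reduced
    """
    v = max(speed, v_floor)
    return speed * v ** 2


def power_frequency_only(speed: float) -> float:
    """Normalized power when only the clock is slowed and V stays at full supply."""
    return speed


for s in (1.0, 0.75, 0.5, 0.25):
    print(f"speed {s:.2f}: DVFS {power_dvfs(s):.3f}, "
          f"frequency only {power_frequency_only(s):.3f}, "
          f"DVFS with floor 0.6 {power_dvfs(s, 0.6):.3f}")
```

At half speed, voltage-and-frequency scaling uses 1/8 of full power, versus 1/2 when only the clock is slowed.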
Power-State Model (PSM)
• Address MSMV and PSO together
• Use the PSM in each stage of the design
[Figure: state diagram with states Init and S1 to S8.]
Example state assignments:
C1=Idle; C2=Idle; C3=Idle; C4=Sleep; C5=Sleep …
C1=Active; C2=Idle; C3=Idle; C4=Active; C5=Active …
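A power-state model can be captured as a table from state to per-component mode. A minimal sketch using the two example assignments above; the state names, the meaning given to Idle and Sleep, and the per-mode power numbers are all hypothetical.

```python
from enum import Enum


class Mode(Enum):
    ACTIVE = "Active"
    IDLE = "Idle"    # assumed: powered, clock gated
    SLEEP = "Sleep"  # assumed: power shut off (PSO)


# Hypothetical per-component power in each mode (mW); illustrative only.
MODE_POWER_MW = {Mode.ACTIVE: 10.0, Mode.IDLE: 1.0, Mode.SLEEP: 0.01}

# The two example assignments (other components omitted); state names are hypothetical.
STATES = {
    "low": {"C1": Mode.IDLE, "C2": Mode.IDLE, "C3": Mode.IDLE,
            "C4": Mode.SLEEP, "C5": Mode.SLEEP},
    "busy": {"C1": Mode.ACTIVE, "C2": Mode.IDLE, "C3": Mode.IDLE,
             "C4": Mode.ACTIVE, "C5": Mode.ACTIVE},
}


def state_power(state):
    """Estimated power (mW) of a state: sum over components of their mode power."""
    return sum(MODE_POWER_MW[mode] for mode in state.values())


def wakeups(src, dst):
    """Components that must be powered back up on a src -> dst transition."""
    return sorted(c for c, mode in dst.items()
                  if src.get(c) is Mode.SLEEP and mode is not Mode.SLEEP)


for name, state in STATES.items():
    print(f"{name}: {state_power(state):.2f} mW")
print("waking up:", wakeups(STATES["low"], STATES["busy"]))
```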
Power-Delay Tradeoff for Interconnect
[Figure: normalized power (0.7 to 1) vs. normalized speed (0.9 to 1), where speed is varied by changing the number and size of repeaters; the dependence is super-linear.]
Clock Gating
• Gate off the clock to idle functional units
– Needs logic to generate the disable signal
  • Increases complexity of control logic
  • Consumes power
  • Timing is critical to avoid clock glitches at the AND gate output
– Additional gate delay on the clock signal
  • The gating AND gate can replace a buffer in the clock distribution tree
  • All clock trees should have the same type of gating, whether or not they use it, for balance
• Most popular method for power reduction of clock signals and functional units
[Figure: the clock passes through an AND gate controlled by a disable signal before reaching the FFs that feed the combinational logic.]
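A minimal sketch of the gating arithmetic: the AND gate passes the clock only while the unit is enabled, so flip-flop clock events drop in proportion to idle time. The workload and flop count are hypothetical, and the sketch models only the logic function, not the glitch-free timing of the disable signal.

```python
def gated_clock(clk: bool, disable: bool) -> bool:
    """Clock reaching the flip-flops: the AND gate passes clk only when not disabled."""
    return clk and not disable


def flop_clock_events(disable_trace, n_flops):
    """Count flip-flop clock edges over a run, without and with gating.

    disable_trace: one bool per cycle; True means the unit is idle (clock gated off).
    Clock power in the gated subtree scales roughly with these counts; the power
    of the logic that generates `disable` is ignored here.
    """
    ungated = len(disable_trace) * n_flops
    gated = sum(1 for d in disable_trace if gated_clock(True, d)) * n_flops
    return ungated, gated


# Hypothetical workload: the unit is busy 4 cycles out of every 10.
trace = ([False] * 4 + [True] * 6) * 100
ungated, gated = flop_clock_events(trace, n_flops=32)
print(f"clock events: {ungated} ungated, {gated} gated "
      f"({1 - gated / ungated:.0%} fewer)")
```

With the unit idle 60% of the time, the gated flops see 60% fewer clock events.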
Clock Gating Reduces Power
MPEG4 decoder
[Figure: power (mW, 0 to 30) by block (DSP/HIF, DEU, MIF, VDE, 896Kb SRAM) without and with clock gating: 30.6 mW without clock gating vs. 8.5 mW with clock gating.]
• 90% of F/Fs were clock-gated.
• 70% power reduction by clock gating alone.
Low-power Design Study
• Reference design: personal digital assistant (PDA)
• Composed of CPU, DSP, peripheral I/O, and memory
• Driving current, I_on, is
$$I_{on} = W \nu_{sat} C_{ox} (V_{gs} - V_T)$$
• Subthreshold current, I_off, is
$$I_{off} = I_s\, e^{-qV_T/(nkT)}$$
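A minimal sketch of the two current expressions, showing the trade-off that the threshold choice creates. Device values (width, v_sat, C_ox, I_s, n) are hypothetical.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant (J/K)
Q_E = 1.602176634e-19  # elementary charge (C)


def i_on(width_m, v_sat, c_ox, vgs, vt):
    """Ion = W * v_sat * Cox * (Vgs - VT): velocity-saturated drive current (A)."""
    return width_m * v_sat * c_ox * (vgs - vt)


def i_off(i_s, vt, n=1.5, temp_k=300.0):
    """Ioff = Is * exp(-q*VT / (n*k*T)): subthreshold current at Vgs = 0 (A)."""
    return i_s * math.exp(-Q_E * vt / (n * K_B * temp_k))


# Hypothetical 1 um wide device: v_sat = 1e5 m/s, Cox = 0.017 F/m^2, Vgs = 1.0 V
for vt in (0.3, 0.4):
    on = i_on(1e-6, 1e5, 0.017, 1.0, vt)
    off = i_off(1e-6, vt)
    print(f"VT = {vt} V: Ion = {on * 1e6:.0f} uA, Ioff = {off * 1e12:.1f} pA, "
          f"Ion/Ioff = {on / off:.2e}")
```

Raising V_T from 0.3 V to 0.4 V costs about 14% of I_on but cuts I_off by roughly 13× in this model.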
PDA Model Characteristics
• Process technology (nm): 130, 90, 65, 45, 32, 22
• Operation voltage (V): 1.2, 1, 0.8, 0.6, 0.5, 0.4
• Clock frequency (MHz): 150, 300, 450, 600, 900, 1200
• Application (MAX performance required): Still Image Processing (130/90 nm); Real Time Video Codec (MPEG4/CIF) (65/45 nm); Real Time Interpretation (32/22 nm)
• Application (others): Web Browser, Electric Mailer, Scheduler (130/90 nm); TV Telephone (1:1), Voice Recognition (Input), Authentication (Crypto Engine) (65/45 nm); TV Telephone (>3:1), Voice Recognition (Operation) (32/22 nm)
• Processing performance (GOPS): 0.3, 2, 14, 77, 461, 2458
• Parallelism factor: 1, 4, 4, 4, 4, 4
• Communication speed (Kbps): 64, 384, 2304, 13824, 82944, 497664
• Power consumption (MOPS/mW): 3, 20, 140, 770, 4160, 24580
• Peak power consumption requirement (mW): 100 (for each pair of nodes: 130/90, 65/45, 32/22 nm)
• Standby power consumption requirement (mW): 2 (for each pair of nodes: 130/90, 65/45, 32/22 nm)
LOP and LSTP Key Parameters
LOP = low operating (dynamic) power; LSTP = low standby (static) power
• Year: 2001, 2004, 2007, 2010, 2013, 2016
• Drawn gate L (nm): 130, 90, 65, 45, 32, 22
• Vdd (V): LOP 1.8, 1.1, 0.9, 0.8, 0.7, 0.6; LSTP 1.8, 1.2, 1.1, 1, 0.9, 0.9
• Vth (V): LOP 0.34, 0.32, 0.29, 0.29, 0.25, 0.22; LSTP 0.51, 0.53, 0.52, 0.49, 0.45, 0.45
• Ion (µA/µm): LOP 600, 600, 700, 700, 800, 900; LSTP 300, 400, 500, 500, 600, 800
• Ioff (µA/µm): LOP 1.00E-04, 3.00E-04, 7.00E-04, 1.00E-03, 3.00E-03, 1.00E-02; LSTP 1.00E-06, 1.00E-06, 1.00E-06, 3.00E-06, 7.00E-06, 1.00E-05
• FO4 delay (ps): LOP 31, 22, 14, 10, 7, 4; LSTP 55, 32, 22, 17, 11, 7
Power Dissipation Equations
• Total chip power, P_total, is:
$$P_{total} = \sum P_{logic} + \sum P_{memory}$$
• P_logic consists of both dynamic and static terms:
$$P_{logic} = P_{dynamic} + P_{static}$$
Dynamic Logic and Memory Power
Memory:
• The main power dissipation in memory blocks is due to column bitlines switching
• Static power is assumed to be small due to VT adjustments
• α_memory ≈ 0.4% (constant)
$$P_{memory} = \alpha_{memory} C_{memory} V_{dd}^2 f$$
Logic:
$$P_{logic} = \alpha_{logic} C_{logic} V_{dd}^2 f$$
where α_logic = 10% (constant)
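A minimal sketch of the bottom-up power model, summing per-block logic and memory power with the fixed activity factors given for this study (10% for logic, 0.4% for memory). Block capacitances, static power, supply, and frequency are hypothetical.

```python
ALPHA_LOGIC = 0.10    # logic activity factor (10%), from the slide
ALPHA_MEMORY = 0.004  # memory activity factor (0.4%), from the slide


def switching_power(alpha, c, vdd, f):
    """alpha * C * Vdd^2 * f (W)."""
    return alpha * c * vdd ** 2 * f


def chip_power(logic_blocks, memory_blocks, vdd, f):
    """P_total = sum(P_logic) + sum(P_memory).

    logic_blocks : list of (switched capacitance in F, static power in W)
    memory_blocks: list of switched capacitances in F; memory static power is
                   neglected, as the slide assumes
    """
    p_logic = sum(switching_power(ALPHA_LOGIC, c, vdd, f) + p_static
                  for c, p_static in logic_blocks)
    p_memory = sum(switching_power(ALPHA_MEMORY, c, vdd, f) for c in memory_blocks)
    return p_logic, p_memory


# Hypothetical chip: two logic blocks and two memory blocks at 1.0 V, 300 MHz
p_logic, p_memory = chip_power([(3e-9, 0.01), (2e-9, 0.005)], [12e-9, 8e-9], 1.0, 300e6)
print(f"logic {p_logic:.3f} W, memory {p_memory:.3f} W, total {p_logic + p_memory:.3f} W")
```

Even though memory has four times the switched capacitance in this example, it draws far less power because of its low activity factor.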
LOP Power Dissipation
[Figure: power (W, 0 to 2.5) vs. year (2001 to 2016) for LOP devices: dynamic power, static power, and bottom-up total power.]
LSTP Power Dissipation
[Figure: power (W, 0 to 1.6) vs. year (2001 to 2016) for LSTP devices: dynamic power, static power, and bottom-up total power.]
Power-Constrained Chip Composition
[Figure: percentage of area (0% to 100%) vs. year (2001 to 2016) for a 1 cm² die: logic area contribution and total memory area, each for LOP and LSTP.]
Power Percentages of Typical Designs
[Figure: pie charts of power split among clock, logic, memory, and I/O for four designs: ASSP1, ASSP2, MPU1, and MPU2.]
Summary
Power-aware electronics will open up new applications and markets.
Super-linear dependence of power on speed.
Need to exploit cooperation among levels: software, architecture, algorithm, system, EDA, circuit, technology, assembly.
Target: 100X power reduction
→ Reduction to 1/10 achievable
→ Another 1/10 to be invented
Next time: Upcoming leakage power crisis.