digital design for low power systems
TRANSCRIPT
Digital Design for
Low Power Systems
Shekhar Borkar
Intel Corp.
Digital Design for Low Power Systems/S. Borkar2
Outline• Low Power—Outlook & Challenges• Circuit solutions for leakage
avoidance, control, & tolerance• Microarchitecture for Low Power• System considerations• Technology, Circuits, Architecture,
and System co-optimization
Digital Design for Low Power Systems/S. Borkar3
Unconstrained SD Leakage
1
10
100
1000
10000
30 80 130
Temp (C)
Ioff
(na/
u)
0.25
90nm
45nm
Digital Design for Low Power Systems/S. Borkar4
Unconstrained SD Leakage Power
0
1
10
100
1,000
0.25u 0.18u 0.13u 90nm 65nm 45nm
SD L
eaka
ge (W
atts
)
1.5X Tr Growth
2X Tr Growth
Temp = 100C
2V1.7V
1.5V
1.2V
1V 0.9V
Digital Design for Low Power Systems/S. Borkar5
Product Cost Pressure
0200400600800
100012001400
1998 2000 2002 2004 2006
Des
k To
p A
SP ($
)
Other
Thermal $10
Power Delivery
$25
Platform Budget
Shrinking ASP & budget for power
Digital Design for Low Power Systems/S. Borkar6
Unconstrained Total Power
0
200
400
600
800
1000
1200
1400Po
wer
(W),
Pow
er D
ensi
ty (W
/cm2 )
SiO2 LkgSD LkgActive
10 mm Die
90nm 65nm 45nm 32nm 22nm 16nm
Technology, Circuits, & Architecture to constrain power
Digital Design for Low Power Systems/S. Borkar7
Active Power Reduction
Slow Fast Slow
Low
Sup
ply
Volta
ge
Hig
h Su
pply
Vo
ltage
Multiple Supply Voltages
Multiple supply voltages mimic Vdd scalingIssues:Performance loss due to interface circuitsMultiple Vdd routing overhead
Digital Design for Low Power Systems/S. Borkar8
SD Leakage Avoidance—Dual Vt
Delay
# of
Pat
hs High High VtVt
0%
20%
40%
60%
80%
100%
0% 5% 10% 15% 20% 25%% T scaling from all high-Vt design
Low
-Vtt
rans
isto
r wid
th(%
of t
otal
tran
sist
or w
idth
)
Full Low Vt Performance
0%
20%
40%
60%
80%
100%
0% 5% 10% 15% 20% 25%% T scaling from all high-Vt design
Low
-Vtt
rans
isto
r wid
th(%
of t
otal
tran
sist
or w
idth
)
Full Low Vt Performance
# of
Pat
hs Low Low VtVt
Delay
Delay
# of
Pat
hs Dual Dual VtVt Leakage 3X smallerLeakage 3X smaller(Active & Standby)(Active & Standby)No performance lossNo performance loss
Digital Design for Low Power Systems/S. Borkar9
Leakage Control CircuitsBody Bias
VddVbp
Vbn-Ve
+Ve
Sleep Transistor
Logic Block
22--1000X1000XReductionReduction
Stack Effect
Equal Loading
22--10X10XReduction
55--10X10XReductionReduction Reduction
Digital Design for Low Power Systems/S. Borkar10
Body BiasReverse BB Forward BB
Increases Vt
Reduces Ioff
Reduces Vt
Increases Ioff
To inactive circuits to reduce leakage
To active circuits to improve performance
VddVbp
Vbn-Ve
+Ve
Tradeoff & analysis required• Applying BB consumes
active power• Switching delay between
body bias
Digital Design for Low Power Systems/S. Borkar11
Reverse Body Bias100
Leak
age
Red
uctio
n Fa
ctor
(X)
110C0.5V RBB110C0.5V RBB
Lower VTHigher VT10
Shorter L
10.01 0.1 1 10 100 100
0Target Ioff (nA/µm)
RBB less effective at shorter L and lower Vt
* A. Keshavarzi et. al., 2001 Int. Symp. Low Power Electronics & Design (ISLPED)
Digital Design for Low Power Systems/S. Borkar12
Scalability of RBB
000.0E+0
100.0E-9
200.0E-9
300.0E-9
400.0E-9
-1.0 -0.8 -0.6 -0.4 -0.2 0.0Body Bias (V)
Pow
er (W
atts
)
Total Leakage Power
SD leakage
Ibn junction leakageIbp junction leakage
Optimum
0.35 µm 0.18 µmOptimum RBB
2V 0.5V
Ioff Red. 1000X 10X
I/O circuit
Microprocessor critical path circuit
I/O circuit
Microprocessor critical path circuit
Total Leakage PowerMeasured on Test Chip
0.18 µm
RBB Less effective with technology scaling* A. Keshavarzi et. al., 1999 Int. Symp. Low Power Electronics & Design (ISLPED)
Digital Design for Low Power Systems/S. Borkar13
Forward Body Bias
0
0.5
1
1.5
0 200 400 600Forward body bias (mV)
Nor
mal
ized
op
erat
ing
freq
uenc
y
450mV
0
0.5
1
1.5
0 200 400 600Forward body bias (mV)
Nor
mal
ized
op
erat
ing
freq
uenc
y
450mV
250
500
750
1000
1250
1500
1750
2000
0.9 1.1 1.3 1.5 1.7Vcc (V)
Fmax
(MH
z)
Body bias chipwith 450 mV FBB
NBB chip& body biaschip withZBBTj ~ 60°C
250500750
10001250150017502000
0 5 10 15 20Active power (W)
Fmax
(MH
z)
Body bias chipwith 450 mV FBB
Body biaschip withZBB
Tj ~ 60°C250500750
10001250150017502000
0 5 10 15 20Active power (W)
Fmax
(MH
z)
Body bias chipwith 450 mV FBB
Body biaschip withZBB
Tj ~ 60°C
Router chip with body bias
Digital Core
PLL
CB
G
24 LBGs
I/O: S-Links
I/O: F-LinksDigital Core
PLLPLL
CB
GC
BG
24 LBGs
I/O: S-LinksI/O: S-Links
I/O: F-LinksI/O: F-Links
* S. Narendra et al, ISSCC 2002, 16.4
Die size 10.1 X 10.1 mm2Technology 150nm CMOSTransistors 6.6 millionArea overhead 2%Power overhead 1%
* S. Narendra et al, ISSCC 2002, 16.4
FBB increases frequency & SD leakage
Digital Design for Low Power Systems/S. Borkar14
Adaptive Body Bias Experiment
Resistor Network
4.5 mm
5.3
mm
Multiplesubsites PD & Counter Resistor
Network
CUT Bias Amplifier
Delay
1.6 X 0.24 mm, 21 sites per die150nm CMOSTechnology 150nm CMOS
Number of subsites per die 21
Body bias range 0.5V FBB to 0.5V RBB
Bias resolution 32 mV
Die frequency: Min(F1..F21)Die power: Sum(P1..P21)
Digital Design for Low Power Systems/S. Borkar15
Adaptive Body Bias Results
0%
20%
60%
100%noBB
100% yield
ABB
Higher Frequency
Num
ber o
f die
s
Frequency
too slow
ftarget
too leaky
ftarget
ABB
FBB RBB
Num
ber o
f die
s
Frequency
too slow
ftarget
too leaky
ftarget
ABB
FBB RBB
97% highest bin
within die ABB
For given Freq and Power densityFor given Freq and Power density•• 100% yield with ABB 100% yield with ABB •• 97% highest freq bin with ABB for 97% highest freq bin with ABB for within die variability within die variability
Acc
epte
d di
e
Digital Design for Low Power Systems/S. Borkar16
Transistor Stack
00.20.40.60.8
11.2
0 0.5 1 1.5Vint(V)
Nor
mal
ized
Cur
rentVdd
Vint
Stack leakage is ~5-10X smaller
* S. Narendra et. al., 2001 Int. Symp. Low Power Electronics & Design (ISLPED)
Digital Design for Low Power Systems/S. Borkar17
Scalability of Stack Effect
0
5
10
15
20
0 0.5 1 1.5 2
Stac
k ef
fect
fact
or X Zero body bias
80 C
0
5
10
15
20
0 0.5 1 1.5 2
Stac
k ef
fect
fact
or X
Zero body bias80 C
Vdd (V)
0.13 µm LVT
0.13 µm HVT
0.18 µm
Model
0.18 0.13 0.09 0.06
Technology generation (µm)
Zero body bias80 C
1.8 V @ 0.18 µm
0.18 0.13 0.09 0.06
Technology generation
Stac
k ef
fect
fact
or
(µm)
Zero body bias80 C
0
10
20
30
40
50
0
10
20
30
40
50
X 1.8 V @ 0.18 µm
10% Vdd scaling
20% Vdd scaling
Stack effects becomes stronger with scaling
Digital Design for Low Power Systems/S. Borkar18
Exploiting Natural Stacks32-bit Kogge-Stone adder
0%10%20%30%
5.0 5.6 6.2 6.8 7.4 105 120 135
vect
ors High VT Low VT
% o
f inp
ut
Standby leakage current (µA)
High VT Low VT
Energy Overhead
1.64 nJ 1.84 nJ
Savings 2.2 µA 38.4 µAMin time in Standby
84 µS 5.4 µS
Reduction Avg WorstHigh VT 1.5X 2.5XLow VT 1.5X 2X
* Y. Ye et. al., 1998 Symp. VLSI Circuits
Digital Design for Low Power Systems/S. Borkar19
Stack Forcing
0
1
2
3
4
5
1.E-05 1.E-04 1.E-03 1.E-02 1.E-01 1.E+00
Ioff (Relative)
Del
ay (R
elat
ive)
High Vt Low Vt
Leakage ReductionPerformance Loss
Equal Loading
Circuit technique provides additional Vt’s
Digital Design for Low Power Systems/S. Borkar20
Sleep Transistor Technique
ALU core
Vddexternal
DynamicALU 32
Vssexternal
Sleep ALU
ScanFIFO
Control
Body bias
Output scan
Non-sleep ALU
Sleep ALU
ScanFIFO
Control
Body bias
Output scan
Non-sleep ALU
0
2
4
6
8
10
12
Clock gating +body bias
Clock gatingonly
Clock gating +sleep transistor
Tota
pow
er (m
W)
Switching Leakage Overhead4GHz, 75Cα=0.1 10.3%9.6%
Frequency change
Leakage savings
Area over-head
PMOS sleep transistor -2% 13X 11%
PMOS body bias 0% 2X 2%
J. Tschanz et al, ISSCC 2003, 6.1
Digital Design for Low Power Systems/S. Borkar21
Leakage Tolerant Register File
* R. Krishnamurthy et. al., 2001 Symp. VLSI Circuits
Local Bit-line
RS0
D0 • •• •
RS15
D15
Φ1
N0LBL0
LBL1M1
M2
Local Bit-line
RS0
D0 • •• •
RS15
D15
Φ1
N0LBL0
LBL1M1
M2
Domino: leakage-sensitive
Pseudo-static: leakage-tolerant
• • • •
Φ1
RS0
Vs
D0#
Pk
M1
M2
LBL0
LBL1
Px
RS15
D15#
• • • •
Φ1
RS0
Vs
D0#
Pk
M1
M2
LBL0
LBL1
Px
RS15
D15#
0
0.05
0.1
0.15
0.2
0.25
150 160 170 180 190 200Read Delay (ps)
LBL
DC
Noi
se R
obus
tnes
s
6GHz design
This work
Dual-Vt
Low-VtNoise floor
Keeper upsizing
Improves delayImproves noise marginLeakage tolerant
Digital Design for Low Power Systems/S. Borkar22
Employing Leakage ControlN
umbe
r of P
aths
Low Vt
Low Vt +Stack Forcing
Dual Vt
Dual Vt +Stack Forcing
TargetDelay
Digital Design for Low Power Systems/S. Borkar23
Optimum Frequency
Pipeline Depth
02468
10
1 2 3 4 5 6 7 8 9 10Relative Pipeline Depth
Power Efficiency
Optimum
Process Technology
02468
10
1 2 3 4 5 6 7 8 9 10Relative Frequency
Sub-threshold Leakage increases
exponentially
Pipeline & Performance
02468
10
1 2 3 4 5 6 7 8 9 10Relative Frequency (Pipelining)
Performance
DiminishingReturn
Maximum performance with• Optimum pipeline depth • Optimum frequency
Digital Design for Low Power Systems/S. Borkar24
Efficiency of Microarchitectures
0
1
2
3
4
S-Scalar Dynamic DeepPipeline
Gro
wth
(X) f
rom
pre
viou
s uA
rch
Die AreaPerformancePower
Same Process Technology
0%
20%
40%
S-Scalar Dynamic DeepPipeline
Red
uctio
n in
MIP
S/W
att Same Process Technology
Enegry efficiency drops ~20%
Employ efficient microarchitecture and design
Digital Design for Low Power Systems/S. Borkar25
Memory Latency
MemoryCPU Cache
Small~few Clocks Large
50-100ns1
10
100
1000
100 1000 10000
Freq (MHz)M
emor
y La
tenc
y (C
lock
s)Assume: 50ns Memory latency
Cache miss hurts performanceWorse at higher frequency
Digital Design for Low Power Systems/S. Borkar26
Increase on-die Memory
0%
25%
50%
75%
100%
1u 0.5u 0.25u 0.13u 65nm
Cac
he %
of T
otal
Are
a
486 Pentium®
Pentium® III
Pentium® 4
Pentium® M
Power Density of Logic ≈ 10X of Memory
Increased Data Bandwidth & Reduced LatencyHence, higher performance for much lower power
Digital Design for Low Power Systems/S. Borkar27
Multi-threading
0%
20%
40%
60%
80%
100%
100% 98% 96%
Cache Hit %
Perf
orm
ance
1 GHz
2 GHz
3 GHz
ST Wait for Mem
Single ThreadSingle Thread
MT1 Wait for Mem
MT2 Wait
MT3
MultiMulti--ThreadingThreading
Full HW Utilization
Thermals & Power Delivery Thermals & Power Delivery designed for full HW utilizationdesigned for full HW utilization
Multi-threading improves performance without impacting thermals & power delivery
Digital Design for Low Power Systems/S. Borkar28
Throughput Oriented Design
Logic Block
VddFreq = 1Throughput = 1Active Power = 1SD Lkg Power = 1
0.7 x Vdd
Logic Block Freq = 0.7Throughput = 1.4Active Power = 0.7SD Lkg Power = 0.7
Logic Block
Higher logic throughput, yet lower power
Digital Design for Low Power Systems/S. Borkar29
Special Purpose HardwareTCP Offload EngineTCP Offload Engine
1.E+02
1.E+03
1.E+04
1.E+05
1.E+06
1995 2000 2005 2010 2015M
IPS
GP MIPS@75W
TOE MIPS@~2WTC
B ExecCore
PLL
OOO
ROM
CA
M1
TCB Exec
Core
PLL
ROB
ROM
CLB
Inputseq
Sendbuffer
Opportunities:Network processing enginesMPEG Encode/Decode enginesSpeech engines
2.23 mm X 3.54 mm, 260K transistors
Special Purpose HW—Best Mips/Watt
Digital Design for Low Power Systems/S. Borkar30
Chip Level Multi-processingVoltage Frequency Power Performance
1% 1% 3% 0.66%Rule of thumb:
In the same process technology…
Core
Cache
Core
Cache
Core
Voltage = -15%Freq = -15%Area = 2Power = 1Perf = ~1.8
Voltage = 1Freq = 1Area = 1Power = 1Perf = 1
Digital Design for Low Power Systems/S. Borkar31
Multi-CorePower
Large Core
CachePower = 1/4
C1 C2
C3 C4
Cache
1
2
3
4 Performance Performance = 1/2
SmallCore1
2
1 1
1
2
3
4
1
2
3
4 MultiMulti--Core:Core:Power efficientPower efficient
Better power and Better power and thermal managementthermal management
Digital Design for Low Power Systems/S. Borkar32
From Multi to Many…
0
5
10
15
20
25
30
TPT OneApp
TwoApp
FourApp
EightApp
Syst
em P
erfo
rman
ceLargeMedSmall
Single Core Performance
1
0.5
0.3
0
0.2
0.4
0.6
0.8
1
1.2
Larg
e
Med
Smal
l
Rel
ativ
e Pe
rfor
man
ce
13mm, 100W, 48MB Cache, 4B Transistors, in 22nm12 Cores 48 Cores 144 Cores
Digital Design for Low Power Systems/S. Borkar33
Future Multi-core Platform
GP GP
GP
GP GP
GP
GP
GP GP
GP
GP GP
General Purpose CoresSP SP
SP SPSpecial Purpose HW
CC
CC
CC
CC
CC
CC
CC
CC Interconnect fabric
Heterogeneous MultiHeterogeneous Multi--Core PlatformCore Platform
Digital Design for Low Power Systems/S. Borkar34
Low Power & PlatformBreakdown of Platform Power
CPU MCH, ICH
Gfx
Memory
CLK Gen
Com
Disk
Other
Display
Mobile
CPU MCH, ICH
Gfx
Disk
Other
Power Supply Loss
Desktop
CPU & electronics power in a platform is lowMust reduce power of other platform ingredients
Digital Design for Low Power Systems/S. Borkar35
Efficiency of Power Delivery
70%
75%
80%
85%
90%
0 10 20 30 40 50
I (amp)
Effic
ienc
y (%
)Workload %
Design PointLoss
Difficult to maintain high efficiency across workloadsPower supply design needs to exploit new technologies
Digital Design for Low Power Systems/S. Borkar36
Typical Power Delivery System
Mobile Platform, ~50W
PowerDeliveryElectronics
Digital Design for Low Power Systems/S. Borkar37
Fine-grain Power Management
Usage
Power Supply Response
TimeImprove response timeBring power supply closer
Fast Slow
Slow FastCache
Provide multiple supply voltagesFine grain Vdd and Freq scaling
Cache Core Hopping for hot spot &power density management
Digital Design for Low Power Systems/S. Borkar38
Software & Low PowerProvide guidance to compilers and operating system through policyApplications
Add hooks for fine grain power management, and assist OS by providing checkpoints
Compilers
Fine grain power management algorithmControl HW through firmwareOperating System
Hardware controlDetect conditions, take action, notify OSFirmware
Fine grain power management is possible only by cooperation of the entire software stack
Digital Design for Low Power Systems/S. Borkar39
Co-optimizationTechnology Circuits µArch Platform Software
SD Leakage (Ioff)Number of Vt’sGate dielectric (Tox)
Leakage avoidance, control & tolerance
Optimal pipelineThroughput oriented micro-architecture
Fine grain power management hardware & software
Supply voltage (Vdd)
Multiple Vddcircuits
Partitioning of logicError correction
Efficient power delivery & management of multiple Vdd’s
Transistor density
Tradeoff frequency for transistors
Throughput oriented micro-architecture
Software and applications to utilize logic throughput
Interconnect RC delay tolerance Interconnect cost vsefficiency of power delivery
Digital Design for Low Power Systems/S. Borkar40
Summary• Power and energy will limit integration• Employ circuit techniques for SD leakage
avoidance, control, & tolerance• Microarchitecture for Low Power• Opportunities to save power at the platform
level—Hardware & Software• Technology, Circuits, Architecture, and
System co-optimization will be key