On The Energy Efficiency of Computation
Mihai Budiu
CMU CS
CALCM Seminar
Feb 17, 2004
Note: this version fixes some errors in the ASH performance graphs shown
2
Presentation Setup
int main(void)
{
    signal(SIGINT, welcome);
    while (slides() && time(NULL)) {
        talk();
    }
    return 0;
}
3
Why Do We Care?
Toasted CPU, about 2 seconds after the cooler was removed (Tom's Hardware Guide)
4
Power and Power Density
[Figure: chip power in Watts (leakage and active) and power density in W/cm² for the 0.25µ, 0.18µ, 0.13µ, and 0.1µ technology generations]
Data from Fred Polack, Intel, MICRO 32
Assuming constant die size, no power management
5
Power Density Distribution
[Figure: power density distribution over the chip surface]
Data from Fred Polack, Intel, MICRO 32
6
Outline
• Introduction
• Power and Energy Efficiency
  – data from Bob Brodersen, Berkeley wireless group
• Synchronous Hardware Efficiency
• Asynchronous Hardware Efficiency
• ASH Efficiency
• Conclusions
7
Energy Efficiency Metric
How much computing can we do with a finite energy source?
8
Some Arithmetic
9
Energy and Power Efficiency
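The energy-efficiency metric for energy-constrained applications (OP/nJ, per Joule) is the same as the metric for thermally (power-) constrained applications that maximize throughput (MOPS/mW, per Watt):
$$\frac{\text{OP}}{\text{nJ}} = \frac{\text{OP}}{10^{-9}\,\text{J}} = \frac{10^{6}\,\text{OP/s}}{10^{-3}\,\text{J/s}} = \frac{\text{MOPS}}{\text{mW}}$$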
10
ISSCC Chips (0.18µ-0.25µ)
#   Year  Description
1   1997  S/390
2   2000  PPC (SOI)
3   1999  G5
4   2000  G6
5   2000  Alpha
6   1998  P6
7   1998  Alpha
8   1999  PPC
9   1998  StrongArm
10  2000  Comm
11  1998  Graphics
12  1998  Multimedia
13  2000  Multimedia
14  2002  Mpg decoder
15  1998  Multimedia
16  2001  Encryption Processor
17  2000  Hearing Aid Processor
18  2000  FIR for Disk Read Head
19  1998  MPEG Encoder
20  2002  802.11a Baseband
Groups: Microprocessors, DSPs, Dedicated
11
[Figure: energy (power) efficiency in MOPS/mW, equivalently OP/nJ, on a log scale from 0.01 to 1000, for chips 1-20]
3 orders of magnitude!
12
Outline
• Introduction
• Power and Energy Efficiency
• Synchronous Hardware Efficiency
• Asynchronous Hardware Efficiency
• ASH Efficiency
• Conclusions
13
Explaining the Difference
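Operations per second:
$$\text{MOPS} = f_{clk} \times N_{op}$$
where $N_{op}$ is the number of operations per clock.
Power:
$$P_{chip} = A_{chip} \times C_{sw} \times V_{dd}^2 \times f_{clk}$$
where $C_{sw}$ is the normalized switched capacitance.
Efficiency:
$$\frac{\text{MOPS}}{P_{chip}} = \frac{f_{clk} \times N_{op}}{A_{chip} \times C_{sw} \times V_{dd}^2 \times f_{clk}} = \frac{1}{A_{op} \times C_{sw} \times V_{dd}^2}$$
where $A_{op} = A_{chip}/N_{op}$ is the chip area per operation.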
14
Supply Voltage, Vdd
[Figure: supply voltage Vdd in Volts (0 to 3) for chips 1-20]
$$\frac{\text{MOPS}}{P_{chip}} = \frac{1}{A_{op} \times C_{sw} \times V_{dd}^2}$$
15
Normalized Switched Capacitance, Csw
[Figure: normalized switched capacitance Csw in pF/mm² (10 to 110) for chips 1-20; annotation: 3x]
$$\frac{\text{MOPS}}{P_{chip}} = \frac{1}{A_{op} \times C_{sw} \times V_{dd}^2}$$
16
Area per operation, Aop
[Figure: area per operation Aop in mm² per operation, on a log scale from 0.01 to 1000, for chips 1-20]
$$A_{op} = \frac{A_{chip}}{N_{op}}, \qquad \frac{\text{MOPS}}{P_{chip}} = \frac{1}{A_{op} \times C_{sw} \times V_{dd}^2}$$
AHA!
17
Focusing In
[Figure: energy (power) efficiency in MOPS/mW for chips 1-20, highlighting the PPC, the NEC DSP, and the 802.11a baseband chip]
18
µP: MOPS/mW = 0.13
Useful arithmetic
Nop = 2 (two-way)
fclock = 450 MHz ⇒ 900 MIPS
Aop = Achip/2 = 42 mm²
Power = 7 Watts
19
DSP: MOPS/mW = 7
Nop = 16 (4 processors × 4 ops each)
fclock = 50 MHz ⇒ 800 MOPS
Aop = Achip/16 = 5.3 mm²
Power = 110 mW
20
Dedicated Design: MOPS/mW=200
Nop = 96
fclock = 25 MHz ⇒ 2400 MOPS
Aop = 5.4 mm²/96 ≈ 0.056 mm²
Power = 12 mW
Complex MAC = 8 ops
Fully parallel mapping of adaptive correlator algorithm.
21
Memory is More Power-Efficient
[Figure: power density in Watts/cm² (log scale, 1 to 100) of logic versus memory for the 0.25µ, 0.18µ, 0.13µ, and 0.1µ technology generations]
Hint: use on-chip caches
22
Energy Distribution in µP
Integer execution: 19%
Reservation stations: 10%
Reorder buffer: 15%
Memory order buffer: 8%
Data cache: 14%
Branch target buffer: 6%
Floating point execution: 10%
Global clock: 10%
Register alias table: 8%
“useful” (includes local clock)
23
Efficiency and Performance
• Vdd ↓ ⇒ fclock ↓, MOPS ↓, Power ↓, MOPS/mW ↑
• Better metric: Energy × delay
  – Roughly independent of Vdd
24
Efficiency and Technology
[Figure: energy efficiency in MOPS/mW (log scale, 0.001 to 1000) versus feature size (2µ down to 0.07µ) for hardwired designs, DSPs, and microprocessors]
[T. Claasen, ISSCC 1999]
25
How Low Can You Go?
• Minimum energy required to compute is ZERO
• If computation is quasistatic...
• ...and no information is destroyed (reversible)
Ops/nJ → ∞
Rolf Landauer
26
Outline
• Introduction
• Power and Energy Efficiency
• Synchronous Hardware Efficiency
• Asynchronous Hardware Efficiency
• ASH Efficiency
• Conclusions
27
Lutonium Performance
• Asynchronous microcontroller
• Designed and implemented at Caltech
• 0.18µ technology
• 1.8V supply, 0.4V/0.5V threshold voltages
• 200 MIPS
• 1.8 ops/nJ (DSP-like)
Alain Martin
28
Efficiency and Supply Voltage
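[Figure: Lutonium performance (MIPS) and efficiency (MIPS/mW) at supply voltages of 1.8V, 1.1V, 0.9V, 0.8V, and 0.5V. Performance falls from 200 MIPS at 1.8V to 4 MIPS at 0.5V while efficiency rises from 1.8 to 23 MIPS/mW; intermediate data labels: 100, 66, and 48 MIPS; 4.83, 7.2, and 10.9 MIPS/mW]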
29
Async Processor Breakdown
ALU: 2%
Registers: 14%
Decode: 24%
I-Mem: 24%
I-Fetch: 24%
Slack: 6%
Buses: 2%
PSW: 4%
“useful”
30
Outline
• Introduction
• Power and Energy Efficiency
• Synchronous Hardware Efficiency
• Asynchronous Hardware Efficiency
• ASH Efficiency
• Conclusions
31
Application-Specific Hardware
C code
Compiler for Application-Specific Hardware
Asynchronous Circuits
Memory
32
Tool-Flow
C
CASH core
Verilog back-end
Synopsys, Cadence P/R
ASIC
180nm std. cell library, 2V
~1999 technology
Mediabench kernels (1 hot function/benchmark)
Memory
33
Caveat
Memory
we model this part accurately
optimistic speed model, no power accounting
34
ASH Performance
[Figure: ASH performance in mega-operations per second (0 to 3000) for the Mediabench kernels adpcm_d, adpcm_e, g721_d, g721_e, gsm_d, gsm_e, jpeg_e, mpeg2_d, mpeg2_e, and pegwit_d; series: MOPSall, MOPSspec, MOPS]
35
ASH vs 600MHz CPU
36
ASH Area
[Figure: ASH area in square mm (0 to 9) for the Mediabench kernels adpcm_d, adpcm_e, g721_d, g721_e, gsm_d, gsm_e, jpeg_e, mpeg2_d, mpeg2_e, and pegwit_d, with a reference level for a minimal RISC core]
37
Normalized Area
[Figure: normalized area in source lines per square mm (0 to 100) for the Mediabench kernels adpcm_d, adpcm_e, g721_d, g721_e, gsm_d, gsm_e, jpeg_e, mpeg2_d, mpeg2_e, and pegwit_d; annotation: many C macros]
38
ASH Energy Efficiency
[Figure: ASH energy efficiency in useful operations per nJ (0 to 70) for the Mediabench kernels adpcm_d, adpcm_e, g721_d, g721_e, gsm_d, gsm_e, jpeg_e, mpeg2_d, mpeg2_e, and pegwit_d]
39
All Together Now
[Figure: energy efficiency in MOPS/mW or OP/nJ (log scale, 0.01 to 1000) of microprocessors, the asynchronous microcontroller, general-purpose DSPs, ASH media kernels, and dedicated hardware]
40
Conclusions
• Performance comes at a price
• Energy efficiency is expressed in ops/nJ or MOPS/mW
• Dedicated hardware is more power-efficient than microprocessors
• ASH efficiency is competitive with dedicated hardware