ELEC516/10 Lecture 9
ELEC 516 VLSI System Design and Design Automation, Spring 2010
Lecture 9 - Low Power Digital CMOS Design
Reading Assignment: Rabaey: Chapters 4, 7, and 11
Note: some of the figures in this slide set are adapted from the slide set of "Digital Integrated Circuits" by Rabaey, Copyright UCB 2002
Why worry about power? -- Heat Dissipation
Besides the low power required for portable applications!
Power is a very big concern in today's advanced technologies. Power produces heat on the chip, which has to be carried off through the chip socket, requiring expensive packaging solutions.
Motivation for low power design
• Packaging costs
• Power supply rail design
• Chip and system cooling costs
• Noise immunity and system reliability
• Battery life (in portable systems)
• Environmental concerns
– ICT equipment accounted for 10% of total US commercial energy usage in 2010 and may reach 20% by 2020
– Energy Star compliant systems
Why worry about power? -- Portability
[Figure: nominal battery capacity (Watt-hours / lb, 0 to 50) versus year (65 to 95) for Nickel-Cadmium, Ni-Metal Hydride, and Rechargeable Lithium batteries; application labels: Multimedia Terminals, Laptop Computers, Digital Cellular Telephony, BATTERY (40+ lbs)]
Expected battery lifetime increase over next 5 years: 30-40%
Why worry about power? -- Standby Power
Drain leakage will increase as VT decreases to maintain noise margins and meet frequency demands, leading to excessive battery-draining standby power consumption.
[Figure: standby power as a percentage (0% to 50%) versus year (2000 to 2008), with bars labeled 8KW, 1.7KW, 400W, 88W, 12W. Source: Borkar, De, Intel]
Year                  2002  2005  2008  2011  2014
Power supply Vdd (V)  1.5   1.2   0.9   0.7   0.6
Threshold VT (V)      0.4   0.4   0.35  0.3   0.25
…and phones leaky!
Power and Energy Figures of Merit
• Power consumption in Watts
– determines battery life in hours
• Peak power
– determines power and ground wiring designs
– sets packaging limits
– impacts signal noise margin and reliability analysis
• Energy efficiency in Joules
– energy is power integrated over time (power is the rate at which energy is consumed)
• Energy = power * delay
– Joules = Watts * seconds
– a lower energy number means less power to perform a computation at the same frequency
Power versus Energy
[Figure: two plots of power (Watts) versus time comparing Approach 1 and Approach 2. Power is the height of the curve; energy is the area under the curve.]
Lower power design could simply be slower
Two approaches require the same energy
PDP and EDP
• Power-delay product (PDP) = $P_{av} \cdot t_p = (C_L V_{DD}^2)/2$
– PDP is the average energy consumed per switching event (Watts * sec = Joule)
– lower power design could simply be a slower design
• Energy-delay product (EDP) = $PDP \cdot t_p = P_{av} \cdot t_p^2$
– EDP is the average energy consumed multiplied by the computation time required
– takes into account that one can trade increased delay for lower energy/operation (e.g., via supply voltage scaling that increases delay, but decreases energy consumption)
– allows one to understand tradeoffs better
[Figure: normalized energy, delay, and energy-delay product versus Vdd (0.5 to 2.5 V)]
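The energy-delay trade-off can be explored numerically. The following is a minimal sketch, assuming a simple long-channel model that is not taken from the slides' data: energy per operation proportional to Vdd^2 and delay proportional to Vdd / (Vdd - Vt)^2. The threshold value `VT` and the function names are illustrative assumptions.

```python
# Sketch under an assumed long-channel model (not from the slides' data):
#   energy per operation ~ Vdd^2, delay ~ Vdd / (Vdd - Vt)^2.
VT = 0.5  # assumed threshold voltage in volts


def energy(vdd):
    """Normalized switching energy per operation."""
    return vdd ** 2


def delay(vdd, vt=VT):
    """Normalized gate delay for the assumed square-law current."""
    return vdd / (vdd - vt) ** 2


def edp(vdd, vt=VT):
    """Energy-delay product for the assumed model."""
    return energy(vdd) * delay(vdd, vt)


def best_vdd(vt=VT, hi=5.0, steps=10000):
    """Grid-search the supply voltage above vt that minimizes EDP."""
    lo = vt + 0.05
    grid = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    return min(grid, key=lambda v: edp(v, vt))


if __name__ == "__main__":
    print(f"EDP-optimal Vdd is about {best_vdd():.2f} V for Vt = {VT} V")
```

Under this assumed model the minimum falls near Vdd = 3 Vt: lowering Vdd further keeps reducing energy, but the delay penalty grows faster.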
Where Does Power Go in CMOS?
• Dynamic power
– Charge-discharge current
• Dominant source of power ($CV^2$ per transition)
– Short circuit current (both NMOS and PMOS on during a transition)
• <10% of charge/discharge current if transitions are fast
• Subthreshold leakage (transistors not OFF completely)
– Becoming important: 10-30% of active power in <0.18um technologies
– Diode leakage from reverse-biased source and drain diodes (negligible)
– Gate leakage (no longer negligible due to very thin gate oxide)
Dynamic Power Consumption
[Figure: CMOS inverter with input Vin, output Vout, supply Vdd, and load capacitance CL]
Energy/transition = $C_L \cdot V_{dd}^2$
Power = Energy/transition * f = $C_L \cdot V_{dd}^2 \cdot f$
Need to reduce $C_L$, $V_{dd}$, and f to reduce power.
Not a function of transistor sizes!
Dynamic Power Consumption - Revisited
Power = Energy/transition * transition rate
= $C_L \cdot V_{dd}^2 \cdot f_{0\to1}$
= $C_L \cdot V_{dd}^2 \cdot P_{0\to1} \cdot f$
= $C_{EFF} \cdot V_{dd}^2 \cdot f$
Power dissipation is data dependent: it is a function of switching activity
$C_{EFF}$ = effective capacitance = $C_L \cdot P_{0\to1}$
Power Consumption is Data Dependent
Example: Static 2 Input NOR Gate
Assume: P(A=1) = 1/2, P(B=1) = 1/2
Then: P(Out=1) = 1/4
$P_{0\to1}$ = P(Out=0) . P(Out=1) = 3/4 x 1/4 = 3/16
$C_{EFF}$ = 3/16 * $C_L$
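The NOR calculation generalizes to any two-input static gate by enumerating the truth table. The sketch below assumes independent inputs and no temporal correlation; the gate table `GATES` and the helper names are illustrative, not part of the lecture.

```python
from fractions import Fraction
from itertools import product

# Two-input gate functions on 0/1 values (illustrative selection).
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "NOR": lambda a, b: 1 - (a | b),
    "XOR": lambda a, b: a ^ b,
}


def p_out_one(gate, pa, pb):
    """Probability that the gate output is 1 for independent inputs."""
    total = Fraction(0)
    for a, b in product((0, 1), repeat=2):
        if GATES[gate](a, b):
            total += (pa if a else 1 - pa) * (pb if b else 1 - pb)
    return total


def p_0_to_1(gate, pa, pb):
    """P(0->1) = P(Out=0) * P(Out=1), assuming no temporal correlation."""
    p1 = p_out_one(gate, pa, pb)
    return (1 - p1) * p1


if __name__ == "__main__":
    half = Fraction(1, 2)
    for name in GATES:
        print(name, p_0_to_1(name, half, half))
```

Multiplying the result by CL gives the effective capacitance of the gate output for the assumed input statistics.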
Transition Probabilities for Basic Gates
Transition Probability of 2-input NOR Gate
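Inter-signal Correlations
[Figure: inputs A and B (signal probability 0.5 each) feed a first gate with output X; X and B reconverge at a second gate with output Z. Annotations: at X, (1-0.5)(1-0.5) x (1-(1-0.5)(1-0.5)) = 3/16; at Z, (1 - 3/16 x 0.5) x (3/16 x 0.5) = 0.085; with reconvergent fan-out, P(Z=1) = P(B=1) · P(A=1 | B=1)]
• Determining switching activity is complicated by the fact that signals exhibit correlation in space and time
– reconvergent fan-out
• Have to use conditional probabilities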
Logic Restructuring
Logic restructuring: changing the topology of a logic network to reduce transitions
[Figure: chain versus tree implementation of F = A·B·C·D with all input signal probabilities 0.5. Chain (A, B drive W; W, C drive X; X, D drive F): activities 3/16, 7/64, 15/256. Tree (A, B drive Y; C, D drive Z; Y, Z drive F): activities 3/16, 3/16, 15/256. Example: (1-0.25)*0.25 = 3/16.]
AND: $P_{0\to1} = P_0 \times P_1 = (1 - P_A P_B) \times P_A P_B$
Chain implementation has a lower overall switching activity than the tree implementation for random inputs
Ignores glitching effects
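The chain-versus-tree comparison can be reproduced by summing the AND-gate activity at every gate output. This is a minimal sketch assuming independent inputs and ignoring glitching, as the slide does; the function names are illustrative.

```python
from fractions import Fraction


def and_gate(pa, pb):
    """Signal probability of a 2-input AND with independent inputs."""
    return pa * pb


def activity(p1):
    """P(0->1) = P0 * P1 for a static gate output."""
    return (1 - p1) * p1


def chain_activity(probs):
    """Total P(0->1) over the outputs of a chain of 2-input AND gates."""
    p, total = probs[0], Fraction(0)
    for q in probs[1:]:
        p = and_gate(p, q)
        total += activity(p)
    return total


def tree_activity(pa, pb, pc, pd):
    """Total P(0->1) over the three outputs of a balanced AND tree."""
    y, z = and_gate(pa, pb), and_gate(pc, pd)
    f = and_gate(y, z)
    return activity(y) + activity(z) + activity(f)


if __name__ == "__main__":
    half = Fraction(1, 2)
    print("chain:", chain_activity([half] * 4))
    print("tree: ", tree_activity(half, half, half, half))
```

For four inputs with probability 0.5 the chain total is 91/256 and the tree total is 111/256, consistent with the per-node values in the figure.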
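Input Ordering
[Figure: two orderings of a two-gate chain with signal probabilities A = 0.5, B = 0.2, C = 0.1. Left: A and B drive X, then X and C drive F; activity at X = (1-0.5x0.2)x(0.5x0.2) = 0.09. Right: B and C drive X, then X and A drive F; activity at X = (1-0.2x0.1)x(0.2x0.1) = 0.0196.]
Beneficial to postpone the introduction of signals with a high transition rate (signals with signal probability close to 0.5)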
How about Dynamic Circuits?
[Figure: precharged dynamic gate with transistors Mp and Me, pull-down network (PDN) with inputs In1, In2, In3, supply VDD, and output Out]
Power is only dissipated when Out=0!
$C_{EFF}$ = P(Out=0) . $C_L$
Example: Dynamic 2 Input NOR Gate
Assume: P(A=1) = 1/2, P(B=1) = 1/2
Then: P(Out=0) = 3/4
$C_{EFF}$ = 3/4 * $C_L$
Switching activity is always higher in dynamic circuits
Transition Probabilities for Dynamic Gates
Switching activity for precharged dynamic gates: $P_{0\to1} = P_0$
Glitching in Static CMOS Networks
[Figure: two-gate network with inputs A, B, C, internal node X, and output Z; waveforms under a unit delay model as ABC switches from 101 to 000]
• Gates have a nonzero propagation delay resulting in spurious transitions or glitches (dynamic hazards)
– glitch: node exhibits multiple transitions in a single cycle before settling to the correct logic value
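A unit-delay simulation makes the glitch visible. The gate types are not recoverable from this transcript, so the sketch below assumes X = NOR(A, B) and Z = NOR(X, C); that topology and the function names are assumptions for illustration only.

```python
def nor(a, b):
    """2-input NOR on 0/1 values."""
    return 1 - (a | b)


def simulate(old_inputs, new_inputs, steps=3):
    """Unit-delay simulation of the assumed network X = NOR(A, B), Z = NOR(X, C).

    Returns the list of (X, Z) values at t = 0, 1, ..., steps, where the
    inputs switch from old_inputs to new_inputs at t = 0.
    """
    a, b, c = old_inputs
    x = nor(a, b)
    z = nor(x, c)
    trace = [(x, z)]
    a, b, c = new_inputs
    for _ in range(steps):
        # Each gate sees the values its inputs had one time unit earlier,
        # so Z is computed from the previous value of X.
        x, z = nor(a, b), nor(x, c)
        trace.append((x, z))
    return trace


def transitions(values):
    """Number of value changes in a sequence."""
    return sum(1 for u, v in zip(values, values[1:]) if u != v)


if __name__ == "__main__":
    trace = simulate((1, 0, 1), (0, 0, 0))
    z_values = [z for _, z in trace]
    print("Z over time:", z_values, "transitions:", transitions(z_values))
```

With these assumptions Z starts and ends at 0 but pulses to 1 for one time unit, so the node dissipates switching energy even though its logic value does not change.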
Example: Adder Circuit
[Figure: 16-bit ripple-carry adder (stages Add0 to Add15, sum outputs S0 to S15, carry input Cin) and a plot of sum output voltage (Volts) versus time (0 to 10 ns) for Cin and selected sum bits, including S1, S10, and S15]
How to Cope with Glitching?
[Figure: two implementations of a network with gates F1, F2, F3, annotated with signal arrival times (0, 0, 0, 0, 1, 2 in one; 0, 0, 0, 0, 1, 1 in the other)]
Equalize lengths of timing paths through design
Balanced Delay Paths to Reduce Glitching
[Figure: two implementations of a network with gates F1, F2, F3, annotated with signal arrival times (0, 0, 0, 0, 1, 2 in one; 0, 0, 0, 0, 1, 1 in the other)]
Glitching is due to a mismatch in the path lengths in the logic network; if all input signals of a gate change simultaneously, no glitching occurs
So equalize the lengths of timing paths through logic
Glitch Reduction by Pipelining
• Glitches depend on the logic depth of the circuit - gates deeper in the logic network are more prone to glitching
– arrival times of the gate inputs are more spread due to delay imbalances
– usually affected more by primary input switching
• Reduce logic depth by adding pipeline registers
– additional energy used by the clock and pipeline registers
[Figure: five-stage pipeline (Fetch, Decode, Execute, Memory, WriteBack) with PC, I$, D$, Instruction, MAR, and MDR blocks clocked by clk; pipeline stage isolation registers separate the stages]
Short Circuit Power Consumption
Finite slope of the input signal causes a direct current path between VDD and GND for a short period of time during switching when both the NMOS and PMOS transistors are conducting.
[Figure: CMOS inverter with input Vin, output Vout, load capacitance CL, and short-circuit current Isc]
Short Circuit Current Determinants
$E_{sc} = t_{sc} \cdot V_{DD} \cdot I_{peak} \cdot P_{0\to1}$
$P_{sc} = t_{sc} \cdot V_{DD} \cdot I_{peak} \cdot f_{0\to1}$
• Duration and slope of the input signal, $t_{sc}$
• $I_{peak}$ determined by
– the saturation current of the P and N transistors, which depends on their sizes, process technology, temperature, etc.
– strong function of the ratio between input and output slopes
• a function of $C_L$
Impact of CL on Psc
• Large capacitive load: Isc ≈ 0
– Output fall time significantly larger than input rise time.
• Small capacitive load: Isc ≈ Imax
– Output fall time substantially smaller than the input rise time.
Ipeak as a Function of CL
[Figure: Ipeak (A, x 10^-4) versus time (sec, x 10^-10) for CL = 20 fF, 100 fF, and 500 fF with a 500 psec input slope]
When load capacitance is small, Ipeak is large.
Short circuit dissipation is minimized by matching the rise/fall times of the input and output signals - slope engineering.
Static Power Consumption
[Figure: gate with input Vin = 5V, output Vout, load capacitance CL, supply Vdd, and static current Istat]
$P_{stat}$ = P(In=1) . $V_{dd}$ . $I_{stat}$
• Dominates over dynamic consumption
• Not a function of switching frequency
Leakage (Static) Power Consumption
[Figure: inverter with supply VDD and output Vout showing the leakage current components: drain junction leakage, sub-threshold current, and gate leakage]
Sub-threshold current is the dominant factor.
All increase exponentially with temperature!
Power consideration – leakage current
$I_{sub} = K \, e^{q(V_{gs} - V_{th})/(nkT)} \left(1 - e^{-qV_{ds}/(kT)}\right)$
K: technology constant; q: electronic charge; k: Boltzmann constant; n: nonlinearity constant (between 1 and 2); T: temperature
Leakage as a Function of VT
[Figure: drain current ID (A, log scale from 10^-12 to 10^-2) versus VGS (0 to 1 V) for VT = 0.4V and VT = 0.1V]
Continued scaling of supply voltage and the subsequent scaling of threshold voltage will make subthreshold conduction a dominant component of power dissipation.
A 90mV/decade VT roll-off - so each 255mV increase in VT gives about 3 orders of magnitude reduction in leakage (but adversely affects performance)
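The mV/decade figure follows from the exponential term of the subthreshold current equation. The sketch below evaluates that equation with assumed values (n = 1.5, T = 300 K, and a normalized technology constant K = 1); these parameter choices are illustrative and not taken from the lecture.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def thermal_voltage(temp_k=300.0):
    """kT/q in volts."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def i_sub(vgs, vth, vds, k_tech=1.0, n=1.5, temp_k=300.0):
    """Subthreshold current: K * exp(q(Vgs-Vth)/(nkT)) * (1 - exp(-qVds/(kT)))."""
    vt = thermal_voltage(temp_k)
    return k_tech * math.exp((vgs - vth) / (n * vt)) * (1 - math.exp(-vds / vt))


def subthreshold_slope(n=1.5, temp_k=300.0):
    """Gate-voltage change (V) per decade of current: n * (kT/q) * ln(10)."""
    return n * thermal_voltage(temp_k) * math.log(10)


if __name__ == "__main__":
    slope = subthreshold_slope()
    ratio = i_sub(0.0, 0.1, 1.0) / i_sub(0.0, 0.4, 1.0)
    print(f"slope = {slope * 1e3:.1f} mV/decade")
    print(f"leakage ratio for VT = 0.1 V vs 0.4 V: 10^{math.log10(ratio):.2f}")
```

With the assumed n = 1.5 at room temperature, the slope comes out close to 90 mV/decade, so a few hundred millivolts of threshold change moves the off-state leakage by roughly three orders of magnitude.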
Sub-Threshold in MOS
[Figure: ID versus VGS for VT = 0.6 and VT = 0.2]
Lower bound on threshold to prevent leakage
Low Power Design Space
• The dynamic power consumption equation reveals the three degrees of freedom inherent in the low power design space:
– Voltage
– Physical capacitance
– Data activity
• Optimization for power entails an attempt to reduce one or more of these factors. Interactions among these factors complicate the optimization problem.
• Deep sub-micron design - need to minimize leakage and sub-threshold current
Power Reduction Strategy (I)
• Voltage Reduction
– 5V -> 3.3V -> 2.5V -> 1.8V -> 1.0V
– Mixed supplies in the system and/or on chip: use the minimum voltage for different chips, or for different functions within a chip, together with on-chip voltage converters if required.
• Low-voltage circuit techniques are required to give good performance even with low voltages
• Less noisy structures and better signal integrity handling are required
• A lower Vth process is required to maintain good transistor speed
Reducing Vdd
$P \times t_d = E_t = C_L \cdot V_{dd}^2$
$\frac{E(V_{dd}=2)}{E(V_{dd}=5)} = \frac{(C_L)(2)^2}{(C_L)(5)^2}$
$E(V_{dd}=2) \approx 0.16 \, E(V_{dd}=5)$
Strong function of voltage ($V^2$ dependence).
Relatively independent of logic function and style.
[Figure: normalized power-delay product (0.03 to 1.5, log scale) versus Vdd (1 to 5 volts) for a 51 stage ring oscillator and an 8-bit adder, showing a quadratic dependence]
Power Delay Product improves with lowering VDD.
Lower Vdd Increases Delay
$T_d = \frac{C_L \cdot V_{dd}}{I}$
$I \sim (V_{dd} - V_t)^2$
$\frac{T_d(V_{dd}=2)}{T_d(V_{dd}=5)} = \frac{(2)(5 - 0.7)^2}{(5)(2 - 0.7)^2} \approx 4$
Relatively independent of logic function and style.
[Figure: normalized delay (1.00 to 7.50) versus Vdd (2 to 6 volts) in a 2.0 μm technology for an adder (SPICE), a microcoded DSP chip, a multiplier, an adder, a ring oscillator, and a clock generator]
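The two ratios quoted for scaling Vdd from 5 V to 2 V can be checked directly from the formulas on these slides. The sketch uses the slide's Vt = 0.7 V; the function names are illustrative.

```python
def energy_ratio(v_new, v_old):
    """E(v_new) / E(v_old) for E = CL * Vdd^2 (CL cancels)."""
    return (v_new / v_old) ** 2


def delay(vdd, vt=0.7):
    """Normalized delay Td = CL * Vdd / I with I ~ (Vdd - Vt)^2."""
    return vdd / (vdd - vt) ** 2


def delay_ratio(v_new, v_old, vt=0.7):
    """Td(v_new) / Td(v_old) for the square-law current model."""
    return delay(v_new, vt) / delay(v_old, vt)


if __name__ == "__main__":
    print(f"E(2V)/E(5V)   = {energy_ratio(2.0, 5.0):.2f}")
    print(f"Td(2V)/Td(5V) = {delay_ratio(2.0, 5.0):.2f}")
```

The energy ratio is 0.16 and the delay ratio is a little above 4, which is the trade-off the following slides try to recover through lower thresholds, parallelism, and pipelining.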
Lowering the Threshold
[Figure: delay versus Vdd with 2Vt marked on the Vdd axis; ID versus VGS for Vt = 0.2 and Vt = 0]
Reduces the speed loss, but increases leakage
Interesting design approach: design for PLeakage == PDynamic
What Threshold Voltage to Use?
• Energy vs. Vt for a fixed throughput
• An “optimal” Vt/Vdd point trades switching and leakage energy
Stack Effect
• Leakage is a function of the circuit topology and the value of the inputs
$V_T = V_{T0} + \gamma\left(\sqrt{|-2\Phi_F + V_{SB}|} - \sqrt{|-2\Phi_F|}\right)$
where $V_{T0}$ is the threshold voltage at $V_{SB} = 0$; $V_{SB}$ is the source-bulk (substrate) voltage; $\gamma$ is the body-effect coefficient
[Figure: two-input gate with inputs A and B, output Out, and intermediate node VX between the stacked transistors]
A  B  VX           ISUB
0  0  VT ln(1+n)   VGS = VBS = -VX
0  1  0            VGS = VBS = 0
1  0  VDD - VT     VGS = VBS = 0
1  1  0            VSG = VSB = 0
• Leakage is least when A = B = 0
• Leakage reduction due to stacked transistors is called the stack effect
Leakage as a Function of Design Time VT
• Reducing the VT increases the sub-threshold leakage current (exponentially)
– 90mV reduction in VT increases leakage by an order of magnitude
• But, reducing VT decreases gate delay (increases performance)
[Figure: ID (A) versus VGS (0 to 1 V) for VT = 0.4V and VT = 0.1V]
• Determine the critical path(s) at design time and use low VT devices on the transistors on those paths for speed. Use a high VT on the other logic for leakage control.
– A careful assignment of VT’s can reduce the leakage by as much as 80%
Dual-Thresholds Inside a Logic Block
• Minimum energy consumption is achieved if all logic paths are critical (have the same delay)
• Use lower threshold on timing-critical paths
– Assignment can be done on a per gate or transistor basis; no clustering of the logic is needed
– No level converters are needed
Variable VT (ABB) at Run Time
• $V_T = V_{T0} + \gamma\left(\sqrt{|-2\Phi_F + V_{SB}|} - \sqrt{|-2\Phi_F|}\right)$
[Figure: VT (0.4 to 0.9 V) versus substrate bias (-2.5 to 0 V)]
• For an n-channel device, the substrate is normally tied to ground (VSB = 0)
• A negative substrate bias (a reverse bias on the source-bulk junction, VSB > 0 in the equation above) causes VT to increase
• Adjusting the substrate bias at run time is called adaptive body-biasing (ABB)
– Requires a dual well fab process
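The body-effect equation can be evaluated to see how much a reverse substrate bias raises the threshold. The device parameters below (VT0 = 0.45 V, gamma = 0.4 V^0.5, 2ΦF = -0.6 V) are assumed, illustrative values for an n-channel device and are not stated on the slide; VSB is taken as the reverse-bias magnitude.

```python
import math


def threshold_voltage(vsb, vt0=0.45, gamma=0.4, two_phi_f=-0.6):
    """VT = VT0 + gamma * (sqrt(|-2*PhiF + VSB|) - sqrt(|-2*PhiF|)).

    vsb is the source-bulk reverse bias in volts (0 when the substrate is
    tied to the source); all default parameters are assumed values.
    """
    return vt0 + gamma * (
        math.sqrt(abs(-two_phi_f + vsb)) - math.sqrt(abs(-two_phi_f))
    )


if __name__ == "__main__":
    for vsb in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
        print(f"VSB = {vsb:.1f} V -> VT = {threshold_voltage(vsb):.3f} V")
```

With these assumptions the threshold rises from 0.45 V at zero bias to roughly 0.84 V at 2.5 V of reverse bias, which is why body bias is an effective standby leakage control.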
Techniques for Burst Mode Computation
• Multiple VT technology: disable high VT devices during idle periods, e.g. [Sakata-93], [Mutoh-93]
• Substrate bias controlled variable VT devices: increase VT during idle periods, e.g. [Seta-95]
[Figure: left, low VT logic between high VT devices driven by SLEEP signals; right, inverter (Vin, Vout, Vdd) whose substrate biases Vp > 0 and Vn < 0 are switched between on and standby]
Multiple-Threshold Circuits
• Previous approaches cannot reduce leakage power during active mode.
• Use dual Vt CMOS logic: low Vt for critical paths and high Vt for non-critical paths. Problem: large threshold swing
• Triple Vt CMOS circuit to reduce the sub-threshold swing [Fujii et al., ISSCC-98]
– For high speed, low power active mode, low and medium Vt are used for critical and non-critical paths, respectively.
– For standby mode, a high Vt MOSFET is inserted between the supply rail and the virtual supply rail.
Multiple-Threshold Circuits
Power considerations – reduce leakage
• Leakage is proportional to device width
– Use the smallest devices on non-critical paths.
• Leakage drops with stacked devices (drain voltage divider)
– Use stacked transistors on non-critical paths.
• Leakage drops with increasing channel length
– Slightly increase L on non-critical paths.
• Use a dual VT process providing two threshold voltages
– Use high VT transistors on non-critical paths.
Power consideration – reduce leakage
• Switch off critical path transistors when not needed.
• Stand-by mode switch between supply and virtual supply lines
• Stand-by vectors
– Apply the input vector which minimizes leakage.
– Achieved using a mux
Power gating transistor sizing
• The effect of power gating transistor size
– As the size decreases, logic performance also decreases.
– As the size increases, leakage current and chip area also increase.
– Proper sizing is very important.
– Power gating transistor size should be decided within 2% performance degradation.
[Figure: low Vt logic and a high Vt switch with its control signal between VDD and GND; operating voltage Vop = VDD - ΔV, where ΔV must be sized within 2% performance degradation]
Power Reduction Strategy (II)
• Reducing capacitance
– Process scaling and better integration, with smaller capacitances in more aggressive processes
– Improved devices and interconnect technology
– Efficient clock generation and distribution
– Good memory hierarchy
– In-place optimization using a library containing ranges of gates with different strengths, through replacement of gates to use the optimum drive in the critical paths and minimum drive elsewhere
Reducing Effective Capacitance
Global bus architecture Local bus architecture
Shared Resources incur Switching Overhead
Power consideration – Reduce Capacitance
• Reduce switched capacitance C
– Careful transistor sizing, transistor ordering, tighter and more compact layout
– Hierarchical architecture; add transmission gates (TG) to isolate buses
– Segmented structures
[Figure: shared bus driven by A or B when sending values to C; insert a switch to isolate the bus segment when B is sending to C]
Power Reduction Strategy (III)
• Reducing activities
– Lowering operating frequency
– Using power management strategies, such as gated clocks and power-down of non-operational units
– Reduce switching activities: Power = Energy/transition * transition rate = $C_L V_{dd}^2 f_{0\to1} = C_L V_{dd}^2 P_{0\to1} f = C_{eff} V_{dd}^2 f$
• Power dissipation is data dependent and hence is a function of switching activity.
• $P_{0\to1}$ is the switching probability
• Effective switching capacitance = $C_{eff} = C_L \cdot P_{0\to1}$
Factors Affecting Power Consumption - Revisited
• Degrees of freedom in the low power design space:
– Voltage
– Physical capacitance
– Data activity
• Power minimization approaches:
– Run at minimum allowable voltage
– Minimize effective switching capacitances
System Level - Power Down Techniques
Operating States:
• Active or Full-On (fastest clock)
• Standby (slow clock)
• Suspend or Sleep (slow clock or shut down)
[Figure: microprocessor with an activity analyzer]
Dynamic Power as a Function of VDD
• Decreasing the VDD decreases dynamic energy consumption (quadratically)
• But, increases gate delay (decreases performance)
[Figure: normalized delay tp (1 to 5.5) versus VDD (0.8 to 2.4 V)]
• Determine the critical path(s) at design time and use high VDD for the transistors on those paths for speed. Use a lower VDD on the other gates, especially those that drive large capacitances (as this yields the largest energy benefits).
Multiple VDD Considerations
• How many VDD?
– Two is becoming common
– Many chips already have two supplies (one for core and one for I/O)
• When combining multiple supplies, level converters are required whenever a module at the lower supply drives a gate at the higher supply (step-up)
– If a gate supplied with VDDL drives a gate at VDDH, the PMOS never turns off
• The cross-coupled PMOS transistors do the level conversion
• The NMOS transistors operate on a reduced supply
– Level converters are not needed for a step-down change in voltage
– Overhead of level converters can be mitigated by doing conversions at register boundaries and embedding the level conversion inside the flipflop
[Figure: level converter with supplies VDDH and VDDL, input Vin, and output Vout]
Dual-Supply Inside a Logic Block
• Minimum energy consumption is achieved if all logic paths are critical (have the same delay)
• Clustered voltage-scaling
– Each path starts with VDDH and switches to VDDL (gray logic gates) when delay slack is available
– Level conversion is done in the flipflops at the end of the paths
Power Conscious Behavioral Design
• Run functional units at the minimum allowed voltage while satisfying the timing constraints
• Parallelize or pipeline data-path, memory and controllers to compensate for throughput loss due to reduced supply voltage
• Power down functional units which are not in use; put in "dynamic power management" capability
• Avoid centralized resources (controllers, functional blocks, global busses, etc.) as much as possible
• Map functions to hardware so that inter-chip communication is minimized
• Schedule and bind operations to functional units so as to reduce the activity of the input operands
• Reorder operands to reduce switching activity; Keep the inputs to an idle unit unchanged
Low Power Datapath Design - reducing the supply voltage
• Reducing supply voltage has quadratic effect on power saving, but a negative effect on performance.
• Performance can be gained back by logical and architectural optimizations, e.g. lookahead adder instead of ripple-carry adder, using parallelism to increase performance
Power Considerations – reduce f and VDD
• Reducing frequency does not save energy; it just reduces the rate at which energy is consumed
– Power is lower but system must run longer
• Reducing supply voltage is very effective (scaling the voltage by 0.5 scales energy/transition by 0.25).
• Dropping the voltage will result in reduced performance in terms of speed (need to recover the performance using parallelism).
• Trade surplus performance for lower energy by reducing the supply voltage until performance is as required
Power considerations – Dynamic voltage Scaling
Can run at a lower voltage and hence improve power consumption quadratically.
– 8-bit adder/comparator (Chandrakasan et al.):
• Consumes Pref at 40MHz, 5V with Area = 0.53mm2
– Two parallel interleaved architecture:
• Consumes 0.36Pref at 20MHz, 2.9V with Area = 1.80mm2
– One pipelined architecture:
• Consumes 0.39Pref at 40MHz, 2.9V with Area = 0.69mm2
– Pipelined and parallel:
• Consumes 0.2Pref at 20MHz, 2.0V with Area = 1.96mm2
Minimizing the power consumption using parallelism
• Reference Design
Critical path = 25ns, clock frequency = 40MHz, supply voltage = 5V
$P_{ref} = C_{ref} V_{ref}^2 f_{ref}$
• Parallel implementation of the reference design
New critical path = 50nsec, $C_{par} \to 2.15\,C_{ref}$, $V_{par} \to 2.9$V, $f_{par} \to 0.5\,f_{ref}$
$P_{par} = C_{par} V_{par}^2 f_{par} = (2.15\,C_{ref})(0.58\,V_{ref})^2 \left(\frac{f_{ref}}{2}\right) \approx 0.36\,P_{ref}$
Pipelined Data Path
• Critical path delay is less => max[Tadder, Tcomparator].
• Keeping clock rate constant: fpipe = fref. Voltage can be dropped to Vpipe = Vref/1.7 while maintaining the original throughput
• Capacitance slightly higher: Cpipe = 1.15 Cref
• $P_{pipe} = (1.15\,C_{ref})(V_{ref}/1.7)^2 f_{ref} \approx 0.39\,P_{ref}$
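The parallel and pipelined power figures follow from scaling each factor of P = C V^2 f relative to the reference design. This is a small sketch using the capacitance, voltage, and frequency scalings quoted on the slides; the helper name `relative_power` is illustrative.

```python
def relative_power(c_scale, v_scale, f_scale):
    """P / Pref = (C / Cref) * (V / Vref)^2 * (f / fref)."""
    return c_scale * v_scale ** 2 * f_scale


# Parallel: 2.15x capacitance, 2.9 V instead of 5 V, half the clock rate.
P_PARALLEL = relative_power(2.15, 2.9 / 5.0, 0.5)

# Pipelined: 1.15x capacitance, Vref / 1.7, same clock rate.
P_PIPELINED = relative_power(1.15, 1 / 1.7, 1.0)

if __name__ == "__main__":
    print(f"parallel:  {P_PARALLEL:.3f} Pref")
    print(f"pipelined: {P_PIPELINED:.3f} Pref")
```

The parallel case evaluates to about 0.36 Pref and the pipelined case to about 0.40 Pref, close to the 0.39 quoted above; in both cases the quadratic voltage term outweighs the capacitance overhead.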
How low a voltage can be used
• Capacitance overhead starts to dominate at “high” levels of parallelism and results in an optimum voltage.
Voltage as a design variable
Adapting Voltage to Workload yields cubic reduction
Multiple supply voltages – Filter Example
[Figure: scheduled data-flow graph of a filter over control steps 1 to 10, with multiply (*) and add (+) operations assigned to 5V, 3V, and 2.4V supplies]
Power(5V) / Power(5V, 3V, 2.4V) = 1.5 [Raje95]
Using Multiple Voltages
[Figure: circuit blocks C1, C2, C3 supplied by Vdd1, Vdd2, Vdd3; critical and non-critical paths assigned to supplies Vdd1 and Vdd2] [Horowitz-95]
Processor Usage Model
• System Optimizations:
– Maximize Peak Throughput
– Minimize Average Energy/operation
– Maximize computation per battery life
[Figure: desired throughput versus time for a single-user system that is not always computing; labeled regions: compute-intensive and low-latency processes, background and high-latency processes; ceiling set by the top speed of the processor]
Scale Supply Voltage with fclk
Dynamic Voltage Scaling Implementation
• VCO: ring oscillator which matches the μP critical path
• DC-DC: performs D/A conversion, converts battery voltage to regulated Vdd
• Provides both voltage regulation and clock generation.
[Figure: feedback loop in which a frequency detector compares M·fref with fout and drives a loop filter, a DC-DC converter supplied by the battery, and a VCO; the regulated Vdd supplies the μP]
Dynamic Voltage Scaling in Practice
• Fixed throughput, energy/operation:
– Throughput = 8 MIPS, Energy/op = 0.24 nJ/inst (1.92 mW)
– Throughput = 100 MIPS, Energy/op = 2.2 nJ/inst (220 mW)
• Occasionally demand peak throughput:
– Peak throughput = 100 MIPS, average energy/op = 0.24 nJ/inst (~1.8 mW)
DVS is only advantageous when a majority of computation is performed at low throughput
Low-voltage Switching Regulator
• Arbitrary Vdd (< Vin) generated using the Buck converter
– Vdd = Vin x duty cycle at node X
• Chief sources of inefficiencies:
– Conduction loss ($I^2 R$)
– Switching loss ($C_x V_{in}^2 f_s$ and $L_s I^2 f_s$)
– Gate-drive loss ($C_g V_{in}^2 f_s$)
[Stratakos-94]
Adaptive Power Supply Voltage
• Exploit data dependent computation times to vary the supply [Nielsen94]
[Figure: self-timed processor between REG and FIFO stages at its input and output; a control block adjusts the power supply Vdd(t)]
Variable Supply Voltage Control Scheme
Voltage Scheduling for variable workload system
• Voltage scheduling under timing constraints
• Example [Ishihara-98]
– Energy consumption of a processor:
• 10nJ/cycle at 2.5V
• 25nJ/cycle at 4V
• 40nJ/cycle at 5V
– Maximum clock frequencies:
• 50MHz at 5V, 40MHz at 4V, 25MHz at 2.5V
– The application needs 1000M cycles to finish and the timing constraint is 25sec.
Different Voltage Schedules
[Figure: energy consumption (proportional to Vdd^2) versus time (0 to 25 sec) for three schedules, with the timing constraint marked]
(A) 1000M cycles at 50MHz (5.0V): 40J
(B) 750M cycles at 50MHz (5.0V), then 250M cycles at 25MHz (2.5V): 32.5J
(C) 1000M cycles at 40MHz (4.0V): 25J
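The energy and completion time of each schedule can be recomputed from the per-cycle energies and maximum clock rates in the example. The sketch below encodes those figures in a table; the `MODES` dictionary and `evaluate` function are illustrative names.

```python
# Supply voltage -> (energy per cycle in nJ, maximum clock in MHz), from the example.
MODES = {5.0: (40, 50), 4.0: (25, 40), 2.5: (10, 25)}


def evaluate(schedule):
    """Evaluate a schedule given as a list of (vdd, mega_cycles) segments.

    Returns (energy in joules, time in seconds), running each segment at the
    maximum clock frequency allowed at its supply voltage.
    """
    # nJ/cycle * Mcycles = mJ, so divide by 1000 to get joules.
    energy = sum(MODES[v][0] * mc / 1000 for v, mc in schedule)
    # Mcycles / MHz = seconds.
    time = sum(mc / MODES[v][1] for v, mc in schedule)
    return energy, time


SCHEDULES = {
    "A": [(5.0, 1000)],
    "B": [(5.0, 750), (2.5, 250)],
    "C": [(4.0, 1000)],
}

if __name__ == "__main__":
    for name, schedule in SCHEDULES.items():
        energy, time = evaluate(schedule)
        print(f"({name}) {energy:.1f} J in {time:.0f} s")
```

Schedule (A) finishes early but spends the most energy; (C) uses the full 25 sec at a single intermediate voltage and spends the least of the three.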
Reduce Power Further by Buffering
Two samples processed every two sample periods-> increased latency
Example of Buffering
[Figure: without buffering, Blocks 1 to 4 are processed one per sample period Tsample with Vdd alternating 5, 2.5, 5, 2.5; with buffering, Blocks 1,2 and Blocks 3,4 are processed in pairs at Vdd = 3.75]
Without buffering: $E_{sample} = \frac{C(5)^2 + \frac{1}{2}C(2.5)^2}{2} \approx 14C$
With buffering: $E_{sample} = \frac{2(0.75C)(3.75)^2}{2} \approx 10.5C$
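The two per-sample energies can be checked numerically. The sketch assumes, as reconstructed from the slide, that one sample switches capacitance C at 5 V and the next switches C/2 at 2.5 V, while the buffered version averages the workload to 0.75 C per sample at 3.75 V; the variable names are illustrative.

```python
def energy(c, vdd):
    """Switching energy C * Vdd^2 for switched capacitance c (in units of C)."""
    return c * vdd ** 2


# Without buffering: alternate a full-workload sample at 5 V and a
# half-workload sample at 2.5 V, averaged over the two samples.
UNBUFFERED = (energy(1.0, 5.0) + energy(0.5, 2.5)) / 2

# With buffering: two samples share the average workload at 3.75 V.
BUFFERED = 2 * energy(0.75, 3.75) / 2

if __name__ == "__main__":
    print(f"unbuffered: {UNBUFFERED:.2f} C per sample")
    print(f"buffered:   {BUFFERED:.2f} C per sample")
```

Averaging the workload over two samples lets the supply sit at a single intermediate voltage, which costs less than alternating between a high and a low voltage because energy grows with the square of Vdd.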
Voltage Island Concept
• Trade off power for delay by running functional blocks at different voltages
• Can use mix of Low and High Vt to balance performance and leakage
• Switch off inactive blocks to reduce leakage power
• Requires IP standards for power management, clock gating, etc.
[Figure: delay (ps, 0 to 30) versus voltage (Vdd, 0.7 to 1.3) for Std. Vt and Low Vt devices]
E.g.: Telecom ASIC with 1.0/1.2 V islands saved 16% active power and 50% standby power
[Figure: power management unit controlling switches on the supplies (Vddo, Vdd1, Vdd2) of logic blocks, including Low VT logic, and of IP1 and IP2. Source: Bergamaschi]
*Slide from Prof Kyung of KAIST
Power Management
[Figure: chip floorplan with multiple voltage islands: memory arrays (Vdd 3 with low Vt device arrays, Vdd 4 with high Vt device arrays, optimized for low active power), microcontroller (Vdd 2), DSP (Vdd 2), ROM (Vdd 1), monitor logic (Vdd 4), analog (Vdd 5), RLM 1 to 3, surrounded by I/O's, VReg, Gnd]
• Independently controlled domain power switches
• Multiple On-Chip Voltage Islands
• On-Chip Voltage Regulators
*Slide from Prof Kyung of KAIST
Controlling VDD and VTH for low power
              Active       Stand-by
Multiple VTH  Dual-VTH     MTCMOS
Variable VTH  VTH hopping  VTCMOS
Multiple VDD  Dual-VDD     Boosted gate MOS
Variable VDD  VDD hopping
Software-hardware cooperation
Technology-circuit cooperation
MTCMOS: Multi-Threshold CMOS; VTCMOS: Variable Threshold CMOS
Multiple: spatial assignment; Variable: temporal assignment
*Slide from Prof Kyung of KAIST
RTL-level optimization - Reducing effective switching activity
• General Principle: Avoid Waste.
– Application-specific processing
– Resource sharing/Locality of reference
– Data representations
– Preservation of Data correlations
– architectural restructuring
– Distributed processing
– Demand-driven/data-driven computation
Application Specific Processing
Application specific processing reduces "implementation overhead"
Eliminating Redundant Computation
• Dynamically vary the number of operations per sample.
• Trade power consumption and filter quality [Ludwig-96]
[Figure: chain of filter stages clocked at fs producing Out, with some stages marked Power Down]
Eliminating Redundant Computation
Switched capacitance reduction ~= (peak number of operations) / (average number of operations) ~= 2
Strong function of signal statistics
Reducing switching activity
• Multiplexing multiple operations on a single hardware unit can have detrimental effect on the power consumption, because the switching activity may be increased, e.g. shared bus
Bus Multiplexing
• Buses are a significant source of power dissipation due to high switching activities and large capacitive loading
– 15% of total power in Alpha 21064
– 30% of total power in Intel 80386
• Share long data buses with time multiplexing (S1 uses even cycles, S2 odd)
[Figure: sources S1 and S2 connected to destinations D1 and D2, first over dedicated buses and then over one shared, time-multiplexed bus]
• But what if data samples are correlated (e.g., sign bits)?
Reducing the Effective Capacitance
• Circuit and Logic Style - select a circuit style with low capacitance and/or switching activity, e.g. 8-bit adder
Reducing Glitching Activity
• Some circuit structures can be the cause of spurious transients, e.g. a 16-bit ripple carry adder
• Glitches can be reduced by selecting structures that have balanced signal paths, e.g. a tree.
• The Brent-Kung lookahead adder and the Wallace tree multiplier both have this property and are thus more attractive for low power.
Data Representation
Two’s complement Sign Magnitude
• Sign-extension activity significantly reduced using sign magnitude representation.
• An accumulator example: sign magnitude datapath switches 30% less capacitance for uniformly distributed inputs
Data Representation – Accumulator Example
Sign magnitude datapath switches 30% less capacitance for uniformly distributed inputs
Two’s Complement vs. Sign-Magnitude
Two’s complement datapath has a significantly higher switching activity
Bus encoding to reduce switching activity
• Minimizing temporal bit transition activity by data representation
• Bit encoding:
– Active high encoding
• high-level voltage for 1, low-level voltage for 0
– Transition-based encoding
• voltage change identifies logic 1, no voltage change identifies logic 0
• Word encoding
– assign patterns of 1's and 0's to each word of information
– Non-redundant codes vs. redundant codes
• Example of low power coding
– Limited-weight code
– Gray code
– One-hot code
– Bus-invert code
Example of Bit Encoding
• Reducing the average number of transitions on a data bus
• Transition-based encoding may limit the number of transitions for non-equiprobable input lines
• Let p(0) > p(1) and assume no temporal correlation exists. If active-high encoding is used, the average number of transitions is
$p(0)p(1) + p(1)p(0) = 2(1 - p(1))p(1)$
• For transition-based encoding, it is simply p(1). If p(1) << p(0), transition-based is better than active-high encoding.
• To reduce transitions, the input patterns are transformed in such a way that the p(0) and p(1) probabilities of each input line become as different as possible, and a transition-based encoding is then applied before the data is transmitted.
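The two transition rates can be compared both in closed form and by simulation on a random bit stream. This is a minimal sketch assuming temporally uncorrelated bits; the function names, the seed, and the stream length are illustrative.

```python
import random


def active_high_rate(p1):
    """Expected transitions per cycle when the wire carries the bit itself."""
    return 2 * p1 * (1 - p1)


def transition_based_rate(p1):
    """Expected transitions per cycle when the wire toggles on every 1."""
    return p1


def simulate(p1, n=100_000, seed=0):
    """Measure both rates on a random, temporally uncorrelated bit stream."""
    rng = random.Random(seed)
    bits = [1 if rng.random() < p1 else 0 for _ in range(n)]
    # Active-high: a transition occurs whenever consecutive bits differ.
    active = sum(a != b for a, b in zip(bits, bits[1:])) / (n - 1)
    # Transition-based: a transition occurs whenever the current bit is 1.
    toggles = sum(bits[1:]) / (n - 1)
    return active, toggles


if __name__ == "__main__":
    p1 = 0.1
    print("closed form:", active_high_rate(p1), transition_based_rate(p1))
    print("simulated:  ", simulate(p1))
```

For p(1) = 0.1 the active-high line toggles about 18% of the time and the transition-based line about 10%; under this model the transition-based code wins whenever p(1) < 0.5.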
Word encoding
• One-hot coding
• Gray coding - good for sequential data, e.g. addressing for microprocessor [Su-94]
• Disadvantage of Gray code - only good for address bus, not for data bus; additional conversion circuitry is needed
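The benefit of Gray coding on sequential addresses can be counted directly. The sketch below compares total bit transitions for an 8-bit address counter in plain binary and in reflected Gray code; the bus width and helper names are illustrative choices.

```python
def to_gray(n):
    """Convert a non-negative integer to its reflected Gray code."""
    return n ^ (n >> 1)


def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def total_transitions(codes):
    """Total bit transitions when the codes are sent one after another."""
    return sum(hamming(a, b) for a, b in zip(codes, codes[1:]))


BITS = 8
ADDRESSES = list(range(2 ** BITS))
BINARY_TOGGLES = total_transitions(ADDRESSES)
GRAY_TOGGLES = total_transitions([to_gray(a) for a in ADDRESSES])

if __name__ == "__main__":
    print("binary:", BINARY_TOGGLES, "gray:", GRAY_TOGGLES)
```

Counting through all 256 addresses costs 502 bit transitions in binary and 255 in Gray code (exactly one per increment), which is why the code suits address buses but offers no such guarantee for arbitrary data.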
Word encoding (cont.)
• Bus-invert encoding - use redundancy to save power.
• Given an n-bit data bus with $2^n$ patterns to be represented, assume all the patterns are equiprobable with no temporal correlation, so p(0) = p(1) = 0.5. The average number of transitions is n/2 per cycle, while the worst case is n.
• Bus-invert coding - add an extra line, I, to the bus, and compare consecutive patterns before transmission. Two cases:
– If the Hamming distance between the two patterns <= n/2, the current pattern is transmitted as it is and I is set to 0.
– If the Hamming distance between the two patterns > n/2, the current pattern is first inverted and then transmitted, and I is set to 1.
• Max. transition is limited to n/2 and average transition is reduced by 25%
• Drawback - extra line I to indicate whether a pattern has been inverted.
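The two cases of bus-invert coding translate into a short encoder and decoder. This sketch assumes the comparison is made against the value currently on the data lines (the previously transmitted, possibly inverted word) and that the bus starts at all zeros; both are assumptions, as are the function names.

```python
import random


def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, n):
    """Return (bus_value, invert_bit) pairs for a sequence of n-bit words."""
    mask = (1 << n) - 1
    prev_bus = 0  # assumed initial state of the data lines
    encoded = []
    for w in words:
        if hamming(prev_bus, w) > n / 2:
            bus, inv = (~w) & mask, 1  # send the inverted word, I = 1
        else:
            bus, inv = w, 0            # send the word as it is, I = 0
        encoded.append((bus, inv))
        prev_bus = bus
    return encoded


def bus_invert_decode(encoded, n):
    """Recover the original words from (bus_value, invert_bit) pairs."""
    mask = (1 << n) - 1
    return [(~bus) & mask if inv else bus for bus, inv in encoded]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.randrange(256) for _ in range(10)]
    coded = bus_invert_encode(data, 8)
    print(coded)
    print(bus_invert_decode(coded, 8) == data)
```

On the data lines, no cycle can flip more than n/2 wires; the invert line I adds at most one extra transition per cycle, which is the redundancy cost noted above.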
Conditional Inversion Coding
Bus encoding for cross coupling cap.
• Cross-coupling cap dominates for very deep sub-micron technology, e.g. < 70nm
• Bus model
– Stand alone cap. Cs
– Cross coupling cap. Cc
• Stand alone switching
– Applies to a single bit line
– 0-1 transition
• Cross coupling switching
– Occurs on adjacent wires
– Four types of coupling transition
[Figure: bus model with bit lines 0 to 2, each with stand-alone capacitance Cs, and cross-coupling capacitance Cc between adjacent lines; examples of the four coupling transition types (Type 1 to Type 4) shown as H --> L and L --> H transitions on adjacent wire pairs]
Permutation Based Address code
• Rearrange the physical order of the bit lines
• Works efficiently on address bus (40%), but not on instruction bus (4%)
– Correlation is lower
– Permutation is fixed
[Figure: sender and receiver connected by reordered bit lines]
Dynamic Reconfigurable Bus Encoding Scheme for Instruction Bus
Overview of the Scheme
• Encoding: bit lines are reordered during compilation on the computer for the target processor
• Decoding information is stored as a header
• Decoding information is loaded to the LUT first
• Instruction is called
• Decoding information stored in the LUT controls the Cross_bar
• Instructions enter the Cross_bar to be decoded
[Figure: target processor with memory (MEM, bit lines B1 to Bn) on Mem_bus, a decoder (Cross_bar) controlled by a look-up table addressed from Address_bus, and a processing element containing the cache (bit lines B1 to Bm on Cache_bus) and the CPU core]
Demand/Data-driven operation
• Clocking strategy
– Gated clocking
• System Power Down
• Computing Paradigm
Logic Level Power Down Technique
• Activity driven - precomputation-based sequential logic optimization
– Selectively precompute the output logic values of the circuit one clock cycle before they are required, and use the precomputed values to reduce internal switching activity in the succeeding cycle
It is required that:
g1 = 1 -> f = 1
g2 = 1 -> f = 0
[Figure: original circuit with logic block A between registers R1 and R2 producing f; modified circuit adds precomputation logic g1 and g2 with flip-flops (FF) and a load enable (LE) signal]
An example: n-bit comparator
• This circuit compares two n-bit numbers A and B and computes the function A > B
• In general, precomputation works best when there are a small number of complex functions corresponding to the logic block A
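For the comparator, one natural choice of precomputation logic uses only the most significant bits. The slide does not spell out g1 and g2, so the definitions below (g1 = MSB of the first operand is 1 and MSB of the second is 0; g2 = the reverse) are an assumption for illustration, as are the function names and the load-enable convention.

```python
def comparator(a, b):
    """The full logic block: f = (a > b)."""
    return int(a > b)


def precomputed_compare(a, b, n):
    """Return (f, load_enable) using the MSBs as precomputation inputs.

    load_enable = 0 means the remaining n-1 bits need not be loaded, so the
    main comparator logic sees no new input activity in that cycle.
    """
    a_msb, b_msb = (a >> (n - 1)) & 1, (b >> (n - 1)) & 1
    g1 = a_msb & (1 - b_msb)   # g1 = 1 implies f = 1
    g2 = (1 - a_msb) & b_msb   # g2 = 1 implies f = 0
    if g1:
        return 1, 0
    if g2:
        return 0, 0
    # MSBs are equal: the remaining bits must be loaded and compared.
    low_mask = (1 << (n - 1)) - 1
    return comparator(a & low_mask, b & low_mask), 1


if __name__ == "__main__":
    n = 4
    pairs = [(a, b) for a in range(2 ** n) for b in range(2 ** n)]
    loads = sum(precomputed_compare(a, b, n)[1] for a, b in pairs)
    print(f"register loaded in {loads} of {len(pairs)} input combinations")
```

With uniformly distributed inputs the MSBs differ half of the time, so under these assumptions the lower-order input register is disabled in about 50% of the cycles while the output stays correct.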