ELEC 516 VLSI System Design and Design Automation, Spring 2010

Lecture 9 - Low Power Digital CMOS Design

Reading Assignment: Rabaey: Chapter 4 and 7, 11

Note: some of the figures in this slide set are adapted from the slide set of "Digital Integrated Circuits" by Rabaey, Copyright UCB 2002


Why worry about power? -- Heat Dissipation

Low power is required for portable applications, but that is not the only reason to care.

Power is a very big concern in today's advanced technologies. Power produces heat on the chip, which has to be carried off through the chip socket; this calls for expensive packaging solutions.

Motivation for low power design

• Packaging costs
• Power supply rail design
• Chip and system cooling costs
• Noise immunity and system reliability
• Battery life (in portable systems)
• Environmental concerns

– ICT equipment accounted for 10% of total US commercial energy usage in 2010 and may reach 20% by 2020

– Energy Star compliant systems

Why worry about power? -- Portability

[Figure: nominal battery capacity (Watt-hours / lb, 0 to 50) versus year (1965 to 1995) for Nickel-Cadmium, Ni-Metal Hydride, and Rechargeable Lithium batteries. Applications shown: Multimedia Terminals, Laptop Computers, Digital Cellular Telephony. Annotation: BATTERY (40+ lbs).]

Expected battery lifetime increase over next 5 years: 30-40%

Why worry about power? -- Standby Power

Drain leakage will increase as VT decreases to maintain noise margins and meet frequency demands, leading to excessive, battery-draining standby power consumption.

[Figure: standby power as a percentage of total power (0% to 50%) versus year (2000 to 2008); data labels 8KW, 1.7KW, 400W, 88W, 12W. Source: Borkar, De, Intel]

Year                   2002   2005   2008   2011   2014
Power supply Vdd (V)   1.5    1.2    0.9    0.7    0.6
Threshold VT (V)       0.4    0.4    0.35   0.3    0.25

…and phones leaky!

Power and Energy Figures of Merit

• Power consumption in Watts
– determines battery life in hours

• Peak power
– determines power and ground wiring designs
– sets packaging limits
– impacts signal noise margin and reliability analysis

• Energy efficiency in Joules
– power integrated over time, i.e., the total energy consumed

• Energy = power * delay
– Joules = Watts * seconds
– lower energy number means less power to perform a computation at the same frequency

Power versus Energy

[Figure: two plots of power (Watts) versus time, each comparing Approach 1 and Approach 2]

• Power is height of curve
– Lower power design could simply be slower

• Energy is area under curve
– Two approaches require the same energy

PDP and EDP

• Power-delay product (PDP) = $P_{av} \cdot t_p = (C_L V_{DD}^2)/2$
– PDP is the average energy consumed per switching event (Watts * sec = Joule)
– lower power design could simply be a slower design

• Energy-delay product (EDP) = $PDP \cdot t_p = P_{av} \cdot t_p^2$
– EDP is the average energy consumed multiplied by the computation time required
– takes into account that one can trade increased delay for lower energy/operation (e.g., via supply voltage scaling that increases delay, but decreases energy consumption)
– allows one to understand tradeoffs better

[Figure: normalized energy, delay, and energy-delay (0 to 15) versus Vdd (V, 0.5 to 2.5)]
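The energy/delay tradeoff can be illustrated with a small numerical sketch. It assumes energy per operation scales as Vdd^2 and delay as Vdd/(Vdd - Vt)^2 (the quadratic current model used later in this lecture). The threshold `VT = 0.5` V, the sweep range, and all function names are illustrative assumptions, not values from the slides.

```python
# Sketch: normalized energy, delay, and energy-delay product versus Vdd.
# Assumed models (not from the slide): E ~ Vdd^2, tp ~ Vdd / (Vdd - Vt)^2.

VT = 0.5  # assumed threshold voltage in volts


def energy(vdd):
    """Energy per operation, up to a constant factor (C_L dropped)."""
    return vdd ** 2


def delay(vdd, vt=VT):
    """Gate delay, up to a constant factor."""
    return vdd / (vdd - vt) ** 2


def edp(vdd, vt=VT):
    """Energy-delay product = energy * delay."""
    return energy(vdd) * delay(vdd, vt)


def sweep(start=0.6, stop=2.5, step=0.01):
    """Return the list of Vdd sample points."""
    n = int(round((stop - start) / step))
    return [round(start + i * step, 4) for i in range(n + 1)]


def best_vdd():
    """Vdd on the sweep grid that minimizes the energy-delay product."""
    return min(sweep(), key=edp)


if __name__ == "__main__":
    v = best_vdd()
    print(f"EDP is minimized near Vdd = {v:.2f} V (3 * Vt = {3 * VT:.2f} V)")
```

Under these assumed models the EDP is proportional to Vdd^3/(Vdd - Vt)^2, which has its minimum at Vdd = 3 Vt: lowering Vdd further saves energy, but the delay penalty grows faster.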

Where Does Power Go in CMOS?

• Dynamic power
– Charge-discharge current: dominant source of power ($CV^2$ per transition)
– Short circuit current (both NMOS and PMOS on during a transition): <10% of charge/discharge current if transitions are fast

• Leakage
– Subthreshold leakage (transistors not completely OFF): becoming important, 10-30% of active power in <0.18um technologies
– Diode leakage from reverse-biased source and drain diodes (negligible)
– Gate leakage (no longer negligible due to very thin gate oxide)

Dynamic Power Consumption

[Figure: CMOS inverter with input Vin, output Vout, supply Vdd, and load capacitance CL]

Energy/transition = $C_L V_{dd}^2$

Power = Energy/transition * f = $C_L V_{dd}^2 f$

Need to reduce $C_L$, $V_{dd}$, and f to reduce power.

Not a function of transistor sizes!

Dynamic Power Consumption - Revisited

Power = Energy/transition * transition rate

= $C_L V_{dd}^2 f_{0 \to 1}$

= $C_L V_{dd}^2 P_{0 \to 1} f$

= $C_{EFF} V_{dd}^2 f$

Power dissipation is data dependent: it is a function of switching activity.

$C_{EFF}$ = Effective Capacitance = $C_L P_{0 \to 1}$
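As a quick numerical check of the expression above, the following sketch evaluates the dynamic power of a single switched node. The function names and the example numbers (50 fF, 2.5 V, 100 MHz) are assumptions chosen for illustration only.

```python
# Sketch: dynamic power P = C_L * Vdd^2 * P01 * f for one switched node.

def effective_capacitance(c_load, p01):
    """C_EFF = C_L * P(0->1)."""
    return c_load * p01


def dynamic_power(c_load, vdd, freq, p01=1.0):
    """Average dynamic power in watts."""
    return effective_capacitance(c_load, p01) * vdd ** 2 * freq


if __name__ == "__main__":
    # Assumed example: 50 fF load, 2.5 V supply, 100 MHz clock, P01 = 3/16.
    p = dynamic_power(50e-15, 2.5, 100e6, p01=3 / 16)
    print(f"{p * 1e6:.2f} uW")
```

Halving `vdd` in this model cuts the result by a factor of four, while halving `freq` or `p01` only halves it.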

Power Consumption is Data Dependent

Example: Static 2 Input NOR Gate

Assume:
P(A=1) = 1/2
P(B=1) = 1/2

Then:
P(Out=1) = 1/4
P(0→1) = P(Out=0) . P(Out=1) = 3/4 x 1/4 = 3/16

CEFF = 3/16 * CL
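The same calculation can be done for any static gate by enumerating its truth table. The sketch below assumes independent inputs; the helper names (`output_one_probability`, `static_p01`, `nor2`) are illustrative and not part of the lecture.

```python
from fractions import Fraction
from itertools import product


def output_one_probability(gate, input_probs):
    """P(Out=1) for independent inputs, by enumerating the truth table."""
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(input_probs)):
        weight = Fraction(1)
        for bit, p in zip(bits, input_probs):
            weight *= p if bit else 1 - p
        if gate(*bits):
            total += weight
    return total


def static_p01(gate, input_probs):
    """P(0->1) = P(Out=0) * P(Out=1) for a static CMOS gate."""
    p1 = output_one_probability(gate, input_probs)
    return (1 - p1) * p1


def nor2(a, b):
    return int(not (a or b))


if __name__ == "__main__":
    half = Fraction(1, 2)
    print(static_p01(nor2, [half, half]))  # prints 3/16
```

Using `Fraction` keeps the results exact, so they can be compared directly with the hand calculation.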


Transition Probabilities for Basic Gates


Transition Probability of 2-input NOR Gate

Inter-signal Correlations

• Determining switching activity is complicated by the fact that signals exhibit correlation in space and time
– reconvergent fan-out

• Have to use conditional probabilities

P(Z=1) = P(B=1) . P(A=1 | B=1)

[Figure: inputs A and B (signal probability 0.5 each) feed a gate with output X; X and B reconverge at a second gate with output Z. Annotation at X: (1-0.5)(1-0.5) x (1-(1-0.5)(1-0.5)) = 3/16. Annotation at Z, labeled "Reconvergent": (1 - 3/16 x 0.5) x (3/16 x 0.5) = 0.085.]
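A small sketch shows why reconvergent fan-out breaks the independence assumption. The circuit here is hypothetical, since the gate types in the figure cannot be recovered from the transcript: X = A OR B, then Z = X AND B, with P(A=1) = P(B=1) = 1/2. Because B reaches Z both directly and through X, the two inputs of the second gate are correlated.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)


def exact_p_z():
    """Exact P(Z=1) for the assumed circuit X = A or B, Z = X and B."""
    total = Fraction(0)
    for a, b in product((0, 1), repeat=2):
        x = a or b
        z = x and b
        if z:
            total += HALF * HALF  # each (a, b) pair has probability 1/4
    return total


def naive_p_z():
    """P(Z=1) if X and B are (wrongly) treated as independent."""
    p_x = 1 - (1 - HALF) * (1 - HALF)  # P(X=1) for an OR gate
    return p_x * HALF


if __name__ == "__main__":
    print("exact:", exact_p_z(), "naive:", naive_p_z())
```

For this assumed circuit Z simplifies to B, so the exact value is P(B=1) . P(X=1 | B=1) = 1/2 x 1 = 1/2, whereas multiplying the marginal probabilities gives 3/8.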

Logic Restructuring

Logic restructuring: changing the topology of a logic network to reduce transitions

AND: $P_{0 \to 1} = P_0 \times P_1 = (1 - P_A P_B) \times P_A P_B$

[Figure: two implementations of a four-input AND with inputs A, B, C, D, each with signal probability 0.5.
Chain: intermediate nodes W and X, output F, with transition probabilities (1-0.25)*0.25 = 3/16, 7/64, and 15/256.
Tree: intermediate nodes Y and Z, output F, with transition probabilities 3/16, 3/16, and 15/256.]

Chain implementation has a lower overall switching activity than the tree implementation for random inputs

Ignores glitching effects
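The chain-versus-tree comparison can be reproduced with a few lines of code. This is a sketch assuming 2-input AND gates, independent inputs, and no glitching; the node names W, X (chain) and Y, Z (tree) and all function names are illustrative assumptions.

```python
from fractions import Fraction


def and_p1(p_a, p_b):
    """Signal probability of a 2-input AND output (independent inputs)."""
    return p_a * p_b


def p01(p1):
    """Static-gate transition probability P(0->1) = P0 * P1."""
    return (1 - p1) * p1


def chain_activity(pa, pb, pc, pd):
    """Per-node P(0->1) for W = A.B, X = W.C, F = X.D."""
    w = and_p1(pa, pb)
    x = and_p1(w, pc)
    f = and_p1(x, pd)
    return [p01(w), p01(x), p01(f)]


def tree_activity(pa, pb, pc, pd):
    """Per-node P(0->1) for Y = A.B, Z = C.D, F = Y.Z."""
    y = and_p1(pa, pb)
    z = and_p1(pc, pd)
    f = and_p1(y, z)
    return [p01(y), p01(z), p01(f)]


if __name__ == "__main__":
    h = Fraction(1, 2)
    print("chain:", sum(chain_activity(h, h, h, h)))
    print("tree: ", sum(tree_activity(h, h, h, h)))
```

With all inputs at probability 1/2, the chain totals 91/256 and the tree 111/256, which is the sense in which the chain has lower switching activity for random inputs.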

Input Ordering

Beneficial to postpone the introduction of signals with a high transition rate (signals with signal probability close to 0.5)

[Figure: two orderings of a two-gate chain computing F through intermediate node X, with input signal probabilities A = 0.5, B = 0.2, C = 0.1.
First ordering (A and B combined first, then C): transition probability at X is (1-0.5x0.2)x(0.5x0.2) = 0.09.
Second ordering (B and C combined first, then A): transition probability at X is (1-0.2x0.1)x(0.2x0.1) = 0.0196.]
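The ordering choice can be found by brute force for a small gate. The sketch below assumes a chain of two 2-input AND gates and independent inputs; `chain_cost` and `best_order` are hypothetical helper names.

```python
from itertools import permutations


def p01(p1):
    """Static-gate transition probability P(0->1) = P0 * P1."""
    return (1 - p1) * p1


def chain_cost(order):
    """Total P(0->1) over X = first.second and F = X.third (AND gates)."""
    p_x = order[0] * order[1]
    p_f = p_x * order[2]
    return p01(p_x) + p01(p_f)


def best_order(probs):
    """Input ordering (tuple of signal probabilities) with the lowest total activity."""
    return min(permutations(probs), key=chain_cost)


if __name__ == "__main__":
    # Signal probabilities from the slide: A = 0.5, B = 0.2, C = 0.1.
    print(best_order((0.5, 0.2, 0.1)))
```

The output F has the same signal probability in every ordering, so only the intermediate node X differs; the search places the 0.5 input last.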

How about Dynamic Circuits?

[Figure: precharged dynamic gate with supply VDD, transistors Mp and Me, a pull-down network (PDN) with inputs In1, In2, In3, and output Out]

Power is Only Dissipated when Out=0!

CEFF = P(Out=0) . CL

2-input NOR Gate

Example: Dynamic 2 Input NOR Gate

Assume:
P(A=1) = 1/2
P(B=1) = 1/2

Then:
P(Out=0) = 3/4
CEFF = 3/4 * CL

Switching Activity Is Always Higher in Dynamic Circuits

Transition Probabilities for Dynamic Gates

Switching Activity for Precharged Dynamic Gates

$P_{0 \to 1} = P_0$
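To compare the two logic styles side by side, the following sketch computes the transition probability of the same gate function implemented as a static gate (P0 x P1) and as a precharged dynamic gate (P0). Independent inputs are assumed and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def p_out_zero(gate, input_probs):
    """P(Out=0) for independent inputs, by truth-table enumeration."""
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(input_probs)):
        weight = Fraction(1)
        for bit, p in zip(bits, input_probs):
            weight *= p if bit else 1 - p
        if not gate(*bits):
            total += weight
    return total


def p01_dynamic(gate, input_probs):
    """Precharged dynamic gate: P(0->1) = P0."""
    return p_out_zero(gate, input_probs)


def p01_static(gate, input_probs):
    """Static gate: P(0->1) = P0 * P1."""
    p0 = p_out_zero(gate, input_probs)
    return p0 * (1 - p0)


def nor2(a, b):
    return int(not (a or b))


if __name__ == "__main__":
    half = Fraction(1, 2)
    print(p01_static(nor2, [half, half]), p01_dynamic(nor2, [half, half]))
```

Since P1 is at most 1, P0 x P1 can never exceed P0, which is why the dynamic figure is never the smaller of the two.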

Glitching in Static CMOS Networks

• Gates have a nonzero propagation delay resulting in spurious transitions or glitches (dynamic hazards)
– glitch: node exhibits multiple transitions in a single cycle before settling to the correct logic value

[Figure: inputs A and B drive a gate with output X; X and C drive a second gate with output Z. Waveforms of ABC, X, and Z are shown for the input change ABC: 101 → 000 under a unit-delay model.]
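A unit-delay simulation makes the glitch visible. The sketch below assumes both gates are 2-input NOR gates (X = NOR(A, B), Z = NOR(X, C)), which is an assumption since the gate symbols are lost in the transcript, and applies the input change ABC: 101 to 000.

```python
def nor(a, b):
    return int(not (a or b))


def simulate(steps=4):
    """Unit-delay trace of (X, Z) for X = NOR(A, B), Z = NOR(X, C) after ABC: 101 -> 000."""
    # Steady state for ABC = 101: X = NOR(1, 0) = 0, Z = NOR(0, 1) = 0.
    x, z = 0, 0
    a, b, c = 0, 0, 0  # inputs switch to 000 at t = 0
    trace = [(x, z)]
    for _ in range(steps - 1):
        # Each gate output at t + 1 depends on its inputs at t.
        x, z = nor(a, b), nor(x, c)
        trace.append((x, z))
    return trace


def transitions(values):
    """Number of value changes in a sequence."""
    return sum(1 for prev, cur in zip(values, values[1:]) if prev != cur)


if __name__ == "__main__":
    trace = simulate()
    print("X:", [x for x, _ in trace])
    print("Z:", [z for _, z in trace])
```

In this model Z starts at 0 and ends at 0, yet it makes two transitions in between: the second gate briefly sees the new C together with the stale X. Both transitions dissipate energy without contributing to the result.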

Example: Adder Circuit

[Figure: chain of adder stages Add0, Add1, Add2, ..., Add14, Add15 with carry input Cin and sum outputs S0, S1, S2, ..., S14, S15. Plot of sum output voltage (Volts, 0.0 to 4.0) versus time (ns, 0 to 10) showing Cin and several sum outputs (curve labels 2, 3, 4, 5, 6, S1, S10, S15).]

How to Cope with Glitching?

Equalize Lengths of Timing Paths Through Design

[Figure: two networks built from blocks F1, F2, F3 with signal arrival times annotated: 0, 0, 0, 0, 1, 2 in the first network and 0, 0, 0, 0, 1, 1 in the second]

Balanced Delay Paths to Reduce Glitching

Glitching is due to a mismatch in the path lengths in the logic network; if all input signals of a gate change simultaneously, no glitching occurs

So equalize the lengths of timing paths through logic

[Figure: two networks built from blocks F1, F2, F3 with signal arrival times annotated: 0, 0, 0, 0, 1, 2 in the first network and 0, 0, 0, 0, 1, 1 in the second]

Glitch Reduction by Pipelining

• Glitches depend on the logic depth of the circuit - gates deeper in the logic network are more prone to glitching
– arrival times of the gate inputs are more spread due to delay imbalances
– usually affected more by primary input switching

• Reduce logic depth by adding pipeline registers
– additional energy used by the clock and pipeline registers

[Figure: five-stage pipeline (Fetch, Decode, Execute, Memory, WriteBack) showing PC, I$, D$, Instruction, MAR, MDR, clk, and the pipeline stage isolation registers]

Short Circuit Power Consumption

Finite slope of the input signal causes a direct current path between VDD and GND for a short period of time during switching when both the NMOS and PMOS transistors are conducting.

[Figure: CMOS inverter with input Vin, output Vout, load CL, and short-circuit current Isc]

Short Circuit Current Determinants

$E_{sc} = t_{sc} V_{DD} I_{peak} P_{0 \to 1}$

$P_{sc} = t_{sc} V_{DD} I_{peak} f_{0 \to 1}$

• Duration and slope of the input signal, $t_{sc}$

• $I_{peak}$ determined by
– the saturation current of the P and N transistors, which depends on their sizes, process technology, temperature, etc.
– strong function of the ratio between input and output slopes
  • a function of CL
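The two expressions translate directly into code. In this sketch the function names and the example numbers (100 ps overlap, 2.5 V, 100 uA peak current, 100 MHz transition rate) are assumptions for illustration; real values of the overlap time and peak current come from circuit simulation.

```python
def short_circuit_energy(t_sc, vdd, i_peak, p01):
    """E_sc = t_sc * V_DD * I_peak * P(0->1), in joules."""
    return t_sc * vdd * i_peak * p01


def short_circuit_power(t_sc, vdd, i_peak, f01):
    """P_sc = t_sc * V_DD * I_peak * f(0->1), in watts."""
    return t_sc * vdd * i_peak * f01


if __name__ == "__main__":
    # Assumed example values, not taken from the slides.
    p = short_circuit_power(t_sc=100e-12, vdd=2.5, i_peak=100e-6, f01=100e6)
    print(f"P_sc = {p * 1e6:.2f} uW")
```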

Impact of CL on Psc

Large capacitive load: output fall time significantly larger than input rise time (Isc ≈ 0).

Small capacitive load: output fall time substantially smaller than the input rise time (Isc ≈ Imax).

[Figure: two CMOS inverters (Vin, Vout, CL) illustrating the two cases]

Ipeak as a Function of CL

[Figure: Ipeak (A, x 10^-4, from -0.5 to 2.5) versus time (sec, x 10^-10, from 0 to 6) for CL = 20 fF, CL = 100 fF, and CL = 500 fF, with a 500 psec input slope]

When load capacitance is small, Ipeak is large.

Short circuit dissipation is minimized by matching the rise/fall times of the input and output signals - slope engineering.

Static Power Consumption

[Figure: gate with input Vin = 5V, supply Vdd, output Vout, load CL, and static current Istat]

Pstat = P(In=1) . Vdd . Istat

• Dominates over dynamic consumption

• Not a function of switching frequency

Leakage (Static) Power Consumption

[Figure: inverter with supply VDD, output Vout, and leakage current Ileakage, showing its components: drain junction leakage, sub-threshold current, gate leakage]

Sub-threshold current is the dominant factor.

All increase exponentially with temperature!

Power consideration – leakage current

$$I_{sub} = K\, e^{q(V_{gs} - V_{th})/(nkT)} \left(1 - e^{-qV_{ds}/(kT)}\right)$$

K: technology constant; q: electronic charge; k: Boltzmann constant; n: nonlinearity constant (between 1 and 2); T: temperature
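The subthreshold expression can be evaluated numerically to see how steeply the current depends on the gate voltage. This is a sketch: the technology constant `k_tech`, the default n = 1.5, and the 300 K temperature are assumed values for illustration, and the function names are not from the lecture.

```python
import math

Q = 1.602176634e-19   # electronic charge, C
K_B = 1.380649e-23    # Boltzmann constant, J/K


def thermal_voltage(temp_k):
    """kT/q in volts."""
    return K_B * temp_k / Q


def i_sub(vgs, vds, vth, n=1.5, temp_k=300.0, k_tech=1e-6):
    """K * exp((Vgs - Vth) / (n kT/q)) * (1 - exp(-Vds / (kT/q)))."""
    vt = thermal_voltage(temp_k)
    return k_tech * math.exp((vgs - vth) / (n * vt)) * (1.0 - math.exp(-vds / vt))


def subthreshold_slope(n=1.5, temp_k=300.0):
    """Gate-voltage change (volts) per decade of subthreshold current."""
    return n * thermal_voltage(temp_k) * math.log(10.0)


if __name__ == "__main__":
    s = subthreshold_slope()
    print(f"slope = {s * 1e3:.1f} mV/decade")
    print(f"I_sub at Vgs = 0, Vth = 0.4 V: {i_sub(0.0, 1.0, 0.4):.3e} A")
```

With the assumed n = 1.5 at room temperature the slope comes out close to 90 mV per decade, so each such step in Vth (or Vgs) changes the leakage by about a factor of ten.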

Leakage as a Function of VT

Continued scaling of supply voltage and the subsequent scaling of threshold voltage will make subthreshold conduction a dominant component of power dissipation.

[Figure: ID (A, log scale from 10^-12 to 10^-2) versus VGS (V, 0 to 1) for VT = 0.4V and VT = 0.1V]

A 90mV/decade VT roll-off - so each 270mV increase in VT gives 3 orders of magnitude reduction in leakage (but adversely affects performance)
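A one-line model relates a threshold increase to the resulting leakage reduction. The sketch assumes a constant subthreshold slope of 90 mV per decade; `leakage_ratio` is a hypothetical helper name.

```python
def leakage_ratio(delta_vt_mv, slope_mv_per_decade=90.0):
    """Factor by which leakage drops when VT is raised by delta_vt_mv."""
    return 10.0 ** (delta_vt_mv / slope_mv_per_decade)


if __name__ == "__main__":
    for delta in (90, 180, 270):
        print(f"+{delta} mV in VT -> leakage reduced by {leakage_ratio(delta):.0f}x")
```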

Sub-Threshold in MOS

[Figure: ID versus VGS for curves labeled VT=0.6 and VT=0.2]

Lower Bound on Threshold to Prevent Leakage

Low Power Design Space

• The dynamic power consumption equation reveals the three degrees of freedom inherent in the low power design space:
– Voltage
– Physical capacitance
– Data activity

• Optimization for power entails an attempt to reduce one or more of these factors. Interactions among these factors complicate the optimization problem.

• Deep sub-micron design - need to minimize leakage and sub-threshold current

Power Reduction Strategy (I)

• Voltage Reduction

– 5V -> 3.3V -> 2.5V -> 1.8V -> 1.0V

– Mixed supplies in system and/or on chip, by using the minimum voltage for different chips or functions within a chip, together with on-chip voltage converters if required.

• Low-voltage circuit techniques are required to give good performance even with low voltages

• Less noisy structures and better signal integrity handling are required

• Lower Vth process is required to maintain good transistor speed performance

Reducing Vdd

$P \times t_d = E_t = C_L V_{dd}^2$

$$\frac{E(V_{dd}=2)}{E(V_{dd}=5)} = \frac{C_L \cdot 2^2}{C_L \cdot 5^2}$$

$E(V_{dd}=2) \approx 0.16\, E(V_{dd}=5)$

Strong function of voltage ($V^2$ dependence).

Relatively independent of logic function and style.

[Figure: normalized power-delay product (log scale, 0.03 to 1.5) versus Vdd (volts, 1 to 5) for a 51 stage ring oscillator and an 8-bit adder, showing quadratic dependence]

Power Delay Product Improves with lowering VDD.

Lower Vdd Increases Delay

$T_d = \dfrac{C_L V_{dd}}{I}$, with $I \sim (V_{dd} - V_t)^2$

$$\frac{T_d(V_{dd}=2)}{T_d(V_{dd}=5)} = \frac{(2)(5 - 0.7)^2}{(5)(2 - 0.7)^2} \approx 4$$

Relatively independent of logic function and style.

[Figure: normalized delay (1.00 to 7.50) versus Vdd (volts, 2.00 to 6.00) for an adder (SPICE), microcoded DSP chip, multiplier, adder, ring oscillator, and clock generator; 2.0 µm technology]
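The energy and delay ratios for a 5 V to 2 V supply change can be checked with a short sketch. It uses the lecture's models (E proportional to Vdd^2, Td proportional to Vdd/(Vdd - Vt)^2) with Vt = 0.7 V as in the worked ratio; the function names are illustrative.

```python
def energy_ratio(v_low, v_high):
    """E(v_low) / E(v_high) for E = C_L * Vdd^2 (C_L cancels)."""
    return (v_low / v_high) ** 2


def delay(vdd, vt=0.7):
    """Td ~ Vdd / (Vdd - Vt)^2, up to a constant factor."""
    return vdd / (vdd - vt) ** 2


def delay_ratio(v_low, v_high, vt=0.7):
    """Td(v_low) / Td(v_high)."""
    return delay(v_low, vt) / delay(v_high, vt)


if __name__ == "__main__":
    print(f"energy ratio: {energy_ratio(2.0, 5.0):.2f}")
    print(f"delay ratio:  {delay_ratio(2.0, 5.0):.2f}")
```

The energy drops to 0.16 of its 5 V value while the delay grows by a factor a little above 4, which is the tradeoff the two slides describe.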

Lowering the Threshold

Reduces the Speed Loss, But Increases Leakage

[Figure: ID versus VGS for Vt = 0.2 and Vt = 0; delay versus Vdd, with 2Vt marked on the Vdd axis]

Interesting Design Approach: DESIGN FOR PLeakage == PDynamic


What Threshold Voltage to Use?

• Energy vs. Vt for a fixed throughput

• An “optimal” Vt/Vdd point trades switching and leakage energy

Stack Effect

• Leakage is a function of the circuit topology and the value of the inputs

$$V_T = V_{T0} + \gamma\left(\sqrt{|-2\phi_F + V_{SB}|} - \sqrt{|-2\phi_F|}\right)$$

where $V_{T0}$ is the threshold voltage at $V_{SB} = 0$; $V_{SB}$ is the source-bulk (substrate) voltage; $\gamma$ is the body-effect coefficient

[Figure: two-input gate with inputs A and B, output Out, and internal node VX between the stacked transistors]

A   B   VX            ISUB
0   0   VT ln(1+n)    VGS=VBS= -VX
0   1   0             VGS=VBS=0
1   0   VDD-VT        VGS=VBS=0
1   1   0             VSG=VSB=0

• Leakage is least when A = B = 0
• Leakage reduction due to stacked transistors is called the stack effect


Leakage as a Function of Design Time VT

• Reducing the VT increases the sub-threshold leakage current (exponentially)

– 90mV reduction in VT increases leakage by an order of magnitude

• But, reducing VT decreases gate delay (increases performance)

[Figure: ID (A) versus VGS (V, 0 to 1) for VT = 0.4V and VT = 0.1V]

• Determine the critical path(s) at design time and use low VT devices on the transistors on those paths for speed. Use a high VT on the other logic for leakage control.

– A careful assignment of VT’s can reduce the leakage by as much as 80%

Dual-Thresholds Inside a Logic Block

• Minimum energy consumption is achieved if all logic paths are critical (have the same delay)

• Use lower threshold on timing-critical paths
– Assignment can be done on a per gate or transistor basis; no clustering of the logic is needed
– No level converters are needed

Variable VT (ABB) at Run Time

• $V_T = V_{T0} + \gamma\left(\sqrt{|-2\phi_F + V_{SB}|} - \sqrt{|-2\phi_F|}\right)$

[Figure: VT (V, 0.4 to 0.9) versus VSB (V, -2.5 to 0)]

A negative bias on VSB causes VT to increase

Adjusting the substrate bias at run time is called adaptive body-biasing (ABB)

Requires a dual well fab process

For an n-channel device, the substrate is normally tied to ground (VSB = 0)
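The body-effect formula can be turned into a small function to see how much the threshold moves with substrate bias. This is a sketch under stated assumptions: the parameter values (VT0 = 0.43 V, gamma = 0.4 V^0.5, phi_F = -0.3 V) are typical textbook-style numbers rather than values from the slide, and `v_sb` is taken as the source-to-bulk voltage, positive for reverse body bias of an NMOS device (equivalently, a negative bulk-to-source voltage).

```python
import math


def threshold_voltage(v_sb, vt0=0.43, gamma=0.4, phi_f=-0.3):
    """V_T = V_T0 + gamma * (sqrt(|-2*phi_F + V_SB|) - sqrt(|-2*phi_F|))."""
    return vt0 + gamma * (
        math.sqrt(abs(-2 * phi_f + v_sb)) - math.sqrt(abs(-2 * phi_f))
    )


if __name__ == "__main__":
    for bias in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
        print(f"reverse bias {bias:.1f} V -> VT = {threshold_voltage(bias):.3f} V")
```

With these assumed parameters the threshold rises from 0.43 V at zero bias to roughly 0.82 V at 2.5 V of reverse bias, the kind of range that adaptive body-biasing exploits to cut standby leakage.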

Techniques for Burst Mode Computation

Multiple VT Technology (disable high VT devices during idle periods), e.g. [Sakata-93], [Mutoh-93]

Substrate Bias Controlled Variable VT Devices (increase VT during idle periods), e.g. [Seta-95]

[Figure: left, a low VT logic block between high VT transistors controlled by SLEEP signals; right, an inverter (Vin, Vout, Vdd) whose substrate bias voltages Vp > 0 and Vn < 0 are switched between on and standby]

Multiple-Threshold Circuits

• Previous approaches cannot reduce leakage power during active mode.

• Use dual Vt CMOS logic, low Vt for critical paths and high Vt for non-critical paths; problem: large threshold swing

• Triple Vt CMOS circuit to reduce the sub-threshold swing [Fujii et al., ISSCC-98]
– For high speed, low power active mode, low and medium Vt are used for critical and non-critical paths, respectively.
– For standby mode, a high Vt MOSFET is inserted between the supply rail and the virtual supply rail.


Multiple-Threshold Circuits

Power considerations – reduce leakage

• Leakage proportional to device width
– Use smallest devices for non-critical paths.

• Leakage drops with stacked devices (drain voltage divider)
– Use stacked transistors for non-critical paths.

• Leakage drops with increasing channel length
– Slightly increase L for non-critical paths.

• Use dual VT process providing two threshold voltages
– Use high VT transistors for non-critical paths.

Power consideration – reduce leakage

• Switch off critical path transistors when not needed.

• Stand-by mode between supply and virtual supply lines

• Stand-by vectors
– Apply input vector which minimizes leakage.
– Achieved using Mux

Power Gating Transistor Sizing

• The effect of power gating transistor size
– As the size decreases, logic performance also decreases.
– As the size increases, leakage current and chip area also increase.
– Proper sizing is very important.
– Power gating transistor size should be chosen so that performance degradation stays within 2%.

Vop = VDD - ΔV

ΔV must be kept small enough for performance degradation to stay within 2%.

[Figure: low Vt logic between VDD and GND, power-gated by a high Vt switch with a control input]

Power Reduction Strategy (II)

• Reducing capacitance
– Process scaling and better integration, with smaller capacitances in more aggressive processes
– Improved devices and interconnect technology
– Efficient clock generation and distribution
– Good memory hierarchy
– In-place optimization using a library containing ranges of gates with different strengths, through replacement of gates to use the optimum drive in the critical paths and minimum drive elsewhere

Reducing Effective Capacitance

[Figure: global bus architecture versus local bus architecture]

Shared Resources incur Switching Overhead

Power consideration – Reduce Capacitance

• Reduce switched capacitance C
– Careful transistor sizing, transistor ordering, tighter and more compact layout
– Hierarchical architecture; add transmission gates (TG) to isolate buses
– Segmented structures

[Figure: shared bus driven by A or B when sending values to C; a switch is inserted to isolate the bus segment when B is sending to C]

Power Reduction Strategy (III)

• Reducing activities
– Lowering operating frequency
– Using power management strategies, such as gated clocks and power-down of non-operational units
– Reduce switching activities

Power = Energy/transition * transition rate

$$P = C_L V_{dd}^2 f_{0 \to 1} = C_L V_{dd}^2 P_{0 \to 1} f = C_{eff} V_{dd}^2 f$$

• Power dissipation is data dependent and hence is a function of switching activity.

• $P_{0 \to 1}$ is the switching probability

• Effective switching capacitance = $C_{eff} = C_L P_{0 \to 1}$

Factors Affecting Power Consumption - Revisited

• Degrees of freedom for low power design space:
– Voltage
– Physical capacitance
– Data activity

• Power minimization approaches:
– Run at minimum allowable voltage
– Minimize effective switching capacitances

System Level - Power Down Techniques

Operating States:
• Active or Full-On (fastest clock)
• Standby (slow clock)
• Suspend or Sleep (slow clock or shut down)

[Figure: microprocessor with an activity analyzer]

Dynamic Power as a Function of VDD

• Decreasing the VDD decreases dynamic energy consumption (quadratically)

• But, increases gate delay (decreases performance)

[Figure: normalized tp (1 to 5.5) versus VDD (V, 0.8 to 2.4)]

• Determine the critical path(s) at design time and use high VDD for the transistors on those paths for speed. Use a lower VDD on the other gates, especially those that drive large capacitances (as this yields the largest energy benefits).

Multiple VDD Considerations

• How many VDD? Two is becoming common
– Many chips already have two supplies (one for core and one for I/O)

• When combining multiple supplies, level converters are required whenever a module at the lower supply drives a gate at the higher supply (step-up)
– If a gate supplied with VDDL drives a gate at VDDH, the PMOS never turns off
  • The cross-coupled PMOS transistors do the level conversion
  • The NMOS transistors operate on a reduced supply
– Level converters are not needed for a step-down change in voltage
– Overhead of level converters can be mitigated by doing conversions at register boundaries and embedding the level conversion inside the flipflop

[Figure: level converter with supplies VDDH and VDDL, input Vin, and output Vout]

Dual-Supply Inside a Logic Block

• Minimum energy consumption is achieved if all logic paths are critical (have the same delay)

• Clustered voltage-scaling
– Each path starts with VDDH and switches to VDDL (gray logic gates) when delay slack is available
– Level conversion is done in the flipflops at the end of the paths


Power Conscious Behavioral Design

• Run functional units at the minimum allowed voltage while satisfying the timing constraints

• Parallelize or pipeline data-path, memory and controllers to compensate for throughput loss due to reduced supply voltage

• Power down functional units which are not in use; put in "dynamic power management" capability

• Avoid centralized resources (controllers, functional blocks, global busses, etc.) as much as possible

• Map functions to hardware so that inter-chip communication is minimized

• Schedule and bind operations to functional units so as to reduce the activity of the input operands

• Reorder operands to reduce switching activity; Keep the inputs to an idle unit unchanged


Low Power Datapath Design - reducing the supply voltage

• Reducing supply voltage has quadratic effect on power saving, but a negative effect on performance.

• Performance can be gained back by logical and architectural optimizations, e.g. lookahead adder instead of ripple-carry adder, using parallelism to increase performance

Page 62: ELEC516/10 Lecture 9 1 ELEC 516 VLSI System Design and Design Automation Spring 2010 Lecture 9 - Low Power Digital CMOS Design Reading Assignment: Rabaey:

ELEC516/10 Lecture 965

Power Considerations – reduce f and VDD

• Reducing frequency does not save energy; it only reduces the rate at which energy is consumed

– Power is lower, but the system must run longer

• Reducing supply voltage is very effective (scaling the voltage by a factor of 0.5 scales the energy per transition by a factor of 0.25).

• Dropping the voltage reduces speed (the performance must be recovered using parallelism).

• Trade surplus performance for lower energy by reducing the supply voltage until performance is just as required
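The frequency and voltage bullets can be checked with a minimal sketch, assuming the usual switching model in which every clock cycle dissipates C·VDD² (so P = C·VDD²·f). The capacitance and cycle count below are hypothetical values chosen only for illustration.

```python
def dynamic_energy(c_switched, vdd, cycles):
    """Switching energy in joules for a task of `cycles` clock cycles."""
    return c_switched * vdd ** 2 * cycles

def dynamic_power(c_switched, vdd, f_clk):
    """Average switching power in watts at clock frequency f_clk."""
    return c_switched * vdd ** 2 * f_clk

C = 1e-9       # hypothetical switched capacitance per cycle (F)
CYCLES = 1e6   # hypothetical task length in clock cycles

base_energy = dynamic_energy(C, 5.0, CYCLES)

# Halving f halves the power, but the task needs the same number of cycles,
# so the energy for the task is unchanged.
half_f_power_ratio = dynamic_power(C, 5.0, 20e6) / dynamic_power(C, 5.0, 40e6)
half_f_energy_ratio = dynamic_energy(C, 5.0, CYCLES) / base_energy

# Halving VDD cuts the energy per transition to one quarter.
half_v_energy_ratio = dynamic_energy(C, 2.5, CYCLES) / base_energy

print(half_f_power_ratio, half_f_energy_ratio, half_v_energy_ratio)
```

The sketch ignores leakage and short-circuit power, which would add an energy cost to running longer at a lower frequency.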


Power considerations – Dynamic voltage Scaling

Parallel and pipelined architectures can run at a lower voltage, which reduces power consumption quadratically.

– 8-bit adder/comparator (Chandrakasan et al.)
• Consumes Pref at 40 MHz, 5 V with Area = 0.53 mm²

– Two parallel interleaved architecture
• Consumes 0.36 Pref at 20 MHz, 2.9 V with Area = 1.80 mm²

– One pipelined architecture
• Consumes 0.39 Pref at 40 MHz, 2.9 V with Area = 0.69 mm²

– Pipelined and parallel
• Consumes 0.2 Pref at 20 MHz, 2.0 V with Area = 1.96 mm²


Minimizing the power consumption using parallelism

• Reference Design

Critical path = 25 ns, clock frequency = 40 MHz, supply voltage = 5 V

$P_{ref} = C_{ref} V_{ref}^2 f_{ref}$


• Parallel implementation of the reference design

New critical path = 50 ns, Cpar = 2.15 Cref, Vpar = 2.9 V (0.58 Vref), fpar = 0.5 fref

$P_{par} = C_{par} V_{par}^2 f_{par} = (2.15\,C_{ref})(0.58\,V_{ref})^2 \left(\frac{f_{ref}}{2}\right) \approx 0.36\,P_{ref}$


Pipelined Data Path

• Critical path delay is shorter: max[Tadder, Tcomparator].

• Keeping the clock rate constant (fpipe = fref), the voltage can be dropped to Vpipe = Vref/1.7 while maintaining the original throughput

• Capacitance is slightly higher: Cpipe = 1.15 Cref

• $P_{pipe} = (1.15\,C_{ref})(V_{ref}/1.7)^2 f_{ref} \approx 0.39\,P_{ref}$
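The parallel and pipelined power figures both follow from P = C·V²·f with every quantity expressed relative to the reference design. The sketch below reuses the ratios quoted in the slides; the function name is only illustrative.

```python
def relative_power(c_ratio, v_ratio, f_ratio):
    """P / P_ref for P = C * V^2 * f, each argument given relative to the reference."""
    return c_ratio * v_ratio ** 2 * f_ratio

# Parallel datapath: 2.15x capacitance, 2.9 V instead of 5 V, half the clock rate.
p_parallel = relative_power(2.15, 2.9 / 5.0, 0.5)

# Pipelined datapath: 1.15x capacitance, Vref / 1.7, same clock rate.
p_pipeline = relative_power(1.15, 1 / 1.7, 1.0)

print(f"parallel:  {p_parallel:.3f} P_ref")
print(f"pipelined: {p_pipeline:.3f} P_ref")
```

With these inputs the pipelined ratio evaluates to just under 0.40, which the slides quote as 0.39.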


How low a voltage can be used

• Capacitance overhead starts to dominate at “high” levels of parallelism and results in an optimum voltage.


Voltage as a design variable

Adapting Voltage to Workload yields cubic reduction
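One rough way to see where the cubic comes from: switching power is P = C·V²·f, and if the maximum clock frequency is assumed to scale linearly with V (a simplification that ignores the threshold voltage), lowering f to match the workload and lowering V along with it makes P proportional to f³. The sketch below compares that with slowing the clock alone; both function names are illustrative.

```python
def power_fixed_voltage(workload):
    """Relative power when only the clock is slowed and V stays at its maximum."""
    return workload                 # P ~ V^2 * f with V = 1

def power_scaled_voltage(workload):
    """Relative power when V is lowered together with f (assumes f proportional to V)."""
    vdd = workload                  # assumption: V tracks the required frequency
    return vdd ** 2 * workload

for w in (1.0, 0.5, 0.25):
    print(w, power_fixed_voltage(w), power_scaled_voltage(w))
```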


Multiple supply voltages – Filter Example

[Figure: filter data-flow graph scheduled over steps 1 to 10; each multiply (*) and add (+) operation is assigned to the 5V, 3V or 2.4V supply]

Power (5V)/Power(5V,3V,2.4V) = 1.5 [Raje95]


Using Multiple Voltages

[Figure: blocks C1, C2 and C3 supplied from Vdd1, Vdd2 and Vdd3; a second diagram labels the critical path and the non-critical paths with Vdd1 and Vdd2] [Horowitz-95]


Processor Usage Model

• System Optimizations:

– Maximize Peak Throughput

– Minimize Average Energy/operation

– Maximize computation per battery life

[Figure: desired throughput vs. time. Labels: compute-intensive and low-latency processes; ceiling set by the top speed of the processor; single-user system not always computing; background and high-latency processes]


Scale Supply Voltage with fclk


Dynamic Voltage Scaling Implementation

• VCO: ring oscillator which matches the µP critical path

• DC-DC: performs D/A conversion, converts the battery voltage to a regulated Vdd

• Provides both voltage regulation and clock generation.

[Figure: control loop. Labels: frequency detector (inputs M·fref and fout), loop filter, DC-DC converter, battery, Vdd, VCO, µP]


Dynamic Voltage Scaling in Practice

• Fixed throughput, energy/operation:

– Throughput = 8 MIPS, Energy/op = 0.24 nJ/inst. (1.92 mW)

– Throughput = 100 MIPS, Energy/op = 2.2 nJ/inst. (220 mW)

• Occasionally demand peak throughput:

– Peak throughput = 100 MIPS, average energy/op = 0.24 nJ/inst. (~1.8 mW)

DVS is only advantageous when a majority of computation is performed at low throughput


Low-voltage Switching Regulator

• Arbitrary Vdd (< Vin) generated using the Buck converter

– Vdd = Vin × (duty cycle at node X)

• Chief sources of inefficiencies:

– Conduction loss ($I^2 R$)

– Switching loss ($C_x V_{in}^2 f_s$ and $L_s I^2 f_s$)

– Gate-drive loss ($C_g V_{in}^2 f_s$)

[Stratakos-94]


Adaptive Power Supply Voltage

• Exploit data-dependent computation times to vary the supply [Nielsen94]

[Figure: self-timed processor between REG/FIFO stages at its input and output; a control block and the power supply generate Vdd(t)]


Variable Supply Voltage Control Scheme


Voltage Scheduling for variable workload system

• Voltage scheduling under timing constraints

• Example [Ishihara-98]

– Energy consumption of a processor:
• 10 nJ/cycle at 2.5 V
• 25 nJ/cycle at 4 V
• 40 nJ/cycle at 5 V

– Maximum clock frequencies:
• 50 MHz at 5 V, 40 MHz at 4 V, 25 MHz at 2.5 V

– Given: an application needs 1000M cycles to finish and the timing constraint is 25 s.


Different Voltage Schedules

[Figure: three voltage schedules plotted over 0 to 25 s; the vertical axis is energy consumption (proportional to Vdd²) and the timing constraint is at 25 s]

(A) 1000M cycles at 50 MHz (5.0 V): 40 J

(B) 750M cycles at 50 MHz (5.0 V) plus 250M cycles at 25 MHz (2.5 V): 32.5 J

(C) 1000M cycles at 40 MHz (4.0 V): 25 J
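The energy and run time of each schedule can be recomputed from the per-cycle energies and maximum clock frequencies given in the example. This is only a sketch; the dictionary layout and function name are illustrative choices.

```python
# Vdd (V) -> (energy per cycle in J, maximum clock frequency in Hz), from the example
OPERATING_POINTS = {
    5.0: (40e-9, 50e6),
    4.0: (25e-9, 40e6),
    2.5: (10e-9, 25e6),
}

def evaluate(schedule):
    """schedule: list of (vdd, cycles). Returns (total energy in J, total time in s)."""
    energy = sum(OPERATING_POINTS[v][0] * n for v, n in schedule)
    time = sum(n / OPERATING_POINTS[v][1] for v, n in schedule)
    return energy, time

SCHEDULES = {
    "A": [(5.0, 1000e6)],
    "B": [(5.0, 750e6), (2.5, 250e6)],
    "C": [(4.0, 1000e6)],
}
DEADLINE_S = 25.0

for name, schedule in SCHEDULES.items():
    energy, time = evaluate(schedule)
    ok = time <= DEADLINE_S + 1e-9
    print(f"({name}) {energy:.1f} J in {time:.1f} s, meets deadline: {ok}")
```

Schedule A finishes 5 s early at the highest energy, while C uses the whole 25 s at one intermediate voltage and spends the least energy of the three.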


Reduce Power Further by Buffering

Two samples processed every two sample periods-> increased latency


Example of Buffering

[Figure: without buffering, Blocks 1 to 4 run alternately at Vdd = 5 and Vdd = 2.5 within each Tsample; with buffering, Blocks 1,2 and Blocks 3,4 are grouped and all run at Vdd = 3.75]

Without buffering: $E_{sample} = \frac{C(5)^2 + \frac{1}{2}C(2.5)^2}{2} \approx 14C$

With buffering: $E_{sample} = \frac{2(0.75)C(3.75)^2}{2} \approx 10.5C$


Voltage Island Concept

• Trade off power for delay by running functional blocks at different voltages

• Can use a mix of low and high Vt to balance performance and leakage

• Switch off inactive blocks to reduce leakage power

• Requires IP standards for power management, clock gating, etc.

[Figure: delay (ps, 0 to 30) vs. voltage (Vdd, 0.7 to 1.3) for Std. Vt and Low Vt devices]

E.g.: a telecom ASIC with 1.0/1.2 V islands saved 16% active power and 50% standby power

[Figure: a Power Management Unit controls switches feeding IP1 (low-VT logic) and IP2 (logic); supply labels Vddo, Vdd1, Vdd2]

Source from Bergamaschi

*Slide from Prof Kyung of KAIST


Power Management

[Figure: chip floorplan divided into voltage islands and ringed by I/Os, VReg and Gnd. Labeled blocks: memory arrays on Vdd 4 (high-Vt device arrays, optimized for low active power), memory arrays on Vdd 3 (low-Vt device arrays, optimized for low active power), microcontroller and DSP on Vdd 2, ROM on Vdd 1, monitor logic on Vdd 4, analog on Vdd 5, and RLM 1 to RLM 3]

• Independently controlled domain power switches

• Multiple on-chip voltage islands

• On-chip voltage regulators

*Slide from Prof Kyung of KAIST


Controlling VDD and VTH for low power

               Active         Stand-by
Multiple VTH   Dual-VTH       MTCMOS
Variable VTH   VTH hopping    VTCMOS
Multiple VDD   Dual-VDD       Boosted gate MOS
Variable VDD   VDD hopping

Side labels in the figure: software-hardware cooperation; technology-circuit cooperation

MTCMOS: Multi-Threshold CMOS; VTCMOS: Variable Threshold CMOS

Multiple: spatial assignment; Variable: temporal assignment

*Slide from Prof Kyung of KAIST


RTL-level optimization-Reducing effective switching activity

• General Principle: Avoid Waste.

– Application-specific processing

– Resource sharing/Locality of reference

– Data representations

– Preservation of Data correlations

– architectural restructuring

– Distributed processing

– Demand-driven/data-driven computation


Application Specific Processing

Application-specific processing reduces “implementation overhead”


Eliminating Redundant Computation

• Dynamically vary the number of operations per sample.

• Trade power consumption and filter quality [Ludwig-96]

[Figure: chain of filter stages clocked at fs feeding Out; unused stages are powered down]


Eliminating Redundant Computation

$\text{Switched capacitance reduction} \approx \frac{\text{Peak number of operations}}{\text{Average number of operations}} \approx 2$

The reduction is a strong function of signal statistics


Reducing switching activity

• Multiplexing multiple operations on a single hardware unit can have detrimental effect on the power consumption, because the switching activity may be increased, e.g. shared bus


Bus Multiplexing

• Share long data buses with time multiplexing (S1 uses even cycles, S2 odd)

[Figure: sources S1 and S2 and destinations D1 and D2, drawn once with separate connections and once with a shared bus]

• Buses are a significant source of power dissipation due to high switching activities and large capacitive loading

– 15% of total power in Alpha 21064

– 30% of total power in Intel 80386

• But what if data samples are correlated (e.g., sign bits)?


Reducing the Effective Capacitance

• Circuit and Logic Style - select a circuit style with low capacitance and/or switching activity, e.g. 8-bit adder


Reducing Glitching Activity

• Some circuit structures can cause spurious transients, e.g. a 16-bit ripple carry adder

• Glitches can be reduced by selecting structures that have balanced signal paths, e.g. a tree.

• The Brent-Kung lookahead adder and the Wallace tree multiplier both have this property, which makes them more attractive for low power.


Data Representation

Two’s complement Sign Magnitude

• Sign-extension activity significantly reduced using sign magnitude representation.

• An accumulator example: sign magnitude datapath switches 30% less capacitance for uniformly distributed inputs
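The sign-extension effect is easy to see by counting bit flips on a word that alternates between small positive and small negative values. The sketch below is only an illustration with a hypothetical sample sequence; it counts toggles on the data word itself and does not model the accumulator datapath behind the 30% figure.

```python
def twos_complement(value, bits):
    """Encode a signed integer as a two's complement bit pattern."""
    return value & ((1 << bits) - 1)

def sign_magnitude(value, bits):
    """Encode a signed integer as sign bit plus magnitude (|value| < 2**(bits-1))."""
    sign = 1 << (bits - 1) if value < 0 else 0
    return sign | abs(value)

def toggles(words):
    """Total number of bit flips between consecutive words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))

BITS = 16
samples = [3, -2, 1, -4, 2, -1, 5, -3]   # hypothetical small signal crossing zero

tc_toggles = toggles([twos_complement(s, BITS) for s in samples])
sm_toggles = toggles([sign_magnitude(s, BITS) for s in samples])
print(tc_toggles, sm_toggles)
```

Every zero crossing flips all the sign-extension bits in two's complement, but only the sign bit in sign magnitude.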


Data Representation – Accumulator Example

Sign magnitude datapath switches 30% less capacitance for uniformly distributed inputs


Two’s Complement vs. Sign-Magnitude

Two’s complement datapath has a significantly higher switching activity


Bus encoding to reduce switching activity

• Minimizing temporal bit transition activity by data representation

• Bit encoding:

– Active high encoding
• high-level voltage for 1, low-level voltage for 0

– Transition-based encoding
• voltage change identifies logic 1, no voltage change identifies logic 0

• Word encoding

– assign patterns of 1’s and 0’s to each word of information

– Non-redundant codes vs. redundant codes

• Examples of low power coding

– Limited-weight code

– Gray code

– One-hot code

– Bus-invert code


Example of Bit Encoding

• Reducing the average number of transitions on a data bus

• Transition-based encoding may limit the number of transitions for non-equiprobable input lines

• Let p(0) > p(1) and assume no temporal correlation. If active-high encoding is used, the average number of transitions per cycle is

$p(0)\,p(1) + p(1)\,p(0) = 2\,p(1)\,(1 - p(1))$

• For transition-based encoding, it is simply p(1). If p(1) << p(0), transition-based encoding is better than active-high encoding.

• To reduce transitions, the input patterns are transformed so that p(0) and p(1) of each input line become as different as possible, and transition-based encoding is then applied before the data is transmitted.
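A small sketch of the comparison between the two bit encodings, assuming independent bits on a single line. The function names and the example bit stream are illustrative only.

```python
def active_high_rate(p1):
    """Expected transitions per cycle: the line changes when consecutive bits differ."""
    return 2 * p1 * (1 - p1)

def transition_based_rate(p1):
    """Expected transitions per cycle: the line changes once for every logic 1."""
    return p1

def count_active_high(bits):
    """Transitions on the wire when the bit value is driven directly."""
    return sum(a != b for a, b in zip(bits, bits[1:]))

def count_transition_based(bits):
    """Transitions on the wire when each 1 is sent as a toggle (first bit skipped)."""
    return sum(bits[1:])

for p1 in (0.5, 0.2, 0.05):
    print(p1, active_high_rate(p1), transition_based_rate(p1))

stream = [0, 0, 1, 0, 0, 0, 1, 0]   # hypothetical stream with few 1s
print(count_active_high(stream), count_transition_based(stream))
```

With isolated 1s, active-high encoding pays for both the rising and the falling edge of each 1, while transition-based encoding pays for a single toggle.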


Word encoding

• One-hot coding

• Gray coding: good for sequential data, e.g. addressing for a microprocessor [Su-94]

• Disadvantage of Gray code: only good for the address bus, not for the data bus; additional conversion circuitry is needed
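To illustrate why Gray coding suits sequential addresses, the sketch below counts bus toggles for a 4-bit address counting from 0 to 15 in plain binary and in reflected Gray code. The helper names are illustrative.

```python
def to_gray(n):
    """Reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def toggles(words):
    """Total number of bit flips between consecutive words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))

addresses = list(range(16))                       # sequential 4-bit addresses
binary_toggles = toggles(addresses)
gray_toggles = toggles([to_gray(a) for a in addresses])
print(binary_toggles, gray_toggles)
```

Consecutive Gray codewords differ in exactly one bit, so the Gray-coded bus makes one transition per address increment. The advantage disappears for non-sequential values, which is why it does not help on a data bus.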


Word encoding (cont.)

• Bus-invert encoding: use redundancy to save power.

• Given an n-bit data bus with 2^n patterns to be represented, assume all patterns are equiprobable with no temporal correlation, so p(0) = p(1) = 0.5. The average number of transitions is n/2 per cycle, and the worst case is n.

• Bus-invert coding: add an extra line, I, to the bus and compare consecutive patterns before transmission. Two cases:

– If the Hamming distance between the two patterns is ≤ n/2, the current pattern is transmitted as it is and I is set to 0.

– If the Hamming distance between the two patterns is > n/2, the current pattern is first inverted and then transmitted, and I is set to 1.

• The maximum number of transitions is limited to n/2 and the average is reduced by up to 25%

• Drawback: an extra line I is needed to indicate whether a pattern has been inverted.
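A minimal sketch of a bus-invert encoder and decoder following the two cases above. It makes a few assumptions that the slides do not spell out: the data lines start at all zeros, the Hamming distance is measured against the value last driven on the data lines, and transitions on the invert line itself are not counted in the decision.

```python
def bus_invert_encode(words, n):
    """Return a list of (data_lines, invert_bit) for an n-bit bus."""
    mask = (1 << n) - 1
    prev = 0                       # assumed initial state of the data lines
    encoded = []
    for w in words:
        distance = bin((w ^ prev) & mask).count("1")
        if distance > n / 2:
            sent, inv = ~w & mask, 1   # invert to cut the number of toggles
        else:
            sent, inv = w & mask, 0
        encoded.append((sent, inv))
        prev = sent
    return encoded

def bus_invert_decode(encoded, n):
    """Recover the original words from (data_lines, invert_bit) pairs."""
    mask = (1 << n) - 1
    return [(~d & mask) if inv else d for d, inv in encoded]

words = [0x00, 0xFF, 0x0F, 0xF0, 0xAA]   # hypothetical 8-bit traffic
encoded = bus_invert_encode(words, 8)
print(encoded)
print(bus_invert_decode(encoded, 8) == words)
```

In this example the 0x00 to 0xFF and 0x0F to 0xF0 steps would each flip all eight data lines on an uncoded bus; with bus-invert the data lines stay put and only the invert line changes.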


Conditional Inversion Coding


Bus encoding for cross coupling cap.

• Cross-coupling cap dominates for very deep sub-micron technology, e.g. < 70nm

• Bus model

– Stand-alone cap. Cs

– Cross-coupling cap. Cc

• Stand-alone switching

– Applies to a single bit line

– 0-1 transition

• Cross-coupling switching

– Occurs on adjacent wires

– Four types of coupling transition

[Figure: Type 1 to Type 4 coupling transitions for a pair of adjacent wires, built from H --> L, L --> H and L --> L transitions and labeled 0, 1, 2, 0; bus model with bit 0, bit 1 and bit 2, showing stand-alone capacitance Cs on each line and coupling capacitance Cc between neighboring lines]


Permutation Based Address code

• Rearrange the physical order of the bit lines

• Works efficiently on the address bus (40%), but not on the instruction bus (4%)

– Correlation is lower

– Permutation is fixed

Sender Receiver


Dynamic Reconfigurable Bus Encoding Scheme for Instruction Bus


Overview of the Scheme

• Decoding information is stored as a header

• Decoding information is loaded into the LUT first

• Instruction is called

• Decoding information stored in the LUT controls the Cross_bar

• Instructions enter the Cross_bar to be decoded

[Figure: encoding and decoding flow. Encoding labels: computer, target processor, bit lines reordering, encoding during compilation, Mem. Decoding labels: target processor, MEM, Mem_bus (B1 ... Bn), processing element, decoder (Cross_bar), look-up table, Address_bus, Cache_bus (B1 ... Bm), CACHE, CPU core]


Demand/Data-driven operation

• Clocking strategy

– Gated clocking

• System Power Down

• Computing Paradigm


Logic Level Power Down Technique

• Activity driven: precomputation-based sequential logic optimization

– Selectively precompute the output logic values of the circuit one clock cycle before they are required, and use the precomputed values to reduce internal switching activity in the succeeding cycle

It is required that

g1 = 1 -> f = 1

g2 = 1 -> f = 0

[Figure: original circuit (registers R1 and R2 around logic block A, output f) and modified circuit (functions g1 and g2 with flip-flops FF produce the load enable LE for R1)]


An example: n-bit comparator

• This circuit compares two n-bit numbers A and B and computes the function A > B

• In general, precomputation works best when there are a small number of complex functions corresponding to the logic block A
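A behavioral sketch of how precomputation could apply to the comparator. It assumes the common choice of using the most significant bits as the precomputation inputs; that choice and all names below are assumptions, since the slides show the circuit only as a figure. When the MSBs differ, the result is already known, so the registers holding the low-order bits need not be loaded.

```python
def comparator_with_precomputation(a, b, n):
    """Return (f, load_enable) for f = (a > b) on n-bit unsigned inputs."""
    a_msb = (a >> (n - 1)) & 1
    b_msb = (b >> (n - 1)) & 1
    g1 = a_msb & (1 - b_msb)        # g1 = 1 implies f = 1
    g2 = (1 - a_msb) & b_msb        # g2 = 1 implies f = 0
    load_enable = 1 - (g1 | g2)     # low-order registers load only when MSBs tie
    if g1:
        return 1, load_enable
    if g2:
        return 0, load_enable
    low_mask = (1 << (n - 1)) - 1   # fall back to comparing the remaining bits
    return int((a & low_mask) > (b & low_mask)), load_enable

print(comparator_with_precomputation(0b1010, 0b0111, 4))   # MSBs differ
print(comparator_with_precomputation(0b0110, 0b0011, 4))   # MSBs tie
```

For uniformly distributed inputs the MSBs differ half of the time, so under these assumptions the low-order registers would be disabled in about half of the cycles.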