Page 1: Sill LowPower2

Micro transductors ’08
Low Power VLSI Design 2

Dr.-Ing. Frank Sill
Department of Electrical Engineering, Federal University of Minas Gerais,

Av. Antônio Carlos 6627, CEP: 31270-010, Belo Horizonte (MG), Brazil

[email protected]

http://www.cpdee.ufmg.br/~frank/

Page 2: Sill LowPower2

Micro transductors ‘08, Low Power 2, Copyright Sill, 2008

Agenda

Recap
Power reduction on
  Gate level
  Architecture level
  Algorithm level
  System level

Page 3: Sill LowPower2

Recap: Problems of Power Dissipation

Continuously increasing performance demands
Increasing power dissipation of technical devices
Today: power dissipation is a main problem

High power dissipation leads to:
  High effort for cooling
  Increasing operational costs
  Reduced reliability
  Reduced time of operation
  Higher weight (batteries)
  Reduced mobility

Page 4: Sill LowPower2

Recap: Consumption in CMOS

Water analogy:
  Voltage (Volt, V)      corresponds to   Water pressure (bar)
  Current (Ampere, A)    corresponds to   Water quantity per second (liter/s)
  Energy                 corresponds to   Amount of water

[Figure: load capacitance CL being switched between logic 0 and 1]

Energy consumption is proportional to capacitive load!

Page 5: Sill LowPower2

Recap: Energy and Power

[Figure: two plots of power (Watts) over time comparing Approach 1 and Approach 2. Power is the height of the curve; energy is the area under the curve.]

Energy = Power * time for calculation = Power * Delay

Page 6: Sill LowPower2

Recap: Power Equations in CMOS

$P = \alpha f C_L V_{DD}^2 + V_{DD} I_{peak} (P_{01} + P_{10}) + V_{DD} I_{leak}$

First term: dynamic power (≈ 40-70 % today and decreasing relatively)
Second term: short-circuit power (≈ 10 % today and decreasing absolutely)
Third term: leakage power (≈ 20-50 % today and increasing)
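The three terms of the power equation can be evaluated separately. The following is a minimal sketch; the function name and all numeric values are illustrative assumptions and are not taken from the slides. It mainly shows that the dynamic term scales with the square of VDD.

```python
def cmos_power(alpha, f, c_load, vdd, i_peak, p01, p10, i_leak):
    """Return (dynamic, short_circuit, leakage) power, following the slide's equation."""
    dynamic = alpha * f * c_load * vdd ** 2
    short_circuit = vdd * i_peak * (p01 + p10)
    leakage = vdd * i_leak
    return dynamic, short_circuit, leakage

# Hypothetical operating point (all values are assumptions for illustration only).
params = dict(alpha=0.1, f=1e9, c_load=1e-9, i_peak=0.01, p01=0.25, p10=0.25, i_leak=0.02)

dyn_full, sc_full, leak_full = cmos_power(vdd=1.0, **params)
dyn_half, sc_half, leak_half = cmos_power(vdd=0.5, **params)

# Halving VDD quarters the dynamic term but only halves the other two terms
# (for fixed Ipeak and Ileak, which in a real circuit also depend on VDD).
print(dyn_half / dyn_full, sc_half / sc_full, leak_half / leak_full)
```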

Page 7: Sill LowPower2

Recap: Levels of Optimization

Level          Savings    Speed     Error
System         > 70 %     Seconds   > 50 %
Algorithm      40-70 %    Minute    25-50 %
Architecture   25-40 %    Minutes   15-30 %
Gate           15-25 %    Hour      10-20 %
Transistor     10-15 %    Hours     5-10 %

After Massoud Pedram

Page 8: Sill LowPower2

Recap: Logic Restructuring

Logic restructuring: changing the topology of a logic network to reduce transitions

AND: $P_{01} = P_0 \cdot P_1 = (1 - P_A P_B) \cdot P_A P_B$

[Figure: F = A·B·C·D implemented as a chain (W = A·B, X = W·C, F = X·D) and as a tree (Y = A·B, Z = C·D, F = Y·Z); all inputs have signal probability 0.5]
  Chain: P01(W) = (1 - 0.25) * 0.25 = 3/16 = 0.188, P01(X) = 7/64 = 0.109, P01(F) = 15/256
  Tree:  P01(Y) = 3/16, P01(Z) = 3/16, P01(F) = 15/256

Chain implementation has a lower overall switching activity than tree implementation for random inputs
BUT: Ignores glitching effects

Source: Timmermann, 2007
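The transition probabilities of the chain and the tree can be checked with exact fractions. This is a small sketch that assumes independent inputs with signal probability 0.5 and ignores glitching, as the slide does; the helper names are illustrative.

```python
from fractions import Fraction

def and_p1(pa, pb):
    """Probability that the output of a 2-input AND is 1 (independent inputs)."""
    return pa * pb

def p01(p1):
    """Probability of a 0 -> 1 transition for a node with signal probability p1."""
    return (1 - p1) * p1

half = Fraction(1, 2)

# Chain: W = A*B, X = W*C, F = X*D
w = and_p1(half, half)
x = and_p1(w, half)
f_chain = and_p1(x, half)
chain = [p01(w), p01(x), p01(f_chain)]

# Tree: Y = A*B, Z = C*D, F = Y*Z
y = and_p1(half, half)
z = and_p1(half, half)
f_tree = and_p1(y, z)
tree = [p01(y), p01(z), p01(f_tree)]

print("chain:", chain, "sum =", sum(chain))
print("tree: ", tree, "sum =", sum(tree))
```

The summed activity is 91/256 for the chain and 111/256 for the tree, which matches the statement that the chain switches less for random inputs.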

Page 9: Sill LowPower2

Recap: Input Ordering

Beneficial: postponing introduction of signals with a high transition rate (signals with signal probability close to 0.5)

AND: $P_{01} = (1 - P_A P_B) \cdot P_A P_B$

[Figure: two orderings of F = A·B·C with P(A) = 0.5, P(B) = 0.2, P(C) = 0.1]
  X = A·B, F = X·C: P01(X) = (1 - 0.5 x 0.2) * (0.5 x 0.2) = 0.09
  X = B·C, F = X·A: P01(X) = (1 - 0.2 x 0.1) * (0.2 x 0.1) = 0.0196

Source: Irwin, 2000
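The effect of input ordering can be reproduced by evaluating the internal node X for every possible first input pair. This is a sketch under the slide's assumption of independent inputs; the dictionary and function names are illustrative.

```python
from itertools import combinations

def and_p01(pa, pb):
    """0 -> 1 transition probability at the output of a 2-input AND."""
    p1 = pa * pb
    return (1 - p1) * p1

# Signal probabilities from the slide.
prob = {"A": 0.5, "B": 0.2, "C": 0.1}

# Activity of the internal node X for each choice of the first gate's inputs.
activity = {pair: and_p01(prob[pair[0]], prob[pair[1]])
            for pair in combinations(sorted(prob), 2)}

best_pair = min(activity, key=activity.get)
for pair, value in activity.items():
    print(pair, round(value, 4))
print("lowest activity at X when combining first:", best_pair)
```

Combining B and C first keeps the input with probability 0.5 (A) out of the internal node, which is the postponement the slide recommends.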

Page 10: Sill LowPower2

Recap: Glitching

[Figure: two-gate circuit with inputs A, B, C, internal node X and output Z; waveforms for the input transition ABC = 101 -> 000 under a unit-delay model]

Source: Irwin, 2000

Page 11: Sill LowPower2

Design Layer: Gate Level

Basic elements:
  Logic gates
  Sequential elements (flipflops, latches)
Behavior of elements is described in libraries

Page 12: Sill LowPower2

Dynamic Power and Device Size

Device sizing (= changing gate width)
  Affects input capacitance Cin
  Affects load capacitance Cload
  Affects dynamic power consumption Pdyn
Optimal fanout factor f for Pdyn is smaller than for performance (especially for large loads)
  e.g., for Cload = 20, Cin = 1:
    fcircuit = 20
    fopt_energy = 3.53
    fopt_performance = 4.47
For low power: avoid oversizing (f too big) beyond the optimum

[Figure: normalized energy (0 to 1.5) versus fanout f (1 to 7) for fcircuit = 1, 2, 5, 10, 20]

Source: Nikolic, UCB

Page 13: Sill LowPower2

VDD versus Delay and Power

Delay (td) and dynamic power consumption (Pdyn) are functions of VDD

[Figure: relative delay td (0 to 6) and relative Pdyn (0 to 10) versus supply voltage VDD (0.8 to 2.4)]

Page 14: Sill LowPower2

Multiple VDD

Main ideas:
  Use of different supply voltages within the same design
  High VDD for critical parts (high performance needed)
  Low VDD for non-critical parts (only low performance demands)
At design phase:
  Determine critical path(s) (see the following slides on data paths)
  High VDD for gates on those paths
  Lower VDD on the other gates (in non-critical paths)
  For low VDD: prefer gates that drive large capacitances (yields the largest energy benefits)
Usually two different VDD (but more are possible)

Page 15: Sill LowPower2

Multiple VDD cont’d

Level converters:
  Necessary when a module at lower supply drives a gate at higher supply (step-up)
  If a gate supplied with VDDL drives a gate supplied with VDDH, then the PMOS never turns off
  Possible implementation:
    Cross-coupled PMOS transistors
    NMOS transistors operate on reduced supply
  No need for level converters for a step-down change in voltage
  Reducing overhead:
    Conversions at register boundaries
    Embedding of the level converter inside the flipflop

[Figure: level converter supplied by VDDH with input Vin (VDDL domain) and output Vout]

Page 16: Sill LowPower2

Data Paths

Data propagate through different data paths between registers (flipflops - FF)
Paths mostly differ in propagation delay times
Frequency of clock signal (CLK) depends on the path with the longest delay: the critical path

[Figure: flipflops (FF) clocked by CLK with several paths between them; one path is marked as the critical path]

Page 17: Sill LowPower2

Data Paths: Slack

[Figure: gate G1 (inputs A, B) drives gate G2 (second input C, output Y). Timeline: all inputs of G1 arrived; delay of G1; G1 ready with evaluation; all inputs of G2 arrived. The slack for G1 is the time between G1 being ready with evaluation and the arrival of all inputs of G2.]

Page 18: Sill LowPower2

Multiple VDD in Data Paths

Minimum energy consumption when all logic paths are critical (same delay)
Possible algorithm: clustered voltage-scaling
  Each path starts with VDDH and switches to VDDL (blue gates) when slack is available
  Level conversion in flipflops at end of paths

[Figure legend: gates connected with VDDL; gates connected with VDDH]

Page 19: Sill LowPower2

Design Layer: Architecture Level

Also known as register transfer level (RTL)
Base elements:
  Register structures
  Arithmetic logic units (ALU)
  Memory elements
Only behavior is described (no inner structure)

Page 20: Sill LowPower2

Clock Gating

Most popular method for power reduction of clock signals and functional units
Gate off clock to idle functional units
Logic for generation of disable signal necessary
  Higher complexity of control logic
  Higher power consumption
  Timing is critical for avoiding clock glitches at the OR gate output
  Additional gate delay on clock signal

[Figure: register (Reg) feeding a functional unit; the register clock is the OR of clock and disable]

Source: Irwin, 2000
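A sampled-signal model illustrates how OR-based gating removes clock edges at the register. This is only a behavioral sketch with made-up waveforms; it assumes that disable changes while the clock is high, which is the timing condition needed to avoid glitches at the OR output.

```python
def gated_clock(clk, disable):
    """OR-gate clock gating: the gated clock is held high while disable is 1."""
    return [c | d for c, d in zip(clk, disable)]

def rising_edges(signal):
    """Count 0 -> 1 transitions in a sampled signal."""
    return sum(1 for prev, cur in zip(signal, signal[1:]) if prev == 0 and cur == 1)

# Hypothetical waveforms: 8 clock periods, sampled twice per period.
clk = [0, 1] * 8
# disable rises at sample 3 and falls at sample 11 (clock is high at both samples).
disable = [0] * 3 + [1] * 8 + [0] * 5

gated = gated_clock(clk, disable)
print("edges without gating:", rising_edges(clk))
print("edges with gating:   ", rising_edges(gated))
```

In this example the register sees 4 instead of 8 rising edges, so the clock load of the register and the functional unit behind it switch only half as often.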

Page 21: Sill LowPower2

Clock Gating cont’d

[Figure: clock gating in a low-power flip-flop with data input D, output Q and clock CLK]

Source: Agarwal, 2007

Page 22: Sill LowPower2

Clock Gating cont’d

Clock gating that takes the state of finite-state machines (FSM) into account

[Figure: FSM with primary inputs (PI), combinational logic, flip-flops and primary outputs (PO); clock activation logic and a latch gate CLK to the flip-flops]

Source: L. Benini and G. De Micheli, Dynamic Power Management, Boston: Springer, 1998.

Page 23: Sill LowPower2

Clock Gating: Example

MPEG4 decoder
  90% of flipflops clock-gated
  70% power reduction by clock-gating

[Figure: chip layout with blocks DSP/HIF, DEU, MIF, VDE and 896Kb SRAM]
[Figure: bar chart of power [mW], axis 0 to 25: without clock gating 30.6 mW, with clock gating 8.5 mW]

Source: M. Ohashi, Matsushita, 2002

Page 24: Sill LowPower2

Recap: VDD versus Delay and Power

Dynamic power can be traded for delay

[Figure: relative delay td (0 to 6) and relative Pdyn (0 to 10) versus supply voltage VDD (0.8 to 2.4)]

Page 25: Sill LowPower2

A Reference Datapath

[Figure: Input -> Register -> Combinational logic -> Register -> Output, clocked by CLK; switched capacitance Cref]

Supply voltage = Vref
Total capacitance switched per cycle = Cref
Clock frequency = fclk
Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f_{clk}$

Source: Agarwal, 2007

Page 26: Sill LowPower2

Parallel Architecture

[Figure: the Input feeds N registers, each followed by a copy of the combinational logic (Copy 1, Copy 2, ..., Copy N) clocked at fclk/N; an N-to-1 multiplexer and an output register clocked at fclk deliver the Output; a multiphase clock generator and mux control is driven by CK]

Each copy processes every Nth input and operates at reduced voltage
Supply voltage: VN ≤ Vref
N = degree of parallelism

Source: Agarwal, 2007

Page 27: Sill LowPower2

Pipelined Architecture

Reduces the propagation time of a block by factor N
Voltage can be reduced at constant clock frequency
Constant throughput
Functionality:

[Figure: a block of area A between registers clocked by CLK is split into N stages of area A/N each, separated by registers clocked by CLK; data flows through the stages]

Page 28: Sill LowPower2

Parallel Architecture: Example

Reference data path (for example)
  Critical path delay: Tadder + Tcomparator (= 25 ns), so fref = 40 MHz
  Total capacitance being switched = Cref
  VDD = Vref = 5 V
  Power for reference datapath: $P_{ref} = C_{ref} V_{ref}^2 f_{ref}$

[Figure: reference datapath with inputs A and B]

Source: Irwin, 2000

Page 29: Sill LowPower2

Parallel Architecture: Example cont’d

[Figure: layout of the parallel datapath] Area = 1476 x 1219 µm²

The clock rate can be reduced by half with the same throughput: fpar = fref / 2
Vpar = Vref / 1.7 ≈ 2.9 V, Cpar = 2.15 Cref
$P_{par} = (2.15 C_{ref}) (V_{ref} / 1.7)^2 (f_{ref} / 2) \approx 0.36 P_{ref}$

Source: Irwin, 2000

Page 30: Sill LowPower2

Pipelined Architecture: Example

fpipe = fref, Cpipe = 1.1 Cref, Vpipe = Vref / 1.7 ≈ 2.9 V
Voltage can be dropped while maintaining the original throughput
$P_{pipe} = C_{pipe} V_{pipe}^2 f_{pipe} = (1.1 C_{ref}) (V_{ref} / 1.7)^2 f_{ref} \approx 0.37 P_{ref}$

Source: Irwin, 2000
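The power figures of the parallel and the pipelined example follow directly from P = C V² f. The sketch below assumes that the reduced supply is rounded to 2.9 V (5 V / 1.7 ≈ 2.94 V); with the unrounded factor 1/1.7 the results come out slightly higher. The function and constant names are illustrative.

```python
def relative_power(c_ratio, v_ratio, f_ratio):
    """Power relative to the reference datapath, using P = C * V^2 * f."""
    return c_ratio * v_ratio ** 2 * f_ratio

V_REF = 5.0   # reference supply voltage from the slides
V_LOW = 2.9   # assumption: Vref / 1.7 rounded to 2.9 V

# Parallel datapath: 2.15x capacitance, half the clock frequency.
p_par = relative_power(2.15, V_LOW / V_REF, 0.5)
# Pipelined datapath: 1.1x capacitance, unchanged clock frequency.
p_pipe = relative_power(1.1, V_LOW / V_REF, 1.0)

# Same computation with the unrounded voltage ratio 1/1.7.
p_par_exact = relative_power(2.15, 1 / 1.7, 0.5)
p_pipe_exact = relative_power(1.1, 1 / 1.7, 1.0)

print(round(p_par, 2), round(p_pipe, 2))              # rounded supply of 2.9 V
print(round(p_par_exact, 3), round(p_pipe_exact, 3))  # unrounded factor 1/1.7
```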

Page 31: Sill LowPower2

Approximate Trend

                 N-parallel proc.                    N-stage pipeline proc.
Capacitance      $N \cdot C_{ref}$                   $C_{ref}$
Voltage          $V_{ref}/N$                         $V_{ref}/N$
Frequency        $f_{ref}/N$                         $f_{ref}$
Dynamic power    $C_{ref} V_{ref}^2 f_{ref} / N^2$   $C_{ref} V_{ref}^2 f_{ref} / N^2$
Chip area        N times                             10-20% increase

Source: G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.

Page 32: Sill LowPower2

Guarded Evaluation

Reduction of switching activity by adding latches at inputs
Latch preserves previous value of inputs to suppress activity
Could also use AND gates to mask inputs to zero (= forced zero)

[Figure: multiplier datapath with inputs A, B, C and a condition signal, shown without guarding and with a latch controlled by condition in front of the multiplier]

Page 33: Sill LowPower2

Precomputation

Identify logical conditions at the inputs under which the output does not depend on some of the inputs
Since those inputs don’t affect the output, disable their input transitions
Trade area for energy

[Figure: precomputed inputs feed register R1 and the precomputation logic g(X); gated inputs feed register R2, whose load is disabled by g(X); R1 and R2 feed the combinational logic f(X), which produces the outputs]

Source: Irwin, 2000

Page 34: Sill LowPower2

Precomputation: Design Issues

Design steps
  1. Selection of precomputation architecture
  2. Determination of precomputed and gated inputs (register R1 should be much smaller than R2)
  3. Search for a good implementation of g(X)
  4. Evaluation of potential energy savings based on input statistics (if savings are not sufficient, go to step 2 or 3 and try again)

Also works for multiple-output functions, where g(X) is the product of gj(X) over all j

Source: Irwin, 2000

Page 35: Sill LowPower2

Precomputation: Example

Binary comparator

[Figure: n-bit binary value comparator computing A > B. The most significant bits An and Bn are stored in R1; the remaining bits An-1 ... A1 and Bn-1 ... B1 are stored in R2. The load disable of R2 is derived from comparing An and Bn (label: An = Bn).]

Can achieve up to 75% power reduction with 3% area overhead and 1 to 5 additional gate delays in worst case path

Source: Irwin, 2000
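The comparator example can be modeled behaviorally. The sketch below assumes unsigned n-bit operands and assumes that register R2 is loaded only when the most significant bits are equal, since otherwise the MSBs alone decide A > B. The function names are illustrative and the model counts register loads only, not actual power.

```python
from itertools import product

def precomputed_greater(a, b, n):
    """Return (a > b, r2_loaded) for unsigned n-bit values with MSB precomputation."""
    msb = 1 << (n - 1)
    a_msb, b_msb = bool(a & msb), bool(b & msb)
    if a_msb != b_msb:
        # The MSBs differ: the result is known, the lower bits need not be loaded.
        return a_msb, False
    low_mask = msb - 1
    return (a & low_mask) > (b & low_mask), True

def r2_load_fraction(n):
    """Fraction of input pairs for which R2 must be loaded (uniform random inputs)."""
    loads = total = 0
    for a, b in product(range(1 << n), repeat=2):
        result, r2_loaded = precomputed_greater(a, b, n)
        assert result == (a > b)  # the precomputed version must stay functionally correct
        loads += r2_loaded
        total += 1
    return loads / total

print(r2_load_fraction(4))
```

For uniformly distributed inputs the lower bits are loaded in only half of the cycles; real savings depend on the input statistics, as the design steps point out.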

Page 36: Sill LowPower2

Adder Design

Various algorithms exist to implement an integer adder: ripple, select, skip (x2), look-ahead, conditional-sum
Each has its own characteristics of timing and power consumption

[Figure: block diagrams built from full adders (FA): Ripple Carry, Variable/Fixed Width Carry Skip, Carry Look-ahead, Carry Select (blocks duplicated for carry-in 0 and 1)]

Source: Mendelson, Intel

Page 37: Sill LowPower2

Adder Design

Adder type                   Energy (pJ)   Delay (ns)
Ripple Carry                 117           54.27
Constant Width Carry Skip    109           28.38
Variable Width Carry Skip    126           21.84
Carry Lookahead              171           17.13
Carry Select                 216           19.56
Conditional Sum              304           20.05

Source: Callaway, Swartzlander, “Estimating the power consumption of CMOS adders”, 11th Symposium on Computer Arithmetic, 1993, Proceedings.

Adders differ in energy and delay
Different adders for different applications
Also true for other units (multiplier, counter, …)
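One way to compare the adders in the table is the energy-delay product (EDP). The EDP is a common figure of merit but is not used on the slide, so the ranking below is an illustration of "different adders for different applications" rather than a conclusion from the source.

```python
# (energy in pJ, delay in ns), copied from the table above.
ADDERS = {
    "Ripple Carry": (117, 54.27),
    "Constant Width Carry Skip": (109, 28.38),
    "Variable Width Carry Skip": (126, 21.84),
    "Carry Lookahead": (171, 17.13),
    "Carry Select": (216, 19.56),
    "Conditional Sum": (304, 20.05),
}

def energy_delay_product(energy_pj, delay_ns):
    """Energy-delay product; lower values mean a better combined trade-off."""
    return energy_pj * delay_ns

lowest_energy = min(ADDERS, key=lambda name: ADDERS[name][0])
lowest_delay = min(ADDERS, key=lambda name: ADDERS[name][1])
lowest_edp = min(ADDERS, key=lambda name: energy_delay_product(*ADDERS[name]))

print("lowest energy:", lowest_energy)
print("lowest delay: ", lowest_delay)
print("lowest EDP:   ", lowest_edp)
```

The three criteria select three different adders, so the choice depends on whether energy, speed or a combination of both matters most.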

Page 38: Sill LowPower2

Bus Power

Buses are a significant source of power dissipation
  50% of dynamic power for interconnect switching (Magen, SLIP 04)
  MIT Raw processor’s on-chip network consumes 36% of total chip power (Wang et al. 2003)
Caused by:
  High switching activities
  Large capacitive loading

[Figure: bus drivers (Ain, Bin, Cin, Din) -> Bus -> bus receivers (Wout, Xout, Yout, Zout)]

Source: Irwin, 2000

Page 39: Sill LowPower2

Bus Power Reduction

For an n-bit bus: $P_{bus} = n \cdot \alpha f_{clk} C_{load} V_{DD}^2$

Alternative bus structures
  Segmented buses (lower Cload)
  Charge recovery buses
  Bus multiplexing (lower fclk possible)
Minimizing bus traffic (n)
  Code compression
  Instruction loop buffers
Minimization of bit switching activity (α fclk) by data encoding
Minimize voltage swing (VDD²) using differential signaling

Source: Irwin, 2000
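Data encoding can be illustrated with bus-invert coding. The slide does not name a specific scheme, so bus-invert coding is used here only as one well-known example: a word is sent inverted, together with one extra invert line, whenever that causes fewer bus lines to toggle. All names and the test pattern are illustrative assumptions.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def raw_transitions(words, start=0):
    """Bit transitions on an unencoded bus that initially holds `start`."""
    total, prev = 0, start
    for word in words:
        total += hamming(prev, word)
        prev = word
    return total

def bus_invert_transitions(words, width, start=0):
    """Bit transitions (data lines plus invert line) with bus-invert coding."""
    mask = (1 << width) - 1
    total, prev, prev_invert = 0, start, False
    decoded = []
    for word in words:
        invert = hamming(prev, word) > width / 2
        sent = (word ^ mask) if invert else word
        total += hamming(prev, sent) + (prev_invert != invert)
        decoded.append((sent ^ mask) if invert else sent)  # receiver side
        prev, prev_invert = sent, invert
    assert decoded == list(words)  # the encoding must be lossless
    return total

# Worst-case pattern for an unencoded 8-bit bus.
pattern = [0x00, 0xFF, 0x00, 0xFF]
raw = raw_transitions(pattern)
encoded = bus_invert_transitions(pattern, width=8)
print(raw, encoded)
```

For this pattern the transition count drops from 24 to 3, at the cost of one additional bus line and the encoder and decoder logic.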

Page 40: Sill LowPower2

Reducing Shared Resources

Shared resources incur switching overhead
Local bus structures reduce overhead

[Figure: global bus architecture versus local bus architecture]

Source: Irwin, 2000

Page 41: Sill LowPower2

Reducing Shared Resources cont’d

Bus segmentation
  Another way to reduce shared buses
  Control of bus segments by controller blocks (B)

[Figure: shared bus versus segmented bus with controller blocks (B)]

Source: Evgeny Bolotin, Jan 2004

Page 42: Sill LowPower2

Design Layer: Algorithm Level

Base elements:
  Functions
  Procedures
  Processes
  Control structures
Description of design behavior

Page 43: Sill LowPower2

Coding Styles

Use processor-specific instruction style:
  Variable types
  Function call style
  Conditionalized instructions (for ARM)
Follow general guidelines for software coding
  Use table look-up instead of conditionals
  Make local copies of global variables so that they can be assigned to registers
  Avoid multiple memory look-ups with pointer chains

Page 44: Sill LowPower2

Source-code Transformations

Minimize power-consuming activity:

Computation
  before: A*B+A*C
  after:  A*(B+C)

Communication
  before: for (c = 1..N) { receive (A); B=c*A }
  after:  receive (A); for (c = 1..N) B=c*A

Storage
  before: for (c = 1..N) B[c] = A[c]*D[c]
          for (c = 1..N) F[c] = B[c]-1
  after:  for (c = 1..N) F[c] = A[c]*D[c]-1
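The computation and storage transformations can be written as runnable code to confirm that they preserve the result. This is a sketch in Python with illustrative function names; the slide's pseudocode is language-neutral, and the actual energy benefit depends on the target processor and compiler.

```python
def computation_before(a, b, c):
    return a * b + a * c          # two multiplications, one addition

def computation_after(a, b, c):
    return a * (b + c)            # one multiplication, one addition

def storage_before(a, d):
    b = [x * y for x, y in zip(a, d)]   # intermediate array B is written to memory
    return [value - 1 for value in b]   # and read back in a second loop

def storage_after(a, d):
    return [x * y - 1 for x, y in zip(a, d)]  # fused loop, no intermediate array

a_values = [3, 5, 7, 11]
d_values = [2, 4, 6, 8]

print(computation_before(3, 4, 5), computation_after(3, 4, 5))
print(storage_before(a_values, d_values), storage_after(a_values, d_values))
```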

Page 45: Sill LowPower2

Datapath Energy Consumption

Algorithms can differ in power dissipation

[Figure: stacked bar chart of switched capacitance (nF, 0 to 14000) for bubble.c, heap.c and quick.c, split into Register File, Pipeline Registers, Functional Unit and Others]

Source: Irwin, 2000

Page 46: Sill LowPower2

Adaptive Dynamic Voltage Scaling (DVS)

Slow down processor to fill idle time
More delay allows a lower operational voltage
Runtime scheduler determines processor speed and selects appropriate voltage
Transition delay for frequency changes: ~150 µs
Potential to realize 10x energy savings

[Figure: Active / Idle / Active / Idle at 3.3 V versus continuously Active at 2.4 V]

Page 47: Sill LowPower2

Adaptive DVS: Example

Task with 100 ms deadline, requires 50 ms CPU time at full speed
  Normal system gives 50 ms computation, 50 ms idle/stopped time
  Half speed/voltage system gives 100 ms computation, 0 ms idle
  Same number of CPU cycles but: $E = C (V_{DD}/2)^2 = E_{ref} / 4$

Dynamic Voltage Scaling adapts voltage to workload

[Figure: speed versus time for tasks T1 and T2: full speed followed by idle time versus reduced speed without idle time; same work, lower energy]
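The energy argument of the example can be checked numerically. The sketch assumes that only dynamic energy (C·VDD² per cycle) matters and that idle time costs nothing; the normalized values and names are illustrative.

```python
def task_energy(c_eff, vdd, frequency, busy_time):
    """Dynamic energy of a task: C * VDD^2 per cycle times the number of cycles."""
    cycles = frequency * busy_time
    return c_eff * vdd ** 2 * cycles

C_EFF = 1.0      # normalized switched capacitance per cycle (assumption)
VDD = 1.0        # normalized full supply voltage
F_FULL = 1.0     # normalized full clock frequency (cycles per ms)

# Full speed: 50 ms of computation, then 50 ms idle until the 100 ms deadline.
e_full = task_energy(C_EFF, VDD, F_FULL, busy_time=50)
# Half speed and half voltage: 100 ms of computation, no idle time.
e_half = task_energy(C_EFF, VDD / 2, F_FULL / 2, busy_time=100)

print(e_half / e_full)
```

Both runs execute the same number of cycles and meet the deadline, but the half-voltage run needs a quarter of the energy.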

Page 48: Sill LowPower2

Design Layer: System Level

Basic elements:
  Complex modules
  Processors
  Calculation and control units
  Sensors

[Figure: system built from an ALU, memories (MEM) and an MP3 module]

Page 49: Sill LowPower2

Dynamic Power Management

Systems are:
  Designed to deliver peak performance, but …
  Not needing peak performance most of the time
Components are idle sometimes
Dynamic power management (DPM):
  Puts idle components in low-power non-operational states
Power manager:
  Observes and controls the system
  Power consumption of power manager is negligible

Page 50: Sill LowPower2

Processor Sleep Modes

Software power control - power management
  DOZE:  Most units stopped except on-chip cache memory (cache coherency)
  NAP:   Cache also turned off, PLL still on, time out or external interrupt to resume
  SLEEP: PLL off, external interrupt to resume

Deeper sleep mode consumes less power
Deeper sleep mode requires more latency to resume

Page 51: Sill LowPower2

Processor Sleep Modes: Example

PowerPC sleep modes

Mode                   66 MHz    80 MHz
No power mgmt          2.18 W    2.54 W
Dynamic power mgmt     1.89 W    2.20 W
DOZE                   307 mW    366 mW
NAP                    113 mW    135 mW
SLEEP                  89 mW     105 mW
SLEEP without PLL      18 mW     19 mW
SLEEP without clock    2 mW      2 mW

10 cycles to wake up from SLEEP
100 µs to wake up from SLEEP+

Source: Irwin, 2000

Page 52: Sill LowPower2

Transmeta LongRun

Applies adaptive DVS
LongRun policies:
  Detection of different workload scenarios
  Based on runtime performance information
After detection, corresponding adaptation of:
  Processor supply voltage
  Processor frequency
Clock frequency always within limits required by supply voltage to avoid clock skew problems
Use of hard-coded core frequency/voltage operating points
Best trade-off between performance and power possible

Page 53: Sill LowPower2

Transmeta LongRun cont’d

[Figure: % of max power consumption (0 to 100) versus frequency (300 to 1000 MHz), with a typical operating region and a peak performance region]

Operating points:
  Frequency   Voltage
  300 MHz     0.80 V
  433 MHz     0.87 V
  533 MHz     0.95 V
  667 MHz     1.05 V
  800 MHz     1.15 V
  900 MHz     1.25 V
  1000 MHz    1.30 V

Source: Transmeta
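The operating points can be turned into a rough estimate of relative power. The sketch assumes that power is dominated by the dynamic term, which scales with f·VDD², and ignores leakage and short-circuit power, so the numbers need not match the curve in the Transmeta figure. Names are illustrative.

```python
# (frequency in MHz, supply voltage in V), taken from the operating points above.
OPERATING_POINTS = [
    (300, 0.80), (433, 0.87), (533, 0.95), (667, 1.05),
    (800, 1.15), (900, 1.25), (1000, 1.30),
]

def relative_dynamic_power(points):
    """Dynamic power (proportional to f * V^2) relative to the fastest operating point."""
    f_max, v_max = max(points)          # the highest frequency is the reference
    reference = f_max * v_max ** 2
    return {f: f * v ** 2 / reference for f, v in points}

relative = relative_dynamic_power(OPERATING_POINTS)
for frequency, value in relative.items():
    print(f"{frequency:5d} MHz: {100 * value:5.1f} % of peak dynamic power")
```

Under this assumption the 300 MHz point runs at 30 % of the peak frequency but at only about 11 % of the peak dynamic power, which is the benefit of scaling voltage together with frequency.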

Page 54: Sill LowPower2

Transmeta LongRun: Example

Source: Transmeta

Page 55: Sill LowPower2

Battery aware design

Non-linear effects influence the lifetime of batteries
“Rate Capacity”
  If discharge currents are higher than allowed, the real capacity drops below the nominal capacity
“Battery Recovery”
  Pulsed discharge increases nominal capacity
  Based on recovery times (as long as there is no rate capacity effect)

[Figure: capacity (mAh, 200 to 1000) versus discharge current (mA); standard capacity of 1000 mAh at the rated current of 125 mA]
[Figure: discharge current (mA) and available charge over time, with idle periods]

Source: Timmermann, 2007

Page 56: Sill LowPower2

Battery aware design cont’d

Diffusion model from Rakhmatov, Vrudhula et al.

[Figure: distribution of electro-active species at the electrode in four states: fully charged battery, after a recent discharge, after recovery, fully discharged]

Analytically very sound but computationally intensive
Cannot be used for online scheduling decisions

Page 57: Sill LowPower2

Battery aware design: Example 1

[Figure: current and battery voltage over time] Performance of a bipolar lead-acid battery subjected to six current impulses. Pulse length = 3 ms, rest period = 22 ms.

Source: LaFollette, “Design and performance of high specific power, pulsed discharge, bipolar lead acid batteries”, 10th Annual Battery Conference on Applications and Advances, Long Beach, pp. 43-47, January 1995.

Page 58: Sill LowPower2

Battery aware design: Example 2

[Figure: current [mA] over time for discharge profile A and discharge profile B]

Profile   Aver. current [mA]   Battery lifetime [ms]   Specif. energy [Wh/kg]
A         123.8                357053                  15.12
B         124.2                536484                  18.58

Minimum average current ≠ maximum battery lifetime

Source: Timmermann, 2007

Page 59: Sill LowPower2

Backup

Page 60: Sill LowPower2

FSM: Clock-Gating

Moore machine: outputs depend only on the state variables
If a state has a self-loop in the state transition graph (STG), then the clock can be stopped whenever a self-loop is to be executed

[Figure: STG with states Si, Sj, Sk and edges labeled Xi/Zk, Xj/Zk and Xk/Zk; the Xk/Zk edge is a self-loop on Sk]

Clock can be stopped when the (Xk, Sk) combination occurs
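The self-loop rule can be expressed as a clock-enable function derived from the state transition graph. The transition table below is hypothetical and only loosely follows the figure (the exact edges of the STG are not recoverable from the transcript); the point is that the clock is needed only when the next state differs from the current one.

```python
# Hypothetical state transition graph: (state, input) -> next state.
STG = {
    ("Si", "Xi"): "Sk",
    ("Sj", "Xj"): "Sk",
    ("Sk", "Xk"): "Sk",   # self-loop: state and Moore output stay unchanged
    ("Sk", "Xi"): "Si",
    ("Sk", "Xj"): "Sj",
}

def clock_enable(state, symbol):
    """The flip-flops need a clock edge only if the state actually changes."""
    return STG[(state, symbol)] != state

def run(start, inputs):
    """Simulate the FSM and count how many clock edges reach the flip-flops."""
    state, clocked = start, 0
    for symbol in inputs:
        if clock_enable(state, symbol):
            clocked += 1
            state = STG[(state, symbol)]
    return state, clocked

final_state, clocked_cycles = run("Si", ["Xi", "Xk", "Xk", "Xk", "Xj"])
print(final_state, clocked_cycles)
```

In this run only 2 of the 5 cycles need a clock edge, because the three Xk inputs hit the self-loop on Sk.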

Page 61: Sill LowPower2

Trend: Interconnects

Propagation delays of global wires will be a multiple of the clock cycle
Example (very optimistic): 6-10 clock cycles in 50nm technology [Benini, 2002]

[Figure: interconnects]

Source: Tenhunen, 2005

Page 62: Sill LowPower2

Bus Multiplexing

Number of bus transitions per cycle: $2 (1 + 1/2 + 1/4 + \dots) = 4$

Source: Irwin, 2000

Page 63: Sill LowPower2

Resource Sharing and Activity II

Page 64: Sill LowPower2

Bus Multiplexing

Sharing of long data buses with time multiplexing
Example: S1 uses even cycles, S2 uses odd cycles

[Figure: sources S1, S2 and destinations D1, D2 connected by dedicated buses versus one shared, time-multiplexed bus]

Source: Irwin, 2000

Page 65: Sill LowPower2

Correlated Data Streams

[Figure: bit switching probabilities (0 to 1) versus bit position (MSB 14 down to LSB 0) for a muxed and a dedicated bus]

For a shared (multiplexed) bus, advantages of data correlation are lost (bus carries samples from two uncorrelated data streams)
  Bus sharing should not be used for positively correlated data streams
  Bus sharing may prove advantageous in a negatively correlated data stream (where successive samples switch sign bits) - more random switching

Source: Irwin, 2000

Page 66: Sill LowPower2

Disadvantages of Bus Multiplexing

If data bus is shared, advantages of data correlation are lost (bus carries samples from two uncorrelated data streams)
  Bus sharing should not be used for positively correlated data streams
  Bus sharing may prove advantageous in a negatively correlated data stream (where successive samples switch sign bits) - more random switching

Page 67: Sill LowPower2

Adaptive DVS cont’d

Implementation

[Figure: block diagram with a FIFO input buffer, a workload filter, a power-speed control knob and a variable power-speed system]