sill lowpower2
Post on 28-Apr-2015
21 Views
Preview:
DESCRIPTION
TRANSCRIPT
Micro transductors ’08Micro transductors ’08 Low Power VLSI Design 2Low Power VLSI Design 2
Dr.-Ing. Frank SillDepartment of Electrical Engineering, Federal University of Minas Gerais,
Av. Antônio Carlos 6627, CEP: 31270-010, Belo Horizonte (MG), Brazil
franksill@ufmg.br
http://www.cpdee.ufmg.br/~frank/
Micro transductors ‘08, Low Power 2 2Copyright Sill, 2008
AgendaAgenda
Recap Power reduction on
Gate level Architecture level Algorithm level System level
Micro transductors ‘08, Low Power 2 3Copyright Sill, 2008
Recap: Problems of Power DissipationRecap: Problems of Power Dissipation
Continuously increasing performance demands
Increasing power dissipation of technical devices
Today: power dissipation is a main problem
High Power dissipation leads to:
High efforts for cooling
Increasing operational costs
Reduced reliability
High efforts for cooling
Increasing operational costs
Reduced reliability
Reduced time of operation
Higher weight (batteries)
Reduced mobility
Reduced time of operation
Higher weight (batteries)
Reduced mobility
Micro transductors ‘08, Low Power 2 4Copyright Sill, 2008
CL
Recap: Consumption in CMOSRecap: Consumption in CMOS Voltage (Volt, V) Water pressure (bar) Current (Ampere, A) Water quantity per second (liter/s) Energy Amount of Water
Energy consumption is proportional to capacitive load!
0
1
Micro transductors ‘08, Low Power 2 5Copyright Sill, 2008
Watts
time
Power is height of curve
Watts
time
Energy is area under curve
Approach 1
Approach 2
Approach 2
Approach 1
Recap: Energy and PowerRecap: Energy and Power
Energy = Power * time for calculation = Power * Delay
Micro transductors ‘08, Low Power 2 6Copyright Sill, 2008
P = α f CL VDD2 + VDD Ipeak (P01 + P10 ) + VDD Ileak
Dynamic power(≈ 40 - 70% today and decreasing
relatively)
Short-circuit power(≈ 10 % today and
decreasing absolutely)
Leakage power(≈ 20 – 50 %
today and increasing)
Recap: Power Equations in CMOSRecap: Power Equations in CMOS
Micro transductors ‘08, Low Power 2 7Copyright Sill, 2008
System
Algorithm
Architecture
Gate
Transistor
T
T
+
ST1
ALU
ME
M
ME
MMP3
Savings Speed Error
> 70 %
40-70 %
25-40 %
15-25 %
10-15 %
Seconds
Minute
Minutes
Hour
Hours
> 50 %
25-50 %
15-30 %
10-20 %
5-10 %
Recap: Levels of Recap: Levels of OptimizationOptimization
nach Massoud Pedram
Micro transductors ‘08, Low Power 2 8Copyright Sill, 2008
Recap: Logic RestructuringRecap: Logic Restructuring
Chain implementation has a lower overall switching activity than tree implementation for random inputs
BUT: Ignores glitching effects
Logic restructuring: changing the topology of a logic network to reduce transitions
A
BC
D F
AB
CD Z
FW
X
Y0.5
0.5
(1-0.25)*0.25 = 3/16
0.50.5
0.5
0.5
0.5
0.5
7/64 = 0.109
15/256
3/16
3/16 = 0.188
15/256
AND: P01 = P0 * P1 = (1 - PAPB) * PAPB
Source: Timmernann, 2007
Micro transductors ‘08, Low Power 2 9Copyright Sill, 2008
Recap: Input OrderingRecap: Input Ordering
Beneficial: postponing introduction of signals with a high transition rate (signals with signal probability close to 0.5)
A
BC
X
F
0.5
0.20.1
B
CA
X
F
0.2
0.10.5
(1-0.5x0.2)*(0.5x0.2)=0.09 (1-0.2x0.1)*(0.2x0.1)=0.0196
Source: Irwin, 2000
AND: P01 = (1 - PAPB) * PAPB
Micro transductors ‘08, Low Power 2 10Copyright Sill, 2008
ABC
X
Z
101 000
Unit Delay
AB
X
ZC
Recap: GlitchingRecap: Glitching
Source: Irwin, 2000
Micro transductors ‘08, Low Power 2 11Copyright Sill, 2008
Design Layer: Gate LevelDesign Layer: Gate Level
Basic elements: Logic gates Sequential elements (flipflops, latches)
Behavior of elements is described in libraries
Micro transductors ‘08, Low Power 2 12Copyright Sill, 2008
Device Sizing (= changing gate width)
Affects input capacitance Cin
Affects load capacitance Cload
Affects dynamic power consumption Pdyn
Optimal fanout factor f for Pdyn is smaller
than for performance (especially for large
loads)
e.g., for Cload=20, Cin=1
fcircuit = 20
fopt_energy = 3.53
fopt_performance = 4.47
For Low Power: avoid oversizing (f too big)
beyond the optimal
1 2 3 4 5 6 70
0.5
1
1.5
fanout fno
rmal
ized
ene
rgy fcircuit=1
fcircuit=2
fcircuit=5
fcircuit=10
fcircuit=20
Dynamic Power and Device SizeDynamic Power and Device Size
Source: Nikolic, UCB
Micro transductors ‘08, Low Power 2 13Copyright Sill, 2008
0
1
2
3
4
5
6
0.8 1 1.2 1.4 1.6 1.8 2 2.2 2.4
Supply voltage (VDD)
Rel
ativ
e D
ela
y t d
0
2
4
6
8
10
Rel
ativ
e P
dyn
VVDDDD versus Delay and Power versus Delay and Power
Delay (td) and dynamic power consumption (Pdyn) are functions of VDD
tdPdyn
Micro transductors ‘08, Low Power 2 14Copyright Sill, 2008
Multiple VMultiple VDDDD
Main ideas: Use of different supply voltages within the same design
High VDD for critical parts (high performance needed)
Low VDD for non-critical parts (only low performance demands)
At design phase: Determine critical path(s) (see upper next slide)
High VDD for gates on those paths
Lower VDD on the other gates (in non-critical paths)
For low VDD: prefer gates that drive large capacitances (yields
the largest energy benefits)
Usually two different VDD (but more are possible)
Micro transductors ‘08, Low Power 2 15Copyright Sill, 2008
Level converters: Necessary, when module at lower supply drives gate at higher
supply (step-up) If gate supplied with VDDL drives a gate supplied with VDDH
then PMOS never turns off Possible implementation:
Cross-coupled PMOS transistors NMOS transistor operate on
reduced supply No need of level converters for
step-down change in voltage Reducing of overhead:
Conversions at register boundaries Embedding of inside flipflop
Multiple VMultiple VDDDD cont’d cont’d
VDDH
Vin
VoutVDDL
Micro transductors ‘08, Low Power 2 16Copyright Sill, 2008
FF
FF
FF
FF
FF
FF
FF
FF
FF
CLK CLK CLK
Data PathsData Paths
Data propagate through different data paths between registers (flipflops - FF)
Paths mostly differ in propagation delay times Frequency of clock signal (CLK) depends on path with longest delay
critical path
Paths
Path
Micro transductors ‘08, Low Power 2 17Copyright Sill, 2008
Data Paths: SlackData Paths: Slack
B
A
Y
C
time
all Inputs of G1arrived
G1 ready withevaluation
delay of G1
all inputs of G2arrived
Slack for G1
BA Y
C
G1G2
Micro transductors ‘08, Low Power 2 18Copyright Sill, 2008
Multiple VMultiple VDDDD in Data Paths in Data Paths Minimum energy consumption when all logic paths are critical (same
delay) Possible Algorithm: clustered voltage-scaling
Each path starts with VDDH and switches to VDDL (blue gates)
when slack is available Level conversion in flipflops at end of paths
Connected with VDDL
Connected with VDDH
Micro transductors ‘08, Low Power 2 19Copyright Sill, 2008
Design Layer: Architecture LevelDesign Layer: Architecture Level
Also known as Register transfer level (RTL) Base elements:
Register structures Arithmetic logic units (ALU) Memory elements
Only behavior is described
(no inner structure)
Micro transductors ‘08, Low Power 2 20Copyright Sill, 2008
Most popular method for power reduction of clock signals and functional units
Gate off clock to idle functional units Logic for generation of disable signal necessary
Higher complexity of control logic Higher power consumption Critical timing critical for avoiding of
clock glitches at OR gate output Additional gate delay on clock signal
Clock GatingClock Gating
Reg
clock
disable
Functionalunit
Source: Irwin, 2000
Micro transductors ‘08, Low Power 2 21Copyright Sill, 2008
Clock Gating cont’dClock Gating cont’d
Source: Agarwal, 2007
D QD
CLK
Clock-Gating in Low-Power Flip-Flop
Micro transductors ‘08, Low Power 2 22Copyright Sill, 2008
Clock Gating cont’dClock Gating cont’d
Clock gating over consideration of state in Finite-State-
Machines (FSM)
Combinational logic
LatchClock
activation logic
Flip
-flo
ps
PI
CLK
PO
Source: L. Benini and G. De Micheli,Dynamic Power Management, Boston: Springer, 1998.
Micro transductors ‘08, Low Power 2 23Copyright Sill, 2008
Clock Gating: ExampleClock Gating: Example
DSP/HIF
DEU
MIF
VDE
896Kb SRAM
Source: M. Ohashi, Matsushita, 2002
90% of FlipFlops clock-gated
70% power reduction by clock-gating
MPEG4 decoder
10
8.5mW
0 155
30.6mW
20 25
Without clock gating
With clock gating
Power [mW]
Micro transductors ‘08, Low Power 2 24Copyright Sill, 2008
0
1
2
3
4
5
6
0.8 1 1.2 1.4 1.6 1.8 2 2.2 2.4
Supply voltage (VDD)
Rel
ativ
e D
ela
y t d
0
2
4
6
8
10
Rel
ativ
e P
dyn
Recap: VRecap: VDDDD versus Delay and Power versus Delay and Power
Dynamic Power can be traded by delay
tdPdyn
Micro transductors ‘08, Low Power 2 25Copyright Sill, 2008
A Reference DatapathA Reference Datapath
Combinationallogic
OutputInputR
eg
iste
r
Re
gis
ter
CLKSupply voltage = Vref
Total capacitance switched per cycle = Cref
Clock frequency = fClk
Power consumption: Pref = CrefVref2fclk
Cref
Source: Agarwal, 2007
Micro transductors ‘08, Low Power 2 26Copyright Sill, 2008
Parallel ArchitectureParallel Architecture
Comb.Logic
Copy 1
Comb.Logic
Copy 2
Comb.Logic
Copy N
Re
gis
ter
Re
gis
ter
Re
gis
ter
Re
gis
ter
N to
1 m
ulti
ple
xer
MultiphaseClock gen. and mux
control
InputOutput
CK
fclk
fclk/N
Each copy processesevery Nth input,operates atreduced voltage
Supply voltage:VN ≤ Vref
N = Deg. of parallelism
Source: Agarwal, 2007
fclk/N
fclk/N
Micro transductors ‘08, Low Power 2 27Copyright Sill, 2008
DataData
Pipelined ArchitecturePipelined Architecture
Reduces the propagation time of a block by factor N
Voltage can be reduced at constant clock frequency Constant throughput
Functionality:
CLK
Area A
CLK
CLK
A/N A/NA/N
Micro transductors ‘08, Low Power 2 28Copyright Sill, 2008
Reference Data path (for example)
Critical path delay Tadder + Tcomparator (= 25 ns) fref = 40 MHz
Total capacitance being switched = Cref
VDD = Vref = 5V
Power for reference datapath = Pref = Cref Vref2 fref
Parallel Architecture: ExampleParallel Architecture: Example
A
B
Source: Irwin, 2000
Micro transductors ‘08, Low Power 2 29Copyright Sill, 2008
Parallel Architecture: Example cont’dParallel Architecture: Example cont’d
Area = 1476 x 1219 µ2
Source: Irwin, 2000
The clock rate can be reduced by half with the same throughput fpar = fref / 2
Vpar = Vref / 1.7, Cpar = 2.15 Cref
Ppar = (2.15 Cref) (Vref / 1.7)2 (fref / 2) = 0.36 Pref
Micro transductors ‘08, Low Power 2 30Copyright Sill, 2008
Pipelined Architecture: ExamplePipelined Architecture: Example
Source: Irwin, 2000
fpipe = fref, , Cpipe = 1.1 Cref , Vpipe = Vref / 1.7
Voltage can be dropped while maintaining the original throughput Ppipe = CpipeVpipe
2 fpipe = (1.1 Cref) (Vref/1.7)2 fref = 0.37 Pref
Micro transductors ‘08, Low Power 2 31Copyright Sill, 2008
Approximate TrendApproximate Trend
N-parallel proc. N-stage pipeline proc.
Capacitance N*Cref Cref
Voltage Vref/N Vref/N
Frequency fref/N fref
Dynamic Power CrefVref2fref/N2 CrefVref
2fref/N2
Chip area N times 10-20% increase
Source: G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
Micro transductors ‘08, Low Power 2 32Copyright Sill, 2008
Reduction of switching activity by adding latches at inputs
Latch preserves previous value of inputs to suppress activity
Could also use AND gates to mask inputs to zero
= forced zero
A
MultiplierB
C
condition
MultiplierB
C
A
Latc
h
condition
Guarded EvaluationGuarded Evaluation
Micro transductors ‘08, Low Power 2 33Copyright Sill, 2008
Outputs
Precomputationlogic
Precomputedinputs
Gatedinputs
g(X)
Combinationlogic f(X)
g(X)
R2
R1
Loaddisable
Source: Irwin, 2000
PrecomputationPrecomputation
Identify logical conditions at inputs that are invariant to the output Since those inputs don’t affect output, disable input transitions Trade area for energy
Micro transductors ‘08, Low Power 2 34Copyright Sill, 2008
Source: Irwin, 2000
Precomputation: Design IssuesPrecomputation: Design Issues
Design steps1. Selection of precomputation architecture
2. Determination of precomputed and gated inputs (Register R1 should be much smaller than R2)
3. Search good implementation for g(X)
4. Evaluation of potential energy savings based on input statistics (if savings not sufficient go to step 2 or 3 and try again)
Also works for multiple output functions where g(X) is the product of gj(X) over all j
Micro transductors ‘08, Low Power 2 35Copyright Sill, 2008
A > Bn-bit binary value
comparatorA > BR2
R1
LoaddisableAn = Bn
An
Bn
An-1
A1
Bn-1
B1
Can achieve up to 75% power reduction with 3% area overhead and 1 to 5 additional gate delays in worst case path
Source: Irwin, 2000
Precomputation: ExamplePrecomputation: Example
Binary Comparator
Micro transductors ‘08, Low Power 2 36Copyright Sill, 2008
Various algorithms exist to implement an integer adder Ripple, select, skip (x2), Look-ahead, conditional-sum. Each with its own characteristics of timing and power consumption.
FAFAFAFAFAFA
Ripple CarryRipple Carry
FAFAFAFA
Variable/Fixed Width Carry SkipVariable/Fixed Width Carry Skip
FAFAFAFA
Carry Look-aheadCarry Look-ahead
FAFAFAFA FAFA 0
1
Carry SelectCarry Select
Adder DesignAdder Design
Source: Mendelson, Intel
Micro transductors ‘08, Low Power 2 37Copyright Sill, 2008
Source: Callaway, Swartzlander “Estimating the power consumption of CMOS adders” - 11th Symposium on Computer Arithmetic, 1993. Proceedings.
Energy (pJ)
Delay (nSec)
Ripple Carry 117 54.27Constant Width Carry Skip 109 28.38Variable Width Carry Skip 126 21.84Carry Lookahead 171 17.13Carry Select 216 19.56Conditional Sum 304 20.05
Adder DesignAdder Design
Adders differ in Energy and delay Different adders for different applications Also true for other units (multiplier, counter, …)
Micro transductors ‘08, Low Power 2 38Copyright Sill, 2008
Bus PowerBus Power
Buses are significant source of power dissipation 50% of dynamic power for interconnect switching (Magen, SLIP 04) MIT Raw processor’s on-chip network consumes 36% of total chip
power (Wang et al. 2003) Caused by:
High switching activities Large capacitive loading
Ain Bin Cin Din
Wout Xout Yout Zout
Busdrivers
Busreceivers
Bus
Source: Irwin, 2000
Micro transductors ‘08, Low Power 2 39Copyright Sill, 2008
Bus Power ReductionBus Power Reduction
For an n-bit bus: Pbus = n* αfClkCloadVDD2
Alternative bus structures Segmented buses (lower Cload)
Charge recovery buses
Bus multiplexing (lower fClk possible)
Minimizing bus traffic (n) Code compression Instruction loop buffers
Minimization of bit switching activity (fclk) by data
encoding
Minimize voltage swing (VDD2) using differential signaling
Source: Irwin, 2000
Micro transductors ‘08, Low Power 2 40Copyright Sill, 2008
Source: Irwin, 2000
Reducing Shared ResourcesReducing Shared Resources
Shared resources incur switching overhead Local bus structures reduce overhead
Global bus architecture Local bus architecture
Micro transductors ‘08, Low Power 2 41Copyright Sill, 2008
Reducing Shared Resources cont’dReducing Shared Resources cont’d
Shared Bus
B
B
Segmented Bus
Source: Evgeny Bolotin – Jan 2004
Bus segmentation Another way to reduce shared buses Control of bus segment by controller blocks (B)
Micro transductors ‘08, Low Power 2 42Copyright Sill, 2008
Base elements: Functions Procedures Processes Control structures
Description of design behavior
Design Layer: Algorithm LevelDesign Layer: Algorithm Level
Micro transductors ‘08, Low Power 2 43Copyright Sill, 2008
Use processor-specific instruction style: Variable types Function calls style Conditionalized instructions (for ARM)
Follow general guidelines for software coding Use table look-up instead of conditionals Make local copies of global variables so that they can be
assigned to registers Avoid multiple memory look-ups with pointer chains
Coding stylesCoding styles
Micro transductors ‘08, Low Power 2 44Copyright Sill, 2008
Minimize power-consuming activity: Computation
Communication
Storage
Source-code TransformationsSource-code Transformations
A*B+A*C A*(B+C)
for (c = 1..N)
receive (A) B=c*A
receive (A)
for (c = 1..N) B=c*A
for (c = 1..N) B[c] = A[c]*D[c]for (c = 1..N) F[c] = B[c]-1
for (c = 1..N)F[c] = A[c]*D[c]-1
Micro transductors ‘08, Low Power 2 45Copyright Sill, 2008
Source: Irwin, 2000
Datapath Energy ConsumptionDatapath Energy Consumption
Algorithms can differ in power dissipation
0
2000
4000
6000
8000
10000
12000
14000
bubble.c heap.c quick.c
Sw
itche
d C
apac
itanc
e (n
F)
OthersFunctional UnitPipeline RegistersRegister File
Micro transductors ‘08, Low Power 2 46Copyright Sill, 2008
Slow down processor to fill idle time More Delay lower operational voltage
Runtime Scheduler determines processor speed and selects appropriate voltage
Transitions delay for frequencies ~150s Potential to realize 10x energy savings
Active Idle Active Idle 3.3 V
Active 2.4 V
Adaptive Dynamic Voltage Scaling (DVS) Adaptive Dynamic Voltage Scaling (DVS)
Micro transductors ‘08, Low Power 2 47Copyright Sill, 2008
Adaptive DVS: ExampleAdaptive DVS: ExampleS
peed
Time
T1 T2 T1 T2
Idle
Same work,lower energy
TaskTask
Task with 100 ms deadline, requires 50 ms CPU time at full speed Normal system gives 50 ms computation, 50 ms idle/stopped time Half speed/voltage system gives 100 ms computation, 0 ms idle
Same number of CPU cycles but: E = C (VDD/2)2 = Eref / 4
Dynamic Voltage Scaling adapts voltage to workload
Time
Micro transductors ‘08, Low Power 2 48Copyright Sill, 2008
Design Layer: System LevelDesign Layer: System Level
Basic Elements: Complex modules
Processors
Calculation and control units
SensorsALU
ME
M
ME
M
MP3
Micro transductors ‘08, Low Power 2 49Copyright Sill, 2008
Systems are: Designed to deliver peak performance, but … Not needing peak performance most of the time
Components are idle sometimes Dynamic power management (DPM):
Puts idle components in low-power non-operationalstates when idle
Power manager: Observes and controls the system Power consumption of power manager is negligible
Dynamic Power ManagementDynamic Power Management
Micro transductors ‘08, Low Power 2 50Copyright Sill, 2008
Software power control - power management
DOZE Most units stopped except on-chip
cache memory (cache coherency)
NAP Cache also turned off, PLL still on,
time out or external interrupt to resume
SLEEP PLL off, external interrupt to resume
Deeper sleep mode consumesless power
Deeper sleep mode requiresmore latency to resume
Processor Sleep ModesProcessor Sleep Modes
Micro transductors ‘08, Low Power 2 51Copyright Sill, 2008
Mode 66Mhz 80Mhz
No power mgmt 2.18W 2.54WDynamic power mgmt 1.89W 2.20W
DOZE 307mW 366mW
NAP 113mW 135mW
SLEEP 89mW 105mW
SLEEP without PLL 18mW 19mW
SLEEP without clock 2mW 2mW
10 cycles to wake up from SLEEP 100us to wake up from SLEEP+
Source: Irwin, 2000
Processor Sleep Modes: ExampleProcessor Sleep Modes: Example PowerPC sleep modes
Micro transductors ‘08, Low Power 2 52Copyright Sill, 2008
Transmeta LongRunTransmeta LongRun
Applies adaptive DVS LongRun policies:
Detection of different workload scenarios Based on runtime performance information
After detection accordingly adaptation of: Processor supply voltage Processor frequency Clock frequency always within limits required by supply voltage to avoid
clock skew problems
Use of core frequency/voltage hard coded operating points
Best trade-off between performance and power possible
Micro transductors ‘08, Low Power 2 53Copyright Sill, 2008
0
10
20
30
40
50
60
70
80
90
100
300 400 500 600 700 800 900 1000
Frequency (MHz)
% o
f m
ax p
ow
erl c
on
sum
pti
on
300 Mhz0.80 V
433 Mhz0.87 V
533 Mhz0.95 V
667 Mhz1.05 V
800 Mhz1.15 V
900 Mhz1.25 V
1000 Mhz1.30 V
Typical operating region Peak performance region
Transmeta LongRun cont’dTransmeta LongRun cont’d
Source: Transmeta
Micro transductors ‘08, Low Power 2 54Copyright Sill, 2008
Transmeta LongRun: ExampleTransmeta LongRun: Example
Source: Transmeta
Micro transductors ‘08, Low Power 2 55Copyright Sill, 2008
Battery aware designBattery aware design
1000 mAh(Standard Capacity)
Discharge current (mA)
Cap
acity
(m
Ah)
( Rated Current)125mA
1000800
600
400
200
AvailableCharge(mA) time
idleDischarge
Current(mA)
time
Non-linear effects influence life time of batteries
“Rate Capacity” If discharging currents
higher than allowed
real capacity goes under nominal capacity
“Battery Recovery” Pulsed discharge increases
nominal capacity Based on recovery times (as long there is no rate
capacity effect)
Source: Timmermann, 2007
Micro transductors ‘08, Low Power 2 56Copyright Sill, 2008
Electro-active species
Diffusion Model from - Rakhmatov, Vrudula et al.
Fully charged battery
Fully discharged
Ele
ctro
de
After a recent discharge
Ele
ctro
de
Ele
ctro
de
After Recovery
Ele
ctro
de
Battery aware design cont’dBattery aware design cont’d
Analytically very sound but computationally intensive Cannot be used for online scheduling decisions.
Micro transductors ‘08, Low Power 2 57Copyright Sill, 2008
Performance of a bipolar lead-acid battery subjected to six current impulses. Pulse length=3 ms, rest period=22 ms.
Current Battery Voltage
Battery aware design: Example 1Battery aware design: Example 1
Source: LaFollette, “Design and performance of high specific power, pulsed discharge, bipolar lead acid batteries”, 10th Annual Battery Conference on Applications and Advances, Long Beach, pp. 43–47, January 1995.
Micro transductors ‘08, Low Power 2 58Copyright Sill, 2008
Discharge profile A Discharge profile B
Minimum average current ≠ Maximum battery life time
Profile Aver. Current [mA] Battery lifetime [ms] Specif. energy [Wh/Kg]
A 123.8 357053 15.12
B 124.2 536484 18.58
Battery aware design: Example 2Battery aware design: Example 2
Source: Timmermann, 2007
Cur
rent
[m
A]
Cur
rent
[m
A]
Micro transductors ‘08, Low Power 2 59Copyright Sill, 2008
BackupBackup
Micro transductors ‘08, Low Power 2 60Copyright Sill, 2008
FSM: Clock-GatingFSM: Clock-Gating
Moore machine: Outputs depend only on the state variables. If a state has a self-loop in the state transition graph
(STG), then clock can be stopped whenever a self-loop is to be executed.
Sj
SiSk
Xi/Zk
Xk/Zk
Xj/Zk
Clock can be stopped when (Xk, Sk) combination occurs.
Micro transductors ‘08, Low Power 2 61Copyright Sill, 2008
Trend: Interconnects Trend: Interconnects
Interconnects
Source: Tenhunen, 2005
Example (very optimistic):
6–10 clock cycles in 50nm technology
[Benini, 2002]
Propagation delays of global wires will be a multiple of the clock cycle.
Micro transductors ‘08, Low Power 2 62Copyright Sill, 2008
or
Number of bus transitions per cycle
= 2 (1 + 1/2 + 1/4 + ...) = 4
Bus MultiplexingBus Multiplexing
Source: Irwin, 2000
Micro transductors ‘08, Low Power 2 63Copyright Sill, 2008
Resource Sharing and Activity IIResource Sharing and Activity II
Micro transductors ‘08, Low Power 2 64Copyright Sill, 2008
Bus MultiplexingBus Multiplexing
S2
S1D1
D2
S1
S2 D2
D1
Source: Irwin, 2000
Sharing of long data buses with time multiplexing Example:
S1 uses even cycles
S2 odd
Micro transductors ‘08, Low Power 2 65Copyright Sill, 2008
Correlated Data StreamsCorrelated Data Streams
0
0,5
1
14 12 10 8 6 4 2 0
Muxed
Dedicated
Bit positionMSB LSB
Bit
switc
hin
g p
roba
bili
ties For a shared (multiplexed) bus advantages of data correlation are lost (bus carries samples from two uncorrelated data streams) Bus sharing should not be used
for positively correlated data streams
Bus sharing may prove advantageous in a negatively correlated data stream (where successive samples switch sign bits) - more random switching
Source: Irwin, 2000
Micro transductors ‘08, Low Power 2 66Copyright Sill, 2008
If data bus is shared, advantages of data correlation are lost (bus carries samples from two uncorrelated data streams)
Bus sharing should not be used for positively correlated data streams
Bus sharing may prove advantageous in a negatively correlated data stream (where successive samples switch sign bits) - more random switching
Disadvantages of Bus MultiplexingDisadvantages of Bus Multiplexing
Micro transductors ‘08, Low Power 2 67Copyright Sill, 2008
Implementation
VariablePower-Speed
SystemFIFO Input Buffer
WorkloadFilter
Power-SpeedControl Knob
Adaptive DVS cont’dAdaptive DVS cont’d
top related