outline - massoud pedram ·  · 2012-09-08age & gender → zero-delay power estimate weight...

21
1 University of Southern California 1 © M. Pedram Nov 1997 M. Pedram USC Massoud Pedram University of Southern California Department of EE-Systems Los Angeles CA 90089 M. Pedram USC Outline Motivation and Objectives Power Estimation Methodology Example Analysis/Estimation Tools Power Optimization Flow Example Minimization Techniques Summary Jan 1998

Upload: trantram

Post on 03-Apr-2018

219 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Outline - Massoud Pedram ·  · 2012-09-08Age & Gender → Zero-delay power estimate Weight → Powermill power estimate Input vectors (population) ... also models the joint k-step

1

University of Southern California 1

© M. Pedram Nov 1997

M. PedramUSC

Massoud PedramUniversity of Southern California

Department of EE-SystemsLos Angeles CA 90089

M. PedramUSC

Outline

• Motivation and Objectives• Power Estimation Methodology• Example Analysis/Estimation Tools• Power Optimization Flow• Example Minimization Techniques• Summary

Jan 1998

Page 2: Outline - Massoud Pedram ·  · 2012-09-08Age & Gender → Zero-delay power estimate Weight → Powermill power estimate Input vectors (population) ... also models the joint k-step

2

University of Southern California 2

© M. Pedram Nov 1997

M. PedramUSC

Opportunities for Power Savings

System

Behavioral

RT-Level

Logic

Physical 5-10 %

10-25 %

25-40%

30-60 %

50-90 %HW/SW Co-designCustom ISAAlgorithm DesignCommunication Synthesis

Scheduling, AllocationPipeliningBehavioral Transformations

Clock Gating, PrecomputationOperand IsolationState Assignment, Retiming

Logic RestructuringTechnology MappingPin Ordering & Phase Assignment

Fanout Optimization, BufferingTransistor Sizing, PlacementPartitioning, Clock Tree DesignGlitch EliminationSavings

M. PedramUSC

Realistic Estimation Expectations

System

Architectural

RT-Level

Gate

Transistor

Instruction-Level ModelsIP Core ModelsProgram ComplexityProgram Simulation

Entropic BoundsArchitectural SimulationI/O and Memory Accesses

RT-Level MacromodelsHDL Simulation Quick Synthesis

Probabilistic SimulationGate-Level SimulationSampling and CompactionASIC Library Models

Parasitic ExtractionAccurate Timing AnalysisCircuit-Level Simulation

5-10 %

10-30 %

30-50%

40-70 %

70-90 %

Day

Hours

Hour

Minutes

Minute

Speed Error

Jan 1998

Page 3: Outline - Massoud Pedram ·  · 2012-09-08Age & Gender → Zero-delay power estimate Weight → Powermill power estimate Input vectors (population) ... also models the joint k-step

3

University of Southern California 3

© M. Pedram Nov 1997

M. PedramUSC

Example Applications

• Portable Electronics (PC, PDA, Wireless)• Ultra-Low-Power Circuits (Pacemaker)• Space Missions (Miniaturized Satellites)• IC Cost (Packaging and Cooling)• Reliability (Electromigration, Latch-up)• Signal Integrity (Switching Noise, DC

Voltage Drop)• Thermal Design

M. PedramUSC

Digital Camera CircuitCCD

preprocessor

DRAM

ROM

Compactflash

Single-cyclemultiplier

accumulator

Pixelcoprocessor

Mini-RISC MIPSmicroprocessor

Memorycontroller

10-channelDMA

controller

JPEGcodec

PC/ATAinterface

General-purpose I/O

NTSC/PALencoder

Triplevideo D/A

On-screendisplay

controller

LCDcontroller

High-speedserial I/O

Chargecoupleddevice

Cameralens

A/D

To TV

Camera’sLCD

display

To PCinterface

Source : LSI Logic

Jan 1998

Page 4: Outline - Massoud Pedram ·  · 2012-09-08Age & Gender → Zero-delay power estimate Weight → Powermill power estimate Input vectors (population) ... also models the joint k-step

4

University of Southern California 4

© M. Pedram Nov 1997

M. PedramUSC

A Power Estimation Methodology

Classify Modules

Quick SynthesisSampling/Compaction

Model Reduction

Control Logic

Interface Logic

Cores w/oMacromodel

Decompose & Classify Blocks

Composite Cores

Library ModelsAnalytic Models

Memory

Cores with Macromodel

Analog ModulesBehavioral

Simulation

Input Vector Sequences

DesignPlanning

InterconnectEstimates

HDL Description

CellLibrary

CAD Database

M. PedramUSC

Quick Synthesis

Quick RTL Synthesis

Quick Logic Synthesis

RTL Source Code

RTL Netlist

Mapped Netlist

Jan 1998

Page 5: Outline - Massoud Pedram ·  · 2012-09-08Age & Gender → Zero-delay power estimate Weight → Powermill power estimate Input vectors (population) ... also models the joint k-step

5

University of Southern California 5

© M. Pedram Nov 1997

M. PedramUSC

Cell Characterization

SPICE Simulation Event-Driven Simulation

Gate-Level Netlist

SPICE Parameters

Transistor-Level Netlist

Gate-Level Power/DelayCharacterization Data

RT-Level Power/DelayCharacterization Data

M. PedramUSC

Circuit Model Reduction

Datapath Block Predictor Block

FSMApproximate SSM Model

Analog Block Sensitivity-Based Simplified Models

Random Logic k-LUT Structure

Jan 1998

Page 6: Outline - Massoud Pedram ·  · 2012-09-08Age & Gender → Zero-delay power estimate Weight → Powermill power estimate Input vectors (population) ... also models the joint k-step

6

University of Southern California 6

© M. Pedram Nov 1997

M. PedramUSC

Dynamic (Simulative) Techniques

Report Power

InputSequence

SampledSequence

Compacted Sequence

RT-Level or Gate-LevelNetlist

Library Data

ProbabilisticCompaction

StatisticalSampling

Event orCycle-BasedSimulation

M. PedramUSC

Simple Random Sampling (SRS)

Exhaustive

0% errorvery time consuming

< 5% error> 1000X speedup

Random Sampling

Jan 1998

Page 7: Outline - Massoud Pedram ·  · 2012-09-08Age & Gender → Zero-delay power estimate Weight → Powermill power estimate Input vectors (population) ... also models the joint k-step

7

University of Southern California 7

© M. Pedram Nov 1997

M. PedramUSC

Monte Carlo Simulation

Efficiency: depends onpopulation characteristicsand sampling procedure

“Difficult” distributions

NO

YES

Report Power

Input vectors

Generate one sample of k units

Do circuit level simulation

Calculate mean power & confidence interval

Converged?

M. PedramUSC

SRS Results

0 2000 4000 6000 8000

C432

C880

C1355

C1908

C2670

C3540

C5315

C6288

C7552

Biased SequenceRandom Sequence

Jan 1998

Page 8: Outline - Massoud Pedram ·  · 2012-09-08Age & Gender → Zero-delay power estimate Weight → Powermill power estimate Input vectors (population) ... also models the joint k-step

8

University of Southern California 8

© M. Pedram Nov 1997

M. PedramUSC

Stratified Random Sampling (StRS)Estimate the average weight, assuming that gender andage of individuals are readily available

Lower samplevariance leads tofaster convergence

StratificationS

ampling

M. PedramUSC

Application to Power EstimationAge & Gender → Zero-delay power estimateWeight → Powermill power estimate

Input vectors(population)

Report Power

Zero-Delay Simulationof the Entire population

PopulationStratification

Random Sampling andPowerMill Simulation

Convergent?

NO

YES

Jan 1998

Page 9: Outline - Massoud Pedram ·  · 2012-09-08Age & Gender → Zero-delay power estimate Weight → Powermill power estimate Input vectors (population) ... also models the joint k-step

9

University of Southern California 9

© M. Pedram Nov 1997

M. PedramUSC

Regression EstimationEstimate the average height, assuming that weight ofindividuals is readily available

SampleH1

Simple RandomEstimator

H2 = H1+Slope•(W2-W1)

RegressionEstimator

W1: Avg. Weight of Samples

W2: Avg. Weight of Population

height

weight

slope

height

H1: Avg. Height of Samples

M. PedramUSC

StRS Results

02468

1012

C432 C880 C1355 C1908 C2670 C3540 C5315 C6288 C7552

Speedup

Jan 1998

Page 10: Outline - Massoud Pedram ·  · 2012-09-08Age & Gender → Zero-delay power estimate Weight → Powermill power estimate Input vectors (population) ... also models the joint k-step

10

University of Southern California 10

© M. Pedram Nov 1997

M. PedramUSC

Sampling on FSMsMethod 1: Functional simulation + sampling on comb. circuit

Logic

FFs

Logic

Input Seq. Input

Seq. +StateLines

Method 2: Do machine warm-up for every sampling unit

Logic

FFs

Warm-upvectors +sampling unit

Warm-up sequencelength can be quitelarge

M. PedramUSC

Probabilistic Compaction (PC)Sequence compaction : Generate a new, but shorter,sequence which exhibits nearly the same spatio-temporal correlations as the initial sequence

TargetCircuit

ProbabilisticModel

SequenceGeneration New

Seq.

InitialSeq.

outin

Jan 1998

Page 11: Outline - Massoud Pedram ·  · 2012-09-08Age & Gender → Zero-delay power estimate Weight → Powermill power estimate Input vectors (population) ... also models the joint k-step

11

University of Southern California 11

© M. Pedram Nov 1997

M. PedramUSC

PC by ExampleInitial sequence

011010100111110101110000011011100010000110101110101

Avg. bit activity23/16

Compacted sequence001010011000010001

Avg. bit activity8/5 Stochastic State Machine Model

1/2

1 1/2

1/4

2/3

1/2

1/4

1/21/3

1/2 1/2

1/2

1/2

3

0

6

1

5

4

7

1/2

M. PedramUSC

Dynamic Markov Trees

S1 = (0000, 0001, 1001, 1100, 1001, 1100, 1001, 1100)

12 2

3 3 3

4 4 4

2 62

2

1 1

3 33 3

3 3

DMT0

DMT1 12 2

3 3 3

4 4 45 555

6 666

7 7778 888

2 62 33

2

1 11

1

1

1

1

11

1

33

33

3

3

332

2

22

Upper subtree

Lower subtrees

Jan 1998

Page 12: Outline - Massoud Pedram ·  · 2012-09-08Age & Gender → Zero-delay power estimate Weight → Powermill power estimate Input vectors (population) ... also models the joint k-step

12

University of Southern California 12

© M. Pedram Nov 1997

M. PedramUSC

Comparison with SRS

0% 2% 4% 6% 8%

C432

C880

C1355

C1908

C3540

C6288

circ

uit

L = 4,000Compaction Ratio 10

SRSHierarchical

M. PedramUSC

Compaction for FSMs

Input Sequence Target Circuit

Logic

FFs

xn zn = out (xn, sn)

sn

p(xnsn) sn+1 = next (xn, sn)Markov Chain

Jan 1998

Page 13: Outline - Massoud Pedram ·  · 2012-09-08Age & Gender → Zero-delay power estimate Weight → Powermill power estimate Input vectors (population) ... also models the joint k-step

13

University of Southern California 13

© M. Pedram Nov 1997

M. PedramUSC

Higher Order DMTsA lag-k Markov chain which correctly models the inputsequence, also models the joint k-step conditionalprobabilities of the primary inputs and state lines

p(vi)

p(vj|vi)

p(vk|vjvi)

DMT0

DMT1

DMT2

vi

vj

vk

M. PedramUSC

High Order DMT Results

0% 10% 20% 30% 40% 50% 60%

b bara

d k17

mc

p lane t

sh iftre g

s1196

s1423

s5378

s820

s9234Com paction Ratio 10

Order 1

Order 2

Jan 1998

Page 14: Outline - Massoud Pedram ·  · 2012-09-08Age & Gender → Zero-delay power estimate Weight → Powermill power estimate Input vectors (population) ... also models the joint k-step

14

University of Southern California 14

© M. Pedram Nov 1997

M. PedramUSC

New Program Program Synthesis

Architectural SimulationCharacteristic

Profile

RTL Simulation PowerEstimation

CPU’sRTL

Model

Original Program

Instruction Level Compaction

M. PedramUSC

Characteristic Profile• Instruction mix• Average instruction size (if applicable)• Clocks per instruction• Branch prediction miss rate• Instruction cache miss rate• Data cache read/write miss rate• Pipeline stall rate• Speculative execution and register renaming are

not considered• Target micro-processor: A super-scalar pipelined

CPU with branch prediction (Intel’s Pentium chip)

Jan 1998

Page 15: Outline - Massoud Pedram ·  · 2012-09-08Age & Gender → Zero-delay power estimate Weight → Powermill power estimate Input vectors (population) ... also models the joint k-step

15

University of Southern California 15

© M. Pedram Nov 1997

M. PedramUSC

1. Block Allocation

2. Instruction Allocation

add op1, op2xor op1, op2mul op1, op2mov op1, mem1branch

3. Memory Allocation

mov op1, mem1

4. Operand Allocation/ Instruction Scheduling

add op1, op2

reg1 reg4reorder the sequence

memory space

Program Synthesis Procedure

M. PedramUSC

Results: Compression Ratio

1 10 100 1000 10000 100000

VORTEX

PERL

IJPEG

LISP

COMP

GCC

M88

GO

Jan 1998

Page 16: Outline - Massoud Pedram ·  · 2012-09-08Age & Gender → Zero-delay power estimate Weight → Powermill power estimate Input vectors (population) ... also models the joint k-step

16

University of Southern California 16

© M. Pedram Nov 1997

M. PedramUSC

Results: Accuracy

0 50 100 150 nano Joule

VORTEX

PERL

IJPEG

LISP

COMP

GCC

M88

GO

Energy per Instruction

SynthesizedOriginal

M. PedramUSC

Power Macromodeling

MacromodelEquation Form

ModelVariables

Training Set

Low - LevelSimulation

Regression Analysis

RegressionCoefficients

MacroModels

Module

Analytic ModelReduction

Jan 1998

Page 17: Outline - Massoud Pedram ·  · 2012-09-08Age & Gender → Zero-delay power estimate Weight → Powermill power estimate Input vectors (population) ... also models the joint k-step

17

University of Southern California 17

© M. Pedram Nov 1997

M. PedramUSC

Dual Bit Type Model

Pwr = C0 + C1 ·S1 + C2 ·S2 + C3 ·S3 + C4 ·S4

Consider a data path block:

Sign bit

MSB region LSB region

S1 S2 : avg. switching activity of LSB(MSB) region of operand 1

S3 S4 : avg. switching activity of LSB(MSB) region of operand 2

Module

M. PedramUSC

Input/Output Data Model

Pwr = C0 + C1 ·S1 + C2 ·S2 + C3 ·S3

Consider a data path block:

S1 S2 : avg. switching activity ofoperands 1 and 2

S3 : avg. switching activity ofoutput

Module

Jan 1998

Page 18: Outline - Massoud Pedram ·  · 2012-09-08Age & Gender → Zero-delay power estimate Weight → Powermill power estimate Input vectors (population) ... also models the joint k-step

18

University of Southern California 18

© M. Pedram Nov 1997

M. PedramUSC

Bitwise Data Model

Consider a random logic block:

∑ ⋅+=inputs iSiCCPwr 0

Si : avg. switching activity ofinput signal i Module

More parameters lead to ahigher degree of accuracy,but increase thecomputational overhead

M. PedramUSC

Cycle-Accurate Macro-Models

Cycle-AccurateMacro-Models

AveragePower

SignalIntegrity

PowerDistribution

PeakPower

DCVoltage

Drop

SwitchingNoise

Jan 1998

Page 19: Outline - Massoud Pedram ·  · 2012-09-08Age & Gender → Zero-delay power estimate Weight → Powermill power estimate Input vectors (population) ... also models the joint k-step

19

University of Southern California 19

© M. Pedram Nov 1997

M. PedramUSC

Statistical MacromodelConstruction

0

5

10

15

20

C1355 C1908 C2670 C3540 C432 C5315 C6288 C7552 C880 Mul16 Adder16

Erro r in Cyc lePo we r (%)

Erro r in Ave rag ePo we r (%)

Model Validation?

101110010…1111011010…0010000011…1…

Low-levelSimulation

TrainingSet

Variable Selection basedon Sensitivity Analysis

Circuit

Database

EquationForm

Initial VariableSet

No YesDONE

Variable Selection basedon Sensitivity Analysis

Least Square Fit

M. PedramUSC

Power Optimization FlowHW/SW Co-Design:

Partitioning and MappingCommunication Synthesis

Datapath Synthesis:Multiple Voltage Scheduling Resource Allocation/Binding

Controller Synthesis:State Assignment

Interface Design:I/O Encoding

Design Planning:Floorplanning & Global Routing

Power/Delay Budgeting

Logic Synthesis:Logic Minimization & Restructuring

Technology Mapping

Physical Design:Placement & RoutingResizing & Rewiring

Jan 1998

Page 20: Outline - Massoud Pedram ·  · 2012-09-08Age & Gender → Zero-delay power estimate Weight → Powermill power estimate Input vectors (population) ... also models the joint k-step

20

University of Southern California 20

© M. Pedram Nov 1997

M. PedramUSC

System and Behavioral Synthesis

Resource Allocation and Sharing

Interconnect Synthesis

CommunicationSynthesis

Selective Shut-offof System Modules

Multiple Supply Voltage Scheduling

HW-SW Partitioningand Mapping

M. PedramUSC

RT-Level and Logic Synthesis

Logic Restructuringand Minimization

Technology Mapping

Retiming

Precomputation or Bypass Logic

State Assignment and Bus Encoding

Gated Clocking

Jan 1998

Page 21: Outline - Massoud Pedram ·  · 2012-09-08Age & Gender → Zero-delay power estimate Weight → Powermill power estimate Input vectors (population) ... also models the joint k-step

21

University of Southern California 21

© M. Pedram Nov 1997

M. PedramUSC

Physical Design

Transistor Re-ordering under a DSM Delay/Power Model

Gate Sizing under aDSM Delay/Power Model

Bounded-SkewGated Clock Routing

Floorplanning withIntegrated Power Plane Design

Fanout Optimization undera DSM Delay/Power Model

P/G Network Design forGround Bounce Control

M. PedramUSC

Summary• CAD flows and tools can reduce power

dissipation in VLSI circuits and systems by afactor of 5 - 8 X over the next three years

• Process and voltage scaling can provideanother factor of 8 - 12 X

• Commercial tools for gate-level and circuit-level power estimation and optimization exist

• High level power analysis and estimation toolsare a key enabling technology

• Early Design planning and system-level designand power optimization tools are needed

Jan 1998