
Post on 19-Dec-2015


Performance Counter Based Architecture Level Power Modeling ( http://vlsicad.ucsd.edu )

Motivation & Goals

• Processor power is increasing; power management is a “grand challenge” in the semiconductor roadmap (ITRS)
• Processor architects need accurate architecture-level power models
• Low-overhead solutions are preferable

Potential applications:
• Hints for low-power compilers/embedded programmers to reduce power consumption
• Guidance for processor designers seeking to reduce power
• “Zero-overhead temperature sensing” for thermal reliability-driven processor throttling (dynamic voltage and frequency scaling)

Puneet Sharma ( [email protected] )

Advisor: Prof. Andrew B. Kahng

Electrical & Computer Engineering

Joint work with Mr. John Seng and Prof. Dean Tullsen, UCSD CSE department

Abstract

Modern microprocessors have built-in performance counters that are used primarily for compiler and processor optimization. We investigate whether built-in performance counters can also be used to predict the amount of power consumed by the processor. This poster reports early efforts toward correlation of processor power consumption to increments in performance counters, via statistical model fitting.

Methodology

Problem 1: Counter-Counter Synchronization

• Only certain subsets of counters may be collected simultaneously, due to limitations imposed by the collection method
• Multiple runs are required to collect all counters
• Different runs collect counters at different time instants, so the runs need to be synchronized

Solution:
• The number of micro-operations retired is collected in all runs and used as a “timeline”
• Counter values from all runs are linearly interpolated to match the first run (“reference run”)
• “Poorly interpolable” counters are put in the reference run

[Figure: counter value plotted against micro-operations retired, showing collected samples (indexed i-1, i and j-1, j, j+1) and the interpolated values]
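The interpolation step in the Problem 1 solution can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each run records cumulative counter values together with the cumulative number of micro-operations retired, and the function name `align_to_reference` and the sample numbers are hypothetical.

```python
import numpy as np

def align_to_reference(ref_uops, run_uops, run_counter):
    """Interpolate a counter from another run onto the reference run's timeline.

    ref_uops:    cumulative micro-operations retired at each reference-run sample
    run_uops:    cumulative micro-operations retired at each sample of the other run
    run_counter: cumulative value of the counter of interest in the other run
    """
    ref_uops = np.asarray(ref_uops, dtype=float)
    run_uops = np.asarray(run_uops, dtype=float)
    run_counter = np.asarray(run_counter, dtype=float)
    # np.interp needs increasing x-coordinates; this sketch insists on
    # strictly increasing micro-operation counts (assumption).
    if np.any(np.diff(run_uops) <= 0):
        raise ValueError("micro-operation counts must be strictly increasing")
    # Linear interpolation; points outside the sampled range are clamped
    # to the first/last counter value by np.interp.
    return np.interp(ref_uops, run_uops, run_counter)

# Made-up samples: the other run was sampled at different points on the timeline.
ref_uops = [0, 100, 200, 300]
run_uops = [0, 150, 300]
run_cache_misses = [0, 30, 90]
aligned = align_to_reference(ref_uops, run_uops, run_cache_misses)
```

Using micro-operations retired rather than wall-clock time as the x-axis keeps the runs comparable even when they progress at slightly different speeds.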

Problem 2: Counter-Power Synchronization

• Counter values and power values are collected on different systems, which are not synchronized
• Need to synchronize counters and power to know which counter readings correspond to each power reading

Solution:
• An initial sleep phase is introduced so that both counter and power readings drop to zero, improving initial alignment of counters and power
• Power and counter readings are time-stamped
• Sliding time windows of n counter readings are considered, and the energy consumed in each window is computed
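The sliding-window energy computation can be sketched as below. This is an illustration under stated assumptions, not the poster's code: it assumes the two timestamp streams have already been aligned to a common clock (for example via the sleep phase) and approximates energy by trapezoidal integration of the power samples. The function names and sample values are hypothetical.

```python
import numpy as np

def window_energy(power_t, power_w, t_start, t_end):
    """Energy in joules between t_start and t_end (trapezoidal rule)."""
    power_t = np.asarray(power_t, dtype=float)
    power_w = np.asarray(power_w, dtype=float)
    if t_start < power_t[0] or t_end > power_t[-1]:
        raise ValueError("window must lie inside the power measurement range")
    # Keep the power samples strictly inside the window and add the two
    # window edges, whose power values are linearly interpolated.
    inside = (power_t > t_start) & (power_t < t_end)
    t = np.concatenate(([t_start], power_t[inside], [t_end]))
    p = np.interp(t, power_t, power_w)
    return float(np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(t)))

def sliding_window_energies(counter_t, power_t, power_w, n):
    """Energy for every window spanning n consecutive counter intervals."""
    counter_t = np.asarray(counter_t, dtype=float)
    return np.array([
        window_energy(power_t, power_w, counter_t[k], counter_t[k + n])
        for k in range(len(counter_t) - n)
    ])

# Made-up data: power sampled at 100 Hz (constant 50 W), counters at 50 Hz.
power_t = np.linspace(0.0, 0.1, 11)
power_w = np.full(11, 50.0)
counter_t = [0.0, 0.02, 0.04, 0.06]
energies = sliding_window_energies(counter_t, power_t, power_w, n=2)
```

Each window here spans 0.04 s at 50 W, so each entry of `energies` is about 2 J.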

UC San Diego Computer Engineering • VLSI CAD Laboratory

[Figure: block diagram with the labels Processor, Read Counters, Power Estimator, and Power Consumption]

Related Work

• Joseph et al. (ISLPED-01) model power consumption of an Intel Pentium Pro based on known maximum power dissipations in microarchitectural structures. The relative contribution of structures to total power is dependent on counter readings, but no claims of accuracy are established.
• Bellosa et al. (SIGOPS-00) studied several performance counters to demonstrate correlation with total chip power, and estimate energy consumed for each microarchitectural event.
• Power consumption estimates have also been made using statistics from architecture-level performance simulators, with activities of particular structures used to estimate power. Wattch, SimplePower, Architecture Power Model and AccuPower fall in this category. For example, Wattch has an accuracy of 10%-13% for individual processor structures (compared to actual circuit implementations) and 30% for full chip power (compared to reported maximum full chip power values).

[Figure: power measurement platform, with blocks labeled Pentium 4 motherboard, Voltage Regulator, A/D Converter, Single Board Computer, and Data Collection Computer; two elements are labeled .015]

Setup

• Motherboard: Gigabyte GA-8IEXP
• Processor: Intel Pentium 4, 2.0 GHz, 1.5 V Vdd, 512KB L2 cache
• A/D converter: TI ADS1210, 100 Hz sample rate, 22 bits resolution

Experiment

• We use the SPEC 2000 benchmark suite for all experiments
• Performance counter values are collected on the processor under test at a rate of 50 Hz
• During the run, the power consumption of the processor under test is read at a rate of 100 Hz on another machine
• Our counter collection method prevents us from collecting all counters simultaneously, so we perform multiple runs to collect all counters
• We form two subsets of the benchmark suite: our model is fitted using the training set, and model accuracy is evaluated using the test set

Problem 3: Model-fitting

• Relate energy consumed to increment in performance counters

Solution:
• Linear, quadratic, cubic, etc. regression
• Cluster analysis
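A linear fit of this kind can be sketched with ordinary least squares. The example below is only an illustration: the counter increments and energies are synthetic, the intercept term is an assumption (the poster does not say whether one is used), and the function names are hypothetical.

```python
import numpy as np

def fit_linear_energy_model(counter_increments, energies):
    """Least-squares fit of energy = sum_j w_j * increment_j + b."""
    X = np.asarray(counter_increments, dtype=float)
    y = np.asarray(energies, dtype=float)
    # Append a column of ones so the last coefficient is the intercept b.
    A = np.column_stack([X, np.ones(len(X))])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

def predict_energy(coeffs, counter_increments):
    """Predicted energy per window from the fitted coefficients."""
    X = np.asarray(counter_increments, dtype=float)
    return X @ coeffs[:-1] + coeffs[-1]

def total_energy_error(coeffs, counter_increments, energies):
    """Relative error in total energy over all windows of one benchmark."""
    predicted = predict_energy(coeffs, counter_increments).sum()
    actual = float(np.sum(energies))
    return abs(predicted - actual) / actual

# Synthetic data: rows are time windows, columns are two counters' increments.
X_train = np.array([[100, 5], [200, 20], [300, 10], [400, 40]], dtype=float)
true_w, true_b = np.array([0.002, 0.01]), 0.5
y_train = X_train @ true_w + true_b  # noise-free energies in joules

coeffs = fit_linear_energy_model(X_train, y_train)
error = total_energy_error(coeffs, X_train, y_train)
```

Quadratic or cubic variants could reuse the same fit by appending squared or cubed increment columns to the input matrix before fitting.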

Results

• Training set = 14 Floating Point benchmarks, 10 Integer benchmarks
• Test set = 3 FP and 3 Int benchmarks
• Linear regression model results (error in total energy consumption per benchmark) are shown in the bar chart
• The benchmark gap has maximum error (25.17%)

[Figure: bar chart of error in total energy consumption for the test benchmarks applu, art, crafty, gap, gzip, and lucas; vertical axis from 0.00% to 30.00%]

[Figure: Estimated power vs. Actual power for art, applu, crafty, gzip, gap, and lucas. Blue = estimated power, Red = actual power]

Conclusion

• Initial results: performance counters can potentially yield accurate models and predictions of processor power consumption
• More flexible nonlinear regression models may yield improved predictions of power from counter values
• Counters that could be useful for power prediction are not available
  • E.g., number of divides, multiplies, …
• Splitting certain counters might be useful
  • Pentium 4 processor contains a counter for the number of floating point operations; more specific counters which count different operations separately might be more useful

Can architecture power be estimated accurately using existing performance counters?

This project:
• Study feasibility of power modeling based on built-in performance counters
• Study effects of architectural events on dynamic power

• Counter values are collected at the black points.
• Blue points represent interpolated counter values at the points (micro-operations retired) corresponding to the reference run.
• Power is collected at times t'_i, t'_{i+1}, …, t'_{i+p}, t'_{i+p+1}.
• Need to find the energy consumed in the time window t_k to t_{k+w} (given by the shaded area).