
Benchmarking of Dynamic Power Management Solutions

Frank Dols

CELF Embedded Linux Conference

Santa Clara, California (USA)

April 19, 2007

2 / 43

Why Benchmarking?

Vendor: NXP

What do we get?

From Here to There, 2000whatever!

3 / 43

Presentation Outline:

The context;
– Power management concepts.
– Hardware and software.

Benchmarks;
– The process.
– Metrics.
– Findings.

Conclusions.
– Relation to other work.
– What’s next.

4 / 43

Area Of Interest

Mobile/portable devices mostly consist of:
1. Battery.
2. Storage (flash disk, hard disk, …).
3. Display (LCD, …).
4. Speaker.
5. Broadband I/O (Bluetooth, UMTS, …).
6. Processing (CPU).

We are initially focusing only on optimizing the “Processing” power consumption.
– Next, we will also take items 1–5 into account.

5 / 43

Power Measurements

A board under test, driven by LabView.

MP3 playback – LabView measurements.

Measurement equipment: high-precision DMMs (Digital Multi-Meters).

6 / 43

Energy Consumers

Energy saving methods trade performance or functionality for energy:
– Scaling performance of processors, memories and buses;
– Using various stand-by modes of peripherals.

Energy saving is about supplying the right amount of performance at the right time. However, the future is unknown!

[Pie chart: energy-consuming components in a typical audio/video-playing mobile device: CPU, memory, LCD, DC/DC, other.]

7 / 43

Power Dissipation Basics

Dynamic power dissipation, E_dyn = ∫₀ᵗ C·V_DD²·f_c dt:
• Reduce capacitance switched.
• Reduce switching currents.
• Reduce operating voltage.

Leakage power dissipation, E_lkg = ∫₀ᵗ V_DD·I_lkg dt:
• Reduce leakage current in active and standby modes of operation.
• Reduce operating voltage.

Total power dissipation: E = ∫₀ᵗ (C·V_DD²·f_c + V_DD·I_Q) dt
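To make the formulas concrete, a minimal sketch (illustrative capacitance and operating-point values, not measured NXP figures) comparing the dynamic energy of the same amount of work done fast-then-idle versus slow-and-steady:

```python
# Sketch: dynamic energy E_dyn = C * V_DD^2 * f_c * t at a fixed operating
# point. C and both operating points are illustrative values only.

def dynamic_energy(c_farads, v_dd, f_hz, t_seconds):
    """Dynamic switching energy in joules over t_seconds."""
    return c_farads * v_dd ** 2 * f_hz * t_seconds

C = 1e-9   # effective switched capacitance (illustrative)

# Same cycle count either way: 0.4 s at 400 MHz, or 0.8 s at 200 MHz.
e_full = dynamic_energy(C, 1.2, 400e6, 0.4)   # 400 MHz @ 1.2 V, then idle
e_half = dynamic_energy(C, 0.9, 200e6, 0.8)   # 200 MHz @ 0.9 V, no idle

print(f"full speed: {e_full:.4f} J, half speed: {e_half:.4f} J")
```

Because V_DD enters squared, stretching the work over the available time at a lower voltage uses less dynamic energy for the same number of clock cycles.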

8 / 43

Dynamic Voltage Frequency Scaling (DVFS)

Scales performance according to demand (using an estimation of the future workload).

Based on the fact that energy per clock cycle rises with frequency; power scales with the voltage squared and the frequency: P ∝ V²·f.

Implemented by switching between operating points (voltage and frequency pairs).

[Block diagram: applications, the task scheduler, or another source of information feed a workload estimator, which drives the clock generator and voltage controller supplying the processor or memory.]
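Operating-point switching can be sketched as a table lookup; the voltage/frequency table below is hypothetical (the real platform derived its points from two PLLs and a divider, as described later):

```python
# Sketch: pick the lowest operating point (frequency MHz, voltage V) that
# still covers a predicted workload. The table values are invented.

OPERATING_POINTS = [
    (100, 0.95), (150, 1.00), (200, 1.05), (300, 1.15), (400, 1.20),
]  # ascending by frequency

def select_operating_point(predicted_load, points=OPERATING_POINTS):
    """predicted_load: fraction (0..1) of the maximum frequency needed."""
    f_max = points[-1][0]
    required_mhz = predicted_load * f_max
    for f, v in points:
        if f >= required_mhz:
            return f, v          # slowest point that still meets demand
    return points[-1]

print(select_operating_point(0.45))  # needs 180 MHz of a 400 MHz part
```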

9 / 43

DVFS: How Does it Save Power

[Three power-vs-time sketches:
– 50% CPU usage, no DVFS: run at high f,V at full speed, then idle at stand-by power;
– 50% CPU usage, with DVFS: run at low f,V at 50% speed the whole time (optimal);
– 100% CPU usage: full speed, all energy used for performance.]

10 / 43

Performance Prediction Methods

Interval-based: use the CPU usage of the previous interval(s) to determine the frequency for the next interval.

[Sketch: task/workload vs. operating frequency over time, both on a 0–100% scale.]

Problem: late reaction to changing workloads → missed deadlines.
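A minimal sketch of such an interval-based predictor, using an exponentially weighed average (the smoothing factor alpha is an assumption, not a value from the benchmarks); a step workload shows the late reaction:

```python
# Sketch: interval-based prediction with an exponentially weighed average.
# The prediction for each interval is made before its usage is known.

def ewa_predictions(usages, alpha=0.5):
    """Return the per-interval predictions made from the history so far."""
    predictions, estimate = [], usages[0]
    for u in usages:
        predictions.append(estimate)              # predict, then observe u
        estimate = alpha * u + (1 - alpha) * estimate
    return predictions

# Workload jumps from 20% to 90%: the prediction lags the step for several
# intervals, which is exactly the late reaction that misses deadlines.
preds = ewa_predictions([20, 20, 20, 90, 90, 90])
print(preds)
```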

11 / 43

Performance Prediction Methods

Application-directed: use information from the application to change frequencies.

[Sketch: task/workload vs. operating frequency over time, both on a 0–100% scale.]

Problems: requires changes in the application; not always possible (interactive applications).

12 / 43

Dynamic Power Management (DPM) Concept

DPM software connects to the OS kernel and collects data,
– with the collected data, and using policies, it tries to predict the future workload.

Multiple policies categorize the software workload.

A single global prediction of the future performance is made.

[Diagram: applications run on the OS; the DPM software evaluates multiple policies to derive the required performance (% of max.), which is translated into volts and MHz.]
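The concept can be sketched roughly as follows; the two example policies and the max-based policy evaluation are assumptions for illustration, not the policies actually shipped in the DPM framework:

```python
# Sketch: several policies each propose a performance level from the
# collected workload history; policy evaluation merges them into one
# global prediction. Taking the maximum is a conservative merge that
# avoids starving any workload class (an assumption, not the deck's rule).

def moving_average(history, window=3):
    recent = history[-window:]
    return sum(recent) / len(recent)

def peak_tracker(history, decay=0.9):
    peak = 0.0
    for load in history:
        peak = max(load, peak * decay)   # remember recent peaks, fading
    return peak

POLICIES = [moving_average, peak_tracker]

def predict_performance(history):
    """Single global prediction (% of max performance) from all policies."""
    return max(policy(history) for policy in POLICIES)

print(predict_performance([30, 30, 80, 40, 40]))
```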

13 / 43

Application directed DVFS

Two main groups of mobile applications:
– Interactive (Internet browsing, gaming): future workload unknown (depends on user input).
– Streaming (audio decoding, video decoding): future workload known to the application.

14 / 43

Presentation Outline:

The context;
– Power management concepts.
– Hardware and software.

Benchmarks;
– The process.
– Metrics.
– Findings.

Conclusions.
– Relation to other work.
– What’s next.

15 / 43

Benchmark Process

Plan:
– Define Key Performance Indicators (KPI).
– Define tests to measure the KPIs.

Do:
– Measurement on the evaluation platform.

Check:
– Analyze the results obtained and check whether they can be explained.
– When necessary, cross-verify with the vendor.

Act:
– Take the necessary actions on the results obtained.

16 / 43

Key Performance Indicators (KPI)

Memory usage.
– Footprint of the DPM framework and administration.

CPU usage.
– Cycles consumed by the DPM framework.
– System behavior, predictability & reproducibility.

System idleness.
– Prediction of the optimum frequency (minimization of idleness).

Real-time behavior.
– Application deadlines missed.
– Responsiveness (latency on events).
– DPM policy prediction accuracy.

17 / 43

Test / Use Cases

Synthetic (fine-grained) benchmark,
– Simulate different workload levels,
– Test corner cases.
• LMbench as available from Open Source; lat_proc, lat_syscall, memory, clock, idle, …

Application-level (coarse-grained) benchmark:
– Whetstone & Dhrystone (artificial system load).
– Hartstone (real-time behavior).

Video decoding,
– ffplay MPEG video decoding use case.
• use ffplay, part of ffmpeg.

Audio decoding,
– MP3 audio decoding use case.
• use the ARM MP3 decoding library.

18 / 43

Metrics for SW-based Benchmarking

Missed deadlines;

Overdue time;

Idle time;

Jitter;

Responsiveness.

19 / 43

Missed Deadlines

When the operating frequency is too low, deadlines are missed:

[Power-vs-time sketches: jobs completing before vs. after their deadlines.]

20 / 43

Overdue Time

Are missed deadlines caused by timing inaccuracy, or by a performance deficiency?

[Power-vs-time sketches with the overdue time (the interval by which a job overruns its deadline) marked.]

21 / 43

Idle Time

When the operating frequency is too high, extra slack time is introduced and power is wasted:

[Power-vs-time sketches: a too-high frequency leaves idle time; the optimal scenario finishes exactly on the deadline.]

22 / 43

Jitter

Execution times vary, so the time of completion varies as well:

Arrival times are periodic, but completion times vary → jitter.

[Power-vs-time sketch with periodic arrival times and varying completion times.]

23 / 43

In Summary, DPM Metrics

Missed deadlines (OS schedule);
Overdue time;
Idle time;
Jitter (fluctuation in completion time);
Responsiveness.

[Annotated power-vs-time sketch marking a missed deadline, the overdue time, the idle time and the completion time.]
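The metrics above can be derived from per-job timestamps. A sketch, assuming periodic job releases and one job per period:

```python
# Sketch: compute missed deadlines, overdue time, idle time and jitter
# from the completion times of periodically released jobs. Job i is
# released at i*period and due at (i+1)*period (an assumption).

def dpm_metrics(completions, period=100.0):
    """completions[i]: completion time (ms) of the job released at i*period."""
    missed, overdue, idle, responses = 0, 0.0, 0.0, []
    for i, done in enumerate(completions):
        deadline = (i + 1) * period
        if done > deadline:
            missed += 1
            overdue += done - deadline       # how late the job finished
        else:
            idle += deadline - done          # slack before the next release
        responses.append(done - i * period)  # per-job response time
    jitter = max(responses) - min(responses) # spread of completion times
    return {"missed": missed, "overdue": overdue,
            "idle": idle, "jitter": jitter}

print(dpm_metrics([60.0, 180.0, 230.0, 350.0]))
```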

24 / 43

NXP’s DVFS Benchmark Platform

Energizer I SoC;
– ARM1176JZF-S,
– DVFS-enabled (core as well),
– CPU voltage can be set in 25 mV increments,
– CPU frequency can be set using two PLLs (300 MHz and 400 MHz by default) and a divider.

Energizer I software;
– Linux 2.6.15,
– CPUfreq and PowerOP,
– MV’s DPM framework ported but not used.

25 / 43

CPU Usage Characteristics

[Three charts of DPM load vs. CPU usage over ~63 s of playback: “No DPM”, “Exponential Weighed Average Policy” and “Moving Average Policy”, each plotting performance (%) against time (s).]

In both DPM cases, the applied policy reacts slowly to the required performance.

26 / 43

DPM; Synthetic Workload

[Chart (Exponential Weighed Average Policy): (1st) the synthetic workload, (2nd) the DPM-predicted performance requirement, and (3rd) the actual CPU usage, performance vs. time; annotations mark the headroom and the late response of the policy.]

27 / 43

Application Level Benchmark

Video playback (DivX).
– Open-source ffplay using the frame-buffer device.
– Instrumented with time measurements.

Evaluation:
– How many hard deadlines are missed with & without DPM.

28 / 43

ffplay Deadlines

• ffplay deadline: the time interval between displaying successive (decoded) frames.
– For a 10 fps movie, the deadline is 100 ms.
– With DPM activated, fluctuation increases.
– REMARK: video output (display) on the CompactPresenter via the CompactFlash interface results in high fluctuation, oops!! (see next slide).

[Chart, ffplay@10fps: per-frame display interval (µs) over ~500 frames, four traces.]
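The instrumentation idea can be sketched as follows; display_frame() is a hypothetical stand-in for ffplay's real decode-and-display step:

```python
# Sketch: timestamp each frame display and compare the inter-frame gap
# against the frame-rate deadline, as the instrumented ffplay run did.

import time

def measure_intervals(n_frames, fps, display_frame):
    deadline_us = 1_000_000 / fps            # e.g. 100 000 us at 10 fps
    intervals, last = [], time.monotonic_ns()
    for _ in range(n_frames):
        display_frame()                      # hypothetical decode + display
        now = time.monotonic_ns()
        intervals.append((now - last) / 1_000)   # gap in microseconds
        last = now
    misses = sum(1 for dt in intervals if dt > deadline_us)
    return intervals, misses

# Simulated 10 fps playback: a short sleep stands in for the frame work.
intervals, misses = measure_intervals(5, 10, lambda: time.sleep(0.01))
print(f"{misses} of {len(intervals)} intervals exceeded the deadline")
```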

29 / 43

Kernel & User Space Activity

While running ffplay, sample the CPU activity in both kernel and user space (based on /proc/stat).

Kernel activity dominates during the video playback.

[Chart, ffplay@10fps: CPU usage (%) for user and kernel space over 50 000 µs.]
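A sketch of the sampling, run here on canned /proc/stat lines so it is self-contained; on the target the same "cpu" lines would be read live from /proc/stat:

```python
# Sketch: split CPU time into user vs. kernel share from two samples of
# the aggregate "cpu" line in /proc/stat. Field order per proc(5):
# user nice system idle iowait irq softirq ...

def cpu_split(sample_before, sample_after):
    """Return (user %, kernel %) of the elapsed ticks between two samples."""
    def fields(line):
        return [int(x) for x in line.split()[1:]]
    delta = [b - a for a, b in zip(fields(sample_before), fields(sample_after))]
    total = sum(delta)
    user = delta[0] + delta[1]               # user + nice
    kernel = delta[2] + delta[5] + delta[6]  # system + irq + softirq
    return 100 * user / total, 100 * kernel / total

before = "cpu  100 0 50 850 0 0 0 0"
after  = "cpu  110 0 120 870 0 0 0 0"
print(cpu_split(before, after))   # kernel time dominates this sample
```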

30 / 43

DPM; Video Decoding

[Chart (Exponential Weighed Average Policy): deadlines missed / idle time (%) vs. frames per second (5–25), with and without DPM; annotations mark the power savings, the performance impact, the policy overhead, and system overload.]

31 / 43

Video Decoding Jitter

[Chart: jitter (milliseconds, log scale) vs. frames per second (0–30), with and without DPM.]

Clock scaling causes a jitter increase.

The jitter metric gets polluted by the high number of deadline misses.

32 / 43

Video Decoding and Hints

[Chart, CITY_5fps.avi: operating frequency (MHz) vs. time (s) for “No DPM”, “DPM, no hints” and “DPM, with hints”.]

33 / 43

Comparison

[Bar chart: idle time (%) at 5, 7.5, 10, 12.5 and 15 frames per second, for DPM disabled, DPM enabled without hints, and DPM enabled with hints.]

DPM without hints is only effective at low workloads.

When application-directed hints are enabled:
• idle time reduction increases,
• effective in a wider frame-rate window.
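A sketch of what such a hint could look like; the deck defines no hint API, so the interface below is invented for illustration:

```python
# Sketch: application-directed hints. A streaming decoder knows its own
# frame rate and per-frame cycle cost, so it can tell the DPM layer what
# performance it needs instead of having DPM guess from history.
# The class and method names here are hypothetical.

class DpmWithHints:
    def __init__(self, f_max_mhz):
        self.f_max = f_max_mhz
        self.hint = None                      # (cycles_per_job, jobs_per_s)

    def give_hint(self, cycles_per_job, jobs_per_second):
        self.hint = (cycles_per_job, jobs_per_second)

    def required_performance(self, history_estimate):
        """Percent of max performance; a hint overrides the interval guess."""
        if self.hint:
            cycles, rate = self.hint
            needed_mhz = cycles * rate / 1e6
            return min(100.0, 100.0 * needed_mhz / self.f_max)
        return history_estimate

dpm = DpmWithHints(f_max_mhz=400)
dpm.give_hint(cycles_per_job=8_000_000, jobs_per_second=10)  # 10 fps decode
print(dpm.required_performance(history_estimate=95.0))
```

The hint lets the DPM layer settle at 20% of maximum performance where an interval-based guess would have kept the clock high.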

34 / 43

Presentation Outline:

The context;
– Power management concepts.
– Hardware and software.

Benchmarks;
– The process.
– Metrics.
– Findings.

Conclusions.
– Relation to other work.
– What’s next.

35 / 43

Technology Conclusions

Dynamic Power Management (DPM).
– The added value is in the policies.
– Performs best on low workloads.
– Adaptation to a changing workload depends heavily on the chosen policy.
– As an application developer, you have to develop your own policies.
– Be aware: a policy adds overhead.

Applications cannot feed data directly into DPM.
– Performance prediction based on logging application history does not always work.
– The best performance prediction may come from the application itself (feed forward)!
– Hinting can give improvements over an interval-based approach.

36 / 43

Conclusions

Interval-based DPM at the CPU level results in higher CPU utilisation, but seems only really effective at low workloads.

Precise energy-saving figures should be measured using HW measuring tools, but SW benchmarks give good insight into the saving potential and the consequences for real-time behaviour.

The SoC is not the biggest power consumer.
– Backlight, DC/DC converters & power amplifiers are big consumers; optimizing these yields the largest gains.

37 / 43

What’s Next

1. CELF (you guys), MIPI PM working group
– Matt Locke and his team.
2. Mode switching.
3. Correlation between hardware and software;
– What to solve at which level?
4. Deadline-based scheduling, from priority-based to time-based.
5. Start contributing to the Community on this.

Regarding bullets 1, 2 & 3, see the extra slides.

38 / 43

Overview of Open Source Power Mgt. Software

[Diagram mapping power-management functions to open-source components.]

CPU power management:
– Standby (low-power mode) when idle;
– Dynamic Voltage/Frequency Scaling.

Device power management:
– Suspend/Resume of devices;
– (Memory) bus frequency scaling;
– Suspend of the complete system.

Components covering these functions: CPUfreq, the task scheduler, APM/ACPI, PowerOP, the Linux Device Model.

(also look at Mark Gross’ presentation)

39 / 43

Mode Transitions

ACTIVE mode → transition → LP mode → transition → ACTIVE mode

– PM policy (performance monitoring, decision taking): app, policy.
– Mode transition in the sw domain (state save, sequencing, PM infra control): slp_sw.
– Mode transition in the hw domain (sequencing, communication): slp_hw.
– Wake-up event.
– Mode transition in the hw domain (communication, voltage/frequency settling, sequencing): wu_hw.
– Mode transition in the sw domain (warm boot, state restore, sequencing): wu_sw.
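The transition sequence above can be sketched as a tiny state machine; the callbacks that would save state or sequence the hardware are reduced to log entries:

```python
# Sketch: ordered sw/hw phases for sleeping and waking, named after the
# slide (slp_sw, slp_hw, wu_hw, wu_sw). Real implementations would hook
# state save/restore and PLL settling into each phase.

TO_LP     = ["slp_sw", "slp_hw"]       # ACTIVE -> LP
TO_ACTIVE = ["wu_hw", "wu_sw"]         # LP -> ACTIVE, after a wake event

def run_transition(target_mode, phases, log):
    for phase in phases:
        log.append(phase)              # stand-in for the real phase work
    return target_mode

log = []
mode = "ACTIVE"
mode = run_transition("LP", TO_LP, log)          # policy decides to sleep
mode = run_transition("ACTIVE", TO_ACTIVE, log)  # wake-up event arrives
print(mode, log)
```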

40 / 43

Improving power management through co-design of hardware and software.

Other ideas for improving power management:
– Performance counters for other peripherals:
• The cache miss rate is an indicator of the CPU’s memory bus usage.
– Hardware-aided workload prediction algorithms (in an embedded FPGA):
• DVFS algorithm in the FPGA, instead of in software;
• Enables fast calculation of estimations and quick operating-point changes;
• The algorithm can be changed for every use case because of the use of an FPGA.
– More hardware acceleration for specific applications:
• Needs discussion with SW developers at an early stage about which parts can be accelerated;
• Enables running at a lower frequency, and as such saving energy.

[Diagram of a typical hardware-aided setup: applications and the OS (software) feed a workload prediction algorithm and policy manager in the FPGA, which drives the DVFS hardware (clock generator, voltage controller).]

41 / 43

To Conclude.

42 / 43

From Here to There, 2000whatever

We started with the question: “Why Benchmarking?”

Do your homework, don’t bring in the Trojan Horse!

43 / 43

44 / 43

NXP Semiconductors

Established in 2006 (formerly a division of Philips).

Builds on a heritage of 50+ years of experience in semiconductors.

Provides engineers and designers with semiconductors and software that deliver better sensory experiences.

Top-10 supplier, with sales of €4.960 billion (2006).

Sales: 35% Greater China, 31% rest of Asia, 25% Europe, 9% North America.

Headquarters: Eindhoven, The Netherlands.

Key focus areas:
– Mobile & Personal, Home, Automotive & Identification, Multimarket Semiconductors.

Owner of NXP Software: a fully independent software solutions company

45 / 43

Where to Find Us

Operating systems Knowledge Centre (OKC)

phone: +31 40 - 27 45 131

fax: +31 40 - 27 43 630

email: okc.helpdesk@nxp.com

Web: www.nxp.com

address: High Tech Campus 46

location 3.31

5656 AE Eindhoven (NL)

46 / 43
