
Post on 21-Dec-2015


University of California, Irvine and San Diego

Energy-Aware System Design for Wireless Multimedia

Nikil Dutt
Shivajit Mohapatra
Rajesh Gupta
Nalini Venkatasubramanian
Sumit Gupta
Cristiano Pereira

Hans Van Antwerpen
Ralph Von Vignau
Philips Semiconductors

2

Talk Outline

• Overview: Distributed Wireless Multimedia
• Case Study: The FORGE Framework
• IP Reuse at Philips

3

• Power optimization in battery-operated mobile devices is a crucial research challenge.
• Devices operate in dynamic distributed environments.
• Future power management strategies need to be aware of global system changes.

4

Infrastructure for Mobile Multimedia Environments

[Figure: Services (online gaming, video streaming, distance learning, online banking, chat) are reached over a wide area network. Access points (AP) connect it to a wireless network of low-power mobile devices (laptop, palmtop, iPAQ). Devices send requests and receive responses.]

• Best-Effort Service

5

Enhanced Infrastructure

[Figure: The same services (online gaming, video streaming, distance learning, online banking, chat), wide area network, access points (AP), wireless network and low-power mobile devices (laptop, palmtop, iPAQ), extended with a directory service, a broker and proxies (PROXY-1 to PROXY-N). A proxy takes data from the server and data from the mobile host, executes remote tasks (caching, compression, encryption/decryption, compositing, transcoding) and produces the output.]

6

Challenges in Wireless Multimedia Processing

• Proliferation of devices
  o System support for a multitude of smart devices that
    - attach to and detach from a distributed infrastructure
    - produce a large volume of information at a high rate
    - are limited by communication and power constraints
• Need a customizable networking backbone
  o QoS-driven resource provisioning algorithms for highly dynamic environments
  o Need to deal adaptively with incoming requests
  o Dynamically reconfigure the system to service requests

7

High Data Volume of Multimedia Information

Source             Parameters                                Data volume
Speech             8,000 samples/s                           8 Kbytes/s
CD Audio           44,100 samples/s, 2 bytes/sample          176 Kbytes/s
Satellite Imagery  180 x 180 km^2, 30 m^2 resolution         600 MB/image (60 MB compressed)
NTSC Video         30 fps, 640 x 480 pixels, 3 bytes/pixel   30 Mbytes/s (2-8 Mbits/s compressed)
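The audio and video figures above follow from simple rate arithmetic. The sketch below reproduces them under two assumptions that the slide does not state: speech uses 1 byte per sample, and CD audio is stereo (2 channels). The function names are illustrative only.

```python
# Raw (uncompressed) data rates for the audio and video rows of the table.
# Assumptions not stated on the slide: 1 byte/sample for speech, and
# 2 channels (stereo) for CD audio.

def raw_rate(units_per_second: int, bytes_per_unit: int, channels: int = 1) -> int:
    """Return the uncompressed data rate in bytes per second."""
    return units_per_second * bytes_per_unit * channels


def compression_ratio(raw_bytes_per_second: int, compressed_bits_per_second: float) -> float:
    """Return raw bit rate divided by compressed bit rate."""
    return raw_bytes_per_second * 8 / compressed_bits_per_second


speech_rate = raw_rate(8_000, 1)                  # samples/s * bytes/sample
cd_audio_rate = raw_rate(44_100, 2, channels=2)   # stereo assumed
ntsc_rate = raw_rate(30, 640 * 480 * 3)           # frames/s * bytes/frame

print("speech  :", speech_rate, "bytes/s")
print("CD audio:", cd_audio_rate, "bytes/s")
print("NTSC    :", ntsc_rate, "bytes/s")
print("NTSC compression ratio at 8 Mbit/s:", compression_ratio(ntsc_rate, 8e6))
print("NTSC compression ratio at 2 Mbit/s:", compression_ratio(ntsc_rate, 2e6))
```

The NTSC result is about 27.6 million bytes/s, which the slide rounds to 30 Mbytes/s. Against the quoted 2-8 Mbits/s compressed range, this corresponds to compression ratios of roughly 28:1 to 110:1.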

8

Challenges in Wireless Multimedia Processing

• Dealing with device mobility
  o Need a high degree of "network awareness": congestion rates, mobility patterns, etc.
  o Global system state is constantly changing
• Service brokering for QoS-aware resource provisioning
  o Admission control, load balancing, etc.
• Multimedia processing challenges
  o Soft real-time constraints
  o Synchronization (e.g. lip sync, floor control)
  o Support for traditional media (text, images) and continuous media (audio/video)
• Other considerations: availability, reliability, cost-effectiveness and security

9

Distributed Wireless Multimedia

• Different forms of information accessible anytime
  o Multiple sessions with varying characteristics
• Services, networks and systems
  o Heterogeneous, evolve dynamically
• Quality of Service
  o Constraints: timing, resource availability, network constraints (e.g. bandwidth), security, reliability, ...
  o Example: multimedia streaming to handheld devices
    - QoS parameters: jitter, frame rate, resolution, bit rate, etc.
    - All these QoS parameters affect user perception.
• Power is a new QoS dimension in distributed multimedia.
  o The user must be able to watch the requested video without running out of battery.

10

Multimedia Streaming Example

• We use this framework to examine the design challenges.
• A proxy node between servers and clients allows dynamic stream transformations (transcoding, adaptation, annotation, etc.).

[Figure: Elements shown are media servers, wired network, proxy, wired network, access point, wireless network, and clients (handheld PC, PDA, phone).]

11

Opportunities in Wireless Multimedia System Design

• The dynamic nature of multimedia tasks leaves some computational slack
  o Slack = difference between computational capability and the computational requirements imposed by deadlines
• QoS trade-offs are possible for reducing energy consumption
  o Example: lower-quality video needs less computation/bandwidth
• Multimedia application characteristics
  o Kernels of computation-dominated operations
    - E.g. MPEG: IDCT, motion compensation, VLD
  o Predictable, regular behavior (most of the time)
    - E.g. VLD, followed by IQ, IDCT
  o Clear computation and/or data access patterns (cyclic)
    - E.g. video frames are traversed in a known order
• Exploit multimedia-specific characteristics to enable a range of optimization techniques

12

Implications on Wireless Multimedia System Design

• Devise strategies that reduce energy
• These strategies must adapt to/optimize for changes in
  o Application data (video stream)
  o OS/hardware (CPU, memory, reconfigurable logic)
  o Network (congestion, noise, node mobility)
  o Residual energy (battery)
  o Environment (ambient light, sound)
• Strategies can
  o Change application behavior (compression ratio)
  o Reduce backlight
  o Buffer data (and switch off the network card)

13

Abstraction Layers in Distributed Multimedia Systems

[Figure: Abstraction layers. A server and clients (Client 1, Client i, Client n). Layers shown: applications (video player, other tasks); middleware (network management, transcoding, admission control); operating system (DVS, scheduler); hardware (network card, display, cache, memory, register files, CPU).]

Challenges
• Enable high quality of service (particularly multimedia services) at the mobile device
  o High computational capability
• Do so within strict peak power and energy budgets
• E.g.: play a video stream at the highest quality (requires computation) while ensuring the entire video plays back (requires energy)

14

Energy Aware System Design Techniques

• Several approaches optimize energy for each component and each abstraction level
• Solutions at each abstraction level
  o Architecture
    - Cache/memory optimizations, processor architectural optimizations
  o Operating system
    - Dynamic voltage scaling (DVS)
    - Dynamic power management (DPM) of system components: disks, network interfaces
  o Middleware solutions
    - Adaptive streaming, mobility-based adaptations
  o Application adaptations
    - Profiling applications for low-power execution

Related Work

[Figure: Abstraction layers with related work listed beside each. Architecture (CPU, memory); Operating System (DVS, DPM, driver interfaces, system calls); Distributed Middleware (distributed adaptation, cross-layer adaptation, application-specific adaptation); User/Application (quality of service, application/user feedback).]

• Soderquist (ACM Multimedia 97)
• Azevedo (AWIA 2001)
• Hughes, Adve (MICRO 01, ISCA 01)
• Brooks (ISCA 2000), Choi (ISLPED 02)
• Lebeck (ASPLOS 2000), Microsoft's ACPI

• Ellis, Vahdat (EcoSystem, Currentcy, ASPLOS 02)
• Hao, Nahrstedt (ICMCS 99, HPDC 99, Globecom)
• DVS (Shin, Gupta, Weiser, Srivastava, Govil et al.)
• DPM (Douglis, Helmbold, Delaluz, Kumpf et al.)
• Chandra (MMCN 02), Katz (IEICE 97), Chou (02)
• Feeney, Nilsson (Infocom 2001)

• Nahrstedt (GRACE, UIUC - MMCN 2002, 2003)
• Shenoy (MMCN 2002), Rajkumar (ICDCS 2003)
• Mohapatra (ICDCS, MWCN 2003), Xu (DCS 03)
• Efstratiou, Friday (Middleware 2000)
• FORGE Project, UCI (ACM MM, RTAS, CIPC 03)

• Flinn (ICDSP 2001), Yau (ICME 2002)
• Krintz, Wolski (UCSD)
• Noble (SOSP 97, MCSA 1999)
• Li (CASES 2002), Othman (1998)
• Abeni (RTSS 98)
• Rudenko (ACM SAC 99), Satyanarayanan (2001)

PROXY-BASED ADAPTATION for POWER AWARENESS
• Shenoy (transcoding), Chandra (network), Mohapatra (OS, arch, network + transcoding)

CROSS-LAYER ADAPTATION
• GRACE (Illinois), FORGE/DYNAMO (UCI)

16

Talk Outline

• Overview: Distributed Wireless Multimedia
• Case Study: The FORGE Framework
• IP Reuse at Philips

17

Traditional Approach: A Closer Look

[Figure: Wireless Distributed Infrastructure (traditional). A server and the low-power mobile devices exchange requests and responses across the network infrastructure (wide area network, wireless network). Power management on the low-power device is shown against its architecture, operating system and application layers.]

18

Drawbacks
• Limited coordination between the different computation layers (architecture, OS, application)
  o Lack of a generalized framework
  o Example: DVS in the presence of architectural optimizations
• Traditional approaches do not exploit global system knowledge
  o Network congestion levels
  o Device mobility information
  o Data characteristics

Cross-layer coordination directed by a distributed middleware framework can effectively address the above limitations.

19

A Global Coordinated Approach in FORGE

[Figure: Local cross-layer adaptation on the device (architecture, operating system, distributed middleware, user/application) combined with global proxy-based adaptation over the network (proxy with the same four layers, plus a directory service).]

Build a power-aware distributed embedded system framework that can
o Exploit global changes (network congestion, system loads, mobility patterns) to improve local adaptations
o Distribute local information (e.g. device mobility, residual power) for improved global adaptations
o Coordinate power management strategies at different levels (application, middleware, OS, architecture)
o Maximize the utility (application QoS, power savings) of a mobile device

20

FORGE: Layers and Interactions

[Figure: FORGE layers and interactions.
Cross-layer adaptation (local device): user applications (utility); middleware device runtime (API interface) with middleware strategies (network optimization, task partitioning, user info, collect/update local data); operating system (DVS, scheduler, device drivers); hardware (network card, display, cache, memory, register files, CPU).
Proxy-based adaptation: proxy middleware on the proxy.]

Labels on the interactions shown in the figure:
• Proxy: admission control, task partitioning, adaptive network transmission
• Between device and proxy middleware: mobility information, current residual power, utility levels supported, user requirements for admission control
• Between proxy middleware and device: transcoded payload/data, settings for transmitted data, control information (network transmission)
• Applications and middleware: application-specific info, NIC idle periods, video encoding info, display settings
• Operating system: residual power info, power API
• Architecture: architecture-specific settings and knobs (register file sizes, cache configuration)

21

Outline for the Rest of the Talk

• Examine energy optimization knobs at each abstraction level
• Examine how cross-layer coordination can reduce energy further
• Specifically, we will talk about:
  o Using reconfigurable caches
  o Adaptive DVS techniques
  o Network card shut-down by buffering video data
  o Reducing backlight by video enhancement

22

Hardware/Architectural Level Knobs

• Major sources of power consumption
  o Display (backlight)
  o Network interface
  o CPU, particularly the memory subsystem
• We will discuss two middleware/HW optimizations:
  o Quality-driven cache reconfiguration
  o Dynamic backlight adjustment

23

Quality-Driven Cache Reconfiguration (Hardware-Level Optimization)

• Why caches?
  o High relative power consumption (above 50%)
  o Influences external memory power
• Idea: reconfigure the data cache for specific video stream format requirements
  o Cache power knobs used: size, associativity
  o Goal: find the best configuration for each quality level
• Plus: combine with dynamic voltage scaling (DVS)
• Application: MPEG decoding
  o Frame decoding may take less than the frame delay
  o Slack time: θ = F_d - D (between deadline and end of computation)
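The slack definition above can be made concrete with a short sketch. It reads F_d as the frame delay (1/frame rate) and D as the time taken to decode one frame, which is an interpretation of the slide's symbols. It also assumes that decode time scales inversely with clock frequency, which ignores memory-bound effects. The function names are hypothetical.

```python
# Slack per frame and the clock-scaling headroom it implies.
# Assumptions: F_d is the frame delay (1 / frame rate), D is the decode time
# of one frame at full clock speed, and decode time scales as 1 / frequency.

def slack(frame_delay_s: float, decode_time_s: float) -> float:
    """theta = F_d - D: time left between end of decoding and the deadline."""
    return frame_delay_s - decode_time_s


def min_frequency_fraction(frame_delay_s: float, decode_time_s: float) -> float:
    """Smallest fraction of full clock speed that still meets the deadline."""
    if decode_time_s >= frame_delay_s:
        return 1.0  # no slack: the CPU must run at full speed
    return decode_time_s / frame_delay_s


frame_delay = 1 / 30   # 30 fps stream
decode_time = 0.020    # illustrative value: 20 ms per frame at full speed

print("slack (s):", slack(frame_delay, decode_time))
print("minimum clock fraction:", min_frequency_fraction(frame_delay, decode_time))
```

In this illustrative case the decoder finishes each frame about 13 ms early, so the clock could in principle run at about 60% of full speed. A cache configuration that shortens D increases the slack and therefore the room for DVS.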

24

Impact of Cache Parameters on Energy

• Profiled short (10 sec) video clips (quality: low to high) for all cache configurations. Parameters varied:
  o Size: 4 KB to 64 KB
  o Associativity: 1 to 32
• Energy savings: 10-15% (CPU + memory) over the 32x32 baseline
• Experimental setup
  o Wattch/SimpleScalar
  o Berkeley MPEG tools

[Figure: Results for the "Action" clip, high quality.]

• Observations
  o Associativity has the largest impact on energy
  o The best cache configuration reflects
    - internal storage requirements for different frame sizes
    - the decoding algorithm's internal organization (data sets)

25

Cache Configuration + DVS

• Interaction of DVS with cache configurations
  o Cache configurations with the largest frame decoding slack enable the largest DVS savings
• Results: up to 60% energy savings over the base configuration

Middleware Rule Base for Best Config at each Quality Level (rows ordered from high to low quality)

Video    Cache      Cache          Clock            Voltage  Original  Optimized  Savings
Quality  Size (KB)  Associativity  Frequency (MHz)  (V)      Energy    Energy
Q1       8          8              100              1        1.29      0.76       47.50%
Q2       8          8              100              1        1.09      0.64       47.80%
Q3       8          8              100              1        0.95      0.56       48.00%
Q4       32         2              66               0.9      0.54      0.26       57.60%
Q5       32         2              66               0.9      0.48      0.23       57.80%
Q6       32         2              33               0.9      0.42      0.2        58.00%
Q7       8          8              33               0.9      0.29      0.14       57.30%
Q8       8          8              33               0.9      0.24      0.11       57.50%

Base configuration: 400 MHz, 1.3 V, 32 KB, 32-way set associative

26

OS-Directed Power Management

• The OS has a global view of what is going on in the whole system
• Applications should communicate:
  o Quality of service, timing restrictions
• The OS decides how to configure the available knobs
  o Ex: processor frequency and voltage scaling

27

Power Aware Software Architecture (PASA)

• PA-API (Power Aware API)
  o Application/OS interface
  o Makes power-aware OS services available to the application writer
• PA-OSL (Power Aware Operating System Layer)
  o Implements modified OS services and active components such as a DPM manager
• PA-HAL (Power Aware Hardware Abstraction Layer)
  o OS/hardware interface
  o Makes power control knobs available to the OS programmer

[Figure: PASA stack. Elements shown: adaptable applications, middleware, Power Aware API, OS with PA OS services and local PM, Power Aware HAL alongside the OS HAL, hardware.]

28

Operating System Driven DVS

• Slow down the CPU based on workload and timing restrictions (slowdown factors f < 1)
• We model real-time task sets with periods equal to deadlines, scheduled using RMS (rate-monotonic scheduling)
• We implemented 4 variations of DVS with CPU shutdown:
  o Shutdown when idle: as soon as the CPU becomes idle, shut down the processor
  o Static slowdown factors: calculated offline, based on RM schedulability analysis (using the WCETs)
  o Dynamic slowdown: run-time slowdown factors are predicted based on a history of execution times
  o Adaptive slowdown: a third slowdown factor, adapted according to the number of deadlines missed in a previous window of executions
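To illustrate the static and adaptive variants, here is a minimal sketch, not the implementation used in this work. It assumes the static factor is derived from the classic Liu and Layland utilization bound for rate-monotonic scheduling, and that execution time scales inversely with clock speed. The task periods, the miss-ratio threshold and the step size are hypothetical. Only the WCETs (in microseconds) come from the task table in this talk.

```python
# Static and adaptive slowdown factors for DVS under rate-monotonic scheduling.
# Assumptions: periods equal deadlines, execution time scales as 1 / speed,
# and schedulability is checked with the Liu-Layland utilization bound
# (a sufficient, not necessary, test). Periods and tuning constants are made up.

def rm_utilization_bound(n_tasks: int) -> float:
    """Liu-Layland bound n * (2^(1/n) - 1) for n periodic tasks."""
    return n_tasks * (2 ** (1 / n_tasks) - 1)


def static_slowdown(tasks: list[tuple[float, float]]) -> float:
    """Smallest speed fraction (<= 1) that keeps the scaled utilization within
    the bound. tasks holds (wcet, period) pairs measured at full speed."""
    utilization = sum(wcet / period for wcet, period in tasks)
    return min(1.0, utilization / rm_utilization_bound(len(tasks)))


def adapt_slowdown(speed: float, misses: int, window: int,
                   max_miss_ratio: float = 0.05, step: float = 0.05,
                   floor: float = 0.1) -> float:
    """Speed up after too many deadline misses in the last window of
    executions, otherwise slow down a little further."""
    if misses / window > max_miss_ratio:
        return min(1.0, speed + step)
    return max(floor, speed - step)


# WCETs in microseconds for T1 (MPEG2), T3 (ADPCM), T4 (FFT); periods assumed.
taskset_a = [(30_700, 100_000), (9_300, 50_000), (15_900, 80_000)]

speed = static_slowdown(taskset_a)
print("static speed fraction:", round(speed, 3))
print("after a window with 2 misses in 10:", round(adapt_slowdown(speed, 2, 10), 3))
print("after a window with 0 misses in 10:", round(adapt_slowdown(speed, 0, 10), 3))
```

A returned value of 1.0 means the task set already exceeds the bound at full speed, so this test alone cannot guarantee schedulability. The adaptive rule shows the trade-off between energy and missed deadlines: each slowdown step saves energy until misses push the speed back up.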

29

Implementation

• We modified the eCos real-time operating system running on an XScale platform (80200EVB) with dynamic frequency and voltage scaling hardware.
• For the DVS techniques, we implemented real task sets to validate the software implementation:
  o MPEG decoding, ADPCM and FFT

30

Task  Application  WCET (us)  Std Dev (us)
T1    MPEG2        30700      3100
T2    MPEG2        26300      2100
T3    ADPCM        9300       3300
T4    FFT          15900      0
T5    FFT          13600      800

Task Set
A: T1, T3, T4
B: T2, T3, T4
C: T1, T3, T5

[Figure: Energy consumption for each scheme. Y-axis: ratio of energy consumption between Normal and Scheme (0 to 1). X-axis: scheme. Series: Taskset A, Taskset B, Taskset C.]

Observations
• Adaptive slowdown achieves about 30-40% savings
• However, deadline misses increase (not shown here)
• OS/middleware have to trade off deadline misses against energy savings/slowdown factors

31

Middleware Controlled Network Data Buffering

• Wireless NICs (NIC = Network Interface Card) consume significantly less energy in sleep mode
  o Avg. power consumption in sleep mode = 0.184 W, whereas idle and receive modes consume 1.34 W and 1.435 W respectively
• Transmitting video data in bursts can help save power
  o The NIC on the device can be transitioned into sleep mode
• The middleware on the proxy is used to buffer video data and transmit it in bursts to the device.
• Additionally, based on the residual energy feedback from the device, the middleware can transcode the video stream based on a Quality/Power Matrix.
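A back-of-the-envelope model shows why bursty delivery helps. The sketch below uses the three power figures quoted above. It assumes an idealized NIC with no transition cost or wake-up latency, and the fraction of time spent receiving is a made-up illustrative value.

```python
# Average NIC power with and without proxy-side buffering.
# Power figures (watts) are the ones quoted on the slide. The model is
# idealized: it ignores the energy and latency of sleep/wake transitions.

P_SLEEP_W = 0.184
P_IDLE_W = 1.34
P_RECEIVE_W = 1.435


def avg_power_always_on(receive_fraction: float) -> float:
    """NIC idles between packets (stream delivered at a steady rate)."""
    return receive_fraction * P_RECEIVE_W + (1 - receive_fraction) * P_IDLE_W


def avg_power_bursty(receive_fraction: float) -> float:
    """NIC sleeps between bursts (proxy buffers and sends in bursts)."""
    return receive_fraction * P_RECEIVE_W + (1 - receive_fraction) * P_SLEEP_W


receive_fraction = 0.2  # hypothetical: NIC is actively receiving 20% of the time

always_on = avg_power_always_on(receive_fraction)
bursty = avg_power_bursty(receive_fraction)
print("always on (W):", round(always_on, 4))
print("bursty    (W):", round(bursty, 4))
print("relative saving:", round(1 - bursty / always_on, 3))
```

A higher-quality stream raises the receive fraction, which shrinks the time available for sleeping and therefore the saving.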

32

[Figure: Power gains using buffering for various noise levels. Y-axis: avg. power saved (%). Series: N=1, N=3, N=5. Video quality levels: Q1 to Q8.]

• Power savings decrease as video quality increases
  o The amount of data buffering possible is less at higher quality
• This is an ideal model: in practice, network noise will mean that the network interface has to be left on for longer periods of time

33

Reducing Backlight for Lower Power

• Identify "Groups of Scenes" with little variance in luminosity
• Increase pixel luminance and reduce backlight level
• To avoid loss of contrast (due to pixel luminance saturation)
  o Perform spatial convolution using a high-pass filter
  o This sharpens objects in the image

Backlight Mode   Power Consumed (in Watts)
Super Bright     2.80
High Bright      2.51
Medium Bright    2.32
Low Bright       2.16
Power Save       1.72

Power consumed at various backlight levels during streaming multimedia playback on the Compaq iPAQ
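The idea of trading backlight for pixel luminance can be sketched as follows. This is a simplified illustration, not the enhancement algorithm used in the experiments. It assumes perceived brightness is roughly proportional to backlight level times pixel luminance, works on a flat list of 8-bit luminance values, and reports how many pixels saturate, which is the contrast loss the slide warns about. The function names and the sample frame are hypothetical; the power figures come from the table above.

```python
# Luminance compensation for a dimmed backlight (simplified linear model).
# Assumption: perceived brightness ~ backlight_scale * pixel luminance, so
# dimming the backlight to a fraction b is offset by scaling pixels by 1 / b.

BACKLIGHT_POWER_W = {
    "Super Bright": 2.80,
    "High Bright": 2.51,
    "Medium Bright": 2.32,
    "Low Bright": 2.16,
    "Power Save": 1.72,
}


def compensate(luma: list[int], backlight_scale: float) -> tuple[list[int], float]:
    """Scale 8-bit luminance by 1 / backlight_scale, clipping at 255.
    Returns the new values and the fraction of pixels that saturated."""
    if not 0 < backlight_scale <= 1:
        raise ValueError("backlight_scale must be in (0, 1]")
    scaled = [y / backlight_scale for y in luma]
    saturated = sum(1 for y in scaled if y > 255) / len(luma)
    return [min(255, round(y)) for y in scaled], saturated


def power_saved_w(from_mode: str, to_mode: str) -> float:
    """Difference in measured device power between two backlight modes."""
    return BACKLIGHT_POWER_W[from_mode] - BACKLIGHT_POWER_W[to_mode]


frame = [40, 100, 200, 250]  # made-up luminance samples
enhanced, clipped = compensate(frame, 0.8)
print("enhanced frame:", enhanced)
print("fraction saturated:", clipped)
print("saving, Super Bright -> Medium Bright (W):",
      round(power_saved_w("Super Bright", "Medium Bright"), 2))
```

Dark scenes leave more headroom before pixels clip, which is why groups of scenes with low, steady luminosity are good candidates for backlight reduction.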

34

Video Streams used for Experiments

MPEG Video    Resolution  FPS  Duration (sec)  Luminosity Variation  Video Type
bipolar.mpg   320 x 240   30   41              Little                Dark, 3D animation
iceegg.mpg    240 x 136   30   59              Moderate              Bright, 3D animation
intro.mpg     160 x 120   30   59              Very High             Flashy, TV show clip
simpsons.mpg  192 x 144   30   27              High                  Colorful, 2D animation

Characteristics of video streams used in experiment

[Figure: Snapshots of MPEG-1 video streams used in experiments: bipolar.mpg, iceegg.mpg, intro.mpg, simpsons.mpg.]

35

Three Backlight Compensation Approaches

• SBC: Simple Backlight Compensation
  o Only identify GOS (groups of scenes), reduce backlight on handheld
  o No video stream contrast enhancement
• CBVLC: Constant Backlight with Video Luminosity Compensation
  o Backlight level set once at start of video stream
  o Video stream is enhanced (dynamically at the proxy)
• DCA: Dual Compensation Approach
  o Backlight level is dynamically changed based on GOS
  o Video stream is enhanced based on the backlight level decision

36

Results for Backlight Compensation

[Figure: Power saving (in mWatts) of CBVLC, SBC and DCA for the iceegg, simpsons, intro and bipolar streams, with panels for the Super Bright, High Bright and Medium Bright backlight modes.]

Backlight Mode   Power Consumed (in Watts)
Super Bright     2.80
High Bright      2.51
Medium Bright    2.32
Low Bright       2.16
Power Save       1.72

37

Summary

• We explored ways to reduce power by integrating power optimization techniques across abstraction layers
  o HW/OS/Middleware: cache reconfiguration, DVS, backlight reduction
  o OS/Application: Power Aware API for DVS
  o Middleware/Network: NIC shutdown using data buffering
• Conclusion: a cross-layer coordinated strategy is required for maximum energy savings
  o Information available at different abstraction levels can be used by either the OS or the middleware to make global decisions

38

Ongoing Work

• Exploit the repetitive and cyclic characteristics of MPEG-2, MPEG-4/H.263
  o Application and data profiling possible for reducing energy consumption
• Energy characterization of security and digital media protection algorithms
  o Security and IP protection of multimedia content have spawned a range of security measures
  o First step: we analyzed the effects of watermarking on energy and computation time on PDAs
• Task partitioning between proxy and handheld for reducing total energy (= computation + communication)
  o For video streaming, video conferencing, watermarking

39

Talk Outline

• Overview: Distributed Wireless Multimedia
• Case Study: The FORGE Framework
• IP Reuse at Philips

Blowing away the Barriers to Large Scale IP Reuse

Ralph von Vignau

5 January 2004

DATE Conference 2004

Paris, La Defense

41

Philips and IP Reuse

• Philips Semiconductors is a leading SoC developer
• A reuse structure and policy for IP has been systematically introduced into the development environment.
• There are rules and tools to support the reuse
  o CoReUse for HW
  o MoReUse for SW

42

Philips and IP Reuse: Background - 1

• Philips Semiconductors has a strategy of developing products based on System Silicon Platforms (SSPs).
  o Chameleon (MIPS subsystem generator)
  o ChipBuilder (ARM-based system generator)
• This demonstrates the value of automatic methods of integrating IP blocks into a subsystem along with its verification environment.

43

Philips and IP Reuse: Background - 2

“Need a generic framework that enables platform developers to implement their system in a consistent, flexible and easy-to-use way”

• Combining automatic methods of integrating configurable IP blocks together with their verification environment

44

Lessons Learned: Factors that enable successful IP reuse

• A centrally driven and supported company policy
• Wide deployment with consultancy
• Central repository
• High quality that can be trusted
• Ease of use
• Good documentation
• Central support
• Distributed champions
• Visible improvements and successes

45

The Limits of the Current Policies

• A standard set of views is provided for each IP block
  - Ensures compatibility with the development flows
  - Supports easier integration
  - Ensures a minimum of documentation is available
  - Is supported by checking tools
• However:
  - Verification reuse is not yet included
  - The checking is done by in-house tools
  - The rules only apply to in-house IP
• A far more radical change is required to move to the next level of reuse methodology.
  - Higher automation
  - Faster integration and verification
  - Higher quality
  - Flexibility in design flows

46

Requirements for the Next Level of Reuse

• Extend reuse both within Philips as well as to the IP bought for use within Philips
  - The use of IP from multiple vendors must be made easier and less costly
• Tools from various EDA vendors must be easier to integrate into a design flow
• The verification of IP must be more:
  - Comprehensive, stretching from unit tests to system verification
  - Reusable in all stages of the SoC development
• A higher automation in the development flow must be supported
  - Automated IP integration
  - Verification suite compilation

47

Supportive Standardization

• Although there are several activities and working groups throughout the industry and standardization groups, none has the industry focus or time drive set by the SPIRIT Consortium
• Only if there is an industry drive to common standards for the reuse of IP can major improvements be achieved

WWW.spiritconsortium.com

48

The SPIRIT Consortium

• SPIRIT: Structure for Packaging, Integrating and Re-using IP within Tool-flows
• A consortium of leading companies in the EDA, IP, system and semiconductor industries
• Aim
  - To develop industry standards
  - Ease integration of semiconductor IP into Systems
  - Enable the interoperability of tools for IP integration

49

Reason for the SPIRIT Consortium

• Industry demands
  - Complex System-on-Chip and Programmable Platforms require IP re-use
  - Device manufacturers need to be able to select IP from multiple sources
  - Unifying IP descriptions and access to this information permits best-in-class choices for both IP and tools

50

SPIRIT Consortium Background

• The founding companies decided mutually to establish a unified set of standards to increase efficiency of IP based SoC design
• Combining technological strengths of SPIRIT members to
  - Create standards that will help express complex IP
  - Deliver greater flexibility and efficiency to the SoC design process

51

Consortium Goals

• Develop standards to facilitate IP re-use
  - Structure for configurable IP design
  - Separating core functionality from associated parameters
  - Defining standard interfaces for tools
• Enable more efficient and cost-effective integration of IP from multiple sources
• Test the proposed standards within multiple live projects
  - Providing proof-of-concept
• Transfer proven standards to an international standards body

52

Future-world for designers

SPIRIT-enabled IP will facilitate new levels of design integration & automation across a wide range of IP, tools and vendors:

• IP providers will ship IP with a machine-readable XML 'data-book'
  - Designers do not have to study data books to use IP in a System design.
  - IP will be automatically configured and integrated into designs.
• The same design information will be used to generate varied system information
  - Simulation models, Documentation, System APIs, Tool Configurations, SW applications
• New specialist design applications will emerge to process IP information
  - FPGA prototype generators
  - HW/SW optimizers/re-mapping tools
  - Automatic OS porting
  - Bus Generators optimized for power/bandwidth etc.

For the first time, it will become realistic to reuse IP directly in a System Design
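To make the machine-readable 'data-book' idea concrete, here is a minimal sketch of a tool reading an IP description from XML. The element and attribute names (`component`, `parameter`, `busInterface`) and the `uart0` example are hypothetical placeholders chosen for illustration; they are not taken from the SPIRIT schema, which the slides do not show.

```python
import xml.etree.ElementTree as ET

# Hypothetical data-book for one IP block (NOT the real SPIRIT schema).
DATABOOK = """
<component name="uart0" vendor="example.com">
  <parameter name="FIFO_DEPTH" value="16"/>
  <parameter name="BAUD_DIV_WIDTH" value="12"/>
  <busInterface name="apb_slave" protocol="APB"/>
</component>
"""


def load_component(xml_text):
    """Parse a data-book into a plain dict an integration tool could use."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.get("name"),
        # Configuration parameters are kept separate from the core itself.
        "parameters": {
            p.get("name"): int(p.get("value")) for p in root.findall("parameter")
        },
        "interfaces": [
            (b.get("name"), b.get("protocol")) for b in root.findall("busInterface")
        ],
    }


info = load_component(DATABOOK)
print(info)
```

A generator for documentation, simulation models, or address maps could consume the same dictionary, which is the "same design information, varied outputs" point made on the slide.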

53

54

The Philips Utilization of the SPIRIT Standards

• Philips has been gaining experience with automated IP integration:
  - A Philips in-house tool, Chip Builder, is an excellent example of the technology
  - Uses architecture templates
  - IP generators
  - Interconnect generators
  - Automated clock insertion and DfT
  - Automated pad and ring insertion
• Using the SPIRIT standards, Philips intends to use third party tools to realize an optimized new generation

55

The Nx-Builder development environment

• Philips will integrate a selected number of tools together to form a highly automated SoC design flow
• The Nx-Builder development environment will support the 3 main phases in the development of SoCs:
  - The architecture exploration and definition of templates
  - The integration & verification of IP
  - The synthesis and chip design steps
• Nx-Builder will provide a highly flexible platform and product development environment

56

Nx-Builder Goals

• Aim is to move to next level of abstraction in SoC development
  - HW & SW IP, Subsystems and platforms, SoC
• Encapsulates architectural rules and IP in an abstract form
• Provides basis for derivatives
  - Encapsulated system can be deployed to derivative development teams

[Figure: System Silicon Platform. Labels: Microcontroller Subsystem, CPU, Application testbench, Standard IP, Standard Cell, Subsystems, SoC]

57

Nx-Builder, its place in the IC design flow: Upstream

[Figure: upstream design flow. Stages: Architecture exploration, System Definition (IP & System development using SystemC), SW Environment, Co-Simulation, SystemC Simulations. Annotations: Identification of new IP; Decision on IP reuse; Specifications of new IP; Optimization of Performance, IP Reuse, Development of new IP; SystemC, Verilog, VHDL; Verification Software, Test Suites, Drivers; Access to data base of SystemC models]

58

Nx-Builder, its place in the IC design flow: IP Integration

[Figure: IP integration flow (GUI using generators for automation). Stages: IP Select, Chip Configuration, Extract, Build SW, Compile, Simulate. Annotations: Search in IPYP; GUI Entry or Configuration file, Configurable Blocks, I/Os; Feedback on Gates, Address Maps, Block Diagram; Database generation, Extract all IP from data bases; Build Verification Software; Chip Model, System Model, Test Bench, Simulations]

59

Nx-Builder, its place in the IC design flow: Downstream

[Figure: downstream flow driven by Make/Automation scripts. Stages: Simulate, Synthesis, Timing, Place & Route, Product Verification. Annotations: Chip Model, System Model, Test Bench, Simulations, Prototyping; Scripts for Industry Standard Tools]

60

The Major Focus

• Nx-Builder will focus on
  - Reuse of verification suites at all stages of the development flow
  - Support of verification for SystemC simulations, in prototyping systems and on the integrated IP
• All IP will have several standard views:
  - A SystemC model
  - An FPGA view for prototyping
  - A verification suite view
  - RTL, Verilog and/or VHDL
  - A metadata description package

61

Platform Example

[Figure: platform block diagram. Labels: ARM9, ARM7TDMI, Multi-layer AHB, AHB bridge to VPB1, AHB bridge to VPB2, SDRAM controller, Camera intf., PCMCIA, MPEG-4, DMA, ISROM, ISRAM, CGU, PLL (x2), Clocks, VDD always, External int. ctl, vectored interrupt ctl, UART, JTAG TAP, CTAG TCB, IO conf, sys_creg, New IP 1, New IP 2]

62

Summary

• Philips is committed to the planned Reuse of IP and Verification Suites
• Philips will exploit the SPIRIT standards to achieve the next step in Reuse technology
• Philips believes the changes induced in the EDA and IP provider scene through SPIRIT will have positive effects on the electronic industry as a whole

63

64

Layered Model for QoS

[Figure: layered QoS model. Layers: User (Perceptual QoS); Application (Application QoS); System, i.e. Operating and Communication System (System QoS); Network (Network QoS) and MM devices (Device QoS)]

[Figure: Application QoS Parameter Examples. Labels: Application QoS; Media Quality; Media Relations; Media Characteristics; Intraframe; Interframe; Component Spec; Name; Size; Rate; Importance; Loss Rate; Transmission Characteristics; Sample Size; Sample Rate; Compression; End-to-end Delay; Sample Loss Rate; Cost; Synchronization Skew; Integration; Communication; Conversion]

65

QoS Classes

• QoS classes can determine
  - Reliability of offered services
  - Utilization of resources
• Guaranteed Service Class
  - Deterministic/Statistical guarantees.
• Predictive Service Class
  - QoS parameters based on past behavior
• Best-Effort Service Class
  - Only partial guarantees based on resource availability
  - QoS parameters are specified with only minimal/no bound

66

What’s New in the Context of Wireless Systems

• Earlier optimization metric was Bandwidth
  - MPEG is a video compression standard
• For mobile, wireless devices: Energy is a severely limited resource
  - How can we optimize MPEG encoding/decoding to reduce energy?
• Traditionally, DSPs and ASICs have been used to execute Multimedia applications
• Mobile handhelds, laptops etc. use general purpose processors

67

Proxy Based Middleware Approach

[Figure: proxy-based optimization. Low-power device stack: User/Application, Distributed Middleware, Operating System, Architecture, with Power Management. Network infrastructure: Wide Area Network, Proxy, Wireless Network, Low-power mobile device. Proxy tasks: Execute Remote Tasks, Caching, Compress, Encryption, Decryption, Compositing, Transcode]

68

Energy-Sensitive Video Transcoding

► We conducted a survey to subjectively assess human perception of video quality on handhelds.
► Hard to programmatically identify video quality parameters
► We identified 8 perceptible video quality levels that produced noticeable difference in power consumption (Compaq iPaq 3600)

Quality/Power Matrix for COMPAQ IPAQ 3600 (Grand Theft Auto Action Video Sequence)

QUALITY            | Video transformation parameters | Avg. Power (Windows CE) | Avg. Power (Linux)
Q1 (Like original) | SIF, 30fps, 650Kbps             | 4.42 W                  | 6.07 W
Q2 (Excellent)     | SIF, 25fps, 450Kbps             | 4.37 W                  | 5.99 W
Q3 (Very Good)     | SIF, 25fps, 350Kbps             | 4.31 W                  | 5.86 W
Q4 (Good)          | HSIF, 24fps, 350Kbps            | 4.24 W                  | 5.81 W
Q5 (Fair)          | HSIF, 24fps, 200Kbps            | 4.15 W                  | 5.73 W
Q6 (Poor)          | HSIF, 24fps, 150Kbps            | 4.06 W                  | 5.63 W
Q7 (Bad)           | QSIF, 20fps, 150Kbps            | 3.95 W                  | 5.5 W
Q8 (Terrible)      | QSIF, 20fps, 100Kbps            | 3.88 W                  | 5.38 W
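One way a proxy might use the quality/power matrix is to pick the best quality level whose average power fits a budget. The sketch below uses the Windows CE figures from the matrix; the selection policy, the budget values, and the name `pick_quality` are assumptions for illustration and are not described on the slide.

```python
# (quality level, average power in watts under Windows CE), best quality first.
LEVELS = [
    ("Q1", 4.42), ("Q2", 4.37), ("Q3", 4.31), ("Q4", 4.24),
    ("Q5", 4.15), ("Q6", 4.06), ("Q7", 3.95), ("Q8", 3.88),
]


def pick_quality(budget_w, levels=LEVELS):
    """Return the highest quality level whose power is within budget_w.

    Returns None when even the lowest quality exceeds the budget.
    """
    for name, power_w in levels:  # levels are ordered best quality first
        if power_w <= budget_w:
            return name
    return None


print(pick_quality(4.20))
print(pick_quality(3.00))
```

Because power falls monotonically from Q1 to Q8 in the matrix, the first level that fits the budget is also the best one that fits.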

69

Experimental Setup

• Power measurements:
  - IPAQ 3650 + Cisco 350 Aironet wireless card
  - 206 MHz Intel StrongARM, 16MB ROM, 32MB RAM
• Simulation
  - Wattch / SimpleScalar for ARM
• MPEG decoder: Berkeley MPEG tools
• Transcoder: TMPGEnc
• Video clips
  - High action (e.g. GTA)
  - Medium action (sport)
  - Low action (news)

[Figure: measurement setup. Labels: Proxy, AP, 802.11b Wireless, External Voltage Supply (5V), R = .22 ohm, VR, ViPAQ, BNC-2110 connector, Cable, DAQ, Serial connection, Power measurement system (Windows XP, 650 MHz)]
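The setup figure shows a 0.22 ohm resistor and two measured voltages (VR and ViPAQ). Assuming the resistor sits in series with the iPAQ supply and the DAQ samples both voltages, device power follows from Ohm's law: current is VR / R and power is that current times ViPAQ. The sketch below is a hedged illustration of that arithmetic; the function name and the sample values are hypothetical, and the slides do not state how the authors post-processed their samples.

```python
def average_power(v_r_samples, v_device_samples, r_ohm=0.22):
    """Average device power (watts) from paired voltage samples.

    v_r_samples: voltage drop across the series sense resistor (volts).
    v_device_samples: voltage across the device at the same instants (volts).
    """
    if not v_r_samples or len(v_r_samples) != len(v_device_samples):
        raise ValueError("need equally many, non-empty sample lists")
    powers = [
        (v_r / r_ohm) * v_dev  # I = V_R / R, then P = I * V_device
        for v_r, v_dev in zip(v_r_samples, v_device_samples)
    ]
    return sum(powers) / len(powers)


# Hypothetical samples: 1.0 A and 0.5 A drawn at 5 V.
avg = average_power([0.22, 0.11], [5.0, 5.0])
print(avg)
```

A small sense resistor keeps the voltage drop, and therefore the disturbance to the device supply, low while still giving a measurable signal.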