Optimization Of Power Consumption For An ARM7-Based Multimedia Handheld Device

Hoseok Chang; Wonchul Lee; Wonyong Sung

Circuits and Systems, 2003. ISCAS '03. Proceedings of the 2003 International Symposium on, Volume: 5, 25-28 May 2003, Pages: V-105 - V-108 vol.5

Presenter: Chin-Chi Hu



112/04/18 Chin-Chi Hu 2/20

Abstract

We have developed a multimedia handheld educational device and optimized its current consumption, not only by employing several software optimization techniques but also by using a dynamic clock frequency scaling (DFS) scheme. Although the ARM7 CPU employed does not support operating voltage scaling, controlling the operating frequency helps reduce the current consumed during idle time and yields up to 25% power reduction at the system level. The CPU operating frequency is determined by profiling the multimedia program components, which include LZW (Lempel-Ziv-Welch) image decompression, MP3 audio decoding, CELP-based speech decoding, speech recognition and ADPCM. In particular, it is shown that the time for LZW decompression is proportional to the image size rather than to the size of the compressed file. After applying DFS, the CPU load becomes almost full, between 80 and 95%.


What’s the problem?

- Multi-tasking operating system and dynamic frequency scaling
  - analysis of the current consumption of the system
- Software optimization techniques
  - improve the software to reduce the number of instructions and clock cycles
- CPU load estimation
  - the CPU load for executing each software component
- Results and optimization


Introduction

- A low-power multimedia handheld device
  - powered by only two AA-size batteries
- The DSP programs needed to be optimized
  - MP3 decoding
  - LZW (Lempel-Ziv-Welch) decompression
  - speech recognition
- Aspects
  - ARM7-specific features
  - optimization of software components
  - lowering the CPU clock frequency to minimize the idle time


System architecture

Speaking Partner

- ARM7TDMI 60MHz CPU
- 8KB cache
- graphic LCD controller
- synchronous DRAM controller
- IIS interface
- 8 channels of 10-bit ADC
- 128KB NOR flash for system ROM
- NAND flash and SMC (smart media card) for program ROM
- SSFDC (solid state floppy disk card) and USB for read/write


System architecture

[Figure: Speaking Partner]


Current consumption

- The CPU drains some power even when the CPU load is very small and the CPU is mostly in the idle state
- It is advantageous for power reduction to use the lowest possible clock frequency
- The minimum clock frequency for a real-time implementation therefore needs to be estimated


Current consumption

The figure shows that the dynamic frequency scaling scheme is more efficient than constant-frequency operation with an idle state when the load is low


Current consumption

Current consumption at each hardware block (CPU load is 10%)


Software optimization

- The ARM7TDMI processor has characteristics that suit the implementation of DSP algorithms
  - a large number of registers
  - most of the instructions can be executed conditionally
  - a 32-bit barrel shifter
  - block load and store instructions are supported
- The ARM7TDMI processor has a relatively simple data path, where the hardware multiplier only has a width of 32*8 bits


Software optimization

MP3 decoding algorithm

- C language based high-level optimization
- assembly language based low-level optimization
- optimized by the conditional execution of the ARM7TDMI processor


Software optimization

Block data transfer

- Used to load (LDM) or store (STM) any subset of the currently visible registers from/to sequential memory
- Block data transfer of 15 32-bit registers from registers to sequential memory
  - 14S+2N+1I cycles
- Transfer of the same registers to memory using the store instruction (STR)
  - (1S+1N+1I)*15 cycles

S: sequential cycles, N: non-sequential cycles, I: internal cycles
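The two cycle counts above can be compared with a short calculation. This is only a sketch: it assumes that every S, N and I cycle costs exactly one clock, which ignores memory wait states, and the helper name `total_cycles` is hypothetical.

```python
def total_cycles(s: int, n: int, i: int) -> int:
    """Sum the cycles of one transfer.

    Assumption: each sequential (S), non-sequential (N) and internal (I)
    cycle takes one clock; real memory may add wait states to S and N.
    """
    return s + n + i


REGISTERS = 15

# One block transfer of 15 registers: 14S + 2N + 1I (figure from the slide)
block_transfer = total_cycles(s=14, n=2, i=1)

# Fifteen single stores: (1S + 1N + 1I) * 15 (figure from the slide)
single_stores = total_cycles(s=1, n=1, i=1) * REGISTERS

print(block_transfer, single_stores)
```

With one clock per cycle, the block transfer needs 17 cycles against 45 for the individual stores.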


Software optimization

Optimization for speech recognition

- 16-bit multiplications instead of 32-bit multiplications
  - 8% reduction in cycle time
- Several software optimization techniques were employed
  - loop fusion
  - loop unrolling
  - post-increment/decrement conversion
  - total execution time is reduced to about 30~45%

112/04/18 Chin-Chi Hu 15/20

CPU load estimation

- The load for MP3 decoding depends on the bit rate and the sampling clock frequency
- CPU load at 60MHz
  - 56kbps, 22.05kHz: 10%
  - 32kbps, 22.05kHz: 9.6%
  - 32kbps, 16kHz: 7%
- The load for CELP decoding is almost constant
  - 18% of the 60MHz CPU load
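The load figures above can be turned into a clock-frequency choice roughly as follows. This is a sketch under stated assumptions, not the authors' scheduler: it assumes that the cycles a task needs do not change with the clock frequency, that the loads of concurrent tasks simply add up, and that the CPU offers the hypothetical clock steps in `AVAILABLE_MHZ`. Only the load percentages and the 60MHz reference come from the slides.

```python
REFERENCE_MHZ = 60.0

# Measured loads at 60MHz, taken from the slide
LOAD_AT_60MHZ = {
    "mp3_56kbps_22.05kHz": 0.10,
    "mp3_32kbps_22.05kHz": 0.096,
    "mp3_32kbps_16kHz": 0.07,
    "celp": 0.18,
}

# Hypothetical clock steps; the real device's PLL settings are not given
AVAILABLE_MHZ = (10.0, 15.0, 20.0, 30.0, 60.0)


def required_mhz(tasks):
    """MHz of processing the tasks need, assuming loads add linearly."""
    return sum(LOAD_AT_60MHZ[t] * REFERENCE_MHZ for t in tasks)


def pick_clock(tasks, max_load=0.95):
    """Return the lowest available clock that keeps the load <= max_load."""
    needed = required_mhz(tasks)
    for mhz in sorted(AVAILABLE_MHZ):
        if needed / mhz <= max_load:
            return mhz, needed / mhz
    raise ValueError("tasks do not fit even at the highest clock")


clock, load = pick_clock(["mp3_56kbps_22.05kHz", "celp"])
print(f"run at {clock:.0f}MHz, expected load {load:.0%}")
```

In this example the two tasks need 16.8MHz of processing, so the 20MHz step is selected and the expected load is 84%, inside the 80 to 95% range quoted in the abstract.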

112/04/18 Chin-Chi Hu 16/20

CPU load estimation

Processing time of LZW according to the number of pixels

Processing time of LZW according to the compressed data size
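One way to see why LZW decoding time can follow the pixel count rather than the compressed size is to count the bytes the decoder has to write. The sketch below uses a textbook byte-oriented LZW with an unbounded dictionary and integer codes; this is an assumption for illustration and is not the image format or implementation used on the device. The names `lzw_compress` and `lzw_decompress` are hypothetical.

```python
import random
from typing import List, Tuple


def lzw_compress(data: bytes) -> List[int]:
    """Textbook LZW encoder producing a list of integer codes."""
    table = {bytes([i]): i for i in range(256)}
    codes = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = len(table)
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: List[int]) -> Tuple[bytes, int]:
    """Decode and also return how many output bytes were written."""
    table = [bytes([i]) for i in range(256)]
    out = bytearray()
    writes = 0
    previous = b""
    for code in codes:
        if code < len(table):
            entry = table[code]
        elif code == len(table) and previous:
            entry = previous + previous[:1]  # code defined by this very step
        else:
            raise ValueError("invalid LZW code")
        out += entry
        writes += len(entry)                 # one write per decoded pixel
        if previous:
            table.append(previous + entry[:1])
        previous = entry
    return bytes(out), writes


PIXELS = 4096
flat = bytes(PIXELS)                          # uniform image, compresses well
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(PIXELS))

for name, image in (("flat", flat), ("noisy", noisy)):
    codes = lzw_compress(image)
    decoded, writes = lzw_decompress(codes)
    print(name, "codes:", len(codes), "byte writes:", writes)
```

Both images have the same number of pixels, so the decoder performs the same number of byte writes even though the compressed inputs differ greatly in length. Whether those writes dominate the run time on the real hardware is what the measurements on this slide report; the sketch does not establish it.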

112/04/18 Chin-Chi Hu 17/20

CPU load estimation

Execution time prediction of each software component

112/04/18 Chin-Chi Hu 18/20

Experimental result

478mA (optimized current) / 542mA (original current) = 88.2%
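The ratio above, and the saving it implies, can be checked with a few lines. The sketch assumes the supply voltage is the same in both measurements, so that the current ratio also stands for the power ratio; the slide does not state this.

```python
optimized_ma = 478  # measured current after optimization (from the slide)
original_ma = 542   # measured current before optimization (from the slide)

ratio = optimized_ma / original_ma
saving = 1 - ratio  # fraction of the original current that is saved

print(f"{ratio:.1%} of the original current, {saving:.1%} saved")
```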

112/04/18 Chin-Chi Hu 19/20

Experimental result

- The clock frequency of the CPU is not changed, which would be a more aggressive power optimization approach but would pay the delay for PLL relocking

112/04/18 Chin-Chi Hu 20/20

Conclusion

- A dynamic frequency scaling scheme is employed to reduce the CPU power consumption; it shows that a 20% system power saving can be achieved
- The power analysis shows that the current consumed by the DRAM is almost equal to that of the CPU core, which means that reducing cache misses is most important for lowering power consumption
- The current can be further reduced without any significant change in the power reduction algorithm
  - employ a CPU that supports dynamic voltage scaling (Intel’s XScale)