Hardware/Software Integration in Portable Systems
Trevor Pering, University of California Berkeley
Outline
This talk describes several research projects over the last six years that have relied heavily on integrated hardware/software design.
• Background: The InfoPad Project
• Energy Efficient Microprocessors
• System Design Environment
InfoPad Overview
Perform all computation in the network to minimize client energy dissipation.
Workstation capabilities on a portable device!
[Figure: system diagram. Labels: InfoPad terminals, High-bandwidth radio connection, Wireless Basestation, Compute Server, Centralized Application, Database, Internet.]
InfoPad Software Architecture
Communicate through a centralized server to provide transparent ‘wired’ semantics.
• Maintain state in the network, not on the Pad
• Transmit audio and raw bitmaps across the wireless link
• Example: hand-held speech-enabled web browser
[Figure: software architecture. Labels: InfoPad, Wireless Basestation, “PadServer”, Speech Recognizer, Web Browser, Internet.]
InfoPad Hardware Flexibility
Use hardware/software integration to provide energy-efficient high-level functionality.
• Embedded software responsible for high-level functions
• Main data-flow handled by custom low-power ASICs
• Only the header is sent to the microprocessor
• Entire packet routed to dedicated hardware
[Figure: receive path. Labels: Radio, RX Packet, Packet Header, 10 MIPS μProcessor (Control, Statistics, Reliability, Debugging), Frame-buffer update, Frame Buffer.]
InfoPad Evolution
High-level system design optimizes complete solution and drives new research.
Total Power: ~7 W
• DC/DC: 42%
• Wireless: 29%
• Misc: 11%
• LCD: 10%
• µProc.: 6%
• I/O: 2%
Where did the power go?
• No local computation?
• Commercial radios
• Commercial DC/DC
• Inefficient implementation
[Figure labels: InfoPad, Intercom, Energy-Efficient Processors.]
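To put the pie chart in absolute terms, the percentages can be multiplied by the total power. The following is a minimal sketch, assuming the shares apply to exactly 7 W (the slide only says "~7 W"); the function and variable names are illustrative.

```python
# Power shares from the InfoPad pie chart. The 7.0 W total is an
# approximation: the slide states "~7 W".
TOTAL_POWER_W = 7.0
SHARES_PERCENT = {
    "DC/DC": 42,
    "Wireless": 29,
    "Misc": 11,
    "LCD": 10,
    "uProc": 6,
    "I/O": 2,
}


def power_breakdown(total_w, shares_percent):
    """Return watts per component, largest consumer first."""
    watts = {name: total_w * pct / 100 for name, pct in shares_percent.items()}
    return dict(sorted(watts.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    for name, watts in power_breakdown(TOTAL_POWER_W, SHARES_PERCENT).items():
        print(f"{name:<9}{watts:5.2f} W")
```

Under this assumption the microprocessor accounts for roughly 0.4 W, while DC/DC conversion and the radio together take about 5 W, which is consistent with the slide singling out the commercial radio and DC/DC parts.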
Outline
The InfoPad Project:
• Energy-efficient integrated system design
Energy Efficient Microprocessors: Dynamic Voltage Scaling
System Design Environment
Dynamic Voltage Scaling (DVS)
Trade off energy and speed through voltage to minimize energy consumed.
• Energy/Op. vs. Voltage: $E \propto V^2$
• µProc. Speed vs. Voltage: $f_{max} \propto (V - c)/V$
• Energy/Op. vs. µProcessor Speed: $E \propto f_{max}$, so Energy ~ Work • Speed
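The proportionalities on this slide can be explored numerically. The sketch below is an illustration only: it treats energy per operation as proportional to V² and maximum speed as proportional to (V − c)/V, and the constant `c = 1.0` V and the function names are assumptions, not values from the talk.

```python
def energy_per_op(voltage):
    """Relative energy per operation, using E proportional to V^2."""
    return voltage ** 2


def max_speed(voltage, c=1.0):
    """Relative maximum clock speed, using f_max proportional to (V - c) / V.

    `c` is a hypothetical constant in volts; the model gives zero speed
    at or below it.
    """
    if voltage <= c:
        return 0.0
    return (voltage - c) / voltage


if __name__ == "__main__":
    v_max = 3.3
    for v in (3.3, 2.5, 1.8, 1.2):
        rel_energy = energy_per_op(v) / energy_per_op(v_max)
        rel_speed = max_speed(v) / max_speed(v_max)
        print(f"V={v:.1f}  speed={rel_speed:.2f}  energy/op={rel_energy:.2f}")
```

Lowering the voltage reduces both columns, which is the trade-off the slide describes: a slower processor, but less energy for each operation it performs.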
DVS vs. Fixed-Voltage
Reduce both speed and voltage to minimize both power and energy.
[Figure: Energy/Op. vs. µProcessor Speed for Fixed Voltage and Voltage Scaled operation, with 10x energy savings.]
DVS:
• Voltage: 3x
• Speed: 10x
• Energy: 10x
• Power: 100x
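The four ratios are mutually consistent if energy per operation scales with the square of the voltage and power is energy per operation times operations per second. A rough check, treating the slide's figures as rounded (the subscripts "hi" and "lo" for the two ends of the operating range are notation introduced here):

```latex
\begin{align*}
\frac{E_{\mathrm{hi}}}{E_{\mathrm{lo}}}
  &\approx \left(\frac{V_{\mathrm{hi}}}{V_{\mathrm{lo}}}\right)^{2} = 3^{2} = 9 \approx 10, \\
\frac{P_{\mathrm{hi}}}{P_{\mathrm{lo}}}
  &= \frac{E_{\mathrm{hi}}}{E_{\mathrm{lo}}} \cdot \frac{f_{\mathrm{hi}}}{f_{\mathrm{lo}}}
   \approx 10 \cdot 10 = 100 .
\end{align*}
```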
DVS Project Charter
Design a microprocessor system to support low-power devices.
• Scale voltage of entire microprocessor system!
• General-purpose software controls system voltage
• I/O operations independent of processor architecture
[Figure: system block diagram. Labels: lpARM, SRAM (x3), I/O, Dynamic Voltage Regulator, Intercom.]
DVS Scheduling Framework
Use a real-time framework to constrain task voltage scheduling.
• Idle time represents wasted energy
• Lower speed, lower voltage, lower energy
• Energy ~ Work • Speed
[Figure: µProc. Speed vs. Time, two panels, each showing a block of Work between a Start and a Deadline.]
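A small worked example of "Energy ~ Work • Speed": the same work either runs at full speed and then idles, or is stretched to finish exactly at the deadline. This sketch assumes idle time costs no energy and that speed can be set to any fraction of full speed; the function names and numbers are illustrative.

```python
def relative_energy(work, speed):
    """Energy ~ Work * Speed, with speed as a fraction of full speed."""
    return work * speed


def stretched_speed(work, start, deadline):
    """Lowest constant speed that finishes `work` by the deadline.

    `work` is measured in seconds of execution at full speed; the result
    is capped at 1.0 (full speed).
    """
    return min(1.0, work / (deadline - start))


if __name__ == "__main__":
    work, start, deadline = 0.3, 0.0, 1.0
    full = relative_energy(work, 1.0)
    slow = relative_energy(work, stretched_speed(work, start, deadline))
    print(f"full speed, then idle: {full:.2f}")
    print(f"stretched to deadline: {slow:.2f}")
```

With these numbers the stretched run needs 30% of the full-speed energy for the same work, since the speed drops to 0.3 and energy scales with it.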
DVS Scheduling
Schedule all tasks so as to minimize system energy dissipation
Similar to minimizing $\sum x_i^2$ with constant $\sum x_i$.
Task runs faster to meet timing constraints.
[Figure: µProc. Speed vs. Time. Start times S1, S2, S3; deadlines D2, D3, D1; work segments W1, W2, W3, W1.]
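The analogy can be made explicit. As an interpretation rather than something stated on the slide, let $x_i$ be the processor speed in the $i$-th of $n$ equal time slots, so that $\sum x_i$ is the total work and, with Energy ~ Work • Speed, $\sum x_i^2$ tracks the total energy. For a fixed total $C$:

```latex
\[
\sum_{i=1}^{n} x_i^{2}
  = \sum_{i=1}^{n} \left( x_i - \frac{C}{n} \right)^{2} + \frac{C^{2}}{n}
  \;\ge\; \frac{C^{2}}{n},
  \qquad \text{where } C = \sum_{i=1}^{n} x_i ,
\]
```

with equality exactly when every $x_i = C/n$. Under this reading, the lowest-energy schedule runs at a constant speed and only departs from it when a deadline forces a task to run faster.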
DVS Simulation
Simulate the run-time scheduler to fully understand voltage-scaling behavior.
• Task variance
• Weather
• Interrupts
• User input
• Cache behavior
• Scheduling overhead
[Figure: Speed vs. Time schedule (S1, S2, S3, D1, D3, D2). Labels: Theory, Implementation, Reality, Intercom.]
Simulation Benchmarks
Model accurate I/O interaction to evaluate effects of voltage scaling.
• Audio Decryption
• Graphical UI
• MPEG Decode
• Run-Time Support
[Figure: Frame Computation Histogram (0% to 100%) vs. Frame Execution Time for Audio, GUI, and MPEG. Other labels: Intercom, SPEC.]
Simulation Infrastructure
Develop a support environment to model the complete software system.
• Applications: MPEG, GUI
• Application support libraries: Windowing, Cryptography, I/O Support
• Run-time Scheduler
• Voltage Scheduler
• lpARM
Scheduler labels: Speed, Priority (MPEG Priority 80, GUI Priority 23).
Application code on the slide:
{
    Frame_Start(deadline);
    Decode_MPEG_Frame();
    Frame_Finish();
}
Simulation Run-Time Algorithm
Relax scheduling constraints to schedule efficiently in real-time.
• Schedule all tasks as if they were currently runnable: O(n log n)
• Speed = Work / Time
• Execute W1 because W2 is not yet runnable
[Figure: µProc. Speed vs. Time, measured from the present time. Start times S1, S2, S3; deadlines D2, D3, D1; work segments W2, W3, W1. A second complexity label reads $O(n^3)$.]
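One plausible reading of the relaxed algorithm is: pretend every task can run now, sort the tasks by deadline, and pick the lowest speed at which the accumulated work still fits before each deadline (Speed = Work / Time). The sketch below follows that reading and is not the talk's implementation; `Task`, `relaxed_speed`, and the units are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    work: float      # remaining work, in seconds of execution at full speed
    deadline: float  # absolute deadline, in seconds


def relaxed_speed(tasks, now):
    """Lowest constant speed (fraction of full speed) that meets every
    deadline if all tasks were runnable at `now`.

    Sorting by deadline dominates the cost, giving O(n log n).
    """
    speed = 0.0
    cumulative_work = 0.0
    for task in sorted(tasks, key=lambda t: t.deadline):
        cumulative_work += task.work
        time_left = task.deadline - now
        if time_left <= 0:
            return 1.0  # a deadline has already passed: run at full speed
        # Speed = Work / Time for everything due by this deadline.
        speed = max(speed, cumulative_work / time_left)
    return min(speed, 1.0)


if __name__ == "__main__":
    tasks = [
        Task("W1", work=0.2, deadline=1.0),
        Task("W2", work=0.1, deadline=0.4),
        Task("W3", work=0.1, deadline=0.8),
    ]
    print(f"speed: {relaxed_speed(tasks, now=0.0):.2f}")
```

The start times are deliberately ignored here. If the task with the earliest deadline is not yet runnable, the processor would execute whichever task is runnable at the computed speed, which matches the slide's note about executing W1 while W2 is not yet runnable.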
Run-Time Scheduling Dynamics
Periodically re-evaluate the schedule to adjust for unforeseen events.
• Initial speed estimate
• Thread accomplishing more than expected: reduce speed
• Deadline exceeded: increase speed
• Higher-priority task: run faster to make up lost time
• E(work): workload calculated to be the average of previous frames
[Figure: µProc. Speed vs. Time, comparing the run-time schedule with the optimal schedule.]
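The workload estimate can be sketched as a running average over recent frames, which then feeds Speed = Work / Time. This is an illustration under assumptions: the slide says only "average of previous frames", so the fixed-size window, the class name `FrameWorkEstimator`, and the helper `speed_for_frame` are hypothetical.

```python
from collections import deque


class FrameWorkEstimator:
    """Estimate the next frame's work as the mean of recent frames."""

    def __init__(self, window=8):
        # Only the most recent `window` frames are kept (an assumption).
        self.history = deque(maxlen=window)

    def record(self, work):
        """Store the measured work of a finished frame."""
        self.history.append(work)

    def expected_work(self, default=1.0):
        """Mean of the recorded frames, or `default` before any frame ran."""
        if not self.history:
            return default
        return sum(self.history) / len(self.history)


def speed_for_frame(expected_work, time_to_deadline):
    """Initial speed estimate as a fraction of full speed."""
    if time_to_deadline <= 0:
        return 1.0
    return min(1.0, expected_work / time_to_deadline)


if __name__ == "__main__":
    estimator = FrameWorkEstimator(window=3)
    for measured in (0.010, 0.020, 0.030, 0.040):  # seconds at full speed
        estimator.record(measured)
    work = estimator.expected_work()
    print(f"expected work: {work:.3f} s")
    print(f"initial speed: {speed_for_frame(work, time_to_deadline=0.060):.2f}")
```

A frame that turns out cheaper or more expensive than the estimate is what the periodic re-evaluation on this slide corrects for.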
Run-Time Execution Trace
Simulate the entire system to measure overhead and effectiveness.
Scheduling overhead < 3%
[Figure: execution trace over Time showing System Idle, Voltage Scheduler, MPEG Decoder, and Interrupt Handler activity, with Frame Deadlines and μProcessor Speed.]
Results: Run-Time Voltage Scaling
Dynamic Voltage Scaling significantly reduces energy dissipation!
• Normalized to 3.3V fixed-voltage processor
• Includes 10% DVS implementation overhead
• Combination of independent benchmarks (Audio & MPEG)
[Figure: Total System Energy (0% to 100%) for Audio, GUI, MPEG, and Audio & MPEG, comparing DVS Simulation with Post-Trace Optimal. Bar labels in extraction order: 73%, 58%, 25%, 16%; 65%, 46%, 15%, 20%.]
Run-Time Performance Analysis
Application characteristics strongly affect voltage scaling performance.
Software can automatically recognize and adjust for the bi-modal GUI distribution.
[Figure: Frame Computation Histogram (0% to 100%) vs. Fixed-V Frame Execution Time, from 0 to 2x deadline, for Audio, GUI, and MPEG. Normalized to deadline at max processor speed.]
[Figure: DVS System Energy. Total System Energy (0% to 100%) for Audio, MPEG, and GUI under the Basic Algorithm, Adjusted Algorithm, and Post-Trace Optimal.]
Beyond Dynamic Voltage Scaling
Voltage scheduling framework can be applied to many different designs and technologies
[Figure: a Speed vs. Time schedule (S1, S2, S3, D1, D3, D2) alongside other targets. Labels: lpARM, Intercom, DSP, CPU, mem, Disk, "*+".]
Outline
The InfoPad Project:
• Energy-efficient integrated system design
Dynamic Voltage Scaling:
• Software control to minimize energy
System Design Environment: Top-Down Microprocessor Design
The lpARM Project
Combine diverse backgrounds to develop an energy-efficient microprocessor.
0.6 µm DVS ARM8 processor with 16 kB on-chip cache
• Speed: 10 - 100 MHz
• Voltage: 1.1 - 3.3 V
• Energy: 0.18 - 2.2 nJ/cycle
• Power: 1.8 - 220 mW
Team:
• Control & Software: Trevor Pering
• Processor Design: Tom Burd
• Dynamic Voltage Regulator: Tony Stratakos
Processor validation & optimization
Silicon expected May 1999
[Figure: system block diagram. Labels: lpARM, SRAM (x3), I/O, Dynamic Voltage Regulator.]
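The quoted energy and power ranges agree with each other, since power is energy per cycle times cycles per second. A quick check of the two endpoints, assuming the lowest energy goes with the lowest speed and the highest with the highest (the function name is illustrative):

```python
# Endpoints quoted on the slide, converted to SI units.
SPEED_HZ = (10e6, 100e6)                # 10 - 100 MHz
ENERGY_J_PER_CYCLE = (0.18e-9, 2.2e-9)  # 0.18 - 2.2 nJ/cycle


def power_watts(energy_j_per_cycle, speed_hz):
    """Power = energy per cycle * cycles per second."""
    return energy_j_per_cycle * speed_hz


if __name__ == "__main__":
    for energy, speed in zip(ENERGY_J_PER_CYCLE, SPEED_HZ):
        milliwatts = power_watts(energy, speed) * 1e3
        print(f"{speed / 1e6:5.0f} MHz x {energy * 1e9:.2f} nJ/cycle = {milliwatts:.1f} mW")
```

The products come to 1.8 mW and 220 mW, the same endpoints as the slide's power range.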
lpARM Top-Down Design
Use top-down design flow to optimize and verify design
[Figure: iterative design flow. Labels: Functional Specification, ANSI C Functional Simulation, Cycle-level Instruction Simulation, VHDL/Layout Hardware Simulation, lpARM, Intercom; simulation results are compared (=?).]
lpARM Feature Specification
Simulate the high-level system to discover desired implementation features.
Energy-saving processor features:
• Dynamic speed control
• Execution cycle counter
• Low-power sleep mode
• Interrupt speed control
• …
[Figure labels: System Simulation, Functional Specification, "Scale voltage to minimize energy".]
lpARM End-to-End Verification
Compare inter-simulation results to verify the end-to-end design.
• Application-level frame checksum
• Strict cycle-level comparison
• Memory hierarchy coherency
Frame 1 Chk: 0x2dbf92c2
Frame 2 Chk: 0x32fe4cda
Frame 3 Chk: 0x3aa0d4ac
Frame 4 Chk: 0x93efa7c8
Frame 5 Chk: 0x28f4efa9
[Figure: outputs compared (=?) across Functional Simulation, Instruction Simulation, VHDL Simulation, and Transistor Simulation. Other labels: Functional Specification, lpARM, SRAM.]
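An application-level frame checksum comparison can be sketched as follows: each simulator emits one checksum per decoded frame, and the traces are compared frame by frame. This is a hedged illustration, not the project's tooling. The slide does not name the checksum algorithm, so CRC-32 via Python's `zlib.crc32` is an assumption, and `frame_checksums`, `first_mismatch`, and the sample frame data are hypothetical.

```python
import zlib


def frame_checksums(frames):
    """CRC-32 of each frame's bytes (the algorithm is an assumption)."""
    return [zlib.crc32(frame) & 0xFFFFFFFF for frame in frames]


def first_mismatch(reference, candidate):
    """Return the 1-based number of the first differing frame, or None.

    A trace that ends early counts as a mismatch at the first missing frame.
    """
    for number, (ref, cand) in enumerate(zip(reference, candidate), start=1):
        if ref != cand:
            return number
    if len(reference) != len(candidate):
        return min(len(reference), len(candidate)) + 1
    return None


if __name__ == "__main__":
    # Stand-ins for frames produced by two different simulation levels.
    functional = [b"frame-1", b"frame-2", b"frame-3"]
    hardware = [b"frame-1", b"frame-X", b"frame-3"]

    for n, chk in enumerate(frame_checksums(functional), start=1):
        print(f"Frame {n} Chk: {chk:#010x}")
    print("first mismatch:",
          first_mismatch(frame_checksums(functional), frame_checksums(hardware)))
```

Comparing at the application level keeps the check independent of timing, so it can be applied across simulators that do not share a cycle-accurate model; the strict cycle-level comparison listed on the slide is a separate, tighter check.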
lpARM Application Evaluation
Evaluate target applications to accurately represent system behavior.
• Direct-mapped cache is very application sensitive
• ‘DVS energy’ includes system performance
[Figure: Normalized DVS System Energy (0% to 180%) by Cache Associativity (1-DM, 2-Parallel, 4-Parallel, 16-CAM, 32-CAM, 64-CAM) for MPEG, Audio, GUI, and Geometric Mean. Intra-group normalized to 32-CAM.]
lpARM System-Level Optimization
Evaluate the complete system early on to direct architectural design.
[Figure: MPEG Memory Subsystem Energy. Total DVS System Energy (0% to 60%) vs. Cache Line Size (16, 32, 64 Bytes), with series Ext-Mem Energy, CAM Energy, $-Mem Energy, System Energy, and uProc Die Energy.]
Other parameters analyzed:
• Write-back/Write-through
• Allocation policy
• Write-buffer size
• Associativity
lpARM Design Summary
Simulating top-down hardware/software design improves end result
Hardware and software components combine to form a system solution.
[Figure: top-down summary. Labels: Control & Software, Processor Design, Voltage Regulator, lpARM, Intercom, "Scale voltage to minimize energy", and a Speed vs. Time schedule (S1, S2, S3, D1, D3, D2).]
Conclusion
The InfoPad Project
• Energy-efficient integrated system design
Dynamic Voltage Scaling
• Software control to minimize energy
Top-Down Microprocessor Design
• Application-driven energy optimization
Effective energy-efficient systems require complete top-to-bottom integrated design.