the seec framework and runtime system
DESCRIPTION
The SEEC Framework and Runtime System. Henry Hoffmann MIT CSAIL http://heartbeats.csail.mit.edu/ High Performance Embedded Computing Workshop September 21-22, 2011. - PowerPoint PPT PresentationTRANSCRIPT
The SEEC Framework and Runtime System
Henry HoffmannMIT CSAIL
http://heartbeats.csail.mit.edu/
High Performance Embedded Computing WorkshopSeptember 21-22, 2011
This work was funded by the U.S. Government under the DARPA UHPC program. The views and conclusions contained herein are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government.
In the beginning…*
2*The beginning, in this case, refers to the beginning of my career (1999)
Performance
Application programmers had one goal:
But Modern Systems Have Increased the Burden on Application Programmers
3
Performance
Pow
er
Quality
Resiliency
Many additional constraints
Even worse, constraints can change dynamicallyE.g. power cap, workload fluctuation, core failure
beat/s
Lo Hi
Power
Most Programming Models Designed for Performance
4
Coherent Shared Memory Message Passing
Communication
Concurrency
Coordination
Multi-threaded Multi-process
Through Memory Through Network
Locks Messages
Control Procedural Procedural
Global Shared Cache
LocalCache
RF
data
store
LocalCache
RF
data
load
Core 0 Core 1
data
LocalMemory
RF
data
store
LocalMemory
RFNetworkInterface
data
load
Core 0 Core 1
NetworkInterface
Network
Procedural control insufficient to meet the needs of modern systems
5
SEEC Replaces Procedural Control with Self-Aware Control
Procedural Control Self-Aware Control
Decide
Act
• Run in open loop • Assumptions made at design time• Based on guesses about future
Decide Act
Observe
• Run in closed loop• Understand user goals • Monitor the environment
- Application optimized for system- No flexibility to adapt to changes
+System optimizes for application+Flexibly adapt behavior
The self-aware model allows the system to solve constrained optimization problems dynamically
6
Outline
• Introduction/Motivation
• The SEEC Model and Implementation
• Experimental Validation
• Conclusions
6
7
The SElf-awarE Computing (SEEC) Model
• Goal: Reduce programmer burden by continuously optimizing online
• Key Features: 1. Decoupled Approach:
• Applications explicitly state goals and progress• System software and hardware state available actions• The SEEC runtime system dynamically selects actions to maintain goals
2. General and Extensible:• New applications can be supported without training• New actions can be added without redesign and reimplementation
Application Developer
Systems Developer
Observe Act
Decide
API SPI
SEECRuntime
8
Example Self-Aware System Built from SEEC
Video Encoder
Actuators
Goals:30 beat/s, Minimize Power
20 b/s30 b/s33 b/s
Algorithm Cores
Frequency Bandwidth
Control and Learning System
_r
Hea
rt R
ate
Time
pure delayslow convergenceoscillating
SEECController
Application-
Desired Heart Rate
_r
Observed Heart Rate
r(k)
Error
e(k)
Speedup
s(k)
SEECController
Application-
Desired Heart Rate
_r
Observed Heart Rate
r(k)
Error
e(k)
Speedup
s(k)
1
2
3
4
5
6
7
8
9
0 5 10 15 20 25 30
Action
Spee
dup
Initial Model
Observe
Decide Act
9
Roles in SEEC’s Decoupled Model
Application Developer
Systems Developer
SEECRuntime System
Express application goals and progress(e.g. frames/ second)
Read goals and performance
Determine how to adapt (e.g. How much to speed up the application)
Provide a set of actions and a callback function(e.g. allocation of cores to process)
Initiate actions based on results of decision phase
Observe
Decide
Act
10
Registering Application Goals
• Performance– Goals: target heart rate and/or latency between tagged heartbeats– Progress: issue heartbeats at important intervals
• Quality– Goals: distortion (distance from application defined nominal value)– Progress: distortion over last heartbeat
• Power– Goals: target heart rate / Watt and/or target energy between tagged heartbeats– Progress: Power/energy over last heartbeat interval
Application
Lo Hi
Power
SEECDecision Engine
Performance
Power
Quality
Research to date focuses on meeting performance while minimizing power/maximizing quality
Observe
Actuators
11
Registering System Actions
Each action has the following attributes:• Estimated Speedup
– Predicted benefit of taking an action• Cost
– Predicted downside of taking an action – Axis for cost (accuracy, power, etc.)
• RPC handle– A function that takes an id an implements the associated action– This is currently subject to change/redesign
SEECDecision Engine
Estimated Speedup
Cost
Callback
Algorithm Cores
Frequency Bandwidth
Act
The SEEC Decision Engine(A general, extensible approach)
• Pros: Simple, Analyzable, Works well for profiled applications
• Cons: Lack of generality for unseen applications
12
Controller(Decide)
Actuator(Act)
Application(Observe)-
PerformanceGoal
CurrentPerformance
Classic Control System: Generalized second order system
Decide
The SEEC Decision Engine(A general, extensible approach)
13
Controller(Decide)
Actuator(Act)
Application(Observe)-
PerformanceGoal
CurrentPerformance
Classic Control System
ApplicationModel
(Decide)
Adaptive Control:Based on 1-DimensionalKalman Filter for Workload Estimation
• Pros: Adapts to unseen applications• Cons: Assumes (relative) system models are correct Cannot support race-to-idle
Decide
The SEEC Decision Engine(A general, extensible approach)
14
Controller(Decide)
Actuator(Act)
Application(Observe)-
PerformanceGoal
CurrentPerformance
Classic Control System
ApplicationModel
(Decide)
Adaptive Control
ResourceModel
(Decide)
Adaptive Action Selection:Approximates solution to linear programming problem for meeting goals and minimizing cost
• Pros: Supports race-to-idle and proportional allocation• Cons: May overprovision due to system model errors
Decide
The SEEC Decision Engine(A general, extensible approach)
15
Controller(Decide)
Actuator(Act)
Application(Observe)-
PerformanceGoal
CurrentPerformance
Classic Control System
ApplicationModel
(Decide)
Adaptive Control
Machine Learner:Uses reinforcement learning to estimate system models online, becomes control system when models converge
SystemModel
(Decide)
ResourceModel
(Decide)
Adaptive Action Selection
Decide
16
Outline
• Introduction/Motivation
• The SEEC Model and Implementation
• Experimental Validation
• Conclusions
16
17
Systems Built with SEEC
17
System Actions Tradeoff BenchmarksDynamic Loop Perforation
Skip some loop iterations Performance vs. Quality
7/13 PARSECs
Dynamic Knobs Make static parameters dynamic
Performance vs. Quality
bodytrack, swaptions, x264, SWISH++
Core Scheduler Assign N cores to application
Compute vs. Power PARSEC
Clock Scaler Change processor speed Compute vs. Power PARSEC
Bandwidth Allocator Assign memory controllers to application
Memory vs. Power STREAM,PARSEC
Power Manager Combination of the three above
Performance vs. Power
PARSEC, STREAM, simple test apps (mergesort, binary search)
Learned Models Power Manager with speedup and cost learned online
Performance vs. Power
PARSEC
Multi-App Control Power Manager with multiple applications
Performance vs. Power and Quality for multiple applications
Combinations of PARSECs
18
Systems Built with SEEC
System Actions Tradeoff BenchmarksDynamic Loop Perforation
Skip some loop iterations Performance vs. Quality
7/13 PARSECs
Dynamic Knobs Make static parameters dynamic
Performance vs. Quality
bodytrack, swaptions, x264, SWISH++
Core Scheduler Assign N cores to application
Compute vs. Power PARSEC
Clock Scaler Change processor speed Compute vs. Power PARSEC
Bandwidth Allocator Assign memory controllers to application
Memory vs. Power STREAM
Power Manager Combination of the three above
Performance vs. Power
PARSEC, STREAM, extra test apps (mergesort, binary search)
Learned Models Power Manager with speedup and cost learned online
Performance vs. Power
PARSEC
Multi-App Control Power Manager with multiple applications
Performance vs. Power and Quality for multiple applications
Combinations of PARSECs
18
19
Constrained Optimization: Managing Performance/Watt for PARSEC
19
Application Goals
System Actions
Experiment
Optimize performance/Watt on multiple machines
Maintain performance, minimize power
Allocate cores
Allocate clock speed
Allocate memory bandwidth
Execute on two machines (w/ different power profiles)
Compare SEEC to several other approaches including a static oracle
Performance/Watt for PARSEC(On server with low idle power)
SEEC beats the static oracle by adjusting to phases within an application and recognizing when to race-to-idle
black
scho
les
body
track
cann
eal
dedu
p
faces
imfer
ret
fluida
nimate
freqm
ine
raytra
ce
strea
mcluste
r
swap
tions vip
sx2
64
Averag
e0
0.2
0.4
0.6
0.8
1
1.2
1.4
static oracle Heuristic Classic Control SEEC (AAS) SEEC (ML)
Benchmark
Norm
aliz
ed P
erfo
rman
ce P
er W
att
Performance/Watt for PARSEC(On server with high idle power)
SEEC is able to beat the static oracle on a different machine without code changes
black
scho
les
body
track
cann
eal
dedu
p
faces
imfer
ret
fluida
nimate
freqm
ine
raytra
ce
strea
mcluste
r
swap
tions vip
sx2
64
Averag
e0
0.20.40.60.8
11.21.41.6
static oracle Heuristic Classic Control SEEC (AAS) SEEC (ML)
Benchmark
Norm
aliz
ed P
erfo
rman
ce P
er W
att
22
Learning Models Online
22
Application Goals
System Actions
Experiment
Adapt system behavior when initial models are wrong
Minimize power consumption while meeting target performance
Change cores, clock speed, and mem. bandwidth
Initial models are incredibly optimistic(Assume linear speedup with any resource increase)
Benchmark: STREAM
Observe convergence time and performance/Watt for converged system
SEEC Can Learn Models Online
0 50 100 150 200 250 300 350 4000
0.2
0.4
0.6
0.8
1
1.2
Classic Control SEEC
Time
Nor
mal
ized
Per
form
ance
Per
W
att
SEEC learns not to allocate too many compute resources to a memory bound application
When System Models Are Wrong(Breakdown)
24
0 50 100 150 200 250 300 350 4000
0.2
0.4
0.6
0.8
1
1.2
1.4
Classic Control SEEC (AAS)SEEC (ML)
Time
Norm
aliz
ed P
ower
0 50 100 150 200 250 300 350 4000
0.2
0.4
0.6
0.8
1
1.2
Classic Control SEEC (AAS)SEEC (ML)
Time
Norm
aliz
ed P
erfo
rman
ce
Adaptive Control achieves performance fastest, but wastes power.ML reaches target performance slowest, but saves power
25
Managing Application and System Resources Concurrently
25
Application Goals
System Actions
Experiment
Manage multiple applications when clock frequency changes
bodytrack: maintain performance, minimize power
x264: maintain performance, minimize quality loss
Change core allocation to both applications
Change x264’s algorithms
Maintain performance of both applications when clock frequency changes
26
SEEC Management of Multiple ApplicationsIn Response to a Power Cap
bodytrack x264
26
0
0.5
1
1.5
2
2.5
40 90 140 190 240Time (Heartbeat)
Nor
mal
ized
Per
form
ance
0
1
2
3
4
5
6
7
Cor
es
bodytrack w/ adaptationbodytrackbodytrack cores
0
0.5
1
1.5
2
2.5
40 90 140 190 240Time (Heartbeat)
Nor
mal
ized
Per
form
ance
0
1
2
3
4
5
6
7
Cor
es
x264 w/ adaptationx264x264 cores
Clock drops 2.4-1.6GHz w/o SEEC app
misses goals
SEEC allocates cores to bodytrack
w/o SEEC app exceeds goals
SEEC removes cores from x264
SEEC adjusts algorithm to meet goals
27
Outline
• Introduction/Motivation
• The SEEC Framework
• Experimental Validation
• Conclusions
27
28
Conclusions
• SEEC is designed to help ease programmer burden– Solves resource allocation problems – Adapts to fluctuations in environment and application behavior
• SEEC has two distinguishing features– Decoupled Design
• Incorporates goals and feedback directly from the application• Allows independent specification of adaptation
– General and Extensible Decision Engine• Uses an adaptive second order control system to manage adaptation
• Demonstrated the benefits of SEEC in several experiments– Optimize performance per Watt for multiple benchmarks on multiple
machines– Adapts algorithms and resource allocation as environment changes
28
Thanks
• DARPA’s UHPC Program
• MIT, Politecnico di Milano, Freescale
• Martina Maggio, Marco D. Santambrogio, Jason E. Miller, Jonathan Eastep
• Stelios Sidiroglou, Sasa Misailovic, Michael Carbin
• Anant Agarwal, Martin Rinard, Alberto Leva, Jim Holt
29