reliable embedded multimedia systems?tbasten/presentations/wisc20090921.pdf · p40 145 p50 190 p60...
TRANSCRIPT
Reliable Embedded Multimedia Systems?
‘Knowing is not understanding.’
Charles Kettering
Twan Basten
Joint work with
Marc Geilen, AmirHossein Ghamarian, Hamid Shojaei, Sander Stuijk, Bart Theelen, and others
Funding:
NWO PROMES
EC FP6 Betsy, FP7 MNEMEE
2 Overview
• Embedded Multi-media
• Analysis of Synchronous Dataflow Graphs
• Throughput Analysis
• Throughput-Storage Trade-off Analysis
• Scenario-aware Analysis
• Run-time Adaptation
• Looking Forward
3
Trends
Concurrency
Connectedness
Interaction
Variability
Embedded Multi-media
emb. mm throughput storage adaptation the futurescenarios
4 Embedded Multi-media
emb. mm throughput storage adaptation the futurescenarios
5 Embedded Multi-media
emb. mm throughput storage adaptation the futurescenarios
6 Embedded Multi-media
emb. mm throughput storage adaptation the futurescenarios
7 Approach: Models of Computation (MoCs)
functionality
concurrency
timing
energy
quality…
Basic dimensions
processing
communication
storage
functionality
concurrency
timing
energy
quality…
Aspects Metrics
executiontime
energydissipation
perceivedquality
reconfigurationtime
…
essential for predictabilityon the interface between science and engineering
emb. mm throughput storage adaptation the futurescenarios
8 Essential Challenges
predictability and efficiency
… are contradictory
emb. mm throughput storage adaptation the futurescenarios
9 Scenario-aware H.263 Decoder Model
1
1VLD IDCT
RCMCFD1
1
3
1 1
1
1
1
11
1 e
ddd
a
bc c
Detector
Kernel
Parameterized
rate
Control channelData
channel
Tokens
I P99 P99
P0
...
1
Fixed rate
Rate I Px P0
a 0 1 0
b 0 x 0
c 99 x 1
d 1 1 0
e 99 x 0
Execution time
VLDP0 0
others 40
IDCTP0 0
others 17
MC
I, P0 0
P30 90
P40 145
P50 190
P60 235
P70 265
P80 310
P99 390
RC
I 350
P0 0
P30, P40, P50 250
P60 300
P70, P80, P99 320
FD All 0
x={30,40,50,60,70,80,99}
(FSM-based Scenario-Aware DataFlow (SADF))
MEMOCODE 2006emb. mm throughput storage adaptation the futurescenarios
10 The Goal: Scenario-aware Flow
extract application
model
keep trade-offs
select operating
point and switch
dynamically
emb. mm throughput storage adaptation the futurescenarios
11 Overview
• Embedded Multi-media
• Analysis of Synchronous Dataflow Graphs
• Throughput Analysis
• Throughput-Storage Trade-off Analysis
• Scenario-aware Analysis
• Run-time Adaptation
• Looking Forward
ACSD 2006
12 Synchronous Data Flow Graphs
A,1 B,2 C,22 3 1 2
2 3 1 2
1 1
1
1 1
1
1 1
1
4 2
channelratetoken
execution time
actor
Throughput: average number of actor firings over time
(SDFGs [Lee 1986])
storage adaptation the futurescenariosemb. mm throughput
Iteration: smallest non-empty set of actor firings that does not change
the token distribution (example: A:3, B:2, C:1)
13 Our Throughput Analysis Method
A
A
A, C
A, CB
B
B
B
A
Periodic PhaseTransient Phase
Throughput can be calculatedfrom the periodic phase
Efficient implementationConsider one designated firing per iteration only to detect a recurrent state
storage adaptation the futurescenariosemb. mm throughput
• Self-timed execution:
• Each actor fires as soon as it can fire
• Known to maximize throughput
• Execution state: distribution of tokens + remaining execution times
• Execution: transient + periodic phase
14 Throughput Analysis: Our Result
1. 10-31. 10-31. 10-31. 10-3MP3 dec.
>1800>1800>180010. 10-3H.263 decoder
>1800>1800>180056. 10-3Satellite
>1800>1800>18002. 10-3Sample Rate
81. 10-381. 10-382. 10-31.10-3Modem
Young
Tarjan
Orlin
HowardDasdan
Gupta
State
Space
Runtimes of various throughput analysis methods (in seconds)
Traditional methods, all using
conversion to Homogeneous SDFGs
storage adaptation the futurescenariosemb. mm throughput
15 Overview
• Embedded Multi-media
• Analysis of Synchronous Dataflow Graphs
• Throughput Analysis
• Throughput-Storage Trade-off Analysis
• Scenario-aware Analysis
• Run-time Adaptation
• Looking Forward
DAC 2006, IEEE T on Comp 2008
16
`
5 6 7 8 9 10 110
0.05
0.1
0.15
0.2
0.25
thro
ug
hp
ut
storage
Trade-off: Storage vs. Throughput
DSP synthesis
multi-media
applications
adaptation the futurescenariosemb. mm throughput storage
17 Storage Distribution
• Separate memory for each channel.
• Storage distribution, for example, ⟨α, β⟩ → ⟨4,2⟩
A,1 B,2 C,22 3 1 2α β
Tokens on channels must be stored in memory.
1 1
1
1 1
1
1 1
1
adaptation the futurescenariosemb. mm throughput storage
18 Problem Definition
5 6 7 8 9 10 110
0.05
0.1
0.15
0.2
0.25
thro
ug
hp
ut
distribution size
⟨⟨⟨⟨4,2⟩⟩⟩⟩⟨⟨⟨⟨6,2⟩⟩⟩⟩
⟨⟨⟨⟨6,3⟩⟩⟩⟩
⟨⟨⟨⟨7,3⟩⟩⟩⟩
⟨⟨⟨⟨5,3⟩⟩⟩⟩
Find all minimal storage distributions for any possible throughput
1 1
1
1 1
1
1 1
1
A,1 B,2 C,22 3 1 2α β
adaptation the futurescenariosemb. mm throughput storage
19
tk(α)
tk(β)
sp(α)
sp(β)
tk(A)
tk(α)sp(α)
Dependency Graph
A3 A4
B0
C0
A2 B1
cycles denotestorage
dependencies
sp - space
tk - token
adaptation the futurescenariosemb. mm throughput storage
1 1
1
1 1
1
1 1
1
A,1 B,2 C,22 3 1 2α β
Storage distribution ⟨α, β⟩ → ⟨4,2⟩
dependency:an actor firing start
may depend on an actor firing end
firings in one
period
20
tk(α)
tk(β)
sp(α)
sp(β)
tk(A)
tk(α)sp(α)
Dependency Graph
A3 A4
B0
C0
A2 B1
cycles denotestorage
dependencies
sp - space
tk - token
adaptation the futurescenariosemb. mm throughput storage
resolving storage dependencies may increase throughput
dependency:an actor firing start
may depend on an actor firing end
firings in one
period
21
tk(α)
tk(β)
sp(α)
sp(β)
tk(A)
tk(α)sp(α)
Abstract Dependency Graph
A3 A4
B0
C0
A2 B1
tk(α) tk(β)
sp(β)
tk(A)
sp(α)
A B C
abstract dependency graph contains all storage
dependencies
adaptation the futurescenariosemb. mm throughput storage
easily constructed from state-space exploration
dependency graphmay be very large
dependencies between actors
22 Design-Space Exploration Algorithm
Given an SDFG G with storage distribution δ
1. Compute throughput and abstract dependency graph ∆ for G with δ
2. For each channel c with a storage dependency in ∆
Create new storage distribution by enlarging c
3. Repeat steps 1-2
All minimal storage distributions are found
when starting from storage distribution ⟨0,...,0⟩
adaptation the futurescenariosemb. mm throughput storage
23 Experimental Results
2ms
3
3
16
1/132
12
1/137
12
13
MP3
53min
3254
3254
8006
1/10000
4753
1/21876
3
4
H.263
7ms
2
2
1544
0.23
1542
0.18
26
22
Satellite
1msExec. time
5#Distributions
4#Pareto points
10Distr. size
1/4Max throughput
6Distr. size
1/7min throughput > 0
2#channels
3#actors
Example
Approximation
increase buffers with n times the minimal step size
adaptation the futurescenariosemb. mm throughput storage
24 Conclusions Trade-off Analysis
• Exact minimum memory requirements for any possible throughput
• Technique can be combined with heuristics to prune the search space if necessary
• Technique can be generalized to other types of resources and performance metrics
adaptation the futurescenariosemb. mm throughput storage
25 Overview
• Embedded Multi-media
• Analysis of Synchronous Dataflow Graphs
• Throughput Analysis
• Throughput-Storage Trade-off Analysis
• Scenario-aware Analysis
• Run-time Adaptation
• Looking Forward
26 Scenario-aware H.263 Decoder Model
1
1VLD IDCT
RCMCFD1
1
3
1 1
1
1
1
11
1 e
ddd
a
bc c
Detector
Kernel
Parameterized
rate
Control channelData
channel
Tokens
I P99 P99
P0
...
1
Fixed rate
Rate I Px P0
a 0 1 0
b 0 x 0
c 99 x 1
d 1 1 0
e 99 x 0
Execution time
VLDP0 0
others 40
IDCTP0 0
others 17
MC
I, P0 0
P30 90
P40 145
P50 190
P60 235
P70 265
P80 310
P99 390
RC
I 350
P0 0
P30, P40, P50 250
P60 300
P70, P80, P99 320
FD All 0
x={30,40,50,60,70,80,99}
(FSM-based Scenario-Aware DataFlow (SADF))
MEMOCODE 2006adaptation the futurescenariosemb. mm throughput storage
27 Scenario-aware Performance Analysis
• SDF-based throughput analysis considers worst-case actor times
• Scenario-aware throughput analysis considers worst-case actor times per scenario, but it must consider scenario transitions
• Analyzing all scenario transitions separately can be avoided
• Separate scenario transitions with an invariant reference schedule
ACM TECSadaptation the futurescenariosemb. mm throughput storage
28 Overview
• Embedded Multi-media
• Analysis of Synchronous Dataflow Graphs
• Throughput Analysis
• Throughput-Storage Trade-off Analysis
• Scenario-aware Analysis
• Run-time Adaptation
• Looking Forward
DAC 2009
29
Energy
Completion time
(3,ck1)
(4,ck1)
(5,ck1)
(1,ck2)
(2,ck2)
(3,ck2)
(3,ck1)
(4,ck1)
(5,ck1)
(1,ck2)
(2,ck2)
(3,ck2)
(3,ck1)
(4,ck1)
(5,ck1)
(1,ck2)
(2,ck2)
(3,ck2)
(8,ck1)
(4,ck2)
(10,ck1)
(9,ck1)
(6,ck2) (5,ck2)
(7,ck2)
(6,ck2)
(9,ck2)
(8,ck2)
JCM
Run-time Application Management
Const
Gantt chart
- Applications starting/stopping over time
- 6 procs, slow (ck1) and fast clocks (ck2)
- Minimize energy under time constraints
Join Const Min
“enforce the
same clock”
adaptation the futurescenariosemb. mm throughput storage
30 Pareto Algebra
Goal: Compositional computation of trade-offs
adaptation the futurescenariosemb. mm throughput storage
31 Pareto Algebra
• The elements: sets of configurations
• (Relevant) operators:
• Minimization
Gives the Pareto points (optimal trade-offs)
• Product
Cartesian product of configurations
e.g. application and platform configurations
• Constraint
Selects solutions according to constraints
e.g. all application configurations with some minimal quality
• AbstractionDiscards information about solutions
e.g. bandwidth usage in bandwidth × energy × quality configurations
• Derived metricDerive a new metric from other metrics
e.g. total power from power of components
energy
time
better
better
prerequisite
monotonicity
adaptation the futurescenariosemb. mm throughput storage
32 Pareto-Algebraic Characterization
… of run-time adaptation
… select and configure
adapt
product - derive | constrain | abstract - minimize
com
pon
ent
trade-o
ff s
paces
system trade-off space(at the time of a change)
adaptation the futurescenariosemb. mm throughput storage
33 Pareto-Algebraic Characterization
… of run-time adaptation
… select and configure
product - derive | constrain | abstract - minimize
com
pon
ents
X
derive | constrain | abstract min
adaptation the futurescenariosemb. mm throughput storage
34 Complexity
n components with p Pareto points each
adapt
take a product and on-the-fly
derive | constrain | abstract - minimize
O(p2n)
com
pon
ent
trade-o
ff s
paces
system trade-off space(at the time of a change)
adaptation the futurescenariosemb. mm throughput storage
35
X
Complexity
n components with p Pareto points each
take a product and on-the-fly
derive | constrain | abstract - minimize
O(p2n)
com
pon
ents
constrainderive-abstract
min
adaptation the futurescenariosemb. mm throughput storage
36 Complexity Control
keep n and p small
motivation Pareto algebra adaptation wsncmp
• Compositional reasoning, among others
• Approximate Pareto sets
conclusions
y =
x =
y/x = c2
y/x = c4
37 Multi-dimensional Multiple-choice Knapsack Problem
MMKP
- One optimization objective (value)
- Multiple resource dimensions with capacity constraints
- Multiple independent applications
- Multiple independent configurations per application
Pick one configuration per application optimizing value within constraints
NP hard
adaptation the futurescenariosemb. mm throughput storage
38 A Parameterized Compositional Heuristic
• Project all resource dimensions into one dimension
• Approximate Pareto set in 2-dimensional space
product-constrain-derive-abstract-minimize
product
constrain
derive abstract
minimize
Compute product while on-the-fly
• Applying resource constraints
• Computing values, projecting all resource dimensions into one(taking simply the sum)
• Minimizing the configuration set in 2-dimensional space
adaptation the futurescenariosemb. mm throughput storage
39 A Parameterized Compositional Heuristic
• Project all resource dimensions into one dimension
• Approximate Pareto set in 2-dimensional space
y =
x =
y/x = c2
y/x = c4
product-constrain-derive-abstract-minimize
adaptation the futurescenariosemb. mm throughput storage
40 A Parameterized Compositional Heuristic
• Project all resource dimensions into one dimension
• Approximate Pareto set in 2-dimensional space
product-constrain-derive-abstract-minimize
parameter p: maintain (at most) p Pareto points
Allows to budget analysis time
analysis time ≤ c ▪ n ▪ p2
(n number of applications; c platform constant)
adaptation the futurescenariosemb. mm throughput storage
41
50
60
70
80
90
100
I1 I2 I3 I4 I5 I6
T=1ms t=5ms T=20ms t=100ms
Controlling Time Budgets
budgets
time
values
• MMKP benchmark: www.laria.u-picardie.fr/hifi/OR-Benchmark/MMKP
• Always within budget
• Good values
• Similar results to the IMEC, SOC 2006,
non-compositional state-of-the-art heuristic
par p
adaptation the futurescenariosemb. mm throughput storage
SimIt-ARM simulator
206 Mhz StrongARM
42
0
1
0 5 10 15 20
IMEC Pareto
time(ms)
768 768
1576
1576 2353
2353
3224
3705
3224
3900
0
0.5
1
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
time(ms)
Compositionality
• Applications start at various times
• 5 at a time (a), 1 at a time (b)
values
values always at least as good
as IMEC heuristic
what about applications stopping ?
# of active applications
# of active applicationsadaptation the futurescenariosemb. mm throughput storage
43 Conclusions Run-time Adaptation
• Run-time adaptation:
product - derive | constrain | abstract - minimize
… select and configure
• Parameterized compositional method
• Allows to trade off quality with analysis time
• Allows to bound analysis time
• Open question: components leaving the composition ?
• Feasible for resource-constrained embedded systems
• Wide variety of applications
adaptation the futurescenariosemb. mm throughput storage
44 Overview
• Embedded Multi-media
• Analysis of Synchronous Dataflow Graphs
• Throughput Analysis
• Throughput-Storage Trade-off Analysis
• Scenario-aware Analysis
• Run-time Adaptation
• Looking Forward
45 Dataflow MoCs Expressiveness Hierarchy
RPN: Reactive Process Networks
KPN: Kahn Process Networks
DF: DataFlow
DDF: Dynamic DF
BDF: Boolean DF
CSDF: Cyclo-Static DF
SDF: Synchronous DF
HSDF: Homogeneous SDF
• BDF and larger: Turing complete
• Better notions of expressiveness
needed
• Unification needed
adaptation the futurescenariosemb. mm throughput storage
46 Challenges
• Suitable notions of expressivity
• Expressivity while maintaining analyzability and synthesizability
• Abstraction without loosing accuracy
• Composability and compositionality
• MoCs for non-functional aspects
• Unification of MoCs
• Multi-objective trade-off analysis
• Parametric analysis
• Model-driven design and synthesis flows
• Model-driven run-time systems
adaptation the futurescenariosemb. mm throughput storage
47 Thank you !
More info: www.es.ele.tue.nl/~tbasten/SDF3 toolkit: www.es.ele.tue.nl/sdf3/Pareto calculator: www.es.ele.tue.nl/pareto/
Questions ?
‘An understanding of the natural world and what's in it
is a source of not only a great curiosity but great fulfillment.’
David Attenborough