response time analysis of tasks in multiprocessor systems
DESCRIPTION
Response Time Analysis of Tasks in Multiprocessor Systems. ArtistDesign Joint TA/MpSoC Cluster Meeting Date 2009. Simon Schliecker Mircea Negrean Rolf Ernst. Outline. Performance Abstractions for the Analysis of Real-Time Systems Multicore Architecture - PowerPoint PPT PresentationTRANSCRIPT
Institute of Computer andNetwork Engineering
Response Time Analysis of Tasks in Multiprocessor Systems
Simon Schliecker Mircea NegreanRolf Ernst
ArtistDesign Joint TA/MpSoC Cluster
MeetingDate 2009
2Simon Schliecker – 23.04.2009
Outline
• Performance Abstractions for the Analysis of Real-Time Systems
• Multicore Architecture• Timing Implications and Countermeasures• Analytical Solutions• Conclusion
3Simon Schliecker – 23.04.2009
Scheduler
T1 T2
activationsResource
Task
System
CPU1
HWShared Memory
CPU2 CoP
Arbiter
LocalMem
Software Timing Hierarchy
…
…
4Simon Schliecker – 23.04.2009
CPUaPreemption
Stalling
Execution
Local Scheduling Analysis
• Large body of methods available to derive worst case response time for different scheduling policies – SPP, TDMA, RR, EDF,…– considering realistic scheduling
effects (context switch times, offsets, cache-related preemption delay,…)
Single core execution (with explicit memory)
Memory
CPUa
Single core execution(classical model)
WCRTValid assumption for single processor systems: Memory access times are part of the Core Execution Time Ci
5Simon Schliecker – 23.04.2009
Model task activation as event streams
t [ms]
events
2.5
events
D [ms] 2.5
Event Stream
Arrival Curves
Response Time Analysis requires traffic models:
Derive event bounds
Determined by• application model
(Simulink, LabView, …)• environment model
(reactive systems) • service contracts (max
no of requests per time, …)
T1
Extract key parameters (optional)
P PeriodJ JitterdminMinimum event
distance
T1
6Simon Schliecker – 23.04.2009
Component performance analysis
• Output event model of one processor becomes input event model of successor compose multiple local analyses into system level analysis
Comp 1
scheduling comp 1
P4
P3 output event streams
input event streams
7Simon Schliecker – 23.04.2009
Merging ECUs using Multicore-Architectures
Current distributed system
• all accesses to local resources
• bus communication clearly specified and systematic
localresources
CPU1
localresources
CPU2
localresources
core1
localresources
core2
sharedresources
ECU1 ECU2
MC-ECU
Multi-core system• keep task sets and functions
separate• accesses to local and shared
resources• complicated, interleaved and
less systematic communication timing
8Simon Schliecker – 23.04.2009
Task Exection in Multicore
Single core execution (classical model)
Preemption
Stalling
Single core execution (with stalling) Memory
CPUa
CPUa
WCRT
Memory
CPUa
CPUb
Multicore execution
Execution
9Simon Schliecker – 23.04.2009
Scheduler
T1 T2
activationsRessource
Task
System
CPU1
HWShared Memory
CPU2 CoP
Arbiter
LocalMem
Software Timing Hierarchy
…
…
Bottom
-
up
broke
n
11Simon Schliecker – 23.04.2009
Countermeasures
1. Orthogonalization of resources– introduce schedulers that give upper bounds on interference
independently of competing streams– at least perform traffic shaping imposes strict hardware guidelines protection from partially false system specification prone to overprovisioning (not so much in hard real-time setups)
2. Use formal analysis that covers dynamism– Find realistc upper bounds on application behavior– Provide formulas and analysis methods matching actual system requires comprehensive knowledge of hardware behavior to set up
analysis requires safe assumptions about behavior of the software allows considering dynamic schedulers and load
3. Mix of the above
12Simon Schliecker – 23.04.2009
Multicore with dynamically shared resources
To formally analyse multicore systems with dynamic shared ressource arbiters, we need:
1. Traffic Models to the Shared Ressource Reuse methods from the WCET community
2. Analysis of Latency of Shared Ressource Accesses Basically covered with system level approach, but new
focus on multiple events
3. Analysis of Impact of Shared Ressource Access Latency on Task Response Time Extensions of single processor scheduling analysis
13Simon Schliecker – 23.04.2009
Extended Analysis Loop
environment model
Extended responsetime analysis
output traffic description
until convergence or non-schedulability
input traffic description
Shared resource access analysis
1.
2.
3.
14Simon Schliecker – 23.04.2009
1. Derivation of Request Latencies
Tra(1)
Tra(2)Tra(1)
Control Flow Graph
Transactions
Maximum number of requests
Longest Execution
Path
• “Transactions” need to be explicit part of task description
• Problem: Cache Misses occur only during runtime Use cache modeling
from WCET tools
• Improvement: minimum distance between requests is bounded by “best case” execution
• accumulate for each task to derive load on shared resource
15Simon Schliecker – 23.04.2009
2. Derivation of Shared Resource Latencies
• Request times are very dynamic
• System state at request time is dynamic
Individual request times difficult to track
calculate total interference for all requests during
execution using load bonds
(previous slide) „aggregate busy time“
Given a certain amount of dynamism „sum-of-all-requests“ is close to actual worst-case
Memory
CPU1
CPU2
a)
Memory
CPU1
CPU2
b)
16Simon Schliecker – 23.04.2009
3. Considering Resource Access in Response Time• „Processor stalls during accesses“:
– Note: Memory acesses included at the WCRT analysis level
Memory
CPUa
CPUb
)(
)()(ihpj
ijijii RSCRCR
• Processor allows Multithreading:– Self-suspension to increase utilization– Parallelism of „local execution“ and „transaction processing“ difficult to prove!
• Exact solutions have exponential complexity
– Careful when combining MT with SPP in hard-real time [Codes06]
17Simon Schliecker – 23.04.2009
Multi-core Consequences / Conclusion
• merging ECU functions on multi-core impacts function timing
• Cross-processor interference due to shared resources occurs– Shared memories– Shared coprocessors / hardware– Logical locks (DATE 2009)
• new analysis algorithms are available– compatible to system level analysis – can work with incomplete and estimated task sets
Use analysis to optimize performance and cost:- in early design stages to guide towards optimal design choices– refine input data during design process to provide verification
strength guarantees for final product
18Simon Schliecker – 23.04.2009
FIN