response time analysis of tasks in multiprocessor systems

Institute of Computer andNetwork Engineering

Response Time Analysis of Tasks in Multiprocessor Systems

Simon Schliecker Mircea NegreanRolf Ernst

ArtistDesign Joint TA/MpSoC Cluster

MeetingDate 2009

2Simon Schliecker – 23.04.2009

Outline

• Performance Abstractions for the Analysis of Real-Time Systems

• Multicore Architecture• Timing Implications and Countermeasures• Analytical Solutions• Conclusion


Scheduler

T1 T2

activationsResource

Task

System

CPU1

HWShared Memory

CPU2 CoP

Arbiter

LocalMem

Software Timing Hierarchy

…

…


CPUaPreemption

Stalling

Execution

Local Scheduling Analysis

• Large body of methods available to derive worst case response time for different scheduling policies – SPP, TDMA, RR, EDF,…– considering realistic scheduling

effects (context switch times, offsets, cache-related preemption delay,…)

Single core execution (with explicit memory)

Memory

CPUa

Single core execution(classical model)

WCRTValid assumption for single processor systems: Memory access times are part of the Core Execution Time Ci


Model task activation as event streams

t [ms]

events

2.5

events

D [ms] 2.5

Event Stream

Arrival Curves

Response Time Analysis requires traffic models:

Derive event bounds

Determined by• application model

(Simulink, LabView, …)• environment model

(reactive systems) • service contracts (max

no of requests per time, …)

T1

Extract key parameters (optional)

P PeriodJ JitterdminMinimum event

distance

T1


Component performance analysis

• Output event model of one processor becomes input event model of successor compose multiple local analyses into system level analysis

Comp 1

scheduling comp 1

P4

P3 output event streams

input event streams


Merging ECUs using Multicore-Architectures

Current distributed system

• all accesses to local resources

• bus communication clearly specified and systematic

localresources

CPU1

localresources

CPU2

localresources

core1

localresources

core2

sharedresources

ECU1 ECU2

MC-ECU

Multi-core system• keep task sets and functions

separate• accesses to local and shared

resources• complicated, interleaved and

less systematic communication timing


Task Exection in Multicore

Single core execution (classical model)

Preemption

Stalling

Single core execution (with stalling) Memory

CPUa

CPUa

WCRT

Memory

CPUa

CPUb

Multicore execution

Execution


Scheduler

T1 T2

activationsRessource

Task

System

CPU1

HWShared Memory

CPU2 CoP

Arbiter

LocalMem

Software Timing Hierarchy

…

…

Bottom

-

up

broke

n


Countermeasures

1. Orthogonalization of resources– introduce schedulers that give upper bounds on interference

independently of competing streams– at least perform traffic shaping imposes strict hardware guidelines protection from partially false system specification prone to overprovisioning (not so much in hard real-time setups)

2. Use formal analysis that covers dynamism– Find realistc upper bounds on application behavior– Provide formulas and analysis methods matching actual system requires comprehensive knowledge of hardware behavior to set up

analysis requires safe assumptions about behavior of the software allows considering dynamic schedulers and load

3. Mix of the above


Multicore with dynamically shared resources

To formally analyse multicore systems with dynamic shared ressource arbiters, we need:

1. Traffic Models to the Shared Ressource Reuse methods from the WCET community

2. Analysis of Latency of Shared Ressource Accesses Basically covered with system level approach, but new

focus on multiple events

3. Analysis of Impact of Shared Ressource Access Latency on Task Response Time Extensions of single processor scheduling analysis


Extended Analysis Loop

environment model

Extended responsetime analysis

output traffic description

until convergence or non-schedulability

input traffic description

Shared resource access analysis

1.

2.

3.


1. Derivation of Request Latencies

Tra(1)

Tra(2)Tra(1)

Control Flow Graph

Transactions

Maximum number of requests

Longest Execution

Path

• “Transactions” need to be explicit part of task description

• Problem: Cache Misses occur only during runtime Use cache modeling

from WCET tools

• Improvement: minimum distance between requests is bounded by “best case” execution

• accumulate for each task to derive load on shared resource


2. Derivation of Shared Resource Latencies

• Request times are very dynamic

• System state at request time is dynamic

Individual request times difficult to track

calculate total interference for all requests during

execution using load bonds

(previous slide) „aggregate busy time“

Given a certain amount of dynamism „sum-of-all-requests“ is close to actual worst-case

Memory

CPU1

CPU2

a)

Memory

CPU1

CPU2

b)


3. Considering Resource Access in Response Time• „Processor stalls during accesses“:

– Note: Memory acesses included at the WCRT analysis level

Memory

CPUa

CPUb

)(

)()(ihpj

ijijii RSCRCR

• Processor allows Multithreading:– Self-suspension to increase utilization– Parallelism of „local execution“ and „transaction processing“ difficult to prove!

• Exact solutions have exponential complexity

– Careful when combining MT with SPP in hard-real time [Codes06]


Multi-core Consequences / Conclusion

• merging ECU functions on multi-core impacts function timing

• Cross-processor interference due to shared resources occurs– Shared memories– Shared coprocessors / hardware– Logical locks (DATE 2009)

• new analysis algorithms are available– compatible to system level analysis – can work with incomplete and estimated task sets

Use analysis to optimize performance and cost:- in early design stages to guide towards optimal design choices– refine input data during design process to provide verification

strength guarantees for final product


FIN

response time analysis of tasks in multiprocessor systems

Documents

input event model of

core execution time

model task activation

multiple local analyses

worst case response

system level analysiscomp

task exection

single processor systems