worst-case execution time analysis...worst-case execution time analysis 2009-12-03 1 worst-case...

18
Worst-Case Execution Time analysis 2009-12-03 1 Worst-Case Execution Time Analysis Analysis Andreas Ermedahl, Docent Mälardalen Real-Time Research Center (MRTC) Västerås, Sweden [email protected] C 2 What C are we talking about? » A key component in the analysis of real-time systems » You have seen it in formulas such as: Worst-Case Worst-Case 3 R i = C i + R i / T j C j jhp(i) Worst Case Response Time Period Where do these C values come from? Worst Case Execution Time Program timing is not trivial! Simpler questions int f(int x) { return 2 * x; } Harder questions » What is the program doing? » Will it always do the same thing? » How important is the result? 4 » What is the execution time of the program? » Will it always take the same time to execute? » How important is execution time? Program timing basics » Most computer programs have varying execution time ¡ Due to input values ¡ Due to software characteristics ¡ Due to hardware characteristics » Example: some timed program runs 5 0 execution time program runs Most runs have similar execution time Some take much longer time (why?) Is this the longest execution time... ... or can we get even longer ones? WCET and WCET analysis » Worst-Case Execution Time = WCET ¡ The longest calculation time possible ¡ For one program/task when run in isolation ¡ Other interesting measures: BCET, ACET » The goal of a WCET analysis is to derive f b d WCET safe upper timing bounds possible execution times a safe upper bound on a programs WCET time safe lower timing bounds 0 BCET WCET

Upload: others

Post on 09-Feb-2021

4 views

Category:

Documents


0 download

TRANSCRIPT

  • Worst-Case Execution Time analysis 2009-12-03

    1

    Worst-CaseExecution Time

    AnalysisAnalysisAndreas Ermedahl, Docent

    Mälardalen Real-Time Research Center (MRTC)Västerås, Sweden

    [email protected]

    C2

    What C are we talking about?

    A key component in the analysis of real-time systemsYou have seen it in formulas such as:

    Worst-Case Worst-Case

    3

    Ri = Ci + ∑⎡Ri / Tj⎤ Cjj∈hp(i)

    Worst CaseResponse Time Period

    Where do these C values come from?

    Worst CaseExecution Time

    Program timing is not trivial!

    Simpler questions

    int f(int x) {

    return 2 * x;

    }

    Harder questions

    What is the program doing?

    Will it always do the same thing?

    How important is the result?

    4

    What is the executiontime of the program?

    Will it always take the same time to execute?

    How important is execution time?

    Program timing basicsMost computer programs have varyingexecution time

    Due to input valuesDue to software characteristicsDue to hardware characteristics

    Example: some timed program runs

    5

    0 execution time

    prog

    ram

    runs Most runs have

    similar execution time

    Some take much longer time (why?)

    Is this the longest execution time...

    ... or can we get even longer ones?

    WCET and WCET analysisWorst-Case Execution Time = WCET

    The longest calculation time possibleFor one program/task when run in isolationOther interesting measures: BCET, ACET

    The goal of a WCET analysis is to derive f b d ’ WCET

    safe uppertiming

    bounds

    possible execution

    times

    a safe upper bound on a program’s WCET

    time

    safe lowertiming

    bounds

    0

    BCET WCET

  • Worst-Case Execution Time analysis 2009-12-03

    2

    Presentation outlineEmbedded and Real-TimeWCET analysis

    Measurements Static analysis Fl l i l l l l i d l l tiFlow analysis, low-level analysis, and calculationHybrid approaches

    WCET analysis toolsThe SWEET approach to WCET analysisUpcoming challengesWCET tool demo

    Embedded and

    Real-Time

    Embedded computersAn integrated part of a larger system

    Example: Each microwave oven holds at least one embedded processor Example: A modern car can contain more than 100 embedded processors

    Interacts with the use, the environment, and with other computers

    Often limited or no user interfaceOften with some timing constraints inputinput resultresult

    Embedded systems everywhere

    Today, all advanced products contain embedded computers!

    Our society is dependant on that they function correctlythat they function correctly

    Embedded systems software

    Amount of software can vary from extremely small to very large

    Gives characteristics to the productOften developed with target hardware in mind

    11

    Often limited resources (memory / speed)Often direct accesses to different HW devicesNot always easily portable to other HW

    Many different programming languagesC still dominates, but often special purpose languages

    Many different software development tools Not just GCC and/or Microsoft Visual Studio

    Embedded system hardwareHuge variety of embedded system processors

    Not just one main processor type as for PCs Additionally, same CPU can be used with various hardware configurations (memories, devices, …) g ( )

    The hardware is often tailored specifically to the application

    For example using a DSP processor for signal processing

    Cross-platform developmentE.g., develop on PC and download final application to target HW

  • Worst-Case Execution Time analysis 2009-12-03

    3

    A numerical comparisionEmbedded systems processors clearly dominate yearly production

    100 million PC processors6000 million embedded

    June 10, 2005Timely Software - a Challenge 13

    "Desktop"2%

    "Embedded"98%

    Real-time systemsComputer systems where the timely behavior is a central part of the function

    Containing one or more embedded computersBoth soft- and hard real-time, or a mixture…

    Timing of radio

    Timing of radio communication,

    speech recognition,…

    Timing of musicplaying from MP3 file

    Timing of radio communication, motor

    control, rudder and flaps control,…

    Timing of network communication, motor control, ABS brakes, anti-slip control,…

    Uses of reliable WCET bounds

    Hard real-time systems WCET needed to guarantee behavior

    Real-time schedulingCreating and verifying schedulesLarge part of RT research assumeLarge part of RT research assume the existence of reliable WCET bounds

    Soft real-time systemsWCET useful for system understanding

    Program tuningCritical loops and paths

    Interrupt latency checking

    WCET analysisanalysis

    Obtaining WCET bounds

    MeasurementIndustrial practice

    Static analysis Research front

    Timing measurementmeasurement

  • Worst-Case Execution Time analysis 2009-12-03

    4

    Measuring for the WCET

    Methodology:Determine potential”worst-case input”pRun and measureAdd a safety margin

    19

    Measurement issuesLarge number of potential worst-case inputs

    Program state might be part of input

    Has the worst-case path really been taken?Often many possible paths through a program

    20

    Hardware features may interact in unexpected ways

    How to monitor the execution? The instrumentation may affect the timingHow much instrumentionoutput can be handled?

    LEDs

    Buzzer

    SW measurement methodsOperating system facilities

    Commands such as time, date and clockNote that all OS-based solutions require precise HW timing facilities (and an OS)

    Cycle-level simulatorsS f i l i CPU

    21

    Software simulating CPUCorrectness vs. hardware?

    High-water markingKeep system runningRecord maximum time observed for taskKeep in shipping systems, read at service intervals

    Using an oscilloscopeCommon equipment for HW debugging

    Used to examine electrical output signals of HW

    Mainly for observing the voltage or signal waveform on a particular pin

    Usually only two to four inputs

    22

    Usually only two to four inputs

    To measure time spent in a routine: 1. Set I/O pin high when entering routine 2. Set the same I/O pin low before exiting 3. Oscilloscope measures the amount of

    time that the I/O pin is high 4. This is the time spent in the routine

    Using a logic analyzerEquipment designed for troubleshooting digital hardwareHave dozens or even hundreds of inputs

    23

    hundreds of inputsEach one keeping track on whether the electrical signal it is attached to is currently at logic level 1 or 0Result can be displayed against a timeline Can be programmed to start capturing data at particular input patterns

    Target board

    HW Debugger

    HW measurement toolsIn-circuit emulators (ICE)

    Special CPU version revealing internalsHigh visibility & bandwidthSupportive hardware required

    Processors with debug support

    24

    g ppDesigned into processor

    Use a few dedicated processor pinsUsing standardized interfaces

    Nexus debug interfaces, JTAG, Embedded Trace Macrocell, …

    Supportive HW requiredCommon on modern chip

  • Worst-Case Execution Time analysis 2009-12-03

    5

    Problem of using measurement

    Measured value never larger than WCET!

    safe upperti i

    possible execution

    ti

    safe lowerti i

    BCET WCET

    A safety margin must be added!How much is enough?

    timingbounds

    times

    time

    timingbounds

    0

    You can never measure a value > WCET

    Measurement will result in a value ≤ WCET

    StaticWCET

    analysis

    Static WCET analysisDo not run the program – analyze it!

    Using models based on the static properties of the software and the hardware

    Guaranteed reliable WCET boundsProvided all models, input data and analysis methods are correct

    Trying to be as tight as possible

    safe uppertiming

    bounds

    possible execution

    times

    time

    safe lowertiming

    bounds

    0

    BCET WCET

    All derived bounds will be ≥ WCET

    foo(x,i):

    while(i < 100)

    if (x > 5) then

    x = x*2;

    else

    x = x+2;

    d

    Again: Causes of Execution Time Variation

    Execution characteristics of the software

    A program can often execute in many different waysInput data dependencies

    end

    if (x < 0) then

    b[i] = a[i];

    end

    i = i+1;

    end

    Application characteristics

    Timing characteristics of the hardware

    Clock frequencyCPU characteristicsMemories used…

    WCET analysis phases

    Compiler

    program

    L l l

    Flow analysis

    1. Flow analysis Bound the number of times different program parts may be executed (SW analysis)

    2. Low-level analysis

    AnalysisReality

    ObjectCode

    Target Hardware

    Low levelanalysis

    Calculation

    Bound the execution time of different program parts(HW analysis)

    3. CalculationCombine flow- and low-level analysis results to derive an upper WCET bound

    ActualWCET

    WCETbound

    Flow analysis

    30

    analysis

  • Worst-Case Execution Time analysis 2009-12-03

    6

    Flow AnalysisProvides bounds on the number of times different program parts may be executed

    Valid for all possible executionsE l f id d i f

    Flow analysis

    Program

    31

    Examples of provided info:Bounds of loop iterationsBounds on recursion depthInfeasible paths

    Info provided by: Static program analysisManual annotations

    Low levelanalysis

    Calculation

    WCETEstimate

    The control-flow graphfoo(x,i):

    A: while(i < 100)

    B: if (x > 5) then

    C: x = x*2;

    else

    foo()

    A

    B

    Each block will run as a

    unitelse

    D: x = x+2;

    end

    E: if (x < 0) then

    F: b[i] = a[i];

    end

    G: i = i+1;

    end

    Flows as edges

    C D

    E

    F

    G

    end

    Flow info characteristics

    Basic finiteness

    Statically allowed

    Loop bound: 100Structurally possibleexecutions (infinite)

    foo(x,i):

    A: while(i < 100)

    B: if (x > 5) then

    C: x = x*2;

    else

    foo()

    C

    B

    D

    A

    Example program

    Actual feasiblepaths

    #F < 10

    Control flow graph Relation between possible executions and flow info

    WCET found here = desired result

    D: x = x+2;

    end

    E: if (x < 0) then

    F: b[i] = a[i];

    end

    G: i = i+1;

    end

    E

    F

    G

    end

    WCET found here =overestimation

    Example: Loop boundsLoop bound:

    Depends on possible values of input variable i

    E.g. if 1 ≤ i ≤ 10 holds for input value i then loop bound is 100

    foo(x,i):

    A: while(i < 100)

    B: if (x > 5) then

    C: x = x*2;

    else In general, a very difficult problemHowever, solvable for many types of loops

    Requirement for basic finiteness

    All loops must be upper bound

    else

    D: x = x+2;

    end

    E: if (x < 0) then

    F: b[i] = a[i];

    end

    G: i = i+1;

    end

    Example: Infeasible path

    Infeasible path:Path A-B-C-E-F-G can not be executedSince C implies ¬F

    foo(x,i):

    A: while(i < 100)

    B: if (x > 5) then

    C: x = x*2;

    elseIf (x > 5) then it is not possible that (x*2) < 0

    Limits statically allowed executions

    Might tighten the WCET estimate

    else

    D: x = x+2;

    end

    E: if (x < 0) then

    F: b[i] = a[i];

    end

    G: i=i+1;

    end

    Example: Triangular Loop

    Two loops:Loop A bound: 100 Local B bound: 100

    Block C: B l b d

    triangle(a,b):

    A: loop(i=1..100)

    B l (j i 100)

    36

    By loop bounds:100 * 100 = 10 000But actually:100+...+1 = 5 050

    Limits statically allowed executions

    Might tighten the WCET estimate

    B: loop(j=i..100)

    C: a[i,j]=...

    end loop

    end loop

  • Worst-Case Execution Time analysis 2009-12-03

    7

    Object File

    Object File

    Obj t Fil

    Compiler

    Embedded SW Tool Chain

    C Source

    C Source

    C Library

    C Runtime

    OSC Source

    int twice(int a) {int temp;temp = 2 * a;return temp;

    }

    Affects timing

    Affects timingAffects

    timingAffects timingAffects

    timing

    Executable

    100011000001111010001100001000001010110000100000101011000001111010101100001000111010111100011001

    LinkerObject File

    twice:mov ip, spstmfd sp!, {fp,ip,lr,pc}sub fp, ip, #4sub sp, sp, #8str r0, [fp, #-16]ldr r3, [fp, #-16]mov r3, r3, asl #1str r3, [fp, #-20]ldr r3, [fp, #-20]mov r0, r3ldmea fp, {fp,sp,pc}

    Start-upC Source

    WCET

    Affects timing

    Affects timing

    The SW building toolsThe compiler:

    Translates an source code file to an object code fileOnly translates one source code file at the time

    Often makes some type of code optimizations Increase execution speed, reduce memory size, …Different optimizations give different object code layouts

    38

    Different optimizations give different object code layouts

    The linker:Combines several object code files into one executable

    Places code, global data, stack, etc in different memory partsResolves function calls and jumps between object files

    Can also perform some code transformations

    Both tools may affect the program timing!

    Example: compiling & linking/******************

    * File: main.c

    *****************/

    int foo();

    int main() {

    return 1 + foo();

    Contains object code for main.c

    Object code contains an unresolved call to foo

    Compiler main.o

    39

    ;

    }

    /******************

    * File: foo.c

    *****************/

    int foo() {

    return 1;

    } Contains object code for foo.c

    Compiler foo.o

    The main.o and foo.o object code files are combined

    The call to fooin main has

    been resolved

    Linker a.exe

    Common additional filesC Runtime code:

    Whatever needed but not supported by the HW32-bit arithmetic on a 16-bit machineFloating-point arithmetic Complex operations (e.g., modulo, variable-length shifts)

    C ith th il

    40

    Comes with the compiler May have a large footprint

    Bigger for simpler machinesTens of bytes of data and tens of kilobytes of code

    OS code: In many ES the OS code is linked together with the rest of the object code files to form a single binary image

    Common additional files

    Startup code: A small piece of assembly code that prepares the way for the execution of software written in a high-level language

    For example, setting up the system stackMany ES compilers provide a file named

    0 h ldi t t d

    41

    startup.asm, crt0.s, … holding startup code

    C Library code:A full ANSI-C compiler must provide code that implements all ANSI-C functionality

    E.g., functions such as printf, memmove, strcpyMany ES compilers only support subset of ANSI-CComes with the compiler (often non-standard)

    The mapping problemFlow analysis easier on source code level

    Semantics of code clearerEasier for programmer/tool to give flow information

    Low-level analysis requires binary codeThe code executed by the processorAll code parts are not available in source code formatp

    Compiler optimizations may change structureFor example, loops can be removed or added

    ...for(i=0; i 10)...

    else...

    }...

    …0111111010010111011001010010100110010111010100101001010100111010101001010101010000011010011010101001010101010101....

    Loopbound: 101

    Where is the loop?

    Source code Executable code

  • Worst-Case Execution Time analysis 2009-12-03

    8

    Low-level analysisanalysis

    Low-Level AnalysisDetermine execution time bounds for program parts

    Focus of most WCET-related researchUsing a model of the target HW

    Flow analysis

    Program

    The model does not need to model all HW detailsHowever, it should safely account for all possible HW timing effects

    Works on the binary, linked codeThe executable program

    Low levelanalysis

    Calculation

    WCETEstimate

    Some HW model detailsMuch effort required to safely model CPU internals

    Pipelines, branch predictors, superscalar, out-of-order, …Much effort to safely model memories

    Cache memories must be modelled in detailOther types of memories may also affect timingOther types of memories may also affect timing

    For complex CPUs many features must be analyzed together

    Timing of instructions get very history dependantDeveloping a safe HW timing model troublesome

    May take many months (or even years)All things affecting timing must be accounted for

    Hardware time variability Simpler 4-, 8- & 16-bit processors (H8300, 8051, …):

    Instructions might have varying execution time due to argument valuesVarying data access time due to different memory areasAnalysis rather simple, timing fetched from HW manual

    S & ( )Simpler 16- & 32-bit processors, with a (scalar) pipe-line and maybe a cache (ARM7, ARM9, V850E, …):

    Instruction timing dependent on previouslyexecuted instructions and accessed data:

    State of pipeline and cacheVarying access times due to cache hits and missesVarying pipeline overlap between instructionsHardware features can be analyzed in isolation

    Hardware time variabilityAdvanced 32- & 64-bit processors (PowerPC 7xx, Pentium, UltraSPARC, ARM11, …):

    Many performance enhancing features affect timingPipelines, out-of-order exec, branch pred., caches, speculative exec. I t ti ti i t hi t d d tInstruction timing gets very history dependent

    Some processors suffer from timing anomaliesE.g., a cache miss might give shorter overall program execution time than a cache hit

    Features and their timing interact Most features must be analyzed together

    Hard to create a correct and safe hardware timing model!

    The memory hierarchy

    Cache memory

    Caches store frequently used

    i t ti d d t

    Fasteraccess

    time

    CPUCaches increase

    average speed, but give more variable

    execution time

    The CPU execute instructions. It also need to access data to perform

    operations upon

    Larger storage capacity

    Main memory

    memoryinstructions and data (for faster access)

    Main memory has larger storage capacity but much longer access

    time than caches

    Many variants exists: instruction caches,

    data caches, unified caches,

    cache hierarchies, …

  • Worst-Case Execution Time analysis 2009-12-03

    9

    Example: Cache analysisfib:

    mov #1, r5mov #0, r6mov #2, r7br fib_0

    fib_1:

    What instructions will cause cache misses?

    Cache misses takes much more time than cache hits!

    Main memor

    Cache memory

    CPU

    49

    mov r5,r8add r6,r5mov r8,r6add #1,r7

    fib_0:cmp r7,r1bge fib_1

    fib_2:mov r5,r1jmp [r31]

    Performed on the object codeOnly direct-mapped instruction cache in this example

    Main memory

    Example: Cache analysis

    fib:mov #1, r5 2 1000mov #0, r6 2 1002mov #2, r7 2 1004br fib_0 2 1006

    fib_1:

    Starting address

    Size of instruction

    Information needed for i t ti

    50

    mov r5,r8 2 1008add r6,r5 2 1010mov r8,r6 2 1012add #1,r7 2 1014

    fib_0:cmp r7,r1 2 1016bge fib_1 2 1018

    fib_2: mov r5,r1 2 1020jmp [r31] 2 1022

    instruction cache analysis

    Example: Cache analysisfib:

    mov #1, r5 2 1000mov #0, r6 2 1002mov #2, r7 2 1004br fib_0 2 1006

    fib_1:

    51

    mov r5,r8 2 1008add r6,r5 2 1010mov r8,r6 2 1012add #1,r7 2 1014

    fib_0:cmp r7,r1 2 1016bge fib_1 2 1018

    fib_2: mov r5,r1 2 1020jmp [r31] 2 1022

    Mapping to instruction cache

    Example: Cache analysisfib:

    mov #1, r5mov #0, r6mov #2, r7br fib_0

    fib_1:

    misshithithit

    52

    mov r5,r8add r6,r5mov r8,r6add #1,r7

    fib_0:cmp r7,r1bge fib_1

    fib_2: mov r5,r1jmp [r31]

    misshit

    misshithithit

    Example: Cache analysisfib:

    mov #1, r5mov #0, r6mov #2, r7br fib_0

    fib_1:

    misshithithitFirst

    iteration

    53

    mov r5,r8add r6,r5mov r8,r6add #1,r7

    fib_0:cmp r7,r1bge fib_1

    fib_2: mov r5,r1jmp [r31]

    misshit

    misshithithit

    hithit

    hithithithit

    Remaining iterations

    of the loop

    hithit

    Example: CPU pipelinesObservation: Most instructions go through same stages in the CPUExample: Classic RISC 5-stage pipeline

    IF ID EX MEM WB

    54

    IF ID EX MEM WB

    Instruction fetch (IF)Get the next instruction

    from memory to process (its address is held by PC)

    Instruction decodeDetermine operation to be

    performed (i.e., extract opcode and arguments)

    ExecutePerform the actual

    operation (e.g, an add)

    Memory accessLoad/store values from/to memory if

    needed

    Write backWrite the result into the target register

  • Worst-Case Execution Time analysis 2009-12-03

    10

    CPU pipelinesIdea: Overlap the CPU stages of the instructions to achieve speed-upNo pipelining:

    Next instruction cannot start before

    IFIDEX

    1 2 3 5 6 74 8 9 10

    55

    cannot start before previous one has finished all its stages

    Pipelining:In principle: Speed-up = length of pipelineHowever, often dependenciesbetween instructions

    IFIDEX

    MEMWB

    1 2 3 5 64

    EXMEMWB

    Pipeline VariantsNone: Simple CPUs (68HC11, 8051, …)Scalar: Single pipeline (ARM7,ARM9,V850, …)VLIW: Multiple pipelines, static, compiler scheduled (DSPs, Itanium, Crusoe, …)

    56

    ( , , , )Superscalar: Multiple pipelines, out-of-order (PowerPC 7xx, Pentium, UltraSPARC, ...)

    IFIDEX

    MEMWB

    1 2 3 5 6 74 8 9 10 11 Blue instruction occupies EX stage for 2 extra cycles

    This stalls both red and green instructions

    Example: No Pipelinefoo(x,i):

    A: while(i < 100)

    B: if (x > 5) then

    C: x = x*2;

    else

    ((77 cyclescycles))((55 cc))(1(122 cc)) Constant time

    for each block

    57

    else

    D: x = x+2;

    end

    E: if (x < 0) then

    F: b[i] = a[i];

    end

    G: i = i+1;

    end

    (2 (2 cc))

    (4 (4 cc))(8 (8 cc))

    (2 (2 cc))

    in the code

    Object code not shown

    Example: No pipeline

    foo()

    A

    B

    ttAA=7=7

    ttBB=5=5

    foo(x,i):

    A: while(i < 100)

    B: if (x > 5) then

    C: x = x*2;

    else

    C D

    E

    F

    G

    end

    ttDD=2=2ttCC=12=12

    ttEE=4=4

    ttFF=8=8

    ttGG=2=2

    else

    D: x = x+2;

    end

    E: if (x < 0) then

    F: b[i] = a[i];

    end

    G: i = i+1;

    end

    foo(x,i):

    A: while(i < 100)

    B: if (x > 5) then

    C: x = x*2;

    else

    A

    B

    Example: Simple Pipeline

    ttA A = 7= 7

    B

    IF

    1 2 3 4 5ttB B = 5= 5

    IF

    EXEX

    M

    F

    1 2 3 4 5 6 7

    A IFEXEX

    M

    F

    1 2 3 4 5 6 7

    δδABAB = = --22

    else

    D: x = x+2;

    end

    E: if (x < 0) then

    F: b[i] = a[i];

    end

    G: i = i+1;

    end

    EXEX

    M

    F

    1 2 3 4 5 6 7 8 9

    IF

    EXEX

    M

    F

    10

    ttABAB = 10= 10

    δδAB AB = 10 = 10 -- (7 + 5) = (7 + 5) = --22

    Example: Pipeline result

    ttAA=7=7

    ttBB=5=5δδABAB==--22

    δδBCBC==--22 δδBDBD==--11

    δδGAGA==--11

    foo()

    A

    B

    foo(x,i):

    A: while(i < 100)

    B: if (x > 5) then

    C: x = x*2;

    else

    60

    ttDD=2=2ttCC=12=12

    ttEE=4=4

    ttFF=8=8

    ttGG=2=2

    BCBC BDBD

    δδDEDE==--22δδCECE==--11

    δδEFEF==--22

    δδFGFG==--11

    δδEGEG==--11

    C D

    E

    F

    G

    end

    else

    D: x = x+2;

    end

    E: if (x < 0) then

    F: b[i] = a[i];

    end

    G: i = i+1;

    end

  • Worst-Case Execution Time analysis 2009-12-03

    11

    IFEXMF

    Pipeline Interactions

    IFEXMFIF IF

    IFEXMF

    Pairwise overlap: speed-up

    61

    IFEXMF

    IFEXMF

    FIFEXMF

    IFEXMF

    F

    IFEXMF

    Interaction across more than two blocks also possible!Can be both speed-up or slow-down

    Cache & Pipeline analysis

    foo_0:

    foo_1:

    foo:

    info

    info

    Pipeline analysis might take cache analysis results as input

    Instructions gets annotated ith h hit/ i

    62

    1032: cmp r6,r1

    1034: blt foo_5foo_2:

    foo_3:

    foo_5:

    foo_4:

    info

    info

    info

    info

    info

    with cache hit/missThese misses/hits affect pipeline timing

    Complex HW requireintegrated cache & pipeline analysis

    1020:icache miss

    1022:icache hit

    Analysis of complex CPUsExample: Out-of-order pipelines

    Several instructions executes in parallel in unitsFunctional units often replicatedDynamic scheduling of instructionsDo not need to follow issuing order g

    Very difficult analysis problemTrack all possible pipeline states, iterate until fixed pointRequires integrated pipeline/icache/dcache/branch-prediction analysis

    Been done for PowerPC 755Up to 1000 states per instruction!

    Low-level analysis correctness?

    Abstract model of the hardware is usedModern hardware often very complex

    Combines many features Pipelining, caches, branch prediction, out-of-order

    64

    out-of-order...Have all effects been accounted for?

    Manufactures keep hardware internals secretBugs in hardware manuals Bugs relative hardware specifications

    Calculation

    CalculationDerive an upper bound on the program’s WCET

    Given flow and timing informationSeveral approaches used:

    Tree based

    Flow analysis

    Program

    Tree-basedPath-basedConstraint-based (IPET)

    Properties of approaches:Flow information handledObject code structure allowedModeling of hardware timingSolution complexity

    Low levelanalysis

    Estimatecalculation

    WCETEstimate

  • Worst-Case Execution Time analysis 2009-12-03

    12

    Example: Combined flow analysis and low-level analysis result

    foo(x,i):

    A: while(i < 100)

    B: if (x > 5) then

    C: x = x*2;

    else

    foo()

    A

    B

    ttAA=7=7

    ttBB=5=5

    ”Loop bound: 100”

    D: x = x+2;

    end

    E: if (x < 0) then

    F: b[i] = a[i];

    end

    G: i = i+1;

    end

    C D

    E

    F

    G

    end

    ttDD=2=2ttCC=12=12

    ttEE=4=4

    ttFF=8=8

    ttGG=2=2

    ”C and F can’t be taken together”

    Path-Based Calcfoo()

    B

    tA=7

    tB=5

    A

    foo(x,i):

    A: while(i < 100)

    B: if (x > 5) then

    C: x = x*2;

    else

    D: x = x+2;

    end

    E: if (x < 0) then

    F: b[i] = a[i];

    end

    G: i = i+1;

    end

    C D

    E

    F

    end

    tC=12

    tG=2

    Find longest pathOne loop at a time

    Prepare the loopRemove back edgesRedirect to special continue nodes

    continue

    G

    tD=2

    tF=8

    end

    tE=4

    Path-Based Calculationfoo()

    B

    tA=7

    tB=5

    Longest path:A-B-C-E-F-G7+5+12+4+8+2= 38 cycles

    A

    C D

    E

    F

    end

    tC=12

    tE=4

    tG=2

    Total time:100 iterations38 cycles per iterationTotal: 3800 cycles

    continue

    G

    tD=2

    tF=8

    Path-Based Calcfoo(x,i):

    A: while(i < 100)

    B: if (x > 5) then

    C: x = x*2;

    else

    D: x = x+2;

    end

    E: if (x < 0) then

    F: b[i] = a[i];

    end

    G: i = i+1;

    end

    C and F can never execute

    together

    foo()

    B

    tA=7

    tB=5

    A

    Infeasible path:A-B-C-E-F-GIgnore, look for next

    end

    C D

    E

    F

    end

    tC=12

    tE=4

    tG=2

    continue

    G

    tD=2

    tF=8

    Path-Based Calcfoo()

    B

    tA=7

    tB=5

    A

    foo(x,i):

    A: while(i < 100)

    B: if (x > 5) then

    C: x = x*2;

    else

    D: x = x+2;

    end

    E: if (x < 0) then

    F: b[i] = a[i];

    end

    G: i = i+1;

    end

    C and F can never execute

    together

    C D

    E

    F

    end

    tC=12

    tE=4

    tG=2

    Infeasible path:A-B-C-E-F-GIgnore, look for next

    New longest path:A-B-C-E-G30 cycles

    Total time:Total: 3000 cycles

    continue

    G

    tD=2

    tF=8

    end

    IPET = Implicit path enumeration technique

    Execution paths not explicitly represented

    foo()

    A

    B

    ttAA=7=7

    tt 22

    ttBB=5=5

    tt 1212

    Example: IPET Calculation

    XXGAGA

    XXABAB

    XXBCBC XXBDBD

    XXfooAfooA

    XXAA

    XXBB

    XXfoofoo

    p y p

    Program model:Nodes and edges Timing info (ttentityentity)

    Node times: basic blocks Edge times: overlap

    Execution count (xxentityentity)

    C D

    E

    F

    G

    end

    ttDD=2=2ttCC=12=12

    ttEE=4=4

    ttFF=8=8

    ttGG=2=2

    BDBD

    XXDEDE

    XXEGEG

    XXCECE

    XXEFEF

    XXFGFG

    XXCC XXDD

    XXEE

    XXFF

    XXGGXXendend

  • Worst-Case Execution Time analysis 2009-12-03

    13

    WCET=max Σ(xxentityentity * ttentityentity)

    Where each xxentityentitysatisfies constraints

    foo()

    A

    B

    IPET Calculation

    XXAA

    XXBB

    XXfoofoo=1=1

    XXendend=1=1XXGAGA

    XXABAB

    XXBCBC XXBDBD

    XXfooAfooA

    XXABAB+X+XAendAend=X=XAA

    XXAA=X=XfooAfooA+X+XGAGA

    XXAA

    XXBB

    XXfoofoo

    satisfies constraints

    Constraints:Start & end conditionProgram structureLoop boundsOther flow information

    C D

    E

    F

    G

    end

    XXCC XXDD

    XXEE

    XXFF

    XXGG

    BDBD

    XXDEDE

    XXEGEG

    XXCECE

    XXEFEF

    XXFGFG

    ABAB AendAend AA

    XXEE=X=XCECE+X+XDEDE

    XXBCBC+X+XBDBD=X=XBB

    XXAA

  • Worst-Case Execution Time analysis 2009-12-03

    14

    WCET analysis y

    tools

    WCET Analysis ToolsSeveral more or less complete toolsCommercial tools:

    aiT from AbsIntBound-T from TidoRumRapiTime from

    80

    pRapita Systems

    Research tools:SWEET – Swedish Execution Time tool Heptane from IrisaFlorida state universitySymTA/P from

    TU Braunschweig

    WCET tool differencesUsed static and/or hybrid methodsUser interface

    Graphical and/or textualFlow analysis performed

    Manual annotations supported

    81

    ppHow the mapping problem is solved

    Decoding binariesIntegrated with compiler

    Supported processors and compilersLow-level analysis performed

    Type of hardware features handledCalculation method used

    Supported CPUs (2008)Tool Hardware platformsaiT Motorola PowerPC MPC 555, 565, and 755, Motorola ColdFire MCF

    5307, ARM7 TDMI, HCS12/STAR12, TMS320C33, C166/ST10, Renesas M32C/85, Infineon TriCore 1.3

    Bound-T Intel-8051, ADSP-21020, ATMEL ERC32, Renesas H8/300, ATMEL AVR and ATmega ARM7

    82

    ATMEL AVR and ATmega, ARM7RapiTime Motorola PowerPC family, HCS12 family, ARM, NECV850, MIPS3000SWEET ARM9, NECV850EHeptane Pentium1, StrongARM 1110, Renesas H8/300Vienna M68000, M68360, Infineon C167, PowerPC, PentiumFlorida MicroSPARC I, Intel Pentium, StarCore SC100, Atmel Atmega,

    PISA/MIPSChalmers PowerPC

    Industrial usageStatic/hybrid WCET analysis are today used in real industrial settingsExamples of industrial usage:

    Avionics – Airbus, aiTAutomotive – Ford, aiT

    83

    Automotive Ford, aiTAvionics – BAE Systems, RapiTimeAutomotive – BMW, RapiTimeSpace systems – SSF, Bound-T

    However, most companies are still highlyunaware of the concepts of “WCET analysis” and/or “schedulability analysis”

    The SWEET approach to

    WCET l iWCET analysis

  • Worst-Case Execution Time analysis 2009-12-03

    15

    The MDH WCET projectResearching on static WCET analysis

    Developing the SWEET (SWEdish Execution Time) analysis tool

    Research focus:Flow analysisTechnology transfer to industryTechnology transfer to industryInternational collaborationParametrical WCET analysis*Early stage WCET analysis*WCET analysis for multi-core*

    Previous research focus:Low-level analysisCalculation

    * = new project activities

    Project membersProfessorBjörnLisperDocentAndreas

    LecturerChristerSandberg

    PhD studentM lAndreas

    ErmedahlDocentJan Gustafsson

    MarceloSantos

    PhD studentStefanBygde

    86

    + 2 post-docs, 1 PhD student, and 2 programmers

    Technology transfer to industry (and academia)

    Evaluation of WCET analysis in industrial settings Targeting both WCET tool providers and industrial usersUsing state-of-the-art WCET analysis tools

    Applied as MSc thesis works:Enea OSE, using SWEET & aiT Volcano Communications, using aiTBound-T adaption to Lego Mindstorms and Renesas H8/300. Used in MDH RT courses CC-Systems, using aiT & measurement toolsVolvo CE using aiT & SWEET ….

    Articles and MSc thesis reports available on the MRTC web

    Available MSc thesis work

    “Creating open-source embedded real-time systems benchmarks in cooperation with companies”

    Work closely with CC-System company and (if time allows) other companies Result of high importance for RT & WCET research communities (and industries)

    http://www.idt.mdh.se/examensarbete/index.php?choice=show&id=0904Or email: [email protected]

    88

    Flow analysisMain focus of the MDH WCET analysis group

    Motivated by our industrial case studiesWe perform many types of advanced program analyses:

    Program slicing (dependency analysis)x > 5

    x = 1..10

    x = 1 4

    89

    Value analysis (abstract interpretation)Abstract execution...

    Both loop bounds and infeasible paths are derivedAnalysis made on ALF intermediate code

    ~ “high level assembler”

    A

    C

    x > 5

    B

    x < 3

    D

    EPath A-C is infeasible!

    x = 6..10

    x = 1..4

    x = 1..2x = 3..8

    Where SWEET comes in…

    C Source Object File

    Object File

    Compiler

    C Source

    Compiler

    Linker

    Object File C Library

    C Runtime

    OS

    Object File

    Hardware WCETLow-levelanalysis Calculation

    SWEET

    C Source Object File

    WCETEstimate

    Flow analysis

    Input valueconstraints

    ALFLinker

    Executable

    Other Lib

    LOW-SWEET

    ALF

    Object File

    ExecutableBinary reader

  • Worst-Case Execution Time analysis 2009-12-03

    16

    Slicing for flow analysisObservation: some variables and statements do not affect the execution flow of the program= they will never be used to determine the outcome of conditions

    Idea: remove variables and statements which are guaranteed to not affect execution flow

    Subsequent flow analyses should provide same resultSubsequent flow analyses should provide same resultbut with shorter analysis time

    Based on well-known program slicing techniquesReduces up to 94% of total program size for some of our benchmarks

    1. a[0] = 42;2. i = 1;3. j = 5;4. n = 2 * j;5. while (i

  • Worst-Case Execution Time analysis 2009-12-03

    17

    Upcoming challengesfor WCETfor WCET analysis

    Trends in Embedded SW

    Traditionally: wmbedded SW written in C and assembler, close to hardware Trend: size of embedded SW increases

    SW now clearly dominates ES development costHardware used to dominate

    Trend: more ES development by high-level programming languages and tools

    Object-oriented programming languagesModel-based toolsComponent-based tools

    Increase in embedded SW sizeMore and more functionality required

    Most easily realized in software

    Software gets more and more complexHarder to identify the timing critical part of the codeSource code not always available for all parts of the system, e.g. for SW developed by subcontractors

    Challenges for WCET analysis:Scaling of WCET analysis methods to larger code sizes

    Better visualization of results (where is the time spent?)Better adaptation to the SW development process

    Today’s WCET analysis works on the final executableChallenge: how to provide reasonable precise WCET estimates at early development stages

    Higher-level prog. languages

    Typically object-oriented: C++, Java, C#, …Challenges for WCET analysis:

    Higher use of dynamic data structuresIn traditional ES programming all data is statically

    ll t d d i il tiallocated during compile time Dynamic code, e.g., calls to virtual methods

    Hard to analyze statically (actual method called may not be known until run-time)

    Dynamic middleware: Run-time system with GCVirtual machines with JIT compilation

    Model-based designMore embedded system code generated by higher-level modeling and design tools

    Esterel, Ascet, Targetlink, Scade, ...

    The resulting code structure depends on the code generator

    model

    p gOften simpler than handwritten code

    Possible to integrate such toolswith WCET analysis tools

    The analysis can be automatedLess user interaction requiredE.g., loop bounds can be provided directly by the modeling tool

    ...label rerun: if(flag1 || flag2) ...else

    goto rerun;...

    generatedcode

    ….´10010101001110101100101001100101010011101011001010011010010101010100101001010010010101010101010100101010....

    executable

    Component-based designVery trendy within software engineeringGeneral idea:

    Package software into reusable componentsBuild systems out of prefabricatedBuild systems out of prefabricated components, which are “glued together”

    WCET analysis challenges:How to reuse WCET analysis results when some settings have changed? How to analyze SW components when not all information is available? Are WCET analysis results composable?

  • Worst-Case Execution Time analysis 2009-12-03

    18

    Compiler interactionToday – commercial WCET analysis tools analyses binariesAnother possibility – interaction with the compiler

    Easier to identify data objects and to understand what the program is intended to do

    There exists many compilers for embedded systems

    Very fragmented market Each specialized on a few particular targetsTargeting code size and execution speed

    Integration with WCET analysis tools opens new possibilities:

    Compile for timing predictabilityCompile for small WCET

    Trends in Embedded HWTrend: Large variety of ES HW platforms

    Not just one main processor type as for PCs Many different HW configurations (memories, devices, …) Challenge: How to make WCET analysis portable between platforms?

    Trend: Increasingly complex HW features to boost performance

    Taken from the high-performance CPUsPipelines, caches, branch predictors, superscalar, out-of-order, …Challenge: How to create safe and tight HW timing models?

    Trend: Multi-core architectures

    Multi-core architecturesSeveral (simple) CPUs on one chip

    Increased performance & lower power “SoC”: System-on-a-Chip

    Explicit parallelismNot hidden as in superscalar architectures

    Likely that CPUs will be less complex th t hi h d

    CPU

    L1 cache

    CPU

    L1 cache

    CPU

    L1 cache

    than current high-end processors Good for WCET analysis!

    However, risk for more shared resources: buses, memories, …

    Bad for WCET analysis!Unrelated threads on other cores might use shared resources

    Multi-core ok if predictable sharing of common resources is enforced

    Multicore chip

    L2 cache

    RAM

    Devices

    etc.Network

    Timer Serial

    The End!

    106

    For more information:www.mrtc.mdh.se/projects/wcet

    WCET tool demotool demo