QEMU/SystemC Cosimulation at …adt.cs.upb.de/quf/quf11/quf2011_04.pdf

QEMU/SystemC Cosimulation at Different Abstraction Levels

1st International QEMU Users Forum (QUF’11), March 18th, 2011

Markus Becker, Henning Zabel, Wolfgang Müller
University of Paderborn/C-LAB
Cooperative Computing & Communication Laboratory


Page 1: QEMU/SystemC Cosimulation at Different Abstraction Levels

Page 2: Today’s Embedded Software Complexity

Highly complex platforms
• Multi-core with pipelines & branch prediction
• Shared memories & hierarchical caches
• Buses & networks-on-chip

Modern real-time operating systems & compilers
• Preemptive multitasking
• Virtual memory
• Code optimization

Early virtual platforms
• Software development
• Performance estimation
• Real-time verification

© 2011 Siemens AG and Universität Paderborn, QUF’11 / M. Becker

Page 3: System Level & Transaction Level Methodology

[Figure: abstraction-level graph with the axes Computation and Communication, each graded untimed, approx. timed, and cycle timed. Grid rows as extracted, top to bottom: cycle timed: D, F; approx. timed: C, E; untimed: A, B.]

A. Specification
B. Component assembly
C. Bus arbitration
D. Bus functional
E. Cycle accurate computation
F. Implementation

Page 4: System Level RTOS Modeling: State of the Art

HW/SW cosimulation: HDL and cycle-accurate ISS
• Cycle-accurate timing
• Infeasible for early investigations of complex systems

Abstract system level RTOS models in SystemC
• Native speed
• Sufficient timing accuracy
• All source code must be available!

Advanced emulation (virtual prototypes)
• Efficient target binary execution
• Instruction-accurate

Page 5: Outline

• SystemC RTOS Modeling
• QEMU/SystemC Cosimulation Environment
• QEMU Cycle-Approximate Time Estimation

Page 6: SystemC Abstract RTOS Modeling

Instruction accurate software, executed on an Instruction Set Simulator (ISS)
• Application tasks
• Actual RTOS kernel
• Device drivers & comm. stacks

[Figure: application tasks T1, T2, ..., Tn on top of an RTOS, fed as instructions into the ISS. Sample instruction listing shown on the slide:]

    movw r22, r28
    movw r24, r18
    call 0xa2 <mod>
    movw r18, r28
    sbiw r24, 0x00
    brne .-16
    movw r18, r28

RTOS abstraction

Application
• Native functional segments
• With time annotations

RTOS model provides
• Scheduling policies
• Context switching
• Canonic API/Standard APIs
• Resource synchronization

[Figure: tasks T1, T2, ..., Tn as time annotated segments, scheduled by an abstract RTOS model (SystemC/SpecC).]

Page 7: SystemC Abstract RTOS Modeling (cont’d)

Tasks and Interrupt Service Routines (ISR)
• Modeled/wrapped by SystemC threads

Derive SystemC modules from RTOS Module
• Base class to provide RTOS modeling capabilities

Module class provides primitives for
• Synchronization of functional segments
• Time annotation: consume(t)

Context class synchronizes local task/ISR time with global SystemC time

Context class corresponds to a simulated CPU

[Figure: hierarchy from SystemC threads (wrapping RTOS tasks/ISRs) via RTOS modules to the RTOS context (ISR scheduler, task scheduler) and the CPU; the relations are labeled 1:n, 1:n, and 1:1.]
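The relationship between consume(t), local task time, and global time can be illustrated without SystemC. The following is a minimal sketch, assuming a hypothetical `RtosContext` class that uses a plain integer clock in place of the SystemC kernel; apart from `consume`, none of the names come from the slides, and preemption by ISRs is left out.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the RTOS context: one simulated CPU whose
// global time advances only when a task consumes annotated time.
class RtosContext {
public:
    // Charge `cycles` of annotated execution time to `task` and advance
    // the global clock, since one CPU runs one task at a time.
    void consume(const std::string& task, std::uint64_t cycles) {
        local_time_[task] += cycles;
        global_time_ += cycles;
    }

    // Time consumed by a single task so far (0 if it never ran).
    std::uint64_t local_time(const std::string& task) const {
        auto it = local_time_.find(task);
        return it == local_time_.end() ? 0 : it->second;
    }

    // Sum of all consumed time on this simulated CPU.
    std::uint64_t global_time() const { return global_time_; }

private:
    std::map<std::string, std::uint64_t> local_time_;
    std::uint64_t global_time_ = 0;
};
```

In a real SystemC model the global clock would be advanced by an interruptible wait on the kernel rather than by a counter; the counter only makes the bookkeeping visible.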

Page 8: QEMU Emulator

QEMU open source emulator
• Dynamic binary translation based CPU emulation
• PowerPC, ARM, MIPS, etc.

Full system emulation
• Complete target software stack
• OS and device drivers
• Memory Management Unit (MMU)
• I/O & peripherals

User mode emulation
• Target-compiled application task
• Unprivileged CPU instructions only
• Trap system calls

[Figure: full system emulation stacks Application, OS & Drivers, and CPU, Memory & I/O inside a host process; user mode emulation runs an application task (user mode) on a user mode CPU inside a host process.]

Page 9: QEMU/SystemC Cosimulation Environment

[Figure: two RTOS contexts, each with a task scheduler and attached memories & I/O. Labeled parts: QEMU task wrapper (SC_THREAD), native task wrapper (SC_THREAD), and several Task.elf binaries.]

Page 10: QEMU/SystemC Cosimulation Environment

[Figure: the QEMU task wrapper (SC_THREAD) in detail, split into a QEMU side and a TLM side. Labeled parts: QEMU user mode emulator running Task.elf, syscall translator, TLM transactor, and exec time estimator. Labeled calls and values: execute(), syscall(), ops, delay, consume(). The RTOS contexts, native task wrapper, task scheduler, and memories & I/O are shown as on the previous slide.]

    void QEMUTaskWrapper::Thread() {
        do {
            wait();  // For task activation
            while (!END_OF_TASK) {
                switch (QEMU->execute()) {
                case SYSCALL_EXCEPTION:
                    // Charge the estimated execution time to the RTOS model first,
                    // then hand the trapped system call to the translator.
                    RTOS->consume(ESTIMATOR->delay);
                    TRANSLATOR->syscall(&QEMU->env);
                    break;
                ...
                }
            }
        } while (true);
    }

Page 11: QEMU/SystemC Multilevel Cosimulation

[Figure: three cosimulation levels along the RTOS software refinement axis.]
• Native Task Level: SW tasks on an RTOS model
• Binary Task Level (QEMU in user mode): user emulation (User Emu.) on an RTOS model
• CPU Level (QEMU in system mode): drivers, comm. stacks, and RTOS kernel on CPU emulation (CPU Emu.)

Further labels in the figure: I/O, CPU, Network.

Page 12: Simulation Overhead

Example application
• PowerPC 405
• ORCOS real-time operating system
• Two RTOS tasks synchronized via kernel signals

    Cosim. Level         Sim. Time   Notes
    Native Task Level    5.6 s
    Mixed Task Set       7.6 s
    Binary Task Level    9.2 s       QEMU in user mode
    CPU Level            51.6 s      QEMU in full system mode
    CPU Level Cosim      1472.2 s    QEMU in full system mode

[Figure: four configurations ranging from native simulation to target emulation: Task1 and Task2 on an RTOS model (three variants) and Task1 and Task2 on the RTOS kernel.]
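The simulation times on the Simulation Overhead slide are easier to compare as slowdown factors relative to the native task level. The sketch below only performs that division on the reported numbers; the function names are illustrative and not part of the presented tool.

```cpp
#include <string>
#include <utility>
#include <vector>

// Simulation times in seconds as reported on the Simulation Overhead slide.
inline std::vector<std::pair<std::string, double>> simulation_times() {
    return {
        {"Native Task Level", 5.6},
        {"Mixed Task Set", 7.6},
        {"Binary Task Level", 9.2},
        {"CPU Level", 51.6},
        {"CPU Level Cosim", 1472.2},
    };
}

// Slowdown of each level relative to the first (native) entry.
inline std::vector<double> slowdown_vs_native() {
    const auto times = simulation_times();
    std::vector<double> factors;
    for (const auto& entry : times) {
        factors.push_back(entry.second / times.front().second);
    }
    return factors;
}
```

The resulting factors are roughly 1.4 for the mixed task set, 1.6 for the binary task level, 9.2 for the CPU level, and 263 for the CPU level cosimulation.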

Page 13: Time Annotated Basic Blocks

Static WCET/BCET annotation error: $T_{error} \le T_{max\_dynamic} / 2$

[Figure: timeline of a basic block (BB) with the labels BB Real Delay, Basic Block Delay, $T_{static}$, $T_{max\_dynamic}$, BB BCET, and BB WCET.]
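One way to read the bound $T_{error} \le T_{max\_dynamic}/2$ is sketched below. It rests on two assumptions that the slide does not state explicitly: the best case equals the static delay and the worst case adds the full dynamic part, and the static annotation is placed at the midpoint of that interval. The symbols $T_{BCET}$, $T_{WCET}$, $T_{annotated}$, and $T_{real}$ are introduced here for illustration.

```latex
\begin{align*}
T_{BCET} &= T_{static}, \qquad T_{WCET} = T_{static} + T_{max\_dynamic} \\
T_{annotated} &= \frac{T_{BCET} + T_{WCET}}{2} = T_{static} + \frac{T_{max\_dynamic}}{2} \\
T_{error} &= \lvert T_{real} - T_{annotated} \rvert
  \le \frac{T_{WCET} - T_{BCET}}{2} = \frac{T_{max\_dynamic}}{2}
  \quad \text{for } T_{BCET} \le T_{real} \le T_{WCET}
\end{align*}
```

Annotating with the BCET or the WCET alone would instead allow an error of up to the full $T_{max\_dynamic}$.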

Page 14: Basic Block Delay Estimation (QEMU)

Cycle-approximate delay estimation
• Integrates with QEMU’s dynamic binary translator
• No explicit micro architecture CPU model

Two-phase approach
• Basic block translation (static analysis)
  • Accumulate static instruction delays
  • Annotate translated blocks with delay accumulation
  • Instrument translated blocks with dynamic estimation code
• Translated block execution (dynamic estimation)
  • Execute instrumented translated blocks
  • Accumulate dynamic delays

$T_{TotalAnnotation} = T_{StaticAnalysis} + T_{DynamicEstimation}$
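The two phases can be sketched in a few lines. This is a simplified illustration and not the QEMU extension itself: the types `Instr` and `AnnotatedBlock`, the notion of a run-time "miss" with a fixed penalty, and both function names are assumptions made for the example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical instruction record: a fixed base latency plus a flag for
// instructions whose extra delay is only known at run time.
struct Instr {
    std::uint32_t static_cycles;
    bool needs_dynamic_check;
};

// Hypothetical translated block, annotated during translation.
struct AnnotatedBlock {
    std::uint64_t static_delay = 0;   // phase 1: accumulated static delays
    std::uint32_t dynamic_sites = 0;  // phase 1: instrumentation points
};

// Phase 1 (static analysis): walk the basic block once while translating.
inline AnnotatedBlock translate(const std::vector<Instr>& basic_block) {
    AnnotatedBlock tb;
    for (const Instr& in : basic_block) {
        tb.static_delay += in.static_cycles;
        if (in.needs_dynamic_check) ++tb.dynamic_sites;
    }
    return tb;
}

// Phase 2 (dynamic estimation): each execution yields the static delay plus
// a penalty per instrumented site that misses, capped at the site count.
inline std::uint64_t execute(const AnnotatedBlock& tb,
                             std::uint32_t misses,
                             std::uint32_t penalty_cycles) {
    const std::uint32_t counted =
        misses < tb.dynamic_sites ? misses : tb.dynamic_sites;
    return tb.static_delay +
           static_cast<std::uint64_t>(counted) * penalty_cycles;
}
```

The static part is computed once per translated block, so repeated executions only pay for the dynamic term, which mirrors the sum $T_{StaticAnalysis} + T_{DynamicEstimation}$.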

Page 15: Execution Time Estimation Accuracy

[Figure: chart of execution time in CPU KCycles (axis from 0 to 1.600.000) with the series BB-BCET, Estimation (QEMU), Real (Logic Analyzer), and BB-WCET.]

Page 16: Execution Time Estimation Accuracy (cont’d)

[Figure: chart of the deviation (%) of estimation vs. real, axis from -15 to 10.]

Page 17: Execution Time Estimation Overhead

[Figure: chart of simulation time (s), axis from 0 to 4, with the series Untimed QEMU @ Intel P4 3 GHz, QEMU w/ Time @ Intel P4 3 GHz, and PowerPC Board @ 300 MHz.]

Page 18: RTOS Modeling and TLM Methodology?

Page 19: Synchronization Scheme

Data dependency awareness (causality-true)
• System calls
• Shared variables
• I/O

Dynamic software segment
• Comprises TB execution between consecutive interaction points

Interaction points
• Consume accumulated execution time (interruptible wait-statement)

TB = Translated Block

[Figure: nested time granularities with the labels Cycle, Instruction, Basic block (tBB), TB (tTB), Dynamic software segment, and Task (tTask).]
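The idea of a dynamic software segment can be made concrete with a small accumulator: delays of executed translated blocks are summed locally and only handed over at an interaction point. The class `SegmentTimer` and its method names are hypothetical; in the presented environment the returned value would be passed to an interruptible wait such as consume().

```cpp
#include <cstdint>
#include <vector>

// Hypothetical tracker for one task: TB delays accumulate locally and are
// released only at interaction points.
class SegmentTimer {
public:
    // Called after each translated block finishes executing.
    void on_tb_executed(std::uint64_t tb_cycles) { pending_ += tb_cycles; }

    // Called at a system call, shared-variable access, or I/O access.
    // Returns the length of the dynamic software segment that just ended
    // and resets the accumulator for the next segment.
    std::uint64_t on_interaction_point() {
        const std::uint64_t segment = pending_;
        pending_ = 0;
        consumed_.push_back(segment);
        return segment;
    }

    // All segment lengths released so far, in order.
    const std::vector<std::uint64_t>& consumed_segments() const {
        return consumed_;
    }

private:
    std::uint64_t pending_ = 0;
    std::vector<std::uint64_t> consumed_;
};
```

Synchronizing only at interaction points keeps the number of kernel context switches low while preserving the order of data-dependent events.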

Page 20: Conclusion

Abstract RTOS simulation and QEMU emulation
• Early performance estimation
• Fast RTOS verification

Fast simulation
• Speed up through dynamic binary translation w.r.t. interpretive ISS
• Speed up through native RTOS kernel & driver abstraction
• Fast cycle-approximate time accuracy through QEMU extension
• Flexibility by means of mixing native and binary task levels

Open issues & future work
• RTOS modeling and TLM methodology
• Efficient cache & multicore modeling

Page 21: Research Outlook (1/2)

• Motivation
• Transaction-level models
• RTOS-aware refinement flow
• Conclusion
• Research outlook

Page 22: Dynamic Binary Translation (DBT)

Target instruction set emulation through host code
• Static pre-compilation of functionally equivalent host code snippets
• Dynamic translation of linear Basic Blocks (BB) at runtime
  • Concatenate code snippets until branch instruction
  • Store Translated Blocks (TB) in translation cache
• Main loop
  • Translate BB if program counter (PC) value is unknown
  • Otherwise, chain TBs directly from cache

[Figure: DBT flowchart with the elements Known PC?, Fetch, Decode, Branch?, TB Generation, TB Cache, Host code snippets, and Execute.]

[Adapted from: M. Gligor et al. - Using binary translation in event driven simulation for fast and flexible MPSoC simulation, CODES+ISSS’09, Grenoble, France]
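The main loop described on the DBT slide can be sketched as a cache keyed by program counter. This is a toy model under stated assumptions, not QEMU code: `TargetInstr`, `Tb`, and `Translator` are hypothetical names, the "program" is a vector of instructions indexed by PC, and TB chaining is simplified to a cache lookup instead of patching jumps between blocks.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical target instruction: either falls through or branches.
struct TargetInstr {
    bool is_branch;
    std::uint32_t branch_target;  // used only when is_branch is true
};

// Hypothetical translated block covering [start_pc, start_pc + length).
struct Tb {
    std::uint32_t start_pc;
    std::uint32_t length;   // number of target instructions, branch included
    std::uint32_t next_pc;  // where execution continues afterwards
};

class Translator {
public:
    explicit Translator(std::vector<TargetInstr> program)
        : program_(std::move(program)) {}

    // Return the cached TB for pc, translating the basic block on a miss.
    const Tb& lookup(std::uint32_t pc) {
        auto it = cache_.find(pc);
        if (it != cache_.end()) return it->second;  // known PC: reuse TB
        ++translations_;
        std::uint32_t end = pc;
        // Concatenate instructions until the terminating branch.
        while (end < program_.size() && !program_[end].is_branch) ++end;
        const bool ends_in_branch = end < program_.size();
        Tb tb{pc, (ends_in_branch ? end + 1 : end) - pc,
              ends_in_branch ? program_[end].branch_target
                             : static_cast<std::uint32_t>(program_.size())};
        return cache_.emplace(pc, tb).first->second;
    }

    // Run up to `blocks` TBs starting at pc; returns the final PC.
    std::uint32_t run(std::uint32_t pc, int blocks) {
        for (int i = 0; i < blocks && pc < program_.size(); ++i) {
            pc = lookup(pc).next_pc;
        }
        return pc;
    }

    // Number of basic blocks translated so far (cache misses).
    int translations() const { return translations_; }

private:
    std::vector<TargetInstr> program_;
    std::unordered_map<std::uint32_t, Tb> cache_;
    int translations_ = 0;
};
```

For a three-instruction loop that branches back to its start, running several iterations triggers a single translation; every later iteration is served from the cache, which is where the speed-up over an interpretive ISS comes from.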

Page 23: QEMU/SystemC Co-Simulation Levels (cont’d)

Fully native RTOS model in SystemC
• Early and fast verification through native simulation

Mixed native/emulated user space
• Flexibility in case of limited source code availability

User space emulation
• RTOS kernel & device driver abstraction
• Abstracts from register accurate I/O

Co-simulation of full system emulator and SystemC
• Verification of actual RTOS and device drivers
• Final target firmware verification

[Figure: four configurations ranging from native simulation to target emulation: Task1 and Task2 on an RTOS model (three variants) and Task1 and Task2 on the RTOS kernel.]

Page 24: SystemC

System Level Design Language (IEEE standard)

C++ class and macro library
• Modules
• Ports
• Interfaces
• Channels

Cooperative event-based simulation kernel
• Process types: SC_METHOD, SC_THREAD
• wait() for event or time

Abstraction levels
• Register Transfer Level
• Transaction Level Modeling (TLM) support