Source: users.ece.utexas.edu/~gerstl/ee382n_f15/notes/lecture8.pdf · 2015-10-12

EE382N: Embedded Sys Dsgn/Modeling Lecture 8

© 2015 A. Gerstlauer 1

EE382N: Embedded System Design and Modeling

Andreas Gerstlauer
Electrical and Computer Engineering
University of Texas at Austin

Lecture 8 – Computation Modeling & Refinement

Lecture 8: Outline

• Processor layers
  • Application
  • Task/OS
  • Firmware
  • Hardware
• Processor synthesis
  • Software synthesis
  • Hardware synthesis

General Processor Micro-Architecture

• Basic computation component is a processor (PE)

• Programmable, general-purpose software processor (CPU)

• Programmable special-purpose processor (e.g. DSPs)

• Application-specific instruction set processor (ASIP)

• Custom hardware processor

Functionality and timing (and power and …)

[Figure: PE with controller and datapath, connected by control signals and status lines; bus interface; clock (CLK, ∆t)]

Computation Modeling (1)

• Structural RTL models

Sub-cycle accurate

[Figure: structural RTL of a software processor (CPU: controller with PC, IR, fetch and decode; datapath with register file, ALU, load/store unit and data & program memory; CLK) next to a hardware processor (HW: controller with state, next-state logic and output logic; datapath with register file, FU1, memory and bus interface; CLK)]

Computation Modeling (2)

• Behavioral RTL models (FSMD)
• Instruction-set simulation (ISS) models
  • Purely functional (binary translation) [QEMU, …]
  • Micro-architectural (RTL in C) [GEM5, …]

Cycle or timing accurate

[Figure: instruction-set simulation (ISS) of a CPU on CPU_CLK executing the binary (App., RTOS, HAL), next to an FSMD model of the HW on HW_CLK]

Computation Modeling (3)

• Host-compiled models
  • Source-level application model
    – Compile & execute natively
    – Fast functional simulation
  • Back-annotate timing and other metrics
  • Abstract OS and processor models
• Transaction-level model (TLM) backplane
  • C-based discrete-event simulation kernel [SpecC, SystemC]

Fast and accurate full-system simulation

Host-Compiled Computation Layers

• Application
  • Process execution (C code)
  • Execution timing
• OS & processor
  • Operating system
    – Real-time multi-tasking (RTOS model)
    – Bus drivers (C code)
  • Hardware abstraction layer (HAL)
    – Interrupt handlers
    – Media accesses
  • Processor hardware
    – Bus interfaces (I/O state machines)
    – Interrupt suspension and timing

[Figure: CPU model with processes P1 and P2 on top of the OS, drivers (Drv), HAL with ISR, interrupts and bus interface]

Process B1() {
  …
  waitfor(15000);
  …
  waitfor(25000);
  …
};

Application Layer

• High-level, abstract programming model
  • Hierarchical process graph
    – ANSI C leaf processes
    – Parallel-serial composition
  • Abstract, typed inter-process communication
    – Channels
    – Shared variables

Timed simulation of application functionality
  • Annotate timing, energy, …
    – Granularity?
    – Compiler optimizations?
    – Dynamic architecture effects?
  Source profiling [SCE], back-annotate from ISS, predict from host activity

[Figure: CPU with processes B1, B2, B3 and channels C1, C2 over a logical-time axis; leaf process code annotated with waitfor()]

...
void f() {
  waitfor(5);
  ...
}
...
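
The timed simulation idea above can be sketched in a few lines. The following is only an illustration, not the SpecC/SystemC kernel: it assumes each process is a Python generator, that every `yield d` stands in for a back-annotated `waitfor(d)`, and that the names `simulate`, `b1` and `b2` are made up for this example.

```python
import heapq

def simulate(processes):
    """Run generator-based processes in logical time; `yield d` models waitfor(d)."""
    queue = []  # entries: (wake-up time, tie-breaker, name, generator)
    for order, (name, proc) in enumerate(processes.items()):
        heapq.heappush(queue, (0, order, name, proc))
    trace = []
    while queue:
        now, order, name, proc = heapq.heappop(queue)
        try:
            delay = next(proc)  # native execution up to the next waitfor()
        except StopIteration:
            trace.append((now, name, "done"))
            continue
        heapq.heappush(queue, (now + delay, order, name, proc))
    return trace

def b1():
    yield 15000  # waitfor(15000): delay of the first code block
    yield 25000  # waitfor(25000): delay of the second code block

def b2():
    yield 5      # waitfor(5)

trace = simulate({"B1": b1(), "B2": b2()})
print(trace)
```

The functional code between two `yield` statements executes in zero simulated time; only the annotated delays advance the logical clock, which is what makes the model fast.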

Source-Level Back-Annotation

• Retargetable back-annotation flow
  • Intermediate representation (IR)
    – Frontend optimizations [gcc]
    – IR to C conversion
  • Target binary
    – Cross-compiler backend [gcc]
    – Control-flow graph matching
  • Timing and power estimation
    – ISS or RTL
    – Cycle-accurate timing, power, …
  • Back-annotation into IR
    – Basic block level

[Figure: back-annotation flow. C source code → frontend optimizations (gcc) → intermediate representation (IR) → IR to C → compilable intermediate code → timing and energy back-annotator → host-compiled (HC) model. In parallel: IR → backend → binary; IR and binary → graph matching → mapping table → basic-block timing and energy characterization (uADL ISS, McPAT) → augmented mapping table → back-annotator]

C source code:
  a=b=c=0;
  if(a<=0) { a=1; c=2; }
  ……
  printf(…);

Compilable intermediate code:
  bb_2: a = 1; b = 0; c = 2; goto bb_7;
  bb_3: …..
  bb_7: printf(…);

Host-compiled (HC) model:
  bb_2: a = 1; b = 0; c = 2;
        incrDelay(15); incrEnergy(2); bb = BB_2;
        goto bb_7;
  bb_3: …..
        incrDelay(delay[bb][BB_3]); incrEnergy(energy[bb][BB_3]); bb = BB_3;
  …..

Source: S. Chakravarty, Z. Zhao, A. Gerstlauer. "Automated, Retargetable Back-Annotation for Host-Compiled Performance and Power Modeling," CODES+ISSS'13.
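
As a rough illustration of the final back-annotation step, the sketch below inserts `incrDelay`/`incrEnergy` calls into IR-level basic blocks. It assumes the blocks are already available as lists of C statements and that the characterization results sit in a simple table; the function name `back_annotate` and both data layouts are hypothetical, not the tool's actual format.

```python
def back_annotate(blocks, table):
    """Insert incrDelay/incrEnergy calls before the terminating jump of each block.

    blocks: {label: [C statements]}, last statement assumed to be the jump.
    table:  {label: (cycles, energy)} from basic-block characterization.
    """
    out = []
    for label, stmts in blocks.items():
        cycles, energy = table[label]
        body, jump = stmts[:-1], stmts[-1]
        annotated = body + [f"incrDelay({cycles});", f"incrEnergy({energy});", jump]
        out.append(f"{label}: " + " ".join(annotated))
    return out

blocks = {"bb_2": ["a = 1;", "b = 0;", "c = 2;", "goto bb_7;"]}
table = {"bb_2": (15, 2)}  # values taken from the slide's bb_2 example
annotated = back_annotate(blocks, table)
print("\n".join(annotated))
```

Annotating at the IR level rather than in the original source keeps the annotations aligned with the control flow that the frontend optimizations actually produced.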

Binary-to-Source/IR Mapping

• Compiler optimizations
  • Frontend
    – Control flow optimizations
  • Backend
    – Instruction scheduling/percolation

Mismatches
    – Capture frontend by annotating at IR, not source
    – Establish binary-IR mapping for back-annotation

Graph matching heuristics
  • Synchronized, recursive depth-first traversal
    – Compatibility: loop and branch nesting levels
    – Cost: sum of unmatched nodes in subgraphs rooted at node
    – Return least-cost mapping between all successors (incl. skips)
  • Resolve ambiguities using debug information

Timing/Energy Characterization

• Basic block characterization
  • Execution depends on state
    – Pipeline stalls in case of hazards
    – Pipeline overlaps in multi-issue
  • Pairwise characterization
    – Over all immediate predecessors
    – Across function hierarchy
  • Timing & energy
    – First-to-last instruction fetch time
    – Resource utilization statistics
• Back-annotation into IR
  • Path-dependent metrics
    – Capture static branch prediction

Annotated IR:
  bb_2: a = 1; b = 0; c = 2;
        wait(15); energy(2);
        goto bb_7;
  bb_3: …..
        if (prev_bb==3) {
          wait(25); energy(5);
        } else if (prev_bb==1) {
          wait(30); energy(6);
        }
        …..
  bb_7: printf(…);

[Figure: BB3 is reached either from BB1 (exec flow 1, SS = A) or from BB2 (exec flow 2, SS = B); SS – system state (registers, memory, pipeline)]
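
A minimal sketch of the pairwise (path-dependent) lookup described above: the cost of a block is selected by its immediate predecessor. The table layout, the function name `annotate_path` and the values for block 1 are assumptions for illustration; only the two bb_3 entries and the bb_2 cost come from the slide.

```python
def annotate_path(path, pairwise, default):
    """Sum delay and energy along a path of basic blocks.

    pairwise: {(prev_bb, bb): (delay, energy)} from pairwise characterization.
    default:  {bb: (delay, energy)} used when no predecessor entry exists.
    """
    total_delay = total_energy = 0
    prev = None
    for bb in path:
        delay, energy = pairwise.get((prev, bb), default[bb])
        total_delay += delay
        total_energy += energy
        prev = bb
    return total_delay, total_energy

pairwise = {(3, 3): (25, 5), (1, 3): (30, 6)}    # bb_3 entries from the slide
default = {1: (10, 1), 2: (15, 2), 3: (30, 6)}   # block 1 and the bb_3 fallback are made up
print(annotate_path([1, 3, 3], pairwise, default))
```

Here the second visit to block 3 is cheaper than the first because it follows itself, which is the kind of pipeline-state effect the pairwise table is meant to capture.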

Cache-Aware Back-Annotation

• Memory address tracing
  • Stack, heap
• Hybrid model
  • Static/dynamic back-annotation

[Figure: frontend optimizations → intermediate code → retargetable backend → binary code; block-level characterization from a micro-architecture description; address layout and memory accesses are annotated into the code, which calls the cache model and plugs into the TLM]

Back-annotated code with OS and driver calls:
  void f(void) {
  BB1: ...
       os.wait(BB1_DELAY);
       if (c) goto BB2;
  BB2: a[i][j] += sum;
       ...
       os.wait(BB2_DELAY);
  BB3: ...
       os.wait(BB3_DELAY);
       drv.write(res);
  }

Back-annotated code with cache model:
  void f(void) {
  BB1: ...
       waitfor(BB1_DELAY);
       if (c) goto BB3;
  BB2: a[i][j] += sum;
       __alist[__idx] = A_BASE + 4*(i*A_WID+j);
       ...
       miss = cache.upd(__alist, __idx);
       waitfor(BB2_DELAY + miss);
  BB3: ...
       waitfor(BB3_DELAY);
       ch.write(res);
  }
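
To make the hybrid scheme concrete, here is a small sketch of a tag-only cache whose `upd` call takes the address list collected in a block and returns the miss delay to add to the static block delay. It assumes a direct-mapped cache with a fixed miss penalty; the class name `TagCache`, all parameter values and the address layout constants are hypothetical.

```python
class TagCache:
    """Direct-mapped, tag-only cache model (no data values are stored)."""

    def __init__(self, num_lines=64, line_size=32, miss_penalty=20):
        self.num_lines = num_lines
        self.line_size = line_size
        self.miss_penalty = miss_penalty
        self.tags = [None] * num_lines

    def upd(self, addresses):
        """Commit a list of accessed addresses; return the total miss delay."""
        delay = 0
        for addr in addresses:
            line = addr // self.line_size
            index, tag = line % self.num_lines, line // self.num_lines
            if self.tags[index] != tag:   # miss: install the new tag
                self.tags[index] = tag
                delay += self.miss_penalty
        return delay

A_BASE, A_WID = 0x1000, 16   # hypothetical address layout of array a

def addr(i, j):
    """Address of a[i][j] for 4-byte elements, as in the annotated code."""
    return A_BASE + 4 * (i * A_WID + j)

cache = TagCache()
first = cache.upd([addr(0, j) for j in range(16)])   # cold cache
again = cache.upd([addr(0, j) for j in range(16)])   # same row, now cached
print(first, again)
```

A block's simulated delay is then its static back-annotated delay plus the returned value, mirroring `waitfor(BB2_DELAY + miss)`.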

Source-Level Modeling Accuracy & Speed

• One-time back-annotation overhead

3min. to 3s runtime (function of code size)

Close to cycle-accurate at source-level speeds

>98% timing and energy accuracy @ 2000 MIPS

>95% accuracy @ 160 MIPS including cache

Integrate back-annotation of other metrics

Performance, energy, reliability, power, thermal (PERPT)

[Figure: error [%] (log scale, 0.000001 to 10) of Z4 and Z6 timing and energy estimates, each with and without cache, for SHA (small, large), ADPCM (small, large), CRC32 (small, large) and Sieve]

[Figure: simulation speed in MIPS (log scale, 1 to 10000) of the source, IR, HC and HC-with-cache models for the same benchmarks]

Application Layer

• Application source code
  • C-based process model
    – Parallel programming model
    – Canonical API & MoC
  • Communication primitives
    – IPC channel library
• Timing model
  • Block-/IR-level granularity
    – Capture data-dependent execution
    – Capture compiler effects
  • Hybrid simulation
    – Back-annotation of static aspects from one-time static analysis/estimation
    – Simulation of dynamic micro-architecture effects (models of caches, branch predictors, …)

Single task timing model

[Figure: CPU with processes P1, P2, P3 and channels C1, C2]

process P3 {
  void main() {
    …
    c1.recv();
    …
    waitfor(5);
    …
    c2.send();
    …
  }
};

Operating System Layer

• Scheduling
  • Group processes into tasks
    – Static scheduling
  • Schedule tasks
    – Dynamic scheduling, multitasking
    – Preemption, interrupt handling
    – Task communication (IPC)

Scheduling refinement
  • Flatten hierarchy
  • Reorder behaviors

OS refinement
  • Insert OS model
  • Task refinement
  • IPC refinement

[Figure: application processes P1 and P2 on an OS layer on top of the SLDL]

OS Modeling

• High-level RTOS abstraction
  • Specification is fast but inaccurate
    – Native execution, truly concurrent model
  • Traditional ISS-based validation infeasible
    – Accurate but slow (esp. in multi-processor context), requires full binary

Model of operating system (task interleaving in time)
  High accuracy but small overhead at early stages
  Focus on key effects, abstract unnecessary implementation details
  Model all concepts: multi-tasking, scheduling, preemption, interrupts, IPC

[Figure: specification, host-compiled and implementation models; in the host-compiled model, application tasks T1 and T2 and their channels sit on the RTOS model on top of the SLDL]

Source: A. Gerstlauer, H. Yu, D. Gajski. "RTOS Modeling for System-Level Design," DATE'03.

Abstract RTOS Model

• Emulate the sequential execution of concurrent tasks
  • Task scheduler
    – Maintain task queues, determine task(s) to run & perform context switch
  • Timing model
    – Simulate back-annotated task delays, call scheduler to allow for preemptions

RTOS Model Interface

• Canonical, target-independent API

interface OSAPI {
  /* OS management */
  void init();
  void start(int sched_alg);
  void interrupt_return();

  /* Task management */
  Task task_create(char *name, int type, sim_time period);
  void task_terminate();
  void task_sleep();
  void task_activate(Task t);
  void task_endcycle();
  void task_kill(Task t);
  Task par_start();
  void par_end(Task t);

  /* Event handling */
  Task pre_wait();
  void post_wait(Task t);

  /* Delay modeling */
  void time_wait(sim_time nsec);
};

RTOS Model Implementation

• RTOS model
  • OS, task, event management
    – Descriptors & queues
  • Context switching
    – Block all but active task on SLDL level
  • Scheduling
    – Select and dispatch task based on algorithm
  • Preemption
    – Allow rescheduling at simulation time increases
  • Event handling
    – Remove task temporarily from OS while waiting for SLDL event

RTOS model library
  • RTOS models for different scheduling strategies
    – Round robin, priority based
  • Parametrizable
    – Task parameters (priorities)

channel OS implements OSAPI {
  Task current = 0;
  os_queue rdyq;

  void dispatch(void) {
    current = schedule(rdyq);
    if (current) notify(current.event);
  }
  void yield() {
    Task task = current;
    rdyq.insert(task);
    dispatch();
    wait(task.event);
  }

  void time_wait(time t) {
    waitfor(t);
    yield();
  }

  Task pre_wait(void) {
    Task t = current;
    dispatch();
    return t;
  }
  void post_wait(Task t) {
    rdyq.insert(t);
    if (!current) dispatch();
    wait(t.event);
  }
};
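
The serialization performed by the RTOS model can be mimicked outside an SLDL. The sketch below is a simplification under several assumptions: tasks are Python generators, each `yield d` plays the role of `os.time_wait(d)`, scheduling is priority based with lower numbers meaning higher priority, and events, interrupts and periodic tasks are left out. The class `OSModel` and helper `task` are illustrative names, not part of the original model.

```python
import heapq

class OSModel:
    """Sequentializes tasks: only the dispatched task may advance time."""

    def __init__(self):
        self.rdyq = []    # ready queue ordered by (priority, insertion order)
        self.now = 0
        self.count = 0
        self.log = []

    def task_create(self, name, priority, body):
        heapq.heappush(self.rdyq, (priority, self.count, name, body))
        self.count += 1

    def start(self):
        while self.rdyq:
            priority, _, name, body = heapq.heappop(self.rdyq)  # dispatch()
            try:
                delay = next(body)   # task runs up to its next time_wait()
            except StopIteration:
                continue             # task has terminated
            self.now += delay        # waitfor(t) inside time_wait()
            self.log.append((self.now, name))
            self.task_create(name, priority, body)  # yield(): back into rdyq

def task(delays):
    for d in delays:
        yield d   # stands in for os.time_wait(d)

os_model = OSModel()
os_model.task_create("low", 2, task([10, 10]))
os_model.task_create("high", 1, task([5]))
os_model.start()
print(os_model.log)
```

Rescheduling happens only when a task calls `time_wait`, so the delays between these calls define the preemption granularity of the model.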

Task Refinement

• Convert processes into tasks
  • Task initialization
    – Register task with OS model
  • Task activation
    – Wait for task start trigger from OS
  • Replace delay model
    – Trigger rescheduling in OS
  Preemption points
• Convert channels into IPC
  • Communication and synchronization
    – Wrap around SLDL event handling

Refined task (the OS calls shown as overlays on the slide are merged into the process code):

process task_B2(OSAPI os) {
  Task h;
  void task_create(void) {
    h = os.task_create("B2", APERIODIC, 0);
  }

  void main(void) {
    os.task_activate(h);
    ...
    /* model execution delay */
    os.time_wait(BLOCK1_DELAY);   /* replaces waitfor(BLOCK1_DELAY) */
    ...
    send();
    /* model execution delay */
    os.time_wait(BLOCK2_DELAY);   /* replaces waitfor(BLOCK2_DELAY) */
    ...
    os.task_terminate(h);
  }

  void send() {
    t = os.pre_wait();
    wait(ack);
    os.post_wait(t);
  }
};

Simulated Dynamic Behavior

[Figure: logical-time traces (t0 … t8) of P1, P2, P3 and the ISR, with channel C1 (c1.send(), c1.recv()), semaphore S1 and bus access (bus.recv()). Unscheduled: processes advance concurrently via waitfor(). Scheduled: tasks P2 and P3 are serialized by the OS model via time_wait().]

Inaccuracy due to timing granularity

OS Modeling Results

• Configurable, generic and flexible OS model
  • Configurable scheduling strategies and parameters
    – Round-robin or priority-based scheduling

Scheduling exploration

Accuracy & speed
  • Artificial task set example

Granularity   Avg. speed per core   Avg. err.
1 µs          140 MIPS              0.4 %
10 µs         1500 MIPS             0.4 %
100 µs        9000 MIPS             1.0 %
1000 µs       29000 MIPS            8.0 %

Speed and Accuracy Tradeoffs

• Errors in discrete preemption models
  • Potentially large preemption errors
    – Not bounded by simulation granularity

[Figure: timeline of a high-priority task Thigh (release rh, finish fh) and a low-priority task Tlow (release rl, finish fl) in run and idle states, with the resulting preemption errors marked]

Automatic Timing Granularity Adjustment (ATGA)
  • Observe system state to predict preemption points
  • Dynamically and optimally control timing model
  • Transparently integrated into OS model
  Eliminate preemption errors

Source: P. Razaghi, A. Gerstlauer. "Predictive OS Modeling for Host-Compiled Simulation of Periodic Real-Time Task Sets," Emb. Sys. Letters '12.
Source: P. Razaghi, A. Gerstlauer. "Automatic Timing Granularity Adjustment for Host-Compiled Software Simulation," ASPDAC'12.

ATGA Model Execution Example

[Figure: execution trace over t0 … t7 of tasks TL, TM, TH and interrupt task TIntr moving through Ready, Idle, Wait and Sleep states, with releases rTH,1 … rTH,3 and finishes fTH,1, fTH,2; the OS mode alternates between predictive and fall-back]

Fast and accurate

Advanced OS Modeling Approaches

• Conservative
  • Predict possible preemption points
  • Simulate until next predicted point
  • Fall back to fine granularity if prediction is not possible
  Automatic timing granularity adjustment (ATGA) [Razaghi’12]

• Optimistic
  • Simulate at coarse granularity assuming no preemptions
  • Record disturbing influences
  • Correct and roll back if necessary
  Result-oriented modeling (ROM) [Schirner’08]
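
The conservative strategy can be illustrated by splitting one coarse back-annotated delay at the predicted preemption points that fall inside it. This is only a sketch of the timing side, assuming release times of higher-priority tasks are known in advance (for example, periodic tasks); it does not model the task suspension that follows a preemption or the fall-back mode, and `atga_steps` is a made-up name.

```python
def atga_steps(delay, now, releases):
    """Split a delay starting at `now` at every predicted release inside it.

    Returns the list of simulation steps; the scheduler would be called
    after each step, i.e. exactly at the predicted preemption points.
    """
    steps = []
    end = now + delay
    for release in sorted(releases):
        if now < release < end:
            steps.append(release - now)
            now = release
    steps.append(end - now)
    return steps

print(atga_steps(1000, 0, [250, 1400]))   # one release inside the delay
print(atga_steps(1000, 0, []))            # nothing predicted: a single coarse step
```

With no predicted releases the delay is simulated in one step, so the model only pays for fine granularity where a preemption can actually occur.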

Operating System Layer

OS model
  • On top of standard SLDL
  • Wrap around SLDL primitives, replace event handling
    – Block all but active task
    – Select and dispatch tasks
  • Target-independent, canonical API
    – Task management
    – Channel communication
    – Timing and all events

[Figure: application tasks P2 and P3 on the OS model on top of the SLDL]

Hardware Abstraction Layer (HAL)

• External communication
  • Software drivers
    – Presentation, session, network communication layers
    – Synchronization (interrupts)
  • Hardware/software boundary
    – Low-level HW access
    – Bus drivers and interrupt handlers
    – Canonical HW/SW interface
  • External interface
    – Bus transactions (TLM)
    – Interrupt trigger

App.:
  sample.send(v1);

Driver:
  void send(…) {
    intr.receive();
    bus.masterWrite(0xA000, &tmp, len);
  }

Hardware Layer (1)

• Processor TLM
  • HW interrupt handling
    – Interrupt logic
      » Suspend user code
    – Interrupt scheduling
      » Priority, nesting
  • Peripherals
    – Interrupt controller
    – Timers
  • TLM bus model
    – Bus transactions

[Figure: HAL and hardware layers of the processor TLM]

Hardware Layer (2)

• Cache modeling
  • Pure behavioral modeling
    – Tag state
    – Hits/misses
    – Replacement policy
  • Integrated into back-annotation
    – Called with accessed address trace
    – Update cache state
    – Return delay penalties
  Implemented as SpecC channel
    – < 200 lines of code

[Figure: processor model layers (App, OS, HAL, HW) with P1, tasks P2 and P3, channels C1 and C2, OS model, interrupt handlers (HWInt, UsrInt1, UsrInt2, IntA to IntD on INTA to INTD) and bus TLM; the cache model receives addresses and returns delays]

Source: A. Pedram, D. Craven, T. Amimeur, A. Gerstlauer. "Modeling Cache Effects at the Transaction Level," IESS 2009.

Hardware Layer (3)

• Bus-functional model (BFM)
  • Pin-accurate processor model
    – Timing-accurate bus and interrupt protocols
  • Bus model
    – Pin- and cycle-accurate
    – Driving and sampling of bus wires

Processor Models

[Figure: processor models of increasing detail, successively adding OS, HAL, HW-TLM and HW-BFM layers, up to a BFM wrapped around an ISS]

• Processor layers
  • Application
    – Native, host-compiled C
    – Back-annotation
  • OS
    – OS model
    – Middleware, drivers
  • HAL
    – Firmware
  • Processor hardware
    – Bus interfaces
    – Interrupts
    – Cache

Source: G. Schirner, A. Gerstlauer, R. Doemer. "Fast and Accurate Processor Models for Efficient MPSoC Design," TODAES, 2009.

Processor Model Example

• Voice encoding and decoding
  • Motorola DSP 56600
    – Encoding & decoding tasks
    – Custom OS
  • 4 custom I/O blocks
  • 1 custom HW co-processor
    – Codebook search
• Processor models
  • Perfect timing
    – Back-annotated from ISS
  • Priority-based OS model
    – EDF: Decoder > Encoder
  • HW interrupt scheduling
    – 4 non-preempted priority levels
• Reference
  • Motorola proprietary ISS

[Figure: DSP running encoder and decoder tasks, connected through DSP Port A and interrupts INTA to INTD to four custom HW I/O blocks (Enc. Input, Enc. Output, Dec. Input, Dec. Output) and a custom HW codebook search co-processor]

Processor Model Results

• Vocoder example

• 163 speech frames

• Speed vs. accuracy

[Figure: speed vs. accuracy of the processor models; callouts mark the effect of adding the OS model (Appl → Task) and interrupts (FW → TLM)]

1800x speed w/ 3% error (vs. cycle-accurate ISS)

Multi-Core OS & Processor Models

• Multi-core OS model
  • SMP scheduler model
    – Global or partitioned queue
  • Configurable parameters
    – Number of cores
    – FIFO, round-robin, priority-based scheduling policies
    – Priorities, affinity, time slice (for round-robin)
• Multi-core processor model
  • Multi-core interrupt handling chain models
    – Interrupt handlers & tasks
    – Configurable generic interrupt controller (GIC) model
  • TLM bus interfaces

[Figure: application tasks T1 to T3 and channel (CH) on the OS with multi-core scheduler (dispatch, global ready queue) and interrupt tasks; HAL with I/O drivers and interrupt handlers; TLM with I/O and interrupt interfaces; all on the SLDL simulation kernel]

Source: P. Razaghi, A. Gerstlauer. "Host-Compiled Multi-Core System Simulation for Early Real-Time Performance Evaluation," ACM TECS '14.

Multi-Core OS Model

• Global or partitioned SMP scheduling

• Replicated or shared Ready, Idle, Sleep & Wait queues

• Processor suspension and interrupt handling

• Interrupt handlers as highest-priority OS-internal tasks

[Figure: interrupt handling with ISR and interrupt task (bottom half)]
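
The difference between global and partitioned SMP scheduling can be shown with a toy dispatcher. This sketch assumes priority-based scheduling with lower numbers meaning higher priority and a fixed task-to-core affinity for the partitioned case; function names and the task set are invented for illustration.

```python
def dispatch_global(ready, num_cores):
    """Global queue: the num_cores highest-priority ready tasks run, one per core."""
    ranked = sorted(ready, key=lambda t: t[1])   # entries are (name, priority)
    return {core: name for core, (name, _) in enumerate(ranked[:num_cores])}

def dispatch_partitioned(ready, affinity, num_cores):
    """Partitioned queues: each core picks the best task among those pinned to it."""
    running = {}
    for core in range(num_cores):
        local = [t for t in ready if affinity[t[0]] == core]
        if local:
            running[core] = min(local, key=lambda t: t[1])[0]
    return running

ready = [("T1", 3), ("T2", 1), ("T3", 2)]
affinity = {"T1": 0, "T2": 1, "T3": 1}
print(dispatch_global(ready, 2))
print(dispatch_partitioned(ready, affinity, 2))
```

With a global queue the two most urgent tasks run, whereas the partitioned variant runs T1 on core 0 even though T3 has higher priority, because T3 is pinned to the busy core 1.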

Multi-Core Processor Model

• HW/SW interface
  • HAL
  • HW model
• Memory model
  • Multi-core cache model
• Interrupt chain
  • Routing
  • Detection
  • Suspension
  • Bottom handler release
• System integration
  • I/O driver models, external TLM bus interfaces

[Figure: multi-core processor model with drivers (Drv), MAC and TLM adapter]

Interrupt Modeling Example

• Errors in preemption model due to discrete timing

Integrate multi-core ATGA approach

[Figure: interrupt handling timeline on Core 0 and Core 1 over time]

Multi-Core ATGA Model

• Enhanced fallback mode check
  • Ignore interrupt handlers in predictive mode
• Model inter-core interrupt notifications
  • Adjust predicted times or switch to fallback

Accurate interrupt response times while maintaining speed

But: high-priority interrupt-driven tasks degrade performance

[Figure: average error [%] vs. simulation time [sec.] (log-log) for conventional models at granularities from 1 µs to 10 ms and for ATGA, each with high-, medium-, low-priority and no interrupts (Intr.H, Intr.M, Intr.L, no Intr.)]

Multi-Core Cache Model

• Application model
  • Per core memory access list
    – Address, mode, time stamp
• Cache interface
  • Hardware layer of processor model
• Generic cache model
  • Emulate cache state
    – Only tags, no values
    – Return hit & miss info
  • Parameterizable
    – Cache size, line size, associativity, replacement & write-back policy

Source: P. Razaghi, A. Gerstlauer. "Multi-Core Cache Modeling for Host-Compiled Performance Simulation," ESLSyn '13.

Multi-Core Cache Simulation

• Directly committing accesses in simulation order

Globally out-of-order in discrete timing model

Multi-Core Cache Simulation

• Delayed reordering of aggregated requests

Multi-Core Out-of-Order Cache (MOOC) model

100% accuracy @ coarse-grain speeds

[Figure: per-core access timelines with safe-to-commit points]
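
The reordering idea can be sketched as follows. This is a simplified reading of the slide, not the published MOOC algorithm: it assumes every core keeps a list of `(time stamp, address)` accesses, that a core's local time is a lower bound for all accesses it will still issue, and that accesses stamped strictly before the smallest local time are safe to commit. The function name and data layout are hypothetical.

```python
def commit_in_order(access_lists, local_times):
    """Merge per-core access lists by time stamp and split at the safe-to-commit time.

    access_lists: {core: [(stamp, addr), ...]} aggregated since the last commit.
    local_times:  {core: current local simulation time of that core}.
    Returns (committed, pending), both sorted by (stamp, core, addr).
    """
    safe = min(local_times.values())   # no core can still issue an earlier access
    merged = sorted(
        (stamp, core, addr)
        for core, accesses in access_lists.items()
        for stamp, addr in accesses
    )
    committed = [a for a in merged if a[0] < safe]
    pending = [a for a in merged if a[0] >= safe]
    return committed, pending

access_lists = {0: [(10, 0x100), (40, 0x140)], 1: [(5, 0x200), (25, 0x100)]}
local_times = {0: 50, 1: 30}
committed, pending = commit_in_order(access_lists, local_times)
print(committed, pending)
```

The committed accesses would then be applied to the shared cache state in global time order, while the access stamped 40 waits until core 1 has advanced past it.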

Platform Simulation Example

• Cellphone baseband MPSoC

• Design space exploration: mapping & scheduling

Full-system simulation in close to real time

• 1400 MIPS at > 99% timing accuracy


MPSoC Exploration Results

[Figure: MP3 and JPEG average frame delay (ms) and average frame error (%, log scale) for HCSim.TLM with and without interrupt modeling (HCSim.TLM.no_Intr), for single-core and dual-core mappings with core-attached or task-attached interrupts]

Lecture 8: Outline

✓ Processor layers
  ✓ Application
  ✓ Task/OS
  ✓ Firmware
  ✓ Hardware
• Processor synthesis
  • Software synthesis
  • Hardware synthesis

Software Synthesis

• Automatically generate target binaries from TLM
  • Generate code for application (tasks and IPC)
  • Synthesize firmware (drivers, interrupt handlers)
  • OS wrappers and HAL implementations from DB
  • Compile and link against target RTOS and libraries

[Figure: generated software stack (App., drivers, RTOS, HAL) running on an ISS, with MAC]

Source: G. Schirner, A. Gerstlauer, R. Doemer. "Automatic Generation of Hardware dependent Software for MPSoCs from Abstract System Specifications," ASPDAC'08.

Processor Implementation Models

• Software C model
  • Generated application C code
    – Flat standard ANSI C code
  • Firmware and hardware models
    – RTOS model, HAL model
    – Low-level & hardware interrupt handling
    – External bus communication protocol/TLM
• Software ISS model
  • Reintegrated processor ISS
    – Bus-functional ISS wrapper
  • Running generated binary
    – Application, RTOS, drivers, HAL

[Figure: bus-functional model with a hardware shell around the core ISS (ISS API library, nIRQ/nFIQ, bus protocol) executing CPU_1.bin, which contains SW application, drivers, RTOS, RAL, interrupt handling and HAL]

Lecture 8: Outline

✓ Processor layers
  ✓ Application
  ✓ Task/OS
  ✓ Firmware
  ✓ Hardware
• Processor synthesis
  ✓ Software synthesis
  • Hardware synthesis

Hardware Synthesis

• C-to-RTL high-level synthesis (HLS)

• Allocation, scheduling, binding

Input C code:
  ……
  y = 3*x;
  i = 0;
  do {
    d += y * i;
    i++;
  } while (i < 10);
  h = d + d;
  ……

Behavioral RTL (HW_FSMD) after scheduling:
  s1: y=3*x
  s2: i=0
  s3: t=y*i
  s4: d+=t
  s5: i++
  s6: h=2*d

[Figure: scheduling turns the C code into the FSMD above; binding and netlisting produce the structural RTL (HW_RTL) with controller (states s1 to s6, control word ctrl=10…10), datapath (register file RF, FU, buses b1 to b3), bus interface and CLK, forming the HW BFM]

Source: D. Shin, A. Gerstlauer, R. Doemer, D. Gajski. "An Interactive Design Environment for C-based High-level Synthesis of RTL Processors," TVLSI, 2008.
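
To check that the scheduled FSMD still computes what the C code does, both can be executed side by side. The sketch below assumes one state per clock cycle, that the loop test `i < 10` is taken in state s5, and that `d` starts at 0 (the slide elides the code before the fragment); `run_fsmd` and `reference` are illustrative names.

```python
def run_fsmd(x, d=0):
    """Execute the six-state FSMD; return (h, number of clock cycles)."""
    y = i = t = h = 0
    state, cycles = "s1", 0
    while state != "done":
        cycles += 1                      # one state per clock cycle
        if state == "s1":
            y, state = 3 * x, "s2"
        elif state == "s2":
            i, state = 0, "s3"
        elif state == "s3":
            t, state = y * i, "s4"
        elif state == "s4":
            d, state = d + t, "s5"
        elif state == "s5":
            i += 1
            state = "s3" if i < 10 else "s6"   # assumed loop-back condition
        else:                            # s6
            h, state = 2 * d, "done"
    return h, cycles

def reference(x, d=0):
    """Direct transcription of the C fragment (do-while loop)."""
    y, i = 3 * x, 0
    while True:
        d += y * i
        i += 1
        if not i < 10:
            break
    return d + d

print(run_fsmd(2), reference(2))
```

The cycle count makes the cost of the schedule visible: two setup states, three states per loop iteration and one final state.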

Modeling of Hardware in SoC Design

• RTL Modeling

• State modeling: Accellera RTL Semantics Standard

– Style 1: unmapped
» a = b * c;

– Style 2: storage mapped
» R1 = R1 * RF2[4];

– Style 3: function mapped
» R1 = ALU1(MULT, R1, RF2[4]);

– Style 4: connection mapped
» Bus1 = R1;
» Bus2 = RF2[4];
» Bus3 = ALU1(MULT, Bus1, Bus2);

– Style 5: exposed control
» ALU_CTRL = 011001b;
» RF2_CTRL = 010b;
» …

http://www.eda.org/alc-cwg/cwg-open.pdf

Source: R. Doemer
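
To make the progression between styles concrete, the multiplication from styles 1 to 4 can be written as four plain C functions that all compute the same value while exposing more of the datapath each time. This is a hedged sketch: `alu1`, the `alu_op_t` opcode set, the 8-entry size of `rf2` and the write-back in `style4` are assumptions made for illustration, and style 5 is left out because the slide does not define the control-word encoding.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { ADD, MULT } alu_op_t;      /* hypothetical opcode set */

/* Hypothetical functional unit standing in for ALU1 on the slide. */
static uint32_t alu1(alu_op_t op, uint32_t a, uint32_t b)
{
    assert(op == ADD || op == MULT);
    return (op == MULT) ? a * b : a + b;
}

/* Style 1 (unmapped): abstract variables and operators. */
uint32_t style1(uint32_t b, uint32_t c)
{
    uint32_t a = b * c;
    return a;
}

/* Style 2 (storage mapped): variables bound to register R1 and RF2[4]. */
uint32_t style2(uint32_t r1, const uint32_t rf2[8])
{
    r1 = r1 * rf2[4];
    return r1;
}

/* Style 3 (function mapped): the operator is bound to a functional unit. */
uint32_t style3(uint32_t r1, const uint32_t rf2[8])
{
    r1 = alu1(MULT, r1, rf2[4]);
    return r1;
}

/* Style 4 (connection mapped): transfers routed over explicit buses. */
uint32_t style4(uint32_t r1, const uint32_t rf2[8])
{
    uint32_t bus1 = r1;
    uint32_t bus2 = rf2[4];
    uint32_t bus3 = alu1(MULT, bus1, bus2);

    r1 = bus3;                            /* write-back: assumption, not on the slide */
    return r1;
}
```

With `rf2[4]` holding 7 and a first operand of 6, every function returns 42; only the amount of structural detail differs.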


SpecC RTL Modeling

RTL modeling example:

    behavior FSMD_Example(
        signal in  bool      CLK,       // system clock
        signal in  bool      RST,       // system reset
        signal in  bit[31:0] Inport,    // input ports
        signal in  bit[1]    Start,
        signal out bit[31:0] Outport,   // output ports
        signal out bit[1]    Done)
    {
      void main(void)
      {
        fsmd(CLK)                       // clock + sensitivity
        {
          bit[32] a, b, c, d, e;        // local variables

          { Outport = 0;                // default
            Done = 0b;                  // assignments
          }

          if (RST) { goto S0;           // reset actions
          }

          S0 : { if (Start) goto S1;
                 else goto S0;
               }

          S1 : { a = b + c;             // state actions
                 d = Inport * e;        // (register transfers)
                 Outport = a;
                 goto S2;
               }
          ...
        }
      }
    };

Source: R. Doemer
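
One way to read the `fsmd` construct is as a function that is evaluated once per clock edge: default assignments first, then the reset check, then the actions of the current state. The C sketch below follows that reading of the slide and is not a statement of the SpecC simulation semantics. The names `fsmd_t`, `fsmd_step` and the `ST_*` states are hypothetical, and S2 is kept as an empty holding state because the slide elides it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ST_S0, ST_S1, ST_S2 } fsmd_state_t;

typedef struct {
    fsmd_state_t state;
    uint32_t a, b, c, d, e;     /* local variables of the fsmd */
    uint32_t outport;           /* Outport */
    bool done;                  /* Done */
} fsmd_t;

/* One call models one active edge of CLK. */
void fsmd_step(fsmd_t *m, bool rst, bool start, uint32_t inport)
{
    assert(m != NULL);

    /* Default assignments, applied in every cycle. */
    m->outport = 0;
    m->done = false;

    /* Reset actions (assumed to take priority over state actions). */
    if (rst) {
        m->state = ST_S0;
        return;
    }

    switch (m->state) {
    case ST_S0:                 /* wait for Start */
        m->state = start ? ST_S1 : ST_S0;
        break;
    case ST_S1:                 /* state actions (register transfers) */
        m->a = m->b + m->c;
        m->d = inport * m->e;
        m->outport = m->a;
        m->state = ST_S2;
        break;
    case ST_S2:                 /* elided on the slide; holds here */
        break;
    }
}
```

A testbench would call `fsmd_step` once per simulated cycle, asserting `rst` first and then raising `start` to move the machine from S0 to S1.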


SpecC RTL Modeling


Mapped RTL example, Accellera style 1 (unmapped). Declaration and state S1 of FSMD_Example:

    bit[32] a, b, c, d, e;          // unmapped variables

    S1 : { a = b + c;               // Accellera style 1
           d = Inport * e;          // (unmapped)
           Outport = a;
           goto S2;
         }

Source: R. Doemer


SpecC RTL Modeling


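Mapped RTL example, Accellera style 2 (storage mapped). The variables a, b, c, d, e of FSMD_Example are bound to RF[0] through RF[4]:

    buffered[CLK] bit[32] RF[5];    // register file

    S1 : { RF[0] = RF[1] + RF[2];   // Accellera style 2
           RF[3] = Inport * RF[4];  // (storage mapped)
           Outport = RF[0];
           goto S2;
         }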

Source: R. Doemer


SpecC RTL Modeling


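Mapped RTL example, Accellera style 3 (function mapped). The operators in state S1 of FSMD_Example are bound to the functional units ADD0 and MUL0:

    buffered[CLK] bit[32] RF[5];          // register file

    S1 : { RF[0] = ADD0(RF[1], RF[2]);    // Accellera style 3
           RF[3] = MUL0(Inport, RF[4]);   // (function mapped)
           Outport = RF[0];
           goto S2;
         }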

Source: R. Doemer


SpecC RTL Modeling


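Mapped RTL example, Accellera style 4 (connection mapped). The transfers in state S1 of FSMD_Example are routed over explicit buses:

    buffered[CLK] bit[32] RF[5];          // register file
    bit[32] BUS0, BUS1, BUS2;             // busses

    S1 : { BUS0 = RF[1];                  // Accellera style 4
           BUS1 = RF[2];                  // (connection mapped)
           BUS2 = ADD0(BUS0, BUS1);
           RF[0] = BUS2;
           ...
           goto S2;
         }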

Source: R. Doemer


SpecC RTL Modeling


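Mapped RTL example, Accellera style 5 (exposed control). State S1 of FSMD_Example only drives the control wires of the datapath:

    signal bit[5:0] RF_CTRL;              // control wires
    signal bit[1:0] ADD0_CTRL, MUL0_CTRL;

    S1 : { RF_CTRL = 011000b;             // Accellera style 5
           ADD0_CTRL = 01b;               // (exposed control)
           MUL0_CTRL = 11b;
           ...
           goto S2;
         }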

Source: R. Doemer


Lecture 8: Summary

• Host-compiled computation modeling

• Model of software running in execution environment
– Timed application, OS, bus drivers, interrupt handlers
– Processor hardware model, suspension, bus interfaces

Virtual platform prototype
Embedded software development and validation
Viable complement to ISS-based validation

• Backend processor synthesis

• Software synthesis
– Code generation, RTOS targeting, cross-compilation & linking
– Fully automatic final target binary generation

• Hardware synthesis
– High-level/behavioral synthesis: allocation, scheduling, binding
– Interactive C-to-RTL synthesis flow