EE382N: Embedded Sys Dsgn/Modeling Lecture 8
© 2015 A. Gerstlauer 1
EE382N: Embedded System Design and Modeling
Andreas Gerstlauer
Electrical and Computer Engineering
University of Texas at Austin
[email protected]
Lecture 8 – Computation Modeling & Refinement
Lecture 8: Outline
• Processor layers
• Application
• Task/OS
• Firmware
• Hardware
• Processor synthesis
• Software synthesis
• Hardware synthesis
General Processor Micro-Architecture
• Basic computation component is a processor (PE)
• Programmable, general-purpose software processor (CPU)
• Programmable special-purpose processor (e.g. DSPs)
• Application-specific instruction set processor (ASIP)
• Custom hardware processor
Functionality and timing (and power and …)
[Figure: processing element (PE) consisting of a controller and a datapath linked by control signals and status lines, with a bus interface and clock (CLK); timing annotated as ∆t]
Computation Modeling (1)
• Structural RTL models
Sub-cycle accurate
[Figure: structural RTL of a hardware processor (controller with state, next-state logic and output logic; datapath with register file, memory, functional unit FU1 and bus interface; CLK) and of a software processor (CPU controller with PC, IR, fetch and decode; datapath with register file, ALU, load/store unit and data & program memory; CLK)]
Computation Modeling (2)
• Behavioral RTL models (FSMD)
• Instruction-set simulation (ISS) models
• Purely functional (binary translation) [QEMU,…]
• Micro-architectural (RTL in C) [GEM5,…]
Cycle or timing accurate
[Figure: hardware modeled as an FSMD clocked by HW_CLK; CPU modeled by an instruction-set simulator (ISS) clocked by CPU_CLK, executing the binary of application, RTOS and HAL]
Computation Modeling (3)
• Host-compiled models
• Source-level application model
– Compile & execute natively
– Fast functional simulation
• Back-annotate timing and other metrics
• Abstract OS and processor models
• Transaction-level model (TLM) backplane
• C-based discrete-event simulation kernel [SpecC, SystemC]
Fast and accurate full-system simulation
Host-Compiled Computation Layers
• Application
• Process execution (C code)
• Execution timing
• OS & processor
• Operating system
– Real-time multi-tasking (RTOS model)
– Bus drivers (C code)
• Hardware abstraction layer (HAL)
– Interrupt handlers
– Media accesses
• Processor hardware
– Bus interfaces (I/O state machines)
– Interrupt suspension and timing
[Figure: CPU model with processes P1 and P2 on top of the OS, drivers (Drv), and a HAL with ISR, connected to bus and interrupts]
Process B1() {
  …
  waitfor(15000);
  …
  waitfor(25000);
  …
};
Application Layer
• High-level, abstract programming model
• Hierarchical process graph
– ANSI C leaf processes
– Parallel-serial composition
• Abstract, typed inter-process communication
– Channels
– Shared variables
Timed simulation of application functionality
• Annotate timing, energy, …
– Granularity?
– Compiler optimizations?
– Dynamic architecture effects?
Source profiling [SCE]
Back-annotate from ISS
Predict from host activity
[Figure: processes B1, B2, B3 connected by channels C1 and C2 on the CPU, executing over logical time]
... void f() {
  waitfor(5);
  ...
}
...
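The timed simulation of natively executing processes can be illustrated with a minimal sketch. It is written in Python rather than SpecC, and the kernel, the `simulate` function and the process `b2` are assumptions made for illustration only. Each `yield d` stands in for a back-annotated `waitfor(d)`; the kernel only advances logical time and never measures host time.

```python
import heapq

def simulate(processes):
    """Run generator-based processes; each `yield d` models waitfor(d)."""
    now = 0
    trace = []                       # (logical time, process name) at each resume
    # Event queue ordered by (wake-up time, sequence number) for determinism.
    queue = [(0, seq, name, proc) for seq, (name, proc) in enumerate(processes)]
    heapq.heapify(queue)
    seq = len(queue)
    while queue:
        now, _, name, proc = heapq.heappop(queue)
        trace.append((now, name))
        try:
            delay = next(proc)       # run native code until the next waitfor()
        except StopIteration:
            continue                 # process finished
        heapq.heappush(queue, (now + delay, seq, name, proc))
        seq += 1
    return now, trace

def b1():
    # ... functional code would execute here at native host speed ...
    yield 15000                      # waitfor(15000)
    yield 25000                      # waitfor(25000)

def b2():                            # hypothetical second process
    yield 30000

end_time, trace = simulate([("B1", b1()), ("B2", b2())])
```

Both processes run concurrently in logical time, so the simulation ends when the slower of the two (B1, after 15000 + 25000 time units) completes.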
Source-Level Back-Annotation
• Retargetable back-annotation flow
• Intermediate representation (IR)
– Frontend optimizations [gcc]
– IR to C conversion
• Target binary
– Cross-compiler backend [gcc]
– Control-flow graph matching
• Timing and power estimation
– ISS or RTL
– Cycle-accurate timing, power, …
• Back-annotation into IR
– Basic block level
[Figure: back-annotation flow. C source code passes through frontend optimizations (gcc) into the intermediate representation (IR); IR-to-C conversion yields compile-able intermediate code; the backend produces the binary; graph matching of IR and binary gives a mapping table; basic-block timing and energy characterization (uADL ISS, McPAT) gives an augmented mapping table; the back annotator produces the host-compiled (HC) model]
C source code:
  a=b=c=0;
  if(a<=0) { a=1; c=2; }
  ……
  printf(…);
Compile-able intermediate code:
  bb_2: a = 1; b = 0; c = 2; goto bb_7;
  bb_3: …..
  bb_7: printf(…);
Host-compiled (HC) model:
  bb_2: a = 1; b = 0; c = 2;
        incrDelay(15); incrEnergy(2);
        bb = BB_2; goto bb_7;
  bb_3: …..
        incrDelay(delay[bb][BB_3]); incrEnergy(energy[bb][BB_3]);
        bb = BB_3;
  …..
Source: S. Chakravarty, Z. Zhao, A. Gerstlauer. “Automated, Retargetable Back-Annotation for Host-Compiled Performance and Power Modeling," CODES+ISSS’13.
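The table-driven annotation in the host-compiled model (`incrDelay(delay[bb][BB_3])`) can be sketched as follows. This is a Python illustration, not the tool's generated code: the `COST` table, the block `bb_1` and all numbers are hypothetical stand-ins for values that an ISS-based characterization would supply.

```python
# Hypothetical characterization results: (previous block, current block)
# maps to (delay in cycles, energy units). All numbers are illustrative.
COST = {
    (None, "bb_1"): (10, 1),
    ("bb_1", "bb_2"): (15, 2),
    ("bb_1", "bb_3"): (30, 6),
    ("bb_3", "bb_3"): (25, 5),
    ("bb_2", "bb_7"): (8, 1),
    ("bb_3", "bb_7"): (12, 2),
}

def run_annotated(path):
    """Accumulate delay and energy along a sequence of executed basic blocks."""
    delay = energy = 0
    prev = None
    for bb in path:
        d, e = COST[(prev, bb)]      # lookup depends on the predecessor block
        delay += d                   # corresponds to incrDelay(...)
        energy += e                  # corresponds to incrEnergy(...)
        prev = bb
    return delay, energy

short_path = run_annotated(["bb_1", "bb_2", "bb_7"])
loop_path = run_annotated(["bb_1", "bb_3", "bb_3", "bb_7"])
```

Indexing the table by predecessor lets the same block carry different costs on different paths, which is how path-dependent pipeline effects can be folded into a purely source-level model.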
Binary-to-Source/IR Mapping
• Compiler optimizations
• Frontend
– Control flow optimizations
• Backend
– Instruction scheduling/percolation
Mismatches
– Capture frontend by annotating at IR, not source
– Establish binary-IR mapping for back-annotation
Graph matching heuristics
• Synchronized, recursive depth-first traversal
– Compatibility: loop and branch nesting levels
– Cost: sum of unmatched nodes in subgraphs rooted at node
– Return least-cost mapping between all successors (incl. skips)
• Resolve ambiguities using debug information
Timing/Energy Characterization
• Basic block characterization
• Execution depends on state
– Pipeline stalls in case of hazards
– Pipeline overlaps in multi-issue
• Pairwise characterization
– Over all immediate predecessors
– Across function hierarchy
• Timing & energy
– First-to-last instruction fetch time
– Resource utilization statistics
• Back-annotation into IR
• Path-dependent metrics
– Capture static branch prediction
[Figure: basic blocks BB1 and BB2 both lead to BB3; execution flow 1 reaches BB3 with system state SS = A, execution flow 2 with SS = B (SS: system state of registers, memory, pipeline)]
Annotated IR:
  bb_2: a = 1; b = 0; c = 2;
        wait(15); energy(2);
        goto bb_7;
  bb_3: …..
        if (prev_bb == 3) {
          wait(25); energy(5);
        } else if (prev_bb == 1) {
          wait(30); energy(6);
        }
        …..
  bb_7: printf(…);
Cache-Aware Back-Annotation
• Block-level characterization
• Memory address tracing
• Stack, heap
• Hybrid model
• Static/dynamic back-annotation
[Figure: frontend optimizations produce intermediate code; a retargetable backend produces binary code; block-level characterization uses a micro-architecture description and extracts address layout and memory accesses; the annotated code calls a cache model and connects to the TLM]
Code variant using os.wait() and drv.write():
  void f(void) {
  BB1: ...
       os.wait(BB1_DELAY);
       if (c) goto BB2;
  BB2: a[i][j] += sum;
       ...
       os.wait(BB2_DELAY);
  BB3: ...
       os.wait(BB3_DELAY);
       drv.write(res);
  }
Code variant using waitfor() and ch.write() with the cache model:
  void f(void) {
  BB1: ...
       waitfor(BB1_DELAY);
       if (c) goto BB3;
  BB2: a[i][j] += sum;
       alist[__idx] = A_BASE + 4*(i*A_WID+j);
       ...
       miss = cache.upd(__alist, __idx);
       waitfor(BB2_DELAY + miss);
  BB3: ...
       waitfor(BB3_DELAY);
       ch.write(res);
  }
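The hybrid scheme, a static block delay plus a dynamic miss penalty returned by a cache model, can be sketched in Python. The `TagCache` class, its direct-mapped organization, the 20-cycle miss penalty and the values of `A_BASE`, `A_WID` and `BB2_DELAY` are all assumptions for illustration; only the address formula `A_BASE + 4*(i*A_WID+j)` is taken from the annotated code.

```python
class TagCache:
    """Direct-mapped, tag-only cache model (no data values are stored)."""
    def __init__(self, num_lines=64, line_size=32, miss_penalty=20):
        self.num_lines = num_lines
        self.line_size = line_size
        self.miss_penalty = miss_penalty
        self.tags = [None] * num_lines

    def upd(self, addresses):
        """Commit a list of accessed addresses; return the total miss delay."""
        delay = 0
        for addr in addresses:
            line = addr // self.line_size
            index = line % self.num_lines
            tag = line // self.num_lines
            if self.tags[index] != tag:      # miss: fill the line
                self.tags[index] = tag
                delay += self.miss_penalty
        return delay

A_BASE, A_WID = 0x1000, 16                   # hypothetical array layout
BB2_DELAY = 12                               # hypothetical static block delay

def bb2_delay(cache, i, j):
    """Dynamic delay of block BB2 for the access a[i][j]."""
    alist = [A_BASE + 4 * (i * A_WID + j)]   # address of a[i][j], 4-byte elements
    return BB2_DELAY + cache.upd(alist)

cache = TagCache()
first = bb2_delay(cache, 0, 0)               # cold miss
second = bb2_delay(cache, 0, 1)              # same 32-byte line: hit
third = bb2_delay(cache, 1, 0)               # next row, different line: miss
```

The static part (`BB2_DELAY`) is fixed at back-annotation time, while the miss penalty is resolved only during simulation, once the actual values of `i` and `j` are known.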
Source-Level Modeling Accuracy & Speed
• One-time back-annotation overhead
3min. to 3s runtime (function of code size)
Close to cycle-accurate at source-level speeds
>98% timing and energy accuracy @ 2000 MIPS
>95% accuracy @ 160 MIPS including cache
Integrate back-annotation of other metrics
Performance, energy, reliability, power, thermal (PERPT)
[Figure: error [%] on a log scale for Z4 and Z6 timing and energy, with and without cache, across SHA, ADPCM and CRC32 (small and large inputs) and Sieve]
[Figure: simulation speed [MIPS] on a log scale for Source, IR, HC and HC with cache across the same benchmarks]
Application Layer
• Application source code
• C-based process model
– Parallel programming model
– Canonical API & MoC
• Communication primitives
– IPC channel library
• Timing model
• Block-/IR-level granularity
– Capture data-dependent execution
– Capture compiler effects
• Hybrid simulation
– Back-annotation of static aspects from one-time static analysis/estimation
– Simulation of dynamic micro-architecture effects (models of caches, branch predictors, …)
Single task timing model
[Figure: processes P1, P2, P3 connected by channels C1 and C2 on the CPU]
process P3 {
  void main() {
    …
    c1.recv();
    …
    waitfor(5);
    …
    c2.send();
    …
  }
};
Operating System Layer
• Scheduling
• Group processes into tasks
– Static scheduling
• Schedule tasks
– Dynamic scheduling, multitasking
– Preemption, interrupt handling
– Task communication (IPC)
Scheduling refinement
• Flatten hierarchy
• Reorder behaviors
OS refinement
• Insert OS model
• Task refinement
• IPC refinement
[Figure: application processes P1 and P2 running on an OS layer inserted on top of the SLDL]
OS Modeling
• High-level RTOS abstraction
• Specification is fast but inaccurate
– Native execution, truly concurrent model
• Traditional ISS-based validation infeasible
– Accurate but slow (esp. in multi-processor context), requires full binary
Model of operating system (task interleaving in time)
High accuracy but small overhead at early stages
Focus on key effects, abstract unnecessary implementation details
Model all concepts: multi-tasking, scheduling, preemption, interrupts, IPC
[Figure: three abstraction levels (specification, host-compiled model, implementation); in the host-compiled model, tasks T1 and T2 and the channels of the application run on an RTOS model on top of the SLDL]
Source: A. Gerstlauer, H. Yu, D. Gajski. "RTOS Modeling for System-Level Design," DATE'03.
Abstract RTOS Model
• Emulate the sequential execution of concurrent tasks
• Task scheduler
– Maintain task queues, determine task(s) to run & perform context switch
• Timing model
– Simulate back-annotated task delays, call scheduler to allow for preemptions
RTOS Model Interface
• Canonical, target-independent API

interface OSAPI {
  /* OS management */
  void init();
  void start(int sched_alg);
  void interrupt_return();

  /* Task management */
  Task task_create(char *name, int type, sim_time period);
  void task_terminate();
  void task_sleep();
  void task_activate(Task t);
  void task_endcycle();
  void task_kill(Task t);
  Task par_start();
  void par_end(Task t);

  /* Event handling */
  Task pre_wait();
  void post_wait(Task t);

  /* Delay modeling */
  void time_wait(sim_time nsec);
};
RTOS Model Implementation
• RTOS model
• OS, task, event management
– Descriptors & queues
• Context switching
– Block all but active task on SLDL level
• Scheduling
– Select and dispatch task based on algorithm
• Preemption
– Allow rescheduling at simulation time increases
• Event handling
– Remove task temporarily from OS while waiting for SLDL event
RTOS model library
• RTOS models for different scheduling strategies
– Round robin, priority based
• Parametrizable
– Task parameters (priorities)

channel OS implements OSAPI {
  Task current = 0;
  os_queue rdyq;

  void dispatch(void) {
    current = schedule(rdyq);          /* pick next task from ready queue */
    if (current) notify(current.event);
  }
  void yield() {
    Task task = current;
    rdyq.insert(task);
    dispatch();
    wait(task.event);                  /* block until dispatched again */
  }
  void time_wait(time t) {
    waitfor(t);                        /* advance simulated time */
    yield();                           /* preemption point */
  }
  Task pre_wait(void) {
    Task t = current;
    dispatch();
    return t;
  }
  void post_wait(Task t) {
    rdyq.insert(t);
    if (!current) dispatch();
    wait(t.event);
  }
};
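The interplay of `time_wait()` and the scheduler can be mimicked by a small priority-based model. This Python sketch is not the SpecC channel: `simulate_rtos`, the task tuples and all numbers are hypothetical. It only shows that exactly one task advances time at a time and that rescheduling happens at the end of each back-annotated delay.

```python
def simulate_rtos(tasks):
    """tasks: list of (name, priority, release_time, delays).
    A higher priority value wins. Returns {name: finish_time}."""
    now = 0
    pending = sorted(tasks, key=lambda t: t[2])   # tasks not yet released
    ready = []                                    # (priority, name, remaining delays)
    finish = {}
    while pending or ready:
        # Move tasks whose release time has been reached into the ready queue.
        while pending and pending[0][2] <= now:
            name, prio, _, delays = pending.pop(0)
            ready.append((prio, name, list(delays)))
        if not ready:
            now = pending[0][2]                   # idle until the next release
            continue
        ready.sort(key=lambda t: -t[0])           # schedule(): highest priority first
        prio, name, delays = ready[0]
        now += delays.pop(0)                      # time_wait(): advance time, then yield
        if not delays:                            # task has executed all its blocks
            finish[name] = now
            ready.pop(0)
    return finish

# Low-priority task L runs four 25-unit blocks; high-priority task H is
# released at t=50 and needs one 30-unit block.
finish = simulate_rtos([("L", 1, 0, [25] * 4), ("H", 2, 50, [30])])
```

Here H is released exactly at a block boundary of L, so it preempts without delay and finishes at t=80, while L finishes at t=130.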
Task Refinement
• Convert processes into tasks
• Task initialization
– Register task with OS model
• Task activation
– Wait for task start trigger from OS
• Replace delay model
– Trigger rescheduling in OS
Preemption points
• Convert channels into IPC
• Communication and synchronization
– Wrap around SLDL event handling

process task_B2(OSAPI os) {
  Task h;
  void task_create(void) {
    h = os.task_create("B2", APERIODIC, 0);
  }
  void main(void) {
    os.task_activate(h);
    ...
    /* model execution delay */
    os.time_wait(BLOCK1_DELAY);   /* replaces waitfor(BLOCK1_DELAY) */
    ...
    send();
    /* model execution delay */
    os.time_wait(BLOCK2_DELAY);   /* replaces waitfor(BLOCK2_DELAY) */
    ...
    os.task_terminate(h);
  }
  void send() {
    t = os.pre_wait();
    wait(ack);
    os.post_wait(t);
  }
};
Simulated Dynamic Behavior
[Figure: logical-time traces (t0 to t8) of P1, P2, P3, the ISR, channel C1 (c1.send(), c1.recv()), S1 and the bus (bus.recv()). Unscheduled model: processes P2 and P3 advance through waitfor(). Scheduled model: tasks P2 and P3 advance through time_wait() under the OS model.]
Inaccuracy due to timing granularity
OS Modeling Results
• Configurable, generic and flexible OS model
• Configurable scheduling strategies and parameters– Round-robin or priority-based scheduling
Scheduling exploration
Accuracy & speed
• Artificial task set example
Granularity   Avg. speed per core   Avg. err.
1 s           140 MIPS              0.4 %
10 s          1500 MIPS             0.4 %
100 s         9000 MIPS             1.0 %
1000 s        29000 MIPS            8.0 %
Speed and Accuracy Tradeoffs
• Errors in discrete preemption models
• Potentially large preemption errors
– Not bounded by simulation granularity
Automatic Timing Granularity Adjustment (ATGA)
• Observe system state to predict preemption points
• Dynamically and optimally control timing model
• Transparently integrated into OS model
Eliminate preemption errors
[Figure: timeline of a high-priority task T_high and a low-priority task T_low with markers r_l, r_h, f_h, f_l, showing run and idle intervals and the resulting preemption errors]
Sources:
P. Razaghi, A. Gerstlauer. "Predictive OS Modeling for Host-Compiled Simulation of Periodic Real-Time Task Sets," Emb. Sys. Letters '12.
P. Razaghi, A. Gerstlauer. "Automatic Timing Granularity Adjustment for Host-Compiled Software Simulation," ASPDAC'12.
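The simplest case of a preemption error, and the idea of shortening a time step to a predicted preemption point, can be sketched in Python. The function `preemption_time` and its parameters are hypothetical; this is a single-preemption toy model and not the published ATGA algorithm, which also has to handle cases where the release time cannot be predicted.

```python
def preemption_time(release, granularity, predictive=False):
    """Time at which a high-priority task released at `release` gets the CPU
    while a low-priority task advances simulated time in fixed-size steps."""
    now = 0
    while now < release:
        step = granularity
        if predictive:
            # ATGA-style adjustment: end the step at the predicted release time.
            step = min(step, release - now)
        now += step                  # rescheduling is only possible after the step
    return now

err_coarse = preemption_time(release=250, granularity=100) - 250
err_atga = preemption_time(release=250, granularity=100, predictive=True) - 250
```

With fixed steps the release at t=250 is only noticed at t=300, whereas the adjusted step lands exactly on the release time without making the other steps any finer.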
ATGA Model Execution Example
[Figure: ATGA execution timeline (t0 to t7) for tasks T_L, T_M, T_H and interrupt T_Intr, showing task states (Ready, Idle, Wait, Sleep), markers r_TH,1 to r_TH,3 and f_TH,1, f_TH,2, and the OS mode switching between predictive and fall-back]
Fast and accurate
Advanced OS Modeling Approaches
• Conservative
• Predict possible preemption points
• Simulate until next predicted point
• Fall back to fine granularity if prediction is not possible
Automatic timing granularity adjustment (ATGA) [Razaghi’12]
• Optimistic
• Simulate at coarse granularity assuming no preemptions
• Record disturbing influences
• Correct and roll back if necessary
Result-oriented modeling (ROM) [Schirner’08]
Operating System Layer
OS model
• On top of standard SLDL
• Wrap around SLDL primitives, replace event handling
– Block all but active task
– Select and dispatch tasks
• Target-independent, canonical API
– Task management
– Channel communication
– Timing and all events
[Figure: application tasks P2 and P3 running on the OS model on top of the SLDL]
Hardware Abstraction Layer (HAL)
• External communication
• Software drivers
– Presentation, session, network communication layers
– Synchronization (interrupts)
• Hardware/software boundary
– Low-level HW access
– Bus drivers and interrupt handlers
– Canonical HW/SW interface
• External interface
– Bus transactions (TLM)
– Interrupt trigger
Application:
  sample.send(v1);
Driver:
  void send(…) {
    intr.receive();
    bus.masterWrite(0xA000, &tmp, len);
  }
Hardware Layer (1)
• Processor TLM
• HW interrupt handling
– Interrupt logic
» Suspend user code
– Interrupt scheduling
» Priority, nesting
• Peripherals
– Interrupt controller
– Timers
• TLM bus model
– Bus transactions
[Figure: processor TLM showing the HAL and hardware layers]
Hardware Layer (2)
• Cache modeling
• Pure behavioral modeling
– Tag state
– Hits/misses
– Replacement policy
• Integrated into back-annotation
– Called with accessed address trace
– Update cache state
– Return delay penalties
Implemented as SpecC channel
– < 200 lines of code
[Figure: layered processor model (App, OS, HAL, HW) with P1, tasks P2 and P3 and channels C1, C2 on the OS model; interrupt handlers (UsrInt1, UsrInt2, HWInt) for interrupts IntA to IntD (INTA to INTD); a cache model exchanging addresses and delays; bus TLM]
Source: A. Pedram, D. Craven, T. Amimeur, A. Gerstlauer. “Modeling Cache Effects at the Transaction Level," IESS 2009.
Hardware Layer (3)
• Bus-functional model (BFM)
• Pin-accurate processor model
– Timing-accurate bus and interrupt protocols
• Bus model
– Pin- and cycle-accurate
– Driving and sampling of bus wires
Processor Models
[Figure: processor models at increasing levels of detail, stacking OS, HAL, HW-TLM and HW-BFM layers, up to a BFM wrapped around an ISS]
• Processor layers
• Application
– Native, host-compiled C
– Back-annotation
• OS
– OS model
– Middleware, drivers
• HAL
– Firmware
• Processor hardware
– Bus interfaces
– Interrupts
– Cache
Source: G. Schirner, A. Gerstlauer, R. Doemer. “Fast and Accurate Processor Models for Efficient MPSoC Design," TODAES, 2009.
Processor Model Example
• Voice encoding and decoding
• Motorola DSP 56600
– Encoding & decoding tasks
– Custom OS
• 4 custom I/O blocks
• 1 custom HW co-processor
– Codebook search
• Processor models
• Perfect timing
– Back-annotated from ISS
• Priority-based OS model
– EDF: Decoder > Encoder
• HW interrupt scheduling
– 4 non-preempted priority levels
• Reference
• Motorola proprietary ISS
[Figure: DSP running encoder and decoder tasks, with custom hardware for codebook search and four custom I/O blocks (encoder input, encoder output, decoder input, decoder output) attached via DSP port A and interrupts INTA to INTD]
Processor Model Results
• Vocoder example
• 163 speech frames
• Speed vs. accuracy
OS model (Appl Task)
Interrupts (FW TLM)
1800x speed w/ 3% error (vs. cycle-accurate ISS)
Multi-Core OS & Processor Models
• Multi-core OS model
• SMP scheduler model
– Global or partitioned queue
• Configurable parameters
– Number of cores
– FIFO, round-robin, priority-based scheduling policies
– Priorities, affinity, time slice (for round-robin)
• Multi-core processor model
• Multi-core interrupt handling chain models
– Interrupt handlers & tasks
– Configurable generic interrupt controller (GIC) model
• TLM bus interfaces
Source: P. Razaghi, A. Gerstlauer. "Host-Compiled Multi-Core System Simulation for Early Real-Time Performance Evaluation," ACM TECS ‘14.
[Figure: application tasks T1, T2, T3 and channel (CH) on the OS with multi-core scheduler, dispatch and global ready queue; interrupt tasks and I/O driver; HAL with interrupt handlers; TLM with I/O and interrupt interfaces; all running on the SLDL simulation kernel]
Multi-Core OS Model
• Global or partitioned SMP scheduling
• Replicated or shared Ready, Idle, Sleep & Wait queues
• Processor suspension and interrupt handling
• Interrupt handlers as highest-priority OS-internal tasks
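The difference between global and partitioned SMP scheduling can be sketched in a few lines of Python. The functions, the task dictionaries and the `affinity` field are hypothetical names chosen for this illustration; the actual OS model also manages idle, sleep and wait queues, which are omitted here.

```python
def dispatch_global(ready, num_cores):
    """Global queue: the num_cores highest-priority ready tasks run, on any core."""
    ranked = sorted(ready, key=lambda t: -t["prio"])
    return {core: task["name"] for core, task in enumerate(ranked[:num_cores])}

def dispatch_partitioned(ready, num_cores):
    """Partitioned queues: each core runs the best task among those bound to it."""
    running = {}
    for core in range(num_cores):
        local = [t for t in ready if t["affinity"] == core]
        if local:
            running[core] = max(local, key=lambda t: t["prio"])["name"]
    return running

ready = [
    {"name": "T1", "prio": 3, "affinity": 0},
    {"name": "T2", "prio": 2, "affinity": 0},
    {"name": "T3", "prio": 1, "affinity": 1},
]
global_choice = dispatch_global(ready, 2)
partitioned_choice = dispatch_partitioned(ready, 2)
```

With a global queue, T2 runs on the second core because it outranks T3; with partitioned queues, T2 stays queued behind T1 on core 0 and T3 runs on core 1.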
[Figure: interrupt handling split into an ISR and an interrupt task (bottom half)]
Multi-Core Processor Model
• HW/SW interface
• HAL
• HW model
• Memory model
• Multi-core cache model
• Interrupt chain
• Routing
• Detection
• Suspension
• Bottom handler release
• System integration
• I/O driver models, external TLM bus interfaces
[Figure: multi-core processor model with drivers (Drv), a TLM adapter and a MAC]
Interrupt Modeling Example
• Errors in preemption model due to discrete timing
Integrate multi-core ATGA approach
[Figure: interrupt handling on Core 0 and Core 1 over time]
Multi-Core ATGA Model
• Enhanced fallback mode check
• Ignore interrupt handlers in predictive mode
• Model inter-core interrupt notifications
• Adjust predicted times or switch to fallback
Accurate interrupt response times while maintaining speed
But: high-priority interrupt-driven tasks degrade performance
[Figure: average error [%] versus simulation time [s], both on log scales, for conventional and ATGA models under high-, medium- and low-priority interrupts and without interrupts (Intr.H, Intr.M, Intr.L, no Intr.); timing granularities of 10 ms, 100 µs and 1 µs are marked]
Multi-Core Cache Model
• Application model
• Per-core memory access list
– Address, mode, time stamp
• Cache interface
• Hardware layer of processor model
• Generic cache model
• Emulate cache state
– Only tags, no values
– Return hit & miss info
• Parameterizable
– Cache size, line size, associativity, replacement & write-back policy
Source: P. Razaghi, A. Gerstlauer. “Multi-Core Cache Modeling for Host-Compiled Performance Simulation," ESLSyn ‘13.
Multi-Core Cache Simulation
• Directly committing accesses in simulation order
Globally out-of-order in discrete timing model
Multi-Core Cache Simulation
• Delayed reordering of aggregated requests
Multi-Core Out-of-Order Cache (MOOC) model
100% accuracy @ coarse-grain speeds
[Figure: per-core access timelines with safe-to-commit points]
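Delayed reordering can be illustrated with a small buffer that collects timestamped accesses and releases them in global time order once they are safe to commit. This Python sketch is a simplification and not the published MOOC algorithm: the `ReorderBuffer` class is hypothetical, and it assumes each core issues its own accesses in non-decreasing time order.

```python
import heapq

class ReorderBuffer:
    """Collect timestamped accesses per core and commit them in global time order."""
    def __init__(self, num_cores):
        self.pending = []                     # min-heap of (time, core, address)
        self.core_time = [0] * num_cores      # local time each core has reached

    def access(self, core, time, addr):
        heapq.heappush(self.pending, (time, core, addr))
        self.core_time[core] = max(self.core_time[core], time)

    def commit(self):
        """Return accesses that are safe to commit: their time stamp does not
        exceed the local time of the slowest core."""
        safe_until = min(self.core_time)
        done = []
        while self.pending and self.pending[0][0] <= safe_until:
            done.append(heapq.heappop(self.pending))
        return done

buf = ReorderBuffer(num_cores=2)
buf.access(0, 10, 0xA0)                       # core 0 is simulated first ...
buf.access(0, 30, 0xB0)
early = buf.commit()                          # core 1 is still at t=0: nothing is safe
buf.access(1, 5, 0xC0)                        # ... then core 1 catches up
buf.access(1, 20, 0xD0)
late = buf.commit()                           # committed in time-stamp order
```

Although core 0 was simulated ahead of core 1, the shared cache state would see the accesses at t=5, 10 and 20 in their correct global order; the access at t=30 stays buffered until core 1 has advanced past it.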
Platform Simulation Example
• Cellphone baseband MPSoC
• Design space exploration: mapping & scheduling
Full-system simulation in close to real time
• 1400 MIPS at > 99% timing accuracy
MPSoC Exploration Results
[Figure: average frame delay [ms] and average frame error [%] (log scale) for MP3 and JPEG in single-core, dual-core with core-attached interrupt, and dual-core with task-attached interrupt configurations; series HCSim.TLM and HCSim.TLM.no_Intr with their respective errors]
Lecture 8: Outline
Processor layers
Application
Task/OS
Firmware
Hardware
• Processor synthesis
• Software synthesis
• Hardware synthesis
Software Synthesis
Automatically generate target binaries from TLM
Generate code for application (tasks and IPC)
Synthesize firmware (drivers, interrupt handlers)
OS wrappers and HAL implementations from DB
Compile and link against target RTOS and libraries
[Figure: generated software stack (application, drivers, RTOS, HAL) running on an ISS, connected to a MAC]
Source: G. Schirner, A. Gerstlauer, R. Doemer. “Automatic Generation of Hardware dependent Software for MPSoCs from Abstract System Specifications,” ASPDAC08
Processor Implementation Models
• Software C model
• Generated application C code
– Flat standard ANSI C code
• Firmware and hardware models
– RTOS model, HAL model
– Low-level & hardware interrupt handling
– External bus communication protocol/TLM
• Software ISS model
• Reintegrated processor ISS
– Bus-functional ISS wrapper
• Running generated binary
– Application, RTOS, drivers, HAL
[Figure: bus-functional model consisting of a hardware shell (bus protocol, interrupts nIRQ and nFIQ) around the core ISS accessed through the ISS API (lib); the ISS executes CPU_1.bin containing the SW application, drivers, RTOS with RAL, interrupt handling and HAL]
Lecture 8: Outline
Processor layers
Application
Task/OS
Firmware
Hardware
• Processor synthesis
Software synthesis
• Hardware synthesis
Hardware Synthesis
• C-to-RTL high-level synthesis (HLS)
• Allocation, scheduling, binding
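[Figure: high-level synthesis flow. Scheduling turns the C code into a behavioral RTL model (HW_FSMD); binding and netlisting turn it into a structural RTL model (HW_RTL) with a controller (states s1 to s6, control word ctrl=10…10) and a datapath (register file (RF), functional unit (FU), buses b1 to b3, bus interface, CLK); together they form the HW BFM]
Input C code:
  ……
  y = 3*x;
  i = 0;
  do {
    d += y * i;
    i++;
  } while (i < 10);
  h = d + d;
  ……
Scheduled FSMD states:
  s1: y=3*x
  s2: i=0
  s3: t=y*i
  s4: d+=t
  s5: i++
  s6: h=2*d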
Source: D. Shin, A. Gerstlauer, R. Doemer, D. Gajski. “An Interactive Design Environment for C-based High-level Synthesis of RTL Processors," TVLSI, 2008.
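The scheduled FSMD can be checked against the C code with a short Python sketch. Several points are assumptions, not taken from the slide: one state executes per clock cycle, the loop test `i < 10` is folded into state s5, and `d` starts at 0 (the C excerpt does not show its initialization). The function names are hypothetical.

```python
def run_fsmd(x, d=0):
    """Execute the six-state FSMD one state per clock cycle; return (h, cycles)."""
    y = i = t = h = 0
    state, cycles = "s1", 0
    while state != "done":
        cycles += 1
        if state == "s1":
            y = 3 * x
            state = "s2"
        elif state == "s2":
            i = 0
            state = "s3"
        elif state == "s3":
            t = y * i
            state = "s4"
        elif state == "s4":
            d += t
            state = "s5"
        elif state == "s5":
            i += 1
            state = "s3" if i < 10 else "s6"   # loop back or leave the loop
        elif state == "s6":
            h = 2 * d
            state = "done"
    return h, cycles

def reference(x, d=0):
    """Direct transcription of the C code for comparison."""
    y, i = 3 * x, 0
    while True:                                # do { ... } while (i < 10);
        d += y * i
        i += 1
        if not i < 10:
            break
    return d + d

h, cycles = run_fsmd(1)
```

Under these assumptions the loop body takes three states per iteration, so the schedule needs 2 + 10 × 3 + 1 = 33 cycles, and the FSMD result matches the C reference.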
Modeling of Hardware in SoC Design
• RTL Modeling
• State modeling: Accellera RTL Semantics Standard
– Style 1: unmapped
» a = b * c;
– Style 2: storage mapped
» R1 = R1 * RF2[4];
– Style 3: function mapped
» R1 = ALU1(MULT, R1, RF2[4]);
– Style 4: connection mapped
» Bus1 = R1;
» Bus2 = RF2[4];
» Bus3 = ALU1(MULT, Bus1, Bus2);
– Style 5: exposed control
» ALU_CTRL = 011001b;
» RF2_CTRL = 010b;
» …
http://www.eda.org/alc-cwg/cwg-open.pdf
Source: R. Doemer
SpecC RTL Modeling
RTL modeling example:

behavior FSMD_Example(
  signal in  bool      CLK,       // system clock
  signal in  bool      RST,       // system reset
  signal in  bit[31:0] Inport,    // input ports
  signal in  bit[1]    Start,
  signal out bit[31:0] Outport,   // output ports
  signal out bit[1]    Done)
{
  void main(void)
  {
    fsmd(CLK)                     // clock + sensitivity
    {
      bit[32] a, b, c, d, e;      // local variables

      { Outport = 0;              // default
        Done = 0b;                // assignments
      }
      if (RST) { goto S0;         // reset actions
      }
      S0 : { if (Start) goto S1;
             else goto S0;
           }
      S1 : { a = b + c;           // state actions
             d = Inport * e;      // (register transfers)
             Outport = a;
             goto S2;
           }
      ...
    }
  }
};

Source: R. Doemer
SpecC RTL Modeling
Mapped RTL example: same FSMD_Example, with declarations and state S1 in Accellera style 1 (unmapped):

  bit[32] a, b, c, d, e;          // unmapped variables

  S1 : { a = b + c;               // Accellera style 1
         d = Inport * e;          // (unmapped)
         Outport = a;
         goto S2;
       }

Source: R. Doemer
SpecC RTL Modeling
Mapped RTL example: same FSMD_Example, with declarations and state S1 in Accellera style 2 (storage mapped):

  buffered[CLK] bit[32] RF[4];    // register file

  S1 : { RF[0] = RF[1] + RF[2];   // Accellera style 2
         RF[3] = Inport * RF[4];  // (storage mapped)
         Outport = RF[0];
         goto S2;
       }

Source: R. Doemer
SpecC RTL Modeling
Mapped RTL example: same FSMD_Example, with declarations and state S1 in Accellera style 3 (function mapped):

  buffered[CLK] bit[32] RF[4];         // register file

  S1 : { RF[0] = ADD0(RF[1], RF[2]);   // Accellera style 3
         RF[3] = MUL0(Inport, RF[4]);  // (function mapped)
         Outport = RF[0];
         goto S2;
       }

Source: R. Doemer
SpecC RTL Modeling
Mapped RTL example: same FSMD_Example, with declarations and state S1 in Accellera style 4 (connection mapped):

  buffered[CLK] bit[32] RF[4];    // register file
  bit[32] BUS0, BUS1, BUS2;       // busses

  S1 : { BUS0 = RF[1];            // Accellera style 4
         BUS1 = RF[2];            // (connection mapped)
         BUS3 = ADD0(BUS0, BUS1);
         RF[0] = BUS3;
         ...
         goto S2;
       }

Source: R. Doemer
SpecC RTL Modeling
Mapped RTL example: same FSMD_Example, with declarations and state S1 in Accellera style 5 (exposed control):

  signal bit[5:0] RF_CTRL;        // control wires
  signal bit[1:0] ADD0_CTRL, MUL0_CTRL;

  S1 : { RF_CTRL = 011000b;       // Accellera style 5
         ADD0_CTRL = 01b;         // (exposed control)
         MUL0_CTRL = 11b;
         ...
         goto S2;
       }

Source: R. Doemer
Lecture 8: Summary
• Host-compiled computation modeling
• Model of software running in execution environment
– Timed application, OS, bus drivers, interrupt handlers
– Processor hardware model, suspension, bus interfaces
Virtual platform prototype
Embedded software development and validation
Viable complement to ISS-based validation
• Backend processor synthesis
• Software synthesis
– Code generation, RTOS targeting, cross-compilation & linking
– Fully automatic final target binary generation
• Hardware synthesis
– High-level/behavioral synthesis: allocation, scheduling, binding
– Interactive C-to-RTL synthesis flow