Page 1: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Jiwon Hahn

PhD Qualifying Exam, University of California, Irvine, March 2006

Page 2: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Motivation ▶ Embedded system development

Growing challenges
• Increasing end-user expectations: more functionality, higher performance, cheaper, smaller
• Very short time-to-market
• Wide gap between available techniques and user satisfaction

Need new tools and methodology!

[Figure: example applications: physiological sensing, motion sensing, structural health monitoring, preterm infant monitoring, Eco node]

Page 3: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Strategies

• Speed up the development! Need better programming/debugging methodology and tools.
• Improve the current system’s bottleneck! The memory unit is one of the most costly components, and it affects the system’s performance, power, and overall application range.
• Maximize the system’s capability! Since an embedded system is resource constrained, it helps to partition the system workload to the host.

Page 4: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

About My Research

Framework
• Enhanced programming/debugging methodology
• Host-assisting runtime environment

Optimization
• Reducing data memory requirements and increasing memory utilization
• Power and performance co-optimization

Page 5: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Outline

• Scripting Framework
• Memory-oriented Optimization
• Implementation
• Experimental Platforms
• Summary & Research Plan

Page 6: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Outline

▶ Scripting Framework
   ⊳ Scripting Engine Synthesis
   ⊳ Runtime Environment
   ⊳ Preliminary Results
• Memory-oriented Optimization
• Implementation
• Experimental Platforms
• Summary & Research Plan

Page 7: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Motivating Example ▶ Building a small embedded system

Application
• Temperature sensor: sense temperature, send to the host every 5 min.

Platform: TecO particle
• 17 x 35 mm
• PIC18LF452 at 20 MHz
• 32 KB program Flash
• 1.5 KB RAM
• 32 KB external EEPROM
• temperature sensor, RF interface, etc.

Hardware
• Solder RF module

Software (or firmware)
• no OS support!
• no interactivity
• no partial testing

Development cycle (repeat):
1. Write the FW (C/assembly)
2. Compile
3. Connect board to the host
4. Enter the bootloading mode
5. Erase/Load/Verify Program
6. Restart the board
7. Run

Page 8: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Motivation ▶ Alternative approach: Scripting!

Environment Setup (Scripting Engine Synthesis)
1. Generate the FW (Scripting engine synthesis)
2. Compile
3. Connect board to the host
4. Enter the bootloading mode
5. Erase/Load/Verify Program
6. Restart the board
7. Run

Scripting (Runtime; repeat)
1. Write the script
2. Connect board to the host
3. Load & Run

Page 9: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Motivation ▶ Scripting vs. Traditional Programming

| Aspects | Traditional | Scripting |
|---|---|---|
| Language | C, Assembly (less human readable) | Python, Tcl, Perl, … (higher level) |
| System Query | No interactivity; need oscilloscope, multimeter to check the status | Instant feedback |
| System Update | Recompile, reboot required | On-the-fly |
| Code Size | 5x ~ 10x more lines [J. Ousterhout ’98] | Shorter |
| Performance Overhead | None | Scripting engine-dependent (could be none or less) |

Page 10: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Related Work ▶ Frameworks for runtime support

| Name | high level (language) | interactivity | reconfigurability | kernel synthesis | hetero. sys. | code size |
|---|---|---|---|---|---|---|
| SOS | no (C) | no | yes* | yes | yes | 20K |
| Mate | no (asm-like) | no | yes* | no | no | 39K |
| TinyOS | no (nesC) | no | yes | yes* | no | 18K |
| Agilla | no (asm-like) | yes | yes* | no | no | 55K |
| Pushpin | no (C-subset) | no | yes* | no (berthaOS) | no | 34K |
| Sensorware | yes* (Tcl) | yes | yes* | no | no | >237K |
| Actornet | yes* (S-expression) | N/A | yes | no | no | <128K |
| VM* | yes (java) | no | yes* | yes | N/A | 25K |
| Our work | yes (python-like) | yes | yes | yes | yes | <17K |

Page 11: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Our Framework: Rappit ▶ Overview

Framework to provide the user an integrated scripting environment of the host and target systems

[Figure: the Host (Rappit S/W, application script) is connected over a wired/wireless link to the Target System (Rappit F/W, device drivers, H/W device).]

Example interaction:
>> readTemperature()
1. Receive packets
2. Interpret the command
3. Execute primitives (e.g., ADC read)
4. Return the result
52

Page 12: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Rappit ▶ Scripting engine synthesis

[Figure: synthesis flow. The System Description (Architecture, Application, Communication) and the Component Library feed Code Synthesis, which produces the Target F/W (Scripting Engine, Primitives, …) as a binary executable for the Target System and the Host S/W (Parser, MsgGen, GUI, …) with an interactive language for the Host. Both sides share a compatible message format.]

System Description

# example: pin mapping for an RF module
mcu = MCU(ATmega169)     # instantiate an atmega169 MCU
import RF                # load a transceiver module
rf = RF(nRF2401)         # instantiate nRF2401
rf.CS = mcu.PORTB[0]     # connect the chip select pin
rf.CE = mcu.PORTB[1]     # connect the chip enable pin
rf.DR1 = mcu.PORTB[2]    # connect the data ready pin
rf.CLK1 = mcu.PORTF[1]   # connect the clock pin
rf.DOUT1 = mcu.PORTF[2]  # connect the data pin

# example: packet format
c_format = src(1),dst(1),msgID(1),opcode(1),arg(3),crc(1)
r_format = src(1),dst(1),msgID(1),mtype(1),dtype(1),\
           data(v), crc(1),eop(1)

Generated target F/W

// part of Scripting engine
switch (opcode) {
    case 0x00: val = ADC_read(); break;
    case 0x01: RF_send(val); break;
    case 0x02: RF_packetize(val); break;
    …
}

// part of primitives
char ADC_read(void) { … }

void RF_send(char pck) { … }
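As a rough illustration of the command packet layout c_format = src(1), dst(1), msgID(1), opcode(1), arg(3), crc(1), the sketch below packs one command on the host side. It is only a sketch under stated assumptions: the function names `pack_command` and `xor_checksum` are hypothetical, and the slides do not say which CRC Rappit uses, so a one-byte XOR checksum stands in for it.

```python
import struct

def xor_checksum(payload: bytes) -> int:
    """Stand-in for the crc(1) field (assumption: the real CRC is not specified)."""
    crc = 0
    for byte in payload:
        crc ^= byte
    return crc

def pack_command(src: int, dst: int, msg_id: int, opcode: int, args: bytes = b"") -> bytes:
    """Pack src(1), dst(1), msgID(1), opcode(1), arg(3), crc(1) into 8 bytes."""
    if len(args) > 3:
        raise ValueError("arg field holds at most 3 bytes")
    # "3s" pads short argument fields with zero bytes.
    body = struct.pack("<BBBB3s", src, dst, msg_id, opcode, args)
    return body + bytes([xor_checksum(body)])

# Opcode 0x00 maps to ADC_read() in the switch statement on the slide.
packet = pack_command(src=0x01, dst=0x4A, msg_id=0x07, opcode=0x00, args=b"\x05")
print(packet.hex())
```

A fixed-width layout like this keeps parsing on the node down to indexing into a byte array, which fits the goal of a thin scripting engine.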

Page 13: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Rappit ▶ Runtime environment

[Figure: runtime architecture spanning the Host and the Target System, which exchange command and response messages. Block labels: GUI, Parser, Optimizer, Msg Generator, Component Library, Packet Manager, Pck Buffer, Pcktzer/Depcktzer, Pcktzer/Dispatcher, Scripting Engine, Admission Controller, Native Routines. One group of blocks is marked "Host Assisting modules".]

Page 14: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Rappit ▶ Host assistance

Script Parsing (Parser)
“readTemp()” → Host Parser, Msg. generator → “0x4A0x01” → to target node
• Input: user-friendly syntax
• Output: easy to parse at node; compact and efficient representation

Memory Management (Optimizer)
Raw script → Script Scheduler, Buffer Mapper → Optimized script → to target node
• Raw script: written by user
• Optimized script: minimal script size, minimized memory usage, minimized runtime overhead (fixed schedule and buffer usage)

Page 15: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Rappit ▶ Scripting examples

Interactive port-setting

>> PORTA[2] = 1 # toggle clock

>> PORTA[2] = 0

>> PORTA[1] = 1 # set port A pin 1

>> PORTA[0] # read input pin

0

>> PORTA[2] = 1

>> PORTA[2] = 0 # toggle clock

>> PORTA[0] # read input pin

1

System configuration

>> mcu.sysclock = 1 MHz

>> uart.baudrate = 9600 bps

>> rf.power = -5 db

>> rf.speed = 1 Mbps

>> rf.config # query

{’payload’: 1, ’power’: -5,

’speed’: 1000000,

’channel’: 100, ’mode’: ’TX’}

Periodic-task scheduling

>> s = (every 50 ms: sample())

>> s.start()

>> s.stop()

Page 16: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Rappit ▶ Experimental platform

AVR Butterfly Board
• Atmel ATmega169 8-bit MCU @ 8 MHz, 512 B EEPROM, 1 KB SRAM, 16 KB program flash
• Includes dataflash, speaker, sensors, joystick, LCD
• USART serial link at 9600 baud

[Figure: AVR Butterfly; AVR Butterfly w/ wireless module]

Page 17: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Rappit ▶ Experimenting metrics and modality

Observation Metrics

| Metric | Unit |
|---|---|
| Code size | Bytes |
| Execution Speed | Cmds/sec |

Execution Modality

| Modality | Approach | Programming Method |
|---|---|---|
| Native | Compiled | Program the firmware onto the Flash |
| Batch | Scripting | Preload a script program onto the RAM |
| Interactive | Scripting | Send one line of command to the RAM |

Page 18: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Rappit ▶ Preliminary results

Code size reduction
• 61.8 – 66.3% reduction
• Scripting engine consists of a thin layer
• Most of the reduction is in application code size

Performance overhead
• Batch mode scripting can be faster than native!
• Observed up to 25.7% speed-up

Page 19: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Outline

• Scripting Framework
▶ Memory-oriented Optimization
   ⊳ Memory Optimization
   ⊳ Multi-metric Optimization
• Implementation
• Experimental Platforms
• Summary & Research Plan

Page 20: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Motivating Example ▶ Installing Rappit primitives on Butterfly

Problem arises
1. Choose primitives: ADC_read, RF_send, RF_read, SD_write, SD_read, …
2. Compile & install
3. Runtime error! Why? Exceeded 1KB RAM usage

Problem analysis

static unsigned char sd_buffer[512];
static unsigned char rf_buffer[30];
static unsigned char ADC_buffer[30];
char error_msg1[] = "No SD Card detected!";
char error_msg2[] = "Card Read Error!";

[Figure: 1KB SRAM map before optimization: .data, .bss, heap, stack, with SD_buffer (512B), RF_buffer, ADC_buffer and static strings allocated separately.]

Solution
• Sharing memory space (memory sharing)
• Mapping static data to dataflash

[Figure: 1KB SRAM map after optimization: Shared_buffer, heap, stack; the static strings are mapped to dataflash. A region is labeled "600B ?".]

Result
• Increased board capability
• Increased application range
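The effect of the two fixes can be checked with simple arithmetic on the declarations shown on this slide. The sketch below is illustrative only: it assumes the three buffers are never live at the same time (so they can overlay one region) and that each C string costs its length plus one NUL byte; the function names are hypothetical.

```python
# Static allocations from the slide; sizes in bytes.
buffers = {"sd_buffer": 512, "rf_buffer": 30, "ADC_buffer": 30}
strings = ["No SD Card detected!", "Card Read Error!"]

def static_footprint(buffers, strings):
    """SRAM bytes when every buffer and string gets its own space."""
    string_bytes = sum(len(s) + 1 for s in strings)  # +1 for the C NUL terminator
    return sum(buffers.values()) + string_bytes

def shared_footprint(buffers):
    """SRAM bytes when buffers overlay one region and strings move to dataflash.

    Assumption: no two buffers hold live data at the same time.
    """
    return max(buffers.values())

before = static_footprint(buffers, strings)
after = shared_footprint(buffers)
print(before, after, before - after)
```

On a 1KB SRAM that also has to hold the heap and the stack, the difference between the two figures is what decides whether the primitives fit.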

Page 21: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Data Memory Minimization ▶ Assumptions and Approach

Assumptions
• Optimizing scripts: script size, buffer size
• Optimizing at runtime: need a low-complexity algorithm

Approach
• High-level optimization
• Using scheduling and buffer mapping techniques
• Priority on data memory minimization
• Based on model of computation (MoC)

Page 22: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Models of Computation (MoC)

Synchronous Dataflow (SDF) [E. Lee ’87]
• Extensively used as a specification for block-diagram based programming environments for signal processing
• Special case of dataflow
• No notion of time
• The number of tokens (= data) consumed and produced by each actor (= node) during each firing (= invocation) cycle is statically fixed.

Fractional Rate Dataflow (FRDF) [H. Oh, S. Ha ’02]
• Extension of SDF that allows fractional flow of I/O samples of the original SDF

Page 23: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Why SDF?

• Formal representation for optimization, simulation and analysis
• System-level optimization: application flow of various primitives
• Static scheduling
   ⊳ Minimize runtime overhead for resource-constrained embedded systems
   ⊳ Deadlock detection
   ⊳ Bounding the memory requirements
• Good match for sensor applications: collect data, process, transmit

Page 24: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

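SDF ▶ Notations

SDF graph G = (V, E, p, c)
• V = {v1, v2, …, v|V|}
• E = {e1, e2, …, e|E|}
• src(e): source node
• snk(e): sink node
• p(e): produce rate
• c(e): consume rate

Topology matrix T(e, v):

\[
T(e, v) =
\begin{cases}
p(e) & \text{if } v = \mathrm{src}(e) \\
-c(e) & \text{if } v = \mathrm{snk}(e) \\
0 & \text{otherwise}
\end{cases}
\]

[Figure: chain v1 →(1, e1, 2)→ v2 →(2, e2, 1)→ v3 →(3, e3, …)→ … →(…, e|E|, 5)→ v|V|. Each edge e1 is annotated, from left to right, with src(e1), p(e1), c(e1), snk(e1).]

\[
T =
\begin{array}{c|ccccc}
 & v_1 & v_2 & v_3 & \cdots & v_{|V|} \\ \hline
e_1 & 1 & -2 & 0 & \cdots & 0 \\
e_2 & 0 & 2 & -1 & \cdots & 0 \\
e_3 & 0 & 0 & 3 & \cdots & \cdots \\
\vdots & & & & & \\
e_{|E|} & 0 & 0 & 0 & \cdots & -5
\end{array}
\]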
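Given the produce and consume rates of each edge, the minimum number of firings per actor follows from requiring that every edge receives as many tokens as it loses over one schedule period. The sketch below is one simple way to compute it for a connected graph, by propagating rational firing ratios along the edges; the function name and the edge tuple format are assumptions made for illustration, not part of Rappit.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def repetition_vector(actors, edges):
    """Smallest positive integer firing counts that balance every edge.

    edges: list of (src, snk, produce, consume) tuples.
    Assumes a connected graph; raises ValueError if rates are inconsistent.
    """
    rate = {actors[0]: Fraction(1)}
    pending = list(edges)
    while pending:
        progressed = False
        for edge in list(pending):
            src, snk, p, c = edge
            if src in rate and snk in rate:
                if rate[src] * p != rate[snk] * c:
                    raise ValueError("inconsistent rates: no periodic schedule")
            elif src in rate:
                rate[snk] = rate[src] * p / c
            elif snk in rate:
                rate[src] = rate[snk] * c / p
            else:
                continue  # neither end reached yet; retry on a later pass
            pending.remove(edge)
            progressed = True
        if not progressed:
            raise ValueError("graph is not connected")
    # Scale the rational ratios to the smallest integer vector.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    counts = {a: int(r * scale) for a, r in rate.items()}
    common = reduce(gcd, counts.values())
    return {a: n // common for a, n in counts.items()}

# Three-actor chain with rates A -(2,1)-> B -(1,1)-> C.
print(repetition_vector(["A", "B", "C"], [("A", "B", 2, 1), ("B", "C", 1, 1)]))
```

For the chain in the example, A fires once while B and C fire twice, which is the firing count behind schedules such as A B B C C.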

Page 25: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

SDF ▶ Example

Surge Application
• Actors: A (ADCread), B (RFpack), C (RFsend)
• Buffers: x, y
• Graph: A →(1, x, 1)→ B →(1, y, 1)→ C
• Schedule: ABC

Rappit Script (4L):

every 2048:
    x = ADC.read()
    y = RF.pack(x)
    RF.send(y)

Page 26: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

SDF ▶ Example (cont’d)

Same code in Java (20L) [J. Koshy ’05]:

SurgePacket sgPkt;
char eList, eVector;
byte sHandle;
sgPkt = new SurgePacket();
eList = Select.setEventId( eList,
    Events.TIMEOUT | Events.RADIO_RECV );
sHandle = Select.requestSelectHandle();
char val;
Clock.startTimeout( 2048 );
while (true) {
    eVector = Select.select(sHandle, eList);
    if (Select.eventOccurred( eVector, Events.TIMEOUT )) {
        val = PhotoSensor.sense();
        sgPkt.setReading( val );
        Surge.sendPacket( sgPkt );
        Clock.startTimeout( 2048 );
    } else if (Select.eventOccurred( eVector, Events.RADIO_RECV )) {
        handleRadioEvent( sgPkt ); // if base, forward to uart
    }
}

Page 27: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Problem Statements

1. Find the best schedule and buffer mapping that minimizes the buffer size requirement
   • Goal-oriented
   • Previous work

2. Find the best schedule and buffer mapping that fits into, and maximizes the utilization of, a given memory size
   • Constraint-driven
   • Novel
   • Practical

Page 28: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

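Buffer Mapping Problem ▶ Spatial representation

Token-lifetime chart (t-chart)
• row: a token’s lifetime (produced → placed → consumed)
• column: fixed number of token changes caused by a firing event

[Figure: t-chart for the firing sequence A B B C C. Horizontal axis: time; vertical axis: local buffer (x, y). Tokens t1 and t2 occupy buffer x for 2 and 3 columns; tokens t3 and t4 occupy buffer y for 3 columns each.]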

Page 29: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Buffer Mapping Problem ▶ Spatial representation (cont’d)

Memory-usage profile (m-profile)

Metrics
• Msize = 4, Mtotal = 20, Mused = 11, Mwasted = 9, Mutil = 55%
• T = 5

[Figure: m-profile for the firing sequence A B B C C. Horizontal axis: time; vertical axis: memory.]
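The numbers on this slide are consistent with reading Mtotal as Msize × T, Mused as the sum of all token lifetimes, Mwasted as the remainder, and Mutil as Mused / Mtotal. These relations are inferred from the values rather than stated on the slide, so the sketch below should be taken as one plausible reading; `memory_metrics` is a hypothetical helper.

```python
def memory_metrics(lifetimes, m_size, t):
    """Memory-usage metrics for a schedule.

    lifetimes: number of time columns each token occupies a buffer slot.
    m_size:    number of buffer slots allocated.
    t:         schedule length in firing units.
    Assumed relations: M_total = M_size * T, M_used = sum of lifetimes.
    """
    m_total = m_size * t
    m_used = sum(lifetimes)
    return {
        "M_total": m_total,
        "M_used": m_used,
        "M_wasted": m_total - m_used,
        "M_util": m_used / m_total,
    }

# Tokens t1..t4 under schedule A B B C C with unshared buffers.
metrics = memory_metrics(lifetimes=[2, 3, 3, 3], m_size=4, t=5)
print(metrics)
```

With the same relations, a 3-slot mapping holding the same 11 token-columns over T = 5 gives 11 / 15, about 73%, which matches the shared-buffer figure reported later in the talk.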

Page 30: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Related Work ▶ Data memory optimization based on MoC

| Technique | Group | Idea |
|---|---|---|
| Optimal scheduling | [Bhattacharyya et al] in Ptolemy Group | Buffer minimized by optimal scheduling; optimize each local buffer |
| Buffer sharing by lifetime analysis | [Bhattacharyya et al] in Ptolemy Group, [Ha et al] in PeaCE group, [Ritz et al] in Meyr Group | Local buffer lifetime is analyzed to share global buffers |
| Buffer merging | [Bhattacharyya et al] in Ptolemy Group | Input/output buffer is shared (finer grain than buffer sharing) |
| Model checking | [Geilen et al] in Eindhoven Univ. | Reduced the problem to a model-checking problem on the state space of the SDF graph |
| Etc. (MBRO, PAPS, MRSP, …) | [Govindarajan et al] in Gao Group, [Peperstraete et al], [Goddard et al], [Ade et al] in GRAPE group | Rate-optimal / vectorization / application to real-time systems / etc. |

Page 31: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Memory Optimization Techniques

1) *Scheduling w/ Unshared Buffer
2) *Buffer Sharing
3) *I/O Buffer Merging
4a) **Fractionizing
4b) Rate Selection (new)
5) Pipelining (new)

* Well established previous work
** Recently proposed

Page 32: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Memory Optimization Techniques ▶ 1) Scheduling with unshared buffer

• By efficient ordering of actors, the buffer requirement is reduced!
• Each edge is directly mapped to its dedicated buffer space

SDF graph: A →(2, x, 1)→ B →(1, y, 1)→ C

Schedule 1: A B B C C

    x = A()
    repeat 2: y = B(x)
    repeat 2: C(y)

    x[0..1] = A()
    y[0] = B(x[0])
    y[1] = B(x[1])
    C(y[0])
    C(y[1])

Buffer requirement: |x| + |y| = 2 + 2 = 4

Schedule 2: A B C B C

    x = A()
    repeat 2:
        y = B(x)
        C(y)

    x[0..1] = A()
    y[0] = B(x[0])
    C(y[0])
    y[0] = B(x[1])
    C(y[0])

Buffer requirement: |x| + |y| = 2 + 1 = 3
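The two buffer requirements can be reproduced by replaying each schedule and recording the peak number of tokens on every edge. The sketch below does this for the A, B, C example; it assumes each edge has its own dedicated buffer and that an actor consumes its inputs before producing its outputs. The function name and the edge dictionary format are illustrative.

```python
def peak_tokens(edges, schedule):
    """Replay a schedule and return the peak token count per edge.

    edges: {name: (src, snk, produce, consume)}
    schedule: iterable of actor names in firing order.
    """
    tokens = {name: 0 for name in edges}
    peak = dict(tokens)
    for actor in schedule:
        # Consume inputs first; a shortage means the schedule is invalid.
        for name, (src, snk, produce, consume) in edges.items():
            if snk == actor:
                if tokens[name] < consume:
                    raise ValueError(f"{actor} fired without enough tokens on {name}")
                tokens[name] -= consume
        # Then produce outputs and update the high-water mark.
        for name, (src, snk, produce, consume) in edges.items():
            if src == actor:
                tokens[name] += produce
                peak[name] = max(peak[name], tokens[name])
    return peak

edges = {"x": ("A", "B", 2, 1), "y": ("B", "C", 1, 1)}
for schedule in ("ABBCC", "ABCBC"):
    peaks = peak_tokens(edges, schedule)
    print(schedule, peaks, sum(peaks.values()))
```

Interleaving B and C lets the single slot of y be reused within the period, which is where the saving of one slot comes from.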

Page 33: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Memory Optimization Techniques ▶ Comparing 1), 2), 3)

SDF graph: A →(2, x, 1)→ B →(1, y, 1)→ C
Schedule: A B B C C

    x = A()
    repeat 2: y = B(x)
    repeat 2: C(y)

1) Unshared Buffer

    x[0..1] = A()
    y[0] = B(x[0])
    y[1] = B(x[1])
    C(y[0])
    C(y[1])

Buffer requirement: |x| + |y| = 2 + 2 = 4

2) Shared Buffer
Once B(x[0]) has consumed its data, x[0] is free: reuse the available space!

    x[0..1] = A()
    y[0] = B(x[0])
    x[0] = B(x[1])
    C(y[0])
    C(x[0])

Buffer requirement: |x| + |y| = 2 + 1 = 3

3) Merged I/O Buffer
Assuming the token is consumed before the output is produced, B(x[0]) and B(x[1]) use the same space (x[0], x[1]) for the input/output tokens.

    x[0..1] = A()
    x[0] = B(x[0])
    x[1] = B(x[1])
    C(x[0])
    C(x[1])

Buffer requirement: |x| + |y| = 2 + 0 = 2

Page 34: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

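Memory Optimization Techniques ▶ Comparing 1), 2), 3) (cont’d)

[Figure: t-charts of tokens t1 to t4 for the firing sequence A B B C C under each technique. Horizontal axis: time; vertical axis: local buffer (x, y).]

Msize below is the buffer requirement |x|+|y|.

| Metric | 1) Unshared Buffer | 2) Shared Buffer | 3) Merged I/O Buffer |
|---|---|---|---|
| Msize | 4 | 3 | 2 |
| Mtotal | 20 | 15 | 10 |
| Mused | 11 | 11 | 9 |
| Mwasted | 9 | 4 | 1 |
| Mutil | 55% | 73% | 90% |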

Page 35: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Memory Optimization Techniques ▶ 4a) Fractionizing

Idea
• Don’t wait until A produces a big chunk of data
• Modify actor A to process only a fractional amount of the original data at a time

Trade-off
• Local effect: possible time and energy overhead (e.g., resource access time, packet overhead)
• Global effect
   ⊳ Reduced bottleneck: shorter processing interval of A
   ⊳ Reduced buffer size: min|x|: 2 → 1

[Figure: original graph A →(3, x, 1)→ B with Schedule: A 3(B); fractionized graph with actor A′, a fractional rate of 1/3 on edge x, and Schedule: 2(AB).]

Page 36: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Memory Optimization Techniques ▶ 4b) Rate Selection

Idea
• Generalize fractionizing
• Allow not only fractions but also multiples
• Rate is defined as a range, but is fixed before the schedule is finalized
• Each actor is modeled with timing and power functions with respect to the I/O range

Benefits
• Combines the power of flexibility and static determinism
• Increases buffer reduction opportunity

Challenge
• Need an efficient way to handle the considerably increased exploration space at runtime

[Figure: actors A and B connected by edge x, with rate ranges labeled (1,3), (2,6) and (4,4). Candidate schedules: Schedule1: 2(A)B, Schedule2: AB, Schedule3: 2(A)3(B).]

Page 37: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Memory Optimization Techniques ▶ 5) Pipelining

Idea
• Allow multiple actors to fire at once

Benefits
• Reduced buffer requirement
• Higher memory utilization
• Increased throughput

Challenges
• Need multiprocessors
• Need to resolve resource conflicts
• Need to consider the synchronization problem

Page 38: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Memory Optimization Techniques ▶ Comparing 1), 4), 5)

[Figure: t-charts of tokens t1 to t4 in buffers x and y for three variants of the A → B → C example: 1) Unshared Buffer (graph with A producing 2 tokens per firing, firing sequence A B C B C); 4) Fractionized / Rate Selected (graph with actor A′ and a fractional rate of 1/2); 5) Pipelined (C fires concurrently with A and B).]

Result
• Buffer size: 33% reduction
• Utilization: 66.7% → 100%
• Time: 5 → 4 firing units

Page 39: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

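Memory Optimization Techniques ▶ Summary

Techniques: 0: None (baseline), 1: Unshared Scheduling, 2: Shared Buffer, 3: Merged I/O, 4: Fractionized, 5: Pipelined

| Metric | 0 | 1 | 1+2 | 1+2+3 | 1+4 | 1+2+4 | 1+2+3+4 | 1+4+5 |
|---|---|---|---|---|---|---|---|---|
| M_size | 4 | 3 | 3 | 2 | 2 | 2 | 1 | 2 |
| M_used | 11 | 10 | 10 | 9 | 8 | 8 | 6 | 8 |
| M_wasted | 9 | 5 | 5 | 1 | 4 | 4 | 0 | 0 |
| T | 5 | 5 | 5 | 5 | 6 | 6 | 6 | 4 |
| M_utilization | 55% | 66.7% | 66.7% | 90% | 66.7% | 66.7% | 100% | 100% |

[Figure: t-charts of tokens t1 to t4 illustrating the mappings, including a global-buffer mapping and a pipelined firing sequence of A, B and C.]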

Page 40: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Multi-metric Optimization

Trade-offs
• From the actor point of view (local), processing a large amount of data at once tends to reduce time and energy overhead
• From the SDF-flow point of view (global), processing a small amount of data at once reduces the buffer requirement

Goal
• Find a Pareto-optimal point that resides in the range of the solution set that satisfies the constraints

[Figure: data memory, energy and execution time plotted against data-flow rate.]
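To make the goal concrete, the sketch below filters a set of candidate configurations down to the Pareto-optimal ones and then applies a memory constraint. The candidate tuples (data memory, energy, execution time) are made-up illustrative values, not measurements from this work, and the function names are hypothetical; lower is better for every metric.

```python
def dominates(a, b):
    """True if a is no worse than b in every metric and differs from b."""
    return a != b and all(x <= y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical candidates: (data memory, energy, execution time).
candidates = [(4, 10, 5), (3, 12, 5), (2, 15, 6), (3, 13, 6), (4, 11, 5)]

front = pareto_front(candidates)
feasible = [p for p in front if p[0] <= 3]  # example constraint: memory <= 3 slots
print(front)
print(feasible)
```

A quadratic filter like this is only practical for small candidate sets; the rate-selection challenge mentioned earlier is precisely that the exploration space grows quickly.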

Page 41: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Applying it to Rappit ▶ Quasi-static optimization

| Rappit Flow | Performed Tasks |
|---|---|
| Compile | Kernel and primitives compiled and installed |
| Load script | SDF defined |
| Preprocess (Optimization) | Actor-to-processor assignment, actor ordering (scheduling), buffer mapping |
| Load script code | Static schedule loaded |
| Execute | Deterministic execution w/o runtime overhead |

[Figure: the flow is laid out along two axes, Compile-time / Run-time and Host / Target.]

Page 42: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Outline

• Scripting Framework
• Memory-oriented Optimization
▶ Implementation
   ⊳ Synthesis Tool
   ⊳ Simulator
   ⊳ Runtime Host-assisting Tool (GUI)
• Experimental Platforms
• Summary & Research Plan

Page 43: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Implementation ▶ Scripting engine synthesis tool

System Template
• GUI-based check-box approach
• easily capture existing systems
• model new systems for simulation and design space exploration
• includes communication description

Component Library
• binds according to template configuration
• consists of MCU, on-chip devices, off-chip peripherals
• each component has I/O pins and driver modules

Page 44: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems


Implementation▶ Memory simulator

Page 45: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems


Implementation▶ Interactive runtime tool

Page 46: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Implementation ▶ Tool integration

[Figure: tool integration. Blocks: GUI, Scheduler, Memory Optimizer, Dispatcher, Parser, Node Manager; connected nodes: Node 1, Node 2, Node 3, …, Node N.]

Page 47: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Outline

• Scripting Framework
• Memory-oriented Optimization
• Implementation
▶ Experimental Platforms
• Summary & Research Plan

Page 48: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

HW Platforms and Real-world Applications

Eco
• ultra-compact sensor node
• pre-term infant monitoring
• dancing motion detection

Mini-FDPM
• active laser sensing device
• breast cancer detection

DuraNode
• real-time data acquisition system
• structural health monitoring

Butterfly
• low-power, I/O-rich development board
• prototyping (SD-card, speaker, sensors, RF)

Page 49: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Outline

• Scripting Framework
• Memory-oriented Optimization
• Implementation
• Experimental Platforms
▶ Summary & Research Plan

Page 50: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Summary

A novel scripting framework for embedded systems
• Scripting engine synthesis
• Host-assisting runtime environment

Memory optimization techniques
• Comparison of techniques
• Integration and multi-objective problem

Tool implementations
• Rappit GUI, memory simulator

Page 51: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Contributions

Empowered Embedded Systems
• Unleashing the severely constrained embedded systems

SDF Extensions
• Extension of SDF model
• Extending the application area of SDF

Memory Savings
• Reduced memory requirement by integration of policies, including new techniques

Page 52: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Research Plan ▶ finished, ongoing, future work

Framework
• Language definition*
• Initial implementation and prototyping
• Component library generation*
• Code generation
• Overhead analysis
• Tool integration
• Test on multinode scenario

Optimization
• Survey and comparison
• Simulator implementation
• Integrating techniques
• SDF extension on rate
• Rate-selection algorithm
• Buffer-mapping protocol
• Cost function modeling of multi-metric optimization
• SDF extension on timing

Case Study
• AVR Butterfly
• mini-FDPM
• Eco
• DuraNode

* with Qiang Xie & Jinfeng Liu

Page 53: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems


Publications

Jiwon Hahn, Qiang Xie, and Pai H. Chou, Rappit: A Framework for the Synthesis of Host-Assisted Light-Weight Scripting Engines for Adaptive Embedded Systems, in Proc. International Conference on Hardware Software Codesign and System Synthesis (CODES+ISSS), 2005.

Jiwon Hahn, Dexin Li, Qiang Xie, Pai H. Chou, Nader Bagherzadeh, David W. Jensen, Alan C. Tribble, Power Reduction in JTRS Radios with ImpacctPro, in Proc. IEEE Military Communication Conference (MILCOM), 2004.

Page 54: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems


Bibliography

Murthy PK, Shuvra S. Bhattacharyya, Buffer merging - a powerful technique for reducing memory requirements of synchronous dataflow specifications. ACM Transactions on Design Automation of Electronic Systems (TODAES), 2004.

Murthy PK, Shuvra S. Bhattacharyya, Shared buffer implementations of signal processing systems using lifetime analysis techniques, IEEE Transactions on Computer-Aided Design of Integrated Circuits & Systems (TCADICS), 2001.

Shuvra S. Bhattacharyya., Murthy PK, Edward A. Lee, APGAN and RPMC: Complementary Heuristics for Translating DSP Block Diagrams into Efficient Software Implementations, Design Automation for Embedded Systems (DAES), 1997

Shuvra S. Bhattacharyya, Murthy PK, Edward A. Lee, Joint Minimization of Code and Data for Synchronous Dataflow Programs, 1997.

Hyunok Oh, Soonhoi Ha, Fractional rate dataflow model and efficient code synthesis for multimedia applications, SIGPLAN Not, 2002.

Hyunok Oh, Soonhoi Ha, Data memory minimization by sharing large size buffers, Asia and South Pacific Design Automation Conference (ASPDAC), 2000.

Hyunok Oh, Soonhoi Ha, Efficient Code synthesis from extended dataflow graphs for multimedia applications, Design Automation Conference (DAC), 2002.

Geilen M, Basten T, Stuijk S, Minimising buffer requirements of synchronous dataflow graphs with model checking, 42nd Design Automation Conference (DAC), 2005.

Eckart Zitzler and Jurgen Teich and Shuvra S. Bhattacharyya, Multidimensional Exploration of Software Implementations for DSP Algorithms, Journal of VLSI Signal Processing (JVLSI), 1999

John K. Ousterhout, Scripting: Higher Level Programming for the 21st Century, IEEE Computer magazine, 1998

TecO Home, http://particle.teco.edu/

Page 55: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems


Acknowledgements

This work is sponsored in part by the National Science Foundation grant CCR-0205712 and NSF CAREER Award CNS-0448668

Professor Pai Chou Qiang Xie Jinfeng Liu

Page 56: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems


Backup Slides

Page 57: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Scripting Overhead

Scripting for General Purpose Computers
• Assume unlimited resources
• Full-feature scripting engine for convenience
• Slower than system programming languages

Scripting for Embedded Systems
• Limited memory, CPU, power, …
• Need scripting engine optimization
   ⊳ Host assist
   ⊳ Language subsetting
   ⊳ Library subsetting
   ⊳ Efficient memory usage

Scripting may be even faster than compiled code!

Page 58: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Rappit ▶ Packet format example

Command Packet Format
| Dst. | Msg ID | Opcode | Input[3] | Output[3] | CRC |

Response Packet Format
| Src. | Msg ID | Msg Type | Data Type | Payload | CRC | EOP |

Command Message Format
| Opcode | In_addr | In_start | In_size | Out_addr | Out_start | Out_size |

Response Message Format

Page 59: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Rappit ▶ Scripting engine optimization in code synthesis

• Language subsetting: e.g., assignment (=), loop (repeat)
• Library subsetting: customized for target applications and platform

[Figure: a full-featured component library (MCU, RF, SPI, Interrupts, GPIO, UART, ADC, Sensor1, Sensor2, Joystick, LCD, Dataflash) is reduced to the subset of components needed by the target.]

Page 60: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Memory Organizations ▶ Comparing previous work and Rappit

• Previous approaches consider both data and code memory minimization, but prioritize code size*
• We mainly focus on data size** minimization

[Figure: memory maps for previous work and our work. Memory regions: RAM, on-chip Flash or EEPROM, Data Flash. Contents: Buffer, Application Code*, Buffer**, Primitives, Rappit Kernel, Script Code.]

Page 61: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Rappit ▶ Code size of runtime components

| Host Code (.py) | Lines | Size (KB) |
|---|---|---|
| GUI | 644 | 21.8 |
| Cmd | 127 | 2.87 |
| Parser & Msg Generator | 221 | 4.97 |
| Library | 263 | 6.396 |
| Packetizer & Depacketizer | 82 | 2.0 |
| Packet Mgr | 42 | 0.92 |
| Total | 1379 | 38.96 |

| MCU Code (.c) | Lines | Size (KB) |
|---|---|---|
| Interpreter | 260 | - |
| Primitives | 90 | - |
| Packetizer & Depacketizer | 300 | - |
| Total | 750 | 1.484 |

Page 62: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Rappit ▶ Summary of results

Code size reduction

| Application | Native | Rappit | Reduction |
|---|---|---|---|
| Reg setting | 4.356 KB | 1.664 KB | 61.8% |
| LCD usage | 12.45 KB | 4.2 KB | 66.3% |

Performance overhead components analysis

| Component | Native | Interactive | Batch |
|---|---|---|---|
| Communication | 1 | 3 | 1 |
| RAM Access | 3 | 1 | 1 |
| ROM Access | 3 | 1 | 1 |
| Packetization | 1 | 2 | 2 |
| Interpretation | 1 | 2 | 2 |
| Total cmd/sec | 92 | 4.75 | 111 |

Legend: 1: fast, 2: tolerable, 3: slow (bottleneck)

Page 63: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Rappit ▶ Subset of primitives

| Device | Primitives |
|---|---|
| MCU | reset, power save, initialize, get sys clock, set sys clock |
| RF | INIT, set channel, set power, set frequency, send, receive |
| GPIO | set pin, get pin, clear pin |
| USART | TX, RX |
| SD | read, write |
| ADC | read |
| Sensor1, Sensor2, Sensor3 | read |
| Timer | register fcn, remove fcn |
| RTC | set clock, read clock |
| LCD | clear, write, set contrast |
| Joystick | get key |
| Speaker | set volume, play tone, play song |

Page 64: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Rappit ▶ Language

Key                   Usage                                          Example
import                import methods of each device                  from RF import *
doc, dict             look up documentation, included methods        RF.__doc__, RF.__dict__
open, close           open/close a connection to a target system     node1 = open(MCU1, uart1); node1.close()
ls                    list all connected instances                   ls
every, start, stop    schedule events with a certain period          s1 = (every 30ms: a += ADC1.read()); s1.start(); s1.stop()
repeat                looping                                        repeat 3: SD.write(a)
def                   define a function with a series of methods     def readTemperature(): ...
=, +                  assign/configure or add value                  a = SD.read(10); a += SD.read(20)

Page 65: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

SDF ▶ Strengths and limitations

Strengths
- Ability to express multi-rate systems and parallelism
- Deadlock detection and scheduling can be determined at compile time
- Bounded memory requirements
- No runtime supervisory overhead

Limitations
- Lack of conditional control flow
- Does not model asynchronous nodes
- Does not adequately address the real-time nature of connections to the outside world
- Does not address data-dependent run times

Page 66: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Superset of SDF ▶ Dynamic dataflow (DDF)

- Allows asynchronous actors, with a non-fixed rate for each actor
- Captures dynamic constructs
  - if/else
  - for-loop
  - do/while loop
  - recursion

Page 67: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

SDF ▶ Notations

Firing & Tokens
- $f(n)$: $n$th firing vector
- $tk(n)$: number of live tokens after the $n$th firing
- $tk(n+1) = tk(n) + G \cdot f(n)$
- $f = \sum_{n=0}^{T} f(n)$: firing frequency
- $q = f_{\min}$: firing vector (minimum number of firings)
- $q(src(e_i)) \times p(src(e_i)) = q(snk(e_i)) \times c(snk(e_i))$: balance equation

Consistent SDF
- $\mathrm{rank}(G) = |N| - 1$
- $G \cdot q = 0$

Scheduling
- Given $G$, $tk(0)$, and $q$, find a firing order that satisfies $tk(n) \ge 0$ and $q = \sum_{n=0}^{T} f(n)$
- Deadlocked if no node can be fired before reaching $q = \sum_{n=0}^{T} f(n)$
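
The balance equation and the scheduling condition can be tried out on a small graph. The following is a minimal sketch, not the Rappit implementation: the function names, the edge tuple layout `(src, produce, snk, consume)`, the greedy firing policy, and the three-actor example graph are all assumptions made for illustration.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, edges):
    """Solve q(src) * p = q(snk) * c for the smallest positive integer q.

    `edges` holds (src, produce, snk, consume) tuples; the graph is
    assumed to be connected.
    """
    ratio = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate firing ratios along edges until stable
        changed = False
        for src, p, snk, c in edges:
            if src in ratio and snk not in ratio:
                ratio[snk] = ratio[src] * p / c
                changed = True
            elif snk in ratio and src not in ratio:
                ratio[src] = ratio[snk] * c / p
                changed = True
    if len(ratio) != len(actors):
        raise ValueError("graph is not connected")
    if any(ratio[s] * p != ratio[k] * c for s, p, k, c in edges):
        raise ValueError("inconsistent SDF graph: no non-zero q exists")
    # Scale the rational ratios to the smallest integer vector.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in ratio.values()), 1)
    counts = {a: int(ratio[a] * scale) for a in actors}
    common = reduce(gcd, counts.values())
    return {a: n // common for a, n in counts.items()}


def find_schedule(actors, edges, q, initial_tokens=None):
    """Return a firing order that reaches q with tk(n) >= 0, or None on deadlock."""
    tokens = list(initial_tokens) if initial_tokens else [0] * len(edges)
    remaining = dict(q)
    order = []
    while any(remaining.values()):
        for actor in actors:
            ready = remaining[actor] > 0 and all(
                tokens[i] >= c
                for i, (_, _, snk, c) in enumerate(edges) if snk == actor)
            if ready:
                for i, (src, p, snk, c) in enumerate(edges):
                    if snk == actor:
                        tokens[i] -= c  # consume input tokens
                    if src == actor:
                        tokens[i] += p  # produce output tokens
                remaining[actor] -= 1
                order.append(actor)
                break
        else:
            return None  # no node can fire before q is reached: deadlock
    return order


# Hypothetical graph: A produces 2, B consumes 3; B produces 1, C consumes 2.
actors = ["A", "B", "C"]
edges = [("A", 2, "B", 3), ("B", 1, "C", 2)]
q = repetition_vector(actors, edges)      # {'A': 3, 'B': 2, 'C': 1}
print(q, find_schedule(actors, edges, q))

# A two-actor cycle without initial tokens deadlocks; one token on B->A fixes it.
cycle = [("A", 1, "B", 1), ("B", 1, "A", 1)]
print(find_schedule(["A", "B"], cycle, {"A": 1, "B": 1}))          # None
print(find_schedule(["A", "B"], cycle, {"A": 1, "B": 1}, [0, 1]))  # ['A', 'B']
```

The greedy policy always fires the first ready actor in list order, so it returns one valid order rather than a memory-optimal one.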

Page 68: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

SDF ▶ Our extensions

- SDF was previously used in multimedia-oriented applications targeting DSPs and FPGAs
- To target more general types of applications, non-buffered edges (dummy channels), which denote only precedence, should be added
- The produce/consume rate of each actor is given not as a fixed value but as a range
- Add timing (future work)

Page 69: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

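SDF ▶ Another example

Extended Surge Application

Valid schedules:
- Flat SAS: 30(A) 3(B) 3(C) D 10(E) 10(F)
- SAS: 3(10(A) BC) D 10(EF)
- Non-SAS: 30(A) 2(BC) BCD 10(EF)

[Figure: SDF graph of the extended Surge application, with actors A to F and edges a to f. The actor functions shown are ADC read, SD store, SD read, LCD show, Kernel pack, and RF send; the edges carry produce/consume rates of 1, 3, and 10.]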
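
The three schedules above should fire every actor the same number of times and differ only in ordering and in whether each actor appears once in the loop notation. A minimal sketch that checks this, assuming single-letter actor names and the `n(...)` loop notation used on the slide; `expand` and `is_single_appearance` are hypothetical helper names:

```python
import re
from collections import Counter


def expand(schedule):
    """Expand a looped schedule such as '3(10(A)BC)D' into a flat firing list."""
    s = schedule.replace(" ", "")
    pos = 0

    def sequence():
        nonlocal pos
        firings = []
        while pos < len(s) and s[pos] != ")":
            start = pos
            while pos < len(s) and s[pos].isdigit():
                pos += 1
            count = int(s[start:pos]) if pos > start else 1
            if pos == len(s) or s[pos] == ")":
                raise ValueError("repetition count without a body")
            if s[pos] == "(":
                pos += 1              # consume '('
                body = sequence()
                if pos == len(s):
                    raise ValueError("missing ')'")
                pos += 1              # consume ')'
            else:
                body = [s[pos]]       # single-letter actor name
                pos += 1
            firings.extend(body * count)
        return firings

    flat = sequence()
    if pos != len(s):
        raise ValueError("unmatched ')'")
    return flat


def is_single_appearance(schedule):
    """True if every actor name appears exactly once in the looped notation."""
    names = re.findall(r"[A-Za-z]", schedule)
    return len(names) == len(set(names))


schedules = {
    "Flat SAS": "30(A) 3(B) 3(C) D 10(E) 10(F)",
    "SAS": "3(10(A) BC) D 10(EF)",
    "Non-SAS": "30(A) 2(BC) BCD 10(EF)",
}
for label, text in schedules.items():
    print(label, dict(Counter(expand(text))), is_single_appearance(text))
```

All three expansions give 30 firings of A, 3 of B, 3 of C, 1 of D, and 10 each of E and F; only the Non-SAS schedule repeats actor names in its notation.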

Page 70: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

SDF ▶ Another example (cont'd)

Script (SAS)

enable Timer1, RF, SD, LCD
every 2048:
    repeat 10:
        repeat 10:
            a = ADC.read()
            LCD.show(a)
            SD.store(a)
    repeat 10:
        b = SD.read()
        repeat 3:
            c = Kernel.pack(b)
    RF.send(c)

Page 71: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

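Script-to-SDF Transform

User script

x = A()
repeat 2:
    y = B(x)
    C(y)

$V = \{A, B, C\}$
$E = \{x, y\} = \{e_{AB}, e_{BC}\}$
$\pi_{init} = A\,2(BC)$

Edge $e_{AB}$ (x): $p(A) = (2, 3)$, $c(B) = (1, 1)$
Edge $e_{BC}$ (y): $p(B) = (1, 1)$, $c(C) = (1, 2)$

[Figure: SDF graph A → B → C, with edge x annotated 2/3 at A and 1/1 at B, and edge y annotated 1/1 at B and 1/2 at C.]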
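
The structural part of this transform (vertices, edges, and the initial schedule) can be sketched for the small user script above. This is an illustrative sketch under assumptions, not the actual Rappit transform: `script_to_sdf` is a hypothetical name, the parser accepts only `var = Actor(arg)` calls and `repeat N:` blocks, and it does not derive the produce/consume rate ranges.

```python
import re

CALL = re.compile(r"^(?:(\w+)\s*=\s*)?(\w+)\((\w*)\)$")
REPEAT = re.compile(r"^repeat\s+(\d+):$")


def script_to_sdf(script):
    """Return (V, E, initial schedule) for a tiny subset of the script language."""
    vertices = []    # actors in order of first appearance
    producer = {}    # variable name -> actor that assigns it
    edges = {}       # variable name -> (producing actor, consuming actor)
    root = []
    stack = [(-1, root)]  # (indent of the header that opened the block, items)
    for raw in script.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip())
        line = raw.strip()
        while indent <= stack[-1][0]:   # dedent closes open repeat blocks
            stack.pop()
        loop = REPEAT.match(line)
        if loop:
            body = []
            stack[-1][1].append((int(loop.group(1)), body))
            stack.append((indent, body))
            continue
        call = CALL.match(line)
        if not call:
            raise ValueError(f"unsupported statement: {line!r}")
        out, actor, arg = call.groups()
        if actor not in vertices:
            vertices.append(actor)
        if arg:
            edges[arg] = (producer[arg], actor)  # variable becomes an edge
        if out:
            producer[out] = actor
        stack[-1][1].append(actor)

    def render(block):
        return "".join(item if isinstance(item, str)
                       else f"{item[0]}({render(item[1])})" for item in block)

    return vertices, edges, render(root)


user_script = """
x = A()
repeat 2:
    y = B(x)
    C(y)
"""
V, E, pi_init = script_to_sdf(user_script)
print(V)        # ['A', 'B', 'C']
print(E)        # {'x': ('A', 'B'), 'y': ('B', 'C')}
print(pi_init)  # A2(BC)
```

Each variable that is assigned by one call and passed to another becomes an edge, and the nesting of `repeat` blocks is carried over unchanged into the looped schedule.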

Page 72: Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Multimetric Optimization ▶ Cost function modeling

Constraints
- Energy: battery lifetime or other source of power budget
- Time: deadline in a given real-time application
- Memory: given memory size for a platform

Each node is modeled with:
- $P_v(c,p)$: power consumption w.r.t. consume/produce rate (i.e., input/output data size)
- $T_v(c,p)$: execution delay w.r.t. consume/produce rate
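
One way to read the per-node models is as functions that can be summed over a schedule and compared with the energy and time constraints. The sketch below is only an illustration under assumptions: the slides do not state how the models are combined, so sequential execution and energy = power × delay per firing are assumed here, and every node name, coefficient, and budget is a made-up placeholder.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class NodeModel:
    """Per-node cost model: P_v(c, p) and T_v(c, p)."""
    power_mw: Callable[[int, int], float]  # P_v(c, p), in mW
    delay_ms: Callable[[int, int], float]  # T_v(c, p), in ms


def evaluate(firings, models):
    """Sum time and energy over (node, c, p, count) entries.

    Assumes sequential execution and energy = power * delay per firing.
    """
    total_ms = 0.0
    total_uj = 0.0
    for node, c, p, count in firings:
        t = models[node].delay_ms(c, p)
        total_ms += count * t
        total_uj += count * models[node].power_mw(c, p) * t  # mW * ms = uJ
    return total_ms, total_uj


# Hypothetical models: delay grows linearly with the amount of data handled.
models = {
    "sample": NodeModel(power_mw=lambda c, p: 5.0, delay_ms=lambda c, p: 1.0 + p),
    "send": NodeModel(power_mw=lambda c, p: 20.0, delay_ms=lambda c, p: 2.0 + c),
}
firings = [("sample", 0, 1, 30), ("send", 10, 0, 3)]

time_ms, energy_uj = evaluate(firings, models)
print(time_ms, energy_uj)                       # 96.0 1020.0
print(time_ms <= 100.0, energy_uj <= 1000.0)    # True False
```

With these placeholder numbers the schedule meets the 100 ms deadline but exceeds the 1000 uJ energy budget, which is the kind of trade-off a multimetric cost function has to expose.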