
Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems

Jiwon Hahn

PhD Qualifying Exam, University of California, Irvine, March 2006


Motivation ▶ Embedded system development

Growing challenges:
• Increasing end-user expectations: more functionality, higher performance, lower cost, smaller size
• Very short time-to-market
• Wide gap between available techniques and user satisfaction

Need new tools and methodology!

Example applications (figure): physiological sensing, motion sensing, structural health monitoring, preterm infant monitoring, Eco node

Strategies
• Speed up the development! We need a better programming/debugging methodology and tools.
• Improve the current system's bottleneck! The memory unit is one of the most costly components, and it affects the system's performance, power, and overall application range.
• Maximize the system's capability! Since embedded systems are resource-constrained, it helps to offload part of the system workload to the host.

About My Research
• Framework: enhanced programming/debugging methodology; host-assisting runtime environment
• Optimization: reducing data memory requirements and increasing memory utilization; power and performance co-optimization

Outline

• Scripting Framework
• Memory-oriented Optimization
• Implementation
• Experimental Platforms
• Summary & Research Plan

Outline

▶ Scripting Framework (Scripting Engine Synthesis; Runtime Environment; Preliminary Results)

Memory-oriented Optimization; Implementation; Experimental Platforms; Summary & Research Plan

Motivating Example ▶ Building a small embedded system

Application: temperature sensor; sense the temperature and send it to the host every 5 min.

Platform: TecO Particle
• 17 x 35 mm
• PIC18LF452 at 20 MHz
• 32KB program Flash, 1.5KB RAM, 32KB external EEPROM
• Temperature sensor, RF interface, etc.

Hardware: solder the RF module

Software (firmware): no OS support! No interactivity, no partial testing.
1. Write the FW (C/assembly)
2. Compile
3. Connect the board to the host
4. Enter the bootloading mode
5. Erase/Load/Verify the program
6. Restart the board
7. Run
(repeat)

Motivation ▶ Alternative approach: Scripting!

Environment setup (scripting engine synthesis + runtime):
1. Generate the FW (scripting engine synthesis)
2. Compile
4. Enter the bootloading mode
5. Erase/Load/Verify the program
6. Restart the board
7. Run

Scripting:
1. Write the script
3. Load & run
(repeat)

Motivation ▶ Scripting vs. Traditional Programming

Aspect | Traditional | Scripting
Language | C, assembly (less human-readable) | Python, Tcl, Perl, … (higher level)
System query | No interactivity; need an oscilloscope or multimeter to check the status | Instant feedback
System update | Recompile and reboot required | On-the-fly
Code size | 5x~10x more lines [J. Ousterhout '98] | Shorter
Performance overhead | None | Depends on the scripting engine (could be none, or scripting could even be faster)

Related Work ▶ Frameworks for runtime support

Name | High level (language) | Interactivity | Reconfigurability | Kernel synthesis | Hetero. sys. | Code size
SOS | no (C) | no | yes* | yes | yes | 20K
Mate | no (asm-like) | no | yes* | no | no | 39K
TinyOS | no (nesC) | no | yes | yes* | no | 18K
Agilla | no (asm-like) | yes | yes* | no | no | 55K
Pushpin | no (C-subset) | no | yes* | no (BerthaOS) | no | 34K
SensorWare | yes* (Tcl) | yes | yes* | no | no | >237K
ActorNet | yes* (S-expression) | N/A | yes | no | no | <128K
VM* | yes (Java) | no | yes* | yes | N/A | 25K
Our work | yes (Python-like) | yes | yes | yes | yes | <17K

Our Framework: Rappit ▶ Overview

Rappit is a framework that provides the user with an integrated scripting environment spanning the host and the target systems.

Figure: the host runs Rappit S/W and the application script and communicates over a wired/wireless link with the target system, which runs Rappit F/W on top of the device drivers and the H/W device; the figure contrasts this with a conventional compiled C program.

Example:
>> readTemperature()

On the target, Rappit F/W:
1. receives the packets,
2. interprets the command,
3. executes the primitives (e.g., ADC read),
4. returns the result.

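Rappit ▶ Scripting engine synthesis

Figure: a system description (architecture, application, communication) and a component library feed code synthesis, which produces (1) the target F/W (scripting engine, primitives, …), compiled into a binary executable for the target system, and (2) the host S/W (parser, message generator, GUI, …), with a compatible message format and an interactive language.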

# example: pin mapping for an RF module
mcu = MCU(ATmega169)     # instantiate an ATmega169 MCU
import RF                # load a transceiver module
rf = RF(nRF2401)         # instantiate an nRF2401
rf.CS = mcu.PORTB[0]     # connect the chip-select pin
rf.CE = mcu.PORTB[1]     # connect the chip-enable pin
rf.DR1 = mcu.PORTB[2]    # connect the data-ready pin
rf.CLK1 = mcu.PORTF[1]   # connect the clock pin
rf.DOUT1 = mcu.PORTF[2]  # connect the data pin

# example: packet format
c_format = src(1),dst(1),msgID(1),opcode(1),arg(3),crc(1)
r_format = src(1),dst(1),msgID(1),mtype(1),dtype(1),\
           data(v),crc(1),eop(1)

System Description

// part of the scripting engine: dispatch on the received opcode
switch (opcode) {
case 0x00: val = ADC_read();  break;
case 0x01: RF_send(val);      break;
case 0x02: RF_packetize(val); break;
/* … */
}

// part of the primitives
char ADC_read(void) { /* … */ }
void RF_send(char pck) { /* … */ }
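The fixed-width `c_format` command layout shown with the system description can be illustrated with a small Python encoder and decoder. This is a sketch, assuming one byte per field as the `(1)`/`(3)` annotations suggest; the slide does not say which CRC is used, so the CRC-8 with polynomial 0x07 below is an assumed choice, and `encode_command`/`decode_command` are hypothetical helper names.

```python
from typing import Dict


def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8 (init 0, no reflection); the polynomial is an assumption."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def encode_command(src: int, dst: int, msg_id: int, opcode: int, arg: bytes) -> bytes:
    """Pack src(1), dst(1), msgID(1), opcode(1), arg(3), crc(1) into 8 bytes."""
    if len(arg) != 3:
        raise ValueError("arg must be exactly 3 bytes")
    body = bytes([src, dst, msg_id, opcode]) + arg  # bytes() rejects values > 255
    return body + bytes([crc8(body)])


def decode_command(packet: bytes) -> Dict[str, object]:
    """Unpack an 8-byte command packet, rejecting it if the CRC does not match."""
    if len(packet) != 8:
        raise ValueError("command packets are 8 bytes long")
    body, crc = packet[:-1], packet[-1]
    if crc8(body) != crc:
        raise ValueError("CRC mismatch")
    return {"src": body[0], "dst": body[1], "msgID": body[2],
            "opcode": body[3], "arg": bytes(body[4:7])}


pkt = encode_command(1, 2, 7, 0x01, b"\x00\x05\x00")
print(len(pkt), decode_command(pkt)["opcode"])  # 8 1
```

Because every field has a fixed width, the node can parse a packet with constant offsets and no string handling, which fits the "easy to parse at node" goal.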

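Rappit ▶ Runtime environment

Figure: the host (Packetizer and host-assisting modules, such as the optimizer) exchanges command/response packets with the target system, which contains a Packet Manager, a Packetizer, the Scripting Engine, an Admission Controller, and native routines built from the Component Library.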

Rappit ▶ Host assistance

Script parsing (Parser):
"readTemp()" → host parser, message generator → "0x4A 0x01"
• Raw command: user-friendly syntax
• Generated message: easy to parse at the node; compact and efficient representation

Memory management (Optimizer):
Raw script → script scheduler, buffer mapper → optimized script → to the target node
• Raw script: written by the user
• Optimized script: minimal script size, minimized memory usage, minimized runtime overhead (fixed schedule and buffer usage)
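The parser step (user-friendly call in, compact bytes out) might look roughly like the following. The command names `readPin`/`setPin`, their opcodes, and the encoding (one opcode byte followed by one byte per integer argument) are hypothetical; they are not the encoding behind the readTemp() example.

```python
import re
from typing import Dict

# Hypothetical opcode table; in Rappit the real table would come from synthesis.
OPCODES: Dict[str, int] = {"readPin": 0x10, "setPin": 0x11}

# Matches a call such as "setPin(2, 1)" and captures the name and argument text.
_CALL = re.compile(r"^\s*([A-Za-z_]\w*)\s*\((.*)\)\s*$")


def parse_command(line: str) -> bytes:
    """Translate a user-friendly call into a compact byte message.

    Assumed encoding: one opcode byte followed by one byte per integer argument.
    """
    match = _CALL.match(line)
    if match is None:
        raise ValueError(f"not a command call: {line!r}")
    name, arg_text = match.groups()
    if name not in OPCODES:
        raise ValueError(f"unknown command: {name}")
    args = [int(a, 0) for a in arg_text.split(",") if a.strip()]
    return bytes([OPCODES[name], *args])  # raises ValueError outside 0..255


print(parse_command("setPin(2, 1)").hex())  # 110201
```

Doing this translation on the host keeps the node-side interpreter to a byte-level dispatch, which is the point of host-assisted parsing.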

Rappit ▶ Scripting examples

Interactive port setting:
>> PORTA[2] = 1   # toggle clock
>> PORTA[2] = 0
>> PORTA[1] = 1   # set port A pin 1
>> PORTA[0]       # read input pin
>> PORTA[2] = 1
>> PORTA[2] = 0   # toggle clock
>> PORTA[0]       # read input pin

System configuration:
>> mcu.sysclock = 1 MHz
>> uart.baudrate = 9600 bps
>> rf.power = -5 db
>> rf.speed = 1 Mbps
>> rf.config      # query
{'payload': 1, 'power': -5, 'speed': 1000000, 'channel': 100, 'mode': 'TX'}

Periodic-task scheduling:
>> s = (every 50 ms: sample())
>> s.start()
>> s.stop()
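One way to model `every 50 ms: sample()` with `start()`/`stop()` is a task object driven by a periodic timer tick. The sketch below is hypothetical: `PeriodicTask`, `tick`, and the explicit `now_ms` argument are illustration-only names, and the clock is simulated so the behaviour can be checked without real time.

```python
from typing import Callable, List


class PeriodicTask:
    """Hypothetical model of `every <period>: <action>` on a simulated ms clock."""

    def __init__(self, period_ms: int, action: Callable[[], None]) -> None:
        if period_ms <= 0:
            raise ValueError("period must be positive")
        self.period_ms = period_ms
        self.action = action
        self.running = False
        self.next_due = 0

    def start(self, now_ms: int) -> None:
        """Arm the task; the first firing is one period after the start time."""
        self.running = True
        self.next_due = now_ms + self.period_ms

    def stop(self) -> None:
        self.running = False

    def tick(self, now_ms: int) -> None:
        """Called from a (simulated) timer interrupt; runs every due period."""
        while self.running and now_ms >= self.next_due:
            self.action()
            self.next_due += self.period_ms


samples: List[str] = []
s = PeriodicTask(50, lambda: samples.append("sample"))
s.start(0)
for t in range(0, 201, 10):   # simulate 200 ms of 10 ms timer ticks
    s.tick(t)
s.stop()
s.tick(300)                   # no effect after stop()
print(len(samples))           # 4 (at 50, 100, 150 and 200 ms)
```

The `while` loop in `tick` catches up on missed periods if ticks arrive late, so the long-run sampling rate stays at one call per period.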

Rappit ▶ Experimental platform

AVR Butterfly board:
• Atmel ATmega169 8-bit MCU @ 8MHz, 512B EEPROM, 1KB SRAM, 16KB program flash
• Includes dataflash, speaker, sensors, joystick, LCD
• USART serial link at 9600 baud

Figure: AVR Butterfly; AVR Butterfly with wireless module

Rappit ▶ Experiment metrics and modality

Execution modality:
Modality | Approach | Programming method
Native | Compiled | Program the firmware onto the Flash
Batch | Scripting | Preload a script program into RAM
Interactive | Scripting | Send one command line at a time to RAM

Observation metrics:
Metric | Unit
Code size | Bytes
Execution speed | Cmds/sec

Rappit ▶ Preliminary results

Code size reduction: 61.8% to 66.3%
• The scripting engine is a thin layer
• Most of the reduction is in the application code size

Performance overhead: batch-mode scripting can be faster than native!
• Observed up to 25.7% speed-up

Outline

Scripting Framework

▶ Memory-oriented Optimization (Memory Optimization; Multi-metric Optimization)

Implementation; Experimental Platforms; Summary & Research Plan

Motivating Example ▶ Installing Rappit primitives on the Butterfly

Problem:
• Choose primitives: ADC_read, RF_send, RF_read, SD_write, SD_read, …
• Compile & install: runtime error! Why?

Problem analysis: RAM usage exceeded 1KB

Solution:
• Share memory space
• Map static data to dataflash

Result: increased board capability and increased application range

Figure: SD_buffer, RF_buffer, ADC_buffer, and static strings (about 600B?) in the 1KB RAM; memory sharing combines the buffers into one Shared_buffer, and the static strings are mapped to dataflash.

static unsigned char sd_buffer[512];
static unsigned char rf_buffer[30];
static unsigned char ADC_buffer[30];
char error_msg1[] = "No SD Card detected!";  // initialized array: occupies RAM
char error_msg2[] = "Card Read Error!";
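A quick tally of the declarations above shows where the "600B ?" estimate comes from. It assumes one byte per `char`, a NUL terminator per string, and, for the shared variant only, that the three buffers are never live at the same time:

```latex
\begin{aligned}
\text{buffers} &= 512 + 30 + 30 = 572\ \text{B} \\
\text{strings} &= (20 + 1) + (16 + 1) = 38\ \text{B} \\
\text{total} &= 572 + 38 = 610\ \text{B} \approx 600\ \text{B} \\
\text{shared buffer} &= \max(512, 30, 30) = 512\ \text{B}\quad (60\ \text{B saved})
\end{aligned}
```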

Data Memory Minimization ▶ Assumptions and Approach

Assumptions:
• Optimizing scripts (script size, buffer size)
• Optimizing at runtime: needs a low-complexity algorithm

Approach:
• High-level optimization using scheduling and buffer-mapping techniques
• Priority on data memory minimization
• Based on a model of computation (MoC)

Models of Computation (MoC)

Synchronous Dataflow (SDF) [E. Lee '87]
• Extensively used as the specification for block-diagram-based programming environments for signal processing
• Special case of dataflow with no notion of time: the number of tokens (= data) consumed and produced by each actor (= node) in each firing (= invocation) is statically fixed

Fractional Rate Dataflow (FRDF) [H. Oh, S. Ha '02]
• Extension of SDF that allows fractional flow of the I/O samples of the original SDF

Why SDF?
• Formal representation for optimization, simulation, and analysis
• System-level optimization of the application flow across various primitives
• Static scheduling: minimizes runtime overhead for resource-constrained embedded systems, enables deadlock detection, and bounds the memory requirements
• Good match for sensor applications: collect data, process, transmit

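SDF ▶ Notations

SDF graph $G = (V, E, p, c)$
• $V = \{v_1, v_2, \ldots, v_{|V|}\}$: actors (nodes)
• $E = \{e_1, e_2, \ldots, e_{|E|}\}$: edges
• $\mathrm{src}(e)$: source node; $\mathrm{snk}(e)$: sink node
• $p(e)$: produce rate; $c(e)$: consume rate
• Topology matrix: $T(e,v) = \begin{cases} p(e) & \text{if } v = \mathrm{src}(e) \\ -c(e) & \text{if } v = \mathrm{snk}(e) \\ 0 & \text{otherwise} \end{cases}$

Example (figure): chain $v_1 \to v_2 \to v_3 \to \cdots \to v_{|V|}$ with $p(e_1) = 1$, $c(e_1) = 2$, $p(e_2) = 2$, $c(e_2) = 1$, $p(e_3) = 3$, …, $c(e_{|E|}) = 5$. Rows are edges $e_1, \ldots, e_{|E|}$; columns are actors $v_1, \ldots, v_{|V|}$:

$$T = \begin{pmatrix} 1 & -2 & 0 & \cdots & 0 \\ 0 & 2 & -1 & \cdots & 0 \\ 0 & 0 & 3 & \cdots & \vdots \\ \vdots & & & \ddots & \\ 0 & 0 & 0 & \cdots & -5 \end{pmatrix}$$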
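The topology matrix and the balance equations can be checked mechanically. The following Python sketch builds T(e, v) as defined above and solves for the smallest positive repetition vector q with T·q = 0 by propagating rates along the edges; `topology_matrix` and `repetition_vector` are illustrative names, and the graph is assumed to be connected.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int, int]  # (src(e), snk(e), p(e), c(e))


def topology_matrix(actors: List[str], edges: List[Edge]) -> List[List[int]]:
    """T(e, v) = p(e) if v = src(e), -c(e) if v = snk(e), 0 otherwise."""
    col = {v: j for j, v in enumerate(actors)}
    T = [[0] * len(actors) for _ in edges]
    for i, (src, snk, p, c) in enumerate(edges):
        T[i][col[src]] += p
        T[i][col[snk]] -= c
    return T


def repetition_vector(actors: List[str], edges: List[Edge]) -> Dict[str, int]:
    """Smallest positive integer q with T . q = 0, via the balance equations."""
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate q(snk) = q(src) * p / c along the edges
        changed = False
        for src, snk, p, c in edges:
            if src in rate and snk not in rate:
                rate[snk] = rate[src] * p / c
                changed = True
            elif snk in rate and src not in rate:
                rate[src] = rate[snk] * c / p
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    scale = lcm(*(r.denominator for r in rate.values()))
    q = {v: int(rate[v] * scale) for v in actors}
    g = reduce(gcd, q.values())
    q = {v: n // g for v, n in q.items()}
    for src, snk, p, c in edges:  # every balance equation must hold
        if q[src] * p != q[snk] * c:
            raise ValueError("inconsistent SDF graph (rank(T) != |V| - 1)")
    return q


actors = ["A", "B", "C"]
edges = [("A", "B", 2, 1), ("B", "C", 1, 1)]
print(topology_matrix(actors, edges))    # [[2, -1, 0], [0, 1, -1]]
print(repetition_vector(actors, edges))  # {'A': 1, 'B': 2, 'C': 2}
```

For the first two edges of the chain example, `repetition_vector(["v1", "v2", "v3"], [("v1", "v2", 1, 2), ("v2", "v3", 2, 1)])` gives q = (2, 1, 2).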

SDF ▶ Example

Surge application
• Actors: A (ADC read), B (RF pack), C (RF send)
• Buffers: x (A → B), y (B → C); all produce/consume rates are 1
• Schedule: ABC

Rappit script (4 lines):
every 2048:
    x = ADC.read()
    y = RF.pack(x)
    RF.send(y)

SDF ▶ Example (cont'd)

Same code in Java (20 lines) [J. Koshy '05]:

SurgePacket sgPkt;
char eList, eVector;
byte sHandle;
sgPkt = new SurgePacket();
eList = Select.setEventId(eList, Events.TIMEOUT | Events.RADIO_RECV);
sHandle = Select.requestSelectHandle();
char val;
Clock.startTimeout(2048);
while (true) {
    eVector = Select.select(sHandle, eList);
    if (Select.eventOccurred(eVector, Events.TIMEOUT)) {
        val = PhotoSensor.sense();
        sgPkt.setReading(val);
        Surge.sendPacket(sgPkt);
        Clock.startTimeout(2048);
    } else if (Select.eventOccurred(eVector, Events.RADIO_RECV)) {
        handleRadioEvent(sgPkt); // if base, forward to uart
    }
}

Problem Statements

1. Find the best schedule and buffer mapping that minimizes the buffer size requirement: goal-oriented (previous work)

2. Find the best schedule and buffer mapping that fits into, and maximizes the utilization of, a given memory size: constraint-driven (novel, practical)

Buffer Mapping Problem ▶ Spatial representation

Token-lifetime chart (t-chart):
• Row: a token's lifetime (produced, placed, consumed)
• Column: a fixed number of token changes caused by a firing event

Figure: t-chart of tokens t2, t3, t4 in the local buffers over the firing sequence A B B C C

Buffer Mapping Problem ▶ Spatial representation (cont'd)

Memory-usage profile (m-profile)

Metrics: Msize = 4, Mtotal = 20, Mused = 11, Mwasted = 9, Mutil = 55%, T = 5

Figure: memory profile over the firing sequence A B B C C
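The metric values above are consistent with the following definitions. This is our reading of the numbers, assuming the graph A → B → C used in the technique comparison (A produces 2 tokens, B consumes 1 and produces 1, C consumes 1) and that a firing's input tokens occupy memory until the firing ends; m(n), the memory occupied during the n-th firing, is a symbol introduced only for this calculation.

```latex
\begin{aligned}
M_{total}  &= M_{size} \cdot T = 4 \cdot 5 = 20 \\
M_{used}   &= \textstyle\sum_{n=1}^{T} m(n) = 2 + 3 + 3 + 2 + 1 = 11 \\
M_{wasted} &= M_{total} - M_{used} = 20 - 11 = 9 \\
M_{util}   &= M_{used} / M_{total} = 11 / 20 = 55\%
\end{aligned}
```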

Related Work ▶ Data memory optimization based on MoC

Technique | Group | Idea
Optimal scheduling | [Bhattacharyya et al.] in Ptolemy group | Buffer minimized by optimal scheduling; optimize each local buffer
Buffer sharing by lifetime analysis | [Bhattacharyya et al.] in Ptolemy group, [Ha et al.] in PeaCE group, [Ritz et al.] in Meyr group | Local buffer lifetimes are analyzed to share global buffers
Buffer merging | [Bhattacharyya et al.] in Ptolemy group | Input/output buffers are shared (finer grain than buffer sharing)
Model checking | [Geilen et al.] at Eindhoven Univ. | Reduces the problem to a model-checking problem on the state space of the SDF graph
Etc. (MBRO, PAPS, MRSP, …) | [Govindarajan et al.] in Gao group, [Peperstraete et al.], [Goddard et al.], [Ade et al.] in GRAPE group | Rate-optimal / vectorization / application to real-time systems / etc.

Memory Optimization Techniques
1) *Scheduling with unshared buffers
2) *Buffer sharing
3) *I/O buffer merging
4a) **Fractionizing
4b) Rate selection (new)
5) Pipelining (new)

* Well-established previous work; ** recently proposed

Memory Optimization Techniques ▶ 1) Scheduling with unshared buffer

Each edge is mapped directly to its own dedicated buffer space; efficient ordering of the actors reduces the buffer requirement!

Graph: A → B (edge x) → C (edge y); A produces 2 tokens on x, B consumes 1 from x and produces 1 on y, C consumes 1 from y

Schedule 1: A B B C C
x = A()
repeat 2: y = B(x)
repeat 2: C(y)

Expanded:
x[0..1] = A()
y[0] = B(x[0])
y[1] = B(x[1])
C(y[0])
C(y[1])

Buffer requirement: |x| + |y| = 2 + 2 = 4

Schedule 2: A B C B C
x = A()
repeat 2:
    y = B(x)
    C(y)

Expanded:
x[0..1] = A()
y[0] = B(x[0])
C(y[0])
y[0] = B(x[1])
C(y[0])

Buffer requirement: |x| + |y| = 2 + 1 = 3

Memory Optimization Techniques ▶ Comparing 1), 2), 3)

Graph: A → B (edge x) → C (edge y); A produces 2, B consumes 1 and produces 1, C consumes 1. Schedule: A B B C C
x = A()
repeat 2: y = B(x)
repeat 2: C(y)

1) Unshared buffer:
x[0..1] = A()
y[0] = B(x[0])
y[1] = B(x[1])
C(y[0])
C(y[1])

2) Shared buffer: once B(x[0]) has consumed its data, reuse the available space!
x[0..1] = A()
y[0] = B(x[0])
x[0] = B(x[1])
C(y[0])
C(x[0])

3) Merged I/O buffer: assuming each token is consumed before the output is produced, B(x[0]) and B(x[1]) use the same space for their input/output tokens:
x[0..1] = A()
x[0] = B(x[0])
x[1] = B(x[1])
C(x[0])
C(x[1])

Buffer requirement: |x| + |y| = 2 + 0 = 2

Memory Optimization Techniques ▶ Comparing 1), 2), 3) (cont'd)

Figure: t-charts of tokens t2, t3, t4 (local buffers) over A B B C C for each technique

Metric | 1) Unshared buffer | 2) Shared buffer | 3) Merged I/O buffer
Buffer size (|x|+|y|) | 4 | 3 | 2
Mtotal | 20 | 15 | 10
Mused | 11 | 11 | 9
Mwasted | 9 | 4 | 1
Mutil | 55% | 73% | 90%
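The three columns above can be reproduced with a small m-profile simulator. This is a sketch under an assumed memory model: a firing occupies the tokens present before it plus the tokens it produces, and in merged mode an output reuses the slot of a consumed input. `memory_profile` is a hypothetical helper, not the memory simulator described in the implementation section.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int, int]  # (src(e), snk(e), p(e), c(e))


def memory_profile(edges: List[Edge], schedule: List[str], mode: str) -> Dict[str, float]:
    """Compute Msize, Mtotal, Mused, Mwasted and Mutil for a flat schedule.

    Assumed memory model: a firing occupies the tokens present before it plus
    the tokens it produces (inputs are released only when it finishes).
      'unshared': one dedicated buffer per edge, sized to that edge's peak
      'shared':   one global buffer, sized to the peak total occupancy
      'merged':   like 'shared', but produced tokens reuse consumed input slots
    """
    if mode not in ("unshared", "shared", "merged"):
        raise ValueError(f"unknown mode: {mode}")
    tokens = [0] * len(edges)     # live tokens on each edge
    edge_peak = [0] * len(edges)  # per-edge peak, used by 'unshared'
    usage: List[int] = []         # m-profile: memory occupied during each firing
    for actor in schedule:
        step = consumed = produced = 0
        for i, (src, snk, p, c) in enumerate(edges):
            out = p if src == actor else 0
            inp = c if snk == actor else 0
            if tokens[i] < inp:
                raise ValueError(f"{actor} fires without enough tokens on edge {i}")
            edge_peak[i] = max(edge_peak[i], tokens[i] + out)
            step += tokens[i] + out
            consumed += inp
            produced += out
        if mode == "merged":
            step -= min(consumed, produced)  # outputs overwrite consumed inputs
        usage.append(step)
        for i, (src, snk, p, c) in enumerate(edges):
            tokens[i] += (p if src == actor else 0) - (c if snk == actor else 0)
    m_size = sum(edge_peak) if mode == "unshared" else max(usage)
    m_total = m_size * len(schedule)
    m_used = sum(usage)
    return {"Msize": m_size, "Mtotal": m_total, "Mused": m_used,
            "Mwasted": m_total - m_used, "Mutil": m_used / m_total}


edges = [("A", "B", 2, 1), ("B", "C", 1, 1)]
for mode in ("unshared", "shared", "merged"):
    r = memory_profile(edges, list("ABBCC"), mode)
    print(mode, r["Msize"], r["Mtotal"], r["Mused"], r["Mwasted"], f"{r['Mutil']:.0%}")
# unshared 4 20 11 9 55%
# shared 3 15 11 4 73%
# merged 2 10 9 1 90%
```

Running the unshared mode on the schedule A B C B C instead gives Msize 3 and Mused 10, which matches the "Schedule 2" buffer requirement of technique 1).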

Memory Optimization Techniques ▶ 4a) Fractionizing

Idea:
• Don't wait until A produces a big chunk of data: modify actor A to process only a fractional amount of the original data at a time

Trade-off:
• Local effect: possible time and energy overhead (e.g., the resource's access time, packet overhead)
• Global effect: reduced bottleneck (shorter processing interval of A) and reduced buffer size (min|x|: 2 → 1)

Figure: original graph A → B (edge x; schedule: A 3(B)) and fractionized graph A′ → B (edge w; schedule: 2(AB))

Memory Optimization Techniques ▶ 4b) Rate Selection

Idea:
• Generalize fractionizing: allow not only fractions but also multiples
• A rate is defined as a range, but it is fixed before the schedule is finalized
• Each actor is modeled with timing and power functions of its I/O rates

Benefits: combines the power of flexibility and static determinism; increases buffer-reduction opportunities

Challenge: needs an efficient way to handle the considerably larger exploration space at runtime

Figure: actor A with rate range (1,3) feeding B through buffer x (edge label w(4,4)); candidate schedules: Schedule 1: 2(A)B; Schedule 2: AB; Schedule 3: 2(A)3(B)
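A toy version of rate selection for a single edge A → B: enumerate the allowed rates, compute the minimum buffer size for each pair, and keep the cheapest. The ranges, the tie-break on fewer firings, and the helper names are assumptions; the buffer size comes from simulating a demand-driven schedule (fire B whenever it has enough input), which for one edge yields p + c − gcd(p, c).

```python
from itertools import product
from math import gcd
from typing import Tuple


def min_edge_buffer(p: int, c: int) -> int:
    """Peak token count on edge A -> B (A produces p, B consumes c) over one
    iteration of a demand-driven schedule: fire B whenever it can, else fire A."""
    g = gcd(p, c)
    fires_a, fires_b = c // g, p // g  # repetition vector of the two-actor graph
    tokens = peak = 0
    while fires_a or fires_b:
        if fires_b and tokens >= c:
            tokens -= c
            fires_b -= 1
        else:
            tokens += p
            fires_a -= 1
            peak = max(peak, tokens)
    return peak


def select_rates(p_range: Tuple[int, int], c_range: Tuple[int, int]) -> Tuple[Tuple[int, int], int]:
    """Choose (p, c) within the inclusive ranges that minimizes the buffer size,
    breaking ties by fewer firings per iteration (an assumed secondary cost)."""
    best = None
    for p, c in product(range(p_range[0], p_range[1] + 1),
                        range(c_range[0], c_range[1] + 1)):
        g = gcd(p, c)
        cost = (min_edge_buffer(p, c), c // g + p // g)
        if best is None or cost < best[0]:
            best = (cost, (p, c))
    return best[1], best[0][0]


print(select_rates((1, 3), (2, 2)))  # ((2, 2), 2)
```

With A's produce rate in (1, 3) and B consuming 2, the sketch picks p = 2: it ties with p = 1 on buffer size (2 tokens) but needs fewer firings per iteration, reflecting the local preference for larger chunks.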

Memory Optimization Techniques ▶ 5) Pipelining

Idea: allow multiple actors to fire at once

Benefits: reduced buffer requirement, higher memory utilization, increased throughput

Challenges: requires multiprocessors; resource conflicts must be resolved; synchronization must be considered

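Memory Optimization Techniques ▶ Comparing 1), 4), 5)

Figure: original graph A → B → C (A produces 2 tokens per firing, all other rates 1) and fractionized/rate-selected graph A′ → B → C (all rates 1; A′ handles 1/2 of A's output per firing), with t-charts for 1) unshared buffer (schedule A B C B C), 4) fractionized / rate selected (buffers x, y), and 5) pipelined

Results, from 1) to 5):
• Buffer size: 33% reduction
• Utilization: 66.7% → 100%
• Time: 5 → 4 firing units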

Memory Optimization Techniques ▶ Summary

Techniques: 0: none (baseline); 1: scheduling with unshared buffers; 2: shared buffer; 3: merged I/O; 4: fractionized; 5: pipelined

Metric | 0 | 1 | 1+2 | 1+2+3 | 1+4 | 1+2+4 | 1+2+3+4 | 5
M_size | 4 | 3 | 3 | 2 | 2 | 2 | 1 | 2
M_used | 11 | 10 | 10 | 9 | 8 | 8 | 6 | 8
M_wasted | 9 | 5 | 5 | 1 | 4 | 4 | 0 | 0
T | 5 | 5 | 5 | 5 | 6 | 6 | 6 | 4
M_utilization | 55% | 66.7% | 66.7% | 90% | 66.7% | 66.7% | 100% | 100%

Figure: t-charts (tokens t1 to t4) for each column; the pipelined column uses a global buffer over A B C A B

Multi-metric Optimization

Trade-offs:
• From the actor's (local) point of view, processing a large amount of data at once tends to reduce time and energy overhead.
• From the SDF-flow (global) point of view, processing a small amount of data at once reduces the buffer requirement.

Goal: find a Pareto-optimal point within the range of solutions that satisfy the constraints.

Figure: trade-off space spanned by data memory, energy, and execution time (data-flow)
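A minimal Pareto filter over candidate implementations makes the goal concrete: drop candidates that violate the constraints, then keep those that no other feasible candidate beats on all three metrics. The candidate names and numbers below are made up for illustration.

```python
from typing import Dict, List, Tuple

# (data memory, energy, execution time); lower is better for every metric.
Point = Tuple[float, float, float]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no worse in every metric and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Point], limits: Point) -> List[str]:
    """Names of feasible candidates that no other feasible candidate dominates."""
    feasible = {name: point for name, point in candidates.items()
                if all(x <= limit for x, limit in zip(point, limits))}
    return sorted(name for name, point in feasible.items()
                  if not any(dominates(other, point)
                             for other_name, other in feasible.items() if other_name != name))


# Made-up numbers for illustration only.
candidates = {"s1": (4, 10, 5), "s2": (3, 10, 5), "s3": (2, 12, 6), "s4": (2, 11, 4)}
print(pareto_front(candidates, limits=(4, 20, 10)))  # ['s2', 's4']
```

Tightening the energy limit to 10.5 removes s3 and s4, leaving s2 as the only Pareto-optimal feasible point.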

Applying it to Rappit ▶ Quasi-static optimization

Phases (figure): compile-time, run-time, target

Rappit flow | Performed tasks
Compile | Kernel and primitives compiled and installed
Load script | SDF defined
Preprocess (optimization) | Actor-to-processor assignment, actor ordering (scheduling), buffer mapping
Load script code | Static schedule loaded
Execute | Deterministic execution without runtime overhead

Outline

Scripting Framework; Memory-oriented Optimization

▶ Implementation (Synthesis Tool; Simulator; Runtime Host-assisting Tool (GUI))

Experimental Platforms; Summary & Research Plan

Implementation ▶ Scripting engine synthesis tool

System template:
• GUI-based check-box approach
• Easily captures existing systems and models new systems for simulation and design space exploration
• Includes a communication description

Component library:
• Binds components according to the template configuration
• Consists of the MCU, on-chip devices, and off-chip peripherals
• Each component has I/O pins and driver modules

Implementation▶ Memory simulator

Implementation▶ Interactive runtime tool

Implementation ▶ Tool integration

Figure: host-side GUI, Scheduler, Memory Optimizer, Dispatcher, and Parser; a Node Manager connects to Node 1, Node 2, Node 3, …, Node N

Outline

Scripting Framework; Memory-oriented Optimization; Implementation; ▶ Experimental Platforms; Summary & Research Plan

HW Platforms and Real-world Applications

Platform | Description | Applications
Eco | ultra-compact sensor node | pre-term infant monitoring, dancing motion detection
Mini-FDPM | active laser sensing device | breast cancer detection
DuraNode | real-time data acquisition system | structural health monitoring
Butterfly | low-power, I/O-rich development board | prototyping (SD card, speaker, sensors, RF)

Outline

Scripting Framework; Memory-oriented Optimization; Implementation; Experimental Platforms; ▶ Summary & Research Plan

Summary
• A novel scripting framework for embedded systems: scripting engine synthesis; host-assisting runtime environment
• Memory optimization techniques: comparison of techniques; integration and the multi-objective problem
• Tool implementations: Rappit GUI, memory simulator

Contributions
• Empowered embedded systems: unleashing severely constrained embedded systems
• SDF extensions: extending the SDF model and its application area
• Memory savings: reduced memory requirement by integrating policies, including new techniques

Research Plan ▶ finished, ongoing, future work

Framework: language definition*; initial implementation and prototyping; component library generation*; code generation; overhead analysis; tool integration; test on a multinode scenario

Optimization: survey and comparison; simulator implementation; integrating techniques; SDF extension on rate; rate-selection algorithm; buffer-mapping protocol; cost function modeling of multi-metric optimization; SDF extension on timing

Case study: AVR Butterfly; mini-FDPM; Eco; DuraNode

* with Qiang Xie & Jinfeng Liu

Publications

Jiwon Hahn, Qiang Xie, and Pai H. Chou, Rappit: A Framework for the Synthesis of Host-Assisted Light-Weight Scripting Engines for Adaptive Embedded Systems, in Proc. International Conference on Hardware Software Codesign and System Synthesis (CODES+ISSS), 2005.

Jiwon Hahn, Dexin Li, Qiang Xie, Pai H. Chou, Nader Bagherzadeh, David W. Jensen, Alan C. Tribble, Power Reduction in JTRS Radios with ImpacctPro, in Proc. IEEE Military Communication Conference (MILCOM), 2004.

Bibliography

Murthy PK, Shuvra S. Bhattacharyya, Buffer merging - a powerful technique for reducing memory requirements of synchronous dataflow specifications. ACM Transactions on Design Automation of Electronic Systems (TODAES), 2004.

Murthy PK, Shuvra S. Bhattacharyya, Shared buffer implementations of signal processing systems using lifetime analysis techniques, IEEE Transactions on Computer-Aided Design of Integrated Circuits & Systems (TCADICS), 2001.

Shuvra S. Bhattacharyya, Murthy PK, Edward A. Lee, APGAN and RPMC: Complementary Heuristics for Translating DSP Block Diagrams into Efficient Software Implementations, Design Automation for Embedded Systems (DAES), 1997.

Shuvra S. Bhattacharyya, Murthy PK, Edward A. Lee, Joint Minimization of Code and Data for Synchronous Dataflow Programs, 1997.

Hyunok Oh, Soonhoi Ha, Fractional rate dataflow model and efficient code synthesis for multimedia applications, ACM SIGPLAN Notices, 2002.

Hyunok Oh, Soonhoi Ha, Data memory minimization by sharing large size buffers, Asia and South Pacific Design Automation Conference (ASPDAC), 2000.

Hyunok Oh, Soonhoi Ha, Efficient Code synthesis from extended dataflow graphs for multimedia applications, Design Automation Conference (DAC), 2002.

Geilen M, Basten T, Stuijk S, Minimising buffer requirements of synchronous dataflow graphs with model checking, 42nd Design Automation Conference (DAC), 2005.

Eckart Zitzler, Jürgen Teich, Shuvra S. Bhattacharyya, Multidimensional Exploration of Software Implementations for DSP Algorithms, Journal of VLSI Signal Processing (JVLSI), 1999.

John K. Ousterhout, Scripting: Higher Level Programming for the 21st Century, IEEE Computer magazine, 1998

TecO Home, http://particle.teco.edu/

Acknowledgements

This work is sponsored in part by the National Science Foundation grant CCR-0205712 and NSF CAREER Award CNS-0448668

Professor Pai Chou, Qiang Xie, Jinfeng Liu

Backup Slides

Scripting Overhead

Scripting for general-purpose computers:
• Assumes unlimited resources
• Full-featured scripting engine for convenience
• Slower than a system programming language

Scripting for embedded systems:
• Limited memory, CPU, power, …
• Needs scripting engine optimization: host assistance, language subsetting, library subsetting, efficient memory usage

Scripting may be even faster than compiled code!

Rappit ▶ Packet format example

Command packet format: Dst. | Msg ID | Opcode | Input[3] | Output[3] | CRC

Response packet format: Src. | Msg ID | Msg Type | Data Type | Payload | CRC | EOP

Command message format: Opcode | In_addr | In_start | In_size | Out_addr | Out_start | Out_size

Response message format

Rappit ▶ Scripting engine optimization in code synthesis
• Language subsetting, e.g., assignment (=), loop (repeat)
• Library subsetting, customized for the target applications and platform

Figure: a full-featured component library (MCU, RF, SPI, interrupts, GPIO, UART, ADC, Sensor1, Sensor2, joystick, LCD, dataflash) is subset to only the components a given target needs.

Memory Organizations ▶ Comparing previous work and Rappit

Previous approaches consider both data and code memory minimization but prioritize code size (*); we mainly focus on data size (**) minimization.

Figure (memory layouts): previous work: application code*, buffer, on-chip flash or EEPROM; our work: Rappit kernel, primitives, script code, buffer**, data flash.

Rappit ▶ Code size of runtime components

Host code (.py) | Lines | Size (KB)
GUI | 644 | 21.8
Cmd | 127 | 2.87
Parser & Msg Generator | 221 | 4.97
Library | 263 | 6.396
Packetizer & Depacketizer | 82 | 2.0
Packet Mgr | 42 | 0.92
Total | 1379 |

MCU code (.c) | Lines | Size (KB)
Interpreter | 260 | -
Primitives | 90 | -
Packetizer & Depacketizer | |
Total | 750 | 1.484

Rappit ▶ Summary of results

Code size reduction:
Application | Native | Rappit | Reduction
Reg setting | 4.356 KB | 1.664 KB | 61.8%
LCD usage | 12.45 KB | 4.2 KB | 66.3%

Performance overhead components analysis (1: fast, 2: tolerable, 3: slow (bottleneck)):
Component | Native | Interactive |
Communication | | |
RAM access | 3 | 1 | 1
ROM access | 3 | 1 | 1
Packetization | 1 | 2 | 2
Interpretation | 1 | 2 | 2
Total (cmd/sec) | 92 | 4.75 | 111
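The reduction column follows from the code sizes in the table, writing S_native and S_Rappit (symbols introduced here) for the native and Rappit sizes:

```latex
\begin{aligned}
\text{reduction} &= \frac{S_{native} - S_{Rappit}}{S_{native}} \\
\text{Reg setting: } \frac{4.356 - 1.664}{4.356} &= \frac{2.692}{4.356} \approx 61.8\% \\
\text{LCD usage: } \frac{12.45 - 4.2}{12.45} &= \frac{8.25}{12.45} \approx 66.3\%
\end{aligned}
```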

Rappit ▶ Subset of primitives

Device | Primitive | Device | Primitive | Device | Primitive
MCU | reset | GPIO | set pin | Timer | register fcn
MCU | power save | GPIO | get pin | Timer | remove fcn
MCU | initialize | GPIO | clear pin | RTC | set clock
MCU | get sys clock | USART | TX | RTC | read clock
MCU | set sys clock | USART | RX | LCD | clear
RF | INIT | SD | read | LCD | write
RF | set channel | SD | write | LCD | set contrast
RF | set power | ADC | read | Joystick | get key
RF | set frequency | Sensor1 | read | Speaker | set volume
RF | send | Sensor2 | read | Speaker | play tone
RF | receive | Sensor3 | read | Speaker | play song

Rappit ▶ Language

Keyword | Usage | Example
import | import the methods of each device | from RF import *
doc, dict | look up documentation and included methods | RF.__doc__, RF.__dict__
open, close | open/close a connection to a target system | node1 = open(MCU1, uart1); node1.close()
ls | list all connected instances | ls
every, start, stop | schedule events with a certain period | s1 = (every 30ms: a += ADC1.read()); s1.start(); s1.stop()
repeat | looping | repeat 3: SD.write(a)
def | define a function as a series of methods | def readTemperature(): ...
=, += | assign/configure or add a value | a = SD.read(10); a += SD.read(20)

SDF ▶ Strengths and limitations

Strengths:
• Ability to express multi-rate systems and parallelism
• Deadlock detection and scheduling can be determined at compile time
• Bounded memory requirements
• No runtime supervisory overhead

Limitations:
• Lack of conditional control flow
• Does not model asynchronous nodes
• Does not adequately address the real-time nature of connections to the outside world
• Does not address data-dependent run times

Superset of SDF ▶ Dynamic dataflow (DDF)
• Allows asynchronous actors whose rates are not fixed
• Captures dynamic constructs: if/else, for loops, do/while loops, recursion

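SDF ▶ Notations: firing and tokens
• $f(n)$: firing vector of the $n$-th firing
• $tk(n)$: number of live tokens on each edge after the $n$-th firing, with $tk(n+1) = tk(n) + G \cdot f(n)$, where $G$ denotes the topology matrix
• $f = \sum_{n=0}^{T} f(n)$: firing frequency
• $q = f_{\min}$: firing vector with the minimum number of firings of each actor
• Balance equation: $q(\mathrm{src}(e_i)) \cdot p(e_i) = q(\mathrm{snk}(e_i)) \cdot c(e_i)$ for every edge $e_i$
• Consistent SDF: $\mathrm{rank}(G) = |V| - 1$, so that $G \cdot q = 0$ has a nonzero solution

Scheduling: given $G$, $tk(0)$, and $q$, find a firing order that satisfies $tk(n) \ge 0$ for all $n$ and $q = \sum_{n=0}^{T} f(n)$. The graph is deadlocked if no node can be fired before $q = \sum_{n=0}^{T} f(n)$ is reached.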
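The token recurrence gives a direct way to build a schedule or detect deadlock. This greedy sketch fires the lowest-numbered fireable actor until every actor v has fired q[v] times; `find_schedule` is a hypothetical name, `G` is the |E| × |V| topology matrix, and edges are assumed to have no self-loops.

```python
from typing import List


def find_schedule(G: List[List[int]], tk0: List[int], q: List[int]) -> List[int]:
    """Greedy firing order with tk(n) >= 0 until actor v has fired q[v] times.

    G is the |E| x |V| topology matrix (no self-loop edges assumed).
    Raises RuntimeError if no actor can fire before q is reached (deadlock).
    """
    tk = list(tk0)
    remaining = list(q)
    order: List[int] = []
    while any(remaining):
        for v in range(len(q)):
            # Actor v is fireable if firing it keeps every edge count non-negative.
            if remaining[v] and all(tk[e] + G[e][v] >= 0 for e in range(len(tk))):
                tk = [tk[e] + G[e][v] for e in range(len(tk))]  # tk(n+1) = tk(n) + G f(n)
                remaining[v] -= 1
                order.append(v)
                break
        else:
            raise RuntimeError(f"deadlock after {len(order)} firings")
    return order


G = [[2, -1, 0], [0, 1, -1]]  # A -(2:1)-> B -(1:1)-> C
print(find_schedule(G, [0, 0], [1, 2, 2]))  # [0, 1, 1, 2, 2], i.e., A B B C C
```

A two-actor cycle without initial tokens, `find_schedule([[1, -1], [-1, 1]], [0, 0], [1, 1])`, raises the deadlock error; giving the back edge one initial token makes it schedulable.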

SDF ▶ Our extensions
• SDF has previously been used in multimedia-oriented applications targeting DSPs and FPGAs
• To target more general types of applications, non-buffered edges (dummy channels) that denote only precedence should be added
• The produce/consume rate of each actor is given not as a fixed value but as a range
• Add timing (future work)

SDF ▶ Another example

Extended Surge application

Figure: SDF graph of the extended Surge application (actors: ADC read, LCD show, SD store, SD read, Kernel pack, RF send)

Valid schedules (SAS: single appearance schedule):
• 30(A) 3(B) 3(C) D 10(E) 10(F): flat SAS
• 3(10(A) BC) D 10(EF): SAS
• 30(A) 2(BC) BCD 10(EF): non-SAS

SDF ▶ Another example (cont'd)

Script (SAS):
enable Timer1, RF, SD, LCD
every 2048:
repeat 10:
repeat 10:
a = ADC.read()
LCD.show(a)
SD.store(a)
repeat 10:
b = SD.read()
repeat 3:
c = Kernel.pack(b)
RF.send(c)

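Script-to-SDF Transform

User script:
x = A()
repeat 2:
    y = B(x)
    C(y)

SDF: $V = \{A, B, C\}$, $E = \{x, y\} = \{e_{AB}, e_{BC}\}$, initial schedule $\pi_{init} = A\,2(BC)$

Rate ranges (figure): A produces (2, 3) tokens on x; B consumes (1, 1) from x and produces (1, 1) on y; C consumes (1, 2) from y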
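The looped-schedule notation used here and in the extended Surge example (e.g., A2(BC), 3(10(A) BC) D 10(EF)) can be expanded into a flat firing sequence with a small recursive parser. The sketch assumes single-letter actor names; `expand` is a hypothetical helper.

```python
from typing import List


def expand(schedule: str) -> List[str]:
    """Expand looped-schedule notation such as "A2(BC)" into a flat firing list.

    Assumes single-letter actor names; "n(...)" or "nX" repeats its body n times.
    """
    text = schedule.replace(" ", "")
    pos = 0

    def parse_sequence() -> List[str]:
        nonlocal pos
        firings: List[str] = []
        while pos < len(text) and text[pos] != ")":
            start = pos
            while pos < len(text) and text[pos].isdigit():
                pos += 1
            count = int(text[start:pos]) if pos > start else 1
            if pos < len(text) and text[pos] == "(":
                pos += 1
                body = parse_sequence()
                if pos >= len(text):
                    raise ValueError("missing ')'")
                pos += 1  # skip the closing ')'
            elif pos < len(text) and text[pos].isalpha():
                body = [text[pos]]
                pos += 1
            else:
                raise ValueError(f"unexpected input at position {pos}")
            firings.extend(body * count)
        return firings

    result = parse_sequence()
    if pos != len(text):
        raise ValueError("unbalanced ')'")
    return result


print("".join(expand("A2(BC)")))  # ABCBC
```

For instance, the flat SAS, SAS, and non-SAS schedules of the extended Surge example all expand to 30 firings of A, 3 each of B and C, 1 of D, and 10 each of E and F.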

Multimetric Optimization ▶ Cost function modeling

Constraints:
• Energy: battery lifetime or another power budget
• Time: deadline of the given real-time application
• Memory: given memory size of the platform

Each node v is modeled with:
• $P_v(c,p)$: power consumption with respect to the consume/produce rate (i.e., input/output data size)
• $T_v(c,p)$: execution delay with respect to the consume/produce rate
