Post on 31-Dec-2015
Memory Oriented System-level Optimizations for Scripting Enabled Embedded Systems
Jiwon Hahn
PhD Qualifying Exam
University of California, Irvine
March 2006
Jiwon Hahn, UC Irvine 2
Motivation ▶ Embedded system development
Growing challenges
• Increasing end-user expectations: more functionality, higher performance, cheaper, smaller
• Very short time-to-market
• Wide gap between available techniques and user satisfaction
Need new tools and methodology!
[Figure: example applications: physiological sensing, motion sensing, structural health monitoring, preterm infant monitoring, Eco node]
Strategies
• Speed up the development! Need better programming/debugging methodology and tools
• Improve the current system's bottleneck! The memory unit is one of the most costly components, and it affects the system's performance, power, and overall application range
• Maximize the system's capability! Since an embedded system is resource constrained, it helps to partition the system workload to the host
About My Research
Framework
• Enhanced programming/debugging methodology
• Host-assisting runtime environment
Optimization
• Reducing data memory requirements and increasing memory utilization
• Power and performance co-optimization
Outline
• Scripting Framework
• Memory-oriented Optimization
• Implementation
• Experimental Platforms
• Summary & Research Plan
Outline
▶ Scripting Framework
  ⊳ Scripting Engine Synthesis
  ⊳ Runtime Environment
  ⊳ Preliminary Results
• Memory-oriented Optimization
• Implementation
• Experimental Platforms
• Summary & Research Plan
Motivating Example ▶ Building a small embedded system
Application: temperature sensor
• Sense temperature, send to the host every 5 min.
Platform: TecO particle
• 17 x 35 mm
• PIC18LF452 at 20 MHz
• 32 KB program Flash
• 1.5 KB RAM
• 32 KB external EEPROM
• Temperature sensor, RF interface, etc.
Hardware
• Solder RF module
Software (or firmware)
• No OS support! No interactivity, no partial testing
Development cycle (repeat):
1. Write the FW (C/assembly)
2. Compile
3. Connect board to the host
4. Enter the bootloading mode
5. Erase/Load/Verify program
6. Restart the board
7. Run
Motivation ▶ Alternative approach: Scripting!
Scripting engine synthesis + runtime
Environment setup (scripting engine synthesis):
1. Generate the FW (scripting engine synthesis)
2. Compile
3. Connect board to the host
4. Enter the bootloading mode
5. Erase/Load/Verify program
6. Restart the board
7. Run
Scripting (runtime, repeat):
1. Write the script
2. Connect board to the host
3. Load & run
Motivation ▶ Scripting vs. Traditional Programming
Aspect | Traditional | Scripting
Language | C, Assembly (less human readable) | Python, Tcl, Perl, … (higher level)
System query | No interactivity (need oscilloscope, multimeter to check the status) | Instant feedback
System update | Recompile, reboot required | On-the-fly
Code size | 5x to 10x more lines [J. Ousterhout '98] | Shorter
Performance overhead | None | Scripting-engine dependent (could be none or less)
Related Work ▶ Frameworks for runtime support
Name | High level (language) | Interactivity | Reconfigurability | Kernel synthesis | Hetero. sys. | Code size
SOS | no (C) | no | yes* | yes | yes | 20K
Mate | no (asm-like) | no | yes* | no | no | 39K
TinyOS | no (nesC) | no | yes | yes* | no | 18K
Agilla | no (asm-like) | yes | yes* | no | no | 55K
Pushpin | no (C-subset) | no | yes* | no (berthaOS) | no | 34K
Sensorware | yes* (Tcl) | yes | yes* | no | no | >237K
Actornet | yes* (S-expression) | N/A | yes | no | no | <128K
VM* | yes (java) | no | yes* | yes | N/A | 25K
Our work | yes (python-like) | yes | yes | yes | yes | <17K
Our Framework: Rappit ▶ Overview
A framework that provides the user with an integrated scripting environment spanning the host and target systems
[Figure: the host (Rappit S/W, application script) connects over a wired/wireless link to the target system (Rappit F/W, depicted as a C source file, on top of device drivers and H/W devices)]
Example interaction:
• Host: >> readTemperature()
• Target: receive packets, interpret the command, execute primitives (e.g., ADC read), return the result
• Host shows the result: 52
Rappit ▶ Scripting engine synthesis
[Figure: synthesis flow. A system description (architecture, application, communication) and the component library feed code synthesis. Code synthesis generates the target F/W (scripting engine, primitives, …) as a binary executable for the target system, and the host S/W (parser, MsgGen, GUI, …) providing the interactive language on the host. Both sides use a compatible message format]
System description
# example: pin mapping for an RF module
mcu = MCU(ATmega169)      # instantiate an atmega169 MCU
import RF                 # load a transceiver module
rf = RF(nRF2401)          # instantiate nRF2401
rf.CS = mcu.PORTB[0]      # connect the chip select pin
rf.CE = mcu.PORTB[1]      # connect the chip enable pin
rf.DR1 = mcu.PORTB[2]     # connect the data ready pin
rf.CLK1 = mcu.PORTF[1]    # connect the clock pin
rf.DOUT1 = mcu.PORTF[2]   # connect the data pin
# example: packet format
c_format = src(1),dst(1),msgID(1),opcode(1),arg(3),crc(1)
r_format = src(1),dst(1),msgID(1),mtype(1),dtype(1),\
           data(v), crc(1),eop(1)
Generated target code
// part of scripting engine
switch (opcode) {
    case 0x00: val = ADC_read(); break;
    case 0x01: RF_send(val); break;
    case 0x02: RF_packetize(val); break;
    …
}
// part of primitives
char ADC_read(void) { … }
void RF_send(char pck) { … }
Rappit ▶ Runtime environment
[Figure: runtime architecture. Host: GUI, Parser, Optimizer, Msg Generator, Component Library, Packet Manager, Packetizer/Depacketizer; the Parser and Optimizer are the host-assisting modules. Target system: Packetizer/Dispatcher, Packet Buffer, Scripting Engine, Admission Controller, Native Routines. The host sends commands and the target returns responses]
Rappit▶ Host assistance
Script Parsing (Parser)
Memory Management (Optimizer)
“readTemp()” Host Parser,Msg. generator
“0x4A0x01”
• User friendly
Syntax
• Easy to parse at node• Compact and efficient
representation
Script Scheduler, Buffer Mapper
Raw script
• Written by user
Optimized script
• Minimal script size• Minimized memory usage • Minimized runtime overhead
(Fixed schedule and buffer usage)
To target node
To target node
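The host-side step that turns a script call into a compact byte string could look roughly like the sketch below. It assumes the command layout from the packet-format example (src, dst, msgID, and opcode of one byte each, a 3-byte arg, and a 1-byte crc). The opcode table, the addresses, and the XOR checksum standing in for the CRC are hypothetical, since the slides do not specify them.

```python
import struct

# Hypothetical opcode table; the real mapping would come from code synthesis.
OPCODES = {"ADC_read": 0x00, "RF_send": 0x01, "RF_packetize": 0x02}

def checksum(payload: bytes) -> int:
    """XOR of all bytes: a stand-in for the unspecified CRC."""
    value = 0
    for byte in payload:
        value ^= byte
    return value

def make_command(src: int, dst: int, msg_id: int, call: str, arg: bytes = b"") -> bytes:
    """Pack one call as src(1), dst(1), msgID(1), opcode(1), arg(3), crc(1)."""
    if len(arg) > 3:
        raise ValueError("arg field holds at most 3 bytes")
    # "3s" pads a shorter arg with zero bytes on the right.
    body = struct.pack("<BBBB3s", src, dst, msg_id, OPCODES[call], arg)
    return body + bytes([checksum(body)])

packet = make_command(src=0x01, dst=0x4A, msg_id=7, call="ADC_read")
print(packet.hex())
```

Keeping the packet fixed-size means the node can dispatch on a single opcode byte without any string parsing, which is the point of moving the parser to the host.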
Rappit ▶ Scripting examples
Interactive port-setting
>> PORTA[2] = 1 # toggle clock
>> PORTA[2] = 0
>> PORTA[1] = 1 # set port A pin 1
>> PORTA[0] # read input pin
0
>> PORTA[2] = 1
>> PORTA[2] = 0 # toggle clock
>> PORTA[0] # read input pin
1
System configuration
>> mcu.sysclock = 1 MHz
>> uart.baudrate = 9600 bps
>> rf.power = -5 dB
>> rf.speed = 1 Mbps
>> rf.config # query
{'payload': 1, 'power': -5, 'speed': 1000000, 'channel': 100, 'mode': 'TX'}
Periodic-task scheduling
>> s = (every 50 ms: sample())
>> s.start()
>> s.stop()
Rappit ▶ Experimental platform
AVR Butterfly board
• Atmel ATmega169 8-bit MCU @ 8 MHz, 512 B EEPROM, 1 KB SRAM, 16 KB program flash
• Includes dataflash, speaker, sensors, joystick, LCD
• USART serial link at 9600 baud
[Figure: AVR Butterfly; AVR Butterfly with wireless module]
Rappit ▶ Experimenting metrics and modality
Observation metrics
Metric | Unit
Code size | Bytes
Execution speed | Cmds/sec
Execution modality
Modality | Approach | Programming method
Native | Compiled | Program the firmware onto the Flash
Batch | Scripting | Preload a script program onto the RAM
Interactive | Scripting | Send one line of command to the RAM
Rappit ▶ Preliminary results
Code size reduction
• 61.8% to 66.3% reduction
• The scripting engine consists of a thin layer
• Most of the reduction is in application code size
Performance overhead
• Batch-mode scripting can be faster than native!
• Observed up to 25.7% speed-up
Outline
• Scripting Framework
▶ Memory-oriented Optimization
  ⊳ Memory Optimization
  ⊳ Multi-metric Optimization
• Implementation
• Experimental Platforms
• Summary & Research Plan
Motivating Example ▶ Installing Rappit primitives on Butterfly
Problem arises
• Choose primitives: ADC_read, RF_send, RF_read, SD_write, SD_read, …
• Compile & install
• Runtime error! Why? Exceeded 1 KB RAM usage
Problem analysis
static unsigned char sd_buffer[512];
static unsigned char rf_buffer[30];
static unsigned char ADC_buffer[30];
…
char error_msg1[] = "No SD Card detected!";
char error_msg2[] = "Card Read Error!";
…
[Figure: SRAM layout before optimization (1 KB): .data and .bss hold SD_buffer (512 B), RF_buffer, ADC_buffer, and static strings, leaving little room for heap and stack]
Solution
• Sharing memory space
• Mapping static data to dataflash
[Figure: SRAM layout after optimization (1 KB): a single Shared_buffer (600 B ?), heap, and stack; static strings are mapped to dataflash]
Result
• Increased board capability
• Increased application range
Data Memory Minimization ▶ Assumptions and Approach
Assumptions
• Optimizing scripts: script size, buffer size
• Optimizing at runtime: need a low-complexity algorithm
Approach
• High-level optimization
• Using scheduling and buffer mapping techniques
• Priority on data memory minimization
• Based on a model of computation (MoC)
Models of Computation (MoC)
Synchronous Dataflow (SDF) [E. Lee '87]
• Extensively used as a specification for block-diagram based programming environments for signal processing
• Special case of dataflow
• No notion of time
• The number of tokens (= data) consumed and produced by each actor (= node) during each firing (= invocation) cycle is statically fixed
Fractional Rate Dataflow (FRDF) [H. Oh, S. Ha '02]
• Extension of SDF that allows fractional flow of I/O samples of the original SDF
Why SDF?
• Formal representation for optimization, simulation, and analysis
• System-level optimization: application flow of various primitives
• Static scheduling
  • Minimize runtime overhead for resource-constrained embedded systems
  • Deadlock detection
  • Bounding the memory requirements
• Good match for sensor applications: collect data, process, transmit
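SDF ▶ Notations
SDF graph $G = (V, E, p, c)$
• $V = \{v_1, v_2, \ldots, v_{|V|}\}$
• $E = \{e_1, e_2, \ldots, e_{|E|}\}$
• $\mathrm{src}(e)$: source node; $\mathrm{snk}(e)$: sink node
• $p(e)$: produce rate; $c(e)$: consume rate
Topology matrix $T(e, v)$:
$$T(e, v) = \begin{cases} p(e) & \text{if } v = \mathrm{src}(e) \\ -c(e) & \text{if } v = \mathrm{snk}(e) \\ 0 & \text{otherwise} \end{cases}$$
[Figure: chain $v_1 \to v_2 \to v_3 \to \cdots \to v_{|V|}$ with edges $e_1, e_2, e_3, \ldots, e_{|E|}$; rates (produce, consume) are (1, 2) on $e_1$, (2, 1) on $e_2$, produce 3 on $e_3$, and consume 5 on $e_{|E|}$. On $e_1$: $\mathrm{src}(e_1) = v_1$, $p(e_1) = 1$, $c(e_1) = 2$, $\mathrm{snk}(e_1) = v_2$]
For this graph, with rows $e_1, \ldots, e_{|E|}$ and columns $v_1, \ldots, v_{|V|}$:
$$T = \begin{pmatrix} 1 & -2 & 0 & \cdots & 0 \\ 0 & 2 & -1 & \cdots & 0 \\ 0 & 0 & 3 & \cdots & \vdots \\ 0 & 0 & 0 & \cdots & -5 \end{pmatrix}$$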
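As a small illustration of the definition above, the topology matrix can be assembled directly from an edge list. This is only a sketch: the function name and the tuple layout `(src, snk, produce, consume)` are assumptions made for the example, not part of Rappit.

```python
def topology_matrix(nodes, edges):
    """Build T with one row per edge and one column per node.

    edges is a list of (src, snk, produce_rate, consume_rate) tuples.
    T[e][v] is +p(e) at the source column, -c(e) at the sink column, 0 elsewhere.
    """
    column = {v: i for i, v in enumerate(nodes)}
    matrix = []
    for src, snk, produce, consume in edges:
        row = [0] * len(nodes)
        row[column[src]] += produce
        row[column[snk]] -= consume
        matrix.append(row)
    return matrix

# First two edges of the chain on the slide: e1 = (1, 2), e2 = (2, 1).
nodes = ["v1", "v2", "v3"]
edges = [("v1", "v2", 1, 2), ("v2", "v3", 2, 1)]
T = topology_matrix(nodes, edges)
for row in T:
    print(row)
```

The two printed rows correspond to the first two rows of the matrix shown on the slide.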
SDF ▶ Example
Surge application
• Actors: A (ADCread), B (RFpack), C (RFsend)
• Buffers: x, y
• Schedule: ABC
[Figure: SDF graph A → x → B → y → C, with produce and consume rates of 1 on both edges]
Rappit script (4L):
every 2048:
    x = ADC.read()
    y = RF.pack(x)
    RF.send(y)
SDF ▶ Example (cont'd)
Same code in Java (20L) [J. Koshy '05]:
SurgePacket sgPkt;
char eList, eVector;
byte sHandle;
sgPkt = new SurgePacket();
eList = Select.setEventId(eList, Events.TIMEOUT | Events.RADIO_RECV);
sHandle = Select.requestSelectHandle();
char val;
Clock.startTimeout(2048);
while (true) {
    eVector = Select.select(sHandle, eList);
    if (Select.eventOccurred(eVector, Events.TIMEOUT)) {
        val = PhotoSensor.sense();
        sgPkt.setReading(val);
        Surge.sendPacket(sgPkt);
        Clock.startTimeout(2048);
    } else if (Select.eventOccurred(eVector, Events.RADIO_RECV)) {
        handleRadioEvent(sgPkt); // if base, forward to uart
    }
}
Problem Statements
1. Find the best schedule and buffer mapping that minimizes the buffer size requirement
   • Goal-oriented
   • Previous work
2. Find the best schedule and buffer mapping that fits into, and maximizes the utilization of, a given memory size
   • Constraint-driven
   • Novel
   • Practical
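Buffer Mapping Problem ▶ Spatial representation
Token-lifetime chart (t-chart)
• Row: a token's lifetime (produced → placed → consumed)
• Column: fixed number of token changes caused by a firing event
[Figure: t-chart for schedule A B B C C, with time on the horizontal axis and local buffers x and y on the vertical axis. In buffer x, token t1 spans two columns and t2 spans three; in buffer y, tokens t3 and t4 span three columns each]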
Buffer Mapping Problem ▶ Spatial representation (cont'd)
Memory-usage profile (m-profile)
Metrics
• Msize = 4, Mtotal = 20, Mused = 11, Mwasted = 9, Mutil = 55%
• T = 5
[Figure: m-profile for schedule A B B C C, with time on the horizontal axis and memory on the vertical axis]
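The m-profile metrics follow from the token lifetimes in the t-chart. The sketch below assumes that Mtotal is the allocated size times the number of time columns and that Mused counts one unit per token per column it occupies; the concrete column positions are read off the schedule A B B C C and are an interpretation of the chart, not data from the slides.

```python
def memory_profile(lifetimes, m_size, total_time):
    """Compute m-profile metrics from token lifetimes.

    lifetimes maps a token name to (first_column, last_column), inclusive.
    m_size is the allocated buffer size in tokens; total_time is T.
    """
    m_used = sum(last - first + 1 for first, last in lifetimes.values())
    m_total = m_size * total_time
    return {
        "M_total": m_total,
        "M_used": m_used,
        "M_wasted": m_total - m_used,
        "M_util": m_used / m_total,
    }

# Assumed lifetimes for schedule A B B C C (columns 0..4):
# A produces t1, t2; each B consumes one of them and produces t3 or t4;
# each C consumes one of t3, t4.
lifetimes = {"t1": (0, 1), "t2": (0, 2), "t3": (1, 3), "t4": (2, 4)}
profile = memory_profile(lifetimes, m_size=4, total_time=5)
print(profile)
```

With `m_size=4` this gives the values quoted on the slide (20, 11, 9, 55%); passing a smaller `m_size` models the same lifetimes packed into a shared buffer.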
Related Work ▶ Data memory optimization based on MoC
Technique | Group | Idea
Optimal scheduling | [Bhattacharyya et al] in Ptolemy Group | Buffer minimized by optimal scheduling; optimizes each local buffer
Buffer sharing by lifetime analysis | [Bhattacharyya et al] in Ptolemy Group, [Ha et al] in PeaCE Group, [Ritz et al] in Meyr Group | Local buffer lifetime is analyzed to share global buffers
Buffer merging | [Bhattacharyya et al] in Ptolemy Group | Input/output buffer is shared (finer grain than buffer sharing)
Model checking | [Geilen et al] at Eindhoven Univ. | Reduces the problem to a model-checking problem on the state space of the SDF graph
Etc. (MBRO, PAPS, MRSP, …) | [Govindarajan et al] in Gao Group, [Peperstraete et al], [Goddard et al], [Ade et al] in GRAPE Group | Rate-optimal / vectorization / application to real-time systems / etc.
Memory Optimization Techniques
1) *Scheduling w/ Unshared Buffer
2) *Buffer Sharing
3) *I/O Buffer Merging
4a) **Fractionizing
4b) Rate Selection (new)
5) Pipelining (new)
* Well-established previous work
** Recently proposed
Memory Optimization Techniques ▶ 1) Scheduling with unshared buffer
• By efficient ordering of actors, the buffer requirement is reduced!
• Each edge is directly mapped to its dedicated buffer space
[Figure: SDF graph A → x → B → y → C; A produces 2 tokens and B consumes 1 on x; B produces 1 and C consumes 1 on y]
Schedule 1: A B B C C
  Script:
    x = A()
    repeat 2: y = B(x)
    repeat 2: C(y)
  Buffer accesses:
    x[0..1] = A()
    y[0] = B(x[0])
    y[1] = B(x[1])
    C(y[0])
    C(y[1])
  Buffer requirement: |x| + |y| = 2 + 2 = 4
Schedule 2: A B C B C
  Script:
    x = A()
    repeat 2:
        y = B(x)
        C(y)
  Buffer accesses:
    x[0..1] = A()
    y[0] = B(x[0])
    C(y[0])
    y[0] = B(x[1])
    C(y[0])
  Buffer requirement: |x| + |y| = 2 + 1 = 3
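The two buffer requirements can be checked by replaying a schedule and recording the peak token count on each edge. This is a minimal sketch assuming dedicated (unshared) buffers and that an actor consumes its inputs before producing its outputs; the function and variable names are illustrative only.

```python
def buffer_requirement(edges, schedule):
    """Return the peak number of tokens held on each edge for a schedule.

    edges maps an edge name to (src, snk, produce_rate, consume_rate).
    schedule is a sequence of actor names in firing order.
    """
    tokens = {name: 0 for name in edges}
    peak = {name: 0 for name in edges}
    for actor in schedule:
        # Consume inputs first; a shortage means the schedule is invalid.
        for name, (src, snk, produce, consume) in edges.items():
            if snk == actor:
                if tokens[name] < consume:
                    raise ValueError(f"{actor} fired without enough tokens on {name}")
                tokens[name] -= consume
        # Then produce outputs and update the peaks.
        for name, (src, snk, produce, consume) in edges.items():
            if src == actor:
                tokens[name] += produce
                peak[name] = max(peak[name], tokens[name])
    return peak

EDGES = {"x": ("A", "B", 2, 1), "y": ("B", "C", 1, 1)}
for schedule in ("ABBCC", "ABCBC"):
    peak = buffer_requirement(EDGES, schedule)
    print(schedule, peak, "total =", sum(peak.values()))
```

Interleaving B and C lets the single slot of y be reused, which is why the second schedule needs one token less.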
Memory Optimization Techniques ▶ Comparing 1), 2), 3)
[Figure: SDF graph A → x → B → y → C; A produces 2 tokens and B consumes 1 on x; B produces 1 and C consumes 1 on y]
Schedule: A B B C C
Script:
    x = A()
    repeat 2: y = B(x)
    repeat 2: C(y)
1) Unshared buffer
    x[0..1] = A()
    y[0] = B(x[0])
    y[1] = B(x[1])
    C(y[0])
    C(y[1])
  Buffer requirement: |x| + |y| = 2 + 2 = 4
2) Shared buffer
    x[0..1] = A()
    y[0] = B(x[0])
    x[0] = B(x[1])
    C(y[0])
    C(x[0])
  After B(x[0]) the data in x[0] has been consumed: reuse the available space!
  Buffer requirement: |x| + |y| = 2 + 1 = 3
3) Merged I/O buffer
    x[0..1] = A()
    x[0] = B(x[0])
    x[1] = B(x[1])
    C(x[0])
    C(x[1])
  Assuming the token is consumed before the output is produced, B(x[0]) and B(x[1]) use the same space (x[0], x[1]) for the input and output tokens
  Buffer requirement: |x| + |y| = 2 + 0 = 2
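Memory Optimization Techniques ▶ Comparing 1), 2), 3) (cont'd)
[Figure: t-charts of tokens t1 to t4 for schedule A B B C C under 1) unshared buffer, 2) shared buffer, and 3) merged I/O buffer, with time on the horizontal axis and local buffers x and y on the vertical axis]
Metric | 1) Unshared buffer | 2) Shared buffer | 3) Merged I/O buffer
|x|+|y| | 4 | 3 | 2
Mtotal | 20 | 15 | 10
Mused | 11 | 11 | 9
Mwasted | 9 | 4 | 1
Mutil | 55% | 73% | 90%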
Memory Optimization Techniques ▶ 4a) Fractionizing
Idea
• Don't wait until A produces a big chunk of data
• Modify actor A to process only a fractional amount of the original data at a time
Trade-off
• Local effect: possible time and energy overhead, e.g., resource access time, packet overhead
• Global effect
  • Reduced bottleneck: shorter processing interval of A
  • Reduced buffer size: min |x|: 2 → 1
[Figure: original graph with A producing 3 tokens and B consuming 1 on edge x (schedule: A 3(B)), and the fractionized graph with actor A' using a fractional rate of 1/3 on input w and rate 1 on x (schedule: 2(AB))]
Memory Optimization Techniques ▶ 4b) Rate Selection
Idea
• Generalize fractionizing
• Allow not only fractions but also multiples
• A rate is defined as a range, but fixed before the schedule is finalized
• Each actor is modeled with timing and power functions with respect to the I/O range
Benefits
• Combines the power of flexibility and static determinism
• Increases buffer reduction opportunity
Challenge
• Need an efficient way to handle the considerably increased exploration space at runtime
[Figure: graph w → A → x → B with rate ranges (4,4) on w, and (1,3) and (2,6) on x]
Schedule 1: 2(A)B
Schedule 2: AB
Schedule 3: 2(A)3(B)
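To see how a rate range translates into candidate schedules, one can enumerate every fixed rate pair inside the ranges and solve the balance equation for each. The sketch below reads the figure as A producing between 1 and 3 tokens on x and B consuming between 2 and 6; that reading is an assumption, chosen because it reproduces the three schedules listed. The names are illustrative, and the exhaustive loop only shows the size of the exploration space, it is not the rate-selection algorithm itself.

```python
from math import gcd

def rate_choices(produce_range, consume_range):
    """Enumerate fixed (p, c) pairs and the minimal repetitions they imply.

    For one edge A -> B the balance equation q_A * p == q_B * c gives
    q_A = c / gcd(p, c) and q_B = p / gcd(p, c).
    Returns tuples (p, c, q_A, q_B, tokens_per_iteration).
    """
    choices = []
    for p in range(produce_range[0], produce_range[1] + 1):
        for c in range(consume_range[0], consume_range[1] + 1):
            g = gcd(p, c)
            q_a, q_b = c // g, p // g
            # Tokens crossing the edge per iteration; this is also the buffer
            # needed by the flat schedule q_A(A) q_B(B).
            choices.append((p, c, q_a, q_b, q_a * p))
    return choices

choices = rate_choices((1, 3), (2, 6))
for p, c, q_a, q_b, traffic in choices:
    print(f"p={p} c={c}: {q_a}(A) {q_b}(B), tokens per iteration = {traffic}")
```

Even this single edge yields 15 candidate rate pairs, which illustrates why the exploration space grows quickly once several edges carry ranges.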
Memory Optimization Techniques ▶ 5) Pipelining
Idea
• Allow multiple actors to fire at once
Benefits
• Reduced buffer requirement
• Higher memory utilization
• Increased throughput
Challenges
• Need multiprocessors
• Need to resolve resource conflicts
• Need to consider the synchronization problem
Memory Optimization Techniques ▶ Comparing 1), 4), 5)
[Figure: t-charts of the A → x → B → y → C graph under 1) unshared buffer (schedule A B C B C), 4) fractionized / rate selected (actor A' with fractional rate 1/2, schedule A B C), and 5) pipelined (actors firing concurrently)]
• Buffer size: 33% reduction
• Utilization: 66.7% → 100%
• Time: 5 → 4 firing units
Memory Optimization Techniques ▶ Summary
Techniques: 0: None (baseline), 1: Unshared Scheduling, 2: Shared Buffer, 3: Merged I/O, 4: Fractionized, 5: Pipelined
Metric | 0 | 1 | 1+2 | 1+2+3 | 1+4 | 1+2+4 | 1+2+3+4 | 1+4+5
M_size | 4 | 3 | 3 | 2 | 2 | 2 | 1 | 2
M_used | 11 | 10 | 10 | 9 | 8 | 8 | 6 | 8
M_wasted | 9 | 5 | 5 | 1 | 4 | 4 | 0 | 0
T | 5 | 5 | 5 | 5 | 6 | 6 | 6 | 4
M_utilization | 55% | 66.7% | 66.7% | 90% | 66.7% | 66.7% | 100% | 100%
[Figure: t-charts of tokens t1 to t4 in the global buffer for the combined techniques]
Multi-metric Optimization
Trade-offs
• From the actor's point of view (local), processing a large amount of data at once tends to reduce time and energy overhead
• From the SDF-flow point of view (global), processing a small amount of data at once reduces the buffer requirement
Goal
• Find a Pareto-optimal point within the range of solutions that satisfy the constraints
[Figure: data memory, energy, and execution time plotted against the data-flow rate]
Applying it to Rappit ▶ Quasi-static optimization
Rappit flow | Performed tasks
Compile | Kernel and primitives compiled and installed
Load script | SDF defined
Preprocess (optimization) | Actor-to-processor assignment, actor ordering (scheduling), buffer mapping
Load script code | Static schedule loaded
Execute | Deterministic execution w/o runtime overhead
[Figure: the flow is annotated with its compile-time and run-time phases and with the host and target sides]
Outline
• Scripting Framework
• Memory-oriented Optimization
▶ Implementation
  ⊳ Synthesis Tool
  ⊳ Simulator
  ⊳ Runtime Host-assisting Tool (GUI)
• Experimental Platforms
• Summary & Research Plan
Implementation ▶ Scripting engine synthesis tool
System template
• GUI-based check-box approach
• Easily captures existing systems
• Models new systems for simulation and design space exploration
• Includes communication description
Component library
• Binds according to the template configuration
• Consists of MCU, on-chip devices, off-chip peripherals
• Each component has I/O pins and driver modules
Implementation ▶ Memory simulator
Implementation ▶ Interactive runtime tool
Implementation ▶ Tool integration
[Figure: integrated tool with GUI, Scheduler, Memory Optimizer, Parser, Dispatcher, and Node Manager; the Node Manager connects to Node 1, Node 2, Node 3, …, Node N]
Outline
• Scripting Framework
• Memory-oriented Optimization
• Implementation
▶ Experimental Platforms
• Summary & Research Plan
HW Platforms and Real-world Applications
Eco
• Ultra-compact sensor node
• Pre-term infant monitoring, dancing motion detection
Mini-FDPM
• Active laser sensing device
• Breast cancer detection
DuraNode
• Real-time data acquisition system
• Structural health monitoring
Butterfly
• Low-power, I/O-rich development board
• Prototyping (SD card, speaker, sensors, RF)
Outline
• Scripting Framework
• Memory-oriented Optimization
• Implementation
• Experimental Platforms
▶ Summary & Research Plan
Summary
A novel scripting framework for embedded systems
• Scripting engine synthesis
• Host-assisting runtime environment
Memory optimization techniques
• Comparison of techniques
• Integration and multi-objective problem
Tool implementations
• Rappit GUI, memory simulator
Contributions
Empowered embedded systems
• Unleashing severely constrained embedded systems
SDF extensions
• Extension of the SDF model
• Extending the application area of SDF
Memory savings
• Reduced memory requirement by integration of policies, including new techniques
Research Plan ▶ finished, ongoing, future work
Framework
• Language definition*
• Initial implementation and prototyping
• Component library generation*
• Code generation
• Overhead analysis
• Tool integration
• Test on multinode scenario
Optimization
• Survey and comparison
• Simulator implementation
• Integrating techniques
• SDF extension on rate
• Rate-selection algorithm
• Buffer-mapping protocol
• Cost function modeling of multi-metric optimization
• SDF extension on timing
Case study
• AVR Butterfly
• Mini-FDPM
• Eco
• DuraNode
* with Qiang Xie & Jinfeng Liu
Publications
Jiwon Hahn, Qiang Xie, and Pai H. Chou, "Rappit: A Framework for the Synthesis of Host-Assisted Light-Weight Scripting Engines for Adaptive Embedded Systems," in Proc. International Conference on Hardware Software Codesign and System Synthesis (CODES+ISSS), 2005.
Jiwon Hahn, Dexin Li, Qiang Xie, Pai H. Chou, Nader Bagherzadeh, David W. Jensen, Alan C. Tribble, "Power Reduction in JTRS Radios with ImpacctPro," in Proc. IEEE Military Communication Conference (MILCOM), 2004.
Bibliography
P. K. Murthy, Shuvra S. Bhattacharyya, Buffer merging - a powerful technique for reducing memory requirements of synchronous dataflow specifications, ACM Transactions on Design Automation of Electronic Systems (TODAES), 2004.
P. K. Murthy, Shuvra S. Bhattacharyya, Shared buffer implementations of signal processing systems using lifetime analysis techniques, IEEE Transactions on Computer-Aided Design of Integrated Circuits & Systems (TCADICS), 2001.
Shuvra S. Bhattacharyya, P. K. Murthy, Edward A. Lee, APGAN and RPMC: Complementary Heuristics for Translating DSP Block Diagrams into Efficient Software Implementations, Design Automation for Embedded Systems (DAES), 1997.
Shuvra S. Bhattacharyya, P. K. Murthy, Edward A. Lee, Joint Minimization of Code and Data for Synchronous Dataflow Programs, 1997.
Hyunok Oh, Soonhoi Ha, Fractional rate dataflow model and efficient code synthesis for multimedia applications, SIGPLAN Not., 2002.
Hyunok Oh, Soonhoi Ha, Data memory minimization by sharing large size buffers, Asia and South Pacific Design Automation Conference (ASPDAC), 2000.
Hyunok Oh, Soonhoi Ha, Efficient code synthesis from extended dataflow graphs for multimedia applications, Design Automation Conference (DAC), 2002.
M. Geilen, T. Basten, S. Stuijk, Minimising buffer requirements of synchronous dataflow graphs with model checking, 42nd Design Automation Conference (DAC), 2005.
Eckart Zitzler, Jurgen Teich, Shuvra S. Bhattacharyya, Multidimensional Exploration of Software Implementations for DSP Algorithms, Journal of VLSI Signal Processing (JVLSI), 1999.
John K. Ousterhout, Scripting: Higher Level Programming for the 21st Century, IEEE Computer magazine, 1998.
TecO Home, http://particle.teco.edu/
Acknowledgements
This work is sponsored in part by the National Science Foundation grant CCR-0205712 and NSF CAREER Award CNS-0448668.
• Professor Pai Chou
• Qiang Xie
• Jinfeng Liu
Backup Slides
Scripting Overhead
Scripting for general-purpose computers
• Assumes unlimited resources
• Full-feature scripting engine for convenience
• Slower than a system programming language
Scripting for embedded systems
• Limited memory, CPU, power, …
• Needs scripting engine optimization
  • Host assist
  • Language subsetting
  • Library subsetting
  • Efficient memory usage
• Scripting may be even faster than compiled code!
Rappit ▶ Packet format example
Command packet format
Dst. | Msg ID | Opcode | Input[3] | Output[3] | CRC
Response packet format
Src. | Msg ID | Msg Type | Data Type | Payload | CRC | EOP
Command message format
Opcode | In_addr | In_start | In_size | Out_addr | Out_start | Out_size
Response message format
Rappit ▶ Scripting engine optimization in code synthesis
• Language subsetting: e.g., assignment (=), loop (repeat)
• Library subsetting: customized for target applications and platform
[Figure: a full-featured component library (MCU, RF, SPI, interrupts, GPIO, UART, ADC, joystick, LCD, Sensor1, Sensor2, dataflash) reduced to the subset of components required by the target]
Memory Organizations ▶ Comparing previous work and Rappit
• Previous approaches consider both data and code memory minimization, but prioritize code size*
• We mainly focus on data size** minimization
[Figure: memory maps. Previous work: RAM holds the buffer; on-chip Flash or EEPROM holds the application code*. Our work: RAM holds the buffer**; on-chip Flash or EEPROM holds the primitives and the Rappit kernel; the data flash holds the script code]
Rappit ▶ Code size of runtime components
Host code (.py) | Lines | Size (KB)
GUI | 644 | 21.8
Cmd | 127 | 2.87
Parser & Msg Generator | 221 | 4.97
Library | 263 | 6.396
Packetizer & Depacketizer | 82 | 2.0
Packet Mgr | 42 | 0.92
Total | 1379 | 38.96
MCU code (.c) | Lines | Size (KB)
Interpreter | 260 | -
Primitives | 90 | -
Packetizer & Depacketizer | 300 | -
Total | 750 | 1.484
Rappit ▶ Summary of results
Code size reduction
Application | Native | Rappit | Reduction
Reg setting | 4.356 KB | 1.664 KB | 61.8%
LCD usage | 12.45 KB | 4.2 KB | 66.3%
Performance overhead components analysis (1: fast, 2: tolerable, 3: slow (bottleneck))
Component | Native | Interactive | Batch
Communication | 1 | 3 | 1
RAM access | 3 | 1 | 1
ROM access | 3 | 1 | 1
Packetization | 1 | 2 | 2
Interpretation | 1 | 2 | 2
Total cmd/sec | 92 | 4.75 | 111
Rappit ▶ Subset of primitives
Device | Primitive | Device | Primitive | Device | Primitive
MCU | reset | GPIO | set pin | Timer | register fcn
MCU | power save | GPIO | get pin | Timer | remove fcn
MCU | initialize | GPIO | clear pin | RTC | set clock
MCU | get sys clock | USART | TX | RTC | read clock
MCU | set sys clock | USART | RX | LCD | clear
RF | INIT | SD | read | LCD | write
RF | set channel | SD | write | LCD | set contrast
RF | set power | ADC | read | Joystick | get key
RF | set frequency | Sensor1 | read | Speaker | set volume
RF | send | Sensor2 | read | Speaker | play tone
RF | receive | Sensor3 | read | Speaker | play song
Rappit ▶ Language
Key | Usage | Example
import | import methods of each device | from RF import *
doc, dict | look up documentation, included methods | RF.__doc__, RF.__dict__
open, close | open/close a connection to a target system | node1 = open(MCU1, uart1); node1.close()
ls | list all connected instances | ls
every, start, stop | schedule events with a certain period | s1 = (every 30ms: a += ADC1.read()); s1.start(); s1.stop()
repeat | looping | repeat 3: SD.write(a)
def | define a function as a series of methods | def readTemperature(): ...
=, + | assign/configure or add value | a = SD.read(10); a += SD.read(20)
SDF ▶ Strengths and limitations
Strengths
• Ability to express multi-rate systems and parallelism
• Deadlock detection and scheduling can be determined at compile time
• Bounded memory requirements
• No runtime supervisory overhead
Limitations
• Lack of conditional control flow
• Does not model asynchronous nodes
• Does not adequately address the real-time nature of connections to the outside world
• Does not address data-dependent run times
Superset of SDF ▶ Dynamic dataflow (DDF)
• Allows asynchronous actors with a non-fixed rate for each actor
• Captures dynamic constructs: if/else, for-loop, do/while loop, recursion
SDF ▶ Notations
Firing & tokens
• $f(n)$: $n$th firing vector
• $tk(n)$: number of live tokens after the $n$th firing
• $tk(n+1) = tk(n) + G \cdot f(n)$
• $f = \sum_{n=0}^{T} f(n)$: firing frequency
• $q = f_{\min}$: firing vector (minimum number of firings)
• Balance equation: $q(\mathrm{src}(e_i)) \times p(\mathrm{src}(e_i)) = q(\mathrm{snk}(e_i)) \times c(\mathrm{snk}(e_i))$
Consistent SDF
• $\mathrm{rank}(G) = |N| - 1$
• $G \cdot q = 0$
Scheduling
• Given $G$, $tk(0)$, and $q$, find a firing order that satisfies $tk(n) \ge 0$ and $q = \sum_{n=0}^{T} f(n)$
• Deadlocked if no node can be fired before reaching $q = \sum_{n=0}^{T} f(n)$
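The balance equations and the scheduling condition can be exercised on a small graph. The sketch below is one straightforward way to do it, not the scheduler used in Rappit: it propagates rational firing ratios along the edges to obtain the minimal vector q, then greedily fires any ready actor until q is reached, reporting a deadlock otherwise. It assumes a connected graph with no initial tokens and Python 3.9 or later (for `math.lcm`); all names are illustrative.

```python
from fractions import Fraction
from math import lcm

def repetition_vector(nodes, edges):
    """Solve q[src] * p == q[snk] * c for the smallest positive integers q."""
    q = {nodes[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, snk, p, c in edges:
            if src in q and snk not in q:
                q[snk] = q[src] * p / c
                changed = True
            elif snk in q and src not in q:
                q[src] = q[snk] * c / p
                changed = True
    if len(q) != len(nodes):
        raise ValueError("graph is not connected")
    if any(q[src] * p != q[snk] * c for src, snk, p, c in edges):
        raise ValueError("inconsistent SDF graph")
    scale = lcm(*(value.denominator for value in q.values()))
    return {v: int(value * scale) for v, value in q.items()}

def find_schedule(nodes, edges, q):
    """Fire ready actors until each has fired q[v] times; detect deadlock."""
    tokens = [0] * len(edges)
    remaining = dict(q)
    order = []
    while any(remaining.values()):
        for v in nodes:
            ready = remaining[v] > 0 and all(
                tokens[i] >= c for i, (_, snk, _, c) in enumerate(edges) if snk == v
            )
            if ready:
                for i, (src, snk, p, c) in enumerate(edges):
                    if snk == v:
                        tokens[i] -= c
                    if src == v:
                        tokens[i] += p
                remaining[v] -= 1
                order.append(v)
                break
        else:
            raise RuntimeError("deadlock: no actor can fire")
    return order

nodes = ["A", "B", "C"]
edges = [("A", "B", 2, 1), ("B", "C", 1, 1)]
q = repetition_vector(nodes, edges)
print(q, "".join(find_schedule(nodes, edges, q)))
```

Because the greedy loop always picks the first ready actor in `nodes` order, it returns one valid schedule rather than a memory-optimal one; choosing among valid orders is where the buffer-minimizing techniques come in.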
SDF ▶ Our extensions
• SDF was previously used in multimedia-oriented applications targeting DSPs and FPGAs
• To target more general types of applications, non-buffered edges (dummy channels), which only denote precedence, should be added
• The produce/consume rate of each actor is not given as a fixed value, but as a range
• Add timing (future work)
SDF ▶ Another example
Extended Surge application
[Figure: SDF graph with actors A to F and edges a to f, built from the primitives ADCread, SDstore, SDread, Kernelpack, RFsend, and LCDshow]
Valid schedules:
• 30(A) 3(B) 3(C) D 10(E) 10(F): flat SAS
• 3(10(A) BC) D 10(EF): SAS
• 30(A) 2(BC) BCD 10(EF): non-SAS
SDF ▶ Another example (cont'd)
Script (SAS)
enable Timer1, RF, SD, LCD
every 2048:
    repeat 10:
        repeat 10:
            a = ADC.read()
            LCD.show(a)
            SD.store(a)
    repeat 10:
        b = SD.read()
        repeat 3:
            c = Kernel.pack(b)
    RF.send(c)
Script-to-SDF Transform
User script
x = A()
repeat 2:
    y = B(x)
    C(y)
Resulting SDF graph
• $V = \{A, B, C\}$
• $E = \{x, y\} = \{e_{AB}, e_{BC}\}$
• $\pi_{init} = A\,2(BC)$
• $e_{AB}$: $p(A) = (2, 3)$, $c(B) = (1, 1)$
• $e_{BC}$: $p(B) = (1, 1)$, $c(C) = (1, 2)$
[Figure: graph A → x → B → y → C with rate ranges 2/3 and 1/1 on x, and 1/1 and 1/2 on y]
Multimetric Optimization ▶ Cost function modeling
Constraints
• Energy: battery lifetime or other source of power budget
• Time: deadline in a given real-time application
• Memory: given memory size for a platform
Each node is modeled with:
• $P_v(c, p)$: power consumption w.r.t. consume/produce rate (i.e., input/output data size)
• $T_v(c, p)$: execution delay w.r.t. consume/produce rate