Design and Implementation of an Embedded Python Run-Time System

Thomas W. Barr, Rebecca Smith, Scott Rixner
Rice Computer Architecture Group
Rice University, Department of Computer Science
USENIX Annual Technical Conference, June 2012



The computing landscape


Microcontrollers: They’re everywhere.

~1000s, ~1 billion, ~10 billion, uncountable?


- Virtually no run-time support (yet)
  - Storage?
  - Memory allocation?
  - Interrupt handlers?
  - Console?


- 32-bit ARM Cortex-M3
  - 64-128KB RAM
  - Up to 512KB Flash
  - 50-100MHz
- Interpreters on microcontrollers?
  - Java Card
  - eLua
  - Python-on-a-chip


Owl: A Development Environment for Microcontrollers

- Run-time and toolchain
  - Open-source
    - Based on p14p
  - Interactive prompt
  - Profilers
  - Programmers
  - A lot more
- It works. Today.
- Focus on two parts today
  - Store Python objects
  - Call C functions


Owl: Toolchain and run-time


Representing programs

- Python source code

    def foo():
        a = 'foo'
        b = (42, 'bar')
        c = 27 + 3


- Compiler builds code objects
  - Bytecode to represent user program

Bytecodes:
    LOAD_CONST 0
    STORE_NAME 0
    ...


- Must include data along with code
  - Constants, names, etc.
  - .pyc file, .class file

Constants:
    0: 'foo'
    1: (42, 'bar')
    ...
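Desktop CPython exposes the same split between bytecode and its accompanying data, which makes the idea easy to poke at. The sketch below is CPython-specific and is not Owl's compiler: it compiles the slide's three assignments at module level and inspects the resulting code object. Exact opcodes and the ordering of constants vary between CPython versions.

```python
import dis

# The slide's assignments, compiled at module level so that names are
# stored with STORE_NAME (inside a function CPython would use STORE_FAST).
source = "a = 'foo'\nb = (42, 'bar')\nc = 27 + 3\n"
code = compile(source, "<slide>", "exec")

# Data that travels with the bytecode.
constants = code.co_consts   # literal values, referenced by index
names = code.co_names        # identifiers, referenced by index

# Opcode names in program order (contents differ across CPython versions).
opnames = [instruction.opname for instruction in dis.get_instructions(code)]
```

Calling `dis.dis(code)` prints the full listing, with each instruction's argument resolved against `constants` or `names`.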


Loading programs


File:
    0: 'foo'
    1: (42, 'bar')
    ...

Memory:
    str: 'foo'
    str: 'bar'
    int: 42
    tup: _, _


- Intermediate files
  - Python .pyc file, Java .class file, etc.
  - Loaded at runtime, then executed
- Load before execution
  - Copy and transform
    - Link compound objects with references
- Makes sense on x86
  - RAM is plentiful
- Does not make sense on a microcontroller
  - Can't afford to copy anything
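The copy-and-transform step can be observed on a desktop with CPython's `marshal` module, which is the serialization that .pyc files are built on. This is only an illustration of the traditional loader pattern the slide describes, not code from Owl, p14p, or eLua.

```python
import marshal

# Compile a small program and serialize it, as a compiler would before
# writing an intermediate file.
source = "a = 'foo'\nb = (42, 'bar')\n"
code = compile(source, "<demo>", "exec")
blob = marshal.dumps(code)          # flat bytes, as they would sit in a file

# Loading parses the bytes and allocates brand-new heap objects, linking
# compound objects (the tuple) to their members by reference.
loaded = marshal.loads(blob)

# Only after that copy can the program run.
namespace = {}
exec(loaded, namespace)
```

After loading, the program exists twice: once as `blob` and once as live objects. That duplication is cheap on a desktop and is exactly what a device with tens of kilobytes of RAM cannot afford.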


Memory image:
    str: 'foo'
    str: 'bar'
    int: 42
    tup: _, _

- Owl memory image
  - Unique on interpreted embedded systems
  - Store objects in form that is used by the VM
    - Object headers
    - Byte-order
- How do we store compound objects?

packed tuple:
    int: 42
    str: 'bar'

- Packed tuples
  - Store objects inside of nested containers
  - Transplantable
    - Computer → Controller
    - Controller → Controller
  - No pointers to fix up
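A minimal sketch of the packed-tuple idea follows. The byte layout here (a 1-byte type tag, a 2-byte little-endian total size, then the payload) is invented for illustration and is not Owl's actual object header. The point it demonstrates is that members are stored inline inside their container, so the image holds no addresses and can be read at any offset.

```python
import struct

TYPE_INT, TYPE_STR, TYPE_TUP = 1, 2, 3   # hypothetical type tags
HEADER = "<BH"                           # tag, total size (little-endian)
HEADER_SIZE = struct.calcsize(HEADER)    # 3 bytes, no padding


def pack(obj):
    """Serialize obj with any children stored inline (no pointers)."""
    if isinstance(obj, int):
        tag, payload = TYPE_INT, struct.pack("<i", obj)
    elif isinstance(obj, str):
        tag, payload = TYPE_STR, obj.encode("ascii")
    elif isinstance(obj, tuple):
        children = b"".join(pack(item) for item in obj)
        tag, payload = TYPE_TUP, bytes([len(obj)]) + children
    else:
        raise TypeError("unsupported type in this sketch")
    return struct.pack(HEADER, tag, HEADER_SIZE + len(payload)) + payload


def unpack(image, offset=0):
    """Decode the object at offset; return (value, offset just past it)."""
    tag, size = struct.unpack_from(HEADER, image, offset)
    body, end = offset + HEADER_SIZE, offset + size
    if tag == TYPE_INT:
        return struct.unpack_from("<i", image, body)[0], end
    if tag == TYPE_STR:
        return bytes(image[body:end]).decode("ascii"), end
    if tag == TYPE_TUP:
        count, pos, items = image[body], body + 1, []
        for _ in range(count):
            item, pos = unpack(image, pos)
            items.append(item)
        return tuple(items), end
    raise ValueError("unknown type tag")


image = pack((42, 'bar'))
value, _ = unpack(image)
# The same bytes decode identically when placed at a different address.
moved_value, _ = unpack(b"\x00" * 5 + image, 5)
```

Because nothing in `image` refers to an absolute location, moving it from a host computer to a controller, or between controllers, needs no fix-up pass.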


Memory images

- p14p, eLua use dynamic loaders
  - Objects copied from flash to RAM
- Owl uses memory images
  - Can stay in flash, not SRAM
  - 512KB flash vs. 96KB SRAM
- Results
  - 4x reduced SRAM footprint


Calling C library functions

- C is a necessary evil
  - Peripheral I/O libraries
    - Generally vendor provided
- Must provide a way for Python to call C
  - Two techniques compared
  - Automatically expose functions from .h file
  - Trade-offs in space and speed


- User cannot directly control execution
  - In an interpreter loop
  - Need some way to let programmer break out of loop

    bytecode = *IP;
    switch bytecode:
      case BINARY_ADD:
        // add two Python ints together
        IP++;
      case JUMP:
        // change IP
      case CALL_FUNCTION:
        // create new frame
        // change IP to new frame


    bytecode = *IP;
    switch bytecode:
      case BINARY_ADD:
        // add two Python ints together
        IP++;
      case JUMP:
        // change IP
      case CALL_FUNCTION:
        if (func_name == "uart_init") {
          call_uart_init();
        } else if (func_name == "uart_send") {
          call_uart_send();
        } else {
          // create new frame
          // change IP to new frame
        }
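The pseudocode above can be turned into a small runnable model. The sketch below is a toy stack machine, not Owl's interpreter: the opcode names follow the slide, while the `(opcode, argument)` encoding, the `natives` table, and the `uart_send` stand-in are assumptions made for illustration.

```python
def run(program, natives):
    """Execute a list of (opcode, argument) pairs; return the final stack."""
    stack = []
    ip = 0
    while ip < len(program):
        opcode, arg = program[ip]
        ip += 1
        if opcode == "LOAD_CONST":
            stack.append(arg)
        elif opcode == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "CALL_FUNCTION":
            name, argc = arg
            # Arguments were pushed left to right, so restore that order.
            args = [stack.pop() for _ in range(argc)][::-1]
            if name in natives:
                # Break out of the loop into native code, then push the result.
                stack.append(natives[name](*args))
            else:
                raise NameError(f"no such function: {name}")
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack


sent = []  # records what the stand-in "peripheral" received


def uart_send(value):
    """Stand-in for a C library routine."""
    sent.append(value)


program = [
    ("LOAD_CONST", 27),
    ("LOAD_CONST", 3),
    ("BINARY_ADD", None),
    ("CALL_FUNCTION", ("uart_send", 1)),
]
result = run(program, {"uart_send": uart_send})
```

Replacing the slide's `if`/`else if` chain with a lookup table keeps the loop the same size as more native functions are added, though each function still needs its own entry.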


call_uart_init()

- Wrappers for functions
  - Interpreter calls wrapper
  - Wrapper converts arguments
  - Calls function
  - Wrapper converts result
  - Returns result on stack
- Call C as you call Python


    /* Variable declarations */
    PmReturn_t retval = PM_RET_OK;
    pPmObj_t p0;
    uint32_t peripheral;

    /* If wrong number of arguments, raise TypeError */
    if (NATIVE_GET_NUM_ARGS() != 1) {
        PM_RAISE(retval, PM_RET_EX_TYPE);
        return retval;
    }

    /* Get Python argument */
    p0 = NATIVE_GET_LOCAL(0);

    /* If wrong argument type, raise TypeError */
    if (OBJ_GET_TYPE(p0) != OBJ_TYPE_INT) {
        PM_RAISE(retval, PM_RET_EX_TYPE);
        return retval;
    }

    /* Convert Python argument to C argument */
    peripheral = ((pPmInt_t)p0)->val;

    /* Actual call to native function */
    SysCtlPeripheralEnable(peripheral);

    /* Return Python object */
    NATIVE_SET_TOS(PM_NONE);
    return retval;


Calling C library functions

    import gpio

    class Output:
        def __init__(self, portpin):
            self.port = PORTS[portpin[0]]
            self.pin = PINS[portpin[1]]
            # turn on GPIO module
            init_port(portpin[0])
            # configure pin for output
            gpio.GPIOPinTypeGPIOOutput(self.port, self.pin)

        def write(self, value):
            if value:
                # turn pin on
                gpio.GPIOPinWrite(self.port, self.pin, self.pin)
            else:
                # turn pin off
                gpio.GPIOPinWrite(self.port, self.pin, 0)


- Wrapped functions
  - Manually write wrappers
  - p14p, eLua
- Autowrapping
  - Similar to SWIG
- Simple
  - Any compiler
  - Any architecture
  - Lots of duplicated code
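To make the autowrapping idea concrete, here is a rough sketch of a generator that reads simple prototypes and emits wrapper text in the style of the wrapper shown earlier. It is not Owl's autowrapper tool: the regular expression, the function names `parse_header` and `emit_wrapper`, and the sample header are all assumptions, and it only handles functions whose parameters are plain integers.

```python
import re

# Matches simple prototypes such as "void f(uint32_t x);".
# Pointers, macros, and multi-word return types are out of scope here.
PROTOTYPE = re.compile(r"(\w+)\s+(\w+)\s*\(([^)]*)\)\s*;")


def parse_header(text):
    """Return a list of (name, [parameter types]) for each simple prototype."""
    functions = []
    for _return_type, name, params in PROTOTYPE.findall(text):
        params = params.strip()
        if params in ("", "void"):
            types = []
        else:
            # Drop the parameter name, keep everything before it as the type.
            types = [p.strip().rsplit(" ", 1)[0] for p in params.split(",")]
        functions.append((name, types))
    return functions


def emit_wrapper(name, types):
    """Emit wrapper text for a function whose parameters are all integers."""
    count = len(types)
    lines = [
        f"/* generated wrapper for {name} */",
        f"if (NATIVE_GET_NUM_ARGS() != {count}) {{",
        "    PM_RAISE(retval, PM_RET_EX_TYPE);",
        "    return retval;",
        "}",
    ]
    for i, ctype in enumerate(types):
        lines += [
            f"p{i} = NATIVE_GET_LOCAL({i});",
            f"if (OBJ_GET_TYPE(p{i}) != OBJ_TYPE_INT) {{",
            "    PM_RAISE(retval, PM_RET_EX_TYPE);",
            "    return retval;",
            "}",
            f"{ctype} a{i} = ((pPmInt_t)p{i})->val;",
        ]
    call_args = ", ".join(f"a{i}" for i in range(count))
    lines += [f"{name}({call_args});", "NATIVE_SET_TOS(PM_NONE);", "return retval;"]
    return "\n".join(lines)


# Illustrative header text; real vendor headers are considerably messier.
header = """
void SysCtlPeripheralEnable(uint32_t peripheral);
void GPIOPinWrite(uint32_t port, uint8_t pins, uint8_t val);
"""
functions = parse_header(header)
wrappers = {name: emit_wrapper(name, types) for name, types in functions}
```

Printing the generated wrappers side by side shows how much of the text is identical from one function to the next, which is the duplicated code the slide refers to.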


Foreign Function Interface

- Based on libffi
  - Calls C functions with arbitrary signature
  - Implements calling convention
    - Custom port to Cortex-M3
- eFFI
  - Functions statically linked into program
  - Function pointer tables instead of DWARF symbol headers
  - Compatible with statically linked embedded system
- No duplicated code
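The table-driven approach can be modeled in a few lines. This sketch is not eFFI itself and performs no real calling-convention work: the Python functions stand in for statically linked C routines, and the signature codes ("i" for integer, "s" for string) are invented for the example. What it shows is the structural difference from wrappers: one generic call path, with only a small table entry per function.

```python
log = []  # records calls made to the stand-in "C" routines


def uart_init(port):
    """Stand-in for a statically linked C routine."""
    log.append(("init", port))


def uart_send(port, text):
    """Stand-in for a statically linked C routine; returns bytes sent."""
    log.append(("send", port, text))
    return len(text)


# One entry per exposed function: the "function pointer" plus its signature.
FUNCTION_TABLE = [
    (uart_init, "i"),
    (uart_send, "is"),
]
TYPE_FOR_CODE = {"i": int, "s": str}


def call_foreign(index, *args):
    """Single generic call path shared by every foreign function."""
    func, signature = FUNCTION_TABLE[index]
    if len(args) != len(signature):
        raise TypeError("wrong number of arguments")
    for value, code in zip(args, signature):
        if not isinstance(value, TYPE_FOR_CODE[code]):
            raise TypeError("wrong argument type")
    return func(*args)


call_foreign(0, 1)
sent = call_foreign(1, 1, "hi")
```

Adding a function costs one table row rather than a block of checking and conversion code, at the price of interpreting the signature on every call.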


Comparison

- eFFI smaller
  - VM 2KB larger
  - Eliminates 21KB of wrappers
- Wrapper functions simpler
  - 2x faster, largely irrelevant in practice

Figure 5: Static binary analysis, from top to bottom, of the Owl virtual machine using wrapped functions, using the foreign function interface, of the eLua virtual machine and of an example program using SafeRTOS. (Horizontal axis: 0 KB to 190 KB.)

Section totals recovered from the figure labels:
- Owl, wrapped functions: Virtual machine 34.9 KB; Lib (C) 9.8 KB; Math 20.8 KB; IP 7.8 KB; Platform 15.2 KB; Lib (Py) 25.0 KB; I/O (C) 21.1 KB; I/O (Py) 26.5 KB
- Owl, eFFI: Virtual machine 36.6 KB; Lib (C) 10.2 KB; Math 20.8 KB; IP 7.8 KB; Platform 12.2 KB; Lib (Py) 25.4 KB; I/O (Py) 29.1 KB
- eLua: src 8.8 KB; (other) 65.9 KB; lua 63.5 KB; modules 20.6 KB; arm 10.2 KB
- SafeRTOS demo: Source 9.2 KB; Minimal 13.2 KB; (other) 10.0 KB; uip 14.8 KB; webserver 13.0 KB; RTOSDemo 14.2 KB

a floating-point benchmark that simulates the motion of the Sun and the four planets of the outer solar system.

7 Results

This section presents an analysis of the Owl system. The overhead of including the virtual machine in flash can be quite small, as low as 32 KB compared to 22 KB for a simple RTOS. We show that for the embedded workloads, garbage collection has almost no impact on run-time performance. Finally, we show that using a loader-less architecture uses four times less SRAM than a traditional system.

7.1 Static binary analysis

Figure 5 shows the output of the static binary analyzer. The four rows in the figure are the Owl run-time system using autowrapping, the Owl run-time system using eFFI, the eLua interpreter, and the SafeRTOS demonstration program. SafeRTOS is an open-source real-time operating system that is representative of the types of run-time systems in use on microcontrollers today.

Consider the Owl run-time system using autowrapping. The virtual machine section includes the interpreter and support code to create and manage Python objects. The math section includes support for software floating point and mathematical functions, while the IP section provides support for Ethernet networking. The platform section is the StellarisWare peripheral and USB driver libraries. The lib sections (C and Python) are the Python standard libraries for the Owl run-time system. Finally, the I/O sections (C and Python) are the calls to the peripheral library, wrapped by the autowrapper tool.

The Python standard libraries consume a significant fraction of the total flash memory required for the Owl run-time system. The capabilities that these libraries provide are mostly optional, and therefore can be removed to save space. However, they provide many useful and convenient functionalities beyond the basic Python bytecodes, such as string manipulation. These sections also include optional debugging information (5 KB).

With eFFI, the binary is roughly 19 KB smaller, illustrating the advantage of using a foreign function interface. While the code required to manually create stack frames and call functions marginally increases the size of the virtual machine and stores some additional information in the Python code objects, it completely eliminates the need for C wrappers.
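The 19 KB figure can be checked against the section sizes printed in Figure 5. The numbers below are the section totals read from the figure labels for the two Owl configurations; the dictionary layout is just a convenient way to add them up.

```python
# Section totals in KB, read from the Figure 5 labels.
wrapped = {
    "Virtual machine": 34.9, "Lib (C)": 9.8, "Math": 20.8, "IP": 7.8,
    "Platform": 15.2, "Lib (Py)": 25.0, "I/O (C)": 21.1, "I/O (Py)": 26.5,
}
effi = {
    "Virtual machine": 36.6, "Lib (C)": 10.2, "Math": 20.8, "IP": 7.8,
    "Platform": 12.2, "Lib (Py)": 25.4, "I/O (Py)": 29.1,
}

total_wrapped = round(sum(wrapped.values()), 1)   # 161.1 KB
total_effi = round(sum(effi.values()), 1)         # 142.1 KB
saving = round(total_wrapped - total_effi, 1)     # 19.0 KB

# The virtual machine grows slightly while the C wrapper section disappears.
vm_growth = round(effi["Virtual machine"] - wrapped["Virtual machine"], 1)
wrappers_removed = wrapped["I/O (C)"]
```

The virtual machine grows by 1.7 KB and the 21.1 KB I/O (C) section vanishes, which lines up with the slide's "VM 2KB larger" and "Eliminates 21KB of wrappers"; the remaining difference comes from smaller shifts in the other sections.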

The Owl virtual machine itself is actually quite small, approximately 35 KB. It contains all of the code necessary for manipulating objects, interpreting bytecodes, managing threads, and calling foreign functions. This is significantly smaller than eLua’s core, which takes up 63 KB, and not much larger than the so-called “lightweight” SafeRTOS, which requires 22 KB (the Source and Minimal sections). Note also that the supposed compactness of SafeRTOS is deceptive, as it is statically


Owl: A Development Environment for Microcontrollers


- Yes, you can run a managed run-time on a microcontroller.
- It works, right now.
  - Autonomous car
  - Toys
  - Courseware
    - ENGI128 at Rice
  - Cookware
  - ...


- Interesting area for research
  - Two innovations presented here
  - More in paper
  - Far more in the future?
    - Compare to C?
    - To MATLAB?
- Vital area for research
  - Microcontrollers are getting more common
  - Larger peripheral set is making programming harder
    - We can and must reverse this trend
