Toy to practical interpreter: Mosh internals (Shibuya.lisp, 2009/02/28)

Post on 29-Jun-2015

DESCRIPTION

Mosh, an R6RS Scheme interpreter: from slow toy interpreter to fast practical interpreter.

TRANSCRIPT

Toy to practical

Taro Minowa (Higepon)

Shibuya.Lisp Tech Talk #2, February 28, 2009

Mosh internals

Introduce myself

Mona: Open Source OS

Mosh: Fast Scheme Interpreter

Outputz: http://outputz.com/

Today’s presentation is about…

From toy to practical interpreter

Mosh

R6RS Scheme Interpreter

As fast as Gauche and Ypsilon (I believe)

Many SRFIs, DBI (MySQL)

Regexp (Oniguruma)

Object system (Tiny CLOS)

Process Management

Foreign Function Interface

Development

Version 0.0.7

2 committers

Higepon

kokosabu

In the future

Use as the shell for Mona

Toy

The MITOH 2006

Scheme Shell for Mona

OS integrated R5RS Scheme

Implementation based on SICP

My first interpreter!

Basic tree-based interpreter

Written in “Pure C++”

Good points

It works

Almost covers R5RS

Bad points

Too slow

fib(31) takes a few minutes!

Not for practical use

Why was it slow and bad?

Problems

Slow GC

Scheme recursion uses native stack

Slow environment look up

Incomplete tail call optimization

Slow arithmetic

Few optimizations

Slow arithmetic

Too many heap allocations

So fib(31) causes ...

With slow GC

(+ 1 1) => new Number(1 + 1)

Learned from toy

Slow interpreter is useless

Slow interpreter is not practical

Need more speed

Need better design

Tree-based → VM

Read “3imp”

Three Implementation Models for Scheme

Kent Dybvig (Chez Scheme)

Bytecode VM

With sample code

http://mono.kmc.gr.jp/~yhara/w/?Reading3imp.pdf#l13

Choose Stack-based VM

Fast environment look up

Use display closure

Use virtual stack

tail call optimization
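The lookup point above can be illustrated with a small C++ sketch. It contrasts a name-searched environment chain, as a tree-based interpreter might use, with a display closure that keeps its free variables in a flat array. All type and function names here (`Frame`, `DisplayClosure`, `lookupByName`, `referFree`) are hypothetical and are not taken from Mosh or from 3imp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical environment for a tree-based interpreter:
// a chain of frames, each searched by variable name.
struct Frame {
    std::vector<std::pair<std::string, int>> bindings;
    const Frame* parent = nullptr;
};

// Walks outward frame by frame and compares names,
// so the cost grows with nesting depth and frame size.
inline const int* lookupByName(const Frame* frame, const std::string& name) {
    for (; frame != nullptr; frame = frame->parent) {
        for (const auto& binding : frame->bindings) {
            if (binding.first == name) return &binding.second;
        }
    }
    return nullptr;  // unbound variable
}

// Display closure: the values of the free variables are copied into a
// flat array when the closure is created.
struct DisplayClosure {
    std::vector<int> freeVars;
};

// The compiler is assumed to have resolved the name to an index already,
// so a reference is a single array access at run time.
inline int referFree(const DisplayClosure& closure, std::size_t index) {
    assert(index < closure.freeVars.size());
    return closure.freeVars[index];
}
```

With this layout, a reference to a free variable compiles to "fetch slot i of the current closure" instead of a search through names.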

3imp doesn’t have

Multiple values | Values register
Global variables | Global hash-table
Subr | Hook on APPLY instruction
let | LET_FRAME instruction
Optimization on compilation | Borrowed from Gauche

Second implementation: Mini Mosh

Written in Scheme instead of C++

Easy to make a prototype

no need to parse, we have “read”

never SEGV (very important!)

We can use the backend’s procedures

OP_CAR => car

Prototyping

Rewrote about 50 times

VM: 1400 lines, Compiler: 2400 lines

Hardest part

Designing stack layout

Wrong stack position

change stack layout => crash

some code works, other code doesn’t

Bugs in compiler, VM or design?

A pen and a notebook are more than friends.

Instruction example

VM

(+ 1 1) => '(CONST 1 PUSH CONST 1 NUMBER_ADD)

[(NUMBER_ADD) (apply-native-2arg +)]

[(CONSTANT) (val1) (VM codes (skip 1) (next 1) fp c stack sp)]
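To make the instruction sequence above concrete, here is a minimal C++ sketch of an accumulator-plus-stack VM that executes it. The opcode names follow the slide with an `OP_` prefix; the integer encoding, the `OP_HALT` instruction and the `run` function are assumptions added for illustration and do not reproduce Mosh's actual VM.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Opcode names echo the slide; OP_HALT and the encoding are assumptions.
enum Op { OP_CONST, OP_PUSH, OP_NUMBER_ADD, OP_HALT };

// Executes a flat code vector in which opcodes and inline operands
// share one array, as in '(CONST 1 PUSH CONST 1 NUMBER_ADD).
inline int run(const std::vector<int>& code) {
    std::vector<int> stack;  // virtual stack, separate from the native C++ stack
    int acc = 0;             // accumulator register holding the latest value
    std::size_t pc = 0;      // index of the next instruction
    for (;;) {
        assert(pc < code.size());
        switch (code[pc++]) {
        case OP_CONST:       // load the inline operand into the accumulator
            assert(pc < code.size());
            acc = code[pc++];
            break;
        case OP_PUSH:        // save the accumulator on the virtual stack
            stack.push_back(acc);
            break;
        case OP_NUMBER_ADD:  // pop one value and add it to the accumulator
            assert(!stack.empty());
            acc = stack.back() + acc;
            stack.pop_back();
            break;
        case OP_HALT:
            return acc;
        default:
            assert(false && "unknown opcode");
            return 0;
        }
    }
}
```

Calling `run` on the sequence `OP_CONST, 1, OP_PUSH, OP_CONST, 1, OP_NUMBER_ADD, OP_HALT` returns 2. The dispatch loop is a plain `for` with a `switch`, so Scheme-level recursion never consumes native stack.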

half

Improvement in Mini Mosh

Problems of toy | Solutions
Slow GC | -
Scheme recursion uses native stack | Stack VM uses virtual stack
Slow environment look up | Fast lookup using display closure
Incomplete tail call optimization | Tail call optimization
Slow arithmetic | -
Few optimizations | Borrowed from Gauche

Port prototype to C++

VM

Easy to port

maps cond to switch/case

maps recursive call to loop

Compiler

Painful

Compile compiler with Mini Mosh

[Diagram: Mini Mosh (Scheme), run on Gauche, reads compiler (Scheme) and compiles it into a list of instructions: LREF 0 PUSH GREF 'compile APPLY ...]

Embed the instructions to C++ Mosh

[Diagram: compiler.cpp is generated from the list of instructions and built into Mosh (C++): make(LREF, 0) make(PUSH) make(GREF, “compile”)]

Share the compiler written in Scheme

[Diagram: Mini Mosh (Scheme) and Mosh (C++) share compiler (Scheme)]

Easy to debug

Easy to process intermediate code

VM in C++

Use Boehm GC

Much faster than toy

fib(31) takes only a few seconds

(+ 1 1) doesn’t need heap allocation

Tag bit based Object system

Use immediate value for Number
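A tag-bit representation can be sketched in a few lines of C++. The layout below (low bit 1 for a fixnum, low bit 0 for an aligned heap pointer) is an assumption chosen for illustration and is not claimed to be Mosh's actual layout; the names `Object`, `makeFixnum`, `isFixnum`, `fixnumValue` and `addFixnum` are hypothetical, and overflow handling is left out.

```cpp
#include <cassert>
#include <cstdint>

// One machine word per Scheme value. Assumed layout:
//   low bit 1 -> small integer (fixnum) stored in the upper bits
//   low bit 0 -> pointer to an aligned heap object
using Object = std::intptr_t;

// Shift the integer left by one and set the tag bit: no heap allocation.
inline Object makeFixnum(std::intptr_t n) {
    return static_cast<Object>((static_cast<std::uintptr_t>(n) << 1) | 1u);
}

inline bool isFixnum(Object obj) { return (obj & 1) != 0; }

// Relies on arithmetic right shift for negative values, which mainstream
// compilers provide (and C++20 guarantees).
inline std::intptr_t fixnumValue(Object obj) { return obj >> 1; }

// (2a + 1) + (2b + 1) - 1 == 2(a + b) + 1, so two tagged words can be
// added directly. Overflow is not checked in this sketch.
inline Object addFixnum(Object a, Object b) {
    assert(isFixnum(a) && isFixnum(b));
    return a + b - 1;
}
```

Here `(+ 1 1)` becomes `addFixnum(makeFixnum(1), makeFixnum(1))`, which is plain integer arithmetic on a word, in contrast to the toy interpreter's `new Number(1 + 1)`.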

Improvement in C++ Mosh

Problems of toy | Solutions
Slow GC | Boehm GC
Scheme recursion uses native stack | Stack VM uses virtual stack
Slow environment look up | Fast lookup using display closure
Incomplete tail call optimization | Tail call optimization
Slow arithmetic | Tag bit Object System
Few optimizations | Borrowed from Gauche

Become a practical fast interpreter?

Not yet.

2 bearded gurus (the bearded big brothers)

Gauche & Ypsilon

Speed freak: http://osdevj.g.hatena.ne.jp/osdevj/20060807/1154962935

[Bar chart: time in msec (axis 0 to 300) for C, Java, Gauche, CINT, Perl, Python, Ruby 1.8]

We need performance tuning

Chart

Profiler

Tuning

Fast startup

Many optimization techniques

Chart

Make the goal clear

Know whether what I’ve done is good or bad

Run benchmarks

make bench

Draw charts

every time

Profiler

C++ profiler tells us little

It happens inside the run-loop

We need Scheme profiler

mosh -p

SIGPROF

Fast start up is also important

Just running an empty script takes ...

[Bar chart: startup time in msec (axis 0 to 80) for Perl, Ruby, Gauche, Python, Ypsilon, Mosh]

Mosh startup 80 => 20 msec

Don’t read many files at startup

Don’t allocate large memory

Don’t use too many static initializers

Embed the compiler in binary format

FASL (Fast Loading)

Optimizations

Compiler

Beta reduction

Procedure inlining

Constant folding

(+ 1 2) => 3

Peephole

e.g. a jump whose destination is another jump
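The jump-to-jump case can be sketched as a small peephole pass in C++. The instruction format (`Insn` with an absolute `target` index) and the names `OP_JUMP`, `OP_NOP` and `threadJumps` are hypothetical, chosen only to show the idea; Mosh's own compiler is written in Scheme and is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction format for illustration.
enum Opcode { OP_JUMP, OP_NOP };

struct Insn {
    Opcode op;
    std::size_t target;  // absolute index of the destination; used by OP_JUMP only
};

// Retargets every jump whose destination is itself a jump, so that it
// goes straight to the final destination.
inline void threadJumps(std::vector<Insn>& code) {
    for (Insn& insn : code) {
        if (insn.op != OP_JUMP) continue;
        std::size_t target = insn.target;
        // Bound the walk by code.size() so a cycle of jumps cannot loop forever.
        for (std::size_t hops = 0; hops < code.size(); ++hops) {
            assert(target < code.size());
            if (code[target].op != OP_JUMP) break;
            target = code[target].target;
        }
        insn.target = target;
    }
}
```

After the pass, a chain such as "jump to 2, where 2 is a jump to 3" costs one dispatch at run time instead of two.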

VM

Instruction unification

Shorter instructions are faster

PUSH + APPLY => PUSH_APPLY

Direct threaded code

switch/case => goto

GCC only
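The switch/case => goto step can be sketched with the "labels as values" extension, which is why the slide says GCC only (Clang accepts it as well). This sketch uses a toy instruction set; the opcode names, the translation pass and `runThreaded` are assumptions for illustration, not Mosh's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy opcodes for illustration; only the names echo the slides.
enum { T_CONST, T_PUSH, T_NUMBER_ADD, T_HALT };

// Not declared inline: GCC documents that label addresses may differ
// between inlined copies of a function.
int runThreaded(const std::vector<int>& code) {
    // GCC/Clang extension: &&label yields the address of a label.
    void* const handlers[] = { &&do_const, &&do_push, &&do_add, &&do_halt };

    // Translation pass: store each handler's address directly in the
    // code stream, which is what makes the code "direct threaded".
    std::vector<std::intptr_t> threaded;
    int lastOp = -1;
    for (std::size_t i = 0; i < code.size(); ++i) {
        assert(code[i] >= T_CONST && code[i] <= T_HALT);
        lastOp = code[i];
        threaded.push_back(reinterpret_cast<std::intptr_t>(handlers[code[i]]));
        if (code[i] == T_CONST) {
            assert(i + 1 < code.size());
            threaded.push_back(code[++i]);  // inline operand stays a plain integer
        }
    }
    assert(lastOp == T_HALT);  // the program must end with T_HALT

    std::vector<int> stack;
    int acc = 0;
    const std::intptr_t* pc = threaded.data();

// Each handler ends by jumping straight to the next handler:
// there is no central switch and no loop.
#define NEXT() goto *reinterpret_cast<void*>(*pc++)
    NEXT();
do_const:
    acc = static_cast<int>(*pc++);
    NEXT();
do_push:
    stack.push_back(acc);
    NEXT();
do_add:
    assert(!stack.empty());
    acc += stack.back();
    stack.pop_back();
    NEXT();
do_halt:
    return acc;
#undef NEXT
}
```

Compared with a `switch` inside a loop, each handler performs its own indirect jump, which removes the shared dispatch point and the bounds check that compilers typically emit for `switch`.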

Compare at the instruction level

Gauche

(disasm ...)

Ypsilon

(debug-compile ...)

More improvement in C++ Mosh

Problems of toy | Solutions
Slow GC | Boehm GC
Scheme recursion uses native stack | Stack VM uses virtual stack
Slow environment look up | Fast lookup using display closure
Incomplete tail call optimization | Tail call optimization
Slow arithmetic | Tag bit Object System
Few optimizations | Many, many optimizations

Finally

Mosh becomes practical!

Practical speed

Conclusions

Toy to Practical is not easy.

Wear a beard!

Please try Mosh

http://code.google.com/p/mosh-scheme/

trunk is better
