abstract syntax tree runtime information can be

8
A Fast Abstract Syntax Tree Interpreter for R Petr Maj Tomas Kalibera Jan Vitek Floréal Morandat Purdue University Thesis Runtime information can be leveraged to create simple, fast, easy to maintain interpreters for real languages I Into the data mines… Vendors were excluded from these analyses By Rexer Analytics, presented at Predictive Analytics World – Boston, 2013 1259 respondents from 75 countries “70% of data miners prefer R”

Upload: others

Post on 12-Jan-2022

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Abstract Syntax Tree Runtime information can be

A Fast Abstract Syntax Tree

Interpreter for R

Petr Maj

Tomas Kalibera

Jan Vitek

Floréal Morandat

Purdue University

Java cup

Thesis

Runtime information can be leveraged to create

simple, fast,

easy to maintain !interpreters for real languages

I !

Into the data mines… ♥

• 

Vendors were excluded

from these analyses

By Rexer Analytics, presented at Predictive Analytics World – Boston, 20131259 respondents from 75 countries

Tools Used by Data Miners

• 

Vendors were excluded

from these analyses

By Rexer Analytics, presented at Predictive Analytics World – Boston, 20131259 respondents from 75 countries

Tools Used by Data Miners

“70% of data miners prefer R”

Page 2: Abstract Syntax Tree Runtime information can be

An R history

1976 S

1993 R

Today, The R project

Chambers @ Bell Labs, then S-Plus (closed-source owned by Tibco)

Ihaka and Gentlemann, started R as new language at the U of Auckland, NZ

Core team ~ 20 people, under GPL. Continued dev of language & libraries:

namespaces (‘11), bytecode (‘11), 64-bit indexes (‘13)http://www.r-project.org

Vectors

with(fd,carb*den) with.default <-function(data,exp,...) eval(substitute(exp), data, parent.frame)))

x <- c(2,7,9,NA,5)

c(1,2,3) + x[1:3]

x[is.na(x)] <- 0

Functions

with(fd,carb*den) with.default <-function(data,exp,...) eval(substitute(exp), data, parent.frame)))

p<-function(x=5,…,y=x+1) x + c(…) + y

p() p(1,2) p(y=2)

p(y=2,x=1)

p(1,2,y=0,3,4)

Promises

with(fd,carb*den) with.default <-function(data,exp,...) eval(substitute(exp), data, parent.frame)))

assert(x==2, report(x)) !

!

assert<-function(C,P) if(C)print(P)

Page 3: Abstract Syntax Tree Runtime information can be

Referential transparency

with(fd,carb*den) with.default <-function(data,exp,...) eval(substitute(exp), data, parent.frame)))

x <- c(0,1) !

f(x) !

assert(x[[1]]==1) !

!

f<-function(a){a[[1]]<-0}

Reflection

with(fd,carb*den) with.default <-function(data,exp,...) eval(substitute(exp), data, parent.frame)))

with(fd, carb*den) !

!

with<-function(data,exp,…) eval(substitute(exp), data, parent.frame)))

Attributes

with(fd,carb*den) with.default <-function(data,exp,...) eval(substitute(exp), data, parent.frame)))

x <- c(1,2,3,4) !

attr(x,"dim")<-c(2,2) II !

Of interpreters and men…

Page 4: Abstract Syntax Tree Runtime information can be

Performance & Complexity

Basic textual interpreter Abstract syntax tree interpreter Bytecode interpreter JIT compiler Tracing JIT compiler Optimizing JIT

DLS’12

Self-Optimizing AST Interpreters!

Thomas Wurthinger, Andreas Woß, Lukas Stadler, Gilles Duboscq, Doug Simon, Christian Wimmer

Showed how to obtain performance for a simple(r) language

We use Truffle as an inspiration to develop techniques for R

0

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

bint

ree

fann

krdx

fast

a

fast

ardx

knuc

leo

knuc

leo-

br

man

del

man

del-n

v

nbod

y

pidi

gits

rege

xdna

reve

rse

reve

rse-

nv

spec

tral

spec

tral

-alt

spec

tral

-mth

Renjin: R in Java1.8x slower

III !

Worse is better…

Java cup

Page 5: Abstract Syntax Tree Runtime information can be

Preparation

!

!

!

•Optimizations:

• allocate slot for a subset of the local variables

• return statement elision

•Only “compilation” step in FastR 0.168

ANTLR Parse Tree FastR Executable Tree

Code Specialization

•In-place profile-driven code specialization

!

!

!

!

!

•Use values of variables to choose a better implementation

•Simple to implement in single threaded context, modulo some care around recursive functions

N

N2

N

N1

N’

N’1 N’2

+ program state+ program state’

+ exception?

Code Specialization

f(12, x+1, a=3)

gen_call

exp

f

12 x+1 3‘a’

exp exp

f <- function(a,b,c){a+c}

Code Specialization

pos_call

expexp exp

f <- function(a,b,c){a+c}

3 12 x+1

Page 6: Abstract Syntax Tree Runtime information can be

Code Specializationclass If { RNode condE, trueB, falseB;! Object execute(Frame f) { try { val = condE.executeScalarLogical(frame); } catch (UnexpectedResult e) { cast = ToLogical.mkNode(condE, e.result()); replaceChild(condE, cast); return execute(frame); } if (val == TRUE) return trueB.execute(f); if (val == FALSE) return falseB.execute(f); throw unexpectedNA(); }

Code Specialization•Specialized code for simple cases, e.g.

arithmetics with scalars; function callsscalar vector indexing arithmetics on vectors of same length

•Bounded self-rewriting: - Uninitialized node rewrites itself - Initialized node rewrites itself when a guard fails

•Rewriting based on events: e.g. a symbol change can re-writes nodes

Data Specialization

•A special representation for common cases of data-types

• Scalar integer with no names, no dimensions, no custom attributes... just the number

!

Reduces memory overhead Merges guards needed in (specialized) code

if (v instanceof RInt && v.size() == 1 && v.names() == null && ! v.dimensions() == null && v.attributes() == null) { ...

if (v instanceof ScalarIntImpl) { ...

Data Specialization

Integer ranges

A specialized implementation of an integer vector which happens to be a range 1,2,…,k k∈N, k>0

Saves memory and enables code specialization

!

!

for(i in 1:n) {...} Java loop over Java integers 1 to n

x[,1:n] = NA Column selection using a Java loop from 1 to n

Page 7: Abstract Syntax Tree Runtime information can be

Data SpecializationInteger ranges

!

!

!

!

!

A range trivially won't contain an NA, negative values, zero, and bounds check for whole range is constant time

Much faster vector indexing

class RIntSimpleRange implements RInt { ! final int to;" int getInt(int i) { return i + 1; }" RInt materialize() { " int[] c = new int[to]; " for (int i = 0; i < to; i++) c[i] = i + 1; " return RInt.RIntFactory.getFor(c);" }"}

Data+Code Specialization

Views

•…delay construction of large data objects

•…are a data-flow representation of vectors

•…avoid unnecessary work if a subset of the data is required

•…avoid allocation of temporary objects

•…permit fusion of multiple data traversals into one

a <- b+c ...

o <- a*2 2

+

*

Data+Code Specialization

Profiling view

Counts number of accesses to a view

An executable node, on first invocation, creates a profiling view; second invocation, rewrites based on the profile

Heuristics:

Avoid small views; avoid over-used root views; avoid repeating computation; avoid recursive views

a <- b+c +P

FastR 0.168 (speedup 8.5x)Java cup

0.00

0.25

0.50

0.75

1.00

bt1

bt2

bt3

pr1

pr2

fa1

fa2

fa3

fa4

fa5 fr1 fr2 kn1

kn2

kn3

kn4

ma1

ma2

ma3

ma4 nb1

nb2

nb3

nb4

nb5

pd1

rd1

rc1

rc2

rc3

sn1

sn2

sn3

sn4

sn5

sn6

sn7

Rel

ative

Exe

cutio

n Ti

me

(GN

UR−A

ST =

1)

FastR GNUR−BC

Relative Time of FastR and GNUR−BC over GNUR−AST

Figure 13. Shootout Relative Execution Times (lower is better). Geo. mean speedup for FastR is 8.5x and 1.8x for GNUR-BC.

0.00

0.25

0.50

0.75

1.00

mc1

mc2

mc3

mc4

mc5 mf1

mf2

mf3

mf4

mf5 pr1

pr2

pr3

pr4

pr5

Rel

ative

Exe

cutio

n Ti

me

(GN

UR−A

ST =

1)

FastR GNUR−BC

Relative Time of FastR and GNUR−BC over GNUR−AST

Figure 14. Benchmark 2.5 Relative Execution Times (lower is better). Geo. mean speedup FastR is 1.7x and 1.1x for GNUR-BC.

stored in the R Frame). Moreover, FastR’s return elision op-timization avoids executing the return statement completelyin the bt benchmarks.

Fusion. Our implementation of fusion of view trees intoJava byte-code, on average, provides no performance change

on the b25 and the shootout benchmarks (numbers shown ingraphs are without fusion). Lacking a realistic applicationthat would stress vector computation, we use a trivial micro-benchmark to validate the potential speed-up of fusion. Wemeasure the time to compute a sequence of commands x =

y + z * y + z - 2 * (8 + z); x[[1]] = 3 for primitive

Page 8: Abstract Syntax Tree Runtime information can be

Runtime information can be leveraged to create

simple, fast, easy to maintain interpreters for real languages

!

http://github.com/allr/fastr

Conclusion