abstract syntax tree runtime information can be
TRANSCRIPT
A Fast Abstract Syntax Tree
Interpreter for R
Petr Maj
Tomas Kalibera
Jan Vitek
Floréal Morandat
Purdue University
Java cup
Thesis
Runtime information can be leveraged to create
simple, fast,
easy to maintain !interpreters for real languages
I !
Into the data mines… ♥
•
Vendors were excluded
from these analyses
By Rexer Analytics, presented at Predictive Analytics World – Boston, 20131259 respondents from 75 countries
Tools Used by Data Miners
♥
•
Vendors were excluded
from these analyses
By Rexer Analytics, presented at Predictive Analytics World – Boston, 20131259 respondents from 75 countries
Tools Used by Data Miners
“70% of data miners prefer R”
An R history
1976 S
1993 R
Today, The R project
Chambers @ Bell Labs, then S-Plus (closed-source owned by Tibco)
Ihaka and Gentlemann, started R as new language at the U of Auckland, NZ
Core team ~ 20 people, under GPL. Continued dev of language & libraries:
namespaces (‘11), bytecode (‘11), 64-bit indexes (‘13)http://www.r-project.org
Vectors
with(fd,carb*den) with.default <-function(data,exp,...) eval(substitute(exp), data, parent.frame)))
x <- c(2,7,9,NA,5)
c(1,2,3) + x[1:3]
x[is.na(x)] <- 0
Functions
with(fd,carb*den) with.default <-function(data,exp,...) eval(substitute(exp), data, parent.frame)))
p<-function(x=5,…,y=x+1) x + c(…) + y
p() p(1,2) p(y=2)
p(y=2,x=1)
p(1,2,y=0,3,4)
Promises
with(fd,carb*den) with.default <-function(data,exp,...) eval(substitute(exp), data, parent.frame)))
assert(x==2, report(x)) !
!
assert<-function(C,P) if(C)print(P)
Referential transparency
with(fd,carb*den) with.default <-function(data,exp,...) eval(substitute(exp), data, parent.frame)))
x <- c(0,1) !
f(x) !
assert(x[[1]]==1) !
!
f<-function(a){a[[1]]<-0}
Reflection
with(fd,carb*den) with.default <-function(data,exp,...) eval(substitute(exp), data, parent.frame)))
with(fd, carb*den) !
!
with<-function(data,exp,…) eval(substitute(exp), data, parent.frame)))
Attributes
with(fd,carb*den) with.default <-function(data,exp,...) eval(substitute(exp), data, parent.frame)))
x <- c(1,2,3,4) !
attr(x,"dim")<-c(2,2) II !
Of interpreters and men…
Performance & Complexity
Basic textual interpreter Abstract syntax tree interpreter Bytecode interpreter JIT compiler Tracing JIT compiler Optimizing JIT
DLS’12
Self-Optimizing AST Interpreters!
Thomas Wurthinger, Andreas Woß, Lukas Stadler, Gilles Duboscq, Doug Simon, Christian Wimmer
Showed how to obtain performance for a simple(r) language
We use Truffle as an inspiration to develop techniques for R
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
bint
ree
fann
krdx
fast
a
fast
ardx
knuc
leo
knuc
leo-
br
man
del
man
del-n
v
nbod
y
pidi
gits
rege
xdna
reve
rse
reve
rse-
nv
spec
tral
spec
tral
-alt
spec
tral
-mth
Renjin: R in Java1.8x slower
III !
Worse is better…
Java cup
Preparation
!
!
!
•Optimizations:
• allocate slot for a subset of the local variables
• return statement elision
•Only “compilation” step in FastR 0.168
ANTLR Parse Tree FastR Executable Tree
Code Specialization
•In-place profile-driven code specialization
!
!
!
!
!
•Use values of variables to choose a better implementation
•Simple to implement in single threaded context, modulo some care around recursive functions
N
N2
N
N1
N’
N’1 N’2
+ program state+ program state’
+ exception?
Code Specialization
f(12, x+1, a=3)
gen_call
exp
f
12 x+1 3‘a’
exp exp
f <- function(a,b,c){a+c}
Code Specialization
pos_call
expexp exp
f <- function(a,b,c){a+c}
3 12 x+1
Code Specializationclass If { RNode condE, trueB, falseB;! Object execute(Frame f) { try { val = condE.executeScalarLogical(frame); } catch (UnexpectedResult e) { cast = ToLogical.mkNode(condE, e.result()); replaceChild(condE, cast); return execute(frame); } if (val == TRUE) return trueB.execute(f); if (val == FALSE) return falseB.execute(f); throw unexpectedNA(); }
Code Specialization•Specialized code for simple cases, e.g.
arithmetics with scalars; function callsscalar vector indexing arithmetics on vectors of same length
•Bounded self-rewriting: - Uninitialized node rewrites itself - Initialized node rewrites itself when a guard fails
•Rewriting based on events: e.g. a symbol change can re-writes nodes
Data Specialization
•A special representation for common cases of data-types
• Scalar integer with no names, no dimensions, no custom attributes... just the number
!
Reduces memory overhead Merges guards needed in (specialized) code
if (v instanceof RInt && v.size() == 1 && v.names() == null && ! v.dimensions() == null && v.attributes() == null) { ...
if (v instanceof ScalarIntImpl) { ...
Data Specialization
Integer ranges
A specialized implementation of an integer vector which happens to be a range 1,2,…,k k∈N, k>0
Saves memory and enables code specialization
!
!
for(i in 1:n) {...} Java loop over Java integers 1 to n
x[,1:n] = NA Column selection using a Java loop from 1 to n
Data SpecializationInteger ranges
!
!
!
!
!
A range trivially won't contain an NA, negative values, zero, and bounds check for whole range is constant time
Much faster vector indexing
class RIntSimpleRange implements RInt { ! final int to;" int getInt(int i) { return i + 1; }" RInt materialize() { " int[] c = new int[to]; " for (int i = 0; i < to; i++) c[i] = i + 1; " return RInt.RIntFactory.getFor(c);" }"}
Data+Code Specialization
Views
•…delay construction of large data objects
•…are a data-flow representation of vectors
•…avoid unnecessary work if a subset of the data is required
•…avoid allocation of temporary objects
•…permit fusion of multiple data traversals into one
a <- b+c ...
o <- a*2 2
+
*
Data+Code Specialization
Profiling view
Counts number of accesses to a view
An executable node, on first invocation, creates a profiling view; second invocation, rewrites based on the profile
Heuristics:
Avoid small views; avoid over-used root views; avoid repeating computation; avoid recursive views
a <- b+c +P
FastR 0.168 (speedup 8.5x)Java cup
0.00
0.25
0.50
0.75
1.00
bt1
bt2
bt3
pr1
pr2
fa1
fa2
fa3
fa4
fa5 fr1 fr2 kn1
kn2
kn3
kn4
ma1
ma2
ma3
ma4 nb1
nb2
nb3
nb4
nb5
pd1
rd1
rc1
rc2
rc3
sn1
sn2
sn3
sn4
sn5
sn6
sn7
Rel
ative
Exe
cutio
n Ti
me
(GN
UR−A
ST =
1)
FastR GNUR−BC
Relative Time of FastR and GNUR−BC over GNUR−AST
Figure 13. Shootout Relative Execution Times (lower is better). Geo. mean speedup for FastR is 8.5x and 1.8x for GNUR-BC.
0.00
0.25
0.50
0.75
1.00
mc1
mc2
mc3
mc4
mc5 mf1
mf2
mf3
mf4
mf5 pr1
pr2
pr3
pr4
pr5
Rel
ative
Exe
cutio
n Ti
me
(GN
UR−A
ST =
1)
FastR GNUR−BC
Relative Time of FastR and GNUR−BC over GNUR−AST
Figure 14. Benchmark 2.5 Relative Execution Times (lower is better). Geo. mean speedup FastR is 1.7x and 1.1x for GNUR-BC.
stored in the R Frame). Moreover, FastR’s return elision op-timization avoids executing the return statement completelyin the bt benchmarks.
Fusion. Our implementation of fusion of view trees intoJava byte-code, on average, provides no performance change
on the b25 and the shootout benchmarks (numbers shown ingraphs are without fusion). Lacking a realistic applicationthat would stress vector computation, we use a trivial micro-benchmark to validate the potential speed-up of fusion. Wemeasure the time to compute a sequence of commands x =
y + z * y + z - 2 * (8 + z); x[[1]] = 3 for primitive
Runtime information can be leveraged to create
simple, fast, easy to maintain interpreters for real languages
!
http://github.com/allr/fastr
Conclusion