some developments for the r engine - university of...

208
Some Developments for the R Engine Luke Tierney Department of Statistics & Actuarial Science University of Iowa November 10, 2011 Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 1 / 32

Upload: others

Post on 22-Sep-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Developments for the R Engine

Luke Tierney

Department of Statistics & Actuarial ScienceUniversity of Iowa

November 10, 2011

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 1 / 32

Page 2: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Introduction

R is a language for data analysis and graphics.

R is based on the S language developed by John Chambers and othersat Bell Labs.

R is widely used in the field of statistics and beyond, especially inuniversity environments.

R has become the primary framework for developing and makingavailable new statistical methodology.

Many (over 3,000) extension packages are available through CRAN orsimilar repositories.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 2 / 32

Page 3: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Introduction

R is a language for data analysis and graphics.

R is based on the S language developed by John Chambers and othersat Bell Labs.

R is widely used in the field of statistics and beyond, especially inuniversity environments.

R has become the primary framework for developing and makingavailable new statistical methodology.

Many (over 3,000) extension packages are available through CRAN orsimilar repositories.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 2 / 32

Page 4: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Introduction

R is a language for data analysis and graphics.

R is based on the S language developed by John Chambers and othersat Bell Labs.

R is widely used in the field of statistics and beyond, especially inuniversity environments.

R has become the primary framework for developing and makingavailable new statistical methodology.

Many (over 3,000) extension packages are available through CRAN orsimilar repositories.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 2 / 32

Page 5: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Introduction

R is a language for data analysis and graphics.

R is based on the S language developed by John Chambers and othersat Bell Labs.

R is widely used in the field of statistics and beyond, especially inuniversity environments.

R has become the primary framework for developing and makingavailable new statistical methodology.

Many (over 3,000) extension packages are available through CRAN orsimilar repositories.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 2 / 32

Page 6: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Introduction

R is a language for data analysis and graphics.

R is based on the S language developed by John Chambers and othersat Bell Labs.

R is widely used in the field of statistics and beyond, especially inuniversity environments.

R has become the primary framework for developing and makingavailable new statistical methodology.

Many (over 3,000) extension packages are available through CRAN orsimilar repositories.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 2 / 32

Page 7: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

History and Development Model

R is an Open Source project.

Originally developed by Robert Gentleman and Ross Ihaka in the early1990’s for a Macintosh computer lab at U. of Auckland, NZ.

Developed by the R-core group since mid 1997,

Douglas Bates John Chambers Peter DalgaardRobert Gentleman Seth Falcon Kurt HornikStefano Iacus Ross Ihaka Friedrich LeischUwe Ligges Thomas Lumley Martin MaechlerDuncan Murdoch Paul Murrell Martyn PlummerBrian Ripley Deepayan Sarkar Duncan Temple LangLuke Tierney Simon Urbanek

Strong community support through

contributed extension packagesmailing lists and blogscontributed documentation and task views

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 3 / 32

Page 8: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

History and Development Model

R is an Open Source project.

Originally developed by Robert Gentleman and Ross Ihaka in the early1990’s for a Macintosh computer lab at U. of Auckland, NZ.

Developed by the R-core group since mid 1997,

Douglas Bates John Chambers Peter DalgaardRobert Gentleman Seth Falcon Kurt HornikStefano Iacus Ross Ihaka Friedrich LeischUwe Ligges Thomas Lumley Martin MaechlerDuncan Murdoch Paul Murrell Martyn PlummerBrian Ripley Deepayan Sarkar Duncan Temple LangLuke Tierney Simon Urbanek

Strong community support through

contributed extension packagesmailing lists and blogscontributed documentation and task views

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 3 / 32

Page 9: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

History and Development Model

R is an Open Source project.

Originally developed by Robert Gentleman and Ross Ihaka in the early1990’s for a Macintosh computer lab at U. of Auckland, NZ.

Developed by the R-core group since mid 1997,

Douglas Bates John Chambers Peter DalgaardRobert Gentleman Seth Falcon Kurt HornikStefano Iacus Ross Ihaka Friedrich LeischUwe Ligges Thomas Lumley Martin MaechlerDuncan Murdoch Paul Murrell Martyn PlummerBrian Ripley Deepayan Sarkar Duncan Temple LangLuke Tierney Simon Urbanek

Strong community support through

contributed extension packagesmailing lists and blogscontributed documentation and task views

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 3 / 32

Page 10: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

History and Development Model

R is an Open Source project.

Originally developed by Robert Gentleman and Ross Ihaka in the early1990’s for a Macintosh computer lab at U. of Auckland, NZ.

Developed by the R-core group since mid 1997,

Douglas Bates John Chambers Peter DalgaardRobert Gentleman Seth Falcon Kurt HornikStefano Iacus Ross Ihaka Friedrich LeischUwe Ligges Thomas Lumley Martin MaechlerDuncan Murdoch Paul Murrell Martyn PlummerBrian Ripley Deepayan Sarkar Duncan Temple LangLuke Tierney Simon Urbanek

Strong community support through

contributed extension packagesmailing lists and blogscontributed documentation and task views

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 3 / 32

Page 11: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

History and Development Model

R is an Open Source project.

Originally developed by Robert Gentleman and Ross Ihaka in the early1990’s for a Macintosh computer lab at U. of Auckland, NZ.

Developed by the R-core group since mid 1997,

Douglas Bates John Chambers Peter DalgaardRobert Gentleman Seth Falcon Kurt HornikStefano Iacus Ross Ihaka Friedrich LeischUwe Ligges Thomas Lumley Martin MaechlerDuncan Murdoch Paul Murrell Martyn PlummerBrian Ripley Deepayan Sarkar Duncan Temple LangLuke Tierney Simon Urbanek

Strong community support through

contributed extension packagesmailing lists and blogscontributed documentation and task views

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 3 / 32

Page 12: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

History and Development Model

R is an Open Source project.

Originally developed by Robert Gentleman and Ross Ihaka in the early1990’s for a Macintosh computer lab at U. of Auckland, NZ.

Developed by the R-core group since mid 1997,

Douglas Bates John Chambers Peter DalgaardRobert Gentleman Seth Falcon Kurt HornikStefano Iacus Ross Ihaka Friedrich LeischUwe Ligges Thomas Lumley Martin MaechlerDuncan Murdoch Paul Murrell Martyn PlummerBrian Ripley Deepayan Sarkar Duncan Temple LangLuke Tierney Simon Urbanek

Strong community support through

contributed extension packagesmailing lists and blogscontributed documentation and task views

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 3 / 32

Page 13: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

History and Development Model

R is an Open Source project.

Originally developed by Robert Gentleman and Ross Ihaka in the early1990’s for a Macintosh computer lab at U. of Auckland, NZ.

Developed by the R-core group since mid 1997,

Douglas Bates John Chambers Peter DalgaardRobert Gentleman Seth Falcon Kurt HornikStefano Iacus Ross Ihaka Friedrich LeischUwe Ligges Thomas Lumley Martin MaechlerDuncan Murdoch Paul Murrell Martyn PlummerBrian Ripley Deepayan Sarkar Duncan Temple LangLuke Tierney Simon Urbanek

Strong community support through

contributed extension packagesmailing lists and blogscontributed documentation and task views

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 3 / 32

Page 14: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Introduction

This talk outlines some recent developments in the core R engine Ihave worked on and developments I expect to work on over the next12 to 18 months:

Byte code compilation of R code.Taking advantage of multiple cores for

basic vectorized operationssimple matrix operations.

Increasing the limit on the size of vector data objects.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 4 / 32

Page 15: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Introduction

This talk outlines some recent developments in the core R engine Ihave worked on and developments I expect to work on over the next12 to 18 months:

Byte code compilation of R code.Taking advantage of multiple cores for

basic vectorized operationssimple matrix operations.

Increasing the limit on the size of vector data objects.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 4 / 32

Page 16: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Introduction

This talk outlines some recent developments in the core R engine Ihave worked on and developments I expect to work on over the next12 to 18 months:

Byte code compilation of R code.Taking advantage of multiple cores for

basic vectorized operationssimple matrix operations.

Increasing the limit on the size of vector data objects.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 4 / 32

Page 17: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Introduction

This talk outlines some recent developments in the core R engine Ihave worked on and developments I expect to work on over the next12 to 18 months:

Byte code compilation of R code.Taking advantage of multiple cores for

basic vectorized operationssimple matrix operations.

Increasing the limit on the size of vector data objects.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 4 / 32

Page 18: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Introduction

This talk outlines some recent developments in the core R engine Ihave worked on and developments I expect to work on over the next12 to 18 months:

Byte code compilation of R code.Taking advantage of multiple cores for

basic vectorized operationssimple matrix operations.

Increasing the limit on the size of vector data objects.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 4 / 32

Page 19: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Introduction

This talk outlines some recent developments in the core R engine Ihave worked on and developments I expect to work on over the next12 to 18 months:

Byte code compilation of R code.Taking advantage of multiple cores for

basic vectorized operationssimple matrix operations.

Increasing the limit on the size of vector data objects.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 4 / 32

Page 20: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Byte Code Compilation

The standard R evaluation mechanism

parses code into a parse tree when the code is readevaluates code by interpreting the parse trees.

Most low level languages (e.g. C, FORTRAN) compile their sourcecode to native machine code.

Some intermediate level languages (e.g. Java, C#) and manyscripting languages (e.g. Perl, Python) compile to a simpler languagecalled byte code.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 5 / 32

Page 21: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Byte Code Compilation

The standard R evaluation mechanism

parses code into a parse tree when the code is readevaluates code by interpreting the parse trees.

Most low level languages (e.g. C, FORTRAN) compile their sourcecode to native machine code.

Some intermediate level languages (e.g. Java, C#) and manyscripting languages (e.g. Perl, Python) compile to a simpler languagecalled byte code.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 5 / 32

Page 22: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Byte Code Compilation

The standard R evaluation mechanism

parses code into a parse tree when the code is readevaluates code by interpreting the parse trees.

Most low level languages (e.g. C, FORTRAN) compile their sourcecode to native machine code.

Some intermediate level languages (e.g. Java, C#) and manyscripting languages (e.g. Perl, Python) compile to a simpler languagecalled byte code.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 5 / 32

Page 23: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Byte Code Compilation

The standard R evaluation mechanism

parses code into a parse tree when the code is readevaluates code by interpreting the parse trees.

Most low level languages (e.g. C, FORTRAN) compile their sourcecode to native machine code.

Some intermediate level languages (e.g. Java, C#) and manyscripting languages (e.g. Perl, Python) compile to a simpler languagecalled byte code.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 5 / 32

Page 24: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Byte Code Compilation

The standard R evaluation mechanism

parses code into a parse tree when the code is readevaluates code by interpreting the parse trees.

Most low level languages (e.g. C, FORTRAN) compile their sourcecode to native machine code.

Some intermediate level languages (e.g. Java, C#) and manyscripting languages (e.g. Perl, Python) compile to a simpler languagecalled byte code.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 5 / 32

Page 25: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Byte Code Compilation

Byte code is the machine code for a virtual machine.

Virtual machine code can then be interpreted by a simpler, moreefficient interpreter.

Virtual machines, and their machine code, are usually specific to thelanguages they are designed to support.

Various strategies for further compiling byte code to native machinecode are also sometimes used.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 6 / 32

Page 26: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Byte Code Compilation

Byte code is the machine code for a virtual machine.

Virtual machine code can then be interpreted by a simpler, moreefficient interpreter.

Virtual machines, and their machine code, are usually specific to thelanguages they are designed to support.

Various strategies for further compiling byte code to native machinecode are also sometimes used.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 6 / 32

Page 27: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Byte Code Compilation

Byte code is the machine code for a virtual machine.

Virtual machine code can then be interpreted by a simpler, moreefficient interpreter.

Virtual machines, and their machine code, are usually specific to thelanguages they are designed to support.

Various strategies for further compiling byte code to native machinecode are also sometimes used.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 6 / 32

Page 28: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Byte Code Compilation

Byte code is the machine code for a virtual machine.

Virtual machine code can then be interpreted by a simpler, moreefficient interpreter.

Virtual machines, and their machine code, are usually specific to thelanguages they are designed to support.

Various strategies for further compiling byte code to native machinecode are also sometimes used.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 6 / 32

Page 29: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Byte Code Compilation

Efforts to add byte code compilation to R have been underway forsome time.

The first release of the compiler occurred with R 2.13.0 this spring.

Some improvements in the virtual machine interpreter were releasedwith R 2.14.0 this fall.

The current compiler and virtual machine produce goodimprovements in a number of cases.

Better results should be possible with some improvements to thevirtual machine and are currently being explored.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 7 / 32

Page 30: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Byte Code Compilation

Efforts to add byte code compilation to R have been underway forsome time.

The first release of the compiler occurred with R 2.13.0 this spring.

Some improvements in the virtual machine interpreter were releasedwith R 2.14.0 this fall.

The current compiler and virtual machine produce goodimprovements in a number of cases.

Better results should be possible with some improvements to thevirtual machine and are currently being explored.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 7 / 32

Page 31: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Byte Code Compilation

Efforts to add byte code compilation to R have been underway forsome time.

The first release of the compiler occurred with R 2.13.0 this spring.

Some improvements in the virtual machine interpreter were releasedwith R 2.14.0 this fall.

The current compiler and virtual machine produce goodimprovements in a number of cases.

Better results should be possible with some improvements to thevirtual machine and are currently being explored.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 7 / 32

Page 32: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Byte Code Compilation

Efforts to add byte code compilation to R have been underway forsome time.

The first release of the compiler occurred with R 2.13.0 this spring.

Some improvements in the virtual machine interpreter were releasedwith R 2.14.0 this fall.

The current compiler and virtual machine produce goodimprovements in a number of cases.

Better results should be possible with some improvements to thevirtual machine and are currently being explored.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 7 / 32

Page 33: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Byte Code Compilation

Efforts to add byte code compilation to R have been underway forsome time.

The first release of the compiler occurred with R 2.13.0 this spring.

Some improvements in the virtual machine interpreter were releasedwith R 2.14.0 this fall.

The current compiler and virtual machine produce goodimprovements in a number of cases.

Better results should be possible with some improvements to thevirtual machine and are currently being explored.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 7 / 32

Page 34: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Compiler Operation

The compiler can be called explicitly to compile single functions orfiles of code:

cmpfun compiles a functioncmpfile compiles a file to be loaded by loadcmp

It is also possible to have package code compiled when a package isinstalled.

Use --byte-compile when installing, or specify the ByteCompile optionin the DESCRIPTION file.R 2.14.0 by default compiles R code in all base and recommendedpackages.

Alternatively, the compiler can be used in a JIT mode where

functions are compiled on first useloops are compiler before they are run

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 8 / 32

Page 35: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Compiler Operation

The compiler can be called explicitly to compile single functions orfiles of code:

cmpfun compiles a functioncmpfile compiles a file to be loaded by loadcmp

It is also possible to have package code compiled when a package isinstalled.

Use --byte-compile when installing, or specify the ByteCompile optionin the DESCRIPTION file.R 2.14.0 by default compiles R code in all base and recommendedpackages.

Alternatively, the compiler can be used in a JIT mode where

functions are compiled on first useloops are compiler before they are run

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 8 / 32

Page 36: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Compiler Operation

The compiler can be called explicitly to compile single functions orfiles of code:

cmpfun compiles a functioncmpfile compiles a file to be loaded by loadcmp

It is also possible to have package code compiled when a package isinstalled.

Use --byte-compile when installing, or specify the ByteCompile optionin the DESCRIPTION file.R 2.14.0 by default compiles R code in all base and recommendedpackages.

Alternatively, the compiler can be used in a JIT mode where

functions are compiled on first useloops are compiler before they are run

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 8 / 32

Page 37: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Compiler Operation

The compiler can be called explicitly to compile single functions orfiles of code:

cmpfun compiles a functioncmpfile compiles a file to be loaded by loadcmp

It is also possible to have package code compiled when a package isinstalled.

Use --byte-compile when installing, or specify the ByteCompile optionin the DESCRIPTION file.R 2.14.0 by default compiles R code in all base and recommendedpackages.

Alternatively, the compiler can be used in a JIT mode where

functions are compiled on first useloops are compiler before they are run

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 8 / 32

Page 38: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Compiler Operation

The compiler can be called explicitly to compile single functions orfiles of code:

cmpfun compiles a functioncmpfile compiles a file to be loaded by loadcmp

It is also possible to have package code compiled when a package isinstalled.

Use --byte-compile when installing, or specify the ByteCompile optionin the DESCRIPTION file.R 2.14.0 by default compiles R code in all base and recommendedpackages.

Alternatively, the compiler can be used in a JIT mode where

functions are compiled on first useloops are compiler before they are run

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 8 / 32

Page 39: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Compiler Operation

The compiler can be called explicitly to compile single functions orfiles of code:

cmpfun compiles a functioncmpfile compiles a file to be loaded by loadcmp

It is also possible to have package code compiled when a package isinstalled.

Use --byte-compile when installing, or specify the ByteCompile optionin the DESCRIPTION file.R 2.14.0 by default compiles R code in all base and recommendedpackages.

Alternatively, the compiler can be used in a JIT mode where

functions are compiled on first useloops are compiler before they are run

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 8 / 32

Page 40: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Compiler Operation

The compiler can be called explicitly to compile single functions orfiles of code:

cmpfun compiles a functioncmpfile compiles a file to be loaded by loadcmp

It is also possible to have package code compiled when a package isinstalled.

Use --byte-compile when installing, or specify the ByteCompile optionin the DESCRIPTION file.R 2.14.0 by default compiles R code in all base and recommendedpackages.

Alternatively, the compiler can be used in a JIT mode where

functions are compiled on first useloops are compiler before they are run

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 8 / 32

Page 41: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Compiler Operation

The compiler can be called explicitly to compile single functions orfiles of code:

cmpfun compiles a functioncmpfile compiles a file to be loaded by loadcmp

It is also possible to have package code compiled when a package isinstalled.

Use --byte-compile when installing, or specify the ByteCompile optionin the DESCRIPTION file.R 2.14.0 by default compiles R code in all base and recommendedpackages.

Alternatively, the compiler can be used in a JIT mode where

functions are compiled on first useloops are compiler before they are run

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 8 / 32

Page 42: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Compiler Operation

The compiler can be called explicitly to compile single functions orfiles of code:

cmpfun compiles a functioncmpfile compiles a file to be loaded by loadcmp

It is also possible to have package code compiled when a package isinstalled.

Use --byte-compile when installing, or specify the ByteCompile optionin the DESCRIPTION file.R 2.14.0 by default compiles R code in all base and recommendedpackages.

Alternatively, the compiler can be used in a JIT mode where

functions are compiled on first useloops are compiler before they are run

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 8 / 32

Page 43: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Compiler Operation

The current compiler includes a number of optimizations, such as

constant foldingspecial instructions for most SPECIALs, many BUILTINsinlining simple .Internal calls: e.g.

dnorm(y, 2, 3)

is replaced by

.Internal(dnorm(y, mean = 2, sd = 3, log = FALSE))

special instructions for many .Internals

The compiler is currently most effective for code used on scalar dataor short vectors where interpreter overhead is large relative to actualcomputation.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 9 / 32

Page 44: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Compiler Operation

The current compiler includes a number of optimizations, such as

constant foldingspecial instructions for most SPECIALs, many BUILTINsinlining simple .Internal calls: e.g.

dnorm(y, 2, 3)

is replaced by

.Internal(dnorm(y, mean = 2, sd = 3, log = FALSE))

special instructions for many .Internals

The compiler is currently most effective for code used on scalar dataor short vectors where interpreter overhead is large relative to actualcomputation.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 9 / 32

Page 45: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Compiler Operation

The current compiler includes a number of optimizations, such as

constant foldingspecial instructions for most SPECIALs, many BUILTINsinlining simple .Internal calls: e.g.

dnorm(y, 2, 3)

is replaced by

.Internal(dnorm(y, mean = 2, sd = 3, log = FALSE))

special instructions for many .Internals

The compiler is currently most effective for code used on scalar dataor short vectors where interpreter overhead is large relative to actualcomputation.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 9 / 32

Page 46: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Compiler Operation

The current compiler includes a number of optimizations, such as

constant foldingspecial instructions for most SPECIALs, many BUILTINsinlining simple .Internal calls: e.g.

dnorm(y, 2, 3)

is replaced by

.Internal(dnorm(y, mean = 2, sd = 3, log = FALSE))

special instructions for many .Internals

The compiler is currently most effective for code used on scalar dataor short vectors where interpreter overhead is large relative to actualcomputation.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 9 / 32

Page 47: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Compiler Operation

The current compiler includes a number of optimizations, such as

constant foldingspecial instructions for most SPECIALs, many BUILTINsinlining simple .Internal calls: e.g.

dnorm(y, 2, 3)

is replaced by

.Internal(dnorm(y, mean = 2, sd = 3, log = FALSE))

special instructions for many .Internals

The compiler is currently most effective for code used on scalar dataor short vectors where interpreter overhead is large relative to actualcomputation.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 9 / 32

Page 48: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Compiler Operation

The current compiler includes a number of optimizations, such as

constant foldingspecial instructions for most SPECIALs, many BUILTINsinlining simple .Internal calls: e.g.

dnorm(y, 2, 3)

is replaced by

.Internal(dnorm(y, mean = 2, sd = 3, log = FALSE))

special instructions for many .Internals

The compiler is currently most effective for code used on scalar dataor short vectors where interpreter overhead is large relative to actualcomputation.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 9 / 32

Page 49: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

A Simple Example

R Code

f <- function(x) {s <- 0.0for (y in x)

s <- s + ys

}

VM Assembly Code

LDCONST 0.0SETVAR sPOPGETVAR xSTARTFOR y L2

L1: GETVAR sGETVAR yADDSETVAR sPOPSTEPFOR L1

L2: ENDFORPOPGETVAR sRETURN

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 10 / 32

Page 50: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

A Simple Example

R Code

f <- function(x) {s <- 0.0for (y in x)

s <- s + ys

}

VM Assembly Code

LDCONST 0.0SETVAR sPOPGETVAR xSTARTFOR y L2

L1: GETVAR sGETVAR yADDSETVAR sPOPSTEPFOR L1

L2: ENDFORPOPGETVAR sRETURN

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 10 / 32

Page 51: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Performance Results

Timings for some simple benchmarks on an x86_64 Ubuntu laptop:

Benchmark Interp. Comp. Speedup Exper. Speedup

p1 32.19 7.98 4.0 1.47 21.9sum 6.72 1.86 3.6 0.59 11.4conv 14.48 4.30 3.4 0.81 17.9rem 56.82 23.68 2.4 4.77 11.9

Interp., Comp. are for the development version of R

includes some variable lookup improvements for compiled code

Exper.: experimental version using

separate instructions for vector, matrix indexing

typed stack to avoid allocating intermediate scalar values

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 11 / 32

Page 52: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Performance Results

Timings for some simple benchmarks on an x86_64 Ubuntu laptop:

Benchmark Interp. Comp. Speedup Exper. Speedup

p1 32.19 7.98 4.0 1.47 21.9sum 6.72 1.86 3.6 0.59 11.4conv 14.48 4.30 3.4 0.81 17.9rem 56.82 23.68 2.4 4.77 11.9

Interp., Comp. are for the development version of R

includes some variable lookup improvements for compiled code

Exper.: experimental version using

separate instructions for vector, matrix indexing

typed stack to avoid allocating intermediate scalar values

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 11 / 32

Page 53: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Performance Results

Timings for some simple benchmarks on an x86_64 Ubuntu laptop:

Benchmark Interp. Comp. Speedup Exper. Speedup

p1 32.19 7.98 4.0 1.47 21.9sum 6.72 1.86 3.6 0.59 11.4conv 14.48 4.30 3.4 0.81 17.9rem 56.82 23.68 2.4 4.77 11.9

Interp., Comp. are for the development version of R

includes some variable lookup improvements for compiled code

Exper.: experimental version using

separate instructions for vector, matrix indexing

typed stack to avoid allocating intermediate scalar values

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 11 / 32

Page 54: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Future Directions

The current virtual machine uses a stack based design.

An alternative approach might use a register-based design.

Some additional optimizations currently being explored:

avoiding the allocation of intermediate values when possiblemore efficient variable lookup mechanismsmore efficient function callspossibly improved handling of lazy evaluation

Some promising preliminary results are available.

Other possible directions include

Partial evaluation when some arguments are constantsIntra-procedural optimizations and inliningDeclarations (sealing, scalars, types, strictness)Machine code generation using LLVM or other approaches

Collaborations with computer scientists are being explored.

Maintainability is a key concern.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 12 / 32

Page 55: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Future Directions

The current virtual machine uses a stack based design.

An alternative approach might use a register-based design.

Some additional optimizations currently being explored:

avoiding the allocation of intermediate values when possiblemore efficient variable lookup mechanismsmore efficient function callspossibly improved handling of lazy evaluation

Some promising preliminary results are available.

Other possible directions include

Partial evaluation when some arguments are constantsIntra-procedural optimizations and inliningDeclarations (sealing, scalars, types, strictness)Machine code generation using LLVM or other approaches

Collaborations with computer scientists are being explored.

Maintainability is a key concern.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 12 / 32

Page 56: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Future Directions

The current virtual machine uses a stack based design.

An alternative approach might use a register-based design.

Some additional optimizations currently being explored:

avoiding the allocation of intermediate values when possiblemore efficient variable lookup mechanismsmore efficient function callspossibly improved handling of lazy evaluation

Some promising preliminary results are available.

Other possible directions include

Partial evaluation when some arguments are constantsIntra-procedural optimizations and inliningDeclarations (sealing, scalars, types, strictness)Machine code generation using LLVM or other approaches

Collaborations with computer scientists are being explored.

Maintainability is a key concern.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 12 / 32

Page 57: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Future Directions

The current virtual machine uses a stack based design.

An alternative approach might use a register-based design.

Some additional optimizations currently being explored:

avoiding the allocation of intermediate values when possiblemore efficient variable lookup mechanismsmore efficient function callspossibly improved handling of lazy evaluation

Some promising preliminary results are available.

Other possible directions include

Partial evaluation when some arguments are constantsIntra-procedural optimizations and inliningDeclarations (sealing, scalars, types, strictness)Machine code generation using LLVM or other approaches

Collaborations with computer scientists are being explored.

Maintainability is a key concern.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 12 / 32

Page 58: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Future Directions

The current virtual machine uses a stack based design.

An alternative approach might use a register-based design.

Some additional optimizations currently being explored:

avoiding the allocation of intermediate values when possiblemore efficient variable lookup mechanismsmore efficient function callspossibly improved handling of lazy evaluation

Some promising preliminary results are available.

Other possible directions include

Partial evaluation when some arguments are constantsIntra-procedural optimizations and inliningDeclarations (sealing, scalars, types, strictness)Machine code generation using LLVM or other approaches

Collaborations with computer scientists are being explored.

Maintainability is a key concern.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 12 / 32

Page 59: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Future Directions

The current virtual machine uses a stack based design.

An alternative approach might use a register-based design.

Some additional optimizations currently being explored:

avoiding the allocation of intermediate values when possiblemore efficient variable lookup mechanismsmore efficient function callspossibly improved handling of lazy evaluation

Some promising preliminary results are available.

Other possible directions include

Partial evaluation when some arguments are constantsIntra-procedural optimizations and inliningDeclarations (sealing, scalars, types, strictness)Machine code generation using LLVM or other approaches

Collaborations with computer scientists are being explored.

Maintainability is a key concern.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 12 / 32

Page 60: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Future Directions

The current virtual machine uses a stack based design.

An alternative approach might use a register-based design.

Some additional optimizations currently being explored:

avoiding the allocation of intermediate values when possiblemore efficient variable lookup mechanismsmore efficient function callspossibly improved handling of lazy evaluation

Some promising preliminary results are available.

Other possible directions include

Partial evaluation when some arguments are constantsIntra-procedural optimizations and inliningDeclarations (sealing, scalars, types, strictness)Machine code generation using LLVM or other approaches

Collaborations with computer scientists are being explored.

Maintainability is a key concern.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 12 / 32

Page 61: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Future Directions

The current virtual machine uses a stack based design.

An alternative approach might use a register-based design.

Some additional optimizations currently being explored:

avoiding the allocation of intermediate values when possiblemore efficient variable lookup mechanismsmore efficient function callspossibly improved handling of lazy evaluation

Some promising preliminary results are available.

Other possible directions include

Partial evaluation when some arguments are constantsIntra-procedural optimizations and inliningDeclarations (sealing, scalars, types, strictness)Machine code generation using LLVM or other approaches

Collaborations with computer scientists are being explored.

Maintainability is a key concern.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 12 / 32

Page 62: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Future Directions

The current virtual machine uses a stack based design.

An alternative approach might use a register-based design.

Some additional optimizations currently being explored:

avoiding the allocation of intermediate values when possiblemore efficient variable lookup mechanismsmore efficient function callspossibly improved handling of lazy evaluation

Some promising preliminary results are available.

Other possible directions include

Partial evaluation when some arguments are constantsIntra-procedural optimizations and inliningDeclarations (sealing, scalars, types, strictness)Machine code generation using LLVM or other approaches

Collaborations with computer scientists are being explored.

Maintainability is a key concern.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 12 / 32

Page 63: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Future Directions

The current virtual machine uses a stack based design.

An alternative approach might use a register-based design.

Some additional optimizations currently being explored:

avoiding the allocation of intermediate values when possiblemore efficient variable lookup mechanismsmore efficient function callspossibly improved handling of lazy evaluation

Some promising preliminary results are available.

Other possible directions include

Partial evaluation when some arguments are constantsIntra-procedural optimizations and inliningDeclarations (sealing, scalars, types, strictness)Machine code generation using LLVM or other approaches

Collaborations with computer scientists are being explored.

Maintainability is a key concern.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 12 / 32

Page 64: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Future Directions

The current virtual machine uses a stack based design.

An alternative approach might use a register-based design.

Some additional optimizations currently being explored:

avoiding the allocation of intermediate values when possiblemore efficient variable lookup mechanismsmore efficient function callspossibly improved handling of lazy evaluation

Some promising preliminary results are available.

Other possible directions include

Partial evaluation when some arguments are constantsIntra-procedural optimizations and inliningDeclarations (sealing, scalars, types, strictness)Machine code generation using LLVM or other approaches

Collaborations with computer scientists are being explored.

Maintainability is a key concern.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 12 / 32

Page 65: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Future Directions

The current virtual machine uses a stack based design.

An alternative approach might use a register-based design.

Some additional optimizations currently being explored:

avoiding the allocation of intermediate values when possiblemore efficient variable lookup mechanismsmore efficient function callspossibly improved handling of lazy evaluation

Some promising preliminary results are available.

Other possible directions include

Partial evaluation when some arguments are constantsIntra-procedural optimizations and inliningDeclarations (sealing, scalars, types, strictness)Machine code generation using LLVM or other approaches

Collaborations with computer scientists are being explored.

Maintainability is a key concern.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 12 / 32

Page 66: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Future Directions

The current virtual machine uses a stack based design.

An alternative approach might use a register-based design.

Some additional optimizations currently being explored:

avoiding the allocation of intermediate values when possiblemore efficient variable lookup mechanismsmore efficient function callspossibly improved handling of lazy evaluation

Some promising preliminary results are available.

Other possible directions include

Partial evaluation when some arguments are constantsIntra-procedural optimizations and inliningDeclarations (sealing, scalars, types, strictness)Machine code generation using LLVM or other approaches

Collaborations with computer scientists are being explored.

Maintainability is a key concern.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 12 / 32

Page 67: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Future Directions

The current virtual machine uses a stack based design.

An alternative approach might use a register-based design.

Some additional optimizations currently being explored:

avoiding the allocation of intermediate values when possiblemore efficient variable lookup mechanismsmore efficient function callspossibly improved handling of lazy evaluation

Some promising preliminary results are available.

Other possible directions include

Partial evaluation when some arguments are constantsIntra-procedural optimizations and inliningDeclarations (sealing, scalars, types, strictness)Machine code generation using LLVM or other approaches

Collaborations with computer scientists are being explored.

Maintainability is a key concern.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 12 / 32

Page 68: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Future Directions

The current virtual machine uses a stack based design.

An alternative approach might use a register-based design.

Some additional optimizations currently being explored:

avoiding the allocation of intermediate values when possiblemore efficient variable lookup mechanismsmore efficient function callspossibly improved handling of lazy evaluation

Some promising preliminary results are available.

Other possible directions include

Partial evaluation when some arguments are constantsIntra-procedural optimizations and inliningDeclarations (sealing, scalars, types, strictness)Machine code generation using LLVM or other approaches

Collaborations with computer scientists are being explored.

Maintainability is a key concern.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 12 / 32

Page 69: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vector and Matrix Operations

Most modern computers feature two or more processor cores.

It is expected that tens of cores will soon be common.

A common question:

How can I make R use more than one core for my computation?

There are many easy answers.

But this is the wrong question.

The right question:

How can we take advantage of having more than one core to get ourcomputations to run faster?

This is harder to answer.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 13 / 32

Page 70: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vector and Matrix Operations

Most modern computers feature two or more processor cores.

It is expected that tens of cores will soon be common.

A common question:

How can I make R use more than one core for my computation?

There are many easy answers.

But this is the wrong question.

The right question:

How can we take advantage of having more than one core to get ourcomputations to run faster?

This is harder to answer.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 13 / 32

Page 71: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vector and Matrix Operations

Most modern computers feature two or more processor cores.

It is expected that tens of cores will soon be common.

A common question:

How can I make R use more than one core for my computation?

There are many easy answers.

But this is the wrong question.

The right question:

How can we take advantage of having more than one core to get ourcomputations to run faster?

This is harder to answer.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 13 / 32

Page 72: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vector and Matrix Operations

Most modern computers feature two or more processor cores.

It is expected that tens of cores will soon be common.

A common question:

How can I make R use more than one core for my computation?

There are many easy answers.

But this is the wrong question.

The right question:

How can we take advantage of having more than one core to get ourcomputations to run faster?

This is harder to answer.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 13 / 32

Page 73: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vector and Matrix Operations

Most modern computers feature two or more processor cores.

It is expected that tens of cores will soon be common.

A common question:

How can I make R use more than one core for my computation?

There are many easy answers.

But this is the wrong question.

The right question:

How can we take advantage of having more than one core to get ourcomputations to run faster?

This is harder to answer.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 13 / 32

Page 74: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vector and Matrix Operations

Most modern computers feature two or more processor cores.

It is expected that tens of cores will soon be common.

A common question:

How can I make R use more than one core for my computation?

There are many easy answers.

But this is the wrong question.

The right question:

How can we take advantage of having more than one core to get ourcomputations to run faster?

This is harder to answer.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 13 / 32

Page 75: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vector and Matrix Operations

Most modern computers feature two or more processor cores.

It is expected that tens of cores will soon be common.

A common question:

How can I make R use more than one core for my computation?

There are many easy answers.

But this is the wrong question.

The right question:

How can we take advantage of having more than one core to get ourcomputations to run faster?

This is harder to answer.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 13 / 32

Page 76: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vector and Matrix Operations

Most modern computers feature two or more processor cores.

It is expected that tens of cores will soon be common.

A common question:

How can I make R use more than one core for my computation?

There are many easy answers.

But this is the wrong question.

The right question:

How can we take advantage of having more than one core to get ourcomputations to run faster?

This is harder to answer.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 13 / 32

Page 77: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vector and Matrix Operations

Most modern computers feature two or more processor cores.

It is expected that tens of cores will soon be common.

A common question:

How can I make R use more than one core for my computation?

There are many easy answers.

But this is the wrong question.

The right question:

How can we take advantage of having more than one core to get ourcomputations to run faster?

This is harder to answer.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 13 / 32

Page 78: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Approaches to Parallel Computing

Two possible approaches:Explicit parallelization:

uses some form of annotation to specify parallelismpackages snow, multicore, parallel.

Implicit parallelization:

automatic, no user action needed

I will focus on implicit parallelization of

basic vectorized math functionsbasic matrix operations (e.g. colSums)BLAS

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 14 / 32

Page 79: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Approaches to Parallel Computing

Two possible approaches:Explicit parallelization:

uses some form of annotation to specify parallelismpackages snow, multicore, parallel.

Implicit parallelization:

automatic, no user action needed

I will focus on implicit parallelization of

basic vectorized math functionsbasic matrix operations (e.g. colSums)BLAS

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 14 / 32

Page 80: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Approaches to Parallel Computing

Two possible approaches:Explicit parallelization:

uses some form of annotation to specify parallelismpackages snow, multicore, parallel.

Implicit parallelization:

automatic, no user action needed

I will focus on implicit parallelization of

basic vectorized math functionsbasic matrix operations (e.g. colSums)BLAS

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 14 / 32

Page 81: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Approaches to Parallel Computing

Two possible approaches:Explicit parallelization:

uses some form of annotation to specify parallelismpackages snow, multicore, parallel.

Implicit parallelization:

automatic, no user action needed

I will focus on implicit parallelization of

basic vectorized math functionsbasic matrix operations (e.g. colSums)BLAS

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 14 / 32

Page 82: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Approaches to Parallel Computing

Two possible approaches:Explicit parallelization:

uses some form of annotation to specify parallelismpackages snow, multicore, parallel.

Implicit parallelization:

automatic, no user action needed

I will focus on implicit parallelization of

basic vectorized math functionsbasic matrix operations (e.g. colSums)BLAS

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 14 / 32

Page 83: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Approaches to Parallel Computing

Two possible approaches:Explicit parallelization:

uses some form of annotation to specify parallelismpackages snow, multicore, parallel.

Implicit parallelization:

automatic, no user action needed

I will focus on implicit parallelization of

basic vectorized math functionsbasic matrix operations (e.g. colSums)BLAS

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 14 / 32

Page 84: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Approaches to Parallel Computing

Two possible approaches:Explicit parallelization:

uses some form of annotation to specify parallelismpackages snow, multicore, parallel.

Implicit parallelization:

automatic, no user action needed

I will focus on implicit parallelization of

basic vectorized math functionsbasic matrix operations (e.g. colSums)BLAS

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 14 / 32

Page 85: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Approaches to Parallel Computing

Two possible approaches:Explicit parallelization:

uses some form of annotation to specify parallelismpackages snow, multicore, parallel.

Implicit parallelization:

automatic, no user action needed

I will focus on implicit parallelization of

basic vectorized math functionsbasic matrix operations (e.g. colSums)BLAS

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 14 / 32

Page 86: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Approaches to Parallel Computing

Two possible approaches:Explicit parallelization:

uses some form of annotation to specify parallelismpackages snow, multicore, parallel.

Implicit parallelization:

automatic, no user action needed

I will focus on implicit parallelization of

basic vectorized math functionsbasic matrix operations (e.g. colSums)BLAS

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 14 / 32

Page 87: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Approaches to Parallel Computing

Two possible approaches:Explicit parallelization:

uses some form of annotation to specify parallelismpackages snow, multicore, parallel.

Implicit parallelization:

automatic, no user action needed

I will focus on implicit parallelization of

basic vectorized math functionsbasic matrix operations (e.g. colSums)BLAS

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 14 / 32

Page 88: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsAn Idealized View

Basic idea for computing f(x[1:n]) on a two-processor system:

Run two worker threads.Place half the computation on each thread.

Ideally this would produce a two-fold speed up.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 15 / 32

Page 89: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsAn Idealized View

Basic idea for computing f(x[1:n]) on a two-processor system:

Run two worker threads.Place half the computation on each thread.

Ideally this would produce a two-fold speed up.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 15 / 32

Page 90: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsAn Idealized View

Basic idea for computing f(x[1:n]) on a two-processor system:

Run two worker threads.Place half the computation on each thread.

Ideally this would produce a two-fold speed up.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 15 / 32

Page 91: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsAn Idealized View

Basic idea for computing f(x[1:n]) on a two-processor system:

Run two worker threads.Place half the computation on each thread.

Ideally this would produce a two-fold speed up.

Parallel

Sequential n

n/2

n/2

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 15 / 32

Page 92: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsA More Realistic View

Reality is a bit different:

Parallel

Sequential n

n/2

n/2

There is

synchronization overheadsequential code and use of shared resources (memory, bus, . . . )uneven workload

Parallelizing will only pay off if n is large enough.

For some functions, e.g. qbeta, n ≈ 10 may be large enough.For some, e.g. qnorm, n ≈ 1000 is needed.For basic arithmetic operations n ≈ 30000 may be needed.

Careful tuning to ensure improvement will be needed.

Some aspects will depend on architecture and OS.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 16 / 32

Page 93: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsA More Realistic View

Reality is a bit different:

Parallel

Sequential n

n/2

n/2

There is

synchronization overheadsequential code and use of shared resources (memory, bus, . . . )uneven workload

Parallelizing will only pay off if n is large enough.

For some functions, e.g. qbeta, n ≈ 10 may be large enough.For some, e.g. qnorm, n ≈ 1000 is needed.For basic arithmetic operations n ≈ 30000 may be needed.

Careful tuning to ensure improvement will be needed.

Some aspects will depend on architecture and OS.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 16 / 32

Page 94: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsA More Realistic View

Reality is a bit different:

Parallel

Sequential n

n/2

n/2

There is

synchronization overheadsequential code and use of shared resources (memory, bus, . . . )uneven workload

Parallelizing will only pay off if n is large enough.

For some functions, e.g. qbeta, n ≈ 10 may be large enough.For some, e.g. qnorm, n ≈ 1000 is needed.For basic arithmetic operations n ≈ 30000 may be needed.

Careful tuning to ensure improvement will be needed.

Some aspects will depend on architecture and OS.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 16 / 32

Page 95: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsA More Realistic View

Reality is a bit different:

Parallel

Sequential n

n/2

n/2

There is

synchronization overheadsequential code and use of shared resources (memory, bus, . . . )uneven workload

Parallelizing will only pay off if n is large enough.

For some functions, e.g. qbeta, n ≈ 10 may be large enough.For some, e.g. qnorm, n ≈ 1000 is needed.For basic arithmetic operations n ≈ 30000 may be needed.

Careful tuning to ensure improvement will be needed.

Some aspects will depend on architecture and OS.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 16 / 32

Page 96: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsA More Realistic View

Reality is a bit different:

Parallel

Sequential n

n/2

n/2

There is

synchronization overheadsequential code and use of shared resources (memory, bus, . . . )uneven workload

Parallelizing will only pay off if n is large enough.

For some functions, e.g. qbeta, n ≈ 10 may be large enough.For some, e.g. qnorm, n ≈ 1000 is needed.For basic arithmetic operations n ≈ 30000 may be needed.

Careful tuning to ensure improvement will be needed.

Some aspects will depend on architecture and OS.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 16 / 32

Page 97: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsA More Realistic View

Reality is a bit different:

Parallel

Sequential n

n/2

n/2

There is

synchronization overheadsequential code and use of shared resources (memory, bus, . . . )uneven workload

Parallelizing will only pay off if n is large enough.

For some functions, e.g. qbeta, n ≈ 10 may be large enough.For some, e.g. qnorm, n ≈ 1000 is needed.For basic arithmetic operations n ≈ 30000 may be needed.

Careful tuning to ensure improvement will be needed.

Some aspects will depend on architecture and OS.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 16 / 32

Page 98: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsA More Realistic View

Reality is a bit different:

Parallel

Sequential n

n/2

n/2

There is

synchronization overheadsequential code and use of shared resources (memory, bus, . . . )uneven workload

Parallelizing will only pay off if n is large enough.

For some functions, e.g. qbeta, n ≈ 10 may be large enough.For some, e.g. qnorm, n ≈ 1000 is needed.For basic arithmetic operations n ≈ 30000 may be needed.

Careful tuning to ensure improvement will be needed.

Some aspects will depend on architecture and OS.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 16 / 32

Page 99: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsA More Realistic View

Reality is a bit different:

Parallel

Sequential n

n/2

n/2

There is

synchronization overheadsequential code and use of shared resources (memory, bus, . . . )uneven workload

Parallelizing will only pay off if n is large enough.

For some functions, e.g. qbeta, n ≈ 10 may be large enough.For some, e.g. qnorm, n ≈ 1000 is needed.For basic arithmetic operations n ≈ 30000 may be needed.

Careful tuning to ensure improvement will be needed.

Some aspects will depend on architecture and OS.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 16 / 32

Page 100: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsA More Realistic View

Reality is a bit different:

Parallel

Sequential n

n/2

n/2

There is

synchronization overheadsequential code and use of shared resources (memory, bus, . . . )uneven workload

Parallelizing will only pay off if n is large enough.

For some functions, e.g. qbeta, n ≈ 10 may be large enough.For some, e.g. qnorm, n ≈ 1000 is needed.For basic arithmetic operations n ≈ 30000 may be needed.

Careful tuning to ensure improvement will be needed.

Some aspects will depend on architecture and OS.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 16 / 32

Page 101: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsA More Realistic View

Reality is a bit different:

Parallel

Sequential n

n/2

n/2

There is

synchronization overheadsequential code and use of shared resources (memory, bus, . . . )uneven workload

Parallelizing will only pay off if n is large enough.

For some functions, e.g. qbeta, n ≈ 10 may be large enough.For some, e.g. qnorm, n ≈ 1000 is needed.For basic arithmetic operations n ≈ 30000 may be needed.

Careful tuning to ensure improvement will be needed.

Some aspects will depend on architecture and OS.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 16 / 32

Page 102: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsA More Realistic View

Reality is a bit different:

Parallel

Sequential n

n/2

n/2

There is

synchronization overheadsequential code and use of shared resources (memory, bus, . . . )uneven workload

Parallelizing will only pay off if n is large enough.

For some functions, e.g. qbeta, n ≈ 10 may be large enough.For some, e.g. qnorm, n ≈ 1000 is needed.For basic arithmetic operations n ≈ 30000 may be needed.

Careful tuning to ensure improvement will be needed.

Some aspects will depend on architecture and OS.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 16 / 32

Page 103: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsSome Experimental Results

0 200 400 600 800 1000

0.00

0.05

0.10

0.15

0.20

0.25

0.30

n

CP

U ti

me

in m

illis

econ

ds

qnorm, Linux/AMD/x86_64

0 200 400 600 800 1000

0.00

0.05

0.10

0.15

0.20

0.25

0.30

n

CP

U ti

me

in m

illis

econ

ds

pgamma, Linux/AMD/x86_64

0 200 400 600 800 1000

0.00

0.05

0.10

0.15

0.20

0.25

0.30

n

CP

U ti

me

in m

illis

econ

ds

qnorm, Mac OS X/Intel/i386

0 200 400 600 800 1000

0.00

0.05

0.10

0.15

0.20

0.25

0.30

n

CP

U ti

me

in m

illis

econ

ds

pgamma, Mac OS X/Intel/i386

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 17 / 32

Page 104: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsSome Experimental Results

Some observations:

Times are roughly linear in vector length.Intercepts on a given platform are roughly the same for all functions.If the slope for P processors is sP , then at least for P = 2 and P = 4,

sP ≈ s1/P

Relative slopes of functions seem roughly independent ofOS/architecture.

A simple calibration strategy:

Compute relative slopes once, or average across several setups.For each OS/architecture combination compute the intercepts.

The appropriate time to run calibration code is still open.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 18 / 32

Page 105: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsSome Experimental Results

Some observations:

Times are roughly linear in vector length.Intercepts on a given platform are roughly the same for all functions.If the slope for P processors is sP , then at least for P = 2 and P = 4,

sP ≈ s1/P

Relative slopes of functions seem roughly independent ofOS/architecture.

A simple calibration strategy:

Compute relative slopes once, or average across several setups.For each OS/architecture combination compute the intercepts.

The appropriate time to run calibration code is still open.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 18 / 32

Page 106: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsSome Experimental Results

Some observations:

Times are roughly linear in vector length.Intercepts on a given platform are roughly the same for all functions.If the slope for P processors is sP , then at least for P = 2 and P = 4,

sP ≈ s1/P

Relative slopes of functions seem roughly independent ofOS/architecture.

A simple calibration strategy:

Compute relative slopes once, or average across several setups.For each OS/architecture combination compute the intercepts.

The appropriate time to run calibration code is still open.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 18 / 32

Page 107: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsSome Experimental Results

Some observations:

Times are roughly linear in vector length.Intercepts on a given platform are roughly the same for all functions.If the slope for P processors is sP , then at least for P = 2 and P = 4,

sP ≈ s1/P

Relative slopes of functions seem roughly independent ofOS/architecture.

A simple calibration strategy:

Compute relative slopes once, or average across several setups.For each OS/architecture combination compute the intercepts.

The appropriate time to run calibration code is still open.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 18 / 32

Page 108: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsSome Experimental Results

Some observations:

Times are roughly linear in vector length.Intercepts on a given platform are roughly the same for all functions.If the slope for P processors is sP , then at least for P = 2 and P = 4,

sP ≈ s1/P

Relative slopes of functions seem roughly independent ofOS/architecture.

A simple calibration strategy:

Compute relative slopes once, or average across several setups.For each OS/architecture combination compute the intercepts.

The appropriate time to run calibration code is still open.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 18 / 32

Page 109: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsSome Experimental Results

Some observations:

Times are roughly linear in vector length.Intercepts on a given platform are roughly the same for all functions.If the slope for P processors is sP , then at least for P = 2 and P = 4,

sP ≈ s1/P

Relative slopes of functions seem roughly independent ofOS/architecture.

A simple calibration strategy:

Compute relative slopes once, or average across several setups.For each OS/architecture combination compute the intercepts.

The appropriate time to run calibration code is still open.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 18 / 32

Page 110: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsSome Experimental Results

Some observations:

Times are roughly linear in vector length.Intercepts on a given platform are roughly the same for all functions.If the slope for P processors is sP , then at least for P = 2 and P = 4,

sP ≈ s1/P

Relative slopes of functions seem roughly independent ofOS/architecture.

A simple calibration strategy:

Compute relative slopes once, or average across several setups.For each OS/architecture combination compute the intercepts.

The appropriate time to run calibration code is still open.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 18 / 32

Page 111: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsSome Experimental Results

Some observations:

Times are roughly linear in vector length.Intercepts on a given platform are roughly the same for all functions.If the slope for P processors is sP , then at least for P = 2 and P = 4,

sP ≈ s1/P

Relative slopes of functions seem roughly independent ofOS/architecture.

A simple calibration strategy:

Compute relative slopes once, or average across several setups.For each OS/architecture combination compute the intercepts.

The appropriate time to run calibration code is still open.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 18 / 32

Page 112: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsSome Experimental Results

Some observations:

Times are roughly linear in vector length.Intercepts on a given platform are roughly the same for all functions.If the slope for P processors is sP , then at least for P = 2 and P = 4,

sP ≈ s1/P

Relative slopes of functions seem roughly independent ofOS/architecture.

A simple calibration strategy:

Compute relative slopes once, or average across several setups.For each OS/architecture combination compute the intercepts.

The appropriate time to run calibration code is still open.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 18 / 32

Page 113: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Need to use threads

One possibility: using raw pthreads

Better choice: use Open MP

Open MP consists of

compiler directives (#pragma statements in C)a runtime support library

Most commercial compilers support Open MP.

Current gcc versions support Open MP; newer ones do a better job.

MinGW for Win32 also supports Open MP; an additional pthreadsdownload is needed.

Support for Win64 now available also and should be in the toolchainsoon.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 19 / 32

Page 114: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Need to use threads

One possibility: using raw pthreads

Better choice: use Open MP

Open MP consists of

compiler directives (#pragma statements in C)a runtime support library

Most commercial compilers support Open MP.

Current gcc versions support Open MP; newer ones do a better job.

MinGW for Win32 also supports Open MP; an additional pthreadsdownload is needed.

Support for Win64 now available also and should be in the toolchainsoon.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 19 / 32

Page 115: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Need to use threads

One possibility: using raw pthreads

Better choice: use Open MP

Open MP consists of

compiler directives (#pragma statements in C)a runtime support library

Most commercial compilers support Open MP.

Current gcc versions support Open MP; newer ones do a better job.

MinGW for Win32 also supports Open MP; an additional pthreadsdownload is needed.

Support for Win64 now available also and should be in the toolchainsoon.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 19 / 32

Page 116: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Need to use threads

One possibility: using raw pthreads

Better choice: use Open MP

Open MP consists of

compiler directives (#pragma statements in C)a runtime support library

Most commercial compilers support Open MP.

Current gcc versions support Open MP; newer ones do a better job.

MinGW for Win32 also supports Open MP; an additional pthreadsdownload is needed.

Support for Win64 now available also and should be in the toolchainsoon.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 19 / 32

Page 117: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Need to use threads

One possibility: using raw pthreads

Better choice: use Open MP

Open MP consists of

compiler directives (#pragma statements in C)a runtime support library

Most commercial compilers support Open MP.

Current gcc versions support Open MP; newer ones do a better job.

MinGW for Win32 also supports Open MP; an additional pthreadsdownload is needed.

Support for Win64 now available also and should be in the toolchainsoon.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 19 / 32

Page 118: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Need to use threads

One possibility: using raw pthreads

Better choice: use Open MP

Open MP consists of

compiler directives (#pragma statements in C)a runtime support library

Most commercial compilers support Open MP.

Current gcc versions support Open MP; newer ones do a better job.

MinGW for Win32 also supports Open MP; an additional pthreadsdownload is needed.

Support for Win64 now available also and should be in the toolchainsoon.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 19 / 32

Page 119: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Need to use threads

One possibility: using raw pthreads

Better choice: use Open MP

Open MP consists of

compiler directives (#pragma statements in C)a runtime support library

Most commercial compilers support Open MP.

Current gcc versions support Open MP; newer ones do a better job.

MinGW for Win32 also supports Open MP; an additional pthreadsdownload is needed.

Support for Win64 now available also and should be in the toolchainsoon.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 19 / 32

Page 120: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Need to use threads

One possibility: using raw pthreads

Better choice: use Open MP

Open MP consists of

compiler directives (#pragma statements in C)a runtime support library

Most commercial compilers support Open MP.

Current gcc versions support Open MP; newer ones do a better job.

MinGW for Win32 also supports Open MP; an additional pthreadsdownload is needed.

Support for Win64 now available also and should be in the toolchainsoon.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 19 / 32

Page 121: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Need to use threads

One possibility: using raw pthreads

Better choice: use Open MP

Open MP consists of

compiler directives (#pragma statements in C)a runtime support library

Most commercial compilers support Open MP.

Current gcc versions support Open MP; newer ones do a better job.

MinGW for Win32 also supports Open MP; an additional pthreadsdownload is needed.

Support for Win64 now available also and should be in the toolchainsoon.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 19 / 32

Page 122: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Need to use threads

One possibility: using raw pthreads

Better choice: use Open MP

Open MP consists of

compiler directives (#pragma statements in C)a runtime support library

Most commercial compilers support Open MP.

Current gcc versions support Open MP; newer ones do a better job.

MinGW for Win32 also supports Open MP; an additional pthreadsdownload is needed.

Support for Win64 now available also and should be in the toolchainsoon.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 19 / 32

Page 123: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Basic loop for a one-argument function:#pragma omp parallel for if (P > 0) num_threads(P) \

default(shared) private(i) reduction(&&:naflag)

for (i = 0; i < n; i++) {

double ai = a[i];

MATH1_LOOP_BODY(y[i], f(ai), ai, naflag);

}

Steps in converting to Open MP:

check f is thread-safe; modify if notrewrite loop to work with the Open MP directivetest without Open MP, then enable Open MP

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 20 / 32

Page 124: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Basic loop for a one-argument function:#pragma omp parallel for if (P > 0) num_threads(P) \

default(shared) private(i) reduction(&&:naflag)

for (i = 0; i < n; i++) {

double ai = a[i];

MATH1_LOOP_BODY(y[i], f(ai), ai, naflag);

}

Steps in converting to Open MP:

check f is thread-safe; modify if notrewrite loop to work with the Open MP directivetest without Open MP, then enable Open MP

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 20 / 32

Page 125: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Basic loop for a one-argument function:#pragma omp parallel for if (P > 0) num_threads(P) \

default(shared) private(i) reduction(&&:naflag)

for (i = 0; i < n; i++) {

double ai = a[i];

MATH1_LOOP_BODY(y[i], f(ai), ai, naflag);

}

Steps in converting to Open MP:

check f is thread-safe; modify if notrewrite loop to work with the Open MP directivetest without Open MP, then enable Open MP

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 20 / 32

Page 126: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Basic loop for a one-argument function:#pragma omp parallel for if (P > 0) num_threads(P) \

default(shared) private(i) reduction(&&:naflag)

for (i = 0; i < n; i++) {

double ai = a[i];

MATH1_LOOP_BODY(y[i], f(ai), ai, naflag);

}

Steps in converting to Open MP:

check f is thread-safe; modify if notrewrite loop to work with the Open MP directivetest without Open MP, then enable Open MP

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 20 / 32

Page 127: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Basic loop for a one-argument function:#pragma omp parallel for if (P > 0) num_threads(P) \

default(shared) private(i) reduction(&&:naflag)

for (i = 0; i < n; i++) {

double ai = a[i];

MATH1_LOOP_BODY(y[i], f(ai), ai, naflag);

}

Steps in converting to Open MP:

check f is thread-safe; modify if notrewrite loop to work with the Open MP directivetest without Open MP, then enable Open MP

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 20 / 32

Page 128: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Some things that are not thread-safe:

use of global variablesR memory allocationsignaling warnings and errorsuser interrupt checkingcreating internationalized messages (calls to gettext)

Random number generation is also problematic.

Functions in nmath that have not been parallelized yet:

Bessel functions (partially done)Wilcoxon, signed rank functions (may not make sense)random number generators

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 21 / 32

Page 129: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Some things that are not thread-safe:

use of global variablesR memory allocationsignaling warnings and errorsuser interrupt checkingcreating internationalized messages (calls to gettext)

Random number generation is also problematic.

Functions in nmath that have not been parallelized yet:

Bessel functions (partially done)Wilcoxon, signed rank functions (may not make sense)random number generators

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 21 / 32

Page 130: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Some things that are not thread-safe:

use of global variablesR memory allocationsignaling warnings and errorsuser interrupt checkingcreating internationalized messages (calls to gettext)

Random number generation is also problematic.

Functions in nmath that have not been parallelized yet:

Bessel functions (partially done)Wilcoxon, signed rank functions (may not make sense)random number generators

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 21 / 32

Page 131: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Some things that are not thread-safe:

use of global variablesR memory allocationsignaling warnings and errorsuser interrupt checkingcreating internationalized messages (calls to gettext)

Random number generation is also problematic.

Functions in nmath that have not been parallelized yet:

Bessel functions (partially done)Wilcoxon, signed rank functions (may not make sense)random number generators

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 21 / 32

Page 132: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Some things that are not thread-safe:

use of global variablesR memory allocationsignaling warnings and errorsuser interrupt checkingcreating internationalized messages (calls to gettext)

Random number generation is also problematic.

Functions in nmath that have not been parallelized yet:

Bessel functions (partially done)Wilcoxon, signed rank functions (may not make sense)random number generators

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 21 / 32

Page 133: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Some things that are not thread-safe:

use of global variablesR memory allocationsignaling warnings and errorsuser interrupt checkingcreating internationalized messages (calls to gettext)

Random number generation is also problematic.

Functions in nmath that have not been parallelized yet:

Bessel functions (partially done)Wilcoxon, signed rank functions (may not make sense)random number generators

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 21 / 32

Page 134: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Some things that are not thread-safe:

use of global variablesR memory allocationsignaling warnings and errorsuser interrupt checkingcreating internationalized messages (calls to gettext)

Random number generation is also problematic.

Functions in nmath that have not been parallelized yet:

Bessel functions (partially done)Wilcoxon, signed rank functions (may not make sense)random number generators

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 21 / 32

Page 135: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Some things that are not thread-safe:

use of global variablesR memory allocationsignaling warnings and errorsuser interrupt checkingcreating internationalized messages (calls to gettext)

Random number generation is also problematic.

Functions in nmath that have not been parallelized yet:

Bessel functions (partially done)Wilcoxon, signed rank functions (may not make sense)random number generators

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 21 / 32

Page 136: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Some things that are not thread-safe:

use of global variablesR memory allocationsignaling warnings and errorsuser interrupt checkingcreating internationalized messages (calls to gettext)

Random number generation is also problematic.

Functions in nmath that have not been parallelized yet:

Bessel functions (partially done)Wilcoxon, signed rank functions (may not make sense)random number generators

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 21 / 32

Page 137: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Some things that are not thread-safe:

use of global variablesR memory allocationsignaling warnings and errorsuser interrupt checkingcreating internationalized messages (calls to gettext)

Random number generation is also problematic.

Functions in nmath that have not been parallelized yet:

Bessel functions (partially done)Wilcoxon, signed rank functions (may not make sense)random number generators

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 21 / 32

Page 138: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsImplementation

Some things that are not thread-safe:

use of global variablesR memory allocationsignaling warnings and errorsuser interrupt checkingcreating internationalized messages (calls to gettext)

Random number generation is also problematic.

Functions in nmath that have not been parallelized yet:

Bessel functions (partially done)Wilcoxon, signed rank functions (may not make sense)random number generators

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 21 / 32

Page 139: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsAvailability

Package pnmath is available at

http://www.stat.uiowa.edu/~luke/R/experimental/

This requires a version of gcc that

supports Open MPallows dlopen to be used on libgomp.so

A version using just pthreads is available in pnmath0.

Loading these packages replaces builtin operations by parallelizedones.

For Linux, Mac OS X predetermined intercept calibrations are used.

For other platforms a calibration test is run at package load time.

The calibration can be run manually by calling calibratePnmath

Hopefully we will be able to include this in R soon.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 22 / 32

Page 140: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsAvailability

Package pnmath is available at

http://www.stat.uiowa.edu/~luke/R/experimental/

This requires a version of gcc that

supports Open MPallows dlopen to be used on libgomp.so

A version using just pthreads is available in pnmath0.

Loading these packages replaces builtin operations by parallelizedones.

For Linux, Mac OS X predetermined intercept calibrations are used.

For other platforms a calibration test is run at package load time.

The calibration can be run manually by calling calibratePnmath

Hopefully we will be able to include this in R soon.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 22 / 32

Page 141: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsAvailability

Package pnmath is available at

http://www.stat.uiowa.edu/~luke/R/experimental/

This requires a version of gcc that

supports Open MPallows dlopen to be used on libgomp.so

A version using just pthreads is available in pnmath0.

Loading these packages replaces builtin operations by parallelizedones.

For Linux, Mac OS X predetermined intercept calibrations are used.

For other platforms a calibration test is run at package load time.

The calibration can be run manually by calling calibratePnmath

Hopefully we will be able to include this in R soon.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 22 / 32

Page 142: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsAvailability

Package pnmath is available at

http://www.stat.uiowa.edu/~luke/R/experimental/

This requires a version of gcc that

supports Open MPallows dlopen to be used on libgomp.so

A version using just pthreads is available in pnmath0.

Loading these packages replaces builtin operations by parallelizedones.

For Linux, Mac OS X predetermined intercept calibrations are used.

For other platforms a calibration test is run at package load time.

The calibration can be run manually by calling calibratePnmath

Hopefully we will be able to include this in R soon.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 22 / 32

Page 143: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsAvailability

Package pnmath is available at

http://www.stat.uiowa.edu/~luke/R/experimental/

This requires a version of gcc that

supports Open MPallows dlopen to be used on libgomp.so

A version using just pthreads is available in pnmath0.

Loading these packages replaces builtin operations by parallelizedones.

For Linux, Mac OS X predetermined intercept calibrations are used.

For other platforms a calibration test is run at package load time.

The calibration can be run manually by calling calibratePnmath

Hopefully we will be able to include this in R soon.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 22 / 32

Page 144: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsAvailability

Package pnmath is available at

http://www.stat.uiowa.edu/~luke/R/experimental/

This requires a version of gcc that

supports Open MPallows dlopen to be used on libgomp.so

A version using just pthreads is available in pnmath0.

Loading these packages replaces builtin operations by parallelizedones.

For Linux, Mac OS X predetermined intercept calibrations are used.

For other platforms a calibration test is run at package load time.

The calibration can be run manually by calling calibratePnmath

Hopefully we will be able to include this in R soon.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 22 / 32

Page 145: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsAvailability

Package pnmath is available at

http://www.stat.uiowa.edu/~luke/R/experimental/

This requires a version of gcc that

supports Open MPallows dlopen to be used on libgomp.so

A version using just pthreads is available in pnmath0.

Loading these packages replaces builtin operations by parallelizedones.

For Linux, Mac OS X predetermined intercept calibrations are used.

For other platforms a calibration test is run at package load time.

The calibration can be run manually by calling calibratePnmath

Hopefully we will be able to include this in R soon.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 22 / 32

Page 146: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsAvailability

Package pnmath is available at

http://www.stat.uiowa.edu/~luke/R/experimental/

This requires a version of gcc that

supports Open MPallows dlopen to be used on libgomp.so

A version using just pthreads is available in pnmath0.

Loading these packages replaces builtin operations by parallelizedones.

For Linux, Mac OS X predetermined intercept calibrations are used.

For other platforms a calibration test is run at package load time.

The calibration can be run manually by calling calibratePnmath

Hopefully we will be able to include this in R soon.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 22 / 32

Page 147: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsAvailability

Package pnmath is available at

http://www.stat.uiowa.edu/~luke/R/experimental/

This requires a version of gcc that

supports Open MPallows dlopen to be used on libgomp.so

A version using just pthreads is available in pnmath0.

Loading these packages replaces builtin operations by parallelizedones.

For Linux, Mac OS X predetermined intercept calibrations are used.

For other platforms a calibration test is run at package load time.

The calibration can be run manually by calling calibratePnmath

Hopefully we will be able to include this in R soon.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 22 / 32

Page 148: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Vectorized OperationsAvailability

Package pnmath is available at

http://www.stat.uiowa.edu/~luke/R/experimental/

This requires a version of gcc that

supports Open MPallows dlopen to be used on libgomp.so

A version using just pthreads is available in pnmath0.

Loading these packages replaces builtin operations by parallelizedones.

For Linux, Mac OS X predetermined intercept calibrations are used.

For other platforms a calibration test is run at package load time.

The calibration can be run manually by calling calibratePnmath

Hopefully we will be able to include this in R soon.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 22 / 32

Page 149: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Simple Matrix Operations

Very preliminary results for colSums on an 8-core Linux machine:

size

time

2.5e−05

3.0e−05

3.5e−05

4.0e−05

4.5e−05

5.0e−05

0 5000 10000 15000

●●●

●●●●●●

●●

●●

● ●

● ●

●●

●●

● ●

● ●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

● ●

●●●●●●

●●●

●●

● ●

●●

●● ●

● ●

● ●

● ●

● ●

● ●

●●

● ●

● ●

● ●

● ●

●●

● ●

●●

●●

●●

● ●

● ●

● ●

● ●

● ●

●●

● ●

●●●●●

●●●●

● ●●

●●

● ●●

●●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

● ●

● ●

●●●●

●●

●● ●

● ●

● ●

●●

● ●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

● ● ● ●● ●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

● ●

●●

12468

Preliminary results on OS X indicate cutoff levels may be much higher.

Part of the implementation work was done by Xiao Yang.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 23 / 32

Page 150: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Simple Matrix Operations

Very preliminary results for colSums on an 8-core Linux machine:

size

time

2.5e−05

3.0e−05

3.5e−05

4.0e−05

4.5e−05

5.0e−05

0 5000 10000 15000

●●●

●●●●●●

●●

●●

● ●

● ●

●●

●●

● ●

● ●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

● ●

●●●●●●

●●●

●●

● ●

●●

●● ●

● ●

● ●

● ●

● ●

● ●

●●

● ●

● ●

● ●

● ●

●●

● ●

●●

●●

●●

● ●

● ●

● ●

● ●

● ●

●●

● ●

●●●●●

●●●●

● ●●

●●

● ●●

●●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

● ●

● ●

●●●●

●●

●● ●

● ●

● ●

●●

● ●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

● ● ● ●● ●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

● ●

●●

12468

Preliminary results on OS X indicate cutoff levels may be much higher.

Part of the implementation work was done by Xiao Yang.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 23 / 32

Page 151: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Simple Matrix Operations

Very preliminary results for colSums on an 8-core Linux machine:

size

time

2.5e−05

3.0e−05

3.5e−05

4.0e−05

4.5e−05

5.0e−05

0 5000 10000 15000

●●●

●●●●●●

●●

●●

● ●

● ●

●●

●●

● ●

● ●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

● ●

●●●●●●

●●●

●●

● ●

●●

●● ●

● ●

● ●

● ●

● ●

● ●

●●

● ●

● ●

● ●

● ●

●●

● ●

●●

●●

●●

● ●

● ●

● ●

● ●

● ●

●●

● ●

●●●●●

●●●●

● ●●

●●

● ●●

●●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

● ●

● ●

●●●●

●●

●● ●

● ●

● ●

●●

● ●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

● ● ● ●● ●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

● ●

●●

12468

Preliminary results on OS X indicate cutoff levels may be much higher.

Part of the implementation work was done by Xiao Yang.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 23 / 32

Page 152: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Simple Matrix Operations

Some issues to consider:

Again using too many processor cores for small problems can slow thecomputation down.

colSums can be parallelized by rows or columns:

Handling groups of columns in parallel produces identical results to asequential version.Handling groups of rows in parallel changes numerical results slightly(floating point addition is not associative).

rowSums is slightly more complex since locality of reference (columnmajor storage) need to be taken into account.

A number of other basic operations can be handled similarly.

Simple uses of apply and sweep might also be handled along theselines.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 24 / 32

Page 153: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Simple Matrix Operations

Some issues to consider:

Again using too many processor cores for small problems can slow thecomputation down.

colSums can be parallelized by rows or columns:

Handling groups of columns in parallel produces identical results to asequential version.Handling groups of rows in parallel changes numerical results slightly(floating point addition is not associative).

rowSums is slightly more complex since locality of reference (columnmajor storage) need to be taken into account.

A number of other basic operations can be handled similarly.

Simple uses of apply and sweep might also be handled along theselines.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 24 / 32

Page 154: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Simple Matrix Operations

Some issues to consider:

Again using too many processor cores for small problems can slow thecomputation down.

colSums can be parallelized by rows or columns:

Handling groups of columns in parallel produces identical results to asequential version.Handling groups of rows in parallel changes numerical results slightly(floating point addition is not associative).

rowSums is slightly more complex since locality of reference (columnmajor storage) need to be taken into account.

A number of other basic operations can be handled similarly.

Simple uses of apply and sweep might also be handled along theselines.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 24 / 32

Page 155: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Simple Matrix Operations

Some issues to consider:

Again using too many processor cores for small problems can slow thecomputation down.

colSums can be parallelized by rows or columns:

Handling groups of columns in parallel produces identical results to asequential version.Handling groups of rows in parallel changes numerical results slightly(floating point addition is not associative).

rowSums is slightly more complex since locality of reference (columnmajor storage) need to be taken into account.

A number of other basic operations can be handled similarly.

Simple uses of apply and sweep might also be handled along theselines.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 24 / 32

Page 156: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Simple Matrix Operations

Some issues to consider:

Again using too many processor cores for small problems can slow thecomputation down.

colSums can be parallelized by rows or columns:

Handling groups of columns in parallel produces identical results to asequential version.Handling groups of rows in parallel changes numerical results slightly(floating point addition is not associative).

rowSums is slightly more complex since locality of reference (columnmajor storage) need to be taken into account.

A number of other basic operations can be handled similarly.

Simple uses of apply and sweep might also be handled along theselines.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 24 / 32

Page 157: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Simple Matrix Operations

Some issues to consider:

Again using too many processor cores for small problems can slow thecomputation down.

colSums can be parallelized by rows or columns:

Handling groups of columns in parallel produces identical results to asequential version.Handling groups of rows in parallel changes numerical results slightly(floating point addition is not associative).

rowSums is slightly more complex since locality of reference (columnmajor storage) need to be taken into account.

A number of other basic operations can be handled similarly.

Simple uses of apply and sweep might also be handled along theselines.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 24 / 32

Page 158: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelizing Simple Matrix Operations

Some issues to consider:

Again using too many processor cores for small problems can slow thecomputation down.

colSums can be parallelized by rows or columns:

Handling groups of columns in parallel produces identical results to asequential version.Handling groups of rows in parallel changes numerical results slightly(floating point addition is not associative).

rowSums is slightly more complex since locality of reference (columnmajor storage) need to be taken into account.

A number of other basic operations can be handled similarly.

Simple uses of apply and sweep might also be handled along theselines.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 24 / 32

Page 159: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Using a Parallel BLAS

Most core linear algebra calculations use the Basic Linear AlgebraSubroutines library (BLAS).

R has supported using a custom BLAS implementation for some time.

Both Intel and AMD provide sequential and threaded acceleratedBLAS implementations.

Atlas and Goto’s BLAS also come in sequential and threaded versions.

Very preliminary testing suggests that the Intel threaded BLAS workswell for small and large matrices.

Anecdotal evidence, that may no longer apply, suggests that this maynot be true of some other threaded BLAS implementations.

More testing is needed.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 25 / 32

Page 160: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Using a Parallel BLAS

Most core linear algebra calculations use the Basic Linear AlgebraSubroutines library (BLAS).

R has supported using a custom BLAS implementation for some time.

Both Intel and AMD provide sequential and threaded acceleratedBLAS implementations.

Atlas and Goto’s BLAS also come in sequential and threaded versions.

Very preliminary testing suggests that the Intel threaded BLAS workswell for small and large matrices.

Anecdotal evidence, that may no longer apply, suggests that this maynot be true of some other threaded BLAS implementations.

More testing is needed.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 25 / 32

Page 161: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Using a Parallel BLAS

Most core linear algebra calculations use the Basic Linear AlgebraSubroutines library (BLAS).

R has supported using a custom BLAS implementation for some time.

Both Intel and AMD provide sequential and threaded acceleratedBLAS implementations.

Atlas and Goto’s BLAS also come in sequential and threaded versions.

Very preliminary testing suggests that the Intel threaded BLAS workswell for small and large matrices.

Anecdotal evidence, that may no longer apply, suggests that this maynot be true of some other threaded BLAS implementations.

More testing is needed.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 25 / 32

Page 162: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Using a Parallel BLAS

Most core linear algebra calculations use the Basic Linear AlgebraSubroutines library (BLAS).

R has supported using a custom BLAS implementation for some time.

Both Intel and AMD provide sequential and threaded acceleratedBLAS implementations.

Atlas and Goto’s BLAS also come in sequential and threaded versions.

Very preliminary testing suggests that the Intel threaded BLAS workswell for small and large matrices.

Anecdotal evidence, that may no longer apply, suggests that this maynot be true of some other threaded BLAS implementations.

More testing is needed.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 25 / 32

Page 163: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Using a Parallel BLAS

Most core linear algebra calculations use the Basic Linear AlgebraSubroutines library (BLAS).

R has supported using a custom BLAS implementation for some time.

Both Intel and AMD provide sequential and threaded acceleratedBLAS implementations.

Atlas and Goto’s BLAS also come in sequential and threaded versions.

Very preliminary testing suggests that the Intel threaded BLAS workswell for small and large matrices.

Anecdotal evidence, that may no longer apply, suggests that this maynot be true of some other threaded BLAS implementations.

More testing is needed.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 25 / 32

Page 164: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Using a Parallel BLAS

Most core linear algebra calculations use the Basic Linear AlgebraSubroutines library (BLAS).

R has supported using a custom BLAS implementation for some time.

Both Intel and AMD provide sequential and threaded acceleratedBLAS implementations.

Atlas and Goto’s BLAS also come in sequential and threaded versions.

Very preliminary testing suggests that the Intel threaded BLAS workswell for small and large matrices.

Anecdotal evidence, that may no longer apply, suggests that this maynot be true of some other threaded BLAS implementations.

More testing is needed.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 25 / 32

Page 165: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Using a Parallel BLAS

Most core linear algebra calculations use the Basic Linear AlgebraSubroutines library (BLAS).

R has supported using a custom BLAS implementation for some time.

Both Intel and AMD provide sequential and threaded acceleratedBLAS implementations.

Atlas and Goto’s BLAS also come in sequential and threaded versions.

Very preliminary testing suggests that the Intel threaded BLAS workswell for small and large matrices.

Anecdotal evidence, that may no longer apply, suggests that this maynot be true of some other threaded BLAS implementations.

More testing is needed.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 25 / 32

Page 166: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelization and Compilation

Compilation may be useful for parallelizing vector operations:

Many vector operations occur in compound expressions, like

exp(-0.5*x^2)

A compiler may be able to fuse these operations:

This will allow gains from parallelizing compound operations onshorter vectors.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 26 / 32

Page 167: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelization and Compilation

Compilation may be useful for parallelizing vector operations:

Many vector operations occur in compound expressions, like

exp(-0.5*x^2)

A compiler may be able to fuse these operations:

This will allow gains from parallelizing compound operations onshorter vectors.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 26 / 32

Page 168: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelization and Compilation

Compilation may be useful for parallelizing vector operations:

Many vector operations occur in compound expressions, like

exp(-0.5*x^2)

A compiler may be able to fuse these operations:

SQUARE

SQUARE

SCALE

SCALE EXP

EXP

EXPSQUARE SCALE

SQUARE SCALE EXP

Compiled, fused

Interpreted

Sequential SQUARE SCALE EXP

This will allow gains from parallelizing compound operations onshorter vectors.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 26 / 32

Page 169: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Parallelization and Compilation

Compilation may be useful for parallelizing vector operations:

Many vector operations occur in compound expressions, like

exp(-0.5*x^2)

A compiler may be able to fuse these operations:

SQUARE

SQUARE

SCALE

SCALE EXP

EXP

EXPSQUARE SCALE

SQUARE SCALE EXP

Compiled, fused

Interpreted

Sequential SQUARE SCALE EXP

This will allow gains from parallelizing compound operations onshorter vectors.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 26 / 32

Page 170: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Big Data

Automated data acquisition in science and commerce is producinghuge amounts of data.

Big Data is a hot topic in the popular and trade press.

Some categories:

fit into memoryfit on one machines disk storagerequire multiple machines to store

Smaller large data sets can be handled by standard methods if enoughmemory is available.

Very large data sets require specialized methods and algorithms.

R should be able to smaller large data problems on machines withenough memory.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 27 / 32

Page 171: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Big Data

Automated data acquisition in science and commerce is producinghuge amounts of data.

Big Data is a hot topic in the popular and trade press.

Some categories:

fit into memoryfit on one machines disk storagerequire multiple machines to store

Smaller large data sets can be handled by standard methods if enoughmemory is available.

Very large data sets require specialized methods and algorithms.

R should be able to smaller large data problems on machines withenough memory.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 27 / 32

Page 172: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Big Data

Automated data acquisition in science and commerce is producinghuge amounts of data.

Big Data is a hot topic in the popular and trade press.

Some categories:

fit into memoryfit on one machines disk storagerequire multiple machines to store

Smaller large data sets can be handled by standard methods if enoughmemory is available.

Very large data sets require specialized methods and algorithms.

R should be able to smaller large data problems on machines withenough memory.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 27 / 32

Page 173: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Big Data

Automated data acquisition in science and commerce is producinghuge amounts of data.

Big Data is a hot topic in the popular and trade press.

Some categories:

fit into memoryfit on one machines disk storagerequire multiple machines to store

Smaller large data sets can be handled by standard methods if enoughmemory is available.

Very large data sets require specialized methods and algorithms.

R should be able to smaller large data problems on machines withenough memory.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 27 / 32

Page 174: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Big Data

Automated data acquisition in science and commerce is producinghuge amounts of data.

Big Data is a hot topic in the popular and trade press.

Some categories:

fit into memoryfit on one machines disk storagerequire multiple machines to store

Smaller large data sets can be handled by standard methods if enoughmemory is available.

Very large data sets require specialized methods and algorithms.

R should be able to smaller large data problems on machines withenough memory.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 27 / 32

Page 175: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Big Data

Automated data acquisition in science and commerce is producinghuge amounts of data.

Big Data is a hot topic in the popular and trade press.

Some categories:

fit into memoryfit on one machines disk storagerequire multiple machines to store

Smaller large data sets can be handled by standard methods if enoughmemory is available.

Very large data sets require specialized methods and algorithms.

R should be able to smaller large data problems on machines withenough memory.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 27 / 32

Page 176: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Big Data

Automated data acquisition in science and commerce is producinghuge amounts of data.

Big Data is a hot topic in the popular and trade press.

Some categories:

fit into memoryfit on one machines disk storagerequire multiple machines to store

Smaller large data sets can be handled by standard methods if enoughmemory is available.

Very large data sets require specialized methods and algorithms.

R should be able to smaller large data problems on machines withenough memory.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 27 / 32

Page 177: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Big Data

Automated data acquisition in science and commerce is producinghuge amounts of data.

Big Data is a hot topic in the popular and trade press.

Some categories:

fit into memoryfit on one machines disk storagerequire multiple machines to store

Smaller large data sets can be handled by standard methods if enoughmemory is available.

Very large data sets require specialized methods and algorithms.

R should be able to smaller large data problems on machines withenough memory.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 27 / 32

Page 178: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Big Data

Automated data acquisition in science and commerce is producinghuge amounts of data.

Big Data is a hot topic in the popular and trade press.

Some categories:

fit into memoryfit on one machines disk storagerequire multiple machines to store

Smaller large data sets can be handled by standard methods if enoughmemory is available.

Very large data sets require specialized methods and algorithms.

R should be able to smaller large data problems on machines withenough memory.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 27 / 32

Page 179: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Limit on Vector Object Size

Currently The total number of elements in a vector cannot exceed231 − 1 = 2, 147, 483, 647

This limit represents the largest possible 32-bit signed integer.

For numeric (double precision) data this means the largest possiblevector is about 16 GB.

This is fairly large, but is becoming an issue with larger data sets withmany variables on 64-bit platforms.

Can this limit be raised without breaking too much existing R codeand requiring the rewriting of too much C code?

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 28 / 32

Page 180: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Limit on Vector Object Size

Currently The total number of elements in a vector cannot exceed231 − 1 = 2, 147, 483, 647

This limit represents the largest possible 32-bit signed integer.

For numeric (double precision) data this means the largest possiblevector is about 16 GB.

This is fairly large, but is becoming an issue with larger data sets withmany variables on 64-bit platforms.

Can this limit be raised without breaking too much existing R codeand requiring the rewriting of too much C code?

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 28 / 32

Page 181: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Limit on Vector Object Size

Currently The total number of elements in a vector cannot exceed231 − 1 = 2, 147, 483, 647

This limit represents the largest possible 32-bit signed integer.

For numeric (double precision) data this means the largest possiblevector is about 16 GB.

This is fairly large, but is becoming an issue with larger data sets withmany variables on 64-bit platforms.

Can this limit be raised without breaking too much existing R codeand requiring the rewriting of too much C code?

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 28 / 32

Page 182: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Limit on Vector Object Size

Currently The total number of elements in a vector cannot exceed231 − 1 = 2, 147, 483, 647

This limit represents the largest possible 32-bit signed integer.

For numeric (double precision) data this means the largest possiblevector is about 16 GB.

This is fairly large, but is becoming an issue with larger data sets withmany variables on 64-bit platforms.

Can this limit be raised without breaking too much existing R codeand requiring the rewriting of too much C code?

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 28 / 32

Page 183: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Limit on Vector Object Size

Currently The total number of elements in a vector cannot exceed231 − 1 = 2, 147, 483, 647

This limit represents the largest possible 32-bit signed integer.

For numeric (double precision) data this means the largest possiblevector is about 16 GB.

This is fairly large, but is becoming an issue with larger data sets withmany variables on 64-bit platforms.

Can this limit be raised without breaking too much existing R codeand requiring the rewriting of too much C code?

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 28 / 32

Page 184: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Considerations

For all practical purposes on all current architectures the C int typeand the FORTRAN integer type are 32 bit signed integers.

The R source code uses C int or FORTRAN integer types in manyplaces that would need to be changed to a wider type.

The R memory manager is easy enough to change.

Finding all the other places in the C code implementing R where intwould need to be changed to a wider type, and making sure it is notchanged where it should not be, is hard.

External code used by R is also a problem, in particular the BLAS.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 29 / 32

Page 185: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Considerations

For all practical purposes on all current architectures the C int typeand the FORTRAN integer type are 32 bit signed integers.

The R source code uses C int or FORTRAN integer types in manyplaces that would need to be changed to a wider type.

The R memory manager is easy enough to change.

Finding all the other places in the C code implementing R where intwould need to be changed to a wider type, and making sure it is notchanged where it should not be, is hard.

External code used by R is also a problem, in particular the BLAS.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 29 / 32

Page 186: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Considerations

For all practical purposes on all current architectures the C int typeand the FORTRAN integer type are 32 bit signed integers.

The R source code uses C int or FORTRAN integer types in manyplaces that would need to be changed to a wider type.

The R memory manager is easy enough to change.

Finding all the other places in the C code implementing R where intwould need to be changed to a wider type, and making sure it is notchanged where it should not be, is hard.

External code used by R is also a problem, in particular the BLAS.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 29 / 32

Page 187: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Considerations

For all practical purposes on all current architectures the C int typeand the FORTRAN integer type are 32 bit signed integers.

The R source code uses C int or FORTRAN integer types in manyplaces that would need to be changed to a wider type.

The R memory manager is easy enough to change.

Finding all the other places in the C code implementing R where intwould need to be changed to a wider type, and making sure it is notchanged where it should not be, is hard.

External code used by R is also a problem, in particular the BLAS.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 29 / 32

Page 188: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Considerations

For all practical purposes on all current architectures the C int typeand the FORTRAN integer type are 32 bit signed integers.

The R source code uses C int or FORTRAN integer types in manyplaces that would need to be changed to a wider type.

The R memory manager is easy enough to change.

Finding all the other places in the C code implementing R where intwould need to be changed to a wider type, and making sure it is notchanged where it should not be, is hard.

External code used by R is also a problem, in particular the BLAS.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 29 / 32

Page 189: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Possible Directions

A possible strategy:Change length fields in internal headers to support longer vectors.Change standard field accessors to signal an error if long vectors areused.Add new accessors that allow long vectors.Gradually introduce long vector support into the R internals.

The initial header change will require recompiling all C code but nofurther code changes.After the header change, support for large vectors can be introducedincrementally in R itself and in packages.It may eventually be necessary to introduce a long integer data typeor change the integer type from 32 to 64 bits.It may also be sufficient to store larger integers as double precisionfloating point numbers.If the integer representation is changed, a possible direction to exploreis whether smaller integer types could be added (one byte and twobyte, for example).

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 30 / 32

Page 190: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Possible Directions

A possible strategy:Change length fields in internal headers to support longer vectors.Change standard field accessors to signal an error if long vectors areused.Add new accessors that allow long vectors.Gradually introduce long vector support into the R internals.

The initial header change will require recompiling all C code but nofurther code changes.After the header change, support for large vectors can be introducedincrementally in R itself and in packages.It may eventually be necessary to introduce a long integer data typeor change the integer type from 32 to 64 bits.It may also be sufficient to store larger integers as double precisionfloating point numbers.If the integer representation is changed, a possible direction to exploreis whether smaller integer types could be added (one byte and twobyte, for example).

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 30 / 32

Page 191: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Possible Directions

A possible strategy:Change length fields in internal headers to support longer vectors.Change standard field accessors to signal an error if long vectors areused.Add new accessors that allow long vectors.Gradually introduce long vector support into the R internals.

The initial header change will require recompiling all C code but nofurther code changes.After the header change, support for large vectors can be introducedincrementally in R itself and in packages.It may eventually be necessary to introduce a long integer data typeor change the integer type from 32 to 64 bits.It may also be sufficient to store larger integers as double precisionfloating point numbers.If the integer representation is changed, a possible direction to exploreis whether smaller integer types could be added (one byte and twobyte, for example).

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 30 / 32

Page 192: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Possible Directions

A possible strategy:Change length fields in internal headers to support longer vectors.Change standard field accessors to signal an error if long vectors areused.Add new accessors that allow long vectors.Gradually introduce long vector support into the R internals.

The initial header change will require recompiling all C code but nofurther code changes.After the header change, support for large vectors can be introducedincrementally in R itself and in packages.It may eventually be necessary to introduce a long integer data typeor change the integer type from 32 to 64 bits.It may also be sufficient to store larger integers as double precisionfloating point numbers.If the integer representation is changed, a possible direction to exploreis whether smaller integer types could be added (one byte and twobyte, for example).

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 30 / 32

Page 193: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Possible Directions

A possible strategy:Change length fields in internal headers to support longer vectors.Change standard field accessors to signal an error if long vectors areused.Add new accessors that allow long vectors.Gradually introduce long vector support into the R internals.

The initial header change will require recompiling all C code but nofurther code changes.After the header change, support for large vectors can be introducedincrementally in R itself and in packages.It may eventually be necessary to introduce a long integer data typeor change the integer type from 32 to 64 bits.It may also be sufficient to store larger integers as double precisionfloating point numbers.If the integer representation is changed, a possible direction to exploreis whether smaller integer types could be added (one byte and twobyte, for example).

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 30 / 32

Page 194: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Possible Directions

A possible strategy:Change length fields in internal headers to support longer vectors.Change standard field accessors to signal an error if long vectors areused.Add new accessors that allow long vectors.Gradually introduce long vector support into the R internals.

The initial header change will require recompiling all C code but nofurther code changes.After the header change, support for large vectors can be introducedincrementally in R itself and in packages.It may eventually be necessary to introduce a long integer data typeor change the integer type from 32 to 64 bits.It may also be sufficient to store larger integers as double precisionfloating point numbers.If the integer representation is changed, a possible direction to exploreis whether smaller integer types could be added (one byte and twobyte, for example).

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 30 / 32

Page 195: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Possible Directions

A possible strategy:Change length fields in internal headers to support longer vectors.Change standard field accessors to signal an error if long vectors areused.Add new accessors that allow long vectors.Gradually introduce long vector support into the R internals.

The initial header change will require recompiling all C code but nofurther code changes.After the header change, support for large vectors can be introducedincrementally in R itself and in packages.It may eventually be necessary to introduce a long integer data typeor change the integer type from 32 to 64 bits.It may also be sufficient to store larger integers as double precisionfloating point numbers.If the integer representation is changed, a possible direction to exploreis whether smaller integer types could be added (one byte and twobyte, for example).

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 30 / 32

Page 196: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Possible Directions

A possible strategy:Change length fields in internal headers to support longer vectors.Change standard field accessors to signal an error if long vectors areused.Add new accessors that allow long vectors.Gradually introduce long vector support into the R internals.

The initial header change will require recompiling all C code but nofurther code changes.After the header change, support for large vectors can be introducedincrementally in R itself and in packages.It may eventually be necessary to introduce a long integer data typeor change the integer type from 32 to 64 bits.It may also be sufficient to store larger integers as double precisionfloating point numbers.If the integer representation is changed, a possible direction to exploreis whether smaller integer types could be added (one byte and twobyte, for example).

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 30 / 32

Page 197: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Possible Directions

A possible strategy:Change length fields in internal headers to support longer vectors.Change standard field accessors to signal an error if long vectors areused.Add new accessors that allow long vectors.Gradually introduce long vector support into the R internals.

The initial header change will require recompiling all C code but nofurther code changes.After the header change, support for large vectors can be introducedincrementally in R itself and in packages.It may eventually be necessary to introduce a long integer data typeor change the integer type from 32 to 64 bits.It may also be sufficient to store larger integers as double precisionfloating point numbers.If the integer representation is changed, a possible direction to exploreis whether smaller integer types could be added (one byte and twobyte, for example).

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 30 / 32

Page 198: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Some Possible Directions

A possible strategy:Change length fields in internal headers to support longer vectors.Change standard field accessors to signal an error if long vectors areused.Add new accessors that allow long vectors.Gradually introduce long vector support into the R internals.

The initial header change will require recompiling all C code but nofurther code changes.After the header change, support for large vectors can be introducedincrementally in R itself and in packages.It may eventually be necessary to introduce a long integer data typeor change the integer type from 32 to 64 bits.It may also be sufficient to store larger integers as double precisionfloating point numbers.If the integer representation is changed, a possible direction to exploreis whether smaller integer types could be added (one byte and twobyte, for example).

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 30 / 32

Page 199: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Status of Explorations

Initial experiments are promising.

In some cases just changing data types will be sufficient.

In other cases more may be needed, such as

ability to interrupt computationsmore stable numerical algorithms

Some more experimentation is needed.

If successful, the header changes may occur within the next fewweeks.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 31 / 32

Page 200: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Status of Explorations

Initial experiments are promising.

In some cases just changing data types will be sufficient.

In other cases more may be needed, such as

ability to interrupt computationsmore stable numerical algorithms

Some more experimentation is needed.

If successful, the header changes may occur within the next fewweeks.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 31 / 32

Page 201: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Status of Explorations

Initial experiments are promising.

In some cases just changing data types will be sufficient.

In other cases more may be needed, such as

ability to interrupt computationsmore stable numerical algorithms

Some more experimentation is needed.

If successful, the header changes may occur within the next fewweeks.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 31 / 32

Page 202: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Status of Explorations

Initial experiments are promising.

In some cases just changing data types will be sufficient.

In other cases more may be needed, such as

ability to interrupt computationsmore stable numerical algorithms

Some more experimentation is needed.

If successful, the header changes may occur within the next fewweeks.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 31 / 32

Page 203: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Status of Explorations

Initial experiments are promising.

In some cases just changing data types will be sufficient.

In other cases more may be needed, such as

ability to interrupt computationsmore stable numerical algorithms

Some more experimentation is needed.

If successful, the header changes may occur within the next fewweeks.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 31 / 32

Page 204: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Status of Explorations

Initial experiments are promising.

In some cases just changing data types will be sufficient.

In other cases more may be needed, such as

ability to interrupt computationsmore stable numerical algorithms

Some more experimentation is needed.

If successful, the header changes may occur within the next fewweeks.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 31 / 32

Page 205: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Status of Explorations

Initial experiments are promising.

In some cases just changing data types will be sufficient.

In other cases more may be needed, such as

ability to interrupt computationsmore stable numerical algorithms

Some more experimentation is needed.

If successful, the header changes may occur within the next fewweeks.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 31 / 32

Page 206: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Summary

This talk has outlined several areas I believe are important and towhich I hope I can make some contributions during the near future.

The R development model is quite distributed: other R developers areworking on a wide range of other areas.

Fortunately conflicts are rare and the different efforts, so far at least,have merged together quite very successfully.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 32 / 32

Page 207: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Summary

This talk has outlined several areas I believe are important and towhich I hope I can make some contributions during the near future.

The R development model is quite distributed: other R developers areworking on a wide range of other areas.

Fortunately conflicts are rare and the different efforts, so far at least,have merged together quite very successfully.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 32 / 32

Page 208: Some Developments for the R Engine - University of Iowahomepage.stat.uiowa.edu/~luke/talks/uiowa11.pdf · Some Developments for the R Engine Luke Tierney Department of Statistics

Summary

This talk has outlined several areas I believe are important and towhich I hope I can make some contributions during the near future.

The R development model is quite distributed: other R developers areworking on a wide range of other areas.

Fortunately conflicts are rare and the different efforts, so far at least,have merged together quite very successfully.

Luke Tierney (U. of Iowa) Developments for the R engine November 10, 2011 32 / 32