liftr: an r package for persistent reproducible research › talks › jsm2018-liftr-nanxiao.pdf ·...

13
liftr: an R Package for Persistent Reproducible Research JSM 2018 Nan Xiao @road2stat

Upload: others

Post on 05-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: liftr: an R Package for Persistent Reproducible Research › talks › jsm2018-liftr-nanxiao.pdf · Reproducible research should be a conAnuous process, instead of simply archiving

liftr: an R Package for Persistent Reproducible Research

JSM 2018

Nan Xiao@road2stat

Page 2: liftr: an R Package for Persistent Reproducible Research › talks › jsm2018-liftr-nanxiao.pdf · Reproducible research should be a conAnuous process, instead of simply archiving

The Reproducibility Crisis- Always a concern in both academia & industry. - R Markdown + knitr pre<y much saved the day.

Page 3: liftr: an R Package for Persistent Reproducible Research › talks › jsm2018-liftr-nanxiao.pdf · Reproducible research should be a conAnuous process, instead of simply archiving

The New Challenge

Even higher reproducibility for staAsAcal compuAng: regardless of !me or environment.

Page 4: liftr: an R Package for Persistent Reproducible Research › talks › jsm2018-liftr-nanxiao.pdf · Reproducible research should be a conAnuous process, instead of simply archiving

Docker to the Rescue

- Docker allows applicaAons and their dependencies to be packaged into discrete runAme environments, called containers.

- Apps packaged in this way can run from diverse infrastructures.

Page 5: liftr: an R Package for Persistent Reproducible Research › talks › jsm2018-liftr-nanxiao.pdf · Reproducible research should be a conAnuous process, instead of simply archiving

knitr

liftr+ =+

Our Solution: liftr

Persistent, OS-level reproducibility for R Markdown documents.

Page 6: liftr: an R Package for Persistent Reproducible Research › talks › jsm2018-liftr-nanxiao.pdf · Reproducible research should be a conAnuous process, instead of simply archiving

---liftr: sysdeps: - gfortran cran: - glmnet - xgboost---

YAML

R Markdown Documents with li5r op7ons in metadata

DockerfileRendered HTML/PDF/DOCX Reports

lift("foo.Rmd") render_docker("foo.Rmd")

Build Docker image, run the container, and render the RMD document.

Parse metadata in the document, and generate Dockerfile.

Containerized Report

Introduce addi7onal R Markdown metadata for containerizing reports.

+ PDF +

Containerize R Markdown Documents as Easy as 1-2-3

Page 7: liftr: an R Package for Persistent Reproducible Research › talks › jsm2018-liftr-nanxiao.pdf · Reproducible research should be a conAnuous process, instead of simply archiving

IDE integra1on: RStudio addins (emojified): 📦 🎉 ✂ 🗑

Page 8: liftr: an R Package for Persistent Reproducible Research › talks › jsm2018-liftr-nanxiao.pdf · Reproducible research should be a conAnuous process, instead of simply archiving

Philosophies

- Con$nuous reproducibility. Reproducible research should be a conAnuous process, instead of simply archiving code/data.

- Document first. R Markdown documents should be the center. Everything should be driven by documents, not packages.

- Minimal footprint. Connect R Markdown and Docker wisely, achieve more flexibility by doing less.

Page 9: liftr: an R Package for Persistent Reproducible Research › talks › jsm2018-liftr-nanxiao.pdf · Reproducible research should be a conAnuous process, instead of simply archiving

Applications

- Individuals: off-the-shelf soluAon for achieving persistent, environment-irrelevant reproducibility for data analysis.

- Ins1tu1ons: key backend component for automated, large-scale report compilaAon/orchestraAon services.

Page 10: liftr: an R Package for Persistent Reproducible Research › talks › jsm2018-liftr-nanxiao.pdf · Reproducible research should be a conAnuous process, instead of simply archiving

dockflow.org

Easily containerized ~20 complex R Markdown workflows from Bioconductor.

Page 11: liftr: an R Package for Persistent Reproducible Research › talks › jsm2018-liftr-nanxiao.pdf · Reproducible research should be a conAnuous process, instead of simply archiving

Feature Roadmap

- AutomaAc inference of document dependencies (packrat)

- New renderers for bookdown, xaringan, and blogdown

- Improve CLI message interface (cli + crayon)

- Be<er Docker integra1on (reAculate + Docker API)

Page 12: liftr: an R Package for Persistent Reproducible Research › talks › jsm2018-liftr-nanxiao.pdf · Reproducible research should be a conAnuous process, instead of simply archiving

Acknowledgements

A special thanks to:

- Prof. Qing-Song Xu (Central South University)

- Prof. Ma<hew Stephens (University of Chicago)

- Dr. Tengfei Yin (Seven Bridges Genomics)

- Dr. Miaozhu Li (Duke University)

for offering me all the freedom, advice, and encouragement to develop be<er so]ware for the staAsAcal and R community.

Page 13: liftr: an R Package for Persistent Reproducible Research › talks › jsm2018-liftr-nanxiao.pdf · Reproducible research should be a conAnuous process, instead of simply archiving

Thank you! Questions?