persistent reproducible reporting nan xiao · the docker image, run the container, and render the r...

15
Persistent Reproducible Reporting Nan Xiao Genomic Data Scientist, Seven Bridges

Upload: others

Post on 08-Jul-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Persistent Reproducible Reporting Nan Xiao · the Docker image, run the container, and render the R Markdown document. By running lift() on the RMD file, liftr parses the metadata

Persistent Reproducible Reporting

Nan Xiao Genomic Data Scientist, Seven Bridges

Page 2: Persistent Reproducible Reporting Nan Xiao · the Docker image, run the container, and render the R Markdown document. By running lift() on the RMD file, liftr parses the metadata

Reproducible Research

Page 3: Persistent Reproducible Reporting Nan Xiao · the Docker image, run the container, and render the R Markdown document. By running lift() on the RMD file, liftr parses the metadata

knitr

R Markdown + knitr to the rescue

+

Page 4: Persistent Reproducible Reporting Nan Xiao · the Docker image, run the container, and render the R Markdown document. By running lift() on the RMD file, liftr parses the metadata

Reproducibility… has always been a concern in both academia & industry.

Page 5: Persistent Reproducible Reporting Nan Xiao · the Docker image, run the container, and render the R Markdown document. By running lift() on the RMD file, liftr parses the metadata

Hundreds of automated analysis workflows for petabyte-scale data from The Cancer Genome Atlas.

Cancer Genomics Cloudwww.cancergenomicscloud.org

Page 6: Persistent Reproducible Reporting Nan Xiao · the Docker image, run the container, and render the R Markdown document. By running lift() on the RMD file, liftr parses the metadata

Product & Tech Innovations in CGC

Rabix

Page 7: Persistent Reproducible Reporting Nan Xiao · the Docker image, run the container, and render the R Markdown document. By running lift() on the RMD file, liftr parses the metadata

liftrOS-level reproducibility & persistency for reports.

knitrliftr+ =+

Page 8: Persistent Reproducible Reporting Nan Xiao · the Docker image, run the container, and render the R Markdown document. By running lift() on the RMD file, liftr parses the metadata

---liftr: sysdeps: - gfortran cran: - glmnet - xgboost---

YAML

R Markdown Documentswith liftr options in metadata

DockerfileRendered HTML/PDF/Docx Reports

+ .docker.yml

lift("foo.Rmd") render_docker("foo.Rmd")

By running render_docker(), liftr will build the Docker image, run the container, and render the R Markdown document.

By running lift() on the RMD file, liftr parses the metadata fields appeared in the R Markdown document; then generates the Dockerfile.

Containerized Report

liftr extends the R Markdown metadata format, introducing additional options for containerizing and rendering reports.

+ PDF +

Dockerize documents as easy as 1-2-3

Page 9: Persistent Reproducible Reporting Nan Xiao · the Docker image, run the container, and render the R Markdown document. By running lift() on the RMD file, liftr parses the metadata

library("liftr")input = "demo.Rmd"

lift(input) # Generate Dockerfilerender_docker(input) # Render report with Docker

purge_image(input) # Clean up Docker imagepush_image(input) # Push image to registry (devel)

Dockerize documents as easy as 1-2-3

Page 10: Persistent Reproducible Reporting Nan Xiao · the Docker image, run the container, and render the R Markdown document. By running lift() on the RMD file, liftr parses the metadata

• RNA-Seq: biotechnology for measuring the expression of genes. It can help identify key genes in cancer.

• TBs of RNA-Seq data are generated. Computational tools and workflows are developed to analyze such data.

• How to ensure such reports are reproducible through time, when datasets, analysis tools are both evolving?

• Code available from: bit.ly/liftrdemo

Demo: RNA-Seq Data AnalysisExample workflow from Bioconductor

Page 11: Persistent Reproducible Reporting Nan Xiao · the Docker image, run the container, and render the R Markdown document. By running lift() on the RMD file, liftr parses the metadata

Add liftr metadata to the R Markdown document:

base image, system dependencies, package dependencies, etc.

Step 1

Page 12: Persistent Reproducible Reporting Nan Xiao · the Docker image, run the container, and render the R Markdown document. By running lift() on the RMD file, liftr parses the metadata

Use liftr::lift to generate Dockerfile

Step 2

Page 13: Persistent Reproducible Reporting Nan Xiao · the Docker image, run the container, and render the R Markdown document. By running lift() on the RMD file, liftr parses the metadata

liftr::render_docker will build the image, run the container, and render into PDF/HTML/Docx.

Re-compilation: cached image layers are used to improve speed.

Remove the used image, or push to registry.

Step 3

Page 14: Persistent Reproducible Reporting Nan Xiao · the Docker image, run the container, and render the R Markdown document. By running lift() on the RMD file, liftr parses the metadata

• Cloud-based rendering and containerization services for dynamic documents

• Democratize reproducible report creation/sharing

Future works

Page 15: Persistent Reproducible Reporting Nan Xiao · the Docker image, run the container, and render the R Markdown document. By running lift() on the RMD file, liftr parses the metadata

Thank You!liftr.me

@road2stat

#dockercon #liftr