
Sweave: Reproducible Research using R and LaTeX

Sandra D. Griffith
Department of Biostatistics and Epidemiology
University of Pennsylvania
[email protected]

Biostatistics Computing Workshop Series
March 15, 2012

S. Griffith ([email protected]) Sweave 15 March 2012 1 / 20


Non-reproducible Research

• Characteristics
  - Prepare or manipulate data in a spreadsheet
  - Cut and paste output to create tables
  - Multiple versions of data and analysis scripts
  - Create many versions of graphics, selecting only one for final presentation of results

• Problems
  - Data, code, and results not linked
  - Any changes in analysis or data require manual regeneration of results
  - Workflow or organization scheme may change over time
  - Can be difficult to replicate in the future
  - Less forensic evidence if results are questioned


Response to Duke University Scandal

“We now require most of our reports to be written using Sweave, a literate programming combination of LaTeX source and R code (SASweave and odfWeave are also available) so that we can rerun the reports as needed and get the same results.”


Sweave: Conceptual Overview

• Link data, code, and results with a single .Rnw file
  - Similar to .tex file, but includes interspersed “chunks” of R code
  - Uses noweb syntax for literate programming

• Weave .Rnw file to produce .tex file which includes output from R code

• Compile TeX file to PDF or PS files as usual

• Tangle .Rnw file to extract R code into separate file

• In addition to including them in the output, creates individual files for each figure

• Can refer to within-chunk R expressions in regular document text using \Sexpr{}
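The weave and tangle steps can be tried from a plain R session. The following is a minimal sketch, not a recommended project layout: the file name demo.Rnw, the chunk label mean-chunk, and the temporary working directory are all made up for illustration. It writes a tiny .Rnw file, weaves it to demo.tex, and tangles it to demo.R.

```r
# Sketch only: file names and the chunk label are hypothetical.
rnw_lines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<mean-chunk, echo=TRUE>>=",
  "x <- c(2, 4, 6)",
  "mean(x)",
  "@",
  "The mean of x is \\Sexpr{mean(x)}.",
  "\\end{document}"
)

# Work in a throwaway directory so no existing files are overwritten.
workdir <- tempfile("sweave-demo-")
dir.create(workdir)
old_wd <- setwd(workdir)
writeLines(rnw_lines, "demo.Rnw")

utils::Sweave("demo.Rnw")    # weave: writes demo.tex with code and output
utils::Stangle("demo.Rnw")   # tangle: writes demo.R with the R code only

tex_lines <- readLines("demo.tex")
r_lines <- readLines("demo.R")
setwd(old_wd)
```

After weaving, the \Sexpr{} expression in the text line has been replaced by its value, and demo.R holds the chunk code without any of the LaTeX.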


Getting Started with Sweave

• Assume R and LaTeX already installed

• Sweave.sty is already included with base R installation
  - Preferred method: include R folder containing Sweave.sty in your TeX path
    * Will automatically update style file when you update R
  - Copy Sweave.sty to a centralized location with other style files, also in your TeX path
    * Requires manual updates, but can be located in a central location shared among computers (e.g. Dropbox)
  - Hard path: include \usepackage{...\Sweave} in preamble
  - Copy Sweave.sty into same folder as each .Rnw file


Anatomy of a Code Chunk

<< label (optional), options >>=

insert R code here

@

Commonly used options (see manual for full list)

• echo = F
    Suppress R input from appearing in document (default = T)

• eval = F
    R code not evaluated (default = T)

• results = hide
    Suppress R output from appearing in document (default = verbatim)

• results = tex
    R output will be read as TeX (default = verbatim)

• fig = T
    Code chunk includes a figure (default = F)


Global Options

Default options can be set in preamble and updated throughout document

• Set R chunk options
    \SweaveOpts{eval=T, echo=F}

• Preserve comments and spacing of echoed R code
    \SweaveOpts{keep.source=TRUE}

• Figure options for height, width, and file type


Example

<<echo=T>>=

x <- exp(2.3)

x

@

> x <- exp(2.3)

> x

[1] 9.974182

<<echo=F>>=

x <- exp(2.3)

x

@

[1] 9.974182

<<echo=T, results=hide>>=

x <- exp(2.3)

x

@

> x <- exp(2.3)

> x


Compiling an Sweave Document

• Manually (Windows or Mac)
  1. Run Sweave("foo.Rnw") in R console
  2. Open foo.tex in a TeX editor
  3. Compile PDF using TeX editor
  4. Run Stangle("foo.Rnw") to extract R code if desired

• Manually (Linux/Unix)
  1. Run R CMD Sweave foo.Rnw
  2. Run pdflatex foo or latex foo

• Integrated Development Environment (IDE)
  - RStudio, Emacs (ESS), Eclipse (StatET), etc.
  - If supported, usually one click/command for all steps (Sweave, compile TeX, view PDF)
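The manual steps can also be wrapped in a small R function. This is a sketch under assumptions: build_report() is a hypothetical helper, not part of Sweave, and the compile step relies on tools::texi2pdf(), which needs a working LaTeX installation on the machine.

```r
# Hypothetical helper (not part of Sweave): weave, then optionally compile.
build_report <- function(rnw, compile = TRUE) {
  # Sweave() writes <name>.tex into the current working directory.
  tex <- sub("\\.[RrSs]nw$", ".tex", basename(rnw))
  utils::Sweave(rnw)
  if (compile) {
    # Runs LaTeX as often as needed; requires LaTeX to be installed.
    tools::texi2pdf(tex, clean = TRUE)
  }
  invisible(tex)
}
```

With foo.Rnw in the working directory, build_report("foo.Rnw") would produce foo.tex and then foo.pdf; build_report("foo.Rnw", compile = FALSE) stops after weaving.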


RStudio


The xtable Package: Basic Table Code

R package to convert many R objects to LaTeX or HTML tables

<<label=tab:GenderRace, results=tex>>=

library(xtable)

data(tli)

xtable(table(tli$ethnicty, tli$sex),

caption="Distribution of gender and ethnicity")

@

<<label=tab:LM1, results=tex>>=

lm1 <- lm(tlimth ~ sex + ethnicty, data=tli)

xtable(lm1, caption="Linear Model Results")

@


The xtable package: Basic Table Output

            F    M
BLACK      11   12
HISPANIC    8   12
OTHER       2    0
WHITE      30   25

Table: Distribution of gender and ethnicity

                  Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)        71.0226      3.2894    21.59    0.0000
sexM                3.3734      2.8594     1.18    0.2410
ethnictyHISPANIC   -3.7466      4.3044    -0.87    0.3863
ethnictyOTHER      18.4774     10.4716     1.76    0.0809
ethnictyWHITE       7.4622      3.4964     2.13    0.0354

Table: Linear Model Results


The xtable package: Customized Tables

> mat <- round(matrix(c(0.9, 0.89, 200, 0.045, 2.0),

+ c(1, 5)), 4)

> rownames(mat) <- "$y_{t-1}$"

> colnames(mat) <- c("$R^2$", "$\\bar{R}^2$",

+ "F-stat", "S.E.E", "DW")

> mat <- xtable(mat)

> print(mat, sanitize.text.function = function(x){x})

            $R^2$  $\bar{R}^2$  F-stat  S.E.E    DW
$y_{t-1}$    0.90         0.89  200.00   0.04  2.00

Almost all functionality available for LaTeX tables can be included directly in R code using xtable


Aside: Using xtable for MS Word Tables

Non-statistical collaborators often prefer tabular results in MS Word

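library(xtable)
data(tli)
# file and type are arguments of print(), not of xtable()
print(xtable(table(tli$ethnicty, tli$sex)),
      type="html",
      file="TabGenderRace.htm")

1. Save results in an HTML file by calling print() on the xtable object in R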

2. Open “TabGenderRace.htm” in a browser

3. Copy and paste into Word document as a fully-formatted table


Basic Figure Example

<<fig=T, echo=F, width=5, height=3.5>>=

plot(1:10, rnorm(10))

@

[Figure: scatterplot of rnorm(10) (y-axis) against 1:10 (x-axis)]

NB: Embed figure chunk within a LaTeX figure environment for more precise control
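One caveat with the figure chunk above: rnorm() draws new random numbers on every weave, so the plot changes each time the document is rebuilt. A minimal sketch of one way to keep the figure stable is to fix the random seed first; the seed value here is arbitrary and chosen only for illustration.

```r
set.seed(20120315)        # arbitrary seed, an assumption for this sketch
y_first <- rnorm(10)

set.seed(20120315)        # resetting the same seed reproduces the draws
y_second <- rnorm(10)

plot(1:10, y_first)       # the same picture on every run
```

In an .Rnw file the set.seed() call would typically sit at the top of the figure chunk or in an early setup chunk.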


Large or Computationally Intensive Projects

• Use input statements or make files

• save() and load() intermediate results

• Conditional evaluation: if (file exists) {load file} else {run; save file}

• Change R chunk evaluation options as necessary

• R package: cacheSweave to cache intermediate results
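The conditional-evaluation pattern in the list above can be written as a short R function built on save() and load(). This is a sketch, not the cacheSweave mechanism: load_or_run() and its arguments are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: reuse a saved result if present, else compute and save.
load_or_run <- function(path, compute) {
  if (file.exists(path)) {
    env <- new.env()
    load(path, envir = env)     # restores the object saved as `result`
    env$result
  } else {
    result <- compute()         # the slow step, run only when needed
    save(result, file = path)
    result
  }
}

# Usage: the second call reads the file instead of recomputing.
calls <- 0
slow_mean <- function() {
  calls <<- calls + 1           # count how often the slow step really runs
  mean(1:10)
}
cache_path <- tempfile(fileext = ".RData")
first <- load_or_run(cache_path, slow_mean)
second <- load_or_run(cache_path, slow_mean)
```

Deleting the cache file forces a fresh run, which is the step to remember whenever the data or the analysis code changes.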


Including R code as an Appendix

• Useful for homework, solution sets, etc.

• Include \usepackage{listings} in the preamble

• Include the following R chunk and TeX code in foo.Rnw where you would like to place appendix

<<echo=FALSE, results=hide, split=TRUE>>=

Stangle(file="foo.Rnw",output="foo.R",

annotate=FALSE)

@

\pagebreak

\section{R Code}

\texttt{\lstinputlisting[emptylines=0]{foo.R}}


Miscellaneous Sweave Tricks

• Load all libraries in one chunk with results = hide option to suppress unwanted output (e.g. package dependencies)

• Beamer presentations
  - Include [fragile] option for every frame with R code to handle verbatim output
  - For frames with TeX and verbatim output, must include [containsverbatim] option instead

• R graphics package ggplot2
  - Must use print() wrapper for ggplot objects

• R session information

> toLatex(sessionInfo(), locale=F)

  - R version 2.14.1 (2011-12-22), x86_64-pc-mingw32
  - Base packages: base, datasets, graphics, grDevices, methods, stats, utils
  - Other packages: xtable 1.7-0
  - Loaded via a namespace (and not attached): tools 2.14.1


Alternatives for Reproducible Research

• R for other document formats
  - HTML: R2HTML
  - Open Office: odfWeave
  - MS Word: SWord
  - MS PowerPoint: R2PPT

• Other statistical packages
  - StatWeave for SAS, Stata, or MATLAB and LaTeX or Open Office
  - Various other software-specific report generators


Resources

• Sweave user manual (Friedrich Leisch): http://www.stat.uni-muenchen.de/~leisch/Sweave/Sweave-manual.pdf

• Stack Overflow questions tagged Sweave: http://stackoverflow.com/questions/tagged/sweave

• Keith Baggerly’s introduction to Sweave: http://bioinformatics.mdanderson.org/SweaveTalk/sweaveTalkb.pdf

• QuickR summary of alternatives to Sweave: http://www.statmethods.net/interface/output.html

• Citing R with Sweave: http://biostat.mc.vanderbilt.edu/wiki/pub/Main/SweaveLatex/RCitation.pdf

• xtable gallery with examples: http://cran.r-project.org/web/packages/xtable/vignettes/xtableGallery.pdf
