An Introduction to the R Environment · staff.pubhealth.ku.dk/~pd/slides/binf-jun05/R-overview.pdf

About the Talk Basics of R Modeling The Package System Some Practical Issues Graphics Programming

An Introduction to the R Environment

Peter Dalgaard

Department of Biostatistics, University of Copenhagen

Center for Bioinformatics, Univ. Copenhagen, June 2005

Outline

About the Talk

Basics of R

Modeling

The Package System

Some Practical Issues

Graphics

Programming

Practicalities

• Short tutorial (approx. 2hr)

• High coverage, not great depth

• Little time for interaction, but there should be room for clarifying questions.

• Short break (5–10 minutes) in the middle.

Plan

• Elementary things about R, simple interactive demo

• Modeling tools

• R packages

• Dealing with the R workspace

• Graphics in R

• Ad-hoc programming

• Not covering: Installation, Data summaries, GUIs, Computing on the language, C/Fortran interface, Advanced analysis, ...

The R environment

• Built around the programming language R, an Open Source dialect of the S language

• R is Free Software, and runs on a variety of platforms (I’ll be using Linux here, mainly to avoid technical surprises).

• Command-line execution based on function calls

• Extensible with user functions

• Workspace containing data and functions

• Various graphics devices (interactive and non-interactive)
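
The points above can be illustrated with a short session. This is only a sketch: the data in `heights` and the function name `sem` are made up for illustration, while `mean()`, `ls()`, `pdf()`, `plot()` and `dev.off()` are standard base R functions.

```r
# Everything typed at the prompt is a function call, including arithmetic.
heights <- c(172, 181, 165, 190)   # hypothetical data
mean(heights)
`+`(1, 2)                          # same as 1 + 2

# Extending R with a user function (illustrative name).
sem <- function(x) sd(x) / sqrt(length(x))
sem(heights)

# The workspace holds both data and functions.
ls()

# A non-interactive graphics device: write a plot to a PDF file.
out <- file.path(tempdir(), "heights.pdf")
pdf(out)
plot(heights)
dev.off()
```

On an interactive device the same `plot(heights)` call would simply open a window instead of writing a file.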

The basic vector types

• Numeric (integer/double)

• Character (strings)

• Logical

• Factor (really integer codes + a levels attribute)

• Lists (generic vectors)
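
A small sketch of the vector types listed above. All object names are illustrative; the functions used (`typeof()`, `factor()`, `levels()`, `unclass()`, `list()`, `str()`) are base R.

```r
x_int <- 1:3                      # integer
x_dbl <- c(1.5, 2, 3.25)          # double
x_chr <- c("a", "b", "c")         # character
x_lgl <- c(TRUE, FALSE, NA)       # logical (NA marks a missing value)

typeof(x_int); typeof(x_dbl)      # "integer", "double"
is.numeric(x_int)                 # TRUE: both count as numeric

# A factor is stored as integer codes plus a levels attribute.
f <- factor(c("low", "high", "low"), levels = c("low", "high"))
typeof(f)                         # "integer"
levels(f)
unclass(f)                        # the underlying codes 1 2 1

# A list is a generic vector: elements may differ in type and length.
l <- list(id = 7L, name = "abc", scores = x_dbl)
str(l)
```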

Basic operations

• Standard arithmetic (x + y, etc.)

• Recycling: If operating on two vectors of different length, the shorter one is replicated (with a warning if the longer length is not a whole multiple of the shorter)

• c — concatenate

• seq — sequences

• rep — replication

• sum, mean, range, ...
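
The operations above, sketched on a small made-up vector. The values are chosen only so that the recycling is easy to follow by hand.

```r
x <- c(1, 2, 3, 4, 5, 6)          # c: concatenate
x + 10                            # the length-1 vector 10 is recycled
x * c(1, 2)                       # recycled three times: 1 4 3 8 5 12
# x + c(1, 2, 3, 4)               # would warn: 6 is not a multiple of 4

seq(0, 1, by = 0.25)              # 0.00 0.25 0.50 0.75 1.00
seq_len(4)                        # 1 2 3 4
rep(c("a", "b"), times = 2)       # "a" "b" "a" "b"
rep(c("a", "b"), each = 2)        # "a" "a" "b" "b"

sum(x); mean(x); range(x)         # 21, 3.5, and 1 6
```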

Demo 1

x <- round(rnorm(10,mean=20,sd=5)) # simulate data
x
mean(x)
m <- mean(x)
x - m # notice recycling
(x - m)^2
sum((x - m)^2)
sqrt(sum((x - m)^2)/9)
sd(x)

Smart indexing

• a[5] single element

• a[5:7] several elements

• a[-6] all except the 6th

• a[b>200] logical index

• a["name"] by name

• a$b list elements
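
The indexing forms above can be tried on a named vector. The vector `a`, its names, and the list `lst` are invented for this sketch; here `b` is simply a copy of `a`, although in practice it would often be a different vector of the same length.

```r
a <- c(alpha = 150, beta = 220, gamma = 90, delta = 310,
       epsilon = 205, zeta = 180, eta = 260)
b <- a                            # the logical index is built from b

a[5]                              # single element (epsilon)
a[5:7]                            # several elements
a[-6]                             # everything except the 6th
a[b > 200]                        # logical index: beta, delta, epsilon, eta
a["gamma"]                        # by name

lst <- list(b = 1:3, label = "demo")
lst$b                             # list element by name
lst[["label"]]                    # equivalent double-bracket form
```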

Matrices/tables/arrays

• Used in matrix calculus and as input to, e.g., chisq.test(). Results of tabulation.

• Vectors with dimensions

• Dimnames

• Matrices: Generate with matrix

• Indexing methods include [i,j], [i,], [,j]
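
A sketch of these points on a made-up 2 x 2 table of counts. The counts and the dimnames are hypothetical; `mtcars` is a data set shipped with R and is used only to show `table()`.

```r
# A matrix is a vector with a dim attribute; matrix() fills column by column.
m <- matrix(c(12, 5, 7, 9), nrow = 2,
            dimnames = list(exposure = c("yes", "no"),
                            outcome  = c("case", "control")))
dim(m)
m[1, 2]                           # single cell
m[1, ]                            # first row (dimension dropped)
m[, "case"]                       # column selected by dimname

m %*% t(m)                        # matrix product

# table() returns a cross-tabulation that is indexed the same way.
tab <- table(mtcars$cyl, mtcars$am)
tab[, 1]

res <- chisq.test(m)              # test of independence on the 2 x 2 matrix
res
```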

Data frames

• Like data sets in other statistical packages

• Technically: Lists of vectors/factors of same length

• Row names (must be unique)

• Indexed like matrices (Beware, though: Data frames are not matrices)

• Generate from read operation or with data.frame

• Many sample data frames are available using data()
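
A minimal sketch of building and indexing a data frame. The column names and values are invented; the round trip through a temporary CSV file only illustrates that read operations also return data frames.

```r
d <- data.frame(id     = c("s1", "s2", "s3", "s4"),
                height = c(172, 181, 165, 190),
                group  = factor(c("a", "b", "a", "b")))
rownames(d) <- d$id               # row names must be unique

d[2, ]                            # matrix-style: second row
d[d$height > 170, "height"]       # rows by condition, one column
d$group                           # list-style: a single column
length(d)                         # 3: the list length is the number of columns

# Unlike a matrix, columns may have different types.
sapply(d, class)

# Reading from a file returns a data frame as well.
f <- tempfile(fileext = ".csv")
write.csv(d, f, row.names = FALSE)
d2 <- read.csv(f)
str(d2)
```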

Demo 2

data(airquality)
airquality$Month
airquality[airquality$Month==5,]
oz <- airquality[airquality$Month==5,]$Ozone
mean(oz)
mean(oz, na.rm=TRUE)
attach(airquality)
mean(Ozone, na.rm=TRUE)
tapply(Ozone, Month, mean, na.rm=TRUE)
detach()

Some standard procedures

• Continuous data by group: t.test, wilcox.test, oneway.test, kruskal.test

• Categorical data: prop.test, chisq.test, fisher.test

• Correlations: cor.test, with options for nonparametrics
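
A sketch of how these procedures are called. The data sets `sleep`, `chickwts` and `airquality` ship with R; the counts passed to `prop.test()` and the `counts` matrix are made up, and `exact = FALSE` is set here only to avoid warnings about ties.

```r
# Continuous data, two groups (built-in sleep data).
tt <- t.test(extra ~ group, data = sleep)
wt <- wilcox.test(extra ~ group, data = sleep, exact = FALSE)

# Continuous data, more than two groups (built-in chickwts data).
ow <- oneway.test(weight ~ feed, data = chickwts)
kw <- kruskal.test(weight ~ feed, data = chickwts)

# Categorical data: made-up counts, 15/50 versus 25/50 successes.
pt <- prop.test(x = c(15, 25), n = c(50, 50))
counts <- matrix(c(15, 35, 25, 25), nrow = 2)
ct <- chisq.test(counts)
ft <- fisher.test(counts)

# Correlation: Pearson by default, rank-based on request.
cp <- cor.test(~ Ozone + Temp, data = airquality)
cs <- cor.test(~ Ozone + Temp, data = airquality,
               method = "spearman", exact = FALSE)

tt            # printing a result shows the statistic, p-value, and estimates
cs$estimate   # components can also be extracted by name
```

Each call returns an object of class "htest", so results can be stored and inspected rather than only printed.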

Demo 3

library(ISwR)
data(intake)
attach(intake)
t.test(pre, post, paired=TRUE)
detach()
data(caesarean) # loads a table
caesar.shoe
chisq.test(caesar.shoe)
fisher.test(caesar.shoe)

Modeling Tools: Overview

• Model formulas

• Model objects and summaries

• Comparing models

• Evaluating model fit

• Generalized linear models
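
As a rough preview of how these pieces fit together, the following sketch uses the built-in `cars` and `mtcars` data. The choice of models is arbitrary and only meant to show the calling pattern, not a recommended analysis.

```r
fit0 <- lm(dist ~ 1, data = cars)        # intercept-only model
fit1 <- lm(dist ~ speed, data = cars)    # add a linear effect of speed

summary(fit1)                            # coefficients, standard errors, R^2
cmp <- anova(fit0, fit1)                 # F test comparing nested models
cmp

# Evaluating fit: residuals versus fitted values.
plot(fitted(fit1), resid(fit1))

# A generalized linear model: logistic regression on built-in data.
gfit <- glm(am ~ wt, data = mtcars, family = binomial)
summary(gfit)
```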

Model formulas

• Linear model, y = Xβ + ε

• In practice something like

y = β0 + β1 × height + β2 × 1(type=2) + β3 × 1(type=3) + ε

• Wilkinson-Rogers formulas:

y = height + type

(Interpretation depends on whether variables are categorical or continuous)
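
As a worked illustration (not on the original slide): for three observations whose type is 1, 2 and 3 respectively, and assuming type 1 is taken as the reference level, the design matrix X and coefficient vector β of the model above would look roughly like this. The subscripted heights are placeholders, not data.

```latex
\[
X =
\begin{pmatrix}
1 & \mathrm{height}_1 & 0 & 0 \\
1 & \mathrm{height}_2 & 1 & 0 \\
1 & \mathrm{height}_3 & 0 & 1
\end{pmatrix},
\qquad
\beta = (\beta_0, \beta_1, \beta_2, \beta_3)^{\top},
\qquad
y = X\beta + \varepsilon
\]
```

The third and fourth columns are the indicators 1(type=2) and 1(type=3); a continuous variable such as height contributes a single column.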

Model formulas in R

• R representation y ~ height + type where type is a factor

• Interactions a:b , a*b = a + b + a:b

• Algebra (a:(b + c) = a:b + a:c etc.)

• Notice special interpretation of operators

• Special items: offset , -1 (no intercept)
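
The formula algebra above can be checked without any data, since terms() works on the formula symbolically. The sketch below is only an illustration: expand() is a hypothetical helper defined here, and a, b, c, n and y are placeholder variable names.

```r
# Hypothetical helper: list the terms a formula expands to.
expand <- function(f) attr(terms(f), "term.labels")

expand(y ~ a * b)                # "a" "b" "a:b"
expand(y ~ a:(b + c))            # "a:b" "a:c"

# Operators have a special meaning inside a formula;
# I() protects ordinary arithmetic.
expand(y ~ I(a + b))             # "I(a + b)"

# -1 removes the intercept; an offset is not a term with a coefficient.
attr(terms(y ~ a - 1), "intercept")    # 0
expand(y ~ a + offset(log(n)))         # "a"
```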

Fitting linear models

data(airquality)
aq <- transform(airquality, Month=factor(Month))
fit.aq <- lm(log(Ozone) ~ Solar.R + Wind +
             Temp + Month, data=aq)

• lm generates a fitted model object

• Extract information from model object

• Fit other models based on model object
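
To make "fitted model object" a little more concrete, the following sketch refits the model from this slide and looks at what lm returns. It uses only standard extractor functions; which components are worth inspecting is a matter of taste.

```r
aq <- transform(airquality, Month = factor(Month))
fit.aq <- lm(log(Ozone) ~ Solar.R + Wind + Temp + Month, data = aq)

class(fit.aq)     # "lm"
names(fit.aq)     # components stored in the object
coef(fit.aq)      # estimated coefficients, one per column of the design matrix
formula(fit.aq)   # the model formula is kept with the fit

# Rows with missing values are dropped by default, so fewer rows are
# used than the data frame contains.
c(rows = nrow(aq), used = nobs(fit.aq))
```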

Inspecting model objects

• Extract information about the fit

• summary(fit.aq)

• fitted(fit.aq) , resid(fit.aq)

• anova(model1, model2)

• plot(fit.aq) – diagnostics

• predict(fit.aq, newdata)
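
A small sketch of the extractor functions listed above, refitting the airquality model so that the block runs on its own. The data frame new.days and its values are invented for illustration.

```r
aq <- transform(airquality, Month = factor(Month))
fit.aq <- lm(log(Ozone) ~ Solar.R + Wind + Temp + Month, data = aq)

# Hypothetical new observations; Month must be a factor with the
# same levels as in the fitted data.
new.days <- data.frame(
  Solar.R = c(150, 250),
  Wind    = c(12, 6),
  Temp    = c(70, 85),
  Month   = factor(c(6, 8), levels = levels(aq$Month))
)

# Predictions on the log scale with 95% prediction intervals,
# then back-transformed to the Ozone scale.
p <- predict(fit.aq, newdata = new.days, interval = "prediction")
exp(p)

# Fitted values plus residuals reproduce the observed response.
y.used <- model.response(model.frame(fit.aq))
all.equal(fitted(fit.aq) + resid(fit.aq), y.used)
```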

Model search

• anova(model) “Type I” sum of squares

• drop1 (“Type III”), add1

• step (AIC/BIC) criteria

• update
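
The model search tools above might be combined as in the sketch below. It reuses the airquality model from the earlier slides; removing incomplete rows first with na.omit is a choice made here so that every candidate model is fitted to the same observations.

```r
aq <- na.omit(transform(airquality, Month = factor(Month)))
fit.full <- lm(log(Ozone) ~ Solar.R + Wind + Temp + Month, data = aq)

anova(fit.full)                # sequential tests, in formula order
drop1(fit.full, test = "F")    # each term tested given all the others

# Start from a small model and see which single term helps most.
fit.base <- lm(log(Ozone) ~ Temp, data = aq)
add1(fit.base, scope = ~ Solar.R + Wind + Temp + Month, test = "F")

# Backward selection: the default penalty k = 2 is AIC,
# k = log(n) corresponds to BIC.
fit.aic <- step(fit.full, trace = 0)
fit.bic <- step(fit.full, k = log(nrow(aq)), trace = 0)
formula(fit.aic)
formula(fit.bic)
```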

Demo 4

data(airquality)
aq <- transform(airquality, Month=factor(Month))
fit.aq <- lm(log(Ozone) ~ Solar.R + Wind +
             Temp + Month, data=aq)
fit.aq2 <- update(fit.aq, ~ . - Month)
summary(fit.aq)
plot(fit.aq)
drop1(fit.aq, test="F")
anova(fit.aq, fit.aq2)

Generalized linear models

• Statistical distribution (exponential) family

• Link function transforming mean to linear scale

• Deviance

• Examples: Binomial, Poisson, Gaussian (σ known, in principle)

• Canonical link functions

• Fit using glm in R
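
In R, the distribution family and link function are bundled in a family object. The sketch below pokes at a few of them and checks that a Gaussian glm with identity link matches lm. The built-in cars data set is used purely as a convenient example and is not part of the talk.

```r
# A family object carries the link and its inverse.
fam <- binomial()      # canonical link for the binomial is the logit
fam$link               # "logit"
fam$linkfun(0.5)       # logit(0.5) = 0
fam$linkinv(0)         # inverse logit of 0 = 0.5
poisson()$link         # "log"
gaussian()$link        # "identity"

# With the Gaussian family and identity link, glm() reproduces lm().
fit.lm  <- lm(dist ~ speed, data = cars)
fit.glm <- glm(dist ~ speed, data = cars, family = gaussian())
all.equal(coef(fit.lm), coef(fit.glm))

# For this Gaussian fit the deviance is the residual sum of squares.
all.equal(deviance(fit.glm), sum(resid(fit.lm)^2))
```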

Demo 5

no.yes <- c("No","Yes")
smoking <- gl(2, 1, 8, no.yes)
obesity <- gl(2, 2, 8, no.yes)
snoring <- gl(2, 4, 8, no.yes)
n.tot <- c(60,17,8,2,187,85,51,23)
n.hyp <- c(5,2,1,0,35,13,15,8)
data.frame(smoking,obesity,snoring,n.tot,n.hyp)
hyp.tbl <- cbind(n.hyp,n.tot-n.hyp)
glm.hyp <- glm(hyp.tbl~smoking+obesity+snoring,
               family=binomial("logit"))
summary(glm.hyp)
# Wald 95% interval from an estimate and s.e. in the summary output
0.87194 + qnorm(c(.025,.975))*0.39757
library(MASS)
confint(glm.hyp)

Likelihood-based inference

• Wald approximations (β̂/s.e.(β̂), etc.) can be badly inaccurate in small samples

• Likelihood-based inference is preferable

• Use drop1(model, test="Chisq") (for binomial/Poisson)

• profile (in MASS for glm) investigates behaviour of likelihood around maximum

• plot(profile(model)) shows signed LR statistic sign(β − β̂) √Q when varying each parameter.

• confint gives likelihood-based confidence intervals
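
To see the difference between Wald and likelihood-based intervals side by side, one could rebuild the hypertension model from Demo 5 and compare confint.default (Wald) with confint (profile likelihood). This is a sketch: it assumes either a recent R, where the glm methods for confint are, as far as I know, part of stats, or an older R with MASS installed.

```r
no.yes  <- c("No", "Yes")
smoking <- gl(2, 1, 8, no.yes)
obesity <- gl(2, 2, 8, no.yes)
snoring <- gl(2, 4, 8, no.yes)
n.tot   <- c(60, 17, 8, 2, 187, 85, 51, 23)
n.hyp   <- c(5, 2, 1, 0, 35, 13, 15, 8)
glm.hyp <- glm(cbind(n.hyp, n.tot - n.hyp) ~ smoking + obesity + snoring,
               family = binomial("logit"))

wald <- confint.default(glm.hyp)   # estimate +/- 1.96 * s.e.
prof <- confint(glm.hyp)           # profile-likelihood intervals
round(cbind(wald, prof), 3)        # Wald columns first, then profile

# Likelihood-ratio tests for dropping each term
drop1(glm.hyp, test = "Chisq")
```

The Wald intervals are symmetric around the estimate by construction; the profile intervals need not be.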

R packages

• Collections of R functions, data, and compiled code

• Well-defined format that ensures easy installation and a basic standard of documentation, and enhances portability and reliability

• You can write your own packages! It is not entirely trivial, but tools are there to help you.
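
One of the tools that helps is package.skeleton() from utils, which writes out the directory layout for a new package. The sketch below is only a starting point: the function hello and the package name hellopkg are made up, and the skeleton is written to a temporary directory.

```r
# A hypothetical function to put in the package
hello <- function(name = "world") paste0("Hello, ", name, "!")

# Write the skeleton into a temporary directory
pkg.dir <- tempfile("pkgdemo")
dir.create(pkg.dir)
here <- environment()    # the environment where `hello` is defined
package.skeleton(name = "hellopkg", list = "hello",
                 environment = here, path = pkg.dir)

# The generated files include DESCRIPTION, R/ and man/
list.files(file.path(pkg.dir, "hellopkg"), recursive = TRUE)
```

After editing the generated documentation stubs, the package would typically be built and checked from the command line with R CMD build and R CMD check.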

Packages that come with R

• Standard R (1.9.0) loads with these packages available

  • methods: New (S4) class system
  • stats: Statistical procedures
  • graphics: Graphics
  • utils: Utilities (file handling, packages, ...)
  • base: All the basic stuff

• Further packages are available in core R, but not automatically loaded (tcltk, grid, splines, stats4, ...)

• 10 further packages and package bundles are maintained separately, but included with source and binary distributions (survival, nlme, MASS, ...).
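
Which packages are attached, and which are merely installed, can be checked from within R. The following sketch uses splines as the example of a core package that is not attached by default; the exact list printed by search() depends on the R version and startup settings.

```r
search()                        # what is currently attached

"splines" %in% .packages()      # typically FALSE in a fresh session
library(splines)                # attach a core package on demand
"splines" %in% .packages()      # TRUE

# The Priority field distinguishes core ("base") packages from the
# separately maintained ("recommended") ones.
ip <- installed.packages()
table(ip[, "Priority"], useNA = "ifany")
```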

CRAN

• The Comprehensive R Archive Network

• Collection of servers mirroring a central server in Vienna. Modeled on CTAN and CPAN (for TeX and Perl code)

• http://cran.us.r-project.org

• Maintains a curated collection of R packages as well as the source and binary distributions of R itself

• About 320 packages available

• Unix/Linux variants generally install packages from sources. Windows and MacOSX have binary package formats which are even easier to install

• See also: Bioconductor, http://www.bioconductor.org
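
Installing from CRAN inside R is done with install.packages(). The sketch below wraps it in a hypothetical helper, ensure_package(), that only downloads when the package is missing; the mirror URL is an assumption and any CRAN mirror could be substituted.

```r
# Hypothetical helper: install a package from CRAN only if it is missing,
# and report whether it can be loaded afterwards.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  requireNamespace(pkg, quietly = TRUE)
}

ensure_package("stats")     # already present, so nothing is downloaded
# ensure_package("ISwR")    # would download from the mirror if not installed
```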

Page 108: An Introduction to the R Environmentstaff.pubhealth.ku.dk/~pd/slides/binf-jun05/R-overview.pdf · An Introduction to the R Environment Peter Dalgaard Department of Biostatistics University

About the Talk Basics of R Modeling The Package System Some Practical Issues Graphics Programming

CRAN

• The Comprehensive R Archive Network• Collection of servers mirroring a central server in Vienna.

Modeled on CTAN and CPAN (for TEX and Perl code)• http://cran.us.r-project.org• Maintains a curated collection of R packages as well as the

source and binary distributions R itself• About 320 packages available• Unix/Linux variants generally install packages from

sources. Windows and MacOSX have binary packageformats which are even easier to install

• See also: Bioconductor,http://www.bioconductor.org

Page 109: An Introduction to the R Environmentstaff.pubhealth.ku.dk/~pd/slides/binf-jun05/R-overview.pdf · An Introduction to the R Environment Peter Dalgaard Department of Biostatistics University

About the Talk Basics of R Modeling The Package System Some Practical Issues Graphics Programming

CRAN

• The Comprehensive R Archive Network• Collection of servers mirroring a central server in Vienna.

Modeled on CTAN and CPAN (for TEX and Perl code)• http://cran.us.r-project.org• Maintains a curated collection of R packages as well as the

source and binary distributions R itself• About 320 packages available• Unix/Linux variants generally install packages from

sources. Windows and MacOSX have binary packageformats which are even easier to install

• See also: Bioconductor,http://www.bioconductor.org

Demo 6

# Cheat for offline demo:
# Pretend CRAN is local directory
options(CRAN="file:/home/pd/cran.r-project.org")
# Manipulate install path
.libPaths("~/Rlibrary")
.libPaths()
# Source install (gives harmless warning)
install.packages("mvtnorm")
library(mvtnorm)
library(help=mvtnorm)
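
The demo uses the option names of the R version current at the time of the talk. In current R releases the repository is normally set through the `repos` option instead. A minimal sketch of the same steps, assuming a throwaway library directory under `tempdir()` (a hypothetical location chosen here so nothing permanent is touched) and the CRAN cloud mirror:

```r
# Hypothetical personal library location; a temp dir keeps the sketch harmless
lib <- file.path(tempdir(), "Rlibrary")
dir.create(lib, showWarnings = FALSE)

# Prepend it to the library search path and inspect the result
.libPaths(c(lib, .libPaths()))
.libPaths()

# Point R at a CRAN mirror (assumption: the cloud mirror; any mirror works)
options(repos = c(CRAN = "https://cloud.r-project.org"))

# The actual install needs network access, so it is left commented out here
# install.packages("mvtnorm", lib = lib)
# library(mvtnorm)
# library(help = mvtnorm)
```

Packages installed with `lib = lib` land in the personal library, which `library()` then finds because that directory is on `.libPaths()`.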

Practical issues

• Dealing with the workspace

• Reading data

• Saving and restoring data and results

The workspace

• The global environment contains R objects created on the command line.

• There is an additional search path of loaded packages and attached data frames.

• The search path is maintained by library(), attach(), and detach()

• This determines the way R looks up objects by name

• Notice that objects in the global environment may mask objects in packages.
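
The masking point can be tried directly. The following is a small sketch, not part of the original demos: it deliberately defines a global object named `sd` (an arbitrary choice for illustration) that hides the function of the same name in the stats package.

```r
# A global object named like a function in the 'stats' package masks it
sd <- function(x) "masked!"
masked_result <- sd(c(1, 2, 3))        # finds the global definition first
pkg_result <- stats::sd(c(1, 2, 3))    # the package version, reached explicitly

# Removing the global object makes the package version visible again
rm(sd)
restored_result <- sd(c(1, 2, 3))

# ".GlobalEnv" comes first; packages and attached data frames follow
head(search())
```

Because lookup walks the search path from the front, the fix for an accidental mask is usually just `rm()` on the offending global object.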

Demo 7

search()
data(intake) # From ISwR
ls()
attach(intake)
search()
ls("intake") # show variables in data frame
post - pre
rm(intake) # remove data frame
detach() # remove from search path
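
For readers without the ISwR package installed, the same attach/detach cycle can be sketched with a small stand-in data frame. The values below are made up purely for illustration and are not the `intake` data; the `with()` call at the end is an alternative that avoids changing the search path at all.

```r
# Illustrative values only, NOT the ISwR 'intake' data
d <- data.frame(pre = c(5, 6, 7), post = c(4, 4, 6))

attach(d)                        # puts the data frame on the search path
on_path <- "d" %in% search()
diff_attached <- post - pre      # columns are now found by name
detach(d)                        # remove it from the search path again

# with() evaluates an expression inside the data frame without attaching
diff_with <- with(d, post - pre)
```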

Reading data

• Simple data vectors can be read using scan()

• Data frames can be read from most reasonably structured text file formats (space-separated columns, tab- and comma-delimited files) using read.table() or read.delim(). Note the colClasses argument.

• The foreign package can read files from Stata, SAS export libraries, SPSS, Epi-Info, Minitab, and some S-PLUS versions.

• For spreadsheets and databases, the quick and easy way is to export to a delimited file, but you can also work via ODBC connections and database access packages
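
A short sketch of the first two bullets, assuming a tiny tab-delimited file written to a temporary location (the file contents and column names are invented for the example). It shows why `colClasses` matters: without it, the `id` column would be read as a number and lose its leading zeros.

```r
# Write a small tab-delimited file to a temporary location (made-up content)
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tgroup\tvalue",
             "001\ta\t1.5",
             "002\tb\t2.5"), tmp)

# colClasses fixes the type of each column instead of letting R guess
dat <- read.delim(tmp, colClasses = c("character", "factor", "numeric"))
str(dat)

# scan() reads a plain vector; here from a string rather than a file
v <- scan(text = "1 2 3 4")
```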

Getting organized

Several possibilities:

• Save/restore entire workspace (objects only)

• Save selected objects and load them

• source() script files

• Batch processing (R CMD BATCH file.R)

• ESS – Emacs Speaks Statistics: Integrated environment for maintaining scripts, running R, saving results, etc.
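
The save, load, and source options can be sketched as follows. File names and object names are hypothetical, and everything is written under `tempdir()` so the sketch leaves no files behind in the working directory.

```r
x <- 1:10
x_summary <- summary(x)

# Save selected objects; save.image() would save the entire workspace instead
f <- file.path(tempdir(), "results.RData")
save(x, x_summary, file = f)
rm(x, x_summary)

# load() restores the objects and returns their names
loaded <- load(f)

# A script file can be rerun with source()
script <- file.path(tempdir(), "analysis.R")
writeLines("y <- sum(1:10)", script)
source(script)

# From a shell, the same script could be run non-interactively with
#   R CMD BATCH analysis.R
```

Keeping the analysis in a script and only saving expensive intermediate results tends to be easier to reproduce than relying on a saved workspace.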

R graphics

• The standard interface

• Customizing plots

• Graphics parameters

• Math on plots

• Grid and lattice

Standard R graphics

• Ink on paper model; once something is drawn it cannot be erased.

• Sensible default plots

• Arguments can override defaults

• Options to turn off various elements of plots (e.g. the axes)

• Functions to add elements.
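
A minimal sketch of the "turn off, then add" workflow described above. The data are arbitrary, and `pdf(file = NULL)` is used only so the code runs without opening a window or writing a file; interactively one would omit it.

```r
pdf(file = NULL)               # null device: nothing is written to disk

x <- 1:10
y <- x^2

# Suppress the default axes, then add customized ones afterwards
plot(x, y, axes = FALSE, xlab = "x", ylab = "x squared")
axis(1)                        # default x axis
axis(2, las = 1)               # y axis with horizontal tick labels
box()                          # frame around the plot region

# Ink on paper: new elements are drawn on top, nothing is erased
points(5, 25, pch = 16)

usr <- par("usr")              # plot-region limits, kept for inspection
dev.off()
```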

Basic x-y plots

• The plot function with one or two numeric arguments

• Scatterplot or line plot (or both) depending on type argument: "l" for lines, "p" for points (the default), "b" for both, plus quite a few more.

• Functions for adding to a plot: lines, points, segments, abline, text, mtext, axis

• Also: formula interface, plot(y~x)
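
The pieces listed on this slide fit together roughly as in the sketch below. The data are an arbitrary straight line chosen for illustration, and the null PDF device is again only there to keep the example self-contained.

```r
pdf(file = NULL)               # null device so the sketch runs anywhere

x <- seq(0, 10, by = 0.5)
y <- 2 + 0.5 * x

plot(x, y, type = "b")                   # points joined by lines
abline(h = 4, lty = "dashed")            # horizontal reference line
lines(x, y + 1, col = "red")             # a second curve added to the plot
text(2, 6, "shifted by 1")               # annotation inside the plot region
mtext("margin text", side = 3)           # annotation in the top margin

plot(y ~ x, type = "l")                  # formula interface, same data

dev.off()
```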

Graphical parameters

• Arguments to plot et al. (67 possibilities!)

• The par function can be used to set most of them persistently. Most info is found via help(par)

• Look them up! Here are some of the more commonly used:

  • Point and line characteristics: pch, col, lty, lwd

  • Multiframe layout: mfrow, mfcol

  • Axes: xlim, ylim, xaxt, yaxt, log
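
A common idiom, sketched here with arbitrary data, is to keep the value returned by `par()` so that the previous settings can be restored afterwards. Parameters passed directly to `plot()` apply to that call only.

```r
pdf(file = NULL)               # null device so the sketch runs anywhere

# par() returns the old values of the parameters it changes
old <- par(mfrow = c(1, 2), pch = 16)

plot(1:10, col = "blue")                          # uses pch = 16 from par()
plot(1:10, (1:10)^2, type = "l", log = "y",       # per-call settings
     lty = 2, lwd = 2, xlim = c(0, 12))

mfrow_during <- par("mfrow")
par(old)                                          # restore previous settings
mfrow_after <- par("mfrow")

dev.off()
```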

Specific plots

• Histograms — hist(x)

• Density plots — plot(density(x))

• Boxplots — boxplot(x)

• Barplots — barplot(x) (x can be a matrix)

• Pies — pie()

• Matrix plots (multiple y columns) — matplot()
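
The plot types above can be tried side by side. This sketch uses simulated normal data and a small made-up count matrix, both chosen here only for illustration.

```r
pdf(file = NULL)               # null device so the sketch runs anywhere
set.seed(1)
x <- rnorm(100)

par(mfrow = c(2, 3))
h <- hist(x)                   # returns the bin counts invisibly
plot(density(x))
boxplot(x)

# For a matrix, barplot() draws one group of bars per column
counts <- matrix(c(3, 5, 2, 4), nrow = 2,
                 dimnames = list(c("a", "b"), c("g1", "g2")))
barplot(counts, beside = TRUE, legend.text = TRUE)

pie(c(a = 3, b = 5))
matplot(cbind(x, x^2), type = "l")   # one line per matrix column

dev.off()
```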

Demo 8

data(intake)
par(mfrow=c(2,2))
matplot(intake)
matplot(t(intake))
matplot(t(intake),type="b")
matplot(t(intake),type="b",pch=1:11,col="black",
        lty="solid", xaxt="n")
axis(1,at=1:2,labels=names(intake))

Math on plots

• Sort of like TeX

• Works on unevaluated expressions (quote(alpha), expression(alpha))

• Special conventions: ^ and [] for super-/subscripts, special names alpha, sum, int

• See help(plotmath)
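
A brief sketch of plotmath in use, with the standard normal density as an arbitrary example curve. One caveat: in current R the integral sign is written `integral(...)`, so that spelling is used below; help(plotmath) lists the exact names for the installed version.

```r
pdf(file = NULL)               # null device so the sketch runs anywhere

x <- seq(-3, 3, length.out = 100)
x_label <- expression(x[i])    # [] gives a subscript

plot(x, dnorm(x), type = "l",
     main = expression(f(x) == frac(1, sqrt(2 * pi)) * e^{-x^2 / 2}),
     xlab = x_label,
     ylab = quote(phi(x)))     # 'phi' is drawn as the Greek letter

# Integral with limits, and a value substituted into a label via bquote()
text(0, 0.1, expression(integral(f(x) * dx, -infinity, infinity) == 1))
a <- 0.05
alpha_label <- bquote(alpha == .(a))
text(-2, 0.3, alpha_label)

dev.off()
```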

Grid and Lattice graphics

• Standard R graphics allow graphs to be arranged in an m × n gridded layout.

• The grid package allows arbitrary viewports and creates graphical objects (“grobs”) which can be modified before they are printed.

• The lattice package uses grid for a structural approach to multiframe graphs

• Model formulas, y~x|g1*g2*...

• Shingles: Partially overlapping intervals used for conditioning plots

• Panel functions — potentially user codable
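
The lattice ideas above (a conditioning formula, a shingle, and a user-written panel function) can be sketched as follows. The object names temp_sh and p, the choice of four intervals, and the dashed median line are assumptions made for illustration only.

```r
library(lattice)

# A shingle: four partially overlapping Temp intervals used for conditioning
temp_sh <- equal.count(airquality$Temp, number = 4, overlap = 0.25)

# Model formula y ~ x | g, with a user-coded panel function
p <- xyplot(Ozone ~ Solar.R | temp_sh, data = airquality,
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)                            # the default scatter
              panel.abline(h = median(y, na.rm = TRUE), lty = 2) # per-panel median Ozone
            })

print(p)  # the plot is drawn only when the object is printed
```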

Demo 9

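library(lattice)
data(airquality)

# The slide used lset(theme = col.whitebg()); trellis.par.set() is the current interface
trellis.par.set(theme = col.whitebg())
myplot <- xyplot(log(Ozone) ~ Solar.R | equal.count(Temp),
                 groups = Month, data = airquality,
                 ylab = list(label = expression("log" * O[3]), cex = 2),
                 xlab = list(cex = 2))

myplot  # NB: no plot until the object is printed!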
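
Because a lattice plot is an object, it can be inspected and modified before anything is drawn. A small sketch, assuming the object names p and p2 and an illustrative title:

```r
library(lattice)

p <- xyplot(log(Ozone) ~ Solar.R | equal.count(Temp), data = airquality)
class(p)                       # a "trellis" object; nothing has been drawn yet

# update() returns a modified copy of the plot object
p2 <- update(p, main = "Ozone vs. solar radiation", layout = c(3, 2))

# Inside loops and functions there is no auto-printing, so call print() explicitly
for (obj in list(p, p2)) print(obj)
```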

Ad-hoc programming

• This had better be brief . . .

• What does an R function look like

• Flow control

Simple functions

• logit <- function(p) log(p/(1-p))

• logit(0.5)

• Formal arguments

• Actual arguments

• Positional matching: plot(x,y)

• Keyword matching: t.test(x ~ g, mu=2, alternative="less")

• Partial matching: t.test(x ~ g, mu=2, alt="l")
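
The three matching rules can be tried on a small function of one's own. In the sketch below, scale_shift and its arguments are hypothetical names introduced only for illustration; logit is the function from the slide.

```r
logit <- function(p) log(p / (1 - p))   # p is the formal argument
logit(0.5)                              # 0.5 is the actual argument

# Hypothetical helper, used only to illustrate argument matching
scale_shift <- function(x, center = 0, scale = 1) (x - center) / scale

r1 <- scale_shift(10, 2, 4)                   # positional matching
r2 <- scale_shift(10, scale = 4, center = 2)  # keyword matching, any order
r3 <- scale_shift(10, cen = 2, sc = 4)        # partial matching of unique prefixes
```

All three calls compute (10 - 2) / 4. Partial matching is handy at the prompt, but full argument names are usually safer in saved code.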

Flow control

• if/else

• ifelse()

• switch()

• for loops

• repeat, while

• break
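
A compact sketch of each construct listed above. The data and the helper names (sign_word, centre) are made up for illustration.

```r
x <- c(-2, 0, 3)

# if/else tests a single condition
sign_word <- function(v) {
  if (v > 0) "positive" else if (v < 0) "negative" else "zero"
}

# ifelse() is the vectorized counterpart
labels <- ifelse(x >= 0, "non-negative", "negative")

# switch() selects a branch by name; the unnamed last argument is the default
centre <- function(v, type) {
  switch(type, mean = mean(v), median = median(v), stop("unknown type"))
}

# for loop over the elements of a vector
total <- 0
for (v in x) total <- total + v

# repeat runs until break; while tests its condition first
n <- 0
repeat {
  n <- n + 1
  if (2^n > 100) break
}
m <- 0
while (2^m <= 100) m <- m + 1
```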

Apply-functions/loop avoidance

• lapply – list-apply

• sapply – simplifying apply

• tapply – tabulating apply

• apply, sweep – along slices of tables

• replicate – repeat expression
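
Each of the functions above can be seen on a toy input. The objects below (scores, m, and the result names) are illustrative assumptions; airquality is the built-in data set used in the demos.

```r
scores <- list(a = c(1, 2, 3), b = c(10, 20))

l <- lapply(scores, mean)    # always returns a list
s <- sapply(scores, mean)    # simplifies to a named numeric vector

# tapply: mean temperature within each month
by_month <- tapply(airquality$Temp, airquality$Month, mean)

m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)            # apply over rows (margin 1)
centered <- sweep(m, 2, colMeans(m))    # subtract each column's mean

r <- replicate(3, mean(rnorm(10)))      # re-evaluate an expression 3 times
```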

Demo 10

# These three versions all do the same thing
pval <- numeric(1000)
for (i in 1:1000)
  pval[i] <- t.test(rexp(25), mu = 1)$p.value

f <- function(i) t.test(rexp(25), mu = 1)$p.value
pval <- sapply(1:1000, f)

pval <- replicate(1000, t.test(rexp(25), mu = 1)$p.value)

qqplot(ppoints(1000), pval, pch = ".")

Summary

So what have we seen?

• R is a versatile working environment

• There is a very flexible toolkit for building graphics displays

• You can handle simple tasks quite easily

• Complicated tasks can be handled via ad hoc programming, often elegantly

• Extensions can be made to integrate seamlessly, and a large body of such extensions is available from CRAN
