An Introduction to R An Introduction to R John Verzani CUNY/College of Staten Island Department of Mathematics NYC ASA, CUNY Ed. Psych/May 24, 2005


An Introduction to R

An Introduction to R

John Verzani

CUNY/College of Staten Island Department of Mathematics

NYC ASA, CUNY Ed. Psych/May 24, 2005


Outline

What is R?
The many faces of R
Data
    Manipulating data
    Applying functions to data
    Vectorization of data
Graphics
    Model formulas
Inference
    Significance tests
    Confidence intervals
Models
    Simple linear regression
    Multiple linear regression
    Analysis of variance models
    Logistic regression models


What is R?

R is an open-source statistical computing environment

R is available from http://www.r-project.org

R is a computing language, based on S and S-Plus, which is well suited for statistical calculations

R has the ability to produce excellent graphics for statistical explorations and publications


The structure of R

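[Figure: the structure of R. The R kernel draws on a library of packages, loaded with library(): Base (base, grid, stats, stats4, ...), Recommended (MASS, nlme, ...), and Contrib. (CRAN) (UsingR, gregmisc, ...). Interfaces to the kernel: CLI, Batch, DCOM, ... Output: text, graphics device.]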


The many faces of R

R is ported to most modern computing platforms: Windows, Mac OS X, Unix with X11 (Linux), ...

R has an interface that varies depending on the installation:


Windows interface

Figure: Windows gui


Mac OS X interface

Figure: Mac OS X gui


X11 interface – the command line

Figure: There is no standard GUI for X11 implementations. A "typical" usage may look like this screenshot. Also of interest is the ESS package for (X)Emacs users.


The command line interface (CLI)

In R the typical means of interacting with the software is at the command line.

Commands are typed at the prompt >

Continuation lines are indicated with a +

ENTER sends the commands off to the interpreter

2 + 2

> 2 + 2  

[1] 4


Data

Data types

In statistics data comes in different types: numeric, categorical, univariate, bivariate, multivariate, etc.

R has different data types or classes to accommodate these different types of data sets.

Some basic storage types are:


numeric vectors: created with c(), etc.

> somePrimes = c(2, 3, 5, 7, 11, 13, 17)

> somePrimes

[1] 2 3 5 7 11 13 17

> odds = seq(1, 15, 2)

> odds

[1] 1 3 5 7 9 11 13 15

> ones = rep(1, 10)

> ones

[1] 1 1 1 1 1 1 1 1 1 1


Character strings are indicated by using matching quotes, double or single.

Character variables

> character = c("Homer", "Marge", "Bart", "Lisa",

+ "Maggie")

> gender = c("Male", "Female", "Male", "Female",

+ "Male")


Categorical variables – factors

> gender = factor(c("Male", "Female", "Male",

+ "Female", "Male"))

> gender 

[1] Male Female Male Female Male

Levels: Female Male

Factors have an extra attribute, a fixed set of levels, that requires some care. (E.g., you can't add new levels without some work.) Factors are used instead of character vectors, as this allows R to identify certain types of data. (Storage space is smaller as well.)
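The "some work" needed to add a level can be sketched as follows. This is a minimal illustration, not part of the original slides; the level name "Unknown" and the helper variable g2 are made up for the example.

```r
gender <- factor(c("Male", "Female", "Male"))

# Assigning a value that is not an existing level gives NA (with a warning)
g2 <- gender
suppressWarnings(g2[2] <- "Unknown")
is.na(g2[2])

# Extend the set of levels first, then assign
levels(gender) <- c(levels(gender), "Unknown")
gender[2] <- "Unknown"
levels(gender)
```

Extending levels() before assigning keeps the factor valid; the alternative is to convert to character, edit, and call factor() again.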


Logical vectors: vectors of  TRUE or FALSE

> somePrimes

[1] 2 3 5 7 11 13 17

> somePrimes < 10

[1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE

> somePrimes %in% c(3, 5, 7)

[1] FALSE TRUE TRUE TRUE FALSE FALSE FALSE

> somePrimes == 2 | somePrimes >= 10 

[1] TRUE FALSE FALSE FALSE TRUE TRUE TRUE


Matrices

defining matrices: matrix(), rbind(), ...

> M = rbind(c(1, 1), c(0, 1))
> M
     [,1] [,2]
[1,]    1    1
[2,]    0    1


Matrices, cont.

Operations: matrix multiplication uses %*% (* is entry-by-entry)

> M %*% M
     [,1] [,2]
[1,]    1    2
[2,]    0    1

The inverse is found by "solving" Ax = b, with b an identity matrix

> solve(M)
     [,1] [,2]
[1,]    1   -1
[2,]    0    1


Matrices, cont.

Least squares regression coefficients the hard way

> x = 1:5
> y = c(2, 3, 1, 4, 5)
> ones = rep(1, length(x))
> X = cbind(ones, x)
> solve(t(X) %*% X, t(X) %*% y)
     [,1]
ones  0.9
x     0.7
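For comparison, the same coefficients come out of lm(), the usual route. The sketch below reuses the slide's x and y; the names by_hand and fit are introduced only for this illustration.

```r
x <- 1:5
y <- c(2, 3, 1, 4, 5)

# Normal equations, as on the slide
X <- cbind(ones = 1, x)
by_hand <- solve(t(X) %*% X, t(X) %*% y)

# The same fit through the model-formula interface
fit <- lm(y ~ x)
coef(fit)

# Both routes agree up to floating-point error
all.equal(as.vector(by_hand), unname(coef(fit)))
```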


Lists

Lists are recursive structures with each level made up of components.

List components can be other data types, functions, additional lists, etc.

Lists are used often as return values of functions in R. The print method is set to show only part of the values contained in the list.
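As one illustration of a list-valued return, the result of t.test() is a list with a class attribute: printing it gives a formatted report, while names() reveals the components. This is a small sketch using the primes vector from the earlier slide.

```r
res <- t.test(c(2, 3, 5, 7, 11, 13, 17))

class(res)         # the class that selects the print method
is.list(res)       # underneath, it is a list
names(res)         # components not all shown by print()

res$estimate       # access a component by name with $
res[["conf.int"]]  # or with [[ ]]
```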


Defining lists

> lst = list(a = somePrimes, b = M, c = mean)

> lst

$a

[1] 2 3 5 7 11 13 17

$b

[,1] [,2]

[1,] 1 1

[2,] 0 1

$c

function (x, ...)

UseMethod("mean")

<environment: namespace:base>


Other data types

Data can be given extra attributes, such as a time series:

Time series have regular date information

> google = c(100.2, 132.6, 196, 180, 202.7,
+ 191.9, 186.1, 180)
> ts(google, start = c(2004, 9), frequency = 12)
       Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2004                                                 100.2 132.6 196.0 180.0
2005 202.7 191.9 186.1 180.0


Tables: an extension of a matrix or array

The table() function (also xtabs(), ftable(), ...)

> table(gender)
gender
Female   Male
     2      3
> satisfaction = c(3, 4, 3, 5, 4, 3)
> category = c("a", "a", "b", "b", "a", "a")
> table(category, satisfaction)
        satisfaction
category 3 4 5
       a 2 2 0
       b 1 0 1


Data frames

The most common data-storage format is a data frame

Stores rectangular data: each column a variable, typically each row data for one subject

Columns have names for easy reference

May be manipulated like a matrix or a list (each variable a top-level component)
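The two views of a data frame can be sketched side by side. The small data frame below (columns id and score) is made up for illustration.

```r
df <- data.frame(id = c("a", "b", "c"), score = c(3, 4, 5))

# Matrix-style access: [row, column]
df[2, "score"]
df[, 2]
dim(df)

# List-style access: each variable is a top-level component
df$score
df[["score"]]
names(df)
sapply(df, class)   # apply a function to each component
```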


Relationship between vector, matrix, data frame, list

[Figure: diagram relating a vector, a matrix, a data frame, and a list]


Data frame examples

> role = c("Comic relief", "Parent", "troublemaker",

+ "Goody two-shoes", "Cute baby")

> theSimpsons = data.frame(name = character,

+ gender = gender, role = role)

> theSimpsons

name gender role

1 Homer Male Comic relief

2 Marge Female Parent

3 Bart Male troublemaker

4 Lisa Female Goody two-shoes

5 Maggie Male Cute baby


Reading in data

Data can be built-in, entered at the keyboard, or read in from external files. These may be formatted using fixed-width format, comma-separated values, tables, etc. For instance, this command reads in a data set from a URL:

Reading urls

> f = "http://www.math.csi.cuny.edu/st/R/crackers.csv"

> crackers = read.csv(f)

> names(crackers)

[1] "Company" "Product"

[3] "Crackers" "Grams"

[5] "Calories" "Fat.Calories"

[7] "Fat.Grams" "Saturated.Fat.Grams"

[9] "Sodium" "Carbohydrates"

[11] "Fiber"


Manipulating data

Assignment, Extraction

Values in vectors, matrices, lists, and data frames can be accessed by their components:

By index

> google[1:3]

[1] 100.2 132.6 196.0

> crackers[1:3, 2:3]

Product Crackers

1 Country Water Cracker Crck Pepper 4

2 Country Water Cracker Klassic 4

3 Country Water Cracker Sun Dried Tomato 4


By index cont.

> M[1, ]

[1] 1 1

> lst[[1]]

[1] 2 3 5 7 11 13 17


Access by name

by name

> crackers[1:3, c("Product", "Crackers")]
                                 Product Crackers
1      Country Water Cracker Crck Pepper        4
2          Country Water Cracker Klassic        4
3 Country Water Cracker Sun Dried Tomato        4
> theSimpsons[["role"]]
[1] Comic relief    Parent          troublemaker
[4] Goody two-shoes Cute baby
5 Levels: Comic relief Cute baby ... troublemaker


Access by logical expressions

logical questions answered TRUE or FALSE

> somePrimes < 10

[1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE

> somePrimes[somePrimes < 10]

[1] 2 3 5 7


Recycling values

When making assignments in R we might have a situation where many values are replaced by one value, or by a few. R has a means of recycling the values in the assignment to fill in the size mismatch.

replace coded values with NA

> x = c(1, 1, 0, 1, 99, 0, 1, 99, 0, 1, 99)

> x[x == 99] = NA

> x[x == 1] = "Yes"

> x[x == 0] = "No"

> x 

[1] "Yes" "Yes" "No" "Yes" NA "No" "Yes" NA

[9] "No" "Yes" NA
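Recycling applies to indices and arithmetic as well as to assigned values. The following sketch, with vectors made up for illustration, shows three cases.

```r
# A single value is recycled over every matching position
x <- c(1, 1, 0, 1, 99, 0, 1, 99)
x[x == 99] <- NA

# A short logical index is recycled along the vector: positions 1, 3, 5
z <- 1:6
z[c(TRUE, FALSE)] <- 0
z

# In arithmetic, the shorter vector is recycled to the longer length
w <- 1:6 + c(10, 20)
w
```

When the longer length is not a multiple of the shorter one, R still recycles but issues a warning, which is usually a sign of a mistake.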


Applying functions to data

Finding the mean

> fat = crackers$Fat.Grams

> mean(fat)

[1] 3.679

The median

> median(fat)

[1] 3.25


Extra arguments to find trimmed mean

> mean(fat, trim = 0.2)

[1] 3.482

missing data – coded NA

> shuttleFailures = c(0, 1, 0, NA, 0, 0, 0)

> mean(shuttleFailures, na.rm = TRUE)

[1] 0.1667


Functions

Functions are called by name with a matching pair of  ()

Arguments may be indicated by position or name

Named arguments can (and usually do) have reasonable defaults

A special role is played by the first argument
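These calling conventions can be sketched with mean(), reusing the shuttle data from the earlier slide; the variable names by_name, any_order and by_pos are introduced here for illustration.

```r
x <- c(0, 1, 0, NA, 0, 0, 0)

mean(x)                              # defaults: trim = 0, na.rm = FALSE, so NA
by_name   <- mean(x, na.rm = TRUE)   # first argument by position, rest by name
any_order <- mean(na.rm = TRUE, x = x)  # named arguments may come in any order
by_pos    <- mean(x, 0, TRUE)        # purely positional: x, trim, na.rm

args(mean.default)                   # shows argument names and defaults
```

Naming everything after the first argument is the more readable habit; the purely positional form depends on remembering the argument order.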


generic functions

Interacting with R from the command line requires one to remember a lot of function names, although R helps out somewhat. In practice, many tasks may be viewed generically: e.g., "print" the values of an object, "summarize" values of an object, "plot" the object. Of course, different objects should yield different representations.

R has methods (S3, S4) to declare a function to be generic. This allows different functions to be "dispatched" based on the "class" of the first argument.

A basic template is:

methodName(object, extraArguments)

Some common generic functions are print() (the default action), summary() (for summaries), plot() (for basic plots).
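A minimal sketch of S3 dispatch follows. The generic describe() and its two methods are hypothetical names invented for this example; only UseMethod() is R's own mechanism.

```r
# A generic function hands off to a method chosen by the class of its first argument
describe <- function(object, ...) UseMethod("describe")

# Fallback method, used when no class-specific method exists
describe.default <- function(object, ...) {
  paste("object of class", class(object)[1])
}

# Method selected for factors
describe.factor <- function(object, ...) {
  paste("factor with", nlevels(object), "levels")
}

describe(c(2, 3, 5))                 # dispatches to describe.default
describe(factor(c("a", "b", "a")))   # dispatches to describe.factor
```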


summary() function called on a number and factor

> summary(somePrimes)

Min. 1st Qu. Median Mean 3rd Qu. Max.

2.00 4.00 7.00 8.29 12.00 17.00

> summary(gender)

Female Male

2 3


Vectorization of data

R, like MATLAB, is naturally vectorized. For instance, to find the sample variance, $(n-1)^{-1}\sum_i (x_i - \bar{x})^2$, by hand involves:

sample variance (also var())

> x = c(2, 3, 5, 7, 11, 13)

> fractions(x - mean(x))

[1] -29/6 -23/6 -11/6 1/6 25/6 37/6

> fractions((x - mean(x))^2)

[1] 841/36 529/36 121/36 1/36 625/36 1369/36

> fractions(sum((x - mean(x))^2)/(length(x) -

+ 1))

[1] 581/30
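The hand computation can be checked against var(). Note that fractions() on the slide comes from the MASS package; the sketch below avoids it and compares decimal values instead.

```r
x <- c(2, 3, 5, 7, 11, 13)

# Each step operates on the whole vector at once, with no explicit loop
by_hand <- sum((x - mean(x))^2) / (length(x) - 1)

all.equal(by_hand, var(x))     # agrees with the built-in
all.equal(by_hand, 581 / 30)   # and with the fraction shown on the slide
```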


Example: simulating a sample distribution

A simulation of the sampling distribution of $\bar{x}$ from a random sample of size 10 taken from an exponential distribution with parameter 1 naturally lends itself to a "for loop":

for loop simulation

> res = c()

> for (i in 1:200) { 

+ res[i] = mean(rexp(10, rate = 1))

+ }

> summary(res)

Min. 1st Qu. Median Mean 3rd Qu. Max.

0.392 0.786 0.946 1.000 1.220 2.440


Vectorizing a simulation

It is often faster in R to vectorize the simulation above by generating all of the random data at once, and then applying the mean() function to the data. The natural way to store the data is a matrix.

Simulation using a matrix

> m = matrix(rexp(200 * 10, rate = 1), ncol = 200)

> res = apply(m, 2, mean)

> summary(res)

Min. 1st Qu. Median Mean 3rd Qu. Max.

0.308 0.800 1.010 1.020 1.190 1.830
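Two further variations on the same simulation, offered as a sketch: colMeans() computes the column means directly, and replicate() hides the loop. The seed value is arbitrary and is set only so the run is reproducible.

```r
set.seed(1)   # arbitrary seed, for reproducibility only

# One column per simulated sample of size 10
m <- matrix(rexp(200 * 10, rate = 1), ncol = 200)

res_apply <- apply(m, 2, mean)   # as on the slide
res_col   <- colMeans(m)         # same values from a dedicated function
all.equal(res_apply, res_col)

# replicate() repeats an expression and collects the results
res_rep <- replicate(200, mean(rexp(10, rate = 1)))
length(res_rep)
```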


Graphics

R has many built-in graphic-producing functions, and facilities to create custom graphics. Some standard ones include:

histogram and density estimate

> hist(fat, probability = TRUE, col = "goldenrod")

> lines(density(fat), lwd = 3)


histogram and density estimate

[Figure: "Histogram of fat" with the density estimate overlaid; x-axis: fat (0 to 8), y-axis: Density (0.00 to 0.20)]


Graphics: cont.

Quantile-Quantile plots

> qqnorm(fat)

Boxplots

> boxplot(MPG.highway ~ Type, data = Cars93)

plot() is generic; the last plot can also be made with

> plot(MPG.highway ~ Type, data = Cars93)


Quantile-Quantile plot

[Figure: "Normal Q-Q Plot"; x-axis: Theoretical Quantiles (-2 to 2), y-axis: Sample Quantiles (0 to 8)]


Boxplots

[Figure: side-by-side boxplots for the types Compact, Large, Midsize, Small, Sporty, Van; vertical axis from 20 to 50]


Fancy examples from upcoming book of P. Murrell

[Figure: plot with x-axis "Sampling Fraction" (0.0 to 1.2) and vertical axes "Number of Vessels" (0 to 350) and "Completeness" (1.0 to 2.0); annotations: "234 (65%)", "159 (44%)", "N = 360 brokenness = 0.5"]


3-d graphics

[Figure: 3-d plots with axes X1, X2, X3]


Model formulas


Model formula notation

The boxplot example illustrates R's model formula. Many generic functions have a method to handle this type of input, allowing for easier usage with multivariate data objects.

Example with xtabs – tables

> df = data.frame(cat = category, sat = satisfaction)

> xtabs(~cat + sat, df)

   sat
cat 3 4 5
  a 2 2 0
  b 1 0 1


Model formula cont.

Suppose x, y are numeric variables, and f is a factor. The basic model formulas have these interpretations:

y ~ 1        $y_i = \beta_0 + \varepsilon_i$
y ~ x        $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
y ~ x - 1    $y_i = \beta_1 x_i + \varepsilon_i$ (remove the intercept)
y ~ x | f    $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, grouped by levels of f
y ~ f        $y_{ij} = \tau_i + \varepsilon_{ij}$

The last usage suggests storing multivariate data in two variables: a numeric variable with the measurements, and a factor indicating the treatment.
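The interpretations above can be sketched with lm() on a tiny data set. The numbers are made up for illustration; the point is which coefficients each formula produces.

```r
x <- 1:6
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)          # made-up measurements
f <- factor(rep(c("a", "b", "c"), each = 2))    # made-up treatment labels

names(coef(lm(y ~ 1)))       # intercept only
names(coef(lm(y ~ x)))       # intercept and slope
names(coef(lm(y ~ x - 1)))   # slope only, intercept removed

# With a factor and no intercept, the coefficients are the group means
group_fit   <- coef(lm(y ~ f - 1))
group_means <- tapply(y, f, mean)
all.equal(unname(group_fit), as.vector(group_means))
```

With the plain formula y ~ f, lm() by default reports an intercept (the first group's mean) plus differences from it, which is a reparametrization of the same model.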


Lattice graphics

Lattice graphics can effectively display multivariate data that are naturally defined by some grouping variable. (Slightly more complicated than need be to show groupedData() function in nlme package.)

lattice graphics

> cars = groupedData(MPG.highway ~ Weight |
+ Type, Cars93)
> plot(cars)


[Figure: lattice scatterplots of MPG.highway (20 to 50) against Weight (2000 to 4000), one panel per Type: Van, Large, Midsize, Compact, Sporty, Small]


more lattice graphics (Murrell’s book)

[Figure: lattice dot plots of Barley Yield (bushels/acre), 20 to 60, for the varieties Svansota, No. 462, Manchuria, No. 475, Velvet, Peatland, Glabron, No. 457, Wisconsin No. 38, Trebi; one panel per site (Grand Rapids, Duluth, University Farm, Morris, Crookston, Waseca); legend: 1932, 1931]

Inference

Significance tests


There are several functions for performing classical statistical tests of significance: t.test(), prop.test(), oneway.test(), wilcox.test(), chisq.test(), ...

These produce a p-value, and summaries of the computations.


The Bumpus data set (Ramsey and Schafer) contains data from an 1898 lecture supporting evolution (some birds survived a harsh winter storm).

two-sample t  test

> Bumpus = read.table("Bumpus.txt", header = TRUE)

> plot(humerus ~ factor(code), data = Bumpus)


Diagnostic plot

[Figure: boxplots of humerus (660 to 780) by factor(code), groups 1 and 2]


t.test() output

> t.test(humerus ~ code, data = Bumpus)

        Welch Two Sample t-test

data:  humerus by code
t = -1.721, df = 43.82, p-value = 0.09236
alternative hypothesis: true difference in means is not eq
95 percent confidence interval:
 -21.895   1.728
sample estimates:
mean in group 1 mean in group 2
          727.9           738.0


The SchizoTwins data set (R&S) contains data on 15 pairs of monozygotic twins. Measured values are of volume of left hippocampus.

t -tests: paired

> twins = read.table("SchizoTwins.txt", header = TRUE)

> plot(affected ~ unaffected, data = twins)

> attach(twins)

> t.test(affected - unaffected)$p.value

[1] 0.006062

> t.test(affected, unaffected, paired = TRUE)$p.value

[1] 0.006062

> detach(twins)
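The equivalence shown above (a paired test is a one-sample test on the differences) can be reproduced without the data file. The sketch below uses simulated values, not the SchizoTwins data, and uses with() in place of attach()/detach(); the data frame name twins_demo and the simulation parameters are assumptions for illustration.

```r
set.seed(2)   # arbitrary seed; the values below are simulated, not real measurements

twins_demo <- data.frame(unaffected = rnorm(15, mean = 1.8, sd = 0.2))
twins_demo$affected <- twins_demo$unaffected - rnorm(15, mean = 0.2, sd = 0.2)

# with() evaluates an expression inside the data frame, leaving the search path alone
p_diff   <- with(twins_demo, t.test(affected - unaffected)$p.value)
p_paired <- with(twins_demo, t.test(affected, unaffected, paired = TRUE)$p.value)

all.equal(p_diff, p_paired)
```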


Diagnostic plot

[Figure: scatterplot of affected (1.0 to 2.0) against unaffected (1.4 to 2.0)]


Confidence intervals

Confidence intervals are computed as part of the output of many of these functions. The default is to do 95% CIs, which may be adjusted using conf.level=.

95% CI humerus length overall

> t.test(Bumpus$humerus)

...

95 percent confidence interval:

728.2 739.6

...
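A short sketch of the conf.level argument follows, using the small primes vector from earlier slides rather than the Bumpus data; the names ci95 and ci90 are introduced for illustration.

```r
x <- c(2, 3, 5, 7, 11, 13, 17)

ci95 <- t.test(x)$conf.int                      # default 95% interval
ci90 <- t.test(x, conf.level = 0.90)$conf.int   # 90% interval

attr(ci90, "conf.level")     # the level is stored with the interval
diff(ci90) < diff(ci95)      # a lower level gives a narrower interval
```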


Chi-square tests

Goodness of fit tests are available through chisq.test() and others. For instance, data from Rosen and Jerdee (1974, from R&S) on the promotion of candidates based on gender:

gender data

> rj = rbind(c(21, 3), c(14, 10))

> dimnames(rj) = list(gender = c("M", "F"),

+ promoted = c("Y", "N"))

> rj
      promoted
gender  Y  N
     M 21  3
     F 14 10


sieveplot(rj)

[Figure: "Sieve diagram" of gender (M, F) by promoted (Y, N)]


chi-squared test p -value

> chisq.test(rj)$p.value

[1] 0.05132

Fisher's exact test

> fisher.test(rj, alt = "greater")$p.value

[1] 0.02450
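The object returned by chisq.test() holds more than the p-value. The sketch below rebuilds the slide's rj table and looks at the expected counts and at the continuity correction that R applies to 2-by-2 tables by default.

```r
rj <- rbind(c(21, 3), c(14, 10))
dimnames(rj) <- list(gender = c("M", "F"), promoted = c("Y", "N"))

res <- chisq.test(rj)
res$expected   # counts expected if promotion were independent of gender
res$method     # notes the Yates continuity correction
res$p.value

# The uncorrected test, for comparison
p_uncorrected <- chisq.test(rj, correct = FALSE)$p.value
p_uncorrected
```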

Models

Simple linear regression


Fitting linear models

Linear models are fit using lm():

This function uses the syntax for model formula

Model objects are reticent: you need to ask them for more information

This is done with extractor functions: summary(), resid(), fitted(), coef(), predict(), anova(), deviance(), ...


Body fat data set

> source("http://www.math.csi.cuny.edu/st/R/fat.R")

> names(fat)
 [1] "case"          "body.fat"      "body.fat.siri"
 [4] "density"       "age"           "weight"
 [7] "height"        "BMI"           "ffweight"
[10] "neck"          "chest"         "abdomen"
[13] "hip"           "thigh"         "knee"
[16] "ankle"         "bicep"         "forearm"
[19] "wrist"


Fitting a simple linear regression model

Basic fit is done with lm: response ~ predictor(s)

> res = lm(body.fat ~ BMI, data = fat)

> res

Call:

lm(formula = body.fat ~ BMI, data = fat)

Coefficients:

(Intercept) BMI

-20.41 1.55


Scatterplot with regression line

[Figure: scatterplot "BMI predicting body fat" with fitted lines; x-axis: BMI (20 to 50), y-axis: body.fat (0 to 40); legend: least-squares, lqs]


Basic output is minimal, more is given by summary()

> summary(res)

Call:
lm(formula = body.fat ~ BMI, data = fat)

Residuals:
     Min       1Q   Median       3Q      Max
-21.4292  -3.4478   0.2113   3.8663  11.7826

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -20.40508    2.36723   -8.62 7.78e-16 ***
BMI           1.54671    0.09212   16.79  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '


summary() output cont.

Residual standard error: 5.324 on 250 degrees of freedom

Multiple R-Squared: 0.53, Adjusted R-squared: 0.5281

F-statistic: 281.9 on 1 and 250 DF, p-value: < 2.2e-16


Extractor functions used to extract information

extractor functions

> coef(res)

(Intercept) BMI

-20.405 1.547

> summary(residuals(res))

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-2.14e+01 -3.45e+00  2.11e-01  3.52e-17  3.87e+00  1.18e+01
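Since the body fat data set is loaded from a URL, the extractor functions can also be tried on R's built-in cars data set (stopping distance against speed). This is a stand-in sketch, not the slides' example; the name fit is introduced here.

```r
fit <- lm(dist ~ speed, data = cars)   # built-in data set

coef(fit)              # estimated intercept and slope
head(fitted(fit), 3)   # first few fitted values
head(resid(fit), 3)    # first few residuals

# Residuals are observed minus fitted values
all.equal(unname(resid(fit)), unname(cars$dist - fitted(fit)))

# deviance() of a linear model is the residual sum of squares
all.equal(deviance(fit), sum(resid(fit)^2))

summary(fit)$r.squared   # summary() returns a list that can itself be queried
```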


Residual plots to test model assumptions

> plot(fitted(res), resid(res))

> qqnorm(resid(res))


Residual plots

[Figure: two residual plots: resid(res) (-20 to 10) against fitted(res) (10 to 50), and a "Normal Q-Q Plot" of the residuals with Theoretical Quantiles (-3 to 3) against Sample Quantiles (-20 to 10)]


Default diagnostic plots

Model objects, such as the output of lm(), have default plots associated with them. For lm() there are four plots.

plot(res)

> par(mfrow = c(2, 2))

> plot(res)


Predictions done using the predict() extractor function

> vals = seq(15, 35, by = 2)

> names(vals) = vals

> predict(res, newdata = data.frame(BMI = vals))
    15     17     19     21     23     25     27
 2.796  5.889  8.982 12.076 15.169 18.263 21.356
    29     31     33     35
24.450 27.543 30.636 33.730


Multiple linear regression


Multiple regression

Multiple regression models are also fit with lm(). Extra covariates are specified using the following notation:

+ adds terms (- subtracts them, such as -1 to remove the intercept)

Math expressions can (mostly) be used as is: log, exp, ...

I() is used to insulate certain math expressions

a:b adds an interaction between a and b. Also, * and ^ are shortcuts for more complicated interactions

To illustrate, we model the body.fat variable using measurements that are easy to compute.
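One way to see what the formula notation above does is to look at the columns of the design matrix it generates. The sketch below uses a small made-up data frame (the names d, y, a and b are hypothetical) and the standard model.matrix() function.

```r
# A tiny made-up data frame, used only to inspect design matrices
d <- data.frame(y = c(1, 3, 2, 5, 4, 6),
                a = c(1, 2, 3, 4, 5, 6),
                b = c(0, 1, 0, 1, 0, 1))

main    <- colnames(model.matrix(y ~ a + b, d))       # two main effects
no_int  <- colnames(model.matrix(y ~ a + b - 1, d))   # -1 drops the intercept
inter   <- colnames(model.matrix(y ~ a * b, d))       # a + b + a:b
squared <- colnames(model.matrix(y ~ a + I(a^2), d))  # I() protects the arithmetic
logged  <- colnames(model.matrix(y ~ log(a), d))      # math functions used as is

print(list(main, no_int, inter, squared, logged))
```

Without I(), the ^ in a formula would be read as an interaction shortcut rather than as a power, which is why the squared term is wrapped.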


Modeling body fat

> res = lm(body.fat ~ age + weight + height +
+     chest + abdomen + hip + thigh, data = fat)

> res

Call:
lm(formula = body.fat ~ age + weight + height + chest + ab

Coefficients:
(Intercept)          age       weight       height
  -33.27351      0.00986     -0.12846     -0.09557
      chest      abdomen          hip        thigh
   -0.00150      0.89851     -0.17687      0.27132


Model selection using AIC

> stepAIC(res, trace = 0)

Call:

lm(formula = body.fat ~ weight + abdomen + thigh, data = f

Coefficients:

(Intercept) weight abdomen thigh

-48.039 -0.170 0.917 0.209

(Set trace=1 to get diagnostic output.)
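The stepAIC() function comes from the MASS package, which must be loaded first. A minimal sketch on the built-in mtcars data follows; the choice of response and covariates is an assumption made only for illustration and is unrelated to the body fat data.

```r
library(MASS)  # provides stepAIC()

# A full model on a data set that ships with R
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Stepwise selection by AIC, with the step-by-step trace suppressed
reduced <- stepAIC(full, trace = 0)

print(formula(reduced))
print(c(full = AIC(full), reduced = AIC(reduced)))
```

The selected model never has a larger AIC than the starting model, since a term is dropped only when doing so lowers the criterion.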


Model selection using the F-statistic

> res.sub = lm(body.fat ~ weight + height +
+     abdomen, fat)

> anova(res.sub, res)

Analysis of Variance Table

Model 1: body.fat ~ weight + height + abdomen
Model 2: body.fat ~ age + weight + height + chest + abdome
  Res.Df  RSS Df Sum of Sq   F Pr(>F)
1    248 4206
2    244 4119  4        88 1.3   0.27
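The F statistic in this table can be reproduced by hand. The sketch below reads the table as a residual sum of squares of 4206 on 248 degrees of freedom for the smaller model and 4119 on 244 for the larger one; because these are rounded values, the result only agrees approximately with the printed F and p-value. The variable names are hypothetical.

```r
# Rounded values read from the ANOVA table
rss_sub  <- 4206; df_sub  <- 248   # smaller model
rss_full <- 4119; df_full <- 244   # larger model

# F = (drop in RSS per extra parameter) / (residual mean square of larger model)
F_stat <- ((rss_sub - rss_full) / (df_sub - df_full)) / (rss_full / df_full)

# Upper-tail probability under the F distribution
p_val <- pf(F_stat, df_sub - df_full, df_full, lower.tail = FALSE)

print(round(c(F = F_stat, p = p_val), 2))
```

A large p-value here means the extra covariates do not significantly improve the fit, so the smaller model is preferred.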


Analysis of variance models

A simple one-way analysis of variance test is done using oneway.test() with a model formula of the type y ~ f. For instance, we look at data on the lifetime of mice that have been given one of several types of diet (Ramsey and Schafer).

read in data, make plot

> # stringsAsFactors = TRUE keeps treatment a factor in R 4.0 and later
> mice = read.table("mice-life.txt", header = TRUE,
+     stringsAsFactors = TRUE)

> plot(lifetime ~ treatment, data = mice, ylab = "Months")

Plot of months survived by diet

[Figure: lifetime (y-axis: Months, 10 to 50) by treatment (x-axis levels: N/N85, N/R40, N/R50, NP, R/R50, lopro)]


One-way test of equality of means

> oneway.test(lifetime ~ treatment, data = mice,

+ var.equal = TRUE)

One-way analysis of means

data: lifetime and treatment

F = 57.1, num df = 5, denom df = 343, p-value < 2.2e-16

(var.equal = TRUE assumes equal variances; this is not the default.)

Using lm() for ANOVA

Modeling is usually done with a modeling function. The lm() function can also fit a one-way ANOVA, again with the same model formula.

Using lm()

> res = lm(lifetime ~ treatment, data = mice)

> anova(res)

Analysis of Variance Table

Response: lifetime
           Df Sum Sq Mean Sq F value Pr(>F)
treatment   5  12734    2547    57.1 <2e-16 ***
Residuals 343  15297      45
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
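That oneway.test() with var.equal = TRUE and anova() on an lm() fit give the same F test can be checked on any data set. The sketch below uses the built-in PlantGrowth data as a stand-in, since the mice data file is not distributed with R.

```r
# One-way ANOVA two ways on a data set that ships with R
ow  <- oneway.test(weight ~ group, data = PlantGrowth, var.equal = TRUE)
tab <- anova(lm(weight ~ group, data = PlantGrowth))

# Extract the F statistic and p-value from each approach
F_oneway <- unname(ow$statistic)
F_lm     <- tab[["F value"]][1]
p_oneway <- ow$p.value
p_lm     <- tab[["Pr(>F)"]][1]

print(c(F_oneway = F_oneway, F_lm = F_lm))
print(c(p_oneway = p_oneway, p_lm = p_lm))
```

The advantage of the lm() route is that the fitted object can be reused for contrasts, diagnostics and predictions.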

Following Ramsey and Schafer we ask: Does lifetime on 50 kcal/wk exceed that on 85 kcal/wk? We can take advantage of treatment contrasts to investigate this (the difference in mean from the first level is estimated).

Treatment contrasts, set β1

> treat = relevel(mice$treatment, "N/R50")

> res = lm(lifetime ~ treat, data = mice)

> coef(summary(res))

            Estimate Std. Error t value  Pr(>|t|)
(Intercept)    42.30        0.8   53.37  3.4e-168
treatN/N85     -9.61        1.2   -8.09   1.1e-14
treatN/R40      2.82        1.2    2.41   1.7e-02
treatNP       -14.90        1.2  -12.01   5.7e-28
treatR/R50      0.59        1.2    0.49   6.2e-01
treatlopro     -2.61        1.2   -2.19   2.9e-02
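With treatment contrasts, the intercept is the mean of the reference level and every other coefficient is a difference from that mean. The sketch below checks this on the built-in PlantGrowth data, used here as a stand-in for the mice data; the choice of "trt1" as reference level is arbitrary.

```r
# Make "trt1" the first (reference) level of the grouping factor
g <- relevel(PlantGrowth$group, "trt1")

fit   <- lm(weight ~ g, data = PlantGrowth)
means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)

# Intercept = mean of the reference level
intercept <- unname(coef(fit)["(Intercept)"])

# Coefficient for ctrl = mean(ctrl) - mean(trt1)
diff_ctrl <- unname(coef(fit)["gctrl"])

print(coef(fit))
print(means)
```

Read the same way, the mice table says the estimated mean lifetime is 42.30 months for N/R50 and 42.30 - 9.61 = 32.69 months for N/N85.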

Logistic regression models

Logistic regression

Logistic regression extends the linear regression framework to binary response variables. It may be seen as a special case of a generalized linear model, which consists of:

A response $y$ and covariates $x_1, x_2, \ldots, x_p$

A linear predictor $\eta = \beta_1 x_1 + \cdots + \beta_p x_p$.

A specific family of distributions which describes the random variable $y$ with mean response, $\mu_{y|x}$, related to $\eta$ through a link function, $m^{-1}$, where

$\mu_{y|x} = m(\eta)$, or $\eta = m^{-1}(\mu_{y|x})$

Logistic regression example

Linear regression corresponds to m being the identity and the family being the normal distribution.

For logistic regression, the response variable is Bernoulli, so the mean is also the probability of success. The link function is the logit, or log-odds, function and the family is Bernoulli, a special case of the binomial.

We illustrate the implementation in R with an example from Devore on the failure of space shuttle O-rings (a binary response variable) in terms of take-off temperature.
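The link function and its inverse are stored in R's family objects, which makes the definitions above easy to check numerically. In this sketch the probabilities in p are arbitrary illustrative values.

```r
p   <- c(0.1, 0.5, 0.9)   # arbitrary probabilities of success
fam <- binomial()         # binomial family; the logit link is its default

eta      <- fam$linkfun(p)     # link m^{-1}: from the mean to the linear predictor
log_odds <- log(p / (1 - p))   # the logit written out as log-odds
back     <- fam$linkinv(eta)   # inverse link m: from the linear predictor to the mean

print(rbind(p = p, eta = eta, log_odds = log_odds, back = back))
```

The base functions qlogis() and plogis() compute the same logit and inverse logit.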

Space shuttle data

read in data, diagnostic plot

> # stringsAsFactors = TRUE keeps Failure a factor in R 4.0 and later
> shuttle = read.table("shuttle.txt", header = TRUE,
+     stringsAsFactors = TRUE)

> dotplot(~Temperature | Failure, data = shuttle,

+ layout = c(1, 2))

Lift-off temperature by O-ring failure/success (Y/N)

[Figure: dot plots of Temperature (x-axis, 55 to 80) in two panels, N and Y]

Fitting a logistic model

We need to specify the formula, the family, and the link to the glm() function:

Specify model, family, optional link

> res.glm = glm(Failure ~ Temperature, data = shuttle,
+     family = binomial(link = logit))

> coef(summary(res.glm))
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  11.7464     6.0214   1.951  0.05108
Temperature  -0.1884     0.0891  -2.115  0.03443

(Actually, link=logit is default for binomial family.)
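A self-contained sketch of the same kind of fit follows, using the built-in mtcars data because the shuttle data file is not distributed with R. The choice of am (transmission type, coded 0/1) as the binary response and wt as the covariate is only an illustration. It also shows how fitted probabilities relate to the linear predictor.

```r
# Logistic regression of a 0/1 response on one covariate
fit <- glm(am ~ wt, data = mtcars, family = binomial)
print(coef(summary(fit)))

# Predictions on the linear-predictor (log-odds) scale and on the probability scale
new  <- data.frame(wt = c(2.5, 3.0, 3.5))
eta  <- predict(fit, newdata = new, type = "link")
prob <- predict(fit, newdata = new, type = "response")

print(cbind(new, eta, prob))
```

The probability scale is the inverse logit of the link scale, so prob equals plogis(eta).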

Random effects models

Mixed effects

Mixed-effects models are fit using the nlme package (or its newer replacement, the lme4 package, which provides lmer() and can also fit mixed-effects logistic regression).

Need to specify the fixed and random effects

Can optionally specify structure beyond independence for the error terms.

The implementation is well documented in Pinheiro and Bates (2000)

Mixed-effects example

We present an example from Pinheiro and Bates on a data set involving a dental measurement taken over time for the same set of subjects, that is, longitudinal data.

The key variables in the Orthodont data frame are:

distance – measurement

age – age of subject at time of measurement

Sex – gender of subject

Subject – subject code

Fit model with lm(), check

The simple regression model is

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,$

where $\varepsilon_i$ is $N(0, \sigma^2)$.

model using lm()

> res.lm = lm(distance ~ I(age - 11), Orthodont)

> res.lm

Call:
lm(formula = distance ~ I(age - 11), data = Orthodont)

Coefficients:
(Intercept)  I(age - 11)
      24.02         0.66

> bwplot(getGroups(Orthodont) ~ resid(res.lm))

Boxplots of residuals by group

[Figure: boxplots of resid(res.lm) (x-axis, -5 to 5), one per subject: M01 to M16 and F01 to F11]

Fit each group

Fitting a separate linear model to each group corresponds to the model

$y_{ij} = \beta_{i0} + \beta_{i1} x_{ij} + \varepsilon_{ij},$

where the $\varepsilon_{ij}$ are $N(0, \sigma^2)$.

fit each group with lmList()

> res.lmlist = lmList(distance ~ I(age - 11) |
+     Subject, data = Orthodont)

> plot(intervals(res.lmlist))

Fit with random effect

This fits the model

$y_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i}) x_{ij} + \varepsilon_{ij},$

with $b_{\cdot i} \sim N(0, \Psi)$ and $\varepsilon_{ij} \sim N(0, \sigma^2)$.

fit with lme()

> res.lme = lme(distance ~ I(age - 11), data = Orthodont,

+ random = ~I(age - 11) | Subject)

> plot(augPred(res.lme))

> plot(res.lme, resid(.) ~ fitted(.) | Sex)
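The Orthodont data ship with the nlme package, so the random-effects fit can be run directly. The sketch below repeats the fit under a different object name (fit, chosen here to avoid clashing with the slides) and pulls out the pieces that correspond to the model equation: the fixed effects, the per-subject random effects, and the variance components.

```r
library(nlme)  # provides lme() and the Orthodont data

# Random intercept and random slope for each subject
fit <- lme(distance ~ I(age - 11), data = Orthodont,
           random = ~ I(age - 11) | Subject)

beta <- fixef(fit)    # estimates of the fixed effects beta_0 and beta_1
b    <- ranef(fit)    # predicted b_0i and b_1i, one row per subject

print(beta)
print(dim(b))         # 27 subjects, 2 random effects each
print(VarCorr(fit))   # estimated Psi and sigma
```

Adding a row of ranef(fit) to fixef(fit) gives the subject-specific intercept and slope, which is what coef(fit) returns.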

Predictions based on random-effects model

[Figure: one panel per subject (M01 to M16, F01 to F11); x-axis: Age (yr), 8 to 14; y-axis: Distance from pituitary to pterygomaxillary fissure (mm), 20 to 30]

Residuals by gender, subject

[Figure: two panels, Male and Female; x-axis: Fitted values (mm), 20 to 30; y-axis: Residuals (mm), -4 to 4]

Adjust the variance

The variance is adjusted for different groups by passing a variance function, specified with a formula, to the weights= argument:

Adjust σ for each gender

> res.lme2 = update(res.lme, weights = varIdent(form = ~1 |
+     Sex))

> plot(compareFits(ranef(res.lme), ranef(res.lme2)))

> plot(comparePred(res.lme, res.lme2))

Compare random effects BLUPs for two models

[Figure: two panels, (Intercept) (-4 to 4) and I(age - 11) (-0.2 to 0.4), with one row per subject (M01 to M16, F01 to F11); legend: ranef(res.lme), ranef(res.lme2)]

Compare the different predictions

[Figure: one panel per subject (M01 to M16, F01 to F11); x-axis: Age (yr), 8 to 14; y-axis: Distance from pituitary to pterygomaxillary fissure (mm), 20 to 30; legend: res.lme, res.lme2]

Compare models using anova()

These nested models can be formally compared using a likelihood ratio test. The details are carried out by the anova() method:

compare nested models

> anova(res.lme, res.lme2)
         Model df   AIC   BIC logLik   Test L.Ratio p-value
res.lme      1  6 454.6 470.6 -221.3
res.lme2     2  7 435.6 454.3 -210.8 1 vs 2   20.99  <.0001
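The comparison can be rerun from scratch with the Orthodont data from nlme. In this sketch the object names m1, m2 and cmp are hypothetical; m2 allows a separate residual standard deviation for each level of Sex through varIdent(), which adds one parameter to the model.

```r
library(nlme)

# Common residual variance for all subjects
m1 <- lme(distance ~ I(age - 11), data = Orthodont,
          random = ~ I(age - 11) | Subject)

# Separate residual standard deviation for each level of Sex
m2 <- update(m1, weights = varIdent(form = ~ 1 | Sex))

# Likelihood ratio test of the nested models
cmp <- anova(m1, m2)
print(cmp)
```

The test is valid here because both models share the same fixed effects, so their restricted likelihoods are comparable.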

Extras

Extending R: writing functions

R can be extended by writing functions. These may be defined in separate files and read into R, or defined within an R session. For instance, how might one create the following diagram?

Pizza and pepperoni

helper functions

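# Draw a circle of the given radius centered at the point x = c(x1, x2)
plotCircle = function(x, radius = 1, ...) {
  t = seq(0, 2*pi, length = 100)
  polygon(x[1] + radius*cos(t), x[2] + radius*sin(t), ...)
}

# Draw a circle of radius r at x only if it fits inside the circle of radius R
doPlot = function(x, r, R, ...) {
  # r = r abbreviates the argument name radius
  if (sqrt(sum(x^2)) + r < R) plotCircle(x, r = r, ...)
}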

Key points

Functions are defined using function()

Arguments are matched by position or name

Named arguments may be abbreviated

The special argument ... is used to pass along extra arguments

Command blocks are indicated using braces

Functions can be defined on the command line, used anonymously, or be stored in files to be sourced in.
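The key points above can be seen in a few lines. The function scaleBy() below is a hypothetical helper written only for this illustration; it multiplies its input by a factor and passes any extra arguments on to round().

```r
# Hypothetical helper: multiply x by a factor, then round.
# Extra arguments in ... are passed along to round().
scaleBy <- function(x, factor = 2, ...) {
  round(x * factor, ...)
}

by_position <- scaleBy(1.234)                        # x matched by position
by_name     <- scaleBy(x = 1.234, factor = 3)        # arguments matched by name
abbreviated <- scaleBy(1.234, fac = 3, digits = 1)   # fac abbreviates factor;
                                                     # digits goes to round()

# An anonymous function, defined where it is used
squares <- sapply(1:3, function(i) i^2)

print(c(by_position, by_name, abbreviated))
print(squares)
```

Abbreviation only works for arguments listed before ... in the definition; anything after ... must be named in full.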

Calling function

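# Draw a pizza of radius R with pepperoni of radius r placed on an integer grid
plotPizza = function(n, R = n, r = 0.2) {
  par(mai = c(0, 0, 0, 0))
  plot.new(); plot.window(xlim = c(-n, n), ylim = c(-n, n), asp = 1)
  plotCircle(c(0, 0), radius = R, lwd = 2)
  # all grid points (x, y) with integer coordinates in -n:n
  x = rep(-n:n, rep(2*n + 1, 2*n + 1))
  y = rep(-n:n, length.out = (2*n + 1)^2)
  # draw a pepperoni at each grid point that fits inside the pizza
  apply(cbind(x, y), 1, function(x) doPlot(x, r, R, col = gray(.5)))
}

plotPizza(5)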

Extending R with add-on packages

Extending R: add-on packages

One can extend R using add-on packages. Some are built in (≈ 10), over 400 contributed packages are hosted on CRAN (http://cran.r-project.org), and others are on authors' websites. For instance, neglecting issues with permissions, downloading and installing a basic GUI for R can be done with the command

Installing a package

> install.packages("Rcmdr")

This GUI, available for the three main platforms, allows one to select variables and fill in function arguments with a mouse. The command install.packages(), called with no arguments, allows one to browse the available packages.
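In scripts it is common to install a package only when it is missing. The sketch below wraps this in a hypothetical helper, ensurePackage(), which is not part of R; it relies on the standard requireNamespace() and install.packages() functions and only attempts an installation when asked to.

```r
# Hypothetical helper: returns TRUE if the package can be loaded.
# With install = TRUE a missing package is first installed from CRAN,
# which needs network access and write permission to the library.
ensurePackage <- function(pkg, install = FALSE) {
  if (!requireNamespace(pkg, quietly = TRUE) && install) {
    install.packages(pkg)
  }
  requireNamespace(pkg, quietly = TRUE)
}

have_stats <- ensurePackage("stats")   # a base package, always available
print(have_stats)
```

Once a package is installed, library(Rcmdr) (or any other package name) loads it into the current session.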

Learning more

R offers several ways to learn more. The first is the built-in help pages. Some key functions are:

help.start() – to start the web interface

?functionName – to find specific help for the named function

apropos("word") – to search through the search list for a word

help.search() – matches in more places than apropos()
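The help functions above are mostly interactive, but apropos() returns an ordinary character vector and so is easy to try in a script. The search pattern below is just an example.

```r
# Names of objects on the search list that start with "read."
readers <- apropos("^read\\.")
print(head(readers))

# Interactive equivalents, shown as comments because they open a viewer:
#   help.start()               # browser-based help
#   ?read.table                # help page for one function
#   help.search("regression")  # search titles and keywords of installed help pages
```

The argument to apropos() is treated as a regular expression, so the ^ anchors the match to the start of the name.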

More free documentation

The accompanying manuals: Included with R are 5 manuals in pdf or html form. An Introduction to R contains lots of information.

Contributed documentation: On the R project webpage, http://www.r-project.org, is a link to contributed documentation. There are quite a few documents of significant size, including a few that have made it into book form.

The R mailing list: The R mailing list is full of information. Questions should only be asked after reading the FAQ, or you are likely to get a rather terse response.

Books on R (Also numerous S-plus titles)

Verzani, Using R for Introductory Statistics

Dalgaard, Introductory Statistics with R

Maindonald and Braun, Data Analysis and Graphics Using R

Fox, An R and S-Plus Companion to Applied Regression

Faraway, Linear Models with R

Heiberger and Holland, Statistical Analysis and Data Display...

Venables and Ripley, Modern Applied Statistics with S

Pinheiro and Bates, Mixed-Effects Models in S and S-Plus

Venables and Ripley, S Programming

Chambers and Hastie, Statistical Models in S

Chambers, Programming with Data

Murrell, R Graphics