R Lecture 5

Naomi Altman

Department of Statistics

Example: Regression

The data are available at http://www.stat.psu.edu/~jls/stat511/homework/body.dat

?read.table

body=read.table("body.txt",header=TRUE)

plot(body$hips,body$weight)

plot(body$waist,body$weight)

?formula

lm.out=lm(weight~hips+waist,data=body)

attributes(lm.out)

Formulas

lm fits the regression of Y on a set of X variables.

The variable for Y and the predictors are denoted by a formula of the form Y~X1+X2, as in weight~hips+waist above.

You can also use formulas in other contexts, e.g.

plot(weight~waist, data=body)
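
A small sketch of a formula treated as an ordinary object. It assumes the built-in cars data set as a stand-in for the body data, so dist and speed play the roles of weight and waist.

```r
# Built-in 'cars' data stand in for the body data (assumption).
f <- dist ~ speed          # a formula is an ordinary R object
class(f)                   # "formula"
all.vars(f)                # "dist" "speed"

# Response on the left of ~, predictors on the right.
fit <- lm(f, data = cars)
coef(fit)

# The same formula can drive a plot.
plot(f, data = cars)
```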

Object Oriented Programming in R

or how a bunch of smart programming types made R easier to use and harder to program - at least in the eyes of a statistician

In the bad old days

If I wanted to write a function similar to something already in R, I would edit the R code:

myFun=edit(Rfun)

myDensity=edit(density)

Sometimes the R code would call a C or C++ program, but the code for that is also available.

But now ...

plot

boxplot

rnorm
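
What typing these three names prints depends on the R version, so the following is only a sketch. In current R, plot and boxplot are expected to be short stubs that call UseMethod, and rnorm is expected to hand off to compiled code, so none of them shows the working code directly. getS3method is one way to reach a function that does the work.

```r
# Turn the functions into text so they can be inspected.
plot.src    <- deparse(plot)
boxplot.src <- deparse(boxplot)
rnorm.src   <- deparse(rnorm)

any(grepl("UseMethod", plot.src))              # the generic only dispatches
any(grepl("UseMethod", boxplot.src))
any(grepl(".Call", rnorm.src, fixed = TRUE))   # work is done in compiled code

# The code that actually draws a boxplot lives in a method.
boxplot.method <- getS3method("boxplot", "default")
is.function(boxplot.method)
```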

Classes and Generic Functions

I have already mentioned that one of the attributes an R object can have is a class.

A generic function is a function that captures the class of an object and then calls another function to do the actual work. If the function is called fun and the class is called cls, the function that does the work is (almost always) called fun.cls.

If there is no suitable fun.cls, then fun.default is used.
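
A minimal sketch of this naming rule. The generic describe and the class "cls" are hypothetical names made up for illustration.

```r
# Hypothetical generic and class names, for illustration only.
describe <- function(object, ...) UseMethod("describe")

# Method used for objects of class "cls".
describe.cls <- function(object, ...) "describe.cls did the work"

# Fallback used when no method matches the class.
describe.default <- function(object, ...) "describe.default did the work"

x <- structure(list(value = 1), class = "cls")
with.class    <- describe(x)     # dispatches to describe.cls
without.class <- describe(1:3)   # no matching method, so describe.default
with.class
without.class
```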

e.g.

plot(body$hips,body$weight)

plot(lm.out)

plot.default

plot.lm

methods(plot)

Classes

Actually, a class can be a pair c("first","second") in which "first" "inherits from", i.e. is a special case of, "second". In practice, this means that a "first" object has all the components of class "second" objects but possibly some additional ones.

If there is no fun.first, then the generic function will search for fun.second. Only if there is also no fun.second will fun.default be used.
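
The search order can be sketched with the names used above. All functions here are toy definitions written for this example.

```r
fun <- function(object, ...) UseMethod("fun")
fun.second  <- function(object, ...) "fun.second"
fun.default <- function(object, ...) "fun.default"

obj <- structure(list(), class = c("first", "second"))
before <- fun(obj)    # no fun.first yet, so fun.second is used

# Once a method for "first" exists, it takes priority.
fun.first <- function(object, ...) "fun.first"
after <- fun(obj)

# An object with neither class falls through to fun.default.
other <- fun(structure(list(), class = "other"))

c(before, after, other)
```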

e.g. plot uses plot.lm on an object with class "lm" and also on an object with class c("glm","lm").

'inherits' indicates whether its first argument inherits from any of the classes specified in the 'what' argument

glm.out=glm(weight~hips+waist,data=body)

class(glm.out)

"glm" "lm"

inherits(lm.out,"lm")

inherits(glm.out,"lm")

inherits(lm.out,"glm")

inherits(glm.out,"glm")

plot.lm

plot.glm

plot(glm.out)
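
The same checks can be run without the body data. This sketch assumes the built-in cars data set as a stand-in. In a plain R session I would expect has.plot.glm to be FALSE, which is why plotting a "glm" object ends up in plot.lm, but that depends on which packages are loaded.

```r
# 'cars' is a built-in data set used as a stand-in (assumption).
lm.fit  <- lm(dist ~ speed, data = cars)
glm.fit <- glm(dist ~ speed, data = cars)   # default family is gaussian

class(lm.fit)              # "lm"
class(glm.fit)             # "glm" "lm"

inherits(lm.fit,  "lm")    # TRUE
inherits(glm.fit, "lm")    # TRUE
inherits(lm.fit,  "glm")   # FALSE
inherits(glm.fit, "glm")   # TRUE

# Look up registered methods; optional = TRUE gives NULL instead of an error.
has.plot.lm  <- !is.null(getS3method("plot", "lm",  optional = TRUE))
has.plot.glm <- !is.null(getS3method("plot", "glm", optional = TRUE))
c(plot.lm = has.plot.lm, plot.glm = has.plot.glm)
```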

unclass

If you remove the class, most objects are just lists.

lm.out

unclass(lm.out)

For example, the "lm" objects are lists with the following components:

"coefficients" "residuals" "effects" "rank" "fitted.values" "assign" "qr" "df.residual" "xlevels" "call" "terms" "model"

Some of these components are obvious. Some of them are matrix computations that can be used to compute, e.g., the leverages and Cook's distance (notice that these have not been stored). Some of them may be empty; they are used primarily when the predictor variable is a factor (ANOVA).
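
A sketch of how a stored matrix computation gives a quantity that is not stored. It assumes the built-in cars data and a full-rank fit. The leverages are the diagonal of the hat matrix Q Q', so they are the row sums of the squared entries of Q.

```r
fit <- lm(dist ~ speed, data = cars)   # built-in data as a stand-in (assumption)

parts <- unclass(fit)      # same contents, class attribute removed
is.list(parts)
names(parts)

# Leverages are not stored, but the stored QR decomposition yields them.
Q <- qr.Q(fit$qr)
leverage <- rowSums(Q^2)

# Compare with the built-in extractor.
all.equal(unname(leverage), unname(hatvalues(fit)))
```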

Why use classes

For the user: less to think about

e.g. you can try generic functions like plot and summary with any output

For the programmer: provides a framework

e.g. you might think about having a plot.myfun and summary.myfun for the function you are writing

also, you can use inheritance so that you do not need to write your own functions
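
A sketch of what that framework might look like for a function of your own. The function myfun and the contents of its output are hypothetical. Only the naming pattern print.myfun and summary.myfun is the point.

```r
# Hypothetical function whose output gets its own class "myfun".
myfun <- function(x) {
  out <- list(mean = mean(x), sd = sd(x), n = length(x))
  class(out) <- "myfun"
  out
}

# print method: what the user sees when the object is typed at the prompt.
print.myfun <- function(x, ...) {
  cat("myfun result: mean", format(x$mean), "from", x$n, "values\n")
  invisible(x)
}

# summary method: return the key quantities as a named vector.
summary.myfun <- function(object, ...) {
  c(mean = object$mean, sd = object$sd, n = object$n)
}

res <- myfun(c(2, 4, 6))
print(res)
summary(res)
```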

Generic Functions

Functions that act on many different types of objects are termed "generic functions".

Examples include:

plot print

summary coefficients

anova residuals

Generic Functions

We have already seen that generic functions behave differently for different classes. The idea is that the user should not have to remember a lot of different function names.

Generic functions are a "good thing" when you want R to do what someone else thinks it should do and can be a "bad thing" when you are trying to do something else with your data.

Generic Functions

The form of the generic function "genfun" is

genfun=function (object, ...) {

UseMethod("genfun")

}

Generic Functions

We can use UseMethod to give aliases to the same function.

genfun=function (object, ...){

UseMethod("genfun")}

gen=function (object, ...){

UseMethod("genfun")}

gfun=function (object, ...){

UseMethod("genfun")}
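
A sketch showing that an alias dispatches to the same methods, because the method is looked up under the name given to UseMethod and not under the name of the calling function. The class "cls" and the two methods are toy definitions.

```r
genfun <- function(object, ...) UseMethod("genfun")
gen    <- function(object, ...) UseMethod("genfun")   # alias

genfun.cls     <- function(object, ...) "genfun.cls"
genfun.default <- function(object, ...) "genfun.default"

x <- structure(list(), class = "cls")
via.genfun <- genfun(x)
via.gen    <- gen(x)      # there is no gen.cls; genfun.cls is used
identical(via.genfun, via.gen)
```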

Generic Functions

If you want an argument other than the first to be the one whose class controls the generic function, then that argument must be passed to UseMethod as its second argument.

genfun=function(x,y,z,...){

UseMethod("genfun",z)

}
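
A short sketch of dispatch on the third argument. The methods and the class "cls" are toy definitions for this example.

```r
genfun <- function(x, y, z, ...) UseMethod("genfun", z)
genfun.cls     <- function(x, y, z, ...) "method chosen by class of z"
genfun.default <- function(x, y, z, ...) "default method"

tagged <- structure(list(), class = "cls")
r1 <- genfun(tagged, 2, 3)   # x has class "cls" but z does not
r2 <- genfun(1, 2, tagged)   # z has class "cls"
c(r1, r2)
```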

Generic Functions

If UseMethod finds that the calling object inherits from a class, it searches for a function "genfun.class". If there is no function that matches the class, it looks through the inheritance list. If there is no match, or no class, the function "genfun.default" is used.

Generic Functions

There is a lot more on this in the "S Poetry" manual - it looks very complete to me.

I have been writing programs in S/R since 1981, and have not needed to create classes or methods but ...

Generic Functions

I have often used an existing function to create new functions - I have been confused by failing to understand generic functions (especially "summary" and "print").
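
One source of that confusion can be sketched as follows, assuming the built-in cars data. summary does not print anything itself. It returns an object with its own class, and the display comes from a separate print method for that class.

```r
fit <- lm(dist ~ speed, data = cars)   # built-in data as a stand-in (assumption)

s <- summary(fit)          # dispatches to summary.lm
class(s)                   # "summary.lm"

# Typing 's' at the prompt calls print(s), which dispatches on "summary.lm".
print.method <- getS3method("print", "summary.lm")
is.function(print.method)

# The numbers in the printed table are stored in the summary object.
s$coefficients
s$r.squared
```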

One way to become well-known is to distribute your methodology as an R package. To be distributed from CRAN or other project repositories, your package must adhere to R programming standards.

Generic Functions

Some of the newer packages (particularly packages for bioinformatics) rely heavily on the use of Generic Functions, and you can never understand what they are doing without understanding at least the basics of this material.

Slots

I was not able to find an intuitive definition for "slot" so this is my own heuristic.

An object is a list with a class.

A slot is a function that extracts data from an object.

It may be one of the elements stored in the object, or a derived data element.

Slots

For example: an lm object includes the list:

"coefficients" "residuals" "effects" "rank" "fitted.values" "assign" "qr" "df.residual" "xlevels" "call" "terms" "model"

We might build a new class, "Elm" (extended "lm")

Slots

Suppose we wanted to write a method that draws a histogram of any of the dependent variable, the residuals, the studentized residuals, or the fitted values.

We could have a method of the form:

hist.Elm=function(object,slot)

Our slots would be: dependent, residuals, student, fitted

Slots

If we set class(lm.out)=c("Elm","lm")

then

hist(lm.out,"residuals") would extract the residuals from the list and draw the histogram.

hist(lm.out,"student") would compute the studentized residuals (which are not stored) and draw the histogram.
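
One possible sketch of such a method, not a definitive implementation. It assumes the built-in cars data, takes the slot name as a quoted string, and names the first argument x to match the hist generic. Because "Elm" inherits from "lm", extractors such as rstudent still find their "lm" methods.

```r
# Sketch of the hypothetical hist.Elm method; 'slot' names the quantity to plot.
hist.Elm <- function(x, slot = c("dependent", "residuals", "student", "fitted"), ...) {
  slot <- match.arg(slot)
  values <- switch(slot,
    dependent = model.response(model.frame(x)),  # stored in the "model" component
    residuals = residuals(x),                    # stored
    student   = rstudent(x),                     # computed on demand, not stored
    fitted    = fitted(x))                       # stored
  # Hand the numeric vector to the default histogram method.
  hist(values, main = paste("Histogram of", slot), xlab = slot, ...)
}

fit <- lm(dist ~ speed, data = cars)   # built-in data as a stand-in (assumption)
class(fit) <- c("Elm", "lm")           # "Elm" objects still inherit the "lm" methods

hist(fit, "residuals")
hist(fit, "student")
```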

Slots

By convention, the slots of an object can be extracted either by:

objectname@slotname

or

slotname(objectname)
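
In the formal (S4) class system, a slot is a named, typed component declared with the class. The sketch below shows both access styles. The class "Scores" and the accessor values are hypothetical names, and the accessor has to be written by the class author. It does not exist automatically.

```r
library(methods)   # normally attached already

# Hypothetical S4 class with two slots.
setClass("Scores", representation(id = "character", values = "numeric"))

obj <- new("Scores", id = "a", values = c(1.5, 2.5, 4))

obj@values            # direct slot access
slot(obj, "values")   # the same, with the slot name as a string
slotNames("Scores")   # "id" "values"

# An accessor function named after the slot, as in slotname(objectname).
setGeneric("values", function(object) standardGeneric("values"))
setMethod("values", "Scores", function(object) object@values)
values(obj)
```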

Slots

Again, I have used S/R for many years without writing or even encountering slots.

But some of the recent packages use this programming concept, so it is important to understand it.

My understanding is that slots are used primarily in areas like data-mining and microarrays, where the data storage requirements are large.

Learning to Use Objects and other Extensions

Calling C or C++ from R:

Writing R extensions

Object oriented programming in R

(S3 protocol)

R Language Definition

(S4 protocol)

R Internals