R Lecture 5 Naomi Altman Department of Statistics

Post on 25-Feb-2016

Page 1: R Lecture 5

R Lecture 5

Naomi Altman Department of Statistics

Page 2: R Lecture 5

Example: Regression

The data are available at http://www.stat.psu.edu/~jls/stat511/homework/body.dat

?read.table
body=read.table("body.txt",header=TRUE)
plot(body$hips,body$weight)
plot(body$waist,body$weight)
?formula
lm.out=lm(weight~hips+waist,data=body)
attributes(lm.out)

Page 3: R Lecture 5

Formulas

lm fits the regression of Y on a set of X variables. The variable for Y and the predictors are denoted by a formula of the form Y ~ X1 + X2 + ...

You can also use formulas in other contexts. e.g.

plot(weight~waist, data=body)
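A minimal sketch of a formula as an ordinary R object. The data frame toy is a made-up stand-in for the body data (simulated numbers, for illustration only), so the fitted coefficients mean nothing.

```r
# Simulated stand-in for the body data (hypothetical values, not the real file)
set.seed(1)
toy <- data.frame(hips = rnorm(30, 100, 8), waist = rnorm(30, 85, 10))
toy$weight <- 0.5 * toy$hips + 0.6 * toy$waist + rnorm(30, 0, 3)

f <- weight ~ hips + waist   # a formula can be stored like any other object
class(f)                     # "formula"
all.vars(f)                  # the variable names the formula refers to

fit <- lm(f, data = toy)     # the stored formula drives the regression
coef(fit)                    # intercept plus one coefficient per predictor
```

The same object f could be handed to other formula-aware functions, which is why the response goes on the left of ~ and the predictors on the right.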

Page 4: R Lecture 5

Object Oriented Programming in R

or how a bunch of smart programming types made R easier to use and harder to program - at least in the eyes of a statistician

Page 5: R Lecture 5

In the bad old days

If I wanted to write a function similar to something already in R, I would edit the R code:

myFun=edit(Rfun)

myDensity=edit(density)

Sometimes the R code would call a C or C++ program, but the code for that is also available.

Page 6: R Lecture 5

But now ...

plot
boxplot
rnorm
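A short sketch of what typing these names shows, and of one way to reach the working code. It uses only functions shipped with R (methods, getS3method, body); the variable name bp_default is just a label chosen for this example.

```r
# Printing a generic shows only the dispatch stub, not the working code
print(boxplot)

# methods() lists the class-specific functions that sit behind the generic
methods(boxplot)

# getS3method() retrieves one of them, even if it is not exported
bp_default <- getS3method("boxplot", "default")
head(deparse(bp_default), 5)

# rnorm is not generic, but its body simply hands off to compiled code
body(rnorm)
```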

Page 7: R Lecture 5

Classes and Generic Functions

I have already mentioned that one of the attributes an R object can have is a class.

A generic function is a function that captures the class of an object and then calls another function to do the actual work. If the function is called fun and the class is called cls, the function that does the work is (almost always) called fun.cls.

If there is no suitable fun.cls, then fun.default is used.
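The naming rule can be tried out with a toy generic. The names describe and cls below are hypothetical, chosen only to mirror fun and cls in the text.

```r
# Hypothetical generic with one class-specific method and a default
describe <- function(x, ...) UseMethod("describe")
describe.cls <- function(x, ...) "describe.cls was called"
describe.default <- function(x, ...) "describe.default was called"

obj <- structure(list(value = 1), class = "cls")

describe(obj)  # the class is "cls", so describe.cls does the work
describe(42)   # no method matches a plain number, so describe.default is used
```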

Page 8: R Lecture 5

e.g.

plot(body$hips,body$weight)
plot(lm.out)

plot.default
plot.lm
methods(plot)

Page 9: R Lecture 5

Classes

Actually, a class can be a pair c("first","second") in which "first" "inherits from", i.e. is a special case of, "second". In practice, this means that an object of class "first" has all the components of class "second" objects but possibly some additional ones.

If there is no fun.first, then the generic function will search for fun.second. Only if there is also no fun.second will fun.default be used.
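The search order can be checked with a small sketch. The generic show_it and the class names are hypothetical; only the lookup behaviour is the point.

```r
# Hypothetical generic with a method for "second" and a default, none for "first"
show_it <- function(x, ...) UseMethod("show_it")
show_it.second <- function(x, ...) "show_it.second"
show_it.default <- function(x, ...) "show_it.default"

a <- structure(list(), class = c("first", "second"))
b <- structure(list(), class = "third")

show_it(a)  # no show_it.first exists, so show_it.second is found next
show_it(b)  # neither show_it.third nor an inherited method exists: default
```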

Page 10: R Lecture 5

e.g. plot uses plot.lm on an object with class "lm" and also on an object with class c("glm","lm")

Page 11: R Lecture 5

'inherits' indicates whether its first argument inherits from any of the classes specified in the 'what' argument

glm.out=glm(weight~hips+waist,data=body)

class(glm.out)
"glm" "lm"

inherits(lm.out,"lm")
inherits(glm.out,"lm")
inherits(lm.out,"glm")
inherits(glm.out,"glm")

plot.lm
plot.glm

plot(glm.out)
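A runnable version of the same checks, as a sketch. Because the body data may not be at hand, it substitutes the cars data set that ships with R; the names lm.fit and glm.fit are chosen for this example.

```r
# Same model fitted two ways, using the built-in cars data as a stand-in
lm.fit  <- lm(dist ~ speed, data = cars)
glm.fit <- glm(dist ~ speed, data = cars)

class(lm.fit)    # "lm"
class(glm.fit)   # "glm" "lm"

inherits(lm.fit,  "lm")    # TRUE
inherits(glm.fit, "lm")    # TRUE: a glm fit is a special case of an lm fit
inherits(lm.fit,  "glm")   # FALSE: inheritance runs one way only
inherits(glm.fit, "glm")   # TRUE

# Look for a plot method specific to "glm"; NULL means none is registered,
# in which case plot(glm.fit) falls through to the "lm" method
getS3method("plot", "glm", optional = TRUE)
```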

Page 12: R Lecture 5

unclass

If you remove the class, most objects are just lists.

lm.out

unclass(lm.out)

For example, the "lm" objects are lists with the following components:

"coefficients" "residuals" "effects" "rank" "fitted.values" "assign" "qr" "df.residual" "xlevels" "call" "terms" "model"

Some of these components are obvious. Some of them are matrix computations that can be used to compute, e.g., the leverages and Cook's Distance (notice that these have not been stored). Some of them are empty here; they are used primarily when the predictor variable is a factor (ANOVA).
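A sketch of looking inside a fitted model, again using the built-in cars data as a stand-in for the body data. hatvalues and cooks.distance are the standard functions that derive the leverages and Cook's distance from the stored pieces.

```r
fit <- lm(dist ~ speed, data = cars)

bare <- unclass(fit)       # strip the class: what remains is a plain list
is.list(bare)
names(bare)                # the components listed above

# Leverages and Cook's distance are not stored; they are computed on demand
lev <- hatvalues(fit)
cd  <- cooks.distance(fit)
sum(lev)                   # leverages sum to the number of coefficients

# With no factor predictors, xlevels is present but empty
length(fit$xlevels)
```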

Page 13: R Lecture 5

Why use classes

For the user: less to think about, e.g. you can try generic functions like plot and summary with any output.

For the programmer: provides a framework, e.g. you might think about having a plot.myfun and summary.myfun for the function you are writing. Also, you can use inheritance so that existing methods are reused and you do not need to write your own.
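A minimal sketch of that framework for a hypothetical function myfun. The function, its class name and the contents of its result are all invented for illustration.

```r
# Hypothetical function whose result carries the class "myfun"
myfun <- function(x) {
  structure(list(mean = mean(x), n = length(x)), class = "myfun")
}

# print method: used automatically whenever a "myfun" object is displayed
print.myfun <- function(x, ...) {
  cat("myfun result: mean", x$mean, "from", x$n, "values\n")
  invisible(x)
}

# summary method: the user keeps calling summary(), as for any other output
summary.myfun <- function(object, ...) {
  c(mean = object$mean, n = object$n)
}

out <- myfun(c(2, 4, 6))
print(out)
summary(out)
```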

Page 14: R Lecture 5

Generic Functions

Functions that act on many different types of objects are termed "generic functions".

Examples include:

plot, print, summary, coefficients, anova, residuals

Page 15: R Lecture 5

Generic Functions

We have already seen that generic functions behave differently for different classes. The idea is that the user should not have to remember a lot of different function names.

Generic functions are a "good thing" when you want R to do what someone else thinks it should do and can be a "bad thing" when you are trying to do something else with your data.

Page 16: R Lecture 5

Generic Functions

The form of the generic function "genfun" is

genfun=function (object, ...) { UseMethod("genfun") }

Page 17: R Lecture 5

Generic Functions

We can use UseMethod to give aliases to the same function.

genfun=function (object, ...){ UseMethod("genfun")}
gen=function (object, ...){ UseMethod("genfun")}
gfun=function (object, ...){ UseMethod("genfun")}
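A sketch showing that the aliases reach the same methods. The methods genfun.foo and genfun.default and the class "foo" are hypothetical additions so that there is something to dispatch to.

```r
genfun <- function(object, ...) UseMethod("genfun")
gen    <- function(object, ...) UseMethod("genfun")   # alias for genfun

# Hypothetical methods, written once under the name "genfun"
genfun.foo     <- function(object, ...) "genfun.foo"
genfun.default <- function(object, ...) "genfun.default"

x <- structure(1, class = "foo")

genfun(x)   # dispatches to genfun.foo
gen(x)      # the alias dispatches to the same genfun.foo
gen(1:3)    # and falls back to genfun.default in the same way
```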

Page 18: R Lecture 5

Generic Functions

If you want an argument other than the first to be the one whose class controls the generic function, then that argument must be passed to UseMethod as its second argument:

genfun=function(x,y,z,...){ UseMethod("genfun",z) }
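A sketch of dispatch on the third argument. The two methods are hypothetical, and the lm fit on the built-in cars data is there only to supply an object with a class.

```r
# The class of z, not of x, decides which method runs
genfun <- function(x, y, z, ...) UseMethod("genfun", z)

# Hypothetical methods for illustration
genfun.lm      <- function(x, y, z, ...) "genfun.lm"
genfun.default <- function(x, y, z, ...) "genfun.default"

fit <- lm(dist ~ speed, data = cars)

genfun(1, 2, fit)   # z has class "lm", so genfun.lm is used
genfun(fit, 2, 3)   # the class of x is ignored; z is a plain number
```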

Page 19: R Lecture 5

Generic Functions

If UseMethod finds that the calling object inherits from a class, it searches for a function "genfun.class". If there is no function that matches the class, it looks through the inheritance list. If there is no match, or no class, the function "genfun.default" is used.

Page 20: R Lecture 5

Generic Functions

There is a lot more on this in the "S Poetry" manual - it looks very complete to me.

I have been writing programs in S/R since 1981, and have not needed to create classes or methods but ...

Page 21: R Lecture 5

Generic Functions

I have often used an existing function to create new functions, and I have been confused when I failed to understand generic functions (especially "summary" and "print").
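One common source of that confusion, sketched with the built-in cars data: summary does not print anything itself. It returns an object with its own class, and a separate print method produces the display.

```r
fit <- lm(dist ~ speed, data = cars)

s <- summary(fit)    # summary.lm does the computing and returns an object
class(s)             # "summary.lm"
names(s)             # the computed pieces are ordinary list components
s$r.squared          # so they can be extracted directly

# The familiar table is drawn by a different function, the print method
# for class "summary.lm"
pm <- getS3method("print", "summary.lm")
is.function(pm)
```

Editing summary.lm alone therefore changes what is computed but not how it is displayed, and vice versa.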

One way to become well-known is to distribute your methodology as an R package. To be distributed from CRAN or other project repositories, your package must adhere to R programming standards.

Page 22: R Lecture 5

Generic Functions

Some of the newer packages (particularly packages for bioinformatics) rely heavily on the use of generic functions, and you can never understand what they are doing without understanding at least the basics of this material.

Page 23: R Lecture 5

Slots

I was not able to find an intuitive definition for "slot" so this is my own heuristic.

An object is a list with a class. A slot is a function that extracts data from an object. It may be one of the elements stored in the object, or a derived data element.

Page 24: R Lecture 5

Slots

For example: an lm object includes the list:

"coefficients" "residuals" "effects" "rank" "fitted.values" "assign" "qr" "df.residual" "xlevels" "call" "terms" "model"

We might build a new class, "Elm" (extended "lm")

Page 25: R Lecture 5

Slots

Suppose we wanted to write a method that draws a histogram of any of dependent variable, residuals, studentized residuals, fitted values.

We could have a method of the form:
hist.Elm=function(object,slot)

Our slots would be: dependent, residuals, student, fitted

Page 26: R Lecture 5

Slots

If we set class(lm.out)=c("Elm","lm") then hist(lm.out,residuals) would extract the residuals from the list and draw the histogram.

hist(lm.out,student) would compute the studentized residuals (which are not stored) and draw the histogram.
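One way the hist.Elm idea could be written, as a sketch rather than the lecture's own code. It assumes the slot is given as a quoted string, names the first argument x to match the hist generic, and uses the built-in cars data in place of the body data.

```r
# Sketch of a hist method for the hypothetical class "Elm"
hist.Elm <- function(x, slot = c("dependent", "residuals", "student", "fitted"), ...) {
  slot <- match.arg(slot)
  values <- switch(slot,
    dependent = model.response(model.frame(x)),  # stored in the model frame
    residuals = residuals(x),                    # stored in the object
    student   = rstudent(x),                     # not stored: computed here
    fitted    = fitted(x))                       # stored in the object
  hist(values, ...)                              # numeric vector: hist.default
}

fit <- lm(dist ~ speed, data = cars)
class(fit) <- c("Elm", "lm")       # "Elm" extends "lm"

# plot = FALSE returns the bin counts without opening a graphics window
h <- hist(fit, "student", plot = FALSE)
sum(h$counts)                      # one studentized residual per observation
```

Dropping plot = FALSE draws the histogram. Because "lm" is still in the class vector, residuals, rstudent and the other lm methods keep working on the object.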

Page 27: R Lecture 5

Slots

By convention, the slots of an object can be extracted either by:

objectname@slotname

or

slotname(objectname)
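A small sketch of both extraction styles using R's formal (S4) classes, which is where the @ operator applies. The class "Fit", its slots and the accessor resids are hypothetical names invented for this example.

```r
library(methods)

# Hypothetical S4 class with two declared slots
setClass("Fit", representation(coefficients = "numeric", residuals = "numeric"))

obj <- new("Fit", coefficients = c(a = 1, b = 2), residuals = c(0.1, -0.1))

obj@residuals            # objectname@slotname
slot(obj, "residuals")   # the same extraction written as a function call

# An accessor in the slotname(objectname) style has to be defined explicitly
setGeneric("resids", function(object) standardGeneric("resids"))
setMethod("resids", "Fit", function(object) object@residuals)
resids(obj)

slotNames("Fit")         # the slots declared for the class
```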

Page 28: R Lecture 5

Slots

Again, I have used S/R for many years without writing or even encountering slots. But some of the recent packages use this programming concept, so it is important to understand it.

My understanding is that slots are used primarily in areas like data-mining and microarrays, where the data storage requirements are large.

Page 29: R Lecture 5

Learning to Use Objects and other Extensions

Calling C or C++ from R: Writing R Extensions

Object oriented programming in R:
(S3 protocol) R Language Definition
(S4 protocol) R Internals