
Advanced Statistics Using R
Stephen Cox, stephen.cox@ttu.edu
Spring 2007

Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.
H. G. Wells

Data analysis is an aid to thinking and not a replacement for it.
Richard Shillington

“Organic chemist!”, said Tilley disdainfully. “Probably knows no statistics whatever.”
Nigel Balchin, The Small Back Room

Statistics means never having to say you’re certain.
Philip Stark

Before the curse of statistics fell upon mankind we lived a happy, innocent life, full of merriment and go, and informed by fairly good judgment.
Hilaire Belloc, The Silence of the Sea


Why R?
• An open source environment for statistical computing and visualization
  – GNU/GPL version of the S Language from Bell Laboratories
  – Highly extensible (i.e., customizable)
• Integrated suite of software facilities for data manipulation, calculation, analysis, and graphical display
  – Effective data handling and storage facility
  – Large, coherent, integrated collection of tools for data analysis
  – Graphical facilities for data analysis and display
  – A well-developed, simple, and powerful programming language


Why R?

“The term "environment" is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software.”

R is free :)

Binaries available for Windows, Mac, Linux, Unix, …


R is a programming language!
• Interpreted Language
  – Issue a command
  – R immediately gives a response (no compiling)
• Two basic ways to interact with R
  – Interactive session
    • Type in a command, get an answer
    • R commands are functions
      – output = function_name(input)
  – R Scripts (text file with name file_name.R)
    • Save a long list of commands in a text file
    • Run the script using source()
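Both ways of working can be sketched in a few lines. This is only an illustration: the script is written to a temporary file, and the file and variable names are invented for the example.

```r
# Interactive style: call a function on some input and keep the output.
x <- c(2, 4, 6, 8)
avg <- mean(x)                      # output = function_name(input)
print(avg)

# Script style: save commands in a text file ending in .R ...
script <- tempfile(fileext = ".R")  # hypothetical script location
writeLines(c("y <- sqrt(16)",
             "z <- y + 1"), script)

# ... then run every command in the file with source().
source(script)
print(z)
```

Objects created by the script (here y and z) remain available in the session after source() returns.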


Scripting!!!!
• Explicit code!
  – File merges
  – Case deletions
  – Transformations
  – Calculations
  – Analysis
  – Graphics
• Advantages
  – Retains integrity of original data
  – All manipulation of raw data is documented
  – Reduces ambiguity and number of data files
  – Reduces chances of mistakes
  – Facilitates unanticipated changes
  – Saves time in the long run!!
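A short script along these lines might look as follows. It is a sketch with made-up data; the object and column names (raw_sites, raw_counts, id, count) are assumptions chosen for illustration.

```r
# Hypothetical raw data: two small tables that share an "id" column.
raw_sites  <- data.frame(id = 1:4, site = c("A", "A", "B", "B"))
raw_counts <- data.frame(id = 1:4, count = c(10, NA, 25, 40))

# File merge: combine the two tables by their shared key.
dat <- merge(raw_sites, raw_counts, by = "id")

# Case deletion: drop the record with a missing count. The rule is written
# down here rather than applied by hand in a spreadsheet.
dat <- dat[!is.na(dat$count), ]

# Transformation: work on a log scale.
dat$log_count <- log(dat$count)

# Calculation: mean log count per site.
site_means <- tapply(dat$log_count, dat$site, mean)
print(site_means)
```

The raw objects are never modified, so any step can be changed and the whole script rerun from the original data.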


Write your own functions!

EC50.calc <- function(coef, vcov, conf.level = .95) {
  # calculates confidence interval based upon Fieller's thm.
  # assumes link is linear in dose
  call <- match.call()
  b0 <- coef[1]
  b1 <- coef[2]
  var.b0 <- vcov[1, 1]
  var.b1 <- vcov[2, 2]
  cov.b0.b1 <- vcov[1, 2]
  alpha <- 1 - conf.level
  zalpha.2 <- -qnorm(alpha / 2)
  gamma <- zalpha.2^2 * var.b1 / (b1^2)
  EC50 <- -b0 / b1
  const1 <- (gamma / (1 - gamma)) * (EC50 + cov.b0.b1 / var.b1)
  const2a <- var.b0 + 2 * cov.b0.b1 * EC50 + var.b1 * EC50^2 -
    gamma * (var.b0 - cov.b0.b1^2 / var.b1)
  const2 <- zalpha.2 / ((1 - gamma) * abs(b1)) * sqrt(const2a)
  LCL <- EC50 + const1 - const2
  UCL <- EC50 + const1 + const2
  conf.pts <- c(LCL, EC50, UCL)
  names(conf.pts) <- c("Lower", "EC50", "Upper")
  # current versions of R reject multi-argument return(), so return a list
  return(list(conf.pts = conf.pts, conf.level = conf.level, call = call))
}

EC50a.calc <- function(obj, conf.level = .95) {
  # calculates confidence interval based upon Fieller's thm.
  # modified version of EC50.calc found in P&B Fig 7.22
  # now allows other link functions, using the calculations
  # found in dose.p (MASS)
  # SBC 19 May 05
  call <- match.call()
  coef <- coef(obj)
  vcov <- summary.glm(obj)$cov.unscaled
  b0 <- coef[1]
  b1 <- coef[2]
  var.b0 <- vcov[1, 1]
  var.b1 <- vcov[2, 2]
  cov.b0.b1 <- vcov[1, 2]
  alpha <- 1 - conf.level
  zalpha.2 <- -qnorm(alpha / 2)
  gamma <- zalpha.2^2 * var.b1 / (b1^2)
  eta <- family(obj)$linkfun(.5)  # based on calcs in V&R's dose.p
  EC50 <- (eta - b0) / b1
  const1 <- (gamma / (1 - gamma)) * (EC50 + cov.b0.b1 / var.b1)
  const2a <- var.b0 + 2 * cov.b0.b1 * EC50 + var.b1 * EC50^2 -
    gamma * (var.b0 - cov.b0.b1^2 / var.b1)
  const2 <- zalpha.2 / ((1 - gamma) * abs(b1)) * sqrt(const2a)
  LCL <- EC50 + const1 - const2
  UCL <- EC50 + const1 + const2
  conf.pts <- c(LCL, EC50, UCL)
  names(conf.pts) <- c("Lower", "EC50", "Upper")
  # current versions of R reject multi-argument return(), so return a list
  return(list(conf.pts = conf.pts, conf.level = conf.level, call = call))
}

As found in Piegorsch, W. W. & Bailer, A. J. 1997. Statistics for Environmental Biology and Toxicology. Chapman and Hall, London.
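To see where the inputs to such a function come from, here is a minimal sketch that fits a logistic dose-response model and computes the EC50 point estimate by hand. The bioassay counts are invented, and the standard error uses the simpler delta method rather than Fieller's theorem, so it is not a substitute for the interval computed on the slide.

```r
# Hypothetical bioassay: 20 organisms per dose, number responding.
assay <- data.frame(dose = 0:4,
                    n    = 20,
                    dead = c(1, 4, 10, 16, 19))

# Logistic dose-response model (logit link, linear in dose).
fit <- glm(cbind(dead, n - dead) ~ dose, family = binomial, data = assay)

b <- unname(coef(fit))  # b[1] = intercept, b[2] = slope
V <- vcov(fit)          # estimated covariance matrix of the coefficients

# Point estimate: dose at which the linear predictor equals logit(0.5) = 0.
ec50 <- -b[1] / b[2]

# Delta-method standard error (an assumption of this sketch, not Fieller).
grad <- c(-1 / b[2], b[1] / b[2]^2)
se   <- sqrt(drop(t(grad) %*% V %*% grad))

print(c(EC50 = ec50, SE = se))
```

The coefficient vector and covariance matrix extracted here are the kind of objects passed as the coef and vcov arguments of a function like the one above.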


Command Window: where the action takes place


Help Menu: YOUR FRIEND!


R Libraries (aka Packages)
• Suites of predefined R code
• Available for a wide variety of topics and specific analyses
• Useful examples
  – drc: Analysis of dose-response curves
  – survival: Survival analysis, including penalised likelihood
  – nlme: Linear and nonlinear mixed effects models
  – NADA: Nondetects And Data Analysis for environmental data
  – ade4: Analysis of Environmental Data: Exploratory and Euclidean method
  – Rcmdr: R Commander (GUI)

… and many, many more…
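A minimal sketch of checking for and loading these packages from the command line. The install and load calls are commented out because installing needs a network connection and a CRAN mirror, and which packages are present depends on the machine.

```r
# Packages named on the slide.
pkgs <- c("drc", "survival", "nlme", "NADA", "ade4", "Rcmdr")

# install.packages(pkgs)   # run once per machine; needs network access

# Report which of them are already installed in this R library.
have <- pkgs %in% rownames(installed.packages())
names(have) <- pkgs
print(have)

# library(survival)        # load an installed package for the current session
```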


Installing R
• Download from CRAN site:
  – http://www.r-project.org
• Install the ‘base’ R package
  – Self-extracting installer
• Find, install R libraries (i.e., extensions)
  – Listing of many contributed packages
    • http://cran.stat.ucla.edu/src/contrib/packages.html
  – Use Google!
  – Windows …
    • Use the Packages menu in the Rgui


Installing R
• Demo


Getting data in \ out…
• Generally, two import/export options
  – Exchange via delimited ASCII file
    • R method read.table() (and variants)
  – Exchange with external file formats via add-on R package
    • RDBMS
      – ROracle: Oracle database interface for R
      – RODBC: ODBC database access
    • Commercial Statistics Packages
      – RODBC: ODBC database access
      – foreign: Read Data Stored by Minitab, S, SAS, SPSS, Stata, Systat, dBase, …
      – R.matlab: Read and write of MAT files together with R-to-Matlab connectivity


Getting data in \ out…
• A word (or two) about ASCII as opposed to binary formats
  – Universal access to the data
  – Lifespan is not limited
  – Consider it the “open source” standard for data access


Getting data in \ out…
• ASCII data import: the read.*() functions
  – read.table(): reads a delimited ASCII file (whitespace-delimited by default), creates a data frame
    • read.csv(), read.delim() ... also create a data frame
    • But have different default input parameters (read.csv() expects a header row and comma separators)
  – read.fwf(): reads a fixed-width format ASCII file
  – scan(): reads data into a vector or list from the console OR a file
• ASCII data export
  – write.table(): writes data to an ASCII text file
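A round trip through a delimited text file can be sketched as follows. The data values and the temporary file are made up for illustration.

```r
# A small data frame standing in for real data (values are invented).
d <- data.frame(site = c("A", "B", "C"), pb = c(0.5, 1.2, 3.4))

# Export: write.table() with an explicit separator gives a delimited text file.
csv_file <- tempfile(fileext = ".csv")
write.table(d, csv_file, sep = ",", row.names = FALSE, quote = FALSE)

# Import: read.csv() assumes a header row and comma separators ...
d1 <- read.csv(csv_file)

# ... while read.table() defaults to no header and whitespace separators,
# so both must be stated to read the same file.
d2 <- read.table(csv_file, header = TRUE, sep = ",")

# scan() returns a plain vector rather than a data frame.
pb_only <- scan(text = "0.5 1.2 3.4", quiet = TRUE)
```

The only difference between the two import calls is which defaults have to be overridden, which is the point of having several read.*() variants.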


Getting data in \ out…
• DEMO


Managing data …
• The data frame
> mydata = read.csv("mydata.csv")   # read a file into a data frame
> mydata[i, j]                      # row i, column j
> mydata[-i, j]                     # column j, all rows except row i
> mydata[[i]]                       # column i as a vector
> mydata$variable                   # the column named "variable"
• Manipulating data
> subset()
> merge()
> sort()
> order()
many more…
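The indexing and manipulation calls can be tried on a small in-memory data frame. This is a sketch with invented values; the column names (id, dose, resp) and the lookup table are assumptions for the example.

```r
# Made-up data frame standing in for read.csv("mydata.csv").
mydata <- data.frame(id   = 1:5,
                     dose = c(0, 10, 10, 20, 20),
                     resp = c(1.2, 2.3, 2.1, 3.8, 4.1))

mydata[2, 3]   # row 2, column 3
mydata[-1, ]   # everything except row 1
mydata[[3]]    # third column as a vector
mydata$resp    # same column, by name

# subset(): keep rows that meet a condition.
high <- subset(mydata, dose >= 10)

# merge(): join with a second table on a shared column.
info <- data.frame(dose = c(0, 10, 20), label = c("control", "low", "high"))
both <- merge(mydata, info, by = "dose")

# order(): row positions that sort a column. sort() works on vectors,
# so rows of a data frame are sorted by indexing with order().
by_resp <- mydata[order(mydata$resp, decreasing = TRUE), ]
```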


Managing data …
• DEMO


Useful websites
• NCEAS tutorials and demonstrations
  – http://www.nceas.ucsb.edu/scicomp/RProgTutorialsLatest.html
• R labs/tutorials for ecologists
  – http://ecology.msu.montana.edu/labdsv/R/
• Vegetation analysis toolbox (lots of useful multivariate analysis and visualization tools)
  – http://cc.oulu.fi/~jarioksa/softhelp/vegan.html
• Analysis of bioassays using R
  – http://www.bioassay.dk/
• Huge effort for ‘omics’ data analysis
  – http://www.bioconductor.org/


Philosophy of science…

Observable Phenomena (Freestanding Reality)

Conceptual Constructs (Reconstitution of Reality)

Science

Scientific Understanding


Models in Science

• A conceptual construct intended to represent a phenomenon of interest

X Y


“Modeling” in Ecotoxicology

Systems Ecology
• Population Dynamics
  – Matrix based
  – ODE based
• Inter-specific Interactions
• Habitat Selection
• Food Webs/Chains
• PBTK
• Individual-based
• Epidemiology
• Metapopulations
• …


“Modeling” in Ecotoxicology
• Dynamic systems modeling
  – Modeling the flow of “materials” through compartments
    • Difference equations
    • Differential equations
• Simulation modeling
  – Conducting “sampling” exercises to mimic real processes
  – Derive descriptive or inferential statistics
  – Null models
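Both styles of modeling can be sketched in a few lines of R. Everything in this example is invented for illustration: the transfer fraction k, the starting amounts, and the two small samples used for the randomization null model.

```r
# Difference-equation sketch: material moves from compartment A to B
# at a fixed fraction k per time step.
k     <- 0.2
steps <- 10
A <- numeric(steps + 1)
B <- numeric(steps + 1)
A[1] <- 100
B[1] <- 0
for (i in 1:steps) {
  flow     <- k * A[i]      # amount leaving A during this step
  A[i + 1] <- A[i] - flow
  B[i + 1] <- B[i] + flow
}

# Simulation sketch: a randomization null model for a difference in means.
set.seed(1)
control  <- c(4.1, 3.8, 4.4, 4.0, 3.9)
exposed  <- c(4.9, 5.2, 4.7, 5.1, 4.6)
observed <- mean(exposed) - mean(control)
pooled   <- c(control, exposed)
null_diff <- replicate(2000, {
  shuffled <- sample(pooled)   # reassign group labels at random
  mean(shuffled[6:10]) - mean(shuffled[1:5])
})
# Proportion of shuffles at least as extreme as the observed difference.
p_value <- mean(abs(null_diff) >= abs(observed))
```

In the compartment model the total amount of material is conserved at every step, which is a quick sanity check on the bookkeeping.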


Models in R
• R is built on the notion that statistical analysis can be viewed as an exercise in statistical modeling, an exercise that is tightly linked to the original scientific question.
• This view provides a coherent framework for
  – conducting standard hypothesis tests,
  – dealing with data that contain complexities that restrict the use of standard hypothesis tests,
  – estimating effect sizes, and
  – prediction


Models in R

[Diagram: Collect Data, then a “Statistics” black box (Mystery, Intrigue, Voodoo, Magic), then Results]

• Peer inside the black box!


What is Statistics?

• "I like to think of statistics as the science of learning from data..."
  Jon Kettenring, ASA President, 1997


Example model
• We think that the concentration of a blood enzyme (Y) is the result of exposure to Pb. We design an experiment and expose organisms to a series of concentrations of Pb ($\alpha$).

$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $\varepsilon_{i\cdot} \sim N(0, \sigma^2)$

  – $\mu$: grand mean of all $Y_{ij}$
  – $\alpha_i$: effect of concentration $i$
  – $\varepsilon_{ij}$: random variability in Y after accounting for Pb concentration
• Errors within each level of $\alpha$ are normally distributed with mean = 0 and variance = $\sigma^2$
• Analysis of Variance (ANOVA)
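A minimal sketch of fitting this one-way model in R. The enzyme values and the three treatment labels are made up for illustration; the design is balanced, so the estimated treatment effects are simply the group means minus the grand mean.

```r
# Made-up enzyme concentrations at three Pb treatment levels.
pb <- data.frame(
  conc   = factor(rep(c("low", "mid", "high"), each = 4),
                  levels = c("low", "mid", "high")),
  enzyme = c(10.1,  9.8, 10.4, 10.0,
             12.2, 11.9, 12.5, 12.0,
             15.1, 14.7, 15.3, 14.9)
)

# Estimates of the grand mean and of the effect of each concentration.
grand_mean <- mean(pb$enzyme)
effects    <- tapply(pb$enzyme, pb$conc, mean) - grand_mean

# One-way analysis of variance.
fit <- aov(enzyme ~ conc, data = pb)
print(summary(fit))
```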


An alternative model
• We think that the concentration of a blood enzyme (Y) is the result of exposure to Pb. We design an experiment and expose organisms to a series of concentrations of Pb. Let’s consider Pb as a continuous variable (X).

$Y_i = \mu + \beta_1 X + \varepsilon_i$, $\varepsilon_i \sim N(0, \sigma^2)$

Rename $\mu$ as $\beta_0$:

$Y_i = \beta_0 + \beta_1 X + \varepsilon_i$

Simple Linear Regression
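The regression form can be sketched with lm(). The data are invented, and the hand calculation is included only to show that lm() returns the ordinary least-squares estimates of the intercept and slope.

```r
# Made-up data: Pb concentration (X) and enzyme concentration (Y).
x <- c(0, 5, 10, 15, 20, 25)
y <- c(10.2, 11.1, 12.3, 12.8, 14.1, 15.0)

fit <- lm(y ~ x)
b0 <- coef(fit)[["(Intercept)"]]   # estimated intercept
b1 <- coef(fit)[["x"]]             # estimated slope

# The same estimates from the textbook least-squares formulas.
b1_manual <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0_manual <- mean(y) - b1_manual * mean(x)
```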


Dummy Variables
• We could rewrite the ANOVA model using the regression “terminology” via dummy variables. For example, assume 3 concentrations.
• Strategy
  – Recode the independent variables ($X_i$) using 0 or 1 to represent treatment levels.

Analysis of Variance (ANOVA):

$Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon_i$

Contrast Matrix:

Level  X1  X2
1      0   0
2      1   0
3      0   1

The way we perform the coding of dummy variables determines how to interpret model parameters. This coding scheme is called “Treatment Contrasts”, the default in R.
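R builds these dummy variables automatically when a predictor is a factor. A short sketch, assuming a made-up factor with three levels and two organisms per level:

```r
# Three concentrations, two organisms each (levels labelled 1, 2, 3).
conc <- factor(c(1, 1, 2, 2, 3, 3))

# Contrast matrix for this factor: rows are levels, columns are the
# two dummy variables.
print(contrasts(conc))

# Design matrix handed to the fitting routine: an intercept column
# plus the two dummy variables.
X <- model.matrix(~ conc)
print(X)
```

With this coding the intercept is the mean of the first level, and each remaining coefficient is the difference between a level and the first level.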


A further complication
• We think that the concentration of a blood enzyme (Y) is the result of exposure to Pb. We design an experiment and expose organisms to a series of concentrations of Pb ($\alpha$). Assume we also want to get rid of the possibly confounding effects of body size (S).

$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$ & $Y_i = \beta_0 + \beta_1 S + \varepsilon_i$

$Y_i = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p + \beta_{p+1} S + \varepsilon_i$

  – $X_1, \dots, X_p$: dummy variables for $\alpha$

Analysis of Covariance (assuming equal slopes)
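An equal-slopes analysis of covariance can be sketched with lm(). The data and the column names (conc, size, y) are invented; the second model, which adds an interaction, is an assumption of this example used to check the equal-slopes condition.

```r
# Made-up data: three Pb levels, body size S, enzyme concentration Y.
d <- data.frame(
  conc = factor(rep(c("c1", "c2", "c3"), each = 4)),
  size = c(2.0, 2.5, 3.0, 3.5, 2.2, 2.7, 3.1, 3.6, 2.1, 2.6, 3.2, 3.4),
  y    = c(10.5, 11.2, 12.1, 12.8, 12.4, 13.3, 13.9, 14.8,
           14.2, 15.1, 16.0, 16.2)
)

# Equal-slopes ANCOVA: dummy variables for conc plus one common slope for size.
ancova <- lm(y ~ conc + size, data = d)

# Adding the interaction lets each concentration have its own slope;
# comparing the two fits checks the equal-slopes assumption.
separate <- lm(y ~ conc * size, data = d)
print(anova(ancova, separate))
```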


The general linear model
• Forms the basis for most classical statistics
• Implemented in R through lm()
> m1 = lm(y ~ x, data)   # fit the model and save output as "m1"
> summary(m1)            # print a table summary of model information
> anova(m1)              # summarize results in an ANOVA table

$Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon_i$

In matrix form: $Y = X\beta + \varepsilon$, $\varepsilon \sim N(0, \sigma^2 I)$
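The matrix form can be checked directly against lm(). This is a sketch with made-up data and two predictors: the design matrix is taken from the fitted model, and the coefficients are recomputed by solving the least-squares normal equations.

```r
# Made-up data with two predictors.
dat <- data.frame(x1 = c(1, 2, 3, 4, 5, 6),
                  x2 = c(2, 1, 4, 3, 6, 5),
                  y  = c(3.1, 3.9, 7.2, 7.8, 11.1, 11.9))

m1 <- lm(y ~ x1 + x2, data = dat)

# Design matrix: a column of 1s plus one column per predictor.
X <- model.matrix(m1)

# Least-squares solution of (X'X) beta = X'y.
beta_hat <- drop(solve(t(X) %*% X, t(X) %*% dat$y))
print(cbind(lm = coef(m1), normal_eq = beta_hat))
```

lm() uses a QR decomposition rather than the normal equations, so the two columns agree up to rounding error.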


Example Data Set

Example 17.8 from Zar, J. 1999. Biostatistical Analysis. 4th Ed. Prentice Hall. ISBN 0-13-081542-X

age  sbp
30   108
30   110
30   106
40   125
40   120
40   118
40   119
50   132
50   137
50   134
60   148
60   151
60   146
60   147
60   144
70   162
70   156
70   164
70   158
70   159

• Demo & Handout
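The data set is small enough to type in directly. The sketch below enters the 20 rows listed on the slide and applies the three lm() calls shown earlier in the deck; it is an illustration, not a reproduction of the handout.

```r
# age and sbp values as listed on the slide (20 observations).
bp <- data.frame(
  age = rep(c(30, 40, 50, 60, 70), times = c(3, 4, 3, 5, 5)),
  sbp = c(108, 110, 106,
          125, 120, 118, 119,
          132, 137, 134,
          148, 151, 146, 147, 144,
          162, 156, 164, 158, 159)
)

m1 <- lm(sbp ~ age, data = bp)   # fit the model and save output as "m1"
print(summary(m1))               # table summary of model information
print(anova(m1))                 # ANOVA table
```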


Ancova

• Demo and Handout