Page 1: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu

Regression & Correlation

Analysis of Biological Data
Ryan McEwan and Julia Chapman
Department of Biology
University of Dayton
ryan.mcewan@udayton.edu

Page 2:

Simple linear regression is a standard technique in the Analysis of Biological Data:

The main idea is assessing the relationship between two variables, assuming that the relationship is directional and linear… and assuming that one variable is a driver of the relationship.

The Response variable (plotted on Y) is assumed to respond in a linear relationship to changes in the Predictor variable (plotted on X).

The reverse is not assumed in this analysis (that Y drives X). Think heart rate and exercise. Other examples?

Page 3:

But if you have a cloud of points…where do you put the line?

Page 4:

Best fit lines & “Least Squares” regression

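The idea is to drive the line through the cloud in the position that minimizes the distance between the points and the line. "Least squares" names the quantity being minimized: the sum of the squared vertical distances from each point to the line.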
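A minimal sketch of the least-squares idea, written in Python rather than the R used in class. The exercise and heart-rate numbers are made up for illustration, and the function names are hypothetical. It computes the best-fit intercept and slope from the usual closed-form formulas, then checks that nudging the slope makes the total squared distance larger.

```python
def least_squares_fit(x, y):
    """Return (intercept, slope) of the line that minimizes the
    sum of squared vertical distances to the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_distance(x, y, intercept, slope):
    """Total squared vertical distance between the points and a given line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: minutes of exercise (predictor) and heart rate (response).
exercise_minutes = [0, 5, 10, 15, 20, 25]
heart_rate = [62, 75, 88, 97, 112, 121]

intercept, slope = least_squares_fit(exercise_minutes, heart_rate)
best = sum_squared_distance(exercise_minutes, heart_rate, intercept, slope)
nudged = sum_squared_distance(exercise_minutes, heart_rate, intercept, slope + 0.1)

print("intercept:", round(intercept, 2), "slope:", round(slope, 2))
print("best-fit line beats the nudged line:", best < nudged)
```

Any other line through the same cloud, not just the nudged one, gives a larger sum of squares than the fitted line.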

Page 5:

Regression residuals

You can generate a table of residuals… a new data set!

How much does each point deviate from the regression line?
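As a rough illustration in Python (not the R used in class), the residual table is just observed minus predicted for each point. The data below are made up and the function name is hypothetical.

```python
def regression_residuals(x, y):
    """Fit a least-squares line, then return observed minus predicted for each point."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical data: minutes of exercise (predictor) and heart rate (response).
exercise_minutes = [0, 5, 10, 15, 20, 25]
heart_rate = [62, 75, 88, 97, 112, 121]

resid = regression_residuals(exercise_minutes, heart_rate)

# The table of residuals: one row per original point.
for minutes, r in zip(exercise_minutes, resid):
    print(minutes, round(r, 2))
```

Positive residuals are points above the line and negative ones are points below it. For a least-squares line with an intercept, they sum to zero.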

Page 6:

Detrending… a scientific siren song

Page 7:

Regression lines can have varying slopes from a single Y intercept.

Page 8:

Regression lines can have identical slopes, but different Y intercepts.

Page 10:

We will be running a test of this sort in R. The thing I want you to understand is that the statistical test, and the P-value it generates, relates to the null hypothesis of NO SLOPE: that the line is indeed flat. That would mean the response variable is NOT changing in relation to the predictor.
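To make the "no slope" null hypothesis concrete, here is a sketch in Python (the class itself uses R) of the standard t-test on a regression slope: t is the slope divided by its standard error, with n - 2 degrees of freedom. The data are invented, the function names are hypothetical, and the P-value comes from a simple numerical integration of the t density, an assumption made to keep the example dependency-free.

```python
import math


def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=10000):
    """Two-sided P-value: 1 minus the area between -|t| and |t| (Simpson's rule)."""
    limit = abs(t)
    if limit == 0:
        return 1.0
    h = limit / steps  # steps must be even for Simpson's rule
    total = t_density(0, df) + t_density(limit, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)


def slope_test(x, y):
    """Return (slope, t, p) for the null hypothesis that the true slope is zero."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    std_err = math.sqrt(sse / df / sxx)  # assumes the points do not fit perfectly
    t = slope / std_err
    return slope, t, two_sided_p(t, df)


# Hypothetical data: one response that climbs with x, one that stays flat.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y_rising = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
y_flat = [5, 4, 6, 5, 4, 6, 5, 5]

slope_rising, t_rising, p_rising = slope_test(x, y_rising)
slope_flat, t_flat, p_flat = slope_test(x, y_flat)
print("rising: slope", round(slope_rising, 3), "P", p_rising)
print("flat:   slope", round(slope_flat, 3), "P", round(p_flat, 3))
```

A small P-value for the rising data says a flat line is implausible. A large P-value for the flat data says we cannot rule out zero slope.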

Page 12:

…ruut row…

Page 13:

IMPORTANT! The P-value from a regression tells you whether the line is statistically flat… it does not tell you how much variation is captured!
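The usual measure of "variation captured" is r² (it comes up again under Regression Diagnostics). The Python sketch below, with invented data and a hypothetical function name, computes r² as 1 minus residual variation over total variation for two data sets that both trend upward but with very different scatter.

```python
def r_squared(x, y):
    """Fraction of the variation in y captured by the least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_resid / ss_total


# Hypothetical data: both responses rise with x, but one is far noisier.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y_tight = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
y_noisy = [6, 1, 9, 4, 13, 8, 18, 13]

r2_tight = r_squared(x, y_tight)
r2_noisy = r_squared(x, y_noisy)
print("tight cloud r^2:", round(r2_tight, 3))
print("noisy cloud r^2:", round(r2_noisy, 3))
```

Both lines slope upward, yet the line through the noisy cloud accounts for far less of the variation, which the P-value alone would not reveal.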

Page 15:

It may be more useful to calculate a confidence interval
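One common choice, assumed here since the slide does not say which interval it has in mind, is a 95% confidence interval for the slope: slope ± t × standard error. The Python sketch below uses invented data and hard-codes the critical value 2.447 from a standard t table for 6 degrees of freedom (n = 8 points), so it only applies to samples of that size.

```python
import math

# Two-sided 95% critical value of Student's t for df = 6 (from a standard t table).
T_CRIT_95_DF6 = 2.447


def slope_confidence_interval(x, y, t_crit):
    """Return (slope, lower, upper) using slope +/- t_crit * standard error."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope - t_crit * std_err, slope + t_crit * std_err


# Hypothetical data with 8 points, so df = n - 2 = 6 matches the critical value.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [6, 1, 9, 4, 13, 8, 18, 13]

slope, lower, upper = slope_confidence_interval(x, y, T_CRIT_95_DF6)
print("slope:", round(slope, 3))
print("95% CI:", round(lower, 3), "to", round(upper, 3))
```

An interval that excludes zero agrees with rejecting the flat-line null hypothesis, and its width shows how precisely the slope is pinned down.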

Page 16:

You might wish to have replicate values

Page 17:

Your relationship might not be linear!

Polynomial Regression
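A hedged sketch of polynomial regression in Python (not the R used in class): the same least-squares idea, but the fitted curve has an x² term. The hump-shaped data are invented and the function names are hypothetical. The coefficients come from solving the normal equations with a small Gaussian-elimination helper.

```python
def solve_linear_system(a, b):
    """Solve a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - known) / m[r][r]
    return coef


def fit_polynomial(x, y, degree=2):
    """Least-squares coefficients [b0, b1, ..., b_degree] for y = b0 + b1*x + ..."""
    k = degree + 1
    xtx = [[sum(xi ** (i + j) for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum((xi ** i) * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical hump-shaped response generated from y = 2 + 3x - 0.5x^2.
x = [0, 1, 2, 3, 4, 5]
y = [2.0, 4.5, 6.0, 6.5, 6.0, 4.5]

b0, b1, b2 = fit_polynomial(x, y, degree=2)
print("intercept:", round(b0, 3), "linear:", round(b1, 3), "quadratic:", round(b2, 3))
```

A negative quadratic coefficient is what bends the curve back down, something a straight line cannot represent.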

Page 18:

Regression Diagnostics!

A stepwise process of adding factors to the regression, testing the P-value, r², etc.

If you are going to take this on, you need to grind! Read, analyze, read some more.

Page 19:

Correlation is a related form of analysis, but it differs in one fundamental way… a correlation tests for a relationship between two factors, but does NOT ASSUME one causes the other.

Thus, no predictor and response

Page 20:

You would use a correlation analysis if you are not making assumptions about one factor driving another.

Pearson correlation for normally distributed data

Spearman (rank) correlation for non-normally distributed data.
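A small Python sketch (the class uses R) of how the two coefficients relate: Spearman is simply Pearson applied to the ranks of the data. The numbers are invented and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y always increases with x, but not in a straight line.
x = [1, 2, 3, 4, 5, 6]
y = [xi ** 3 for xi in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print("Pearson:", round(r_pearson, 3))
print("Spearman:", round(r_spearman, 3))
```

Because the relationship is perfectly monotonic but curved, the rank-based coefficient is at its maximum while Pearson falls short of it.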

Page 21:

Logistic regression:

To be used if your response variable is categorical (for example, yes/no or present/absent)…
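A rough Python sketch of the idea for a yes/no response (the class uses R, and real software fits this model with more refined methods). Here the model is fitted by plain gradient ascent on the log-likelihood, an assumption chosen to keep the example short. The presence/absence data, function names, learning rate, and iteration count are all invented for illustration.

```python
import math


def predict_probability(x_value, b0, b1):
    """Logistic curve: probability that the response is 1 at a given x."""
    return 1 / (1 + math.exp(-(b0 + b1 * x_value)))


def fit_logistic(x, y, learning_rate=0.1, iterations=5000):
    """Estimate (b0, b1) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(iterations):
        grad0 = grad1 = 0.0
        for xi, yi in zip(x, y):
            error = yi - predict_probability(xi, b0, b1)
            grad0 += error
            grad1 += error * xi
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1


# Hypothetical data: a habitat score (predictor) and species present (1) or absent (0).
habitat_score = [1, 2, 3, 4, 5, 6, 7, 8]
present = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(habitat_score, present)
p_low = predict_probability(1, b0, b1)
p_high = predict_probability(8, b0, b1)
print("P(present) at score 1:", round(p_low, 3))
print("P(present) at score 8:", round(p_high, 3))
```

Instead of predicting the response value itself, the fitted curve predicts the probability of falling in one category, which stays between 0 and 1.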

Page 22:

Caution 1: Correlation is not causation!

Page 23:

Extrapolation is dangerous!!