regression & correlation analysis of biological data ryan mcewan and julia chapman department of...

Post on 14-Dec-2015


Regression & Correlation

Analysis of Biological Data

Ryan McEwan and Julia Chapman

Department of Biology

University of Dayton

ryan.mcewan@udayton.edu

Simple linear regression is a standard technique in the Analysis of Biological Data:

The main idea is to assess the relationship between two variables, assuming that the relationship is directional and linear…and assuming that one variable is a driver of the relationship.

The Response variable (plotted on Y) is assumed to respond in a linear relationship to changes in the Predictor variable (plotted on X).

The reverse is not assumed in this analysis (that Y drives X). Think heart rate and exercise: exercise drives heart rate, not the other way around. Other examples?

But if you have a cloud of points…where do you put the line?

Best fit lines & “Least Squares” regression

The idea is to drive the line through the cloud along the path that minimizes the sum of the squared vertical distances between the points and the line.
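Here is a minimal sketch of a least squares fit, written in plain Python (standard library only) rather than R so the arithmetic is visible. The function name `least_squares_fit` and the exercise and heart rate numbers are made up for illustration.

```python
def least_squares_fit(x, y):
    """Return (slope, intercept) of the line that minimizes the
    sum of squared vertical distances from the points to the line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares of the predictor.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The least squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: exercise intensity (predictor, X) and heart rate (response, Y).
exercise = [1, 2, 3, 4, 5]
heart_rate = [72, 80, 91, 99, 108]

slope, intercept = least_squares_fit(exercise, heart_rate)
print(f"heart_rate = {intercept:.1f} + {slope:.1f} * exercise")
```

For these made-up numbers the fitted slope is 9.1 and the intercept is 62.7.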

Regression residuals

You can generate a table of residuals…a new data set!

How much does each point deviate from the regression line?
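A small sketch of building that table of residuals, again in plain Python with made-up exercise and heart rate numbers. Each residual is the observed response minus the value the fitted line predicts; the helper name `least_squares_fit` is an illustrative assumption.

```python
def least_squares_fit(x, y):
    """Return (slope, intercept) of the least squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: predictor on X, response on Y.
exercise = [1, 2, 3, 4, 5]
heart_rate = [72, 80, 91, 99, 108]

slope, intercept = least_squares_fit(exercise, heart_rate)

# Residual = observed value minus the value predicted by the line.
predicted = [intercept + slope * xi for xi in exercise]
residuals = [yi - pi for yi, pi in zip(heart_rate, predicted)]

print("x   observed   predicted   residual")
for xi, yi, pi, ri in zip(exercise, heart_rate, predicted, residuals):
    print(f"{xi}   {yi:8.1f}   {pi:9.1f}   {ri:8.2f}")
```

The residuals from a least squares line with an intercept sum to zero (up to rounding), so positive and negative deviations balance out.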

Detrending… a scientific siren song

Regression lines can have varying slopes from a single Y intercept.

Regression lines can have identical slopes, but different Y intercepts.

We will be running a test of this sort in R. The thing I want you to understand is that the statistical test, and the P-value it generates, relates to the null hypothesis of NO SLOPE: that the line is indeed flat. That would mean the response variable is NOT changing in relation to the predictor.
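To make the "no slope" null hypothesis concrete, here is a sketch of an exact permutation test in plain Python. Note that this is a swapped-in technique for illustration: it is not the parametric t-test that a standard regression summary reports. The function names and data are hypothetical. If the response really does not change with the predictor, shuffling the response values should produce slopes as steep as the observed one fairly often.

```python
from itertools import permutations


def slope_of(x, y):
    """Least squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def permutation_p_value(x, y):
    """Two-sided P-value for the null hypothesis of NO SLOPE.

    Tries every possible reordering of y, so it is only feasible
    for very small data sets (n! reorderings).
    """
    observed = abs(slope_of(x, y))
    shuffles = list(permutations(y))
    # Count reorderings whose slope is at least as steep as the observed one.
    extreme = sum(
        1 for shuffled in shuffles
        if abs(slope_of(x, shuffled)) >= observed - 1e-12
    )
    return extreme / len(shuffles)


# Hypothetical data: exercise intensity (X) and heart rate (Y).
exercise = [1, 2, 3, 4, 5]
heart_rate = [72, 80, 91, 99, 108]

p_value = permutation_p_value(exercise, heart_rate)
print(f"slope = {slope_of(exercise, heart_rate):.1f}, P = {p_value:.4f}")
```

A small P-value here means a line this steep would rarely arise if the response were unrelated to the predictor, so the flat-line null hypothesis is rejected.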

…ruut row…

IMPORTANT! The P-value from a regression tells you whether the line is statistically flat…it does not tell you how much variation is captured! That is what r² describes.
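A sketch of why the slope and the amount of variation captured are separate questions, in plain Python. The two made-up data sets below were constructed to have exactly the same fitted line, but very different scatter around it, so r² (computed here as 1 minus residual sum of squares over total sum of squares) differs a lot. The function name `fit_and_r_squared` is an illustrative assumption.

```python
def fit_and_r_squared(x, y):
    """Return (slope, intercept, r_squared) for a least squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left over around the line vs. total variation in the response.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical data: same underlying line (y = 2x), different amounts of scatter.
x = [1, 2, 3, 4, 5, 6]
y_tight = [3, 2, 7, 9, 8, 13]     # small deviations from the line
y_loose = [7, -6, 11, 13, 0, 17]  # the same deviations, five times larger

slope_tight, _, r2_tight = fit_and_r_squared(x, y_tight)
slope_loose, _, r2_loose = fit_and_r_squared(x, y_loose)

print(f"tight cloud: slope = {slope_tight:.2f}, r2 = {r2_tight:.2f}")
print(f"loose cloud: slope = {slope_loose:.2f}, r2 = {r2_loose:.2f}")
```

Both fits return a slope of 2, yet the line captures most of the variation in the tight cloud and little of it in the loose one.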

It may be more useful to calculate a confidence interval

You might wish to have replicate values

Your relationship might not be linear!

Polynomial Regression

Regression Diagnostics! A stepwise process of adding factors to the regression. Testing P-value, r², etc.

If you are going to take this on, you need to grind! Read, analyze, read some more

Correlation is a related form of analysis, but is different in one fundamental way…a correlation is testing for a relationship between two factors, but NOT ASSUMING one causes the other.

Thus, no predictor and response

You would use a correlation analysis if you are not making assumptions about one factor driving another.

Pearson correlation for normally distributed data

Spearman (rank) correlation for non-normally distributed data.
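A minimal sketch of both coefficients in plain Python, with made-up data. Spearman is computed here as the Pearson correlation of the ranks, with tied values given their average rank. The function names are illustrative assumptions, not a library API.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1 upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: a perfectly monotonic but curved relationship.
a = [1, 2, 3, 4, 5]
b = [1, 4, 9, 16, 25]

r_pearson = pearson(a, b)
r_spearman = spearman(a, b)
print(f"Pearson r = {r_pearson:.3f}, Spearman rho = {r_spearman:.3f}")
```

Note that neither function treats one variable as predictor and the other as response: swapping the two arguments gives the same coefficient.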

Logistic regression:

To be used if your response variable is categorical (for example, present/absent)…

Caution 1: Correlation is not causation!

Extrapolation is dangerous!!
