Multiple regression models
Experimental design and data analysis for biologists (Quinn & Keough, 2002)
Environmental sampling and analysis
Multiple regression models
• One response (dependent) variable:
– Y
• More than one predictor (independent) variable:
– X1, X2, X3, …, Xj
– number of predictors = p (j = 1 to p)
• Number of observations = n (i = 1 to n)
Forest fragmentation
• 56 forest patches in SE Victoria (Loyn 1987)
• Response variable:
– bird abundance
• Predictor variables:
– patch area (ha)
– years isolated (years)
– distance to nearest patch (km)
– distance to nearest larger patch (km)
– stock grazing intensity (1 to 5 scale)
– altitude (m)
Biomonitoring with Vallisneria
• Indicators of sublethal effects of organochlorine contamination
– leaf-to-shoot surface area ratio of Vallisneria americana
– response variable
• Predictors:
– sediment contamination, plant density, PAR, rivermile, water depth
• 225 sites in Great Lakes
• Potter & Lovett-Doust (2001)
Regression models
Linear model:
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \varepsilon_i$
Sample equation:
$\hat{y}_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \dots$
Example
• Regression model:
$(\text{bird abundance})_i = \beta_0 + \beta_1(\text{patch area})_i + \beta_2(\text{years isolated})_i + \beta_3(\text{nearest patch distance})_i + \beta_4(\text{nearest large patch distance})_i + \beta_5(\text{stock grazing})_i + \beta_6(\text{altitude})_i + \varepsilon_i$
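As a rough illustration of how the sample coefficients b0, b1, …, bp in a model like this are estimated, the sketch below fits an ordinary least squares regression by solving the normal equations (X'X)b = X'y in plain Python. The data are made up (they are not the Loyn 1987 data), and the function names `solve_linear_system` and `fit_ols` are hypothetical helpers introduced here; a real analysis would normally use a statistics package.

```python
def solve_linear_system(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], built from copies so the inputs are untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest pivot into place for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_ols(predictors, y):
    """Return [b0, b1, ..., bp]; predictors holds one row of p values per observation."""
    # Design matrix with a leading column of 1s for the intercept.
    design = [[1.0] + list(row) for row in predictors]
    k = len(design[0])
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up data generated exactly from y = 2 + 3*x1 - 1*x2 (no error term),
# so the fitted coefficients should recover 2, 3 and -1.
x = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8], [6, 4]]
y = [2 + 3 * x1 - 1 * x2 for x1, x2 in x]
coefficients = fit_ols(x, y)
print([round(b, 6) for b in coefficients])
```

With real data the error term is not zero, so the estimates differ from the population parameters and come with standard errors, which is what the tests on the following slides address.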
Multiple regression plane
[Figure: fitted regression plane of bird abundance against altitude and log10 area]
Partial regression coefficients
• H0: $\beta_1 = 0$
• Partial population regression coefficient (slope) for Y on X1, holding all other Xs constant, equals zero
• Example:
– slope of regression of bird abundance against patch area, holding years isolated, distance to nearest patch, distance to nearest larger patch, stock grazing intensity and altitude constant, equals 0
Testing H0: $\beta_j = 0$
• Use partial t-tests:
• $t = b_j / SE(b_j)$
• Compare with t-distribution with n − (p + 1) df
• Separate t-test for each partial regression coefficient in model
• Usual logic of t-tests:
– reject H0 if P < 0.05
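A small worked sketch of the partial t statistic, using coefficients and standard errors taken from the forest fragmentation results table elsewhere in these slides (n = 56 patches, p = 6 predictors). The residual degrees of freedom are taken as n − (p + 1) = 49, which agrees with the F6,49 reported with that table; the variable names are illustrative only.

```python
n_obs = 56          # forest patches
n_predictors = 6    # predictors in the full model
residual_df = n_obs - (n_predictors + 1)

# (coefficient, standard error) pairs as reported in the results table.
estimates = {
    "log10 area": (7.470, 1.465),
    "grazing": (-1.668, 0.930),
    "years": (-0.074, 0.045),
}

# Partial t statistic: estimated slope divided by its standard error.
t_values = {name: b / se for name, (b, se) in estimates.items()}

for name, t in t_values.items():
    print(f"{name}: t = {t:.3f} on {residual_df} df")
```

Each t value would then be compared with a t-distribution on 49 df to obtain the P values shown in the table.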
Model comparison
• Test H0: $\beta_1 = 0$
• Fit full model:
– $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots$
• Fit reduced model:
– $y = \beta_0 + \beta_2 x_2 + \beta_3 x_3 + \dots$
• Calculate SSExtra:
– SSRegression(full) − SSRegression(reduced)
• F = MSExtra / MSResidual(full)
Overall regression model
• H0: $\beta_1 = \beta_2 = \dots = 0$ (all population slopes equal zero)
• Test of whether overall regression equation is significant
• Use ANOVA F-test:
– variation explained by regression
– unexplained (residual) variation
Explained variance
$r^2 = \dfrac{SS_{Regression}}{SS_{Total}}$
proportion of variation in Y explained by linear relationship with X1, X2 etc.
Forest fragmentation

| Parameter | Coefficient | SE | Standardised coefficient | P |
|---|---|---|---|---|
| Intercept | 20.789 | 8.285 | 0 | 0.015 |
| Log10 area | 7.470 | 1.465 | 0.565 | <0.001 |
| Log10 distance | -0.907 | 2.676 | -0.035 | 0.736 |
| Log10 ldistance | -0.648 | 2.123 | -0.035 | 0.761 |
| Grazing | -1.668 | 0.930 | -0.229 | 0.079 |
| Altitude | 0.020 | 0.024 | 0.079 | 0.419 |
| Years | -0.074 | 0.045 | -0.176 | 0.109 |

$r^2 = 0.685$, $F_{6,49} = 17.754$, P < 0.001
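The overall F-ratio and r2 are two views of the same partition of variation, so one can be recovered from the other. The sketch below applies the standard relationship F = (r2 / p) / ((1 − r2) / (n − p − 1)) to the reported r2 = 0.685 with n = 56 and p = 6; the function name `overall_f_from_r2` is a hypothetical helper.

```python
def overall_f_from_r2(r2, n_obs, n_predictors):
    """Return (F, df_regression, df_residual) for the overall regression test."""
    df_regression = n_predictors
    df_residual = n_obs - n_predictors - 1
    # MS Regression / MS Residual, both expressed as fractions of SS Total.
    f_ratio = (r2 / df_regression) / ((1 - r2) / df_residual)
    return f_ratio, df_regression, df_residual


f_ratio, df1, df2 = overall_f_from_r2(0.685, 56, 6)
print(f"F({df1},{df2}) = {f_ratio:.2f}")
```

This gives roughly 17.76 on 6 and 49 df; the small difference from the reported 17.754 comes from using r2 rounded to three decimal places.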
Biomonitoring with Vallisneria

| Parameter | Coefficient | SE | P |
|---|---|---|---|
| Intercept | 1.054 | 0.565 | 0.063 |
| Sediment contamination | 1.352 | 0.482 | 0.006 |
| Plant density | 0.028 | 0.007 | <0.001 |
| PAR | -0.087 | 0.017 | <0.001 |
| Rivermile | $1.00 \times 10^{-4}$ | $9.17 \times 10^{-5}$ | 0.277 |
| Water depth | 0.246 | 0.486 | 0.613 |
Assumptions
• Normality and homogeneity of variance for response variable
• Independence of observations
• Linearity
• No collinearity
Scatterplots
• Scatterplot matrix (SPLOM)
– pairwise plots for all variables
• Partial regression (added variable) plots
– relationship between Y and Xj, holding other Xs constant
– residuals from Y against all Xs except Xj vs residuals from Xj against all other Xs
– graphs partial regression slope for Xj
Partial regression plot (log10 area)
[Figure: partial regression plot of bird abundance (y-axis, −20 to 20) against log10 area (x-axis, −2 to 2)]
Regression diagnostics
• Residual:
– observed yi − predicted yi
• Residual plots:
– residual against predicted yi
– residual against each X
• Influence:
– Cook’s D statistics
Collinearity
• Collinearity:
– predictors correlated
• Assumption of no collinearity:
– predictor variables uncorrelated with (i.e. independent of) each other
• Effect of collinearity:
– estimates of $\beta_j$s and significance tests unreliable
Collinearity
Response (Y) and 2 predictors (X1 and X2)
1. X1 and X2 uncorrelated (r = -0.24)

| | coeff | se | tol | t | P |
|---|---|---|---|---|---|
| intercept | -0.17 | 1.03 | | -0.16 | 0.873 |
| X1 | 1.13 | 0.14 | 0.95 | 7.86 | <0.001 |
| X2 | 0.12 | 0.14 | 0.95 | 0.86 | 0.404 |

$r^2 = 0.787$, F = 31.38, P < 0.001
Collinearity
2. Rearrange X2 so X1 and X2 highly correlated (r = 0.99)

| | coeff | se | tol | t | P |
|---|---|---|---|---|---|
| intercept | 0.49 | 0.72 | | 0.69 | 0.503 |
| X1 | 1.55 | 1.21 | 0.01 | 1.28 | 0.219 |
| X2 | -0.45 | 1.21 | 0.01 | -0.37 | 0.714 |

$r^2 = 0.780$, F = 30.05, P < 0.001
Checks for collinearity
• Correlation matrix and/or SPLOM between predictors
• Tolerance for each predictor:
– 1 − r2 for regression of that predictor on all others
– if tolerance is low (near 0.1) then collinearity is a problem
– VIF (variance inflation factor)
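A minimal sketch of the tolerance and VIF calculations for the special case of exactly two predictors, where regressing one predictor on the other gives r2 equal to the squared correlation. VIF is taken as 1 / tolerance, which is the standard definition. The correlations used are the rounded values quoted in the two-predictor collinearity example (r = -0.24 and r = 0.99); the function names are hypothetical.

```python
def tolerance_two_predictors(r):
    """Tolerance when the model has exactly two predictors.

    Regressing X1 on X2 alone gives r2 = r**2, so tolerance = 1 - r**2.
    With more predictors, r2 must come from regressing each predictor
    on all the others.
    """
    return 1 - r ** 2


def vif(tolerance):
    """Variance inflation factor: the reciprocal of tolerance."""
    return 1 / tolerance


for r in (-0.24, 0.99):
    tol = tolerance_two_predictors(r)
    print(f"r = {r:+.2f}: tolerance = {tol:.3f}, VIF = {vif(tol):.1f}")
```

The results are close to, but not identical with, the tolerances of 0.95 and 0.01 reported in the example, presumably because the correlations shown there are rounded to two decimal places.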
Forest fragmentation
[Figure: scatterplot matrix (SPLOM) of the predictors L10DIST, L10LDIST, L10AREA, GRAZE, ALT and YRS]
Tolerances: 0.396 – 0.681
Solutions to collinearity
• Drop redundant (correlated) predictors
• Principal components regression
– potentially useful
– replace predictors by independent components from PCA on predictor variables
• Ridge regression
– controversial and complex
Predictor importance
• Tests on partial regression slopes
• Standardised partial regression slopes
$b_j^* = b_j \dfrac{s_{X_j}}{s_Y}$
Predictor importance
• Change in explained variation
– compare fit of full model to reduced model omitting Xj
$r^2_{X_j} = \dfrac{SS_{Extra}}{\text{Reduced } SS_{Residual}}$
• Hierarchical partitioning
– splits total r2 for each predictor into the independent contribution of that predictor and its joint contribution with other predictors
Forest fragmentation

| Predictor | Independent r2 | Joint r2 | Total r2 | Standardised coefficient |
|---|---|---|---|---|
| Log10 area | 0.315 | 0.232 | 0.548 | 0.565 |
| Log10 distance | 0.007 | 0.009 | 0.016 | -0.035 |
| Log10 ldistance | 0.014 | <0.001 | 0.014 | -0.035 |
| Altitude | 0.057 | 0.092 | 0.149 | 0.079 |
| Grazing | 0.190 | 0.275 | 0.466 | -0.229 |
| Years | 0.101 | 0.152 | 0.253 | -0.176 |
Interactions
• Interactive effect of X1 and X2 on Y
• Dependence of partial regression slope of Y against X1 on the value of X2
• Dependence of partial regression slope of Y against X2 on the value of X1
• $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i1} x_{i2} + \varepsilon_i$
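To see why the interaction term makes each slope depend on the other predictor, differentiate the expected response of the interaction model with respect to each predictor (writing the interaction coefficient as $\beta_3$, as in the model with a product term $x_{i1} x_{i2}$):

```latex
\frac{\partial E(y_i)}{\partial x_{i1}} = \beta_1 + \beta_3 x_{i2},
\qquad
\frac{\partial E(y_i)}{\partial x_{i2}} = \beta_2 + \beta_3 x_{i1}
```

If $\beta_3 = 0$ the slopes reduce to the constants $\beta_1$ and $\beta_2$, which is the additive model without interaction.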
Forest fragmentation
• Does effect of grazing on bird abundance depend on area?
– log10 area × grazing interaction
• Does effect of grazing depend on years since isolation?
– grazing × years interaction
• Etc.
Interpreting interactions
• Interactions highly correlated with individual predictors:
– collinearity problem
– centring variables (subtracting mean) reduces this collinearity
• Simple regression slopes:
– slope of Y on X1 for different values of X2
– slope of Y on X2 for different values of X1
– use if interaction is significant
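The effect of centring can be checked numerically. The sketch below uses made-up predictor values (not the forest fragmentation data) and compares the correlation between X1 and the product X1·X2 before and after subtracting each predictor's mean; `mean` and `pearson_r` are small hypothetical helpers written out so the example needs no external library.

```python
def mean(values):
    return sum(values) / len(values)


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


# Made-up predictor values for illustration only.
x1 = [10, 12, 15, 18, 20, 23, 25, 30]
x2 = [3, 5, 2, 6, 4, 7, 3, 5]

# Interaction term built from the raw predictors.
raw_product = [a * b for a, b in zip(x1, x2)]
r_raw = pearson_r(x1, raw_product)

# Interaction term built after centring each predictor on its mean.
x1_c = [a - mean(x1) for a in x1]
x2_c = [b - mean(x2) for b in x2]
centred_product = [a * b for a, b in zip(x1_c, x2_c)]
r_centred = pearson_r(x1_c, centred_product)

print(f"raw: r = {r_raw:.2f}; centred: r = {r_centred:.2f}")
```

For these values the correlation between the predictor and its interaction term is much weaker after centring, although it is not exactly zero.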
Polynomial regression
• Modeling some curvilinear relationships
• Include quadratic ($X^2$) or cubic ($X^3$) etc.
• Quadratic model:
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i1}^2 + \varepsilon_i$
• Compare fit with:
$y_i = \beta_0 + \beta_1 x_{i1} + \varepsilon_i$
• Does quadratic fit better than linear?
Local and regional species richness
• Relationship between local and regional species richness in North America
– Caley & Schluter (1997)
• Two models compared:
$\text{local spp} = \beta_0 + \beta_1(\text{regional spp}) + \beta_2(\text{regional spp})^2 + \varepsilon$
$\text{local spp} = \beta_0 + \beta_1(\text{regional spp}) + \varepsilon$
[Figure: local species richness (y-axis, 0 to 200) against regional species richness (x-axis, 0 to 250), with fitted linear and quadratic models]
Model comparison
Full model: SSResidual = 376.620, df = 5
Reduced model: SSResidual = 1299.257, df = 6
Difference due to (regional spp)$^2$: SSExtra = 922.7, df = 1, MSExtra = 922.7
F = 12.249, P < 0.018
See Quinn & Keough Box 6.6
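The arithmetic behind this comparison can be reproduced from the two residual sums of squares. Because SS Total is the same for both models, SSResidual(reduced) − SSResidual(full) equals SSRegression(full) − SSRegression(reduced), i.e. SSExtra. The sketch also computes SSExtra / SSResidual(reduced) as a measure of the change in explained variation; variable names are illustrative.

```python
ss_residual_full = 376.620      # quadratic model, residual df = 5
ss_residual_reduced = 1299.257  # linear model, residual df = 6
df_full, df_reduced = 5, 6

# Extra variation explained by adding the (regional spp)^2 term.
ss_extra = ss_residual_reduced - ss_residual_full
df_extra = df_reduced - df_full
ms_extra = ss_extra / df_extra

# F-ratio uses the residual mean square of the full model.
ms_residual_full = ss_residual_full / df_full
f_ratio = ms_extra / ms_residual_full

# Proportion of the reduced model's residual variation explained by the extra term.
partial_r2 = ss_extra / ss_residual_reduced

print(f"SSExtra = {ss_extra:.3f}, df = {df_extra}")
print(f"F = {f_ratio:.3f}, partial r2 = {partial_r2:.3f}")
```

The F-ratio matches the reported 12.249 on 1 and 5 df. The unrounded difference in residual SS is 922.637, slightly below the 922.7 shown on the slide.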
Categorical predictors
• Convert categorical predictors into multiple continuous predictors
– dummy (indicator) variables
• Each dummy variable coded as 0 or 1
• Usually no. of dummy variables = no. groups minus 1
Forest fragmentation

| Grazing intensity | Grazing1 | Grazing2 | Grazing3 | Grazing4 |
|---|---|---|---|---|
| Zero (1) | 0 | 0 | 0 | 0 |
| Low (2) | 1 | 0 | 0 | 0 |
| Medium (3) | 0 | 1 | 0 | 0 |
| High (4) | 0 | 0 | 1 | 0 |
| Intense (5) | 0 | 0 | 0 | 1 |

Each dummy variable measures the effect of one of the low to intense categories compared with the “reference” category (zero grazing).
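The coding scheme above can be generated mechanically. This is a minimal sketch assuming grazing is recorded on the 1 to 5 scale with level 1 (zero grazing) as the reference category; the function name `grazing_dummies` and the `GRAZING_LEVELS` mapping are hypothetical.

```python
GRAZING_LEVELS = {1: "Zero", 2: "Low", 3: "Medium", 4: "High", 5: "Intense"}


def grazing_dummies(level, reference=1):
    """Return the dummy variables [Grazing1, ..., Grazing4] for one patch.

    Every non-reference level gets its own 0/1 indicator, so five groups
    need four dummy variables; the reference level is coded as all zeros.
    """
    if level not in GRAZING_LEVELS:
        raise ValueError(f"unknown grazing level: {level}")
    other_levels = [lv for lv in sorted(GRAZING_LEVELS) if lv != reference]
    return [1 if level == lv else 0 for lv in other_levels]


for level, label in GRAZING_LEVELS.items():
    print(f"{label} ({level}): {grazing_dummies(level)}")
```

Choosing a different `reference` changes which group the coefficients are compared against, but not the overall fit of the model.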
Forest fragmentation

| Coefficient | Est | SE | t | P |
|---|---|---|---|---|
| Intercept | 21.603 | 3.092 | 6.987 | <0.001 |
| Grazing | -2.854 | 0.713 | -4.005 | <0.001 |
| Log10 area | 6.890 | 1.290 | 5.341 | <0.001 |

| Coefficient | Est | SE | t | P |
|---|---|---|---|---|
| Intercept | 15.716 | 2.767 | 5.679 | <0.001 |
| Grazing1 | 0.383 | 2.912 | 0.131 | 0.896 |
| Grazing2 | -0.189 | 2.549 | -0.074 | 0.941 |
| Grazing3 | -1.592 | 2.976 | -0.535 | 0.595 |
| Grazing4 | -11.894 | 2.931 | -4.058 | <0.001 |
| Log10 area | 7.247 | 1.255 | 5.774 | <0.001 |
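To compare what the two sets of estimates imply, the sketch below computes predicted bird abundance from each. It assumes the first set of coefficients treats grazing as a single continuous predictor on the 1 to 5 scale and the second uses the four dummy variables with zero grazing as the reference; that reading is inferred from the coefficient names rather than stated on the slide, and the function names are hypothetical.

```python
def predict_grazing_continuous(grazing, log10_area):
    """Prediction from the model with grazing as one continuous predictor."""
    return 21.603 - 2.854 * grazing + 6.890 * log10_area


# Dummy-variable coefficients keyed by grazing level; level 1 (zero grazing)
# is the reference category, so it adds nothing to the intercept.
DUMMY_COEFFICIENTS = {1: 0.0, 2: 0.383, 3: -0.189, 4: -1.592, 5: -11.894}


def predict_grazing_categorical(grazing, log10_area):
    """Prediction from the model with grazing coded as dummy variables."""
    return 15.716 + DUMMY_COEFFICIENTS[grazing] + 7.247 * log10_area


# A 10 ha patch (log10 area = 1) under intense grazing (level 5).
continuous = predict_grazing_continuous(5, 1.0)
categorical = predict_grazing_categorical(5, 1.0)
print(f"continuous: {continuous:.3f}, categorical: {categorical:.3f}")
```

The continuous coding forces the same change in abundance for every one-step increase in grazing, whereas the dummy coding lets each level differ from the reference by its own amount; in these estimates only the most intense level differs clearly from zero grazing.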
Categorical predictors
• All linear models fit categorical predictors using dummy variables
• ANOVA models combine dummy variables into single factor effect
– partition SS into factor and residual
– dummy variable effects often provided by software
• Models with both categorical (factor) and continuous (covariate) predictors
– adjust factor effects based on covariate
– reduce residual based on strength of relationship between Y and covariate
– more powerful test of factor