Part III The General Linear Model Chapter 9 Regression

Upload: zinna

Posted on 24-Feb-2016


Part III The General Linear Model

Chapter 9 Regression


GLM, applied to regression

• Example 9.3.1 from Snedecor and Cochran (1989)
• Interested in the relationship between:
  – phosphorus content of corn (Pcorn, in ppm) and phosphorus levels in soil samples (Psoil, in ppm).


1. Construct Model

Verbal: Phosphorus content of corn (Pcorn) depends on phosphorus content of soil (Psoil).

Graphical

Formal:
              Name    Units   Dimensions   Measurement scale
Response      Pcorn   ppm
Explanatory   Psoil   ppm


2. Execute analysis. Place data in model format: $P_{corn} = \alpha + \beta_{P_{soil}} \cdot P_{soil} + \varepsilon$

lm1 <- lm(Pcorn~Psoil, data=corn)

2. Execute analysis. Compute fitted values and residuals.

fits <- fitted(lm1)
resid <- residuals(lm1)
cbind(corn, fits, resid)


3. Evaluate Model. Plot residuals against fitted values:

plot(fits, resid, pch=16)

Check whether the trend is linear


3. Evaluate Model.

• We use theoretical distributions (χ², t, F) to calculate p-values, so we need to check their assumptions:
  – Fixed variance (homogeneous errors)
  – Normally distributed errors
  – Independent errors
  – Unbiased estimate (errors sum to zero)
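The assumption checks listed above can be sketched in R. This is a minimal sketch, not the course's own script: the slides do not list the corn measurements, so the Pcorn and Psoil values below are simulated placeholders, and the Shapiro-Wilk test and the correlation between |residual| and fitted value are common checks chosen here as assumptions.

```r
# Simulated placeholder data: NOT the Snedecor and Cochran (1989) corn measurements
set.seed(1)
corn <- data.frame(Psoil = c(2, 5, 7, 10, 12, 15, 20, 24, 28))
corn$Pcorn <- 60 + 1.4 * corn$Psoil + rnorm(nrow(corn), sd = 10)

lm1   <- lm(Pcorn ~ Psoil, data = corn)
fits  <- fitted(lm1)
resid <- residuals(lm1)

# Unbiased estimate: with an intercept in the model, residuals sum to zero
sum_resid <- sum(resid)

# Normal errors: Shapiro-Wilk test (a small p-value suggests non-normal residuals)
normal_p <- shapiro.test(resid)$p.value

# Homogeneous errors (informal): does the size of the residuals change with the fit?
spread_p <- cor.test(abs(resid), fits)$p.value

print(round(c(sum_resid = sum_resid, normal_p = normal_p, spread_p = spread_p), 4))
```

Independence cannot be judged from these numbers alone; it needs the spatial layout or the collection order of the samples.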


3. Evaluate Model. Homogeneous errors.


3. Evaluate Model. Normal errors.


3. Evaluate Model. Independent errors.
This is a textbook example; we have no information on the spatial layout of the samples or on the collection sequence, so we will assume independence.

3. Evaluate Model. Conclusion.
Residuals appear to be homogeneous, but not normal. We assume independence because we do not have enough information to evaluate this assumption.

We may need to use an empirical distribution to compute p-values or confidence limits


4. State population and whether sample is representative.

Population?

Sample (n = 9)

The population is all values of phosphorus in corn, given knowledge of phosphorus in the soil

The sample is representative if the 9 soil samples represent the range of possible soil types


5. Decide on mode of inference. Is hypothesis testing appropriate?

• Since the relationship between soil phosphorus and phosphorus content of corn is unknown, we proceed with hypothesis testing

6. State HA / Ho, test statistic and α

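HA: $\beta_{P_{soil}} \neq 0$

Ho: $\beta_{P_{soil}} = 0$

Statistic: F-ratio     α: 5%

$P_{corn} = \alpha + \beta_{P_{soil}} \cdot P_{soil} + \varepsilon$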


7. ANOVA: partition df according to model.

n=9

dftot = n − 1 = 8

dfmodel = 1

dfres = dftot − dfmodel = 7


7. ANOVA: Calculate SS, partition according to model.

Null model: Pcorn = mean(Pcorn)
SS total: 2274.00

Regression model: Pcorn = 61.58 + 1.417·Psoil
SS residual: 800.43

SS improvement = SS total − SS residual = 2274.00 − 800.43 = 1473.57


7. ANOVA: Partition df, SS according to model. Complete ANOVA table

7. ANOVA: Calculate Type I error from F distribution.

Packages compute the p-value and place it in the ANOVA table: p = 0.00885
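The df and SS partition on these slides can be turned into the F-ratio and its p-value with a few lines of R. This sketch uses only the values reported above (n = 9, SS total = 2274.00, SS residual = 800.43) and the base R function pf().

```r
# Sums of squares reported on the slides (n = 9 soil samples)
n        <- 9
ss_total <- 2274.00
ss_res   <- 800.43
ss_model <- ss_total - ss_res          # improvement due to the regression

df_model <- 1
df_res   <- (n - 1) - df_model         # 8 - 1 = 7

ms_model <- ss_model / df_model
ms_res   <- ss_res / df_res
F_ratio  <- ms_model / ms_res
p_value  <- pf(F_ratio, df_model, df_res, lower.tail = FALSE)  # upper-tail area

print(round(c(SS_model = ss_model, F = F_ratio, p = p_value), 5))
```

This gives F1,7 of about 12.89 and p of about 0.0089, in line with the values on the slides.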


8. Recompute p-value if necessary.

• p-values can be inaccurate if assumptions are violated
• Distortion depends on sample size
  – As a rule of thumb, distortion is greatest if n < 30
  – less serious if 30 < n < 100
  – usually not serious if n > 100

• When assumptions are not met, recompute the Type I error if two conditions are met:
  1. n is small
  2. p is near α


• As due diligence, recompute the p-value using randomization
  – Free of assumptions about the distribution of the errors

• In 4000 randomizations there were 27 instances of an F-ratio greater than 12.89
  – Empirical p-value: 27/4000 = 0.00675
  – Theoretical p-value: 0.008854
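A randomization test of this kind can be sketched in R as below. This is an illustration under assumptions, not the procedure used to produce the 27/4000 result: the function name randomization_F is hypothetical, the response is shuffled relative to the explanatory variable, and the data are simulated stand-ins for the corn measurements.

```r
# Randomization test for the regression F-ratio (sketch; randomization_F is hypothetical).
# Shuffling the response breaks any x-y association, giving a null distribution of F.
randomization_F <- function(x, y, n_rand = 4000) {
  f_obs <- anova(lm(y ~ x))[["F value"]][1]           # observed F-ratio
  f_null <- replicate(n_rand, {
    y_shuffled <- sample(y)                            # random reassignment of responses
    anova(lm(y_shuffled ~ x))[["F value"]][1]
  })
  n_greater <- sum(f_null >= f_obs)                    # randomizations at least as extreme
  list(F = f_obs, n_greater = n_greater, p = n_greater / n_rand)
}

# Simulated data standing in for the corn example
set.seed(7)
Psoil <- 1:9
Pcorn <- 50 + 2 * Psoil + rnorm(9, sd = 1)
res <- randomization_F(Psoil, Pcorn, n_rand = 999)
print(res[c("F", "n_greater", "p")])
```

The slides report the count divided by the number of randomizations (27/4000 = 0.00675); some authors instead use (count + 1)/(randomizations + 1), so that the observed arrangement is counted as one of the possible outcomes.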


9. Declare and report decision about model terms.

• p = 0.00675 (via randomization, hence no distributional assumptions)
  – p < α = 5%, so reject Ho in favor of HA

• Report decision with evidence:
  – There was a significant increase in phosphorus content of corn with increasing soil phosphorus (F1,7 = 12.89, p = 0.00675 by randomization)


10. Report and interpret parameters of biological interest.

• Regression equation: $\hat{P}_{corn} = 61.58 + 1.417 \cdot P_{soil}$


• Today: Lab 4 due

• Monday & Tuesday: No classes

• Wednesday: Grad seminar, Lecture, Quiz 5

• Thursday: Lab 5a


Chapter 9.2 Regression. Explanatory Variable Fixed into Classes


GLM, applied to regression: X variable fixed into classes

• Example: Galton’s Law

• Quantity of interest is the stature (height) of sons in relation to stature (height) of their fathers.

• Data collected by Francis Galton at the end of the 19th century.

• First application of regression


1. Construct Model

Verbal: There is a positive relation between the heights of sons and fathers

Graphical

Formal

Data

Explanatory: Hf (height of father)

Response: Hson (height of son)

Model: $H_{son} = \alpha + \beta_{H_f} \cdot H_f + \varepsilon$


1. Construct Model

Symbol Units Dimensions Measurement Scale

HsonHf

𝐻 𝑠𝑜𝑛=𝛼+𝛽𝐻 𝑓∙𝐻 𝑓 +𝜀

…… …𝐻 𝑠𝑜𝑛=�̂�+ �̂�𝐻 𝑓∙𝐻 𝑓 +𝜀

𝐻 𝑠𝑜𝑛=𝑎+𝑏𝐻 𝑓∙𝐻 𝑓+𝑒


2. Execute analysis. Place data in model format:

lm1 <- lm(Hson~Hf, weights=Nfamily, data=Heights)

$H_{son} = \alpha + \beta_{H_f} \cdot H_f + \varepsilon$
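The weights=Nfamily argument makes lm() fit by weighted least squares. Assuming each row of Heights is a father-height class whose son height is averaged over Nfamily families (an assumption about the data layout), the weighted fit gives the same coefficients as an unweighted fit in which each class row is repeated Nfamily times. The numbers below are hypothetical, not Galton's data.

```r
# Hypothetical class data (NOT Galton's): father-height class Hf, mean son height Hson,
# and the number of families Nfamily in each class
heights_demo <- data.frame(
  Hf      = c(62, 64, 66, 68, 70, 72),
  Hson    = c(65.8, 66.7, 67.5, 68.9, 69.6, 70.9),
  Nfamily = c(5, 12, 20, 18, 9, 3)
)

# Weighted least squares: each class counts Nfamily times in the sum of squares
lm_w <- lm(Hson ~ Hf, weights = Nfamily, data = heights_demo)

# Unweighted fit on the same rows, each repeated Nfamily times
expanded <- heights_demo[rep(seq_len(nrow(heights_demo)), heights_demo$Nfamily), ]
lm_x <- lm(Hson ~ Hf, data = expanded)

print(coef(lm_w))
print(coef(lm_x))
```

The standard errors are not the same in the two fits, because they use different residual degrees of freedom (number of classes minus 2 versus number of families minus 2).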


2. Execute analysis. Compute fitted values and residuals.

coefficients(lm1)

(Intercept)          Hf
 33.2855960   0.5225171

$H_{son} = \alpha + \beta_{H_f} \cdot H_f + \varepsilon$

$H_{son} = 33.29 + 0.52 \cdot H_f + \varepsilon$

63.667 = ____ + ____
65.643 = ____ + ____
…


3. Evaluate Model

□ Straight line model ok?

□ Errors homogeneous?

□ Errors normal?

□ Errors independent?


4. State population and whether sample is representative.
• The population is all possible measurements, given the measurement protocol, if we repeated the study thousands of times
• We infer a population consisting of thousands of runs of the same experiment, using the same protocol


5. Decide on mode of inference. Is hypothesis testing appropriate?

• Might expect a 1:1 ratio
• Undertake hypothesis testing?
• Use confidence limits

10. Report and interpret parameters of biological interest.
• Compute confidence limits from the standard error of the slope parameter:

summary(lm1)$coefficients

             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  33.28560     1.64243    20.27  2.61e-12 ***
Hf            0.52252     0.02424    21.55  1.06e-12 ***
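Confidence limits for a slope follow from its estimate and standard error as estimate ± t × SE, where t is the 97.5% quantile of the t distribution on the residual degrees of freedom. The sketch below (ci_from_se is a hypothetical helper) checks this against confint() on simulated data.

```r
# 95% limits from an estimate and its standard error (ci_from_se is hypothetical)
ci_from_se <- function(estimate, se, df, level = 0.95) {
  t_crit <- qt(1 - (1 - level) / 2, df)      # two-sided critical value
  c(lower = estimate - t_crit * se, upper = estimate + t_crit * se)
}

# Check against confint() on simulated data (illustrative only)
set.seed(42)
x   <- 1:20
y   <- 30 + 0.5 * x + rnorm(20)
fit <- lm(y ~ x)

est    <- coef(summary(fit))["x", "Estimate"]
se     <- coef(summary(fit))["x", "Std. Error"]
manual <- ci_from_se(est, se, df.residual(fit))

print(manual)
print(confint(fit)["x", ])
```

The residual df for the Galton fit is not shown on the slide; with t between about 2.0 and 2.2, the limits for the slope 0.52252 (SE 0.02424) are roughly 0.47 to 0.58.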


$H_{son} = 33.29 + 0.52 \cdot H_f + \varepsilon$

• Confidence limits do not include the hypothesis of a 1:1 ratio ($\beta_{H_f} = 1$)

• Nor do they include $\beta_{H_f} = 0$ (i.e., no relationship)

• $\hat{\beta}_{H_f}$ is tightly centered around a value of ~0.5

– Great! But why?


Chapter 9.3 Regression. Explanatory Variable Measured with Error


GLM, applied to regression: Explanatory Variable Measured with Error

• Measurement error in the explanatory variable adds bias to regression parameter estimates
• Example:
  – Relation between number of eggs and body size in cabezon fish (Box 14.12, Sokal and Rohlf 1995)
  – What is the magnitude of the bias?


1. Construct Model

• Verbal
  – Does egg number Neggs depend on body mass M?

• Graphical


• Formal
  – Response: Neggs
  – Explanatory: M
  – $N_{eggs} = \alpha + \beta_M \cdot M + \varepsilon$
  – Units? Dimensions? Measurement scale?


2. Execute analysis. Place data in model format:

• The package first estimates the parameters of the general linear model, and

• Where:

lm1 <- lm(Neggs~M, data=data)

$N_{eggs} = \alpha + \beta_M \cdot M + \varepsilon$

Estimate parameters and compute fitted values and residuals

$N_{eggs} = \hat{\beta}_o + \hat{\beta}_M \cdot (M - \bar{M}) + \varepsilon$
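Reading the centered term as M minus its mean (an assumption about the garbled slide), the centered form changes only the intercept, which becomes the mean of Neggs, while the slope is unchanged. The sketch below uses the 11 cabezon fish whose M and Neggs values appear in the residual table later in this section.

```r
# Cabezon data: body mass M (hectograms) and Neggs, from this section's residual table
M     <- c(14, 17, 24, 25, 27, 33, 34, 37, 40, 41, 42)
Neggs <- c(61, 37, 65, 69, 54, 93, 87, 89, 100, 90, 97)

fit_raw      <- lm(Neggs ~ M)                # alpha + beta_M * M
M_centered   <- M - mean(M)
fit_centered <- lm(Neggs ~ M_centered)       # beta_o + beta_M * (M - mean(M))

print(coef(fit_raw))
print(coef(fit_centered))
print(mean(Neggs))                           # equals the centered intercept
```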


3. Evaluate Model

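$N_{eggs} = \alpha + \beta_M \cdot M + \varepsilon$, with observed mass $M = M_{true} + \delta$

Where $\delta$ is measurement error

If the $\delta$ are normal and independent, $\hat{\beta}_M$ will be smaller than $\beta_M$ by a factor of $\lambda = \sigma^2_{M_{true}} / (\sigma^2_{M_{true}} + \sigma^2_{\delta})$

– Reliability ratio ($\lambda$)

$\sigma_\delta$ is unknown, but no worse than the measurement resolution (1 hectogram)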
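One rough way to size the bias, under assumptions: take the reliability ratio as var(true M)/var(observed M), estimate var(true M) as var(observed M) minus the error variance, and treat the 1-hectogram resolution as the error standard deviation (an upper-bound assumption, not a measured value). The sketch below uses the 11 body masses from the cabezon table in this section.

```r
# Body mass M (hectograms) of the 11 cabezon fish in this section's residual table
M <- c(14, 17, 24, 25, 27, 33, 34, 37, 40, 41, 42)

# Assumption: use the 1-hectogram resolution as the measurement-error SD (upper bound)
sd_delta <- 1

var_obs  <- var(M)                    # observed variance = true variance + error variance
var_true <- var_obs - sd_delta^2      # estimated variance of true body mass
lambda   <- var_true / var_obs        # reliability ratio (attenuation factor)

beta_hat       <- 1.87                # slope estimate reported for these fish
beta_corrected <- beta_hat / lambda   # slope after correcting for attenuation

print(round(c(lambda = lambda, beta_corrected = beta_corrected), 3))
```

Under these assumptions the reliability ratio is about 0.99, so attenuation would shift the slope by only about 1%, much less than the width of its confidence interval.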

□ Structure?

□ Straight line model ok?

□ Errors homogeneous?

□ Errors normal?

□ Errors independent?


M    Neggs    Res      Lag.Res
14    61      15.05     NA
17    37     -14.56     15.05
24    65       0.35    -14.56
25    69       2.48      0.35
27    54     -16.26      2.48
33    93      11.52    -16.26
34    87       3.65     11.52
37    89       0.04      3.65
40   100       5.43      0.04
41    90      -6.43      5.43
42    97      -1.30     -6.43
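The Lag.Res column is the residual of the previous fish when the fish are ordered by M. A minimal R sketch that rebuilds the table from the cabezon data and summarizes it with a lag-1 correlation (one common informal check for serial dependence, assumed here rather than taken from the slides):

```r
# Cabezon data from the table above (already ordered by M)
M     <- c(14, 17, 24, 25, 27, 33, 34, 37, 40, 41, 42)
Neggs <- c(61, 37, 65, 69, 54, 93, 87, 89, 100, 90, 97)
lm1   <- lm(Neggs ~ M)

res     <- residuals(lm1)
lag_res <- c(NA, head(res, -1))       # residual of the previous fish
print(round(data.frame(M, Neggs, Res = res, Lag.Res = lag_res), 2))

# Lag-1 correlation: values near 0 are consistent with independent errors
r1 <- cor(res, lag_res, use = "complete.obs")
print(r1)
```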


4. State population and whether sample is representative.

a) All measurements that could have been made on the fish by this protocol

b) All cabezon fish

c) All fish that could have been collected when the collection was made

d) Measurements from 11 cabezon fish reported here


5. Decide on mode of inference. Is hypothesis testing appropriate?

• We want to know if the relationship between body size and egg count deviates from 1:1

• Use confidence limits

10. Report and interpret parameters of biological interest.
• Compute confidence limits

confint(lm1)

                 2.5 %     97.5 %
(Intercept)  -4.098376  43.632008
M             1.117797   2.622113
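The confint() output above can be reproduced from the 11 cabezon fish in the residual table, and the two hypotheses of interest (slope 0 and slope 1) checked directly. A sketch in R:

```r
# Cabezon data from the residual table in this section
M     <- c(14, 17, 24, 25, 27, 33, 34, 37, 40, 41, 42)
Neggs <- c(61, 37, 65, 69, 54, 93, 87, 89, 100, 90, 97)
lm1   <- lm(Neggs ~ M)
ci    <- confint(lm1)                # t-based 95% limits
print(ci)

# Does the slope interval exclude 0 (no relationship) and 1 (a 1:1 ratio)?
slope_ci   <- ci["M", ]
excludes_0 <- slope_ci[[1]] > 0 || slope_ci[[2]] < 0
excludes_1 <- slope_ci[[1]] > 1 || slope_ci[[2]] < 1
print(c(excludes_0 = excludes_0, excludes_1 = excludes_1))
```

These t-based limits (1.12 to 2.62) are wider than the 1.28 to 2.48 reported later in this section, which appear to come from the randomization check.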


10. Report and interpret parameters of biological interest.

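Neggs = Fits + Res
61 = 45.95 + 15.05
37 = 51.56 + (-14.56)
65 = 64.65 + 0.35
69 = 66.52 + 2.48
54 = 70.26 + (-16.26)
93 = 81.48 + 11.52
87 = 83.35 + 3.65
89 = 88.96 + 0.04
100 = 94.57 + 5.43
90 = 96.43 + (-6.43)
97 = 98.30 + (-1.30)

• Check limits free of distributional assumptions: randomization. One randomization reassigns the residuals to the fitted values, giving a new response (values rounded):

Fits + randomized Res = new Neggs
45.95 + 3.65 = 49.60
51.56 + 2.48 = 54.04
64.65 + (-14.56) = 50.09
66.52 + 0.04 = 66.56
70.26 + 15.05 = 85.31
81.48 + (-1.30) = 80.17
83.35 + 5.43 = 88.78
88.96 + (-6.43) = 82.52
94.57 + 0.35 = 94.92
96.43 + (-16.26) = 80.18
98.30 + 11.52 = 109.83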
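The randomization step above can be repeated many times to give confidence limits for the slope. The sketch below assumes that each randomization shuffles the residuals among the fitted values, refits the model, and keeps the slope, with the 2.5% and 97.5% quantiles of those slopes taken as the limits. This is one plausible reading of the slides, not a confirmed description of their procedure, and the seed and number of randomizations are arbitrary choices.

```r
# Cabezon data from the residual table in this section
M     <- c(14, 17, 24, 25, 27, 33, 34, 37, 40, 41, 42)
Neggs <- c(61, 37, 65, 69, 54, 93, 87, 89, 100, 90, 97)
lm1   <- lm(Neggs ~ M)
fits  <- fitted(lm1)
res   <- residuals(lm1)

set.seed(2016)                       # arbitrary seed, for reproducibility
n_rand <- 4000
slopes <- replicate(n_rand, {
  new_neggs <- fits + sample(res)    # reassign residuals at random to the fitted values
  coef(lm(new_neggs ~ M))[["M"]]     # refit and keep the slope
})

limits <- quantile(slopes, c(0.025, 0.975))
print(round(limits, 2))
```

With this scheme the limits come out close to the 1.28 to 2.48 kiloeggs/hectogram reported in this section, though the exact values depend on the random seed.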


10. Report and interpret parameters of biological interest.
• Report conclusions with evidence:

– $\hat{\beta}_M$ = 1.87, with 95% confidence limits of 1.28 to 2.48 kiloeggs/hectogram

– The interval does not include 0, so there is a relationship

– It also excludes a 1:1 ratio ($\beta_M = 1$)

– We conclude that in this species, large fish invest disproportionately more in eggs (per unit of body mass) than do small fish


Chapter 9.4 Exponential Function, using Linear Regression


Exponential functions

• Exponential rates are common in biology
• Example: intrinsic rate of population increase


• Example: specific growth rate

• = initial weight (kg)

• = recapture weight (kg)

• t = time from initial capture to recapture (days)

• k = exponential growth rate (%/day)


• Example: specific growth rate
  – Growth of 6 lungfish in 2001 in Lake Baringo, Kenya

Initial (kg)   End (kg)   Time (days)
1.32           1.46       50
1.30           1.48       64
1.60           1.84       65
0.76           0.90       56
0.60           0.65       20
2.74           2.86       48
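With the recapture data above, a specific growth rate can be computed for each fish. This sketch assumes the definition k = ln(end weight / initial weight) / t, expressed in %/day; the variable names are hypothetical.

```r
# Lungfish recaptured in Lake Baringo (values from the table above)
initial <- c(1.32, 1.30, 1.60, 0.76, 0.60, 2.74)   # kg
end     <- c(1.46, 1.48, 1.84, 0.90, 0.65, 2.86)   # kg
t_days  <- c(50, 64, 65, 56, 20, 48)               # days

# Specific growth rate per fish, assuming end = initial * exp(k * t)
k <- log(end / initial) / t_days                   # per day
print(round(100 * k, 3))                           # %/day
print(round(100 * mean(k), 3))                     # mean %/day
```

All six rates are positive, ranging from about 0.09 to 0.40 %/day.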


1. Construct Model

• Verbal
  – Lungfish growth is exponential, with a fixed growth rate k

• Graphical


• Formal

– The model has to be linearized before regression can be applied:
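One way to write the linearization, assuming the weight symbols $W_0$ (initial) and $W_t$ (recapture), which are labels introduced here because the slide's own symbols did not survive extraction:

```latex
\begin{aligned}
W_t &= W_0 \, e^{k t} \\
\ln W_t - \ln W_0 &= k\,t \\
\ln\!\left(\frac{W_t}{W_0}\right) &= \alpha + k \cdot t + \varepsilon
\end{aligned}
```

The last line adds an intercept $\alpha$ and an error term $\varepsilon$ so that the model can be fitted by linear regression; the confint() output later in this section has both an (Intercept) row and a t row, which is consistent with this form.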


2. Execute analysis.
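The linearized model can be fitted with lm() on the six lungfish. In this sketch the response is ln(end/initial) and the explanatory variable is t, matching the (Intercept) and t rows of the confint() output shown later in this section; the data frame and column names are hypothetical.

```r
# Lungfish data from the table in this section (column names are hypothetical)
lungfish <- data.frame(
  initial = c(1.32, 1.30, 1.60, 0.76, 0.60, 2.74),   # kg
  end     = c(1.46, 1.48, 1.84, 0.90, 0.65, 2.86),   # kg
  t       = c(50, 64, 65, 56, 20, 48)                # days
)

# Linearized exponential model: ln(end / initial) = alpha + k * t + error
lungfish$log_ratio <- log(lungfish$end / lungfish$initial)
lm1 <- lm(log_ratio ~ t, data = lungfish)

ci <- confint(lm1)
print(ci)
print(round(100 * ci["t", ], 3))    # limits for k in %/day
```

Multiplying the t row by 100 gives the limits in %/day, -0.160 to 0.470.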


3. Evaluate Model

□ Straight line model ok?

□ Errors homogeneous?

□ Errors normal?

□ Errors independent?


4. State population and whether sample is representative.
• All measurements that could have been made on the fish by this protocol

5. Decide whether to use hypothesis testing.

• The research objective is to estimate specific growth rate of fish.

• We will examine the parameters and compute confidence limits (skip to step 10).


10. Report and interpret parameters of biological interest.
• Compute confidence limits

confint(lm1)

                    2.5 %       97.5 %
(Intercept)  -0.133723588  0.197839514
t            -0.001595261  0.004696776

L = Lower limit = -0.160 %/day
U = Upper limit = 0.470 %/day

• The limits include zero, suggesting no growth. Yet all 6 fish were larger upon recapture, an improbable result if there were no growth:
  – $0.5^6 = 0.0156$

• But was growth exponential?
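The 0.5^6 figure is the chance that all six fish would be heavier at recapture if, with no growth, heavier and lighter were equally likely for each fish. Assuming independent fish and no ties, this is the one-sided sign test, which base R's binom.test() reproduces:

```r
# Under no growth, each fish is equally likely to be heavier or lighter at recapture
# (assumption: fish are independent and there are no ties)
n_fish   <- 6
n_larger <- 6

p_all_larger <- 0.5^n_fish                                  # direct calculation
p_binom <- binom.test(n_larger, n_fish, p = 0.5,
                      alternative = "greater")$p.value      # one-sided sign test
print(c(p_all_larger, p_binom))
```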


10. Report and interpret parameters of biological interest.
• The estimate of growth rate is approximately 0.16%/day, or about 5% per month, but the estimate is not reliable!