RDP Statistical Methods in Scientific Research - Lecture 2 1
Lecture 2
Regression relationships
2.1 The influence of actual widths of the anorexics
2.2 Testing the importance of each influence
2.3 Comments on the anorexia study
2.1 The influence of actual widths of the anorexics
        Anorexics                  Controls
    BPI   Actual width         BPI   Actual width
    130       22.5             202       18.2
    194       19.2             140       24.2
    160       19.3             168       16.0
    120       23.3             160       21.3
    152       21.3             147       21.3
    144       22.8             133       24.9
    120       28.2             229       17.2
    141       21.9             172       19.9
                               130       22.0
                               206       19.2
                               153       22.1
Scatter plot
Observations
BPI decreases with actual width
The controls have smaller waists than the anorexics!
Actual width appears to be a stronger determinant of BPI than anorexic status
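These observations can be checked against the data table. The following is a minimal sketch in plain Python; the variable and function names are illustrative, and it assumes that the three unpaired rows at the bottom of the table belong to the controls (8 anorexics, 11 controls).

```python
# (actual width, BPI) pairs transcribed from the data table.
anorexics = [(22.5, 130), (19.2, 194), (19.3, 160), (23.3, 120),
             (21.3, 152), (22.8, 144), (28.2, 120), (21.9, 141)]
controls = [(18.2, 202), (24.2, 140), (16.0, 168), (21.3, 160),
            (21.3, 147), (24.9, 133), (17.2, 229), (19.9, 172),
            (22.0, 130), (19.2, 206), (22.1, 153)]


def mean(values):
    return sum(values) / len(values)


def slope(points):
    """Least-squares slope of BPI on actual width."""
    x_bar = mean([x for x, _ in points])
    y_bar = mean([y for _, y in points])
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    return sxy / sxx


mean_aw_anorexics = mean([x for x, _ in anorexics])
mean_aw_controls = mean([x for x, _ in controls])
pooled_slope = slope(anorexics + controls)

print(f"mean actual width: anorexics {mean_aw_anorexics:.2f}, "
      f"controls {mean_aw_controls:.2f}")
print(f"slope of BPI on actual width (all subjects): {pooled_slope:.2f}")
```

A negative slope corresponds to BPI decreasing with actual width, and a smaller control mean corresponds to the second observation.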
Five models for the data
1  INTERCEPT
   Neither anorexia nor actual width affect BPI
2  INTERCEPT + GROUP
   Anorexia affects BPI, but actual width does not
3  INTERCEPT + AW
   Actual width affects BPI, but anorexia does not
4  INTERCEPT + GROUP + AW
   Anorexia and actual width affect BPI additively
5  INTERCEPT + GROUP + AW + INTERACTION
   Anorexia and actual width affect BPI non-additively
Which model fits the data best?
How can we judge?
How should we play off goodness-of-fit against complexity?
Residuals
The residuals are the vertical distances between the observed points and the fitted models:
$\text{residual} = \text{BPI}_{\text{observed}} - \text{BPI}_{\text{fitted}}$
For example, for Model 4 we have:
Showing only the residuals, we have:
Moving them all down to 0 gives:
The goodness-of-fit of a model is assessed in terms of the residual sum of squares, RSS (the smaller, the better):
$$\mathrm{RSS} = \sum \text{residual}^2$$
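As a small illustration of the RSS definition, the sketch below computes it for the two simplest models, where the fitted value is just a mean: the overall mean for Model 1 and the group mean for Model 2. The names are illustrative, and the split of the BPI values into 8 anorexics and 11 controls is read off the data table.

```python
# BPI values transcribed from the data table.
anorexic_bpi = [130, 194, 160, 120, 152, 144, 120, 141]
control_bpi = [202, 140, 168, 160, 147, 133, 229, 172, 130, 206, 153]


def rss_about_mean(values):
    """Residual sum of squares when every value is fitted by the mean."""
    fitted = sum(values) / len(values)
    residuals = [observed - fitted for observed in values]
    return sum(r ** 2 for r in residuals)


# Model 1: one common mean for all 19 subjects.
rss_model1 = rss_about_mean(anorexic_bpi + control_bpi)
# Model 2: a separate mean for each group.
rss_model2 = rss_about_mean(anorexic_bpi) + rss_about_mean(control_bpi)

print(f"Model 1 RSS = {rss_model1:.2f}")
print(f"Model 2 RSS = {rss_model2:.2f}")
```

Allowing a separate mean per group can only reduce the RSS, which is why goodness-of-fit has to be weighed against the extra parameter.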
Model fits
degrees-of-freedom (df) = n - number of parameters
Goodness-of-fit improves as terms are added into the model, although model complexity (number of parameters) increases (which is a bad thing)
             Anorexics             Controls
Model   intercept   slope     intercept   slope        RSS     df
  1       157.9      0          157.9      0        16952.95   18
  2       145.1      0          167.3      0        14681.06   17
  3       338.5     -8.47       338.5     -8.47      6368.05   17
  4       324.2     -8.03       332.4     -8.03      6087.24   16
  5       296.2     -6.77       346.0     -8.93      5936.12   15
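The RSS and df columns can be reproduced from the raw data with ordinary least squares. The sketch below uses the closed-form expressions for straight-line fits in terms of corrected sums of squares and products, so it needs no libraries; all names are illustrative, and the assignment of the three unpaired table rows to the controls is an assumption read off the data table.

```python
# (actual width, BPI) pairs transcribed from the data table.
anorexics = [(22.5, 130), (19.2, 194), (19.3, 160), (23.3, 120),
             (21.3, 152), (22.8, 144), (28.2, 120), (21.9, 141)]
controls = [(18.2, 202), (24.2, 140), (16.0, 168), (21.3, 160),
            (21.3, 147), (24.9, 133), (17.2, 229), (19.9, 172),
            (22.0, 130), (19.2, 206), (22.1, 153)]


def sums(points):
    """Return (n, Sxx, Sxy, Syy): corrected sums of squares and products."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    syy = sum((y - y_bar) ** 2 for _, y in points)
    return n, sxx, sxy, syy


n, sxx, sxy, syy = sums(anorexics + controls)   # all subjects pooled
_, sxx_a, sxy_a, syy_a = sums(anorexics)
_, sxx_c, sxy_c, syy_c = sums(controls)

rss = {
    1: syy,                                       # one common mean
    2: syy_a + syy_c,                             # one mean per group
    3: syy - sxy ** 2 / sxx,                      # one common line
    # parallel lines: common slope estimated within groups
    4: (syy_a + syy_c) - (sxy_a + sxy_c) ** 2 / (sxx_a + sxx_c),
    # separate line for each group
    5: (syy_a - sxy_a ** 2 / sxx_a) + (syy_c - sxy_c ** 2 / sxx_c),
}
n_params = {1: 1, 2: 2, 3: 2, 4: 3, 5: 4}
df = {m: n - n_params[m] for m in rss}

slope_model3 = sxy / sxx
slope_model4 = (sxy_a + sxy_c) / (sxx_a + sxx_c)
slope_model5_anorexics = sxy_a / sxx_a
slope_model5_controls = sxy_c / sxx_c

for m in sorted(rss):
    print(f"Model {m}: RSS = {rss[m]:.2f}, df = {df[m]}")
```

Each added term lowers the RSS and costs one degree of freedom, which is the trade-off the F-tests in Section 2.2 quantify.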
2.2 Testing the importance of each influence
Interaction
We start with the most complex model (Model 5), and see whether it can be simplified
That is, we test H0: there is no aw × group interaction (Model 4 is valid)
If the observations are normally distributed and Model 4 is true, then $F_{\text{int}}$ follows an F-distribution with (1, 15) degrees-of-freedom, that is $F_{\text{int}} \sim F_{1,15}$, where
$$F_{\text{int}} = \frac{(\mathrm{RSS}_4 - \mathrm{RSS}_5)/(\mathrm{df}_4 - \mathrm{df}_5)}{\mathrm{RSS}_5/\mathrm{df}_5}$$
Interaction
Large values of $F_{\text{int}}$ indicate that H0 is false
Here, we have
$$F_{\text{int}} = \frac{(6087.24 - 5936.12)/(16 - 15)}{5936.12/15} = 0.38$$
This value is too small to suggest that interaction is important
The p-value is $p = P(F \geq 0.38)$ where $F \sim F_{1,15}$, and
p = 0.5459
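The arithmetic for the interaction test can be sketched in a few lines of Python, using the RSS and df values from the model-fits table. The p-value helper is an illustrative stand-in for a statistics library: it relies on the fact that an F-distribution with (1, d) degrees-of-freedom is the square of a t-distribution with d degrees-of-freedom, and integrates the t density numerically with Simpson's rule. The function names and the number of integration steps are assumptions made for this example.

```python
import math


def t_density(x, d):
    """Density of Student's t-distribution with d degrees-of-freedom."""
    c = math.gamma((d + 1) / 2) / (math.sqrt(d * math.pi) * math.gamma(d / 2))
    return c * (1 + x * x / d) ** (-(d + 1) / 2)


def f_pvalue_1df(f, d, steps=2000):
    """P(F >= f) for F ~ F(1, d), via F(1, d) = t_d squared.

    steps must be even (composite Simpson's rule).
    """
    t = math.sqrt(f)
    h = t / steps
    total = t_density(0.0, d) + t_density(t, d)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, d)
    central = total * h / 3          # P(0 <= T <= t)
    return 1 - 2 * central           # two tails of the t-distribution


# RSS and df for Models 4 and 5, taken from the model-fits table.
rss4, df4 = 6087.24, 16
rss5, df5 = 5936.12, 15

f_int = ((rss4 - rss5) / (df4 - df5)) / (rss5 / df5)
p_int = f_pvalue_1df(f_int, df5)

print(f"F_int = {f_int:.2f}, p = {p_int:.4f}")
```

This helper only covers tests with one numerator degree-of-freedom, which is all that the comparisons in this lecture require.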
Actual width
Take the model in which BPI depends on actual width only (Model 3), and see whether the effect of actual width is necessary
That is, we test H0: actual width does not affect BPI, which means that Model 1 is valid
If the observations are normally distributed and Model 1 is true, then $F_{\text{aw}} \sim F_{1,17}$, where
$$F_{\text{aw}} = \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_3)/(\mathrm{df}_1 - \mathrm{df}_3)}{\mathrm{RSS}_3/\mathrm{df}_3}$$
Actual width
We have
$$F_{\text{aw}} = \frac{(16952.95 - 6368.05)/(18 - 17)}{6368.05/17} = 28.26$$
This value is too large to come from the $F_{1,17}$ distribution
The p-value is $p = P(F \geq 28.26)$ where $F \sim F_{1,17}$, and
p < 0.0001
H0: actual width does not affect BPI is rejected
Group
Accepting that actual width is needed in the model, now take Model 4, and see whether it can be simplified by removing the effect of anorexia
That is, we test H0: anorexia does not affect BPI (once aw is allowed for), which means that Model 3 is valid
If the observations are normally distributed and Model 3 is true, then $F_{\text{group}\mid\text{aw}} \sim F_{1,16}$, where
$$F_{\text{group}\mid\text{aw}} = \frac{(\mathrm{RSS}_3 - \mathrm{RSS}_4)/(\mathrm{df}_3 - \mathrm{df}_4)}{\mathrm{RSS}_4/\mathrm{df}_4}$$
Group
We have
$$F_{\text{group}\mid\text{aw}} = \frac{(6368.05 - 6087.24)/(17 - 16)}{6087.24/16} = 0.74$$
This value is too small to suggest that group is important
The p-value is $p = P(F \geq 0.74)$ where $F \sim F_{1,16}$, and
p = 0.4030
Final model
This is Model 3, which states that BPI has
mean = 338.5 - 8.47 aw
standard deviation = $\sqrt{395.74}$ = 19.89
and that being anorexic has no significant effect on body perception index
Order of fitting is important
Test interaction first: if this is significant, then the two main effects should not be tested, because Model 5 is needed to describe the data
Then determine whether actual width is needed in the model
As actual width is needed, test the effect of group (the factor that is of interest) by comparing Model 3 with Model 4
If actual width were not needed, test the effect of group by comparing Model 1 with Model 2
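The testing order described here can be written as a small decision procedure. The sketch below is one possible reading, not the lecturer's own code: it takes the RSS and df values from the model-fits table, assumes a 5% significance level (`ALPHA`, which the slides do not state), and uses an illustrative numerical p-value helper that is valid only for F-tests with one numerator degree-of-freedom.

```python
import math

# RSS and df for Models 1 to 5, from the model-fits table.
RSS = {1: 16952.95, 2: 14681.06, 3: 6368.05, 4: 6087.24, 5: 5936.12}
DF = {1: 18, 2: 17, 3: 17, 4: 16, 5: 15}
ALPHA = 0.05  # assumed significance level; not stated in the slides


def t_density(x, d):
    """Density of Student's t-distribution with d degrees-of-freedom."""
    c = math.gamma((d + 1) / 2) / (math.sqrt(d * math.pi) * math.gamma(d / 2))
    return c * (1 + x * x / d) ** (-(d + 1) / 2)


def f_pvalue_1df(f, d, steps=2000):
    """P(F >= f) for F ~ F(1, d), integrating the t_d density (Simpson)."""
    t = math.sqrt(f)
    h = t / steps
    total = t_density(0.0, d) + t_density(t, d)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, d)
    return 1 - 2 * (total * h / 3)


def f_test(simple, full):
    """F statistic and p-value for dropping one term from model `full`."""
    assert DF[simple] - DF[full] == 1   # helper only handles 1 numerator df
    f = (RSS[simple] - RSS[full]) / (RSS[full] / DF[full])
    return f, f_pvalue_1df(f, DF[full])


def choose_model():
    """Follow the order of fitting: interaction, then aw, then group."""
    if f_test(4, 5)[1] < ALPHA:         # interaction needed
        return 5
    if f_test(1, 3)[1] < ALPHA:         # actual width needed
        return 4 if f_test(3, 4)[1] < ALPHA else 3
    return 2 if f_test(1, 2)[1] < ALPHA else 1


for simple, full in [(4, 5), (1, 3), (3, 4)]:
    f, p = f_test(simple, full)
    print(f"Model {simple} vs Model {full}: F = {f:.2f}, p = {p:.4f}")
print("chosen model:", choose_model())
```

With the values in the table, this procedure ends at Model 3, matching the final model stated in the lecture.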
Order of fitting is important
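To compare Model 1 with Model 2, find
$$F_{\text{group}} = \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_2)/(\mathrm{df}_1 - \mathrm{df}_2)}{\mathrm{RSS}_2/\mathrm{df}_2}$$
which is
$$F_{\text{group}} = \frac{(16952.95 - 14681.06)/(18 - 17)}{14681.06/17} = 2.631$$
The p-value is $p = P(F \geq 2.63)$ where $F \sim F_{1,17}$, and
p = 0.1232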
Order of fitting is important
The t-statistic for testing the effect of anorexia shown on Slide 1.11 was equal to 1.622
The square of 1.622 is 2.631, which is equal to Fgroup
This is no coincidence: these two tests are in fact identical
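The identity between the two tests can be checked numerically. The sketch below assumes that the test on Slide 1.11 is the usual pooled-variance two-sample t-test; it computes that t-statistic from the BPI values in the data table and compares its square with the F statistic for Model 1 against Model 2. The names are illustrative.

```python
import math

# BPI values transcribed from the data table.
anorexic_bpi = [130, 194, 160, 120, 152, 144, 120, 141]
control_bpi = [202, 140, 168, 160, 147, 133, 229, 172, 130, 206, 153]


def mean(values):
    return sum(values) / len(values)


def ss_about_mean(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


n_a, n_c = len(anorexic_bpi), len(control_bpi)

# Pooled-variance two-sample t-statistic (controls minus anorexics).
within_ss = ss_about_mean(anorexic_bpi) + ss_about_mean(control_bpi)
pooled_var = within_ss / (n_a + n_c - 2)
t_stat = (mean(control_bpi) - mean(anorexic_bpi)) / math.sqrt(
    pooled_var * (1 / n_a + 1 / n_c))

# F statistic for Model 1 (common mean) against Model 2 (group means).
rss1 = ss_about_mean(anorexic_bpi + control_bpi)
rss2 = within_ss
f_group = ((rss1 - rss2) / 1) / (rss2 / (n_a + n_c - 2))

print(f"t = {t_stat:.3f}, t^2 = {t_stat ** 2:.3f}, F_group = {f_group:.3f}")
```

The agreement is exact rather than approximate, because RSS1 - RSS2 equals the between-group sum of squares and RSS2/df2 is the pooled variance.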
BUT, in this case, due to the important influence of actual width, any analysis that fails to account for aw is invalid
2.3 Comments on the anorexia study
Choice of subjects
The anorexics were consecutive unmarried female patients at St George’s Hospital, London
The controls were volunteer fifth form pupils from Putney Girls’ High School, with normal dietary habits
Ages:  Anorexics  mean = 19.7, sd = 3.6
       Controls   mean = 15.4, sd = 0.5
This was not a suitable control group for this study