SADC Course in Statistics
A model for comparing means
(Session 12)
2To put your footer here go to View > Header and Footer
Learning Objectives
At the end of this session, you will be able to
• understand and interpret the components of a linear model for comparing means
• make comparisons from an examination of the parameter estimates via t-tests
• describe assumptions associated with a linear model for two categorical factors
• conduct a residual analysis to check model assumptions
A model for the paddy data
Consider again the objective of comparing paddy yields across the 3 varieties.
A linear model for this data takes the form:
$y_{ij} = \beta_0 + g_i + \varepsilon_{ij}$, i = 1, 2, 3
Here $\beta_0$ represents a constant, and the $g_i$ represent the variety effect.
Estimates of $\beta_0$ and $g_i$ can be obtained with appropriate software.
A model for the paddy data
Graph showing the model: $y_{ij} = \beta_0 + g_i + \varepsilon_{ij}$, i = 1, 2, 3
[Figure: yield (axis labelled "Yield in kg/ha") for New imp, Old imp and Traditional; annotations mark the grand mean = 4.06 and the mean value for the old improved variety.]
Model estimates and anova
Source      d.f.    S.S.     M.S.     F      Prob.
Variety       2    35.278   17.639   40.8   0.000
Residual     33    14.269   0.4324
Total        35    49.547

Parameter     Coeff.   Std.error    t       t prob
Constant       5.960     0.329     18.1     0.000
Old impro.    -1.416     0.365     -3.88    0.000
Traditional   -2.960     0.370     -8.00    0.000
What do these results tell us?
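As a check on how the columns of the two tables relate to each other, the short sketch below recomputes the mean squares, the F statistic and the t statistics from the sums of squares, degrees of freedom, coefficients and standard errors printed on this slide. The variable names are illustrative only, and the probabilities are not recomputed.

```python
# Values copied from the anova table on this slide.
ss_variety, df_variety = 35.278, 2
ss_residual, df_residual = 14.269, 33

# Mean square = sum of squares / degrees of freedom.
ms_variety = ss_variety / df_variety
ms_residual = ss_residual / df_residual

# F statistic = variety mean square / residual mean square.
f_stat = ms_variety / ms_residual

# Parameter estimates as (coefficient, standard error), copied from the slide.
estimates = {
    "Constant": (5.960, 0.329),
    "Old impro.": (-1.416, 0.365),
    "Traditional": (-2.960, 0.370),
}

# t statistic for each parameter = coefficient / standard error.
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}

print(f"M.S. variety = {ms_variety:.3f}, M.S. residual = {ms_residual:.4f}")
print(f"F = {f_stat:.1f}")
for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}")
```

The recomputed values agree with the M.S., F and t columns shown above to the number of decimals printed.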
Graph showing model again
$y_{ij} = \beta_0 + g_i + \varepsilon_{ij}$, i = 1, 2, 3
Mean for variety i = constant + $g_i$ = 5.96 + $g_i$, where $g_1$ = 0, $g_2$ = -1.416, $g_3$ = -2.96
[Figure: yield (axis labelled "Yield in kg/ha") for New imp, Old imp and Traditional; the annotation marks the mean value of the new improved variety at 5.96.]
Relating estimates to means
Note: Old - New = -1.42 = estimate of $g_2$
      Trad - New = -2.96 = estimate of $g_3$
Variety        Means   Std.error
New improved   5.96    0.329
Old improved   4.54    0.159
Traditional    3.00    0.170
Thus comparison with the “first” level (here, the new improved variety) is easy, and the t-tests for the parameter estimates (slide 5, "Model estimates and anova") can be interpreted as comparisons with this level.
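The relationship between the parameter estimates and the variety means can be sketched in a few lines. This is only an illustration using the estimates quoted on the slides, with the first level (new improved) given an effect of zero; the dictionary names are not from the original material.

```python
# Estimate of the constant: the mean of the reference ("first") level.
constant = 5.96

# Variety effects from the slides; the reference level has effect 0.
effects = {
    "New improved": 0.0,
    "Old improved": -1.416,
    "Traditional": -2.960,
}

# Mean for variety i = constant + g_i.
means = {variety: constant + g for variety, g in effects.items()}

for variety, m in means.items():
    print(f"{variety}: {m:.2f}")
```

Rounded to two decimals, these reproduce the means in the table above (5.96, 4.54 and 3.00).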
Other comparisons
Variety        Means   Std.error
New improved   5.96    0.329
Old improved   4.54    0.159
Traditional    3.00    0.170
How do we compare old with traditional?
First note (using the parameter estimates) that
Old - Trad = (Old - New) - (Trad - New) = $g_2 - g_3$
= -1.416 - (-2.960) = 1.544
This is the same as the difference in means between the two varieties (4.54 - 3.00 = 1.54, from the table above).
Finding the standard error
But how can the std. error be found?
For this, the variance-covariance matrix of the parameter estimates is needed (see below), followed by some computations.
Var-covar:   $\beta_0$    $g_2$     $g_3$
$\beta_0$    0.1081   -0.1081   -0.1081
$g_2$                  0.1335    0.1081
$g_3$                            0.1369
Variances are the diagonal elements; covariances are the off-diagonal elements.
Computing the standard error
Need Var($g_2 - g_3$)
= Var($g_2$) + Var($g_3$) - 2 Covar($g_2$, $g_3$)
= 0.1335 + 0.1369 - 2(0.1081) = 0.0542
Hence, std. error($g_2 - g_3$) = $\sqrt{0.0542}$ = 0.2328
So the t-test for the comparison will be
t = 1.544 / 0.2328 = 6.63, which is clearly a highly significant result.
So there is clear evidence of a difference between the old improved and traditional varieties.
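The same calculation can be written as a general contrast, which may help when other comparisons are needed. The sketch below assumes the parameter order (constant, g2, g3) and fills in the lower triangle of the variance-covariance matrix by symmetry; the contrast vector `c` and the other names are illustrative and do not come from the slides.

```python
import math

# Variance-covariance matrix of the estimates, order (constant, g2, g3).
# Upper triangle copied from the slide; lower triangle filled by symmetry.
V = [
    [0.1081, -0.1081, -0.1081],
    [-0.1081, 0.1335, 0.1081],
    [-0.1081, 0.1081, 0.1369],
]

# Parameter estimates in the same order.
estimates = [5.960, -1.416, -2.960]

# Contrast weights that pick out g2 - g3 (old improved minus traditional).
c = [0, 1, -1]

# Estimated difference: sum of weight * estimate.
diff = sum(ci * bi for ci, bi in zip(c, estimates))

# Variance of the contrast: c' V c, which here reduces to
# Var(g2) + Var(g3) - 2 Covar(g2, g3).
var_diff = sum(c[i] * V[i][j] * c[j] for i in range(3) for j in range(3))

se_diff = math.sqrt(var_diff)
t_stat = diff / se_diff

print(f"difference = {diff:.3f}")
print(f"variance = {var_diff:.4f}, std. error = {se_diff:.4f}")
print(f"t = {t_stat:.2f}")
```

Changing `c` to, say, `[0, 1, 0]` gives the comparison of old improved with the reference level, whose standard error should match the one reported for that parameter.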
Model Assumptions
The anova model with one categorical factor is:
$y_{ij} = \beta_0 + g_i + \varepsilon_{ij}$
As in linear regression, it is assumed that this model is linear. Additionally, the $\varepsilon_{ij}$ are assumed to be
• independent, with
• zero mean and constant variance $\sigma^2$,
• and normally distributed.
Note: As before, values predicted for $y_{ij}$ are called fitted values.
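To make the idea of fitted values and residuals concrete, here is a minimal sketch for a one-factor model. The data are hypothetical toy numbers, not the paddy data, and the group labels are invented for illustration. In this model the fitted value for every observation in a group is that group's mean, and the residual mean square estimates the constant variance.

```python
from statistics import mean

# Hypothetical toy data (NOT the paddy data): a few responses per group.
yields = {
    "A": [5.0, 6.0, 7.0],
    "B": [4.0, 5.0],
    "C": [2.0, 3.0, 4.0],
}

# Fitted value for each group = the group mean.
fitted = {group: mean(ys) for group, ys in yields.items()}

# Residual = observed value - fitted value.
residuals = {
    group: [y - fitted[group] for y in ys] for group, ys in yields.items()
}

# Residual mean square: residual sum of squares / (n - number of groups).
n = sum(len(ys) for ys in yields.values())
k = len(yields)
rss = sum(r ** 2 for rs in residuals.values() for r in rs)
sigma2_hat = rss / (n - k)

print("fitted values:", fitted)
print("residuals:", residuals)
print("estimate of the error variance:", sigma2_hat)
```

The residuals within each group sum to zero, which is why plots of residuals against fitted values show one vertical strip of points per group.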
Checking Model Assumptions
Model assumptions are checked in exactly the same way as for regression analysis.
A residual analysis is done, looking at plots of residuals in various ways.
Such procedures are the same when modelling any quantitative response using a model linear in its unknown parameters.
We give below a residual analysis for the model fitted above.
Histogram to check normality
Histogram of standardised residuals after fitting a model of yield on variety.
[Figure: histogram; x-axis: Standardized residuals (-2 to 2); y-axis: Density (0 to 0.5).]
A normal probability plot…
Another check on the normality assumption
Do you think the points follow a straight line?
[Figure: normal probability plot; x-axis: Inverse Normal (-2 to 2); y-axis: Standardized residuals (-2 to 2).]
Std. residuals versus fitted values
Checking assumption of variance homogeneity, and identification of outliers:
Is this plot satisfactory?
The straight vertical lines appear because variety has just 3 distinct values.
[Figure: scatter plot; x-axis: Fitted values (3 to 6); y-axis: Standardized residuals (-2 to 2).]
Conclusions
• There was little indication to doubt any of the assumptions associated with the model.
• There was clear evidence that the varieties differed in terms of the corresponding mean paddy yields.
• The new improved variety gave the highest production, showing an increase of 1.42 tonnes/ha, with confidence interval (0.67, 2.16), over the old improved variety.
• The traditional variety gave the lowest production.
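The confidence interval quoted above can be reproduced approximately from the parameter estimates. The sketch below assumes a 95% confidence level (the slides do not state the level) and uses 2.0345 as the two-sided 5% critical value of the t distribution with 33 residual degrees of freedom, taken from standard tables rather than from the slides.

```python
# Difference (new improved - old improved) is minus the estimate of g2.
diff = 1.416
se = 0.365          # standard error of the estimate of g2, from the slides

# ASSUMPTION: 95% level; t critical value for 33 d.f. taken from tables.
t_crit = 2.0345

# Interval = estimate +/- critical value * standard error.
lower = diff - t_crit * se
upper = diff + t_crit * se

print(f"({lower:.2f}, {upper:.2f})")
```

Under these assumptions the limits round to the interval given in the conclusions.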
Practical work follows to ensure learning objectives are achieved…