SADC Course in Statistics A model for comparing means (Session 12)

Upload: xavier-wyatt

Post on 28-Mar-2015

SADC Course in Statistics

A model for comparing means

(Session 12)

Learning Objectives

At the end of this session, you will be able to

• understand and interpret the components of a linear model for comparing means

• make comparisons from an examination of the parameter estimates via t-tests

• describe assumptions associated with a linear model for two categorical factors

• conduct a residual analysis to check model assumptions

A model for the paddy data

Consider again the objective of comparing paddy yields across the 3 varieties.

A linear model for this data takes the form:

$y_{ij} = \beta_0 + g_i + \varepsilon_{ij}$, i = 1, 2, 3

Here $\beta_0$ represents a constant, the $g_i$ represent the variety effect, and the $\varepsilon_{ij}$ are random errors.

Estimates of $\beta_0$ and $g_i$ can be obtained with appropriate software.

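A model for the paddy data

Graph showing the model: $y_{ij} = \beta_0 + g_i + \varepsilon_{ij}$, i = 1, 2, 3

[Figure: yield (axis labelled "Yield in kg/ha", range 2 to 6) for the New imp, Old imp and Traditional varieties, with the grand mean (4.06) and the mean value for the old improved variety marked.]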

Model estimates and anova

Source        d.f.   S.S.     M.S.      F      Prob.
Variety         2    35.278   17.639    40.8   0.000
Residual       33    14.269    0.4324
Total          35    49.547

Parameter     Coeff.   Std. error   t       t prob.
Constant       5.960   0.329        18.1    0.000
Old impro.    -1.416   0.365        -3.88   0.000
Traditional   -2.960   0.370        -8.00   0.000
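
The entries in the two tables above are linked by simple arithmetic. The short Python sketch below re-derives the mean squares, the F ratio and the t statistics from the printed values only; the variable names are illustrative and are not part of the course material.

```python
# Values copied from the anova and parameter tables above.
ss_variety, df_variety = 35.278, 2
ss_residual, df_residual = 14.269, 33

ms_variety = ss_variety / df_variety        # mean square = S.S. / d.f.
ms_residual = ss_residual / df_residual     # estimate of the residual variance
f_ratio = ms_variety / ms_residual

# t statistic for each parameter = coefficient / standard error
estimates = {
    "Constant": (5.960, 0.329),
    "Old improved": (-1.416, 0.365),
    "Traditional": (-2.960, 0.370),
}
t_values = {name: coeff / se for name, (coeff, se) in estimates.items()}

print(f"M.S. variety = {ms_variety:.3f}, M.S. residual = {ms_residual:.4f}")
print(f"F = {f_ratio:.1f}")
for name, t in t_values.items():
    print(f"{name}: t = {t:.2f}")
```

The p-values are not recomputed here because that needs the F and t distribution functions, which the Python standard library does not provide.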

What do these results tell us?

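Graph showing model again

$y_{ij} = \beta_0 + g_i + \varepsilon_{ij}$, i = 1, 2, 3

Mean for variety i = constant + $g_i$ = 5.96 + $g_i$, where $g_1$ = 0, $g_2$ = -1.416, $g_3$ = -2.96

[Figure: yield (axis labelled "Yield in kg/ha", range 2 to 6) for the New imp, Old imp and Traditional varieties, with the mean value of the new improved variety marked at 5.96.]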

Relating estimates to means

Variety        Means   Std. error
New improved   5.96    0.329
Old improved   4.54    0.159
Traditional    3.00    0.170

Note: Old - New = -1.42 = Estimate of $g_2$

Trad - New = -2.96 = Estimate of $g_3$

Thus comparison with the “first” level becomes easy – and t-tests (slide 5) can be interpreted as comparisons with this level.
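
The relation between the parameter estimates and the variety means can be checked directly. This is a minimal sketch using the estimates from slide 5, with the new improved variety as the reference level ($g_1$ = 0); the dictionary names are illustrative.

```python
# Parameter estimates from the table on slide 5.
# The new improved variety is the reference ("first") level, so g1 = 0.
constant = 5.960
effects = {"New improved": 0.0, "Old improved": -1.416, "Traditional": -2.960}

# Mean for variety i = constant + g_i
means = {variety: constant + g for variety, g in effects.items()}

for variety, mean in means.items():
    print(f"{variety}: {mean:.2f}")
```

The printed values agree with the table of means to two decimal places.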

Other comparisons

Variety        Means   Std. error
New improved   5.96    0.329
Old improved   4.54    0.159
Traditional    3.00    0.170

How do we compare old with traditional?

First note (using parameter estimates) that

Old - Trad = (Old - New) - (Trad - New) = $g_2 - g_3$

= -1.416 - (-2.960) = 1.544

This is the same as the difference in means between the two varieties (from the table above, 4.54 - 3.00 = 1.54).

Finding the standard error

But how can the std. error be found?

For this, the variance-covariance matrix of the parameter estimates is needed (see below), followed by some computations.

Var-covar:   $\beta_0$   $g_2$     $g_3$
$\beta_0$    0.1081      -0.1081   -0.1081
$g_2$                    0.1335    0.1081
$g_3$                              0.1369

Variances are the diagonal elements; covariances are the off-diagonal elements.

Computing the standard error

Need Var($g_2 - g_3$)

= Var($g_2$) + Var($g_3$) - 2 Cov($g_2$, $g_3$)

= 0.1335 + 0.1369 - 2(0.1081) = 0.0542

Hence, std. error($g_2 - g_3$) = $\sqrt{0.0542}$ = 0.2328

So the t-test for the comparison will be

t = 1.544 / 0.2328 = 6.63, which is clearly a highly significant result.

So there is clear evidence of a difference between the old improved and traditional varieties.
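
The same computation can be written as a general contrast $c'b$ with variance $c'Vc$, where $V$ is the variance-covariance matrix of the estimates. The sketch below is one way to do this in plain Python; the function name `contrast` is hypothetical, and the lower triangle of the matrix is filled in by symmetry because the slide shows only the upper triangle.

```python
import math

# Variance-covariance matrix of the estimates, in the order (constant, g2, g3).
# Values are those quoted on the slide; the lower triangle is filled by symmetry.
vcov = [
    [0.1081, -0.1081, -0.1081],
    [-0.1081, 0.1335, 0.1081],
    [-0.1081, 0.1081, 0.1369],
]
estimates = [5.960, -1.416, -2.960]


def contrast(c, estimates, vcov):
    """Return (estimate, standard error, t) for the linear combination c'b."""
    value = sum(ci * bi for ci, bi in zip(c, estimates))
    variance = sum(
        c[i] * vcov[i][j] * c[j]
        for i in range(len(c))
        for j in range(len(c))
    )
    se = math.sqrt(variance)
    return value, se, value / se


# Old improved minus traditional corresponds to g2 - g3, i.e. c = (0, 1, -1).
diff, se, t = contrast([0, 1, -1], estimates, vcov)
print(f"difference = {diff:.3f}, std. error = {se:.4f}, t = {t:.2f}")
```

Changing the vector `c` gives other comparisons; for example `[0, 1, 0]` reproduces the old improved versus new improved row of the parameter table.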

Model Assumptions

The anova model with one categorical factor is:

$y_{ij} = \beta_0 + g_i + \varepsilon_{ij}$

As in linear regression, it is assumed that this model is linear. Additionally, the $\varepsilon_{ij}$ are assumed to be

• independent, with

• zero mean and constant variance $\sigma^2$,

• and normally distributed.

Note: As before, values predicted for $y_{ij}$ are called fitted values.

Checking Model Assumptions

Model assumptions are checked in exactly the same way as for regression analysis.

A residual analysis is done, looking at plots of residuals in various ways.

Such procedures are the same when modelling any quantitative response using a model linear in its unknown parameters.

We give below a residual analysis for the model fitted above.
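
As a reminder of what the plots are based on, here is a minimal sketch of how standardised residuals might be computed for a one-factor model. It assumes the common definition residual / sqrt(residual mean square × (1 - leverage)), where the leverage of an observation in this model is 1 / (size of its group); the software used for the slides may apply a slightly different scaling. The function name and the data values are made up for illustration and are NOT the paddy survey data.

```python
import math
from collections import defaultdict


def standardised_residuals(yields, varieties):
    """Standardised residuals for a one-factor model (fitted value = group mean)."""
    groups = defaultdict(list)
    for y, v in zip(yields, varieties):
        groups[v].append(y)
    means = {v: sum(ys) / len(ys) for v, ys in groups.items()}

    residuals = [y - means[v] for y, v in zip(yields, varieties)]
    df_residual = len(yields) - len(groups)
    ms_residual = sum(r * r for r in residuals) / df_residual

    # In a one-factor model the leverage of an observation is 1 / (group size),
    # so every group needs at least two observations.
    return [
        r / math.sqrt(ms_residual * (1 - 1 / len(groups[v])))
        for r, v in zip(residuals, varieties)
    ]


# Made-up illustrative values, NOT the paddy survey data.
yields = [5.1, 6.2, 6.6, 4.0, 4.9, 4.7, 2.8, 3.3, 2.9]
varieties = ["New"] * 3 + ["Old"] * 3 + ["Trad"] * 3

for v, r in zip(varieties, standardised_residuals(yields, varieties)):
    print(f"{v}: {r:+.2f}")
```

These values would then be plotted as a histogram, against normal quantiles, and against the fitted values, as on the following slides.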

Histogram to check normality

Histogram of standardised residuals after fitting a model of yield on variety.

[Figure: histogram of standardized residuals (x-axis, -2 to 2) with density on the y-axis (0 to 0.5).]

A normal probability plot…

Another check on the normality assumption

Do you think the points follow a straight line?

[Figure: normal probability plot of standardized residuals (y-axis, -2 to 2) against inverse normal values (x-axis, -2 to 2).]

Std. residuals versus fitted values

Checking assumption of variance homogeneity, and identification of outliers:

Is this plot satisfactory?

The straight vertical lines appear because variety has just 3 distinct values.

[Figure: standardized residuals (y-axis, -2 to 2) plotted against fitted values (x-axis, 3 to 6).]

Conclusions

• There was little indication to doubt any of the assumptions associated with the model.

• There was clear evidence that the varieties differed in terms of the corresponding mean paddy yields.

• The new improved variety gave the highest production, showing an increase of 1.42 tonnes/ha over the old improved variety, with 95% confidence interval (0.67, 2.16).

• The traditional variety gave the lowest production.
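
The confidence interval quoted above can be reproduced from the slide 5 estimates as estimate ± t × std. error. In this sketch the critical value 2.0345 (two-sided 95%, 33 residual d.f.) is taken from standard t tables; the 95% level is an assumption, chosen because it matches the quoted limits.

```python
# Estimate and standard error of (old improved - new improved) from slide 5.
g2, se_g2 = -1.416, 0.365

# Two-sided 95% critical value of the t distribution with 33 residual d.f.
# (from standard t tables; the 95% level is an assumption).
t_crit = 2.0345

increase = -g2                      # new improved minus old improved
lower = increase - t_crit * se_g2
upper = increase + t_crit * se_g2
print(f"{increase:.2f} ({lower:.2f}, {upper:.2f})")
```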

Practical work follows to ensure learning objectives are achieved…