xii. justifying the anova-based hypothesis test


Upload: kelsey-douglas

Post on 31-Dec-2015


Page 1: XII. Justifying the ANOVA-based hypothesis test

Statistical Modelling Chapter XI 1

XII. Justifying the ANOVA-based hypothesis test

XII.A The sources for an ANOVA

XII.B The sums of squares for an ANOVA

XII.C Degrees of freedom of the sums of squares for an ANOVA

XII.D Expected mean squares for an ANOVA

XII.E The distribution of the F statistics for an ANOVA

XII.F Application of theory for ANOVA-based hypothesis test to an example

Page 2: XII. Justifying the ANOVA-based hypothesis test


Introduction

• In chapter VI we gave a procedure for determining an ANOVA table.

• In this chapter we look at the theory that justifies this procedure.

• In particular, we examine the basis of the SSqs, dfs, E[MSq]s and the distribution of the F statistic.

Page 3: XII. Justifying the ANOVA-based hypothesis test


XII.A The sources for an ANOVA

• There are various methods for determining the sources to include in an ANOVA table.

• Perhaps the most popular is to try to identify the situation that is closest to yours in a textbook.

• The big disadvantage is that the match may not be exact, so that the wrong analysis is used.

• Our approach is based on dividing the factors into unrandomized and randomized factors.

Page 4: XII. Justifying the ANOVA-based hypothesis test


Our approach

1. identify the unrandomized and randomized factors;

2. determine the structure formulae that describe the crossing and nesting relations
   a) amongst the unrandomized factors and
   b) amongst the randomized factors (and in some cases between the randomized and some unrandomized factors);

3. expand the structure formulae to obtain two sets of sources;

4. form the lines in the ANOVA table by
   a) listing the unrandomized sources in the table,
   b) entering the randomized sources indented under the appropriate unrandomized sources, and
   c) adding Residuals for unrandomized sources where there are df left over.

• This extracts the key features of the experiment arising from the randomization and captures these in the structure formulae for use in determining the terms in the analysis.

• The ANOVA table produced using this method displays the confounding arising from the randomization, something that traditional tables do not do.

Page 5: XII. Justifying the ANOVA-based hypothesis test


XII.B The SSqs for an ANOVA

• Derivation of SSqs, given the sources in the ANOVA, is based on the S matrices.

• For a structure formula, one must identify the generalized factors corresponding to the sources derived from that formula.

• For each generalized factor there is an S that specifies the units that are related in having the same level of that generalized factor.

• For example, $S_{Blocks}$ has a one for every pair of units that occur in the same block.

• Generally we have two sets of relationship (S) or, equivalently, mean-operator matrices ($M = (n/f)^{-1}S = g^{-1}S$): one for the unrandomized factors and one for the randomized factors. (Relationship matrices are also called summation matrices.)

• Now a set of such matrices forms the basis for an algebra that is called a relationship algebra.

• Associated with a set of Ss is a set of Qs, again one Q for each generalized factor, that are the idempotents of the relationship algebra and provide the quadratic-form matrices for the analysis.

• Under certain conditions, met by all our examples, a set of Qs derived from a structure formula forms a complete set of mutually-orthogonal idempotents (CSMOI — see defn XI.10).

• So we form SSqs of the projections of the data vector into the subspaces projected onto by the Qs.

Page 6: XII. Justifying the ANOVA-based hypothesis test


Obtaining expressions for Qs and Ms

• Rule VI.6 is used to obtain expressions for Qs in terms of Ms.

• The following rule is used to obtain expressions for the Ss (and Ms) as the direct products of I and J matrices. (It generalizes Rule XI.1.)

Rule XII.1: The S for a generalized factor, formed as the subset of a set of s equally-replicated factors that uniquely index the units, is the direct product of s matrices, provided the s factors are arranged in standard order.

• Taking the s factors in the sequence specified by the standard order,
– for each factor in the generalized factor include an I matrix in the direct product and
– J matrices for those that are not.

• The order of a matrix in the direct product is equal to the number of levels of the corresponding factor.

• The M matrix is the same direct product of I and J matrices, except that each J is multiplied by the reciprocal of its order.

• Rule applies to factors from the unrandomized structure.
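Rule XII.1 can be sketched numerically with Kronecker (direct) products. The following is a minimal illustration, not part of the original slides: the function name `summation_and_mean_matrices` and the sizes b = 3, t = 4 are assumptions chosen for the example.

```python
import numpy as np

def summation_and_mean_matrices(levels, in_factor):
    """Build S and M for a generalized factor following rule XII.1.

    levels    -- number of levels of each factor, listed in standard order
    in_factor -- True where that factor is part of the generalized factor
    (Both the function and its argument names are hypothetical.)
    """
    S = np.ones((1, 1))
    M = np.ones((1, 1))
    for q, included in zip(levels, in_factor):
        if included:
            s_part = m_part = np.eye(q)   # I matrix of order q
        else:
            s_part = np.ones((q, q))      # J matrix of order q
            m_part = s_part / q           # J multiplied by the reciprocal of its order
        S = np.kron(S, s_part)
        M = np.kron(M, m_part)
    return S, M

# Assumed layout: b = 3 blocks with t = 4 units each, units in standard order.
b, t = 3, 4
S_B, M_B = summation_and_mean_matrices([b, t], [True, False])    # Blocks
S_G, M_G = summation_and_mean_matrices([b, t], [False, False])   # grand mean
```

Here `S_B` has a one for every pair of units in the same block, and `M_B` equals `S_B` divided by the replication g = t of each block level.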

Page 7: XII. Justifying the ANOVA-based hypothesis test


Ms are symmetric and idempotent

• We prove this useful result, for those Ms that can be expressed as the direct product of I and J matrices, in the next lemma.

Lemma XII.1: Let M be the direct product of I and J matrices, the latter multiplied by the reciprocal of their order, as prescribed in rule XII.1. Then M is symmetric and idempotent.

Proof: Since $(A \otimes B)' = A' \otimes B'$ and I and J are both symmetric, M must also be symmetric.

• Since $(A \otimes B)(C \otimes D) = AC \otimes BD$ and in particular $(A \otimes B)(A \otimes B) = A^2 \otimes B^2$, $M^2$ must be the direct product of I matrices and the squares of J matrices, each J multiplied by the reciprocal of its order.

• But
$$\left(q^{-1}J_q\right)^2 = q^{-2}J_qJ_q = q^{-2}\,q\,J_q = q^{-1}J_q.$$

• Consequently, $M^2$ and M will be the same direct product of I and J matrices, the latter multiplied by the reciprocal of their order.
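Lemma XII.1 can be checked numerically for a particular case. This is only a sketch under assumed sizes (q = 5 for the J matrix and an identity of order 3); it illustrates the lemma and is no substitute for the proof.

```python
import numpy as np

q = 5                                 # assumed order of the J matrix
J_scaled = np.ones((q, q)) / q        # q^{-1} J_q

# Key step of the proof: (q^{-1} J_q)^2 = q^{-1} J_q
j_step_holds = np.allclose(J_scaled @ J_scaled, J_scaled)

# An M built as prescribed by rule XII.1: I of order 3 (x) q^{-1} J_q
M = np.kron(np.eye(3), J_scaled)
is_symmetric = np.allclose(M, M.T)
is_idempotent = np.allclose(M @ M, M)
```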

Page 8: XII. Justifying the ANOVA-based hypothesis test


XII.C Degrees of freedom of the SSqs for an ANOVA

• Definition XII.1: The degrees of freedom of a sum of squares is the rank of the idempotent of its quadratic form. That is the degrees of freedom of Y'QY is given by rank(Q).

• The following definition and lemma establish that the trace of an idempotent is the same as rank and hence its df.

• Definition XII.2: The trace of a square matrix is the sum of its diagonal elements.

• Lemma XII.2: For B idempotent, trace(B) = rank(B).

• Proof: The proof is based on the following facts:
– the trace of a matrix is equal to the sum of its eigenvalues;
– the rank of a diagonalizable matrix, such as an idempotent, is equal to the number of nonzero eigenvalues; and
– the eigenvalues of an idempotent can be proved to be either 1 or zero and so sum = number.
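The three facts in the proof can be seen directly on a small idempotent. The sketch below assumes a mean operator of the form given by rule XII.1 with hypothetical sizes b = 3 and t = 4.

```python
import numpy as np

b, t = 3, 4                                        # assumed sizes
M = np.kron(np.eye(b), np.ones((t, t)) / t)        # symmetric idempotent

eigenvalues = np.linalg.eigvalsh(M)                # real, since M is symmetric
trace_M = np.trace(M)                              # sum of the eigenvalues
rank_M = np.linalg.matrix_rank(M)                  # number of nonzero eigenvalues

# Every eigenvalue is numerically either 0 or 1
zero_or_one = np.all(np.isclose(eigenvalues, 0) | np.isclose(eigenvalues, 1))
```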

Page 9: XII. Justifying the ANOVA-based hypothesis test


Results on traces of matrices

• Clearly, the dfs can be established by determining the trace of a matrix, so we give some results on traces.

• Lemma XII.3: Let c be a scalar and A, B and C be matrices. Then, when the appropriate operations are defined, we have
– trace(A') = trace(A)
– trace(cA) = c trace(A)
– trace(A + B) = trace(A) + trace(B)
– trace(AB) = trace(BA)
– trace(ABC) = trace(CAB) = trace(BCA)
– trace($A \otimes B$) = trace(A) trace(B)
– trace(A'A) = 0 if and only if A = 0.

Page 10: XII. Justifying the ANOVA-based hypothesis test


Trace of a Q

• Each Q matrix is a linear combination of M matrices so that
– the trace(Q) will be the same linear combination of the traces of the Ms.

• In the next lemma we prove that the trace of an M is equal to the no. of levels in its generalized factor,
– when the corresponding summation matrix is the direct product of I and Js.

Lemma XII.4: Let $M_F$ be a mean operator matrix for a generalized factor F that has f levels each replicated n/f = g times and $S_F$ be the corresponding summation matrix that is the direct product of I and Js. Then trace($M_F$) = f.

Proof:
• Now trace($M_F$) = $g^{-1}$ trace($S_F$).
• But $S_F$ is the direct product of I and Js.
• However, trace($A \otimes B$) = trace(A) trace(B) and the traces of I and J matrices are equal to their orders.
• As the product of the orders of the I and Js for any S must be n, trace($S_F$) = n and so trace($M_F$) = $g^{-1}$ trace($S_F$) = n/g = f.
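The two ingredients of this proof, the trace of a direct product and trace($S_F$) = n, can be illustrated as follows. The sizes and the random matrices are assumptions made purely for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# trace(A (x) B) = trace(A) trace(B), checked on arbitrary square matrices
A = rng.normal(size=(3, 3))
B = rng.normal(size=(4, 4))
kron_trace = np.trace(np.kron(A, B))
product_of_traces = np.trace(A) * np.trace(B)

# Lemma XII.4 for F = Blocks with assumed b = 3 blocks of t = 4 units
b, t = 3, 4
n = b * t
S_F = np.kron(np.eye(b), np.ones((t, t)))   # I_b (x) J_t
g = t                                       # replication of each level of F
f = n // g                                  # number of levels of F
M_F = S_F / g
```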

Page 11: XII. Justifying the ANOVA-based hypothesis test


XII.D Expected mean squares for an ANOVA

• As has been previously stated, the E[MSq]s are just the average or mean value of the MSqs under sampling from a population whose Y variables behave as described by the linear model.
– i.e. the true mean value of a Mean Square
– It depends on the model parameters

• To derive the expected values, we note that the general form of an MSq is a quadratic form divided by a df, $Y'QY/\nu$.

• So we first establish an expression for the expectation of any quadratic form.

Page 12: XII. Justifying the ANOVA-based hypothesis test


Expectation of a quadratic form

• Theorem XII.1: Let Y be an $n \times 1$ vector of random variables with

$E[Y] = \psi$ and $\mathrm{var}[Y] = V$,

where $\psi$ is an $n \times 1$ vector of expected values and V is an $n \times n$ matrix.

• Let A be an $n \times n$ matrix of real numbers.

• Then

$$E[Y'AY] = \mathrm{trace}(AV) + \psi'A\psi.$$

Page 13: XII. Justifying the ANOVA-based hypothesis test


Proof of theorem XII.1

• Firstly note that, as Y'AY is a scalar, $Y'AY = \mathrm{trace}(Y'AY)$.

• Also, recall that the trace of a matrix product is cyclically commutative.

• Thus, $E[Y'AY] = E[\mathrm{trace}(Y'AY)] = E[\mathrm{trace}(AYY')]$.

• Now, for an $n \times n$ matrix Z,
$$E[\mathrm{trace}(Z)] = E\left[\sum_{i=1}^{n} z_{ii}\right] = \sum_{i=1}^{n} E[z_{ii}] = \mathrm{trace}(E[Z]).$$

• Also, lemma XI.5 (E[Y] results) states that $E[AY] = AE[Y]$ and this can be extended to $E[AYY'] = AE[YY']$.

• Hence, $E[\mathrm{trace}(AYY')] = \mathrm{trace}(E[AYY']) = \mathrm{trace}(AE[YY'])$.

• Now we require an expression for $E[YY']$, which we obtain by considering definition I.7 of $V = E[(Y - \psi)(Y - \psi)']$.

• From this, using lemma I.1 (transposes),
$$\begin{aligned}
V &= E[(Y - \psi)(Y - \psi)'] \\
  &= E[YY' - Y\psi' - \psi Y' + \psi\psi'] \\
  &= E[YY'] - E[Y\psi'] - E[\psi Y'] + E[\psi\psi'].
\end{aligned}$$

Page 14: XII. Justifying the ANOVA-based hypothesis test


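Proof of theorem XII.1 (continued)

• Now, $E[\psi\psi'] = \psi\psi'$ as the elements of $\psi$ are population quantities and are constants with respect to expectation.

• Thus $E[Y\psi'] = E[Y]\psi' = \psi\psi' = \psi E[Y'] = E[\psi Y']$.

• Hence, $V = E[YY'] - E[Y\psi'] - E[\psi Y'] + \psi\psi' = E[YY'] - \psi\psi'$ so that $E[YY'] = V + \psi\psi'$.

• This leads to
$$\begin{aligned}
E[Y'AY] &= \mathrm{trace}(AE[YY']) \\
        &= \mathrm{trace}(A(V + \psi\psi')) \\
        &= \mathrm{trace}(AV) + \mathrm{trace}(A\psi\psi') \\
        &= \mathrm{trace}(AV) + \mathrm{trace}(\psi'A\psi) \\
        &= \mathrm{trace}(AV) + \psi'A\psi.
\end{aligned}$$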
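Theorem XII.1 holds for any distribution with mean $\psi$ and variance V, so it can be checked exactly on a finite "population" of vectors, where expectations are plain averages. The sketch below uses made-up data; the array names and sizes are assumptions, and the variance matrix is computed with divisor N so that it is the population variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 4, 1000                      # assumed vector length and population size

# Rows of Y form a finite population of n-vectors (arbitrary made-up values)
Y = rng.normal(size=(N, n)) @ rng.normal(size=(n, n)) + rng.normal(size=n)
A = rng.normal(size=(n, n))         # any real n x n matrix

psi = Y.mean(axis=0)                          # E[Y] over the population
V = np.cov(Y, rowvar=False, bias=True)        # var[Y] with divisor N

# Left-hand side: average of the quadratic forms y'Ay over the population
lhs = np.mean(np.einsum("ij,jk,ik->i", Y, A, Y))
# Right-hand side: trace(AV) + psi' A psi
rhs = np.trace(A @ V) + psi @ A @ psi
```

The two sides agree to rounding error because the identity $E[YY'] = V + \psi\psi'$ used in the proof is exact for population averages.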

Page 15: XII. Justifying the ANOVA-based hypothesis test


Theorem for E[MSq]

• Theorem XII.2: Let Y be an $n \times 1$ vector of random variables with $E[Y] = \psi$ and $\mathrm{var}[Y] = V$, where $\psi$ is an $n \times 1$ vector of expected values and V is an $n \times n$ matrix.

• Let $Y'QY/\nu$ be the mean square where Q is an $n \times n$ symmetric, idempotent matrix and $\nu$ is the degrees of freedom of the sum of squares.

• Then
$$E[Y'QY/\nu] = \left(\mathrm{trace}(QV) + \psi'Q\psi\right)/\nu.$$

• Proof: Since $\nu$ is a constant, $E[Y'QY/\nu] = E[Y'QY]/\nu$ and the result follows straightforwardly using theorem XII.1.

• So to derive E[MSq] for a particular source under a specific model, one substitutes
– the Q matrix for the source and
– the $\psi$ and V for the model
into the expression $E[Y'QY/\nu] = (\mathrm{trace}(QV) + \psi'Q\psi)/\nu$ given by theorem XII.2.

Page 16: XII. Justifying the ANOVA-based hypothesis test


XII.E The distribution of the F statistics for an ANOVA

• Each F statistic in an ANOVA is used to assess whether or not a null hypothesis is likely to be true by determining if the value of our test statistic is unlikely when H0 is true.

• To do this we need to establish the sampling distribution of the test statistic when the null hypothesis is true.

• The test statistic is of the following general form
$$F_{\nu_1,\nu_2} = \frac{s_1^2}{s_2^2} = \frac{Y'Q_1Y/\nu_1}{Y'Q_2Y/\nu_2}.$$

• It is the ratio of two MSqs each of which is a quadratic form divided by its df.

• We will in the following three theorems:
1. Give the distribution of a quadratic form.
2. Establish the relationship between several quadratic forms.
3. Obtain the distribution of the ratio of two independent quadratic forms.

Page 17: XII. Justifying the ANOVA-based hypothesis test


Distribution of a quadratic form

• Theorem XII.3: Let A be an $n \times n$ symmetric matrix of rank $\nu$ and Y be an $n \times 1$ normally distributed random vector with $E[AY] = 0$, $\mathrm{var}[Y] = V$ and $E[Y'AY/\nu] = \lambda$.

• Then $(1/\lambda)Y'AY$ follows a chi-squared distribution with $\nu$ degrees of freedom if and only if A is idempotent.

• Proof: not given

• The chi-square probability distribution function for the random variable U is:
$$f(u) = \frac{1}{\Gamma(\nu/2)\,2^{\nu/2}}\,u^{(\nu-2)/2}e^{-u/2}, \quad 0 < u < \infty.$$

[Figure: probability distribution function for a chi-squared random variable]

Page 18: XII. Justifying the ANOVA-based hypothesis test


Relationships between idempotents

• Theorem XII.4: Let Y be an $n \times 1$ vector of random variables with $E[Y] = X\theta$ and $\mathrm{var}[Y] = V$.

• Let $Y'A_1Y, Y'A_2Y, \ldots, Y'A_hY$ be a collection of h quadratic forms where, for each $i = 1, 2, \ldots, h$,
– $A_i$ is symmetric, of rank $\nu_i$,
– $E[A_iY] = 0$ and
– $E[Y'A_iY/\nu_i] = \lambda_i$.

• If any two of the following three statements are true (any two implies the other),
1. All $A_i$ are idempotent
2. $\sum_{i=1}^{h} A_i$ is idempotent
3. $A_iA_j = 0$, $i \neq j$

then not only does $(1/\lambda_i)Y'A_iY$, for each i, follow a chi-squared distribution with $\nu_i$ degrees of freedom as theorem XII.3 establishes, but
– $Y'A_iY$ and $Y'A_jY$ are independent for $i \neq j$ and
– $\sum_{i=1}^{h} \nu_i = \nu$, where $\nu$ denotes the rank of $\sum_{i=1}^{h} A_i$.

• Proof: not given

Page 19: XII. Justifying the ANOVA-based hypothesis test


Distribution of the ratio of two independent quadratic forms

• Theorem XII.5: Let $U_1$ and $U_2$ be two random variables distributed as chi-squares with $\nu_1$ and $\nu_2$ degrees of freedom.

• Then, provided $U_1$ and $U_2$ are independent, the random variable
$$W = \frac{U_1/\nu_1}{U_2/\nu_2}$$
is distributed as Snedecor's F with $\nu_1$ and $\nu_2$ degrees of freedom.

• Proof: not given

Page 20: XII. Justifying the ANOVA-based hypothesis test


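F distribution for W

$$f(w) = \frac{\Gamma\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)}\left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} w^{(\nu_1-2)/2}\left(1 + \frac{\nu_1}{\nu_2}w\right)^{-(\nu_1+\nu_2)/2}, \quad 0 < w < \infty.$$

[Figure: probability distribution function for $F_{3,46}$]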

Page 21: XII. Justifying the ANOVA-based hypothesis test


XII.F Application of theory for ANOVA-based hypothesis test to an example

a) The sources for an ANOVA

b) The sums of squares for an ANOVA

c) Degrees of freedom of the sums of squares for an ANOVA

d) Expected mean squares for an ANOVA

e) The distribution of the F statistics for an ANOVA

Page 22: XII. Justifying the ANOVA-based hypothesis test


Example XII.1 Randomized Complete Block Design

a) The sources for an ANOVA

1. For the RCBD, the unrandomized factors are Blocks and Units and the randomized factor is Treatments.

2. The unrandomized structure formula is b Blocks/t Units and the randomized structure formula is t Treatments.

3. The unrandomized sources are Blocks and Units[Blocks] and the randomized source is Treatments.

4. The sources for the ANOVA table for this experiment are:

Source
Blocks
Units[Blocks]
    Treatments
    Residual
Total

Page 23: XII. Justifying the ANOVA-based hypothesis test


Example XII.1 Randomized Complete Block Design (continued)

b) The sums of squares for an ANOVA

• Generalized factors
– unrandomized: (G), Blocks and Blocks∧Units and
– randomized: (G) and Treatments.

• So the Qs are
– unrandomized: $Q_G$, $Q_B$ and $Q_{BU}$ and
– randomized: $Q_G$ and $Q_T$.

• Since each of these sets is derived from a structure formula consisting of only nesting and crossing operators, they each form a CSMOI.

Page 24: XII. Justifying the ANOVA-based hypothesis test


Example XII.1 Randomized Complete Block Design (continued)

b) The sums of squares for an ANOVA (continued)

• The Hasse diagrams of generalized-factor marginalities, which include expressions for the Qs in terms of the Ms, are as follows:

[Figure: Hasse diagrams of generalized-factor marginalities]

Page 25: XII. Justifying the ANOVA-based hypothesis test


Example XII.1 Randomized Complete Block Design (continued)

b) The sums of squares for an ANOVA (continued)

• If the data is ordered in standard order for Blocks and Treatments and if Units are given the same numbering as Treatments (this doesn't affect the analysis), rule XII.1 (S, M as direct products) yields:
– $M_{BU} = I_b \otimes I_t$, $M_B = I_b \otimes t^{-1}J_t$ and $M_G = b^{-1}J_b \otimes t^{-1}J_t$;
– $M_T = b^{-1}J_b \otimes I_t$ and $M_G = b^{-1}J_b \otimes t^{-1}J_t$.

• The latter requires the inclusion of a factor for replicates, here Blocks; this factor never occurs in its generalized factors.

• So the ANOVA table with SSqs added is:

Source             SSq
Blocks             $Y'Q_BY = Y'(M_B - M_G)Y$
Units[Blocks]      $Y'Q_{BU}Y = Y'(M_{BU} - M_B)Y$
    Treatments     $Y'Q_TY = Y'(M_T - M_G)Y$
    Residual       $Y'Q_{BU_{Res}}Y = Y'(Q_{BU} - Q_T)Y = Y'(M_{BU} - M_B - M_T + M_G)Y$
Total              $Y'Q_UY = Y'(M_{BU} - M_G)Y$

Page 26: XII. Justifying the ANOVA-based hypothesis test


Example XII.1 Randomized Complete Block Design (continued)

c) Degrees of freedom of the SSqs for an ANOVA

• The following theorem establishes that the degrees of freedom of the sums of squares are as given in the analysis of variance table.

• Theorem XII.6: Let the quadratic forms for the SSqs be
$$\begin{aligned}
Y'Q_UY &= Y'(M_{BU} - M_G)Y \\
Y'Q_BY &= Y'(M_B - M_G)Y \\
Y'Q_{BU}Y &= Y'(M_{BU} - M_B)Y \\
Y'Q_TY &= Y'(M_T - M_G)Y \\
Y'Q_{BU_{Res}}Y &= Y'(Q_{BU} - Q_T)Y = Y'(M_{BU} - M_B - M_T + M_G)Y
\end{aligned}$$
– where $M_{BU} = I_b \otimes I_t$, $M_B = I_b \otimes t^{-1}J_t$, $M_T = b^{-1}J_b \otimes I_t$ and $M_G = b^{-1}J_b \otimes t^{-1}J_t$.

• Their degrees of freedom are $n-1$, $b-1$, $b(t-1)$, $t-1$ and $(b-1)(t-1)$, respectively,
– where n is the no. of observations, b is the no. of blocks and t is the no. of treatments.

Page 27: XII. Justifying the ANOVA-based hypothesis test


Proof of theorem XII.6 (continued)

• First we establish that all Q matrices are symmetric and idempotent so that we can utilise lemma XII.2 to conclude that the ranks of the Q matrices are equal to their traces.

• Now QB, QBU and QT are symmetric and idempotent as they are in the complete sets of mutually-orthogonal idempotents.

• It remains to demonstrate that $Q_U$ and $Q_{BU_{Res}}$ are symmetric and idempotent.

Page 28: XII. Justifying the ANOVA-based hypothesis test


Proof of theorem XII.6 (continued)

• Firstly, from lemma I.1 (transpose properties), $(cA + dB)' = cA' + dB'$ and we have from lemma XII.1 that the Ms are symmetric and idempotent so that
$$Q_U' = (M_{BU} - M_G)' = M_{BU}' - M_G' = M_{BU} - M_G = Q_U$$
and
$$\begin{aligned}
Q_UQ_U &= (M_{BU} - M_G)(M_{BU} - M_G) \\
       &= M_{BU}M_{BU} - M_{BU}M_G - M_GM_{BU} + M_GM_G.
\end{aligned}$$

• Now $M_{BU} = I_n$ so that
$$\begin{aligned}
Q_UQ_U &= M_{BU}M_{BU} - M_{BU}M_G - M_GM_{BU} + M_GM_G \\
       &= M_{BU} - M_G - M_G + M_G \\
       &= M_{BU} - M_G \\
       &= Q_U.
\end{aligned}$$

• That is, $Q_U$ is symmetric and idempotent.

Page 29: XII. Justifying the ANOVA-based hypothesis test


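Proof of theorem XII.6 (continued)

• Secondly,
$$Q_{BU_{Res}}' = (M_{BU} - M_B - M_T + M_G)' = M_{BU} - M_B - M_T + M_G$$
as each M is symmetric.

• Further,
$$\begin{aligned}
Q_{BU_{Res}}^2 &= (M_{BU} - M_B - M_T + M_G)(M_{BU} - M_B - M_T + M_G) \\
&= M_{BU}M_{BU} - M_{BU}M_B - M_{BU}M_T + M_{BU}M_G \\
&\quad - M_BM_{BU} + M_BM_B + M_BM_T - M_BM_G \\
&\quad - M_TM_{BU} + M_TM_B + M_TM_T - M_TM_G \\
&\quad + M_GM_{BU} - M_GM_B - M_GM_T + M_GM_G \\
&= M_{BU} - M_B - M_T + M_G \\
&\quad - M_B + M_B + M_BM_T - M_BM_G \\
&\quad - M_T + M_TM_B + M_T - M_TM_G \\
&\quad + M_G - M_GM_B - M_GM_T + M_G.
\end{aligned}$$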

Page 30: XII. Justifying the ANOVA-based hypothesis test


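Proof of theorem XII.6 (continued)

• Now
$$\begin{aligned}
M_BM_T &= (I_b \otimes t^{-1}J_t)(b^{-1}J_b \otimes I_t) = b^{-1}J_b \otimes t^{-1}J_t = M_G = M_TM_B \\
M_BM_G &= (I_b \otimes t^{-1}J_t)(b^{-1}J_b \otimes t^{-1}J_t) = b^{-1}J_b \otimes t^{-1}J_t = M_G = M_GM_B \\
M_TM_G &= (b^{-1}J_b \otimes I_t)(b^{-1}J_b \otimes t^{-1}J_t) = b^{-1}J_b \otimes t^{-1}J_t = M_G = M_GM_T
\end{aligned}$$

• and so
$$\begin{aligned}
Q_{BU_{Res}}^2 &= M_{BU} - M_B - M_T + M_G \\
&\quad - M_B + M_B + M_BM_T - M_BM_G \\
&\quad - M_T + M_TM_B + M_T - M_TM_G \\
&\quad + M_G - M_GM_B - M_GM_T + M_G \\
&= M_{BU} - M_B - M_T + M_G \\
&\quad - M_B + M_B + M_G - M_G \\
&\quad - M_T + M_G + M_T - M_G \\
&\quad + M_G - M_G - M_G + M_G \\
&= M_{BU} - M_B - M_T + M_G \\
&= Q_{BU_{Res}}.
\end{aligned}$$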

Page 31: XII. Justifying the ANOVA-based hypothesis test


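Proof of theorem XII.6 (continued)

• Consequently lemma XII.2 (rank = trace) applies to the Qs so that the ranks of all the Q matrices are equal to their traces.

• We also note that using lemma XII.4 (trace($M_F$) = f) we have that
$$\mathrm{trace}(M_G) = 1, \quad \mathrm{trace}(M_B) = b, \quad \mathrm{trace}(M_T) = t, \quad \mathrm{trace}(M_{BU}) = bt = n.$$

• Since, from lemma XII.3 (traces), $\mathrm{trace}(cA + dB) = c\,\mathrm{trace}(A) + d\,\mathrm{trace}(B)$,

• we have that
$$\begin{aligned}
\mathrm{trace}(Q_U) &= \mathrm{trace}(M_{BU} - M_G) = n - 1 \\
\mathrm{trace}(Q_B) &= \mathrm{trace}(M_B - M_G) = b - 1 \\
\mathrm{trace}(Q_{BU}) &= \mathrm{trace}(M_{BU} - M_B) = bt - b = b(t - 1) \\
\mathrm{trace}(Q_T) &= \mathrm{trace}(M_T - M_G) = t - 1 \\
\mathrm{trace}(Q_{BU_{Res}}) &= \mathrm{trace}(M_{BU} - M_B - M_T + M_G) = bt - b - t + 1 = (b - 1)(t - 1).
\end{aligned}$$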

Page 32: XII. Justifying the ANOVA-based hypothesis test


ANOVA table with degrees of freedom added

Source             df            SSq
Blocks             b-1           $Y'Q_BY = Y'(M_B - M_G)Y$
Units[Blocks]      b(t-1)        $Y'Q_{BU}Y = Y'(M_{BU} - M_B)Y$
    Treatments     t-1           $Y'Q_TY = Y'(M_T - M_G)Y$
    Residual       (b-1)(t-1)    $Y'Q_{BU_{Res}}Y = Y'(Q_{BU} - Q_T)Y = Y'(M_{BU} - M_B - M_T + M_G)Y$
Total              n-1           $Y'Q_UY = Y'(M_{BU} - M_G)Y$
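The degrees of freedom in this table can be reproduced numerically by building the M and Q matrices as direct products and taking ranks. The following is a sketch for assumed sizes b = 4 and t = 3; the dictionary keys simply mirror the source names in the table.

```python
import numpy as np

b, t = 4, 3                        # assumed numbers of blocks and treatments
n = b * t

I_b, I_t = np.eye(b), np.eye(t)
Jbar_b = np.ones((b, b)) / b       # b^{-1} J_b
Jbar_t = np.ones((t, t)) / t       # t^{-1} J_t

M_BU = np.kron(I_b, I_t)
M_B = np.kron(I_b, Jbar_t)
M_T = np.kron(Jbar_b, I_t)
M_G = np.kron(Jbar_b, Jbar_t)

Q = {
    "Blocks": M_B - M_G,
    "Units[Blocks]": M_BU - M_B,
    "Treatments": M_T - M_G,
    "Residual": M_BU - M_B - M_T + M_G,
    "Total": M_BU - M_G,
}

# df = rank(Q), which equals trace(Q) because each Q is idempotent
df = {name: int(np.linalg.matrix_rank(Qm)) for name, Qm in Q.items()}
all_idempotent = all(np.allclose(Qm @ Qm, Qm) for Qm in Q.values())
```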

Page 33: XII. Justifying the ANOVA-based hypothesis test


d) Expected mean squares for an ANOVA

• We now prove the E[MSq]s for the case of Blocks random.

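• But first a useful lemma.

• Lemma XII.5: Let
$$\begin{aligned}
&\psi = E[Y] = X_T\tau = (\mathbf{1}_b \otimes I_t)\tau, \\
&M_{BU} = I_b \otimes I_t = I_n, \quad M_B = I_b \otimes t^{-1}J_t, \quad M_T = b^{-1}J_b \otimes I_t, \quad M_G = b^{-1}J_b \otimes t^{-1}J_t, \\
&Q_B = M_B - M_G, \quad Q_T = M_T - M_G, \quad Q_{BU_{Res}} = M_{BU} - M_B - M_T + M_G.
\end{aligned}$$

• Then
$$\begin{aligned}
Q_GX_T &= \mathbf{1}_b \otimes t^{-1}J_t \\
Q_BX_T &= \mathbf{0} \\
Q_{BU_{Res}}X_T &= \mathbf{0}
\end{aligned}$$
and $X_T'Q_GX_T = \frac{b}{t}J_t$.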

Page 34: XII. Justifying the ANOVA-based hypothesis test


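Proof of lemma XII.5

• First we have
$$\begin{aligned}
M_GX_T &= (b^{-1}J_b \otimes t^{-1}J_t)(\mathbf{1}_b \otimes I_t) = \mathbf{1}_b \otimes t^{-1}J_t \\
M_TX_T &= (b^{-1}J_b \otimes I_t)(\mathbf{1}_b \otimes I_t) = \mathbf{1}_b \otimes I_t \\
M_BX_T &= (I_b \otimes t^{-1}J_t)(\mathbf{1}_b \otimes I_t) = \mathbf{1}_b \otimes t^{-1}J_t \\
M_{BU}X_T &= (I_b \otimes I_t)(\mathbf{1}_b \otimes I_t) = \mathbf{1}_b \otimes I_t
\end{aligned}$$

• so that
$$\begin{aligned}
Q_GX_T &= M_GX_T = \mathbf{1}_b \otimes t^{-1}J_t \\
Q_BX_T &= (M_B - M_G)X_T = \mathbf{1}_b \otimes t^{-1}J_t - \mathbf{1}_b \otimes t^{-1}J_t = \mathbf{0} \\
Q_{BU_{Res}}X_T &= (M_{BU} - M_B - M_T + M_G)X_T \\
&= \mathbf{1}_b \otimes I_t - \mathbf{1}_b \otimes t^{-1}J_t - \mathbf{1}_b \otimes I_t + \mathbf{1}_b \otimes t^{-1}J_t = \mathbf{0}.
\end{aligned}$$

• Finally,
$$X_T'Q_GX_T = (\mathbf{1}_b' \otimes I_t)(\mathbf{1}_b \otimes t^{-1}J_t) = \frac{b}{t}J_t.$$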

Page 35: XII. Justifying the ANOVA-based hypothesis test


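Theorem XII.7

• Let
$$\begin{aligned}
\psi &= E[Y] = X_T\tau = (\mathbf{1}_b \otimes I_t)\tau \\
V &= \sigma^2_{BU}I_n + \sigma^2_B(I_b \otimes J_t) \\
SS_B &= Y'Q_BY = Y'(M_B - M_G)Y \\
SS_T &= Y'Q_TY = Y'(M_T - M_G)Y \\
SS_{BU_{Res}} &= Y'Q_{BU_{Res}}Y = Y'(Q_{BU} - Q_T)Y = Y'(M_{BU} - M_B - M_T + M_G)Y.
\end{aligned}$$

• Then, the expected mean squares are
$$\begin{aligned}
E[SS_B/(b-1)] &= \sigma^2_{BU} + t\sigma^2_B \\
E[SS_T/(t-1)] &= \sigma^2_{BU} + q_T(\psi) \\
E[SS_{BU_{Res}}/((b-1)(t-1))] &= \sigma^2_{BU}
\end{aligned}$$

• where $q_T(\psi) = b\sum_{j=1}^{t}(\tau_j - \bar{\tau}_.)^2/(t-1)$, $\bar{\tau}_. = \sum_{j=1}^{t}\tau_j/t$, $\tau_j$ is the jth element of the t-vector $\tau$, b is the number of blocks and t is the number of treatments.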

Page 36: XII. Justifying the ANOVA-based hypothesis test


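Proof of theorem XII.7

• First note that we can write $V = \sigma^2_{BU}M_{BU} + t\sigma^2_BM_B$.

• For $E[SS_B/(b-1)]$, we use theorem XII.2 ($E[Y'QY/\nu] = (\mathrm{trace}(QV) + \psi'Q\psi)/\nu$) and the fact that it can be shown that $Q_BM_{BU} = Q_B$ and $Q_BM_B = Q_B$ to obtain the following expressions:
$$\begin{aligned}
E[SS_B/(b-1)] &= E[Y'Q_BY/(b-1)] \\
&= \left\{\mathrm{trace}\left(Q_B(\sigma^2_{BU}M_{BU} + t\sigma^2_BM_B)\right) + \tau'X_T'Q_BX_T\tau\right\}/(b-1) \\
&= \left\{\sigma^2_{BU}\,\mathrm{trace}(Q_BM_{BU}) + t\sigma^2_B\,\mathrm{trace}(Q_BM_B) + \tau'X_T'Q_BX_T\tau\right\}/(b-1) \\
&= \left\{\sigma^2_{BU}\,\mathrm{trace}(Q_B) + t\sigma^2_B\,\mathrm{trace}(Q_B) + \tau'X_T'Q_BX_T\tau\right\}/(b-1) \\
&= \left\{(\sigma^2_{BU} + t\sigma^2_B)\,\mathrm{trace}(Q_B) + \tau'X_T'Q_BX_T\tau\right\}/(b-1).
\end{aligned}$$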

Page 37: XII. Justifying the ANOVA-based hypothesis test


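Proof of theorem XII.7 (continued)

• Now from theorem XII.6 (RCBD dfs) we have that $\mathrm{trace}(Q_B) = b - 1$ and from lemma XII.5 that $Q_BX_T = \mathbf{0}$.

• Hence, the expected mean square is
$$\begin{aligned}
E[SS_B/(b-1)] &= \left\{(\sigma^2_{BU} + t\sigma^2_B)\,\mathrm{trace}(Q_B) + \tau'X_T'Q_BX_T\tau\right\}/(b-1) \\
&= \left\{(\sigma^2_{BU} + t\sigma^2_B)(b-1) + 0\right\}/(b-1) \\
&= \sigma^2_{BU} + t\sigma^2_B.
\end{aligned}$$

• The proof that $E[SS_T/(t-1)] = \sigma^2_{BU} + q_T(\psi)$ is left as an exercise for you.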

Page 38: XII. Justifying the ANOVA-based hypothesis test


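Proof of theorem XII.7 (continued)

• Now, for $E[SS_{BU_{Res}}/((b-1)(t-1))]$, again we use theorem XII.2 (E[MSq]s) and also that it can be shown that
$$\begin{aligned}
M_{BU}Q_{BU_{Res}} &= Q_{BU_{Res}}M_{BU} = Q_{BU_{Res}} \\
M_BQ_{BU_{Res}} &= Q_{BU_{Res}}M_B = 0
\end{aligned}$$

• so that we have
$$\begin{aligned}
E[SS_{BU_{Res}}/((b-1)(t-1))] &= E[Y'Q_{BU_{Res}}Y/((b-1)(t-1))] \\
&= \frac{\mathrm{trace}\left(Q_{BU_{Res}}(\sigma^2_{BU}M_{BU} + t\sigma^2_BM_B)\right) + \tau'X_T'Q_{BU_{Res}}X_T\tau}{(b-1)(t-1)} \\
&= \frac{\sigma^2_{BU}\,\mathrm{trace}(Q_{BU_{Res}}) + \tau'X_T'Q_{BU_{Res}}X_T\tau}{(b-1)(t-1)}.
\end{aligned}$$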

Page 39: XII. Justifying the ANOVA-based hypothesis test


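Proof of theorem XII.7 (continued)

• Now from theorem XII.6 (RCBD dfs) we have that $\mathrm{trace}(Q_{BU_{Res}}) = (b-1)(t-1)$ and from lemma XII.5 that $Q_{BU_{Res}}X_T = \mathbf{0}$ so that
$$\begin{aligned}
E[SS_{BU_{Res}}/((b-1)(t-1))] &= \frac{\sigma^2_{BU}\,\mathrm{trace}(Q_{BU_{Res}}) + \tau'X_T'Q_{BU_{Res}}X_T\tau}{(b-1)(t-1)} \\
&= \frac{\sigma^2_{BU}(b-1)(t-1) + 0}{(b-1)(t-1)} \\
&= \sigma^2_{BU}
\end{aligned}$$
as claimed.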

Page 40: XII. Justifying the ANOVA-based hypothesis test


ANOVA table with E[MSq]s added

Source             df            SSq                                                                       E[MSq]
Blocks             b-1           $Y'Q_BY = Y'(M_B - M_G)Y$                                                 $\sigma^2_{BU} + t\sigma^2_B$
Units[Blocks]      b(t-1)        $Y'Q_{BU}Y = Y'(M_{BU} - M_B)Y$
    Treatments     t-1           $Y'Q_TY = Y'(M_T - M_G)Y$                                                 $\sigma^2_{BU} + q_T(\psi)$
    Residual       (b-1)(t-1)    $Y'Q_{BU_{Res}}Y = Y'(Q_{BU} - Q_T)Y = Y'(M_{BU} - M_B - M_T + M_G)Y$     $\sigma^2_{BU}$
Total              n-1           $Y'Q_UY = Y'(M_{BU} - M_G)Y$
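The E[MSq] column can be checked by substituting a Q, $\psi$ and V into the expression of theorem XII.2 and comparing with the closed forms of theorem XII.7. The sketch below uses made-up parameter values (b = 4, t = 3, the two variance components and the vector `tau` are all assumptions); the helper `expected_mean_square` is a hypothetical name.

```python
import numpy as np

b, t = 4, 3                               # assumed design size
n = b * t
sigma2_BU, sigma2_B = 2.0, 1.5            # assumed variance components
tau = np.array([1.0, 2.0, 4.0])           # assumed treatment parameters

I_b, I_t = np.eye(b), np.eye(t)
Jbar_b, Jbar_t = np.ones((b, b)) / b, np.ones((t, t)) / t
M_BU, M_B = np.kron(I_b, I_t), np.kron(I_b, Jbar_t)
M_T, M_G = np.kron(Jbar_b, I_t), np.kron(Jbar_b, Jbar_t)

X_T = np.kron(np.ones((b, 1)), I_t)       # 1_b (x) I_t
psi = X_T @ tau                           # E[Y]
V = sigma2_BU * np.eye(n) + sigma2_B * np.kron(I_b, np.ones((t, t)))

def expected_mean_square(Q, df):
    """Theorem XII.2: (trace(QV) + psi' Q psi) / df."""
    return (np.trace(Q @ V) + psi @ Q @ psi) / df

ems_blocks = expected_mean_square(M_B - M_G, b - 1)
ems_treatments = expected_mean_square(M_T - M_G, t - 1)
ems_residual = expected_mean_square(M_BU - M_B - M_T + M_G, (b - 1) * (t - 1))

# Closed form for the treatment term in theorem XII.7
q_T = b * np.sum((tau - tau.mean()) ** 2) / (t - 1)
```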

Page 41: XII. Justifying the ANOVA-based hypothesis test


e) The distribution of the F statistics for an ANOVA

• We next derive the sampling distribution of the F-statistics for testing treatment and block differences.

• The following theorems involve establishing the sampling distribution of the F-statistics under these models.

• Note that the model
– under the null hypothesis of no treatment effects is $E[Y] = X_G\mu$ and $V = \sigma^2_{BU}I_n + \sigma^2_B(I_b \otimes J_t)$,
– while under the null hypothesis of no block variation it is $E[Y] = X_T\tau$ and $V = \sigma^2_{BU}I_n$.

Page 42: XII. Justifying the ANOVA-based hypothesis test


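Theorem XII.8

• Let
$$\begin{aligned}
\psi &= E[Y] = X_G\mu = (\mathbf{1}_b \otimes \mathbf{1}_t)\mu \\
V &= \sigma^2_{BU}I_n + \sigma^2_B(I_b \otimes J_t) \\
s^2_T &= Y'Q_TY/(t-1) \\
s^2_{BU_{Res}} &= Y'Q_{BU_{Res}}Y/((b-1)(t-1))
\end{aligned}$$
where
$$\begin{aligned}
&Q_T = M_T - M_G, \quad Q_{BU_{Res}} = M_{BU} - M_B - M_T + M_G, \\
&M_{BU} = I_b \otimes I_t = I_n, \quad M_B = I_b \otimes t^{-1}J_t, \quad M_T = b^{-1}J_b \otimes I_t, \quad M_G = b^{-1}J_b \otimes t^{-1}J_t.
\end{aligned}$$

• Then, the ratio of these two mean squares, given by
$$F_{t-1,\,(b-1)(t-1)} = \frac{s^2_T}{s^2_{BU_{Res}}},$$
is distributed as a Snedecor's F with $(t-1)$ and $(b-1)(t-1)$ degrees of freedom.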

Page 43: XII. Justifying the ANOVA-based hypothesis test


Proof of theorem XII.8

• We have to show that

1. $E[Q_TY] = E[Q_{BU_{Res}}Y] = \mathbf{0}$ and
$$E[Y'Q_TY/(t-1)] = E[Y'Q_{BU_{Res}}Y/((b-1)(t-1))] = \sigma^2_{BU}$$
under the expectation and variation models above, so that theorem XII.3 (distn Y'AY) can be invoked to conclude that $(1/\sigma^2_{BU})Y'Q_TY$ and $(1/\sigma^2_{BU})Y'Q_{BU_{Res}}Y$ follow chi-square distributions;

2. the quadratic forms $Y'Q_TY$ and $Y'Q_{BU_{Res}}Y$ are independent as in theorem XII.4 (several $Y'A_iY$s);

3. then theorem XII.5 (F distribution) can be invoked to obtain the distribution of the F test statistic.

Page 44: XII. Justifying the ANOVA-based hypothesis test


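Proof of theorem XII.8 (continued)

• Firstly,
$$\begin{aligned}
E[Q_TY] &= Q_TE[Y] \\
&= Q_TX_G\mu \\
&= (M_T - M_G)X_G\mu \\
&= (b^{-1}J_b \otimes I_t - b^{-1}J_b \otimes t^{-1}J_t)(\mathbf{1}_b \otimes \mathbf{1}_t)\mu \\
&= (\mathbf{1}_b \otimes \mathbf{1}_t - \mathbf{1}_b \otimes \mathbf{1}_t)\mu \\
&= \mathbf{0}
\end{aligned}$$
and
$$\begin{aligned}
E[Q_{BU_{Res}}Y] &= Q_{BU_{Res}}E[Y] \\
&= Q_{BU_{Res}}X_G\mu \\
&= (M_{BU} - M_B - M_T + M_G)X_G\mu \\
&= (I_b \otimes I_t - I_b \otimes t^{-1}J_t - b^{-1}J_b \otimes I_t + b^{-1}J_b \otimes t^{-1}J_t)(\mathbf{1}_b \otimes \mathbf{1}_t)\mu \\
&= (\mathbf{1}_b \otimes \mathbf{1}_t - \mathbf{1}_b \otimes \mathbf{1}_t - \mathbf{1}_b \otimes \mathbf{1}_t + \mathbf{1}_b \otimes \mathbf{1}_t)\mu \\
&= \mathbf{0}.
\end{aligned}$$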

Page 45: XII. Justifying the ANOVA-based hypothesis test


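Proof of theorem XII.8 (continued)

• Also, using theorem XII.2 (E[MSq]s) and
– as $Q_TM_{BU} = Q_TI_n = Q_T$,
– $Q_TM_B = (M_T - M_G)M_B = M_G - M_G = 0$ ($M_BM_T = M_TM_B = M_BM_G = M_GM_B = M_GM_T = M_TM_G = M_G$ from the proof of theorem XII.6 {RCBD dfs}),
– $Q_TX_G = \mathbf{0}$ (see above in this proof), and
– $\mathrm{trace}(Q_T) = t - 1$ (from theorem XII.6 {RCBD dfs}),

• we have
$$\begin{aligned}
E[Y'Q_TY/(t-1)] &= \left\{\mathrm{trace}\left(Q_T(\sigma^2_{BU}M_{BU} + t\sigma^2_BM_B)\right) + (X_G\mu)'Q_T(X_G\mu)\right\}/(t-1) \\
&= \left\{\sigma^2_{BU}\,\mathrm{trace}(Q_T) + 0 + (X_G\mu)'Q_T(X_G\mu)\right\}/(t-1) \\
&= \left\{\sigma^2_{BU}(t-1) + 0\right\}/(t-1) \\
&= \sigma^2_{BU}.
\end{aligned}$$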

Page 46: XII. Justifying the ANOVA-based hypothesis test


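Proof of theorem XII.8 (continued)

• Now, using theorem XII.2 (E[MSq]s), and as
– $Q_{BU_{Res}}M_{BU} = Q_{BU_{Res}}I_n = Q_{BU_{Res}}$,
– $Q_{BU_{Res}}M_B = (M_{BU} - M_B - M_T + M_G)M_B = M_B - M_B - M_G + M_G = 0$ ($M_BM_T = M_TM_B = M_BM_G = M_GM_B = M_GM_T = M_TM_G = M_G$ from the proof of theorem XII.6 {RCBD dfs}),
– $Q_{BU_{Res}}X_G = \mathbf{0}$ (see above in this proof), and
– $\mathrm{trace}(Q_{BU_{Res}}) = (b-1)(t-1)$ (from theorem XII.6 {RCBD dfs}),

• we have
$$\begin{aligned}
E[Y'Q_{BU_{Res}}Y/((b-1)(t-1))] &= \frac{\mathrm{trace}\left(Q_{BU_{Res}}(\sigma^2_{BU}M_{BU} + t\sigma^2_BM_B)\right) + (X_G\mu)'Q_{BU_{Res}}(X_G\mu)}{(b-1)(t-1)} \\
&= \frac{\sigma^2_{BU}\,\mathrm{trace}(Q_{BU_{Res}}) + 0 + (X_G\mu)'Q_{BU_{Res}}(X_G\mu)}{(b-1)(t-1)} \\
&= \frac{\sigma^2_{BU}(b-1)(t-1) + 0}{(b-1)(t-1)} \\
&= \sigma^2_{BU}.
\end{aligned}$$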

Page 47: XII. Justifying the ANOVA-based hypothesis test


Proof of theorem XII.8 (continued)

• Secondly, to show that $Y'Q_TY$ and $Y'Q_{BU_{Res}}Y$ are independent quadratic forms we have to show that $Q_T$ and $Q_{BU_{Res}}$ meet two of the three conditions outlined in theorem XII.4 (several $Y'A_iY$s).

• As outlined in the proof of theorem XII.6 (RCBD dfs), $Q_T$ and $Q_{BU_{Res}}$ are idempotent so that condition 1 is met.

• For condition 3, we require that $Q_TQ_{BU_{Res}} = 0$.

• Now,
$$\begin{aligned}
Q_TQ_{BU_{Res}} &= (M_T - M_G)(M_{BU} - M_B - M_T + M_G) \\
&= M_TM_{BU} - M_TM_B - M_TM_T + M_TM_G \\
&\quad - M_GM_{BU} + M_GM_B + M_GM_T - M_GM_G \\
&= M_T - M_G - M_T + M_G - M_G + M_G + M_G - M_G \\
&= 0.
\end{aligned}$$

Page 48: XII. Justifying the ANOVA-based hypothesis test


Proof of theorem XII.8 (continued)

• Thirdly, theorem XII.5 (F distribution) means that, as $(1/\sigma^2_{BU})Y'Q_TY$ and $(1/\sigma^2_{BU})Y'Q_{BU_{Res}}Y$ are distributed as independent chi-squares with degrees of freedom $(t-1)$ and $(b-1)(t-1)$, respectively, then
$$F = \frac{(1/\sigma^2_{BU})\,Y'Q_TY/(t-1)}{(1/\sigma^2_{BU})\,Y'Q_{BU_{Res}}Y/((b-1)(t-1))} = \frac{Y'Q_TY/(t-1)}{Y'Q_{BU_{Res}}Y/((b-1)(t-1))}$$
follows an F distribution with $(t-1)$ and $(b-1)(t-1)$ degrees of freedom.
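To connect this statistic with the usual hand calculation, the sketch below computes F from the quadratic forms and again from block and treatment means. The data are random made-up numbers and the sizes b = 5, t = 4 are assumptions; only the agreement of the two calculations is being illustrated, not the sampling distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
b, t = 5, 4                               # assumed design size
data = rng.normal(size=(b, t))            # rows = blocks, columns = treatments
y = data.reshape(-1)                      # standard order: treatments within blocks

I_b, I_t = np.eye(b), np.eye(t)
Jbar_b, Jbar_t = np.ones((b, b)) / b, np.ones((t, t)) / t
M_BU, M_B = np.kron(I_b, I_t), np.kron(I_b, Jbar_t)
M_T, M_G = np.kron(Jbar_b, I_t), np.kron(Jbar_b, Jbar_t)
Q_T = M_T - M_G
Q_res = M_BU - M_B - M_T + M_G

df_T, df_res = t - 1, (b - 1) * (t - 1)

# F as a ratio of quadratic forms divided by their df
F_quadratic = (y @ Q_T @ y / df_T) / (y @ Q_res @ y / df_res)

# The same F from the familiar sums of squares based on means
grand = data.mean()
block_means = data.mean(axis=1, keepdims=True)
trt_means = data.mean(axis=0, keepdims=True)
ss_T = b * np.sum((trt_means - grand) ** 2)
ss_res = np.sum((data - block_means - trt_means + grand) ** 2)
F_means = (ss_T / df_T) / (ss_res / df_res)

# Condition 3 of theorem XII.4 for these two idempotents
orthogonal = np.allclose(Q_T @ Q_res, 0)
```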

Page 49: XII. Justifying the ANOVA-based hypothesis test


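Theorem XII.9

• Let
$$\begin{aligned}
\psi &= E[Y] = X_T\tau = (\mathbf{1}_b \otimes I_t)\tau \\
V &= \sigma^2_{BU}I_n \\
s^2_B &= Y'Q_BY/(b-1) \\
s^2_{BU_{Res}} &= Y'Q_{BU_{Res}}Y/((b-1)(t-1))
\end{aligned}$$

• where
$$\begin{aligned}
&Q_B = M_B - M_G, \quad Q_{BU_{Res}} = M_{BU} - M_B - M_T + M_G, \\
&M_{BU} = I_b \otimes I_t = I_n, \quad M_B = I_b \otimes t^{-1}J_t, \quad M_T = b^{-1}J_b \otimes I_t, \quad M_G = b^{-1}J_b \otimes t^{-1}J_t.
\end{aligned}$$

• Then, the ratio of these two mean squares, given by
$$F_{b-1,\,(b-1)(t-1)} = \frac{s^2_B}{s^2_{BU_{Res}}},$$
is distributed as a Snedecor's F with $(b-1)$ and $(b-1)(t-1)$ degrees of freedom.

• Proof: parallels that for the treatment differences.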

Page 50: XII. Justifying the ANOVA-based hypothesis test


XII.G Exercises

• Ex. 12-1 requires the proofs of some properties of Q and M matrices for the RCBD.

• Ex. 12-2 asks you to derive expressions for Q and M matrices for a CRD and to prove that the Residual operator is symmetric and derive its df.

• Ex. 12-3 involves the proof of the E[MSq] for the RCBD.

• Ex. 12-4 asks you to prove that the ratio of the Rows and Residual MSqs for a Latin square is distributed as Snedecor's F under the null hypothesis.