Comparing Mean Vectors for Several Populations (maitra/stat501/lectures/manova-i.pdf)

Comparing Mean Vectors for Several Populations

• Compare mean vectors for g treatments (or populations).

• Randomly assign n_ℓ units to the ℓ-th treatment (or take independent random samples from g populations).

• Measure p characteristics of each unit. Observation vectors for the ℓ-th population,

  Pop ℓ:  x_{ℓ1}, x_{ℓ2}, ..., x_{ℓn_ℓ},   ℓ = 1, ..., g,

  are p × 1 vectors of measurements. We use x̄_ℓ to denote the sample mean vector for the ℓ-th treatment, and S_ℓ to denote the estimated covariance matrix in the ℓ-th group.

• Each unit responds independently of any other unit.

• We will use n to denote the total sample size: n = ∑_ℓ n_ℓ.

364

Comparing Several Mean Vectors

• If all n_ℓ − p are large, the following assumptions are all we need to make inferences about the differences between treatments:

  1. X_{ℓ1}, X_{ℓ2}, ..., X_{ℓn_ℓ} ∼ some p-variate distribution with mean µ_ℓ and covariance Σ_ℓ.

  2. Each unit responds independently of any other unit (units are randomly allocated to the g treatment groups, or sampled independently).

  3. Covariance matrices are homogeneous: Σ_ℓ = Σ for all groups.

• When sample sizes are small, we need an additional assumption:

  4. The distributions are multivariate normal.

365

Pooled estimate of the covariance matrix

• If all population covariance matrices are the same, then all group-level matrices of sums of squares and cross-products estimate the same quantity.

• Then it is reasonable to combine all the group-level covariance matrices into a single estimate by computing a weighted average of the covariance matrices, with weights proportional to the degrees of freedom n_ℓ − 1 in each treatment group.

• The pooled estimate of the common covariance matrix is

  S_pool = ∑_{ℓ=1}^g [ (n_ℓ − 1) / ∑_{j=1}^g (n_j − 1) ] S_ℓ.

366
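The weighted average above is easy to verify numerically. The following is a minimal sketch using NumPy with made-up data (three groups, two measurements per unit; all values are illustrative, not from the lecture):

```python
import numpy as np

# Hypothetical data: g = 3 treatment groups, p = 2 measurements per unit.
groups = [
    np.array([[1., 2.], [2., 4.], [3., 3.]]),             # n_1 = 3
    np.array([[4., 6.], [5., 5.], [6., 8.], [7., 7.]]),   # n_2 = 4
    np.array([[8., 3.], [9., 5.], [10., 4.]]),            # n_3 = 3
]

# S_l: unbiased sample covariance matrix of group l (rows are units).
S = [np.cov(x, rowvar=False) for x in groups]
dof = [len(x) - 1 for x in groups]                        # n_l - 1

# S_pool = sum_l (n_l - 1) S_l / sum_j (n_j - 1)
S_pool = sum(d * s for d, s in zip(dof, S)) / sum(dof)
print(S_pool)
```

By construction, S_pool here equals W / (n − g), the form given later on slide 381.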

Analysis of Variance (ANOVA)

• To develop approaches to compare g multivariate means, it will be convenient to make use of the usual decomposition of the variability in the sample response vectors into two sources:

  1. Variability due to differences in treatment mean vectors (between-group variation).

  2. Variability due to measurement error or differences among units within treatment groups (within-group variation).

• We review some of these concepts in the univariate setting, when p = 1.

367

ANOVA (cont’d)

• If an observation X_{ℓj} ∼ N(µ_ℓ, σ²), we can write down a model

  X_{ℓj} = µ_ℓ + e_{ℓj} = µ + τ_ℓ + e_{ℓj},

  where µ is an overall mean, τ_ℓ is the effect of the ℓ-th treatment, and e_{ℓj} ∼ N(0, σ²).

• A test of the null hypothesis of no differences among treatment means consists of testing

  H₀: µ + τ₁ = µ + τ₂ = ... = µ + τ_g,

  which, under the identifiability restriction below, is equivalent to H₀: τ₁ = τ₂ = ... = τ_g = 0.

• For identifiability reasons, we typically impose a restriction like

  ∑_ℓ τ_ℓ = 0   or   τ_g = 0.

368

ANOVA (cont’d)

• Note that because µ_ℓ = µ + τ_ℓ, it follows that τ_ℓ = µ_ℓ − µ, so that a treatment effect really indicates a deviation of the group-level mean from µ.

• We can decompose an observation in a similar manner:

  x_{ℓj} = x̄ + (x̄_ℓ − x̄) + (x_{ℓj} − x̄_ℓ),

  by adding and subtracting x̄ and x̄_ℓ.

• Note that

  (x_{ℓj} − x̄)        =   (x̄_ℓ − x̄)          +   (x_{ℓj} − x̄_ℓ)
  overall variability  =   between-group var.  +   within-group var.

369

ANOVA (cont’d)

• If we first square both sides of the above expression and sum over all n_ℓ observations in each group and over all groups, we have

  (x_{ℓj} − x̄)² = (x̄_ℓ − x̄)² + (x_{ℓj} − x̄_ℓ)² + 2(x̄_ℓ − x̄)(x_{ℓj} − x̄_ℓ),

  and, since the cross-product term sums to zero within each group,

  ∑_{ℓ=1}^g ∑_{j=1}^{n_ℓ} (x_{ℓj} − x̄)² = ∑_{ℓ=1}^g ∑_{j=1}^{n_ℓ} (x̄_ℓ − x̄)² + ∑_{ℓ=1}^g ∑_{j=1}^{n_ℓ} (x_{ℓj} − x̄_ℓ)²

                                        = ∑_{ℓ=1}^g n_ℓ (x̄_ℓ − x̄)² + ∑_{ℓ=1}^g ∑_{j=1}^{n_ℓ} (x_{ℓj} − x̄_ℓ)²

                                        = SS_Treatments + SS_Error.

370

ANOVA (cont’d)

• The null hypothesis of equal treatment means is rejected at level α if

  F = [ SS_Treatments / (g − 1) ] / [ SS_Error / (n − g) ] > F_{g−1, n−g}(α).

371
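The decomposition and F statistic above can be verified numerically. This is a minimal NumPy sketch with made-up univariate data (three groups; all numbers are illustrative):

```python
import numpy as np

# Hypothetical univariate (p = 1) data for g = 3 treatment groups.
groups = [np.array([4., 5., 6.]),
          np.array([7., 8., 9., 10.]),
          np.array([1., 2., 3.])]

x_all = np.concatenate(groups)
n, g = len(x_all), len(groups)
grand = x_all.mean()                                  # overall mean

# Between- and within-group sums of squares.
ss_treat = sum(len(x) * (x.mean() - grand) ** 2 for x in groups)
ss_error = sum(((x - x.mean()) ** 2).sum() for x in groups)
ss_total = ((x_all - grand) ** 2).sum()

# The decomposition SS_total = SS_treatments + SS_error holds exactly.
F = (ss_treat / (g - 1)) / (ss_error / (n - g))
print(ss_total, ss_treat + ss_error, F)
```

With these numbers, SS_Treatments = 73.5 and SS_Error = 9, so F = 36.75/(9/7) ≈ 28.58, which would be compared to F_{2,7}(α).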

MANOVA: Multivariate Analysis of Variance

• We now extend ANOVA to the case where the observations x_{ℓj} are p-dimensional vectors.

• A one-way linear model similar to the one we wrote for the one-dimensional case is now

  [ x_{ℓj1} ]   [ µ_1 + τ_{ℓ1} ]   [ e_{ℓj1} ]
  [ x_{ℓj2} ] = [ µ_2 + τ_{ℓ2} ] + [ e_{ℓj2} ]
  [    ⋮    ]   [       ⋮      ]   [    ⋮    ]
  [ x_{ℓjp} ]   [ µ_p + τ_{ℓp} ]   [ e_{ℓjp} ]

• In vector form, the observation for the j-th unit in the ℓ-th treatment group is written as

  x_{ℓj} = µ + τ_ℓ + e_{ℓj},

  where all terms are p-dimensional vectors and e_{ℓj} ∼ N_p(0, Σ_ℓ).

372

MANOVA: Multivariate Analysis of Variance

• A data matrix X for all units in all groups has dimension n × p, where n = ∑_ℓ n_ℓ. Each row of X is a unit and each column represents a measurement:

  X_{n×p} =
  [ x_{111}      x_{112}      ···   x_{11p}     ]
  [    ⋮            ⋮                  ⋮        ]
  [ x_{1n_1 1}   x_{1n_1 2}   ···   x_{1n_1 p}  ]
  [ x_{211}      x_{212}      ···   x_{21p}     ]
  [    ⋮            ⋮                  ⋮        ]
  [ x_{2n_2 1}   x_{2n_2 2}   ···   x_{2n_2 p}  ]
  [    ⋮            ⋮                  ⋮        ]
  [ x_{g n_g 1}  x_{g n_g 2}  ···   x_{g n_g p} ]

373

MANOVA: Multivariate Analysis of Variance

• We can write the multivariate linear model as

  X_{n×p} = A_{n×(g+1)} β_{(g+1)×p} + ε_{n×p},

  where, in more detail, A has an intercept column plus one indicator column per group (rows for group ℓ carry a 1 in the intercept column and in column ℓ + 1):

  A =
  [ 1 1 0 ··· 0 ]   (n_1 rows)
  [ 1 0 1 ··· 0 ]   (n_2 rows)
  [ ⋮ ⋮ ⋮     ⋮ ]
  [ 1 0 0 ··· 1 ]   (n_g rows)

  β =
  [ µ_1     µ_2     ···   µ_p    ]
  [ τ_{11}  τ_{12}  ···   τ_{1p} ]
  [ τ_{21}  τ_{22}  ···   τ_{2p} ]
  [   ⋮       ⋮             ⋮    ]
  [ τ_{g1}  τ_{g2}  ···   τ_{gp} ]

  and ε stacks the error rows ε′_{11}, ε′_{12}, ..., ε′_{1n_1}, ε′_{21}, ..., ε′_{2n_2}, ..., ε′_{g n_g}.

374

MANOVA (cont’d)

• Each column of the matrix β corresponds to a variable (or measured trait).

• Each row of the error matrix ε is the transpose of a p × 1 vector.

• As written, the n × (g + 1) design matrix A has linearly dependent columns. To deal with this, SAS imposes the restriction

  τ_{g1} = τ_{g2} = ··· = τ_{gp} = 0,

  so that the last row of β and the last column of A are eliminated. Under this restriction,

  E(x_{gj}) = µ,   and   τ_ℓ = µ_ℓ − µ_g = E(x_{ℓj}) − E(x_{gj}).

375

MANOVA (cont’d)

• With this restriction, A becomes an n × g matrix of full column rank, and the MLE of the g × p matrix β is

  β̂_{g×p} = (A′_{g×n} A_{n×g})⁻¹ A′_{g×n} X_{n×p}.

• When we set τ_g = 0, β̂ (as estimated by SAS) is

  β̂ = [ µ̂′ ; τ̂′_1 ; ... ; τ̂′_{g−1} ] = [ x̄′_g ; (x̄_1 − x̄_g)′ ; ... ; (x̄_{g−1} − x̄_g)′ ].

376
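The closed form β̂ = (A′A)⁻¹A′X and its group-mean interpretation can be checked directly. A minimal NumPy sketch with made-up data; the SAS-style restriction τ_g = 0 is encoded by building A from an intercept plus dummies for the first g − 1 groups:

```python
import numpy as np

# Hypothetical data: g = 3 groups, p = 2 responses per unit.
groups = [
    np.array([[1., 2.], [2., 4.], [3., 3.]]),
    np.array([[4., 6.], [5., 5.], [6., 8.], [7., 7.]]),
    np.array([[8., 3.], [9., 5.], [10., 4.]]),
]
X = np.vstack(groups)
n, g = len(X), len(groups)

# Full-rank design: intercept column + dummies for groups 1, ..., g-1
# (the last group's dummy is dropped, i.e. tau_g = 0).
A = np.zeros((n, g))
A[:, 0] = 1.0
start = 0
for l, x in enumerate(groups):
    if l < g - 1:
        A[start:start + len(x), l + 1] = 1.0
    start += len(x)

beta_hat = np.linalg.solve(A.T @ A, A.T @ X)   # (A'A)^{-1} A'X

xbar = [x.mean(axis=0) for x in groups]        # group mean vectors
print(beta_hat)
```

The first row of beta_hat is x̄_g, and row ℓ + 1 is x̄_ℓ − x̄_g, matching the slide.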

MANOVA (cont’d)

• For the k-th measurement (the k-th column of β, k = 1, ..., p) we have

  β̂_k ∼ N_g(β_k, σ_kk (A′A)⁻¹),

  and

  cov(β̂_k, β̂_i) = σ_ki (A′A)⁻¹.

• Estimates of the σ_kk and σ_ki are obtained from the decomposition of the total sums of squares and cross-products into the matrix of treatment SS and CP and the matrix of error SS and CP.

377

Sums of squares and cross-products matrices

• As in the univariate case, we can write a p-dimensional observation vector as a sum of deviations:

  (x_{ℓj} − x̄) = (x̄_ℓ − x̄) + (x_{ℓj} − x̄_ℓ).

• Note that

  (x_{ℓj} − x̄)(x_{ℓj} − x̄)′ = [(x̄_ℓ − x̄) + (x_{ℓj} − x̄_ℓ)][(x̄_ℓ − x̄) + (x_{ℓj} − x̄_ℓ)]′

  = (x̄_ℓ − x̄)(x̄_ℓ − x̄)′ + (x̄_ℓ − x̄)(x_{ℓj} − x̄_ℓ)′ + (x_{ℓj} − x̄_ℓ)(x̄_ℓ − x̄)′ + (x_{ℓj} − x̄_ℓ)(x_{ℓj} − x̄_ℓ)′.

378

Sums of squares and cross-products matrices (cont’d)

• Within any treatment group, ∑_{j=1}^{n_ℓ} (x_{ℓj} − x̄_ℓ) = 0.

• Then ∑_{ℓ=1}^g ∑_{j=1}^{n_ℓ} (x̄_ℓ − x̄)(x_{ℓj} − x̄_ℓ)′ = 0 and ∑_{ℓ=1}^g ∑_{j=1}^{n_ℓ} (x_{ℓj} − x̄_ℓ)(x̄_ℓ − x̄)′ = 0.

• It follows that

  ∑_{ℓ=1}^g ∑_{j=1}^{n_ℓ} (x_{ℓj} − x̄)(x_{ℓj} − x̄)′ = ∑_{ℓ=1}^g n_ℓ (x̄_ℓ − x̄)(x̄_ℓ − x̄)′ + ∑_{ℓ=1}^g ∑_{j=1}^{n_ℓ} (x_{ℓj} − x̄_ℓ)(x_{ℓj} − x̄_ℓ)′.

379
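The matrix decomposition above can be confirmed numerically: the corrected total SSP equals the sum of the between- and within-group SSP matrices. A minimal NumPy sketch with made-up data:

```python
import numpy as np

# Hypothetical data: g = 3 groups, p = 2 measurements.
groups = [
    np.array([[1., 2.], [2., 4.], [3., 3.]]),
    np.array([[4., 6.], [5., 5.], [6., 8.], [7., 7.]]),
    np.array([[8., 3.], [9., 5.], [10., 4.]]),
]
X = np.vstack(groups)
xbar = X.mean(axis=0)                                   # grand mean vector

# B: treatment SSP;  W: within-group (error) SSP;  T: corrected total SSP.
B = sum(len(x) * np.outer(x.mean(axis=0) - xbar, x.mean(axis=0) - xbar)
        for x in groups)
W = sum(np.outer(r - x.mean(axis=0), r - x.mean(axis=0))
        for x in groups for r in x)
T = sum(np.outer(r - xbar, r - xbar) for r in X)

print(np.allclose(B + W, T))                            # the identity above
```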

Sums of squares and cross-products matrices (cont’d)

• The matrix to the left of the = sign is called the corrected total sums of squares and cross-products matrix.

• The matrices on the right side are called, respectively, the treatment sums of squares and cross-products matrix, denoted by B, and the error sums of squares and cross-products matrix, denoted by W (for ‘within groups’).

• Notice that we can re-write the W matrix as

  W = ∑_{ℓ=1}^g ∑_{j=1}^{n_ℓ} (x_{ℓj} − x̄_ℓ)(x_{ℓj} − x̄_ℓ)′ = (n_1 − 1)S_1 + (n_2 − 1)S_2 + ··· + (n_g − 1)S_g.

380

Sums of squares and cross-products matrices (cont’d)

• If the g population covariance matrices are homogeneous, then S_1, S_2, ..., S_g estimate the same quantity. Then

  W = (n_1 − 1)S_1 + (n_2 − 1)S_2 + ··· + (n_g − 1)S_g = [ ∑_ℓ (n_ℓ − 1) ] S_pool,

  and an estimate of the pooled covariance matrix is given by

  S_pool = W / ∑_ℓ (n_ℓ − 1) = W / (n − g).

• The diagonal elements of W/(n − g) estimate the p variances, and the off-diagonal elements are estimates of the covariances.

381

Sums of squares and cross-products matrices (cont’d)

• Using the linear model set-up, we can extend some of the results from linear model theory and note that

  B = X′_c A(A′A)⁻¹A′ X_c = X′_c P_A X_c,   W = X′[I − P_A]X,

  where P_A = A(A′A)⁻¹A′ is the usual idempotent projection matrix and X_c = X − 1x̄′ is the column-centered data matrix. (Without centering, X′P_A X gives the uncorrected treatment SSP, ∑_ℓ n_ℓ x̄_ℓ x̄′_ℓ; centering is not needed for W.)

382
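The projection-matrix expressions can be checked against the direct definitions of B and W. A minimal NumPy sketch with made-up data, using the intercept-plus-dummies design from the earlier slides:

```python
import numpy as np

# Hypothetical data: g = 3 groups, p = 2 measurements.
groups = [
    np.array([[1., 2.], [2., 4.], [3., 3.]]),
    np.array([[4., 6.], [5., 5.], [6., 8.], [7., 7.]]),
    np.array([[8., 3.], [9., 5.], [10., 4.]]),
]
X = np.vstack(groups)
n, g = len(X), len(groups)

# Full-rank design: intercept plus dummies for groups 1, ..., g-1.
A = np.zeros((n, g))
A[:, 0] = 1.0
start = 0
for l, x in enumerate(groups):
    if l < g - 1:
        A[start:start + len(x), l + 1] = 1.0
    start += len(x)

P = A @ np.linalg.solve(A.T @ A, A.T)       # P_A = A (A'A)^{-1} A'
Xc = X - X.mean(axis=0)                     # column-centered responses

W = X.T @ (np.eye(n) - P) @ X               # error SSP (centering not needed)
B = Xc.T @ P @ Xc                           # corrected treatment SSP
```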

Hypothesis Testing in MANOVA

• We often wish to test H₀: τ_1 = τ_2 = ··· = τ_g versus H₁: at least two of the τ_ℓ are not equal.

• Compare the relative sizes of B and W.

  Source of variation   Matrix of sums of squares and cross-products (SSP)        d.f.
  Treatment             B = ∑_ℓ n_ℓ (x̄_ℓ − x̄)(x̄_ℓ − x̄)′                       g − 1
  Residual              W = ∑_ℓ ∑_j (x_{ℓj} − x̄_ℓ)(x_{ℓj} − x̄_ℓ)′             n − g
  Total (corrected)     B + W = ∑_ℓ ∑_j (x_{ℓj} − x̄)(x_{ℓj} − x̄)′             n − 1

383

Hypothesis Testing in MANOVA (cont’d)

• One test of the null hypothesis is carried out using a statistic called Wilks’ Λ (a likelihood ratio test):

  Λ = |W| / |B + W|.

• If B is “small” relative to W, then Λ will be close to 1. Otherwise, Λ will be small.

• We reject the null hypothesis when Λ is small.

• SAS uses different notation: it calls the B matrix H and the W matrix E, for ‘hypothesis’ and ‘error’, respectively.

384

Hypothesis Testing in MANOVA (cont’d)

• The exact sampling distribution of Wilks’ Λ can be derived only for special cases (see next page).

• In general, for large n and under H₀, Bartlett showed that, approximately,

  −( n − 1 − (p + g)/2 ) ln Λ ∼ χ²_{p(g−1)}.

  Thus, we reject H₀ at level α when

  −( n − 1 − (p + g)/2 ) ln Λ ≥ χ²_{p(g−1)}(α).

385
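Bartlett's approximate test is a few lines of linear algebra. A minimal NumPy sketch with made-up data (n = 10, g = 3, p = 2); the resulting statistic would be compared to a χ² critical value with p(g − 1) degrees of freedom:

```python
import numpy as np

# Hypothetical data: g = 3 groups, p = 2 measurements, n = 10 units.
groups = [
    np.array([[1., 2.], [2., 4.], [3., 3.]]),
    np.array([[4., 6.], [5., 5.], [6., 8.], [7., 7.]]),
    np.array([[8., 3.], [9., 5.], [10., 4.]]),
]
X = np.vstack(groups)
n, g, p = len(X), len(groups), X.shape[1]
xbar = X.mean(axis=0)

B = sum(len(x) * np.outer(x.mean(axis=0) - xbar, x.mean(axis=0) - xbar)
        for x in groups)
W = sum(np.outer(r - x.mean(axis=0), r - x.mean(axis=0))
        for x in groups for r in x)

Lambda = np.linalg.det(W) / np.linalg.det(B + W)      # Wilks' Lambda

# Bartlett: -(n - 1 - (p + g)/2) ln(Lambda) ~ chi-square, df = p(g - 1),
# approximately, under H0.
stat = -(n - 1 - (p + g) / 2) * np.log(Lambda)
df = p * (g - 1)
print(stat, df)   # reject H0 when stat >= the chi2_{df}(alpha) critical value
```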

Exact distribution of Wilks’ Λ

  No. of variables   No. of groups   Sampling distribution for multivariate normal data
  p = 1              g ≥ 2           [ (n−g)/(g−1) ] (1−Λ)/Λ ∼ F_{g−1, n−g}
  p = 2              g ≥ 2           [ (n−g−1)/(g−1) ] (1−√Λ)/√Λ ∼ F_{2(g−1), 2(n−g−1)}
  p ≥ 1              g = 2           [ (n−p−1)/p ] (1−Λ)/Λ ∼ F_{p, n−p−1}
  p ≥ 1              g = 3           [ (n−p−2)/p ] (1−√Λ)/√Λ ∼ F_{2p, 2(n−p−2)}

386
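The first row of the table can be sanity-checked: for p = 1, Λ reduces to SS_Error/(SS_Treatments + SS_Error), and [(n−g)/(g−1)](1−Λ)/Λ is exactly the usual ANOVA F statistic. A minimal NumPy sketch with made-up univariate data:

```python
import numpy as np

# Hypothetical p = 1 data for g = 3 treatment groups.
groups = [np.array([4., 5., 6.]),
          np.array([7., 8., 9., 10.]),
          np.array([1., 2., 3.])]
x_all = np.concatenate(groups)
n, g = len(x_all), len(groups)
grand = x_all.mean()

ss_treat = sum(len(x) * (x.mean() - grand) ** 2 for x in groups)  # B (scalar)
ss_error = sum(((x - x.mean()) ** 2).sum() for x in groups)       # W (scalar)

# For p = 1: Lambda = |W| / |B + W| = W / (B + W).
Lambda = ss_error / (ss_treat + ss_error)

F_from_wilks = (n - g) / (g - 1) * (1 - Lambda) / Lambda
F_classic = (ss_treat / (g - 1)) / (ss_error / (n - g))
print(F_from_wilks, F_classic)   # identical, as the table's first row implies
```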

Other Tests

• Most packages (including SAS) will compute Wilks’ Λ and some other statistics.

• Note that

  Λ = |W| / |B + W| = |W| |B + W|⁻¹ = |BW⁻¹ + I|⁻¹.

• Lawley-Hotelling trace: reject the null hypothesis of no treatment differences at level α if

  nT₀² = tr(BW⁻¹) ≥ χ²_{g×p}(α).

• Pillai trace: V = tr[ B(B + W)⁻¹ ].

387
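The determinant identity and the two trace statistics are easy to compute from the same B and W matrices. A minimal NumPy sketch with made-up data:

```python
import numpy as np

# Hypothetical data: g = 3 groups, p = 2 measurements.
groups = [
    np.array([[1., 2.], [2., 4.], [3., 3.]]),
    np.array([[4., 6.], [5., 5.], [6., 8.], [7., 7.]]),
    np.array([[8., 3.], [9., 5.], [10., 4.]]),
]
X = np.vstack(groups)
xbar = X.mean(axis=0)
p = X.shape[1]

B = sum(len(x) * np.outer(x.mean(axis=0) - xbar, x.mean(axis=0) - xbar)
        for x in groups)
W = sum(np.outer(r - x.mean(axis=0), r - x.mean(axis=0))
        for x in groups for r in x)

# Wilks' Lambda, computed two equivalent ways (the identity on this slide).
Lambda = np.linalg.det(W) / np.linalg.det(B + W)
Lambda_alt = 1.0 / np.linalg.det(B @ np.linalg.inv(W) + np.eye(p))

lawley_hotelling = np.trace(B @ np.linalg.inv(W))     # tr(B W^{-1})
pillai = np.trace(B @ np.linalg.inv(B + W))           # tr(B (B+W)^{-1})
print(Lambda, Lambda_alt, lawley_hotelling, pillai)
```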

Other Tests (cont’d)

• Roy’s maximum root: the test statistic is the largest eigenvalue of BW⁻¹. (The F-distribution used by SAS is not accurate.)

• The powers of the Wilks’, Lawley-Hotelling, and Pillai statistics are similar. Roy’s statistic has higher power only when one of the g treatments is very different from the rest.

• Limited simulation results suggest that Pillai’s trace may be slightly more robust to departures from multivariate normality.

388
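Roy's statistic can be sketched in the same way as the other criteria. A minimal NumPy example with made-up data; since B is positive semi-definite and W positive definite, the eigenvalues of BW⁻¹ are real and non-negative, so taking the real part is safe:

```python
import numpy as np

# Hypothetical data: g = 3 groups, p = 2 measurements.
groups = [
    np.array([[1., 2.], [2., 4.], [3., 3.]]),
    np.array([[4., 6.], [5., 5.], [6., 8.], [7., 7.]]),
    np.array([[8., 3.], [9., 5.], [10., 4.]]),
]
X = np.vstack(groups)
xbar = X.mean(axis=0)

B = sum(len(x) * np.outer(x.mean(axis=0) - xbar, x.mean(axis=0) - xbar)
        for x in groups)
W = sum(np.outer(r - x.mean(axis=0), r - x.mean(axis=0))
        for x in groups for r in x)

# Roy's maximum root: the largest eigenvalue of B W^{-1}.
eigvals = np.linalg.eigvals(B @ np.linalg.inv(W)).real
roy = eigvals.max()
print(roy)
```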