Download - I Ieboos/library/mimeo.archive/... · 2019-07-25 · I I. I I I I I I I Ie I I I I I I I I· I RANK ANALYSIS OF COVARIANCE by Dana Quade University of North Carolina Institute of

I

I.IIIIIIIIeIIIIIIII·

I

RANK ANALYSIS OF COVARIANCE

by

Dana Quade

University of North Carolina

Institute of Statistics Mimeo Series No. 483

July 1966

This research was supported by the National Institutesof Health Grants No. GM-10397 and GM-12868-0l.

DEPARTMENT OF BIOSTATISTICS

UNIVERS ITY OF NORTH CAROLINA

Chapel Hill, N. C.

I

I.IIIIIIIIeIIIIIIIIe

I

RANK ANALYSIS OF COVARIANCE1

Dana Quade

University of North Carolina

Various methods are discussed for the problem of comparing two

or more populations with respect to a response variable Y in

the presence of a (possibly multivariate) concomitant variable

x - a situation in which the standard method is the classical

one-way analysis of covariance. A method based on ranks is

developed.

1. INTRODUCTION

Suppose that from each of m populations we have a random sample of

observations (Y. , Xi ), where Y. is the univariate response of the ith ob-J.a a J.a -

servation in the ath sample (1 < i < n , 1 < a < m), and X. is the corre-- - a - - J.a

sponding value of a concomitant variable, possibly multivariate, whose mar-

gina1 probability distribution is the same in each population. The problem

is then to test the hypothesis H that the conditional distribution of Yo

given X is also the same for each population, where the alternatives of in-

terest are those which imply that some populations tend to have greater va1-

ues of Y then others for all fixed values of X.

For such situations let us invoke a very general principle which

may be described as follows. If the hypothesis is true then the populations

are all identical and the samples can be pooled. Use the pooled sample to

determine a relationship by which Y can be predicted from X. Then compare

1Supported by National Institutes of Health Grants No. GM-10397 andGM-12868-01. .

-2-

each observed response Yia with the value which would be predicted for it

from the corresponding Xia , and assign it a score Zia' positive if Yia is

greater than predicted, and negative if smaller. Finally, use the scores

to perform an ordinary one-way analysis of variance for comparing the popu-

lations.

As a particular instance of this general method I propose a pro-

cedure called rank analysis of covariance. Assume,in order to make ranking

possible, that each variate has been measured on at least an ordinal scale;

continuity is not required, however, and even a dichotomy is permitted as an

extreme case. So let the rank of Y. among all the N = En observed values~a a

of Y be Ria + (N+l)/2, where the term (N+l)/2 has been inserted for con-

venience so that EER. = 0, thus correcting the ranks for their mean; use~a

"average ranks" in case of ties, and (for definiteness) rank from the smallest

. . «1) (2) (p))first. Similarly, if X is actually a p-var~ate var~able X ,X , ••• ,X ,

let C(k) + (N+l)/2 be the rank of x~k) among the N observed values ofia ~a

x(k). Then characterize the relationship between Y and X by performing an

ordinary multiple linear regression of Ron c(l) ,C(2) , ..• ,c(p); calculate

fitted values R. , and assign as scores the residuals from this regression~a

of ranks: i.e., let

Z. = R. - R.~a ~a ~a

Finally, to test the hypothesis of identical conditional distributions of

Y on X use as criterion the variance ratio

(N-m) E(EZi )2/n. a aVR = --=a'-=~'_ _

(m-l) [EEZ 2 - E(EZi

)2/n ]ai ia a i a a

I

.IIIIIII

_IIIIIIII

••I

the Nx1 matrix of fitted values is

-3-

Write R for the Nx1 matrix of corrected ranks of responses, and C for

q

Table 1

rrz =Z=O.)ai ia

As a numerical example consider the (artificial) data of Table 1.

A

Sample y Xl X2 R C1 C2 R Z

16 26 12 -7 -1 -3 -1. 72 -5.28

60 10 21 -3 -5 0 -2.90 - .10

1 82 42 24 -1 3 1 2.12 -3.12

126 49 29 3 5 3 4.04 -1.04

137 55 34 4 6 4 5.00 -1.00

44 21 17 -4 -2 -1 -1.54 -2.46

67 28 2 -2 0 -6 -2.28 .28

2 87 5 40 0 -6 7 - .82 .82

100 12 38 1 -4 6 - .04 1.04

142 58 36 5 7 5 5.96 - .96

17 1 8 -6 -7 -5 -5.96 - .04

28 19 1 -5 -3 -7 -4.40 - .60

3 105 41 9 2 2 -4 - .36 2.36

149 48 28 6 4 2 3.08 2.92

160 35 16 7 1 -2 - .18 7.18

Sum 1320 450 315 0 0 0 0 0

Sum of s uares 149282 18296 8977 280 280 280 165.48 114.52

comparing it with the critical value of an F with (m-1, N-m) degrees of

since

freedom. (Note that no explicit correction for the mean is required in VR

the Nxp matrix (here p=2) of corrected ranks of concomitant variates; then

I


I

2. THEORY

-4-

and Table 1 shows Rand Z = R - R. The totals of the scores for the m=3

Let us continue with the case where the concomitant variable X is

I

-'IIIIIII

elIIIIIII-,I

-1where! = (C'C) C'R.

(C'C)-lC'R = (.58) ,.38

EZ. 3 = 11. 82 ,i J.

, .

(189)C'R = ,147

u > 0

u = 0

u < 0

EZ' 2 = -1.28 ,. J.J.

{~

= -~

70 )

280 '

Let

EZil = -10.54 ,i

C' C = (28070

where for this, example

samples are

para~etric analysis of covariance are not satisfied, and the standard test

degrees of freedom. It may be noted that application of the usual pre1imi-

teriori level of significance P ~ .03 using the F distribution with (2,12)

and EEZ~ =114.52, so that finally VR = 4.73, corresponding to an aposJ.a

(N-m) E(EZ. (A») 2In. J.a - aVR(~) = -.:::.;a,~J.~ _

(m-1) [ EE. Zi2a(~) - E(EZ. (A~2 In ]aJ. a i J.a - a

nary procedures produces no evidence that the assumptions required for the

yields F(2,10) = 3.38, P > .075.

p-dimensiona1. For any vector A = (A1 ,A2, •.• ,Ap)' let T(~) be the test

which rejects H for large values ofo

Z. (A) = R. - EAkc~k) •J.a - J.a k J.a

Rank analysis of covariance is then the special case T(!)

where

But for the present let us suppose that ~ is fixed (and hence in particular is

E£!~); and consider the mean and variance of a score Z. (A).J.a -

Now

for 1 < a, b < m.

1 < a < m.

8 = 0,aa

E[R. ] = N81a a

8 - };p,8 where Pb = n./N,·a - b" 0 ab ' 0

E[C(k)] °ia -

E[Zi (A)] = N8 •a - a

Now let (N2-l)};/l2 be the pxp variance matrix of the ranks of

X(k) and x(k) are random observations from identical populations foria jb

and b. Hence, for all ~ ,

-5-

so that

where Ya and Yb are independent random observations from the marginal

distributions of Y in the ath andbth populations, respectively; write

then

and note that

Define

but

Similarly, write

c(k) = };};~(X~k)_ x(k»ia bj 1a jb

since

entries of }; will be equal to land}; will in fact be the correlation matrix;

any a

the concomitant variates; if these variates are continuous then the diagonal

at any rate the diagonal entries will be no greater than 1. Let also

I

I.~

~

1J]

r

]

]e]

1J

I]

j

J,.I

of the ranks of the response with the concomitant variates. Then clearly

I

.1IIIIIII

_IIIIIIII

·1I

give the covariances

Zi (A) = H<P(Y. ,X. ),(Yjb,x.b); ~)a - bj J.a J.a J

If we define

L<P«Yi ,X. ) '(Yj 'Xj ) ;A) = 0j a J.a a a-

-6-

~(~) =-l L P [02 - 2A'n + A'LAJ12 a a a - -a - -

(N2-1)cr2 /l2 be the variance of the rank of a response from the ath populaa

tion, for 1 ~ a ~ m, and let the vector

then under the null hypothesis Ho

Write

for all a.

then we can express the typical score as

note that because

we have

also. We now obtain immediately

and

-7-

assumption B2 is satisfied since

for 1 < a, b < m •8 = cab + o(~Nl)ab m liN

Theorem 1. Let N -+ 00 with n = Np for fixed Pa > 0 1 < a < m· thena a , ,

s~(~)-+ r;Q) Lp 82 in probability,

(N-l) 2 a aa

VR(~) has asymptotically the F-distribution with (m-l, N-m) degrees of

where

that H is true and that r; > 0; then for every fixed A the random variableo

Now consider the asymptotic relative efficiency (ARE) of the test

(N-m)HVR(Q) = -...,}_.:.:.....:=-~-

(m-l) (N-l-H)

Theorem 2. Let N -+ 00 with n = Np for fixed p > 0, 1 < a _< m, and assumea a a

freedom; and the test T(~) is consistent against any alternative for which

LP 82 > o.a aa

These results follow from Theorems 5,6, and 7 of Quade [7J, noting that his

tive HN is true, where under HN

T(~) in the Pitman sense, under the assumption that for each N the alterna-

are related through the equation

The standard test for comparison will be T(Q), which is easily seen to be

a modified version of the familiar Kruskal-Wallis test (KW). (The two tests

]

1_J]

]

1

]

J1Je]

-1

1J]

]

.I

1-

I

-8- I

.IIIIIII

_IIIIIIIIeII

-1L exists, then one

L\(A) 0 2= --=-- =------

~(Q) 02-2~'~+~'L~

In such cases the ARE is the ratio of the

In particular, if

(or KW)

a < m.

~(A) = Lp 02/~(A)- a a -a

ARE of T(~) w.r.t. T(Q)

where

H = 12 E(ER. )2/nN(N+1) a i ~a a

is the Kruska1-Wallis statistic, usually treated as a X2 with (m-1) degrees

entirely equivalent.) Supposing also that

of freedom; asymptotically, in particular with respect to ARE, the tests are

for all ia under the sequence of alternatives {HN

}, the asymptotic distribu

tion of VR(~) will then be noncentra1 F with (m-1, N-m) degrees of freedom

andnoncentra1ity parameter

where

noncentra1ity parameters: i.e.

chosen to minimize (~'L~ - 2~'~).

where 0 2 and n are the values of 0 2 and n common to all populations under H •- a . -a 0

Clearly the most efficient test of this type is one in which ~ has been

say, where Rs might be called the Spearman multiple correlation between the

The ARE of T(~) with respect to T(Q) or the Kruska1-Wa11is test is then

should take

response and the concomitant variates.

-9-

When the concomitant variable is univariate this reduces to

ARE(KW/AV) = 3/rr

for all a

3(l-p~)

rr(l-p~)

= ARE(T(I)/KW) x ARE(KW/AV) x ARE(AV/AC)

3(l-~)

rr(l-R2 )S

=

8* (l)=ex+vi+o/N'

ex = exa

exa

ARE(T(!)/AC) =

the classical parametric analysis of covariance procedure in the situation

ARE(AV/AC) = 1 - ~

The ARE of the test T(~) may also be considered with respect to

of Y given X are normal with constant variance and with

where the latter is appropriate, namely, when the conditional distributions

E[Y. Ix. ] = ex + EBkXi(k)1a 1a a k a

but under HN

parametric analysis of covariance is

where under Ho

Then the ARE of the parametric analysis of variance with respect to the

covariance is

where ~ is the Pearson product-moment multiple correlation between the re

sponse and the concomitant variates; and the ARE of T(Q) or the Kruskal-

hence the ARE of the test T(I) with respect to the parametric analysis of

Wallis test with respect ot the parametric analysis of variance is

where Pp and Ps are the Pearson and Spearman correlations between response

I

I.IIIIIIIIeIIIIIIII·I

-10-

and concomitant variable, related through the equation

Pp = 2 sin(1TPS /6) •

This ARE has maximum value 3/1T = .955 at Pp = Ps = 0 and decreases to

/:3/2 = .866 as Pp and Ps approach 1 or -1.

In general, of course, the vector E is unknown; it would a~pear

that the most reasonable thing to do then would be to estimate it. The ob

vious estimates of ~ and ~ are

~ = 12 e'e n = 12 e'RN(N2-1) N(N2-1)

where e and R are as defined at the end of Section 1. Hence, if e'e is non

singular, takeA -1L = (e'e) e'R = t

i.e., use the rank analysis of covariance test as there proposed. (If e'c

is singular take any vector t such as to minimize the expression

t'e'et - 2~'e'R.)

It may be noted that the elements of ~ and n are U-statistics, and

the elements of t are continuous functions of them. Thus

E[<j>«Yia,Xia)'(Yjb,Xjb);:E) - <j>«Yia,Xia)'(Yjb,Xjb);~)J2 = ol~),

which is the Assumption D required for Theorem 8 of [7J; hence it follows

that the statistic VR(~) in which the elements of L have been estimated from

the data also has asymptotically the F-distribution with (m-1, N~m)degrees

of freedom under the hypothesis. However, the test is valid and consistent

for any choice of the vector ~ , although its efficiency is reduced as ~

departs from T.

I

.IIIIIII

elIIIIIII

••I

3. RELATED WORK

concomitant variable X. Let

and

a = 1,2;W = I:Z.a . 1a1

Zia = R. - tSC,1a 1a

Previously published work related to the problem of nonparametric

HR. C.1a 1a

-11-

At the Fourth Berkeley Symposium, David and Fix [5J proposed several

lZ(N-2)WfVR = -------~---

nlnz(N2-l)(1-r~) - lZWf

down the formulas for rank analysis of covariance in that special case.

only two populations to be compared and the concomitant variable is univariate:

analysis of covariance seems to be limited to the situation where there are

i.e., m=2, p=l. For purposes of comparison it will be convenient to set

Corresponding to the bivariate observation (Yi ,X. ) we then assign the scorea 1a

for i=1,2, ••• ,n , a=1,2, wherea

is the observed Spearman regression coefficient for the response Y on the

then WI = -W2 and the variance ratio is

N(N-2)WfVR = -----..;;:....-

nln2HZ~a - NWf

If there are no ties in the sample we have

ing the homogeneity of two bivariate random variables (Yl,Xl ) and (YZ'XZ)'

methods based on ranks, including three essentially different ones, for test-

I


-12- I

.IIIIIII

_IIIIIIII

••I

a = 1,2.E[Y Ix ] = a + eXa a a a

T =1

For testing

For testing the hypothesis

T1

/n1

(N-1)

In (l-rZ)2 S

They did not require that the marginal distributions of Xl and X2 be the

same, but they did assume that the two regression functions are parallel

lines: i.e. that

they propose the criterion

For small samples they suggest basing the test on the permutation distribu-

tion of T1 given the N pairs of ranks (Ri ,C. ), and they carry out thea J.a

necessary computations for a number of examples. For larger samples they

conclude that under Ho1 the statistic

may be treated as a normal deviate. ,This proposal is essentially rank ana1y-

approximation to the null-hypothesis distribution of the test criterion.

sis of covariance - limited to the case m=2, p=l - with a somewhat different

against general alternatives, David and Fix propose the criterion

I


-13-

and they conclude that

nl(N-l)TZR = -=----

nZ(l-r~)

is distributed approximately as X2 with Z degrees of freedom, at least for

nl and nZ both at least equal to 10. The statistic R has been proposed again,

apparently independently, by Chatterjee and Sen [4] for the general b<ivari

ate two-sample location problem. However, the very generality of the al

ternatives contemplated by this procedure suggests that it will have rela

tively low power under our ass~ption of identical marginals for Xl and XZ'

and we do not discuss it further.

The third proposal of David and Fix, made "on conunon-sense grounds",

is to use a test criterion (their ~ or 8) equivalent to

T3

= LRil - LCil ;

under HoZ T3 may be treated as approximately normally distributed with mean

o and variance nlnz(N+l) (1-rS)/6. Note that the test based on T3

is the

same as the test which would be called T(l) in the notation of Section Z.

It turns out, as will be seen, that this last test is also closely

related to a procedure which was proposed by Bross [3] for the restricted

situation where the response is a simple dichotomy, (although its extension

to more general responses is inunediate). Bross' test, which he calls caVAST

("Covariable Adjusted Sign Test"), has an interesting intuitive justification.

Assume that the response and the concomitant variable are positively associated

within anyone population. Now let (YI,XI ) and «YZ'XZ) be random observa

tions from populations I and Z respectively. Then this pair of observations

may be called concordant if (YI-YZ) (XI-XZ) > 0, Le., if YI > YZ' Xl > X2

or if YI < YZ' Xl < XZ• The occurrence of such pairs might be explained on

the basis of the general positive association between Y and X, without

-14-

supposing any difference between the two popu1ations~ But if Y1 > Y2

, Xl < X2

then the pair is discordant (Bross calls it an inversion) and provides a

clear-cut bit of evidence indicating that responses in population 1 tend to

be greater than in population 2. Similarly, if Yl < Y2, Xl > X2 the pair

indicates that population 2 tends to have greater responses. Consider all

nl n2 intertreatment pairs, let II and 12 be the number of them which'are in

versions indicating greater responses in population 1 and 2 respectively.

Then an intuitively reasonable test is to reject the hypothesis of homogeneity

I

.1IIII

that situation.

an estimate of its variance conditioned upon the numbers of concordant and

variable are independent, and his estimate of variance is applicable only in

III

_IIIIIII

12(1 -I )21 2COVAST =

if II and 12 are too disparate.

For small samples Bross suggests using the permutation distribution

the populations are identically the same but also that the response and co-

of the statistic (11-12), He then attempts to show that, under a "grand null

hypothesis", the statistic

large samples. Unfortunately, this proof contains an error (pointed out to

discordant pairs. The effect of this error is such that COVAST may simply

will have asymptotically the X2-distribution with 1 degree of freedom for

me by N. L. Johnson) in that the unknown variance of (11-12

) is replaced by

be divided by 4 in order to make the stated asymptotic null hypothesis distri-

bution correct; however, Bross' "grand null hypothesis" implies not only that

If, however, we define1 if Yia > Yjb , X. < X

jb1a

</>* (Yia,Xia) ,(yjb,Xjb») = -1 if Y. < Yjb , Xia > Xjb1a

0 otherwise

I·1

I

-15-

and assign scores

tributiona1 problem.

then it is not difficult to verify that

On the other hand, considering more carefully the test T(l), note

= L:L:~*(Y. ,X. ) , (Yjb,Xjb))bj l.a l.a

for i=1,2, ..• ,n , a=1,2,a

variance on the scores Z*, the proof of asymptotic validity being exactly the

The hypothesis of homogeneity may now be tested by performing an analysis of

same as for the scores Z(~) considered previously. This solves Bross' dis-

that= L:L:~{(Y. ,Xi ),(y.b ,X. b);l))z. (1)l.a bj l.a a J J

where

1 if Y. > Yjb " X. < Xjbl.a \ l.a

if Y. > Yjb , X. = Xjb~

l.a l.a

or Y. = Yjb , X. < Xjbl.a l.a

if Y. > Yjb , X. > Xjbl.a l.a

~ (Yia,Xia) , (Yjb ,Xjb) ;1) 0 or Y. = Yjb , X. = Xjbl.a l.a

or Y. < Yjb , X. < Xjbl.a l.a

if Y. == Yjb , X. > Xjb-~

l.a l.a

or Y. < Yjb , Xia = Xjbl.a

-1 if Y. < Yjb , Xia > Xjbl.a

I


I

.1IIIIIII

_IIIIIIII

e'II

-16-

and

EZi1 (1) = 11 + ~P1 - ~P2 - 12i

where P1 and P2 are the number of "partial inversions" indicating greater

responses in populations 1 and 2 respectively. Since the test T(l) can

also be expressed in terms of ranks it is generally simpler to use than that

of Bross (although not in the case of a dichotomous response); but it is not

easy to say which may be preferable if grounds other than simplicity of

computation are considered. The difference in interpretation is that T(l)

gives half-weight to the partial -inversions which Bross discards entirely.

Of course, for efficiency against alternatives of the type considered in

Section 2 the best choice is the rank analysis of covariance test, i.e.,

T(rs) if there are no ties.

In Table 2 all three tests are illustrated using the data which

Bross presented in [3J. The observations are (Y,X) , where X is the birth

weight of a baby with hyaline membrane disease, Y is the outcome (R, or 1,

if "recovered"; D, or 0, if "died"), and the two populations or treatments

are UK (urokinase activated human plasmin) and PL (placebo). Ties on birth

weight were broken by using a second variable (X-ray findings) and the re-

sults of this are indicated by adding (+) or (-) in the table. The reader

may find it instructive to check the computation of a few scores. Bross' test

gives 11 -12 = 21: the variance ratio, computed as described above, turns

out to be 8.93 with (1,23) degrees of freedom, corresponding to a one-tailed

probability level of .003; the level from the permutation distribution, ac

cording to Bross' calculations, is .004. The test T(l) gives (I1¥~P1-~P2-I2) = 26,

with variance ratio 2.55, which is not significant. Rank analysis of covariance

gives W1 = 28.98, with variance ratio 6.25 and corresponding one-tailed proba

bility level about .01.

I -17-

I. Table 2

I Weight Outcome Scores

XTreatment y Rank AnalysisBross T(l) of Covariance

I 1.08 UK D 0 3.5 -3.65

I 1.13 UK R 7 15.0 8.44

1.14 PL D -1 1.5 -4.46

~1.20 UK R 6 13.0 7.63

1. 30 UK R 6 12.0 7.23

3 1.40 PL D -3 - 1.5 -5.67

1.59 UK D -3 - 2.5 -6.08

J1.69 UK R 4 9.0 6.02

1. 88 PL D -4 - 4.5 -6.88

2.00 (-) PL D -4 - 5.5 -7.29

1 2.00 PL D -4 - 6.5 -7.69

]e2.00 (+) PL R 1 5.0 4.40

2.10 PL R 1 4.0 4.00

2.13 UK R 1 3.0 3.60

] 2.14 PL D -7 -10.5 -9.31

J2.15 UK R 0 1.0 2.79

2.18 PL R 0 0.0 2.38

2.30 UK R 0 -1.0 1.98

] 2.40 UK R 0 -2.0 1.58

2.44 PL R 0 . -3.0 1.17

I 2.50 (-) UK R 0 -4.0 .77

(+)(\

-5.0]

2.50 PL R 0 .37

2.70 (-) UK R 0 -6.0 - .04

2.70 UK R 0 -7.0 - .44

J 2.70 (+) UK R 0 -8.0 - .85

I,.I

-18-

4. DISCUSSION

The problem which has been considered is that of comparing two or

more populations with respect to a response variable Y in the presence of a

(possibly multivariate) concomitant variable X. In such situations the stan-

dard method is the parametric analysis of covariance. The assumptions re-

quired for strict validity of this classical procedure are that under H theo

conditional distribution of Y given X be (1) normal with (2) expectation

linearly dependent on X, and (3) variance independent of X. The effects of

violations of the first two of these have been considered by Atiqullah [lJ,

who concluded that the parametric analysis of covariance is much less robust

than the corresponding analysis of variance, and in particular that the re-

quirement of linearity may be critical. In practice one may spend consider-

able time in a preliminary checking of assumptions; and if it is found that

they have been violated, although this may be a result of some interest in

itself, it is often essentially of secondary importance relative to the de-

sired overall comparison.

Rank analysis of covariance avoids all three of the above assump-

tions, and yet, as has been shown, it is still fairly efficient when they

are satisfied. On the other hand, it has been assumed throughout this paper

that the concomitant variable X has the same distribution in each population,

and this requirement is crucial; thus, since the parametric method does have

any such assumption, it is applicable in certain situations where the rank

method is not. Usually this assumption would be established non-statistically,

from considerations of logic, although presumably it could be checked'by a

preliminary analysis of the concomitant variable. Another disadvantage of

the rank method at this time appears to be its unknown behavior for small

I_.IIIIIII

elIIIIIII-.I

I

IIIIIIIIIeIIIIIIII-

I

-19-

samples, say with less than five to ten observations per group.

The results of a rank analysis of covariance may be interpreted in

terms of the probability that a response chosen at random from one popula-

tion will exceed a response chosen at random from another. Such a concept

seems at least as simple and useful as the usual formulation in terms of

means. (The two interpretations are entirely equivalent when the assump-

tions of the parametric method are satisfied.) It is a commmon approach to

problems of this sort, when it seems that a parametric analysis cannot be

justified with the data as they stand, to search for a suitable transforma-

tion. Guidelines for such a search have been almost entirely heuristic,

however; and if a successful transformation is found, its use with the para-

metric analysis may involve considerable additional complication not only in

computation but also in interpretation. Of course, the method proposed here

might be considered as an example of this approach in which the transforma-

tion is that of ranking. It is interesting to note that exactly the same

interpretation may be given to a regression of the ranked responses on the

concomitant variates after applying any transformation whatever, including

ranking, or none, and not necessarily the same to each.

The general principle described in Section 1 is very widely appli-

cable indeed. The tests of form T(~), including rank analysis of covariance

in particular, and the closely related test of Bross, are all examples of it;

and further examples will be considered below.

The following "quick and dirty" method for the case of one concomi-

tant variate is based on a proposal of David and Fix [5J which they in turn

state is a variant of a test due to Mood. Draw the scatter diagram of the

pooled sample of N observations. Fit a curve to these data by any convenient

-20-

method - by eye, if you like - and assign to each individual observation the

score +1 if it lies above the fitted curve, and -1 if it lies below. Now

corresponding to observations above and below the fitted curve, and ni columns,

I_.I

Icorresponding to the different samples; it may be verified that

VR ...(N-m)X2

(m-l) (N-X2 )

II

where VR is the variance ratio statistic of the analysis of variance and X2

is the statistic of the x2-test.) The preceding method could of course be

"slowed down" and "cleaned up" by carefully specifying how to fit the curve.

David and Fix suggest fitting a straight line which, when considered together

with a vertical line drawn through the median of the pooled sample of concomi-

tant variables, divides the diagram into four sectors each containing the

same number of observations. However, any method of fitting, including by

eye, will allow a valid test of the hypothesis of homogeneity if it is done

without any knowledge as to the distribution of the observations among the

samples and if the additional requirement is imposed that the proportions of

observations on each side of the fitted curve not deviate too greatly from

one-half. A proof of this may be derived easily from Theorem 2 of [7J. The

method could obviously be extended to the case of more than one concomitant

variate but would then lose much of its appeal.

Still another approach, commonly used when there are large samples

and only two treatments to be compared, is to pick out pairs of observations

which are "matched" on the concomitant variates and then analyse these by one

of the standard techniques for paired comparisons. This method has the

II

elIIIIIII-.I

I

I.IIIIIIIIe

]

J11

-21-

advantage of allowing the covariates to be completely arbitrary in form, even

purely nominal, and also of allowing any arbitrary relationships among co-

variates and response. On the other hand, it appears to lack power re1a-

tive to the preceding methods if some structure can be imposed on the data.

For further discussion of these points see the recent paper by Bi11ewicz [2J.

I am now in the process of developing a broad class of methods which'app1y

the general principle described above in conjunction with the idea of

matching.

Finally, E1ashoff and Govindaraju1u [6J have recently developed

another method in which I understand that they assume linear regression and

obtain a nonparametric estimate of the slope; but I have not yet seen any of

their results.

-22-

REFERENCES

[lJ Atiqullah t M. t "The robustness of the covariance analysis of a one-wayclassification/I Biometrika t 51 (1964) t 365-372.

[2J Bi11ewicz t W. Z.t "The efficiency of matched samples: an empirical investigation/I Biometrics t 21 (1965) t 623-644.

[3J Bross t Irwin D. J. t "Taking a covariable into account t" Journal of theAmerican Statistical Association, 59 (1964)t 725-736.

[4J Chatterjee t S. K., and Sent "Non-parametric tests for the bivariatetwo-sample location problem," Calcutta Statistical AssociationBulletin t 13 (1964)t 18-58.

[5J David t F. N. t and Fix t Evelyn t "Rank correlation and regression in anonnormal surface t" Proceedings of the Fourth Berkeley Symposiumon Mathematical Statistics and Probability t I (1961), 177-197.

[6J E1ashoff, Robert t and Govindaraju1u t Zakku1a t "Nonparametric covarianceanalysis I: testing methods," presented to the Western RegionalMeeting of the Institute of Mathematical Statistics t Ju1Yt 1965.

[n Quade, Dana t "On analysis of variance for the k-samp1e problem,"Institute of Statistics t University of North Caro1ina t MimeoSeries No. 453. To appear in the Annals of Mathematical Statistics.

I

••IIIIIII

_IIIIIIII

••I

Download - I Ieboos/library/mimeo.archive/... · 2019-07-25 · I I. I I I I I I I Ie I I I I I I I I· I RANK ANALYSIS OF COVARIANCE by Dana Quade University of North Carolina Institute of

Top Related