

Lifetime Data Analysis, 4, 265–279 (1998)
© 1998 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.

Comparing Survival Times for Treatments with Those for a Control Under Proportional Hazards

BAHADUR SINGH

F. T. WRIGHT [email protected]
Department of Statistics, University of Missouri, Columbia, MO 65211-4100

Received January 16, 1997; Revised November 10, 1997; Accepted February 17, 1998

Abstract. Inferences for survival curves based on right censored data are studied for situations in which it is believed that the treatments have survival times at least as large as the control or at least as small as the control. Testing homogeneity with the appropriate order restricted alternative and testing the order restriction as the null hypothesis are considered. Under a proportional hazards model, the ordering on the survival curves corresponds to an ordering on the regression coefficients. Approximate likelihood methods, which are obtained by applying order restricted procedures to the estimates of the regression coefficients, and ordered analogues to the log rank test, which are based on the score statistics, are considered. Mau's (1988) test, which does not require proportional hazards, is extended to this ordering on the survival curves. Using Monte Carlo techniques, the type I error rates are found to be close to the nominal level and the powers of these tests are compared. Other order restrictions on the survival curves are discussed briefly.

Keywords: chi-bar-squared distributions, log rank tests, order restricted hypotheses, proportional hazards models, simple tree ordering

1. Introduction

Suppose that $S_1$ is the survival curve associated with a control, $S_2, \ldots, S_m$ are associated with $m-1$ treatments, and that the treatments are known to have survival times at least as large as the control. To determine if any of the treatments are significantly better than the control, one may test homogeneity, $H_0 : S_1 = \cdots = S_m$, with the alternative restricted by the simple tree ordering, $H_1 : S_1 \le S_j$ for $j = 2, \ldots, m$, where $S_i = S_j$ or $S_i \le S_j$ means $S_i(x) = S_j(x)$ or $S_i(x) \le S_j(x)$ for all $x$; that is, one tests $H_0$ versus $H_1 - H_0$. On the other hand, if the ordering, $H_1$, were in question, then one may test $H_1$ versus $H_2$ with $H_2$ the complement of $H_1$.

Many of the results on testing order restricted hypotheses, cf. Robertson et al. (1988), require complete samples or do not allow for covariate information. Some of the first tests of $H_0$ versus $H_1 - H_0$ are given for normal means by Dunnett (1955) and Bartholomew (1961) and, in a nonparametric setting, by Steel (1959). We consider tests of $H_0$ versus $H_1 - H_0$ and of $H_1$ versus $H_2$ which allow for right-censored data and, in the case of proportional hazards, for the use of covariate information. Singh and Wright (1996) give a detailed bibliography on tests of these hypotheses with $H_1$ replaced by the simply ordered alternative, $S_1 \le S_2 \le \cdots \le S_m$.

Three types of tests are considered; the first two are based on the assumption of proportional hazards and the third one is not. Using Cox's (1972, 1975) proportional hazards model with noninformative censoring, we consider a score test studied by Sen (1984) and later by Silvapulle and Silvapulle (1995) and a Wald-type approach studied by Silvapulle (1994), which also yields estimates of the regression coefficients whose ordering is consistent with $H_1$. Finally, because Singh and Wright (1996) found that the test given by Mau (1988) performed well in the simply ordered case if the assumption of proportional hazards is violated, we extend Mau's test to the simple tree ordering.

For the order restricted hypotheses considered here with proportional hazards, we recommend the score tests; however, Wald's tests have essentially the same powers for light to moderate censoring (censoring proportion between 0% and 25% in our study). For heavy censoring (censoring proportion between 50% and 75%), the score tests are clearly preferred. However, in the case of heavy censoring, one needs to be aware of the possibility of adverse treatment effects, which could mean that the assumption of noninformative censoring may not hold. For the cases considered which do not have proportional hazards, Mau's test has the largest powers except for heavy censoring, and in such cases the score tests had the largest powers. In all the cases considered, the score tests had powers at least 95% as large as Mau's test.

The three tests are discussed in Section 2, simplifications of the score test and Wald's test are described in Section 3 for the case in which covariate information is not available, and the results of a Monte Carlo study are presented in Section 4. Finally, Section 5 contains some observations about generalizations to other orderings including the inverted tree, i.e. $S_1 \ge S_j$ for $j = 2, \ldots, m$.

2. The Three Tests

First, we use Cox's (1972) proportional hazards model and adopt notation like that in Lawless (1982, Chapter 7). With $p \ge m-1$, let $x = (x_1, \ldots, x_p)'$ be a known vector of covariates where the first $m-1$ coordinates indicate the subject's treatment group. The hazard function of an individual with covariate vector $x$ is $h(t \mid x) = h_0(t)\exp(\beta' x)$ with $\beta = (\beta_1, \ldots, \beta_p)'$ an unknown vector of regression coefficients and $h_0(t)$ the baseline hazard function with $x = 0$.

To compare the $m$ survival distributions, partition $x$ into $(x^{(1)\prime}, x^{(2)\prime})'$ with $x^{(1)}$ of dimension $m-1$. For individuals in groups $2, \ldots, m$, i.e. the treatment groups, $x^{(1)} = (1, 0, \ldots, 0)'$, $x^{(1)} = (0, 1, 0, \ldots, 0)'$, $\ldots$, $x^{(1)} = (0, 0, \ldots, 1)'$, respectively, and for individuals in group 1, the control, $x^{(1)} = (0, 0, \ldots, 0)'$. If there is no additional covariate information, i.e. $p = m-1$, then with $\pi_{i+1} = \exp(\beta_i)$, the survival functions are given by

$$S_1(t), \quad S_2(t) = [S_1(t)]^{\pi_2}, \quad \ldots, \quad S_m(t) = [S_1(t)]^{\pi_m}.$$
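The power relation between the survival functions follows in one step from the hazard model. A short derivation, writing $H_0(t)$ for the cumulative baseline hazard (a symbol introduced here only for this derivation and not used elsewhere in the paper):

```latex
\begin{align*}
H_0(t) &= \int_0^t h_0(u)\,du, \\
S(t \mid x) &= \exp\Big\{-\int_0^t h_0(u)\,e^{\beta' x}\,du\Big\}
             = \big[\exp\{-H_0(t)\}\big]^{\exp(\beta' x)}, \\
S_1(t) &= \exp\{-H_0(t)\} \qquad (x = 0), \\
S_j(t) &= [S_1(t)]^{\exp(\beta_{j-1})} = [S_1(t)]^{\pi_j}, \qquad j = 2, \ldots, m.
\end{align*}
```

Because $0 \le S_1(t) \le 1$, a coefficient $\beta_{j-1} \le 0$ gives $\pi_j \le 1$ and hence $S_j(t) \ge S_1(t)$ for all $t$, which is how an ordering on the survival curves becomes an ordering on the regression coefficients.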

Assume there are $N$ individuals in the study with $N_j$ from group $j$ for $j = 1, \ldots, m$. Each individual has a covariate vector $x$, a failure time $T_i$ and a censoring time $C_i$ with $T_1, C_1, \ldots, T_N$ and $C_N$ independent and the distributions of the censoring times not depending on the parameters of the lifetime distributions. One observes the smaller of $T_i$ and $C_i$ and knows if the observed time is due to failure or censoring. For the $N$ individuals, the $k$ distinct failure times, with censoring times not included, are $t_{(1)} < \cdots < t_{(k)}$. Let $R_i = R(t_{(i)})$ be the risk set at time $t_{(i)}$, that is, the set of individuals alive and uncensored just prior to $t_{(i)}$, let the number in $R_i$ be $n_i$, $n_0 = N$, and let the number of deaths at time $t_{(i)}$ be $d_i$ with $d_0 = 0$. Of the $n_i$ at risk at $t_{(i)}$, $n_{ji}$ are from group $j$, and of the $d_i$ deaths at $t_{(i)}$, $d_{ji}$ are from group $j$. If all $d_i = 1$, then there are no ties in the failure times. The ordered scores and the Wald tests discussed here are appropriate if there are relatively few ties. Otherwise, the results mentioned in Section 5 can be applied.

The partial likelihood, $L_1(\beta)$, for the case of relatively few ties and $I(\beta)$, the matrix containing minus the second partial derivatives of $\log L_1(\beta)$, are given in (7.2.4) and (7.2.7) of Lawless (1982). In this setting, the hypotheses are

$$H_0' : \beta_j = 0,\ 1 \le j \le m-1, \qquad H_1' : \beta_j \le 0,\ 1 \le j \le m-1, \qquad \text{and} \qquad H_2' : \sim H_1'. \tag{2.1}$$

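Score tests. Partition $\beta$ into $(\beta^{(1)\prime}, \beta^{(2)\prime})'$ with $\beta^{(1)}$ of dimension $m-1$. Let $\tilde\beta^{(2)} = (\tilde\beta_m, \ldots, \tilde\beta_p)'$ maximize the partial likelihood when $\beta^{(1)} = 0$. The partial likelihood score vector is $U = (U_1, U_2, \ldots, U_{m-1})'$ with

$$U_r = \left.\frac{\partial \log L_1(\beta)}{\partial \beta_r}\right|_{\beta = \tilde\beta} \quad \text{for } r = 1, 2, \ldots, m-1, \tag{2.2}$$

where $\tilde\beta = (\tilde\beta^{(1)\prime}, \tilde\beta^{(2)\prime})'$ and $\tilde\beta^{(1)} = 0$. Let $\Gamma = (\Gamma_{rt})$ with

$$\Gamma_{rt} = I(\tilde\beta)_{rt} = -\left.\frac{\partial^2 \log L_1(\beta)}{\partial \beta_r\, \partial \beta_t}\right|_{\beta = \tilde\beta} \quad \text{for } r, t = 1, 2, \ldots, p, \tag{2.3}$$

and, analogous to the partitions of $x$, $\beta$ and $\tilde\beta$, partition $\Gamma$ into $\Gamma_{11}$ $((m-1) \times (m-1))$, $\Gamma_{12}$ $((m-1) \times (p-m+1))$, $\Gamma_{21}$ $((p-m+1) \times (m-1))$, and $\Gamma_{22}$ $((p-m+1) \times (p-m+1))$. All partitions of $p$ dimensional vectors and $p \times p$ matrices are like those for $\beta$ and $\Gamma$. Under $H_0$, $U$ is asymptotically normal with mean 0 and covariance

$$\Sigma = \Gamma_{11} - \Gamma_{12}\Gamma_{22}^{-1}\Gamma_{21}. \tag{2.4}$$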

We reparameterize in terms of $\eta = -\beta$ and use the score vector $X = -U$. With $K = \{x : x_i \ge 0 \text{ for } i = 1, 2, \ldots, m-1\}$ the $(m-1)$ dimensional positive orthant and distance determined by the inner product $\langle a, b \rangle_\Sigma = a'\Sigma b$, the projection of $a$ onto $K$ is denoted by $\pi_\Sigma(a \mid K)$ and it

$$\text{minimizes } (a - b)'\Sigma(a - b) \text{ subject to } b \in K. \tag{2.5}$$

Projections onto $K$ can be computed using quadratic programming, and in fact, one can use the IMSL subroutine QPROG.

Applying the union-intersection principle, Sen (1984) obtains a test of $H_0'$ versus $H_1' - H_0'$. Using a different rationale, Silvapulle and Silvapulle (1995) derive a test which is essentially the same as Sen's test. (Silvapulle (1994) gives another score test, but it is not as simple to use because it requires the computation of estimates of $\beta$ under $H_1'$.) With no covariate information, i.e. $p = m-1$, and local alternatives, i.e. $\beta = \beta_0/\sqrt{N}$ with $-\beta_0 \in K$, they noted that $\Sigma^{-1}X$ has an approximate normal distribution with mean $\eta = -\beta$ and covariance $\Sigma^{-1}$. Acting as if

$$\Sigma^{-1}X \sim N(\eta, \Sigma^{-1}) \text{ with } \Sigma^{-1} \text{ known and nonsingular}, \tag{2.6}$$

they obtain an approximate test by using the likelihood ratio test of $\eta = 0$ versus $\eta \in K - \{0\}$. This test rejects $H_0$ for large values of

$$S_{01} = a'\Sigma a, \tag{2.7}$$

where $a = \pi_\Sigma(\Sigma^{-1}X \mid K)$. Even if there is covariate information, (2.7) is defined and its approximate distribution under $H_0'$ is given below, and thus the test can be applied in this more general setting. With $\alpha$ denoting the significance level of a test, the results in Singh and Wright (1996) show that for $\alpha \le \tfrac{1}{2}$, Sen's test and the one based on $S_{01}$ are equivalent.

Before describing the null distribution of $S_{01}$, we consider a test of $H_1'$ versus $H_2'$. Again, acting as if (2.6) holds and applying the likelihood ratio test of $\eta \in K$ versus $\eta \notin K$, one rejects $H_1'$ for large values of

$$S_{12} = (\Sigma^{-1}X - a)'\Sigma(\Sigma^{-1}X - a). \tag{2.8}$$
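The projection in (2.5) and the statistics (2.7) and (2.8) can be sketched in a few lines. This is a minimal illustration, not the IMSL QPROG routine mentioned above: it assumes NumPy is available, assumes $\Sigma$ is positive definite, and finds the projection by brute-force enumeration of which coordinates are positive, which is only sensible for a small number of treatments. The function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def project_onto_orthant(y, sigma, tol=1e-10):
    """Return the b >= 0 minimizing (y - b)' sigma (y - b).

    Brute force: try each set of "free" (possibly positive) coordinates,
    fix the rest at zero, and accept the candidate that satisfies the
    Karush-Kuhn-Tucker conditions.
    """
    y = np.asarray(y, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n = y.size
    target = sigma @ y
    for size in range(n, -1, -1):
        for free in combinations(range(n), size):
            idx = list(free)
            b = np.zeros(n)
            if idx:
                b[idx] = np.linalg.solve(sigma[np.ix_(idx, idx)], target[idx])
            half_gradient = sigma @ (b - y)  # half the gradient of the objective
            # KKT: b is feasible and the gradient is nonnegative where b = 0.
            if np.all(b >= -tol) and np.all(half_gradient >= -tol):
                return np.clip(b, 0.0, None)
    raise ValueError("no KKT point found; sigma may not be positive definite")


def ordered_score_statistics(x, sigma):
    """Return (S01, S12) of (2.7) and (2.8) for score vector x = X."""
    x = np.asarray(x, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    y = np.linalg.solve(sigma, x)        # Y = Sigma^{-1} X
    a = project_onto_orthant(y, sigma)   # a = pi_Sigma(Y | K)
    s01 = float(a @ sigma @ a)
    s12 = float((y - a) @ sigma @ (y - a))
    return s01, s12
```

Because $K$ is a convex cone, the two statistics split the unrestricted quadratic form: $S_{01} + S_{12} = X'\Sigma^{-1}X$, which gives a quick numerical sanity check on any implementation.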

The results in Kudo (1963) and Shapiro (1985) lead to the following approximations. Let $P(l, m-1; \Sigma^{-1})$ be the probability under $H_0'$ that $a$ has exactly $l$ positive components for $l = 0, 1, \ldots, m-1$. With $c_1$ and $c_2$ constants, $\chi^2_v$ denoting a chi-squared random variable with $v$ degrees of freedom and $\chi^2_0 \equiv 0$,

$$q_{01}(c_1) \equiv P[S_{01} \ge c_1] \doteq \sum_{l=0}^{m-1} P(l, m-1; \Sigma^{-1})\, P[\chi^2_l \ge c_1], \quad \text{and} \tag{2.9}$$

$$q_{12}(c_2) \equiv \sup_{H_1'} P[S_{12} \ge c_2] \doteq \sum_{l=0}^{m-1} P(l, m-1; \Sigma^{-1})\, P[\chi^2_{m-1-l} \ge c_2]. \tag{2.10}$$

Computation of the $P(l, m-1; \Sigma^{-1})$ is discussed in Singh and Wright (1996). Let $c_{01}(\alpha)$ and $c_{12}(\alpha)$ be the $\alpha$ level critical values for $S_{01}$ and $S_{12}$, respectively. Of course, $q_{01}(c_1)$, $q_{12}(c_2)$, $c_{01}(\alpha)$ and $c_{12}(\alpha)$ depend on $\Sigma$, but we do not show this in the notation.
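Once the level probabilities are in hand, the chi-bar-squared tail probabilities in (2.9) and (2.10) are simple weighted sums. The sketch below uses only the standard library; the level probabilities are taken as an input, since their computation is the subject of Singh and Wright (1996) and is not reproduced here. The function names are hypothetical, and the chi-squared tail is evaluated with the standard recurrence for the regularized upper incomplete gamma function.

```python
import math


def chi2_sf(c, df):
    """P[chi^2_df >= c] for integer df >= 0, with chi^2_0 identically 0."""
    if c <= 0:
        return 1.0
    if df == 0:
        return 0.0
    y = c / 2.0
    # Start at df = 1 (odd) or df = 2 (even), then raise df in steps of 2
    # using Q(a + 1, y) = Q(a, y) + y^a e^{-y} / Gamma(a + 1).
    a = 0.5 if df % 2 else 1.0
    q = math.erfc(math.sqrt(y)) if df % 2 else math.exp(-y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def q01(c, level_probs):
    """Right side of (2.9); level_probs[l] = P(l, m-1; Sigma^{-1})."""
    return sum(p * chi2_sf(c, l) for l, p in enumerate(level_probs))


def q12(c, level_probs):
    """Right side of (2.10); the degrees of freedom are m - 1 - l."""
    top = len(level_probs) - 1
    return sum(p * chi2_sf(c, top - l) for l, p in enumerate(level_probs))
```

As a usage example under an assumed special case: if $\Sigma^{-1}$ were the identity with $m - 1 = 2$, each component of $a$ would be positive independently with probability $1/2$, so the level probabilities would be `[0.25, 0.5, 0.25]` and `q01(c, [0.25, 0.5, 0.25])` gives the approximate p-value at an observed value `c`. A critical value $c_{01}(\alpha)$ can then be found by bisection on `c`.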

Simplifications of the score tests for the special case in which there is no covariate information are discussed in Section 3. Powers of $S_{01}$ are studied by Monte Carlo techniques in Section 4.

Wald tests. Silvapulle (1994) proposed a Wald-type test, and the statistic presented here is asymptotically equivalent to his but simpler to compute. Tsiatis (1981) showed that $\hat\beta$, the unrestricted maximum partial likelihood estimator of $\beta$, has an asymptotic normal distribution. The Wald-type tests studied here are obtained by acting as if

$$\hat\beta \sim N(\beta, I(\hat\beta)^{-1}) \text{ with } I(\hat\beta)^{-1} \text{ known and nonsingular} \tag{2.11}$$

and applying the appropriate order restricted tests. Partition $\hat\beta$ into $\hat\beta^{(1)}$ and $\hat\beta^{(2)}$ and $I(\hat\beta)$ into $I_{11}$, $I_{12}$, $I_{21}$ and $I_{22}$. From Robertson et al. (1988, p 217), since $H_0'$ and $H_1'$ only involve $\beta^{(1)}$ and (2.11) holds, the likelihood ratio tests of $H_0'$ versus $H_1' - H_0'$ and of $H_1'$ versus $H_2'$ can be based on $\hat\beta^{(1)}$. The restricted estimate, $\bar\beta^{(1)}$, minimizes the following subject to $\beta^{(1)} \in H_1'$:

$$(\hat\beta^{(1)} - \beta^{(1)})' V (\hat\beta^{(1)} - \beta^{(1)}) \quad \text{where } V = I_{11} - I_{12} I_{22}^{-1} I_{21}. \tag{2.12}$$

With

$$T_{01} = \bar\beta^{(1)\prime} V \bar\beta^{(1)} \quad \text{and} \quad T_{12} = (\hat\beta^{(1)} - \bar\beta^{(1)})' V (\hat\beta^{(1)} - \bar\beta^{(1)}), \tag{2.13}$$

the results in Shapiro (1985) show that one rejects $H_0'$ in favor of $H_1' - H_0'$ for large values of $T_{01}$ and that one rejects $H_1'$ in favor of $H_2'$ for large values of $T_{12}$. Shapiro (1985) also shows that, with $c_1$ and $c_2$ constants, $p_{01}(c_1) \equiv P[T_{01} \ge c_1]$ and $p_{12}(c_2) \equiv \sup_{H_1'} P[T_{12} \ge c_2]$ can be approximated by (2.9) and (2.10) with $\Sigma$ replaced by $V$. We let $b_{01}(\alpha)$ and $b_{12}(\alpha)$ be the $\alpha$ level critical values for $T_{01}$ and $T_{12}$. Again, $p_{01}(c_1)$, $p_{12}(c_2)$, $b_{01}(\alpha)$ and $b_{12}(\alpha)$ depend on $V$, but we do not show this in the notation.

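Mau's test. The tests developed in this section are not based on the assumption of proportional hazards and do not allow for covariate information. For the $a$th individual in the $i$th group, let $Z_{ia}$ be the smaller of the individual's survival time and censoring time and let $\delta_{ia} = 1$ if $Z_{ia}$ is a survival time and $\delta_{ia} = 0$ if $Z_{ia}$ is a censoring time. Let $\Psi(Z_{ia}, \delta_{ia}; Z_{jb}, \delta_{jb}) = 1$ if $Z_{ia} < Z_{jb}$ and $\delta_{ia} = 1$, $\Psi(Z_{ia}, \delta_{ia}; Z_{jb}, \delta_{jb}) = -1$ if $Z_{ia} > Z_{jb}$ and $\delta_{jb} = 1$, and $\Psi(Z_{ia}, \delta_{ia}; Z_{jb}, \delta_{jb}) = 0$ otherwise. For $1 \le i, j \le m$, let

$$W_{ij} = \sum_{a=1}^{N_i} \sum_{b=1}^{N_j} \Psi(Z_{ia}, \delta_{ia}; Z_{jb}, \delta_{jb}), \qquad W_i = \sum_{j=1}^{m} W_{ij} \qquad \text{and} \qquad \bar W_i = W_i / N_i. \tag{2.14}$$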
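The scores in (2.14) are straightforward to compute directly from the definition. The following is a small sketch using only the standard library; the data layout (one list of `(z, delta)` pairs per group, control first) and the function names are assumptions made for the illustration.

```python
def psi(z_a, delta_a, z_b, delta_b):
    """Pairwise score: +1 if a is known to fail before b, -1 if b before a."""
    if z_a < z_b and delta_a == 1:
        return 1
    if z_a > z_b and delta_b == 1:
        return -1
    return 0


def mau_scores(groups):
    """Return (W, W_bar) of (2.14).

    groups[i] is a list of (z, delta) pairs for group i + 1, where z is the
    observed time and delta is 1 for a failure and 0 for a censored time.
    """
    w = []
    for group_i in groups:
        total = 0
        for group_j in groups:
            total += sum(
                psi(z_a, d_a, z_b, d_b)
                for z_a, d_a in group_i
                for z_b, d_b in group_j
            )
        w.append(total)
    w_bar = [w_i / len(group) for w_i, group in zip(w, groups)]
    return w, w_bar
```

For example, with a control group `[(1, 1), (2, 1)]` and one treatment group `[(3, 1), (4, 0)]`, every control failure precedes every treatment observation, so the control receives the larger score. Since $\Psi$ is antisymmetric, the $W_i$ always sum to zero.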

Because the survival times tend to be no larger for the control group, one expects $\bar W_1 \ge \bar W_j$ for $j = 2, \ldots, m$. Following Mau (1988), the $\tilde W_i$ which satisfy this ordering and are "closest" to the $\bar W_i$ are used in Breslow's test statistic, i.e. (12) of Breslow (1970).

Let $T = \{(x_1, x_2, \ldots, x_m)' : x_1 \ge x_j \text{ for } j = 2, \ldots, m\}$ and for each $x, w \in R^m$ with $w_i > 0$ for $i = 1, 2, \ldots, m$, let $\pi_w(x \mid T)$ solve

$$\text{minimize } \sum_{i=1}^{m} w_i (x_i - \xi_i)^2 \text{ subject to } \xi = (\xi_1, \xi_2, \ldots, \xi_m)' \in T. \tag{2.15}$$

With $\bar W = (\bar W_1, \bar W_2, \ldots, \bar W_m)'$ and $w = (N_1/N, N_2/N, \ldots, N_m/N)'$, let $\tilde W = \pi_w(\bar W \mid T)$. If $\bar W \in T$, then $\tilde W_i = \bar W_i$, but if not, $\tilde W = (\tilde W_1, \tilde W_2, \ldots, \tilde W_m)'$ is given by $-g^*$ from the algorithm in Example (1.3.2) of Robertson et al. (1988) with $g = -\bar W$ and weight vector $w$. The analogue of Mau's test of $H_0$ versus $H_1 - H_0$ rejects $H_0$ for large values of

$$M_{01} = \sum_{i=1}^{m} N_i \tilde W_i^2 / (N\sigma)^2 \tag{2.16}$$

with $\sigma$ given in Appendix A of Mau (1988). The corresponding test of $H_1$ versus $H_2$ rejects $H_1$ for large values of

$$M_{12} = \sum_{i=1}^{m} N_i (\bar W_i - \tilde W_i)^2 / (N\sigma)^2. \tag{2.17}$$
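The projection in (2.15) and the statistics (2.16) and (2.17) can be sketched as follows. The projection routine is a standard pool-with-the-root scheme for the simple tree; it is offered as an illustration of the kind of algorithm cited from Robertson et al. (1988) and is assumed, not verified, to coincide with their Example 1.3.2. The value of $\sigma$ is taken as an input because its formula is in Appendix A of Mau (1988) and is not reproduced here. Function names are hypothetical.

```python
def project_simple_tree(g, w):
    """Weighted least squares fit g* to g subject to g*_1 <= g*_j, j >= 2.

    Coordinates are visited from smallest to largest and pooled with the
    root (index 0) for as long as they fall below the running root average.
    """
    root_sum = w[0] * g[0]
    root_weight = w[0]
    for j in sorted(range(1, len(g)), key=lambda k: g[k]):
        if g[j] >= root_sum / root_weight:
            break
        root_sum += w[j] * g[j]
        root_weight += w[j]
    root = root_sum / root_weight
    return [root] + [max(root, g_j) for g_j in g[1:]]


def mau_statistics(w_bar, sizes, sigma):
    """Return (M01, M12) of (2.16) and (2.17).

    w_bar holds the W-bar_i of (2.14), sizes the N_i, and sigma is the
    scale constant from Appendix A of Mau (1988), supplied by the caller.
    """
    n_total = sum(sizes)
    weights = [n_i / n_total for n_i in sizes]
    # W-tilde = -g* with g = -W-bar, so that W-tilde_1 >= W-tilde_j.
    g_star = project_simple_tree([-v for v in w_bar], weights)
    w_tilde = [-v for v in g_star]
    scale = (n_total * sigma) ** 2
    m01 = sum(n_i * wt ** 2 for n_i, wt in zip(sizes, w_tilde)) / scale
    m12 = sum(
        n_i * (wb - wt) ** 2 for n_i, wb, wt in zip(sizes, w_bar, w_tilde)
    ) / scale
    return m01, m12
```

When $\bar W$ already satisfies the ordering, the projection leaves it unchanged and $M_{12} = 0$; when the control has the smallest score, every coordinate is pooled toward a common value and $M_{01}$ shrinks toward zero.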

Let $P_T(l, m; w)$ be the level probabilities for a simple tree ordering with independent normally distributed sample means, which are discussed in Robertson et al. (1988, pp 82-84 and pp 136-141). With $c_1$ and $c_2$ constants, the following approximations are justified in the Appendix: under $H_0$,

$$P[M_{01} \ge c_1] \doteq \sum_{l=1}^{m} P_T(l, m; w)\, P[\chi^2_{l-1} \ge c_1], \quad \text{and} \tag{2.18}$$

$$\sup_{H_1} P[M_{12} \ge c_2] \doteq \sum_{l=1}^{m} P_T(l, m; w)\, P[\chi^2_{m-l} \ge c_2]. \tag{2.19}$$

The approximation given in (2.18) and the power of $M_{01}$ are studied in Section 4.

3. Simplifications of the Tests

For the case in which there is no covariate information, i.e. $p = m-1$, we consider approximations to the null distributions of $S_{01}$, $S_{12}$, $T_{01}$ and $T_{12}$ which are simpler than those given in Section 2 and approximations to the projections which are simpler to compute than $a$ and $\bar\beta$. Throughout this section, we assume $p = m-1$.

Approximations to the null distributions. The null distributions of $T_{01}$ and $T_{12}$ are determined by the $P(l, m-1; V^{-1})$ and, in the case $p = m-1$, $V = I(\hat\beta)$. As in Singh and Wright (1996), we first approximate $V$ by $I(0)$, but as they noted this does not simplify the computation of the level probabilities or the projections. However, an additional approximation is very helpful. Before discussing the second approximation, we note that the null distributions of $S_{01}$ and $S_{12}$ depend on $P(l, m-1; \Sigma^{-1})$ where $\Sigma = I(0)$. So the approximation will be useful for approximating the significance level of $S_{01}$ and $S_{12}$ also.

$I(0)$ is given by (7.2.22) in Lawless (1982) with $r$ and $s$ replaced by $r+1$ and $s+1$, respectively. Under $H_0$, the proportion of individuals at risk at time $t_{(i)}$ from population $j$, $n_{ji}/n_i$, can be approximated by $w_j = N_j/N$. This yields the desired approximation

$$I(0) \approx I_a(0) \equiv C(A - B) \quad \text{where } C = \sum_{i=1}^{k} d_i\, \frac{n_i - d_i}{n_i - 1} \tag{3.1}$$

and $A$ and $B$ are $(m-1) \times (m-1)$ matrices with $A = \mathrm{diag}(w_2, \ldots, w_m)$ and $B_{ij} = w_{i+1} w_{j+1}$ for $1 \le i, j \le m-1$. It is straightforward to verify that $I_a(0)^{-1} = (A^{-1} + w_1^{-1} J)/C$ where $J$ is an $(m-1) \times (m-1)$ matrix with $J_{ij} \equiv 1$.

Because the multiplicative constant $C^{-1}$ does not change the level probabilities, we approximate $P(l, m-1; V^{-1})$ as well as $P(l, m-1; \Sigma^{-1})$ by $P(l, m-1; C I_a(0)^{-1})$. However, $P(l, m-1; C I_a(0)^{-1}) = P_T(l+1, m; w)$, which are the same level probabilities as in the subsection on Mau's test. The approximations to (2.9) and (2.10) which are obtained by replacing $P(l, m-1; \Sigma^{-1})$ by $P_T(l+1, m; w)$ are denoted by $r_{01}(c_1)$ and $r_{12}(c_2)$. Thus $r_{01}(c_1)$ and $r_{12}(c_2)$ provide approximations to $q_{01}(c_1)$ and $q_{12}(c_2)$ as well as $p_{01}(c_1)$ and $p_{12}(c_2)$. Choosing $d_{01}(\alpha)$ and $d_{12}(\alpha)$ so that $r_{01}(d_{01}(\alpha)) = \alpha$ and $r_{12}(d_{12}(\alpha)) = \alpha$ yields approximations to $b_{01}(\alpha)$ and $b_{12}(\alpha)$ as well as $c_{01}(\alpha)$ and $c_{12}(\alpha)$. The accuracies of the approximations, $b_{01}(\alpha)$, $c_{01}(\alpha)$ and $d_{01}(\alpha)$, are discussed in the next section.
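The inverse of $I_a(0)$ stated in this subsection can be checked directly. In the sketch below, $u = (w_2, \ldots, w_m)'$ and $\mathbf{1}$ denotes the vector of ones; both are notation introduced here only for the check. Then $B = uu'$, $J = \mathbf{1}\mathbf{1}'$, $A\mathbf{1} = u$, $A^{-1}u = \mathbf{1}$, and $u'\mathbf{1} = 1 - w_1$ because the $w_j = N_j/N$ sum to one.

```latex
\begin{align*}
(A - uu')\,\big(A^{-1} + w_1^{-1}\mathbf{1}\mathbf{1}'\big)
  &= I + w_1^{-1}\,u\mathbf{1}' - u\mathbf{1}' - w_1^{-1}(1 - w_1)\,u\mathbf{1}' \\
  &= I + \big(w_1^{-1} - 1 - w_1^{-1} + 1\big)\,u\mathbf{1}' \\
  &= I .
\end{align*}
```

Dividing by $C$ then gives $I_a(0)^{-1} = (A^{-1} + w_1^{-1}J)/C$. The form $A^{-1} + w_1^{-1}J$ is, up to scale, the covariance matrix of the differences between $m-1$ independent treatment means and a common control mean, which is why the simple tree level probabilities appear.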

Simplified projections. We consider approximating the ordered score statistics first. Recall that for $p = m-1$, $\Sigma = I(0)$. As in Singh and Wright (1996), we approximate $a$ by $a^*$ where

$$a = \pi_\Sigma(Y \mid K), \qquad a^* = \pi_{I_a(0)}(Y \mid K) \qquad \text{and} \qquad Y = \Sigma^{-1}X.$$

Because a simple tree projection can be converted to an orthant projection, set $g_1 = 0$ and $g_{i+1} = Y_i$ for $i = 1, 2, \ldots, m-1$. Obtain $g^*$ by applying the algorithm in Example 1.3.2 of Robertson et al. (1988) to $g$ with weight vector $w$ (recall $w_i = N_i/N$). Then $a^*_i = g^*_{i+1} - g^*_1$ for $i = 1, 2, \ldots, m-1$. For testing $H_0'$ versus $H_1' - H_0'$ and $H_1'$ versus $H_2'$, we propose the test statistics

$$S'_{01} = a^{*\prime}\Sigma a^* \quad \text{and} \quad S'_{12} = (\Sigma^{-1}X - a^*)'\Sigma(\Sigma^{-1}X - a^*) \tag{3.2}$$

with the simplified critical values $d_{01}(\alpha)$ and $d_{12}(\alpha)$.

For the Wald tests, we approximate $\bar\beta$ by $\beta^*$ where

$$\bar\beta = -\pi_{I(\hat\beta)}(-\hat\beta \mid K) \quad \text{and} \quad \beta^* = -\pi_{I_a(0)}(-\hat\beta \mid K). \tag{3.3}$$

As in the case of score tests, set $g_1 = 0$ and $g_{i+1} = -\hat\beta_i$ for $i = 1, 2, \ldots, m-1$, and obtain $g^*$ by applying the algorithm in Example 1.3.2 of Robertson et al. (1988) to $g$ with weight vector $w$. Then $\beta^*_i = g^*_1 - g^*_{i+1}$ for $i = 1, 2, \ldots, m-1$. We propose the test statistics

$$T'_{01} = \beta^{*\prime} I(\hat\beta)\beta^* \quad \text{and} \quad T'_{12} = (\hat\beta - \beta^*)' I(\hat\beta)(\hat\beta - \beta^*) \tag{3.4}$$

for testing $H_0'$ versus $H_1' - H_0'$ and $H_1'$ versus $H_2'$ with the simplified critical values $d_{01}(\alpha)$ and $d_{12}(\alpha)$.

In the Monte Carlo study discussed in the next section, we found that if $I(\hat\beta)$ is replaced by $I_a(0)$ in the definition of $T'_{01}$, then the significance levels are too large. A similar comment holds for $S'_{01}$. The tests based on $S'_{01}$ and $T'_{01}$ are studied in the next section.

4. A Monte Carlo Study of the Powers

A Monte Carlo study was conducted for the tests of $H_0$ versus $H_1 - H_0$ with $p = m-1$ (i.e. no covariate information) to assess the accuracy of the approximations presented in Sections 2 and 3 and to compare the powers of the tests proposed here. The estimates of rejection probabilities given in this section are all based on 10,000 replications. Lifetimes with proportional hazards and baseline survival distributions which are exponential and Weibull with shape parameter equal to two were considered. Singh and Wright (1996) also considered lifetimes with proportional hazards and baseline survival distributions which are lognormally distributed. Because their conclusions did not seem to depend on the form of the lifetime distributions, we do not consider such proportional hazards models. However, to assess the usefulness of the tests developed here if the hazards are not proportional, lifetimes which are lognormally distributed are considered. With $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, N_i$, and $\beta_0 = 0$, lifetimes, $V_{ij}$, were generated with survival functions

$$S_i(t) = \exp\{-t \exp\{\beta_{i-1}\}/\theta_f\} \quad \text{in the exponential case}, \tag{4.1}$$

$$S_i(t) = \exp\{-(t/\theta_f)^2 \exp\{\beta_{i-1}\}\} \quad \text{in the Weibull case}, \tag{4.2}$$

and, with $\Phi$ the cumulative distribution function of the standard normal distribution,

$$S_i(t) = 1 - \Phi((\ln t - \mu_f + \beta_{i-1})/\sigma) \quad \text{in the lognormal case}. \tag{4.3}$$

The following two types of independent censoring were considered: (1) same type censoring, i.e. the censoring distribution is the same as the baseline lifetime distribution except that $\theta_f$ or $\mu_f$ is replaced by $\theta_c$ or $\mu_c$, and (2) uniform censoring, i.e. the censoring distribution is uniform $(0, \tau)$. The censoring variables, $C_{ij}$, were generated with all the $V_{ij}$ and $C_{ij}$ independent. If $V_{ij} < C_{ij}$ the lifetime $V_{ij}$ was observed, and if $V_{ij} \ge C_{ij}$ the censoring time $C_{ij}$ was observed.
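The data-generating step described above can be sketched by inverting the survival functions (4.1) and (4.2). This is an illustration only, using the standard library; the paper does not say how its variates were generated, so inverse-transform sampling, the function names, and the choice of uniform censoring in the usage example are all assumptions. The lognormal case (4.3) is left out of the sketch.

```python
import math
import random


def lifetime_from_uniform(u, beta, kind, theta_f=1.0):
    """Solve S_i(t) = u for t, with beta playing the role of beta_{i-1}."""
    scaled = -math.log(u) * math.exp(-beta)
    if kind == "exponential":   # (4.1): t / theta_f = scaled
        return theta_f * scaled
    if kind == "weibull":       # (4.2): (t / theta_f)^2 = scaled
        return theta_f * math.sqrt(scaled)
    raise ValueError("kind must be 'exponential' or 'weibull'")


def observe(lifetime, censoring_time):
    """Return (observed time, delta) with delta = 1 for an observed failure."""
    if lifetime < censoring_time:
        return lifetime, 1
    return censoring_time, 0


def simulate_group(n, beta, kind, censor_sampler, rng):
    """Draw n right-censored observations for one group."""
    sample = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1], keeps log(u) finite
        lifetime = lifetime_from_uniform(u, beta, kind)
        sample.append(observe(lifetime, censor_sampler(rng)))
    return sample
```

For instance, `simulate_group(20, -0.5, "exponential", lambda r: r.uniform(0.0, 2.0), random.Random(1))` draws one treatment group of size 20 with uniform$(0, 2)$ censoring; the value 2 for $\tau$ is arbitrary here rather than a solution of the censoring equation used in the study.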

If the same increasing transformation is applied to both the failure times, $V_{ij}$, and the censoring times, $C_{ij}$, then the number of $t_{(i)}$, the risk sets $R_i$ and the $d_{ji}$ are unchanged. Hence, all of the test statistics considered here are unchanged. Thus, we only need to consider $\theta_f = 1$, $\mu_f = 0$ and $\sigma = 1$. With Weibull failure times and parameters $\beta_i$, Weibull censoring and parameter $\theta_c$, and shape parameter two for both failures and censoring, the powers of each of the tests considered for $H_0$ versus $H_1 - H_0$ are the same as for exponential failures with parameters $\beta_i$ and exponential censoring with parameter $\theta_c^2$. In the tables, power estimates are given for exponential failures with exponential censoring but not for Weibull failures with Weibull censoring.

In this study, we considered $m = 3, 4, 5$, $N_1 = N_2 = \cdots = N_m = N$ and $N_1 = 2N$ and $N_2 = \cdots = N_m = N$ with $N = 10, 20, 50$ and $100$. Following Lininger et al. (1979), with average censoring proportion, $\pi$, equal to 0.10, 0.25, 0.50 and 0.75 and $g$ denoting the censoring density, the parameter $\theta_c$, $\mu_c$ or $\tau$ is obtained by solving

$$N\pi = \sum_{i=1}^{m} N_i \int_0^\infty g(t) S_i(t)\,dt. \tag{4.4}$$

The censoring density is $g(t) = \exp\{-t/\theta_c\}/\theta_c$ for $0 < t < \infty$, $g(t) = 2(t/\theta_c^2)\exp\{-(t/\theta_c)^2\}$ for $0 < t < \infty$, $g(t) = (1/t)\phi(\ln t - \mu_c)$ for $0 < t < \infty$ where $\phi$ is the standard normal density, or $g(t) = 1/\tau$ for $0 < t < \tau$. Complete samples, i.e. those with $\pi = 0$, were considered also.
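For exponential lifetimes with exponential censoring, the integral in (4.4) has a closed form, which makes the calibration of $\theta_c$ easy to illustrate. With $\theta_f = 1$, $\int_0^\infty g(t) S_i(t)\,dt = 1/(1 + \theta_c \exp\{\beta_{i-1}\})$. The sketch below solves (4.4) by bisection; the paper does not state which numerical method was used, so the bisection, the search bracket, and the function names are assumptions. In (4.4) the symbol $N$ is read as the total sample size $\sum_i N_i$.

```python
import math


def censoring_proportion(theta_c, betas, sizes):
    """Average censoring proportion implied by theta_c.

    Exponential lifetimes with theta_f = 1 and exponential censoring;
    betas[i] plays the role of beta_i with betas[0] = 0 for the control.
    """
    total = sum(sizes)
    censored = sum(
        n_i / (1.0 + theta_c * math.exp(b)) for b, n_i in zip(betas, sizes)
    )
    return censored / total


def solve_theta_c(target, betas, sizes, lo=1e-8, hi=1e8, iterations=200):
    """Bisection on the log scale; the proportion decreases in theta_c."""
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if censoring_proportion(mid, betas, sizes) > target:
            lo = mid  # too much censoring: censoring times must be longer
        else:
            hi = mid
    return math.sqrt(lo * hi)
```

Under $H_0$ every term of the sum is the same and the solution reduces to $\theta_c = (1 - \pi)/\pi$, for example $\theta_c = 3$ when $\pi = 0.25$, which provides a simple check of the routine.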

The ordered score test is based on $S_{01}$ with critical values $c_{01}(\alpha)$ and is denoted SCR. With the simplified critical values $d_{01}(\alpha)$, it is denoted SCR1. With a target level of $\alpha = 0.05$, in all of the cases considered here these two versions of the ordered score test have estimated significance levels that typically do not differ by more than 0.002, and the largest difference is 0.003. Also, the differences in power are not substantial, and in fact, the differences, SCR1 minus SCR, range from $-0.001$ to $0.011$. Thus, we only give the values for SCR1 in the tables. The test based on the statistic $S'_{01}$, which uses the approximate projection $a^*$ with the simplified critical values $d_{01}(\alpha)$, is denoted SCR2. The tests based on the Wald statistic, $T_{01}$, with critical value $b_{01}(\alpha)$ and with critical value $d_{01}(\alpha)$ are denoted by WALD and WALD1, respectively. As with the ordered score tests, there is little difference between the performances of WALD and WALD1. However, for heavy censoring, WALD1 is more conservative than WALD, and we only report values for WALD in the tables. The test based on $T'_{01}$, which uses the simplified restricted estimate $\beta^*$ and the approximate critical value $d_{01}(\alpha)$, is denoted WALD2. Mau's (1988) test is denoted MAU. Finally, to give some indication of the gains in power that arise from using an order restricted test when the hypothesized ordering is correct, the log rank test, which is not one-sided, was included and it is denoted by LOGR in the tables.

Table 1. Estimated significance levels times 1,000 with $\alpha = 0.05$, $N_1 = N_2 = \cdots = N$, average censoring proportion $\pi$ and 10,000 replications.

The Monte Carlo study was conducted like that in Singh and Wright (1996) except that the approximating projections for SCR2 and WALD2 were computed using the tree algorithm in Example 1.3.2 of Robertson et al. (1988). Also, the critical values for SCR1, SCR2, WALD1 and WALD2 were taken from Table A.5 of Robertson et al. (1988) for equal sample sizes, and for $N_1 = 2N_2$ and $N_2 = \cdots = N_m$, they were obtained from the results in Robertson et al. (1988, pp 82-84).

Significance levels. For $m = 3$ and 5, equal sample sizes $N = 20$ and 50, average censoring proportions $\pi = 0.10, 0.25, 0.50$ and $0.75$, and a target significance level $\alpha = 0.05$, the estimated significance levels, i.e. estimated powers at $\beta = (0, 0, \ldots, 0)'$, of the various tests are given in Table 1. The columns of the table are labeled first by the lifetime distribution and second by the censoring distribution, i.e. the column labeled LU gives the estimated values for lognormally distributed lifetimes and uniformly distributed censoring times.

The significance levels of all of the tests seem acceptable except that for small $N$ and light to moderate censoring, the levels of LOGR and SCR2 are too large, and for small $N$ and heavy censoring, the other tests are too conservative. In both of these cases, the discrepancies become more noticeable for smaller $N$ and larger $m$. To be specific, for $N = 20$, $\pi = 0.1$ and $m = 3$ ($m = 5$), the estimated values for LOGR range from 0.063 to 0.065 (0.066 to 0.067) and the estimated values for SCR2 range from 0.058 to 0.063 (0.066 to 0.071).

We also estimated the significance levels of these tests with a target level of $\alpha = 0.05$ and $N_1 = 2N$ and $N_2 = \cdots = N_m = N$. Because the estimated significance levels in the case of equal sample sizes were similar for the six cases of lifetime distributions with censoring distributions, we only considered exponential lifetimes with exponential censoring. The conclusions are like those in the case of equal sample sizes, but the estimated significance levels of all the tests are smaller than for equal sample sizes. Thus the levels for SCR1, SCR2, WALD, WALD2 and MAU are more conservative for small $N$ and heavy censoring than in the case of equal sample sizes. For instance, with $m = 5$, $N = 20$ ($N = 50$) and $\pi = 0.75$, the estimate for WALD is 0.016 (0.032), for SCR2 it is 0.029 (0.042), and for MAU it is 0.029 (0.045).

Powers. We first discuss the powers of these tests in the case of proportional hazards. The powers of the tests were estimated at two types of alternatives,

$$\beta_L = c(-1, -1, \ldots, -1)' \quad \text{and} \quad \beta_S = c(-1, 0, \ldots, 0)', \tag{4.5}$$

where $c$, which depends on the sample sizes, the lifetime distribution and the censoring distribution, is chosen to make the power of SCR at $\beta_L$ about 0.95 for an average censoring proportion of $\pi = 0.10$, and the value of $c$ for $\beta_S$ is chosen to give a power of about 0.90 for SCR with an average censoring proportion of $\pi = 0.10$. These two types of alternatives were included because Bartholomew (1961) conjectured that they yield the "largest" and the "smallest" powers for the corresponding likelihood ratio test for normal means based on independent samples; see the discussion in Robertson et al. (1988, p. 94). However, the power functions of these tests are more complex and it is not clear how Bartholomew's conjecture should be extended to this setting.

Tables 2 and 3 contain the power estimates for $\beta_L$ and $\beta_S$ with $m = 3, 5$, $N_1 = N_2 = \cdots = N_m = 20, 50$ and $\pi = 0.10, 0.25, 0.50, 0.75$. The columns in these tables labeled LU, which are for lognormal failure times and uniform censoring times, will be discussed in the subsection on nonproportional hazards.

For both of the alternatives considered, $\beta_L$ and $\beta_S$, the ordered score tests have better powers than the other tests. We recommend SCR1 for small $N$ and light to moderate censoring, $N = 20$ and $0 \le \pi \le 0.25$, because the significance levels of SCR2 seem too large in such cases. (Recall that SCR and SCR1 have similar powers, but SCR1 is easier to use.) For other $(N, \pi)$ we recommend SCR2. Thus if one desires a test that performs well over the entire simple tree alternative, $\beta_i \le 0$ for $i = 1, \ldots, m-1$, this is our recommendation. However, except for the cases with $N = 20$ and $\pi = 0.5$ or $\pi = 0.75$, if one uses WALD, then the loss in power relative to the recommended ordered score test is at most about 6%. Furthermore, the increase in estimated power of the recommended ordered score test over the log rank test is between 3% and 41% in the cases considered.

It should be noted that in several cases, the log rank test, which is not one-sided, outperformed Mau's test. This may be due in part to the fact that the log rank test has larger significance levels than Mau's test. However, it appears that the gain in power in Mau's test due to the ordering (the tree ordering is a weak type of ordering) is offset by the gain in power due to proportional hazards in the log rank test.

Table 2. Estimates of power times 1,000 when the treatments are equally better than the control, i.e. at $\beta_L = c(-1, -1, \ldots, -1)'$ with $\alpha = 0.05$, $N_1 = N_2 = \cdots = N_m = N$, average censoring proportion $\pi$, and 10,000 replications.

With $N_1 = 2N$ and $N_2 = \cdots = N_m = N$ and $m$, $N$ and $\pi$ as above, we obtained estimates of the powers of these tests for exponential lifetimes and exponential censoring distributions. The conclusions are the same as for the case of equal sample sizes.

Nonproportional hazards. To study the tests when the assumption of proportional hazardsis violated, we consider lifetimes determined by (4.3) withσ = 1, µf = 0, β = βL andβ = βS , see (4.5) for definitions ofβL andβS . The constantc in βL or in βS is chosen asin the last subsection. Uniform and lognormal censoring are considered and the constants,τ andµc, are obtained by solving (4.4). Estimates of the significance levels of the testsin this setting are given in the columns labeled LU and LL in Table 1. With lognormalfailures, the relative powers of these tests are similar for lognormal censoring and uniformcensoring. Thus, we only give the estimated powers for the case LU in Tables 2 and 3.

In each case (except one, which may be due to Monte Carlo error), the estimate of power for Mau’s test is larger than that of any of the other tests. However, in all of the cases considered, the recommended ordered scores test has powers at least 95% as large as those of Mau’s test. For these lognormal failures, if one uses Mau’s test, the gain in power over the (unordered) log rank test ranges from 6% to 28%.
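To gauge how large Monte Carlo error can be for entries such as those in Tables 2 and 3, the following minimal sketch computes the binomial standard error of a power estimate. It assumes each entry is a rejection proportion from 10,000 independent replications, reported times 1,000; the function name and the input value 500 are hypothetical and are not taken from the tables.

```python
import math


def mc_standard_error(power_times_1000, replications=10_000):
    """Binomial standard error of a power estimate, on the 'times 1,000' scale.

    Assumes the estimate is a proportion of rejections over independent
    replications (an assumption about the simulation design).
    """
    p = power_times_1000 / 1000.0
    if not 0.0 <= p <= 1.0:
        raise ValueError("power_times_1000 must lie between 0 and 1000")
    return 1000.0 * math.sqrt(p * (1.0 - p) / replications)


if __name__ == "__main__":
    # Hypothetical entry of 500 (power 0.5), where the standard error is largest.
    print(mc_standard_error(500))
```

Under this assumption a single entry has a standard error of at most 5 units on the tabled scale. Differences between tests computed on the same simulated samples are correlated, so this bound is only a rough guide for comparing two entries.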

Tests of the validity of the assumption of proportional hazards are discussed in several references including Arjas (1988) and Horowitz and Neumann (1992).

Table 3. Estimates of power times 1,000 when only one treatment is better than the control, i.e. at $\beta_S = c(-1, 0, \ldots, 0)'$ with $\alpha = 0.05$, $N_1 = N_2 = \cdots = N_m = N$, average censoring proportion $\pi$, and 10,000 replications.

5. Concluding Remarks

In this section, methodology appropriate for other order restrictions and grouped data is mentioned briefly.

Inverted tree ordering. If one believes that the survival times are at least as large for the control as for all of the treatments, then the corresponding hypothesis is $H_1$: $S_1 \ge S_j$ for $j = 2, \ldots, m$. For the score tests, one only needs to change the score vector to $X = U$. For the Wald tests, one only changes $H_1'$ in (2.1) to $\beta_j \ge 0$ for $j = 1, 2, \ldots, m-1$, and the simplified projection is computed as follows: $g_1 = 0$, $g_{i+1} = \eta_i = \hat\beta_i$ for $i = 1, 2, \ldots, m-1$, $g^*$ is obtained by applying the algorithm in Example 1.3.2 of Robertson et al. (1988) to $g$ with weights $w_i = N_i/N$, and $\beta_i^* = g_{i+1}^* - g_1^*$ for $i = 1, 2, \ldots, m-1$. For Mau’s tests, only the computation of $\bar W$ needs to be changed. In particular, $\bar W$ is given by $g^*$ from the algorithm in Example 1.3.2 of Robertson et al. (1988) with $g = W$ and weights $w_i = N_i/N$. In this case, $\bar W_1 \le \bar W_j$ for $j = 2, 3, \ldots, m$.
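The projection step for the Wald tests can be sketched as follows. This is a minimal illustration, not the authors' code: it uses the usual pooling scheme for the weighted least-squares projection onto $\{x : x_1 \le x_j,\ j = 2, \ldots, m\}$, which we assume agrees with the algorithm in Example 1.3.2 of Robertson et al. (1988) (that example is not reproduced here). The function names and the numerical inputs are hypothetical.

```python
def simple_tree_projection(g, w):
    """Weighted least-squares projection of g onto {x : x[0] <= x[j], j >= 1}.

    The root value is pooled with the smallest remaining values until the
    pooled weighted average no longer exceeds the next smallest one.
    """
    if len(g) != len(w) or any(wi <= 0 for wi in w):
        raise ValueError("g and w must have equal length and w must be positive")
    # Indices of the non-root components, smallest value first.
    order = sorted(range(1, len(g)), key=lambda j: g[j])
    num, den, root = w[0] * g[0], w[0], g[0]
    for j in order:
        if g[j] >= root:
            break  # the ordering already holds for all remaining components
        num += w[j] * g[j]
        den += w[j]
        root = num / den
    return [root] + [max(g[j], root) for j in range(1, len(g))]


def inverted_tree_wald_estimates(beta_hat, sizes):
    """Restricted estimates beta*_i = g*_{i+1} - g*_1 with g_1 = 0, g_{i+1} = beta_hat_i."""
    n_total = sum(sizes)
    w = [n / n_total for n in sizes]      # weights w_i = N_i / N
    g = [0.0] + list(beta_hat)            # g_1 = 0 stands for the control
    g_star = simple_tree_projection(g, w)
    return [g_star[i + 1] - g_star[0] for i in range(len(beta_hat))]


if __name__ == "__main__":
    # Hypothetical estimates for m = 3 groups of equal size.
    print(inverted_tree_wald_estimates([-1.0, 2.0], [20, 20, 20]))
```

In this hypothetical example the first estimate violates $\beta_1 \ge 0$, so it is pooled with the control and its restricted value becomes 0, while the second restricted value is measured from the pooled root.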

Other order restrictions. For $\lesssim$ a partial ordering on $\{1, 2, \ldots, m\}$, let $w$ be defined as in Mau’s test in Section 2, let $P(l, m; w)$ be the level probabilities discussed in Section 2.4 of Robertson et al. (1988), and let $H_1 = \{x \in R^m : x_i \le x_j \text{ whenever } i \lesssim j\}$. Mau’s test statistics for this ordering are given by (2.16) and (2.17) with $\bar W = \pi_w(W \mid -H_1)$, see (2.15). The approximate significance levels are given by (2.18) and (2.19) with $P_T(l, m; w)$ replaced by $P(l, m; w)$.

Under the assumption of proportional hazards, the Wald-type test in Silvapulle (1994) and the score test in Silvapulle and Silvapulle (1995) can be applied. Since the conclusions of the Monte Carlo study described here are like those in Singh and Wright (1996), we anticipate they hold in general.

Grouped data. If one does not observe the exact time of death or the exact time an individual leaves the study, but only the intervals to which these times belong, or if there are many tied observations, Cox’s (1972) logistic model can be used. With $x$ a covariate vector whose first $m-1$ coordinates indicate the population to which an individual belongs and $\beta$ the corresponding vector of regression coefficients, Singh and Wright (1996) give the partial likelihood score vector and its covariance as well as $\hat\beta$, the partial maximum likelihood estimate of $\beta$, and its covariance. The techniques of Section 2 can be applied to this score vector and to $\hat\beta$.

Acknowledgments

The authors thank Mervyn Silvapulle for calling their attention to the references Sen (1984) and Silvapulle and Silvapulle (1995) and the referees for helpful suggestions. This research was supported in part by the National Institute of Health under Grant 1 R01 CA61060-01.

Appendix

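We justify the approximations given in (2.18) and (2.19). Recall that

\[ W = (W_1, W_2, \ldots, W_m)', \quad w = (w_1, w_2, \ldots, w_m)' \ \text{with } w_i = N_i/N, \]

$\sigma$ is given in Appendix A of Mau (1988), $\pi_w(x \mid T)$ is defined by (2.15) with $T = \{(x_1, x_2, \ldots, x_m)' : x_1 \ge x_j \text{ for } j = 2, \ldots, m\}$, and $\bar W = \pi_w(W \mid T)$. Assuming equal censorship, i.e. all censoring variables have the same distribution, and Lehmann-type alternatives, that is

\[ S_i(t) = (S_0(t))^{1 - \theta_i/\sqrt{N}} \quad \text{for } i = 1, 2, \ldots, m, \]

Breslow (1970) shows that $W/(\sigma\sqrt{N})$ is asymptotically normally distributed with mean vector $k(\bar\theta - \theta_1, \bar\theta - \theta_2, \ldots, \bar\theta - \theta_m)'$ and covariance matrix $\Sigma^*$ where $k$ is a positive constant, $\delta_{ij}$ is the Kronecker delta,

\[ \lim_{N \to \infty} w_i = \lambda_i > 0, \quad \bar\theta = \lambda_1\theta_1 + \cdots + \lambda_m\theta_m \quad \text{and} \quad \Sigma^*_{ij} = \delta_{ij}/\lambda_i - 1. \]

For $i = 1, 2, \ldots, m$, let $\phi_i$ be independent normal random variables with mean $-k\theta_i$ and variance $1/\lambda_i$, let $\phi = (\phi_1, \phi_2, \ldots, \phi_m)'$, and note that

\[ W/(\sigma\sqrt{N}) \xrightarrow{D} (\phi_1 - \bar\phi, \phi_2 - \bar\phi, \ldots, \phi_m - \bar\phi)' \quad \text{with } \bar\phi = \sum_{i=1}^{m} \lambda_i \phi_i. \]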

For Lehmann alternatives, $H_0$, $H_1$ and $H_2$ are associated with $\theta_1 = \theta_2 = \cdots = \theta_m$, $\theta_1 \le \theta_j$ for all $j = 2, \ldots, m$ and $\theta_1 > \theta_j$ for some $j = 2, \ldots, m$, respectively. We show that the tests based on $M_{01}$ and $M_{12}$ are obtained by applying the likelihood ratio tests based on the $\phi_i - \bar\phi$ to the $W_i/(\sigma\sqrt{N})$. As discussed in Robertson et al. (1988, p. 216), one can obtain the likelihood ratio test by taking differences in parameters as well as the corresponding observations. With $\gamma_i = \theta_{i+1} - \theta_1$ for $i = 1, 2, \ldots, m-1$ we want to test $H_0''$: $\gamma_i = 0$ for each $i$ versus $H_1''$: $\gamma_i \ge 0$ for each $i$, and we want to test $H_1''$ versus $H_2''$: $\gamma_i < 0$ for some $i$. However, $(\phi_1 - \bar\phi) - (\phi_{i+1} - \bar\phi) = \phi_1 - \phi_{i+1}$ has mean $k\gamma_i$ and these are the same differences for testing $H_0''$ versus $H_1''$ or $H_1''$ versus $H_2''$ based on the $\phi_i$. If $y_1 = \cdots = y_m$ and $a > 0$ then

\[ \pi_\lambda(x + y \mid T) = \pi_\lambda(x \mid T) + y \quad \text{and} \quad \pi_\lambda(ax \mid T) = a\,\pi_\lambda(x \mid T). \]

The likelihood ratio test of $H_0''$ versus $H_1''$ based on the $\phi_i$ rejects $H_0''$ for large values of

\[ \sum_{i=1}^{m} \lambda_i \left(\bar\phi - \pi_\lambda(\phi \mid T)_i\right)^2 = \sum_{i=1}^{m} \lambda_i \left(\pi_\lambda(\phi - \bar\phi \mid T)_i\right)^2. \]

Replacing $\lambda_i$ by $N_i/N$ and $\phi_i - \bar\phi$ by $W_i/(\sigma\sqrt{N})$ leads to the test which rejects $H_0$ for large values of $M_{01}$.

The likelihood ratio test of $H_1''$ versus $H_2''$ based on the $\phi_i$ rejects $H_1''$ for large values of

\[ \sum_{i=1}^{m} \lambda_i \left(\phi_i - \pi_\lambda(\phi \mid T)_i\right)^2 = \sum_{i=1}^{m} \lambda_i \left(\phi_i - \bar\phi - \pi_\lambda(\phi - \bar\phi \mid T)_i\right)^2. \]

Again replacing $\lambda_i$ by $N_i/N$ and $\phi_i - \bar\phi$ by $W_i/(\sigma\sqrt{N})$ leads to the test which rejects $H_1$ for large values of $M_{12}$. The approximations now follow from Theorem 2.3.1 in Robertson et al. (1988, p. 69).
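The two likelihood ratio statistics in this argument can be illustrated numerically. The sketch below is a hedged illustration under stated assumptions: the projection onto $T$ is computed by a standard pooling scheme (assumed, not taken from the paper), the weights $\lambda_i$ are assumed to sum to one, and the function names and the values of $\phi$ and $\lambda$ are hypothetical.

```python
def project_onto_T(x, lam):
    """Weighted projection of x onto T = {x : x[0] >= x[j] for j >= 1}.

    The first coordinate is pooled with the largest remaining coordinates
    until the pooled weighted average is at least the next largest one.
    """
    if len(x) != len(lam) or any(l <= 0 for l in lam):
        raise ValueError("x and lam must have equal length and lam must be positive")
    order = sorted(range(1, len(x)), key=lambda j: x[j], reverse=True)
    num, den, root = lam[0] * x[0], lam[0], x[0]
    for j in order:
        if x[j] <= root:
            break  # remaining coordinates already satisfy the ordering
        num += lam[j] * x[j]
        den += lam[j]
        root = num / den
    return [root] + [min(x[j], root) for j in range(1, len(x))]


def lr_statistics(phi, lam):
    """Return the statistics for H0'' vs H1'' and for H1'' vs H2''.

    lam is assumed to sum to one, so phi_bar is the weighted mean of phi.
    """
    phi_bar = sum(l * p for l, p in zip(lam, phi))
    proj = project_onto_T(phi, lam)
    t01 = sum(l * (phi_bar - q) ** 2 for l, q in zip(lam, proj))
    t12 = sum(l * (p - q) ** 2 for l, p, q in zip(lam, phi, proj))
    return t01, t12


if __name__ == "__main__":
    lam = [0.5, 0.25, 0.25]          # hypothetical limiting weights
    phi = [0.0, 1.0, -1.0]           # hypothetical observations
    print(lr_statistics(phi, lam))
    # Adding the same constant to every phi_i leaves both statistics unchanged,
    # which is the shift property of the projection used in the text.
    print(lr_statistics([p + 3.0 for p in phi], lam))
```

Because both statistics depend on $\phi$ only through the centered values $\phi_i - \bar\phi$, they can be evaluated at $W_i/(\sigma\sqrt{N})$ with $\lambda_i$ replaced by $N_i/N$, as described above.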

References

Arjas, Elja, “A graphical method for assessing the goodness of fit in Cox’s proportional hazards model,” J. Amer. Statist. Assoc., vol. 83 pp. 204-212, 1988.

Bartholomew, D. J., “A test of homogeneity of means under restricted alternatives (with discussion),” J. R. Statist. Soc. B, vol. 23 pp. 239-281, 1961.

Breslow, N., “A generalized Kruskal-Wallis test for comparing k samples subject to unequal pattern of censorship,” Biometrika, vol. 57 pp. 579-594, 1970.

Cox, D. R., “Regression models and life-tables (with discussions),” J. R. Statist. Soc. B, vol. 34 pp. 187-202, 1972.

Cox, D. R., “Partial likelihood,” Biometrika, vol. 62 pp. 269-276, 1975.

Dunnett, C. W., “A multiple comparisons procedure for comparing several treatments with a control,” J. Amer. Statist. Assoc., vol. 50 pp. 1096-1121, 1955.

Horowitz, J. L. and Neumann, G. R., “A general moments specification test of the proportional hazards model,” J. Amer. Statist. Assoc., vol. 87 pp. 234-240, 1992.

Kudo, A., “A multivariate analogue of the one-sided test,” Biometrika, vol. 50 pp. 403-418, 1963.

Lawless, J. F., Statistical Models & Methods for Lifetime Data, Wiley: New York, 1982.

Lininger, L., Gail, M. H., Green, S. B. and Byar, D. P., “Comparison of four tests for equality of survival curves in the presence of stratification and censoring,” Biometrika, vol. 66 pp. 419-428, 1979.

Mau, Jochen, “A generalization of a nonparametric test for stochastically ordered distributions to censored survival data,” J. R. Statist. Soc. B, vol. 50 pp. 403-412, 1988.

Robertson, T., Wright, F. T. and Dykstra, R. L., Order Restricted Statistical Inference, Wiley: New York, 1988.

Sen, P. K., “Subhypotheses testing against restricted alternatives for the Cox regression model,” J. Statist. Planning and Inference, vol. 10 pp. 31-42, 1984.

Shapiro, A., “Asymptotic distribution of test statistics in the analysis of moment structures under inequality constraints,” Biometrika, vol. 72 pp. 133-140, 1985.

Silvapulle, M. J., “On tests against one-sided hypotheses in some generalized linear models,” Biometrics, vol. 50 pp. 853-858, 1994.

Silvapulle, M. J. and Silvapulle, P., “A score test against one-sided alternatives,” J. Amer. Statist. Assoc., vol. 90 pp. 342-349, 1995.

Singh, B. and Wright, F. T., “Testing order restricted hypotheses with proportional hazards,” Lifetime Data Analysis, vol. 2 pp. 363-389, 1996.

Steel, R. G. D., “A multiple comparisons rank sum test: treatment versus control,” Biometrics, vol. 15 pp. 560-572, 1959.

Tsiatis, A. A., “A large sample study of Cox’s regression model,” Ann. Statist., vol. 9 pp. 93-108, 1981.