
Testing Whether an Identified Treatment Is Best
Author(s): Eugene M. Laska and Morris J. Meisner
Source: Biometrics, Vol. 45, No. 4 (Dec., 1989), pp. 1139-1151
Published by: International Biometric Society
Stable URL: http://www.jstor.org/stable/2531766


Biometrics 45, 1139-1151, December 1989

    Testing Whether an Identified Treatment Is Best

    Eugene M. Laska and Morris J. Meisner Statistical Sciences & Epidemiology Division,

    Nathan S. Kline Institute for Psychiatric Research, Orangeburg, New York, 10962, U.S.A. and

    Department of Psychiatry, New York University Medical Center, New York, New York 10016, U.S.A.

SUMMARY

We consider the problem of testing whether an identified treatment is better than each of K treatments. Suppose there are univariate test statistics Si that contrast the identified treatment with treatment i for i = 1, 2, ..., K. The min test is defined to be the α-level procedure that rejects the null hypothesis that the identified treatment is not best when, for all i, Si rejects the one-sided hypothesis, at the α-level, that the identified treatment is not better than the ith treatment. In the normal case where the Si are t statistics the min test is the likelihood ratio test. For distributions satisfying mild regularity conditions, if attention is restricted to test statistics that are monotone nondecreasing functions of the Si, then regardless of their covariance structure the min test is an optimal α-level test. Tables of the sample size needed to achieve power .50, .80, .90, and .95 are given for the min test when the Si are Student's t and Wilcoxon.

    1. Introduction

The problem we consider is how to test whether an identified treatment, A, is the best one among K + 1 treatments. Without loss of generality we shall take K = 2, and label the other two treatments B and C. When the outcome measure is univariate, the null hypothesis H0 is that either B or C or both treatments are at least as good as treatment A. Let H0B be the null hypothesis that B is at least as good as A and let H0C be analogously defined. Then H0 is the union of H0B and H0C and the alternative hypothesis is H1: A is superior to both B and C.

Let X0, X1, and X2 be three random variables that characterize the effects of treatments A, B, and C, respectively. Suppose the distribution of Xi is associated with a parameter μi, which generally is a location parameter. Assume that larger values of μi are more desirable. Formally, letting γ = μ0 − μ1 and θ = μ0 − μ2, the null hypothesis is

H0: μ0 ≤ μ1 or μ0 ≤ μ2,

which is equivalent to γ ≤ 0 or θ ≤ 0, and the alternative hypothesis that the identified treatment is best is

H1: μ0 > μ1 and μ0 > μ2,

which is equivalent to min(γ, θ) > 0.

The problem of testing whether an identified treatment is best is thus framed in terms of testing where the parameters (γ, θ) lie in the two-dimensional plane. The points in the interior of the first quadrant, excluding the positive axes, comprise the alternative hypothesis H1. All of the points in quadrants two, three, and four, as well as those on both positive axes, comprise the null hypothesis H0.

Key words: Clinical trials; Combination treatment; Likelihood ratio test; Optimal test.

    The question of how to test this null hypothesis should be distinguished from the question of how to identify the best treatment (or set of treatments) considered by many authors, including Bechhofer (1954), Gupta and Panchapakesan (1972), and Gibbons, Olkin, and Sobel (1977). In our case the treatment of interest is already specified. It is also distinct from the question of how to determine if any treatments are superior (or inferior) to a specified control; see Dunnett (1955, 1964) and Miller (1981). We are interested in whether the specified treatment is superior to all of the other treatments.

The problem of testing whether an identified treatment is best arises in many contexts. As an illustration, suppose that A is a combination of drugs B and C as discussed by Lasagna (1975). Before the combination may be approved for public use, the U.S. Food and Drug Administration regulations for fixed-combination drugs require evidence that "... each component makes a contribution to the claimed effects ..." (21 CFR 300.50). Suppose that both drugs B and C affect the same target symptom; that is, the outcome measure is univariate. Clearly, a physician should not prescribe the combination treatment if one of the constituents is at least as effective, because there is increased risk of adverse effects yet no added benefit from exposure to the second constituent. It is therefore necessary to demonstrate that the combination is more effective than each of its constituents.

Testing a hypothesis that the parameters lie in specified orthants or quadrants has been considered in other contexts. For example, Gail and Simon (1985), in analyzing patient subsets that respond differently to a treatment, study the situation in which H0 is the union of the first and third orthants. Their likelihood ratio statistic under normality assumptions with known variance is the minimum of two statistics.

Suppose there are two statistics, S and T, that possess some desirable property for testing H0B and H0C, respectively. For example, S and T may be determined from the Neyman-Pearson lemma or they may be locally most powerful rank tests. The procedure that we shall call the min test, which requires both test statistics S and T to reject, at most at the α-level, their respective null hypotheses H0B and H0C, is, as pointed out by Lehmann (1952), Fairweather (unpublished manuscript presented at a meeting of the FDA Biometrics and Epidemiological Methodology Advisory Committee, 1976), Berger (1982), Laska and Meisner (1986), and Leung and O'Neill (1986), an α-level test for the univariate identified treatment testing problem.
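In terms of one-sided p-values for the two component comparisons, the min test rejects exactly when both p-values are at most α. The following minimal sketch is an added illustration (not code from the paper); the p-values supplied at the end are hypothetical.

```python
# Minimal sketch of the min test: reject H0 ("A is not best") only if BOTH one-sided
# comparisons of A with B and of A with C reject at level alpha, i.e., only if the
# larger of the two one-sided p-values is at most alpha.

def min_test(p_value_A_vs_B: float, p_value_A_vs_C: float, alpha: float = 0.05) -> bool:
    """Return True if the identified treatment A is declared best at level alpha."""
    return max(p_value_A_vs_B, p_value_A_vs_C) <= alpha

# Hypothetical p-values:
print(min_test(0.012, 0.031))  # True: both comparisons reject at the .05 level
print(min_test(0.012, 0.074))  # False: the A-versus-C comparison does not reject
```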

It is natural to ask whether there are more powerful tests of H0 than the min test. In particular, since the univariate test statistics may be correlated, it would seem that advantageous use could be made of the covariance structure to achieve better performance characteristics. In this communication we show that if consideration of candidate test statistics is restricted to monotone functions of the univariate test statistics, then under rather mild conditions no test is superior to the min test. On the other hand, on mathematical grounds the requirement of monotonicity need not be imposed. In the translation parameter case with independent test statistics, Guttman (1987) has explicitly constructed more powerful tests than the min test. However, as Guttman has pointed out, the probability of a Type I error of his tests exceeds that of the min test for each parameter point in H0. More generally, Cohen, Gatsonis, and Marden (1983) have shown that the min test is an admissible test.

Since larger values of S and T correspond to greater evidence for the alternative hypotheses, restriction to monotone functions seems natural. An experimenter who obtains values of each component test statistic at least as large as those observed by his colleague would not find reasonable a test procedure that results in his colleague rejecting H0 but denies such rejection to him. Since regulatory agencies are subject to judicial review, allowing nonmonotone procedures cannot help but lead to accusations of unfairness. Imagine how the courts would view the fairness of the FDA if the outcome (a, a) rejected H0 but the outcome (∞, a) did not.

The theory presented below concerns one-sided testing. In some applications a more conservative approach may be required. For example, for testing the efficacy of an active medication relative to a placebo, a two-tailed test might be preferred. To achieve a comparable degree of conservatism, a simple expedient is to set the α-level at .025 and proceed with the one-tailed test (Koch, personal communication).

In Section 2 the statistical setting of the identified treatment testing problem is given and sufficient conditions for an optimal property of the min test are stated. Several lemmas that simplify the verification of the conditions are given. In Section 3 we restrict attention to two normal cases, one with known covariance matrix Σ and the other with Σ = σ²I, where I is the identity matrix and σ² is unknown. In these two important cases the min test is the likelihood ratio test. In Section 4, a nonparametric model, the Wilcoxon, and a binomial model are considered. Tables of sample sizes that achieve given power levels are provided in Section 5 when the underlying random variables are normal and the univariate test statistics comprising the min test are Student's t and when they are Wilcoxon.

2. Optimal Tests for the General Univariate Case

We suppose that there are two univariate test statistics: S, for testing H0B: γ ≤ 0 versus H1B: γ > 0, and T for testing H0C: θ ≤ 0 versus H1C: θ > 0. We assume further that the univariate distribution of S, say F1(s; γ), does not depend on the parameter θ, and similarly, the univariate distribution of T, F2(t; θ), does not depend on γ. The distribution of the underlying random variables X0, X1, and X2 will, in general, determine the joint distribution, F(s, t; γ, θ), of the random variables S and T. We shall assume that larger values of S and T provide more evidence in favor of H1B and H1C, respectively. To test H0B univariately the rejection region is of the form S > k1 and similarly, for H0C the rejection region is T > k2, where k1 and k2 are the smallest scalar values that yield α-level tests. Thus,

$$\alpha = \sup_{\gamma \le 0} \Pr_{\gamma}(S > k_1) = \sup_{\theta \le 0} \Pr_{\theta}(T > k_2). \qquad (2.1)$$

In searching for admissible α-level tests of H0, we shall restrict attention to the family G of test statistics that are monotone nondecreasing functions of s and t in both arguments. The corresponding rejection region R is monotone, for it has the property that if the point (s0, t0) lies in R and if s1 ≥ s0 and t1 ≥ t0, then (s1, t1) also lies in R.

The notation (γ, θ) → (∞, θ) should be understood to mean that γ goes to infinity along each line θ equal to a constant. The notation g(s, ∞) represents the limit of g(s, t) as t → ∞ if it exists. For those values for which the limit does not exist, we define it as infinity. Also, for any measurable set D in two-dimensional Euclidean space, let

$$\Pr_{\gamma,\theta}(D) = \Pr_{\gamma,\theta}\{(S, T) \in D\} = \int\!\!\int_D dF(s, t; \gamma, \theta).$$

Lehmann (1952, Theorem 4.2) gave conditions under which the min test is a uniformly most powerful α-level test of H0 among all regions of rejection that are monotone nondecreasing in both S and T. Laska and Meisner (1986) also gave an analogous result, with optimality within the family G of functions of s and t.

Theorem 1  Let G be the family of all monotone nondecreasing functions of two variables. Let the test statistics S and T for H0B and H0C, respectively, be specified. For each g ∈ G let k_g, depending on g, be chosen such that sup_{H0} Pr_{γ,θ}{g > k_g} ≤ α. If

(A) Pr_{γ,θ}(g > k) is a monotone nondecreasing function of γ for each fixed θ and of θ for each fixed γ;

(B1) lim_{θ→∞} Pr_{γ,θ}{g > k} = Pr_γ(g(S, ∞) > k); and

(B2) lim_{γ→∞} Pr_{γ,θ}{g > k} = Pr_θ(g(∞, T) > k);

then the rejection region of the uniformly most powerful α-level test of H0 versus H1 within the family G is given by M = min(S − k1, T − k2) > 0, where k1 and k2 are given in (2.1).

Note that, because of the monotonicity of Pr_{γ,θ}, the smallest probability of a Type I error for the min test on the boundary of the first quadrant in the (γ, θ)-plane occurs at (0, 0). In general, this value is less than α. Therefore, for parameter points in H1 close to the origin the power may also be less than α. Indeed, not only is the min test not unbiased, but, in general, unbiased tests do not exist (Lehmann, 1952).

    A helpful tool in determining if condition A of Theorem 1 obtains is provided by the following result, which, although stated for two parameters, holds for an arbitrary number of parameters.

Theorem 2  Let G be the family of all monotone nondecreasing functions g of two variables. A necessary and sufficient condition for Pr_{γ,θ}(g(U, V) > k) to be monotonically nondecreasing in each parameter for all g in G is that, for all (u, v), Pr_{γ,θ}(U > u, V > v) is monotonically nondecreasing in each parameter.

Proof  The necessity of the theorem follows immediately for a function g that is an indicator function of an upper quarter-plane. To prove sufficiency we start with the supposition that Pr_{γ,θ}(U > u, V > v) is nondecreasing, which implies that Pr_γ(U > u) and Pr_θ(V > v) are each nondecreasing in the parameters. The equation

$$\Pr_{\gamma,\theta}(g(U, V) > k) = \int \Pr_{\theta}(g(u, V) > k)\, dF(u; \gamma) = \int \Pr_{\theta}(V > k_1(u))\, dF(u; \gamma),$$

where, for fixed u, g(u, V) > k if and only if V > k1(u) by the monotonicity of g in its second argument, implies that Pr_{γ,θ}(g > k) is an average of monotone nondecreasing functions of θ, Pr_θ(V > k1(u)), and is therefore itself monotone nondecreasing in θ.

Lemma 1  If F(s, t; γ, θ) is a translation parameter family, i.e., is of the form F(s − γ, t − θ), then Theorem 1 holds.

Guttman (1987) proved this result under the assumption that S and T are independent. A result useful in the case of the multivariate t, considered below, is given in the following corollary.

Corollary  Let U1 and U2 have joint distribution K(u1, u2). Let V1 and V2 be two positive random variables, independent of U1 and U2, with distribution function H(v1, v2). Suppose S and T have the representations S = (U1 + c1γ)/V1 and T = (U2 + c2θ)/V2 with c1 and c2 > 0. Then Theorem 1 holds.


    3. The Normal Case

3.1 The Likelihood Ratio Test

Inada (1978) for the bivariate case and Sasabuchi (1980) for the multivariate case have shown that if the underlying distribution is normal, the likelihood ratio test is the min test:

Theorem 3  Let (Y1, Y2)' be normally distributed with mean (γ, θ)'. If the covariance matrix Σ is known, then the likelihood ratio λ for H0 is given by the expression

$$\log \lambda = \begin{cases} 0, & Y_1 \le 0 \text{ or } Y_2 \le 0, \\ -\tfrac{1}{2}\min\!\left(\dfrac{Y_1^2}{\sigma_{11}}, \dfrac{Y_2^2}{\sigma_{22}}\right), & Y_1 > 0 \text{ and } Y_2 > 0. \end{cases}$$

If Σ = σ²I, where I is the identity matrix and σ² is unknown, then the likelihood ratio is as above with σ11 and σ22 replaced by the pooled estimate s².

Since the likelihood ratio test rejects when λ ≤ λ0, for 0 < λ0 < 1, we note that this is equivalent to the requirement that the minimum of the components is greater than a constant. Cohen et al. (1983) showed that if Σ = σ²I, the min test is admissible.

3.2 Normal Random Variables with Known Variance

Suppose the underlying random variables X0, X1, and X2 are independently normally distributed with means μ0, μ1, and μ2, respectively, and known variance, say, 1. In a completely randomized three-group design with n subjects per group form S = √(n/2)(X̄0 − X̄1) and T = √(n/2)(X̄0 − X̄2). Then S and T are bivariate normal with mean √(n/2)(γ, θ) and covariance matrix

$$\begin{pmatrix} 1 & \tfrac{1}{2} \\ \tfrac{1}{2} & 1 \end{pmatrix}.$$

From Lemma 1, within the family G, the uniformly most powerful, say, α = .05-level test of H0 versus H1 is M = min(S − 1.645, T − 1.645) > 0. This test is also the likelihood ratio test. This result, for arbitrary known covariance matrix, was given by Cohen et al. (1983). At the point (γ, θ) = (0, 0) the Type I error probability is, for all n, the probability that the minimum of two standard normal random variables with correlation equal to ½ is greater than 1.645. From the tables of Gupta (1963), this probability is .012. Hence, as mentioned above, M is a biased test. However, it is consistent in that for any point in H1 the power of the test approaches 1 as n approaches infinity.
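The .012 figure can be verified directly from the bivariate normal distribution. The following sketch is an added numerical check (not part of the original analysis); it uses scipy and the symmetry P{U1 > k, U2 > k} = P{U1 < −k, U2 < −k}.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# P{min(U1, U2) > 1.645} for standard bivariate normal (U1, U2) with correlation 1/2:
# this is the Type I error of the min test at the origin (gamma, theta) = (0, 0).
rho = 0.5
k = norm.ppf(0.95)                      # 1.645, the one-sided .05-level cutoff
cov = np.array([[1.0, rho], [rho, 1.0]])

# Upper-orthant probability via symmetry: P(U1 > k, U2 > k) = P(U1 < -k, U2 < -k).
p_origin = multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf([-k, -k])
print(round(p_origin, 3))               # approximately 0.012, matching Gupta's tables
```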

3.3 Normal Random Variables with Unknown Variance

Suppose the underlying random variables X0, X1, and X2 are independently normally distributed with respective means μ0, μ1, and μ2 and unknown but equal variance σ². The two variables X0 − X1 and X0 − X2 are jointly normal with mean (γ, θ) and covariance matrix

$$\sigma^2 \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.$$

Suppose n independent observations are taken on each variable X0, X1, and X2. Let σ̂² be the mean squared error from a one-way analysis of variance with 3n − 3 degrees of freedom.


Let S = √(n/2)(X̄0 − X̄1)/σ̂ and T = √(n/2)(X̄0 − X̄2)/σ̂. For any given (γ, θ), the variables S and T have the representation

$$S = \frac{U_1 + \sqrt{n}\,\gamma/(\sqrt{2}\,\sigma)}{V}, \qquad T = \frac{U_2 + \sqrt{n}\,\theta/(\sqrt{2}\,\sigma)}{V},$$

where (U1, U2) are jointly normally distributed with zero means, unit variances, and covariance equal to ½. The denominator V = σ̂/σ is independent of the pair (U1, U2) and is distributed as {χ²(3n − 3)/(3n − 3)}^{1/2}. Thus, univariately S and T each follow a noncentral t distribution with 3n − 3 degrees of freedom and noncentrality parameters √(n/2) γ/σ and √(n/2) θ/σ, respectively.

As a consequence of the corollary of Section 2, the test min(S − k, T − k) > 0, where k is the (1 − α)th percentile of the t distribution with 3n − 3 degrees of freedom, is the optimal α-level test of H0 versus H1 within the family of monotone functions of S and T. By Theorem 3 it is also the likelihood ratio test.
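The following sketch is an added illustration of this procedure on raw observations (the data and variable names are hypothetical): the error variance is pooled across the three groups on 3n − 3 degrees of freedom, S and T are formed as above, and H0 is rejected when the smaller of the two statistics exceeds the t critical value.

```python
import numpy as np
from scipy.stats import t as t_dist

def min_t_test(x0, x1, x2, alpha=0.05):
    """Min test that treatment A (sample x0) is better than B (x1) and C (x2).

    Assumes equal group sizes n and a common unknown variance, as in Section 3.3.
    """
    x0, x1, x2 = map(np.asarray, (x0, x1, x2))
    n = len(x0)
    assert len(x1) == n and len(x2) == n
    # Pooled mean squared error from the one-way ANOVA, on 3n - 3 degrees of freedom.
    df = 3 * n - 3
    mse = sum(((x - x.mean()) ** 2).sum() for x in (x0, x1, x2)) / df
    sigma_hat = np.sqrt(mse)
    S = np.sqrt(n / 2) * (x0.mean() - x1.mean()) / sigma_hat
    T = np.sqrt(n / 2) * (x0.mean() - x2.mean()) / sigma_hat
    k = t_dist.ppf(1 - alpha, df)
    return min(S - k, T - k) > 0, (S, T, k)

# Hypothetical data: A shifted up by one standard deviation relative to B and C.
rng = np.random.default_rng(0)
a, b, c = rng.normal(1, 1, 30), rng.normal(0, 1, 30), rng.normal(0, 1, 30)
reject, (S, T, k) = min_t_test(a, b, c)
print(reject, round(S, 2), round(T, 2), round(k, 2))
```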

Leung and O'Neill (1986) have suggested a multivariate confidence ellipsoid approach. Specifically, the null hypothesis is rejected at level α if and only if the (1 − α) Hotelling confidence ellipsoid for the mean vector lies entirely in the first quadrant. It is not hard to show that this procedure rejects H0 if and only if the minimum of the two corresponding t statistics is larger than a constant determined by the Hotelling procedure. However, this constant is too large and the procedure therefore too conservative, for, as shown above, the proper constant to achieve an α-level test is obtained from the distribution of the univariate t statistics.

    4. Other Cases

In this section, the min procedure is applied to test statistics having discrete distributions. In this setting only a finite number of α-levels can be attained without recourse to randomized tests.

4.1 Wilcoxon Rank Test

Suppose the underlying random variables X0, X1, and X2 are distributed by F(x), F(x − γ), and F(x − θ), respectively, for some unknown absolutely continuous distribution function F. In a parallel group design with n subjects in each of the three groups, form U_{X0,X1} and U_{X0,X2}, where U is the Mann-Whitney form of the Wilcoxon rank sum test. That is, U_{X,Y} is defined to be the number of pairs (i, j) such that Xi < Yj. The sample space of the pair (U_{X0,X1}, U_{X0,X2}) consists of the integer lattice points (k, l) for 0 ≤ k, l ≤ n². Here S = U_{X0,X1} and T = U_{X0,X2}. The subscripts γ and θ in the notation Pr_{γ,θ}{(U_{X,Y}, U_{Z,W}) ∈ D} indicate that the translation parameters of the distributions of X and Y differ by γ and those of Z and W differ by θ. Minor modifications can easily be made to Theorems 1 and 2 so that they may be applied to this discrete probability model. Specifically, integrals are replaced by sums, and terms involving the boundary of the sample space, such as g(s, ∞), are replaced by g(s, n²).

To verify condition A of Theorem 1, it suffices by Theorem 2 to show that Pr_{γ,θ}{S > s, T > t} = Pr_{γ,θ}{U_{X0,X1} > s, U_{X0,X2} > t} is monotone nondecreasing in γ and θ. Following the proof of Lehmann (1975) in the univariate case, let γ0 < γ1, and let θ be fixed. Let X0i, X1i, and X2i for i = 1, 2, ..., n be n independent and identically distributed random variables with distributions F(x), F(x − γ0), and F(x − θ), respectively. Let V1i = X1i + (γ1 − γ0). The common distribution function of the independent random variables V1i is F(x − γ1). Notice that U_{X0,X1} ≤ U_{X0,V1}, for if X0i < X1j, then X0i < V1j since V1j > X1j. Thus,


$$\Pr_{\gamma_0,\theta}\{U_{X_0,X_1} > s,\, U_{X_0,X_2} > t\} \le \Pr_{\gamma_1,\theta}\{U_{X_0,V_1} > s,\, U_{X_0,X_2} > t\}.$$

The identical technique may be used to show that Pr_{γ,θ} is increasing in θ. Hence, condition A of Theorem 1 holds. Condition B2,

$$\lim_{\gamma \to \infty} \Pr_{\gamma,\theta}\{g(U_{X_0,X_1}, U_{X_0,X_2}) > k\} = \Pr_{\theta}(g(n^2, U_{X_0,X_2}) > k),$$

follows because the random variable U_{X0,X1} converges almost surely to n² as γ goes to infinity. Condition B1 follows by a similar argument. Thus, the test min(S − k, T − k) > 0, where k is the (1 − α)th percentile of the two-sample Mann-Whitney statistic, is the optimal α-level test of H0 versus H1 within the family of monotone functions of S and T.
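From raw data the same procedure can be carried out with two one-sided Mann-Whitney comparisons. The sketch below is an added illustration, not the authors' code: it phrases the rejection rule in terms of one-sided p-values in the direction "A is stochastically larger than B (and C)" used in the Introduction, and it relies on scipy's exact or normal-approximate p-value rather than the exact percentile of U.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def wilcoxon_min_test(x0, x1, x2, alpha=0.05):
    """Min test built from two one-sided Mann-Whitney (Wilcoxon rank-sum) comparisons.

    Rejects H0 only if A (sample x0) is found stochastically larger than both
    B (x1) and C (x2) at level alpha.
    """
    res_b = mannwhitneyu(x0, x1, alternative="greater")
    res_c = mannwhitneyu(x0, x2, alternative="greater")
    reject = max(res_b.pvalue, res_c.pvalue) <= alpha
    return reject, (res_b.statistic, res_c.statistic)

# Hypothetical data: A shifted up by 0.8 relative to B and C.
rng = np.random.default_rng(1)
a, b, c = rng.normal(0.8, 1, 40), rng.normal(0, 1, 40), rng.normal(0, 1, 40)
print(wilcoxon_min_test(a, b, c))
```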

4.2 A Binomial Model

Suppose there are two groups, each with n subjects. In the first group treatments A and B are given and each subject reports the one preferred. Assuming no ties can occur, let γ be the probability that A is preferred to B. Each subject in group 2 is given treatments A and C, with corresponding probability θ that A is preferred to C. The sample space consists of the integer lattice points (i, j), 0 ≤ i, j ≤ n, where i and j are the numbers of subjects who preferred A in group 1 and group 2, respectively. The null and alternative hypotheses for the identified treatment problem are H0: γ ≤ ½ or θ ≤ ½ and H1: γ > ½ and θ > ½. Here S = i and T = j, and condition A of Theorem 1 requires that Pr_γ{S > u} and Pr_θ{T > v} are increasing in γ and θ, respectively. Intuitively, since S represents a sum of independent binomial random variables with success probability γ, the expression Pr_γ(S > u) is increasing in γ. This follows formally since the binomial distribution has monotone likelihood ratio (Lehmann, 1959). A similar result holds for Pr_θ{T > v}.

Condition B1 of Theorem 1 requires that

$$\lim_{\theta \to 1} \sum_{\{(i,j):\, g(i,j) > k\}} \binom{n}{i}\binom{n}{j}\left(\tfrac{1}{2}\right)^{n} \theta^{j} (1-\theta)^{n-j} = \Pr_{1/2}(g(S, n) > k) = \sum_{\{i:\, g(i,n) > k\}} \binom{n}{i}\left(\tfrac{1}{2}\right)^{n}.$$

The limit of those terms in the sum for which j ≠ n is 0. Those terms for which j = n have limit (n choose i)(½)ⁿ, which establishes that B1 holds. Condition B2 follows in the same fashion. Thus, the test min{S − k, T − k} > 0, where k is the (1 − α)th percentile of the binomial distribution with parameters n and ½, is the optimal α-level test of H0 versus H1 within the family of monotone functions of S and T.
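A sketch of this binomial min test follows (an added illustration; the counts used at the end are hypothetical). The critical value k is taken as the smallest integer whose Binomial(n, ½) upper-tail probability is at most α, which matches the (1 − α)th percentile described above.

```python
from scipy.stats import binom

def binomial_min_test(i, j, n, alpha=0.05):
    """Min test for the paired-preference design above.

    i, j: numbers of subjects preferring A in group 1 (A vs. B) and group 2 (A vs. C).
    k is the smallest integer with P{Binomial(n, 1/2) > k} <= alpha, so each
    one-sided component test has level at most alpha without randomization.
    """
    k = int(binom.ppf(1 - alpha, n, 0.5))   # smallest k with P{X <= k} >= 1 - alpha
    reject = min(i - k, j - k) > 0
    return reject, k

# Hypothetical outcome: 50 subjects per group, 33 and 35 preferences for A.
print(binomial_min_test(33, 35, 50))
```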

5. Power and Sample Size in the Univariate Case

In most applications the calculation of power for the min test is complex and requires numerical integration. However, a few bounds hold generally.


Under the conditions of Theorem 1, the power of the min test is increasing in the parameters. Therefore, at any point (γ, θ) the power is bounded below by Pr_{δ,δ}(M > 0), where δ = min(γ, θ), and is bounded above by Pr_{λ,λ}(M > 0), where λ = max(γ, θ). For the translation parameter case, if M = min(S − k1, T − k2), then as θ goes to infinity the power Pr_{γ,θ}(M > 0) approaches the power of the univariate test statistic S at the point γ.

    We next consider the power of the min test for the univariate cases considered in Sections 3.2, 3.3, and 4.1.

5.1 Normal with Known Variance

For the min test given in Section 3.2, the power at a point (γ, θ) in H1 is given by

$$\Pr_{\gamma,\theta}\{M > 0\} = \Pr\{U_1 > 1.645 - \sqrt{n/2}\,\gamma,\; U_2 > 1.645 - \sqrt{n/2}\,\theta\},$$

where U1, U2 have a bivariate normal distribution with a zero mean vector, unit variances, and correlation equal to ½. Tables of the sample size per group necessary to achieve power of .5, .8, .9, .95, and .99 for various values of (γ, θ) are given in Laska and Meisner (1986). Berger (1982) also presents a power table when the correlation is .7, which in this framework could occur if the sample sizes in the three groups were not equal.
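This power expression is an upper-orthant probability of a bivariate normal distribution with correlation ½, so the tabulated sample sizes can be reproduced by a direct search. The sketch below is an added illustration under the unit-variance setup of Section 3.2.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

RHO = 0.5
MVN = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, RHO], [RHO, 1.0]])

def power_known_variance(n, gamma, theta, alpha=0.05):
    """Power of the min test of Section 3.2 (unit variance) at (gamma, theta)."""
    k = norm.ppf(1 - alpha)
    a = k - np.sqrt(n / 2) * gamma
    b = k - np.sqrt(n / 2) * theta
    # Upper-orthant probability P{U1 > a, U2 > b} = P{U1 < -a, U2 < -b} by symmetry.
    return MVN.cdf([-a, -b])

def sample_size(gamma, theta, target_power, alpha=0.05, n_max=10000):
    """Smallest per-group n whose min-test power reaches the target."""
    for n in range(2, n_max):
        if power_known_variance(n, gamma, theta, alpha) >= target_power:
            return n
    return None

print(sample_size(0.5, 0.5, 0.80))   # per-group n for power .80 at gamma = theta = .5
```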

    How many more subjects are required to test whether the identified treatment is better than two (K = 2) treatments than are required to test whether it is better than one (K = 1) treatment? The latter hypothesis involving the treatments A and B is

H0B: γ ≤ 0 vs. H1B: γ > 0. Suppose γ ≤ θ and let γ be fixed. Let n be the number of subjects per group required to test the hypothesis H0 at power p computed at (γ, θ). Let m be the number required to test H0B at power p computed at the single parameter γ. If θ − γ is large, n and m are essentially equal. The sample size per treatment group required for M to achieve power p is largest on the diagonal, i.e., when θ = γ. Considering the sample size as a continuous variable, the number required for the test of H0B is

$$m = 2\left(\frac{k_\alpha - k_p}{\gamma}\right)^{2},$$

where k_x is the (1 − x)th percentile of the standard normal distribution. Therefore, the ratio of the sample sizes per group, n to m, is the same for each γ and is given by

$$\frac{n}{m} = \left(\frac{k_\alpha - w_p}{k_\alpha - k_p}\right)^{2},$$

where w_x is the (1 − x)th percentile of the distribution function of the minimum of two standard normal variates with correlation ½. For p = .50, .80, .90, .95, and .99, the ratios are 1.53, 1.25, 1.20, 1.17, and 1.14, respectively. For univariate power .80 or better this ratio is less than 1.25, so that at most 25% more subjects per treatment group are required.
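The ratios quoted above can be checked numerically once w_p is obtained by inverting the distribution of the minimum of two correlated standard normals. The following sketch is an added illustration, with α = .05 and correlation ½ as in the text.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import multivariate_normal, norm

MVN = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.5], [0.5, 1.0]])

def w(x):
    """(1 - x)th percentile of min(U1, U2) for standard normals with correlation 1/2,
    i.e., the value w with P{min(U1, U2) > w} = x."""
    tail = lambda c: MVN.cdf([-c, -c]) - x      # P{U1 > c, U2 > c} minus the target x
    return brentq(tail, -10.0, 10.0)

alpha = 0.05
k = lambda x: norm.ppf(1 - x)                   # (1 - x)th percentile of the standard normal
for p in (0.50, 0.80, 0.90, 0.95, 0.99):
    ratio = ((k(alpha) - w(p)) / (k(alpha) - k(p))) ** 2
    print(p, round(ratio, 2))                   # approximately 1.53, 1.25, 1.20, 1.17, 1.14
```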

5.2 Independent Normal Random Variables with Unknown Variance

The power of the min test considered in Section 3.3 is given by

$$\Pr_{\gamma,\theta}\{S > k, T > k\} = \int_0^\infty h(v)\, \Psi\!\left(vk - \sqrt{n/2}\,\gamma/\sigma,\; vk - \sqrt{n/2}\,\theta/\sigma\right) dv,$$

where k is the (1 − α)th percentile of a t distribution with 3n − 3 degrees of freedom, Ψ(x, y) = Pr{U1 > x, U2 > y}, where (U1, U2) are two standard normal random variables with correlation ½, and h(v) is the density of a (χ²_f/f)^{1/2} variable, where f = 3n − 3.
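The integral can be evaluated with one-dimensional numerical quadrature, which also allows entries on the Student side of Table 1 to be checked. The sketch below is an added illustration; it treats V = σ̂/σ as a chi variate on 3n − 3 degrees of freedom scaled by (3n − 3)^{-1/2}, as in Section 3.3.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import chi, multivariate_normal, t as t_dist

MVN = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.5], [0.5, 1.0]])

def power_unknown_variance(n, gamma_over_sigma, theta_over_sigma, alpha=0.05):
    """Power of the Student min test of Section 3.3, by numerical integration over
    the distribution of V = sigma_hat / sigma."""
    f = 3 * n - 3
    k = t_dist.ppf(1 - alpha, f)
    a = np.sqrt(n / 2) * gamma_over_sigma
    b = np.sqrt(n / 2) * theta_over_sigma
    V = chi(df=f, scale=1 / np.sqrt(f))          # density h(v) of a (chi^2_f / f)^(1/2) variable

    def integrand(v):
        # h(v) * P{U1 > v*k - a, U2 > v*k - b}, the orthant written via the symmetric lower tail.
        return V.pdf(v) * MVN.cdf([-(v * k - a), -(v * k - b)])

    return quad(integrand, 0, np.inf, limit=200)[0]

# Check against Table 1: at gamma/sigma = theta/sigma = 0.5 the tabled n for power .80 is 64.
print(round(power_unknown_variance(64, 0.5, 0.5), 3))
```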


The left side of Table 1 gives the sample sizes per group necessary to achieve power .50, .80, .90, and .95 for γ/σ and θ/σ ranging from .2 to 2.0 in increments of .1. The blanks in Table 1 correspond to sample sizes that are beyond the range of accuracy of our numerical integration technique. For a given γ/σ, as θ/σ goes to infinity the power converges to the univariate power. Consequently, for fixed γ/σ the sample size required to achieve power p remains constant once θ/σ is sufficiently large. Therefore, for a given γ/σ, the largest value of θ/σ displayed in the table is the parameter value beyond which the sample size remains constant.

Table 1. Sample size per group for a 5% test to achieve given power

                 Student min test          Wilcoxon min test
                       Power                     Power
γ/σ   θ/σ      .50    .80    .90    .95     .50    .80    .90    .95
 .2    .2      210      -      -      -     220    415    544    664
       .3      153      -      -      -     161    330    451    568
       .4      139      -      -      -     146    325    449    567
       .5      137      -      -      -     144    325    449    567
 .3    .3       93    177    232    284      99    185    242    295

    .4 73 146 196 244 78 152 204 254

    .5 65 139 192 242 69 146 200 252

    .6 62 139 192 242 66 145 200 252

    .7 61 139 192 242 65 145 200 252

    .4 .4 53 100 131 160 56 105 136 166 .5 44 85 113 140 47 89 117 145 .6 39 80 109 137 42 84 113 142 .7 36 78 108 137 39 83 113 142 .8 35 78 108 137 38 82 113 142

    .5 .5 34 64 84 103 37 67 87 106 .6 29 56 74 91 32 58 77 94 .7 26 52 70 88 29 55 73 91 .8 24 51 70 88 27 54 73 91 .9 23 50 69 88 26 53 73 91

    1.0 23 50 69 88 25 53 73 91

    .6 .6 24 45 59 71 26 47 61 74 .7 21 40 52 64 23 42 54 66 .8 19 37 50 62 21 39 52 64 .9 18 36 49 61 20 38 51 63

    1.0 17 35 49 61 19 38 51 63 1.1 17 35 49 61 18 38 51 63 1.2 16 35 49 61 18 38 51 63

    .7 .7 18 33 43 53 20 35 45 54 .8 16 30 39 48 18 31 40 49 .9 15 28 37 46 16 29 38 47

    1.0 14 27 36 45 15 29 38 47 1.1 13 26 36 45 15 28 38 47 1.2 13 26 36 45 14 28 38 47 1.3 12 26 36 45 14 28 38 47

    .8 .8 14 26 33 41 16 27 35 41 .9 13 23 30 37 14 25 31 38

    1.0 12 22 29 36 13 23 30 36 1.1 11 21 28 35 13 22 29 36 1.2 10 21 28 35 12 22 29 36 1.3 10 20 28 35 11 22 29 36 1.4 10 20 28 35 11 22 29 36 1.5 10 20 28 35 11 22 29 36 1.6 10 20 28 35 11 22 29 36 1.7 10 20 28 35 11 22 29 36 1.8 9 20 28 35 11 22 29 36

    (continued )


Table 1 (continued)

                 Student min test          Wilcoxon min test
                       Power                     Power
γ/σ   θ/σ      .50    .80    .90    .95     .50    .80    .90    .95
 .9    .9       11     20     27     32      13     22     27     33

    1.0 10 19 24 30 12 20 25 30 1.1 9 18 23 28 11 19 24 29 1.2 9 17 23 28 11 18 24 29 1.3 9 17 22 28 10 18 23 28 1.4 8 16 22 28 10 18 23 28 1.5 8 16 22 28 10 18 23 28 1.6 8 16 22 28 10 18 23 28 1.7 8 16 22 28 9 18 23 28

    1 1.0 9 17 22 26 11 18 22 27 1.1 9 15 20 24 10 17 21 25 1.2 8 15 19 23 10 16 20 24 1.3 8 14 19 23 9 15 19 23 1.4 7 14 18 23 9 15 19 23 1.5 7 14 18 23 9 15 19 23 1.6 7 13 18 23 8 15 19 23

    1.1 1.1 8 14 18 22 9 15 19 22 1.2 7 13 17 20 9 14 17 21 1.3 7 12 16 20 8 13 17 20 1.4 7 12 16 19 8 13 16 19 1.5 6 12 15 19 8 13 16 19 1.6 6 11 15 19 8 13 16 19 1.7 6 11 15 19 7 12 16 19

    1.2 1.2 7 12 15 19 8 13 16 19 1.3 6 11 14 17 8 12 15 17 1.4 6 11 14 17 8 12 14 17 1.5 6 10 13 16 7 11 14 17 1.6 5 10 13 16 7 11 14 16 1.7 5 10 13 16 7 11 14 16 1.8 5 10 13 16 7 11 14 16 1.9 5 10 13 16 7 11 14 16 2.0 5 10 13 16 7 11 14 16 2.1 5 10 13 16 6 11 14 16

    1.3 1.3 6 10 13 16 7 11 14 16 1.4 6 10 12 15 7 11 13 15 1.5 5 9 12 14 7 10 12 15 1.6 5 9 12 14 7 10 12 14 1.7 5 9 11 14 6 10 12 14 1.8 5 9 11 14 6 10 12 14 1.9 5 8 11 14 6 10 12 14 2.0 5 8 11 14 6 10 12 14 2.1 5 8 11 14 6 10 12 14 2.2 5 8 11 14 6 9 12 14

    1.4 1.4 5 9 12 14 7 10 12 14 1.5 5 9 11 13 7 9 11 13 1.6 5 8 10 13 6 9 11 13 1.7 5 8 10 12 6 9 11 12 1.8 4 8 10 12 6 9 11 12 1.9 4 8 10 12 6 9 11 12 2.0 4 7 10 12 6 9 10 12 2.1 4 7 10 12 6 8 10 12 2.2 4 7 10 12 6 8 10 12 2.3 4 7 10 12 6 8 10 12 2.4 4 7 10 12 5 8 10 12

Table 1 (continued)

                 Student min test          Wilcoxon min test
                       Power                     Power
γ/σ   θ/σ      .50    .80    .90    .95     .50    .80    .90    .95
1.5   1.5        5      8     10     12       6      9     11     12

    1.6 4 8 10 12 6 9 10 12 1.7 4 7 9 11 6 8 10 11 1.8 4 7 9 11 6 8 10 11 1.9 4 7 9 11 6 8 9 11 2.0 4 7 9 11 5 8 9 11

    1.6 1.6 4 7 9 11 6 8 10 11 1.7 4 7 9 10 6 8 9 10 1.8 4 7 8 10 5 8 9 10 1.9 4 6 8 10 5 7 9 10 2.0 4 6 8 10 5 7 9 10 2.1 4 6 8 10 5 7 8 10

    1.7 1.7 4 6 8 10 5 7 9 10 1.8 4 6 8 9 5 7 8 9 1.9 4 6 8 9 5 7 8 9 2.0 4 6 7 9 5 7 8 9 2.1 4 6 7 9 5 7 8 9 2.2 4 6 7 9 5 7 8 9 2.3 4 6 7 9 5 7 8 9 2.4 4 6 7 9 5 6 8 9

    1.8 1.8 4 6 7 9 5 7 8 9 1.9 3 6 7 8 5 7 8 8 2.0 3 5 7 8 5 6 7 8 2.1 3 5 7 8 5 6 7 8 2.2 3 5 7 8 5 6 7 8 2.3 3 5 7 8 5 6 7 8 2.4 3 5 7 8 5 6 7 8 2.5 3 5 7 8 5 6 7 8 2.6 3 5 7 8 4 6 7 8

    1.9 1.9 3 5 7 8 5 6 7 8 2.0 3 5 7 8 5 6 7 8 2.1 3 5 7 8 5 6 7 8 2.2 3 5 7 8 5 6 7 7 2.3 3 5 7 8 5 6 7 7 2.4 3 5 7 8 4 6 7 7

    2.0 2.0 3 5 6 7 5 6 7 7 2.1 3 5 6 7 5 6 6 7 2.2 3 5 6 7 4 6 6 7 2.3 3 5 6 7 4 6 6 7 2.4 3 5 6 7 4 5 6 7

Analogous to the analytic computation in Section 5.1, a numerical calculation of the ratio of the sample sizes necessary to achieve power p for the min test relative to the univariate test may be computed from Table 1. For example, at γ/σ = .4 and power p = .9, the ratio is 131/108, which is approximately 1.21. This ratio for power .5, .8, .9, and .95, and for n large, is for all γ approximately 1.53, 1.27, 1.21, and 1.17. For power greater than 80% this ratio is at most 1.27. Note that the univariate t statistic has degrees of freedom f = 3n − 3, rather than the usual 2n − 2 of a two-group comparison, and noncentrality parameter given by √(n/2) γ/σ.


    5.3 Wilcoxon Rank Test

    Asymptotically the joint distribution of the pair of Wilcoxon statistics introduced in Section 4.1 approaches the bivariate normal, as shown by Lehmann (1963).

If the true underlying distribution is standard normal, say Φ, the mean and variance of the Wilcoxon statistic (Lehmann, 1975) are

$$E[U_{X_0,X_1}] = n^2 \Phi(\gamma/\sqrt{2})$$

and

$$\operatorname{var}[U_{X_0,X_1}] = n^2\left\{\Phi(\gamma/\sqrt{2})\,\Phi(-\gamma/\sqrt{2}) + 2(n-1)\left[\Psi\!\left(\gamma/\sqrt{2}, \gamma/\sqrt{2}\right) - \Phi^2(\gamma/\sqrt{2})\right]\right\}.$$

Here Ψ(x, y) = Pr{U1 ≤ x, U2 ≤ y}, where U1 and U2 are each standard normal variables with correlation ½. It is easy to compute the covariance of U_{X0,X1} and U_{X0,X2}, which is given by

$$\operatorname{cov}(U_{X_0,X_1}, U_{X_0,X_2}) = n^3\left[\Psi\!\left(\gamma/\sqrt{2}, \theta/\sqrt{2}\right) - \Phi(\gamma/\sqrt{2})\,\Phi(\theta/\sqrt{2})\right].$$
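These moments are straightforward to evaluate numerically. The sketch below is an added illustration of the formulas as written above; the n³ factor in the covariance corresponds to counting the n³ comparisons that share a common X0 observation, and should be read as part of this reconstruction rather than as the authors' code.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

MVN = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.5], [0.5, 1.0]])
psi = lambda x, y: MVN.cdf([x, y])   # Psi(x, y) = P{U1 <= x, U2 <= y}, correlation 1/2

def wilcoxon_moments(n, gamma, theta):
    """Mean, variance, and covariance of (U_{X0,X1}, U_{X0,X2}) under unit-variance
    normal shifts gamma and theta, using the formulas displayed above."""
    cg, ct = gamma / np.sqrt(2), theta / np.sqrt(2)
    pg, pt = norm.cdf(cg), norm.cdf(ct)
    means = (n**2 * pg, n**2 * pt)
    variances = tuple(n**2 * (p * (1 - p) + 2 * (n - 1) * (psi(c, c) - p**2))
                      for p, c in ((pg, cg), (pt, ct)))
    cov = n**3 * (psi(cg, ct) - pg * pt)
    return means, variances, cov

print(wilcoxon_moments(n=20, gamma=0.5, theta=0.5))
```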

    For a 5% min test of two Wilcoxons, arising from an underlying normal distribution, the right-hand side of Table 1 gives the sample sizes per group that achieve power .50, .80, .90, and .95 for parameter values ranging from .2 to 2.0 in increments of .1.

The sample sizes given in Table 1 were computed based on the joint asymptotic distribution of the Wilcoxon random variables. Hence, care should be taken in interpreting small sample sizes. Here, too, as in the previous cases, the ratios of sample sizes for the min test to the univariate Wilcoxon test for power .50, .80, .90, and .95 are, for large n, approximately 1.5, 1.27, 1.20, and 1.17.

    ACKNOWLEDGEMENTS

    The authors wish to thank Drs Robert T. O'Neill and Hoi Leung of the FDA and Drs H. B. Kushner and Carole Siegel for their helpful comments. The numerical calculations were performed by Dr Alex Levy and Joseph Wanderling, for whose help we are most grateful. This work was supported in part by NIMH Grant No. MH42959.

RESUME

We consider the problem of testing whether an identified treatment is better than each of K other treatments. Suppose that univariate test statistics Si compare the identified treatment with each treatment i for i = 1, 2, ..., K. The min test is defined as the α-level procedure that rejects the null hypothesis that the identified treatment is not the best when the one-sided hypothesis, at level α, comparing the identified treatment with treatment i is rejected for every treatment i. In the normal case where the Si are t statistics, the min test is the likelihood ratio test. For distributions satisfying broad regularity conditions, and if attention is restricted to test statistics that are monotone nondecreasing functions of the Si, then, regardless of their covariance structure, the min test is optimal at level α. Tables of the sample sizes needed to reach powers of .50, .80, .90, and .95 are provided for the min test when the Si are Student's t and Wilcoxon statistics.

    REFERENCES

Bechhofer, R. E. (1954). A single-sample multiple-decision procedure for ranking means of normal populations with known variances. Annals of Mathematical Statistics 25, 16-39.
Berger, R. L. (1982). Multiparameter hypothesis testing and acceptance sampling. Technometrics 24, 295-300.
Cohen, A., Gatsonis, C., and Marden, J. (1983). Hypothesis tests and optimality in discrete multivariate analysis. In Studies in Econometrics, Time Series and Multivariate Statistics, S. Karlin (ed.). New York: Academic Press.
Dunnett, C. W. (1955). A multiple comparisons procedure for comparing several treatments with a control. Journal of the American Statistical Association 50, 1096-1121.
Dunnett, C. W. (1964). New tables for multiple comparisons with a control. Biometrics 20, 482-491.
Gail, M. and Simon, R. (1985). Testing for qualitative interactions between treatment effects and patient subsets. Biometrics 41, 361-372.
Gibbons, J. D., Olkin, I., and Sobel, M. (1977). Selecting and Ordering Populations: A New Statistical Methodology. New York: Wiley.
Gupta, S. S. (1963). Probability integrals of multivariate normal and multivariate t. Annals of Mathematical Statistics 34, 792-828.
Gupta, S. S. and Panchapakesan, S. (1972). On a class of subset selection procedures. Annals of Mathematical Statistics 43, 814-822.
Guttman, S. (1987). Tests uniformly more powerful than uniformly most powerful monotone tests. Journal of Statistical Planning and Inference 17, 279-292.
Inada, K. (1978). Some bivariate tests of composite hypotheses with restricted alternatives. Rep. Fac. Sci. Kagoshima Univ. (Math. Phys. Chem.) 11, 25-31.
Lasagna, L. (1975). Combination Drugs: Their Use and Regulation. New York: Stratton International Medical Book Corporation.
Laska, E. M. and Meisner, M. (1986). Testing whether an identified treatment is best: The combination problem. Proceedings of the Biopharmaceutical Section of the American Statistical Association, 163-170.
Lehmann, E. L. (1952). Testing multiparameter hypotheses. Annals of Mathematical Statistics 23, 541-552.
Lehmann, E. L. (1959). Testing Statistical Hypotheses. New York: Wiley.
Lehmann, E. L. (1963). Robust estimation in analysis of variance. Annals of Mathematical Statistics 34, 957-966.
Lehmann, E. L. (1975). Nonparametrics: Statistical Methods Based on Ranks. San Francisco, California: Holden-Day.
Leung, H. M. and O'Neill, R. (1986). Statistical assessment of combination drugs: A regulatory view. Proceedings of the Biopharmaceutical Section of the American Statistical Association, 33-36.
Miller, R. G., Jr. (1981). Simultaneous Statistical Inference, 2nd edition. New York: Springer-Verlag.
Sasabuchi, S. (1980). A test of a multivariate normal mean with composite hypotheses determined by linear inequalities. Biometrika 67, 429-439.

    Received September 1988; revised March 1989.
