article choice modeling


    Biostatistics (2007), 8, 1, pp. 72–85

    doi:10.1093/biostatistics/kxj034

    Advance Access publication on April 5, 2006

    The logistic transform for bounded outcome scores

    EMMANUEL LESAFFRE*, DIMITRIS RIZOPOULOS, ROULA TSONAKA

    Biostatistical Centre, Catholic University of Leuven,

    U.Z. St. Rafael, Kapucijnenvoer 35, B-3000 Leuven, Belgium

    [email protected]

    SUMMARY

    The logistic transformation, originally suggested by Johnson (1949), is applied to analyze responses that

    are restricted to a finite interval (e.g. (0, 1)), so-called bounded outcome scores. Bounded outcome scores

    often have a non-standard distribution, e.g. J- or U-shaped, precluding classical parametric statistical

    approaches for analysis. Applying the logistic transformation on a normally distributed random variable,

    gives rise to a logit-normal (LN) distribution. This distribution can take a variety of shapes on (0, 1).

    Further, the model can be extended to correct for (baseline) covariates. Therefore, the method could be

    useful for comparative clinical trials. Bounded outcomes can be found in many research areas, e.g. drug

    compliance research, quality-of-life studies, and pain (and pain relief) studies using visual analog scores,

    but all these scores can attain the boundary values 0 or 1. A natural extension of the above approach is

    therefore to assume a latent score on (0, 1) having a LN distribution. Two cases are considered: (a) the

    bounded outcome score is a proportion where the true probabilities have a LN distribution on (0, 1) and

    (b) the bounded outcome score on [0, 1] is a coarsened version of a latent score with a LN distribution

    on (0, 1). We also allow the variance (on the transformed scale) to depend on treatment. The usefulness

    of our approach for comparative clinical trials will be assessed in this paper. It turns out to be important

    to distinguish the case of equal and unequal variances. For a bounded outcome score of the second type

    and with equal variances, our approach comes close to ordinal probit (OP) regression. However, ignoring

    the inequality of variances can lead to highly biased parameter estimates. A simulation study compares

    the performance of our approach with the two-sample Wilcoxon test and with OP regression. Finally, the

    different methods are illustrated on two data sets.

    Keywords: Barthel index; Bounded outcome scores; Compliance research; Logistic transform; Ordinal probit

    regression.

    1. INTRODUCTION

    Bounded outcome scores are measurements that are restricted to a finite interval, which can be closed,

    open, or half-closed. Examples of bounded outcome scores can be found in many medical disciplines.

    For instance, in compliance research one measures the proportion of days that patients correctly take

    their drug, hereafter denoted as pdays. Another example is the Barthel index (Mahoney and Barthel,

    1965), which is an Activities of Daily Living scale that (in one version) jumps with steps of 5 from 0

    *To whom correspondence should be addressed.

    © The Author 2006. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: [email protected].


    (death or completely immobilized) to 100 (able to perform all daily activities independently). This scale

    is often used in stroke trials to measure the recovery of a patient after an acute stroke. Finally, in pain and

    pain-relief studies, a visual analog score (VAS) is used to measure the psychological state of the subject.

    Bounded outcome scores show a variety of distributions, from unimodal to J- and U-shaped. These

    peculiar shapes often motivate the use of non-parametric methods, like the Wilcoxon test (Lesaffre and

    others, 1993) when comparing two treatments. However, possibilities for statistical modeling, e.g. when

    covariate adjustment is envisaged, are then limited. Alternatively, a dichotomized version of the score

    may be constructed and analyzed using logistic regression. For instance, the Barthel index could be split

    at 0.9. A value above 0.9 implies that the patients are able to perform most of their daily activities, and

    hence the dichotomized Barthel index has a simple interpretation. However, such an approach has two

    disadvantages: first, the choice of the threshold is usually ad hoc and, second, reducing the score to a binary

    variable may reduce the efficiency of the comparison. Ordinal regression (McCullagh, 1980) is an alter-

    native method to analyze bounded outcome scores although this ignores the numeric character of the data.

    In this paper, we explore the use of the logistic transformation first suggested by Johnson (1949) to

    model the distribution of bounded outcome scores on (0, 1). However, many outcome scores are defined

    on a closed interval. In this case, we assume here that a latent variable with range (0, 1) gives rise to an

    observed score on [0, 1]. Bounded outcomes on [0, 1] can be discrete or of a mixed continuous–discrete

    type. Here, we will concentrate on the first type and consider two cases: (1) a proportion where the true

    probabilities have a logit-normal (LN) distribution on (0, 1) and (2) a discrete score ranging from 0 to

    1 interpreted as the grouped version of a continuous latent variable on (0, 1). We call the first case the

    binomial-logit-normal (BLN) approach and the second case the coarsening (CO) approach. A typical

    example of a continuous–discrete score on [0, 1] would be the visual analog scale, which takes values

    continuously on (0, 1) but can also give the extreme values 0 or 1 with a non-zero probability.

    In Section 2, we indicate the usefulness of the logistic transformation for comparative clinical re-

    search with bounded outcomes on (0, 1). We then present methods for analyzing bounded outcomes on

    [0, 1] and focus on the comparison of two treatments. We consider the cases of equal and unequal vari-

    ances on the transformed scale. The competitor to the CO approach is the classical ordinal probit (OP)

    regression. We will show that OP regression is very similar to the CO approach for equal variances, but

    with unequal variances it may give a severely biased estimate of the treatment effect. Section 4 describes a

    simulation study evaluating the performance of our approaches for various distributions on [0, 1] in com-

    parison with the two-sample Wilcoxon test and OP regression. Details of the simulation study are given in

    supplementary material available at Biostatistics online (http://www.biostatistics.oxfordjournals.org). In

    Section 5, we illustrate our method first on pdays, the primary endpoint of the THAMES study, a recent

    compliance-enhancing intervention study performed in Belgium. Further, we re-analyzed the primary

    endpoint (Barthel index) of the European Cooperative Acute Stroke Study I study, an early placebo-

    controlled randomized clinical trial evaluating the effect of a thrombolytic drug on patients with an acute

    ischemic stroke. In Section 6, we look at distributions other than the LN, discuss some other approaches,

    and look at the goodness-of-fit of the LN distribution. Finally, in Section 7, we summarize our results and

    make some suggestions for further research in this area.

    2. THE LOGISTIC TRANSFORMATION AND ITS APPLICATION TO CLINICAL RESEARCH

    Johnson (1949) suggested the logistic transformation $Z = \gamma + \delta \log\left(\frac{U - a}{b - U}\right)$, where $\gamma, \delta, a, b \in \mathbb{R}$ and

    $U$ is a score on the interval $(a, b)$. The aim of Johnson was to achieve standard normality. In the case of

    proportions, $a = 0$ and $b = 1$. Here we take $\gamma = 0$ and $\delta = 1$ and assume that the logistic transformation

    achieves a normal density $N(\mu, \sigma^2)$. In general, when $Z$ has density $f(z) \equiv f(z; \theta)$, then $U$ has density

    $g(u) \equiv g(u; \theta) = f(\mathrm{logit}(u)) \, \frac{1}{u(1 - u)}$, where $\mathrm{logit}(u) = \log\left(\frac{u}{1 - u}\right)$. When $Z \sim N(\mu, \sigma^2)$, then $U$ has a

    LN distribution, denoted as $LN(\mu, \sigma^2)$, and $\theta^T = (\mu, \sigma^2)$.


    Fig. 1. Different LN distributions.

    The LN distribution can take very different shapes depending on the choice of $\mu$ and $\sigma^2$, as is shown

    in Figure 1. Note that when $\mu$ changes sign this corresponds to mirroring the distribution around $u = 0.5$.

    Hence, the logistic transformation is very well suited to model a variety of distributions on (0, 1). A

    similar property holds for the Beta family, but Aitchison and Begg (1976) indicate that the LN distribution

    is richer and can approximate any Beta density.
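    The change-of-variables formula for the LN density can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names (ln_density, integrate) are illustrative assumptions and do not come from the paper.

```python
import math
from statistics import NormalDist


def logit(u):
    """Logistic transform of a score u in (0, 1)."""
    return math.log(u / (1.0 - u))


def ln_density(u, mu, sigma):
    """Logit-normal density g(u) = f(logit(u)) / (u (1 - u)), with f the N(mu, sigma^2) density."""
    return NormalDist(mu, sigma).pdf(logit(u)) / (u * (1.0 - u))


def integrate(mu, sigma, n=20000):
    """Midpoint-rule integral of the density over (0, 1); should be close to 1."""
    h = 1.0 / n
    return sum(ln_density((k + 0.5) * h, mu, sigma) for k in range(n)) * h


if __name__ == "__main__":
    print(integrate(0.0, 1.0))
    # Changing the sign of mu mirrors the density around u = 0.5.
    print(ln_density(0.2, 1.0, 1.0), ln_density(0.8, -1.0, 1.0))
    # A larger sigma moves mass toward both boundaries.
    print(ln_density(0.01, 0.0, 3.0), ln_density(0.5, 0.0, 3.0))
```

    With mu = 0, a small sigma gives a density that peaks in the middle of the interval, while a large sigma piles mass up near both 0 and 1, which is one way to see how a single family covers both unimodal and U-shaped scores.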

    It is clear that when the bounded outcome scores have a LN distribution, the analysis could be done

    on the Z-scale using classical statistical analyses assuming a Gaussian distribution. For instance, suppose

    we wish to compare the effects of control and new treatments based on a bounded score with distributions

    $LN(\mu, \sigma_1^2)$ and $LN(\mu + \Delta, \sigma_2^2)$, respectively. When $\sigma_1^2 = \sigma_2^2 = \sigma^2$, a simple unpaired t-test can be

    calculated on the final Z-values and a 95% confidence interval can be obtained for $\Delta$ ($> 0$) on the Z-scale.

    The interpretation of $\Delta$ is more difficult because it represents a location shift on the transformed

    scale. Since the logistic transformation is strictly monotone, $\Delta = \log\left[\frac{m_2/(1 - m_2)}{m_1/(1 - m_1)}\right]$, where $m_1$ and $m_2$ are the

    medians for the control and new treatment, respectively, on the original scale. Figure 2 gives an example

    of how the location-shift alternative on the Z-scale is translated into an alternative hypothesis on the

    observed U-scale, when $\sigma_1^2 = \sigma_2^2$. The parameter $\Delta$ can also be interpreted in relation to the Wilcoxon–

    Mann–Whitney test (Lehmann and D'Abrera, 1998). Assume that $Z_1$ and $Z_2$ are independent random

    variables on the transformed scale corresponding to the control and new treatments, respectively, then

    $P(Z_2 > Z_1) = P = \Phi\left(\frac{\Delta}{\sqrt{2}\,\sigma}\right)$, which is also equal to $P(U_2 > U_1)$ for the corresponding original U

    values. Brunner and Munzel (2000) called $P$ the relative effect of the treatment, which is therefore

    seen to be determined by the ratio $\Delta/\sigma$. In general, the relative effect is equal to $\int F_1 \, dF_2$, where $F_j$ is the

    cumulative distribution function of $Z_j$ or here equivalently of $U_j$ ($j = 1, 2$). Hence, loosely speaking, $P$

    determines the proportion of individuals better off with the new treatment than with the control treatment.

    A 95% CI for $P$ can be obtained using the Delta method when estimates for $\Delta$, $\sigma$, and their covariance

    matrix are available. If instead transformation to a logistic distribution is envisaged, then $\Delta/\sigma$ can be

    directly interpreted as a log-odds ratio of cumulative distribution functions, see Section 6.

    When $\sigma_1 \neq \sigma_2$, one could use the Welch test (Welch, 1951) on the transformed Z-scale. This test is

    also called the unpaired t-test for unequal variances. However, it is well known that ignoring the inequality

    of the variances, i.e. applying in this case the classical unpaired t-test, has no great impact on the type I

    error as long as $n_1 \approx n_2$ (see, e.g. Wetherill, 1960; Murphy, 1967). Further, in this case the relative effect is

    equal to $P(Z_2 > Z_1) = P = \Phi\left(\frac{\Delta}{\sqrt{\sigma_1^2 + \sigma_2^2}}\right)$, which can be estimated in a similar manner as above.
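    The relative effect follows from the fact that the difference of two independent normal variables on the Z-scale is again normal. A small sketch of that arithmetic is given below; the function names and the example values are illustrative assumptions, not taken from the paper.

```python
import math
from statistics import NormalDist


def expit(z):
    """Inverse of the logit: maps the Z-scale back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relative_effect(delta, sigma1, sigma2=None):
    """P(Z2 > Z1) for Z1 ~ N(mu, sigma1^2) and Z2 ~ N(mu + delta, sigma2^2).

    With sigma2 omitted the variances are taken as equal, which gives
    Phi(delta / (sqrt(2) * sigma)).
    """
    if sigma2 is None:
        sigma2 = sigma1
    return NormalDist().cdf(delta / math.sqrt(sigma1 ** 2 + sigma2 ** 2))


def medians_on_u_scale(mu, delta):
    """Medians of the control and new-treatment scores on the original scale."""
    return expit(mu), expit(mu + delta)


if __name__ == "__main__":
    print(relative_effect(1.0, 1.0))        # equal variances
    print(relative_effect(1.0, 1.0, 2.0))   # unequal variances
    m1, m2 = medians_on_u_scale(0.5, 1.0)
    # The log odds ratio of the two medians recovers delta.
    print(math.log((m2 / (1.0 - m2)) / (m1 / (1.0 - m1))))
```

    Because the logit is monotone, the same probability applies to the original U-values, so the function can be read as the chance that a randomly chosen patient on the new treatment scores higher than a randomly chosen control patient.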


    Fig. 2. Correspondence of the location-shift alternative hypothesis on the transformed Z-scale to the corresponding

    alternative hypothesis on the observed U-scale.

    The logistic transformation is useful for power and sample size calculations in a clinical trial with

    a bounded outcome score U as primary endpoint because the classical location-shift alternative is most

    often not appropriate. While power and sample size calculations are more difficult, they can be realized

    by first specifying the relative effect together with $\sigma$.
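    One way such a calculation might be organized is sketched below. It assumes equal variances, a two-sided test on the Z-scale, and the usual normal-approximation sample size formula for comparing two means; none of these choices, nor the function name, is taken from the paper.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_rel, sigma, alpha=0.05, power=0.80):
    """Approximate n per group from a target relative effect P and a common sigma.

    Step 1: invert P = Phi(delta / (sqrt(2) * sigma)) to get delta on the Z-scale.
    Step 2: apply n = 2 * sigma^2 * (z_{1-alpha/2} + z_{power})^2 / delta^2.
    """
    if not 0.5 < p_rel < 1.0:
        raise ValueError("this sketch expects a relative effect in (0.5, 1)")
    std_normal = NormalDist()
    delta = sigma * math.sqrt(2.0) * std_normal.inv_cdf(p_rel)
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_beta = std_normal.inv_cdf(power)
    n = 2.0 * (sigma * (z_alpha + z_beta) / delta) ** 2
    return math.ceil(n), delta


if __name__ == "__main__":
    print(sample_size_per_group(0.65, sigma=1.0))
    print(sample_size_per_group(0.65, sigma=2.0))
```

    Under these assumptions sigma cancels out of the sample size, so it matters only for translating the relative effect into the shift delta on the transformed scale (and from there into medians on the original scale).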

    Finally, the logistic transformation is also useful in statistical modeling of bounded outcome scores on

    (0, 1). Indeed, the logistic regression model

    $\log\left(\frac{U}{1 - U}\right) = x^T \beta + \sigma Z$, (2.1)

    with $Z \sim N(0, 1)$, has been used in various applications (Kieschnick and McCullough, 2003). This

    approach is especially useful in clinical trial applications when baseline covariate adjustment is envisaged.

    Finally, Expression (2.1) can easily be extended to allow $\sigma$ to depend on covariates, as for example in

    Pourahmadi (1999).
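    For scores strictly inside (0, 1), model (2.1) is an ordinary linear model after the logit transform. The sketch below simulates from such a model with a single covariate and recovers the parameters by least squares on the Z-scale; the single-covariate setup, the parameter values, and the function names are illustrative assumptions only.

```python
import math
import random


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(u):
    return math.log(u / (1.0 - u))


def simulate(n, beta0, beta1, sigma, seed=1):
    """Draw (x_i, U_i) with logit(U_i) = beta0 + beta1 * x_i + sigma * Z_i, Z_i ~ N(0, 1)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    us = [expit(beta0 + beta1 * x + sigma * rng.gauss(0.0, 1.0)) for x in xs]
    return xs, us


def fit_logit_scale(xs, us):
    """Simple least-squares fit of logit(U) on x; returns (beta0, beta1, sigma)."""
    zs = [logit(u) for u in us]
    n = len(xs)
    x_bar = sum(xs) / n
    z_bar = sum(zs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxz = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
    beta1 = sxz / sxx
    beta0 = z_bar - beta1 * x_bar
    rss = sum((z - beta0 - beta1 * x) ** 2 for x, z in zip(xs, zs))
    return beta0, beta1, math.sqrt(rss / (n - 2))


if __name__ == "__main__":
    xs, us = simulate(2000, beta0=0.5, beta1=0.8, sigma=0.5)
    print(fit_logit_scale(xs, us))
```

    This shortcut breaks down as soon as a score equals 0 or 1, because the logit is then undefined.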

    3. MODELING BOUNDED OUTCOME SCORES ON [0, 1]

    In this section, we primarily focus on bounded outcomes on [0, 1] which we denote by Yi to distinguish

    them from $U_i \in (0, 1)$. Two different types of outcomes are considered here. First, the bounded outcome

    scores Yi are observed proportions equal to ri/Ni (i = 1, ..., n), whereby ri is the ith count out of Ni

    units. In this case, Ui represents the true proportion measured with error by Yi. For instance, in compliance

    research, Yi = ri/Ni, whereby ri is the number of days out of Ni on which the ith patient has correctly

    taken the drug. Second, Yi is a coarsened version of Ui on (0, 1). Typical examples can be found in

    Quality of Life research (e.g. the Barthel index) where Yi denote sums of individual items each measuring

    an aspect of the subject's true quality of life Ui. Such a score can be standardized to lie between 0 and 1

    while taking a finite number of values.

    3.1 Modeling proportions on [0, 1]

    When the bounded outcome score is a proportion derived from a series of conditionally (conditional

    on the subject) independent Bernoulli experiments, then an obvious choice is to work with a binomial


    distribution. Namely, in the BLN approach, we assume that

    $r_i \sim \mathrm{Bin}(U_i, N_i)$ $(i = 1, \ldots, n)$ (3.1)

    with $U_i \sim LN(\mu, \sigma^2)$ and say that $r_i$ has a BLN distribution. For each value of $U_i$, one observes $N_i$

    binary outcomes $W_{ij}$ $(j = 1, \ldots, N_i)$ summing up to $r_i$. For the compliance example, the recorded

    adherence is the observed proportion of days that the patients take their medication correctly

    (with respect to dosage and timing) in a period of $N_i$ days. In this case, the $U_i$ could be interpreted

    as the (true but unobserved) latent adherence of the $i$th patient to the drug. Observe that this model is

    actually a classical measurement error model (Carroll and others, 1995), specifying the distribution

    $f(Y|U)$.

    Model (3.1) can be extended by replacing $\mu$ by $x_i^T \beta$ to give a generalized linear mixed-effects model,

    whereby conditional on $U_i$ the $W_{ij}$ are assumed to be independent. As indicated above, a further extension

    allows $\sigma$ to depend on covariates. Fitting such a model can be done with, e.g. the SAS procedure

    NLMIXED or the function lmer() in package lme4 in R (R Development Core Team, 2005).
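    To make the BLN model concrete, the marginal probability of a count can be written as a binomial probability averaged over the normal distribution of logit(Ui). The sketch below evaluates that integral with a plain midpoint rule in standard-library Python; it is not how NLMIXED or lme4 proceed internally, and the function names and grid settings are illustrative assumptions.

```python
import math
from statistics import NormalDist


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def bln_pmf(r, n_trials, mu, sigma, grid=2000):
    """P(r successes out of n_trials) when logit(U) ~ N(mu, sigma^2).

    Integrates the binomial probability against the normal density of
    z = logit(U) over mu +/- 8 sigma with a midpoint rule.
    """
    latent = NormalDist(mu, sigma)
    lower = mu - 8.0 * sigma
    step = 16.0 * sigma / grid
    coef = math.comb(n_trials, r)
    total = 0.0
    for k in range(grid):
        z = lower + (k + 0.5) * step
        u = expit(z)
        total += coef * u ** r * (1.0 - u) ** (n_trials - r) * latent.pdf(z)
    return total * step


def bln_loglik(counts, trials, mu, sigma):
    """Log-likelihood of independent counts r_i out of N_i under the BLN model."""
    return sum(math.log(bln_pmf(r, n, mu, sigma)) for r, n in zip(counts, trials))


if __name__ == "__main__":
    probs = [bln_pmf(r, 10, mu=1.0, sigma=1.5) for r in range(11)]
    print(sum(probs))            # should be close to 1
    print(probs[0], probs[10])   # non-zero mass at the boundary proportions 0 and 1
    print(bln_loglik([9, 10, 7], [10, 10, 10], mu=1.0, sigma=1.5))
```

    The example shows why the BLN model accommodates observed proportions of exactly 0 or 1 even though the latent Ui stays strictly inside (0, 1).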

    3.2 Modeling discrete bounded outcome scores on [0, 1]

    When Yi is a discrete random variable on [0, 1] (e.g. the Barthel index), but not a proportion, then it is

    natural to assume that $Y_i$ is a grouped version of $U_i$. As a specific example, $Y_i$ could have been realized

    from a CO mechanism such that $Y_i = k/m$ when $(k - 0.5)/m \le U_i < (k + 0.5)/m$, where $k = 0, \ldots, m$.

    At the boundaries, $Y_i = 0$ when $0 < U_i < 0.5/m$ and $Y_i = 1$ when $(m - 0.5)/m < U_i < 1$. The above

    grid of boundary values is equal for all subjects and is denoted below as

    $a_0 \equiv 0 < a_1 = 0.5/m < \cdots < a_m = (m - 0.5)/m < a_{m+1} \equiv 1$.

    However, our approach also allows a grid varying with the subjects.

    The framework of coarsened data has been formalized by Heitjan and Rubin (1991) and Heitjan

    (1993). In their terminology, we consider here only deterministic CO. More formally, we assume that

    $a_{s(i)} \le U_i < a_{s(i)+1}$ when $Y_i$ is recorded. For the likelihood this implies the following expression:

    $L(\theta; y) = \prod_{i=1}^{n} \int_{a_{s(i)}}^{a_{s(i)+1}} g(u_i; \theta) \, du_i$, (3.2)

    where $g(\cdot)$ is the probability density function of the LN distribution. This leads to the likelihood

    $L(\theta; y) \equiv L(\beta, \sigma; y) = \prod_{i=1}^{n} \left[ \Phi\left(\frac{z^{(u)}_{s(i)} - x_i^T \beta}{\sigma}\right) - \Phi\left(\frac{z^{(l)}_{s(i)} - x_i^T \beta}{\sigma}\right) \right]$, (3.3)

    where $z^{(l)}_{s(i)} = \mathrm{logit}(a_{s(i)})$, $z^{(u)}_{s(i)} = \mathrm{logit}(a_{s(i)+1})$, and $\Phi(\cdot)$ is the distribution function of the standard

    normal distribution. At the boundaries, i.e. $a_0 = 0$ and $a_{m+1} = 1$, the values of $z_{s(i)}$ become $-\infty$

    and $\infty$, respectively. When $\sigma$ depends on covariates, the above expression needs to be adapted. For instance,

    in the two-group comparison Expression (3.3) splits up in two parts, one with $\sigma_1$ (first treatment) and

    the other with $\sigma_2$ (second treatment). For obvious reasons, we have called this the CO approach either

    assuming equal variances or allowing unequal variances.

    The maximum likelihood estimates for this model can easily be obtained using standard numerical

    optimization procedures such as the Broyden, Fletcher, Goldfarb, and Shanno (BFGS) algorithm (Lange,

    2004).
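    The coarsened-data likelihood in Expression (3.3) is a product of differences of normal distribution functions, one per subject. The sketch below evaluates its logarithm for a two-group comparison with the fixed grid described above. The function names, the coding of the group indicator, and the tiny data set are illustrative assumptions; this is not the authors' grouped() implementation, and no optimizer is included.

```python
import math
from statistics import NormalDist

INF = float("inf")


def logit(u):
    return math.log(u / (1.0 - u))


def z_bounds(y, m):
    """Logit-scale interval implied by an observed score y = k/m (k = 0, ..., m)."""
    k = round(y * m)
    lower = -INF if k == 0 else logit((k - 0.5) / m)
    upper = INF if k == m else logit((k + 0.5) / m)
    return lower, upper


def co_loglik(ys, groups, m, mu, delta, sigma1, sigma2=None):
    """Log of likelihood (3.3) with linear predictor mu + delta * group.

    groups[i] is 0 (control) or 1 (new treatment). Passing sigma2 lets the
    second group have its own standard deviation (unequal variances).
    """
    if sigma2 is None:
        sigma2 = sigma1
    std_normal = NormalDist()
    total = 0.0
    for y, g in zip(ys, groups):
        lower, upper = z_bounds(y, m)
        mean = mu + delta * g
        sigma = sigma2 if g == 1 else sigma1
        p_upper = 1.0 if upper == INF else std_normal.cdf((upper - mean) / sigma)
        p_lower = 0.0 if lower == -INF else std_normal.cdf((lower - mean) / sigma)
        total += math.log(max(p_upper - p_lower, 1e-300))  # guard against log(0)
    return total


if __name__ == "__main__":
    ys = [0.0, 0.1, 0.2, 0.3, 0.5, 0.6, 0.8, 0.9, 1.0, 1.0]
    groups = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    print(co_loglik(ys, groups, m=10, mu=-1.0, delta=2.5, sigma1=1.5))
    print(co_loglik(ys, groups, m=10, mu=-1.0, delta=2.5, sigma1=1.5, sigma2=3.0))
```

    The negative of this function could be handed to a general-purpose quasi-Newton routine such as BFGS, typically after reparameterizing sigma on the log scale so that the search is unconstrained.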


    4.1 Set up of the simulation study

    We have divided the simulation study according to the type of the bounded outcome score on [0, 1]. For

    a proportion, we have compared the Wilcoxon test and the BLN approach, and in some cases also the OP

    regression model, despite it being not strictly appropriate for proportions. For coarsened data, we have

    compared the Wilcoxon test, the OP regression model, and the CO approach. A variety of scenarios were

    considered, all involving two-group comparisons.

    One of the main purposes of the simulation study is to show that including covariates can greatly

    increase the power in detecting a treatment effect when dealing with bounded outcome scores. Therefore,

    we considered cases with and without covariates. Further, we included a variety of distributions on [0, 1].

    Three different treatment effects were evaluated, which could be classified as low, moderate, and large.

    Finally, we also varied the study size, but in all cases specified n1 = n2.

    For a proportion, we compared the probability of the type I error and the power of the three approaches.

    For coarsened data, we additionally determined the estimated treatment effect, except of course for the

    Wilcoxon test.

    For coarsened data on [0, 1], we considered both the case of equal variances ($\sigma_1 = \sigma_2$) and unequal

    variances ($\sigma_1 \neq \sigma_2$), specifically $\sigma_1 = 2\sigma_2$ and $\sigma_2 = 2\sigma_1$. Consequently, we included two versions of the

    CO approach: (a) assuming equal variances (CO1 approach) and (b) allowing for unequal variances (CO2

    approach). While OP regression is a popular approach in this setting, the possibility of unequal variances

    is often neglected. Therefore, we have included the additional case of unequal variances for coarsened

    data to highlight the impact of ignoring inequality of variances. On the other hand, for proportions the

    BLN approach is actually the only appropriate method, and can also be easily extended to the case of

    unequal variances. Thus, in this case extensive empirical comparison with OP regression is unnecessary.

    To determine the performance of the different approaches, we used the following software: (a)

    Wilcoxon test: the R-function wilcox.test(), (b) OP regression: R-function polr() from package MASS,

    (c) BLN approach: generalized mixed-effects model (SAS proc NLMIXED and R-function lmer() from

    package lme4), and (d) CO approach: R-function grouped() from package grouped written by the authors

    (available from CRAN http://cran.r-project.org).
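    The structure of one such simulation scenario can be sketched as follows: latent scores are drawn on the Z-scale, mapped to (0, 1), coarsened onto a grid, and compared with a Wilcoxon test. The sketch is in standard-library Python rather than the R functions listed above, uses a tie-corrected normal approximation for the Wilcoxon p-value, and all scenario values and function names are illustrative assumptions rather than the settings of the paper.

```python
import math
import random
from statistics import NormalDist


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def coarsen(u, m):
    """Map a latent score u in (0, 1) to the nearest grid point k/m."""
    return math.floor(u * m + 0.5) / m


def wilcoxon_p(x, y):
    """Two-sided Wilcoxon-Mann-Whitney p-value (normal approximation, tie-corrected)."""
    pooled = sorted(x + y)
    n1, n2, n = len(x), len(y), len(x) + len(y)
    rank, tie_term, i = {}, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        rank[pooled[i]] = (i + 1 + j) / 2.0   # midrank of positions i+1 .. j
        tie_term += t ** 3 - t
        i = j
    u_stat = sum(rank[v] for v in x) - n1 * (n1 + 1) / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0.0:                            # every observation tied
        return 1.0
    z = (u_stat - n1 * n2 / 2.0) / math.sqrt(var)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def rejection_rate(n_per_group, m, delta, sigma1, sigma2, reps=500, alpha=0.05, seed=1):
    """Share of simulated trials in which the Wilcoxon test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        g1 = [coarsen(expit(rng.gauss(0.0, sigma1)), m) for _ in range(n_per_group)]
        g2 = [coarsen(expit(rng.gauss(delta, sigma2)), m) for _ in range(n_per_group)]
        if wilcoxon_p(g1, g2) < alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    # delta = 0 estimates the type I error; delta > 0 estimates the power.
    print(rejection_rate(20, 20, delta=0.0, sigma1=1.0, sigma2=1.0))
    print(rejection_rate(20, 20, delta=0.0, sigma1=1.0, sigma2=2.0))
    print(rejection_rate(20, 20, delta=1.0, sigma1=1.0, sigma2=1.0))
```

    The same loop can wrap any of the competing methods by swapping the test function, which is how estimated type I error and power can be compared across approaches for a given scenario.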

    4.2 Simulation results and discussion

    Proportions on [0, 1]. When no covariates are involved, the performance of the Wilcoxon test is nearly

    identical to that of the BLN model. The Pr(type I error) is close to 0.05 for all cases even for a sample size

    of n1 = n2 = 20. Further, we observed the expected positive association of the power with the sample

    size and the effect size $\Delta/\sigma$. However, the power also seems to depend on the shape of the distribution.

    In particular, we observed that the U-shaped distribution yields in general a higher power, followed by

    the unimodal and the J-shaped distributions. An explanation of this phenomenon lies in the fact that the

    latent score Ui (true probability of success) is not known but only the observed proportion yi . When all

    true proportions are relatively close to 0 or 1, the observed proportions will be relatively close to each

    other. A proof of this is seen in the power of the Wilcoxon test which shows a similar behavior.

    When a significant baseline covariate is included, the Wilcoxon test had a much lower power than the

    covariate adjusted parametric models, and mainly as the effect of the covariate and increases. Finally,

    the BLN and OP models behaved similarly with a slight inferiority for the latter.

    Coarsened data on [0, 1]. First we summarize the results when there are no covariates. When $\sigma_1 = \sigma_2$,

    the type I error was well preserved for all approaches. Further, overall the power of the CO2-approach

    was less than for the other approaches, which is natural because the other approaches are developed under

    the assumption of equal variances. In all cases, the treatment effect was estimated without bias. When

    $\sigma_1 \neq \sigma_2$, the type I error was well preserved for the CO2-approach, but was sometimes severely increased


    for the other approaches. For the CO1- and the OP regression models, the reason is that the treatment

    effect is sometimes estimated with a large bias. The anti-conservative character of the Wilcoxon test is

    explained by its relationship with ordinal logistic regression (McCullagh, 1980). The power of the CO2-

    approach was sometimes much less than for the other approaches, but this can be explained by their anti-

    conservative character. Indeed, the power of the other approaches was even higher than the corresponding

    power obtained from the Welch test, i.e. when no CO is involved.

    When covariates are available, the first and obvious conclusion is that the power can be greatly im-

    proved depending on the relationship of the covariates with the response. Apart from that, the conclusions

    are similar to those reported.

    Discussion. We expected the BLN approach, and especially the CO approach, to be more powerful than

    OP regression since the latter requires more parameters to be estimated. However, the simulation results

    showed only small differences. We attribute this to the low correlation of all estimated cut points (except

    for 0) with the estimated regression parameters in the OP regression model.

    When the variances are unequal in the treatment groups, our simulations indicated that the Wilcoxon

    test, the CO1 approach, and classical OP regression yield seriously distorted type I errors. The two latter

    approaches also produced severely biased treatment effects, even with n1 = n2. In Wetherill (1960), theoretical

    calculations revealed that the type I error and the power of the Wilcoxon test are fairly insensitive

    to unequal variances provided n1 = n2. However, we observed that when the data are coarsened the performance

    of the Wilcoxon test is severely affected even when the sample sizes are equal, probably due

    to the large number of ties. The sensitivity of the CO1 approach and classical OP regression is explained

    by the fact that for non-linear models misspecification of the (co)variance structure has an impact on the

    correct estimation of the mean parameters (see, e.g. Butler and Louis, 1992).

    5. APPLICATIONS

    5.1 A compliance-enhancing intervention study: THAMES study

    Recently, an open-label, multicenter compliance-enhancing intervention (THAMES) study was completed

    in Belgium to measure the effect of a program of pharmaceutical care, designed to enhance adherence to

    atorvastatin treatment. Four well-defined districts were identified, two in Flanders (northern Belgium)

    and two in Wallonia (southern Belgium). In both Flanders and Wallonia, all pharmacists in one of the

    districts were to apply measures to improve compliance and enhance persistence, whereas in the second

    district no such measures were taken. There were 187 patients in the intervention group and 182 patients

    in the control group. All pharmacists were equipped with the Medication Event Monitoring System

    (MEMS), an electronically monitored pharmaceutical package designed to compile the dosing

    histories of ambulatory patients taking oral medications (Urquhart, 1997). The total study duration was

    12 months. The number of visits to the pharmacy ranged from 5 to 13. At each visit, the patient's dosing

    history was checked by means of the electronic monitoring system. The period between the first and

    second visit was considered to be the baseline period. More details on the setup of this intervention study

    can be found in Vrijens and others (2006).

    The primary efficacy parameter of the THAMES study was adherence to prescribed therapy in the

    post-baseline period, whereby adherence was defined for each patient as the proportion of days during

    which the MEMS record showed that the patient had opened the pill container correctly. This variable

    was also estimated at baseline (baseline adherence). Finally, for the calculation of the post-baseline

    adherence the post-baseline period was arbitrarily cut off at day 300.

    Baseline covariates. The THAMES study could not be randomized due to practical difficulties. There-

    fore, we need to compare the baseline covariates of the intervention and control groups. In Table 1, we


    Table 1. THAMES study: comparison of baseline covariates between the two groups. For

    categorical variables frequencies (percentages) and for continuous variables the means are reported

    Variable              Levels       Intervention   Control     p-value
    Total number                       187            182
    Adherence                          0.93           0.90        0.011
    Gender                Males        102 (55%)      85 (45%)    0.161
    Age                                62             60          0.231
    Weight                             77             78          0.735
    Work                  Unemployed   46 (25%)       64 (35%)    0.029
    Cardiovascular risk                12.74          10.52       0.013
    Family history        No           145 (78%)      142 (78%)   0.911

    compared gender, age, weight, work status (unemployed versus employed), a cardiovascular risk score

    (Vrijens and others, 2006), family history of CHD, and the pdays at baseline with the appropriate statistical

    techniques. The adherence at baseline is of particular interest and the Wilcoxon test gives a significant

    result (p = 0.011). The reasons for this significant difference at the start are not clear, but the

    imbalance at the start needs to be taken into account. We were aware of the potential dangers of correcting

    for baseline covariates in the presence of imbalance at baseline (Wainer and Brown, 2004). However,

    for illustrative purposes we still performed an analysis-of-covariance type of analysis in Section 5.2.

    Efficacy comparison. Initially, we checked for a treatment effect without correcting for any baseline

    covariates. Both the Wilcoxon test and the BLN model gave a significant intervention effect with p