statistical assessment

  • 7/30/2019 Statistical Assessment

    1/11

    J Clin Epidemiol Vol. 50, No. 1, pp. 45-55, 1997
    Copyright © 1997 Elsevier Science Inc.

    0895-4356/97/$17.00
    PII S0895-4356(96)00312-5

    Statistical Assessment of Ordinal Outcomes in Comparative Studies

    Susan C. Scott, Mark S. Goldberg, and Nancy E. Mayo

    DIVISION OF CLINICAL EPIDEMIOLOGY, ROYAL VICTORIA HOSPITAL, MONTREAL, QUEBEC, CANADA H3A 1A1; EPIDEMIOLOGY AND BIOSTATISTICS UNIT, CENTRE DE RECHERCHE EN IMMUNOLOGIE, UNIVERSITÉ DU QUÉBEC, INSTITUT ARMAND-FRAPPIER, LAVAL, QUÉBEC, CANADA H7N 4Z3; DEPARTMENT OF MEDICINE, MCGILL UNIVERSITY, MONTRÉAL, QUÉBEC, CANADA

    ABSTRACT. Ordinal regression is a relatively new statistical method developed for analyzing ranked outcomes. In the past, ranked scales have often been analyzed without making full use of the ordinality of the data or, alternatively, by assigning arbitrary numerical scores to the ranks. While ordinal regression models are now available to make full use of ranked data, they are not used widely. This article, directed to clinical researchers and epidemiologists, provides a description of the properties of these methods. Using ordinal measures of back pain in a follow-up study of adolescent idiopathic scoliosis, we illustrate the advantages of these methods and describe how to interpret the estimated parameters. Comparisons with binary logistic regression are made to show why a single dichotomization of the ordinal scale may lead to incorrect inferences. Two ordinal models (the proportional odds and the continuation ratio models) are discussed, and the goodness-of-fit of these models is examined. We conclude that ordinal regression is a tool that is powerful, simple to use, and produces an interpretable parameter that summarizes the effect between groups over all levels of the outcome. Copyright © 1997 Elsevier Science Inc. J CLIN EPIDEMIOL 50;1:45-55, 1997.

    KEY WORDS. Ordinal regression, statistical methods, back pain, epidemiology

    INTRODUCTION

    Many indicators of health status of interest to clinical researchers and epidemiologists are measured on ordinal scales: scales that are rank-ordered and for which the quantitative differences between levels are not known [1]. Attributes such as perceived health status, functional independence, mobility, and pain are most appropriately captured by an ordinal scale. However, while data may be collected using an ordinal scale, they are rarely analyzed as such. Because methods for analyzing ordinal data have not been readily accessible, ordinal outcomes are often analyzed by altering the attributes of the scale: collapsing the scale to a dichotomous one, treating it as nominal, or considering it to be continuous.

    Important information is lost when the ordinality of ranked data is not fully exploited. Ordinal scales are often treated as nominal scales, with proportions calculated for each level of outcome. The differences in proportions could then be tested using chi-square tests of association. Chi-square tests have several limitations: they are not amenable to statistical adjustment; the results are sample-size dependent; and no measure of association is produced. In addition, because they ignore the ordinality of the data, chi-square tests have less than optimal power, potentially leading to incorrect inferences.

    Address for correspondence: Dr. Mark Goldberg, Epidemiology and Biostatistics Unit, Centre de recherche en immunologie, Université du Québec, Institut Armand-Frappier, 531, boulevard des Prairies, CI 100, L-D-R, Laval, Québec, Canada H7N 4Z3.

    Accepted for publication on 6 August 1996.

    An ordinal outcome may also be analyzed with binary logistic regression. Again, this method discards information, as it requires that the ordinal outcome be forced into two levels. The decision as to where to dichotomize the ordinal outcome is arbitrary, and the relationship between the odds ratio produced at the selected dichotomization with odds ratios produced at alternate dichotomizations is ignored. In an attempt to overcome the arbitrariness of this decision, a variety of dichotomizations could be defined and separate odds ratios estimated at each; however, if the differences between estimates are due merely to random error, then one overall estimate is preferable. Polytomous logistic models can be used to accommodate more than two levels of outcome but do so without incorporating information on order [2,3].

    Chi-square tests of trend, t-tests, analysis of variance, and analysis of covariance are also routinely employed in the analysis of ordinal data, often as a naive attempt to overcome shortcomings of the methods for nominal data. However, they require that the ordinal categories and, therefore, the distances between them, be quantified and treated as continuous. In the simplest case, the distances between consecutive levels are assumed to be equal: for example, the increase in pain between the categories of no pain and mild pain would be made equivalent in magnitude to that between mild and moderate pain, as well as to the magnitude between moderate and severe pain. This quantification is arbitrary. In fact, the values selected to quantify the ordinal categories can have a considerable effect on the inferences drawn [4] and can produce misleading results [5].

    Another difficulty in applying linear regression models to ordinal outcomes is that these models are based on the assumption that the variance of the outcome is homogeneous. The variance of ordinal data with an underlying multinomial distribution, however, is not homogeneous. Applying ordinary least squares regression to this data may produce unbiased parameter estimates, but the corresponding estimates of variance will be biased and inconsistent [6].

    In summary, methods based on either reducing ordinal scales to nominal or dichotomous ones, or assuming that ordinal scales have the properties of interval scales, have several drawbacks and may lead to erroneous statistical inferences. These approaches are still being used, even though statistical theory and software have advanced sufficiently to permit exploiting the ordinal nature of the data. Statistically powerful methods, referred to as ordinal regression models, have been designed to take full advantage of ordinal outcomes. Ordinal regression provides a more sensitive analysis than would be possible by arbitrarily dichotomizing the outcome variable, and does so without imposing unverifiable assumptions regarding the structure of the data. By modeling the dependence of an ordinal variable on a number of explanatory variables, an adjusted estimate of effect, in the form of a summary odds ratio, is produced.

    Ordinal regression methods have been developed theoretically and presented in the statistical literature [7-16], including recent work on sample size estimation [17,18] and models for dependent observations [19-22]. While, in most cases, they are hailed as a breakthrough in analyzing ranked outcomes, there is controversy regarding their use in problems of classification [23,24]. Several varieties of ordinal models have been developed. We will deal only with the two most accessible: the proportional odds and the continuation ratio models, for use with a single outcome per subject. Several articles describing these two and other ordinal models and the application of these methods have been presented [5,7-16,25-31]. However, while their application is beginning to appear in the medical literature [28,32-40], they are not yet in widespread use. We believe this is due in part to the level of statistical sophistication required to understand the literature. We suspect there is a need for practical information on using these models, as well as for greater understanding of the model assumptions, goodness-of-fit, and the interpretation of results. Our experience with these methods [38-40] leads us to believe that they are easy to use and produce estimates of effect that are interpretable epidemiologically. Thus, the purpose of this paper is to motivate the use of these models by presenting the methodology in a form that is readily useable by the epidemiologist and the clinical researcher.

    We will first present a description of ordinal regression models, and then we will illustrate their use in an analysis of the prevalence of back pain from a follow-up study of adolescent idiopathic scoliosis. In this example, we will reveal the pitfalls of using chi-square tests and binary logistic regression when outcomes are ordinal and we will contrast these results to those obtained with the proportional odds model and with the continuation ratio model. We will emphasize both computational issues and the interpretation of results.

    METHODS

    Ordinal regression methods offer interesting analytic options: (1) They retain the inherent ordinality of the data. They impose neither the loss of information inherent in treating an ordinal outcome as nominal or dichotomous, nor the unjustified quantification of category differences created when ordinal data is treated as continuous. (2) They go beyond simple significance testing, which has limited value in health research [41], to producing a measure of effect with corresponding confidence limits. The parameter that is estimated is a type of odds ratio, and thus is recognizable and readily interpreted. The possibility of using alternate link functions to estimate other parameters (such as risk ratios or risk differences) exists [5,26], but the software is not well developed for all types of models. (3) The estimated odds ratio is not based on a particular dichotomization of the outcome variable, but rather summarizes the association of interest over all levels of outcome. (4) Confounding and interaction can be assessed for all types of independent covariates: discrete, categorical and continuous. (5) The validity of model assumptions can be tested. (6) Sample size calculations for use with these models are available [17,18].

    Both the proportional odds and the continuation ratio ordinal regression models are linear and additive on the logit scale, and both use maximum likelihood methods to estimate a summary odds ratio. However, different series of dichotomizations of the data, referred to as cut-points, are used in the two models. With both models, homogeneity of effect across cut-points is assumed and a single odds ratio summarizing the effect of interest over all cut-points is calculated. While both these methods can be used with data arising from cohort and from cross-sectional studies, results may be biased [15] if they are applied to data that has been sampled according to outcome, such as in case-control studies. Although this bias will be minimal if the frequency of all but the reference level for the outcome is low [15], other forms of ordinal models [11,13,15] may be more appropriate for data arising from case-control sampling.


    TABLE 1. Comparison of cut-points between proportional odds and continuation ratio models based on a 5-level ordinal outcome

    Proportional odds model:                      Continuation ratio model:
    successive incremental        Cut-point       conditional incremental
    cut-points                                    cut-points

    0 vs. 1,2,3,4                     1           0 vs. 1,2,3,4
    0,1 vs. 2,3,4                     2           1 vs. 2,3,4
    0,1,2 vs. 3,4                     3           2 vs. 3,4
    0,1,2,3 vs. 4                     4           3 vs. 4
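    The two cut-point schemes in Table 1 can be generated mechanically. The following Python sketch is illustrative only: the function name `cut_points` is hypothetical and does not come from any statistical package. It simply lists, for each cut-point, which outcome levels fall on either side of the dichotomy under each model.

```python
def cut_points(levels):
    """Return (proportional-odds, continuation-ratio) dichotomies
    for a list of ordered outcome levels (hypothetical helper)."""
    po, cr = [], []
    for j in range(1, len(levels)):
        higher = levels[j:]
        # Proportional odds: all lower levels vs. all higher levels.
        po.append((levels[:j], higher))
        # Continuation ratio: one level vs. all higher levels;
        # levels below it have already been dropped.
        cr.append(([levels[j - 1]], higher))
    return po, cr


po, cr = cut_points([0, 1, 2, 3, 4])
for j, (p, c) in enumerate(zip(po, cr), start=1):
    print(j, p, c)
```

    Every proportional odds dichotomy uses all levels, whereas the continuation ratio dichotomies shrink as lower levels are removed.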

    The Proportional Odds Model

    Of the two ordinal models, the proportional odds model [5,7,9,10,12,14,15,17,23-26,28-31,38-40,42-45] produces the most easily interpretable estimate. As it is an extension of binary logistic regression, it is sometimes referred to as the ordinal logistic model; because it is defined by the log odds of the cumulative probabilities, it is also referred to as a cumulative odds model [12,15,26,31]. As displayed in Table 1, the proportional odds model can be intuitively understood as being based on odds ratios formed over a series of successive incremental cut-points; with each cut-point, the level of severity required for categorization as a case rather than a non-case becomes increasingly stringent. Each cut-point-specific estimate is calculated using all observations in the sample, but at a different dichotomization of the outcome. The magnitude of the summary estimates does not depend on the direction employed in modeling the outcome, i.e., whether the cut-points are formed using increasing or decreasing levels of severity. The proportional odds ratio, then, is a summary of the binary logistic odds ratios representing each of the cut-points. However, as the cut-point-specific estimates are not statistically independent, the proportional odds ratio is not a simple weighted average of these values, but rather is based on maximization of a specific likelihood function [26]. On the assumption that the cut-point-specific odds ratios are homogeneous (the proportional odds property), the summary proportional odds ratio is independent of the degree of severity (cut-point) used to classify the outcome variable and is thus valid over all cut-points simultaneously. In other words, it can be viewed as an odds ratio that is independent of the dichotomy chosen to classify the outcome. Due to this simple interpretation, this model may be preferred by investigators, even over other models that provide a better fit to the data [42].

    While parameters other than the odds ratio can in theory also be calculated [26], software for such adaptations is not well developed. The hypothesis of homogeneity of the proportional odds ratio over all cut-points can be tested with a score test. This test has, unfortunately, several limitations. First, zero cells for a regressor variable at an inner value of the outcome may produce spuriously high chi-square values [43,44]. A similar problem may result when data is generally sparse or when one of the values of the outcome represents only a small fraction of the total sample size, although the impact on the latter will depend on the total number of levels of the outcome. Second, this is a global test of nonproportionality and it cannot distinguish heterogeneity associated with the exposure variable from that associated with other covariates. To minimize both these difficulties, this test might be better performed with a crude rather than an adjusted model, so long as confounding varies little over cut-points. Third, the score test is sensitive to sample size, such that large samples may produce statistically significant p values when in fact there is little practical difference between the cut-point-specific estimates. In the absence of a more robust test, we have found that it is advisable to examine the data carefully by estimating binary logistic odds ratios for each cut-point. A plot of these cut-point-specific odds ratios with their confidence limits against cut-points of the outcome provides a visual aid for judging the validity of imposing one common odds ratio across all cut-points, i.e., of fitting a line of slope zero within the confidence intervals [28]. When the cut-point-specific odds ratios are not sufficiently homogeneous, a binary logistic analysis at any one cut-point, while valid, cannot be used to characterize the relationship of the dependent and independent variables over all cut-points. In this situation, another ordinal regression model, such as the continuation ratio model, may better fit the data [42]. Other strategies to be considered in the event of non-proportional odds are discussed below under the heading Goodness-of-Fit and Choice of Models.

    Once heterogeneity has been ruled out, the proportional odds model offers several advantages over binary logistic regression, including increased power and a measure of effect that applies to all dichotomies of the outcome. The increase in efficiency will depend on several factors including the number of outcome levels, the distribution of subjects among outcome levels, and the ratio of subjects in the comparison and exposure groups [17,26].
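    For readers who prefer a formula, one common way of writing the proportional odds model is sketched below. The notation is an assumption introduced here for illustration and is not taken from the article: Y is the ordinal outcome with levels 0, ..., k, x is a binary exposure indicator, the alpha_j are cut-point-specific intercepts, and beta is the single log odds ratio shared by all cut-points.

```latex
\[
\operatorname{logit} P(Y \ge j \mid x)
  = \log \frac{P(Y \ge j \mid x)}{P(Y < j \mid x)}
  = \alpha_j + \beta x ,
  \qquad j = 1, \dots, k ,
\]
\[
\frac{P(Y \ge j \mid x = 1) \,/\, P(Y < j \mid x = 1)}
     {P(Y \ge j \mid x = 0) \,/\, P(Y < j \mid x = 0)}
  = e^{\beta}
  \quad \text{for every cut-point } j .
\]
```

    Because beta carries no subscript j, the same odds ratio applies at each successive incremental cut-point, which is the proportional odds property described above.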

    The Continuation Ratio Model

    The summary odds ratio based on the continuation ratio model [8,12,15-17,23,26,27,30] represents, among subjects who have attained a certain level of severity (say, level j), the odds of exposure of having an outcome level greater than j relative to the odds of exposure of being in category j. It is therefore analogous to the proportional hazards model of Cox [46], but in discrete time [16,26]. As illustrated in Table 1, the continuation ratio is based on conditional incremental cut-points, with outcomes at a given level discarded after being compared to all higher levels. By viewing the outcome as going from more severe to less severe, the model can also be applied in reverse. However, because of the conditioning on adjacent cut-points, the continuation ratio, unlike the proportional odds ratio, is affected by the direction chosen for modeling the response variable [15,16,26]. Thus, the odds ratio obtained when modeling increasing severity is not equivalent to the reciprocal of that obtained when modeling decreasing severity. The direction of choice will be clear when there is an absolute direction implied in the scale of the outcome variable, such as with time or irreversible deterioration [15,26]. The summary measure is also more easily interpreted for such outcomes since individuals at a given level of outcome will have passed through all lower levels of the outcome. This model is thus well-suited for failure time data [15,27] and outcomes measuring threshold points [15].

    The cut-point-specific ratios that comprise the continuation ratio are asymptotically independent [8,16]. The continuation ratio model can therefore be fit with any statistical package that includes binary logistic regression, after suitable restructuring of the input data set [26,27]. Risk ratios can also be calculated, either by Mantel-Haenszel techniques or relative risk regression [47], using the restructured data. When the outcome has many levels, Greenland [15] suggests using conditional logistic regression.

    A test of heterogeneity over cut-points is also available for the continuation ratio model. Unlike the test used with the proportional odds model, which is global, this test is specific to the exposure-outcome relationship.
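    The continuation ratio model can be written in a similar form. As before, the notation is an illustrative assumption rather than the article's own: Y takes levels 0, ..., k, x is a binary exposure indicator, and the model conditions on having attained at least level j.

```latex
\[
\operatorname{logit} P(Y > j \mid Y \ge j,\, x)
  = \log \frac{P(Y > j \mid x)}{P(Y = j \mid x)}
  = \alpha_j + \beta x ,
  \qquad j = 0, \dots, k - 1 .
\]
```

    Here exp(beta) is the common odds ratio for progressing beyond level j rather than remaining at level j. Reversing the ordering of Y changes the conditioning sets, which is why the estimate depends on the direction chosen for modeling.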

    Goodness-of-Fit and Choice of Models

    In many cases, the data sets being analyzed are not sufficiently large to definitively rule out either the proportional odds or the continuation ratio model based solely on their accompanying tests of heterogeneity [9,17,26]. Very large samples, on the other hand, may pose the opposite problem in that they have power to detect minimal deviations that may be unimportant. As well as the specific tests of heterogeneity over cut-points, the goodness-of-fit of ordinal models, as in all generalized linear models, can be examined using log likelihoods and residual deviances [26,45]. Direct comparisons of the likelihoods of proportional odds and continuation ratio models can be made only informally; to perform a likelihood ratio test, a more general model that nests these two models would need to be formed [26]. The selection of the appropriate model, as in any modeling exercise, should therefore include a priori considerations such as characteristics of the outcome [5,9,15,26] and ease of interpretation [9,26,42].

    When the proportional odds model does not provide a good fit to the data, the continuation ratio model may be a viable alternative [16,42]. When the data are found to violate the assumptions of both the proportional odds and the continuation ratio models, other options are available. The first step would be to examine the particulars of the violation, i.e., to determine which variables have a nonproportional relationship with the outcome and whether the cut-point-specific odds ratios for those factors appear to follow a particular trend. Several options are available for modeling odds that are not proportional. Functional asymptotic regression methodology (FARM) [48,49] allows the proportional odds assumption to be imposed on some, but not all, regressor variables. A more powerful method, using maximum likelihood procedures, is presented by Peterson and Harrell [43,44]. They describe three models: (1) an unconstrained partial proportional odds model in which the effect of the exposure is estimated at each cut-point; (2) a model imposing linear constraints on selected regressor variables; and (3) a model in which multiple forms of constraints can be combined. The first two models can be fit using PROC LOGIST from the SAS SUGI library. Unfortunately, these options were not maintained when PROC LOGISTIC was incorporated into the standard SAS statistics module [50], and SAS no longer supports the SUGI library in release 6. A third alternative is presented by Cox [51]: location-scale cumulative odds models, a family of models that encompass both the partial proportional odds models of Peterson and Harrell [43,44] and another nonproportional odds model proposed by McCullagh [9] and generalized by Angelos Tosteson and Begg [52]. These models can be implemented using nonlinear regression (e.g., PROC NLIN in SAS [50]). Another possibility is the stereotype model [11,13,15,30]. This model allows a covariate to influence some, but not all, levels of the outcome. Stereotype models may also be considered when the ordering of the supposed ranked outcome is in question, or when the data has arisen from a case-control study [15].

    Homogeneity over cut-points is also an assumption of the continuation ratio model. If this assumption is violated, alterations to the continuation ratio model can be made easily. Details are provided below in Computational Issues.

    Computational Issues

    Both ordinal regression models presented in this paper can be fit using SAS [50]. The proportional odds model is automatically produced with PROC LOGISTIC when there are more than two levels of outcome. The output includes, for each independent variable in the model, an estimate of the regression coefficient (i.e., the log odds ratio) and its standard error, Wald chi-square statistic, and p value, as well as the corresponding odds ratio and confidence limits. Also included are the results of the score test of heterogeneity. Deviance from a saturated model can be obtained with a macro provided by Wan et al. [45]. Estimates of the binary odds ratios at each of the cut-points are not generated routinely; however, a simple SAS macro for this purpose is included in the Appendix.

    For the continuation ratio model, any standard statistical package that performs binary logistic regression can be used, but the original data set must first be restructured [26,27]. This new data set is created by repeatedly including subsets of all observations contributing to each respective cut-point. Two new variables must also be added to this data set: one indicating the cut-point from which the particular subset arose and a second, a binary variable indicating the dichotomous status of the outcome at that cut-point. For example, using the data presented in Table 2, let n0 = 35 be the number of subjects with no back pain, n1 = 23 be the number of subjects with short infrequent pain, etc. The first subset of data to be created would have the same number of observations as the original data set, because all n0 + n1 + n2 + n3 + n4 + n5 = 100 observations would be included. Each observation in this data subset would be assigned a cut-point level of 1. In keeping with the dichotomization for the first cut-point, the n0 = 35 observations with outcome level none would be assigned the new dichotomous outcome 0 and the remaining n1 + n2 + n3 + n4 + n5 = 65 observations would be assigned a value of 1. The second subset of data to be included in the restructured file would consist of the n1 + n2 + n3 + n4 + n5 = 65 observations remaining at cut-point 2, all assigned a stratum level of 2; the n1 = 23 observations with short, infrequent pain would be assigned the dichotomous outcome 0 and the remaining n2 + n3 + n4 + n5 = 42 observations would be assigned a value of 1. Continuing in this manner, the restructured data set would contain a total of 100 + 65 + 42 + 25 + 21 = 253 observations.

    TABLE 2. Hypothetical example to describe restructuring of data when fitting the continuation ratio model with logistic regression software

    Frequency and duration                   Total number of subjects
    categories of pain          Category     (exposed and unexposed combined)

    None                            0        n0 = 35
    Short Infrequent                1        n1 = 23
    Short Recurrent                 2        n2 = 17
    Long                            3        n3 = 4
    Almost Continuous               4        n4 = 15
    Continuous                      5        n5 = 6
    Total                                    100
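    The restructuring described above can be sketched in a few lines of Python. This is an illustration only, not the SAS macro from the Appendix: the function name `restructure` and the field names `cut_point`, `case`, and `level` are hypothetical, and a real analysis would also carry each subject's exposure and covariates into every copied row.

```python
# Table 2 counts: number of subjects at each ordinal level 0..5.
counts = {0: 35, 1: 23, 2: 17, 3: 4, 4: 15, 5: 6}
outcomes = [level for level, n in counts.items() for _ in range(n)]


def restructure(outcomes, top_level):
    """Stack one subset of observations per conditional cut-point."""
    rows = []
    for cut in range(1, top_level + 1):
        for y in outcomes:
            # Subjects below level cut-1 were dropped at earlier cut-points.
            if y >= cut - 1:
                rows.append({
                    "cut_point": cut,      # stratum indicator
                    "case": int(y >= cut),  # 0 = at level cut-1, 1 = higher
                    "level": y,            # original ordinal outcome
                })
    return rows


rows = restructure(outcomes, top_level=5)
sizes = [sum(1 for r in rows if r["cut_point"] == c) for c in range(1, 6)]
print(sizes, len(rows))  # [100, 65, 42, 25, 21] 253
```

    The subset sizes reproduce the arithmetic in the text: 100 + 65 + 42 + 25 + 21 = 253 rows in the stacked data set.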

    The continuation ratio is obtained by performing binary logistic regression on the restructured data set, with the new dichotomous outcome as the dependent variable, and including as a categorical independent variable the newly created variable indicating cut-point level [26,27]. The assumption of homogeneity can be tested by including in the model terms representing the interaction effect between the exposure of interest and the cut-point indicator variables, and then comparing the log-likelihood of this model to one without these interaction terms. If significant heterogeneity of effect over cut-points is found, the continuation ratio model can be easily adapted to allow effects to vary by cut-point. By including terms that model the interaction effect of the exposure variable with the series of dummy variables representing cut-points, a model with a separate odds ratio for each cut-point is obtained. A linear trend could similarly be modeled by coding the cut-point indicator, and the corresponding interaction term, as continuous.
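    The comparison of log-likelihoods described above is a standard likelihood ratio test. A sketch in symbols follows; the notation is assumed here for illustration and does not appear in the article. Z is the new dichotomous outcome in the restructured data, s = 1, ..., S indexes the cut-points, and x is a single binary exposure.

```latex
\[
M_0:\ \operatorname{logit} P(Z = 1 \mid s, x) = \alpha_s + \beta x
\qquad\text{(common odds ratio } e^{\beta}\text{)}
\]
\[
M_1:\ \operatorname{logit} P(Z = 1 \mid s, x) = \alpha_s + \beta_s x
\qquad\text{(separate odds ratio } e^{\beta_s} \text{ at each cut-point)}
\]
\[
G = -2\,\bigl(\ell_0 - \ell_1\bigr) \;\sim\; \chi^2_{S-1}
\quad\text{under homogeneity,}
\]
```

    where l_0 and l_1 are the maximized log-likelihoods of M_0 and M_1. With one binary exposure, the interaction adds S - 1 parameters, so an outcome with five estimable cut-points gives a test on 4 degrees of freedom. Additional covariates would enter both models in the same way.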

    A SAS macro to restructure the data and to calculate cut-point-specific estimates, the continuation ratio and the log likelihoods necessary for testing homogeneity over cut-points, is provided in the Appendix.

    The continuation ratio model can also be run with PROC LOGISTIC, without restructuring the data, by using a complementary log-log link, rather than a logistic link. However, testing of the assumption of homogeneity of effect over strata is limited with this method: the test performed with this procedure is global, simultaneously testing all parameter estimates for homogeneity over cut-points.

    AN ILLUSTRATION OF ORDINAL REGRESSION

    We will illustrate ordinal regression techniques using data from a comparative, retrospective cohort study of adolescent idiopathic scoliosis (AIS). This study has been described in detail elsewhere [53] and results for a number of end-points have been presented [38-40,54,55]. Briefly, a cohort of children and young adults referred to Ste-Justine Hospital, Montreal, Canada, for diagnosis and management of AIS from 1960 to 1979 was identified. A postal questionnaire, administered between 1989 and 1990, provided information on sociodemographic, occupational and personal characteristics as well as various health outcomes in adulthood, including back and neck pain. The original AIS cohort consisted of 2092 subjects; of these, 1858 subjects (89%) were traced and 1476 returned questionnaires (80% of those traced). A population-based comparison group, consisting of 1755 randomly selected individuals, also completed the questionnaire.

    Questions on back and neck pain were taken from two general health surveys [56,57], the McGill Pain Questionnaire [58], the Roland Index [59], and the Oswestry Low Back Pain Disability Questionnaire [60]. These instruments were used to elicit information on back pain at the time of survey and during the preceding year. A number of indices describing back pain were used to compare the pain experience between the two study groups, including the presence and intensity of pain, history of pain, and related disability and handicap.

    For the purposes of this paper, we selected an outcome that combined the frequency and duration of pain episodes reported for the year prior to filling out the questionnaire. These two variables were combined into six ordinal levels of pain occurrence: (1) no pain episodes; (2) 1 or 2 episodes that lasted less than one week (short, infrequent); (3) 3 to 12 episodes, with an average duration of less than one week (short, recurrent); (4) up to 12 episodes, lasting on average more than one week (long); (5) more than 12 episodes (almost continuous); and (6) continuous pain. This restructured variable was chosen for presentation over other ordinal outcomes because it was gathered on the largest group of subjects and it provided a good example of discrimination between ordinal regression models.

    One of the issues that we wish to illustrate with this data is that random error may result in considerable differences in the cut-point-specific odds ratios. This could lead to incorrect inferences if results based on binary logistic regression performed at some arbitrary cut-point are generalized beyond that particular dichotomization. Because of the large number of subjects in our study, the impact of random error was minimal. We therefore present an analysis performed on a sub-sample of 200 women, 100 randomly selected from each of the two study groups. As our intent is to show the difficulties associated with random variability in small samples, only one random sample was drawn. In our original study, a large number of covariates were included as potential confounders, although little confounding was actually observed. With the sub-sample of 200, a reduced set of covariates was used, consisting of age, smoking status, education, and an index of lifting on the job. As this smaller sample does not provide sufficient power to compare the goodness-of-fit of the two models, we will return to the full sample of 3231 for this analysis.

    FIGURE 1. Frequency and duration of back pain in the past year using a proportional odds model to apply increasingly stringent definitions of pain.

    Frequency and duration        AIS cohort     Comparison group
    categories of pain                n                 n

    None                              29                41
    Short Infrequent                  24                22
    Short Recurrent                   18                16
    Long                               3                 5
    Almost Continuous                 17                12
    Continuous                         9                 4
    Total                            100               100

    Unadjusted and adjusted logistic odds ratios ORj (95% confidence intervals) at successive incremental cut-points

    Cut-point     Unadjusted             Adjusted

    OR1           1.70 (0.95, 3.06)      2.07 (1.02, 4.21)
    OR2           1.51 (0.86, 2.66)      1.72 (0.90, 3.32)
    OR3           1.54 (0.81, 2.93)      2.01 (0.95, 4.25)
    OR4           1.84 (0.92, 3.70)      2.28 (1.03, 5.02)
    OR5           2.37 (0.71, 7.98)      not estimable

    Proportional odds ratio:      1.65 (1.00, 2.72)      1.95 (1.12, 3.43)
    Score test of homogeneity:    p = 0.78               p = 0.01

    The number of people in the sub-samples of the two groups at each of the six levels of occurrence of back pain are shown in Figs. 1 and 2. Using a standard chi-square test on proportions, no significant difference between groups was found (p = 0.353; 5 degrees of freedom [df]), even after collapsing some of the less prevalent levels of outcome (p = 0.210; 3 df). Strikingly different conclusions are reached when using ordinal regression models.
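    The 5-df chi-square result quoted above can be checked directly from the group counts reported in Figs. 1 and 2. The Python sketch below is a hedged illustration: the function names are hypothetical, and the tail-probability helper uses a closed form that is valid only for 5 degrees of freedom, so it is not a general-purpose chi-square routine.

```python
import math

# Sub-sample counts by pain level (None ... Continuous), from Figs. 1 and 2.
ais = [29, 24, 18, 3, 17, 9]
comparison = [41, 22, 16, 5, 12, 4]


def pearson_chi_square(row1, row2):
    """Pearson chi-square statistic and df for a 2 x K table."""
    n1, n2 = sum(row1), sum(row2)
    total = n1 + n2
    stat = 0.0
    for a, b in zip(row1, row2):
        column = a + b
        for observed, n in ((a, n1), (b, n2)):
            expected = column * n / total
            stat += (observed - expected) ** 2 / expected
    return stat, len(row1) - 1


def chi2_sf_5df(x):
    """Upper-tail probability of a chi-square variate with exactly 5 df."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3))


stat, df = pearson_chi_square(ais, comparison)
p_value = chi2_sf_5df(stat)
print(round(stat, 2), df, round(p_value, 3))
```

    The statistic treats the six pain levels as unordered categories, which is why it fails to detect the consistent shift toward more pain in the AIS group.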

    Figure 1 also shows the successive incremental cut-points involved in the formulation of the proportional odds ratio, with both the crude and adjusted binary logistic odds ratios and corresponding 95% confidence intervals (CI) calculated at each of the individual cut-points. For example, at the first cut-point, OR1 represents the odds ratio from a binary logistic model comparing any pain to no pain between groups. At the second cut-point, the degree of severity required for classification as a case is increased by combining persons with only one or two short episodes of back pain with those without back pain. Over all cut-points, the adjusted binary logistic odds ratios ranged from 1.72 to 2.28, with no estimate produced at the last cut-point. Two of the four estimates produced were statistically significant. Confounding was minimal (odds ratios from the unadjusted binary models were similar to those from the adjusted models, ranging from 1.51 to 2.37).
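    The unadjusted cut-point-specific odds ratios can be recomputed from the group counts alone. The sketch below is illustrative: `crude_odds_ratios` is a hypothetical helper, and the confidence limits use the usual log-scale (Woolf) standard error, which for a single binary exposure coincides with the Wald interval from an unadjusted binary logistic regression. The adjusted estimates cannot be reproduced this way because they require the covariate data.

```python
import math

# Sub-sample counts by pain level (None ... Continuous), from Figs. 1 and 2.
ais = [29, 24, 18, 3, 17, 9]
comparison = [41, 22, 16, 5, 12, 4]


def crude_odds_ratios(exposed, unexposed, z=1.96):
    """Crude OR and 95% CI at each successive incremental cut-point."""
    results = []
    for j in range(1, len(exposed)):
        a, b = sum(exposed[j:]), sum(exposed[:j])      # exposed: cases, non-cases
        c, d = sum(unexposed[j:]), sum(unexposed[:j])  # unexposed: cases, non-cases
        odds_ratio = (a * d) / (b * c)
        se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
        lower = math.exp(math.log(odds_ratio) - z * se)
        upper = math.exp(math.log(odds_ratio) + z * se)
        results.append((odds_ratio, lower, upper))
    return results


results = crude_odds_ratios(ais, comparison)
for j, (or_j, lower, upper) in enumerate(results, start=1):
    print(f"OR{j}: {or_j:.2f} ({lower:.2f}, {upper:.2f})")
```

    The five point estimates span 1.51 to 2.37, the range of unadjusted values quoted in the text.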

    Applying the proportional odds regression model, odds ratios from the five statistically dependent strata are summarized by the adjusted proportional odds ratio of 1.95 (95% CI: 1.12-3.43). This value represents the odds of having had back pain during the previous year in the AIS group relative to the comparison group, independent of the level of severity chosen to separate subjects with pain from those without pain. Because it is based on a series of dichotomous definitions of pain, the proportional odds ratio is more informative than an odds ratio from a single binary logistic regression. Comparing results from the proportional odds model with those from the series of binary logistic regressions performed at each cut-point, it is clear that the proportional odds model produced a more stable estimate; the in-


    [Figure: frequency and duration categories of pain in the sub-sample, with adjusted logistic odds ratios ORi (95% confidence interval) at conditional incremental cut-points]

    Frequency and duration     AIS cohort    Comparison group
    category of pain               n                n
    None                           29               41
    Short infrequent               24               22
    Short recurrent                18               16
    Long                            3                5
    Almost continuous              17               12
    Continuous                      9                4
    Total                         100              100

    Cut-point    Adjusted odds ratio (95% CI)
    OR1          2.07 (1.12, 4.21)
    OR2          1.12 (0.48, 2.59)
    OR3          1.72 (0.57, 5.18)
    OR4          3.33 (0.55, 20.20)
    OR5          not estimable

    Adjusted continuation ratio: 1.61 (1.05, 2.48)
    Test of homogeneity: 0.70 [remainder illegible in source]

    Using the reduced data set of 200 subjects, both the proportional odds and continuation ratio models provided a reasonable fit to the data. The conclusions drawn from one


    TABLE 3. Goodness of fit of the proportional odds and continuation ratio models using the full study population (N = 3,231)

                           Proportional odds      Continuation ratio
    Cut-point              model (adjusted)       model (adjusted)
    OR1                    2.14 (1.81, 2.53)      2.14 (1.81, 2.53)
    OR2                    2.07 (1.76, 2.44)      1.56 (1.27, 1.91)
    OR3                    2.06 (1.72, 2.47)      1.32 (1.01, 1.71)
    OR4                    2.15 (1.77, 2.60)      1.55 (1.02, 2.38)
    OR5                    2.38 (1.69, 3.37)      1.39 (0.94, 2.06)

    Summary odds ratio     2.14 (1.86, 2.46)      1.70 (1.53, 1.89)
    Test of homogeneity    p > 0.70 (crude)*      0.0005 < p < 0.005

    *As discussed in text.

    or the other would not be strikingly different. We preferred the proportional odds model for these data because of its ease of interpretation: it represents a summary ratio of the odds of back pain between the two groups that is independent of the level of severity used to classify back pain.

    While it is difficult to discriminate between the proportional odds and continuation ratio models using the sample of 200 subjects, we were able to do so with the original data of 3231 subjects. From Table 3 it is clear that the continuation ratio model does not provide a good fit to these data, with the p value from the difference in log likelihoods being less than 0.005. There was considerable variation in the odds ratios calculated with binary logistic regression at each of the conditional incremental cut-points, with adjusted ORi ranging from 1.32 to 2.14 compared to a range of 2.06 to 2.38 with the proportional odds model. Comparisons between cut-point-specific estimates from the two models should be made with caution. First, with the continuation ratio model, cut-point-specific estimates can be considered independent, while those from the proportional odds model cannot. Second, more random variation within the cut-point-specific estimates is to be expected with the continuation ratio model since it is based on conditional, rather than incremental, cut-points. Each of the binary odds ratios based on cut-points of the proportional odds model includes 3231 subjects; however, with the continuation ratio model, only the first cut-point-specific estimate is based on 3231 individuals. The subsequent cut-point-specific estimates set out in Table 3 have 1175, 1931, 2389, and 2522 fewer subjects, respectively.

    Although the cut-point-specific estimates based on the proportional odds model appear to be homogeneous, results of the formal test of proportionality for the adjusted model are difficult to interpret. While a model with no covariates produces estimates of effect similar to those of the adjusted model (crude models: ORi between 2.01 and 2.17, with a proportional odds ratio of 2.06), the test of proportionality from the adjusted model indicated statistically significant heterogeneity (p = 0.001), while the test from the crude model did not (p = 0.77). Sparse data did not appear to be the main problem here: collapsing levels of the outcome as was done with the sub-sample of 200 still produced a low p value (p = 0.03). An explanation of the problem was found by examining results from a series of models that included the exposure variable paired with each of the confounding variables, taken one at a time. These results suggested that the low p value from the score test of the adjusted model reflected a lack of proportionality in the relationship between the outcome and some of the confounding variables. Thus, we are confident that the heterogeneity being reflected by the test p value is not due to the association between the exposure of interest and the outcome. However, what remains unclear is the extent of residual confounding that results from imposing the assumption of proportionality on covariates whose relationship with the outcome is not proportional. This could be sorted out by applying the nonproportional models described by Peterson and Harrell [43,44]; unfortunately, the software they discuss is no longer available. Work on this area using the location-scale models of Cox [51] is underway.
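The shrinking denominators of the conditional cut-points can be made concrete with a few lines of arithmetic. This Python sketch uses only the figures quoted in the text (3231 subjects, and 1175, 1931, 2389, and 2522 fewer subjects at the later cut-points); it assumes each reduction corresponds to subjects already classified at lower levels of the outcome, and the variable names are illustrative.

```python
FULL_SAMPLE = 3231

# Subjects no longer contributing at conditional cut-points 1 to 5.
fewer_subjects = [0, 1175, 1931, 2389, 2522]

# Number of subjects behind each cut-point-specific estimate.
risk_set_sizes = [FULL_SAMPLE - dropped for dropped in fewer_subjects]

# Implied number of subjects leaving the risk set after each cut-point.
dropped_per_step = [
    later - earlier
    for earlier, later in zip(fewer_subjects, fewer_subjects[1:])
]

print(risk_set_sizes)    # [3231, 2056, 1300, 842, 709]
print(dropped_per_step)  # [1175, 756, 458, 133]
```

By the fifth conditional cut-point only 709 subjects remain, against 3231 at every incremental cut-point of the proportional odds model, which is consistent with the wider confidence intervals in the continuation ratio column of Table 3.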

    DISCUSSION

    We have shown by way of example that serious errors in inference can be made if conclusions drawn from comparative studies are based on analyses of ordinal outcomes in which the ordinality of the data is not maintained. In particular, chi-square tests of association can be highly misleading, especially in relatively small samples. The loss of power that results from reducing the ordinal scale to a nominal one and from applying a global test of association may lead to an incorrect decision not to reject the null hypothesis. In addition, these tests do not lead to an estimate of the magnitude of effect between study groups and do not take into account the effects of covariates. We have also shown that the use of binary logistic regression, in which an arbitrary cut-point is selected to dichotomize the ordinal outcome, can lead to an estimate of effect that is applicable only for that particular cut-point; inferences outside the boundaries of that cut-point may be incorrect. Thus, binary logistic regression does not provide an adequate summary of the data, due to statistical variability in cut-point-specific estimates of odds ratios as well as to a loss of information when the multinomial nature of the data is not accounted for.

    Having carefully selected well-designed and validated scales, and having invested the time and effort involved in collecting data, investigators should apply the most appropriate statistical techniques. With the statistical methods and computer software readily available, ordinal regression techniques merit more widespread use.

    In our examples based on the sub-sample of 200 subjects, the two main models for ordinal regression yielded fairly similar results. This situation, in which several different models may provide an adequate fit to data, is not uncommon in epidemiology. Thus, care should be taken to select a model based on the nature of the health index under consideration. However, when one model does not provide a good fit, it may be that another, based on a different cut-point structure, would be more appropriate. When determining goodness-of-fit, the actual cut-point-specific estimates should be considered in tandem with the homogeneity test statistic. It is hoped that in the future more robust tests of homogeneity for the proportional odds model will be developed.

    In conclusion, ordinal regression models offer an attractive option for the modeling of ranked data: they make full use of the structure of an ordinal scale; they are more efficient than binary logistic regression at arbitrarily selected cut-points; and the estimates produced have a broad interpretation, applicable across multiple dichotomizations of the outcome. The proportional odds model is especially appealing in situations in which there is no preferred or clinically relevant point at which to dichotomize the data, as a relative measure of effect is obtained that is not dependent on the level chosen to define an event. We believe that ordinal regression analysis should be a part of the standard statistical repertoire used by epidemiologists and clinical researchers.

    The authors would like to thank Dr. Ben G. Armstrong for his critical reading and comments on a draft of this article. The study of adolescent idiopathic scoliosis was supported by CAFIR (Université de Montréal) and Le Fonds de la recherche en santé du Québec (FRSQ), project numbers 871352 and 901397.

    APPENDIX*

    This appendix contains the macro ORDINAL, which produces: (1) cut-point-specific betas and standard errors for odds ratios contributing to the proportional odds ratio (crude and adjusted); (2) betas and standard errors for proportional odds ratios (crude and adjusted) and p value for test of homogeneity; (3) cut-point-specific betas and standard errors for odds ratios contributing to the continuation ratio (adjusted only); and (4) beta and standard error for continuation ratio (adjusted) and log likelihoods necessary to perform test of homogeneity.

    The following parameters are required to run this macro: (1) SASdata is the name of the input SAS data set; (2) Outcome is the name of the outcome variable. It should be coded 0 = least severe category to k = most severe category; (3) Tcutpts is the total number of cut-points, equivalent to 1 less than the total number of levels of the outcome variable; (4) Exposure is the name of the exposure variable. It should be coded 1 = exposed or 0 = unexposed, or can be continuous. The macro can be adjusted to accept categorical levels; and (5) Confound is a list of confounding variables.

    *Available on request on the Internet from Dr. Goldberg ([email protected]). The SAS macro for calculating the likelihood of the saturated proportional odds model prepared by Wan et al. [45] can similarly be made available.

    These parameters should be entered in the following macro call, and placed at the end of the program: %ORDINAL(SASdata,outcome,tcutpts,exposure,confound);*/

    %MACRO ORDINAL(SASdata,outcome,tcutpts,exposure,confound);
    *Proportional Odds Model;
    %do i=1 %to &tcutpts;
      data t;
        set &SASdata;
        if .

          label dvar='Dichotomized outcome based on cutpoint';
          label ctpt1='Indicator of Cutpoint 1';
          label ctpt&tcutpts="Indicator of Cutpoint &tcutpts";
          output;
        end;
      end;
    run;

    * Cut-point-specific odds ratios for the continuation ratio model;
    proc sort data=restruct;
      by ORi;
    run;
    proc logistic data=restruct descending;
      by ORi;
      model dvar=&exposure &confound / risklimits;
      title "&outcome: Adjusted ORi contributing to Continuation OR";
    run;

    * Continuation ratio with a common exposure effect across cut-points;
    proc logistic data=restruct descending;
      model dvar=&exposure ctpt2-ctpt&tcutpts &confound / risklimits;
      title "&outcome: Adjusted Continuation OR";
    run;

    * Same model with exposure by cut-point interactions, for the test of homogeneity;
    proc logistic data=restruct descending;
      model dvar=&exposure ctpt2-ctpt&tcutpts ctptint2-ctptint&tcutpts &confound;
      title "&outcome: 2nd log likelihood needed to test homogeneity";
    run;
    %MEND;
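The continuation ratio models above are fitted to a restructured data set in which a subject contributes one record for each conditional cut-point at which he or she is still at risk. As a rough illustration only, the following Python sketch shows one way such records could be built; it is not the macro's own data step, the function name and the "cutpoint" field are hypothetical, and the coding of dvar (1 if the outcome is at or above the cut-point) is an assumption.

```python
def continuation_records(outcome, tcutpts):
    """Build one record per conditional cut-point for a single subject.

    outcome is coded 0 (least severe) to tcutpts (most severe). A subject
    stops contributing once classified at a level below the cut-point.
    """
    records = []
    for cut in range(1, tcutpts + 1):
        if outcome < cut - 1:
            break  # already classified at a lower level, no longer at risk
        records.append({"cutpoint": cut, "dvar": int(outcome >= cut)})
    return records


# A subject with outcome 2 on a six-level scale (five cut-points)
# contributes three records: a case at cut-points 1 and 2, a non-case at 3.
example = continuation_records(2, 5)
print([record["dvar"] for record in example])
```

Under this coding a subject with no pain contributes a single record, which is why the conditional cut-points after the first are based on fewer subjects than the incremental cut-points of the proportional odds model.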

    Example: To obtain: (1) proportional and cut-point-specific binary odds ratios, with 95% confidence limits, and global test of homogeneity, and (2) continuation and cut-point-specific binary odds ratios, with 95% confidence limits, and log likelihoods necessary to test homogeneity from: (i) a SAS data set called tt, (ii & iii) a 6-leveled ordinal outcome variable called pain (coded 0 = no pain to 5 = continuous pain), (iv) an exposure variable called status, and (v) controlling for sex (coded 0-1), age (continuous) and smoking status (2 dummy variables, smk1 & smk2), submit the following call statement:

    %ORDINAL(tt,pain,5,status,%STR(sex age smk1-smk2));*/
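The test of homogeneity for the continuation ratio is obtained from the difference between two log likelihoods: one from the model with a common exposure effect and one from the model that adds the exposure by cut-point interaction terms. A minimal Python sketch of that likelihood ratio calculation follows. The log-likelihood values are invented for illustration, and the degrees of freedom are assumed to equal the number of interaction terms (four when there are five cut-points).

```python
import math


def chi_square_sf_even_df(x, df):
    """Upper-tail chi-square probability, closed form valid for even df only."""
    if df < 2 or df % 2 != 0:
        raise ValueError("this closed form needs an even number of df")
    half = x / 2
    total, term = 0.0, 1.0
    for k in range(df // 2):
        total += term
        term *= half / (k + 1)
    return math.exp(-half) * total


def homogeneity_test(loglik_common, loglik_interaction, n_interactions):
    """Likelihood ratio statistic and p value for the interaction terms."""
    statistic = 2 * (loglik_interaction - loglik_common)
    return statistic, chi_square_sf_even_df(statistic, n_interactions)


# Hypothetical log likelihoods; not taken from the study.
statistic, p_value = homogeneity_test(-2500.0, -2491.0, 4)
print(f"LR statistic = {statistic:.1f}, p = {p_value:.4f}")
```

If the software reports -2 log L rather than the log likelihood itself, the statistic is simply the difference between the two reported values. On 4 df, a statistic between roughly 14.9 and 20.0 corresponds to the range 0.0005 < p < 0.005 shown for the continuation ratio model in Table 3.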

    References

    1. Kleinbaum DG, Kupper LL, Muller KE. Applied Regression Analysis and Other Multivariable Methods. Boston: PWS-Kent Publishing Company; 1988: 10-11.

    2. Thomas DC, Goldberg M, Dewar R. Statistical methods for relating several exposure factors to several diseases in case-heterogeneity studies. Stat Med 1986; 5: 49-60.

    3. Begg CB, Gray R. Calculation of polychotomous logistic regression parameters using individualized regressions. Biometrika 1984; 71: 11-18.

    4. Everitt BS. The Analysis of Contingency Tables. London: Chapman & Hall; 1992: 117-135.

    5. Hastie TJ, Botha JL, Schnitzler CM. Regression with an ordered categorical response. Stat Med 1989; 8: 785-794.

    6. Lipsitz SR, Buoncristiani JF. A robust goodness-of-fit test statistic with application to ordinal regression models. Stat Med 1994; 13: 143-152.

    7. Walker SH, Duncan DB. Estimation of the probability of an event as a function of several independent variables. Biometrika 1967; 54: 167-179.

    8. Fienberg SE. The Analysis of Cross-Classified Categorical Data. Cambridge, MA: MIT Press; 1980: 110-116.

    9. McCullagh P. Regression models for ordinal data (with discussion). J R Stat Soc [B] 1980; 42: 109-142.

    10. Anderson JA, Philips PR. Regression, discrimination and measurement models for ordered categorical variables. Appl Stat 1981; 30: 22-31.

    11. Anderson JA. Regression and ordered categorical variables. J R Stat Soc [B] 1984; 46: 1-30.

    12. Cox C, Chuang C. A comparison of chi-square partitioning and two logit analyses of ordinal pain data from a pharmaceutical study. Stat Med 1984; 3: 273-285.

    13. Greenland S. An application of logistic models to the analysis of ordinal responses. Biom J 1985; 27: 189-197.

    14. Brant R. Assessing proportionality in the proportional odds model for ordinal logistic regression. Biometrics 1990; 46: 1171-1178.

    15. Greenland S. Alternative models for ordinal logistic regression. Stat Med 1994; 13: 1665-1677.

    16. Cox C. Multinomial regression models based on continuation ratios. Stat Med 1988; 7: 435-441.

    17. Whitehead J. Sample size calculations for ordered categorical data. Stat Med 1993; 12: 2257-2271.

    18. Hilton JF, Mehta CR. Power and sample size calculations for exact conditional tests with ordered categorical data. Biometrics 1993; 49: 609-616.

    19. Agresti A. A survey of models for repeated ordered categorical response data. Stat Med 1989; 8: 1209-1224.

    20. Hedeker D, Gibbons RD. A random-effects ordinal regression model for multilevel analysis. Biometrics 1994; 50: 933-944.

    21. Kenward MG, Lesaffre E, Molenberghs G. An application of maximum likelihood and generalized estimating equations to the analysis of ordinal data from a longitudinal study with cases missing at random. Biometrics 1994; 50: 945-953.

    22. Qu Y, Piedmonte MR, Medendorp SV. Latent variable models for clustered ordinal data. Biometrics 1995; 51: 268-275.

    23. Campbell MK, Donner A, Webster KM. Are ordinal models useful for classification? Stat Med 1991; 10: 383-394.

    24. Bell R. Comment re: Campbell MK, Donner A, Webster KM. Are ordinal models useful for classification? Stat Med 1992; 11: 133-134.

    25. Ashby D, Pocock SJ, Shaper AG. Ordered polytomous regression: An example relating serum biochemistry and haematology to alcohol consumption. Appl Stat 1986; 35: 289-301.

    26. Armstrong BG, Sloan M. Ordinal regression models for epidemiologic data. Am J Epidemiol 1989; 129: 191-204.

    27. Berridge DM, Whitehead J. Analysis of failure time data with ordinal categories of response. Stat Med 1991; 10: 1703-1710.

    28. Brazer SR, Pancotto FS, Long TT, III, Harrell FE, Jr, Lee KL, Tyor MP, Pryor DB. Using ordinal logistic regression to estimate the likelihood of colorectal neoplasia. J Clin Epidemiol 1991; 44: 1263-1270.

    29. Ashby D, West CR, Ames D. The ordered logistic regression model in psychiatry: Rising prevalence of dementia in old people's homes. Stat Med 1989; 8: 1317-1326.

    30. Greenwood C, Farewell V. A comparison of regression models for ordinal data in an analysis of transplanted kidney function. Can J Stat 1988; 16: 325-335.


    31. Lee J. Cumulative logit modelling for ordinal response variables: Applications to biomedical research. Comput Appl Biosci 1992; 8: 555-562.

    32. Linzer M, Prystowsky EN, Divine GW, Matchar DB, Samsa G, Harrell F, Pressley JC, Pryor DB. Predicting the outcomes of electrophysiologic studies of patients with unexplained syncope: Preliminary validation of a derived model. J Gen Int Med 1991; 6: 113-120.

    33. Fitzpatrick MF, Martin K, Fossey E, Shapiro CM, Elton RA, Douglas NJ. Snoring, asthma and sleep disturbance in Britain: A community-based survey. Eur Respir J 1993; 6: 531-535.

    34. Rizzoli G, Tiso E, Mazzucco A, Daliento L, Rubino M, Tursi V, Fracasso A. Discrete subaortic stenosis. Operative age and gradient as predictors of late aortic valve incompetence. J Thorac Cardiovasc Surg 1993; 106: 95-104.

    35. Chen Y. Environmental tobacco smoke, low birth weight, and hospitalization for respiratory disease. Am J Respir Crit Care Med 1994; 150: 54-58.

    36. Ross JDC, Brettle R, Zhu C, Haydon G, Elton RA. A comparison of AIDS-defining events and subsequent CDC stage IV events in IDUs and gay men. Int J STD & AIDS 1994; 5: 419-423.

    37. Grossi SG, Genco RJ, Machtei EE, Ho AW, Koch G, Dunford R, Zambon JJ, Hausmann E. Assessment of risk for periodontal disease. II. Risk indicators for alveolar bone loss. J Periodontol 1995; 66: 23-29.

    38. Goldberg MS, Mayo NE, Poitras B, Scott S, Hanley J. The Ste-Justine adolescent idiopathic scoliosis cohort study. II. Perception of health, self and body image, and participation in physical activities. Spine 1994; 19: 1562-1572.

    39. Mayo NE, Goldberg MS, Poitras B, Scott S, Hanley J. The Ste-Justine adolescent idiopathic scoliosis cohort study. III. Back pain. Spine 1994; 19: 1573-1581.

    40. Poitras B, Mayo NE, Goldberg MS, Scott S, Hanley J. The Ste-Justine adolescent idiopathic scoliosis cohort study. IV. Surgical correction and back pain. Spine 1994; 19: 1582-1588.

    41. Rothman KJ. Modern Epidemiology. Toronto: Little, Brown and Company; 1986: 115-125.

    42. Armstrong BG, Sloan M. The authors reply. Am J Epidemiol 1990; 131: 745-748.

    43. Peterson BL, Harrell FE, Jr. Partial proportional odds models and the LOGIST procedure. Proc 13th A Conf SAS Users Group Int, Orlando. Cary, NC: SAS Institute; 1988: 952-956.

    44. Peterson BL, Harrell FE, Jr. Partial proportional odds models for ordinal response variables. Appl Stat 1990; 39: 205-217.

    45. Wan JY, Wang W, Bromberg J. A SAS macro for residual deviance of ordinal regression analysis. Computer Meth Prog Biomedicine 1994; 45: 307-310.

    46. Cox DR. Regression models and life tables (with discussion). J R Stat Soc [B] 1972; 34: 187-220.

    47. Wacholder S. Binomial regression in GLIM: Estimation of risk ratios and risk differences. Am J Epidemiol 1986; 123: 174-184.

    48. Imrey PB, Koch GG, Stokes ME. Categorical data analysis: Some reflections on the log linear model and logistic regression. Int Statist Rev 1981; 49: 265-283.

    49. Koch GG, Amara IA, Singer JM. A two-stage procedure for the analysis of ordinal categorical data. In: Sen PK, Ed. Biostatistics: Statistics in Biomedical, Public Health and Environmental Sciences. North-Holland: Elsevier Science Publishers B.V.; 1985: 357-387.

    50. SAS Institute Inc. SAS/STAT User's Guide, Version 6, Fourth Edition, Volume 2. Cary, NC: SAS Institute Inc.; 1989.

    51. Cox C. Location-scale cumulative odds models for ordinal data: A generalized non-linear model approach. Stat Med 1995; 14: 1191-1203.

    52. Angelos Tosteson AN, Begg CB. A general regression methodology for ROC curve estimation. Med Decis Making 1988; 8: 204-215.

    53. Goldberg MS, Mayo NE, Poitras B, Scott S, Hanley J. The Ste-Justine adolescent idiopathic scoliosis cohort study. I. Description of the cohort. Spine 1994; 19: 1551-1561.

    54. Levy AR, Goldberg MS, Hanley JA, Mayo NE, Poitras B. Projecting the lifetime risk of cancer from exposure to diagnostic ionizing radiation for adolescent idiopathic scoliosis. Health Phys 1994; 66: 621-633.

    55. Levy AR, Goldberg MS, Mayo NE, Hanley JA, Poitras B. Reducing the lifetime risk of cancer from spinal radiographs among persons with adolescent idiopathic scoliosis. Spine 1996; 21: 1540-1548.

    56. Canada Health Survey. Health and Welfare Canada, Statistics Canada. The Health of Canadians. Report of the Canada Health Survey. Ottawa, Ontario: Minister of Supply and Services Canada; 1981.

    57. Santé Québec. Et la santé, ça va? Rapport de l'enquête Santé Québec 1987. Québec, Canada: Les Publications du Québec; 1988.

    58. Melzack R. The McGill pain questionnaire: Major properties and scoring methods. Pain 1975; 1: 277-299.

    59. Roland M, Morris R. A study of the natural history of back pain. I. Development of a reliable and sensitive measure of disability in low-back pain. Spine 1983; 8: 141-144.

    60. Fairbank J, Couper J, Davies JB, O'Brien J. The Oswestry low back pain disability questionnaire. Physiotherapy 1980; 66: 271-273.