bivariate versus univariate ordinal categorical data with reference to an ophthalmologic study

16
This article was downloaded by: [Monash University Library] On: 30 September 2013, At: 14:15 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Journal of Statistical Computation and Simulation Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/gscs20 Bivariate versus univariate ordinal categorical data with reference to an ophthalmologic study Jean-François Angers a & Atanu Biswas a a Département de Mathématiques et de Statisque, Université de Montréal, C.P. 6128, Succ., “Centre-ville” Montréal, Québec, H3C3J7, Canada b Applied Statistics Unit, Indian Statistical Institute, 203 B. T Road, Kolkata –, 700 108, India Published online: 21 May 2008. To cite this article: Jean-François Angers & Atanu Biswas (2008) Bivariate versus univariate ordinal categorical data with reference to an ophthalmologic study, Journal of Statistical Computation and Simulation, 78:6, 489-502 To link to this article: http://dx.doi.org/10.1080/00949650701230563 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms &

Upload: atanu

Post on 12-Dec-2016

213 views

Category:

Documents


0 download

TRANSCRIPT

This article was downloaded by: [Monash University Library]On: 30 September 2013, At: 14:15Publisher: Taylor & FrancisInforma Ltd Registered in England and Wales Registered Number: 1072954 Registeredoffice: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Journal of Statistical Computation andSimulationPublication details, including instructions for authors andsubscription information:http://www.tandfonline.com/loi/gscs20

Bivariate versus univariate ordinalcategorical data with reference to anophthalmologic studyJean-François Angers a & Atanu Biswas aa Département de Mathématiques et de Statisque, Université deMontréal, C.P. 6128, Succ., “Centre-ville” Montréal, Québec,H3C3J7, Canadab Applied Statistics Unit, Indian Statistical Institute, 203 B. TRoad, Kolkata –, 700 108, IndiaPublished online: 21 May 2008.

To cite this article: Jean-François Angers & Atanu Biswas (2008) Bivariate versus univariate ordinalcategorical data with reference to an ophthalmologic study, Journal of Statistical Computation andSimulation, 78:6, 489-502

To link to this article: http://dx.doi.org/10.1080/00949650701230563

PLEASE SCROLL DOWN FOR ARTICLE

Taylor & Francis makes every effort to ensure the accuracy of all the information (the“Content”) contained in the publications on our platform. However, Taylor & Francis,our agents, and our licensors make no representations or warranties whatsoever as tothe accuracy, completeness, or suitability for any purpose of the Content. Any opinionsand views expressed in this publication are the opinions and views of the authors,and are not the views of or endorsed by Taylor & Francis. The accuracy of the Contentshould not be relied upon and should be independently verified with primary sourcesof information. Taylor and Francis shall not be liable for any losses, actions, claims,proceedings, demands, costs, expenses, damages, and other liabilities whatsoever orhowsoever caused arising directly or indirectly in connection with, in relation to or arisingout of the use of the Content.

This article may be used for research, teaching, and private study purposes. Anysubstantial or systematic reproduction, redistribution, reselling, loan, sub-licensing,systematic supply, or distribution in any form to anyone is expressly forbidden. Terms &

Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions

Dow

nloa

ded

by [

Mon

ash

Uni

vers

ity L

ibra

ry]

at 1

4:15

30

Sept

embe

r 20

13

Journal of Statistical Computation and SimulationVol. 78, No. 6, June 2008, 489–502

Bivariate versus univariate ordinal categorical data withreference to an ophthalmologic study

JEAN-FRANÇOIS ANGERS† and ATANU BISWAS*‡

†Département de Mathématiques et de Statisque, Université de Montréal,C.P. 6128, Succ. “Centre-ville” Montréal, Québec, H3C3J7, Canada

‡Applied Statistics Unit, Indian Statistical Institute, 203 B. T Road, Kolkata – 700 108, India

(Received 14 July 2005; in final form 20 January 2007)

The Wisconsin Epidemiologic Study of Diabetic Retinopathy is a population-based epidemiologicalstudy carried out in Southern Wisconsin during the 1980s. The resulting data were analysed by differentstatisticians and ophthalmologists during the last two decades. Most of the analyses were carried outon the baseline data, although there were two follow-up studies on the same population. A Bayesiananalysis of the first follow-up data, taken four years after the baseline study, was carried out by Angersand Biswas [Angers, J.-F. and Biswas, A., 2004, A Bayesian analysis of the four-year follow-up dataof theWisconsin epidemiologic study of diabetic retinopathy. Statistics in Medicine, 23, 601–615.],where the choice of the best model in terms of the covariate inclusion is done, and estimates of theassociated covariate effects were obtained using the baseline data to set the prior for the parameters. Inthe present article we consider an univariate transformation of the bivariate ordinal data, and a parallelanalysis with the much simpler univariate data is carried out. The results are then compared with theresults of Angers and Biswas (2004). In conclusion, our analyses suggest that the univariate analysisfails to detect features of the data found by the bivariate analysis. Even an univariate transformationof our data with quite high correlation with both left and right eyes is inadequate.

Keywords: Univariate ordinal data; Bivariate ordinal data; Sensitivity analysis; Bayesian selectionmodel

1. Introduction and data description

Wisconsin Epidemiologic Study of Diabetic Retinopathy (WESDR) is a population-basedstudy in Southern Wisconsin between 1980 and 1982, in which a total of 996 insulin-takingyounger onset diabetic persons were examined using standard protocols to determine theprevalence and severity of diabetic retinopathy and the associated risk variables [see refs. 1, 2].The basic goal of the study [cf. 1, 2, 3] was to find the associated risk factors that are important inplanning a well-coordinated approach to the public health problem posed by the complicationsof diabetes. There were 4-year and 10-year follow-up examinations [see refs. 4, 5].

*Corresponding author. Email: [email protected]

Journal of Statistical Computation and SimulationISSN 0094-9655 print/ISSN 1563-5163 online © 2008 Taylor & Francis

http://www.tandf.co.uk/journalsDOI: 10.1080/00949650701230563

Dow

nloa

ded

by [

Mon

ash

Uni

vers

ity L

ibra

ry]

at 1

4:15

30

Sept

embe

r 20

13

490 J.-F. Angers and A. Biswas

The observations corresponding to any individual (at any time point) was a bivariate ordi-nal, categorical in nature (in the presence of several covariates). Both the right and left eyeretinopathy severity levels are recorded as two components of the bivariate response. Possiblevalues are 10, 21, 31, 37, 43, 47, 53, 60, 61, 65, 71, 75 and 85 corresponding to increasinglevels of severity of retinopathy within an eye.

Three eye-specific covariates are recorded. These are right and left eye macular edema(ME) (present/absent), right and left eye refractive error (RE) in diopters and right and lefteye intraocular pressure (IOP) in mmHg. In addition, eight person-specific covariates arerecorded, namely the duration of diabetes (DuD) in years, glycosylated hemoglobin (GH) inpercent, systolic and diastolic blood pressures (SBP & DBP) are measured in mmHg., bodymass index (BMI) in kilograms per meter squared, pulse rate (PR) in beats per 30 seconds,urine protein (UP) (present/absent) and doses of insulin (DI) per day.

Analysis of ordinal categorical data becomes much more complicated when the ordinalcategorical data are multivariate in nature. Dale’s [6] paper opened the floodgate followed bya deluge of papers in this direction. Many of the early analyses were in the frequentist’s setup [see refs. 7–12]. Note that, most of the frequentist’s approaches are computationally quiteexpensive.

Angers and Biswas [13] carried out a Bayesian analysis of the 4-year follow-up data usingthe baseline data to fix the priors for different parameters. In the present paper, we first providea brief description of the methodology and the results of the analysis of the bivariate data insection 2. In section 3, we discuss a transformation of the bivariate data to an univariate one,and provide the methodology of analyzing such univariate data. Quite naturally, the techniqueof analysis will be simpler and the computational task will be easier. The results are presentedin section 4. Section 5 concludes.

2. Analysis of bivariate data

Let yLi and yRi denote the bivariate ordered categorical responses for the ith indi-vidual corresponding to the left and right eye, respectively. Note that, yLi, yRi ∈{10, 21, 31, 37, 43, 47, 53, 60, 61, 65, 71, 75, 85}. Let yL and yR be the vectors combiningyLi’s and yRi’s for all the individuals. In order to assume normality of the error terms, we addzLi and zRi to yLi and yRi , where (zLi, zRi)

T ∼ N2(0, σ 2ZIρ) with Iρ is the 2 × 2 correlation

matrix with unknown correlation ρ, and σ 2z is such that yLi ± 3σz and yRi ± 3σz will not

change categories. Let zL(zR) be the vector combining all the zLi’s (zRi’s).Let uL = yL + zL and uR = yR + zR and these uL and uR are the true values and we observe

yL and yR in place of them. We model uL and uR as follows:

u = Xθ + ε,

where

u =(

uL

uR

), X =

(X0 X1 0

X0 0 X2

), θ = (βT

0 , βT1 , βT

2 )T , ε = (εTL , εT

R )T ,

β0 are the covariate common to both eyes, β1, those specific for the left eye only and β2, thosefor the right eye only and ε ∼ N2n(0, σ 2I2n).

Dow

nloa

ded

by [

Mon

ash

Uni

vers

ity L

ibra

ry]

at 1

4:15

30

Sept

embe

r 20

13

Bivariate versus univariate ordinal data 491

For the p-component parameter vector θ , we consider

θ ∼ Np

(θ0,

σ 2

κV

), (1)

where the hyperparameters θ0 and κ(κ ≤ 1) are assumed to be known and σ 2 is unknown. Aninverted gamma prior with parameters α and β is considered for σ 2. The prior of ρ is chosento be an uniform density on the interval (−1, 1). We use the baseline data to estimate θ0. Todenote the baseline data we just put ‘ * ’ to yL, yR and X. Thus θ0 is estimated using y∗

L, y∗R

and X∗ as follows:

θ0 = (X∗T X∗)−1X∗T y∗,

where y∗ = (y∗TL , y∗T

R )T . Sensitivity analysis on κ is also needed to be done.Using standard technique, after some routine steps, it can be shown that the conditional

posteriors of θ , σ 2 and ρ are

θ |u, σ 2, ρ ∼ Np((κV−1 + XT U−1X)−1(κV−1θ0

+ XT U−1XθLS(ρ)), σ 2(κV−1 + XT U−1X−1),

σ 2|ρ, u ∼ I�

(α + n − 2

2, 0.5 × [β + rT U−1r + (θLS(ρ) − θ0)

T

× [κ−1V + (XT U−1X)−1]−1(θLS(ρ) − θ0)])

,

π3(ρ|u) ∝ 1

|D − ρE + (1 − ρ2)κV−1|1/2

×(

(1 − ρ2)

rT r − 2ρt + (1 − ρ2)[β + h(ρ)])(α+n)/2

,

where

r = u − XθLS(ρ) = (rTL, rT

R)T , θLS(ρ) = (XT U−1X)−1XT U−1u,

U =(

I ρI

ρI I

), t = rT

LrR,

D = 2XT0 X0 + XT

1 X1 + XT2 X2, and E = 2xT

0 X0 + XT1 X2 + XT

2 X1,

h(ρ) = (θLS(ρ) − θ0)T (κ−1V + (1 − ρ2)(D − ρE)−1)−1(θLS(ρ) − θ0).

Hence, given a fixed value of ρ, we can estimate θ by

θ(ρ) = (κV−1 + XT U−1X)−1(κV−1θ0 + XT U−1XθLS(ρ)).

Note that θ(ρ) does not depend on σ 2. Thus, even if σ 2 is unknown, we will obtain thesame estimator. However, θ(ρ) depends on ρ. Under the squared error loss, the estimator ofθ is given by θ = Eπ3(ρ|u)[θ(ρ)]. This expectation can be computed using the Monte Carlointegration technique. Since u depends on z, which is random, we use EM algorithm [see 14]to find θ as

θ = θ0 + Eπ3(ρ|u)[(κV−1 + XT U−1X)−1XT U−1](u − Xθ0),

where u = (1/m)m∑m

i=1 u(zi ) = y + (1/m)∑m

i=1 zi = y + z.

Dow

nloa

ded

by [

Mon

ash

Uni

vers

ity L

ibra

ry]

at 1

4:15

30

Sept

embe

r 20

13

492 J.-F. Angers and A. Biswas

In model selection (in terms of inclusion of the covariates), we are interested to test

H0: θ (2) = 0 against H1: θ (2) �= 0,

where θ = (θT(1), θ

T(2))

T , to decide whether we would include the covariates corresponding toθ (2) in the model or not. Hence, if the above H0 is true, then

θLS(ρ) =(

θLS(1)(ρ)

θLS(2)(ρ)

)∼ Np

((θ (1)

θ (2) = 0

), σ 2(XT U−1X)−1

).

Writing (XT U−1X)−1 =(

A BBT C

), we have

θLS(1)(ρ)|θLS(2)(ρ), ρ, σ 2 ∼ Npi(θ (1) + BC−1θLS(2)(ρ), σ 2F),

where F = A − BC−1BT . One needs to take expectation with respect to σ 2 and ρ, and thatcan be tackled by the Monte Carlo integration technique. Using standard technique, if θ (1) ∼Np1

(θ0(1), (σ

2/κ)V(1)

), the distribution of θLS(1) given σ 2 and ρ is

θLS(1)(ρ)|σ 2, ρ ∼ Np1(θ0(1) + BC−1θLS(2)(ρ), σ 2G),

where G = F−1 − F−1(κV−1(1) + F−1)−1F−1.

Let

m0(u) = Eπ3(ρ|u)[m0,1(θLS(1)(ρ)|θLS(2)(ρ)) · m0,2(θLS(2)(ρ))]be the marginal density of u under the null hypothesis.

Then

m0(u) = Eπ3(ρ|u)

[1

(2πσ 2)p/2|G|1/2|C|1/2

× exp

{− 1

2σ 2

[θT

LS(2)(ρ)C−1θLS(2)(ρ)

+ (θLS(1)(ρ) − θ0(1) − BC−1θLS(2)(ρ))T G−1

× (θLS(1)(ρ) − θ0(1) − BC−1θLS(2)(ρ))]}]

. (2)

Write the marginal of u under the alternative hypothesis as

m1(u) = Eπ3(ρ|u)

[ |XT U−1X|1/2

(2πσ 2)r/2

× exp

{− 1

2σ 2(θLS(ρ) − θ0)

T XT U−1X(θLS(ρ) − θ0)

}].

Then, we accept H0 if m0(u)/m1(u) > 1 and accept H1 otherwise.Several models are tried starting from one component equal to zero to all but one equal to

zero. To implement, the standardized Bayesian estimates are ordered in the following way.Writing Q(i) as the ith diagonal element of the square matrix Q, we write

μi = |θ i |Eπ3(ρ|(u))[σ 2(κV−1 + XT U−1X)−1](i) ,

and suppose μ(1) ≤ μ(2) ≤ · · · ≤ μ(p) be the ordered arrangement. The different possiblemodels are then Ml : μ(1) = · · · = μ(l) = 0 for l = 1, 2, . . . , p − 1. If ml denotes the marginal

Dow

nloa

ded

by [

Mon

ash

Uni

vers

ity L

ibra

ry]

at 1

4:15

30

Sept

embe

r 20

13

Bivariate versus univariate ordinal data 493

probability of Ml , and if Bl = ml/m0, with m0 being the marginal probability of the full model,then we accept Ml∗ as the correct model for our purpose if

Bl∗ = max0≤l≤p−1

Bl.

Note that here B0 = 1.It is to be noted that with the chosen a priori model, only numerical integration with respect

to ρ is needed. In order to evaluate θ , the Monte Carlo method with importance sampling isused. The importance sampling function used to generate the ρ values is

g(ρ) ∝ (1 − ρ2)(α+n)/2,

and 5000 iterations were made. For each value of ρ, the z is computed using 1000 iterations.In practice, it is generated from z ∼ N2(0, σ 2

ZIρ/1000).From the computation (cf. tables 9 and 10), we observe that except the constant term,

RSRBL (the effect of the retinopathy scale of the right eye of the baseline data as covariatefor the left eye of the current study), RSLBL (the effect of the retinopathy scale of the lefteye of the baseline data as covariate for the left eye of the current study) and GH came outas important covariates. In fact, Klein et al. [15] also observed GH as an important covariatefor retinopathy. Again, the same scenario was observed in the analyses of Kim [9] and Biswasand Das [16]. The analysis was carried out with these covariates only, for different κ .

3. Analysis of concatenated univariate data

One can easily understand that an analysis with the bivariate ordinal categorical data set isquite complicated and computationally expensive, too. Moreover, one has to take care of thepolychoric correlation (correlation between two categorical random variables). In the presentstudy, our objective is to see whether we can suitably transform the bivariate ordinal categoricaldata to an univariate ordinal categorical data, in the sense that we can clearly catch the majorfeatures of the data in such transformed univariate data, whose analysis is quite simpler.

The first thing to do is, of course, to make a suitable transformation, maintaining the ordinalnature and flavor. In order to do so, let

Zi = aYLi + bYRi.

We want to choose a and b such that the Zi’s are as similar as possible to the YLi’s and YRi’s.This can be achieved by maximizing the sum of the correlation coefficient between Z and YL

and that of Z and YR . It can be shown that

ρZ, YL + ρZ, YR = (1 + ρL,R)(a√

VL + b√

VR

)√

a2VL + b2VR + 2abρL,R

√VLVR

, (3)

where

VL = Var(YLi),

VR = Var(YRi),

ρL,R = Corr(YLi, YRi).

Dow

nloa

ded

by [

Mon

ash

Uni

vers

ity L

ibra

ry]

at 1

4:15

30

Sept

embe

r 20

13

494 J.-F. Angers and A. Biswas

Equation (3) is maximum when

a = b

√VR

VL

.

If we impose the constraint that Var(Zi) = (Var(YLi) + Var(YRi))/2, then we obtain

a = 1

2

√VL + VR

VL(1 + ρL,R),

b = 1

2

√VL + VR

VR(1 + ρL,R),

and ρZ,YL+ ρZ,YR

= √2(1 + ρL,R). Consequently, the ‘best’ univariate transformation is

given by

Zi = 1

2

√VL + VR

1 + ρL,R

[√VRYLi + √

VLYRi√VLVR

]. (4)

For the data set under consideration, the different quantities of equation (4) have been estimatedfrom the data and the coefficient of YLi is a = 0.5163, and the one for YRi is b = 0.5174. Con-sequently, Zi is almost equal to the average between YLi and YRi . The correlation coefficientsof Z with YL and YR are respectively ρZ,YL

= 0.9675 and ρZ,YR= 0.9672.

An alternative univariate transformation of the bivariate data is done by the experimentersthemselves. This is a retinopathy scale (RS) in which the retinopathy levels of both eyes areconcatenated into a person-level scale. The RS used in the present work is a more current onethan the one used in some earlier works by different authors. In finding RS, the worse eyeis given greater weight. The fellow eye has either the same level or a lower level. All levelsof proliferative retinopathy (60–85) are grouped together. This results in a 15-level scale:10/10, 21/<21, 21/21, 31/<31, 31/31, 37/<37, 37/37, 43/<43, 43/43, 47/<47, 47/47,53/<53, 53/53, 60+/<60+, 60+/60+, which are numbered 0 through 14. The ordinal natureis maintained by simply putting arbitrarily larger weight to the worse eye [see ref. 17]. Thecorrelation coefficients between the retinopathy scale and the other variables (YL, YR and Z)are given in table 1. Hence, Z is very similar (up to a linear transformation) to RS, since is hasa strong correlation with RS. However, the correlation of Z with YL and YR is slightly higherthan the one of RS with YL and YR . The descriptive statistics of Z (along with the ones ofYL and YR) are given in table 2. From this table, it can be seen that Z is very similar to YL

and YR .Let z be the vector of responses from all the individuals. We model z as

z = X∗γ + ε∗,

Table 1. Correlation coefficientbetween RS and YL, YR and Z.

Variables Correlation

RS and YL 0.9517RS and YR 0.9506RS and Z 0.9833

Dow

nloa

ded

by [

Mon

ash

Uni

vers

ity L

ibra

ry]

at 1

4:15

30

Sept

embe

r 20

13

Bivariate versus univariate ordinal data 495

Table 2. Descriptive statistics of Z, YL and Yr .

Z YL YR

Mean 30.571 29.549 29.599Std dev. 16.193 16.209 16.177Minimum 10.337 10 101st Quartile 16.882 21 21Median 26.882 31 313rd Quartile 38.254 37 37Maximum 73.395 71 75

where X∗ = (X0 X1 X2) and γ = (βT0 , βT

1 , βT2 )T . A normal prior, as in equation (1), for γ is

considered as

γ ∼ Np

(γ0,

σ 2

κV

),

with σ 2, κ and V may be different from the bivariate case. But in the univariate set up, nocorrelation parameter comes into consideration. Thus 0 is estimated using z∗, X

∗(baseline

data) as

γ0 = (X∗T X∗)−1X∗T z.

Using standard technique, after some routine steps, it can be shown that the conditionalposteriors of and γ and σ 2 are:

γ |z, σ 2 ∼ Np

([kV−1 + XT X

]−1 [kV−1γ0 + XT XγLS

], σ 2

[kV−1 + XT X

]−1)

,

σ 2|z ∼ I�

(α + n − 2

2, 0.5 × [

β + rT r + (γLS − γ0)T

×[k−1V +

(XT X

)−1]−1

(γLS − γ0)

]),

where γLS = (XT X)−1XT z and r = z − XγLS. Hence, under the squarederror loss, the Bayes

estimator of γ is independent of σ 2 and it is given by:

γ =[κV−1 + XT X

]−1 [κV−1γ0 + XT XγLS

]

and its posterior expected loss is σ 2[κV−1 + XT X]−1, where

σ 2 = 1

α + n − 4×

[β + rT r + (γLS − γ0)

T [κ−1V + (XT X)−1]−1(γLS − γ0)].

The model selection process is similar to section 2. However, since there is no correlationcoefficient ρ to account for, the formulae are somewhat easier. The marginal density of z under

Dow

nloa

ded

by [

Mon

ash

Uni

vers

ity L

ibra

ry]

at 1

4:15

30

Sept

embe

r 20

13

496 J.-F. Angers and A. Biswas

the hypothesis H0: θ (2) = 0 is given by:

m0(z) = �([α + p(1)/2])�(α/2)πp(1)/2|A(1)|1/2

× βα/2[β + rT

(1)r(1) + (γLS(1) − γ0(1))T A−11 (γLS(1) − γ0(1))

](α+p(1))/2,

where

p(1) = the number of components in γ(1),

A1 = �1,1 − �1,2�−12,2�

T1,2, r(1) = z − X(1)γLS(1)

and �1,1, �1,2 and �2,2, are such that

κ−1V + (XT X)−1 =(

�1,1 �1,2

�T1,2 �2,2

)

The marginal density under the full model is

m1(z) = �([α + p]/2)

�(α/2)πp/2|κ−1V + (XT X)−1|1/2

× βα/2[β + rT r + (γLS − γ0)T [κ−1V + (XT X)−1]−1(γLS − γ0)

](α+p)/2.

To choose the best model, we follow the technique discussed at the end of section 2.

4. Numerical calculations

The 4-year follow-up WESDR data, on 629 individuals, are analyzed. The details of thecomputations in the bivariate case, can be obtained in [13].

Table 3 provides the values of the correlation between the univariate summaries and theretinopathy scales. Tables 4 to 8 present the estimated values for the parameters of the fullmodel, with all the covariates, for several values of κ , for:

• the bivariate data as in [13];• the univariate data as discussed in section 3 with the left and right eyes;• the univariate data as discussed in section 3 with the minimum and maximum retinopathy

scale.• the retinopathy severity levels discussed in the introduction.

Table 3. Correlation between the univariatesummaries and the retinopathy scale.

Right RS Left RS

Univariate left\right 0.967 0.968Univariate min\max 0.967 0.967Severity Level 0.951 0.952

Dow

nloa

ded

by [

Mon

ash

Uni

vers

ity L

ibra

ry]

at 1

4:15

30

Sept

embe

r 20

13

Bivariate versus univariate ordinal data 497

Table 4. Results for the full model for κ = 1.

Bivariate Univariate Univariate SeverityParameter data left\right min\max level

Constant −19.280 −18.037 −17.837 −14.306Right ME 11.777 4.880 4.670 5.497Right RE −0.172 0.192 0.191 0.158Right IOP −0.006 −0.290 −0.289 −0.316RSRBR 0.366 0.324 0.320 0.309RSLBR 0.278 NA NA NALeft ME 13.399 12.121 11.953 12.858Left RE 0.126 −0.247 −0.245 −0.196Left IOP −0.013 0.240 0.240 0.301RSRBL 0.293 NA NA NARSLBL 0.361 0.321 0.319 0.340DuD 0.156 0.161 0.160 0.118GH 0.942 0.928 0.917 0.817SBP 0.006 0.009 0.009 0.013DBP 0.123 0.119 0.117 0.117BMI 0.340 0.356 0.352 0.260PR 0.061 0.059 0.059 0.058UP −0.268 −0.397 −0.386 −0.376DI −0.004 −0.138 −0.139 −0.519Correlation 0.786 0.819 0.819 0.816

Table 5. Results for the full model for κ = 0.75.

Bivariate Univariate Univariate SeverityParameter data left\right min\max level

Constant −19.664 −18.682 −18.482 −14.741Right ME 11.786 4.774 4.617 5.420Right RE −0.173 0.195 0.193 0.159Right IOP −0.003 −0.289 −0.288 −0.316RSRBR 0.366 0.325 0.321 0.309RSLBR 0.278 NA NA NALeft ME 13.399 12.154 11.994 12.913Left RE 0.125 −0.250 −0.248 −0.198Left IOP −0.011 0.244 0.244 0.304RSRBL 0.292 NA NA NARSLBL 0.361 0.321 0.319 0.340DuD 0.157 0.162 0.161 0.119GH 0.947 0.939 0.929 0.825SBP 0.006 0.010 0.010 0.014DBP 0.124 0.120 0.119 0.118BMI 0.342 0.360 0.356 0.263PR 0.062 0.061 0.060 0.060UP −0.290 −0.422 −0.413 −0.398DI 0.017 −0.101 −0.102 −0.495Correlation 0.786 0.819 0.819 0.816

Quite naturally, the parameters corresponding to RSRBR and RSLBR are absent in theunivariate model (represented by NA in tables 4 to 8). For the minimum and maximumretinopathy scale, the variable identified with the left eye in these tables correspond to theeye with the minimum value of the retinopathy scale, whereas the values related to the righteye correspond to the maximum values. The retinopathy severity level was rescaled in orderto have the same mean and variance as the univariate transformation of the left and righteyes. The covariates specific for each eye were still used in X1 and X2 in order to measuretheir influence separately on the univariate summary. The correlation between the univariate

Dow

nloa

ded

by [

Mon

ash

Uni

vers

ity L

ibra

ry]

at 1

4:15

30

Sept

embe

r 20

13

498 J.-F. Angers and A. Biswas

Table 6. Results for the full model for κ = 0.5.

Bivariate Univariate Univariate SeverityParameter data left\right min\max level

Constant −19.664 −19.415 −19.219 −15.229Right ME 11.786 4.665 4.546 5.341Right RE −0.173 0.197 0.195 0.160Right IOP −0.003 −0.288 −0.286 −0.316RSRBR 0.366 0.325 0.321 0.309RSLBR 0.278 NA NA NALeft ME 13.399 12.189 12.033 12.966Left RE 0.125 −0.253 −0.250 −0.200Left IOP −0.011 0.249 0.248 0.307RSRBL 0.292 NA NA NARSLBL 0.361 0.321 0.318 0.340DuD 0.157 0.163 0.162 0.119GH 0.947 0.952 0.941 0.834SBP 0.006 0.011 0.011 0.014DBP 0.124 0.121 0.120 0.119BMI 0.342 0.365 0.360 0.266PR 0.062 0.063 0.062 0.061UP −0.290 −0.451 −0.444 −0.423DI 0.017 −0.054 −0.057 −0.465Correlation 0.786 0.819 0.819 0.816

Table 7. Results for the full model for κ = 0.25.

Bivariate Univariate Univariate SeverityParameter data left\right min\max level

Constant −19.664 −20.254 −20.053 −15.788Right ME 11.786 4.555 4.466 5.259Right RE −0.173 0.200 0.197 0.161Right IOP −0.003 −0.287 −0.285 −0.315RSRBR 0.366 0.325 0.321 0.309RSLBR 0.278 NA NA NALeft ME 13.399 12.224 12.074 13.024Left RE 0.125 −0.257 −0.253 −0.202Left IOP −0.011 0.254 0.253 0.310RSRBL 0.292 NA NA NARSLBL 0.361 0.321 0.318 0.340DuD 0.157 0.164 0.163 0.120GH 0.947 0.967 0.955 0.843SBP 0.006 0.013 0.013 0.015DBP 0.124 0.123 0.122 0.120BMI 0.342 0.370 0.365 0.269PR 0.062 0.065 0.064 0.063UP −0.290 −0.483 −0.476 −0.448DI 0.017 −0.004 −0.006 −0.434Correlation 0.786 0.819 0.819 0.816

summaries and the left and right retinopathy scale are given in table 3. Table 3 shows thatthe univariate measure proposed in section 3 (either based on left and right or minimum andmaximum retinopathy scales) are more correlated with the right and left RS than the severitylevel. However, in the following analysis, all three univariate measures are considered.

From tables 4 to 8, it can be seen that the coefficient of RSRBR (RSRBL) is similar to the oneof RSLBL (RSLBR), while using the bivariate ordinal data. Hence, it can be concluded thatthe retinopathy scale of the right eye in baseline study has a similar effect of the retinopathyscale of the right eye in the current study. However, some variables are quite different such

Dow

nloa

ded

by [

Mon

ash

Uni

vers

ity L

ibra

ry]

at 1

4:15

30

Sept

embe

r 20

13

Bivariate versus univariate ordinal data 499

Table 8. Results for the full model for κ = 0.1.

Bivariate Univariate Univariate SeverityParameter data left\right min\max level

Constant −19.664 −20.819 −20.622 −16.164Right ME 11.786 4.484 4.422 5.211Right RE −0.173 0.202 0.198 0.163Right IOP −0.003 −0.286 −0.284 −0.315RSRBR 0.366 0.325 0.321 0.309RSLBR 0.278 NA NA NALeft ME 13.399 12.244 12.096 13.057Left RE 0.125 −0.259 −0.255 −0.204Left IOP −0.011 0.257 0.256 0.312RSRBL 0.292 NA NA NARSLBL 0.361 0.321 0.318 0.340DuD 0.157 0.165 0.163 0.120GH 0.947 0.977 0.965 0.850SBP 0.006 0.014 0.014 0.016DBP 0.124 0.124 0.127 0.121BMI 0.342 0.373 0.369 0.272PR 0.062 0.066 0.066 0.063UP −0.290 −0.504 −0.497 −0.466DI 0.017 0.031 0.028 −0.411Correlation 0.786 0.819 0.819 0.816

as right macular edema (Right ME) and the variable DI (doses of insulin) that even changesigns between the bivariate and the univariate analysis for κ = 0.75 and 0.5. (However, theseare very close to 0.) It can also be seen from these tables that the estimated values for thecovariates in both the approaches are not influenced by the choice of κ , mainly because thedata set is quite large.

For the full model, the correlation coefficient (last row of tables 4 to 8) between the obser-vations on both eyes and the predicted values is 0.786 for all values of κ , in case of thebivariate data. Using the univariate transformation, the correlation coefficient between theobservations and the predicted values increases to 0.819 for the Left\Right and Min\Maxunivariate measures and 0.816 for the severity level for all values of κ .

In table 9, the estimated values of the coefficients of the covariates, along with the correlationbetween the observations and the predicted values for the ‘best’ model are given. The ‘best’model is defined as the model maximizing the marginal probability of the observations. The‘best’ model for the univariate transformation involves only the baseline retinopathy scale(variable RSLBL). The estimated values does not depend on κ and it is equal to 1.130. In thisset up, the correlation coefficient is 0.736. The second best model, with a correlation of 0.745,involves the baseline retinopathy scale (0.777) and the GH (1.208) for all values of κ .

Table 9. Results for the best model chosen using the maximum marginal probabilityand the bivariate ordinal observations.

Parameter κ = 1 κ = 0.75 κ = 0.5 κ = 0.25 κ = 0.1

Constant – – – – 2.165RSRBR 0.732 0.744 – – –RSLBR 0.377 0.745 – – –RSRBL – – 1.090 1.090 0.741RSLBL 0.372 – 1.100 1.100 0.746GH – 1.173 – – 0.977DBP 0.156 – – – –Correlation 0.734 0.721 0.712 0.712 0.721

Dow

nloa

ded

by [

Mon

ash

Uni

vers

ity L

ibra

ry]

at 1

4:15

30

Sept

embe

r 20

13

500 J.-F. Angers and A. Biswas

Table 10. Results for the chosen univariate model.

Parameter κ = 1 κ = 0.75 κ = 0.5 κ = 0.25 κ = 0.1

Constant Bivariate −10.508 −10.514 −10.535 −10.548 −10.559Left\Right −11.117 −11.210 −11.307 −11.408 −11.478Min\Max −10.936 −11.041 −11.142 −11.253 −11.326

RS −10.194 −10.279 −10.361 −10.451 −10.509RSRBR Bivariate 0.715 0.715 0.715 0.715 0.715

Left\Right 0.401 0.401 0.401 0.401 0.401Min\Max 0.396 0.396 0.396 0.396 0.396

RS 0.383 0.383 0.383 0.383 0.383RSLBL Bivariate 0.723 0.723 0.723 0.723 0.723

Left\Right 0.400 0.400 0.400 0.400 0.400Min\Max 0.396 0.396 0.396 0.396 0.396

RS 0.415 0.415 0.415 0.415 0.415GH Bivariate 0.929 0.930 0.930 0.930 0.931

Left\Right 0.969 0.972 0.975 0.978 0.981Min\Max 0.956 0.959 0.963 0.966 0.969

RS 0.913 0.915 0.918 0.921 0.923DBP Bivariate 0.179 0.179 0.180 0.180 0.180

Left\Right 0.170 0.170 0.171 0.172 0.173Min\Max 0.168 0.169 0.170 0.171 0.171

RS 0.166 0.167 0.167 0.168 0.169

Since it is quite difficult to compare all possible models, the models tested are obtained inthe following way: (1) compute the posterior mean and variance for all covariate in the model;(2) compute the ratio of the posterior mean over the posterior standard deviation; (3) deletethe covariate with the smallest ratio from the model and repeat steps 1 to 3.

From Table 9, it can be seen that the choice of the best bivariate model depends heavilyon the choice of κ , whereas it is always the same for the univariate set up. However, thecovariates RSRBL, RSLBL and GH are included in almost all model, and hence we decidedto fit our model with the covariates RSRBL, RSLBL, GH, DBP and a constant term. Thismodel is adjusted in Table 10 for several values of κ . From this table, we observe that theresults do not depend much on the univariate summary used. It can be seen that the constantterm is a decreasing function of k, the coefficient of GH is an increasing one, whereas theother variables are almost constant. The results for the bivariate model is closed to the differentunivariate models except for the variables RSRBR and RSLBL. Hence, the bivariate modelgive a greater importance to the retinopathy scale of the baseline data than any of the univariatemodels considered.

The correlation between the observations and the predicted values does not depend on κ

and is equal to 0.778 for the Left\Right and Min\Max univariate summaries and 0.773 for theSeverity level.

5. Concluding remarks

The present paper is an attempt to analyze bivariate ordinal data using a general linear modelin a Bayesian framework. Two very general and flexible models are proposed, one keeping thebivariate ordinal set up, and the second one using a transformed univariate set up. The resultsdiffer, indicating that even the optimal univariate transformation may be inadequate to extractinformation from the bivariate data. The prior can be made as noninformative (or informative)by choosing suitable values for the hyperparameters κ , α and γ . The observations from thebaseline study was used to elicit the prior mean θ0 of the covariates.

Dow

nloa

ded

by [

Mon

ash

Uni

vers

ity L

ibra

ry]

at 1

4:15

30

Sept

embe

r 20

13

Bivariate versus univariate ordinal data 501

Noninformative prior for σ2 is used (α = 1 and γ = 0), and a sensitivity analysis forthe choice of κ is conducted. It can be seen that κ does not have a significant influenceon the resulting estimates. The model selection approach provides an opportunity to deal withthe significant covariates only. It is observed that although a lot of covariates were recorded,only a few of them have significant contribution to the severity of retinopathy. Note that anyother suitable standard criterion like the Bayes information criteria (BIC) could be used formodel selection. It can also be seen that the results obtained using the univariate transformationare slightly better than those using the bivariate ordinal data because it has a slightly bettercorrelation with the observation.

In the perspective of the WESDR study it could be of interest to analyze the 10 year follow-up data also, by a similar technique. But we could not access the data in the form of rawdata.

Acknowledgements

The authors wish to thank the referee and the Associate Editor for valuable suggestions, whichled to some improvement over an earlier version of the manuscript.

The work is partially supported by a grant from NSERC. A part of the work of the secondauthor was carried out when he was visiting the Département de mathématiques et de statis-tique, Université de Montréal. The second author thanks the department for its hospitality.The authors wish to thank Drs. Ronald Klein and Barbara E. K. Klein of the University ofWisconsin for providing both the baseline and 4-year follow-up data of the WESDR study.The WESDR project was originally supported, in part, by grant EY 03083 (R. Klein) from theNational Eye Institute, NIH.

References

[1] Klein, R., Klein, B.E.K., Moss, S.E., Davis, M.D. and DeMets, D.L., 1984, The Wisconsin Epidemiologic studyof diabetic retinopathy, II: prevalence and risk of diabetic retinopathy when age at diagnosis is less than 30years. Archives of Ophthalmology, 102, 520–526.

[2] Klein, R., Klein, B.E.K., Moss, S.E., Davis, M.D. and DeMets, D.L., 1984, The Wisconsin Epidemiologic studyof diabetic retinopathy, II: prevalence and risk of diabetic retinopathy when age at diagnosis is 30 or more years.Archives of Ophthalmology, 102, 527–532.

[3] Klein, R., Klein, B.E.K., Moss, S.E., DeMets, D.L., Kaufman, I. and Voss, P.S., 1984, Prevalence of diabetesmellitus in southern Wisconsin. American Journal of Epidemiology, 119, 54–61.

[4] Klein, R., Klein, B.E.K., Moss, S.E., Davis, M.D. and DeMets, D.L., 1989, The Wisconsin epidemiologic studyof diabetic retinopathy IX. Four-year incidence and progression of diabetic retinopathy when age at diagnosisis less than 30 years. Archives of Ophthalmology, 107, 237–243.

[5] Klein, R., Klein, B.E.K., Moss, S.E. and Cruickshanks, K.J., 1994, The Wisconsin epidemiologic study of dia-betic retinopathy XIV. Ten-year incidence and progression of diabetic retinopathy. Archives of Ophthalmology,112, 1217–1228.

[6] Dale, J.R., 1986, Global cross-ratio models for bivariate, discrete, ordered responses. Biometrics, 42, 909–917.[7] Williamson, J.M., Kim, K. and Lipsitz, S.R., 1995, Analyzing bivariate ordinal data using a global odds ratio.

Journal of the American Statistical Association, 90, 1432–1437.[8] Molenberghs, G. and Lesaffre, E., 1994, Marginal modeling of correlated ordinal data using a multivariate

Plackett distribution. Journal of the American Statistical Association, 89, 633–644.[9] Kim, K., 1995,A bivariate cumulative probit regression model for ordered categorical data. Statistics in Medicine,

14, 1341–1352.[10] Williamson, J. and Kim, K., 1996, A global odds ratio regression model for bivariate ordered categorical data

from ophthalmologic studies. Statistics in Medicine, 15, 1507–1518.[11] Kim, K., Lipsitz, S.R. and Williamson, J.M., 1996, Regression models for bivariate ordered categorical data

from ophthalmological studies. In: Koo, J.Y., Park, B.U., Lee, K.W., Jeon, J.W. (Eds) Collected Papers in Honorof Retirement of Professor Chung Han Yung. Seoul National University, Department of Computer Science andStatistics Alumni Association, pp. 36–55.

[12] Williamson, J., Lipsitz, S.R. and Kim, K., 1999, GEECAT and GEEGOR: computer programs for the analysisof correlated categorical response data. Computer Methods and Programs in Biomedicine, 58, 25–34.

Dow

nloa

ded

by [

Mon

ash

Uni

vers

ity L

ibra

ry]

at 1

4:15

30

Sept

embe

r 20

13

502 J.-F. Angers and A. Biswas

[13] Angers, J.-F. and Biswas, A., 2004, A Bayesian analysis of the four-year follow-up data of theWisconsinepidemiologic study of diabetic retinopathy. Statistics in Medicine, 23, 601–615.

[14] Dempster, A.P., Laird, N.M. and Rubin, D.B., 1977, Maximum likelihood estimation from incomplete data viathe EM algorithm. Journal of the Royal Statistical Society, series B, 39, 1–22.

[15] Klein, R., Klein, B.E.K., Moss, S.E., Davis, M.D. and DeMets, D.L., 1988, Glycosylated hemoglobin predictsthe incidence and progression of diabetic retinopathy. Journal of the American Medical Association, 260,2864–2871.

[16] Biswas, A. and Das, K., 2002, A Bayesian analysis of bivariate ordinal data: Wisconsin epidemiologic study ofdiabetic retinopathy revisited. Statistics in Medicine, 21, 549–559.

[17] Klein, B.E.K., Davis, M.D., Segal, P., Long, J.A., Harris, W.A., Haug, G.A., Magli, Y. and Syrjala, S., 1984,Diabetic retinopathy: assessment of severity and progression. Ophthalmology, 91, 10–17.

Dow

nloa

ded

by [

Mon

ash

Uni

vers

ity L

ibra

ry]

at 1

4:15

30

Sept

embe

r 20

13