roccurves for cohort data - haben michael · introduction: data and roc curves longitudinal...

Introduction: Data and ROC CurvesLongitudinal Dependence (ROC Curves)

Longitudinal and Pairwise Dependence (AUC)Future Work

ROC Curves for Cohort Data

Haben MichaelAdviser: Lu Tian

Summer 2017

1 / 49



Outline

Introduction: Data and ROC Curves

Longitudinal Dependence (ROC Curves)

Longitudinal and Pairwise Dependence (AUC)

Future Work

2 / 49



Yale Prospective Longitudinal Pediatric HIV Cohort (1996–2013)

I 104 patients with HIV monitored over a period of 17 years, onaverage 6.7 yo at enrollment

I median 32.5 visits/patient, median 1 visit/3 months

3 / 49



visit schedule

4 / 49



Among the measurements taken at each visit, we focus on:

I response: “blip” status, a binary variable representing atransient spike in viral load

I predictor #1: CD4 count, # CD4 cells/mm3 blood

I predictor #2: CD4 percentage, the proportion of CD4 in asmall sample

5 / 49



The goal of therapy is to keep CD4 count high and to suppressviral load, and both are tested regularly (every 3-6 months)

I viral load is the amount of HIV in a sample, indicating, e.g.,transmission risk

I CD4 count measures immunosuppression–risk of opportunisticinfections and strength of immune system (∼ 500+ normal,< 200 is an AIDS diagnosis)

I “strongest predictor of HIV progression”–US DHHS

I CD4 percentage measures the proportion of white blood cellsthat are CD4 cells

I usually more stable than CD4 count, but regarded as a poorerpredictor of disease progression

6 / 49



Even when viral load is suppressed, transient spikes (“blips”) inviral load are often observed

Blips are often ignored by clinicians, but recent research hasestablished an association with the depletion of CD4 cells

CD4 count is the currently used predictor of blip status, but CD4percentage is much cheaper to measure. How can we compare thequality of the two predictors?

7 / 49



8 / 49



ROC curve

I graphical diagnostic of the performance of a threshold-basedclassifier

I plot of TPR vs FPR for a range of thresholds

I area under curve (AUC), summary statistic between 0 and 1,probability that a non-diseased marker is less than a diseasedmarker

9 / 49



10 / 49



Suppose we want to use an ROC curve or derived statistics tocompare CD4 count and percentage as predictors of blip status.

11 / 49



12 / 49



Three types of dependence to contend with:I longitudinal dependence (a) within and (b) between a given

patient’s diseased and non-diseased observations,I common variance estimators assume iid observations within

each population, and between populations

I and (c) pairwise dependence between a given patient’spredictors, CD4 count and percent

I common methods of comparison of ROC or AUC data assumeindependent samples for each predictor

(data from different patients assumed independent)

13 / 49



patients

−−−−−−−→

(S11,

(X11

X ′11

)), (S12,

(X12

X ′12

)), . . . , (S1n1 ,

(X1n1

X ′1n1

))

(S21,

(X21

X ′21

)), (S22,

(X22

X ′22

)), . . . , (S2n2 ,

(X2n2

X ′2n2

))

...

(SN1,

(XN1

X ′N1

)), (SN2,

(XN2

X ′N2

)), . . . , (SNnN ,

(XNnN

X ′NnN

))

−−−−−−−→time/visits

14 / 49



Outline




Future Work

15 / 49



patients

−−−−−−−→

(S11,

(X11

X ′11

)), (S12,

(X12

X ′12

)), . . . , (S1n1 ,

(X1n1

X ′1n1

))

(S21,

(X21

X ′21

)), (S22,

(X22

X ′22

)), . . . , (S2n2 ,

(X2n2

X ′2n2

))

...

(SN1,

(XN1

X ′N1

)), (SN2,

(XN2

X ′N2

)), . . . , (SNnN ,

(XNnN

X ′NnN

))


16 / 49



I With non-iid data the empirical ROC curve isn’t consistentI consistent for what?

I A common approach is to fit a glm

logit{P(Sij = 1) | Xi1, . . . ,Xini} = αi + βiXij

with (αi , βi ) multivariate normal and construct an ROC curvefrom the fitted values{

(αi + βiXij ,Sij), i = 1, · · · , n; j = 1, · · · , ni}

17 / 49



{(αi + βiXij ,Sij), i = 1, · · · , n; j = 1, · · · , ni}

I Predictiveness of the biomarkers Xi1, . . . ,Xini , versus a“score” fi (Xi1, . . . ,Xini ), e.g., αi + βiXij

I This measures predictiveness of both the biomarker and themodel

I E.g., intercept term large in magnitude relative to slope term,status is nearly constant within a cluster, nearly perfect ROCcurves, regardless of quality of the biomarker Xij .

18 / 49



I It is reasonable that the analyst should exploit predictivenessof the model if that is available

I If the model is poor, the scoring system may not do justice tothe biomarker

19 / 49



Even assuming the model is correct, problems remain:

I In practice, the model is fit to the data,{(αi + βiXij ,Sij), i = 1, · · · , n; j = 1, · · · , ni

}so the predictiveness of the model will be overestimated

I Can’t set aside a test set of patients because the subjecteffects are drawn independently. Could set aside along the jdimension, which is close to what we do.

I The fit uses the entire data set, including “future”observations

I We adopt the vantage of an analyst who has a patient’shistory of biomarkers and disease statuses(Xij ,Sij), j = 1, . . . , ni , is confronted with a new biomarkerXi,j+1, and must now predict a new disease status. What isthe quality of the analyst’s prediction?

20 / 49



I The procedure seems to be measuring the predictiveness ofthe biomarker for a given patient after adequate follow-up

I We can generalize the notion of ROC curve to the cohortsetting in other ways

21 / 49



I refine notion of ROC curve

I relax parametric assumptions

I account for generalization error

I use available data–model relationship between past data andcurrent status–but no more

22 / 49



We model biomarker levels Xij as an autoregressive processconditional on disease status Sij ∈ {0, 1}, in turn modeled as atwo-state markov process

P{Sij = b|Si(j−1) = a,Hi(j−1)} = pabi , a, b ∈ {0, 1}, j = 2, · · · ,

Xij | {Sij = a,Hij} = θ(a)i + ρ0Xi(j−1) + εij , j = 1, · · · ,

P(Si1 = a) = pa and Xi0 = 0,

where

ρ0 ∈ (−1, 1)

εijiid∼ N (0, σ2

0), j = 1, · · · , ni ,

ξi =

θ

(0)i

θ(1)i

g(p01i )g(p11i )

iid∼ N

θ0

θ1

µ01

µ11

,

(Σθ 00 Σp

) , i = 1, . . . , n,

and η = (p0, θ0, θ1, µ01, µ11,Σθ,Σp) are hyperparameters23 / 49



...

... Si ,j−1

Xi ,j−1

Si ,j

Xi ,j

Si ,j+1

Xi ,j+1 ...

...

...

... Si ,j−1

Xi ,j−1

Si ,j

Xi ,j

Si ,j+1

Xi ,j+1 ...

...

. ..

. ..

...

... Si ,j−1

Xi ,j−1

Si ,j

Xi ,j

Si ,j+1

Xi ,j+1 ...

...

...

. . .

24 / 49



Random effects models are submodels:

g {P (Sij = 1 | Hij , ξi )} = g{P(Sij = 1 | Xi(j−1),Xij , Si(j−1), ξi

)}= αi0 + αi1Si(j−1) + αi2Xi(j−1) + αi3Xij

=def

Mij ,

where

αi0 =1

2σ20

{(θ

(0)i )2 − (θ

(1)i )2

}+ µ01i ,

αi1 = µ11i − µ01i ,

αi2 =1

σ20

ρ0

(θ

(0)i − θ

(1)i

),

αi3 = − 1

σ20

(θ

(0)i − θ

(1)i

).

(1)

25 / 49



Individual ROC curve

I contrast an individual’s diseased and non-diseased survivalfunctions

P(Mij > m | Sij = 1, ξi ) vs. P(Mij > m | Sij = 0, ξi )

withMij = αi0 + αi1Si(j−1) + αi2Xi(j−1) + αi3Xij

I Two CDFs imply an ROC curve (and derived statistics); arethese the correct CDFs?

I use all available history: under our model the current diseasestatus Sij given the score Mij is conditionally independent ofthe remaining data

I are functions of the available data

26 / 49



Individual ROC curve

I

P(Mij > m | Sij = 1, ξi ) vs. P(Mij > m | Sij = 0, ξi )

withMij = αi0 + αi1Si(j−1) + αi2Xi(j−1) + αi3Xij

I The ROC curve is a function of the subject effect ξi throughthe subject specific-coefficients, which must be estimated

I E.g., posterior means (empirical bayes)

ξi = E(ξi | available subject data, η)

27 / 49



The quality of the approximation to the patient’s true ROCdepends on the size of the patient’s history. When the patienthistory is too small, an alternative is the average individual ROCcurve

ROC1(u, j) = Eξ {ROC(u, j |ξ, η)} , (2)

where the expectation is taken with respect to the random effect ξ

I in practice, sample ξi | η under our model and average theempirical ROCs

ROC1(u, j) = B−1B∑

b=1

ROC(u, j | ξ, η)

I√n{ROC1(u, j)− ROC1(u, j)} converges weakly to a mean

zero gaussian process (in u), provided√n(η − η) G

28 / 49



Expected individual ROC ROC1(u|t) and population ROC ROC4(u|t1, t2) at visits t1 = 6 through t2 = 8 with95% bootstrap CI; limiting individual ROC ROC2i for four patients (pediatric HIV data).

29 / 49



Population ROC curveI Consider a time-indexed ROC curve based on the marginal

distribution of the scores Mij

ROC3(u | j) = Sj ,1(S−1j ,0 (u))

Sj ,a(u) = Pj(M·j > u | S·j = a)

30 / 49



Population ROC curve

I Conditional on the population parameter, the patient data areiid, so the same holds for Mij , i = 1, . . . ,N, j fixed

I The empirical ROC curve based on pairs(Mij ,Yij), i = 1, . . . ,N,

ROC3(u | j) = Sj ,1(S−1j ,0 (u))

is a valid measure of the predictive performance of the scoresMij , regardless of the validity of the model (beyondindependence)

31 / 49



With mild regularity conditions on the population parameter,

I ROC3(u | j) is consistent for ROC3(u | j) as the number ofpatients grows

I√n{ROC3(u | j)− ROC3(u | j)} converges to a mean zero

gaussian process (indexed by the FPR u)

I Variance can be approximated by the bootstrap

32 / 49



Expected individual ROC ROC1(u|t) and population ROC ROC4(u|t1, t2) at visits t1 = 6 through t2 = 8 with95% bootstrap CI; limiting individual ROC ROC2i for four patients (pediatric HIV data).

33 / 49



Population ROC for misspecified model

FPR 0.10 0.25 0.50 0.75t3 0.95 (0.045) 0.91 (0.065) 0.94 (0.043) 0.96 (0.027)5 0.94 (0.052) 0.93 (0.048) 0.92 (0.042) 0.93 (0.029)9 0.94 (0.045) 0.97 (0.042) 0.93 (0.048) 0.94 (0.027)

34 / 49



Outline




Future Work

35 / 49



patients

−−−−−−−→

(S11,

(X11

X ′11

)), (S12,

(X12

X ′12

)), . . . , (S1n1 ,

(X1n1

X ′1n1

))

(S21,

(X21

X ′21

)), (S22,

(X22

X ′22

)), . . . , (S2n2 ,

(X2n2

X ′2n2

))

...

(SN1,

(XN1

X ′N1

)), (SN2,

(XN2

X ′N2

)), . . . , (SNnN ,

(XNnN

X ′NnN

))


36 / 49



In considering both longitudinal and pairwise dependence weestimate a more tractable target, the AUC

I X1, . . . ,XNiid∼ FX non-diseased observations

I Y1, . . . ,YNiid∼ FY diseased observations

I AUC is P(X < Y )

I Unbiased estimator is∑

i ,j P(Xi < Yj) (Mann-WhitneyU-statistic)

37 / 49



As with the ROC, we can generalize to individual and populationAUCs

UN = P(X < Y | X and Y belong to the same cluster)

=1

N

N∑i=1

Pµi (Xij < Yij)

VN = P(X < Y )

estimators

UN =1

N

N∑i=1

1

nimi

∑j ,k

{Xij < Yik}

VN =1∑

mi∑

ni

∑i ,j

∑r ,s

{Xir < Yjs}

38 / 49



θi ∼ Unif [0, 1]

Xij | θi ∼ Unif [θi − δ, θi ]Yij | θi ∼ Unif [θi , θi + δ]

UN = 1

VN →p 1/2 + ε (N →∞)

39 / 49



Bi ∼ bernoulli

Xi1, . . . ,Xi,n−1,Yi1 | Bi

∼ Unif [2Bi − 1, 2Bi ]

UN →a.s. 1/2

VN →a.s. 1− ε (n,N →∞)

40 / 49



Expected individual AUC (UN , U′N)

I Independent sequence of bounded RVs, apply CLT

N−1/2(

(UN

U ′N

)−(UN

U ′N

)) N (0,Σ)

I Use usual covariance estimators to perform inference

41 / 49



Population AUC (VN , V′N)

I More commonly studied than individual AUC (opposite forROC)

I Almost a U-statistic, but (probably) not quite, because of therandom vector lengths as normalization

I We want a variance estimator in order to perform inference

I We could use the bootstrap, treating the clusters asindependent observations, but maybe lose too much power

42 / 49



I Obuchowski ’97 describes a variance estimator. Developedwith radiology applications in mind. The different biomarkerscorrespond to different “readers.”

I nonparametric, asymptotic estimator

I little in the way of proof in the original paper

I in practice, seems very robust

43 / 49



Gives close to the same estimate as the jackknife varianceestimator, treating the clusters as the independent observations“left out”

Replication of Obuchowski simulation with jackknifevariance estimator. (Approximately 3.6e+5 observations;status and maker correlations of 0, .4, and .8; normal andnonnormal data; 100 clusters with 2observations/cluster.)

The Obuchowski ’97 simulation was replicated whilevarying N = # of clusters, computing the maximumabsolute difference between the 2 estimators on eachreplicate. Also plotted is a fit to 1/N2.

44 / 49



The difference between the two variance estimators is O(1/N2)

Two conclusions:

1. Consistency of the Obuchowski estimator can be establishedfrom consistency of the jackknife estimator

I Arvesen ’69 establishes consistency of the jackknife varianceestimator for well-behaved functions of U-statistics

VN =N−2

∑i ,j

∑r ,s{Xir < Yjs}

(N−1∑

mi )(N−1∑

ni)

I also establishes consistency for independent, non-identicallydistributed data, matching our assumption on the patients

I minor modification of his proof needed to account forwithin-cluster dependence across diseaesed and non-diseasedobservations

45 / 49



Two conclusions:

2. might as well use bootstrap at the cluster level

46 / 49



Outline




Future Work

47 / 49



I HIV data have a natural time parameter–the visits where thepatients are treated and measured

I Other data may consist of irregularly measured markersI individual ROC curves will mean different things for different

patientsI population ROC curve loses interpretation

48 / 49



For the HIV data, are visits the “natural” time parameter?

49 / 49

roccurves for cohort data - haben michael · introduction: data and roc curves longitudinal...

Documents