

[Reprinted from RADIOLOGY, Vol. 143, No. 1, Pages 29-36, April, 1982.]
Copyright 1982 by the Radiological Society of North America, Incorporated

The Meaning and Use of the Area under a Receiver Operating Characteristic (ROC) Curve¹

James A. Hanley, Ph.D.
Barbara J. McNeil, M.D., Ph.D.

A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics, is presented. It is shown that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen nondiseased subject. Moreover, this probability of a correct ranking is the same quantity that is estimated by the already well-studied nonparametric Wilcoxon statistic. These two relationships are exploited to (a) provide rapid closed-form expressions for the approximate magnitude of the sampling variability, i.e., standard error that one uses to accompany the area under a smoothed ROC curve, (b) guide in determining the size of the sample required to provide a sufficiently reliable estimate of this area, and (c) determine how large sample sizes should be to ensure that one can statistically detect differences in the accuracy of diagnostic techniques.

Index terms: Receiver operating characteristic curve (ROC)

Radiology 143:29-36, April 1982

¹ From the Department of Epidemiology and Health, McGill University, Montreal, Canada (J.A.H.), and the Department of Radiology, Harvard Medical School and Brigham and Women's Hospital, Boston, MA (B.J.M.). Received March 18, 1981; accepted and revision requested July 21; revision received Dec. 15. Supported in part by the Hartford Foundation and the National Center for Health Care Technology. See also the articles by Hessel et al. (pp. 129-133) and Abrams et al. (pp. 121-128) in this issue.

Receiver operating characteristic (ROC) curves (1-3) have become increasingly popular over the past few years as the radiologic community has become concerned with the measurement of the information content of a variety of imaging systems (4, 5). In addition, ROC curves are being used to judge the discrimination ability of various statistical methods that combine various clues, test results, etc. for predictive purposes (e.g., to determine whether a patient will need hospitalization or will benefit from treatment). In most cases, however, ROC curves have been plotted and evaluated qualitatively with relatively little attention paid to their statistical characteristics. This has occurred for a number of reasons.

First, the most common quantitative index describing an ROC curve is the area under it, and there has not been a full description in the radiologic literature of an intuitive meaning of this area. Second, most quantitative measures used to describe an ROC curve are derived assuming that the varying degrees of normality/abnormality seen in the images can be represented by two separate but usually overlapping Gaussian distributions, one for the diseased group and one for the nondiseased group. This has led a number of investigators to question the validity of the ROC approach altogether. Third, iterative numerical methods rather than closed-form expressions are required to estimate the standard error of these quantitative indices; operationally, these methods are cumbersome and require a specialized computer program. Fourth, there has been no method shown to date to indicate the sample size necessary to ensure a specified degree of statistical precision for a particular quantitative index.

In this paper, we will elaborate on the meaning of the area under an ROC curve and, using the links between it and other, better known statistical concepts, we will develop analytic techniques for eliciting its statistical properties.

METHODS

Indices Used to Summarize ROC Curves

A large number of theoretically based measures has been proposed to reduce an entire ROC curve to a single quantitative index of diagnostic accuracy; all of these measures have been rooted in the assumption that the functional form of the ROC curve is the same as that implied by supposing that the underlying distributions for normal and abnormal groups are Gaussian (4). When an ROC curve, plotted on double probability paper, is fitted by eye to a straight line or when the ROC points are submitted to an iterative maximum likelihood estimation program, two parameters, one a difference of

means and the other a ratio of variances, are obtained. From these, a number of indices can be calculated, the most popular being an estimate of the area under the fitted smooth curve (4). This index, denoted A(z) to symbolize its Gaussian underpinnings, varies from 0.5 (no apparent accuracy) to 1.0 (perfect accuracy) as the ROC curve moves towards the left and top boundaries of the ROC graph.² When one fits the two parameters by maximum likelihood rather than by eye, one also obtains their standard errors, thereby allowing the area derived from the two parameters to be also accompanied by a standard error. This can be used to construct confidence intervals and to perform statistical tests of significance.

² An area can also be calculated by the trapezoidal rule; the area obtained in this way has been designated P(A). As is seen in Figure 1, P(A) is smaller than the area under any smooth curve, and is somewhat more sensitive to the location and spread of the points defining the curve than is the area A(z) calculated as the smooth Gaussian estimate.

The Meaning of the Area under an ROC Curve

A precise meaning of the area under an ROC curve in terms of the result of a signal detection experiment employing the two-alternative forced choice (2AFC) technique has been known for some time. In this system, Green and Swets (6) showed that the area under the curve corresponds to the probability of correctly identifying which of the two stimuli is "noise" and which is "signal plus noise." In medical imaging studies, the more economical rating method is generally used: images from diseased and nondiseased subjects are thoroughly mixed, then presented in this random order to a reader who is asked to rate each on a discrete ordinal scale ranging from definitely normal to definitely abnormal. Very often a five-category scale is used. Although the points required to produce the ROC curve are obtained in a more indirect way, i.e., by successively considering broader and broader categories of abnormal (e.g., category 5 alone, categories 5 plus 4, categories 5 plus 4 plus 3), the important point is that on a conceptual level, and thus from a statistical viewpoint, the area under the curve obtained from a rating experiment has the same meaning as it has when it is derived from a 2AFC experiment. As we will explain below, the ROC area obtained from a rating experiment can be viewed, at least conceptually, in the same way as the area obtained from a 2AFC experiment. Basically, when an investigator calculates the area under the ROC curve directly from a rating experiment, he is in fact, or at least in mathematical fact, reconstructing random pairs of images, one from a diseased subject and one from a normal subject, and using the reader's separate ratings of these two images to simulate what the reader would have decided if these two images had in fact been presented together as a pair in a 2AFC experiment. Indeed, this mathematical equivalence (equivalent in the sense that the two areas are measuring the same quantity, even if the two curves are constructed differently) has also been verified empirically in a recognition memory experiment by Green and Moses (7).

More important, however, was the more recent recognition by Bamber (8) that this "probability of correctly ranking a (normal, abnormal) pair" is intimately connected with the quantity calculated in the Wilcoxon or Mann-Whitney statistical test. We now elaborate on this relationship in two ways. First, we show empirically that if one performs a Wilcoxon test on the ratings given to the images from the normal and diseased subjects, one obtains the same quantity as that obtained by calculating the area under the corresponding ROC curve using the trapezoidal rule. (If in fact the ratings are on a continuous scale, the area obtained by the Wilcoxon statistic or by the trapezoidal rule will be virtually identical to any smoothed area.) Second, and more important, we show how the statistical properties of the Wilcoxon statistic can be used to predict the statistical properties of the area under an ROC curve.

RESULTS

I. A Three-Way Equivalence

To amplify the three-way equivalence between the area under an ROC curve, the probability of a correct ranking of a (normal, abnormal) pair, and the Wilcoxon statistic, we present it as two pairwise relationships:

A. The area under the ROC curve measures the probability, denoted by $\theta$, that in randomly paired normal and abnormal images, the perceived abnormality of the two images will allow them to be correctly identified.

B. The Wilcoxon statistic also measures this probability $\theta$ that randomly chosen normal and abnormal images will be correctly ranked.

We now deal with A and B in turn.

TABLE I: Rating of 109 CT Images

                                      Rating
True         Definitely  Probably  Question-  Probably  Definitely
Disease      Normal      Normal    able       Abnormal  Abnormal
Status       (1)         (2)       (3)        (4)       (5)          Total

Normal       33          6         6          11        2            58 = n_N
Abnormal      3          2         2          11        33           51 = n_A

Totals       36          8         8          22        35           109

II. Mathematical Restatement of Relationship A

We make an implicit assumption that the sensory information conveyed by a radiographic image can be quantified by and ordered on a one-dimensional scale represented by $x$, with low values of $x$ favoring the decision to call the image normal and high values favoring the decision to call it abnormal. The distributions of $x$ values for randomly selected abnormal images, denoted by $x_A$, and those for normal images, denoted by $x_N$, will overlap; the $x_A$ distribution will be centered to the right of the $x_N$ one. In a rating experiment, the degree of suspicion, $x$, will actually be reported on an ordered categorical scale. TABLE I presents illustrative data showing how a single reader rated the computed tomographic (CT) images obtained in a sample of 109 patients with neurological problems. As expected, the $x_A$ and $x_N$ distributions overlap (i.e., some nondiseased patients had abnormal readings and some diseased patients had normal readings). Thus, if the images from a randomly chosen normal and a randomly chosen abnormal case were paired, there would be less than a 100% probability that the sensory information


or, for want of a more precise term, the degree of suspicion, $x_A$, which one obtains from the abnormal image would in fact be greater than the corresponding $x_N$ obtained from the normal image. As a statistical shorthand, we refer to this probability as $\theta = \mathrm{Prob}(x_A > x_N)$. Then Green and Swets's result says that if we assume for the moment that we have an infinite sample of patients and that a reader is capable of reporting using the entire $x$ continuum rather than only a finite number of category ratings, the area under the curve and the probability of a correct ranking are equal, or

$$\text{``True'' area under ROC curve} = \theta = \mathrm{Prob}(x_A > x_N)$$

From our viewpoint, the most important feature of the proof, which depends on integral calculus, is that it makes no assumptions about the form of the $x_A$ and $x_N$ distributions. Thus, the area under the curve can be thought of simply as measuring the probability of a correct ranking of a (normal, abnormal) pair: neither the nature (symmetric, skewed right, skewed left) nor the exact distributional form (Gaussian, negative exponential, gamma) need be explicitly quantified. However, there are practical reasons for the use of distributional assumptions: (a) the maximum of five or six rating categories a reader is capable of using means that the trapezoidal rule will tend to underestimate the area under what is in reality a smooth ROC curve; (b) the criteria for fitting a smooth curve are more easily agreed upon; and (c) the investigator is often interested in other aspects of the ROC curve, such as the trade-offs between sensitivity and specificity. Moreover, the extensive data from psychophysical and medical imaging studies tend to agree reasonably well with the ROC curve form implied by two Gaussian distributions.

For investigators interested in using ROC curves to describe the discrimination achieved with scores or probabilities constructed on a continuous scale from regression-type equations based on a patient's presenting symptomatology, the distributions of these scores or probabilities will not necessarily conform to Gaussian distributions. However, in this situation, the more continuous nature of the score or probability scale means that the empirical ROC curve will also be much smoother, and complex curve fitting will probably not be necessary.

III. Relationship B: The Wilcoxon Statistic and the Probability of Correct Pairwise Rankings

The Wilcoxon statistic, W, is usually computed to test whether the levels of some quantitative variable $x$ in one population (A) tend to be greater than in a second population (N), without actually assuming how the $x$'s are distributed in the two populations. The null hypothesis is that $x$ is not a useful discriminator, i.e., that an $x$ value from an individual from A is just as likely to be smaller than an $x$ value from an individual from N as it is to be greater than it, or that $\theta = \mathrm{Prob}(x_A > x_N) = 0.5$. For $x$ to be a good discriminator, this probability has to be much closer to unity. With a sample of size $n_A$ from A and $n_N$ from N, the procedure, at least conceptually, consists of making all $n_A \cdot n_N$ possible comparisons between the $n_A$ sample $x_A$'s and the $n_N$ sample $x_N$'s, scoring each comparison according to the rule (discrete data only)

$$S(x_A, x_N) = \begin{cases} 1 & \text{if } x_A > x_N \\ \tfrac{1}{2} & \text{if } x_A = x_N \\ 0 & \text{if } x_A < x_N \end{cases}$$

and averaging the $S$'s over the $n_A \cdot n_N$ comparisons, i.e.,

$$W = \frac{1}{n_A \cdot n_N} \sum_{A} \sum_{N} S(x_A, x_N)$$

In practice, the computation can be performed by a much faster method to be described below in section IV. Also, since the test is based on yes/no comparisons, W does not depend on the actual values of the $x$'s but only on their rankings.

Relationship B should now be obvious from the very formulation of W, since it actually makes the kind of comparisons mentioned when describing $\theta = \mathrm{Prob}(x_A > x_N)$. Since each comparison is scored as 1, 1/2, or 0, the average score W lies between 0 and 1 and reflects, as it should, what proportion of the $x_A$'s are greater than what proportion of the $x_N$'s. Obviously not all $n_A \cdot n_N$ comparisons are independent; including them all is merely a convenience, and the standard error of W takes these interrelated comparisons into account.

TABLE II: Computation of W and Its Standard Error

                                                          Column (Rating)
Row  Contents                                     x = 1      x = 2    x = 3    x = 4       x = 5     Total
1    Number of normals rated x                    33         6        6        11          2         58 = n_N
2    Number of abnormals rated > x                48         46       44       33          0
3    Number of abnormals rated x                  3          2        2        11          33        51 = n_A
4    Number of normals rated < x                  0          33       39       45          56
5    (1) x (2) + 1/2 x (1) x (3)                  1,633 1/2  282      270      423 1/2     33        2,642
6    (3) x [(4)^2 + (4) x (1) + 1/3 x (1)^2]      1,089      2,598    3,534    28,163 2/3  107,228   142,612 2/3
7    (1) x [(2)^2 + (2) x (3) + 1/3 x (3)^2]      80,883     13,256   12,152   16,415 2/3  726       123,432 2/3

Rows 1 and 3 are obtained from TABLE I. Row 2 is obtained from row 3 by successive subtractions from n_A = 51. Row 4 is obtained from row 1 by successive additions to 0.

$$W = \hat\theta = \text{Total (5)} \div (n_N \cdot n_A) = 2{,}642 \div (58 \cdot 51) = 0.893 = 89.3\%$$

$$Q_2 = \text{Total (6)} \div (n_A \cdot n_N^2) = 0.8313 \qquad Q_1 = \text{Total (7)} \div (n_N \cdot n_A^2) = 0.8182$$

$$SE(\hat\theta) = \sqrt{\frac{\hat\theta(1-\hat\theta) + (n_A - 1)(Q_1 - \hat\theta^2) + (n_N - 1)(Q_2 - \hat\theta^2)}{n_A \cdot n_N}} = 0.032 = 3.2\%$$

Although, as stated above, W is usually used to test the (null) hypothesis

that variable $x$ cannot be used to discriminate between A and N (i.e., that $\theta$ equals 0.5), its behavior when $\theta$ exceeds 0.5 (i.e., when $x$ is actually of discriminatory value) is also well established. In that regard, for our purposes, the most important characteristic is its standard error, since our main interest is in quantifying how variable W (or its new alias, the area under the ROC curve) will be in different similar-sized samples. When $\theta > 0.5$, W is no longer nonparametric; its standard error, SE(W), depends on two distribution-specific quantities, $Q_1$ and $Q_2$, which have the following interpretation:

$Q_1$ = Prob (two randomly chosen abnormal images will both be ranked with greater suspicion than a randomly chosen normal image)

$Q_2$ = Prob (one randomly chosen abnormal image will be ranked with greater suspicion than two randomly chosen normal images)

If we assume for the moment (as Green and Swets do in their proof regarding $\theta$ and the area under the ROC curve) that the ratings are on a scale that is sufficiently continuous that it does not produce "ties," then SE(W), or equivalently SE(area underneath empirical ROC curve), can be shown (8) to be

$$SE(W) = \sqrt{\frac{\theta(1-\theta) + (n_A - 1)(Q_1 - \theta^2) + (n_N - 1)(Q_2 - \theta^2)}{n_A \cdot n_N}} \qquad (1)$$

The quantity W can be thought of as an estimate of $\theta$, the "true" area under the curve, i.e., the area one would obtain with an infinite sample and a continuous rating scale. In the rating category situation, of course, W will tend to underestimate $\theta$, but Formula 1 will be useful nevertheless.
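As a rough illustration of Formula 1, the Python sketch below assumes the formula has the form SE(W) = sqrt{[θ(1 − θ) + (n_A − 1)(Q1 − θ²) + (n_N − 1)(Q2 − θ²)] / (n_A · n_N)}. The function name `se_area` and its argument names are illustrative and do not come from the paper. The usage lines rest on a further assumption: with continuous ratings and no discrimination, θ = 1/2 and Q1 = Q2 = 1/3, in which case the expression should reduce to the familiar null-hypothesis standard error of the Wilcoxon statistic, sqrt[(n_A + n_N + 1) / (12 · n_A · n_N)].

```python
import math


def se_area(theta, q1, q2, n_a, n_n):
    """Standard error of W under the assumed form of Formula 1.

    theta  : anticipated or estimated area under the ROC curve
    q1, q2 : the two distribution-specific probabilities Q1 and Q2
    n_a    : number of abnormal cases
    n_n    : number of normal cases
    """
    variance = (
        theta * (1 - theta)
        + (n_a - 1) * (q1 - theta ** 2)
        + (n_n - 1) * (q2 - theta ** 2)
    ) / (n_a * n_n)
    return math.sqrt(variance)


# Sanity check (assumption: continuous ratings, no discrimination).
n_a, n_n = 51, 58
null_se = se_area(0.5, 1 / 3, 1 / 3, n_a, n_n)
classic = math.sqrt((n_a + n_n + 1) / (12 * n_a * n_n))
print(null_se, classic)  # the two values should agree
```

Keeping Q1 and Q2 as explicit arguments lets the same function serve both uses discussed in the text: values estimated directly from data, or values predicted from an assumed distribution.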

We now go on in the following section to calculate the Wilcoxon statistic for the data in TABLE I and to show that it does indeed correspond to an area under the ROC curve, albeit the area found by the trapezoidal rule. We also carry out the computations required to estimate directly from the data (i.e., without any distributional assumptions) the two quantities $Q_1$ and $Q_2$. With these, and using W as an estimate of $\theta$, we then use Formula 1 to estimate a standard error for what is in this case a somewhat biased area under the curve. As will become evident in subsequent sections, there is a second method, requiring fewer computations, for estimating $Q_1$ and $Q_2$ for use in Formula 1. We give both methods for the sake of completeness.
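The agreement between the Wilcoxon statistic and the trapezoidal area can be checked numerically. The sketch below is an illustration rather than the authors' procedure: it scores every (abnormal, normal) pair implied by the TABLE I counts as 1, 1/2, or 0, and separately builds the empirical ROC points by accumulating categories from rating 5 downward. The function and variable names are assumptions made for the example.

```python
# Counts from TABLE I, ratings 1 (definitely normal) to 5 (definitely abnormal).
normal = [33, 6, 6, 11, 2]
abnormal = [3, 2, 2, 11, 33]


def wilcoxon_w(normal, abnormal):
    """Average score over all (abnormal, normal) pairs: 1, 1/2, or 0 per pair."""
    total = 0.0
    for x_a, count_a in enumerate(abnormal, start=1):
        for x_n, count_n in enumerate(normal, start=1):
            if x_a > x_n:
                score = 1.0
            elif x_a == x_n:
                score = 0.5
            else:
                score = 0.0
            total += score * count_a * count_n
    return total / (sum(normal) * sum(abnormal))


def trapezoid_area(normal, abnormal):
    """Area under the empirical ROC curve by the trapezoidal rule."""
    n_n, n_a = sum(normal), sum(abnormal)
    fpf = tpf = 0.0  # false- and true-positive fractions at the current cutoff
    area = 0.0
    # Broaden the definition of "abnormal" one category at a time: 5, 5+4, ...
    for count_n, count_a in zip(reversed(normal), reversed(abnormal)):
        new_fpf = fpf + count_n / n_n
        new_tpf = tpf + count_a / n_a
        area += (new_fpf - fpf) * (tpf + new_tpf) / 2
        fpf, tpf = new_fpf, new_tpf
    return area


w = wilcoxon_w(normal, abnormal)
area = trapezoid_area(normal, abnormal)
print(round(w, 3), round(area, 3))
```

Both routes weight each normal case by the fraction of abnormal cases rated above it plus half the fraction tied with it, which is why the two numbers coincide.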

IV. W and SE(W) Calculated without Distributional Assumptions

We illustrate the calculations using the data from TABLE I. Since the Wilcoxon statistic is based on pairwise comparisons, the specific values 1 through 5 that we have applied to the five rating categories are to be thought of simply as rankings. The computations can be conveniently carried out according to the scheme shown in TABLE II.³ Rows 3 and 1 are taken directly from TABLE I, while rows 2 and 4 are derived from 3 and 1 by successive deletion and cumulation respectively. The quantity W can be computed in row 5 by using the entries in rows 1, 2, and 3; the SE requires calculation of the two intermediate probabilities $Q_1$ and $Q_2$ (see rows 6 and 7 for details), which are then used to compute an estimate of SE(W) from Formula 1.
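The row-by-row scheme can be sketched in Python as follows. This is an illustrative reading of TABLE II, not code from the authors: rows 2 and 4 are kept as running counts, rows 5 to 7 are accumulated with exact fractions, and the final step assumes Formula 1 has the form SE(W) = sqrt{[θ(1 − θ) + (n_A − 1)(Q1 − θ²) + (n_N − 1)(Q2 − θ²)] / (n_A · n_N)}. All names are hypothetical.

```python
import math
from fractions import Fraction

# Counts from TABLE I, ratings 1 to 5.
normal = [33, 6, 6, 11, 2]      # row 1: normals rated x
abnormal = [3, 2, 2, 11, 33]    # row 3: abnormals rated x


def table_ii(normal, abnormal):
    """Return (W, Q1, Q2) following the row scheme of TABLE II."""
    n_n, n_a = sum(normal), sum(abnormal)
    below = 0        # row 4: normals rated < x (successive additions to 0)
    above = n_a      # row 2 after subtraction: abnormals rated > x
    total5 = total6 = total7 = Fraction(0)
    for n_x, a_x in zip(normal, abnormal):
        above -= a_x  # successive subtractions from n_A
        total5 += n_x * above + Fraction(1, 2) * n_x * a_x
        total6 += a_x * (below ** 2 + below * n_x + Fraction(1, 3) * n_x ** 2)
        total7 += n_x * (above ** 2 + above * a_x + Fraction(1, 3) * a_x ** 2)
        below += n_x
    w = total5 / (n_a * n_n)
    q2 = total6 / (n_a * n_n ** 2)
    q1 = total7 / (n_n * n_a ** 2)
    return float(w), float(q1), float(q2)


w, q1, q2 = table_ii(normal, abnormal)
n_a, n_n = sum(abnormal), sum(normal)
# Assumed form of Formula 1.
se = math.sqrt(
    (w * (1 - w) + (n_a - 1) * (q1 - w ** 2) + (n_n - 1) * (q2 - w ** 2))
    / (n_a * n_n)
)
print(round(w, 3), round(q1, 4), round(q2, 4), round(se, 3))
```

Exact fractions are used for the row totals only so that the thirds appearing in rows 6 and 7 are not rounded before the final division.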

TABLE II shows the detailed calculation of W and its standard error. The W = $\hat\theta$ = 0.893 = 89.3% derived in this way agrees exactly with the area under the ROC curve calculated by the trapezoidal rule. By way of comparison, the area under the smooth Gaussian-based ROC curve fitted by the maximum likelihood technique of Dorfman and Alf (9) is 0.911 or 91.1%; the area under the smooth ROC curve derived from the parameters of a straight-line fit to the ROC plotted on double probability paper (see Swets [4], pp. 114-115) is 0.905 or 90.5%. The slightly lower estimate provided by W, or equivalently by the trapezoidal rule, merely reflects the fact that the rating scale does not have infinitely fine "grain." In another context, where the ratings might have been expressed on a more continuous scale (i.e., without ties), the two would agree even better. What is more important is that TABLE II, using 89.3% as its estimate of $\theta$, produces a standard error of 3.2%, compared with the SE of 2.96% predicted by the maximum likelihood parametric technique. Although this 3.2% appears to be a little high, it is not greatly so; moreover it is on the conservative side, and guards against the possibility that the distributional assumptions that produced the SE of 2.96% are not entirely justified.

³ The rationale for this scheme is conceptually simple but lengthy. Details can be obtained from the authors.

The Wilcoxon statistic now provides a useful tool for the researcher who does not have access to the computer program described above, but who still wishes to use as an index of discrimination the area under a smooth ROC curve and to accompany it by an approximate standard error. He can use the parameters of the straight-line fit to produce the smooth ROC curve and the area under it and he can use SE(W) as a slightly conservative estimate of the SE of this smoothed area. In our example, simply by plotting the data on double probability paper, estimating a slope and intercept from a straight line fit, calculating from these a quantity that Swets (4) calls z(A), and looking z(A) up in Gaussian probability tables, one obtains a smoothed area of 90.5%, which is only 0.6% (in absolute terms) or 0.66% (in relative terms) different from the 91.1% obtained by a full maximum likelihood fit. By an equally straightforward approach, one can use the calculations in TABLE II to come reasonably close (3.2% compared with 2.96%) to the standard error produced by the maximum likelihood approach. As we will see later in section V, we will be able to improve even further on predicting the SE produced by this method.
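The table lookup described above can be mimicked in a few lines of Python. The sketch rests on an assumption the text does not spell out: that the fitted line on double probability paper is written as z(true positive) = intercept + slope · z(false positive), and that z(A) is then intercept / sqrt(1 + slope²), the usual binormal relation. The intercept and slope used in the usage lines are hypothetical values chosen to reproduce an area of 0.905; they are not the fitted values from the paper.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def smoothed_area(intercept, slope):
    """Area under a binormal ROC curve from a straight-line fit.

    Assumes the line is z(TPF) = intercept + slope * z(FPF); this axis
    convention is an assumption, not something stated in the text.
    """
    z_a = intercept / math.sqrt(1 + slope ** 2)
    return STANDARD_NORMAL.cdf(z_a)  # replaces the Gaussian table lookup


# Hypothetical line with unit slope whose area is 0.905.
z_for_905 = STANDARD_NORMAL.inv_cdf(0.905)
area = smoothed_area(intercept=z_for_905 * math.sqrt(2), slope=1.0)

# Difference between the straight-line area and the maximum likelihood area.
absolute_diff = 0.911 - 0.905
relative_diff = absolute_diff / 0.911
print(round(area, 3), round(100 * absolute_diff, 1), round(100 * relative_diff, 2))
```

The last two printed numbers correspond to the absolute and relative differences quoted in the paragraph above.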

All of the discussion thus far has centered on computing an area and its SE from observed data; we now turn to the commonly asked question, "How big a sample do I need?"

V. Planning Sample Sizes

Perhaps the more important use of the three-way equivalence between the area under the curve, the probability of a correct ranking, and the Wilcoxon statistic is in pre-experiment calculations. At this stage, one is often asked: "We wish to use $\theta$, the area under the ROC curve, to describe the performance of an imaging system, and we would like to have this index accompanied by a measure of its uncertainty, i.e., of the fluctuations in the index caused by the random sampling of cases. How many cases must be studied to ensure an acceptable level of precision?" This is equivalent to asking how large $n_A$ and $n_N$ must be so that the resulting SE is of a reasonably small magnitude, and the resulting confidence interval is correspondingly narrow.
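The planning question can be explored numerically once Q1 and Q2 are tied to θ. The Python sketch below assumes the negative-exponential expressions that this section gives as Formula 2, Q1 = θ ÷ (2 − θ) and Q2 = 2θ² ÷ (1 + θ), together with the form of Formula 1 used for SE(W). The search for the smallest equal group size is an illustrative addition, and the function names, the `max_n` limit, and the 3.5% target are hypothetical.

```python
import math


def planned_se(theta, n_a, n_n):
    """Anticipated SE of the area, with Q1 and Q2 from the assumed Formula 2."""
    q1 = theta / (2 - theta)
    q2 = 2 * theta ** 2 / (1 + theta)
    variance = (
        theta * (1 - theta)
        + (n_a - 1) * (q1 - theta ** 2)
        + (n_n - 1) * (q2 - theta ** 2)
    ) / (n_a * n_n)
    return math.sqrt(variance)


def cases_per_group(theta, target_se, max_n=100_000):
    """Smallest n (with n_A = n_N = n) whose anticipated SE meets the target."""
    for n in range(2, max_n + 1):
        if planned_se(theta, n, n) <= target_se:
            return n
    raise ValueError("target SE not reached within max_n cases per group")


# Anticipated area of 85%, as in the paper's worked example.
print(round(100 * planned_se(0.85, 40, 40), 2))  # SE in percent, 40 per group
print(round(100 * planned_se(0.85, 60, 60), 2))  # SE in percent, 60 per group
n_needed = cases_per_group(0.85, target_se=0.035)
print(n_needed)
```

A linear search is used only because it is easy to verify; the anticipated SE shrinks steadily as n grows, so any root-finding method would give the same answer.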

In addition to the quantities $n_A$ and

$n_N$, Formula 1 contains three other parameters: $\theta$, $Q_1$, and $Q_2$. While one can use anticipated values of the true area $\theta$, the quantities $Q_1$ and $Q_2$ are complex functions of the underlying distributions for $x_A$ and $x_N$. Fortunately, for any specified pair of distributions, Formula 1 is almost entirely determined by $\theta$, and only very slightly influenced by any further parameters of the distributions. As an example, Figure 2 shows how little the relationship between $SE(\hat\theta)$ and $\theta$ changes as one postulates underlying Gaussian, gamma, or negative exponential distributions. Moreover, it seems that in the range of interest (areas of 80% or more), the negative exponential model yields SE's that are slightly more conservative than the other models considered. This is especially fortunate, since under this model, the quantities $Q_1$ and $Q_2$ can be expressed as simple functions of $\theta$, i.e.,

$$Q_1 = \theta \div (2 - \theta), \qquad Q_2 = 2\theta^2 \div (1 + \theta) \qquad (2)$$

When these expressions are substituted into Formula 1, we obtain the SE to be expected; at any anticipated level of performance $\theta$, one can vary the $n$'s until $SE(\hat\theta)$ is sufficiently small. For example, consider an experiment where the diagnostic accuracy is expected to be in the neighborhood of $\theta$ = 85%; then $Q_1$ = 0.85 ÷ 1.15 = 0.7391 and $Q_2$ = 2(0.85)² ÷ 1.85 = 0.7811; then with $n_A = n_N = 40$, Formula 1 predicts an SE of 4.37%, while $n_A = n_N = 60$ will reduce the SE to 3.56%.

⁴ This distribution-based approach can be used not only to plan a future SE but also, as mentioned at the end of section III, to provide a second method of calculating an SE for observed data. For example, the data discussed in section IV produced an area of 0.905 or 90.5%. Using this $\theta$ in Equation 2, we obtain $Q_1$ = 0.8265 and $Q_2$ = 0.8599. Substituting these three values into Equation 1 gives an SE of 0.0307 or 3.07%, even closer to the 2.96% produced by maximum likelihood estimation.

[Figure 1: ROC curve for data in TABLE I. Dashed line = empirical curve; solid line = smoothed (Gaussian-based) curve; dotted diagonal line = no discrimination.]

[Figure 2: Anticipated standard error for area under ROC curve generated from different underlying distributions, plotted against the "true" area under the ROC curve. Circles and triangles = Gaussian with two different variance ratios; squares = gamma with various degrees of freedom; solid line = negative exponential.]

Figure 3 gives $SE(\hat\theta)$ for various sample sizes and various anticipated $\theta$'s. A number of points should be noted:

1. As one expects with SE's, they vary inversely with $\sqrt{n}$, so that, for example, one must quadruple the sample size to halve the SE.

2. The SE's are smallest for very high $\theta$'s, i.e., those close to 1.

3. The SE's are slightly more conservative than those obtained under the Gaussian model (FIG. 2).

4. One must resort to Formulas 1 and 2 to calculate SE's when the number of normal cases $n_N$ does not equal the number of abnormal cases $n_A$.

[Figure 3: Standard error (SE) for estimated area under ROC curve ($\hat\theta$) in relation to sample size ($n_A$ = number of abnormal cases) and true area under ROC curve ($\theta$). Calculations assume an equal number ($n_N$) of normal cases.]

In passing, it is of interest to compare the $SE(\hat\theta)$ of approximately 3% obtained from our previously mentioned rating experiment example with what would have been obtained from an equivalent 2AFC experiment. In the latter, one would estimate $\theta$ simply by calculating the fraction $\hat\theta$ of $n$ pairs of images where the normal and abnormal image were correctly identified, and one would accompany this estimate of $\theta$ with a standard error, based on the binomial distribution, of $[\theta(1 - \theta) \div n]^{1/2}$. To achieve a standard error of 0.03 or 3%, and assuming that in fact in the 2AFC experiment $\hat\theta$ turned out to be 0.9 or 90%, one would need $n$ = 100 pairs of (normal, abnormal) images, a considerably greater number of images ($2n$ or 200) than the $n_N + n_A$ = 109 used in the rating experiment. For purposes of precision (low SE) in measuring $\theta$, we can think of the $2n$ images ($n$ pairs) as the statistical equivalent of our 109. Thus, even if one were interested only in estimating the overall percentage of patients that would be correctly classified by a medical imaging system, the rating method would be the method of choice. In addition to being more economical (i.e., requiring fewer patients), it yields valuable data on the two separate components of diagnostic accuracy, namely, sensitivity and specificity.

For these and other reasons, the 2AFC method is not a serious competitor to the rating method in medical imaging experiments. However, the formal statistical ties between the two methods do help in seeing how to use $SE(\hat\theta)$ derived from a rating experi-

ooL oso ooJ") 'ooo

a.D ooo

l:::, ]"",

f

o

m

om

?.f

o6

o

1 arn81g
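As a rough numerical companion to Equation 2 and Formula 1, the following Python sketch computes the anticipated standard error for a given θ and pair of sample sizes. The function names are hypothetical, and Formula 1 is written here in the form SE = √{[θ(1 − θ) + (nA − 1)(Q1 − θ²) + (nN − 1)(Q2 − θ²)] / (nA·nN)}, which is our reading of the formula given earlier in the paper and should be treated as an assumption.

```python
import math


def q_terms(theta):
    """Q1 and Q2 under the negative exponential model (Equation 2)."""
    q1 = theta / (2.0 - theta)
    q2 = 2.0 * theta ** 2 / (1.0 + theta)
    return q1, q2


def anticipated_se(theta, n_abnormal, n_normal):
    """Anticipated SE of the area, assuming the form of Formula 1 stated above."""
    q1, q2 = q_terms(theta)
    numerator = (
        theta * (1.0 - theta)
        + (n_abnormal - 1) * (q1 - theta ** 2)
        + (n_normal - 1) * (q2 - theta ** 2)
    )
    return math.sqrt(numerator / (n_abnormal * n_normal))


# Worked example from the text: theta = 85%.
q1, q2 = q_terms(0.85)
print(round(q1, 4), round(q2, 4))  # 0.7391 0.7811

# SE (in percent) for equal groups of 40 and of 60 subjects.
for n in (40, 60):
    print(n, round(100 * anticipated_se(0.85, n, n), 2))
```

Under these assumptions the SE comes out near 4.4% for 40 subjects per group and near 3.6% for 60, in line with the roughly 4.3% and 3.5% quoted in the text if those figures are truncated rather than rounded. Quadrupling both sample sizes roughly halves the result, as point 1 of the list notes.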

TABLE III: Number of Normal and Abnormal Subjects Required to Provide a Probability of 80%, 90%, or 95% of Detecting Various Differences between the Areas θ1 and θ2 under Two ROC Curves (Using a One-Sided Test of Significance with P = 0.05)

0z0 r

0.700

o.725

0.7 50

0.775

0.800

0.825

0.850

0.875

0.900

0.925

0.950

0.750652897

I 1 3 1

0.77 5286392493

6 1 0839

1057

0.800158216271

0.8251001 3 5t69

0.8s06892

1 1 5

93126157

1 3 61 8 5/ . 1 |

221306383

463634797

0.87549Ot)

82

0.90037496 1

1 7 67392q8

0.925283846

9601 3 1 41 648

0.9751 82329

20273 3

3 l38

273644

3343tr.1

4052

50oo8 i

6687

107

967 2 71 5 6

1 6 5220272

4576 1 5765

0 9s0222936

.1+

3 l4 150

374960

4 66 I75

597997

8 ir 0 81 3 4

1 2 31 6 5205

22830tt3E3

9f ,

55

4 15568

az6986

6fl92

1 1 3

96r29160

r 5 0203z a 1

290393' l 9 l

6 17 5

587796

77104l ? q

1 1 01491 8 5

638 5

106

851 1 5l + l

r 2 3167209

20'l

342

408557699

1 4 8201252

,).) /423

5 1 6707889

267366459

565776976

350477597

710966

I 209

80% probability = top number; 90% probability = middle number; 95% probability = bottom number
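Entries of TABLE III can be recomputed with a short Python sketch of the sample size calculation. It assumes Equation 3 has the form n = [Zα√(2V1) + Zβ√(V1 + V2)]² / δ², with V = Q1 + Q2 − 2θ² evaluated at θ1 and θ2 through Equation 2. The function names are hypothetical, and truncating the result to a whole number is an assumption that happens to reproduce the entries checked here.

```python
import math

Z_ALPHA = 1.645  # one-sided 5% test of significance
Z_BETA = {80: 0.84, 90: 1.28, 95: 1.645}  # power (%) -> normal deviate


def variance_term(theta):
    """V = Q1 + Q2 - 2*theta^2, with Q1 and Q2 taken from Equation 2."""
    q1 = theta / (2.0 - theta)
    q2 = 2.0 * theta ** 2 / (1.0 + theta)
    return q1 + q2 - 2.0 * theta ** 2


def subjects_per_group(theta1, theta2, power_pct):
    """Normal (and abnormal) subjects needed per ROC curve, as a float."""
    v1 = variance_term(theta1)
    v2 = variance_term(theta2)
    delta = theta2 - theta1
    root = Z_ALPHA * math.sqrt(2.0 * v1) + Z_BETA[power_pct] * math.sqrt(v1 + v2)
    return (root / delta) ** 2


# True areas of 82.5% and 90.0%, at 80%, 90%, and 95% power.
for power in (80, 90, 95):
    print(power, math.floor(subjects_per_group(0.825, 0.900, power)))
# 80 176
# 90 239
# 95 298
```

The same call with areas of 0.700 and 0.750 gives 652, 897, and 1131, which agrees with the first legible cell of the table.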

... ment to construct a confidence interval on θ. As one might expect, and indeed as we have found by repeatedly simulating data from two overlapping Gaussian distributions, constructing an ROC curve, and deriving the area θ̂ under the curve, the distribution of the θ̂'s we obtained is not entirely symmetric, but is instead somewhat skewed towards θ = 0.5. This skewness is more marked as the "true" θ approaches 1, and as the expected number of "misclassified pairs" [n(1 − θ)] falls below 5; this is identical to what occurs with the binomial distribution and a success probability close to unity. In such cases, one usually resorts to an exact (asymmetric) confidence interval for θ, rather than using the approximate (symmetric) one of ±1.645 SE(θ̂), ±1.96 SE(θ̂), ..., provided by the normal distribution. In the example here n(1 − θ) = 10 is considerably greater than the rule of thumb of 5; thus, the symmetric 95% confidence interval of 90.5% ± 1.96(3.07) or (84.5%, 96.5%) will be reasonably correct, compared with the exact, slightly asymmetric interval of (82.4%, 95.1%) obtained by consulting charted confidence limits for binomial sampling (10) with θ = 0.905 and n = 100.
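The interval arithmetic in this paragraph can be checked with a few lines of Python. This is only a sketch of the symmetric (normal-approximation) interval and of the "misclassified pairs" rule of thumb; the function names are hypothetical, and the exact binomial limits quoted from the charts are not reproduced here.

```python
def symmetric_ci(theta_hat, se, z=1.96):
    """Normal-approximation confidence interval: theta_hat +/- z * SE."""
    return theta_hat - z * se, theta_hat + z * se


def expected_misclassified_pairs(n, theta):
    """n(1 - theta); values below about 5 suggest using an exact interval."""
    return n * (1.0 - theta)


low, high = symmetric_ci(0.905, 0.0307)
print(round(100 * low, 1), round(100 * high, 1))  # 84.5 96.5

# About 9.5 (the text rounds this to 10), comfortably above the threshold of 5.
print(expected_misclassified_pairs(100, 0.905))
```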

Finally, we consider the question of obtaining sufficiently large sample sizes when one wishes to examine the difference between two areas, so that if an important difference in performance exists, it will be unlikely to go undetected in a test of significance.

VI. Detecting Differences between Areas under Two ROC Curves

Again, knowing in advance the approximate SE's that are likely to accompany an estimate of θ, we can calculate how many cases must be studied so that a comparison of two imaging systems will have any given degree of statistical power. This power or "statistical sensitivity" depends on how small the probabilities α and β of committing a type I or type II error are. Typically, one seeks a power (100 − β) of 85% or 90% so that if a specified difference exists, it is 85% or 90% certain to be reflected in samples that will be declared "statistically different." Traditionally, one uses a type I error probability or α of 0.05 (5%) as the criterion for a significant difference.
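The error probabilities α and β enter a sample size calculation as standard normal deviates. The sketch below, which uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later), shows one way to obtain them; the helper name is hypothetical, and a one-sided test is assumed.

```python
from statistics import NormalDist


def upper_tail_deviate(prob):
    """Standard normal deviate z such that P(Z > z) = prob."""
    return NormalDist().inv_cdf(1.0 - prob)


# One-sided type I error probability of 0.05.
z_alpha = upper_tail_deviate(0.05)
print(round(z_alpha, 3))  # 1.645

# Type II error probabilities of 0.20, 0.10, and 0.05,
# i.e. power of 80%, 90%, and 95%.
for beta in (0.20, 0.10, 0.05):
    print(round(100 * (1 - beta)), round(upper_tail_deviate(beta), 3))
```

Rounded to two decimals, the deviates for 80% and 90% power are 0.84 and 1.28.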

TABLE III gives the numbers of normal and abnormal cases required for each ROC curve to have an 80%, 90%, or 95% assurance that various real differences δ between two areas, θ1 and θ2, will indeed result in sample curves showing a statistically significant difference (P < 0.05, one-sided). The calculations are based on an adaptation of the sample size formula given by Colton (11):

$$n = \left[\frac{Z_\alpha \sqrt{2V_1} + Z_\beta \sqrt{V_1 + V_2}}{\delta}\right]^2 \tag{3}$$

where in our case

Zα = 1.645 for a 5% one-sided test of significance;

Zβ = 0.84, 1.28, or 1.645 for 80%, 90%, or 95% power;

δ = θ2 − θ1;

V1 = Q1 + Q2 − 2θ1² (Q1 and Q2 obtained using θ1 in Equation 2);

V2 = Q1 + Q2 − 2θ2² (Q1 and Q2 obtained using θ2 in Equation 2).

As an example, it shows that if indeed the "true" areas (i.e., those that would be achieved with infinite populations) were 82.5% and 90.0%, one would need to plan on a sample of 176 normal subjects and 176 abnormal subjects for each curve to have a high assurance (i.e., an 80% probability) that the test of significance on the samples would yield a statistically significant difference. Larger sample sizes allow one to detect smaller differences or have a greater assurance of detecting the same size difference, while smaller ones give less statistical power.

These considerations are not necessarily binding; after all, they are designed with the pessimistic attitude that the sampling will be "unkind" and apt to mask important differences. However, they do show that if a small study failed to show a statistically significant difference, there is a real possibility of a type II error.³ On the other hand, if the "no difference" persists in spite of adequate sample sizes such as those shown in TABLE III, one can reasonably conclude that the stated differences do not in fact exist. Finally, TABLE III shows that a difference of 10% is more easily detected if it is a difference between 80% and 90% than if it is a difference between 70% and 80%.

³ In fact, with the n's that were studied, one can solve Equation 3 for Zβ to find out how high the probability β of a type II error actually was.

DISCUSSION

The advent of new competitive imaging modalities (e.g., ultrasound, nuclear medicine) for the same diagnostic problem has led to the performance of many studies involving comparisons of the information obtained from these imaging techniques. Many of these comparisons have used ROC curves in their analysis. However, it has become clear that an intuitive understanding of the statistical techniques proposed for comparing modalities by ROC curves is lacking. Also, there are special problems associated with the application of these techniques to medical problems and their associated small and usually heterogeneous data bases. Thus, we undertook this investigation in part to aid our intuition in this area but more importantly to provide a firmer statistical basis for work in this field.

The intuitive results that have been shown in this paper are actually quite helpful. Basically, the results show that in the rating method, conventionally employed for analyzing imaging modalities using the ROC approach, the area under the ROC curve represents the probability that a random pair of normal and abnormal images will be correctly ranked as to their disease state. (We emphasize here that this probability of a correct ranking only conveys the intrinsic potential for discrimination with sensitivity and specificity weighed equally; other external [decision] factors that influence diagnostic performance include the mixture [no longer 1:1] of diseased and nondiseased patients and the relative costs of the two types of diagnostic errors. However, as with other psychophysical processes, it is important to separate intrinsic discriminatory qualities of the imaging modality as much as possible from decision issues.)

Second, the combination of a graphic method for obtaining a smoothed area and a computational formula for its standard error now means that the investigator can obtain almost the full benefit of a parametric maximum likelihood without actual recourse to a large computer. Admittedly, the standard error may be slightly inexact, but it seems a small sacrifice.

Third, there is an increasing popularity of medical predictions using scores and probabilities derived from techniques such as discriminant analysis and logistic regression. Although they provide a scale that is much more "continuous" than the rating categories we have described above, the distributions of these composite indices may not be suitable for ROC analyses based on the binormal assumptions. However, the Wilcoxon statistic will now be a more bias-free method of estimating the "true" area θ under the ROC curve, and its standard error will continue to be valid, since it can be calculated directly from the data (as in TABLE II). Moreover, our calculations shown in Figure 2 suggest that it can largely be predicted from θ alone, even if the underlying distributions differ considerably from Gaussian.

Fourth and most important, the major contribution of this paper is the use of the Wilcoxon statistic for estimation of sample sizes and power calculations. Estimation of sample sizes is critically important. Generally, when a new imaging modality becomes available, a pilot study will provide some estimate of its approximate sensitivity and specificity in relationship to existing modalities. With some simplifying assumptions, the areas shown in TABLE III can be estimated. These data make it possible to calculate the approximate sample sizes necessary to detect differences between two areas (and hence their imaging modalities) with specific degrees of certainty. These numbers will let investigators know whether one institution can likely perform a research study alone, whether multiple institutions are required, or whether in fact so many institutions are required that the cost of obtaining the data is not worth the information obtained.

Two caveats are necessary in interpreting the results of this study. It should be noted that the sample size calculations in TABLE III are based on selecting a separate set of normal and abnormal subjects for each of the two experimental conditions being compared. These numbers are not suitable for situations when the two modalities are examining the same sets of cases or when one reader is evaluating the same set of cases under different conditions (e.g., with and without history, with and without varying levels of contrast). Methods to deal with this latter situation are the subject of a subsequent article (12). Basically, they are based on the principle that when paired observations are available, the use of paired statistical analyses provides a much more powerful test than is obtained when paired observations are treated by statistical tests for independent samples. The difference between paired and unpaired t tests is a familiar example of this problem.

Second, as mentioned at the outset, the area under an ROC curve is a parameter used to quantify in a single numerical value the overall location of an ROC curve relative to the (noninformative) diagonal. If one wishes to

test statistically whether two curves are different (they could still subtend the same area but cross each other), then one must resort to a bivariate statistical test (13). This test simultaneously compares the two parameter values (a, the difference between the xA and xN distributions, and b, the ratio of their variances) which describe one ROC curve with the corresponding values for the second curve. The test requires that one supply estimates of a and b, as well as estimates of their variances and covariance. These estimates are provided by the maximum likelihood estimation technique described by Dorfman and Alf (9).

Acknowledgments: We wish to thank Colin Begg, Charles Metz, and Stanley Shapiro for helpful suggestions and Irene McCammon for preparing the manuscript.

James A. Hanley, Ph.D.
Department of Epidemiology and Health
McGill University
3775 University Street
Montreal, Quebec
Canada H3A 2B4

References

1. Lusted LB. Decision-making studies in patient management. N Engl J Med 1971; 284:416-424.
2. Goodenough DJ, Rossmann K, Lusted LB. Radiographic applications of receiver operating characteristic (ROC) curves. Radiology 1974; 110:89-95.
3. Metz CE. Basic principles of ROC analysis. Semin Nucl Med 1978; 8:283-298.
4. Swets JA. ROC analysis applied to the evaluation of medical imaging techniques. Invest Radiol 1979; 14:109-121.
5. Swets JA, Pickett RM, Whitehead SF, et al. Assessment of diagnostic technologies. Science 1979; 205:753-759.
6. Green D, Swets J. Signal detection theory and psychophysics. New York: John Wiley and Sons, 1966:45-49.
7. Green DM, Moses FL. On the equivalence of two recognition measures of short-term memory. Psychol Bull 1966; 66:228-234.
8. Bamber D. The area above the ordinal dominance graph and the area below the receiver operating graph. J Math Psych 1975; 12:387-415.
9. Dorfman DD, Alf E. Maximum likelihood estimation of parameters of signal detection theory and determination of confidence intervals: rating-method data. J Math Psych 1969; 6:487-496.
10. Pearson ES, Hartley HO, eds. Biometrika tables for statisticians. Vol. 1. 3d ed., London: Biometrika Trust, 1976:228.
11. Colton T. Statistics in medicine. Boston: Little, Brown and Company, 1974:168.
12. Hanley JA, McNeil BJ. Comparing the areas under two ROC curves derived from the same sample of patients. Radiology (forthcoming).
13. Metz CE, Kronman HB. Statistical significance tests for binormal ROC curves. J Math Psych 1980; 22:218-243.

36 April 1982 Volume 143, Number 1 Hanley and McNeil