

Design and assessment of diagnostic test

LI Jibin, MD, PhD
Department of Clinical Research, Sun Yat-sen University Cancer Center
Email: [email protected]

Content

1. Definition
2. Study design
3. Assessment
4. Application and clinical significance


Definition

• Diagnostic or screening tests are done to obtain information that can guide a health care provider's decision to initiate or continue a therapeutic intervention.
  – Tests performed in persons with a symptom or sign of an illness are usually termed diagnostic tests.
  – Tests performed in individuals with no such symptoms or signs are referred to as screening tests.

[Figure: Flowchart of screening and diagnostic test. Boxes: Screening test; Diagnostic test; Treatment; Prevention / screening again. Branch labels: Negative / Normal; Positive / Normal; Positive / Diagnosed with disease.]

• Diagnostic tests can be:
  – medical history
  – physical examination
  – laboratory test
  – imaging examination (X-ray, CT, MRI, etc.)
  – recognized diagnostic criteria
  – ……

Diagnostic tests can be used for

• Screening
• Determining severity
• Choosing the optimal therapy
• Prognosis
• Monitoring
• ……

Example

• Carotid ultrasound can diagnose the severity of the patient’s carotid stenosis.

• Carotid ultrasound can tell you the patient’s prognosis of stroke.

• Carotid ultrasound can predict the efficacy of certain therapy on your patient.


(1) Gold standard

• Gold standard: the most widely accepted standard that clinicians use to diagnose the target disease.

• Biopsy

• Surgical operation

• Pathological anatomy or autopsy

• Special imaging examination (X-ray film, CT scan)

• Long-term follow-up

• Other convincing tests

• ……

New diagnostic test vs. Gold standard

• Apply the gold standard to confirm whether or not each participant has the target disease.

• Apply the new diagnostic test to determine whether each participant has a positive or negative result.

• Express the test results as a 2×2 table.

Table 1 A 2×2 table of diagnostic test

                                Gold standard
New diagnostic test result      Disease        Normal         Total
Positive                        a (true +)     b (false +)    a+b
Negative                        c (false –)    d (true –)     c+d
Total                           a+c            b+d            n

(2) Participants selection

• Representativeness
  – Case and control participants should be recruited from those with and without the target disease, respectively, and each group should be representative of its corresponding population.
  – A broad spectrum of the disease
    • Case group: different types of the disease, such as typical and non-typical, from mild to severe, etc.
    • Control group: a broad spectrum of competing conditions

(3) Blinding method

• Blinding is important.
• It avoids observer bias.
• Observers should determine the results of the diagnostic test while blinded to the disease status of the participants.

(4) Sample size determination

• Statistical significance level: α
• Allowable error: δ
• Estimates of sensitivity and specificity

$$\text{Case group: } n_1 = \frac{Z_\alpha^2 \, Sen \, (1 - Sen)}{\delta^2}$$

$$\text{Control group: } n_2 = \frac{Z_\alpha^2 \, Spe \, (1 - Spe)}{\delta^2}$$

Example 1: Assuming a sensitivity of 80% and a specificity of 60% for ultrasonography in the diagnosis of cholecystolithiasis, estimate the required sample size.

$$\alpha = 0.05,\quad Z_\alpha = 1.96\ (\text{two-sided}),\quad Sen = 0.80,\quad Spe = 0.60,\quad \delta = 0.10$$

$$n_1 = \frac{1.96^2 \times 0.80 \times (1 - 0.80)}{0.10^2} \approx 62$$

$$n_2 = \frac{1.96^2 \times 0.60 \times (1 - 0.60)}{0.10^2} \approx 93$$
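The sample size calculation of Example 1 can be checked with a short script. This is a minimal sketch: the function name `sample_size` and the choice to round up to the next whole participant are assumptions made for illustration, not part of the slides.

```python
import math

def sample_size(p: float, delta: float, z: float = 1.96) -> int:
    """n = z^2 * p * (1 - p) / delta^2, rounded up to a whole participant.

    p     : expected sensitivity (case group) or specificity (control group)
    delta : allowable error
    z     : two-sided critical value for the chosen alpha (1.96 for alpha = 0.05)
    """
    return math.ceil(z ** 2 * p * (1 - p) / delta ** 2)

n_case = sample_size(0.80, 0.10)     # case group, Sen = 0.80
n_control = sample_size(0.60, 0.10)  # control group, Spe = 0.60
print(n_case, n_control)
```

With the Example 1 inputs this gives 62 cases and 93 controls.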


Measures of assessment

Measures                            Formula
Sensitivity (Sen)                   a/(a+c)
Specificity (Spe)                   d/(b+d)
Youden’s index (J)                  Sen-(1-Spe)
Accuracy (Acc)                      (a+d)/(a+b+c+d)
Positive predictive value (+PV)     a/(a+b)
Negative predictive value (-PV)     d/(c+d)
Positive likelihood ratio (+LR)     Sen/(1-Spe)
Negative likelihood ratio (-LR)     (1-Sen)/Spe
Prevalence (Prev)                   (a+c)/(a+b+c+d)

Here a, b, c and d are the cells of the 2×2 table (Table 1).

Example 2: 360 subjects received an independent, blind CPK (creatine phosphokinase) test for diagnosis of myocardial infarction (MI). The diagnostic test results are shown in Table 2.

Table 2 The 2×2 table of the CPK diagnostic test

                  Gold standard
                  MI         No MI      Total
CPK + (≥80)       215 (a)    16 (b)     231
CPK – (<80)       15 (c)     114 (d)    129
Total             230        130        360
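The measures listed for the 2×2 table can be computed directly from the four cell counts. The sketch below applies them to the Example 2 counts (a=215, b=16, c=15, d=114); the function name `diagnostic_measures` and the dictionary keys are hypothetical names chosen for illustration.

```python
def diagnostic_measures(a: int, b: int, c: int, d: int) -> dict:
    """Compute the standard measures from a 2x2 table.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    n = a + b + c + d
    sen = a / (a + c)
    spe = d / (b + d)
    return {
        "sen": sen,
        "spe": spe,
        "youden": sen - (1 - spe),
        "acc": (a + d) / n,
        "ppv": a / (a + b),
        "npv": d / (c + d),
        "plr": sen / (1 - spe),
        "nlr": (1 - sen) / spe,
        "prev": (a + c) / n,
    }

m = diagnostic_measures(215, 16, 15, 114)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Rounded to three decimals, the output agrees with the worked values in this section (for example Sen=0.935, Spe=0.877, J=0.812).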

(1) Sensitivity (Sen)
• Proportion of those with the disease who have a positive test.
• Sen=a/(a+c)=215/230=0.935
• False negative rate=1-Sen

(2) Specificity (Spe)
• Proportion of those without the disease who have a negative test.
• Spe=d/(b+d)=114/130=0.877
• False positive rate=1-Spe

[Figure: Distribution of diagnostic test results in the Normal and Illness populations. A: ideal distribution of normal and abnormal population. B: actual distribution of normal and abnormal population, where the two distributions overlap.]

Relationship between sensitivity and specificity

[Figure: A cut-off point on the test-result axis separates "- test" from "+ test" across the Normal and Illness distributions. Labelled areas: Spe (1-α), Sen (1-β), false positive rate (α), false negative rate (β).]

• Sen ↑, False – ↓
  – A negative result carries more clinical significance.
  – Application: when a high omission (missed) diagnosis rate would have serious consequences.

• Spe ↑, False + ↓
  – A positive result carries more clinical significance.
  – Application: when a high misdiagnosis rate would have serious consequences.

Table 3 Sen and Spe under different cut-off points of the blood glucose test

Blood glucose (mg/100 ml)    Sen (%)    Spe (%)
80                           100.0      1.2
90                           98.6       7.3
100                          97.1       25.3
110                          92.9       48.4
120                          88.6       68.2
130                          81.4       82.4
140                          74.3       91.2
150                          64.3       96.1
160                          55.7       98.6
170                          52.9       99.6
180                          50.0       99.8
190                          44.3       99.8
200                          37.1       100.0

The optimal cut-off point is chosen by weighing sensitivity against specificity.
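One common way to weigh sensitivity against specificity, assumed here purely for illustration, is to pick the cut-off that maximizes Youden's index J = Sen - (1 - Spe). The slides do not state which criterion was used, so the sketch below only shows what this particular criterion would select from the Table 3 values; the names `table3` and `youden` are hypothetical.

```python
# (blood glucose cut-off in mg/100 ml, Sen %, Spe %) copied from Table 3
table3 = [
    (80, 100.0, 1.2), (90, 98.6, 7.3), (100, 97.1, 25.3),
    (110, 92.9, 48.4), (120, 88.6, 68.2), (130, 81.4, 82.4),
    (140, 74.3, 91.2), (150, 64.3, 96.1), (160, 55.7, 98.6),
    (170, 52.9, 99.6), (180, 50.0, 99.8), (190, 44.3, 99.8),
    (200, 37.1, 100.0),
]

def youden(sen_pct: float, spe_pct: float) -> float:
    """Youden's index J = Sen - (1 - Spe), with inputs given in percent."""
    return (sen_pct + spe_pct - 100.0) / 100.0

# Row with the largest J under this (assumed) selection criterion
best_cutoff, best_sen, best_spe = max(table3, key=lambda row: youden(row[1], row[2]))
best_j = youden(best_sen, best_spe)
print(best_cutoff, round(best_j, 3))
```

Under this criterion the selected cut-off is 140 mg/100 ml. Other criteria, such as weighting false negatives more heavily than false positives, could select a different point.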

Which is better?

Test A             Gold standard
                   Cancer     Other      Total
Cancer             160        40         200
Other              40         360        400
Total              200        400        600

Sensitivity: 160/200=80%
Specificity: 360/400=90%

Test B             Gold standard
                   Cancer     Other      Total
Cancer             170        60         230
Other              30         340        370
Total              200        400        600

Sensitivity: 170/200=85%
Specificity: 340/400=85%

(3) Accuracy (Acc)

• The proportion of all participants, with and without the disease, who have a correct test result.
• Acc=(a+d)/(a+b+c+d)=(215+114)/360=91.4%

(4) Youden’s index
• The difference between the true positive rate (Sen) and the false positive rate (1-Spe).
• J=Sen-(1-Spe)
• Ranges from 0 to 1; the closer Youden’s index is to 1, the more accurate the diagnostic test.
• In Example 2, J=0.935-(1-0.877)=0.812

(5) Predictive values (posttest probability)
• Positive predictive value (+PV): proportion of those with a positive test who have the disease.
  – +PV=a/(a+b)=215/231=0.931
• Negative predictive value (-PV): proportion of those with a negative test who do not have the disease.
  – -PV=d/(c+d)=114/129=0.884

(6) Prevalence (pretest probability)

• The proportion of those with the disease in the population.
• Prev=(a+c)/(a+b+c+d)=230/360=0.639=63.9%
• Prevalence may vary widely between populations.

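• Based on Bayes’ conditional probability theory,

$$+PV = \frac{Prev \times Sen}{Prev \times Sen + (1 - Prev)(1 - Spe)}$$

$$-PV = \frac{(1 - Prev) \times Spe}{(1 - Prev) \times Spe + Prev \times (1 - Sen)}$$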
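The dependence of the predictive values on prevalence can be sketched with the standard Bayes expressions for +PV and -PV in terms of Sen, Spe and prevalence. The function name `predictive_values` is a hypothetical name, and the inputs below reuse the CPK figures from Example 2 (Sen=215/230, Spe=114/130).

```python
def predictive_values(sen: float, spe: float, prev: float) -> tuple:
    """Return (+PV, -PV) from sensitivity, specificity and prevalence via Bayes' theorem."""
    ppv = prev * sen / (prev * sen + (1 - prev) * (1 - spe))
    npv = (1 - prev) * spe / ((1 - prev) * spe + prev * (1 - sen))
    return ppv, npv

sen, spe = 215 / 230, 114 / 130

# Prevalence observed in the Example 2 sample (230 of 360)
ppv_high, npv_high = predictive_values(sen, spe, 230 / 360)

# A lower, assumed prevalence of 10% for comparison
ppv_low, npv_low = predictive_values(sen, spe, 0.10)

print(round(ppv_high, 3), round(npv_high, 3))
print(round(ppv_low, 3), round(npv_low, 3))
```

At the sample prevalence the function reproduces +PV=215/231 and -PV=114/129. At 10% prevalence +PV falls to roughly 0.46 while -PV rises above 0.99, with Sen and Spe unchanged.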


Stability of the index

• Stable index: Sensitivity, Specificity, +LR, -LR

• Relatively stable index: Accuracy

• Unstable index: +PV, -PV

[Figure 5 Relationship between prevalence, Sen, Spe and PPV. Axis labels: Prevalence, Sen/Spe, PPV.]

• Prevalence ↑, PPV ↑

• Prevalence ↑, NPV ↓

[Figure 6 Illustration of the relationship between PV and prevalence. Curves: +PV and -PV. x-axis: prevalence (0 to 1); y-axis: PV (0 to 1).]

Example 3: Predictive values under different prevalence of MI

• CPK test to diagnose MI in ICU (Pre=64%)

                  MI       No MI     Total
CPK + (≥80)       215      16        231
CPK – (<80)       15       114       129
Total             230      130       360

Sen=93.5%, Spe=87.7%, +PV=93.1%, -PV=88.4%, +LR=7.6, -LR=0.07

• CPK test to diagnose MI in general hospital (Pre=10%)

                  MI       No MI     Total
CPK + (≥80)       215      248       463
CPK – (<80)       15       1822      1837
Total             230      2070      2300

Sen=93.5%, Spe=87.7%, +PV=46.4%, -PV=99.2%, +LR=7.6, -LR=0.07

Likelihood ratio and its application

• Positive likelihood ratio (+LR)
  – The ratio of the true positive rate to the false positive rate: +LR=Sen/(1-Spe)=7.6
  – The larger the value, the stronger the ability of the test to confirm the disease.

• Negative likelihood ratio (-LR)
  – The ratio of the false negative rate to the true negative rate: -LR=(1-Sen)/Spe=0.07
  – The smaller the value, the stronger the ability of the test to exclude the disease.

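$$\text{Pretest odds} = \frac{\text{pretest probability}}{1 - \text{pretest probability}}$$

$$\text{Posttest odds} = \text{pretest odds} \times LR$$

$$\text{Posttest probability} = \frac{\text{posttest odds}}{1 + \text{posttest odds}}$$

CPK test     MI: Yes    MI: No    +LR
≥280u        97         1         (97/230)/(1/130)=55
80-279u      118        15        (118/230)/(15/130)=4.4
40-79u       13         26        (13/230)/(26/130)=0.3
1-39u        2          88        (2/230)/(88/130)=0.01
Total        230        130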

• Example 4: A 60-year-old male patient has a CPK level of 120u. Estimate the probability that the patient has MI.
  – Based on the clinical information, the pretest probability of MI is estimated to be 60%.
  – +LR=4.4 for CPK=120u (the 80-279u interval)

Pretest odds=0.6/(1-0.6)=1.5
Posttest odds=1.5×4.4=6.6
Posttest probability=6.6/(1+6.6)=0.868
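The three steps of Example 4 (probability to odds, multiply by the likelihood ratio, odds back to probability) can be written as one small function. This is a minimal sketch; the function name `posttest_probability` is a hypothetical name chosen for illustration.

```python
def posttest_probability(pretest_prob: float, lr: float) -> float:
    """Convert a pretest probability to a posttest probability using a likelihood ratio."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)

# Example 4: pretest probability 60%, +LR = 4.4 for a CPK level in the 80-279u interval
p_post = posttest_probability(0.60, 4.4)
print(round(p_post, 3))
```

The same function applies to any interval-specific likelihood ratio; a ratio below 1 lowers the probability instead of raising it.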

ROC curve

• The receiver operating characteristic (ROC) curve originated in the early days of radar as a way to distinguish real signals from noise.

• It is widely used to assess the accuracy of a diagnostic test.

• ROC curves display the trade-off between sensitivity and specificity across multiple cut-off points of a diagnostic test.

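Example 5: a diagnostic test of CPK level for MI

MI group    CPK level    No MI
35          480          0
8           440          0
7           400          0
15          360          0
19          320          0
13          280          1
18          240          1
19          200          1
21          160          0
30          120          5
30          80           8
13          40           26
2           <40          88
230         Total        130

• Cut-off ≥480: MI 35 positive, 195 negative; No MI 0 positive, 130 negative. Sen=35/230=15.2%, Spe=130/130=100.0%
• Cut-off ≥280: MI 97 positive, 133 negative; No MI 1 positive, 129 negative. Sen=97/230=42.2%, Spe=129/130=99.2%
• Cut-off ≥80: MI 215 positive, 15 negative; No MI 16 positive, 114 negative. Sen=215/230=93.5%, Spe=114/130=87.7%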

Using CPK levels of 1, 40, 80, 280 and 480 as cut-off points for diagnosing myocardial infarction (MI), the corresponding sensitivity and specificity are:

Cut-off    ≥480     ≥280     ≥80      ≥40      ≥1
Sen        15.2%    42.2%    93.5%    99.1%    100%
Spe        100%     99.2%    87.7%    67.7%    0%

[Figure 7 ROC curve for CPK diagnostic test of MI. x-axis: 1-Specificity; y-axis: Sensitivity.]
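The ROC points behind Figure 7 can be rebuilt from the cumulative counts at each cut-off, and the area under the resulting piecewise-linear curve estimated with the trapezoidal rule. This is a sketch under two assumptions: the cumulative counts are derived from the Example 5 data (230 MI, 130 no MI), and the trapezoidal rule is one common estimator, not necessarily the one used for the slides.

```python
mi_total, no_mi_total = 230, 130

# (cut-off, MI patients at or above it, non-MI patients at or above it)
cutoffs = [
    (480, 35, 0),
    (280, 97, 1),
    (80, 215, 16),
    (40, 228, 42),
    (1, 230, 130),
]

# ROC points as (1 - Spe, Sen), starting at the origin
points = [(0.0, 0.0)] + [(fp / no_mi_total, tp / mi_total) for _, tp, fp in cutoffs]

# Trapezoidal rule over consecutive points
auc = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

for (cut, _, _), (fpr, tpr) in zip(cutoffs, points[1:]):
    print(f">={cut}: Sen={tpr:.3f}, 1-Spe={fpr:.3f}")
print(f"AUC ~ {auc:.3f}")
```

With only five cut-offs the estimate is coarse, but it comes out at about 0.947.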

Area under ROC curve

• The area under the ROC curve (AUC) reflects the overall accuracy of a diagnostic test.

• AUC ranges from 0.5 to 1.0; for a worthless test, AUC=0.5; for a perfect test, AUC=1.0.

[Figure: two ROC plots with both axes from 0 to 1. Left: worthless test, where the curve lies on the chance line. Right: ideal perfect test.]

AUC and accuracy

AUC        Accuracy
0.5~0.7    Poor accuracy
0.7~0.9    Good accuracy
>0.9       Excellent accuracy

[Figure: three example ROC curves. a: Poor; b: Good; c: Excellent.]

• AUC can help decide which of two competing tests for the same target disease is the better one.

[Figure 8 ROC curve for CPK and EKG diagnostic test of MI. x-axis: 1-Specificity; y-axis: Sensitivity. Curves: CK and EKG, with the chance line for reference.]

Combination of multiple diagnostic test

• Parallel test: the combined result is positive if either test is positive.

A test    B test    A+B
+         –         +
–         +         +
+         +         +
–         –         –

Sen = Sen A + (1 - Sen A) × Sen B
Spe = Spe A × Spe B

• Sen ↑, Spe ↓
• Reduces the omission (missed) diagnosis rate.
• When prevalence is low, a parallel test can be used as the primary screening method.

Combined application of multiple diagnostic test

• Serial test: the combined result is positive only if both tests are positive.

A test    B test    A+B
+         +         +
+         –         –
–         +         –
–         –         –

Sen = Sen A × Sen B
Spe = Spe A + (1 - Spe A) × Spe B

• Sen ↓, Spe ↑
• Used when a misdiagnosis would be harmful.
• Used for confirmatory diagnosis.
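The combined sensitivity and specificity formulas for parallel and serial testing can be sketched as two small functions. They hold only if the two tests are conditionally independent given disease status, which is an assumption of the formulas; the function names and the input values below are hypothetical.

```python
def parallel(sen_a: float, spe_a: float, sen_b: float, spe_b: float) -> tuple:
    """Combined (Sen, Spe) when a positive on either test counts as positive."""
    return sen_a + (1 - sen_a) * sen_b, spe_a * spe_b

def serial(sen_a: float, spe_a: float, sen_b: float, spe_b: float) -> tuple:
    """Combined (Sen, Spe) when both tests must be positive."""
    return sen_a * sen_b, spe_a + (1 - spe_a) * spe_b

# Hypothetical single-test values, for illustration only
sen_a, spe_a = 0.80, 0.90
sen_b, spe_b = 0.70, 0.95

par_sen, par_spe = parallel(sen_a, spe_a, sen_b, spe_b)
ser_sen, ser_spe = serial(sen_a, spe_a, sen_b, spe_b)
print(f"parallel: Sen={par_sen:.3f}, Spe={par_spe:.3f}")
print(f"serial:   Sen={ser_sen:.3f}, Spe={ser_spe:.3f}")
```

Parallel testing raises sensitivity above either single test and lowers specificity; serial testing does the opposite.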

Example 6: Combined tests for diagnosing diabetes using urine glucose and blood glucose tests

Urine glucose    Blood glucose    Diabetes: Yes    Diabetes: No    Parallel test result    Serial test result
+                –                14               10              +                       –
–                +                33               11              +                       –
+                +                117              21              +                       +
–                –                35               7599            –                       –
Total                             199              7641

Table 4 Results of single and combined test for diabetes

Diagnostic test    Result    Diabetes    No diabetes    Sen (%)    Spe (%)    False – (%)    False + (%)
Urine glucose      +         131         31             65.8       99.6       34.2           0.4
                   –         68          7610
Blood glucose      +         150         32             75.4       99.6       24.6           0.4
                   –         49          7609
Serial test        +         117         21             58.8       99.7       41.2           0.3
                   –         82          7620
Parallel test      +         164         42             82.4       99.5       17.6           0.5
                   –         35          7599
Total                        199         7641
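The serial and parallel rows of Table 4 can be recomputed directly from the four result patterns in Example 6, without relying on any independence assumption. This is a minimal sketch; the names `counts` and `sen_spe` are hypothetical.

```python
# (urine glucose, blood glucose) -> (diabetes, no diabetes), from Example 6
counts = {
    ("+", "-"): (14, 10),
    ("-", "+"): (33, 11),
    ("+", "+"): (117, 21),
    ("-", "-"): (35, 7599),
}

def sen_spe(is_positive) -> tuple:
    """Sensitivity and specificity of a rule that maps (urine, blood) to positive/negative."""
    tp = sum(yes for pattern, (yes, no) in counts.items() if is_positive(*pattern))
    tn = sum(no for pattern, (yes, no) in counts.items() if not is_positive(*pattern))
    diseased = sum(yes for yes, _ in counts.values())
    healthy = sum(no for _, no in counts.values())
    return tp / diseased, tn / healthy

# Parallel: positive if either test is positive; serial: positive only if both are
parallel_sen, parallel_spe = sen_spe(lambda urine, blood: urine == "+" or blood == "+")
serial_sen, serial_spe = sen_spe(lambda urine, blood: urine == "+" and blood == "+")

print(f"parallel: Sen={parallel_sen:.1%}, Spe={parallel_spe:.1%}")
print(f"serial:   Sen={serial_sen:.1%}, Spe={serial_spe:.1%}")
```

The results match the table: 82.4% and 99.5% for the parallel test, 58.8% and 99.7% for the serial test.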


Figure 9 Test and treatment thresholds in the diagnostic test process