3up
DESCRIPTION
3upTRANSCRIPT
-
Statistical Analysis ofMethod Comparison studies
Bendix Carstensen Steno Diabetes Center, Gentofte, Denmark& Dept. Biostatistics, Medical Faculty,
University of Copenhagen
http://BendixCarstensen.com
Claus Thorn Ekstrm Statistics, Faculty of Life Sciences,University of Copenhagenwww.statistics.life.ku.dk/~ekstrom/
Tutorial, SISMEC, Ancona, Italy28 September 2011
http://BendixCarstensen.com/MethComp/Ancona.2011
Comparing two methods withone measurement on eachMorning
Bendix Carstensen
MethComp28 September 2011Tutorial, SISMEC, Ancona, Italy
http://BendixCarstensen.com/MethComp/Ancona.2011
Comparing measurement methods
General questions:
Are results systematically dierent?
Can one method safely be replaced by another?
What is the size of measurement errors?
Dierent centres use dierent methods ofmeasurement: How can we convert from onemethod to another?
How precise is the conversion?
Comparing two methods with one measurement on each 1/ 90
-
Two methods for measuring fat content inhuman milk:
1 2 3 4 5 6
12
34
56
Gerber
Trig
Therelationshiplooks like:
y1 = a+ by2
Comparing two methods with one measurement on each 2/ 90
Two methods one measurement by each
How large is the dierence between a measurementwith method 1 and one with method 2 on a(randomly chosen) person?
Di = y2i y1i, D, s.d.(D)Limits of agreement:
D 2 s.d.(D)95% prediction interval for the dierence between ameasurement by method 1 and one by method 2.[1, 2]
Comparing two methods with one measurement on each 3/ 90
Limits of agreement: Interpretation
If a new patient is measured once with each ofthe two methods, the dierence between thetwo values will with 95% probability be withinthe limits of agreement.
This is a prediction interval for a (future)dierence.
Requires a clinical input:Are the limits of agreement suciently narrowto make the use of either of the methodsclinically acceptable?
Is it relevant to test if the mean is 0?
Comparing two methods with one measurement on each 4/ 90
-
Limits of agreement: Test?
Testing whether the dierence is 0 is a bad idea:
If the study is suciently small this will beaccepted even if the dierence is important.
If the study is suciently large this will berejected even if the dierence is clinicallyirrelevant.
Comparing two methods with one measurement on each 5/ 90
Limits of agreement:
1 2 3 4 5 6
0.4
0.2
0.0
0.2
0.4
( Trig + Gerber ) / 2
Trig
G
erbe
r
0.17
0.00
0.17 Plotdierences(Di) versusaverages(Ai).
Comparing two methods with one measurement on each 6/ 90
Model in Limits of agreement
Methods m = 1, . . . ,M , applied to i = 1, . . . , Iindividuals:
ymi = m + i + emiemi N (0, 2m) measurement error
Two-way analysis of variance model, withunequal variances in columns.
Dierent variances are not identiable withoutreplicate measurements for M = 2 because thevariances cannot be separated.
Models 7/ 90
-
Limits of agreement:
Usually interpreted as the likely dierence betweentwo future measurements, one with each method:
y2 y1 = D = 2 1 1.96 s.d.(D)Normally we use 2 instead of 1.96.
Neither are formally correct if we take the modelseriously:
Use a t-quantile with I 1 d.f. Estimation s.d. of 2 1 is /
I.
So we should use t0.95
(I + 1)/I instead.This is 2.08 for I = 30 and less than 2 if I > 85.
Models 8/ 90
Limits of agreement:
Limits of agreement can be converted to aprediction interval for y2 given y1, by solving for y2:
y2 y1 = 2 1 2 s.d.(D)which gives:
y2|1 = y2|y1 = 2 1 + y1 2 s.d.(D)
Models 9/ 90
Introduction to computingMorning
Bendix Carstensen
MethComp28 September 2011Tutorial, SISMEC, Ancona, Italy
http://BendixCarstensen.com/MethComp/Ancona.2011
-
Structure of practicals
This tutorial is both theoretical and practical, i.e.the aim is to convey a basic understanding of theproblems in method comparison studies, but also toconvey practical skills in handling the statisticalanalysis.
R for data manipulation and graphics.
So we assume familiarity with R.
Occasionally BUGS for estimation in non-linearvariance component models.
BUGS is hidden inside an R-function.
Introduction to computing 10/ 90
How it works
Example data sets are included in the MethComppackage.
Functions in MethComp are based on a data framewith a particular structure; a Meth object:
meth method (factor)item item, person, individual, sample (factor)repl replicate (if present) (factor)
y the actual measurement (numerical)
Once converted to Meth, just use summary, plotetc.
Introduction to computing 11/ 90
How it looks:
> subset(ox,as.integer(item) subset(to.wide(ox),as.integermeth item repl y Note:
1 CO 1 1 78.0 Replicate measurements are t2 CO 1 2 76.4 item repl id CO pulse3 CO 1 3 77.2 1 1 1 1.1 78.0 714 CO 2 1 68.7 2 1 2 1.2 76.4 725 CO 2 2 67.6 3 1 3 1.3 77.2 736 CO 2 3 68.3 4 2 1 2.1 68.7 68184 pulse 1 1 71.0 5 2 2 2.2 67.6 67185 pulse 1 2 72.0 6 2 3 2.3 68.3 68186 pulse 1 3 73.0187 pulse 2 1 68.0188 pulse 2 2 67.0189 pulse 2 3 68.0
Introduction to computing 12/ 90
-
Getting your own data into R
Take a look in The R Primer by Claus Ekstrm,or:
If your data are not too large, the simplest is to edityour data in Excel or some other spreadsheet tolook like this:
item repl id CO pulse1 1 1.1 78.0 711 2 1.2 76.4 721 3 1.3 77.2 732 1 2.1 68.7 682 2 2.2 67.6 672 3 2.3 68.3 68
The rst line is variable names; the following linesare data.
Introduction to computing 13/ 90
Analysis options in this course
Scatter plots.
Bland-Altman plots ((y2 y1) vs. (y1 + y2)/2) Limits of Agreement (LoA).
Models with constant bias.
Models with linear bias.
Conversion formulae between methods (singlereplicates)
Transformation of measurements.
Plots of conversion equations.
Reporting of variance components.
Introduction to computing 14/ 90
Requirements
R for data manipulation and graphics. Keep a script of what you did:
Use the built-in editor in R the nerds can use ESS or you can download R-Studio.
You need the packages: MethComp R2WinBUGS coda BRugs Epi - Version 1.10 !!!
Introduction to computing 15/ 90
-
Functions in the MethComp package
5 broad categories of functions in MethComp:
Graphical exploring data.
Data manipulation reshaping and changing.
Simulation generating datasets or replacingvariables.
Analysis functions tting models to data.
Reporting functions displaying results fromanalyses.
Overview of these in the Practicals.
Introduction to computing 16/ 90
Does it work?
library(MethComp)library(help=MethComp) # Do you have version 1.10??data(ox)ox
-
Limits of agreement assumptions
The dierence between methods is constant
The variances of the methods (and hence ofthe dierence) is constant.
Check this by:
Regress dierences on averages.
Regress absolute residuals from this on theaverages.
Non-constant dierence 18/ 90
Glucose measurements
4 6 8 10 12 14
4
6
8
10
12
14
Venous plasma
Cap
illar
y bl
ood
Non-constant dierence 19/ 90
Glucose measurements
5 6 7 8 9 10 11 12
4
2
0
2
4
( capil + plasma ) / 2
capi
l p
lasm
a
2.84
0.41
2.02
Non-constant dierence 20/ 90
-
Regress dierence on average
Di = a+ bAi + ei, var(ei) = 2D
If b is dierent from 0, we could use this equation toderive LoA:
a+ bAi 2Dor convert to prediction as for LoA:
y2|1 = y1 + a+ bAi y1 + a+ by1 = a+ (1 + b)y1Exchanging methods would give:
y1|2 = a+ (1 b)y1instead of: y1|2 =
a1 + b
+1
1 + by1
Non-constant dierence 21/ 90
Variable limits of agreement
5 6 7 8 9 10 11 12
4
2
0
2
4
( capil + plasma ) / 2
capi
l p
lasm
a
Non-constant dierence 22/ 90
Improving the regression of D on A
y2i y1i = a+ b(y1i + y2i)/2 + eiy2i(1 b/2) = a+ (1 + b/2)y1i + ei
y2i =a
1 b/2 +1 + b/2
1 b/2y1i +1
1 b/2ei
y1i =a
1 + b/2+
1 b/21 + b/2
y2i +1
1 + b/2ei
This is what comes out of the functionsDA.reg and BA.plot
Non-constant dierence 23/ 90
-
Variable limits of agreement
5 6 7 8 9 10 11 12
4
2
0
2
4
( capil + plasma ) / 2
capi
l p
lasm
a
capilplasma = 2.09 0.32 (capil+plasma)/2 (95% p.i.: +/2.16)
plasma = 2.49 + 1.38 capil (95% p.i.: +/2.57)capil = 1.80 + 0.73 plasma (95% p.i.: +/1.87)
Non-constant dierence 24/ 90
Conversion equation with prediction limits
4 6 8 10 12 14
4
6
8
10
12
14
Venous plasma
Cap
illar
y bl
ood
Non-constant dierence 25/ 90
Prediction intervals
Prediction s.e. for y1|2 is /(1 b/2) Prediction s.e. for y1|2 is /(1 + b/2) The slope of the prediction line is the ratio ofthe prediction s.e.s.
Hence prediction limits can be used both ways:
Non-constant dierence 26/ 90
-
Conversion equation with prediction limits
4 6 8 10 12 14
4
6
8
10
12
14
Venous plasma
Cap
illar
y bl
ood
Non-constant dierence 27/ 90
Why does this work?
The general model for the data is:
y1i = 1 + 1i + e1i, e1i N (0, 21)y2i = 2 + 2i + e2i, e2i N (0, 22)
Work out the prediction of y1 given anobservation of y2 in terms of these parameters.
Work out how dierences relate to averages interms of these parameters.
Then the prediction is as we just derived it.
Non-constant dierence 28/ 90
Why is it wrong anyway?
Introducing linear bias, ymi = m + mi + emiputs measurements by dierent methods ondierent scales.Hence it has formally no meaning to form thedierences.
In the induced model for Di a+ bAi + ei,ei and AI are not independent.
But if is not too far from 1 it not a bigproblem, though.
Non-constant dierence 29/ 90
-
Comparing two methods withreplicate measurementsMorning
Bendix Carstensen
MethComp28 September 2011Tutorial, SISMEC, Ancona, Italy
http://BendixCarstensen.com/MethComp/Ancona.2011
Replicate measurementsFat data; exchangeable replicates:
item repl KL SL1 1 4.5 4.91 2 4.4 5.01 3 4.7 4.83 1 6.4 6.53 2 6.2 6.43 3 6.5 6.1
Oximetry data; linked replicates:
item repl CO pulse1 1 78.0 711 2 76.4 721 3 77.2 732 1 68.7 682 2 67.6 672 3 68.3 68
Linked or exchangeable replicates!Comparing two methods with replicate measurements 30/ 90
Extension of the model:exchangeable replicates
ymir = m + i + cmi + emirs.d.(cmi) = m matrix-eect
s.d.(emir) = m measurement error
Replicates within (m, i) are needed to separate and .
Even with replicates, the separate s are onlyestimable if M > 2.
Still assumes that the dierence betweenmethods is constant.
Assumes exchangeability of replicates.Comparing two methods with replicate measurements 31/ 90
-
Extension of the model:linked replicates
ymir = m + i + air + cmi + emirs.d.(air) = between replicates
s.d.(cmi) = m matrix-eect
s.d.(emir) = m measurement error
Still assumes that the dierence betweenmethods is constant.
Replicates are linked between methods:air is common across methods, i.e. the rstreplicate on a person is made under similarconditions for all methods (i.e. at a specicday or the like).
Comparing two methods with replicate measurements 32/ 90
Replicate measurements
Three approaches to limits of agreement withreplicate measurements:
1. Take means over replicates within each methodby item stratum.
2. Replicates within item are taken as items.
3. Fit the correct variance components model anduse this as basis for the LoA.The model is tted using:> BA.est( data, linked=TRUE ).
Comparing two methods with replicate measurements 33/ 90
Oximetry data
20 40 60 80
20
10
010
20
(CO+pulse)/2
CO
pul
se
Comparing two methods with replicate measurements 34/ 90
-
Replicate measurements
The limits of agreement should still be fordierence between future single measurements.
Analysis based on the means of replicates istherefore wrong:
Model:
ymir = m + i + air + cmi + emir
var(y1jr y2jr) = 21 + 22 + 21 + 22 note that the term air air cancels becausewe are referring to the same replicate.
Comparing two methods with replicate measurements 35/ 90
Wrong or almost right
In the model the correct limits of agreement wouldbe:
1 2 1.96 21 +
22 +
21 +
22
But if we use means of replicates to form thedierences we have:
di = y1i y2i = 1 2 +
r airR1i
r airR2i
+c1i c2i +
r e1irR1i
r e2irR2i
Comparing two methods with replicate measurements 36/ 90
The terms with air are only relevant for linkedreplicates in which case R1i = R2i and therefore theterm vanishes. Thus:
var(di) = 21+
22+
21/R1i+
22/R2i <
21+
22+
21+
22
so the limits of agreement calculated based on themeans are much too narrow as prediction limits fordierences between future single measurements.
Comparing two methods with replicate measurements 37/ 90
-
(Linked) replicates as items
If replicates are taken as items, then the calculateddierences are:
dir = y1ir y2ir = 1 2 + c1i c2i + e1ir e2irwhich has variance 21 +
22 +
21 +
22, and so gives
the correct limits of agreement. However, thedierences are not independent:
cov(dir, dis) = 21 +
22
Negligible if the residual variances are very largecompared to the interaction, variance likely to beonly slightly downwards biased.
Comparing two methods with replicate measurements 38/ 90
Recommendations
Fit the correct model, and get the estimatesfrom that, e.g. by using BA.est.
If you must use over-simplied methods:
Use linked replicates as item.
If replicates are not linked; make a randomlinking.Note: If this give a substantially dierentpicture than using the original replicatenumbering as linking key, there might besomething shy about the data.
Further details, see [3].
Comparing two methods with replicate measurements 39/ 90
Oximetry data
Linkedreplicates usedas items
Mean overreplicates asitems
Limits based onmodel dashed lineassumingexchangeablereplicates
20 40 60 80
20
10
010
20
(CO+pulse)/2
CO
pul
se
Comparing two methods with replicate measurements 40/ 90
-
Repeatability andreproducibilityWednesday 9 February, morning
Bendix Carstensen
MethComp28 September 2011Tutorial, SISMEC, Ancona, Italy
http://BendixCarstensen.com/MethComp/Ancona.2011
Accuracy of a measurement method
Repeatability:The accuracy of the method under exactlysimilar circumstances; i.e. the same lab, thesame technician, and the same day.(Repeatability conditions)
Reproducibility:The accuracy of the method under comparablecircumstances, i.e. the same machinery, thesame kit, but possibly dierent days orlaboratories or technicians.(Reproducibility conditions)
Repeatability and reproducibility 41/ 90
Quantication of accuracy
Upper limit of a 95% condence interval forthe dierence between two measurments.
Suppose the variance of the measurement is 2:
var(ymi1 ymi2) = 22
i.e the standard error is2, and a condnece
interval for the dierence:
0 1.962 = 0 2.772 2.8
This is called the reproducibility coecient orsimply the reproducibility. (The number 2.8 isused as a convenient approximation).
Repeatability and reproducibility 42/ 90
-
Quantication of accuracy
Where do we get the ?
Repeat measurements on the same item (oreven better) several items.
The conditions under which the repeat(replicate) measurements are taken determineswhether we are estimating repeatability orreproducibility.
In larger experiments we must consider theexchangeability of the replicates i.e. whichreplicates are done under (exactly) similarconditions and which are not.
Repeatability and reproducibility 43/ 90
Linear bias between methodsMorning
Bendix Carstensen
MethComp28 September 2011Tutorial, SISMEC, Ancona, Italy
http://BendixCarstensen.com/MethComp/Ancona.2011
Extension with non-constant bias
ymir = m + mi + random eects
There is now a scaling between the methods.
Methods do not measure on the same scale therelative scaling is estimated, between method 1 and2 the scale is 2/1.
Consequence: Multiplication of all measurements onone method by a xed number does not changeresults of analysis:
The corresponding is multiplied by the samefactor as is the variance components for thismethod.
Linear bias between methods 44/ 90
-
Variance components
Two-way interactions:
ymir = m + m(i + air + cmi) + emir
The random eects cmi and emir have variancesspecic for each method.
But air does not depend on m must be scaled toeach of the methods by the corresponding m.
Implies that = s.d.(air) is irrelevant the scaleis arbitrary. The relevant quantities are m thebetween replicate variation within item as measuredon the mth scale.
Linear bias between methods 45/ 90
0 20 40 60 80 100
020
4060
8010
0
y1
0 20 40 60 80 100
020
4060
8010
0
y2
0 20 40 60 80 100
020
4060
8010
0
y1
y2
ymir = m + i + air + cmi + emir
Linear bias between methods 46/ 90
0 20 40 60 80 100
020
4060
8010
0
y1
0 20 40 60 80 100
020
4060
8010
0
y2
0 20 40 60 80 100
020
4060
8010
0
y1
y2
ymir = m + m(i + air + cmi) + emir
Linear bias between methods 47/ 90
-
Estimation: Alternatingrandom eects regressionMorning
Bendix Carstensen
MethComp28 September 2011Tutorial, SISMEC, Ancona, Italy
http://BendixCarstensen.com/MethComp/Ancona.2011
Alternating random eects regression
Carstensen [4] proposed a ridiculously complicatedapproach to t the model
ymir = m + mi + cmi + emir
based in the observation:
For xed the model is a linear mixed model.
For xed (, ) it is a regression through 0.
This has be improved in [5]
Alternating regressions 48/ 90
Alternating random eects regression
Now consider instead the correctly formulatedversion of the slightly more general model:
ymir = m + m(i + air + cmi) + emir
Here we observe
For xed mir = i + air + cmi the model is alinear model, with residual variances dierentbetween methods.
For xed (, ) scaled responses y are used:
ymir mm
= i + air + cmi + emir/m
Alternating regressions 49/ 90
-
Estimation algorithm
ymir = m + m(i + air + cmi) + emir
1. Start with mir = ymi2. Estimate (m, m).
3. Compute the scaled responses and t therandom eects model.
4. Use the estimated is, and BLUPs of air andcmi to update mir.
5. Check convergence in terms of identiableparameters.
Alternating regressions 50/ 90
The residual variances
The variance components are estimated in themodel for the scaled response.
The parameters (m, m) are not taken intoaccount in the calculation of the residualvariance.
Hence the residual variances must be correctedpost hoc.
This machinery is implemented in the functionAltReg in the MethComp package.
Alternating regressions 51/ 90
> AR.ox round(AR.ox,3)From
To Intercept: CO pulse Slope: CO pulse IxR sd. MxI sd. res.sd.CO 0.000 -2.159 1.000 1.063 3.521 2.978 2.055pulse 2.031 0.000 0.941 1.000 3.313 2.802 4.079
Alternating regressions 52/ 90
-
Your turn:
Start on the practical titled:
Oximetry: Linked replicates with non-constantbias
Alternating regressions 53/ 90
Converting between methodsAfternoon
Bendix Carstensen
MethComp28 September 2011Tutorial, SISMEC, Ancona, Italy
http://BendixCarstensen.com/MethComp/Ancona.2011
Predicting method 2 from method 1
y10r = 1 + 1(0 + a0r + c10) + e10ry20r = 2 + 2(0 + a0r + c20) + e20r
y20r = 2 +
21
(y10r 1 e10r)+ 2(c10 + c20) + e20r
The random eects have expectation 0, so:
E(y20|y10) = y20 = 2 + 21
(y10 1)
Converting between methods 54/ 90
-
y20r = 2 +21
(y10r 1 e10r)+ 2(c10 + c20) + e20r
var(y20|y10) =(21
)2(21
21 +
21) + (
22
22 +
22)
The slope of the prediction line from method 1 tomethod 2 is 2/1.
The width of the prediction interval is:
2 2(
21
)2(21
21 +
21) + (
22
22 +
22)
Converting between methods 55/ 90
If we do the prediction the other way round (y1|y2)we get the same relationship i.e. a line with theinverse slope, 1/2.
The width of the prediction interval in this directionis (by permutation of indices):
2 2
(2121 +
21) +
(12
)2(22
22 +
22)
= 2 2 12
(21
)2(21
21 +
21) + (
22
22 +
22)
i.e. if we draw the prediction limits as straight linesthey can be used both ways.
Converting between methods 56/ 90
20 40 60 80 10020
40
60
80
100
CO
puls
e
pulse = 2.11 + 0.94 CO ( 6.00 )
CO = 2.25 + 1.06 pulse ( 6.39 )
Converting between methods 57/ 90
-
What happened to the curvature?
0 5 10
20
10
010
2030
40
x
y
Usually the predictionlimits are curved:
y|x t0.975 1 + xx
In our prediction we have ignored the last term(xx), i.e. eectively assuming that there is noestimation error on 2|1 and 2|1.
Converting between methods 58/ 90
Transformation of dataAfternoon
Bendix Carstensen
MethComp28 September 2011Tutorial, SISMEC, Ancona, Italy
http://BendixCarstensen.com/MethComp/Ancona.2011
If variances are not constantA transformation might help:
> round( ftable( DA.reg(ox) ), 3 )alpha beta sd.pred beta=1 s.d.=K
From: To:CO CO 0.000 1.000 NA NA NA
pulse 1.864 0.943 5.979 0.142 0.000pulse CO -1.977 1.061 6.342 0.142 0.000
pulse 0.000 1.000 NA NA NA
> oxt round( ftable( DA.reg(oxt) ), 3 )alpha beta sd.pred beta=1 s.d.=K
From: To:CO CO 0.000 1.000 NA NA NA
pulse -0.034 0.900 0.306 0.009 0.246pulse CO 0.038 1.111 0.340 0.009 0.246
pulse 0.000 1.000 NA NA NA
Transformation of data 59/ 90
-
20 40 60 80 10020
40
60
80
100
CO
puls
e
pulse = 2.11 + 0.94 CO ( 6.00 )
CO = 2.25 + 1.06 pulse ( 6.39 )
Transformation of data 60/ 90
Analysis on the transformed scale
> ARoxt ARoxt ARoxtNote: Response transformed by: log p/(100 - p)
Conversion between methods:alpha beta sd
To: From:CO CO 0.000 1.000 0.202
pulse 0.042 1.105 0.341pulse CO -0.038 0.905 0.309
pulse 0.000 1.000 0.271
Variance components (sd):s.d.
Method IxR MxI resCO 0.232 0.160 0.143pulse 0.210 0.145 0.191
This is an analysis for the transformed data.Transformation of data 62/ 90
-
Backtransformation for plotting
prpulse
-
0 20 40 60 80 100
40
20
0
20
40
( CO + pulse ) / 2
puls
e
CO
0 20 40 60 80 100
40
20
0
20
40
( CO + pulse ) / 2
puls
e
CO
pulse = 2.03 + 0.94 CO ( 6.01 )
CO = 2.16 + 1.06 pulse ( 6.38 )
0 20 40 60 80 100
40
20
0
20
40
( CO + pulse ) / 2
puls
e
CO
Transformation of data 66/ 90
Implementation in BUGSAfternoon
Bendix Carstensen
MethComp28 September 2011Tutorial, SISMEC, Ancona, Italy
http://BendixCarstensen.com/MethComp/Ancona.2011
Implementation in BUGS
ymir = m + m(i + air + cmi) + emir
Non-linear hierarchical model:Implement in BUGS.
The model is symmetrical in methods.
Mean is overparametrized.
Choose a prior (and hence posterior!) for thes with nite support.
Keeps the chains nicely in place.
This is the philosophy in the function MCmcmc.
Implementation in BUGS 67/ 90
-
Results from tting the model
The posterior distn of (m, m, i) is singular.
But the relevant translation quantities areidentiable:
2|1 = 2 12/12|1 = 2/1
So are the variance components.
Posterior medians used to devise predictionequations with limits.
Implementation in BUGS 68/ 90
The MethComp package for R
Implemented model:
ymir = m + m(i + air + cmi) + emir
Replicates required. R2WinBUGS, BRugs or JAGS is required. Dataframe with variablesmeth, item, repl and y (a Meth object)
The function MCmcmc writes a BUGS-program,initial values and data to les.
Runs BUGS and sucks results back in to R, andgives a nice overview of the conversionequations.
Implementation in BUGS 69/ 90
Example output: Oximetry
> summary( ox )#Replicates
Method 1 2 3 #Items #Obs: 354 Values: min med maxCO 1 4 56 61 177 22.2 78.6 93.5pulse 1 4 56 61 177 24.0 75.0 94.0
>> MCox
-
Simulation run of a model with- method by item and item by replicate interaction:- using 4 chains run for 2000 iterations(of which 1000 are burn-in),
- monitoring all values of the chain:- giving a posterior sample of 4000 observations.
model is syntactically correctdata loadedmodel compiledInitializing chain 1: initial values loaded but this or anotherInitializing chain 2: initial values loaded but this or anotherInitializing chain 3: initial values loaded but this or anotherInitializing chain 4: initial values loaded but this or anotherinitial values generated, model initializedSampling has been started ...1000 updates took 38 sdeviance setmonitor set for variable alphamonitor set for variable betamonitor set for variable sigma.mimonitor set for variable sigma.irmonitor set for variable sigma.resmonitor set for variable devianceImplementation in BUGS 71/ 90
> MCox
Conversion between methods:alpha beta sd
To: From:CO CO 0.000 1.000 1.740
pulse -9.342 1.159 5.328pulse CO 8.061 0.863 4.508
pulse 0.000 1.000 6.115
Implementation in BUGS 72/ 90
Variance components (sd):s.d.
Method IxR MxI resCO 3.878 3.122 1.230pulse 3.222 2.757 4.324
Variance components with 95 % cred.int.:method CO pulseqnt 50% 2.5% 97.5% 50% 2.5% 97.5%
SDIxR 3.878 3.053 4.533 3.222 2.426 3.930MxI 3.122 2.193 9.764 2.757 1.915 5.902res 1.230 0.143 2.639 4.324 3.709 5.019tot 5.220 4.507 10.645 6.135 5.457 7.849
Implementation in BUGS 73/ 90
-
Mean parameters with 95 % cred.int.:50% 2.5% 97.5% P(>0/1)
alpha[pulse.CO] 8.057 -2.457 29.884 0.969alpha[CO.pulse] -9.346 -49.949 2.476 0.031beta[pulse.CO] 0.863 0.604 0.997 0.024beta[CO.pulse] 1.159 1.003 1.657 0.976
Note that intercepts in conversion formulae are adjusted to getconversion formulae that represent the same line both ways,and hence the median interceps in the posterior do not agreeexactly with those given in the conversion formulae.
Implementation in BUGS 74/ 90
Inter-rater agreementAfternoon
Claus Thorn Ekstrm
MethComp28 September 2011Tutorial, SISMEC, Ancona, Italy
http://BendixCarstensen.com/MethComp/Ancona.2011
Program
Example
Random rater vs. xed methods
Statistical modelling
Inter-rater agreement 75/ 90
-
Example: depression ratings
DoctorPatient A B C D E1 8 9 5 8 82 0 0 1 2 13 4 5 5 5 34 5 8 7 8 55 3 3 1 8 26 8 9 9 9 1
Doctor doctor! Am I depressed?
Getting second opinions ... and third ... and fourth...Inter-rater agreement 76/ 90
Example: depression ratings
Research question
How well will two doctors agree on the diagnosis?
In this example we use humans as measurementmethods or raters.
However, we are not interested in makingstatements about specic raters.
Inter-rater agreement 77/ 90
Fixed versus random eects
Denition: Factors can either be xed or random.
A factor is xed when the levels (e.g. raters)under study are the only levels of interest.
A factor is random when the levels under studyare a random sample from a larger populationof raters and the goal of the study is to make astatement regarding the larger population.
Raters can be dened as xed or random factors:
If the raters themselves are of interest (youwant to use them again) then use xed model.
If raters are randomly chosen of possible poolof raters (you do not have specic raters inmind) then use the random model.Inter-rater agreement 78/ 90
-
Fixed versus random eects
Inter-rater agreement 79/ 90
Modelling: exchangeable replicates
The model for xed methods is:
ymir = m + i + cmi + emirs.d.(cmi) = m matrix-eect
s.d.(emir) = m measurement error
Replicates within (m, i) are needed to separate and .
Even with replicates, the separate s are onlyestimable if M > 2.
Assumes that the dierence between methodsis constant.
Assumes exchangeability of replicates.
If no replicates then disregard the cmis.Inter-rater agreement 80/ 90
Modelling: exchangeable replicates
The model for random methods/raters is:
ymir = bm + i + cmi + emirs.d.(bm) = variation among raters
s.d.(cmi) = m matrix-eect
s.d.(emir) = m measurement error
Replicates within (m, i) are needed to separate and .
Even with replicates, the separate s are onlyestimable if M > 2.
Note: average dierence is 0! Assumes exchangeability of replicates.
If no replicates then disregard the cmis.Inter-rater agreement 81/ 90
-
Model for replicate measurements
Same approach as before: Fit the correct variancecomponents model and use this as the basis for LoA.
Extremely exible. Can even be used to analyze the situationwhere every rater not necessarily has scoredevery item.
Exchangeable replicates are not uncommon, e.g.,
Experts scoring/extracting information fromimages
Measurements taken on couples/twins.
Linked replicates do not make sense, when it isarbitrary which person is partner 1 or partner 2.
Inter-rater agreement 82/ 90
Replicate measurements
The limits of agreement / prediction interval for tworandom raters scoring a new future observation is
0 1.96
22Extra variation
+ 21 + 22 +
21 +
22
However, since we are considering the predictioninterval for two random raters we use the averagevariance components in the formula
0 1.96
2(2 + 2 + 2)
Note that the expected dierence is zero since wehave no xed order of the raters.
Inter-rater agreement 83/ 90
Example: Stress scoring of dogs
10 judges scoring stress indicators from 10 dogs.
> dogdata BA.est(dogdata, random.raters=TRUE, linked=FALSE)
Variance components (sd):IxR MxI M res
j1 0 18.145 14.11 20.948j10 0 8.122 14.11 12.736j2 0 0.009 14.11 11.350j3 0 0.004 14.11 9.524j4 0 0.004 14.11 9.614j5 0 8.924 14.11 12.588j6 0 18.534 14.11 21.135j7 0 0.023 14.11 11.991j8 0 0.004 14.11 9.384j9 0 0.003 14.11 9.789
Inter-rater agreement 84/ 90
-
Example: Stress scoring of dogs
10 judges scoring stress indicators from 10 dogs.
> res res$LoA
Mean Lower Upper SDRand. rater - rater 0 -61.02451 61.02451 30.51225
Inter-rater agreement 85/ 90
Linked replicates
For linked replicates, extend the model as before:
ymir = bm + i + air + cmi + emirs.d.(bm) = variation among raters
s.d.(air) = between replicates
s.d.(cmi) = m matrix-eect
s.d.(emir) = m measurement error
The variation between replicates, , does not enterthe limits-of-agreement since the LoAs are for asingle new future observation (ie., the samereplicate from one item/individual for both raters).
0 1.96
2(2 + 2 + 2)Inter-rater agreement 86/ 90
Linked replicates
> dogdata BA.est(dogdata, random.raters=TRUE)
Variance components (sd):IxR MxI M res
j1 6.466 18.754 13.994 21.672j10 6.466 8.163 13.994 10.454j2 6.466 3.213 13.994 10.439j3 6.466 0.011 13.994 5.718j4 6.466 0.033 13.994 11.716j5 6.466 8.817 13.994 10.449j6 6.466 19.205 13.994 21.770j7 6.466 4.732 13.994 10.523j8 6.466 0.009 13.994 5.701j9 6.466 0.026 13.994 11.441
Inter-rater agreement 87/ 90
-
Linked replicates
> res2 res2$LoA
Mean Lower Upper SDRand. rater - rater 0 -60.47317 60.47317 30.23658
Inter-rater agreement 88/ 90
Random raters
Fit the correct variance component modelwhere variation among raters is considered arandom eect
Since each rater can have his/her individualvariance we need to average the individualvariance components
Extract the relevant variance components andcompute the limits-of-agreement
Inter-rater agreement 89/ 90
DG Altman and JM Bland.Measurement in medicine: The analysis of method comparison studies.The Statistician, 32:307317, 1983.
JM Bland and DG Altman.Statistical methods for assessing agreement between two methods of clinicalmeasurement.Lancet, i:307310, 1986.
B Carstensen, J Simpson, and LC Gurrin.Statistical models for assessing agreement in method comparison studies withreplicate measurements.International Journal of Biostatistics, 4(1):Article 16, 2008.
B Carstensen.Comparing and predicting between several methods of measurement.Biostatistics, 5(3):399413, Jul 2004.
B. Carstensen.Comparing Clinical Measurement Methods: A practical guide.Wiley, 2010.