3up

34
Statistical Analysis of Method Comparison studies Bendix Carstensen Steno Diabetes Center, Gentofte, Denmark & Dept. Biostatistics, Medical Faculty, University of Copenhagen http://BendixCarstensen.com Claus Thorn Ekstrøm Statistics, Faculty of Life Sciences, University of Copenhagen www.statistics.life.ku.dk/ ~ ekstrom/ Tutorial, SISMEC, Ancona, Italy 28 September 2011 http://BendixCarstensen.com/MethComp/Ancona.2011 Comparing two methods with one measurement on each Morning Bendix Carstensen MethComp 28 September 2011 Tutorial, SISMEC, Ancona, Italy http://BendixCarstensen.com/MethComp/Ancona.2011 Comparing measurement methods General questions: Are results systematically different? Can one method safely be replaced by another? What is the size of measurement errors? Different centres use different methods of measurement: How can we convert from one method to another? How precise is the conversion? Comparing two methods with one measurement on each 1/ 90

Upload: antonsurviyanto

Post on 10-Nov-2015

3 views

Category:

Documents


0 download

DESCRIPTION

3up

TRANSCRIPT

  • Statistical Analysis ofMethod Comparison studies

    Bendix Carstensen Steno Diabetes Center, Gentofte, Denmark& Dept. Biostatistics, Medical Faculty,

    University of Copenhagen

    http://BendixCarstensen.com

    Claus Thorn Ekstrm Statistics, Faculty of Life Sciences,University of Copenhagenwww.statistics.life.ku.dk/~ekstrom/

    Tutorial, SISMEC, Ancona, Italy28 September 2011

    http://BendixCarstensen.com/MethComp/Ancona.2011

    Comparing two methods withone measurement on eachMorning

    Bendix Carstensen

    MethComp28 September 2011Tutorial, SISMEC, Ancona, Italy

    http://BendixCarstensen.com/MethComp/Ancona.2011

    Comparing measurement methods

    General questions:

    Are results systematically dierent?

    Can one method safely be replaced by another?

    What is the size of measurement errors?

    Dierent centres use dierent methods ofmeasurement: How can we convert from onemethod to another?

    How precise is the conversion?

    Comparing two methods with one measurement on each 1/ 90

  • Two methods for measuring fat content inhuman milk:

    1 2 3 4 5 6

    12

    34

    56

    Gerber

    Trig

    Therelationshiplooks like:

    y1 = a+ by2

    Comparing two methods with one measurement on each 2/ 90

    Two methods one measurement by each

    How large is the dierence between a measurementwith method 1 and one with method 2 on a(randomly chosen) person?

    Di = y2i y1i, D, s.d.(D)Limits of agreement:

    D 2 s.d.(D)95% prediction interval for the dierence between ameasurement by method 1 and one by method 2.[1, 2]

    Comparing two methods with one measurement on each 3/ 90

    Limits of agreement: Interpretation

    If a new patient is measured once with each ofthe two methods, the dierence between thetwo values will with 95% probability be withinthe limits of agreement.

    This is a prediction interval for a (future)dierence.

    Requires a clinical input:Are the limits of agreement suciently narrowto make the use of either of the methodsclinically acceptable?

    Is it relevant to test if the mean is 0?

    Comparing two methods with one measurement on each 4/ 90

  • Limits of agreement: Test?

    Testing whether the dierence is 0 is a bad idea:

    If the study is suciently small this will beaccepted even if the dierence is important.

    If the study is suciently large this will berejected even if the dierence is clinicallyirrelevant.

    Comparing two methods with one measurement on each 5/ 90

    Limits of agreement:

    1 2 3 4 5 6

    0.4

    0.2

    0.0

    0.2

    0.4

    ( Trig + Gerber ) / 2

    Trig

    G

    erbe

    r

    0.17

    0.00

    0.17 Plotdierences(Di) versusaverages(Ai).

    Comparing two methods with one measurement on each 6/ 90

    Model in Limits of agreement

    Methods m = 1, . . . ,M , applied to i = 1, . . . , Iindividuals:

    ymi = m + i + emiemi N (0, 2m) measurement error

    Two-way analysis of variance model, withunequal variances in columns.

    Dierent variances are not identiable withoutreplicate measurements for M = 2 because thevariances cannot be separated.

    Models 7/ 90

  • Limits of agreement:

    Usually interpreted as the likely dierence betweentwo future measurements, one with each method:

    y2 y1 = D = 2 1 1.96 s.d.(D)Normally we use 2 instead of 1.96.

    Neither are formally correct if we take the modelseriously:

    Use a t-quantile with I 1 d.f. Estimation s.d. of 2 1 is /

    I.

    So we should use t0.95

    (I + 1)/I instead.This is 2.08 for I = 30 and less than 2 if I > 85.

    Models 8/ 90

    Limits of agreement:

    Limits of agreement can be converted to aprediction interval for y2 given y1, by solving for y2:

    y2 y1 = 2 1 2 s.d.(D)which gives:

    y2|1 = y2|y1 = 2 1 + y1 2 s.d.(D)

    Models 9/ 90

    Introduction to computingMorning

    Bendix Carstensen

    MethComp28 September 2011Tutorial, SISMEC, Ancona, Italy

    http://BendixCarstensen.com/MethComp/Ancona.2011

  • Structure of practicals

    This tutorial is both theoretical and practical, i.e.the aim is to convey a basic understanding of theproblems in method comparison studies, but also toconvey practical skills in handling the statisticalanalysis.

    R for data manipulation and graphics.

    So we assume familiarity with R.

    Occasionally BUGS for estimation in non-linearvariance component models.

    BUGS is hidden inside an R-function.

    Introduction to computing 10/ 90

    How it works

    Example data sets are included in the MethComppackage.

    Functions in MethComp are based on a data framewith a particular structure; a Meth object:

    meth method (factor)item item, person, individual, sample (factor)repl replicate (if present) (factor)

    y the actual measurement (numerical)

    Once converted to Meth, just use summary, plotetc.

    Introduction to computing 11/ 90

    How it looks:

    > subset(ox,as.integer(item) subset(to.wide(ox),as.integermeth item repl y Note:

    1 CO 1 1 78.0 Replicate measurements are t2 CO 1 2 76.4 item repl id CO pulse3 CO 1 3 77.2 1 1 1 1.1 78.0 714 CO 2 1 68.7 2 1 2 1.2 76.4 725 CO 2 2 67.6 3 1 3 1.3 77.2 736 CO 2 3 68.3 4 2 1 2.1 68.7 68184 pulse 1 1 71.0 5 2 2 2.2 67.6 67185 pulse 1 2 72.0 6 2 3 2.3 68.3 68186 pulse 1 3 73.0187 pulse 2 1 68.0188 pulse 2 2 67.0189 pulse 2 3 68.0

    Introduction to computing 12/ 90

  • Getting your own data into R

    Take a look in The R Primer by Claus Ekstrm,or:

    If your data are not too large, the simplest is to edityour data in Excel or some other spreadsheet tolook like this:

    item repl id CO pulse1 1 1.1 78.0 711 2 1.2 76.4 721 3 1.3 77.2 732 1 2.1 68.7 682 2 2.2 67.6 672 3 2.3 68.3 68

    The rst line is variable names; the following linesare data.

    Introduction to computing 13/ 90

    Analysis options in this course

    Scatter plots.

    Bland-Altman plots ((y2 y1) vs. (y1 + y2)/2) Limits of Agreement (LoA).

    Models with constant bias.

    Models with linear bias.

    Conversion formulae between methods (singlereplicates)

    Transformation of measurements.

    Plots of conversion equations.

    Reporting of variance components.

    Introduction to computing 14/ 90

    Requirements

    R for data manipulation and graphics. Keep a script of what you did:

    Use the built-in editor in R the nerds can use ESS or you can download R-Studio.

    You need the packages: MethComp R2WinBUGS coda BRugs Epi - Version 1.10 !!!

    Introduction to computing 15/ 90

  • Functions in the MethComp package

    5 broad categories of functions in MethComp:

    Graphical exploring data.

    Data manipulation reshaping and changing.

    Simulation generating datasets or replacingvariables.

    Analysis functions tting models to data.

    Reporting functions displaying results fromanalyses.

    Overview of these in the Practicals.

    Introduction to computing 16/ 90

    Does it work?

    library(MethComp)library(help=MethComp) # Do you have version 1.10??data(ox)ox

  • Limits of agreement assumptions

    The dierence between methods is constant

    The variances of the methods (and hence ofthe dierence) is constant.

    Check this by:

    Regress dierences on averages.

    Regress absolute residuals from this on theaverages.

    Non-constant dierence 18/ 90

    Glucose measurements

    4 6 8 10 12 14

    4

    6

    8

    10

    12

    14

    Venous plasma

    Cap

    illar

    y bl

    ood

    Non-constant dierence 19/ 90

    Glucose measurements

    5 6 7 8 9 10 11 12

    4

    2

    0

    2

    4

    ( capil + plasma ) / 2

    capi

    l p

    lasm

    a

    2.84

    0.41

    2.02

    Non-constant dierence 20/ 90

  • Regress dierence on average

    Di = a+ bAi + ei, var(ei) = 2D

    If b is dierent from 0, we could use this equation toderive LoA:

    a+ bAi 2Dor convert to prediction as for LoA:

    y2|1 = y1 + a+ bAi y1 + a+ by1 = a+ (1 + b)y1Exchanging methods would give:

    y1|2 = a+ (1 b)y1instead of: y1|2 =

    a1 + b

    +1

    1 + by1

    Non-constant dierence 21/ 90

    Variable limits of agreement

    5 6 7 8 9 10 11 12

    4

    2

    0

    2

    4

    ( capil + plasma ) / 2

    capi

    l p

    lasm

    a

    Non-constant dierence 22/ 90

    Improving the regression of D on A

    y2i y1i = a+ b(y1i + y2i)/2 + eiy2i(1 b/2) = a+ (1 + b/2)y1i + ei

    y2i =a

    1 b/2 +1 + b/2

    1 b/2y1i +1

    1 b/2ei

    y1i =a

    1 + b/2+

    1 b/21 + b/2

    y2i +1

    1 + b/2ei

    This is what comes out of the functionsDA.reg and BA.plot

    Non-constant dierence 23/ 90

  • Variable limits of agreement

    5 6 7 8 9 10 11 12

    4

    2

    0

    2

    4

    ( capil + plasma ) / 2

    capi

    l p

    lasm

    a

    capilplasma = 2.09 0.32 (capil+plasma)/2 (95% p.i.: +/2.16)

    plasma = 2.49 + 1.38 capil (95% p.i.: +/2.57)capil = 1.80 + 0.73 plasma (95% p.i.: +/1.87)

    Non-constant dierence 24/ 90

    Conversion equation with prediction limits

    4 6 8 10 12 14

    4

    6

    8

    10

    12

    14

    Venous plasma

    Cap

    illar

    y bl

    ood

    Non-constant dierence 25/ 90

    Prediction intervals

    Prediction s.e. for y1|2 is /(1 b/2) Prediction s.e. for y1|2 is /(1 + b/2) The slope of the prediction line is the ratio ofthe prediction s.e.s.

    Hence prediction limits can be used both ways:

    Non-constant dierence 26/ 90

  • Conversion equation with prediction limits

    4 6 8 10 12 14

    4

    6

    8

    10

    12

    14

    Venous plasma

    Cap

    illar

    y bl

    ood

    Non-constant dierence 27/ 90

    Why does this work?

    The general model for the data is:

    y1i = 1 + 1i + e1i, e1i N (0, 21)y2i = 2 + 2i + e2i, e2i N (0, 22)

    Work out the prediction of y1 given anobservation of y2 in terms of these parameters.

    Work out how dierences relate to averages interms of these parameters.

    Then the prediction is as we just derived it.

    Non-constant dierence 28/ 90

    Why is it wrong anyway?

    Introducing linear bias, ymi = m + mi + emiputs measurements by dierent methods ondierent scales.Hence it has formally no meaning to form thedierences.

    In the induced model for Di a+ bAi + ei,ei and AI are not independent.

    But if is not too far from 1 it not a bigproblem, though.

    Non-constant dierence 29/ 90

  • Comparing two methods withreplicate measurementsMorning

    Bendix Carstensen

    MethComp28 September 2011Tutorial, SISMEC, Ancona, Italy

    http://BendixCarstensen.com/MethComp/Ancona.2011

    Replicate measurementsFat data; exchangeable replicates:

    item repl KL SL1 1 4.5 4.91 2 4.4 5.01 3 4.7 4.83 1 6.4 6.53 2 6.2 6.43 3 6.5 6.1

    Oximetry data; linked replicates:

    item repl CO pulse1 1 78.0 711 2 76.4 721 3 77.2 732 1 68.7 682 2 67.6 672 3 68.3 68

    Linked or exchangeable replicates!Comparing two methods with replicate measurements 30/ 90

    Extension of the model:exchangeable replicates

    ymir = m + i + cmi + emirs.d.(cmi) = m matrix-eect

    s.d.(emir) = m measurement error

    Replicates within (m, i) are needed to separate and .

    Even with replicates, the separate s are onlyestimable if M > 2.

    Still assumes that the dierence betweenmethods is constant.

    Assumes exchangeability of replicates.Comparing two methods with replicate measurements 31/ 90

  • Extension of the model:linked replicates

    ymir = m + i + air + cmi + emirs.d.(air) = between replicates

    s.d.(cmi) = m matrix-eect

    s.d.(emir) = m measurement error

    Still assumes that the dierence betweenmethods is constant.

    Replicates are linked between methods:air is common across methods, i.e. the rstreplicate on a person is made under similarconditions for all methods (i.e. at a specicday or the like).

    Comparing two methods with replicate measurements 32/ 90

    Replicate measurements

    Three approaches to limits of agreement withreplicate measurements:

    1. Take means over replicates within each methodby item stratum.

    2. Replicates within item are taken as items.

    3. Fit the correct variance components model anduse this as basis for the LoA.The model is tted using:> BA.est( data, linked=TRUE ).

    Comparing two methods with replicate measurements 33/ 90

    Oximetry data

    20 40 60 80

    20

    10

    010

    20

    (CO+pulse)/2

    CO

    pul

    se

    Comparing two methods with replicate measurements 34/ 90

  • Replicate measurements

    The limits of agreement should still be fordierence between future single measurements.

    Analysis based on the means of replicates istherefore wrong:

    Model:

    ymir = m + i + air + cmi + emir

    var(y1jr y2jr) = 21 + 22 + 21 + 22 note that the term air air cancels becausewe are referring to the same replicate.

    Comparing two methods with replicate measurements 35/ 90

    Wrong or almost right

    In the model the correct limits of agreement wouldbe:

    1 2 1.96 21 +

    22 +

    21 +

    22

    But if we use means of replicates to form thedierences we have:

    di = y1i y2i = 1 2 +

    r airR1i

    r airR2i

    +c1i c2i +

    r e1irR1i

    r e2irR2i

    Comparing two methods with replicate measurements 36/ 90

    The terms with air are only relevant for linkedreplicates in which case R1i = R2i and therefore theterm vanishes. Thus:

    var(di) = 21+

    22+

    21/R1i+

    22/R2i <

    21+

    22+

    21+

    22

    so the limits of agreement calculated based on themeans are much too narrow as prediction limits fordierences between future single measurements.

    Comparing two methods with replicate measurements 37/ 90

  • (Linked) replicates as items

    If replicates are taken as items, then the calculateddierences are:

    dir = y1ir y2ir = 1 2 + c1i c2i + e1ir e2irwhich has variance 21 +

    22 +

    21 +

    22, and so gives

    the correct limits of agreement. However, thedierences are not independent:

    cov(dir, dis) = 21 +

    22

    Negligible if the residual variances are very largecompared to the interaction, variance likely to beonly slightly downwards biased.

    Comparing two methods with replicate measurements 38/ 90

    Recommendations

    Fit the correct model, and get the estimatesfrom that, e.g. by using BA.est.

    If you must use over-simplied methods:

    Use linked replicates as item.

    If replicates are not linked; make a randomlinking.Note: If this give a substantially dierentpicture than using the original replicatenumbering as linking key, there might besomething shy about the data.

    Further details, see [3].

    Comparing two methods with replicate measurements 39/ 90

    Oximetry data

    Linkedreplicates usedas items

    Mean overreplicates asitems

    Limits based onmodel dashed lineassumingexchangeablereplicates

    20 40 60 80

    20

    10

    010

    20

    (CO+pulse)/2

    CO

    pul

    se

    Comparing two methods with replicate measurements 40/ 90

  • Repeatability andreproducibilityWednesday 9 February, morning

    Bendix Carstensen

    MethComp28 September 2011Tutorial, SISMEC, Ancona, Italy

    http://BendixCarstensen.com/MethComp/Ancona.2011

    Accuracy of a measurement method

    Repeatability:The accuracy of the method under exactlysimilar circumstances; i.e. the same lab, thesame technician, and the same day.(Repeatability conditions)

    Reproducibility:The accuracy of the method under comparablecircumstances, i.e. the same machinery, thesame kit, but possibly dierent days orlaboratories or technicians.(Reproducibility conditions)

    Repeatability and reproducibility 41/ 90

    Quantication of accuracy

    Upper limit of a 95% condence interval forthe dierence between two measurments.

    Suppose the variance of the measurement is 2:

    var(ymi1 ymi2) = 22

    i.e the standard error is2, and a condnece

    interval for the dierence:

    0 1.962 = 0 2.772 2.8

    This is called the reproducibility coecient orsimply the reproducibility. (The number 2.8 isused as a convenient approximation).

    Repeatability and reproducibility 42/ 90

  • Quantication of accuracy

    Where do we get the ?

    Repeat measurements on the same item (oreven better) several items.

    The conditions under which the repeat(replicate) measurements are taken determineswhether we are estimating repeatability orreproducibility.

    In larger experiments we must consider theexchangeability of the replicates i.e. whichreplicates are done under (exactly) similarconditions and which are not.

    Repeatability and reproducibility 43/ 90

    Linear bias between methodsMorning

    Bendix Carstensen

    MethComp28 September 2011Tutorial, SISMEC, Ancona, Italy

    http://BendixCarstensen.com/MethComp/Ancona.2011

    Extension with non-constant bias

    ymir = m + mi + random eects

    There is now a scaling between the methods.

    Methods do not measure on the same scale therelative scaling is estimated, between method 1 and2 the scale is 2/1.

    Consequence: Multiplication of all measurements onone method by a xed number does not changeresults of analysis:

    The corresponding is multiplied by the samefactor as is the variance components for thismethod.

    Linear bias between methods 44/ 90

  • Variance components

    Two-way interactions:

    ymir = m + m(i + air + cmi) + emir

    The random eects cmi and emir have variancesspecic for each method.

    But air does not depend on m must be scaled toeach of the methods by the corresponding m.

    Implies that = s.d.(air) is irrelevant the scaleis arbitrary. The relevant quantities are m thebetween replicate variation within item as measuredon the mth scale.

    Linear bias between methods 45/ 90

    0 20 40 60 80 100

    020

    4060

    8010

    0

    y1

    0 20 40 60 80 100

    020

    4060

    8010

    0

    y2

    0 20 40 60 80 100

    020

    4060

    8010

    0

    y1

    y2

    ymir = m + i + air + cmi + emir

    Linear bias between methods 46/ 90

    0 20 40 60 80 100

    020

    4060

    8010

    0

    y1

    0 20 40 60 80 100

    020

    4060

    8010

    0

    y2

    0 20 40 60 80 100

    020

    4060

    8010

    0

    y1

    y2

    ymir = m + m(i + air + cmi) + emir

    Linear bias between methods 47/ 90

  • Estimation: Alternatingrandom eects regressionMorning

    Bendix Carstensen

    MethComp28 September 2011Tutorial, SISMEC, Ancona, Italy

    http://BendixCarstensen.com/MethComp/Ancona.2011

    Alternating random eects regression

    Carstensen [4] proposed a ridiculously complicatedapproach to t the model

    ymir = m + mi + cmi + emir

    based in the observation:

    For xed the model is a linear mixed model.

    For xed (, ) it is a regression through 0.

    This has be improved in [5]

    Alternating regressions 48/ 90

    Alternating random eects regression

    Now consider instead the correctly formulatedversion of the slightly more general model:

    ymir = m + m(i + air + cmi) + emir

    Here we observe

    For xed mir = i + air + cmi the model is alinear model, with residual variances dierentbetween methods.

    For xed (, ) scaled responses y are used:

    ymir mm

    = i + air + cmi + emir/m

    Alternating regressions 49/ 90

  • Estimation algorithm

    ymir = m + m(i + air + cmi) + emir

    1. Start with mir = ymi2. Estimate (m, m).

    3. Compute the scaled responses and t therandom eects model.

    4. Use the estimated is, and BLUPs of air andcmi to update mir.

    5. Check convergence in terms of identiableparameters.

    Alternating regressions 50/ 90

    The residual variances

    The variance components are estimated in themodel for the scaled response.

    The parameters (m, m) are not taken intoaccount in the calculation of the residualvariance.

    Hence the residual variances must be correctedpost hoc.

    This machinery is implemented in the functionAltReg in the MethComp package.

    Alternating regressions 51/ 90

    > AR.ox round(AR.ox,3)From

    To Intercept: CO pulse Slope: CO pulse IxR sd. MxI sd. res.sd.CO 0.000 -2.159 1.000 1.063 3.521 2.978 2.055pulse 2.031 0.000 0.941 1.000 3.313 2.802 4.079

    Alternating regressions 52/ 90

  • Your turn:

    Start on the practical titled:

    Oximetry: Linked replicates with non-constantbias

    Alternating regressions 53/ 90

    Converting between methodsAfternoon

    Bendix Carstensen

    MethComp28 September 2011Tutorial, SISMEC, Ancona, Italy

    http://BendixCarstensen.com/MethComp/Ancona.2011

    Predicting method 2 from method 1

    y10r = 1 + 1(0 + a0r + c10) + e10ry20r = 2 + 2(0 + a0r + c20) + e20r

    y20r = 2 +

    21

    (y10r 1 e10r)+ 2(c10 + c20) + e20r

    The random eects have expectation 0, so:

    E(y20|y10) = y20 = 2 + 21

    (y10 1)

    Converting between methods 54/ 90

  • y20r = 2 +21

    (y10r 1 e10r)+ 2(c10 + c20) + e20r

    var(y20|y10) =(21

    )2(21

    21 +

    21) + (

    22

    22 +

    22)

    The slope of the prediction line from method 1 tomethod 2 is 2/1.

    The width of the prediction interval is:

    2 2(

    21

    )2(21

    21 +

    21) + (

    22

    22 +

    22)

    Converting between methods 55/ 90

    If we do the prediction the other way round (y1|y2)we get the same relationship i.e. a line with theinverse slope, 1/2.

    The width of the prediction interval in this directionis (by permutation of indices):

    2 2

    (2121 +

    21) +

    (12

    )2(22

    22 +

    22)

    = 2 2 12

    (21

    )2(21

    21 +

    21) + (

    22

    22 +

    22)

    i.e. if we draw the prediction limits as straight linesthey can be used both ways.

    Converting between methods 56/ 90

    20 40 60 80 10020

    40

    60

    80

    100

    CO

    puls

    e

    pulse = 2.11 + 0.94 CO ( 6.00 )

    CO = 2.25 + 1.06 pulse ( 6.39 )

    Converting between methods 57/ 90

  • What happened to the curvature?

    0 5 10

    20

    10

    010

    2030

    40

    x

    y

    Usually the predictionlimits are curved:

    y|x t0.975 1 + xx

    In our prediction we have ignored the last term(xx), i.e. eectively assuming that there is noestimation error on 2|1 and 2|1.

    Converting between methods 58/ 90

    Transformation of dataAfternoon

    Bendix Carstensen

    MethComp28 September 2011Tutorial, SISMEC, Ancona, Italy

    http://BendixCarstensen.com/MethComp/Ancona.2011

    If variances are not constantA transformation might help:

    > round( ftable( DA.reg(ox) ), 3 )alpha beta sd.pred beta=1 s.d.=K

    From: To:CO CO 0.000 1.000 NA NA NA

    pulse 1.864 0.943 5.979 0.142 0.000pulse CO -1.977 1.061 6.342 0.142 0.000

    pulse 0.000 1.000 NA NA NA

    > oxt round( ftable( DA.reg(oxt) ), 3 )alpha beta sd.pred beta=1 s.d.=K

    From: To:CO CO 0.000 1.000 NA NA NA

    pulse -0.034 0.900 0.306 0.009 0.246pulse CO 0.038 1.111 0.340 0.009 0.246

    pulse 0.000 1.000 NA NA NA

    Transformation of data 59/ 90

  • 20 40 60 80 10020

    40

    60

    80

    100

    CO

    puls

    e

    pulse = 2.11 + 0.94 CO ( 6.00 )

    CO = 2.25 + 1.06 pulse ( 6.39 )

    Transformation of data 60/ 90

    Analysis on the transformed scale

    > ARoxt ARoxt ARoxtNote: Response transformed by: log p/(100 - p)

    Conversion between methods:alpha beta sd

    To: From:CO CO 0.000 1.000 0.202

    pulse 0.042 1.105 0.341pulse CO -0.038 0.905 0.309

    pulse 0.000 1.000 0.271

    Variance components (sd):s.d.

    Method IxR MxI resCO 0.232 0.160 0.143pulse 0.210 0.145 0.191

    This is an analysis for the transformed data.Transformation of data 62/ 90

  • Backtransformation for plotting

    prpulse

  • 0 20 40 60 80 100

    40

    20

    0

    20

    40

    ( CO + pulse ) / 2

    puls

    e

    CO

    0 20 40 60 80 100

    40

    20

    0

    20

    40

    ( CO + pulse ) / 2

    puls

    e

    CO

    pulse = 2.03 + 0.94 CO ( 6.01 )

    CO = 2.16 + 1.06 pulse ( 6.38 )

    0 20 40 60 80 100

    40

    20

    0

    20

    40

    ( CO + pulse ) / 2

    puls

    e

    CO

    Transformation of data 66/ 90

    Implementation in BUGSAfternoon

    Bendix Carstensen

    MethComp28 September 2011Tutorial, SISMEC, Ancona, Italy

    http://BendixCarstensen.com/MethComp/Ancona.2011

    Implementation in BUGS

    ymir = m + m(i + air + cmi) + emir

    Non-linear hierarchical model:Implement in BUGS.

    The model is symmetrical in methods.

    Mean is overparametrized.

    Choose a prior (and hence posterior!) for thes with nite support.

    Keeps the chains nicely in place.

    This is the philosophy in the function MCmcmc.

    Implementation in BUGS 67/ 90

  • Results from tting the model

    The posterior distn of (m, m, i) is singular.

    But the relevant translation quantities areidentiable:

    2|1 = 2 12/12|1 = 2/1

    So are the variance components.

    Posterior medians used to devise predictionequations with limits.

    Implementation in BUGS 68/ 90

    The MethComp package for R

    Implemented model:

    ymir = m + m(i + air + cmi) + emir

    Replicates required. R2WinBUGS, BRugs or JAGS is required. Dataframe with variablesmeth, item, repl and y (a Meth object)

    The function MCmcmc writes a BUGS-program,initial values and data to les.

    Runs BUGS and sucks results back in to R, andgives a nice overview of the conversionequations.

    Implementation in BUGS 69/ 90

    Example output: Oximetry

    > summary( ox )#Replicates

    Method 1 2 3 #Items #Obs: 354 Values: min med maxCO 1 4 56 61 177 22.2 78.6 93.5pulse 1 4 56 61 177 24.0 75.0 94.0

    >> MCox

  • Simulation run of a model with- method by item and item by replicate interaction:- using 4 chains run for 2000 iterations(of which 1000 are burn-in),

    - monitoring all values of the chain:- giving a posterior sample of 4000 observations.

    model is syntactically correctdata loadedmodel compiledInitializing chain 1: initial values loaded but this or anotherInitializing chain 2: initial values loaded but this or anotherInitializing chain 3: initial values loaded but this or anotherInitializing chain 4: initial values loaded but this or anotherinitial values generated, model initializedSampling has been started ...1000 updates took 38 sdeviance setmonitor set for variable alphamonitor set for variable betamonitor set for variable sigma.mimonitor set for variable sigma.irmonitor set for variable sigma.resmonitor set for variable devianceImplementation in BUGS 71/ 90

    > MCox

    Conversion between methods:alpha beta sd

    To: From:CO CO 0.000 1.000 1.740

    pulse -9.342 1.159 5.328pulse CO 8.061 0.863 4.508

    pulse 0.000 1.000 6.115

    Implementation in BUGS 72/ 90

    Variance components (sd):s.d.

    Method IxR MxI resCO 3.878 3.122 1.230pulse 3.222 2.757 4.324

    Variance components with 95 % cred.int.:method CO pulseqnt 50% 2.5% 97.5% 50% 2.5% 97.5%

    SDIxR 3.878 3.053 4.533 3.222 2.426 3.930MxI 3.122 2.193 9.764 2.757 1.915 5.902res 1.230 0.143 2.639 4.324 3.709 5.019tot 5.220 4.507 10.645 6.135 5.457 7.849

    Implementation in BUGS 73/ 90

  • Mean parameters with 95 % cred.int.:50% 2.5% 97.5% P(>0/1)

    alpha[pulse.CO] 8.057 -2.457 29.884 0.969alpha[CO.pulse] -9.346 -49.949 2.476 0.031beta[pulse.CO] 0.863 0.604 0.997 0.024beta[CO.pulse] 1.159 1.003 1.657 0.976

    Note that intercepts in conversion formulae are adjusted to getconversion formulae that represent the same line both ways,and hence the median interceps in the posterior do not agreeexactly with those given in the conversion formulae.

    Implementation in BUGS 74/ 90

    Inter-rater agreementAfternoon

    Claus Thorn Ekstrm

    MethComp28 September 2011Tutorial, SISMEC, Ancona, Italy

    http://BendixCarstensen.com/MethComp/Ancona.2011

    Program

    Example

    Random rater vs. xed methods

    Statistical modelling

    Inter-rater agreement 75/ 90

  • Example: depression ratings

    DoctorPatient A B C D E1 8 9 5 8 82 0 0 1 2 13 4 5 5 5 34 5 8 7 8 55 3 3 1 8 26 8 9 9 9 1

    Doctor doctor! Am I depressed?

    Getting second opinions ... and third ... and fourth...Inter-rater agreement 76/ 90

    Example: depression ratings

    Research question

    How well will two doctors agree on the diagnosis?

    In this example we use humans as measurementmethods or raters.

    However, we are not interested in makingstatements about specic raters.

    Inter-rater agreement 77/ 90

    Fixed versus random eects

    Denition: Factors can either be xed or random.

    A factor is xed when the levels (e.g. raters)under study are the only levels of interest.

    A factor is random when the levels under studyare a random sample from a larger populationof raters and the goal of the study is to make astatement regarding the larger population.

    Raters can be dened as xed or random factors:

    If the raters themselves are of interest (youwant to use them again) then use xed model.

    If raters are randomly chosen of possible poolof raters (you do not have specic raters inmind) then use the random model.Inter-rater agreement 78/ 90

  • Fixed versus random eects

    Inter-rater agreement 79/ 90

    Modelling: exchangeable replicates

    The model for xed methods is:

    ymir = m + i + cmi + emirs.d.(cmi) = m matrix-eect

    s.d.(emir) = m measurement error

    Replicates within (m, i) are needed to separate and .

    Even with replicates, the separate s are onlyestimable if M > 2.

    Assumes that the dierence between methodsis constant.

    Assumes exchangeability of replicates.

    If no replicates then disregard the cmis.Inter-rater agreement 80/ 90

    Modelling: exchangeable replicates

    The model for random methods/raters is:

    ymir = bm + i + cmi + emirs.d.(bm) = variation among raters

    s.d.(cmi) = m matrix-eect

    s.d.(emir) = m measurement error

    Replicates within (m, i) are needed to separate and .

    Even with replicates, the separate s are onlyestimable if M > 2.

    Note: average dierence is 0! Assumes exchangeability of replicates.

    If no replicates then disregard the cmis.Inter-rater agreement 81/ 90

  • Model for replicate measurements

    Same approach as before: Fit the correct variancecomponents model and use this as the basis for LoA.

    Extremely exible. Can even be used to analyze the situationwhere every rater not necessarily has scoredevery item.

    Exchangeable replicates are not uncommon, e.g.,

    Experts scoring/extracting information fromimages

    Measurements taken on couples/twins.

    Linked replicates do not make sense, when it isarbitrary which person is partner 1 or partner 2.

    Inter-rater agreement 82/ 90

    Replicate measurements

    The limits of agreement / prediction interval for tworandom raters scoring a new future observation is

    0 1.96

    22Extra variation

    + 21 + 22 +

    21 +

    22

    However, since we are considering the predictioninterval for two random raters we use the averagevariance components in the formula

    0 1.96

    2(2 + 2 + 2)

    Note that the expected dierence is zero since wehave no xed order of the raters.

    Inter-rater agreement 83/ 90

    Example: Stress scoring of dogs

    10 judges scoring stress indicators from 10 dogs.

    > dogdata BA.est(dogdata, random.raters=TRUE, linked=FALSE)

    Variance components (sd):IxR MxI M res

    j1 0 18.145 14.11 20.948j10 0 8.122 14.11 12.736j2 0 0.009 14.11 11.350j3 0 0.004 14.11 9.524j4 0 0.004 14.11 9.614j5 0 8.924 14.11 12.588j6 0 18.534 14.11 21.135j7 0 0.023 14.11 11.991j8 0 0.004 14.11 9.384j9 0 0.003 14.11 9.789

    Inter-rater agreement 84/ 90

  • Example: Stress scoring of dogs

    10 judges scoring stress indicators from 10 dogs.

    > res res$LoA

    Mean Lower Upper SDRand. rater - rater 0 -61.02451 61.02451 30.51225

    Inter-rater agreement 85/ 90

    Linked replicates

    For linked replicates, extend the model as before:

    ymir = bm + i + air + cmi + emirs.d.(bm) = variation among raters

    s.d.(air) = between replicates

    s.d.(cmi) = m matrix-eect

    s.d.(emir) = m measurement error

    The variation between replicates, , does not enterthe limits-of-agreement since the LoAs are for asingle new future observation (ie., the samereplicate from one item/individual for both raters).

    0 1.96

    2(2 + 2 + 2)Inter-rater agreement 86/ 90

    Linked replicates

    > dogdata BA.est(dogdata, random.raters=TRUE)

    Variance components (sd):IxR MxI M res

    j1 6.466 18.754 13.994 21.672j10 6.466 8.163 13.994 10.454j2 6.466 3.213 13.994 10.439j3 6.466 0.011 13.994 5.718j4 6.466 0.033 13.994 11.716j5 6.466 8.817 13.994 10.449j6 6.466 19.205 13.994 21.770j7 6.466 4.732 13.994 10.523j8 6.466 0.009 13.994 5.701j9 6.466 0.026 13.994 11.441

    Inter-rater agreement 87/ 90

  • Linked replicates

    > res2 res2$LoA

    Mean Lower Upper SDRand. rater - rater 0 -60.47317 60.47317 30.23658

    Inter-rater agreement 88/ 90

    Random raters

    Fit the correct variance component modelwhere variation among raters is considered arandom eect

    Since each rater can have his/her individualvariance we need to average the individualvariance components

    Extract the relevant variance components andcompute the limits-of-agreement

    Inter-rater agreement 89/ 90

    DG Altman and JM Bland.Measurement in medicine: The analysis of method comparison studies.The Statistician, 32:307317, 1983.

    JM Bland and DG Altman.Statistical methods for assessing agreement between two methods of clinicalmeasurement.Lancet, i:307310, 1986.

    B Carstensen, J Simpson, and LC Gurrin.Statistical models for assessing agreement in method comparison studies withreplicate measurements.International Journal of Biostatistics, 4(1):Article 16, 2008.

    B Carstensen.Comparing and predicting between several methods of measurement.Biostatistics, 5(3):399413, Jul 2004.

    B. Carstensen.Comparing Clinical Measurement Methods: A practical guide.Wiley, 2010.