
Reading Report: A unified approach for assessing agreement for continuous and categorical data

Yingdong Feng


Introduction

In Lin's paper, the authors propose a series of indices for assessing agreement, precision, and accuracy. In addition, the paper proposes the CP and TDI for normal data. All five indices are expressed as functions of variance components, and Lin obtains the estimates and performs inference for all of these functions of variance components through the GEE method. In their model, agreement is measured among k raters, with each rater having multiple (m) readings from each of the n subjects, for both continuous and categorical data.


The approach of this paper integrates the approaches of Barnhart et al. (2005) and Carrasco and Jover (2003). For example, Barnhart et al. (2005) proposed a series of indices (intra-rater CCC, inter-rater CCC, and total CCC) and estimated those indices and their inferences by the GEE method. The definitions of these three indices are listed below.

In this paper, the authors introduce a unified approach that can be used for continuous, binary, and ordinal data. They provide simulation results assessing the performance of the unified approach in section 4 and give two examples illustrating its use in section 5.

Index definitions:

Intra-rater CCC: agreement among the multiple readings from the same rater.
Inter-rater CCC: agreement among different raters, based on the averages of their multiple readings.
Total CCC: agreement among different raters, based on individual readings.
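
For reference, Lin's CCC between two raters can be written as ρc = 2σ12 / (σ1² + σ2² + (μ1 - μ2)²), and it factors into a precision component (the Pearson correlation) and an accuracy component (the bias-correction factor Cb). The short Python sketch below is my own illustration of this two-rater case using plug-in sample moments; it is not code from the paper.

    import numpy as np

    def ccc_two_raters(y1, y2):
        """Plug-in (1/n sample moment) estimate of Lin's CCC for two raters,
        together with its precision (Pearson correlation) and accuracy (C_b) factors."""
        y1, y2 = np.asarray(y1, float), np.asarray(y2, float)
        m1, m2 = y1.mean(), y2.mean()
        v1, v2 = y1.var(), y2.var()
        s12 = np.mean((y1 - m1) * (y2 - m2))
        ccc = 2 * s12 / (v1 + v2 + (m1 - m2) ** 2)
        precision = s12 / np.sqrt(v1 * v2)
        accuracy = ccc / precision        # equals 2*sqrt(v1*v2) / (v1 + v2 + (m1 - m2)**2)
        return ccc, precision, accuracy

    # toy usage: two raters measuring the same 20 subjects, rater 2 with a small bias
    rng = np.random.default_rng(0)
    truth = rng.normal(10.0, 2.0, size=20)
    rater1 = truth + rng.normal(0.0, 0.5, size=20)
    rater2 = truth + 0.3 + rng.normal(0.0, 0.5, size=20)
    print(ccc_two_raters(rater1, rater2))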


Method

In this paper, the model used for measuring agreement is

y_ijl = μ + α_i + β_j + γ_ij + e_ijl,

where y_ijl stands for the l-th reading on subject i given by rater j, with i = 1, 2, …, n, j = 1, 2, …, k, and l = 1, 2, …, m; μ is the overall mean, α_i is the random subject effect, β_j is the rater effect, γ_ij is the random subject-by-rater interaction effect, and e_ijl is the random error. The variance among all raters (the spread of the rater effects β_j around their mean) is denoted σ_β².

Based on this model, they propose a series of indices to measure agreement, precision and accuracy.
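
To make the data structure concrete, the following Python sketch (my own illustration, not code from the paper) generates balanced data from this model for n subjects, k raters, and m replicates, using hypothetical normal random effects and variance components:

    import numpy as np

    def simulate_agreement_data(n=20, k=2, m=3, mu=10.0, beta=(0.0, 0.5),
                                sigma_alpha=2.0, sigma_gamma=0.3, sigma_e=0.5,
                                seed=0):
        """Generate y[i, j, l] = mu + alpha_i + beta_j + gamma_ij + e_ijl
        with normal random subject, interaction and error effects."""
        rng = np.random.default_rng(seed)
        alpha = rng.normal(0.0, sigma_alpha, size=n)        # random subject effects
        gamma = rng.normal(0.0, sigma_gamma, size=(n, k))   # subject-by-rater interactions
        e = rng.normal(0.0, sigma_e, size=(n, k, m))        # replicate errors
        beta = np.asarray(beta, float)                      # rater effects (length k)
        y = mu + alpha[:, None, None] + beta[None, :, None] + gamma[:, :, None] + e
        return y                                            # array of shape (n, k, m)

    y = simulate_agreement_data()
    print(y.shape)   # (20, 2, 3): 20 subjects, 2 raters, 3 readings each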


In the total-agreement part, the authors give CCC total, precision total, accuracy total, MSD, TDI, and CP. Since total agreement is a measure of agreement based on any individual reading from each rater, these indices, unlike the inter-rater indices, do not depend on the number of replications.
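
For the two-rater case these total-agreement measures have simple closed forms under normality: MSD = E(y1 - y2)² = (μ1 - μ2)² + σ1² + σ2² - 2σ12, CP_δ is the probability that |y1 - y2| falls within δ, and TDI_p is the deviation that captures proportion p of the differences (often approximated by Φ⁻¹((1 + p)/2)·√MSD). The Python sketch below is my own illustration with hypothetical inputs; it does not reproduce the paper's k-rater variance-component expressions.

    import numpy as np
    from scipy.stats import norm

    def msd_cp_tdi(mu1, mu2, var1, var2, cov12, delta=1.0, p=0.90):
        """Two-rater MSD, CP_delta and approximate TDI_p under normality."""
        mu_d = mu1 - mu2                        # mean of the paired difference
        var_d = var1 + var2 - 2.0 * cov12       # variance of the paired difference
        sd_d = np.sqrt(var_d)
        msd = mu_d ** 2 + var_d                 # E(y1 - y2)^2
        cp = norm.cdf((delta - mu_d) / sd_d) - norm.cdf((-delta - mu_d) / sd_d)
        tdi = norm.ppf((1 + p) / 2) * np.sqrt(msd)   # approximate TDI
        return msd, cp, tdi

    # hypothetical means, variances and covariance for two raters
    print(msd_cp_tdi(mu1=10.0, mu2=10.3, var1=1.0, var2=1.2, cov12=0.9))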

For the estimation and inference part, before estimating all the indices we first need to estimate the mean for each rater and all the variance components; the paper proposes a system of equations to estimate them.
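
The paper's estimating equations are not reproduced here, but with independent random effects the model's covariance structure already suggests simple moment estimates on balanced data: readings by different raters on the same subject share covariance σα², two replicates by the same rater on the same subject share covariance σα² + σγ², and a single reading has variance σα² + σγ² + σe². The sketch below is my own crude moment-based substitute, not the paper's GEE system:

    import numpy as np

    def moment_variance_components(y):
        """Crude moment estimates of (sigma_alpha^2, sigma_gamma^2, sigma_e^2)
        from balanced data y of shape (n, k, m); rater means are removed first
        so the rater effects do not contaminate the moments."""
        n, k, m = y.shape
        yc = y - y.mean(axis=(0, 2), keepdims=True)   # center each rater
        total_var = yc.var()                          # sigma_a^2 + sigma_g^2 + sigma_e^2
        subj_rater = yc.mean(axis=2)                  # (n, k): average over replicates
        between = [np.mean(subj_rater[:, j1] * subj_rater[:, j2])
                   for j1 in range(k) for j2 in range(j1 + 1, k)]
        sigma_a2 = np.mean(between)                   # cross-rater covariance
        within = [np.mean(yc[:, j, l1] * yc[:, j, l2])
                  for j in range(k) for l1 in range(m) for l2 in range(l1 + 1, m)]
        sigma_ag2 = np.mean(within) if within else sigma_a2   # m = 1: not separable
        sigma_g2 = max(sigma_ag2 - sigma_a2, 0.0)
        sigma_e2 = max(total_var - sigma_ag2, 0.0)
        return sigma_a2, sigma_g2, sigma_e2

    # quick check on data simulated from the model with known components
    rng = np.random.default_rng(1)
    n, k, m = 500, 2, 3
    y = (10.0 + rng.normal(0, 2.0, n)[:, None, None]
         + np.array([0.0, 0.5])[None, :, None]
         + rng.normal(0, 0.3, (n, k))[:, :, None]
         + rng.normal(0, 0.5, (n, k, m)))
    print(moment_variance_components(y))   # roughly (4.0, 0.09, 0.25)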


The delta method is then used to obtain the estimates and their inferences for all indices. The following list shows which transformation is used for each group of indices:

CCC indices and precision indices: Z-transformation.
Accuracy and CP indices: logit transformation.
TDIs: natural log transformation.
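
As a concrete instance of these transformations, a confidence interval for a CCC-type or precision index is typically built on the Fisher Z scale, Z = ½·ln((1 + ρ)/(1 - ρ)), and then back-transformed with tanh. A minimal sketch of that step, using my own hypothetical point estimate and standard error rather than the paper's GEE-based values:

    import numpy as np
    from scipy.stats import norm

    def ccc_confidence_interval(ccc_hat, se_z, level=0.95):
        """CI for a CCC-type index via Fisher's Z-transformation.
        se_z is the standard error on the Z scale (e.g. from the delta method)."""
        z_hat = np.arctanh(ccc_hat)               # 0.5 * log((1 + r) / (1 - r))
        zcrit = norm.ppf(1 - (1 - level) / 2)
        lo, hi = z_hat - zcrit * se_z, z_hat + zcrit * se_z
        return np.tanh(lo), np.tanh(hi)           # back-transform to the index scale

    print(ccc_confidence_interval(0.88, se_z=0.10))   # hypothetical numbers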


Simulation

In section 3, the paper presents simulations based on binary, ordinal, and normal data in order to evaluate the performance of the proposed indices and to compare them against other existing methods. The results are shown in tables 1 to 5. Tables 1 and 2 both give results for the binary-data simulation, but the results in table 1 use the transformations listed above. Similarly, tables 3 and 4 both give results for the ordinal-data simulation, with table 3 using the transformations. Table 5 gives the normal-data results with transformation. All five tables consider three cases: case one with k=2 & m=1, case two with k=4 & m=1, and case three with k=2 & m=3. For each case, 1000 random samples of size 20 are generated.


There are five columns in each table; the definition of each column is listed below.

Theoretical: the theoretical value for the case.
Mean: the mean of the 1000 estimated indices from the 1000 random samples.
Std (Est): the standard deviation of the 1000 estimated indices from the 1000 random samples.
Mean (Std): the mean of the 1000 estimated standard errors.
Sig: the proportion of estimates falling outside the 95% confidence interval.


How do they obtain the theoretical values? For example, for binary data with k=2 & m=1, they set the correlation to 0.6, the margin for the first variable to (0.3, 0.7), and the margin for the second variable to (0.5, 0.5). For binary data with k=4 & m=1, they set the mean vector μ = (0.55, 0.6, 0.65, 0.8) and ρ12 = 0.75, ρ13 = 0.7, ρ14 = 0.5, ρ23 = 0.8, ρ24 = 0.6, and ρ34 = 0.6. For binary data with k=2 & m=3, they set the mean vector μ = (0.7, 0.7, 0.7, 0.6, 0.6, 0.6); the correlation between any two of the first three variables is 0.8, the correlation between any two of the last three variables is also 0.8, and the correlation between any one of the first three variables and any one of the last three variables is 0.7. With these settings, the theoretical values can be calculated in advance.
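
To see how such settings pin down the joint distribution, two correlated binary raters with success probabilities p1 and p2 and correlation ρ have P(1, 1) = p1·p2 + ρ·√(p1(1 - p1)·p2(1 - p2)), with the remaining cells following from the margins. The sketch below is my own illustration; it reads the k=2 & m=1 margins as success probabilities 0.7 and 0.5, which is only one possible reading of the stated margins:

    import numpy as np

    def correlated_binary_table(p1, p2, rho):
        """Joint 2x2 cell probabilities for two Bernoulli variables with
        margins p1, p2 and correlation rho."""
        p11 = p1 * p2 + rho * np.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
        p10, p01 = p1 - p11, p2 - p11
        p00 = 1.0 - p11 - p10 - p01
        table = np.array([[p00, p01], [p10, p11]])   # rows: rater 1, columns: rater 2
        if (table < 0).any():
            raise ValueError("requested (p1, p2, rho) is not feasible")
        return table

    table = correlated_binary_table(0.7, 0.5, 0.6)
    rng = np.random.default_rng(0)
    cells = rng.choice(4, size=20, p=table.ravel())  # one simulated sample of size 20
    y1, y2 = cells // 2, cells % 2                   # decode the two raters' readings
    print(table)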


From the results of tables 1 and 2, all three cases perform very well: the numbers in the first and second columns are very close, and the numbers in the third and fourth columns are very close. This means the estimates are very close to the corresponding theoretical values, and the means of the estimated standard errors are very close to the corresponding standard deviations of the estimates. We can conclude that the proposed indices work well for binary data.


Tables 3 and 4 show the results for the ordinal-data simulation; similarly, the correlations and margins are set in advance to obtain the theoretical values. For both tables the results are similar to those for binary data: the numbers in the first column are close to those in the second column, as are the third and fourth columns. We can say these indices also work well for ordinal data.


The last table in the simulation part gives the results for normal data with transformation. In order to obtain the theoretical values, the authors have to set the precision, accuracy, within-rater precision, and between-rater precision in advance. Notice that most of the means of the estimated standard errors are close to the corresponding standard deviations of the estimates, except for CP inter. Unlike the author's conclusion, I would say Carrasco's method performs better than the CCC here when m=1. Notice also that in the case of k=2 & m=3, the inter-rater agreement calculated by Barnhart's method is slightly larger than the one here; the reason is that Barnhart's method assumes m is infinite. Thus, from the simulation results we can conclude that the indices proposed in this paper work fairly well, both in the estimates and in the corresponding inferences, for binary, ordinal, and normal data.


Example

This paper gives two examples to illustrate the use of the unified approach. The first example concerns diaspirin cross-linked hemoglobin (DCLHb) and the second is an assay validation. In this reading report we discuss the results from the second example. The authors consider the hemagglutinin inhibition (HAI) assay for antibody to Influenza A (H3N2) in rabbit serum samples from two different labs. Serum samples from 64 rabbits are measured twice by each method. Antibody level is classified as negative, positive, or highly positive.


In the paper, tables 7 to 10 show the frequency tables for within-lab and between-lab readings. From tables 9 and 10 we can see that lab two tends to report higher values than lab one, while tables 7 and 8 suggest that the within-lab agreement is good.


Since this is an imprecise assay, the authors allow for looser agreement criteria: agreement is defined as a within-sample total deviation of no more than 50% of the total deviation when observations come from the same method, and of no more than 75% of the total deviation when observations come from different methods. Because the relative within-sample deviation corresponds to √(1 - index), this gives a least acceptable CCC intra of 1 - 0.5² = 0.75 and a least acceptable CCC inter of 1 - 0.75² = 0.4375.


Table 11 shows the results; the 97.5% confidence limit is the one-sided 97.5% lower confidence limit of the corresponding agreement statistic. Looking at the data in this table: precision intra is estimated to be 0.88361, which means that for observations from the same method the within-sample deviation is about 34.1% of the total deviation; CCC inter is estimated to be 0.37225, which means that for the average observations from different methods the within-sample deviation is about 79.2% of the total deviation.
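
These percentages follow from the same relation used for the agreement criteria above, relative within-sample deviation = √(1 - index); a quick check of the quoted values (my own arithmetic):

    import numpy as np

    def relative_within_sample_deviation(index):
        """Relative within-sample deviation implied by a CCC- or precision-type index."""
        return np.sqrt(1.0 - index)

    print(relative_within_sample_deviation(0.88361))   # about 0.341 for precision intra
    print(relative_within_sample_deviation(0.37225))   # about 0.792 for CCC inter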


Conclusion

Measuring agreement between different methods or different raters has received a great deal of attention recently. In this paper the authors propose several indices, including CCC, precision, accuracy, CP, and TDI, and use them to measure intra-rater, inter-rater, and total agreement among all raters. From the simulation part we have seen that these indices work fairly well for binary, ordinal, and normal data. In the HAI assay example, the authors also point out that these indices are ineffective for the agreement between the two labs' readings and suggest that kappa or weighted kappa could be applied to assess the agreement within each lab.


Further Research

We can consider including link functions such as log or logit in the GEE method in order to make the approach more robust to different types of data. In addition, the variance-component functions in this paper are based on balanced data; to handle missing data, these functions could be modified or new ones developed.