
When adjusting for bias due to linkage errors: a sensitivity analysis

Q2014

Tiziana Tuoto

05/06/2014

Joint work with Loredana Di Consiglio


Outline of the talk

1. Motivations

2. Linkage errors and total survey error

3. Methodologies for analyses on linked data

4. A sensitivity analysis

5. Concluding remarks and future works

Adjusting for bias due to linkage errors, Tiziana Tuoto – Vienna, June 5° 2014


Why linking and why linkage errors?

• Integration of different sources (surveys, administrative lists, registers) has acquired a preeminent role

• The huge effort made to link data is not the final aim of the statistical process

• Whatever statistical analysis is performed on integrated data, it should take into account that data resulting from a record linkage process are subject to two types of errors:

1. erroneous acceptance of false links

2. rejection of true matches (missed links)
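The two error types can be made concrete with a small sketch. It assumes that both the declared links and the true matches are available as sets of record pairs, which in practice is only the case for test data; the function name and the toy pairs are hypothetical.

```python
def classify_links(declared, true_matches):
    """Split linkage outcomes into correct links, false links and missed matches."""
    declared = set(declared)
    true_matches = set(true_matches)
    return {
        # pairs that were linked and are true matches
        "correct_links": declared & true_matches,
        # error type 1: pairs accepted as links that are not true matches
        "false_links": declared - true_matches,
        # error type 2: true matches that the procedure did not link
        "missed_matches": true_matches - declared,
    }


# Toy example: three true matches, two declared links (one of them wrong).
truth = {(1, "a"), (2, "b"), (3, "c")}
declared = {(1, "a"), (2, "c")}
result = classify_links(declared, truth)
print(result)
```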


Linkage Errors and Total Survey Error

Biemer 2010


Linkage Errors and Total Survey Error

Zhang 2012


Methodologies for analyses on linked data

• 1965: Neter, Maynes and Ramanathan

• 1993-1997: Scheuren and Winkler

• 2000: Lahiri and Larsen

• 2009: Chambers, Regression analysis of probability-linked data, Official Statistics Research Series, Vol. 4

• 2011: Chipperfield, Bishop and Campbell

Chambers (2009) gives a systematic overview of regression analysis of linked data, describes the approaches developed by Neter et al., Scheuren et al. and Lahiri et al., and proposes his own bias-corrected estimators of regression parameters.


Methodologies for analyses on linked data

These approaches rely on strong assumptions:

• Exchangeable linkage errors model

• Equal size of the linked sets (or the smaller set contained in the larger one)

• Linkage under a 1:1 constraint
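As a rough illustration of the first assumption: under the exchangeable linkage errors model described in Chambers (2009), every record is correctly linked with the same probability, and a wrong link is equally likely to point to any other record. The sketch below builds the corresponding matrix of linkage probabilities for n records. The names `ele_matrix` and `lam` (the probability of a correct link) are notation introduced here, not taken from the slides.

```python
import numpy as np


def ele_matrix(n, lam):
    """Linkage probability matrix under exchangeable linkage errors.

    Entry (i, j) is the assumed probability that record i of the first file
    is linked to record j of the second file: lam on the diagonal and
    (1 - lam) / (n - 1) everywhere else.
    """
    if n < 2:
        raise ValueError("at least two records are needed")
    off_diagonal = (1.0 - lam) / (n - 1)
    T = np.full((n, n), off_diagonal)
    np.fill_diagonal(T, lam)
    return T


T = ele_matrix(5, 0.9)
print(T.sum(axis=1))  # every row is a probability distribution over candidate links
```

Equal file sizes and 1:1 linkage are what make a single square matrix of this form meaningful, which is why the three assumptions usually appear together.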


A sensitivity analysis

Winkler (2014) notes:

• «Scheuren and Winkler (1997) observed that, if linkage error is below 1%, then can perform statistical analysis without adjustment.

• Most ‘good’ matching situations have overall linkage error above 10%.

• Even ‘high match scores’ sets of pairs may have linkage error in range 1-5%.

• The current models may adjust the ‘observed’ matched pairs to having linkage error down from 10% to 7.5%»


Experimental data


Random sample of 1000 units from the fictitious population census data in the ESSnet DI (2011).

Linear model (as in Chambers, 2009): Y = Xβ + ε

with X ~ [1, Uniform(0,1)], β = [1, 5]

ε ~ Norm(0,1)

Logistic model: X~Bernoulli(0.75)

Y~Multinom(0.7,0.05,0.2,0.05) dependent on X.

Two lists L1 and L2 were generated

L1 = [Xs, 942 units]

L2 = [Ys, 921 units]

Units in common (the true matches): 868; true non-matches: 127
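The list sizes above are internally consistent: 942 - 868 = 74 units appear only in L1, 921 - 868 = 53 only in L2, and 74 + 53 = 127 is the stated number of true non-matches. A minimal sketch of such a setup follows. It assumes the linear model with coefficients [1, 5] and standard normal errors, draws the overlap at random, and uses hypothetical names throughout; how the two lists were actually derived from the ESSnet DI data is not described in the slides.

```python
import numpy as np


def make_lists(n_units=1000, n_l1=942, n_l2=921, n_common=868, seed=0):
    """Build two overlapping lists: L1 carries x, L2 carries y (keyed by unit id)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n_units)
    y = 1.0 + 5.0 * x + rng.normal(0.0, 1.0, n_units)  # assumed linear model

    ids = rng.permutation(n_units)
    only_l1 = n_l1 - n_common  # units present in L1 only
    only_l2 = n_l2 - n_common  # units present in L2 only
    common = ids[:n_common]
    l1_ids = np.concatenate([common, ids[n_common:n_common + only_l1]])
    l2_ids = np.concatenate(
        [common, ids[n_common + only_l1:n_common + only_l1 + only_l2]]
    )

    L1 = {int(i): float(x[i]) for i in l1_ids}
    L2 = {int(i): float(y[i]) for i in l2_ids}
    return L1, L2


L1, L2 = make_lists()
true_matches = set(L1) & set(L2)
true_non_matches = set(L1) ^ set(L2)
print(len(L1), len(L2), len(true_matches), len(true_non_matches))
```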


The three Linkage scenarios

Probabilistic record linkage procedures (Fellegi and Sunter, 1969) with the software RELAIS (2011).

• Gold scenario: Name, Surname, Complete date of birth

• Silver scenario: Name, Surname, Year of Birth

• Bronze scenario: Day of birth, Month of birth, Address.

Scenario   Declared matches   False matches in declared   Prob. missing true matches   Prob. false matches   False match rate
Gold       826                0                           0.048                        0                     0
Silver     752                11                          0.146                        0.087                 0.015
Bronze     786                30                          0.129                        0.236                 0.038

Table 1 – Results of linkage procedures for the three Scenarios
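The three rates in Table 1 can be recomputed from the counts. The denominators used below (868 true matches, 127 true non-matches, and the number of declared matches) are inferred from the arithmetic rather than stated on the slide, and the function and key names are hypothetical.

```python
TRUE_MATCHES = 868
TRUE_NON_MATCHES = 127

# scenario -> (declared matches, false matches among the declared ones)
SCENARIOS = {"Gold": (826, 0), "Silver": (752, 11), "Bronze": (786, 30)}


def linkage_rates(declared, false_in_declared):
    """Recompute the error rates of Table 1 from the raw counts."""
    correct = declared - false_in_declared
    return {
        "p_missing_true_matches": (TRUE_MATCHES - correct) / TRUE_MATCHES,
        "p_false_matches": false_in_declared / TRUE_NON_MATCHES,
        "false_match_rate": false_in_declared / declared,
    }


rates = {name: linkage_rates(*counts) for name, counts in SCENARIOS.items()}
for name, r in rates.items():
    print(name, {key: round(value, 3) for key, value in r.items()})
```

Rounded to three decimals, these values agree with the table for all three scenarios.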


Linear Model – Naïve estimator and linkage-error bias-adjusted estimators

Linkage scenario   Estimator                     Beta (intercept - slope)   Standard error (intercept - slope)
Population         True value                    0.886 - 5.155              0.064 - 0.112
Perfect linkage    Naïve                         0.907 - 5.093              0.069 - 0.121
Gold linkage       Naïve                         0.927 - 5.085              0.071 - 0.123
Silver linkage     Naïve                         0.988 - 4.976              0.079 - 0.138
Silver linkage     Ratio – ModOLS – Predictive   0.952 - 5.050              0.080 - 0.141
Silver linkage     Eb_CUE                        0.949 - 5.055              0.080 - 0.141
Bronze linkage     Naïve                         1.045 - 4.876              0.078 - 0.135
Bronze linkage     Ratio – ModOLS – Predictive   0.949 - 5.070              0.081 - 0.144
Bronze linkage     Eb_CUE                        0.947 - 5.075              0.081 - 0.144
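To show the mechanism behind a naïve versus an adjusted estimate, the sketch below simulates wrongly linked responses and compares ordinary least squares with a ratio-type correction of the form (X'TX)^-1 X'y*, in the spirit of the estimators reviewed in Chambers (2009). It is an illustration under stated assumptions, not the computation behind the table: linkage errors are exchangeable, the probability of a correct link (`lam`, a name introduced here) is known, the files have equal size, and missed matches are ignored. The sample size and error rate are arbitrary.

```python
import numpy as np


def simulate_linked_sample(n=2000, lam=0.8, seed=0):
    """Return the design matrix X and linked responses y_star.

    A share 1 - lam of the records is wrongly linked.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n)
    y = 1.0 + 5.0 * x + rng.normal(0.0, 1.0, n)
    # Choose the wrongly linked records and rotate their responses, so that
    # each of them receives the response of a different record (1:1 linkage).
    wrong = rng.choice(n, size=int(round((1.0 - lam) * n)), replace=False)
    y_star = y.copy()
    y_star[wrong] = y[np.roll(wrong, 1)]
    X = np.column_stack([np.ones(n), x])
    return X, y_star


def naive_ols(X, y_star):
    """OLS on the linked data, ignoring linkage errors."""
    return np.linalg.solve(X.T @ X, X.T @ y_star)


def ratio_adjusted(X, y_star, lam):
    """Ratio-type correction using the exchangeable linkage probability matrix T.

    T = (lam - g) * I + g * 11' with g = (1 - lam) / (n - 1), so X'TX can be
    computed without forming the n x n matrix.
    """
    n = X.shape[0]
    g = (1.0 - lam) / (n - 1)
    col_sums = X.sum(axis=0)
    XtTX = (lam - g) * (X.T @ X) + g * np.outer(col_sums, col_sums)
    return np.linalg.solve(XtTX, X.T @ y_star)


X, y_star = simulate_linked_sample()
beta_naive = naive_ols(X, y_star)
beta_adjusted = ratio_adjusted(X, y_star, lam=0.8)
print("naive   :", beta_naive)
print("adjusted:", beta_adjusted)
```

With 20% wrong links the naïve slope is pulled towards zero, because wrongly linked responses are unrelated to x, while the adjusted slope should land much closer to 5.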


Logistic Model – Naive and Adjusted estimators


Linkage scenario   Estimator       Beta     Standard error
Population         True value      -1.680   0.087
Perfect linkage    Naïve           -1.744   0.096
Gold linkage       Naïve           -1.762   0.100
Silver linkage     Naïve           -1.795   0.106
Silver linkage     Est. Equ. ML    -1.798   0.106
Silver linkage     LL              -1.803   0.107
Silver linkage     Est. Equ. Ch.   -1.817   0.107
Bronze linkage     Naïve           -1.734   0.101
Bronze linkage     Est. Equ. ML    -1.741   0.102
Bronze linkage     LL              -1.755   0.102
Bronze linkage     Est. Equ. Ch.   -1.789   0.104


Remarks

• Missed matches must be accounted for in order to completely remove the effect of linkage errors on the bias of the estimates.

• The naïve estimators under perfect linkage and under the Gold scenario are still biased because of missed true matches.

• Likewise, in the logistic regression the naïve estimate under the Bronze scenario is less biased because the missed-matches component is lower there than in the other scenarios.

• The bias correction is effective in the linear case (achieving a bias reduction of about 10% for the Silver scenario and a higher one for the Bronze scenario), but more work is needed for the logistic case, where the naïve estimator performs slightly better.


Future works

• Further work to investigate the effects of linkage errors on the variability component.

• Further analyses to assess the trade-off between adjusting for bias and the expected increase in variance.

• A more flexible framework, as in Chipperfield et al. (2011), where exchangeability of linkage errors is not required and missed matches are explicitly considered.

• Finally, the probability of being correctly linked and the probability of erroneously missing a match are assumed here to be known, whereas evaluating linkage errors is not a straightforward task.


Bibliography

Biemer (2010). Total survey error: design, implementation, and evaluation. Public Opinion Quarterly, Vol. 74, No. 5.

Chambers, R. (2009). Regression analysis of probability-linked data. Official Statistics Research Series, Vol. 4.

Chipperfield, J. O., Bishop, G. R. and Campbell, P. (2011). Maximum likelihood estimation for contingency tables and logistic regression with incorrectly linked data. Survey Methodology, Vol. 37, No. 1.

Fellegi, I. P. and Sunter, A. B. (1969). A theory for record linkage. Journal of the American Statistical Association, 64, 1183-1210.

Lahiri, P. and Larsen, M. D. (2000). Model based analysis of records linked using mixture models. Proc. of the Section on Survey Research Methods, ASA, 11-19.

Lahiri, P. and Larsen, M. D. (2005). Regression analysis with linked data. Journal of the American Statistical Association, 100, 222-230.

McLeod, Heasman and Forbes (2011). Simulated data for the on the job training. ESSnet DI. http://www.cros-portal.eu/content/job-training


Neter, J., Maynes, S. and Ramanathan, R. (1965). The effect of mismatching on the measurement of response errors. Journal of the American Statistical Association.

RELAIS (2011). User’s guide version 2.2. Available at http://joinup.ec.europa.eu/software/relais/release/22

Scheuren, F. and Winkler, W. E. (1993). Regression analysis of data files that are computer matched. Survey Methodology, 39-58.

Scheuren, F. and Winkler, W. E. (1997). Regression analysis of data files that are computer matched, part II. Survey Methodology, 157-165.

Winkler, W. E. (2014). Quality and Analysis of National Files - Computational Methods for Censuses and Surveys. Presentation, January 9, 2014.

Zhang, L.-C. (2012). Topics of statistical theory for register-based statistics and data integration. Statistica Neerlandica, 66.
