imputation for multi care data

Imputation for Multi Care Data Naren Meadem

Post on 06-Jan-2016


Page 1: Imputation for Multi Care Data

Imputation for Multi Care Data

Naren Meadem

Page 2: Imputation for Multi Care Data
Page 3: Imputation for Multi Care Data

Introduction

• What is certain in life?
– Death
– Taxes

• What is certain in research?
– Measurement error
– Missing data

• Missing data can be:
– Due to preventable errors, mistakes, or lack of foresight by the researcher
– Due to problems outside the control of the researcher
– Deliberate, intended, or planned by the researcher to reduce cost or respondent burden
– Due to differential applicability of some items to subsets of respondents
– Etc.

Page 4: Imputation for Multi Care Data

Missing Data Mechanisms (1)

• Preliminaries:
– Yobs: The non-missing or observed data
– Ymiss: The missing or unobserved data
– M: Whether the data on a given item for a given case is missing (1) or not (0)

• Missing Completely at Random (MCAR)
– The probability that an item is missing (M) is unrelated to either the observed (Yobs) or the unobserved (Ymiss) data

• Missing at Random (MAR)
– The probability that an item is missing (M) may be related to the observed data (Yobs) but is unrelated to the unobserved data (Ymiss)

• Missing Not at Random (MNAR)
– The probability that an item is missing (M) is related to the (unknown) value of the unobserved data (Ymiss), even after conditioning on the observed data (Yobs)
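The three mechanisms can be made concrete with a small simulation. The sketch below is illustrative only and is not taken from the slides: the variables `age` and `income`, the probabilities, and the thresholds are all assumptions chosen so that the difference between the mechanisms is visible. `age` plays the role of Yobs (always observed), `income` is the item that may go missing, and the returned list is M.

```python
import random

def simulate(n=5000, seed=0):
    """Generate hypothetical (age, income) rows where income rises with age."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(20, 80)
        income = 20 + 0.8 * age + rng.gauss(0, 5)
        rows.append((age, income))
    return rows, rng

def missing_indicator(rows, mechanism, rng):
    """Return M for each row: 1 if income is missing, 0 if it is observed."""
    mask = []
    for age, income in rows:
        if mechanism == "MCAR":
            p = 0.3                           # unrelated to age or income
        elif mechanism == "MAR":
            p = 0.5 if age > 50 else 0.1      # depends only on observed age
        elif mechanism == "MNAR":
            p = 0.5 if income > 60 else 0.1   # depends on the unobserved value itself
        else:
            raise ValueError("unknown mechanism: " + mechanism)
        mask.append(1 if rng.random() < p else 0)
    return mask

def observed_mean(rows, mask):
    """Mean income over the cases where income was observed (M == 0)."""
    kept = [income for (_, income), m in zip(rows, mask) if m == 0]
    return sum(kept) / len(kept)
```

Comparing `observed_mean` under each mechanism with the mean of the full data shows the practical difference: under MCAR the observed mean stays close to the true mean, while under MAR and MNAR it drifts. Under MAR the drift can in principle be corrected using `age`; under MNAR it cannot be corrected from the observed data alone.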

Page 5: Imputation for Multi Care Data

Missing Data in Research Studies

• Missing data mechanism
– Missing completely at random (MCAR): Ignorable
– Missing at random (MAR): Conditionally ignorable
– Missing not at random (MNAR): Nonignorable

• Amount of missing data
– Percent of cases with missing data
– Percent of variables having missing data
– Percent of data values that are missing

• Pattern of missing data
– Missing by design
– Missing data patterns: Univariate, Monotonic, File matching, General
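The three "amount of missing data" measures can give quite different numbers for the same dataset. A minimal sketch, assuming the data are held as a list of rows with `None` marking a missing value (the function name `missing_summary` and the example rows are hypothetical, not from the slides):

```python
def missing_summary(rows):
    """Return the percent of cases, variables, and values that have missing data."""
    n_rows = len(rows)
    n_cols = len(rows[0])
    # A case (row) counts if at least one of its values is missing.
    cases = sum(1 for r in rows if any(v is None for v in r))
    # A variable (column) counts if at least one case is missing it.
    variables = sum(
        1 for j in range(n_cols) if any(r[j] is None for r in rows)
    )
    # Individual cells that are missing.
    values = sum(1 for r in rows for v in r if v is None)
    return {
        "cases": 100.0 * cases / n_rows,
        "variables": 100.0 * variables / n_cols,
        "values": 100.0 * values / (n_rows * n_cols),
    }

example = [
    [1, 2, 3],
    [4, None, 6],
    [7, 8, 9],
    [10, None, None],
]
summary = missing_summary(example)
```

For this example, half of the cases and two of the three variables are affected, even though only a quarter of the individual values are missing.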

Page 6: Imputation for Multi Care Data

Newer Missing Data Treatments

• Modern state-of-the-art missing data treatments for MAR data
– Maximum likelihood
– Multiple imputation

• Cutting edge investigational missing data treatments for MNAR data
– Pattern mixture models
– Selection models
– Shared parameter models
– Inverse probability weighting

Page 7: Imputation for Multi Care Data

Clustering methods: Mean substitution

Substitute the mean of the variable for the missing values
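A minimal sketch of mean substitution, assuming a single variable stored as a list with `None` for missing entries (the function name and the example values are illustrative):

```python
import statistics

def mean_substitute(values):
    """Replace each missing entry (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

raw = [2.0, 4.0, None, 6.0, None, 8.0]
observed = [v for v in raw if v is not None]
filled = mean_substitute(raw)

# The mean is preserved, but the spread is understated after filling.
var_observed = statistics.variance(observed)
var_filled = statistics.variance(filled)
```

The filled data keep the same mean as the observed data, but `var_filled` is smaller than `var_observed` because every imputed value sits exactly at the center of the distribution. This shrinkage of variance is one reason mean substitution is generally considered a weak treatment.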

Page 8: Imputation for Multi Care Data

Graphical illustration

Page 9: Imputation for Multi Care Data

Better methods of handling missing data

• Full information maximum likelihood methods
– Can handle data that are MAR and NI (nonignorable)
– Special consideration required for NI data
– Implemented as part of hierarchical linear modeling and structural equation modeling
– Missing data handled during analysis

• Multiple imputation
– Can also handle data that are MAR and NI
– Special consideration required for NI data
– Simulation-based approach
– Missing data are handled separately from analysis

Page 10: Imputation for Multi Care Data

Multiple imputation

Three steps:

1. Generate multiple completed datasets (imputations) through simulation (typically only 5–10 are needed)

2. Perform the analysis on each imputed dataset

3. Combine the results of the multiple analyses using a set of special rules (Rubin’s (1987) rules)
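The three steps can be sketched end to end for the simplest possible analysis, estimating the mean of one variable. This is a toy illustration under stated assumptions, not the procedure used for the results in these slides: the imputation model here is a bootstrap hot-deck (resample the observed values, then draw donors from the resample), and all function names are hypothetical. The pooling step follows Rubin's rules: the pooled estimate is the average of the per-imputation estimates, and the total variance is T = W + (1 + 1/m) B, where W is the average within-imputation variance and B is the between-imputation variance.

```python
import random
import statistics

def impute_once(values, rng):
    """Step 1: build one completed dataset by drawing donors for missing entries."""
    observed = [v for v in values if v is not None]
    # Bootstrap the observed values so imputations differ between datasets.
    donors = [rng.choice(observed) for _ in observed]
    return [rng.choice(donors) if v is None else v for v in values]

def analyze(completed):
    """Step 2: estimate the mean and its squared standard error."""
    estimate = statistics.mean(completed)
    variance = statistics.variance(completed) / len(completed)
    return estimate, variance

def pool_rubin(estimates, variances):
    """Step 3: combine per-imputation results with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)      # pooled point estimate
    w = statistics.mean(variances)          # within-imputation variance
    b = statistics.variance(estimates)      # between-imputation variance
    total = w + (1 + 1 / m) * b             # total variance T
    return q_bar, total

def multiple_imputation(values, m=5, seed=0):
    """Run all three steps; requires m >= 2 so that B can be computed."""
    rng = random.Random(seed)
    results = [analyze(impute_once(values, rng)) for _ in range(m)]
    estimates = [e for e, _ in results]
    variances = [v for _, v in results]
    return pool_rubin(estimates, variances)
```

Calling `multiple_imputation(data, m=5)` on a list with `None` entries returns the pooled mean and its total variance. A real analysis would replace `impute_once` with a model that uses the other observed variables, which is what makes multiple imputation appropriate for MAR data.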

Page 11: Imputation for Multi Care Data
Page 12: Imputation for Multi Care Data

Results

AUC by classifier, with and without imputation:

                 Naive Bayes    Logistic Regression    SVM
No Imputation    0.6362         0.6025                 0.635
Imputation       0.6377         0.6033                 0.649
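For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The slides do not say how the AUC values were computed, so the following is only a generic sketch of that definition, with a hypothetical function name and made-up example scores:

```python
def auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In this example three of the four positive-negative pairs are ordered correctly, so `example_auc` is 0.75. A value of 0.5 corresponds to random ranking.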

Page 13: Imputation for Multi Care Data

Conclusions

• When you have missing data, think about WHY they are missing
– Ask yourself whether you have observed variables that could explain why the data are missing
• Missing data handled improperly can bias your conclusions
• Multiple imputation is one good way of handling missing data

Caveats:

• Multiple imputation is complex
• An evolving field
• The standards of reporting the results from imputed data are not well-established
• If you need to do it (especially if you think your data are NI), read the source papers I referenced at the beginning of the slides

Page 14: Imputation for Multi Care Data

Questions?