On the presence of missing values
JENA GRADUATE ACADEMY Dr. Friedrich Funke
Learning objectives
• What are missing values?
• How do I basically treat missing data?
• Why are data missing?
• How do I detect (the systematics of) missingness?
• How do I treat missing data? (revisited)
Basic Types of Missing Values
• Unit nonresponse (drop-out, attrition etc.)
• Item nonresponse
• Missing values by design
Something is Missing - Why worry?
• Missing values are almost everywhere
• Inefficiency (lack of power)
• Bias of estimation (!!!)
Missing value analysis can support our understanding of the data!
Missing value management (examples)
Deletion
• Listwise deletion (complete cases analysis)
• Pairwise deletion (available data analysis)
• Both are unwise deletion
Imputation
• Mean imputation
• Conditional mean (regression)
• Hot deck/cold deck
• Maximum likelihood (EM, FIML)
• Multiple imputation
Deletion
Listwise Deletion
• Most common way of dealing with missing data (applied implicitly in SPSS)
• Conservative: »At least I do nothing wrong«
• Can result in zero cases
Pairwise deletion
• Estimate each moment with all available non-missing cases
• Appears to use all information in data
• Covariance matrices can become non-positive-definite
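The difference between the two deletion strategies can be sketched in a few lines of plain Python. The toy records and the helper names (`listwise`, `pairwise_pairs`, `pearson`) are made up for illustration and are not taken from the slides.

```python
from math import sqrt

# Hypothetical toy records (x, y, z); None marks a missing value.
rows = [
    (1.0, 2.0, None),
    (2.0, None, 1.0),
    (3.0, 4.0, 2.0),
    (4.0, 5.0, None),
    (5.0, 7.0, 6.0),
    (None, 8.0, 5.0),
]

def listwise(rows):
    """Listwise deletion: keep only records without any missing value."""
    return [r for r in rows if None not in r]

def pairwise_pairs(rows, i, j):
    """Pairwise deletion: for variables i and j, keep every record
    in which both of them are observed."""
    return [(r[i], r[j]) for r in rows if r[i] is not None and r[j] is not None]

def pearson(pairs):
    """Pearson correlation of a list of (a, b) pairs."""
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    sab = sum((a - ma) * (b - mb) for a, b in pairs)
    saa = sum((a - ma) ** 2 for a, _ in pairs)
    sbb = sum((b - mb) ** 2 for _, b in pairs)
    return sab / sqrt(saa * sbb)

n_listwise = len(listwise(rows))
n_pairwise = {
    (i, j): len(pairwise_pairs(rows, i, j))
    for i in range(3) for j in range(i + 1, 3)
}
r_xy = pearson(pairwise_pairs(rows, 0, 1))

print("listwise N:", n_listwise)
print("pairwise N per variable pair:", n_pairwise)
```

Each pairwise correlation rests on a different subsample with a different N. That is why a matrix assembled from such correlations need not be positive definite.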
Mean imputation
• [Text to be redone]
Regression imputation
[Figure: two scatterplots against x_completeSD1: the complete dependent variable (r = .60) and the regression-imputed variable MCAR10_reg]
• Actually a form of conditional mean imputation
• Very elegant if you add residuals (stochastic regression imputation: random residuals with mean 0 and variance equal to the residual variance)
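A minimal sketch of stochastic regression imputation for one predictor, assuming a simple linear regression of Y on X fitted to the complete cases and normally distributed residuals. The function name, the toy data and the normality assumption are illustrative choices, not part of the slides.

```python
import random
from math import sqrt

def stochastic_regression_impute(x, y, seed=0):
    """Replace missing y values (None) by the regression prediction
    plus a random residual with mean 0 and the residual variance."""
    rng = random.Random(seed)
    obs = [(a, b) for a, b in zip(x, y) if b is not None]
    n = len(obs)
    mx = sum(a for a, _ in obs) / n
    my = sum(b for _, b in obs) / n
    sxx = sum((a - mx) ** 2 for a, _ in obs)
    slope = sum((a - mx) * (b - my) for a, b in obs) / sxx
    intercept = my - slope * mx
    # Residual standard deviation of the complete cases (n - 2 degrees of freedom).
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in obs)
    resid_sd = sqrt(sse / (n - 2))
    return [
        b if b is not None else intercept + slope * a + rng.gauss(0.0, resid_sd)
        for a, b in zip(x, y)
    ]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, None, 2.9, 4.1, None, 6.2]
y_imputed = stochastic_regression_impute(x, y)
print(y_imputed)
```

Dropping the `rng.gauss(...)` term gives plain conditional mean imputation. All imputed points then lie exactly on the regression line, which understates the variance of Y.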
Hot deck imputation
• fills in missing values on incomplete records using values from similar, but complete records of the same dataset (hot deck of punch cards)
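One simple variant of this idea is a nearest-neighbour hot deck, sketched below. The similarity measure (squared Euclidean distance on the matching variables), the function name and the toy records are assumptions made for illustration. Real hot-deck procedures often draw a random donor from a group of similar records instead.

```python
def hot_deck_impute(records, target, match_on):
    """Fill missing `target` values with the value of the most similar
    complete record (the donor) from the same dataset."""
    donors = [r for r in records if r[target] is not None]
    result = []
    for r in records:
        if r[target] is None:
            # Donor = complete record with the smallest squared distance
            # on the matching variables.
            donor = min(
                donors,
                key=lambda d: sum((d[k] - r[k]) ** 2 for k in match_on),
            )
            r = {**r, target: donor[target]}
        result.append(r)
    return result

# Hypothetical toy data; None marks a missing income.
records = [
    {"age": 25, "education": 12, "income": 2100},
    {"age": 27, "education": 13, "income": None},
    {"age": 52, "education": 18, "income": 4800},
    {"age": 50, "education": 17, "income": None},
]
filled = hot_deck_impute(records, "income", ["age", "education"])
print([r["income"] for r in filled])
```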
Cold deck imputation
• fills in missing values on incomplete records using values from similar, but complete records of an external dataset
• e.g. historical imputation
Maximum Likelihood Approaches
• Simple idea, but computationally complex
• Loosely speaking, for a fixed set of data and underlying probability model, maximum likelihood picks the values of the model parameters that make the data "more likely" than any other values of the parameters would make them.
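The principle can be illustrated on complete data with a normal model, where the maximum likelihood estimates have a closed form. This sketch shows only the idea of "making the data most likely". It does not implement EM or FIML for incomplete data, and the data values are made up.

```python
from math import log, pi

def normal_loglik(data, mu, sigma2):
    """Log-likelihood of independent normal observations."""
    n = len(data)
    return (-0.5 * n * log(2 * pi * sigma2)
            - sum((v - mu) ** 2 for v in data) / (2 * sigma2))

data = [2.1, 3.4, 2.9, 3.8, 3.1]

# Closed-form ML estimates for the normal model.
mu_hat = sum(data) / len(data)
sigma2_hat = sum((v - mu_hat) ** 2 for v in data) / len(data)  # ML divides by n, not n - 1

best = normal_loglik(data, mu_hat, sigma2_hat)

# No other parameter values on this small grid make the data more likely.
grid = [(m / 10, s / 10) for m in range(20, 41) for s in range(1, 21)]
others = [normal_loglik(data, m, s) for m, s in grid]
print(best, max(others))
```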
Multiple Imputation
• Combination of several random imputations and integration
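The integration step is commonly done with Rubin's rules, which the slide does not name explicitly. The sketch below assumes that each of the m imputed datasets has already been analysed and has yielded a parameter estimate and its squared standard error. The function name and the numbers are illustrative.

```python
def pool_rubin(estimates, variances):
    """Pool m estimates and their sampling variances with Rubin's rules."""
    m = len(estimates)
    q_bar = sum(estimates) / m                      # pooled point estimate
    within = sum(variances) / m                     # average within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = within + (1 + 1 / m) * between          # total variance of the pooled estimate
    return q_bar, total

# Hypothetical results from m = 3 imputed datasets.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total_var)
```

The between-imputation component carries the uncertainty that stems from not knowing the missing values. A single imputation cannot represent it.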
Learning objectives
• What are missing values?
• How do I basically treat missing data?
• Why are data missing?
• How do I detect (the systematics of) missingness?
• How do I treat missing data? (revisited)
Missingness is a probabilistic phenomenon
Dataset (data matrix) X, with "." marking a missing value:
3 . 4 .
. 3 5 4
. . 4 3
2 3 . 2
3 3 2 .
4 . 2 2
MV »mechanism« (indicator matrix) R:
1 0 1 0
0 1 1 1
0 0 1 1
1 1 0 1
1 1 1 0
1 0 1 1
R = 1 if observed, 0 if missing
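The indicator matrix can be derived mechanically from the data matrix. The sketch below uses the values from the slide. The positions of the missing cells are read off the indicator matrix R, and `None` is an arbitrary choice of missing-value marker.

```python
# Data matrix X from the slide; None marks a missing value.
X = [
    [3, None, 4, None],
    [None, 3, 5, 4],
    [None, None, 4, 3],
    [2, 3, None, 2],
    [3, 3, 2, None],
    [4, None, 2, 2],
]

# Indicator matrix R: 1 if observed, 0 if missing.
R = [[0 if v is None else 1 for v in row] for row in X]

# Share of missing values per variable (column).
missing_share = [1 - sum(col) / len(R) for col in zip(*R)]

for row in R:
    print(*row)
print(missing_share)
```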
Typology of missingness distributions
• MCAR: Missing completely at random, $P(R \mid Y_{com}) = P(R)$ (eq. 1)
• MAR: Missing at random, $P(R \mid Y_{com}) = P(R \mid Y_{obs})$ (eq. 2)
• MNAR: Missing not at random, non-ignorable (eq. 1 and 2 are violated; missingness depends on the missing values themselves)
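The three mechanisms can be made concrete with a small simulation. In this sketch X is always observed and Y is deleted with a probability that is constant (MCAR), depends on X (MAR) or depends on Y itself (MNAR). The logistic form of the deletion probabilities, the sample size and all names are assumptions chosen for illustration.

```python
import random
from math import exp

def simulate(mechanism, n=5000, seed=1):
    """Return (x, y, r) triples; r = 1 if y is observed, 0 if missing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 0.6 * x + 0.8 * rng.gauss(0, 1)  # corr(x, y) is about .60
        if mechanism == "MCAR":
            p_missing = 0.3                    # independent of the data
        elif mechanism == "MAR":
            p_missing = 1 / (1 + exp(-2 * x))  # depends on observed X only
        elif mechanism == "MNAR":
            p_missing = 1 / (1 + exp(-2 * y))  # depends on Y itself
        else:
            raise ValueError(mechanism)
        r = 0 if rng.random() < p_missing else 1
        rows.append((x, y, r))
    return rows

def mean_observed_y(rows):
    """Mean of y over the cases in which y is observed."""
    observed = [y for _, y, r in rows if r == 1]
    return sum(observed) / len(observed)

for mech in ("MCAR", "MAR", "MNAR"):
    print(mech, round(mean_observed_y(simulate(mech)), 2))
```

The true mean of Y is 0. Under MCAR the observed mean stays close to it, while under MAR and MNAR it is shifted downwards. Under MAR the shift can be corrected with the help of X. Under MNAR it cannot be corrected from the observed data alone.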
Typology of missingness distributions
• X: completely observed
• Y: variable with some missing values
• R: missingness
• missingness »mechanism«
[Path diagram of X, Y and R]
MCAR: Missingness is independent of the empirical data
Typology of missingness distributions
• X: completely observed
• Y: variable with some missing values
• R: missingness
• missingness »mechanism«
[Path diagram of X, Y and R]
MAR: Missingness is related to observed data
Typology of missingness distributions
• X: completely observed
• Y: variable with some missing values
• R: missingness
• missingness »mechanism«
[Path diagram of X, Y and R]
MNAR: Missingness is related to missing data as well
Typology of missingness distributions
• MCAR: Missing completely at random
• MAR: Missing at random
• MNAR: Missing not at random (non-ignorable)
[Path diagrams of X, Y and R, one per mechanism]
Examples for MNAR
• We are interested in income, but managers refuse to answer
• We are interested in prejudice, but the racists skip that scale
• We are interested in depression scores, but the depressed are too tired to complete the questionnaire
[Path diagram of X, Y and R]
Now you can answer the question:
• Does this rule of thumb make sense?
• "If up to 5% of my data are missing, I don't have a problem. If 50% are missing, I am lost."
NO! The amount of missingness is much less important than the reason for missingness!
Example: Perfect situation
Amount of missingness: 10% missing vs. 90% missing
[Figure: scatterplots of the complete dependent variable against x_completeSD1 (r = .60), with cases marked as present or missing by the MV indicator; one panel for MCAR 10%, one for MCAR 90%]
Mechanism of missingness
MAR
• Missingness depends mainly on X
• Solvable
MNAR/NI
• Missingness depends mainly on Y
• Big trouble ahead
Mechanism of missingness
r: .60 .60 .66 .59 .58
N: 1000 904 93 749 768
Biased Median
Imputation with MCAR
Although 90% are missing, model-based imputation can reproduce the data. Even under MCAR, mean imputation is evil!
r: .60 0.66 0.94 0.2 0.65
N: 1000 93 1000 1000 1000
Imputation with MAR
Under MAR, model-based imputation can reproduce the data. Mean imputation is evil!
r: 0.59 0.49 0.63 0.60 0.66
N: 749 1000 1000 1000 1000
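Why mean imputation distorts the correlation even under MCAR can be checked with a short simulation. It loosely follows the setup of the slides (N = 1000, r of about .60, 90% MCAR missingness in the dependent variable). The data are freshly simulated, so the numbers will not match the slide tables, and all names are illustrative.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(42)
n = 1000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.6 * a + 0.8 * rng.gauss(0, 1) for a in x]    # population r is .60
observed = [rng.random() >= 0.9 for _ in range(n)]  # MCAR: about 10% stay observed

# Complete-case (listwise) analysis: unbiased under MCAR, but small N.
x_cc = [a for a, o in zip(x, observed) if o]
y_cc = [b for b, o in zip(y, observed) if o]
r_complete_cases = pearson(x_cc, y_cc)

# Mean imputation: every missing y is replaced by the observed mean.
mean_y = sum(y_cc) / len(y_cc)
y_mean_imputed = [b if o else mean_y for b, o in zip(y, observed)]
r_mean_imputed = pearson(x, y_mean_imputed)

print("complete cases: N =", len(x_cc), " r =", round(r_complete_cases, 2))
print("mean imputation: N =", n, " r =", round(r_mean_imputed, 2))
```

The imputed cases all sit on a horizontal line at the mean of Y. They add nothing to the covariance but count fully towards N, so the correlation is pulled towards zero.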
Take home message
• Missing values not only decrease efficiency/power
• They can (severely) bias the parameter estimates
• Listwise and especially pairwise deletion are unwise deletion
• Naïve unconditional imputation is evil
• Understand the missingness »mechanism«
• Under MCAR: relax
• Under MAR: model-based imputation is no alchemy
Best Practice - PREVENTION