On the presence of missing values

JENA GRADUATE ACADEMY Dr. Friedrich Funke

Upload: haile

Post on 16-Jan-2016



Page 2

Learning objectives

• What are missing values?

• How do I treat missing data in principle?

• Why are data missing?

• How do I detect (the systematics of) missingness?

• How do I treat missing data? (revisited)

Page 3

Basic Types of Missing Values

• Unit nonresponse (drop-out, attrition, etc.)
• Item nonresponse
• Missing values by design
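A minimal Python sketch of how unit and item nonresponse show up in the data: each record is a list, `None` is an assumed missing-value marker, and `nonresponse_type` is a hypothetical helper. Missing values by design cannot be detected this way; they are known from the study design, not from the pattern of gaps.

```python
from typing import Optional, Sequence

# One record = one respondent's item responses; None marks a missing value.
Record = Sequence[Optional[float]]


def nonresponse_type(record: Record) -> str:
    """Classify a record by its pattern of missing values (illustrative helper)."""
    n_missing = sum(value is None for value in record)
    if n_missing == 0:
        return "complete"
    if n_missing == len(record):
        return "unit nonresponse"  # nothing answered at all (e.g. drop-out)
    return "item nonresponse"      # some items skipped


survey = [
    [3, 4, 2],           # answered everything
    [3, None, 2],        # skipped one item
    [None, None, None],  # did not take part in this wave
]
print([nonresponse_type(r) for r in survey])
# ['complete', 'item nonresponse', 'unit nonresponse']
```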

Page 4

Something is Missing - Why worry?

• Missing values are almost everywhere
• Inefficiency (lack of power)
• Biased estimates (!!!)

Missing value analysis can support our understanding of the data!

Page 5

Missing value management (examples)

Deletion

• Listwise deletion (complete cases analysis)

• Pairwise deletion (available data analysis)

• Both are unwise deletion

Imputation

• Mean imputation
• Conditional mean (regression)
• Hot deck/cold deck
• Maximum likelihood (EM, FIML)
• Multiple imputation

Page 6

Deletion

Listwise Deletion

• Most common way of dealing with missing data

• Applied implicitly in SPSS

• Conservative: »At least I do nothing wrong«

• Can result in zero cases

Pairwise deletion

• Estimate each moment with all available non-missing cases

• Appears to use all information in the data

• Covariance matrices can become non-positive-definite
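The two deletion strategies can be contrasted in a small, deliberately constructed Python sketch (stdlib only; the data and helper names are hypothetical). Every row lacks one variable, so listwise deletion leaves no cases at all. Pairwise deletion estimates each covariance from a different subset of rows, and the resulting matrix implies a correlation above 1 and has a negative determinant, so it is not positive semi-definite and cannot be a valid covariance matrix.

```python
import math
from statistics import mean

nan = math.nan

# Hypothetical data: every row lacks exactly one of the three variables.
data = {
    "x1": [1, 2, 3, nan, nan, nan, 1, 2, 3],
    "x2": [1, 2, 3, 1, 2, 3, nan, nan, nan],
    "x3": [nan, nan, nan, 1, 2, 3, 3, 2, 1],
}
names = list(data)
n_rows = len(data["x1"])

# Listwise deletion keeps only rows without any missing value.
complete_rows = [i for i in range(n_rows)
                 if not any(math.isnan(data[v][i]) for v in names)]


def pairwise_cov(a, b):
    """Sample covariance (n - 1) over the rows where both a and b are observed."""
    pairs = [(p, q) for p, q in zip(a, b) if not (math.isnan(p) or math.isnan(q))]
    mp = mean(p for p, _ in pairs)
    mq = mean(q for _, q in pairs)
    return sum((p - mp) * (q - mq) for p, q in pairs) / (len(pairs) - 1)


# Pairwise deletion: every entry is estimated from its own subset of rows.
S = [[pairwise_cov(data[u], data[v]) for v in names] for u in names]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


print("rows left after listwise deletion:", len(complete_rows))
print("implied r(x1, x2):", S[0][1] / math.sqrt(S[0][0] * S[1][1]))
# A proper covariance matrix is positive semi-definite, so its determinant is >= 0.
print("determinant of pairwise covariance matrix:", round(det3(S), 3))
```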

Page 7

Mean imputation

• Redo text!!!!!
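The slide text is still unfinished. As a sketch of the usual objection to unconditional mean imputation, the Python snippet below simulates hypothetical MCAR data and fills every gap with the observed mean of y. Piling the imputed values at the center shrinks the standard deviation of y and attenuates its correlation with x. The sample size, seed, and 50 % missing rate are arbitrary assumptions.

```python
import random
from statistics import mean, stdev

random.seed(1)

# Hypothetical data: y depends on x, population correlation .60.
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.6 * xi + random.gauss(0, 0.8) for xi in x]

# Delete about 50 % of y completely at random (MCAR); None marks a gap.
y_obs = [yi if random.random() >= 0.5 else None for yi in y]

# Unconditional mean imputation: every gap receives the observed mean of y.
y_mean = mean(v for v in y_obs if v is not None)
y_imp = [y_mean if v is None else v for v in y_obs]


def corr(a, b):
    """Pearson correlation of two equally long lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)
    return cov / (stdev(a) * stdev(b))


print(f"SD(y):   complete {stdev(y):.2f}, mean-imputed {stdev(y_imp):.2f}")
print(f"r(x, y): complete {corr(x, y):.2f}, mean-imputed {corr(x, y_imp):.2f}")
```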

Page 8

Regression imputation

[Figure: scatter plots of x_complete against the complete dependent variable (r = .60) and against MCAR10_reg]

• Actually a form of conditional mean imputation

• Very elegant if you add random residuals (stochastic regression imputation: residuals with mean 0 and variance equal to the residual variance)

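A minimal Python sketch contrasting deterministic and stochastic regression imputation on simulated MCAR data (all values hypothetical; residuals are assumed to be normal). The deterministic version places every imputed value exactly on the regression line and so overstates the correlation. Adding a residual with mean 0 and the residual variance brings the correlation back to roughly its complete-data value.

```python
import math
import random
from statistics import mean, stdev

random.seed(2)

# Hypothetical data: population correlation of x and y is .60.
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.6 * xi + random.gauss(0, 0.8) for xi in x]
missing = [random.random() < 0.5 for _ in range(n)]  # about 50 % MCAR in y

# Fit y = a + b*x by least squares on the complete cases only.
cc = [(xi, yi) for xi, yi, m in zip(x, y, missing) if not m]
mx = mean(xi for xi, _ in cc)
my = mean(yi for _, yi in cc)
b = (sum((xi - mx) * (yi - my) for xi, yi in cc)
     / sum((xi - mx) ** 2 for xi, _ in cc))
a = my - b * mx
resid_sd = math.sqrt(sum((yi - a - b * xi) ** 2 for xi, yi in cc) / (len(cc) - 2))

# Deterministic: conditional mean only. Stochastic: conditional mean + random residual.
y_det = [a + b * xi if m else yi for xi, yi, m in zip(x, y, missing)]
y_sto = [a + b * xi + random.gauss(0, resid_sd) if m else yi
         for xi, yi, m in zip(x, y, missing)]


def corr(u, v):
    """Pearson correlation of two equally long lists."""
    mu, mv = mean(u), mean(v)
    cov = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (len(u) - 1)
    return cov / (stdev(u) * stdev(v))


for label, col in [("complete", y), ("deterministic", y_det), ("stochastic", y_sto)]:
    print(f"{label:13s} r(x, y) = {corr(x, col):.2f}")
```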

Page 9

Hot deck imputation

• Fills in missing values on incomplete records using values from similar but complete records of the same dataset (the name goes back to the »hot« deck of punch cards being processed)
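One common variant, random hot deck within adjustment cells, can be sketched in Python as follows. Here »similar« is assumed to mean »same age group«, and the records and the `hot_deck` helper are hypothetical. Each missing income is replaced by the observed income of a randomly drawn donor from the same cell of the same dataset.

```python
import random
from collections import defaultdict

random.seed(3)

# Hypothetical records: (age group, monthly income); None marks a missing income.
records = [
    ("18-34", 1800), ("18-34", 2100), ("18-34", None),
    ("35-54", 3200), ("35-54", 2900), ("35-54", None),
    ("55+", 2500), ("55+", None),
]


def hot_deck(rows, rng=random):
    """Random hot deck within adjustment cells.

    A missing value is replaced by the observed value of a donor drawn at random
    from the same cell of the same dataset.
    """
    donors = defaultdict(list)
    for cell, value in rows:
        if value is not None:
            donors[cell].append(value)
    return [(cell, rng.choice(donors[cell]) if value is None else value)
            for cell, value in rows]


for row in hot_deck(records):
    print(row)
```

If a cell has no donor, `random.choice` raises `IndexError`; in practice cells are then collapsed or a distance-based (nearest-neighbour) donor is used instead.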

Page 10

Cold deck imputation

• Fills in missing values on incomplete records using values from similar but complete records of an external dataset

• E.g., historical imputation
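Historical imputation, as a cold deck variant, can be sketched in Python like this, assuming a hypothetical earlier survey wave keyed by respondent ID serves as the external dataset. Each gap takes the same unit's value from that wave; units without a historical record remain missing.

```python
# Hypothetical external dataset: last wave's income, keyed by respondent ID.
previous_wave = {"r01": 2400, "r02": 3100, "r03": 1900}

# Current wave with gaps (None).
current_wave = {"r01": 2500, "r02": None, "r03": None, "r04": None}


def historical_imputation(current, historical):
    """Cold deck: fill a gap with the same unit's value from an external dataset.

    Units without a historical record stay missing (None).
    """
    return {unit: historical.get(unit) if value is None else value
            for unit, value in current.items()}


print(historical_imputation(current_wave, previous_wave))
# {'r01': 2500, 'r02': 3100, 'r03': 1900, 'r04': None}
```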

Page 11

Maximum Likelihood Approaches

• Simple idea, but computationally complex

• Loosely speaking, for a fixed set of data and underlying probability model, maximum likelihood picks the values of the model parameters that make the data "more likely" than any other values of the parameters would make them.
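To make the idea concrete for one simple case, the Python sketch below assumes a bivariate normal model in which x is always observed and y is missing at random depending on x (all data simulated and hypothetical). The likelihood then factors into a part for x (all cases) and a part for y given x (complete cases), and the ML estimate of the mean of y has the closed form $\hat\mu_Y = \bar y_{cc} + \hat\beta(\bar x_{all} - \bar x_{cc})$, where $\hat\beta$ is the complete-case regression slope. EM converges to the same value for this monotone pattern; more general patterns require iterative procedures.

```python
import random
from statistics import mean

random.seed(4)

n = 20000
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.6 * xi + random.gauss(0, 0.8) for xi in x]  # true mean of y is 0

# MAR: y is missing far more often when x is high; x itself is always observed.
observed = [random.random() > (0.8 if xi > 0 else 0.1) for xi in x]

x_cc = [xi for xi, o in zip(x, observed) if o]
y_cc = [yi for yi, o in zip(y, observed) if o]

# Complete-case (listwise) estimate of the mean of y.
mu_y_cc = mean(y_cc)

# Factored likelihood: f(x) uses all cases, f(y | x) uses the complete cases.
mx_cc, my_cc = mean(x_cc), mean(y_cc)
b = (sum((xi - mx_cc) * (yi - my_cc) for xi, yi in zip(x_cc, y_cc))
     / sum((xi - mx_cc) ** 2 for xi in x_cc))
mu_y_ml = my_cc + b * (mean(x) - mx_cc)

print(f"complete-case mean of y: {mu_y_cc:.3f}")
print(f"ML mean of y:            {mu_y_ml:.3f}  (true value 0)")
```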

Page 12

Multiple Imputation

• Several random imputations are generated, each completed dataset is analyzed, and the results are combined (pooled) into one estimate
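A compact Python sketch of the three steps (impute M times, analyze each completed dataset, pool) under stated assumptions: simulated MAR data, a normal linear imputation model, and a bootstrap of the complete cases to reflect uncertainty in the regression parameters. Pooling follows Rubin's rules: the point estimate is the average $\bar Q$ of the M estimates, and the total variance is $T = \bar W + (1 + 1/M)B$, with $\bar W$ the average within-imputation variance and $B$ the between-imputation variance.

```python
import math
import random
from statistics import mean, variance

random.seed(5)

n, M = 2000, 20
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.6 * xi + random.gauss(0, 0.8) for xi in x]                    # true mean 0
observed = [random.random() > (0.6 if xi > 0 else 0.1) for xi in x]  # MAR
cc = [(xi, yi) for xi, yi, o in zip(x, y, observed) if o]


def fit(pairs):
    """OLS of y on x; returns intercept, slope and residual SD."""
    mx = mean(px for px, _ in pairs)
    my = mean(py for _, py in pairs)
    sxx = sum((px - mx) ** 2 for px, _ in pairs)
    b = sum((px - mx) * (py - my) for px, py in pairs) / sxx
    a = my - b * mx
    rss = sum((py - a - b * px) ** 2 for px, py in pairs)
    return a, b, math.sqrt(rss / (len(pairs) - 2))


estimates, within = [], []
for _ in range(M):
    # Bootstrap the complete cases so each imputation uses different parameters.
    a, b, s = fit([random.choice(cc) for _ in cc])
    # Stochastic regression imputation; true y is only read where it is observed.
    y_imp = [yi if o else a + b * xi + random.gauss(0, s)
             for xi, yi, o in zip(x, y, observed)]
    estimates.append(mean(y_imp))       # analysis: mean of y
    within.append(variance(y_imp) / n)  # its squared standard error

# Rubin's rules
q_bar = mean(estimates)
w_bar = mean(within)
b_var = variance(estimates)
total = w_bar + (1 + 1 / M) * b_var
print(f"pooled mean of y = {q_bar:.3f}, SE = {math.sqrt(total):.3f}")
```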

Page 13

Learning objectives

• What are missing values?

• How do I treat missing data in principle?

• Why are data missing?

• How do I detect (the systematics of) missingness?

• How do I treat missing data? (revisited)

Page 14

Missingness is a probabilistic phenomenon

Dataset (data matrix) X (. = missing)

3  .  4  .
.  3  5  4
.  .  4  3
2  3  .  2
3  3  2  .
4  .  2  2

MV »mechanism« (indicator matrix) R

1  0  1  0
0  1  1  1
0  0  1  1
1  1  0  1
1  1  1  0
1  0  1  1

$r_{ij} = 1$ if $x_{ij}$ is observed, $r_{ij} = 0$ if $x_{ij}$ is missing
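The indicator matrix follows mechanically from the data matrix. A Python sketch, assuming the observed values of X fill the cells marked 1 in R row by row and using `None` for missing cells:

```python
# The data matrix X from the slide; None marks a missing cell.
X = [
    [3, None, 4, None],
    [None, 3, 5, 4],
    [None, None, 4, 3],
    [2, 3, None, 2],
    [3, 3, 2, None],
    [4, None, 2, 2],
]

# r_ij = 1 if x_ij is observed, 0 if it is missing.
R = [[0 if value is None else 1 for value in row] for row in X]

for row in R:
    print(*row)
```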

Page 15

Typology of missingness distributions

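• MCAR Missing completely at random: $P(R \mid Y_{com}) = P(R)$ (eq. 1)

• MAR Missing at random: $P(R \mid Y_{com}) = P(R \mid Y_{obs})$ (eq. 2)

• MNAR Non-ignorable (eq. 1 and 2 are violated; missingness depends on the missing values themselves)

where $Y_{com} = (Y_{obs}, Y_{mis})$ denotes the complete data, split into its observed and missing parts.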
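One simple way to look for the systematics of missingness is to compare a fully observed variable between cases with and without missing values. The Python sketch below simulates hypothetical data under each of the three mechanisms (about 30 % missing in y each time). Under MCAR, x hardly differs between the groups; under MAR and MNAR it does. Since the y gap can only be computed in a simulation, such a check can reveal departures from MCAR but cannot distinguish MAR from MNAR.

```python
import random
from statistics import mean

random.seed(6)

n = 20000
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.6 * xi + random.gauss(0, 0.8) for xi in x]


def draw_r(p_missing):
    """Return r = 1 (observed) or 0 (missing) with the given probability of missingness."""
    return 0 if random.random() < p_missing else 1


# About 30 % of y go missing under each hypothetical mechanism.
r_by_mechanism = {
    "MCAR": [draw_r(0.3) for _ in range(n)],
    "MAR": [draw_r(0.5 if xi > 0 else 0.1) for xi in x],   # depends on observed x
    "MNAR": [draw_r(0.5 if yi > 0 else 0.1) for yi in y],  # depends on y itself
}


def gap(values, r):
    """Mean of `values` where y is missing minus mean where y is observed."""
    miss = [v for v, ri in zip(values, r) if ri == 0]
    obs = [v for v, ri in zip(values, r) if ri == 1]
    return mean(miss) - mean(obs)


for name, r in r_by_mechanism.items():
    # In real data only the x gap can be computed; the y gap needs the lost values.
    print(f"{name:4s}  x gap {gap(x, r):+.2f}   y gap {gap(y, r):+.2f}")
```

Under MNAR the x gap is nonzero as well, simply because x is correlated with y; the pattern in x alone therefore looks much like MAR.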

Page 16

Typology of missingness distributions

• X: completely observed
• Y: variable with some missing values
• R: missingness indicator
• Missingness »mechanism«

[Diagram: X, Y and R]

MCAR

Missingness is independent of the data (both observed and missing values)

Page 17

Typology of missingness distributions


[Diagram: X, Y and R]

MAR

Missingness is related to the observed data only

Page 18

Typology of missingness distributions


[Diagram: X, Y and R]

MNAR

Missingness is related to missing data as well

Page 19

Typology of missingness distributions

• MCAR Missing completely at random

• MAR Missing at random

• MNAR Non-ignorable

[Diagrams: X, Y and R under MCAR, MAR and MNAR]

Page 20

Examples of MNAR

• We are interested in income, but managers refuse to answer

• We are interested in prejudice, but the racists skip that scale

• We are interested in depression scores, but the depressed are too tired to complete the questionnaire

[Diagram: X, Y and R (MNAR)]

Page 21

Now you can answer the question:

• Does this rule of thumb make sense?

• If up to 5% of my data are missing, I don't have a problem. If 50% are missing, I am lost.

NO! The amount of missingness is much less important than the reason for missingness!
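A quick simulation (Python, hypothetical standard normal data with true mean 0) illustrates the point: losing 90 % of the values completely at random leaves the mean essentially unbiased, while losing only about 10 % of the values, namely those above roughly the 90th percentile (MNAR), biases it noticeably downward.

```python
import random
from statistics import mean

random.seed(7)

n = 100_000
y = [random.gauss(0, 1) for _ in range(n)]  # true mean 0

# 90 % missing, completely at random (MCAR).
y_mcar90 = [yi for yi in y if random.random() >= 0.9]

# About 10 % missing, but only the highest values are lost (MNAR);
# 1.2816 is approximately the 90th percentile of N(0, 1).
y_mnar10 = [yi for yi in y if yi <= 1.2816]

print(f"90 % MCAR: n = {len(y_mcar90):6d}, mean = {mean(y_mcar90):+.3f}")
print(f"10 % MNAR: n = {len(y_mnar10):6d}, mean = {mean(y_mnar10):+.3f}")
```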

Page 22

Example: Perfect situation

Page 23

Amount of missingness

[Figure: two scatter plots of x_complete against the complete dependent variable (r = .60), cases marked by the MV indicator (present/missing); left panel: MCAR, 10% missing; right panel: MCAR, 90% missing]

Page 24

Mechanism of missingness

MAR
• Missingness depends mainly on X
• Solvable

MNAR/NI
• Missingness depends mainly on Y
• Big trouble ahead

Page 25

Mechanism of missingness

R:  .60   .60   .66   .59   .58
N:  1000  904   93    749   768

Biased Median

Page 26

Imputation with MCAR

Although 90% are missing, model-based imputation can reproduce the data. Even under MCAR, mean imputation is evil!

R:  .60   0.66  0.94  0.2   0.65
N:  1000  93    1000  1000  1000

Page 27

Imputation with MAR

Under MAR, model-based imputation can reproduce the data. Mean imputation is evil!

R:  0.59  0.49  0.63  0.60  0.66
N:  749   1000  1000  1000  1000

Page 28

Take home message

• Missing values do not only decrease efficiency/power

• They can (severely) bias the parameter estimates

• Listwise and especially pairwise deletion is unwise deletion

• Naïve unconditional imputation is evil

• Understand the missingness »mechanism«

• Under MCAR: relax

• Under MAR: model-based imputation is no alchemy

Best Practice - PREVENTION