imputation techniques for missing data in clinical trials

Imputation Techniques for Missing Data in Clinical Trials. Presentation by Nithin George Vinod, Project Associate, Centre for Livestock Development and Policy Research, Kerala Veterinary and Animal Sciences University

Upload: nitin-george

Post on 03-Jun-2015

DESCRIPTION

Missing data are unavoidable in clinical and epidemiological research. They lead to bias and loss of information in the analysis. Analysts are often unaware of how missing data are handled because they rely on their software to deal with them. The objective of this seminar is to introduce the different missing data mechanisms and the imputation techniques for missing data, with the help of examples.

TRANSCRIPT

Page 1: Imputation techniques for missing data in clinical trials

Imputation Techniques For Missing Data In Clinical Trials

Presentation by, NITHIN GEORGE VINOD

PROJECT ASSOCIATE

CENTRE FOR LIVESTOCK DEVELOPMENT AND POLICY RESEARCH

KERALA VETERINARY AND ANIMAL SCIENCES UNIVERSITY

Page 2: Imputation techniques for missing data in clinical trials

Contents

• Objectives
• Introduction to missing data
• Reasons for missing data
• Missing data mechanism
• Simple methods
• Single imputation
  » Last observation carried forward (LOCF)
  » Hot-deck imputation
  » Arithmetic mean imputation
  » Regression imputation
  » Stochastic imputation
• Multiple imputation

Page 3: Imputation techniques for missing data in clinical trials

Objectives

To introduce the missing data mechanisms and the different imputation techniques for handling missing data.

Page 4: Imputation techniques for missing data in clinical trials

Introduction to missing data

Missing data: some of the values in the data set are lost, not observed, or not available, for natural or non-natural reasons.

(James R. Carpenter: Missing data in randomized controlled trials)

Page 5: Imputation techniques for missing data in clinical trials

Reasons for missing data

• Patients are in a very critical condition.
• Patients want to change the treatment.
• Measurements are lost because a machine breaks down.
• Patients fail to continue the follow-up.
• Patients fail to answer some questionnaires.
• Patients are cured or die before the study ends.
• The investigator forgets to collect the data.
• The family migrates.
• The patient's profile may be missing.

Page 6: Imputation techniques for missing data in clinical trials

Effect of missing data

• Bias
• Loss of power and distorted variability
• Inaccurate results

Page 7: Imputation techniques for missing data in clinical trials

Missing data mechanism (Rubin 1976)

• Missing Completely At Random (MCAR)
• Missing At Random (MAR)
• Missing Not At Random (MNAR)

Page 8: Imputation techniques for missing data in clinical trials

Missing At Random (MAR)

The probability of missing data on a variable Y is related to some other measured variables in the analysis model but not to the values of Y itself.

Examples
• Missing blood pressure values may be lower on average than the measured ones because younger people, who tend to have lower blood pressure, may be more likely to have a missing measurement.
• In a quality of life (QL) study, the psychologist finds that elderly patients and patients with less education have a higher probability of refusing the QL questionnaire.

Page 9: Imputation techniques for missing data in clinical trials

Missing Completely At Random (MCAR)

The probability of missing data on a variable Y is unrelated to other measured variables and unrelated to the values of Y itself.

Examples
• A blood pressure measurement is missing because the automatic sphygmomanometer broke down.
• A psychologist studying quality of life in a group of cancer patients finds that some patients' data are missing because they moved to another place.

Page 10: Imputation techniques for missing data in clinical trials

Missing Not At Random (MNAR)

The probability of missing data on a variable Y is related to the values of Y itself, even after controlling for other variables.

Examples
• Suppose the treatment under study is not effective in reducing blood pressure. Subjects whose blood pressure stays high may then be more likely to drop out.
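The three mechanisms can be compared with a small simulation. The sketch below is only an illustration: the population model (blood pressure rising with age), the missingness probabilities, and the cut-offs are all hypothetical choices, not values from the slides. It deletes values under each mechanism and compares the mean of what remains with the true mean.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 20000

# Hypothetical population: blood pressure (bp) rises with age, plus noise.
age = [rng.uniform(20, 80) for _ in range(n)]
bp = [100 + 0.5 * a + rng.gauss(0, 10) for a in age]

def observed_mean(p_missing):
    """Mean of bp after deleting each value with its own missingness probability."""
    kept = [y for y, p in zip(bp, p_missing) if rng.random() >= p]
    return statistics.mean(kept)

true_mean = statistics.mean(bp)
# MCAR: every value has the same chance of being missing.
mcar_mean = observed_mean([0.3] * n)
# MAR: missingness depends only on age, which is always observed.
mar_mean = observed_mean([0.6 if a < 40 else 0.1 for a in age])
# MNAR: missingness depends on the bp value itself.
mnar_mean = observed_mean([0.6 if y > 130 else 0.1 for y in bp])

print(f"true {true_mean:.1f}  MCAR {mcar_mean:.1f}  "
      f"MAR {mar_mean:.1f}  MNAR {mnar_mean:.1f}")
```

In this setup the MCAR mean stays close to the true mean, while the MAR and MNAR means drift away from it. Under MAR the drift is explained by age, a measured variable, so a method that uses age can correct it; under MNAR no measured variable explains it.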

Page 11: Imputation techniques for missing data in clinical trials

Different methods to deal with missing data

• Listwise deletion
• Pairwise deletion
• Last observation carried forward
• Hot-deck imputation
• Arithmetic mean imputation
• Regression imputation
• Stochastic regression imputation
• Cold-deck imputation
• Averaging the available pattern imputation
• Maximum likelihood estimation
• Markov chain Monte Carlo method

Page 12: Imputation techniques for missing data in clinical trials

Simple techniques

• Listwise deletion: discards the data for any case that has one or more missing values.
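A minimal Python sketch of listwise deletion follows. The records and the variable names (age, ql, bp) are hypothetical, and None is assumed to mark a missing value.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"age": 35, "ql": 90, "bp": 120},
    {"age": 62, "ql": None, "bp": 135},
    {"age": 49, "ql": 71, "bp": None},
    {"age": 57, "ql": 70, "bp": 128},
]

def listwise_delete(rows):
    """Keep only the rows in which every variable is observed."""
    return [row for row in rows if all(v is not None for v in row.values())]

complete_cases = listwise_delete(records)
print(len(records), "->", len(complete_cases))
```

Here two of the four cases are discarded even though each of them has only one missing value, which shows how quickly the method loses information.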

Page 13: Imputation techniques for missing data in clinical trials

Single Imputation

A method that replaces each missing value with a single, seemingly suitable replacement value.

Page 14: Imputation techniques for missing data in clinical trials

Last Observation Carried Forward (LOCF)

LOCF takes the last available response and substitutes that value for all subsequent missing values.

Advantages
• It generates a complete data set.
• It is easy to implement.

Disadvantages
• It produces biased estimates.
• It is not sensible even when the data are MCAR.
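LOCF as described above can be sketched in a few lines of Python. The visit values are hypothetical, and None is assumed to mark a missed visit.

```python
def locf(values):
    """Replace each missing value (None) with the last observed value before it."""
    filled = []
    last = None  # nothing to carry forward until a value has been observed
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled

# Hypothetical blood pressure readings for one patient over six visits.
visits = [120, 118, None, None, 115, None]
print(locf(visits))
```

A value that is missing before any observation stays missing, because there is nothing to carry forward.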

Page 15: Imputation techniques for missing data in clinical trials

Hot-deck Imputation (Scheuren, 2005)

Replaces each missing value with a random draw from a subsample of respondents who scored similarly on a set of matching variables.

Advantages
• It generates a complete data set.

Disadvantages
• It is not well suited for estimating measures of association.
• It produces substantially biased estimates of correlation and regression coefficients.
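One way to sketch hot-deck imputation in Python is shown below. The function name, the records, and the use of age decade as the only matching variable are assumptions made for illustration; real applications usually match on several variables.

```python
import random

def hot_deck(rows, target, match_key, rng):
    """Fill missing `target` values with a random donor value from the same class."""
    donors = {}
    for row in rows:
        if row[target] is not None:
            donors.setdefault(match_key(row), []).append(row[target])
    filled = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left unchanged
        if row[target] is None:
            pool = donors.get(match_key(row), [])
            if pool:  # the value stays missing when no donor matches
                row[target] = rng.choice(pool)
        filled.append(row)
    return filled

def age_band(row):
    """Hypothetical matching variable: the patient's age decade."""
    return row["age"] // 10

rows = [
    {"age": 36, "ql": 89}, {"age": 38, "ql": 88}, {"age": 35, "ql": None},
    {"age": 55, "ql": 73}, {"age": 57, "ql": 70}, {"age": 59, "ql": None},
]
completed = hot_deck(rows, "ql", age_band, random.Random(1))
print([row["ql"] for row in completed])
```

Each imputed value is a real response taken from a similar respondent, so it always lies within the range of observed scores.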

Page 16: Imputation techniques for missing data in clinical trials

Arithmetic Mean Imputation (Wilks, 1932)

Fills the missing values with the arithmetic mean of the available cases.

Advantages
• It can be applied to any type of missingness.
• It also generates a complete data set.

Disadvantages
• It reduces the variability of the data.
• It attenuates the measures of association.
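The loss of variability can be seen in a short Python sketch. It uses the twelve observed QL values from the worked example on Page 18 and treats the other eight as missing (None).

```python
import statistics

# QL for the 20 patients in the worked example; None marks the 8 missing values.
ql = [90, 89, 88, 87, 82, 80, 78, 76, 71, 73, 70, 70] + [None] * 8

observed = [v for v in ql if v is not None]
mean_ql = statistics.mean(observed)
# Every missing value receives the same number: the mean of the observed cases.
imputed = [mean_ql if v is None else v for v in ql]

sd_before = statistics.stdev(observed)
sd_after = statistics.stdev(imputed)
print(mean_ql, round(sd_before, 2), round(sd_after, 2))
```

The mean is unchanged by construction, but the standard deviation shrinks because eight identical values are added at the centre of the distribution.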

Page 17: Imputation techniques for missing data in clinical trials

Regression Imputation (Buck, 1960)

Replaces missing values with predicted scores from a regression equation, using information from the complete variables.

Advantages
• It generates a complete data set.
• It uses the fact that variables tend to be correlated.

Disadvantages
• It imputes values that are perfectly correlated with the predictors.
• It overestimates correlations.
• It produces biased estimates.

Page 18: Imputation techniques for missing data in clinical trials

Regression imputation: worked example (RI = value from regression imputation)

AGE   QL   QL_missing   RI
35    90   90
36    89   89
38    88   88
38    87   87
41    82   82
45    80   80
47    78   78
48    76   76
49    71   71
55    73   73
57    70   70
59    70   70
62    68   __           65.03
65    67   __           62.37
68    67   __           59.71
72    63   __           56.17
72    60   __           56.17
73    59   __           55.28
75    52   __           53.51
76    51   __           52.63

        QL      RI
mean    72.05   70.74
SD      11.74   12.726

QL = β0 + β1*AGE
QL = 119.950 - 0.886*AGE
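The fitted equation above can be checked with a short Python sketch. It fits ordinary least squares to the twelve complete cases in the table and predicts QL for the eight ages with missing QL. The helper name fit_line is an assumption for illustration.

```python
# Complete cases (AGE, QL) and the ages with missing QL, taken from the table.
age_obs = [35, 36, 38, 38, 41, 45, 47, 48, 49, 55, 57, 59]
ql_obs = [90, 89, 88, 87, 82, 80, 78, 76, 71, 73, 70, 70]
age_mis = [62, 65, 68, 72, 72, 73, 75, 76]

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

b0, b1 = fit_line(age_obs, ql_obs)
# Each missing QL is replaced by the point on the fitted line for that age.
ql_imputed = [b0 + b1 * a for a in age_mis]
print(round(b0, 3), round(b1, 3))
print([round(v, 2) for v in ql_imputed])
```

The coefficients reproduce the equation on the slide, and the predictions agree with the imputed column of the table to within about 0.01. Because every imputed value lies exactly on the line, the imputed cases add no scatter around it, which is why the method overstates the correlation between AGE and QL.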

Page 19: Imputation techniques for missing data in clinical trials

Stochastic Regression Imputation

Uses regression equations to predict the incomplete data and adds a normally distributed residual term to each prediction.

Advantages
• It is the most appropriate of the single imputation methods.
• It imputes values that give approximately equal results.
• It gives unbiased parameter estimates under an MAR data mechanism.

Disadvantage
• It underestimates standard errors.

Page 20: Imputation techniques for missing data in clinical trials

Stochastic regression imputation: worked example (RV = random residual z_i, SI = value from stochastic imputation)

AGE   QL   QL_missing   RV      SI
35    90   90
36    89   89
38    88   88
38    87   87
41    82   82
45    80   80
47    78   78
48    76   76
49    71   71
55    73   73
57    70   70
59    70   70
62    68   __            5.67   70.69
65    67   __            3.72   66.08
68    67   __           -4.13   55.57
72    63   __           -0.39   55.77
72    60   __           -7.20   48.96
73    59   __            2.39   57.66
75    52   __           -6.64   46.86
76    51   __            1.84   54.45

QL = β0 + β1*AGE + z_i
QL = 119.950 - 0.886*AGE + z_i

        QL      RI       SI
mean    72.05   70.74    70.50
SD      11.74   12.726   13.61
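A stochastic version of the regression example can be sketched in Python as follows. The slide does not say how its random residuals were generated, so the choices here are assumptions: the residual spread is estimated from the twelve complete cases, and the seed is arbitrary. The imputed values will therefore differ from the table.

```python
import math
import random

# Complete cases (AGE, QL) and the ages with missing QL, taken from the table.
age_obs = [35, 36, 38, 38, 41, 45, 47, 48, 49, 55, 57, 59]
ql_obs = [90, 89, 88, 87, 82, 80, 78, 76, 71, 73, 70, 70]
age_mis = [62, 65, 68, 72, 72, 73, 75, 76]

# Ordinary least squares fit of QL on AGE.
n = len(age_obs)
mx, my = sum(age_obs) / n, sum(ql_obs) / n
sxx = sum((x - mx) ** 2 for x in age_obs)
sxy = sum((x - mx) * (y - my) for x, y in zip(age_obs, ql_obs))
b1 = sxy / sxx
b0 = my - b1 * mx

# Residual standard deviation of the fitted line (n - 2 degrees of freedom).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(age_obs, ql_obs))
resid_sd = math.sqrt(sse / (n - 2))

rng = random.Random(42)  # arbitrary fixed seed, an assumption of this sketch
draws = [rng.gauss(0, resid_sd) for _ in age_mis]
# Prediction from the line plus a random normal residual for each missing case.
ql_imputed = [b0 + b1 * a + z for a, z in zip(age_mis, draws)]
print(round(resid_sd, 2))
print([round(v, 2) for v in ql_imputed])
```

Adding the residual puts back the scatter around the regression line that plain regression imputation removes.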

Page 21: Imputation techniques for missing data in clinical trials

Multiple imputation

Creates several copies of the data set and imputes each copy with different plausible estimates of the missing values.

Page 22: Imputation techniques for missing data in clinical trials

Procedure

I. Imputation phase
• Data augmentation
  » I-step
  » P-step

II. Analysis phase
• A statistical analysis is performed on each imputed data set using the same technique.

III. Pooling phase
• Estimates and their standard errors are pooled into a single set of values.
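The pooling phase can be sketched in Python. The slides do not give the pooling formulas, so the sketch assumes the combining rules usually attributed to Rubin: the pooled estimate is the average of the per-data-set estimates, and the pooled variance is the average squared standard error plus (1 + 1/m) times the variance of the estimates across the m data sets. The numbers in the example are hypothetical.

```python
import math
import statistics

def pool(estimates, std_errors):
    """Combine m per-data-set estimates and standard errors (Rubin's rules)."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)                        # pooled estimate
    within = statistics.mean([se ** 2 for se in std_errors])  # average sampling variance
    between = statistics.variance(estimates)                  # variance across imputations
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)

# Hypothetical estimates of one parameter from m = 3 imputed data sets.
pooled_estimate, pooled_se = pool([10, 12, 14], [1, 1, 1])
print(pooled_estimate, round(pooled_se, 3))
```

The pooled standard error is larger than any single-data-set standard error because it also carries the disagreement between the imputed data sets. This is the uncertainty that single imputation leaves out.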

Page 23: Imputation techniques for missing data in clinical trials

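Data augmentation

[Diagram: data augmentation alternates between the I-step (stochastic imputation, which gives a new data set) and the P-step. The resulting imputed data sets are Data set 1, Data set 2, ..., Data set 20.]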

Page 24: Imputation techniques for missing data in clinical trials

Conclusion

• Imputation is an attractive idea because it produces a complete data set and makes the data usable.
• Most single imputation methods produce biased parameter estimates.
• Stochastic regression is the only traditional approach that yields unbiased estimates under an MAR mechanism.
• Multiple imputation also produces similar estimates.
• Techniques are scarce when the data are categorical and when the missingness mechanism is MNAR.

Page 25: Imputation techniques for missing data in clinical trials

References

• Baraldi, A. N. & Enders, C. K. An introduction to missing data analysis. Journal of School Psychology. 2009; 9-18.
• Enders, C. K. Applied missing data analysis. The Guilford Press, New York, London. 2010; 2-85.
• Carpenter, J. R. Missing data in randomized controlled trials: a practical guide. 2007; 4-16.
• Schafer, J. L. & Graham, J. W. Missing data: our view of the state of the art. 2002; 147-77.

Page 26: Imputation techniques for missing data in clinical trials


Thank You