Background
Missing data analysis in R
rggobi: Data Visualisation
Amelia: a program for missing data
Conclusion

General Analysis in Quantitative Methods

British Academy of Management

November 16-17, London

Graphical Methods for Investigating Missing Data

Graeme Hutcheson

School of Education, University of Manchester

Graeme D. Hutcheson Missing Data

missing data

- Data sets with missing values are very common in the social sciences.

- Missing data is commonly ‘dealt with’ by using:

    - list-wise deletion

    - simple data replacement (random values, mean values, or values predicted directly from regression models)

    - removing variables with relatively large amounts of missing data from the analysis.

None of these techniques is adequate.
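
To make the problem concrete, the following is a minimal Python sketch (not part of the original slides, and not using any of the R packages discussed later). It assumes an invented two-variable survey with about 30% of one variable missing completely at random. All names and numbers are hypothetical. It shows two of the side effects mentioned above: list-wise deletion throws away whole rows, and mean replacement artificially shrinks the spread of the variable.

```python
import random
import statistics

random.seed(1)

# Hypothetical survey: 200 respondents, two variables (values invented for illustration).
n = 200
income = [random.gauss(50, 10) for _ in range(n)]
age = [random.gauss(40, 12) for _ in range(n)]

# Remove roughly 30% of the income values at random (None marks a missing value).
income_obs = [x if random.random() > 0.3 else None for x in income]

# List-wise deletion: any row with a missing value is dropped entirely,
# so the fully observed age values in those rows are lost as well.
complete_rows = [(i, a) for i, a in zip(income_obs, age) if i is not None]

# Mean replacement: every missing income is replaced by the observed mean.
observed = [x for x in income_obs if x is not None]
mean_income = statistics.mean(observed)
income_filled = [x if x is not None else mean_income for x in income_obs]

sd_observed = statistics.stdev(observed)
sd_filled = statistics.stdev(income_filled)

print(f"rows kept after list-wise deletion: {len(complete_rows)} of {n}")
print(f"SD of observed incomes:    {sd_observed:.2f}")
print(f"SD after mean replacement: {sd_filled:.2f}")
```

The standard deviation after mean replacement is always smaller than that of the observed values, because the filled-in cases add nothing to the sum of squared deviations while still increasing the sample size.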

King et al., 2001: American Political Science Review

‘...approximately 94% (of analyses) use listwise deletion to eliminate entire observations... List-wise deletion discards one-third of cases on average, which deletes both the few nonresponses and the many responses in those cases. The result is a loss of valuable information at best and severe selection bias at worst.’
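
A short arithmetic sketch (not from King et al. or the slides) suggests how a small amount of nonresponse per variable can add up to a loss of this size. It assumes, purely for illustration, that every variable is missing independently with the same probability, so the expected share of complete cases is (1 - p)^k for k variables. The function name and the 5% rate are hypothetical.

```python
def complete_case_fraction(p_missing: float, n_variables: int) -> float:
    """Expected share of rows with no missing values.

    Assumes each variable is missing independently with the same
    probability (an illustrative assumption, not a claim about real data).
    """
    return (1.0 - p_missing) ** n_variables


# With 5% nonresponse per variable, see how many cases survive list-wise deletion.
for k in (1, 4, 8, 12):
    kept = complete_case_fraction(0.05, k)
    print(f"{k:2d} variables: {kept:.1%} of cases kept, {1 - kept:.1%} discarded")
```

Under these assumptions, a model with eight variables keeps only about two-thirds of the cases, even though 95% of the values in each variable were observed.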

Objectives

- Even though missing data is important, it is rarely dealt with or even acknowledged in social science research. Why is this?

- There is a general ignorance as to the damaging effects that missing data can have on analyses.

- There is a lack of training about imputation techniques and available ‘useable’ software.

- There is a general reluctance from reviewers to accept data imputation (without detailed justifications, imputed analyses often make ‘easy targets’ for criticism).

- Data imputation is not easy and should not be achieved by pressing a single button in your statistics package.

data imputation

- Data imputation is now accepted (particularly multiple imputation), but has been very slow to be adopted by researchers.

- The reason for this is only in part a lack of information and training. A bigger issue is that... in practice it can take many hours or days to run and cannot be fully automated.... no commercial software includes a correct implementation of multiple imputation (King et al., 2001).

Missing data analysis in R

- The problem of software is now being addressed by the R statistical environment. Researchers working on a number of techniques over the last decade now have a platform on which to publish their software. This has led, in the last year or so, to many techniques becoming accessible to researchers.

- A simple search for data imputation and missing data on CRAN shows the following (a selection of results is provided; note that these are only the packages that have the target words in the title):

- Amelia II: A Program for Missing Data

- arrayImpute: Missing imputation for microarray data

- cat: Analysis of categorical-variable datasets with missing values

- EMV: Estimation of Missing Values for a Data Matrix

- impute: Imputation for microarray data

- mi: Missing Data Imputation and Model Checking

- mice: Multivariate Imputation by Chained Equations

- mirf: Multiple imputation and random forests for unobservable phase, high-dimensional data

- mitools: Tools for multiple imputation of missing data

- mix: Estimation/multiple Imputation for Mixed Categorical and Continuous Data

- pan: Multiple imputation for multivariate panel or clustered data

- rggobi: Interface between R and GGobi (missing data tools)

- SeqKnn: Sequential KNN imputation method

- SimHap: A comprehensive modeling framework for epidemiological outcomes and a multiple-imputation approach to haplotypic analysis of population-based data

- VIM: Visualization and Imputation of Missing Values

- yaImpute: An R Package for k-NN Imputation

R add-on packages

- This is an exciting time in statistics and data analysis as these techniques are only now being made available to all researchers (most of these programs have been uploaded in the last year).

- Many of the packages listed above also have point-and-click interfaces, which make them simple to operate (see, for example, rggobi, VIM, Amelia), and all have comprehensive manuals available for download from CRAN.

- This seminar will briefly demonstrate two packages: rggobi, a data visualization package, and Amelia II, a data imputation package.

rggobi: A data visualisation program

Full details of ‘rggobi’ can be found at:

http://www.ggobi.org/rggobi

Information about R and installing packages can be found at:

http://www.r-project.org
http://www.rgsweb.net

The following analyses are taken directly from the ggobi website (http://www.ggobi.org/) and the book:

Cook, D. and Swayne, D. F. (2007). Interactive and Dynamic Graphics for Data Analysis: With Examples Using R and GGobi. Springer.

The data show environmental readings for two years: an El Niño year (1997) and a non-El Niño year (1993).

missing data shown in margin plots

rggobi: visualising missing data

rggobi: imputing data

rggobi: checking imputed data (simple imputation)

rggobi: checking imputed data (multiple imputation)

Multiple imputation

- Methodologists and statisticians agree that ‘multiple imputation’ is a superior approach to the problem of missing data scattered through one's explanatory and dependent variables than the methods currently used in applied data analysis (King et al., 2001: American Political Science Review).

- Amelia II is a package that implements a sophisticated multiple imputation of missing data and also allows diagnostics to assess the utility of the imputed data.
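
The general idea of multiple imputation can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Amelia II's algorithm: the data are invented, the imputation model is simply a normal distribution fitted to a bootstrap resample of the observed values, and the quantity of interest is the mean. The only standard ingredient is the final step, which combines the m analyses using Rubin's rules (total variance = within-imputation variance + (1 + 1/m) × between-imputation variance).

```python
import random
import statistics

random.seed(42)

# Invented data: 60 observed values and 20 missing ones (None).
observed = [random.gauss(100, 15) for _ in range(60)]
data = observed + [None] * 20
n = len(data)
m = 5  # number of imputed data sets

estimates = []  # one estimate of the mean per completed data set
variances = []  # the squared standard error of each estimate

for _ in range(m):
    # Bootstrap the observed values so that each imputation also reflects
    # uncertainty about the imputation model (a toy stand-in for a real model).
    boot = random.choices(observed, k=len(observed))
    mu, sd = statistics.mean(boot), statistics.stdev(boot)

    # Fill in each missing value with a random draw, not a single fixed value.
    completed = [x if x is not None else random.gauss(mu, sd) for x in data]

    estimates.append(statistics.mean(completed))
    variances.append(statistics.variance(completed) / n)

# Rubin's rules for combining the m separate analyses.
pooled_estimate = statistics.mean(estimates)
within = statistics.mean(variances)
between = statistics.variance(estimates)
total_variance = within + (1 + 1 / m) * between

print(f"pooled mean:           {pooled_estimate:.2f}")
print(f"pooled standard error: {total_variance ** 0.5:.2f}")
```

The between-imputation term is what single imputation leaves out: it carries the extra uncertainty that comes from not knowing the missing values, so the pooled standard error is larger than one computed from any single completed data set.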

Amelia: a simple GUI

Amelia: data input

Amelia: options - variables

Amelia: options - Time Series/Cross Sectional data

With Amelia, time series and cross-sectional indices can be set.

Researchers often also have additional prior information about missing data values based on previous research, academic consensus, or personal experience. This information can be incorporated into the data imputation algorithm to produce vastly improved imputations.

Amelia: options - priors

Case priors and distributional priors can be easily coded using the GUI.

Amelia: output

The multiply imputed data files can be saved in a number of formats.

Amelia: diagnostics - comparing imputed and observed

Amelia: diagnostics - overimputation

Amelia and rggobi

The values imputed using Amelia can easily be saved and inspected using the graphical capabilities of rggobi. Data can be multiply imputed and also checked graphically for fit.

King et al., 2001: American Political Science Review

‘For political scientists, almost any disciplined statistical model of multiple imputation would serve better than current practices. The threats to the validity of inferences from listwise deletion are of roughly the same magnitude as those from the much better known problems of omitted variable bias.’

Conclusion

- These were just two of the many programs available for data imputation and missing data analysis.

- If you are interested in missing data analysis, investigate the available packages (see the manuals) and install those that might be of use.

- See www.RGSweb.net for data coding and general information about R.
