Missing Data & How To Handle It
Arooj Arshad
PhD Scholar
Goals
Discuss ways to evaluate and
understand missing data
Discuss common missing data
methods
Know the advantages and
disadvantages of common methods
Treatment of the missing data
Point to remember
All researchers should examine their
data for missingness, and
researchers wanting the best (i.e.,
the most replicable and
generalizable) results from their
research need to be prepared to deal
with missing data in the most
appropriate and desirable way
possible.
Variance
Missing data can
sometimes lead to
wrong standard
errors.
Wrong study
conclusions about
the relationship of
variables to
outcomes.
Model-Based Methods
Maximum Likelihood, Multiple
imputation
Deletion Method
Disadvantages
Reduces statistical power (because it lowers n, a researcher
cannot anticipate whether an adequate amount of data will remain
for the analysis).
Doesn't use all information
Estimates may be biased if data aren't missing completely at
random (MCAR). Complete case analysis assumes that the observed
complete cases are a random sample of the originally targeted
sample, or in Rubin's (1976) terminology, that the missing data
are MCAR.
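A minimal sketch of the deletion method (listwise deletion) in Python with NumPy. The data matrix and the helper name `listwise_delete` are hypothetical, made up for illustration only. It shows how a few scattered missing values can remove most of the sample.

```python
import numpy as np

def listwise_delete(X):
    """Keep only rows with no missing values (complete case analysis)."""
    keep = ~np.isnan(X).any(axis=1)
    return X[keep]

# Hypothetical toy data: 6 respondents, 3 variables (age, income, score).
# np.nan marks a missing value.
X = np.array([
    [23.0, 31.0, 4.0],
    [35.0, np.nan, 3.5],
    [np.nan, 45.0, 2.0],
    [41.0, 58.0, np.nan],
    [29.0, np.nan, 4.5],
    [52.0, 77.0, 3.0],
])

complete = listwise_delete(X)
n_before, n_after = X.shape[0], complete.shape[0]
print(f"n before deletion: {n_before}, n after deletion: {n_after}")
```

Only 4 of the 18 values are missing here, yet 4 of the 6 cases are dropped, which is the loss of power described above.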
Mean/Mode Substitution
Replace missing value with sample mean
or mode
Run analyses as if all cases were complete
Advantages
Can use complete case analysis methods
Disadvantages
Reduces variability
Weakens covariance and correlation estimates
in the data (because it ignores relationships
between variables)
Results in biased estimates
Not theoretically driven
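A small simulation can illustrate these disadvantages. This is a sketch under assumed conditions: the data are simulated, the values are missing completely at random, and the helper name `mean_impute` is hypothetical.

```python
import numpy as np

def mean_impute(x):
    """Replace each missing value with the mean of the observed values."""
    filled = x.copy()
    filled[np.isnan(filled)] = np.nanmean(x)
    return filled

# Simulated data (an assumption for illustration): y depends on x.
rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.6, size=n)

# Delete about 40% of y completely at random.
missing = rng.random(n) < 0.4
y_obs = y.copy()
y_obs[missing] = np.nan

y_imp = mean_impute(y_obs)

# Compare the complete cases with the mean-imputed data.
var_complete = np.var(y_obs[~missing], ddof=1)
var_imputed = np.var(y_imp, ddof=1)
r_complete = np.corrcoef(x[~missing], y_obs[~missing])[0, 1]
r_imputed = np.corrcoef(x, y_imp)[0, 1]

print(f"variance:    complete cases {var_complete:.3f}, mean-imputed {var_imputed:.3f}")
print(f"correlation: complete cases {r_complete:.3f}, mean-imputed {r_imputed:.3f}")
```

Every imputed case sits exactly at the mean, so it adds nothing to the spread of y or to its covariance with x. Both the variance and the correlation shrink.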
Regression Imputation
Replaces missing values with
predicted score from a regression
equation.
Advantage:
Uses information from observed
data
Disadvantages:
Overestimates model fit and
correlation estimates
Underestimates variance
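The following sketch shows regression imputation with one predictor on simulated data. The helper name `regression_impute` and the simulation settings are assumptions for illustration. Missing y values are replaced by the fitted values from a regression of y on x estimated from the complete cases.

```python
import numpy as np

def regression_impute(x, y):
    """Fill missing y with predictions from a simple regression of y on x."""
    obs = ~np.isnan(y)
    # np.polyfit returns coefficients from highest degree down: slope, intercept.
    slope, intercept = np.polyfit(x[obs], y[obs], deg=1)
    filled = y.copy()
    filled[~obs] = intercept + slope * x[~obs]
    return filled

# Simulated data (an assumption for illustration): y depends on x.
rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.6, size=n)

# Delete about 40% of y completely at random.
missing = rng.random(n) < 0.4
y_obs = y.copy()
y_obs[missing] = np.nan

y_imp = regression_impute(x, y_obs)

var_complete = np.var(y_obs[~missing], ddof=1)
var_imputed = np.var(y_imp, ddof=1)
r_complete = np.corrcoef(x[~missing], y_obs[~missing])[0, 1]
r_imputed = np.corrcoef(x, y_imp)[0, 1]

print(f"variance:    complete cases {var_complete:.3f}, imputed {var_imputed:.3f}")
print(f"correlation: complete cases {r_complete:.3f}, imputed {r_imputed:.3f}")
```

The imputed points fall exactly on the regression line with no residual scatter, so the correlation is inflated and the variance of y is too small.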
Multiple Imputation
1. Impute: Missing values are filled in with imputed values using a
specified regression model.
This step is repeated m times, resulting in a separate dataset
each time.
2. Analyze: Analyses are performed within each dataset
3. Pool: Results are pooled into one estimate
Advantages:
Variability more accurate with multiple imputations for each
missing value
Considers variability due to sampling AND variability due to
imputation
Disadvantages:
Cumbersome coding
Room for error when specifying models
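The three steps can be sketched with NumPy as follows. This is a simplified illustration, not a full implementation. The data are simulated, the function names are hypothetical, the analysis is just the mean of y, and parameter uncertainty is approximated by bootstrapping the observed cases before each imputation. Pooling uses the standard combining rules: total variance = within-imputation variance + (1 + 1/m) x between-imputation variance.

```python
import numpy as np

def impute_once(x, y, rng):
    """Step 1: one stochastic regression imputation of the missing y values."""
    obs = ~np.isnan(y)
    # Bootstrap the observed cases so the regression coefficients vary
    # between imputations (a simplification of drawing them from a posterior).
    idx = rng.choice(np.flatnonzero(obs), size=obs.sum(), replace=True)
    slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
    resid_sd = np.std(y[idx] - (intercept + slope * x[idx]), ddof=2)
    filled = y.copy()
    n_mis = (~obs).sum()
    # Add random residual noise so imputed values do not sit on the line.
    filled[~obs] = intercept + slope * x[~obs] + rng.normal(scale=resid_sd, size=n_mis)
    return filled

def analyze(y):
    """Step 2: estimate the mean of y and its squared standard error."""
    return y.mean(), y.var(ddof=1) / len(y)

def pool(estimates, variances):
    """Step 3: combine the m results into one estimate and one variance."""
    m = len(estimates)
    q_bar = np.mean(estimates)           # pooled point estimate
    within = np.mean(variances)          # variability due to sampling
    between = np.var(estimates, ddof=1)  # variability due to imputation
    total = within + (1 + 1 / m) * between
    return q_bar, within, between, total

# Simulated data (an assumption for illustration), about 40% of y missing.
rng = np.random.default_rng(2)
n, m = 1000, 20
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.6, size=n)
y_obs = y.copy()
y_obs[rng.random(n) < 0.4] = np.nan

estimates, variances = [], []
for _ in range(m):
    q, u = analyze(impute_once(x, y_obs, rng))
    estimates.append(q)
    variances.append(u)

q_bar, within, between, total = pool(estimates, variances)
print(f"pooled mean: {q_bar:.3f}, pooled standard error: {np.sqrt(total):.3f}")
```

The `between` term is what single imputation methods leave out: it carries the extra uncertainty that comes from not knowing the missing values.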
References
Allison, Paul D. 2001. Missing Data. Sage University
Papers Series on Quantitative Applications in the
Social Sciences. Thousand Oaks: Sage.
Enders, Craig. 2010. Applied Missing Data Analysis.
New York: Guilford Press.
Little, Roderick J., Donald Rubin. 2002. Statistical
Analysis with Missing Data. Hoboken: John Wiley & Sons,
Inc.
Schafer, Joseph L., John W. Graham. 2002. Missing
Data: Our View of the State of the Art.
Psychological Methods.