
Missing Data & How To Handle It


Arooj Arshad
PhD Scholar

Discuss ways to evaluate and understand missing data
Discuss common missing data methods
Know the advantages and disadvantages of common methods
Treatment of the missing data

Missing data can occur for many reasons:
participants can fail to respond to questions (legitimately or illegitimately; more on that later),
equipment and data collecting or recording mechanisms can malfunction,
subjects can withdraw from studies before they are completed,
data entry errors can occur.

Difference between missing and legitimate missing data

Methods for analyzing missing data require
assumptions about the nature of the data and about
the reasons for the missing observations that are
often not acknowledged.
When researchers use missing data methods without
carefully considering the assumptions required of that
method, they run the risk of obtaining biased and
misleading results. Reviewing the stages of data
collection, data preparation, data analysis, and
interpretation of results will highlight the issues that
researchers must consider in making a decision about
how to handle missing data in their work.

Point to remember
All researchers should examine their data for missingness, and
researchers wanting the best (i.e., the most replicable and
generalizable) results from their research need to be prepared to deal
with missing data in the most appropriate and desirable way.

Missing Data Mechanisms

Missing Completely at Random (MCAR)
Probability of the missing data on Y is unrelated to Y
and X

Example: the reporting of income by the respondents.

Checked with the help of Little's MCAR test.

Missing at Random (MAR)

Probability of missing data on Y is related to X, but not to Y itself once X is taken into account.
Example: for really sick patients, clinicians may not draw
blood for routine labs.

Not Missing at Random (NMAR)

Probability of missing data on Y is dependent on the value
of Y
Example: Respondents with high income are less likely to report it.
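
The three mechanisms can be illustrated with a small simulation. This is only a sketch on made-up data: the variables (x as age, y as income), the missingness probabilities and the cut-offs are all assumptions chosen for illustration, not values from the slides.

```python
import random

random.seed(0)

n = 5000
# Hypothetical data: x = age (always observed), y = income (may go missing).
x = [random.gauss(40, 10) for _ in range(n)]
y = [1000 + 50 * xi + random.gauss(0, 300) for xi in x]


def mean(values):
    return sum(values) / len(values)


def observed_mean(values, keep):
    """Mean of the values whose flag in `keep` is True (i.e. observed)."""
    return mean([v for v, k in zip(values, keep) if k])


x_cut = sorted(x)[n // 2]  # median of x
y_cut = sorted(y)[n // 2]  # median of y

# MCAR: every case has the same 30% chance of being missing.
keep_mcar = [random.random() > 0.3 for _ in range(n)]
# MAR: missingness depends only on the observed x (older -> more missing).
keep_mar = [random.random() > (0.5 if xi > x_cut else 0.1) for xi in x]
# NMAR: missingness depends on y itself (high income -> more missing).
keep_nmar = [random.random() > (0.5 if yi > y_cut else 0.1) for yi in y]

print("full-data mean of y:", round(mean(y)))
print("observed mean, MCAR:", round(observed_mean(y, keep_mcar)))
print("observed mean, MAR: ", round(observed_mean(y, keep_mar)))
print("observed mean, NMAR:", round(observed_mean(y, keep_nmar)))
```

Under MCAR the observed mean stays close to the full-data mean. Under MAR and NMAR the naive observed mean is pulled downward; the difference is that under MAR the bias can in principle be corrected using x, while under NMAR it cannot be corrected from the observed data alone.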

Missing Data Consequences

Bias: the estimate deviates from the quantity of interest. There is no bias if the data is MCAR, but bias can occur when the data is not MCAR.
Missing data can sometimes lead to wrong standard errors.
Wrong study conclusions about the relationship between variables.

Commonly-Used Missing Data Handling Methods

Deletion Methods
Listwise/complete case deletion, pairwise deletion

Single Imputation Methods
Mean/mode substitution, dummy variable method, single regression imputation

Model-Based Methods
Maximum Likelihood, Multiple Imputation

Deletion Method

Listwise Deletion (Complete Case Analysis)

Only analyze cases with complete data, dropping the cases with missing values.
When a researcher is estimating a model, such as a linear regression, most
statistical packages use listwise deletion by default.

Listwise Deletion (Complete Case Analysis)

Advantages:
Ease of implementation.
Comparability across analyses.

Disadvantages:
Reduces statistical power (because it lowers n; a researcher cannot anticipate whether an adequate amount of data will remain for the analysis).
Doesn't use all information.
Estimates may be biased if data isn't MCAR (complete case analysis assumes that the observed complete cases are a random sample of the originally targeted sample, or in Rubin's (1976) terminology, that the missing data are MCAR).
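
A minimal sketch of listwise deletion in plain Python. The records and the helper name `listwise_delete` are hypothetical, and `None` is assumed to mark a missing value.

```python
# Hypothetical records; None marks a missing value.
rows = [
    {"age": 25, "income": 3200, "score": 7},
    {"age": 31, "income": None, "score": 6},
    {"age": 47, "income": 5100, "score": None},
    {"age": 52, "income": 4800, "score": 9},
    {"age": None, "income": 2900, "score": 5},
    {"age": 38, "income": 4100, "score": 8},
]


def listwise_delete(records):
    """Keep only the records in which every variable is observed."""
    return [r for r in records if all(v is not None for v in r.values())]


complete = listwise_delete(rows)
print(len(rows), "cases before,", len(complete), "cases after listwise deletion")
```

In this toy data only 3 of the 18 values are missing, yet half of the cases are dropped, which is the loss of power and information listed among the disadvantages.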

Pairwise deletion (Available Case Analysis)

Analysis with all cases in which the variables of interest are present.
Advantages:
Keeps as many cases as possible for each analysis.
Uses all information possible with each analysis.
Disadvantages:
Can't compare analyses because the sample is different each time.
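
A sketch of pairwise deletion for a set of correlations, again on hypothetical records with `None` marking a missing value. Each correlation uses every case in which both variables are present, so the number of cases can differ from pair to pair.

```python
from itertools import combinations
from math import sqrt

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 25, "income": 3200, "score": 7},
    {"age": 31, "income": None, "score": 6},
    {"age": 47, "income": 5100, "score": None},
    {"age": 52, "income": 4800, "score": 9},
    {"age": None, "income": 2900, "score": 5},
    {"age": 38, "income": 4100, "score": 8},
    {"age": 44, "income": None, "score": 7},
    {"age": 29, "income": 3500, "score": None},
]


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


def pairwise_corr(records, a, b):
    """Correlation of a and b over the cases where both are observed."""
    pairs = [(r[a], r[b]) for r in records
             if r[a] is not None and r[b] is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys), len(pairs)


results = {}
for a, b in combinations(["age", "income", "score"], 2):
    results[(a, b)] = pairwise_corr(rows, a, b)
    r, n_used = results[(a, b)]
    print(f"{a} vs {b}: r = {r:.2f} based on n = {n_used}")
```

The pairs are computed on different subsets of cases (five cases for two of the pairs, four for the third), which is why results from different analyses are hard to compare.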

Single Imputation Methods


Mean/Mode substitution
Dummy variable control
Conditional mean substitution

Mean/Mode Substitution
Replace missing value with sample mean or mode
Run analyses as if all cases were complete
Advantages:
Can use complete case analysis methods
Disadvantages:
Reduces variability
Weakens covariance and correlation estimates in the data (because it ignores the relationship between variables)
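
The two disadvantages can be seen in a short simulation. This is a sketch on made-up data: the variables, the 40% missingness rate and the seed are assumptions for illustration only.

```python
import random
from math import sqrt
from statistics import mean, variance

random.seed(1)

# Hypothetical data: y depends on x; about 40% of y is removed at random.
n = 1000
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.8 * xi + random.gauss(0, 0.6) for xi in x]
y_obs = [yi if random.random() > 0.4 else None for yi in y]


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


# Mean substitution: every missing y gets the mean of the observed y.
observed = [v for v in y_obs if v is not None]
fill = mean(observed)
y_imp = [fill if v is None else v for v in y_obs]

# Complete cases, for comparison.
cc_x = [a for a, b in zip(x, y_obs) if b is not None]

var_observed, var_imputed = variance(observed), variance(y_imp)
corr_complete, corr_imputed = pearson(cc_x, observed), pearson(x, y_imp)

print("variance of y:  observed", round(var_observed, 3),
      "| after mean substitution", round(var_imputed, 3))
print("corr(x, y):     complete cases", round(corr_complete, 3),
      "| after mean substitution", round(corr_imputed, 3))
```

Every imputed case sits exactly at the mean of y regardless of its x value, so both the variance of y and its correlation with x shrink.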

Dummy Variable Adjustment

Create an indicator for missing value (1=value is missing for observation; 0=value is observed for observation)
Impute missing values to a constant (such as the mean)
Advantages:
Uses all available information about the missing observations
Disadvantages:
Results in biased estimates
Not theoretically driven
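
A minimal sketch of the two columns this method creates for one predictor. The `income` values are hypothetical, and using the observed mean as the constant is an assumption; any constant could be used.

```python
from statistics import mean

# Hypothetical predictor; None marks a missing value.
income = [3200, None, 5100, 4800, None, 4100]

observed = [v for v in income if v is not None]
constant = mean(observed)  # fill-in constant (assumed here: the observed mean)

# Indicator: 1 = value is missing for the observation, 0 = value is observed.
income_missing = [1 if v is None else 0 for v in income]
# Filled predictor: missing values replaced by the constant.
income_filled = [constant if v is None else v for v in income]

print(income_missing)
print(income_filled)
```

Both `income_filled` and `income_missing` would then be entered as predictors in the regression model.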

Regression Imputation
Replaces missing values with the predicted score from a regression equation
Advantages:
Uses information from observed data
Disadvantages:
Overestimates model fit and correlation estimates
Weakens variance
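
A sketch of single regression imputation with one predictor, on made-up data (the variables, coefficients and 30% missingness rate are assumptions). The line is fitted on the complete cases and every missing y is replaced by its predicted value.

```python
import random
from math import sqrt
from statistics import mean, variance

random.seed(2)

# Hypothetical data: x fully observed, about 30% of y removed at random.
n = 500
x = [random.gauss(50, 10) for _ in range(n)]
y = [20 + 1.5 * xi + random.gauss(0, 12) for xi in x]
y_obs = [yi if random.random() > 0.3 else None for yi in y]


def fit_line(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return my - slope * mx, slope


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


# Fit the regression on the complete cases only.
cc_x = [a for a, b in zip(x, y_obs) if b is not None]
cc_y = [b for b in y_obs if b is not None]
intercept, slope = fit_line(cc_x, cc_y)

# Replace each missing y with its predicted score.
y_imp = [intercept + slope * a if b is None else b for a, b in zip(x, y_obs)]

corr_complete, corr_imputed = pearson(cc_x, cc_y), pearson(x, y_imp)
print("corr(x, y):    complete cases", round(corr_complete, 3),
      "| after imputation", round(corr_imputed, 3))
print("variance of y: full data", round(variance(y), 1),
      "| after imputation", round(variance(y_imp), 1))
```

The imputed points lie exactly on the fitted line, with no residual scatter, which is why the correlation is inflated and the variance of y is understated.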

Model Based Methods


Maximum Likelihood Using EM
Multiple imputation
These methods share two assumptions:
that the joint distribution of the data is
multivariate normal, and that the
missing data mechanism is ignorable.

Maximum Likelihood
Identifies the set of parameter values that produces the highest log-likelihood.
ML estimate: value that is most likely to have resulted in the observed data
Conceptually, the process is the same with or without missing data
Advantages:
Uses full information (both complete cases and incomplete cases) to calculate log likelihood
Unbiased parameter estimates with MCAR/MAR data
Disadvantages:
SEs biased downward; can be adjusted by using the observed information matrix

When the missing data mechanism is ignorable, we can base estimation on the
likelihood of the observed data.
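
In symbols, the observed-data likelihood can be written as follows. The notation is an assumption introduced here, not taken from the slides: θ for the model parameters, Y_obs and Y_mis for the observed and missing parts of the data, and y_{i,obs} for the observed values of case i.

```latex
L(\theta \mid Y_{\mathrm{obs}})
  \;\propto\; f(Y_{\mathrm{obs}} \mid \theta)
  \;=\; \int f(Y_{\mathrm{obs}}, Y_{\mathrm{mis}} \mid \theta)\, dY_{\mathrm{mis}},
\qquad
\log L(\theta \mid Y_{\mathrm{obs}})
  \;=\; \sum_{i=1}^{n} \log f(y_{i,\mathrm{obs}} \mid \theta) + \text{const}.
```

The sum form assumes independent cases. Each case contributes through whatever variables it has observed, which is how incomplete cases still enter the log-likelihood.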

Multiple Imputation
1. Impute: Data is filled in with imputed values using a specified regression model
This step is repeated m times, resulting in a separate dataset each time.
2. Analyze: Analyses performed within each dataset
3. Pool: Results pooled into one estimate
Advantages:
Variability more accurate with multiple imputations for each missing value
Considers variability due to sampling AND variability due to imputation
Disadvantages:
Cumbersome coding
Room for error when specifying models
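
The three steps can be sketched in plain Python for a single quantity, the mean of y. This is an illustration under stated assumptions rather than a full implementation: the data are simulated, the imputation model is a one-predictor regression refitted on a bootstrap resample of the complete cases (one simple way to carry parameter uncertainty into the imputations), and the pooling step uses Rubin's rules.

```python
import random
from statistics import mean, variance

random.seed(3)

# Hypothetical data: x fully observed, about 30% of y removed at random.
n = 300
x = [random.gauss(50, 10) for _ in range(n)]
y = [20 + 1.5 * xi + random.gauss(0, 12) for xi in x]
y_obs = [yi if random.random() > 0.3 else None for yi in y]
complete = [(a, b) for a, b in zip(x, y_obs) if b is not None]


def fit_line(pairs):
    """Least-squares intercept, slope and residual SD of y on x."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    slope = (sum((a - mx) * (b - my) for a, b in pairs)
             / sum((a - mx) ** 2 for a in xs))
    intercept = my - slope * mx
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in pairs)
    return intercept, slope, (sse / (len(pairs) - 2)) ** 0.5


def impute_once():
    """Step 1 (Impute): return one completed copy of y."""
    boot = random.choices(complete, k=len(complete))  # bootstrap resample
    intercept, slope, sd = fit_line(boot)
    # Predicted value plus random noise, so imputations keep residual scatter.
    return [intercept + slope * a + random.gauss(0, sd) if b is None else b
            for a, b in zip(x, y_obs)]


m = 20  # number of imputed datasets
estimates, within = [], []
for _ in range(m):
    y_completed = impute_once()
    # Step 2 (Analyze): the estimate and its squared standard error.
    estimates.append(mean(y_completed))
    within.append(variance(y_completed) / n)

# Step 3 (Pool): Rubin's rules.
pooled = mean(estimates)           # pooled point estimate
W = mean(within)                   # within-imputation variance
B = variance(estimates)            # between-imputation variance
T = W + (1 + 1 / m) * B            # total variance

print("pooled mean of y:", round(pooled, 2))
print("pooled standard error:", round(T ** 0.5, 3))
```

The total variance T combines the sampling variability (W) with the variability due to imputation (B), which is what a single imputation leaves out.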

Allison, Paul D. 2001. Missing Data. Sage University Papers Series on Quantitative Applications in the Social Sciences. Thousand Oaks: Sage.
Enders, Craig. 2010. Applied Missing Data Analysis. New York: Guilford Press.
Little, Roderick J., and Donald Rubin. 2002. Statistical Analysis with Missing Data. John Wiley & Sons, Inc.
Schafer, Joseph L., and John W. Graham. 2002. Missing Data: Our View of the State of the Art. Psychological Methods.
