Published in: Emerging Themes in Epidemiology 1/2012

Open Access 01-12-2012 | Analytic perspective

Recovery of information from multiple imputation: a simulation study

Authors: Katherine J Lee, John B Carlin

Abstract

Background

Multiple imputation is becoming increasingly popular for handling missing data. However, it is often implemented without adequate consideration of whether it offers any advantage over complete case analysis for the research question of interest, or whether potential gains may be offset by bias from a poorly fitting imputation model, particularly as the amount of missing data increases.

Methods

Simulated datasets (n = 1000) drawn from a synthetic population were used to explore information recovery from multiple imputation in estimating the coefficient of a binary exposure variable when various proportions of data (10-90%) were set missing at random in a highly skewed continuous covariate or in the binary exposure. Imputation was performed using multivariate normal imputation (MVNI), with a simple or zero-skewness log transformation to manage non-normality. Bias, precision, mean squared error and coverage for a set of regression parameter estimates were compared between multiple imputation and complete case analyses.
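
As a rough illustration of the four performance measures named above, the following Python sketch computes them for a single coefficient across simulation replicates. The function name `performance`, its arguments, the use of the empirical standard error as the precision measure, and the normal-approximation 95% interval (z = 1.96) are all assumptions made for illustration; this is not the authors' code, and the numbers in the usage lines are made up.

```python
import statistics


def performance(estimates, std_errors, true_value, z=1.96):
    """Summarise simulation replicates for one regression coefficient.

    estimates  : point estimates, one per simulated dataset
    std_errors : matching standard errors, one per simulated dataset
    true_value : the coefficient's value in the synthetic population
    z          : normal quantile for the interval (assumed 95% level)
    """
    n = len(estimates)
    # Bias: average estimate minus the true value.
    bias = statistics.fmean(estimates) - true_value
    # Precision, taken here as the empirical SE across replicates.
    empirical_se = statistics.stdev(estimates)
    # Mean squared error around the true value.
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    # Coverage: share of intervals that contain the true value.
    covered = sum(
        1
        for e, se in zip(estimates, std_errors)
        if e - z * se <= true_value <= e + z * se
    )
    return {
        "bias": bias,
        "empirical_se": empirical_se,
        "mse": mse,
        "coverage": covered / n,
    }


# Hypothetical replicate results, for illustration only.
result = performance(
    estimates=[0.9, 1.1, 1.0, 1.2],
    std_errors=[0.1, 0.1, 0.1, 0.05],
    true_value=1.0,
)
print(result)
```

Running the same function on the estimates from the multiple imputation analyses and from the complete case analyses gives a like-for-like comparison of the two approaches on each measure.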

Results

For missingness in the continuous covariate, multiple imputation produced less bias and greater precision for the effect of the binary exposure variable, compared with complete case analysis, with larger gains in precision with more missing data. However, even with only moderate missingness, large bias and substantial under-coverage were apparent in estimating the continuous covariate’s effect when skewness was not adequately addressed. For missingness in the binary covariate, all estimates had negligible bias but gains in precision from multiple imputation were minimal, particularly for the coefficient of the binary exposure.

Conclusions

Although multiple imputation can be useful if covariates required for confounding adjustment are missing, benefits are likely to be minimal when data are missing in the exposure variable of interest. Furthermore, when there are large amounts of missingness, multiple imputation can become unreliable and introduce bias not present in a complete case analysis if the imputation model is not appropriate. Epidemiologists dealing with missing data should keep in mind the potential limitations as well as the potential benefits of multiple imputation. Further work is needed to provide clearer guidelines on effective application of this method.
Metadata
Title
Recovery of information from multiple imputation: a simulation study
Authors
Katherine J Lee
John B Carlin
Publication date
01-12-2012
Publisher
BioMed Central
Published in
Emerging Themes in Epidemiology / Issue 1/2012
Electronic ISSN: 1742-7622
DOI
https://doi.org/10.1186/1742-7622-9-3
