Published in: Emerging Themes in Epidemiology 1/2017

Open Access 01-12-2017 | Analytic Perspective

Model checking in multiple imputation: an overview and case study

Authors: Cattram D. Nguyen, John B. Carlin, Katherine J. Lee

Abstract

Background

Multiple imputation has become very popular as a general-purpose method for handling missing data. The validity of multiple-imputation-based analyses relies on the use of an appropriate model to impute the missing values. Despite the widespread use of multiple imputation, there are few guidelines available for checking imputation models.

Analysis

In this paper, we provide an overview of currently available methods for checking imputation models. These include graphical checks and numerical summaries, as well as simulation-based methods such as posterior predictive checking. These model-checking techniques are illustrated using an analysis affected by missing data from the Longitudinal Study of Australian Children.
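The kinds of checks listed above can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration, not the authors' procedure. The data are simulated, with a right-skewed variable `y` that is missing at random given `x`. The imputation model is a normal linear regression whose parameter uncertainty is approximated by a bootstrap. The function names (`fit_linear`, `impute_once`, `ks_statistic`, `ppc_pvalue`, `simulate`) are hypothetical. The sketch computes numerical summaries comparing observed and imputed values and a two-sample Kolmogorov-Smirnov distance. It also runs a simplified posterior predictive check, which compares a statistic (here skewness) on each completed dataset with the same statistic on a replicate drawn from the imputation model. Published versions of posterior predictive checking differ in detail.

```python
import math
import random
import statistics


def fit_linear(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, math.sqrt(rss / (len(xs) - 2))


def impute_once(x, y, rng):
    """One completed copy of y via normal linear regression on x.

    Parameter uncertainty is approximated by refitting on a bootstrap
    resample of the complete cases (an assumption; other schemes exist).
    """
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    boot = [rng.choice(obs) for _ in obs]
    a, b, sd = fit_linear([p[0] for p in boot], [p[1] for p in boot])
    completed = [yi if yi is not None else rng.gauss(a + b * xi, sd)
                 for xi, yi in zip(x, y)]
    return completed, (a, b, sd)


def ks_statistic(s1, s2):
    """Two-sample Kolmogorov-Smirnov distance between empirical CDFs."""
    s1, s2 = sorted(s1), sorted(s2)
    n1, n2 = len(s1), len(s2)
    i = j = 0
    d = 0.0
    while i < n1 and j < n2:
        v = min(s1[i], s2[j])  # next distinct value in the pooled sample
        while i < n1 and s1[i] <= v:
            i += 1
        while j < n2 and s2[j] <= v:
            j += 1
        d = max(d, abs(i / n1 - j / n2))
    return d


def skewness(v):
    """Moment-based sample skewness."""
    mu, sd = statistics.fmean(v), statistics.pstdev(v)
    return sum((vi - mu) ** 3 for vi in v) / (len(v) * sd ** 3)


def ppc_pvalue(x, y, m, stat, rng):
    """Simplified posterior predictive check.

    In each of m imputations, compare stat() on the completed data with
    stat() on a full replicate of y drawn from the same fitted model.
    p-values near 0 or 1 flag a feature the imputation model misses.
    """
    exceed = 0
    for _ in range(m):
        completed, (a, b, sd) = impute_once(x, y, rng)
        replicate = [rng.gauss(a + b * xi, sd) for xi in x]
        if stat(replicate) >= stat(completed):
            exceed += 1
    return exceed / m


def simulate(n, rng):
    """Hypothetical data: right-skewed y, missing at random given x."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [math.exp(0.5 * xi + rng.gauss(0, 0.75)) for xi in x]
    # P(y missing) = 0.6 * logistic(x): more missingness at larger x.
    p_miss = [0.6 / (1 + math.exp(-xi)) for xi in x]
    y_obs = [None if rng.random() < p else yi for yi, p in zip(y, p_miss)]
    return x, y_obs


if __name__ == "__main__":
    rng = random.Random(2017)
    x, y = simulate(1000, rng)
    observed = [yi for yi in y if yi is not None]
    miss = [i for i, yi in enumerate(y) if yi is None]
    for m in range(1, 4):
        completed, _ = impute_once(x, y, rng)
        imputed = [completed[i] for i in miss]
        print(f"imputation {m}: "
              f"observed mean {statistics.fmean(observed):.2f}, "
              f"imputed mean {statistics.fmean(imputed):.2f}, "
              f"imputed < 0: {sum(v < 0 for v in imputed)}, "
              f"KS distance {ks_statistic(observed, imputed):.3f}")
    print("PPC p-value (skewness):", ppc_pvalue(x, y, 50, skewness, rng))
```

Under missing at random, observed and imputed values can legitimately differ. In this sketch, missingness is more common at large `x`. A gap in means or a large KS distance is therefore a prompt for closer inspection rather than proof of misfit. Negative imputed values for a strictly positive variable point more directly at a feature the normal model does not capture, as does a posterior predictive p-value near 0 or 1. In this simulated example, that feature is skewness.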

Conclusions

As multiple imputation becomes more firmly established as a standard approach for handling missing data, it will become increasingly important that researchers employ appropriate model-checking approaches to ensure that reliable results are obtained when using this method.
Metadata
Title: Model checking in multiple imputation: an overview and case study
Authors: Cattram D. Nguyen, John B. Carlin, Katherine J. Lee
Publication date: 01-12-2017
Publisher: BioMed Central
Published in: Emerging Themes in Epidemiology, Issue 1/2017
Electronic ISSN: 1742-7622
DOI: https://doi.org/10.1186/s12982-017-0062-6
