Published in: Emerging Themes in Epidemiology 1/2021

Open Access 01-12-2021 | Analytic perspective

Practical strategies for handling breakdown of multiple imputation procedures

Authors: Cattram D. Nguyen, John B. Carlin, Katherine J. Lee

Abstract

Multiple imputation is a recommended method for handling incomplete data problems. One of the barriers to its successful use is the breakdown of the multiple imputation procedure, often due to numerical problems with the algorithms used within the imputation process. These problems frequently occur when imputation models contain large numbers of variables, especially with the popular approach of multivariate imputation by chained equations. This paper describes common causes of failure of the imputation procedure including perfect prediction and collinearity, focusing on issues when using Stata software. We outline a number of strategies for addressing these issues, including imputation of composite variables instead of individual components, introducing prior information and changing the form of the imputation model. These strategies are illustrated using a case study based on data from the Longitudinal Study of Australian Children.
Metadata
Title: Practical strategies for handling breakdown of multiple imputation procedures
Authors: Cattram D. Nguyen, John B. Carlin, Katherine J. Lee
Publication date: 01-12-2021
Publisher: BioMed Central
Published in: Emerging Themes in Epidemiology, Issue 1/2021
Electronic ISSN: 1742-7622
DOI: https://doi.org/10.1186/s12982-021-00095-3
