Published in: BMC Medical Research Methodology 1/2014

Open Access 01-12-2014 | Research article

Tuning multiple imputation by predictive mean matching and local residual draws

Authors: Tim P Morris, Ian R White, Patrick Royston

Abstract

Background

Multiple imputation is a commonly used method for handling incomplete covariates because it can provide valid inference when data are missing at random. This validity depends on correctly specifying the parametric model used to impute missing values, which may be difficult in many realistic settings. Imputation by predictive mean matching (PMM) borrows an observed value from a donor with a similar predictive mean; imputation by local residual draws (LRD) instead borrows the donor’s residual. Both methods relax some assumptions of parametric imputation, promising greater robustness when the imputation model is misspecified.
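The two borrowing rules differ only in what is taken from the donor. The following minimal Python sketch shows both side by side. It assumes the predictive means have already been computed from a fitted imputation model; proper multiple imputation would also draw the model parameters from their posterior before each imputation, which is omitted here. `pmm_lrd_impute` is a hypothetical helper for illustration, not a function from any MI package, and the default `k=10` simply mirrors the pool size suggested in the Results.

```python
import numpy as np


def pmm_lrd_impute(mu_obs, y_obs, mu_mis, k=10, rng=None):
    """Impute each missing value by PMM and by LRD, sampling one donor at
    random from the k observed cases with the closest predictive means.

    Illustrative sketch only: predictive means are assumed to be given.
    mu_obs : predictive means for the observed cases
    y_obs  : observed values of the incomplete variable
    mu_mis : predictive means for the cases with missing values
    Returns (pmm_values, lrd_values) as float arrays.
    """
    rng = np.random.default_rng(rng)
    mu_obs = np.asarray(mu_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    mu_mis = np.asarray(mu_mis, dtype=float)
    resid_obs = y_obs - mu_obs  # each potential donor's residual

    pmm = np.empty(mu_mis.shape[0])
    lrd = np.empty(mu_mis.shape[0])
    for i, mu in enumerate(mu_mis):
        # Rank observed cases by distance between predictive means; keep k nearest
        pool = np.argsort(np.abs(mu_obs - mu), kind="stable")[:k]
        donor = rng.choice(pool)
        pmm[i] = y_obs[donor]           # PMM: borrow the donor's observed value
        lrd[i] = mu + resid_obs[donor]  # LRD: own predictive mean + donor's residual
    return pmm, lrd
```

Because a PMM imputation is always some donor's actual value, it stays within the observed range; an LRD imputation adds the donor's residual to the recipient's own predictive mean, so it can fall outside that range. With `k=1` both rules reduce to single nearest-donor matching.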

Methods

We review the development of PMM and LRD, outline the various forms available, and aim to clarify choices about how and when they should be used. We compare their performance with fully parametric imputation in simulation studies, first when the imputation model is correctly specified and then when it is misspecified.

Results

When using PMM or LRD, we strongly caution against using a single donor (the default in some implementations) and instead advocate sampling from a pool of around 10 donors. We also clarify which matching metric is best. Several implementations in current MI software are poor.
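To illustrate why a single donor is problematic, the Python sketch below imputes one missing value m = 20 times with either k = 1 or k = 10 donors. It assumes the predictive means stay fixed across imputations, which exaggerates the effect relative to proper MI (where model parameters are redrawn for each imputation); `draw_donor_values` and the simulated data are hypothetical. With one donor every imputation is identical, so the between-imputation variance that carries the uncertainty due to missing data vanishes.

```python
import numpy as np


def draw_donor_values(mu_obs, y_obs, mu_target, k, m, rng=None):
    """Return m PMM imputations of a single missing value, each sampled
    (with replacement) from the k observed cases whose predictive means
    are closest to mu_target. Predictive means are held fixed across
    imputations, a simplifying assumption."""
    rng = np.random.default_rng(rng)
    mu_obs = np.asarray(mu_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    pool = np.argsort(np.abs(mu_obs - mu_target), kind="stable")[:k]
    return y_obs[rng.choice(pool, size=m, replace=True)]


# Simulated data: predictive means and observed values for 200 complete cases
data_rng = np.random.default_rng(42)
mu_obs = data_rng.normal(size=200)
y_obs = mu_obs + data_rng.normal(size=200)

single = draw_donor_values(mu_obs, y_obs, 0.3, k=1, m=20, rng=1)
pooled = draw_donor_values(mu_obs, y_obs, 0.3, k=10, m=20, rng=1)
print("between-imputation variance, k=1: ", single.var(ddof=1))
print("between-imputation variance, k=10:", pooled.var(ddof=1))
```

Each pooled draw is still a real observed value, so the pool keeps PMM's range-preserving behaviour while letting the imputations vary.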

Conclusions

PMM and LRD may have a role in imputing covariates (i) that are not strongly associated with the outcome, and (ii) when the imputation model is thought to be slightly, but not grossly, misspecified. Researchers should focus their efforts on specifying the imputation model correctly rather than expecting predictive mean matching or local residual draws to do the work.
Metadata
Publisher
BioMed Central
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/1471-2288-14-75
