Published in: Prevention Science 3/2018

01-04-2018

Principled Missing Data Treatments

Abstract

We review a number of issues regarding missing data treatments for intervention and prevention researchers. Many of the common missing data practices in prevention research are still, unfortunately, ill-advised (e.g., use of listwise and pairwise deletion, insufficient use of auxiliary variables). Our goal is to promote better practice in the handling of missing data. We review the current state of missing data methodology and recent missing data reporting in prevention research. We describe antiquated, ad hoc missing data treatments and discuss their limitations. We discuss two modern, principled missing data treatments: multiple imputation and full information maximum likelihood, and we offer practical tips on how to best employ these methods in prevention research. The principled missing data treatments that we discuss are couched in terms of how they improve causal and statistical inference in the prevention sciences. Our recommendations are firmly grounded in missing data theory and well-validated statistical principles for handling the missing data issues that are ubiquitous in biosocial and prevention research. We augment our broad survey of missing data analysis with references to more exhaustive resources.
Metadata
Print ISSN: 1389-4986
Electronic ISSN: 1573-6695
DOI
https://doi.org/10.1007/s11121-016-0644-5