Published in: BMC Medical Research Methodology 1/2018

Open Access 01-12-2018 | Research article

A comparison of multiple imputation methods for missing data in longitudinal studies

Authors: Md Hamidul Huque, John B. Carlin, Julie A. Simpson, Katherine J. Lee


Abstract

Background

Multiple imputation (MI) is now widely used to handle missing data in longitudinal studies. Several MI techniques have been proposed to impute incomplete longitudinal covariates, including standard fully conditional specification (FCS-Standard) and joint multivariate normal imputation (JM-MVN), which treat repeated measurements as distinct variables, and various extensions based on generalized linear mixed models. Although these MI approaches have been implemented in various software packages, there has not been a comprehensive evaluation of the relative performance of these methods in the context of longitudinal data.
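As a rough illustration of what "treating repeated measurements as distinct variables" means in practice, the sketch below pivots long-format records (one row per child per wave) into wide format (one row per child, one column per wave). This is the layout that a standard FCS or JM-MVN routine would then impute column by column. The function name, field names, and toy values are hypothetical and are not taken from the study.

```python
from collections import defaultdict

def long_to_wide(records, id_key, wave_key, value_key, waves):
    """Pivot long-format rows into one row per subject, one column per wave.

    Waves a subject did not attend are left as None (i.e. missing),
    which is what an imputation routine would later fill in.
    """
    wide = defaultdict(lambda: {f"{value_key}_w{w}": None for w in waves})
    for row in records:
        wide[row[id_key]][f"{value_key}_w{row[wave_key]}"] = row[value_key]
    return dict(wide)

# Hypothetical toy data: child 2 has no BMI measurement at wave 2.
long_rows = [
    {"child": 1, "wave": 1, "bmi": 16.2},
    {"child": 1, "wave": 2, "bmi": 16.9},
    {"child": 2, "wave": 1, "bmi": 15.4},
]
wide = long_to_wide(long_rows, "child", "wave", "bmi", waves=[1, 2])
```

In the wide layout each wave's BMI is simply another incomplete variable, so no explicit model of the within-child correlation is needed at the imputation stage. The mixed-model extensions mentioned above instead keep the data in long format.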

Method

Using both empirical data and a simulation study based on the six waves of the Longitudinal Study of Australian Children (N = 4661), we evaluated a wide range of MI methods available in standard software packages. The target analysis was the association between child body mass index (BMI) and quality of life, estimated with both a linear regression and a linear mixed-effects model.

Results

We identified and compared 12 MI methods for imputing missing data in longitudinal studies. In simulated data under missing at random (MAR) mechanisms, the generally available MI methods provided less biased estimates with better coverage for the linear regression model, and around half of these methods performed well for estimating the regression parameters of a linear mixed model with a random intercept. In the empirical data, we found an inverse association between child BMI and quality of life, both in the available-data analysis and after multiple imputation.
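For readers unfamiliar with the performance measures, bias and coverage in a simulation study are commonly computed along the following lines. This is a minimal sketch assuming a normal-approximation 95% confidence interval. The abstract does not give the paper's exact definitions, and the function name and toy numbers are hypothetical.

```python
from statistics import mean

def bias_and_coverage(estimates, std_errors, true_value, z=1.96):
    """Summarise simulation replicates for one parameter.

    bias     : average estimate minus the true value
    coverage : share of replicates whose interval
               estimate +/- z * SE contains the true value
    """
    bias = mean(estimates) - true_value
    covered = [
        abs(est - true_value) <= z * se
        for est, se in zip(estimates, std_errors)
    ]
    coverage = sum(covered) / len(covered)
    return bias, coverage

# Hypothetical replicates: the third interval misses the true value of 1.0.
bias, coverage = bias_and_coverage(
    estimates=[0.9, 1.1, 1.5],
    std_errors=[0.1, 0.1, 0.1],
    true_value=1.0,
)
```

A method with small bias and coverage close to the nominal 95% would be judged to perform well under this kind of summary.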

Conclusion

Both FCS-Standard and JM-MVN performed well for the estimation of regression parameters in both analysis models. More complex methods that explicitly reflect the longitudinal structure for these analysis models may only be needed in specific circumstances such as irregularly spaced data.
Metadata
Title
A comparison of multiple imputation methods for missing data in longitudinal studies
Authors
Md Hamidul Huque
John B. Carlin
Julie A. Simpson
Katherine J. Lee
Publication date
01-12-2018
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2018
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/s12874-018-0615-6
