Published in: BMC Medical Research Methodology 1/2014

Open Access 01-12-2014 | Research article

Model development including interactions with multiple imputed data

Authors: Gillian M Hendry, Rajen N Naidoo, Temesgen Zewotir, Delia North, Graciela Mentz

Abstract

Background

Multiple imputation is a reliable tool to deal with missing data and is becoming increasingly popular in biostatistics. However, building a model with interactions that are not specified a priori, in the presence of missing data, presents a challenge. On the one hand, the interactions are needed to impute the data, while on the other hand, the data is needed to identify the interactions. The objective of this study was to present a way in which this challenge can be addressed.

Methods

This paper investigates two strategies in which model development with interactions is achieved using a single data set generated from the Expectation Maximization (EM) algorithm. Imputation using both the fully conditional specification approach and the multivariate normal approach is carried out and results are compared. The strategies are illustrated with data from a study of ambient pollution and childhood asthma in Durban, South Africa.

Results

The different approaches to model building and imputation yielded similar results despite the data being mainly categorical. With the multivariate normal imputed data, both model-building strategies identified the same set of variables and interactions, while models built using data imputed by fully conditional specification differed marginally between the two strategies. For both imputation approaches, model building with backward elimination applied to the initial EM data set was easier to implement and produced good results compared with a complete case analysis.

Conclusions

A predictive model that includes interactions can readily be developed from data with missing values by identifying significant interactions and then applying backward elimination to a single data set imputed with the EM algorithm. It is hoped that this idea can be developed further and that, by addressing this practical dilemma, it will encourage wider adoption of multiple imputation in medical research when data are incomplete.
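
The backward elimination step described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `backward_eliminate`, the `fit_pvalues` callback, the 0.05 threshold, the term names, and the rule that a main effect is retained while an interaction containing it remains in the model are all hypothetical choices that the abstract does not specify. In a real analysis `fit_pvalues` would refit the regression model on the single EM-imputed data set; here a fixed lookup table stands in for that fit.

```python
from typing import Callable, Dict, List, Sequence


def backward_eliminate(
    terms: Sequence[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,
) -> List[str]:
    """Drop the least significant removable term until none exceeds alpha.

    Interactions are written "a:b". As an assumption of this sketch, a main
    effect is not removable while a kept interaction still contains it.
    """
    kept = list(terms)
    while kept:
        pvalues = fit_pvalues(kept)  # refit on the current set of terms
        protected = {part for t in kept if ":" in t for part in t.split(":")}
        candidates = [t for t in kept if t not in protected]
        if not candidates:
            break
        worst = max(candidates, key=lambda t: pvalues[t])
        if pvalues[worst] <= alpha:
            break  # every removable term is significant
        kept.remove(worst)
    return kept


def toy_pvalues(current: List[str]) -> Dict[str, float]:
    """Stand-in for a model fit: returns made-up p-values for illustration."""
    table = {
        "age": 0.30,
        "smoker": 0.20,
        "pollution": 0.01,
        "smoker:pollution": 0.03,
        "age:pollution": 0.40,
    }
    return {t: table[t] for t in current}


selected = backward_eliminate(
    ["age", "smoker", "pollution", "smoker:pollution", "age:pollution"],
    toy_pvalues,
)
print(selected)
```

With these made-up p-values the loop first drops `age:pollution`, then `age`, and stops once the only removable term, `smoker:pollution`, is significant. The `smoker` main effect is kept despite its large p-value because the retained interaction contains it.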
Metadata
Title
Model development including interactions with multiple imputed data
Authors
Gillian M Hendry
Rajen N Naidoo
Temesgen Zewotir
Delia North
Graciela Mentz
Publication date
01-12-2014
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2014
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/1471-2288-14-136
