Published in: BMC Medical Research Methodology 1/2012

Open Access 01-12-2012 | Research article

Multiple imputation of missing covariates with non-linear effects and interactions: an evaluation of statistical methods

Authors: Shaun R Seaman, Jonathan W Bartlett, Ian R White

Abstract

Background

Multiple imputation is often used for missing data. When a model contains as covariates more than one function of a variable, it is not obvious how best to impute missing values in these covariates. Consider a regression with outcome $Y$ and covariates $X$ and $X^2$. In 'passive imputation' a value $X^*$ is imputed for $X$ and then $X^2$ is imputed as $(X^*)^2$. A recent proposal is to treat $X^2$ as 'just another variable' (JAV) and impute $X$ and $X^2$ under multivariate normality.
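
Written out, the two imputation rules differ only in how the value for $X^2$ is produced. The display below is a sketch in our own notation, not the paper's: the coefficients $\hat\alpha$, $\hat\gamma$ and the variances $\hat\sigma^2$, $\hat\Sigma$ are assumed to be fitted to the observed data (a proper imputation would draw them from their posterior instead).

```latex
% Passive imputation: impute X from a linear regression on Y, then square it
X^{*} = \hat\alpha_0 + \hat\alpha_1 Y + \varepsilon^{*},
  \quad \varepsilon^{*} \sim N(0, \hat\sigma^{2}),
  \qquad (X^{2})^{*} = (X^{*})^{2}

% JAV: X^2 is treated as just another variable and imputed jointly with X
\begin{pmatrix} X^{*} \\ (X^{2})^{*} \end{pmatrix} \,\Big|\, Y
  \sim N_{2}\!\left(
    \begin{pmatrix} \hat\gamma_{10} + \hat\gamma_{11} Y \\
                    \hat\gamma_{20} + \hat\gamma_{21} Y \end{pmatrix},
    \hat\Sigma \right),
  \qquad (X^{2})^{*} \neq (X^{*})^{2} \text{ in general}
```

The JAV rule is what multivariate normality of $(Y, X, X^2)$ implies for $(X, X^2)$ given $Y$; in this simple setting, with $Y$ the only other variable, its $X$ component coincides with the passive rule.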

Methods

We use simulation to investigate the performance of three methods that can easily be implemented in standard software: 1) linear regression of $X$ on $Y$ to impute $X$, then passive imputation of $X^2$; 2) the same regression but with predictive mean matching (PMM); and 3) JAV. We also investigate the performance of analogous methods when the analysis involves an interaction, and study the theoretical properties of JAV. Using data from the EPIC Study, we illustrate how the methods apply when complete or incomplete confounders are also present.
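
Method 2 differs from method 1 only in how the missing X is filled in: instead of drawing from the fitted normal regression, predictive mean matching borrows an observed X from a case with a similar predicted mean. The Python below is a minimal sketch under stated assumptions, not the implementation used in the paper: the regression parameters are fixed at their least-squares estimates rather than drawn from their posterior, the donor-pool size `k=5` is an arbitrary choice, and `fit_line` and `pmm_impute` are hypothetical helper names.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares (intercept, slope) for ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def pmm_impute(x, y, k=5, rng=random):
    """Impute missing entries (None) of x by predictive mean matching on y,
    then compute X^2 passively from the completed X.

    Simplified sketch (assumption): uses point estimates of the regression
    of X on Y, with no posterior draw of its parameters.
    """
    obs = [i for i, xi in enumerate(x) if xi is not None]
    a, b = fit_line([y[i] for i in obs], [x[i] for i in obs])  # X on Y
    pred = [a + b * yi for yi in y]  # predicted mean of X for every case
    x_imp = list(x)
    for i, xi in enumerate(x):
        if xi is None:
            # Donor pool: the k observed cases with the closest predicted mean.
            donors = sorted(obs, key=lambda j: abs(pred[j] - pred[i]))[:k]
            x_imp[i] = x[rng.choice(donors)]  # copy a real observed value
    x_sq = [v * v for v in x_imp]  # passive step: square the completed X
    return x_imp, x_sq


# Toy data (hypothetical): Y = X + X^2 + noise, about 30% of X missing.
rng = random.Random(1)
x_true = [rng.gauss(0, 1) for _ in range(200)]
y = [v + v * v + rng.gauss(0, 1) for v in x_true]
x_miss = [None if rng.random() < 0.3 else v for v in x_true]
x_imp, x_sq = pmm_impute(x_miss, y, rng=rng)
```

Because every imputed value is copied from an observed case, PMM never produces an X outside the observed values; the abstract reports that it improved on passive imputation but did not remove the bias.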

Results

JAV gives consistent estimation when the analysis is linear regression with a quadratic or interaction term and X is missing completely at random. When X is missing at random, JAV may be biased, but this bias is generally less than for passive imputation and PMM. Coverage for JAV was usually good when bias was small. However, in some scenarios with a more pronounced quadratic effect, bias was large and coverage poor. When the analysis was logistic regression, JAV's performance was sometimes very poor. PMM generally improved on passive imputation, in terms of bias and coverage, but did not eliminate the bias.
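
One way to see why JAV can be consistent here: the OLS coefficients depend on the data only through the means and covariances of $(Y, X, X^2)$, and under MCAR, imputing $(X, X^2)$ from their linear regression on $Y$ with residuals of matching covariance reproduces those moments in large samples. The Python simulation below is a hypothetical illustration, not the paper's simulation design: $X \sim N(0,1)$, $Y = X + X^2 + e$ with $e \sim N(0,1)$, 50% of X missing completely at random, $n = 20000$, and a single improper imputation per method. It returns the estimated coefficient of $X^2$ (true value 1) after passive imputation and after JAV.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Least-squares (intercept, slope) for ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def ols(rows, y):
    """OLS coefficients of y on the columns of rows, via the normal
    equations solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(aug[k][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for k in range(p):
            if k != c:
                f = aug[k][c] / aug[c][c]
                aug[k] = [u - f * v for u, v in zip(aug[k], aug[c])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def simulate(n=20000, p_miss=0.5, seed=2012):
    """Hypothetical design (not the paper's): return the X^2 coefficient
    after one passive imputation and one JAV imputation of an MCAR X."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + xi * xi + rng.gauss(0, 1) for xi in x]  # Y = X + X^2 + e
    miss = [rng.random() < p_miss for _ in range(n)]  # MCAR: ignores X, Y
    obs = [i for i in range(n) if not miss[i]]
    yo = [y[i] for i in obs]
    xo = [x[i] for i in obs]
    x2o = [v * v for v in xo]

    # Complete-case linear regressions of X and of X^2 on Y.
    a1, b1 = fit_line(yo, xo)
    a2, b2 = fit_line(yo, x2o)
    r1 = [v - a1 - b1 * w for v, w in zip(xo, yo)]
    r2 = [v - a2 - b2 * w for v, w in zip(x2o, yo)]
    s11 = statistics.fmean([e * e for e in r1])
    s12 = statistics.fmean([e * f for e, f in zip(r1, r2)])
    s22 = statistics.fmean([f * f for f in r2])
    # Cholesky factor of the 2x2 residual covariance, for correlated draws.
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 * l21)

    px, px2, jx, jx2 = [], [], [], []
    for i in range(n):
        if not miss[i]:
            for col, val in ((px, x[i]), (px2, x[i] ** 2),
                             (jx, x[i]), (jx2, x[i] ** 2)):
                col.append(val)
            continue
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Both methods share the same draw for X (same regression of X on Y).
        x_star = a1 + b1 * y[i] + l11 * z1
        px.append(x_star)
        px2.append(x_star * x_star)  # passive: square the imputed X
        jx.append(x_star)
        # JAV: X^2 drawn jointly with X, so it is not x_star ** 2.
        jx2.append(a2 + b2 * y[i] + l21 * z1 + l22 * z2)

    def coef_x2(u, u2):
        return ols([[1.0, a, b] for a, b in zip(u, u2)], y)[2]

    return coef_x2(px, px2), coef_x2(jx, jx2)


passive_b2, jav_b2 = simulate()
print(f"passive: {passive_b2:.3f}  JAV: {jav_b2:.3f}  (true value 1)")
```

With these settings the JAV estimate should land close to 1 while the passive estimate is pulled well below it. A single run like this is only an illustration; it does not cover the missing-at-random or logistic-regression settings in which the abstract reports that JAV can be biased or perform poorly.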

Conclusions

Given the current state of available software, JAV is the best of a set of imperfect imputation methods for linear regression with a quadratic or interaction effect, but should not be used for logistic regression.
Metadata
Publisher: BioMed Central
Electronic ISSN: 1471-2288
DOI: https://doi.org/10.1186/1471-2288-12-46
