Published in: BMC Medical Research Methodology 1/2014

Open Access 01-12-2014 | Research article

Comparison of methods for imputing limited-range variables: a simulation study

Authors: Laura Rodwell, Katherine J Lee, Helena Romaniuk, John B Carlin


Abstract

Background

Multiple imputation (MI) was developed as a method for obtaining valid inferences in the presence of missing data, not for re-creating the missing values. In applied settings, it remains unclear how important it is for imputed values to be plausible for individual observations. One type of variable for which MI may produce implausible values is a limited-range variable, for which imputed values may fall outside the observable range. The aim of this work was to compare methods for imputing limited-range variables, focusing on those that restrict the range of the imputed values.
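As a rough illustration of the problem, the sketch below asks how much probability a single normal model places outside the observable range of a bounded, positively skewed score. It is a marginal simplification (a regression imputation model would condition on covariates), and the mean of 2.0, standard deviation of 2.5 and range of 0 to 12 are hypothetical values, not figures from this study.

```python
from statistics import NormalDist


def out_of_range_mass(mean, sd, lower, upper):
    """Return (P(X < lower), P(X > upper)) for X ~ N(mean, sd**2).

    These are the chances that a single unrestricted normal draw lands
    outside the observable range [lower, upper].
    """
    dist = NormalDist(mean, sd)
    below = dist.cdf(lower)
    above = 1.0 - dist.cdf(upper)
    return below, above


# Hypothetical summary statistics for a positively skewed score bounded on [0, 12]
below, above = out_of_range_mass(mean=2.0, sd=2.5, lower=0.0, upper=12.0)
print(f"P(imputed < 0)  = {below:.3f}")
print(f"P(imputed > 12) = {above:.6f}")
```

With these inputs about one draw in five (Φ(-0.8) ≈ 0.21) falls below zero, whereas draws above 12 are rare: the positive skew pushes the problem against the lower bound.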

Methods

Using data from a study of adolescent health, we considered three variables based on responses to the General Health Questionnaire (GHQ), a tool for detecting minor psychiatric illness. These variables, derived from different scoring methods for the GHQ, yielded three continuous distributions with mild, moderate and severe positive skewness. In an otherwise complete dataset, we set 33% of the GHQ observations to missing, either completely at random (MCAR) or at random (MAR), repeating this process to create 1000 datasets with incomplete data for each scenario.
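The abstract does not describe how the missingness was generated, so the following is only a sketch of one way to delete exactly 33% of values under MCAR or MAR. The function name `make_missing`, the seeded generator, the simulated scores and outcome, and the weight of 3 for records with a positive outcome are all hypothetical. The MAR rule depends only on the fully observed binary outcome, which is what makes it MAR rather than missing not at random.

```python
import random


def make_missing(values, weights, prop=0.33, rng=None):
    """Return a copy of `values` with round(prop * len(values)) entries set to None.

    Entries are chosen by weighted sampling without replacement using
    Efraimidis-Spirakis keys u ** (1 / w): equal weights give missing
    completely at random (MCAR); weights that depend only on fully
    observed variables give missing at random (MAR).
    """
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    rng = rng or random.Random()
    n = len(values)
    k = round(prop * n)
    keys = [rng.random() ** (1.0 / w) for w in weights]
    # The k records with the largest keys become missing
    chosen = set(sorted(range(n), key=lambda i: keys[i], reverse=True)[:k])
    return [None if i in chosen else v for i, v in enumerate(values)]


rng = random.Random(2014)
n = 1000
ghq = [rng.randint(0, 12) for _ in range(n)]        # hypothetical complete scores
outcome = [rng.random() < 0.2 for _ in range(n)]    # hypothetical binary outcome

mcar = make_missing(ghq, [1.0] * n, rng=rng)
# Hypothetical MAR rule: records with outcome == 1 are weighted three times as heavily
mar = make_missing(ghq, [3.0 if y else 1.0 for y in outcome], rng=rng)
```

Calling `make_missing` once per replicate with the same generator would produce a fresh incomplete copy each time, as needed for the 1000 datasets per scenario.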
For each dataset, we imputed the missing values on the raw scale and after a zero-skewness log transformation, using four methods: univariate regression with no rounding, post-imputation rounding, truncated normal regression, and predictive mean matching. We estimated the marginal mean of the GHQ and the association between the GHQ and a fully observed binary outcome, and compared the results with the complete-data estimates.
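A zero-skewness log transformation replaces x with log(x - k), choosing the shift k below the smallest observation so that the transformed values have zero sample skewness. Stata's `lnskew0` command works in this spirit, but the abstract does not say which implementation was used. The bisection below is an assumed, minimal version: the function names, tolerance and bracketing strategy are hypothetical, and it gives up if the skewness of log(x - k) does not change sign (which can happen, for example, with heavy ties at the minimum). Values imputed on the log scale are mapped back with exp(y) + k.

```python
import math


def skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2 ** 1.5."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def zero_skew_log_shift(xs, tol=1e-10, max_iter=200):
    """Find k < min(xs) such that log(x - k) has (approximately) zero skewness.

    Bisection on k; assumes xs is positively skewed, not constant, and that
    the skewness of log(x - k) changes sign on (-inf, min(xs)).
    """
    if skewness(xs) <= 0:
        raise ValueError("expected positively skewed data")
    lo_x = min(xs)
    spread = max(xs) - lo_x

    def g(k):
        return skewness([math.log(x - k) for x in xs])

    k_near = lo_x - 1e-9 * spread   # just below min(xs): log stretches the minimum far down
    k_far = lo_x - spread           # well below min(xs): log is closer to linear
    if g(k_near) >= 0:
        raise ValueError("no sign change near min(xs); transformation not found")
    for _ in range(60):             # push k_far further out until skewness is positive
        if g(k_far) > 0:
            break
        k_far = lo_x - 2 * (lo_x - k_far)
    else:
        raise ValueError("no sign change found far from min(xs)")

    for _ in range(max_iter):
        k_mid = 0.5 * (k_near + k_far)
        if g(k_mid) > 0:
            k_far = k_mid
        else:
            k_near = k_mid
        if k_near - k_far < tol * spread:
            break
    return 0.5 * (k_near + k_far)


def back_transform(y, k):
    """Map a value imputed on the log scale back to the original scale."""
    return math.exp(y) + k


z = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]   # symmetric values
xs = [math.exp(v) + 5.0 for v in z]          # shifted, positively skewed data
k = zero_skew_log_shift(xs)
print(f"k = {k:.6f}")
```

Because the example data are exp(z) + 5 for a symmetric z, the recovered shift should be very close to 5.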

Results

Imputation with no rounding performed well when applied to data on the raw scale. Post-imputation rounding and imputation using truncated normal regression produced higher marginal means than the complete-data estimate when the data had a moderate or severe skew, and this was associated with under-coverage of the complete-data estimate. Predictive mean matching also produced under-coverage of the complete-data estimate. For the association, all methods produced estimates similar to those from the complete data.
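One plausible way to see why range-restricting methods can raise the marginal mean: when predicted values sit near the lower bound, rounding moves every negative draw up to the bound, and truncated normal regression spreads the excluded lower tail over the allowed range, so both shift mass upwards. The sketch below is a simplified, assumed version of the four methods, not the authors' code. The regression mean `mu` and residual SD `sigma` are held fixed (proper multiple imputation would also draw these parameters), "rounding" is taken to mean moving out-of-range values to the nearest bound, and all function names and inputs are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def clip(value, lower, upper):
    """Post-imputation rounding: move values outside [lower, upper] to the nearest bound."""
    return min(max(value, lower), upper)


def draw_unrestricted(mu, sigma, rng):
    """No rounding: a plain normal draw, which may fall outside the valid range."""
    return rng.gauss(mu, sigma)


def draw_truncated(mu, sigma, lower, upper, rng):
    """Inverse-CDF draw from N(mu, sigma**2) truncated to [lower, upper]."""
    dist = NormalDist(mu, sigma)
    p_lo, p_hi = dist.cdf(lower), dist.cdf(upper)
    u = p_lo + (p_hi - p_lo) * rng.random()
    u = min(max(u, 1e-12), 1 - 1e-12)           # inv_cdf needs 0 < u < 1
    return clip(dist.inv_cdf(u), lower, upper)  # guards against floating-point edge cases


def draw_pmm(mu, donors, k, rng):
    """Predictive mean matching: donors holds (predicted_mean, observed_value)
    pairs for complete cases; return the observed value of a random one of
    the k donors whose predicted mean is closest to mu."""
    nearest = sorted(donors, key=lambda d: abs(d[0] - mu))[:k]
    return rng.choice(nearest)[1]


rng = random.Random(1)
lower, upper, sigma = 0.0, 12.0, 2.5
mus = [1.0] * 5000   # hypothetical predicted means close to the lower bound

raw = [draw_unrestricted(m, sigma, rng) for m in mus]
rounded = [clip(v, lower, upper) for v in raw]
truncated = [draw_truncated(m, sigma, lower, upper, rng) for m in mus]
print(f"no rounding: {fmean(raw):.2f}  rounding: {fmean(rounded):.2f}  "
      f"truncated: {fmean(truncated):.2f}")

donors = [(0.5, 0), (1.1, 2), (1.4, 3), (3.0, 6), (5.0, 9)]  # hypothetical complete cases
print("PMM draw:", draw_pmm(1.0, donors, k=3, rng=rng))
```

For N(1, 2.5²) draws on [0, 12] the theoretical means are about 1.00 (no rounding), 1.58 (rounding) and 2.40 (truncation), so the printed simulated means should land close to these. The PMM draw is always an observed value, which is why it cannot leave the range.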

Conclusions

For data with a limited range, multiple imputation using techniques that restrict the range of imputed values can result in biased estimates for the marginal mean when data are highly skewed.
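Bias and coverage of the kind reported in the Results are usually assessed by pooling the m imputed analyses of each incomplete dataset with Rubin's rules and comparing the pooled result with the complete-data estimate. The sketch below assumes the single complete-data estimate is the target, uses a normal 1.96 multiplier instead of Rubin's t reference distribution, and uses hypothetical estimates; the function names are illustrative.

```python
import math
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool m completed-data estimates and their squared standard errors (Rubin's rules)."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    return q_bar, w + (1 + 1 / m) * b   # (estimate, total variance)


def covers(q_bar, total_var, target, z=1.96):
    """True if q_bar +/- z * sqrt(total_var) contains target.

    A normal multiplier is a simplification; Rubin's rules proper use a
    t reference distribution with estimated degrees of freedom.
    """
    half_width = z * math.sqrt(total_var)
    return q_bar - half_width <= target <= q_bar + half_width


def bias_and_coverage(pooled, complete_estimate):
    """Summarise (q_bar, total_var) pairs from simulated datasets against the
    single complete-data estimate they are meant to recover."""
    bias = mean([q for q, _ in pooled]) - complete_estimate
    coverage = mean([1.0 if covers(q, t, complete_estimate) else 0.0 for q, t in pooled])
    return bias, coverage


# Hypothetical marginal-mean estimates from m = 5 imputations of one dataset
q_bar, total_var = rubin_pool([2.1, 2.3, 2.2, 2.4, 2.0], [0.01] * 5)
print(q_bar, total_var, covers(q_bar, total_var, target=2.0))
```

For these five hypothetical estimates the pooled mean is 2.2 with total variance 0.01 + (1 + 1/5) × 0.025 = 0.04, so the interval 2.2 ± 0.392 covers a complete-data value of 2.0. Systematically raised pooled means, as with rounding or truncation under strong skew, would show up as positive bias and coverage below the nominal 95%.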
Metadata
Title: Comparison of methods for imputing limited-range variables: a simulation study
Authors: Laura Rodwell, Katherine J Lee, Helena Romaniuk, John B Carlin
Publication date: 01-12-2014
Publisher: BioMed Central
Published in: BMC Medical Research Methodology / Issue 1/2014
Electronic ISSN: 1471-2288
DOI: https://doi.org/10.1186/1471-2288-14-57
