Published in: BMC Medical Research Methodology 1/2010

Open Access 01-12-2010 | Research article

Comparison of imputation methods for handling missing covariate data when fitting a Cox proportional hazards model: a resampling study

Authors: Andrea Marshall, Douglas G Altman, Roger L Holder

Abstract

Background

The appropriate handling of missing covariate data in prognostic modelling studies is yet to be conclusively determined. A resampling study was performed to investigate the effects of different missing data methods on the performance of a prognostic model.

Methods

To obtain 500 replications, observed data for 1000 cases were repeatedly sampled with replacement from a large complete dataset of 7507 patients. Five levels of missingness (ranging from 5% to 75%) were imposed on three covariates using a missing at random (MAR) mechanism. Five missing data methods were applied: a) complete case analysis (CC), b) single imputation using regression switching with predictive mean matching (SI), c) multiple imputation (MI) using regression switching imputation, d) multiple imputation using regression switching with predictive mean matching (MICE-PMM) and e) multiple imputation using flexible additive imputation models. A Cox proportional hazards model was fitted to each dataset, and estimates of the regression coefficients and model performance measures were obtained.
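The resampling and missingness steps described above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the synthetic two-column "population", the function names `resample_cases` and `impose_mar`, the logistic form of the MAR mechanism and its slope of 1.0 are all assumptions made for the example.

```python
import numpy as np

def resample_cases(n_population, n_cases, n_replications, rng):
    """Draw row indices with replacement: one row of indices per replication."""
    return rng.integers(0, n_population, size=(n_replications, n_cases))

def impose_mar(x_target, x_observed, target_rate, rng):
    """Return a copy of x_target with values set to NaN under a MAR mechanism.

    The probability of being missing depends only on the fully observed
    covariate x_observed (assumed logistic form, slope 1.0 on the
    standardised covariate). The intercept is found by bisection so that
    the average missingness probability equals target_rate.
    """
    z = (x_observed - x_observed.mean()) / x_observed.std()
    lo, hi = -20.0, 20.0
    for _ in range(60):
        intercept = (lo + hi) / 2.0
        p = 1.0 / (1.0 + np.exp(-(intercept + z)))
        if p.mean() < target_rate:
            lo = intercept
        else:
            hi = intercept
    intercept = (lo + hi) / 2.0
    p = 1.0 / (1.0 + np.exp(-(intercept + z)))
    missing = rng.random(x_target.shape[0]) < p
    out = x_target.astype(float).copy()
    out[missing] = np.nan
    return out

rng = np.random.default_rng(0)
# Synthetic stand-in for the complete dataset of 7507 patients (hypothetical data).
population = rng.normal(size=(7507, 2))
# 500 replications of 1000 cases, sampled with replacement.
idx = resample_cases(7507, 1000, 500, rng)
sample = population[idx[0]]
# Impose roughly 25% MAR missingness on column 0, driven by column 1.
with_missing = impose_mar(sample[:, 0], sample[:, 1], 0.25, rng)
```

Each replication would then be passed through the missing data method under study before the Cox model is fitted; those steps are omitted here.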

Results

CC produced biased regression coefficient estimates and inflated standard errors (SEs) with 25% or more missingness. The underestimated SE after SI resulted in poor coverage with 25% or more missingness. Of the MI approaches investigated, MI using MICE-PMM produced the least biased estimates and better model performance measures. However, this MI approach still produced biased regression coefficient estimates with 75% missingness.
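The link between underestimated SEs and poor coverage can be illustrated with a small sketch. It assumes that bias is measured as the mean estimate minus a reference value and that coverage is the proportion of normal-approximation 95% confidence intervals containing that reference value; the function name `bias_and_coverage` and the toy numbers are hypothetical and are not results from the study.

```python
import numpy as np

def bias_and_coverage(estimates, std_errors, reference, z=1.96):
    """Bias and confidence-interval coverage across replications.

    estimates, std_errors: one coefficient estimate and SE per replication.
    reference: the value treated as "true" (an assumption for this sketch).
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    bias = estimates.mean() - reference
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors
    coverage = np.mean((lower <= reference) & (reference <= upper))
    return bias, coverage

# Toy estimates from four hypothetical replications.
estimates = [0.9, 1.0, 1.1, 1.6]
bias, coverage = bias_and_coverage(estimates, [0.10] * 4, reference=1.0)
# Same estimates, but with SEs that are half as large.
_, coverage_small_se = bias_and_coverage(estimates, [0.05] * 4, reference=1.0)
```

In this toy example the bias is unchanged by the SEs, but halving them narrows every interval, so coverage falls from 0.75 to 0.25.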

Conclusions

Very few differences were seen between the results from all missing data approaches with 5% missingness. However, performing MI using MICE-PMM may be the preferred missing data approach for handling between 10% and 50% MAR missingness.
Metadata
Title
Comparison of imputation methods for handling missing covariate data when fitting a Cox proportional hazards model: a resampling study
Authors
Andrea Marshall
Douglas G Altman
Roger L Holder
Publication date
01-12-2010
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2010
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/1471-2288-10-112
