Published in: BMC Medical Research Methodology 1/2016

Open Access 01-12-2016 | Research Article

Assessment of predictive performance in incomplete data by combining internal validation and multiple imputation

Authors: Simone Wahl, Anne-Laure Boulesteix, Astrid Zierer, Barbara Thorand, Mark A. van de Wiel

Abstract

Background

Missing values are a frequent issue in human studies. In many situations, multiple imputation (MI) is an appropriate missing data handling strategy, whereby missing values are imputed multiple times, the analysis is performed in every imputed data set, and the obtained estimates are pooled. If the aim is to estimate (added) predictive performance measures, such as (change in) the area under the receiver-operating characteristic curve (AUC), internal validation strategies become desirable in order to correct for optimism. It is not fully understood how internal validation should be combined with multiple imputation.
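The pooling step can be illustrated with Rubin's rules, the standard way to combine estimates across imputed data sets. The sketch below is a minimal illustration and not necessarily the pooling used in the paper; the function name `pool_rubin` and the example numbers are assumptions made here.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool M per-imputation estimates with Rubin's rules.

    `pool_rubin` is a hypothetical helper name used for illustration.
    estimates: point estimates, one per imputed data set
    variances: their within-imputation variances, same order
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)       # pooled point estimate
    within = mean(variances)      # average within-imputation variance
    between = variance(estimates) # between-imputation variance (M - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Made-up AUC estimates from three imputed data sets (illustrative values only).
pooled, total_var = pool_rubin([0.70, 0.72, 0.74], [0.001, 0.001, 0.001])
```

The between-imputation term is what distinguishes this from simply averaging: it carries the extra uncertainty due to the missing values into the pooled variance.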

Methods

In a comprehensive simulation study and in a real data set based on blood markers as predictors for mortality, we compare three combination strategies: Val-MI (internal validation followed by MI on the training and test parts separately), MI-Val (MI on the full data set followed by internal validation), and MI(-y)-Val (MI on the full data set omitting the outcome, followed by internal validation). Different validation strategies, including bootstrap and cross-validation, different (added) performance measures, and various data characteristics are considered, and the strategies are evaluated with regard to bias and mean squared error of the obtained performance estimates. In addition, we elaborate on the number of resamples and imputations to be used, and adapt a strategy for confidence interval construction to incomplete data.
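The three strategies differ only in the order of imputation and data splitting, and in whether the outcome enters the imputation. The sketch below shows that ordering for a single bootstrap split, under loud simplifying assumptions: a deterministic mean fill stands in for true multiple imputation, the "model" is a single predictor with a fitted direction, and every function name (`impute`, `fit`, `auc`, `split`, `val_mi`, `mi_val`, `simulate`) is hypothetical. Using the outcome when imputing within Val-MI is also an assumption here.

```python
import random
from statistics import mean


def impute(rows, use_outcome):
    """Toy stand-in for MI: fill missing x with a mean (class-specific if use_outcome)."""
    def fill_value(y):
        pool = [x for x, yy in rows if x is not None and (not use_outcome or yy == y)]
        return mean(pool)
    return [(x if x is not None else fill_value(y), y) for x, y in rows]


def fit(train):
    """Toy model: the direction (+1 or -1) in which x separates the classes."""
    m1 = mean(x for x, y in train if y == 1)
    m0 = mean(x for x, y in train if y == 0)
    return 1.0 if m1 >= m0 else -1.0


def auc(sign, test):
    """AUC as the share of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [sign * x for x, y in test if y == 1]
    neg = [sign * x for x, y in test if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def split(rows, rng):
    """Bootstrap sample as training part, out-of-bag rows as test part."""
    idx = [rng.randrange(len(rows)) for _ in rows]
    drawn = set(idx)
    train = [rows[i] for i in idx]
    test = [rows[i] for i in range(len(rows)) if i not in drawn]
    return train, test


def val_mi(rows, rng, use_outcome=True):
    """Val-MI: split first, then impute training and test parts separately."""
    train, test = split(rows, rng)
    train, test = impute(train, use_outcome), impute(test, use_outcome)
    return auc(fit(train), test)


def mi_val(rows, rng, use_outcome=True):
    """MI-Val: impute the full data first, then split (use_outcome=False gives MI(-y)-Val)."""
    completed = impute(rows, use_outcome)
    train, test = split(completed, rng)
    return auc(fit(train), test)


def simulate(n, rng, p_missing=0.3):
    """Made-up data: x ~ N(y, 1), with x set to None (missing) at random."""
    rows = []
    for i in range(n):
        y = i % 2
        x = rng.gauss(y, 1.0)
        rows.append((None if rng.random() < p_missing else x, y))
    return rows
```

In MI-Val the test rows have already influenced the imputed training values (and vice versa) before the split, which is the kind of leakage that Val-MI avoids by imputing after the split. In practice each function would be run over many resamples and several imputations and the results averaged.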

Results

Internal validation is essential in order to avoid optimism, with the bootstrap 0.632+ estimate representing a reliable method to correct for optimism. While estimates obtained by MI-Val are optimistically biased, those obtained by MI(-y)-Val tend to be pessimistic in the presence of a true underlying effect. Val-MI provides largely unbiased estimates, with a slight pessimistic bias as the true effect size and the number of covariates increase and the sample size decreases. In Val-MI, accuracy of the estimate is improved more by increasing the number of bootstrap draws than by increasing the number of imputations. With a simple integrated approach, valid confidence intervals for performance estimates can be obtained.
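For orientation, the 0.632+ estimate combines the apparent (training) error with the out-of-bag bootstrap error, weighting the latter more heavily the more the model overfits. The sketch below follows the error-rate form of Efron and Tibshirani (1997); how the paper adapts it to measures such as the AUC is not shown in this abstract. The function name `bootstrap_632plus` and its argument names are assumptions made here.

```python
def bootstrap_632plus(err_apparent, err_oob, err_noinfo):
    """0.632+ estimate in its error-rate form (hypothetical helper name).

    err_apparent: error of the model on its own training data
    err_oob:      average error on out-of-bag observations over bootstrap draws
    err_noinfo:   error expected if predictors and outcome were unrelated
    """
    # The out-of-bag error is capped at the no-information error.
    err_oob_c = min(err_oob, err_noinfo)
    # Relative overfitting rate R in [0, 1]; set to 0 when there is no overfitting.
    if err_oob_c > err_apparent and err_noinfo > err_apparent:
        r = (err_oob_c - err_apparent) / (err_noinfo - err_apparent)
    else:
        r = 0.0
    # The weight grows from 0.632 (no overfitting) towards 1 (complete overfitting).
    w = 0.632 / (1 - 0.368 * r)
    return (1 - w) * err_apparent + w * err_oob_c


# Illustrative numbers only: apparent error 0.10, out-of-bag error 0.30.
estimate = bootstrap_632plus(0.10, 0.30, 0.50)
```

With no overfitting the result reduces to the plain 0.632 estimate, 0.368 times the apparent error plus 0.632 times the out-of-bag error.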

Conclusions

When prognostic models are developed on incomplete data, Val-MI represents a valid strategy to obtain estimates of predictive performance measures.
Metadata
Title
Assessment of predictive performance in incomplete data by combining internal validation and multiple imputation
Authors
Simone Wahl
Anne-Laure Boulesteix
Astrid Zierer
Barbara Thorand
Mark A. van de Wiel
Publication date
01-12-2016
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2016
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/s12874-016-0239-7