Published in: BMC Medical Research Methodology 1/2017

Open Access 01-12-2017 | Research article

Clinical prediction in defined populations: a simulation study investigating when and how to aggregate existing models

Authors: Glen P. Martin, Mamas A. Mamas, Niels Peek, Iain Buchan, Matthew Sperrin

Abstract

Background

Clinical prediction models (CPMs) are increasingly deployed to support healthcare decisions, but they are derived inconsistently, in part because of limited data. An emerging alternative is to aggregate existing CPMs developed for similar settings and outcomes. This simulation study investigated how between-population heterogeneity and sample size affect the aggregation of existing CPMs in a defined population, compared with developing a model de novo.

Methods

Simulations were designed to mimic a scenario in which multiple CPMs for a binary outcome had been derived in distinct, heterogeneous populations, with potentially different predictors available in each. We then generated a new ‘local’ population and compared the performance of CPMs developed for this population by aggregation, using stacked regression, principal component analysis or partial least squares, with redevelopment from scratch using backwards selection and penalised regression.
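
As a rough illustration of the aggregation idea, the Python sketch below applies stacked regression to a simulated local dataset: the linear predictors of three existing CPMs become the covariates of a logistic regression fitted to the local outcomes. All numbers (coefficients, sample size, seed) and the helper `fit_logistic` are hypothetical and are not taken from the study. The fit is unconstrained for brevity, whereas Breiman's stacked regression constrains the weights to be non-negative.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=50):
    """Maximum-likelihood logistic regression by Newton-Raphson.
    X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

rng = np.random.default_rng(0)
n_local, n_pred = 300, 5  # assumed local sample size and predictor count

# Hypothetical existing CPMs as (intercept, coefficients).
# A zero coefficient means that predictor was not in that model.
existing_cpms = [
    (-1.0, np.array([0.8, 0.5, 0.0, 0.3, 0.0])),
    (-1.2, np.array([0.6, 0.0, 0.4, 0.0, 0.2])),
    (-0.8, np.array([0.9, 0.4, 0.3, 0.0, 0.0])),
]

# Simulated local population with its own (assumed) true model.
true_beta = np.array([0.7, 0.4, 0.3, 0.2, 0.1])
X_local = rng.normal(size=(n_local, n_pred))
y_local = rng.binomial(1, expit(-1.0 + X_local @ true_beta))

# Linear predictor of each existing CPM, evaluated in the local data.
lp = np.column_stack([a + X_local @ b for a, b in existing_cpms])

# Stacking step: regress the local outcome on the existing linear predictors.
design = np.column_stack([np.ones(n_local), lp])
weights = fit_logistic(design, y_local)  # intercept plus one weight per CPM
stacked_risk = expit(design @ weights)
```

Only four parameters are estimated from the local data here, however many predictors the existing models contain, which is one reason aggregation may be attractive when local data are sparse.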

Results

While redevelopment approaches resulted in models that were miscalibrated for local datasets of fewer than 500 observations, model aggregation methods were well calibrated across all simulation scenarios. When the local data comprised fewer than 1000 observations and between-population heterogeneity was small, aggregating existing CPMs gave better discrimination and the lowest mean square error in the predicted risks compared with deriving a new model. Conversely, with more than 1000 observations and significant between-population heterogeneity, redevelopment outperformed the aggregation approaches. In all other scenarios, aggregation and de novo derivation resulted in similar predictive performance.
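
The three kinds of performance measure named above can be sketched as follows. The abstract does not give the exact estimators, so this is an assumption: calibration is summarised by the calibration intercept and slope, discrimination by the area under the ROC curve (AUC), and mean square error is taken between predicted and true simulated risks. All function names and simulation settings are hypothetical.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    return np.log(p / (1 - p))

def fit_logistic(X, y, offset=None, n_iter=50):
    """Newton-Raphson logistic regression with an optional fixed offset."""
    offset = np.zeros(len(y)) if offset is None else offset
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta + offset)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def calibration_intercept(y, risk):
    # Intercept of a logistic model with logit(risk) as offset (ideal: 0).
    return fit_logistic(np.ones((len(y), 1)), y, offset=logit(risk))[0]

def calibration_slope(y, risk):
    # Slope of a logistic regression of y on logit(risk) (ideal: 1).
    X = np.column_stack([np.ones(len(y)), logit(risk)])
    return fit_logistic(X, y)[1]

def auc(y, risk):
    # Probability that a random event has a higher risk than a random non-event.
    diff = risk[y == 1][:, None] - risk[y == 0][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

rng = np.random.default_rng(1)
lp_true = -1.0 + 0.8 * rng.normal(size=3000)
true_risk = expit(lp_true)
y = rng.binomial(1, true_risk)

risk_ok = true_risk               # a perfectly specified model
risk_over = expit(2.0 * lp_true)  # predictions that are too extreme

slope_ok, slope_over = calibration_slope(y, risk_ok), calibration_slope(y, risk_over)
auc_ok, auc_over = auc(y, risk_ok), auc(y, risk_over)
mse_ok = np.mean((risk_ok - true_risk) ** 2)
mse_over = np.mean((risk_over - true_risk) ** 2)
```

Doubling the linear predictor halves the calibration slope and raises the mean square error but leaves the AUC unchanged, which illustrates why calibration and discrimination are reported separately.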

Conclusion

This study demonstrates a pragmatic approach to contextualising CPMs to defined populations. When aiming to develop models in defined populations, modellers should consider existing CPMs, with aggregation approaches being a suitable modelling strategy particularly with sparse data on the local population.
Metadata
Title
Clinical prediction in defined populations: a simulation study investigating when and how to aggregate existing models
Authors
Glen P. Martin
Mamas A. Mamas
Niels Peek
Iain Buchan
Matthew Sperrin
Publication date
01-12-2017
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2017
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/s12874-016-0277-1
