
01-05-2018 | METHODS

Stacked generalization: an introduction to super learning

Published in: European Journal of Epidemiology | Issue 5/2018

Abstract

Stacked generalization is an ensemble method that allows researchers to combine several different prediction algorithms into one. Since its introduction in the early 1990s, the method has been refined several times, giving rise to a family of related approaches that includes the “Super Learner”. Super Learner uses V-fold cross-validation to build the optimal weighted combination of predictions from a library of candidate algorithms. Optimality is defined by a user-specified objective function, such as minimizing mean squared error or maximizing the area under the receiver operating characteristic curve. Although the method is relatively simple in nature, its use by epidemiologists has been hampered by limited understanding of its conceptual and technical details. We work step-by-step through two examples to illustrate the concepts and address common concerns.
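
To make the procedure concrete, the short sketch below illustrates the idea in Python on simulated data. The candidate library (a mean-only model, ordinary least squares, and a random forest), the squared-error objective, and the non-negative least squares metalearner are illustrative assumptions chosen for this sketch; they are not the authors' implementation.

# Minimal sketch of the super learner procedure described above:
# V-fold cross-validated predictions from a library of candidate algorithms
# are combined with non-negative weights chosen to minimize squared error.
# The data, candidate library, and metalearner here are illustrative assumptions.
import numpy as np
from scipy.optimize import nnls
from sklearn.model_selection import KFold
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X[:, 0] - 2 * X[:, 1] ** 2 + rng.normal(size=500)

library = {
    "mean":   DummyRegressor(strategy="mean"),
    "ols":    LinearRegression(),
    "forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

# Step 1: obtain out-of-fold (cross-validated) predictions for each candidate.
V = 10
Z = np.zeros((len(y), len(library)))              # "level-one" prediction matrix
for train, test in KFold(n_splits=V, shuffle=True, random_state=0).split(X):
    for j, algo in enumerate(library.values()):
        Z[test, j] = algo.fit(X[train], y[train]).predict(X[test])

# Step 2: choose weights that minimize cross-validated mean squared error,
# constrained to be non-negative, then rescale to a convex combination.
weights, _ = nnls(Z, y)
weights = weights / weights.sum()
print(dict(zip(library, np.round(weights, 3))))

# Step 3: refit each candidate on all of the data; the super learner's
# prediction is the weighted combination of the refitted candidates.
fits = [algo.fit(X, y) for algo in library.values()]

def super_learner_predict(X_new):
    return np.column_stack([f.predict(X_new) for f in fits]) @ weights

Changing the objective function (for example, maximizing the area under the receiver operating characteristic curve for a binary outcome) amounts to swapping out the metalearner in step 2; the cross-validation structure in step 1 is unchanged.
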
Metadata
Print ISSN: 0393-2990
Electronic ISSN: 1573-7284
DOI
https://doi.org/10.1007/s10654-018-0390-z
