Open Access 01-12-2024 | Research

Predicting the time to get back to work using statistical models and machine learning approaches

Authors: George Bouliotis, M. Underwood, R. Froud

Published in: BMC Medical Research Methodology | Issue 1/2024

Abstract

Background

Whether machine learning approaches are superior to classical statistical models for survival analyses, particularly when the proportional hazards assumption does not hold, is unknown.

Objectives

To compare the model performance and predictive accuracy of classical regression models and machine learning approaches using data from the Inspiring Families programme.

Methods

The Inspiring Families programme aims to support members of families with complex issues to return to work. We explored predictors of time to return to work using two proportional hazards models fitted in Stata, the semi-parametric Cox model and the flexible parametric Parmar-Royston model. We compared these against three machine learning approaches: penalised survival regression with an Elastic Net penalty (scikit-survival), the conditional Survival Forest algorithm (pySurvival), and the kernel Survival Support Vector Machine (pySurvival).
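For orientation, the two model families can be written in their standard textbook form. This is a sketch under stated assumptions, not the authors' exact specification: the symbols below (baseline hazard h_0, coefficient vector β, penalty strength λ, mixing parameter α) are conventional notation that the abstract does not define, and the scaling of the likelihood term varies between implementations.

```latex
% Cox proportional hazards model for participant i with covariates x_i
h(t \mid x_i) = h_0(t)\,\exp\!\left(x_i^{\top}\beta\right)

% Elastic Net penalised estimate, with \ell(\beta) the Cox log partial likelihood
\hat{\beta} = \arg\max_{\beta}\;
  \left[\, \ell(\beta)
  - \lambda\left( \alpha\,\lVert\beta\rVert_1
  + \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 \right) \right]
```

Setting α = 1 gives the lasso penalty and α = 0 gives the ridge penalty. The hazard ratio between any two participants is constant over time in this formulation, which is the proportionality assumption referred to in the Background.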

Results

At baseline we obtained data on 61 binary variables from all 3161 participants. No model appeared superior, and all showed low predictive power (concordance index between 0.51 and 0.61). The median time to finding the first job was about 254 days. The top five contributing variables were ‘family issues and additional barriers’, ‘restriction of hours’, ‘available CV’, ‘self-employment considered’ and ‘education’. Harrell’s concordance index ranged from 0.60 (Cox model) to 0.71 (Random Survival Forest), suggesting a better fit for the machine learning approaches. However, a comparison of predicted median time for a selected scenario showed only minor differences.
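To make the concordance figures concrete, the following is a minimal sketch of how Harrell's concordance index can be computed for right-censored data. It is an illustration only, not the authors' code: the function name `harrell_c_index`, the toy data, and the choice to skip pairs with tied times are assumptions made here, and library implementations handle ties in more detail.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Simplified Harrell's C for right-censored data (illustrative sketch).

    times       -- observed follow-up time per participant
    events      -- 1 if the event (e.g. return to work) was observed, 0 if censored
    risk_scores -- model output; a higher score means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # Let `a` be the participant with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # model ranks the earlier event as higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: days to first job, event indicator, model risk score.
days = [100, 254, 300, 400]
observed = [1, 1, 0, 1]
scores = [0.9, 0.4, 0.6, 0.1]
c_index = harrell_c_index(days, observed, scores)
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, so values between 0.5 and 0.7 indicate modest discrimination.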

Conclusion

Fitting a series of survival models with and without the proportional hazards assumption provides useful insight and aids interpretation of coefficients affected by non-linearities. However, the better fit of the machine learning approaches does not translate into substantially higher predictive power or accuracy. Further tuning of the machine learning algorithms may improve results.
Metadata
Title
Predicting the time to get back to work using statistical models and machine learning approaches
Authors
George Bouliotis
M. Underwood
R. Froud
Publication date
01-12-2024
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2024
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/s12874-024-02390-4