Open Access 18-03-2024 | COVID-19 | Case Study

A machine learning approach for diagnostic and prognostic predictions, key risk factors and interactions

Authors: Murtaza Nasir, Nichalin S. Summerfield, Stephanie Carreiro, Dan Berlowitz, Asil Oztekin

Published in: Health Services and Outcomes Research Methodology

Abstract

Machine learning (ML) has the potential to revolutionize healthcare by helping providers improve patient-care planning, resource planning, and resource utilization. Identifying key risk factors and interaction effects can also help service providers and decision-makers institute better policies and procedures. This study used COVID-19 electronic health record (EHR) data to predict five crucial outcomes: positive test, ventilation, death, hospitalization days, and ICU days. Our models achieved strong predictive performance, with area under the curve (AUC) values of 91.6%, 99.1%, and 97.5% for the first three outcomes and mean absolute errors (MAE) of 0.752 and 0.257 days for the last two. We also identified interaction effects, such as high bicarbonate in arterial blood being associated with longer hospitalization in middle-aged patients. The models are embedded in a prototype online decision support tool that healthcare providers can use to make more informed decisions.
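For readers less familiar with the two metrics reported above, the following is a minimal sketch of how AUC and MAE are commonly computed. It is not the authors' code: the function names and the toy values are hypothetical and chosen only for illustration, and none of the numbers come from the study's data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of positive/negative pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """MAE: the average absolute gap between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy data for a binary outcome (for example, a positive test).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)

# Hypothetical toy data for a duration outcome measured in days.
toy_days_actual = [3.0, 5.0, 2.0, 8.0]
toy_days_predicted = [2.5, 5.5, 2.0, 7.0]
toy_mae = mean_absolute_error(toy_days_actual, toy_days_predicted)

print(f"AUC = {toy_auc:.3f}, MAE = {toy_mae:.3f} days")
```

Under this pairwise definition, an AUC of 91.6% would mean that a randomly chosen positive case receives a higher score than a randomly chosen negative case about 91.6% of the time, and an MAE of 0.752 days would mean that predictions miss the true duration by about three quarters of a day on average.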
Metadata
Title
A machine learning approach for diagnostic and prognostic predictions, key risk factors and interactions
Authors
Murtaza Nasir
Nichalin S. Summerfield
Stephanie Carreiro
Dan Berlowitz
Asil Oztekin
Publication date
18-03-2024
Publisher
Springer US
Keyword
COVID-19
Published in
Health Services and Outcomes Research Methodology
Print ISSN: 1387-3741
Electronic ISSN: 1572-9400
DOI
https://doi.org/10.1007/s10742-024-00324-7