Published in: BMC Medical Informatics and Decision Making 1/2021

Open Access 01-12-2021 | Research

Quantifying the impact of addressing data challenges in prediction of length of stay

Authors: Amin Naemi, Thomas Schmidt, Marjan Mansourvar, Ali Ebrahimi, Uffe Kock Wiil

Abstract

Background

Prediction of length of stay (LOS) at admission time can give physicians and nurses insight into the illness severity of patients and help them avoid adverse events and clinical deterioration. It also helps hospitals manage their resources and staffing more effectively.

Methods

Research in this field faces important challenges, such as missing values and skewness in LOS data. Moreover, many studies use binary classification, which places a wide range of patients with different conditions into a single category. To address these shortcomings, multivariate imputation techniques are first applied to fill incomplete records. Two resampling techniques, Borderline-SMOTE and SMOGN, are then applied to address data skewness in the classification and regression domains, respectively. Finally, machine learning (ML) techniques, including neural networks, extreme gradient boosting, random forest, support vector machine, and decision tree, are implemented for both the classification and regression approaches to predict the LOS of patients admitted to the Emergency Department of Odense University Hospital between June 2018 and April 2019. The ML models are developed from data obtained from patients at admission time, including pulse rate, arterial blood oxygen saturation, respiratory rate, systolic blood pressure, triage category, arrival ICD-10 codes, age, and gender.
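
To illustrate the resampling step for the classification domain, the following is a minimal standard-library Python sketch of the Borderline-SMOTE idea: minority-class points that lie near the class boundary are identified, and synthetic samples are interpolated between them and nearby minority points. It is not the authors' implementation. The function names, the toy data, the class labels, and the use of a single k for both neighbour searches are assumptions made for illustration.

```python
import math
import random


def danger_indices(X, y, minority=1, k=3):
    """Indices of minority points whose k nearest neighbours are
    at least half, but not all, majority-class points."""
    danger = []
    for i, (p, label) in enumerate(zip(X, y)):
        if label != minority:
            continue
        # Rank every other sample by Euclidean distance to p.
        others = [(math.dist(p, q), y[j]) for j, q in enumerate(X) if j != i]
        nearest = sorted(others, key=lambda d: d[0])[:k]
        n_major = sum(1 for _, lab in nearest if lab != minority)
        if k / 2 <= n_major < k:  # n_major == k would be treated as noise
            danger.append(i)
    return danger


def borderline_smote(X, y, minority=1, k=3, n_new=4, seed=0):
    """Create n_new synthetic minority samples around the danger points."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    danger = danger_indices(X, y, minority, k)
    synthetic = []
    for n in range(n_new if danger else 0):
        i = danger[n % len(danger)]
        p = X[i]
        # Nearest minority neighbours of p (excluding p itself).
        mates = sorted((j for j in minority_idx if j != i),
                       key=lambda j: math.dist(p, X[j]))[:k]
        q = X[rng.choice(mates)]
        gap = rng.random()  # position along the segment from p to q
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return synthetic


# Toy data (hypothetical): label 0 = majority class, label 1 = minority class.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (2, 0),
     (3, 1), (3, 0), (5, 5), (5, 6), (6, 5)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

print([X[i] for i in danger_indices(X, y)])
print(borderline_smote(X, y))
```

In this toy set only the two minority points adjacent to the majority cluster are flagged as borderline, so all synthetic samples are generated around them rather than around the well-separated minority points. SMOGN, the regression counterpart named above, is not shown here.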

Results

The performance of the predictive models before and after addressing missing values and data skewness is evaluated using four evaluation metrics, namely receiver operating characteristic, area under the curve (AUC), R-squared score (R2), and normalized root mean square error (NRMSE). The results show that, after addressing these challenges, model performance improves on average by 15.75% for AUC, 32.19% for R2 score, and 11.32% for NRMSE. Moreover, our results indicate a relationship between the missing-value rate, data skewness, and the illness severity of patients, so it is clinically essential to take patients' incomplete records into account and to apply proper solutions for interpolation of missing values.
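
For readers unfamiliar with the metrics named above, the following standard-library Python sketch shows one common way to compute AUC, R2, and NRMSE. It is an illustration only: the abstract does not state how NRMSE is normalized, so dividing by the range of the observed values is an assumption, and the function names and toy numbers are hypothetical.

```python
import math


def auc(labels, scores):
    """Probability that a random positive is scored above a random
    negative, counting ties as one half (rank-based AUC)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def nrmse(y_true, y_pred):
    """RMSE divided by the range of the observed values (assumed normalization)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / (max(y_true) - min(y_true))


# Hypothetical toy values, not data from the study.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
los_true = [1, 2, 3, 4, 5]
los_pred = [1, 2, 3, 4, 7]

print(auc(labels, scores))
print(r2_score(los_true, los_pred))
print(nrmse(los_true, los_pred))
```

Higher AUC and R2 indicate better models, whereas lower NRMSE is better, so an "improvement" in NRMSE corresponds to a reduction in its value.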

Conclusion

We propose a new method comprising three stages: missing values imputation, data skewness handling, and building predictive models based on classification and regression approaches. Our results indicate that properly addressing these challenges significantly enhances the performance of the models, leading to a more valid prediction of LOS.
Metadata
Title
Quantifying the impact of addressing data challenges in prediction of length of stay
Authors
Amin Naemi
Thomas Schmidt
Marjan Mansourvar
Ali Ebrahimi
Uffe Kock Wiil
Publication date
01-12-2021
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2021
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-021-01660-1
