Published in: BMC Medical Informatics and Decision Making

Open Access 01-12-2020 | Research article

Combining structured and unstructured data for predictive models: a deep learning approach

Authors: Dongdong Zhang, Changchang Yin, Jucheng Zeng, Xiaohui Yuan, Ping Zhang

Published in: BMC Medical Informatics and Decision Making | Issue 1/2020

Abstract

Background

The broad adoption of electronic health records (EHRs) provides great opportunities to conduct health care research and solve various clinical problems in medicine. Following recent advances and successes, methods based on machine learning and deep learning have become increasingly popular in medical informatics. However, while many studies use temporal structured data for predictive modeling, they typically neglect potentially valuable information in unstructured clinical notes. Integrating heterogeneous data types across EHRs through deep learning techniques may help improve the performance of prediction models.

Methods

In this research, we propose two general-purpose multi-modal neural network architectures that enhance patient representation learning by combining sequential unstructured notes with structured data. The proposed fusion models use document embeddings to represent long clinical note documents, either convolutional neural networks or long short-term memory networks to model the sequential clinical notes and temporal signals, and one-hot encoding to represent static information. The resulting representations are concatenated into the final patient representation, which is used to make predictions.
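The data flow described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: `mean_pool` is a placeholder that stands in for the trained convolutional or long short-term memory encoder, the document embeddings are assumed to be precomputed, and all function names, toy values, and the admission-type vocabulary are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def one_hot(value: str, vocabulary: Sequence[str]) -> Vector:
    """Encode a categorical static feature as a one-hot vector."""
    if value not in vocabulary:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == item else 0.0 for item in vocabulary]


def mean_pool(sequence: Sequence[Vector]) -> Vector:
    """Placeholder sequence encoder: average the vectors over time.

    ASSUMPTION: this stands in for the CNN or LSTM encoder, which would
    be a trained network in a real model.
    """
    if not sequence:
        raise ValueError("sequence must contain at least one step")
    steps = len(sequence)
    return [sum(column) / steps for column in zip(*sequence)]


def fuse_patient(
    note_embeddings: Sequence[Vector],
    temporal_signals: Sequence[Vector],
    static_value: str,
    static_vocabulary: Sequence[str],
) -> Vector:
    """Concatenate the three modality vectors into one patient vector."""
    note_vector = mean_pool(note_embeddings)      # sequential clinical notes
    signal_vector = mean_pool(temporal_signals)   # temporal structured signals
    static_vector = one_hot(static_value, static_vocabulary)  # static info
    return note_vector + signal_vector + static_vector


# Toy inputs (hypothetical values, for illustration only).
notes = [[0.25, 0.5], [0.75, 1.0]]             # two notes, 2-d embeddings
signals = [[1.0, 3.0, 5.0], [3.0, 5.0, 7.0]]   # two time steps, three signals
admission_types = ["EMERGENCY", "ELECTIVE", "URGENT"]

representation = fuse_patient(notes, signals, "ELECTIVE", admission_types)
print(representation)
```

In a full model, the concatenated vector would feed a prediction layer, and the placeholder pooling would be replaced by an encoder trained jointly with that layer.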

Results

We evaluate the proposed models on three risk prediction tasks (in-hospital mortality, 30-day hospital readmission, and long length of stay prediction) using data derived from the publicly available Medical Information Mart for Intensive Care III (MIMIC-III) dataset. Our results show that, by combining unstructured clinical notes with structured data, the proposed models outperform models that use either unstructured notes or structured data alone.
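As a rough illustration of how the three tasks can be framed as binary labels per admission, consider the sketch below. The `Admission` record and its field names are hypothetical, and the 7-day threshold for a long stay is an assumption, since the abstract does not state the cutoff used.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional

# ASSUMPTION: the abstract does not give the long-stay cutoff; 7 days is
# used here purely for illustration.
LONG_STAY = timedelta(days=7)
READMISSION_WINDOW = timedelta(days=30)


@dataclass
class Admission:
    """Hypothetical minimal admission record (not a MIMIC-III schema)."""
    admitted: datetime
    discharged: datetime
    died_in_hospital: bool
    next_admitted: Optional[datetime] = None  # None if never readmitted


def task_labels(adm: Admission) -> Dict[str, int]:
    """Derive one binary label per risk prediction task."""
    stay = adm.discharged - adm.admitted
    readmitted = (
        adm.next_admitted is not None
        and timedelta(0) <= adm.next_admitted - adm.discharged <= READMISSION_WINDOW
    )
    return {
        "in_hospital_mortality": int(adm.died_in_hospital),
        "readmission_30d": int(readmitted),
        "long_length_of_stay": int(stay > LONG_STAY),
    }


# Toy example: a 9-day stay followed by a readmission 15 days after discharge.
example = Admission(
    admitted=datetime(2020, 1, 1),
    discharged=datetime(2020, 1, 10),
    died_in_hospital=False,
    next_admitted=datetime(2020, 1, 25),
)
labels = task_labels(example)
print(labels)
```

Each label would then serve as the target of a separate binary classifier trained on the fused patient representation.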

Conclusions

The proposed fusion models learn better patient representations by combining structured and unstructured data. Integrating heterogeneous data types across EHRs helps improve the performance of prediction models and reduce errors.

Title: Combining structured and unstructured data for predictive models: a deep learning approach
Authors: Dongdong Zhang, Changchang Yin, Jucheng Zeng, Xiaohui Yuan, Ping Zhang
Publication date: 01-12-2020
Publisher: BioMed Central
Published in: BMC Medical Informatics and Decision Making / Issue 1/2020
Electronic ISSN: 1472-6947
DOI: https://doi.org/10.1186/s12911-020-01297-6
