Published in: BMC Medical Informatics and Decision Making 1/2020

Open Access 01-12-2020 | Myocardial Infarction | Research article

Prediction of incident myocardial infarction using machine learning applied to harmonized electronic health record data

Authors: Divneet Mandair, Premanand Tiwari, Steven Simon, Kathryn L. Colborn, Michael A. Rosenberg

Abstract

Background

With cardiovascular disease increasing, substantial research has focused on the development of prediction tools. We compare deep learning and machine learning models to a baseline logistic regression using only ‘known’ risk factors in predicting incident myocardial infarction (MI) from harmonized EHR data.

Methods

We conducted a large-scale case-control study of 2.27 million patients within the UCHealth system, with 6-month incident MI as the outcome. Predictors were the top 800 of an initial 52k procedures, diagnoses, and medications, harmonized to the Observational Medical Outcomes Partnership (OMOP) common data model. We compared several over- and under-sampling techniques to address the class imbalance in the dataset. We compared regularized logistic regression, random forest, boosted gradient machines, and shallow and deep neural networks. The baseline model for comparison was a logistic regression using a limited set of ‘known’ risk factors for MI. Hyper-parameters were identified using 10-fold cross-validation.
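
As a rough illustration of the random undersampling mentioned above, the following minimal sketch keeps every positive case and draws a random subset of negatives. It is not the authors' implementation: the function name `random_undersample`, the `ratio` parameter, and the toy labels are assumptions made for this example only.

```python
import random


def random_undersample(labels, ratio=1.0, seed=0):
    """Return sorted row indices keeping all positives and a random subset of negatives.

    `ratio` is a hypothetical parameter: the number of negatives kept per positive.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more negatives than exist.
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    kept_neg = rng.sample(neg, n_keep)
    return sorted(pos + kept_neg)


# Toy labels with 1% positives (illustrative only, not study data).
labels = [1 if i % 100 == 0 else 0 for i in range(10_000)]
idx = random_undersample(labels)
balanced = [labels[i] for i in idx]
print(len(idx), sum(balanced))
```

In a cross-validated setting, resampling of this kind would normally be applied to the training folds only, so that held-out folds keep the original class balance.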

Results

A total of 20,591 patients were diagnosed with MI, compared with 2.25 million who were not. A deep neural network with random undersampling provided superior classification compared with the other methods. However, its benefit was only moderate, with an F1 score of 0.092 and an AUC of 0.835, relative to a logistic regression model using only ‘known’ risk factors. Calibration was poor for all models despite adequate discrimination, owing to overfitting caused by the low frequency of the event of interest.
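
To see why an F1 score can stay low even when discrimination is adequate, the sketch below computes precision and F1 from sensitivity, specificity, and prevalence. The prevalence is derived from the counts reported above (20,591 cases among roughly 2.27 million patients); the sensitivity and specificity of 0.75 are hypothetical values chosen for illustration and are not results from the study.

```python
def f1_from_rates(sensitivity, specificity, prevalence):
    """Return (precision, F1) implied by the given rates, as population fractions."""
    tp = sensitivity * prevalence                # true positives
    fn = (1 - sensitivity) * prevalence          # false negatives
    fp = (1 - specificity) * (1 - prevalence)    # false positives
    precision = tp / (tp + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return precision, f1


# Prevalence from the reported counts: about 0.9% of patients had an MI.
prevalence = 20_591 / 2_270_000

# Hypothetical operating point, NOT taken from the paper.
precision, f1 = f1_from_rates(0.75, 0.75, prevalence)
print(f"prevalence={prevalence:.4f} precision={precision:.3f} f1={f1:.3f}")
```

With so few events, false positives from the large negative class dominate the true positives, so precision and therefore F1 remain small at this operating point.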

Conclusions

Our study suggests that deep neural networks (DNNs) may not offer substantial benefit when trained on harmonized data, compared with traditional methods using established risk factors for MI.
Metadata
Title
Prediction of incident myocardial infarction using machine learning applied to harmonized electronic health record data
Authors
Divneet Mandair
Premanand Tiwari
Steven Simon
Kathryn L. Colborn
Michael A. Rosenberg
Publication date
01-12-2020
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2020
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-020-01268-x
