BMC Medical Informatics and Decision Making

Open Access 01-12-2020 | Influenza | Research article

Using machine learning of clinical data to diagnose COVID-19: a systematic review and meta-analysis

Authors: Wei Tse Li, Jiayan Ma, Neil Shende, Grant Castaneda, Jaideep Chakladar, Joseph C. Tsai, Lauren Apostol, Christine O. Honda, Jingyue Xu, Lindsay M. Wong, Tianyi Zhang, Abby Lee, Aditi Gnanasekar, Thomas K. Honda, Selena Z. Kuo, Michael Andrew Yu, Eric Y. Chang, Mahadevan “Raj” Rajasekaran, Weg M. Ongkeko

Published in: BMC Medical Informatics and Decision Making | Issue 1/2020

Abstract

Background

The recent Coronavirus Disease 2019 (COVID-19) pandemic has placed severe stress on healthcare systems worldwide, which is amplified by the critical shortage of COVID-19 tests.

Methods

In this study, we propose to generate a more accurate diagnostic model of COVID-19 based on patient symptoms and routine test results by applying machine learning to reanalyze COVID-19 data from 151 published studies. We aim to investigate correlations between clinical variables, cluster COVID-19 patients into subtypes, and generate a computational classification model for discriminating between COVID-19 patients and influenza patients based on clinical variables alone.

Results

We discovered several novel associations between clinical variables, including correlations between being male and having higher levels of serum lymphocytes and neutrophils. We found that COVID-19 patients could be clustered into subtypes based on serum levels of immune cells, gender, and reported symptoms. Finally, we trained an XGBoost model to achieve a sensitivity of 92.5% and a specificity of 97.9% in discriminating COVID-19 patients from influenza patients.
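For readers less familiar with the two metrics reported above, the following is a minimal sketch of how sensitivity and specificity are computed for a binary classifier. It is not the authors' code: the function name `sensitivity_specificity` and the toy labels are hypothetical and chosen only for illustration, with 1 standing for a COVID-19 patient and 0 for an influenza patient.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    Sensitivity = TP / (TP + FN): share of true positives that are detected.
    Specificity = TN / (TN + FP): share of true negatives that are rejected.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Toy labels for illustration only (NOT data from the study):
# 1 = COVID-19, 0 = influenza.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

In this framing, a sensitivity of 92.5% means that 92.5% of COVID-19 patients were classified as COVID-19, and a specificity of 97.9% means that 97.9% of influenza patients were classified as influenza.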

Conclusions

We demonstrated that computational methods trained on large clinical datasets could yield ever more accurate COVID-19 diagnostic models to mitigate the impact of lack of testing. We also presented previously unknown COVID-19 clinical variable correlations and clinical subgroups.

Publication date: 01-12-2020
Publisher: BioMed Central
Keywords: Influenza, COVID-19
Published in: BMC Medical Informatics and Decision Making / Issue 1/2020
Electronic ISSN: 1472-6947
DOI: https://doi.org/10.1186/s12911-020-01266-z
