Published in: BMC Medical Informatics and Decision Making 1/2023

Open Access 01-12-2023 | Research

ClotCatcher: a novel natural language model to accurately adjudicate venous thromboembolism from radiology reports

Authors: Jeffrey Wang, Joao Souza de Vale, Saransh Gupta, Pulakesh Upadhyaya, Felipe A. Lisboa, Seth A. Schobel, Eric A. Elster, Christopher J. Dente, Timothy G. Buchman, Rishikesan Kamaleswaran



Abstract

Introduction

Accurate identification of venous thromboembolism (VTE) is critical for developing replicable epidemiological studies and rigorous prediction models. Traditionally, VTE studies have relied on International Classification of Diseases (ICD) codes, which are inaccurate and lead to misclassification bias. Here, we developed ClotCatcher, a novel deep learning model that uses natural language processing to detect VTE from radiology reports.

Methods

Radiology reports from studies ordered to detect VTE were obtained from patients admitted to Emory University Hospital (EUH) and Grady Memorial Hospital (GMH). Data augmentation was performed using the Google PEGASUS paraphraser. These data were then used to fine-tune ClotCatcher, a novel deep learning model. ClotCatcher was validated separately on the EUH dataset and on the GMH dataset.
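
As a rough illustration of paraphrase-based data augmentation, the sketch below adds label-preserving paraphrases to a set of labeled reports. It is not the authors' pipeline: the function names, the example report texts, and the `toy_paraphrase` stand-in (a simple string substitution used in place of a real paraphrasing model such as PEGASUS) are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# (report text, label), where 1 = VTE present and 0 = VTE absent
Report = Tuple[str, int]


def augment_with_paraphrases(
    reports: List[Report],
    paraphrase: Callable[[str], List[str]],
) -> List[Report]:
    """Return the original reports plus paraphrases that inherit the label."""
    augmented: List[Report] = []
    for text, label in reports:
        augmented.append((text, label))
        seen = {text}
        for variant in paraphrase(text):
            # Skip empty outputs and paraphrases identical to text already kept.
            if variant and variant not in seen:
                seen.add(variant)
                augmented.append((variant, label))
    return augmented


def toy_paraphrase(text: str) -> List[str]:
    # Hypothetical stand-in for a paraphrasing model; a real paraphraser
    # would generate one or more reworded versions of the report.
    return [text.replace("No evidence of", "There is no sign of")]


# Invented example reports, not taken from the study data.
train = [
    ("No evidence of deep vein thrombosis.", 0),
    ("Occlusive thrombus in the left femoral vein.", 1),
]
augmented_train = augment_with_paraphrases(train, toy_paraphrase)
for text, label in augmented_train:
    print(label, text)
```

In a setup like this, augmentation would normally be applied to the training split only, so that validation reports remain unmodified originals.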

Results

The dataset contained 1358 studies from EUH and 915 studies from GMH (n = 2273). It comprised 1506 ultrasound studies, of which 528 (35.1%) were positive for VTE, and 767 CT studies, of which 91 (11.9%) were positive for VTE. When validated on the EUH dataset, ClotCatcher performed best (AUC = 0.980) when trained on both the EUH and GMH datasets without paraphrasing. When validated on the GMH dataset, ClotCatcher performed best (AUC = 0.995) when trained on both the EUH and GMH datasets with paraphrasing.
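
The reported counts can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated above; the variable names are illustrative, and the pooled positive rate is derived from those counts rather than reported in the abstract.

```python
# Study counts as reported in the abstract.
site_counts = {"EUH": 1358, "GMH": 915}
# modality -> (number of studies, number positive for VTE)
modality_counts = {"ultrasound": (1506, 528), "CT": (767, 91)}

total_by_site = sum(site_counts.values())
total_by_modality = sum(n for n, _ in modality_counts.values())

# Percentage of studies positive for VTE within each modality.
positive_rates = {
    modality: round(100 * positive / n, 1)
    for modality, (n, positive) in modality_counts.items()
}

# Pooled positive rate across both modalities (derived, not reported).
total_positive = sum(positive for _, positive in modality_counts.values())
overall_rate = round(100 * total_positive / total_by_modality, 1)

print(total_by_site, total_by_modality)  # 2273 2273
print(positive_rates)                    # {'ultrasound': 35.1, 'CT': 11.9}
print(total_positive, overall_rate)      # 619 27.2
```

The site and modality totals agree, and the per-modality percentages match the figures given in the text.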

Conclusion

ClotCatcher, a novel deep learning model with data augmentation, rapidly and accurately adjudicated the presence of VTE from radiology reports. Applying ClotCatcher to large databases would allow for rapid and accurate adjudication of incident VTE. This would reduce misclassification bias and form the foundation for future studies that estimate an individual patient's risk of developing incident VTE.
Metadata
Title
ClotCatcher: a novel natural language model to accurately adjudicate venous thromboembolism from radiology reports
Authors
Jeffrey Wang
Joao Souza de Vale
Saransh Gupta
Pulakesh Upadhyaya
Felipe A. Lisboa
Seth A. Schobel
Eric A. Elster
Christopher J. Dente
Timothy G. Buchman
Rishikesan Kamaleswaran
Publication date
01-12-2023
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2023
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-023-02369-z
