Journal of Imaging Informatics in Medicine

Publication date: 01-02-2019

Automatic Normalization of Anatomical Phrases in Radiology Reports Using Unsupervised Learning

Authors: Amir M. Tahmasebi, Henghui Zhu, Gabriel Mankovich, Peter Prinsen, Prescott Klassen, Sam Pilato, Rob van Ommering, Pritesh Patel, Martin L. Gunn, Paul Chang

Published in: Journal of Imaging Informatics in Medicine | Issue 1/2019

Abstract

In today’s radiology workflow, free-text reporting is the most common medium for capturing, storing, and communicating clinical information. Radiologists routinely consult a patient’s prior radiology reports to recall critical information for a new diagnosis, a process that is tedious, time-consuming, and prone to human error. Automatic structuring of report content would facilitate this kind of information retrieval. In this work, we propose an unsupervised machine learning approach that automatically structures radiology reports by detecting anatomical phrases and normalizing them against the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) ontology. The approach combines word embedding-based semantic learning with ontology-based concept mapping to derive the desired concept normalization. The word embedding model was trained on a large corpus of unlabeled radiology reports. Fifty-six anatomical labels were extracted from SNOMED CT as class labels covering the whole human anatomy. The framework was compared against a number of state-of-the-art supervised and unsupervised approaches. Radiology reports from three different clinical sites were manually labeled for testing. The proposed approach outperformed the other techniques, yielding an average precision of 82.6%. By applying word embedding techniques to semantic learning, the framework improves the coverage and performance of conventional concept normalization approaches while avoiding the need for the large amount of annotated data typically required to train classifiers.
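The general idea of pairing word embeddings with a fixed set of ontology labels can be illustrated with a small sketch. This is not the authors' implementation: the three-dimensional vectors, the three class labels, their synonym lists, and the function names (`phrase_vector`, `cosine`, `normalize_phrase`) are all hypothetical, and averaging word vectors followed by cosine similarity is one simple matching scheme assumed here for illustration. A real system would presumably use embeddings trained on a report corpus and labels drawn from SNOMED CT.

```python
import math

# Toy 3-dimensional "embeddings"; the values are invented for illustration
# and do not come from any trained model.
TOY_VECTORS = {
    "lung": [0.9, 0.1, 0.0],
    "lobe": [0.7, 0.2, 0.1],
    "upper": [0.5, 0.3, 0.2],
    "liver": [0.1, 0.9, 0.0],
    "hepatic": [0.2, 0.8, 0.1],
    "kidney": [0.0, 0.2, 0.9],
    "renal": [0.1, 0.1, 0.8],
}

# Hypothetical class labels, each described by a few ontology-style synonyms.
LABEL_TERMS = {
    "lung": ["lung"],
    "liver": ["liver", "hepatic"],
    "kidney": ["kidney", "renal"],
}


def phrase_vector(words):
    """Average the vectors of the known words; return None if none is known."""
    known = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not known:
        return None
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def normalize_phrase(phrase):
    """Map a free-text phrase to the closest label, or None if nothing matches."""
    vec = phrase_vector(phrase.lower().split())
    if vec is None:
        return None
    scores = {
        label: cosine(vec, phrase_vector(terms))
        for label, terms in LABEL_TERMS.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for text in ["left upper lobe", "renal cortex", "spleen"]:
        print(text, "->", normalize_phrase(text))
```

In this toy setup, no labeled training examples are involved: a phrase is assigned to whichever label description lies closest in the embedding space, and a phrase with no known words is left unmapped.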

Title: Automatic Normalization of Anatomical Phrases in Radiology Reports Using Unsupervised Learning
Authors: Amir M. Tahmasebi, Henghui Zhu, Gabriel Mankovich, Peter Prinsen, Prescott Klassen, Sam Pilato, Rob van Ommering, Pritesh Patel, Martin L. Gunn, Paul Chang
Publication date: 01-02-2019
Publisher: Springer International Publishing
Published in: Journal of Imaging Informatics in Medicine / Issue 1/2019
Print ISSN: 2948-2925
Electronic ISSN: 2948-2933
DOI: https://doi.org/10.1007/s10278-018-0116-5
