Published in: Journal of Digital Imaging 1/2016

01-02-2016

Unsupervised Topic Modeling in a Large Free Text Radiology Report Repository

Authors: Saeed Hassanpour, Curtis P. Langlotz

Published in: Journal of Imaging Informatics in Medicine | Issue 1/2016


Abstract

Radiology report narrative contains a large amount of information about the patient’s health and the radiologist’s interpretation of medical findings. Most of this critical information is entered in free text format, even when structured radiology report templates are used. The radiology report narrative varies in its use of terminology and language among different radiologists and organizations. The free text format and the subtlety and variations of natural language hinder the extraction of reusable information from radiology reports for decision support, quality improvement, and biomedical research. Therefore, as the first step to organize and extract the information content in a large multi-institutional free text radiology report repository, we have designed and developed an unsupervised machine learning approach to capture the main concepts in a radiology report repository and partition the reports based on their main foci. In this approach, radiology reports are modeled in a vector space and compared to each other through a cosine similarity measure. This similarity is used to cluster radiology reports and identify the repository’s underlying topics. We applied our approach to a repository of 1,899,482 radiology reports from three major healthcare organizations. Our method identified 19 major radiology report topics in the repository and clustered the reports according to these topics. Our results were verified by a domain expert radiologist; they successfully explain the repository’s primary topics and extract the corresponding reports. The results of our system provide a target-based corpus and framework for information extraction and retrieval systems for radiology reports.
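The pipeline outlined in the abstract (reports as vectors, cosine similarity between them, clustering on that similarity) can be illustrated with a minimal sketch. The abstract does not specify the term weighting, the clustering algorithm, or any preprocessing, so the smoothed TF-IDF weights, the k-means-style loop over cosine similarity, and the farthest-first seeding below are assumptions made for illustration. All function names (`tokenize`, `tfidf_vectors`, `cosine`, `cluster`) and the four toy reports are hypothetical and are not taken from the authors' system.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and keep purely alphabetic, whitespace-separated tokens."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


def normalize(vec):
    """Scale a sparse vector (dict of term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Map each document to a unit-length sparse TF-IDF vector.

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1. The abstract
    only says that reports are modeled in a vector space.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        }
        vectors.append(normalize(vec))
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def cluster(vectors, k, iterations=10):
    """Group vectors into k clusters by cosine similarity to cluster centroids.

    Assumption: a k-means-style loop with deterministic farthest-first seeding,
    chosen here only so the sketch is reproducible.
    """
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the vector least similar to every centroid chosen so far.
        seed = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(seed)

    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each report joins its most similar centroid.
        labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        # Update step: each centroid becomes the normalized sum of its members.
        for j in range(k):
            total = {}
            for v, label in zip(vectors, labels):
                if label == j:
                    for term, w in v.items():
                        total[term] = total.get(term, 0.0) + w
            if total:  # an empty cluster keeps its previous centroid
                centroids[j] = normalize(total)
    return labels


# Toy reports, invented for illustration only.
reports = [
    "chest radiograph shows clear lungs no pleural effusion",
    "chest radiograph shows small pleural effusion in lungs",
    "brain mri shows no acute infarct or hemorrhage",
    "brain mri shows small acute infarct no hemorrhage",
]
labels = cluster(tfidf_vectors(reports), k=2)
print(labels)
```

On this toy input the two chest reports share one label and the two brain reports share the other. A repository of the size described in the abstract would need a distributed or mini-batch implementation rather than this in-memory loop, and a principled way to choose the number of clusters.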
Metadata
Title
Unsupervised Topic Modeling in a Large Free Text Radiology Report Repository
Authors
Saeed Hassanpour
Curtis P. Langlotz
Publication date
01-02-2016
Publisher
Springer US
Published in
Journal of Imaging Informatics in Medicine / Issue 1/2016
Print ISSN: 2948-2925
Electronic ISSN: 2948-2933
DOI
https://doi.org/10.1007/s10278-015-9823-3
