Published in: BMC Medical Informatics and Decision Making 3/2018

Open Access 01-09-2018 | Research

Comparison of MetaMap and cTAKES for entity extraction in clinical notes

Authors: Ruth Reátegui, Sylvie Ratté

Published in: BMC Medical Informatics and Decision Making | Special Issue 3/2018


Abstract

Background

Clinical notes such as discharge summaries are semi-structured or unstructured. They contain information about diseases, treatments, drugs, and more, but their narrative format makes extracting meaningful information from them challenging. In this context, we compared the ability of two tools, MetaMap and cTAKES, to automatically extract medical entities.

Methods

We worked with data from the i2b2 (Informatics for Integrating Biology and the Bedside) Obesity Challenge and designed two experiments. In the first, a single UMLS concept related to each annotated disease was extracted. In the second, several related UMLS concepts were aggregated.
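The difference between the two experiments can be pictured as a lookup from extracted concepts to annotated diseases. The Python sketch below is illustrative only: the disease names, the concept identifiers (placeholders, not real UMLS CUIs), and the helper `detect_diseases` are hypothetical, and it assumes a disease counts as found in a note whenever any of its mapped concepts was extracted. It is not the authors' implementation.

```python
from typing import Dict, Iterable, Set

# Hypothetical disease-to-concept tables. The identifiers are placeholders,
# not real UMLS CUIs.
# Experiment 1 style: one concept per annotated disease.
SINGLE: Dict[str, Set[str]] = {
    "obesity": {"CUI_OBESITY"},
    "diabetes": {"CUI_DIABETES"},
}

# Experiment 2 style: several related concepts aggregated under each disease.
AGGREGATED: Dict[str, Set[str]] = {
    "obesity": {"CUI_OBESITY", "CUI_MORBID_OBESITY"},
    "diabetes": {"CUI_DIABETES", "CUI_TYPE2_DIABETES"},
}


def detect_diseases(extracted: Iterable[str], table: Dict[str, Set[str]]) -> Set[str]:
    """Return every disease whose concept set shares at least one ID with `extracted`."""
    found = set(extracted)
    return {disease for disease, concepts in table.items() if concepts & found}


# Concepts a tool might extract from one note (hypothetical).
note = ["CUI_MORBID_OBESITY", "CUI_TYPE2_DIABETES"]
print(detect_diseases(note, SINGLE))               # set()
print(sorted(detect_diseases(note, AGGREGATED)))   # ['diabetes', 'obesity']
```

With the single-concept table, a note that mentions only a more specific concept is missed; the aggregated table recovers it. This is one way aggregation can raise recall, at the possible cost of precision if the grouped concepts are too loosely related.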

Results

Results were evaluated against manually annotated medical entities. The aggregation process improved the results. MetaMap achieved an average recall of 0.88, precision of 0.89, and F-score of 0.88. cTAKES achieved an average recall of 0.91, precision of 0.89, and F-score of 0.89.
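Precision, recall, and F-score compare the diseases a tool detects with the manual annotations. The sketch below assumes that predictions and gold annotations are document IDs listed per disease, and that the reported averages are unweighted (macro) means over diseases; the abstract does not state the averaging scheme, and the names `prf` and `macro_average` are hypothetical.

```python
from typing import Dict, Set, Tuple

Scores = Tuple[float, float, float]


def prf(predicted: Set[str], gold: Set[str]) -> Scores:
    """Precision, recall and F1 for one disease; items are document IDs."""
    tp = len(predicted & gold)  # documents both detected and annotated
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_average(predicted: Dict[str, Set[str]], gold: Dict[str, Set[str]]) -> Scores:
    """Unweighted mean of per-disease scores over the diseases in `gold`."""
    per_disease = [prf(predicted.get(d, set()), docs) for d, docs in gold.items()]
    n = len(per_disease)
    return (
        sum(p for p, _, _ in per_disease) / n,
        sum(r for _, r, _ in per_disease) / n,
        sum(f for _, _, f in per_disease) / n,
    )


# Hypothetical toy data, not results from the study.
gold = {"obesity": {"d1", "d2", "d3", "d4"}, "diabetes": {"d1", "d2"}}
pred = {"obesity": {"d1", "d2", "d3", "d5"}, "diabetes": {"d1", "d2"}}
print(macro_average(pred, gold))  # (0.875, 0.875, 0.875)
```

Under macro-averaging, the mean F-score is generally not the harmonic mean of the mean precision and mean recall, so the three averaged figures can differ from what the averaged precision and recall alone would suggest.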

Conclusions

Aggregating concepts (with both similar and different semantic types) proved to be a good strategy for improving the extraction of medical entities; automating this aggregation could be explored in future work.
Metadata
Title
Comparison of MetaMap and cTAKES for entity extraction in clinical notes
Authors
Ruth Reátegui
Sylvie Ratté
Publication date
01-09-2018
Publisher
BioMed Central
DOI
https://doi.org/10.1186/s12911-018-0654-2