Published in: BMC Medical Informatics and Decision Making 1/2012

Open Access 01-12-2012 | Research article

Recognition of medication information from discharge summaries using ensembles of classifiers

Authors: Son Doan, Nigel Collier, Hua Xu, Pham Hoang Duy, Tu Minh Phuong

Abstract

Background

Extraction of clinical information such as medications or problems from clinical text is an important task of clinical natural language processing (NLP). Rule-based methods are often used in clinical NLP systems because they are easy to adapt and customize. Recently, supervised machine learning methods have proven to be effective in clinical NLP as well. However, combining different classifiers to further improve the performance of clinical entity recognition systems has not been investigated extensively. Combining classifiers into an ensemble classifier presents both challenges and opportunities to improve performance in such NLP tasks.

Methods

We investigated ensemble classifiers that used different voting strategies to combine outputs from three individual classifiers: a rule-based system, a support vector machine (SVM) based system, and a conditional random field (CRF) based system. Three voting methods were proposed and evaluated using the annotated data sets from the 2009 i2b2 NLP challenge: simple majority, local SVM-based voting, and local CRF-based voting.
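As an illustration of the simplest of the three strategies, the sketch below applies majority voting to token-level tag sequences from three systems. It is a minimal example under stated assumptions, not the authors' implementation: the function name `majority_vote`, the tag labels (`B-M`, `B-DO`, `I-DO`, `B-F`, `O`), the token-level granularity, and the fallback to `O` when all three systems disagree are hypothetical choices made here for illustration. The local SVM-based and CRF-based voting methods are not shown.

```python
from collections import Counter


def majority_vote(rule_tags, svm_tags, crf_tags, fallback="O"):
    """Combine three token-level tag sequences by simple majority.

    A tag is kept when at least two of the three systems agree on it;
    otherwise the (assumed) fallback tag is emitted.
    """
    if not (len(rule_tags) == len(svm_tags) == len(crf_tags)):
        raise ValueError("tag sequences must have the same length")
    combined = []
    for votes in zip(rule_tags, svm_tags, crf_tags):
        tag, count = Counter(votes).most_common(1)[0]
        combined.append(tag if count >= 2 else fallback)
    return combined


# Hypothetical outputs of the three systems for "aspirin 81 mg daily".
rule = ["B-M", "B-DO", "I-DO", "O"]
svm = ["B-M", "B-DO", "O", "B-F"]
crf = ["B-M", "O", "I-DO", "B-F"]
print(majority_vote(rule, svm, crf))  # ['B-M', 'B-DO', 'I-DO', 'B-F']
```

Each token in the example receives the tag chosen by two of the three systems, so a single system's error is outvoted as long as the other two agree.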

Results

Evaluation on 268 manually annotated discharge summaries from the i2b2 challenge showed that the local CRF-based voting method achieved the best F-score of 90.84% (94.11% Precision, 87.81% Recall) for 10-fold cross-validation. We then compared our systems with the first-ranked system in the challenge by using the same training and test sets. Our system based on majority voting achieved a better F-score of 89.65% (93.91% Precision, 85.76% Recall) than the previously reported F-score of 89.19% (93.78% Precision, 85.03% Recall) by the first-ranked system in the challenge.
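The reported F-scores can be checked against the reported precision and recall, assuming the F-score here is the balanced F1 measure (the harmonic mean of precision and recall). The helper name `f_score` is introduced only for this sketch.

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) as reported in the Results.
majority_voting = f_score(93.91, 85.76)
first_ranked = f_score(93.78, 85.03)
print(round(majority_voting, 2))  # 89.65
print(round(first_ranked, 2))     # 89.19
```

Both held-out test figures are reproduced exactly. Applying the same formula to the cross-validation precision and recall (94.11%, 87.81%) gives roughly 90.85% rather than the reported 90.84%; a difference of this size would be consistent with averaging across folds or with rounding of the reported inputs, though the abstract does not say which.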

Conclusions

Our experimental results on the 2009 i2b2 challenge datasets showed that ensemble classifiers, which combine individual classifiers through a voting scheme, can outperform a single classifier in recognizing medication information from clinical text. This suggests that simple, easily implemented strategies such as majority voting have the potential to significantly improve clinical entity recognition.
Metadata
Title
Recognition of medication information from discharge summaries using ensembles of classifiers
Authors
Son Doan
Nigel Collier
Hua Xu
Pham Hoang Duy
Tu Minh Phuong
Publication date
01-12-2012
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2012
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/1472-6947-12-36
