Published in: BMC Medical Informatics and Decision Making 5/2019

Open Access 01-12-2019 | Research

RCorp: a resource for chemical disease semantic extraction in Chinese

Authors: Yueping Sun, Li Hou, Lu Qin, Yan Liu, Jiao Li, Qing Qian

Abstract

Background

High-throughput screening is desirable for robustly identifying synergistic drug combinations. Machine-learning-based tools that automatically identify the relevant relations in published papers would be of great help. To support chemical-disease semantic relation extraction, especially for chronic diseases, we manually annotated RCorp, a chronic-disease-specific corpus in Chinese for combination therapy discovery.

Methods

In this study, we extracted abstracts from a Chinese medical literature server and followed the annotation framework of the BioCreative CDR corpus, modifying the guidelines to capture relations relevant to combination therapy. An annotation tool was incorporated into the standard annotation process.

Results

The resulting RCorp consists of 339 Chinese biomedical articles with 2367 annotated chemicals, 2113 diseases, 237 symptoms, 164 chemical-induce-disease relations, 163 chemical-induce-symptom relations, and 805 chemical-treat-disease relations. Each annotation includes both the mention text span and a normalized concept identifier. Inter-annotator agreement, measured by F score, is 0.883 for chemical entities and 0.791 for disease entities. The F score for chemical-treat-disease relations is 0.788 after unifying the entity mentions.
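
As an illustration of how an F score can serve as an inter-annotator agreement measure, the sketch below treats one annotator's mentions as the reference and the other's as the prediction. It is a minimal example under stated assumptions, not the authors' evaluation script: the `Mention` fields, the exact-match criterion (same span, entity type, and concept identifier), and the placeholder identifiers are all hypothetical, since the abstract does not specify the matching rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One annotated entity mention (field names are illustrative assumptions)."""
    doc_id: str       # article identifier
    start: int        # character offset where the mention begins
    end: int          # character offset where the mention ends
    entity_type: str  # e.g. "Chemical", "Disease", "Symptom"
    concept_id: str   # normalized concept identifier (placeholder values below)


def pairwise_f_score(annotator_a, annotator_b):
    """F score between two annotators' mention sets under exact matching.

    With A as reference and B as prediction, precision = |A & B| / |B| and
    recall = |A & B| / |A|; their harmonic mean reduces to
    2 * |A & B| / (|A| + |B|), which is symmetric in A and B.
    """
    a, b = set(annotator_a), set(annotator_b)
    if not a and not b:
        return 1.0  # nothing to annotate and nothing annotated: full agreement
    return 2 * len(a & b) / (len(a) + len(b))


# Toy data: the annotators agree on two mentions and disagree on the
# boundary of a third, so 2 of 3 mentions match on each side.
annotator_a = [
    Mention("doc1", 0, 4, "Chemical", "C001"),
    Mention("doc1", 10, 13, "Disease", "D001"),
    Mention("doc1", 20, 23, "Disease", "D002"),
]
annotator_b = [
    Mention("doc1", 0, 4, "Chemical", "C001"),
    Mention("doc1", 10, 13, "Disease", "D001"),
    Mention("doc1", 20, 25, "Disease", "D002"),
]

score = pairwise_f_score(annotator_a, annotator_b)
print(f"{score:.3f}")  # prints 0.667
```

Scores are usually reported per entity type, as in the figures above, which would amount to filtering both lists by `entity_type` before calling the function. Relation-level agreement could be computed the same way over (chemical, relation type, disease) triples once the entity mentions have been unified.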

Conclusions

We extracted and manually annotated a chronic-disease-specific corpus in Chinese for combination therapy discovery. Analysis of the corpus demonstrates its quality for the combination-therapy-related knowledge discovery task. The annotated corpus should be a useful resource for building entity recognition and relation extraction tools. In the future, an evaluation based on the corpus will be held.
Metadata
Title
RCorp: a resource for chemical disease semantic extraction in Chinese
Authors
Yueping Sun
Li Hou
Lu Qin
Yan Liu
Jiao Li
Qing Qian
Publication date
01-12-2019
Publisher
BioMed Central
DOI
https://doi.org/10.1186/s12911-019-0936-3
