Abstract
As experimental throughput in molecular biology increases, biologists and researchers in related fields need sophisticated tools to process large amounts of information efficiently and stay abreast of current research.

Rhetorical zone analysis is an application of natural language processing in which areas of text in scientific papers are classified in terms of argumentation and intellectual contribution in order to pinpoint and distinguish certain types of information. Such analysis can assist information extraction, helping to assess and integrate data generated by experiments into the scientific community's store of knowledge.

We present results for several experiments in automatic zone identification on the ZAISA-1 dataset, a new dataset composed of full biomedical research papers hand-annotated for rhetorical zones. We concentrate on general-purpose and linguistically motivated features, and report results for a variety of feature sets. Our intention is to provide a baseline feature set for modeling, which future work can extend using combinations of heuristics and more sophisticated, task-specific modeling techniques.
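To make the task more concrete, the following is a minimal sketch of how general-purpose features might be computed for each sentence before a classifier assigns it a zone. The function name `sentence_features`, the feature names, the citation pattern, and the sample sentences are all illustrative assumptions; they are not the feature set or data reported in the paper.

```python
import re
from typing import Dict, List

# Hypothetical cue pattern for citations such as "[12]" or "(Smith et al., 2003)".
CITATION_RE = re.compile(r"\[\d+\]|\([A-Z][A-Za-z-]+(?: et al\.)?,? \d{4}\)")


def sentence_features(sentences: List[str], index: int) -> Dict[str, float]:
    """Build a sparse feature dict for the sentence at `index` (illustrative only)."""
    sentence = sentences[index]
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    features: Dict[str, float] = {}
    # Bag-of-words unigram features (binary presence).
    for tok in tokens:
        features[f"w={tok}"] = 1.0
    # Relative position of the sentence in the document, in [0, 1].
    features["rel_pos"] = index / max(len(sentences) - 1, 1)
    # Whether the sentence contains something that looks like a citation.
    features["has_citation"] = 1.0 if CITATION_RE.search(sentence) else 0.0
    # Assumed cue: first-person plural pronouns may mark the authors' own work.
    features["first_person"] = 1.0 if {"we", "our"} & set(tokens) else 0.0
    return features


# Made-up sentences standing in for a full article.
doc = [
    "Protein kinases regulate many cellular processes [3].",
    "We measured binding affinity in vitro.",
    "Our results suggest a novel interaction.",
]
feats = [sentence_features(doc, i) for i in range(len(doc))]
```

Each resulting dict could then be vectorized and passed to any supervised learner; which learner and which features work best is exactly what the reported experiments evaluate.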
Index Terms
- A baseline feature set for learning rhetorical zones using full articles in the biomedical domain