
A baseline feature set for learning rhetorical zones using full articles in the biomedical domain

Authors:
Tony Mullen, National Institute of Informatics, Chiyoda-ku, Tokyo, Japan
Yoko Mizuta, National Institute of Informatics, Chiyoda-ku, Tokyo, Japan
Nigel Collier, National Institute of Informatics, Chiyoda-ku, Tokyo, Japan

ACM SIGKDD Explorations Newsletter, Volume 7, Issue 1, June 2005, pp. 52–58. https://doi.org/10.1145/1089815.1089823

Published: 01 June 2005

Abstract

At a time when experimental throughput in the field of molecular biology is increasing, it is necessary for biologists and people working in related fields to have access to sophisticated tools to enable them to efficiently process large amounts of information in order to stay abreast of current research.

Rhetorical zone analysis is an application of natural language processing in which areas of text in scientific papers are classified in terms of argumentation and intellectual contribution in order to pinpoint and distinguish certain types of information. Such analysis can be employed to assist in information extraction, helping to assess and integrate data generated by experiments into the scientific community's store of knowledge.

We present results for several experiments in automatic zone identification on the ZAISA-1 dataset, a new dataset composed of full biomedical research papers hand-annotated for rhetorical zones. We concentrate on general-purpose and linguistically motivated features, and report results for a variety of sets of features. It is our intention to provide a baseline feature set for modeling, which can be extended in future work using combinations of heuristics and more sophisticated and task-specific modeling techniques.
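One simple way to make zone identification concrete is to treat each sentence as the unit of classification: map it to a feature vector and let a classifier assign it one zone. The sketch below is only an illustration under stated assumptions. The three zone labels are hypothetical (not the ZAISA-1 tag set), the feature set (word unigrams, word bigrams, and a coarse position-in-article bucket) is a hypothetical example of general-purpose features rather than the authors' feature set, and a multiclass perceptron stands in for whatever classifier one would actually use, chosen purely so the example runs with the Python standard library. The toy sentences are invented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Features = Dict[str, float]

# Hypothetical zone labels for illustration only; NOT the ZAISA-1 tag set.
ZONES = ["BACKGROUND", "METHOD", "RESULT"]

# Invented toy sentences: (text, relative position in article, zone).
TOY_SENTENCES = [
    ("Protein kinases regulate many cellular processes.", 0.05, "BACKGROUND"),
    ("Previous studies have linked this pathway to disease.", 0.10, "BACKGROUND"),
    ("Cells were incubated with the inhibitor for two hours.", 0.40, "METHOD"),
    ("Samples were analysed by western blotting.", 0.45, "METHOD"),
    ("We found that phosphorylation increased threefold.", 0.70, "RESULT"),
    ("These results show that the inhibitor blocks activation.", 0.80, "RESULT"),
]


def extract_features(sentence: str, position: float) -> Features:
    """Hypothetical general-purpose features for one sentence.

    `position` is the sentence's relative location in the article, in [0, 1].
    """
    tokens = [t.strip(".,;:()").lower() for t in sentence.split()]
    tokens = [t for t in tokens if t]
    feats: Features = {"BIAS": 1.0}
    for tok in tokens:
        feats["w=" + tok] = 1.0  # bag-of-words unigram
    for a, b in zip(tokens, tokens[1:]):
        feats["bi=" + a + "_" + b] = 1.0  # adjacent word pair, e.g. "we_found"
    # Coarse location cue: which quarter of the article the sentence falls in.
    feats["pos_bucket=" + str(min(int(position * 4), 3))] = 1.0
    return feats


class ZonePerceptron:
    """Multiclass perceptron with one sparse weight vector per zone label."""

    def __init__(self, labels: List[str]) -> None:
        self.labels = labels
        self.weights: Dict[str, Dict[str, float]] = {
            label: defaultdict(float) for label in labels
        }

    def score(self, feats: Features, label: str) -> float:
        w = self.weights[label]
        # .get() avoids inserting unseen features into the defaultdict.
        return sum(w.get(f, 0.0) * v for f, v in feats.items())

    def predict(self, feats: Features) -> str:
        # max() keeps the first maximal label, so ties go to the earlier label.
        return max(self.labels, key=lambda label: self.score(feats, label))

    def train(self, data: List[Tuple[Features, str]], epochs: int = 10) -> None:
        for _ in range(epochs):
            for feats, gold in data:
                guess = self.predict(feats)
                if guess != gold:
                    # Move weights toward the gold zone and away from the guess.
                    for f, v in feats.items():
                        self.weights[gold][f] += v
                        self.weights[guess][f] -= v


if __name__ == "__main__":
    data = [(extract_features(s, p), z) for s, p, z in TOY_SENTENCES]
    model = ZonePerceptron(ZONES)
    model.train(data, epochs=10)
    new = extract_features("Lysates were analysed by mass spectrometry.", 0.42)
    print(model.predict(new))
```

Running the script trains on the six toy sentences and prints a predicted zone for an unseen sentence. The position bucket is included on the assumption that, in many papers, where a sentence sits in the article correlates with its rhetorical role; with real annotated data one would measure performance on held-out articles rather than on the training sentences, and compare feature subsets in the way the abstract describes.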



Published in
ACM SIGKDD Explorations Newsletter, Volume 7, Issue 1 (Natural language processing and text mining), June 2005
81 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/1089815

Copyright © 2005 Authors
Publisher: Association for Computing Machinery, New York, NY, United States

Article Metrics
- Total Citations: 14
- Total Downloads: 294
- Downloads (Last 12 months): 1
- Downloads (Last 6 weeks): 0
