Protein name tagging for biomedical annotation in text

Authors:
Kaoru Yamamoto (The Institute of Physical and Chemical Research, Suehiro-cho, Tsurumi-ku, Yokohama, Japan)
Taku Kudo (Nara Institute of Science and Technology, Ikoma, Nara, Japan)
Akihiko Konagaya (The Institute of Physical and Chemical Research, Suehiro-cho, Tsurumi-ku, Yokohama, Japan)
Yuji Matsumoto (Nara Institute of Science and Technology, Ikoma, Nara, Japan)

BioMed '03: Proceedings of the ACL 2003 workshop on Natural language processing in biomedicine - Volume 13, July 2003, Pages 65–72
https://doi.org/10.3115/1118958.1118967

Published: 11 July 2003

ABSTRACT

We explore the use of morphological analysis as preprocessing for protein name tagging. Our method finds protein names by chunking over morphemes, the smallest units determined by the morphological analysis. This helps to recognize the exact boundaries of protein names. Moreover, our morphological analyzer can deal with compounds, which offers a simple way to adapt name descriptions from biomedical resources for language processing. Using GENIA corpus 3.01, our method attains an F-score of 70 points for protein molecule names and 75 points for protein names including molecules, families, and domains.
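
To illustrate why chunking over units smaller than whitespace-delimited words can help with exact boundaries, here is a minimal Python sketch. It is not the authors' analyzer or tagger: `split_units` is a hypothetical regex stand-in for a morphological analyzer, the B/I/O tags are written by hand rather than predicted, the example phrase is invented for illustration, and `f_score` assumes exact-match scoring of spans, which the abstract does not specify.

```python
import re

# Hypothetical stand-in for a morphological analyzer (an assumption, not the
# paper's tool): runs of letters/digits form one unit, and every other
# non-space character (hyphen, slash, ...) becomes a unit of its own.
_UNIT = re.compile(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]")


def split_units(text):
    """Split text into fine-grained units, finer than whitespace tokens."""
    return _UNIT.findall(text)


def decode_bio(tags):
    """Turn a B/I/O tag sequence into (start, end) unit spans, end exclusive.

    An "I" tag with no open chunk is ignored.
    """
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def f_score(predicted, gold):
    """Balanced F-score over exact-match spans (scoring scheme assumed)."""
    true_pos = len(set(predicted) & set(gold))
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


# Invented example: the name "SLP-76" sits inside one whitespace token.
units = split_units("the SLP-76-associated protein")
tags = ["O", "B", "I", "I", "O", "O", "O"]  # hand-written, one tag per unit
spans = decode_bio(tags)
names = ["".join(units[s:e]) for s, e in spans]
```

With whitespace tokenization, "SLP-76-associated" would be a single token, so a tagger could only mark the whole token or none of it. Splitting it into smaller units lets the chunk end exactly after "76".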

Publisher: Association for Computational Linguistics, United States
Proceedings length: 131 pages