DOI: 10.3115/1118958.1118967

Protein name tagging for biomedical annotation in text

Published: 11 July 2003

ABSTRACT

We explore the use of morphological analysis as preprocessing for protein name tagging. Our method finds protein names by chunking over morphemes, the smallest units determined by the morphological analysis, which helps to recognize the exact boundaries of protein names. Moreover, our morphological analyzer can handle compounds, which offers a simple way to adapt name descriptions from biomedical resources for language processing. On the GENIA corpus 3.01, our method attains an F-score of 70 points for protein molecule names and 75 points for protein names including molecules, families, and domains.
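As a rough illustration of how chunk-level scoring with exact boundaries might work, the sketch below extracts name spans from a tag sequence over morphemes and computes an F-score that credits only spans whose start and end both match. The B/I/O tag scheme, the example morpheme segmentation, and the function names `extract_chunks` and `f_score` are assumptions made for illustration; the abstract does not specify the tag set or the evaluation code used.

```python
def extract_chunks(tags):
    """Return the set of (start, end) spans, end exclusive, marked by B/I/O tags."""
    chunks = set()
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new chunk begins; close any chunk that is still open.
            if start is not None:
                chunks.add((start, i))
            start = i
        elif tag == "I":
            # Tolerate an I with no preceding B by opening a chunk here.
            if start is None:
                start = i
        else:  # "O": outside any chunk
            if start is not None:
                chunks.add((start, i))
                start = None
    if start is not None:
        chunks.add((start, len(tags)))
    return chunks


def f_score(gold_tags, pred_tags):
    """Exact-match F-score: a predicted span counts only if both boundaries agree."""
    gold = extract_chunks(gold_tags)
    pred = extract_chunks(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical morpheme segmentation of "NF-kappaB activates IL-2".
morphemes = ["NF", "-", "kappa", "B", "activates", "IL", "-", "2"]
gold = ["B", "I", "I", "I", "O", "B", "I", "I"]
pred = ["B", "I", "I", "I", "O", "O", "O", "B"]

print(sorted(extract_chunks(gold)))
print(f_score(gold, pred))
```

In this toy case the second predicted span covers only the final morpheme of the gold name, so it earns no credit under exact matching. Tagging at the morpheme level rather than the whitespace-token level is what makes boundaries inside a compound representable in the first place.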

Published in
BioMed '03: Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine - Volume 13
July 2003, 131 pages
Publisher: Association for Computational Linguistics, United States