DOI: 10.3115/977035.977081

The GENIA project: corpus-based knowledge acquisition and information extraction from genome research papers

Published: 08 June 1999

ABSTRACT

We present an outline of the genome information acquisition (GENIA) project for automatically extracting biochemical information from journal papers and abstracts. GENIA will be available over the Internet and is designed to aid in information extraction, retrieval and visualisation and to help reduce information overload on researchers. The vast repository of papers available online in databases such as MEDLINE is a natural environment in which to develop language engineering methods and tools and is an opportunity to show how language engineering can play a key role on the Internet.

      • Published in

        EACL '99: Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics
        June 1999
        310 pages

        Publisher

        Association for Computational Linguistics

        United States

        Publication History

        • Published: 8 June 1999

        Qualifiers

        • Article

        Acceptance Rates

        Overall Acceptance Rate: 100 of 360 submissions, 28%
