Article

The GENIA project: corpus-based knowledge acquisition and information extraction from genome research papers

Authors:
Nigel Collier, Hyun Seok Park, Norihiro Ogata, Yuka Tateishi, Chikashi Nobata, Tomoko Ohta, Tateshi Sekimizu, Hisao Imai, Katsutoshi Ibushi, Jun-ichi Tsujii

Affiliation (all authors): University of Tokyo, Hongo, Bunkyo-ku, Tokyo, Japan

EACL '99: Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics, June 1999, pages 271–272
https://doi.org/10.3115/977035.977081

Published: 8 June 1999

ABSTRACT

We present an outline of the genome information acquisition (GENIA) project for automatically extracting biochemical information from journal papers and abstracts. GENIA will be available over the Internet and is designed to aid in information extraction, retrieval and visualisation and to help reduce information overload for researchers. The vast repository of papers available online in databases such as MEDLINE is a natural environment in which to develop language engineering methods and tools, and it offers an opportunity to show how language engineering can play a key role on the Internet.

Published in: EACL '99: Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics, June 1999, 310 pages
Program Chairs: Henry S. Thompson (University of Edinburgh), Alex Lascarides (University of Edinburgh)
Publisher: Association for Computational Linguistics, United States
Qualifiers: Article
Conference acceptance rate: 100 of 360 submissions overall (28%)

Article Metrics: 21 total citations; 467 total downloads (25 in the last 12 months, 3 in the last 6 weeks)