Published in: BMC Medical Informatics and Decision Making 1/2012

Open Access 01-12-2012 | Proceedings

Semantic text mining support for lignocellulose research

Authors: Marie-Jean Meurs, Caitlin Murphy, Ingo Morgenstern, Greg Butler, Justin Powlowski, Adrian Tsang, René Witte

Published in: BMC Medical Informatics and Decision Making | Special Issue 1/2012

Abstract

Background

Biofuels produced from biomass are considered to be promising sustainable alternatives to fossil fuels. The conversion of lignocellulose into fermentable sugars for biofuel production requires the use of enzyme cocktails that can efficiently and economically hydrolyze lignocellulosic biomass. As many fungi naturally break down lignocellulose, the identification and characterization of the enzymes involved is a key challenge in the research and development of biomass-derived products and fuels. One approach to meeting this challenge is to mine the rapidly expanding repertoire of microbial genomes for enzymes with the appropriate catalytic properties.

Results

Semantic technologies, including natural language processing, ontologies, semantic Web services and Web-based collaboration tools, promise to support users in handling complex data, thereby facilitating knowledge-intensive tasks. An ongoing challenge is to select the appropriate technologies and combine them in a coherent system that brings measurable improvements to the users. We present our ongoing development of a semantic infrastructure in support of genomics-based lignocellulose research. Part of this effort is the automated curation of knowledge from information on fungal enzymes that is available in the literature and genome resources.

Conclusions

Working closely with fungal biology researchers who manually curate the existing literature, we developed ontological natural language processing pipelines integrated in a Web-based interface to assist them in two main tasks: mining the literature for relevant knowledge, and at the same time providing rich and semantically linked information.
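
To make the idea of an ontology-backed text mining step more concrete, the following is a minimal sketch of dictionary (gazetteer) based tagging of enzyme and organism mentions, each linked to an external identifier. It is not the authors' pipeline: the `Annotation` class, the `GAZETTEER` entries, and the `annotate` function are hypothetical names introduced only for illustration, and a real system would also need to handle name variants, abbreviations, and ambiguity.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A text span linked to a concept class and an external identifier."""
    start: int
    end: int
    text: str
    concept: str      # illustrative class name, e.g. "Enzyme" or "Organism"
    identifier: str   # external grounding, e.g. an EC number or taxonomy ID


# Hypothetical mini-gazetteer: lower-cased surface form -> (concept, identifier).
# The entries are illustrative assumptions, not taken from the article.
GAZETTEER = {
    "endoglucanase": ("Enzyme", "EC 3.2.1.4"),
    "beta-glucosidase": ("Enzyme", "EC 3.2.1.21"),
    "aspergillus niger": ("Organism", "NCBI Taxonomy 5061"),
    "trichoderma reesei": ("Organism", "NCBI Taxonomy 51453"),
}


def annotate(text):
    """Return gazetteer matches in `text`, ordered by start offset."""
    annotations = []
    for term, (concept, identifier) in GAZETTEER.items():
        # Case-insensitive, whole-word match of the dictionary term.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            annotations.append(
                Annotation(match.start(), match.end(), match.group(0),
                           concept, identifier)
            )
    return sorted(annotations, key=lambda a: a.start)


sentence = "An endoglucanase from Aspergillus niger was purified and characterized."
for annotation in annotate(sentence):
    print(annotation.concept, annotation.text, annotation.identifier)
```

Keeping the character offsets alongside each identifier is what would let a Web interface highlight a mention in the text and link it to the corresponding database entry.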
Metadata
Title
Semantic text mining support for lignocellulose research
Authors
Marie-Jean Meurs
Caitlin Murphy
Ingo Morgenstern
Greg Butler
Justin Powlowski
Adrian Tsang
René Witte
Publication date
01-12-2012
Publisher
BioMed Central
DOI
https://doi.org/10.1186/1472-6947-12-S1-S5