Published in: BMC Medical Informatics and Decision Making 1/2012

Open Access 01-12-2012 | Technical advance

Mining biomarker information in biomedical literature

Authors: Erfan Younesi, Luca Toldo, Bernd Müller, Christoph M Friedrich, Natalia Novac, Alexander Scheer, Martin Hofmann-Apitius, Juliane Fluck

Abstract

Background

For the selection and evaluation of potential biomarkers, inclusion of already published information is of utmost importance. In spite of significant advancements in text- and data-mining techniques, the vast knowledge space of biomarkers in biomedical text has remained unexplored. Existing named entity recognition approaches are not sufficiently selective for the retrieval of biomarker information from the literature. The purpose of this study was to identify textual features that enhance the effectiveness of biomarker information retrieval for different indication areas and diverse end-user perspectives.

Methods

A biomarker terminology was created and further organized into six concept classes. Performance of this terminology was optimized towards balanced selectivity and specificity. The information retrieval performance using the biomarker terminology was evaluated based on various combinations of the terminology's six classes. Further validation of these results was performed on two independent corpora representing two different neurodegenerative diseases.
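
The evaluation over class combinations described above can be illustrated with a minimal sketch. It assumes each abstract has already been tagged with the concept classes found in it and carries a manual relevance label. The class names, the toy corpus, and the functions `evaluate` and `rank_combinations` are hypothetical and are not taken from the study; precision, recall, and F1 stand in here for whatever measures the authors actually reported.

```python
from itertools import combinations

# Hypothetical toy corpus: the concept classes detected in each abstract
# plus a manual relevance label. None of these values come from the paper.
CORPUS = [
    {"classes": {"clinical_management", "alteration_evidence"}, "relevant": True},
    {"classes": {"clinical_management"}, "relevant": False},
    {"classes": {"alteration_evidence", "diagnostic_method"}, "relevant": True},
    {"classes": set(), "relevant": False},
    {"classes": {"clinical_management", "alteration_evidence", "diagnostic_method"},
     "relevant": True},
]


def evaluate(required, corpus):
    """Return (precision, recall, F1) when retrieving every abstract
    that contains all of the required concept classes."""
    retrieved = [doc for doc in corpus if required <= doc["classes"]]
    true_positives = sum(doc["relevant"] for doc in retrieved)
    n_relevant = sum(doc["relevant"] for doc in corpus)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / n_relevant if n_relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def rank_combinations(class_names, corpus):
    """Score every non-empty combination of classes and sort by F1, best first."""
    results = []
    for size in range(1, len(class_names) + 1):
        for combo in combinations(sorted(class_names), size):
            results.append((combo, *evaluate(set(combo), corpus)))
    return sorted(results, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    names = {"clinical_management", "alteration_evidence", "diagnostic_method"}
    for combo, p, r, f in rank_combinations(names, CORPUS):
        print(f"{' + '.join(combo):65s} P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Requiring more classes generally narrows the retrieved set, so a ranking of this kind makes the trade-off between the two measures visible for each combination.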

Results

The current state of the biomarker terminology contains 119 entity classes supported by 1890 different synonyms. The information retrieval results show an improved retrieval rate of informative abstracts, which is achieved by including clinical management terms and evidence of gene/protein alterations (e.g. gene/protein expression status or certain polymorphisms) in combination with disease and gene name recognition. When additional filtering through other classes (e.g. diagnostic or prognostic methods) is applied, the typically high number of unspecific search results is significantly reduced. The evaluation results suggest that this approach enables the automated identification of biomarker information in the literature. A demo version of the search engine SCAIView, including the biomarker retrieval, is made available to the public through http://www.scaiview.com/scaiview-academia.html.
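
As a rough illustration of how a synonym-backed terminology can be used to filter abstracts, the following sketch tags text with concept classes by dictionary lookup and keeps only abstracts in which all required classes occur. It is not the SCAIView implementation: the two classes, their synonyms, and the function names are assumptions made for the example, and plain regular-expression matching stands in for the named entity recognition used in the study.

```python
import re

# Hypothetical mini-terminology: concept class -> synonyms.
# The real terminology has 119 entity classes and 1890 synonyms;
# these entries are invented for illustration only.
TERMINOLOGY = {
    "alteration_evidence": ["overexpression", "polymorphism", "down-regulated"],
    "clinical_management": ["prognosis", "diagnosis", "treatment response"],
}


def compile_terminology(terminology):
    """Build one case-insensitive, word-bounded pattern per concept class."""
    compiled = {}
    for cls, synonyms in terminology.items():
        # Longest synonyms first so multi-word terms are tried before shorter ones.
        ordered = sorted(synonyms, key=len, reverse=True)
        alternatives = "|".join(re.escape(s) for s in ordered)
        compiled[cls] = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return compiled


def tag_classes(text, compiled):
    """Return the set of concept classes with at least one synonym in the text."""
    return {cls for cls, pattern in compiled.items() if pattern.search(text)}


def filter_abstracts(abstracts, required, compiled):
    """Keep abstracts in which every required concept class is present."""
    return [a for a in abstracts if required <= tag_classes(a, compiled)]


if __name__ == "__main__":
    compiled = compile_terminology(TERMINOLOGY)
    abstracts = [
        "An APOE polymorphism is associated with poor prognosis in Alzheimer's disease.",
        "The protein was purified by affinity chromatography.",
    ]
    required = {"alteration_evidence", "clinical_management"}
    for hit in filter_abstracts(abstracts, required, compiled):
        print(hit)
```

Exact matching of this kind misses inflected forms such as plurals, which is one reason a curated synonym list per class matters.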

Conclusions

The approach presented in this paper demonstrates that using a dedicated biomarker terminology for automated analysis of the scientific literature may be helpful as an aid to finding biomarker information in text. Successful extraction of candidate biomarker information from published resources can be considered the first step towards developing novel hypotheses. These hypotheses will be valuable for early decision-making in the drug discovery and development process.
Metadata
Title
Mining biomarker information in biomedical literature
Authors
Erfan Younesi
Luca Toldo
Bernd Müller
Christoph M Friedrich
Natalia Novac
Alexander Scheer
Martin Hofmann-Apitius
Juliane Fluck
Publication date
01-12-2012
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2012
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/1472-6947-12-148