BMC Medical Informatics and Decision Making

Open Access 01-12-2020 | Colorectal Cancer | Research article

Identification of most influential co-occurring gene suites for gastrointestinal cancer using biomedical literature mining and graph-based influence maximization

Authors: Charles C. N. Wang, Jennifer Jin, Jan-Gowth Chang, Masahiro Hayakawa, Atsushi Kitazawa, Jeffrey J. P. Tsai, Phillip C.-Y. Sheu

Published in: BMC Medical Informatics and Decision Making | Issue 1/2020

Abstract

Background

Gastrointestinal (GI) cancers, including colorectal, gastric, and pancreatic cancer, are among the most frequently diagnosed malignancies each year and represent a major public health problem worldwide.

Methods

This paper reports an assisted curation pipeline for identifying potentially influential genes in gastrointestinal cancer. The pipeline mines the biomedical literature and recognizes named entities with a Bi-LSTM-CNN-CRF model. The recognized entities and their associations are used to construct a graph, from which an influence maximization algorithm computes the most influential sets of co-occurring genes.
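The graph and influence maximization steps can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the named entity recognition step has already produced a list of gene mentions per document, it assumes an edge's activation probability scales with the co-occurrence count (the `max_prob` parameter is hypothetical), and it swaps in a standard greedy search over Monte Carlo simulations of the independent cascade model. The gene labels and documents are toy placeholders, not data from the paper.

```python
import random
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(docs, max_prob=0.5):
    """Link genes mentioned in the same document; weight edges by count."""
    counts = defaultdict(int)
    for genes in docs:
        for a, b in combinations(sorted(set(genes)), 2):
            counts[(a, b)] += 1
    top = max(counts.values(), default=1)
    graph = defaultdict(dict)
    for (a, b), n in counts.items():
        # Assumption: activation probability scales with co-occurrence count.
        prob = max_prob * n / top
        graph[a][b] = prob
        graph[b][a] = prob
    return dict(graph)


def simulate_cascade(graph, seeds, rng):
    """Run one independent cascade and return the number of active nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, prob in graph.get(u, {}).items():
                # Each newly active node gets one chance per neighbour.
                if v not in active and rng.random() < prob:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def expected_spread(graph, seeds, runs=200, seed=0):
    """Estimate the mean cascade size by Monte Carlo simulation."""
    rng = random.Random(seed)
    total = sum(simulate_cascade(graph, seeds, rng) for _ in range(runs))
    return total / runs


def greedy_seed_set(graph, k, runs=200):
    """Greedily add the node that most increases the estimated spread."""
    chosen = []
    for _ in range(min(k, len(graph))):
        candidates = [n for n in graph if n not in chosen]
        best = max(
            candidates,
            key=lambda n: expected_spread(graph, chosen + [n], runs),
        )
        chosen.append(best)
    return chosen


# Toy input: each inner list stands for the genes tagged in one document.
toy_docs = [["A", "B"], ["A", "C"], ["A", "D"], ["A", "E"], ["X", "Y"]]
toy_graph = build_cooccurrence_graph(toy_docs)
print(greedy_seed_set(toy_graph, k=2))
```

The greedy search re-estimates the spread for every candidate at every step, which is simple but slow on large graphs; a real literature-derived graph would likely need a lazy-evaluation or sketch-based variant.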

Results

The most influential sets of co-occurring genes that we identified are RARA - CRBP1, CASP3 - BCL2, BCL2 - CASP3 - CRBP1, RARA - CASP3 - CRBP1, FOXJ1 - RASSF3 - ESR1, FOXJ1 - RASSF1A - ESR1, and FOXJ1 - RASSF1A - TNFAIP8 - ESR1. Using TCGA data together with functional and pathway enrichment analysis, we show that the proposed approach works well in the context of gastrointestinal cancer.

Conclusions

Our pipeline uses text mining to identify objects and relationships, constructs a graph from them, and applies graph-based influence maximization to discover the most influential co-occurring genes. It presents a viable direction for assisting knowledge discovery in clinical applications.

Title: Identification of most influential co-occurring gene suites for gastrointestinal cancer using biomedical literature mining and graph-based influence maximization
Authors: Charles C. N. Wang, Jennifer Jin, Jan-Gowth Chang, Masahiro Hayakawa, Atsushi Kitazawa, Jeffrey J. P. Tsai, Phillip C.-Y. Sheu
Publication date: 01-12-2020
Publisher: BioMed Central
Keywords: Colorectal Cancer
Published in: BMC Medical Informatics and Decision Making / Issue 1/2020
Electronic ISSN: 1472-6947
DOI: https://doi.org/10.1186/s12911-020-01227-6