Published in: BMC Medical Informatics and Decision Making 3/2020

Open Access 01-07-2020 | Hepatocellular Carcinoma | Research

KGHC: a knowledge graph for hepatocellular carcinoma

Authors: Nan Li, Zhihao Yang, Ling Luo, Lei Wang, Yin Zhang, Hongfei Lin, Jian Wang

Published in: BMC Medical Informatics and Decision Making | Special Issue 3/2020


Abstract

Background

Hepatocellular carcinoma is one of the most common malignant neoplasms in adults and has a high mortality rate. Mining relevant medical knowledge from rapidly growing text data and integrating it with existing biomedical resources can support research on hepatocellular carcinoma. To this end, we constructed a knowledge graph for hepatocellular carcinoma (KGHC).

Methods

We propose an approach to building a knowledge graph for hepatocellular carcinoma. First, we extracted knowledge from both structured and unstructured data. Because the extracted entities may contain noise, we applied a biomedical information extraction system, BioIE, to filter the data in KGHC. We then introduced a fusion method to merge the extracted data. Finally, we stored the data in Neo4j, which helps researchers analyze the network of hepatocellular carcinoma.
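The fusion and storage steps can be illustrated with a minimal sketch. The abstract does not describe how the fusion method works, so this example substitutes simple name normalization for it, and it does not model the BioIE filtering step at all. The function names, the `Entity` label, the relation names, and the sample triples are hypothetical. The sketch only builds parameterized Cypher `MERGE` statements as strings; running them against a Neo4j instance would need a driver and a server, which are not shown.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so spelling variants compare equal."""
    return " ".join(name.lower().split())


def fuse_triples(triples):
    """Merge triples that match after normalization (an assumed stand-in
    for the paper's fusion method, which the abstract does not specify).

    Returns a dict mapping each fused (head, relation, tail) triple to the
    sorted list of sources that reported it.
    """
    fused = defaultdict(set)
    for head, relation, tail, source in triples:
        key = (normalize(head), relation.upper(), normalize(tail))
        fused[key].add(source)
    return {key: sorted(sources) for key, sources in fused.items()}


def to_cypher(triple):
    """Render one fused triple as a Cypher MERGE statement plus parameters."""
    head, relation, tail = triple
    # Relationship types cannot be passed as Cypher parameters, so the
    # relation name is validated before being placed in the query text.
    if not re.fullmatch(r"[A-Z_][A-Z0-9_]*", relation):
        raise ValueError(f"unsafe relation type: {relation!r}")
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{relation}]->(t)"
    )
    return query, {"head": head, "tail": tail}


# Hypothetical triples: (head, relation, tail, source). Not taken from KGHC.
raw_triples = [
    ("Sorafenib", "treats", "Hepatocellular Carcinoma", "structured"),
    ("sorafenib", "TREATS", "hepatocellular  carcinoma", "literature"),
    ("TP53", "associated_with", "Hepatocellular carcinoma", "literature"),
]

fused = fuse_triples(raw_triples)
statements = [to_cypher(triple) for triple in fused]

for query, params in statements:
    print(query, params)
```

Using `MERGE` rather than `CREATE` keeps the load idempotent: re-running the statements for an already stored triple does not create duplicate nodes or relationships.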

Results

KGHC contains 13,296 triples and provides knowledge about hepatocellular carcinoma to healthcare professionals, sparing them from searching through a large body of biomedical literature. This could improve the efficiency of research on hepatocellular carcinoma. KGHC is freely accessible for academic research purposes at http://202.118.75.18:18895/browser/.

Conclusions

In this paper, we present a knowledge graph for hepatocellular carcinoma, constructed from large amounts of structured and unstructured data. The evaluation results show that the data in KGHC are of high quality.

Metadata
Title: KGHC: a knowledge graph for hepatocellular carcinoma
Authors: Nan Li, Zhihao Yang, Ling Luo, Lei Wang, Yin Zhang, Hongfei Lin, Jian Wang
Publication date: 01-07-2020
Publisher: BioMed Central
DOI: https://doi.org/10.1186/s12911-020-1112-5
