Published in: BMC Medical Informatics and Decision Making 2/2018

Open Access 01-07-2018 | Research

An ontology-guided semantic data integration framework to support integrative data analysis of cancer survival

Authors: Hansi Zhang, Yi Guo, Qian Li, Thomas J. George, Elizabeth Shenkman, François Modave, Jiang Bian

Published in: BMC Medical Informatics and Decision Making | Special Issue 2/2018


Abstract

Background

Cancer is the second leading cause of death in the United States, exceeded only by heart disease. Extant cancer survival analyses have primarily focused on individual-level factors because the data available from any single source are limited. There is a need to integrate data from different sources to simultaneously study as many risk factors as possible. Thus, we proposed an ontology-based approach to integrating heterogeneous datasets that addresses key data integration challenges.

Methods

Following best practices in ontology engineering, we created the Ontology for Cancer Research Variables (OCRV) by adapting existing semantic resources such as the National Cancer Institute (NCI) Thesaurus. Using the global-as-view data integration approach, we created mapping axioms to link the data elements in different sources to OCRV. Building on the Ontop platform, we implemented a data integration pipeline that uses semantic queries to query, extract, and transform data in relational databases into a pooled dataset that meets the needs of the downstream multi-level Integrative Data Analysis (IDA).
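The global-as-view idea can be illustrated with a minimal sketch. It is not the authors' pipeline: Ontop answers SPARQL queries over mappings to an ontology, whereas this sketch substitutes plain SQL views over an in-memory SQLite database. The table names (`registry_a`, `registry_b`), column names, codings, and the global variables `patient_id`, `age_at_diagnosis`, and `sex` are all hypothetical stand-ins for OCRV terms, and the rows are made-up toy values.

```python
import sqlite3

# Global-as-view (GAV): each global variable is defined as a query over a source.
# All source schemas and global variable names below are hypothetical.
GAV_MAPPINGS = {
    "registry_a": (
        "SELECT pid AS patient_id, age_dx AS age_at_diagnosis, "
        "CASE sex WHEN 1 THEN 'male' WHEN 2 THEN 'female' END AS sex "
        "FROM registry_a"
    ),
    "registry_b": (
        "SELECT patient AS patient_id, dx_age AS age_at_diagnosis, "
        "CASE gender WHEN 'M' THEN 'male' WHEN 'F' THEN 'female' END AS sex "
        "FROM registry_b"
    ),
}


def build_sources():
    """Create two toy source tables with different column names and codings."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE registry_a (pid TEXT, age_dx INTEGER, sex INTEGER);
        CREATE TABLE registry_b (patient TEXT, dx_age INTEGER, gender TEXT);
        INSERT INTO registry_a VALUES ('A1', 63, 1), ('A2', 58, 2);
        INSERT INTO registry_b VALUES ('B1', 71, 'F');
        """
    )
    return conn


def pooled_dataset(conn):
    """Query the global schema; the mappings rewrite it into per-source SQL."""
    union = " UNION ALL ".join(GAV_MAPPINGS[name] for name in sorted(GAV_MAPPINGS))
    query = (
        "SELECT patient_id, age_at_diagnosis, sex "
        f"FROM ({union}) ORDER BY patient_id"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in pooled_dataset(build_sources()):
        print(row)
```

The point of the design is that the analyst queries only the shared vocabulary; differences in column names and value codings are confined to the mappings.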

Results

Based on our use cases in the cancer survival IDA, we created tailored ontological structures in OCRV to facilitate the data integration tasks. Specifically, we created a flexible framework that addresses key integration challenges by (1) using a shared, controlled vocabulary to make data understandable to both humans and computers, (2) explicitly modeling the semantic relationships so that the data can be computed and reasoned with, (3) linking patients to contextual and environmental factors through geographic variables, and (4) documenting the data manipulation and integration processes clearly in the ontologies.
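Point (3) can be pictured as a join on a shared geographic key. The sketch below is only an illustration under stated assumptions: the field names `county_code` and `smoking_rate`, the function `link_context`, and the toy values are hypothetical and do not come from the article.

```python
from typing import Dict, List


def link_context(patients: List[dict], county_factors: Dict[str, dict]) -> List[dict]:
    """Attach county-level factors to each patient via a shared geographic code.

    Patients whose county has no contextual record keep a None value, so the
    missing link stays visible instead of being silently dropped.
    """
    linked = []
    for patient in patients:
        context = county_factors.get(patient["county_code"], {})
        row = dict(patient)  # copy so the input records are not mutated
        row["smoking_rate"] = context.get("smoking_rate")
        linked.append(row)
    return linked


if __name__ == "__main__":
    # Toy, made-up records for illustration only.
    patients = [
        {"patient_id": "P1", "county_code": "C001"},
        {"patient_id": "P2", "county_code": "C999"},
    ]
    county_factors = {"C001": {"smoking_rate": 0.2}}
    for row in link_context(patients, county_factors):
        print(row)
```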

Conclusions

Using an ontology-based data integration approach not only standardizes the definitions of data variables through a common, controlled vocabulary, but also makes the semantic relationships among variables from different sources explicit and clear to all users of the same datasets. Such an approach resolves the ambiguity in variable selection, extraction, and integration processes and thus improves the reproducibility of the IDA.
Metadata
Title
An ontology-guided semantic data integration framework to support integrative data analysis of cancer survival
Authors
Hansi Zhang
Yi Guo
Qian Li
Thomas J. George
Elizabeth Shenkman
François Modave
Jiang Bian
Publication date
01-07-2018
Publisher
BioMed Central
DOI
https://doi.org/10.1186/s12911-018-0636-4
