Published in: BMC Medical Research Methodology 1/2016

Open Access 01-12-2016 | Research article

ODMedit: uniform semantic annotation for data integration in medicine based on a public metadata repository

Authors: Martin Dugas, Alexandra Meidt, Philipp Neuhaus, Michael Storck, Julian Varghese

Abstract

Background

The volume and complexity of patient data, especially in personalised medicine, are steadily increasing, both for clinical data and for genomic profiles. Typically, more than 1,000 items (e.g., laboratory values, vital signs, diagnostic tests) are collected per patient in clinical trials. In oncology, hundreds of mutations can potentially be detected for each patient by genomic profiling. Data integration from multiple sources therefore constitutes a key challenge for medical research and healthcare.

Methods

Semantic annotation of data elements can help identify matching data elements in different sources and thereby supports data integration. Millions of different annotations are required due to the semantic richness of patient data. These annotations should be uniform, i.e., two matching data elements should carry the same annotations. However, large terminologies such as SNOMED CT or UMLS do not provide uniform coding. The proposed approach is to develop semantic annotations of medical data elements based on a large-scale public metadata repository. To achieve uniform codes, semantic annotations are re-used whenever a matching data element is available in the metadata repository.
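The re-use principle can be illustrated with a minimal Python sketch. It is not ODMedit's implementation: the in-memory dictionary standing in for the metadata repository, the matching of data elements by normalised label, the function names `normalize` and `annotate`, and the placeholder codes (which are not real UMLS or SNOMED CT codes) are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Tuple

Codes = Tuple[str, ...]


def normalize(label: str) -> str:
    """Lower-case the label and collapse whitespace (assumed matching rule)."""
    return " ".join(label.lower().split())


def annotate(
    label: str,
    repository: Dict[str, Codes],
    propose_codes: Callable[[str], Iterable[str]],
) -> Tuple[Codes, bool]:
    """Return (codes, reused) for a data element.

    If a matching element already exists in the repository, its codes are
    re-used so that matching elements end up with identical annotations.
    Otherwise the newly proposed codes are stored for later re-use.
    """
    key = normalize(label)
    if key in repository:
        return repository[key], True
    codes = tuple(propose_codes(label))
    repository[key] = codes
    return codes, False


# Hypothetical repository content; the codes are placeholders.
repository: Dict[str, Codes] = {"body weight": ("C-EXAMPLE-1",)}

# A matching element exists, so the stored annotation wins over the proposal.
codes_a, reused_a = annotate("Body  Weight", repository, lambda _: ["C-OTHER"])

# No match yet: the proposed codes are stored in the repository.
codes_b, reused_b = annotate(
    "Systolic blood pressure", repository, lambda _: ["C-EXAMPLE-2", "C-EXAMPLE-3"]
)

# A later annotator proposing different codes still receives the stored ones.
codes_c, reused_c = annotate(
    "systolic blood pressure", repository, lambda _: ["C-DIFFERENT"]
)
```

In this sketch, uniformity follows from the lookup order: the repository is consulted before any new code is accepted, so the first annotation of a data element determines the codes for all later matches. A real tool would need a more robust notion of a "matching" data element than normalised string equality.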

Results

A web-based tool called ODMedit (https://odmeditor.uni-muenster.de/) was developed to create data models with uniform semantic annotations. It contains ~800,000 terms with semantic annotations which were derived from ~5,800 models from the portal of medical data models (MDM). The tool was successfully applied to manually annotate 22 forms with 292 data items from CDISC and to update 1,495 data models of the MDM portal.

Conclusion

Uniform manual semantic annotation of data models is feasible in principle, but requires a large-scale collaborative effort due to the semantic richness of patient data. A web-based tool for these annotations is available, which is linked to a public metadata repository.
Metadata
Title
ODMedit: uniform semantic annotation for data integration in medicine based on a public metadata repository
Authors
Martin Dugas
Alexandra Meidt
Philipp Neuhaus
Michael Storck
Julian Varghese
Publication date
01-12-2016
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2016
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/s12874-016-0164-9
