Published in: BMC Medical Informatics and Decision Making 1/2019

Open Access 01-12-2019 | Technical advance

The Generalized Data Model for clinical research

Authors: Mark D. Danese, Marc Halperin, Jennifer Duryea, Ryan Duryea

Abstract

Background

Most healthcare data sources store information within their own unique schemas, making reliable and reproducible research challenging. Consequently, researchers have adopted various data models to improve the efficiency of research. Transforming and loading data into these models is a labor-intensive process that can alter the semantics of the original data. Therefore, we created a data model with a hierarchical structure that simplifies the transformation process and minimizes data alteration.

Methods

Two design goals guided the construction of the tables and table relationships for the Generalized Data Model (GDM). The first was to focus on clinical codes in their original vocabularies to retain the original semantic representation of the data. The second was to retain the hierarchical information present in the original data while preserving provenance. The model was tested by transforming synthetic Medicare data; Surveillance, Epidemiology, and End Results data linked to Medicare claims; and electronic health records from the Clinical Practice Research Datalink. We also tested a subsequent transformation from the GDM into the Sentinel data model.

Results

The resulting data model contains 19 tables, with the Clinical Codes, Contexts, and Collections tables serving as the core of the model, and containing most of the clinical, provenance, and hierarchical information. In addition, a Mapping table allows users to apply an arbitrarily complex set of relationships among vocabulary elements to facilitate automated analyses.
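
As a rough illustration of the structure described above, the following Python sketch nests clinical codes inside contexts and contexts inside collections, and keeps a separate mapping lookup between vocabulary elements. It is a minimal sketch under stated assumptions, not the published GDM schema: the class fields, the `MAPPINGS` dictionary, the file name, and the example codes are all hypothetical and chosen only to show the idea.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ClinicalCode:
    vocabulary: str  # name of the original vocabulary (illustrative value below)
    code: str        # code kept exactly as it appears in the source data


@dataclass
class Context:
    source_file: str  # hypothetical provenance field: where the record came from
    codes: List[ClinicalCode] = field(default_factory=list)


@dataclass
class Collection:
    label: str  # hypothetical label, e.g. one claim or one visit
    contexts: List[Context] = field(default_factory=list)


# Hypothetical mapping entries relating elements of different vocabularies.
MAPPINGS: Dict[Tuple[str, str], List[Tuple[str, str]]] = {
    ("ICD-9-CM", "250.00"): [("ICD-10-CM", "E11.9")],
}


def codes_in(collection: Collection) -> List[Tuple[str, str]]:
    """Walk the hierarchy and return every (vocabulary, code) pair unchanged."""
    return [(c.vocabulary, c.code) for ctx in collection.contexts for c in ctx.codes]


def mapped(vocabulary: str, code: str) -> List[Tuple[str, str]]:
    """Look up related vocabulary elements; unknown codes map to nothing."""
    return MAPPINGS.get((vocabulary, code), [])


# Usage: one collection holding one context with a single original code.
claim = Collection(
    label="claim-1",
    contexts=[Context("carrier_claims.csv", [ClinicalCode("ICD-9-CM", "250.00")])],
)
for vocabulary, code in codes_in(claim):
    print(vocabulary, code, "->", mapped(vocabulary, code))
```

Keeping the mapping outside the stored records means the source codes are never overwritten, so an analysis can apply or revise mappings without re-transforming the data.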

Conclusions

The GDM offers researchers a simpler process for transforming data, clear data provenance, and a path for users to transform their data into other data models. The GDM is designed to retain hierarchical relationships among data elements as well as the original semantic representation of the data, ensuring consistency in protocol implementation as part of a complete data pipeline for researchers.
Metadata
Title
The Generalized Data Model for clinical research
Authors
Mark D. Danese
Marc Halperin
Jennifer Duryea
Ryan Duryea
Publication date
01-12-2019
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2019
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-019-0837-5
