Skip to main content
Top
Published in: BMC Medical Informatics and Decision Making 1/2017

Open Access 01-12-2017 | Technical advance

Dynamic-ETL: a hybrid approach for health data extraction, transformation and loading

Authors: Toan C. Ong, Michael G. Kahn, Bethany M. Kwan, Traci Yamashita, Elias Brandt, Patrick Hosokawa, Chris Uhrich, Lisa M. Schilling

Published in: BMC Medical Informatics and Decision Making | Issue 1/2017

Login to get access

Abstract

Background

Electronic health records (EHRs) contain detailed clinical data stored in proprietary formats with non-standard codes and structures. Participating in multi-site clinical research networks requires EHR data to be restructured and transformed into a common format and standard terminologies, and optimally linked to other data sources. The expertise and scalable solutions needed to transform data to conform to network requirements are beyond the scope of many health care organizations and there is a need for practical tools that lower the barriers of data contribution to clinical research networks.

Methods

We designed and implemented a health data transformation and loading approach, which we refer to as Dynamic ETL (Extraction, Transformation and Loading) (D-ETL), that automates part of the process through use of scalable, reusable and customizable code, while retaining manual aspects of the process that requires knowledge of complex coding syntax. This approach provides the flexibility required for the ETL of heterogeneous data, variations in semantic expertise, and transparency of transformation logic that are essential to implement ETL conventions across clinical research sharing networks. Processing workflows are directed by the ETL specifications guideline, developed by ETL designers with extensive knowledge of the structure and semantics of health data (i.e., “health data domain experts”) and target common data model.

Results

D-ETL was implemented to perform ETL operations that load data from various sources with different database schema structures into the Observational Medical Outcome Partnership (OMOP) common data model. The results showed that ETL rule composition methods and the D-ETL engine offer a scalable solution for health data transformation via automatic query generation to harmonize source datasets.

Conclusions

D-ETL supports a flexible and transparent process to transform and load health data into a target data model. This approach offers a solution that lowers technical barriers that prevent data partners from participating in research data networks, and therefore, promotes the advancement of comparative effectiveness research using secondary electronic health data.
Appendix
Available only for authorised users
Literature
2.
go back to reference Danaei G, Rodríguez LAG, Cantero OF, et al. Observational data for comparative effectiveness research: an emulation of randomised trials of statins and primary prevention of coronary heart disease. Stat Methods Med Res. 2013;22:70–96. doi:10.1177/0962280211403603.CrossRefPubMed Danaei G, Rodríguez LAG, Cantero OF, et al. Observational data for comparative effectiveness research: an emulation of randomised trials of statins and primary prevention of coronary heart disease. Stat Methods Med Res. 2013;22:70–96. doi:10.​1177/​0962280211403603​.CrossRefPubMed
3.
go back to reference Institute of Medicine (U.S.) Committee on Improving the Patient Record. In: Dick RS, Steen EB, editors. The computer-based patient record: an essential technology for health care, revised edition. Washington, D.C.: National Academy Press; 1997. Institute of Medicine (U.S.) Committee on Improving the Patient Record. In: Dick RS, Steen EB, editors. The computer-based patient record: an essential technology for health care, revised edition. Washington, D.C.: National Academy Press; 1997.
8.
go back to reference Rassen JA, Schneeweiss S. Using high-dimensional propensity scores to automate confounding control in a distributed medical product safety surveillance system. Pharmacoepidemiol Drug Saf. 2012;21(Suppl 1):41–9. doi:10.1002/pds.2328.CrossRefPubMed Rassen JA, Schneeweiss S. Using high-dimensional propensity scores to automate confounding control in a distributed medical product safety surveillance system. Pharmacoepidemiol Drug Saf. 2012;21(Suppl 1):41–9. doi:10.​1002/​pds.​2328.CrossRefPubMed
9.
go back to reference Garbe E, Kloss S, Suling M, et al. High-dimensional versus conventional propensity scores in a comparative effectiveness study of coxibs and reduced upper gastrointestinal complications. Eur J Clin Pharmacol Published Online First: 5 July 2012. doi:10.1007/s00228-012-1334-2. Garbe E, Kloss S, Suling M, et al. High-dimensional versus conventional propensity scores in a comparative effectiveness study of coxibs and reduced upper gastrointestinal complications. Eur J Clin Pharmacol Published Online First: 5 July 2012. doi:10.​1007/​s00228-012-1334-2.
10.
go back to reference Fleischer NL, Fernald LC, Hubbard AE. Estimating the potential impacts of intervention from observational data: methods for estimating causal attributable risk in a cross-sectional analysis of depressive symptoms in Latin America. J Epidemiol Community Health. 2010;64:16–21. doi:10.1136/jech.2008.085985.CrossRefPubMed Fleischer NL, Fernald LC, Hubbard AE. Estimating the potential impacts of intervention from observational data: methods for estimating causal attributable risk in a cross-sectional analysis of depressive symptoms in Latin America. J Epidemiol Community Health. 2010;64:16–21. doi:10.​1136/​jech.​2008.​085985.CrossRefPubMed
11.
go back to reference Polsky D, Eremina D, Hess G, et al. The importance of clinical variables in comparative analyses using propensity-score matching: the case of ESA costs for the treatment of chemotherapy-induced anaemia. PharmacoEconomics. 2009;27:755–65. doi:10.2165/11313860-000000000-00000.CrossRefPubMed Polsky D, Eremina D, Hess G, et al. The importance of clinical variables in comparative analyses using propensity-score matching: the case of ESA costs for the treatment of chemotherapy-induced anaemia. PharmacoEconomics. 2009;27:755–65. doi:10.​2165/​11313860-000000000-00000.CrossRefPubMed
13.
go back to reference Jansen RG, Wiertz LF, Meyer ES, et al. Reliability analysis of observational data: problems, solutions, and software implementation. Behav Res Methods Instrum Comput. 2003;35:391–9.CrossRefPubMed Jansen RG, Wiertz LF, Meyer ES, et al. Reliability analysis of observational data: problems, solutions, and software implementation. Behav Res Methods Instrum Comput. 2003;35:391–9.CrossRefPubMed
16.
go back to reference Forrest CB, Margolis PA, Bailey LC, et al. PEDSnet: a National Pediatric Learning Health System. J Am Med Inform Assoc JAMIA Published Online First: 12 May 2014. doi: 10.1136/amiajnl-2014-002743. Forrest CB, Margolis PA, Bailey LC, et al. PEDSnet: a National Pediatric Learning Health System. J Am Med Inform Assoc JAMIA Published Online First: 12 May 2014. doi: 10.​1136/​amiajnl-2014-002743.
21.
go back to reference Schilling LM, Kwan B, Drolshagen C, et al. Scalable Architecture for Federated Translational Inquires Network (SAFTINet): technology infrastructure for a Distributed Data Network. EGEMs Gener. Evid. Methods Improve Patient Outcomes. 2013. in press. Schilling LM, Kwan B, Drolshagen C, et al. Scalable Architecture for Federated Translational Inquires Network (SAFTINet): technology infrastructure for a Distributed Data Network. EGEMs Gener. Evid. Methods Improve Patient Outcomes. 2013. in press.
26.
28.
go back to reference Kushanoor A, Murali Krishna S, Vidya Sagar Reddy T. ETL process modeling in DWH using enhanced quality techniques. Int J Database Theory Appl. 2013;6:179–98. Kushanoor A, Murali Krishna S, Vidya Sagar Reddy T. ETL process modeling in DWH using enhanced quality techniques. Int J Database Theory Appl. 2013;6:179–98.
29.
go back to reference Peek N, Holmes JH, Sun J. Technical challenges for big data in biomedicine and health: data sources, infrastructure, and analytics. Yearb Med Inform. 2014;9:42–7. doi:10.15265/IY-2014-0018 Peek N, Holmes JH, Sun J. Technical challenges for big data in biomedicine and health: data sources, infrastructure, and analytics. Yearb Med Inform. 2014;9:42–7. doi:10.​15265/​IY-2014-0018
30.
go back to reference Sandhu E, Weinstein S, McKethan A, et al. Secondary uses of electronic health record data: Benefits and barriers. Jt Comm J Qual Patient Saf. 2012;38:34–40. 1CrossRefPubMed Sandhu E, Weinstein S, McKethan A, et al. Secondary uses of electronic health record data: Benefits and barriers. Jt Comm J Qual Patient Saf. 2012;38:34–40. 1CrossRefPubMed
34.
36.
go back to reference Madhavan J, Jeffery SR, Cohen S, et al. Web-scale data integration: you can only afford to pay as you go. In: In Proc. of CIDR-07; 2007. Madhavan J, Jeffery SR, Cohen S, et al. Web-scale data integration: you can only afford to pay as you go. In: In Proc. of CIDR-07; 2007.
40.
go back to reference Rothstein MA. Currents in contemporary ethics. Research privacy under HIPAA and the common rule. J Law Med Ethics. 2005;33:154–9.CrossRefPubMed Rothstein MA. Currents in contemporary ethics. Research privacy under HIPAA and the common rule. J Law Med Ethics. 2005;33:154–9.CrossRefPubMed
43.
go back to reference Herdman R, Moses HL, United States, et al. Effect of the HIPAA privacy rule on health research: Proceedings of a workshop presented to the National Cancer Policy Forum. Washington D.C.: National Academies Press; 2006. Herdman R, Moses HL, United States, et al. Effect of the HIPAA privacy rule on health research: Proceedings of a workshop presented to the National Cancer Policy Forum. Washington D.C.: National Academies Press; 2006.
47.
go back to reference Kwan BM, Sills MR, Graham D, et al. Stakeholder Engagement in a Patient-Reported Outcomes (PRO) Measure Implementation: A Report from the SAFTINet Practice-based Research Network (PBRN). J Am Board Fam Med JABFM. 2016;29:102–15. doi:10.3122/jabfm.2016.01.150141. Kwan BM, Sills MR, Graham D, et al. Stakeholder Engagement in a Patient-Reported Outcomes (PRO) Measure Implementation: A Report from the SAFTINet Practice-based Research Network (PBRN). J Am Board Fam Med JABFM. 2016;29:102–15. doi:10.​3122/​jabfm.​2016.​01.​150141.
Metadata
Title
Dynamic-ETL: a hybrid approach for health data extraction, transformation and loading
Authors
Toan C. Ong
Michael G. Kahn
Bethany M. Kwan
Traci Yamashita
Elias Brandt
Patrick Hosokawa
Chris Uhrich
Lisa M. Schilling
Publication date
01-12-2017
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2017
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-017-0532-3

Other articles of this Issue 1/2017

BMC Medical Informatics and Decision Making 1/2017 Go to the issue