PS1-39: The Kaiser Permanente Northern California Oracle Research Database

  1. Jamila Gul, BS¹
  1. ¹Kaiser Northern California

Abstract

NCAL developed a pilot Research Database (RDB) in KPNC that contains clinical and administrative data for the 3.3 million current and 10 million past Health Plan members in the KPNC region, with some data spanning a 40-year period. The primary motivation for creating such a data warehouse is that researchers are often frustrated by the difficulty of finding, extracting, cleaning, interpreting, and transforming data into useful analytical datasets. That difficulty stems from the absence of an efficient research data infrastructure optimized for the analytical needs of researchers. The RDB aggregates clinical data from legacy systems, the current electronic health record (EHR) system, various public use datasets, and research datasets into a single research data warehouse.

Aims The specific aims of the project are:

  1. to develop a uniform schema with consistent data definitions and coding, using standardized terminology, that will harmonize aggregated legacy and current KP electronic health records,

  2. to develop interfaces and extract, transform and load (ETL) processes for the KP electronic medical record system and other clinical systems that will feed data to the NRDB,

  3. to implement a security model that complies with HIPAA, state regulations, and KP policies, and

  4. to develop a federated schema with the other KP regions, so that it will be possible to run federated queries against each region’s database and aggregate the results, while enabling each region to control access to its own data.
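The federated-query idea in aim 4 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `RegionNode` class, the region names, the user names, and the diagnosis rows are hypothetical and do not describe the KP schema or security model. Each region checks its own access list and returns only an aggregate count, and a coordinator sums the counts it receives.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class RegionNode:
    """One region's data plus its own access list (hypothetical structure)."""
    name: str
    rows: List[dict]
    allowed_users: Set[str] = field(default_factory=set)

    def count_where(self, user: str, predicate: Callable[[dict], bool]) -> Optional[int]:
        # Each region decides locally whether this user may query its data.
        if user not in self.allowed_users:
            return None
        # Only an aggregate count leaves the region, never row-level data.
        return sum(1 for row in self.rows if predicate(row))


def federated_count(
    regions: List[RegionNode], user: str, predicate: Callable[[dict], bool]
) -> Tuple[int, Dict[str, Optional[int]]]:
    """Run the same query in every region and aggregate the permitted results."""
    per_region = {r.name: r.count_where(user, predicate) for r in regions}
    total = sum(c for c in per_region.values() if c is not None)
    return total, per_region


# Made-up example data: three regions with different access lists.
regions = [
    RegionNode("region_a", [{"dx": "E11"}, {"dx": "I10"}, {"dx": "E11"}], {"alice"}),
    RegionNode("region_b", [{"dx": "E11"}, {"dx": "J45"}], {"alice", "bob"}),
    RegionNode("region_c", [{"dx": "E11"}], {"bob"}),
]

total, per_region = federated_count(regions, "alice", lambda row: row["dx"] == "E11")
print(total, per_region)
```

In this sketch a region that denies access reports `None` instead of a count, so the caller can see which regions contributed to the total.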

Methods We developed the RDB in Oracle. ETL processes for the various data sources were developed in SAS, Informatica, and PL/SQL.
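As a rough illustration of what an ETL step into a uniform schema might look like, the sketch below uses Python and an in-memory SQLite database in place of the tools named above. The source field names, the sex-code mapping, the `diagnosis` table, and the sample records are all hypothetical assumptions, not the actual RDB design. Two sources with different field names and codings are mapped into one table that records the code system explicitly.

```python
import sqlite3

# Hypothetical source extracts: field names and codings differ by system.
legacy_rows = [{"MRN": "001", "SEX_CD": "1", "DX": "250.00"}]
ehr_rows = [{"patient_id": "002", "sex": "female", "diagnosis": "E11.9"}]

# Assumed mapping of source-specific sex codes to one standard coding.
SEX_MAP = {"1": "M", "2": "F", "male": "M", "female": "F"}


def transform_legacy(row):
    """Map a legacy record to the uniform (hypothetical) target layout."""
    return (row["MRN"], SEX_MAP[row["SEX_CD"]], row["DX"], "ICD-9-CM", "legacy")


def transform_ehr(row):
    """Map an EHR record to the same uniform layout."""
    return (row["patient_id"], SEX_MAP[row["sex"]], row["diagnosis"], "ICD-10-CM", "ehr")


def load(conn, records):
    """Create the target table if needed and insert the harmonized records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS diagnosis ("
        "member_id TEXT, sex TEXT, dx_code TEXT, dx_code_system TEXT, source TEXT)"
    )
    conn.executemany("INSERT INTO diagnosis VALUES (?, ?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
records = [transform_legacy(r) for r in legacy_rows] + [transform_ehr(r) for r in ehr_rows]
load(conn, records)

loaded = conn.execute(
    "SELECT member_id, sex, dx_code_system FROM diagnosis ORDER BY member_id"
).fetchall()
print(loaded)
```

Keeping the code system and the source in their own columns is one way to let legacy and current records coexist in a single table without losing provenance.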

Results The database is available to over 150 KP researchers and programmers in the Division of Research and other KPNC researchers, who are currently engaged in over 250 research projects. The KPNC Virtual Data Warehouse (VDW) has been incorporated into the RDB as a materialized view or data mart. The VDW is a prime example of an analytic data source optimized for research. Updates to the RDB can be reflected immediately in the VDW.
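The relationship between a base table and a research-oriented derived view can be sketched as follows. This is only an analogy under stated assumptions: SQLite has no materialized views, so an ordinary view stands in for one, and the table name `rdb_encounter`, the view name `vdw_utilization`, and the sample rows are hypothetical. The point shown is that a change to the base table is visible through the derived view on the next query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rdb_encounter (member_id TEXT, enc_date TEXT, enc_type TEXT);

-- Derived, analysis-friendly summary defined on top of the base table.
CREATE VIEW vdw_utilization AS
    SELECT member_id, COUNT(*) AS n_encounters
    FROM rdb_encounter
    GROUP BY member_id;
""")

# First update to the base table, then read through the view.
conn.execute("INSERT INTO rdb_encounter VALUES ('001', '2009-01-05', 'AV')")
before = conn.execute("SELECT * FROM vdw_utilization").fetchall()

# Second update: the view reflects it without any separate reload step.
conn.execute("INSERT INTO rdb_encounter VALUES ('001', '2009-02-10', 'IP')")
after = conn.execute("SELECT * FROM vdw_utilization").fetchall()

print(before, after)
```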

Conclusion The RDB, when it is complete, will be a unique and valuable resource for clinical,
