Published in: BMC Medical Informatics and Decision Making 1/2020

Open Access 01-12-2020 | Research article

Development and validation of data quality rules in administrative health data using association rule mining

Authors: Mingkai Peng, Sangmin Lee, Adam G. D’Souza, Chelsea T. A. Doktorchik, Hude Quan


Abstract

Background

Data quality assessment presents a challenge for research using coded administrative health data. The objective of this study is to develop and validate a set of coding association rules for coded diagnostic data.

Methods

We used Canadian re-abstracted hospital discharge abstract data coded in the International Classification of Diseases, 10th Revision (ICD-10). Association rule mining was conducted on the re-abstracted data in four age groups (0–4, 20–44, 45–64, and ≥ 65) to extract ICD-10 coding association rules at the three-digit level (category of diagnosis) and the four-digit level (category of diagnosis with etiology, anatomy, or severity). The rules were reviewed by a panel of 5 physicians and 2 classification specialists using a modified Delphi rating process. We proposed and defined variance and bias measures to assess data quality using the rules.
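To make the idea of a coding association rule concrete, the following is a minimal sketch of association rule mining restricted to single-code rules (one ICD-10 code implying another). It is not the authors' pipeline: the function name `mine_pair_rules`, the thresholds, and the toy records are all assumptions made for illustration, and the abstract does not state the support or confidence cut-offs used in the study.

```python
from itertools import permutations


def mine_pair_rules(records, min_support=0.3, min_confidence=0.6):
    """Mine rules of the form lhs -> rhs between single diagnosis codes.

    records: list of code lists, one list per discharge abstract.
    Returns tuples (lhs, rhs, support, confidence, lift).
    """
    n = len(records)
    record_sets = [set(r) for r in records]
    codes = sorted(set().union(*record_sets))
    # Number of records in which each code appears.
    count = {c: sum(c in r for r in record_sets) for c in codes}

    rules = []
    for lhs, rhs in permutations(codes, 2):
        both = sum(lhs in r and rhs in r for r in record_sets)
        support = both / n                    # P(lhs and rhs)
        if support < min_support:
            continue
        confidence = both / count[lhs]        # P(rhs | lhs)
        if confidence < min_confidence:
            continue
        lift = confidence / (count[rhs] / n)  # P(rhs | lhs) / P(rhs)
        rules.append((lhs, rhs, support, confidence, lift))
    return rules


# Invented toy records at the three-digit level (hypothetical, not study data).
records = [
    ["E11", "I10", "N18"],
    ["E11", "I10"],
    ["I10", "I50"],
    ["E11", "N18"],
    ["I10"],
]

rules = mine_pair_rules(records)
for lhs, rhs, support, confidence, lift in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```

Running the same function separately on each age group's records would mirror the stratified design described above. General-purpose miners such as Apriori also handle rules with several codes on the left-hand side, which this pairwise sketch deliberately leaves out.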

Results

After the rule mining process and the panel review, 388 rules at the three-digit level and 275 rules at the four-digit level were developed. Half of the rules came from the ≥ 65 age group. The rules captured meaningful age-specific clinical associations, and those in the ≥ 65 age group were more complex and comprehensive than those in the other age groups. The variance and bias measures can identify rules with high bias and variance in Alberta data and provide directions for quality improvement.

Conclusions

A set of ICD-10 data quality rules was developed and validated by a clinical and classification expert panel. The rules can be used as a tool to assess ICD-coded data, enabling the monitoring and comparison of data quality across institutions, provinces, and countries.
Metadata
Title
Development and validation of data quality rules in administrative health data using association rule mining
Authors
Mingkai Peng
Sangmin Lee
Adam G. D’Souza
Chelsea T. A. Doktorchik
Hude Quan
Publication date
01-12-2020
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2020
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-020-1089-0
