Skip to main content
Top
Published in: BMC Medical Informatics and Decision Making 1/2019

Open Access 01-12-2019 | Technical advance

A clustering approach for detecting implausible observation values in electronic health records data

Authors: Hossein Estiri, Jeffrey G. Klann, Shawn N. Murphy

Published in: BMC Medical Informatics and Decision Making | Issue 1/2019

Login to get access

Abstract

Background

Identifying implausible clinical observations (e.g., laboratory test and vital sign values) in Electronic Health Record (EHR) data using rule-based procedures is challenging. Anomaly/outlier detection methods can be applied as an alternative algorithmic approach to flagging such implausible values in EHRs.

Methods

The primary objectives of this research were to develop and test an unsupervised clustering-based anomaly/outlier detection approach for detecting implausible observations in EHR data as an alternative algorithmic solution to the existing procedures. Our approach is built upon two underlying hypotheses that, (i) when there are large number of observations, implausible records should be sparse, and therefore (ii) if these data are clustered properly, clusters with sparse populations should represent implausible observations. To test these hypotheses, we applied an unsupervised clustering algorithm to EHR observation data on 50 laboratory tests from Partners HealthCare. We tested different specifications of the clustering approach and computed confusion matrix indices against a set of silver-standard plausibility thresholds. We compared the results from the proposed approach with conventional anomaly detection (CAD) approaches, including standard deviation and Mahalanobis distance.

Results

We found that the clustering approach produced results with exceptional specificity and high sensitivity. Compared with the conventional anomaly detection approaches, our proposed clustering approach resulted in significantly smaller number of false positive cases.

Conclusion

Our contributions include (i) a clustering approach for identifying implausible EHR observations, (ii) evidence that implausible observations are sparse in EHR laboratory test results, (iii) a parallel implementation of the clustering approach on i2b2 star schema, and (3) a set of silver-standard plausibility thresholds for 50 laboratory tests that can be used in other studies for validation. The proposed algorithmic solution can augment human decisions to improve data quality. Therefore, a workflow is needed to complement the algorithm’s job and initiate necessary actions that need to be taken in order to improve the quality of data.
Appendix
Available only for authorised users
Literature
4.
go back to reference Ghahramani Z. Unsupervised Learning. In: Bousquet O, von Luxburg U, Rätsch G, editors. Advanced Lectures on Machine Learning. ML 2003. Lecture Notes in Computer Science, vol 3176. Berlin, Heidelberg: Springer; 2004. Ghahramani Z. Unsupervised Learning. In: Bousquet O, von Luxburg U, Rätsch G, editors. Advanced Lectures on Machine Learning. ML 2003. Lecture Notes in Computer Science, vol 3176. Berlin, Heidelberg: Springer; 2004.
5.
go back to reference Hauskrecht M, Batal I, Hong C, Nguyen Q, Cooper GF, Visweswaran S, et al. Outlier-based detection of unusual patient-management actions: an ICU study. J Biomed Inform. 2016;64:211–21.CrossRef Hauskrecht M, Batal I, Hong C, Nguyen Q, Cooper GF, Visweswaran S, et al. Outlier-based detection of unusual patient-management actions: an ICU study. J Biomed Inform. 2016;64:211–21.CrossRef
6.
go back to reference Bouarfa L, Dankelman J. Workflow mining and outlier detection from clinical activity logs. J Biomed Inform. 2012;45(6):1185–90.CrossRef Bouarfa L, Dankelman J. Workflow mining and outlier detection from clinical activity logs. J Biomed Inform. 2012;45(6):1185–90.CrossRef
7.
go back to reference Presbitero A, Quax R, Krzhizhanovskaya V, Sloot P. Anomaly detection in clinical data of patients undergoing heart surgery. Procedia Comput Sci. 2017;108:99–108.CrossRef Presbitero A, Quax R, Krzhizhanovskaya V, Sloot P. Anomaly detection in clinical data of patients undergoing heart surgery. Procedia Comput Sci. 2017;108:99–108.CrossRef
8.
go back to reference Antonelli D, Bruno G, Chiusano S. Anomaly detection in medical treatment to discover unusual patient management. IIE Trans Healthc Syst Eng. 2013;3(2):69–77.CrossRef Antonelli D, Bruno G, Chiusano S. Anomaly detection in medical treatment to discover unusual patient management. IIE Trans Healthc Syst Eng. 2013;3(2):69–77.CrossRef
9.
go back to reference Ray S, Wright A. Detecting anomalies in alert firing within clinical decision support systems using anomaly/outlier detection techniques. Proc. 7th ACM Int. Conf. Bioinformatics, Comput. Biol. Heal. Informatics. New York: ACM; 2016. p. 185–90. Available from: http://doi.acm.org/10.1145/2975167.2975186 Ray S, Wright A. Detecting anomalies in alert firing within clinical decision support systems using anomaly/outlier detection techniques. Proc. 7th ACM Int. Conf. Bioinformatics, Comput. Biol. Heal. Informatics. New York: ACM; 2016. p. 185–90. Available from: http://​doi.​acm.​org/​10.​1145/​2975167.​2975186
10.
go back to reference Ray S, McEvoy DS, Aaron S, Hickman TT, Wright A. Using statistical anomaly detection models to find clinical decision support malfunctions. J Am Med Informatics Assoc. 2018;25(7):862–71.CrossRef Ray S, McEvoy DS, Aaron S, Hickman TT, Wright A. Using statistical anomaly detection models to find clinical decision support malfunctions. J Am Med Informatics Assoc. 2018;25(7):862–71.CrossRef
11.
go back to reference Wilson B, Tseng CL, Soroka O, Pogach LM, Aron DC. Identification of outliers and positive deviants for healthcare improvement: looking for high performers in hypoglycemia safety in patients with diabetes. BMC Health Serv Res. 2017;17(1):738.CrossRef Wilson B, Tseng CL, Soroka O, Pogach LM, Aron DC. Identification of outliers and positive deviants for healthcare improvement: looking for high performers in hypoglycemia safety in patients with diabetes. BMC Health Serv Res. 2017;17(1):738.CrossRef
12.
go back to reference Deneshkumar V, Senthamaraikannan K, Manikandan M. Identification of outliers in medical diagnostic system using data mining techniques. Int J Stat Appl. 2014;4(6):241–8. Deneshkumar V, Senthamaraikannan K, Manikandan M. Identification of outliers in medical diagnostic system using data mining techniques. Int J Stat Appl. 2014;4(6):241–8.
14.
go back to reference Hodge VJ, Austin J. A survey of outlier detection methodologies. Artif Intell Rev. 2004;22(2):85–126.CrossRef Hodge VJ, Austin J. A survey of outlier detection methodologies. Artif Intell Rev. 2004;22(2):85–126.CrossRef
15.
go back to reference Aggarwal CC, Yu PS. Outlier detection for high dimensional data. ACM SIGMOD Rec. 2001;30(2):37–46.CrossRef Aggarwal CC, Yu PS. Outlier detection for high dimensional data. ACM SIGMOD Rec. 2001;30(2):37–46.CrossRef
16.
go back to reference Knorr EM, Ng RT, Tucakov V. Distance-based outliers: algorithms and applications. VLDB J. 2000;8(3-4):237–53.CrossRef Knorr EM, Ng RT, Tucakov V. Distance-based outliers: algorithms and applications. VLDB J. 2000;8(3-4):237–53.CrossRef
17.
go back to reference Ben-Gal I. Outlier Detection. In: Maimon O, Rokach L, editors. Data Mining and Knowledge Discovery Handbook. Boston: Springer; 2005. Ben-Gal I. Outlier Detection. In: Maimon O, Rokach L, editors. Data Mining and Knowledge Discovery Handbook. Boston: Springer; 2005.
18.
go back to reference Gaspar J, Catumbela E, Marques B, Freitas A. A systematic review of outliers detection techniques in medical data - preliminary study. Heal. 2011. Proc Int Conf Heal Informatics. 2011. Gaspar J, Catumbela E, Marques B, Freitas A. A systematic review of outliers detection techniques in medical data - preliminary study. Heal. 2011. Proc Int Conf Heal Informatics. 2011.
19.
go back to reference Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction: Springer Ser. Stat; 2009. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction: Springer Ser. Stat; 2009.
20.
go back to reference Jain AK. Data clustering: 50 years beyond K-means. Pattern Recogn Lett. 2010;31:651–66.CrossRef Jain AK. Data clustering: 50 years beyond K-means. Pattern Recogn Lett. 2010;31:651–66.CrossRef
23.
go back to reference Chen B, Tai PC, Harrison R, Pan Y. Novel hybrid hierarchical-K-means clustering method (H-K-means) for microarray analysis. IEEE Comput Syst Bioinforma Conf Work Poster Abstr. 2005;2005:105–8. Chen B, Tai PC, Harrison R, Pan Y. Novel hybrid hierarchical-K-means clustering method (H-K-means) for microarray analysis. IEEE Comput Syst Bioinforma Conf Work Poster Abstr. 2005;2005:105–8.
27.
go back to reference Nalichowski R, Keogh D, Chueh HC, Murphy SN. Calculating the benefits of a research patient data repository. AMIA Annu Symp Proc United States. 2006. p. 1044. Nalichowski R, Keogh D, Chueh HC, Murphy SN. Calculating the benefits of a research patient data repository. AMIA Annu Symp Proc United States. 2006. p. 1044.
29.
go back to reference De Maesschalck R, Jouan-Rimbaud D, Massart DLL. The Mahalanobis distance. Chemom Intell Lab Syst. 2000;50:1–18. De Maesschalck R, Jouan-Rimbaud D, Massart DLL. The Mahalanobis distance. Chemom Intell Lab Syst. 2000;50:1–18.
30.
go back to reference Filzmoser P. A multivariate outlier detection method. Seventh Int Conf Comput Data Anal Model. 2004. Filzmoser P. A multivariate outlier detection method. Seventh Int Conf Comput Data Anal Model. 2004.
Metadata
Title
A clustering approach for detecting implausible observation values in electronic health records data
Authors
Hossein Estiri
Jeffrey G. Klann
Shawn N. Murphy
Publication date
01-12-2019
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2019
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-019-0852-6

Other articles of this Issue 1/2019

BMC Medical Informatics and Decision Making 1/2019 Go to the issue