Published in: BMC Medical Informatics and Decision Making 1/2017

Open Access 01-12-2017 | Software Review

Clinical records anonymisation and text extraction (CRATE): an open-source software system

Author: Rudolf N. Cardinal

Abstract

Background

Electronic medical records contain information of value for research, but they also contain identifiable and often highly sensitive confidential information. Patient-identifiable information cannot in general be shared outside clinical care teams without explicit consent, but anonymisation/de-identification allows research use of clinical data without explicit consent.

Results

This article presents CRATE (Clinical Records Anonymisation and Text Extraction), an open-source software system with separable functions: (1) it anonymises or de-identifies arbitrary relational databases, with sensitivity and precision similar to previous comparable systems; (2) it uses public secure cryptographic methods to map patient identifiers to research identifiers (pseudonyms); (3) it connects relational databases to external tools for natural language processing; (4) it provides a web front end for research and administrative functions; and (5) it supports a specific model through which patients may consent to be contacted about research.
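To make functions (1) and (2) concrete, the following is a minimal sketch in Python and not CRATE's actual code. It assumes that the pseudonym is produced by a keyed hash (HMAC-SHA256 is chosen here purely for illustration; the abstract says only that public secure cryptographic methods are used) and that free text is scrubbed by replacing known identifiers of the patient. The function names, the secret key, and the "[XXX]" replacement marker are all hypothetical.

```python
import hashlib
import hmac
import re
from typing import Sequence


def research_id(patient_id: str, secret_key: bytes) -> str:
    """Map a patient identifier to a pseudonym with a keyed hash.

    Assumption: HMAC-SHA256 stands in for whatever method the real system
    uses. The same identifier and key always give the same pseudonym, but
    the pseudonym cannot feasibly be reversed without the key.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def scrub(text: str, identifiers: Sequence[str], replacement: str = "[XXX]") -> str:
    """Replace whole-word, case-insensitive occurrences of known identifiers.

    Assumption: the identifiers (names, numbers, and so on) are already
    known from structured fields of the source database.
    """
    terms = [re.escape(term) for term in identifiers if term]
    if not terms:
        return text
    pattern = re.compile(r"\b(?:" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(replacement, text)


if __name__ == "__main__":
    key = b"hypothetical-secret-key"  # illustrative only; keep real keys private
    print(research_id("1234567890", key))
    print(scrub("Mr Smith (John) attended clinic.", ["John", "Smith"]))
```

In a sketch like this, the key must stay within the organisation holding the identifiable records, since anyone holding it could recompute pseudonyms from known patient identifiers.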

Conclusions

Creation and management of a research database from sensitive clinical records with secure pseudonym generation, full-text indexing, and a consent-to-contact process is possible and practical using entirely free and open-source software.
Metadata
Title
Clinical records anonymisation and text extraction (CRATE): an open-source software system
Author
Rudolf N. Cardinal
Publication date
01-12-2017
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2017
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-017-0437-1