Published in: BMC Medical Informatics and Decision Making 1/2010

Open Access 01-12-2010 | Research article

De-identification of primary care electronic medical records free-text data in Ontario, Canada

Authors: Karen Tu, Julie Klein-Geltink, Tezeta F Mitiku, Chiriac Mihai, Joel Martin


Abstract

Background

Electronic medical records (EMRs) represent a potentially rich source of health information for research, but the free text in EMRs often contains identifying information. While de-identification tools have been developed for free text, none has been developed or tested for the full range of primary care EMR data.

Methods

We modified deid, an open-source de-identification program, for the Ontario context and for use on primary care EMR data. We developed the modified program on a training set of 1000 free-text records from one group practice and then tested it on two validation sets: a random sample of 700 free-text EMR records from 17 physicians in 7 practices in 5 cities, and 500 free-text records from a group practice in a different city from the training-set practice. We measured the sensitivity (recall), precision, specificity, accuracy and F-measure of the modified tool against manually tagged free-text records. The tool was intended to remove patient and physician names, locations, addresses, and medical record, health card and telephone numbers.
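The five measures named above have standard definitions in terms of a confusion matrix. The following is a minimal sketch of those definitions, assuming token-level counts and a balanced F-measure (F1); the `Counts` class, the `metrics` function and the example numbers are illustrative assumptions, not the authors' evaluation code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Hypothetical token-level confusion counts against manual tagging."""
    tp: int  # identifying tokens the tool removed
    fp: int  # non-identifying tokens the tool removed by mistake
    tn: int  # non-identifying tokens the tool kept
    fn: int  # identifying tokens the tool missed


def metrics(c: Counts) -> dict[str, float]:
    """Return the five standard measures for one set of counts."""
    sensitivity = c.tp / (c.tp + c.fn)  # also called recall
    precision = c.tp / (c.tp + c.fp)
    specificity = c.tn / (c.tn + c.fp)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    # Balanced F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    for name, value in metrics(Counts(tp=80, fp=10, tn=90, fn=20)).items():
        print(f"{name}: {value:.3f}")
```

Sensitivity and precision both count correctly removed identifiers in the numerator; they differ in whether missed identifiers or wrongly removed clinical text appear in the denominator.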

Results

We found that the modified program performed on the training set with a sensitivity of 88.3%, specificity of 91.4%, precision of 91.3%, accuracy of 89.9% and F-measure of 0.90. The first and second validation sets had sensitivities of 86.7% and 80.2%, specificities of 91.4% and 87.7%, precisions of 91.1% and 87.4%, accuracies of 89.0% and 83.8% and F-measures of 0.89 and 0.84, respectively.
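As a consistency check, the reported F-measures can be recomputed from the reported precision and sensitivity. This sketch assumes the F-measure is the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, sensitivity/recall, reported F-measure), taken from the Results.
reported = {
    "training": (0.913, 0.883, 0.90),
    "validation 1": (0.911, 0.867, 0.89),
    "validation 2": (0.874, 0.802, 0.84),
}

if __name__ == "__main__":
    for name, (p, r, f_reported) in reported.items():
        f = f_measure(p, r)
        print(f"{name}: computed {f:.4f}, reported {f_reported:.2f}")
```

Under that assumption, the recomputed values round to the reported 0.90, 0.89 and 0.84.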

Conclusion

The deid program can be modified to de-identify free-text primary care EMR records with reasonable accuracy while preserving clinical content.
Metadata
Title
De-identification of primary care electronic medical records free-text data in Ontario, Canada
Authors
Karen Tu
Julie Klein-Geltink
Tezeta F Mitiku
Chiriac Mihai
Joel Martin
Publication date
01-12-2010
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2010
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/1472-6947-10-35
