BMC Medical Informatics and Decision Making

Open Access 01-12-2010 | Research article

De-identification of primary care electronic medical records free-text data in Ontario, Canada

Authors: Karen Tu, Julie Klein-Geltink, Tezeta F Mitiku, Chiriac Mihai, Joel Martin

Published in: BMC Medical Informatics and Decision Making | Issue 1/2010

Abstract

Background

Electronic medical records (EMRs) represent a potentially rich source of health information for research, but the free text in EMRs often contains identifying information. While de-identification tools have been developed for free text, none has been developed or tested for the full range of primary care EMR data.

Methods

We modified deid, an open-source de-identification program, for the Ontario context and for use on primary care EMR data. We developed the modified program on a training set of 1000 free-text records from one group practice and then tested it on two validation sets: a random sample of 700 free-text EMR records from 17 physicians in 7 practices across 5 cities, and 500 free-text records from a group practice located in a different city from the practice used for the training set. We measured the sensitivity (recall), precision, specificity, accuracy and F-measure of the modified tool, against manually tagged free-text records, in removing patient and physician names, locations, addresses, and medical record, health card and telephone numbers.
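The five measures named above can all be derived from a confusion matrix that compares the tool's output with the manual tags. The following is a minimal sketch, assuming the comparison is made per token and that the F-measure is the balanced (F1) form; the abstract states neither, and the function name `deid_metrics` and the example counts are hypothetical.

```python
def deid_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute evaluation measures from confusion-matrix counts.

    Assumed meaning of the counts (not stated in the abstract):
      tp: identifying tokens the tool removed
      fp: non-identifying tokens the tool removed
      tn: non-identifying tokens the tool left in place
      fn: identifying tokens the tool missed
    """
    sensitivity = tp / (tp + fn)          # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Balanced F-measure: harmonic mean of precision and recall (assumption).
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Hypothetical counts, for illustration only.
example = deid_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity answer different questions here: the first measures how much identifying text escapes removal, the second how much clinical content is removed unnecessarily.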

Results

On the training set, the modified program performed with a sensitivity of 88.3%, specificity of 91.4%, precision of 91.3%, accuracy of 89.9% and F-measure of 0.90. The first and second validation sets had sensitivities of 86.7% and 80.2%, specificities of 91.4% and 87.7%, precisions of 91.1% and 87.4%, accuracies of 89.0% and 83.8% and F-measures of 0.89 and 0.84, respectively.
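As a consistency check, the reported F-measures can be recomputed from the reported precision and sensitivity. This sketch assumes the balanced F-measure, F = 2PR / (P + R), which the abstract does not spell out; the function name `f_measure` is illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, sensitivity/recall, reported F-measure) taken from the Results.
reported = {
    "training": (0.913, 0.883, 0.90),
    "validation 1": (0.911, 0.867, 0.89),
    "validation 2": (0.874, 0.802, 0.84),
}

for name, (p, r, f_reported) in reported.items():
    f_computed = f_measure(p, r)
    print(f"{name}: computed F = {f_computed:.2f}, reported F = {f_reported:.2f}")
```

Under this assumption, the computed values round to the reported 0.90, 0.89 and 0.84.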

Conclusion

The deid program can be modified to de-identify free-text primary care EMR records with reasonable accuracy while preserving clinical content.

The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6947/10/35/prepub

Title: De-identification of primary care electronic medical records free-text data in Ontario, Canada
Authors: Karen Tu, Julie Klein-Geltink, Tezeta F Mitiku, Chiriac Mihai, Joel Martin
Publication date: 01-12-2010
Publisher: BioMed Central
Published in: BMC Medical Informatics and Decision Making / Issue 1/2010
Electronic ISSN: 1472-6947
DOI: https://doi.org/10.1186/1472-6947-10-35
