Published in: BMC Medical Informatics and Decision Making 1/2011

Open Access 01-12-2011 | Research article

The re-identification risk of Canadians from longitudinal demographics

Authors: Khaled El Emam, David Buckeridge, Robyn Tamblyn, Angelica Neisa, Elizabeth Jonker, Aman Verma

Abstract

Background

The public is less willing to allow their personal health information to be disclosed for research purposes if they do not trust researchers and how researchers manage their data. However, the public is more comfortable with their data being used for research if the risk of re-identification is low. There are few studies on the risk of re-identification of Canadians from their basic demographics, and no studies on their risk from their longitudinal data. Our objective was to estimate the risk of re-identification from the basic cross-sectional and longitudinal demographics of Canadians.

Methods

Uniqueness is a common measure of re-identification risk. Demographic data on a 25% random sample of the population of Montreal were analyzed to estimate population uniqueness on postal code, date of birth, and gender as well as their generalizations, for periods ranging from 1 year to 11 years.
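As a rough illustration of the uniqueness measure described above, the following sketch counts how many records in a data set have a combination of postal code, date of birth, and gender that no other record shares. The field names (`postal_code`, `dob`, `gender`) and the records are made up for the example. It measures uniqueness only within the records supplied; estimating population uniqueness from a 25% sample requires an additional estimation step that is not shown here.

```python
from collections import Counter

def uniqueness(records, keys):
    """Fraction of records whose combination of `keys` occurs exactly once."""
    combos = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(combos)
    unique = sum(1 for c in combos if counts[c] == 1)
    return unique / len(combos) if combos else 0.0

# Hypothetical toy records (not data from the study).
records = [
    {"postal_code": "H2X 1Y4", "dob": "1970-03-14", "gender": "F"},
    {"postal_code": "H2X 1Y4", "dob": "1970-03-14", "gender": "F"},
    {"postal_code": "H2X 3K2", "dob": "1970-03-02", "gender": "F"},
    {"postal_code": "H3A 0G4", "dob": "1985-11-30", "gender": "M"},
]
keys = ("postal_code", "dob", "gender")

# The first two records share all three values, so only two of four are unique.
print(uniqueness(records, keys))  # 0.5
```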

Results

Almost 98% of the population was unique on full postal code, date of birth and gender: these three variables are effectively a unique identifier for Montrealers. Uniqueness increased for longitudinal data. Considerable generalization was required to reach acceptably low uniqueness levels, especially for longitudinal data. Detailed guidelines and disclosure policies on how to ensure that the re-identification risk is low are provided.

Conclusions

A large percentage of Montreal residents are unique on basic demographics. For non-longitudinal data sets, the three-character postal code, gender, and month/year of birth represent a sufficiently low re-identification risk. Data custodians need to generalize their demographic information further for longitudinal data sets.
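The two effects summarized above, that generalization lowers uniqueness and that longitudinal data raises it again, can be sketched on toy data. The people, postal codes, and helper names below are hypothetical. Generalization is assumed here to mean truncating the postal code to its first three characters and the date of birth to month and year, and a longitudinal record is assumed to be one postal code per year.

```python
from collections import Counter

def uniqueness(keys):
    """Fraction of keys that occur exactly once."""
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys) if keys else 0.0

# Hypothetical people: (date of birth, gender, postal code in each year).
people = [
    ("1970-03-14", "F", ["H2X 1Y4", "H2X 1Y4"]),
    ("1970-03-02", "F", ["H2X 3K2", "H3A 0G4"]),  # moved in year 2
    ("1970-03-21", "F", ["H2X 2B5", "H2X 2B5"]),
    ("1985-11-30", "M", ["H3A 0G4", "H3A 0G4"]),
]

def key_for(person, n_years, generalized):
    """Build the quasi-identifier for one person over the first n_years."""
    dob, gender, codes = person
    codes = codes[:n_years]
    if generalized:
        # Keep the first three postal characters and 'YYYY-MM' of the birth date.
        return (tuple(c[:3] for c in codes), dob[:7], gender)
    return (tuple(codes), dob, gender)

full_1yr = uniqueness([key_for(p, 1, False) for p in people])
gen_1yr = uniqueness([key_for(p, 1, True) for p in people])
gen_2yr = uniqueness([key_for(p, 2, True) for p in people])

print(full_1yr)  # 1.0  (everyone unique on full values)
print(gen_1yr)   # 0.25 (generalization merges three people into one group)
print(gen_2yr)   # 0.5  (a second year of residence separates one person again)
```

In this toy set, one person's move in the second year is enough to make that person unique despite the generalization, which is the kind of effect that motivates coarser generalization for longitudinal releases.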
Metadata
Title
The re-identification risk of Canadians from longitudinal demographics
Authors
Khaled El Emam
David Buckeridge
Robyn Tamblyn
Angelica Neisa
Elizabeth Jonker
Aman Verma
Publication date
01-12-2011
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2011
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/1472-6947-11-46
