Published in: BMC Medical Informatics and Decision Making 1/2011

Open Access 01-12-2011 | Research article

De-identifying a public use microdata file from the Canadian national discharge abstract database

Authors: Khaled El Emam, David Paton, Fida Dankar, Gunes Koru

Abstract

Background

The Canadian Institute for Health Information (CIHI) collects hospital discharge abstract data (DAD) from Canadian provinces and territories. There are many demands for the disclosure of these data for research and analysis to inform policy making. To expedite disclosure for some of these purposes, the construction of a DAD public use microdata file (PUMF) was considered. These purposes include confirming some published results, providing broader feedback to CIHI to improve data quality, training students and fellows, providing an easily accessible data set that researchers can use to prepare for analyses on the full DAD, and serving as a large health data set on which computer scientists and statisticians can evaluate analysis and data mining techniques. The objective of this study was to measure the probability of re-identification for records in a PUMF and to de-identify a national DAD PUMF consisting of 10% of records.

Methods

Plausible attacks on a PUMF were evaluated. Based on these attacks, the 2008-2009 national DAD was de-identified. A new algorithm was developed to minimize the amount of suppression while maximizing the precision of the data. The acceptable threshold for the probability of correct re-identification of a record was set between 0.04 and 0.05. Information loss was measured in terms of the extent of suppression and entropy.
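The following Python sketch makes the risk threshold and the entropy measure concrete. It estimates each record's probability of correct re-identification as 1 divided by the size of its equivalence class, meaning the records that share its quasi-identifier values. It then flags records above 0.05 and scores suppression with a conditional ("non-uniform") entropy measure. Both formulas are assumptions chosen for illustration, not the authors' definitions, which the abstract does not give. A 10% sample would also normally call for estimating population class sizes rather than sample ones. All function names, field names, and code values below are hypothetical.

```python
from collections import Counter
from math import log2


def reidentification_probabilities(records, quasi_identifiers):
    """Per-record probability of correct re-identification, estimated as
    1 / (size of the record's equivalence class on the quasi-identifiers).

    Assumption: a simple sample-only estimate; it ignores the fact that a
    PUMF is a sample of the population.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    return [1.0 / class_sizes[k] for k in keys]


def risky_records(records, quasi_identifiers, threshold=0.05):
    """Indices of records whose estimated probability exceeds the threshold."""
    probs = reidentification_probabilities(records, quasi_identifiers)
    return [i for i, p in enumerate(probs) if p > threshold]


def suppression_entropy_loss(original, released):
    """Entropy-based information loss (bits) for one column after suppression.

    A suppressed cell (None in `released`) costs -log2(n_v / N) bits, where
    n_v is how often its original value v occurs in the column and N is the
    column length; a cell released unchanged costs nothing.
    """
    counts = Counter(original)
    n = len(original)
    return sum(-log2(counts[o] / n)
               for o, r in zip(original, released) if r is None)


if __name__ == "__main__":
    # Toy records with hypothetical fields, not actual DAD variables.
    qis = ("region", "age_group")
    toy = [{"region": "A", "age_group": "40-49"}] * 25
    toy += [{"region": "B", "age_group": "90+"}]
    print(risky_records(toy, qis))  # prints [25]

    # Toy diagnosis column in which the two rarest values were suppressed.
    print(suppression_entropy_loss(["I21", "I21", "J18", "K35"],
                                   ["I21", "I21", None, None]))  # prints 4.0
```

In this toy file the 25 identical records each have probability 1/25 = 0.04, which is inside the acceptable range. The single unusual record has probability 1 and would need suppression or generalization. Under this measure, suppressing a rare value costs more bits than suppressing a common one.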

Results

Two PUMFs were produced: one with geographic information, and one with no geographic information but more clinical information. At a threshold of 0.05, the maximum proportion of records with the diagnosis code suppressed was 20%, but these suppressions represented only 8-9% of all values in the DAD. Our suppression algorithm resulted in less information loss than a more traditional approach to suppression. Records from smaller regions, patients with longer stays, and age groups that are infrequently admitted to hospital tended to have the highest rates of suppression.
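The 20% and 8-9% figures differ because they use different denominators. The first counts records; the second counts individual values. The abstract does not state exactly how the second figure was computed. The minimal sketch below assumes it counts every suppressed cell across all variables, and that None marks a suppressed value. The function name, field names, and toy data are hypothetical.

```python
def suppression_rates(released, column):
    """Return (share of records with `column` suppressed,
               share of all cells in the file that are suppressed).

    Assumes `released` is a list of dicts, one per record, in which None
    marks a suppressed value.
    """
    n_records = len(released)
    column_suppressed = sum(1 for r in released if r[column] is None)
    total_cells = sum(len(r) for r in released)
    all_suppressed = sum(1 for r in released for v in r.values() if v is None)
    return column_suppressed / n_records, all_suppressed / total_cells


if __name__ == "__main__":
    # Toy file: 10 records x 4 fields; 2 diagnoses and 2 regions suppressed.
    toy = [{"region": "A", "age_group": "40-49", "diagnosis": "I21", "los": 3}
           for _ in range(8)]
    toy += [{"region": None, "age_group": "90+", "diagnosis": None, "los": 40}
            for _ in range(2)]
    print(suppression_rates(toy, "diagnosis"))  # prints (0.2, 0.1)
```

Because each record carries many values, a value-level rate can be much lower than the record-level rate for any single variable.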

Conclusions

The strategies we used to maximize data utility and minimize information loss can result in a PUMF that would be useful for the specific purposes noted earlier. However, creating a more detailed file with less information loss, suitable for more complex health services research, would require mitigating the risk by having the data recipient commit to a data sharing agreement.
Literature
1.
go back to reference Fienberg S, Martin M, Straf M: Sharing Research Data. 1985, Committee on National Statistics, National Research Council Fienberg S, Martin M, Straf M: Sharing Research Data. 1985, Committee on National Statistics, National Research Council
2.
go back to reference Hutchon D: Publishing raw data and real time statistical analysis on e-journals. British Medical Journal. 2001, 322 (3): 530-PubMedPubMedCentral Hutchon D: Publishing raw data and real time statistical analysis on e-journals. British Medical Journal. 2001, 322 (3): 530-PubMedPubMedCentral
3.
go back to reference Are journals doing enough to prevent fraudulent publication?. Canadian Medical Association Journal. 2006, 174 (4): 431- Are journals doing enough to prevent fraudulent publication?. Canadian Medical Association Journal. 2006, 174 (4): 431-
4.
go back to reference Abraham K: Microdata access and labor market research: The US experience. Allegmeines Statistisches Archiv. 2005, 89: 121-139. 10.1007/s10182-005-0197-6. Abraham K: Microdata access and labor market research: The US experience. Allegmeines Statistisches Archiv. 2005, 89: 121-139. 10.1007/s10182-005-0197-6.
5.
go back to reference Vickers A: Whose data set is it anyway?. Sharing raw data from randomized trials. Trials. 2006, 7 (15): Vickers A: Whose data set is it anyway?. Sharing raw data from randomized trials. Trials. 2006, 7 (15):
9.
go back to reference Commission of the European Communities. On scientific information in the digital age: Access, dissemination and preservation. 2007 Commission of the European Communities. On scientific information in the digital age: Access, dissemination and preservation. 2007
10.
go back to reference Lowrance W: Access to collections of data and materials for health research: A report to the Medical Research Council and the Wellcome Trust. Medical Research Council and the Wellcome Trust. 2006 Lowrance W: Access to collections of data and materials for health research: A report to the Medical Research Council and the Wellcome Trust. Medical Research Council and the Wellcome Trust. 2006
11.
go back to reference Yolles B, Connors J, Grufferman S: Obtaining access to data from government-sponsored medical research. NEJM. 1986, 315 (26): 1669-1672. 10.1056/NEJM198612253152608.PubMed Yolles B, Connors J, Grufferman S: Obtaining access to data from government-sponsored medical research. NEJM. 1986, 315 (26): 1669-1672. 10.1056/NEJM198612253152608.PubMed
12.
go back to reference Hogue C: Ethical issues in sharing epidemiologic data. Journal of Clinical Epidemiology. 1991, 44 (Suppl I): 103S-107S.PubMed Hogue C: Ethical issues in sharing epidemiologic data. Journal of Clinical Epidemiology. 1991, 44 (Suppl I): 103S-107S.PubMed
13.
go back to reference Hedrick T: Justifications for the sharing of social science data. Law and Human Behavior. 1988, 12 (2): 163-171. 10.1007/BF01073124. Hedrick T: Justifications for the sharing of social science data. Law and Human Behavior. 1988, 12 (2): 163-171. 10.1007/BF01073124.
14.
go back to reference Mackie C, Bradburn N: Improving access to and confidentiality of research data: Report of a workshop. 2000, Washington: The National Academies Press Mackie C, Bradburn N: Improving access to and confidentiality of research data: Report of a workshop. 2000, Washington: The National Academies Press
15.
go back to reference Boyko E: Why disseminate microdata?. United Nations Economic and social Commission for Asia and the Pacific Workshop on Census and Survey Microdata. 2008 Boyko E: Why disseminate microdata?. United Nations Economic and social Commission for Asia and the Pacific Workshop on Census and Survey Microdata. 2008
16.
go back to reference Winkler W: Producing Public-Use Microdata That Are Analytically Valid And Confidential. US Census Bureau. 1997 Winkler W: Producing Public-Use Microdata That Are Analytically Valid And Confidential. US Census Bureau. 1997
17.
go back to reference Statistics Canada: 2001 Census Public Use Microdata File: Individuals file user documentation. 2001 Statistics Canada: 2001 Census Public Use Microdata File: Individuals file user documentation. 2001
18.
go back to reference Dale A, Elliot M: Proposals for the 2001 samples of anonymized records: An assessment of disclosure risk. Journal of the Royal Statistical Society. 2001, 164 (3): 427-447. 10.1111/1467-985X.00212. Dale A, Elliot M: Proposals for the 2001 samples of anonymized records: An assessment of disclosure risk. Journal of the Royal Statistical Society. 2001, 164 (3): 427-447. 10.1111/1467-985X.00212.
19.
go back to reference Marsh C, Skinner C, Arber S, Penhale B, Openshaw S, Hobcraft J, Lievesley D, Walford N: The case for samples of anonymized records from the 1991 census. Journal of the Royal Statistical Society, Series A (Statistics in Society). 1991, 154 (2): 305-340. 10.2307/2983043. Marsh C, Skinner C, Arber S, Penhale B, Openshaw S, Hobcraft J, Lievesley D, Walford N: The case for samples of anonymized records from the 1991 census. Journal of the Royal Statistical Society, Series A (Statistics in Society). 1991, 154 (2): 305-340. 10.2307/2983043.
21.
go back to reference Marsh C, Dale A, Skinner C: Safe data versus safe settings: Access to microdata from the British census. International Statistical Review. 1994, 62 (1): 35-53. 10.2307/1403544. Marsh C, Dale A, Skinner C: Safe data versus safe settings: Access to microdata from the British census. International Statistical Review. 1994, 62 (1): 35-53. 10.2307/1403544.
22.
go back to reference Alexander L, Jabine T: Access to social security microdata files for research and statistical purposes. Social Security Bulletin. 1978, 41 (8): 3-17.PubMed Alexander L, Jabine T: Access to social security microdata files for research and statistical purposes. Social Security Bulletin. 1978, 41 (8): 3-17.PubMed
23.
go back to reference Department of Health and Human Services: Comparative Effectiveness Research Public Use Data Pilot Project. 2010 Department of Health and Human Services: Comparative Effectiveness Research Public Use Data Pilot Project. 2010
24.
go back to reference Department of Health and Human Services: CER-Public Use Data Pilot Project. 2010 Department of Health and Human Services: CER-Public Use Data Pilot Project. 2010
25.
go back to reference Consumer-Purchaser Disclosure Project. The state experience in health quality data collection. 2004 Consumer-Purchaser Disclosure Project. The state experience in health quality data collection. 2004
26.
go back to reference Agency for Healthcare Research and Quality: Healthcare cost and utilization project: SID/SASD/SEDD Application Kit. 2010 Agency for Healthcare Research and Quality: Healthcare cost and utilization project: SID/SASD/SEDD Application Kit. 2010
27.
go back to reference Canadian Institute for Health Information: Data Quality Documentation, Discharge Abstract Database. 2009-2010: Executive Summary. 2010 Canadian Institute for Health Information: Data Quality Documentation, Discharge Abstract Database. 2009-2010: Executive Summary. 2010
28.
go back to reference Canadian Institute for Health Information: Privacy and security framework. 2010 Canadian Institute for Health Information: Privacy and security framework. 2010
29.
go back to reference Schoenman J, Sutton J, Kintala S, Love D, Maw R: The value of hospital discharge databases. Agency for Healthcare Research and Quality. 2005 Schoenman J, Sutton J, Kintala S, Love D, Maw R: The value of hospital discharge databases. Agency for Healthcare Research and Quality. 2005
30.
go back to reference Samarati P: Protecting respondents identities in microdata release. Knowledge and Data Engineering, IEEE Transactions on. 2001, 13 (6): 1010-1027. 10.1109/69.971193. [10.1109/69.971193] Samarati P: Protecting respondents identities in microdata release. Knowledge and Data Engineering, IEEE Transactions on. 2001, 13 (6): 1010-1027. 10.1109/69.971193. [10.1109/69.971193]
31.
go back to reference Sweeney L: k-anonymity: a model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems. 2002, 10 (5): 557-570. 10.1142/S0218488502001648. Sweeney L: k-anonymity: a model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems. 2002, 10 (5): 557-570. 10.1142/S0218488502001648.
32.
go back to reference Skinner C: On identification dislcosure and prediction disclosure for microdata. Statistics Neerlandica. 1992, 46 (1): 21-32. 10.1111/j.1467-9574.1992.tb01324.x. Skinner C: On identification dislcosure and prediction disclosure for microdata. Statistics Neerlandica. 1992, 46 (1): 21-32. 10.1111/j.1467-9574.1992.tb01324.x.
33.
go back to reference Subcommittee on Disclosure Limitation Methodology - Federal Committee on Statistical Methodology. Statistical Policy Working paper 22: Report on statistical disclosure control. Statistical Policy Office, Office of Information and Regulatory Affairs, Office of Management and Budget. 1994 Subcommittee on Disclosure Limitation Methodology - Federal Committee on Statistical Methodology. Statistical Policy Working paper 22: Report on statistical disclosure control. Statistical Policy Office, Office of Information and Regulatory Affairs, Office of Management and Budget. 1994
34.
go back to reference Machanavajjhala A, Gehrke J, Kifer D: l-Diversity: Privacy Beyond k-Anonymity. Transactions on Knowledge Discovery from Data. 2007, 1 (1): 1-47. 10.1145/1217299.1217300. Machanavajjhala A, Gehrke J, Kifer D: l-Diversity: Privacy Beyond k-Anonymity. Transactions on Knowledge Discovery from Data. 2007, 1 (1): 1-47. 10.1145/1217299.1217300.
35.
go back to reference Hansell S: AOL Removes Search Data on Group of Web Users. 2006, New York Times Hansell S: AOL Removes Search Data on Group of Web Users. 2006, New York Times
36.
go back to reference Barbaro M, Zeller T: A Face Is Exposed for AOL Searcher No. 4417749. New York Times. 2006 Barbaro M, Zeller T: A Face Is Exposed for AOL Searcher No. 4417749. New York Times. 2006
37.
go back to reference Zeller T: AOL Moves to Increase Privacy on Search Queries New York Times. 2006 Zeller T: AOL Moves to Increase Privacy on Search Queries New York Times. 2006
38.
go back to reference Ochoa S, Rasmussen J, Robson C, Salib M: Reidentification of individuals in Chicago's homicide database: A technical and legal study. Massachusetts Institute of Technology. 2001 Ochoa S, Rasmussen J, Robson C, Salib M: Reidentification of individuals in Chicago's homicide database: A technical and legal study. Massachusetts Institute of Technology. 2001
39.
go back to reference Narayanan A, Shmatikov V: Robust de-anonymization of large datasets (how to break anonymity of the Netflix prize dataset). 2008, University of Texas at Austin Narayanan A, Shmatikov V: Robust de-anonymization of large datasets (how to break anonymity of the Netflix prize dataset). 2008, University of Texas at Austin
40.
go back to reference Appellate Court of Illinois - Fifth District. The Southern Illinoisan v. Department of Public Health. 2004 Appellate Court of Illinois - Fifth District. The Southern Illinoisan v. Department of Public Health. 2004
42.
go back to reference Federal Court (Canada): Mike Gordon vs. The Minister of Health: Affidavit of Bill Wilson. 2006 Federal Court (Canada): Mike Gordon vs. The Minister of Health: Affidavit of Bill Wilson. 2006
43.
go back to reference El Emam K, Kosseim P: Privacy Interests in Prescription Records, Part 2: Patient Privacy. IEEE Security and Privacy. 2009, 7 (2): 75-78. El Emam K, Kosseim P: Privacy Interests in Prescription Records, Part 2: Patient Privacy. IEEE Security and Privacy. 2009, 7 (2): 75-78.
44.
go back to reference Lafky D: The Safe Harbor method of de-identification: An empirical test. Fourth National HIPAA Summit West. 2010 Lafky D: The Safe Harbor method of de-identification: An empirical test. Fourth National HIPAA Summit West. 2010
45.
go back to reference Fung BCM, Wang K, Chen R, Yu PS: Privacy-Preserving Data Publishing: A Survey of Recent Developments. ACM Computing Surveys. 2010, 42 (4): Fung BCM, Wang K, Chen R, Yu PS: Privacy-Preserving Data Publishing: A Survey of Recent Developments. ACM Computing Surveys. 2010, 42 (4):
46.
go back to reference Chen B-C, Kifer D, LeFevre K, Machanavajjhala A: Privacy preserving data publishing. Foundations and Trends in Databases. 2009, 2 (1-2): 1-167. Chen B-C, Kifer D, LeFevre K, Machanavajjhala A: Privacy preserving data publishing. Foundations and Trends in Databases. 2009, 2 (1-2): 1-167.
47.
go back to reference Alexander LA, Jabine TB: Access to social security microdata files for research and statistical purposes. Social Security Bulletin. 1978, 41 (8): 3-17.PubMed Alexander LA, Jabine TB: Access to social security microdata files for research and statistical purposes. Social Security Bulletin. 1978, 41 (8): 3-17.PubMed
48.
go back to reference de Waal T, Willenborg L: A view on statistical disclosure control for microdata. Survey Methodology. 1996, 22 (1): 95-103. de Waal T, Willenborg L: A view on statistical disclosure control for microdata. Survey Methodology. 1996, 22 (1): 95-103.
49.
go back to reference Willenborg L, de Waal T: Elements of Statistical Disclosure Control. Lecture Notes in Statistics. 2001, 155: Springer Willenborg L, de Waal T: Elements of Statistical Disclosure Control. Lecture Notes in Statistics. 2001, 155: Springer
50.
go back to reference Domingo-Ferrer J, Torra Vc: Theory and Practical Applications for Statistical Agencies. North-Holland: Amsterdam. 2002, 113-134. Domingo-Ferrer J, Torra Vc: Theory and Practical Applications for Statistical Agencies. North-Holland: Amsterdam. 2002, 113-134.
51.
go back to reference Skinner C, Elliot M: A measure of disclosure risk for microdata. Journal of the Royal Statistical Society Series B-Statistical Methodology. 2002, 64: 855--867. 10.1111/1467-9868.00365. Skinner C, Elliot M: A measure of disclosure risk for microdata. Journal of the Royal Statistical Society Series B-Statistical Methodology. 2002, 64: 855--867. 10.1111/1467-9868.00365.
52.
go back to reference Domingo-Ferrer J, Torra Vc: Ordinal, Continuous and Heterogeneous k-Anonymity Through Microaggregation. Data Min Knowl Discov. 2005, 11: 195-212. 10.1007/s10618-005-0007-5. [10.1007/s10618-005-0007-5] Domingo-Ferrer J, Torra Vc: Ordinal, Continuous and Heterogeneous k-Anonymity Through Microaggregation. Data Min Knowl Discov. 2005, 11: 195-212. 10.1007/s10618-005-0007-5. [10.1007/s10618-005-0007-5]
53.
go back to reference Templ M: Statistical Disclosure Control for Microdata Using the R-Package sdcMicro. Trans Data Privacy. 2008, 1: 67--85. Templ M: Statistical Disclosure Control for Microdata Using the R-Package sdcMicro. Trans Data Privacy. 2008, 1: 67--85.
54.
go back to reference Dandekar RA: Cost effective implementation of synthetic tabulation (a.k.a. controlled tabular adjustments) in legacy and new statistical data publication systems. Joint ECE/Eurostat Work Session on Statistical Data Confidentiality, Luxembourg. 2003 Dandekar RA: Cost effective implementation of synthetic tabulation (a.k.a. controlled tabular adjustments) in legacy and new statistical data publication systems. Joint ECE/Eurostat Work Session on Statistical Data Confidentiality, Luxembourg. 2003
55.
go back to reference Dandekar RA: Maximum Utility-Minimum Information Loss Table Server Design for Statistical Disclosure Control of Tabular Data. Privacy in statistical databases. Edited by: Josep Domingo-Ferrer, Vicen\cc Torra. 2004 Dandekar RA: Maximum Utility-Minimum Information Loss Table Server Design for Statistical Disclosure Control of Tabular Data. Privacy in statistical databases. Edited by: Josep Domingo-Ferrer, Vicen\cc Torra. 2004
56.
go back to reference Castro J: Minimum-distance controlled perturbation methods for large-scale tabular data protection. European Journal of Operational Research. 2004, 171- Castro J: Minimum-distance controlled perturbation methods for large-scale tabular data protection. European Journal of Operational Research. 2004, 171-
57.
go back to reference Cox LH, Kelly JP, Patil R: Balancing quality and confidentiality for multivariate tabular data. Privacy in statistical databases. Edited by: Josep Domingo-Ferrer, Torra V. 2004 Cox LH, Kelly JP, Patil R: Balancing quality and confidentiality for multivariate tabular data. Privacy in statistical databases. Edited by: Josep Domingo-Ferrer, Torra V. 2004
58.
go back to reference Doyle P, Lane JI, Theeuwes JJM, Zayatz LM: Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies. Elsevier Science. 2001, 1 Doyle P, Lane JI, Theeuwes JJM, Zayatz LM: Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies. Elsevier Science. 2001, 1
59.
go back to reference Domingo-Ferrer J, Torra Vc: Disclosure Control Methods and Information Loss for Microdata. Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies. Edited by: P. Doyle, et al. 2001, Elsevier Science Domingo-Ferrer J, Torra Vc: Disclosure Control Methods and Information Loss for Microdata. Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies. Edited by: P. Doyle, et al. 2001, Elsevier Science
60.
go back to reference Kim JJ: A method for limiting disclosure in microdata based on random noise and transformation. Proceedings of the ASA Section on Survey Research Methodology. 1986, American Statistical Association: Alexandria, Virginia, 370-375. Kim JJ: A method for limiting disclosure in microdata based on random noise and transformation. Proceedings of the ASA Section on Survey Research Methodology. 1986, American Statistical Association: Alexandria, Virginia, 370-375.
61.
go back to reference Little RJA: Statistical Analysis of Masked Data. Journal of Official Statistics. 1993, 9 (2): 407-426. Little RJA: Statistical Analysis of Masked Data. Journal of Official Statistics. 1993, 9 (2): 407-426.
62.
go back to reference Sullivan G, Fuller WA: The use of measurement error to avoid disclosure. Proceedings of the American Statistical Association, Survey Methods Section. 1989, American Statistical Association: Alexandria, Virginia, 802-808. Sullivan G, Fuller WA: The use of measurement error to avoid disclosure. Proceedings of the American Statistical Association, Survey Methods Section. 1989, American Statistical Association: Alexandria, Virginia, 802-808.
63.
go back to reference Sullivan G, Fuller WA: The construction of masking error for categorical variables. Proceedings of the American Statistical Association, Survey Methods Section. 1990, American Statistical Association: Alexandria, Virginia, 435-440. Sullivan G, Fuller WA: The construction of masking error for categorical variables. Proceedings of the American Statistical Association, Survey Methods Section. 1990, American Statistical Association: Alexandria, Virginia, 435-440.
64.
go back to reference Kargupta H, Datta S, Wang Q, Sivakumar K: Random data perturbation techniques and privacy preserving data mining. Knowledge and Information Systems. 2005, 7: 387-414. 10.1007/s10115-004-0173-6. Kargupta H, Datta S, Wang Q, Sivakumar K: Random data perturbation techniques and privacy preserving data mining. Knowledge and Information Systems. 2005, 7: 387-414. 10.1007/s10115-004-0173-6.
65.
go back to reference Liew CK, Choi UJ, Liew CJ: A Data Distortion by Probability Distribution. ACM Transactions on Database Systems. 1985, 10 (3): 395-411. 10.1145/3979.4017. Liew CK, Choi UJ, Liew CJ: A Data Distortion by Probability Distribution. ACM Transactions on Database Systems. 1985, 10 (3): 395-411. 10.1145/3979.4017.
66.
go back to reference Defays D, Nanopoulos P: Panels of enterprises and confidentiality: The small aggregates method. Proceedings of the 1992 Symposium on Design and Analysis of Longitudinal Surveys. 1993, Statistics Canada: Ottowa, Canada Defays D, Nanopoulos P: Panels of enterprises and confidentiality: The small aggregates method. Proceedings of the 1992 Symposium on Design and Analysis of Longitudinal Surveys. 1993, Statistics Canada: Ottowa, Canada
67.
go back to reference Nanopoulos P: Confidentiality In The European Statistical System. Qüestiió. 1997, 21 (1 i 2): 219-220. Nanopoulos P: Confidentiality In The European Statistical System. Qüestiió. 1997, 21 (1 i 2): 219-220.
68.
go back to reference Defays D, Anwar M: Masking data using micro-aggregation. Journal of Official Statistics. 1998, 14: 449-461. Defays D, Anwar M: Masking data using micro-aggregation. Journal of Official Statistics. 1998, 14: 449-461.
69.
go back to reference Domingo-Ferrer J, Mateo-Sanz J: Practical data-oriented microaggregation for statistical disclosure control. IEEE Transactions on Knowledge and Data Engineering. 2002, 14 (1): 189-201. 10.1109/69.979982. [10.1109/69.979982] Domingo-Ferrer J, Mateo-Sanz J: Practical data-oriented microaggregation for statistical disclosure control. IEEE Transactions on Knowledge and Data Engineering. 2002, 14 (1): 189-201. 10.1109/69.979982. [10.1109/69.979982]
70.
go back to reference Sande G: Exact and approximate methods for data directed microaggregation in one or more dimensions. International Journal of Uncertainty, Fuziness, and Knowledge-Based Systems. 2002, 10 (5): 459-476. 10.1142/S0218488502001582. Sande G: Exact and approximate methods for data directed microaggregation in one or more dimensions. International Journal of Uncertainty, Fuziness, and Knowledge-Based Systems. 2002, 10 (5): 459-476. 10.1142/S0218488502001582.
71.
go back to reference Torra V: Microaggregation for categorical variables: A median based approach. Privacy in Statistical Databases. 2004, Springer LNCS Torra V: Microaggregation for categorical variables: A median based approach. Privacy in Statistical Databases. 2004, Springer LNCS
72.
go back to reference Domingo-Ferrer J, Mateo-Sanz JM: Resampling for statistical confidentiality in contingency tables. Computers and Mathematics with Applications. 1999, 38 (11-12): 13-32. 10.1016/S0898-1221(99)00281-3. Domingo-Ferrer J, Mateo-Sanz JM: Resampling for statistical confidentiality in contingency tables. Computers and Mathematics with Applications. 1999, 38 (11-12): 13-32. 10.1016/S0898-1221(99)00281-3.
73.
go back to reference Domingo-Ferrer J, Torra V: Disclosure Control Methods and Information Loss for Microdata. Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies. Edited by: P. Doyle, et al. 2001 Domingo-Ferrer J, Torra V: Disclosure Control Methods and Information Loss for Microdata. Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies. Edited by: P. Doyle, et al. 2001
74.
go back to reference Jimenez J, Torra V: Utility and Risk of JPEG-based Continuous Microdata Protection Methods. International Conference on Availability, Reliability and Security. 2009 Jimenez J, Torra V: Utility and Risk of JPEG-based Continuous Microdata Protection Methods. International Conference on Availability, Reliability and Security. 2009
75.
go back to reference Rubin DB: Discussion Statistical Disclosure Limitation (also cited as: Satisfying confidentiality constraints through the use of synthetic multiply-imputed microdata). Journal of Official Statistics. 1996, 9 (2): 461--468. Rubin DB: Discussion Statistical Disclosure Limitation (also cited as: Satisfying confidentiality constraints through the use of synthetic multiply-imputed microdata). Journal of Official Statistics. 1996, 9 (2): 461--468.
76.
go back to reference Heer GR: A Bootstrap Procedure to Preserve Statistical Confidentiality in Contingency Tables. International Seminar on Statistical Confidentiality. Edited by: D. Lievesley. 1993, Luxembourg, 261-271. Heer GR: A Bootstrap Procedure to Preserve Statistical Confidentiality in Contingency Tables. International Seminar on Statistical Confidentiality. Edited by: D. Lievesley. 1993, Luxembourg, 261-271.
77.
go back to reference Gopal R, Garfinkel R, Goes P: Confidentiality via Camouflage: The CVC Approach to Disclosure Limitation When Answering Queries to Databases. OPERATIONS RESEARCH. 2002, 50 (3): 501-516. 10.1287/opre.50.3.501.7745. Gopal R, Garfinkel R, Goes P: Confidentiality via Camouflage: The CVC Approach to Disclosure Limitation When Answering Queries to Databases. OPERATIONS RESEARCH. 2002, 50 (3): 501-516. 10.1287/opre.50.3.501.7745.
78.
go back to reference Gouweleeuw JP, Kooiman P, Willenborg LCRJ, P.-P dW: dW. Post Randomisation for Statistical Disclosure Control: Theory and Implementation. Journal of Official Statistics. 1998, 14 (4): 463--478. Gouweleeuw JP, Kooiman P, Willenborg LCRJ, P.-P dW: dW. Post Randomisation for Statistical Disclosure Control: Theory and Implementation. Journal of Official Statistics. 1998, 14 (4): 463--478.
79.
go back to reference Willenborg L, de Waal T: Statistical Disclosure Control in Practice. 1996, Springer, 1 Willenborg L, de Waal T: Statistical Disclosure Control in Practice. 1996, Springer, 1
80.
go back to reference Hundepool A, Willenborg L: $\mu$- and $\tau$-argus: Software for Statistical Disclosure Control. 3rd International Seminar on Statistical Confidentiality. 1996, Bled Hundepool A, Willenborg L: $\mu$- and $\tau$-argus: Software for Statistical Disclosure Control. 3rd International Seminar on Statistical Confidentiality. 1996, Bled
81.
go back to reference Samarati P, Sweeney L: Protecting privacy when disclosing information: k-anonymity and its enforcement through generalisation and suppression. 1998, SRI International Samarati P, Sweeney L: Protecting privacy when disclosing information: k-anonymity and its enforcement through generalisation and suppression. 1998, SRI International
82.
go back to reference Samarati P: Protecting respondents' identities in microdata release. IEEE Transactions on Knowledge and Data Engineering. 2001, 13 (6): 1010-1027. 10.1109/69.971193. Samarati P: Protecting respondents' identities in microdata release. IEEE Transactions on Knowledge and Data Engineering. 2001, 13 (6): 1010-1027. 10.1109/69.971193.
83.
go back to reference Sweeney L: Achieving k-anonymity privacy protection using generalization and suppression. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems. 2002, 10 (5): 571-588. 10.1142/S021848850200165X. Sweeney L: Achieving k-anonymity privacy protection using generalization and suppression. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems. 2002, 10 (5): 571-588. 10.1142/S021848850200165X.
84.
go back to reference Ciriani V, De Capitani di Vimercati SSF, Samarati P: k-Anonymity. Secure Data Management in Decentralized Systems. 2007, Springer Ciriani V, De Capitani di Vimercati SSF, Samarati P: k-Anonymity. Secure Data Management in Decentralized Systems. 2007, Springer
85.
go back to reference Bayardo R, Agrawal R: Data Privacy through Optimal k-Anonymization Proceedings of the 21st International Conference on Data Engineering. 2005 Bayardo R, Agrawal R: Data Privacy through Optimal k-Anonymization Proceedings of the 21st International Conference on Data Engineering. 2005
86.
go back to reference Iyengar V: Transforming data to satisfy privacy constraints. Proceedings of the ACM SIGKDD International Conference on Data Mining and Knowledge Discovery. 2002 Iyengar V: Transforming data to satisfy privacy constraints. Proceedings of the ACM SIGKDD International Conference on Data Mining and Knowledge Discovery. 2002
87.
go back to reference El Emam K, Dankar F, Issa R, Jonker E, Amyot D, Cogo E, Corriveau J-P, Walker M, Chowdhury S, Vaillancourt R, Roffey T, Bottomley J: A Globally Optimal k-Anonymity Method for the De-identification of Health Data Journal of the American Medical Informatics Association. 2009 El Emam K, Dankar F, Issa R, Jonker E, Amyot D, Cogo E, Corriveau J-P, Walker M, Chowdhury S, Vaillancourt R, Roffey T, Bottomley J: A Globally Optimal k-Anonymity Method for the De-identification of Health Data Journal of the American Medical Informatics Association. 2009
88.
go back to reference Hundepool A, Domingo-Ferrer J, Franconi L, Giessing S, Lenz R, Naylor J, Nordholt ES, Seri G, Wolf P-PD: Handbook on Statistical Disclosure Control (Version 1.2). A Network of Excellence in the European Statistical System in the field of Statistical Disclosure Control. 2010 Hundepool A, Domingo-Ferrer J, Franconi L, Giessing S, Lenz R, Naylor J, Nordholt ES, Seri G, Wolf P-PD: Handbook on Statistical Disclosure Control (Version 1.2). A Network of Excellence in the European Statistical System in the field of Statistical Disclosure Control. 2010
89.
go back to reference Winkler WE: Re-identification methods for evaluating the confidentiality of analytically valid microdata. Research in Official Statistics. 1998, 1 (2): 50--69. Winkler WE: Re-identification methods for evaluating the confidentiality of analytically valid microdata. Research in Official Statistics. 1998, 1 (2): 50--69.
90.
Domingo-Ferrer J, Mateo-Sanz JM, Torra V: Comparing SDC methods for microdata on the basis of information loss and disclosure. Proceedings of ETK-NTTS 2001. 2001, Luxembourg: Eurostat, 807-826.
91.
Shannon CE: A Mathematical Theory of Communication. The Bell System Technical Journal. 1948, 27-
92.
Shannon CE: Communication Theory of Secrecy Systems. The Bell System Technical Journal. 1949
93.
de Waal T, Willenborg L: Information loss through global recoding and local suppression. Netherlands Official Statistics. 1999, 17-20. Spring Special Issue
94.
Kooiman P, Willenborg L, Gouweleeuw J: PRAM: a method for disclosure limitation of microdata. Statistics Netherlands, Division Research and Development, Department of Statistical Methods. 1997
95.
Gionis A, Tassa T: k-Anonymization with minimal loss of information. IEEE Transactions on Knowledge and Data Engineering. 2009, 21 (2): 206-219.
96.
El Emam K, Dankar FK, Issa R, Jonker E, Amyot D, Cogo E, Corriveau J-P, Walker M, Chowdhury S, Vaillancourt R, Roffey T, Bottomley J: A globally optimal k-anonymity method for the de-identification of health data. Journal of the American Medical Informatics Association. 2009, 16 (5): 670-82. 10.1197/jamia.M3144.
97.
Sweeney L: Computational disclosure control: A primer on data privacy protection. Massachusetts Institute of Technology. 2001
98.
Bayardo RJ, Agrawal R: Data privacy through optimal k-anonymization. Proceedings of the 21st International Conference on Data Engineering. 2005, 217-228.
99.
El Emam K, Dankar FK: Protecting privacy using k-anonymity. Journal of the American Medical Informatics Association. 2008, 15 (5): 627-37. 10.1197/jamia.M2716.
100.
LeFevre K, DeWitt DJ, Ramakrishnan R: Mondrian Multidimensional K-Anonymity. Proceedings of the 22nd International Conference on Data Engineering. 2006, IEEE Computer Society: Washington, DC, USA, 25-
101.
Hore B, Jammalamadaka RC, Mehrotra S: Flexible Anonymization For Privacy Preserving Data Publishing: A Systematic Search Based Approach. SIAM International Conference on Data Mining (SDM). 2007
104.
Polettini S: A note on the individual risk of disclosure. Istituto Nazionale di Statistica. 2003
106.
Sweeney L: Datafly: A system for providing anonymity to medical data. Proceedings of the IFIP TC11 WG11.3 Eleventh International Conference on Database Security XI: Status and Prospects. 1997
107.
Fung BCM, Wang K, Fu AW-C, Yu PS: Introduction to Privacy-Preserving Data Publishing: Concepts and Techniques. 2010, Chapman and Hall/CRC, 1
108.
Xiao X, Tao Y: Personalized privacy preservation. Proceedings of the 2006 ACM SIGMOD international conference on Management of data. 2006, ACM: New York, NY, USA, 229-240.
109.
Iyengar VS: Transforming data to satisfy privacy constraints. Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. 2002, ACM: New York, NY, USA, 279-288.
110.
Fung BCM, Wang K, Yu PS: Top-down specialization for information and privacy preservation. ICDE 2005. Proceedings. 21st International Conference on Data Engineering. 2005, 205-216.
111.
Fung BCM, Wang K, Yu PS: Anonymizing Classification Data for Privacy Preservation. IEEE Transactions on Knowledge and Data Engineering (TKDE). 2007, 19 (5): 711-725.
112.
El Emam K: Risk-based de-identification of health data. IEEE Security and Privacy. 2010, 8 (3): 64-67.
113.
Statistics Canada: Handbook for creating public use microdata files. 2006
114.
Cancer Care Ontario Data Use and Disclosure Policy. Cancer Care Ontario. 2005
115.
Security and confidentiality policies and procedures. Health Quality Council. 2004
116.
117.
Privacy code. Manitoba Center for Health Policy. 2002
118.
Subcommittee on Disclosure Limitation Methodology - Federal Committee on Statistical Methodology. Working paper 22: Report on statistical disclosure control. 1994, Office of Management and Budget
119.
Statistics Canada: Therapeutic abortion survey. 2007
120.
Office of the Information and Privacy Commissioner of British Columbia. 1998, Order No. 261-1998
121.
Office of the Information and Privacy Commissioner of Ontario.
122.
Ministry of Health and Long-Term Care (Ontario). Corporate Policy. 1984, 3-1-21
123.
Duncan G, Jabine T, de Wolf S: Private Lives and Public Policies: Confidentiality and Accessibility of Government Statistics. 1993, National Academies Press
124.
de Waal A, Willenborg L: A view on statistical disclosure control for microdata. Survey Methodology. 1996, 22 (1): 95-103.
125.
Office of the Privacy Commissioner of Quebec (CAI). Chenard v. Ministere de l'agriculture, des pecheries et de l'alimentation (141). 1997
126.
National Center for Education Statistics. NCES Statistical Standards. 2003, US Department of Education
127.
National Committee on Vital and Health Statistics: Report to the Secretary of the US Department of Health and Human Services on Enhanced Protections for Uses of Health Data: A Stewardship Framework for "Secondary Uses" of Electronically Collected and Transmitted Health Data. 2007
128.
Sweeney L: Data sharing under HIPAA: 12 years later. Workshop on the HIPAA Privacy Rule's De-Identification Standard. 2010, Department of Health and Human Services
129.
El Emam K, Jabbouri S, Sams S, Drouet Y, Power M: Evaluating common de-identification heuristics for personal health information. Journal of Medical Internet Research. 2006, 8 (4): e28. 10.2196/jmir.8.4.e28. PMID: 17213047.
130.
El Emam K, Jonker E, Sams S, Neri E, Neisa A, Gao T, Chowdhury S: De-Identification Guidelines for Personal Health Information. 2007, Report produced for the Office of the Privacy Commissioner of Canada: Ottawa
131.
Benitez K, Malin B: Evaluating re-identification risks with respect to the HIPAA privacy rule. Journal of the American Medical Informatics Association. 2010, 17 (2): 169-177. 10.1136/jamia.2009.000026.
132.
Torra V, Domingo-Ferrer J: Record linkage methods for multidatabase data mining. Information Fusion in Data Mining. 2003, 101-132.
133.
Torra V, Abowd J, Domingo-Ferrer J: Using Mahalanobis distance-based record linkage for disclosure risk assessment. 2006, 233-242. Springer LNCS
134.
El Emam K, Dankar F: Protecting privacy using k-anonymity. Journal of the American Medical Informatics Association. 2008, 15: 627-637. 10.1197/jamia.M2716.
136.
Loukides G, Denny J, Malin B: The disclosure of diagnosis codes can breach research participants' privacy. Journal of the American Medical Informatics Association. 2010, 17: 322-327.
137.
El Emam K: Methods for the de-identification of electronic health records for genomic research. Genome Medicine. 2011, 3 (25):
138.
Meyerson A, Williams R: On the complexity of optimal k-anonymity. Proceedings of the 23rd Conference on the Principles of Database Systems. 2004
139.
Aggarwal G, Feder T, Kenthapadi K, Motwani R, Panigrahy R, Thomas D, Zhu A: Anonymizing Tables. Proceedings of the 10th International Conference on Database Theory (ICDT05). 2005
140.
El Emam K, Dankar F, Vaillancourt R, Roffey T, Lysyk M: Evaluating patient re-identification risk from hospital prescription records. Canadian Journal of Hospital Pharmacy. 2009, 62 (4): 307-319.
Metadata
Title
De-identifying a public use microdata file from the Canadian national discharge abstract database
Authors
Khaled El Emam
David Paton
Fida Dankar
Gunes Koru
Publication date
01-12-2011
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2011
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/1472-6947-11-53
