Published in: BMC Medical Informatics and Decision Making 1/2020

Open Access 01-12-2020 | Research article

Differentially private release of medical microdata: an efficient and practical approach for preserving informative attribute values

Authors: Hyukki Lee, Yon Dohn Chung

Abstract

Background

Various methods based on k-anonymity have been proposed for publishing medical data while preserving privacy. However, k-anonymity assumes that adversaries possess fixed background knowledge. Differential privacy overcomes this limitation, but it is designed mainly for releasing aggregate results, which makes it difficult to obtain high-quality microdata. To address this issue, we propose a differentially private medical microdata release method featuring high utility.
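To illustrate why differential privacy fits aggregate results more naturally than microdata, the sketch below answers a single counting query with the Laplace mechanism, the standard construction for numeric queries. It is a minimal sketch and not the authors' method: the patient records and the names `laplace_noise` and `noisy_count` are hypothetical, and the noise is drawn as the difference of two exponential samples because Python's standard library has no Laplace sampler.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two i.i.d. exponential variables with rate 1/scale
    is Laplace-distributed with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(records, predicate, epsilon, rng):
    """Answer a counting query with epsilon-differential privacy.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon suffices.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy records; only the noisy aggregate is released, never the rows.
patients = [
    {"age": 34, "diagnosis": "flu"},
    {"age": 51, "diagnosis": "diabetes"},
    {"age": 47, "diagnosis": "diabetes"},
    {"age": 29, "diagnosis": "asthma"},
]

# Seeded only to make the demo reproducible; not suitable for a real release.
rng = random.Random(0)
answer = noisy_count(patients, lambda r: r["diagnosis"] == "diabetes", epsilon=1.0, rng=rng)
print(f"noisy diabetes count: {answer:.2f}")
```

The output is one perturbed number rather than a table of records; releasing microdata requires perturbing the records themselves, which is the setting the proposed method targets. A production system would also need a secure randomness source and care with floating-point noise.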

Methods

We propose a method of anonymizing medical data under differential privacy. To improve data utility, especially by preserving informative attribute values, the proposed method adopts three data perturbation approaches: (1) generalization, (2) suppression, and (3) insertion. The proposed method produces an anonymized dataset that is nearly optimal with regard to utility, while preserving privacy.
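Generalization and suppression are standard anonymization operations. The sketch below applies them to a toy table to reach k-anonymity: ages are generalized to 10-year intervals, ZIP codes are truncated, and records left in equivalence classes smaller than k are suppressed. This is an illustrative assumption, not the paper's algorithm: the attribute names, hierarchy widths, and helper functions are hypothetical, the result is k-anonymous but not by itself differentially private, and insertion is omitted because the abstract does not specify how it works.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("age", "zip")  # hypothetical quasi-identifier columns


def generalize_age(age, width=10):
    """Generalize an exact age to the width-year interval containing it, e.g. 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zipcode, keep=3):
    """Mask all but the first `keep` digits, e.g. '13053' -> '130**'."""
    return zipcode[:keep] + "*" * (len(zipcode) - keep)


def qi_key(record):
    # Equivalence classes are defined by the (generalized) quasi-identifier values.
    return tuple(record[a] for a in QUASI_IDENTIFIERS)


def is_k_anonymous(records, k):
    """True if every equivalence class contains at least k records."""
    return all(size >= k for size in Counter(map(qi_key, records)).values())


def anonymize(records, k):
    """Generalize quasi-identifiers, then suppress records in classes smaller than k."""
    generalized = [
        {**r, "age": generalize_age(r["age"]), "zip": generalize_zip(r["zip"])}
        for r in records
    ]
    sizes = Counter(map(qi_key, generalized))
    # Record-level suppression: drop outliers that would violate k-anonymity.
    return [r for r in generalized if sizes[qi_key(r)] >= k]


sample_records = [
    {"age": 34, "zip": "13053", "diagnosis": "flu"},
    {"age": 37, "zip": "13068", "diagnosis": "asthma"},
    {"age": 31, "zip": "13021", "diagnosis": "flu"},
    {"age": 52, "zip": "14850", "diagnosis": "diabetes"},
]

released = anonymize(sample_records, k=3)
for row in released:
    print(row)
print("3-anonymous:", is_k_anonymous(released, 3))
```

Here the 52-year-old patient forms a class of size 1 and is suppressed, while the three remaining records keep their exact diagnoses, which is the kind of informative value the proposed method aims to preserve.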

Results

The proposed method achieves lower information loss than existing methods. In a real-world case study, we show that the results of data analyses performed on the original dataset and on a dataset anonymized via the proposed method are highly similar.
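Information loss can be quantified in several ways. One common choice in the anonymization literature is the normalized certainty penalty (NCP), which charges a numeric value generalized to an interval [low, high] the fraction (high - low) / (max - min) of the attribute's domain. The abstract does not state which measure the authors use, so treat this as an assumed metric. The sketch below averages NCP over released records and counts each suppressed record as fully generalized (penalty 1), a convention chosen here for illustration.

```python
def ncp_numeric(low, high, domain_min, domain_max):
    """Normalized certainty penalty of a numeric value generalized to [low, high].

    0.0 means the exact value was kept; 1.0 means it was generalized to the
    whole attribute domain.
    """
    span = domain_max - domain_min
    if span <= 0:
        raise ValueError("domain must have positive width")
    return (high - low) / span


def average_ncp(intervals, domain_min, domain_max, n_suppressed=0):
    """Mean NCP over released records plus suppressed ones.

    Assumption for illustration: a suppressed record counts as fully
    generalized, i.e. penalty 1.0.
    """
    penalties = [ncp_numeric(lo, hi, domain_min, domain_max) for lo, hi in intervals]
    penalties.extend([1.0] * n_suppressed)
    return sum(penalties) / len(penalties) if penalties else 0.0


# Age domain 0..99: three records generalized to [30, 39], one record suppressed.
loss = average_ncp([(30, 39)] * 3, domain_min=0, domain_max=99, n_suppressed=1)
print(round(loss, 3))
```

With these inputs the average is (3 × 9/99 + 1) / 4, roughly 0.318; lower values indicate less information loss.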

Conclusions

We propose a novel differentially private anonymization method that preserves informative attribute values when releasing medical data. Through experiments, we show that medical data anonymized via the proposed method retains significantly higher utility than data anonymized via existing methods.
Metadata
Title
Differentially private release of medical microdata: an efficient and practical approach for preserving informative attribute values
Authors
Hyukki Lee
Yon Dohn Chung
Publication date
01-12-2020
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2020
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-020-01171-5
