ABSTRACT
Advances in information technology, and its use in research, are increasing both the need for anonymized data and the risks of poor anonymization. We present a metric, δ-presence, that clearly links the quality of anonymization to the risk posed by inadequate anonymization. We show that existing anonymization techniques are inappropriate for situations where δ-presence is a good metric (specifically, where knowing an individual is in the database poses a privacy risk), and present algorithms for effectively anonymizing to meet δ-presence. The algorithms are evaluated in the context of a real-world scenario, demonstrating practical applicability of the approach.
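The abstract names the metric without defining it. The sketch below assumes the common formulation: a public table P and the private table T (a subset of P) are generalized the same way, and the release is (delta_min, delta_max)-present if, for every individual in P, the probability of being in T given the release lies in [delta_min, delta_max]. That probability is estimated here as the ratio of matching group sizes. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def presence_probabilities(public_groups, private_groups):
    """Estimate Pr(t in T | T*) for every generalized group found in P.

    Each argument is a list of hashable generalized quasi-identifier
    tuples, one entry per record (assumed representation).
    """
    public_counts = Counter(public_groups)
    private_counts = Counter(private_groups)
    # Everyone in a group is indistinguishable, so the probability is the
    # fraction of the public group that also appears in the release.
    return {
        group: private_counts.get(group, 0) / size
        for group, size in public_counts.items()
    }


def is_delta_present(public_groups, private_groups, delta_min, delta_max):
    """Return True if every presence probability lies in [delta_min, delta_max]."""
    probs = presence_probabilities(public_groups, private_groups)
    return all(delta_min <= p <= delta_max for p in probs.values())


# Toy data: (generalized age, country) groups, purely illustrative.
public = [("2*", "US")] * 4 + [("3*", "US")] * 4
private = [("2*", "US")] * 2 + [("3*", "US")] * 1

probs = presence_probabilities(public, private)
ok_loose = is_delta_present(public, private, 0.2, 0.5)
ok_tight = is_delta_present(public, private, 0.3, 0.5)
```

In this toy release the two groups have presence probabilities 0.5 and 0.25, so the bounds (0.2, 0.5) hold while (0.3, 0.5) do not. A k-anonymity check on the private table alone would not capture this, because it never looks at the public table.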
Index Terms
- Hiding the presence of individuals from shared databases