ABSTRACT
An organization makes a new release as new information become available, releases a tailored view for each data request, releases sensitive information and identifying information separately. The availability of related releases sharpens the identification of individuals by a global quasi-identifier consisting of attributes from related releases. Since it is not an option to anonymize previously released data, the current release must be anonymized to ensure that a global quasi-identifier is not effective for identification. In this paper, we study the sequential anonymization problem under this assumption. A key question is how to anonymize the current release so that it cannot be linked to previous releases yet remains useful for its own release purpose. We introduce the lossy join, a negative property in relational database design, as a way to hide the join relationship among releases, and propose a scalable and practical solution.
- G. Aggarwal, T. Feder, K. Kenthapadi, R. Motwani, R. Panigrahy, D. Thomas, and A. Zhu. Anonymizing tables. In ICDT, 2005. Google ScholarDigital Library
- R. Bayardo and R. Agrawal. Data privacy through optimal k-anonymization. In IEEE ICDE, pages 217--228, 2005. Google ScholarDigital Library
- C. Clifton. Using sample size to limit exposure to data mining. Journal of Computer Security, 8(4):281--307, 2000. Google ScholarDigital Library
- A. Deutsch and Y. Papakonstantinou. Privacy in database publishing. In ICDT, 2005. Google ScholarDigital Library
- B. C. M. Fung, K. Wang, and P. S. Yu. Top-down specialization for information and privacy preservation. In IEEE ICDE, pages 205--216, April 2005. Google ScholarDigital Library
- V. S. Iyengar. Transforming data to satisfy privacy constraints. In ACM SIGKDD, pages 279--288, 2002. Google ScholarDigital Library
- D. Kifer and J. Gehrke. Injecting utility into anonymized datasets. In ACM SIGMOD, Chicago, IL, June 2006. Google ScholarDigital Library
- K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Incognito: Efficient full-domain k-anonymity. In ACM SIGMOD, 2005. Google ScholarDigital Library
- K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Mondrian multidimensional k-anonymity. In IEEE ICDE, 2006. Google ScholarDigital Library
- A. Machanavajjhala, J. Gehrke, and D. Kifer. l-diversity: Privacy beyond k-anonymity. In IEEE ICDE, 2006. Google ScholarDigital Library
- B. Malin and L. Sweeney. How to protect genomic data privacy in a distributed network. In Journal of Biomed Info, 37(3): 179--192, 2004. Google ScholarDigital Library
- A. Meyerson and R. Williams. On the complexity of optimal k-anonymity. In PODS, 2004. Google ScholarDigital Library
- G. Miklau and D. Suciu. A formal analysis of information disclosure in data exchange. In ACM SIGMOD, 2004. Google ScholarDigital Library
- D. J. Newman, S. Hettich, C. L. Blake, and C. J. Merz. UCI repository of machine learning databases, 1998. http://www.ics.uci.edu/~mlearn/MLRepository.html.Google Scholar
- J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993. Google ScholarDigital Library
- P. Samarati. Protecting respondents' identities in microdata release. IEEE TKDE, 13(6):1010--1027, 2001. Google ScholarDigital Library
- P. Samarati and L. Sweeney. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. In IEEE Symposium on Research in Security and Privacy, May 1998.Google Scholar
- C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27:379 and 623, 1948.Google ScholarCross Ref
- L. Sweeney. k-Anonymity: a model for protecting privacy. In International Journal on Uncertanty, Fuzziness and Knowledge-based Systems, 10(5), pages 557--570, 2002. Google ScholarDigital Library
- K. Wang, B. C. M. Fung, and G. Dong. Integrating private databases for data analysis. In IEEE ISI, May 2005. Google ScholarDigital Library
- K. Wang, B. C. M. Fung, and P. S. Yu. Template-based privacy preservation in classification problems. In IEEE ICDM, pages 466--473, November 2005. Google ScholarDigital Library
- K. Wang, B. C. M. Fung, and P. S. Yu. Handicapping attacker's confidence: An alternative to k-anonymization. Knowledge and Information Systems: An International Journal, 2006. Google ScholarDigital Library
- K. Wang, P. S. Yu, and S. Chakraborty. Bottom-up generalization: A data mining solution to privacy protection. In IEEE ICDM, November 2004. Google ScholarDigital Library
- R. C. W. Wong, J. Li., A. W. C. Fu, and K. Wang. (α,k)-anonymity: An enhanced k-anonymity model for privacy preserving data publishing. In ACM SIGKDD, 2006. Google ScholarDigital Library
- X. Xiao and Y. Tao. Personalized privacy preservation. In ACM SIGMOD, June 2006. Google ScholarDigital Library
- C. Yao, X. S. Wang, and S. Jajodia. Checking for k-anonymity violation by views. In VLDB, 2005. Google ScholarDigital Library
Index Terms
- Anonymizing sequential releases
Recommendations
Privacy by diversity in sequential releases of databases
We study the problem of privacy preservation in sequential releases of databases. In that scenario, several releases of the same table are published over a period of time, where each release contains a different set of the table attributes, as dictated ...
Anonymizing sequential releases under arbitrary updates
EDBT '13: Proceedings of the Joint EDBT/ICDT 2013 WorkshopsIn today's global information society, governments, companies, public and private institutions and even individuals have to cope with growing demands for personal data publication from scientists, statisticians, journalists and many other data ...
Limiting disclosure of sensitive data in sequential releases of databases
Privacy Preserving Data Publishing (PPDP) is a research field that deals with the development of methods to enable publishing of data while minimizing distortion, for maintaining usability on one hand, and respecting privacy on the other hand. ...
Comments