skip to main content
research-article

Output privacy in data mining

Published:18 March 2011Publication History
Skip Abstract Section

Abstract

Privacy has been identified as a vital requirement in designing and implementing data mining systems. In general, privacy preservation demands protecting both input and output privacy: the former refers to sanitizing the raw data itself before performing mining; while the latter refers to preventing the mining output (models or patterns) from malicious inference attacks. This article presents a systematic study on the problem of protecting output privacy in data mining, and particularly, stream mining: (i) we highlight the importance of this problem by showing that even sufficient protection of input privacy does not guarantee that of output privacy; (ii) we present a general inferencing and disclosure model that exploits the intrawindow and interwindow privacy breaches in stream mining output; (iii) we propose a light-weighted countermeasure that effectively eliminates these breaches without explicitly detecting them, while minimizing the loss of output accuracy; (iv) we further optimize the basic scheme by taking account of two types of semantic constraints, aiming at maximally preserving utility-related semantics while maintaining hard privacy guarantee; (v) finally, we conduct extensive experimental evaluation over both synthetic and real data to validate the efficacy of our approach.

References

  1. Adam, N. R. and Worthmann, J. C. 1989. Security-control methods for statistical databases: a comparative study. ACM Comput. Surv. 21, 4, 515--556. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Agrawal, D. and Aggarwal, C. C. 2001. On the design and quantification of privacy preserving data mining algorithms. In Proceedings of the 20th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'01). ACM, New York, NY, 247--255. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Agrawal, R. and Srikant, R. 1994. Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'94). Morgan Kaufmann Publishers Inc., San Francisco, CA, 487--499. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Agrawal, R. and Srikant, R. 2000. Privacy-preserving data mining. SIGMOD Rec. 29, 2, 439--450. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Atzori, M., Bonchi, F., Giannotti, F., and Pedreschi, D. 2008. Anonymity preserving pattern discovery. VLDB J. 17, 4, 703--727. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Babcock, B., Babu, S., Datar, M., Motwani, R., and Widom, J. 2002. Models and issues in data stream systems. In Proceedings of the 21st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'02). ACM, New York, NY, 1--16. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Bu, S., Lakshmanan, L. V. S., Ng, R. T., and Ramesh, G. 2007. Preservation of patterns and input-output privacy. In Proceedings of the 23th IEEE International Conference on Data Mining (ICDE'07). IEEE Computer Society, Los Alamitos, CA, 696--705.Google ScholarGoogle Scholar
  8. Calders, T. 2004. Computational complexity of itemset frequency satisfiability. In Proceedings of the 23rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'04). ACM, New York, NY, 143--154. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Calders, T. and Goethals, B. 2002. Mining all non-derivable frequent itemsets. In Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD'02). Springer-Verlag, Berlin, Germany, 74--85. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Chen, K. and Liu, L. 2005. Privacy preserving data classification with rotation perturbation. In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM'05). IEEE Computer Society, Los Alamitos, CA, 589--592. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Chi, Y., Wang, H., Yu, P. S., and Muntz, R. R. 2006. Catch the moment: maintaining closed frequent itemsets over a data stream sliding window. Knowl. Inf. Syst. 10, 3, 265--294. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Chin, F. Y. and Ozsoyoglu, G. 1981. Statistical database design. ACM Trans. Datab. Syst. 6, 1, 113--139. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Cox, L. 1980. Suppression methodology and statistical disclosure control. J. Amer. Stat. Asson. 75, 370, 377--385.Google ScholarGoogle Scholar
  14. Dalenius, T. and Reiss, S. P. 1980. Data-swapping: A technique for disclosure control. J. Statist. Plann. Inference 6, 73--85.Google ScholarGoogle ScholarCross RefCross Ref
  15. Denning, D. E. 1980. Secure statistical databases with random sample queries. ACM Trans. Datab. Syst. 5, 3, 291--315. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Dobkin, D., Jones, A. K., and Lipton, R. J. 1979. Secure databases: protection against user influence. ACM Trans. Datab. Syst. 4, 1, 97--106. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Evfimievski, A., Srikant, R., Agrawal, R., and Gehrke, J. 2002. Privacy preserving mining of association rules. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'02). ACM, New York, NY, 217--228. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Fellegi, I. P. 1972. On the question of statistical confidentiality. J. Amer. Stat. Asson. 67, 337, 7--18.Google ScholarGoogle Scholar
  19. Hore, B., Mehrotra, S., and Tsudik, G. 2004. A privacy-preserving index for range queries. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB'04). VLDB Endowment, Toronto, Canada, 720--731. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Huang, Z., Du, W., and Chen, B. 2005. Deriving private information from randomized data. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'05). ACM, New York, NY, 37--48. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Kantarcioǧlu, M., Jin, J., and Clifton, C. 2004. When do data mining results violate privacy? In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data mining (KDD'04). ACM, New York, NY, 599--604. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Kargupta, H., Datta, S., Wang, Q., and Sivakumar, K. 2003. On the privacy preserving properties of random data perturbation techniques. In Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM'03). IEEE Computer Society, Los Alamitos, CA, 99. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. LeFevre, K., DeWitt, D. J., and Ramakrishnan, R. 2006. Mondrian multidimensional k-anonymity. In Proceedings of the 22nd International Conference on Data Engineering (ICDE'06). IEEE Computer Society, Los Alamitos, CA, 25. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Li, F., Sun, J., Papadimitriou, S., Mihaila, G. A., and Stanoi, I. 2007. Hiding in the crowd: Privacy preservation on evolving streams through correlation tracking. In Proceedings of the 23th IEEE International Conference on Data Mining (ICDE'07). IEEE Computer Society, Los Alamitos, CA, 686--695.Google ScholarGoogle Scholar
  25. Lindell, Y. and Pinkas, B. 2000. Privacy preserving data mining. In Proceedings of the 20th Annual International Cryptology Conference on Advances in Cryptology (CRYPTO'00). Springer-Verlag, Berlin, Germany, 36--54. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Lukasiewicz, T. 2001. Probabilistic logic programming with conditional constraints. ACM Trans. Comput. Logic 2, 3, 289--339. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Machanavajjhala, A., Gehrke, J., Kifer, D., and Venkitasubramaniam, M. 2006. l-diversity: Privacy beyond k-anonymity. In Proceedings of the 22th IEEE International Conference on Data Mining (ICDE'06). IEEE Computer Society, Los Alamitos, CA, 24. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. O'Connor, L. 1993. The inclusion-exclusion principle and its applications to cryptography. Cryptologia 17, 1, 63--79. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Park, H. and Shim, K. 2007. Approximate algorithms for k-anonymity. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'07). ACM, New York, NY, 67--78. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Shoshani, A. 1982. Statistical databases: Characteristics, problems, and some solutions. In Proceedings of the 8th International Conference on Very Large Data Bases (VLDB'82). Morgan Kaufmann Publishers Inc., San Francisco, CA, 208--222. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Sweeney, L. 2002. k-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 10, 5, 557--570. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Traub, J. F., Yemini, Y., and Woźniakowski, H. 1984. The statistical security of a statistical database. ACM Trans. Datab. Syst. 9, 4, 672--679. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Vaidya, J. and Clifton, C. 2002. Privacy preserving association rule mining in vertically partitioned data. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'02). ACM, New York, NY, 639--644. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Vavasis, S. A. 1990. Quadratic programming is in np. Inf. Process. Lett. 36, 2, 73--77. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Wang, K., Fung, B. C. M., and Yu, P. S. 2007. Handicapping attacker's confidence: an alternative to k-anonymization. Knowl. Inf. Syst. 11, 3, 345--368. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Wang, T. and Liu, L. 2008. Butterfly: Protecting output privacy in stream mining. In Proceedings of the IEEE 24th International Conference on Data Engineering (ICDE'08). IEEE Computer Society, Los Alamitos, CA, 1170--1179. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Wang, T., Meng, S., Bamba, B., Liu, L., and Pu, C. 2009. A general proximity privacy principle. In Proceedings of the IEEE International Conference on Data Engineering (ICDE'09). IEEE Computer Society, Los Alamitos, CA, 1279--1282. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Output privacy in data mining

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM Transactions on Database Systems
          ACM Transactions on Database Systems  Volume 36, Issue 1
          March 2011
          251 pages
          ISSN:0362-5915
          EISSN:1557-4644
          DOI:10.1145/1929934
          Issue’s Table of Contents

          Copyright © 2011 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 18 March 2011
          • Accepted: 1 February 2010
          • Revised: 1 January 2010
          • Received: 1 April 2009
          Published in tods Volume 36, Issue 1

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article
          • Research
          • Refereed

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader