DOI: 10.1145/1835804.1835868
research-article

Data mining with differential privacy

Published: 25 July 2010

ABSTRACT

We consider the problem of data mining with formal privacy guarantees, given a data access interface based on the differential privacy framework. Differential privacy requires that computations be insensitive to changes in any particular individual's record, thereby restricting data leaks through the results. The privacy preserving interface ensures unconditionally safe access to the data and does not require from the data miner any expertise in privacy. However, as we show in the paper, a naive utilization of the interface to construct privacy preserving data mining algorithms could lead to inferior data mining results. We address this problem by considering the privacy and the algorithmic requirements simultaneously, focusing on decision tree induction as a sample application. The privacy mechanism has a profound effect on the performance of the methods chosen by the data miner. We demonstrate that this choice could make the difference between an accurate classifier and a completely useless one. Moreover, an improved algorithm can achieve the same level of accuracy and privacy as the naive implementation but with an order of magnitude fewer learning samples.
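The abstract does not say which mechanism the privacy-preserving interface uses. As a minimal sketch of the general idea, assume the interface answers count queries with the standard Laplace mechanism: a count changes by at most 1 when one individual's record is added or removed, so adding Laplace noise of scale 1/ε gives ε-differential privacy. The function names, the toy records, and the even budget split below are illustrative assumptions, not the paper's algorithm.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng):
    """Return an epsilon-differentially private count of matching records.

    A count has sensitivity 1 (one record changes it by at most 1),
    so Laplace noise with scale 1/epsilon is sufficient.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def per_query_noise_scale(total_epsilon, num_queries):
    """Noise scale per count query when the budget is split evenly.

    Under sequential composition, each of num_queries queries gets
    total_epsilon / num_queries, so the scale grows linearly with
    the number of queries.
    """
    return num_queries / total_epsilon


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [23, 35, 41, 52, 29, 64, 38]  # hypothetical toy records
    print(noisy_count(ages, lambda a: a >= 40, epsilon=1.0, rng=rng))
    # One query versus twenty queries under the same total budget.
    print(per_query_noise_scale(1.0, 1), per_query_noise_scale(1.0, 20))
```

The last helper hints at why a naive use of such an interface can hurt accuracy: an algorithm that issues many queries under a fixed total budget receives much noisier answers per query, so the number and kind of queries matter as much as the learning method itself. This sketch is for illustration only and is not hardened for production use.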


Supplemental Material

kdd2010_friedman_dmdp_01.mov (MOV, 122.4 MB)


Published in

KDD '10: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
July 2010
1240 pages
ISBN: 9781450300551
DOI: 10.1145/1835804

Copyright © 2010 ACM

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
