ABSTRACT
We consider the problem of data mining with formal privacy guarantees, given a data access interface based on the differential privacy framework. Differential privacy requires that computations be insensitive to changes in any particular individual's record, thereby restricting data leaks through the results. The privacy-preserving interface ensures unconditionally safe access to the data and requires no privacy expertise from the data miner. However, as we show in the paper, naive use of the interface to construct privacy-preserving data mining algorithms can lead to inferior data mining results. We address this problem by considering the privacy and the algorithmic requirements simultaneously, focusing on decision tree induction as a sample application. The privacy mechanism has a profound effect on the performance of the methods chosen by the data miner: we demonstrate that this choice can make the difference between an accurate classifier and a completely useless one. Moreover, an improved algorithm can achieve the same level of accuracy and privacy as the naive implementation with an order of magnitude fewer learning samples.
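To make the setting concrete, the following is a minimal sketch of the kind of noisy-count query such a differentially private interface typically answers, using the standard Laplace mechanism. The function names (`laplace_noise`, `private_count`) are illustrative, not part of the paper's interface:

```python
import math
import random

def laplace_noise(scale):
    """Sample from Laplace(0, scale) via the inverse-CDF transform."""
    u = random.random() - 0.5          # uniform in [-0.5, 0.5)
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def private_count(records, predicate, epsilon):
    """Answer a counting query with epsilon-differential privacy.

    A counting query has sensitivity 1: adding or removing one
    individual's record changes the true count by at most 1, so
    Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)
```

A data miner building a decision tree would call such a primitive to obtain noisy class counts at each candidate split; because each query consumes part of the privacy budget, how the queries are composed drives the accuracy gap the abstract describes.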
Index Terms
- Data mining with differential privacy