research-article

Discovering frequent patterns in sensitive data

Authors:
Raghav Bhaskar

Microsoft Research, Bangalore, India

Microsoft Research, Bangalore, India
View Profile

,
Srivatsan Laxman

Microsoft Research, Bangalore, India

Microsoft Research, Bangalore, India
View Profile

,
Adam Smith

Pennsylvania State University, University Park, PA, USA

Pennsylvania State University, University Park, PA, USA
View Profile

,
Abhradeep Thakurta

Pennsylvania State University, University Park, PA, USA

Pennsylvania State University, University Park, PA, USA
View Profile

KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data miningJuly 2010Pages 503–512https://doi.org/10.1145/1835804.1835869

Published:25 July 2010Publication History

KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 503–512

ABSTRACT

Discovering frequent patterns from data is a popular exploratory technique in datamining. However, if the data are sensitive (e.g., patient health records, user behavior records) releasing information about significant patterns or trends carries significant risk to privacy. This paper shows how one can accurately discover and release the most significant patterns along with their frequencies in a data set containing sensitive information, while providing rigorous guarantees of privacy for the individuals whose information is stored there.

We present two efficient algorithms for discovering the k most frequent patterns in a data set of sensitive records. Our algorithms satisfy differential privacy, a recently introduced definition that provides meaningful privacy guarantees in the presence of arbitrary external information. Differentially private algorithms require a degree of uncertainty in their output to preserve privacy. Our algorithms handle this by returning 'noisy' lists of patterns that are close to the actual list of k most frequent patterns in the data. We define a new notion of utility that quantifies the output accuracy of private top-k pattern mining algorithms. In typical data sets, our utility criterion implies low false positive and false negative rates in the reported lists. We prove that our methods meet the new utility criterion; we also demonstrate the performance of our algorithms through extensive experiments on the transaction data sets from the FIMI repository. While the paper focuses on frequent pattern mining, the techniques developed here are relevant whenever the data mining output is a list of elements ordered according to an appropriately 'robust' measure of interest.

Supplemental Material

kdd2010_thakurta_dfp_01.mov

mov

99.1 MB

Download

References

Apriori implementation of Ferenc Bodon. http://www.cs.bme.hu/~bodon/en/apriori/.Google Scholar
Frequent itemset mining implementations repository. http://fimi.helsinki.fi.Google Scholar
C. Aggarwal, C. C. Aggarwal, and P. S. Yu. A condensation approach to privacy preserving data mining. In Proceedings of the Ninth International Conference on Extending Database Technology (EDBT), pages 183--199, 2004.Google ScholarCross Ref
R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 207--216, May 1993. Google ScholarDigital Library
R. Agrawal and R. Srikant. Mining sequential patterns. In Proceedings of the 11th International Conference on Data Engineering, Taipei, Taiwan, Mar. 1995. IEEE Computer Society, Washington, DC, USA. Google ScholarDigital Library
S. Agrawal and J. R. Haritsa. A framework for high-accuracy privacy-preserving mining. In ICDE, pages 193--204, 2005. Google ScholarDigital Library
M. Barbaro and T. Zeller. A face is exposed for aol searcher no. 4417749. The New York Times, Aug. 2006.Google Scholar
R. Bhaskar, S. Laxman, A. Smith, and A. Thakurta. Discovering Frequent Patterns in Sensitive Data . Technical Report NAS-TR-0129-2010, Network and Security Research Center, Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA, Apr. 2010. http://www.cse.psu.edu/~asmith/fim/.Google ScholarDigital Library
A. Blum, K. Ligett, and A. Roth. A learning theory approach to non-interactive database privacy. In STOC, pages 609--618, 2008. Google ScholarDigital Library
C. Dwork. Differential privacy. In ICALP, LNCS, pages 1--12, 2006. Google ScholarDigital Library
C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, pages 265--284, 2006. Google ScholarDigital Library
A. V. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. In PODS, pages 211--222, 2003. Google ScholarDigital Library
S. R. Ganta, S. P. Kasiviswanathan, and A. Smith. Composition attacks and auxiliary information in data privacy. In KDD, pages 265--273, 2008. Google ScholarDigital Library
N. G.N., B. A., H. J., S. K., and E. I.R. Temporal pattern discovery for trends and transient effects: Its application to patient records. In Proceedings of the Fourteenth International Conference on Knowledge Discovery and Data Mining SIGKDD 2008, pages 963--971, 2008. Google ScholarDigital Library
M. Götz, A. Machanavajjhala, G. Wang, X. Xiao, and J. Gehrke. Privacy in search logs. CoRR, abs/0904.0682, 2009.Google Scholar
J. Han and M. Kamber. Data mining: Concepts and techniques. Morgan Kaufmann Publishers, San Fransisco, CA, USA, 2001. Google ScholarDigital Library
D. Hand, H. Mannila, and P. Smyth. Principles of data mining. MIT Press, Cambridge, MA, USA, 2001. Google ScholarDigital Library
V. Hristidis, editor. Information Discovery on Electronic Health Records. Chapman & Hall/CRC Data Mining and Knowledge Discovery Series, Boca Raton, FL, USA, 2009. Google ScholarDigital Library
A. Korolova, K. Kenthapadi, N. Mishra, and A. Ntoulas. Releasing search queries and clicks privately. In WWW, pages 171--180, 2009. Google ScholarDigital Library
H. Mannila, H. Toivonen, and A. I. Verkamo. Discovery of frequent episodes in event sequences. 1(3):259--289, 1997. Google ScholarDigital Library
Y. Matias, J. S. Vitter, and W.-C. Ni. Dynamic generation of discrete random variates. Theory Comput. Syst., 36(4):329--358, 2003.Google ScholarCross Ref
F. McSherry. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In SIGMOD Conference, pages 19--30, 2009. Google ScholarDigital Library
F. McSherry and K. Talwar. Mechanism design via differential privacy. In FOCS, pages 94--103, 2007. Google ScholarDigital Library
T. Mielikäinen. On inverse frequent set mining. In 2nd Workshop on Privacy Preserving Data Mining (PPDM 2003), pages 18--23. IEEE Computer Society, 2003.Google Scholar
L. Sweeney. k-anonymity: A model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5):557--570, 2002. Google ScholarDigital Library
T. Washio and H. Motoda. State of the art of graph-based data mining. SIGKDD Explorations, 5:59--68, 2003. Google ScholarDigital Library

Index Terms

Discovering frequent patterns in sensitive data
1. Information systems
  1. Data management systems

Recommendations

Efficient algorithms for mining constrained frequent patterns from uncertain data
U '09: Proceedings of the 1st ACM SIGKDD Workshop on Knowledge Discovery from Uncertain Data

Mining of frequent patterns is one of the popular knowledge discovery and data mining (KDD) tasks. It also plays an essential role in the mining of many other patterns such as correlation, sequences, and association rules. Hence, it has been the subject ...
Read More
Mining frequent graph patterns with differential privacy
KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Discovering frequent graph patterns in a graph database offers valuable information in a variety of applications. However, if the graph dataset contains sensitive data of individuals such as mobile phone-call graphs and web-click graphs, releasing ...
Read More
Dataless Transitions Between Concise Representations of Frequent Patterns

For many data mining problems in order to solve them it is required to discover frequent patterns. Frequent itemsets are useful e.g. in the discovery of association and episode rules, sequential patterns and clusters. Nevertheless, the number of ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
July 2010
1240 pages
ISBN:9781450300551
DOI:10.1145/1835804
General Chairs:
Bharat Rao
Siemens
,
Balaji Krishnapuram
Siemens
,
Program Chairs:
Andrew Tomkins
Google Inc.
,
Qiang Yang
Hong Kong University of Science and Technology
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 25 July 2010
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
differential privacy
exponential mechanism
frequent itemsets
frequent patterns
privacy
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate1,133of8,635submissions,13%
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 154
  Total Citations
  View Citations
- 1,545
  Total Downloads
- Downloads (Last 12 months)51
- Downloads (Last 6 weeks)10
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Discovering frequent patterns in sensitive data

KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Efficient algorithms for mining constrained frequent patterns from uncertain data

Mining frequent graph patterns with differential privacy

Dataless Transitions Between Concise Representations of Frequent Patterns