skip to main content
10.1145/1835804.1835869acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Discovering frequent patterns in sensitive data

Published:25 July 2010Publication History

ABSTRACT

Discovering frequent patterns from data is a popular exploratory technique in datamining. However, if the data are sensitive (e.g., patient health records, user behavior records) releasing information about significant patterns or trends carries significant risk to privacy. This paper shows how one can accurately discover and release the most significant patterns along with their frequencies in a data set containing sensitive information, while providing rigorous guarantees of privacy for the individuals whose information is stored there.

We present two efficient algorithms for discovering the k most frequent patterns in a data set of sensitive records. Our algorithms satisfy differential privacy, a recently introduced definition that provides meaningful privacy guarantees in the presence of arbitrary external information. Differentially private algorithms require a degree of uncertainty in their output to preserve privacy. Our algorithms handle this by returning 'noisy' lists of patterns that are close to the actual list of k most frequent patterns in the data. We define a new notion of utility that quantifies the output accuracy of private top-k pattern mining algorithms. In typical data sets, our utility criterion implies low false positive and false negative rates in the reported lists. We prove that our methods meet the new utility criterion; we also demonstrate the performance of our algorithms through extensive experiments on the transaction data sets from the FIMI repository. While the paper focuses on frequent pattern mining, the techniques developed here are relevant whenever the data mining output is a list of elements ordered according to an appropriately 'robust' measure of interest.

Skip Supplemental Material Section

Supplemental Material

kdd2010_thakurta_dfp_01.mov

mov

99.1 MB

References

  1. Apriori implementation of Ferenc Bodon. http://www.cs.bme.hu/~bodon/en/apriori/.Google ScholarGoogle Scholar
  2. Frequent itemset mining implementations repository. http://fimi.helsinki.fi.Google ScholarGoogle Scholar
  3. C. Aggarwal, C. C. Aggarwal, and P. S. Yu. A condensation approach to privacy preserving data mining. In Proceedings of the Ninth International Conference on Extending Database Technology (EDBT), pages 183--199, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  4. R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 207--216, May 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. R. Agrawal and R. Srikant. Mining sequential patterns. In Proceedings of the 11th International Conference on Data Engineering, Taipei, Taiwan, Mar. 1995. IEEE Computer Society, Washington, DC, USA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. S. Agrawal and J. R. Haritsa. A framework for high-accuracy privacy-preserving mining. In ICDE, pages 193--204, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. Barbaro and T. Zeller. A face is exposed for aol searcher no. 4417749. The New York Times, Aug. 2006.Google ScholarGoogle Scholar
  8. R. Bhaskar, S. Laxman, A. Smith, and A. Thakurta. Discovering Frequent Patterns in Sensitive Data . Technical Report NAS-TR-0129-2010, Network and Security Research Center, Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA, Apr. 2010. http://www.cse.psu.edu/~asmith/fim/.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. A. Blum, K. Ligett, and A. Roth. A learning theory approach to non-interactive database privacy. In STOC, pages 609--618, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. C. Dwork. Differential privacy. In ICALP, LNCS, pages 1--12, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, pages 265--284, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. A. V. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. In PODS, pages 211--222, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. S. R. Ganta, S. P. Kasiviswanathan, and A. Smith. Composition attacks and auxiliary information in data privacy. In KDD, pages 265--273, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. N. G.N., B. A., H. J., S. K., and E. I.R. Temporal pattern discovery for trends and transient effects: Its application to patient records. In Proceedings of the Fourteenth International Conference on Knowledge Discovery and Data Mining SIGKDD 2008, pages 963--971, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. Götz, A. Machanavajjhala, G. Wang, X. Xiao, and J. Gehrke. Privacy in search logs. CoRR, abs/0904.0682, 2009.Google ScholarGoogle Scholar
  16. J. Han and M. Kamber. Data mining: Concepts and techniques. Morgan Kaufmann Publishers, San Fransisco, CA, USA, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. D. Hand, H. Mannila, and P. Smyth. Principles of data mining. MIT Press, Cambridge, MA, USA, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. V. Hristidis, editor. Information Discovery on Electronic Health Records. Chapman & Hall/CRC Data Mining and Knowledge Discovery Series, Boca Raton, FL, USA, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. A. Korolova, K. Kenthapadi, N. Mishra, and A. Ntoulas. Releasing search queries and clicks privately. In WWW, pages 171--180, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. H. Mannila, H. Toivonen, and A. I. Verkamo. Discovery of frequent episodes in event sequences. 1(3):259--289, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Y. Matias, J. S. Vitter, and W.-C. Ni. Dynamic generation of discrete random variates. Theory Comput. Syst., 36(4):329--358, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  22. F. McSherry. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In SIGMOD Conference, pages 19--30, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. F. McSherry and K. Talwar. Mechanism design via differential privacy. In FOCS, pages 94--103, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. T. Mielikäinen. On inverse frequent set mining. In 2nd Workshop on Privacy Preserving Data Mining (PPDM 2003), pages 18--23. IEEE Computer Society, 2003.Google ScholarGoogle Scholar
  25. L. Sweeney. k-anonymity: A model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5):557--570, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. T. Washio and H. Motoda. State of the art of graph-based data mining. SIGKDD Explorations, 5:59--68, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Discovering frequent patterns in sensitive data

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
      July 2010
      1240 pages
      ISBN:9781450300551
      DOI:10.1145/1835804

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 25 July 2010

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate1,133of8,635submissions,13%

      Upcoming Conference

      KDD '24

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader