skip to main content
10.1145/1065167.1065169acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article

Context-sensitive program analysis as database queries

Published:13 June 2005Publication History

ABSTRACT

Program analysis has been increasingly used in software engineering tasks such as auditing programs for security vulnerabilities and finding errors in general. Such tools often require analyses much more sophisticated than those traditionally used in compiler optimizations. In particular, context-sensitive pointer alias information is a prerequisite for any sound and precise analysis that reasons about uses of heap objects in a program. Context-sensitive analysis is challenging because there are over 1014 contexts in a typical large program, even after recursive cycles are collapsed. Moreover, pointers cannot be resolved in general without analyzing the entire program.

This paper presents a new framework, based on the concept of deductive databases, for context-sensitive program analysis. In this framework, all program information is stored as relations; data access and analyses are written as Datalog queries. To handle the large number of contexts in a program, the database represents relations with binary decision diagrams (BDDs). The system we have developed, called bddbddb, automatically translates database queries into highly optimized BDD programs.

Our preliminary experiences suggest that a large class of analyses involving heap objects can be described succinctly in Datalog and implemented efficiently with BDDs. To make developing application-specific analyses easy for programmers, we have also created a language called PQL that makes a subset of Datalog queries more intuitive to define. We have used the language to find many security holes in Web applications.

References

  1. B. Arkin, S. Stender, and G. McGraw. Software penetration testing. Proceedings of the IEEE Symposium on Security and Privacy, 3(1):84--87, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. D. Avots, M. Dalton, V. B. Livshits, and M. S. Lam. Using C pointer analysis to improve software security. In Proceedings of the 27th International Conference on Software Engineering, May 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. I. Balbin and K. Ramamohanarao. A generalization of the differential approach to recursive query optimization. Journal of Logic Programming, 4(3):259--262, Sept. 1987.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. T. Ball and S. Rajamani. SLIC: A specification language for interface checking (of C). Technical Report MSR-TR-2001-21, Microsoft Research, Jan. 2002.]]Google ScholarGoogle Scholar
  5. T. Ball and S. K. Rajamani. A symbolic model checker for boolean programs. In Proceedings of the SPIN 2000 Workshop on Model Checking of Software, pages 113--130, Aug. 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. F. Bancilhon, D. Maier, Y. Sagiv, and J. D. Ullman. Magic sets and other strange ways to implement logic programs (extended abstract). In PODS '86: Proceedings of the Fifth ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, pages 1--15, 1986.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. Berndl, O. Lhoták, F. Qian, L. Hendren, and N. Umanee. Points-to analysis using BDDs. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation, pages 103--114, June 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. F. Besson and T. Jensen. Modular class analysis with Datalog. In Proceedings of the 10th International Static Analysis Symposium, pages 19--36, June 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. D. Beyer, A. Noack, and C. Lewerentz. Simple and efficient relational querying of software structures. In Proceedings of the 10th IEEE Working Conference on Reverse Engineering, pages 216--225, Nov. 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. R. E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Trans. Comput., 35(8):677--691, 1986.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. B. Buege, R. Layman, and A. Taylor. Hacking Exposed: J2EE and Java: Developing Secure Applications with Java Technology. McGraw-Hill/Osborne, 2002.]]Google ScholarGoogle Scholar
  12. W. R. Bush, J. D. Pincus, and D. J. Sielaff. A static analyzer for finding dynamic programming errors. Software - Practice and Experience (SPE), 30:775--802, 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. S. Ceri, G. Gottlob, and L. Tanca. Logic programming and databases. Springer-Verlag New York, Inc., 1990.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. A. Chandra and D. Harel. Horn clauses and generalizations. Journal of Logic Programming, 2(1):1--15, 1985.]]Google ScholarGoogle ScholarCross RefCross Ref
  15. W. Chen and D. S. Warren. Tabled evaluation with delaying for general logic programs. J. ACM, 43(1):20--74, 1996.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. B. Chess and G. McGraw. Static analysis for security. Proceedings of the IEEE Symposium on Security and Privacy, 2(6):76--79, 2004.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. H. Cohen. BuDDy - a Binary Decision Diagram package. http://buddy. sourceforge.net/.]]Google ScholarGoogle Scholar
  18. S. Cook. A Web developer's guide to cross-site scripting. http://www.giac.org/practical/GSEC/Steve_Cook_GSEC. pdf, 2003.]]Google ScholarGoogle Scholar
  19. M.-M. Corsini, K. Musumbu, A. Rauzy, and B. L. Charlier. Efficient bottom-up abstract interpretation of prolog by means of constraint solving over symbolic finite domains. In PLILP '93: Proceedings of the 5th International Symposium on Programming Language Implementation and Logic Programming, pages 75--91, 1993.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. M. Das, B. Liblit, M. Fähndrich, and J. Rehof. Estimating the impact of scalable pointer analysis on optimization. In Proceedings of the 8th International Static Analysis Symposium, pages 260--278, July 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. M. Emami, R. Ghiya, and L. J. Hendren. Context-sensitive interprocedural points-to analysis in the presence of function pointers. In Proceedings of the ACM SIGPLAN'94 Conference on Programming Language Design and Implementation, pages 242--256, June 1994.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. M. Fähndrich, J. Rehof, and M. Das. Scalable context-sensitive flow analysis using instantiation constraints. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation, pages 253--263, June 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. R. Greiner, A. J. Grove, and D. Roth. Learning cost-sensitive active classifiers. Artif. Intell., 139(2):137--174, 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. J. Grossman. WASC activities and U.S. Web application security trends. http://www.whitehatsec.com/presentations/WASC_WASF_1.02.pdf, 2004.]]Google ScholarGoogle Scholar
  25. O. Grumberg, S. Livne, and S. Markovitch. Learning to order BDD variables in verification. Journal of Artificial Intelligence Research, 18:83--116, 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. S. Hallem, B. Chelf, Y. Xie, and D. Engler. A system and language for building system-specific, static analyses. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, pages 69--82, 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. N. Heintze and O. Tardieu. Ultra-fast aliasing analysis using CLA: A million lines of C code in a second. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation, pages 254--263, June 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. G. Hulme. New software may improve application security. http://www.informationweek.com/story/IWK20010209S0003, 2001.]]Google ScholarGoogle Scholar
  29. M. Iwaihara and Y. Inoue. Bottom-up evaluation of logic programs using binary decision diagrams. In ICDE '95: Proceedings of the Eleventh International Conference on Data Engineering, pages 467--474, 1995.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. R. Johnson and D. Wagner. Finding user/kernel pointer bugs with type inference. In Proceedings of the 2004 Usenix Security Conference, pages 119--134, 2004.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. A. Klein. Divide and conquer: HTTP response splitting, Web cache poisoning attacks, and related topics. http://www.sanctuminc.com/pdf/Whitepaper_HTTPResponse.pdf, 2004.]]Google ScholarGoogle Scholar
  32. W. Landi, B. G. Ryder, and S. Zhang. Interprocedural modification side effect analysis with pointer aliasing. In Proceedings of the ACM SIGPLAN'93 Conference on Programming Language Design and Implementation, pages 56--67, June 1993.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. O. Lhoták and L. Hendren. Scaling Java points-to analysis using Spark. In Proceedings of the 12th International Conference on Compiler Construction, pages 153--169, April 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. O. Lhoták and L. Hendren. Jedd: a BDD-based relational extension of Java. In PLDI '04: Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation, pages 158--169, 2004.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Y. A. Liu and S. D. Stoller. From Datalog rules to efficient programs with time and space guarantees. In PPDP '03: Proceedings of the 5th ACM SIGPLAN International Conference on Principles and Practice of Declarative Programming, pages 172--183, 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. R. Manevich, G. Ramalingam, J. Field, D. Goyal, and M. Sagiv. Compactly representing first-order structures for static analysis. In Proceedings of the 9th International Static Analysis Symposium, pages 196--212, Sept. 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. J. F. Naughton and R. Ramakrishnan. Bottom-up evaluation of logic programs. In Computational Logic - Essays in Honor of Alan Robinson, pages 640--700, 1991.]]Google ScholarGoogle Scholar
  38. Open Web Application Security Project. The ten most critical Web application security vulnerabilities. http://umn.dl.sourceforge.net/sourceforge/owasp/OWASPTopTen2004.pdf, 2004.]]Google ScholarGoogle Scholar
  39. R. Ramakrishnan, D. Srivastava, and S. Sudarshan. Rule ordering in bottom-up fixpoint evaluation of logic programs. In Proceedings of the 16th International Conference on Very Large Data Bases, pages 359--371, 1990.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. R. Ramakrishnan, D. Srivastava, S. Sudarshan, and P. Seshadri. Implementation of the CORAL deductive database system. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pages 167--176, May 1993.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. T. Reps. Demand interprocedural program analysis using logic databases. Applications of Logic Databases, pages 163--196, 1994.]]Google ScholarGoogle Scholar
  42. T. Reps. Solving demand versions of interprocedural analysis problems. In Proceedings of the Fifth International Conference on Compiler Construction, pages 389--403, Apr. 1994.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. O. Ruwase and M. S. Lam. A practical dynamic buffer overflow detector. In Proceedings of the 11th Annual Network and Distributed System Security Symposium, pages 159--169, 2004.]]Google ScholarGoogle Scholar
  44. J. Scambray and M. Shema. Web Applications (Hacking Exposed). Addison-Wesley Professional, 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. U. Shankar, K. Talwar, J. S. Foster, and D. Wagner. Detecting format string vulnerabilities with type qualifiers. In Proceedings of the 2001 Usenix Security Conference, pages 201--220, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. M. Sharir and A. Pnueli. Two approaches to interprocedural data-flow analyis. In S. Muchnick and N. D. Jones, editors, Program Flow Analysis: Theory and Applications, chapter 7, pages 189--234. Prentice-Hall, 1981.]]Google ScholarGoogle Scholar
  47. N. J. A. Sloane. The on-line encyclopedia of integer sequences: A000670. http://www.research.att.com/as/sequences, 2003.]]Google ScholarGoogle Scholar
  48. B. Steensgaard. Points-to analysis in almost linear time. In Proceedings of the 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 32--41, Jan. 1996.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. H. Tamaki and T. Sato. Old resolution with tabulation. In Proceedings on Third international conference on logic programming, pages 84--98, 1986.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. J. D. Ullman. Bottom-up beats top-down for Datalog. In PODS '89: Proceedings of the 8th ACM SIGACT-SIGMOD-SIGART symposium on Principles of Database Systems, pages 140--149, 1989.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. J. D. Ullman. Principles of Database and Knowledge-Base Systems. Computer Science Press, Rockville, Md., volume II edition, 1989.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. M. Vernon. Top five threats. ComputerWeekly.com (http://www.computerweekly.com/Article129980.htm), Apr. 2004.]]Google ScholarGoogle Scholar
  53. D. Wagner, J. Foster, E. Brewer, and A. Aiken. A first step towards automated detection of buffer overrun vulnerabilities. In Proceedings of Network and Distributed Systems Security Symposium, pages 3--17, 2000.]]Google ScholarGoogle Scholar
  54. WebCohort, Inc. Only 10% of Web applications are secured against common hacking techniques. http://www.imperva.com/company/news/2004-feb-02.html, 2004.]]Google ScholarGoogle Scholar
  55. J. Whaley and M. Lam. Cloning-based context-sensitive pointer alias analysis using binary decision diagrams. In Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation, pages 131--144, 2004.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. J. Whaley and M. S. Lam. An efficient inclusion-based points-to analysis for strictly-typed languages. In Proceedings of the 9th International Static Analysis Symposium, pages 180--195, Sept. 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. J. Whaley and M. Rinard. Compositional pointer and escape analysis for Java programs. In Conference of Object Oriented Programming: Systems, Languages, and Applications, pages 187--206, Nov. 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. R. P. Wilson and M. S. Lam. Efficient context-sensitive pointer analysis for C programs. In Proceedings of the ACM SIGPLAN'95 Conference on Programming Language Design and Implementation, pages 1--12, June 1995.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. T. Yavuz-Kahveci and T. Bultan. Automated verification of concurrent linked lists with counters. In Proceedings of the 9th International Static Analysis Symposium, pages 69--84, Sept. 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. J. Zhu. Symbolic pointer analysis. In Proceedings of the International Conference in Computer-Aided Design, pages 150--157, Nov. 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. J. Zhu and S. Calman. Symbolic pointer analysis revisited. In Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation, pages 145--157, 2004. Research Session 1: Querying XML & Semistructured Data / Query Languages]] Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image ACM Conferences
    PODS '05: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
    June 2005
    388 pages
    ISBN:1595930620
    DOI:10.1145/1065167
    • General Chair:
    • Georg Gottlob,
    • Program Chair:
    • Foto Afrati

    Copyright © 2005 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 13 June 2005

    Permissions

    Request permissions about this article.

    Request Permissions

    Check for updates

    Qualifiers

    • Article

    Acceptance Rates

    Overall Acceptance Rate642of2,707submissions,24%

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader