ABSTRACT
It has recently been asserted that the usability of a database is as important as its capability. Understanding the database schema, the hidden relationships among attributes in the data all play an important role in this context. Subscribing to this viewpoint, in this paper, we present a novel data-driven approach, called Query By Output (QBO), which can enhance the usability of database systems. The central goal of QBO is as follows: given the output of some query Q on a database D, denoted by Q(D), we wish to construct an alternative query Q′ such that Q(D) and Q′ (D) are instance-equivalent. To generate instance-equivalent queries from Q(D), we devise a novel data classification-based technique that can handle the at-least-one semantics that is inherent in the query derivation. In addition to the basic framework, we design several optimization techniques to reduce processing overhead and introduce a set of criteria to rank order output queries by various notions of utility. Our framework is evaluated comprehensively on three real data sets and the results show that the instance-equivalent queries we obtain are interesting and that the approach is scalable and robust to queries of different selectivities.
- P. Andritsos, R. J. Miller, and P. Tsaparas. Information-theoretic tools for mining database structure from large data sets. In SIGMOD, pages 731--742, 2004. Google ScholarDigital Library
- C. Binnig, D. Kossmann, and E. Lo. Reverse query processing. In ICDE, pages 506--515, 2007.Google ScholarCross Ref
- C. Binnig, D. Kossmann, E. Lo, and M. T. Ä Ozsu. QAGen: generating query-aware test databases. In SIGMOD, pages 341--352, 2007. Google ScholarDigital Library
- S. Borzsonyi, D. Kossmann, and K. Stocker. The skyline operator. In ICDE, pages 421--430, 2001. Google ScholarDigital Library
- N. Bruno, S. Chaudhuri, and D. Thomas. Generating queries with cardinality constraints for dbms testing. IEEE TKDE, 18(12):1721--1725, 2006. Google ScholarDigital Library
- P. Buneman, J. Cheney, W.-C. Tan, and S. Vansummeren. Curated databases. In PODS, pages 1--12, 2008. Google ScholarDigital Library
- T. Gaasterl, P. Godfrey, and J. Minker. An overview of cooperative answering. Journal of Intelligent Information Systems, (2):123--157, 1992.Google ScholarCross Ref
- L. Getoor, B. Taskar, and D. Koller. Selectivity estimation using probabilistic models. In SIGMOD, pages 461--472, 2001. Google ScholarDigital Library
- P. Godfrey, J. Gryz, and C. Zuzarte. Exploiting constraint-like data characterizations in query optimization. In SIGMOD, pages 582--592, 2001. Google ScholarDigital Library
- H. V. Jagadish, A. Chapman, A. Elkiss, M. Jayapandian, Y. Li, A. Nandi, and C. Yu. Making database systems usable. In SIGMOD, pages 13--24, 2007. Google ScholarDigital Library
- T. Johnson, A. Marathe, and T. Dasu. Database exploration and bellman. 26(3):34--39, 2003.Google Scholar
- G. Koutrika, A. Simitsis, and Y. Ioannidis. Précis: The essence of a query answer. In ICDE, page 69, 2006. Google ScholarDigital Library
- M. Mehta, R. Agrawal, and J. Rissanen. SLIQ: A fast scalable classifier for data mining. In EDBT, pages 18--32, 1996. Google ScholarDigital Library
- C. Mishra, N. Koudas, and C. Zuzarte. Generating targeted queries for database testing. In SIGMOD, pages 499--510, 2008. Google ScholarDigital Library
- A. Motro. Intensional answers to database queries. IEEE TKDE, 6(3):444--454, 1994. Google ScholarDigital Library
- M. P.N. Tan and V.Kumar. Introduction to Data Mining. Addison-Wesley, 2006. Google ScholarDigital Library
- N. Ramakrishnan, D. Kumar, B. Mishra, M. Potts, and R. F. Helm. Turning cartwheels: An alternating algorithm for mining redescriptions. In KDD, pages 266--275, 2004. Google ScholarDigital Library
- J. Rissanen. Modeling by shortest data description. Automatica, 14:465--471, 1978.Google ScholarDigital Library
- A. Simitsis, G. Koutrika, and Y. E. Ioannidis. Generalized précis queries for logical database subset creation. In ICDE, pages 1382--1386, 2007.Google Scholar
- Q. T. Tran, C.-Y. Chan, and S. Parthasarathy. Query by output. Technical Report TRA4/09, National University of Singapore - School of Computing, April 2009.Google Scholar
- P. Valduriez. Join indices. ACM Trans. Database Syst., 12(2):218--246, 1987. Google ScholarDigital Library
- C. J. van Rijsbergen. Information Retireval. Butterworth, 1979. Google ScholarDigital Library
- W. Wu, B. Reinwald, Y. Sismanis, and R. Manjrekar. Discovering topical structures of databases. In SIGMOD, pages 1019--1030, 2008. Google ScholarDigital Library
- X. Xiao and Y. Tao. Output perturbation with query relaxation. Proc. VLDB Endow., 1(1):857--869, 2008. Google ScholarDigital Library
- M. M. Zloof. Query by example. In AFIPS NCC, pages 431--438, 1975. Google ScholarDigital Library
Index Terms
- Query by output
Recommendations
Query reverse engineering
In this paper, we introduce a new problem termed query reverse engineering (QRE). Given a database D and a result table T-the output of some known or unknown query Q on D-the goal of QRE is to reverse-engineer a query Q' such that the output of query Q' ...
Interactive Query Synthesis from Input-Output Examples
SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of DataThis demo showcases Scythe, a novel query-by-example system that can synthesize expressive SQL queries from input-output examples. Scythe is designed to help end-users program SQL and explore data simply using input-output examples. From a web-browser, ...
View-based query containment
PODS '03: Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsQuery containment is the problem of checking whether for all databases the answer to a query is a subset of the answer to a second query. In several data management tasks, such as data integration, mobile computing, etc., the data of interest are only ...
Comments