skip to main content
research-article

Characterizing schema mappings via data examples

Authors Info & Claims
Published:19 December 2011Publication History
Skip Abstract Section

Abstract

Schema mappings are high-level specifications that describe the relationship between two database schemas; they are considered to be the essential building blocks in data exchange and data integration, and have been the object of extensive research investigations. Since in real-life applications schema mappings can be quite complex, it is important to develop methods and tools for understanding, explaining, and refining schema mappings. A promising approach to this effect is to use “good” data examples that illustrate the schema mapping at hand.

We develop a foundation for the systematic investigation of data examples and obtain a number of results on both the capabilities and the limitations of data examples in explaining and understanding schema mappings. We focus on schema mappings specified by source-to-target tuple generating dependencies (s-t tgds) and investigate the following problem: which classes of s-t tgds can be “uniquely characterized” by a finite set of data examples? Our investigation begins by considering finite sets of positive and negative examples, which are arguably the most natural choice of data examples. However, we show that they are not powerful enough to yield interesting unique characterizations. We then consider finite sets of universal examples, where a universal example is a pair consisting of a source instance and a universal solution for that source instance. We first show that unique characterizations via universal examples is, in a precise sense, equivalent to the existence of Armstrong bases (a relaxation of the classical notion of Armstrong databases). After this, we show that every schema mapping specified by LAV s-t tgds is uniquely characterized by a finite set of universal examples with respect to the class of LAV s-t tgds. Moreover, this positive result extends to the much broader classes of n-modular schema mappings, n a positive integer. Finally, we study the unique characterizability of GAV schema mappings. It turns out that some GAV schema mappings are uniquely characterizable by a finite set of universal examples with respect to the class of GAV s-t tgds, while others are not. By unveiling a tight connection with homomorphism dualities, we establish an effective, sound, and complete criterion for determining whether or not a GAV schema mapping is uniquely characterizable by a finite set of universal examples with respect to the class of GAV s-t tgds.

References

  1. Abiteboul, S., Hull, R., and Vianu, V. 1995. Foundations of Databases. Addison-Wesley. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Alexe, B., ten Cate, B., Kolaitis, P. G., and Tan, W.-C. 2011. Designing and refining schema mappings via data examples. In Proceedings of the International Conference on Management of Data. ACM, New York, 133--144. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Alexe, B., Chiticariu, L., Miller, R. J., and Tan, W. C. 2008a. Muse: Mapping understanding and design by example. In Proceedings of the International Conference on Data Engineering (ICDE). 10--19. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Alexe, B., Kolaitis, P. G., and Tan, W.-C. 2010. Characterizing schema mappings via data examples. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Alexe, B., Tan, W. C., and Velegrakis, Y. 2008b. STBenchmark: Towards a benchmark for mapping systems. Proc. VLDB Endowm. 1, 1, 230--244. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Armstrong, W. W. 1974. Dependency structures of data base relationships. In Proceedings of the IFIP Congress. 580--583.Google ScholarGoogle Scholar
  7. ten Cate, B. and Kolaitis, P. G. 2009. Structural characterizations of schema-mapping languages. In Proceedings of the International Conference on Database Theory (ICDT). 63--72. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. ten Cate, B., Kolaitis, P. G., and Tan, W.-C. 2010. Database constraints and homomorphism dualities. In Principles and Practice of Constraint Programming, D. Cohen, Ed., Lecture Notes in Computer Science, vol. 6308, Springer. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Chiticariu, L. and Tan, W. C. 2006. Debugging schema mappings with routes. In Proceedings of the International Conference on Very Large Data Bases (VLDB). 79--90. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Coté, A., Chouinard-Prévost, V., and Tardif, C. 2004. Orientations and 3-colourings of graphs. Commentationes Mathematicae Universitatis Carolinae 45, 549--553.Google ScholarGoogle Scholar
  11. Deutsch, A., Nash, A., and Remmel, J. 2008. The chase revisited. In Proceedings of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. 149--158. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Erdös, P. 1959. Graph theory and probability. Canadian J. Math. 11, 34--38.Google ScholarGoogle ScholarCross RefCross Ref
  13. Fagin, R. 1982a. Armstrong databases. In Proceedings of the 7th IBM Symposium on Mathematical Foundations of Computer Science.Google ScholarGoogle Scholar
  14. Fagin, R. 1982b. Horn clauses and database dependencies. J. ACM 29, 4, 952--985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Fagin, R., Kolaitis, P. G., Miller, R. J., and Popa, L. 2005. Data exchange: Semantics and query answering. Theoret. Comput. Sci. 336, 1, 89--124. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Fagin, R., Kolaitis, P. G., Nash, A., and Popa, L. 2008. Towards a theory of schema-mapping optimization. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 33--42. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Fagin, R. and Vardi, M. Y. 1983. Armstrong databases for functional and inclusion dependencies. Inf. Process. Lett. 16, 1, 13--19.Google ScholarGoogle ScholarCross RefCross Ref
  18. Fagin, R. and Vardi, M. Y. 1986. The theory of data dependencies—A survey. In Proceedings of Symposia in Applied Mathematics, M. Anshel and W. Gewirtz, Eds. Vol. 34—Mathematics of Information Processing. American Mathematical Society, Providence, Rhode Island, 19--71.Google ScholarGoogle Scholar
  19. Foniok, J., Nesetril, J., and Tardif, C. 2008. Generalised dualities and maximal finite antichains in the homomorphism order of relational structures. Eur. J. Comb. 29, 4, 881--899. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Fuxman, A., Kolaitis, P. G., Miller, R. J., and Tan, W. C. 2006. Peer data exchange. ACM Trans. Datab. Syst. 1454--1498. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Gottlob, G. and Senellart, P. 2010. Schema mapping discovery from data instances. J. ACM 57, 2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Haas, L. M., Hernández, M. A., Ho, H., Popa, L., and Roth, M. 2005. Clio grows up: From research prototype to industrial tool. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). 805--810. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Hell, P. and Nešetřil, J. 1992. The core of a graph. Discr. Math. 109, 117--126. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Hell, P. and Nešetřil, J. 2004. Graphs and Homomorphisms. Oxford University Press.Google ScholarGoogle Scholar
  25. Hernández, M. A., Miller, R. J., Haas, L., Yan, L., Ho, C. T. H., and Tian, X. 2001. Clio: A semi-automatic tool for schema mapping, System Demonstration. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD) 30, 2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Hodges, W. 1997. A Shorter Model Theory. Cambridge University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Kearns, M. J. and Vazirani, U. V. 1994. An Introduction to Computational Learning Theory. The MIT Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Kolaitis, P. G. 2005. Schema mappings, data exchange, and metadata management. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 61--75. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Lenzerini, M. 2002. Data integration: A theoretical perspective. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 233--246. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Mannila, H. and Räihä, K.-J. 1989. Automatic generation of test data for relational queries. J. Comput. Syst. Sci. 38, 2, 240--258.Google ScholarGoogle ScholarCross RefCross Ref
  31. Nešetřil, J. and Rödl, V. 1989. Chromatically optimal rigid graphs. J. Comb. Theory, Ser. B 46, 2, 133--141. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Nešetřil, J. and Tardif, C. 2005. Short answers to exponentially long questions: Extremal aspects of homomorphism duality. SIAM J. Discr. Math. 19, 4, 914--920. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Nešetřil, J. and Zhu, X. 1998. On bounded treewidth duality of graphs. J. Graph Theory 23, 2, 151--162. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Olston, C., Chopra, S., and Srivastava, U. 2009. Generating example data for dataflow programs. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). 245--256. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Reingold, O. 2005. Undirected st-connectivity in log-space. In Proceedings of the ACM Symposium on Theory of Computing (STOC'05). ACM, 376--385. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Senellart, P. and Gottlob, G. 2008. On the complexity of deriving schema mappings from database instances. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 23--32. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Yan, L.-L., Miller, R. J., Haas, L. M., and Fagin, R. 2001. Data-Driven understanding and refinement of schema mappings. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). 485--496. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Characterizing schema mappings via data examples

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            • Published in

              cover image ACM Transactions on Database Systems
              ACM Transactions on Database Systems  Volume 36, Issue 4
              December 2011
              271 pages
              ISSN:0362-5915
              EISSN:1557-4644
              DOI:10.1145/2043652
              Issue’s Table of Contents

              Copyright © 2011 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 19 December 2011
              • Accepted: 1 August 2011
              • Revised: 1 April 2011
              • Received: 1 October 2010
              Published in tods Volume 36, Issue 4

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • research-article
              • Research
              • Refereed

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader