skip to main content
10.1145/1807085.1807120acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
research-article

Characterizing schema mappings via data examples

Published:06 June 2010Publication History

ABSTRACT

Schema mappings are high-level specifications that describe the relationship between two database schemas; they are considered to be the essential building blocks in data exchange and data integration, and have been the object of extensive research investigations. Since in real-life applications schema mappings can be quite complex, it is important to develop methods and tools for understanding, explaining, and refining schema mappings. A promising approach to this effect is to use "good" data examples that illustrate the schema mapping at hand.

We develop a foundation for the systematic investigation of data examples and obtain a number of results on both the capabilities and the limitations of data examples in explaining and understanding schema mappings. We focus on schema mappings specified by source-to-target tuple generating dependencies (s-t tgds) and investigate the following problem: which classes of s-t tgds can be "uniquely characterized" by a finite set of data examples? Our investigation begins by considering finite sets of positive and negative examples, which are arguably the most natural choice of data examples. However, we show that they are not powerful enough to yield interesting unique characterizations. We then consider finite sets of universal examples, where a universal example is a pair consisting of a source instance and a universal solution for that source instance. We unveil a tight connection between unique characterizations via universal examples and the existence of Armstrong bases (a relaxation of the classical notion of Armstrong databases). On the positive side, we show that every schema mapping specified by LAV s-t tgds is uniquely characterized by a finite set of universal examples with respect to the class of LAV s-t tgds. Moreover, this positive result extends to the much broader classes of n-modular schema mappings, n a positive integer. Finally, we show that, on the negative side, there are schema mappings specified by GAV s-t tgds that are not uniquely characterized by any finite set of universal examples and negative examples with respect to the class of GAV s-t tgds (hence also with respect to the class of all s-t tgds).

References

  1. S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. B. Alexe, L. Chiticariu, R. J. Miller, and W. C. Tan. Muse: Mapping Understanding and deSign by Example. In International Conference on Data Engineering (ICDE), pages 10--19, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. B. Alexe, W. C. Tan, and Y. Velegrakis. STBenchmark: Towards a Benchmark for Mapping Systems. Proceedings of the VLDB Endowment (PVLDB), 1(1):230--244, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. W. W. Armstrong. Dependency Structures of Data Base Relationships. In IFIP Congress, pages 580--583, 1974.Google ScholarGoogle Scholar
  5. L. Chiticariu and W. C. Tan. Debugging schema mappings with routes. In International Conference on Very Large Data Bases (VLDB), pages 79--90, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. P. Erdös. Graph theory and probability. Canadian J. of Mathematics, 11:34--38, 1959.Google ScholarGoogle ScholarCross RefCross Ref
  7. R. Fagin. Armstrong Databases. In 7th IBM Symposium on Mathematical Foundations of Computer Science, 1982.Google ScholarGoogle Scholar
  8. R. Fagin. Horn Clauses and Database Dependencies. Journal of the Association for Computing Machinery (JACM), 29(4):952--985, Oct. 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data Exchange: Semantics and Query Answering. Theoretical Computer Science (TCS), 336(1):89--124, 2005. Google ScholarGoogle ScholarCross RefCross Ref
  10. R. Fagin, P. G. Kolaitis, A. Nash, and L. Popa. (PODS), 2008. In ACM Symposium on Principles of Database Systems (PODS), pages 33--42, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. R. Fagin and M. Y. Vardi. Armstrong Databases for Functional and Inclusion Dependencies. Inf. Process. Lett., 16(1):13--19, 1983.Google ScholarGoogle Scholar
  12. R. Fagin and M. Y. Vardi. The Theory of Data Dependencies - A Survey. In M. Anshel and W. Gewirtz, editors, Proc. of Symposia in Applied Mathematics, volume 34 - Mathematics of Information Processing, pages 19--71. American Mathematical Society, Providence, Rhode Island, 1986.Google ScholarGoogle ScholarCross RefCross Ref
  13. L. M. Haas, M. A. Hernández, H. Ho, L. Popa, and M. Roth. Clio Grows Up: From Research Prototype to Industrial Tool. In ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 805--810, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. P. Hell and J. Neaetril. Graphs and Homomorphisms. Oxford University Press, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  15. M. A. Hernández, R. J. Miller, L. Haas, L. Yan, C. T. H. Ho, and X. Tian. Clio: A Semi-Automatic Tool for Schema Mapping, System Demonstration. ACM SIGMOD International Conference on Management of Data (SIGMOD), 30(2), May 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. W. Hodges. A Shorter Model Theory. Cambridge University Press, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. M. J. Kearns and U. V. Vazirani. An Introduction to Computational Learning Theory. The MIT Press, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. P. G. Kolaitis. Schema Mappings, Data Exchange, and Metadata Management. In ACM Symposium on Principles of Database Systems (PODS), pages 61--75, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M. Lenzerini. Data Integration: A Theoretical Perspective. In ACM Symposium on Principles of Database Systems (PODS), pages 233--246, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. A. Lubotzky, R. Phillips, and P. Sarnak. Ramanujan graphs. Combinatorica, 8(3):261--277, 1988.Google ScholarGoogle ScholarCross RefCross Ref
  21. J. Neaetril and V. Rödl. Chromatically optimal rigid graphs. J. Comb. Theory, Ser. B, 46(2):133--141, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. C. Olston, S. Chopra, and U. Srivastava. Generating example data for dataflow programs. In ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 245--256, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. P. Senellart and G. Gottlob. On the complexity of deriving schema mappings from database instances. In ACM Symposium on Principles of Database Systems (PODS), pages 23--32, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. B. ten Cate and P. G. Kolaitis. Structural Characterizations of Schema-Mapping Languages. In International Conference on Database Theory (ICDT), pages 63--72, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. B. ten Cate and P. G. Kolaitis and W. C. Tan. Database Constraints and Homomorphism Dualities. Work in Progress, 2010.Google ScholarGoogle Scholar
  26. L.-L. Yan, R. J. Miller, L. M. Haas, and R. Fagin. Data-Driven Understanding and Refinement of Schema Mappings. In ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 485--496, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Characterizing schema mappings via data examples

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            PODS '10: Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
            June 2010
            350 pages
            ISBN:9781450300339
            DOI:10.1145/1807085

            Copyright © 2010 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 6 June 2010

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article

            Acceptance Rates

            PODS '10 Paper Acceptance Rate27of113submissions,24%Overall Acceptance Rate642of2,707submissions,24%

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader