skip to main content
10.1145/1989323.1989338acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Designing and refining schema mappings via data examples

Published:12 June 2011Publication History

ABSTRACT

A schema mapping is a specification of the relationship between a source schema and a target schema. Schema mappings are fundamental building blocks in data integration and data exchange and, as such, obtaining the right schema mapping constitutes a major step towards the integration or exchange of data. Up to now, schema mappings have typically been specified manually or have been derived using mapping-design systems that automatically generate a schema mapping from a visual specification of the relationship between two schemas. We present a novel paradigm and develop a system for the interactive design of schema mappings via data examples. Each data example represents a partial specification of the semantics of the desired schema mapping. At the core of our system lies a sound and complete algorithm that, given a finite set of data examples, decides whether or not there exists a GLAV schema mapping (i.e., a schema mapping specified by Global-and-Local-As-View constraints) that "fits" these data examples. If such a fitting GLAV schema mapping exists, then our system constructs the "most general" one. We give a rigorous computational complexity analysis of the underlying decision problem concerning the existence of a fitting GLAV schema mapping, given a set of data examples. Specifically, we prove that this problem is complete for the second level of the polynomial hierarchy, hence, in a precise sense, harder than NP-complete. This worst-case complexity analysis notwithstanding, we conduct an experimental evaluation of our prototype implementation that demonstrates the feasibility of interactively designing schema mappings using data examples. In particular, our experiments show that our system achieves very good performance in real-life scenarios.

References

  1. S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. B. Alexe, L. Chiticariu, R. J. Miller, and W. C. Tan. Muse: Mapping Understanding and deSign by Example. In ICDE, pages 10--19, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. B. Alexe, P. G. Kolaitis, and W. C. Tan. Characterizing schema mappings via data examples. In ACM PODS, pages 261--272, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. B. Alexe, W. C. Tan, and Y. Velegrakis. STBenchmark: Towards a Benchmark for Mapping Systems. PVLDB, 1(1):230--244, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. P. Barceló. Logical foundations of relational data exchange. SIGMOD Record, 38(1):49--58, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. P. A. Bernstein, T. J. Green, S. Melnik, and A. Nash. Implementing Mapping Composition. VLDB Journal, 17(2):333--353, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A. Bonifati, E. Q. Chang, T. Ho, V. S. Lakshmanan, and R. Pottinger. HePToX: Marrying XML and Heterogeneity in Your P2P Databases. In VLDB, pages 1267--1270, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. K. Chandra and P. M. Merlin. Optimal implementation of conjunctive queries in relational data bases. In STOC, pages 77--90, 1977. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data Exchange: Semantics and Query Answering. TCS, 336(1):89--124, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. R. Fagin, P. G. Kolaitis, A. Nash, and L. Popa. Towards a Theory of Schema-Mapping Optimization. In ACM PODS, pages 33--42, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. R. Fagin, P. G. Kolaitis, and L. Popa. Data Exchange: Getting to the Core. ACM TODS, 30(1):174--210, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. G. H. Fletcher and C. M. Wyss. Towards a general framework for effective solutions to the data mapping problem. Journal on Data Semantics, XIV, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. G. H. L. Fletcher, M. Gyssens, J. Paredaens, and D. V. Gucht. On the expressive power of the relational algebra on finite sets of relation pairs. TKDE, 21(6):939--942, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. G. Gottlob and P. Senellart. Schema mapping discovery from data instances. JACM, 57(2), 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. L. M. Haas, M. A. Hernández, H. Ho, L. Popa, and M. Roth. Clio Grows Up: From Research Prototype to Industrial Tool. In ACM SIGMOD, pages 805--810, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. P. G. Kolaitis. Schema Mappings, Data Exchange, and Metadata Management. In ACM PODS, pages 61--75, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. M. Lenzerini. Data Integration: A Theoretical Perspective. In ACM PODS, pages 233--246, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. H. Mannila and K.-J. Räihä. Automatic generation of test data for relational queries. JCSS, 38(2):240--258, 1989.Google ScholarGoogle ScholarCross RefCross Ref
  19. C. Olston, S. Chopra, and U. Srivastava. Generating example data for dataflow programs. In ACM SIGMOD, pages 245--256, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.Google ScholarGoogle Scholar
  21. A. D. Sarma, A. G. Parameswaran, H. Garcia-Molina, and J. Widom. Synthesizing view definitions from data. In ICDT, pages 89--103, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. B. ten Cate, P. G. Kolaitis, and W. C. Tan. Database constraints and homomorphism dualities. In CP, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. L. Yan, R. J. Miller, L. M. Haas, and R. Fagin. Data-Driven Understanding and Refinement of Schema Mappings. In ACM SIGMOD, pages 485--496, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Designing and refining schema mappings via data examples

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in
            • Published in

              cover image ACM Conferences
              SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
              June 2011
              1364 pages
              ISBN:9781450306614
              DOI:10.1145/1989323

              Copyright © 2011 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 12 June 2011

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • research-article

              Acceptance Rates

              Overall Acceptance Rate785of4,003submissions,20%

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader