ABSTRACT
A schema mapping is a specification of the relationship between a source schema and a target schema. Schema mappings are fundamental building blocks in data integration and data exchange and, as such, obtaining the right schema mapping constitutes a major step towards the integration or exchange of data. Up to now, schema mappings have typically been specified manually or have been derived using mapping-design systems that automatically generate a schema mapping from a visual specification of the relationship between two schemas. We present a novel paradigm and develop a system for the interactive design of schema mappings via data examples. Each data example represents a partial specification of the semantics of the desired schema mapping. At the core of our system lies a sound and complete algorithm that, given a finite set of data examples, decides whether or not there exists a GLAV schema mapping (i.e., a schema mapping specified by Global-and-Local-As-View constraints) that "fits" these data examples. If such a fitting GLAV schema mapping exists, then our system constructs the "most general" one. We give a rigorous computational complexity analysis of the underlying decision problem concerning the existence of a fitting GLAV schema mapping, given a set of data examples. Specifically, we prove that this problem is complete for the second level of the polynomial hierarchy, hence, in a precise sense, harder than NP-complete. This worst-case complexity analysis notwithstanding, we conduct an experimental evaluation of our prototype implementation that demonstrates the feasibility of interactively designing schema mappings using data examples. In particular, our experiments show that our system achieves very good performance in real-life scenarios.
- S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. Google ScholarDigital Library
- B. Alexe, L. Chiticariu, R. J. Miller, and W. C. Tan. Muse: Mapping Understanding and deSign by Example. In ICDE, pages 10--19, 2008. Google ScholarDigital Library
- B. Alexe, P. G. Kolaitis, and W. C. Tan. Characterizing schema mappings via data examples. In ACM PODS, pages 261--272, 2010. Google ScholarDigital Library
- B. Alexe, W. C. Tan, and Y. Velegrakis. STBenchmark: Towards a Benchmark for Mapping Systems. PVLDB, 1(1):230--244, 2008. Google ScholarDigital Library
- P. Barceló. Logical foundations of relational data exchange. SIGMOD Record, 38(1):49--58, 2009. Google ScholarDigital Library
- P. A. Bernstein, T. J. Green, S. Melnik, and A. Nash. Implementing Mapping Composition. VLDB Journal, 17(2):333--353, 2008. Google ScholarDigital Library
- A. Bonifati, E. Q. Chang, T. Ho, V. S. Lakshmanan, and R. Pottinger. HePToX: Marrying XML and Heterogeneity in Your P2P Databases. In VLDB, pages 1267--1270, 2005. Google ScholarDigital Library
- A. K. Chandra and P. M. Merlin. Optimal implementation of conjunctive queries in relational data bases. In STOC, pages 77--90, 1977. Google ScholarDigital Library
- R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data Exchange: Semantics and Query Answering. TCS, 336(1):89--124, 2005. Google ScholarDigital Library
- R. Fagin, P. G. Kolaitis, A. Nash, and L. Popa. Towards a Theory of Schema-Mapping Optimization. In ACM PODS, pages 33--42, 2008. Google ScholarDigital Library
- R. Fagin, P. G. Kolaitis, and L. Popa. Data Exchange: Getting to the Core. ACM TODS, 30(1):174--210, 2005. Google ScholarDigital Library
- G. H. Fletcher and C. M. Wyss. Towards a general framework for effective solutions to the data mapping problem. Journal on Data Semantics, XIV, 2009. Google ScholarDigital Library
- G. H. L. Fletcher, M. Gyssens, J. Paredaens, and D. V. Gucht. On the expressive power of the relational algebra on finite sets of relation pairs. TKDE, 21(6):939--942, 2009. Google ScholarDigital Library
- G. Gottlob and P. Senellart. Schema mapping discovery from data instances. JACM, 57(2), 2010. Google ScholarDigital Library
- L. M. Haas, M. A. Hernández, H. Ho, L. Popa, and M. Roth. Clio Grows Up: From Research Prototype to Industrial Tool. In ACM SIGMOD, pages 805--810, 2005. Google ScholarDigital Library
- P. G. Kolaitis. Schema Mappings, Data Exchange, and Metadata Management. In ACM PODS, pages 61--75, 2005. Google ScholarDigital Library
- M. Lenzerini. Data Integration: A Theoretical Perspective. In ACM PODS, pages 233--246, 2002. Google ScholarDigital Library
- H. Mannila and K.-J. Räihä. Automatic generation of test data for relational queries. JCSS, 38(2):240--258, 1989.Google ScholarCross Ref
- C. Olston, S. Chopra, and U. Srivastava. Generating example data for dataflow programs. In ACM SIGMOD, pages 245--256, 2009. Google ScholarDigital Library
- C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.Google Scholar
- A. D. Sarma, A. G. Parameswaran, H. Garcia-Molina, and J. Widom. Synthesizing view definitions from data. In ICDT, pages 89--103, 2010. Google ScholarDigital Library
- B. ten Cate, P. G. Kolaitis, and W. C. Tan. Database constraints and homomorphism dualities. In CP, 2010. Google ScholarDigital Library
- L. Yan, R. J. Miller, L. M. Haas, and R. Fagin. Data-Driven Understanding and Refinement of Schema Mappings. In ACM SIGMOD, pages 485--496, 2001. Google ScholarDigital Library
Index Terms
- Designing and refining schema mappings via data examples
Recommendations
Characterizing schema mappings via data examples
Schema mappings are high-level specifications that describe the relationship between two database schemas; they are considered to be the essential building blocks in data exchange and data integration, and have been the object of extensive research ...
Schema mappings and data examples
EDBT '13: Proceedings of the 16th International Conference on Extending Database TechnologyA fundamental task in data integration and data exchange is the design of schema mappings, that is, high-level declarative specifications of the relationship between two database schemas. Several research prototypes and commercial systems have been ...
Characterizing schema mappings via data examples
PODS '10: Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsSchema mappings are high-level specifications that describe the relationship between two database schemas; they are considered to be the essential building blocks in data exchange and data integration, and have been the object of extensive research ...
Comments