Abstract
Schema mappings are high-level specifications that describe the relationship between two database schemas; they are considered to be the essential building blocks in data exchange and data integration, and have been the object of extensive research investigations. Since in real-life applications schema mappings can be quite complex, it is important to develop methods and tools for understanding, explaining, and refining schema mappings. A promising approach to this effect is to use “good” data examples that illustrate the schema mapping at hand.
We develop a foundation for the systematic investigation of data examples and obtain a number of results on both the capabilities and the limitations of data examples in explaining and understanding schema mappings. We focus on schema mappings specified by source-to-target tuple generating dependencies (s-t tgds) and investigate the following problem: which classes of s-t tgds can be “uniquely characterized” by a finite set of data examples? Our investigation begins by considering finite sets of positive and negative examples, which are arguably the most natural choice of data examples. However, we show that they are not powerful enough to yield interesting unique characterizations. We then consider finite sets of universal examples, where a universal example is a pair consisting of a source instance and a universal solution for that source instance. We first show that unique characterizations via universal examples is, in a precise sense, equivalent to the existence of Armstrong bases (a relaxation of the classical notion of Armstrong databases). After this, we show that every schema mapping specified by LAV s-t tgds is uniquely characterized by a finite set of universal examples with respect to the class of LAV s-t tgds. Moreover, this positive result extends to the much broader classes of n-modular schema mappings, n a positive integer. Finally, we study the unique characterizability of GAV schema mappings. It turns out that some GAV schema mappings are uniquely characterizable by a finite set of universal examples with respect to the class of GAV s-t tgds, while others are not. By unveiling a tight connection with homomorphism dualities, we establish an effective, sound, and complete criterion for determining whether or not a GAV schema mapping is uniquely characterizable by a finite set of universal examples with respect to the class of GAV s-t tgds.
- Abiteboul, S., Hull, R., and Vianu, V. 1995. Foundations of Databases. Addison-Wesley. Google ScholarDigital Library
- Alexe, B., ten Cate, B., Kolaitis, P. G., and Tan, W.-C. 2011. Designing and refining schema mappings via data examples. In Proceedings of the International Conference on Management of Data. ACM, New York, 133--144. Google ScholarDigital Library
- Alexe, B., Chiticariu, L., Miller, R. J., and Tan, W. C. 2008a. Muse: Mapping understanding and design by example. In Proceedings of the International Conference on Data Engineering (ICDE). 10--19. Google ScholarDigital Library
- Alexe, B., Kolaitis, P. G., and Tan, W.-C. 2010. Characterizing schema mappings via data examples. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). Google ScholarDigital Library
- Alexe, B., Tan, W. C., and Velegrakis, Y. 2008b. STBenchmark: Towards a benchmark for mapping systems. Proc. VLDB Endowm. 1, 1, 230--244. Google ScholarDigital Library
- Armstrong, W. W. 1974. Dependency structures of data base relationships. In Proceedings of the IFIP Congress. 580--583.Google Scholar
- ten Cate, B. and Kolaitis, P. G. 2009. Structural characterizations of schema-mapping languages. In Proceedings of the International Conference on Database Theory (ICDT). 63--72. Google ScholarDigital Library
- ten Cate, B., Kolaitis, P. G., and Tan, W.-C. 2010. Database constraints and homomorphism dualities. In Principles and Practice of Constraint Programming, D. Cohen, Ed., Lecture Notes in Computer Science, vol. 6308, Springer. Google ScholarDigital Library
- Chiticariu, L. and Tan, W. C. 2006. Debugging schema mappings with routes. In Proceedings of the International Conference on Very Large Data Bases (VLDB). 79--90. Google ScholarDigital Library
- Coté, A., Chouinard-Prévost, V., and Tardif, C. 2004. Orientations and 3-colourings of graphs. Commentationes Mathematicae Universitatis Carolinae 45, 549--553.Google Scholar
- Deutsch, A., Nash, A., and Remmel, J. 2008. The chase revisited. In Proceedings of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. 149--158. Google ScholarDigital Library
- Erdös, P. 1959. Graph theory and probability. Canadian J. Math. 11, 34--38.Google ScholarCross Ref
- Fagin, R. 1982a. Armstrong databases. In Proceedings of the 7th IBM Symposium on Mathematical Foundations of Computer Science.Google Scholar
- Fagin, R. 1982b. Horn clauses and database dependencies. J. ACM 29, 4, 952--985. Google ScholarDigital Library
- Fagin, R., Kolaitis, P. G., Miller, R. J., and Popa, L. 2005. Data exchange: Semantics and query answering. Theoret. Comput. Sci. 336, 1, 89--124. Google ScholarDigital Library
- Fagin, R., Kolaitis, P. G., Nash, A., and Popa, L. 2008. Towards a theory of schema-mapping optimization. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 33--42. Google ScholarDigital Library
- Fagin, R. and Vardi, M. Y. 1983. Armstrong databases for functional and inclusion dependencies. Inf. Process. Lett. 16, 1, 13--19.Google ScholarCross Ref
- Fagin, R. and Vardi, M. Y. 1986. The theory of data dependencies—A survey. In Proceedings of Symposia in Applied Mathematics, M. Anshel and W. Gewirtz, Eds. Vol. 34—Mathematics of Information Processing. American Mathematical Society, Providence, Rhode Island, 19--71.Google Scholar
- Foniok, J., Nesetril, J., and Tardif, C. 2008. Generalised dualities and maximal finite antichains in the homomorphism order of relational structures. Eur. J. Comb. 29, 4, 881--899. Google ScholarDigital Library
- Fuxman, A., Kolaitis, P. G., Miller, R. J., and Tan, W. C. 2006. Peer data exchange. ACM Trans. Datab. Syst. 1454--1498. Google ScholarDigital Library
- Gottlob, G. and Senellart, P. 2010. Schema mapping discovery from data instances. J. ACM 57, 2. Google ScholarDigital Library
- Haas, L. M., Hernández, M. A., Ho, H., Popa, L., and Roth, M. 2005. Clio grows up: From research prototype to industrial tool. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). 805--810. Google ScholarDigital Library
- Hell, P. and Nešetřil, J. 1992. The core of a graph. Discr. Math. 109, 117--126. Google ScholarDigital Library
- Hell, P. and Nešetřil, J. 2004. Graphs and Homomorphisms. Oxford University Press.Google Scholar
- Hernández, M. A., Miller, R. J., Haas, L., Yan, L., Ho, C. T. H., and Tian, X. 2001. Clio: A semi-automatic tool for schema mapping, System Demonstration. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD) 30, 2. Google ScholarDigital Library
- Hodges, W. 1997. A Shorter Model Theory. Cambridge University Press. Google ScholarDigital Library
- Kearns, M. J. and Vazirani, U. V. 1994. An Introduction to Computational Learning Theory. The MIT Press. Google ScholarDigital Library
- Kolaitis, P. G. 2005. Schema mappings, data exchange, and metadata management. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 61--75. Google ScholarDigital Library
- Lenzerini, M. 2002. Data integration: A theoretical perspective. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 233--246. Google ScholarDigital Library
- Mannila, H. and Räihä, K.-J. 1989. Automatic generation of test data for relational queries. J. Comput. Syst. Sci. 38, 2, 240--258.Google ScholarCross Ref
- Nešetřil, J. and Rödl, V. 1989. Chromatically optimal rigid graphs. J. Comb. Theory, Ser. B 46, 2, 133--141. Google ScholarDigital Library
- Nešetřil, J. and Tardif, C. 2005. Short answers to exponentially long questions: Extremal aspects of homomorphism duality. SIAM J. Discr. Math. 19, 4, 914--920. Google ScholarDigital Library
- Nešetřil, J. and Zhu, X. 1998. On bounded treewidth duality of graphs. J. Graph Theory 23, 2, 151--162. Google ScholarDigital Library
- Olston, C., Chopra, S., and Srivastava, U. 2009. Generating example data for dataflow programs. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). 245--256. Google ScholarDigital Library
- Reingold, O. 2005. Undirected st-connectivity in log-space. In Proceedings of the ACM Symposium on Theory of Computing (STOC'05). ACM, 376--385. Google ScholarDigital Library
- Senellart, P. and Gottlob, G. 2008. On the complexity of deriving schema mappings from database instances. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 23--32. Google ScholarDigital Library
- Yan, L.-L., Miller, R. J., Haas, L. M., and Fagin, R. 2001. Data-Driven understanding and refinement of schema mappings. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). 485--496. Google ScholarDigital Library
Index Terms
- Characterizing schema mappings via data examples
Recommendations
Designing and refining schema mappings via data examples
SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of dataA schema mapping is a specification of the relationship between a source schema and a target schema. Schema mappings are fundamental building blocks in data integration and data exchange and, as such, obtaining the right schema mapping constitutes a ...
Schema mappings and data examples
EDBT '13: Proceedings of the 16th International Conference on Extending Database TechnologyA fundamental task in data integration and data exchange is the design of schema mappings, that is, high-level declarative specifications of the relationship between two database schemas. Several research prototypes and commercial systems have been ...
Characterizing schema mappings via data examples
PODS '10: Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsSchema mappings are high-level specifications that describe the relationship between two database schemas; they are considered to be the essential building blocks in data exchange and data integration, and have been the object of extensive research ...
Comments