skip to main content
research-article

Learning schema mappings

Published:04 December 2013Publication History
Skip Abstract Section

Abstract

A schema mapping is a high-level specification of the relationship between a source schema and a target schema. Recently, a line of research has emerged that aims at deriving schema mappings automatically or semi-automatically with the help of data examples, that is, pairs consisting of a source instance and a target instance that depict, in some precise sense, the intended behavior of the schema mapping. Several different uses of data examples for deriving, refining, or illustrating a schema mapping have already been proposed and studied.

In this article, we use the lens of computational learning theory to systematically investigate the problem of obtaining algorithmically a schema mapping from data examples. Our aim is to leverage the rich body of work on learning theory in order to develop a framework for exploring the power and the limitations of the various algorithmic methods for obtaining schema mappings from data examples. We focus on GAV schema mappings, that is, schema mappings specified by GAV (Global-As-View) constraints. GAV constraints are the most basic and the most widely supported language for specifying schema mappings. We present an efficient algorithm for learning GAV schema mappings using Angluin's model of exact learning with membership and equivalence queries. This is optimal, since we show that neither membership queries nor equivalence queries suffice, unless the source schema consists of unary relations only. We also obtain results concerning the learnability of schema mappings in the context of Valiant's well-known PAC (Probably-Approximately-Correct) learning model, and concerning the learnability of restricted classes of GAV schema mappings. Finally, as a byproduct of our work, we show that there is no efficient algorithm for approximating the shortest GAV schema mapping fitting a given set of examples, unless the source schema consists of unary relations only.

References

  1. Alekhnovich, M., Braverman, M., Feldman, V., Klivans, A. R., and Pitassi, T. 2008. The complexity of properly learning simple concept classes. J. Comput. Syst. Sci. 74, 1, 16--34. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Alexe, B., Chiticariu, L., Miller, R. J., and Tan, W. C. 2008. Muse: Mapping understanding and design by example. In Proceedings of the 24th International Conference on Data Engineering (ICDE'08). 10--19. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Alexe, B., Kolaitis, P. G., and Tan, W. C. 2010. Characterizing schema mappings via data examples. In Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'10). 261--272. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Alexe, B., ten Cate, B., Kolaitis, P. G., and Tan, W. C. 2011. Designing and refining schema mappings via data examples. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'11). 133--144. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Angluin, D. 1987. Queries and concept learning. Mach. Learn. 2, 4, 319--342. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Angluin, D. 1990. Negative results for equivalence queries. Mach. Learn. 5, 121--150. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Angluin, D., Frazier, M., and Pitt, L. 1992. Learning conjunctions of horn clauses. Mach. Learn. 9, 147--164. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Angluin, D. and Kharitonov, M. 1995. When won't membership queries help? J. Comput. Syst. Sci. 50, 2, 336--355. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Anthony, M. and Biggs, N. 1992. An Introduction to Computational Learning Theory. Cambridge University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Barcelo, P. 2009. Logical foundations of relational data exchange. SIGMOD Rec. 38, 1, 49--58. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Bernstein, P. A., Green, T. J., Melnik, S., and Nash, A. 2008. Implementing mapping composition. VLDB J. 17, 2, 333--353. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Blumer, A., Ehrenfeucht, A., Haussler, D., and Warmuth, M. K. 1987. Occam's razor. Inf. Process. Lett. 24, 6, 377--380. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Bonifati, A., Chang, E. Q., Ho, T., Lakshmanan, V. S., and Pottinger, R. 2005. HePToX: Marrying XML and heterogeneity in your P2P databases. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB'05). 1267--1270. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. ten Cate, B., Dalmau, V., and Kolaitis, P. 2012. Learning schema mappings. In Proceedings of the International Conference on Database Theory (ICDT'12). ACM Press, New York, 22--33. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. ten Cate, B., Kolaitis, P. G., and Tan, W. C. 2010. Database constraints and homomorphism dualities. In Proceedings of the 16th International Conference on Principles and Practice of Constraint Programming (CP'10). 475-490. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Chandra, A. K. and Merlin, P. M. 1977. Optimal implementation of conjunctive queries in relational data bases. In Proceedings of the 9th Annual ACM Symposium on Theory of Computing (STOC'77). 77--90. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Chekuri, C. and Rajaraman, A. 2000. Conjunctive query containment revisited. Theor. Comput. Sci. 239, 211--229. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Ciucanu, R. and Staworko, S. 2013. Learning schemas for unordered xml. http://dbpl2013.inria.fr/slides/dbpl13-ciucanu-slides.pdf.Google ScholarGoogle Scholar
  19. Cohen, W. W. 1995a. PAC-learning non-recursive prolog clauses. Artif. Intell. 79, 1, 1--38. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Cohen, W. W. 1995b. PAC-learning recursive logic programs: Efficient algorithms. J. Artif. Intell. Res. 2, 501--539. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Cohen, W. W. 1995c. PAC-learning recursive logic programs: Negative results. J. Artif. Intell. Res. 2, 541--573. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Das Sarma, A., Parameswaran, A. G., Garcia-Molina, H., and Widom, J. 2010. Synthesizing view definitions from data. In Proceedings of the 13th International Conference on Database Theory (ICDT'10). 89--103. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Doan, A. 2002. Learning to map between structured representations of data. Ph.D. thesis, Department of Computer Science and Engineering, University of Washington. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Doan, A., Madhavan, J., Domingos, P., and Halevy, A. Y. 2002. Learning to map between ontologies on the Semantic web. In Proceedings of the 11th International Conference on World Wide Web (WWW'02). 662--673. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Fagin, R., Kolaitis, P. G., Miller, R. J., and Popa, L. 2005. Data exchange: Semantics and query answering. Theor. Comput. Sci. 336, 1, 89--124. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Fan, W., Geerts, F., Lakshmanan, L. V. S., and Xiong, M. 2009. Discovering conditional functional dependencies. In Proceedings of the IEEE International Conference on Data Engineering (ICDE'09). 1231--1234. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Feldman, V. 2009. Hardness of approximate two-level logic minimization and pac learning with membership queries. J. Comput. Syst. Sci. 75, 1, 13--26. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Flach, P. A. and Savnik, I. 1999. Database dependency discovery: A machine learning approach. AI Comm. 12, 3, 139--160. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Fletcher, G. H. L., Gyssens, M., Paredaens, J., and Gucht, D. V. 2009. On the expressive power of the relational algebra on finite sets of relation pairs. IEEE Trans. Knowl. Data Engin. 21, 6, 939--942. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Fletcher, G. H. L. and Wyss, C. M. 2009. Towards a general framework for effective solutions to the data mapping problem. J. Data Semantics 14, 37--73. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Gillis, J. J. M. and Van Den Bussche, J. 2009. Induction of relational algebra expressions. In Proceedings of the 19th International Conference on Inductive Logic Programming (ILP'09). 25--33. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Gold, M. E. 1967. Language identification in the limit. Inf. Control 10, 5, 447--474.Google ScholarGoogle ScholarCross RefCross Ref
  33. Gottlob, G. and Senellart, P. 2010. Schema mapping discovery from data instances. J. ACM 57, 2, 6:1--6:37. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Gutjahr, W., Welzl, E., and Woeginger, G. 1992. Polynomial graph-colorings. Discr. Appl. Math. 35, 29--46. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Haas, L. M., Hernandez, M. A., Ho, H., Popa, L., and Roth, M. 2005. Clio grows up: From research prototype to industrial tool. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'05). 805--810. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Haussler, D. 1989. Learning conjunctive concepts in structural domains. Mach. Learn. 4, 7--40. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Hellerstein, L., Pillaipakkamnatt, K., Raghavan, V. V., and Wilkins, D. 1996. How many queries are needed to learn? J. ACM 43, 5, 840--862. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Hirata, K. 2000. On the hardness of learning acyclic conjunctive queries. In Proceedings of the 11th International Conference on Algorithmic Learning Theory (ALT'00). 238--251. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Kearns, M. J. and Valiant, L. G. 1994. Cryptographic limitations on learning boolean formulae and finite automata. J. ACM 41, 1, 67--95. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Kearns, M. J. and Vazirani, U. V. 1997. An Introduction to Computational Learning Theory. The MIT Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Kolaitis, P. G. 2005. Schema mappings, data exchange, and metadata management. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS'05). 61--75. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Kolaitis, P. G. and Vardi, M. Y. 2000. Conjunctive-query containment and constraint satisfaction. J. Comput. Syst. Sci. 61, 2, 302--332. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Lenzerini, M. 2002. Data integration: A theoretical perspective. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS'02). 233--246. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Rahm, E. and Bernstein, P. A. 2001. A survey of approaches to automatic schema matching. Very Large Database J. 10, 4, 334--350. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Schapire, R. E. 1990. The strength of weak learnability. Mach. Learn. 5, 197--227. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Staworko, S. and Wieczorek, P. 2012. Learning twig and path queries. In Proceedings of the International Conference on Database Theory (ICDT'12). ACM Press, New York, 140--154. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Tran, Q. T., Chan, C.-Y., and Parthasarathy, S. 2009. Query by output. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'09). 535--548. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Valiant, L. G. 1984. A theory of the learnable. Comm. ACM 27, 11, 1134--1142. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Valiant, L. G. 1985. Learning disjunctions of conjunctions. In Proceedings of the 9th International Joint Conference on Artificial Intelligence (IJCAI'85). 560--566. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Yan, L., Miller, R. J., Haas, L. M., and Fagin, R. 2001. Data-driven understanding and refinement of schema mappings. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'01). 485--496. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Learning schema mappings

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            • Published in

              cover image ACM Transactions on Database Systems
              ACM Transactions on Database Systems  Volume 38, Issue 4
              Invited papers issue
              November 2013
              294 pages
              ISSN:0362-5915
              EISSN:1557-4644
              DOI:10.1145/2539032
              Issue’s Table of Contents

              Copyright © 2013 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 4 December 2013
              • Accepted: 1 August 2013
              • Received: 1 March 2013
              Published in tods Volume 38, Issue 4

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • research-article
              • Research
              • Refereed

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader