skip to main content
research-article

Schema mapping discovery from data instances

Published:08 February 2010Publication History
Skip Abstract Section

Abstract

We introduce a theoretical framework for discovering relationships between two database instances over distinct and unknown schemata. This framework is grounded in the context of data exchange. We formalize the problem of understanding the relationship between two instances as that of obtaining a schema mapping so that a minimum repair of this mapping provides a perfect description of the target instance given the source instance. We show that this definition yields “intuitive” results when applied on database instances derived from each other by basic operations. We study the complexity of decision problems related to this optimality notion in the context of different logical languages and show that, even in very restricted cases, the problem is of high complexity.

Skip Supplemental Material Section

Supplemental Material

References

  1. Arenas, M., Bertossi, L., and Chomicki, J. 1999. Consistent query answers in inconsistent databases. In Proceedings of the ACM SIGACT-SIGMOD-SIGAAT Symposium on Principles of Database Systems (PODS). ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Beeri, C., Fagin, R., Maier, D., Mendelzon, A., Ullman, J., and Yannakakis, M. 1981. Properties of acyclic database schemes. In Proceedings of the Annual ACM Symposium on Theory of Computing (STOC). ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Bernstein, P. 2003. Applying model management to classical meta data. In Proceedings of Conference on Innovative Data Systems.Google ScholarGoogle Scholar
  4. Bernstein, P. A., and Chiu, D.-M. W. 1981. Using semi-joins to solve relational queries. J ACM 28, 1, 25--40. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Chandra, A. K., and Merlin, P. M. 1977. Optimal implementation of conjunctive queries in relational data bases. In Proceedings of the Annual ACM Symposium on Theory of Computing (STOC). ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Crescenzi, V., Mecca, G., and Merialdo, P. 2001. Roadrunner: Towards automatic data extraction from large Web sites. In Proceedings of Symposium on Very Large Databases (VLDB). VLDB Endowment. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Di Paola, R. A. 1969. The recursive unsolvability of the decision problem for the class of definite formulas. J ACM 16, 2, 324--327. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Diestel, R. 2005. Graph Theory. Springer-Verlag, New York.Google ScholarGoogle Scholar
  9. Fagin, R., Kolaitis, P. G., Miller, R. J., and Popa, L. 2003. Data exchange: Semantics and query answering. In Proceedings of the International Conference on Database Theory (ICDT). Springer-Verlag, Berlin, Germany. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Fagin, R., Kolaitis, P. G., Popa, L., and Tan, W. C. 2007. Quasi-inverses of schema mapping. In Proceedings of the ACM SIGACT-SIGMOD-SIGAAT Symposium on Principles of Database Systems (PODS). ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Fagin, R., Kolaitis, P. G., Tan, W.-C., and Popa, L. 2004. Composing schema mappings: Second-order dependencies to the rescue. In Proceedings of the ACM SIGACT-SIGMOD-SIGAAT Symposium on Principles of Database Systems (PODS). ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Fletcher, G. H. L. 2007. On the data mapping problem. Ph.D. dissertation, Indiana University. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Garey, M. R., and Johnson, D. S. 1979. Computers and Intractability. A Guide to the Theory of NP-Completeness. W. H. Freeman, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Gottlob, G., Leone, N., and Scarcello, F. 1997. On the complexity of some inductive logic programming problems. In Proceedings of the International Conference of Inductive Logic Programming (ILP). Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Gottlob, G., Leone, N., and Scarcello, F. 1999. Hypertree decompositions and tractable queries. In Proceedings of the ACM SIGACT-SIGMOD-SIGAAT Symposium on Principles of Database Systems (PODS). ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Haas, L. M., Hernández, M. A., Ho, H., Popa, L., and Roth, M. 2005. Clio grows up: from research prototype to industrial tool. In Proceedings of SIGMOD. ACM, New York, 805--810. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Ilie, L., Solis-Oba, R., and Yu, S. 2002. Reducing the size of NFAs by using equivalences and preorders. In Combinatorial Pattern Matching. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Kolaitis, P. G. 2005. Schema mappings, data exchange, and metadata management. In Proceedings of the ACM SIGACT-SIGMOD-SIGAAT Symposium on Principles of Database Systems (PODS). ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Kőnig, D. 1936. Theorie der endlichen und unendlichen Graphen. Akademische Verlagsgesellschaft, Leipzig, Germany.Google ScholarGoogle Scholar
  20. Lavrač, N., and Džeroski, S. 1994. Inductive Logic Programming: Techniques and Applications. Ellis Horwood, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Lenzerini, M. 2002. Data integration: a theoretical perspective. In Proceedings of the ACM SIGACT-SIGMOD-SIGAAT Symposium on Principles of Database Systems (PODS). ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Li, M., and Vitányi, P. 1997. An Introduction to Kolmogorov Complexity and Its Applications, second ed. Springer-Verlag, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Papadimitriou, C. H. 1994. Computational Complexity. Addison Wesley, Reading, MA.Google ScholarGoogle Scholar
  24. Rahm, E., and Bernstein, P. A. 2001. A survey of approaches to automatic schema matching. VLDB J 10, 4, 334--350. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Schaefer, M., and Umans, C. 2002. Completeness in the polynomial hierarchy, a compendium. SIGACT News 33, 3 (Sept.), 32--49.Google ScholarGoogle Scholar
  26. Senellart, P., and Gottlob, G. 2008. On the complexity of deriving schema mappings from database instances. In Proceedings of the ACM SIGACT-SIGMOD-SIGAAT Symposium on Principles of Database Systems (PODS). ACM, New York, 23--32. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Stockmeyer, L. J., and Meyer, A. R. 1973. Word problems requiring exponential time. In Proceedings of the Annual ACM Symposium on Theory of Computing (STOC). ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Tarjan, R. E., and Yannakakis, M. 1984. Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs. SIAM J Comput 13, 3, 566--579. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Trakhtenbrot, B. A. 1963. Impossibility of an algorithm for the decision problem in finite classes. AMS Translations Series 2 23, 1--5.Google ScholarGoogle Scholar
  30. Wrathall, C. 1976. Complete sets and the polynomial-time hierarchy. Theoret Comput Sci 3, 1, 23--33.Google ScholarGoogle ScholarCross RefCross Ref
  31. Yannakakis, M. 1981. Algorithms for acyclic database schemes. In Proceedings of the Symposium on Very Large Databases (VLDB). VLDB Endowment. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Schema mapping discovery from data instances

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            • Published in

              cover image Journal of the ACM
              Journal of the ACM  Volume 57, Issue 2
              January 2010
              248 pages
              ISSN:0004-5411
              EISSN:1557-735X
              DOI:10.1145/1667053
              Issue’s Table of Contents

              Copyright © 2010 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 8 February 2010
              • Revised: 1 September 2009
              • Accepted: 1 September 2009
              • Received: 1 January 2009
              Published in jacm Volume 57, Issue 2

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • research-article
              • Research
              • Refereed

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader