Abstract
We introduce a theoretical framework for discovering relationships between two database instances over distinct and unknown schemata. This framework is grounded in the context of data exchange. We formalize the problem of understanding the relationship between two instances as that of obtaining a schema mapping so that a minimum repair of this mapping provides a perfect description of the target instance given the source instance. We show that this definition yields “intuitive” results when applied on database instances derived from each other by basic operations. We study the complexity of decision problems related to this optimality notion in the context of different logical languages and show that, even in very restricted cases, the problem is of high complexity.
Supplemental Material
Available for Download
25-minute talk on the topic of the article (unrefereed material)
- Arenas, M., Bertossi, L., and Chomicki, J. 1999. Consistent query answers in inconsistent databases. In Proceedings of the ACM SIGACT-SIGMOD-SIGAAT Symposium on Principles of Database Systems (PODS). ACM, New York. Google ScholarDigital Library
- Beeri, C., Fagin, R., Maier, D., Mendelzon, A., Ullman, J., and Yannakakis, M. 1981. Properties of acyclic database schemes. In Proceedings of the Annual ACM Symposium on Theory of Computing (STOC). ACM, New York. Google ScholarDigital Library
- Bernstein, P. 2003. Applying model management to classical meta data. In Proceedings of Conference on Innovative Data Systems.Google Scholar
- Bernstein, P. A., and Chiu, D.-M. W. 1981. Using semi-joins to solve relational queries. J ACM 28, 1, 25--40. Google ScholarDigital Library
- Chandra, A. K., and Merlin, P. M. 1977. Optimal implementation of conjunctive queries in relational data bases. In Proceedings of the Annual ACM Symposium on Theory of Computing (STOC). ACM, New York. Google ScholarDigital Library
- Crescenzi, V., Mecca, G., and Merialdo, P. 2001. Roadrunner: Towards automatic data extraction from large Web sites. In Proceedings of Symposium on Very Large Databases (VLDB). VLDB Endowment. Google ScholarDigital Library
- Di Paola, R. A. 1969. The recursive unsolvability of the decision problem for the class of definite formulas. J ACM 16, 2, 324--327. Google ScholarDigital Library
- Diestel, R. 2005. Graph Theory. Springer-Verlag, New York.Google Scholar
- Fagin, R., Kolaitis, P. G., Miller, R. J., and Popa, L. 2003. Data exchange: Semantics and query answering. In Proceedings of the International Conference on Database Theory (ICDT). Springer-Verlag, Berlin, Germany. Google ScholarDigital Library
- Fagin, R., Kolaitis, P. G., Popa, L., and Tan, W. C. 2007. Quasi-inverses of schema mapping. In Proceedings of the ACM SIGACT-SIGMOD-SIGAAT Symposium on Principles of Database Systems (PODS). ACM, New York. Google ScholarDigital Library
- Fagin, R., Kolaitis, P. G., Tan, W.-C., and Popa, L. 2004. Composing schema mappings: Second-order dependencies to the rescue. In Proceedings of the ACM SIGACT-SIGMOD-SIGAAT Symposium on Principles of Database Systems (PODS). ACM, New York. Google ScholarDigital Library
- Fletcher, G. H. L. 2007. On the data mapping problem. Ph.D. dissertation, Indiana University. Google ScholarDigital Library
- Garey, M. R., and Johnson, D. S. 1979. Computers and Intractability. A Guide to the Theory of NP-Completeness. W. H. Freeman, New York. Google ScholarDigital Library
- Gottlob, G., Leone, N., and Scarcello, F. 1997. On the complexity of some inductive logic programming problems. In Proceedings of the International Conference of Inductive Logic Programming (ILP). Google ScholarDigital Library
- Gottlob, G., Leone, N., and Scarcello, F. 1999. Hypertree decompositions and tractable queries. In Proceedings of the ACM SIGACT-SIGMOD-SIGAAT Symposium on Principles of Database Systems (PODS). ACM, New York. Google ScholarDigital Library
- Haas, L. M., Hernández, M. A., Ho, H., Popa, L., and Roth, M. 2005. Clio grows up: from research prototype to industrial tool. In Proceedings of SIGMOD. ACM, New York, 805--810. Google ScholarDigital Library
- Ilie, L., Solis-Oba, R., and Yu, S. 2002. Reducing the size of NFAs by using equivalences and preorders. In Combinatorial Pattern Matching. Google ScholarDigital Library
- Kolaitis, P. G. 2005. Schema mappings, data exchange, and metadata management. In Proceedings of the ACM SIGACT-SIGMOD-SIGAAT Symposium on Principles of Database Systems (PODS). ACM, New York. Google ScholarDigital Library
- Kőnig, D. 1936. Theorie der endlichen und unendlichen Graphen. Akademische Verlagsgesellschaft, Leipzig, Germany.Google Scholar
- Lavrač, N., and Džeroski, S. 1994. Inductive Logic Programming: Techniques and Applications. Ellis Horwood, New York. Google ScholarDigital Library
- Lenzerini, M. 2002. Data integration: a theoretical perspective. In Proceedings of the ACM SIGACT-SIGMOD-SIGAAT Symposium on Principles of Database Systems (PODS). ACM, New York. Google ScholarDigital Library
- Li, M., and Vitányi, P. 1997. An Introduction to Kolmogorov Complexity and Its Applications, second ed. Springer-Verlag, New York. Google ScholarDigital Library
- Papadimitriou, C. H. 1994. Computational Complexity. Addison Wesley, Reading, MA.Google Scholar
- Rahm, E., and Bernstein, P. A. 2001. A survey of approaches to automatic schema matching. VLDB J 10, 4, 334--350. Google ScholarDigital Library
- Schaefer, M., and Umans, C. 2002. Completeness in the polynomial hierarchy, a compendium. SIGACT News 33, 3 (Sept.), 32--49.Google Scholar
- Senellart, P., and Gottlob, G. 2008. On the complexity of deriving schema mappings from database instances. In Proceedings of the ACM SIGACT-SIGMOD-SIGAAT Symposium on Principles of Database Systems (PODS). ACM, New York, 23--32. Google ScholarDigital Library
- Stockmeyer, L. J., and Meyer, A. R. 1973. Word problems requiring exponential time. In Proceedings of the Annual ACM Symposium on Theory of Computing (STOC). ACM, New York. Google ScholarDigital Library
- Tarjan, R. E., and Yannakakis, M. 1984. Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs. SIAM J Comput 13, 3, 566--579. Google ScholarDigital Library
- Trakhtenbrot, B. A. 1963. Impossibility of an algorithm for the decision problem in finite classes. AMS Translations Series 2 23, 1--5.Google Scholar
- Wrathall, C. 1976. Complete sets and the polynomial-time hierarchy. Theoret Comput Sci 3, 1, 23--33.Google ScholarCross Ref
- Yannakakis, M. 1981. Algorithms for acyclic database schemes. In Proceedings of the Symposium on Very Large Databases (VLDB). VLDB Endowment. Google ScholarDigital Library
Index Terms
- Schema mapping discovery from data instances
Recommendations
Towards a theory of schema-mapping optimization
PODS '08: Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsA schema mapping is a high-level specification that describes the relationship between two database schemas. As schema mappings constitute the essential building blocks of data exchange and data integration, an extensive investigation of the foundations ...
On the complexity of deriving schema mappings from database instances
PODS '08: Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsWe introduce a theoretical framework for discovering relationships between two database instances over distinct and unknown schemata. This framework is grounded in the context of data exchange. We formalize the problem of understanding the relationship ...
Structural characterizations of schema-mapping languages
ICDT '09: Proceedings of the 12th International Conference on Database TheorySchema mappings are declarative specifications that describe the relationship between two database schemas. In recent years, there has been an extensive study of schema mappings and of their applications to several different data inter-operability tasks,...
Comments