Abstract
We study the problem of validating XML documents of size N against general DTDs in the context of streaming algorithms. The starting point of this work is a well-known space lower bound. There are XML documents and DTDs for which p-pass streaming algorithms require Ω(N/p) space. We show that when allowing access to external memory, there is a deterministic streaming algorithm that solves this problem with memory space O(log2 N), a constant number of auxiliary read/write streams, and O(log N) total number of passes on the XML document and auxiliary streams. An important intermediate step of this algorithm is the computation of the First-Child-Next-Sibling (FCNS) encoding of the initial XML document in a streaming fashion. We study this problem independently, and we also provide memory-efficient streaming algorithms for decoding an XML document given in its FCNS encoding. Furthermore, validating XML documents encoding binary trees against any DTD in the usual streaming model without external memory can be done with sublinear memory. There is a one-pass algorithm using O(√N log N) space, and a bidirectional two-pass algorithm using O(log2 N) space which perform this task.
- Alon, N., Matias, Y., and Szegedy, M. 1999. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci. 58, 1, 137--147. Google ScholarDigital Library
- Balmin, A., Papakonstantinou, Y., and Vianu, V. 2004. Incremental validation of xml documents. ACM Trans. Database Syst. 29, 4, 710--751. Google ScholarDigital Library
- Bar-Yossef, Z., Jayram, T. S., Kumar, R., and Sivakumar, D. 2004. An information statistics approach to data stream and communication complexity. J. Comput. Syst. Sci. 68, 4, 702--732. Google ScholarDigital Library
- Barbosa, D., Mendelzon, A. O., Libkin, L., Mignet, L., and Arenas, M. 2004. Efficient incremental validation of xml documents. In Proceedings of the 20th International Conference on Data Engineering (ICDE'04). 671--682. Google ScholarDigital Library
- Beame, P. and Huynh-Ngoc, D.-T. 2008. On the value of multiple read/write streams for approximating frequency moments. In Proceedings of the 49th IEEE Annual Symposium on Foundations of Computer Science (FOCS'08). 499--508. Google ScholarDigital Library
- Beame, P., Jayram, T., and Rudra, A. 2007. Lower bounds for randomized read/write stream algorithms. In Proceedings of the 39th Annual ACM Symposium on Theory of Computing (STOC'07). 689--698. Google ScholarDigital Library
- Demetrescu, C., Finocchi, I., and Ribichini, A. 2006. Trading off space for passes in graph streaming problems. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithm (SODA'06). 714--723. Google ScholarDigital Library
- Grohe, M., Hernich, A., and Schweikardt, N. 2006. Randomized computations on large data sets: Tight lower bounds. In Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'06). 243--252. Google ScholarDigital Library
- Grohe, M., Hernich, A., and Schweikardt, N. 2009. Lower bounds for processing data with few random accesses to external memory. J. ACM 56, 3, 1--16. Google ScholarDigital Library
- Grohe, M., Koch, C., and Schweikardt, N. 2005. The complexity of querying external memory and streaming data. In Proceedings of the 15th International Conference on Fundamentals of Computation Theory (FCT'05). 1--16. Google ScholarDigital Library
- Grohe, M., Koch, C., and Schweikardt, N. 2007. Tight lower bounds for query processing on streaming and external memory data. Theor. Comput. Sci. 380, 199--217. Google ScholarDigital Library
- Grohe, M. and Schweikardt, N. 2005. Lower bounds for sorting with few random accesses to external memory. In Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'05). 238--249. Google ScholarDigital Library
- Konrad, C. and Magniez, F. 2012. Validating xml documents in the streaming model with external memory. In Proceedings of the 15th International Conference on Database Theory (ICDT'12). 34--45. Google ScholarDigital Library
- Kushilevitz, E. and Nisan, N. 1997. Communication Complexity. Cambridge University Press. Google ScholarDigital Library
- Magniez, F., Mathieu, C., and Nayak, A. 2010. Recognizing well-parenthesized expressions in the streaming model. In Proceedings of the 42nd ACM symposium on Theory of Computing (STOC'10). 261--270. Google ScholarDigital Library
- Martens, W., Neven, F., Schwentick, T., and Bex, G. J. 2006. Expressiveness and complexity of xml schema. ACM Trans. Database Syst. 31, 3, 770--813. Google ScholarDigital Library
- Muthukrishnan, S. 2005. Data Streams: Algorithms and Applications. Now Publishers Inc.Google ScholarCross Ref
- Neven, F. 2002. Automata theory for xml researchers. SIGMOD Rec. 31, 2002. Google ScholarDigital Library
- Papakonstantinou, Y. and Vianu, V. 2000. DTD inference for views of xml data. In Proceedings of the 19th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'00). ACM Press, New York, 35--46. Google ScholarDigital Library
- Segoufin, L. and Sirangelo, C. 2007. Constant-memory validation of streaming xml documents against dtds. In Proceedings of the 11th International Conference on Database Theory (ICDT'07). 299--313. Google ScholarDigital Library
- Segoufin, L. and Vianu, V. 2002. Validating streaming xml documents. In Proceedings of the 21st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'02). 53--64. Google ScholarDigital Library
Index Terms
- Validating XML documents in the streaming model with external memory
Recommendations
Validating XML documents in the streaming model with external memory
ICDT '12: Proceedings of the 15th International Conference on Database TheoryWe study the problem of validating XML documents of size N against general DTDs in the context of streaming algorithms. The starting point of this work is a well-known space lower bound. There are XML documents and DTDs for which p-pass streaming ...
Validating streaming XML documents
PODS '02: Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsThis paper investigates the on-line validation of streaming XML documents with respect to a DTD, under memory constraints. We first consider validation using constant memory, formalized by a finite-state automaton (FSA). We examine two flavors of the ...
Constant-memory validation of streaming XML documents against DTDs
ICDT'07: Proceedings of the 11th international conference on Database TheoryIn this paper we investigate the problem of validating, with constant memory, streaming XML documents with respect to a DTD. Such constant memory validations can only be performed for some but not all DTDs. This paper gives a non trivial interesting ...
Comments