skip to main content
research-article

Validating XML documents in the streaming model with external memory

Authors Info & Claims
Published:04 December 2013Publication History
Skip Abstract Section

Abstract

We study the problem of validating XML documents of size N against general DTDs in the context of streaming algorithms. The starting point of this work is a well-known space lower bound. There are XML documents and DTDs for which p-pass streaming algorithms require Ω(N/p) space. We show that when allowing access to external memory, there is a deterministic streaming algorithm that solves this problem with memory space O(log2 N), a constant number of auxiliary read/write streams, and O(log N) total number of passes on the XML document and auxiliary streams. An important intermediate step of this algorithm is the computation of the First-Child-Next-Sibling (FCNS) encoding of the initial XML document in a streaming fashion. We study this problem independently, and we also provide memory-efficient streaming algorithms for decoding an XML document given in its FCNS encoding. Furthermore, validating XML documents encoding binary trees against any DTD in the usual streaming model without external memory can be done with sublinear memory. There is a one-pass algorithm using O(√N log N) space, and a bidirectional two-pass algorithm using O(log2 N) space which perform this task.

References

  1. Alon, N., Matias, Y., and Szegedy, M. 1999. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci. 58, 1, 137--147. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Balmin, A., Papakonstantinou, Y., and Vianu, V. 2004. Incremental validation of xml documents. ACM Trans. Database Syst. 29, 4, 710--751. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Bar-Yossef, Z., Jayram, T. S., Kumar, R., and Sivakumar, D. 2004. An information statistics approach to data stream and communication complexity. J. Comput. Syst. Sci. 68, 4, 702--732. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Barbosa, D., Mendelzon, A. O., Libkin, L., Mignet, L., and Arenas, M. 2004. Efficient incremental validation of xml documents. In Proceedings of the 20th International Conference on Data Engineering (ICDE'04). 671--682. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Beame, P. and Huynh-Ngoc, D.-T. 2008. On the value of multiple read/write streams for approximating frequency moments. In Proceedings of the 49th IEEE Annual Symposium on Foundations of Computer Science (FOCS'08). 499--508. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Beame, P., Jayram, T., and Rudra, A. 2007. Lower bounds for randomized read/write stream algorithms. In Proceedings of the 39th Annual ACM Symposium on Theory of Computing (STOC'07). 689--698. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Demetrescu, C., Finocchi, I., and Ribichini, A. 2006. Trading off space for passes in graph streaming problems. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithm (SODA'06). 714--723. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Grohe, M., Hernich, A., and Schweikardt, N. 2006. Randomized computations on large data sets: Tight lower bounds. In Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'06). 243--252. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Grohe, M., Hernich, A., and Schweikardt, N. 2009. Lower bounds for processing data with few random accesses to external memory. J. ACM 56, 3, 1--16. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Grohe, M., Koch, C., and Schweikardt, N. 2005. The complexity of querying external memory and streaming data. In Proceedings of the 15th International Conference on Fundamentals of Computation Theory (FCT'05). 1--16. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Grohe, M., Koch, C., and Schweikardt, N. 2007. Tight lower bounds for query processing on streaming and external memory data. Theor. Comput. Sci. 380, 199--217. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Grohe, M. and Schweikardt, N. 2005. Lower bounds for sorting with few random accesses to external memory. In Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'05). 238--249. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Konrad, C. and Magniez, F. 2012. Validating xml documents in the streaming model with external memory. In Proceedings of the 15th International Conference on Database Theory (ICDT'12). 34--45. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Kushilevitz, E. and Nisan, N. 1997. Communication Complexity. Cambridge University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Magniez, F., Mathieu, C., and Nayak, A. 2010. Recognizing well-parenthesized expressions in the streaming model. In Proceedings of the 42nd ACM symposium on Theory of Computing (STOC'10). 261--270. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Martens, W., Neven, F., Schwentick, T., and Bex, G. J. 2006. Expressiveness and complexity of xml schema. ACM Trans. Database Syst. 31, 3, 770--813. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Muthukrishnan, S. 2005. Data Streams: Algorithms and Applications. Now Publishers Inc.Google ScholarGoogle ScholarCross RefCross Ref
  18. Neven, F. 2002. Automata theory for xml researchers. SIGMOD Rec. 31, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Papakonstantinou, Y. and Vianu, V. 2000. DTD inference for views of xml data. In Proceedings of the 19th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'00). ACM Press, New York, 35--46. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Segoufin, L. and Sirangelo, C. 2007. Constant-memory validation of streaming xml documents against dtds. In Proceedings of the 11th International Conference on Database Theory (ICDT'07). 299--313. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Segoufin, L. and Vianu, V. 2002. Validating streaming xml documents. In Proceedings of the 21st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'02). 53--64. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Validating XML documents in the streaming model with external memory

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Database Systems
      ACM Transactions on Database Systems  Volume 38, Issue 4
      Invited papers issue
      November 2013
      294 pages
      ISSN:0362-5915
      EISSN:1557-4644
      DOI:10.1145/2539032
      Issue’s Table of Contents

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 4 December 2013
      • Accepted: 1 July 2013
      • Revised: 1 May 2013
      • Received: 1 January 2013
      Published in tods Volume 38, Issue 4

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader