Streaming Enumeration on Nested Documents

Authors: Martín Muñoz, Cristian Riveros




Author Details

Martín Muñoz
  • Pontificia Universidad Católica de Chile, Santiago, Chile
  • Millennium Institute for Foundational Research on Data, Santiago, Chile
Cristian Riveros
  • Pontificia Universidad Católica de Chile, Santiago, Chile
  • Millennium Institute for Foundational Research on Data, Santiago, Chile

Cite As

Martín Muñoz and Cristian Riveros. Streaming Enumeration on Nested Documents. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 19:1-19:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.ICDT.2022.19

Abstract

Some of the most relevant document schemas used online, such as XML and JSON, have a nested format. In the last decade, the task of extracting data from nested documents over streams has become especially relevant. We focus on the streaming evaluation of queries with outputs of varied sizes over nested documents. We model queries of this kind as Visibly Pushdown Transducers (VPT), a computational model that extends visibly pushdown automata with outputs and has the same expressive power as MSO over nested documents. Since processing a document through a VPT can generate a massive number of results, we are interested in reading the input in a streaming fashion and enumerating the outputs one after another as efficiently as possible, namely, with constant delay. This paper presents an algorithm that enumerates these outputs with constant delay after processing the document stream in a single pass. Furthermore, we show that this algorithm is worst-case optimal in terms of update time per symbol and memory usage.
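
The two-phase pattern described in the abstract (a single pass over the stream, then enumeration of the outputs one by one) can be illustrated with a minimal sketch. This is not the paper's VPT algorithm: it hard-codes one hypothetical query, "all pairs (p, q) where p is the position of an open tag labeled a and q is the position of a descendant open tag labeled b", and it assumes the input is a well-nested sequence of ("open", label) and ("close", label) symbols. The names Node, process, and enumerate_pairs are invented for this illustration.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Assumed stream encoding: ("open", label) or ("close", label), well nested.
Symbol = Tuple[str, str]


class Node:
    """Immutable linked-list cell: one currently open 'a' tag plus a link
    to the next enclosing 'a' tag (or None)."""
    __slots__ = ("pos", "parent")

    def __init__(self, pos: int, parent: Optional["Node"]) -> None:
        self.pos = pos
        self.parent = parent


def process(stream: Iterable[Symbol]) -> List[Tuple[int, Node]]:
    """Single pass with O(1) work per symbol.

    Returns a compact structure: for every 'b' tag with at least one
    enclosing 'a' tag, its position and the head of the shared list of
    enclosing 'a' tags.
    """
    stack: List[Optional[Node]] = []  # list head saved at each open tag
    head: Optional[Node] = None       # innermost open 'a' tag, if any
    hits: List[Tuple[int, Node]] = []
    for pos, (kind, label) in enumerate(stream):
        if kind == "open":
            if label == "b" and head is not None:
                hits.append((pos, head))
            stack.append(head)
            if label == "a":
                head = Node(pos, head)  # old cells are shared, never copied
        else:
            # Close tag: restore the head that was active before the matching
            # open tag (labels are not re-checked; well-nestedness is assumed).
            head = stack.pop()
    return hits


def enumerate_pairs(hits: List[Tuple[int, Node]]) -> Iterator[Tuple[int, int]]:
    """Yield every (a_pos, b_pos) pair; each step does constant work
    because every stored list is non-empty."""
    for b_pos, node in hits:
        cell: Optional[Node] = node
        while cell is not None:
            yield (cell.pos, b_pos)
            cell = cell.parent


# <a><a><b></b></a><b></b></a>
doc = [("open", "a"), ("open", "a"), ("open", "b"), ("close", "b"),
       ("close", "a"), ("open", "b"), ("close", "b"), ("close", "a")]
print(list(enumerate_pairs(process(doc))))  # [(1, 2), (0, 2), (0, 5)]
```

The number of output pairs can be quadratic in the document length, yet the structure built during the pass stays linear because all lists of enclosing a tags share their tails. That is the kind of gap between output size and processing cost that motivates enumeration with constant delay.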

Subject Classification

ACM Subject Classification
  • Theory of computation → Database theory
Keywords
  • Streaming
  • nested documents
  • query evaluation
  • enumeration algorithms

