Automata-Based Stream Processing

Authors Rajeev Alur, Konstantinos Mamouras, Caleb Stanford

Thumbnail PDF


  • Filesize: 0.59 MB
  • 15 pages

Document Identifiers

Author Details

Rajeev Alur
Konstantinos Mamouras
Caleb Stanford

Cite AsGet BibTex

Rajeev Alur, Konstantinos Mamouras, and Caleb Stanford. Automata-Based Stream Processing. In 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 80, pp. 112:1-112:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


We propose an automata-theoretic framework for modularly expressing computations on streams of data. With weighted automata as a starting point, we identify three key features that are useful for an automaton model for stream processing: expressing the regular decomposition of streams whose data items are elements of a complex type (e.g., tuple of values), allowing the hierarchical nesting of several different kinds of aggregations, and specifying modularly the parallel execution and combination of various subcomputations. The combination of these features leads to subtle efficiency considerations that concern the interaction between nondeterminism, hierarchical nesting, and parallelism. We identify a syntactic restriction where the nondeterminism is unambiguous and parallel subcomputations synchronize their outputs. For automata satisfying these restrictions, we show that there is a space- and time-efficient streaming evaluation algorithm. We also prove that when these restrictions are relaxed, the evaluation problem becomes inherently computationally expensive.
  • weighted automata
  • Quantitative Regular Expressions
  • stream processing


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Alfred V. Aho. Algorithms for finding patterns in strings. In Jan van Leeuwen, editor, Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity, chapter 5, pages 255-300. MIT Press/Elsevier, 1990. Google Scholar
  2. Shaull Almagor, Udi Boker, and Orna Kupferman. What’s decidable about weighted automata? In Tevfik Bultan and Pao-Ann Hsiung, editors, Proceedings of the 9th International Symposium on Automated Technology for Verification and Analysis (ATVA '11), pages 482-491. Springer Berlin Heidelberg, 2011. Google Scholar
  3. Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58(1):137-147, 1999. Google Scholar
  4. Rajeev Alur, Loris D'Antoni, Jyotirmoy Deshmukh, Mukund Raghothaman, and Yifei Yuan. Regular functions and cost register automata. In Proceedings of the 28th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS '13), pages 13-22, 2013. Google Scholar
  5. Rajeev Alur, Dana Fisman, and Mukund Raghothaman. Regular programming for quantitative properties of data streams. In Proceedings of the 25th European Symposium on Programming (ESOP '16), pages 15-40, 2016. Google Scholar
  6. Arvind Arasu and Gurmeet Singh Manku. Approximate counts and quantiles over sliding windows. In Proceedings of the 23rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS '04), pages 286-296, 2004. Google Scholar
  7. Gerard Berry and Ravi Sethi. From regular expressions to deterministic automata. Theoretical Computer Science, 48:117-126, 1986. Google Scholar
  8. Mikołaj Bojańczyk, Claire David, Anca Muscholl, Thomas Schwentick, and Luc Segoufin. Two-variable logic on data words. ACM Transactions on Computational Logic (TOCL), 12(4):27:1-27:26, 2011. Google Scholar
  9. Ronald Book, Shimon Even, Sheila Greibach, and Gene Ott. Ambiguity in graphs and expressions. IEEE Transactions on Computers, C-20(2):149-153, 1971. Google Scholar
  10. Krishnendu Chatterjee, Laurent Doyen, and Thomas A. Henzinger. Quantitative languages. ACM Transactions on Computational Logic (TOCL), 11(4):23, 2010. Google Scholar
  11. Krishnendu Chatterjee, Thomas A. Henzinger, and Jan Otop. Nested weighted automata. In Proceedings of the 30th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS '15), pages 725-737, 2015. Google Scholar
  12. Krishnendu Chatterjee, Thomas A. Henzinger, and Jan Otop. Quantitative monitor automata. In Xavier Rival, editor, Proceedings of the 23rd International Symposium on Static Analysis (SAS '16), pages 23-38. Springer Berlin Heidelberg, 2016. Google Scholar
  13. S. Chintapalli, D. Dagit, B. Evans, R. Farivar, T. Graves, M. Holderbaugh, Z. Liu, K. Nusbaum, K. Patil, B. J. Peng, and P. Poulosky. Benchmarking streaming computation engines: Storm, Flink and Spark Streaming. In 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 1789-1792, 2016. Google Scholar
  14. Gianpaolo Cugola and Alessandro Margara. Processing flows of information: From data stream to complex event processing. ACM Computing Surveys (CSUR), 44(3):15:1-15:62, 2012. Google Scholar
  15. Loris D'Antoni and Margus Veanes. Equivalence of extended symbolic finite transducers. In Proceedings of the 25th International Conference on Computer Aided Verification (CAV '13), pages 624-639, 2013. Google Scholar
  16. Loris D'Antoni and Margus Veanes. Minimization of symbolic automata. In Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '14), pages 541-553, 2014. Google Scholar
  17. Mayur Datar, Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Maintaining stream statistics over sliding windows. SIAM Journal on Computing, 31(6):1794-1813, 2002. Google Scholar
  18. Stéphane Demri and Ranko Lazić. LTL with the freeze quantifier and register automata. ACM Transactions on Computational Logic (TOCL), 10(3):16:1-16:30, 2009. Google Scholar
  19. Manfred Droste, Werner Kuich, and Heiko Vogler, editors. Handbook of Weighted Automata. Springer, 2009. Google Scholar
  20. Philippe Flajolet and G. Nigel Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31(2):182-209, 1985. Google Scholar
  21. Martin Fürer. The complexity of the inequivalence problem for regular expressions with intersection. In Proceedings of the 7th International Colloquium on Automata, Languages and Programming (ICALP '80), pages 234-245, 1980. Google Scholar
  22. Namit Jain, Shailendra Mishra, Anand Srinivasan, Johannes Gehrke, Jennifer Widom, Hari Balakrishnan, Uǧur Çetintemel, Mitch Cherniack, Richard Tibbetts, and Stan Zdonik. Towards a streaming SQL standard. Proceedings of the VLDB Endowment, 1(2):1379-1390, 2008. Google Scholar
  23. Michael Kaminski and Nissim Francez. Finite-memory automata. Theoretical Computer Science, 134(2):329-363, 1994. Google Scholar
  24. Daniel Krob. The equality problem for rational series with multiplicities in the tropical semiring is undecidable. International Journal of Algebra and Computation, 4(3):405-425, 1994. Google Scholar
  25. Konstantinos Mamouras, Mukund Raghothaman, Rajeev Alur, Zachary G. Ives, and Sanjeev Khanna. Streamqre: modular specification and efficient evaluation of quantitative queries over streaming data. In Albert Cohen and Martin T. Vechev, editors, Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017, pages 693-708. ACM, 2017. URL:
  26. Mehryar Mohri. Finite-state transducers in language and speech processing. Computational Linguistics, 23(2):269-311, 1997. Google Scholar
  27. J. Ian Munro and Michael S. Paterson. Selection and sorting with limited storage. Theoretical Computer Science, 12(3):315-323, 1980. Google Scholar
  28. Shanmugavelayutham Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trendsregistered in Theoretical Computer Science, 1(2):117-236, 2005. Google Scholar
  29. Frank Neven, Thomas Schwentick, and Victor Vianu. Finite state machines for strings over infinite alphabets. ACM Transactions on Computational Logic (TOCL), 5(3):403-435, 2004. Google Scholar
  30. Richard Edwin Stearns and Harry B. Hunt III. On the equivalence and containment problems for unambiguous regular expressions, regular grammars and finite automata. SIAM Journal on Computing, 14(3):598-611, 1985. Google Scholar
  31. Ken Thompson. Regular expression search algorithm. Communications of the ACM, 11(6):419-422, 1968. Google Scholar
  32. Pete Tucker, Kristin Tufte, Vassilis Papadimos, and David Maier. NEXMark: A benchmark for queries over data streams. Available at, 2002.
  33. Margus Veanes, Peli de Halleux, and Nikolai Tillmann. Rex: Symbolic regular expression explorer. In Proceedings of the 3rd International Conference on Software Testing, Verification and Validation (ICST '10), pages 498-507. IEEE, 2010. Google Scholar
  34. Margus Veanes, Pieter Hooimeijer, Benjamin Livshits, David Molnar, and Nikolaj Bjorner. Symbolic finite state transducers: Algorithms and applications. In Proceedings of the 39th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '12), pages 137-150, 2012. Google Scholar