Discovering Event Queries from Traces: Laying Foundations for Subsequence-Queries with Wildcards and Gap-Size Constraints

Authors Sarah Kleest-Meißner, Rebecca Sattler, Markus L. Schmid , Nicole Schweikardt , Matthias Weidlich



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2022.18.pdf
  • Filesize: 0.93 MB
  • 21 pages

Document Identifiers

Author Details

Sarah Kleest-Meißner
  • Humboldt-Universität zu Berlin, Germany
Rebecca Sattler
  • Humboldt-Universität zu Berlin, Germany
Markus L. Schmid
  • Humboldt-Universität zu Berlin, Germany
Nicole Schweikardt
  • Humboldt-Universität zu Berlin, Germany
Matthias Weidlich
  • Humboldt-Universität zu Berlin, Germany

Cite As Get BibTex

Sarah Kleest-Meißner, Rebecca Sattler, Markus L. Schmid, Nicole Schweikardt, and Matthias Weidlich. Discovering Event Queries from Traces: Laying Foundations for Subsequence-Queries with Wildcards and Gap-Size Constraints. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 18:1-18:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/LIPIcs.ICDT.2022.18

Abstract

We introduce subsequence-queries with wildcards and gap-size constraints (swg-queries, for short) as a tool for querying event traces. An swg-query q is given by a string s over an alphabet of variables and types, a global window size w, and a tuple c = ((c^-_1, c^+_1), (c^-_2, c^+_2), …, (c^-_{|s|-1}, c^+_{|s|-1})) of local gap-size constraints over ℕ × (ℕ ∪ {∞}). The query q matches in a trace t (i. e., a sequence of types) if the variables can uniformly be substituted by types such that the resulting string occurs in t as a subsequence that spans an area of length at most w, and the i^{th} gap of the subsequence (i. e., the distance between the i^{th} and (i+1)^{th} position of the subsequence) has length at least c^-_i and at most c^+_i. We formalise and investigate the task of discovering an swg-query that describes best the traces from a given sample S of traces, and we present an algorithm solving this task. As a central component, our algorithm repeatedly solves the matching problem (i. e., deciding whether a given query q matches in a given trace t), which is an NP-complete problem (in combined complexity). Hence, the matching problem is of special interest in the context of query discovery, and we therefore subject it to a detailed (parameterised) complexity analysis to identify tractable subclasses, which lead to tractable subclasses of the discovery problem as well. We complement this by a reduction proving that any query discovery algorithm also yields an algorithm for the matching problem. Hence, lower bounds on the complexity of the matching problem directly translate into according lower bounds of the query discovery problem. As a proof of concept, we also implemented a prototype of our algorithm and tested it on real-world data.

Subject Classification

ACM Subject Classification
  • Theory of computation → Database query languages (principles)
  • Theory of computation → Data structures and algorithms for data management
  • Theory of computation → Problems, reductions and completeness
  • Information systems → Data mining
Keywords
  • event queries on traces
  • pattern queries on strings
  • learning descriptive queries
  • complexity of query evaluation and query learning

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules in large databases. In Jorge B. Bocca, Matthias Jarke, and Carlo Zaniolo, editors, VLDB'94, Proceedings of 20th International Conference on Very Large Data Bases, September 12-15, 1994, Santiago de Chile, Chile, pages 487-499. Morgan Kaufmann, 1994. URL: http://www.vldb.org/conf/1994/P487.PDF.
  2. Rakesh Agrawal and Ramakrishnan Srikant. Mining sequential patterns. In Philip S. Yu and Arbee L. P. Chen, editors, Proceedings of the Eleventh International Conference on Data Engineering, March 6-10, 1995, Taipei, Taiwan, pages 3-14. IEEE Computer Society, 1995. URL: https://doi.org/10.1109/ICDE.1995.380415.
  3. Dana Angluin. Inductive inference of formal languages from positive data. Inf. Control., 45(2):117-135, 1980. URL: https://doi.org/10.1016/S0019-9958(80)90285-5.
  4. Alexander Artikis, Chris Baber, Pedro Bizarro, Carlos Canudas-de-Wit, Opher Etzion, Fabiana Fournier, Paul Goulart, Andrew Howes, John Lygeros, Georgios Paliouras, Assaf Schuster, and Izchak Sharfman. Scalable proactive event-driven decision making. IEEE Technol. Soc. Mag., 33(3):35-41, 2014. URL: https://doi.org/10.1109/MTS.2014.2345131.
  5. Alexander Artikis, Alessandro Margara, Martín Ugarte, Stijn Vansummeren, and Matthias Weidlich. Complex event recognition languages: Tutorial. In Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems, DEBS 2017, Barcelona, Spain, June 19-23, 2017, pages 7-10. ACM, 2017. URL: https://doi.org/10.1145/3093742.3095106.
  6. Alexander Artikis, Matthias Weidlich, François Schnitzler, Ioannis Boutsis, Thomas Liebig, Nico Piatkowski, Christian Bockermann, Katharina Morik, Vana Kalogeraki, Jakub Marecek, Avigdor Gal, Shie Mannor, Dimitrios Gunopulos, and Dermot Kinane. Heterogeneous stream processing and crowdsourcing for urban traffic management. In Sihem Amer-Yahia, Vassilis Christophides, Anastasios Kementsietsidis, Minos N. Garofalakis, Stratos Idreos, and Vincent Leroy, editors, Proceedings of the 17th International Conference on Extending Database Technology, EDBT 2014, Athens, Greece, March 24-28, 2014, pages 712-723. OpenProceedings.org, 2014. URL: https://doi.org/10.5441/002/edbt.2014.77.
  7. Lars Baumgärtner, Christian Strack, Bastian Hoßbach, Marc Seidemann, Bernhard Seeger, and Bernd Freisleben. Complex event processing for reactive security monitoring in virtualized computer systems. In Frank Eliassen and Roman Vitenberg, editors, Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems, DEBS '15, Oslo, Norway, June 29 - July 3, 2015, pages 22-33. ACM, 2015. URL: https://doi.org/10.1145/2675743.2771829.
  8. Lei Chang, Tengjiao Wang, Dongqing Yang, and Hua Luan. Seqstream: Mining closed sequential patterns over stream sliding windows. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008), December 15-19, 2008, Pisa, Italy, pages 83-92. IEEE Computer Society, 2008. URL: https://doi.org/10.1109/ICDM.2008.36.
  9. Hong Cheng, Xifeng Yan, and Jiawei Han. Incspan: incremental mining of sequential patterns in large database. In Won Kim, Ron Kohavi, Johannes Gehrke, and William DuMouchel, editors, Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, Washington, USA, August 22-25, 2004, pages 527-532. ACM, 2004. URL: https://doi.org/10.1145/1014052.1014114.
  10. Gianpaolo Cugola and Alessandro Margara. Processing flows of information: From data stream to complex event processing. ACM Comput. Surv., 44(3):15:1-15:62, 2012. URL: https://doi.org/10.1145/2187671.2187677.
  11. Joel D. Day, Pamela Fleischmann, Maria Kosche, Tore Koß, Florin Manea, and Stefan Siemer. The edit distance to k-subsequence universality. In 38th International Symposium on Theoretical Aspects of Computer Science, STACS 2021, March 16-19, 2021, Saarbrücken, Germany (Virtual Conference), pages 25:1-25:19, 2021. URL: https://doi.org/10.4230/LIPIcs.STACS.2021.25.
  12. Rodney G. Downey and Michael R. Fellows. Parameterized Complexity. Monographs in Computer Science. Springer, 1999. URL: https://doi.org/10.1007/978-1-4612-0515-9.
  13. Rodney G. Downey and Michael R. Fellows. Fundamentals of Parameterized Complexity. Texts in Computer Science. Springer, 2013. URL: https://doi.org/10.1007/978-1-4471-5559-1.
  14. Henning Fernau, Florin Manea, Robert Mercas, and Markus L. Schmid. Revisiting Shinohara’s algorithm for computing descriptive patterns. Theor. Comput. Sci., 733:44-54, 2018. URL: https://doi.org/10.1016/j.tcs.2018.04.035.
  15. Henning Fernau and Markus L. Schmid. Pattern matching with variables: A multivariate complexity analysis. Inf. Comput., 242:287-305, 2015. URL: https://doi.org/10.1016/j.ic.2015.03.006.
  16. Henning Fernau, Markus L. Schmid, and Yngve Villanger. On the parameterised complexity of string morphism problems. Theory Comput. Syst., 59(1):24-51, 2016. URL: https://doi.org/10.1007/s00224-015-9635-3.
  17. Jörg Flum and Martin Grohe. Parameterized Complexity Theory. Texts in Theoretical Computer Science. An EATCS Series. Springer, 2006. URL: https://doi.org/10.1007/3-540-29953-X.
  18. Dominik D. Freydenberger and Daniel Reidenbach. Existence and nonexistence of descriptive patterns. Theor. Comput. Sci., 411(34-36):3274-3286, 2010. URL: https://doi.org/10.1016/j.tcs.2010.05.033.
  19. Dominik D. Freydenberger and Daniel Reidenbach. Inferring descriptive generalisations of formal languages. J. Comput. Syst. Sci., 79(5):622-639, 2013. URL: https://doi.org/10.1016/j.jcss.2012.10.001.
  20. Pawel Gawrychowski, Maria Kosche, Tore Koß, Florin Manea, and Stefan Siemer. Efficiently testing Simon’s congruence. In 38th International Symposium on Theoretical Aspects of Computer Science, STACS 2021, March 16-19, 2021, Saarbrücken, Germany (Virtual Conference), pages 34:1-34:18, 2021. URL: https://doi.org/10.4230/LIPIcs.STACS.2021.34.
  21. Lars George, Bruno Cadonna, and Matthias Weidlich. Il-miner: Instance-level discovery of complex event patterns. Proc. VLDB Endow., 10(1):25-36, 2016. URL: https://doi.org/10.14778/3015270.3015273.
  22. Nikos Giatrakos, Elias Alevizos, Alexander Artikis, Antonios Deligiannakis, and Minos N. Garofalakis. Complex event recognition in the big data era: a survey. VLDB J., 29(1):313-352, 2020. URL: https://doi.org/10.1007/s00778-019-00557-w.
  23. Florin Manea and Markus L. Schmid. Matching patterns with variables. In Combinatorics on Words - 12th International Conference, WORDS 2019, Loughborough, UK, September 9-13, 2019, Proceedings, pages 1-27, 2019. URL: https://doi.org/10.1007/978-3-030-28796-2_1.
  24. Alessandro Margara, Gianpaolo Cugola, and Giordano Tamburrelli. Learning from the past: automated rule generation for complex event processing. In Umesh Bellur and Ravi Kothari, editors, The 8th ACM International Conference on Distributed Event-Based Systems, DEBS '14, Mumbai, India, May 26-29, 2014, pages 47-58. ACM, 2014. URL: https://doi.org/10.1145/2611286.2611289.
  25. A. Mateescu and A. Salomaa. Patterns. In G. Rozenberg and A. Salomaa, editors, Handbook of Formal Languages, volume 1, pages 230-242. Springer, 1997. Google Scholar
  26. Charles Reiss, John Wilkes, and Joseph L Hellerstein. Google cluster-usage traces: format+ schema. Google Inc., White Paper, pages 1-14, 2011. Google Scholar
  27. T. Shinohara. Polynomial time inference of pattern languages and its application. In Proceedings of the 7th IBM Symposium on Mathematical Foundations of Computer Science, MFCS, pages 191-209, 1982. Google Scholar
  28. Takeshi Shinohara and Setsuo Arikawa. Pattern inference. In Algorithmic Learning for Knowledge-Based Systems, GOSLER Final Report, pages 259-291, 1995. URL: https://doi.org/10.1007/3-540-60217-8_13.
  29. Kia Teymourian, Malte Rohde, and Adrian Paschke. Knowledge-based processing of complex stock market events. In Elke A. Rundensteiner, Volker Markl, Ioana Manolescu, Sihem Amer-Yahia, Felix Naumann, and Ismail Ari, editors, 15th International Conference on Extending Database Technology, EDBT '12, Berlin, Germany, March 27-30, 2012, Proceedings, pages 594-597. ACM, 2012. URL: https://doi.org/10.1145/2247596.2247674.
  30. Jianyong Wang and Jiawei Han. BIDE: efficient mining of frequent closed sequences. In Z. Meral Özsoyoglu and Stanley B. Zdonik, editors, Proceedings of the 20th International Conference on Data Engineering, ICDE 2004, 30 March - 2 April 2004, Boston, MA, USA, pages 79-90. IEEE Computer Society, 2004. URL: https://doi.org/10.1109/ICDE.2004.1319986.
  31. Haopeng Zhang, Yanlei Diao, and Neil Immerman. On complexity and optimization of expensive queries in complex event processing. In Curtis E. Dyreson, Feifei Li, and M. Tamer Özsu, editors, International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014, pages 217-228. ACM, 2014. URL: https://doi.org/10.1145/2588555.2593671.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail