Subsequences with Gap Constraints: Complexity Bounds for Matching and Analysis Problems

Day, Joel D.; Kosche, Maria; Manea, Florin; Schmid, Markus L.

doi:10.4230/LIPIcs.ISAAC.2022.64

Abstract

We consider subsequences with gap constraints, i. e., length-k subsequences p that can be embedded into a string w such that the induced gaps (i. e., the factors of w between the positions to which p is mapped to) satisfy given gap constraints gc = (C_1, C_2, …, C_{k-1}); we call p a gc-subsequence of w. In the case where the gap constraints gc are defined by lower and upper length bounds C_i = (L^-_i, L^+_i) ∈ ℕ² and/or regular languages C_i ∈ REG, we prove tight (conditional on the orthogonal vectors (OV) hypothesis) complexity bounds for checking whether a given p is a gc-subsequence of a string w. We also consider the whole set of all gc-subsequences of a string, and investigate the complexity of the universality, equivalence and containment problems for these sets of gc-subsequences.

Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 59-78, 2015. URL: https://doi.org/10.1109/FOCS.2015.14.
Amir Abboud, Virginia Vassilevska Williams, and Oren Weimann. Consequences of faster alignment of sequences. In Automata, Languages, and Programming - 41st International Colloquium, ICALP 2014, Copenhagen, Denmark, July 8-11, 2014, Proceedings, Part I, pages 39-51, 2014. URL: https://doi.org/10.1007/978-3-662-43948-7_4.
Roberto Amadini. A survey on string constraint solving. ACM Computing Surveys (CSUR), 55(1):1-38, 2021.
Dana Angluin. Finding patterns common to a set of strings. J. Comput. Syst. Sci., 21(1):46-62, 1980.
Alexander Artikis, Alessandro Margara, Martín Ugarte, Stijn Vansummeren, and Matthias Weidlich. Complex event recognition languages: Tutorial. In Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems, DEBS 2017, Barcelona, Spain, June 19-23, 2017, pages 7-10, 2017. URL: https://doi.org/10.1145/3093742.3095106.
Arturs Backurs and Piotr Indyk. Which regular expression patterns are hard to match? In IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, 9-11 October 2016, Hyatt Regency, New Brunswick, New Jersey, USA, pages 457-466, 2016. URL: https://doi.org/10.1109/FOCS.2016.56.
Johannes Bader, Simon Gog, and Matthias Petri. Practical variable length gap pattern matching. In Experimental Algorithms - 15th International Symposium, SEA 2016, St. Petersburg, Russia, June 5-8, 2016, Proceedings, pages 1-16, 2016. URL: https://doi.org/10.1007/978-3-319-38851-9_1.
Ricardo A. Baeza-Yates. Searching subsequences. Theor. Comput. Sci., 78(2):363-376, 1991.
Gabriel Bathie and Tatiana Starikovskaya. Property testing of regular languages with applications to streaming property testing of visibly pushdown languages. In ICALP, volume 198 of LIPIcs, pages 119:1-119:17. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.
Philip Bille, Inge Li Gørtz, Hjalte Wedel Vildhøj, and David Kofoed Wind. String matching with variable length gaps. Theor. Comput. Sci., 443:25-34, 2012. URL: https://doi.org/10.1016/j.tcs.2012.03.029.
Francine Blanchet-Sadri. Algorithmic Combinatorics on Partial Words. Discrete mathematics and its applications. CRC Press, 2008.
Karl Bringmann. Why walking the dog takes time: Frechet distance has no strongly subquadratic algorithms unless SETH fails. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18-21, 2014, pages 661-670, 2014. URL: https://doi.org/10.1109/FOCS.2014.76.
Karl Bringmann. Fine-grained complexity theory (tutorial). In 36th International Symposium on Theoretical Aspects of Computer Science, STACS 2019, March 13-16, 2019, Berlin, Germany, pages 4:1-4:7, 2019. URL: https://doi.org/10.4230/LIPIcs.STACS.2019.4.
Karl Bringmann and Bhaskar Ray Chaudhury. Sketching, streaming, and fine-grained complexity of (weighted) LCS. In Proc. FSTTCS 2018, volume 122 of LIPIcs, pages 40:1-40:16, 2018.
Karl Bringmann and Marvin Künnemann. Multivariate fine-grained complexity of longest common subsequence. In Proc. SODA 2018, pages 1216-1235, 2018.
Sam Buss and Michael Soltys. Unshuffling a square is NP-hard. J. Comput. Syst. Sci., 80(4):766-776, 2014. URL: https://doi.org/10.1016/j.jcss.2013.11.002.
Manuel Cáceres, Simon J. Puglisi, and Bella Zhukova. Fast indexes for gapped pattern matching. In SOFSEM 2020: Theory and Practice of Computer Science - 46th International Conference on Current Trends in Theory and Practice of Informatics, SOFSEM 2020, Limassol, Cyprus, January 20-24, 2020, Proceedings, pages 493-504, 2020. URL: https://doi.org/10.1007/978-3-030-38919-2_40.
Peter Clifford and Raphaël Clifford. Simple deterministic wildcard matching. Inf. Process. Lett., 101(2):53-54, 2007.
Maxime Crochemore, Borivoj Melichar, and Zdenek Tronícek. Directed acyclic subsequence graph - Overview. J. Discrete Algorithms, 1(3-4):255-280, 2003.
Joel D. Day, Maria Kosche, Florin Manea, and Markus L. Schmid. Subsequences with gap constraints: Complexity bounds for matching and analysis problems. CoRR, 2022. URL: https://doi.org/10.48550/ARXIV.2206.13896.
Patrick Dinklage, Johannes Fischer, Alexander Herlez, Tomasz Kociumaka, and Florian Kurpicz. Practical Performance of Space Efficient Data Structures for Longest Common Extensions. In 28th Annual European Symposium on Algorithms (ESA 2020), volume 173 of Leibniz International Proceedings in Informatics (LIPIcs), pages 39:1-39:20, 2020.
Bartlomiej Dudek, Pawel Gawrychowski, Garance Gourdel, and Tatiana Starikovskaya. Streaming regular expression membership and pattern matching. In SODA, pages 670-694. SIAM, 2022.
Henning Fernau, Florin Manea, Robert Mercas, and Markus L. Schmid. Pattern matching with variables: Efficient algorithms and complexity results. ACM Trans. Comput. Theory, 12(1):6:1-6:37, 2020.
Lukas Fleischer and Manfred Kufleitner. Testing Simon’s congruence. In Proc. MFCS 2018, volume 117 of LIPIcs, pages 62:1-62:13, 2018.
Dominik D. Freydenberger, Pawel Gawrychowski, Juhani Karhumäki, Florin Manea, and Wojciech Rytter. Testing k-binomial equivalence. In Multidisciplinary Creativity, a collection of papers dedicated to G. Păun 65th birthday, pages 239-248, 2015. available in CoRR abs/1509.00622.
Dominik D Freydenberger, Pawel Gawrychowski, Juhani Karhumäki, Florin Manea, and Wojciech Rytter. Testing k-binomial equivalence. arXiv preprint arXiv:1509.00622, 2015.
Moses Ganardi, Danny Hucke, Daniel König, Markus Lohrey, and Konstantinos Mamouras. Automata theory on sliding windows. In STACS, volume 96 of LIPIcs, pages 31:1-31:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018.
Moses Ganardi, Danny Hucke, and Markus Lohrey. Querying regular languages over sliding windows. In FSTTCS, volume 65 of LIPIcs, pages 18:1-18:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016.
Moses Ganardi, Danny Hucke, and Markus Lohrey. Randomized sliding window algorithms for regular languages. In ICALP, volume 107 of LIPIcs, pages 127:1-127:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018.
Moses Ganardi, Danny Hucke, and Markus Lohrey. Sliding window algorithms for regular languages. In LATA, volume 10792 of Lecture Notes in Computer Science, pages 26-35. Springer, 2018.
Moses Ganardi, Danny Hucke, Markus Lohrey, and Tatiana Starikovskaya. Sliding window property testing for regular languages. In ISAAC, volume 149 of LIPIcs, pages 6:1-6:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
Emmanuelle Garel. Minimal separators of two words. In Proc. CPM 1993, volume 684 of Lecture Notes in Computer Science, pages 35-53, 1993.
Pawel Gawrychowski, Maria Kosche, Tore Koß, Florin Manea, and Stefan Siemer. Efficiently testing Simon’s congruence. In 38th International Symposium on Theoretical Aspects of Computer Science, STACS 2021, March 16-19, 2021, Saarbrücken, Germany (Virtual Conference), pages 34:1-34:18, 2021. URL: https://doi.org/10.4230/LIPIcs.STACS.2021.34.
Nikos Giatrakos, Elias Alevizos, Alexander Artikis, Antonios Deligiannakis, and Minos N. Garofalakis. Complex event recognition in the big data era: a survey. VLDB J., 29(1):313-352, 2020. URL: https://doi.org/10.1007/s00778-019-00557-w.
Simon Halfon, Philippe Schnoebelen, and Georg Zetzsche. Decidability, complexity, and expressiveness of first-order logic over the subword ordering. In Proc. LICS 2017, pages 1-12, 2017.
Jean-Jacques Hebrard. An algorithm for distinguishing efficiently bit-strings by their subsequences. Theor. Comput. Sci., 82(1):35-49, 22 May 1991.
John E. Hopcroft and Jeffrey D. Ullman. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, 1979.
Costas S. Iliopoulos, Marcin Kubica, M. Sohel Rahman, and Tomasz Walen. Algorithms for computing the longest parameterized common subsequence. In Combinatorial Pattern Matching, 18th Annual Symposium, CPM 2007, London, Canada, July 9-11, 2007, Proceedings, pages 265-273, 2007. URL: https://doi.org/10.1007/978-3-540-73437-6_27.
Russell Impagliazzo and Ramamohan Paturi. On the complexity of k-SAT. J. Comput. Syst. Sci., 62(2):367-375, 2001. URL: https://doi.org/10.1006/jcss.2000.1727.
Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems have strongly exponential complexity? J. Comput. Syst. Sci., 63(4):512-530, 2001. URL: https://doi.org/10.1006/jcss.2001.1774.
Prateek Karandikar, Manfred Kufleitner, and Philippe Schnoebelen. On the index of Simon’s congruence for piecewise testability. Inf. Process. Lett., 115(4):515-519, 2015.
Prateek Karandikar and Philippe Schnoebelen. The height of piecewise-testable languages with applications in logical complexity. In Proc. CSL 2016, volume 62 of LIPIcs, pages 37:1-37:22, 2016.
Prateek Karandikar and Philippe Schnoebelen. The height of piecewise-testable languages and the complexity of the logic of subwords. Log. Methods Comput. Sci., 15(2), 2019.
Sarah Kleest-Meißner, Rebecca Sattler, Markus L. Schmid, Nicole Schweikardt, and Matthias Weidlich. Discovering event queries from traces: Laying foundations for subsequence-queries with wildcards and gap-size constraints. In 25th International Conference on Database Theory, ICDT 2022, volume 220 of LIPIcs, pages 18:1-18:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022. URL: https://doi.org/10.4230/LIPIcs.ICDT.2022.18.
Dietrich Kuske. The subtrace order and counting first-order logic. In Proc. CSR 2020, volume 12159 of Lecture Notes in Computer Science, pages 289-302, 2020.
Dietrich Kuske and Georg Zetzsche. Languages ordered by the subword order. In Proc. FOSSACS 2019, volume 11425 of Lecture Notes in Computer Science, pages 348-364, 2019.
Marie Lejeune, Julien Leroy, and Michel Rigo. Computing the k-binomial complexity of the Thue-Morse word. In Proc. DLT 2019, volume 11647 of Lecture Notes in Computer Science, pages 278-291, 2019.
Julien Leroy, Michel Rigo, and Manon Stipulanti. Generalized Pascal triangle for binomial coefficients of words. Electron. J. Combin., 24(1.44):36 pp., 2017.
Chun Li and Jianyong Wang. Efficiently mining closed subsequences with gap constraints. In SDM, pages 313-322. SIAM, 2008.
Chun Li, Qingyan Yang, Jianyong Wang, and Ming Li. Efficient mining of gap-constrained subsequences and its various applications. ACM Trans. Knowl. Discov. Data, 6(1):2:1-2:39, 2012.
Daniel Lokshtanov, Dániel Marx, and Saket Saurabh. Lower bounds based on the exponential time hypothesis. Bull. EATCS, 105:41-72, 2011. URL: http://eatcs.org/beatcs/index.php/beatcs/article/view/92.
David Maier. The complexity of some problems on subsequences and supersequences. J. ACM, 25(2):322-336, April 1978.
Alexandru Mateescu, Arto Salomaa, and Sheng Yu. Subword histories and Parikh matrices. J. Comput. Syst. Sci., 68(1):1-21, 2004.
William E. Riddle. An approach to software system modelling and analysis. Comput. Lang., 4(1):49-66, 1979. URL: https://doi.org/10.1016/0096-0551(79)90009-2.
Michel Rigo and Pavel Salimov. Another generalization of abelian equivalence: Binomial complexity of infinite words. Theor. Comput. Sci., 601:47-57, 2015.
Arto Salomaa. Connections between subwords and certain matrix mappings. Theoret. Comput. Sci., 340(2):188-203, 2005.
Shinnosuke Seki. Absoluteness of subword inequality is undecidable. Theor. Comput. Sci., 418:116-120, 2012. URL: https://doi.org/10.1016/j.tcs.2011.10.017.
Alan C. Shaw. Software descriptions with flow expressions. IEEE Trans. Software Eng., 4(3):242-254, 1978. URL: https://doi.org/10.1109/TSE.1978.231501.
Imre Simon. Hierarchies of events with dot-depth one - Ph.D. thesis. University of Waterloo, 1972.
Imre Simon. Piecewise testable events. In Autom. Theor. Form. Lang., 2nd GI Conf., volume 33 of LNCS, pages 214-222, 1975.
Imre Simon. Words distinguished by their subwords (extended abstract). In Proc. WORDS 2003, volume 27 of TUCS General Publication, pages 6-13, 2003.
Ken Thompson. Regular expression search algorithm. Commun. ACM, 11(6):419-422, 1968. URL: https://doi.org/10.1145/363347.363387.
Zdenek Tronícek. Common subsequence automaton. In Proc. CIAA 2002 (Revised Papers), volume 2608 of Lecture Notes in Computer Science, pages 270-275, 2002.
Wen-Guey Tzeng. A polynomial-time algorithm for the equivalence of probabilistic automata. SIAM J. Comput., 21(2):216-227, 1992. URL: https://doi.org/10.1137/0221017.
Virginia Vassilevska Williams. Hardness of easy problems: Basing hardness on popular conjectures such as the strong exponential time hypothesis (invited talk). In 10th International Symposium on Parameterized and Exact Computation, IPEC 2015, September 16-18, 2015, Patras, Greece, pages 17-29, 2015. URL: https://doi.org/10.4230/LIPIcs.IPEC.2015.17.
Georg Zetzsche. The complexity of downward closure comparisons. In Proc. ICALP 2016, volume 55 of LIPIcs, pages 123:1-123:14, 2016.
Haopeng Zhang, Yanlei Diao, and Neil Immerman. On complexity and optimization of expensive queries in complex event processing. In International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014, pages 217-228, 2014. URL: https://doi.org/10.1145/2588555.2593671.

Subsequences with Gap Constraints: Complexity Bounds for Matching and Analysis Problems

Authors Joel D. Day , Maria Kosche , Florin Manea , Markus L. Schmid

File

Document Identifiers

Author Details

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message

Subsequences with Gap Constraints: Complexity Bounds for Matching and Analysis Problems

Authors Joel D. Day , Maria Kosche , Florin Manea , Markus L. Schmid

File

Document Identifiers

Author Details

Funding

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

Related Versions

References

Thanks for your feedback!

Could not send message