Subsequences with Gap Constraints: Complexity Bounds for Matching and Analysis Problems

Authors Joel D. Day , Maria Kosche , Florin Manea , Markus L. Schmid



PDF
Thumbnail PDF

File

LIPIcs.ISAAC.2022.64.pdf
  • Filesize: 1.44 MB
  • 18 pages

Document Identifiers

Author Details

Joel D. Day
  • Loughborough University, UK
Maria Kosche
  • Computer Science Department, Universität Göttingen, Germany
Florin Manea
  • Computer Science Department and CIDAS, Universität Göttingen, Germany
Markus L. Schmid
  • Humboldt-Universität zu Berlin, Germany

Cite AsGet BibTex

Joel D. Day, Maria Kosche, Florin Manea, and Markus L. Schmid. Subsequences with Gap Constraints: Complexity Bounds for Matching and Analysis Problems. In 33rd International Symposium on Algorithms and Computation (ISAAC 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 248, pp. 64:1-64:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.ISAAC.2022.64

Abstract

We consider subsequences with gap constraints, i. e., length-k subsequences p that can be embedded into a string w such that the induced gaps (i. e., the factors of w between the positions to which p is mapped to) satisfy given gap constraints gc = (C_1, C_2, …, C_{k-1}); we call p a gc-subsequence of w. In the case where the gap constraints gc are defined by lower and upper length bounds C_i = (L^-_i, L^+_i) ∈ ℕ² and/or regular languages C_i ∈ REG, we prove tight (conditional on the orthogonal vectors (OV) hypothesis) complexity bounds for checking whether a given p is a gc-subsequence of a string w. We also consider the whole set of all gc-subsequences of a string, and investigate the complexity of the universality, equivalence and containment problems for these sets of gc-subsequences.

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • String algorithms
  • subsequences with gap constraints
  • pattern matching
  • fine-grained complexity
  • conditional lower bounds
  • parameterised complexity

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 59-78, 2015. URL: https://doi.org/10.1109/FOCS.2015.14.
  2. Amir Abboud, Virginia Vassilevska Williams, and Oren Weimann. Consequences of faster alignment of sequences. In Automata, Languages, and Programming - 41st International Colloquium, ICALP 2014, Copenhagen, Denmark, July 8-11, 2014, Proceedings, Part I, pages 39-51, 2014. URL: https://doi.org/10.1007/978-3-662-43948-7_4.
  3. Roberto Amadini. A survey on string constraint solving. ACM Computing Surveys (CSUR), 55(1):1-38, 2021. Google Scholar
  4. Dana Angluin. Finding patterns common to a set of strings. J. Comput. Syst. Sci., 21(1):46-62, 1980. Google Scholar
  5. Alexander Artikis, Alessandro Margara, Martín Ugarte, Stijn Vansummeren, and Matthias Weidlich. Complex event recognition languages: Tutorial. In Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems, DEBS 2017, Barcelona, Spain, June 19-23, 2017, pages 7-10, 2017. URL: https://doi.org/10.1145/3093742.3095106.
  6. Arturs Backurs and Piotr Indyk. Which regular expression patterns are hard to match? In IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, 9-11 October 2016, Hyatt Regency, New Brunswick, New Jersey, USA, pages 457-466, 2016. URL: https://doi.org/10.1109/FOCS.2016.56.
  7. Johannes Bader, Simon Gog, and Matthias Petri. Practical variable length gap pattern matching. In Experimental Algorithms - 15th International Symposium, SEA 2016, St. Petersburg, Russia, June 5-8, 2016, Proceedings, pages 1-16, 2016. URL: https://doi.org/10.1007/978-3-319-38851-9_1.
  8. Ricardo A. Baeza-Yates. Searching subsequences. Theor. Comput. Sci., 78(2):363-376, 1991. Google Scholar
  9. Gabriel Bathie and Tatiana Starikovskaya. Property testing of regular languages with applications to streaming property testing of visibly pushdown languages. In ICALP, volume 198 of LIPIcs, pages 119:1-119:17. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. Google Scholar
  10. Philip Bille, Inge Li Gørtz, Hjalte Wedel Vildhøj, and David Kofoed Wind. String matching with variable length gaps. Theor. Comput. Sci., 443:25-34, 2012. URL: https://doi.org/10.1016/j.tcs.2012.03.029.
  11. Francine Blanchet-Sadri. Algorithmic Combinatorics on Partial Words. Discrete mathematics and its applications. CRC Press, 2008. Google Scholar
  12. Karl Bringmann. Why walking the dog takes time: Frechet distance has no strongly subquadratic algorithms unless SETH fails. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18-21, 2014, pages 661-670, 2014. URL: https://doi.org/10.1109/FOCS.2014.76.
  13. Karl Bringmann. Fine-grained complexity theory (tutorial). In 36th International Symposium on Theoretical Aspects of Computer Science, STACS 2019, March 13-16, 2019, Berlin, Germany, pages 4:1-4:7, 2019. URL: https://doi.org/10.4230/LIPIcs.STACS.2019.4.
  14. Karl Bringmann and Bhaskar Ray Chaudhury. Sketching, streaming, and fine-grained complexity of (weighted) LCS. In Proc. FSTTCS 2018, volume 122 of LIPIcs, pages 40:1-40:16, 2018. Google Scholar
  15. Karl Bringmann and Marvin Künnemann. Multivariate fine-grained complexity of longest common subsequence. In Proc. SODA 2018, pages 1216-1235, 2018. Google Scholar
  16. Sam Buss and Michael Soltys. Unshuffling a square is NP-hard. J. Comput. Syst. Sci., 80(4):766-776, 2014. URL: https://doi.org/10.1016/j.jcss.2013.11.002.
  17. Manuel Cáceres, Simon J. Puglisi, and Bella Zhukova. Fast indexes for gapped pattern matching. In SOFSEM 2020: Theory and Practice of Computer Science - 46th International Conference on Current Trends in Theory and Practice of Informatics, SOFSEM 2020, Limassol, Cyprus, January 20-24, 2020, Proceedings, pages 493-504, 2020. URL: https://doi.org/10.1007/978-3-030-38919-2_40.
  18. Peter Clifford and Raphaël Clifford. Simple deterministic wildcard matching. Inf. Process. Lett., 101(2):53-54, 2007. Google Scholar
  19. Maxime Crochemore, Borivoj Melichar, and Zdenek Tronícek. Directed acyclic subsequence graph - Overview. J. Discrete Algorithms, 1(3-4):255-280, 2003. Google Scholar
  20. Joel D. Day, Maria Kosche, Florin Manea, and Markus L. Schmid. Subsequences with gap constraints: Complexity bounds for matching and analysis problems. CoRR, 2022. URL: https://doi.org/10.48550/ARXIV.2206.13896.
  21. Patrick Dinklage, Johannes Fischer, Alexander Herlez, Tomasz Kociumaka, and Florian Kurpicz. Practical Performance of Space Efficient Data Structures for Longest Common Extensions. In 28th Annual European Symposium on Algorithms (ESA 2020), volume 173 of Leibniz International Proceedings in Informatics (LIPIcs), pages 39:1-39:20, 2020. Google Scholar
  22. Bartlomiej Dudek, Pawel Gawrychowski, Garance Gourdel, and Tatiana Starikovskaya. Streaming regular expression membership and pattern matching. In SODA, pages 670-694. SIAM, 2022. Google Scholar
  23. Henning Fernau, Florin Manea, Robert Mercas, and Markus L. Schmid. Pattern matching with variables: Efficient algorithms and complexity results. ACM Trans. Comput. Theory, 12(1):6:1-6:37, 2020. Google Scholar
  24. Lukas Fleischer and Manfred Kufleitner. Testing Simon’s congruence. In Proc. MFCS 2018, volume 117 of LIPIcs, pages 62:1-62:13, 2018. Google Scholar
  25. Dominik D. Freydenberger, Pawel Gawrychowski, Juhani Karhumäki, Florin Manea, and Wojciech Rytter. Testing k-binomial equivalence. In Multidisciplinary Creativity, a collection of papers dedicated to G. Păun 65th birthday, pages 239-248, 2015. available in CoRR abs/1509.00622. Google Scholar
  26. Dominik D Freydenberger, Pawel Gawrychowski, Juhani Karhumäki, Florin Manea, and Wojciech Rytter. Testing k-binomial equivalence. arXiv preprint arXiv:1509.00622, 2015. Google Scholar
  27. Moses Ganardi, Danny Hucke, Daniel König, Markus Lohrey, and Konstantinos Mamouras. Automata theory on sliding windows. In STACS, volume 96 of LIPIcs, pages 31:1-31:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018. Google Scholar
  28. Moses Ganardi, Danny Hucke, and Markus Lohrey. Querying regular languages over sliding windows. In FSTTCS, volume 65 of LIPIcs, pages 18:1-18:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016. Google Scholar
  29. Moses Ganardi, Danny Hucke, and Markus Lohrey. Randomized sliding window algorithms for regular languages. In ICALP, volume 107 of LIPIcs, pages 127:1-127:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018. Google Scholar
  30. Moses Ganardi, Danny Hucke, and Markus Lohrey. Sliding window algorithms for regular languages. In LATA, volume 10792 of Lecture Notes in Computer Science, pages 26-35. Springer, 2018. Google Scholar
  31. Moses Ganardi, Danny Hucke, Markus Lohrey, and Tatiana Starikovskaya. Sliding window property testing for regular languages. In ISAAC, volume 149 of LIPIcs, pages 6:1-6:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019. Google Scholar
  32. Emmanuelle Garel. Minimal separators of two words. In Proc. CPM 1993, volume 684 of Lecture Notes in Computer Science, pages 35-53, 1993. Google Scholar
  33. Pawel Gawrychowski, Maria Kosche, Tore Koß, Florin Manea, and Stefan Siemer. Efficiently testing Simon’s congruence. In 38th International Symposium on Theoretical Aspects of Computer Science, STACS 2021, March 16-19, 2021, Saarbrücken, Germany (Virtual Conference), pages 34:1-34:18, 2021. URL: https://doi.org/10.4230/LIPIcs.STACS.2021.34.
  34. Nikos Giatrakos, Elias Alevizos, Alexander Artikis, Antonios Deligiannakis, and Minos N. Garofalakis. Complex event recognition in the big data era: a survey. VLDB J., 29(1):313-352, 2020. URL: https://doi.org/10.1007/s00778-019-00557-w.
  35. Simon Halfon, Philippe Schnoebelen, and Georg Zetzsche. Decidability, complexity, and expressiveness of first-order logic over the subword ordering. In Proc. LICS 2017, pages 1-12, 2017. Google Scholar
  36. Jean-Jacques Hebrard. An algorithm for distinguishing efficiently bit-strings by their subsequences. Theor. Comput. Sci., 82(1):35-49, 22 May 1991. Google Scholar
  37. John E. Hopcroft and Jeffrey D. Ullman. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, 1979. Google Scholar
  38. Costas S. Iliopoulos, Marcin Kubica, M. Sohel Rahman, and Tomasz Walen. Algorithms for computing the longest parameterized common subsequence. In Combinatorial Pattern Matching, 18th Annual Symposium, CPM 2007, London, Canada, July 9-11, 2007, Proceedings, pages 265-273, 2007. URL: https://doi.org/10.1007/978-3-540-73437-6_27.
  39. Russell Impagliazzo and Ramamohan Paturi. On the complexity of k-SAT. J. Comput. Syst. Sci., 62(2):367-375, 2001. URL: https://doi.org/10.1006/jcss.2000.1727.
  40. Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems have strongly exponential complexity? J. Comput. Syst. Sci., 63(4):512-530, 2001. URL: https://doi.org/10.1006/jcss.2001.1774.
  41. Prateek Karandikar, Manfred Kufleitner, and Philippe Schnoebelen. On the index of Simon’s congruence for piecewise testability. Inf. Process. Lett., 115(4):515-519, 2015. Google Scholar
  42. Prateek Karandikar and Philippe Schnoebelen. The height of piecewise-testable languages with applications in logical complexity. In Proc. CSL 2016, volume 62 of LIPIcs, pages 37:1-37:22, 2016. Google Scholar
  43. Prateek Karandikar and Philippe Schnoebelen. The height of piecewise-testable languages and the complexity of the logic of subwords. Log. Methods Comput. Sci., 15(2), 2019. Google Scholar
  44. Sarah Kleest-Meißner, Rebecca Sattler, Markus L. Schmid, Nicole Schweikardt, and Matthias Weidlich. Discovering event queries from traces: Laying foundations for subsequence-queries with wildcards and gap-size constraints. In 25th International Conference on Database Theory, ICDT 2022, volume 220 of LIPIcs, pages 18:1-18:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022. URL: https://doi.org/10.4230/LIPIcs.ICDT.2022.18.
  45. Dietrich Kuske. The subtrace order and counting first-order logic. In Proc. CSR 2020, volume 12159 of Lecture Notes in Computer Science, pages 289-302, 2020. Google Scholar
  46. Dietrich Kuske and Georg Zetzsche. Languages ordered by the subword order. In Proc. FOSSACS 2019, volume 11425 of Lecture Notes in Computer Science, pages 348-364, 2019. Google Scholar
  47. Marie Lejeune, Julien Leroy, and Michel Rigo. Computing the k-binomial complexity of the Thue-Morse word. In Proc. DLT 2019, volume 11647 of Lecture Notes in Computer Science, pages 278-291, 2019. Google Scholar
  48. Julien Leroy, Michel Rigo, and Manon Stipulanti. Generalized Pascal triangle for binomial coefficients of words. Electron. J. Combin., 24(1.44):36 pp., 2017. Google Scholar
  49. Chun Li and Jianyong Wang. Efficiently mining closed subsequences with gap constraints. In SDM, pages 313-322. SIAM, 2008. Google Scholar
  50. Chun Li, Qingyan Yang, Jianyong Wang, and Ming Li. Efficient mining of gap-constrained subsequences and its various applications. ACM Trans. Knowl. Discov. Data, 6(1):2:1-2:39, 2012. Google Scholar
  51. Daniel Lokshtanov, Dániel Marx, and Saket Saurabh. Lower bounds based on the exponential time hypothesis. Bull. EATCS, 105:41-72, 2011. URL: http://eatcs.org/beatcs/index.php/beatcs/article/view/92.
  52. David Maier. The complexity of some problems on subsequences and supersequences. J. ACM, 25(2):322-336, April 1978. Google Scholar
  53. Alexandru Mateescu, Arto Salomaa, and Sheng Yu. Subword histories and Parikh matrices. J. Comput. Syst. Sci., 68(1):1-21, 2004. Google Scholar
  54. William E. Riddle. An approach to software system modelling and analysis. Comput. Lang., 4(1):49-66, 1979. URL: https://doi.org/10.1016/0096-0551(79)90009-2.
  55. Michel Rigo and Pavel Salimov. Another generalization of abelian equivalence: Binomial complexity of infinite words. Theor. Comput. Sci., 601:47-57, 2015. Google Scholar
  56. Arto Salomaa. Connections between subwords and certain matrix mappings. Theoret. Comput. Sci., 340(2):188-203, 2005. Google Scholar
  57. Shinnosuke Seki. Absoluteness of subword inequality is undecidable. Theor. Comput. Sci., 418:116-120, 2012. URL: https://doi.org/10.1016/j.tcs.2011.10.017.
  58. Alan C. Shaw. Software descriptions with flow expressions. IEEE Trans. Software Eng., 4(3):242-254, 1978. URL: https://doi.org/10.1109/TSE.1978.231501.
  59. Imre Simon. Hierarchies of events with dot-depth one - Ph.D. thesis. University of Waterloo, 1972. Google Scholar
  60. Imre Simon. Piecewise testable events. In Autom. Theor. Form. Lang., 2nd GI Conf., volume 33 of LNCS, pages 214-222, 1975. Google Scholar
  61. Imre Simon. Words distinguished by their subwords (extended abstract). In Proc. WORDS 2003, volume 27 of TUCS General Publication, pages 6-13, 2003. Google Scholar
  62. Ken Thompson. Regular expression search algorithm. Commun. ACM, 11(6):419-422, 1968. URL: https://doi.org/10.1145/363347.363387.
  63. Zdenek Tronícek. Common subsequence automaton. In Proc. CIAA 2002 (Revised Papers), volume 2608 of Lecture Notes in Computer Science, pages 270-275, 2002. Google Scholar
  64. Wen-Guey Tzeng. A polynomial-time algorithm for the equivalence of probabilistic automata. SIAM J. Comput., 21(2):216-227, 1992. URL: https://doi.org/10.1137/0221017.
  65. Virginia Vassilevska Williams. Hardness of easy problems: Basing hardness on popular conjectures such as the strong exponential time hypothesis (invited talk). In 10th International Symposium on Parameterized and Exact Computation, IPEC 2015, September 16-18, 2015, Patras, Greece, pages 17-29, 2015. URL: https://doi.org/10.4230/LIPIcs.IPEC.2015.17.
  66. Georg Zetzsche. The complexity of downward closure comparisons. In Proc. ICALP 2016, volume 55 of LIPIcs, pages 123:1-123:14, 2016. Google Scholar
  67. Haopeng Zhang, Yanlei Diao, and Neil Immerman. On complexity and optimization of expensive queries in complex event processing. In International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014, pages 217-228, 2014. URL: https://doi.org/10.1145/2588555.2593671.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail