Subsequences with Generalised Gap Constraints: Upper and Lower Complexity Bounds

Authors: Florin Manea, Jonas Richardsen, Markus L. Schmid



File

LIPIcs.CPM.2024.22.pdf
  • 17 pages

Author Details

Florin Manea
  • Computer Science Department and CIDAS, Universität Göttingen, Germany
Jonas Richardsen
  • Computer Science Department and CIDAS, Universität Göttingen, Germany
Markus L. Schmid
  • Humboldt-Universität zu Berlin, Berlin, Germany

Cite As

Florin Manea, Jonas Richardsen, and Markus L. Schmid. Subsequences with Generalised Gap Constraints: Upper and Lower Complexity Bounds. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 22:1-22:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.CPM.2024.22

Abstract

For two strings u, v over some alphabet A, we investigate the problem of embedding u into v as a subsequence under the presence of generalised gap constraints. A generalised gap constraint is a triple (i, j, C_{i, j}), where 1 ≤ i < j ≤ |u| and C_{i, j} ⊆ A^*. Embedding u as a subsequence into v such that (i, j, C_{i, j}) is satisfied means that if u[i] and u[j] are mapped to v[k] and v[𝓁], respectively, then the induced gap v[k + 1..𝓁 - 1] must be a string from C_{i, j}. This generalises the setting recently investigated in [Day et al., ISAAC 2022], where only gap constraints of the form C_{i, i + 1} are considered, as well as the setting from [Kosche et al., RP 2022], where only gap constraints of the form C_{1, |u|} are considered. We show that subsequence matching under generalised gap constraints is NP-hard, and we complement this general lower bound with a thorough (parameterised) complexity analysis. Moreover, we identify several efficiently solvable subclasses that result from restricting the interval structure induced by the generalised gap constraints.
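To make the problem definition concrete, here is a minimal Python sketch, assuming each gap language C_{i, j} is supplied as a membership predicate on strings and that positions are 1-based as in the abstract. The names `satisfies` and `embed_with_gaps` are hypothetical, and the matcher is plain exhaustive search over all increasing position tuples, not one of the algorithms from the paper; in line with the NP-hardness result, it is only meant for toy inputs.

```python
from itertools import combinations
from typing import Callable, Dict, Optional, Tuple

# Hypothetical encoding (an assumption, not from the paper): a pair (i, j)
# with 1 <= i < j <= |u| maps to a membership test for the gap language C_{i,j}.
GapConstraints = Dict[Tuple[int, int], Callable[[str], bool]]


def satisfies(v: str, embedding: Tuple[int, ...], constraints: GapConstraints) -> bool:
    """Return True if the embedding meets every constraint (i, j, C_{i,j}).

    embedding[p - 1] is the 1-based position of v that u[p] is mapped to.
    """
    for (i, j), in_gap_language in constraints.items():
        k, ell = embedding[i - 1], embedding[j - 1]
        gap = v[k:ell - 1]  # the factor v[k+1..ell-1], translated to 0-based slicing
        if not in_gap_language(gap):
            return False
    return True


def embed_with_gaps(u: str, v: str, constraints: GapConstraints) -> Optional[Tuple[int, ...]]:
    """Exhaustive search: try every increasing tuple of positions in v.

    Returns the lexicographically first valid embedding, or None.
    Runs in time exponential in |u|; for illustration only.
    """
    for positions in combinations(range(1, len(v) + 1), len(u)):
        letters_match = all(v[p - 1] == a for p, a in zip(positions, u))
        if letters_match and satisfies(v, positions, constraints):
            return positions
    return None


if __name__ == "__main__":
    # C_{1,3}: gap of length at least 2; C_{2,3}: empty gap (u[2], u[3] adjacent in v).
    constraints = {(1, 3): lambda g: len(g) >= 2, (2, 3): lambda g: g == ""}
    print(embed_with_gaps("abc", "abcxbc", constraints))  # (1, 5, 6)
```

In the usage example, u = abc has three candidate embeddings into v = abcxbc, namely positions (1, 2, 3), (1, 2, 6) and (1, 5, 6). The first violates the non-local constraint on (1, 3), since its gap is just "b", and the second violates the constraint on (2, 3), since its gap is "cxb". This leaves (1, 5, 6) as the only valid embedding. The constraint on the non-adjacent pair (1, 3) is exactly what the generalised setting adds over constraints of the form C_{i, i + 1}.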

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
  • Theory of computation → Parameterized complexity and exact algorithms
  • Theory of computation → Formal languages and automata theory
Keywords
  • String algorithms
  • subsequences with gap constraints
  • pattern matching
  • fine-grained complexity
  • conditional lower bounds
  • parameterised complexity

References

  1. Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 59-78, 2015. URL: https://doi.org/10.1109/FOCS.2015.14.
  2. Amir Abboud, Virginia Vassilevska Williams, and Oren Weimann. Consequences of faster alignment of sequences. In Automata, Languages, and Programming - 41st International Colloquium, ICALP 2014, Copenhagen, Denmark, July 8-11, 2014, Proceedings, Part I, pages 39-51, 2014. URL: https://doi.org/10.1007/978-3-662-43948-7_4.
  3. Duncan Adamson, Maria Kosche, Tore Koß, Florin Manea, and Stefan Siemer. Longest common subsequence with gap constraints. In Combinatorics on Words - 14th International Conference, WORDS 2023, Umeå, Sweden, June 12-16, 2023, Proceedings, pages 60-76, 2023. URL: https://doi.org/10.1007/978-3-031-33180-0_5.
  4. Alexander Artikis, Alessandro Margara, Martín Ugarte, Stijn Vansummeren, and Matthias Weidlich. Complex event recognition languages: Tutorial. In Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems, DEBS 2017, Barcelona, Spain, June 19-23, 2017, pages 7-10, 2017. URL: https://doi.org/10.1145/3093742.3095106.
  5. Johannes Bader, Simon Gog, and Matthias Petri. Practical variable length gap pattern matching. In Experimental Algorithms - 15th International Symposium, SEA 2016, St. Petersburg, Russia, June 5-8, 2016, Proceedings, pages 1-16, 2016. URL: https://doi.org/10.1007/978-3-319-38851-9_1.
  6. Ricardo A. Baeza-Yates. Searching subsequences. Theor. Comput. Sci., 78(2):363-376, 1991.
  7. Philip Bille, Inge Li Gørtz, Hjalte Wedel Vildhøj, and David Kofoed Wind. String matching with variable length gaps. Theor. Comput. Sci., 443:25-34, 2012. URL: https://doi.org/10.1016/j.tcs.2012.03.029.
  8. Hans L. Bodlaender. A linear-time algorithm for finding tree-decompositions of small treewidth. SIAM J. Comput., 25(6):1305-1317, 1996. URL: https://doi.org/10.1137/S0097539793251219.
  9. Hans L. Bodlaender. A partial k-arboretum of graphs with bounded treewidth. Theor. Comput. Sci., 209(1-2):1-45, 1998. URL: https://doi.org/10.1016/S0304-3975(97)00228-4.
  10. Karl Bringmann and Bhaskar Ray Chaudhury. Sketching, streaming, and fine-grained complexity of (weighted) LCS. In Proc. FSTTCS 2018, volume 122 of LIPIcs, pages 40:1-40:16, 2018.
  11. Karl Bringmann and Marvin Künnemann. Multivariate fine-grained complexity of longest common subsequence. In Proc. SODA 2018, pages 1216-1235, 2018.
  12. Sam Buss and Michael Soltys. Unshuffling a square is NP-hard. J. Comput. Syst. Sci., 80(4):766-776, 2014. URL: https://doi.org/10.1016/j.jcss.2013.11.002.
  13. Manuel Cáceres, Simon J. Puglisi, and Bella Zhukova. Fast indexes for gapped pattern matching. In SOFSEM 2020: Theory and Practice of Computer Science - 46th International Conference on Current Trends in Theory and Practice of Informatics, SOFSEM 2020, Limassol, Cyprus, January 20-24, 2020, Proceedings, pages 493-504, 2020. URL: https://doi.org/10.1007/978-3-030-38919-2_40.
  14. Stephen A. Cook. The complexity of theorem-proving procedures. In Michael A. Harrison, Ranan B. Banerji, and Jeffrey D. Ullman, editors, Proceedings of the 3rd Annual ACM Symposium on Theory of Computing, May 3-5, 1971, Shaker Heights, Ohio, USA, pages 151-158. ACM, 1971. URL: https://doi.org/10.1145/800157.805047.
  15. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, 3rd Edition. MIT Press, 2009. URL: http://mitpress.mit.edu/books/introduction-algorithms.
  16. David Coudert, Florian Huc, and Jean-Sébastien Sereni. Pathwidth of outerplanar graphs. J. Graph Theory, 55(1):27-41, 2007. URL: https://doi.org/10.1002/JGT.20218.
  17. Joel D. Day, Maria Kosche, Florin Manea, and Markus L. Schmid. Subsequences with gap constraints: Complexity bounds for matching and analysis problems. In 33rd International Symposium on Algorithms and Computation, ISAAC 2022, December 19-21, 2022, Seoul, Korea, pages 64:1-64:18, 2022. URL: https://doi.org/10.4230/LIPICS.ISAAC.2022.64.
  18. Hristo N. Djidjev and Imrich Vrto. Crossing numbers and cutwidths. J. Graph Algorithms Appl., 7(3):245-251, 2003. URL: https://doi.org/10.7155/JGAA.00069.
  19. Shiri Dori and Gad M. Landau. Construction of Aho-Corasick automaton in linear time for integer alphabets. Inf. Process. Lett., 98(2):66-72, 2006. URL: https://doi.org/10.1016/J.IPL.2005.11.019.
  20. John A. Ellis, Ivan Hal Sudborough, and Jonathan S. Turner. Graph separation and search number. In Proc. 1983 Allerton Conf. on Communication, Control, and Computing, 1983.
  21. John A. Ellis, Ivan Hal Sudborough, and Jonathan S. Turner. The vertex separation and search number of a graph. Inf. Comput., 113(1):50-79, 1994. URL: https://doi.org/10.1006/INCO.1994.1064.
  22. Pamela Fleischmann, Sungmin Kim, Tore Koß, Florin Manea, Dirk Nowotka, Stefan Siemer, and Max Wiedenhöft. Matching patterns with variables under Simon’s congruence. In Reachability Problems - 17th International Conference, RP 2023, Nice, France, October 11-13, 2023, Proceedings, pages 155-170, 2023. URL: https://doi.org/10.1007/978-3-031-45286-4_12.
  23. Dominik D. Freydenberger, Pawel Gawrychowski, Juhani Karhumäki, Florin Manea, and Wojciech Rytter. Testing k-binomial equivalence. In Multidisciplinary Creativity, a collection of papers dedicated to G. Păun 65th birthday, pages 239-248, 2015. Available in CoRR abs/1509.00622.
  24. André Frochaux and Sarah Kleest-Meißner. Puzzling over subsequence-query extensions: Disjunction and generalised gaps. In Proceedings of the 15th Alberto Mendelzon International Workshop on Foundations of Data Management (AMW 2023), Santiago de Chile, Chile, May 22-26, 2023, 2023. URL: https://ceur-ws.org/Vol-3409/paper3.pdf.
  25. Nikos Giatrakos, Elias Alevizos, Alexander Artikis, Antonios Deligiannakis, and Minos N. Garofalakis. Complex event recognition in the big data era: a survey. VLDB J., 29(1):313-352, 2020. URL: https://doi.org/10.1007/s00778-019-00557-w.
  26. Simon Halfon, Philippe Schnoebelen, and Georg Zetzsche. Decidability, complexity, and expressiveness of first-order logic over the subword ordering. In Proc. LICS 2017, pages 1-12, 2017.
  27. Costas S. Iliopoulos, Marcin Kubica, M. Sohel Rahman, and Tomasz Walen. Algorithms for computing the longest parameterized common subsequence. In Combinatorial Pattern Matching, 18th Annual Symposium, CPM 2007, London, Canada, July 9-11, 2007, Proceedings, pages 265-273, 2007. URL: https://doi.org/10.1007/978-3-540-73437-6_27.
  28. Russell Impagliazzo and Ramamohan Paturi. On the complexity of k-SAT. J. Comput. Syst. Sci., 62(2):367-375, 2001. URL: https://doi.org/10.1006/jcss.2000.1727.
  29. Prateek Karandikar, Manfred Kufleitner, and Philippe Schnoebelen. On the index of Simon’s congruence for piecewise testability. Inf. Process. Lett., 115(4):515-519, 2015.
  30. Prateek Karandikar and Philippe Schnoebelen. The height of piecewise-testable languages with applications in logical complexity. In Proc. CSL 2016, volume 62 of LIPIcs, pages 37:1-37:22, 2016.
  31. Prateek Karandikar and Philippe Schnoebelen. The height of piecewise-testable languages and the complexity of the logic of subwords. Log. Methods Comput. Sci., 15(2), 2019.
  32. Richard M. Karp. Reducibility among combinatorial problems. In Raymond E. Miller and James W. Thatcher, editors, Proceedings of a symposium on the Complexity of Computer Computations, held March 20-22, 1972, at the IBM Thomas J. Watson Research Center, Yorktown Heights, New York, USA, The IBM Research Symposia Series, pages 85-103. Plenum Press, New York, 1972. URL: https://doi.org/10.1007/978-1-4684-2001-2_9.
  33. Sarah Kleest-Meißner, Rebecca Sattler, Markus L. Schmid, Nicole Schweikardt, and Matthias Weidlich. Discovering event queries from traces: Laying foundations for subsequence-queries with wildcards and gap-size constraints. In 25th International Conference on Database Theory, ICDT 2022, 29th March-1st April, 2022, Edinburgh, UK, 2022.
  34. Sarah Kleest-Meißner, Rebecca Sattler, Markus L. Schmid, Nicole Schweikardt, and Matthias Weidlich. Discovering multi-dimensional subsequence queries from traces - from theory to practice. In Datenbanksysteme für Business, Technologie und Web (BTW 2023), 20. Fachtagung des GI-Fachbereichs ,,Datenbanken und Informationssysteme" (DBIS), 06.-10, März 2023, Dresden, Germany, Proceedings, pages 511-533, 2023. URL: https://doi.org/10.18420/BTW2023-24.
  35. Maria Kosche, Tore Koß, Florin Manea, and Viktoriya Pak. Subsequences in bounded ranges: Matching and analysis problems. In Anthony W. Lin, Georg Zetzsche, and Igor Potapov, editors, Reachability Problems - 16th International Conference, RP 2022, Kaiserslautern, Germany, October 17-21, 2022, Proceedings, volume 13608 of Lecture Notes in Computer Science, pages 140-159. Springer, 2022. URL: https://doi.org/10.1007/978-3-031-19135-0_10.
  36. Maria Kosche, Tore Koß, Florin Manea, and Stefan Siemer. Combinatorial algorithms for subsequence matching: A survey. In Henning Bordihn, Géza Horváth, and György Vaszil, editors, Proceedings 12th International Workshop on Non-Classical Models of Automata and Applications, NCMA 2022, Debrecen, Hungary, August 26-27, 2022, volume 367 of EPTCS, pages 11-27, 2022. URL: https://doi.org/10.4204/EPTCS.367.2.
  37. Dietrich Kuske. The subtrace order and counting first-order logic. In Proc. CSR 2020, volume 12159 of Lecture Notes in Computer Science, pages 289-302, 2020.
  38. Dietrich Kuske and Georg Zetzsche. Languages ordered by the subword order. In Proc. FOSSACS 2019, volume 11425 of Lecture Notes in Computer Science, pages 348-364, 2019.
  39. Marie Lejeune, Julien Leroy, and Michel Rigo. Computing the k-binomial complexity of the Thue-Morse word. In Proc. DLT 2019, volume 11647 of Lecture Notes in Computer Science, pages 278-291, 2019.
  40. Julien Leroy, Michel Rigo, and Manon Stipulanti. Generalized Pascal triangle for binomial coefficients of words. Electron. J. Combin., 24(1.44):36 pp., 2017.
  41. Chun Li and Jianyong Wang. Efficiently mining closed subsequences with gap constraints. In SDM, pages 313-322. SIAM, 2008.
  42. Chun Li, Qingyan Yang, Jianyong Wang, and Ming Li. Efficient mining of gap-constrained subsequences and its various applications. ACM Trans. Knowl. Discov. Data, 6(1):2:1-2:39, 2012.
  43. David Maier. The complexity of some problems on subsequences and supersequences. J. ACM, 25(2):322-336, April 1978.
  44. Alexandru Mateescu, Arto Salomaa, and Sheng Yu. Subword histories and Parikh matrices. J. Comput. Syst. Sci., 68(1):1-21, 2004.
  45. T.A.J. Nicholson. Permutation procedure for minimising the number of crossings in a network. Proceedings of the Institution of Electrical Engineers, 115:21-26(5), January 1968.
  46. Rohit J. Parikh. Language generating devices. Quarterly Progress Report, 60:199-212, 1961.
  47. M. Praveen, Philippe Schnoebelen, Julien Veron, and Isa Vialard. On the piecewise complexity of words and periodic words. In SOFSEM 2024: Theory and Practice of Computer Science - 48th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2024, Cochem, Germany, February 19-23, 2024, Proceedings, pages 456-470, 2024. URL: https://doi.org/10.1007/978-3-031-52113-3_32.
  48. William E. Riddle. An approach to software system modelling and analysis. Comput. Lang., 4(1):49-66, 1979. URL: https://doi.org/10.1016/0096-0551(79)90009-2.
  49. Michel Rigo and Pavel Salimov. Another generalization of abelian equivalence: Binomial complexity of infinite words. Theor. Comput. Sci., 601:47-57, 2015.
  50. Arto Salomaa. Connections between subwords and certain matrix mappings. Theor. Comput. Sci., 340(2):188-203, 2005.
  51. Philippe Schnoebelen and Julien Veron. On arch factorization and subword universality for words and compressed words. In Combinatorics on Words - 14th International Conference, WORDS 2023, Umeå, Sweden, June 12-16, 2023, Proceedings, pages 274-287, 2023. URL: https://doi.org/10.1007/978-3-031-33180-0_21.
  52. Shinnosuke Seki. Absoluteness of subword inequality is undecidable. Theor. Comput. Sci., 418:116-120, 2012. URL: https://doi.org/10.1016/J.TCS.2011.10.017.
  53. Alan C. Shaw. Software descriptions with flow expressions. IEEE Trans. Software Eng., 4(3):242-254, 1978. URL: https://doi.org/10.1109/TSE.1978.231501.
  54. Imre Simon. Hierarchies of events with dot-depth one. PhD thesis, University of Waterloo, 1972.
  55. Imre Simon. Piecewise testable events. In Autom. Theor. Form. Lang., 2nd GI Conf., volume 33 of LNCS, pages 214-222, 1975.
  56. Manfred Wiegers. Recognizing outerplanar graphs in linear time. In Gottfried Tinhofer and Gunther Schmidt, editors, Graphtheoretic Concepts in Computer Science, International Workshop, WG '86, Bernried, Germany, June 17-19, 1986, Proceedings, volume 246 of Lecture Notes in Computer Science, pages 165-176. Springer, 1986. URL: https://doi.org/10.1007/3-540-17218-1_57.
  57. Ryan Williams. A new algorithm for optimal 2-constraint satisfaction and its implications. Theor. Comput. Sci., 348(2-3):357-365, 2005. URL: https://doi.org/10.1016/j.tcs.2005.09.023.
  58. Virginia Vassilevska Williams. On some fine-grained questions in algorithms and complexity, pages 3447-3487. World Scientific, 2018. URL: https://doi.org/10.1142/9789813272880_0188.
  59. Georg Zetzsche. The complexity of downward closure comparisons. In Proc. ICALP 2016, volume 55 of LIPIcs, pages 123:1-123:14, 2016.
  60. Haopeng Zhang, Yanlei Diao, and Neil Immerman. On complexity and optimization of expensive queries in complex event processing. In International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014, pages 217-228, 2014. URL: https://doi.org/10.1145/2588555.2593671.