The Fine-Grained Complexity of Episode Matching

Authors Philip Bille , Inge Li Gørtz , Shay Mozes , Teresa Anna Steiner , Oren Weimann



PDF
Thumbnail PDF

File

LIPIcs.CPM.2022.4.pdf
  • Filesize: 0.65 MB
  • 12 pages

Document Identifiers

Author Details

Philip Bille
  • Technical University of Denmark, Lyngby, Denmark
Inge Li Gørtz
  • Technical University of Denmark, Lyngby, Denmark
Shay Mozes
  • The Interdisciplinary Center Herzliya, Israel
Teresa Anna Steiner
  • Technical University of Denmark, Lyngby, Denmark
Oren Weimann
  • University of Haifa, Israel

Cite AsGet BibTex

Philip Bille, Inge Li Gørtz, Shay Mozes, Teresa Anna Steiner, and Oren Weimann. The Fine-Grained Complexity of Episode Matching. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 4:1-4:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.CPM.2022.4

Abstract

Given two strings S and P, the Episode Matching problem is to find the shortest substring of S that contains P as a subsequence. The best known upper bound for this problem is Õ(nm) by Das et al. (1997), where n,m are the lengths of S and P, respectively. Although the problem is well studied and has many applications in data mining, this bound has never been improved. In this paper we show why this is the case by proving that no O((nm)^{1-ε}) algorithm (even for binary strings) exists, unless the Strong Exponential Time Hypothesis (SETH) is false. We then consider the indexing version of the problem, where S is preprocessed into a data structure for answering episode matching queries P. We show that for any τ, there is a data structure using O(n+(n/(τ)) ^k) space that answers episode matching queries for any P of length k in O(k⋅ τ ⋅ log log n) time. We complement this upper bound with an almost matching lower bound, showing that any data structure that answers episode matching queries for patterns of length k in time O(n^δ), must use Ω(n^{k-kδ-o(1)}) space, unless the Strong k-Set Disjointness Conjecture is false. Finally, for the special case of k = 2, we present a faster construction of the data structure using fast min-plus multiplication of bounded integer matrices.

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • Pattern matching
  • fine-grained complexity
  • longest common subsequence

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In Proc. 56th FOCS, pages 59-78, 2015. Google Scholar
  2. Amir Abboud and Virginia Vassilevska Williams. Fine-grained hardness for edit distance to a fixed sequence. In Proc. 48th ICALP, volume 198, pages 7:1-7:14, 2021. Google Scholar
  3. Amir Abboud, Virginia Vassilevska Williams, and Oren Weimann. Consequences of faster alignment of sequences. In Proc. 41st ICALP, pages 39-51, 2014. Google Scholar
  4. Avinash Achar, A. Ibrahim, and P. S. Sastry. Pattern-growth based frequent serial episode discovery. Data Knowl. Eng., 87:91-108, 2013. Google Scholar
  5. Josh Alman and Virginia Vassilevska Williams. A refined laser method and faster matrix multiplication. In Dániel Marx, editor, Proc. 32nd SODA, pages 522-539, 2021. Google Scholar
  6. Alberto Apostolico and Mikhail J. Atallah. Compact recognizers of episode sequences. Inf. Comput., 174(2):180-192, 2002. Google Scholar
  7. Mikhail J. Atallah, Robert Gwadera, and Wojciech Szpankowski. Detection of significant sets of episodes in event sequences. In Proc. 4th ICDM, pages 3-10, 2004. Google Scholar
  8. Arturs Backurs and Piotr Indyk. Which regular expression patterns are hard to match? In Proc. 57th FOCS, pages 457-466, 2016. Google Scholar
  9. Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). SIAM J. Comput., 47(3):1087-1097, 2018. Google Scholar
  10. Philip Bille, Inge Li Gørtz, Max Rishøj Pedersen, and Teresa Anna Steiner. Gapped indexing for consecutive occurrences. In Proc. 32nd CPM, pages 10:1-10:19, 2021. Google Scholar
  11. Luc Boasson, Patrick Cégielski, Irène Guessarian, and Yuri V. Matiyasevich. Window-accumulated subsequence matching problem is linear. Ann. Pure Appl. Log., 113(1-3):59-80, 2001. Google Scholar
  12. Karl Bringmann and Marvin Künnemann. Quadratic conditional lower bounds for string problems and dynamic time warping. In Proc. 56th FOCS, pages 79-97, 2015. Google Scholar
  13. Patrick Cégielski, Irène Guessarian, and Yuri V. Matiyasevich. Multiple serial episodes matching. Inf. Process. Lett., 98(6):211-218, 2006. Google Scholar
  14. Maxime Crochemore, Costas S. Iliopoulos, Christos Makris, Wojciech Rytter, Athanasios K. Tsakalidis, and T. Tsichlas. Approximate string matching with gaps. Nord. J. Comput., 9(1):54-65, 2002. Google Scholar
  15. Gautam Das, Rudolf Fleischer, Leszek Gasieniec, Dimitrios Gunopulos, and Juha Kärkkäinen. Episode matching. In Proc. 8th CPM, pages 12-27, 1997. Google Scholar
  16. Lech Duraj, Marvin Künnemann, and Adam Polak. Tight conditional lower bounds for longest common increasing subsequence. Algorithmica, 81(10):3968-3992, 2019. Google Scholar
  17. Massimo Equi, Roberto Grossi, Veli Mäkinen, and Alexandru I. Tomescu. On the complexity of string matching for graphs. In Proc. 46th ICALP, pages 55:1-55:15, 2019. Google Scholar
  18. Massimo Equi, Veli Mäkinen, and Alexandru I. Tomescu. Graphs cannot be indexed in polynomial time for sub-quadratic time string matching, unless SETH fails. In Proc. 27th SOFSEM, volume 12607, pages 608-622, 2021. Google Scholar
  19. Daniel Gibney. An efficient elastic-degenerate text index? not likely. In Proc. 27th SPIRE, pages 76-88, 2020. Google Scholar
  20. Isaac Goldstein, Tsvi Kopelowitz, Moshe Lewenstein, and Ely Porat. Conditional lower bounds for space/time tradeoffs. In Proc. 15th WADS, pages 421-436, 2017. Google Scholar
  21. Robert Gwadera, Mikhail J. Atallah, and Wojciech Szpankowski. Markov models for identification of significant episodes. In Proc. 5th SDM, pages 404-414, 2005. Google Scholar
  22. Robert Gwadera, Mikhail J. Atallah, and Wojciech Szpankowski. Reliable detection of episodes in event sequences. Knowl. Inf. Syst., 7(4):415-437, 2005. Google Scholar
  23. Masahiro Hirao, Shunsuke Inenaga, Ayumi Shinohara, Masayuki Takeda, and Setsuo Arikawa. A practical algorithm to find the best episode patterns. In Proc. 4th DS, pages 435-440, 2001. Google Scholar
  24. Daniel S. Hirschberg. Algorithms for the longest common subsequence problem. J. ACM, 24(4):664-675, 1977. Google Scholar
  25. Russell Impagliazzo and Ramamohan Paturi. On the complexity of k-sat. J. Comput. Syst. Sci., 62(2):367-375, 2001. Google Scholar
  26. Mika Klemettinen, Heikki Mannila, and Hannu Toivonen. Rule discovery in telecommunication alarm data. J. Netw. Syst. Manag., 7(4):395-423, 1999. Google Scholar
  27. Tomasz Kociumaka, Jakub Radoszewski, and Tatiana Starikovskaya. Longest common substring with approximately k mismatches. Algorithmica, 81(6):2633-2652, 2019. Google Scholar
  28. Tsvi Kopelowitz and Robert Krauthgamer. Color-distance oracles and snippets. In Proc. 27th CPM, pages 24:1-24:10, 2016. Google Scholar
  29. Josué Kuri, Gonzalo Navarro, and Ludovic Mé. Fast multipattern search algorithms for intrusion detection. Fundam. Informaticae, 56(1-2):23-49, 2003. Google Scholar
  30. Veli Mäkinen, Gonzalo Navarro, and Esko Ukkonen. Transposition invariant string matching. J. Algorithms, 56(2):124-153, 2005. Google Scholar
  31. Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo. Discovery of frequent episodes in event sequences. Data Min. Knowl. Discov., 1(3):259-289, 1997. Google Scholar
  32. Elżbieta Nowicka and Marcin Zawada. On the complexity of matching non-injective general episodes. Computation and Logic in the Real World, pages 288-296, 2007. Google Scholar
  33. Adam Polak. Why is it hard to beat O(n^2) for longest common weakly increasing subsequence? Inf. Process. Lett., 132:1-5, 2018. Google Scholar
  34. Shinichiro Tago, Tatsuya Asai, Takashi Katoh, Hiroaki Morikawa, and Hiroya Inakoshi. EVIS: A fast and scalable episode matching engine for massively parallel data streams. In Proc. 17th DASFAA, pages 213-223, 2012. Google Scholar
  35. D. E. Willard. Log-logarithmic worst-case range queries are possible in space θ(N). Inf. Process. Lett., 17(2):81-84, 1983. Google Scholar
  36. Ryan Williams. A new algorithm for optimal 2-constraint satisfaction and its implications. Theor. Comput. Sci., 348(2-3):357-365, 2005. Google Scholar
  37. Virginia Vassilevska Williams and Yinzhan Xu. Truly subcubic min-plus product for less structured matrices, with applications. In Proc.31st SODA, pages 12-29, 2020. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail