Approximate Circular Pattern Matching

Authors Panagiotis Charalampopoulos , Tomasz Kociumaka , Jakub Radoszewski , Solon P. Pissis , Wojciech Rytter , Tomasz Waleń , Wiktor Zuba



PDF
Thumbnail PDF

File

LIPIcs.ESA.2022.35.pdf
  • Filesize: 1.03 MB
  • 19 pages

Document Identifiers

Author Details

Panagiotis Charalampopoulos
  • Birkbeck, University of London, UK
  • Reichman University, Herzliya, Israel
Tomasz Kociumaka
  • Max Planck Institute for Informatics, Saarbrücken, Germany
Jakub Radoszewski
  • University of Warsaw, Poland
  • Samsung R&D, Warsaw, Poland
Solon P. Pissis
  • CWI, Amsterdam, The Netherlands
  • Vrije Universiteit, Amsterdam, The Netherlands
Wojciech Rytter
  • University of Warsaw, Poland
Tomasz Waleń
  • University of Warsaw, Poland
Wiktor Zuba
  • CWI, Amsterdam, The Netherlands

Acknowledgements

We thank Paweł Gawrychowski and Oren Weimann for suggesting the strategy for the proof of Lemma 29.

Cite As Get BibTex

Panagiotis Charalampopoulos, Tomasz Kociumaka, Jakub Radoszewski, Solon P. Pissis, Wojciech Rytter, Tomasz Waleń, and Wiktor Zuba. Approximate Circular Pattern Matching. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 35:1-35:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/LIPIcs.ESA.2022.35

Abstract

We investigate the complexity of approximate circular pattern matching (CPM, in short) under the Hamming and edit distance. Under each of these two basic metrics, we are given a length-n text T, a length-m pattern P, and a positive integer threshold k, and we are to report all starting positions (called occurrences) of fragments of T that are at distance at most k from some cyclic rotation of P. In the decision version of the problem, we are to check if there is any such occurrence. All previous results for approximate CPM were either average-case upper bounds or heuristics, with the exception of the work of Charalampopoulos et al. [CKP^+, JCSS'21], who considered only the Hamming distance. For the reporting version of the approximate CPM problem, under the Hamming distance we improve upon the main algorithm of [CKP^+, JCSS'21] from 𝒪(n+(n/m) ⋅ k⁴) to 𝒪(n+(n/m) ⋅ k³ log log k) time; for the edit distance, we give an 𝒪(nk²)-time algorithm. Notably, for the decision versions and wide parameter-ranges, we give algorithms whose complexities are almost identical to the state-of-the-art for standard (i.e., non-circular) approximate pattern matching:  
- For the decision version of the approximate CPM problem under the Hamming distance, we obtain an 𝒪(n+(n/m) ⋅ k² log k / log log k)-time algorithm, which works in 𝒪(n) time whenever k = 𝒪(√{m log log m / log m}). In comparison, the fastest algorithm for the standard counterpart of the problem, by Chan et al. [CGKKP, STOC’20], runs in 𝒪(n) time only for k = 𝒪(√m). We achieve this result via a reduction to a geometric problem by building on ideas from [CKP^+, JCSS'21] and Charalampopoulos et al. [CKW, FOCS'20]. 
- For the decision version of the approximate CPM problem under the edit distance, the 𝒪(nklog³ k) runtime of our algorithm near matches the 𝒪(nk) runtime of the Landau-Vishkin algorithm [LV, J. Algorithms'89] for approximate pattern matching under edit distance; the latter algorithm remains the fastest known for k = Ω(m^{2/5}). As a stepping stone, we propose an 𝒪(nklog³ k)-time algorithm for solving the Longest Prefix k'-Approximate Match problem, proposed by Landau et al. [LMS, SICOMP'98], for all k' ∈ {1,…,k}. Our algorithm is based on Tiskin’s theory of seaweeds [Tiskin, Math. Comput. Sci.'08], with recent advancements (see Charalampopoulos et al. [CKW, FOCS'22]), and on exploiting the seaweeds' relation to Monge matrices. 
In contrast, we obtain a conditional lower bound that suggests a polynomial separation between approximate CPM under the Hamming distance over the binary alphabet and its non-circular counterpart. We also show that a strongly subquadratic-time algorithm for the decision version of approximate CPM under edit distance would refute the Strong Exponential Time Hypothesis.

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • approximate circular pattern matching
  • Hamming distance
  • edit distance

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Karl R. Abrahamson. Generalized string matching. SIAM Journal on Computing, 16(6):1039-1051, 1987. URL: http://dx.doi.org/10.1137/0216067.
  2. Lorraine A. K. Ayad, Carl Barton, and Solon P. Pissis. A faster and more accurate heuristic for cyclic edit distance computation. Pattern Recognition Letters, 88:81-87, 2017. URL: http://dx.doi.org/10.1016/j.patrec.2017.01.018.
  3. Lorraine A. K. Ayad and Solon P. Pissis. MARS: Improving multiple circular sequence alignment using refined sequences. BMC Genomics, 18(1):86, 2017. URL: http://dx.doi.org/10.1186/s12864-016-3477-5.
  4. Lorraine A. K. Ayad, Solon P. Pissis, and Ahmad Retha. libFLASM: A software library for fixed-length approximate string matching. BMC Bioinformatics, 17:454:1-454:12, 2016. URL: http://dx.doi.org/10.1186/s12859-016-1320-2.
  5. Md. Aashikur Rahman Azim, Costas S. Iliopoulos, M. Sohel Rahman, and M. Samiruzzaman. A filter-based approach for approximate circular pattern matching. In Bioinformatics Research and Applications - 11th International Symposium, ISBRA 2015, volume 9096 of Lecture Notes in Computer Science, pages 24-35. Springer, 2015. URL: http://dx.doi.org/10.1007/978-3-319-19048-8_3.
  6. Carl Barton, Costas S. Iliopoulos, Ritu Kundu, Solon P. Pissis, Ahmad Retha, and Fatima Vayani. Accurate and efficient methods to improve multiple circular sequence alignment. In Experimental Algorithms, SEA 2015, volume 9125 of Lecture Notes in Computer Science, pages 247-258. Springer, 2015. URL: http://dx.doi.org/10.1007/978-3-319-20086-6_19.
  7. Carl Barton, Costas S. Iliopoulos, and Solon P. Pissis. Fast algorithms for approximate circular string matching. Algorithms for Molecular Biology, 9:9, 2014. URL: http://dx.doi.org/10.1186/1748-7188-9-9.
  8. Carl Barton, Costas S. Iliopoulos, and Solon P. Pissis. Average-case optimal approximate circular string matching. In Language and Automata Theory and Applications - 9th International Conference, LATA 2015, volume 8977 of Lecture Notes in Computer Science, pages 85-96. Springer, 2015. URL: http://dx.doi.org/10.1007/978-3-319-15579-1_6.
  9. Carl Barton, Costas S. Iliopoulos, Solon P. Pissis, and William F. Smyth. Fast and simple computations using prefix tables under Hamming and edit distance. In Combinatorial Algorithms - 25th International Workshop, IWOCA 2014, volume 8986 of Lecture Notes in Computer Science, pages 49-61. Springer, 2014. URL: http://dx.doi.org/10.1007/978-3-319-19315-1_5.
  10. Karl Bringmann and Marvin Künnemann. Quadratic conditional lower bounds for string problems and dynamic time warping. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, pages 79-97. IEEE Computer Society, 2015. URL: http://dx.doi.org/10.1109/FOCS.2015.15.
  11. Gerth Stølting Brodal, Pooya Davoodi, and S. Srinivasa Rao. Path minima queries in dynamic weighted trees. In Algorithms and Data Structures - 12th International Symposium, WADS 2011, volume 6844 of Lecture Notes in Computer Science, pages 290-301. Springer, 2011. URL: http://dx.doi.org/10.1007/978-3-642-22300-6_25.
  12. Timothy M. Chan, Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, and Ely Porat. Approximating text-to-pattern Hamming distances. In Proccedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, pages 643-656. ACM, 2020. URL: http://dx.doi.org/10.1145/3357713.3384266.
  13. Timothy M. Chan and Moshe Lewenstein. Clustered integer 3SUM via additive combinatorics. In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC 2015, pages 31-40. ACM, 2015. URL: http://dx.doi.org/10.1145/2746539.2746568.
  14. Panagiotis Charalampopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba. Circular pattern matching with k mismatches. Journal of Computer and System Sciences, 115:73-85, 2021. URL: http://dx.doi.org/10.1016/j.jcss.2020.07.003.
  15. Panagiotis Charalampopoulos, Tomasz Kociumaka, and Philip Wellnitz. Faster approximate pattern matching: A unified approach. In 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, pages 978-989. IEEE, 2020. URL: http://dx.doi.org/10.1109/FOCS46700.2020.00095.
  16. Panagiotis Charalampopoulos, Tomasz Kociumaka, and Philip Wellnitz. Faster pattern matching under edit distance. In Accepted to 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022, 2022. URL: http://arxiv.org/abs/2204.03087.
  17. Kuei-Hao Chen, Guan-Shieng Huang, and Richard Chia-Tung Lee. Bit-parallel algorithms for exact circular string matching. The Computer Journal, 57(5):731-743, 2014. URL: http://dx.doi.org/10.1093/comjnl/bxt023.
  18. Shucheng Chi, Ran Duan, Tianle Xie, and Tianyi Zhang. Faster min-plus product for monotone instances. In STOC 2022: 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 1529-1542. ACM, 2022. URL: http://dx.doi.org/10.1145/3519935.3520057.
  19. Ferdinando Cicalese, Gabriele Fici, and Zsuzsanna Lipták. Searching for jumbled patterns in strings. In Proceedings of the Prague Stringology Conference 2009, pages 105-117, 2009. URL: http://www.stringology.org/event/2009/p10.html.
  20. Ferdinando Cicalese, Travis Gagie, Emanuele Giaquinta, Eduardo Sany Laber, Zsuzsanna Lipták, Romeo Rizzi, and Alexandru I. Tomescu. Indexes for jumbled pattern matching in strings, trees and graphs. In String Processing and Information Retrieval - 20th International Symposium, SPIRE 2013, volume 8214 of Lecture Notes in Computer Science, pages 56-63. Springer, 2013. URL: http://dx.doi.org/10.1007/978-3-319-02432-5_10.
  21. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana Starikovskaya. The k-mismatch problem revisited. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, pages 2039-2052. SIAM, 2016. URL: http://dx.doi.org/10.1137/1.9781611974331.ch142.
  22. Michael J. Fischer and Michael S. Paterson. String-matching and other products. Technical report, Massachusetts Institute of Technology, USA, 1974. Google Scholar
  23. Kimmo Fredriksson and Szymon Grabowski. Average-optimal string matching. Journal of Discrete Algorithms, 7(4):579-594, 2009. URL: http://dx.doi.org/10.1016/j.jda.2008.09.001.
  24. Kimmo Fredriksson and Gonzalo Navarro. Average-optimal single and multiple approximate string matching. ACM Journal of Experimental Algorithmics, 9, 2004. URL: http://dx.doi.org/10.1145/1005813.1041513.
  25. Paweł Gawrychowski, Shay Mozes, and Oren Weimann. Submatrix maximum queries in Monge and partial Monge matrices are equivalent to predecessor search. ACM Transactions on Algorithms, 16(2):16:1-16:24, 2020. URL: http://dx.doi.org/10.1145/3381416.
  26. Paweł Gawrychowski and Przemysław Uznański. Towards unified approximate pattern matching for Hamming and L₁ distance. In 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, volume 107 of LIPIcs, pages 62:1-62:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018. URL: http://dx.doi.org/10.4230/LIPIcs.ICALP.2018.62.
  27. Roberto Grossi, Costas S. Iliopoulos, Robert Mercaş, Nadia Pisanti, Solon P. Pissis, Ahmad Retha, and Fatima Vayani. Circular sequence comparison: algorithms and applications. Algorithms for Molecular Biology, 11:12, 2016. URL: http://dx.doi.org/10.1186/s13015-016-0076-6.
  28. Yijie Han. Deterministic sorting in O(nlog log n) time and linear space. Journal of Algorithms, 50(1):96-105, 2004. URL: http://dx.doi.org/10.1016/j.jalgor.2003.09.001.
  29. Tommi Hirvola and Jorma Tarhio. Bit-parallel approximate matching of circular strings with k mismatches. ACM Journal of Experimental Algorithmics, 22, 2017. URL: http://dx.doi.org/10.1145/3129536.
  30. ThienLuan Ho, Seungrohk Oh, and Hyunjin Kim. New algorithms for fixed-length approximate string matching and approximate circular string matching under the Hamming distance. The Journal of Supercomputing, 74(5):1815-1834, 2018. URL: http://dx.doi.org/10.1007/s11227-017-2192-6.
  31. Wing-Kai Hon, Tsung-Han Ku, Rahul Shah, and Sharma V. Thankachan. Space-efficient construction algorithm for the circular suffix tree. In Combinatorial Pattern Matching, 24th Annual Symposium, CPM 2013, volume 7922 of Lecture Notes in Computer Science, pages 142-152. Springer, 2013. URL: http://dx.doi.org/10.1007/978-3-642-38905-4_15.
  32. Wing-Kai Hon, Chen-Hua Lu, Rahul Shah, and Sharma V. Thankachan. Succinct indexes for circular patterns. In Algorithms and Computation - 22nd International Symposium, ISAAC 2011, volume 7074 of Lecture Notes in Computer Science, pages 673-682. Springer, 2011. URL: http://dx.doi.org/10.1007/978-3-642-25591-5_69.
  33. Costas S. Iliopoulos, Solon P. Pissis, and M. Sohel Rahman. Searching and indexing circular patterns. In Algorithms for Next-Generation Sequencing Data, Techniques, Approaches, and Applications, pages 77-90. Springer, 2017. URL: http://dx.doi.org/10.1007/978-3-319-59826-0_3.
  34. Russell Impagliazzo and Ramamohan Paturi. On the complexity of k-SAT. Journal of Computer and System Sciences, 62(2):367-375, 2001. URL: http://dx.doi.org/10.1006/jcss.2000.1727.
  35. Haim Kaplan, Shay Mozes, Yahav Nussbaum, and Micha Sharir. Submatrix maximum queries in Monge matrices and Monge partial matrices, and their applications. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, pages 338-355. SIAM, 2012. URL: http://dx.doi.org/10.1137/1.9781611973099.31.
  36. S. Rao Kosaraju. Efficient string matching. Manuscript, 1987. Google Scholar
  37. Gad M. Landau, Eugene W. Myers, and Jeanette P. Schmidt. Incremental string comparison. SIAM Journal on Computing, 27(2):557-582, 1998. URL: http://dx.doi.org/10.1137/S0097539794264810.
  38. Gad M. Landau and Uzi Vishkin. Fast parallel and serial approximate string matching. Journal of Algorithms, 10(2):157-169, 1989. URL: http://dx.doi.org/10.1016/0196-6774(89)90010-2.
  39. M. Lothaire. Applied Combinatorics on Words. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2005. URL: http://dx.doi.org/10.1017/CBO9781107341005.
  40. Vicente Palazón-González and Andrés Marzal. On the dynamic time warping of cyclic sequences for shape retrieval. Image and Vision Computing, 30(12):978-990, 2012. URL: http://dx.doi.org/10.1016/j.imavis.2012.08.012.
  41. Vicente Palazón-González and Andrés Marzal. Speeding up the cyclic edit distance using LAESA with early abandon. Pattern Recognition Letters, 62:1-7, 2015. URL: http://dx.doi.org/10.1016/j.patrec.2015.04.013.
  42. Vicente Palazón-González, Andrés Marzal, and Juan Miguel Vilar. On hidden Markov models and cyclic strings for shape recognition. Pattern Recognition, 47(7):2490-2504, 2014. URL: http://dx.doi.org/10.1016/j.patcog.2014.01.018.
  43. Robert Susik, Szymon Grabowski, and Sebastian Deorowicz. Fast and simple circular pattern matching. In Man-Machine Interactions 3, Proceedings of the 3rd International Conference on Man-Machine Interactions, ICMMI 2013, volume 242 of Advances in Intelligent Systems and Computing, pages 537-544. Springer, 2013. URL: http://dx.doi.org/10.1007/978-3-319-02309-0_59.
  44. Alexander Tiskin. Semi-local string comparison: algorithmic techniques and applications, 2007. URL: http://dx.doi.org/10.48550/ARXIV.0707.3619.
  45. Alexander Tiskin. Semi-local string comparison: Algorithmic techniques and applications. Mathematics in Computer Science, 1(4):571-603, 2008. URL: http://dx.doi.org/10.1007/s11786-007-0033-3.
  46. Dan E. Willard. Log-logarithmic worst-case range queries are possible in space Θ(N). Information Processing Letters, 17(2):81-84, 1983. URL: http://dx.doi.org/10.1016/0020-0190(83)90075-3.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail