Streaming k-Edit Approximate Pattern Matching via String Decomposition

Authors: Sudatta Bhattacharya, Michal Koucký



File

LIPIcs.ICALP.2023.22.pdf
  • Filesize: 0.72 MB
  • 14 pages

Author Details

Sudatta Bhattacharya
  • Computer Science Institute of Charles University, Prague, Czech Republic
Michal Koucký
  • Computer Science Institute of Charles University, Prague, Czech Republic

Acknowledgements

We thank Tomasz Kociumaka for pointing us to references for Corollary 3. We thank the anonymous reviewers for their helpful comments.

Cite As

Sudatta Bhattacharya and Michal Koucký. Streaming k-Edit Approximate Pattern Matching via String Decomposition. In 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 261, pp. 22:1-22:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.ICALP.2023.22

Abstract

In this paper we give an algorithm for streaming k-edit approximate pattern matching that uses Õ(k²) space and Õ(k²) time per arriving symbol. This improves substantially on the recent algorithm of Kociumaka, Porat and Starikovskaya [Kociumaka et al., 2022], which uses Õ(k⁵) space and Õ(k⁸) time per arriving symbol. In the k-edit approximate pattern matching problem, we are given a pattern P and a text T, and we want to identify all substrings of T that are at edit distance at most k from P. In the streaming version of this problem, both the pattern and the text arrive symbol by symbol, and after each symbol of the text we need to report whether some suffix of the text read so far is at edit distance at most k from P. We measure the total space needed by the algorithm and the time needed per arriving symbol.
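
To make the reporting requirement concrete, the following is a minimal sketch of a classical dynamic-programming baseline, not the algorithm of this paper. It stores the whole pattern and one column of the edit-distance table, so it uses O(|P|) space and O(|P|) time per arriving symbol rather than Õ(k²), and it assumes the pattern is fully available before the text starts. The function name `stream_k_edit_matches` is hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator


def stream_k_edit_matches(pattern: str, text: Iterable[str], k: int) -> Iterator[bool]:
    """After each arriving text symbol, yield True iff some suffix of the
    text read so far is at edit distance at most k from `pattern`.

    Baseline only: keeps the full pattern and one DP column in memory.
    """
    m = len(pattern)
    # col[i] = minimum edit distance between pattern[:i] and any suffix
    # (possibly empty) of the text read so far. With no text read yet,
    # the only suffix is the empty string, at distance i from pattern[:i].
    col = list(range(m + 1))
    for symbol in text:
        prev_diag = col[0]  # previous column's value one row up
        col[0] = 0          # the empty suffix matches the empty pattern prefix
        for i in range(1, m + 1):
            prev_same_row = col[i]  # previous column's value in this row
            col[i] = min(
                prev_diag + (pattern[i - 1] != symbol),  # match or substitution
                prev_same_row + 1,                       # text symbol left unmatched
                col[i - 1] + 1,                          # pattern symbol left unmatched
            )
            prev_diag = prev_same_row
        yield col[m] <= k


if __name__ == "__main__":
    # One boolean per arriving text symbol.
    print(list(stream_k_edit_matches("abc", "xxabdxx", 1)))
```

Because the first row of the table is reset to 0 at every step, a match may begin at any text position, which is what turns the standard edit-distance recurrence into the "some current suffix" test described in the abstract.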

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
  • Theory of computation → Sketching and sampling
Keywords
  • Approximate pattern matching
  • edit distance
  • streaming algorithms

References

  1. Amihood Amir, Moshe Lewenstein, and Ely Porat. Faster algorithms for string matching with k mismatches. Journal of Algorithms, 50(2):257-275, 2004.
  2. Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In Proceedings of the forty-seventh annual ACM symposium on Theory of computing STOC, pages 51-58, 2015.
  3. Sudatta Bhattacharya and Michal Koucký. Locally consistent decomposition of strings with applications to edit distance sketching. In Proceedings of the 55th Annual ACM SIGACT Symposium on Theory of Computing, STOC (to appear), 2023. URL: https://arxiv.org/abs/2302.04475.
  4. Robert S Boyer and J Strother Moore. A fast string searching algorithm. Communications of the ACM, 20(10):762-772, 1977.
  5. Dany Breslauer and Zvi Galil. Real-time streaming string-matching. ACM Transactions on Algorithms (TALG), 10(4):1-12, 2014.
  6. Timothy M Chan, Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, and Ely Porat. Approximating text-to-pattern Hamming distances. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pages 643-656, 2020.
  7. Panagiotis Charalampopoulos, Tomasz Kociumaka, and Philip Wellnitz. Faster approximate pattern matching: A unified approach. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 978-989. IEEE, 2020.
  8. Panagiotis Charalampopoulos, Tomasz Kociumaka, and Philip Wellnitz. Faster pattern matching under edit distance: a reduction to dynamic puzzle matching and the seaweed monoid of permutation matrices. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 698-707. IEEE, 2022.
  9. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana Starikovskaya. Dictionary matching in a stream. In Algorithms-ESA 2015: 23rd Annual European Symposium, Patras, Greece, September 14-16, 2015, Proceedings, pages 361-372. Springer, 2015.
  10. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana Starikovskaya. The k-mismatch problem revisited. In Proceedings of the twenty-seventh annual ACM-SIAM symposium on Discrete algorithms SODA, pages 2039-2052. SIAM, 2016.
  11. Raphaël Clifford, Tomasz Kociumaka, and Ely Porat. The streaming k-mismatch problem. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, pages 1106-1125. SIAM, 2019. URL: https://doi.org/10.1137/1.9781611975482.68.
  12. Richard Cole and Ramesh Hariharan. Approximate string matching: A simpler faster algorithm. SIAM Journal on Computing, 31(6):1761-1782, 2002.
  13. Zvi Galil and Raffaele Giancarlo. Improved string matching with k mismatches. ACM SIGACT News, 17(4):52-54, 1986.
  14. Arun Ganesh, Tomasz Kociumaka, Andrea Lincoln, and Barna Saha. How compression and approximation affect efficiency in string distance measures. In Proceedings of the 2022 ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 2867-2919, 2022. URL: https://doi.org/10.1137/1.9781611977073.112.
  15. Paweł Gawrychowski and Tatiana Starikovskaya. Streaming dictionary matching with mismatches. Algorithmica, pages 1-21, 2019.
  16. Pawel Gawrychowski and Przemyslaw Uznanski. Towards unified approximate pattern matching for Hamming and l_1 distance. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2018.
  17. Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, and Ely Porat. The Streaming k-Mismatch Problem: Tradeoffs Between Space and Total Time. In Inge Li Gørtz and Oren Weimann, editors, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020), volume 161 of Leibniz International Proceedings in Informatics (LIPIcs), pages 15:1-15:15, Dagstuhl, Germany, 2020. Schloss Dagstuhl-Leibniz-Zentrum für Informatik. URL: https://doi.org/10.4230/LIPIcs.CPM.2020.15.
  18. Shay Golan, Tsvi Kopelowitz, and Ely Porat. Towards optimal approximate streaming pattern matching by matching multiple patterns in multiple streams. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2018.
  19. Shay Golan and Ely Porat. Real-time streaming multi-pattern search for constant alphabet. In 25th Annual European Symposium on Algorithms (ESA 2017). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2017.
  20. Richard M. Karp and Michael O. Rabin. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2):249-260, 1987. URL: https://doi.org/10.1147/rd.312.0249.
  21. Donald E. Knuth, James H. Morris Jr., and Vaughan R. Pratt. Fast pattern matching in strings. SIAM J. Comput., 6(2):323-350, 1977.
  22. Tomasz Kociumaka, Ely Porat, and Tatiana Starikovskaya. Small-space and streaming pattern matching with k edits. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 885-896. IEEE, 2022.
  23. Gad M Landau and Uzi Vishkin. Efficient string matching with k mismatches. Theoretical Computer Science, 43:239-249, 1986.
  24. Gad M Landau and Uzi Vishkin. Fast parallel and serial approximate string matching. Journal of Algorithms, 10(2):157-169, 1989.
  25. Benny Porat and Ely Porat. Exact and approximate pattern matching in the streaming model. In 2009 50th Annual IEEE Symposium on Foundations of Computer Science, pages 315-323. IEEE, 2009.
  26. Jakub Radoszewski and Tatiana Starikovskaya. Streaming k-mismatch with error correcting and applications. Information and Computation, 271:104513, 2020.
  27. Tatiana Starikovskaya. Communication and streaming complexity of approximate pattern matching. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2017.