The Streaming k-Mismatch Problem: Tradeoffs Between Space and Total Time

Authors Shay Golan , Tomasz Kociumaka , Tsvi Kopelowitz , Ely Porat



PDF
Thumbnail PDF

File

LIPIcs.CPM.2020.15.pdf
  • Filesize: 0.53 MB
  • 15 pages

Document Identifiers

Author Details

Shay Golan
  • Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
Tomasz Kociumaka
  • Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
Tsvi Kopelowitz
  • Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
Ely Porat
  • Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel

Cite As Get BibTex

Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, and Ely Porat. The Streaming k-Mismatch Problem: Tradeoffs Between Space and Total Time. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 15:1-15:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020) https://doi.org/10.4230/LIPIcs.CPM.2020.15

Abstract

We revisit the k-mismatch problem in the streaming model on a pattern of length m and a streaming text of length n, both over a size-σ alphabet. The current state-of-the-art algorithm for the streaming k-mismatch problem, by Clifford et al. [SODA 2019], uses Õ(k) space and Õ(√k) worst-case time per character. The space complexity is known to be (unconditionally) optimal, and the worst-case time per character matches a conditional lower bound. However, there is a gap between the total time cost of the algorithm, which is Õ(n√k), and the fastest known offline algorithm, which costs Õ(n + min(nk/√m, σn)) time. Moreover, it is not known whether improvements over the Õ(n√k) total time are possible when using more than O(k) space.
We address these gaps by designing a randomized streaming algorithm for the k-mismatch problem that, given an integer parameter k≤s≤m, uses Õ(s) space and costs Õ(n+min(nk²/m, nk/√s, σnm/s)) total time. For s=m, the total runtime becomes Õ(n + min(nk/√m, σn)), which matches the time cost of the fastest offline algorithm. Moreover, the worst-case time cost per character is still Õ(√k).

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • Streaming pattern matching
  • Hamming distance
  • k-mismatch

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Karl R. Abrahamson. Generalized string matching. SIAM Journal on Computing, 16(6):1039-1051, 1987. URL: https://doi.org/10.1137/0216067.
  2. Amihood Amir, Moshe Lewenstein, and Ely Porat. Faster algorithms for string matching with k mismatches. Journal of Algorithms, 50(2):257-275, 2004. URL: https://doi.org/10.1016/S0196-6774(03)00097-X.
  3. Dany Breslauer and Zvi Galil. Real-time streaming string-matching. ACM Transactions on Algorithms, 10(4):22:1-22:12, 2014. URL: https://doi.org/10.1145/2635814.
  4. Timothy M. Chan, Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, and Ely Porat. Approximating text-to-pattern Hamming distances. In 52nd Annual ACM Symposium on Theory of Computing, STOC 2020, 2020. URL: http://arxiv.org/abs/2001.00211.
  5. Raphaël Clifford, Klim Efremenko, Benny Porat, and Ely Porat. A black box for online approximate pattern matching. Information and Computation, 209(4):731-736, 2011. URL: https://doi.org/10.1016/j.ic.2010.12.007.
  6. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana Starikovskaya. Dictionary matching in a stream. In Nikhil Bansal and Irene Finocchi, editors, 23rd Annual European Symposium on Algorithms, ESA 2015, volume 9294 of LNCS, pages 361-372. Springer, 2015. URL: https://doi.org/10.1007/978-3-662-48350-3_31.
  7. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana Starikovskaya. The k-mismatch problem revisited. In Robert Krauthgamer, editor, 27th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, pages 2039-2052. SIAM, 2016. URL: https://doi.org/10.1137/1.9781611974331.ch142.
  8. Raphaël Clifford, Tomasz Kociumaka, and Ely Porat. The streaming k-mismatch problem. In Timothy M. Chan, editor, 30th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, pages 1106-1125. SIAM, 2019. URL: https://doi.org/10.1137/1.9781611975482.68.
  9. Raphaël Clifford and Benjamin Sach. Pseudo-realtime pattern matching: Closing the gap. In Amihood Amir and Laxmi Parida, editors, 21st Annual Symposium on Combinatorial Pattern Matching, CPM 2010, volume 6129 of LNCS, pages 101-111. Springer, 2010. URL: https://doi.org/10.1007/978-3-642-13509-5_10.
  10. Richard Cole and Ramesh Hariharan. Approximate string matching: A simpler faster algorithm. SIAM Journal on Computing, 31(6):1761-1782, 2002. URL: https://doi.org/10.1137/S0097539700370527.
  11. Michael J. Fischer and Michael S. Paterson. String matching and other products. In Richard M. Karp, editor, Complexity of Computation, volume 7 of SIAM-AMS Proceedings, pages 113-125, Providence, RI, 1974. AMS. Google Scholar
  12. Zvi Galil and Raffaele Giancarlo. Parallel string matching with k mismatches. Theoretical Computer Science, 51:341-348, 1987. URL: https://doi.org/10.1016/0304-3975(87)90042-9.
  13. Paweł Gawrychowski and Tatiana Starikovskaya. Streaming dictionary matching with mismatches. In Nadia Pisanti and Solon P. Pissis, editors, 30th Annual Symposium on Combinatorial Pattern Matching, CPM, volume 128 of LIPIcs, pages 21:1-21:15. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2019. URL: https://doi.org/10.4230/LIPIcs.CPM.2019.21.
  14. Paweł Gawrychowski and Przemysław Uznański. Personal communication, November 2018. Google Scholar
  15. Paweł Gawrychowski and Przemysław Uznański. Towards unified approximate pattern matching for Hamming and L₁ distance. In Ioannis Chatzigiannakis, Christos Kaklamanis, Dániel Marx, and Donald Sannella, editors, 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, volume 107 of LIPIcs, pages 62:1-62:13. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2018. URL: https://doi.org/10.4230/LIPIcs.ICALP.2018.62.
  16. Shay Golan, Tsvi Kopelowitz, and Ely Porat. Towards optimal approximate streaming pattern matching by matching multiple patterns in multiple streams. In Ioannis Chatzigiannakis, Christos Kaklamanis, Dániel Marx, and Donald Sannella, editors, 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, volume 107 of LIPIcs, pages 65:1-65:16. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2018. URL: https://doi.org/10.4230/LIPIcs.ICALP.2018.65.
  17. Shay Golan and Ely Porat. Real-time streaming multi-pattern search for constant alphabet. In Kirk Pruhs and Christian Sohler, editors, 25th Annual European Symposium on Algorithms, ESA 2017, volume 87 of LIPIcs, pages 41:1-41:15. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2017. URL: https://doi.org/10.4230/LIPIcs.ESA.2017.41.
  18. Gad M. Landau and Uzi Vishkin. Efficient string matching with k mismatches. Theoretical Computer Science, 43:239-249, 1986. URL: https://doi.org/10.1016/0304-3975(86)90178-7.
  19. Gad M. Landau and Uzi Vishkin. Fast parallel and serial approximate string matching. Journal of Algorithms, 10(2):157-169, 1989. URL: https://doi.org/10.1016/0196-6774(89)90010-2.
  20. Benny Porat and Ely Porat. Exact and approximate pattern matching in the streaming model. In 50th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2009, pages 315-323. IEEE Computer Society, 2009. URL: https://doi.org/10.1109/FOCS.2009.11.
  21. Jakub Radoszewski and Tatiana Starikovskaya. Streaming k-mismatch with error correcting and applications. In Ali Bilgin, Michael W. Marcellin, Joan Serra-Sagristà, and James A. Storer, editors, 2017 Data Compression Conference, DCC 2017, pages 290-299. IEEE, 2017. URL: https://doi.org/10.1109/DCC.2017.14.
  22. Süleyman Cenk Sahinalp and Uzi Vishkin. Efficient approximate and dynamic matching of patterns using a labeling paradigm (extended abstract). In 37th Annual Symposium on Foundations of Computer Science, FOCS, pages 320-328. IEEE Computer Society, 1996. URL: https://doi.org/10.1109/SFCS.1996.548491.
  23. Tatiana Starikovskaya, Michal Svagerka, and Przemysław Uznański. L_p pattern matching in a stream, 2019. URL: http://arxiv.org/abs/1907.04405.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail