Streaming Dictionary Matching with Mismatches

Authors Paweł Gawrychowski, Tatiana Starikovskaya



PDF
Thumbnail PDF

File

LIPIcs.CPM.2019.21.pdf
  • Filesize: 445 kB
  • 15 pages

Document Identifiers

Author Details

Paweł Gawrychowski
  • University of Wrocław, 50-137 Wrocław, Poland
Tatiana Starikovskaya
  • DIENS, École normale supérieure, PSL Research University, 75005 Paris, France

Acknowledgements

The authors would like to thank the anonymous reviewers for their helpful and constructive comments that greatly contributed to improving this paper.

Cite AsGet BibTex

Paweł Gawrychowski and Tatiana Starikovskaya. Streaming Dictionary Matching with Mismatches. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 21:1-21:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.CPM.2019.21

Abstract

In the k-mismatch problem we are given a pattern of length m and a text and must find all locations where the Hamming distance between the pattern and the text is at most k. A series of recent breakthroughs have resulted in an ultra-efficient streaming algorithm for this problem that requires only O(k log m/k) space [Clifford, Kociumaka, Porat, SODA 2019]. In this work, we consider a strictly harder problem called dictionary matching with k mismatches, where we are given a dictionary of d patterns of lengths at most m and must find all their k-mismatch occurrences in the text, and show the first streaming algorithm for it. The algorithm uses O(k d log^k d polylog m) space and processes each position of the text in O(k log^k d polylog m + occ) time, where occ is the number of k-mismatch occurrences of the patterns that end at this position. The algorithm is randomised and outputs correct answers with high probability.

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • Streaming
  • multiple pattern matching
  • Hamming distance

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Alfred V. Aho and Margaret J. Corasick. Efficient string matching: An aid to bibliographic search. Communincations of the ACM, 18(6):333-340, June 1975. URL: http://dx.doi.org/10.1145/360825.360855.
  2. Ricardo Baeza-Yates and Gonzalo Navarro. Multiple approximate string matching. In Proc. of the 5th Workshop on Algorithms and Data Structures, pages 174-184, 1997. URL: http://dx.doi.org/10.1007/3-540-63307-3_57.
  3. Djamal Belazzougui. Succinct Dictionary Matching with No Slowdown. In Proc. of the 21st Annual Symposium on Combinatorial Pattern Matching, pages 88-100, 2010. URL: http://dx.doi.org/10.1007/978-3-642-13509-5_9.
  4. Djamal Belazzougui. Worst-case efficient single and multiple string matching on packed texts in the word-RAM model. Journal of Discrete Algorithms, 14:91-106, 2012. URL: http://dx.doi.org/10.1007/978-3-642-19222-7_10.
  5. Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, and Sebastiano Vigna. Monotone Minimal Perfect Hashing: Searching a Sorted Table with O(1) Accesses. In Proc. of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 785-794, 2009. URL: http://dx.doi.org/10.1137/1.9781611973068.86.
  6. Djamal Belazzougui, Paolo Boldi, and Sebastiano Vigna. Dynamic Z-Fast Tries. In Proc. of the 17th International Symposium on String Processing and Information Retrieval, pages 159-172, 2010. URL: http://dx.doi.org/10.1007/978-3-642-16321-0_15.
  7. Djamal Belazzougui and Mathieu Raffinot. Average Optimal String Matching in Packed Strings. In Proc. of the 8th International Conference on Algorithms and Complexity, pages 37-48, 2013. URL: http://dx.doi.org/10.1007/978-3-642-38233-8_4.
  8. Dany Breslauer and Zvi Galil. Real-Time Streaming String-Matching. ACM Transactions on Algorithms, 10(4):22:1-22:12, August 2014. URL: http://dx.doi.org/10.1145/2635814.
  9. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana Starikovskaya. Dictionary Matching in a Stream. In Proc. of the 23rd Annual European Symposium on Algorithms, pages 361-372, 2015. URL: http://dx.doi.org/10.1007/978-3-662-48350-3_31.
  10. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana Starikovskaya. The k-mismatch problem revisited. In Proc. of the 27th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2039-2052, 2016. URL: http://dx.doi.org/10.1137/1.9781611974331.ch142.
  11. Raphaël Clifford, Tomasz Kociumaka, and Ely Porat. The streaming k-mismatch problem. In Proc. of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1106-1125, 2019. URL: http://dx.doi.org/10.1137/1.9781611975482.68.
  12. Richard Cole, Lee-Ad Gottlieb, and Moshe Lewenstein. Dictionary matching and indexing with errors and don't cares. In Proc. of the 36th Annual ACM Symposium on Theory of Computing, pages 91-100, 2004. URL: http://dx.doi.org/10.1145/1007352.1007374.
  13. Beate Commentz-Walter. A string matching algorithm fast on the average. In Proc. of the 6th International Colloquium on Automata, Languages and Programming, pages 118-132, 1979. URL: http://dx.doi.org/10.1007/3-540-09510-1_10.
  14. Maxime Crochemore, Artur Czumaj, Leszek Gasieniec, Thierry Lecroq, Wojciech Plandowski, and Wojciech Rytter. Fast practical multi-pattern matching. Information Processing Letters, 71(3):107-113, 1999. URL: http://dx.doi.org/10.1016/S0020-0190(99)00092-7.
  15. Martin Dietzfelbinger and Friedhelm Meyer auf der Heide. Dynamic Hashing in Real Time, pages 95-119. Vieweg+Teubner Verlag, Wiesbaden, 1992. URL: http://dx.doi.org/10.1007/978-3-322-95233-2_7.
  16. Johannes Fischer, Travis Gagie, Paweł Gawrychowski, and Tomasz Kociumaka. Approximating LZ77 via Small-Space Multiple-Pattern Matching. In Proc. of the 23rd European Symposium on Algorithms, pages 533-544, 2015. URL: http://dx.doi.org/10.1007/978-3-662-48350-3_45.
  17. Shay Golan, Tsvi Kopelowitz, and Ely Porat. Towards Optimal Approximate Streaming Pattern Matching by Matching Multiple Patterns in Multiple Streams. In Proc. of the 45th International Colloquium on Automata, Languages, and Programming, pages 65:1-65:16, 2018. URL: http://dx.doi.org/10.4230/LIPIcs.ICALP.2018.65.
  18. Shay Golan and Ely Porat. Real-Time Streaming Multi-Pattern Search for Constant Alphabet. In Proc. of the 25th Annual European Symposium on Algorithms, volume 87, pages 41:1-41:15, 2017. URL: http://dx.doi.org/10.4230/LIPIcs.ESA.2017.41.
  19. Wing-Kai Hon, Tsung-Han Ku, Rahul Shah, Sharma V. Thankachan, and Jeffrey Scott Vitter. Faster Compressed Dictionary Matching. In Proc. of the 17th International Symposium on String Processing and Information Retrieval, pages 191-200, 2010. URL: http://dx.doi.org/10.1007/978-3-642-16321-0_19.
  20. Richard M. Karp and Michael O. Rabin. Efficient randomized pattern-matching algorithms. IBM J. Res. Dev., 31(2):249-260, March 1987. URL: http://dx.doi.org/10.1147/rd.312.0249.
  21. Tsvi Kopelowitz, Ely Porat, and Yaron Rozen. Succinct Online Dictionary Matching with Improved Worst-Case Guarantees. In Proc. of the 27th Annual Symposium on Combinatorial Pattern Matching, volume 54, pages 6:1-6:13, 2016. URL: http://dx.doi.org/10.4230/LIPIcs.CPM.2016.6.
  22. Dmitry Kosolobov and Nikita Sivukhin. Compressed Multiple Pattern Matching. CoRR, abs/1811.01248, 2018. URL: http://arxiv.org/abs/1811.01248.
  23. Ilan Kremer, Noam Nisan, and Dana Ron. On Randomized One-round Communication Complexity. In Proc. of the 27th Annual ACM Symposium on Theory of Computing, pages 596-605, 1995. URL: http://dx.doi.org/10.1007/s000370050018.
  24. Robert Muth and Udi Manber. Approximate multiple string search. In Proc. of the 7th Annual Symposium on Combinatorial Pattern Matching, pages 75-86, 1996. URL: http://dx.doi.org/10.1007/3-540-61258-0_7.
  25. Gonzalo Navarro. Multiple Approximate String Matching by Counting. In Proc. of the 4th South American Workshop on String Processing, pages 95-111, 1997. Google Scholar
  26. Benny Porat and Ely Porat. Exact And Approximate Pattern Matching In The Streaming Model. In Proc. of the 50th Annual Symposium on Foundations of Computer Science, pages 315-323, 2009. URL: http://dx.doi.org/10.1109/FOCS.2009.11.
  27. Sun Wu and Udi Manber. Agrep - A Fast Approximate Pattern-Matching Tool. In Proc. of the USENIX Technical Conference, pages 153-162, 1992. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail