Searching Long Repeats in Streams

Authors Oleg Merkurev, Arseny M. Shur



PDF
Thumbnail PDF

File

LIPIcs.CPM.2019.31.pdf
  • Filesize: 1.01 MB
  • 14 pages

Document Identifiers

Author Details

Oleg Merkurev
  • Ural Federal University, Ekaterinburg, Russia
Arseny M. Shur
  • Ural Federal University, Ekaterinburg, Russia

Acknowledgements

The authors are grateful to D. Kosolobov for very useful comments.

Cite AsGet BibTex

Oleg Merkurev and Arseny M. Shur. Searching Long Repeats in Streams. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 31:1-31:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.CPM.2019.31

Abstract

We consider two well-known related problems: Longest Repeated Substring (LRS) and Longest Repeated Reversed Substring (LRRS). Their streaming versions cannot be solved exactly; we show that only approximate solutions by Monte Carlo algorithms are possible, and prove a lower bound on consumed memory. For both problems, we present purely linear-time Monte Carlo algorithms working in O(E + n/E) space, where E is the additive approximation error. Within the same space bounds, we then present nearly real-time solutions, which require O(log n) time per symbol and O(n + n/E log n) time overall. The working space exactly matches the lower bound whenever E=O(n^{0.5}) and the size of the alphabet is Omega(n^{0.01}).

Subject Classification

ACM Subject Classification
  • Theory of computation → Streaming, sublinear and near linear time algorithms
Keywords
  • Longest repeated substring
  • longest repeated reversed substring
  • streaming algorithm
  • Karp
  • Rabin fingerprint
  • suffix tree

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amihood Amir, Tsvi Kopelowitz, Moshe Lewenstein, and Noa Lewenstein. Towards real-time suffix tree construction. In International Symposium on String Processing and Information Retrieval, pages 67-78. Springer, 2005. Google Scholar
  2. Alberto Apostolico, Costas Iliopoulos, Gad M. Landau, Baruch Schieber, and Uzi Vishkin. Parallel construction of a suffix tree with applications. Algorithmica, 3(1-4):347-365, 1988. Google Scholar
  3. Dany Breslauer and Zvi Galil. Real-time streaming string-matching. In Combinatorial Pattern Matching, volume 6661 of LNCS, pages 162-172, Berlin, 2011. Springer. Google Scholar
  4. Dany Breslauer and Giuseppe F. Italiano. Near real-time suffix tree construction via the fringe marked ancestor problem. Journal of Discrete Algorithms, 18:32-48, 2013. Google Scholar
  5. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana Starikovskaya. Dictionary matching in a stream. In Algorithms-ESA 2015, pages 361-372. Springer, 2015. Google Scholar
  6. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana Starikovskaya. The k-mismatch problem revisited. In Proceedings of the twenty-seventh annual ACM-SIAM symposium on Discrete algorithms, pages 2039-2052. Society for Industrial and Applied Mathematics, 2016. Google Scholar
  7. Raphaël Clifford, Tomasz Kociumaka, and Ely Porat. The streaming k-mismatch problem. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1106-1125. SIAM, 2019. Google Scholar
  8. Raphaël Clifford and Tatiana Starikovskaya. Approximate Hamming distance in a stream. In 43rd International Colloquium on Automata, Languages, and Programming, ICALP 2016, volume 55 of LIPIcs, pages 20:1-20:14, 2016. Google Scholar
  9. Maxime Crochemore, Costas S Iliopoulos, Tomasz Kociumaka, Marcin Kubica, Alessio Langiu, Solon P Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Order-preserving indexing. Theoretical Computer Science, 638:122-135, 2016. Google Scholar
  10. Martin Dietzfelbinger and Friedhelm Meyer auf der Heide. Dynamic hashing in real time. In Informatik, pages 95-119. Springer, 1992. Google Scholar
  11. Paweł Gawrychowski, Oleg Merkurev, Arseny M. Shur, and Przemysław Uznański. Tight Tradeoffs for Real-Time Approximation of Longest Palindromes in Streams. In 27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016, volume 54 of LIPIcs, pages 18:1-18:13, 2016. Google Scholar
  12. Shay Golan, Tsvi Kopelowitz, and Ely Porat. Towards optimal approximate streaming pattern matching by matching multiple patterns in multiple streams. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018. Google Scholar
  13. Shay Golan and Ely Porat. Real-Time Streaming Multi-Pattern Search for Constant Alphabet. In 25th Annual European Symposium on Algorithms, ESA 2017, volume 87 of LIPIcs, pages 41:1-41:15. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2017. Google Scholar
  14. Elena Grigorescu, Erfan Sadeqi Azer, and Samson Zhou. Streaming for aibohphobes: Longest palindrome with mismatches. In 37th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2017, volume 93 of LIPIcs, pages 31:1-31:13, 2017. Google Scholar
  15. Richard M Karp and Michael O Rabin. Efficient randomized pattern-matching algorithms. IBM journal of research and development, 31(2):249-260, 1987. Google Scholar
  16. Tsvi Kopelowitz. On-line indexing for general alphabets via predecessor queries on subsets of an ordered list. In Foundations of Computer Science (FOCS), 2012 IEEE 53rd Annual Symposium on, pages 283-292. IEEE, 2012. Google Scholar
  17. Tsvi Kopelowitz and Moshe Lewenstein. Dynamic weighted ancestors. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pages 565-574. Society for Industrial and Applied Mathematics, 2007. Google Scholar
  18. Benny Porat and Ely Porat. Exact and approximate pattern matching in the streaming model. In Foundations of Computer Science, 2009. FOCS'09. 50th Annual IEEE Symposium on, pages 315-323. IEEE, 2009. Google Scholar
  19. Jakub Radoszewski and Tatiana A. Starikovskaya. Streaming K-Mismatch with Error Correcting and Applications. In 2017 Data Compression Conference, DCC 2017, pages 290-299. IEEE, 2017. Google Scholar
  20. E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249-260, 1995. Google Scholar
  21. Peter Weiner. Linear pattern matching algorithms. In Switching and Automata Theory, 1973. SWAT'08. IEEE Conference Record of 14th Annual Symposium on, pages 1-11. IEEE, 1973. Google Scholar
  22. Dan E. Willard. Log-logarithmic worst-case range queries are possible in space Θ(N). Information Processing Letters, 17:81-84, 1983. Google Scholar
  23. Andrew Yao. Probabilistic computations: Toward a unified measure of complexity. In Proceedings of the 18th IEEE Symposium on Foundations of Computer Science (FOCS), pages 222-227, 1977. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail