Repetition Detection in a Dynamic String

Authors Amihood Amir, Itai Boneh, Panagiotis Charalampopoulos , Eitan Kondratovsky



PDF
Thumbnail PDF

File

LIPIcs.ESA.2019.5.pdf
  • Filesize: 0.64 MB
  • 18 pages

Document Identifiers

Author Details

Amihood Amir
  • Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
Itai Boneh
  • Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
Panagiotis Charalampopoulos
  • Department of Informatics, King’s College London, London, UK, Efi Arazi School of Computer Science, The Interdisciplinary Center Herzliya, Herzliya, Israel
Eitan Kondratovsky
  • Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel

Acknowledgements

We warmly thank Tomasz Kociumaka for useful discussions.

Cite AsGet BibTex

Amihood Amir, Itai Boneh, Panagiotis Charalampopoulos, and Eitan Kondratovsky. Repetition Detection in a Dynamic String. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 5:1-5:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.ESA.2019.5

Abstract

A string UU for a non-empty string U is called a square. Squares have been well-studied both from a combinatorial and an algorithmic perspective. In this paper, we are the first to consider the problem of maintaining a representation of the squares in a dynamic string S of length at most n. We present an algorithm that updates this representation in n^o(1) time. This representation allows us to report a longest square-substring of S in O(1) time and all square-substrings of S in O(output) time. We achieve this by introducing a novel tool - maintaining prefix-suffix matches of two dynamic strings. We extend the above result to address the problem of maintaining a representation of all runs (maximal repetitions) of the string. Runs are known to capture the periodic structure of a string, and, as an application, we show that our representation of runs allows us to efficiently answer periodicity queries for substrings of a dynamic string. These queries have proven useful in static pattern matching problems and our techniques have the potential of offering solutions to these problems in a dynamic text setting.

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • string algorithms
  • dynamic algorithms
  • squares
  • repetitions
  • runs

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Paniz Abedin, Sahar Hooshmand, Arnab Ganguly, and Sharma V. Thankachan. The Heaviest Induced Ancestors Problem Revisited. In Annual Symposium on Combinatorial Pattern Matching, CPM 2018, July 2-4, 2018 - Qingdao, China, pages 20:1-20:13, 2018. URL: https://doi.org/10.4230/LIPIcs.CPM.2018.20.
  2. Stephen Alstrup, Gerth Stølting Brodal, and Theis Rauhe. New Data Structures for Orthogonal Range Searching. In 41st Annual Symposium on Foundations of Computer Science, FOCS 2000, 12-14 November 2000, Redondo Beach, California, USA, pages 198-207. IEEE Computer Society, 2000. URL: https://doi.org/10.1109/SFCS.2000.892088.
  3. Amihood Amir and Itai Boneh. Locally Maximal Common Factors as a Tool for Efficient Dynamic String Algorithms. In Annual Symposium on Combinatorial Pattern Matching, CPM 2018, July 2-4, 2018 - Qingdao, China, pages 11:1-11:13, 2018. URL: https://doi.org/10.4230/LIPIcs.CPM.2018.11.
  4. Amihood Amir and Itai Boneh. Dynamic Palindrome Detection. CoRR, abs/1906.09732, 2019. URL: http://arxiv.org/abs/1906.09732.
  5. Amihood Amir, Panagiotis Charalampopoulos, Costas S. Iliopoulos, Solon P. Pissis, and Jakub Radoszewski. Longest Common Factor After One Edit Operation. In Gabriele Fici, Marinella Sciortino, and Rossano Venturini, editors, String Processing and Information Retrieval: 24th International Symposium, SPIRE 2017, Proceedings, volume 10508 of Lecture Notes in Computer Science, pages 14-26, Cham, 2017. Springer International Publishing. URL: https://doi.org/10.1007/978-3-319-67428-5_2.
  6. Amihood Amir, Panagiotis Charalampopoulos, Solon P. Pissis, and Jakub Radoszewski. Longest Common Substring Made Fully Dynamic. In Ola Svensson Michael A. Bender and Grzegorz Herman, editors, 27th Annual European Symposium on Algorithms, ESA 2019, Munich/Garching, Germany, September 9-11, 2019, LIPIcs. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2019. Google Scholar
  7. Amihood Amir and Martin Farach. Adaptive Dictionary Matching. In 32nd Annual Symposium on Foundations of Computer Science, San Juan, Puerto Rico, 1-4 October 1991, pages 760-766. IEEE Computer Society, 1991. URL: https://doi.org/10.1109/SFCS.1991.185445.
  8. Amihood Amir, Martin Farach, Ramana M. Idury, Johannes A. La Poutré, and Alejandro A. Schäffer. Improved Dynamic Dictionary Matching. Inf. Comput., 119(2):258-282, 1995. URL: https://doi.org/10.1006/inco.1995.1090.
  9. Amihood Amir, Gad M. Landau, Moshe Lewenstein, and Dina Sokol. Dynamic text and static pattern matching. ACM Trans. Algorithms, 3(2):19, 2007. URL: https://doi.org/10.1145/1240233.1240242.
  10. Alberto Apostolico and Franco P. Preparata. Optimal Off-Line Detection of Repetitions in a String. Theor. Comput. Sci., 22:297-315, 1983. URL: https://doi.org/10.1016/0304-3975(83)90109-3.
  11. Hideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, and Kazuya Tsuruta. The "Runs" Theorem. SIAM J. Comput., 46(5):1501-1514, 2017. URL: https://doi.org/10.1137/15M1011032.
  12. Gary Benson. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research, 27(2):573-580, January 1999. URL: https://doi.org/10.1093/nar/27.2.573.
  13. Philip Bille, Anders Roy Christiansen, Patrick Hagge Cording, Inge Li Gørtz, Frederik Rye Skjoldjensen, Hjalte Wedel Vildhøj, and Søren Vind. Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation. Algorithmica, 80(11):3207-3224, 2018. URL: https://doi.org/10.1007/s00453-017-0380-7.
  14. Timothy M. Chan and Konstantinos Tsakalidis. Dynamic Orthogonal Range Searching on the RAM, Revisited. In Boris Aronov and Matthew J. Katz, editors, 33rd International Symposium on Computational Geometry, SoCG 2017, July 4-7, 2017, Brisbane, Australia, volume 77 of LIPIcs, pages 28:1-28:13. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2017. URL: https://doi.org/10.4230/LIPIcs.SoCG.2017.28.
  15. Maxime Crochemore. An Optimal Algorithm for Computing the Repetitions in a Word. Inf. Process. Lett., 12(5):244-250, 1981. URL: https://doi.org/10.1016/0020-0190(81)90024-7.
  16. Maxime Crochemore, Christophe Hancart, and Thierry Lecroq. Algorithms on strings. Cambridge University Press, 2007. Google Scholar
  17. Maxime Crochemore, Costas S. Iliopoulos, Marcin Kubica, Jakub Radoszewski, Wojciech Rytter, and Tomasz Walen. Extracting powers and periods in a word from its runs structure. Theor. Comput. Sci., 521:29-41, 2014. URL: https://doi.org/10.1016/j.tcs.2013.11.018.
  18. Camil Demetrescu, David Eppstein, Zvi Galil, and Giuseppe F. Italiano. Dynamic Graph Algorithms. In Mikhail J. Atallah and Marina Blanton, editors, Algorithms and Theory of Computation Handbook: General Concepts and Techniques, chapter 9. Chapman & Hall/CRC, 2010. URL: http://dl.acm.org/citation.cfm?id=1882757.1882766.
  19. Nevzat Onur Domaniç and Franco P. Preparata. A Novel Approach to the Detection of Genomic Approximate Tandem Repeats in the Levenshtein Metric. Journal of Computational Biology, 14(7):873-891, 2007. URL: https://doi.org/10.1089/cmb.2007.0018.
  20. Paolo Ferragina. Dynamic Text Indexing under String Updates. J. Algorithms, 22(2):296-328, 1997. URL: https://doi.org/10.1006/jagm.1996.0814.
  21. Paolo Ferragina and Fabrizio Luccio. Dynamic Dictionary Matching in External Memory. Inf. Comput., 146(2):85-99, 1998. URL: https://doi.org/10.1006/inco.1998.2733.
  22. N. J. Fine and H. S. Wilf. Uniqueness Theorems for Periodic Functions. Proceedings of the American Mathematical Society, 16(1):109-114, 1965. URL: http://www.jstor.org/stable/2034009.
  23. Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Longest substring palindrome after edit. In Annual Symposium on Combinatorial Pattern Matching, CPM 2018, pages 12:1-12:14, 2018. URL: https://doi.org/10.4230/LIPIcs.CPM.2018.12.
  24. Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster Queries for Longest Substring Palindrome After Block Edit. In Nadia Pisanti and Solon P. Pissis, editors, 30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019, June 18-20, 2019, Pisa, Italy, volume 128 of LIPIcs, pages 27:1-27:13. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2019. URL: https://doi.org/10.4230/LIPIcs.CPM.2019.27.
  25. Pawel Gawrychowski, Adam Karczmarz, Tomasz Kociumaka, Jakub Lacki, and Piotr Sankowski. Optimal Dynamic Strings. In Artur Czumaj, editor, Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, New Orleans, LA, USA, January 7-10, 2018, pages 1509-1528. SIAM, 2018. URL: https://doi.org/10.1137/1.9781611975031.99.
  26. Ming Gu, Martin Farach, and Richard Beigel. An Efficient Algorithm for Dynamic Text Indexing. In Daniel Dominic Sleator, editor, Proceedings of the Fifth Annual ACM-SIAM Symposium on Discrete Algorithms. 23-25 January 1994, Arlington, Virginia., pages 697-704. ACM/SIAM, 1994. URL: http://dl.acm.org/citation.cfm?id=314464.314675.
  27. Dan Gusfield and Jens Stoye. Linear time algorithms for finding and representing all the tandem repeats in a string. J. Comput. Syst. Sci., 69(4):525-546, 2004. Google Scholar
  28. Dan Gusfield and Jens Stoye. Linear time algorithms for finding and representing all the tandem repeats in a string. J. Comput. Syst. Sci., 69(4):525-546, 2004. Google Scholar
  29. Ramana M. Idury and Alejandro A. Schäffer. Dynamic Dictionary Matching with Failure Functions. Theor. Comput. Sci., 131(2):295-310, 1994. URL: https://doi.org/10.1016/0304-3975(94)90176-7.
  30. Richard M. Karp, Raymond E. Miller, and Arnold L. Rosenberg. Rapid Identification of Repeated Patterns in Strings, Trees and Arrays. In Proceedings of the 4th Annual ACM Symposium on Theory of Computing, May 1-3, 1972, Denver, Colorado, USA, pages 125-136, 1972. URL: https://doi.org/10.1145/800152.804905.
  31. Richard M. Karp and Michael O. Rabin. Efficient Randomized Pattern-Matching Algorithms. IBM Journal of Research and Development, 31(2):249-260, 1987. URL: https://doi.org/10.1147/rd.312.0249.
  32. Tomasz Kociumaka. Efficient Data Structures for Internal Queries in Texts. PhD thesis, University of Warsaw, 2018. URL: https://mimuw.edu.pl/~kociumaka/files/phd.pdf.
  33. Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Efficient Data Structures for the Factor Periodicity Problem. In Liliana Calderón-Benavides, Cristina N. González-Caro, Edgar Chávez, and Nivio Ziviani, editors, String Processing and Information Retrieval - 19th International Symposium, SPIRE 2012, Cartagena de Indias, Colombia, October 21-25, 2012. Proceedings, volume 7608 of Lecture Notes in Computer Science, pages 284-294. Springer, 2012. URL: https://doi.org/10.1007/978-3-642-34109-0_30.
  34. Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Internal Pattern Matching Queries in a Text and Applications. In Piotr Indyk, editor, Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, pages 532-551. SIAM, 2015. URL: https://doi.org/10.1137/1.9781611973730.36.
  35. Roman M. Kolpakov and Gregory Kucherov. Finding Maximal Repetitions in a Word in Linear Time. In 40th Annual Symposium on Foundations of Computer Science, FOCS '99, 17-18 October, 1999, New York, NY, USA, pages 596-604, 1999. URL: https://doi.org/10.1109/SFFCS.1999.814634.
  36. Gad M. Landau, Jeanette P. Schmidt, and Dina Sokol. An Algorithm for Approximate Tandem Repeats. Journal of Computational Biology, 8(1):1-18, 2001. URL: https://doi.org/10.1089/106652701300099038.
  37. Michael G. Main and Richard J. Lorentz. An O(nlog n) Algorithm for Finding All Repetitions in a String. J. Algorithms, 5(3):422-432, 1984. URL: https://doi.org/10.1016/0196-6774(84)90021-X.
  38. Kurt Mehlhorn, R. Sundar, and Christian Uhrig. Maintaining Dynamic Sequences under Equality Tests in Polylogarithmic Time. Algorithmica, 17(2):183-198, 1997. URL: https://doi.org/10.1007/BF02522825.
  39. Süleyman Cenk Sahinalp and Uzi Vishkin. Efficient Approximate and Dynamic Matching of Patterns Using a Labeling Paradigm (extended abstract). In 37th Annual Symposium on Foundations of Computer Science, FOCS '96, Burlington, Vermont, USA, 14-16 October, 1996, pages 320-328. IEEE Computer Society, 1996. URL: https://doi.org/10.1109/SFCS.1996.548491.
  40. Dina Sokol, Gary Benson, and Justin Tojeira. Tandem repeats over the edit distance. Bioinformatics, 23(2):30-35, 2007. URL: https://doi.org/10.1093/bioinformatics/btl309.
  41. Ydo Wexler, Zohar Yakhini, Yechezkel Kashi, and Dan Geiger. Finding Approximate Tandem Repeats in Genomic Sequences. Journal of Computational Biology, 12(7):928-942, 2005. URL: https://doi.org/10.1089/cmb.2005.12.928.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail