Efficient Computation of 2-Covers of a String

Authors Jakub Radoszewski , Juliusz Straszyński



PDF
Thumbnail PDF

File

LIPIcs.ESA.2020.77.pdf
  • Filesize: 0.6 MB
  • 17 pages

Document Identifiers

Author Details

Jakub Radoszewski
  • Institute of Informatics, University of Warsaw, Poland
  • Samsung R&D Poland, Warsaw, Poland
Juliusz Straszyński
  • Institute of Informatics, University of Warsaw, Poland

Acknowledgements

The authors thank Patryk Czajka for helpful discussions on the initial version of the algorithm.

Cite As Get BibTex

Jakub Radoszewski and Juliusz Straszyński. Efficient Computation of 2-Covers of a String. In 28th Annual European Symposium on Algorithms (ESA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 173, pp. 77:1-77:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020) https://doi.org/10.4230/LIPIcs.ESA.2020.77

Abstract

Quasiperiodicity is a generalization of periodicity that has been researched for almost 30 years. The notion of cover is the classic variant of quasiperiodicity. A cover of a text T is a string whose occurrences in T cover all positions of T. There are several algorithms computing covers of a text in linear time. In this paper we consider a natural extension of cover. For a text T, we call a pair of strings a 2-cover if they have the same length and their occurrences cover the text T. We give an algorithm that computes all 2-covers of a string of length n in 𝒪(n log n log log n + output) expected time or 𝒪(n log n log² log n / log log log n + output) worst-case time, where output is the size of output.
If (X,Y) is a 2-cover of T, then either X is a prefix and Y is a suffix of T, in which case we call (X,Y) a ps-cover, or one of X, Y is a border (that is, both a prefix and a suffix) of T, and then we call (X,Y) a b-cover. A string of length n has up to n ps-covers; we show an algorithm that computes all of them in 𝒪(n log log n) expected time or 𝒪(n log² log n / log log log n) worst-case time. A string of length n can have Θ(n²) non-trivial b-covers; our algorithm can report one b-cover per length (if it exists) or all shortest b-covers in 𝒪(n log n log log n) expected time or 𝒪(n log n log² log n / log log log n) worst-case time. All our algorithms use linear space.
The problem in scope can be generalized to λ > 2 equal-length strings, resulting in the notion of λ-cover. Cole et al. (2005) showed that the λ-cover problem is NP-complete. Our algorithms generalize to λ-covers, with (the first component of) the algorithm’s complexity multiplied by n^{λ-2}.

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • quasiperiodicity
  • cover of a string
  • 2-cover
  • lambda-cover

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Ali Alatabbi, M. Sohel Rahman, and William F. Smyth. Computing covers using prefix tables. Discrete Applied Mathematics, 212:2-9, 2016. URL: http://dx.doi.org/10.1016/j.dam.2015.05.019.
  2. Amihood Amir, Costas S. Iliopoulos, and Jakub Radoszewski. Two strings at Hamming distance 1 cannot be both quasiperiodic. Information Processing Letters, 128:54-57, 2017. URL: http://dx.doi.org/10.1016/j.ipl.2017.08.005.
  3. Amihood Amir, Avivit Levy, Moshe Lewenstein, Ronit Lubin, and Benny Porat. Can we recover the cover? Algorithmica, 81(7):2857-2875, 2019. URL: http://dx.doi.org/10.1007/s00453-019-00559-8.
  4. Amihood Amir, Avivit Levy, Ronit Lubin, and Ely Porat. Approximate cover of strings. Theoretical Computer Science, 793:59-69, 2019. URL: http://dx.doi.org/10.1016/j.tcs.2019.05.020.
  5. Arne Andersson and Mikkel Thorup. Dynamic ordered sets with exponential search trees. Journal of the ACM, 54(3):13, 2007. URL: http://dx.doi.org/10.1145/1236457.1236460.
  6. Alberto Apostolico and Andrzej Ehrenfeucht. Efficient detection of quasiperiodicities in strings. Theoretical Computer Science, 119(2):247-265, 1993. URL: http://dx.doi.org/10.1016/0304-3975(93)90159-Q.
  7. Alberto Apostolico, Martin Farach, and Costas S. Iliopoulos. Optimal superprimitivity testing for strings. Information Processing Letters, 39(1):17-20, 1991. URL: http://dx.doi.org/10.1016/0020-0190(91)90056-N.
  8. Hideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, and Kazuya Tsuruta. The "runs" theorem. SIAM Journal on Computing, 46(5):1501-1514, 2017. URL: http://dx.doi.org/10.1137/15M1011032.
  9. Carl Barton, Tomasz Kociumaka, Chang Liu, Solon P. Pissis, and Jakub Radoszewski. Indexing weighted sequences: Neat and efficient. Information and Computation, 270:104462, 2020. URL: http://dx.doi.org/10.1016/j.ic.2019.104462.
  10. Amir M. Ben-Amram, Omer Berkman, Costas S. Iliopoulos, and Kunsoo Park. The subtree max gap problem with application to parallel string covering. In Daniel Dominic Sleator, editor, Proceedings of the Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 501-510. ACM/SIAM, 1994. URL: http://dl.acm.org/citation.cfm?id=314464.314633.
  11. Michael A. Bender and Martin Farach-Colton. The LCA problem revisited. In Gaston H. Gonnet, Daniel Panario, and Alfredo Viola, editors, LATIN 2000: Theoretical Informatics, 4th Latin American Symposium, volume 1776 of Lecture Notes in Computer Science, pages 88-94. Springer, 2000. URL: http://dx.doi.org/10.1007/10719839_9.
  12. Hans L. Bodlaender, Rodney G. Downey, Michael R. Fellows, Michael T. Hallett, and Harold T. Wareham. Parameterized complexity analysis in computational biology. Computer Applications in the Biosciences, 11(1):49-57, 1995. URL: http://dx.doi.org/10.1093/bioinformatics/11.1.49.
  13. Dany Breslauer. An on-line string superprimitivity test. Information Processing Letters, 44(6):345-347, 1992. URL: http://dx.doi.org/10.1016/0020-0190(92)90111-8.
  14. Dany Breslauer. Testing string superprimitivity in parallel. Information Processing Letters, 49(5):235-241, 1994. URL: http://dx.doi.org/10.1016/0020-0190(94)90060-4.
  15. Stefan Canzar, Tobias Marschall, Sven Rahmann, and Chris Schwiegelshohn. Solving the minimum string cover problem. In Proceedings of the 14th Meeting on Algorithm Engineering & Experiments, ALENEX 2012, pages 75-83. SIAM / Omnipress, 2012. URL: http://dx.doi.org/10.1137/1.9781611972924.8.
  16. Manolis Christodoulakis, Costas S. Iliopoulos, Kunsoo Park, and Jeong Seop Sim. Approximate seeds of strings. Journal of Automata, Languages, and Combinatorics, 10(5/6):609-626, 2005. URL: http://dx.doi.org/10.25596/jalc-2005-609.
  17. Richard Cole, Costas S. Iliopoulos, Manal Mohamed, William F. Smyth, and Lu Yang. The complexity of the minimum k-cover problem. Journal of Automata, Languages, and Combinatorics, 10(5/6):641-653, 2005. Google Scholar
  18. Maxime Crochemore. An optimal algorithm for computing the repetitions in a word. Information Processing Letters, 12(5):244-250, 1981. URL: http://dx.doi.org/10.1016/0020-0190(81)90024-7.
  19. Maxime Crochemore, Christophe Hancart, and Thierry Lecroq. Algorithms on Strings. Cambridge University Press, 2007. URL: http://dx.doi.org/10.1017/cbo9780511546853.
  20. Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Covering problems for partial words and for indeterminate strings. Theoretical Computer Science, 698:25-39, 2017. URL: http://dx.doi.org/10.1016/j.tcs.2017.05.026.
  21. Maxime Crochemore, Costas S. Iliopoulos, and Maureen Korda. Two-dimensional prefix string matching and covering on square matrices. Algorithmica, 20(4):353-373, 1998. URL: http://dx.doi.org/10.1007/PL00009200.
  22. Maxime Crochemore and Wojciech Rytter. Jewels of Stringology. World Scientific, 2003. URL: http://dx.doi.org/10.1142/4838.
  23. Patryk Czajka and Jakub Radoszewski. Experimental evaluation of algorithms for computing quasiperiods. CoRR, abs/1909.11336, 2019 (accepted to Theoretical Computer Science). URL: http://arxiv.org/abs/1909.11336.
  24. Nathan J. Fine and Herbert S. Wilf. Uniqueness theorems for periodic functions. Proceedings of the American Mathematical Society, 16(1):109-114, 1965. URL: http://dx.doi.org/10.2307/2034009.
  25. Tomás Flouri, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Simon J. Puglisi, William F. Smyth, and Wojciech Tyczyński. Enhanced string covering. Theoretical Computer Science, 506:102-114, 2013. URL: http://dx.doi.org/10.1016/j.tcs.2013.08.013.
  26. Paweł Gawrychowski, Jakub Radoszewski, and Tatiana A. Starikovskaya. Quasi-periodicity in streams. In Nadia Pisanti and Solon P. Pissis, editors, 30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019, volume 128 of LIPIcs, pages 22:1-22:14. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2019. URL: http://dx.doi.org/10.4230/LIPIcs.CPM.2019.22.
  27. Qing Guo, Hui Zhang, and Costas S. Iliopoulos. Computing the λ-covers of a string. Information Sciences, 177(19):3957-3967, 2007. URL: http://dx.doi.org/10.1016/j.ins.2007.02.020.
  28. Dan Gusfield. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology. Cambridge University Press, 1997. URL: http://dx.doi.org/10.1017/cbo9780511574931.
  29. Yijie Han. Deterministic sorting in O(n loglogn) time and linear space. Journal of Algorithms, 50(1):96-105, 2004. URL: http://dx.doi.org/10.1016/j.jalgor.2003.09.001.
  30. Danny Hermelin, Dror Rawitz, Romeo Rizzi, and Stéphane Vialette. The minimum substring cover problem. Information and Computation, 206(11):1303-1312, 2008. URL: http://dx.doi.org/10.1016/j.ic.2008.06.002.
  31. Costas S. Iliopoulos, Christos Makris, Yannis Panagis, Katerina Perdikuri, Evangelos Theodoridis, and Athanasios K. Tsakalidis. The weighted suffix tree: An efficient data structure for handling molecular weighted sequences and its applications. Fundamenta Informaticae, 71(2-3):259-277, 2006. URL: http://content.iospress.com/articles/fundamenta-informaticae/fi71-2-3-07.
  32. Costas S. Iliopoulos, Manal Mohamed, Laurent Mouchard, Katerina Perdikuri, William F. Smyth, and Athanasios K. Tsakalidis. String regularities with don't cares. Nordic Journal on Computing, 10(1):40-51, 2003. Google Scholar
  33. Costas S. Iliopoulos, Dennis W. G. Moore, and Kunsoo Park. Covering a string. Algorithmica, 16(3):288-297, 1996. URL: http://dx.doi.org/10.1007/BF01955677.
  34. Juha Kärkkäinen, Peter Sanders, and Stefan Burkhardt. Linear work suffix array construction. Journal of the ACM, 53(6):918-936, 2006. URL: http://dx.doi.org/10.1145/1217856.1217858.
  35. Tomasz Kociumaka. Efficient Data Structures for Internal Queries in Texts. PhD thesis, University of Warsaw, 2018. URL: https://mimuw.edu.pl/~kociumaka/files/phd.pdf.
  36. Tomasz Kociumaka, Marcin Kubica, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. A linear-time algorithm for seeds computation. ACM Transactions on Algorithms, 16(2):Article 27, 2020. URL: http://dx.doi.org/10.1145/3386369.
  37. Tomasz Kociumaka, Marcin Kubica, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. A linear time algorithm for seeds computation. In Yuval Rabani, editor, 23rd Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, pages 1095-1112. SIAM, 2012. URL: http://dx.doi.org/10.1137/1.9781611973099.
  38. Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Efficient algorithms for shortest partial seeds in words. Theoretical Computer Science, 710:139-147, 2018. URL: http://dx.doi.org/10.1016/j.tcs.2016.11.035.
  39. Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Fast algorithm for partial covers in words. Algorithmica, 73(1):217-233, 2015. URL: http://dx.doi.org/10.1007/s00453-014-9915-3.
  40. Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Internal pattern matching queries in a text and applications. In Piotr Indyk, editor, 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, pages 532-551. SIAM, 2015. URL: http://dx.doi.org/10.1137/1.9781611973730.36.
  41. Yin Li and William F. Smyth. Computing the cover array in linear time. Algorithmica, 32(1):95-106, 2002. URL: http://dx.doi.org/10.1007/s00453-001-0062-2.
  42. Michael G. Main and Richard J. Lorentz. An O(n log n) algorithm for finding all repetitions in a string. Journal of Algorithms, 5(3):422-432, 1984. URL: http://dx.doi.org/10.1016/0196-6774(84)90021-X.
  43. Dennis Moore and W. F. Smyth. Computing the covers of a string in linear time. In Proceedings of the Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’94, page 511–515, USA, 1994. Society for Industrial and Applied Mathematics. Google Scholar
  44. Dennis W. G. Moore and William F. Smyth. An optimal algorithm to compute all the covers of a string. Information Processing Letters, 50(5):239-246, 1994. URL: http://dx.doi.org/10.1016/0020-0190(94)00045-X.
  45. Dennis W. G. Moore and William F. Smyth. A correction to "An optimal algorithm to compute all the covers of a string". Information Processing Letters, 54(2):101-103, 1995. URL: http://dx.doi.org/10.1016/0020-0190(94)00235-Q.
  46. Jean Néraud. Elementariness of a finite set of words is co-NP-complete. RAIRO Theoretical Informatics and Applications, 24:459-470, 1990. URL: http://dx.doi.org/10.1051/ita/1990240504591.
  47. Alexandru Popa and Andrei Tanasescu. An output-sensitive algorithm for the minimization of 2-dimensional string covers. In T. V. Gopal and Junzo Watada, editors, Theory and Applications of Models of Computation - 15th Annual Conference, TAMC 2019, volume 11436 of Lecture Notes in Computer Science, pages 536-549. Springer, 2019. URL: http://dx.doi.org/10.1007/978-3-030-14812-6_33.
  48. Dan E. Willard. Log-logarithmic worst-case range queries are possible in space θ(n). Information Processing Letters, 17(2):81-84, 1983. URL: http://dx.doi.org/10.1016/0020-0190(83)90075-3.
  49. Hui Zhang, Qing Guo, and Costas S. Iliopoulos. Algorithms for computing the lambda-regularities in strings. Fundamenta Informaticae, 84(1):33-49, 2008. URL: http://content.iospress.com/articles/fundamenta-informaticae/fi84-1-04.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail