On Extensions of Maximal Repeats in Compressed Strings

Author: Julian Pape-Lange




File

LIPIcs.CPM.2020.27.pdf
  • Filesize: 449 kB
  • 13 pages

Author Details

Julian Pape-Lange
  • Technische Universität Chemnitz, Straße der Nationen 62, 09111 Chemnitz, Germany

Acknowledgements

Fabio Cunial suggested that my previous work might be extendable from counting maximal repeats to counting extensions of maximal repeats. He also pointed out that such a result would be more interesting since it is more closely linked to the size of the compacted directed acyclic word graph. Nicola Prezza noted that my previous work also resulted in a non-trivial upper bound for the number of runs in the run-length Burrows-Wheeler transform and that a more careful investigation of the extensions of maximal repeats might result in a better bound for the Burrows-Wheeler conjecture, which was unsolved at that time. I also thank Djamal Belazzougui for notifying me of the "Resolution of the Burrows-Wheeler Transform Conjecture" by Kempa and Kociumaka.

Cite As

Julian Pape-Lange. On Extensions of Maximal Repeats in Compressed Strings. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 27:1-27:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.CPM.2020.27

Abstract

This paper provides upper bounds on the sizes of several subsets of maximal repeats and maximal pairs in compressed strings and also presents a previously unknown relationship between maximal pairs and the run-length Burrows-Wheeler transform. This relationship is used to obtain a different proof of the Burrows-Wheeler conjecture, which has recently been proven by Kempa and Kociumaka in "Resolution of the Burrows-Wheeler Transform Conjecture". More formally, this paper proves that the run-length Burrows-Wheeler transform of a string S with z_S LZ77-factors has at most 73(log₂ |S|)(z_S+2)² runs, and that, if S does not contain q-th powers, the number of arcs in the compacted directed acyclic word graph of S is bounded from above by 18q(1+log_q |S|)(z_S+2)².
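To make the two bounds concrete, the following minimal sketch evaluates them for a small string and compares the first one with a naively computed run count. It rests on several assumptions that are not taken from the paper: the Burrows-Wheeler transform is built from the string with an appended unique smallest sentinel, the LZ77 factorization is the greedy variant that allows self-overlapping sources, and |S| is taken without the sentinel. The paper's exact conventions may differ. All function names are hypothetical and the algorithms are quadratic or worse, so this is only suitable for short inputs.

```python
import math


def bwt_runs(s: str) -> int:
    """Count maximal equal-letter runs in the BWT of s plus a sentinel.

    Assumption: a unique sentinel smaller than every other character is
    appended, and rotations are sorted naively.
    """
    t = s + "\0"
    n = len(t)
    order = sorted(range(n), key=lambda i: t[i:] + t[:i])
    # The last column of the rotation starting at i is the character t[i - 1].
    last = [t[i - 1] for i in order]
    return 1 + sum(1 for a, b in zip(last, last[1:]) if a != b)


def lz77_factors(s: str) -> int:
    """Count factors of a greedy LZ77 parse (assumed variant).

    Each factor is the longest prefix of the remaining suffix that also
    occurs starting at an earlier position (overlap allowed), or a single
    fresh character if no such prefix exists.
    """
    i, z, n = 0, 0, len(s)
    while i < n:
        length = 0
        # An occurrence inside s[0 : i + length] of a string of length
        # length + 1 necessarily starts before position i.
        while i + length < n and s.find(s[i:i + length + 1], 0, i + length) != -1:
            length += 1
        i += max(length, 1)
        z += 1
    return z


def rlbwt_run_bound(n: int, z: int) -> float:
    """Evaluate 73 * log2(n) * (z + 2)^2 from the abstract."""
    return 73 * math.log2(n) * (z + 2) ** 2


def cdawg_arc_bound(n: int, z: int, q: int) -> float:
    """Evaluate 18 * q * (1 + log_q(n)) * (z + 2)^2 from the abstract."""
    return 18 * q * (1 + math.log(n, q)) * (z + 2) ** 2


if __name__ == "__main__":
    text = "banana"
    z = lz77_factors(text)
    print(bwt_runs(text), z, rlbwt_run_bound(len(text), z))
```

On inputs this short the bound is far from tight; the sketch only illustrates which quantities the statement relates.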

Subject Classification

ACM Subject Classification
  • Mathematics of computing → Combinatorics on words
  • Mathematics of computing → Combinatoric problems
Keywords
  • Maximal repeats
  • Extensions of maximal repeats
  • Combinatorics on compressed strings
  • LZ77
  • Burrows-Wheeler transform
  • Burrows-Wheeler transform conjecture
  • Compact suffix automata
  • CDAWGs

References

  1. Djamal Belazzougui, Fabio Cunial, Travis Gagie, Nicola Prezza, and Mathieu Raffinot. Composite repetition-aware data structures. In Ferdinando Cicalese, Ely Porat, and Ugo Vaccaro, editors, Combinatorial Pattern Matching - 26th Annual Symposium, CPM 2015, Ischia Island, Italy, June 29 - July 1, 2015, Proceedings, volume 9133 of Lecture Notes in Computer Science, pages 26-39. Springer, 2015. URL: https://doi.org/10.1007/978-3-319-19929-0_3.
  2. Anselm Blumer, J. Blumer, David Haussler, Ross M. McConnell, and Andrzej Ehrenfeucht. Complete inverted files for efficient text retrieval and analysis. J. ACM, 34(3):578-595, 1987. URL: https://doi.org/10.1145/28869.28873.
  3. M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. Technical report, DEC Systems Research Center, 1994.
  4. Manolis Christodoulakis, Costas S. Iliopoulos, and Yoan José Pinzón Ardila. Simple algorithm for sorting the Fibonacci string rotations. In Jiří Wiedermann, Gerard Tel, Jaroslav Pokorný, Mária Bieliková, and Július Štuller, editors, SOFSEM 2006: Theory and Practice of Computer Science, pages 218-225, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
  5. M. Crochemore and W. Rytter. Squares, cubes, and time-space efficient string searching. Algorithmica, 13(5):405-425, May 1995. URL: https://doi.org/10.1007/BF01190846.
  6. N. J. Fine and H. S. Wilf. Uniqueness theorems for periodic functions. Proceedings of the American Mathematical Society, 16(1):109-114, 1965. URL: http://www.jstor.org/stable/2034009.
  7. I. Furuya, T. Takagi, Y. Nakashima, S. Inenaga, H. Bannai, and T. Kida. MR-RePair: Grammar compression based on maximal repeats. In 2019 Data Compression Conference (DCC), pages 508-517, March 2019. URL: https://doi.org/10.1109/DCC.2019.00059.
  8. Dominik Kempa and Tomasz Kociumaka. Resolution of the Burrows-Wheeler transform conjecture. CoRR, abs/1910.10631, 2019. URL: http://arxiv.org/abs/1910.10631.
  9. Julian Pape-Lange. On Maximal Repeats in Compressed Strings. In Nadia Pisanti and Solon P. Pissis, editors, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019), volume 128 of Leibniz International Proceedings in Informatics (LIPIcs), pages 18:1-18:13, Dagstuhl, Germany, 2019. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. URL: https://doi.org/10.4230/LIPIcs.CPM.2019.18.
  10. Mathieu Raffinot. On maximal repeats in strings. Inf. Process. Lett., 80(3):165-169, 2001. URL: https://doi.org/10.1016/S0020-0190(01)00152-1.