Conversion from RLBWT to LZ77

Authors Takaaki Nishimoto, Yasuo Tabei



PDF
Thumbnail PDF

File

LIPIcs.CPM.2019.9.pdf
  • Filesize: 0.57 MB
  • 12 pages

Document Identifiers

Author Details

Takaaki Nishimoto
  • RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
Yasuo Tabei
  • RIKEN Center for Advanced Intelligence Project, Tokyo, Japan

Cite AsGet BibTex

Takaaki Nishimoto and Yasuo Tabei. Conversion from RLBWT to LZ77. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 9:1-9:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.CPM.2019.9

Abstract

Converting a compressed format of a string into another compressed format without an explicit decompression is one of the central research topics in string processing. We discuss the problem of converting the run-length Burrows-Wheeler Transform (RLBWT) of a string into Lempel-Ziv 77 (LZ77) phrases of the reversed string. The first results with Policriti and Prezza’s conversion algorithm [Algorithmica 2018] were O(n log r) time and O(r) working space for length of the string n, number of runs r in the RLBWT, and number of LZ77 phrases z. Recent results with Kempa’s conversion algorithm [SODA 2019] are O(n / log n + r log^{9} n + z log^{9} n) time and O(n / log_{sigma} n + r log^{8} n) working space for the alphabet size sigma of the RLBWT. In this paper, we present a new conversion algorithm by improving Policriti and Prezza’s conversion algorithm where dynamic data structures for general purpose are used. We argue that these dynamic data structures can be replaced and present new data structures for faster conversion. The time and working space of our conversion algorithm with new data structures are O(n min{log log n, sqrt{(log r)/(log log r)}}) and O(r), respectively.

Subject Classification

ACM Subject Classification
  • Theory of computation → Data compression
Keywords
  • Burrows-Wheeler Transform
  • Lempel-Ziv Parsing
  • Lossless Data Compression

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Hideo Bannai, Pawel Gawrychowski, Shunsuke Inenaga, and Masayuki Takeda. Converting SLP to LZ78 in almost Linear Time. In Proceedings of CPM, pages 38-49, 2013. Google Scholar
  2. Hideo Bannai, Shunsuke Inenaga, and Masayuki Takeda. Efficient LZ78 Factorization of Grammar Compressed Text. In Proceedings of SPIRE, pages 86-98, 2012. Google Scholar
  3. Paul Beame and Faith E. Fich. Optimal Bounds for the Predecessor Problem and Related Problems. J. Comput. Syst. Sci., 65(1):38-72, 2002. Google Scholar
  4. Philip Bille, Anders Roy Christiansen, Patrick Hagge Cording, Inge Li Gørtz, Frederik Rye Skjoldjensen, Hjalte Wedel Vildhøj, and Søren Vind. Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation. Algorithmica, 80(11):3207-3224, 2018. Google Scholar
  5. Michael Burrows and David J Wheeler. A block-sorting lossless data compression algorithm. Technical report, 1994. Google Scholar
  6. Johannes Fischer and Pawel Gawrychowski. Alphabet-Dependent String Searching with Wexponential Search Trees. In Proceedings of CPM, pages 160-171, 2015. Google Scholar
  7. Johannes Fischer and Volker Heun. Space-Efficient Preprocessing Schemes for Range Minimum Queries on Static Arrays. SIAM J. Comput., 40(2):465-492, 2011. Google Scholar
  8. Travis Gagie, Gonzalo Navarro, and Nicola Prezza. Optimal-Time Text Indexing in BWT-runs Bounded Space. In Proceedings of SODA, pages 1459-1477, 2018. Google Scholar
  9. Wing-Kai Hon, Kunihiko Sadakane, and Wing-Kin Sung. Succinct data structures for Searchable Partial Sums with optimal worst-case performance. Theor. Comput. Sci., 412(39):5176-5186, 2011. Google Scholar
  10. Artur Jez. A really simple approximation of smallest grammar. Theor. Comput. Sci., 616:141-150, 2016. Google Scholar
  11. Juha Kärkkäinen and Peter Sanders. Simple Linear Work Suffix Array Construction. In Proceedings of ICALP, pages 943-955, 2003. Google Scholar
  12. Dominik Kempa. Optimal Construction of Compressed Indexes for Highly Repetitive Texts. In Proceedings of SODA, pages 1344-1357, 2019. Google Scholar
  13. Abraham Lempel and Jacob Ziv. On the complexity of finite sequences. IEEE Transactions on information theory, 22(1):75-81, 1976. Google Scholar
  14. Udi Manber and Eugene W. Myers. Suffix Arrays: A New Method for On-Line String Searches. SIAM J. Comput., 22(5):935-948, 1993. Google Scholar
  15. Alberto Policriti and Nicola Prezza. LZ77 computation based on the run-length encoded BWT. Algorithmica, 80(7):1986-2011, 2018. Google Scholar
  16. Wojciech Rytter. Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci., 302(1-3):211-222, 2003. Google Scholar
  17. Kensuke Sakai, Tatsuya Ohno, Keisuke Goto, Yoshimasa Takabatake, Tomohiro I, and Hiroshi Sakamoto. RePair in Compressed Space and Time. CoRR, abs/1811.01472, 2018. Google Scholar
  18. Dan E. Willard. Log-Logarithmic Worst-Case Range Queries are Possible in Space Theta(N). Inf. Process. Lett., 17(2):81-84, 1983. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail