Fully Dynamic Data Structure for LCE Queries in Compressed Space

Nishimoto, Takaaki; I, Tomohiro; Inenaga, Shunsuke; Bannai, Hideo; Takeda, Masayuki

doi:10.4230/LIPIcs.MFCS.2016.72

File

Author Details

Takaaki Nishimoto

Tomohiro I

Shunsuke Inenaga

Hideo Bannai

Masayuki Takeda

Cite AsGet BibTex

Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Fully Dynamic Data Structure for LCE Queries in Compressed Space. In 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 58, pp. 72:1-72:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)
https://doi.org/10.4230/LIPIcs.MFCS.2016.72

Abstract

A Longest Common Extension (LCE) query on a text T of length N asks for the length of the longest common prefix of suffixes starting at given two positions. We show that the signature encoding G of size w = O(min(z log N log^* M, N)) [Mehlhorn et al., Algorithmica 17(2):183-198, 1997] of T, which can be seen as a compressed representation of T, has a capability to support LCE queries in O(log N + log ell log^* M) time, where ell is the answer to the query, z is the size of the Lempel-Ziv77 (LZ77) factorization of T, and M >= 4N is an integer that can be handled in constant time under word RAM model. In compressed space, this is the fastest deterministic LCE data structure in many cases. Moreover, G can be enhanced to support efficient update operations: After processing G in O(w f_A) time, we can insert/delete any (sub)string of length y into/from an arbitrary position of T in O((y + log Nlog^* M) f_A) time, where f_A = O(min{ (loglog M loglog w)/(logloglog M), sqrt(log w/loglog w)}). This yields the first fully dynamic LCE data structure working in compressed space. We also present efficient construction algorithms from various types of inputs: We can construct G in O(N f_A) time from uncompressed string T; in O(n loglog (n log^* M) log N log^* M) time from grammar-compressed string T represented by a straight-line program of size n; and in O(z f_A log N log^* M) time from LZ77-compressed string T with z factors. On top of the above contributions, we show several applications of our data structures which improve previous best known results on grammar-compressed string processing.

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

References

Stephen Alstrup, Gerth Stølting Brodal, and Theis Rauhe. Dynamic pattern matching. Technical report, Department of Computer Science, University of Copenhagen, 1998.
Stephen Alstrup, Gerth Stølting Brodal, and Theis Rauhe. Pattern matching in dynamic texts. In Proc. SODA 2000, pages 819-828, 2000.
Paul Beame and Faith E. Fich. Optimal bounds for the predecessor problem and related problems. J. Comput. Syst. Sci., 65(1):38-72, 2002. URL: http://dx.doi.org/10.1006/jcss.2002.1822.
M. A. Bender, M. Farach-Colton, G. Pemmasani, S. Skiena, and P. Sumazin. Lowest common ancestors in trees and directed acyclic graphs. J. Algorithms, 57(2):75-94, 2005.
P. Bille, P. H. Cording, I. L. Gørtz, B. Sach, H. W. Vildhøj, and Søren Vind. Fingerprints in compressed strings. In Proc. WADS 2013, pages 146-157, 2013.
Philip Bille, Anders Roy Christiansen, Patrick Hagge Cording, and Inge Li Gørtz. Finger search, random access, and longest common extensions in grammar-compressed strings. CoRR, abs/1507.02853, 2015. URL: http://arxiv.org/abs/1507.02853.
Philip Bille, Inge Li Gørtz, Mathias Bæk Tejs Knudsen, Moshe Lewenstein, and Hjalte Wedel Vildhøj. Longest common extensions in sublinear space. In Ferdinando Cicalese, Ely Porat, and Ugo Vaccaro, editors, Combinatorial Pattern Matching - 26th Annual Symposium, CPM 2015, Ischia Island, Italy, June 29 - July 1, 2015, Proceedings, volume 9133 of Lecture Notes in Computer Science, pages 65-76. Springer, 2015. URL: http://dx.doi.org/10.1007/978-3-319-19929-0_6.
Francisco Claude and Gonzalo Navarro. Self-indexed grammar-based compression. Fundamenta Informaticae, 111(3):313-337, 2011.
Johannes Fischer, Tomohiro I, and Dominik Köppl. Deterministic sparse suffix sorting on rewritable texts. In LATIN 2016: Theoretical Informatics - 12th Latin American Symposium, Ensenada, Mexico, April 11-15, 2016, Proceedings, pages 483-496, 2016.
Pawel Gawrychowski, Adam Karczmarz, Tomasz Kociumaka, Jakub Lacki, and Piotr Sankowski. Optimal dynamic strings. CoRR, abs/1511.02612, 2015. URL: http://arxiv.org/abs/1511.02612.
Dan Gusfield. Algorithms on Strings, Trees, and Sequences. Cambridge University Press, 1997.
Yijie Han. Deterministic sorting in O (n log log n) time and linear space. Proc. STOC 2002, pages 602-608, 2002.
Tomohiro I, Wataru Matsubara, Kouji Shimohira, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Kazuyuki Narisawa, and Ayumi Shinohara. Detecting regularities on grammar-compressed strings. Inf. Comput., 240:74-89, 2015.
Tomohiro I, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster Lyndon factorization algorithms for SLP and LZ78 compressed text. Theoretical Computer Science, 2016. in press. URL: http://dx.doi.org/10.1016/j.tcs.2016.03.005.
Tomohiro I, Takaaki Nishimoto, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Compressed automata for dictionary matching. Theor. Comput. Sci., 578:30-41, 2015. URL: http://dx.doi.org/10.1016/j.tcs.2015.01.019.
Yury Lifshits. Processing compressed texts: A tractability border. In Proc. CPM 2007, volume 4580 of LNCS, pages 228-240, 2007.
S. Maruyama, M. Nakahara, N. Kishiue, and H. Sakamoto. ESP-index: A compressed index based on edit-sensitive parsing. J. Discrete Algorithms, 18:100-112, 2013.
W. Matsubara, S. Inenaga, A. Ishino, A. Shinohara, T. Nakamura, and K. Hashimoto. Efficient algorithms to compute compressed longest common substrings and compressed palindromes. Theor. Comput. Sci., 410(8-10):900-913, 2009.
Kurt Mehlhorn, R. Sundar, and Christian Uhrig. Maintaining dynamic sequences under equality tests in polylogarithmic time. Algorithmica, 17(2):183-198, 1997.
M. Miyazaki, A. Shinohara, and M. Takeda. An improved pattern matching algorithm for strings in terms of straight-line programs. In Proc. CPM 1997, pages 1-11, 1997.
Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Fully dynamic data structure for LCE queries in compressed space. CoRR, abs/1605.01488, 2016. URL: http://arxiv.org/abs/1605.01488.
Wojciech Rytter. Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theoretical Computer Science, 302(1-3):211-222, 2003.
S. C Sahinalp and Uzi Vishkin. Data compression using locally consistent parsing. TechnicM report, University of Maryland Department of Computer Science, 1995.
Yuka Tanimura, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, Simon J. Puglisi, and Masayuki Takeda. Deterministic sub-linear space LCE data structures with efficient construction. In Proc. CPM 2016, 2016. to appear.
J. Ziv and A. Lempel. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, IT-23(3):337-349, 1977.

Fully Dynamic Data Structure for LCE Queries in Compressed Space

Authors Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

File

Document Identifiers

Author Details

Cite AsGet BibTex

Abstract

Keywords

Metrics

References

Thanks for your feedback!

Could not send message