Small-Space LCE Data Structure with Constant-Time Queries

Tanimura, Yuka; Nishimoto, Takaaki; Bannai, Hideo; Inenaga, Shunsuke; Takeda, Masayuki

doi:10.4230/LIPIcs.MFCS.2017.10

Abstract

The longest common extension (LCE) problem is to preprocess a given string w of length n so that the length of the longest common prefix between suffixes of w that start at any two given positions is answered quickly. In this paper, we present a data structure of O(z \tau^2 + \frac{n}{\tau}) words of space which answers LCE queries in O(1) time and can be built in O(n \log \sigma) time, where 1 \leq \tau \leq \sqrt{n} is a parameter, z is the size of the Lempel-Ziv 77 factorization of w and \sigma is the alphabet size. The proposed LCE data structure does not access the input string w when answering queries, and thus w can be deleted after preprocessing. On top of this main result, we obtain further results using (variants of) our LCE data structure, which include the following:
- For highly repetitive strings where the z\tau^2 term is dominated by \frac{n}{\tau}, we obtain a constant-time and sub-linear space LCE query data structure.
- Even when the input string is not well compressible via Lempel-Ziv 77 factorization, we still can obtain a constant-time and sub-linear space LCE data structure for suitable \tau and for \sigma \leq 2^{o(\log n)}.
- The time-space trade-off lower bounds for the LCE problem by Bille et al. [J. Discrete Algorithms, 25:42-50, 2014] and by Kosolobov [CoRR, abs/1611.02891, 2016] do not apply in some cases with our LCE data structure.
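
To make the problem statement concrete, the following is a minimal sketch of what a single LCE query computes, written as a naive character-by-character scan. It is not the data structure proposed in the paper: it uses no preprocessing, reads the input string w on every query, and takes time proportional to the answer rather than O(1). The function name `naive_lce` and the 0-based positions are assumptions made for this illustration.

```python
def naive_lce(w: str, i: int, j: int) -> int:
    """Return the length of the longest common prefix of w[i:] and w[j:].

    Illustrative baseline only (hypothetical helper, 0-based positions):
    it scans w directly, unlike the paper's structure, which answers
    queries in O(1) time without accessing w.
    """
    n = len(w)
    length = 0
    # Extend the match while both positions stay inside w and the characters agree.
    while i + length < n and j + length < n and w[i + length] == w[j + length]:
        length += 1
    return length


w = "abracadabra"
print(naive_lce(w, 0, 7))  # suffixes "abracadabra" and "abra" share "abra": 4
print(naive_lce(w, 0, 3))  # suffixes "abracadabra" and "acadabra" share "a": 1
```

Any correct LCE data structure must return the same values as this scan for every pair of positions, so it can serve as a reference when testing a faster implementation.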

Hideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, and Kazuya Tsuruta. A new characterization of maximal repetitions by Lyndon trees. In Proc. SODA 2015, pages 562-571, 2015.
Hideo Bannai, Shunsuke Inenaga, and Dominik Köppl. Computing all distinct squares in linear time for integer alphabets. CoRR, abs/1610.03421, 2016.
Michael A. Bender and Martin Farach-Colton. The LCA problem revisited. In Proc. Latin 2000, pages 88-94, 2000.
Michael A. Bender and Martin Farach-Colton. The level ancestor problem simplified. Theor. Comput. Sci., 321(1):5-12, 2004.
Omer Berkman and Uzi Vishkin. Finding level-ancestors in trees. Journal of Computer and System Sciences, 48(2):214-230, 1994.
Philip Bille, Anders Roy Christiansen, Patrick Hagge Cording, and Inge Li Gørtz. Finger search in grammar-compressed strings. In Proc. FSTTCS 2016, pages 36:1-36:16, 2016.
Philip Bille, Inge Li Gørtz, Patrick Hagge Cording, Benjamin Sach, Hjalte Wedel Vildhøj, and Søren Vind. Fingerprints in compressed strings. J. Comput. Syst. Sci., 86:171-180, 2017.
Philip Bille, Inge Li Gørtz, Mathias Bæk Tejs Knudsen, Moshe Lewenstein, and Hjalte Wedel Vildhøj. Longest common extensions in sublinear space. In Proc. CPM 2015, pages 65-76, 2015.
Philip Bille, Inge Li Gørtz, Benjamin Sach, and Hjalte Wedel Vildhøj. Time-space trade-offs for longest common extensions. J. Discrete Algorithms, 25:42-50, 2014.
Gerth Stølting Brodal, Pooya Davoodi, and S. Srinivasa Rao. On space efficient two dimensional range minimum data structures. Algorithmica, 63(4):815-830, 2012.
Gerth Stølting Brodal, Rune B. Lyngsø, Christian N. S. Pedersen, and Jens Stoye. Finding maximal pairs with bounded gap. In Proc. CPM 1999, pages 134-149, 1999.
Stefan Burkhardt and Juha Kärkkäinen. Fast lightweight suffix array construction and checking. In Proc. CPM 2003, pages 55-69, 2003.
Bastien Cazaux, Thierry Lecroq, and Eric Rivals. Construction of a de Bruijn graph for assembly from a truncated suffix tree. In Proc. LATA 2015, pages 109-120, 2015.
Maxime Crochemore, Roman Kolpakov, and Gregory Kucherov. Optimal bounds for computing α-gapped repeats. In Proc. LATA 2016, pages 245-255, 2016.
Johannes Fischer, Travis Gagie, Pawel Gawrychowski, and Tomasz Kociumaka. Approximating LZ77 via small-space multiple-pattern matching. CoRR, abs/1504.06647, 2015.
Michael L. Fredman and Dan E. Willard. Surpassing the information theoretic bound with fusion trees. J. Comput. Syst. Sci., 47(3):424-436, 1993.
Zvi Galil and Raffaele Giancarlo. Improved string matching with k mismatches. ACM SIGACT News, 17:52-54, 1986.
Pawel Gawrychowski, Tomohiro I, Shunsuke Inenaga, Dominik Köppl, and Florin Manea. Efficiently finding all maximal α-gapped repeats. In Proc. STACS 2016, pages 39:1-39:14, 2016.
Pawel Gawrychowski, Tomasz Kociumaka, Wojciech Rytter, and Tomasz Walen. Faster longest common extension queries in strings over general alphabets. In Proc. CPM 2016, pages 5:1-5:13, 2016.
Sara Geizhals and Dina Sokol. Finding maximal 2-dimensional palindromes. In Proc. CPM 2016, pages 19:1-19:12, 2016.
Dan Gusfield. Algorithms on Strings, Trees, and Sequences. Cambridge University Press, 1997.
Dan Gusfield and Jens Stoye. Linear time algorithms for finding and representing all the tandem repeats in a string. J. Comput. Syst. Sci., 69(4):525-546, 2004.
Dov Harel and Robert Endre Tarjan. Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing, 13(2):338-355, 1984.
Tomohiro I. Longest common extensions with recompression. In Proc. CPM 2017, 2017. To appear.
Shunsuke Inenaga. A faster longest common extension algorithm on compressed strings and its applications. In Proc. PSC 2015, pages 1-4, 2015.
Juha Kärkkäinen. Repetition-based text indexes. Ph.D. thesis, University of Helsinki, Department of Computer Science, 1999.
Juha Kärkkäinen, Peter Sanders, and Stefan Burkhardt. Linear work suffix array construction. J. ACM, 53(6):918-936, 2006.
Toru Kasai, Gunho Lee, Hiroki Arimura, Setsuo Arikawa, and Kunsoo Park. Linear-time longest-common-prefix computation in suffix arrays and its applications. In Proc. CPM 2001, pages 181-192, 2001.
Roman Kolpakov and Gregory Kucherov. Searching for gapped palindromes. Theor. Comput. Sci., 410(51):5365-5373, 2009.
Roman M. Kolpakov and Gregory Kucherov. Finding maximal repetitions in a word in linear time. In Proc. FOCS 1999, pages 596-604, 1999.
Roman M. Kolpakov and Gregory Kucherov. Finding repeats with fixed gap. In Proc. SPIRE 2000, pages 162-168, 2000.
Dominik Köppl and Kunihiko Sadakane. Lempel-Ziv computation in compressed space (LZ-CICS). In Proc. DCC 2016, pages 3-12, 2016.
Dmitry Kosolobov. Tight lower bounds for the longest common extension problem. CoRR, abs/1611.02891, 2016.
Gad M. Landau, Eugene W. Myers, and Jeanette P. Schmidt. Incremental string comparison. SIAM J. Comput., 27(2):557-582, 1998.
Gad M. Landau and Uzi Vishkin. Efficient string matching with k mismatches. Theor. Comput. Sci., 43:239-249, 1986.
Mamoru Maekawa. A square root N algorithm for mutual exclusion in decentralized systems. ACM Trans. Comput. Syst., 3(2):145-159, 1985.
Udi Manber and Gene Myers. Suffix arrays: A new method for on-line string searches. SIAM J. Comput., 22(5):935-948, 1993.
Joong Chae Na, Alberto Apostolico, Costas S. Iliopoulos, and Kunsoo Park. Truncated suffix trees and their application to data compression. Theor. Comput. Sci., 304(1-3):87-101, 2003. URL: http://dx.doi.org/10.1016/S0304-3975(03)00053-7.
Shintaro Narisada, Diptarama, Kazuyuki Narisawa, Shunsuke Inenaga, and Ayumi Shinohara. Computing longest single-arm-gapped palindromes in a string. In Proc. SOFSEM 2017, pages 375-386, 2017.
Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Dynamic index and LZ factorization in compressed space. In Proc. PSC 2016, pages 158-170, 2016.
Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Fully dynamic data structure for LCE queries in compressed space. In Proc. MFCS 2016, pages 72:1-72:15, 2016.
Nicola Prezza. In-place longest common extensions. CoRR, abs/1608.05100, 2016.
Simon J. Puglisi and Andrew Turpin. Space-time tradeoffs for longest-common-prefix array computation. In Proc. ISAAC 2008, pages 124-135, 2008.
Wojciech Rytter. Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci., 302(1-3):211-222, 2003.
Yuka Tanimura, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, Simon J. Puglisi, and Masayuki Takeda. Deterministic sub-linear space LCE data structures with efficient construction. In Proc. CPM 2016, pages 1:1-1:10, 2016.
Luciana Vitale, Alvaro Martín, and Gadiel Seroussi. Space-efficient representation of truncated suffix trees, with applications to Markov order estimation. Theor. Comput. Sci., 595:34-45, 2015.
P. Weiner. Linear pattern-matching algorithms. In Proc. of 14th IEEE Ann. Symp. on Switching and Automata Theory, pages 1-11, 1973.
Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression. IEEE Trans. Information Theory, 23(3):337-343, 1977.
