Small-Space LCE Data Structure with Constant-Time Queries

Tanimura, Yuka; Nishimoto, Takaaki; Bannai, Hideo; Inenaga, Shunsuke; Takeda, Masayuki

doi:10.4230/LIPIcs.MFCS.2017.10

Abstract

The longest common extension (LCE) problem is to preprocess a given string w of length n so that the length of the longest common prefix between suffixes of w that start at any two given positions is answered quickly. In this paper, we present a data structure of O(z \tau^2 + \frac{n}{\tau}) words of space which answers LCE queries in O(1) time and can be built in O(n \log \sigma) time, where 1 \leq \tau \leq \sqrt{n} is a parameter, z is the size of the Lempel-Ziv 77 factorization of w and \sigma is the alphabet size. The proposed LCE data structure not access the input string w when answering queries, and thus w can be deleted after preprocessing. On top of this main result, we obtain further results using (variants of) our LCE data structure, which include the following:
- For highly repetitive strings where the z\tau^2 term is dominated by \frac{n}{\tau}, we obtain a constant-time and sub-linear space LCE query data structure.
- Even when the input string is not well compressible via Lempel-Ziv 77 factorization, we still can obtain a constant-time and sub-linear space LCE data structure for suitable \tau and for \sigma \leq 2^{o(\log n)}.
- The time-space trade-off lower bounds for the LCE problem by Bille et al. [J. Discrete Algorithms, 25:42-50, 2014] and by Kosolobov [CoRR, abs/1611.02891, 2016] do not apply in some cases with our LCE data structure.

Hideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, and Kazuya Tsuruta. A new characterization of maximal repetitions by Lyndon trees. In Proc. SODA 2015, pages 562-571, 2015.
Hideo Bannai, Shunsuke Inenaga, and Dominik Köppl. Computing all distinct squares in linear time for integer alphabets. CoRR, abs/1610.03421, 2016.
Michael A. Bender and Martin Farach-Colton. The LCA problem revisited. In Proc. Latin 2000, pages 88-94, 2000.
Michael A. Bender and Martin Farach-Colton. The level ancestor problem simplified. Theor. Comput. Sci., 321(1):5-12, 2004.
Omer Berkman and Uzi Vishkin. Finding level-ancestors in trees. Journal of Computer and System Sciences, 48(2):214-230, 1994.
Philip Bille, Anders Roy Christiansen, Patrick Hagge Cording, and Inge Li Gørtz. Finger search in grammar-compressed strings. In Proc. FSTTCS 2016, pages 36:1-36:16, 2016.
Philip Bille, Inge Li Gørtz, Patrick Hagge Cording, Benjamin Sach, Hjalte Wedel Vildhøj, and Søren Vind. Fingerprints in compressed strings. J. Comput. Syst. Sci., 86:171-180, 2017.
Philip Bille, Inge Li Gørtz, Mathias Bæk Tejs Knudsen, Moshe Lewenstein, and Hjalte Wedel Vildhøj. Longest common extensions in sublinear space. In Proc. CPM 2015, pages 65-76, 2015.
Philip Bille, Inge Li Gørtz, Benjamin Sach, and Hjalte Wedel Vildhøj. Time-space trade-offs for longest common extensions. J. Discrete Algorithms, 25:42-50, 2014.
Gerth Stølting Brodal, Pooya Davoodi, and S. Srinivasa Rao. On space efficient two dimensional range minimum data structures. Algorithmica, 63(4):815-830, 2012.
Gerth Stølting Brodal, Rune B. Lyngsø, Christian N. S. Pedersen, and Jens Stoye. Finding maximal pairs with bounded gap. In Proc. CPM 1999, pages 134-149, 1999.
Stefan Burkhardt and Juha Kärkkäinen. Fast lightweight suffix array construction and checking. In Proc. CPM 2003, pages 55-69, 2003.
Bastien Cazaux, Thierry Lecroq, and Eric Rivals. Construction of a de Bruijn graph for assembly from a truncated suffix tree. In LATA 2015, pages 109-120, 2015.
Maxime Crochemore, Roman Kolpakov, and Gregory Kucherov. Optimal bounds for computing α-gapped repeats. In Proc. LATA 2016, pages 245-255, 2016.
Johannes Fischer, Travis Gagie, Pawel Gawrychowski, and Tomasz Kociumaka. Approximating LZ77 via small-space multiple-pattern matching. CoRR, abs/1504.06647, 2015.
Michael L. Fredman and Dan E. Willard. Surpassing the information theoretic bound with fusion trees. J. Comput. Syst. Sci., 47(3):424-436, 1993.
Zvi Galil and Raffaele Giancarlo. Improved string matching with k mismatches. ACM SIGACT News, 17:52-54, 1986.
Pawel Gawrychowski, Tomohiro I, Shunsuke Inenaga, Dominik Köppl, and Florin Manea. Efficiently finding all maximal α-gapped repeats. In Proc. STACS 2016, pages 39:1-39:14, 2016.
Pawel Gawrychowski, Tomasz Kociumaka, Wojciech Rytter, and Tomasz Walen. Faster longest common extension queries in strings over general alphabets. In Proc. CPM 2016, pages 5:1-5:13, 2016.
Sara Geizhals and Dina Sokol. Finding maximal 2-dimensional palindromes. In Proc. CPM 2016, pages 19:1-19:12, 2016.
Dan Gusfield. Algorithms on Strings, Trees, and Sequences. Cambridge University Press, 1997.
Dan Gusfield and Jens Stoye. Linear time algorithms for finding and representing all the tandem repeats in a string. J. Comput. Syst. Sci., 69(4):525-546, 2004.
Dov Harel and Robert Endre Tarjan. Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing, 13(2):338-355, 1984.
Tomohiro I. Longest common extensions with recompression. In Proc. CPM 2017, 2017. To appear.
Shunsuke Inenaga. A faster longest common extension algorithm on compressed strings and its applications. In Proc. PSC 2015, pages 1-4, 2015.
Juha Kärkkäinen. Repetition-based text indexes. Ph.D. thesis, University of Helsinki, Department of Computer Science, 1999.
Juha Kärkkäinen, Peter Sanders, and Stefan Burkhardt. Linear work suffix array construction. J. ACM, 53(6):918-936, 2006.
Toru Kasai, Gunho Lee, Hiroki Arimura, Setsuo Arikawa, and Kunsoo Park. Linear-time longest-common-prefix computation in suffix arrays and its applications. In Proc. CPM 2001, pages 181-192, 2001.
Roman Kolpakov and Gregory Kucherov. Searching for gapped palindromes. Theor. Comput. Sci., 410(51):5365-5373, 2009.
Roman M. Kolpakov and Gregory Kucherov. Finding maximal repetitions in a word in linear time. In Proc. FOCS 1999, pages 596-604, 1999.
Roman M. Kolpakov and Gregory Kucherov. Finding repeats with fixed gap. In Proc. SPIRE 2000, pages 162-168, 2000.
Dominik Köppl and Kunihiko Sadakane. Lempel-Ziv computation in compressed space (LZ-CICS). In Proc. DCC 2016, pages 3-12, 2016.
Dmitry Kosolobov. Tight lower bounds for the longest common extension problem. CoRR, abs/1611.02891, 2016.
Gad M. Landau, Eugene W. Myers, and Jeanette P. Schmidt. Incremental string comparison. SIAM J. Comput., 27(2):557-582, 1998.
Gad M. Landau and Uzi Vishkin. Efficient string matching with k mismatches. Theor. Comput. Sci., 43:239-249, 1986.
Mamoru Maekawa. A square root N algorithm for mutual exclusion in decentralized systems. ACM Trans. Comput. Syst., 3(2):145-159, 1985.
Udi Manber and Gene Myers. Suffix arrays: A new method for on-line string searches. SIAM J. Comput., 22(5):935-948, 1993.
Joong Chae Na, Alberto Apostolico, Costas S. Iliopoulos, and Kunsoo Park. Truncated suffix trees and their application to data compression. Theor. Comput. Sci., 1-3(304):87-101, 2003. URL: http://dx.doi.org/10.1016/S0304-3975(03)00053-7.
Shintaro Narisada, Diptarama, Kazuyuki Narisawa, Shunsuke Inenaga, and Ayumi Shinohara. Computing longest single-arm-gapped palindromes in a string. In Proc. SOFSEM 2017, pages 375-386, 2017.
Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Dynamic index and LZ factorization in compressed space. In Proc. PSC 2016, pages 158-170, 2016.
Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Fully dynamic data structure for LCE queries in compressed space. In MFCS 2016, pages 72:1-72:15, 2016.
Nicola Prezza. In-place longest common extensions. CoRR, abs/1608.05100, 2016.
Simon J. Puglisi and Andrew Turpin. Space-time tradeoffs for longest-common-prefix array computation. In Proc. ISAAC 2008, pages 124-135, 2008.
Wojciech Rytter. Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci., 302(1-3):211-222, 2003.
Yuka Tanimura, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, Simon J. Puglisi, and Masayuki Takeda. Deterministic sub-linear space LCE data structures with efficient construction. In Proc. CPM 2016, pages 1:1-1:10, 2016.
Luciana Vitale, Alvaro Martín, and Gadiel Seroussi. Space-efficient representation of truncated suffix trees, with applications to Markov order estimation. Theor. Comput. Sci., 595:34-45, 2015.
P. Weiner. Linear pattern-matching algorithms. In Proc. of 14th IEEE Ann. Symp. on Switching and Automata Theory, pages 1-11, 1973.
Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression. IEEE Trans. Information Theory, 23(3):337-343, 1977.

Small-Space LCE Data Structure with Constant-Time Queries

Authors Yuka Tanimura, Takaaki Nishimoto, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda

File

Document Identifiers

Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

References

Thanks for your feedback!

Could not send message