Small-Space LCE Data Structure with Constant-Time Queries

Authors Yuka Tanimura, Takaaki Nishimoto, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda



PDF
Thumbnail PDF

File

LIPIcs.MFCS.2017.10.pdf
  • Filesize: 0.68 MB
  • 15 pages

Document Identifiers

Author Details

Yuka Tanimura
Takaaki Nishimoto
Hideo Bannai
Shunsuke Inenaga
Masayuki Takeda

Cite As Get BibTex

Yuka Tanimura, Takaaki Nishimoto, Hideo Bannai, Shunsuke Inenaga, and Masayuki Takeda. Small-Space LCE Data Structure with Constant-Time Queries. In 42nd International Symposium on Mathematical Foundations of Computer Science (MFCS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 83, pp. 10:1-10:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/LIPIcs.MFCS.2017.10

Abstract

The longest common extension (LCE) problem is to preprocess a given string w of length n so that the length of the longest common prefix between suffixes of w that start at any two given positions is answered quickly. In this paper, we present a data structure of O(z \tau^2 + \frac{n}{\tau}) words of space which answers LCE queries in O(1) time and can be built in O(n \log \sigma) time, where 1 \leq \tau \leq \sqrt{n} is a parameter, z is the size of the Lempel-Ziv 77 factorization of w and \sigma is the alphabet size. The proposed LCE data structure not access the input string w when answering queries, and thus w can be deleted after preprocessing. On top of this main result, we obtain further results using (variants of) our LCE data structure, which include the following:
- For highly repetitive strings where the z\tau^2 term is dominated by \frac{n}{\tau}, we obtain a constant-time and sub-linear space LCE query data structure.
- Even when the input string is not well compressible via Lempel-Ziv 77 factorization, we still can obtain a constant-time and sub-linear space LCE data structure for suitable \tau and for \sigma \leq 2^{o(\log n)}.
- The time-space trade-off lower bounds for the LCE problem by Bille et al. [J. Discrete Algorithms, 25:42-50, 2014] and by Kosolobov [CoRR, abs/1611.02891, 2016] do not apply in some cases with our LCE data structure.

Subject Classification

Keywords
  • longest common extension
  • truncated suffix trees
  • t-covers

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Hideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, and Kazuya Tsuruta. A new characterization of maximal repetitions by Lyndon trees. In Proc. SODA 2015, pages 562-571, 2015. Google Scholar
  2. Hideo Bannai, Shunsuke Inenaga, and Dominik Köppl. Computing all distinct squares in linear time for integer alphabets. CoRR, abs/1610.03421, 2016. Google Scholar
  3. Michael A. Bender and Martin Farach-Colton. The LCA problem revisited. In Proc. Latin 2000, pages 88-94, 2000. Google Scholar
  4. Michael A. Bender and Martin Farach-Colton. The level ancestor problem simplified. Theor. Comput. Sci., 321(1):5-12, 2004. Google Scholar
  5. Omer Berkman and Uzi Vishkin. Finding level-ancestors in trees. Journal of Computer and System Sciences, 48(2):214-230, 1994. Google Scholar
  6. Philip Bille, Anders Roy Christiansen, Patrick Hagge Cording, and Inge Li Gørtz. Finger search in grammar-compressed strings. In Proc. FSTTCS 2016, pages 36:1-36:16, 2016. Google Scholar
  7. Philip Bille, Inge Li Gørtz, Patrick Hagge Cording, Benjamin Sach, Hjalte Wedel Vildhøj, and Søren Vind. Fingerprints in compressed strings. J. Comput. Syst. Sci., 86:171-180, 2017. Google Scholar
  8. Philip Bille, Inge Li Gørtz, Mathias Bæk Tejs Knudsen, Moshe Lewenstein, and Hjalte Wedel Vildhøj. Longest common extensions in sublinear space. In Proc. CPM 2015, pages 65-76, 2015. Google Scholar
  9. Philip Bille, Inge Li Gørtz, Benjamin Sach, and Hjalte Wedel Vildhøj. Time-space trade-offs for longest common extensions. J. Discrete Algorithms, 25:42-50, 2014. Google Scholar
  10. Gerth Stølting Brodal, Pooya Davoodi, and S. Srinivasa Rao. On space efficient two dimensional range minimum data structures. Algorithmica, 63(4):815-830, 2012. Google Scholar
  11. Gerth Stølting Brodal, Rune B. Lyngsø, Christian N. S. Pedersen, and Jens Stoye. Finding maximal pairs with bounded gap. In Proc. CPM 1999, pages 134-149, 1999. Google Scholar
  12. Stefan Burkhardt and Juha Kärkkäinen. Fast lightweight suffix array construction and checking. In Proc. CPM 2003, pages 55-69, 2003. Google Scholar
  13. Bastien Cazaux, Thierry Lecroq, and Eric Rivals. Construction of a de Bruijn graph for assembly from a truncated suffix tree. In LATA 2015, pages 109-120, 2015. Google Scholar
  14. Maxime Crochemore, Roman Kolpakov, and Gregory Kucherov. Optimal bounds for computing α-gapped repeats. In Proc. LATA 2016, pages 245-255, 2016. Google Scholar
  15. Johannes Fischer, Travis Gagie, Pawel Gawrychowski, and Tomasz Kociumaka. Approximating LZ77 via small-space multiple-pattern matching. CoRR, abs/1504.06647, 2015. Google Scholar
  16. Michael L. Fredman and Dan E. Willard. Surpassing the information theoretic bound with fusion trees. J. Comput. Syst. Sci., 47(3):424-436, 1993. Google Scholar
  17. Zvi Galil and Raffaele Giancarlo. Improved string matching with k mismatches. ACM SIGACT News, 17:52-54, 1986. Google Scholar
  18. Pawel Gawrychowski, Tomohiro I, Shunsuke Inenaga, Dominik Köppl, and Florin Manea. Efficiently finding all maximal α-gapped repeats. In Proc. STACS 2016, pages 39:1-39:14, 2016. Google Scholar
  19. Pawel Gawrychowski, Tomasz Kociumaka, Wojciech Rytter, and Tomasz Walen. Faster longest common extension queries in strings over general alphabets. In Proc. CPM 2016, pages 5:1-5:13, 2016. Google Scholar
  20. Sara Geizhals and Dina Sokol. Finding maximal 2-dimensional palindromes. In Proc. CPM 2016, pages 19:1-19:12, 2016. Google Scholar
  21. Dan Gusfield. Algorithms on Strings, Trees, and Sequences. Cambridge University Press, 1997. Google Scholar
  22. Dan Gusfield and Jens Stoye. Linear time algorithms for finding and representing all the tandem repeats in a string. J. Comput. Syst. Sci., 69(4):525-546, 2004. Google Scholar
  23. Dov Harel and Robert Endre Tarjan. Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing, 13(2):338-355, 1984. Google Scholar
  24. Tomohiro I. Longest common extensions with recompression. In Proc. CPM 2017, 2017. To appear. Google Scholar
  25. Shunsuke Inenaga. A faster longest common extension algorithm on compressed strings and its applications. In Proc. PSC 2015, pages 1-4, 2015. Google Scholar
  26. Juha Kärkkäinen. Repetition-based text indexes. Ph.D. thesis, University of Helsinki, Department of Computer Science, 1999. Google Scholar
  27. Juha Kärkkäinen, Peter Sanders, and Stefan Burkhardt. Linear work suffix array construction. J. ACM, 53(6):918-936, 2006. Google Scholar
  28. Toru Kasai, Gunho Lee, Hiroki Arimura, Setsuo Arikawa, and Kunsoo Park. Linear-time longest-common-prefix computation in suffix arrays and its applications. In Proc. CPM 2001, pages 181-192, 2001. Google Scholar
  29. Roman Kolpakov and Gregory Kucherov. Searching for gapped palindromes. Theor. Comput. Sci., 410(51):5365-5373, 2009. Google Scholar
  30. Roman M. Kolpakov and Gregory Kucherov. Finding maximal repetitions in a word in linear time. In Proc. FOCS 1999, pages 596-604, 1999. Google Scholar
  31. Roman M. Kolpakov and Gregory Kucherov. Finding repeats with fixed gap. In Proc. SPIRE 2000, pages 162-168, 2000. Google Scholar
  32. Dominik Köppl and Kunihiko Sadakane. Lempel-Ziv computation in compressed space (LZ-CICS). In Proc. DCC 2016, pages 3-12, 2016. Google Scholar
  33. Dmitry Kosolobov. Tight lower bounds for the longest common extension problem. CoRR, abs/1611.02891, 2016. Google Scholar
  34. Gad M. Landau, Eugene W. Myers, and Jeanette P. Schmidt. Incremental string comparison. SIAM J. Comput., 27(2):557-582, 1998. Google Scholar
  35. Gad M. Landau and Uzi Vishkin. Efficient string matching with k mismatches. Theor. Comput. Sci., 43:239-249, 1986. Google Scholar
  36. Mamoru Maekawa. A square root N algorithm for mutual exclusion in decentralized systems. ACM Trans. Comput. Syst., 3(2):145-159, 1985. Google Scholar
  37. Udi Manber and Gene Myers. Suffix arrays: A new method for on-line string searches. SIAM J. Comput., 22(5):935-948, 1993. Google Scholar
  38. Joong Chae Na, Alberto Apostolico, Costas S. Iliopoulos, and Kunsoo Park. Truncated suffix trees and their application to data compression. Theor. Comput. Sci., 1-3(304):87-101, 2003. URL: http://dx.doi.org/10.1016/S0304-3975(03)00053-7.
  39. Shintaro Narisada, Diptarama, Kazuyuki Narisawa, Shunsuke Inenaga, and Ayumi Shinohara. Computing longest single-arm-gapped palindromes in a string. In Proc. SOFSEM 2017, pages 375-386, 2017. Google Scholar
  40. Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Dynamic index and LZ factorization in compressed space. In Proc. PSC 2016, pages 158-170, 2016. Google Scholar
  41. Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Fully dynamic data structure for LCE queries in compressed space. In MFCS 2016, pages 72:1-72:15, 2016. Google Scholar
  42. Nicola Prezza. In-place longest common extensions. CoRR, abs/1608.05100, 2016. Google Scholar
  43. Simon J. Puglisi and Andrew Turpin. Space-time tradeoffs for longest-common-prefix array computation. In Proc. ISAAC 2008, pages 124-135, 2008. Google Scholar
  44. Wojciech Rytter. Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci., 302(1-3):211-222, 2003. Google Scholar
  45. Yuka Tanimura, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, Simon J. Puglisi, and Masayuki Takeda. Deterministic sub-linear space LCE data structures with efficient construction. In Proc. CPM 2016, pages 1:1-1:10, 2016. Google Scholar
  46. Luciana Vitale, Alvaro Martín, and Gadiel Seroussi. Space-efficient representation of truncated suffix trees, with applications to Markov order estimation. Theor. Comput. Sci., 595:34-45, 2015. Google Scholar
  47. P. Weiner. Linear pattern-matching algorithms. In Proc. of 14th IEEE Ann. Symp. on Switching and Automata Theory, pages 1-11, 1973. Google Scholar
  48. Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression. IEEE Trans. Information Theory, 23(3):337-343, 1977. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail