Online Algorithms for Constructing Linear-Size Suffix Trie

Authors Diptarama Hendrian, Takuya Takagi, Shunsuke Inenaga



PDF
Thumbnail PDF

File

LIPIcs.CPM.2019.30.pdf
  • Filesize: 1.11 MB
  • 19 pages

Document Identifiers

Author Details

Diptarama Hendrian
  • Graduate School of Information Sciences, Tohoku University, Sendai, Japan
Takuya Takagi
  • Fujitsu Laboratories Ltd., Kawasaki, Japan
Shunsuke Inenaga
  • Department of Informatics, Kyushu University, Fukuoka, Japan

Acknowledgements

The authors thank Keisuke Goto and Mitsuru Funakoshi for discussions. The authors are also grateful for the anonymous referees for fruitful suggestions.

Cite As Get BibTex

Diptarama Hendrian, Takuya Takagi, and Shunsuke Inenaga. Online Algorithms for Constructing Linear-Size Suffix Trie. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 30:1-30:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019) https://doi.org/10.4230/LIPIcs.CPM.2019.30

Abstract

The suffix trees are fundamental data structures for various kinds of string processing. The suffix tree of a string T of length n has O(n) nodes and edges, and the string label of each edge is encoded by a pair of positions in T. Thus, even after the tree is built, the input text T needs to be kept stored and random access to T is still needed. The linear-size suffix tries (LSTs), proposed by Crochemore et al. [Linear-size suffix tries, TCS 638:171-178, 2016], are a "stand-alone" alternative to the suffix trees. Namely, the LST of a string T of length n occupies O(n) total space, and supports pattern matching and other tasks in the same efficiency as the suffix tree without the need to store the input text T. Crochemore et al. proposed an offline algorithm which transforms the suffix tree of T into the LST of T in O(n log sigma) time and O(n) space, where sigma is the alphabet size. In this paper, we present two types of online algorithms which "directly" construct the LST, from right to left, and from left to right, without constructing the suffix tree as an intermediate structure. Both algorithms construct the LST incrementally when a new symbol is read, and do not access to the previously read symbols. The right-to-left construction algorithm works in O(n log sigma) time and O(n) space and the left-to-right construction algorithm works in O(n (log sigma + log n / log log n)) time and O(n) space. The main feature of our algorithms is that the input text does not need to be stored.

Subject Classification

ACM Subject Classification
  • Theory of computation → Data structures design and analysis
  • Theory of computation → Pattern matching
Keywords
  • Indexing structure
  • Linear-size suffix trie
  • Online algorithm
  • Pattern Matching

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Stephen Alstrup, Thore Husfeldt, and Theis Rauhe. Marked Ancestor Problems. In Proc. FOCS 1998, pages 534-544, 1998. URL: http://dx.doi.org/10.1109/SFCS.1998.743504.
  2. Djamal Belazzougui and Fabio Cunial. Fast Label Extraction in the CDAWG. In Proc. SPIRE 2017, pages 161-175, 2017. URL: http://dx.doi.org/10.1007/978-3-319-67428-5_14.
  3. Philip Bille, Gad M. Landau, Rajeev Raman, Kunihiko Sadakane, Srinivasa Rao Satti, and Oren Weimann. Random Access to Grammar-Compressed Strings and Trees. SIAM J. Comput., 44(3):513-539, 2015. URL: http://dx.doi.org/10.1137/130936889.
  4. Anselm Blumer, J. Blumer, David Haussler, Andrzej Ehrenfeucht, M.T. Chen, and Joel Seiferas. The smallest automation recognizing the subwords of a text. Theoretical Computer Science, 40:31-55, 1985. URL: http://dx.doi.org/10.1016/0304-3975(85)90157-4.
  5. Anselm Blumer, J. Blumer, David Haussler, Ross McConnell, and Andrzej Ehrenfeucht. Complete inverted files for efficient text retrieval and analysis. Journal of the ACM, 34(3):578-595, 1987. URL: http://dx.doi.org/10.1145/28869.28873.
  6. Dany Breslauer and Giuseppe F. Italiano. Near real-time suffix tree construction via the fringe marked ancestor problem. J. Discrete Algorithms, 18:32-48, 2013. URL: http://dx.doi.org/10.1016/j.jda.2012.07.003.
  7. Maxime Crochemore, Chiara Epifanio, Roberto Grossi, and Filippo Mignosi. Linear-size suffix tries. Theoretical Computer Science, 638:171-178, 2016. URL: http://dx.doi.org/10.1016/j.tcs.2016.04.002.
  8. Maxime Crochemore and Renaud Vérin. Direct construction of compact directed acyclic word graphs. In Combinatorial Pattern Matching, pages 116-129, 1997. URL: http://dx.doi.org/10.1007/3-540-63220-4_55.
  9. Maxime Crochemore and Renaud Vérin. On compact directed acyclic word graphs. In Structures in Logic and Computer Science: A Selection of Essays in Honor of A. Ehrenfeucht, pages 192-211. Springer Berlin Heidelberg, 1997. URL: http://dx.doi.org/10.1007/3-540-63246-8_12.
  10. Andrzej Ehrenfeucht, Ross M. McConnell, Nissa Osheim, and Sung-Whan Woo. Position heaps: A simple and dynamic text indexing data structure. Journal of Discrete Algorithms, 9(1):100-121, 2011. URL: http://dx.doi.org/10.1016/j.jda.2010.12.001.
  11. Martin Farach-Colton, Paolo Ferragina, and S. Muthukrishnan. On the sorting-complexity of suffix tree construction. J. ACM, 47(6):987-1011, 2000. URL: http://dx.doi.org/10.1145/355541.355547.
  12. Johannes Fischer and Pawel Gawrychowski. Alphabet-Dependent String Searching with Wexponential Search Trees. In Proc. CPM 2015, pages 160-171, 2015. URL: http://dx.doi.org/10.1007/978-3-319-19929-0_14.
  13. Yuta Fujishige, Yuki Tsujimaru, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Computing DAWGs and Minimal Absent Words in Linear Time for Integer Alphabets. In MFCS 2016, pages 38:1-38:14, 2016. URL: http://dx.doi.org/10.4230/LIPIcs.MFCS.2016.38.
  14. Leszek Gasieniec, Roman M. Kolpakov, Igor Potapov, and Paul Sant. Real-Time Traversal in Grammar-Based Compressed Files. In Proc. DCC 2005, page 458, 2005. URL: http://dx.doi.org/10.1109/DCC.2005.78.
  15. Pawel Gawrychowski, Adam Karczmarz, Tomasz Kociumaka, Jakub Lacki, and Piotr Sankowski. Optimal Dynamic Strings. In Proc. SODA 2018, pages 1509-1528, 2018. URL: http://dx.doi.org/10.1137/1.9781611975031.99.
  16. Tomohiro I, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster Lyndon factorization algorithms for SLP and LZ78 compressed text. Theor. Comput. Sci., 656:215-224, 2016. URL: http://dx.doi.org/10.1016/j.tcs.2016.03.005.
  17. Shunsuke Inenaga, Hiromasa Hoshino, Ayumi Shinohara, Masayuki Takeda, Setsuo Arikawa, Giancarlo Mauri, and Giulio Pavesi. On-line construction of compact directed acyclic word graphs. Discrete Applied Mathematics, 146(2):156-179, 2005. URL: http://dx.doi.org/10.1016/j.dam.2004.04.012.
  18. Juha Kärkkäinen, Peter Sanders, and Stefan Burkhardt. Linear work suffix array construction. J. ACM, 53(6):918-936, 2006. URL: http://dx.doi.org/10.1145/1217856.1217858.
  19. Gregory Kucherov. On-line construction of position heaps. Journal of Discrete Algorithms, 20:3-11, 2013. URL: http://dx.doi.org/10.1016/j.jda.2012.08.002.
  20. Udi Manber and Gene Myers. Suffix Arrays: A New Method for On-Line String Searches. SIAM Journal on Computing, 22(5):935-948, 1993. URL: http://dx.doi.org/10.1137/0222058.
  21. Kazuyuki Narisawa, Hideharu Hiratsuka, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Efficient Computation of Substring Equivalence Classes with Suffix Arrays. Algorithmica, 79(2):291-318, 2017. URL: http://dx.doi.org/10.1007/s00453-016-0178-z.
  22. Takuya Takagi, Keisuke Goto, Yuta Fujishige, Shunsuke Inenaga, and Hiroki Arimura. Linear-Size CDAWG: New Repetition-Aware Indexing and Grammar Compression. In SPIRE 2017, volume 10508, pages 304-316, 2017. URL: http://dx.doi.org/10.1007/978-3-319-67428-5_26.
  23. Esko Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249-260, 1995. URL: http://dx.doi.org/10.1007/BF01206331.
  24. Peter Weiner. Linear pattern matching algorithms. In 14th Annual Symposium on Switching and Automata Theory (SWAT 1973), pages 1-11. IEEE, 1973. URL: http://dx.doi.org/10.1109/SWAT.1973.13.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail