Fully-online Construction of Suffix Trees for Multiple Texts

Authors Takuya Takagi, Shunsuke Inenaga, Hiroki Arimura



PDF
Thumbnail PDF

File

LIPIcs.CPM.2016.22.pdf
  • Filesize: 0.55 MB
  • 13 pages

Document Identifiers

Author Details

Takuya Takagi
Shunsuke Inenaga
Hiroki Arimura

Cite As Get BibTex

Takuya Takagi, Shunsuke Inenaga, and Hiroki Arimura. Fully-online Construction of Suffix Trees for Multiple Texts. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 22:1-22:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016) https://doi.org/10.4230/LIPIcs.CPM.2016.22

Abstract

We consider fully-online construction of indexing data structures for multiple texts. Let T = {T_1, ..., T_K} be a collection of texts. By fully-online, we mean that a new character can be appended to any text in T at any time. This is a natural generalization of semi-online construction of indexing data structures for multiple texts in which, after a new character is appended to the kth text T_k, then its previous texts T_1, ..., T_k-1 will remain static. Our fully-online scenario arises when we maintain dynamic indexes for multi-sensor data. Let N and sigma denote the total length of texts in T and the alphabet size, respectively. We first show that the algorithm by Blumer et al. [Theoretical Computer Science, 40:31-55, 1985] to construct the directed acyclic word graph (DAWG) for T can readily be extended to our fully-online setting, retaining O(N log sigma)-time and O(N)-space complexities. Then, we give a sophisticated fully-online algorithm which constructs the suffix tree for T in O(N log sigma) time and O(N) space. A key idea of this algorithm is synchronized maintenance of the DAWG and the suffix tree.

Subject Classification

Keywords
  • suffix trees
  • DAWGs
  • multiple texts
  • online algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Stephen Alstrup and Jacob Holm. Improved algorithms for finding level ancestors in dynamic trees. In ICALP 2000, pages 73-84, 2000. Google Scholar
  2. Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, and Jennifer Widom. Models and issues in data stream systems. In PODS 2002, pages 1-16, 2002. Google Scholar
  3. Anselm Blumer, Janet Blumer, David Haussler, Andrzej Ehrenfeucht, M. T. Chen, and Joel Seiferas. The smallest automaton recognizing the subwords of a text. TCS, 40:31-55, 1985. Google Scholar
  4. Anselm Blumer, Janet Blumer, David Haussler, Ross Mcconnell, and Andrzej Ehrenfeucht. Complete inverted files for efficient text retrieval and analysis. J. ACM, 34(3):578-595, 1987. Google Scholar
  5. Dany Breslauer and Giuseppe F. Italiano. Near real-time suffix tree construction via the fringe marked ancestor problem. J. Discrete Algorithms, 18:32-48, 2013. Google Scholar
  6. Ho-Leung Chan, Wing-Kai Hon, Tak Wah Lam, and Kunihiko Sadakane. Compressed indexes for dynamic text collections. ACM Transactions on Algorithms, 3(2), 2007. Google Scholar
  7. Richard Cole and Ramesh Hariharan. Dynamic LCA queries on trees. In SODA 1999, pages 235-244, 1999. Google Scholar
  8. Paolo Ferragina and Roberto Grossi. Improved dynamic text indexing. J. Algorithms, 31(2):291-319, 1999. Google Scholar
  9. Johannes Fischer and Pawel Gawrychowski. Alphabet-dependent string searching with wexponential search trees. CoRR, abs/1302.3347, 2013. Google Scholar
  10. Johannes Fischer and Pawel Gawrychowski. Alphabet-dependent string searching with wexponential search trees. In CPM 2015, pages 160-171, 2015. Google Scholar
  11. Eamonn J. Keogh, Stefano Lonardi, and Bill Yuan-chi Chiu. Finding surprising patterns in a time series database in linear time and space. In KDD 2002, pages 550-556, 2002. Google Scholar
  12. Takuya Takagi, Shunsuke Inenaga, and Hiroki Arimura. Fully-online construction of suffix trees and dawgs for multiple texts. CoRR, abs/1507.07622, 2015. Google Scholar
  13. Esko Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249-260, 1995. Google Scholar
  14. Yilun Wang, Yu Zheng, and Yexiang Xue. Travel time estimation of a path using sparse trajectories. In KDD 2014, pages 25-34, 2014. Google Scholar
  15. Peter Weiner. Linear pattern-matching algorithms. In Proc. of 14th IEEE Ann. Symp. on Switching and Automata Theory, pages 1-11, 1973. Google Scholar
  16. Jeffery Westbrook. Fast incremental planarity testing. In ICALP 1992, pages 342-353, 1992. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail