Weighted Ancestors in Suffix Trees Revisited

Authors Djamal Belazzougui, Dmitry Kosolobov , Simon J. Puglisi , Rajeev Raman



PDF
Thumbnail PDF

File

LIPIcs.CPM.2021.8.pdf
  • Filesize: 0.95 MB
  • 15 pages

Document Identifiers

Author Details

Djamal Belazzougui
  • Centre de Recherche sur l'Information Scientifique et Technique, Algiers, Algeria
Dmitry Kosolobov
  • Institute of Natural Sciences and Mathematics, Ural Federal University, Ekaterinburg, Russia
Simon J. Puglisi
  • Department of Computer Science, University of Helsinki, Helsinki, Finland
Rajeev Raman
  • Department of Informatics, University of Leicester, Leicester, United Kingdom

Cite AsGet BibTex

Djamal Belazzougui, Dmitry Kosolobov, Simon J. Puglisi, and Rajeev Raman. Weighted Ancestors in Suffix Trees Revisited. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 8:1-8:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.CPM.2021.8

Abstract

The weighted ancestor problem is a well-known generalization of the predecessor problem to trees. It is known to require Ω(log log n) time for queries provided 𝒪(n polylog n) space is available and weights are from [0..n], where n is the number of tree nodes. However, when applied to suffix trees, the problem, surprisingly, admits an 𝒪(n)-space solution with constant query time, as was shown by Gawrychowski, Lewenstein, and Nicholson (Proc. ESA 2014). This variant of the problem can be reformulated as follows: given the suffix tree of a string s, we need a data structure that can locate in the tree any substring s[p..q] of s in 𝒪(1) time (as if one descended from the root reading s[p..q] along the way). Unfortunately, the data structure of Gawrychowski et al. has no efficient construction algorithm, limiting its wider usage as an algorithmic tool. In this paper we resolve this issue, describing a data structure for weighted ancestors in suffix trees with constant query time and a linear construction algorithm. Our solution is based on a novel approach using so-called irreducible LCP values.

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • suffix tree
  • weighted ancestors
  • irreducible LCP
  • deterministic substring hashing

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. A. Amir, G.M. Landau, M. Lewenstein, and D. Sokol. Dynamic text and static pattern matching. ACM Transactions on Algorithms (TALG), 3(2):19-31, 2007. URL: https://doi.org/10.1145/1240233.1240242.
  2. G. Badkobeh, P. Gawrychowski, J. Kärkkäinen, S. J. Puglisi, and B. Zhukova. Tight upper and lower bounds on suffix tree breadth. Theoretical Computer Science, 854:63-67, 2021. URL: https://doi.org/10.1016/j.tcs.2020.11.037.
  3. G. Badkobeh, J. Kärkkäinen, S.J. Puglisi, and B. Zhukova. On suffix tree breadth. In Proc. 24th Symposium on String Processing and Information Retrieval (SPIRE), volume 10508 of LNCS, pages 68-73. Springer, 2017. URL: https://doi.org/10.1007/978-3-319-67428-5_6.
  4. M. A. Bender and M. Farach-Colton. The LCA problem revisited. In 4th Latin American Symposium on Theoretical Informatics (LATIN), pages 88-94. Springer, 2000. URL: https://doi.org/10.1007/10719839_9.
  5. M.A. Bender and M. Farach-Colton. The level ancestor problem simplified. Theoretical Computer Science, 321(1):5-12, 2004. URL: https://doi.org/10.1016/j.tcs.2003.05.002.
  6. O. Berkman and U. Vishkin. Finding level-ancestors in trees. Journal of Computer and System Sciences, 48(2):214-230, 1994. URL: https://doi.org/10.1016/S0022-0000(05)80002-9.
  7. S. Biswas, A. Ganguly, R. Shah, and S.V. Thankachan. Ranked document retrieval with forbidden pattern. In Proc. 26th Annual Symposium on Combinatorial Pattern Matching (CPM), pages 77-88. Springer, 2015. URL: https://doi.org/10.1007/978-3-319-19929-0_7.
  8. S. Biswas, A. Ganguly, R. Shah, and S.V. Thankachan. Ranked document retrieval for multiple patterns. Theoretical Computer Science, 746:98-111, 2018. URL: https://doi.org/10.1016/j.tcs.2018.06.029.
  9. M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation, Palo Alto, California, 1994. Google Scholar
  10. David Clark. Compact pat trees. PhD thesis, University of Waterloo, 1997. Google Scholar
  11. M. Farach and S. Muthukrishnan. Perfect hashing for strings: Formalization and algorithms. In 7th Annual Symposium on Combinatorial Pattern Matching (CPM), pages 130-140. Springer, 1996. URL: https://doi.org/10.1007/3-540-61258-0_11.
  12. M. L. Fredman and D. E. Willard. Surpassing the information theoretic bound with fusion trees. Journal of Computer and System Sciences, 47(3):424-436, 1993. URL: https://doi.org/10.1016/0022-0000(93)90040-4.
  13. P. Gawrychowski, M. Lewenstein, and P.K. Nicholson. Weighted ancestors in suffix trees. In Proc. 22nd Annual European Symposium on Algorithms (ESA), pages 455-466. Springer, 2014. URL: https://doi.org/10.1007/978-3-662-44777-2_38.
  14. D. Harel and R. E. Tarjan. Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing, 13(2):338-355, 1984. URL: https://doi.org/10.1137/0213024.
  15. G. Jacobson. Space-efficient static trees and graphs. In Proc. 30th Annual Symposium on Foundations of Computer Science (FOCS), pages 549-554. IEEE, 1989. URL: https://doi.org/10.1109/SFCS.1989.63533.
  16. J. Kärkkäinen, D. Kempa, and M. Piątkowski. Tighter bounds for the sum of irreducible LCP values. Theoretical Computer Science, 656:265-278, 2016. URL: https://doi.org/0.1016/j.tcs.2015.12.009.
  17. J. Kärkkäinen, G. Manzini, and S.J. Puglisi. Permuted longest-common-prefix array. In Proc. 20th Annual Symposium on Combinatorial Pattern Matching (CPM), pages 181-192. Springer, 2009. URL: https://doi.org/10.1007/978-3-642-02441-2_17.
  18. D. Kempa and T. Kociumaka. Resolution of the Burrows-Wheeler transform conjecture. arXiv preprint, 2019. URL: http://arxiv.org/abs/1910.10631.
  19. D. Kempa, A. Policriti, N. Prezza, and E. Rotenberg. String attractors: Verification and optimization. In Proc. 26th Annual European Symposium on Algorithms (ESA), volume 112 of Leibniz International Proceedings in Informatics (LIPIcs), pages 52:1-52:13. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018. URL: https://doi.org/10.4230/LIPIcs.ESA.2018.52.
  20. T. Kociumaka, M. Kubica, J. Radoszewski, W. Rytter, and T. Waleń. A linear-time algorithm for seeds computation. ACM Transactions on Algorithms (TALG), 16(2):1-23, 2020. URL: https://doi.org/10.1145/3386369.
  21. T. Kopelowitz, G. Kucherov, Y. Nekrich, and T. Starikovskaya. Cross-document pattern matching. Journal of Discrete Algorithms, 24:40-47, 2014. URL: https://doi.org/10.1016/j.jda.2013.05.002.
  22. T. Kopelowitz and M. Lewenstein. Dynamic weighted ancestors. In Proc. 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 565-574. ACM-SIAM, 2007. URL: https://doi.org/10.5555/1283383.1283444.
  23. U. Manber and G. Myers. Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing, 22(5):935-948, 1993. URL: https://doi.org/10.1137/0222058.
  24. J. I. Munro, G. Navarro, and Y. Nekrich. Space-efficient construction of compressed indexes in deterministic linear time. In Proc. 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 408-424. SIAM, 2017. URL: https://doi.org/10.1137/1.9781611974782.26.
  25. M. Pătraşcu and M. Thorup. Time-space trade-offs for predecessor search. In Proc. 38th Annual ACM Symposium on Theory of Computing (STOC), pages 232-240. ACM, 2006. URL: https://doi.org/10.1145/1132516.1132551.
  26. M. Pătraşcu and M. Thorup. Dynamic integer sets with optimal rank, select, and predecessor search. In Proc. 55th Annual Symposium on Foundations of Computer Science (FOCS), pages 166-175. IEEE, 2014. URL: https://doi.org/10.1109/FOCS.2014.26.
  27. K. Sadakane. Compressed suffix trees with full functionality. Theory of Computing Systems, 41(4):589-607, 2007. URL: https://doi.org/10.1007/s00224-006-1198-x.
  28. D. D. Sleator and R. E. Tarjan. A data structure for dynamic trees. Journal of Computer and System Sciences, 26(3):362-391, 1983. URL: https://doi.org/10.1016/0022-0000(83)90006-5.
  29. N. Välimäki and S.J. Puglisi. Distributed string mining for high-throughput sequencing data. In Proc. 12th International Workshop on Algorithms in Bioinformatics (WABI), LNCS 7534, pages 441-452. Springer, 2012. URL: https://doi.org/10.1007/978-3-642-33122-0_35.
  30. P. van Emde Boas, R. Kaas, and E. Zijlstra. Design and implementation of an efficient priority queue. Mathematical systems theory, 10(1):99-127, 1976. URL: https://doi.org/10.1007/BF01683268.
  31. D. E. Willard. Log-logarithmic worst-case range queries are possible in space Θ(N). Information Processing Letters, 17(2):81-84, 1983. URL: https://doi.org/10.1016/0020-0190(83)90075-3.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail