Compressed Orthogonal Search on Suffix Arrays with Applications to Range LCP

Authors Kotaro Matsuda, Kunihiko Sadakane , Tatiana Starikovskaya, Masakazu Tateshita



PDF
Thumbnail PDF

File

LIPIcs.CPM.2020.23.pdf
  • Filesize: 0.5 MB
  • 13 pages

Document Identifiers

Author Details

Kotaro Matsuda
  • Graduate School of Information Science and Technology, The University of Tokyo, Japan
Kunihiko Sadakane
  • Graduate School of Information Science and Technology, The University of Tokyo, Japan
Tatiana Starikovskaya
  • DIENS, École normale supérieure, PSL Research University, Paris, France
Masakazu Tateshita
  • Graduate School of Information Science and Technology, The University of Tokyo, Japan

Acknowledgements

The authors would like to thank anonymous referees for helpful comments.

Cite AsGet BibTex

Kotaro Matsuda, Kunihiko Sadakane, Tatiana Starikovskaya, and Masakazu Tateshita. Compressed Orthogonal Search on Suffix Arrays with Applications to Range LCP. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 23:1-23:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.CPM.2020.23

Abstract

We propose a space-efficient data structure for orthogonal range search on suffix arrays. For general two-dimensional orthogonal range search problem on a set of n points, there exists an n log n (1+o(1))-bit data structure supporting O(log n)-time counting queries [Mäkinen, Navarro 2007]. The space matches the information-theoretic lower bound. However, if we focus on a point set representing a suffix array, there is a chance to obtain a space efficient data structure. We answer this question affirmatively. Namely, we propose a data structure for orthogonal range search on suffix arrays which uses O(1/(ε) n (H₀+1)) bits where H₀ is the order-0 entropy of the string and answers a counting query in O(n^ε) time for any constant ε>0. As an application, we give an O(1/(ε) n (H₀+1))-bit data structure for the range LCP problem.

Subject Classification

ACM Subject Classification
  • Theory of computation → Models of computation
Keywords
  • Orthogonal Range Search
  • Succinct Data Structure
  • Suffix Array

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Paniz Abedin, Arnab Ganguly, Wing-Kai Hon, Yakov Nekrich, Kunihiko Sadakane, Rahul Shah, and Sharma V. Thankachan. A linear-space data structure for range-lcp queries in poly-logarithmic time. In COCOON'18, pages 615-625, 2018. URL: https://doi.org/10.1007/978-3-319-94776-1_51.
  2. Amihood Amir, Alberto Apostolico, Gad M. Landau, Avivit Levy, Moshe Lewenstein, and Ely Porat. Range LCP. J. Comput. Syst. Sci., 80(7):1245-1253, 2014. URL: https://doi.org/10.1016/j.jcss.2014.02.010.
  3. Amihood Amir, Moshe Lewenstein, and Sharma V. Thankachan. Range LCP queries revisited. In SPIRE'15, pages 350-361, 2015. URL: https://doi.org/10.1007/978-3-319-23826-5_33.
  4. Bernard Chazelle. A functional approach to data structures and its use in multidimensional searching. SIAM J. Comput., 17(3):427-462, 1988. URL: https://doi.org/10.1137/0217026.
  5. Graham Cormode and S. Muthukrishnan. Substring compression problems. In SODA'05, pages 321-330, 2005. Google Scholar
  6. Johannes Fischer and Volker Heun. A new succinct representation of RMQ-information and improvements in the enhanced suffix array. In ESCAPE'07, pages 459-470, 2007. URL: https://doi.org/10.1007/978-3-540-74450-4_41.
  7. Roberto Grossi, Ankur Gupta, and Jeffrey Scott Vitter. High-order entropy-compressed text indexes. In SODA'03, page 841–850, 2003. Google Scholar
  8. Roberto Grossi and Jeffrey Scott Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. on Computing, 35(2):378-407, 2005. Google Scholar
  9. Orgad Keller, Tsvi Kopelowitz, Shir Landau Feibish, and Moshe Lewenstein. Generalized substring compression. Theoretical Computer Science, 525:42-54, 2014. URL: https://doi.org/10.1016/j.tcs.2013.10.010.
  10. Orgad Keller, Tsvi Kopelowitz, Shir Landau Feibish, and Moshe Lewenstein. Generalized substring compression. Theoretical Computer Science, 525:42-54, 2014. Advances in Stringology. URL: https://doi.org/10.1016/j.tcs.2013.10.010.
  11. Veli Mäkinen and Gonzalo Navarro. Rank and select revisited and extended. Theoretical Computer Science, 387(3):332-347, 2007. URL: https://doi.org/10.1016/j.tcs.2007.07.013.
  12. Udi Manber and Gene Myers. Suffix arrays: A new method for on-line string searches. SIAM Journal on Computing, 22(5):935-948, 1993. Google Scholar
  13. Gonzalo Navarro and Kunihiko Sadakane. Fully-functional static and dynamic succinct trees. ACM Transactions on Algorithms (TALG), 10(3):Article No. 16, 39 pages, 2014. Google Scholar
  14. Yakov Nekrich and Gonzalo Navarro. Sorted range reporting. In SWAT'12, pages 271-282, 2012. Google Scholar
  15. Manish Patil, Rahul Shah, and Sharma V. Thankachan. Faster range LCP queries. In SPIRE'13, pages 263-270, 2013. URL: https://doi.org/10.1007/978-3-319-02432-5_29.
  16. Rajeev Raman, Venkatesh Raman, and Srinavasa Rao Satti. Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Transactions on Algorithms (TALG), 3(4), 2007. URL: https://doi.org/10.1145/1290672.1290680.
  17. Kunihiko Sadakane. Space-efficient data structures for flexible text retrieval systems. In ISAAC'02, pages 14-24, 2002. URL: https://doi.org/10.1007/3-540-36136-7_2.
  18. Kunihiko Sadakane. New text indexing functionalities of the compressed suffix arrays. J. of Algorithms, 48(2):294-313, 2003. Google Scholar
  19. Kunihiko Sadakane. Compressed suffix trees with full functionality. Theory of Computing Systems, 41(4):589-607, 2007. Google Scholar
  20. Daniel D. Sleator and Robert Endre Tarjan. A data structure for dynamic trees. Journal of computer and system sciences, 26(3):362-391, 1983. Google Scholar
  21. Peter Weiner. Linear pattern matching algorithms. In SWAT(FOCS)'73, pages 1-11, 1973. URL: https://doi.org/10.1109/SWAT.1973.13.
  22. Dan E. Willard. Log-logarithmic worst-case range queries are possible in space θ(n). Inf. Process. Lett., 17(2):81-84, 1983. URL: https://doi.org/10.1016/0020-0190(83)90075-3.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail