Optimal Rank and Select Queries on Dictionary-Compressed Text

Author Nicola Prezza



PDF
Thumbnail PDF

File

LIPIcs.CPM.2019.4.pdf
  • Filesize: 469 kB
  • 12 pages

Document Identifiers

Author Details

Nicola Prezza
  • Department of Computer Science, University of Pisa, Italy

Cite AsGet BibTex

Nicola Prezza. Optimal Rank and Select Queries on Dictionary-Compressed Text. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 4:1-4:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.CPM.2019.4

Abstract

We study the problem of supporting queries on a string S of length n within a space bounded by the size gamma of a string attractor for S. In the paper introducing string attractors it was shown that random access on S can be supported in optimal O(log(n/gamma)/log log n) time within O(gamma polylog n) space. In this paper, we extend this result to rank and select queries and provide lower bounds matching our upper bounds on alphabets of polylogarithmic size. Our solutions are given in the form of a space-time trade-off that is more general than the one previously known for grammars and that improves existing bounds on LZ77-compressed text by a log log n time-factor in select queries. We also provide matching lower and upper bounds for partial sum and predecessor queries within attractor-bounded space, and extend our lower bounds to encompass navigation of dictionary-compressed tree representations.

Subject Classification

ACM Subject Classification
  • Theory of computation → Data compression
  • Theory of computation → Cell probe models and lower bounds
Keywords
  • Rank
  • Select
  • Dictionary compression
  • String Attractors

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Paul Beame and Faith E Fich. Optimal bounds for the predecessor problem and related problems. Journal of Computer and System Sciences, 65(1):38-72, 2002. Google Scholar
  2. D. Belazzougui and G. Navarro. Optimal Lower and Upper Bounds for Representing Sequences. ACM Transactions on Algorithms, 11(4):article 31, 2015. Google Scholar
  3. Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, and Sebastiano Vigna. Monotone minimal perfect hashing: searching a sorted table with O (1) accesses. In Proceedings of the twentieth annual ACM-SIAM symposium on Discrete algorithms, pages 785-794. SIAM, 2009. Google Scholar
  4. Djamal Belazzougui, Patrick Hagge Cording, Simon J Puglisi, and Yasuo Tabei. Access, rank, and select in grammar-compressed strings. In Algorithms-ESA 2015, pages 142-154. Springer, 2015. Google Scholar
  5. Djamal Belazzougui, Travis Gagie, Pawel Gawrychowski, Juha Kärkkäinen, Alberto Ordónez, Simon J Puglisi, and Yasuo Tabei. Queries on LZ-bounded encodings. In Data Compression Conference (DCC), 2015, pages 83-92. IEEE, 2015. Google Scholar
  6. David Benoit, Erik D Demaine, J Ian Munro, Rajeev Raman, Venkatesh Raman, and S Srinivasa Rao. Representing trees of higher degree. Algorithmica, 43(4):275-292, 2005. Google Scholar
  7. Philip Bille, Gad M Landau, Rajeev Raman, Kunihiko Sadakane, Srinivasa Rao Satti, and Oren Weimann. Random access to grammar-compressed strings and trees. SIAM Journal on Computing, 44(3):513-539, 2015. Google Scholar
  8. Anselm Blumer, Janet Blumer, David Haussler, Ross McConnell, and Andrzej Ehrenfeucht. Complete inverted files for efficient text retrieval and analysis. Journal of the ACM, 34(3):578-595, 1987. Google Scholar
  9. Michael Burrows and David J. Wheeler. A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation, 1994. Google Scholar
  10. Ho-Leung Chan, Wing-Kai Hon, Tak-Wah Lam, and Kunihiko Sadakane. Compressed indexes for dynamic text collections. ACM Transactions on Algorithms (TALG), 3(2):21, 2007. Google Scholar
  11. Maxime Crochemore and Renaud Vérin. Direct construction of compact directed acyclic word graphs. In Combinatorial Pattern Matching (CPM), pages 116-129. Springer, 1997. Google Scholar
  12. Michael L Fredman and Dan E Willard. Blasting through the information theoretic barrier with fusion trees. In Proceedings of the twenty-second annual ACM symposium on Theory of Computing, pages 1-7. ACM, 1990. Google Scholar
  13. Moses Ganardi, Danny Hucke, Markus Lohrey, and Eric Noeth. Tree compression using string grammars. Algorithmica, 80(3):885-917, 2018. Google Scholar
  14. Guy Jacobson. Space-efficient static trees and graphs. In Foundations of Computer Science, 1989., 30th Annual Symposium on, pages 549-554. IEEE, 1989. Google Scholar
  15. Dominik Kempa, Alberto Policriti, Nicola Prezza, and Eva Rotenberg. String Attractors: Verification and Optimization. In Yossi Azar, Hannah Bast, and Grzegorz Herman, editors, 26th Annual European Symposium on Algorithms (ESA 2018), volume 112 of Leibniz International Proceedings in Informatics (LIPIcs), pages 52:1-52:13, Dagstuhl, Germany, 2018. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. Google Scholar
  16. Dominik Kempa and Nicola Prezza. At the Roots of Dictionary Compression: String Attractors. In Annual Symposium on Theory of Computing (STOC), pages 827-840. ACM, 2018. Google Scholar
  17. T. Kida, T. Matsumoto, Y. Shibata, M. Takeda, A. Shinohara, and S. Arikawa. Collage system: A unifying framework for compressed pattern matching. Theor. Comput. Sci., 298(1):253-272, 2003. Google Scholar
  18. John C. Kieffer and En-Hui Yang. Grammar-based codes: A new class of universal lossless source codes. IEEE Transactions on Information Theory, 46(3):737-754, 2000. Google Scholar
  19. A. Lempel and J. Ziv. On the complexity of finite sequences. IEEE Trans. Information Theory, 22(1):75-81, 1976. Google Scholar
  20. Gonzalo Navarro. Compact data structures: A practical approach. Cambridge University Press, 2016. Google Scholar
  21. Gonzalo Navarro and Nicola Prezza. Faster Attractor-Based Indexes. arXiv preprint, 2018. URL: http://arxiv.org/abs/1811.12779.
  22. Gonzalo Navarro and Nicola Prezza. Universal Compressed Text Indexing. Theoretical Computer Science, 2018. Google Scholar
  23. Alberto Ordóñez, Gonzalo Navarro, and Nieves R Brisaboa. Grammar compressed sequences with rank/select support. Journal of Discrete Algorithms, 43:54-71, 2017. Google Scholar
  24. James A. Storer and Thomas G. Szymanski. Data compression via textual substitution. Journal of the ACM, 29(4):928-951, 1982. Google Scholar
  25. Elad Verbin and Wei Yu. Data structure lower bounds on random access to grammar-compressed strings. In Annual Symposium on Combinatorial Pattern Matching, pages 247-258. Springer, 2013. Google Scholar
  26. Andrew Chi-Chih Yao. Should tables be sorted? Journal of the ACM (JACM), 28(3):615-628, 1981. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail