Fast Compressed Self-Indexes with Deterministic Linear-Time Construction

Authors J. Ian Munro, Gonzalo Navarro, Yakov Nekrich





Cite As

J. Ian Munro, Gonzalo Navarro, and Yakov Nekrich. Fast Compressed Self-Indexes with Deterministic Linear-Time Construction. In 28th International Symposium on Algorithms and Computation (ISAAC 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 92, pp. 57:1-57:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LIPIcs.ISAAC.2017.57

Abstract

We introduce a compressed suffix array representation that, on a text $T$ of length $n$ over an alphabet of size $\sigma$, can be built in $O(n)$ deterministic time within $O(n\log\sigma)$ bits of working space, and that counts the occurrences of any pattern $P$ in $T$ in time $O(|P| + \log\log_w \sigma)$ on a RAM with words of $w=\Omega(\log n)$ bits. This new index outperforms all other compressed indexes that can be built in linear deterministic time, as well as some others. The only faster indexes either can be built in linear time only in expectation or require $\Theta(n\log n)$ bits.

Keywords
  • Succinct data structures
  • Self-indexes
  • Suffix arrays
  • Deterministic construction
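
To make the counting query in the abstract concrete, the following is a minimal sketch of pattern counting by backward search over the Burrows-Wheeler transform, the classical technique behind many compressed self-indexes. It is not the representation introduced in the paper: the suffix array is built by naive sorting rather than in deterministic linear time, the occurrence table takes far more than O(n log sigma) bits, and the query time does not match the stated bound. The names `build_index` and `count` and the `"\0"` terminator are assumptions chosen for illustration.

```python
from collections import Counter


def build_index(text, terminator="\0"):
    """Build a toy BWT-based index (illustrative only, not space-efficient).

    Assumes `terminator` does not occur in `text` and sorts before
    every character of `text`.
    """
    t = text + terminator
    # Naive suffix sorting: simple, but NOT linear time.
    sa = sorted(range(len(t)), key=lambda i: t[i:])
    # BWT[j] is the character preceding the j-th smallest suffix
    # (index -1 wraps around to the terminator).
    bwt = [t[i - 1] for i in sa]

    # C[c] = number of characters in t that are strictly smaller than c.
    counts = Counter(t)
    C = {}
    total = 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]

    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(t) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ, len(t)


def count(index, pattern):
    """Count occurrences of `pattern` using backward search."""
    C, occ, n = index
    lo, hi = 0, n  # half-open range of suffix array rows
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

For example, with `idx = build_index("abracadabra")`, the call `count(idx, "abra")` returns 2 and `count(idx, "xyz")` returns 0. Each pattern character costs two table lookups here; a compressed index replaces the explicit table with a succinct rank structure.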

