Online Construction of Wavelet Trees

Authors Paulo G. S. da Fonseca, Israel B. F. da Silva



PDF
Thumbnail PDF

File

LIPIcs.SEA.2017.16.pdf
  • Filesize: 0.61 MB
  • 14 pages

Document Identifiers

Author Details

Paulo G. S. da Fonseca
Israel B. F. da Silva

Cite AsGet BibTex

Paulo G. S. da Fonseca and Israel B. F. da Silva. Online Construction of Wavelet Trees. In 16th International Symposium on Experimental Algorithms (SEA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 75, pp. 16:1-16:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LIPIcs.SEA.2017.16

Abstract

The wavelet tree (WT) is a flexible and efficient data structure for representing character strings in succinct space, while allowing for fast generalised rank, select and access operations. As such, they play an important role in modern text indexing methods. However, despite their popularity, not many algorithms have been published concerning their construction. In particular, while the WT is capable of representing a sequence of length n over an alphabet of size m in n\lg m+o(n\lg m) bits, much more space is typically used for its construction. Here we propose an O(n\lg m)-time online method for the construction of the WT, requiring no prior knowledge about the input alphabet. The proposed algorithm is conceptually simpler than other state-of-the-art methods, while having comparable time performance and being more space-efficient in practice, since it performs just one pass over the input text and uses little extra space other than for the structure itself, as shown both theoretically and empirically.
Keywords
  • Wavelet tree
  • Online construction

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. David Clark. Compact Pat Trees. PhD thesis, University of Waterloo, 1996. Google Scholar
  2. Francisco Claude. LIBCDS: A compressed data structures library (https://github.com/fclaude/libcds2).
  3. Francisco Claude, Patrick K. Nicholson, and Diego Seco. Space Efficient Wavelet Tree Construction. In Proceedings of the 18th Symposium on String Processing and Information Retrieval - SPIRE 2011, pages 185-196, Pisa, 2011. URL: http://dx.doi.org/10.1007/978-3-642-24583-1_19.
  4. Joshimar Cordova and Gonzalo Navarro. Practical Dynamic Entropy-Compressed Bitvectors with Applications. In Proceedings of the 15th International Symposium on Experimental Algorithms - SEA 2016, pages 105-117, 2016. URL: http://dx.doi.org/10.1007/978-3-319-38851-9_8.
  5. Thomas H. Cormen, Charles E. Leiserson, Ronald L Rivest, and Clifford Stein. Introduction To Algorithms. MIT Press, 1990. Google Scholar
  6. Paolo Ferragina, Raffaele Giancarlo, and Giovanni Manzini. The myriad virtues of Wavelet Trees. Information and Computation, 207(8):849-866, 2009. URL: http://dx.doi.org/10.1016/j.ic.2008.12.010.
  7. José Fuentes-Sepúlveda, Erick Elejalde, Leo Ferres, and Diego Seco. Efficient Wavelet Tree Construction and Querying for Multicore Architectures. In Proceedings of the 13th International Symposium on Experimental Algorithms - SEA 2014, pages 150-161, Copenhagen, 2014. URL: http://dx.doi.org/10.1007/978-3-319-07959-2_13.
  8. Travis Gagie, Simon J. Puglisi, and Andrew Turpin. Range Quantile Queries: Another Virtue of Wavelet Trees. In Proceedings of the 16th International Symposium on String Processing and Information Retrieval - SPIRE 2009, pages 1-6, Saariselkä, 2009. http://arxiv.org/abs/arXiv:0903.4726v6, URL: http://dx.doi.org/10.1007/978-3-642-03784-9_1.
  9. Simon Gog, Timo Beller, Alistair Moffat, and Matthias Petri. From Theory to Practice: Plug and Play with Succinct Data Structures. In Proceedings of the 13th International Symposium on Experimental Algorithms - SEA 2014, pages 326-337, Copenhagen, 2014. URL: http://dx.doi.org/10.1007/978-3-319-07959-2_28.
  10. Roberto Grossi, Ankur Gupta, and Jeffrey Scott Vitter. High-Order Entropy-Compressed Text Indexes. In Proceedings of the 14th annual ACM-SIAM Symposium On Discrete Algorithms - SODA 2003, pages 841-850, Philadelphia, 2003. Google Scholar
  11. J. Ian Munro. Tables. In Proceedings of the International Conference on Foundations of Software Technology and Theoretical Computer Science, pages 37-42, Hyderabad, 1996. URL: http://dx.doi.org/10.1007/3-540-62034-6_35.
  12. J. Ian Munro. Personal communication, 2017. Google Scholar
  13. J. Ian Munro, Yakov Nekrich, and Jeffrey S. Vitter. Fast construction of wavelet trees. Theoretical Computer Science, 638:91-97, 2016. URL: http://dx.doi.org/10.1016/j.tcs.2015.11.011.
  14. ISO/IEC. Information technology - Programming languages - C. ISO/IEC 9899:2011 Std, 2011. Google Scholar
  15. Guy Jacobson. Succinct static data structures. PhD thesis, Carnegie Mellon University, 1989. Google Scholar
  16. Veli Mäkinen and Gonzalo Navarro. Dynamic entropy-compressed sequences and full-text indexes. ACM Transactions on Algorithms, 4(3):1-38, 2008. URL: http://dx.doi.org/10.1145/1367064.1367072.
  17. Christos Makris. Wavelet trees: A survey. Computer Science and Information Systems, 9(2):585-625, 2012. URL: http://dx.doi.org/10.2298/CSIS110606004M.
  18. Gonzalo Navarro. Wavelet Trees for All. In Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching - CPM 2012, pages 2-26, Helsinki, 2012. URL: http://dx.doi.org/10.1007/978-3-642-31265-6_2.
  19. Gonzalo Navarro. Compact data structures: a pratical approach. Cambridge Univ Press, 2016. Google Scholar
  20. Gonzalo Navarro and Paolo Ferragina. Pizza&Chili Corpus Website (http://pizzachili.dcc.uchile.cl/).
  21. Gonzalo Navarro and Eliana Providel. Fast, Small, Simple Rank/Select on Bitmaps. In Proceedings of the 11th international Symposium on Experimental Algorithms - SEA 2012, pages 295-306, Bordeaux, 2012. URL: http://dx.doi.org/10.1007/978-3-642-30850-5_26.
  22. Nicholas Nethercote and Julian Seward. Valgrind: a framework for heavyweight dynamic binary instrumentation. In Proceedings of the 2007 ACM SIGPLAN conference on Programming Language Design and Implementation - PLDI 2007, pages 89-100, New York, 2007. URL: http://dx.doi.org/10.1145/1250734.1250746.
  23. Rajeev Raman, Venkatesh Raman, and S Srinivasa Rao. Succinct Indexable Dictionaries with Applications to Encoding k-ary Trees, Prefix Sums and Multisets. In Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms - SODA 2002, pages 233-242, San Francisco, 2002. URL: http://arxiv.org/abs/arXiv:0705.0552v1.
  24. Julian Shun. Parallel Wavelet Tree Construction. In 2015 Data Compression Conference, pages 63-72. IEEE, apr 2015. URL: http://dx.doi.org/10.1109/DCC.2015.7.
  25. German Tischler. On wavelet tree construction. In Proceedings of the 22nd annual conference on Combinatorial pattern matching - CPM 2011, pages 208-218, Palermo, 2011. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail