Finger Search in Grammar-Compressed Strings

Authors Philip Bille, Anders Roy Christiansen, Patrick Hagge Cording, Inge Li Gortz



PDF
Thumbnail PDF

File

LIPIcs.FSTTCS.2016.36.pdf
  • Filesize: 0.57 MB
  • 16 pages

Document Identifiers

Author Details

Philip Bille
Anders Roy Christiansen
Patrick Hagge Cording
Inge Li Gortz

Cite As Get BibTex

Philip Bille, Anders Roy Christiansen, Patrick Hagge Cording, and Inge Li Gortz. Finger Search in Grammar-Compressed Strings. In 36th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 65, pp. 36:1-36:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016) https://doi.org/10.4230/LIPIcs.FSTTCS.2016.36

Abstract

Grammar-based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. Given a grammar, the random access problem is to compactly represent the grammar while supporting random access, that is, given a position  in the original uncompressed string report the character at that position. In this paper we study the random access problem with the finger search property, that is, the time for a random access query should depend on the distance between a specified index f, called the finger, and the query index i. We consider both a static variant, where we first place a finger and subsequently access indices near the finger efficiently, and a dynamic variant where also moving the finger such that the time depends on the distance moved is supported. 

Let n be the size the grammar, and let N be the size of the string. For the static variant we give a linear space representation that supports placing the finger in O(log(N)) time and subsequently accessing in O(log(D)) time, where D is the distance between the finger and the accessed index. For the dynamic variant we give a linear space representation that supports placing the finger in O(log(N)) time and accessing and moving the finger in O(log(D) + log(log(N))) time. Compared to the best linear space solution to random access, we improve a O(log(N)) query bound to O(log(D)) for the static variant and to O(log(D) + log(log(N))) for the dynamic variant, while maintaining linear space. As an application of our results we obtain an improved solution to the longest common extension problem in grammar compressed strings. To obtain our results, we introduce several new techniques of independent interest, including a novel van Emde Boas style decomposition of grammars.

Subject Classification

Keywords
  • Compression
  • Grammars
  • Finger search
  • Algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Stephen Alstrup, Thore Husfeldt, and Theis Rauhe. Marked ancestor problems. In Proc. 39th FOCS, pages 534-543, 1998. Google Scholar
  2. A. Apostolico and S. Lonardi. Some theory and practice of greedy off-line textual substitution. In Proc. DCC, pages 119-128, 1998. Google Scholar
  3. A. Apostolico and S. Lonardi. Compression of biological sequences by greedy off-line textual substitution. In Proc. DCC, pages 143-152, 2000. Google Scholar
  4. Alberto Apostolico and Stefano Lonardi. Off-line compression by greedy textual substitution. Proceedings of the IEEE, 88(11):1733-1744, 2000. Google Scholar
  5. D. Belazzougui, T. Gagie, P. Gawrychowski, J. Karkkainen, A. Ordonez, S. J. Puglisi, and Y. Tabei. Queries on lz-bounded encodings. In Proc. DCC, pages 83-92, April 2015. URL: http://dx.doi.org/10.1109/DCC.2015.69.
  6. Djamal Belazzougui, Patrick Hagge Cording, Simon J. Puglisi, and Yasuo Tabei. Access, rank, and select in grammar-compressed strings. In Proc. 23rd ESA, 2015. Google Scholar
  7. Jon Louis Bentley and Andrew Chi-Chih Yao. An almost optimal algorithm for unbounded searching. Inform. Process. Lett., 5(3):82-87, 1976. Google Scholar
  8. Philip Bille, Patrick Hagge Cording, and Inge Li Gørtz. Compressed subsequence matching and packed tree coloring. Algorithmica, pages 1-13, 2015. URL: http://dx.doi.org/10.1007/s00453-015-0068-9.
  9. Philip Bille, Patrick Hagge Cording, Inge Li Gørtz, Benjamin Sach, Hjalte Wedel Vildhøj, and Søren Vind. Fingerprints in compressed strings. In Proc. 13th SWAT, 2013. Google Scholar
  10. Philip Bille, Gad M. Landau, Rajeev Raman, Kunihiko Sadakane, Srinivasa Rao Satti, and Oren Weimann. Random access to grammar-compressed strings and trees. SIAM J. Comput, 44(3):513-539, 2014. Announced at SODA 2011. Google Scholar
  11. Guy E. Blelloch, Bruce M. Maggs, and Shan Leung Maverick Woo. Space-efficient finger search on degree-balanced search trees. In Proc. 14th SODA, pages 374-383, 2003. Google Scholar
  12. Gerth Stølting Brodal. Finger search trees. In Handbook of Data Structures and Applications. Chapman and Hall/CRC, 2004. Google Scholar
  13. Gerth Stølting Brodal, George Lagogiannis, Christos Makris, Athanasios K. Tsakalidis, and Kostas Tsichlas. Optimal finger search trees in the pointer machine. J. Comput. Syst. Sci., 67(2):381-418, 2003. URL: http://dx.doi.org/10.1016/S0022-0000(03)00013-8.
  14. M. Charikar, E. Lehman, D. Liu, R. Panigrahy, M. Prabhakaran, A. Sahai, and A. Shelat. The smallest grammar problem. IEEE Trans. Inf. Theory, 51(7):2554-2576, 2005. Announced at STOC 2002 and SODA 2002. Google Scholar
  15. Francisco Claude and Gonzalo Navarro. Self-indexed grammar-based compression. Fund. Inform., 111(3):313-337, 2011. Google Scholar
  16. Patrick Hagge Cording, Paweł Gawrychowski, and Oren Weimann. Bookmarks in grammar-compressed strings. In Proc. 23rd SPIRE, pages x-y, 2016. Google Scholar
  17. Paul F. Dietz and Rajeev Raman. A constant update time finger search tree. Inf. Process. Lett., 52(3):147-154, 1994. Google Scholar
  18. Martin Farach and S. Muthukrishnan. Perfect hashing for strings: Formalization and algorithms. In Proc. 7th CPM, pages 130-140. Springer, 1996. Google Scholar
  19. Rudolf Fleischer. A simple balanced search tree with O(1) worst-case update time. Int. J. Found. Comput. Sci., 7(2):137-150, 1996. URL: http://dx.doi.org/10.1142/S0129054196000117.
  20. P. Gage. A new algorithm for data compression. The C Users J., 12(2):23-38, 1994. Google Scholar
  21. Travis Gagie, Paweł Gawrychowski, Juha Kärkkäinen, Yakov Nekrich, and Simon J. Puglisi. A faster grammar-based self-index. In Proc. 6th LATA, pages 240-251, 2012. Google Scholar
  22. Travis Gagie, Paweł Gawrychowski, Juha Kärkkäinen, Yakov Nekrich, and Simon J. Puglisi. LZ77-based self-indexing with faster pattern matching. In Proc. 11th LATIN, pages 731-742. Springer, 2014. Google Scholar
  23. Travis Gagie, Pawel Gawrychowski, and Simon J. Puglisi. Approximate pattern matching in lz77-compressed texts. J. Discrete Algorithms, 32:64-68, 2015. URL: http://dx.doi.org/10.1016/j.jda.2014.10.003.
  24. Travis Gagie, Christopher Hoobin, and Simon J. Puglisi. Block graphs in practice. In Proc. ICABD, pages 30-36, 2014. Google Scholar
  25. Leszek Ga̧sieniec, Roman Kolpakov, Igor Potapov, and Paul Sant. Real-time traversal in grammar-based compressed files. In Proc. 15th DCC, page 458, 2005. Google Scholar
  26. Keisuke Goto, Hideo Bannai, Shunsuke Inenaga, and Masayuki Takeda. LZD factorization: Simple and practical online grammar compression with variable-to-fixed encoding. In Proc. 26th CPM, pages 219-230. Springer, 2015. Google Scholar
  27. Leonidas J. Guibas, Edward M. McCreight, Michael F. Plass, and Janet R. Roberts. A new representation for linear lists. In Proc. 9th STOC, pages 49-60, 1977. Google Scholar
  28. Tomohiro I, Wataru Matsubara, Kouji Shimohira, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Kazuyuki Narisawa, and Ayumi Shinohara. Detecting regularities on grammar-compressed strings. Inform. Comput., 240:74-89, 2015. Google Scholar
  29. J. C. Kieffer and E. H. Yang. Grammar based codes: A new class of universal lossless source codes. IEEE Trans. Inf. Theory, 46(3):737-754, 2000. Google Scholar
  30. J. C. Kieffer, E. H. Yang, G. J. Nelson, and P. Cosman. Universal lossless compression via multilevel pattern matching. IEEE Trans. Inf. Theory, 46(5):1227-1245, 2000. Google Scholar
  31. S. Rao Kosaraju. Localized search in sorted lists. In Proc. 13th STOC, pages 62-69, New York, NY, USA, 1981. URL: http://dx.doi.org/10.1145/800076.802458.
  32. N. Jesper Larsson and Alistair Moffat. Off-line dictionary-based compression. Proc. IEEE, 88(11):1722-1732, 2000. Google Scholar
  33. Kurt Mehlhorn. A new data structure for representing sorted lists. In Proc. WG, pages 90-112, 1981. Google Scholar
  34. Gonzalo Navarro and Alberto Ordónez. Grammar compressed sequences with rank/select support. In 21st SPIRE, pages 31-44. Springer, 2014. Google Scholar
  35. Craig G. Nevill-Manning and Ian H. Witten. Identifying Hierarchical Structure in Sequences: A linear-time algorithm. J. Artificial Intelligence Res., 7:67-82, 1997. Google Scholar
  36. Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Fully dynamic data structure for LCE queries in compressed space. In Proc. 41st MFCS, pages 72:1-72:15, 2016. URL: http://dx.doi.org/10.4230/LIPIcs.MFCS.2016.72.
  37. William Pugh. Skip lists: A probabilistic alternative to balanced trees. Commun. ACM, 33(6):668-676, 1990. Google Scholar
  38. W. Rytter. Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci., 302(1-3):211-222, 2003. Google Scholar
  39. Raimund Seidel and Cecilia R. Aragon. Randomized search trees. Algorithmica, 16(4/5):464-497, 1996. Google Scholar
  40. Y. Shibata, T. Kida, S. Fukamachi, M. Takeda, A. Shinohara, T. Shinohara, and S. Arikawa. Byte Pair encoding: A text compression scheme that accelerates pattern matching. Technical Report DOI-TR-161, Dept. of Informatics, Kyushu University, 1999. Google Scholar
  41. Daniel Dominic Sleator and Robert Endre Tarjan. Self-adjusting binary search trees. J. ACM, 32(3):652-686, July 1985. Google Scholar
  42. Toshiya Tanaka, I Tomohiro, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Computing convolution on grammar-compressed text. In Proc. 23rd DCC, pages 451-460, 2013. Google Scholar
  43. I Tomohiro, Takaaki Nishimoto, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Compressed automata for dictionary matching. Theor. Comput. Sci., 578:30-41, 2015. Google Scholar
  44. P. van Emde Boas, R. Kaas, and E. Zijlstra. Design and implementation of an efficient priority queue. Theory Comput. Syst., 10(1):99-127, 1976. Google Scholar
  45. Elad Verbin and Wei Yu. Data structure lower bounds on random access to grammar-compressed strings. In Proc. 24th CPM, pages 247-258, 2013. Google Scholar
  46. Terry A. Welch. A technique for high-performance data compression. IEEE Computer, 17(6):8-19, 1984. Google Scholar
  47. E. H. Yang and J. C. Kieffer. Efficient universal lossless data compression algorithms based on a greedy sequential grammar transform - part one: Without context models. IEEE Trans. Inf. Theory, 46(3):755-754, 2000. Google Scholar
  48. Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory, 23(3):337-343, 1977. Google Scholar
  49. Jacob Ziv and Abraham Lempel. Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory, 24(5):530-536, 1978. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail