Random Access in Persistent Strings

Authors Philip Bille , Inge Li Gørtz



PDF
Thumbnail PDF

File

LIPIcs.ISAAC.2020.48.pdf
  • Filesize: 2.79 MB
  • 16 pages

Document Identifiers

Author Details

Philip Bille
  • Technical University of Denmark, DTU Compute, Lyngby, Denmark
Inge Li Gørtz
  • Technical University of Denmark, DTU Compute, Lyngby, Denmark

Acknowledgements

We thank the anonymous reviewers for their helpful comments that improved the presentation of this paper.

Cite AsGet BibTex

Philip Bille and Inge Li Gørtz. Random Access in Persistent Strings. In 31st International Symposium on Algorithms and Computation (ISAAC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 181, pp. 48:1-48:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.ISAAC.2020.48

Abstract

We consider compact representations of collections of similar strings that support random access queries. The collection of strings is given by a rooted tree where edges are labeled by an edit operation (inserting, deleting, or replacing a character) and a node represents the string obtained by applying the sequence of edit operations on the path from the root to the node. The goal is to compactly represent the entire collection while supporting fast random access to any part of a string in the collection. This problem captures natural scenarios such as representing the past history of an edited document or representing highly-repetitive collections. Given a tree with n nodes, we show how to represent the corresponding collection in O(n) space and optimal O(log n/ log log n) query time. This improves the previous time-space trade-offs for the problem. To obtain our results, we introduce new techniques and ideas, including a reduction to a new geometric line segment selection together with an efficient solution.

Subject Classification

ACM Subject Classification
  • Theory of computation → Data structures design and analysis
Keywords
  • Data compression
  • data structures
  • persistent strings

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Jérémy Barbay, Francisco Claude, Travis Gagie, Gonzalo Navarro, and Yakov Nekrich. Efficient fully-compressed sequence representations. Algorithmica, 69(1):232-268, 2014. Google Scholar
  2. Jérémy Barbay, Meng He, J Ian Munro, and S Srinivasa Rao. Succinct indexes for strings, binary relations and multi-labeled trees. In Proc. 18th SODA, pages 680-689, 2007. Google Scholar
  3. Djamal Belazzougui, Patrick Hagge Cording, Simon J. Puglisi, and Yasuo Tabei. Access, rank, and select in grammar-compressed strings. In Proc. 23rd ESA, pages 142-154, 2015. Google Scholar
  4. Djamal Belazzougui and Gonzalo Navarro. Optimal lower and upper bounds for representing sequences. ACM Trans. Algorithms, 11(4):1-21, 2015. Google Scholar
  5. Philip Bille, Anders Roy Christiansen, Patrick Hagge Cording, Inge Li Gørtz, Frederik Rye Skjoldjensen, Hjalte Wedel Vildhøj, and Søren Vind. Dynamic relative compression, dynamic partial sums, and substring concatenation. Algorithmica, 80(11):3207-3224, 2018. Announced at ISAAC 2016. Google Scholar
  6. Philip Bille, Anders Roy Christiansen, Nicola Prezza, and Frederik Rye Skjoldjensen. Succinct partial sums and Fenwick trees. In Proc. 24th SPIRE, pages 91-96, 2017. Google Scholar
  7. Philip Bille, Mikko Berggren Ettienne, Inge Li Gørtz, and Hjalte Wedel Vildhøj. Time-space trade-offs for lempel-ziv compressed indexing. Theoret. Comput. Sci., 713:66-77, 2018. Google Scholar
  8. Philip Bille, Inge Li Gørtz, Gad M Landau, and Oren Weimann. Tree compression with top trees. Inform. and Comput., 243:166-177, 2015. Google Scholar
  9. Philip Bille, Gad M. Landau, Rajeev Raman, Kunihiko Sadakane, Srinivasa Rao Satti, and Oren Weimann. Random access to grammar-compressed strings and trees. SIAM J. Comput., 44(3):513-539, 2015. Announced at SODA 2011. Google Scholar
  10. M. Charikar, E. Lehman, D. Liu, R. Panigrahy, M. Prabhakaran, A. Sahai, and A. Shelat. The smallest grammar problem. IEEE Trans. Inform. Theory, 51(7):2554-2576, 2005. Google Scholar
  11. Bernard Chazelle. Filtering search: A new approach to query-answering. SIAM J. Comput., 15(3):703-724, 1986. Google Scholar
  12. BG Chern, Idoia Ochoa, Alexandros Manolakos, Albert No, Kartik Venkat, and Tsachy Weissman. Reference based genome compression. In Proc. 12th ITW, pages 427-431, 2012. Google Scholar
  13. P. F. Dietz. Fully persistent arrays (extended array). In Proc. 1st WADS, volume 382, pages 67-74, 1989. Google Scholar
  14. Paul F Dietz. Optimal algorithms for list indexing and subset rank. In Proc. 1st WADS, pages 39-46, 1989. Google Scholar
  15. Huy Hoang Do, Jesper Jansson, Kunihiko Sadakane, and Wing-Kin Sung. Fast relative Lempel-Ziv self-index for similar sequences. Theoret. Comput. Sci., 532:14-30, 2014. Google Scholar
  16. J. Driscoll, N. Sarnak, D. Sleator, and R. Tarjan. Making data structures persistent. J. Comput. System Sci., 38:86-124, 1989. Google Scholar
  17. Peter M Fenwick. A new data structure for cumulative frequency tables. Software: Pract. Exper., 24(3):327-336, 1994. Google Scholar
  18. Paolo Ferragina, Giovanni Manzini, Veli Mäkinen, and Gonzalo Navarro. Compressed representations of sequences and full-text indexes. ACM Trans. Algorithms, 3(2):20, 2007. Google Scholar
  19. Paolo Ferragina and Rossano Venturini. A simple storage scheme for strings achieving entropy bounds. Theoret. Comput. Sci., 372(1):115-121, 2007. Google Scholar
  20. Michael Fredman and Michael Saks. The cell probe complexity of dynamic data structures. In Proc. 21st STOC, pages 345-354, 1989. Google Scholar
  21. Michael L. Fredman and Dan E. Willard. Surpassing the information theoretic bound with fusion trees. J. Comput. System Sci., 47(3):424-436, 1993. Google Scholar
  22. Michael L. Fredman and Dan E. Willard. Trans-dichotomous algorithms for minimum spanning trees and shortest paths. J. Comput. System Sci., 48(3):533-551, 1994. Google Scholar
  23. Travis Gagie, Paweł Gawrychowski, Juha Kärkkäinen, Yakov Nekrich, and Simon J Puglisi. A faster grammar-based self-index. In Proc. 6th LATA, pages 240-251, 2012. Google Scholar
  24. Travis Gagie, Paweł Gawrychowski, Juha Kärkkäinen, Yakov Nekrich, and Simon J Puglisi. LZ77-based self-indexing with faster pattern matching. In Proc. 14th LATIN, pages 731-742, 2014. Google Scholar
  25. Travis Gagie, Paweł Gawrychowski, and Simon J Puglisi. Approximate pattern matching in lz77-compressed texts. J. Discrete Algorithms, 32:64-68, 2015. Google Scholar
  26. Travis Gagie, Kalle Karhu, Gonzalo Navarro, Simon J Puglisi, and Jouni Sirén. Document listing on repetitive collections. In Proc. 24th CPM, pages 107-119, 2013. Google Scholar
  27. Moses Ganardi, Artur Jez, and Markus Lohrey. Balancing straight-line programs. In Proc. 60th FOCS, pages 1169-1183, 2019. Google Scholar
  28. Alexander Golynski, J Ian Munro, and S Srinivasa Rao. Rank/select operations on large alphabets: a tool for text indexing. In Proc. 17th SODA, pages 368-373, 2006. Google Scholar
  29. Alexander Golynski, Rajeev Raman, and S Srinivasa Rao. On the redundancy of succinct data structures. In Proc. 11th SWAT, pages 148-159, 2008. Google Scholar
  30. Roberto Grossi, Ankur Gupta, and Jeffrey Scott Vitter. High-order entropy-compressed text indexes. In Proc. 14th SODA, pages 841-850, 2003. Google Scholar
  31. Roberto Grossi, Rajeev Raman, Satti Srinivasa Rao, and Rossano Venturini. Dynamic compressed strings with random access. In Proc. 40th ICALP, pages 504-515. Springer, 2013. Google Scholar
  32. Wing-Kai Hon, Kunihiko Sadakane, and Wing-Kin Sung. Succinct data structures for searchable partial sums with optimal worst-case performance. Theoret. Comput. Sci., 412(39):5176-5186, 2011. Google Scholar
  33. Christopher Hoobin, Simon J Puglisi, and Justin Zobel. Relative Lempel-Ziv factorization for efficient storage and retrieval of web collections. Proc. VLDB Endowment, 5(3):265-273, 2011. Google Scholar
  34. Allan Grønlund Jørgensen and Kasper Green Larsen. Range selection and median: Tight cell probe lower bounds and adaptive data structures. In Proc. 22nd SODA, pages 805-813, 2011. Google Scholar
  35. Dominik Kempa and Nicola Prezza. At the roots of dictionary compression: String attractors. In Proc. 50th STOC, pages 827-840, 2018. Google Scholar
  36. Shanika Kuruppu, Simon J Puglisi, and Justin Zobel. Relative Lempel-Ziv compression of genomes for large-scale storage and retrieval. In Proc. 17th SPIRE, pages 201-206, 2010. Google Scholar
  37. Shanika Kuruppu, Simon J Puglisi, and Justin Zobel. Optimized relative Lempel-Ziv compression of genomes. In Proc. 34th ACSC, pages 91-98, 2011. Google Scholar
  38. Stan Y. Liao, Srinivas Devadas, and Kurt Keutzer. A text-compression-based method for code size minimization in embedded systems. Trans. Design Autom. Electr. Syst., 4(1):12-38, 1999. Google Scholar
  39. Stan Y. Liao, Srinivas Devadas, Kurt Keutzer, Steven W. K. Tjiang, and Albert Wang. Code optimization techniques in embedded DSP microprocessors. Design Autom. Emb. Sys., 3(1):59-73, 1998. Google Scholar
  40. Veli Mäkinen, Gonzalo Navarro, Jouni Sirén, and Niko Välimäki. Storage and retrieval of highly repetitive sequence collections. J. Comput. Biol., 17(3):281-308, 2010. Google Scholar
  41. J Ian Munro and Yakov Nekrich. Compressed data structures for dynamic sequences. In Proc. 23rd ESA, pages 891-902. Springer, 2015. Google Scholar
  42. Gonzalo Navarro. Indexing highly repetitive collections. In Proc. 23rd IWOCA, pages 274-279, 2012. Google Scholar
  43. Gonzalo Navarro. Document listing on repetitive collections with guaranteed performance. Theoret. Comput. Sci., 772:58-72, 2019. Google Scholar
  44. Mihai Pătraşcu and Mikkel Thorup. Dynamic integer sets with optimal rank, select, and predecessor search. In Proc. 55th FOCS, pages 166-175, 2014. Google Scholar
  45. Mihai Pǎtraşcu and Erik D. Demaine. Logarithmic lower bounds in the cell-probe model. SIAM J. Comput., 35(4):932-963, 2006. Announced at SODA 2004. Google Scholar
  46. Rajeev Raman, Venkatesh Raman, and S Srinivasa Rao. Succinct dynamic data structures. In Proc. 7th WADS, pages 426-437, 2001. Google Scholar
  47. Wojciech Rytter. Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theoret. Comput. Sci., 302(1-3):211-222, 2003. Google Scholar
  48. Kunihiko Sadakane and Roberto Grossi. Squeezing succinct data structures into entropy bounds. In Proc. 17th SODA, pages 1230-1239, 2006. Google Scholar
  49. Neil Sarnak and Robert Endre Tarjan. Planar point location using persistent search trees. Commun. ACM, 29(7):669-679, 1986. Google Scholar
  50. James A. Storer and Thomas G. Szymanski. The macro model for data compression. In Proc. 10th STOC, pages 30-39, 1978. Google Scholar
  51. James A Storer and Thomas G Szymanski. Data compression via textual substitution. J. ACM, 29(4):928-951, 1982. Google Scholar
  52. Robert Endre Tarjan and Uzi Vishkin. Finding biconnected componemts and computing tree functions in logarithmic parallel time. In Proc. 25th FOCS, pages 12-20, 1984. Google Scholar
  53. Elad Verbin and Wei Yu. Data structure lower bounds on random access to grammar-compressed strings. In Proc. 24th CPM, pages 247-258, 2013. Google Scholar
  54. Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression. IEEE Trans. Inform. Theory, 23(3):337-343, 1977. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail