Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation

Authors Philip Bille, Patrick Hagge Cording, Inge Li Gørtz, Frederik Rye Skjoldjensen, Hjalte Wedel Vildhøj, Søren Vind



PDF
Thumbnail PDF

File

LIPIcs.ISAAC.2016.18.pdf
  • Filesize: 0.49 MB
  • 13 pages

Document Identifiers

Author Details

Philip Bille
Patrick Hagge Cording
Inge Li Gørtz
Frederik Rye Skjoldjensen
Hjalte Wedel Vildhøj
Søren Vind

Cite As Get BibTex

Philip Bille, Patrick Hagge Cording, Inge Li Gørtz, Frederik Rye Skjoldjensen, Hjalte Wedel Vildhøj, and Søren Vind. Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation. In 27th International Symposium on Algorithms and Computation (ISAAC 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 64, pp. 18:1-18:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016) https://doi.org/10.4230/LIPIcs.ISAAC.2016.18

Abstract

Given a static reference string R and a source string S, a relative compression of S with respect to R is an encoding of S as a sequence of references to substrings of R. Relative compression schemes are a classic model of compression and have recently proved very successful for compressing highly-repetitive massive data sets such as genomes and web-data. We initiate the study of relative compression in a dynamic setting where the compressed source string S is subject to edit operations. The goal is to maintain the compressed representation compactly, while supporting edits and allowing efficient random access to the (uncompressed) source string. We present new data structures that achieve optimal time for updates and queries while using space linear in the size of the optimal relative compression, for nearly all combinations of parameters. We also present solutions for restricted and extended sets of updates. To achieve these results, we revisit the dynamic partial sums problem and the substring concatenation problem. We present new optimal or near optimal bounds for these problems. Plugging in our new results we also immediately obtain new bounds for the string indexing for patterns with wildcards problem and the dynamic text and static pattern matching problem.

Subject Classification

Keywords
  • Relative compression
  • dynamic compression
  • dynamic partial sum
  • sub-string concatenation
  • external macro compression

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Stephen Alstrup, Gerth Stølting Brodal, and Theis Rauhe. Pattern matching in dynamic texts. In Proc. 11th SODA, pages 819-828, 2000. Google Scholar
  2. Amihood Amir, Gad M Landau, Moshe Lewenstein, and Dina Sokol. Dynamic text and static pattern matching. ACM TALG, 3(2):19, 2007. Google Scholar
  3. Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, and Sebastiano Vigna. Fast prefix search in little space, with applications. In Proc. 18th ESA, pages 427-438, 2010. Google Scholar
  4. Philip Bille, Patrick Hagge Cording, Inge Li Gørtz, Frederik Rye Skjoldjensen, Hjalte Wedel Vildhøj, and Søren Vind. Dynamic relative compression, dynamic partial sums, and substring concatenation. CoRR, abs/1504.07851, 2015. URL: http://arxiv.org/abs/1504.07851.
  5. Philip Bille, Inge Li Gørtz, Hjalte Wedel Vildhøj, and Søren Vind. String indexing for patterns with wildcards. Theory Comput. Syst., 55(1):41-60, 2014. Google Scholar
  6. B. G. Chern, Idoia Ochoa, Alexandros Manolakos, Albert No, Kartik Venkat, and Tsachy Weissman. Reference based genome compression. In IEEE ITW, pages 427-431, 2012. Google Scholar
  7. Richard Cole, Lee-Ad Gottlieb, and Moshe Lewenstein. Dictionary matching and indexing with errors and don't cares. In Proc. 36th STOC, pages 91-100, 2004. Google Scholar
  8. Paul F. Dietz. Optimal algorithms for list indexing and subset rank. In Proc. 1st WADS, pages 39-46, 1989. Google Scholar
  9. Huy Hoang Do, Jesper Jansson, Kunihiko Sadakane, and Wing-Kin Sung. Fast relative Lempel-Ziv self-index for similar sequences. TCS, 532:14-30, 2014. Google Scholar
  10. Paolo Ferragina and Giovanni Manzini. Indexing compressed text. J. ACM, 52(4):552-581, 2005. Google Scholar
  11. Paolo Ferragina, Giovanni Manzini, Veli Mäkinen, and Gonzalo Navarro. Succinct representation of sequences. Technical report, Università di Pisa, 2004. Google Scholar
  12. Paolo Ferragina and Rossano Venturini. A simple storage scheme for strings achieving entropy bounds. TCS, 372(1):115-121, 2007. Google Scholar
  13. Michael Fredman and Michael Saks. The cell probe complexity of dynamic data structures. In Proc. 21st STOC, pages 345-354, 1989. Google Scholar
  14. Michael L. Fredman and Dan E. Willard. Surpassing the information theoretic bound with fusion trees. J. Comput. System Sci., 47(3):424-436, 1993. Google Scholar
  15. Paweł Gawrychowski, Moshe Lewenstein, and Patrick K Nicholson. Weighted ancestors in suffix trees. In Algorithms - ESA 2014, pages 455-466. Springer, 2014. URL: http://dx.doi.org/10.1007/978-3-662-44777-2_38.
  16. Rodrigo González and Gonzalo Navarro. Compressed text indexes with fast locate. In Combinatorial Pattern Matching, volume 4580, pages 216-227. Springer, 2007. URL: http://dx.doi.org/10.1007/978-3-540-73437-6_23.
  17. Roberto Grossi, Ankur Gupta, and Jeffrey Scott Vitter. High-order entropy-compressed text indexes. In Proc. 14th SODA, pages 841-850, 2003. Google Scholar
  18. Wing-Kai Hon, Kunihiko Sadakane, and Wing-Kin Sung. Succinct data structures for searchable partial sums with optimal worst-case performance. TCS, 412(39):5176-5186, 2011. Google Scholar
  19. Christopher Hoobin, Simon J. Puglisi, and Justin Zobel. Relative Lempel-Ziv factorization for efficient storage and retrieval of web collections. PVLDB, 5(3):265-273, 2011. Google Scholar
  20. Thore Husfeldt and Theis Rauhe. New lower bound techniques for dynamic partial sums and related problems. SIAM J. Comput., 32(3):736-753, 2003. Google Scholar
  21. Jesper Jansson, Kunihiko Sadakane, and Wing-Kin Sung. CRAM: Compressed random access memory. In Automata, Languages, and Programming, pages 510-521. Springer, 2012. URL: http://dx.doi.org/10.1007/978-3-642-31594-7_43.
  22. Brian Kernighan and Dennis Ritchie. The C Programming Language (1st Ed.). Prentice-Hall, 1978. Google Scholar
  23. Shanika Kuruppu, Simon J. Puglisi, and Justin Zobel. Relative Lempel-Ziv compression of genomes for large-scale storage and retrieval. In Proc. 17th SPIRE, pages 201-206, 2010. Google Scholar
  24. Shanika Kuruppu, Simon J. Puglisi, and Justin Zobel. Optimized relative Lempel-Ziv compression of genomes. In Proc. 34th ACSC, pages 91-98, 2011. Google Scholar
  25. Moshe Lewenstein, Yakov Nekrich, and Jeffrey Scott Vitter. Space-efficient string indexing for wildcard pattern matching. In Proc. 31st STACS, pages 506-517, 2014. Google Scholar
  26. Stan Y. Liao, Srinivas Devadas, and Kurt Keutzer. A text-compression-based method for code size minimization in embedded systems. ACM Trans. Design Autom. Electr. Syst., 4(1):12-38, 1999. Google Scholar
  27. Stan Y. Liao, Srinivas Devadas, Kurt Keutzer, Steven W. K. Tjiang, and Albert Wang. Code optimization techniques in embedded DSP microprocessors. Design Autom. for Emb. Sys., 3(1):59-73, 1998. Google Scholar
  28. Gonzalo Navarro and Yakov Nekrich. Optimal dynamic sequence representations. In Proc. 24th SODA, pages 865-876, 2013. Google Scholar
  29. Gonzalo Navarro and Kunihiko Sadakane. Fully functional static and dynamic succinct trees. ACM Trans. Alg., 10(3):16, 2014. Google Scholar
  30. Mihai Pătraşcu and Erik D Demaine. Tight bounds for the partial-sums problem. In Proc. 15th SODA, pages 20-29, 2004. Google Scholar
  31. Mihai Pătraşcu and Mikkel Thorup. Dynamic integer sets with optimal rank, select, and predecessor search. In Proc. 55th FOCS, pages 166-175, 2014. Google Scholar
  32. Rajeev Raman, Venkatesh Raman, and S Srinivasa Rao. Succinct dynamic data structures. In Algorithms and Data Structures, pages 426-437. Springer, 2001. URL: http://dx.doi.org/10.1007/3-540-44634-6_39.
  33. Kunihiko Sadakane and Roberto Grossi. Squeezing succinct data structures into entropy bounds. In Proc. 17th SODA, pages 1230-1239, 2006. Google Scholar
  34. James A. Storer and Thomas G. Szymanski. The macro model for data compression. In Proc. 10th STOC, pages 30-39, 1978. Google Scholar
  35. James A Storer and Thomas G Szymanski. Data compression via textual substitution. J. ACM, 29(4):928-951, 1982. Google Scholar
  36. Bjarne Stroustrup. The C++ Programming Language: Special Edition (3rd Edition). Addison-Wesley, 2000. First edition from 1985. Google Scholar
  37. Dan E. Willard. Examining computational geometry, van emde boas trees, and hashing from the perspective of the fusion tree. SIAM J. Comput., 29(3):1030-1049, 2000. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail