Compression with the tudocomp Framework

Authors Patrick Dinklage, Johannes Fischer, Dominik Köppl, Marvin Löbel, Kunihiko Sadakane



PDF
Thumbnail PDF

File

LIPIcs.SEA.2017.13.pdf
  • Filesize: 0.87 MB
  • 22 pages

Document Identifiers

Author Details

Patrick Dinklage
Johannes Fischer
Dominik Köppl
Marvin Löbel
Kunihiko Sadakane

Cite As Get BibTex

Patrick Dinklage, Johannes Fischer, Dominik Köppl, Marvin Löbel, and Kunihiko Sadakane. Compression with the tudocomp Framework. In 16th International Symposium on Experimental Algorithms (SEA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 75, pp. 13:1-13:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/LIPIcs.SEA.2017.13

Abstract

We present a framework facilitating the implementation and comparison of text compression algorithms. We evaluate its features by a case study on two novel compression algorithms based on the Lempel-Ziv compression schemes that perform well on highly repetitive texts.

Subject Classification

Keywords
  • lossless compression
  • compression framework
  • compression library
  • algorithm engineering
  • application of string algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Jyrki Alakuijala and Zoltan Szabadka. Brotli Compressed Data Format. RFC 7932, 2016. Google Scholar
  2. David Benoit, Erik D. Demaine, J. Ian Munro, Rajeev Raman, Venkatesh Raman, and S. Srinivasa Rao. Representing trees of higher degree. Algorithmica, 43(4):275-292, 2005. Google Scholar
  3. M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation, 1994. Google Scholar
  4. David R. Clark. Compact Pat Trees. PhD thesis, University of Waterloo, Canada, 1996. Google Scholar
  5. Jarek Duda, Khalid Tahboub, Neeraj J. Gadgil, and Edward J. Delp. The use of asymmetric numeral systems as an accurate replacement for Huffman coding. In Proc. PCS, pages 65-69. IEEE Computer Society, 2015. Google Scholar
  6. Peter Elias. Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory, 21(2):194-203, 1975. Google Scholar
  7. Andrea Farruggia, Paolo Ferragina, and Rossano Venturini. Bicriteria data compression: Efficient and usable. In Proc. ESA, volume 8737 of LNCS, pages 406-417. Springer, 2014. Google Scholar
  8. Paolo Ferragina, Igor Nitto, and Rossano Venturini. On the bit-complexity of Lempel-Ziv compression. SIAM J. Comput., 42(4):1521-1541, 2013. Google Scholar
  9. Paolo Ferragina, Francesco Piccinno, and Roberto Santoro. On analyzing hashtags in Twitter. In Proc. ICWSM, pages 110-119, 2015. Google Scholar
  10. Johannes Fischer, Tomohiro I, and Dominik Köppl. Lempel-Ziv computation in small space (LZ-CISS). In Proc. CPM, volume 9133 of LNCS, pages 172-184. Springer, 2015. Google Scholar
  11. Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-oriented Software. Addison-Wesley, first edition, 1995. Google Scholar
  12. Simon Gog, Timo Beller, Alistair Moffat, and Matthias Petri. From theory to practice: Plug and play with succinct data structures. In Proc. SEA, volume 8504 of LNCS, pages 326-337. Springer, 2014. Google Scholar
  13. Roberto Grossi and Giuseppe Ottaviano. Design of practical succinct data structures for large data collections. In Proc. SEA, volume 7933 of LNCS, pages 5-17. Springer, 2013. Google Scholar
  14. Jan Holub, Jakub Reznicek, and Filip Simek. Lossless data compression testbed: ExCom and Prague corpus. In Proc. DCC, page 457. IEEE Computer Society, 2011. Google Scholar
  15. Guy Joseph Jacobson. Space-efficient static trees and graphs. In Proc. FOCS, pages 549-554. IEEE Computer Society, 1989. Google Scholar
  16. Juha Kärkkäinen, Dominik Kempa, and Simon J. Puglisi. Linear time Lempel-Ziv factorization: Simple, fast, small. In Proc. CPM, volume 7922 of LNCS, pages 189-200. Springer, 2013. Google Scholar
  17. Juha Kärkkäinen, Giovanni Manzini, and Simon John Puglisi. Permuted longest-common-prefix array. In Proc. CPM, volume 5577 of LNCS, pages 181-192. Springer, 2009. Google Scholar
  18. Juha Kärkkäinen, Peter Sanders, and Stefan Burkhardt. Linear work suffix array construction. J. ACM, 53(6):1-19, 2006. Google Scholar
  19. Dominik Köppl and Kunihiko Sadakane. Lempel-Ziv computation in compressed space (LZ-CICS). In Proc. DCC, pages 3-12. IEEE Computer Society, 2016. Google Scholar
  20. N. Jesper Larsson and Alistair Moffat. Offline dictionary-based compression. In Proc. DCC, pages 296-305. IEEE Computer Society, 1999. Google Scholar
  21. Udi Manber and Eugene W. Myers. Suffix arrays: A new method for on-line string searches. SIAM J. Comput., 22(5):935-948, 1993. Google Scholar
  22. Wataru Matsubara, Kazuhiko Kusano, Hideo Bannai, and Ayumi Shinohara. A series of run-rich strings. In Proc. LATA, volume 5457 of LNCS, pages 578-587. Springer, 2009. Google Scholar
  23. Kunihiko Sadakane. Compressed suffix trees with full functionality. Theory of Computing Systems, 41(4):589-607, 2007. Google Scholar
  24. Maria Simi and Giuseppe Attardi. Adapting the tanl tool suite to universal dependencies. In Proc. LREC. European Language Resources Association, 2016. Google Scholar
  25. James A. Storer and Thomas G. Szymanski. Data compression via textural substitution. J. ACM, 29(4):928-951, 1982. Google Scholar
  26. Peter Weiner. Linear pattern matching algorithms. In Proc. Annual Symp. on Switching and Automata Theory, pages 1-11. IEEE Computer Society, 1973. Google Scholar
  27. Terry A. Welch. A technique for high-performance data compression. Computer, 17(6):8-19, 1984. Google Scholar
  28. Hugh E. Williams and Justin Zobel. Compressing integers for fast file access. Comput. J., 42(3):193-201, 1999. Google Scholar
  29. Ian H Witten, Alistair Moffat, and Timothy C Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann, 2nd edition, 1999. Google Scholar
  30. Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression. IEEE Trans. Inform. Theory, 23(3):337-343, 1977. Google Scholar
  31. Jacob Ziv and Abraham Lempel. Compression of individual sequences via variable length coding. IEEE Trans. Inform. Theory, 24(5):530-536, 1978. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail