Bidirectional Text Compression in External Memory

Authors Patrick Dinklage, Jonas Ellert, Johannes Fischer, Dominik Köppl , Manuel Penschuck



PDF
Thumbnail PDF

File

LIPIcs.ESA.2019.41.pdf
  • Filesize: 0.84 MB
  • 16 pages

Document Identifiers

Author Details

Patrick Dinklage
  • Technische Universität Dortmund, Department of Computer Science, Germany
Jonas Ellert
  • Technische Universität Dortmund, Department of Computer Science, Germany
Johannes Fischer
  • Technische Universität Dortmund, Department of Computer Science, Germany
Dominik Köppl
  • Kyushu University, Fukuoka, Japan Society for Promotion of Science, Japan
Manuel Penschuck
  • Goethe University Frankfurt, Department of Computer Science, Germany

Cite AsGet BibTex

Patrick Dinklage, Jonas Ellert, Johannes Fischer, Dominik Köppl, and Manuel Penschuck. Bidirectional Text Compression in External Memory. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 41:1-41:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.ESA.2019.41

Abstract

Bidirectional compression algorithms work by substituting repeated substrings by references that, unlike in the famous LZ77-scheme, can point to either direction. We present such an algorithm that is particularly suited for an external memory implementation. We evaluate it experimentally on large data sets of size up to 128 GiB (using only 16 GiB of RAM) and show that it is significantly faster than all known LZ77 compressors, while producing a roughly similar number of factors. We also introduce an external memory decompressor for texts compressed with any uni- or bidirectional compression scheme.

Subject Classification

ACM Subject Classification
  • Theory of computation → Data compression
Keywords
  • text compression
  • bidirectional parsing
  • text decompression
  • external algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Alok Aggarwal and Jeffrey Scott Vitter. The Input/Output Complexity of Sorting and Related Problems. Commun. ACM, 31(9):1116-1127, 1988. Google Scholar
  2. Djamal Belazzougui, Juha Kärkkäinen, Dominik Kempa, and Simon J. Puglisi. Lempel-Ziv decoding in external memory. In Proc. SEA, volume 9685 of LNCS, pages 63-74, 2016. Google Scholar
  3. Timothy C. Bell. Better OPM/L Text Compression. IEEE Trans. Communications, 34(12):1176-1182, 1986. Google Scholar
  4. M. Burrows and D. J. Wheeler. A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation, Palo Alto, California, 1994. Google Scholar
  5. Roman Dementiev, Lutz Kettner, and Peter Sanders. STXXL: standard template library for XXL data sets. Softw., Pract. Exper., 38(6):589-637, 2008. Google Scholar
  6. Patrick Dinklage, Johannes Fischer, Dominik Köppl, Marvin Löbel, and Kunihiko Sadakane. Compression with the tudocomp Framework. In Proc. SEA, volume 75 of LIPIcs, pages 13:1-13:22, 2017. URL: http://arxiv.org/abs/1702.07577.
  7. Paolo Ferragina, Travis Gagie, and Giovanni Manzini. Lightweight Data Indexing and Compression in External Memory. Algorithmica, 63(3):707-730, 2012. Google Scholar
  8. Paolo Ferragina, Igor Nitto, and Rossano Venturini. On the Bit-Complexity of Lempel-Ziv Compression. SIAM J. Comput., 42(4):1521-1541, 2013. Google Scholar
  9. Johannes Fischer, Tomohiro I, Dominik Köppl, and Kunihiko Sadakane. Lempel-Ziv Factorization Powered by Space Efficient Suffix Trees. Algorithmica, 80(7):2048-2081, 2018. Google Scholar
  10. Travis Gagie, Gonzalo Navarro, and Nicola Prezza. On the Approximation Ratio of Lempel-Ziv Parsing. In Proc. LATIN, volume 10807 of LNCS, pages 490-503, 2018. Google Scholar
  11. J. K. Gallant. String compression algorithms. PhD thesis, Princeton University, 1982. Google Scholar
  12. Keisuke Goto and Hideo Bannai. Simpler and Faster Lempel Ziv Factorization. In Proc. DCC, pages 133-142, 2013. Google Scholar
  13. Keisuke Goto and Hideo Bannai. Space Efficient Linear Time Lempel-Ziv Factorization for Small Alphabets. In Proc. DCC, pages 163-172, 2014. Google Scholar
  14. Joseph JáJá. An Introduction to Parallel Algorithms. Addison-Wesley, 1992. Google Scholar
  15. Juha Kärkkäinen and Dominik Kempa. LCP array construction in external memory. ACM Journal of Experimental Algorithmics, 21(1):1.7:1-1.7:22, 2016. Google Scholar
  16. Juha Kärkkäinen and Dominik Kempa. Engineering External Memory LCP Array Construction: Parallel, In-Place and Large Alphabet. In Proc. SEA, volume 75 of LIPIcs, pages 17:1-17:14, 2017. Google Scholar
  17. Juha Kärkkäinen, Dominik Kempa, and Simon J. Puglisi. Lempel-Ziv parsing in external memory. In Proc. DCC, pages 153-162, 2014. Google Scholar
  18. Juha Kärkkäinen, Dominik Kempa, and Simon J. Puglisi. Parallel External Memory Suffix Sorting. In Proc. CPM, volume 9133 of LNCS, pages 329-342, 2015. Google Scholar
  19. Juha Kärkkäinen, Dominik Kempa, and Simon John Puglisi. Lightweight Lempel-Ziv Parsing. In Proc. SEA, volume 7933 of LNCS, pages 139-150. Springer, 2013. Google Scholar
  20. Juha Kärkkäinen, Giovanni Manzini, and Simon J. Puglisi. Permuted Longest-Common-Prefix Array. In Proc. CPM, volume 5577 of LNCS, pages 181-192, 2009. Google Scholar
  21. Dominik Kempa and Dmitry Kosolobov. LZ-end parsing in compressed space. In Proc. DCC, pages 350-359, 2017. Google Scholar
  22. Dominik Kempa and Nicola Prezza. At the roots of dictionary compression: string attractors. In Proc. STOC, pages 827-840, 2018. Google Scholar
  23. Sebastian Kreft and Gonzalo Navarro. LZ77-like compression with fast random access. In Proc. DCC, pages 239-248, 2010. Google Scholar
  24. Udi Manber and Eugene W. Myers. Suffix Arrays: A New Method for On-Line String Searches. SIAM J. Comput., 22(5):935-948, 1993. Google Scholar
  25. Markus Mauer, Timo Beller, and Enno Ohlebusch. A Lempel-Ziv-style Compression Method for Repetitive Texts. In Proc. PSC, pages 96-107, 2017. Google Scholar
  26. Ryosuke Nakamura, Shunsuke Inenaga, Hideo Bannai, Takashi Funamoto, Masayuki Takeda, and Ayumi Shinohara. Linear-Time Text Compression by Longest-First Substitution. Algorithms, 2(4):1429-1448, 2009. Google Scholar
  27. Akihiro Nishi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda.𝒪(n log n)-time text compressionby LZ-style longest first substitution. In Proc. PSC, pages 12-26, 2018. Google Scholar
  28. Takaaki Nishimoto and Yasuo Tabei. LZRR: LZ77 parsing with right reference. arXiv 1812.04261, 2018. URL: http://arxiv.org/abs/1812.04261.
  29. James A. Storer and Thomas G. Szymanski. Data compression via textural substitution. J. ACM, 29(4):928-951, 1982. Google Scholar
  30. Vladimir Yanovsky. ReCoil - an algorithm for compression of extremely large datasets of DNA data. Algorithms for Molecular Biology, 6(23):1-9, 2011. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail