, Joseph Winjum, Jordan Dood, Hiroki Shibata, Shunsuke Inenaga
Creative Commons Attribution 4.0 International license
Grammar-based compression is a powerful compression technique that allows for computation over the compressed data. While there has been extensive theoretical work on grammar and encoding size, there has been little work on practical grammar encodings. In this work, we consider the canonical array-of-arrays grammar representation and present a general bit packing approach for reducing its space requirements in practice. We then present three bit packing strategies based on this approach (one online and two offline) with different space-time trade-offs. This technique can be used to encode any grammar-compressed string while preserving the virtues of the array-of-arrays representation. We show that our encodings are N log₂ N away from the information-theoretic bound, where N is the number of symbols in the grammar, and that they are much smaller in practice than methods that meet the information-theoretic bound. Moreover, our experiments show that by using bit packed encodings we can achieve state-of-the-art performance both in grammar encoding size and in run-time performance of random-access queries.
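To make the general idea of bit packing an array-of-arrays grammar concrete, here is a minimal sketch. It is not one of the three strategies from the paper, whose details the abstract does not give. It assumes a straight-line grammar over byte terminals in which symbols below 256 are terminals and symbol 256 + i refers to rule i, and it stores every symbol with a single fixed bit width. The names `pack_grammar`, `get_symbol`, and `expand` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

FIRST_NONTERMINAL = 256  # assumption: symbols 0..255 are byte terminals


def pack_grammar(rules: List[List[int]]) -> Tuple[int, int, List[int]]:
    """Pack all rule symbols into one integer using a fixed bit width.

    Returns (packed, width, offsets), where offsets[i] is the index of the
    first symbol of rule i in the flattened symbol sequence.
    """
    max_symbol = max(sym for rule in rules for sym in rule)
    width = max(1, max_symbol.bit_length())  # bits needed per symbol
    packed = 0
    offsets = [0]
    pos = 0
    for rule in rules:
        for sym in rule:
            packed |= sym << (pos * width)  # place symbol at its bit slot
            pos += 1
        offsets.append(pos)
    return packed, width, offsets


def get_symbol(packed: int, width: int, offsets: List[int],
               rule: int, index: int) -> int:
    """Read the index-th symbol of a rule with shift-and-mask arithmetic."""
    start = offsets[rule]
    if not 0 <= index < offsets[rule + 1] - start:
        raise IndexError("symbol index out of range for this rule")
    shift = (start + index) * width
    return (packed >> shift) & ((1 << width) - 1)


def expand(packed: int, width: int, offsets: List[int], rule: int) -> bytes:
    """Recursively expand a rule into the byte string it derives."""
    out = bytearray()
    for i in range(offsets[rule + 1] - offsets[rule]):
        sym = get_symbol(packed, width, offsets, rule, i)
        if sym < FIRST_NONTERMINAL:
            out.append(sym)
        else:
            out += expand(packed, width, offsets, sym - FIRST_NONTERMINAL)
    return bytes(out)


# Toy grammar for "abababab": rule 0 -> "ab", rule 1 -> rule 0 twice,
# rule 2 -> rule 1 twice.
rules = [[97, 98], [256, 256], [257, 257]]
packed, width, offsets = pack_grammar(rules)
text = expand(packed, width, offsets, 2)
```

In this toy grammar the largest symbol is 257, so each of the six symbols takes 9 bits (54 bits in total) instead of a full machine word, while any symbol of any rule remains addressable by position, as in the unpacked array-of-arrays.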
@InProceedings{cleary_et_al:LIPIcs.SEA.2025.12,
author = {Cleary, Alan M. and Winjum, Joseph and Dood, Jordan and Shibata, Hiroki and Inenaga, Shunsuke},
title = {{Bit Packed Encodings for Grammar-Compressed Strings Supporting Fast Random Access}},
booktitle = {23rd International Symposium on Experimental Algorithms (SEA 2025)},
pages = {12:1--12:17},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-375-1},
ISSN = {1868-8969},
year = {2025},
volume = {338},
editor = {Mutzel, Petra and Prezza, Nicola},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2025.12},
URN = {urn:nbn:de:0030-drops-232506},
doi = {10.4230/LIPIcs.SEA.2025.12},
annote = {Keywords: String algorithms, data compression, random access, grammar-based compression}
}