Anouk Duyster, Tomasz Kociumaka
Creative Commons Attribution 4.0 International license
A Random Access query to a string T asks for the character T[i] at a given position i ∈ [0..|T|). This fundamental task admits a straightforward solution with constant-time queries and 𝒪(n log σ) bits of space when T ∈ [0..σ)ⁿ. While this is the best one can achieve in the worst case, much research has focused on the compressed setting: if T is compressible, one can hope for a much smaller data structure that still answers Random Access queries efficiently.
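As an illustration of the uncompressed baseline, the following minimal sketch packs each character into ⌈log σ⌉ bits inside an array of fixed-width words and reads it back with a constant number of word operations. The class name `PackedString`, the word size `W = 64`, and the assumption that one character fits in a single word are choices made for this example only.

```python
W = 64  # assumed machine word size in bits (illustrative choice)


class PackedString:
    """Stores a string over [0..sigma) using ceil(log2(sigma)) bits per character."""

    def __init__(self, symbols, sigma):
        self.b = max(1, (sigma - 1).bit_length())  # bits per character
        if self.b > W:
            raise ValueError("this sketch assumes a character fits in one word")
        self.n = len(symbols)
        self.words = [0] * ((self.n * self.b + W - 1) // W)
        mask = (1 << W) - 1
        for i, c in enumerate(symbols):
            if not 0 <= c < sigma:
                raise ValueError("symbol out of range")
            q, r = divmod(i * self.b, W)
            self.words[q] |= (c << r) & mask
            if r + self.b > W:  # the character straddles two words
                self.words[q + 1] |= c >> (W - r)

    def access(self, i):
        """Returns T[i] by touching at most two words."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        q, r = divmod(i * self.b, W)
        value = self.words[q] >> r
        if r + self.b > W:
            value |= self.words[q + 1] << (W - r)
        return value & ((1 << self.b) - 1)
```

Each query reads at most two words, which matches the constant-time claim in the word RAM model when w ≥ log σ.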
In this work, we investigate the grammar-compressed setting, where T is represented by a context-free grammar that produces only T. Our main result is a general trade-off that optimizes Random Access time as a function of the string length n, the grammar size (the total length of productions) g, the alphabet size σ, the data structure size M, and the word size w ≥ Ω(log n) of the word RAM model. For any data structure size M satisfying g log n < Mw < n log σ, we show an 𝒪(M)-size data structure that answers Random Access queries in time 𝒪(log((n log σ)/(Mw)) / log(Mw/(g log n))). We also prove a matching unconditional lower bound that holds for all parameter regimes except very small grammars (g ≤ w^{1+o(1)} log n) and relatively small data structures (Mw ≤ g log n ⋅ w^{o(1)}). The lower bound applies to word-RAM query time and, more strongly, to the worst-case cell-probe complexity of nondeterministic or bounded-error randomized query algorithms.
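To make the grammar-compressed setting concrete, here is a minimal sketch of the textbook approach to Random Access: store the expansion length of every nonterminal and walk down the parse tree. This is not the data structure of this work; its query time is proportional to the grammar height rather than the trade-off stated above. The grammar representation (a dictionary mapping each nonterminal to either a single character or a tuple of nonterminals) and the function names are assumptions made for this example.

```python
def expansion_lengths(rules, start):
    """Computes the expansion length of every nonterminal reachable from start."""
    length = {}
    stack = [start]
    while stack:
        x = stack[-1]
        rhs = rules[x]
        if isinstance(rhs, str):  # terminal rule X -> 'c'
            length[x] = 1
            stack.pop()
            continue
        pending = [y for y in rhs if y not in length]
        if pending:
            stack.extend(pending)  # resolve children first
        else:
            length[x] = sum(length[y] for y in rhs)
            stack.pop()
    return length


def access(rules, length, start, i):
    """Returns T[i] by descending from the start symbol to a leaf."""
    if not 0 <= i < length[start]:
        raise IndexError(i)
    x = start
    while not isinstance(rules[x], str):
        for y in rules[x]:
            if i < length[y]:  # position i falls inside the expansion of y
                x = y
                break
            i -= length[y]  # skip the expansion of y
    return rules[x]


# A small Fibonacci-style grammar (hypothetical example) producing "abaababa".
rules = {
    "X1": "b",
    "X2": "a",
    "X3": ("X2", "X1"),
    "X4": ("X3", "X2"),
    "X5": ("X4", "X3"),
    "X6": ("X5", "X4"),
}
length = expansion_lengths(rules, "X6")
text = "".join(access(rules, length, "X6", i) for i in range(length["X6"]))
```

The descent visits one node per level of the parse tree, which is why balancing or otherwise restructuring the grammar matters for query time.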
Previous work focused on optimizing the query time as a function of n only, achieving 𝒪(log n) time using 𝒪(g) space [Bille, Landau, Raman, Sadakane, Satti, Weimann; SIAM J. Comput. 2015] and 𝒪((log n)/(log log n)) time using 𝒪(g log^ε n) space for any constant ε > 0 [Belazzougui, Cording, Puglisi, Tabei; ESA 2015], [Ganardi, Jeż, Lohrey; J. ACM 2021]. Our result improves upon these bounds (strictly for g = n^{1-o(1)}) and generalizes them beyond M ≤ 𝒪(g polylog n), yielding a smooth interpolation with the uncompressed setting of Mw = n log σ bits.
Thus far, the only tight lower bound [Verbin and Yu; CPM 2013] was Ω((log n)/(log log n)) for w = Θ(log n), n^{Ω(1)} ≤ g ≤ n^{1-Ω(1)}, and M = g⋅log^{Θ(1)} n. In contrast, our result yields a tight bound that accounts for all relevant parameters and is valid for almost all parameter regimes.
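As a sanity check on how the general trade-off relates to the earlier 𝒪((log n)/(log log n)) bound, the following short calculation plugs in one parameter choice. The assumptions w = Θ(log n), M = g log^ε n with constant ε > 0, log σ = 𝒪(log n), and g log^{1+ε} n < n log σ are made here for illustration.

```latex
\begin{align*}
\frac{Mw}{g\log n} &= \Theta(\log^{\varepsilon} n)
  &&\Longrightarrow&
  \log\frac{Mw}{g\log n} &= \Theta(\log\log n),\\
\frac{n\log\sigma}{Mw} &= O\!\left(\frac{n}{g\log^{\varepsilon} n}\right)
  &&\Longrightarrow&
  \log\frac{n\log\sigma}{Mw} &\le \log\frac{n}{g} + O(1),\\
\text{query time} &= O\!\left(\frac{\log\frac{n\log\sigma}{Mw}}{\log\frac{Mw}{g\log n}}\right)
  = O\!\left(\frac{\log(n/g) + 1}{\log\log n}\right)
  \le O\!\left(\frac{\log n}{\log\log n}\right).
\end{align*}
```

Under the same assumptions, g = n^{1-o(1)} gives log(n/g) = o(log n), so the resulting query time is o((log n)/(log log n)), consistent with the strict improvement claimed for that regime.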
Our bounds remain valid for run-length grammars, where production sizes use run-length encoding. This lets us recover (and, for strings with small run-length grammars, improve) the trade-offs achieved by block trees, formulated in terms of the LZ77 size z [Belazzougui, Cáceres, Gagie, Gawrychowski, Kärkkäinen, Navarro, Ordóñez, Puglisi, Tabei; J. Comput. Syst. Sci. 2021] and substring complexity δ [Kociumaka, Navarro, Prezza; IEEE Trans. Inf. Theory 2023].
Our data structure admits an efficient deterministic construction algorithm. Beyond Random Access, its variants also support substring extraction (with optimal additive overhead 𝒪((m log σ)/w) for a length-m substring, provided that M ≥ g), as well as rank and select queries.
All our results rely on novel grammar transformations that generalize contracting grammars [Ganardi; ESA 2021] and achieve the optimal trade-off between grammar size and height while enforcing extra structure crucial for constant-time navigation in the parse tree.
@InProceedings{duyster_et_al:LIPIcs.ICALP.2026.86,
author = {Duyster, Anouk and Kociumaka, Tomasz},
title = {{Random Access in Grammar-Compressed Strings: Optimal Trade-Offs in Almost All Parameter Regimes}},
booktitle = {53rd International Colloquium on Automata, Languages, and Programming (ICALP 2026)},
pages = {86:1--86:25},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-428-4},
ISSN = {1868-8969},
year = {2026},
volume = {374},
editor = {Bhattacharya, Sayan and Nanongkai, Danupon and Benedikt, Michael and Puppis, Gabriele},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2026.86},
URN = {urn:nbn:de:0030-drops-264755},
doi = {10.4230/LIPIcs.ICALP.2026.86},
annote = {Keywords: grammar-based compression, straight-line programs, random access problem}
}