Creative Commons Attribution 4.0 International license
Suppose that we are given a string s of length n over an alphabet {0,1,…,n^O(1)}, and let δ be the string complexity of s, a known compression measure. We describe an index on s that uses O(δ log(n/δ)) space, measured in O(log n)-bit machine words, and can search s for any string of length m in O(m + (occ + 1) log^ε n) time, where occ is the number of occurrences and ε > 0 is any fixed constant (the big-O in the space bound hides a factor of 1/ε). Crucially, the index can be built in O(n log n) expected time by one left-to-right pass over s in a streaming fashion, using O(δ log(n/δ)) construction space. The index does not use Karp-Rabin fingerprints, and the randomization in the construction time can be eliminated by using deterministic dictionaries instead of hash tables (with a slowdown). The search time matches the best currently known results, and the space is almost optimal (the known optimum is O(δ log(n/(δα))), where α = log_σ n and σ is the alphabet size, and it coincides with O(δ log(n/δ)) when δ = O(n/α²)). This is the first index that can be constructed within such space and with such time guarantees. To avoid uninteresting marginal cases, all of the above bounds are stated for δ ≥ Ω(log log n).
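The abstract refers to δ without defining it. As a rough illustration only, the sketch below assumes the definition commonly used in the literature on this measure (δ is the maximum over k of d_k(s)/k, where d_k(s) is the number of distinct substrings of s of length k); that definition is not stated in the abstract itself, and the function name `substring_complexity` is hypothetical. The brute-force loop is meant to make the measure concrete on tiny inputs and has nothing to do with how the paper's index is built.

```python
def substring_complexity(s: str) -> float:
    """Brute-force delta(s) = max over k in 1..n of d_k(s) / k.

    d_k(s) is the number of distinct length-k substrings of s.
    Assumed definition (not given in the abstract); for illustration only,
    since this takes far more time and space than any practical method.
    """
    n = len(s)
    best = 0.0
    for k in range(1, n + 1):
        # Collect every distinct window of length k.
        distinct = {s[i:i + k] for i in range(n - k + 1)}
        best = max(best, len(distinct) / k)
    return best


if __name__ == "__main__":
    for text in ["aaaa", "abab", "abcd"]:
        print(text, substring_complexity(text))
```

Under this assumed definition, highly repetitive strings have small δ relative to n, which is the regime where a space bound of O(δ log(n/δ)) is much smaller than the string itself.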
@InProceedings{kosolobov:LIPIcs.CPM.2026.25,
author = {Kosolobov, Dmitry},
title = {{Compressed Index with Construction in Compressed Space}},
booktitle = {37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
pages = {25:1--25:24},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-420-8},
ISSN = {1868-8969},
year = {2026},
volume = {369},
editor = {Bille, Philip and Prezza, Nicola},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.25},
URN = {urn:nbn:de:0030-drops-259515},
doi = {10.4230/LIPIcs.CPM.2026.25},
annote = {Keywords: compressed index, pattern matching, string complexity, grammar, block tree}
}