Creative Commons Attribution 4.0 International license
A Straight-Line Program (SLP) for a string T is a context-free grammar in Chomsky normal form that derives T only, which can be seen as a compressed form of T. Kida et al. introduced collage systems [Theor. Comput. Sci., 2003] to generalize SLPs by adding repetition rules and truncation rules. The smallest size c(T) of collage systems for T has gained attention to see how these generalized rules improve the compression ability of SLPs. Navarro et al. [IEEE Trans. Inf. Theory, 2021] showed that c(T) ∈ O(z(T)) and there is a string family with c(T) ∈ Ω(b(T) log |T|), where z(T) is the number of phrases in the Lempel-Ziv parsing of T and b(T) is the smallest size of bidirectional schemes for T. They also introduced a subclass of collage systems, called internal collage systems, and proved that its smallest size ĉ(T) for T is at least b(T). While c(T) ≤ ĉ(T) is obvious, it is unknown how large ĉ(T) is compared to c(T). In this paper, we prove that ĉ(T) = Θ(c(T)) by showing that any collage system of size m can be transformed into an internal collage system of size O(m) in O(m²) time. Thanks to this result, we can focus on internal collage systems to study the asymptotic behavior of c(T), which helps to suppress excess use of truncation rules. As a direct application, we get b(T) = O(c(T)), which answers an open question posed in [Navarro et al., IEEE Trans. Inf. Theory, 2021]. We also give a MAX-SAT formulation to compute ĉ(T) for a given T.
@InProceedings{migita_et_al:LIPIcs.CPM.2026.31,
author = {Migita, Soichiro and Uehata, Kyotaro and I, Tomohiro},
title = {{On the Smallest Size of Internal Collage Systems}},
booktitle = {37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
pages = {31:1--31:14},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-420-8},
ISSN = {1868-8969},
year = {2026},
volume = {369},
editor = {Bille, Philip and Prezza, Nicola},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.31},
URN = {urn:nbn:de:0030-drops-259575},
doi = {10.4230/LIPIcs.CPM.2026.31},
annote = {Keywords: Collage Systems, Dictionary-based compression, Compressibility measures}
}