We show that the number of distinct squares in a packed string of length n over an alphabet of size σ can be computed in 𝒪(n/log_{σ}n) time in the word-RAM model of computation. This paper is the first to introduce a sublinear time algorithm for the packed version of squares counting. The packed representation of a string of length n over an alphabet of size σ is given as a sequence of 𝒪(n/ log_{σ} n) machine words in the word-RAM model (a machine word consists of ω ≥ log₂ n bits). Previously it was known how to count distinct squares in 𝒪(n) time [Gusfield and Stoye, JCSS 2004], even for a string over an integer alphabet, see [Crochemore et al., TCS 2014; Bannai et al., CPM 2017; Charalampopoulos et al., SPIRE 2020]. We use techniques of squares extraction from runs described by Crochemore et al. [TCS 2014]. However, the packed model requires novel approaches. In particular, we need an 𝒪(n/log_{σ}n) sized representation of all long-period runs (runs with periods that are Ω(log_{σ}n)) which guarantees sublinear time counting of potentially linearly-many implied squares. The long-period runs with a string period that is periodic itself (called layer runs) are an obstacle, since their number can be Ω(n). Fortunately, the number of all other long-period runs is 𝒪(n/log_{σ}n) and we can construct an implicit representation of all long-period runs in 𝒪(n/log_{σ}n) time by adopting the insights of Amir et al. [ESA 2019], combined with sublinear time tools provided by the PILLAR model of computations in case of packed strings. We count squares in layer runs in sublinear time by exploiting combinatorial properties of types of pyramidally-shaped groups of layer runs. As a by-product, we discover several new structural properties of runs. Another difficulty is to compute, in sublinear time, locations of Lyndon roots of runs in packed strings, which is needed for grouping of runs that can generate equal squares. To overcome this difficulty, we introduce sparse-Lyndon roots which are based on the notion of string synchronizers proposed by Kempa and Kociumaka [STOC 2019].
@InProceedings{charalampopoulos_et_al:LIPIcs.MFCS.2025.36, author = {Charalampopoulos, Panagiotis and Mohamed, Manal and Radoszewski, Jakub and Rytter, Wojciech and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{Counting Distinct Square Substrings in Sublinear Time}}, booktitle = {50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)}, pages = {36:1--36:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-388-1}, ISSN = {1868-8969}, year = {2025}, volume = {345}, editor = {Gawrychowski, Pawe{\l} and Mazowiecki, Filip and Skrzypczak, Micha{\l}}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2025.36}, URN = {urn:nbn:de:0030-drops-241439}, doi = {10.4230/LIPIcs.MFCS.2025.36}, annote = {Keywords: square in a string, packed model, run (maximal repetition), Lyndon word} }
Feedback for Dagstuhl Publishing