DROPS

Document

DOI: 10.4230/LIPIcs.STACS.2026.36

Time-Optimal Construction of String Synchronizing Sets

Authors: Jonas Ellert and Tomasz Kociumaka

Published in: LIPIcs, Volume 364, 43rd International Symposium on Theoretical Aspects of Computer Science (STACS 2026)

Abstract

A powerful design principle behind many modern string algorithms is local consistency: breaking the symmetry between string positions based on their small contexts so that matching fragments are handled consistently. Among the most influential instantiations of this principle are string synchronizing sets [Kempa & Kociumaka; STOC 2019]. A τ-synchronizing set of a string of length n is a set of O(n/τ) string positions, chosen using their length-2τ contexts, such that (outside of highly periodic regions) every block of τ consecutive positions contains at least one element of the set. Synchronizing sets have found dozens of applications in diverse settings, from quantum and dynamic algorithms to fully compressed computation. In the classic word RAM model, particularly for strings over small alphabets, they enabled faster solutions to core problems in data compression, text indexing, and string similarity. In this work, we show that any string T ∈ [0 .. σ)ⁿ can be preprocessed in O(n log σ / log n) time so that, for any given integer τ ∈ [1 .. n], a τ-synchronizing set of T can be constructed in O((n log τ)/(τ log n)) time. Both bounds are optimal in the word RAM model with machine word size w = Θ(log n), matching the information-theoretic minimum for the input and output sizes, respectively. Previously, constructing a τ-synchronizing set required O(n/τ) time after an O(n)-time preprocessing [Kociumaka, Radoszewski, Rytter, and Waleń; SICOMP 2024], or, in the restricted regime of τ < 0.2 log_σ n, without any preprocessing needed [Kempa & Kociumaka; STOC 2019]. A simple instantiation of our method outputs the synchronizing set as a sorted list in O(n/τ) time, or as a bitmask in O(n/log n) time. Our optimal construction produces a compact fully indexable dictionary, supporting select queries in O(1) time and rank queries in O(log ((log τ)/(log log n))) time. The latter complexity matches known unconditional cell-probe lower bounds for τ ≤ n^{1-Ω(1)}. To achieve this, we introduce a general framework for efficiently processing sparse integer sequences via a custom variable-length encoding. We also augment the optimal variant of van Emde Boas trees [Pătraşcu & Thorup; STOC 2006] with a deterministic linear-time construction. When the set is represented as a bitmask under our sparse encoding, the same guarantees for select and rank queries hold after preprocessing in time proportional to the size of our encoding (in words).

Cite as

Jonas Ellert and Tomasz Kociumaka. Time-Optimal Construction of String Synchronizing Sets. In 43rd International Symposium on Theoretical Aspects of Computer Science (STACS 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 364, pp. 36:1-36:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

Copy BibTex To Clipboard

@InProceedings{ellert_et_al:LIPIcs.STACS.2026.36,
  author =	{Ellert, Jonas and Kociumaka, Tomasz},
  title =	{{Time-Optimal Construction of String Synchronizing Sets}},
  booktitle =	{43rd International Symposium on Theoretical Aspects of Computer Science (STACS 2026)},
  pages =	{36:1--36:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-412-3},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{364},
  editor =	{Mahajan, Meena and Manea, Florin and McIver, Annabelle and Thắng, Nguy\~{ê}n Kim},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2026.36},
  URN =		{urn:nbn:de:0030-drops-255258},
  doi =		{10.4230/LIPIcs.STACS.2026.36},
  annote =	{Keywords: synchronizing sets, local consistency, packed strings}
}

Document

DOI: 10.4230/LIPIcs.SEA.2025.8

A Simple Integer Successor-Delete Data Structure

Authors: Gerth Stølting Brodal

Published in: LIPIcs, Volume 338, 23rd International Symposium on Experimental Algorithms (SEA 2025)

Abstract

We consider a simple decremental data structure for maintaining a set of integers, that supports initializing the set to {1,2,…,n} followed by d deletions and s successor queries in arbitrary order in total 𝒪(n+d+s⋅(1+log_{max(2,s/n)} min(s,n))) time. The data structure consists of a single array of n integers. A straightforward modification allows the data structure to also support p predecessor and r range queries, with a total output k, in total 𝒪(n+d+k+q ⋅ (1+log_{max(2,q/n)} min(q,n))) time, where q = s+p+r. The data structure is essentially a special case of the classic union-find data structure with path compression but with unweighted linking (i.e., without linking by rank or size), that is known to achieve logarithmic amortized time bounds (Tarjan and van Leeuwen, 1984). In this paper we study the efficiency of this simple data structure, and compare it to other, theoretically superior, data structures.

Cite as

Gerth Stølting Brodal. A Simple Integer Successor-Delete Data Structure. In 23rd International Symposium on Experimental Algorithms (SEA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 338, pp. 8:1-8:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{brodal:LIPIcs.SEA.2025.8,
  author =	{Brodal, Gerth St{\o}lting},
  title =	{{A Simple Integer Successor-Delete Data Structure}},
  booktitle =	{23rd International Symposium on Experimental Algorithms (SEA 2025)},
  pages =	{8:1--8:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-375-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{338},
  editor =	{Mutzel, Petra and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2025.8},
  URN =		{urn:nbn:de:0030-drops-232461},
  doi =		{10.4230/LIPIcs.SEA.2025.8},
  annote =	{Keywords: Successor queries, deletions, interval union-find, union-find}
}

Document

DOI: 10.4230/LIPIcs.SEA.2021.7

Engineering Predecessor Data Structures for Dynamic Integer Sets

Authors: Patrick Dinklage, Johannes Fischer, and Alexander Herlez

Published in: LIPIcs, Volume 190, 19th International Symposium on Experimental Algorithms (SEA 2021)

Abstract

We present highly optimized data structures for the dynamic predecessor problem, where the task is to maintain a set S of w-bit numbers under insertions, deletions, and predecessor queries (return the largest element in S no larger than a given key). The problem of finding predecessors can be viewed as a generalized form of the membership problem, or as a simple version of the nearest neighbour problem. It lies at the core of various real-world problems such as internet routing. In this work, we engineer (1) a simple implementation of the idea of universe reduction, similar to van-Emde-Boas trees (2) variants of y-fast tries [Willard, IPL'83], and (3) B-trees with different strategies for organizing the keys contained in the nodes, including an implementation of dynamic fusion nodes [Pǎtraşcu and Thorup, FOCS'14]. We implement our data structures for w = 32,40,64, which covers most typical scenarios. Our data structures finish workloads faster than previous approaches while being significantly more space-efficient, e.g., they clearly outperform standard implementations of the STL by finishing up to four times as fast using less than a third of the memory. Our tests also provide more general insights on data structure design, such as how small sets should be stored and handled and if and when new CPU instructions such as advanced vector extensions pay off.

Cite as

Patrick Dinklage, Johannes Fischer, and Alexander Herlez. Engineering Predecessor Data Structures for Dynamic Integer Sets. In 19th International Symposium on Experimental Algorithms (SEA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 190, pp. 7:1-7:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{dinklage_et_al:LIPIcs.SEA.2021.7,
  author =	{Dinklage, Patrick and Fischer, Johannes and Herlez, Alexander},
  title =	{{Engineering Predecessor Data Structures for Dynamic Integer Sets}},
  booktitle =	{19th International Symposium on Experimental Algorithms (SEA 2021)},
  pages =	{7:1--7:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-185-6},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{190},
  editor =	{Coudert, David and Natale, Emanuele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2021.7},
  URN =		{urn:nbn:de:0030-drops-137799},
  doi =		{10.4230/LIPIcs.SEA.2021.7},
  annote =	{Keywords: integer data structures, dynamic data structures, predecessor, universe reduction, y-fast trie, fusion tree, B-tree}
}

Document

DOI: 10.4230/LIPIcs.ESA.2020.39

Practical Performance of Space Efficient Data Structures for Longest Common Extensions

Authors: Patrick Dinklage, Johannes Fischer, Alexander Herlez, Tomasz Kociumaka, and Florian Kurpicz

Published in: LIPIcs, Volume 173, 28th Annual European Symposium on Algorithms (ESA 2020)

Abstract

For a text T[1,n], a Longest Common Extension (LCE) query lce_T(i,j) asks for the length of the longest common prefix of the suffixes T[i,n] and T[j,n] identified by their starting positions 1 ≤ i,j ≤ n. A classic problem in stringology asks to preprocess a static text T[1,n] over an alphabet of size σ so that LCE queries can be efficiently answered on-line. Since its introduction in the 1980’s, this problem has found numerous applications: in suffix sorting, edit distance computation, approximate pattern matching, regularities finding, string mining, and many more. Text-book solutions offer O(n) preprocessing time and O(1) query time, but they employ memory-heavy data structures, such as suffix arrays, in practice several times bigger than the text itself. Very recently, more space efficient solutions using O(nlogσ) bits of total space or even only O(log n) bits of extra space have been proposed: string synchronizing sets [Kempa and Kociumaka, STOC'19, and Birenzwige et al., SODA'20] and in-place fingerprinting [Prezza, SODA'18]. The goal of this article is to present well-engineered implementations of these new solutions and study their practicality on a commonly agreed text corpus. We show that both perform extremely well in practice, with space consumption of only around 10% of the input size for string synchronizing sets (around 20% for highly repetitive texts), and essentially no extra space for fingerprinting. Interestingly, our experiments also show that both solutions become much faster than naive scanning even for finding common prefixes of moderate length, contradicting a common belief that sophisticated data structures for LCE queries are not competitive with naive approaches [Ilie and Tinta, SPIRE'09].

Cite as

Patrick Dinklage, Johannes Fischer, Alexander Herlez, Tomasz Kociumaka, and Florian Kurpicz. Practical Performance of Space Efficient Data Structures for Longest Common Extensions. In 28th Annual European Symposium on Algorithms (ESA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 173, pp. 39:1-39:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{dinklage_et_al:LIPIcs.ESA.2020.39,
  author =	{Dinklage, Patrick and Fischer, Johannes and Herlez, Alexander and Kociumaka, Tomasz and Kurpicz, Florian},
  title =	{{Practical Performance of Space Efficient Data Structures for Longest Common Extensions}},
  booktitle =	{28th Annual European Symposium on Algorithms (ESA 2020)},
  pages =	{39:1--39:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-162-7},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{173},
  editor =	{Grandoni, Fabrizio and Herman, Grzegorz and Sanders, Peter},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2020.39},
  URN =		{urn:nbn:de:0030-drops-129050},
  doi =		{10.4230/LIPIcs.ESA.2020.39},
  annote =	{Keywords: text indexing, longest common prefix, space efficient data structures}
}

4 Search Results for "Herlez, Alexander"

Time-Optimal Construction of String Synchronizing Sets

Abstract

Cite as

A Simple Integer Successor-Delete Data Structure

Abstract

Cite as

Engineering Predecessor Data Structures for Dynamic Integer Sets

Abstract

Cite as

Practical Performance of Space Efficient Data Structures for Longest Common Extensions

Abstract

Cite as

Thanks for your feedback!

Could not send message