DROPS

Document

DOI: 10.4230/LIPIcs.MFCS.2025.36

Counting Distinct Square Substrings in Sublinear Time

Authors: Panagiotis Charalampopoulos, Manal Mohamed, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń, and Wiktor Zuba

Published in: LIPIcs, Volume 345, 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)

Abstract

We show that the number of distinct squares in a packed string of length n over an alphabet of size σ can be computed in 𝒪(n/log_{σ}n) time in the word-RAM model of computation. This paper is the first to introduce a sublinear time algorithm for the packed version of squares counting. The packed representation of a string of length n over an alphabet of size σ is given as a sequence of 𝒪(n/log_{σ}n) machine words in the word-RAM model (a machine word consists of ω ≥ log₂ n bits). Previously it was known how to count distinct squares in 𝒪(n) time [Gusfield and Stoye, JCSS 2004], even for a string over an integer alphabet, see [Crochemore et al., TCS 2014; Bannai et al., CPM 2017; Charalampopoulos et al., SPIRE 2020]. We use techniques of squares extraction from runs described by Crochemore et al. [TCS 2014]. However, the packed model requires novel approaches. In particular, we need an 𝒪(n/log_{σ}n) sized representation of all long-period runs (runs with periods that are Ω(log_{σ}n)) which guarantees sublinear time counting of potentially linearly many implied squares. The long-period runs with a string period that is periodic itself (called layer runs) are an obstacle, since their number can be Ω(n). Fortunately, the number of all other long-period runs is 𝒪(n/log_{σ}n) and we can construct an implicit representation of all long-period runs in 𝒪(n/log_{σ}n) time by adopting the insights of Amir et al. [ESA 2019], combined with sublinear time tools provided by the PILLAR model of computation in the case of packed strings. We count squares in layer runs in sublinear time by exploiting combinatorial properties of types of pyramidally-shaped groups of layer runs. As a by-product, we discover several new structural properties of runs. Another difficulty is to compute, in sublinear time, locations of Lyndon roots of runs in packed strings, which is needed for grouping of runs that can generate equal squares. To overcome this difficulty, we introduce sparse-Lyndon roots which are based on the notion of string synchronizers proposed by Kempa and Kociumaka [STOC 2019].
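
To make the counted quantity concrete, the following is a minimal brute-force sketch of "number of distinct squares". It is not the sublinear algorithm of the paper; the function name `count_distinct_squares` is a hypothetical choice, and the running time is roughly cubic in the string length.

```python
def count_distinct_squares(s: str) -> int:
    """Count distinct substrings of the form UU (U non-empty) by brute force."""
    n = len(s)
    squares = set()
    for i in range(n):
        # half is the length of U; the candidate square is s[i : i + 2 * half]
        for half in range(1, (n - i) // 2 + 1):
            if s[i:i + half] == s[i + half:i + 2 * half]:
                squares.add(s[i:i + 2 * half])
    return len(squares)


if __name__ == "__main__":
    print(count_distinct_squares("aaaa"))  # the squares are "aa" and "aaaa"
```

Such a reference implementation is mainly useful for checking a faster implementation on small inputs.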

Cite as

Panagiotis Charalampopoulos, Manal Mohamed, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń, and Wiktor Zuba. Counting Distinct Square Substrings in Sublinear Time. In 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 345, pp. 36:1-36:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{charalampopoulos_et_al:LIPIcs.MFCS.2025.36,
  author =	{Charalampopoulos, Panagiotis and Mohamed, Manal and Radoszewski, Jakub and Rytter, Wojciech and Wale\'{n}, Tomasz and Zuba, Wiktor},
  title =	{{Counting Distinct Square Substrings in Sublinear Time}},
  booktitle =	{50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)},
  pages =	{36:1--36:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-388-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{345},
  editor =	{Gawrychowski, Pawe{\l} and Mazowiecki, Filip and Skrzypczak, Micha{\l}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2025.36},
  URN =		{urn:nbn:de:0030-drops-241439},
  doi =		{10.4230/LIPIcs.MFCS.2025.36},
  annote =	{Keywords: square in a string, packed model, run (maximal repetition), Lyndon word}
}

Document

DOI: 10.4230/LIPIcs.CPM.2025.14

Minimal Generators in Optimal Time

Authors: Jonas Ellert, Paweł Gawrychowski, and Tatiana Starikovskaya

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Abstract

A walk of length n on a string S of length m is a function f : {1, … , n} → {1, … , m} such that ∀ i ∈ {2, … , n} : |f(i) - f(i - 1)| ≤ 1. The walk generates the string T of length n defined by ∀ i ∈ {1, … , n} : T[i] = S[f(i)]. Intuitively, this can be seen as walking n steps in S and outputting the encountered symbols, where in each step we either remain at the same position, or move one position to the left or to the right. The minimal generator of a string T is the shortest string S such that a walk on S generates T. Recently, it was shown that each string admits exactly one (up to reversal) minimal generator (Pratt-Hartmann, CPM 2024). However, no efficient algorithm for computing the minimal generator was known. We provide an optimal algorithm for this task, taking O(n) time for a string of length n over a general unordered alphabet, i.e., accessing the string only by equality comparisons of symbols. The main challenge is to detect substrings of the form axbx̃axb and replace them with axb, where a,b are symbols and x is a string with reversal x̃. We solve this problem with a non-trivial adaptation of Manacher’s classic algorithm for computing maximal palindromic substrings (Manacher, J. ACM 1975). To obtain the final algorithm, we solve small subinstances of the problem in optimal time by adapting the "Four Russians" technique to strings over a general unordered alphabet, which may be of independent interest.
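
The definition of a walk can be illustrated with a small sketch that checks the step condition and produces the generated string. It only illustrates the definition, not the minimal-generator algorithm; the name `generate_from_walk` is hypothetical, and the walk is assumed to be given as a list of 1-based positions.

```python
def generate_from_walk(S: str, f: list[int]) -> str:
    """Return the string T generated by walk f on S (positions are 1-based)."""
    m = len(S)
    # Every position must lie inside S.
    if any(not 1 <= p <= m for p in f):
        raise ValueError("walk leaves the string")
    # Consecutive positions may differ by at most one.
    if any(abs(f[i] - f[i - 1]) > 1 for i in range(1, len(f))):
        raise ValueError("walk moves by more than one position in a step")
    return "".join(S[p - 1] for p in f)


if __name__ == "__main__":
    # Walking right across "abc" and back again.
    print(generate_from_walk("abc", [1, 2, 3, 2, 1]))
```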

Cite as

Jonas Ellert, Paweł Gawrychowski, and Tatiana Starikovskaya. Minimal Generators in Optimal Time. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 14:1-14:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{ellert_et_al:LIPIcs.CPM.2025.14,
  author =	{Ellert, Jonas and Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
  title =	{{Minimal Generators in Optimal Time}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{14:1--14:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.14},
  URN =		{urn:nbn:de:0030-drops-231082},
  doi =		{10.4230/LIPIcs.CPM.2025.14},
  annote =	{Keywords: string algorithms, walking on words, minimal generator, palindromic substrings, general unordered alphabet, decision tree complexity}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2022.59

Shortest Beer Path Queries in Interval Graphs

Authors: Rathish Das, Meng He, Eitan Kondratovsky, J. Ian Munro, Anurag Murty Naredla, and Kaiyu Wu

Published in: LIPIcs, Volume 248, 33rd International Symposium on Algorithms and Computation (ISAAC 2022)

Abstract

Our interest is in paths between pairs of vertices that go through at least one of a subset of the vertices known as beer vertices. Such a path is called a beer path, and the beer distance between two vertices is the length of the shortest beer path. We show that we can represent unweighted interval graphs using 2n log n + O(n) + O(|B|log n) bits where |B| is the number of beer vertices. This data structure answers beer distance queries in O(log^ε n) time for any constant ε > 0 and shortest beer path queries in O(log^ε n + d) time, where d is the beer distance between the two nodes. We also show that proper interval graphs may be represented using 3n + o(n) bits to support beer distance queries in O(f(n)log n) time for any f(n) ∈ ω(1) and shortest beer path queries in O(d) time. All of these results also have time-space trade-offs. Lastly we show that the information theoretic lower bound for beer proper interval graphs is very close to the space of our structure, namely log(4+2√3)n - o(n) (or about 2.9 n) bits.
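
As a point of reference for the definitions above, the beer distance in an unweighted graph can be computed with two breadth-first searches. This is a plain baseline sketch, not the succinct data structure of the paper; the names `interval_graph`, `bfs`, and `beer_distance` are hypothetical, and intervals are assumed to be closed.

```python
from collections import deque


def interval_graph(intervals):
    """Adjacency lists of the interval graph of closed intervals (l, r)."""
    n = len(intervals)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            (l1, r1), (l2, r2) = intervals[i], intervals[j]
            if l1 <= r2 and l2 <= r1:  # the two intervals intersect
                adj[i].append(j)
                adj[j].append(i)
    return adj


def bfs(adj, src):
    """Unweighted shortest-path distances from src to every reachable vertex."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def beer_distance(adj, beer, u, v):
    """Length of a shortest u-v path through a beer vertex, or None if none exists."""
    du, dv = bfs(adj, u), bfs(adj, v)
    return min((du[b] + dv[b] for b in beer if b in du and b in dv), default=None)
```

Each query in this sketch costs time linear in the size of the graph, which is the cost the data structures in the abstract are designed to avoid.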

Cite as

Rathish Das, Meng He, Eitan Kondratovsky, J. Ian Munro, Anurag Murty Naredla, and Kaiyu Wu. Shortest Beer Path Queries in Interval Graphs. In 33rd International Symposium on Algorithms and Computation (ISAAC 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 248, pp. 59:1-59:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

@InProceedings{das_et_al:LIPIcs.ISAAC.2022.59,
  author =	{Das, Rathish and He, Meng and Kondratovsky, Eitan and Munro, J. Ian and Naredla, Anurag Murty and Wu, Kaiyu},
  title =	{{Shortest Beer Path Queries in Interval Graphs}},
  booktitle =	{33rd International Symposium on Algorithms and Computation (ISAAC 2022)},
  pages =	{59:1--59:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-258-7},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{248},
  editor =	{Bae, Sang Won and Park, Heejin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2022.59},
  URN =		{urn:nbn:de:0030-drops-173442},
  doi =		{10.4230/LIPIcs.ISAAC.2022.59},
  annote =	{Keywords: Beer Path, Interval Graph}
}

Document

DOI: 10.4230/LIPIcs.CPM.2021.5

The k-Mappability Problem Revisited

Authors: Amihood Amir, Itai Boneh, and Eitan Kondratovsky

Published in: LIPIcs, Volume 191, 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)

Abstract

The k-mappability problem has two integer parameters m and k. For every subword of size m in a text S, we wish to report the number of indices in S in which the word occurs with at most k mismatches. The problem was recently tackled by Alzamel et al. [Mai Alzamel et al., 2018]. For a text with constant alphabet Σ and k ∈ O(1), they present an algorithm with linear space and O(nlog^{k+1}n) time. For the case in which k = 1 and a constant size alphabet, a faster algorithm with linear space and O(nlog(n)log log(n)) time was presented in [Mai Alzamel et al., 2020]. In this work, we enhance the techniques of [Mai Alzamel et al., 2020] to obtain an algorithm with linear space and O(n log(n)) time for k = 1. Our algorithm removes the constraint of the alphabet being of constant size. We also present linear algorithms for the case of k = 1, |Σ| ∈ O(1) and m = Ω(√n).
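
The problem statement can be pinned down with a quadratic-time brute-force sketch. It is only a reference definition, not any of the algorithms cited above. The name `k_mappability` and the `include_self` flag are assumptions: the abstract counts indices where the word occurs, which is read here as including the word's own position, while some formulations exclude it.

```python
def k_mappability(S: str, m: int, k: int, include_self: bool = True) -> list[int]:
    """For each start i, count starts j with Hamming(S[i:i+m], S[j:j+m]) <= k."""
    starts = range(len(S) - m + 1)
    result = []
    for i in starts:
        count = 0
        for j in starts:
            if j == i and not include_self:
                continue
            # Hamming distance between the two length-m windows.
            mismatches = sum(1 for a, b in zip(S[i:i + m], S[j:j + m]) if a != b)
            if mismatches <= k:
                count += 1
        result.append(count)
    return result


if __name__ == "__main__":
    print(k_mappability("abab", 2, 1))
```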

Cite as

Amihood Amir, Itai Boneh, and Eitan Kondratovsky. The k-Mappability Problem Revisited. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 5:1-5:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@InProceedings{amir_et_al:LIPIcs.CPM.2021.5,
  author =	{Amir, Amihood and Boneh, Itai and Kondratovsky, Eitan},
  title =	{{The k-Mappability Problem Revisited}},
  booktitle =	{32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)},
  pages =	{5:1--5:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-186-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{191},
  editor =	{Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2021.5},
  URN =		{urn:nbn:de:0030-drops-139566},
  doi =		{10.4230/LIPIcs.CPM.2021.5},
  annote =	{Keywords: Pattern Matching, Hamming Distance, Suffix Tree, Suffix Array}
}

Document

DOI: 10.4230/LIPIcs.ESA.2020.5

Analysis of the Period Recovery Error Bound

Authors: Amihood Amir, Itai Boneh, Michael Itzhaki, and Eitan Kondratovsky

Published in: LIPIcs, Volume 173, 28th Annual European Symposium on Algorithms (ESA 2020)

Abstract

The recovery problem is the problem whose input is a corrupted text T that was originally periodic, and where one wishes to recover its original period. The algorithm’s input is T without any information about either the period’s length or the period itself. An algorithm that solves this problem is called a recovery algorithm. In order to make recovery possible, there must be some assumption that not "too many" errors corrupted the initial periodic string. This is called the error bound. In previous recovery algorithms, it was shown that a given error bound of n/((2+ε)p) can lead to O(log_{1+ε} n) period candidates that are guaranteed to include the original period, where p is the length of the original period (unknown by the algorithm) and ε > 0 is an arbitrary constant. This paper provides the first analysis of the relationship between the error bound and the number of candidates, as well as identification of the error parameters that still guarantee recovery. We improve the previously known upper error bound on the number of corruptions, n/((2+ε)p), that outputs O(log_{1+ε} n) period candidates. We show how to (1) remove ε from the bound, (2) relax the error bound to allow more errors while keeping the candidates set of size O(log n). It turns out that this relaxation on the previously known upper bound is quite challenging. To achieve this result we provide what, to our knowledge, is the first known non-trivial lower bound on the Hamming distance between two periodic strings. This proof leads to an error bound that produces a family of period candidates of size 2log₃ n. We show that this result is tight and further provide a compact representation of the period candidates. We call this representation the canonic period seed. In addition to providing less restrictive error bounds that guarantee a smaller candidate set, we also provide a hierarchy of more restrictive upper error bounds that asymptotically reduces the size of the potential period candidate set.
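
One ingredient of the setting above can be sketched directly: for a fixed candidate period length p, the nearest string with that period length (in Hamming distance) is obtained by a majority vote within each residue class modulo p. This sketch is not the recovery algorithm analysed in the paper; the name `closest_period` is hypothetical, and ties in the vote are broken arbitrarily.

```python
from collections import Counter


def closest_period(T: str, p: int) -> tuple[int, str]:
    """Return (errors, seed): the fewest substitutions turning T into a string
    with period length p, and a length-p seed achieving that number."""
    if not 1 <= p <= len(T):
        raise ValueError("p must satisfy 1 <= p <= len(T)")
    errors = 0
    seed = []
    for r in range(p):
        column = T[r::p]  # all symbols at positions congruent to r modulo p
        symbol, votes = Counter(column).most_common(1)[0]
        seed.append(symbol)
        errors += len(column) - votes
    return errors, "".join(seed)


if __name__ == "__main__":
    print(closest_period("abcabxabc", 3))
```

Comparing the returned error count against an error bound for each p gives a naive way to enumerate period candidates on small inputs.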

Cite as

Amihood Amir, Itai Boneh, Michael Itzhaki, and Eitan Kondratovsky. Analysis of the Period Recovery Error Bound. In 28th Annual European Symposium on Algorithms (ESA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 173, pp. 5:1-5:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@InProceedings{amir_et_al:LIPIcs.ESA.2020.5,
  author =	{Amir, Amihood and Boneh, Itai and Itzhaki, Michael and Kondratovsky, Eitan},
  title =	{{Analysis of the Period Recovery Error Bound}},
  booktitle =	{28th Annual European Symposium on Algorithms (ESA 2020)},
  pages =	{5:1--5:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-162-7},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{173},
  editor =	{Grandoni, Fabrizio and Herman, Grzegorz and Sanders, Peter},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2020.5},
  URN =		{urn:nbn:de:0030-drops-128717},
  doi =		{10.4230/LIPIcs.ESA.2020.5},
  annote =	{Keywords: Period Recovery, Period Recovery Hierarchy, Hamming Distance}
}

Document

DOI: 10.4230/LIPIcs.ESA.2019.5

Repetition Detection in a Dynamic String

Authors: Amihood Amir, Itai Boneh, Panagiotis Charalampopoulos, and Eitan Kondratovsky

Published in: LIPIcs, Volume 144, 27th Annual European Symposium on Algorithms (ESA 2019)

Abstract

A string UU for a non-empty string U is called a square. Squares have been well-studied both from a combinatorial and an algorithmic perspective. In this paper, we are the first to consider the problem of maintaining a representation of the squares in a dynamic string S of length at most n. We present an algorithm that updates this representation in n^{o(1)} time. This representation allows us to report a longest square-substring of S in O(1) time and all square-substrings of S in O(output) time. We achieve this by introducing a novel tool: maintaining prefix-suffix matches of two dynamic strings. We extend the above result to address the problem of maintaining a representation of all runs (maximal repetitions) of the string. Runs are known to capture the periodic structure of a string, and, as an application, we show that our representation of runs allows us to efficiently answer periodicity queries for substrings of a dynamic string. These queries have proven useful in static pattern matching problems and our techniques have the potential of offering solutions to these problems in a dynamic text setting.
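
A periodicity query on a static substring can be answered from scratch with the classic prefix-function (KMP failure function) computation, shown here as a baseline sketch. It recomputes everything per query and is unrelated to the dynamic data structure of the paper; the name `smallest_period` is hypothetical.

```python
def smallest_period(s: str) -> int:
    """Smallest p >= 1 with s[i] == s[i + p] for all valid i (0 for the empty string)."""
    n = len(s)
    if n == 0:
        return 0
    pi = [0] * n  # pi[i] = length of the longest proper border of s[: i + 1]
    k = 0
    for i in range(1, n):
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    # The smallest period equals the length minus the longest border.
    return n - pi[-1]


if __name__ == "__main__":
    S = "xxabcabcabyy"
    # Periodicity query for the substring S[2:10] == "abcabcab".
    print(smallest_period(S[2:10]))
```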

Cite as

Amihood Amir, Itai Boneh, Panagiotis Charalampopoulos, and Eitan Kondratovsky. Repetition Detection in a Dynamic String. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 5:1-5:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{amir_et_al:LIPIcs.ESA.2019.5,
  author =	{Amir, Amihood and Boneh, Itai and Charalampopoulos, Panagiotis and Kondratovsky, Eitan},
  title =	{{Repetition Detection in a Dynamic String}},
  booktitle =	{27th Annual European Symposium on Algorithms (ESA 2019)},
  pages =	{5:1--5:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-124-5},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{144},
  editor =	{Bender, Michael A. and Svensson, Ola and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2019.5},
  URN =		{urn:nbn:de:0030-drops-111265},
  doi =		{10.4230/LIPIcs.ESA.2019.5},
  annote =	{Keywords: string algorithms, dynamic algorithms, squares, repetitions, runs}
}

Document

DOI: 10.4230/LIPIcs.CPM.2019.6

Sufficient Conditions for Efficient Indexing Under Different Matchings

Authors: Amihood Amir and Eitan Kondratovsky

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)

Abstract

The most important task derived from the massive digital data accumulation in the world is efficient access to this data, hence the importance of indexing. In the last decade, many different types of matching relations were defined, each requiring an efficient indexing scheme. Cole and Hariharan, in a groundbreaking paper [Cole and Hariharan, SIAM J. Comput., 33(1):26–42, 2003], formulate sufficient conditions for building an efficient indexing for quasi-suffix collections, collections that behave as suffixes. It was shown that known matchings, including parameterized, 2-D array and order preserving matchings, fit their indexing settings. In this paper, we formulate more basic sufficient conditions based on the order relation derived from the matching relation itself; our conditions are more general than the previously known conditions.

Cite as

Amihood Amir and Eitan Kondratovsky. Sufficient Conditions for Efficient Indexing Under Different Matchings. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 6:1-6:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{amir_et_al:LIPIcs.CPM.2019.6,
  author =	{Amir, Amihood and Kondratovsky, Eitan},
  title =	{{Sufficient Conditions for Efficient Indexing Under Different Matchings}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{6:1--6:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and Pissis, Solon P.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.6},
  URN =		{urn:nbn:de:0030-drops-104773},
  doi =		{10.4230/LIPIcs.CPM.2019.6},
  annote =	{Keywords: off-the-shelf indexing algorithms, general matching relations, weaker sufficient conditions for indexing}
}

7 Search Results for "Kondratovsky, Eitan"
