Document

**Published in:** LIPIcs, Volume 289, 41st International Symposium on Theoretical Aspects of Computer Science (STACS 2024)

In the k-Edit Circular Pattern Matching (k-Edit CPM) problem, we are given a length-n text T, a length-m pattern P, and a positive integer threshold k, and we are to report all starting positions of the substrings of T that are at edit distance at most k from some cyclic rotation of P. In the decision version of the problem, we are to check if any such substring exists. Very recently, Charalampopoulos et al. [ESA 2022] presented 𝒪(nk²)-time and 𝒪(nk log³ k)-time solutions for the reporting and decision versions of k-Edit CPM, respectively. Here, we show that the reporting and decision versions of k-Edit CPM can be solved in 𝒪(n+(n/m) k⁶) time and 𝒪(n+(n/m) k⁵ log³ k) time, respectively, thus obtaining the first algorithms with a complexity of the type 𝒪(n+(n/m) poly(k)) for this problem. Notably, our algorithms run in 𝒪(n) time when m = Ω(k⁶) and are superior to the previous respective solutions when m = ω(k⁴). We provide a meta-algorithm that yields efficient algorithms in several other interesting settings, such as when the strings are given in a compressed form (as straight-line programs), when the strings are dynamic, or when we have a quantum computer.
We obtain our solutions by exploiting the structure of approximate circular occurrences of P in T, when T is relatively short w.r.t. P. Roughly speaking, either the starting positions of approximate occurrences of rotations of P form 𝒪(k⁴) intervals that can be computed efficiently, or some rotation of P is almost periodic (is at a small edit distance from a string with small period). Dealing with the almost periodic case is the most technically demanding part of this work; we tackle it using properties of locked fragments (originating from [Cole and Hariharan, SICOMP 2002]).

Panagiotis Charalampopoulos, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń, and Wiktor Zuba. Approximate Circular Pattern Matching Under Edit Distance. In 41st International Symposium on Theoretical Aspects of Computer Science (STACS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 289, pp. 24:1-24:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.STACS.2024.24, author = {Charalampopoulos, Panagiotis and Pissis, Solon P. and Radoszewski, Jakub and Rytter, Wojciech and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{Approximate Circular Pattern Matching Under Edit Distance}}, booktitle = {41st International Symposium on Theoretical Aspects of Computer Science (STACS 2024)}, pages = {24:1--24:22}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-311-9}, ISSN = {1868-8969}, year = {2024}, volume = {289}, editor = {Beyersdorff, Olaf and Kant\'{e}, Mamadou Moustapha and Kupferman, Orna and Lokshtanov, Daniel}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2024.24}, URN = {urn:nbn:de:0030-drops-197346}, doi = {10.4230/LIPIcs.STACS.2024.24}, annote = {Keywords: circular pattern matching, approximate pattern matching, edit distance} }

Document

**Published in:** LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

An elastic-degenerate (ED) string T is a sequence of n sets T[1],…,T[n] containing m strings in total whose cumulative length is N. We call n, m, and N the length, the cardinality and the size of T, respectively. The language of T is defined as ℒ(T) = {S_1 ⋯ S_n : S_i ∈ T[i] for all i ∈ [1,n]}. ED strings have been introduced to represent a set of closely-related DNA sequences, also known as a pangenome. The basic question we investigate here is: Given two ED strings, how fast can we check whether the two languages they represent have a nonempty intersection? We call the underlying problem the ED String Intersection (EDSI) problem. For two ED strings T₁ and T₂ of lengths n₁ and n₂, cardinalities m₁ and m₂, and sizes N₁ and N₂, respectively, we show the following:
- There is no 𝒪((N₁N₂)^{1-ε})-time algorithm, thus no 𝒪((N₁m₂+N₂m₁)^{1-ε})-time algorithm and no 𝒪((N₁n₂+N₂n₁)^{1-ε})-time algorithm, for any constant ε > 0, for EDSI even when T₁ and T₂ are over a binary alphabet, unless the Strong Exponential-Time Hypothesis is false.
- There is no combinatorial 𝒪((N₁+N₂)^{1.2-ε}f(n₁,n₂))-time algorithm, for any constant ε > 0 and any function f, for EDSI even when T₁ and T₂ are over a binary alphabet, unless the Boolean Matrix Multiplication conjecture is false.
- An 𝒪(N₁log N₁log n₁+N₂log N₂log n₂)-time algorithm for outputting a compact (RLE) representation of the intersection language of two unary ED strings. In the case when T₁ and T₂ are given in a compact representation, we show that the problem is NP-complete.
- An 𝒪(N₁m₂+N₂m₁)-time algorithm for EDSI.
- An Õ(N₁^{ω-1}n₂+N₂^{ω-1}n₁)-time algorithm for EDSI, where ω is the exponent of matrix multiplication; the Õ notation suppresses factors that are polylogarithmic in the input size.
We also show that the techniques we develop have applications outside of ED string comparison.

Esteban Gabory, Moses Njagi Mwaniki, Nadia Pisanti, Solon P. Pissis, Jakub Radoszewski, Michelle Sweering, and Wiktor Zuba. Comparing Elastic-Degenerate Strings: Algorithms, Lower Bounds, and Applications. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 11:1-11:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{gabory_et_al:LIPIcs.CPM.2023.11, author = {Gabory, Esteban and Mwaniki, Moses Njagi and Pisanti, Nadia and Pissis, Solon P. and Radoszewski, Jakub and Sweering, Michelle and Zuba, Wiktor}, title = {{Comparing Elastic-Degenerate Strings: Algorithms, Lower Bounds, and Applications}}, booktitle = {34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)}, pages = {11:1--11:20}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-276-1}, ISSN = {1868-8969}, year = {2023}, volume = {259}, editor = {Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.11}, URN = {urn:nbn:de:0030-drops-179650}, doi = {10.4230/LIPIcs.CPM.2023.11}, annote = {Keywords: elastic-degenerate string, sequence comparison, languages intersection, pangenome, acronym identification} }

Document

**Published in:** LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Cyclic versions of covers and roots of a string are considered in this paper. A prefix V of a string S is a cyclic root of S if S is a concatenation of cyclic rotations of V. A prefix V of S is a cyclic cover of S if the occurrences of the cyclic rotations of V cover all positions of S. We present 𝒪(n)-time algorithms computing all cyclic roots (using number-theoretic tools) and all cyclic covers (using tools related to seeds) of a length-n string over an integer alphabet. Our results improve upon 𝒪(n log log n) and 𝒪(n log n) time complexities of recent algorithms of Grossi et al. (WALCOM 2023) for the respective problems and provide novel approaches to the problems. As a by-product, we obtain an optimal data structure for Internal Circular Pattern Matching queries that generalize Internal Pattern Matching and Cyclic Equivalence queries of Kociumaka et al. (SODA 2015).

Costas S. Iliopoulos, Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń, and Wiktor Zuba. Linear-Time Computation of Cyclic Roots and Cyclic Covers of a String. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 15:1-15:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{iliopoulos_et_al:LIPIcs.CPM.2023.15, author = {Iliopoulos, Costas S. and Kociumaka, Tomasz and Radoszewski, Jakub and Rytter, Wojciech and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{Linear-Time Computation of Cyclic Roots and Cyclic Covers of a String}}, booktitle = {34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)}, pages = {15:1--15:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-276-1}, ISSN = {1868-8969}, year = {2023}, volume = {259}, editor = {Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.15}, URN = {urn:nbn:de:0030-drops-179697}, doi = {10.4230/LIPIcs.CPM.2023.15}, annote = {Keywords: cyclic cover, cyclic root, circular pattern matching, internal pattern matching} }

Document

**Published in:** LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

In the all-pairs suffix-prefix (APSP) problem, we are given a dictionary R of k strings, S_1,…,S_k, of total length n, and we are asked to find the length SPL_{i,j} of the longest string that is both a suffix of S_i and a prefix of S_j, for all i,j ∈ [1,k]. APSP is a classic problem in string algorithms with many applications in bioinformatics. When all strings of the dictionary are over an integer alphabet of size σ ≤ n^𝒪(1), APSP can be solved in the optimal 𝒪(n+k²) time with the use of the generalized suffix tree of the dictionary [Gusfield et al., Inf. Process. Lett. 1992].
In many bioinformatics applications, such as in sequence assembly, the size k of dictionary R is very large. In particular, k² usually dominates n, and thus the k² factor is the bottleneck both in the time and in the space complexity of such applications. We thus initiate a holistic study on several data structure variants of APSP. In particular, we consider the following types of queries:
- One-to-One(i,j): output SPL_{i,j}.
- One-to-All(i): output SPL_{i,j} for every j ∈ [1,k].
- Report(i,𝓁): output all distinct j ∈ [1,k] such that SPL_{i,j} ≥ 𝓁, where 𝓁 ≥ 0 is an integer.
- Count(i,𝓁): output the number of distinct j ∈ [1,k] such that SPL_{i,j} ≥ 𝓁, where 𝓁 ≥ 0 is an integer.
- Top(i,K): output K distinct j ∈ [1,k] with the highest values of SPL_{i,j} breaking ties arbitrarily.
We assume the standard word RAM model of computation with word size w = Ω(log n) and an integer alphabet of size σ ≤ n^𝒪(1). We show the following upper bounds:
Query | Space (words) | Query time | Note
One-to-One(i,j) | 𝒪(n) | 𝒪(log log k) | Theorem 11
One-to-All(i) | 𝒪(n) | 𝒪(k) | Theorem 14
Report(i,𝓁) | 𝒪(n) | 𝒪(log n/log log n+output) | Theorem 19(i)
Count(i,𝓁) | 𝒪(n) | 𝒪(log n/log log n) | Theorem 19(ii)
Top(i,K) | 𝒪(n) | 𝒪(log² n/log log n+K) | Theorem 22
We also present efficient algorithms for constructing these data structures.

Grigorios Loukides, Solon P. Pissis, Sharma V. Thankachan, and Wiktor Zuba. Suffix-Prefix Queries on a Dictionary. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 21:1-21:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{loukides_et_al:LIPIcs.CPM.2023.21, author = {Loukides, Grigorios and Pissis, Solon P. and Thankachan, Sharma V. and Zuba, Wiktor}, title = {{Suffix-Prefix Queries on a Dictionary}}, booktitle = {34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)}, pages = {21:1--21:20}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-276-1}, ISSN = {1868-8969}, year = {2023}, volume = {259}, editor = {Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.21}, URN = {urn:nbn:de:0030-drops-179757}, doi = {10.4230/LIPIcs.CPM.2023.21}, annote = {Keywords: all-pairs suffix-prefix, suffix-prefix queries, internal pattern matching} }

Document

**Published in:** LIPIcs, Volume 244, 30th Annual European Symposium on Algorithms (ESA 2022)

We investigate the complexity of approximate circular pattern matching (CPM, in short) under the Hamming and edit distance. Under each of these two basic metrics, we are given a length-n text T, a length-m pattern P, and a positive integer threshold k, and we are to report all starting positions (called occurrences) of fragments of T that are at distance at most k from some cyclic rotation of P. In the decision version of the problem, we are to check if there is any such occurrence. All previous results for approximate CPM were either average-case upper bounds or heuristics, with the exception of the work of Charalampopoulos et al. [CKP^+, JCSS'21], who considered only the Hamming distance. For the reporting version of the approximate CPM problem, under the Hamming distance we improve upon the main algorithm of [CKP^+, JCSS'21] from 𝒪(n+(n/m) ⋅ k⁴) to 𝒪(n+(n/m) ⋅ k³ log log k) time; for the edit distance, we give an 𝒪(nk²)-time algorithm. Notably, for the decision versions and wide parameter-ranges, we give algorithms whose complexities are almost identical to the state-of-the-art for standard (i.e., non-circular) approximate pattern matching:
- For the decision version of the approximate CPM problem under the Hamming distance, we obtain an 𝒪(n+(n/m) ⋅ k² log k / log log k)-time algorithm, which works in 𝒪(n) time whenever k = 𝒪(√{m log log m / log m}). In comparison, the fastest algorithm for the standard counterpart of the problem, by Chan et al. [CGKKP, STOC’20], runs in 𝒪(n) time only for k = 𝒪(√m). We achieve this result via a reduction to a geometric problem by building on ideas from [CKP^+, JCSS'21] and Charalampopoulos et al. [CKW, FOCS'20].
- For the decision version of the approximate CPM problem under the edit distance, the 𝒪(nklog³ k) runtime of our algorithm near matches the 𝒪(nk) runtime of the Landau-Vishkin algorithm [LV, J. Algorithms'89] for approximate pattern matching under edit distance; the latter algorithm remains the fastest known for k = Ω(m^{2/5}). As a stepping stone, we propose an 𝒪(nklog³ k)-time algorithm for solving the Longest Prefix k'-Approximate Match problem, proposed by Landau et al. [LMS, SICOMP'98], for all k' ∈ {1,…,k}. Our algorithm is based on Tiskin’s theory of seaweeds [Tiskin, Math. Comput. Sci.'08], with recent advancements (see Charalampopoulos et al. [CKW, FOCS'22]), and on exploiting the seaweeds' relation to Monge matrices.
In contrast, we obtain a conditional lower bound that suggests a polynomial separation between approximate CPM under the Hamming distance over the binary alphabet and its non-circular counterpart. We also show that a strongly subquadratic-time algorithm for the decision version of approximate CPM under edit distance would refute the Strong Exponential Time Hypothesis.

Panagiotis Charalampopoulos, Tomasz Kociumaka, Jakub Radoszewski, Solon P. Pissis, Wojciech Rytter, Tomasz Waleń, and Wiktor Zuba. Approximate Circular Pattern Matching. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 35:1-35:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.ESA.2022.35, author = {Charalampopoulos, Panagiotis and Kociumaka, Tomasz and Radoszewski, Jakub and Pissis, Solon P. and Rytter, Wojciech and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{Approximate Circular Pattern Matching}}, booktitle = {30th Annual European Symposium on Algorithms (ESA 2022)}, pages = {35:1--35:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-247-1}, ISSN = {1868-8969}, year = {2022}, volume = {244}, editor = {Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2022.35}, URN = {urn:nbn:de:0030-drops-169738}, doi = {10.4230/LIPIcs.ESA.2022.35}, annote = {Keywords: approximate circular pattern matching, Hamming distance, edit distance} }

Document

**Published in:** LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

We show that lengths of shortest covers of all rotations of a length-n string over an integer alphabet can be computed in 𝒪(n) time in the word-RAM model, thus improving an 𝒪(n log n)-time algorithm from Crochemore et al. (Theor. Comput. Sci., 2021). Similarly as Crochemore et al., we use a relation of covers of rotations of a string S to seeds and squares in S³. The crucial parameter of a string S is the number ξ(S) of primitive covers of all rotations of S. We show first that the time complexity of the algorithm from Crochemore et al. can be slightly improved which results in time complexity Θ(ξ(S)). However, we also show that in the worst case ξ(S) is Ω(|S|log |S|). This is the main difficulty in obtaining a linear time algorithm. We overcome it and obtain yet another application of runs in strings.

Maxime Crochemore, Costas S. Iliopoulos, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba. Linear-Time Computation of Shortest Covers of All Rotations of a String. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 22:1-22:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{crochemore_et_al:LIPIcs.CPM.2022.22, author = {Crochemore, Maxime and Iliopoulos, Costas S. and Radoszewski, Jakub and Rytter, Wojciech and Straszy\'{n}ski, Juliusz and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{Linear-Time Computation of Shortest Covers of All Rotations of a String}}, booktitle = {33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)}, pages = {22:1--22:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-234-1}, ISSN = {1868-8969}, year = {2022}, volume = {223}, editor = {Bannai, Hideo and Holub, Jan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.22}, URN = {urn:nbn:de:0030-drops-161495}, doi = {10.4230/LIPIcs.CPM.2022.22}, annote = {Keywords: cover, quasiperiod, cyclic rotation, seed, run} }

Document

**Published in:** LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

We consider tile covers of 2D-strings which are a generalization of periodicity of 1D-strings. We say that a 2D-string A is a tile cover of a 2D-string S if S can be decomposed into non-overlapping 2D-strings, each of them equal to A or to A^T, where A^T is the transpose of A. We show that all tile covers of a 2D-string of size N can be computed in 𝒪(N^{1+ε}) time for any ε > 0. We also show a linear-time algorithm for computing all 1D-strings being tile covers of a 2D-string.

Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba. Rectangular Tile Covers of 2D-Strings. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 23:1-23:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{radoszewski_et_al:LIPIcs.CPM.2022.23, author = {Radoszewski, Jakub and Rytter, Wojciech and Straszy\'{n}ski, Juliusz and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{Rectangular Tile Covers of 2D-Strings}}, booktitle = {33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)}, pages = {23:1--23:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-234-1}, ISSN = {1868-8969}, year = {2022}, volume = {223}, editor = {Bannai, Hideo and Holub, Jan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.23}, URN = {urn:nbn:de:0030-drops-161508}, doi = {10.4230/LIPIcs.CPM.2022.23}, annote = {Keywords: tile cover, periodicity, efficient algorithm} }

Document

**Published in:** LIPIcs, Volume 204, 29th Annual European Symposium on Algorithms (ESA 2021)

We prove 3SUM-hardness (no strongly subquadratic-time algorithm, assuming the 3SUM conjecture) of several problems related to finding Abelian square and additive square factors in a string. In particular, we conclude conditional optimality of the state-of-the-art algorithms for finding such factors.
Overall, we show 3SUM-hardness of (a) detecting an Abelian square factor of an odd half-length, (b) computing centers of all Abelian square factors, (c) detecting an additive square factor in a length-n string of integers of magnitude n^{𝒪(1)}, and (d) a problem of computing a double 3-term arithmetic progression (i.e., finding indices i ≠ j such that (x_i+x_j)/2 = x_{(i+j)/2}) in a sequence of integers x₁,… ,x_n of magnitude n^{𝒪(1)}.
Problem (d) is essentially a convolution version of the AVERAGE problem that was proposed in a manuscript of Erickson. We obtain a conditional lower bound for it with the aid of techniques recently developed by Dudek et al. [STOC 2020]. Problem (d) immediately reduces to problem (c) and is a step in reductions to problems (a) and (b). In conditional lower bounds for problems (a) and (b) we apply an encoding of Amir et al. [ICALP 2014] and extend it using several string gadgets that include arbitrarily long Abelian-square-free strings.
Our reductions also imply conditional lower bounds for detecting Abelian squares in strings over a constant-sized alphabet. We also show a subquadratic upper bound in this case, applying a result of Chan and Lewenstein [STOC 2015].

Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba. Hardness of Detecting Abelian and Additive Square Factors in Strings. In 29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 77:1-77:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{radoszewski_et_al:LIPIcs.ESA.2021.77, author = {Radoszewski, Jakub and Rytter, Wojciech and Straszy\'{n}ski, Juliusz and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{Hardness of Detecting Abelian and Additive Square Factors in Strings}}, booktitle = {29th Annual European Symposium on Algorithms (ESA 2021)}, pages = {77:1--77:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-204-4}, ISSN = {1868-8969}, year = {2021}, volume = {204}, editor = {Mutzel, Petra and Pagh, Rasmus and Herman, Grzegorz}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2021.77}, URN = {urn:nbn:de:0030-drops-146581}, doi = {10.4230/LIPIcs.ESA.2021.77}, annote = {Keywords: Abelian square, additive square, 3SUM problem} }

Document

**Published in:** LIPIcs, Volume 191, 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)

We consider two notions of covers of a two-dimensional string T. A (rectangular) subarray P of T is a 2D-cover of T if each position of T is in an occurrence of P in T. A one-dimensional string S is a 1D-cover of T if its vertical and horizontal occurrences in T cover all positions of T. We show how to compute the smallest-area 2D-cover of an m × n array T in the optimal 𝒪(N) time, where N = mn, all aperiodic 2D-covers of T in 𝒪(N log N) time, and all 2D-covers of T in N^{4/3}⋅ log^{𝒪(1)}N time. Further, we show how to compute all 1D-covers in the optimal 𝒪(N) time. Along the way, we show that the Klee’s measure of a set of rectangles, each of width and height at least √n, on an n × n grid can be maintained in √n⋅ log^{𝒪(1)}n time per insertion or deletion of a rectangle, a result which could be of independent interest.

Panagiotis Charalampopoulos, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń, and Wiktor Zuba. Computing Covers of 2D-Strings. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 12:1-12:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.CPM.2021.12, author = {Charalampopoulos, Panagiotis and Radoszewski, Jakub and Rytter, Wojciech and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{Computing Covers of 2D-Strings}}, booktitle = {32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)}, pages = {12:1--12:20}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-186-3}, ISSN = {1868-8969}, year = {2021}, volume = {191}, editor = {Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2021.12}, URN = {urn:nbn:de:0030-drops-139635}, doi = {10.4230/LIPIcs.CPM.2021.12}, annote = {Keywords: 2D-string, cover, dynamic Klee’s measure problem} }

Document

**Published in:** LIPIcs, Volume 173, 28th Annual European Symposium on Algorithms (ESA 2020)

The notions of periodicity and repetitions in strings, and hence these of runs and squares, naturally extend to two-dimensional strings. We consider two types of repetitions in 2D-strings: 2D-runs and quartics (quartics are a 2D-version of squares in standard strings). Amir et al. introduced 2D-runs, showed that there are 𝒪(n³) of them in an n × n 2D-string and presented a simple construction giving a lower bound of Ω(n²) for their number (Theoretical Computer Science, 2020). We make a significant step towards closing the gap between these bounds by showing that the number of 2D-runs in an n × n 2D-string is 𝒪(n² log² n). In particular, our bound implies that the 𝒪(n²log n + output) run-time of the algorithm of Amir et al. for computing 2D-runs is also 𝒪(n² log² n). We expect this result to allow for exploiting 2D-runs algorithmically in the area of 2D pattern matching.
A quartic is a 2D-string composed of 2 × 2 identical blocks (2D-strings) that was introduced by Apostolico and Brimkov (Theoretical Computer Science, 2000), where by quartics they meant only primitively rooted quartics, i.e. built of a primitive block. Here our notion of quartics is more general and analogous to that of squares in 1D-strings. Apostolico and Brimkov showed that there are 𝒪(n² log² n) occurrences of primitively rooted quartics in an n × n 2D-string and that this bound is attainable. Consequently the number of distinct primitively rooted quartics is 𝒪(n² log² n). The straightforward bound for the maximal number of distinct general quartics is 𝒪(n⁴). Here, we prove that the number of distinct general quartics is also 𝒪(n² log² n). This extends the rich combinatorial study of the number of distinct squares in a 1D-string, that was initiated by Fraenkel and Simpson (Journal of Combinatorial Theory, Series A, 1998), to two dimensions.
Finally, we show some algorithmic applications of 2D-runs. Specifically, we present algorithms for computing all occurrences of primitively rooted quartics and counting all general distinct quartics in 𝒪(n² log² n) time, which is quasi-linear with respect to the size of the input. The former algorithm is optimal due to the lower bound of Apostolico and Brimkov. The latter can be seen as a continuation of works on enumeration of distinct squares in 1D-strings using runs (Crochemore et al., Theoretical Computer Science, 2014). However, the methods used in 2D are different because of different properties of 2D-runs and quartics.

Panagiotis Charalampopoulos, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń, and Wiktor Zuba. The Number of Repetitions in 2D-Strings. In 28th Annual European Symposium on Algorithms (ESA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 173, pp. 32:1-32:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.ESA.2020.32, author = {Charalampopoulos, Panagiotis and Radoszewski, Jakub and Rytter, Wojciech and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{The Number of Repetitions in 2D-Strings}}, booktitle = {28th Annual European Symposium on Algorithms (ESA 2020)}, pages = {32:1--32:18}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-162-7}, ISSN = {1868-8969}, year = {2020}, volume = {173}, editor = {Grandoni, Fabrizio and Herman, Grzegorz and Sanders, Peter}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2020.32}, URN = {urn:nbn:de:0030-drops-128987}, doi = {10.4230/LIPIcs.ESA.2020.32}, annote = {Keywords: 2D-run, quartic, run, square} }

Document

**Published in:** LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

We consider the problem of preprocessing a text T of length n and a dictionary 𝒟 in order to be able to efficiently answer queries CountDistinct(i,j), that is, given i and j return the number of patterns from 𝒟 that occur in the fragment T[i..j]. The dictionary is internal in the sense that each pattern in 𝒟 is given as a fragment of T. This way, the dictionary takes space proportional to the number of patterns d=|𝒟| rather than their total length, which could be Θ(n⋅ d). An 𝒪̃(n+d)-size data structure that answers CountDistinct(i,j) queries 𝒪(log n)-approximately in 𝒪̃(1) time was recently proposed in a work that introduced internal dictionary matching [ISAAC 2019]. Here we present an 𝒪̃(n+d)-size data structure that answers CountDistinct(i,j) queries 2-approximately in 𝒪̃(1) time. Using range queries, for any m, we give an 𝒪̃(min(nd/m,n²/m²)+d)-size data structure that answers CountDistinct(i,j) queries exactly in 𝒪̃(m) time. We also consider the special case when the dictionary consists of all square factors of the string. We design an 𝒪(n log² n)-size data structure that allows us to count distinct squares in a text fragment T[i..j] in 𝒪(log n) time.

Panagiotis Charalampopoulos, Tomasz Kociumaka, Manal Mohamed, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba. Counting Distinct Patterns in Internal Dictionary Matching. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 8:1-8:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.CPM.2020.8, author = {Charalampopoulos, Panagiotis and Kociumaka, Tomasz and Mohamed, Manal and Radoszewski, Jakub and Rytter, Wojciech and Straszy\'{n}ski, Juliusz and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{Counting Distinct Patterns in Internal Dictionary Matching}}, booktitle = {31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)}, pages = {8:1--8:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-149-8}, ISSN = {1868-8969}, year = {2020}, volume = {161}, editor = {G{\o}rtz, Inge Li and Weimann, Oren}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.8}, URN = {urn:nbn:de:0030-drops-121336}, doi = {10.4230/LIPIcs.CPM.2020.8}, annote = {Keywords: dictionary matching, internal pattern matching, squares} }

Document

**Published in:** LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

The edit distance (a.k.a. the Levenshtein distance) between two words is defined as the minimum number of insertions, deletions or substitutions of letters needed to transform one word into another. The Levenshtein k-neighbourhood of a word w is the set of words that are at edit distance at most k from w. This is perhaps the most important concept underlying BLAST, a widely-used tool for comparing biological sequences. A natural combinatorial question is to ask for upper and lower bounds on the size of this set. The answer to this question has important algorithmic implications as well. Myers notes that "such bounds would give a tighter characterisation of the running time of the algorithm" behind BLAST. We show that the size of the Levenshtein k-neighbourhood of any word of length n over an arbitrary alphabet is not smaller than the size of the Levenshtein k-neighbourhood of a unary word of length n, thus providing a tight lower bound on the size of the Levenshtein k-neighbourhood. We remark that this result was posed as a conjecture by Dufresne at WCTA 2019.

Panagiotis Charalampopoulos, Solon P. Pissis, Jakub Radoszewski, Tomasz Waleń, and Wiktor Zuba. Unary Words Have the Smallest Levenshtein k-Neighbourhoods. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 10:1-10:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.CPM.2020.10, author = {Charalampopoulos, Panagiotis and Pissis, Solon P. and Radoszewski, Jakub and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{Unary Words Have the Smallest Levenshtein k-Neighbourhoods}}, booktitle = {31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)}, pages = {10:1--10:12}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-149-8}, ISSN = {1868-8969}, year = {2020}, volume = {161}, editor = {G{\o}rtz, Inge Li and Weimann, Oren}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.10}, URN = {urn:nbn:de:0030-drops-121359}, doi = {10.4230/LIPIcs.CPM.2020.10}, annote = {Keywords: combinatorics on words, Levenshtein distance, edit distance} }

Document

**Published in:** LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)

We introduce the Longest Common Circular Factor (LCCF) problem in which, given strings S and T of length at most n, we are to compute the longest factor of S whose cyclic shift occurs as a factor of T. It is a new similarity measure, an extension of the classic Longest Common Factor. We show how to solve the LCCF problem in O(n log^4 n) time using O(n log^2 n) space.

Mai Alzamel, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba. Quasi-Linear-Time Algorithm for Longest Common Circular Factor. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 25:1-25:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{alzamel_et_al:LIPIcs.CPM.2019.25, author = {Alzamel, Mai and Crochemore, Maxime and Iliopoulos, Costas S. and Kociumaka, Tomasz and Radoszewski, Jakub and Rytter, Wojciech and Straszy\'{n}ski, Juliusz and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{Quasi-Linear-Time Algorithm for Longest Common Circular Factor}}, booktitle = {30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)}, pages = {25:1--25:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-103-0}, ISSN = {1868-8969}, year = {2019}, volume = {128}, editor = {Pisanti, Nadia and P. Pissis, Solon}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.25}, URN = {urn:nbn:de:0030-drops-104961}, doi = {10.4230/LIPIcs.CPM.2019.25}, annote = {Keywords: longest common factor, circular pattern matching, internal pattern matching, intersection of hyperrectangles} }