Document

**Published in:** LIPIcs, Volume 274, 31st Annual European Symposium on Algorithms (ESA 2023)

The Cover Suffix Tree (CST) of a string T is the suffix tree of T with additional explicit nodes corresponding to halves of square substrings of T. In the CST an explicit node corresponding to a substring C of T is annotated with two numbers: the number of non-overlapping consecutive occurrences of C and the total number of positions in T that are covered by occurrences of C in T. Kociumaka et al. (Algorithmica, 2015) have shown how to compute the CST of a length-n string in 𝒪(n log n) time. We give an algorithm that computes the same data structure in 𝒪(n) time assuming that T is over an integer alphabet and discuss its implications.
A string C is a cover of text T if occurrences of C in T cover all positions of T; C is a seed of T if occurrences and overhangs (i.e., prefix-suffix occurrences) of C in T cover all positions of T. An α-partial cover (α-partial seed) of text T is a string C whose occurrences in T (occurrences and overhangs in T, respectively) cover at least α positions of T. Kociumaka et al. (Algorithmica, 2015; Theor. Comput. Sci., 2018) have shown that knowing the CST of a length-n string T, one can compute a linear-sized representation of all seeds of T as well as all shortest α-partial covers and seeds in T for a given α in 𝒪(n) time. Thus our result implies linear-time algorithms computing these notions of quasiperiodicity. The resulting algorithm computing seeds is substantially different from the previous one (Kociumaka et al., SODA 2012, ACM Trans. Algorithms, 2020); in particular, it is non-recursive. Kociumaka et al. (Algorithmica, 2015) proposed an 𝒪(n log n)-time algorithm for computing a shortest α-partial cover for each α = 1,…,n; we improve this complexity to 𝒪(n).
Our results are based on a new combinatorial characterization of consecutive overlapping occurrences of a substring S of T in terms of the set of runs (see Kolpakov and Kucherov, FOCS 1999) in T. This new insight also leads to an 𝒪(n)-sized index for reporting overlapping consecutive occurrences of a given pattern P of length m in the optimal 𝒪(m+output) time, where output is the number of occurrences reported. In comparison, a general index for reporting bounded-gap consecutive occurrences of Navarro and Thankachan (Theor. Comput. Sci., 2016) uses 𝒪(n log n) space.

Jakub Radoszewski. Linear Time Construction of Cover Suffix Tree and Applications. In 31st Annual European Symposium on Algorithms (ESA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 274, pp. 89:1-89:17, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{radoszewski:LIPIcs.ESA.2023.89, author = {Radoszewski, Jakub}, title = {{Linear Time Construction of Cover Suffix Tree and Applications}}, booktitle = {31st Annual European Symposium on Algorithms (ESA 2023)}, pages = {89:1--89:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-295-2}, ISSN = {1868-8969}, year = {2023}, volume = {274}, editor = {G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.89}, URN = {urn:nbn:de:0030-drops-187428}, doi = {10.4230/LIPIcs.ESA.2023.89}, annote = {Keywords: cover (quasiperiod), seed, suffix tree, run (maximal repetition)} }

Document

**Published in:** LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

An elastic-degenerate (ED) string T is a sequence of n sets T[1],…,T[n] containing m strings in total whose cumulative length is N. We call n, m, and N the length, the cardinality and the size of T, respectively. The language of T is defined as ℒ(T) = {S_1 ⋯ S_n : S_i ∈ T[i] for all i ∈ [1,n]}. ED strings have been introduced to represent a set of closely-related DNA sequences, also known as a pangenome. The basic question we investigate here is: Given two ED strings, how fast can we check whether the two languages they represent have a nonempty intersection? We call the underlying problem the ED String Intersection (EDSI) problem. For two ED strings T₁ and T₂ of lengths n₁ and n₂, cardinalities m₁ and m₂, and sizes N₁ and N₂, respectively, we show the following:
- There is no 𝒪((N₁N₂)^{1-ε})-time algorithm, thus no 𝒪((N₁m₂+N₂m₁)^{1-ε})-time algorithm and no 𝒪((N₁n₂+N₂n₁)^{1-ε})-time algorithm, for any constant ε > 0, for EDSI even when T₁ and T₂ are over a binary alphabet, unless the Strong Exponential-Time Hypothesis is false.
- There is no combinatorial 𝒪((N₁+N₂)^{1.2-ε}f(n₁,n₂))-time algorithm, for any constant ε > 0 and any function f, for EDSI even when T₁ and T₂ are over a binary alphabet, unless the Boolean Matrix Multiplication conjecture is false.
- An 𝒪(N₁log N₁log n₁+N₂log N₂log n₂)-time algorithm for outputting a compact (RLE) representation of the intersection language of two unary ED strings. In the case when T₁ and T₂ are given in a compact representation, we show that the problem is NP-complete.
- An 𝒪(N₁m₂+N₂m₁)-time algorithm for EDSI.
- An Õ(N₁^{ω-1}n₂+N₂^{ω-1}n₁)-time algorithm for EDSI, where ω is the exponent of matrix multiplication; the Õ notation suppresses factors that are polylogarithmic in the input size.
We also show that the techniques we develop have applications outside of ED string comparison.

Esteban Gabory, Moses Njagi Mwaniki, Nadia Pisanti, Solon P. Pissis, Jakub Radoszewski, Michelle Sweering, and Wiktor Zuba. Comparing Elastic-Degenerate Strings: Algorithms, Lower Bounds, and Applications. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 11:1-11:20, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{gabory_et_al:LIPIcs.CPM.2023.11, author = {Gabory, Esteban and Mwaniki, Moses Njagi and Pisanti, Nadia and Pissis, Solon P. and Radoszewski, Jakub and Sweering, Michelle and Zuba, Wiktor}, title = {{Comparing Elastic-Degenerate Strings: Algorithms, Lower Bounds, and Applications}}, booktitle = {34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)}, pages = {11:1--11:20}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-276-1}, ISSN = {1868-8969}, year = {2023}, volume = {259}, editor = {Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.11}, URN = {urn:nbn:de:0030-drops-179650}, doi = {10.4230/LIPIcs.CPM.2023.11}, annote = {Keywords: elastic-degenerate string, sequence comparison, languages intersection, pangenome, acronym identification} }

Document

**Published in:** LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Cyclic versions of covers and roots of a string are considered in this paper. A prefix V of a string S is a cyclic root of S if S is a concatenation of cyclic rotations of V. A prefix V of S is a cyclic cover of S if the occurrences of the cyclic rotations of V cover all positions of S. We present 𝒪(n)-time algorithms computing all cyclic roots (using number-theoretic tools) and all cyclic covers (using tools related to seeds) of a length-n string over an integer alphabet. Our results improve upon 𝒪(n log log n) and 𝒪(n log n) time complexities of recent algorithms of Grossi et al. (WALCOM 2023) for the respective problems and provide novel approaches to the problems. As a by-product, we obtain an optimal data structure for Internal Circular Pattern Matching queries that generalize Internal Pattern Matching and Cyclic Equivalence queries of Kociumaka et al. (SODA 2015).

Costas S. Iliopoulos, Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń, and Wiktor Zuba. Linear-Time Computation of Cyclic Roots and Cyclic Covers of a String. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 15:1-15:15, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{iliopoulos_et_al:LIPIcs.CPM.2023.15, author = {Iliopoulos, Costas S. and Kociumaka, Tomasz and Radoszewski, Jakub and Rytter, Wojciech and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{Linear-Time Computation of Cyclic Roots and Cyclic Covers of a String}}, booktitle = {34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)}, pages = {15:1--15:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-276-1}, ISSN = {1868-8969}, year = {2023}, volume = {259}, editor = {Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.15}, URN = {urn:nbn:de:0030-drops-179697}, doi = {10.4230/LIPIcs.CPM.2023.15}, annote = {Keywords: cyclic cover, cyclic root, circular pattern matching, internal pattern matching} }

Document

**Published in:** LIPIcs, Volume 244, 30th Annual European Symposium on Algorithms (ESA 2022)

We investigate the complexity of approximate circular pattern matching (CPM, in short) under the Hamming and edit distance. Under each of these two basic metrics, we are given a length-n text T, a length-m pattern P, and a positive integer threshold k, and we are to report all starting positions (called occurrences) of fragments of T that are at distance at most k from some cyclic rotation of P. In the decision version of the problem, we are to check if there is any such occurrence. All previous results for approximate CPM were either average-case upper bounds or heuristics, with the exception of the work of Charalampopoulos et al. [CKP^+, JCSS'21], who considered only the Hamming distance. For the reporting version of the approximate CPM problem, under the Hamming distance we improve upon the main algorithm of [CKP^+, JCSS'21] from 𝒪(n+(n/m) ⋅ k⁴) to 𝒪(n+(n/m) ⋅ k³ log log k) time; for the edit distance, we give an 𝒪(nk²)-time algorithm. Notably, for the decision versions and wide parameter-ranges, we give algorithms whose complexities are almost identical to the state-of-the-art for standard (i.e., non-circular) approximate pattern matching:
- For the decision version of the approximate CPM problem under the Hamming distance, we obtain an 𝒪(n+(n/m) ⋅ k² log k / log log k)-time algorithm, which works in 𝒪(n) time whenever k = 𝒪(√{m log log m / log m}). In comparison, the fastest algorithm for the standard counterpart of the problem, by Chan et al. [CGKKP, STOC’20], runs in 𝒪(n) time only for k = 𝒪(√m). We achieve this result via a reduction to a geometric problem by building on ideas from [CKP^+, JCSS'21] and Charalampopoulos et al. [CKW, FOCS'20].
- For the decision version of the approximate CPM problem under the edit distance, the 𝒪(nklog³ k) runtime of our algorithm near matches the 𝒪(nk) runtime of the Landau-Vishkin algorithm [LV, J. Algorithms'89] for approximate pattern matching under edit distance; the latter algorithm remains the fastest known for k = Ω(m^{2/5}). As a stepping stone, we propose an 𝒪(nklog³ k)-time algorithm for solving the Longest Prefix k'-Approximate Match problem, proposed by Landau et al. [LMS, SICOMP'98], for all k' ∈ {1,…,k}. Our algorithm is based on Tiskin’s theory of seaweeds [Tiskin, Math. Comput. Sci.'08], with recent advancements (see Charalampopoulos et al. [CKW, FOCS'22]), and on exploiting the seaweeds' relation to Monge matrices.
In contrast, we obtain a conditional lower bound that suggests a polynomial separation between approximate CPM under the Hamming distance over the binary alphabet and its non-circular counterpart. We also show that a strongly subquadratic-time algorithm for the decision version of approximate CPM under edit distance would refute the Strong Exponential Time Hypothesis.

Panagiotis Charalampopoulos, Tomasz Kociumaka, Jakub Radoszewski, Solon P. Pissis, Wojciech Rytter, Tomasz Waleń, and Wiktor Zuba. Approximate Circular Pattern Matching. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 35:1-35:19, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.ESA.2022.35, author = {Charalampopoulos, Panagiotis and Kociumaka, Tomasz and Radoszewski, Jakub and Pissis, Solon P. and Rytter, Wojciech and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{Approximate Circular Pattern Matching}}, booktitle = {30th Annual European Symposium on Algorithms (ESA 2022)}, pages = {35:1--35:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-247-1}, ISSN = {1868-8969}, year = {2022}, volume = {244}, editor = {Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2022.35}, URN = {urn:nbn:de:0030-drops-169738}, doi = {10.4230/LIPIcs.ESA.2022.35}, annote = {Keywords: approximate circular pattern matching, Hamming distance, edit distance} }

Document

**Published in:** LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

We revisit the classic algorithmic problem of computing a longest palidromic substring. This problem is solvable by a celebrated 𝒪(n)-time algorithm [Manacher, J. ACM 1975], where n is the length of the input string. For small alphabets, 𝒪(n) is not necessarily optimal in the word RAM model of computation: a string of length n over alphabet [0,σ) can be stored in 𝒪(n log σ/log n) space and read in 𝒪(n log σ/log n) time. We devise a simple 𝒪(n log σ/log n)-time algorithm for computing a longest palindromic substring. In particular, our algorithm works in sublinear time if σ = 2^{o(log n)}. Our technique relies on periodicity and on the 𝒪(n log σ/log n)-time constructible data structure of Kempa and Kociumaka [STOC 2019] that answers longest common extension queries in 𝒪(1) time.

Panagiotis Charalampopoulos, Solon P. Pissis, and Jakub Radoszewski. Longest Palindromic Substring in Sublinear Time. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 20:1-20:9, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.CPM.2022.20, author = {Charalampopoulos, Panagiotis and Pissis, Solon P. and Radoszewski, Jakub}, title = {{Longest Palindromic Substring in Sublinear Time}}, booktitle = {33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)}, pages = {20:1--20:9}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-234-1}, ISSN = {1868-8969}, year = {2022}, volume = {223}, editor = {Bannai, Hideo and Holub, Jan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.20}, URN = {urn:nbn:de:0030-drops-161472}, doi = {10.4230/LIPIcs.CPM.2022.20}, annote = {Keywords: string algorithms, longest palindromic substring, longest common extension} }

Document

**Published in:** LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

We show that lengths of shortest covers of all rotations of a length-n string over an integer alphabet can be computed in 𝒪(n) time in the word-RAM model, thus improving an 𝒪(n log n)-time algorithm from Crochemore et al. (Theor. Comput. Sci., 2021). Similarly as Crochemore et al., we use a relation of covers of rotations of a string S to seeds and squares in S³. The crucial parameter of a string S is the number ξ(S) of primitive covers of all rotations of S. We show first that the time complexity of the algorithm from Crochemore et al. can be slightly improved which results in time complexity Θ(ξ(S)). However, we also show that in the worst case ξ(S) is Ω(|S|log |S|). This is the main difficulty in obtaining a linear time algorithm. We overcome it and obtain yet another application of runs in strings.

Maxime Crochemore, Costas S. Iliopoulos, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba. Linear-Time Computation of Shortest Covers of All Rotations of a String. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 22:1-22:15, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{crochemore_et_al:LIPIcs.CPM.2022.22, author = {Crochemore, Maxime and Iliopoulos, Costas S. and Radoszewski, Jakub and Rytter, Wojciech and Straszy\'{n}ski, Juliusz and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{Linear-Time Computation of Shortest Covers of All Rotations of a String}}, booktitle = {33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)}, pages = {22:1--22:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-234-1}, ISSN = {1868-8969}, year = {2022}, volume = {223}, editor = {Bannai, Hideo and Holub, Jan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.22}, URN = {urn:nbn:de:0030-drops-161495}, doi = {10.4230/LIPIcs.CPM.2022.22}, annote = {Keywords: cover, quasiperiod, cyclic rotation, seed, run} }

Document

**Published in:** LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

We consider tile covers of 2D-strings which are a generalization of periodicity of 1D-strings. We say that a 2D-string A is a tile cover of a 2D-string S if S can be decomposed into non-overlapping 2D-strings, each of them equal to A or to A^T, where A^T is the transpose of A. We show that all tile covers of a 2D-string of size N can be computed in 𝒪(N^{1+ε}) time for any ε > 0. We also show a linear-time algorithm for computing all 1D-strings being tile covers of a 2D-string.

Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba. Rectangular Tile Covers of 2D-Strings. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 23:1-23:14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{radoszewski_et_al:LIPIcs.CPM.2022.23, author = {Radoszewski, Jakub and Rytter, Wojciech and Straszy\'{n}ski, Juliusz and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{Rectangular Tile Covers of 2D-Strings}}, booktitle = {33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)}, pages = {23:1--23:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-234-1}, ISSN = {1868-8969}, year = {2022}, volume = {223}, editor = {Bannai, Hideo and Holub, Jan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.23}, URN = {urn:nbn:de:0030-drops-161508}, doi = {10.4230/LIPIcs.CPM.2022.23}, annote = {Keywords: tile cover, periodicity, efficient algorithm} }

Document

**Published in:** LIPIcs, Volume 212, 32nd International Symposium on Algorithms and Computation (ISAAC 2021)

Data masking is a common technique for sanitizing sensitive data maintained in database systems, and it is also becoming increasingly important in various application areas, such as in record linkage of personal data. This work formalizes the Pattern Masking for Dictionary Matching (PMDM) problem. In PMDM, we are given a dictionary 𝒟 of d strings, each of length 𝓁, a query string q of length 𝓁, and a positive integer z, and we are asked to compute a smallest set K ⊆ {1,…,𝓁}, so that if q[i] is replaced by a wildcard for all i ∈ K, then q matches at least z strings from 𝒟. Solving PMDM allows providing data utility guarantees as opposed to existing approaches.
We first show, through a reduction from the well-known k-Clique problem, that a decision version of the PMDM problem is NP-complete, even for strings over a binary alphabet. We thus approach the problem from a more practical perspective. We show a combinatorial 𝒪((d𝓁)^{|K|/3}+d𝓁)-time and 𝒪(d𝓁)-space algorithm for PMDM for |K| = 𝒪(1). In fact, we show that we cannot hope for a faster combinatorial algorithm, unless the combinatorial k-Clique hypothesis fails [Abboud et al., SIAM J. Comput. 2018; Lincoln et al., SODA 2018]. We also generalize this algorithm for the problem of masking multiple query strings simultaneously so that every string has at least z matches in 𝒟.
Note that PMDM can be viewed as a generalization of the decision version of the dictionary matching with mismatches problem: by querying a PMDM data structure with string q and z = 1, one obtains the minimal number of mismatches of q with any string from 𝒟. The query time or space of all known data structures for the more restricted problem of dictionary matching with at most k mismatches incurs some exponential factor with respect to k. A simple exact algorithm for PMDM runs in time 𝒪(2^𝓁 d). We present a data structure for PMDM that answers queries over 𝒟 in time 𝒪(2^{𝓁/2}(2^{𝓁/2}+τ)𝓁) and requires space 𝒪(2^𝓁 d²/τ²+2^{𝓁/2}d), for any parameter τ ∈ [1,d].
We complement our results by showing a two-way polynomial-time reduction between PMDM and the Minimum Union problem [Chlamtáč et al., SODA 2017]. This gives a polynomial-time 𝒪(d^{1/4+ε})-approximation algorithm for PMDM, which is tight under a plausible complexity conjecture.

Panagiotis Charalampopoulos, Huiping Chen, Peter Christen, Grigorios Loukides, Nadia Pisanti, Solon P. Pissis, and Jakub Radoszewski. Pattern Masking for Dictionary Matching. In 32nd International Symposium on Algorithms and Computation (ISAAC 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 212, pp. 65:1-65:19, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.ISAAC.2021.65, author = {Charalampopoulos, Panagiotis and Chen, Huiping and Christen, Peter and Loukides, Grigorios and Pisanti, Nadia and Pissis, Solon P. and Radoszewski, Jakub}, title = {{Pattern Masking for Dictionary Matching}}, booktitle = {32nd International Symposium on Algorithms and Computation (ISAAC 2021)}, pages = {65:1--65:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-214-3}, ISSN = {1868-8969}, year = {2021}, volume = {212}, editor = {Ahn, Hee-Kap and Sadakane, Kunihiko}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2021.65}, URN = {urn:nbn:de:0030-drops-154982}, doi = {10.4230/LIPIcs.ISAAC.2021.65}, annote = {Keywords: string algorithms, dictionary matching, wildcards, record linkage, query term dropping} }

Document

**Published in:** LIPIcs, Volume 204, 29th Annual European Symposium on Algorithms (ESA 2021)

In the classic longest common substring (LCS) problem, we are given two strings S and T, each of length at most n, over an alphabet of size σ, and we are asked to find a longest string occurring as a fragment of both S and T. Weiner, in his seminal paper that introduced the suffix tree, presented an 𝒪(n log σ)-time algorithm for this problem [SWAT 1973]. For polynomially-bounded integer alphabets, the linear-time construction of suffix trees by Farach yielded an 𝒪(n)-time algorithm for the LCS problem [FOCS 1997]. However, for small alphabets, this is not necessarily optimal for the LCS problem in the word RAM model of computation, in which the strings can be stored in 𝒪(n log σ/log n) space and read in 𝒪(n log σ/log n) time. We show that, in this model, we can compute an LCS in time 𝒪(n log σ / √{log n}), which is sublinear in n if σ = 2^{o(√{log n})} (in particular, if σ = 𝒪(1)), using optimal space 𝒪(n log σ/log n).
We then lift our ideas to the problem of computing a k-mismatch LCS, which has received considerable attention in recent years. In this problem, the aim is to compute a longest substring of S that occurs in T with at most k mismatches. Flouri et al. showed how to compute a 1-mismatch LCS in 𝒪(n log n) time [IPL 2015]. Thankachan et al. extended this result to computing a k-mismatch LCS in 𝒪(n log^k n) time for k = 𝒪(1) [J. Comput. Biol. 2016]. We show an 𝒪(n log^{k-1/2} n)-time algorithm, for any constant integer k > 0 and irrespective of the alphabet size, using 𝒪(n) space as the previous approaches. We thus notably break through the well-known n log^k n barrier, which stems from a recursive heavy-path decomposition technique that was first introduced in the seminal paper of Cole et al. [STOC 2004] for string indexing with k errors.

Panagiotis Charalampopoulos, Tomasz Kociumaka, Solon P. Pissis, and Jakub Radoszewski. Faster Algorithms for Longest Common Substring. In 29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 30:1-30:17, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.ESA.2021.30, author = {Charalampopoulos, Panagiotis and Kociumaka, Tomasz and Pissis, Solon P. and Radoszewski, Jakub}, title = {{Faster Algorithms for Longest Common Substring}}, booktitle = {29th Annual European Symposium on Algorithms (ESA 2021)}, pages = {30:1--30:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-204-4}, ISSN = {1868-8969}, year = {2021}, volume = {204}, editor = {Mutzel, Petra and Pagh, Rasmus and Herman, Grzegorz}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2021.30}, URN = {urn:nbn:de:0030-drops-146114}, doi = {10.4230/LIPIcs.ESA.2021.30}, annote = {Keywords: longest common substring, k mismatches, wavelet tree} }

Document

**Published in:** LIPIcs, Volume 204, 29th Annual European Symposium on Algorithms (ESA 2021)

We prove 3SUM-hardness (no strongly subquadratic-time algorithm, assuming the 3SUM conjecture) of several problems related to finding Abelian square and additive square factors in a string. In particular, we conclude conditional optimality of the state-of-the-art algorithms for finding such factors.
Overall, we show 3SUM-hardness of (a) detecting an Abelian square factor of an odd half-length, (b) computing centers of all Abelian square factors, (c) detecting an additive square factor in a length-n string of integers of magnitude n^{𝒪(1)}, and (d) a problem of computing a double 3-term arithmetic progression (i.e., finding indices i ≠ j such that (x_i+x_j)/2 = x_{(i+j)/2}) in a sequence of integers x₁,… ,x_n of magnitude n^{𝒪(1)}.
Problem (d) is essentially a convolution version of the AVERAGE problem that was proposed in a manuscript of Erickson. We obtain a conditional lower bound for it with the aid of techniques recently developed by Dudek et al. [STOC 2020]. Problem (d) immediately reduces to problem (c) and is a step in reductions to problems (a) and (b). In conditional lower bounds for problems (a) and (b) we apply an encoding of Amir et al. [ICALP 2014] and extend it using several string gadgets that include arbitrarily long Abelian-square-free strings.
Our reductions also imply conditional lower bounds for detecting Abelian squares in strings over a constant-sized alphabet. We also show a subquadratic upper bound in this case, applying a result of Chan and Lewenstein [STOC 2015].

Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba. Hardness of Detecting Abelian and Additive Square Factors in Strings. In 29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 77:1-77:19, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{radoszewski_et_al:LIPIcs.ESA.2021.77, author = {Radoszewski, Jakub and Rytter, Wojciech and Straszy\'{n}ski, Juliusz and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{Hardness of Detecting Abelian and Additive Square Factors in Strings}}, booktitle = {29th Annual European Symposium on Algorithms (ESA 2021)}, pages = {77:1--77:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-204-4}, ISSN = {1868-8969}, year = {2021}, volume = {204}, editor = {Mutzel, Petra and Pagh, Rasmus and Herman, Grzegorz}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2021.77}, URN = {urn:nbn:de:0030-drops-146581}, doi = {10.4230/LIPIcs.ESA.2021.77}, annote = {Keywords: Abelian square, additive square, 3SUM problem} }

Document

**Published in:** LIPIcs, Volume 191, 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)

We consider two notions of covers of a two-dimensional string T. A (rectangular) subarray P of T is a 2D-cover of T if each position of T is in an occurrence of P in T. A one-dimensional string S is a 1D-cover of T if its vertical and horizontal occurrences in T cover all positions of T. We show how to compute the smallest-area 2D-cover of an m × n array T in the optimal 𝒪(N) time, where N = mn, all aperiodic 2D-covers of T in 𝒪(N log N) time, and all 2D-covers of T in N^{4/3}⋅ log^{𝒪(1)}N time. Further, we show how to compute all 1D-covers in the optimal 𝒪(N) time. Along the way, we show that the Klee’s measure of a set of rectangles, each of width and height at least √n, on an n × n grid can be maintained in √n⋅ log^{𝒪(1)}n time per insertion or deletion of a rectangle, a result which could be of independent interest.

Panagiotis Charalampopoulos, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń, and Wiktor Zuba. Computing Covers of 2D-Strings. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 12:1-12:20, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.CPM.2021.12, author = {Charalampopoulos, Panagiotis and Radoszewski, Jakub and Rytter, Wojciech and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{Computing Covers of 2D-Strings}}, booktitle = {32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)}, pages = {12:1--12:20}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-186-3}, ISSN = {1868-8969}, year = {2021}, volume = {191}, editor = {Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2021.12}, URN = {urn:nbn:de:0030-drops-139635}, doi = {10.4230/LIPIcs.CPM.2021.12}, annote = {Keywords: 2D-string, cover, dynamic Klee’s measure problem} }

Document

**Published in:** LIPIcs, Volume 173, 28th Annual European Symposium on Algorithms (ESA 2020)

The notions of periodicity and repetitions in strings, and hence these of runs and squares, naturally extend to two-dimensional strings. We consider two types of repetitions in 2D-strings: 2D-runs and quartics (quartics are a 2D-version of squares in standard strings). Amir et al. introduced 2D-runs, showed that there are 𝒪(n³) of them in an n × n 2D-string and presented a simple construction giving a lower bound of Ω(n²) for their number (Theoretical Computer Science, 2020). We make a significant step towards closing the gap between these bounds by showing that the number of 2D-runs in an n × n 2D-string is 𝒪(n² log² n). In particular, our bound implies that the 𝒪(n²log n + output) run-time of the algorithm of Amir et al. for computing 2D-runs is also 𝒪(n² log² n). We expect this result to allow for exploiting 2D-runs algorithmically in the area of 2D pattern matching.
A quartic is a 2D-string composed of 2 × 2 identical blocks (2D-strings) that was introduced by Apostolico and Brimkov (Theoretical Computer Science, 2000), where by quartics they meant only primitively rooted quartics, i.e. built of a primitive block. Here our notion of quartics is more general and analogous to that of squares in 1D-strings. Apostolico and Brimkov showed that there are 𝒪(n² log² n) occurrences of primitively rooted quartics in an n × n 2D-string and that this bound is attainable. Consequently the number of distinct primitively rooted quartics is 𝒪(n² log² n). The straightforward bound for the maximal number of distinct general quartics is 𝒪(n⁴). Here, we prove that the number of distinct general quartics is also 𝒪(n² log² n). This extends the rich combinatorial study of the number of distinct squares in a 1D-string, that was initiated by Fraenkel and Simpson (Journal of Combinatorial Theory, Series A, 1998), to two dimensions.
Finally, we show some algorithmic applications of 2D-runs. Specifically, we present algorithms for computing all occurrences of primitively rooted quartics and counting all general distinct quartics in 𝒪(n² log² n) time, which is quasi-linear with respect to the size of the input. The former algorithm is optimal due to the lower bound of Apostolico and Brimkov. The latter can be seen as a continuation of works on enumeration of distinct squares in 1D-strings using runs (Crochemore et al., Theoretical Computer Science, 2014). However, the methods used in 2D are different because of different properties of 2D-runs and quartics.

Panagiotis Charalampopoulos, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń, and Wiktor Zuba. The Number of Repetitions in 2D-Strings. In 28th Annual European Symposium on Algorithms (ESA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 173, pp. 32:1-32:18, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.ESA.2020.32, author = {Charalampopoulos, Panagiotis and Radoszewski, Jakub and Rytter, Wojciech and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{The Number of Repetitions in 2D-Strings}}, booktitle = {28th Annual European Symposium on Algorithms (ESA 2020)}, pages = {32:1--32:18}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-162-7}, ISSN = {1868-8969}, year = {2020}, volume = {173}, editor = {Grandoni, Fabrizio and Herman, Grzegorz and Sanders, Peter}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2020.32}, URN = {urn:nbn:de:0030-drops-128987}, doi = {10.4230/LIPIcs.ESA.2020.32}, annote = {Keywords: 2D-run, quartic, run, square} }

Document

**Published in:** LIPIcs, Volume 173, 28th Annual European Symposium on Algorithms (ESA 2020)

Quasiperiodicity is a generalization of periodicity that has been researched for almost 30 years. The notion of cover is the classic variant of quasiperiodicity. A cover of a text T is a string whose occurrences in T cover all positions of T. There are several algorithms computing covers of a text in linear time. In this paper we consider a natural extension of cover. For a text T, we call a pair of strings a 2-cover if they have the same length and their occurrences cover the text T. We give an algorithm that computes all 2-covers of a string of length n in 𝒪(n log n log log n + output) expected time or 𝒪(n log n log² log n / log log log n + output) worst-case time, where output is the size of output.
If (X,Y) is a 2-cover of T, then either X is a prefix and Y is a suffix of T, in which case we call (X,Y) a ps-cover, or one of X, Y is a border (that is, both a prefix and a suffix) of T, and then we call (X,Y) a b-cover. A string of length n has up to n ps-covers; we show an algorithm that computes all of them in 𝒪(n log log n) expected time or 𝒪(n log² log n / log log log n) worst-case time. A string of length n can have Θ(n²) non-trivial b-covers; our algorithm can report one b-cover per length (if it exists) or all shortest b-covers in 𝒪(n log n log log n) expected time or 𝒪(n log n log² log n / log log log n) worst-case time. All our algorithms use linear space.
The problem in scope can be generalized to λ > 2 equal-length strings, resulting in the notion of λ-cover. Cole et al. (2005) showed that the λ-cover problem is NP-complete. Our algorithms generalize to λ-covers, with (the first component of) the algorithm’s complexity multiplied by n^{λ-2}.

Jakub Radoszewski and Juliusz Straszyński. Efficient Computation of 2-Covers of a String. In 28th Annual European Symposium on Algorithms (ESA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 173, pp. 77:1-77:17, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{radoszewski_et_al:LIPIcs.ESA.2020.77, author = {Radoszewski, Jakub and Straszy\'{n}ski, Juliusz}, title = {{Efficient Computation of 2-Covers of a String}}, booktitle = {28th Annual European Symposium on Algorithms (ESA 2020)}, pages = {77:1--77:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-162-7}, ISSN = {1868-8969}, year = {2020}, volume = {173}, editor = {Grandoni, Fabrizio and Herman, Grzegorz and Sanders, Peter}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2020.77}, URN = {urn:nbn:de:0030-drops-129432}, doi = {10.4230/LIPIcs.ESA.2020.77}, annote = {Keywords: quasiperiodicity, cover of a string, 2-cover, lambda-cover} }

Document

**Published in:** LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

We consider the problem of preprocessing a text T of length n and a dictionary 𝒟 in order to be able to efficiently answer queries CountDistinct(i,j), that is, given i and j return the number of patterns from 𝒟 that occur in the fragment T[i..j]. The dictionary is internal in the sense that each pattern in 𝒟 is given as a fragment of T. This way, the dictionary takes space proportional to the number of patterns d=|𝒟| rather than their total length, which could be Θ(n⋅ d). An 𝒪̃(n+d)-size data structure that answers CountDistinct(i,j) queries 𝒪(log n)-approximately in 𝒪̃(1) time was recently proposed in a work that introduced internal dictionary matching [ISAAC 2019]. Here we present an 𝒪̃(n+d)-size data structure that answers CountDistinct(i,j) queries 2-approximately in 𝒪̃(1) time. Using range queries, for any m, we give an 𝒪̃(min(nd/m,n²/m²)+d)-size data structure that answers CountDistinct(i,j) queries exactly in 𝒪̃(m) time. We also consider the special case when the dictionary consists of all square factors of the string. We design an 𝒪(n log² n)-size data structure that allows us to count distinct squares in a text fragment T[i..j] in 𝒪(log n) time.

Panagiotis Charalampopoulos, Tomasz Kociumaka, Manal Mohamed, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba. Counting Distinct Patterns in Internal Dictionary Matching. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 8:1-8:15, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.CPM.2020.8, author = {Charalampopoulos, Panagiotis and Kociumaka, Tomasz and Mohamed, Manal and Radoszewski, Jakub and Rytter, Wojciech and Straszy\'{n}ski, Juliusz and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{Counting Distinct Patterns in Internal Dictionary Matching}}, booktitle = {31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)}, pages = {8:1--8:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-149-8}, ISSN = {1868-8969}, year = {2020}, volume = {161}, editor = {G{\o}rtz, Inge Li and Weimann, Oren}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.8}, URN = {urn:nbn:de:0030-drops-121336}, doi = {10.4230/LIPIcs.CPM.2020.8}, annote = {Keywords: dictionary matching, internal pattern matching, squares} }

Document

**Published in:** LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

The edit distance (a.k.a. the Levenshtein distance) between two words is defined as the minimum number of insertions, deletions or substitutions of letters needed to transform one word into another. The Levenshtein k-neighbourhood of a word w is the set of words that are at edit distance at most k from w. This is perhaps the most important concept underlying BLAST, a widely-used tool for comparing biological sequences. A natural combinatorial question is to ask for upper and lower bounds on the size of this set. The answer to this question has important algorithmic implications as well. Myers notes that "such bounds would give a tighter characterisation of the running time of the algorithm" behind BLAST. We show that the size of the Levenshtein k-neighbourhood of any word of length n over an arbitrary alphabet is not smaller than the size of the Levenshtein k-neighbourhood of a unary word of length n, thus providing a tight lower bound on the size of the Levenshtein k-neighbourhood. We remark that this result was posed as a conjecture by Dufresne at WCTA 2019.

Panagiotis Charalampopoulos, Solon P. Pissis, Jakub Radoszewski, Tomasz Waleń, and Wiktor Zuba. Unary Words Have the Smallest Levenshtein k-Neighbourhoods. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 10:1-10:12, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.CPM.2020.10, author = {Charalampopoulos, Panagiotis and Pissis, Solon P. and Radoszewski, Jakub and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{Unary Words Have the Smallest Levenshtein k-Neighbourhoods}}, booktitle = {31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)}, pages = {10:1--10:12}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-149-8}, ISSN = {1868-8969}, year = {2020}, volume = {161}, editor = {G{\o}rtz, Inge Li and Weimann, Oren}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.10}, URN = {urn:nbn:de:0030-drops-121359}, doi = {10.4230/LIPIcs.CPM.2020.10}, annote = {Keywords: combinatorics on words, Levenshtein distance, edit distance} }

Document

**Published in:** LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

In the problem of the longest common substring with k mismatches we are given two strings X, Y and must find the maximal length 𝓁 such that there is a length-𝓁 substring of X and a length-𝓁 substring of Y that differ in at most k positions. The length 𝓁 can be used as a robust measure of similarity between X, Y. In this work, we develop new approximation algorithms for computing 𝓁 that are significantly more efficient that previously known solutions from the theoretical point of view. Our approach is simple and practical, which we confirm via an experimental evaluation, and is probably close to optimal as we demonstrate via a conditional lower bound.

Garance Gourdel, Tomasz Kociumaka, Jakub Radoszewski, and Tatiana Starikovskaya. Approximating Longest Common Substring with k mismatches: Theory and Practice. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 16:1-16:15, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{gourdel_et_al:LIPIcs.CPM.2020.16, author = {Gourdel, Garance and Kociumaka, Tomasz and Radoszewski, Jakub and Starikovskaya, Tatiana}, title = {{Approximating Longest Common Substring with k mismatches: Theory and Practice}}, booktitle = {31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)}, pages = {16:1--16:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-149-8}, ISSN = {1868-8969}, year = {2020}, volume = {161}, editor = {G{\o}rtz, Inge Li and Weimann, Oren}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.16}, URN = {urn:nbn:de:0030-drops-121410}, doi = {10.4230/LIPIcs.CPM.2020.16}, annote = {Keywords: approximation algorithms, string similarity, LSH, conditional lower bounds} }

Document

**Published in:** LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

Quasiperiodicity in strings was introduced almost 30 years ago as an extension of string periodicity. The basic notions of quasiperiodicity are cover and seed. A cover of a text T is a string whose occurrences in T cover all positions of T. A seed of text T is a cover of a superstring of T. In various applications exact quasiperiodicity is still not sufficient due to the presence of errors. We consider approximate notions of quasiperiodicity, for which we allow approximate occurrences in T with a small Hamming, Levenshtein or weighted edit distance.
In previous work Sip et al. (2002) and Christodoulakis et al. (2005) showed that computing approximate covers and seeds, respectively, under weighted edit distance is NP-hard. They, therefore, considered restricted approximate covers and seeds which need to be factors of the original string T and presented polynomial-time algorithms for computing them. Further algorithms, considering approximate occurrences with Hamming distance bounded by k, were given in several contributions by Guth et al. They also studied relaxed approximate quasiperiods that do not need to cover all positions of T.
In case of large data the exponents in polynomial time complexity play a crucial role. We present more efficient algorithms for computing restricted approximate covers and seeds. In particular, we improve upon the complexities of many of the aforementioned algorithms, also for relaxed quasiperiods. Our solutions are especially efficient if the number (or total cost) of allowed errors is bounded. We also show NP-hardness of computing non-restricted approximate covers and seeds under Hamming distance.
Approximate covers were studied in three recent contributions at CPM over the last three years. However, these works consider a different definition of an approximate cover of T, that is, the shortest exact cover of a string T' with the smallest Hamming distance from T.

Aleksander Kędzierski and Jakub Radoszewski. k-Approximate Quasiperiodicity under Hamming and Edit Distance. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 18:1-18:15, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{kedzierski_et_al:LIPIcs.CPM.2020.18, author = {K\k{e}dzierski, Aleksander and Radoszewski, Jakub}, title = {{k-Approximate Quasiperiodicity under Hamming and Edit Distance}}, booktitle = {31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)}, pages = {18:1--18:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-149-8}, ISSN = {1868-8969}, year = {2020}, volume = {161}, editor = {G{\o}rtz, Inge Li and Weimann, Oren}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.18}, URN = {urn:nbn:de:0030-drops-121437}, doi = {10.4230/LIPIcs.CPM.2020.18}, annote = {Keywords: approximate cover, approximate seed, enhanced cover, Hamming distance, edit distance} }

Document

**Published in:** LIPIcs, Volume 149, 30th International Symposium on Algorithms and Computation (ISAAC 2019)

We introduce data structures answering queries concerning the occurrences of patterns from a given dictionary D in fragments of a given string T of length n. The dictionary is internal in the sense that each pattern in D is given as a fragment of T. This way, D takes space proportional to the number of patterns d=|D| rather than their total length, which could be Theta(n * d).
In particular, we consider the following types of queries: reporting and counting all occurrences of patterns from D in a fragment T[i..j] (operations Report(i,j) and Count(i,j) below, as well as operation Exists(i,j) that returns true iff Count(i,j)>0) and reporting distinct patterns from D that occur in T[i..j] (operation ReportDistinct(i,j)). We show how to construct, in O((n+d) log^{O(1)} n) time, a data structure that answers each of these queries in time O(log^{O(1)} n+|output|) - see the table below for specific time and space complexities.
Query | Preprocessing time | Space | Query time
Exists(i,j) | O(n+d) | O(n) | O(1)
Report(i,j) | O(n+d) | O(n+d) | O(1+|output|)
ReportDistinct(i,j) | O(n log n+d) | O(n+d) | O(log n+|output|)
Count(i,j) | O({n log n}/{log log n} + d log^{3/2} n) | O(n+d log n) | O({log^2n}/{log log n})
The case of counting patterns is much more involved and needs a combination of a locally consistent parsing with orthogonal range searching. Reporting distinct patterns, on the other hand, uses the structure of maximal repetitions in strings. Finally, we provide tight - up to subpolynomial factors - upper and lower bounds for the case of a dynamic dictionary.

Panagiotis Charalampopoulos, Tomasz Kociumaka, Manal Mohamed, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Internal Dictionary Matching. In 30th International Symposium on Algorithms and Computation (ISAAC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 149, pp. 22:1-22:17, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.ISAAC.2019.22, author = {Charalampopoulos, Panagiotis and Kociumaka, Tomasz and Mohamed, Manal and Radoszewski, Jakub and Rytter, Wojciech and Wale\'{n}, Tomasz}, title = {{Internal Dictionary Matching}}, booktitle = {30th International Symposium on Algorithms and Computation (ISAAC 2019)}, pages = {22:1--22:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-130-6}, ISSN = {1868-8969}, year = {2019}, volume = {149}, editor = {Lu, Pinyan and Zhang, Guochuan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2019.22}, URN = {urn:nbn:de:0030-drops-115182}, doi = {10.4230/LIPIcs.ISAAC.2019.22}, annote = {Keywords: string algorithms, dictionary matching, internal pattern matching} }

Document

**Published in:** LIPIcs, Volume 144, 27th Annual European Symposium on Algorithms (ESA 2019)

Given two strings S and T, each of length at most n, the longest common substring (LCS) problem is to find a longest substring common to S and T. This is a classical problem in computer science with an O(n)-time solution. In the fully dynamic setting, edit operations are allowed in either of the two strings, and the problem is to find an LCS after each edit. We present the first solution to this problem requiring sublinear time in n per edit operation. In particular, we show how to find an LCS after each edit operation in O~(n^(2/3)) time, after O~(n)-time and space preprocessing.
This line of research has been recently initiated in a somewhat restricted dynamic variant by Amir et al. [SPIRE 2017]. More specifically, they presented an O~(n)-sized data structure that returns an LCS of the two strings after a single edit operation (that is reverted afterwards) in O~(1) time. At CPM 2018, three papers (Abedin et al., Funakoshi et al., and Urabe et al.) studied analogously restricted dynamic variants of problems on strings. We show that the techniques we develop can be applied to obtain fully dynamic algorithms for all of these variants. The only previously known sublinear-time dynamic algorithms for problems on strings were for maintaining a dynamic collection of strings for comparison queries and for pattern matching, with the most recent advances made by Gawrychowski et al. [SODA 2018] and by Clifford et al. [STACS 2018].
As an intermediate problem we consider computing the solution for a string with a given set of k edits, which leads us, in particular, to answering internal queries on a string. The input to such a query is specified by a substring (or substrings) of a given string. Data structures for answering internal string queries that were proposed by Kociumaka et al. [SODA 2015] and by Gagie et al. [CCCG 2013] are used, along with new ones, based on ingredients such as the suffix tree, heavy-path decomposition, orthogonal range queries, difference covers, and string periodicity.

Amihood Amir, Panagiotis Charalampopoulos, Solon P. Pissis, and Jakub Radoszewski. Longest Common Substring Made Fully Dynamic. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 6:1-6:17, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.ESA.2019.6, author = {Amir, Amihood and Charalampopoulos, Panagiotis and Pissis, Solon P. and Radoszewski, Jakub}, title = {{Longest Common Substring Made Fully Dynamic}}, booktitle = {27th Annual European Symposium on Algorithms (ESA 2019)}, pages = {6:1--6:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-124-5}, ISSN = {1868-8969}, year = {2019}, volume = {144}, editor = {Bender, Michael A. and Svensson, Ola and Herman, Grzegorz}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2019.6}, URN = {urn:nbn:de:0030-drops-111275}, doi = {10.4230/LIPIcs.ESA.2019.6}, annote = {Keywords: longest common substring, string algorithms, dynamic algorithms} }

Document

**Published in:** LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)

In this work, we show two streaming algorithms for computing the length of the shortest cover of a string of length n. We start by showing a two-pass algorithm that uses O(log^2 n) space and then show a one-pass streaming algorithm that uses O(sqrt{n log n}) space. Both algorithms run in near-linear time. The algorithms are randomized and compute the answer incorrectly with probability inverse-polynomial in n. We also show that there is no sublinear-space streaming algorithm for computing the length of the shortest seed of a string.

Paweł Gawrychowski, Jakub Radoszewski, and Tatiana Starikovskaya. Quasi-Periodicity in Streams. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 22:1-22:14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.CPM.2019.22, author = {Gawrychowski, Pawe{\l} and Radoszewski, Jakub and Starikovskaya, Tatiana}, title = {{Quasi-Periodicity in Streams}}, booktitle = {30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)}, pages = {22:1--22:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-103-0}, ISSN = {1868-8969}, year = {2019}, volume = {128}, editor = {Pisanti, Nadia and P. Pissis, Solon}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.22}, URN = {urn:nbn:de:0030-drops-104930}, doi = {10.4230/LIPIcs.CPM.2019.22}, annote = {Keywords: Streaming algorithms, quasi-periodicity, covers, seeds} }

Document

**Published in:** LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)

We introduce the Longest Common Circular Factor (LCCF) problem in which, given strings S and T of length at most n, we are to compute the longest factor of S whose cyclic shift occurs as a factor of T. It is a new similarity measure, an extension of the classic Longest Common Factor. We show how to solve the LCCF problem in O(n log^4 n) time using O(n log^2 n) space.

Mai Alzamel, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba. Quasi-Linear-Time Algorithm for Longest Common Circular Factor. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 25:1-25:14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{alzamel_et_al:LIPIcs.CPM.2019.25, author = {Alzamel, Mai and Crochemore, Maxime and Iliopoulos, Costas S. and Kociumaka, Tomasz and Radoszewski, Jakub and Rytter, Wojciech and Straszy\'{n}ski, Juliusz and Wale\'{n}, Tomasz and Zuba, Wiktor}, title = {{Quasi-Linear-Time Algorithm for Longest Common Circular Factor}}, booktitle = {30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)}, pages = {25:1--25:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-103-0}, ISSN = {1868-8969}, year = {2019}, volume = {128}, editor = {Pisanti, Nadia and P. Pissis, Solon}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.25}, URN = {urn:nbn:de:0030-drops-104961}, doi = {10.4230/LIPIcs.CPM.2019.25}, annote = {Keywords: longest common factor, circular pattern matching, internal pattern matching, intersection of hyperrectangles} }

Document

**Published in:** LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

In the Longest Common Factor with k Mismatches (LCF_k) problem, we are given two strings X and Y of total length n, and we are asked to find a pair of maximal-length factors, one of X and the other of Y, such that their Hamming distance is at most k. Thankachan et al. [Thankachan et al. 2016] show that this problem can be solved in O(n log^k n) time and O(n) space for constant k. We consider the LCF_k(l) problem in which we assume that the sought factors have length at least l. We use difference covers to reduce the LCF_k(l) problem with l=Omega(log^{2k+2}n) to a task involving m=O(n/log^{k+1}n) synchronized factors. The latter can be solved in O(m log^{k+1}m) time, which results in a linear-time algorithm for LCF_k(l) with l=Omega(log^{2k+2}n). In general, our solution to the LCF_k(l) problem for arbitrary l takes O(n + n log^{k+1} n/sqrt{l}) time.

Panagiotis Charalampopoulos, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Walen. Linear-Time Algorithm for Long LCF with k Mismatches. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 23:1-23:16, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.CPM.2018.23, author = {Charalampopoulos, Panagiotis and Crochemore, Maxime and Iliopoulos, Costas S. and Kociumaka, Tomasz and Pissis, Solon P. and Radoszewski, Jakub and Rytter, Wojciech and Walen, Tomasz}, title = {{Linear-Time Algorithm for Long LCF with k Mismatches}}, booktitle = {29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)}, pages = {23:1--23:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-074-3}, ISSN = {1868-8969}, year = {2018}, volume = {105}, editor = {Navarro, Gonzalo and Sankoff, David and Zhu, Binhai}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.23}, URN = {urn:nbn:de:0030-drops-86869}, doi = {10.4230/LIPIcs.CPM.2018.23}, annote = {Keywords: longest common factor, longest common substring, Hamming distance, heavy-light decomposition, difference cover} }

Document

**Published in:** LIPIcs, Volume 96, 35th Symposium on Theoretical Aspects of Computer Science (STACS 2018)

The order-preserving model (op-model, in short) was introduced quite recently but has already attracted significant attention because of its applications in data analysis. We introduce several types of periods in this setting (op-periods). Then we give algorithms to compute these periods in time O(n), O(n log log n), O(n log^2 log n/log log log n), O(n log n) depending on the type of periodicity. In the most general variant the number of different periods can be as big as Omega(n^2), and a compact representation is needed. Our algorithms require novel combinatorial insight into the properties of such periods.

Garance Gourdel, Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Arseny Shur, and Tomasz Walen. String Periods in the Order-Preserving Model. In 35th Symposium on Theoretical Aspects of Computer Science (STACS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 96, pp. 38:1-38:16, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{gourdel_et_al:LIPIcs.STACS.2018.38, author = {Gourdel, Garance and Kociumaka, Tomasz and Radoszewski, Jakub and Rytter, Wojciech and Shur, Arseny and Walen, Tomasz}, title = {{String Periods in the Order-Preserving Model}}, booktitle = {35th Symposium on Theoretical Aspects of Computer Science (STACS 2018)}, pages = {38:1--38:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-062-0}, ISSN = {1868-8969}, year = {2018}, volume = {96}, editor = {Niedermeier, Rolf and Vall\'{e}e, Brigitte}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2018.38}, URN = {urn:nbn:de:0030-drops-85064}, doi = {10.4230/LIPIcs.STACS.2018.38}, annote = {Keywords: order-preserving pattern matching, period, efficient algorithm} }

Document

Complete Volume

**Published in:** LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

LIPIcs, Volume 78, CPM'17, Complete Volume

Juha Kärkkäinen, Jakub Radoszewski, and Wojciech Rytter. LIPIcs, Volume 78, CPM'17, Complete Volume. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@Proceedings{karkkainen_et_al:LIPIcs.CPM.2017, title = {{LIPIcs, Volume 78, CPM'17, Complete Volume}}, booktitle = {28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-039-2}, ISSN = {1868-8969}, year = {2017}, volume = {78}, editor = {K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017}, URN = {urn:nbn:de:0030-drops-75091}, doi = {10.4230/LIPIcs.CPM.2017}, annote = {Keywords: Data Structures, Data Storage Representations, Coding and Information Theory, Theory of Computation, Discrete Mathematics, Information Systems,} }

Document

Front Matter

**Published in:** LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Front Matter, Table of Contents, Preface, Conference Organization, External Reviewers

Juha Kärkkäinen, Jakub Radoszewski, and Wojciech Rytter. Front Matter, Table of Contents, Preface, Conference Organization, External Reviewers. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 0:i-0:xvi, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{karkkainen_et_al:LIPIcs.CPM.2017.0, author = {K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech}, title = {{Front Matter, Table of Contents, Preface, Conference Organization, External Reviewers}}, booktitle = {28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)}, pages = {0:i--0:xvi}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-039-2}, ISSN = {1868-8969}, year = {2017}, volume = {78}, editor = {K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.0}, URN = {urn:nbn:de:0030-drops-73178}, doi = {10.4230/LIPIcs.CPM.2017.0}, annote = {Keywords: Front Matter, Table of Contents, Preface, Conference Organization, External Reviewers} }

Document

**Published in:** LIPIcs, Volume 64, 27th International Symposium on Algorithms and Computation (ISAAC 2016)

We study pattern matching problems on two major representations of uncertain sequences used in molecular biology: weighted sequences (also known as position weight matrices, PWM) and profiles (i.e., scoring matrices). In the simple version, in which only the pattern or only the text is uncertain, we obtain efficient algorithms with theoretically-provable running times using a variation of the lookahead scoring technique. We also consider a general variant of the pattern matching problems in which both the pattern and the text are uncertain. Central to our solution is a special case where the sequences have equal length, called the consensus problem. We propose algorithms for the consensus problem parameterized by the number of strings that match one of the sequences. As our basic approach, a careful adaptation of the classic meet-in-the-middle algorithm for the knapsack problem is used. On the lower bound side, we prove that our dependence on the parameter is optimal up to lower-order terms conditioned on the optimality of the original algorithm for the knapsack problem.

Tomasz Kociumaka, Solon P. Pissis, and Jakub Radoszewski. Pattern Matching and Consensus Problems on Weighted Sequences and Profiles. In 27th International Symposium on Algorithms and Computation (ISAAC 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 64, pp. 46:1-46:12, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{kociumaka_et_al:LIPIcs.ISAAC.2016.46, author = {Kociumaka, Tomasz and Pissis, Solon P. and Radoszewski, Jakub}, title = {{Pattern Matching and Consensus Problems on Weighted Sequences and Profiles}}, booktitle = {27th International Symposium on Algorithms and Computation (ISAAC 2016)}, pages = {46:1--46:12}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-026-2}, ISSN = {1868-8969}, year = {2016}, volume = {64}, editor = {Hong, Seok-Hee}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2016.46}, URN = {urn:nbn:de:0030-drops-68166}, doi = {10.4230/LIPIcs.ISAAC.2016.46}, annote = {Keywords: weighted sequence, position weight matrix, profile matching} }

Document

**Published in:** LIPIcs, Volume 54, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)

The problem of finding factors of a text string which are identical or similar to a given pattern string is a central problem in computer science. A generalised version of this problem consists in implementing an index over the text to support efficient on-line pattern queries. We study this problem in the case where the text is weighted: for every position of the text and every letter of the alphabet a probability of occurrence of this letter at this position is given. Sequences of this type, also called position weight matrices, are commonly used to represent imprecise or uncertain data. A weighted sequence may represent many different strings, each with probability of occurrence equal to the product of probabilities of its letters at subsequent positions. Given a probability threshold 1/z, we say that a pattern string P matches a weighted text at position i if the product of probabilities of the letters of P at positions i,...,i+|P|-1 in the text is at least 1/z. In this article, we present an O(nz)-time construction of an O(nz)-sized index that can answer pattern matching queries in a weighted text in optimal time improving upon the state of the art by a factor of z log z. Other applications of this data structure include an O(nz)-time construction of the weighted prefix table and an O(nz)-time computation of all covers of a weighted sequence, which improve upon the state of the art by the same factor.

Carl Barton, Tomasz Kociumaka, Solon P. Pissis, and Jakub Radoszewski. Efficient Index for Weighted Sequences. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 4:1-4:13, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{barton_et_al:LIPIcs.CPM.2016.4, author = {Barton, Carl and Kociumaka, Tomasz and Pissis, Solon P. and Radoszewski, Jakub}, title = {{Efficient Index for Weighted Sequences}}, booktitle = {27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)}, pages = {4:1--4:13}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-012-5}, ISSN = {1868-8969}, year = {2016}, volume = {54}, editor = {Grossi, Roberto and Lewenstein, Moshe}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.4}, URN = {urn:nbn:de:0030-drops-60807}, doi = {10.4230/LIPIcs.CPM.2016.4}, annote = {Keywords: weighted sequence, position weight matrix, indexing, weighted suffix tree} }

Document

**Published in:** LIPIcs, Volume 54, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)

Strings with don't care symbols, also called partial words, and more general indeterminate strings are a natural representation of strings containing uncertain symbols. A considerable effort has been made to obtain efficient algorithms for pattern matching and periodicity detection in such strings. Among those, a number of algorithms have been proposed that behave well on random data, but still their worst-case running time is Theta(n^2). We present the first truly subquadratic-time solutions for a number of such problems on partial words that can also be adapted to indeterminate strings over a constant-sized alphabet. We show that $n$ longest common compatible prefix queries (which correspond to longest common extension queries in regular strings) can be answered on-line in O(n * sqrt(n * log(n)) time after O(n * sqrt(n * log(n))-time preprocessing. We also present O(n * sqrt(n * log(n))-time algorithms for computing the prefix array and two types of border array of a partial word.

Costas S. Iliopoulos and Jakub Radoszewski. Truly Subquadratic-Time Extension Queries and Periodicity Detection in Strings with Uncertainties. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 8:1-8:12, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{iliopoulos_et_al:LIPIcs.CPM.2016.8, author = {Iliopoulos, Costas S. and Radoszewski, Jakub}, title = {{Truly Subquadratic-Time Extension Queries and Periodicity Detection in Strings with Uncertainties}}, booktitle = {27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)}, pages = {8:1--8:12}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-012-5}, ISSN = {1868-8969}, year = {2016}, volume = {54}, editor = {Grossi, Roberto and Lewenstein, Moshe}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.8}, URN = {urn:nbn:de:0030-drops-60842}, doi = {10.4230/LIPIcs.CPM.2016.8}, annote = {Keywords: string with don’t cares, partial word, indeterminate string, longest common conservative prefix queries, prefix array} }

Document

**Published in:** LIPIcs, Volume 20, 30th International Symposium on Theoretical Aspects of Computer Science (STACS 2013)

We present efficient algorithms computing all Abelian periods of two types in a word. Regular Abelian periods are computed in O(n log log{n}) randomized time which improves over the best previously known algorithm by almost a factor of n. The other algorithm, for full Abelian periods, works in O(n) time. As a tool we develop an O(n) time construction of a data structure that allows O(1) time gcd(i,j) queries for all 1 <= i,j <= n, this is a result of independent interest.

Tomasz Kociumaka, Jakub Radoszewski, and Wojciech Rytter. Fast Algorithms for Abelian Periods in Words and Greatest Common Divisor Queries. In 30th International Symposium on Theoretical Aspects of Computer Science (STACS 2013). Leibniz International Proceedings in Informatics (LIPIcs), Volume 20, pp. 245-256, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2013)

Copy BibTex To Clipboard

@InProceedings{kociumaka_et_al:LIPIcs.STACS.2013.245, author = {Kociumaka, Tomasz and Radoszewski, Jakub and Rytter, Wojciech}, title = {{Fast Algorithms for Abelian Periods in Words and Greatest Common Divisor Queries}}, booktitle = {30th International Symposium on Theoretical Aspects of Computer Science (STACS 2013)}, pages = {245--256}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-939897-50-7}, ISSN = {1868-8969}, year = {2013}, volume = {20}, editor = {Portier, Natacha and Wilke, Thomas}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2013.245}, URN = {urn:nbn:de:0030-drops-39387}, doi = {10.4230/LIPIcs.STACS.2013.245}, annote = {Keywords: Abelian period, greatest common divisor} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail