# Search Results

### Documents authored by Gabory, Esteban

Document
##### Comparing Elastic-Degenerate Strings: Algorithms, Lower Bounds, and Applications

Authors: Esteban Gabory, Moses Njagi Mwaniki, Nadia Pisanti, Solon P. Pissis, Jakub Radoszewski, Michelle Sweering, and Wiktor Zuba

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

##### Abstract
An elastic-degenerate (ED) string T is a sequence of n sets T[1],…,T[n] containing m strings in total whose cumulative length is N. We call n, m, and N the length, the cardinality and the size of T, respectively. The language of T is defined as ℒ(T) = {S_1 ⋯ S_n : S_i ∈ T[i] for all i ∈ [1,n]}. ED strings have been introduced to represent a set of closely-related DNA sequences, also known as a pangenome. The basic question we investigate here is: Given two ED strings, how fast can we check whether the two languages they represent have a nonempty intersection? We call the underlying problem the ED String Intersection (EDSI) problem. For two ED strings T₁ and T₂ of lengths n₁ and n₂, cardinalities m₁ and m₂, and sizes N₁ and N₂, respectively, we show the following: - There is no 𝒪((N₁N₂)^{1-ε})-time algorithm, thus no 𝒪((N₁m₂+N₂m₁)^{1-ε})-time algorithm and no 𝒪((N₁n₂+N₂n₁)^{1-ε})-time algorithm, for any constant ε > 0, for EDSI even when T₁ and T₂ are over a binary alphabet, unless the Strong Exponential-Time Hypothesis is false. - There is no combinatorial 𝒪((N₁+N₂)^{1.2-ε}f(n₁,n₂))-time algorithm, for any constant ε > 0 and any function f, for EDSI even when T₁ and T₂ are over a binary alphabet, unless the Boolean Matrix Multiplication conjecture is false. - An 𝒪(N₁log N₁log n₁+N₂log N₂log n₂)-time algorithm for outputting a compact (RLE) representation of the intersection language of two unary ED strings. In the case when T₁ and T₂ are given in a compact representation, we show that the problem is NP-complete. - An 𝒪(N₁m₂+N₂m₁)-time algorithm for EDSI. - An Õ(N₁^{ω-1}n₂+N₂^{ω-1}n₁)-time algorithm for EDSI, where ω is the exponent of matrix multiplication; the Õ notation suppresses factors that are polylogarithmic in the input size. We also show that the techniques we develop have applications outside of ED string comparison.

##### Cite as

Esteban Gabory, Moses Njagi Mwaniki, Nadia Pisanti, Solon P. Pissis, Jakub Radoszewski, Michelle Sweering, and Wiktor Zuba. Comparing Elastic-Degenerate Strings: Algorithms, Lower Bounds, and Applications. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 11:1-11:20, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2023)

```@InProceedings{gabory_et_al:LIPIcs.CPM.2023.11,
author =	{Gabory, Esteban and Mwaniki, Moses Njagi and Pisanti, Nadia and Pissis, Solon P. and Radoszewski, Jakub and Sweering, Michelle and Zuba, Wiktor},
title =	{{Comparing Elastic-Degenerate Strings: Algorithms, Lower Bounds, and Applications}},
booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
pages =	{11:1--11:20},
series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN =	{978-3-95977-276-1},
ISSN =	{1868-8969},
year =	{2023},
volume =	{259},
editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.11},
URN =		{urn:nbn:de:0030-drops-179650},
doi =		{10.4230/LIPIcs.CPM.2023.11},
annote =	{Keywords: elastic-degenerate string, sequence comparison, languages intersection, pangenome, acronym identification}
}```
Document
##### On Strings Having the Same Length- k Substrings

Authors: Giulia Bernardini, Alessio Conte, Esteban Gabory, Roberto Grossi, Grigorios Loukides, Solon P. Pissis, Giulia Punzi, and Michelle Sweering

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

##### Abstract
Let Substr_k(X) denote the set of length-k substrings of a given string X for a given integer k > 0. We study the following basic string problem, called z-Shortest 𝒮_k-Equivalent Strings: Given a set 𝒮_k of n length-k strings and an integer z > 0, list z shortest distinct strings T₁,…,T_z such that Substr_k(T_i) = 𝒮_k, for all i ∈ [1,z]. The z-Shortest 𝒮_k-Equivalent Strings problem arises naturally as an encoding problem in many real-world applications; e.g., in data privacy, in data compression, and in bioinformatics. The 1-Shortest 𝒮_k-Equivalent Strings, referred to as Shortest 𝒮_k-Equivalent String, asks for a shortest string X such that Substr_k(X) = 𝒮_k. Our main contributions are summarized below: - Given a directed graph G(V,E), the Directed Chinese Postman (DCP) problem asks for a shortest closed walk that visits every edge of G at least once. DCP can be solved in 𝒪̃(|E||V|) time using an algorithm for min-cost flow. We show, via a non-trivial reduction, that if Shortest 𝒮_k-Equivalent String over a binary alphabet has a near-linear-time solution then so does DCP. - We show that the length of a shortest string output by Shortest 𝒮_k-Equivalent String is in 𝒪(k+n²). We generalize this bound by showing that the total length of z shortest strings is in 𝒪(zk+zn²+z²n). We derive these upper bounds by showing (asymptotically tight) bounds on the total length of z shortest Eulerian walks in general directed graphs. - We present an algorithm for solving z-Shortest 𝒮_k-Equivalent Strings in 𝒪(nk+n²log²n+zn²log n+|output|) time. If z = 1, the time becomes 𝒪(nk+n²log²n) by the fact that the size of the input is Θ(nk) and the size of the output is 𝒪(k+n²).

##### Cite as

Giulia Bernardini, Alessio Conte, Esteban Gabory, Roberto Grossi, Grigorios Loukides, Solon P. Pissis, Giulia Punzi, and Michelle Sweering. On Strings Having the Same Length- k Substrings. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 16:1-16:17, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2022)

```@InProceedings{bernardini_et_al:LIPIcs.CPM.2022.16,
author =	{Bernardini, Giulia and Conte, Alessio and Gabory, Esteban and Grossi, Roberto and Loukides, Grigorios and Pissis, Solon P. and Punzi, Giulia and Sweering, Michelle},
title =	{{On Strings Having the Same Length- k Substrings}},
booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
pages =	{16:1--16:17},
series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN =	{978-3-95977-234-1},
ISSN =	{1868-8969},
year =	{2022},
volume =	{223},
editor =	{Bannai, Hideo and Holub, Jan},
publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.16},
URN =		{urn:nbn:de:0030-drops-161439},
doi =		{10.4230/LIPIcs.CPM.2022.16},
annote =	{Keywords: string algorithms, combinatorics on words, de Bruijn graph, Chinese Postman}
}```
X

Feedback for Dagstuhl Publishing