Document

**Published in:** LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Analyzing and comparing sequences of symbols is among the most fundamental problems in computer science, possibly even more so in bioinformatics. Maximal Common Subsequences (MCSs), i.e., inclusion-maximal sequences of non-contiguous symbols common to two or more strings, have only recently received attention in this area, despite being a basic notion and a natural generalization of more common tools like Longest Common Substrings/Subsequences. In this paper we simplify and engineer recent advancements on MCSs into a practical tool called McDag, the first publicly available tool that can index MCSs of real genomic data. We demonstrate that our tool can index sequences exceeding 10,000 base pairs within minutes, utilizing only 4-7% more than the minimum required nodes, while also extracting relevant insights.

Giovanni Buzzega, Alessio Conte, Roberto Grossi, and Giulia Punzi. McDag: Indexing Maximal Common Subsequences in Practice. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 21:1-21:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{buzzega_et_al:LIPIcs.WABI.2024.21, author = {Buzzega, Giovanni and Conte, Alessio and Grossi, Roberto and Punzi, Giulia}, title = {{McDag: Indexing Maximal Common Subsequences in Practice}}, booktitle = {24th International Workshop on Algorithms in Bioinformatics (WABI 2024)}, pages = {21:1--21:18}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-340-9}, ISSN = {1868-8969}, year = {2024}, volume = {312}, editor = {Pissis, Solon P. and Sung, Wing-Kin}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.21}, URN = {urn:nbn:de:0030-drops-206650}, doi = {10.4230/LIPIcs.WABI.2024.21}, annote = {Keywords: Index data structure, DAG, Common subsequence, Inclusion-wise maximality, LCS} }

Document

**Published in:** LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)

In this paper, we study for the first time the Diverse Longest Common Subsequences (LCSs) problem under Hamming distance. Given a set of a constant number of input strings, the problem asks to decide if there exists some subset X of K longest common subsequences whose diversity is no less than a specified threshold Δ, where we consider two types of diversities of a set X of strings of equal length: the Sum diversity and the Min diversity defined as the sum and the minimum of the pairwise Hamming distance between any two strings in X, respectively. We analyze the computational complexity of the respective problems with Sum- and Min-diversity measures, called the Max-Sum and Max-Min Diverse LCSs, respectively, considering both approximation algorithms and parameterized complexity. Our results are summarized as follows. When K is bounded, both problems are polynomial time solvable. In contrast, when K is unbounded, both problems become NP-hard, while Max-Sum Diverse LCSs problem admits a PTAS. Furthermore, we analyze the parameterized complexity of both problems with combinations of parameters K and r, where r is the length of the candidate strings to be selected. Importantly, all positive results above are proven in a more general setting, where an input is an edge-labeled directed acyclic graph (DAG) that succinctly represents a set of strings of the same length. Negative results are proven in the setting where an input is explicitly given as a set of strings. The latter results are equipped with an encoding such a set as the longest common subsequences of a specific input string set.

Yuto Shida, Giulia Punzi, Yasuaki Kobayashi, Takeaki Uno, and Hiroki Arimura. Finding Diverse Strings and Longest Common Subsequences in a Graph. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 27:1-27:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{shida_et_al:LIPIcs.CPM.2024.27, author = {Shida, Yuto and Punzi, Giulia and Kobayashi, Yasuaki and Uno, Takeaki and Arimura, Hiroki}, title = {{Finding Diverse Strings and Longest Common Subsequences in a Graph}}, booktitle = {35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)}, pages = {27:1--27:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-326-3}, ISSN = {1868-8969}, year = {2024}, volume = {296}, editor = {Inenaga, Shunsuke and Puglisi, Simon J.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.27}, URN = {urn:nbn:de:0030-drops-201370}, doi = {10.4230/LIPIcs.CPM.2024.27}, annote = {Keywords: Sequence analysis, longest common subsequence, Hamming distance, dispersion, approximation algorithms, parameterized complexity} }

Document

**Published in:** LIPIcs, Volume 283, 34th International Symposium on Algorithms and Computation (ISAAC 2023)

Maximal Common Subsequences (MCSs) between two strings X and Y are subsequences of both X and Y that are maximal under inclusion. MCSs relax and generalize the well known and widely used concept of Longest Common Subsequences (LCSs), which can be seen as MCSs of maximum length. While the number both LCSs and MCSs can be exponential in the length of the strings, LCSs have been long exploited for string and text analysis, as simple compact representations of all LCSs between two strings, built via dynamic programming or automata, have been known since the '70s. MCSs appear to have a more challenging structure: even listing them efficiently was an open problem open until recently, thus narrowing the complexity difference between the two problems, but the gap remained significant. In this paper we close the complexity gap: we show how to build DAG of polynomial size - in polynomial time - which allows for efficient operations on the set of all MCSs such as enumeration in Constant Amortized Time per solution (CAT), counting, and random access to the i-th element (i.e., rank and select operations). Other than improving known algorithmic results, this work paves the way for new sequence analysis methods based on MCSs.

Alessio Conte, Roberto Grossi, Giulia Punzi, and Takeaki Uno. A Compact DAG for Storing and Searching Maximal Common Subsequences. In 34th International Symposium on Algorithms and Computation (ISAAC 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 283, pp. 21:1-21:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{conte_et_al:LIPIcs.ISAAC.2023.21, author = {Conte, Alessio and Grossi, Roberto and Punzi, Giulia and Uno, Takeaki}, title = {{A Compact DAG for Storing and Searching Maximal Common Subsequences}}, booktitle = {34th International Symposium on Algorithms and Computation (ISAAC 2023)}, pages = {21:1--21:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-289-1}, ISSN = {1868-8969}, year = {2023}, volume = {283}, editor = {Iwata, Satoru and Kakimura, Naonori}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2023.21}, URN = {urn:nbn:de:0030-drops-193231}, doi = {10.4230/LIPIcs.ISAAC.2023.21}, annote = {Keywords: Maximal common subsequence, DAG, Compact data structures, Enumeration, Constant amortized time, Random access} }

Document

**Published in:** LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Let Substr_k(X) denote the set of length-k substrings of a given string X for a given integer k > 0. We study the following basic string problem, called z-Shortest 𝒮_k-Equivalent Strings: Given a set 𝒮_k of n length-k strings and an integer z > 0, list z shortest distinct strings T₁,…,T_z such that Substr_k(T_i) = 𝒮_k, for all i ∈ [1,z]. The z-Shortest 𝒮_k-Equivalent Strings problem arises naturally as an encoding problem in many real-world applications; e.g., in data privacy, in data compression, and in bioinformatics. The 1-Shortest 𝒮_k-Equivalent Strings, referred to as Shortest 𝒮_k-Equivalent String, asks for a shortest string X such that Substr_k(X) = 𝒮_k.
Our main contributions are summarized below:
- Given a directed graph G(V,E), the Directed Chinese Postman (DCP) problem asks for a shortest closed walk that visits every edge of G at least once. DCP can be solved in 𝒪̃(|E||V|) time using an algorithm for min-cost flow. We show, via a non-trivial reduction, that if Shortest 𝒮_k-Equivalent String over a binary alphabet has a near-linear-time solution then so does DCP.
- We show that the length of a shortest string output by Shortest 𝒮_k-Equivalent String is in 𝒪(k+n²). We generalize this bound by showing that the total length of z shortest strings is in 𝒪(zk+zn²+z²n). We derive these upper bounds by showing (asymptotically tight) bounds on the total length of z shortest Eulerian walks in general directed graphs.
- We present an algorithm for solving z-Shortest 𝒮_k-Equivalent Strings in 𝒪(nk+n²log²n+zn²log n+|output|) time. If z = 1, the time becomes 𝒪(nk+n²log²n) by the fact that the size of the input is Θ(nk) and the size of the output is 𝒪(k+n²).

Giulia Bernardini, Alessio Conte, Esteban Gabory, Roberto Grossi, Grigorios Loukides, Solon P. Pissis, Giulia Punzi, and Michelle Sweering. On Strings Having the Same Length- k Substrings. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 16:1-16:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{bernardini_et_al:LIPIcs.CPM.2022.16, author = {Bernardini, Giulia and Conte, Alessio and Gabory, Esteban and Grossi, Roberto and Loukides, Grigorios and Pissis, Solon P. and Punzi, Giulia and Sweering, Michelle}, title = {{On Strings Having the Same Length- k Substrings}}, booktitle = {33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)}, pages = {16:1--16:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-234-1}, ISSN = {1868-8969}, year = {2022}, volume = {223}, editor = {Bannai, Hideo and Holub, Jan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.16}, URN = {urn:nbn:de:0030-drops-161439}, doi = {10.4230/LIPIcs.CPM.2022.16}, annote = {Keywords: string algorithms, combinatorics on words, de Bruijn graph, Chinese Postman} }

Document

**Published in:** LIPIcs, Volume 170, 45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020)

Temporal graphs are graphs in which arcs have temporal labels, specifying at which time they can be traversed. Motivated by recent results concerning the reliability analysis of a temporal graph through the enumeration of minimal cutsets in the corresponding line graph, in this paper we attack the problem of enumerating minimal s-d separators in s-d directed acyclic graphs (in short, s-d DAGs), also known as 2-terminal DAGs or s-t digraphs. Our main result is an algorithm for enumerating all the minimal s-d separators in a DAG with O(nm) delay, where n and m are respectively the number of nodes and arcs, and the delay is the time between the output of two consecutive solutions. To this aim, we give a characterization of the minimal s-d separators in a DAG through vertex cuts of an expanded version of the DAG itself. As a consequence of our main result, we provide an algorithm for enumerating all the minimal s-d cutsets in a temporal graph with delay O(m³), where m is the number of temporal arcs.

Alessio Conte, Pierluigi Crescenzi, Andrea Marino, and Giulia Punzi. Enumeration of s-d Separators in DAGs with Application to Reliability Analysis in Temporal Graphs. In 45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 170, pp. 25:1-25:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{conte_et_al:LIPIcs.MFCS.2020.25, author = {Conte, Alessio and Crescenzi, Pierluigi and Marino, Andrea and Punzi, Giulia}, title = {{Enumeration of s-d Separators in DAGs with Application to Reliability Analysis in Temporal Graphs}}, booktitle = {45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020)}, pages = {25:1--25:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-159-7}, ISSN = {1868-8969}, year = {2020}, volume = {170}, editor = {Esparza, Javier and Kr\'{a}l', Daniel}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2020.25}, URN = {urn:nbn:de:0030-drops-126932}, doi = {10.4230/LIPIcs.MFCS.2020.25}, annote = {Keywords: minimal cutset, temporal graph, minimal separator, directed acyclic graph} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail