14 Search Results for "Tomescu, Alexandru I."


Document
Track A: Algorithms, Complexity and Games
Minimum Chain Cover in Almost Linear Time

Authors: Manuel Cáceres

Published in: LIPIcs, Volume 261, 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)


Abstract
A minimum chain cover (MCC) of a k-width directed acyclic graph (DAG) G = (V, E) is a set of k chains (paths in the transitive closure) of G such that every vertex appears in at least one chain in the cover. The state-of-the-art solutions for MCC run in time Õ(k(|V|+|E|)) [Mäkinen et at., TALG], O(T_{MF}(|E|) + k|V|), O(k²|V| + |E|) [Cáceres et al., SODA 2022], Õ(|V|^{3/2} + |E|) [Kogan and Parter, ICALP 2022] and Õ(T_{MCF}(|E|) + √k|V|) [Kogan and Parter, SODA 2023], where T_{MF}(|E|) and T_{MCF}(|E|) are the running times for solving maximum flow (MF) and minimum-cost flow (MCF), respectively. In this work we present an algorithm running in time O(T_{MF}(|E|) + (|V|+|E|)log k). By considering the recent result for solving MF [Chen et al., FOCS 2022] our algorithm is the first running in almost linear time. Moreover, our techniques are deterministic and derive a deterministic near-linear time algorithm for MCC if the same is provided for MF. At the core of our solution we use a modified version of the mergeable dictionaries [Farach and Thorup, Algorithmica], [Iacono and Özkan, ICALP 2010] data structure boosted with the SIZE-SPLIT operation and answering queries in amortized logarithmic time, which can be of independent interest.

Cite as

Manuel Cáceres. Minimum Chain Cover in Almost Linear Time. In 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 261, pp. 31:1-31:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{caceres:LIPIcs.ICALP.2023.31,
  author =	{C\'{a}ceres, Manuel},
  title =	{{Minimum Chain Cover in Almost Linear Time}},
  booktitle =	{50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)},
  pages =	{31:1--31:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-278-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{261},
  editor =	{Etessami, Kousha and Feige, Uriel and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2023.31},
  URN =		{urn:nbn:de:0030-drops-180834},
  doi =		{10.4230/LIPIcs.ICALP.2023.31},
  annote =	{Keywords: Minimum chain cover, directed acyclic graph, minimum flow, flow decomposition, mergeable dictionaries, amortized running time}
}
Document
Parameterized Algorithms for String Matching to DAGs: Funnels and Beyond

Authors: Manuel Cáceres

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)


Abstract
The problem of String Matching to Labeled Graphs (SMLG) asks to find all the paths in a labeled graph G = (V, E) whose spellings match that of an input string S ∈ Σ^m. SMLG can be solved in quadratic O(m|E|) time [Amir et al., JALG 2000], which was proven to be optimal by a recent lower bound conditioned on SETH [Equi et al., ICALP 2019]. The lower bound states that no strongly subquadratic time algorithm exists, even if restricted to directed acyclic graphs (DAGs). In this work we present the first parameterized algorithms for SMLG on DAGs. Our parameters capture the topological structure of G. All our results are derived from a generalization of the Knuth-Morris-Pratt algorithm [Park and Kim, CPM 1995] optimized to work in time proportional to the number of prefix-incomparable matches. To obtain the parameterization in the topological structure of G, we first study a special class of DAGs called funnels [Millani et al., JCO 2020] and generalize them to k-funnels and the class ST_k. We present several novel characterizations and algorithmic contributions on both funnels and their generalizations.

Cite as

Manuel Cáceres. Parameterized Algorithms for String Matching to DAGs: Funnels and Beyond. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 7:1-7:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{caceres:LIPIcs.CPM.2023.7,
  author =	{C\'{a}ceres, Manuel},
  title =	{{Parameterized Algorithms for String Matching to DAGs: Funnels and Beyond}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{7:1--7:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.7},
  URN =		{urn:nbn:de:0030-drops-179619},
  doi =		{10.4230/LIPIcs.CPM.2023.7},
  annote =	{Keywords: string matching, parameterized algorithms, FPT inside P, string algorithms, graph algorithms, directed acyclic graphs, labeled graphs, funnels}
}
Document
Cut Paths and Their Remainder Structure, with Applications

Authors: Massimo Cairo, Shahbaz Khan, Romeo Rizzi, Sebastian Schmidt, Alexandru I. Tomescu, and Elia C. Zirondelli

Published in: LIPIcs, Volume 254, 40th International Symposium on Theoretical Aspects of Computer Science (STACS 2023)


Abstract
In a strongly connected graph G = (V,E), a cut arc (also called strong bridge) is an arc e ∈ E whose removal makes the graph no longer strongly connected. Equivalently, there exist u,v ∈ V, such that all u-v walks contain e. Cut arcs are a fundamental graph-theoretic notion, with countless applications, especially in reachability problems. In this paper we initiate the study of cut paths, as a generalisation of cut arcs, which we naturally define as those paths P for which there exist u,v ∈ V, such that all u-v walks contain P as subwalk. We first prove various properties of cut paths and define their remainder structures, which we use to present a simple O(m)-time verification algorithm for a cut path (|V| = n, |E| = m). Secondly, we apply cut paths and their remainder structures to improve several reachability problems from bioinformatics, as follows. A walk is called safe if it is a subwalk of every node-covering closed walk of a strongly connected graph. Multi-safety is defined analogously, by considering node-covering sets of closed walks instead. We show that cut paths provide simple O(m)-time algorithms verifying if a walk is safe or multi-safe. For multi-safety, we present the first linear time algorithm, while for safety, we present a simple algorithm where the state-of-the-art employed complex data structures. Finally we show that the simultaneous computation of remainder structures of all subwalks of a cut path can be performed in linear time, since they are related in a structured way. These properties yield an O(mn)-time algorithm outputting all maximal multi-safe walks, improving over the state-of-the-art algorithm running in time O(m²+n³). The results of this paper only scratch the surface in the study of cut paths, and we believe a rich structure of a graph can be revealed, considering the perspective of a path, instead of just an arc.

Cite as

Massimo Cairo, Shahbaz Khan, Romeo Rizzi, Sebastian Schmidt, Alexandru I. Tomescu, and Elia C. Zirondelli. Cut Paths and Their Remainder Structure, with Applications. In 40th International Symposium on Theoretical Aspects of Computer Science (STACS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 254, pp. 17:1-17:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{cairo_et_al:LIPIcs.STACS.2023.17,
  author =	{Cairo, Massimo and Khan, Shahbaz and Rizzi, Romeo and Schmidt, Sebastian and Tomescu, Alexandru I. and Zirondelli, Elia C.},
  title =	{{Cut Paths and Their Remainder Structure, with Applications}},
  booktitle =	{40th International Symposium on Theoretical Aspects of Computer Science (STACS 2023)},
  pages =	{17:1--17:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-266-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{254},
  editor =	{Berenbrink, Petra and Bouyer, Patricia and Dawar, Anuj and Kant\'{e}, Mamadou Moustapha},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2023.17},
  URN =		{urn:nbn:de:0030-drops-176690},
  doi =		{10.4230/LIPIcs.STACS.2023.17},
  annote =	{Keywords: reachability, cut arc, strong bridge, covering walk, safety, persistence, essentiality, genome assembly}
}
Document
Width Helps and Hinders Splitting Flows

Authors: Manuel Cáceres, Massimo Cairo, Andreas Grigorjew, Shahbaz Khan, Brendan Mumey, Romeo Rizzi, Alexandru I. Tomescu, and Lucia Williams

Published in: LIPIcs, Volume 244, 30th Annual European Symposium on Algorithms (ESA 2022)


Abstract
Minimum flow decomposition (MFD) is the NP-hard problem of finding a smallest decomposition of a network flow X on directed graph G into weighted source-to-sink paths whose superposition equals X. We focus on a common formulation of the problem where the path weights must be non-negative integers and also on a new variant where these weights can be negative. We show that, for acyclic graphs, considering the width of the graph (the minimum number of s-t paths needed to cover all of its edges) yields advances in our understanding of its approximability. For the non-negative version, we show that a popular heuristic is a O(log |X|)-approximation (|X| being the total flow of X) on graphs satisfying two properties related to the width (satisfied by e.g., series-parallel graphs), and strengthen its worst-case approximation ratio from Ω(√m) to Ω(m / log m) for sparse graphs, where m is the number of edges in the graph. For the negative version, we give a (⌈log ║X║⌉+1)-approximation (║X║ being the maximum absolute value of X on any edge) using a power-of-two approach, combined with parity fixing arguments and a decomposition of unitary flows (║X║ ≤ 1) into at most width paths. We also disprove a conjecture about the linear independence of minimum (non-negative) flow decompositions posed by Kloster et al. [ALENEX 2018], but show that its useful implication (polynomial-time assignments of weights to a given set of paths to decompose a flow) holds for the negative version.

Cite as

Manuel Cáceres, Massimo Cairo, Andreas Grigorjew, Shahbaz Khan, Brendan Mumey, Romeo Rizzi, Alexandru I. Tomescu, and Lucia Williams. Width Helps and Hinders Splitting Flows. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 31:1-31:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{caceres_et_al:LIPIcs.ESA.2022.31,
  author =	{C\'{a}ceres, Manuel and Cairo, Massimo and Grigorjew, Andreas and Khan, Shahbaz and Mumey, Brendan and Rizzi, Romeo and Tomescu, Alexandru I. and Williams, Lucia},
  title =	{{Width Helps and Hinders Splitting Flows}},
  booktitle =	{30th Annual European Symposium on Algorithms (ESA 2022)},
  pages =	{31:1--31:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-247-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{244},
  editor =	{Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2022.31},
  URN =		{urn:nbn:de:0030-drops-169695},
  doi =		{10.4230/LIPIcs.ESA.2022.31},
  annote =	{Keywords: Flow decomposition, approximation algorithms, graph width}
}
Document
Optimizing Safe Flow Decompositions in DAGs

Authors: Shahbaz Khan and Alexandru I. Tomescu

Published in: LIPIcs, Volume 244, 30th Annual European Symposium on Algorithms (ESA 2022)


Abstract
Network flow is one of the most studied combinatorial optimization problems having innumerable applications. Any flow on a directed acyclic graph G having n vertices and m edges can be decomposed into a set of O(m) paths. The applications of such a flow decomposition range from network routing to the assembly of biological sequences. However, in some applications, each solution (decomposition) corresponds to some particular data that generated the original flow. Given the possibility of multiple optimal solutions, no optimization criterion ensures the identification of the correct decomposition. Hence, recently flow decomposition was studied [RECOMB22] in the Safe and Complete framework, particularly for RNA Assembly. The proposed solution reported all the safe paths, i.e., the paths which are subpath of every possible solution of flow decomposition. They presented a characterization of the safe paths, resulting in an O(mn+out_R) time algorithm to compute all safe paths, where out_R is the size of the raw output reporting each safe path explicitly. They also showed that out_R can be Ω(mn²) in the worst case but O(m) in the best case. Hence, they further presented an algorithm to report a concise representation of the output out_C in O(mn+out_C) time, where out_C can be Ω(mn) in the worst case but O(m) in the best case. In this work, we study how different safe paths interact, resulting in optimal output-sensitive algorithms requiring O(m+out_R) and O(m+out_C) time for computing the existing representations of the safe paths. Our algorithm uses a novel data structure called Path Tries, which may be of independent interest. Further, we propose a new characterization of the safe paths resulting in the optimal representation of safe paths out_O, which can be Ω(mn) in the worst case but requires optimal O(1) space for every safe path reported. We also present a near-optimal algorithm to compute all the safe paths in O(m+out_Olog n) time. The new representation also establishes tighter worst case bounds Θ(mn²) and Θ(mn) bounds for out_R and out_C (along with out_O), respectively. Overall we further develop the theory of safe and complete solutions for the flow decomposition problem, giving an optimal algorithm for the explicit representation, and a near-optimal algorithm for the optimal representation of the safe paths.

Cite as

Shahbaz Khan and Alexandru I. Tomescu. Optimizing Safe Flow Decompositions in DAGs. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 72:1-72:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{khan_et_al:LIPIcs.ESA.2022.72,
  author =	{Khan, Shahbaz and Tomescu, Alexandru I.},
  title =	{{Optimizing Safe Flow Decompositions in DAGs}},
  booktitle =	{30th Annual European Symposium on Algorithms (ESA 2022)},
  pages =	{72:1--72:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-247-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{244},
  editor =	{Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2022.72},
  URN =		{urn:nbn:de:0030-drops-170101},
  doi =		{10.4230/LIPIcs.ESA.2022.72},
  annote =	{Keywords: safety, flows, networks, directed acyclic graphs}
}
Document
Invited Talk
Efficient Solutions to Biological Problems Using de Bruijn Graphs (Invited Talk)

Authors: Leena Salmela

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)


Abstract
The de Bruijn graph has become a standard method in the analysis of sequencing reads in computational biology due to its ability to represent the information contained in large read sets in small space. A de Bruijn graph represents a set of sequencing reads by its k-mers, i.e. the set of substrings of length k that occur in the reads. In the classical definition, the k-mers are the edges of the graph and the nodes are the k-1 bases long prefixes and suffixes of the k-mers. Usually only k-mers occurring several times in the read set are kept to filter out noise in the data. De Bruijn graphs have been used to solve many problems in computational biology including genome assembly [Ramana M. Idury and Michael S. Waterman, 1995; Pavel A. Pevzner et al., 2001; Anton Bankevich et al., 2012; Yu Peng et al., 2010], sequencing error correction [Leena Salmela and Eric Rivals, 2014; Giles Miclotte et al., 2016; Leena Salmela et al., 2017; Limasset et al., 2019], reference free variant calling [Raluca Uricaru et al., 2015], indexing read sets [Camille Marchet et al., 2021], and so on. Next I will discuss two of these problems in more depth. The de Bruijn graph first emerged in computation biology in the context of genome assembly [Ramana M. Idury and Michael S. Waterman, 1995; Pavel A. Pevzner et al., 2001] where the task is to reconstruct a genome based on sequencing reads. As the de Bruijn graph can represent large read sets compactly, it became the standard approach to assemble short reads [Anton Bankevich et al., 2012; Yu Peng et al., 2010]. In the theoretical framework of de Bruijn graph based genome assembly, a genome is thought to be the Eulerian path in the de Bruijn graph built on the sequencing reads. In practise, the Eulerian path is not unique and thus not useful in the biological context. Therefore, practical implementations report subpaths that are guaranteed to be part of any Eulerian path and thus part of the actual genome. Such models include unitigs, which are nonbranching paths of the de Bruijn graph, and more involved definitions such as omnitigs [Alexandru I. Tomescu and Paul Medvedev, 2017]. In genome assembly the choice of k is a crucial matter. A small k can result in a tangled graph, whereas a too large k will fragment the graph. Furthermore, a different value of k may be optimal for different parts of the genome. Variable order de Bruijn graphs [Christina Boucher et al., 2015; Djamal Belazzougui et al., 2016], which represent de Bruijn graphs of all orders k in a single data structure, have been proposed as a solution but no rigorous definition corresponding to unitigs has been presented. We give the first definition of assembled sequences, i.e. contigs, on such graphs and an algorithm for enumerating them. Another problem that can be solved with de Bruijn graphs is the correction of sequencing errors [Leena Salmela and Eric Rivals, 2014; Giles Miclotte et al., 2016; Leena Salmela et al., 2017; Limasset et al., 2019]. Because each position of a genome is sequenced several times, it is possible to correct sequencing errors in reads if we can identify data originating from the same genomic region. A de Bruijn graph can be used to represent compactly the reliable information and the individual reads can be corrected by aligning them to the graph.

Cite as

Leena Salmela. Efficient Solutions to Biological Problems Using de Bruijn Graphs (Invited Talk). In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 1:1-1:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{salmela:LIPIcs.WABI.2022.1,
  author =	{Salmela, Leena},
  title =	{{Efficient Solutions to Biological Problems Using de Bruijn Graphs}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{1:1--1:2},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.1},
  URN =		{urn:nbn:de:0030-drops-170357},
  doi =		{10.4230/LIPIcs.WABI.2022.1},
  annote =	{Keywords: de Bruijn graph, variable order de Bruijn graph, genome assembly, sequencing error correction, k-mers}
}
Document
Algorithms and Complexity on Indexing Elastic Founder Graphs

Authors: Massimo Equi, Tuukka Norri, Jarno Alanko, Bastien Cazaux, Alexandru I. Tomescu, and Veli Mäkinen

Published in: LIPIcs, Volume 212, 32nd International Symposium on Algorithms and Computation (ISAAC 2021)


Abstract
We study the problem of matching a string in a labeled graph. Previous research has shown that unless the Orthogonal Vectors Hypothesis (OVH) is false, one cannot solve this problem in strongly sub-quadratic time, nor index the graph in polynomial time to answer queries efficiently (Equi et al. ICALP 2019, SOFSEM 2021). These conditional lower-bounds cover even deterministic graphs with binary alphabet, but there naturally exist also graph classes that are easy to index: E.g. Wheeler graphs (Gagie et al. Theor. Comp. Sci. 2017) cover graphs admitting a Burrows-Wheeler transform -based indexing scheme. However, it is NP-complete to recognize if a graph is a Wheeler graph (Gibney, Thankachan, ESA 2019). We propose an approach to alleviate the construction bottleneck of Wheeler graphs. Rather than starting from an arbitrary graph, we study graphs induced from multiple sequence alignments. Elastic degenerate strings (Bernadini et al. SPIRE 2017, ICALP 2019) can be seen as such graphs, and we introduce here their generalization: elastic founder graphs. We first prove that even such induced graphs are hard to index under OVH. Then we introduce two subclasses that are easy to index. Moreover, we give a near-linear time algorithm to construct indexable elastic founder graphs. This algorithm is based on an earlier segmentation algorithm for gapless multiple sequence alignments inducing non-elastic founder graphs (Mäkinen et al., WABI 2020), but uses more involved techniques to cope with repetitive string collections synchronized with gaps. Finally, we show that one of the subclasses admits a reduction to Wheeler graphs in polynomial time.

Cite as

Massimo Equi, Tuukka Norri, Jarno Alanko, Bastien Cazaux, Alexandru I. Tomescu, and Veli Mäkinen. Algorithms and Complexity on Indexing Elastic Founder Graphs. In 32nd International Symposium on Algorithms and Computation (ISAAC 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 212, pp. 20:1-20:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{equi_et_al:LIPIcs.ISAAC.2021.20,
  author =	{Equi, Massimo and Norri, Tuukka and Alanko, Jarno and Cazaux, Bastien and Tomescu, Alexandru I. and M\"{a}kinen, Veli},
  title =	{{Algorithms and Complexity on Indexing Elastic Founder Graphs}},
  booktitle =	{32nd International Symposium on Algorithms and Computation (ISAAC 2021)},
  pages =	{20:1--20:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-214-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{212},
  editor =	{Ahn, Hee-Kap and Sadakane, Kunihiko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2021.20},
  URN =		{urn:nbn:de:0030-drops-154532},
  doi =		{10.4230/LIPIcs.ISAAC.2021.20},
  annote =	{Keywords: orthogonal vectors hypothesis, multiple sequence alignment, segmentation}
}
Document
Flow Decomposition with Subpath Constraints

Authors: Lucia Williams, Alexandru I. Tomescu, and Brendan Mumey

Published in: LIPIcs, Volume 201, 21st International Workshop on Algorithms in Bioinformatics (WABI 2021)


Abstract
Flow network decomposition is a natural model for problems where we are given a flow network arising from superimposing a set of weighted paths and would like to recover the underlying data, i.e., decompose the flow into the original paths and their weights. Thus, variations on flow decomposition are often used as subroutines in multiassembly problems such as RNA transcript assembly. In practice, we frequently have access to information beyond flow values in the form of subpaths, and many tools incorporate these heuristically. But despite acknowledging their utility in practice, previous work has not formally addressed the effect of subpath constraints on the accuracy of flow network decomposition approaches. We formalize the flow decomposition with subpath constraints problem, give the first algorithms for it, and study its usefulness for recovering ground truth decompositions. For finding a minimum decomposition, we propose both a heuristic and an FPT algorithm. Experiments on RNA transcript datasets show that for instances with larger solution path sets, the addition of subpath constraints finds 13% more ground truth solutions when minimal decompositions are found exactly, and 30% more ground truth solutions when minimal decompositions are found heuristically.

Cite as

Lucia Williams, Alexandru I. Tomescu, and Brendan Mumey. Flow Decomposition with Subpath Constraints. In 21st International Workshop on Algorithms in Bioinformatics (WABI 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 201, pp. 16:1-16:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{williams_et_al:LIPIcs.WABI.2021.16,
  author =	{Williams, Lucia and Tomescu, Alexandru I. and Mumey, Brendan},
  title =	{{Flow Decomposition with Subpath Constraints}},
  booktitle =	{21st International Workshop on Algorithms in Bioinformatics (WABI 2021)},
  pages =	{16:1--16:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-200-6},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{201},
  editor =	{Carbone, Alessandra and El-Kebir, Mohammed},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2021.16},
  URN =		{urn:nbn:de:0030-drops-143695},
  doi =		{10.4230/LIPIcs.WABI.2021.16},
  annote =	{Keywords: Flow decomposition, subpath constraints, RNA-Seq}
}
Document
Track A: Algorithms, Complexity and Games
Genome Assembly, from Practice to Theory: Safe, Complete and Linear-Time

Authors: Massimo Cairo, Romeo Rizzi, Alexandru I. Tomescu, and Elia C. Zirondelli

Published in: LIPIcs, Volume 198, 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)


Abstract
Genome assembly asks to reconstruct an unknown string from many shorter substrings of it. Even though it is one of the key problems in Bioinformatics, it is generally lacking major theoretical advances. Its hardness stems both from practical issues (size and errors of real data), and from the fact that problem formulations inherently admit multiple solutions. Given these, at their core, most state-of-the-art assemblers are based on finding non-branching paths (unitigs) in an assembly graph. While such paths constitute only partial assemblies, they are likely to be correct. More precisely, if one defines a genome assembly solution as a closed arc-covering walk of the graph, then unitigs appear in all solutions, being thus safe partial solutions. Until recently, it was open what are all the safe walks of an assembly graph. Tomescu and Medvedev (RECOMB 2016) characterized all such safe walks (omnitigs), thus giving the first safe and complete genome assembly algorithm. Even though omnitig finding was later improved to quadratic time, it remained open whether the crucial linear-time feature of finding unitigs can be attained with omnitigs. We answer this question affirmatively, by describing a surprising O(m)-time algorithm to identify all maximal omnitigs of a graph with n nodes and m arcs, notwithstanding the existence of families of graphs with Θ(mn) total maximal omnitig size. This is based on the discovery of a family of walks (macrotigs) with the property that all the non-trivial omnitigs are univocal extensions of subwalks of a macrotig. This has two consequences: (1) A linear-time output-sensitive algorithm enumerating all maximal omnitigs. (2) A compact O(m) representation of all maximal omnitigs, which allows, e.g., for O(m)-time computation of various statistics on them. Our results close a long-standing theoretical question inspired by practical genome assemblers, originating with the use of unitigs in 1995. We envision our results to be at the core of a reverse transfer from theory to practical and complete genome assembly programs, as has been the case for other key Bioinformatics problems.

Cite as

Massimo Cairo, Romeo Rizzi, Alexandru I. Tomescu, and Elia C. Zirondelli. Genome Assembly, from Practice to Theory: Safe, Complete and Linear-Time. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 43:1-43:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{cairo_et_al:LIPIcs.ICALP.2021.43,
  author =	{Cairo, Massimo and Rizzi, Romeo and Tomescu, Alexandru I. and Zirondelli, Elia C.},
  title =	{{Genome Assembly, from Practice to Theory: Safe, Complete and Linear-Time}},
  booktitle =	{48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)},
  pages =	{43:1--43:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-195-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{198},
  editor =	{Bansal, Nikhil and Merelli, Emanuela and Worrell, James},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2021.43},
  URN =		{urn:nbn:de:0030-drops-141122},
  doi =		{10.4230/LIPIcs.ICALP.2021.43},
  annote =	{Keywords: Graph algorithm, strong connectivity, reachability under failures}
}
Document
Optimal Construction of Hierarchical Overlap Graphs

Authors: Shahbaz Khan

Published in: LIPIcs, Volume 191, 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)


Abstract
Genome assembly is a fundamental problem in Bioinformatics, where for a given set of overlapping substrings of a genome, the aim is to reconstruct the source genome. The classical approaches to solving this problem use assembly graphs, such as de Bruijn graphs or overlap graphs, which maintain partial information about such overlaps. For genome assembly algorithms, these graphs present a trade-off between overlap information stored and scalability. Thus, Hierarchical Overlap Graph (HOG) was proposed to overcome the limitations of both these approaches. For a given set P of n strings, the first algorithm to compute HOG was given by Cazaux and Rivals [IPL20] requiring O(||P||+n²) time using superlinear space, where ||P|| is the cumulative sum of the lengths of strings in P. This was improved by Park et al. [SPIRE20] to O(||P||log n) time and O(||P||) space using segment trees, and further to O(||P||(log n)/(log log n)) for the word RAM model. Both these results described an open problem to compute HOG in optimal O(||P||) time and space. In this paper, we achieve the desired optimal bounds by presenting a simple algorithm that does not use any complex data structures. At its core, our solution improves the classical result [IPL92] for a special case of the All Pairs Suffix Prefix (APSP) problem from O(||P||+n²) time to optimal O(||P||) time, which may be of independent interest.

Cite as

Shahbaz Khan. Optimal Construction of Hierarchical Overlap Graphs. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 17:1-17:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{khan:LIPIcs.CPM.2021.17,
  author =	{Khan, Shahbaz},
  title =	{{Optimal Construction of Hierarchical Overlap Graphs}},
  booktitle =	{32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)},
  pages =	{17:1--17:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-186-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{191},
  editor =	{Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2021.17},
  URN =		{urn:nbn:de:0030-drops-139683},
  doi =		{10.4230/LIPIcs.CPM.2021.17},
  annote =	{Keywords: Hierarchical Overlap Graphs, String algorithms, Genome assembly}
}
Document
Linear Time Construction of Indexable Founder Block Graphs

Authors: Veli Mäkinen, Bastien Cazaux, Massimo Equi, Tuukka Norri, and Alexandru I. Tomescu

Published in: LIPIcs, Volume 172, 20th International Workshop on Algorithms in Bioinformatics (WABI 2020)


Abstract
We introduce a compact pangenome representation based on an optimal segmentation concept that aims to reconstruct founder sequences from a multiple sequence alignment (MSA). Such founder sequences have the feature that each row of the MSA is a recombination of the founders. Several linear time dynamic programming algorithms have been previously devised to optimize segmentations that induce founder blocks that then can be concatenated into a set of founder sequences. All possible concatenation orders can be expressed as a founder block graph. We observe a key property of such graphs: if the node labels (founder segments) do not repeat in the paths of the graph, such graphs can be indexed for efficient string matching. We call such graphs segment repeat-free founder block graphs. We give a linear time algorithm to construct a segment repeat-free founder block graph given an MSA. The algorithm combines techniques from the founder segmentation algorithms (Cazaux et al. SPIRE 2019) and fully-functional bidirectional Burrows-Wheeler index (Belazzougui and Cunial, CPM 2019). We derive a succinct index structure to support queries of arbitrary length in the paths of the graph. Experiments on an MSA of SARS-CoV-2 strains are reported. An MSA of size 410× 29811 is compacted in one minute into a segment repeat-free founder block graph of 3900 nodes and 4440 edges. The maximum length and total length of node labels is 12 and 34968, respectively. The index on the graph takes only 3% of the size of the MSA.

Cite as

Veli Mäkinen, Bastien Cazaux, Massimo Equi, Tuukka Norri, and Alexandru I. Tomescu. Linear Time Construction of Indexable Founder Block Graphs. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 7:1-7:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{makinen_et_al:LIPIcs.WABI.2020.7,
  author =	{M\"{a}kinen, Veli and Cazaux, Bastien and Equi, Massimo and Norri, Tuukka and Tomescu, Alexandru I.},
  title =	{{Linear Time Construction of Indexable Founder Block Graphs}},
  booktitle =	{20th International Workshop on Algorithms in Bioinformatics (WABI 2020)},
  pages =	{7:1--7:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-161-0},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{172},
  editor =	{Kingsford, Carl and Pisanti, Nadia},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2020.7},
  URN =		{urn:nbn:de:0030-drops-127961},
  doi =		{10.4230/LIPIcs.WABI.2020.7},
  annote =	{Keywords: Pangenome indexing, founder reconstruction, multiple sequence alignment, compressed data structures, string matching}
}
Document
Track A: Algorithms, Complexity and Games
On the Complexity of String Matching for Graphs

Authors: Massimo Equi, Roberto Grossi, Veli Mäkinen, and Alexandru I. Tomescu

Published in: LIPIcs, Volume 132, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)


Abstract
Exact string matching in labeled graphs is the problem of searching paths of a graph G=(V,E) such that the concatenation of their node labels is equal to the given pattern string P[1..m]. This basic problem can be found at the heart of more complex operations on variation graphs in computational biology, of query operations in graph databases, and of analysis operations in heterogeneous networks. We prove a conditional lower bound stating that, for any constant epsilon>0, an O(|E|^{1 - epsilon} m)-time, or an O(|E| m^{1 - epsilon})-time algorithm for exact string matching in graphs, with node labels and patterns drawn from a binary alphabet, cannot be achieved unless the Strong Exponential Time Hypothesis (SETH) is false. This holds even if restricted to undirected graphs with maximum node degree two, i.e. to zig-zag matching in bidirectional strings, or to deterministic directed acyclic graphs whose nodes have maximum sum of indegree and outdegree three. These restricted cases make the lower bound stricter than what can be directly derived from related bounds on regular expression matching (Backurs and Indyk, FOCS'16). In fact, our bounds are tight in the sense that lowering the degree or the alphabet size yields linear-time solvable problems. An interesting corollary is that exact and approximate matching are equally hard (quadratic time) in graphs under SETH. In comparison, the same problems restricted to strings have linear-time vs quadratic-time solutions, respectively (approximate pattern matching having also a matching SETH lower bound (Backurs and Indyk, STOC'15)).

Cite as

Massimo Equi, Roberto Grossi, Veli Mäkinen, and Alexandru I. Tomescu. On the Complexity of String Matching for Graphs. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 132, pp. 55:1-55:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{equi_et_al:LIPIcs.ICALP.2019.55,
  author =	{Equi, Massimo and Grossi, Roberto and M\"{a}kinen, Veli and Tomescu, Alexandru I.},
  title =	{{On the Complexity of String Matching for Graphs}},
  booktitle =	{46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)},
  pages =	{55:1--55:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-109-2},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{132},
  editor =	{Baier, Christel and Chatzigiannakis, Ioannis and Flocchini, Paola and Leonardi, Stefano},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2019.55},
  URN =		{urn:nbn:de:0030-drops-106314},
  doi =		{10.4230/LIPIcs.ICALP.2019.55},
  annote =	{Keywords: exact pattern matching, graph query, graph search, labeled graphs, string matching, string search, strong exponential time hypothesis, heterogeneous networks, variation graphs}
}
Document
Safe and Complete Algorithms for Dynamic Programming Problems, with an Application to RNA Folding

Authors: Niko Kiirala, Leena Salmela, and Alexandru I. Tomescu

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)


Abstract
Many bioinformatics problems admit a large number of solutions, with no way of distinguishing the correct one among them. One approach of coping with this issue is to look at the partial solutions common to all solutions. Such partial solutions have been called safe, and an algorithm outputting all safe solutions has been called safe and complete. In this paper we develop a general technique that automatically provides a safe and complete algorithm to problems solvable by dynamic programming. We illustrate it by applying it to the bioinformatics problem of RNA folding, assuming the simplistic folding model maximizing the number of paired bases. Our safe and complete algorithm has time complexity O(n^3M(n)) and space complexity O(n^3) where n is the length of the RNA sequence and M(n) in Omega(n) is the time complexity of arithmetic operations on O(n)-bit integers. We also implement this algorithm and show that, despite an exponential number of optimal solutions, our algorithm is efficient in practice.

Cite as

Niko Kiirala, Leena Salmela, and Alexandru I. Tomescu. Safe and Complete Algorithms for Dynamic Programming Problems, with an Application to RNA Folding. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 8:1-8:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{kiirala_et_al:LIPIcs.CPM.2019.8,
  author =	{Kiirala, Niko and Salmela, Leena and Tomescu, Alexandru I.},
  title =	{{Safe and Complete Algorithms for Dynamic Programming Problems, with an Application to RNA Folding}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{8:1--8:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.8},
  URN =		{urn:nbn:de:0030-drops-104794},
  doi =		{10.4230/LIPIcs.CPM.2019.8},
  annote =	{Keywords: RNA secondary structure, RNA folding, Safe solution, Safe and complete algorithm, Counting problem}
}
Document
Optimal Omnitig Listing for Safe and Complete Contig Assembly

Authors: Massimo Cairo, Paul Medvedev, Nidia Obscura Acosta, Romeo Rizzi, and Alexandru I. Tomescu

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)


Abstract
Genome assembly is the problem of reconstructing a genome sequence from a set of reads from a sequencing experiment. Typical formulations of the assembly problem admit in practice many genomic reconstructions, and actual genome assemblers usually output contigs, namely substrings that are promised to occur in the genome. To bridge the theory and practice, Tomescu and Medvedev [RECOMB 2016] reformulated contig assembly as finding all substrings common to all genomic reconstructions. They also gave a characterization of those walks (omnitigs) that are common to all closed edge-covering walks of a (directed) graph, a typical notion of genomic reconstruction. An algorithm for listing all maximal omnitigs was also proposed, by launching an exhaustive visit from every edge. In this paper, we prove new insights about the structure of omnitigs and solve several open questions about them. We combine these to achieve an O(nm)-time algorithm for outputting all the maximal omnitigs of a graph (with n nodes and m edges). This is also optimal, as we show families of graphs whose total omnitig length is Omega(nm). We implement this algorithm and show that it is 9-12 times faster in practice than the one of Tomescu and Medvedev [RECOMB 2016].

Cite as

Massimo Cairo, Paul Medvedev, Nidia Obscura Acosta, Romeo Rizzi, and Alexandru I. Tomescu. Optimal Omnitig Listing for Safe and Complete Contig Assembly. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 29:1-29:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{cairo_et_al:LIPIcs.CPM.2017.29,
  author =	{Cairo, Massimo and Medvedev, Paul and Obscura Acosta, Nidia and Rizzi, Romeo and Tomescu, Alexandru I.},
  title =	{{Optimal Omnitig Listing for Safe and Complete Contig Assembly}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{29:1--29:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.29},
  URN =		{urn:nbn:de:0030-drops-73423},
  doi =		{10.4230/LIPIcs.CPM.2017.29},
  annote =	{Keywords: genome assembly, graph algorithm, edge-covering walk, strong bridge}
}
  • Refine by Author
  • 10 Tomescu, Alexandru I.
  • 4 Cairo, Massimo
  • 4 Khan, Shahbaz
  • 4 Rizzi, Romeo
  • 3 Cáceres, Manuel
  • Show More...

  • Refine by Classification
  • 5 Theory of computation → Pattern matching
  • 4 Theory of computation → Graph algorithms analysis
  • 4 Theory of computation → Network flows
  • 3 Theory of computation → Dynamic programming
  • 3 Theory of computation → Sorting and searching
  • Show More...

  • Refine by Keyword
  • 3 genome assembly
  • 3 string matching
  • 2 Flow decomposition
  • 2 directed acyclic graphs
  • 2 labeled graphs
  • Show More...

  • Refine by Type
  • 14 document

  • Refine by Publication Year
  • 4 2021
  • 3 2022
  • 3 2023
  • 2 2019
  • 1 2017
  • Show More...

Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail