25 Search Results for "Mäkinen, Veli"


Document
Applying the Safe-And-Complete Framework to Practical Genome Assembly

Authors: Sebastian Schmidt, Santeri Toivonen, Paul Medvedev, and Alexandru I. Tomescu

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)


Abstract
Despite the long history of genome assembly research, there remains a large gap between the theoretical and practical work. There is practical software with little theoretical underpinning of accuracy on one hand and theoretical algorithms which have not been adopted in practice on the other. In this paper we attempt to bridge the gap between theory and practice by showing how the theoretical safe-and-complete framework can be integrated into existing assemblers in order to improve contiguity. The optimal algorithm in this framework, called the omnitig algorithm, has not been used in practice due to its complexity and its lack of robustness to real data. Instead, we pursue a simplified notion of omnitigs (simple omnitigs), giving an efficient algorithm to compute them and demonstrating their safety under certain conditions. We modify two assemblers (wtdbg2 and Flye) by replacing their unitig algorithm with the simple omnitig algorithm. We test our modifications using real HiFi data from the D. melanogaster and the C. elegans genomes. Our modified algorithms lead to a substantial improvement in alignment-based contiguity, with negligible additional computational costs and either no or a small increase in the number of misassemblies.

Cite as

Sebastian Schmidt, Santeri Toivonen, Paul Medvedev, and Alexandru I. Tomescu. Applying the Safe-And-Complete Framework to Practical Genome Assembly. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 8:1-8:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{schmidt_et_al:LIPIcs.WABI.2024.8,
  author =	{Schmidt, Sebastian and Toivonen, Santeri and Medvedev, Paul and Tomescu, Alexandru I.},
  title =	{{Applying the Safe-And-Complete Framework to Practical Genome Assembly}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{8:1--8:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.8},
  URN =		{urn:nbn:de:0030-drops-206520},
  doi =		{10.4230/LIPIcs.WABI.2024.8},
  annote =	{Keywords: Genome assembly, Omnitigs, Safe-and-complete framework, graph algorithm, HiFi sequencing data, Assembly evaluation}
}
Document
b-move: Faster Bidirectional Character Extensions in a Run-Length Compressed Index

Authors: Lore Depuydt, Luca Renders, Simon Van de Vyver, Lennart Veys, Travis Gagie, and Jan Fostier

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)


Abstract
Due to the increasing availability of high-quality genome sequences, pan-genomes are gradually replacing single consensus reference genomes in many bioinformatics pipelines to better capture genetic diversity. Traditional bioinformatics tools using the FM-index face memory limitations with such large genome collections. Recent advancements in run-length compressed indices like Gagie et al.’s r-index and Nishimoto and Tabei’s move structure, alleviate memory constraints but focus primarily on backward search for MEM-finding. Arakawa et al.’s br-index initiates complete approximate pattern matching using bidirectional search in run-length compressed space, but with significant computational overhead due to complex memory access patterns. We introduce b-move, a novel bidirectional extension of the move structure, enabling fast, cache-efficient bidirectional character extensions in run-length compressed space. It achieves bidirectional character extensions up to 8 times faster than the br-index, closing the performance gap with FM-index-based alternatives, while maintaining the br-index’s favorable memory characteristics. For example, all available complete E. coli genomes on NCBI’s RefSeq collection can be compiled into a b-move index that fits into the RAM of a typical laptop. Thus, b-move proves practical and scalable for pan-genome indexing and querying. We provide a C++ implementation of b-move, supporting efficient lossless approximate pattern matching including locate functionality, available at https://github.com/biointec/b-move under the AGPL-3.0 license.

Cite as

Lore Depuydt, Luca Renders, Simon Van de Vyver, Lennart Veys, Travis Gagie, and Jan Fostier. b-move: Faster Bidirectional Character Extensions in a Run-Length Compressed Index. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 10:1-10:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{depuydt_et_al:LIPIcs.WABI.2024.10,
  author =	{Depuydt, Lore and Renders, Luca and Van de Vyver, Simon and Veys, Lennart and Gagie, Travis and Fostier, Jan},
  title =	{{b-move: Faster Bidirectional Character Extensions in a Run-Length Compressed Index}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{10:1--10:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.10},
  URN =		{urn:nbn:de:0030-drops-206546},
  doi =		{10.4230/LIPIcs.WABI.2024.10},
  annote =	{Keywords: Pan-genomics, FM-index, r-index, Move Structure, Bidirectional Search, Approximate Pattern Matching, Lossless Alignment, Cache Efficiency}
}
Document
A Unifying Taxonomy of Pattern Matching in Degenerate Strings and Founder Graphs

Authors: Rocco Ascone, Giulia Bernardini, Alessio Conte, Massimo Equi, Esteban Gabory, Roberto Grossi, and Nadia Pisanti

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)


Abstract
Elastic Degenerate (ED) strings and Elastic Founder (EF) graphs are two versions of acyclic components of pangenomes. Both ED strings and EF graphs (which we collectively name variable strings) extend the well-known notion of indeterminate string. Recent work has extensively investigated algorithmic tasks over these structures, and over several other variable strings notions that they generalise. Among such tasks, the basic operation of matching a pattern into a text, which can serve as a toolkit for many pangenomic data analyses using these data structures, deserves special attention. In this paper we: (1) highlight a clear taxonomy within both ED strings and EF graphs ranging through variable strings of all types, from the linear string up to the most general one; (2) investigate the problem PvarT(X,Y) of matching a solid or variable pattern of type X into a variable text of type Y; (3) using as a reference the quadratic conditional lower bounds that are known for PvarT(solid,ED) and PvarT(solid,EF), for all possible types of variable strings X and Y we either prove the quadratic conditional lower bound for PvarT(X,Y), or provide non-trivial, often sub-quadratic, upper bounds, also exploiting the above-mentioned taxonomy.

Cite as

Rocco Ascone, Giulia Bernardini, Alessio Conte, Massimo Equi, Esteban Gabory, Roberto Grossi, and Nadia Pisanti. A Unifying Taxonomy of Pattern Matching in Degenerate Strings and Founder Graphs. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 14:1-14:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{ascone_et_al:LIPIcs.WABI.2024.14,
  author =	{Ascone, Rocco and Bernardini, Giulia and Conte, Alessio and Equi, Massimo and Gabory, Esteban and Grossi, Roberto and Pisanti, Nadia},
  title =	{{A Unifying Taxonomy of Pattern Matching in Degenerate Strings and Founder Graphs}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{14:1--14:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.14},
  URN =		{urn:nbn:de:0030-drops-206586},
  doi =		{10.4230/LIPIcs.WABI.2024.14},
  annote =	{Keywords: Pangenomics, pattern matching, degenerate string, founder graph, fine-grained complexity}
}
Document
Algorithms and Complexity for Path Covers of Temporal DAGs

Authors: Dibyayan Chakraborty, Antoine Dailly, Florent Foucaud, and Ralf Klasing

Published in: LIPIcs, Volume 306, 49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024)


Abstract
A path cover of a digraph is a collection of paths collectively containing its vertex set. A path cover with minimum cardinality for a directed acyclic graph can be found in polynomial time [Fulkerson, AMS'56; Cáceres et al., SODA'22]. Moreover, Dilworth’s celebrated theorem on chain coverings of partially ordered sets equivalently states that the minimum size of a path cover of a DAG is equal to the maximum size of a set of mutually unreachable vertices. In this paper, we examine how far these classic results can be extended to a dynamic setting. A temporal digraph has an arc set that changes over discrete time-steps; if the underlying digraph is acyclic, then it is a temporal DAG. A temporal path is a directed path in the underlying digraph, such that the time-steps of arcs are strictly increasing along the path. Two temporal paths are temporally disjoint if they do not occupy any vertex at the same time. A temporal path cover is a collection 𝒞 of temporal paths that covers all vertices, and 𝒞 is temporally disjoint if all its temporal paths are pairwise temporally disjoint. We study the computational complexities of the problems of finding a minimum-size temporal (disjoint) path cover (denoted as Temporal Path Cover and Temporally Disjoint Path Cover). On the negative side, we show that both Temporal Path Cover and Temporally Disjoint Path Cover are NP-hard even when the underlying DAG is planar, bipartite, subcubic, and there are only two arc-disjoint time-steps. Moreover, Temporally Disjoint Path Cover remains NP-hard even on temporal oriented trees. We also observe that natural temporal analogues of Dilworth’s theorem on these classes of temporal DAGs do not hold. In contrast, we show that Temporal Path Cover is polynomial-time solvable on temporal oriented trees by a reduction to Clique Cover for (static undirected) weakly chordal graphs (a subclass of perfect graphs for which Clique Cover admits an efficient algorithm). This highlights an interesting algorithmic difference between the two problems. Although it is NP-hard on temporal oriented trees, Temporally Disjoint Path Cover becomes polynomial-time solvable on temporal oriented lines and temporal rooted directed trees. Motivated by the hardness result on trees, we show that, in contrast, Temporal Path Cover admits an XP time algorithm with respect to parameter t_max + tw, where t_max is the maximum time-step and tw is the treewidth of the underlying static undirected graph; moreover, Temporally Disjoint Path Cover admits an FPT algorithm with respect to the same parameterization.

Cite as

Dibyayan Chakraborty, Antoine Dailly, Florent Foucaud, and Ralf Klasing. Algorithms and Complexity for Path Covers of Temporal DAGs. In 49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 306, pp. 38:1-38:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{chakraborty_et_al:LIPIcs.MFCS.2024.38,
  author =	{Chakraborty, Dibyayan and Dailly, Antoine and Foucaud, Florent and Klasing, Ralf},
  title =	{{Algorithms and Complexity for Path Covers of Temporal DAGs}},
  booktitle =	{49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024)},
  pages =	{38:1--38:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-335-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{306},
  editor =	{Kr\'{a}lovi\v{c}, Rastislav and Ku\v{c}era, Anton{\'\i}n},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2024.38},
  URN =		{urn:nbn:de:0030-drops-205940},
  doi =		{10.4230/LIPIcs.MFCS.2024.38},
  annote =	{Keywords: Temporal Graphs, Dilworth’s Theorem, DAGs, Path Cover, Temporally Disjoint Paths, Algorithms, Oriented Trees, Treewidth}
}
Document
Approximate Suffix-Prefix Dictionary Queries

Authors: Wiktor Zuba, Grigorios Loukides, Solon P. Pissis, and Sharma V. Thankachan

Published in: LIPIcs, Volume 306, 49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024)


Abstract
In the all-pairs suffix-prefix (APSP) problem [Gusfield et al., Inf. Process. Lett. 1992], we are given a dictionary R of r strings, S₁,…,S_r, of total length n, and we are asked to find the length SPL_{i,j} of the longest string that is both a suffix of S_i and a prefix of S_j, for all i,j ∈ [1..r]. APSP is a classic problem in string algorithms with applications in bioinformatics, especially in sequence assembly. Since r = |R| is typically very large in real-world applications, considering all r² pairs of strings explicitly is prohibitive. This is when the data structure variant of APSP makes sense; in the same spirit as distance oracles computing shortest paths between any two vertices given online. We show how to quickly locate k-approximate matches (under the Hamming or the edit distance) in R using a version of the k-errata tree [Cole et al., STOC 2004] that we introduce. Let SPL^k_{i,j} be the length of the longest suffix of S_i that is at distance at most k from a prefix of S_j. In particular, for any k = 𝒪(1), we show an 𝒪(nlog^k n)-sized data structure to support the following queries: - One-to-One^k(i,j): output SPL^k_{i,j} in 𝒪(log^k nlog log n) time. - Report^k(i,d): output all j ∈ [1..r], such that SPL^k_{i,j} ≥ d, in 𝒪(log^{k}n(log n/log log n+output)) time, where output denotes the size of the output. In fact, our algorithms work for any value of k not just for k = 𝒪(1), but the formulas bounding the complexities get much more complicated for larger values of k.

Cite as

Wiktor Zuba, Grigorios Loukides, Solon P. Pissis, and Sharma V. Thankachan. Approximate Suffix-Prefix Dictionary Queries. In 49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 306, pp. 85:1-85:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{zuba_et_al:LIPIcs.MFCS.2024.85,
  author =	{Zuba, Wiktor and Loukides, Grigorios and Pissis, Solon P. and Thankachan, Sharma V.},
  title =	{{Approximate Suffix-Prefix Dictionary Queries}},
  booktitle =	{49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024)},
  pages =	{85:1--85:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-335-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{306},
  editor =	{Kr\'{a}lovi\v{c}, Rastislav and Ku\v{c}era, Anton{\'\i}n},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2024.85},
  URN =		{urn:nbn:de:0030-drops-206416},
  doi =		{10.4230/LIPIcs.MFCS.2024.85},
  annote =	{Keywords: all-pairs suffix-prefix, suffix-prefix queries, suffix tree, k-errata tree}
}
Document
Engineering Zuffix Arrays

Authors: Paolo Boldi, Stefano Marchini, and Sebastiano Vigna

Published in: LIPIcs, Volume 301, 22nd International Symposium on Experimental Algorithms (SEA 2024)


Abstract
Searching patterns in long strings is a classical algorithmic problem with countless practical applications. Suffix trees and suffix arrays (and their variants) are a long-established solution that yields linear-time search (in the size of the pattern). In [Paolo Boldi and Sebastiano Vigna, 2018] it is shown that a z-map gadget can be attached to (enhanced) suffix arrays to improve their theoretical query time, obtaining a data structure called zuffix array. The main contribution of this paper is to show that a carefully engineered implementation of the z-map gadget does provide significant speedups with respect to enhanced suffix arrays on real-world datasets, albeit doubling the required space. In particular, for large alphabets we observe a sevenfold improvement in query time with respect to enhanced suffix arrays; even in the worst case (small alphabets), the query time is almost halved. Thus, zuffix arrays provide a very interesting new point in the space-time tradeoff spectrum.

Cite as

Paolo Boldi, Stefano Marchini, and Sebastiano Vigna. Engineering Zuffix Arrays. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 2:1-2:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{boldi_et_al:LIPIcs.SEA.2024.2,
  author =	{Boldi, Paolo and Marchini, Stefano and Vigna, Sebastiano},
  title =	{{Engineering Zuffix Arrays}},
  booktitle =	{22nd International Symposium on Experimental Algorithms (SEA 2024)},
  pages =	{2:1--2:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-325-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{301},
  editor =	{Liberti, Leo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2024.2},
  URN =		{urn:nbn:de:0030-drops-203677},
  doi =		{10.4230/LIPIcs.SEA.2024.2},
  annote =	{Keywords: Suffix trees, suffix arrays, z-fast tries}
}
Document
Practical Minimum Path Cover

Authors: Manuel Cáceres, Brendan Mumey, Santeri Toivonen, and Alexandru I. Tomescu

Published in: LIPIcs, Volume 301, 22nd International Symposium on Experimental Algorithms (SEA 2024)


Abstract
Computing a minimum path cover (MPC) of a directed acyclic graph (DAG) is a fundamental problem with a myriad of applications, including reachability. Although it is known how to solve the problem by a simple reduction to minimum flow, recent theoretical advances exploit this idea to obtain algorithms parameterized by the number of paths of an MPC, known as the width. These results obtain fast [Mäkinen et al., TALG 2019] and even linear time [Cáceres et al., SODA 2022] algorithms in the small-width regime. In this paper, we present the first publicly available high-performance implementation of state-of-the-art MPC algorithms, including the parameterized approaches. Our experiments on random DAGs show that parameterized algorithms are orders-of-magnitude faster on dense graphs. Additionally, we present new fast pre-processing heuristics based on transitive edge sparsification. We show that our heuristics improve MPC-solvers by orders of magnitude.

Cite as

Manuel Cáceres, Brendan Mumey, Santeri Toivonen, and Alexandru I. Tomescu. Practical Minimum Path Cover. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 3:1-3:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{caceres_et_al:LIPIcs.SEA.2024.3,
  author =	{C\'{a}ceres, Manuel and Mumey, Brendan and Toivonen, Santeri and Tomescu, Alexandru I.},
  title =	{{Practical Minimum Path Cover}},
  booktitle =	{22nd International Symposium on Experimental Algorithms (SEA 2024)},
  pages =	{3:1--3:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-325-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{301},
  editor =	{Liberti, Leo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2024.3},
  URN =		{urn:nbn:de:0030-drops-203687},
  doi =		{10.4230/LIPIcs.SEA.2024.3},
  annote =	{Keywords: minimum path cover, directed acyclic graph, maximum flow, parameterized algorithms, edge sparsification, algorithm engineering}
}
Document
Taxonomic Classification with Maximal Exact Matches in KATKA Kernels and Minimizer Digests

Authors: Dominika Draesslerová, Omar Ahmed, Travis Gagie, Jan Holub, Ben Langmead, Giovanni Manzini, and Gonzalo Navarro

Published in: LIPIcs, Volume 301, 22nd International Symposium on Experimental Algorithms (SEA 2024)


Abstract
For taxonomic classification, we are asked to index the genomes in a phylogenetic tree such that later, given a DNA read, we can quickly choose a small subtree likely to contain the genome from which that read was drawn. Although popular classifiers such as Kraken use k-mers, recent research indicates that using maximal exact matches (MEMs) can lead to better classifications. For example, we can - build an augmented FM-index over the the genomes in the tree concatenated in left-to-right order; - for each MEM in a read, find the interval in the suffix array containing the starting positions of that MEM’s occurrences in those genomes; - find the minimum and maximum values stored in that interval; - take the lowest common ancestor (LCA) of the genomes containing the characters at those positions. This solution is practical, however, only when the total size of the genomes in the tree is fairly small. In this paper we consider applying the same solution to three lossily compressed representations of the genomes' concatenation: - a KATKA kernel, which discards characters that are not in the first or last occurrence of any k_max-tuple, for a parameter k_max; - a minimizer digest; - a KATKA kernel of a minimizer digest. With a test dataset and these three representations of it, simulated reads and various parameter settings, we checked how many reads' longest MEMs occurred only in the sequences from which those reads were generated ("true positive" reads). For some parameter settings we achieved significant compression while only slightly decreasing the true-positive rate.

Cite as

Dominika Draesslerová, Omar Ahmed, Travis Gagie, Jan Holub, Ben Langmead, Giovanni Manzini, and Gonzalo Navarro. Taxonomic Classification with Maximal Exact Matches in KATKA Kernels and Minimizer Digests. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 10:1-10:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{draesslerova_et_al:LIPIcs.SEA.2024.10,
  author =	{Draesslerov\'{a}, Dominika and Ahmed, Omar and Gagie, Travis and Holub, Jan and Langmead, Ben and Manzini, Giovanni and Navarro, Gonzalo},
  title =	{{Taxonomic Classification with Maximal Exact Matches in KATKA Kernels and Minimizer Digests}},
  booktitle =	{22nd International Symposium on Experimental Algorithms (SEA 2024)},
  pages =	{10:1--10:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-325-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{301},
  editor =	{Liberti, Leo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2024.10},
  URN =		{urn:nbn:de:0030-drops-203756},
  doi =		{10.4230/LIPIcs.SEA.2024.10},
  annote =	{Keywords: Taxonomic classification, metagenomics, KATKA, maximal exact matches, string kernels, minimizer digests}
}
Document
Accelerating ILP Solvers for Minimum Flow Decompositions Through Search Space and Dimensionality Reductions

Authors: Andreas Grigorjew, Fernando H. C. Dias, Andrea Cracco, Romeo Rizzi, and Alexandru I. Tomescu

Published in: LIPIcs, Volume 301, 22nd International Symposium on Experimental Algorithms (SEA 2024)


Abstract
Given a flow network, the Minimum Flow Decomposition (MFD) problem is finding the smallest possible set of weighted paths whose superposition equals the flow. It is a classical, strongly NP-hard problem that is proven to be useful in RNA transcript assembly and applications outside of Bioinformatics. We improve an existing ILP (Integer Linear Programming) model by Dias et al. [RECOMB 2022] for DAGs by decreasing the solver’s search space using solution safety and several other optimizations. This results in a significant speedup compared to the original ILP, of up to 34× on average on the hardest instances. Moreover, we show that our optimizations apply also to MFD problem variants, resulting in speedups that go up to 219× on the hardest instances. We also developed an ILP model of reduced dimensionality for an MFD variant in which the solution path weights are restricted to a given set. This model can find an optimal MFD solution for most instances, and overall, its accuracy significantly outperforms that of previous greedy algorithms while being up to an order of magnitude faster than our optimized ILP.

Cite as

Andreas Grigorjew, Fernando H. C. Dias, Andrea Cracco, Romeo Rizzi, and Alexandru I. Tomescu. Accelerating ILP Solvers for Minimum Flow Decompositions Through Search Space and Dimensionality Reductions. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 14:1-14:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{grigorjew_et_al:LIPIcs.SEA.2024.14,
  author =	{Grigorjew, Andreas and Dias, Fernando H. C. and Cracco, Andrea and Rizzi, Romeo and Tomescu, Alexandru I.},
  title =	{{Accelerating ILP Solvers for Minimum Flow Decompositions Through Search Space and Dimensionality Reductions}},
  booktitle =	{22nd International Symposium on Experimental Algorithms (SEA 2024)},
  pages =	{14:1--14:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-325-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{301},
  editor =	{Liberti, Leo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2024.14},
  URN =		{urn:nbn:de:0030-drops-203792},
  doi =		{10.4230/LIPIcs.SEA.2024.14},
  annote =	{Keywords: Flow decomposition, Integer Linear Programming, Safety, RNA-seq, RNA transcript assembly, isoform}
}
Document
SPIDER: Improved Succinct Rank and Select Performance

Authors: Matthew D. Laws, Jocelyn Bliven, Kit Conklin, Elyes Laalai, Samuel McCauley, and Zach S. Sturdevant

Published in: LIPIcs, Volume 301, 22nd International Symposium on Experimental Algorithms (SEA 2024)


Abstract
Rank and select data structures seek to preprocess a bit vector to quickly answer two kinds of queries: Rank(i) gives the number of 1 bits in slots 0 through i, and Select(j) gives the first slot s with Rank(s) = j. A succinct data structure can answer these queries while using space much smaller than the size of the original bit vector. State of the art succinct rank and select data structures use as little as 4% extra space (over the underlying bit vector) while answering rank and select queries very quickly. Rank queries can be answered using only a handful of array accesses. Select queries can be answered by starting with similar array accesses, followed by a linear scan through the bit vector. Nonetheless, a tradeoff remains: data structures that use under 4% space are significantly slower at answering rank and select queries than less-space-efficient data structures (using, say, over 20% extra space). In this paper we make significantly progress towards closing this gap. We give a new data structure, SPIDER, which uses 3.82% extra space. SPIDER gives the best known rank query time for data sets of 8 billion or more bits, even compared to much less space-efficient data structures. For select queries, SPIDER outperforms all data structures that use less than 4% space, and significantly closes the gap in select performance between data structures with less than 4% space, and those that use more (over 20% for both rank and select) space. SPIDER makes two main technical contributions. For rank queries, it improves performance by interleaving the metadata with the bit vector to improve cache efficiency. For select queries, it uses predictions to almost eliminate the cost of the linear scan. These predictions are inspired by recent results on data structures with machine-learned predictions, adapted to the succinct data structure setting. Our results hold on both real and synthetic data, showing that these predictions are effective in practice.

Cite as

Matthew D. Laws, Jocelyn Bliven, Kit Conklin, Elyes Laalai, Samuel McCauley, and Zach S. Sturdevant. SPIDER: Improved Succinct Rank and Select Performance. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 21:1-21:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{laws_et_al:LIPIcs.SEA.2024.21,
  author =	{Laws, Matthew D. and Bliven, Jocelyn and Conklin, Kit and Laalai, Elyes and McCauley, Samuel and Sturdevant, Zach S.},
  title =	{{SPIDER: Improved Succinct Rank and Select Performance}},
  booktitle =	{22nd International Symposium on Experimental Algorithms (SEA 2024)},
  pages =	{21:1--21:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-325-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{301},
  editor =	{Liberti, Leo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2024.21},
  URN =		{urn:nbn:de:0030-drops-203865},
  doi =		{10.4230/LIPIcs.SEA.2024.21},
  annote =	{Keywords: Rank and Select, Succinct Data Structures, Data Structres, Cache Performance, Predictions}
}
Document
Track A: Algorithms, Complexity and Games
Õptimal Dynamic Time Warping on Run-Length Encoded Strings

Authors: Itai Boneh, Shay Golan, Shay Mozes, and Oren Weimann

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)


Abstract
Dynamic Time Warping (DTW) distance is the optimal cost of matching two strings when extending runs of letters is for free. Therefore, it is natural to measure the time complexity of DTW in terms of the number of runs n (rather than the string lengths N). In this paper, we give an Õ(n²) time algorithm for computing the DTW distance. This matches (up to log factors) the known (conditional) lower bound, and should be compared with the previous fastest O(n³) time exact algorithm and the Õ(n²) time approximation algorithm. Our method also immediately implies an Õ(nk) time algorithm when the distance is bounded by k. This should be compared with the previous fastest O(n²k) and O(Nk) time exact algorithms and the Õ(nk) time approximation algorithm.

Cite as

Itai Boneh, Shay Golan, Shay Mozes, and Oren Weimann. Õptimal Dynamic Time Warping on Run-Length Encoded Strings. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 30:1-30:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{boneh_et_al:LIPIcs.ICALP.2024.30,
  author =	{Boneh, Itai and Golan, Shay and Mozes, Shay and Weimann, Oren},
  title =	{{\~{O}ptimal Dynamic Time Warping on Run-Length Encoded Strings}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{30:1--30:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.30},
  URN =		{urn:nbn:de:0030-drops-201730},
  doi =		{10.4230/LIPIcs.ICALP.2024.30},
  annote =	{Keywords: Dynamic time warping, Fr\'{e}chet distance, edit distance, run-length encoding}
}
Document
Finding Maximal Exact Matches in Graphs

Authors: Nicola Rizzo, Manuel Cáceres, and Veli Mäkinen

Published in: LIPIcs, Volume 273, 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)


Abstract
We study the problem of finding maximal exact matches (MEMs) between a query string Q and a labeled graph G. MEMs are an important class of seeds, often used in seed-chain-extend type of practical alignment methods because of their strong connections to classical metrics. A principled way to speed up chaining is to limit the number of MEMs by considering only MEMs of length at least κ (κ-MEMs). However, on arbitrary input graphs, the problem of finding MEMs cannot be solved in truly sub-quadratic time under SETH (Equi et al., ICALP 2019) even on acyclic graphs. In this paper we show an O(n⋅ L ⋅ d^{L-1} + m + M_{κ,L})-time algorithm finding all κ-MEMs between Q and G spanning exactly L nodes in G, where n is the total length of node labels, d is the maximum degree of a node in G, m = |Q|, and M_{κ,L} is the number of output MEMs. We use this algorithm to develop a κ-MEM finding solution on indexable Elastic Founder Graphs (Equi et al., Algorithmica 2022) running in time O(nH² + m + M_κ), where H is the maximum number of nodes in a block, and M_κ is the total number of κ-MEMs. Our results generalize to the analysis of multiple query strings (MEMs between G and any of the strings). Additionally, we provide some preliminary experimental results showing that the number of graph MEMs is an order of magnitude smaller than the number of string MEMs of the corresponding concatenated collection.

Cite as

Nicola Rizzo, Manuel Cáceres, and Veli Mäkinen. Finding Maximal Exact Matches in Graphs. In 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 273, pp. 10:1-10:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{rizzo_et_al:LIPIcs.WABI.2023.10,
  author =	{Rizzo, Nicola and C\'{a}ceres, Manuel and M\"{a}kinen, Veli},
  title =	{{Finding Maximal Exact Matches in Graphs}},
  booktitle =	{23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)},
  pages =	{10:1--10:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-294-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{273},
  editor =	{Belazzougui, Djamal and Ouangraoua, A\"{i}da},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2023.10},
  URN =		{urn:nbn:de:0030-drops-186364},
  doi =		{10.4230/LIPIcs.WABI.2023.10},
  annote =	{Keywords: Sequence to graph alignment, bidirectional BWT, r-index, suffix tree, founder graphs}
}
Document
Parameterized Algorithms for String Matching to DAGs: Funnels and Beyond

Authors: Manuel Cáceres

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)


Abstract
The problem of String Matching to Labeled Graphs (SMLG) asks to find all the paths in a labeled graph G = (V, E) whose spellings match that of an input string S ∈ Σ^m. SMLG can be solved in quadratic O(m|E|) time [Amir et al., JALG 2000], which was proven to be optimal by a recent lower bound conditioned on SETH [Equi et al., ICALP 2019]. The lower bound states that no strongly subquadratic time algorithm exists, even if restricted to directed acyclic graphs (DAGs). In this work we present the first parameterized algorithms for SMLG on DAGs. Our parameters capture the topological structure of G. All our results are derived from a generalization of the Knuth-Morris-Pratt algorithm [Park and Kim, CPM 1995] optimized to work in time proportional to the number of prefix-incomparable matches. To obtain the parameterization in the topological structure of G, we first study a special class of DAGs called funnels [Millani et al., JCO 2020] and generalize them to k-funnels and the class ST_k. We present several novel characterizations and algorithmic contributions on both funnels and their generalizations.

Cite as

Manuel Cáceres. Parameterized Algorithms for String Matching to DAGs: Funnels and Beyond. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 7:1-7:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{caceres:LIPIcs.CPM.2023.7,
  author =	{C\'{a}ceres, Manuel},
  title =	{{Parameterized Algorithms for String Matching to DAGs: Funnels and Beyond}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{7:1--7:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.7},
  URN =		{urn:nbn:de:0030-drops-179619},
  doi =		{10.4230/LIPIcs.CPM.2023.7},
  annote =	{Keywords: string matching, parameterized algorithms, FPT inside P, string algorithms, graph algorithms, directed acyclic graphs, labeled graphs, funnels}
}
Document
From Bit-Parallelism to Quantum String Matching for Labelled Graphs

Authors: Massimo Equi, Arianne Meijer-van de Griend, and Veli Mäkinen

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)


Abstract
Many problems that can be solved in quadratic time have bit-parallel speed-ups with factor w, where w is the computer word size. A classic example is computing the edit distance of two strings of length n, which can be solved in O(n²/w) time. In a reasonable classical model of computation, one can assume w = Θ(log n), and obtaining significantly better speed-ups is unlikely in the light of conditional lower bounds obtained for such problems. In this paper, we study the connection of bit-parallelism to quantum computation, aiming to see if a bit-parallel algorithm could be converted to a quantum algorithm with better than logarithmic speed-up. We focus on string matching in labeled graphs, the problem of finding an exact occurrence of a string as the label of a path in a graph. This problem admits a quadratic conditional lower bound under a very restricted class of graphs (Equi et al. ICALP 2019), stating that no algorithm in the classical model of computation can solve the problem in time O(|P||E|^(1-ε)) or O(|P|^(1-ε)|E|). We show that a simple bit-parallel algorithm on such restricted family of graphs (level DAGs) can indeed be converted into a realistic quantum algorithm that attains subquadratic time complexity O(|E|√|P|).

Cite as

Massimo Equi, Arianne Meijer-van de Griend, and Veli Mäkinen. From Bit-Parallelism to Quantum String Matching for Labelled Graphs. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 9:1-9:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{equi_et_al:LIPIcs.CPM.2023.9,
  author =	{Equi, Massimo and Meijer-van de Griend, Arianne and M\"{a}kinen, Veli},
  title =	{{From Bit-Parallelism to Quantum String Matching for Labelled Graphs}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{9:1--9:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.9},
  URN =		{urn:nbn:de:0030-drops-179637},
  doi =		{10.4230/LIPIcs.CPM.2023.9},
  annote =	{Keywords: Bit-parallelism, quantum computation, string matching, level DAGs}
}
Document
Indexable Elastic Founder Graphs of Minimum Height

Authors: Nicola Rizzo and Veli Mäkinen

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)


Abstract
Indexable elastic founder graphs have been recently proposed as a data structure for genomics applications supporting fast pattern matching queries. Consider segmenting a multiple sequence alignment MSA[1..m,1..n] into b blocks MSA[1..m,1..j₁], MSA[1..m,j₁+1..j₂], …, MSA[1..m,j_{b-1}+1..n]. The resulting elastic founder graph (EFG) is obtained by merging in each block the strings that are equivalent after the removal of gap symbols, taking the strings as the nodes of the block and the original MSA connections as edges. We call an elastic founder graph indexable if a node label occurs as a prefix of only those paths that start from a node of the same block. Equi et al. (ISAAC 2021) showed that such EFGs support fast pattern matching and studied their construction maximizing the number of blocks and minimizing the maximum length of a block, but left open the case of minimizing the maximum number of distinct strings in a block that we call graph height. For the simplified gapless setting, we give an O(mn) time algorithm to find a segmentation of an MSA minimizing the height of the resulting indexable founder graph, by combining previous results in segmentation algorithms and founder graphs. For the general setting, the known techniques yield a linear-time parameterized solution on constant alphabet Σ, taking time O(m n² log|Σ|) in the worst case, so we study the refined measure of prefix-aware height, that omits counting strings that are prefixes of another considered string. The indexable EFG minimizing the maximum prefix-aware height provides a lower bound for the original height: by exploiting exploiting suffix trees built from the MSA rows and the data structure answering weighted ancestor queries in constant time of Belazzougui et al. (CPM 2021), we give an O(mn)-time algorithm for the optimal EFG under this alternative height.

Cite as

Nicola Rizzo and Veli Mäkinen. Indexable Elastic Founder Graphs of Minimum Height. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 19:1-19:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{rizzo_et_al:LIPIcs.CPM.2022.19,
  author =	{Rizzo, Nicola and M\"{a}kinen, Veli},
  title =	{{Indexable Elastic Founder Graphs of Minimum Height}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{19:1--19:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.19},
  URN =		{urn:nbn:de:0030-drops-161467},
  doi =		{10.4230/LIPIcs.CPM.2022.19},
  annote =	{Keywords: multiple sequence alignment, pattern matching, data structures, segmentation algorithms, dynamic programming, suffix tree}
}
  • Refine by Author
  • 11 Mäkinen, Veli
  • 6 Tomescu, Alexandru I.
  • 5 Equi, Massimo
  • 3 Cazaux, Bastien
  • 3 Cáceres, Manuel
  • Show More...

  • Refine by Classification
  • 14 Theory of computation → Pattern matching
  • 7 Theory of computation → Graph algorithms analysis
  • 5 Applied computing → Genomics
  • 5 Theory of computation → Sorting and searching
  • 4 Theory of computation → Dynamic programming
  • Show More...

  • Refine by Keyword
  • 4 string matching
  • 3 multiple sequence alignment
  • 3 suffix tree
  • 2 Genome assembly
  • 2 Succinct Data Structures
  • Show More...

  • Refine by Type
  • 25 document

  • Refine by Publication Year
  • 11 2024
  • 3 2021
  • 3 2023
  • 2 2008
  • 2 2020
  • Show More...