DROPS

Document

DOI: 10.4230/LIPIcs.ESA.2025.55

Safe Sequences via Dominators in DAGs for Path-Covering Problems

Authors: Francisco Sena, Romeo Rizzi, and Alexandru I. Tomescu

Published in: LIPIcs, Volume 351, 33rd Annual European Symposium on Algorithms (ESA 2025)

Abstract

A path-covering problem on a directed acyclic graph (DAG) requires finding a set of source-to-sink paths that cover all the nodes, all the arcs, or subsets thereof, and additionally they are optimal with respect to some function. In this paper we study safe sequences of nodes or arcs, namely sequences that appear in some path of every path cover of a DAG. We show that safe sequences admit a simple characterization via cutnodes. Moreover, we establish a connection between maximal safe sequences and leaf-to-root paths in the source- and sink-dominator trees of the DAG, which may be of independent interest in the extensive literature on dominators. With dominator trees, safe sequences admit an O(n)-size representation and a linear-time output-sensitive enumeration algorithm running in time O(m + o), where n and m are the number of nodes and arcs, respectively, and o is the total length of the maximal safe sequences. We then apply maximal safe sequences to simplify Integer Linear Programs (ILPs) for two path-covering problems, LeastSquares and MinPathError, which are at the core of RNA transcript assembly problems from bioinformatics. On various datasets, maximal safe sequences can be computed in under 0.1 seconds per graph, on average, and ILP solvers whose search space is reduced in this manner exhibit significant speed-ups. For example on graphs with a large width, average speed-ups are in the range 50-250× for MinPathError and in the range 80-350× for LeastSquares. Optimizing ILPs using safe sequences can thus become a fast building block of practical RNA transcript assembly tools, and more generally, of path-covering problems.

Cite as

Francisco Sena, Romeo Rizzi, and Alexandru I. Tomescu. Safe Sequences via Dominators in DAGs for Path-Covering Problems. In 33rd Annual European Symposium on Algorithms (ESA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 351, pp. 55:1-55:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{sena_et_al:LIPIcs.ESA.2025.55,
  author =	{Sena, Francisco and Rizzi, Romeo and Tomescu, Alexandru I.},
  title =	{{Safe Sequences via Dominators in DAGs for Path-Covering Problems}},
  booktitle =	{33rd Annual European Symposium on Algorithms (ESA 2025)},
  pages =	{55:1--55:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-395-9},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{351},
  editor =	{Benoit, Anne and Kaplan, Haim and Wild, Sebastian and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2025.55},
  URN =		{urn:nbn:de:0030-drops-245230},
  doi =		{10.4230/LIPIcs.ESA.2025.55},
  annote =	{Keywords: directed acyclic graph, path cover, dominator tree, integer linear programming, least squares, minimum path error}
}

Document

RANDOM

DOI: 10.4230/LIPIcs.APPROX/RANDOM.2025.42

On the Spectral Expansion of Monotone Subsets of the Hypercube

Authors: Yumou Fei and Renato Ferreira Pinto Jr.

Published in: LIPIcs, Volume 353, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2025)

Abstract

We study the spectral gap of subgraphs of the hypercube induced by monotone subsets of vertices. For a monotone subset A ⊆ {0,1}ⁿ of density μ(A), the previous best lower bound on the spectral gap, due to Cohen [Cohen, 2016], was γ ≳ μ(A)/n², improving upon the earlier bound γ ≳ μ(A)²/n² established by Ding and Mossel [Ding and Mossel, 2014]. In this paper, we prove the optimal lower bound γ ≳ μ(A)/n. As a corollary, we improve the mixing time upper bound of the random walk on constant-density monotone sets from O(n³), as shown by Ding and Mossel, to O(n²). Along the way, we develop two new inequalities that may be of independent interest: (1) a directed L²-Poincaré inequality on the hypercube, and (2) an "approximate" FKG inequality for monotone sets.

Cite as

Yumou Fei and Renato Ferreira Pinto Jr.. On the Spectral Expansion of Monotone Subsets of the Hypercube. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 353, pp. 42:1-42:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{fei_et_al:LIPIcs.APPROX/RANDOM.2025.42,
  author =	{Fei, Yumou and Ferreira Pinto Jr., Renato},
  title =	{{On the Spectral Expansion of Monotone Subsets of the Hypercube}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2025)},
  pages =	{42:1--42:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-397-3},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{353},
  editor =	{Ene, Alina and Chattopadhyay, Eshan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2025.42},
  URN =		{urn:nbn:de:0030-drops-244081},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2025.42},
  annote =	{Keywords: Random walks, mixing time, FKG inequality, Poincar\'{e} inequality, directed isoperimetry}
}

Document

Research

DOI: 10.4230/OASIcs.Grossi.2

On the Construction of Elastic Degenerate Strings

Authors: Nicola Rizzo, Veli Mäkinen, and Nadia Pisanti

Published in: OASIcs, Volume 132, From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday (2025)

Abstract

An elastic degenerate string (EDS) is a sequence of sets of strings. In the context of bioinformatics, EDSes can be used to represent the variations observed in a population from its consensus genome. Pattern matching and comparison problems on EDSes have been widely studied in the literature, but their construction has been largely omitted. We fill this gap by showing how algorithms originally developed for related problems of founder reconstruction can be adapted to minimize the total cardinality of the EDS sets and total length of the EDS strings in linear time, given suitable multiple alignments representing the input data.

Cite as

Nicola Rizzo, Veli Mäkinen, and Nadia Pisanti. On the Construction of Elastic Degenerate Strings. In From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 132, pp. 2:1-2:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{rizzo_et_al:OASIcs.Grossi.2,
  author =	{Rizzo, Nicola and M\"{a}kinen, Veli and Pisanti, Nadia},
  title =	{{On the Construction of Elastic Degenerate Strings}},
  booktitle =	{From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday},
  pages =	{2:1--2:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-391-1},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{132},
  editor =	{Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi.2},
  URN =		{urn:nbn:de:0030-drops-238014},
  doi =		{10.4230/OASIcs.Grossi.2},
  annote =	{Keywords: multiple sequence alignment, pattern matching, data structures, segmentation algorithms, founder reconstruction, dynamic programming, semi-dynamic range minimum queries, positional Burrows-Wheeler transform}
}

@InProceedings{rizzo_et_al:OASIcs.Grossi.2,
  author =	{Rizzo, Nicola and M\"{a}kinen, Veli and Pisanti, Nadia},
  title =	{{On the Construction of Elastic Degenerate Strings}},
  booktitle =	{From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday},
  pages =	{2:1--2:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-391-1},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{132},
  editor =	{Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi.2},
  URN =		{urn:nbn:de:0030-drops-238014},
  doi =		{10.4230/OASIcs.Grossi.2},
  annote =	{Keywords: multiple sequence alignment, pattern matching, data structures, segmentation algorithms, founder reconstruction, dynamic programming, semi-dynamic range minimum queries, positional Burrows-Wheeler transform}
}

Document

Research

DOI: 10.4230/OASIcs.Grossi.7

Conditional Lower Bounds for String Matching in Labelled Graphs

Authors: Massimo Equi

Published in: OASIcs, Volume 132, From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday (2025)

Abstract

The problem of String Matching in Labelled Graphs (SMLG) is one possible generalization of the classic problem of finding a string inside another of greater length. In its most general form, SMLG asks to find a match for a string into a graph, which can be directed or undirected. As for string matching, many different variations are possible. For example, the match could be exact or approximate, and the match could lie on a path or a walk. Some of these variations easily fall into the NP-hard realm, while other variants are solvable in polynomial time. For the latter ones, fine-grained complexity has been a game changer in proving quadratic conditional lower bounds, allowing to finally close the gap with those upper bounds that remained unmatched for almost two decades. If the match is allowed to be approximate, SMLG enjoys the same conditional quadratic lower bounds shown for example for edit distance (Backurs and Indyk, STOC '15). The case that really requires ad hoc conditional lower bounds is the one of finding an exact match that lies on a walk. In this work, we focus on explaining various conditional lower bounds for this version of SMLG, with the goal of giving an overall perspective that could help understand which aspects of the problem make it quadratic. We will introduce the reader to the field of fine-grained complexity and show how it can successfully provide the exact type of lower bounds needed for polynomial problems such as SMLG.

Cite as

Massimo Equi. Conditional Lower Bounds for String Matching in Labelled Graphs. In From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 132, pp. 7:1-7:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{equi:OASIcs.Grossi.7,
  author =	{Equi, Massimo},
  title =	{{Conditional Lower Bounds for String Matching in Labelled Graphs}},
  booktitle =	{From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday},
  pages =	{7:1--7:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-391-1},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{132},
  editor =	{Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi.7},
  URN =		{urn:nbn:de:0030-drops-238063},
  doi =		{10.4230/OASIcs.Grossi.7},
  annote =	{Keywords: conditional lower bounds, strong exponential time hypothesis, fine-grained complexity, string matching, graphs}
}

Document

DOI: 10.4230/OASIcs.Manzini.12

Wheeler Graphs and Wheeler Languages

Authors: Nicola Cotumaccio, Giovanna D'Agostino, Daniel Gibney, Alberto Policriti, Nicola Prezza, and Sharma V. Thankachan

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)

Abstract

Suffix sorting stands at the core of the most efficient solutions for indexed pattern matching: the suffix tree, the suffix array, compressed indexes based on the Burrows-Wheeler transform, and so on. In [Gagie, Manzini, Sirén, TCS 2017] this concept was extended to labeled graphs, obtaining the rich class of Wheeler graphs. This work opened a very fruitful line of research, ultimately generating results able to bridge the fields of compressed data structures, graph theory, and regular language theory. In a Wheeler graph, nodes are sorted according to the alphabetic order of their incoming labels, propagating this order through pairs of equally-labeled edges. This apparently-simple definition makes it possible to solve on Wheeler graphs problems (including, but not limited to: compression, subpath queries, NFA equivalence, determinization, minimization) that on general labeled graphs are extremely hard to solve, and induces a rich structure in the class of regular languages (Wheeler languages) recognized by automata whose state transition is a Wheeler graph. The goal of this survey is to provide a summary of (and intuitions behind) the results on Wheeler graphs that appeared in the literature since their introduction, in addition to a discussion of interesting problems that are still open in the field.

Cite as

Nicola Cotumaccio, Giovanna D'Agostino, Daniel Gibney, Alberto Policriti, Nicola Prezza, and Sharma V. Thankachan. Wheeler Graphs and Wheeler Languages. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 12:1-12:28, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{cotumaccio_et_al:OASIcs.Manzini.12,
  author =	{Cotumaccio, Nicola and D'Agostino, Giovanna and Gibney, Daniel and Policriti, Alberto and Prezza, Nicola and Thankachan, Sharma V.},
  title =	{{Wheeler Graphs and Wheeler Languages}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{12:1--12:28},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.12},
  URN =		{urn:nbn:de:0030-drops-239205},
  doi =		{10.4230/OASIcs.Manzini.12},
  annote =	{Keywords: Wheeler languages, Wheeler graphs, pattern matching, indexing, compressed data structures}
}

Document

DOI: 10.4230/OASIcs.Manzini.13

Graph Indexing Beyond Wheeler Graphs

Authors: Jarno N. Alanko, Elena Biagi, Massimo Equi, Veli Mäkinen, Simon J. Puglisi, Nicola Rizzo, Kunihiko Sadakane, and Jouni Sirén

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)

Abstract

After the discovery of the FM index, which linked the Burrows-Wheeler transform (BWT) to pattern matching on strings, several contemporaneous strands of research began on indexing more complex structures with the BWT, such as tries, finite languages, de Bruijn graphs, and aligned sequences. These directions can now be viewed as culminating in the theory of Wheeler Graphs, but sometimes they go beyond. This chapter reviews the significant body of "proto Wheeler Graph" indexes, many of which exploit characteristics of their specific case to outperform Wheeler graphs, especially in practice.

Cite as

Jarno N. Alanko, Elena Biagi, Massimo Equi, Veli Mäkinen, Simon J. Puglisi, Nicola Rizzo, Kunihiko Sadakane, and Jouni Sirén. Graph Indexing Beyond Wheeler Graphs. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 13:1-13:29, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{alanko_et_al:OASIcs.Manzini.13,
  author =	{Alanko, Jarno N. and Biagi, Elena and Equi, Massimo and M\"{a}kinen, Veli and Puglisi, Simon J. and Rizzo, Nicola and Sadakane, Kunihiko and Sir\'{e}n, Jouni},
  title =	{{Graph Indexing Beyond Wheeler Graphs}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{13:1--13:29},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.13},
  URN =		{urn:nbn:de:0030-drops-239215},
  doi =		{10.4230/OASIcs.Manzini.13},
  annote =	{Keywords: indexing, compression, compressed data structures, string algorithms, pattern matching}
}

Document

DOI: 10.4230/LIPIcs.SEA.2025.13

Pangenome Graph Indexing via the Multidollar-BWT

Authors: Davide Cozzi, Brian Riccardi, Luca Denti, Simone Ciccolella, Kunihiko Sadakane, and Paola Bonizzoni

Published in: LIPIcs, Volume 338, 23rd International Symposium on Experimental Algorithms (SEA 2025)

Abstract

Indexing pangenome graphs is a major algorithmic challenge in computational pangenomics, a recent and active research field that seeks to use graphs as representations of multiple genomes. Since these graphs are constructed from whole genome sequences of a species population, they can become very large, making indexing one of the most challenging problems. In this paper, we propose gindex, a novel indexing approach to solve the Graph Pattern Matching Problem based on the multidollar-BWT. Specifically, gindex aims to find all occurrences of a pattern in a sequence-labeled graph by overcoming two main limitations of GCSA2, one of the most widely used graph indexes: handling queries of arbitrary length and scaling to large graphs without pruning any complex regions. Moreover, we show how a smart preprocessing step can optimize the use of multidollar-BWT to skip small redundant sub-patterns and enhance gindex’s querying capabilities. We demonstrate the effectiveness of our approach by comparing it to GCSA2 in terms of index construction and query time, using different preprocessing modes on three pangenome graphs: one built from Drosophila genomes and two produced by the Human Pangenome Reference Consortium. The results show that gindex can scale on human pangenome graphs - which GCSA2 cannot index using large amounts of RAM - with acceptable memory and time requirements. Moreover, gindex achieves fast query times, although not as fast as GCSA2, which may produce false positives.

Cite as

Davide Cozzi, Brian Riccardi, Luca Denti, Simone Ciccolella, Kunihiko Sadakane, and Paola Bonizzoni. Pangenome Graph Indexing via the Multidollar-BWT. In 23rd International Symposium on Experimental Algorithms (SEA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 338, pp. 13:1-13:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{cozzi_et_al:LIPIcs.SEA.2025.13,
  author =	{Cozzi, Davide and Riccardi, Brian and Denti, Luca and Ciccolella, Simone and Sadakane, Kunihiko and Bonizzoni, Paola},
  title =	{{Pangenome Graph Indexing via the Multidollar-BWT}},
  booktitle =	{23rd International Symposium on Experimental Algorithms (SEA 2025)},
  pages =	{13:1--13:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-375-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{338},
  editor =	{Mutzel, Petra and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2025.13},
  URN =		{urn:nbn:de:0030-drops-232515},
  doi =		{10.4230/LIPIcs.SEA.2025.13},
  annote =	{Keywords: Multidollar-BWT, Graph Index, Graph Pattern Matching, Pangenome Graph}
}

Document

DOI: 10.4230/LIPIcs.SEA.2025.15

Succinct Rank Dictionaries Revisited

Authors: Saska Dönges and Simon J. Puglisi

Published in: LIPIcs, Volume 338, 23rd International Symposium on Experimental Algorithms (SEA 2025)

Abstract

We study data structures for representing sets of m elements drawn from the universe [0..n-1] that support access and rank queries. A classical approach to this problem, foundational to the fields of succinct and compact data structures, is to represent the set as a bitvector X of n bits, where X[i] = 1 iff i is a member of the set. Our particular focus in this paper is on structures taking log₂{n choose m} + o(n) bits, which stem from the so-called RRR bitvector scheme (Raman et al., ACM Trans. Alg., 2007). In RRR bitvectors, X is conceptually divided into n/b blocks of b bits each. A block containing c 1 bits is then encoded using log₂ b + log₂{b choose c} bits, where log b bits are used to encode c, and log₂{b choose c} bits are used to say which of the {b choose c} possible combinations the block represents. In all existing RRR implementations the code assigned to a block is its lexicographical rank amongst the {b choose c} combinations of its class. In this paper we explore alternative non-lexicographical assignments of codes to blocks. We show these approaches can lead to faster query times and offer relevant space-time trade-offs in practice compared to state-of-the-art implementations (Gog and Petri, Software, Prac. & Exp., 2014) from the Succinct Data Structures Library.

Cite as

Saska Dönges and Simon J. Puglisi. Succinct Rank Dictionaries Revisited. In 23rd International Symposium on Experimental Algorithms (SEA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 338, pp. 15:1-15:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{donges_et_al:LIPIcs.SEA.2025.15,
  author =	{D\"{o}nges, Saska and Puglisi, Simon J.},
  title =	{{Succinct Rank Dictionaries Revisited}},
  booktitle =	{23rd International Symposium on Experimental Algorithms (SEA 2025)},
  pages =	{15:1--15:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-375-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{338},
  editor =	{Mutzel, Petra and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2025.15},
  URN =		{urn:nbn:de:0030-drops-232530},
  doi =		{10.4230/LIPIcs.SEA.2025.15},
  annote =	{Keywords: data structures, data compression, succinct data structures, compressed data structures, weighted de Bruijn sequence, text indexing, string algorithms}
}

Document

DOI: 10.4230/LIPIcs.CPM.2025.7

FL-RMQ: A Learned Approach to Range Minimum Queries

Authors: Paolo Ferragina and Filippo Lari

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Abstract

We address the problem of designing and implementing a data structure for the Range Minimum Query problem. We show a surprising connection between this classical problem and the geometry of a properly defined set of points in the Cartesian plane. Building on this insight, we hinge upon a well-known result in Computational Geometry to introduce the first RMQ solution that exploits (i.e., learns) the distribution of such 2D-points via proper error-bounded linear approximations. Because of these features, we name the resulting data structure: Fully-Learned RMQ, shortly FL-RMQ. We prove theoretical bounds for its space usage and query time, covering both worst-case scenarios and average-case performance for uniformly distributed inputs. These bounds compare favorably with the ones achievable by the best-known indexing solutions (i.e., the ones that allow access to the indexed array), especially when the input data follow some geometric regularities that we characterize in the paper, thus providing principled evidence of FL-RMQ being a novel data-aware solution to the RMQ problem. We corroborate our theoretical findings with a wide set of experiments showing that FL-RMQ offers more robust space-time trade-offs than the other known practical indexing solutions on both artificial and real-world datasets. We believe that our novel approach to the RMQ problem is noteworthy not only for its interesting space-time trade-offs, but also because it is flexible enough to be applied easily to the encoding variant of RMQ (i.e., the one that does not allow access to the indexed array), and moreover, because it paves the way to research opportunities on possibly other problems.

Cite as

Paolo Ferragina and Filippo Lari. FL-RMQ: A Learned Approach to Range Minimum Queries. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 7:1-7:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{ferragina_et_al:LIPIcs.CPM.2025.7,
  author =	{Ferragina, Paolo and Lari, Filippo},
  title =	{{FL-RMQ: A Learned Approach to Range Minimum Queries}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{7:1--7:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.7},
  URN =		{urn:nbn:de:0030-drops-231014},
  doi =		{10.4230/LIPIcs.CPM.2025.7},
  annote =	{Keywords: Range-Minimum query, Learned data structures, Compact data structures, Experimental results}
}

Document

DOI: 10.4230/LIPIcs.CPM.2025.12

Shortest Undirected Paths in de Bruijn Graphs

Authors: Wiktor Zuba, Oded Lachish, and Solon P. Pissis

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Abstract

Computing shortest directed paths in de Bruijn graphs is well studied and well understood. This is not the case for computing undirected paths, which is much more challenging algorithmically. In this paper, we present a general framework for computing shortest undirected paths in arbitrary de Bruijn graphs, that is, arbitrary subgraphs of the complete de Bruijn graph. We then present an application of our techniques for making any arbitrary order-k de Bruijn graph G(V,E) weakly connected by adding a set of edges of minimum total cost. This improves the running time of the recent (2-2/d)-approximation algorithm by Bernardini et al. [CPM 2024] from 𝒪(k|V|²) to 𝒪(k|V|log d) time, where d is the number of weakly connected components of graph G.

Cite as

Wiktor Zuba, Oded Lachish, and Solon P. Pissis. Shortest Undirected Paths in de Bruijn Graphs. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 12:1-12:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{zuba_et_al:LIPIcs.CPM.2025.12,
  author =	{Zuba, Wiktor and Lachish, Oded and Pissis, Solon P.},
  title =	{{Shortest Undirected Paths in de Bruijn Graphs}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{12:1--12:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.12},
  URN =		{urn:nbn:de:0030-drops-231060},
  doi =		{10.4230/LIPIcs.CPM.2025.12},
  annote =	{Keywords: string algorithm, graph algorithm, de Bruijn graph, Eulerian graph}
}

Document

DOI: 10.4230/LIPIcs.CPM.2025.29

Faster Approximate Elastic-Degenerate String Matching - Part B

Authors: Paweł Gawrychowski, Adam Górkiewicz, Pola Marciniak, Solon P. Pissis, and Karol Pokorski

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Abstract

We revisit the complexity of approximate pattern matching in an elastic-degenerate string. Such a string is a sequence of n finite sets of strings of total length N, and compactly describes a collection of strings obtained by first choosing exactly one string in every set, and then concatenating them together. This is motivated by the need of storing a collection of highly similar DNA sequences. The basic algorithmic question on elastic-degenerate strings is pattern matching: given such an elastic-degenerate string and a standard pattern of length m, check if the pattern occurs in one of the strings in the described collection. Bernardini et al. [SICOMP 2022] showed how to leverage fast matrix multiplication to obtain an Õ(nm^{ω-1})+𝒪(N)-time complexity for this problem, where ω is the matrix multiplication exponent. However, from the point of view of possible applications, it is more desirable to work with approximate pattern matching, where we seek approximate occurrences of the pattern. This generalization has been considered in a few papers already, but the best result so far for occurrences with k mismatches, where k is a constant, is the Õ(nm²+N)-time algorithm presented in Part A [CPM 2025]. This brings the question whether increasing the dependency on m from m^{ω-1} to quadratic is necessary when moving from k = 0 to larger (but still constant) k. We design an Õ(nm^{1.5}+N)-time algorithm for pattern matching with k mismatches in an elastic-degenerate string, for any constant k. To obtain this time bound, we leverage the structural characterization of occurrences with k mismatches of Charalampopoulos, Kociumaka, and Wellnitz [FOCS 2020] together with fast Fourier transform. We need to work with multiple patterns at the same time, instead of a single pattern, which requires refining the original characterization. This might be of independent interest.

Cite as

Paweł Gawrychowski, Adam Górkiewicz, Pola Marciniak, Solon P. Pissis, and Karol Pokorski. Faster Approximate Elastic-Degenerate String Matching - Part B. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 29:1-29:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.CPM.2025.29,
  author =	{Gawrychowski, Pawe{\l} and G\'{o}rkiewicz, Adam and Marciniak, Pola and Pissis, Solon P. and Pokorski, Karol},
  title =	{{Faster Approximate Elastic-Degenerate String Matching - Part B}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{29:1--29:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.29},
  URN =		{urn:nbn:de:0030-drops-231236},
  doi =		{10.4230/LIPIcs.CPM.2025.29},
  annote =	{Keywords: ED string, approximate pattern matching, Hamming distance, k mismatches}
}

Document

DOI: 10.4230/LIPIcs.CPM.2025.28

Faster Approximate Elastic-Degenerate String Matching - Part A

Authors: Solon P. Pissis, Jakub Radoszewski, and Wiktor Zuba

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Abstract

An elastic-degenerate (ED) string 𝐓 is a sequence 𝐓 = 𝐓[1] ⋯ 𝐓[n] of n finite sets of strings. The cardinality m of 𝐓 is the total number of strings in 𝐓[i], for all i ∈ [1..n]. The size N of 𝐓 is the total length of all m strings of 𝐓. ED strings have been introduced to represent a set of closely-related DNA sequences. Let P = P[1..p] be a pattern of length p and k > 0 be an integer. We consider the problem of k-Approximate ED String Matching (EDSM): searching k-approximate occurrences of P in the language of 𝐓. We call k-Approximate EDSM under the Hamming distance, k-Mismatch EDSM; and we call k-Approximate EDSM under edit distance, k-Edit EDSM. Bernardini et al. (Theoretical Computer Science, 2020) showed a simple 𝒪(k m p + kN)-time algorithm for k-Mismatch EDSM and an 𝒪(k² m p + kN)-time algorithm for k-Edit EDSM. We improve the dependency on k in both results, obtaining an Õ(k^{2/3}mp+√kN)-time algorithm for k-Mismatch EDSM and an Õ(kmp+ kN)-time algorithm for k-Edit EDSM. Bernardini et al. (Theory of Computing Systems, 2024) presented several algorithms for 1-Approximate EDSM working in Õ(np²+N) time. They have also left the possibility to generalize these solutions for k > 1 as an open problem. We improve the runtime of their solution for 1-Mismatch and 1-Edit EDSM from Õ(np²+N) to 𝒪(np²+N). We further show algorithms for k-Approximate EDSM for the Hamming and edit distances working in Õ(np² + N) time, for any constant k > 0. Finally, we show how our techniques can be applied to improve upon the complexity of the k-Approximate ED String Intersection and k-Approximate Doubly EDSM problems that were introduced very recently by Gabory et al. (Information and Computation, 2025).

Cite as

Solon P. Pissis, Jakub Radoszewski, and Wiktor Zuba. Faster Approximate Elastic-Degenerate String Matching - Part A. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 28:1-28:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{pissis_et_al:LIPIcs.CPM.2025.28,
  author =	{Pissis, Solon P. and Radoszewski, Jakub and Zuba, Wiktor},
  title =	{{Faster Approximate Elastic-Degenerate String Matching - Part A}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{28:1--28:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.28},
  URN =		{urn:nbn:de:0030-drops-231227},
  doi =		{10.4230/LIPIcs.CPM.2025.28},
  annote =	{Keywords: ED string, approximate string matching, Hamming distance, edit distance}
}

Document

DOI: 10.4230/LIPIcs.CPM.2025.1

Representing Paths in Digraphs

Authors: Riccardo Dondi and Alexandru Popa

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Abstract

In this contribution we consider two combinatorial problems related to graph string matching, motivated by recent approaches in computational genomics. Given a DAG where each node is labeled by a symbol, the problems aim to find a path in the DAG whose nodes contain all (or the maximum number of) symbols of the alphabet. We introduce a decision problem, Σ-Representing Path, that asks whether there exists a path that contains all the symbols of the alphabet, and an optimization problem, called Maximum Representing Path, that asks for a path that contains the maximum number of symbols. We analyze the complexity of the problems, showing the NP-completeness of {Σ-Representing Path} when each symbol labels at most three nodes in the DAG, and showing the APX-hardness of Maximum Representing Path when each symbol labels at most two nodes in the DAG. We complement the first result by giving a polynomial-time algorithm for Σ-Representing Path when each symbol labels at most two nodes in the DAG. Then we investigate the parameterized complexity of the two problems when the DAG has a limited distance from a set of disjoint paths and we show that both problems are W[1]-hard for this parameter. We consider the approximation of Maximum Representing Path, giving an approximation algorithm of factor √OPT, where OPT is the value of an optimal solution of the problem. We also show that Maximum Representing Path cannot be approximated within factor e/(e-1) - α, for any constant α > 0, unless NP ⊆ DTIME(|V|^{O(log log |V|)}) (V is the set of nodes of the DAG).

Cite as

Riccardo Dondi and Alexandru Popa. Representing Paths in Digraphs. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 1:1-1:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{dondi_et_al:LIPIcs.CPM.2025.1,
  author =	{Dondi, Riccardo and Popa, Alexandru},
  title =	{{Representing Paths in Digraphs}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{1:1--1:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.1},
  URN =		{urn:nbn:de:0030-drops-230954},
  doi =		{10.4230/LIPIcs.CPM.2025.1},
  annote =	{Keywords: Graph String Matching, Computational Complexity, Parameterized Complexity, Algorithms}
}

Artifact

Software

DOI: 10.4230/artifacts.22465

Practical Omnitigs

Authors: Sebastian Schmidt

Abstract

Cite as

Sebastian Schmidt. Practical Omnitigs (Software, Source Code). Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@misc{dagstuhl-artifact-22465,
   title = {{Practical Omnitigs}}, 
   author = {Schmidt, Sebastian},
   note = {Software, swhId: \href{https://archive.softwareheritage.org/swh:1:rev:bb1de69873c6b48f183e51bca2f48d2a057b8b64;origin=https://github.com/algbio/practical-omnitigs;visit=swh:1:snp:04e6ac0423d201dfab2c8d8ebe834756a6f88de9}{\texttt{swh:1:rev:bb1de69873c6b48f183e51bca2f48d2a057b8b64}} (visited on 2024-11-28)},
   url = {https://github.com/algbio/practical-omnitigs},
   doi = {10.4230/artifacts.22465},
}

Document

DOI: 10.4230/LIPIcs.WABI.2024.8

Applying the Safe-And-Complete Framework to Practical Genome Assembly

Authors: Sebastian Schmidt, Santeri Toivonen, Paul Medvedev, and Alexandru I. Tomescu

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

Despite the long history of genome assembly research, there remains a large gap between the theoretical and practical work. There is practical software with little theoretical underpinning of accuracy on one hand and theoretical algorithms which have not been adopted in practice on the other. In this paper we attempt to bridge the gap between theory and practice by showing how the theoretical safe-and-complete framework can be integrated into existing assemblers in order to improve contiguity. The optimal algorithm in this framework, called the omnitig algorithm, has not been used in practice due to its complexity and its lack of robustness to real data. Instead, we pursue a simplified notion of omnitigs (simple omnitigs), giving an efficient algorithm to compute them and demonstrating their safety under certain conditions. We modify two assemblers (wtdbg2 and Flye) by replacing their unitig algorithm with the simple omnitig algorithm. We test our modifications using real HiFi data from the D. melanogaster and the C. elegans genomes. Our modified algorithms lead to a substantial improvement in alignment-based contiguity, with negligible additional computational costs and either no or a small increase in the number of misassemblies.

Cite as

Sebastian Schmidt, Santeri Toivonen, Paul Medvedev, and Alexandru I. Tomescu. Applying the Safe-And-Complete Framework to Practical Genome Assembly. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 8:1-8:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{schmidt_et_al:LIPIcs.WABI.2024.8,
  author =	{Schmidt, Sebastian and Toivonen, Santeri and Medvedev, Paul and Tomescu, Alexandru I.},
  title =	{{Applying the Safe-And-Complete Framework to Practical Genome Assembly}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{8:1--8:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.8},
  URN =		{urn:nbn:de:0030-drops-206520},
  doi =		{10.4230/LIPIcs.WABI.2024.8},
  annote =	{Keywords: Genome assembly, Omnitigs, Safe-and-complete framework, graph algorithm, HiFi sequencing data, Assembly evaluation}
}

31 Search Results for "Tomescu, Alexandru I."

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Thanks for your feedback!

Could not send message