DROPS

Volume

LIPIcs, Volume 259

34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

CPM 2023, June 26-28, 2023, Marne-la-Vallée, France

Editors: Laurent Bulteau and Zsuzsanna Lipták

Document

DOI: 10.4230/LIPIcs.IPEC.2025.7

Parameterized Complexity of Scheduling Unit-Time Jobs with Generalized Precedence Constraints

Authors: Christina Büsing, Maurice Draeger, and Corinna Mathwieser

Published in: LIPIcs, Volume 358, 20th International Symposium on Parameterized and Exact Computation (IPEC 2025)

Abstract

We study the parameterized complexity of scheduling unit-time jobs on parallel, identical machines under generalized precedence constraints for minimization of the makespan and the sum of completion times (P|gen-prec, p_j = 1|γ, γ ∈ {C_max,∑_jC_j}). In our setting, each job is equipped with a Boolean formula (precedence constraint) over the set of jobs. A schedule satisfies a job’s precedence constraint if setting earlier jobs to true satisfies the formula. Our definition generalizes several common types of precedence constraints: classical and-constraints if every formula is a conjunction, or-constraints if every formula is a disjunction, and and/or-constraints if every formula is in conjunctive normal form. We prove fixed-parameter tractability when parameterizing by the number of predecessors. For parameterization by the number of successors, however, the complexity depends on the structure of the precedence constraints. If every constraint is a conjunction or a disjunction, we prove the problem to be fixed-parameter tractable. For constraints in disjunctive normal form, we prove W[1]-hardness. We show that the and/or-constrained problem is NP-hard, even for a single successor. Moreover, we prove NP-hardness on two machines if every constraint is a conjunction or a disjunction. This result not only proves para-NP-hardness for parameterization by the number of machines but also complements the polynomial-time solvability on two machines if every constraint is a conjunction [Coffman and Graham, 1972] or if every constraint is a disjunction [Johannes, 2005].
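The satisfaction rule stated in the abstract (a job's formula must hold once all earlier jobs are set to true) can be illustrated with a small feasibility checker. This is only a sketch under assumptions not taken from the paper: formulas are monotone and encoded as nested tuples such as ("and", "a", ("or", "b", "c")), time slots are integers starting at 0, and the names evaluate, is_feasible, makespan, and total_completion_time are hypothetical.

```python
from collections import Counter


def evaluate(formula, finished):
    """Evaluate a monotone formula with the jobs in `finished` set to true."""
    if isinstance(formula, str):  # a leaf is a job name
        return formula in finished
    op, *args = formula
    if op == "and":
        return all(evaluate(arg, finished) for arg in args)
    if op == "or":
        return any(evaluate(arg, finished) for arg in args)
    raise ValueError(f"unknown operator: {op!r}")


def is_feasible(schedule, constraints, machines):
    """Check a unit-time schedule (job -> slot) against capacity and constraints."""
    # At most `machines` jobs may share a time slot.
    load = Counter(schedule.values())
    if any(count > machines for count in load.values()):
        return False
    for job, slot in schedule.items():
        formula = constraints.get(job)
        if formula is None:  # assumption: jobs without a formula are unconstrained
            continue
        # Jobs in strictly earlier slots are the ones set to true.
        earlier = {other for other, t in schedule.items() if t < slot}
        if not evaluate(formula, earlier):
            return False
    return True


def makespan(schedule):
    """C_max: a unit-time job in slot t completes at time t + 1."""
    return max(schedule.values()) + 1 if schedule else 0


def total_completion_time(schedule):
    """Sum of completion times over all jobs."""
    return sum(slot + 1 for slot in schedule.values())


constraints = {"c": ("or", "a", "b"), "d": ("and", "a", "c")}
schedule = {"a": 0, "b": 0, "c": 1, "d": 2}
print(is_feasible(schedule, constraints, machines=2))
```

The checker only verifies a given schedule; finding an optimal one is the hard problem the paper studies.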

Cite as

Christina Büsing, Maurice Draeger, and Corinna Mathwieser. Parameterized Complexity of Scheduling Unit-Time Jobs with Generalized Precedence Constraints. In 20th International Symposium on Parameterized and Exact Computation (IPEC 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 358, pp. 7:1-7:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{busing_et_al:LIPIcs.IPEC.2025.7,
  author =	{B\"{u}sing, Christina and Draeger, Maurice and Mathwieser, Corinna},
  title =	{{Parameterized Complexity of Scheduling Unit-Time Jobs with Generalized Precedence Constraints}},
  booktitle =	{20th International Symposium on Parameterized and Exact Computation (IPEC 2025)},
  pages =	{7:1--7:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-407-9},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{358},
  editor =	{Agrawal, Akanksha and van Leeuwen, Erik Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.IPEC.2025.7},
  URN =		{urn:nbn:de:0030-drops-251390},
  doi =		{10.4230/LIPIcs.IPEC.2025.7},
  annote =	{Keywords: scheduling, precedence constraints, fixed-parameter tractability, complexity}
}

Document

DOI: 10.4230/LIPIcs.IPEC.2025.16

Designing Compact ILPs via Fast Witness Verification

Authors: Michał Włodarczyk

Published in: LIPIcs, Volume 358, 20th International Symposium on Parameterized and Exact Computation (IPEC 2025)

Abstract

The standard formalization of preprocessing in parameterized complexity is given by kernelization. In this work, we depart from this paradigm and study a different type of preprocessing for problems without polynomial kernels, still aiming at producing instances that are easily solvable in practice. Specifically, we ask for which parameterized problems an instance (I,k) can be reduced in polynomial time to an integer linear program (ILP) with poly(k) constraints. We show that this property coincides with the parameterized complexity class WK[1], previously studied in the context of Turing kernelization lower bounds. In turn, the class WK[1] enjoys an elegant characterization in terms of witness verification protocols: a yes-instance should admit a witness of size poly(k) that can be verified in time poly(k). By combining known data structures with new ideas, we design such protocols for several problems, such as r-Way Cut, Vertex Multiway Cut, Steiner Tree, and Minimum Common String Partition, thus showing that they can be modeled by compact ILPs. We also present explicit ILP and MILP formulations for Weighted Vertex Cover on graphs with small (unweighted) vertex cover number. We believe that these results will provide a background for a systematic study of ILP-oriented preprocessing procedures for parameterized problems.

Cite as

Michał Włodarczyk. Designing Compact ILPs via Fast Witness Verification. In 20th International Symposium on Parameterized and Exact Computation (IPEC 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 358, pp. 16:1-16:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{wlodarczyk:LIPIcs.IPEC.2025.16,
  author =	{W{\l}odarczyk, Micha{\l}},
  title =	{{Designing Compact ILPs via Fast Witness Verification}},
  booktitle =	{20th International Symposium on Parameterized and Exact Computation (IPEC 2025)},
  pages =	{16:1--16:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-407-9},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{358},
  editor =	{Agrawal, Akanksha and van Leeuwen, Erik Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.IPEC.2025.16},
  URN =		{urn:nbn:de:0030-drops-251481},
  doi =		{10.4230/LIPIcs.IPEC.2025.16},
  annote =	{Keywords: integer programming, kernelization, nondeterminism, multiway cut}
}

Document

DOI: 10.4230/LIPIcs.ESA.2025.14

Testing Sumsets Is Hard

Authors: Xi Chen, Shivam Nadimpalli, Tim Randolph, Rocco A. Servedio, and Or Zamir

Published in: LIPIcs, Volume 351, 33rd Annual European Symposium on Algorithms (ESA 2025)

Abstract

A subset S of the Boolean hypercube 𝔽₂ⁿ is a sumset if S = {a + b : a, b ∈ A} for some A ⊆ 𝔽₂ⁿ. Sumsets are central objects of study in additive combinatorics, where they play a role in several of the field’s most important results. We prove a lower bound of Ω(2^{n/2}) for the number of queries needed to test whether a Boolean function f:𝔽₂ⁿ → {0,1} is the indicator function of a sumset, ruling out an efficient testing algorithm for sumsets. Our lower bound for testing sumsets follows from sharp bounds on the related problem of shift testing, which may be of independent interest. We also give a near-optimal (2^{n/2} ⋅ poly(n))-query algorithm for a smoothed analysis formulation of the sumset refutation problem. Finally, we include a simple proof that the number of different sumsets in 𝔽₂ⁿ is 2^{(1±o(1))2^{n-1}}.
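To make the sumset definition concrete, the following brute-force sketch decides whether a set is a sumset for very small n. It is not the testing algorithm from the paper; it assumes vectors of 𝔽₂ⁿ are encoded as integers in range(2**n), so that addition is bitwise XOR, and the names sumset and is_sumset are hypothetical. The search enumerates all 2^(2^n) candidate sets A, so it is only usable for n up to about 3.

```python
from itertools import combinations


def sumset(a_set):
    """Return A + A = {a + b : a, b in A}, with addition in F_2^n as XOR."""
    return frozenset(a ^ b for a in a_set for b in a_set)


def is_sumset(s, n):
    """Brute-force check whether s equals A + A for some A within F_2^n."""
    universe = range(1 << n)
    target = frozenset(s)
    for size in range(len(universe) + 1):
        for candidate in combinations(universe, size):
            if sumset(candidate) == target:
                return True
    return False


print(is_sumset({0, 3}, 2))
```

Since a + a = 0 for every a, any nonempty sumset contains the zero vector, which the search confirms on small cases.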

Cite as

Xi Chen, Shivam Nadimpalli, Tim Randolph, Rocco A. Servedio, and Or Zamir. Testing Sumsets Is Hard. In 33rd Annual European Symposium on Algorithms (ESA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 351, pp. 14:1-14:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{chen_et_al:LIPIcs.ESA.2025.14,
  author =	{Chen, Xi and Nadimpalli, Shivam and Randolph, Tim and Servedio, Rocco A. and Zamir, Or},
  title =	{{Testing Sumsets Is Hard}},
  booktitle =	{33rd Annual European Symposium on Algorithms (ESA 2025)},
  pages =	{14:1--14:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-395-9},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{351},
  editor =	{Benoit, Anne and Kaplan, Haim and Wild, Sebastian and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2025.14},
  URN =		{urn:nbn:de:0030-drops-244822},
  doi =		{10.4230/LIPIcs.ESA.2025.14},
  annote =	{Keywords: Sumsets, additive combinatorics, property testing, Boolean functions}
}

Document

DOI: 10.4230/LIPIcs.ESA.2025.8

Fast Computation of k-Runs, Parameterized Squares, and Other Generalised Squares

Authors: Yuto Nakashima, Jakub Radoszewski, and Tomasz Waleń

Published in: LIPIcs, Volume 351, 33rd Annual European Symposium on Algorithms (ESA 2025)

Abstract

A k-mismatch square is a string of the form XY where X and Y are two equal-length strings that have at most k mismatches. Kolpakov and Kucherov [Theor. Comput. Sci., 2003] defined two notions of k-mismatch repeats, called k-repetitions and k-runs, each representing a sequence of consecutive k-mismatch squares of equal length. They proposed algorithms for computing k-repetitions and k-runs working in 𝒪(nk log k + output) time for a string of length n over an integer alphabet, where output is the number of the reported repeats. We show that output = 𝒪(nk log k), both in the case of k-repetitions and k-runs, which implies that the complexity of their algorithms is actually 𝒪(nk log k). We apply this result to computing parameterized squares. A parameterized square is a string of the form XY such that X and Y parameterized-match, i.e., there exists a bijection f on the alphabet such that f(X) = Y. Two parameterized squares XY and X'Y' are equivalent if they parameterized-match. Recently Hamai et al. [SPIRE 2024] showed that a string of length n over an alphabet of size σ contains fewer than nσ non-equivalent parameterized squares, improving an earlier bound by Kociumaka et al. [Theor. Comput. Sci., 2016]. We apply our bound for k-mismatch repeats to propose an algorithm that reports all non-equivalent parameterized squares in 𝒪(nσ log σ) time. We also show that the number of non-equivalent parameterized squares can be computed in 𝒪(n log n) time. This last algorithm applies to squares under any substring compatible equivalence relation and also to counting squares that are distinct as strings. In particular, this improves upon the 𝒪(nσ)-time algorithm of Gawrychowski et al. [CPM 2023] for counting order-preserving squares that are distinct as strings if σ = ω(log n).
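The two definitions above (k-mismatch square and parameterized match) can be checked directly on a single string. The sketch below is a naive per-string test, not one of the algorithms from the paper; the function names are hypothetical. It relies on the fact that, over a finite alphabet, an injective partial map on the symbols that occur extends to a bijection on the whole alphabet.

```python
def is_k_mismatch_square(s: str, k: int) -> bool:
    """True if s = XY with |X| == |Y| and X, Y differ in at most k positions."""
    if len(s) % 2 != 0:
        return False
    half = len(s) // 2
    mismatches = sum(1 for a, b in zip(s[:half], s[half:]) if a != b)
    return mismatches <= k


def parameterized_match(x: str, y: str) -> bool:
    """True if a consistent, injective symbol renaming maps x onto y."""
    if len(x) != len(y):
        return False
    forward, backward = {}, {}
    for a, b in zip(x, y):
        # setdefault records the first pairing and exposes any conflict later.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


def is_parameterized_square(s: str) -> bool:
    """True if s = XY with |X| == |Y| and X parameterized-matching Y."""
    half, odd = divmod(len(s), 2)
    return odd == 0 and parameterized_match(s[:half], s[half:])


print(is_k_mismatch_square("abcabd", 1), is_parameterized_square("abba"))
```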

Cite as

Yuto Nakashima, Jakub Radoszewski, and Tomasz Waleń. Fast Computation of k-Runs, Parameterized Squares, and Other Generalised Squares. In 33rd Annual European Symposium on Algorithms (ESA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 351, pp. 8:1-8:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{nakashima_et_al:LIPIcs.ESA.2025.8,
  author =	{Nakashima, Yuto and Radoszewski, Jakub and Wale\'{n}, Tomasz},
  title =	{{Fast Computation of k-Runs, Parameterized Squares, and Other Generalised Squares}},
  booktitle =	{33rd Annual European Symposium on Algorithms (ESA 2025)},
  pages =	{8:1--8:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-395-9},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{351},
  editor =	{Benoit, Anne and Kaplan, Haim and Wild, Sebastian and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2025.8},
  URN =		{urn:nbn:de:0030-drops-244768},
  doi =		{10.4230/LIPIcs.ESA.2025.8},
  annote =	{Keywords: string algorithm, k-mismatch square, parameterized square, order-preserving square, maximum gapped repeat}
}

Document

DOI: 10.4230/LIPIcs.WABI.2025.3

Approximability of Longest Run Subsequence and Complementary Minimization Problems

Authors: Yuichi Asahiro, Mingyang Gong, Jesper Jansson, Guohui Lin, Sichen Lu, Eiji Miyano, Hirotaka Ono, Toshiki Saitoh, and Shunichi Tanaka

Published in: LIPIcs, Volume 344, 25th International Conference on Algorithms for Bioinformatics (WABI 2025)

Abstract

We study the polynomial-time approximability of the Longest Run Subsequence problem (LRS for short) and its complementary minimization variant Minimum Run Subsequence Deletion problem (MRSD for short). For a string S = s₁ ⋯ s_n over an alphabet Σ, a subsequence S' of S is S' = s_{i₁} ⋯ s_{i_p}, such that 1 ≤ i₁ < i₂ < … < i_p ≤ |S|. A run of a symbol σ ∈ Σ in S is a maximal substring of consecutive occurrences of σ. A run subsequence S' of S is a subsequence of S in which every symbol σ ∈ Σ occurs in at most one run. The co-subsequence \overline{S'} of the subsequence S' = s_{i₁} ⋯ s_{i_p} in S is the subsequence obtained by deleting all the characters in S' from S, i.e., \overline{S'} = s_{j₁} ⋯ s_{j_{n-p}} such that j₁ < j₂ < … < j_{n-p} and {j₁, …, j_{n-p}} = {1, …, n} ⧵ {i₁, …, i_p}. Given a string S, the goal of LRS (resp., MRSD) is to find a run subsequence S^* of S such that the length |S^*| is maximized (resp., the number |\overline{S^*}| of deleted symbols from S is minimized) over all the run subsequences of S. Let k be the maximum number of symbol occurrences in the input S. It is known that LRS and MRSD are APX-hard even if k = 2. In this paper, we show that LRS can be approximated in polynomial time within factors of (k+2)/3 for k = 2 or 3, and 2(k+1)/5 for every k ≥ 4. Furthermore, we show that MRSD can be approximated in linear time within a factor of (k+4)/4 if k is even and (k+3)/4 if k is odd.
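The definitions of a run subsequence and of the two objectives can be illustrated by exhaustive search on tiny inputs. This is an exponential-time sketch for intuition only, not one of the approximation algorithms from the paper; the function names are hypothetical.

```python
from itertools import combinations, groupby


def has_single_run_per_symbol(t: str) -> bool:
    """True if every symbol of t occurs in at most one run."""
    run_symbols = [symbol for symbol, _ in groupby(t)]
    return len(run_symbols) == len(set(run_symbols))


def longest_run_subsequence(s: str) -> str:
    """Return a longest run subsequence of s by trying all index subsets."""
    n = len(s)
    for length in range(n, -1, -1):  # longest candidates first
        for indices in combinations(range(n), length):
            candidate = "".join(s[i] for i in indices)
            if has_single_run_per_symbol(candidate):
                return candidate
    return ""


def min_deletions(s: str) -> int:
    """MRSD objective: number of symbols deleted to reach a longest run subsequence."""
    return len(s) - len(longest_run_subsequence(s))


print(len(longest_run_subsequence("abab")), min_deletions("abab"))
```

The two optima always sum to |S|, which is why the problems are complementary even though their approximation factors differ.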

Cite as

Yuichi Asahiro, Mingyang Gong, Jesper Jansson, Guohui Lin, Sichen Lu, Eiji Miyano, Hirotaka Ono, Toshiki Saitoh, and Shunichi Tanaka. Approximability of Longest Run Subsequence and Complementary Minimization Problems. In 25th International Conference on Algorithms for Bioinformatics (WABI 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 344, pp. 3:1-3:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{asahiro_et_al:LIPIcs.WABI.2025.3,
  author =	{Asahiro, Yuichi and Gong, Mingyang and Jansson, Jesper and Lin, Guohui and Lu, Sichen and Miyano, Eiji and Ono, Hirotaka and Saitoh, Toshiki and Tanaka, Shunichi},
  title =	{{Approximability of Longest Run Subsequence and Complementary Minimization Problems}},
  booktitle =	{25th International Conference on Algorithms for Bioinformatics (WABI 2025)},
  pages =	{3:1--3:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-386-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{344},
  editor =	{Brejov\'{a}, Bro\v{n}a and Patro, Rob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2025.3},
  URN =		{urn:nbn:de:0030-drops-239290},
  doi =		{10.4230/LIPIcs.WABI.2025.3},
  annote =	{Keywords: Longest run subsequence, minimum run subsequence deletion, approximation algorithm}
}

Document

Research

DOI: 10.4230/OASIcs.Grossi.2

On the Construction of Elastic Degenerate Strings

Authors: Nicola Rizzo, Veli Mäkinen, and Nadia Pisanti

Published in: OASIcs, Volume 132, From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday (2025)

Abstract

An elastic degenerate string (EDS) is a sequence of sets of strings. In the context of bioinformatics, EDSes can be used to represent the variations observed in a population from its consensus genome. Pattern matching and comparison problems on EDSes have been widely studied in the literature, but their construction has been largely omitted. We fill this gap by showing how algorithms originally developed for related problems of founder reconstruction can be adapted to minimize the total cardinality of the EDS sets and total length of the EDS strings in linear time, given suitable multiple alignments representing the input data.

Cite as

Nicola Rizzo, Veli Mäkinen, and Nadia Pisanti. On the Construction of Elastic Degenerate Strings. In From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 132, pp. 2:1-2:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{rizzo_et_al:OASIcs.Grossi.2,
  author =	{Rizzo, Nicola and M\"{a}kinen, Veli and Pisanti, Nadia},
  title =	{{On the Construction of Elastic Degenerate Strings}},
  booktitle =	{From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday},
  pages =	{2:1--2:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-391-1},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{132},
  editor =	{Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi.2},
  URN =		{urn:nbn:de:0030-drops-238014},
  doi =		{10.4230/OASIcs.Grossi.2},
  annote =	{Keywords: multiple sequence alignment, pattern matching, data structures, segmentation algorithms, founder reconstruction, dynamic programming, semi-dynamic range minimum queries, positional Burrows-Wheeler transform}
}

Document

Research

DOI: 10.4230/OASIcs.Grossi.7

Conditional Lower Bounds for String Matching in Labelled Graphs

Authors: Massimo Equi

Published in: OASIcs, Volume 132, From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday (2025)

Abstract

The problem of String Matching in Labelled Graphs (SMLG) is one possible generalization of the classic problem of finding a string inside another of greater length. In its most general form, SMLG asks to find a match for a string into a graph, which can be directed or undirected. As for string matching, many different variations are possible. For example, the match could be exact or approximate, and the match could lie on a path or a walk. Some of these variations easily fall into the NP-hard realm, while other variants are solvable in polynomial time. For the latter ones, fine-grained complexity has been a game changer in proving quadratic conditional lower bounds, making it possible to finally close the gap with those upper bounds that remained unmatched for almost two decades. If the match is allowed to be approximate, SMLG enjoys the same conditional quadratic lower bounds shown for example for edit distance (Backurs and Indyk, STOC '15). The case that really requires ad hoc conditional lower bounds is the one of finding an exact match that lies on a walk. In this work, we focus on explaining various conditional lower bounds for this version of SMLG, with the goal of giving an overall perspective that could help understand which aspects of the problem make it quadratic. We will introduce the reader to the field of fine-grained complexity and show how it can successfully provide the exact type of lower bounds needed for polynomial problems such as SMLG.

Cite as

Massimo Equi. Conditional Lower Bounds for String Matching in Labelled Graphs. In From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 132, pp. 7:1-7:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{equi:OASIcs.Grossi.7,
  author =	{Equi, Massimo},
  title =	{{Conditional Lower Bounds for String Matching in Labelled Graphs}},
  booktitle =	{From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday},
  pages =	{7:1--7:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-391-1},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{132},
  editor =	{Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi.7},
  URN =		{urn:nbn:de:0030-drops-238063},
  doi =		{10.4230/OASIcs.Grossi.7},
  annote =	{Keywords: conditional lower bounds, strong exponential time hypothesis, fine-grained complexity, string matching, graphs}
}

Document

Research

DOI: 10.4230/OASIcs.Grossi.20

Subsequence-Based Indices for Genome Sequence Analysis

Authors: Giovanni Buzzega, Alessio Conte, Veronica Guerrini, Giulia Punzi, Giovanna Rosone, and Lorenzo Tattini

Published in: OASIcs, Volume 132, From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday (2025)

Abstract

Compact indices are a fundamental tool in string analysis, even more so in bioinformatics, where genomic sequences can reach billions in length. This paper presents some recent results in which Roberto Grossi has been involved, showing how some of these indices do more than just efficiently represent data, but rather are able to bring out salient information within it, which can be exploited for their downstream analysis. Specifically, we first review a recently-introduced method [Guerrini et al., 2023] that employs the Burrows-Wheeler Transform to build reasonably accurate phylogenetic trees in an assembly-free scenario. We then describe a recent practical tool [Buzzega et al., 2025] for indexing Maximal Common Subsequences between strings, which can enable analysis of genomic sequence similarity. Experimentally, we show that the results produced by the one index are consistent with the expectations about the results of the other index.

Cite as

Giovanni Buzzega, Alessio Conte, Veronica Guerrini, Giulia Punzi, Giovanna Rosone, and Lorenzo Tattini. Subsequence-Based Indices for Genome Sequence Analysis. In From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 132, pp. 20:1-20:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{buzzega_et_al:OASIcs.Grossi.20,
  author =	{Buzzega, Giovanni and Conte, Alessio and Guerrini, Veronica and Punzi, Giulia and Rosone, Giovanna and Tattini, Lorenzo},
  title =	{{Subsequence-Based Indices for Genome Sequence Analysis}},
  booktitle =	{From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday},
  pages =	{20:1--20:21},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-391-1},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{132},
  editor =	{Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi.20},
  URN =		{urn:nbn:de:0030-drops-238199},
  doi =		{10.4230/OASIcs.Grossi.20},
  annote =	{Keywords: String Indices, Burrows-Wheeler Transform, Maximal Common Subsequences, Sequence Analysis, Phylogeny}
}

Document

DOI: 10.4230/LIPIcs.SEA.2025.9

Incremental Reachability Index

Authors: Laurent Bulteau, Pierre-Yves David, Florian Horn, and Euxane Tran-Girard

Published in: LIPIcs, Volume 338, 23rd International Symposium on Experimental Algorithms (SEA 2025)

Abstract

We study the reachability problem in append-only DAGs: given two nodes u and v, is there a path from u to v? While the problem is linear in general, it can be answered faster by using a precomputed index, which gives a compressed representation of the transitive closure of the graph. Index algorithms are evaluated on three dimensions: the query time that the algorithm needs to answer whether there is a path from one node to another, the memory that the index uses per node, and the indexing time that is required to update the index when a node is added to the graph. In this paper, we combine Jagadish’s static index [Jagadish, 1990] with Felsner’s online chain-decomposition algorithm [Stefan Felsner, 1997] to create an incremental index: data associated with a node is immutable, guaranteeing that queries are answered properly even if new nodes are inserted while the query is processed. Its query time is constant, but its index size is heavily dependent on the graph width, and as such is not competitive with recent indexing algorithms (2-hop, tree-chain, ...). We also propose a version of that incremental algorithm with a much lighter index. In the most compressed version, the query time becomes O(log n). However, constant-time queries can be retained depending on the desired time/memory trade-off.

Cite as

Laurent Bulteau, Pierre-Yves David, Florian Horn, and Euxane Tran-Girard. Incremental Reachability Index. In 23rd International Symposium on Experimental Algorithms (SEA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 338, pp. 9:1-9:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{bulteau_et_al:LIPIcs.SEA.2025.9,
  author =	{Bulteau, Laurent and David, Pierre-Yves and Horn, Florian and Tran-Girard, Euxane},
  title =	{{Incremental Reachability Index}},
  booktitle =	{23rd International Symposium on Experimental Algorithms (SEA 2025)},
  pages =	{9:1--9:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-375-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{338},
  editor =	{Mutzel, Petra and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2025.9},
  URN =		{urn:nbn:de:0030-drops-232477},
  doi =		{10.4230/LIPIcs.SEA.2025.9},
  annote =	{Keywords: Directed acyclic graphs, reachability, append-only, index}
}

Document

DOI: 10.4230/LIPIcs.CPM.2025.15

Encoding Co-Lex Orders of Finite-State Automata in Linear Space

Authors: Ruben Becker, Nicola Cotumaccio, Sung-Hwan Kim, Nicola Prezza, and Carlo Tosoni

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Abstract

The Burrows-Wheeler transform (BWT) is a string transformation that enhances string indexing and compressibility. Cotumaccio and Prezza [SODA '21] extended this transformation to nondeterministic finite automata (NFAs) through co-lexicographic partial orders, i.e., by sorting the states of an NFA according to the co-lexicographic order of the strings reaching them. As the BWT of an NFA shares many properties with its original string variant, the transformation can be used to implement indices for locating specific patterns on the NFA itself. The efficiency of the resulting index is influenced by the width of the partial order on the states: the smaller the width, the faster the index. The most efficient index for arbitrary NFAs currently known in the literature is based on the coarsest forward-stable co-lex (CFS) order of Becker et al. [SPIRE '24]. In this paper, we prove that this CFS order can be encoded within linear space in the number of states in the automaton. The importance of this result stems from the fact that encoding such an order in linear space represents a big first step in the direction of building the index based on this order in near-linear time - the biggest open research question in this context. The most efficient algorithms currently known for this task run in quadratic time in the number of transitions in the NFA and are thus infeasible to run on very large graphs (e.g., pangenome graphs). At this point, a near-linear time algorithm is solely known for the simpler case of deterministic automata [Becker et al., ESA '23] and, in fact, this algorithmic result was enabled by a linear space encoding for deterministic automata [Kim et al., CPM '23].

Cite as

Ruben Becker, Nicola Cotumaccio, Sung-Hwan Kim, Nicola Prezza, and Carlo Tosoni. Encoding Co-Lex Orders of Finite-State Automata in Linear Space. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 15:1-15:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{becker_et_al:LIPIcs.CPM.2025.15,
  author =	{Becker, Ruben and Cotumaccio, Nicola and Kim, Sung-Hwan and Prezza, Nicola and Tosoni, Carlo},
  title =	{{Encoding Co-Lex Orders of Finite-State Automata in Linear Space}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{15:1--15:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.15},
  URN =		{urn:nbn:de:0030-drops-231094},
  doi =		{10.4230/LIPIcs.CPM.2025.15},
  annote =	{Keywords: Burrows-Wheeler Transform, Co-Lexicographic Orders, Nondeterministic Finite Automata, Graph Walks}
}

Document

DOI: 10.4230/LIPIcs.CPM.2025.24

Sorted Consecutive Occurrence Queries in Substrings

Authors: Waseem Akram and Takuya Mieno

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Abstract

The string indexing problem is a fundamental computational problem with numerous applications, including information retrieval and bioinformatics. It aims to efficiently solve the pattern matching problem: given a text T of length n for preprocessing and a pattern P of length m as a query, the goal is to report all occurrences of P as substrings of T. Navarro and Thankachan [CPM 2015, Theor. Comput. Sci. 2016] introduced a variant of this problem called the gap-bounded consecutive occurrence query, which reports pairs of consecutive occurrences of P in T such that their gaps (i.e., the distances between them) lie within a query-specified range [g₁, g₂]. Recently, Bille et al. [FSTTCS 2020, Theor. Comput. Sci. 2022] proposed the top-k close consecutive occurrence query, which reports the k closest consecutive occurrences of P in T, sorted in non-decreasing order of distance. Both problems are optimally solved in query time with O(n log n)-space data structures. In this paper, we generalize these problems to the range query model, which focuses only on occurrences of P in a specified substring T[a.. b] of T. Our contributions are as follows:
- We propose an O(n log² n)-space data structure that answers the range top-k consecutive occurrence query in O(|P| + log log n + k) time.
- We propose an O(n log^{2+ε} n)-space data structure that answers the range gap-bounded consecutive occurrence query in O(|P| + log log n + output) time, where ε is a positive constant and output denotes the number of outputs.

Additionally, as by-products, we present algorithms for geometric problems involving weighted horizontal segments in a 2D plane, which are of independent interest.
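To make the two range queries concrete, here is a naive reference sketch in Python. It is not the data structure from the paper: it simply scans T[a..b] on every query. It assumes 0-based inclusive positions, a non-empty pattern, occurrences that must lie entirely inside T[a..b], and a gap measured as the difference between the starting positions of two consecutive occurrences. All function names are hypothetical.

```python
def occurrences(text: str, pattern: str, a: int, b: int) -> list[int]:
    """Start positions of pattern lying fully inside text[a..b] (0-based, inclusive)."""
    m = len(pattern)
    return [i for i in range(a, b - m + 2) if text.startswith(pattern, i)]


def consecutive_pairs(text: str, pattern: str, a: int, b: int) -> list[tuple[int, int]]:
    """Pairs (i, j) of occurrences in text[a..b] with no other occurrence between them."""
    occ = occurrences(text, pattern, a, b)
    return list(zip(occ, occ[1:]))


def range_gap_bounded(text, pattern, a, b, g1, g2):
    """Consecutive pairs whose gap j - i lies in [g1, g2] (gap definition is an assumption)."""
    return [(i, j) for i, j in consecutive_pairs(text, pattern, a, b) if g1 <= j - i <= g2]


def range_top_k(text, pattern, a, b, k):
    """The k closest consecutive pairs, in non-decreasing order of gap."""
    pairs = consecutive_pairs(text, pattern, a, b)
    return sorted(pairs, key=lambda p: p[1] - p[0])[:k]
```

For example, in "abaababaab" the pattern "ab" occurs at positions 0, 3, 5 and 8, so the closest consecutive pair over the whole text is (3, 5). Each query here costs time linear in the range length, which is the cost an index of the kind described above is meant to avoid.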

Cite as

Waseem Akram and Takuya Mieno. Sorted Consecutive Occurrence Queries in Substrings. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 24:1-24:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


@InProceedings{akram_et_al:LIPIcs.CPM.2025.24,
  author =	{Akram, Waseem and Mieno, Takuya},
  title =	{{Sorted Consecutive Occurrence Queries in Substrings}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{24:1--24:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.24},
  URN =		{urn:nbn:de:0030-drops-231187},
  doi =		{10.4230/LIPIcs.CPM.2025.24},
  annote =	{Keywords: string algorithm, consecutive occurrences, suffix tree}
}

Document

DOI: 10.4230/LIPIcs.CPM.2025.17

On the Compressiveness of the Burrows-Wheeler Transform

Authors: Hideo Bannai, Tomohiro I, and Yuto Nakashima

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Abstract

The Burrows-Wheeler transform (BWT) is a reversible transform that converts a string w into another string BWT(w). The size of the run-length encoded BWT (RLBWT) can be interpreted as a measure of repetitiveness in the class of representations called dictionary compression, which are essentially representations based on copy-and-paste operations. In this paper, we shed new light on the compressiveness of BWT and the bijective BWT (BBWT). We first extend previous results on the relations of their run-length compressed sizes r and r_B. We also show that the so-called "clustering effect" of BWT and BBWT can be captured by measures other than empirical entropy or run-length encoding. In particular, we show that BWT and BBWT do not increase the repetitiveness of the string with respect to various measures based on dictionary compression by more than a polylogarithmic factor. Furthermore, we show that there exists an infinite family of strings that are maximally incompressible by any dictionary compression measure, but become very compressible after applying BBWT. An interesting implication of this result is that it is possible to transcend dictionary compression in some cases by simply applying BBWT before applying dictionary compression.
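As a small illustration of the quantity r mentioned above, the following Python sketch computes a BWT by sorting cyclic rotations and counts the runs of equal characters in the result. It is a textbook quadratic construction, not anything taken from the paper, and it does not cover the bijective variant (BBWT). The trailing `$` in the example is an assumed end-of-string sentinel, and the function names are hypothetical.

```python
from itertools import groupby


def bwt(w: str) -> str:
    """Last characters of the lexicographically sorted cyclic rotations of w."""
    rotations = sorted(w[i:] + w[:i] for i in range(len(w)))
    return "".join(rotation[-1] for rotation in rotations)


def run_count(s: str) -> int:
    """Number of maximal runs of equal characters, i.e. the run-length encoded size."""
    return sum(1 for _ in groupby(s))
```

On "banana$" the transform yields "annb$aa", which has 5 runs, while the input itself has 7. This is a toy instance of the clustering effect: equal characters tend to end up next to each other after the transform.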

Cite as

Hideo Bannai, Tomohiro I, and Yuto Nakashima. On the Compressiveness of the Burrows-Wheeler Transform. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 17:1-17:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


@InProceedings{bannai_et_al:LIPIcs.CPM.2025.17,
  author =	{Bannai, Hideo and I, Tomohiro and Nakashima, Yuto},
  title =	{{On the Compressiveness of the Burrows-Wheeler Transform}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{17:1--17:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.17},
  URN =		{urn:nbn:de:0030-drops-231116},
  doi =		{10.4230/LIPIcs.CPM.2025.17},
  annote =	{Keywords: Data Compression, Bijective Burrows-Wheeler Transform, Fibonacci words}
}

Document

DOI: 10.4230/LIPIcs.CPM.2025.19

The Trie Measure, Revisited

Authors: Jarno N. Alanko, Ruben Becker, Davide Cenzato, Travis Gagie, Sung-Hwan Kim, Bojana Kodric, and Nicola Prezza

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Abstract

In this paper, we study the following problem: given n subsets S₁, … , S_n of an integer universe U = {0,… , u-1}, having total cardinality N = ∑_{i = 1}ⁿ |S_i|, find a prefix-free encoding enc : U → {0,1}^+ minimizing the so-called trie measure, i.e., the total number of edges in the n binary tries T₁, … , T_n, where T_i is the trie packing the encoded integers {enc(x):x ∈ S_i}. We first observe that this problem is equivalent to that of merging u sets with the cheapest sequence of binary unions, a problem which in [Ghosh et al., ICDCS 2015] is shown to be NP-hard. Motivated by the hardness of the general problem, we focus on particular families of prefix-free encodings. We start by studying the fixed-length shifted encoding of [Gupta et al., Theoretical Computer Science 2007]. Given a parameter 0 ≤ a < u, this encoding sends each x ∈ U to (x + a) mod u, interpreted as a bit-string of log u bits. We develop the first efficient algorithms that find the value of a minimizing the trie measure when this encoding is used. Our two algorithms run in O(u + Nlog u) and O(Nlog² u) time, respectively. We proceed by studying ordered encodings (a.k.a. monotone or alphabetic), and describe an algorithm finding the optimal such encoding in O(N+u³) time. Within the same running time, we show how to compute the best shifted ordered encoding, provably no worse than both the optimal shifted and optimal ordered encodings. We provide implementations of our algorithms and discuss how these encodings perform in practice.
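The trie measure under the fixed-length shifted encoding can be evaluated by brute force, which may help in reading the definition. The Python sketch below assumes that u is a power of two with u ≥ 2 (so that log u is an integer) and tries every shift a in turn. It is far slower than the O(u + N log u) and O(N log² u) algorithms mentioned above and is not the authors' implementation. The function names are hypothetical.

```python
def trie_edges(codes: set[str]) -> int:
    """Edges of the binary trie storing `codes`: one per distinct non-empty prefix."""
    prefixes = {code[:k] for code in codes for k in range(1, len(code) + 1)}
    return len(prefixes)


def shifted_trie_measure(sets: list[set[int]], u: int, a: int) -> int:
    """Total trie edges when each x is encoded as (x + a) mod u on log2(u) bits."""
    bits = (u - 1).bit_length()  # equals log2(u) under the power-of-two assumption
    total = 0
    for s in sets:
        codes = {format((x + a) % u, f"0{bits}b") for x in s}
        total += trie_edges(codes)
    return total


def best_shift(sets: list[set[int]], u: int) -> int:
    """Smallest shift a in [0, u) that minimizes the trie measure (exhaustive search)."""
    return min(range(u), key=lambda a: shifted_trie_measure(sets, u, a))
```

With u = 8 and the single set {3, 4}, shift a = 0 encodes the elements as 011 and 100, which share no prefix and need 6 edges. Shift a = 1 encodes them as 100 and 101, which share the prefix 10 and need only 4 edges.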

Cite as

Jarno N. Alanko, Ruben Becker, Davide Cenzato, Travis Gagie, Sung-Hwan Kim, Bojana Kodric, and Nicola Prezza. The Trie Measure, Revisited. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 19:1-19:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


@InProceedings{alanko_et_al:LIPIcs.CPM.2025.19,
  author =	{Alanko, Jarno N. and Becker, Ruben and Cenzato, Davide and Gagie, Travis and Kim, Sung-Hwan and Kodric, Bojana and Prezza, Nicola},
  title =	{{The Trie Measure, Revisited}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{19:1--19:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.19},
  URN =		{urn:nbn:de:0030-drops-231135},
  doi =		{10.4230/LIPIcs.CPM.2025.19},
  annote =	{Keywords: Succinct data structures, degenerate strings, integer encoding}
}

Document

DOI: 10.4230/LIPIcs.CPM.2025.5

Covers in Optimal Space

Authors: Itai Boneh and Shay Golan

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Abstract

A cover of a string S is a string C such that every index of S is contained in some occurrence of C. First introduced by Apostolico and Ehrenfeucht [TCS'93] over 30 years ago, covers have since received significant attention in the string algorithms community. In this work, we present a space-efficient algorithm for computing a compact representation of all covers of a given string. Our algorithm requires only O(log n) additional memory while accessing the input string of length n in a read-only manner. Moreover, it runs in O(n) time, matching the best-known time complexity for this problem while achieving an exponential improvement in space usage.
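The definition of a cover can be checked directly with a short Python sketch. This is a naive quadratic-time check that copies prefixes of S, so it matches neither the O(n) running time nor the O(log n) additional space of the algorithm described above. It is meant only to illustrate the definition, and the function names are hypothetical.

```python
def is_cover(c: str, s: str) -> bool:
    """True if every index of s is contained in some occurrence of c."""
    if not c or len(c) > len(s):
        return False
    covered_up_to = 0  # indices [0, covered_up_to) are covered so far
    start = s.find(c)
    while start != -1:
        if start > covered_up_to:
            return False  # index covered_up_to is not inside any occurrence
        covered_up_to = start + len(c)
        start = s.find(c, start + 1)  # next occurrence, overlaps allowed
    return covered_up_to == len(s)


def all_covers(s: str) -> list[str]:
    """Every cover occurs at index 0, so it suffices to test each prefix of s."""
    return [s[:k] for k in range(1, len(s) + 1) if is_cover(s[:k], s)]
```

For S = "ababa", the string "aba" is a cover because its occurrences at positions 0 and 2 overlap and together contain every index. The string "ab" is not, since the last index is left uncovered.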

Cite as

Itai Boneh and Shay Golan. Covers in Optimal Space. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 5:1-5:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


@InProceedings{boneh_et_al:LIPIcs.CPM.2025.5,
  author =	{Boneh, Itai and Golan, Shay},
  title =	{{Covers in Optimal Space}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{5:1--5:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.5},
  URN =		{urn:nbn:de:0030-drops-230993},
  doi =		{10.4230/LIPIcs.CPM.2025.5},
  annote =	{Keywords: Cover, Read-only random access, small space}
}
