43 Search Results for "Zhu, Binhai"


Volume

LIPIcs, Volume 105

29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

CPM 2018, July 2-4, 2018, Qingdao, China

Editors: Gonzalo Navarro, David Sankoff, and Binhai Zhu

Document
A Dimension-Reducing Fréchet Simplification Oracle

Authors: Boris Aronov, Tsuri Farhana, Matthew J. Katz, and Indu Ramesh

Published in: LIPIcs, Volume 359, 36th International Symposium on Algorithms and Computation (ISAAC 2025)


Abstract
Let P be a polygonal curve with n vertices in the plane. We construct a data structure of size O(n log n) suited for simplification queries of the following kind. Given a query line 𝓁 and an integer k ≥ 1, find a curve Q on 𝓁 with at most k vertices that minimizes the discrete Fréchet distance to P, among all such curves. Using our data structure, a query can be handled in O(k² log³ n + k log⁴n) time. More generally, a geometric tree T on n vertices in the plane can be preprocessed into a near-linear-size structure so that, given a pair u, v of its vertices, a line 𝓁, and an integer k ≥ 1, one can find a curve Q on 𝓁 with at most k vertices that minimizes the discrete Fréchet distance to the path from u to v in T, in time O(k² polylog n). For the general dimension-reduction problem, where P is a curve in ℝ^d (d ≥ 3), 0 < ε₀ < 1 is a real parameter, and a query specifies a g-flat h (1 ≤ g ≤ d-1) and an integer k ≥ 1, we construct a data structure of size O(nlog n + f(ε₀) n), where f(ε₀) = (1+1/ε₀)^{(d-1)/2}, that allows us to find a curve Q on h with at most k vertices, whose discrete Fréchet distance to P is at most 1+ε₀ times the distance of Q^* to P, where Q^* is such a curve that minimizes the distance to P. The query handling time is O(f(ε₀) k² log² n).

Cite as

Boris Aronov, Tsuri Farhana, Matthew J. Katz, and Indu Ramesh. A Dimension-Reducing Fréchet Simplification Oracle. In 36th International Symposium on Algorithms and Computation (ISAAC 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 359, pp. 6:1-6:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{aronov_et_al:LIPIcs.ISAAC.2025.6,
  author =	{Aronov, Boris and Farhana, Tsuri and Katz, Matthew J. and Ramesh, Indu},
  title =	{{A Dimension-Reducing Fr\'{e}chet Simplification Oracle}},
  booktitle =	{36th International Symposium on Algorithms and Computation (ISAAC 2025)},
  pages =	{6:1--6:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-408-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{359},
  editor =	{Chen, Ho-Lin and Hon, Wing-Kai and Tsai, Meng-Tsung},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2025.6},
  URN =		{urn:nbn:de:0030-drops-249149},
  doi =		{10.4230/LIPIcs.ISAAC.2025.6},
  annote =	{Keywords: Computational geometry, discrete Fr\'{e}chet distance, curve simplification oracle, restricted minimum enclosing disk queries}
}
Document
Planar Stories of Graph Drawings: Algorithms and Experiments

Authors: Carla Binucci, Sabine Cornelsen, Walter Didimo, Seok-Hee Hong, Eleni Katsanou, Maurizio Patrignani, Antonios Symvonis, and Samuel Wolf

Published in: LIPIcs, Volume 357, 33rd International Symposium on Graph Drawing and Network Visualization (GD 2025)


Abstract
We address the problem of computing a dynamic visualization of a geometric graph G as a sequence of frames. Each frame shows only a portion of the graph but their union covers G entirely. The two main requirements of our dynamic visualization are: (i) guaranteeing drawing stability, so to preserve the user’s mental map; (ii) keeping the visual complexity of each frame low. To satisfy the first requirement, we never change the position of the vertices. Regarding the second requirement, we avoid edge crossings in each frame. More precisely, in the first frame we visualize a suitable subset of non-crossing edges; in each subsequent frame, exactly one new edge enters the visualization and all the edges that cross with it are deleted. We call such a sequence of frames a planar story of G. Our goal is to find a planar story whose minimum number of edges contemporarily displayed is maximized (i.e., a planar story that maximizes the minimum frame size). Besides studying our model from a theoretical point of view, we also design and experimentally compare different algorithms, both exact techniques and heuristics. These algorithms provide an array of alternative trade-offs between efficiency and effectiveness, also depending on the structure of the input graph.

Cite as

Carla Binucci, Sabine Cornelsen, Walter Didimo, Seok-Hee Hong, Eleni Katsanou, Maurizio Patrignani, Antonios Symvonis, and Samuel Wolf. Planar Stories of Graph Drawings: Algorithms and Experiments. In 33rd International Symposium on Graph Drawing and Network Visualization (GD 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 357, pp. 32:1-32:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{binucci_et_al:LIPIcs.GD.2025.32,
  author =	{Binucci, Carla and Cornelsen, Sabine and Didimo, Walter and Hong, Seok-Hee and Katsanou, Eleni and Patrignani, Maurizio and Symvonis, Antonios and Wolf, Samuel},
  title =	{{Planar Stories of Graph Drawings: Algorithms and Experiments}},
  booktitle =	{33rd International Symposium on Graph Drawing and Network Visualization (GD 2025)},
  pages =	{32:1--32:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-403-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{357},
  editor =	{Dujmovi\'{c}, Vida and Montecchiani, Fabrizio},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.GD.2025.32},
  URN =		{urn:nbn:de:0030-drops-250182},
  doi =		{10.4230/LIPIcs.GD.2025.32},
  annote =	{Keywords: Graph Drawing, Dynamic Graphs, Graph Stories, Heuristics, ILP}
}
Document
Invited Talk
We Are What We Index; a Primer for the Wheeler Graph Era (Invited Talk)

Authors: Ben Langmead

Published in: LIPIcs, Volume 344, 25th International Conference on Algorithms for Bioinformatics (WABI 2025)


Abstract
Since the arrival of second-generation sequencing, we have needed to build indexes over reference sequences - e.g. genomes and transcriptomes - in order to solve read alignment and classification problems efficiently [Langmead et al., 2009; Li and Durbin, 2009; Li et al., 2009]. The rule has been: what we can index determines what we can do. When indexing strings, we can use methods like suffix arrays [Manber and Myers, 1993], the Burrows-Wheeler Transform (BWT) [Burrows and Wheeler, 1994] / FM Index [Ferragina and Manzini, 2000], or k-mer indexes [Marchet et al., 2021]. What if we want to index objects more complex than strings? A pangenome, for example, is a large collection of similar strings, e.g. the hundreds of assemblies that make up the Human Pangenome Reference [Liao et al., 2023] or all the bacteria in the Refseq database [Goldfarb et al., 2025]. We may wish to combine these strings into a multiple sequence alignment (MSA) or a graph first. Can we index those efficiently? In many useful cases the answer is "yes," but in others the answer is "no." The story of how we learned exactly when the answer is "yes" versus "no" unfolded through a sequence of insights. Here we review this story, eventually arriving at the definition of Wheeler graphs as discovered and formalized by Gagie, Manzini and Sirén [Gagie et al., 2017]. We will focus on indexes based on the BWT, since these (a) are lossless full-text indexes, (b) are widely used in practice [Langmead et al., 2009; Li and Durbin, 2009], and (c) form the theoretical throughline for all the indexing strategies on the path to Wheeler graphs. We will trace the BWT-based indexing story from the early days of the FM Index, though its step-by-step gobbling up of trees (XBW-transform [Ferragina et al., 2005]) and de Bruijn Graphs (BOSS representation [Bowe et al., 2012]), and to the eventual formalization of Wheeler graphs [Gagie et al., 2017]. Along the way, we will define and update our notions of what it means to track a consecutive range of elements in the structure, and what it means for an index to be efficient. We will also connect these notions to automata [Sipser, 1996], noting how the indexability of Wheeler graphs (also called Wheeler automata) is connected to the mechanics of how to efficiently represent and simulate a finite automaton [Alanko et al., 2021]. With this context, we can imagine improved indexes for the future of genomics and pangenomics. De Bruijn are extremely practical and are the most widely used among the non-string data structures that are also Wheeler graphs. But we might prefer other options. For example, de Bruijn graphs have the undesirable property that they usually encode not only the true longer-than-k substrings of the original text, but also "false" substrings that span repeats. Related to this, paths through the de Bruijn graph can "glue" substrings together that are horizontally distant in the MSA. Could other Wheeler graphs be practical alternatives to de Bruijn graphs? For instance, the original GCSA study by Sirén, Välimäki and Mäkinen proposed a way to convert a multiple alignment into an automaton that either is a Wheeler graph or can be made into one [Sirén et al., 2014]. This warrants further exploration, possibly with the help of improved tools for solving the NP-complete problem of recognizing whether a graph is a Wheeler graph [Chao et al., 2023]. The notion of BWT tunnels [Baier, 2018] gives another route: we can begin with a concatenated pangenome strings and compress it by identifying and collapsing BWT tunnels. This yields a Wheeler graph that is compressed like the de Bruijn graph, but without departing from the exact contents or coordinate systems of the original genomes. The future might need us to explore all these Wheeler-graph indexes, along with the also highly practical and always-improving world of indexes buiover collections of strings [Gagie et al., 2018].

Cite as

Ben Langmead. We Are What We Index; a Primer for the Wheeler Graph Era (Invited Talk). In 25th International Conference on Algorithms for Bioinformatics (WABI 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 344, pp. 2:1-2:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{langmead:LIPIcs.WABI.2025.2,
  author =	{Langmead, Ben},
  title =	{{We Are What We Index; a Primer for the Wheeler Graph Era}},
  booktitle =	{25th International Conference on Algorithms for Bioinformatics (WABI 2025)},
  pages =	{2:1--2:2},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-386-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{344},
  editor =	{Brejov\'{a}, Bro\v{n}a and Patro, Rob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2025.2},
  URN =		{urn:nbn:de:0030-drops-239288},
  doi =		{10.4230/LIPIcs.WABI.2025.2},
  annote =	{Keywords: Indexing, Burrows-Wheeler Transform}
}
Document
Research
Subsequence-Based Indices for Genome Sequence Analysis

Authors: Giovanni Buzzega, Alessio Conte, Veronica Guerrini, Giulia Punzi, Giovanna Rosone, and Lorenzo Tattini

Published in: OASIcs, Volume 132, From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday (2025)


Abstract
Compact indices are a fundamental tool in string analysis, even more so in bioinformatics, where genomic sequences can reach billions in length. This paper presents some recent results in which Roberto Grossi has been involved, showing how some of these indices do more than just efficiently represent data, but rather are able to bring out salient information within it, which can be exploited for their downstream analysis. Specifically, we first review a recently-introduced method [Guerrini et al., 2023] that employs the Burrows-Wheeler Transform to build reasonably accurate phylogenetic trees in an assembly-free scenario. We then describe a recent practical tool [Buzzega et al., 2025] for indexing Maximal Common Subsequences between strings, which can enable analysis of genomic sequence similarity. Experimentally, we show that the results produced by the one index are consistent with the expectations about the results of the other index.

Cite as

Giovanni Buzzega, Alessio Conte, Veronica Guerrini, Giulia Punzi, Giovanna Rosone, and Lorenzo Tattini. Subsequence-Based Indices for Genome Sequence Analysis. In From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 132, pp. 20:1-20:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{buzzega_et_al:OASIcs.Grossi.20,
  author =	{Buzzega, Giovanni and Conte, Alessio and Guerrini, Veronica and Punzi, Giulia and Rosone, Giovanna and Tattini, Lorenzo},
  title =	{{Subsequence-Based Indices for Genome Sequence Analysis}},
  booktitle =	{From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday},
  pages =	{20:1--20:21},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-391-1},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{132},
  editor =	{Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi.20},
  URN =		{urn:nbn:de:0030-drops-238199},
  doi =		{10.4230/OASIcs.Grossi.20},
  annote =	{Keywords: String Indices, Burrows-Wheeler Transform, Maximal Common Subsequences, Sequence Analysis, Phylogeny}
}
Document
Track A: Algorithms, Complexity and Games
Faster Fréchet Distance Under Transformations

Authors: Kevin Buchin, Maike Buchin, Zijin Huang, André Nusser, and Sampson Wong

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)


Abstract
We study the problem of computing the Fréchet distance between two polygonal curves under transformations. First, we consider translations in the Euclidean plane. Given two curves π and σ of total complexity n and a threshold δ ≥ 0, we present an 𝒪̃(n^{7 + 1/3}) time algorithm to determine whether there exists a translation t ∈ ℝ² such that the Fréchet distance between π and σ + t is at most δ. This improves on the previous best result, which is an 𝒪(n⁸) time algorithm. We then generalize this result to any class of rationally parameterized transformations, which includes translation, rotation, scaling, and arbitrary affine transformations. For a class T of rationally parametrized transformations with k degrees of freedom, we show that one can determine whether there is a transformation τ ∈ T such that the Fréchet distance between π and τ(σ) is at most δ in 𝒪̃(n^{3k+4/3}) time.

Cite as

Kevin Buchin, Maike Buchin, Zijin Huang, André Nusser, and Sampson Wong. Faster Fréchet Distance Under Transformations. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 36:1-36:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{buchin_et_al:LIPIcs.ICALP.2025.36,
  author =	{Buchin, Kevin and Buchin, Maike and Huang, Zijin and Nusser, Andr\'{e} and Wong, Sampson},
  title =	{{Faster Fr\'{e}chet Distance Under Transformations}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{36:1--36:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.36},
  URN =		{urn:nbn:de:0030-drops-234137},
  doi =		{10.4230/LIPIcs.ICALP.2025.36},
  annote =	{Keywords: Fr\'{e}chet distance, curve similarity, shape matching}
}
Document
Transforming Dogs on the Line: On the Fréchet Distance Under Translation or Scaling in 1D

Authors: Lotte Blank, Jacobus Conradi, Anne Driemel, Benedikt Kolbe, André Nusser, and Marena Richter

Published in: LIPIcs, Volume 332, 41st International Symposium on Computational Geometry (SoCG 2025)


Abstract
The Fréchet distance is a computational mainstay for comparing polygonal curves. The Fréchet distance under translation, which is a translation invariant version, considers the similarity of two curves independent of their location in space. It is defined as the minimum Fréchet distance that arises from allowing arbitrary translations of the input curves. This problem and numerous variants of the Fréchet distance under some transformations have been studied, with more work concentrating on the discrete Fréchet distance, leaving a significant gap between the discrete and continuous versions of the Fréchet distance under transformations. Our contribution is twofold: First, we present an algorithm for the Fréchet distance under translation on 1-dimensional curves of complexity n with a running time of 𝒪(n^{8/3} log³ n). To achieve this, we develop a novel framework for the problem for 1-dimensional curves, which also applies to other scenarios and leads to our second contribution. We present an algorithm with the same running time of 𝒪(n^{8/3} log³ n) for the Fréchet distance under scaling for 1-dimensional curves. For both algorithms we match the running times of the discrete case and improve the previously best known bounds of 𝒪̃(n⁴). Our algorithms rely on technical insights but are conceptually simple, essentially reducing the continuous problem to the discrete case across different length scales.

Cite as

Lotte Blank, Jacobus Conradi, Anne Driemel, Benedikt Kolbe, André Nusser, and Marena Richter. Transforming Dogs on the Line: On the Fréchet Distance Under Translation or Scaling in 1D. In 41st International Symposium on Computational Geometry (SoCG 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 332, pp. 22:1-22:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{blank_et_al:LIPIcs.SoCG.2025.22,
  author =	{Blank, Lotte and Conradi, Jacobus and Driemel, Anne and Kolbe, Benedikt and Nusser, Andr\'{e} and Richter, Marena},
  title =	{{Transforming Dogs on the Line: On the Fr\'{e}chet Distance Under Translation or Scaling in 1D}},
  booktitle =	{41st International Symposium on Computational Geometry (SoCG 2025)},
  pages =	{22:1--22:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-370-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{332},
  editor =	{Aichholzer, Oswin and Wang, Haitao},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2025.22},
  URN =		{urn:nbn:de:0030-drops-231746},
  doi =		{10.4230/LIPIcs.SoCG.2025.22},
  annote =	{Keywords: Fr\'{e}chet distance under translation, Fr\'{e}chet distance under scaling, time series, shape matching}
}
Document
Faster Approximate Elastic-Degenerate String Matching - Part B

Authors: Paweł Gawrychowski, Adam Górkiewicz, Pola Marciniak, Solon P. Pissis, and Karol Pokorski

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)


Abstract
We revisit the complexity of approximate pattern matching in an elastic-degenerate string. Such a string is a sequence of n finite sets of strings of total length N, and compactly describes a collection of strings obtained by first choosing exactly one string in every set, and then concatenating them together. This is motivated by the need of storing a collection of highly similar DNA sequences. The basic algorithmic question on elastic-degenerate strings is pattern matching: given such an elastic-degenerate string and a standard pattern of length m, check if the pattern occurs in one of the strings in the described collection. Bernardini et al. [SICOMP 2022] showed how to leverage fast matrix multiplication to obtain an Õ(nm^{ω-1})+𝒪(N)-time complexity for this problem, where ω is the matrix multiplication exponent. However, from the point of view of possible applications, it is more desirable to work with approximate pattern matching, where we seek approximate occurrences of the pattern. This generalization has been considered in a few papers already, but the best result so far for occurrences with k mismatches, where k is a constant, is the Õ(nm²+N)-time algorithm presented in Part A [CPM 2025]. This brings the question whether increasing the dependency on m from m^{ω-1} to quadratic is necessary when moving from k = 0 to larger (but still constant) k. We design an Õ(nm^{1.5}+N)-time algorithm for pattern matching with k mismatches in an elastic-degenerate string, for any constant k. To obtain this time bound, we leverage the structural characterization of occurrences with k mismatches of Charalampopoulos, Kociumaka, and Wellnitz [FOCS 2020] together with fast Fourier transform. We need to work with multiple patterns at the same time, instead of a single pattern, which requires refining the original characterization. This might be of independent interest.

Cite as

Paweł Gawrychowski, Adam Górkiewicz, Pola Marciniak, Solon P. Pissis, and Karol Pokorski. Faster Approximate Elastic-Degenerate String Matching - Part B. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 29:1-29:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.CPM.2025.29,
  author =	{Gawrychowski, Pawe{\l} and G\'{o}rkiewicz, Adam and Marciniak, Pola and Pissis, Solon P. and Pokorski, Karol},
  title =	{{Faster Approximate Elastic-Degenerate String Matching - Part B}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{29:1--29:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.29},
  URN =		{urn:nbn:de:0030-drops-231236},
  doi =		{10.4230/LIPIcs.CPM.2025.29},
  annote =	{Keywords: ED string, approximate pattern matching, Hamming distance, k mismatches}
}
Document
Faster Approximate Elastic-Degenerate String Matching - Part A

Authors: Solon P. Pissis, Jakub Radoszewski, and Wiktor Zuba

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)


Abstract
An elastic-degenerate (ED) string 𝐓 is a sequence 𝐓 = 𝐓[1] ⋯ 𝐓[n] of n finite sets of strings. The cardinality m of 𝐓 is the total number of strings in 𝐓[i], for all i ∈ [1..n]. The size N of 𝐓 is the total length of all m strings of 𝐓. ED strings have been introduced to represent a set of closely-related DNA sequences. Let P = P[1..p] be a pattern of length p and k > 0 be an integer. We consider the problem of k-Approximate ED String Matching (EDSM): searching k-approximate occurrences of P in the language of 𝐓. We call k-Approximate EDSM under the Hamming distance, k-Mismatch EDSM; and we call k-Approximate EDSM under edit distance, k-Edit EDSM. Bernardini et al. (Theoretical Computer Science, 2020) showed a simple 𝒪(k m p + kN)-time algorithm for k-Mismatch EDSM and an 𝒪(k² m p + kN)-time algorithm for k-Edit EDSM. We improve the dependency on k in both results, obtaining an Õ(k^{2/3}mp+√kN)-time algorithm for k-Mismatch EDSM and an Õ(kmp+ kN)-time algorithm for k-Edit EDSM. Bernardini et al. (Theory of Computing Systems, 2024) presented several algorithms for 1-Approximate EDSM working in Õ(np²+N) time. They have also left the possibility to generalize these solutions for k > 1 as an open problem. We improve the runtime of their solution for 1-Mismatch and 1-Edit EDSM from Õ(np²+N) to 𝒪(np²+N). We further show algorithms for k-Approximate EDSM for the Hamming and edit distances working in Õ(np² + N) time, for any constant k > 0. Finally, we show how our techniques can be applied to improve upon the complexity of the k-Approximate ED String Intersection and k-Approximate Doubly EDSM problems that were introduced very recently by Gabory et al. (Information and Computation, 2025).

Cite as

Solon P. Pissis, Jakub Radoszewski, and Wiktor Zuba. Faster Approximate Elastic-Degenerate String Matching - Part A. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 28:1-28:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{pissis_et_al:LIPIcs.CPM.2025.28,
  author =	{Pissis, Solon P. and Radoszewski, Jakub and Zuba, Wiktor},
  title =	{{Faster Approximate Elastic-Degenerate String Matching - Part A}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{28:1--28:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.28},
  URN =		{urn:nbn:de:0030-drops-231227},
  doi =		{10.4230/LIPIcs.CPM.2025.28},
  annote =	{Keywords: ED string, approximate string matching, Hamming distance, edit distance}
}
Document
Representing Paths in Digraphs

Authors: Riccardo Dondi and Alexandru Popa

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)


Abstract
In this contribution we consider two combinatorial problems related to graph string matching, motivated by recent approaches in computational genomics. Given a DAG where each node is labeled by a symbol, the problems aim to find a path in the DAG whose nodes contain all (or the maximum number of) symbols of the alphabet. We introduce a decision problem, Σ-Representing Path, that asks whether there exists a path that contains all the symbols of the alphabet, and an optimization problem, called Maximum Representing Path, that asks for a path that contains the maximum number of symbols. We analyze the complexity of the problems, showing the NP-completeness of {Σ-Representing Path} when each symbol labels at most three nodes in the DAG, and showing the APX-hardness of Maximum Representing Path when each symbol labels at most two nodes in the DAG. We complement the first result by giving a polynomial-time algorithm for Σ-Representing Path when each symbol labels at most two nodes in the DAG. Then we investigate the parameterized complexity of the two problems when the DAG has a limited distance from a set of disjoint paths and we show that both problems are W[1]-hard for this parameter. We consider the approximation of Maximum Representing Path, giving an approximation algorithm of factor √OPT, where OPT is the value of an optimal solution of the problem. We also show that Maximum Representing Path cannot be approximated within factor e/(e-1) - α, for any constant α > 0, unless NP ⊆ DTIME(|V|^{O(log log |V|)}) (V is the set of nodes of the DAG).

Cite as

Riccardo Dondi and Alexandru Popa. Representing Paths in Digraphs. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 1:1-1:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{dondi_et_al:LIPIcs.CPM.2025.1,
  author =	{Dondi, Riccardo and Popa, Alexandru},
  title =	{{Representing Paths in Digraphs}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{1:1--1:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.1},
  URN =		{urn:nbn:de:0030-drops-230954},
  doi =		{10.4230/LIPIcs.CPM.2025.1},
  annote =	{Keywords: Graph String Matching, Computational Complexity, Parameterized Complexity, Algorithms}
}
Document
Can You Link Up With Treewidth?

Authors: Radu Curticapean, Simon Döring, Daniel Neuen, and Jiaheng Wang

Published in: LIPIcs, Volume 327, 42nd International Symposium on Theoretical Aspects of Computer Science (STACS 2025)


Abstract
A central result by Marx [ToC '10] constructs k-vertex graphs H of maximum degree 3 such that n^o(k/log k) time algorithms for detecting colorful H-subgraphs would refute the Exponential-Time Hypothesis (ETH). This result is widely used to obtain almost-tight conditional lower bounds for parameterized problems under ETH. Our first contribution is a new and fully self-contained proof of this result that further simplifies a recent work by Karthik et al. [SOSA 2024]. In our proof, we introduce a novel graph parameter of independent interest, the linkage capacity γ(H), and show that detecting colorful H-subgraphs in time n^o(γ(H)) refutes ETH. Then, we use a simple construction of communication networks credited to Beneš to obtain k-vertex graphs of maximum degree 3 and linkage capacity Ω(k/log k), avoiding arguments involving expander graphs, which were required in previous papers. We also show that every graph H of treewidth t has linkage capacity Ω(t/log t), thus recovering a stronger result shown by Marx [ToC '10] with a simplified proof. Additionally, we obtain new tight lower bounds on the complexity of subgraph detection for certain types of patterns by analyzing their linkage capacity: We prove that almost all k-vertex graphs of polynomial average degree Ω(k^β) for β > 0 have linkage capacity Θ(k), which implies tight lower bounds for finding such patterns H. As an application of these results, we also obtain tight lower bounds for counting small induced subgraphs having a fixed property Φ, improving bounds from, e.g., [Roth et al., FOCS 2020].

Cite as

Radu Curticapean, Simon Döring, Daniel Neuen, and Jiaheng Wang. Can You Link Up With Treewidth?. In 42nd International Symposium on Theoretical Aspects of Computer Science (STACS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 327, pp. 28:1-28:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{curticapean_et_al:LIPIcs.STACS.2025.28,
  author =	{Curticapean, Radu and D\"{o}ring, Simon and Neuen, Daniel and Wang, Jiaheng},
  title =	{{Can You Link Up With Treewidth?}},
  booktitle =	{42nd International Symposium on Theoretical Aspects of Computer Science (STACS 2025)},
  pages =	{28:1--28:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-365-2},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{327},
  editor =	{Beyersdorff, Olaf and Pilipczuk, Micha{\l} and Pimentel, Elaine and Thắng, Nguy\~{ê}n Kim},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2025.28},
  URN =		{urn:nbn:de:0030-drops-228534},
  doi =		{10.4230/LIPIcs.STACS.2025.28},
  annote =	{Keywords: subgraph isomorphism, constraint satisfaction problems, linkage capacity, exponential-time hypothesis, parameterized complexity, counting complexity}
}
Document
Beyond the Longest Letter-Duplicated Subsequence Problem

Authors: Wenfeng Lai, Adiesha Liyanage, Binhai Zhu, and Peng Zou

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)


Abstract
Motivated by computing duplication patterns in sequences, a new fundamental problem called the longest letter-duplicated subsequence (LLDS) is proposed. Given a sequence S of length n, a letter-duplicated subsequence is a subsequence of S in the form of x₁^{d₁}x₂^{d₂}⋯ x_k^{d_k} with x_i ∈ Σ, x_j≠ x_{j+1} and d_i ≥ 2 for all i in [k] and j in [k-1]. A linear time algorithm for computing the longest letter-duplicated subsequence (LLDS) of S can be easily obtained. In this paper, we focus on two variants of this problem. We first consider the constrained version when Σ is unbounded, each letter appears in S at least 6 times and all the letters in Σ must appear in the solution. We show that the problem is NP-hard (a further twist indicates that the problem does not admit any polynomial time approximation). The reduction is from possibly the simplest version of SAT that is NP-complete, (≤ 2,1, ≤ 3)-SAT, where each variable appears at most twice positively and exact once negatively, and each clause contains at most three literals and some clauses must contain exactly two literals. (We hope that this technique will serve as a general tool to help us proving the NP-hardness for some more tricky sequence problems involving only one sequence - much harder than with at least two input sequences, which we apply successfully at the end of the paper on some extra variations of the LLDS problem.) We then show that when each letter appears in S at most 3 times, then the problem admits a factor 1.5-O(1/n) approximation. Finally, we consider the weighted version, where the weight of a block x_i^{d_i} (d_i ≥ 2) could be any positive function which might not grow with d_i. We give a non-trivial O(n²) time dynamic programming algorithm for this version, i.e., computing an LD-subsequence of S whose weight is maximized.

Cite as

Wenfeng Lai, Adiesha Liyanage, Binhai Zhu, and Peng Zou. Beyond the Longest Letter-Duplicated Subsequence Problem. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 7:1-7:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{lai_et_al:LIPIcs.CPM.2022.7,
  author =	{Lai, Wenfeng and Liyanage, Adiesha and Zhu, Binhai and Zou, Peng},
  title =	{{Beyond the Longest Letter-Duplicated Subsequence Problem}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{7:1--7:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.7},
  URN =		{urn:nbn:de:0030-drops-161348},
  doi =		{10.4230/LIPIcs.CPM.2022.7},
  annote =	{Keywords: Segmental duplications, Tandem duplications, Longest common subsequence, NP-completeness, Dynamic programming}
}
Document
Genomic Problems Involving Copy Number Profiles: Complexity and Algorithms

Authors: Manuel Lafond, Binhai Zhu, and Peng Zou

Published in: LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)


Abstract
Recently, due to the genomic sequence analysis in several types of cancer, genomic data based on copy number profiles (CNP for short) are getting more and more popular. A CNP is a vector where each component is a non-negative integer representing the number of copies of a specific segment of interest. The motivation is that in the late stage of certain types of cancer, the genomes are progressing rapidly by segmental duplications and deletions, and hence obtaining the exact sequences becomes difficult. Instead, the number of copies of important segments can be predicted from expression analysis and carries important biological information. Therefore, significant research has recently been devoted to the analysis of genomic data represented as CNP’s. In this paper, we present two streams of results. The first is the negative results on two open problems regarding the computational complexity of the Minimum Copy Number Generation (MCNG) problem posed by Qingge et al. in 2018. The Minimum Copy Number Generation (MCNG) is defined as follows: given a string S in which each character represents a gene or segment, and a CNP C, compute a string T from S, with the minimum number of segmental duplications and deletions, such that cnp(T)=C. It was shown by Qingge et al. that the problem is NP-hard if the duplications are tandem and they left the open question of whether the problem remains NP-hard if arbitrary duplications and/or deletions are used. We answer this question affirmatively in this paper; in fact, we prove that it is NP-hard to even obtain a constant factor approximation. This is achieved through a general-purpose lemma on set-cover reductions that require an exact cover in one direction, but not the other, which might be of independent interest. We also prove that the corresponding parameterized version is W[1]-hard, answering another open question by Qingge et al. The other result is positive and is based on a new (and more general) problem regarding CNP’s. The Copy Number Profile Conforming (CNPC) problem is formally defined as follows: given two CNP’s C₁ and C₂, compute two strings S₁ and S₂ with cnp(S₁)=C₁ and cnp(S₂)=C₂ such that the distance between S₁ and S₂, d(S₁,S₂), is minimized. Here, d(S₁,S₂) is a very general term, which means it could be any genome rearrangement distance (like reversal, transposition, and tandem duplication, etc). We make the first step by showing that if d(S₁,S₂) is measured by the breakpoint distance then the problem is polynomially solvable. We expect that this will trigger some related research along the line in the near future.

Cite as

Manuel Lafond, Binhai Zhu, and Peng Zou. Genomic Problems Involving Copy Number Profiles: Complexity and Algorithms. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 22:1-22:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{lafond_et_al:LIPIcs.CPM.2020.22,
  author =	{Lafond, Manuel and Zhu, Binhai and Zou, Peng},
  title =	{{Genomic Problems Involving Copy Number Profiles: Complexity and Algorithms}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{22:1--22:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{G{\o}rtz, Inge Li and Weimann, Oren},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.22},
  URN =		{urn:nbn:de:0030-drops-121471},
  doi =		{10.4230/LIPIcs.CPM.2020.22},
  annote =	{Keywords: Computational genomics, cancer genomics, copy number profiles, NP-hardness, approximation algorithms, FPT algorithms}
}
Document
The Tandem Duplication Distance Is NP-Hard

Authors: Manuel Lafond, Binhai Zhu, and Peng Zou

Published in: LIPIcs, Volume 154, 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)


Abstract
In computational biology, tandem duplication is an important biological phenomenon which can occur either at the genome or at the DNA level. A tandem duplication takes a copy of a genome segment and inserts it right after the segment - this can be represented as the string operation AXB ⇒ AXXB. Tandem exon duplications have been found in many species such as human, fly or worm, and have been largely studied in computational biology. The Tandem Duplication (TD) distance problem we investigate in this paper is defined as follows: given two strings S and T over the same alphabet, compute the smallest sequence of tandem duplications required to convert S to T. The natural question of whether the TD distance can be computed in polynomial time was posed in 2004 by Leupold et al. and had remained open, despite the fact that tandem duplications have received much attention ever since. In this paper, we prove that this problem is NP-hard, settling the 16-year old open problem. We further show that this hardness holds even if all characters of S are distinct. This is known as the exemplar TD distance, which is of special relevance in bioinformatics. One of the tools we develop for the reduction is a new problem called the Cost-Effective Subgraph, for which we obtain W[1]-hardness results that might be of independent interest. We finally show that computing the exemplar TD distance between S and T is fixed-parameter tractable. Our results open the door to many other questions, and we conclude with several open problems.

Cite as

Manuel Lafond, Binhai Zhu, and Peng Zou. The Tandem Duplication Distance Is NP-Hard. In 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 154, pp. 15:1-15:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{lafond_et_al:LIPIcs.STACS.2020.15,
  author =	{Lafond, Manuel and Zhu, Binhai and Zou, Peng},
  title =	{{The Tandem Duplication Distance Is NP-Hard}},
  booktitle =	{37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)},
  pages =	{15:1--15:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-140-5},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{154},
  editor =	{Paul, Christophe and Bl\"{a}ser, Markus},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2020.15},
  URN =		{urn:nbn:de:0030-drops-118769},
  doi =		{10.4230/LIPIcs.STACS.2020.15},
  annote =	{Keywords: Tandem duplication, Text processing, Formal languages, Computational genomics, FPT algorithms}
}
Document
A 2-Approximation Algorithm for the Complementary Maximal Strip Recovery Problem

Authors: Haitao Jiang, Jiong Guo, Daming Zhu, and Binhai Zhu

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)


Abstract
The Maximal Strip Recovery problem (MSR) and its complementary (CMSR) are well-studied NP-hard problems in computational genomics. The input of these dual problems are two signed permutations. The goal is to delete some gene markers from both permutations, such that, in the remaining permutations, each gene marker has at least one common neighbor. Equivalently, the resulting permutations could be partitioned into common strips of length at least two. Then MSR is to maximize the number of remaining genes, while the objective of CMSR is to delete the minimum number of gene markers. In this paper, we present a new approximation algorithm for the Complementary Maximal Strip Recovery (CMSR) problem. Our approximation factor is 2, improving the currently best 7/3-approximation algorithm. Although the improvement on the factor is not huge, the analysis is greatly simplified by a compensating method, commonly referred to as the non-oblivious local search technique. In such a method a substitution may not always increase the value of the current solution (it sometimes may even decrease the solution value), though it always improves the value of another function seemingly unrelated to the objective function.

Cite as

Haitao Jiang, Jiong Guo, Daming Zhu, and Binhai Zhu. A 2-Approximation Algorithm for the Complementary Maximal Strip Recovery Problem. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 5:1-5:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{jiang_et_al:LIPIcs.CPM.2019.5,
  author =	{Jiang, Haitao and Guo, Jiong and Zhu, Daming and Zhu, Binhai},
  title =	{{A 2-Approximation Algorithm for the Complementary Maximal Strip Recovery Problem}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{5:1--5:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.5},
  URN =		{urn:nbn:de:0030-drops-104769},
  doi =		{10.4230/LIPIcs.CPM.2019.5},
  annote =	{Keywords: Maximal strip recovery, complementary maximal strip recovery, computational genomics, approximation algorithm, local search}
}
  • Refine by Type
  • 42 Document/PDF
  • 9 Document/HTML
  • 1 Volume

  • Refine by Publication Year
  • 10 2025
  • 1 2022
  • 2 2020
  • 1 2019
  • 27 2018
  • Show More...

  • Refine by Author
  • 8 Zhu, Binhai
  • 7 Bannai, Hideo
  • 5 Inenaga, Shunsuke
  • 5 Takeda, Masayuki
  • 4 Nakashima, Yuto
  • Show More...

  • Refine by Series/Journal
  • 41 LIPIcs
  • 1 OASIcs

  • Refine by Classification
  • 12 Theory of computation → Pattern matching
  • 4 Mathematics of computing → Combinatorial algorithms
  • 3 Theory of computation
  • 3 Theory of computation → Computational geometry
  • 3 Theory of computation → Data compression
  • Show More...

  • Refine by Keyword
  • 3 FPT algorithms
  • 3 Hamming distance
  • 2 Burrows-Wheeler Transform
  • 2 Computational genomics
  • 2 Data Structure
  • Show More...

Any Issues?
X

Feedback on the Current Page

CAPTCHA

Thanks for your feedback!

Feedback submitted to Dagstuhl Publishing

Could not send message

Please try again later or send an E-mail