DROPS

Volume

LIPIcs, Volume 259

34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

CPM 2023, June 26-28, 2023, Marne-la-Vallée, France

Editors: Laurent Bulteau and Zsuzsanna Lipták

Document

DOI: 10.4230/LIPIcs.WABI.2024.2

On the Complexity of the Median and Closest Permutation Problems

Authors: Luís Cunha, Ignasi Sau, and Uéverton Souza

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

Genome rearrangements are events where large blocks of DNA exchange places during evolution. The analysis of these events is a promising tool for understanding evolutionary genomics, providing data for phylogenetic reconstruction based on genome rearrangement measures. Many pairwise rearrangement distances have been proposed, based on finding the minimum number of rearrangement events to transform one genome into the other, using some predefined operation. When more than two genomes are considered, we have the more challenging problem of rearrangement-based phylogeny reconstruction. Given a set of genomes and a distance notion, there are at least two natural ways to define the "target" genome. On the one hand, finding a genome that minimizes the sum of the distances from this to any other, called the median genome. On the other hand, finding a genome that minimizes the maximum distance to any other, called the closest genome. Considering genomes as permutations of distinct integers, some distance metrics have been extensively studied. We investigate the median and closest problems on permutations over the following metrics: breakpoint distance, swap distance, block-interchange distance, short-block-move distance, and transposition distance. In biological applications some values are usually very small, such as the solution value d or the number k of input permutations. For each of these metrics and parameters d or k, we analyze the closest and the median problems from the viewpoint of parameterized complexity. We obtain the following results: NP-hardness for finding the median/closest permutation regarding some metrics of distance, even for only k = 3 permutations; Polynomial kernels for the problems of finding the median permutation of all studied metrics, considering the target distance d as parameter; NP-hardness result for finding the closest permutation by short-block-moves; FPT algorithms and infeasibility of polynomial kernels for finding the closest permutation for some metrics when parameterized by the target distance d.

Cite as

Luís Cunha, Ignasi Sau, and Uéverton Souza. On the Complexity of the Median and Closest Permutation Problems. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 2:1-2:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{cunha_et_al:LIPIcs.WABI.2024.2,
  author =	{Cunha, Lu{\'\i}s and Sau, Ignasi and Souza, U\'{e}verton},
  title =	{{On the Complexity of the Median and Closest Permutation Problems}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{2:1--2:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.2},
  URN =		{urn:nbn:de:0030-drops-206468},
  doi =		{10.4230/LIPIcs.WABI.2024.2},
  annote =	{Keywords: Median problem, Closest problem, Genome rearrangements, Parameterized complexity}
}

Document

DOI: 10.4230/LIPIcs.WABI.2024.9

Orientability of Undirected Phylogenetic Networks to a Desired Class: Practical Algorithms and Application to Tree-Child Orientation

Authors: Tsuyoshi Urata, Manato Yokoyama, and Momoko Hayamizu

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

The 𝒞-Orientation problem asks whether it is possible to orient an undirected graph to a directed phylogenetic network of a desired class 𝒞, and to find such an orientation if one exists. The problem can arise when visualising evolutionary data, for example, because popular phylogenetic network reconstruction methods such as Neighbor-Net are distance-based and thus inevitably produce undirected graphs. The complexity of 𝒞-Orientation remains open for many classes 𝒞, including binary tree-child networks, and practical methods are still lacking. In this paper, we propose an exponential but practically efficient FPT algorithm for 𝒞-Orientation, which is parameterised by the reticulation number and the maximum size of minimal basic cycles used in the computation. We also present a very fast heuristic for Tree-Child Orientation. To evaluate the empirical performance of the proposed methods, we compared their accuracy and execution time for Tree-Child Orientation with those of an exponential time 𝒞-orientation algorithm from the literature. Our experiments show that the proposed exact algorithm is significantly faster than the state-of-the-art exponential time algorithm. The proposed heuristic runs even faster but the accuracy decreases as the reticulation number increases.

Cite as

Tsuyoshi Urata, Manato Yokoyama, and Momoko Hayamizu. Orientability of Undirected Phylogenetic Networks to a Desired Class: Practical Algorithms and Application to Tree-Child Orientation. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 9:1-9:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{urata_et_al:LIPIcs.WABI.2024.9,
  author =	{Urata, Tsuyoshi and Yokoyama, Manato and Hayamizu, Momoko},
  title =	{{Orientability of Undirected Phylogenetic Networks to a Desired Class: Practical Algorithms and Application to Tree-Child Orientation}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{9:1--9:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.9},
  URN =		{urn:nbn:de:0030-drops-206531},
  doi =		{10.4230/LIPIcs.WABI.2024.9},
  annote =	{Keywords: Phylogenetic Networks, Tree-Child Networks, Graph Orientation Algorithms}
}

Document

DOI: 10.4230/LIPIcs.WABI.2024.16

Finding Maximum Common Contractions Between Phylogenetic Networks

Authors: Bertrand Marchand, Nadia Tahiri, Olivier Tremblay-Savard, and Manuel Lafond

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

In this paper, we lay the groundwork on the comparison of phylogenetic networks based on edge contractions and expansions as edit operations, as originally proposed by Robinson and Foulds to compare trees. We prove that these operations connect the space of all phylogenetic networks on the same set of leaves, even if we forbid contractions that create cycles. This allows to define an operational distance on this space, as the minimum number of contractions and expansions required to transform one network into another. We highlight the difference between this distance and the computation of the maximum common contraction between two networks. Given its ability to outline a common structure between them, which can provide valuable biological insights, we study the algorithmic aspects of the latter. We first prove that computing a maximum common contraction between two networks is NP-hard, even when the maximum degree, the size of the common contraction, or the number of leaves is bounded. We also provide lower bounds to the problem based on the Exponential-Time Hypothesis. Nonetheless, we do provide a polynomial-time algorithm for weakly galled trees, a generalization of galled trees.

Cite as

Bertrand Marchand, Nadia Tahiri, Olivier Tremblay-Savard, and Manuel Lafond. Finding Maximum Common Contractions Between Phylogenetic Networks. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 16:1-16:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{marchand_et_al:LIPIcs.WABI.2024.16,
  author =	{Marchand, Bertrand and Tahiri, Nadia and Tremblay-Savard, Olivier and Lafond, Manuel},
  title =	{{Finding Maximum Common Contractions Between Phylogenetic Networks}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{16:1--16:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.16},
  URN =		{urn:nbn:de:0030-drops-206606},
  doi =		{10.4230/LIPIcs.WABI.2024.16},
  annote =	{Keywords: Phylogenetic networks, contractions, algorithms, weakly galled trees}
}

Document

DOI: 10.4230/LIPIcs.WABI.2024.19

RNA Inverse Folding Can Be Solved in Linear Time for Structures Without Isolated Stacks or Base Pairs

Authors: Théo Boury, Laurent Bulteau, and Yann Ponty

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

Inverse folding is a classic instance of negative RNA design which consists in finding a sequence that uniquely folds into a target secondary structure with respect to energy minimization. A breakthrough result of Bonnet et al. shows that, even in simple base pairs-based (BP) models, the decision version of a mildly constrained version of inverse folding is NP-hard. In this work, we show that inverse folding can be solved in linear time for a large collection of targets, including every structure that contains no isolated BP and no isolated stack (or, equivalently, when all helices consist of 3^{+} base pairs). For structures featuring shorter helices, our linear algorithm is no longer guaranteed to produce a solution, but still does so for a large proportion of instances. Our approach introduces a notion of modulo m-separability, generalizing a property pioneered by Hales et al. Separability is a sufficient condition for the existence of a solution to the inverse folding problem. We show that, for any input secondary structure of length n, a modulo m-separated sequence can be produced in time 𝒪(n 2^m) anytime such a sequence exists. Meanwhile, we show that any structure consisting of 3^{+} base pairs is either trivially non-designable, or always admits a modulo-2 separated solution (m = 2). Solution sequences can thus be produced in linear time, and even be uniformly generated within the set of modulo-2 separable sequences.

Cite as

Théo Boury, Laurent Bulteau, and Yann Ponty. RNA Inverse Folding Can Be Solved in Linear Time for Structures Without Isolated Stacks or Base Pairs. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 19:1-19:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{boury_et_al:LIPIcs.WABI.2024.19,
  author =	{Boury, Th\'{e}o and Bulteau, Laurent and Ponty, Yann},
  title =	{{RNA Inverse Folding Can Be Solved in Linear Time for Structures Without Isolated Stacks or Base Pairs}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{19:1--19:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.19},
  URN =		{urn:nbn:de:0030-drops-206632},
  doi =		{10.4230/LIPIcs.WABI.2024.19},
  annote =	{Keywords: RNA structure, String Design, Parameterized Complexity, Uniform Sampling}
}

Document

DOI: 10.4230/LIPIcs.WABI.2024.21

McDag: Indexing Maximal Common Subsequences in Practice

Authors: Giovanni Buzzega, Alessio Conte, Roberto Grossi, and Giulia Punzi

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

Analyzing and comparing sequences of symbols is among the most fundamental problems in computer science, possibly even more so in bioinformatics. Maximal Common Subsequences (MCSs), i.e., inclusion-maximal sequences of non-contiguous symbols common to two or more strings, have only recently received attention in this area, despite being a basic notion and a natural generalization of more common tools like Longest Common Substrings/Subsequences. In this paper we simplify and engineer recent advancements on MCSs into a practical tool called McDag, the first publicly available tool that can index MCSs of real genomic data. We demonstrate that our tool can index sequences exceeding 10,000 base pairs within minutes, utilizing only 4-7% more than the minimum required nodes, while also extracting relevant insights.

Cite as

Giovanni Buzzega, Alessio Conte, Roberto Grossi, and Giulia Punzi. McDag: Indexing Maximal Common Subsequences in Practice. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 21:1-21:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{buzzega_et_al:LIPIcs.WABI.2024.21,
  author =	{Buzzega, Giovanni and Conte, Alessio and Grossi, Roberto and Punzi, Giulia},
  title =	{{McDag: Indexing Maximal Common Subsequences in Practice}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{21:1--21:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.21},
  URN =		{urn:nbn:de:0030-drops-206650},
  doi =		{10.4230/LIPIcs.WABI.2024.21},
  annote =	{Keywords: Index data structure, DAG, Common subsequence, Inclusion-wise maximality, LCS}
}

Document

DOI: 10.4230/LIPIcs.MFCS.2024.85

Approximate Suffix-Prefix Dictionary Queries

Authors: Wiktor Zuba, Grigorios Loukides, Solon P. Pissis, and Sharma V. Thankachan

Published in: LIPIcs, Volume 306, 49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024)

Abstract

In the all-pairs suffix-prefix (APSP) problem [Gusfield et al., Inf. Process. Lett. 1992], we are given a dictionary R of r strings, S₁,…,S_r, of total length n, and we are asked to find the length SPL_{i,j} of the longest string that is both a suffix of S_i and a prefix of S_j, for all i,j ∈ [1..r]. APSP is a classic problem in string algorithms with applications in bioinformatics, especially in sequence assembly. Since r = |R| is typically very large in real-world applications, considering all r² pairs of strings explicitly is prohibitive. This is when the data structure variant of APSP makes sense; in the same spirit as distance oracles computing shortest paths between any two vertices given online. We show how to quickly locate k-approximate matches (under the Hamming or the edit distance) in R using a version of the k-errata tree [Cole et al., STOC 2004] that we introduce. Let SPL^k_{i,j} be the length of the longest suffix of S_i that is at distance at most k from a prefix of S_j. In particular, for any k = 𝒪(1), we show an 𝒪(nlog^k n)-sized data structure to support the following queries: - One-to-One^k(i,j): output SPL^k_{i,j} in 𝒪(log^k nlog log n) time. - Report^k(i,d): output all j ∈ [1..r], such that SPL^k_{i,j} ≥ d, in 𝒪(log^{k}n(log n/log log n+output)) time, where output denotes the size of the output. In fact, our algorithms work for any value of k not just for k = 𝒪(1), but the formulas bounding the complexities get much more complicated for larger values of k.

Cite as

Wiktor Zuba, Grigorios Loukides, Solon P. Pissis, and Sharma V. Thankachan. Approximate Suffix-Prefix Dictionary Queries. In 49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 306, pp. 85:1-85:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{zuba_et_al:LIPIcs.MFCS.2024.85,
  author =	{Zuba, Wiktor and Loukides, Grigorios and Pissis, Solon P. and Thankachan, Sharma V.},
  title =	{{Approximate Suffix-Prefix Dictionary Queries}},
  booktitle =	{49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024)},
  pages =	{85:1--85:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-335-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{306},
  editor =	{Kr\'{a}lovi\v{c}, Rastislav and Ku\v{c}era, Anton{\'\i}n},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2024.85},
  URN =		{urn:nbn:de:0030-drops-206416},
  doi =		{10.4230/LIPIcs.MFCS.2024.85},
  annote =	{Keywords: all-pairs suffix-prefix, suffix-prefix queries, suffix tree, k-errata tree}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.89

Breaking a Barrier in Constructing Compact Indexes for Parameterized Pattern Matching

Authors: Kento Iseri, Tomohiro I, Diptarama Hendrian, Dominik Köppl, Ryo Yoshinaka, and Ayumi Shinohara

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

A parameterized string (p-string) is a string over an alphabet (Σ_s ∪ Σ_p), where Σ_s and Σ_p are disjoint alphabets for static symbols (s-symbols) and for parameter symbols (p-symbols), respectively. Two p-strings x and y are said to parameterized match (p-match) if and only if x can be transformed into y by applying a bijection on Σ_p to every occurrence of p-symbols in x. The indexing problem for p-matching is to preprocess a p-string T of length n so that we can efficiently find the occurrences of substrings of T that p-match with a given pattern. Let σ_s and respectively σ_p be the numbers of distinct s-symbols and p-symbols that appear in T and σ = σ_s + σ_p. Extending the Burrows-Wheeler Transform (BWT) based index for exact string pattern matching, Ganguly et al. [SODA 2017] proposed parameterized BWTs (pBWTs) to design the first compact index for p-matching, and posed an open problem on how to construct the pBWT-based index in compact space, i.e., in O(n lg |Σ_s ∪ Σ_p|) bits of space. Hashimoto et al. [SPIRE 2022] showed how to construct the pBWT for T, under the assumption that Σ_s ∪ Σ_p = [0..O(σ)], in O(n lg σ) bits of space and O(n (σ_p lg n)/(lg lg n)) time in an online manner while reading the symbols of T from right to left. In this paper, we refine Hashimoto et al.’s algorithm to work in O(n lg σ) bits of space and O(n (lg σ_p lg n)/(lg lg n)) time in a more general assumption that Σ_s ∪ Σ_p = [0..n^{O(1)}]. Our result has an immediate application to constructing parameterized suffix arrays in O(n (lg σ_p lg n)/(lg lg n)) time and O(n lg σ) bits of working space. We also show that our data structure can support backward search, a core procedure of BWT-based indexes, at any stage of the online construction, making it the first compact index for p-matching that can be constructed in compact space and even in an online manner.

Cite as

Kento Iseri, Tomohiro I, Diptarama Hendrian, Dominik Köppl, Ryo Yoshinaka, and Ayumi Shinohara. Breaking a Barrier in Constructing Compact Indexes for Parameterized Pattern Matching. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 89:1-89:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{iseri_et_al:LIPIcs.ICALP.2024.89,
  author =	{Iseri, Kento and I, Tomohiro and Hendrian, Diptarama and K\"{o}ppl, Dominik and Yoshinaka, Ryo and Shinohara, Ayumi},
  title =	{{Breaking a Barrier in Constructing Compact Indexes for Parameterized Pattern Matching}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{89:1--89:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.89},
  URN =		{urn:nbn:de:0030-drops-202324},
  doi =		{10.4230/LIPIcs.ICALP.2024.89},
  annote =	{Keywords: Index for parameterized pattern matching, Parameterized Burrows-Wheeler Transform, Online construction}
}

@InProceedings{iseri_et_al:LIPIcs.ICALP.2024.89,
  author =	{Iseri, Kento and I, Tomohiro and Hendrian, Diptarama and K\"{o}ppl, Dominik and Yoshinaka, Ryo and Shinohara, Ayumi},
  title =	{{Breaking a Barrier in Constructing Compact Indexes for Parameterized Pattern Matching}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{89:1--89:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.89},
  URN =		{urn:nbn:de:0030-drops-202324},
  doi =		{10.4230/LIPIcs.ICALP.2024.89},
  annote =	{Keywords: Index for parameterized pattern matching, Parameterized Burrows-Wheeler Transform, Online construction}
}

Document

Complete Volume

DOI: 10.4230/LIPIcs.CPM.2023

LIPIcs, Volume 259, CPM 2023, Complete Volume

Authors: Laurent Bulteau and Zsuzsanna Lipták

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

LIPIcs, Volume 259, CPM 2023, Complete Volume

Cite as

34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 1-472, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@Proceedings{bulteau_et_al:LIPIcs.CPM.2023,
  title =	{{LIPIcs, Volume 259, CPM 2023, Complete Volume}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{1--472},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023},
  URN =		{urn:nbn:de:0030-drops-179536},
  doi =		{10.4230/LIPIcs.CPM.2023},
  annote =	{Keywords: LIPIcs, Volume 259, CPM 2023, Complete Volume}
}

Document

Front Matter

DOI: 10.4230/LIPIcs.CPM.2023.0

Front Matter, Table of Contents, Preface, Conference Organization

Authors: Laurent Bulteau and Zsuzsanna Lipták

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

Front Matter, Table of Contents, Preface, Conference Organization

Cite as

34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{bulteau_et_al:LIPIcs.CPM.2023.0,
  author =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.0},
  URN =		{urn:nbn:de:0030-drops-179542},
  doi =		{10.4230/LIPIcs.CPM.2023.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.1

Trie-Compressed Adaptive Set Intersection

Authors: Diego Arroyuelo and Juan Pablo Castillo

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

We introduce space- and time-efficient algorithms and data structures for the offline set intersection problem. We show that a sorted integer set S ⊆ [0..u) of n elements can be represented using compressed space while supporting k-way intersections in adaptive O(kδlg(u/δ)) time, δ being the alternation measure introduced by Barbay and Kenyon. Our experimental results suggest that our approaches are competitive in practice, outperforming the most efficient alternatives (Partitioned Elias-Fano indexes, Roaring Bitmaps, and Recursive Universe Partitioning (RUP)) in several scenarios, offering in general relevant space-time trade-offs.

Cite as

Diego Arroyuelo and Juan Pablo Castillo. Trie-Compressed Adaptive Set Intersection. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 1:1-1:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{arroyuelo_et_al:LIPIcs.CPM.2023.1,
  author =	{Arroyuelo, Diego and Castillo, Juan Pablo},
  title =	{{Trie-Compressed Adaptive Set Intersection}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{1:1--1:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.1},
  URN =		{urn:nbn:de:0030-drops-179552},
  doi =		{10.4230/LIPIcs.CPM.2023.1},
  annote =	{Keywords: Set intersection problem, Adaptive Algorithms, Compressed and compact data structures}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.2

Approximation Algorithms for the Longest Run Subsequence Problem

Authors: Yuichi Asahiro, Hiroshi Eto, Mingyang Gong, Jesper Jansson, Guohui Lin, Eiji Miyano, Hirotaka Ono, and Shunichi Tanaka

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

We study the approximability of the Longest Run Subsequence problem (LRS for short). For a string S = s_1 ⋯ s_n over an alphabet Σ, a run of a symbol σ ∈ Σ in S is a maximal substring of consecutive occurrences of σ. A run subsequence S' of S is a sequence in which every symbol σ ∈ Σ occurs in at most one run. Given a string S, the goal of LRS is to find a longest run subsequence S^* of S such that the length |S^*| is maximized over all the run subsequences of S. It is known that LRS is APX-hard even if each symbol has at most two occurrences in the input string, and that LRS admits a polynomial-time k-approximation algorithm if the number of occurrences of every symbol in the input string is bounded by k. In this paper, we design a polynomial-time (k+1)/2-approximation algorithm for LRS under the k-occurrence constraint on input strings. For the case k = 2, we further improve the approximation ratio from 3/2 to 4/3.

Cite as

Yuichi Asahiro, Hiroshi Eto, Mingyang Gong, Jesper Jansson, Guohui Lin, Eiji Miyano, Hirotaka Ono, and Shunichi Tanaka. Approximation Algorithms for the Longest Run Subsequence Problem. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 2:1-2:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{asahiro_et_al:LIPIcs.CPM.2023.2,
  author =	{Asahiro, Yuichi and Eto, Hiroshi and Gong, Mingyang and Jansson, Jesper and Lin, Guohui and Miyano, Eiji and Ono, Hirotaka and Tanaka, Shunichi},
  title =	{{Approximation Algorithms for the Longest Run Subsequence Problem}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{2:1--2:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.2},
  URN =		{urn:nbn:de:0030-drops-179560},
  doi =		{10.4230/LIPIcs.CPM.2023.2},
  annote =	{Keywords: Longest run subsequence problem, bounded occurrence, approximation algorithm}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.3

Optimal LZ-End Parsing Is Hard

Authors: Hideo Bannai, Mitsuru Funakoshi, Kazuhiro Kurita, Yuto Nakashima, Kazuhisa Seto, and Takeaki Uno

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

LZ-End is a variant of the well-known Lempel-Ziv parsing family such that each phrase of the parsing has a previous occurrence, with the additional constraint that the previous occurrence must end at the end of a previous phrase. LZ-End was initially proposed as a greedy parsing, where each phrase is determined greedily from left to right, as the longest factor that satisfies the above constraint [Kreft & Navarro, 2010]. In this work, we consider an optimal LZ-End parsing that has the minimum number of phrases in such parsings. We show that a decision version of computing the optimal LZ-End parsing is NP-complete by showing a reduction from the vertex cover problem. Moreover, we give a MAX-SAT formulation for the optimal LZ-End parsing adapting an approach for computing various NP-hard repetitiveness measures recently presented by [Bannai et al., 2022]. We also consider the approximation ratio of the size of greedy LZ-End parsing to the size of the optimal LZ-End parsing, and give a lower bound of the ratio which asymptotically approaches 2.

Cite as

Hideo Bannai, Mitsuru Funakoshi, Kazuhiro Kurita, Yuto Nakashima, Kazuhisa Seto, and Takeaki Uno. Optimal LZ-End Parsing Is Hard. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 3:1-3:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2023.3,
  author =	{Bannai, Hideo and Funakoshi, Mitsuru and Kurita, Kazuhiro and Nakashima, Yuto and Seto, Kazuhisa and Uno, Takeaki},
  title =	{{Optimal LZ-End Parsing Is Hard}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{3:1--3:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.3},
  URN =		{urn:nbn:de:0030-drops-179571},
  doi =		{10.4230/LIPIcs.CPM.2023.3},
  annote =	{Keywords: Data Compression, LZ-End, Repetitiveness measures}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.4

Sliding Window String Indexing in Streams

Authors: Philip Bille, Johannes Fischer, Inge Li Gørtz, Max Rishøj Pedersen, and Tord Joakim Stordalen

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

Given a string S over an alphabet Σ, the string indexing problem is to preprocess S to subsequently support efficient pattern matching queries, that is, given a pattern string P report all the occurrences of P in S. In this paper we study the streaming sliding window string indexing problem. Here the string S arrives as a stream, one character at a time, and the goal is to maintain an index of the last w characters, called the window, for a specified parameter w. At any point in time a pattern matching query for a pattern P may arrive, also streamed one character at a time, and all occurrences of P within the current window must be returned. The streaming sliding window string indexing problem naturally captures scenarios where we want to index the most recent data (i.e. the window) of a stream while supporting efficient pattern matching. Our main result is a simple O(w) space data structure that uses O(log w) time with high probability to process each character from both the input string S and any pattern string P. Reporting each occurrence of P uses additional constant time per reported occurrence. Compared to previous work in similar scenarios this result is the first to achieve an efficient worst-case time per character from the input stream with high probability. We also consider a delayed variant of the problem, where a query may be answered at any point within the next δ characters that arrive from either stream. We present an O(w + δ) space data structure for this problem that improves the above time bounds to O(log (w/δ)). In particular, for a delay of δ = ε w we obtain an O(w) space data structure with constant time processing per character. The key idea to achieve our result is a novel and simple hierarchical structure of suffix trees of independent interest, inspired by the classic log-structured merge trees.

Cite as

Philip Bille, Johannes Fischer, Inge Li Gørtz, Max Rishøj Pedersen, and Tord Joakim Stordalen. Sliding Window String Indexing in Streams. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 4:1-4:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.CPM.2023.4,
  author =	{Bille, Philip and Fischer, Johannes and G{\o}rtz, Inge Li and Pedersen, Max Rish{\o}j and Stordalen, Tord Joakim},
  title =	{{Sliding Window String Indexing in Streams}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{4:1--4:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.4},
  URN =		{urn:nbn:de:0030-drops-179587},
  doi =		{10.4230/LIPIcs.CPM.2023.4},
  annote =	{Keywords: String indexing, pattern matching, sliding window, streaming}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.5

Faster Algorithms for Computing the Hairpin Completion Distance and Minimum Ancestor

Authors: Itai Boneh, Dvir Fried, Adrian Miclăuş, and Alexandru Popa

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

Hairpin completion is an operation on formal languages that has been inspired by hairpin formation in DNA biochemistry and has many applications especially in DNA computing. Consider s to be a string over the alphabet {A, C, G, T} such that a prefix/suffix of it matches the reversed complement of a substring of s. Then, in a hairpin completion operation the reversed complement of this prefix/suffix is added to the start/end of s forming a new string. In this paper we study two problems related to the hairpin completion. The first problem asks the minimum number of hairpin operations necessary to transform one string into another, number that is called the hairpin completion distance. For this problem we show an algorithm of running time O(n²), where n is the maximum length of the two strings. Our algorithm improves on the algorithm of Manea (TCS 2010), that has running time O(n² log n). In the minimum distance common hairpin completion ancestor problem we want to find, for two input strings x and y, a string w that minimizes the sum of the hairpin completion distances to x and y. Similarly, we present an algorithm with running time O(n²) that improves by a O(log n) factor the algorithm of Manea (TCS 2010).

Cite as

Itai Boneh, Dvir Fried, Adrian Miclăuş, and Alexandru Popa. Faster Algorithms for Computing the Hairpin Completion Distance and Minimum Ancestor. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 5:1-5:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{boneh_et_al:LIPIcs.CPM.2023.5,
  author =	{Boneh, Itai and Fried, Dvir and Micl\u{a}u\c{s}, Adrian and Popa, Alexandru},
  title =	{{Faster Algorithms for Computing the Hairpin Completion Distance and Minimum Ancestor}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{5:1--5:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.5},
  URN =		{urn:nbn:de:0030-drops-179592},
  doi =		{10.4230/LIPIcs.CPM.2023.5},
  annote =	{Keywords: dynamic programming, incremental trees, exact algorithm}
}

46 Search Results for "Bulteau, Laurent"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Thanks for your feedback!

Could not send message