DROPS

Document

DOI: 10.4230/LIPIcs.STACS.2024.21

Local Certification of Local Properties: Tight Bounds, Trade-Offs and New Parameters

Authors: Nicolas Bousquet, Laurent Feuilloley, and Sébastien Zeitoun

Published in: LIPIcs, Volume 289, 41st International Symposium on Theoretical Aspects of Computer Science (STACS 2024)

Abstract

Local certification is a distributed mechanism enabling the nodes of a network to check the correctness of the current configuration, thanks to small pieces of information called certificates. For many classic global properties, like checking the acyclicity of the network, the optimal size of the certificates depends on the size of the network, n. In this paper, we focus on properties for which the size of the certificates does not depend on n but on other parameters. We focus on three such important properties and prove tight bounds for all of them. Namely, we prove that the optimal certification size is: Θ(log k) for k-colorability (and even exactly ⌈ log k ⌉ bits in the anonymous model while previous works had only proved a 2-bit lower bound); (1/2)log t+o(log t) for dominating sets at distance t (an unexpected and tighter-than-usual bound) ; and Θ(log Δ) for perfect matching in graphs of maximum degree Δ (the first non-trivial bound parameterized by Δ). We also prove some surprising upper bounds, for example, certifying the existence of a perfect matching in a planar graph can be done with only two bits. In addition, we explore various specific cases for these properties, in particular improving our understanding of the trade-off between locality of the verification and certificate size.

Cite as

Nicolas Bousquet, Laurent Feuilloley, and Sébastien Zeitoun. Local Certification of Local Properties: Tight Bounds, Trade-Offs and New Parameters. In 41st International Symposium on Theoretical Aspects of Computer Science (STACS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 289, pp. 21:1-21:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{bousquet_et_al:LIPIcs.STACS.2024.21,
  author =	{Bousquet, Nicolas and Feuilloley, Laurent and Zeitoun, S\'{e}bastien},
  title =	{{Local Certification of Local Properties: Tight Bounds, Trade-Offs and New Parameters}},
  booktitle =	{41st International Symposium on Theoretical Aspects of Computer Science (STACS 2024)},
  pages =	{21:1--21:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-311-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{289},
  editor =	{Beyersdorff, Olaf and Kant\'{e}, Mamadou Moustapha and Kupferman, Orna and Lokshtanov, Daniel},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2024.21},
  URN =		{urn:nbn:de:0030-drops-197317},
  doi =		{10.4230/LIPIcs.STACS.2024.21},
  annote =	{Keywords: Local certification, local properties, proof-labeling schemes, locally checkable proofs, optimal certification size, colorability, dominating set, perfect matching, fault-tolerance, graph structure}
}

@InProceedings{bousquet_et_al:LIPIcs.STACS.2024.21,
  author =	{Bousquet, Nicolas and Feuilloley, Laurent and Zeitoun, S\'{e}bastien},
  title =	{{Local Certification of Local Properties: Tight Bounds, Trade-Offs and New Parameters}},
  booktitle =	{41st International Symposium on Theoretical Aspects of Computer Science (STACS 2024)},
  pages =	{21:1--21:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-311-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{289},
  editor =	{Beyersdorff, Olaf and Kant\'{e}, Mamadou Moustapha and Kupferman, Orna and Lokshtanov, Daniel},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2024.21},
  URN =		{urn:nbn:de:0030-drops-197317},
  doi =		{10.4230/LIPIcs.STACS.2024.21},
  annote =	{Keywords: Local certification, local properties, proof-labeling schemes, locally checkable proofs, optimal certification size, colorability, dominating set, perfect matching, fault-tolerance, graph structure}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.1

Trie-Compressed Adaptive Set Intersection

Authors: Diego Arroyuelo and Juan Pablo Castillo

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

We introduce space- and time-efficient algorithms and data structures for the offline set intersection problem. We show that a sorted integer set S ⊆ [0..u) of n elements can be represented using compressed space while supporting k-way intersections in adaptive O(kδlg(u/δ)) time, δ being the alternation measure introduced by Barbay and Kenyon. Our experimental results suggest that our approaches are competitive in practice, outperforming the most efficient alternatives (Partitioned Elias-Fano indexes, Roaring Bitmaps, and Recursive Universe Partitioning (RUP)) in several scenarios, offering in general relevant space-time trade-offs.

Cite as

Diego Arroyuelo and Juan Pablo Castillo. Trie-Compressed Adaptive Set Intersection. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 1:1-1:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{arroyuelo_et_al:LIPIcs.CPM.2023.1,
  author =	{Arroyuelo, Diego and Castillo, Juan Pablo},
  title =	{{Trie-Compressed Adaptive Set Intersection}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{1:1--1:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.1},
  URN =		{urn:nbn:de:0030-drops-179552},
  doi =		{10.4230/LIPIcs.CPM.2023.1},
  annote =	{Keywords: Set intersection problem, Adaptive Algorithms, Compressed and compact data structures}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.2

Approximation Algorithms for the Longest Run Subsequence Problem

Authors: Yuichi Asahiro, Hiroshi Eto, Mingyang Gong, Jesper Jansson, Guohui Lin, Eiji Miyano, Hirotaka Ono, and Shunichi Tanaka

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

We study the approximability of the Longest Run Subsequence problem (LRS for short). For a string S = s_1 ⋯ s_n over an alphabet Σ, a run of a symbol σ ∈ Σ in S is a maximal substring of consecutive occurrences of σ. A run subsequence S' of S is a sequence in which every symbol σ ∈ Σ occurs in at most one run. Given a string S, the goal of LRS is to find a longest run subsequence S^* of S such that the length |S^*| is maximized over all the run subsequences of S. It is known that LRS is APX-hard even if each symbol has at most two occurrences in the input string, and that LRS admits a polynomial-time k-approximation algorithm if the number of occurrences of every symbol in the input string is bounded by k. In this paper, we design a polynomial-time (k+1)/2-approximation algorithm for LRS under the k-occurrence constraint on input strings. For the case k = 2, we further improve the approximation ratio from 3/2 to 4/3.

Cite as

Yuichi Asahiro, Hiroshi Eto, Mingyang Gong, Jesper Jansson, Guohui Lin, Eiji Miyano, Hirotaka Ono, and Shunichi Tanaka. Approximation Algorithms for the Longest Run Subsequence Problem. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 2:1-2:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{asahiro_et_al:LIPIcs.CPM.2023.2,
  author =	{Asahiro, Yuichi and Eto, Hiroshi and Gong, Mingyang and Jansson, Jesper and Lin, Guohui and Miyano, Eiji and Ono, Hirotaka and Tanaka, Shunichi},
  title =	{{Approximation Algorithms for the Longest Run Subsequence Problem}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{2:1--2:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.2},
  URN =		{urn:nbn:de:0030-drops-179560},
  doi =		{10.4230/LIPIcs.CPM.2023.2},
  annote =	{Keywords: Longest run subsequence problem, bounded occurrence, approximation algorithm}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.4

Sliding Window String Indexing in Streams

Authors: Philip Bille, Johannes Fischer, Inge Li Gørtz, Max Rishøj Pedersen, and Tord Joakim Stordalen

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

Given a string S over an alphabet Σ, the string indexing problem is to preprocess S to subsequently support efficient pattern matching queries, that is, given a pattern string P report all the occurrences of P in S. In this paper we study the streaming sliding window string indexing problem. Here the string S arrives as a stream, one character at a time, and the goal is to maintain an index of the last w characters, called the window, for a specified parameter w. At any point in time a pattern matching query for a pattern P may arrive, also streamed one character at a time, and all occurrences of P within the current window must be returned. The streaming sliding window string indexing problem naturally captures scenarios where we want to index the most recent data (i.e. the window) of a stream while supporting efficient pattern matching. Our main result is a simple O(w) space data structure that uses O(log w) time with high probability to process each character from both the input string S and any pattern string P. Reporting each occurrence of P uses additional constant time per reported occurrence. Compared to previous work in similar scenarios this result is the first to achieve an efficient worst-case time per character from the input stream with high probability. We also consider a delayed variant of the problem, where a query may be answered at any point within the next δ characters that arrive from either stream. We present an O(w + δ) space data structure for this problem that improves the above time bounds to O(log (w/δ)). In particular, for a delay of δ = ε w we obtain an O(w) space data structure with constant time processing per character. The key idea to achieve our result is a novel and simple hierarchical structure of suffix trees of independent interest, inspired by the classic log-structured merge trees.

Cite as

Philip Bille, Johannes Fischer, Inge Li Gørtz, Max Rishøj Pedersen, and Tord Joakim Stordalen. Sliding Window String Indexing in Streams. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 4:1-4:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.CPM.2023.4,
  author =	{Bille, Philip and Fischer, Johannes and G{\o}rtz, Inge Li and Pedersen, Max Rish{\o}j and Stordalen, Tord Joakim},
  title =	{{Sliding Window String Indexing in Streams}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{4:1--4:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.4},
  URN =		{urn:nbn:de:0030-drops-179587},
  doi =		{10.4230/LIPIcs.CPM.2023.4},
  annote =	{Keywords: String indexing, pattern matching, sliding window, streaming}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.5

Faster Algorithms for Computing the Hairpin Completion Distance and Minimum Ancestor

Authors: Itai Boneh, Dvir Fried, Adrian Miclăuş, and Alexandru Popa

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

Hairpin completion is an operation on formal languages that has been inspired by hairpin formation in DNA biochemistry and has many applications especially in DNA computing. Consider s to be a string over the alphabet {A, C, G, T} such that a prefix/suffix of it matches the reversed complement of a substring of s. Then, in a hairpin completion operation the reversed complement of this prefix/suffix is added to the start/end of s forming a new string. In this paper we study two problems related to the hairpin completion. The first problem asks the minimum number of hairpin operations necessary to transform one string into another, number that is called the hairpin completion distance. For this problem we show an algorithm of running time O(n²), where n is the maximum length of the two strings. Our algorithm improves on the algorithm of Manea (TCS 2010), that has running time O(n² log n). In the minimum distance common hairpin completion ancestor problem we want to find, for two input strings x and y, a string w that minimizes the sum of the hairpin completion distances to x and y. Similarly, we present an algorithm with running time O(n²) that improves by a O(log n) factor the algorithm of Manea (TCS 2010).

Cite as

Itai Boneh, Dvir Fried, Adrian Miclăuş, and Alexandru Popa. Faster Algorithms for Computing the Hairpin Completion Distance and Minimum Ancestor. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 5:1-5:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{boneh_et_al:LIPIcs.CPM.2023.5,
  author =	{Boneh, Itai and Fried, Dvir and Micl\u{a}u\c{s}, Adrian and Popa, Alexandru},
  title =	{{Faster Algorithms for Computing the Hairpin Completion Distance and Minimum Ancestor}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{5:1--5:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.5},
  URN =		{urn:nbn:de:0030-drops-179592},
  doi =		{10.4230/LIPIcs.CPM.2023.5},
  annote =	{Keywords: dynamic programming, incremental trees, exact algorithm}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.7

Parameterized Algorithms for String Matching to DAGs: Funnels and Beyond

Authors: Manuel Cáceres

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

The problem of String Matching to Labeled Graphs (SMLG) asks to find all the paths in a labeled graph G = (V, E) whose spellings match that of an input string S ∈ Σ^m. SMLG can be solved in quadratic O(m|E|) time [Amir et al., JALG 2000], which was proven to be optimal by a recent lower bound conditioned on SETH [Equi et al., ICALP 2019]. The lower bound states that no strongly subquadratic time algorithm exists, even if restricted to directed acyclic graphs (DAGs). In this work we present the first parameterized algorithms for SMLG on DAGs. Our parameters capture the topological structure of G. All our results are derived from a generalization of the Knuth-Morris-Pratt algorithm [Park and Kim, CPM 1995] optimized to work in time proportional to the number of prefix-incomparable matches. To obtain the parameterization in the topological structure of G, we first study a special class of DAGs called funnels [Millani et al., JCO 2020] and generalize them to k-funnels and the class ST_k. We present several novel characterizations and algorithmic contributions on both funnels and their generalizations.

Cite as

Manuel Cáceres. Parameterized Algorithms for String Matching to DAGs: Funnels and Beyond. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 7:1-7:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{caceres:LIPIcs.CPM.2023.7,
  author =	{C\'{a}ceres, Manuel},
  title =	{{Parameterized Algorithms for String Matching to DAGs: Funnels and Beyond}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{7:1--7:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.7},
  URN =		{urn:nbn:de:0030-drops-179619},
  doi =		{10.4230/LIPIcs.CPM.2023.7},
  annote =	{Keywords: string matching, parameterized algorithms, FPT inside P, string algorithms, graph algorithms, directed acyclic graphs, labeled graphs, funnels}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.14

MUL-Tree Pruning for Consistency and Compatibility

Authors: Christopher Hampson, Daniel J. Harvey, Costas S. Iliopoulos, Jesper Jansson, Zara Lim, and Wing-Kin Sung

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

A multi-labelled tree (or MUL-tree) is a rooted tree leaf-labelled by a set of labels, where each label may appear more than once in the tree. We consider the MUL-tree Set Pruning for Consistency problem (MULSETPC), which takes as input a set of MUL-trees and asks whether there exists a perfect pruning of each MUL-tree that results in a consistent set of single-labelled trees. MULSETPC was proven to be NP-complete by Gascon et al. when the MUL-trees are binary, each leaf label is used at most three times, and the number of MUL-trees is unbounded. To determine the computational complexity of the problem when the number of MUL-trees is constant was left as an open problem. Here, we resolve this question by proving a much stronger result, namely that MULSETPC is NP-complete even when there are only two MUL-trees, every leaf label is used at most twice, and every MUL-tree is either binary or has constant height. Furthermore, we introduce an extension of MULSETPC that we call MULSETPComp, which replaces the notion of consistency with compatibility, and prove that MULSETPComp is NP-complete even when there are only two MUL-trees, every leaf label is used at most thrice, and every MUL-tree has constant height. Finally, we present a polynomial-time algorithm for instances of MULSETPC with a constant number of binary MUL-trees, in the special case where every leaf label occurs exactly once in at least one MUL-tree.

Cite as

Christopher Hampson, Daniel J. Harvey, Costas S. Iliopoulos, Jesper Jansson, Zara Lim, and Wing-Kin Sung. MUL-Tree Pruning for Consistency and Compatibility. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 14:1-14:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{hampson_et_al:LIPIcs.CPM.2023.14,
  author =	{Hampson, Christopher and Harvey, Daniel J. and Iliopoulos, Costas S. and Jansson, Jesper and Lim, Zara and Sung, Wing-Kin},
  title =	{{MUL-Tree Pruning for Consistency and Compatibility}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{14:1--14:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.14},
  URN =		{urn:nbn:de:0030-drops-179682},
  doi =		{10.4230/LIPIcs.CPM.2023.14},
  annote =	{Keywords: multi-labelled tree, phylogenetic tree, consistent, compatible, pruning, algorithm, NP-complete}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.15

Linear-Time Computation of Cyclic Roots and Cyclic Covers of a String

Authors: Costas S. Iliopoulos, Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń, and Wiktor Zuba

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

Cyclic versions of covers and roots of a string are considered in this paper. A prefix V of a string S is a cyclic root of S if S is a concatenation of cyclic rotations of V. A prefix V of S is a cyclic cover of S if the occurrences of the cyclic rotations of V cover all positions of S. We present 𝒪(n)-time algorithms computing all cyclic roots (using number-theoretic tools) and all cyclic covers (using tools related to seeds) of a length-n string over an integer alphabet. Our results improve upon 𝒪(n log log n) and 𝒪(n log n) time complexities of recent algorithms of Grossi et al. (WALCOM 2023) for the respective problems and provide novel approaches to the problems. As a by-product, we obtain an optimal data structure for Internal Circular Pattern Matching queries that generalize Internal Pattern Matching and Cyclic Equivalence queries of Kociumaka et al. (SODA 2015).

Cite as

Costas S. Iliopoulos, Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń, and Wiktor Zuba. Linear-Time Computation of Cyclic Roots and Cyclic Covers of a String. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 15:1-15:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{iliopoulos_et_al:LIPIcs.CPM.2023.15,
  author =	{Iliopoulos, Costas S. and Kociumaka, Tomasz and Radoszewski, Jakub and Rytter, Wojciech and Wale\'{n}, Tomasz and Zuba, Wiktor},
  title =	{{Linear-Time Computation of Cyclic Roots and Cyclic Covers of a String}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{15:1--15:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.15},
  URN =		{urn:nbn:de:0030-drops-179697},
  doi =		{10.4230/LIPIcs.CPM.2023.15},
  annote =	{Keywords: cyclic cover, cyclic root, circular pattern matching, internal pattern matching}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.16

Faster Prefix-Sorting Algorithms for Deterministic Finite Automata

Authors: Sung-Hwan Kim, Francisco Olivares, and Nicola Prezza

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

Sorting is a fundamental algorithmic pre-processing technique which often allows to represent data more compactly and, at the same time, speeds up search queries on it. In this paper, we focus on the well-studied problem of sorting and indexing string sets. Since the introduction of suffix trees in 1973, dozens of suffix sorting algorithms have been described in the literature. In 2017, these techniques were extended to sets of strings described by means of finite automata: the theory of Wheeler graphs [Gagie et al., TCS'17] introduced automata whose states can be totally-sorted according to the co-lexicographic (co-lex in the following) order of the prefixes of words accepted by the automaton. More recently, in [Cotumaccio, Prezza, SODA'21] it was shown how to extend these ideas to arbitrary automata by means of partial co-lex orders. This work showed that a co-lex order of minimum width (thus optimizing search query times) on deterministic finite automata (DFAs) can be computed in O(m² + n^{5/2}) time, m being the number of transitions and n the number of states of the input DFA. In this paper, we exhibit new combinatorial properties of the minimum-width co-lex order of DFAs and exploit them to design faster prefix sorting algorithms. In particular, we describe two algorithms sorting arbitrary DFAs in O(mn) and O(n² log n) time, respectively, and an algorithm sorting acyclic DFAs in O(m log n) time. Within these running times, all algorithms compute also a smallest chain partition of the partial order (required to index the DFA). We present an experiment result to show that an optimized implementation of the O(n² log n)-time algorithm exhibits a nearly-linear behaviour on large deterministic pan-genomic graphs and is thus also of practical interest.

Cite as

Sung-Hwan Kim, Francisco Olivares, and Nicola Prezza. Faster Prefix-Sorting Algorithms for Deterministic Finite Automata. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 16:1-16:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{kim_et_al:LIPIcs.CPM.2023.16,
  author =	{Kim, Sung-Hwan and Olivares, Francisco and Prezza, Nicola},
  title =	{{Faster Prefix-Sorting Algorithms for Deterministic Finite Automata}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{16:1--16:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.16},
  URN =		{urn:nbn:de:0030-drops-179707},
  doi =		{10.4230/LIPIcs.CPM.2023.16},
  annote =	{Keywords: String Matching, Deterministic Finite Automata, Graph Indexing, Co-lexicographical Sorting}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.19

String Factorization via Prefix Free Families

Authors: Matan Kraus, Moshe Lewenstein, Alexandru Popa, Ely Porat, and Yonathan Sadia

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

A factorization of a string S is a partition of w into substrings u_1,… ,u_k such that S = u_1 u_2 ⋯ u_k. Such a partition is called equality-free if no two factors are equal: u_i ≠ u_j, ∀ i,j with i ≠ j. The maximum equality-free factorization problem is to find for a given string S, the largest integer k for which S admits an equality-free factorization with k factors. Equality-free factorizations have lately received attention because of their applications in DNA self-assembly. The best approximation algorithm known for the problem is the natural greedy algorithm, that chooses iteratively from left to right the shortest factor that does not appear before. This algorithm has a √n approximation ratio (SOFSEM 2020) and it is an open problem whether there is a better solution. Our main result is to show that the natural greedy algorithm is a Θ(n^{1/4}) approximation algorithm for the maximum equality-free factorization problem. Thus, we disprove one of the conjectures of Mincu and Popa (SOFSEM 2020) according to which the greedy algorithm is a Θ(√n) approximation. The most challenging part of the proof is to show that the greedy algorithm is an O(n^{1/4}) approximation. We obtain this algorithm via prefix free factor families, i.e. a set of non-overlapping factors of the string which are pairwise non-prefixes of each other. In the paper we show the relation between prefix free factor families and the maximum equality-free factorization. Moreover, as a byproduct we present another approximation algorithm that achieves an approximation ratio of O(n^{1/4}) that we believe is of independent interest and may lead to improved algorithms. We then show that the natural greedy algorithm has an approximation ratio that is Ω(n^{1/4}) via a clever analysis which shows that the greedy algorithm is Θ(n^{1/4}) for the maximum equality-free factorization problem.

Cite as

Matan Kraus, Moshe Lewenstein, Alexandru Popa, Ely Porat, and Yonathan Sadia. String Factorization via Prefix Free Families. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 19:1-19:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{kraus_et_al:LIPIcs.CPM.2023.19,
  author =	{Kraus, Matan and Lewenstein, Moshe and Popa, Alexandru and Porat, Ely and Sadia, Yonathan},
  title =	{{String Factorization via Prefix Free Families}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{19:1--19:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.19},
  URN =		{urn:nbn:de:0030-drops-179738},
  doi =		{10.4230/LIPIcs.CPM.2023.19},
  annote =	{Keywords: string factorization, NP-hard problem, approximation algorithm}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.20

Improving the Sensitivity of MinHash Through Hash-Value Analysis

Authors: Gregory Kucherov and Steven Skiena

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

MinHash sketching is an important algorithm for efficient document retrieval and bioinformatics. We show that the value of the matching MinHash codes convey additional information about the Jaccard similarity of S and T over and above the fact that the MinHash codes agree. This observation holds the potential to increase the sensitivity of minhash-based retrieval systems. We analyze the expected Jaccard similarity of two sets as a function of observing a matching MinHash value a under a reasonable prior distribution on intersection set sizes, and present a practical approach to using MinHash values to improve the sensitivity of traditional Jaccard similarity estimation, based on the Kolmogorov-Smirnov statistical test for sample distributions. Experiments over a wide range of hash function counts and set similarities show a small but consistent improvement over chance at predicting over/under-estimation, yielding an average accuracy of 61% over the range of experiments.

Cite as

Gregory Kucherov and Steven Skiena. Improving the Sensitivity of MinHash Through Hash-Value Analysis. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 20:1-20:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{kucherov_et_al:LIPIcs.CPM.2023.20,
  author =	{Kucherov, Gregory and Skiena, Steven},
  title =	{{Improving the Sensitivity of MinHash Through Hash-Value Analysis}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{20:1--20:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.20},
  URN =		{urn:nbn:de:0030-drops-179740},
  doi =		{10.4230/LIPIcs.CPM.2023.20},
  annote =	{Keywords: MinHash sketching, sequence similarity, hashing}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.22

Merging Sorted Lists of Similar Strings

Authors: Gene Myers

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

Merging T sorted, non-redundant lists containing M elements into a single sorted, non-redundant result of size N ≥ M/T is a classic problem typically solved practically in O(M log T) time with a priority-queue data structure the most basic of which is the simple heap. We revisit this problem in the situation where the list elements are strings and the lists contain many identical or nearly identical elements. By keeping simple auxiliary information with each heap node, we devise an O(M log T+S) worst-case method that performs no more character comparisons than the sum of the lengths of all the strings S, and another O(M log (T/e¯)+S) method that becomes progressively more efficient as a function of the fraction of equal elements e¯ = M/N between input lists, reaching linear time when the lists are all identical. The methods perform favorably in practice versus an alternate formulation based on a trie.

Cite as

Gene Myers. Merging Sorted Lists of Similar Strings. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 22:1-22:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{myers:LIPIcs.CPM.2023.22,
  author =	{Myers, Gene},
  title =	{{Merging Sorted Lists of Similar Strings}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{22:1--22:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.22},
  URN =		{urn:nbn:de:0030-drops-179763},
  doi =		{10.4230/LIPIcs.CPM.2023.22},
  annote =	{Keywords: heap, trie, longest common prefix}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.25

L-Systems for Measuring Repetitiveness

Authors: Gonzalo Navarro and Cristian Urbina

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

In order to use them for compression, we extend L-systems (without ε-rules) with two parameters d and n, and also a coding τ, which determines unambiguously a string w = τ(φ^d(s))[1:n], where φ is the morphism of the system, and s is its axiom. The length of the shortest description of an L-system generating w is known as 𝓁, and it is arguably a relevant measure of repetitiveness that builds on the self-similarities that arise in the sequence. In this paper, we deepen the study of the measure 𝓁 and its relation with a better-established measure called δ, which builds on substring complexity. Our results show that 𝓁 and δ are largely orthogonal, in the sense that one can be much larger than the other, depending on the case. This suggests that both mechanisms capture different kinds of regularities related to repetitiveness. We then show that the recently introduced NU-systems, which combine the capabilities of L-systems with bidirectional macro schemes, can be asymptotically strictly smaller than both mechanisms for the same fixed string family, which makes the size ν of the smallest NU-system the unique smallest reachable repetitiveness measure to date. We conclude that in order to achieve better compression, we should combine morphism substitution with copy-paste mechanisms.

Cite as

Gonzalo Navarro and Cristian Urbina. L-Systems for Measuring Repetitiveness. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 25:1-25:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{navarro_et_al:LIPIcs.CPM.2023.25,
  author =	{Navarro, Gonzalo and Urbina, Cristian},
  title =	{{L-Systems for Measuring Repetitiveness}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{25:1--25:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.25},
  URN =		{urn:nbn:de:0030-drops-179792},
  doi =		{10.4230/LIPIcs.CPM.2023.25},
  annote =	{Keywords: L-systems, String morphisms, Repetitiveness measures, Text compression}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.26

MONI Can Find k-MEMs

Authors: Igor Tatarnikov, Ardavan Shahrabi Farahani, Sana Kashgouli, and Travis Gagie

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

Suppose we are asked to index a text T [0..n - 1] such that, given a pattern P [0..m - 1], we can quickly report the maximal substrings of P that each occur in T at least k times. We first show how we can add O (r log n) bits to Rossi et al.’s recent MONI index, where r is the number of runs in the Burrows-Wheeler Transform of T, such that it supports such queries in O (k m log n) time. We then show how, if we are given k at construction time, we can reduce the query time to O (m log n).

Cite as

Igor Tatarnikov, Ardavan Shahrabi Farahani, Sana Kashgouli, and Travis Gagie. MONI Can Find k-MEMs. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 26:1-26:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{tatarnikov_et_al:LIPIcs.CPM.2023.26,
  author =	{Tatarnikov, Igor and Shahrabi Farahani, Ardavan and Kashgouli, Sana and Gagie, Travis},
  title =	{{MONI Can Find k-MEMs}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{26:1--26:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.26},
  URN =		{urn:nbn:de:0030-drops-179802},
  doi =		{10.4230/LIPIcs.CPM.2023.26},
  annote =	{Keywords: Compact data structures, Burrows-Wheeler Transform, run-length compression, maximal exact matches}
}

Document

DOI: 10.4230/LIPIcs.ITCS.2022.74

Correlation Detection in Trees for Planted Graph Alignment

Authors: Luca Ganassali, Laurent Massoulié, and Marc Lelarge

Published in: LIPIcs, Volume 215, 13th Innovations in Theoretical Computer Science Conference (ITCS 2022)

Abstract

Motivated by alignment of correlated sparse random graphs, we study a hypothesis problem of deciding whether two random trees are correlated or not. Based on this correlation detection problem, we propose MPAlign, a message-passing algorithm for graph alignment, which we prove to succeed in polynomial time at partial alignment whenever tree detection is feasible. As a result our analysis of tree detection reveals new ranges of parameters for which partial alignment of sparse random graphs is feasible in polynomial time. We conjecture that the connection between partial graph alignment and tree detection runs deeper, and that the parameter range where tree detection is impossible, which we partially characterize, corresponds to a region where partial graph alignment is hard (not polytime feasible).

Cite as

Luca Ganassali, Laurent Massoulié, and Marc Lelarge. Correlation Detection in Trees for Planted Graph Alignment. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 215, pp. 74:1-74:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{ganassali_et_al:LIPIcs.ITCS.2022.74,
  author =	{Ganassali, Luca and Massouli\'{e}, Laurent and Lelarge, Marc},
  title =	{{Correlation Detection in Trees for Planted Graph Alignment}},
  booktitle =	{13th Innovations in Theoretical Computer Science Conference (ITCS 2022)},
  pages =	{74:1--74:8},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-217-4},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{215},
  editor =	{Braverman, Mark},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2022.74},
  URN =		{urn:nbn:de:0030-drops-156709},
  doi =		{10.4230/LIPIcs.ITCS.2022.74},
  annote =	{Keywords: inference on graphs, hypothesis testing, Erd\H{o}s-R\'{e}nyi random graphs}
}

37 Search Results for "Gourv�s, Laurent"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Thanks for your feedback!

Could not send message