DROPS

Volume

LIPIcs, Volume 223

33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

CPM 2022, June 27-29, 2022, Prague, Czech Republic

Editors: Hideo Bannai and Jan Holub

Document

Complete Volume

DOI: 10.4230/LIPIcs.CPM.2022

LIPIcs, Volume 223, CPM 2022, Complete Volume

Authors: Hideo Bannai and Jan Holub

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

LIPIcs, Volume 223, CPM 2022, Complete Volume

Cite as

33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 1-470, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@Proceedings{bannai_et_al:LIPIcs.CPM.2022,
  title =	{{LIPIcs, Volume 223, CPM 2022, Complete Volume}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{1--470},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022},
  URN =		{urn:nbn:de:0030-drops-161265},
  doi =		{10.4230/LIPIcs.CPM.2022},
  annote =	{Keywords: LIPIcs, Volume 223, CPM 2022, Complete Volume}
}

Document

Front Matter

DOI: 10.4230/LIPIcs.CPM.2022.0

Front Matter, Table of Contents, Preface, Conference Organization

Authors: Hideo Bannai and Jan Holub

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

Front Matter, Table of Contents, Preface, Conference Organization

Cite as

33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 0:i-0:xviii, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2022.0,
  author =	{Bannai, Hideo and Holub, Jan},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{0:i--0:xviii},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.0},
  URN =		{urn:nbn:de:0030-drops-161271},
  doi =		{10.4230/LIPIcs.CPM.2022.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.CPM.2022.1

Invitation to Combinatorial Reconfiguration (Invited Talk)

Authors: Takehiro Ito

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

Combinatorial reconfiguration studies reachability and related questions over the solution space formed by feasible solutions of an instance of a combinatorial search problem. For example, as the solution space for the satisfiability problem, we may consider the subgraph of the hypercube induced by the satisfying truth assignments of a given CNF formula. Then, the reachability problem for satisfiability is the problem of asking whether two given satisfying truth assignments are contained in the same connected component of the solution space. The study of reconfiguration problems has motivation from a variety of fields such as puzzles, statistical physics, and industry. In this decade, reconfiguration problems have been studied intensively for many central combinatorial search problems, such as satisfiability, independent set and coloring, from the algorithmic viewpoints. Many reconfiguration problems are PSPACE-complete in general, although several efficiently solvable cases have been obtained. In this talk, I will give a broad introduction of combinatorial reconfiguration.

Cite as

Takehiro Ito. Invitation to Combinatorial Reconfiguration (Invited Talk). In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, p. 1:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{ito:LIPIcs.CPM.2022.1,
  author =	{Ito, Takehiro},
  title =	{{Invitation to Combinatorial Reconfiguration}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{1:1--1:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.1},
  URN =		{urn:nbn:de:0030-drops-161281},
  doi =		{10.4230/LIPIcs.CPM.2022.1},
  annote =	{Keywords: Combinatorial reconfiguration, graph algorithm}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.CPM.2022.2

Using Automata and a Decision Procedure to Prove Results in Pattern Matching (Invited Talk)

Authors: Jeffrey Shallit

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

The first-order theory of automatic sequences with addition is decidable, and this means that one can often prove combinatorial properties of these sequences "automatically", using the free software Walnut written by Hamoon Mousavi. In this talk I will explain how this is done, using as an example the measure of minimize size string attractor, introduced by Kempa and Prezza in 2018. Using the logic-based approach, we can also prove more general properties of string attractors for automatic sequences. This is joint work with Luke Schaeffer.

Cite as

Jeffrey Shallit. Using Automata and a Decision Procedure to Prove Results in Pattern Matching (Invited Talk). In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 2:1-2:3, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{shallit:LIPIcs.CPM.2022.2,
  author =	{Shallit, Jeffrey},
  title =	{{Using Automata and a Decision Procedure to Prove Results in Pattern Matching}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{2:1--2:3},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.2},
  URN =		{urn:nbn:de:0030-drops-161297},
  doi =		{10.4230/LIPIcs.CPM.2022.2},
  annote =	{Keywords: finite automata, decision procedure, automatic sequence, Thue-Morse sequence, Fibonacci word, string attractor}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.CPM.2022.3

Compact Text Indexing for Advanced Pattern Matching Problems: Parameterized, Order-Isomorphic, 2D, etc. (Invited Talk)

Authors: Sharma V. Thankachan

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

In the past two decades, we have witnessed the design of various compact data structures for pattern matching over an indexed text [Navarro, 2016]. Popular indexes like the FM-index [Paolo Ferragina and Giovanni Manzini, 2005], compressed suffix arrays/trees [Roberto Grossi and Jeffrey Scott Vitter, 2005; Kunihiko Sadakane, 2007], the recent r-index [Travis Gagie et al., 2020; Takaaki Nishimoto and Yasuo Tabei, 2021], etc., capture the key functionalities of classic suffix arrays/trees [Udi Manber and Eugene W. Myers, 1993; Peter Weiner, 1973] in compact space. Mostly, they rely on the Burrows-Wheeler Transform (BWT) and its associated operations [Burrows and Wheeler, 1994]. However, compactly encoding some advanced suffix tree (ST) variants, like parameterized ST [Brenda S. Baker, 1993; S. Rao Kosaraju, 1995; Juan Mendivelso et al., 2020], order-isomorphic/preserving ST [Maxime Crochemore et al., 2016], two-dimensional ST [Raffaele Giancarlo, 1995; Dong Kyue Kim et al., 1998], etc. [Sung Gwan Park et al., 2019; Tetsuo Shibuya, 2000]- collectively known as suffix trees with missing suffix links [Richard Cole and Ramesh Hariharan, 2003], has been challenging. The previous techniques are not easily extendable because these variants do not hold some structural properties of the standard ST that enable compression. However, some limited progress has been made in these directions recently [Arnab Ganguly et al., 2017; Travis Gagie et al., 2017; Gianni Decaroli et al., 2017; Dhrumil Patel and Rahul Shah, 2021; Arnab Ganguly et al., 2021; Sung{-}Hwan Kim and Hwan{-}Gue Cho, 2021; Sung{-}Hwan Kim and Hwan{-}Gue Cho, 2021; Arnab Ganguly et al., 2017; Arnab Ganguly et al., 2022; Arnab Ganguly et al., 2021]. This talk will briefly survey them and highlight some interesting open problems.

Cite as

Sharma V. Thankachan. Compact Text Indexing for Advanced Pattern Matching Problems: Parameterized, Order-Isomorphic, 2D, etc. (Invited Talk). In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 3:1-3:3, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{thankachan:LIPIcs.CPM.2022.3,
  author =	{Thankachan, Sharma V.},
  title =	{{Compact Text Indexing for Advanced Pattern Matching Problems: Parameterized, Order-Isomorphic, 2D, etc.}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{3:1--3:3},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.3},
  URN =		{urn:nbn:de:0030-drops-161300},
  doi =		{10.4230/LIPIcs.CPM.2022.3},
  annote =	{Keywords: Text Indexing, Suffix Trees, String Matching}
}

Document

DOI: 10.4230/LIPIcs.CPM.2022.4

The Fine-Grained Complexity of Episode Matching

Authors: Philip Bille, Inge Li Gørtz, Shay Mozes, Teresa Anna Steiner, and Oren Weimann

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

Given two strings S and P, the Episode Matching problem is to find the shortest substring of S that contains P as a subsequence. The best known upper bound for this problem is Õ(nm) by Das et al. (1997), where n,m are the lengths of S and P, respectively. Although the problem is well studied and has many applications in data mining, this bound has never been improved. In this paper we show why this is the case by proving that no O((nm)^{1-ε}) algorithm (even for binary strings) exists, unless the Strong Exponential Time Hypothesis (SETH) is false. We then consider the indexing version of the problem, where S is preprocessed into a data structure for answering episode matching queries P. We show that for any τ, there is a data structure using O(n+(n/(τ)) ^k) space that answers episode matching queries for any P of length k in O(k⋅ τ ⋅ log log n) time. We complement this upper bound with an almost matching lower bound, showing that any data structure that answers episode matching queries for patterns of length k in time O(n^δ), must use Ω(n^{k-kδ-o(1)}) space, unless the Strong k-Set Disjointness Conjecture is false. Finally, for the special case of k = 2, we present a faster construction of the data structure using fast min-plus multiplication of bounded integer matrices.

Cite as

Philip Bille, Inge Li Gørtz, Shay Mozes, Teresa Anna Steiner, and Oren Weimann. The Fine-Grained Complexity of Episode Matching. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 4:1-4:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.CPM.2022.4,
  author =	{Bille, Philip and G{\o}rtz, Inge Li and Mozes, Shay and Steiner, Teresa Anna and Weimann, Oren},
  title =	{{The Fine-Grained Complexity of Episode Matching}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{4:1--4:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.4},
  URN =		{urn:nbn:de:0030-drops-161312},
  doi =		{10.4230/LIPIcs.CPM.2022.4},
  annote =	{Keywords: Pattern matching, fine-grained complexity, longest common subsequence}
}

Document

DOI: 10.4230/LIPIcs.CPM.2022.5

Mechanical Proving with Walnut for Squares and Cubes in Partial Words

Authors: John Machacek

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

Walnut is a software that can prove theorems in combinatorics on words about automatic sequences. We are able to apply this software to both prove new results as well as reprove some old results on avoiding squares and cubes in partial words. We also define the notion of an antisquare in a partial word and begin the study of binary partial words which contain only a fixed number of distinct squares and antisquares.

Cite as

John Machacek. Mechanical Proving with Walnut for Squares and Cubes in Partial Words. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 5:1-5:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{machacek:LIPIcs.CPM.2022.5,
  author =	{Machacek, John},
  title =	{{Mechanical Proving with Walnut for Squares and Cubes in Partial Words}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{5:1--5:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.5},
  URN =		{urn:nbn:de:0030-drops-161320},
  doi =		{10.4230/LIPIcs.CPM.2022.5},
  annote =	{Keywords: Partial words, squares, antisquares, cubes, Walnut}
}

Document

DOI: 10.4230/LIPIcs.CPM.2022.6

An FPT-Algorithm for Longest Common Subsequence Parameterized by the Maximum Number of Deletions

Authors: Laurent Bulteau, Mark Jones, Rolf Niedermeier, and Till Tantau

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

In the NP-hard Longest Common Subsequence problem (LCS), given a set of strings, the task is to find a string that can be obtained from every input string using as few deletions as possible. LCS is one of the most fundamental string problems with numerous applications in various areas, having gained a lot of attention in the algorithms and complexity research community. Significantly improving on an algorithm by Irving and Fraser [CPM'92], featured as a research challenge in a 2014 survey paper, we show that LCS is fixed-parameter tractable (FPT) when parameterized by the maximum number of deletions per input string. Given the relatively moderate running time of our algorithm (linear time when the parameter is a constant) and small parameter values to be expected in several applications, we believe that our purely theoretical analysis could finally pave the way to a new, exact and practically useful algorithm for this notoriously hard string problem.

Cite as

Laurent Bulteau, Mark Jones, Rolf Niedermeier, and Till Tantau. An FPT-Algorithm for Longest Common Subsequence Parameterized by the Maximum Number of Deletions. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 6:1-6:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{bulteau_et_al:LIPIcs.CPM.2022.6,
  author =	{Bulteau, Laurent and Jones, Mark and Niedermeier, Rolf and Tantau, Till},
  title =	{{An FPT-Algorithm for Longest Common Subsequence Parameterized by the Maximum Number of Deletions}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{6:1--6:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.6},
  URN =		{urn:nbn:de:0030-drops-161338},
  doi =		{10.4230/LIPIcs.CPM.2022.6},
  annote =	{Keywords: NP-hard string problems, multiple sequence alignment, center string, parameterized complexity, search tree algorithms, enumerative algorithms}
}

Document

DOI: 10.4230/LIPIcs.CPM.2022.7

Beyond the Longest Letter-Duplicated Subsequence Problem

Authors: Wenfeng Lai, Adiesha Liyanage, Binhai Zhu, and Peng Zou

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

Motivated by computing duplication patterns in sequences, a new fundamental problem called the longest letter-duplicated subsequence (LLDS) is proposed. Given a sequence S of length n, a letter-duplicated subsequence is a subsequence of S in the form of x₁^{d₁}x₂^{d₂}⋯ x_k^{d_k} with x_i ∈ Σ, x_j≠ x_{j+1} and d_i ≥ 2 for all i in [k] and j in [k-1]. A linear time algorithm for computing the longest letter-duplicated subsequence (LLDS) of S can be easily obtained. In this paper, we focus on two variants of this problem. We first consider the constrained version when Σ is unbounded, each letter appears in S at least 6 times and all the letters in Σ must appear in the solution. We show that the problem is NP-hard (a further twist indicates that the problem does not admit any polynomial time approximation). The reduction is from possibly the simplest version of SAT that is NP-complete, (≤ 2,1, ≤ 3)-SAT, where each variable appears at most twice positively and exact once negatively, and each clause contains at most three literals and some clauses must contain exactly two literals. (We hope that this technique will serve as a general tool to help us proving the NP-hardness for some more tricky sequence problems involving only one sequence - much harder than with at least two input sequences, which we apply successfully at the end of the paper on some extra variations of the LLDS problem.) We then show that when each letter appears in S at most 3 times, then the problem admits a factor 1.5-O(1/n) approximation. Finally, we consider the weighted version, where the weight of a block x_i^{d_i} (d_i ≥ 2) could be any positive function which might not grow with d_i. We give a non-trivial O(n²) time dynamic programming algorithm for this version, i.e., computing an LD-subsequence of S whose weight is maximized.

Cite as

Wenfeng Lai, Adiesha Liyanage, Binhai Zhu, and Peng Zou. Beyond the Longest Letter-Duplicated Subsequence Problem. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 7:1-7:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{lai_et_al:LIPIcs.CPM.2022.7,
  author =	{Lai, Wenfeng and Liyanage, Adiesha and Zhu, Binhai and Zou, Peng},
  title =	{{Beyond the Longest Letter-Duplicated Subsequence Problem}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{7:1--7:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.7},
  URN =		{urn:nbn:de:0030-drops-161348},
  doi =		{10.4230/LIPIcs.CPM.2022.7},
  annote =	{Keywords: Segmental duplications, Tandem duplications, Longest common subsequence, NP-completeness, Dynamic programming}
}

Document

DOI: 10.4230/LIPIcs.CPM.2022.8

Reduction Ratio of the IS-Algorithm: Worst and Random Cases

Authors: Vincent Jugé

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

We study the IS-algorithm, a well-known linear-time algorithm for computing the suffix array of a word. This algorithm relies on transforming the input word w into another word, called the reduced word of w, that will be at least twice shorter; then, the algorithm recursively computes the suffix array of the reduced word. In this article, we study the reduction ratio of the IS-algorithm, i.e., the ratio between the lengths of the input word and the word obtained after reducing k times the input word. We investigate both worst cases, in which we find precise results, and random cases, where we prove some strong convergence phenomena. Finally, we prove that, if the input word is a randomly chosen word of length n, we should not expect much more than log(log(n)) recursive function calls.

Cite as

Vincent Jugé. Reduction Ratio of the IS-Algorithm: Worst and Random Cases. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 8:1-8:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{juge:LIPIcs.CPM.2022.8,
  author =	{Jug\'{e}, Vincent},
  title =	{{Reduction Ratio of the IS-Algorithm: Worst and Random Cases}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{8:1--8:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.8},
  URN =		{urn:nbn:de:0030-drops-161357},
  doi =		{10.4230/LIPIcs.CPM.2022.8},
  annote =	{Keywords: Word combinatorics, Suffix array, IS algorithm}
}

Document

DOI: 10.4230/LIPIcs.CPM.2022.9

Arbitrary-Length Analogs to de Bruijn Sequences

Authors: Abhinav Nellore and Rachel Ward

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

Let α̃ be a length-L cyclic sequence of characters from a size-K alphabet 𝒜 such that for every positive integer m ≤ L, the number of occurrences of any length-m string on 𝒜 as a substring of α̃ is ⌊ L / K^m ⌋ or ⌈ L / K^m ⌉. When L = K^N for any positive integer N, α̃ is a de Bruijn sequence of order N, and when L ≠ K^N, α̃ shares many properties with de Bruijn sequences. We describe an algorithm that outputs some α̃ for any combination of K ≥ 2 and L ≥ 1 in O(L) time using O(L log K) space. This algorithm extends Lempel’s recursive construction of a binary de Bruijn sequence. An implementation written in Python is available at https://github.com/nelloreward/pkl.

Cite as

Abhinav Nellore and Rachel Ward. Arbitrary-Length Analogs to de Bruijn Sequences. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 9:1-9:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{nellore_et_al:LIPIcs.CPM.2022.9,
  author =	{Nellore, Abhinav and Ward, Rachel},
  title =	{{Arbitrary-Length Analogs to de Bruijn Sequences}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{9:1--9:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.9},
  URN =		{urn:nbn:de:0030-drops-161361},
  doi =		{10.4230/LIPIcs.CPM.2022.9},
  annote =	{Keywords: de Bruijn sequence, de Bruijn word, Lempel’s D-morphism, Lempel’s homomorphism}
}

Document

DOI: 10.4230/LIPIcs.CPM.2022.10

Partial Permutations Comparison, Maintenance and Applications

Authors: Avivit Levy, Ely Porat, and B. Riva Shalom

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

This paper focuses on the concept of partial permutations and their use in algorithmic tasks. A partial permutation over Σ is a bijection π_{par}: Σ₁↦Σ₂ mapping a subset Σ₁ ⊂ Σ to a subset Σ₂ ⊂ Σ, where |Σ₁| = |Σ₂| (|Σ| denotes the size of a set Σ). Intuitively, two partial permutations agree if their mapping pairs do not form conflicts. This notion, which is formally defined in this paper, enables a consistent as well as informatively rich comparison between partial permutations. We formalize the Partial Permutations Agreement problem (PPA), as follows. Given two sets A₁, A₂ of partial permutations over alphabet Σ, each of size n, output all pairs (π_i, π_j), where π_i ∈ A₁, π_j ∈ A₂ and π_i agrees with π_j. The possibility of having a data structure for efficiently maintaining a dynamic set of partial permutations enabling to retrieve agreement of partial permutations is then studied, giving both negative and positive results. Applying our study enables to point out fruitful versus futile methods for efficient genes sequences comparison in database or automatic color transformation data augmentation technique for image processing through neural networks. It also shows that an efficient solution of strict Parameterized Dictionary Matching with One Gap (PDMOG) over general dictionary alphabets is not likely, unless the Strong Exponential Time Hypothesis (SETH) fails, thus negatively answering an open question posed lately.

Cite as

Avivit Levy, Ely Porat, and B. Riva Shalom. Partial Permutations Comparison, Maintenance and Applications. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 10:1-10:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{levy_et_al:LIPIcs.CPM.2022.10,
  author =	{Levy, Avivit and Porat, Ely and Shalom, B. Riva},
  title =	{{Partial Permutations Comparison, Maintenance and Applications}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{10:1--10:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.10},
  URN =		{urn:nbn:de:0030-drops-161376},
  doi =		{10.4230/LIPIcs.CPM.2022.10},
  annote =	{Keywords: Partial permutations, Partial words, Genes comparison, Color transformation, Dictionary matching with gaps, Parameterized matching, SETH hypothesis}
}

Document

DOI: 10.4230/LIPIcs.CPM.2022.11

Bi-Directional r-Indexes

Authors: Yuma Arakawa, Gonzalo Navarro, and Kunihiko Sadakane

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

Indexing highly repetitive texts is important in fields such as bioinformatics and versioned repositories. The run-length compression of the Burrows-Wheeler transform (BWT) provides a compressed representation particularly well-suited to text indexing. The r-index is one such index. It enables fast locating of occurrences of a pattern within O(r) words of space, where r is the number of equal-letter runs in the BWT. Its mechanism of locating is to maintain one suffix array sample along the backward-search of the pattern, and to compute all the pattern positions from that sample once the backward-search is complete. In this paper we develop this algorithm further, and propose a new bi-directional text index called the br-index, which supports extending the matched pattern both in forward and backward directions, and locating the occurrences of the pattern at any step of the search, within O(r+r_R) words of space, where r_R is the number of equal-letter runs in the BWT of the reversed text. Our experiments show that the br-index captures the long repetitions of the text, and outperforms the existing indexes in text searching allowing some mismatches except in an internal part.

Cite as

Yuma Arakawa, Gonzalo Navarro, and Kunihiko Sadakane. Bi-Directional r-Indexes. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 11:1-11:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{arakawa_et_al:LIPIcs.CPM.2022.11,
  author =	{Arakawa, Yuma and Navarro, Gonzalo and Sadakane, Kunihiko},
  title =	{{Bi-Directional r-Indexes}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{11:1--11:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.11},
  URN =		{urn:nbn:de:0030-drops-161386},
  doi =		{10.4230/LIPIcs.CPM.2022.11},
  annote =	{Keywords: Compressed text indexes, Burrows-Wheeler Transform, highly repetitive text collections}
}

Document

DOI: 10.4230/LIPIcs.CPM.2022.12

Making de Bruijn Graphs Eulerian

Authors: Giulia Bernardini, Huiping Chen, Grigorios Loukides, Solon P. Pissis, Leen Stougie, and Michelle Sweering

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

A directed multigraph is called Eulerian if it has a circuit which uses each edge exactly once. Euler’s theorem tells us that a weakly connected directed multigraph is Eulerian if and only if every node is balanced. Given a collection S of strings over an alphabet Σ, the de Bruijn graph (dBG) of order k of S is a directed multigraph G_{S,k}(V,E), where V is the set of length-(k-1) substrings of the strings in S, and G_{S,k} contains an edge (u,v) with multiplicity m_{u,v}, if and only if the string u[0]⋅ v is equal to the string u⋅ v[k-2] and this string occurs exactly m_{u,v} times in total in strings in S. Let G_{Σ,k}(V_{Σ,k},E_{Σ,k}) be the complete dBG of Σ^k. The Eulerian Extension (EE) problem on G_{S,k} asks to extend G_{S,k} with a set ℬ of nodes from V_{Σ,k} and a smallest multiset 𝒜 of edges from E_{Σ,k} to make it Eulerian. Note that extending dBGs is algorithmically much more challenging than extending general directed multigraphs because some edges in dBGs are by definition forbidden. Extending dBGs lies at the heart of sequence assembly [Medvedev et al., WABI 2007], one of the most important tasks in bioinformatics. The novelty of our work with respect to existing works is that we allow not only to duplicate existing edges of G_{S,k} but to also add novel edges and nodes, in an effort to (i) connect multiple components and (ii) reduce the total EE cost. It is easy to show that EE on G_{S,k} is NP-hard via a reduction from shortest common superstring. We further show that EE remains NP-hard, even when we are not allowed to add new nodes, via a highly non-trivial reduction from 3-SAT. We thus investigate the following two problems underlying EE in dBGs: 1) When G_{S,k} is not weakly connected, we are asked to connect its d > 1 components using a minimum-weight spanning tree, whose edges are paths on the underlying G_{Σ,k} and weights are the corresponding path lengths. This way of connecting guarantees that no new unbalanced node is added. We show that this problem can be solved in 𝒪(|V|klog d+|E|) time, which is nearly optimal, since the size of G_{S,k} is Θ(|V|k+|E|). 2) When G_{S,k} is not balanced, we are asked to extend G_{S,k} to H_{S,k}(V∪ℬ,E∪𝒜) such that every node of H_{S,k} is balanced and the total number |𝒜| of added edges is minimized. We show that this problem can be solved in the optimal 𝒪(k|V| + |E|+ |𝒜|) time. Let us stress that, although our main contributions are theoretical, the algorithms we design for the above two problems are practical. We combine the two algorithms in one method that makes any dBG Eulerian; and show experimentally that the cost of the obtained feasible solutions on real-world dBGs is substantially smaller than the corresponding cost obtained by existing greedy approaches.

Cite as

Giulia Bernardini, Huiping Chen, Grigorios Loukides, Solon P. Pissis, Leen Stougie, and Michelle Sweering. Making de Bruijn Graphs Eulerian. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 12:1-12:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{bernardini_et_al:LIPIcs.CPM.2022.12,
  author =	{Bernardini, Giulia and Chen, Huiping and Loukides, Grigorios and Pissis, Solon P. and Stougie, Leen and Sweering, Michelle},
  title =	{{Making de Bruijn Graphs Eulerian}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{12:1--12:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.12},
  URN =		{urn:nbn:de:0030-drops-161391},
  doi =		{10.4230/LIPIcs.CPM.2022.12},
  annote =	{Keywords: string algorithms, graph algorithms, Eulerian graph, de Bruijn graph}
}

33 Search Results for "Holub, Jan"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Thanks for your feedback!

Could not send message