DROPS

Volume

LIPIcs, Volume 223

33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

CPM 2022, June 27-29, 2022, Prague, Czech Republic

Editors: Hideo Bannai and Jan Holub

Document

DOI: 10.4230/LIPIcs.ESA.2023.14

Lyndon Arrays in Sublinear Time

Authors: Hideo Bannai and Jonas Ellert

Published in: LIPIcs, Volume 274, 31st Annual European Symposium on Algorithms (ESA 2023)

Abstract

A Lyndon word is a string that is lexicographically smaller than all of its non-trivial suffixes. For example, airbus is a Lyndon word, but amtrak is not a Lyndon word due to its suffix ak. The Lyndon array stores the length of the longest Lyndon prefix of each suffix of a string. For a length-n string over a general ordered alphabet, the array can be computed in O(n) time (Bille et al., ICALP 2020). However, on a word-RAM of word-width w ≥ log₂ n, linear time is not optimal if the string is over integer alphabet {0, … , σ} with σ ≪ n. In this case, the string can be stored in O(n log σ) bits (or O(n / log_σ n) words) of memory, and reading it takes only O(n / log_σ n) time. We show that O(n / log_σ n) time and words of space suffice to compute the succinct 2n-bit version of the Lyndon array. The time is optimal for w = O(log n). The algorithm uses precomputed lookup tables to perform significant parts of the computation in constant time. This is possible due to properties of periodic substrings, which we carefully analyze to achieve the desired result. We envision that the algorithm has applications in the computation of runs (maximal periodic substrings), where the Lyndon array plays a central role in both theoretically and practically fast algorithms.

Cite as

Hideo Bannai and Jonas Ellert. Lyndon Arrays in Sublinear Time. In 31st Annual European Symposium on Algorithms (ESA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 274, pp. 14:1-14:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.ESA.2023.14,
  author =	{Bannai, Hideo and Ellert, Jonas},
  title =	{{Lyndon Arrays in Sublinear Time}},
  booktitle =	{31st Annual European Symposium on Algorithms (ESA 2023)},
  pages =	{14:1--14:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-295-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{274},
  editor =	{G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.14},
  URN =		{urn:nbn:de:0030-drops-186670},
  doi =		{10.4230/LIPIcs.ESA.2023.14},
  annote =	{Keywords: Lyndon forest, Lyndon table, Lyndon array, sublinear time algorithms, word RAM algorithms, word packing, tabulation, lookup tables, periodicity}
}

Document

DOI: 10.4230/LIPIcs.WABI.2023.13

Acceleration of FM-Index Queries Through Prefix-Free Parsing

Authors: Aaron Hong, Marco Oliva, Dominik Köppl, Hideo Bannai, Christina Boucher, and Travis Gagie

Published in: LIPIcs, Volume 273, 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)

Abstract

FM-indexes are a crucial data structure in DNA alignment, but searching with them usually takes at least one random access per character in the query pattern. Ferragina and Fischer [Ferragina and Fischer, 2007] observed in 2007 that word-based indexes often use fewer random accesses than character-based indexes, and thus support faster searches. Since DNA lacks natural word-boundaries, however, it is necessary to parse it somehow before applying word-based FM-indexing. Last year, Deng et al. [Deng et al., 2022] proposed parsing genomic data by induced suffix sorting, and showed the resulting word-based FM-indexes support faster counting queries than standard FM-indexes when patterns are a few thousand characters or longer. In this paper we show that using prefix-free parsing - which takes parameters that let us tune the average length of the phrases - instead of induced suffix sorting, gives a significant speedup for patterns of only a few hundred characters. We implement our method and demonstrate it is between 3 and 18 times faster than competing methods on queries to GRCh38. And was consistently faster on queries made to 25,000, 50,000 and 100,000 SARS-CoV-2 genomes. Hence, it is very clear that our method accelerates the performance of count over all state-of-the-art methods with a minor increase in the memory.

Cite as

Aaron Hong, Marco Oliva, Dominik Köppl, Hideo Bannai, Christina Boucher, and Travis Gagie. Acceleration of FM-Index Queries Through Prefix-Free Parsing. In 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 273, pp. 13:1-13:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{hong_et_al:LIPIcs.WABI.2023.13,
  author =	{Hong, Aaron and Oliva, Marco and K\"{o}ppl, Dominik and Bannai, Hideo and Boucher, Christina and Gagie, Travis},
  title =	{{Acceleration of FM-Index Queries Through Prefix-Free Parsing}},
  booktitle =	{23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)},
  pages =	{13:1--13:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-294-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{273},
  editor =	{Belazzougui, Djamal and Ouangraoua, A\"{i}da},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2023.13},
  URN =		{urn:nbn:de:0030-drops-186390},
  doi =		{10.4230/LIPIcs.WABI.2023.13},
  annote =	{Keywords: FM-index, pangenomics, scalability, word-based indexing, random access}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.3

Optimal LZ-End Parsing Is Hard

Authors: Hideo Bannai, Mitsuru Funakoshi, Kazuhiro Kurita, Yuto Nakashima, Kazuhisa Seto, and Takeaki Uno

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

LZ-End is a variant of the well-known Lempel-Ziv parsing family such that each phrase of the parsing has a previous occurrence, with the additional constraint that the previous occurrence must end at the end of a previous phrase. LZ-End was initially proposed as a greedy parsing, where each phrase is determined greedily from left to right, as the longest factor that satisfies the above constraint [Kreft & Navarro, 2010]. In this work, we consider an optimal LZ-End parsing that has the minimum number of phrases in such parsings. We show that a decision version of computing the optimal LZ-End parsing is NP-complete by showing a reduction from the vertex cover problem. Moreover, we give a MAX-SAT formulation for the optimal LZ-End parsing adapting an approach for computing various NP-hard repetitiveness measures recently presented by [Bannai et al., 2022]. We also consider the approximation ratio of the size of greedy LZ-End parsing to the size of the optimal LZ-End parsing, and give a lower bound of the ratio which asymptotically approaches 2.

Cite as

Hideo Bannai, Mitsuru Funakoshi, Kazuhiro Kurita, Yuto Nakashima, Kazuhisa Seto, and Takeaki Uno. Optimal LZ-End Parsing Is Hard. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 3:1-3:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2023.3,
  author =	{Bannai, Hideo and Funakoshi, Mitsuru and Kurita, Kazuhiro and Nakashima, Yuto and Seto, Kazuhisa and Uno, Takeaki},
  title =	{{Optimal LZ-End Parsing Is Hard}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{3:1--3:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.3},
  URN =		{urn:nbn:de:0030-drops-179571},
  doi =		{10.4230/LIPIcs.CPM.2023.3},
  annote =	{Keywords: Data Compression, LZ-End, Repetitiveness measures}
}

Document

DOI: 10.4230/LIPIcs.ESA.2022.12

Computing NP-Hard Repetitiveness Measures via MAX-SAT

Authors: Hideo Bannai, Keisuke Goto, Masakazu Ishihata, Shunsuke Kanda, Dominik Köppl, and Takaaki Nishimoto

Published in: LIPIcs, Volume 244, 30th Annual European Symposium on Algorithms (ESA 2022)

Abstract

Repetitiveness measures reveal profound characteristics of datasets, and give rise to compressed data structures and algorithms working in compressed space. Alas, the computation of some of these measures is NP-hard, and straight-forward computation is infeasible for datasets of even small sizes. Three such measures are the smallest size of a string attractor, the smallest size of a bidirectional macro scheme, and the smallest size of a straight-line program. While a vast variety of implementations for heuristically computing approximations exist, exact computation of these measures has received little to no attention. In this paper, we present MAX-SAT formulations that provide the first non-trivial implementations for exact computation of smallest string attractors, smallest bidirectional macro schemes, and smallest straight-line programs. Computational experiments show that our implementations work for texts of length up to a few hundred for straight-line programs and bidirectional macro schemes, and texts even over a million for string attractors.

Cite as

Hideo Bannai, Keisuke Goto, Masakazu Ishihata, Shunsuke Kanda, Dominik Köppl, and Takaaki Nishimoto. Computing NP-Hard Repetitiveness Measures via MAX-SAT. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 12:1-12:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.ESA.2022.12,
  author =	{Bannai, Hideo and Goto, Keisuke and Ishihata, Masakazu and Kanda, Shunsuke and K\"{o}ppl, Dominik and Nishimoto, Takaaki},
  title =	{{Computing NP-Hard Repetitiveness Measures via MAX-SAT}},
  booktitle =	{30th Annual European Symposium on Algorithms (ESA 2022)},
  pages =	{12:1--12:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-247-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{244},
  editor =	{Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2022.12},
  URN =		{urn:nbn:de:0030-drops-169505},
  doi =		{10.4230/LIPIcs.ESA.2022.12},
  annote =	{Keywords: repetitiveness measures, string attractor, bidirectional macro scheme}
}

Document

Complete Volume

DOI: 10.4230/LIPIcs.CPM.2022

LIPIcs, Volume 223, CPM 2022, Complete Volume

Authors: Hideo Bannai and Jan Holub

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

LIPIcs, Volume 223, CPM 2022, Complete Volume

Cite as

33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 1-470, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@Proceedings{bannai_et_al:LIPIcs.CPM.2022,
  title =	{{LIPIcs, Volume 223, CPM 2022, Complete Volume}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{1--470},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022},
  URN =		{urn:nbn:de:0030-drops-161265},
  doi =		{10.4230/LIPIcs.CPM.2022},
  annote =	{Keywords: LIPIcs, Volume 223, CPM 2022, Complete Volume}
}

Document

Front Matter

DOI: 10.4230/LIPIcs.CPM.2022.0

Front Matter, Table of Contents, Preface, Conference Organization

Authors: Hideo Bannai and Jan Holub

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

Front Matter, Table of Contents, Preface, Conference Organization

Cite as

33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 0:i-0:xviii, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2022.0,
  author =	{Bannai, Hideo and Holub, Jan},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{0:i--0:xviii},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.0},
  URN =		{urn:nbn:de:0030-drops-161271},
  doi =		{10.4230/LIPIcs.CPM.2022.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.CPM.2022.1

Invitation to Combinatorial Reconfiguration (Invited Talk)

Authors: Takehiro Ito

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

Combinatorial reconfiguration studies reachability and related questions over the solution space formed by feasible solutions of an instance of a combinatorial search problem. For example, as the solution space for the satisfiability problem, we may consider the subgraph of the hypercube induced by the satisfying truth assignments of a given CNF formula. Then, the reachability problem for satisfiability is the problem of asking whether two given satisfying truth assignments are contained in the same connected component of the solution space. The study of reconfiguration problems has motivation from a variety of fields such as puzzles, statistical physics, and industry. In this decade, reconfiguration problems have been studied intensively for many central combinatorial search problems, such as satisfiability, independent set and coloring, from the algorithmic viewpoints. Many reconfiguration problems are PSPACE-complete in general, although several efficiently solvable cases have been obtained. In this talk, I will give a broad introduction of combinatorial reconfiguration.

Cite as

Takehiro Ito. Invitation to Combinatorial Reconfiguration (Invited Talk). In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, p. 1:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{ito:LIPIcs.CPM.2022.1,
  author =	{Ito, Takehiro},
  title =	{{Invitation to Combinatorial Reconfiguration}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{1:1--1:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.1},
  URN =		{urn:nbn:de:0030-drops-161281},
  doi =		{10.4230/LIPIcs.CPM.2022.1},
  annote =	{Keywords: Combinatorial reconfiguration, graph algorithm}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.CPM.2022.2

Using Automata and a Decision Procedure to Prove Results in Pattern Matching (Invited Talk)

Authors: Jeffrey Shallit

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

The first-order theory of automatic sequences with addition is decidable, and this means that one can often prove combinatorial properties of these sequences "automatically", using the free software Walnut written by Hamoon Mousavi. In this talk I will explain how this is done, using as an example the measure of minimize size string attractor, introduced by Kempa and Prezza in 2018. Using the logic-based approach, we can also prove more general properties of string attractors for automatic sequences. This is joint work with Luke Schaeffer.

Cite as

Jeffrey Shallit. Using Automata and a Decision Procedure to Prove Results in Pattern Matching (Invited Talk). In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 2:1-2:3, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{shallit:LIPIcs.CPM.2022.2,
  author =	{Shallit, Jeffrey},
  title =	{{Using Automata and a Decision Procedure to Prove Results in Pattern Matching}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{2:1--2:3},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.2},
  URN =		{urn:nbn:de:0030-drops-161297},
  doi =		{10.4230/LIPIcs.CPM.2022.2},
  annote =	{Keywords: finite automata, decision procedure, automatic sequence, Thue-Morse sequence, Fibonacci word, string attractor}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.CPM.2022.3

Compact Text Indexing for Advanced Pattern Matching Problems: Parameterized, Order-Isomorphic, 2D, etc. (Invited Talk)

Authors: Sharma V. Thankachan

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

In the past two decades, we have witnessed the design of various compact data structures for pattern matching over an indexed text [Navarro, 2016]. Popular indexes like the FM-index [Paolo Ferragina and Giovanni Manzini, 2005], compressed suffix arrays/trees [Roberto Grossi and Jeffrey Scott Vitter, 2005; Kunihiko Sadakane, 2007], the recent r-index [Travis Gagie et al., 2020; Takaaki Nishimoto and Yasuo Tabei, 2021], etc., capture the key functionalities of classic suffix arrays/trees [Udi Manber and Eugene W. Myers, 1993; Peter Weiner, 1973] in compact space. Mostly, they rely on the Burrows-Wheeler Transform (BWT) and its associated operations [Burrows and Wheeler, 1994]. However, compactly encoding some advanced suffix tree (ST) variants, like parameterized ST [Brenda S. Baker, 1993; S. Rao Kosaraju, 1995; Juan Mendivelso et al., 2020], order-isomorphic/preserving ST [Maxime Crochemore et al., 2016], two-dimensional ST [Raffaele Giancarlo, 1995; Dong Kyue Kim et al., 1998], etc. [Sung Gwan Park et al., 2019; Tetsuo Shibuya, 2000]- collectively known as suffix trees with missing suffix links [Richard Cole and Ramesh Hariharan, 2003], has been challenging. The previous techniques are not easily extendable because these variants do not hold some structural properties of the standard ST that enable compression. However, some limited progress has been made in these directions recently [Arnab Ganguly et al., 2017; Travis Gagie et al., 2017; Gianni Decaroli et al., 2017; Dhrumil Patel and Rahul Shah, 2021; Arnab Ganguly et al., 2021; Sung{-}Hwan Kim and Hwan{-}Gue Cho, 2021; Sung{-}Hwan Kim and Hwan{-}Gue Cho, 2021; Arnab Ganguly et al., 2017; Arnab Ganguly et al., 2022; Arnab Ganguly et al., 2021]. This talk will briefly survey them and highlight some interesting open problems.

Cite as

Sharma V. Thankachan. Compact Text Indexing for Advanced Pattern Matching Problems: Parameterized, Order-Isomorphic, 2D, etc. (Invited Talk). In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 3:1-3:3, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{thankachan:LIPIcs.CPM.2022.3,
  author =	{Thankachan, Sharma V.},
  title =	{{Compact Text Indexing for Advanced Pattern Matching Problems: Parameterized, Order-Isomorphic, 2D, etc.}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{3:1--3:3},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.3},
  URN =		{urn:nbn:de:0030-drops-161300},
  doi =		{10.4230/LIPIcs.CPM.2022.3},
  annote =	{Keywords: Text Indexing, Suffix Trees, String Matching}
}

Document

DOI: 10.4230/LIPIcs.CPM.2022.4

The Fine-Grained Complexity of Episode Matching

Authors: Philip Bille, Inge Li Gørtz, Shay Mozes, Teresa Anna Steiner, and Oren Weimann

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

Given two strings S and P, the Episode Matching problem is to find the shortest substring of S that contains P as a subsequence. The best known upper bound for this problem is Õ(nm) by Das et al. (1997), where n,m are the lengths of S and P, respectively. Although the problem is well studied and has many applications in data mining, this bound has never been improved. In this paper we show why this is the case by proving that no O((nm)^{1-ε}) algorithm (even for binary strings) exists, unless the Strong Exponential Time Hypothesis (SETH) is false. We then consider the indexing version of the problem, where S is preprocessed into a data structure for answering episode matching queries P. We show that for any τ, there is a data structure using O(n+(n/(τ)) ^k) space that answers episode matching queries for any P of length k in O(k⋅ τ ⋅ log log n) time. We complement this upper bound with an almost matching lower bound, showing that any data structure that answers episode matching queries for patterns of length k in time O(n^δ), must use Ω(n^{k-kδ-o(1)}) space, unless the Strong k-Set Disjointness Conjecture is false. Finally, for the special case of k = 2, we present a faster construction of the data structure using fast min-plus multiplication of bounded integer matrices.

Cite as

Philip Bille, Inge Li Gørtz, Shay Mozes, Teresa Anna Steiner, and Oren Weimann. The Fine-Grained Complexity of Episode Matching. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 4:1-4:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.CPM.2022.4,
  author =	{Bille, Philip and G{\o}rtz, Inge Li and Mozes, Shay and Steiner, Teresa Anna and Weimann, Oren},
  title =	{{The Fine-Grained Complexity of Episode Matching}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{4:1--4:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.4},
  URN =		{urn:nbn:de:0030-drops-161312},
  doi =		{10.4230/LIPIcs.CPM.2022.4},
  annote =	{Keywords: Pattern matching, fine-grained complexity, longest common subsequence}
}

Document

DOI: 10.4230/LIPIcs.CPM.2022.5

Mechanical Proving with Walnut for Squares and Cubes in Partial Words

Authors: John Machacek

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

Walnut is a software that can prove theorems in combinatorics on words about automatic sequences. We are able to apply this software to both prove new results as well as reprove some old results on avoiding squares and cubes in partial words. We also define the notion of an antisquare in a partial word and begin the study of binary partial words which contain only a fixed number of distinct squares and antisquares.

Cite as

John Machacek. Mechanical Proving with Walnut for Squares and Cubes in Partial Words. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 5:1-5:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{machacek:LIPIcs.CPM.2022.5,
  author =	{Machacek, John},
  title =	{{Mechanical Proving with Walnut for Squares and Cubes in Partial Words}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{5:1--5:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.5},
  URN =		{urn:nbn:de:0030-drops-161320},
  doi =		{10.4230/LIPIcs.CPM.2022.5},
  annote =	{Keywords: Partial words, squares, antisquares, cubes, Walnut}
}

Document

DOI: 10.4230/LIPIcs.CPM.2022.6

An FPT-Algorithm for Longest Common Subsequence Parameterized by the Maximum Number of Deletions

Authors: Laurent Bulteau, Mark Jones, Rolf Niedermeier, and Till Tantau

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

In the NP-hard Longest Common Subsequence problem (LCS), given a set of strings, the task is to find a string that can be obtained from every input string using as few deletions as possible. LCS is one of the most fundamental string problems with numerous applications in various areas, having gained a lot of attention in the algorithms and complexity research community. Significantly improving on an algorithm by Irving and Fraser [CPM'92], featured as a research challenge in a 2014 survey paper, we show that LCS is fixed-parameter tractable (FPT) when parameterized by the maximum number of deletions per input string. Given the relatively moderate running time of our algorithm (linear time when the parameter is a constant) and small parameter values to be expected in several applications, we believe that our purely theoretical analysis could finally pave the way to a new, exact and practically useful algorithm for this notoriously hard string problem.

Cite as

Laurent Bulteau, Mark Jones, Rolf Niedermeier, and Till Tantau. An FPT-Algorithm for Longest Common Subsequence Parameterized by the Maximum Number of Deletions. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 6:1-6:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{bulteau_et_al:LIPIcs.CPM.2022.6,
  author =	{Bulteau, Laurent and Jones, Mark and Niedermeier, Rolf and Tantau, Till},
  title =	{{An FPT-Algorithm for Longest Common Subsequence Parameterized by the Maximum Number of Deletions}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{6:1--6:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.6},
  URN =		{urn:nbn:de:0030-drops-161338},
  doi =		{10.4230/LIPIcs.CPM.2022.6},
  annote =	{Keywords: NP-hard string problems, multiple sequence alignment, center string, parameterized complexity, search tree algorithms, enumerative algorithms}
}

Document

DOI: 10.4230/LIPIcs.CPM.2022.7

Beyond the Longest Letter-Duplicated Subsequence Problem

Authors: Wenfeng Lai, Adiesha Liyanage, Binhai Zhu, and Peng Zou

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

Motivated by computing duplication patterns in sequences, a new fundamental problem called the longest letter-duplicated subsequence (LLDS) is proposed. Given a sequence S of length n, a letter-duplicated subsequence is a subsequence of S in the form of x₁^{d₁}x₂^{d₂}⋯ x_k^{d_k} with x_i ∈ Σ, x_j≠ x_{j+1} and d_i ≥ 2 for all i in [k] and j in [k-1]. A linear time algorithm for computing the longest letter-duplicated subsequence (LLDS) of S can be easily obtained. In this paper, we focus on two variants of this problem. We first consider the constrained version when Σ is unbounded, each letter appears in S at least 6 times and all the letters in Σ must appear in the solution. We show that the problem is NP-hard (a further twist indicates that the problem does not admit any polynomial time approximation). The reduction is from possibly the simplest version of SAT that is NP-complete, (≤ 2,1, ≤ 3)-SAT, where each variable appears at most twice positively and exact once negatively, and each clause contains at most three literals and some clauses must contain exactly two literals. (We hope that this technique will serve as a general tool to help us proving the NP-hardness for some more tricky sequence problems involving only one sequence - much harder than with at least two input sequences, which we apply successfully at the end of the paper on some extra variations of the LLDS problem.) We then show that when each letter appears in S at most 3 times, then the problem admits a factor 1.5-O(1/n) approximation. Finally, we consider the weighted version, where the weight of a block x_i^{d_i} (d_i ≥ 2) could be any positive function which might not grow with d_i. We give a non-trivial O(n²) time dynamic programming algorithm for this version, i.e., computing an LD-subsequence of S whose weight is maximized.

Cite as

Wenfeng Lai, Adiesha Liyanage, Binhai Zhu, and Peng Zou. Beyond the Longest Letter-Duplicated Subsequence Problem. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 7:1-7:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{lai_et_al:LIPIcs.CPM.2022.7,
  author =	{Lai, Wenfeng and Liyanage, Adiesha and Zhu, Binhai and Zou, Peng},
  title =	{{Beyond the Longest Letter-Duplicated Subsequence Problem}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{7:1--7:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.7},
  URN =		{urn:nbn:de:0030-drops-161348},
  doi =		{10.4230/LIPIcs.CPM.2022.7},
  annote =	{Keywords: Segmental duplications, Tandem duplications, Longest common subsequence, NP-completeness, Dynamic programming}
}

Document

DOI: 10.4230/LIPIcs.CPM.2022.8

Reduction Ratio of the IS-Algorithm: Worst and Random Cases

Authors: Vincent Jugé

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

We study the IS-algorithm, a well-known linear-time algorithm for computing the suffix array of a word. This algorithm relies on transforming the input word w into another word, called the reduced word of w, that will be at least twice shorter; then, the algorithm recursively computes the suffix array of the reduced word. In this article, we study the reduction ratio of the IS-algorithm, i.e., the ratio between the lengths of the input word and the word obtained after reducing k times the input word. We investigate both worst cases, in which we find precise results, and random cases, where we prove some strong convergence phenomena. Finally, we prove that, if the input word is a randomly chosen word of length n, we should not expect much more than log(log(n)) recursive function calls.

Cite as

Vincent Jugé. Reduction Ratio of the IS-Algorithm: Worst and Random Cases. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 8:1-8:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{juge:LIPIcs.CPM.2022.8,
  author =	{Jug\'{e}, Vincent},
  title =	{{Reduction Ratio of the IS-Algorithm: Worst and Random Cases}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{8:1--8:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.8},
  URN =		{urn:nbn:de:0030-drops-161357},
  doi =		{10.4230/LIPIcs.CPM.2022.8},
  annote =	{Keywords: Word combinatorics, Suffix array, IS algorithm}
}

64 Search Results for "Bannai, Hideo"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Thanks for your feedback!

Could not send message