63 Search Results for "Inenaga, Shunsuke"


Volume

LIPIcs, Volume 296

35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)

CPM 2024, June 25-27, 2024, Fukuoka, Japan

Editors: Shunsuke Inenaga and Simon J. Puglisi

Document
Edit and Alphabet-Ordering Sensitivity of Lex-Parse

Authors: Yuto Nakashima, Dominik Köppl, Mitsuru Funakoshi, Shunsuke Inenaga, and Hideo Bannai

Published in: LIPIcs, Volume 306, 49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024)


Abstract
We investigate the compression sensitivity [Akagi et al., 2023] of lex-parse [Navarro et al., 2021] for two operations: (1) single character edit and (2) modification of the alphabet ordering, and give tight upper and lower bounds for both operations (i.e., we show Θ(log n) bounds for strings of length n). For both lower bounds, we use the family of Fibonacci words. For the bounds on edit operations, our analysis makes heavy use of properties of the Lyndon factorization of Fibonacci words to characterize the structure of lex-parse.

Cite as

Yuto Nakashima, Dominik Köppl, Mitsuru Funakoshi, Shunsuke Inenaga, and Hideo Bannai. Edit and Alphabet-Ordering Sensitivity of Lex-Parse. In 49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 306, pp. 75:1-75:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{nakashima_et_al:LIPIcs.MFCS.2024.75,
  author =	{Nakashima, Yuto and K\"{o}ppl, Dominik and Funakoshi, Mitsuru and Inenaga, Shunsuke and Bannai, Hideo},
  title =	{{Edit and Alphabet-Ordering Sensitivity of Lex-Parse}},
  booktitle =	{49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024)},
  pages =	{75:1--75:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-335-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{306},
  editor =	{Kr\'{a}lovi\v{c}, Rastislav and Ku\v{c}era, Anton{\'\i}n},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2024.75},
  URN =		{urn:nbn:de:0030-drops-206314},
  doi =		{10.4230/LIPIcs.MFCS.2024.75},
  annote =	{Keywords: Compression sensitivity, Lex-parse, Fibonacci words}
}
Document
Complete Volume
LIPIcs, Volume 296, CPM 2024, Complete Volume

Authors: Shunsuke Inenaga and Simon J. Puglisi

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)


Abstract
LIPIcs, Volume 296, CPM 2024, Complete Volume

Cite as

35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 1-472, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@Proceedings{inenaga_et_al:LIPIcs.CPM.2024,
  title =	{{LIPIcs, Volume 296, CPM 2024, Complete Volume}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{1--472},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024},
  URN =		{urn:nbn:de:0030-drops-201098},
  doi =		{10.4230/LIPIcs.CPM.2024},
  annote =	{Keywords: LIPIcs, Volume 296, CPM 2024, Complete Volume}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Shunsuke Inenaga and Simon J. Puglisi

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 0:i-0:xiv, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{inenaga_et_al:LIPIcs.CPM.2024.0,
  author =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{0:i--0:xiv},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.0},
  URN =		{urn:nbn:de:0030-drops-201101},
  doi =		{10.4230/LIPIcs.CPM.2024.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Computing the LCP Array of a Labeled Graph

Authors: Jarno N. Alanko, Davide Cenzato, Nicola Cotumaccio, Sung-Hwan Kim, Giovanni Manzini, and Nicola Prezza

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)


Abstract
The LCP array is an important tool in stringology, allowing to speed up pattern matching algorithms and enabling compact representations of the suffix tree. Recently, Conte et al. [DCC 2023] and Cotumaccio et al. [SPIRE 2023] extended the definition of this array to Wheeler DFAs and, ultimately, to arbitrary labeled graphs, proving that it can be used to efficiently solve matching statistics queries on the graph’s paths. In this paper, we provide the first efficient algorithm building the LCP array of a directed labeled graph with n nodes and m edges labeled over an alphabet of size σ. The first step is to transform the input graph G into a deterministic Wheeler pseudoforest G_{is} with O(n) edges encoding the lexicographically- smallest and largest strings entering in each node of the original graph. Using state-of-the-art algorithms, this step runs in O(min{mlog n, m+n²}) time on arbitrary labeled graphs, and in O(m) time on Wheeler DFAs. The LCP array of G stores the longest common prefixes between those strings, i.e. it can easily be derived from the LCP array of G_{is}. After arguing that the natural generalization of a compact-space LCP-construction algorithm by Beller et al. [J. Discrete Algorithms 2013] runs in time Ω(nσ) on pseudoforests, we present a new algorithm based on dynamic range stabbing building the LCP array of G_{is} in O(nlog σ) time and O(nlogσ) bits of working space. Combined with our reduction, we obtain the first efficient algorithm to build the LCP array of an arbitrary labeled graph. An implementation of our algorithm is publicly available at https://github.com/regindex/Labeled-Graph-LCP.

Cite as

Jarno N. Alanko, Davide Cenzato, Nicola Cotumaccio, Sung-Hwan Kim, Giovanni Manzini, and Nicola Prezza. Computing the LCP Array of a Labeled Graph. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 1:1-1:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{alanko_et_al:LIPIcs.CPM.2024.1,
  author =	{Alanko, Jarno N. and Cenzato, Davide and Cotumaccio, Nicola and Kim, Sung-Hwan and Manzini, Giovanni and Prezza, Nicola},
  title =	{{Computing the LCP Array of a Labeled Graph}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{1:1--1:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.1},
  URN =		{urn:nbn:de:0030-drops-201113},
  doi =		{10.4230/LIPIcs.CPM.2024.1},
  annote =	{Keywords: LCP array, Wheeler automata, prefix sorting, pattern matching, sorting}
}
Document
Reconstructing General Matching Graphs

Authors: Amihood Amir and Michael Itzhaki

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)


Abstract
The classical pattern matching paradigm is that of seeking occurrences of one string in another, where both strings are drawn from an alphabet set Σ. Motivated by many applications, algorithms were developed for pattern matching where the matching relation is not necessarily the "=" relation. Examples are pattern matching with "don't cares", approximate matching, less-than matching, Cartesian-tree matching, order preserving matching, parameterized matching, degenerate matching, function matching, and more. Some of the matchings above allow for efficient pattern matching algorithms, while others do not. Much work has not been done on categorization of the complexity of various string matching queries based on the type of matching. For example, when can exact matching be done fast? When can approximate matching be calculated fast? When can tandem or palindrome recognition be efficiently calculated? This paper defines the matching graph of a given string under a matching relation. We show that the type of graph affects various string algorithms. The matching graph can also be a tool for lower bounds. We provide a lower bound for finding palindromes in a general degenerate graph. We also show some results in recognizing the minimum alphabet required for reconstructing a string that presents a given matching graph.

Cite as

Amihood Amir and Michael Itzhaki. Reconstructing General Matching Graphs. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 2:1-2:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.CPM.2024.2,
  author =	{Amir, Amihood and Itzhaki, Michael},
  title =	{{Reconstructing General Matching Graphs}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{2:1--2:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.2},
  URN =		{urn:nbn:de:0030-drops-201120},
  doi =		{10.4230/LIPIcs.CPM.2024.2},
  annote =	{Keywords: Pattern Matching, Matching Graphs, Reconstruction, NP-hardness}
}
Document
Maintaining the Size of LZ77 on Semi-Dynamic Strings

Authors: Hideo Bannai, Panagiotis Charalampopoulos, and Jakub Radoszewski

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)


Abstract
We consider the problem of maintaining the size of the LZ77 factorization of a string S of length at most n under the following operations: (a) appending a given letter to S and (b) deleting the first letter of S. Our main result is an algorithm for this problem with amortized update time Õ(√n). As a corollary, we obtain an Õ(n√n)-time algorithm for computing the most LZ77-compressible rotation of a length-n string - a naive approach for this problem would compute the LZ77 factorization of each possible rotation and would thus take quadratic time in the worst case. We also show an Ω(√n) lower bound for the additive sensitivity of LZ77 with respect to the rotation operation. Our algorithm employs dynamic trees to maintain the longest-previous-factor array information and depends on periodicity-based arguments that bound the number of the required updates and enable their efficient computation.

Cite as

Hideo Bannai, Panagiotis Charalampopoulos, and Jakub Radoszewski. Maintaining the Size of LZ77 on Semi-Dynamic Strings. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 3:1-3:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2024.3,
  author =	{Bannai, Hideo and Charalampopoulos, Panagiotis and Radoszewski, Jakub},
  title =	{{Maintaining the Size of LZ77 on Semi-Dynamic Strings}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{3:1--3:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.3},
  URN =		{urn:nbn:de:0030-drops-201134},
  doi =		{10.4230/LIPIcs.CPM.2024.3},
  annote =	{Keywords: Lempel-Ziv, compression, LZ77, semi-dynamic algorithm, cyclic rotation}
}
Document
Internal Pattern Matching in Small Space and Applications

Authors: Gabriel Bathie, Panagiotis Charalampopoulos, and Tatiana Starikovskaya

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)


Abstract
In this work, we consider pattern matching variants in small space, that is, in the read-only setting, where we want to bound the space usage on top of storing the strings. Our main contribution is a space-time trade-off for the Internal Pattern Matching (IPM) problem, where the goal is to construct a data structure over a string S of length n that allows one to answer the following type of queries: Compute the occurrences of a fragment P of S inside another fragment T of S, provided that |T| < 2|P|. For any τ ∈ [1 . . n/log² n], we present a nearly-optimal Õ(n/τ)-size data structure that can be built in Õ(n) time using Õ(n/τ) extra space, and answers IPM queries in O(τ+log n log³ log n) time. IPM queries have been identified as a crucial primitive operation for the analysis of algorithms on strings. In particular, the complexities of several recent algorithms for approximate pattern matching are expressed with regards to the number of calls to a small set of primitive operations that include IPM queries; our data structure allows us to port these results to the small-space setting. We further showcase the applicability of our IPM data structure by using it to obtain space-time trade-offs for the longest common substring and circular pattern matching problems in the asymmetric streaming setting.

Cite as

Gabriel Bathie, Panagiotis Charalampopoulos, and Tatiana Starikovskaya. Internal Pattern Matching in Small Space and Applications. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 4:1-4:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{bathie_et_al:LIPIcs.CPM.2024.4,
  author =	{Bathie, Gabriel and Charalampopoulos, Panagiotis and Starikovskaya, Tatiana},
  title =	{{Internal Pattern Matching in Small Space and Applications}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{4:1--4:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.4},
  URN =		{urn:nbn:de:0030-drops-201148},
  doi =		{10.4230/LIPIcs.CPM.2024.4},
  annote =	{Keywords: internal pattern matching, longest common substring, small-space algorithms}
}
Document
Random Wheeler Automata

Authors: Ruben Becker, Davide Cenzato, Sung-Hwan Kim, Bojana Kodric, Riccardo Maso, and Nicola Prezza

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)


Abstract
Wheeler automata were introduced in 2017 as a tool to generalize existing indexing and compression techniques based on the Burrows-Wheeler transform. Intuitively, an automaton is said to be Wheeler if there exists a total order on its states reflecting the natural co-lexicographic order of the strings labeling the automaton’s paths; this property makes it possible to represent the automaton’s topology in a constant number of bits per transition, as well as efficiently solving pattern matching queries on its accepted regular language. After their introduction, Wheeler automata have been the subject of a prolific line of research, both from the algorithmic and language-theoretic points of view. A recurring issue faced in these studies is the lack of large datasets of Wheeler automata on which the developed algorithms and theories could be tested. One possible way to overcome this issue is to generate random Wheeler automata. Motivated by this observation of practical nature, in this paper we initiate the theoretical study of random Wheeler automata, focusing our attention on the deterministic case (Wheeler DFAs - WDFAs). We start by naturally extending the Erdős-Rényi random graph model to WDFAs, and proceed by providing an algorithm generating uniform WDFAs according to this model. Our algorithm generates a uniform WDFA with n states, m transitions, and alphabet’s cardinality σ in O(m) expected time (O(mlog m) time w.h.p.) and constant working space for all alphabets of size σ ≤ m/ln m. The output WDFA is streamed directly to the output. As a by-product, we also give formulas for the number of distinct WDFAs and obtain that nσ + (n - σ) log σ bits are necessary and sufficient to encode a WDFA with n states and alphabet of size σ, up to an additive Θ(n) term. We present an implementation of our algorithm and show that it is extremely fast in practice, with a throughput of over 8 million transitions per second.

Cite as

Ruben Becker, Davide Cenzato, Sung-Hwan Kim, Bojana Kodric, Riccardo Maso, and Nicola Prezza. Random Wheeler Automata. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 5:1-5:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{becker_et_al:LIPIcs.CPM.2024.5,
  author =	{Becker, Ruben and Cenzato, Davide and Kim, Sung-Hwan and Kodric, Bojana and Maso, Riccardo and Prezza, Nicola},
  title =	{{Random Wheeler Automata}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{5:1--5:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.5},
  URN =		{urn:nbn:de:0030-drops-201157},
  doi =		{10.4230/LIPIcs.CPM.2024.5},
  annote =	{Keywords: Wheeler automata, Burrows-Wheeler transform, random graphs}
}
Document
Connecting de Bruijn Graphs

Authors: Giulia Bernardini, Huiping Chen, Inge Li Gørtz, Christoffer Krogh, Grigorios Loukides, Solon P. Pissis, Leen Stougie, and Michelle Sweering

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)


Abstract
We study the problem of making a de Bruijn graph (dBG), constructed from a collection of strings, weakly connected while minimizing the total cost of edge additions. The input graph is a dBG that can be made weakly connected by adding edges (along with extra nodes if needed) from the underlying complete dBG. The problem arises from genome reconstruction, where the dBG is constructed from a set of sequences generated from a genome sample by a sequencing experiment. Due to sequencing errors, the dBG is never Eulerian in practice and is often not even weakly connected. We show the following results for a dBG G(V,E) of order k consisting of d weakly connected components: 1) Making G weakly connected by adding a set of edges of minimal total cost is NP-hard. 2) No PTAS exists for making G weakly connected by adding a set of edges of minimal total cost (unless the unique games conjecture fails). We complement this result by showing that there does exist a polynomial-time (2-2/d)-approximation algorithm for the problem. 3) We consider a restricted version of the above problem, where we are asked to make G weakly connected by only adding directed paths between pairs of components. We show that making G weakly connected by adding d-1 such paths of minimal total cost can be done in 𝒪(k|V|α(|V|)+|E|) time, where α(⋅) is the inverse Ackermann function. This improves on the 𝒪(k|V|log(|V|)+|E|)-time algorithm proposed by Bernardini et al. [CPM 2022] for the same restricted problem. 4) An ILP formulation of polynomial size for making G Eulerian with minimal total cost.

Cite as

Giulia Bernardini, Huiping Chen, Inge Li Gørtz, Christoffer Krogh, Grigorios Loukides, Solon P. Pissis, Leen Stougie, and Michelle Sweering. Connecting de Bruijn Graphs. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 6:1-6:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{bernardini_et_al:LIPIcs.CPM.2024.6,
  author =	{Bernardini, Giulia and Chen, Huiping and G{\o}rtz, Inge Li and Krogh, Christoffer and Loukides, Grigorios and Pissis, Solon P. and Stougie, Leen and Sweering, Michelle},
  title =	{{Connecting de Bruijn Graphs}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{6:1--6:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.6},
  URN =		{urn:nbn:de:0030-drops-201168},
  doi =		{10.4230/LIPIcs.CPM.2024.6},
  annote =	{Keywords: string algorithm, graph algorithm, de Bruijn graph, Eulerian graph}
}
Document
A Class of Heuristics for Reducing the Number of BWT-Runs in the String Ordering Problem

Authors: Gianmarco Bertola, Anthony J. Cox, Veronica Guerrini, and Giovanna Rosone

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)


Abstract
The Burrows-Wheeler transform (BWT) is a famous text transformation that rearranges the symbols of the input strings so that occurrences of a same symbol tend to occur in runs. The number of runs is an important parameter in the BWT output string, historically associated with its high compressibility and more recently used as a measure for the space complexity of efficient data structures. It is a known fact that reordering the strings in the input collection 𝒮 affects the number of runs in the output string bwt(𝒮) produced by applying the BWT to the string collection. In this paper, we define a class of transformed strings where symbols in particular blocks of the bwt(𝒮) can be reordered according to a different adaptive alphabet order. Then, we introduce new heuristics to reduce the number of runs in the BWT output of a string collection that improve on the two existing heuristics introduced in Cox et al. [Anthony J. Cox et al., 2012]. These new heuristics are computed when applying the BWT to a string collection assuming no a priori order on the input strings and without requiring any pre- and/or post- processing of the collection 𝒮 or of the BWT string. In this paper, we also face the problem of reconstructing the input collection 𝒮 from the string bwt(𝒮) together with the string permutation realized when applying an alphabetical reordering of symbols during the construction of bwt(𝒮).

Cite as

Gianmarco Bertola, Anthony J. Cox, Veronica Guerrini, and Giovanna Rosone. A Class of Heuristics for Reducing the Number of BWT-Runs in the String Ordering Problem. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 7:1-7:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{bertola_et_al:LIPIcs.CPM.2024.7,
  author =	{Bertola, Gianmarco and Cox, Anthony J. and Guerrini, Veronica and Rosone, Giovanna},
  title =	{{A Class of Heuristics for Reducing the Number of BWT-Runs in the String Ordering Problem}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{7:1--7:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.7},
  URN =		{urn:nbn:de:0030-drops-201179},
  doi =		{10.4230/LIPIcs.CPM.2024.7},
  annote =	{Keywords: Burrows-Wheeler Transform, SAP-interval, repetitive text, string compression}
}
Document
Faster Sliding Window String Indexing in Streams

Authors: Philip Bille, Paweł Gawrychowski, Inge Li Gørtz, and Simon R. Tarnow

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)


Abstract
The classical string indexing problem asks to preprocess the input string S for efficient pattern matching queries. Bille, Fischer, Gørtz, Pedersen, and Stordalen [CPM 2023] generalized this to the {streaming sliding window string indexing} problem, where the input string S arrives as a stream, and we are asked to maintain an index of the last w characters, called the window. Further, at any point in time, a pattern P might appear, again given as a stream, and all occurrences of P in the current window must be output. We require that the time to process each character of the text or the pattern is worst-case. It appears that standard string indexing structures, such as suffix trees, do not provide an efficient solution in such a setting, as to obtain a good worst-case bound, they necessarily need to work right-to-left, and we cannot reverse the pattern while keeping a worst-case guarantee on the time to process each of its characters. Nevertheless, it is possible to obtain a bound of 𝒪(log w) (with high probability) by maintaining a hierarchical structure of multiple suffix trees. We significantly improve this upper bound by designing a black-box reduction to maintain a suffix tree under prepending characters to the current text. By plugging in the known results, this allows us to obtain a bound of 𝒪(log log w +log log σ) (with high probability), where σ is the size of the alphabet. Further, we introduce an even more general problem, called the {streaming dynamic window string indexing}, where the goal is to maintain the current text under adding and deleting characters at either end and design a similar black-box reduction.

Cite as

Philip Bille, Paweł Gawrychowski, Inge Li Gørtz, and Simon R. Tarnow. Faster Sliding Window String Indexing in Streams. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 8:1-8:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.CPM.2024.8,
  author =	{Bille, Philip and Gawrychowski, Pawe{\l} and G{\o}rtz, Inge Li and Tarnow, Simon R.},
  title =	{{Faster Sliding Window String Indexing in Streams}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{8:1--8:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.8},
  URN =		{urn:nbn:de:0030-drops-201183},
  doi =		{10.4230/LIPIcs.CPM.2024.8},
  annote =	{Keywords: data structures, pattern matching, string indexing}
}
Document
Tight Bounds for Compressing Substring Samples

Authors: Philip Bille, Christian Mikkelsen Fuglsang, and Inge Li Gørtz

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)


Abstract
We consider the problem of compressing a set of substrings sampled from a string and analyzing the size of the compression. Given a string S of length n, and integers d and m where n ≥ m ≥ 2d > 0, let SCS(S, m, d) be the string obtained by sequentially concatenating substrings of length m sampled regularly at intervals of d starting at position 1 in S. We consider the size of the LZ77 parsing of SCS(S, m, d), in relation to the size of the LZ77 parsing of S. This is motivated by genome sequencing, where the mentioned sampling process is an idealization of the short-read DNA sequencing. We show the following upper bound: |LZ77(SCS(S, m, d))| ≤ |LZ77(S)| + 2(n-m)/d. We also give a lower bound showing that this is tight. This improves previous results by Badkobeh et al. [ICTCS 2022], and closes the open problem of whether their bound can be improved. Another natural question is whether assuming that all letters in S are part of a sample, it is always the case that |LZ77(S)| ≤ |LZ77(SCS(S, m, d))|. Surprisingly, we show that there is a family of strings such that |LZ77(SCS(S, m, d))| = |LZ77(S)| - 1.

Cite as

Philip Bille, Christian Mikkelsen Fuglsang, and Inge Li Gørtz. Tight Bounds for Compressing Substring Samples. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 9:1-9:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.CPM.2024.9,
  author =	{Bille, Philip and Fuglsang, Christian Mikkelsen and G{\o}rtz, Inge Li},
  title =	{{Tight Bounds for Compressing Substring Samples}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{9:1--9:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.9},
  URN =		{urn:nbn:de:0030-drops-201192},
  doi =		{10.4230/LIPIcs.CPM.2024.9},
  annote =	{Keywords: Compression, Algorithms, Lempel-Ziv}
}
Document
Searching 2D-Strings for Matching Frames

Authors: Itai Boneh, Dvir Fried, Shay Golan, Matan Kraus, Adrian Miclăuş, and Arseny Shur

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)


Abstract
We study a natural type of repetitions in 2-dimensional strings. Such a repetition, called a matching frame, is a rectangular substring of size at least 2× 2 with equal marginal rows and equal marginal columns. Matching frames first appeared in literature in the context of Wang tiles. We present two algorithms finding a matching frame with the maximum perimeter in a given n× m input string. The first algorithm solves the problem exactly in Õ(n^{2.5}) time (assuming n ≥ m). The second algorithm finds a (1-ε)-approximate solution in Õ((nm)/ε⁴) time, which is near linear in the size of the input for constant ε. In particular, by setting ε = O(1) the second algorithm decides the existence of a matching frame in a given string in Õ(nm) time. Some technical elements and structural properties used in these algorithms can be of independent interest.

Cite as

Itai Boneh, Dvir Fried, Shay Golan, Matan Kraus, Adrian Miclăuş, and Arseny Shur. Searching 2D-Strings for Matching Frames. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 10:1-10:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{boneh_et_al:LIPIcs.CPM.2024.10,
  author =	{Boneh, Itai and Fried, Dvir and Golan, Shay and Kraus, Matan and Micl\u{a}u\c{s}, Adrian and Shur, Arseny},
  title =	{{Searching 2D-Strings for Matching Frames}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{10:1--10:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.10},
  URN =		{urn:nbn:de:0030-drops-201205},
  doi =		{10.4230/LIPIcs.CPM.2024.10},
  annote =	{Keywords: 2D string, matching frame, LCP, multidimensional range query}
}
Document
Hairpin Completion Distance Lower Bound

Authors: Itai Boneh, Dvir Fried, Shay Golan, and Matan Kraus

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)


Abstract
Hairpin completion, derived from the hairpin formation observed in DNA biochemistry, is an operation applied to strings, particularly useful in DNA computing. Conceptually, a right hairpin completion operation transforms a string S into S⋅ S' where S' is the reverse complement of a prefix of S. Similarly, a left hairpin completion operation transforms a string S into S'⋅ S where S' is the reverse complement of a suffix of S. The hairpin completion distance from S to T is the minimum number of hairpin completion operations needed to transform S into T. Recently Boneh et al. [Itai Boneh et al., 2023] showed an O(n²) time algorithm for finding the hairpin completion distance between two strings of length at most n. In this paper we show that for any ε > 0 there is no O(n^{2-ε})-time algorithm for the hairpin completion distance problem unless the Strong Exponential Time Hypothesis (SETH) is false. Thus, under SETH, the time complexity of the hairpin completion distance problem is quadratic, up to sub-polynomial factors.

Cite as

Itai Boneh, Dvir Fried, Shay Golan, and Matan Kraus. Hairpin Completion Distance Lower Bound. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 11:1-11:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{boneh_et_al:LIPIcs.CPM.2024.11,
  author =	{Boneh, Itai and Fried, Dvir and Golan, Shay and Kraus, Matan},
  title =	{{Hairpin Completion Distance Lower Bound}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{11:1--11:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.11},
  URN =		{urn:nbn:de:0030-drops-201215},
  doi =		{10.4230/LIPIcs.CPM.2024.11},
  annote =	{Keywords: Fine-grained complexity, Hairpin completion, LCS}
}
  • Refine by Author
  • 34 Inenaga, Shunsuke
  • 24 Bannai, Hideo
  • 21 Takeda, Masayuki
  • 13 Nakashima, Yuto
  • 7 Mieno, Takuya
  • Show More...

  • Refine by Classification

  • Refine by Keyword
  • 6 suffix trees
  • 5 pattern matching
  • 5 string algorithms
  • 3 DAWGs
  • 3 Fibonacci words
  • Show More...

  • Refine by Type
  • 62 document
  • 1 volume

  • Refine by Publication Year
  • 32 2024
  • 7 2016
  • 5 2017
  • 5 2018
  • 5 2019
  • Show More...

Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail