LIPIcs, Volume 128

30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)



Thumbnail PDF

Event

CPM 2019, June 18-20, 2019, Pisa, Italy

Editors

Nadia Pisanti
  • University of Pisa, Italy
Solon P. Pissis
  • CWI Amsterdam, the Netherlands

Publication Details

  • published at: 2019-06-06
  • Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
  • ISBN: 978-3-95977-103-0
  • DBLP: db/conf/cpm/cpm2019

Access Numbers

Documents

No documents found matching your filter selection.
Document
Complete Volume
LIPIcs, Volume 128, CPM'19, Complete Volume

Authors: Nadia Pisanti and Solon P. Pissis


Abstract
LIPIcs, Volume 128, CPM'19, Complete Volume

Cite as

30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@Proceedings{pisanti_et_al:LIPIcs.CPM.2019,
  title =	{{LIPIcs, Volume 128, CPM'19, Complete Volume}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019},
  URN =		{urn:nbn:de:0030-drops-105577},
  doi =		{10.4230/LIPIcs.CPM.2019},
  annote =	{Keywords: Mathematics of computing, Discrete mathematics, Applied computing, Computational biology, Information theory, Information systems}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Nadia Pisanti and Solon P. Pissis


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 0:i-0:xviii, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{pisanti_et_al:LIPIcs.CPM.2019.0,
  author =	{Pisanti, Nadia and P. Pissis, Solon},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{0:i--0:xviii},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.0},
  URN =		{urn:nbn:de:0030-drops-104719},
  doi =		{10.4230/LIPIcs.CPM.2019.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Invited Talk
How to Exploit Periodicity (Invited Talk)

Authors: Paweł Gawrychowski


Abstract
Periodicity is a fundamental combinatorial property of strings. We say that p is a period of a string s[1..n] when s[i]=s[i+p] for every i such that both s[i] and s[i+p] are defined. While this notion is interesting on its own, it can be often used as a tool for designing efficient algorithms. At a high level, such algorithms often operate differently depending on whether a given string does or does not have a small period, where small usually means smaller than half of its length (or, say, quarter). In other words, we design an algorithm that is efficient if the given string is repetitive, and another algorithm that is efficient if the given string is non-repetitive, in every case carefully exploiting either the periodicity or the fact that input looks sufficiently “random”, and then choose the appropriate algorithm depending on the input. Of course, in some cases, one needs to proceed in a more complex manner, for example by classifying the whole string look at its substrings and process each of them differently depending on its structure. I will survey results, mostly connected to different version of pattern matching, that are based on this paradigm. This will include the recent generalization of periodicity that can be applied in approximate pattern matching, and some examples of how the notion of periodicity can be applied to design a better data structure.

Cite as

Paweł Gawrychowski. How to Exploit Periodicity (Invited Talk). In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, p. 1:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{gawrychowski:LIPIcs.CPM.2019.1,
  author =	{Gawrychowski, Pawe{\l}},
  title =	{{How to Exploit Periodicity}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{1:1--1:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.1},
  URN =		{urn:nbn:de:0030-drops-104727},
  doi =		{10.4230/LIPIcs.CPM.2019.1},
  annote =	{Keywords: periodicity, pattern matching, Hamming distance}
}
Document
Invited Talk
Some Variations on Lyndon Words (Invited Talk)

Authors: Francesco Dolce, Antonio Restivo, and Christophe Reutenauer


Abstract
In this paper we compare two finite words u and v by the lexicographical order of the infinite words u^omega and v^omega. Informally, we say that we compare u and v by the infinite order. We show several properties of Lyndon words expressed using this infinite order. The innovative aspect of this approach is that it allows to take into account also non trivial conditions on the prefixes of a word, instead that only on the suffixes. In particular, we derive a result of Ufnarovskij [V. Ufnarovskij, Combinatorial and asymptotic methods in algebra, 1995] that characterizes a Lyndon word as a word which is greater, with respect to the infinite order, than all its prefixes. Motivated by this result, we introduce the prefix standard permutation of a Lyndon word and the corresponding (left) Cartesian tree. We prove that the left Cartesian tree is equal to the left Lyndon tree, defined by the left standard factorization of Viennot [G. Viennot, Algèbres de Lie libres et monoïdes libres, 1978]. This result is dual with respect to a theorem of Hohlweg and Reutenauer [C. Hohlweg and C. Reutenauer, Lyndon words, permutations and trees, 2003].

Cite as

Francesco Dolce, Antonio Restivo, and Christophe Reutenauer. Some Variations on Lyndon Words (Invited Talk). In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 2:1-2:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{dolce_et_al:LIPIcs.CPM.2019.2,
  author =	{Dolce, Francesco and Restivo, Antonio and Reutenauer, Christophe},
  title =	{{Some Variations on Lyndon Words}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{2:1--2:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.2},
  URN =		{urn:nbn:de:0030-drops-104732},
  doi =		{10.4230/LIPIcs.CPM.2019.2},
  annote =	{Keywords: Lyndon words, Infinite words, Left Lyndon trees, Left Cartesian trees}
}
Document
Invited Talk
Stringology Combats Microbiological Threats (Invited Talk)

Authors: Michal Ziv-Ukelson


Abstract
A major concern worldwide is the acquisition of antibiotic resistance by pathogenic bacteria. Genomic elements carrying resistance and virulence function can be acquired through horizontal gene transfer, yielding a broad spread of evolutionary successful elements, both within and in between species, with devastating effect. Recent advances in pyrosequencing techniques, combined with global efforts to study microbial adaptation to a wide range of ecological niches (and in particular to life in host tissues that we perceive as pathogenesis), yield huge and rapidly-growing databases of microbial genomes. This big new data statistically empowers genomic-context based approaches to functional analysis: the idea is that groups of genes that are clustered locally together across many genomes usually express protein products that interact in the same biological pathway, and thus the function of a new, uncharacterized gene can be deciphered based on the previously characterized genes that are co-localized with it in the same gene cluster. Identifying and interpreting microbial gene context in huge genomic data requires efficient string-based data mining algorithms. Additionally, new computational challenges are raised by the need to study the grammar and evolutionary spreading patterns of microbial gene context. In this talk, we will review some classical combinatorial pattern matching and data mining problems, previously inspired by this application domain. We will re-examine the biological assumptions behind the previously proposed models in light of some new biological observations. We will consider the computational challenges arising in accomodating the new biological observations, and in exploiting them to scale up the algorithmic solutions to the huge new data. Our goal is to inspire interesting new problems that harness Stringology to the study of microbial adaptation and to the fight against microbiological threats ...

Cite as

Michal Ziv-Ukelson. Stringology Combats Microbiological Threats (Invited Talk). In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, p. 3:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{zivukelson:LIPIcs.CPM.2019.3,
  author =	{Ziv-Ukelson, Michal},
  title =	{{Stringology Combats Microbiological Threats}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{3:1--3:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.3},
  URN =		{urn:nbn:de:0030-drops-104748},
  doi =		{10.4230/LIPIcs.CPM.2019.3},
  annote =	{Keywords: comparative genomics, syntenic blocks, gene clusters, reconciliation of gene and species trees}
}
Document
Optimal Rank and Select Queries on Dictionary-Compressed Text

Authors: Nicola Prezza


Abstract
We study the problem of supporting queries on a string S of length n within a space bounded by the size gamma of a string attractor for S. In the paper introducing string attractors it was shown that random access on S can be supported in optimal O(log(n/gamma)/log log n) time within O(gamma polylog n) space. In this paper, we extend this result to rank and select queries and provide lower bounds matching our upper bounds on alphabets of polylogarithmic size. Our solutions are given in the form of a space-time trade-off that is more general than the one previously known for grammars and that improves existing bounds on LZ77-compressed text by a log log n time-factor in select queries. We also provide matching lower and upper bounds for partial sum and predecessor queries within attractor-bounded space, and extend our lower bounds to encompass navigation of dictionary-compressed tree representations.

Cite as

Nicola Prezza. Optimal Rank and Select Queries on Dictionary-Compressed Text. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 4:1-4:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{prezza:LIPIcs.CPM.2019.4,
  author =	{Prezza, Nicola},
  title =	{{Optimal Rank and Select Queries on Dictionary-Compressed Text}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{4:1--4:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.4},
  URN =		{urn:nbn:de:0030-drops-104756},
  doi =		{10.4230/LIPIcs.CPM.2019.4},
  annote =	{Keywords: Rank, Select, Dictionary compression, String Attractors}
}
Document
A 2-Approximation Algorithm for the Complementary Maximal Strip Recovery Problem

Authors: Haitao Jiang, Jiong Guo, Daming Zhu, and Binhai Zhu


Abstract
The Maximal Strip Recovery problem (MSR) and its complementary (CMSR) are well-studied NP-hard problems in computational genomics. The input of these dual problems are two signed permutations. The goal is to delete some gene markers from both permutations, such that, in the remaining permutations, each gene marker has at least one common neighbor. Equivalently, the resulting permutations could be partitioned into common strips of length at least two. Then MSR is to maximize the number of remaining genes, while the objective of CMSR is to delete the minimum number of gene markers. In this paper, we present a new approximation algorithm for the Complementary Maximal Strip Recovery (CMSR) problem. Our approximation factor is 2, improving the currently best 7/3-approximation algorithm. Although the improvement on the factor is not huge, the analysis is greatly simplified by a compensating method, commonly referred to as the non-oblivious local search technique. In such a method a substitution may not always increase the value of the current solution (it sometimes may even decrease the solution value), though it always improves the value of another function seemingly unrelated to the objective function.

Cite as

Haitao Jiang, Jiong Guo, Daming Zhu, and Binhai Zhu. A 2-Approximation Algorithm for the Complementary Maximal Strip Recovery Problem. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 5:1-5:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{jiang_et_al:LIPIcs.CPM.2019.5,
  author =	{Jiang, Haitao and Guo, Jiong and Zhu, Daming and Zhu, Binhai},
  title =	{{A 2-Approximation Algorithm for the Complementary Maximal Strip Recovery Problem}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{5:1--5:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.5},
  URN =		{urn:nbn:de:0030-drops-104769},
  doi =		{10.4230/LIPIcs.CPM.2019.5},
  annote =	{Keywords: Maximal strip recovery, complementary maximal strip recovery, computational genomics, approximation algorithm, local search}
}
Document
Sufficient Conditions for Efficient Indexing Under Different Matchings

Authors: Amihood Amir and Eitan Kondratovsky


Abstract
The most important task derived from the massive digital data accumulation in the world, is efficient access to this data, hence the importance of indexing. In the last decade, many different types of matching relations were defined, each requiring an efficient indexing scheme. Cole and Hariharan in a ground breaking paper [Cole and Hariharan, SIAM J. Comput., 33(1):26–42, 2003], formulate sufficient conditions for building an efficient indexing for quasi-suffix collections, collections that behave as suffixes. It was shown that known matchings, including parameterized, 2-D array and order preserving matchings, fit their indexing settings. In this paper, we formulate more basic sufficient conditions based on the order relation derived from the matching relation itself, our conditions are more general than the previously known conditions.

Cite as

Amihood Amir and Eitan Kondratovsky. Sufficient Conditions for Efficient Indexing Under Different Matchings. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 6:1-6:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.CPM.2019.6,
  author =	{Amir, Amihood and Kondratovsky, Eitan},
  title =	{{Sufficient Conditions for Efficient Indexing Under Different Matchings}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{6:1--6:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.6},
  URN =		{urn:nbn:de:0030-drops-104773},
  doi =		{10.4230/LIPIcs.CPM.2019.6},
  annote =	{Keywords: off-the-shelf indexing algorithms, general matching relations, weaker sufficient conditions for indexing}
}
Document
Space-Efficient Computation of the LCP Array from the Burrows-Wheeler Transform

Authors: Nicola Prezza and Giovanna Rosone


Abstract
We show that the Longest Common Prefix Array of a text collection of total size n on alphabet [1,sigma] can be computed from the Burrows-Wheeler transformed collection in O(n log sigma) time using o(n log sigma) bits of working space on top of the input and output. Our result improves (on small alphabets) and generalizes (to string collections) the previous solution from Beller et al., which required O(n) bits of extra working space. We also show how to merge the BWTs of two collections of total size n within the same time and space bounds. The procedure at the core of our algorithms can be used to enumerate suffix tree intervals in succinct space from the BWT, which is of independent interest. An engineered implementation of our first algorithm on DNA alphabet induces the LCP of a large (16 GiB) collection of short (100 bases) reads at a rate of 2.92 megabases per second using in total 1.5 Bytes per base in RAM. Our second algorithm merges the BWTs of two short-reads collections of 8 GiB each at a rate of 1.7 megabases per second and uses 0.625 Bytes per base in RAM. An extension of this algorithm that computes also the LCP array of the merged collection processes the data at a rate of 1.48 megabases per second and uses 1.625 Bytes per base in RAM.

Cite as

Nicola Prezza and Giovanna Rosone. Space-Efficient Computation of the LCP Array from the Burrows-Wheeler Transform. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 7:1-7:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{prezza_et_al:LIPIcs.CPM.2019.7,
  author =	{Prezza, Nicola and Rosone, Giovanna},
  title =	{{Space-Efficient Computation of the LCP Array from the Burrows-Wheeler Transform}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{7:1--7:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.7},
  URN =		{urn:nbn:de:0030-drops-104782},
  doi =		{10.4230/LIPIcs.CPM.2019.7},
  annote =	{Keywords: Burrows-Wheeler Transform, LCP array, DNA reads}
}
Document
Safe and Complete Algorithms for Dynamic Programming Problems, with an Application to RNA Folding

Authors: Niko Kiirala, Leena Salmela, and Alexandru I. Tomescu


Abstract
Many bioinformatics problems admit a large number of solutions, with no way of distinguishing the correct one among them. One approach of coping with this issue is to look at the partial solutions common to all solutions. Such partial solutions have been called safe, and an algorithm outputting all safe solutions has been called safe and complete. In this paper we develop a general technique that automatically provides a safe and complete algorithm to problems solvable by dynamic programming. We illustrate it by applying it to the bioinformatics problem of RNA folding, assuming the simplistic folding model maximizing the number of paired bases. Our safe and complete algorithm has time complexity O(n^3M(n)) and space complexity O(n^3) where n is the length of the RNA sequence and M(n) in Omega(n) is the time complexity of arithmetic operations on O(n)-bit integers. We also implement this algorithm and show that, despite an exponential number of optimal solutions, our algorithm is efficient in practice.

Cite as

Niko Kiirala, Leena Salmela, and Alexandru I. Tomescu. Safe and Complete Algorithms for Dynamic Programming Problems, with an Application to RNA Folding. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 8:1-8:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{kiirala_et_al:LIPIcs.CPM.2019.8,
  author =	{Kiirala, Niko and Salmela, Leena and Tomescu, Alexandru I.},
  title =	{{Safe and Complete Algorithms for Dynamic Programming Problems, with an Application to RNA Folding}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{8:1--8:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.8},
  URN =		{urn:nbn:de:0030-drops-104794},
  doi =		{10.4230/LIPIcs.CPM.2019.8},
  annote =	{Keywords: RNA secondary structure, RNA folding, Safe solution, Safe and complete algorithm, Counting problem}
}
Document
Conversion from RLBWT to LZ77

Authors: Takaaki Nishimoto and Yasuo Tabei


Abstract
Converting a compressed format of a string into another compressed format without an explicit decompression is one of the central research topics in string processing. We discuss the problem of converting the run-length Burrows-Wheeler Transform (RLBWT) of a string into Lempel-Ziv 77 (LZ77) phrases of the reversed string. The first results with Policriti and Prezza’s conversion algorithm [Algorithmica 2018] were O(n log r) time and O(r) working space for length of the string n, number of runs r in the RLBWT, and number of LZ77 phrases z. Recent results with Kempa’s conversion algorithm [SODA 2019] are O(n / log n + r log^{9} n + z log^{9} n) time and O(n / log_{sigma} n + r log^{8} n) working space for the alphabet size sigma of the RLBWT. In this paper, we present a new conversion algorithm by improving Policriti and Prezza’s conversion algorithm where dynamic data structures for general purpose are used. We argue that these dynamic data structures can be replaced and present new data structures for faster conversion. The time and working space of our conversion algorithm with new data structures are O(n min{log log n, sqrt{(log r)/(log log r)}}) and O(r), respectively.

Cite as

Takaaki Nishimoto and Yasuo Tabei. Conversion from RLBWT to LZ77. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 9:1-9:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{nishimoto_et_al:LIPIcs.CPM.2019.9,
  author =	{Nishimoto, Takaaki and Tabei, Yasuo},
  title =	{{Conversion from RLBWT to LZ77}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{9:1--9:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.9},
  URN =		{urn:nbn:de:0030-drops-104807},
  doi =		{10.4230/LIPIcs.CPM.2019.9},
  annote =	{Keywords: Burrows-Wheeler Transform, Lempel-Ziv Parsing, Lossless Data Compression}
}
Document
Fully-Functional Bidirectional Burrows-Wheeler Indexes and Infinite-Order De Bruijn Graphs

Authors: Djamal Belazzougui and Fabio Cunial


Abstract
Given a string T on an alphabet of size sigma, we describe a bidirectional Burrows-Wheeler index that takes O(|T| log sigma) bits of space, and that supports the addition and removal of one character, on the left or right side of any substring of T, in constant time. Previously known data structures that used the same space allowed constant-time addition to any substring of T, but they could support removal only from specific substrings of T. We also describe an index that supports bidirectional addition and removal in O(log log |T|) time, and that takes a number of words proportional to the number of left and right extensions of the maximal repeats of T. We use such fully-functional indexes to implement bidirectional, frequency-aware, variable-order de Bruijn graphs with no upper bound on their order, and supporting natural criteria for increasing and decreasing the order during traversal.

Cite as

Djamal Belazzougui and Fabio Cunial. Fully-Functional Bidirectional Burrows-Wheeler Indexes and Infinite-Order De Bruijn Graphs. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 10:1-10:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{belazzougui_et_al:LIPIcs.CPM.2019.10,
  author =	{Belazzougui, Djamal and Cunial, Fabio},
  title =	{{Fully-Functional Bidirectional Burrows-Wheeler Indexes and Infinite-Order De Bruijn Graphs}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{10:1--10:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.10},
  URN =		{urn:nbn:de:0030-drops-104811},
  doi =		{10.4230/LIPIcs.CPM.2019.10},
  annote =	{Keywords: BWT, suffix tree, CDAWG, de Bruijn graph, maximal repeat, string depth, contraction, bidirectional index}
}
Document
Entropy Lower Bounds for Dictionary Compression

Authors: Michał Gańczorz


Abstract
We show that a wide class of dictionary compression methods (including LZ77, LZ78, grammar compressors as well as parsing-based structures) require |S|H_k(S) + Omega (|S|k log sigma/log_sigma |S|) bits to encode their output. This matches known upper bounds and improves the information-theoretic lower bound of |S|H_k(S). To this end, we abstract the crucial properties of parsings created by those methods, construct a certain family of strings and analyze the parsings of those strings. We also show that for k = alpha log_sigma |S|, where 0 < alpha < 1 is a constant, the aforementioned methods produce an output of size at least 1/(1-alpha)|S|H_k(S) bits. Thus our results separate dictionary compressors from context-based one (such as PPM) and BWT-based ones, as the those include methods achieving |S|H_k(S) + O(sigma^k log sigma) bits, i.e. the redundancy depends on k and sigma but not on |S|.

Cite as

Michał Gańczorz. Entropy Lower Bounds for Dictionary Compression. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 11:1-11:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{ganczorz:LIPIcs.CPM.2019.11,
  author =	{Ga\'{n}czorz, Micha{\l}},
  title =	{{Entropy Lower Bounds for Dictionary Compression}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{11:1--11:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.11},
  URN =		{urn:nbn:de:0030-drops-104822},
  doi =		{10.4230/LIPIcs.CPM.2019.11},
  annote =	{Keywords: compression, empirical entropy, parsing, lower bounds}
}
Document
A New Class of Searchable and Provably Highly Compressible String Transformations

Authors: Raffaele Giancarlo, Giovanni Manzini, Giovanna Rosone, and Marinella Sciortino


Abstract
The Burrows-Wheeler Transform is a string transformation that plays a fundamental role for the design of self-indexing compressed data structures. Over the years, researchers have successfully extended this transformation outside the domains of strings. However, efforts to find non-trivial alternatives of the original, now 25 years old, Burrows-Wheeler string transformation have met limited success. In this paper we bring new lymph to this area by introducing a whole new family of transformations that have all the "myriad virtues" of the BWT: they can be computed and inverted in linear time, they produce provably highly compressible strings, and they support linear time pattern search directly on the transformed string. This new family is a special case of a more general class of transformations based on context adaptive alphabet orderings, a concept introduced here. This more general class includes also the Alternating BWT, another invertible string transforms recently introduced in connection with a generalization of Lyndon words.

Cite as

Raffaele Giancarlo, Giovanni Manzini, Giovanna Rosone, and Marinella Sciortino. A New Class of Searchable and Provably Highly Compressible String Transformations. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 12:1-12:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{giancarlo_et_al:LIPIcs.CPM.2019.12,
  author =	{Giancarlo, Raffaele and Manzini, Giovanni and Rosone, Giovanna and Sciortino, Marinella},
  title =	{{A New Class of Searchable and Provably Highly Compressible String Transformations}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{12:1--12:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.12},
  URN =		{urn:nbn:de:0030-drops-104833},
  doi =		{10.4230/LIPIcs.CPM.2019.12},
  annote =	{Keywords: Data Indexing and Compression, Burrows-Wheeler Transformation, Combinatorics on Words}
}
Document
Compressed Multiple Pattern Matching

Authors: Dmitry Kosolobov and Nikita Sivukhin


Abstract
Given d strings over the alphabet {0,1,...,sigma{-}1}, the classical Aho - Corasick data structure allows us to find all occ occurrences of the strings in any text T in O(|T| + occ) time using O(m log m) bits of space, where m is the number of edges in the trie containing the strings. Fix any constant epsilon in (0, 2). We describe a compressed solution for the problem that, provided sigma <=m^delta for a constant delta < 1, works in O(|T| 1/epsilon log(1/epsilon) + occ) time, which is O(|T| + occ) since epsilon is constant, and occupies mH_k + 1.443 m + epsilon m + O(d log m/d) bits of space, for all 0 <= k <= max{0,alpha log_sigma m - 2} simultaneously, where alpha in (0,1) is an arbitrary constant and H_k is the kth-order empirical entropy of the trie. Hence, we reduce the 3.443m term in the space bounds of previously best succinct solutions to (1.443 + epsilon)m, thus solving an open problem posed by Belazzougui. Further, we notice that L = log binom{sigma (m+1)}{m} - O(log(sigma m)) is a worst-case space lower bound for any solution of the problem and, for d = o(m) and constant epsilon, our approach allows to achieve L + epsilon m bits of space, which gives an evidence that, for d = o(m), the space of our data structure is theoretically optimal up to the epsilon m additive term and it is hardly possible to eliminate the term 1.443m. In addition, we refine the space analysis of previous works by proposing a more appropriate definition for H_k. We also simplify the construction for practice adapting the fixed block compression boosting technique, then implement our data structure, and conduct a number of experiments showing that it is comparable to the state of the art in terms of time and is superior in space.

Cite as

Dmitry Kosolobov and Nikita Sivukhin. Compressed Multiple Pattern Matching. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 13:1-13:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{kosolobov_et_al:LIPIcs.CPM.2019.13,
  author =	{Kosolobov, Dmitry and Sivukhin, Nikita},
  title =	{{Compressed Multiple Pattern Matching}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{13:1--13:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.13},
  URN =		{urn:nbn:de:0030-drops-104847},
  doi =		{10.4230/LIPIcs.CPM.2019.13},
  annote =	{Keywords: multiple pattern matching, compressed space, Aho--Corasick automaton}
}
Document
Hamming Distance Completeness

Authors: Karim Labib, Przemysław Uznański, and Daniel Wolleb-Graf


Abstract
We show, given a binary integer function diamond that is piecewise polynomial, that (+,diamond) vector products are equivalent under one-to-polylog reductions to the computation of the Hamming distance. Examples include the dominance and l_{2p+1} distances for constant p. Our results imply equivalence (up to polylog factors) between the complexity of computing All Pairs Hamming Distance, All Pairs l_{2p+1} Distance and Dominance Matrix Product, and equivalence between Hamming Distance Pattern Matching, l_{2p+1} Pattern Matching and Less-Than Pattern Matching. The resulting algorithms for l_{2p+1} Pattern Matching and All Pairs l_{2p+1}, for 2p+1 = 3,5,7,... are likely to be optimal, given lack of progress in improving upper bounds for Hamming distance in the past 30 years. While reductions between selected pairs of products were presented in the past, our work is the first to generalize them to a general class of functions, showing that a wide class of "intermediate" complexity problems are in fact equivalent.

Cite as

Karim Labib, Przemysław Uznański, and Daniel Wolleb-Graf. Hamming Distance Completeness. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 14:1-14:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{labib_et_al:LIPIcs.CPM.2019.14,
  author =	{Labib, Karim and Uzna\'{n}ski, Przemys{\l}aw and Wolleb-Graf, Daniel},
  title =	{{Hamming Distance Completeness}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{14:1--14:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.14},
  URN =		{urn:nbn:de:0030-drops-104853},
  doi =		{10.4230/LIPIcs.CPM.2019.14},
  annote =	{Keywords: fine grained complexity, approximate pattern matching, matrix products}
}
Document
Approximating Approximate Pattern Matching

Authors: Jan Studený and Przemysław Uznański


Abstract
Given a text T of length n and a pattern P of length m, the approximate pattern matching problem asks for computation of a particular distance function between P and every m-substring of T. We consider a (1 +/- epsilon) multiplicative approximation variant of this problem, for l_p distance function. In this paper, we describe two (1+epsilon)-approximate algorithms with a runtime of O~(n/epsilon) for all (constant) non-negative values of p. For constant p >= 1 we show a deterministic (1+epsilon)-approximation algorithm. Previously, such run time was known only for the case of l_1 distance, by Gawrychowski and Uznański [ICALP 2018] and only with a randomized algorithm. For constant 0 <= p <= 1 we show a randomized algorithm for the l_p, thereby providing a smooth tradeoff between algorithms of Kopelowitz and Porat [FOCS 2015, SOSA 2018] for Hamming distance (case of p=0) and of Gawrychowski and Uznański for l_1 distance.

Cite as

Jan Studený and Przemysław Uznański. Approximating Approximate Pattern Matching. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 15:1-15:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{studeny_et_al:LIPIcs.CPM.2019.15,
  author =	{Studen\'{y}, Jan and Uzna\'{n}ski, Przemys{\l}aw},
  title =	{{Approximating Approximate Pattern Matching}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{15:1--15:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.15},
  URN =		{urn:nbn:de:0030-drops-104865},
  doi =		{10.4230/LIPIcs.CPM.2019.15},
  annote =	{Keywords: Approximate Pattern Matching, l\underlinep Distance, l\underline1 Distance, Hamming Distance, Approximation Algorithms}
}
Document
Cartesian Tree Matching and Indexing

Authors: Sung Gwan Park, Amihood Amir, Gad M. Landau, and Kunsoo Park


Abstract
We introduce a new metric of match, called Cartesian tree matching, which means that two strings match if they have the same Cartesian trees. Based on Cartesian tree matching, we define single pattern matching for a text of length n and a pattern of length m, and multiple pattern matching for a text of length n and k patterns of total length m. We present an O(n+m) time algorithm for single pattern matching, and an O((n+m) log k) deterministic time or O(n+m) randomized time algorithm for multiple pattern matching. We also define an index data structure called Cartesian suffix tree, and present an O(n) randomized time algorithm to build the Cartesian suffix tree. Our efficient algorithms for Cartesian tree matching use a representation of the Cartesian tree, called the parent-distance representation.

Cite as

Sung Gwan Park, Amihood Amir, Gad M. Landau, and Kunsoo Park. Cartesian Tree Matching and Indexing. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 16:1-16:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{park_et_al:LIPIcs.CPM.2019.16,
  author =	{Park, Sung Gwan and Amir, Amihood and Landau, Gad M. and Park, Kunsoo},
  title =	{{Cartesian Tree Matching and Indexing}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{16:1--16:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.16},
  URN =		{urn:nbn:de:0030-drops-104879},
  doi =		{10.4230/LIPIcs.CPM.2019.16},
  annote =	{Keywords: Cartesian tree matching, Pattern matching, Indexing, Parent-distance representation}
}
Document
Indexing the Bijective BWT

Authors: Hideo Bannai, Juha Kärkkäinen, Dominik Köppl, and Marcin Pia̧tkowski


Abstract
The Burrows-Wheeler transform (BWT) is a permutation whose applications are prevalent in data compression and text indexing. The bijective BWT is a bijective variant of it that has not yet been studied for text indexing applications. We fill this gap by proposing a self-index built on the bijective BWT . The self-index applies the backward search technique of the FM-index to find a pattern P with O(|P| lg|P|) backward search steps.

Cite as

Hideo Bannai, Juha Kärkkäinen, Dominik Köppl, and Marcin Pia̧tkowski. Indexing the Bijective BWT. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 17:1-17:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2019.17,
  author =	{Bannai, Hideo and K\"{a}rkk\"{a}inen, Juha and K\"{o}ppl, Dominik and Pia̧tkowski, Marcin},
  title =	{{Indexing the Bijective BWT}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{17:1--17:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.17},
  URN =		{urn:nbn:de:0030-drops-104887},
  doi =		{10.4230/LIPIcs.CPM.2019.17},
  annote =	{Keywords: Burrows-Wheeler Transform, Lyndon words, Text Indexing}
}
Document
On Maximal Repeats in Compressed Strings

Authors: Julian Pape-Lange


Abstract
This paper presents and proves a new non-trivial upper bound on the number of maximal repeats of compressed strings. Using Theorem 1 of Raffinot’s article "On Maximal Repeats in Strings", this upper bound can be directly translated into an upper bound on the number of nodes in the Compacted Directed Acyclic Word Graphs of compressed strings. More formally, this paper proves that the number of maximal repeats in a string with z (self-referential) LZ77-factors and without q-th powers is at most 3q(z+1)^3-2. Also, this paper proves that for 2000 <= z <= q this upper bound is tight up to a constant factor.

Cite as

Julian Pape-Lange. On Maximal Repeats in Compressed Strings. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 18:1-18:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{papelange:LIPIcs.CPM.2019.18,
  author =	{Pape-Lange, Julian},
  title =	{{On Maximal Repeats in Compressed Strings}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{18:1--18:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.18},
  URN =		{urn:nbn:de:0030-drops-104898},
  doi =		{10.4230/LIPIcs.CPM.2019.18},
  annote =	{Keywords: Maximal repeats, Combinatorics on compressed strings, LZ77, Compact suffix automata, CDAWGs}
}
Document
Dichotomic Selection on Words: A Probabilistic Analysis

Authors: Ali Akhavi, Julien Clément, Dimitri Darthenay, Loïck Lhote, and Brigitte Vallée


Abstract
The paper studies the behaviour of selection algorithms that are based on dichotomy principles. On the entry formed by an ordered list L and a searched element x not in L, they return the interval of the list L the element x belongs to. We focus here on the case of words, where dichotomy principles lead to a selection algorithm designed by Crochemore, Hancart and Lecroq, which appears to be "quasi-optimal". We perform a probabilistic analysis of this algorithm that exhibits its quasi-optimality on average.

Cite as

Ali Akhavi, Julien Clément, Dimitri Darthenay, Loïck Lhote, and Brigitte Vallée. Dichotomic Selection on Words: A Probabilistic Analysis. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 19:1-19:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{akhavi_et_al:LIPIcs.CPM.2019.19,
  author =	{Akhavi, Ali and Cl\'{e}ment, Julien and Darthenay, Dimitri and Lhote, Lo\"{i}ck and Vall\'{e}e, Brigitte},
  title =	{{Dichotomic Selection on Words: A Probabilistic Analysis}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{19:1--19:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.19},
  URN =		{urn:nbn:de:0030-drops-104903},
  doi =		{10.4230/LIPIcs.CPM.2019.19},
  annote =	{Keywords: dichotomic selection, text algorithms, analysis of algorithms, average case analysis of algorithms, trie, suffix array, lcp-array, information theory, numeration process, sources, entropy, coincidence, analytic combinatorics, depoissonization techniques}
}
Document
Finding a Small Number of Colourful Components

Authors: Laurent Bulteau, Konrad K. Dabrowski, Guillaume Fertin, Matthew Johnson, Daniël Paulusma, and Stéphane Vialette


Abstract
A partition (V_1,...,V_k) of the vertex set of a graph G with a (not necessarily proper) colouring c is colourful if no two vertices in any V_i have the same colour and every set V_i induces a connected graph. The Colourful Partition problem, introduced by Adamaszek and Popa, is to decide whether a coloured graph (G,c) has a colourful partition of size at most k. This problem is related to the Colourful Components problem, introduced by He, Liu and Zhao, which is to decide whether a graph can be modified into a graph whose connected components form a colourful partition by deleting at most p edges. Despite the similarities in their definitions, we show that Colourful Partition and Colourful Components may have different complexities for restricted instances. We tighten known NP-hardness results for both problems by closing a number of complexity gaps. In addition, we prove new hardness and tractability results for Colourful Partition. In particular, we prove that deciding whether a coloured graph (G,c) has a colourful partition of size 2 is NP-complete for coloured planar bipartite graphs of maximum degree 3 and path-width 3, but polynomial-time solvable for coloured graphs of treewidth 2. Rather than performing an ad hoc study, we use our classical complexity results to guide us in undertaking a thorough parameterized study of Colourful Partition. We show that this leads to suitable parameters for obtaining FPT results and moreover prove that Colourful Components and Colourful Partition may have different parameterized complexities, depending on the chosen parameter.

Cite as

Laurent Bulteau, Konrad K. Dabrowski, Guillaume Fertin, Matthew Johnson, Daniël Paulusma, and Stéphane Vialette. Finding a Small Number of Colourful Components. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 20:1-20:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{bulteau_et_al:LIPIcs.CPM.2019.20,
  author =	{Bulteau, Laurent and Dabrowski, Konrad K. and Fertin, Guillaume and Johnson, Matthew and Paulusma, Dani\"{e}l and Vialette, St\'{e}phane},
  title =	{{Finding a Small Number of Colourful Components}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{20:1--20:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.20},
  URN =		{urn:nbn:de:0030-drops-104914},
  doi =		{10.4230/LIPIcs.CPM.2019.20},
  annote =	{Keywords: Colourful component, colourful partition, tree, treewidth, vertex cover}
}
Document
Streaming Dictionary Matching with Mismatches

Authors: Paweł Gawrychowski and Tatiana Starikovskaya


Abstract
In the k-mismatch problem we are given a pattern of length m and a text and must find all locations where the Hamming distance between the pattern and the text is at most k. A series of recent breakthroughs have resulted in an ultra-efficient streaming algorithm for this problem that requires only O(k log m/k) space [Clifford, Kociumaka, Porat, SODA 2019]. In this work, we consider a strictly harder problem called dictionary matching with k mismatches, where we are given a dictionary of d patterns of lengths at most m and must find all their k-mismatch occurrences in the text, and show the first streaming algorithm for it. The algorithm uses O(k d log^k d polylog m) space and processes each position of the text in O(k log^k d polylog m + occ) time, where occ is the number of k-mismatch occurrences of the patterns that end at this position. The algorithm is randomised and outputs correct answers with high probability.

Cite as

Paweł Gawrychowski and Tatiana Starikovskaya. Streaming Dictionary Matching with Mismatches. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 21:1-21:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.CPM.2019.21,
  author =	{Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
  title =	{{Streaming Dictionary Matching with Mismatches}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{21:1--21:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.21},
  URN =		{urn:nbn:de:0030-drops-104925},
  doi =		{10.4230/LIPIcs.CPM.2019.21},
  annote =	{Keywords: Streaming, multiple pattern matching, Hamming distance}
}
Document
Quasi-Periodicity in Streams

Authors: Paweł Gawrychowski, Jakub Radoszewski, and Tatiana Starikovskaya


Abstract
In this work, we show two streaming algorithms for computing the length of the shortest cover of a string of length n. We start by showing a two-pass algorithm that uses O(log^2 n) space and then show a one-pass streaming algorithm that uses O(sqrt{n log n}) space. Both algorithms run in near-linear time. The algorithms are randomized and compute the answer incorrectly with probability inverse-polynomial in n. We also show that there is no sublinear-space streaming algorithm for computing the length of the shortest seed of a string.

Cite as

Paweł Gawrychowski, Jakub Radoszewski, and Tatiana Starikovskaya. Quasi-Periodicity in Streams. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 22:1-22:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.CPM.2019.22,
  author =	{Gawrychowski, Pawe{\l} and Radoszewski, Jakub and Starikovskaya, Tatiana},
  title =	{{Quasi-Periodicity in Streams}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{22:1--22:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.22},
  URN =		{urn:nbn:de:0030-drops-104930},
  doi =		{10.4230/LIPIcs.CPM.2019.22},
  annote =	{Keywords: Streaming algorithms, quasi-periodicity, covers, seeds}
}
Document
Computing Runs on a Trie

Authors: Ryo Sugahara, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda


Abstract
A maximal repetition, or run, in a string, is a maximal periodic substring whose smallest period is at most half the length of the substring. In this paper, we consider runs that correspond to a path on a trie, or in other words, on a rooted edge-labeled tree where the endpoints of the path must be a descendant/ancestor of the other. For a trie with n edges, we show that the number of runs is less than n. We also show an O(n sqrt{log n}log log n) time and O(n) space algorithm for counting and finding the shallower endpoint of all runs. We further show an O(n log n) time and O(n) space algorithm for finding both endpoints of all runs. We also discuss how to improve the running time even more.

Cite as

Ryo Sugahara, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Computing Runs on a Trie. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 23:1-23:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{sugahara_et_al:LIPIcs.CPM.2019.23,
  author =	{Sugahara, Ryo and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Computing Runs on a Trie}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{23:1--23:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.23},
  URN =		{urn:nbn:de:0030-drops-104943},
  doi =		{10.4230/LIPIcs.CPM.2019.23},
  annote =	{Keywords: runs, Lyndon words}
}
Document
Linking BWT and XBW via Aho-Corasick Automaton: Applications to Run-Length Encoding

Authors: Bastien Cazaux and Eric Rivals


Abstract
The boom of genomic sequencing makes compression of sets of sequences inescapable. This underlies the need for multi-string indexing data structures that helps compressing the data. The most prominent example of such data structures is the Burrows-Wheeler Transform (BWT), a reversible permutation of a text that improves its compressibility. A similar data structure, the eXtended Burrows-Wheeler Transform (XBW), is able to index a tree labelled with alphabet symbols. A link between a multi-string BWT and the Aho-Corasick automaton has already been found and led to a way to build a XBW from a multi-string BWT. We exhibit a stronger link between a multi-string BWT and a XBW by using the order of the concatenation in the multi-string. This bijective link has several applications: first, it allows one to build one data structure from the other; second, it enables one to compute an ordering of the input strings that optimises a Run-Length measure (i.e., the compressibility) of the BWT or of the XBW.

Cite as

Bastien Cazaux and Eric Rivals. Linking BWT and XBW via Aho-Corasick Automaton: Applications to Run-Length Encoding. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 24:1-24:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{cazaux_et_al:LIPIcs.CPM.2019.24,
  author =	{Cazaux, Bastien and Rivals, Eric},
  title =	{{Linking BWT and XBW via Aho-Corasick Automaton: Applications to Run-Length Encoding}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{24:1--24:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.24},
  URN =		{urn:nbn:de:0030-drops-104955},
  doi =		{10.4230/LIPIcs.CPM.2019.24},
  annote =	{Keywords: Data Structure, Algorithm, Aho-Corasick Tree, compression, RLE}
}
Document
Quasi-Linear-Time Algorithm for Longest Common Circular Factor

Authors: Mai Alzamel, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba


Abstract
We introduce the Longest Common Circular Factor (LCCF) problem in which, given strings S and T of length at most n, we are to compute the longest factor of S whose cyclic shift occurs as a factor of T. It is a new similarity measure, an extension of the classic Longest Common Factor. We show how to solve the LCCF problem in O(n log^4 n) time using O(n log^2 n) space.

Cite as

Mai Alzamel, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba. Quasi-Linear-Time Algorithm for Longest Common Circular Factor. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 25:1-25:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{alzamel_et_al:LIPIcs.CPM.2019.25,
  author =	{Alzamel, Mai and Crochemore, Maxime and Iliopoulos, Costas S. and Kociumaka, Tomasz and Radoszewski, Jakub and Rytter, Wojciech and Straszy\'{n}ski, Juliusz and Wale\'{n}, Tomasz and Zuba, Wiktor},
  title =	{{Quasi-Linear-Time Algorithm for Longest Common Circular Factor}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{25:1--25:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.25},
  URN =		{urn:nbn:de:0030-drops-104961},
  doi =		{10.4230/LIPIcs.CPM.2019.25},
  annote =	{Keywords: longest common factor, circular pattern matching, internal pattern matching, intersection of hyperrectangles}
}
Document
Simulating the DNA Overlap Graph in Succinct Space

Authors: Diego Díaz-Domínguez, Travis Gagie, and Gonzalo Navarro


Abstract
Converting a set of sequencing reads into a lossless compact data structure that encodes all the relevant biological information is a major challenge. The classical approaches are to build the string graph or the de Bruijn graph (dBG) of some order k. Each has advantages over the other depending on the application. Still, the ideal setting would be to have an index of the reads that is easy to build and can be adapted to any type of biological analysis. In this paper we propose rBOSS, a new data structure based on the Burrows-Wheeler Transform (BWT), which gets close to that ideal. Our rBOSS simultaneously encodes all the dBGs of a set of sequencing reads up to some order k, and for any dBG node v, it can compute in O(k) time all the other nodes whose labels have an overlap of at least m characters with the label of v, with m being a parameter. If we choose the parameter k equal to the size of the reads (assuming that all have equal length), then we can simulate the overlap graph of the read set. Instead of storing the edges of this graph explicitly, rBOSS computes them on the fly as we traverse the graph. As most BWT-based structures, rBOSS is unidirectional, meaning that we can retrieve only the suffix overlaps of the nodes. However, we exploit the property of the DNA reverse complements to simulate bi-directionality. We implemented a genome assembler on top of rBOSS to demonstrate its usefulness. The experimental results show that, using k=100, our rBOSS-based assembler can process ~500K reads of 150 characters long each (a FASTQ file of 185 MB) in less than 15 minutes and using 110 MB in total. It produces contigs of mean sizes over 10,000, which is twice the size obtained by using a pure de Bruijn graph of fixed length k.

Cite as

Diego Díaz-Domínguez, Travis Gagie, and Gonzalo Navarro. Simulating the DNA Overlap Graph in Succinct Space. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 26:1-26:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{diazdominguez_et_al:LIPIcs.CPM.2019.26,
  author =	{D{\'\i}az-Dom{\'\i}nguez, Diego and Gagie, Travis and Navarro, Gonzalo},
  title =	{{Simulating the DNA Overlap Graph in Succinct Space}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{26:1--26:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.26},
  URN =		{urn:nbn:de:0030-drops-104978},
  doi =		{10.4230/LIPIcs.CPM.2019.26},
  annote =	{Keywords: Overlap graph, de Bruijn graph, DNA sequencing, Succinct ordinal trees}
}
Document
Faster Queries for Longest Substring Palindrome After Block Edit

Authors: Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda


Abstract
Palindromes are important objects in strings which have been extensively studied from combinatorial, algorithmic, and bioinformatics points of views. Manacher [J. ACM 1975] proposed a seminal algorithm that computes the longest substring palindromes (LSPals) of a given string in O(n) time, where n is the length of the string. In this paper, we consider the problem of finding the LSPal after the string is edited. We present an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(l + log log n) time, after a substring in T is replaced by a string of arbitrary length l. This outperforms the query algorithm proposed in our previous work [CPM 2018] that uses O(l + log n) time for each query.

Cite as

Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster Queries for Longest Substring Palindrome After Block Edit. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 27:1-27:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{funakoshi_et_al:LIPIcs.CPM.2019.27,
  author =	{Funakoshi, Mitsuru and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Faster Queries for Longest Substring Palindrome After Block Edit}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{27:1--27:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.27},
  URN =		{urn:nbn:de:0030-drops-104989},
  doi =		{10.4230/LIPIcs.CPM.2019.27},
  annote =	{Keywords: palindromes, string algorithm, periodicity}
}
Document
A Rearrangement Distance for Fully-Labelled Trees

Authors: Giulia Bernardini, Paola Bonizzoni, Gianluca Della Vedova, and Murray Patterson


Abstract
The problem of comparing trees representing the evolutionary histories of cancerous tumors has turned out to be crucial, since there is a variety of different methods which typically infer multiple possible trees. A departure from the widely studied setting of classical phylogenetics, where trees are leaf-labelled, tumoral trees are fully labelled, i.e., every vertex has a label. In this paper we provide a rearrangement distance measure between two fully-labelled trees. This notion originates from two operations: one which modifies the topology of the tree, the other which permutes the labels of the vertices, hence leaving the topology unaffected. While we show that the distance between two trees in terms of each such operation alone can be decided in polynomial time, the more general notion of distance when both operations are allowed is NP-hard to decide. Despite this result, we show that it is fixed-parameter tractable, and we give a 4-approximation algorithm when one of the trees is binary.

Cite as

Giulia Bernardini, Paola Bonizzoni, Gianluca Della Vedova, and Murray Patterson. A Rearrangement Distance for Fully-Labelled Trees. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 28:1-28:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{bernardini_et_al:LIPIcs.CPM.2019.28,
  author =	{Bernardini, Giulia and Bonizzoni, Paola and Della Vedova, Gianluca and Patterson, Murray},
  title =	{{A Rearrangement Distance for Fully-Labelled Trees}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{28:1--28:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.28},
  URN =		{urn:nbn:de:0030-drops-104998},
  doi =		{10.4230/LIPIcs.CPM.2019.28},
  annote =	{Keywords: Tree rearrangement distance, Cancer progression, Approximation algorithms, Computational complexity}
}
Document
On the Size of Overlapping Lempel-Ziv and Lyndon Factorizations

Authors: Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda


Abstract
Lempel-Ziv (LZ) factorization and Lyndon factorization are well-known factorizations of strings. Recently, Kärkkäinen et al. studied the relation between the sizes of the two factorizations, and showed that the size of the Lyndon factorization is always smaller than twice the size of the non-overlapping LZ factorization [STACS 2017]. In this paper, we consider a similar problem for the overlapping version of the LZ factorization. Since the size of the overlapping LZ factorization is always smaller than the size of the non-overlapping LZ factorization and, in fact, can even be an O(log n) factor smaller, it is not immediately clear whether a similar bound as in previous work would hold. Nevertheless, in this paper, we prove that the size of the Lyndon factorization is always smaller than four times the size of the overlapping LZ factorization.

Cite as

Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. On the Size of Overlapping Lempel-Ziv and Lyndon Factorizations. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 29:1-29:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{urabe_et_al:LIPIcs.CPM.2019.29,
  author =	{Urabe, Yuki and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{On the Size of Overlapping Lempel-Ziv and Lyndon Factorizations}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{29:1--29:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.29},
  URN =		{urn:nbn:de:0030-drops-105008},
  doi =		{10.4230/LIPIcs.CPM.2019.29},
  annote =	{Keywords: Lyndon factorization, Lyndon words, Lempel-Ziv factorization}
}
Document
Online Algorithms for Constructing Linear-Size Suffix Trie

Authors: Diptarama Hendrian, Takuya Takagi, and Shunsuke Inenaga


Abstract
The suffix trees are fundamental data structures for various kinds of string processing. The suffix tree of a string T of length n has O(n) nodes and edges, and the string label of each edge is encoded by a pair of positions in T. Thus, even after the tree is built, the input text T needs to be kept stored and random access to T is still needed. The linear-size suffix tries (LSTs), proposed by Crochemore et al. [Linear-size suffix tries, TCS 638:171-178, 2016], are a "stand-alone" alternative to the suffix trees. Namely, the LST of a string T of length n occupies O(n) total space, and supports pattern matching and other tasks in the same efficiency as the suffix tree without the need to store the input text T. Crochemore et al. proposed an offline algorithm which transforms the suffix tree of T into the LST of T in O(n log sigma) time and O(n) space, where sigma is the alphabet size. In this paper, we present two types of online algorithms which "directly" construct the LST, from right to left, and from left to right, without constructing the suffix tree as an intermediate structure. Both algorithms construct the LST incrementally when a new symbol is read, and do not access to the previously read symbols. The right-to-left construction algorithm works in O(n log sigma) time and O(n) space and the left-to-right construction algorithm works in O(n (log sigma + log n / log log n)) time and O(n) space. The main feature of our algorithms is that the input text does not need to be stored.

Cite as

Diptarama Hendrian, Takuya Takagi, and Shunsuke Inenaga. Online Algorithms for Constructing Linear-Size Suffix Trie. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 30:1-30:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{hendrian_et_al:LIPIcs.CPM.2019.30,
  author =	{Hendrian, Diptarama and Takagi, Takuya and Inenaga, Shunsuke},
  title =	{{Online Algorithms for Constructing Linear-Size Suffix Trie}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{30:1--30:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.30},
  URN =		{urn:nbn:de:0030-drops-105016},
  doi =		{10.4230/LIPIcs.CPM.2019.30},
  annote =	{Keywords: Indexing structure, Linear-size suffix trie, Online algorithm, Pattern Matching}
}
Document
Searching Long Repeats in Streams

Authors: Oleg Merkurev and Arseny M. Shur


Abstract
We consider two well-known related problems: Longest Repeated Substring (LRS) and Longest Repeated Reversed Substring (LRRS). Their streaming versions cannot be solved exactly; we show that only approximate solutions by Monte Carlo algorithms are possible, and prove a lower bound on consumed memory. For both problems, we present purely linear-time Monte Carlo algorithms working in O(E + n/E) space, where E is the additive approximation error. Within the same space bounds, we then present nearly real-time solutions, which require O(log n) time per symbol and O(n + n/E log n) time overall. The working space exactly matches the lower bound whenever E=O(n^{0.5}) and the size of the alphabet is Omega(n^{0.01}).

Cite as

Oleg Merkurev and Arseny M. Shur. Searching Long Repeats in Streams. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 31:1-31:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{merkurev_et_al:LIPIcs.CPM.2019.31,
  author =	{Merkurev, Oleg and Shur, Arseny M.},
  title =	{{Searching Long Repeats in Streams}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{31:1--31:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.31},
  URN =		{urn:nbn:de:0030-drops-105029},
  doi =		{10.4230/LIPIcs.CPM.2019.31},
  annote =	{Keywords: Longest repeated substring, longest repeated reversed substring, streaming algorithm, Karp, Rabin fingerprint, suffix tree}
}
Document
Computing the Antiperiod(s) of a String

Authors: Hayam Alamro, Golnaz Badkobeh, Djamal Belazzougui, Costas S. Iliopoulos, and Simon J. Puglisi


Abstract
A string S[1,n] is a power (or repetition or tandem repeat) of order k and period n/k, if it can be decomposed into k consecutive identical blocks of length n/k. Powers and periods are fundamental structures in the study of strings and algorithms to compute them efficiently have been widely studied. Recently, Fici et al. (Proc. ICALP 2016) introduced an antipower of order k to be a string composed of k distinct blocks of the same length, n/k, called the antiperiod. An arbitrary string will have antiperiod t if it is prefix of an antipower with antiperiod t. In this paper, we describe efficient algorithm for computing the smallest antiperiod of a string S of length n in O(n) time. We also describe an algorithm to compute all the antiperiods of S that runs in O(n log n) time.

Cite as

Hayam Alamro, Golnaz Badkobeh, Djamal Belazzougui, Costas S. Iliopoulos, and Simon J. Puglisi. Computing the Antiperiod(s) of a String. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 32:1-32:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{alamro_et_al:LIPIcs.CPM.2019.32,
  author =	{Alamro, Hayam and Badkobeh, Golnaz and Belazzougui, Djamal and Iliopoulos, Costas S. and Puglisi, Simon J.},
  title =	{{Computing the Antiperiod(s) of a String}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{32:1--32:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.32},
  URN =		{urn:nbn:de:0030-drops-105035},
  doi =		{10.4230/LIPIcs.CPM.2019.32},
  annote =	{Keywords: antiperiod, antipower, power, period, repetition, run, string}
}

Filters


Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail