LIPIcs, Volume 54

27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)




Event

CPM 2016, June 27-29, 2016, Tel Aviv, Israel

Editors

Roberto Grossi
Moshe Lewenstein

Publication Details

  • published at: 2016-06-27
  • Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
  • ISBN: 978-3-95977-012-5
  • DBLP: db/conf/cpm/cpm2016


Documents

Document
Complete Volume
LIPIcs, Volume 54, CPM'16, Complete Volume

Authors: Roberto Grossi and Moshe Lewenstein


Abstract
LIPIcs, Volume 54, CPM'16, Complete Volume

Cite as

27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@Proceedings{grossi_et_al:LIPIcs.CPM.2016,
  title =	{{LIPIcs, Volume 54, CPM'16, Complete Volume}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016},
  URN =		{urn:nbn:de:0030-drops-60935},
  doi =		{10.4230/LIPIcs.CPM.2016},
  annote =	{Keywords: Data Structures, Data Storage Representations, Coding and Information Theory, Theory of Computation Discrete Mathematics, Information Systems}
}
Document
Front Matter
Front Matter, Table of Contents, Preface

Authors: Roberto Grossi and Moshe Lewenstein


Abstract
Front Matter, Table of Contents, Preface, List of Authors

Cite as

27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 0:i-0:x, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{grossi_et_al:LIPIcs.CPM.2016.0,
  author =	{Grossi, Roberto and Lewenstein, Moshe},
  title =	{{Front Matter, Table of Contents, Preface}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{0:i--0:x},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.0},
  URN =		{urn:nbn:de:0030-drops-60916},
  doi =		{10.4230/LIPIcs.CPM.2016.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, List of Authors}
}
Document
Deterministic Sub-Linear Space LCE Data Structures With Efficient Construction

Authors: Yuka Tanimura, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, Simon J. Puglisi, and Masayuki Takeda


Abstract
Given a string S of n symbols, a longest common extension query LCE(i,j) asks for the length of the longest common prefix of the i-th and j-th suffixes of S. LCE queries have several important applications in string processing, perhaps most notably to suffix sorting. Recently, Bille et al. (J. Discrete Algorithms 25:42-50, 2014, Proc. CPM 2015:65-76) described several data structures for answering LCE queries that offer a space-time trade-off between data structure size and query time. In particular, for a parameter 1 <= tau <= n, their best deterministic solution is a data structure of size O(n/tau) which allows LCE queries to be answered in O(tau) time. However, the construction time for all deterministic versions of their data structure is quadratic in n. In this paper, we propose a deterministic solution that achieves a similar space-time trade-off of O(tau * min(log(tau), log(n/tau))) query time using O(n/tau) space, but significantly improves the construction time to O(n*tau).
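
For orientation, the query defined in this abstract can be sketched by direct character comparison. This is only a naive baseline that uses no auxiliary data structure and takes time proportional to the answer; it is not the data structure proposed in the paper. The function name `lce` and the 0-based indexing are assumptions made for illustration.

```python
def lce(s: str, i: int, j: int) -> int:
    """Return the length of the longest common prefix of s[i:] and s[j:].

    Naive baseline: compares characters one by one (0-based positions).
    """
    n = len(s)
    k = 0
    # Extend the match while both positions stay inside the string.
    while i + k < n and j + k < n and s[i + k] == s[j + k]:
        k += 1
    return k


if __name__ == "__main__":
    print(lce("abracadabra", 0, 7))
```

The space-time trade-offs discussed in the abstract concern how much of this scanning can be avoided when only O(n/tau) words of extra space are stored.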

Cite as

Yuka Tanimura, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, Simon J. Puglisi, and Masayuki Takeda. Deterministic Sub-Linear Space LCE Data Structures With Efficient Construction. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 1:1-1:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{tanimura_et_al:LIPIcs.CPM.2016.1,
  author =	{Tanimura, Yuka and I, Tomohiro and Bannai, Hideo and Inenaga, Shunsuke and Puglisi, Simon J. and Takeda, Masayuki},
  title =	{{Deterministic Sub-Linear Space LCE Data Structures With Efficient Construction}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{1:1--1:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.1},
  URN =		{urn:nbn:de:0030-drops-60655},
  doi =		{10.4230/LIPIcs.CPM.2016.1},
  annote =	{Keywords: longest common extension, longest common prefix, sparse suffix array}
}
Document
Space-Efficient Dictionaries for Parameterized and Order-Preserving Pattern Matching

Authors: Arnab Ganguly, Wing-Kai Hon, Kunihiko Sadakane, Rahul Shah, Sharma V. Thankachan, and Yilin Yang


Abstract
Let S and S' be two strings of the same length. We consider the following two variants of string matching. * Parameterized Matching: The characters of S and S' are partitioned into static characters and parameterized characters. The strings are a parameterized match iff the static characters match exactly and there exists a one-to-one function which renames the parameterized characters in S to those in S'. * Order-Preserving Matching: The strings are an order-preserving match iff for any two integers i,j in [1,|S|], S[i] <= S[j] iff S'[i] <= S'[j]. Let P be a collection of d patterns {P_1, P_2, ..., P_d} of total length n characters, which are chosen from an alphabet Sigma. Given a text T, also over Sigma, we consider the dictionary indexing problem under the above definitions of string matching. Specifically, the task is to index P, such that we can report all positions j where at least one of the patterns P_i in P is a parameterized match (resp. order-preserving match) with the same-length substring of T starting at j. Previous best-known indexes occupy O(n * log(n)) bits and can report all occ positions in O(|T| * log(|Sigma|) + occ) time. We present space-efficient indexes that occupy O(n * log(|Sigma|+d) * log(n)) bits and report all occ positions in O(|T| * (log(|Sigma|) + log_{|Sigma|}(n)) + occ) time for parameterized matching and in O(|T| * log(n) + occ) time for order-preserving matching.
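
The two matching notions defined in this abstract can be checked directly from their definitions. The sketch below is a minimal illustration for a single pair of equal-length strings, not the dictionary index from the paper; the function names and the `static` argument (the set of static characters) are assumptions introduced here.

```python
def parameterized_match(s, t, static):
    """True iff static characters agree exactly and the parameterized
    characters of s can be renamed one-to-one into those of t."""
    if len(s) != len(t):
        return False
    forward, backward = {}, {}
    for a, b in zip(s, t):
        if a in static or b in static:
            # Static characters must match exactly.
            if a != b:
                return False
            continue
        # Enforce a consistent renaming in both directions (one-to-one).
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


def order_preserving_match(s, t):
    """True iff s[i] <= s[j] exactly when t[i] <= t[j], for all i, j.

    Quadratic check taken straight from the definition.
    """
    if len(s) != len(t):
        return False
    n = len(s)
    return all(
        (s[i] <= s[j]) == (t[i] <= t[j])
        for i in range(n)
        for j in range(n)
    )


if __name__ == "__main__":
    print(parameterized_match("x+y", "a+b", static={"+"}))
    print(order_preserving_match([10, 20, 15], [1, 3, 2]))
```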

Cite as

Arnab Ganguly, Wing-Kai Hon, Kunihiko Sadakane, Rahul Shah, Sharma V. Thankachan, and Yilin Yang. Space-Efficient Dictionaries for Parameterized and Order-Preserving Pattern Matching. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 2:1-2:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{ganguly_et_al:LIPIcs.CPM.2016.2,
  author =	{Ganguly, Arnab and Hon, Wing-Kai and Sadakane, Kunihiko and Shah, Rahul and Thankachan, Sharma V. and Yang, Yilin},
  title =	{{Space-Efficient Dictionaries for Parameterized and Order-Preserving Pattern Matching}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{2:1--2:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.2},
  URN =		{urn:nbn:de:0030-drops-60736},
  doi =		{10.4230/LIPIcs.CPM.2016.2},
  annote =	{Keywords: Parameterized Matching, Order-preserving Matching, Dictionary Indexing, Aho-Corasick Automaton, Sparsification}
}
Document
Encoding Two-Dimensional Range Top-k Queries

Authors: Seungbum Jo, Rahul Lingala, and Srinivasa Rao Satti


Abstract
We consider various encodings that support range top-k queries on a two-dimensional array containing elements from a total order. For an m x n array, we first propose an almost optimal encoding for answering one-sided top-k queries, whose query range is restricted to [1 .. m][1 .. a], for 1 <= a <= n. Next, we propose an encoding for the general top-k queries that takes m^2 * lg(binom((k+1)n)(n)) + m * lg(m) + o(n) bits. This generalizes the one-dimensional top-k encoding of Gawrychowski and Nicholson [ICALP, 2015]. Finally, for a 2 x n array, we obtain a 2 lg(binom(3n)(n)) + 3n + o(n)-bit encoding for answering top-2 queries.

Cite as

Seungbum Jo, Rahul Lingala, and Srinivasa Rao Satti. Encoding Two-Dimensional Range Top-k Queries. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 3:1-3:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{jo_et_al:LIPIcs.CPM.2016.3,
  author =	{Jo, Seungbum and Lingala, Rahul and Satti, Srinivasa Rao},
  title =	{{Encoding Two-Dimensional Range Top-k Queries}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{3:1--3:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.3},
  URN =		{urn:nbn:de:0030-drops-60704},
  doi =		{10.4230/LIPIcs.CPM.2016.3},
  annote =	{Keywords: Encoding model, top-k query, range minimum query}
}
Document
Efficient Index for Weighted Sequences

Authors: Carl Barton, Tomasz Kociumaka, Solon P. Pissis, and Jakub Radoszewski


Abstract
The problem of finding factors of a text string which are identical or similar to a given pattern string is a central problem in computer science. A generalised version of this problem consists in implementing an index over the text to support efficient on-line pattern queries. We study this problem in the case where the text is weighted: for every position of the text and every letter of the alphabet a probability of occurrence of this letter at this position is given. Sequences of this type, also called position weight matrices, are commonly used to represent imprecise or uncertain data. A weighted sequence may represent many different strings, each with probability of occurrence equal to the product of probabilities of its letters at subsequent positions. Given a probability threshold 1/z, we say that a pattern string P matches a weighted text at position i if the product of probabilities of the letters of P at positions i,...,i+|P|-1 in the text is at least 1/z. In this article, we present an O(nz)-time construction of an O(nz)-sized index that can answer pattern matching queries in a weighted text in optimal time improving upon the state of the art by a factor of z log z. Other applications of this data structure include an O(nz)-time construction of the weighted prefix table and an O(nz)-time computation of all covers of a weighted sequence, which improve upon the state of the art by the same factor.
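
The matching condition stated in this abstract (product of letter probabilities at least 1/z) can be illustrated with a direct check at a single position. This is a sketch of the definition only, not the O(nz)-sized index; representing the weighted text as a list of letter-to-probability dictionaries, the function name `matches_at`, and 0-based positions are all assumptions made for illustration.

```python
from math import prod


def matches_at(weighted_text, pattern, i, z):
    """True iff pattern matches the weighted text at 0-based position i,
    i.e. the product of the letter probabilities is at least 1/z.

    weighted_text[p] maps each letter to its probability at position p;
    letters missing from the mapping are treated as probability 0.
    """
    if i < 0 or i + len(pattern) > len(weighted_text):
        return False
    probability = prod(
        weighted_text[i + k].get(letter, 0.0)
        for k, letter in enumerate(pattern)
    )
    return probability >= 1.0 / z


if __name__ == "__main__":
    text = [{"a": 0.5, "b": 0.5}, {"a": 1.0}, {"a": 0.25, "b": 0.75}]
    print(matches_at(text, "aa", 0, z=2))
```

Floating-point products can be slightly off near the threshold; exact rationals (for example `fractions.Fraction`) would avoid that at some cost in speed.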

Cite as

Carl Barton, Tomasz Kociumaka, Solon P. Pissis, and Jakub Radoszewski. Efficient Index for Weighted Sequences. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 4:1-4:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{barton_et_al:LIPIcs.CPM.2016.4,
  author =	{Barton, Carl and Kociumaka, Tomasz and Pissis, Solon P. and Radoszewski, Jakub},
  title =	{{Efficient Index for Weighted Sequences}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{4:1--4:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.4},
  URN =		{urn:nbn:de:0030-drops-60807},
  doi =		{10.4230/LIPIcs.CPM.2016.4},
  annote =	{Keywords: weighted sequence, position weight matrix, indexing, weighted suffix tree}
}
Document
Faster Longest Common Extension Queries in Strings over General Alphabets

Authors: Pawel Gawrychowski, Tomasz Kociumaka, Wojciech Rytter, and Tomasz Walen


Abstract
Longest common extension queries (often called longest common prefix queries) constitute a fundamental building block in multiple string algorithms, for example computing runs and approximate pattern matching. We show that a sequence of q LCE queries for a string of size n over a general ordered alphabet can be realized in O(q log log n + n log* n) time making only O(q + n) symbol comparisons. Consequently, all runs in a string over a general ordered alphabet can be computed in O(n log log n) time making O(n) symbol comparisons. Our results improve upon a solution by Kosolobov (Information Processing Letters, 2016), who designed an algorithm with O(n log^{2/3} n) running time and conjectured that O(n) time is possible. Our paper makes significant progress towards resolving this conjecture. Our techniques extend to the case of general unordered alphabets, when the time increases to O(q log n + n log* n). The main tools are difference covers and a variant of the disjoint-sets data structure by La Poutré (SODA 1990).

Cite as

Pawel Gawrychowski, Tomasz Kociumaka, Wojciech Rytter, and Tomasz Walen. Faster Longest Common Extension Queries in Strings over General Alphabets. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 5:1-5:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{gawrychowski_et_al:LIPIcs.CPM.2016.5,
  author =	{Gawrychowski, Pawel and Kociumaka, Tomasz and Rytter, Wojciech and Walen, Tomasz},
  title =	{{Faster Longest Common Extension Queries in Strings over General Alphabets}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{5:1--5:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.5},
  URN =		{urn:nbn:de:0030-drops-60810},
  doi =		{10.4230/LIPIcs.CPM.2016.5},
  annote =	{Keywords: longest common extension, longest common prefix, maximal repetitions, difference cover}
}
Document
Succinct Online Dictionary Matching with Improved Worst-Case Guarantees

Authors: Tsvi Kopelowitz, Ely Porat, and Yaron Rozen


Abstract
In the online dictionary matching problem the goal is to preprocess a set of patterns D={P_1,...,P_d} over alphabet Sigma, so that given an online text (one character at a time) we report all of the occurrences of patterns that are a suffix of the current text before the following character arrives. We introduce a succinct Aho-Corasick-like data structure for the online dictionary matching problem. Our solution uses a new succinct representation for multi-labeled trees, in which each node has a set of labels from a universe of size lambda. We consider lowest labeled ancestor (LLA) queries on multi-labeled trees, where given a node and a label we return the lowest proper ancestor of the node that has the queried label. In this paper we introduce a succinct representation of multi-labeled trees for lambda=omega(1) that supports LLA queries in O(log(log(lambda))) time. Using this representation of multi-labeled trees, we introduce a succinct data structure for the online dictionary matching problem when sigma=omega(1). In this solution the worst case cost per character is O(log(log(sigma)) + occ) time, where occ is the size of the current output. Moreover, the amortized cost per character is O(1+occ) time.

Cite as

Tsvi Kopelowitz, Ely Porat, and Yaron Rozen. Succinct Online Dictionary Matching with Improved Worst-Case Guarantees. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 6:1-6:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{kopelowitz_et_al:LIPIcs.CPM.2016.6,
  author =	{Kopelowitz, Tsvi and Porat, Ely and Rozen, Yaron},
  title =	{{Succinct Online Dictionary Matching with Improved Worst-Case Guarantees}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{6:1--6:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.6},
  URN =		{urn:nbn:de:0030-drops-60825},
  doi =		{10.4230/LIPIcs.CPM.2016.6},
  annote =	{Keywords: Succinct indexing, dictionary matching, Aho-Corasick, labeled trees}
}
Document
Graph Motif Problems Parameterized by Dual

Authors: Guillaume Fertin and Christian Komusiewicz


Abstract
Let G=(V,E) be a vertex-colored graph, where C is the set of colors used to color V. The Graph Motif (or GM) problem takes as input G, a multiset M of colors built from C, and asks whether there is a subset S subseteq V such that (i) G[S] is connected and (ii) the multiset of colors obtained from S equals M. The Colorful Graph Motif problem (or CGM) is a constrained version of GM in which M=C, and the List-Colored Graph Motif problem (or LGM) is the extension of GM in which each vertex v of V may choose its color from a list L(v) of colors. We study the three problems GM, CGM and LGM, parameterized by l:=|V|-|M|. In particular, for general graphs, we show that, assuming the strong exponential-time hypothesis, CGM has no (2-epsilon)^l * |V|^{O(1)}-time algorithm, which implies that a previous algorithm, running in O(2^l * |E|) time, is optimal. We also prove that LGM is W[1]-hard even if we restrict ourselves to lists of at most two colors. If we constrain the input graph to be a tree, then we show that, in contrast to CGM, GM can be solved in O(4^l * |V|) time but admits no polynomial kernel, while CGM can be solved in O(sqrt{2}^l + |V|) time and admits a polynomial kernel.

Cite as

Guillaume Fertin and Christian Komusiewicz. Graph Motif Problems Parameterized by Dual. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 7:1-7:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{fertin_et_al:LIPIcs.CPM.2016.7,
  author =	{Fertin, Guillaume and Komusiewicz, Christian},
  title =	{{Graph Motif Problems Parameterized by Dual}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{7:1--7:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.7},
  URN =		{urn:nbn:de:0030-drops-60837},
  doi =		{10.4230/LIPIcs.CPM.2016.7},
  annote =	{Keywords: NP-hard problem, subgraph problem, fixed-parameter algorithm, lowerbounds, kernelization}
}
Document
Truly Subquadratic-Time Extension Queries and Periodicity Detection in Strings with Uncertainties

Authors: Costas S. Iliopoulos and Jakub Radoszewski


Abstract
Strings with don't care symbols, also called partial words, and more general indeterminate strings are a natural representation of strings containing uncertain symbols. A considerable effort has been made to obtain efficient algorithms for pattern matching and periodicity detection in such strings. Among those, a number of algorithms have been proposed that behave well on random data, but still their worst-case running time is Theta(n^2). We present the first truly subquadratic-time solutions for a number of such problems on partial words that can also be adapted to indeterminate strings over a constant-sized alphabet. We show that n longest common compatible prefix queries (which correspond to longest common extension queries in regular strings) can be answered on-line in O(n * sqrt(n * log(n))) time after O(n * sqrt(n * log(n)))-time preprocessing. We also present O(n * sqrt(n * log(n)))-time algorithms for computing the prefix array and two types of border array of a partial word.

Cite as

Costas S. Iliopoulos and Jakub Radoszewski. Truly Subquadratic-Time Extension Queries and Periodicity Detection in Strings with Uncertainties. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 8:1-8:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{iliopoulos_et_al:LIPIcs.CPM.2016.8,
  author =	{Iliopoulos, Costas S. and Radoszewski, Jakub},
  title =	{{Truly Subquadratic-Time Extension Queries and Periodicity Detection in Strings with Uncertainties}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{8:1--8:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.8},
  URN =		{urn:nbn:de:0030-drops-60842},
  doi =		{10.4230/LIPIcs.CPM.2016.8},
  annote =	{Keywords: string with don’t cares, partial word, indeterminate string, longest common conservative prefix queries, prefix array}
}
Document
Estimating Statistics on Words Using Ambiguous Descriptions

Authors: Cyril Nicaud


Abstract
In this article we propose an alternative way to prove some recent results on statistics on words, such as the expected number of runs or the expected sum of the run exponents. Our approach consists in designing a general framework, based on the symbolic method developed in analytic combinatorics. The descriptions obtained in this framework are built in such a way that the degree of ambiguity of an object O (i.e., the number of different descriptions corresponding to O) is exactly the value of the statistic under study for O. The asymptotic estimation of the expectation is then done using classical techniques from analytic combinatorics. To show the generality of our method, we not only apply it to obtain new proofs of known results but also extend them from the uniform distribution to any memoryless distribution.

Cite as

Cyril Nicaud. Estimating Statistics on Words Using Ambiguous Descriptions. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 9:1-9:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{nicaud:LIPIcs.CPM.2016.9,
  author =	{Nicaud, Cyril},
  title =	{{Estimating Statistics on Words Using Ambiguous Descriptions}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{9:1--9:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.9},
  URN =		{urn:nbn:de:0030-drops-60859},
  doi =		{10.4230/LIPIcs.CPM.2016.9},
  annote =	{Keywords: random words, runs, symbolic method, analytic combinatorics}
}
Document
Reconstruction of Trees from Jumbled and Weighted Subtrees

Authors: Dénes Bartha, Peter Burcsi, and Zsuzsanna Lipták


Abstract
Let T be an edge-labeled graph, where the labels are from a finite alphabet Sigma. For a subtree U of T the Parikh vector of U is a vector of length |Sigma| which specifies the multiplicity of each label in U. We ask when T can be reconstructed from the multiset of Parikh vectors of all its subtrees, or all of its paths, or all of its maximal paths. We consider the analogous problems for weighted trees. We show how several well-known reconstruction problems on labeled strings, weighted strings and point sets on a line can be included in this framework. We present reconstruction algorithms and non-reconstructibility results, and extend the polynomial method, previously applied to jumbled strings [Acharya et al., SIAM J. on Discr. Math, 2015] and weighted strings [Bansal et al., CPM 2004], to deal with general trees and special tree classes.

Cite as

Dénes Bartha, Peter Burcsi, and Zsuzsanna Lipták. Reconstruction of Trees from Jumbled and Weighted Subtrees. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 10:1-10:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{bartha_et_al:LIPIcs.CPM.2016.10,
  author =	{Bartha, D\'{e}nes and Burcsi, Peter and Lipt\'{a}k, Zsuzsanna},
  title =	{{Reconstruction of Trees from Jumbled and Weighted Subtrees}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{10:1--10:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.10},
  URN =		{urn:nbn:de:0030-drops-60861},
  doi =		{10.4230/LIPIcs.CPM.2016.10},
  annote =	{Keywords: trees, paths, Parikh vectors, reconstruction problems, homometric sets, polynomial method, jumbled strings, weighted strings}
}
Document
A 7/2-Approximation Algorithm for the Maximum Duo-Preservation String Mapping Problem

Authors: Nicolas Boria, Gianpiero Cabodi, Paolo Camurati, Marco Palena, Paolo Pasini, and Stefano Quer


Abstract
This paper presents a simple 7/2-approximation algorithm for the Maximum Duo-Preservation String Mapping (MPSM) problem. This problem is complementary to the classical and well-studied min common string partition problem (MCSP), which computes the minimal edit distance between two strings when the only operation allowed is to shift blocks of characters. The algorithm improves on the previously best-known 4-approximation algorithm by computing a simple local optimum.

Cite as

Nicolas Boria, Gianpiero Cabodi, Paolo Camurati, Marco Palena, Paolo Pasini, and Stefano Quer. A 7/2-Approximation Algorithm for the Maximum Duo-Preservation String Mapping Problem. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 11:1-11:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{boria_et_al:LIPIcs.CPM.2016.11,
  author =	{Boria, Nicolas and Cabodi, Gianpiero and Camurati, Paolo and Palena, Marco and Pasini, Paolo and Quer, Stefano},
  title =	{{A 7/2-Approximation Algorithm for the Maximum Duo-Preservation String Mapping Problem}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{11:1--11:8},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.11},
  URN =		{urn:nbn:de:0030-drops-60875},
  doi =		{10.4230/LIPIcs.CPM.2016.11},
  annote =	{Keywords: Polynomial approximation, Max Duo-Preservation String Mapping Problem, Min Common String Partition Problem, Local Search}
}
Document
Fast Compatibility Testing for Rooted Phylogenetic Trees

Authors: Yun Deng and David Fernández-Baca


Abstract
We consider the following basic problem in phylogenetic tree construction. Let P = {T_1, ..., T_k} be a collection of rooted phylogenetic trees over various subsets of a set of species. The tree compatibility problem asks whether there is a tree T with the following property: for each i in {1, ..., k}, T_i can be obtained from the restriction of T to the species set of T_i by contracting zero or more edges. If such a tree T exists, we say that P is compatible. We give a ~O(M_P) algorithm for the tree compatibility problem, where M_P is the total number of nodes and edges in P. Unlike previous algorithms for this problem, the running time of our method does not depend on the degrees of the nodes in the input trees. Thus, it is equally fast on highly resolved and highly unresolved trees.

Cite as

Yun Deng and David Fernández-Baca. Fast Compatibility Testing for Rooted Phylogenetic Trees. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 12:1-12:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{deng_et_al:LIPIcs.CPM.2016.12,
  author =	{Deng, Yun and Fern\'{a}ndez-Baca, David},
  title =	{{Fast Compatibility Testing for Rooted Phylogenetic Trees}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{12:1--12:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.12},
  URN =		{urn:nbn:de:0030-drops-60884},
  doi =		{10.4230/LIPIcs.CPM.2016.12},
  annote =	{Keywords: Algorithms, computational biology, phylogenetics}
}
Document
Hardness of RNA Folding Problem With Four Symbols

Authors: Yi-Jun Chang


Abstract
An RNA sequence is a string composed of four types of nucleotides, A, C, G, and U. Given an RNA sequence, the goal of the RNA folding problem is to find a maximum cardinality set of crossing-free pairs of the form {A,U} or {C,G}. The problem is central in bioinformatics and has received much attention over the years. Whether the RNA folding problem can be solved in O(n^{3-epsilon}) time remains an open problem. Recently, Abboud, Backurs, and Williams (FOCS'15) made the first progress by showing a conditional lower bound for a generalized version of the RNA folding problem based on a conjectured hardness of the k-clique problem. However, their proof requires alphabet size >= 36 to work, making the result biologically irrelevant. In this paper, by constructing the gadgets using a lemma of Bringmann and Künnemann (FOCS'15) and surrounding them with some carefully designed sequences, we improve upon the framework of Abboud et al. to handle the case of alphabet size 4, yielding a conditional lower bound for the RNA folding problem. We also investigate the Dyck edit distance problem. We demonstrate a reduction from the RNA folding problem to the Dyck edit distance problem of alphabet size 10, establishing a connection between the two fundamental string problems. This leads to a much simpler proof of the conditional lower bound for the Dyck edit distance problem given by Abboud et al. and lowers the required alphabet size for the lower bound to work.
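
For orientation, the cubic-time baseline that the abstract's open question refers to can be sketched as a textbook interval dynamic program. This is an illustrative sketch, not code from the paper; the function name `max_noncrossing_pairs` is an assumption, and no minimum loop length is enforced because the abstract's formulation does not mention one.

```python
def max_noncrossing_pairs(rna: str) -> int:
    """Maximum number of crossing-free {A,U} / {C,G} pairs (cubic-time DP)."""
    allowed = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
    n = len(rna)
    # best[i][j] holds the optimum for the half-open substring rna[i:j].
    best = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            # Option 1: position i stays unpaired.
            value = best[i + 1][j]
            # Option 2: position i pairs with some later position k < j;
            # the pair splits the interval into an inside and an outside part.
            for k in range(i + 1, j):
                if (rna[i], rna[k]) in allowed:
                    value = max(value, 1 + best[i + 1][k] + best[k + 1][j])
            best[i][j] = value
    return best[0][n]
```

For example, in `ACGU` the pair A-U can enclose C-G, whereas in `ACUG` the two candidate pairs cross and only one of them can be kept.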

Cite as

Yi-Jun Chang. Hardness of RNA Folding Problem With Four Symbols. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 13:1-13:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{chang:LIPIcs.CPM.2016.13,
  author =	{Chang, Yi-Jun},
  title =	{{Hardness of RNA Folding Problem With Four Symbols}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{13:1--13:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.13},
  URN =		{urn:nbn:de:0030-drops-60894},
  doi =		{10.4230/LIPIcs.CPM.2016.13},
  annote =	{Keywords: RNA folding, Dyck edit distance, longest common subsequence, conditional lower bound, clique}
}
Document
Efficient Non-Binary Gene Tree Resolution with Weighted Reconciliation Cost

Authors: Manuel Lafond, Emmanuel Noutahi, and Nadia El-Mabrouk


Abstract
Polytomies in gene trees are multifurcated nodes corresponding to unresolved parts of the tree, usually due to insufficient differentiation between sequences of homologous gene copies. Apart from gene sequences, other information such as that contained in the species tree can be used to resolve such intricate parts of a gene tree. The problem of resolving a multifurcated tree has been considered by many authors, the objective function often being the number of duplications and losses reflected by the reconciliation of the resolved gene tree with the species tree. Here, we present PolytomySolver, an algorithm accounting for a more general model allowing different costs for duplications and losses per species. The time complexity of this algorithm is linear for the unit cost and is quadratic for the general cost, which outperforms the best known solutions so far by a linear factor. We show on simulated trees that the gain in theoretical complexity has a real practical impact on running times.

Cite as

Manuel Lafond, Emmanuel Noutahi, and Nadia El-Mabrouk. Efficient Non-Binary Gene Tree Resolution with Weighted Reconciliation Cost. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 14:1-14:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{lafond_et_al:LIPIcs.CPM.2016.14,
  author =	{Lafond, Manuel and Noutahi, Emmanuel and El-Mabrouk, Nadia},
  title =	{{Efficient Non-Binary Gene Tree Resolution with Weighted Reconciliation Cost}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{14:1--14:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.14},
  URN =		{urn:nbn:de:0030-drops-60907},
  doi =		{10.4230/LIPIcs.CPM.2016.14},
  annote =	{Keywords: gene tree, polytomy, reconciliation, resolution, weighted cost, phylogeny}
}
Document
Genomic Scaffold Filling Revisited

Authors: Haitao Jiang, Chenglin Fan, Boting Yang, Farong Zhong, Daming Zhu, and Binhai Zhu


Abstract
The genomic scaffold filling problem has attracted a lot of attention recently. The problem is on filling an incomplete sequence (scaffold) I into I', with respect to a complete reference genome G, such that the number of adjacencies between G and I' is maximized. The problem is NP-complete and APX-hard, and admits a 1.2-approximation. However, the sequence input I is not quite practical and does not fit most of the real datasets (where a scaffold is more often given as a list of contigs). In this paper, we revisit the genomic scaffold filling problem by considering this important case when, (1) a scaffold S is given, the missing genes X = c(G) - c(S) can only be inserted in between the contigs, and the objective is to maximize the number of adjacencies between G and the filled S' and (2) a scaffold S is given, a subset of the missing genes X' subset X = c(G) - c(S) can only be inserted in between the contigs, and the objective is still to maximize the number of adjacencies between G and the filled S''. For problem (1), we present a simple NP-completeness proof, we then present a factor-2 greedy approximation algorithm, and finally we show that the problem is FPT when each gene appears at most d times in G. For problem (2), we prove that the problem is W[1]-hard and then we present a factor-2 FPT-approximation for the case when each gene appears at most d times in G.

Cite as

Haitao Jiang, Chenglin Fan, Boting Yang, Farong Zhong, Daming Zhu, and Binhai Zhu. Genomic Scaffold Filling Revisited. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 15:1-15:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{jiang_et_al:LIPIcs.CPM.2016.15,
  author =	{Jiang, Haitao and Fan, Chenglin and Yang, Boting and Zhong, Farong and Zhu, Daming and Zhu, Binhai},
  title =	{{Genomic Scaffold Filling Revisited}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{15:1--15:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.15},
  URN =		{urn:nbn:de:0030-drops-60791},
  doi =		{10.4230/LIPIcs.CPM.2016.15},
  annote =	{Keywords: Computational biology, Approximation algorithms, FPT algorithms, NP-completeness}
}
Document
A Linear-Time Algorithm for the Copy Number Transformation Problem

Authors: Ron Shamir, Meirav Zehavi, and Ron Zeira


Abstract
Problems of genome rearrangement are central in both evolution and cancer. Most evolutionary scenarios have been studied under the assumption that the genome contains a single copy of each gene. In contrast, tumor genomes undergo deletions and duplications, and thus the number of copies of genes varies. The number of copies of each gene along a chromosome is called its copy number profile. Understanding copy number profile changes can assist in predicting disease progression and treatment. To date, questions related to distances between copy number profiles gained little scientific attention. Here we focus on the following fundamental problem, introduced by Schwarz et al. (PLOS Comp. Biol., 2014): given two copy number profiles, u and v, compute the edit distance from u to v, where the edit operations are segmental deletions and amplifications. We establish the computational complexity of this problem, showing that it is solvable in linear time and constant space.

Cite as

Ron Shamir, Meirav Zehavi, and Ron Zeira. A Linear-Time Algorithm for the Copy Number Transformation Problem. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 16:1-16:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{shamir_et_al:LIPIcs.CPM.2016.16,
  author =	{Shamir, Ron and Zehavi, Meirav and Zeira, Ron},
  title =	{{A Linear-Time Algorithm for the Copy Number Transformation Problem}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{16:1--16:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.16},
  URN =		{urn:nbn:de:0030-drops-60789},
  doi =		{10.4230/LIPIcs.CPM.2016.16},
  annote =	{Keywords: Genome Rearrangement, Copy Number}
}
Document
On Almost Monge All Scores Matrices

Authors: Amir Carmel, Dekel Tsur, and Michal Ziv-Ukelson


Abstract
The all scores matrix of a grid graph is a matrix containing the optimal scores of paths from every vertex on the first row of the graph to every vertex on the last row. This matrix is commonly used to solve diverse string comparison problems. All scores matrices have the Monge property, and this was exploited by previous works that used all scores matrices for solving various problems. In this paper, we study an extension of grid graphs that contain an additional set of edges, called bridges. Our main result is to show several properties of the all scores matrices of such graphs. We also give an O(r(nm + n^2)) time algorithm for constructing the all scores matrix of an m × n grid graph with r bridges.
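
As a reminder of what the Monge property means, here is a small checker. It is a sketch under one common convention (M[i][j] + M[i+1][j+1] <= M[i][j+1] + M[i+1][j]); papers on all scores matrices may use the reversed inequality depending on whether scores are maximized or minimized, and the name `is_monge` is illustrative.

```python
def is_monge(matrix) -> bool:
    """Check the Monge inequality on every pair of adjacent rows and columns.

    Verifying all adjacent 2x2 submatrices suffices: the inequality for
    arbitrary row and column pairs follows by summing the adjacent ones.
    """
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    for i in range(rows - 1):
        for j in range(cols - 1):
            if matrix[i][j] + matrix[i + 1][j + 1] > matrix[i][j + 1] + matrix[i + 1][j]:
                return False
    return True
```

A matrix such as M[i][j] = (i - j)^2 satisfies this inequality, while the 2 × 2 identity matrix does not.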

Cite as

Amir Carmel, Dekel Tsur, and Michal Ziv-Ukelson. On Almost Monge All Scores Matrices. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 17:1-17:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{carmel_et_al:LIPIcs.CPM.2016.17,
  author =	{Carmel, Amir and Tsur, Dekel and Ziv-Ukelson, Michal},
  title =	{{On Almost Monge All Scores Matrices}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{17:1--17:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.17},
  URN =		{urn:nbn:de:0030-drops-60770},
  doi =		{10.4230/LIPIcs.CPM.2016.17},
  annote =	{Keywords: Sequence alignment, longest common subsequences, DIST matrices, Monge matrices, all path score computations.}
}
Document
Tight Tradeoffs for Real-Time Approximation of Longest Palindromes in Streams

Authors: Pawel Gawrychowski, Oleg Merkurev, Arseny Shur, and Przemyslaw Uznanski


Abstract
We consider computing a longest palindrome in the streaming model, where the symbols arrive one-by-one and we do not have random access to the input. While computing the answer exactly using sublinear space is not possible in such a setting, one can still hope for a good approximation guarantee. Our contribution is twofold. First, we provide lower bounds on the space requirements for randomized approximation algorithms processing inputs of length n. We rule out Las Vegas algorithms, as they cannot achieve sublinear space complexity. For Monte Carlo algorithms, we prove a lower bound of Omega(M log min {|Sigma|, M}) bits of memory; here M=n/E for approximating the answer with additive error E, and M= log n / log (1 + epsilon) for approximating the answer with multiplicative error (1 + epsilon). Second, we design three real-time algorithms for this problem. Our Monte Carlo approximation algorithms for both additive and multiplicative versions of the problem use O(M) words of memory. Thus the obtained lower bounds are asymptotically tight up to a logarithmic factor. The third algorithm is deterministic and finds a longest palindrome exactly if it is short. This algorithm can be run in parallel with a Monte Carlo algorithm to obtain better results in practice. Overall, both the time and space complexity of finding a longest palindrome in a stream are essentially settled.
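
To make the underlying problem concrete, the following sketch finds a longest palindrome offline by expanding around each center. It assumes random access to the whole input and takes quadratic time in the worst case, so it is only a baseline for comparison and not one of the streaming algorithms described above; the function name is illustrative.

```python
def longest_palindrome(text: str) -> str:
    """Return a longest palindromic substring by expanding around centers."""
    n = len(text)
    best_start, best_len = 0, 0
    # Centers 0, 2, 4, ... sit on characters; odd centers sit between two.
    for center in range(2 * n - 1):
        left = center // 2
        right = left + center % 2
        while left >= 0 and right < n and text[left] == text[right]:
            left -= 1
            right += 1
        # The loop overshoots by one step on each side.
        length = right - left - 1
        if length > best_len:
            best_start, best_len = left + 1, length
    return text[best_start:best_start + best_len]
```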

Cite as

Pawel Gawrychowski, Oleg Merkurev, Arseny Shur, and Przemyslaw Uznanski. Tight Tradeoffs for Real-Time Approximation of Longest Palindromes in Streams. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 18:1-18:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{gawrychowski_et_al:LIPIcs.CPM.2016.18,
  author =	{Gawrychowski, Pawel and Merkurev, Oleg and Shur, Arseny and Uznanski, Przemyslaw},
  title =	{{Tight Tradeoffs for Real-Time Approximation of Longest Palindromes in Streams}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{18:1--18:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.18},
  URN =		{urn:nbn:de:0030-drops-60765},
  doi =		{10.4230/LIPIcs.CPM.2016.18},
  annote =	{Keywords: streaming algorithms, space lower bounds, real-time algorithms, palindromes, Monte Carlo algorithms}
}
Document
Finding Maximal 2-Dimensional Palindromes

Authors: Sara Geizhals and Dina Sokol


Abstract
This paper extends the problem of palindrome searching into a higher dimension, addressing two definitions of 2D palindromes. The first definition implies a square, while the second definition (also known as a centrosymmetric factor), can be any rectangular shape. We describe two algorithms for searching a 2D text for maximal palindromes, one for each type of 2D palindrome. The first algorithm is optimal; it runs in linear time, on par with Manacher's linear time 1D palindrome algorithm. The second algorithm searches a text of size n_1 x n_2 (n_1 >= n_2) in O(n_2) time for each of its n_1 x n_2 positions. Since each position may have up to O(n_2) maximal palindromes centered at that location, the second result is also optimal in terms of the worst-case output size.
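
The second definition can be illustrated with a direct check. The sketch below assumes that a centrosymmetric rectangle is one that is unchanged by a 180-degree rotation, which is the usual meaning of the term; it tests a single block in time proportional to its area and is not the search algorithm from the paper. The function name is illustrative.

```python
def is_centrosymmetric(block) -> bool:
    """True if the rectangular block equals its own 180-degree rotation."""
    rows = len(block)
    cols = len(block[0]) if rows else 0
    return all(
        block[i][j] == block[rows - 1 - i][cols - 1 - j]
        for i in range(rows)
        for j in range(cols)
    )
```

Rows may be given as strings or lists, for example `is_centrosymmetric(["abc", "cba"])`.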

Cite as

Sara Geizhals and Dina Sokol. Finding Maximal 2-Dimensional Palindromes. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 19:1-19:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{geizhals_et_al:LIPIcs.CPM.2016.19,
  author =	{Geizhals, Sara and Sokol, Dina},
  title =	{{Finding Maximal 2-Dimensional Palindromes}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{19:1--19:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.19},
  URN =		{urn:nbn:de:0030-drops-60752},
  doi =		{10.4230/LIPIcs.CPM.2016.19},
  annote =	{Keywords: palindrome, pattern matching, 2-Dimensional, centrosymmetric factor}
}
Document
Boxed Permutation Pattern Matching

Authors: Mika Amit, Philip Bille, Patrick Hagge Cording, Inge Li Gørtz, and Hjalte Wedel Vildhøj


Abstract
Given permutations T and P of length n and m, respectively, the Permutation Pattern Matching problem asks to find all m-length subsequences of T that are order-isomorphic to P. This problem has a wide range of applications but is known to be NP-hard. In this paper, we study the special case, where the goal is to only find the boxed subsequences of T that are order-isomorphic to P. This problem was introduced by Bruner and Lackner who showed that it can be solved in O(n^3) time. Cho et al. [CPM 2015] gave an O(n^2m) time algorithm and improved it to O(n^2 log m). In this paper we present a solution that uses only O(n^2) time. In general, there are instances where the output size is Omega(n^2) and hence our bound is optimal. To achieve our results, we introduce several new ideas including a novel reduction to 2D offline dominance counting. Our algorithm is surprisingly simple and straightforward to implement.

Cite as

Mika Amit, Philip Bille, Patrick Hagge Cording, Inge Li Gørtz, and Hjalte Wedel Vildhøj. Boxed Permutation Pattern Matching. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 20:1-20:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{amit_et_al:LIPIcs.CPM.2016.20,
  author =	{Amit, Mika and Bille, Philip and Hagge Cording, Patrick and Li G{\o}rtz, Inge and Wedel Vildh{\o}j, Hjalte},
  title =	{{Boxed Permutation Pattern Matching}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{20:1--20:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.20},
  URN =		{urn:nbn:de:0030-drops-60744},
  doi =		{10.4230/LIPIcs.CPM.2016.20},
  annote =	{Keywords: Permutation, Subsequence, Pattern Matching, Order Preserving, Boxed Mesh Pattern}
}
Document
Longest Common Substring with Approximately k Mismatches

Authors: Tatiana Starikovskaya


Abstract
In the longest common substring problem we are given two strings of length n and must find a substring of maximal length that occurs in both strings. It is well-known that the problem can be solved in linear time, but the solution is not robust and can vary greatly when the input strings are changed even by one letter. To circumvent this, Leimeister and Morgenstern introduced the problem of the longest common substring with k mismatches. Lately, this problem has received a lot of attention in the literature, and several algorithms have been suggested. The running time of these algorithms is n^{2-o(1)}, and unfortunately, conditional lower bounds have been shown which imply that there is little hope to improve this bound. In this paper we study a different but closely related problem of the longest common substring with approximately k mismatches and use computational geometry techniques to show that it admits a randomised solution with strongly subquadratic running time.
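
For reference, the exact k-mismatch variant has a simple quadratic-time solution that slides a window along every diagonal of the alignment grid. The sketch below shows that baseline only; it is not the randomised subquadratic method of the paper, and the function name is an assumption.

```python
def lcs_k_mismatches(s: str, t: str, k: int) -> int:
    """Length of the longest pair of equal-length substrings of s and t
    that differ in at most k positions (quadratic time, constant extra space)."""
    n, m = len(s), len(t)
    best = 0
    # Each shift fixes a diagonal: s[i + x] is compared with t[j + x].
    for shift in range(-(m - 1), n):
        i = max(shift, 0)
        j = i - shift
        length = min(n - i, m - j)
        start = 0
        mismatches = 0
        for end in range(length):
            if s[i + end] != t[j + end]:
                mismatches += 1
            # Shrink the window from the left until it is valid again.
            while mismatches > k:
                if s[i + start] != t[j + start]:
                    mismatches -= 1
                start += 1
            best = max(best, end - start + 1)
    return best
```

With k = 0 this reduces to the classical longest common substring length.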

Cite as

Tatiana Starikovskaya. Longest Common Substring with Approximately k Mismatches. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 21:1-21:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{starikovskaya:LIPIcs.CPM.2016.21,
  author =	{Starikovskaya, Tatiana},
  title =	{{Longest Common Substring with Approximately k Mismatches}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{21:1--21:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.21},
  URN =		{urn:nbn:de:0030-drops-60720},
  doi =		{10.4230/LIPIcs.CPM.2016.21},
  annote =	{Keywords: Randomised algorithms, string similarity measures, longest common substring, sketching, locality-sensitive hashing}
}
Document
Fully-online Construction of Suffix Trees for Multiple Texts

Authors: Takuya Takagi, Shunsuke Inenaga, and Hiroki Arimura


Abstract
We consider fully-online construction of indexing data structures for multiple texts. Let T = {T_1, ..., T_K} be a collection of texts. By fully-online, we mean that a new character can be appended to any text in T at any time. This is a natural generalization of semi-online construction of indexing data structures for multiple texts in which, after a new character is appended to the kth text T_k, the previous texts T_1, ..., T_{k-1} remain static. Our fully-online scenario arises when we maintain dynamic indexes for multi-sensor data. Let N and sigma denote the total length of texts in T and the alphabet size, respectively. We first show that the algorithm by Blumer et al. [Theoretical Computer Science, 40:31-55, 1985] to construct the directed acyclic word graph (DAWG) for T can readily be extended to our fully-online setting, retaining O(N log sigma)-time and O(N)-space complexities. Then, we give a sophisticated fully-online algorithm which constructs the suffix tree for T in O(N log sigma) time and O(N) space. A key idea of this algorithm is synchronized maintenance of the DAWG and the suffix tree.
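
As background, the sketch below shows the standard online DAWG (suffix automaton) construction for a single text, appending one character at a time. It does not cover the multi-text fully-online setting or the suffix tree maintenance described above; the class and method names are illustrative assumptions.

```python
class SuffixAutomaton:
    """Online DAWG for one text; state 0 is the initial state."""

    def __init__(self):
        self.next = [{}]     # outgoing transitions per state
        self.link = [-1]     # suffix links
        self.length = [0]    # length of the longest string in each state
        self.last = 0        # state representing the whole current text

    def extend(self, ch: str) -> None:
        cur = len(self.next)
        self.next.append({})
        self.length.append(self.length[self.last] + 1)
        self.link.append(-1)
        p = self.last
        while p != -1 and ch not in self.next[p]:
            self.next[p][ch] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][ch]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                # Split state q by cloning it with a shorter longest string.
                clone = len(self.next)
                self.next.append(dict(self.next[q]))
                self.length.append(self.length[p] + 1)
                self.link.append(self.link[q])
                while p != -1 and self.next[p].get(ch) == q:
                    self.next[p][ch] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur

    def contains(self, pattern: str) -> bool:
        state = 0
        for ch in pattern:
            if ch not in self.next[state]:
                return False
            state = self.next[state][ch]
        return True

    def distinct_substrings(self) -> int:
        return sum(self.length[v] - self.length[self.link[v]]
                   for v in range(1, len(self.next)))
```

Feeding the characters of a text to `extend` one by one keeps the index queryable after every append, which is the single-text analogue of the online behaviour discussed in the abstract.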

Cite as

Takuya Takagi, Shunsuke Inenaga, and Hiroki Arimura. Fully-online Construction of Suffix Trees for Multiple Texts. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 22:1-22:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{takagi_et_al:LIPIcs.CPM.2016.22,
  author =	{Takagi, Takuya and Inenaga, Shunsuke and Arimura, Hiroki},
  title =	{{Fully-online Construction of Suffix Trees for Multiple Texts}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{22:1--22:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.22},
  URN =		{urn:nbn:de:0030-drops-60719},
  doi =		{10.4230/LIPIcs.CPM.2016.22},
  annote =	{Keywords: suffix trees, DAWGs, multiple texts, online algorithms}
}
Document
Linear-time Suffix Sorting - A New Approach for Suffix Array Construction

Authors: Uwe Baier


Abstract
This paper presents a new approach for linear-time suffix sorting. It introduces a new sorting principle that can be used to build the first non-recursive linear-time suffix array construction algorithm named GSACA. Although GSACA cannot keep up with the performance of state of the art suffix array construction algorithms, the algorithm introduces a couple of new ideas for suffix array construction, and therefore can be seen as an ’idea collection’ for further suffix array construction improvements.
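
For readers unfamiliar with the output being constructed, a suffix array can be defined in one line by sorting suffix start positions. The sketch below is this naive definition only, with worst-case cost far above linear; it is not GSACA, and the function name is illustrative.

```python
def naive_suffix_array(text: str) -> list:
    """Start positions of all suffixes of text, in lexicographic order."""
    # Each comparison may inspect O(n) characters, so this is a
    # specification of the result rather than an efficient construction.
    return sorted(range(len(text)), key=lambda i: text[i:])
```

Such a naive version is mainly useful as a reference when testing a linear-time implementation.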

Cite as

Uwe Baier. Linear-time Suffix Sorting - A New Approach for Suffix Array Construction. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 23:1-23:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{baier:LIPIcs.CPM.2016.23,
  author =	{Baier, Uwe},
  title =	{{Linear-time Suffix Sorting - A New Approach for Suffix Array Construction}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{23:1--23:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.23},
  URN =		{urn:nbn:de:0030-drops-60698},
  doi =		{10.4230/LIPIcs.CPM.2016.23},
  annote =	{Keywords: Suffix array, sorting algorithm, linear time}
}
Document
Color-Distance Oracles and Snippets

Authors: Tsvi Kopelowitz and Robert Krauthgamer


Abstract
In the snippets problem we are interested in preprocessing a text T so that given two pattern queries P_1 and P_2, one can quickly locate the occurrences of the patterns in T that are the closest to each other. A closely related problem is that of constructing a color-distance oracle, where the goal is to preprocess a set of points from some metric space, in which every point is associated with a set of colors, so that given two colors one can quickly locate two points associated with those colors, that are as close as possible to each other. We introduce efficient data structures for both color-distance oracles and the snippets problem. Moreover, we prove conditional lower bounds for these problems from both the 3SUM conjecture and the Combinatorial Boolean Matrix Multiplication conjecture.
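
The query itself can be stated as a short brute-force routine without any preprocessing. The sketch below assumes that the distance between two occurrences is the difference of their start positions and that both patterns are non-empty; the paper may measure distance differently, and the function names are illustrative.

```python
def occurrences(text: str, pattern: str) -> list:
    """All start positions of pattern in text, in increasing order."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def closest_occurrences(text: str, p1: str, p2: str):
    """Return (gap, pos1, pos2) for the closest pair of occurrences, or None."""
    occ1, occ2 = occurrences(text, p1), occurrences(text, p2)
    best = None
    i = j = 0
    # Both lists are sorted, so a two-pointer merge finds the closest pair.
    while i < len(occ1) and j < len(occ2):
        gap = abs(occ1[i] - occ2[j])
        if best is None or gap < best[0]:
            best = (gap, occ1[i], occ2[j])
        if occ1[i] < occ2[j]:
            i += 1
        else:
            j += 1
    return best
```

The data structures in the paper aim to answer the same query much faster than this scan by preprocessing the text.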

Cite as

Tsvi Kopelowitz and Robert Krauthgamer. Color-Distance Oracles and Snippets. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 24:1-24:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{kopelowitz_et_al:LIPIcs.CPM.2016.24,
  author =	{Kopelowitz, Tsvi and Krauthgamer, Robert},
  title =	{{Color-Distance Oracles and Snippets}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{24:1--24:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.24},
  URN =		{urn:nbn:de:0030-drops-60684},
  doi =		{10.4230/LIPIcs.CPM.2016.24},
  annote =	{Keywords: Snippets, Text Indexing, Distance Oracles, Near Neighbor Search}
}
Document
The Nearest Colored Node in a Tree

Authors: Pawel Gawrychowski, Gad M. Landau, Shay Mozes, and Oren Weimann


Abstract
We start a systematic study of data structures for the nearest colored node problem on trees. Given a tree with colored nodes and weighted edges, we want to answer queries (v,c) asking for the nearest node to node v that has color c. This is a natural generalization of the well-known nearest marked ancestor problem. We give an O(n)-space O(log log n)-query solution and show that this is optimal. We also consider the dynamic case where updates can change a node's color and show that in O(n) space we can support both updates and queries in O(log n) time. We complement this by showing that O(polylog n) update time implies Omega(log n / log log n) query time. Finally, we consider the case where updates can change the edges of the tree (link-cut operations). There is a known (top-tree based) solution that requires update time that is roughly linear in the number of colors. We show that this solution is probably optimal by showing that a strictly sublinear update time implies a strictly subcubic time algorithm for the classical all pairs shortest paths problem on a general graph. We also consider versions where the tree is rooted, and the query asks for the nearest ancestor/descendant of node v that has color c, and present efficient data structures for both variants in the static and the dynamic setting.
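
A query (v,c) can be answered without any preprocessing by a shortest-path search from v that stops at the first node of color c. The sketch below is this linear-time-per-query baseline, assuming non-negative edge weights, an adjacency-list dictionary, and one color per node; it is not one of the data structures from the paper, and all names are illustrative.

```python
import heapq


def nearest_colored_node(adj, color, v, c):
    """Return (node, distance) of the nearest node to v with color c, or None.

    adj maps each node to a list of (neighbor, weight) pairs;
    color maps each node to its color.
    """
    dist = {v: 0}
    heap = [(0, v)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if color[u] == c:
            return u, d
        for w, weight in adj[u]:
            nd = d + weight
            if w not in dist or nd < dist[w]:
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return None
```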

Cite as

Pawel Gawrychowski, Gad M. Landau, Shay Mozes, and Oren Weimann. The Nearest Colored Node in a Tree. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 25:1-25:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{gawrychowski_et_al:LIPIcs.CPM.2016.25,
  author =	{Gawrychowski, Pawel and Landau, Gad M. and Mozes, Shay and Weimann, Oren},
  title =	{{The Nearest Colored Node in a Tree}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{25:1--25:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.25},
  URN =		{urn:nbn:de:0030-drops-60674},
  doi =		{10.4230/LIPIcs.CPM.2016.25},
  annote =	{Keywords: Marked ancestor, Vertex-label distance oracles, Nearest colored descendant, Top-trees}
}
Document
On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching

Authors: Johannes Fischer, Dominik Köppl, and Florian Kurpicz


Abstract
We present parallel algorithms for exact and approximate pattern matching with suffix arrays, using a CREW-PRAM with p processors. Given a static text of length n, we first show how to compute the suffix array interval of a given pattern of length m in O(m/p + lg p + lg lg p * lg lg n) time for p <= m. For approximate pattern matching with k differences or mismatches, we show how to compute all occurrences of a given pattern in O((m^k sigma^k)/p max (k, lg lg n) + (1+m/p) lg p * lg lg n + occ) time, where sigma is the size of the alphabet and p <= sigma^k m^k. The workhorse of our algorithms is a data structure for merging suffix array intervals quickly: Given the suffix array intervals for two patterns P and P', we present a data structure for computing the interval of PP' in O(lg lg n) sequential time, or in O(1 + lg_p lg n) parallel time. All our data structures are of size O(n) bits (in addition to the suffix array).
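
The two notions used above, the suffix array interval of a pattern and the merge of two intervals, can be sketched sequentially as follows. The interval search is plain binary search and the merge is a linear scan using the inverse suffix array, so neither matches the paper's bounds; intervals are half-open, P' is assumed non-empty, and all function names are illustrative.

```python
def suffix_array_interval(text, sa, pattern):
    """Half-open range [lo, hi) of suffix array ranks whose suffixes start with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first rank whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    left, hi = lo, len(sa)
    while lo < hi:  # first rank whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return left, lo


def merge_intervals(sa, rank, interval_p, len_p, interval_q):
    """Interval of PP' from the intervals of P and P' (naive linear scan).

    rank is the inverse suffix array: rank[sa[i]] == i.
    """
    lo, hi = interval_p
    q_lo, q_hi = interval_q
    # A suffix starts with PP' iff it starts with P and the suffix
    # beginning len_p characters later starts with P'.
    hits = [i for i in range(lo, hi)
            if sa[i] + len_p < len(sa) and q_lo <= rank[sa[i] + len_p] < q_hi]
    return (hits[0], hits[-1] + 1) if hits else (lo, lo)
```

For the text `banana`, merging the intervals of `a` and `na` yields the interval of `ana`.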

Cite as

Johannes Fischer, Dominik Köppl, and Florian Kurpicz. On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 26:1-26:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


@InProceedings{fischer_et_al:LIPIcs.CPM.2016.26,
  author =	{Fischer, Johannes and K\"{o}ppl, Dominik and Kurpicz, Florian},
  title =	{{On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{26:1--26:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.26},
  URN =		{urn:nbn:de:0030-drops-60669},
  doi =		{10.4230/LIPIcs.CPM.2016.26},
  annote =	{Keywords: parallel algorithms, pattern matching, approximate string matching}
}
Document
Factorizing a String into Squares in Linear Time

Authors: Yoshiaki Matsuoka, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, and Florin Manea


Abstract
A square factorization of a string w is a factorization of w in which each factor is a square. Dumitran et al. [SPIRE 2015, pp. 54-66] showed how to find a square factorization of a given string of length n in O(n log n) time, and they posed a question whether it can be done in O(n) time. In this paper, we answer their question positively, showing an O(n)-time algorithm for square factorization in the standard word RAM model with machine word size omega = Omega(log n). We also show an O(n + (n log^2 n) / omega)-time (respectively, O(n log n)-time) algorithm to find a square factorization which contains the maximum (respectively, minimum) number of squares.
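
To make the definition concrete, here is a small Python sketch that finds one square factorization by brute-force dynamic programming. It is a simple polynomial-time baseline, not the linear-time algorithm of the paper, and the function names are hypothetical.

```python
def is_square(s: str) -> bool:
    """Return True if s = xx for some non-empty string x."""
    half, rem = divmod(len(s), 2)
    return rem == 0 and half > 0 and s[:half] == s[half:]


def square_factorization(w: str):
    """Return a list of squares whose concatenation is w, or None."""
    n = len(w)
    reachable = [False] * (n + 1)  # reachable[j]: w[:j] has a square factorization
    prev = [None] * (n + 1)        # prev[j]: start of the last square ending at j
    reachable[0] = True
    for j in range(2, n + 1):
        # Squares have even length, so only same-parity starts are tried.
        for i in range(j - 2, -1, -2):
            if reachable[i] and is_square(w[i:j]):
                reachable[j] = True
                prev[j] = i
                break
    if not reachable[n]:
        return None
    factors = []
    j = n
    while j > 0:
        i = prev[j]
        factors.append(w[i:j])
        j = i
    return factors[::-1]
```

For example, `square_factorization("aabb")` returns `["aa", "bb"]`, while `square_factorization("aba")` returns `None` because no factorization into squares exists.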

Cite as

Yoshiaki Matsuoka, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, and Florin Manea. Factorizing a String into Squares in Linear Time. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 27:1-27:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


@InProceedings{matsuoka_et_al:LIPIcs.CPM.2016.27,
  author =	{Matsuoka, Yoshiaki and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki and Manea, Florin},
  title =	{{Factorizing a String into Squares in Linear Time}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{27:1--27:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.27},
  URN =		{urn:nbn:de:0030-drops-60645},
  doi =		{10.4230/LIPIcs.CPM.2016.27},
  annote =	{Keywords: Squares, Runs, Factorization of Strings}
}
Document
Minimal Suffix and Rotation of a Substring in Optimal Time

Authors: Tomasz Kociumaka


Abstract
For a text of length n given in advance, the substring minimal suffix queries ask to determine the lexicographically minimal non-empty suffix of a substring specified by the location of its occurrence in the text. We develop a data structure answering such queries optimally: in constant time after linear-time preprocessing. This improves upon the results of Babenko et al. (CPM 2014), whose trade-off solution is characterized by a Theta(n log n) product of these time complexities. Next, we extend our queries to support concatenations of O(1) substrings, for which the construction and query time is preserved. We apply these generalized queries to compute lexicographically minimal and maximal rotations of a given substring in constant time after linear-time preprocessing. Our data structures mainly rely on properties of Lyndon words and Lyndon factorizations. We combine them with further algorithmic and combinatorial tools, such as fusion trees and the notion of order isomorphism of strings.
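
The two query types can be stated precisely with a brute-force Python sketch. It only specifies what the queries return; each call takes time polynomial in the substring length rather than the constant time achieved in the paper, and the function names and the half-open `text[i:j]` convention are assumptions made for this example.

```python
def minimal_suffix(text: str, i: int, j: int) -> int:
    """Start position in text of the minimal non-empty suffix of text[i:j]."""
    if not 0 <= i < j <= len(text):
        raise ValueError("need 0 <= i < j <= len(text)")
    # Suffixes of one substring have distinct lengths, so the minimum is unique.
    return min(range(i, j), key=lambda k: text[k:j])


def minimal_rotation(text: str, i: int, j: int) -> str:
    """Lexicographically minimal rotation of the substring text[i:j]."""
    if not 0 <= i < j <= len(text):
        raise ValueError("need 0 <= i < j <= len(text)")
    sub = text[i:j]
    return min(sub[k:] + sub[:k] for k in range(len(sub)))
```

For instance, on the text "banana" the minimal suffix of the prefix "banan" is "an", which starts at position 3, and the minimal rotation of the whole text is "abanan".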

Cite as

Tomasz Kociumaka. Minimal Suffix and Rotation of a Substring in Optimal Time. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 28:1-28:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


@InProceedings{kociumaka:LIPIcs.CPM.2016.28,
  author =	{Kociumaka, Tomasz},
  title =	{{Minimal Suffix and Rotation of a Substring in Optimal Time}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{28:1--28:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.28},
  URN =		{urn:nbn:de:0030-drops-60626},
  doi =		{10.4230/LIPIcs.CPM.2016.28},
  annote =	{Keywords: minimal suffix, minimal rotation, Lyndon factorization, substring canonization, substring queries}
}
Document
Optimal Prefix Free Codes with Partial Sorting

Authors: Jérémy Barbay


Abstract
We describe an algorithm computing an optimal prefix free code for n unsorted positive weights in less time than required to sort them on many large classes of instances, identified by a new measure of difficulty for this problem, the alternation alpha. This asymptotical complexity is within a constant factor of the optimal in the algebraic decision tree computational model, in the worst case over all instances of fixed size n and alternation alpha. Such results refine the state of the art complexity in the worst case over instances of size n in the same computational model, a landmark in compression and coding since 1952, by the mere combination of van Leeuwen's algorithm to compute optimal prefix free codes from sorted weights (known since 1976), with Deferred Data Structures to partially sort multisets (known since 1988).
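
One of the two ingredients named above, computing an optimal prefix free code from already sorted weights, can be sketched with the classic two-queue technique attributed to van Leeuwen. The Python sketch below covers only that sorted-input step, not the partial-sorting algorithm of the paper. It returns the total cost (sum of weight times codeword length) rather than the codewords, and the function name is hypothetical.

```python
from collections import deque


def optimal_code_cost(sorted_weights):
    """Cost of an optimal prefix free code for weights in non-decreasing order.

    Runs in linear time: leaves wait in one queue, merged nodes in another.
    Merged weights are produced in non-decreasing order, so the two
    smallest remaining nodes are always at the front of the queues.
    """
    leaves = deque(sorted_weights)
    internal = deque()

    def pop_smallest():
        # Take from whichever queue has the smaller front element.
        if not internal or (leaves and leaves[0] <= internal[0]):
            return leaves.popleft()
        return internal.popleft()

    cost = 0
    while len(leaves) + len(internal) > 1:
        merged = pop_smallest() + pop_smallest()
        internal.append(merged)
        cost += merged  # each merge adds one bit to every leaf below it
    return cost
```

With the weights 1, 1, 2, 3, 5 the merges produce nodes of weight 2, 4, 7 and 12, so the function returns 25. A single weight yields cost 0 under the convention that a lone symbol needs no bits.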

Cite as

Jérémy Barbay. Optimal Prefix Free Codes with Partial Sorting. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 29:1-29:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


@InProceedings{barbay:LIPIcs.CPM.2016.29,
  author =	{Barbay, J\'{e}r\'{e}my},
  title =	{{Optimal Prefix Free Codes with Partial Sorting}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{29:1--29:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.29},
  URN =		{urn:nbn:de:0030-drops-60635},
  doi =		{10.4230/LIPIcs.CPM.2016.29},
  annote =	{Keywords: Deferred Data Structure, Huffman, Median, Optimal Prefix Free Codes, van Leeuwen}
}
