LIPIcs, Volume 78

28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)



Thumbnail PDF

Event

CPM 2017, July 4-6, 2017, Warsaw, Poland

Editors

Juha Kärkkäinen
Jakub Radoszewski
Wojciech Rytter

Publication Details

  • published at: 2017-06-30
  • Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
  • ISBN: 978-3-95977-039-2
  • DBLP: db/conf/cpm/cpm2017

Access Numbers

Documents

No documents found matching your filter selection.
Document
Complete Volume
LIPIcs, Volume 78, CPM'17, Complete Volume

Authors: Juha Kärkkäinen, Jakub Radoszewski, and Wojciech Rytter


Abstract
LIPIcs, Volume 78, CPM'17, Complete Volume

Cite as

28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@Proceedings{karkkainen_et_al:LIPIcs.CPM.2017,
  title =	{{LIPIcs, Volume 78, CPM'17, Complete Volume}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017},
  URN =		{urn:nbn:de:0030-drops-75091},
  doi =		{10.4230/LIPIcs.CPM.2017},
  annote =	{Keywords: Data Structures, Data Storage Representations, Coding and Information Theory, Theory of Computation, Discrete Mathematics, Information Systems,}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization, External Reviewers

Authors: Juha Kärkkäinen, Jakub Radoszewski, and Wojciech Rytter


Abstract
Front Matter, Table of Contents, Preface, Conference Organization, External Reviewers

Cite as

28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{karkkainen_et_al:LIPIcs.CPM.2017.0,
  author =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization, External Reviewers}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.0},
  URN =		{urn:nbn:de:0030-drops-73178},
  doi =		{10.4230/LIPIcs.CPM.2017.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization, External Reviewers}
}
Document
Wheeler Graphs: Variations on a Theme by Burrows and Wheeler

Authors: Giovanni Manzini


Abstract
The famous Burrows-Wheeler Transform was originally defined for single strings but variations have been developed for sets of strings, labelled trees, de Bruijn graphs, alignments, etc. In this talk we propose a unifying view that includes many of these variations and that we hope will simplify the search for more. Somewhat surprisingly we get our unifying view by considering the Nondeterministic Finite Automata related to different pattern-matching problems. We show that the state graphs associated with these automata have common properties that we summarize with the concept of a Wheeler graph. Using the notion of a Wheeler graph, we show that it is possible to process strings efficiently even if the automaton is nondeterministic. In addition, we show that Wheeler graphs can be compactly represented and traversed using up to three arrays with additional data structures supporting efficient rank and select operations. It turns out that these arrays coincide with, or are substantially equivalent to, the output of many Burrows-Wheeler Transform variants described in the literature. This is joint work with Travis Gagie and Jouni Sirén.

Cite as

Giovanni Manzini. Wheeler Graphs: Variations on a Theme by Burrows and Wheeler. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, p. 1:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{manzini:LIPIcs.CPM.2017.1,
  author =	{Manzini, Giovanni},
  title =	{{Wheeler Graphs: Variations on a Theme by Burrows and Wheeler}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{1:1--1:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.1},
  URN =		{urn:nbn:de:0030-drops-73343},
  doi =		{10.4230/LIPIcs.CPM.2017.1},
  annote =	{Keywords: compressed data structures, pattern matching}
}
Document
Recompression of SLPs

Authors: Artur Jez


Abstract
In this talk I will survey the recompression technique in case of SLPs. The technique is based on applying simple compression operations (replacement of pairs of two different letters by a new letter and replacement of maximal repetition of a letter by a new symbol) to strings represented by SLPs. To this end we modify the SLPs, so that performing such compression operations on SLPs is possible. For instance, when we want to replace ab in the string and SLP has a production X to aY and the string generated by Y is bw, then we alter the rule of Y so that it generates w and replace Y with bY in all rules. In this way the rule becomes X to abY and so ab can be replaced, similar operations are defined for the right sides of the nonterminals. As a result, we are interested mostly in the SLP representation rather than the string itself and its combinatorial properties. What we need to control, though, is the size of the SLP. With appropriate choices of substrings to be compressed it can be shown that it stays linear. The proposed method turned out to be surprisingly efficient and applicable in various scenarios: for instance it can be used to test the equality of SLPs in time O(n log N), where n is the size of the SLP and N the length of the generated string; on the other hand it can be used to approximate the smallest SLP for a given string, with the approximation ratio O(log(n/g)) where n is the length of the string and g the size of the smallest SLP for this string, matching the best known bounds.

Cite as

Artur Jez. Recompression of SLPs. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, p. 2:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{jez:LIPIcs.CPM.2017.2,
  author =	{Jez, Artur},
  title =	{{Recompression of SLPs}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{2:1--2:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.2},
  URN =		{urn:nbn:de:0030-drops-73475},
  doi =		{10.4230/LIPIcs.CPM.2017.2},
  annote =	{Keywords: Straight Line Programs, smallest grammar problem, compression, pro- cessing compressed data, recompression}
}
Document
Shortest Superstring

Authors: Marcin Mucha


Abstract
In the Shortest Superstring problem (SS) one has to find a shortest string s containing given strings s_1,...,s_n as substrings. The problem is NP-hard, so a natural question is that of its approximability. One natural approach to approximately solving SS is the following GREEDY heuristic: repeatedly merge two strings with the largest overlap until only a single string is left. This heuristic is conjectured to be a 2-approximation, but even after 30 years since the conjecture has been posed, we are still very far from proving it. The situation is better for non-greedy approximation algorithms, where several approaches yielding 2.5-approximation (and better) are known. In this talk, we will survey the main results in the area, focusing on the fundamental ideas and intuitions.

Cite as

Marcin Mucha. Shortest Superstring. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, p. 3:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{mucha:LIPIcs.CPM.2017.3,
  author =	{Mucha, Marcin},
  title =	{{Shortest Superstring}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{3:1--3:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.3},
  URN =		{urn:nbn:de:0030-drops-73483},
  doi =		{10.4230/LIPIcs.CPM.2017.3},
  annote =	{Keywords: shortest superstring, approximation algorithms}
}
Document
Document Listing on Repetitive Collections with Guaranteed Performance

Authors: Gonzalo Navarro


Abstract
We consider document listing on string collections, that is, finding in which strings a given pattern appears. In particular, we focus on repetitive collections: a collection of size N over alphabet [1,a] is composed of D copies of a string of size n, and s single-character edits are applied on the copies. We introduce the first document listing index with size O~(n + s), precisely O((n lg a + s lg^2 N) lg D) bits, and with useful worst-case time guarantees: Given a pattern of length m, the index reports the ndoc strings where it appears in time O(m^2 + m lg N (lg D + lg^e N) ndoc), for any constant e > 0.

Cite as

Gonzalo Navarro. Document Listing on Repetitive Collections with Guaranteed Performance. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 4:1-4:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{navarro:LIPIcs.CPM.2017.4,
  author =	{Navarro, Gonzalo},
  title =	{{Document Listing on Repetitive Collections with Guaranteed Performance}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{4:1--4:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.4},
  URN =		{urn:nbn:de:0030-drops-73268},
  doi =		{10.4230/LIPIcs.CPM.2017.4},
  annote =	{Keywords: repetitive string collections, document listing, grammar compression, range minimum queries, succinct data structures}
}
Document
Path Queries on Functions

Authors: Travis Gagie, Meng He, and Gonzalo Navarro


Abstract
Let f : [1..n] -> [1..n] be a function, and l : [1..n] -> [1..s] indicate a label assigned to each element of the domain. We design several compact data structures that answer various queries on the labels of paths in f. For example, we can find the minimum label in f^k (i) for a given i and any k >= 0 in a given range [k1..k2], using n lg n + O(n) bits, or the minimum label in f^(-k) (i) for a given i and k > 0, using 2n lg n + O(n) bits, both in time O(lg n/ lg lg n). By using n lg s + o(n lg s) further bits, we can also count, within the same time, the number of elements within a range of labels, and report each such element in O(1 + lg s / lg lg n) additional time. Several other possible queries are considered, such as top-t queries and t-majorities.

Cite as

Travis Gagie, Meng He, and Gonzalo Navarro. Path Queries on Functions. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 5:1-5:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{gagie_et_al:LIPIcs.CPM.2017.5,
  author =	{Gagie, Travis and He, Meng and Navarro, Gonzalo},
  title =	{{Path Queries on Functions}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{5:1--5:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.5},
  URN =		{urn:nbn:de:0030-drops-73274},
  doi =		{10.4230/LIPIcs.CPM.2017.5},
  annote =	{Keywords: succinct data structures, integer functions, range queries, trees and permutations}
}
Document
Deterministic Indexing for Packed Strings

Authors: Philip Bille, Inge Li Gørtz, and Frederik Rye Skjoldjensen


Abstract
Given a string S of length n, the classic string indexing problem is to preprocess S into a compact data structure that supports efficient subsequent pattern queries. In the deterministic variant the goal is to solve the string indexing problem without any randomization (at preprocessing time or query time). In the packed variant the strings are stored with several character in a single word, giving us the opportunity to read multiple characters simultaneously. Our main result is a new string index in the deterministic and packed setting. Given a packed string S of length n over an alphabet s, we show how to preprocess S in O(n) (deterministic) time and space O(n) such that given a packed pattern string of length m we can support queries in (deterministic) time O(m/a + log m + log log s), where a = w /log s is the number of characters packed in a word of size w = log n. Our query time is always at least as good as the previous best known bounds and whenever several characters are packed in a word, i.e., log s << w, the query times are faster.

Cite as

Philip Bille, Inge Li Gørtz, and Frederik Rye Skjoldjensen. Deterministic Indexing for Packed Strings. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 6:1-6:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.CPM.2017.6,
  author =	{Bille, Philip and G{\o}rtz, Inge Li and Skjoldjensen, Frederik Rye},
  title =	{{Deterministic Indexing for Packed Strings}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{6:1--6:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.6},
  URN =		{urn:nbn:de:0030-drops-73351},
  doi =		{10.4230/LIPIcs.CPM.2017.6},
  annote =	{Keywords: suffix tree, suffix array, deterministic algorithm, word packing}
}
Document
Representing the Suffix Tree with the CDAWG

Authors: Djamal Belazzougui and Fabio Cunial


Abstract
Given a string T, it is known that its suffix tree can be represented using the compact directed acyclic word graph (CDAWG) with e_T arcs, taking overall O(e_T+e_REV(T)) words of space, where REV(T) is the reverse of T, and supporting some key operations in time between O(1) and O(log(log(n))) in the worst case. This representation is especially appealing for highly repetitive strings, like collections of similar genomes or of version-controlled documents, in which e_T grows sublinearly in the length of T in practice. In this paper we augment such representation, supporting a number of additional queries in worst-case time between O(1) and O(log(n)) in the RAM model, without increasing space complexity asymptotically. Our technique, based on a heavy path decomposition of the suffix tree, enables also a representation of the suffix array, of the inverse suffix array, and of T itself, that takes O(e_T) words of space, and that supports random access in O(log(n)) time. Furthermore, we establish a connection between the reversed CDAWG of T and a context-free grammar that produces T and only T, which might have independent interest.

Cite as

Djamal Belazzougui and Fabio Cunial. Representing the Suffix Tree with the CDAWG. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 7:1-7:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{belazzougui_et_al:LIPIcs.CPM.2017.7,
  author =	{Belazzougui, Djamal and Cunial, Fabio},
  title =	{{Representing the Suffix Tree with the CDAWG}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{7:1--7:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.7},
  URN =		{urn:nbn:de:0030-drops-73402},
  doi =		{10.4230/LIPIcs.CPM.2017.7},
  annote =	{Keywords: CDAWG, suffix tree, heavy path decomposition, maximal repeat, context-free grammar}
}
Document
Position Heaps for Parameterized Strings

Authors: Diptarama Diptarama, Takashi Katsura, Yuhei Otomo, Kazuyuki Narisawa, and Ayumi Shinohara


Abstract
We propose a new indexing structure for parameterized strings, called parameterized position heap. Parameterized position heap is applicable for parameterized pattern matching problem, where the pattern matches a substring of the text if there exists a bijective mapping from the symbols of the pattern to the symbols of the substring. We propose an online construction algorithm of parameterized position heap of a text and show that our algorithm runs in linear time with respect to the text size. We also show that by using parameterized position heap, we can find all occurrences of a pattern in the text in linear time with respect to the product of the pattern size and the alphabet size.

Cite as

Diptarama Diptarama, Takashi Katsura, Yuhei Otomo, Kazuyuki Narisawa, and Ayumi Shinohara. Position Heaps for Parameterized Strings. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 8:1-8:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{diptarama_et_al:LIPIcs.CPM.2017.8,
  author =	{Diptarama, Diptarama and Katsura, Takashi and Otomo, Yuhei and Narisawa, Kazuyuki and Shinohara, Ayumi},
  title =	{{Position Heaps for Parameterized Strings}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{8:1--8:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.8},
  URN =		{urn:nbn:de:0030-drops-73396},
  doi =		{10.4230/LIPIcs.CPM.2017.8},
  annote =	{Keywords: string matching, indexing structure, parameterized pattern matching, position heap}
}
Document
On-Line Pattern Matching on Similar Texts

Authors: Roberto Grossi, Costas S. Iliopoulos, Chang Liu, Nadia Pisanti, Solon P. Pissis, Ahmad Retha, Giovanna Rosone, Fatima Vayani, and Luca Versari


Abstract
Pattern matching on a set of similar texts has received much attention, especially recently, mainly due to its application in cataloguing human genetic variation. In particular, many different algorithms have been proposed for the off-line version of this problem; that is, constructing a compressed index for a set of similar texts in order to answer pattern matching queries efficiently. However, the on-line, more fundamental, version of this problem is a rather undeveloped topic. Solutions to the on-line version can be beneficial for a number of reasons; for instance, efficient on-line solutions can be used in combination with partial indexes as practical trade-offs. We make here an attempt to close this gap via proposing two efficient algorithms for this problem. Notably, one of the algorithms requires time linear in the size of the texts' representation, for short patterns. Furthermore, experimental results confirm our theoretical findings in practical terms.

Cite as

Roberto Grossi, Costas S. Iliopoulos, Chang Liu, Nadia Pisanti, Solon P. Pissis, Ahmad Retha, Giovanna Rosone, Fatima Vayani, and Luca Versari. On-Line Pattern Matching on Similar Texts. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 9:1-9:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{grossi_et_al:LIPIcs.CPM.2017.9,
  author =	{Grossi, Roberto and Iliopoulos, Costas S. and Liu, Chang and Pisanti, Nadia and Pissis, Solon P. and Retha, Ahmad and Rosone, Giovanna and Vayani, Fatima and Versari, Luca},
  title =	{{On-Line Pattern Matching on Similar Texts}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{9:1--9:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.9},
  URN =		{urn:nbn:de:0030-drops-73379},
  doi =		{10.4230/LIPIcs.CPM.2017.9},
  annote =	{Keywords: string algorithms, pattern matching, degenerate strings, elastic-degenerate strings, on-line algorithms}
}
Document
A Family of Approximation Algorithms for the Maximum Duo-Preservation String Mapping Problem

Authors: Bartlomiej Dudek, Pawel Gawrychowski, and Piotr Ostropolski-Nalewaja


Abstract
In the Maximum Duo-Preservation String Mapping problem we are given two strings and wish to map the letters of the former to the letters of the latter as to maximise the number of duos. A duo is a pair of consecutive letters that is mapped to a pair of consecutive letters in the same order. This is complementary to the well-studied Minimum Common String Partition problem, where the goal is to partition the former string into blocks that can be permuted and concatenated to obtain the latter string. Maximum Duo-Preservation String Mapping is APX-hard. After a series of improvements, Brubach [WABI 2016] showed a polynomial-time 3.25-approximation algorithm. Our main contribution is that, for any eps>0, there exists a polynomial-time (2+eps)-approximation algorithm. Similarly to a previous solution by Boria et al. [CPM 2016], our algorithm uses the local search technique. However, this is used only after a certain preliminary greedy procedure, which gives us more structure and makes a more general local search possible. We complement this with a specialised version of the algorithm that achieves 2.67-approximation in quadratic time.

Cite as

Bartlomiej Dudek, Pawel Gawrychowski, and Piotr Ostropolski-Nalewaja. A Family of Approximation Algorithms for the Maximum Duo-Preservation String Mapping Problem. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 10:1-10:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{dudek_et_al:LIPIcs.CPM.2017.10,
  author =	{Dudek, Bartlomiej and Gawrychowski, Pawel and Ostropolski-Nalewaja, Piotr},
  title =	{{A Family of Approximation Algorithms for the Maximum Duo-Preservation String Mapping Problem}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{10:1--10:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.10},
  URN =		{urn:nbn:de:0030-drops-73458},
  doi =		{10.4230/LIPIcs.CPM.2017.10},
  annote =	{Keywords: approximation scheme, minimum common string partition, local search}
}
Document
Revisiting the Parameterized Complexity of Maximum-Duo Preservation String Mapping

Authors: Christian Komusiewicz, Mateus de Oliveira Oliveira, and Meirav Zehavi


Abstract
In the Maximum-Duo Preservation String Mapping (Max-Duo PSM) problem, the input consists of two related strings A and B of length n and a nonnegative integer k. The objective is to determine whether there exists a mapping m from the set of positions of A to the set of positions of B that maps only to positions with the same character and preserves at least k duos, which are pairs of adjacent positions. We develop a randomized algorithm that solves Max-Duo PSM in time 4^k * n^{O(1)}, and a deterministic algorithm that solves this problem in time 6.855^k * n^{O(1)}. The previous best known (deterministic) algorithm for this problem has running time (8e)^{2k+o(k)} * n^{O(1)} [Beretta et al., Theor. Comput. Sci. 2016]. We also show that Max-Duo PSM admits a problem kernel of size O(k^3), improving upon the previous best known problem kernel of size O(k^6).

Cite as

Christian Komusiewicz, Mateus de Oliveira Oliveira, and Meirav Zehavi. Revisiting the Parameterized Complexity of Maximum-Duo Preservation String Mapping. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 11:1-11:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{komusiewicz_et_al:LIPIcs.CPM.2017.11,
  author =	{Komusiewicz, Christian and de Oliveira Oliveira, Mateus and Zehavi, Meirav},
  title =	{{Revisiting the Parameterized Complexity of Maximum-Duo Preservation String Mapping}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{11:1--11:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.11},
  URN =		{urn:nbn:de:0030-drops-73436},
  doi =		{10.4230/LIPIcs.CPM.2017.11},
  annote =	{Keywords: comparative genomics, parameterized complexity, kernelization}
}
Document
Clique-Based Lower Bounds for Parsing Tree-Adjoining Grammars

Authors: Karl Bringmann and Philip Wellnitz


Abstract
Tree-adjoining grammars are a generalization of context-free grammars that are well suited to model human languages and are thus popular in computational linguistics. In the tree-adjoining grammar recognition problem, given a grammar G and a string s of length n, the task is to decide whether s can be obtained from G. Rajasekaran and Yooseph’s parser (JCSS’98) solves this problem in time O(n^2w), where w < 2.373 is the matrix multiplication exponent. The best algorithms avoiding fast matrix multiplication take time O(n^6). The first evidence for hardness was given by Satta (J. Comp. Linguist.’94): For a more general parsing problem, any algorithm that avoids fast matrix multiplication and is significantly faster than O(|G|·n^6) in the case of |G| = Theta(n^12) would imply a breakthrough for Boolean matrix multiplication. Following an approach by Abboud et al. (FOCS’15) for context-free grammar recognition, in this paper we resolve many of the disadvantages of the previous lower bound. We show that, even on constant-size grammars, any improvement on Rajasekaran and Yooseph’s parser would imply a breakthrough for the k-Clique problem. This establishes tree-adjoining grammar parsing as a practically relevant problem with the unusual running time of n^2w , up to lower order factors.

Cite as

Karl Bringmann and Philip Wellnitz. Clique-Based Lower Bounds for Parsing Tree-Adjoining Grammars. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 12:1-12:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{bringmann_et_al:LIPIcs.CPM.2017.12,
  author =	{Bringmann, Karl and Wellnitz, Philip},
  title =	{{Clique-Based Lower Bounds for Parsing Tree-Adjoining Grammars}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{12:1--12:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.12},
  URN =		{urn:nbn:de:0030-drops-73329},
  doi =		{10.4230/LIPIcs.CPM.2017.12},
  annote =	{Keywords: conditional lower bounds, k-Clique, parsing, tree-adjoining grammars}
}
Document
Communication and Streaming Complexity of Approximate Pattern Matching

Authors: Tatiana Starikovskaya


Abstract
We consider the approximate pattern matching problem. Given a text T of length 2n and a pattern P of length n, the task is to decide for each prefix T[1, j] of T if it ends with a string that is at the edit distance at most k from P. If this is the case, we must output the edit distance and the corresponding edit operations. We first show the communication complexity of the problem for the case when Alice and Bob both share the pattern and Alice holds the first half of the text and Bob the second half, and for the case when Alice holds the first half of the text, Bob the second half of the text, and Charlie the pattern. We then develop the first sublinear-space streaming algorithm for the problem. The algorithm is randomised with error probability at most 1/poly(n).

Cite as

Tatiana Starikovskaya. Communication and Streaming Complexity of Approximate Pattern Matching. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 13:1-13:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{starikovskaya:LIPIcs.CPM.2017.13,
  author =	{Starikovskaya, Tatiana},
  title =	{{Communication and Streaming Complexity of Approximate Pattern Matching}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{13:1--13:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.13},
  URN =		{urn:nbn:de:0030-drops-73206},
  doi =		{10.4230/LIPIcs.CPM.2017.13},
  annote =	{Keywords: approximate pattern matching, edit distance, randomised algorithms, streaming algorithms, communication complexity}
}
Document
The Longest Filled Common Subsequence Problem

Authors: Mauro Castelli, Riccardo Dondi, Giancarlo Mauri, and Italo Zoppis


Abstract
Inspired by a recent approach for genome reconstruction from incomplete data, we consider a variant of the longest common subsequence problem for the comparison of two sequences, one of which is incomplete, i.e. it has some missing elements. The new combinatorial problem, called Longest Filled Common Subsequence, given two sequences A and B, and a multiset M of symbols missing in B, asks for a sequence B* obtained by inserting the symbols of M into B so that B* induces a common subsequence with A of maximum length. First, we investigate the computational and approximation complexity of the problem and we show that it is NP-hard and APX-hard when A contains at most two occurrences of each symbol. Then, we give a 3/5 approximation algorithm for the problem. Finally, we present a fixed-parameter algorithm, when the problem is parameterized by the number of symbols inserted in B that "match" symbols of A.

Cite as

Mauro Castelli, Riccardo Dondi, Giancarlo Mauri, and Italo Zoppis. The Longest Filled Common Subsequence Problem. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 14:1-14:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{castelli_et_al:LIPIcs.CPM.2017.14,
  author =	{Castelli, Mauro and Dondi, Riccardo and Mauri, Giancarlo and Zoppis, Italo},
  title =	{{The Longest Filled Common Subsequence Problem}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{14:1--14:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.14},
  URN =		{urn:nbn:de:0030-drops-73293},
  doi =		{10.4230/LIPIcs.CPM.2017.14},
  annote =	{Keywords: longest common subsequence, approximation algorithms, computational complexity, fixed-parameter algorithms}
}
Document
Lempel-Ziv Compression in a Sliding Window

Authors: Philip Bille, Patrick Hagge Cording, Johannes Fischer, and Inge Li Gørtz


Abstract
We present new algorithms for the sliding window Lempel-Ziv (LZ77) problem and the approximate rightmost LZ77 parsing problem. Our main result is a new and surprisingly simple algorithm that computes the sliding window LZ77 parse in O(w) space and either O(n) expected time or O(n log log w+z log log s) deterministic time. Here, w is the window size, n is the size of the input string, z is the number of phrases in the parse, and s is the size of the alphabet. This matches the space and time bounds of previous results while removing constant size restrictions on the alphabet size. To achieve our result, we combine a simple modification and augmentation of the suffix tree with periodicity properties of sliding windows. We also apply this new technique to obtain an algorithm for the approximate rightmost LZ77 problem that uses O(n(log z + log log n)) time and O(n) space and produces a (1+e)-approximation of the rightmost parsing (any constant e>0). While this does not improve the best known time-space trade-offs for exact rightmost parsing, our algorithm is significantly simpler and exposes a direct connection between sliding window parsing and the approximate rightmost matching problem.

Cite as

Philip Bille, Patrick Hagge Cording, Johannes Fischer, and Inge Li Gørtz. Lempel-Ziv Compression in a Sliding Window. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 15:1-15:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.CPM.2017.15,
  author =	{Bille, Philip and Cording, Patrick Hagge and Fischer, Johannes and G{\o}rtz, Inge Li},
  title =	{{Lempel-Ziv Compression in a Sliding Window}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{15:1--15:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.15},
  URN =		{urn:nbn:de:0030-drops-73316},
  doi =		{10.4230/LIPIcs.CPM.2017.15},
  annote =	{Keywords: Lempel-Ziv parsing, sliding window, rightmost matching}
}
Document
Time-Space Trade-Offs for Lempel-Ziv Compressed Indexing

Authors: Philip Bille, Mikko Berggren Ettienne, Inge Li Gørtz, and Hjalte Wedel Vildhøj


Abstract
Given a string S, the compressed indexing problem is to preprocess S into a compressed representation that supports fast substring queries. The goal is to use little space relative to the compressed size of S while supporting fast queries. We present a compressed index based on the Lempel-Ziv 1977 compression scheme. Let n, and z denote the size of the input string, and the compressed LZ77 string, respectively. We obtain the following time-space trade-offs. Given a pattern string P of length m, we can solve the problem in (i) O(m + occ lglg n) time using O(z lg(n/z) lglg z) space, or (ii) O(m(1 + lg^e z / lg(n/z)) + occ(lglg n + lg^e z)) time using O(z lg(n/z)) space, for any 0 < e < 1 In particular, (i) improves the leading term in the query time of the previous best solution from O(m lg m) to O(m) at the cost of increasing the space by a factor lglg z. Alternatively, (ii) matches the previous best space bound, but has a leading term in the query time of O(m(1+lg^e z / lg(n/z))). However, for any polynomial compression ratio, i.e., z = O(n^{1-d}), for constant d > 0, this becomes O(m). Our index also supports extraction of any substring of length l in O(l + lg(n/z)) time. Technically, our results are obtained by novel extensions and combinations of existing data structures of independent interest, including a new batched variant of weak prefix search.

Cite as

Philip Bille, Mikko Berggren Ettienne, Inge Li Gørtz, and Hjalte Wedel Vildhøj. Time-Space Trade-Offs for Lempel-Ziv Compressed Indexing. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 16:1-16:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.CPM.2017.16,
  author =	{Bille, Philip and Ettienne, Mikko Berggren and G{\o}rtz, Inge Li and Vildh{\o}j, Hjalte Wedel},
  title =	{{Time-Space Trade-Offs for Lempel-Ziv Compressed Indexing}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{16:1--16:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.16},
  URN =		{urn:nbn:de:0030-drops-73254},
  doi =		{10.4230/LIPIcs.CPM.2017.16},
  annote =	{Keywords: compressed indexing, pattern matching, LZ77, prefix search}
}
Document
From LZ77 to the Run-Length Encoded Burrows-Wheeler Transform, and Back

Authors: Alberto Policriti and Nicola Prezza


Abstract
The Lempel-Ziv factorization (LZ77) and the Run-Length encoded Burrows-Wheeler Transform (RLBWT) are two important tools in text compression and indexing, being their sizes z and r closely related to the amount of text self-repetitiveness. In this paper we consider the problem of converting the two representations into each other within a working space proportional to the input and the output. Let n be the text length. We show that RLBWT can be converted to LZ77 in O(n log r) time and O(r) words of working space. Conversely, we provide an algorithm to convert LZ77 to RLBWT in O(n(log r + log z)) time and O(r+z) words of working space. Note that r and z can be constant if the text is highly repetitive, and our algorithms can operate with (up to) exponentially less space than naive solutions based on full decompression.

Cite as

Alberto Policriti and Nicola Prezza. From LZ77 to the Run-Length Encoded Burrows-Wheeler Transform, and Back. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 17:1-17:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{policriti_et_al:LIPIcs.CPM.2017.17,
  author =	{Policriti, Alberto and Prezza, Nicola},
  title =	{{From LZ77 to the Run-Length Encoded Burrows-Wheeler Transform, and Back}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{17:1--17:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.17},
  URN =		{urn:nbn:de:0030-drops-73215},
  doi =		{10.4230/LIPIcs.CPM.2017.17},
  annote =	{Keywords: Lempel-Ziv, Burrows-Wheeler transform, compressed computation, repetitive text collections}
}
Document
Longest Common Extensions with Recompression

Authors: Tomohiro I


Abstract
Given two positions i and j in a string T of length N, a longest common extension (LCE) query asks for the length of the longest common prefix between suffixes beginning at i and j. A compressed LCE data structure stores T in a compressed form while supporting fast LCE queries. In this article we show that the recompression technique is a powerful tool for compressed LCE data structures. We present a new compressed LCE data structure of size O(z lg (N/z)) that supports LCE queries in O(lg N) time, where z is the size of Lempel-Ziv 77 factorization without self-reference of T. Given T as an uncompressed form, we show how to build our data structure in O(N) time and space. Given T as a grammar compressed form, i.e., a straight-line program of size n generating T, we show how to build our data structure in O(n lg (N/n)) time and O(n + z lg (N/z)) space. Our algorithms are deterministic and always return correct answers.

Cite as

Tomohiro I. Longest Common Extensions with Recompression. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 18:1-18:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{i:LIPIcs.CPM.2017.18,
  author =	{I, Tomohiro},
  title =	{{Longest Common Extensions with Recompression}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{18:1--18:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.18},
  URN =		{urn:nbn:de:0030-drops-73234},
  doi =		{10.4230/LIPIcs.CPM.2017.18},
  annote =	{Keywords: Longest Common Extension (LCE) queries, compressed data structure, grammar compressed strings, Straight-Line Program (SLP)}
}
Document
Fast and Simple Jumbled Indexing for Binary Run-Length Encoded Strings

Authors: Luís Cunha, Simone Dantas, Travis Gagie, Roland Wittler, Luis Kowada, and Jens Stoye


Abstract
Important papers have appeared recently on the problem of indexing binary strings for jumbled pattern matching, and further lowering the time bounds in terms of the input size would now be a breakthrough with broad implications. We can still make progress on the problem, however, by considering other natural parameters. Badkobeh et al. (IPL, 2013) and Amir et al. (TCS, 2016) gave algorithms that index a binary string in O(n + r^2 log r) time, where n is the length and r is the number of runs, and Giaquinta and Grabowski (IPL, 2013) gave one that runs in O(n + r^2) time. In this paper we propose a new and very simple algorithm that also runs in O(n + r^2) time and can be extended either so that the index returns the position of a match (if there is one), or so that the algorithm uses only O(n) bits of space instead of O(n) words.

Cite as

Luís Cunha, Simone Dantas, Travis Gagie, Roland Wittler, Luis Kowada, and Jens Stoye. Fast and Simple Jumbled Indexing for Binary Run-Length Encoded Strings. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 19:1-19:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{cunha_et_al:LIPIcs.CPM.2017.19,
  author =	{Cunha, Lu{\'\i}s and Dantas, Simone and Gagie, Travis and Wittler, Roland and Kowada, Luis and Stoye, Jens},
  title =	{{Fast and Simple Jumbled Indexing for Binary Run-Length Encoded Strings}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{19:1--19:9},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.19},
  URN =		{urn:nbn:de:0030-drops-73418},
  doi =		{10.4230/LIPIcs.CPM.2017.19},
  annote =	{Keywords: string algorithms, indexing, jumbled pattern matching, run-length encoding}
}
Document
Faster STR-IC-LCS Computation via RLE

Authors: Keita Kuboi, Yuta Fujishige, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda


Abstract
The constrained LCS problem asks one to find a longest common subsequence of two input strings A and B with some constraints. The STR-IC-LCS problem is a variant of the constrained LCS problem, where the solution must include a given constraint string C as a substring. Given two strings A and B of respective lengths M and N, and a constraint string C of length at most min{M, N}, the best known algorithm for the STR-IC-LCS problem, proposed by Deorowicz (Inf. Process. Lett., 11:423-426, 2012), runs in O(MN) time. In this work, we present an O(mN + nM)-time solution to the STR-IC-LCS problem, where m and n denote the sizes of the run-length encodings of A and B, respectively. Since m <= M and n <= N always hold, our algorithm is always as fast as Deorowicz's algorithm, and is faster when input strings are compressible via RLE.

Cite as

Keita Kuboi, Yuta Fujishige, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster STR-IC-LCS Computation via RLE. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 20:1-20:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{kuboi_et_al:LIPIcs.CPM.2017.20,
  author =	{Kuboi, Keita and Fujishige, Yuta and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Faster STR-IC-LCS Computation via RLE}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{20:1--20:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.20},
  URN =		{urn:nbn:de:0030-drops-73335},
  doi =		{10.4230/LIPIcs.CPM.2017.20},
  annote =	{Keywords: longest common subsequence, STR-IC-LCS, run-length encoding}
}
Document
Gapped Pattern Statistics

Authors: Philippe Duchon, Cyril Nicaud, and Carine Pivoteau


Abstract
We give a probabilistic analysis of parameters related to alpha-gapped repeats and palindromes in random words, under both uniform and memoryless distributions (where letters have different probabilities, but are drawn independently). More precisely, we study the expected number of maximal alpha-gapped patterns, as well as the expected length of the longest alpha-gapped pattern in a random word.

Cite as

Philippe Duchon, Cyril Nicaud, and Carine Pivoteau. Gapped Pattern Statistics. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 21:1-21:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{duchon_et_al:LIPIcs.CPM.2017.21,
  author =	{Duchon, Philippe and Nicaud, Cyril and Pivoteau, Carine},
  title =	{{Gapped Pattern Statistics}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{21:1--21:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.21},
  URN =		{urn:nbn:de:0030-drops-73309},
  doi =		{10.4230/LIPIcs.CPM.2017.21},
  annote =	{Keywords: combinatorics on words, alpha-gapped repeats, random words, memoryless sources, analytic combinatorics}
}
Document
Computing All Distinct Squares in Linear Time for Integer Alphabets

Authors: Hideo Bannai, Shunsuke Inenaga, and Dominik Köppl


Abstract
Given a string on an integer alphabet, we present an algorithm that computes the set of all distinct squares belonging to this string in time linear to the string length. As an application, we show how to compute the tree topology of the minimal augmented suffix tree in linear time. Asides from that, we elaborate an algorithm computing the longest previous table in a succinct representation using compressed working space.

Cite as

Hideo Bannai, Shunsuke Inenaga, and Dominik Köppl. Computing All Distinct Squares in Linear Time for Integer Alphabets. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 22:1-22:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2017.22,
  author =	{Bannai, Hideo and Inenaga, Shunsuke and K\"{o}ppl, Dominik},
  title =	{{Computing All Distinct Squares in Linear Time for Integer Alphabets}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{22:1--22:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.22},
  URN =		{urn:nbn:de:0030-drops-73224},
  doi =		{10.4230/LIPIcs.CPM.2017.22},
  annote =	{Keywords: tandem repeats, distinct squares, counting algorithms}
}
Document
Palindromic Length in Linear Time

Authors: Kirill Borozdin, Dmitry Kosolobov, Mikhail Rubinchik, and Arseny M. Shur


Abstract
Palindromic length of a string is the minimum number of palindromes whose concatenation is equal to this string. The problem of finding the palindromic length drew some attention, and a few O(n log n) time online algorithms were recently designed for it. In this paper we present the first linear time online algorithm for this problem.

Cite as

Kirill Borozdin, Dmitry Kosolobov, Mikhail Rubinchik, and Arseny M. Shur. Palindromic Length in Linear Time. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 23:1-23:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{borozdin_et_al:LIPIcs.CPM.2017.23,
  author =	{Borozdin, Kirill and Kosolobov, Dmitry and Rubinchik, Mikhail and Shur, Arseny M.},
  title =	{{Palindromic Length in Linear Time}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{23:1--23:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.23},
  URN =		{urn:nbn:de:0030-drops-73389},
  doi =		{10.4230/LIPIcs.CPM.2017.23},
  annote =	{Keywords: palindrome, palindromic length, palindromic factorization, online}
}
Document
Tight Bounds on the Maximum Number of Shortest Unique Substrings

Authors: Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda


Abstract
A substring Q of a string S is called a shortest unique substring (SUS) for interval [s,t] in S, if Q occurs exactly once in S, this occurrence of Q contains interval [s,t], and every substring of S which contains interval [s,t] and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query interval [s,t] all the SUSs for interval [s,t] can be answered quickly. When s = t, we call the SUSs for [s, t] as point SUSs, and when s <= t, we call the SUSs for [s, t] as interval SUSs. There exist optimal O(n)-time preprocessing scheme which answers queries in optimal O(k) time for both point and interval SUSs, where n is the length of S and k is the number of outputs for a given query. In this paper, we reveal structural, combinatorial properties underlying the SUS problem: Namely, we show that the number of intervals in S that correspond to point SUSs for all query positions in S is less than 1.5n, and show that this is a matching upper and lower bound. Also, we consider the maximum number of intervals in S that correspond to interval SUSs for all query intervals in S.

Cite as

Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Tight Bounds on the Maximum Number of Shortest Unique Substrings. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 24:1-24:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{mieno_et_al:LIPIcs.CPM.2017.24,
  author =	{Mieno, Takuya and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Tight Bounds on the Maximum Number of Shortest Unique Substrings}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{24:1--24:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.24},
  URN =		{urn:nbn:de:0030-drops-73460},
  doi =		{10.4230/LIPIcs.CPM.2017.24},
  annote =	{Keywords: shortest unique substrings, maximal unique substrings}
}
Document
Can We Recover the Cover?

Authors: Amihood Amir, Avivit Levy, Moshe Lewenstein, Ronit Lubin, and Benny Porat


Abstract
Data analysis typically involves error recovery and detection of regularities as two different key tasks. In this paper we show that there are data types for which these two tasks can be powerfully combined. A common notion of regularity in strings is that of a cover. Data describing measures of a natural coverable phenomenon may be corrupted by errors caused by the measurement process, or by the inexact features of the phenomenon itself. Due to this reason, different variants of approximate covers have been introduced, some of which are NP-hard to compute. In this paper we assume that the Hamming distance metric measures the amount of corruption experienced, and study the problem of recovering the correct cover from data corrupted by mismatch errors, formally defined as the cover recovery problem (CRP). We show that for the Hamming distance metric, coverability is a powerful property allowing detecting the original cover and correcting the data, under suitable conditions. We also study a relaxation of another problem, which is called the approximate cover problem (ACP). Since the ACP is proved to be NP-hard [Amir,Levy,Lubin,Porat, CPM 2017], we study a relaxation, which we call the candidate-relaxation of the ACP, and show it has a polynomial time complexity. As a result, we get that the ACP also has a polynomial time complexity in many practical situations. An important application of our ACP relaxation study is also a polynomial time algorithm for the cover recovery problem (CRP).

Cite as

Amihood Amir, Avivit Levy, Moshe Lewenstein, Ronit Lubin, and Benny Porat. Can We Recover the Cover?. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 25:1-25:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.CPM.2017.25,
  author =	{Amir, Amihood and Levy, Avivit and Lewenstein, Moshe and Lubin, Ronit and Porat, Benny},
  title =	{{Can We Recover the Cover?}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{25:1--25:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.25},
  URN =		{urn:nbn:de:0030-drops-73190},
  doi =		{10.4230/LIPIcs.CPM.2017.25},
  annote =	{Keywords: periodicity, quasi-periodicity, cover, approximate cover, data recovery}
}
Document
Approximate Cover of Strings

Authors: Amihood Amir, Avivit Levy, Ronit Lubin, and Ely Porat


Abstract
Regularities in strings arise in various areas of science, including coding and automata theory, formal language theory, combinatorics, molecular biology and many others. A common notion to describe regularity in a string T is a cover, which is a string C for which every letter of T lies within some occurrence of C. The alignment of the cover repetitions in the given text is called a tiling. In many applications finding exact repetitions is not sufficient, due to the presence of errors. In this paper, we use a new approach for handling errors in coverable phenomena and define the approximate cover problem (ACP), in which we are given a text that is a sequence of some cover repetitions with possible mismatch errors, and we seek a string that covers the text with the minimum number of errors. We first show that the ACP is NP-hard, by studying the cover-size relaxation of the ACP, in which the requested size of the approximate cover is also given with the input string. We show this relaxation is already NP-hard. We also study another two relaxations of the ACP, which we call the partial-tiling relaxation of the ACP and the full-tiling relaxation of the ACP, in which a tiling of the requested cover is also given with the input string. A given full tiling retains all the occurrences of the cover before the errors, while in a partial tiling there can be additional occurrences of the cover that are not marked by the tiling. We show that the partial-tiling relaxation has a polynomial time complexity and give experimental evidence that the full-tiling also has polynomial time complexity. The study of these relaxations, besides shedding another light on the complexity of the ACP, also involves a deep understanding of the properties of covers, yielding some key lemmas and observations that may be helpful for a future study of regularities in the presence of errors.

Cite as

Amihood Amir, Avivit Levy, Ronit Lubin, and Ely Porat. Approximate Cover of Strings. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 26:1-26:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.CPM.2017.26,
  author =	{Amir, Amihood and Levy, Avivit and Lubin, Ronit and Porat, Ely},
  title =	{{Approximate Cover of Strings}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{26:1--26:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.26},
  URN =		{urn:nbn:de:0030-drops-73189},
  doi =		{10.4230/LIPIcs.CPM.2017.26},
  annote =	{Keywords: periodicity, quasi-periodicity, cover, approximate cover}
}
Document
Beyond Adjacency Maximization: Scaffold Filling for New String Distances

Authors: Laurent Bulteau, Guillaume Fertin, and Christian Komusiewicz


Abstract
In Genomic Scaffold Filling, one aims at polishing in silico a draft genome, called scaffold. The scaffold is given in the form of an ordered set of gene sequences, called contigs. This is done by confronting the scaffold to an already complete reference genome from a close species. More precisely, given a scaffold S, a reference genome G and a score function f() between two genomes, the aim is to complete S by adding the missing genes from G so that the obtained complete genome S* optimizes f(S*, G). In this paper, we extend a model of Jiang et al. [CPM 2016] (i) by allowing the insertions of strings instead of single characters (i.e., some groups of genes may be forced to be inserted together) and (ii) by considering two alternative score functions: the first generalizes the notion of common adjacencies by maximizing the number of common k-mers between S* and G (k-Mer Scaffold Filling), the second aims at minimizing the number of breakpoints between S* and G (Min-Breakpoint Scaffold Filling). We study these problems from the parameterized complexity point of view, providing fixed-parameter (FPT) algorithms for both problems. In particular, we show that k-Mer Scaffold Filling is FPT wrt. parameter l, the number of additional k-mers realized by the completion of S—this answers an open question of Jiang et al. [CPM 2016]. We also show that Min-Breakpoint Scaffold Filling is FPT wrt. a parameter combining the number of missing genes, the number of gene repetitions and the target distance.

Cite as

Laurent Bulteau, Guillaume Fertin, and Christian Komusiewicz. Beyond Adjacency Maximization: Scaffold Filling for New String Distances. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 27:1-27:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{bulteau_et_al:LIPIcs.CPM.2017.27,
  author =	{Bulteau, Laurent and Fertin, Guillaume and Komusiewicz, Christian},
  title =	{{Beyond Adjacency Maximization: Scaffold Filling for New String Distances}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{27:1--27:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.27},
  URN =		{urn:nbn:de:0030-drops-73364},
  doi =		{10.4230/LIPIcs.CPM.2017.27},
  annote =	{Keywords: computational biology, strings, FPT algorithms, kernelization}
}
Document
On the Weighted Quartet Consensus Problem

Authors: Manuel Lafond and Celine Scornavacca


Abstract
In phylogenetics, the consensus problem consists in summarizing a set of phylogenetic trees that all classify the same set of species into a single tree. Several definitions of consensus exist in the literature; in this paper we focus on the Weighted Quartet Consensus problem, a problem with unknown complexity status so far. Here we prove that the Weighted Quartet Consensus problem is NP-hard and we give a 1/2-factor approximation for this problem. During the process, we propose a derandomization procedure of a previously known randomized 1/3-factor approximation. We also investigate the fixed-parameter tractability of this problem.

Cite as

Manuel Lafond and Celine Scornavacca. On the Weighted Quartet Consensus Problem. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 28:1-28:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{lafond_et_al:LIPIcs.CPM.2017.28,
  author =	{Lafond, Manuel and Scornavacca, Celine},
  title =	{{On the Weighted Quartet Consensus Problem}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{28:1--28:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.28},
  URN =		{urn:nbn:de:0030-drops-73284},
  doi =		{10.4230/LIPIcs.CPM.2017.28},
  annote =	{Keywords: phylogenetic tree, consensus tree, quartets, complexity, fixed-parameter tractability}
}
Document
Optimal Omnitig Listing for Safe and Complete Contig Assembly

Authors: Massimo Cairo, Paul Medvedev, Nidia Obscura Acosta, Romeo Rizzi, and Alexandru I. Tomescu


Abstract
Genome assembly is the problem of reconstructing a genome sequence from a set of reads from a sequencing experiment. Typical formulations of the assembly problem admit in practice many genomic reconstructions, and actual genome assemblers usually output contigs, namely substrings that are promised to occur in the genome. To bridge the theory and practice, Tomescu and Medvedev [RECOMB 2016] reformulated contig assembly as finding all substrings common to all genomic reconstructions. They also gave a characterization of those walks (omnitigs) that are common to all closed edge-covering walks of a (directed) graph, a typical notion of genomic reconstruction. An algorithm for listing all maximal omnitigs was also proposed, by launching an exhaustive visit from every edge. In this paper, we prove new insights about the structure of omnitigs and solve several open questions about them. We combine these to achieve an O(nm)-time algorithm for outputting all the maximal omnitigs of a graph (with n nodes and m edges). This is also optimal, as we show families of graphs whose total omnitig length is Omega(nm). We implement this algorithm and show that it is 9-12 times faster in practice than the one of Tomescu and Medvedev [RECOMB 2016].

Cite as

Massimo Cairo, Paul Medvedev, Nidia Obscura Acosta, Romeo Rizzi, and Alexandru I. Tomescu. Optimal Omnitig Listing for Safe and Complete Contig Assembly. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 29:1-29:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{cairo_et_al:LIPIcs.CPM.2017.29,
  author =	{Cairo, Massimo and Medvedev, Paul and Obscura Acosta, Nidia and Rizzi, Romeo and Tomescu, Alexandru I.},
  title =	{{Optimal Omnitig Listing for Safe and Complete Contig Assembly}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{29:1--29:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.29},
  URN =		{urn:nbn:de:0030-drops-73423},
  doi =		{10.4230/LIPIcs.CPM.2017.29},
  annote =	{Keywords: genome assembly, graph algorithm, edge-covering walk, strong bridge}
}
Document
Dynamic Elias-Fano Representation

Authors: Giulio Ermanno Pibiri and Rossano Venturini


Abstract
We show that it is possible to store a dynamic ordered set S of n integers drawn from a bounded universe of size u in space close to the information-theoretic lower bound and preserve, at the same time, the asymptotic time optimality of the operations. Our results leverage on the Elias-Fano representation of monotone integer sequences, which can be shown to be less than half a bit per element away from the information-theoretic minimum. In particular, considering a RAM model with memory word size Theta(log u) bits, when integers are drawn from a polynomial universe of size u = n^gamma for any gamma = Theta(1), we add o(n) bits to the static Elias-Fano representation in order to: 1. support static predecessor/successor queries in O(min{1+log(u/n), loglog n}); 2. make S grow in an append-only fashion by spending O(1) per inserted element; 3. describe a dynamic data structure supporting random access in O(log n / loglog n) worst-case, insertions/deletions in O(log n / loglog n) amortized and predecessor/successor queries in O(min{1+log(u/n), loglog n}) worst-case time. These time bounds are optimal.

Cite as

Giulio Ermanno Pibiri and Rossano Venturini. Dynamic Elias-Fano Representation. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 30:1-30:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{pibiri_et_al:LIPIcs.CPM.2017.30,
  author =	{Pibiri, Giulio Ermanno and Venturini, Rossano},
  title =	{{Dynamic Elias-Fano Representation}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{30:1--30:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.30},
  URN =		{urn:nbn:de:0030-drops-73244},
  doi =		{10.4230/LIPIcs.CPM.2017.30},
  annote =	{Keywords: succinct data structures, integer sets, predecessor problem, Elias-Fano}
}
Document
Synergistic Solutions on MultiSets

Authors: Jérémy Barbay, Carlos Ochoa, and Srinivasa Rao Satti


Abstract
Karp et al. (1988) described Deferred Data Structures for Multisets as "lazy" data structures which partially sort data to support online rank and select queries, with the minimum amount of work in the worst case over instances of size n and number of queries q fixed. Barbay et al. (2016) refined this approach to take advantage of the gaps between the positions hit by the queries (i.e., the structure in the queries). We develop new techniques in order to further refine this approach and take advantage all at once of the structure (i.e., the multiplicities of the elements), some notions of local order (i.e., the number and sizes of runs) and global order (i.e., the number and positions of existing pivots) in the input; and of the structure and order in the sequence of queries. Our main result is a synergistic deferred data structure which outperforms all solutions in the comparison model that take advantage of only a subset of these features. As intermediate results, we describe two new synergistic sorting algorithms, which take advantage of some notions of structure and order (local and global) in the input, improving upon previous results which take advantage only of the structure (Munro and Spira 1979) or of the local order (Takaoka 1997) in the input; and one new multiselection algorithm which takes advantage of not only the order and structure in the input, but also of the structure in the queries.

Cite as

Jérémy Barbay, Carlos Ochoa, and Srinivasa Rao Satti. Synergistic Solutions on MultiSets. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 31:1-31:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{barbay_et_al:LIPIcs.CPM.2017.31,
  author =	{Barbay, J\'{e}r\'{e}my and Ochoa, Carlos and Satti, Srinivasa Rao},
  title =	{{Synergistic Solutions on MultiSets}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{31:1--31:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.31},
  URN =		{urn:nbn:de:0030-drops-73441},
  doi =		{10.4230/LIPIcs.CPM.2017.31},
  annote =	{Keywords: deferred data structure, multivariate analysis, quick sort, select}
}

Filters


Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail