DROPS

Document

DOI: 10.4230/LIPIcs.CPM.2021.7

Constructing the Bijective and the Extended Burrows-Wheeler Transform in Linear Time

Authors: Hideo Bannai, Juha Kärkkäinen, Dominik Köppl, and Marcin Piątkowski

Published in: LIPIcs, Volume 191, 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)

Abstract

The Burrows-Wheeler transform (BWT) is a permutation whose applications are prevalent in data compression and text indexing. The bijective BWT (BBWT) is a bijective variant of it. Although it is known that the BWT can be constructed in linear time for integer alphabets by using a linear time suffix array construction algorithm, it was up to now only conjectured that the BBWT can also be constructed in linear time. We confirm this conjecture in the word RAM model by proposing a construction algorithm that is based on SAIS, improving the best known result of O(n lg n / lg lg n) time to linear. Since we can reduce the problem of constructing the extended BWT to constructing the BBWT in linear time, we obtain a linear-time algorithm computing the extended BWT at the same time.

Cite as

Hideo Bannai, Juha Kärkkäinen, Dominik Köppl, and Marcin Piątkowski. Constructing the Bijective and the Extended Burrows-Wheeler Transform in Linear Time. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 7:1-7:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2021.7,
  author =	{Bannai, Hideo and K\"{a}rkk\"{a}inen, Juha and K\"{o}ppl, Dominik and Pi\k{a}tkowski, Marcin},
  title =	{{Constructing the Bijective and the Extended Burrows-Wheeler Transform in Linear Time}},
  booktitle =	{32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)},
  pages =	{7:1--7:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-186-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{191},
  editor =	{Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2021.7},
  URN =		{urn:nbn:de:0030-drops-139588},
  doi =		{10.4230/LIPIcs.CPM.2021.7},
  annote =	{Keywords: Burrows-Wheeler Transform, Lyndon words, Circular Suffix Array, Suffix Array Construction Algorithm}
}

Document

DOI: 10.4230/LIPIcs.CSL.2021.27

On the Complexity of Horn and Krom Fragments of Second-Order Boolean Logic

Authors: Miika Hannula, Juha Kontinen, Martin Lück, and Jonni Virtema

Published in: LIPIcs, Volume 183, 29th EACSL Annual Conference on Computer Science Logic (CSL 2021)

Abstract

Second-order Boolean logic is a generalization of QBF, whose constant alternation fragments are known to be complete for the levels of the exponential time hierarchy. We consider two types of restriction of this logic: 1) restrictions to term constructions, 2) restrictions to the form of the Boolean matrix. Of the first sort, we consider two kinds of restrictions: firstly, disallowing nested use of proper function variables, and secondly stipulating that each function variable must appear with a fixed sequence of arguments. Of the second sort, we consider Horn, Krom, and core fragments of the Boolean matrix. We classify the complexity of logics obtained by combining these two types of restrictions. We show that, in most cases, logics with k alternating blocks of function quantifiers are complete for the kth or (k-1)th level of the exponential time hierarchy. Furthermore, we establish NL-completeness for the Krom and core fragments, when k = 1 and both restrictions of the first sort are in effect.

Cite as

Miika Hannula, Juha Kontinen, Martin Lück, and Jonni Virtema. On the Complexity of Horn and Krom Fragments of Second-Order Boolean Logic. In 29th EACSL Annual Conference on Computer Science Logic (CSL 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 183, pp. 27:1-27:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{hannula_et_al:LIPIcs.CSL.2021.27,
  author =	{Hannula, Miika and Kontinen, Juha and L\"{u}ck, Martin and Virtema, Jonni},
  title =	{{On the Complexity of Horn and Krom Fragments of Second-Order Boolean Logic}},
  booktitle =	{29th EACSL Annual Conference on Computer Science Logic (CSL 2021)},
  pages =	{27:1--27:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-175-7},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{183},
  editor =	{Baier, Christel and Goubault-Larrecq, Jean},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CSL.2021.27},
  URN =		{urn:nbn:de:0030-drops-134610},
  doi =		{10.4230/LIPIcs.CSL.2021.27},
  annote =	{Keywords: quantified Boolean formulae, computational complexity, second-order logic, Horn and Krom fragment}
}

Document

DOI: 10.4230/LIPIcs.CPM.2019.17

Indexing the Bijective BWT

Authors: Hideo Bannai, Juha Kärkkäinen, Dominik Köppl, and Marcin Pia̧tkowski

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)

Abstract

The Burrows-Wheeler transform (BWT) is a permutation whose applications are prevalent in data compression and text indexing. The bijective BWT is a bijective variant of it that has not yet been studied for text indexing applications. We fill this gap by proposing a self-index built on the bijective BWT . The self-index applies the backward search technique of the FM-index to find a pattern P with O(|P| lg|P|) backward search steps.

Cite as

Hideo Bannai, Juha Kärkkäinen, Dominik Köppl, and Marcin Pia̧tkowski. Indexing the Bijective BWT. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 17:1-17:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2019.17,
  author =	{Bannai, Hideo and K\"{a}rkk\"{a}inen, Juha and K\"{o}ppl, Dominik and Pia̧tkowski, Marcin},
  title =	{{Indexing the Bijective BWT}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{17:1--17:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.17},
  URN =		{urn:nbn:de:0030-drops-104887},
  doi =		{10.4230/LIPIcs.CPM.2019.17},
  annote =	{Keywords: Burrows-Wheeler Transform, Lyndon words, Text Indexing}
}

Document

DOI: 10.4230/LIPIcs.SEA.2017.17

Engineering External Memory LCP Array Construction: Parallel, In-Place and Large Alphabet

Authors: Juha Kärkkäinen and Dominik Kempa

Published in: LIPIcs, Volume 75, 16th International Symposium on Experimental Algorithms (SEA 2017)

Abstract

The suffix array augmented with the LCP array is perhaps the most important data structure in modern string processing. There has been a lot of recent research activity on constructing these arrays in external memory. In this paper, we engineer the two fastest LCP array construction algorithms (ESA 2016) and improve them in three ways. First, we speed up the algorithms by up to a factor of two through parallelism. Just 8 threads is sufficient for making the algorithms essentially I/O bound. Second, we reduce the disk space usage of the algorithms making them in-place: The input (text and suffix array) is treated as read-only and the working disk space never exceeds the size of the final output (the LCP array). Third, we add support for large alphabets. All previous implementations assume the byte alphabet.

Cite as

Juha Kärkkäinen and Dominik Kempa. Engineering External Memory LCP Array Construction: Parallel, In-Place and Large Alphabet. In 16th International Symposium on Experimental Algorithms (SEA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 75, pp. 17:1-17:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{karkkainen_et_al:LIPIcs.SEA.2017.17,
  author =	{K\"{a}rkk\"{a}inen, Juha and Kempa, Dominik},
  title =	{{Engineering External Memory LCP Array Construction: Parallel, In-Place and Large Alphabet}},
  booktitle =	{16th International Symposium on Experimental Algorithms (SEA 2017)},
  pages =	{17:1--17:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-036-1},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{75},
  editor =	{Iliopoulos, Costas S. and Pissis, Solon P. and Puglisi, Simon J. and Raman, Rajeev},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2017.17},
  URN =		{urn:nbn:de:0030-drops-76116},
  doi =		{10.4230/LIPIcs.SEA.2017.17},
  annote =	{Keywords: LCP array, suffix array, external memory algorithms}
}

Document

Complete Volume

DOI: 10.4230/LIPIcs.CPM.2017

LIPIcs, Volume 78, CPM'17, Complete Volume

Authors: Juha Kärkkäinen, Jakub Radoszewski, and Wojciech Rytter

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Abstract

LIPIcs, Volume 78, CPM'17, Complete Volume

Cite as

28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@Proceedings{karkkainen_et_al:LIPIcs.CPM.2017,
  title =	{{LIPIcs, Volume 78, CPM'17, Complete Volume}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017},
  URN =		{urn:nbn:de:0030-drops-75091},
  doi =		{10.4230/LIPIcs.CPM.2017},
  annote =	{Keywords: Data Structures, Data Storage Representations, Coding and Information Theory, Theory of Computation, Discrete Mathematics, Information Systems,}
}

Document

DOI: 10.4230/LIPIcs.ICALP.2017.62

String Inference from Longest-Common-Prefix Array

Authors: Juha Kärkkäinen, Marcin Piatkowski, and Simon J. Puglisi

Published in: LIPIcs, Volume 80, 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)

Abstract

The suffix array, perhaps the most important data structure in modern string processing, is often augmented with the longest common prefix (LCP) array which stores the lengths of the LCPs for lexicographically adjacent suffixes of a string. Together the two arrays are roughly equivalent to the suffix tree with the LCP array representing the tree shape. In order to better understand the combinatorics of LCP arrays, we consider the problem of inferring a string from an LCP array, i.e., determining whether a given array of integers is a valid LCP array, and if it is, reconstructing some string or all strings with that LCP array. There are recent studies of inferring a string from a suffix tree shape but using significantly more information (in the form of suffix links) than is available in the LCP array. We provide two main results. (1) We describe two algorithms for inferring strings from an LCP array when we allow a generalized form of LCP array defined for a multiset of cyclic strings: a linear time algorithm for binary alphabet and a general algorithm with polynomial time complexity for a constant alphabet size. (2) We prove that determining whether a given integer array is a valid LCP array is NP-complete when we require more restricted forms of LCP array defined for a single cyclic or non-cyclic string or a multiset of non-cyclic strings. The result holds whether or not the alphabet is restricted to be binary. In combination, the two results show that the generalized form of LCP array for a multiset of cyclic strings is fundamentally different from the other more restricted forms.

Cite as

Juha Kärkkäinen, Marcin Piatkowski, and Simon J. Puglisi. String Inference from Longest-Common-Prefix Array. In 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 80, pp. 62:1-62:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{karkkainen_et_al:LIPIcs.ICALP.2017.62,
  author =	{K\"{a}rkk\"{a}inen, Juha and Piatkowski, Marcin and Puglisi, Simon J.},
  title =	{{String Inference from Longest-Common-Prefix Array}},
  booktitle =	{44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)},
  pages =	{62:1--62:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-041-5},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{80},
  editor =	{Chatzigiannakis, Ioannis and Indyk, Piotr and Kuhn, Fabian and Muscholl, Anca},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2017.62},
  URN =		{urn:nbn:de:0030-drops-74989},
  doi =		{10.4230/LIPIcs.ICALP.2017.62},
  annote =	{Keywords: LCP array, string inference, BWT, suffix array, suffix tree, NP-hardness}
}

Document

Front Matter

DOI: 10.4230/LIPIcs.CPM.2017.0

Front Matter, Table of Contents, Preface, Conference Organization, External Reviewers

Authors: Juha Kärkkäinen, Jakub Radoszewski, and Wojciech Rytter

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Abstract

Front Matter, Table of Contents, Preface, Conference Organization, External Reviewers

Cite as

28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{karkkainen_et_al:LIPIcs.CPM.2017.0,
  author =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization, External Reviewers}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.0},
  URN =		{urn:nbn:de:0030-drops-73178},
  doi =		{10.4230/LIPIcs.CPM.2017.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization, External Reviewers}
}

Document

DOI: 10.4230/LIPIcs.CPM.2017.1

Wheeler Graphs: Variations on a Theme by Burrows and Wheeler

Authors: Giovanni Manzini

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Abstract

The famous Burrows-Wheeler Transform was originally defined for single strings but variations have been developed for sets of strings, labelled trees, de Bruijn graphs, alignments, etc. In this talk we propose a unifying view that includes many of these variations and that we hope will simplify the search for more. Somewhat surprisingly we get our unifying view by considering the Nondeterministic Finite Automata related to different pattern-matching problems. We show that the state graphs associated with these automata have common properties that we summarize with the concept of a Wheeler graph. Using the notion of a Wheeler graph, we show that it is possible to process strings efficiently even if the automaton is nondeterministic. In addition, we show that Wheeler graphs can be compactly represented and traversed using up to three arrays with additional data structures supporting efficient rank and select operations. It turns out that these arrays coincide with, or are substantially equivalent to, the output of many Burrows-Wheeler Transform variants described in the literature. This is joint work with Travis Gagie and Jouni Sirén.

Cite as

Giovanni Manzini. Wheeler Graphs: Variations on a Theme by Burrows and Wheeler. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, p. 1:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{manzini:LIPIcs.CPM.2017.1,
  author =	{Manzini, Giovanni},
  title =	{{Wheeler Graphs: Variations on a Theme by Burrows and Wheeler}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{1:1--1:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.1},
  URN =		{urn:nbn:de:0030-drops-73343},
  doi =		{10.4230/LIPIcs.CPM.2017.1},
  annote =	{Keywords: compressed data structures, pattern matching}
}

Document

DOI: 10.4230/LIPIcs.CPM.2017.2

Recompression of SLPs

Authors: Artur Jez

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Abstract

In this talk I will survey the recompression technique in case of SLPs. The technique is based on applying simple compression operations (replacement of pairs of two different letters by a new letter and replacement of maximal repetition of a letter by a new symbol) to strings represented by SLPs. To this end we modify the SLPs, so that performing such compression operations on SLPs is possible. For instance, when we want to replace ab in the string and SLP has a production X to aY and the string generated by Y is bw, then we alter the rule of Y so that it generates w and replace Y with bY in all rules. In this way the rule becomes X to abY and so ab can be replaced, similar operations are defined for the right sides of the nonterminals. As a result, we are interested mostly in the SLP representation rather than the string itself and its combinatorial properties. What we need to control, though, is the size of the SLP. With appropriate choices of substrings to be compressed it can be shown that it stays linear. The proposed method turned out to be surprisingly efficient and applicable in various scenarios: for instance it can be used to test the equality of SLPs in time O(n log N), where n is the size of the SLP and N the length of the generated string; on the other hand it can be used to approximate the smallest SLP for a given string, with the approximation ratio O(log(n/g)) where n is the length of the string and g the size of the smallest SLP for this string, matching the best known bounds.

Cite as

Artur Jez. Recompression of SLPs. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, p. 2:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{jez:LIPIcs.CPM.2017.2,
  author =	{Jez, Artur},
  title =	{{Recompression of SLPs}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{2:1--2:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.2},
  URN =		{urn:nbn:de:0030-drops-73475},
  doi =		{10.4230/LIPIcs.CPM.2017.2},
  annote =	{Keywords: Straight Line Programs, smallest grammar problem, compression, pro- cessing compressed data, recompression}
}

Document

DOI: 10.4230/LIPIcs.CPM.2017.3

Shortest Superstring

Authors: Marcin Mucha

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Abstract

In the Shortest Superstring problem (SS) one has to find a shortest string s containing given strings s_1,...,s_n as substrings. The problem is NP-hard, so a natural question is that of its approximability. One natural approach to approximately solving SS is the following GREEDY heuristic: repeatedly merge two strings with the largest overlap until only a single string is left. This heuristic is conjectured to be a 2-approximation, but even after 30 years since the conjecture has been posed, we are still very far from proving it. The situation is better for non-greedy approximation algorithms, where several approaches yielding 2.5-approximation (and better) are known. In this talk, we will survey the main results in the area, focusing on the fundamental ideas and intuitions.

Cite as

Marcin Mucha. Shortest Superstring. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, p. 3:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{mucha:LIPIcs.CPM.2017.3,
  author =	{Mucha, Marcin},
  title =	{{Shortest Superstring}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{3:1--3:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.3},
  URN =		{urn:nbn:de:0030-drops-73483},
  doi =		{10.4230/LIPIcs.CPM.2017.3},
  annote =	{Keywords: shortest superstring, approximation algorithms}
}

Document

DOI: 10.4230/LIPIcs.CPM.2017.4

Document Listing on Repetitive Collections with Guaranteed Performance

Authors: Gonzalo Navarro

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Abstract

We consider document listing on string collections, that is, finding in which strings a given pattern appears. In particular, we focus on repetitive collections: a collection of size N over alphabet [1,a] is composed of D copies of a string of size n, and s single-character edits are applied on the copies. We introduce the first document listing index with size O~(n + s), precisely O((n lg a + s lg^2 N) lg D) bits, and with useful worst-case time guarantees: Given a pattern of length m, the index reports the ndoc strings where it appears in time O(m^2 + m lg N (lg D + lg^e N) ndoc), for any constant e > 0.

Cite as

Gonzalo Navarro. Document Listing on Repetitive Collections with Guaranteed Performance. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 4:1-4:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{navarro:LIPIcs.CPM.2017.4,
  author =	{Navarro, Gonzalo},
  title =	{{Document Listing on Repetitive Collections with Guaranteed Performance}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{4:1--4:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.4},
  URN =		{urn:nbn:de:0030-drops-73268},
  doi =		{10.4230/LIPIcs.CPM.2017.4},
  annote =	{Keywords: repetitive string collections, document listing, grammar compression, range minimum queries, succinct data structures}
}

Document

DOI: 10.4230/LIPIcs.CPM.2017.5

Path Queries on Functions

Authors: Travis Gagie, Meng He, and Gonzalo Navarro

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Abstract

Let f : [1..n] -> [1..n] be a function, and l : [1..n] -> [1..s] indicate a label assigned to each element of the domain. We design several compact data structures that answer various queries on the labels of paths in f. For example, we can find the minimum label in f^k (i) for a given i and any k >= 0 in a given range [k1..k2], using n lg n + O(n) bits, or the minimum label in f^(-k) (i) for a given i and k > 0, using 2n lg n + O(n) bits, both in time O(lg n/ lg lg n). By using n lg s + o(n lg s) further bits, we can also count, within the same time, the number of elements within a range of labels, and report each such element in O(1 + lg s / lg lg n) additional time. Several other possible queries are considered, such as top-t queries and t-majorities.

Cite as

Travis Gagie, Meng He, and Gonzalo Navarro. Path Queries on Functions. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 5:1-5:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{gagie_et_al:LIPIcs.CPM.2017.5,
  author =	{Gagie, Travis and He, Meng and Navarro, Gonzalo},
  title =	{{Path Queries on Functions}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{5:1--5:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.5},
  URN =		{urn:nbn:de:0030-drops-73274},
  doi =		{10.4230/LIPIcs.CPM.2017.5},
  annote =	{Keywords: succinct data structures, integer functions, range queries, trees and permutations}
}

Document

DOI: 10.4230/LIPIcs.CPM.2017.6

Deterministic Indexing for Packed Strings

Authors: Philip Bille, Inge Li Gørtz, and Frederik Rye Skjoldjensen

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Abstract

Given a string S of length n, the classic string indexing problem is to preprocess S into a compact data structure that supports efficient subsequent pattern queries. In the deterministic variant the goal is to solve the string indexing problem without any randomization (at preprocessing time or query time). In the packed variant the strings are stored with several character in a single word, giving us the opportunity to read multiple characters simultaneously. Our main result is a new string index in the deterministic and packed setting. Given a packed string S of length n over an alphabet s, we show how to preprocess S in O(n) (deterministic) time and space O(n) such that given a packed pattern string of length m we can support queries in (deterministic) time O(m/a + log m + log log s), where a = w /log s is the number of characters packed in a word of size w = log n. Our query time is always at least as good as the previous best known bounds and whenever several characters are packed in a word, i.e., log s << w, the query times are faster.

Cite as

Philip Bille, Inge Li Gørtz, and Frederik Rye Skjoldjensen. Deterministic Indexing for Packed Strings. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 6:1-6:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.CPM.2017.6,
  author =	{Bille, Philip and G{\o}rtz, Inge Li and Skjoldjensen, Frederik Rye},
  title =	{{Deterministic Indexing for Packed Strings}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{6:1--6:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.6},
  URN =		{urn:nbn:de:0030-drops-73351},
  doi =		{10.4230/LIPIcs.CPM.2017.6},
  annote =	{Keywords: suffix tree, suffix array, deterministic algorithm, word packing}
}

Document

DOI: 10.4230/LIPIcs.CPM.2017.7

Representing the Suffix Tree with the CDAWG

Authors: Djamal Belazzougui and Fabio Cunial

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Abstract

Given a string T, it is known that its suffix tree can be represented using the compact directed acyclic word graph (CDAWG) with e_T arcs, taking overall O(e_T+e_REV(T)) words of space, where REV(T) is the reverse of T, and supporting some key operations in time between O(1) and O(log(log(n))) in the worst case. This representation is especially appealing for highly repetitive strings, like collections of similar genomes or of version-controlled documents, in which e_T grows sublinearly in the length of T in practice. In this paper we augment such representation, supporting a number of additional queries in worst-case time between O(1) and O(log(n)) in the RAM model, without increasing space complexity asymptotically. Our technique, based on a heavy path decomposition of the suffix tree, enables also a representation of the suffix array, of the inverse suffix array, and of T itself, that takes O(e_T) words of space, and that supports random access in O(log(n)) time. Furthermore, we establish a connection between the reversed CDAWG of T and a context-free grammar that produces T and only T, which might have independent interest.

Cite as

Djamal Belazzougui and Fabio Cunial. Representing the Suffix Tree with the CDAWG. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 7:1-7:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{belazzougui_et_al:LIPIcs.CPM.2017.7,
  author =	{Belazzougui, Djamal and Cunial, Fabio},
  title =	{{Representing the Suffix Tree with the CDAWG}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{7:1--7:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.7},
  URN =		{urn:nbn:de:0030-drops-73402},
  doi =		{10.4230/LIPIcs.CPM.2017.7},
  annote =	{Keywords: CDAWG, suffix tree, heavy path decomposition, maximal repeat, context-free grammar}
}

Document

DOI: 10.4230/LIPIcs.CPM.2017.8

Position Heaps for Parameterized Strings

Authors: Diptarama Diptarama, Takashi Katsura, Yuhei Otomo, Kazuyuki Narisawa, and Ayumi Shinohara

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Abstract

We propose a new indexing structure for parameterized strings, called parameterized position heap. Parameterized position heap is applicable for parameterized pattern matching problem, where the pattern matches a substring of the text if there exists a bijective mapping from the symbols of the pattern to the symbols of the substring. We propose an online construction algorithm of parameterized position heap of a text and show that our algorithm runs in linear time with respect to the text size. We also show that by using parameterized position heap, we can find all occurrences of a pattern in the text in linear time with respect to the product of the pattern size and the alphabet size.

Cite as

Diptarama Diptarama, Takashi Katsura, Yuhei Otomo, Kazuyuki Narisawa, and Ayumi Shinohara. Position Heaps for Parameterized Strings. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 8:1-8:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{diptarama_et_al:LIPIcs.CPM.2017.8,
  author =	{Diptarama, Diptarama and Katsura, Takashi and Otomo, Yuhei and Narisawa, Kazuyuki and Shinohara, Ayumi},
  title =	{{Position Heaps for Parameterized Strings}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{8:1--8:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.8},
  URN =		{urn:nbn:de:0030-drops-73396},
  doi =		{10.4230/LIPIcs.CPM.2017.8},
  annote =	{Keywords: string matching, indexing structure, parameterized pattern matching, position heap}
}

41 Search Results for "K�rkk�inen, Juha"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Thanks for your feedback!

Could not send message