14 Search Results for "Köppl, Dominik"


Document
Faster Block Tree Construction

Authors: Dominik Köppl, Florian Kurpicz, and Daniel Meyer

Published in: LIPIcs, Volume 274, 31st Annual European Symposium on Algorithms (ESA 2023)


Abstract
The block tree [Belazzougui et al., J. Comput. Syst. Sci. '21] is a compressed text index that can answer access (extract a character at a position), rank (number of occurrences of a specified character in a prefix of the text), and select (size of the smallest prefix such that a specified character has a specified rank) queries. It requires O(z log(n/z)) words of space, where z is the number of Lempel-Ziv factors of the text. For some highly repetitive inputs, a block tree can require as little as 0.015 bits per character of the text. Small values of z make the block tree a space-efficient alternative to the wavelet tree, which is another index for these three types of queries. While wavelet trees can be constructed fast in practice, compressed versions of the wavelet tree so far only leverage statistical compression, meaning that they are blind to spaced repetitions. To make block trees usable in practice, a first step is to find ways to construct them efficiently. We address this problem by presenting a practically efficient construction algorithm for block trees, which is up to an order of magnitude faster than previous implementations. Additionally, we parallelize our implementation, making it the first block tree construction implementation that works in parallel in shared memory.
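
For illustration, the three query types can be pinned down with a naive, uncompressed Python baseline; a block tree answers the same queries within the compressed space bound above. This is only a sketch of the query semantics, not the paper's data structure or construction algorithm:

    class NaiveRankSelect:
        """Uncompressed stand-in for an access/rank/select index."""
        def __init__(self, text: str):
            self.text = text

        def access(self, i: int) -> str:
            # character at position i (0-based)
            return self.text[i]

        def rank(self, c: str, i: int) -> int:
            # number of occurrences of c in the prefix text[0:i]
            return self.text[:i].count(c)

        def select(self, c: str, k: int) -> int:
            # size of the smallest prefix in which c occurs k times
            seen = 0
            for j, ch in enumerate(self.text):
                if ch == c:
                    seen += 1
                    if seen == k:
                        return j + 1
            raise ValueError("fewer than k occurrences")

For example, NaiveRankSelect("mississippi").rank('s', 5) == 2 and NaiveRankSelect("mississippi").select('s', 2) == 4.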

Cite as

Dominik Köppl, Florian Kurpicz, and Daniel Meyer. Faster Block Tree Construction. In 31st Annual European Symposium on Algorithms (ESA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 274, pp. 74:1-74:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


BibTeX

@InProceedings{koppl_et_al:LIPIcs.ESA.2023.74,
  author =	{K\"{o}ppl, Dominik and Kurpicz, Florian and Meyer, Daniel},
  title =	{{Faster Block Tree Construction}},
  booktitle =	{31st Annual European Symposium on Algorithms (ESA 2023)},
  pages =	{74:1--74:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-295-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{274},
  editor =	{G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.74},
  URN =		{urn:nbn:de:0030-drops-187274},
  doi =		{10.4230/LIPIcs.ESA.2023.74},
  annote =	{Keywords: compressed data structure, block tree, Lempel-Ziv compression, longest previous factor array, rank and select}
}
Document
Acceleration of FM-Index Queries Through Prefix-Free Parsing

Authors: Aaron Hong, Marco Oliva, Dominik Köppl, Hideo Bannai, Christina Boucher, and Travis Gagie

Published in: LIPIcs, Volume 273, 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)


Abstract
FM-indexes are a crucial data structure in DNA alignment, but searching with them usually takes at least one random access per character in the query pattern. Ferragina and Fischer [Ferragina and Fischer, 2007] observed in 2007 that word-based indexes often use fewer random accesses than character-based indexes, and thus support faster searches. Since DNA lacks natural word boundaries, however, it is necessary to parse it somehow before applying word-based FM-indexing. Last year, Deng et al. [Deng et al., 2022] proposed parsing genomic data by induced suffix sorting, and showed that the resulting word-based FM-indexes support faster counting queries than standard FM-indexes when patterns are a few thousand characters or longer. In this paper we show that using prefix-free parsing - which takes parameters that let us tune the average length of the phrases - instead of induced suffix sorting gives a significant speedup for patterns of only a few hundred characters. We implement our method and demonstrate that it is between 3 and 18 times faster than competing methods on queries to GRCh38, and consistently faster on queries to 25,000, 50,000, and 100,000 SARS-CoV-2 genomes. Hence, our method accelerates counting queries over all state-of-the-art methods with only a minor increase in memory.
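
A minimal Python sketch of the prefix-free parsing step (assumed parameters w and p: a Karp-Rabin fingerprint of a sliding w-character window marks a phrase boundary whenever it is 0 modulo p, which is how w and p tune the average phrase length). This illustrates the parsing idea only; the paper's implementation, sentinel handling, and index layout differ:

    def prefix_free_parse(text: str, w: int = 4, p: int = 11):
        base, mod = 256, (1 << 61) - 1
        bw = pow(base, w, mod)            # base^w for removing the oldest char
        phrases, start, h = [], 0, 0
        for i, ch in enumerate(text):
            h = (h * base + ord(ch)) % mod
            if i >= w:
                h = (h - ord(text[i - w]) * bw) % mod
            # a full window hashing to 0 mod p ends the current phrase
            if i >= w - 1 and h % p == 0:
                phrases.append(text[start:i + 1])
                start = i + 1 - w         # consecutive phrases overlap by w chars
        phrases.append(text[start:])      # trailing phrase
        dictionary = sorted(set(phrases))
        parse = [dictionary.index(ph) for ph in phrases]
        return dictionary, parse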

Cite as

Aaron Hong, Marco Oliva, Dominik Köppl, Hideo Bannai, Christina Boucher, and Travis Gagie. Acceleration of FM-Index Queries Through Prefix-Free Parsing. In 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 273, pp. 13:1-13:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


BibTeX

@InProceedings{hong_et_al:LIPIcs.WABI.2023.13,
  author =	{Hong, Aaron and Oliva, Marco and K\"{o}ppl, Dominik and Bannai, Hideo and Boucher, Christina and Gagie, Travis},
  title =	{{Acceleration of FM-Index Queries Through Prefix-Free Parsing}},
  booktitle =	{23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)},
  pages =	{13:1--13:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-294-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{273},
  editor =	{Belazzougui, Djamal and Ouangraoua, A\"{i}da},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2023.13},
  URN =		{urn:nbn:de:0030-drops-186390},
  doi =		{10.4230/LIPIcs.WABI.2023.13},
  annote =	{Keywords: FM-index, pangenomics, scalability, word-based indexing, random access}
}
Document
Optimal LZ-End Parsing Is Hard

Authors: Hideo Bannai, Mitsuru Funakoshi, Kazuhiro Kurita, Yuto Nakashima, Kazuhisa Seto, and Takeaki Uno

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)


Abstract
LZ-End is a variant of the well-known Lempel-Ziv parsing family such that each phrase of the parsing has a previous occurrence, with the additional constraint that the previous occurrence must end at the end of a previous phrase. LZ-End was initially proposed as a greedy parsing, where each phrase is determined greedily from left to right, as the longest factor that satisfies the above constraint [Kreft & Navarro, 2010]. In this work, we consider an optimal LZ-End parsing that has the minimum number of phrases among all such parsings. We show that a decision version of computing the optimal LZ-End parsing is NP-complete by a reduction from the vertex cover problem. Moreover, we give a MAX-SAT formulation for the optimal LZ-End parsing, adapting an approach for computing various NP-hard repetitiveness measures recently presented by [Bannai et al., 2022]. We also consider the approximation ratio of the size of the greedy LZ-End parsing to the size of the optimal LZ-End parsing, and give a lower bound on this ratio that asymptotically approaches 2.
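
The greedy parsing can be stated as a naive, cubic-time Python sketch for toy inputs: each phrase copies the longest prefix of the unparsed rest that occurs ending at an earlier phrase boundary, followed by one explicit character. This is the definition made executable, not an efficient implementation:

    def lz_end_greedy(text: str):
        phrases, ends = [], []            # ends[j] = index just past phrase j
        i, n = 0, len(text)
        while i < n:
            best = 0
            limit = n - i - 1             # leave room for the explicit character
            for e in ends:
                for l in range(min(e, limit), best, -1):
                    if text[e - l:e] == text[i:i + l]:
                        best = l          # longest copy ending at boundary e
                        break
            phrases.append((text[i:i + best], text[i + best]))
            i += best + 1
            ends.append(i)
        return phrases                    # list of (copied part, explicit char)

For example, lz_end_greedy("aaaa") yields [("", "a"), ("a", "a"), ("", "a")].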

Cite as

Hideo Bannai, Mitsuru Funakoshi, Kazuhiro Kurita, Yuto Nakashima, Kazuhisa Seto, and Takeaki Uno. Optimal LZ-End Parsing Is Hard. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 3:1-3:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


BibTeX

@InProceedings{bannai_et_al:LIPIcs.CPM.2023.3,
  author =	{Bannai, Hideo and Funakoshi, Mitsuru and Kurita, Kazuhiro and Nakashima, Yuto and Seto, Kazuhisa and Uno, Takeaki},
  title =	{{Optimal LZ-End Parsing Is Hard}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{3:1--3:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.3},
  URN =		{urn:nbn:de:0030-drops-179571},
  doi =		{10.4230/LIPIcs.CPM.2023.3},
  annote =	{Keywords: Data Compression, LZ-End, Repetitiveness measures}
}
Document
Encoding Hard String Problems with Answer Set Programming

Authors: Dominik Köppl

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)


Abstract
Despite the simple, one-dimensional nature of strings, several computationally hard problems on strings are known. Tackling hard problems with straightforward solutions is infeasible beyond the sizes of toy instances. To solve these problems on datasets of even small sizes, effort has to be put into conceiving algorithms that leverage profound characteristics of the input. Finding these characteristics can be eased by rapidly creating and evaluating prototypes of new concepts for tackling hard problems. Such a rapid-prototyping method for hard problems is answer set programming (ASP). In this light, we study the application of ASP to five NP-hard optimization problems in the field of strings. We provide MAX-SAT and ASP encodings, and empirically reason about the merits and flaws of working with ASP solvers.
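
As a plain-Python stand-in for such encodings, the following exhaustive sketch solves one problem of the kind targeted here, the smallest string attractor (a set of positions such that every distinct substring has an occurrence crossing one of them). It runs in exponential time and is meant for toy instances only:

    from itertools import combinations

    def is_attractor(text: str, gamma: set) -> bool:
        n = len(text)
        for i in range(n):
            for j in range(i + 1, n + 1):
                sub = text[i:j]
                # some occurrence of sub must contain a position of gamma
                if not any(text[a:a + len(sub)] == sub
                           and any(a <= p < a + len(sub) for p in gamma)
                           for a in range(n - len(sub) + 1)):
                    return False
        return True

    def smallest_attractor(text: str) -> set:
        for k in range(1, len(text) + 1):
            for cand in combinations(range(len(text)), k):
                if is_attractor(text, set(cand)):
                    return set(cand)
        return set()

For example, smallest_attractor("abab") returns the size-2 attractor {0, 1}.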

Cite as

Dominik Köppl. Encoding Hard String Problems with Answer Set Programming. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 17:1-17:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


BibTeX

@InProceedings{koppl:LIPIcs.CPM.2023.17,
  author =	{K\"{o}ppl, Dominik},
  title =	{{Encoding Hard String Problems with Answer Set Programming}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{17:1--17:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.17},
  URN =		{urn:nbn:de:0030-drops-179711},
  doi =		{10.4230/LIPIcs.CPM.2023.17},
  annote =	{Keywords: optimization problems, answer set programming, MAX-SAT encoding, NP-hard string problems}
}
Document
Computing NP-Hard Repetitiveness Measures via MAX-SAT

Authors: Hideo Bannai, Keisuke Goto, Masakazu Ishihata, Shunsuke Kanda, Dominik Köppl, and Takaaki Nishimoto

Published in: LIPIcs, Volume 244, 30th Annual European Symposium on Algorithms (ESA 2022)


Abstract
Repetitiveness measures reveal profound characteristics of datasets, and give rise to compressed data structures and algorithms working in compressed space. Alas, the computation of some of these measures is NP-hard, and straightforward computation is infeasible for datasets of even small sizes. Three such measures are the smallest size of a string attractor, the smallest size of a bidirectional macro scheme, and the smallest size of a straight-line program. While a vast variety of implementations for heuristically computing approximations exist, exact computation of these measures has received little to no attention. In this paper, we present MAX-SAT formulations that provide the first non-trivial implementations for exact computation of smallest string attractors, smallest bidirectional macro schemes, and smallest straight-line programs. Computational experiments show that our implementations work for texts of length up to a few hundred for straight-line programs and bidirectional macro schemes, and for texts of length even over a million for string attractors.
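
A hedged sketch of such a MAX-SAT formulation, shown for the smallest string attractor and assuming the third-party python-sat package: variable x_p states that position p-1 belongs to the attractor, one hard clause per distinct substring forces some occurrence of it to contain a chosen position, and soft unit clauses ¬x_p minimize the number of chosen positions. The paper's actual encodings are more refined:

    from pysat.formula import WCNF
    from pysat.examples.rc2 import RC2

    def smallest_attractor_maxsat(text: str) -> set:
        n = len(text)
        wcnf, seen = WCNF(), set()
        for length in range(1, n + 1):
            for i in range(n - length + 1):
                sub = text[i:i + length]
                if sub in seen:
                    continue
                seen.add(sub)
                clause = [p + 1
                          for a in range(n - length + 1)
                          if text[a:a + length] == sub
                          for p in range(a, a + length)]
                wcnf.append(sorted(set(clause)))      # hard: cover this substring
        for p in range(1, n + 1):
            wcnf.append([-p], weight=1)               # soft: keep p out of the set
        model = RC2(wcnf).compute()
        return {p - 1 for p in model if p > 0}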

Cite as

Hideo Bannai, Keisuke Goto, Masakazu Ishihata, Shunsuke Kanda, Dominik Köppl, and Takaaki Nishimoto. Computing NP-Hard Repetitiveness Measures via MAX-SAT. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 12:1-12:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


BibTeX

@InProceedings{bannai_et_al:LIPIcs.ESA.2022.12,
  author =	{Bannai, Hideo and Goto, Keisuke and Ishihata, Masakazu and Kanda, Shunsuke and K\"{o}ppl, Dominik and Nishimoto, Takaaki},
  title =	{{Computing NP-Hard Repetitiveness Measures via MAX-SAT}},
  booktitle =	{30th Annual European Symposium on Algorithms (ESA 2022)},
  pages =	{12:1--12:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-247-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{244},
  editor =	{Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2022.12},
  URN =		{urn:nbn:de:0030-drops-169505},
  doi =		{10.4230/LIPIcs.ESA.2022.12},
  annote =	{Keywords: repetitiveness measures, string attractor, bidirectional macro scheme}
}
Document
Constructing the Bijective and the Extended Burrows-Wheeler Transform in Linear Time

Authors: Hideo Bannai, Juha Kärkkäinen, Dominik Köppl, and Marcin Piątkowski

Published in: LIPIcs, Volume 191, 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)


Abstract
The Burrows-Wheeler transform (BWT) is a permutation whose applications are prevalent in data compression and text indexing. The bijective BWT (BBWT) is a bijective variant of it. Although it is known that the BWT can be constructed in linear time for integer alphabets by using a linear time suffix array construction algorithm, it was up to now only conjectured that the BBWT can also be constructed in linear time. We confirm this conjecture in the word RAM model by proposing a construction algorithm that is based on SAIS, improving the best known result of O(n lg n / lg lg n) time to linear. Since we can reduce the problem of constructing the extended BWT to constructing the BBWT in linear time, we obtain a linear-time algorithm computing the extended BWT at the same time.
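
The definition itself is easy to execute naively: Lyndon-factorize the text with Duval's algorithm and sort all rotations of all factors in omega-order (by their infinite repetitions). The following quadratic Python sketch does exactly that; the paper's contribution is replacing it with a linear-time, SAIS-based construction:

    def duval(s: str):
        # Lyndon factorization in O(n) time (Duval, 1983)
        factors, i = [], 0
        while i < len(s):
            j, k = i + 1, i
            while j < len(s) and s[k] <= s[j]:
                k = i if s[k] < s[j] else k + 1
                j += 1
            while i <= k:
                factors.append(s[i:i + j - k])
                i += j - k
        return factors

    def bbwt(s: str) -> str:
        n = len(s)
        rots = [f[i:] + f[:i] for f in duval(s) for i in range(len(f))]
        # omega-order: compare rotations via sufficiently long repetitions
        key = lambda r: (r * (2 * n // len(r) + 2))[:2 * n]
        return ''.join(r[-1] for r in sorted(rots, key=key))

For example, bbwt("banana") == "annbaa".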

Cite as

Hideo Bannai, Juha Kärkkäinen, Dominik Köppl, and Marcin Piątkowski. Constructing the Bijective and the Extended Burrows-Wheeler Transform in Linear Time. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 7:1-7:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


BibTeX

@InProceedings{bannai_et_al:LIPIcs.CPM.2021.7,
  author =	{Bannai, Hideo and K\"{a}rkk\"{a}inen, Juha and K\"{o}ppl, Dominik and Pi\k{a}tkowski, Marcin},
  title =	{{Constructing the Bijective and the Extended Burrows-Wheeler Transform in Linear Time}},
  booktitle =	{32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)},
  pages =	{7:1--7:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-186-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{191},
  editor =	{Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2021.7},
  URN =		{urn:nbn:de:0030-drops-139588},
  doi =		{10.4230/LIPIcs.CPM.2021.7},
  annote =	{Keywords: Burrows-Wheeler Transform, Lyndon words, Circular Suffix Array, Suffix Array Construction Algorithm}
}
Document
Fast and Simple Compact Hashing via Bucketing

Authors: Dominik Köppl, Simon J. Puglisi, and Rajeev Raman

Published in: LIPIcs, Volume 160, 18th International Symposium on Experimental Algorithms (SEA 2020)


Abstract
Compact hash tables store a set S of n key-value pairs, where the keys are from the universe U = {0,…,u-1}, and the values are v-bit integers, in close to B(u, n) + nv bits of space, where B(u, n) = log₂ binom(u, n) is the information-theoretic lower bound for representing the set of keys in S, and support operations insert, delete and lookup on S. Compact hash tables have received significant attention in recent years, and approaches dating back to Cleary [IEEE T. Comput, 1984], as well as more recent ones, have been implemented and used in a number of applications. However, the wins on space usage of these approaches are outweighed by their slowness relative to conventional hash tables. In this paper, we demonstrate that compact hash tables based upon a simple idea of bucketing practically outperform existing compact hash table implementations in terms of memory usage and construction time, and existing fast hash table implementations in terms of memory usage (and sometimes also in terms of construction time). A related notion is that of a compact Hash ID map, which stores a set Ŝ of n keys from U, and implicitly associates each key in Ŝ with a unique value (its ID), chosen by the data structure itself, which is an integer of magnitude O(n), and supports inserts and lookups on Ŝ, while using close to B(u,n) bits. One of our approaches is suitable for use as a compact Hash ID map.
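
A minimal sketch of the bucketing idea (not the paper's exact memory layout): splitting a key by its initial address h(x) = x mod m leaves only the quotient x // m to be stored inside bucket number x mod m, so each entry needs roughly lg(u/m) instead of lg u bits, and collisions are handled by scanning the small bucket, i.e., separate chaining per bucket:

    class BucketedCompactHash:
        def __init__(self, num_buckets: int):
            self.m = num_buckets
            self.buckets = [[] for _ in range(num_buckets)]  # lists of (quotient, value)

        def insert(self, key: int, value) -> None:
            q, r = divmod(key, self.m)       # key == q * m + r is reconstructible
            bucket = self.buckets[r]
            for idx, (bq, _) in enumerate(bucket):
                if bq == q:
                    bucket[idx] = (q, value) # key already present: overwrite
                    return
            bucket.append((q, value))

        def lookup(self, key: int):
            q, r = divmod(key, self.m)
            for bq, v in self.buckets[r]:
                if bq == q:
                    return v
            return None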

Cite as

Dominik Köppl, Simon J. Puglisi, and Rajeev Raman. Fast and Simple Compact Hashing via Bucketing. In 18th International Symposium on Experimental Algorithms (SEA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 160, pp. 7:1-7:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


BibTeX

@InProceedings{koppl_et_al:LIPIcs.SEA.2020.7,
  author =	{K\"{o}ppl, Dominik and Puglisi, Simon J. and Raman, Rajeev},
  title =	{{Fast and Simple Compact Hashing via Bucketing}},
  booktitle =	{18th International Symposium on Experimental Algorithms (SEA 2020)},
  pages =	{7:1--7:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-148-1},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{160},
  editor =	{Faro, Simone and Cantone, Domenico},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2020.7},
  URN =		{urn:nbn:de:0030-drops-120817},
  doi =		{10.4230/LIPIcs.SEA.2020.7},
  annote =	{Keywords: compact hashing, hash table, separate chaining}
}
Document
In-Place Bijective Burrows-Wheeler Transforms

Authors: Dominik Köppl, Daiki Hashimoto, Diptarama Hendrian, and Ayumi Shinohara

Published in: LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)


Abstract
One of the most well-known variants of the Burrows-Wheeler transform (BWT) [Burrows and Wheeler, 1994] is the bijective BWT (BBWT) [Gil and Scott, arXiv 2012], which applies the extended BWT (EBWT) [Mantaci et al., TCS 2007] to the multiset of Lyndon factors of a given text. Since the EBWT is invertible, the BBWT is a bijective transform in the sense that the inverse image of the EBWT restores this multiset of Lyndon factors, such that the original text can be obtained by sorting these factors in non-increasing order. In this paper, we present algorithms constructing or inverting the BBWT in-place using quadratic time. We also present conversions from the BBWT to the BWT, or vice versa, either (a) in-place using quadratic time, or (b) in the run-length compressed setting using 𝒪(n lg r / lg lg r) time with 𝒪(r lg n) bits of working space, where r is the total number of character runs in the BWT and the BBWT.
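
Inversion of the BBWT can be sketched naively (neither in-place nor optimized like the paper's algorithms): build the LF mapping as for the ordinary BWT, read the characters along each cycle of LF, rotate each recovered word to its Lyndon conjugate, and concatenate the factors in non-increasing order:

    def inverse_bbwt(bbwt: str) -> str:
        n = len(bbwt)
        # LF[i] = position of bbwt[i] in the sorted column, stable by rank
        order = sorted(range(n), key=lambda i: (bbwt[i], i))
        LF = [0] * n
        for pos, i in enumerate(order):
            LF[i] = pos
        seen, factors = [False] * n, []
        for i in range(n):
            if seen[i]:
                continue
            cyc, j = [], i
            while not seen[j]:            # each LF cycle spells one factor
                seen[j] = True
                cyc.append(bbwt[j])
                j = LF[j]
            word = ''.join(reversed(cyc))
            # normalize to the Lyndon conjugate (minimal rotation)
            word = min(word[k:] + word[:k] for k in range(len(word)))
            factors.append(word)
        factors.sort(reverse=True)        # non-increasing order of Lyndon factors
        return ''.join(factors)

For example, inverse_bbwt("annbaa") == "banana".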

Cite as

Dominik Köppl, Daiki Hashimoto, Diptarama Hendrian, and Ayumi Shinohara. In-Place Bijective Burrows-Wheeler Transforms. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 21:1-21:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


BibTeX

@InProceedings{koppl_et_al:LIPIcs.CPM.2020.21,
  author =	{K\"{o}ppl, Dominik and Hashimoto, Daiki and Hendrian, Diptarama and Shinohara, Ayumi},
  title =	{{In-Place Bijective Burrows-Wheeler Transforms}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{21:1--21:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{G{\o}rtz, Inge Li and Weimann, Oren},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.21},
  URN =		{urn:nbn:de:0030-drops-121463},
  doi =		{10.4230/LIPIcs.CPM.2020.21},
  annote =	{Keywords: In-Place Algorithms, Burrows-Wheeler transform, Lyndon words}
}
Document
Bidirectional Text Compression in External Memory

Authors: Patrick Dinklage, Jonas Ellert, Johannes Fischer, Dominik Köppl, and Manuel Penschuck

Published in: LIPIcs, Volume 144, 27th Annual European Symposium on Algorithms (ESA 2019)


Abstract
Bidirectional compression algorithms work by substituting repeated substrings by references that, unlike in the famous LZ77-scheme, can point to either direction. We present such an algorithm that is particularly suited for an external memory implementation. We evaluate it experimentally on large data sets of size up to 128 GiB (using only 16 GiB of RAM) and show that it is significantly faster than all known LZ77 compressors, while producing a roughly similar number of factors. We also introduce an external memory decompressor for texts compressed with any uni- or bidirectional compression scheme.
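
The decompression problem such a scheme poses can be illustrated with a small Python sketch (an assumed input format, not the paper's external-memory layout): each factor is either a literal character or a (source, length) reference, where the source may lie before or after the factor itself, so forward references are resolved character by character by chasing pointer chains (a valid parsing has no cycles):

    def decode_bidirectional(factors, n: int) -> str:
        ptr = [None] * n                  # literal char, or source position to copy
        pos = 0
        for f in factors:
            if isinstance(f, str):        # literal character
                ptr[pos] = f
                pos += 1
            else:                         # (source, length) reference
                src, length = f
                for k in range(length):
                    ptr[pos + k] = src + k
                pos += length
        out = [None] * n

        def resolve(i: int) -> str:
            chain = []
            while out[i] is None and not isinstance(ptr[i], str):
                chain.append(i)
                i = ptr[i]                # follow the pointer chain
            c = out[i] if out[i] is not None else ptr[i]
            for j in chain:
                out[j] = c
            return c

        for i in range(n):
            out[i] = resolve(i)
        return ''.join(out)

For example, decode_bidirectional([(2, 2), 'a', 'b', (0, 3)], 7) == "abababa"; the first factor points forward to the two literals.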

Cite as

Patrick Dinklage, Jonas Ellert, Johannes Fischer, Dominik Köppl, and Manuel Penschuck. Bidirectional Text Compression in External Memory. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 41:1-41:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{dinklage_et_al:LIPIcs.ESA.2019.41,
  author =	{Dinklage, Patrick and Ellert, Jonas and Fischer, Johannes and K\"{o}ppl, Dominik and Penschuck, Manuel},
  title =	{{Bidirectional Text Compression in External Memory}},
  booktitle =	{27th Annual European Symposium on Algorithms (ESA 2019)},
  pages =	{41:1--41:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-124-5},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{144},
  editor =	{Bender, Michael A. and Svensson, Ola and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2019.41},
  URN =		{urn:nbn:de:0030-drops-111624},
  doi =		{10.4230/LIPIcs.ESA.2019.41},
  annote =	{Keywords: text compression, bidirectional parsing, text decompression, external algorithms}
}
Document
Indexing the Bijective BWT

Authors: Hideo Bannai, Juha Kärkkäinen, Dominik Köppl, and Marcin Piątkowski

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)


Abstract
The Burrows-Wheeler transform (BWT) is a permutation whose applications are prevalent in data compression and text indexing. The bijective BWT is a bijective variant of it that has not yet been studied for text indexing applications. We fill this gap by proposing a self-index built on the bijective BWT. The self-index applies the backward search technique of the FM-index to find a pattern P with O(|P| lg |P|) backward search steps.
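
For reference, the backward search that the self-index adapts can be sketched for the ordinary, sentinel-terminated BWT in naive Python (occurrences of c are counted by scanning instead of with rank structures); the paper's contribution is making this technique work on the bijective BWT, which has no sentinel:

    def bwt(text: str) -> str:
        text += '\0'                      # unique smallest sentinel
        rots = sorted(text[i:] + text[:i] for i in range(len(text)))
        return ''.join(r[-1] for r in rots)

    def count(text: str, pattern: str) -> int:
        L = bwt(text)
        F = sorted(L)
        C = {c: F.index(c) for c in set(L)}   # first row starting with c
        lo, hi = 0, len(L)                    # suffix array interval [lo, hi)
        for c in reversed(pattern):           # extend the pattern to the left
            if c not in C:
                return 0
            lo = C[c] + L[:lo].count(c)
            hi = C[c] + L[:hi].count(c)
            if lo >= hi:
                return 0
        return hi - lo

For example, count("mississippi", "ssi") == 2.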

Cite as

Hideo Bannai, Juha Kärkkäinen, Dominik Köppl, and Marcin Piątkowski. Indexing the Bijective BWT. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 17:1-17:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{bannai_et_al:LIPIcs.CPM.2019.17,
  author =	{Bannai, Hideo and K\"{a}rkk\"{a}inen, Juha and K\"{o}ppl, Dominik and Pi\k{a}tkowski, Marcin},
  title =	{{Indexing the Bijective BWT}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{17:1--17:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and Pissis, Solon P.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.17},
  URN =		{urn:nbn:de:0030-drops-104887},
  doi =		{10.4230/LIPIcs.CPM.2019.17},
  annote =	{Keywords: Burrows-Wheeler Transform, Lyndon words, Text Indexing}
}
Document
Compression with the tudocomp Framework

Authors: Patrick Dinklage, Johannes Fischer, Dominik Köppl, Marvin Löbel, and Kunihiko Sadakane

Published in: LIPIcs, Volume 75, 16th International Symposium on Experimental Algorithms (SEA 2017)


Abstract
We present a framework facilitating the implementation and comparison of text compression algorithms. We evaluate its features by a case study on two novel compression algorithms based on the Lempel-Ziv compression schemes that perform well on highly repetitive texts.

Cite as

Patrick Dinklage, Johannes Fischer, Dominik Köppl, Marvin Löbel, and Kunihiko Sadakane. Compression with the tudocomp Framework. In 16th International Symposium on Experimental Algorithms (SEA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 75, pp. 13:1-13:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


BibTeX

@InProceedings{dinklage_et_al:LIPIcs.SEA.2017.13,
  author =	{Dinklage, Patrick and Fischer, Johannes and K\"{o}ppl, Dominik and L\"{o}bel, Marvin and Sadakane, Kunihiko},
  title =	{{Compression with the tudocomp Framework}},
  booktitle =	{16th International Symposium on Experimental Algorithms (SEA 2017)},
  pages =	{13:1--13:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-036-1},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{75},
  editor =	{Iliopoulos, Costas S. and Pissis, Solon P. and Puglisi, Simon J. and Raman, Rajeev},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2017.13},
  URN =		{urn:nbn:de:0030-drops-76015},
  doi =		{10.4230/LIPIcs.SEA.2017.13},
  annote =	{Keywords: lossless compression, compression framework, compression library, algorithm engineering, application of string algorithms}
}
Document
Computing All Distinct Squares in Linear Time for Integer Alphabets

Authors: Hideo Bannai, Shunsuke Inenaga, and Dominik Köppl

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)


Abstract
Given a string on an integer alphabet, we present an algorithm that computes the set of all distinct squares belonging to this string in time linear in the string length. As an application, we show how to compute the tree topology of the minimal augmented suffix tree in linear time. Aside from that, we elaborate an algorithm computing the longest previous factor table in a succinct representation using compressed working space.
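
For cross-checking on small inputs, the set in question can be computed by brute force in Python (quadratically many candidate halves, cubic time overall), which is exactly what the paper's linear-time algorithm avoids:

    def distinct_squares(s: str) -> set:
        n, squares = len(s), set()
        for half in range(1, n // 2 + 1):
            for i in range(n - 2 * half + 1):
                if s[i:i + half] == s[i + half:i + 2 * half]:
                    squares.add(s[i:i + 2 * half])   # square uu found
        return squares

For example, distinct_squares("abaaba") == {"aa", "abaaba"}.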

Cite as

Hideo Bannai, Shunsuke Inenaga, and Dominik Köppl. Computing All Distinct Squares in Linear Time for Integer Alphabets. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 22:1-22:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


BibTeX

@InProceedings{bannai_et_al:LIPIcs.CPM.2017.22,
  author =	{Bannai, Hideo and Inenaga, Shunsuke and K\"{o}ppl, Dominik},
  title =	{{Computing All Distinct Squares in Linear Time for Integer Alphabets}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{22:1--22:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.22},
  URN =		{urn:nbn:de:0030-drops-73224},
  doi =		{10.4230/LIPIcs.CPM.2017.22},
  annote =	{Keywords: tandem repeats, distinct squares, counting algorithms}
}
Document
On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching

Authors: Johannes Fischer, Dominik Köppl, and Florian Kurpicz

Published in: LIPIcs, Volume 54, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)


Abstract
We present parallel algorithms for exact and approximate pattern matching with suffix arrays, using a CREW-PRAM with p processors. Given a static text of length n, we first show how to compute the suffix array interval of a given pattern of length m in O(m/p + lg p + lg lg p * lg lg n) time for p <= m. For approximate pattern matching with k differences or mismatches, we show how to compute all occurrences of a given pattern in O((m^k sigma^k)/p * max(k, lg lg n) + (1 + m/p) lg p * lg lg n + occ) time, where sigma is the size of the alphabet and p <= sigma^k m^k. The workhorse of our algorithms is a data structure for merging suffix array intervals quickly: Given the suffix array intervals for two patterns P and P', we present a data structure for computing the interval of PP' in O(lg lg n) sequential time, or in O(1 + lg_p lg n) parallel time. All our data structures are of size O(n) bits (in addition to the suffix array).
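
A sequential Python sketch of the object being computed (the suffix array interval of a pattern, here found by binary search over a naively built suffix array; the paper's parallel merging data structure is far more involved):

    from bisect import bisect_left, bisect_right

    def sa_interval(text: str, pattern: str):
        sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
        suffixes = [text[i:] for i in sa]
        lo = bisect_left(suffixes, pattern)
        # '\uffff' acts as an upper sentinel, assuming smaller alphabet characters
        hi = bisect_right(suffixes, pattern + '\uffff')
        return lo, hi                     # suffixes in sa[lo:hi] start with pattern

Merging in this setting means: a suffix in the interval of P also lies in the interval of PP' exactly if its continuation after |P| characters falls into the interval of P'; the paper shows how to evaluate this in O(lg lg n) sequential time.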

Cite as

Johannes Fischer, Dominik Köppl, and Florian Kurpicz. On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 26:1-26:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{fischer_et_al:LIPIcs.CPM.2016.26,
  author =	{Fischer, Johannes and K\"{o}ppl, Dominik and Kurpicz, Florian},
  title =	{{On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{26:1--26:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.26},
  URN =		{urn:nbn:de:0030-drops-60669},
  doi =		{10.4230/LIPIcs.CPM.2016.26},
  annote =	{Keywords: parallel algorithms, pattern matching, approximate string matching}
}
Document
Efficiently Finding All Maximal alpha-gapped Repeats

Authors: Paweł Gawrychowski, Tomohiro I, Shunsuke Inenaga, Dominik Köppl, and Florin Manea

Published in: LIPIcs, Volume 47, 33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016)


Abstract
For alpha >= 1, an alpha-gapped repeat in a word w is a factor uvu of w such that |uv| <= alpha * |u|; the two occurrences of the factor u in such a repeat are called arms. Such a repeat is called maximal if its arms cannot be extended simultaneously with the same symbol either to the left or to the right. We show that the number of all maximal alpha-gapped repeats occurring in words of length n is upper bounded by 18 * alpha * n, allowing us to construct an algorithm finding all maximal alpha-gapped repeats of a word on an integer alphabet of size n^{O(1)} in O(alpha * n) time. This result is optimal as there are words that have Theta(alpha * n) maximal alpha-gapped repeats. Our techniques can be extended to obtain comparable results for alpha-gapped palindromes, i.e., factors uvu^T with |uv| <= alpha * |u|.
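
For tiny inputs, the bound can be sanity-checked against a brute-force Python enumeration (with a simplified maximality test that only compares the characters adjacent to the arms, ignoring corner cases where an extension would make the arms overlap):

    def maximal_alpha_gapped_repeats(w: str, alpha: float):
        n, out = len(w), []
        for ulen in range(1, n + 1):                    # arm length |u|
            for i in range(n - 2 * ulen + 1):           # left arm w[i:i+ulen]
                for j in range(i + ulen, n - ulen + 1): # right arm w[j:j+ulen]
                    if j - i > alpha * ulen:            # |uv| = j - i
                        break
                    if w[i:i + ulen] != w[j:j + ulen]:
                        continue
                    can_left = i > 0 and w[i - 1] == w[j - 1]
                    can_right = j + ulen < n and w[i + ulen] == w[j + ulen]
                    if not can_left and not can_right:
                        out.append((i, j, ulen))        # maximal repeat found
        return out

For example, maximal_alpha_gapped_repeats("aXabcYabc", 2) == [(0, 2, 1), (2, 6, 3)]; the second triple is the repeat with arms "abc" and gap "Y".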

Cite as

Paweł Gawrychowski, Tomohiro I, Shunsuke Inenaga, Dominik Köppl, and Florin Manea. Efficiently Finding All Maximal alpha-gapped Repeats. In 33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 47, pp. 39:1-39:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{gawrychowski_et_al:LIPIcs.STACS.2016.39,
  author =	{Gawrychowski, Pawe{\l} and I, Tomohiro and Inenaga, Shunsuke and K\"{o}ppl, Dominik and Manea, Florin},
  title =	{{Efficiently Finding All Maximal alpha-gapped Repeats}},
  booktitle =	{33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016)},
  pages =	{39:1--39:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-001-9},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{47},
  editor =	{Ollinger, Nicolas and Vollmer, Heribert},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2016.39},
  URN =		{urn:nbn:de:0030-drops-57408},
  doi =		{10.4230/LIPIcs.STACS.2016.39},
  annote =	{Keywords: combinatorics on words, counting algorithms}
}
