DROPS

Document

DOI: 10.4230/LIPIcs.SEA.2024.1

Move-r: Optimizing the r-index

Authors: Nico Bertram, Johannes Fischer, and Lukas Nalbach

Published in: LIPIcs, Volume 301, 22nd International Symposium on Experimental Algorithms (SEA 2024)

Abstract

We present a static text index called Move-r, which is a highly optimized version of the r-index ([Travis Gagie et al., 2020] Gagie et al., 2020) that encorporates recent theoretical developments of the move data structure ([Takaaki Nishimoto and Yasuo Tabei, 2021] Nishimoto and Tabei, 2021). The r-index is the method of choice for indexing highly repetitive texts, such as different versions of a text document or DNA from the same species, as it exploits the compressibilty of the underlying data. With Move-r, we can answer count- and locate queries 2-35 (typically 15) times as fast as with any other r-index supporting locate queries while being 0.8-2.5 (typically 2) times as large. A Move-r index can be constructed 0.9-2 (typically 2) times as fast while using 1/3-1 (typically 1/2) times as much space.

Cite as

Nico Bertram, Johannes Fischer, and Lukas Nalbach. Move-r: Optimizing the r-index. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 1:1-1:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{bertram_et_al:LIPIcs.SEA.2024.1,
  author =	{Bertram, Nico and Fischer, Johannes and Nalbach, Lukas},
  title =	{{Move-r: Optimizing the r-index}},
  booktitle =	{22nd International Symposium on Experimental Algorithms (SEA 2024)},
  pages =	{1:1--1:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-325-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{301},
  editor =	{Liberti, Leo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2024.1},
  URN =		{urn:nbn:de:0030-drops-203662},
  doi =		{10.4230/LIPIcs.SEA.2024.1},
  annote =	{Keywords: Compressed Text Index, Burrows-Wheeler Transform}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.4

Sliding Window String Indexing in Streams

Authors: Philip Bille, Johannes Fischer, Inge Li Gørtz, Max Rishøj Pedersen, and Tord Joakim Stordalen

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

Given a string S over an alphabet Σ, the string indexing problem is to preprocess S to subsequently support efficient pattern matching queries, that is, given a pattern string P report all the occurrences of P in S. In this paper we study the streaming sliding window string indexing problem. Here the string S arrives as a stream, one character at a time, and the goal is to maintain an index of the last w characters, called the window, for a specified parameter w. At any point in time a pattern matching query for a pattern P may arrive, also streamed one character at a time, and all occurrences of P within the current window must be returned. The streaming sliding window string indexing problem naturally captures scenarios where we want to index the most recent data (i.e. the window) of a stream while supporting efficient pattern matching. Our main result is a simple O(w) space data structure that uses O(log w) time with high probability to process each character from both the input string S and any pattern string P. Reporting each occurrence of P uses additional constant time per reported occurrence. Compared to previous work in similar scenarios this result is the first to achieve an efficient worst-case time per character from the input stream with high probability. We also consider a delayed variant of the problem, where a query may be answered at any point within the next δ characters that arrive from either stream. We present an O(w + δ) space data structure for this problem that improves the above time bounds to O(log (w/δ)). In particular, for a delay of δ = ε w we obtain an O(w) space data structure with constant time processing per character. The key idea to achieve our result is a novel and simple hierarchical structure of suffix trees of independent interest, inspired by the classic log-structured merge trees.

Cite as

Philip Bille, Johannes Fischer, Inge Li Gørtz, Max Rishøj Pedersen, and Tord Joakim Stordalen. Sliding Window String Indexing in Streams. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 4:1-4:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.CPM.2023.4,
  author =	{Bille, Philip and Fischer, Johannes and G{\o}rtz, Inge Li and Pedersen, Max Rish{\o}j and Stordalen, Tord Joakim},
  title =	{{Sliding Window String Indexing in Streams}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{4:1--4:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.4},
  URN =		{urn:nbn:de:0030-drops-179587},
  doi =		{10.4230/LIPIcs.CPM.2023.4},
  annote =	{Keywords: String indexing, pattern matching, sliding window, streaming}
}

Document

DOI: 10.4230/LIPIcs.SEA.2022.10

A Parallel Framework for Approximate Max-Dicut in Partitionable Graphs

Authors: Nico Bertram, Jonas Ellert, and Johannes Fischer

Published in: LIPIcs, Volume 233, 20th International Symposium on Experimental Algorithms (SEA 2022)

Abstract

Computing a maximum cut in undirected and weighted graphs is a well studied problem and has many practical solutions that also scale well in shared memory (despite its NP-completeness). For its counterpart in directed graphs, however, we are not aware of practical solutions that also utilize parallelism. We engineer a framework that computes a high quality approximate cut in directed and weighted graphs by using a graph partitioning approach. The general idea is to partition a graph into k subgraphs using a parallel partitioning algorithm of our choice (the first ingredient of our framework). Then, for each subgraph in parallel, we compute a cut using any polynomial time approximation algorithm (the second ingredient). In a final step, we merge the locally computed solutions using a high-quality or exact parallel Max-Dicut algorithm (the third ingredient). On graphs that can be partitioned well, the quality of the computed cut is significantly better than the best cut achieved by any linear time algorithm. This is particularly relevant for large graphs, where linear time algorithms used to be the only feasible option.

Cite as

Nico Bertram, Jonas Ellert, and Johannes Fischer. A Parallel Framework for Approximate Max-Dicut in Partitionable Graphs. In 20th International Symposium on Experimental Algorithms (SEA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 233, pp. 10:1-10:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{bertram_et_al:LIPIcs.SEA.2022.10,
  author =	{Bertram, Nico and Ellert, Jonas and Fischer, Johannes},
  title =	{{A Parallel Framework for Approximate Max-Dicut in Partitionable Graphs}},
  booktitle =	{20th International Symposium on Experimental Algorithms (SEA 2022)},
  pages =	{10:1--10:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-251-8},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{233},
  editor =	{Schulz, Christian and U\c{c}ar, Bora},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2022.10},
  URN =		{urn:nbn:de:0030-drops-165441},
  doi =		{10.4230/LIPIcs.SEA.2022.10},
  annote =	{Keywords: maximum directed cut, graph partitioning, algorithm engineering, approximation, parallel algorithms}
}

Document

DOI: 10.4230/LIPIcs.ESA.2021.15

Lyndon Words Accelerate Suffix Sorting

Authors: Nico Bertram, Jonas Ellert, and Johannes Fischer

Published in: LIPIcs, Volume 204, 29th Annual European Symposium on Algorithms (ESA 2021)

Abstract

Suffix sorting is arguably the most fundamental building block in string algorithmics, like regular sorting in the broader field of algorithms. It is thus not surprising that the literature is full of algorithms for suffix sorting, in particular focusing on their practicality. However, the advances on practical suffix sorting stalled with the emergence of the DivSufSort algorithm more than 10 years ago, which, up to date, has remained the fastest suffix sorter. This article shows how properties of Lyndon words can be exploited algorithmically to accelerate suffix sorting again. Our new algorithm is 6-19% faster than DivSufSort on real-world texts, and up to three times as fast on artificial repetitive texts. It can also be parallelized, where similar speedups can be observed. Thus, we make the first advances in practical suffix sorting after more than a decade of standstill.

Cite as

Nico Bertram, Jonas Ellert, and Johannes Fischer. Lyndon Words Accelerate Suffix Sorting. In 29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 15:1-15:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{bertram_et_al:LIPIcs.ESA.2021.15,
  author =	{Bertram, Nico and Ellert, Jonas and Fischer, Johannes},
  title =	{{Lyndon Words Accelerate Suffix Sorting}},
  booktitle =	{29th Annual European Symposium on Algorithms (ESA 2021)},
  pages =	{15:1--15:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-204-4},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{204},
  editor =	{Mutzel, Petra and Pagh, Rasmus and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2021.15},
  URN =		{urn:nbn:de:0030-drops-145961},
  doi =		{10.4230/LIPIcs.ESA.2021.15},
  annote =	{Keywords: Suffix array, suffix sorting, Lyndon words, string algorithms}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2021.63

Linear Time Runs Over General Ordered Alphabets

Authors: Jonas Ellert and Johannes Fischer

Published in: LIPIcs, Volume 198, 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)

Abstract

A run in a string is a maximal periodic substring. For example, the string bananatree contains the runs anana = (an)^{5/2} and ee = e². There are less than n runs in any length-n string, and computing all runs for a string over a linearly-sortable alphabet takes 𝒪(n) time (Bannai et al., SIAM J. Comput. 2017). Kosolobov conjectured that there also exists a linear time runs algorithm for general ordered alphabets (Inf. Process. Lett. 2016). The conjecture was almost proven by Crochemore et al., who presented an 𝒪(nα(n)) time algorithm (where α(n) is the extremely slowly growing inverse Ackermann function). We show how to achieve 𝒪(n) time by exploiting combinatorial properties of the Lyndon array, thus proving Kosolobov’s conjecture. This also positively answers the at least 29-year-old question whether square-freeness can be tested in linear time over general ordered alphabets (Breslauer, PhD thesis, Columbia University 1992).

Cite as

Jonas Ellert and Johannes Fischer. Linear Time Runs Over General Ordered Alphabets. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 63:1-63:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{ellert_et_al:LIPIcs.ICALP.2021.63,
  author =	{Ellert, Jonas and Fischer, Johannes},
  title =	{{Linear Time Runs Over General Ordered Alphabets}},
  booktitle =	{48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)},
  pages =	{63:1--63:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-195-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{198},
  editor =	{Bansal, Nikhil and Merelli, Emanuela and Worrell, James},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2021.63},
  URN =		{urn:nbn:de:0030-drops-141322},
  doi =		{10.4230/LIPIcs.ICALP.2021.63},
  annote =	{Keywords: String algorithms, Lyndon array, runs, squares, longest common extension, general ordered alphabets, combinatorics on words}
}

Document

DOI: 10.4230/LIPIcs.SEA.2021.7

Engineering Predecessor Data Structures for Dynamic Integer Sets

Authors: Patrick Dinklage, Johannes Fischer, and Alexander Herlez

Published in: LIPIcs, Volume 190, 19th International Symposium on Experimental Algorithms (SEA 2021)

Abstract

We present highly optimized data structures for the dynamic predecessor problem, where the task is to maintain a set S of w-bit numbers under insertions, deletions, and predecessor queries (return the largest element in S no larger than a given key). The problem of finding predecessors can be viewed as a generalized form of the membership problem, or as a simple version of the nearest neighbour problem. It lies at the core of various real-world problems such as internet routing. In this work, we engineer (1) a simple implementation of the idea of universe reduction, similar to van-Emde-Boas trees (2) variants of y-fast tries [Willard, IPL'83], and (3) B-trees with different strategies for organizing the keys contained in the nodes, including an implementation of dynamic fusion nodes [Pǎtraşcu and Thorup, FOCS'14]. We implement our data structures for w = 32,40,64, which covers most typical scenarios. Our data structures finish workloads faster than previous approaches while being significantly more space-efficient, e.g., they clearly outperform standard implementations of the STL by finishing up to four times as fast using less than a third of the memory. Our tests also provide more general insights on data structure design, such as how small sets should be stored and handled and if and when new CPU instructions such as advanced vector extensions pay off.

Cite as

Patrick Dinklage, Johannes Fischer, and Alexander Herlez. Engineering Predecessor Data Structures for Dynamic Integer Sets. In 19th International Symposium on Experimental Algorithms (SEA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 190, pp. 7:1-7:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{dinklage_et_al:LIPIcs.SEA.2021.7,
  author =	{Dinklage, Patrick and Fischer, Johannes and Herlez, Alexander},
  title =	{{Engineering Predecessor Data Structures for Dynamic Integer Sets}},
  booktitle =	{19th International Symposium on Experimental Algorithms (SEA 2021)},
  pages =	{7:1--7:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-185-6},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{190},
  editor =	{Coudert, David and Natale, Emanuele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2021.7},
  URN =		{urn:nbn:de:0030-drops-137799},
  doi =		{10.4230/LIPIcs.SEA.2021.7},
  annote =	{Keywords: integer data structures, dynamic data structures, predecessor, universe reduction, y-fast trie, fusion tree, B-tree}
}

Document

DOI: 10.4230/LIPIcs.ESA.2020.39

Practical Performance of Space Efficient Data Structures for Longest Common Extensions

Authors: Patrick Dinklage, Johannes Fischer, Alexander Herlez, Tomasz Kociumaka, and Florian Kurpicz

Published in: LIPIcs, Volume 173, 28th Annual European Symposium on Algorithms (ESA 2020)

Abstract

For a text T[1,n], a Longest Common Extension (LCE) query lce_T(i,j) asks for the length of the longest common prefix of the suffixes T[i,n] and T[j,n] identified by their starting positions 1 ≤ i,j ≤ n. A classic problem in stringology asks to preprocess a static text T[1,n] over an alphabet of size σ so that LCE queries can be efficiently answered on-line. Since its introduction in the 1980’s, this problem has found numerous applications: in suffix sorting, edit distance computation, approximate pattern matching, regularities finding, string mining, and many more. Text-book solutions offer O(n) preprocessing time and O(1) query time, but they employ memory-heavy data structures, such as suffix arrays, in practice several times bigger than the text itself. Very recently, more space efficient solutions using O(nlogσ) bits of total space or even only O(log n) bits of extra space have been proposed: string synchronizing sets [Kempa and Kociumaka, STOC'19, and Birenzwige et al., SODA'20] and in-place fingerprinting [Prezza, SODA'18]. The goal of this article is to present well-engineered implementations of these new solutions and study their practicality on a commonly agreed text corpus. We show that both perform extremely well in practice, with space consumption of only around 10% of the input size for string synchronizing sets (around 20% for highly repetitive texts), and essentially no extra space for fingerprinting. Interestingly, our experiments also show that both solutions become much faster than naive scanning even for finding common prefixes of moderate length, contradicting a common belief that sophisticated data structures for LCE queries are not competitive with naive approaches [Ilie and Tinta, SPIRE'09].

Cite as

Patrick Dinklage, Johannes Fischer, Alexander Herlez, Tomasz Kociumaka, and Florian Kurpicz. Practical Performance of Space Efficient Data Structures for Longest Common Extensions. In 28th Annual European Symposium on Algorithms (ESA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 173, pp. 39:1-39:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{dinklage_et_al:LIPIcs.ESA.2020.39,
  author =	{Dinklage, Patrick and Fischer, Johannes and Herlez, Alexander and Kociumaka, Tomasz and Kurpicz, Florian},
  title =	{{Practical Performance of Space Efficient Data Structures for Longest Common Extensions}},
  booktitle =	{28th Annual European Symposium on Algorithms (ESA 2020)},
  pages =	{39:1--39:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-162-7},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{173},
  editor =	{Grandoni, Fabrizio and Herman, Grzegorz and Sanders, Peter},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2020.39},
  URN =		{urn:nbn:de:0030-drops-129050},
  doi =		{10.4230/LIPIcs.ESA.2020.39},
  annote =	{Keywords: text indexing, longest common prefix, space efficient data structures}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2020.14

Space Efficient Construction of Lyndon Arrays in Linear Time

Authors: Philip Bille, Jonas Ellert, Johannes Fischer, Inge Li Gørtz, Florian Kurpicz, J. Ian Munro, and Eva Rotenberg

Published in: LIPIcs, Volume 168, 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)

Abstract

Given a string S of length n, its Lyndon array identifies for each suffix S[i..n] the next lexicographically smaller suffix S[j..n], i.e. the minimal index j > i with S[i..n] ≻ S[j..n]. Apart from its plain (n log₂ n)-bit array representation, the Lyndon array can also be encoded as a succinct parentheses sequence that requires only 2n bits of space. While linear time construction algorithms for both representations exist, it has previously been unknown if the same time bound can be achieved with less than Ω(n lg n) bits of additional working space. We show that, in fact, o(n) additional bits are sufficient to compute the succinct 2n-bit version of the Lyndon array in linear time. For the plain (n log₂ n)-bit version, we only need 𝒪(1) additional words to achieve linear time. Our space efficient construction algorithm makes the Lyndon array more accessible as a fundamental data structure in applications like full-text indexing.

Cite as

Philip Bille, Jonas Ellert, Johannes Fischer, Inge Li Gørtz, Florian Kurpicz, J. Ian Munro, and Eva Rotenberg. Space Efficient Construction of Lyndon Arrays in Linear Time. In 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 168, pp. 14:1-14:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.ICALP.2020.14,
  author =	{Bille, Philip and Ellert, Jonas and Fischer, Johannes and G{\o}rtz, Inge Li and Kurpicz, Florian and Munro, J. Ian and Rotenberg, Eva},
  title =	{{Space Efficient Construction of Lyndon Arrays in Linear Time}},
  booktitle =	{47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)},
  pages =	{14:1--14:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-138-2},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{168},
  editor =	{Czumaj, Artur and Dawar, Anuj and Merelli, Emanuela},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2020.14},
  URN =		{urn:nbn:de:0030-drops-124211},
  doi =		{10.4230/LIPIcs.ICALP.2020.14},
  annote =	{Keywords: String algorithms, string suffixes, succinct data structures, Lyndon word, Lyndon array, nearest smaller values, nearest smaller suffixes}
}

@InProceedings{bille_et_al:LIPIcs.ICALP.2020.14,
  author =	{Bille, Philip and Ellert, Jonas and Fischer, Johannes and G{\o}rtz, Inge Li and Kurpicz, Florian and Munro, J. Ian and Rotenberg, Eva},
  title =	{{Space Efficient Construction of Lyndon Arrays in Linear Time}},
  booktitle =	{47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)},
  pages =	{14:1--14:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-138-2},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{168},
  editor =	{Czumaj, Artur and Dawar, Anuj and Merelli, Emanuela},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2020.14},
  URN =		{urn:nbn:de:0030-drops-124211},
  doi =		{10.4230/LIPIcs.ICALP.2020.14},
  annote =	{Keywords: String algorithms, string suffixes, succinct data structures, Lyndon word, Lyndon array, nearest smaller values, nearest smaller suffixes}
}

Document

DOI: 10.4230/LIPIcs.ESA.2019.41

Bidirectional Text Compression in External Memory

Authors: Patrick Dinklage, Jonas Ellert, Johannes Fischer, Dominik Köppl, and Manuel Penschuck

Published in: LIPIcs, Volume 144, 27th Annual European Symposium on Algorithms (ESA 2019)

Abstract

Bidirectional compression algorithms work by substituting repeated substrings by references that, unlike in the famous LZ77-scheme, can point to either direction. We present such an algorithm that is particularly suited for an external memory implementation. We evaluate it experimentally on large data sets of size up to 128 GiB (using only 16 GiB of RAM) and show that it is significantly faster than all known LZ77 compressors, while producing a roughly similar number of factors. We also introduce an external memory decompressor for texts compressed with any uni- or bidirectional compression scheme.

Cite as

Patrick Dinklage, Jonas Ellert, Johannes Fischer, Dominik Köppl, and Manuel Penschuck. Bidirectional Text Compression in External Memory. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 41:1-41:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{dinklage_et_al:LIPIcs.ESA.2019.41,
  author =	{Dinklage, Patrick and Ellert, Jonas and Fischer, Johannes and K\"{o}ppl, Dominik and Penschuck, Manuel},
  title =	{{Bidirectional Text Compression in External Memory}},
  booktitle =	{27th Annual European Symposium on Algorithms (ESA 2019)},
  pages =	{41:1--41:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-124-5},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{144},
  editor =	{Bender, Michael A. and Svensson, Ola and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2019.41},
  URN =		{urn:nbn:de:0030-drops-111624},
  doi =		{10.4230/LIPIcs.ESA.2019.41},
  annote =	{Keywords: text compression, bidirectional parsing, text decompression, external algorithms}
}

Document

DOI: 10.4230/DagRep.8.7.44

Synergies between Adaptive Analysis of Algorithms, Parameterized Complexity, Compressed Data Structures and Compressed Indices (Dagstuhl Seminar 18281)

Authors: Jérémy Barbay, Johannes Fischer, Stefan Kratsch, and Srinivasa Rao Satti

Published in: Dagstuhl Reports, Volume 8, Issue 7 (2019)

Abstract

From the 8th of July 2018 to the 13th of July 2018, a Dagstuhl Seminar took place with the topic "Synergies between Adaptive Analysis of Algorithms, Parameterized Complexity, Compressed Data Structures and Compressed Indices". There, 40 participants from as many as 14 distinct countries and four distinct research areas, dealing with running time analysis and space usage analysis of algorithms and data structures, gathered to discuss results and techniques to "go beyond the worst-case" for classes of structurally restricted inputs, both for (fast) algorithms and (compressed) data structures. The seminar consisted of (1) a first session of personal introduction, each participant presenting his expertise and themes of interests in two slides; (2) a series of four technical talks; and (3) a larger series of presentations of open problems, with ample time left for the participants to gather and work on such open problems.

Cite as

Jérémy Barbay, Johannes Fischer, Stefan Kratsch, and Srinivasa Rao Satti. Synergies between Adaptive Analysis of Algorithms, Parameterized Complexity, Compressed Data Structures and Compressed Indices (Dagstuhl Seminar 18281). In Dagstuhl Reports, Volume 8, Issue 7, pp. 44-61, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@Article{barbay_et_al:DagRep.8.7.44,
  author =	{Barbay, J\'{e}r\'{e}my and Fischer, Johannes and Kratsch, Stefan and Satti, Srinivasa Rao},
  title =	{{Synergies between Adaptive Analysis of Algorithms, Parameterized Complexity, Compressed Data Structures and Compressed Indices (Dagstuhl Seminar 18281)}},
  pages =	{44--61},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2019},
  volume =	{8},
  number =	{7},
  editor =	{Barbay, J\'{e}r\'{e}my and Fischer, Johannes and Kratsch, Stefan and Satti, Srinivasa Rao},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagRep.8.7.44},
  URN =		{urn:nbn:de:0030-drops-101724},
  doi =		{10.4230/DagRep.8.7.44},
  annote =	{Keywords: adaptive (analysis of) algorithms, compressed data structures, compressed indices, parameterized complexity}
}

Document

DOI: 10.4230/LIPIcs.SEA.2017.13

Compression with the tudocomp Framework

Authors: Patrick Dinklage, Johannes Fischer, Dominik Köppl, Marvin Löbel, and Kunihiko Sadakane

Published in: LIPIcs, Volume 75, 16th International Symposium on Experimental Algorithms (SEA 2017)

Abstract

We present a framework facilitating the implementation and comparison of text compression algorithms. We evaluate its features by a case study on two novel compression algorithms based on the Lempel-Ziv compression schemes that perform well on highly repetitive texts.

Cite as

Patrick Dinklage, Johannes Fischer, Dominik Köppl, Marvin Löbel, and Kunihiko Sadakane. Compression with the tudocomp Framework. In 16th International Symposium on Experimental Algorithms (SEA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 75, pp. 13:1-13:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{dinklage_et_al:LIPIcs.SEA.2017.13,
  author =	{Dinklage, Patrick and Fischer, Johannes and K\"{o}ppl, Dominik and L\"{o}bel, Marvin and Sadakane, Kunihiko},
  title =	{{Compression with the tudocomp Framework}},
  booktitle =	{16th International Symposium on Experimental Algorithms (SEA 2017)},
  pages =	{13:1--13:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-036-1},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{75},
  editor =	{Iliopoulos, Costas S. and Pissis, Solon P. and Puglisi, Simon J. and Raman, Rajeev},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2017.13},
  URN =		{urn:nbn:de:0030-drops-76015},
  doi =		{10.4230/LIPIcs.SEA.2017.13},
  annote =	{Keywords: lossless compression, compression framework, compression library, algorithm engineering, application of string algorithms}
}

@InProceedings{dinklage_et_al:LIPIcs.SEA.2017.13,
  author =	{Dinklage, Patrick and Fischer, Johannes and K\"{o}ppl, Dominik and L\"{o}bel, Marvin and Sadakane, Kunihiko},
  title =	{{Compression with the tudocomp Framework}},
  booktitle =	{16th International Symposium on Experimental Algorithms (SEA 2017)},
  pages =	{13:1--13:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-036-1},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{75},
  editor =	{Iliopoulos, Costas S. and Pissis, Solon P. and Puglisi, Simon J. and Raman, Rajeev},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2017.13},
  URN =		{urn:nbn:de:0030-drops-76015},
  doi =		{10.4230/LIPIcs.SEA.2017.13},
  annote =	{Keywords: lossless compression, compression framework, compression library, algorithm engineering, application of string algorithms}
}

Document

DOI: 10.4230/LIPIcs.CPM.2017.15

Lempel-Ziv Compression in a Sliding Window

Authors: Philip Bille, Patrick Hagge Cording, Johannes Fischer, and Inge Li Gørtz

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Abstract

We present new algorithms for the sliding window Lempel-Ziv (LZ77) problem and the approximate rightmost LZ77 parsing problem. Our main result is a new and surprisingly simple algorithm that computes the sliding window LZ77 parse in O(w) space and either O(n) expected time or O(n log log w+z log log s) deterministic time. Here, w is the window size, n is the size of the input string, z is the number of phrases in the parse, and s is the size of the alphabet. This matches the space and time bounds of previous results while removing constant size restrictions on the alphabet size. To achieve our result, we combine a simple modification and augmentation of the suffix tree with periodicity properties of sliding windows. We also apply this new technique to obtain an algorithm for the approximate rightmost LZ77 problem that uses O(n(log z + log log n)) time and O(n) space and produces a (1+e)-approximation of the rightmost parsing (any constant e>0). While this does not improve the best known time-space trade-offs for exact rightmost parsing, our algorithm is significantly simpler and exposes a direct connection between sliding window parsing and the approximate rightmost matching problem.

Cite as

Philip Bille, Patrick Hagge Cording, Johannes Fischer, and Inge Li Gørtz. Lempel-Ziv Compression in a Sliding Window. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 15:1-15:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.CPM.2017.15,
  author =	{Bille, Philip and Cording, Patrick Hagge and Fischer, Johannes and G{\o}rtz, Inge Li},
  title =	{{Lempel-Ziv Compression in a Sliding Window}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{15:1--15:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.15},
  URN =		{urn:nbn:de:0030-drops-73316},
  doi =		{10.4230/LIPIcs.CPM.2017.15},
  annote =	{Keywords: Lempel-Ziv parsing, sliding window, rightmost matching}
}

Document

DOI: 10.4230/LIPIcs.CPM.2016.26

On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching

Authors: Johannes Fischer, Dominik Köppl, and Florian Kurpicz

Published in: LIPIcs, Volume 54, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)

Abstract

We present parallel algorithms for exact and approximate pattern matching with suffix arrays, using a CREW-PRAM with p processors. Given a static text of length n, we first show how to compute the suffix array interval of a given pattern of length m in O(m/p + lg p + lg lg p * lg lg n) time for p <= m. For approximate pattern matching with k differences or mismatches, we show how to compute all occurrences of a given pattern in O((m^k sigma^k)/p max (k, lg lg n) + (1+m/p) lg p * lg lg n + occ} time, where sigma is the size of the alphabet and p <= sigma^k m^k. The workhorse of our algorithms is a data structure for merging suffix array intervals quickly: Given the suffix array intervals for two patterns P and P', we present a data structure for computing the interval of PP' in O(lg lg n) sequential time, or in O(1 + lg_p lg n) parallel time. All our data structures are of size O(n) bits (in addition to the suffix array).

Cite as

Johannes Fischer, Dominik Köppl, and Florian Kurpicz. On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 26:1-26:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{fischer_et_al:LIPIcs.CPM.2016.26,
  author =	{Fischer, Johannes and K\"{o}ppl, Dominik and Kurpicz, Florian},
  title =	{{On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{26:1--26:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.26},
  URN =		{urn:nbn:de:0030-drops-60669},
  doi =		{10.4230/LIPIcs.CPM.2016.26},
  annote =	{Keywords: parallel algorithms, pattern matching, approximate string matching}
}

Search Results

Documents authored by Fischer, Johannes

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as