DROPS

Document

DOI: 10.4230/LIPIcs.ESA.2023.14

Lyndon Arrays in Sublinear Time

Authors: Hideo Bannai and Jonas Ellert

Published in: LIPIcs, Volume 274, 31st Annual European Symposium on Algorithms (ESA 2023)

Abstract

A Lyndon word is a string that is lexicographically smaller than all of its non-trivial suffixes. For example, airbus is a Lyndon word, but amtrak is not a Lyndon word due to its suffix ak. The Lyndon array stores the length of the longest Lyndon prefix of each suffix of a string. For a length-n string over a general ordered alphabet, the array can be computed in O(n) time (Bille et al., ICALP 2020). However, on a word-RAM of word-width w ≥ log₂ n, linear time is not optimal if the string is over integer alphabet {0, … , σ} with σ ≪ n. In this case, the string can be stored in O(n log σ) bits (or O(n / log_σ n) words) of memory, and reading it takes only O(n / log_σ n) time. We show that O(n / log_σ n) time and words of space suffice to compute the succinct 2n-bit version of the Lyndon array. The time is optimal for w = O(log n). The algorithm uses precomputed lookup tables to perform significant parts of the computation in constant time. This is possible due to properties of periodic substrings, which we carefully analyze to achieve the desired result. We envision that the algorithm has applications in the computation of runs (maximal periodic substrings), where the Lyndon array plays a central role in both theoretically and practically fast algorithms.

Cite as

Hideo Bannai and Jonas Ellert. Lyndon Arrays in Sublinear Time. In 31st Annual European Symposium on Algorithms (ESA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 274, pp. 14:1-14:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.ESA.2023.14,
  author =	{Bannai, Hideo and Ellert, Jonas},
  title =	{{Lyndon Arrays in Sublinear Time}},
  booktitle =	{31st Annual European Symposium on Algorithms (ESA 2023)},
  pages =	{14:1--14:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-295-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{274},
  editor =	{G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.14},
  URN =		{urn:nbn:de:0030-drops-186670},
  doi =		{10.4230/LIPIcs.ESA.2023.14},
  annote =	{Keywords: Lyndon forest, Lyndon table, Lyndon array, sublinear time algorithms, word RAM algorithms, word packing, tabulation, lookup tables, periodicity}
}

Document

DOI: 10.4230/LIPIcs.ESA.2022.48

Lyndon Arrays Simplified

Authors: Jonas Ellert

Published in: LIPIcs, Volume 244, 30th Annual European Symposium on Algorithms (ESA 2022)

Abstract

A Lyndon word is a string that is lexicographically smaller than all of its proper suffixes (e.g., "airbus" is a Lyndon word; "amtrak" is not a Lyndon word because its suffix "ak" is lexicographically smaller than "amtrak"). The Lyndon array (sometimes called Lyndon table) identifies the longest Lyndon prefix of each suffix of a string. It is well known that the Lyndon array of a length-n string can be computed in O(n) time. However, most of the existing algorithms require the suffix array, which has theoretical and practical disadvantages. The only known algorithms that compute the Lyndon array in O(n) time without the suffix array (or similar data structures) do so in a particularly space efficient way (Bille et al., ICALP 2020), or in an online manner (Badkobeh et al., CPM 2022). Due to the additional goals of space efficiency and online computation, these algorithms are complicated in technical detail. Using the main ideas of the aforementioned algorithms, we provide a simpler and easier to understand algorithm that computes the Lyndon array in O(n) time.

Cite as

Jonas Ellert. Lyndon Arrays Simplified. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 48:1-48:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{ellert:LIPIcs.ESA.2022.48,
  author =	{Ellert, Jonas},
  title =	{{Lyndon Arrays Simplified}},
  booktitle =	{30th Annual European Symposium on Algorithms (ESA 2022)},
  pages =	{48:1--48:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-247-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{244},
  editor =	{Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2022.48},
  URN =		{urn:nbn:de:0030-drops-169863},
  doi =		{10.4230/LIPIcs.ESA.2022.48},
  annote =	{Keywords: Lyndon table, Lyndon array, Lyndon word, nearest smaller suffixes, lexicographical ordering, general ordered alphabets, combinatorial algorithms}
}

Document

DOI: 10.4230/LIPIcs.SEA.2022.10

A Parallel Framework for Approximate Max-Dicut in Partitionable Graphs

Authors: Nico Bertram, Jonas Ellert, and Johannes Fischer

Published in: LIPIcs, Volume 233, 20th International Symposium on Experimental Algorithms (SEA 2022)

Abstract

Computing a maximum cut in undirected and weighted graphs is a well studied problem and has many practical solutions that also scale well in shared memory (despite its NP-completeness). For its counterpart in directed graphs, however, we are not aware of practical solutions that also utilize parallelism. We engineer a framework that computes a high quality approximate cut in directed and weighted graphs by using a graph partitioning approach. The general idea is to partition a graph into k subgraphs using a parallel partitioning algorithm of our choice (the first ingredient of our framework). Then, for each subgraph in parallel, we compute a cut using any polynomial time approximation algorithm (the second ingredient). In a final step, we merge the locally computed solutions using a high-quality or exact parallel Max-Dicut algorithm (the third ingredient). On graphs that can be partitioned well, the quality of the computed cut is significantly better than the best cut achieved by any linear time algorithm. This is particularly relevant for large graphs, where linear time algorithms used to be the only feasible option.

Cite as

Nico Bertram, Jonas Ellert, and Johannes Fischer. A Parallel Framework for Approximate Max-Dicut in Partitionable Graphs. In 20th International Symposium on Experimental Algorithms (SEA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 233, pp. 10:1-10:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{bertram_et_al:LIPIcs.SEA.2022.10,
  author =	{Bertram, Nico and Ellert, Jonas and Fischer, Johannes},
  title =	{{A Parallel Framework for Approximate Max-Dicut in Partitionable Graphs}},
  booktitle =	{20th International Symposium on Experimental Algorithms (SEA 2022)},
  pages =	{10:1--10:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-251-8},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{233},
  editor =	{Schulz, Christian and U\c{c}ar, Bora},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2022.10},
  URN =		{urn:nbn:de:0030-drops-165441},
  doi =		{10.4230/LIPIcs.SEA.2022.10},
  annote =	{Keywords: maximum directed cut, graph partitioning, algorithm engineering, approximation, parallel algorithms}
}

Document

DOI: 10.4230/LIPIcs.CPM.2022.13

Back-To-Front Online Lyndon Forest Construction

Authors: Golnaz Badkobeh, Maxime Crochemore, Jonas Ellert, and Cyril Nicaud

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

A Lyndon word is a word that is lexicographically smaller than all of its non-trivial rotations (e.g. ananas is a Lyndon word; banana is not a Lyndon word due to its smaller rotation abanan). The Lyndon forest (or equivalently Lyndon table) identifies maximal Lyndon factors of a word, and is of great combinatoric interest, e.g. when finding maximal repetitions in words. While optimal linear time algorithms for computing the Lyndon forest are known, none of them work in an online manner. We present algorithms that compute the Lyndon forest of a word in a reverse online manner, processing the input word from back to front. We assume a general ordered alphabet, i.e. the only elementary operations on symbols are comparisons of the form less-equal-greater. We start with a naive algorithm and show that, despite its quadratic worst-case behaviour, it already takes expected linear time on words drawn uniformly at random. We then introduce a much more sophisticated algorithm that takes linear time in the worst case. It borrows some ideas from the offline algorithm by Bille et al. (ICALP 2020), combined with new techniques that are necessary for the reverse online setting. While the back-to-front approach for this computation is rather natural (see Franek and Liut, PSC 2019), the steps required to achieve linear time are surprisingly intricate. We envision that our algorithm will be useful for the online computation of maximal repetitions in words.

Cite as

Golnaz Badkobeh, Maxime Crochemore, Jonas Ellert, and Cyril Nicaud. Back-To-Front Online Lyndon Forest Construction. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 13:1-13:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{badkobeh_et_al:LIPIcs.CPM.2022.13,
  author =	{Badkobeh, Golnaz and Crochemore, Maxime and Ellert, Jonas and Nicaud, Cyril},
  title =	{{Back-To-Front Online Lyndon Forest Construction}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{13:1--13:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.13},
  URN =		{urn:nbn:de:0030-drops-161404},
  doi =		{10.4230/LIPIcs.CPM.2022.13},
  annote =	{Keywords: Lyndon factorisation, Lyndon forest, Lyndon table, Lyndon array, right Lyndon tree, Cartesian tree, standard factorisation, online algorithms}
}

Document

DOI: 10.4230/LIPIcs.ESA.2021.15

Lyndon Words Accelerate Suffix Sorting

Authors: Nico Bertram, Jonas Ellert, and Johannes Fischer

Published in: LIPIcs, Volume 204, 29th Annual European Symposium on Algorithms (ESA 2021)

Abstract

Suffix sorting is arguably the most fundamental building block in string algorithmics, like regular sorting in the broader field of algorithms. It is thus not surprising that the literature is full of algorithms for suffix sorting, in particular focusing on their practicality. However, the advances on practical suffix sorting stalled with the emergence of the DivSufSort algorithm more than 10 years ago, which, up to date, has remained the fastest suffix sorter. This article shows how properties of Lyndon words can be exploited algorithmically to accelerate suffix sorting again. Our new algorithm is 6-19% faster than DivSufSort on real-world texts, and up to three times as fast on artificial repetitive texts. It can also be parallelized, where similar speedups can be observed. Thus, we make the first advances in practical suffix sorting after more than a decade of standstill.

Cite as

Nico Bertram, Jonas Ellert, and Johannes Fischer. Lyndon Words Accelerate Suffix Sorting. In 29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 15:1-15:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{bertram_et_al:LIPIcs.ESA.2021.15,
  author =	{Bertram, Nico and Ellert, Jonas and Fischer, Johannes},
  title =	{{Lyndon Words Accelerate Suffix Sorting}},
  booktitle =	{29th Annual European Symposium on Algorithms (ESA 2021)},
  pages =	{15:1--15:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-204-4},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{204},
  editor =	{Mutzel, Petra and Pagh, Rasmus and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2021.15},
  URN =		{urn:nbn:de:0030-drops-145961},
  doi =		{10.4230/LIPIcs.ESA.2021.15},
  annote =	{Keywords: Suffix array, suffix sorting, Lyndon words, string algorithms}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2021.63

Linear Time Runs Over General Ordered Alphabets

Authors: Jonas Ellert and Johannes Fischer

Published in: LIPIcs, Volume 198, 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)

Abstract

A run in a string is a maximal periodic substring. For example, the string bananatree contains the runs anana = (an)^{5/2} and ee = e². There are less than n runs in any length-n string, and computing all runs for a string over a linearly-sortable alphabet takes 𝒪(n) time (Bannai et al., SIAM J. Comput. 2017). Kosolobov conjectured that there also exists a linear time runs algorithm for general ordered alphabets (Inf. Process. Lett. 2016). The conjecture was almost proven by Crochemore et al., who presented an 𝒪(nα(n)) time algorithm (where α(n) is the extremely slowly growing inverse Ackermann function). We show how to achieve 𝒪(n) time by exploiting combinatorial properties of the Lyndon array, thus proving Kosolobov’s conjecture. This also positively answers the at least 29-year-old question whether square-freeness can be tested in linear time over general ordered alphabets (Breslauer, PhD thesis, Columbia University 1992).

Cite as

Jonas Ellert and Johannes Fischer. Linear Time Runs Over General Ordered Alphabets. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 63:1-63:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{ellert_et_al:LIPIcs.ICALP.2021.63,
  author =	{Ellert, Jonas and Fischer, Johannes},
  title =	{{Linear Time Runs Over General Ordered Alphabets}},
  booktitle =	{48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)},
  pages =	{63:1--63:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-195-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{198},
  editor =	{Bansal, Nikhil and Merelli, Emanuela and Worrell, James},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2021.63},
  URN =		{urn:nbn:de:0030-drops-141322},
  doi =		{10.4230/LIPIcs.ICALP.2021.63},
  annote =	{Keywords: String algorithms, Lyndon array, runs, squares, longest common extension, general ordered alphabets, combinatorics on words}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2020.14

Space Efficient Construction of Lyndon Arrays in Linear Time

Authors: Philip Bille, Jonas Ellert, Johannes Fischer, Inge Li Gørtz, Florian Kurpicz, J. Ian Munro, and Eva Rotenberg

Published in: LIPIcs, Volume 168, 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)

Abstract

Given a string S of length n, its Lyndon array identifies for each suffix S[i..n] the next lexicographically smaller suffix S[j..n], i.e. the minimal index j > i with S[i..n] ≻ S[j..n]. Apart from its plain (n log₂ n)-bit array representation, the Lyndon array can also be encoded as a succinct parentheses sequence that requires only 2n bits of space. While linear time construction algorithms for both representations exist, it has previously been unknown if the same time bound can be achieved with less than Ω(n lg n) bits of additional working space. We show that, in fact, o(n) additional bits are sufficient to compute the succinct 2n-bit version of the Lyndon array in linear time. For the plain (n log₂ n)-bit version, we only need 𝒪(1) additional words to achieve linear time. Our space efficient construction algorithm makes the Lyndon array more accessible as a fundamental data structure in applications like full-text indexing.

Cite as

Philip Bille, Jonas Ellert, Johannes Fischer, Inge Li Gørtz, Florian Kurpicz, J. Ian Munro, and Eva Rotenberg. Space Efficient Construction of Lyndon Arrays in Linear Time. In 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 168, pp. 14:1-14:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.ICALP.2020.14,
  author =	{Bille, Philip and Ellert, Jonas and Fischer, Johannes and G{\o}rtz, Inge Li and Kurpicz, Florian and Munro, J. Ian and Rotenberg, Eva},
  title =	{{Space Efficient Construction of Lyndon Arrays in Linear Time}},
  booktitle =	{47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)},
  pages =	{14:1--14:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-138-2},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{168},
  editor =	{Czumaj, Artur and Dawar, Anuj and Merelli, Emanuela},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2020.14},
  URN =		{urn:nbn:de:0030-drops-124211},
  doi =		{10.4230/LIPIcs.ICALP.2020.14},
  annote =	{Keywords: String algorithms, string suffixes, succinct data structures, Lyndon word, Lyndon array, nearest smaller values, nearest smaller suffixes}
}

@InProceedings{bille_et_al:LIPIcs.ICALP.2020.14,
  author =	{Bille, Philip and Ellert, Jonas and Fischer, Johannes and G{\o}rtz, Inge Li and Kurpicz, Florian and Munro, J. Ian and Rotenberg, Eva},
  title =	{{Space Efficient Construction of Lyndon Arrays in Linear Time}},
  booktitle =	{47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)},
  pages =	{14:1--14:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-138-2},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{168},
  editor =	{Czumaj, Artur and Dawar, Anuj and Merelli, Emanuela},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2020.14},
  URN =		{urn:nbn:de:0030-drops-124211},
  doi =		{10.4230/LIPIcs.ICALP.2020.14},
  annote =	{Keywords: String algorithms, string suffixes, succinct data structures, Lyndon word, Lyndon array, nearest smaller values, nearest smaller suffixes}
}

Document

DOI: 10.4230/LIPIcs.ESA.2019.41

Bidirectional Text Compression in External Memory

Authors: Patrick Dinklage, Jonas Ellert, Johannes Fischer, Dominik Köppl, and Manuel Penschuck

Published in: LIPIcs, Volume 144, 27th Annual European Symposium on Algorithms (ESA 2019)

Abstract

Bidirectional compression algorithms work by substituting repeated substrings by references that, unlike in the famous LZ77-scheme, can point to either direction. We present such an algorithm that is particularly suited for an external memory implementation. We evaluate it experimentally on large data sets of size up to 128 GiB (using only 16 GiB of RAM) and show that it is significantly faster than all known LZ77 compressors, while producing a roughly similar number of factors. We also introduce an external memory decompressor for texts compressed with any uni- or bidirectional compression scheme.

Cite as

Patrick Dinklage, Jonas Ellert, Johannes Fischer, Dominik Köppl, and Manuel Penschuck. Bidirectional Text Compression in External Memory. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 41:1-41:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{dinklage_et_al:LIPIcs.ESA.2019.41,
  author =	{Dinklage, Patrick and Ellert, Jonas and Fischer, Johannes and K\"{o}ppl, Dominik and Penschuck, Manuel},
  title =	{{Bidirectional Text Compression in External Memory}},
  booktitle =	{27th Annual European Symposium on Algorithms (ESA 2019)},
  pages =	{41:1--41:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-124-5},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{144},
  editor =	{Bender, Michael A. and Svensson, Ola and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2019.41},
  URN =		{urn:nbn:de:0030-drops-111624},
  doi =		{10.4230/LIPIcs.ESA.2019.41},
  annote =	{Keywords: text compression, bidirectional parsing, text decompression, external algorithms}
}

Search Results

Documents authored by Ellert, Jonas

Lyndon Arrays in Sublinear Time

Abstract

Cite as

Lyndon Arrays Simplified

Abstract

Cite as

A Parallel Framework for Approximate Max-Dicut in Partitionable Graphs

Abstract

Cite as

Back-To-Front Online Lyndon Forest Construction

Abstract

Cite as

Lyndon Words Accelerate Suffix Sorting

Abstract

Cite as

Linear Time Runs Over General Ordered Alphabets

Abstract

Cite as

Space Efficient Construction of Lyndon Arrays in Linear Time

Abstract

Cite as

Bidirectional Text Compression in External Memory

Abstract

Cite as