DROPS

Document

DOI: 10.4230/LIPIcs.CPM.2026.17

Sensitivity of Repetitiveness Measures to String Reversal

Authors: Hideo Bannai, Yuto Fujie, Peaker Guo, Shunsuke Inenaga, Yuto Nakashima, Simon J. Puglisi, and Cristian Urbina

Published in: LIPIcs, Volume 369, 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)

Abstract

We study the impact that string reversal can have on several repetitiveness measures. First, we exhibit an infinite family of strings where the number, r, of runs in the run-length encoding of the Burrows-Wheeler transform (BWT) can increase additively by Θ(n) when reversing the string. This substantially improves the known Ω(log n) lower-bound for the additive sensitivity of r and it is asymptotically tight. We generalize our result to other variants of the BWT, including the variant with an appended end-of-string symbol and the bijective BWT. We show that an analogous result holds for the size z of the Lempel-Ziv 77 (LZ) parsing of the text, and also for some of its variants, including the non-overlapping LZ parsing, and the LZ-end parsing. Moreover, we describe a family of strings for which the ratio z(w^R)/z(w) approaches 3 from below as |w| → ∞. We also show an asymptotically tight lower-bound of Θ(n) for the additive sensitivity of the size v of the smallest lexicographic parsing to string reversal. Finally, we show that the multiplicative sensitivity of v to reversing the string is Θ(log n), and this lower-bound is also tight. Overall, our results expose the limitations of repetitiveness measures that are widely used in practice, against string reversal - a simple and natural data transformation.

Cite as

Hideo Bannai, Yuto Fujie, Peaker Guo, Shunsuke Inenaga, Yuto Nakashima, Simon J. Puglisi, and Cristian Urbina. Sensitivity of Repetitiveness Measures to String Reversal. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 17:1-17:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2026.17,
  author =	{Bannai, Hideo and Fujie, Yuto and Guo, Peaker and Inenaga, Shunsuke and Nakashima, Yuto and Puglisi, Simon J. and Urbina, Cristian},
  title =	{{Sensitivity of Repetitiveness Measures to String Reversal}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{17:1--17:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.17},
  URN =		{urn:nbn:de:0030-drops-259434},
  doi =		{10.4230/LIPIcs.CPM.2026.17},
  annote =	{Keywords: String reversal, Repetitiveness measures, Burrows-Wheeler transform, Lempel-Ziv parsing, Lexicographic parsings}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.24

On Occurrence-Preserving Morphisms

Authors: Kaisei Kishi, Peaker Guo, Cristian Urbina, and Hideo Bannai

Published in: LIPIcs, Volume 369, 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)

Abstract

A morphism is a mapping that transforms words through letter-wise substitution, where each symbol is consistently replaced by a fixed word. In the field of combinatorics on words, one topic that has attracted considerable attention is the characterization of morphisms that preserve specific properties, such as overlap-freeness, square-freeness, lexicographic order, and primitivity. Continuing this direction, we initiate the study on occurrence-preserving morphisms, which address the following fundamental question: given a morphism ϕ, two words u and v, and k ≥ 1, under what conditions does the number of occurrences of u in v equal the number of occurrences of ϕ^k(u) in ϕ^k(v)? To answer this question, we introduce the notion of interference-free morphisms, examine their properties, and uncover a connection to recognizable morphisms. We then present a precise characterization of occurrence-preserving morphisms in terms of interference-freeness. As applications of our characterization, we first show that there exists a bijection between the starting positions of the occurrences of u in v and those of ϕ^k(u) in ϕ^k(v). We then apply the characterization to the Fibonacci and Thue-Morse words to identify their minimal unique substrings (MUSs). Finally, we exploit the connection between MUSs and net occurrences to simplify existing proofs on net occurrences in these words.

Cite as

Kaisei Kishi, Peaker Guo, Cristian Urbina, and Hideo Bannai. On Occurrence-Preserving Morphisms. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 24:1-24:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

Copy BibTex To Clipboard

@InProceedings{kishi_et_al:LIPIcs.CPM.2026.24,
  author =	{Kishi, Kaisei and Guo, Peaker and Urbina, Cristian and Bannai, Hideo},
  title =	{{On Occurrence-Preserving Morphisms}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{24:1--24:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.24},
  URN =		{urn:nbn:de:0030-drops-259504},
  doi =		{10.4230/LIPIcs.CPM.2026.24},
  annote =	{Keywords: Property-preserving morphisms, interference-free morphisms, recognizable morphisms, injective morphisms, Fibonacci words, Thue-Morse words, minimal unique substrings (MUSs), net occurrences}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.33

The Smallest String Attractors of Fibonacci and Period-Doubling Words

Authors: Mutsunori Banbara, Hideo Bannai, Peaker Guo, Dominik Köppl, Takuya Mieno, and Yoshio Okamoto

Published in: LIPIcs, Volume 369, 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)

Abstract

A string attractor of a string T[1..|T|] is a set of positions Γ of T such that any substring w of T has an occurrence that crosses a position in Γ, i.e., there is a position i such that w = T[i..i+|w|-1] and the intersection [i,i+|w|-1]∩ Γ is nonempty. The size of the smallest string attractor of Fibonacci words is known to be 2. We completely characterize the set of all smallest string attractors of Fibonacci words, and show a recursive formula describing the 2^{n-4} + 2^{⌈n/2⌉ - 2} distinct position pairs that are the smallest string attractors of the nth Fibonacci word for n ≥ 7. Similarly, the size of the smallest string attractor of period-doubling words is known to be 2. We also completely characterize the set of all smallest string attractors of period-doubling words, and show a formula describing the two distinct position pairs that are the smallest string attractors of the nth period-doubling word for n ≥ 2. Our results show that strings with the same smallest attractor size can have a drastically different number of distinct smallest attractors.

Cite as

Mutsunori Banbara, Hideo Bannai, Peaker Guo, Dominik Köppl, Takuya Mieno, and Yoshio Okamoto. The Smallest String Attractors of Fibonacci and Period-Doubling Words. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 33:1-33:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

Copy BibTex To Clipboard

@InProceedings{banbara_et_al:LIPIcs.CPM.2026.33,
  author =	{Banbara, Mutsunori and Bannai, Hideo and Guo, Peaker and K\"{o}ppl, Dominik and Mieno, Takuya and Okamoto, Yoshio},
  title =	{{The Smallest String Attractors of Fibonacci and Period-Doubling Words}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{33:1--33:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.33},
  URN =		{urn:nbn:de:0030-drops-259599},
  doi =		{10.4230/LIPIcs.CPM.2026.33},
  annote =	{Keywords: String attractors, Fibonacci words, Period-doubling words, Combinatorics on words}
}

Document

DOI: 10.4230/OASIcs.Manzini.2

A Survey of the Bijective Burrows-Wheeler Transform

Authors: Hideo Bannai, Dominik Köppl, and Zsuzsanna Lipták

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)

Abstract

The Bijective BWT (BBWT), conceived by Scott in 2007, later summarized in a preprint by Gil and Scott in 2009 (arXiv 2012), is a variant of the Burrows-Wheeler Transform which is bijective: every string is the BBWT of some string. Indeed, the BBWT of a string is the extended BWT [Mantaci et al., 2007] of the factors of its Lyndon factorization. The BBWT has been receiving increasing interest in recent years. In this paper, we survey existing research on the BBWT, starting with its history and motivation. We then present algorithmic topics including construction algorithms with various complexities and an index on top of the BBWT for pattern matching. We subsequently address some properties of the BBWT as a compressor, discussing robustness to operations such as reversal, edits, rotation, as well as compression power. We close with listing other bijective variants of the BWT and open problems concerning the BBWT.

Cite as

Hideo Bannai, Dominik Köppl, and Zsuzsanna Lipták. A Survey of the Bijective Burrows-Wheeler Transform. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 2:1-2:26, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:OASIcs.Manzini.2,
  author =	{Bannai, Hideo and K\"{o}ppl, Dominik and Lipt\'{a}k, Zsuzsanna},
  title =	{{A Survey of the Bijective Burrows-Wheeler Transform}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{2:1--2:26},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.2},
  URN =		{urn:nbn:de:0030-drops-239100},
  doi =		{10.4230/OASIcs.Manzini.2},
  annote =	{Keywords: Burrows-Wheeler Transform, compression, text indexing, repetitiveness measure, Lyndon words, index construction algorithms, bijective string transformation}
}

Document

DOI: 10.4230/LIPIcs.CPM.2025.17

On the Compressiveness of the Burrows-Wheeler Transform

Authors: Hideo Bannai, Tomohiro I, and Yuto Nakashima

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Abstract

The Burrows-Wheeler transform (BWT) is a reversible transform that converts a string w into another string BWT(w). The size of the run-length encoded BWT (RLBWT) can be interpreted as a measure of repetitiveness in the class of representations called dictionary compression which are essentially representations based on copy and paste operations. In this paper, we shed new light on the compressiveness of BWT and the bijective BWT (BBWT). We first extend previous results on the relations of their run-length compressed sizes r and r_B. We also show that the so-called "clustering effect" of BWT and BBWT can be captured by measures other than empirical entropy or run-length encoding. In particular, we show that BWT and BBWT do not increase the repetitiveness of the string with respect to various measures based on dictionary compression by more than a polylogarithmic factor. Furthermore, we show that there exists an infinite family of strings that are maximally incompressible by any dictionary compression measure, but become very compressible after applying BBWT. An interesting implication of this result is that it is possible to transcend dictionary compression in some cases by simply applying BBWT before applying dictionary compression.

Cite as

Hideo Bannai, Tomohiro I, and Yuto Nakashima. On the Compressiveness of the Burrows-Wheeler Transform. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 17:1-17:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2025.17,
  author =	{Bannai, Hideo and I, Tomohiro and Nakashima, Yuto},
  title =	{{On the Compressiveness of the Burrows-Wheeler Transform}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{17:1--17:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.17},
  URN =		{urn:nbn:de:0030-drops-231116},
  doi =		{10.4230/LIPIcs.CPM.2025.17},
  annote =	{Keywords: Data Compression, Bijective Burrows-Wheeler Transform, Fibonacci words}
}

Document

DOI: 10.4230/LIPIcs.CPM.2025.20

Text Indexing for Simple Regular Expressions

Authors: Hideo Bannai, Philip Bille, Inge Li Gørtz, Gad M. Landau, Gonzalo Navarro, Nicola Prezza, Teresa Anna Steiner, and Simon Rumle Tarnow

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Abstract

We study the problem of indexing a text T[1..n] ∈ Σⁿ so that, later, given a query regular expression pattern R of size m = |R|, we can report all the occ substrings T[i..j] of T matching R. The problem is known to be hard for arbitrary patterns R, so in this paper, we consider the following two types of patterns. (1) Character-class Kleene-star patterns of the form P₁ D^* P₂, where P₁ and P₂ are strings and D = {c₁, …, c_k} ⊂ Σ is a character-class (shorthand for the regular expression (c₁ | c₂ | ⋯ | c_k)) and (2) String Kleene-star patterns of the form P₁ P^* P₂ where P, P₁ and P₂ are strings. In case (1), we describe an index of O(nlog^{1+ε}n) space (for any constant ε > 0) solving queries in time O(m + log n/log log n + occ) on constant-sized alphabets. We also describe a general solution for any alphabet size. This result is conditioned on the existence of an anchor: a character of P₁P₂ that does not belong to D. We justify this assumption by proving that no efficient indexing solution can exist if an anchor is not present unless the Set Disjointness Conjecture fails. In case (2), we describe an index of size O(n) answering queries in time O(m + (occ+1)log^{ε}n) on any alphabet size.

Cite as

Hideo Bannai, Philip Bille, Inge Li Gørtz, Gad M. Landau, Gonzalo Navarro, Nicola Prezza, Teresa Anna Steiner, and Simon Rumle Tarnow. Text Indexing for Simple Regular Expressions. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 20:1-20:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2025.20,
  author =	{Bannai, Hideo and Bille, Philip and G{\o}rtz, Inge Li and Landau, Gad M. and Navarro, Gonzalo and Prezza, Nicola and Steiner, Teresa Anna and Tarnow, Simon Rumle},
  title =	{{Text Indexing for Simple Regular Expressions}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{20:1--20:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.20},
  URN =		{urn:nbn:de:0030-drops-231143},
  doi =		{10.4230/LIPIcs.CPM.2025.20},
  annote =	{Keywords: Text indexing, regular expressions, data structures}
}

Document

DOI: 10.4230/LIPIcs.ESA.2024.18

Height-Bounded Lempel-Ziv Encodings

Authors: Hideo Bannai, Mitsuru Funakoshi, Diptarama Hendrian, Myuji Matsuda, and Simon J. Puglisi

Published in: LIPIcs, Volume 308, 32nd Annual European Symposium on Algorithms (ESA 2024)

Abstract

We introduce height-bounded LZ encodings (LZHB), a new family of compressed representations that are variants of Lempel-Ziv parsings with a focus on bounding the worst-case access time to arbitrary positions in the text directly via the compressed representation. An LZ-like encoding is a partitioning of the string into phrases of length 1 which can be encoded literally, or phrases of length at least 2 which have a previous occurrence in the string and can be encoded by its position and length. An LZ-like encoding induces an implicit referencing forest on the set of positions of the string. An LZHB encoding is an LZ-like encoding where the height of the implicit referencing forest is bounded. An LZHB encoding with height constraint h allows access to an arbitrary position of the underlying text using O(h) predecessor queries. While computing the optimal (i.e., smallest) LZHB encoding efficiently seems to be difficult [Cicalese & Ugazio 2024, arXiv, to appear at DLT 2024], we give the first linear time algorithm for strings over a constant size alphabet that computes the greedy LZHB encoding, i.e., the string is processed from beginning to end, and the longest prefix of the remaining string that can satisfy the height constraint is taken as the next phrase. Our algorithms significantly improve both theoretically and practically, the very recently and independently proposed algorithms by Lipták et al. (CPM 2024). We also analyze the size of height bounded LZ encodings in the context of repetitiveness measures, and show that there exists a constant c such that the size ẑ_{HB(clog n)} of the optimal LZHB encoding whose height is bounded by clog n for any string of length n is O(ĝ_{rl}), where ĝ_{rl} is the size of the smallest run-length grammar. Furthermore, we show that there exists a family of strings such that ẑ_{HB(clog n)} = o(ĝ_{rl}), thus making ẑ_{HB(clog n)} one of the smallest known repetitiveness measures for which O(polylog n) time access is possible using linear (O(ẑ_{HB(clog n)})) space.

Cite as

Hideo Bannai, Mitsuru Funakoshi, Diptarama Hendrian, Myuji Matsuda, and Simon J. Puglisi. Height-Bounded Lempel-Ziv Encodings. In 32nd Annual European Symposium on Algorithms (ESA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 308, pp. 18:1-18:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.ESA.2024.18,
  author =	{Bannai, Hideo and Funakoshi, Mitsuru and Hendrian, Diptarama and Matsuda, Myuji and Puglisi, Simon J.},
  title =	{{Height-Bounded Lempel-Ziv Encodings}},
  booktitle =	{32nd Annual European Symposium on Algorithms (ESA 2024)},
  pages =	{18:1--18:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-338-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{308},
  editor =	{Chan, Timothy and Fischer, Johannes and Iacono, John and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2024.18},
  URN =		{urn:nbn:de:0030-drops-210899},
  doi =		{10.4230/LIPIcs.ESA.2024.18},
  annote =	{Keywords: Lempel-Ziv parsing, data compression}
}

Document

DOI: 10.4230/LIPIcs.MFCS.2024.75

Edit and Alphabet-Ordering Sensitivity of Lex-Parse

Authors: Yuto Nakashima, Dominik Köppl, Mitsuru Funakoshi, Shunsuke Inenaga, and Hideo Bannai

Published in: LIPIcs, Volume 306, 49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024)

Abstract

We investigate the compression sensitivity [Akagi et al., 2023] of lex-parse [Navarro et al., 2021] for two operations: (1) single character edit and (2) modification of the alphabet ordering, and give tight upper and lower bounds for both operations (i.e., we show Θ(log n) bounds for strings of length n). For both lower bounds, we use the family of Fibonacci words. For the bounds on edit operations, our analysis makes heavy use of properties of the Lyndon factorization of Fibonacci words to characterize the structure of lex-parse.

Cite as

Yuto Nakashima, Dominik Köppl, Mitsuru Funakoshi, Shunsuke Inenaga, and Hideo Bannai. Edit and Alphabet-Ordering Sensitivity of Lex-Parse. In 49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 306, pp. 75:1-75:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{nakashima_et_al:LIPIcs.MFCS.2024.75,
  author =	{Nakashima, Yuto and K\"{o}ppl, Dominik and Funakoshi, Mitsuru and Inenaga, Shunsuke and Bannai, Hideo},
  title =	{{Edit and Alphabet-Ordering Sensitivity of Lex-Parse}},
  booktitle =	{49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024)},
  pages =	{75:1--75:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-335-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{306},
  editor =	{Kr\'{a}lovi\v{c}, Rastislav and Ku\v{c}era, Anton{\'\i}n},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2024.75},
  URN =		{urn:nbn:de:0030-drops-206314},
  doi =		{10.4230/LIPIcs.MFCS.2024.75},
  annote =	{Keywords: Compression sensitivity, Lex-parse, Fibonacci words}
}

Document

DOI: 10.4230/LIPIcs.CPM.2024.3

Maintaining the Size of LZ77 on Semi-Dynamic Strings

Authors: Hideo Bannai, Panagiotis Charalampopoulos, and Jakub Radoszewski

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)

Abstract

We consider the problem of maintaining the size of the LZ77 factorization of a string S of length at most n under the following operations: (a) appending a given letter to S and (b) deleting the first letter of S. Our main result is an algorithm for this problem with amortized update time Õ(√n). As a corollary, we obtain an Õ(n√n)-time algorithm for computing the most LZ77-compressible rotation of a length-n string - a naive approach for this problem would compute the LZ77 factorization of each possible rotation and would thus take quadratic time in the worst case. We also show an Ω(√n) lower bound for the additive sensitivity of LZ77 with respect to the rotation operation. Our algorithm employs dynamic trees to maintain the longest-previous-factor array information and depends on periodicity-based arguments that bound the number of the required updates and enable their efficient computation.

Cite as

Hideo Bannai, Panagiotis Charalampopoulos, and Jakub Radoszewski. Maintaining the Size of LZ77 on Semi-Dynamic Strings. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 3:1-3:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2024.3,
  author =	{Bannai, Hideo and Charalampopoulos, Panagiotis and Radoszewski, Jakub},
  title =	{{Maintaining the Size of LZ77 on Semi-Dynamic Strings}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{3:1--3:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.3},
  URN =		{urn:nbn:de:0030-drops-201134},
  doi =		{10.4230/LIPIcs.CPM.2024.3},
  annote =	{Keywords: Lempel-Ziv, compression, LZ77, semi-dynamic algorithm, cyclic rotation}
}

Document

DOI: 10.4230/LIPIcs.ESA.2023.14

Lyndon Arrays in Sublinear Time

Authors: Hideo Bannai and Jonas Ellert

Published in: LIPIcs, Volume 274, 31st Annual European Symposium on Algorithms (ESA 2023)

Abstract

A Lyndon word is a string that is lexicographically smaller than all of its non-trivial suffixes. For example, airbus is a Lyndon word, but amtrak is not a Lyndon word due to its suffix ak. The Lyndon array stores the length of the longest Lyndon prefix of each suffix of a string. For a length-n string over a general ordered alphabet, the array can be computed in O(n) time (Bille et al., ICALP 2020). However, on a word-RAM of word-width w ≥ log₂ n, linear time is not optimal if the string is over integer alphabet {0, … , σ} with σ ≪ n. In this case, the string can be stored in O(n log σ) bits (or O(n / log_σ n) words) of memory, and reading it takes only O(n / log_σ n) time. We show that O(n / log_σ n) time and words of space suffice to compute the succinct 2n-bit version of the Lyndon array. The time is optimal for w = O(log n). The algorithm uses precomputed lookup tables to perform significant parts of the computation in constant time. This is possible due to properties of periodic substrings, which we carefully analyze to achieve the desired result. We envision that the algorithm has applications in the computation of runs (maximal periodic substrings), where the Lyndon array plays a central role in both theoretically and practically fast algorithms.

Cite as

Hideo Bannai and Jonas Ellert. Lyndon Arrays in Sublinear Time. In 31st Annual European Symposium on Algorithms (ESA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 274, pp. 14:1-14:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.ESA.2023.14,
  author =	{Bannai, Hideo and Ellert, Jonas},
  title =	{{Lyndon Arrays in Sublinear Time}},
  booktitle =	{31st Annual European Symposium on Algorithms (ESA 2023)},
  pages =	{14:1--14:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-295-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{274},
  editor =	{G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.14},
  URN =		{urn:nbn:de:0030-drops-186670},
  doi =		{10.4230/LIPIcs.ESA.2023.14},
  annote =	{Keywords: Lyndon forest, Lyndon table, Lyndon array, sublinear time algorithms, word RAM algorithms, word packing, tabulation, lookup tables, periodicity}
}

Document

DOI: 10.4230/LIPIcs.WABI.2023.13

Acceleration of FM-Index Queries Through Prefix-Free Parsing

Authors: Aaron Hong, Marco Oliva, Dominik Köppl, Hideo Bannai, Christina Boucher, and Travis Gagie

Published in: LIPIcs, Volume 273, 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)

Abstract

FM-indexes are a crucial data structure in DNA alignment, but searching with them usually takes at least one random access per character in the query pattern. Ferragina and Fischer [Ferragina and Fischer, 2007] observed in 2007 that word-based indexes often use fewer random accesses than character-based indexes, and thus support faster searches. Since DNA lacks natural word-boundaries, however, it is necessary to parse it somehow before applying word-based FM-indexing. Last year, Deng et al. [Deng et al., 2022] proposed parsing genomic data by induced suffix sorting, and showed the resulting word-based FM-indexes support faster counting queries than standard FM-indexes when patterns are a few thousand characters or longer. In this paper we show that using prefix-free parsing - which takes parameters that let us tune the average length of the phrases - instead of induced suffix sorting, gives a significant speedup for patterns of only a few hundred characters. We implement our method and demonstrate it is between 3 and 18 times faster than competing methods on queries to GRCh38. And was consistently faster on queries made to 25,000, 50,000 and 100,000 SARS-CoV-2 genomes. Hence, it is very clear that our method accelerates the performance of count over all state-of-the-art methods with a minor increase in the memory.

Cite as

Aaron Hong, Marco Oliva, Dominik Köppl, Hideo Bannai, Christina Boucher, and Travis Gagie. Acceleration of FM-Index Queries Through Prefix-Free Parsing. In 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 273, pp. 13:1-13:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{hong_et_al:LIPIcs.WABI.2023.13,
  author =	{Hong, Aaron and Oliva, Marco and K\"{o}ppl, Dominik and Bannai, Hideo and Boucher, Christina and Gagie, Travis},
  title =	{{Acceleration of FM-Index Queries Through Prefix-Free Parsing}},
  booktitle =	{23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)},
  pages =	{13:1--13:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-294-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{273},
  editor =	{Belazzougui, Djamal and Ouangraoua, A\"{i}da},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2023.13},
  URN =		{urn:nbn:de:0030-drops-186390},
  doi =		{10.4230/LIPIcs.WABI.2023.13},
  annote =	{Keywords: FM-index, pangenomics, scalability, word-based indexing, random access}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.3

Optimal LZ-End Parsing Is Hard

Authors: Hideo Bannai, Mitsuru Funakoshi, Kazuhiro Kurita, Yuto Nakashima, Kazuhisa Seto, and Takeaki Uno

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

LZ-End is a variant of the well-known Lempel-Ziv parsing family such that each phrase of the parsing has a previous occurrence, with the additional constraint that the previous occurrence must end at the end of a previous phrase. LZ-End was initially proposed as a greedy parsing, where each phrase is determined greedily from left to right, as the longest factor that satisfies the above constraint [Kreft & Navarro, 2010]. In this work, we consider an optimal LZ-End parsing that has the minimum number of phrases in such parsings. We show that a decision version of computing the optimal LZ-End parsing is NP-complete by showing a reduction from the vertex cover problem. Moreover, we give a MAX-SAT formulation for the optimal LZ-End parsing adapting an approach for computing various NP-hard repetitiveness measures recently presented by [Bannai et al., 2022]. We also consider the approximation ratio of the size of greedy LZ-End parsing to the size of the optimal LZ-End parsing, and give a lower bound of the ratio which asymptotically approaches 2.

Cite as

Hideo Bannai, Mitsuru Funakoshi, Kazuhiro Kurita, Yuto Nakashima, Kazuhisa Seto, and Takeaki Uno. Optimal LZ-End Parsing Is Hard. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 3:1-3:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2023.3,
  author =	{Bannai, Hideo and Funakoshi, Mitsuru and Kurita, Kazuhiro and Nakashima, Yuto and Seto, Kazuhisa and Uno, Takeaki},
  title =	{{Optimal LZ-End Parsing Is Hard}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{3:1--3:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.3},
  URN =		{urn:nbn:de:0030-drops-179571},
  doi =		{10.4230/LIPIcs.CPM.2023.3},
  annote =	{Keywords: Data Compression, LZ-End, Repetitiveness measures}
}

Document

DOI: 10.4230/LIPIcs.ESA.2022.12

Computing NP-Hard Repetitiveness Measures via MAX-SAT

Authors: Hideo Bannai, Keisuke Goto, Masakazu Ishihata, Shunsuke Kanda, Dominik Köppl, and Takaaki Nishimoto

Published in: LIPIcs, Volume 244, 30th Annual European Symposium on Algorithms (ESA 2022)

Abstract

Repetitiveness measures reveal profound characteristics of datasets, and give rise to compressed data structures and algorithms working in compressed space. Alas, the computation of some of these measures is NP-hard, and straight-forward computation is infeasible for datasets of even small sizes. Three such measures are the smallest size of a string attractor, the smallest size of a bidirectional macro scheme, and the smallest size of a straight-line program. While a vast variety of implementations for heuristically computing approximations exist, exact computation of these measures has received little to no attention. In this paper, we present MAX-SAT formulations that provide the first non-trivial implementations for exact computation of smallest string attractors, smallest bidirectional macro schemes, and smallest straight-line programs. Computational experiments show that our implementations work for texts of length up to a few hundred for straight-line programs and bidirectional macro schemes, and texts even over a million for string attractors.

Cite as

Hideo Bannai, Keisuke Goto, Masakazu Ishihata, Shunsuke Kanda, Dominik Köppl, and Takaaki Nishimoto. Computing NP-Hard Repetitiveness Measures via MAX-SAT. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 12:1-12:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.ESA.2022.12,
  author =	{Bannai, Hideo and Goto, Keisuke and Ishihata, Masakazu and Kanda, Shunsuke and K\"{o}ppl, Dominik and Nishimoto, Takaaki},
  title =	{{Computing NP-Hard Repetitiveness Measures via MAX-SAT}},
  booktitle =	{30th Annual European Symposium on Algorithms (ESA 2022)},
  pages =	{12:1--12:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-247-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{244},
  editor =	{Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2022.12},
  URN =		{urn:nbn:de:0030-drops-169505},
  doi =		{10.4230/LIPIcs.ESA.2022.12},
  annote =	{Keywords: repetitiveness measures, string attractor, bidirectional macro scheme}
}

Document

Complete Volume

DOI: 10.4230/LIPIcs.CPM.2022

LIPIcs, Volume 223, CPM 2022, Complete Volume

Authors: Hideo Bannai and Jan Holub

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

LIPIcs, Volume 223, CPM 2022, Complete Volume

Cite as

33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 1-470, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@Proceedings{bannai_et_al:LIPIcs.CPM.2022,
  title =	{{LIPIcs, Volume 223, CPM 2022, Complete Volume}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{1--470},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022},
  URN =		{urn:nbn:de:0030-drops-161265},
  doi =		{10.4230/LIPIcs.CPM.2022},
  annote =	{Keywords: LIPIcs, Volume 223, CPM 2022, Complete Volume}
}

Document

Front Matter

DOI: 10.4230/LIPIcs.CPM.2022.0

Front Matter, Table of Contents, Preface, Conference Organization

Authors: Hideo Bannai and Jan Holub

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

Front Matter, Table of Contents, Preface, Conference Organization

Cite as

33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 0:i-0:xviii, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2022.0,
  author =	{Bannai, Hideo and Holub, Jan},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{0:i--0:xviii},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.0},
  URN =		{urn:nbn:de:0030-drops-161271},
  doi =		{10.4230/LIPIcs.CPM.2022.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.CPM.2021.1

Repetitions in Strings: A "Constant" Problem (Invited Talk)

Authors: Hideo Bannai

Published in: LIPIcs, Volume 191, 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)

Abstract

Repeating structures in strings is one of the most fundamental characteristics of strings, and has been an important topic in the field of combinatorics on words and combinatorial pattern matching since their beginnings. In this talk, I will focus on squares and maximal repetitions and review the "runs" theorem [Hideo Bannai et al., 2017] as well as related results (e.g. [Aviezri S. Fraenkel and Jamie Simpson, 1998; Yuta Fujishige et al., 2017; Ryo Sugahara et al., 2019; Philip Bille et al., 2020; Hideo Bannai et al., 2020; Jonas Ellert and Johannes Fischer, 2021]) which address the two main questions: how many of them can be contained in a string of given length, and algorithms for computing them.

Cite as

Hideo Bannai. Repetitions in Strings: A "Constant" Problem (Invited Talk). In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, p. 1:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{bannai:LIPIcs.CPM.2021.1,
  author =	{Bannai, Hideo},
  title =	{{Repetitions in Strings: A "Constant" Problem}},
  booktitle =	{32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)},
  pages =	{1:1--1:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-186-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{191},
  editor =	{Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2021.1},
  URN =		{urn:nbn:de:0030-drops-139523},
  doi =		{10.4230/LIPIcs.CPM.2021.1},
  annote =	{Keywords: Maximal repetitions, Squares, Lyndon words}
}

Document

DOI: 10.4230/LIPIcs.CPM.2021.7

Constructing the Bijective and the Extended Burrows-Wheeler Transform in Linear Time

Authors: Hideo Bannai, Juha Kärkkäinen, Dominik Köppl, and Marcin Piątkowski

Published in: LIPIcs, Volume 191, 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)

Abstract

The Burrows-Wheeler transform (BWT) is a permutation whose applications are prevalent in data compression and text indexing. The bijective BWT (BBWT) is a bijective variant of it. Although it is known that the BWT can be constructed in linear time for integer alphabets by using a linear time suffix array construction algorithm, it was up to now only conjectured that the BBWT can also be constructed in linear time. We confirm this conjecture in the word RAM model by proposing a construction algorithm that is based on SAIS, improving the best known result of O(n lg n / lg lg n) time to linear. Since we can reduce the problem of constructing the extended BWT to constructing the BBWT in linear time, we obtain a linear-time algorithm computing the extended BWT at the same time.

Cite as

Hideo Bannai, Juha Kärkkäinen, Dominik Köppl, and Marcin Piątkowski. Constructing the Bijective and the Extended Burrows-Wheeler Transform in Linear Time. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 7:1-7:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2021.7,
  author =	{Bannai, Hideo and K\"{a}rkk\"{a}inen, Juha and K\"{o}ppl, Dominik and Pi\k{a}tkowski, Marcin},
  title =	{{Constructing the Bijective and the Extended Burrows-Wheeler Transform in Linear Time}},
  booktitle =	{32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)},
  pages =	{7:1--7:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-186-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{191},
  editor =	{Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2021.7},
  URN =		{urn:nbn:de:0030-drops-139588},
  doi =		{10.4230/LIPIcs.CPM.2021.7},
  annote =	{Keywords: Burrows-Wheeler Transform, Lyndon words, Circular Suffix Array, Suffix Array Construction Algorithm}
}

Document

DOI: 10.4230/LIPIcs.CPM.2020.12

Detecting k-(Sub-)Cadences and Equidistant Subsequence Occurrences

Authors: Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, and Ayumi Shinohara

Published in: LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

Abstract

The equidistant subsequence pattern matching problem is considered. Given a pattern string P and a text string T, we say that P is an equidistant subsequence of T if P is a subsequence of the text such that consecutive symbols of P in the occurrence are equally spaced. We can consider the problem of equidistant subsequences as generalizations of (sub-)cadences. We give bit-parallel algorithms that yield o(n²) time algorithms for finding k-(sub-)cadences and equidistant subsequences. Furthermore, O(nlog² n) and O(nlog n) time algorithms, respectively for equidistant and Abelian equidistant matching for the case |P| = 3, are shown. The algorithms make use of a technique that was recently introduced which can efficiently compute convolutions with linear constraints.

Cite as

Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, and Ayumi Shinohara. Detecting k-(Sub-)Cadences and Equidistant Subsequence Occurrences. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 12:1-12:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{funakoshi_et_al:LIPIcs.CPM.2020.12,
  author =	{Funakoshi, Mitsuru and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki and Shinohara, Ayumi},
  title =	{{Detecting k-(Sub-)Cadences and Equidistant Subsequence Occurrences}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{12:1--12:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{G{\o}rtz, Inge Li and Weimann, Oren},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.12},
  URN =		{urn:nbn:de:0030-drops-121375},
  doi =		{10.4230/LIPIcs.CPM.2020.12},
  annote =	{Keywords: string algorithms, pattern matching, bit parallelism, subsequences, cadences}
}

Document

DOI: 10.4230/LIPIcs.CPM.2020.26

DAWGs for Parameterized Matching: Online Construction and Related Indexing Structures

Authors: Katsuhito Nakashima, Noriki Fujisato, Diptarama Hendrian, Yuto Nakashima, Ryo Yoshinaka, Shunsuke Inenaga, Hideo Bannai, Ayumi Shinohara, and Masayuki Takeda

Published in: LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

Abstract

Two strings x and y over Σ ∪ Π of equal length are said to parameterized match (p-match) if there is a renaming bijection f:Σ ∪ Π → Σ ∪ Π that is identity on Σ and transforms x to y (or vice versa). The p-matching problem is to look for substrings in a text that p-match a given pattern. In this paper, we propose parameterized suffix automata (p-suffix automata) and parameterized directed acyclic word graphs (PDAWGs) which are the p-matching versions of suffix automata and DAWGs. While suffix automata and DAWGs are equivalent for standard strings, we show that p-suffix automata can have Θ(n²) nodes and edges but PDAWGs have only O(n) nodes and edges, where n is the length of an input string. We also give O(n |Π| log (|Π| + |Σ|))-time O(n)-space algorithm that builds the PDAWG in a left-to-right online manner. As a byproduct, it is shown that the parameterized suffix tree for the reversed string can also be built in the same time and space, in a right-to-left online manner.

Cite as

Katsuhito Nakashima, Noriki Fujisato, Diptarama Hendrian, Yuto Nakashima, Ryo Yoshinaka, Shunsuke Inenaga, Hideo Bannai, Ayumi Shinohara, and Masayuki Takeda. DAWGs for Parameterized Matching: Online Construction and Related Indexing Structures. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 26:1-26:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{nakashima_et_al:LIPIcs.CPM.2020.26,
  author =	{Nakashima, Katsuhito and Fujisato, Noriki and Hendrian, Diptarama and Nakashima, Yuto and Yoshinaka, Ryo and Inenaga, Shunsuke and Bannai, Hideo and Shinohara, Ayumi and Takeda, Masayuki},
  title =	{{DAWGs for Parameterized Matching: Online Construction and Related Indexing Structures}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{26:1--26:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{G{\o}rtz, Inge Li and Weimann, Oren},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.26},
  URN =		{urn:nbn:de:0030-drops-121512},
  doi =		{10.4230/LIPIcs.CPM.2020.26},
  annote =	{Keywords: parameterized matching, suffix trees, DAWGs, suffix automata}
}

@InProceedings{nakashima_et_al:LIPIcs.CPM.2020.26,
  author =	{Nakashima, Katsuhito and Fujisato, Noriki and Hendrian, Diptarama and Nakashima, Yuto and Yoshinaka, Ryo and Inenaga, Shunsuke and Bannai, Hideo and Shinohara, Ayumi and Takeda, Masayuki},
  title =	{{DAWGs for Parameterized Matching: Online Construction and Related Indexing Structures}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{26:1--26:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{G{\o}rtz, Inge Li and Weimann, Oren},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.26},
  URN =		{urn:nbn:de:0030-drops-121512},
  doi =		{10.4230/LIPIcs.CPM.2020.26},
  annote =	{Keywords: parameterized matching, suffix trees, DAWGs, suffix automata}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2019.40

An Improved Data Structure for Left-Right Maximal Generic Words Problem

Authors: Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 149, 30th International Symposium on Algorithms and Computation (ISAAC 2019)

Abstract

For a set D of documents and a positive integer d, a string w is said to be d-left-right maximal, if (1) w occurs in at least d documents in D, and (2) any proper superstring of w occurs in less than d documents. The left-right-maximal generic words problem is, given a set D of documents, to preprocess D so that for any string p and for any positive integer d, all the superstrings of p that are d-left-right maximal can be answered quickly. In this paper, we present an O(n log m) space data structure (in words) which answers queries in O(|p| + o log log m) time, where n is the total length of documents in D, m is the number of documents in D and o is the number of outputs. Our solution improves the previous one by Nishimoto et al. (PSC 2015), which uses an O(n log n) space data structure answering queries in O(|p|+ r * log n + o * log^2 n) time, where r is the number of right-extensions q of p occurring in at least d documents such that any proper right extension of q occurs in less than d documents.

Cite as

Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. An Improved Data Structure for Left-Right Maximal Generic Words Problem. In 30th International Symposium on Algorithms and Computation (ISAAC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 149, pp. 40:1-40:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{fujishige_et_al:LIPIcs.ISAAC.2019.40,
  author =	{Fujishige, Yuta and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{An Improved Data Structure for Left-Right Maximal Generic Words Problem}},
  booktitle =	{30th International Symposium on Algorithms and Computation (ISAAC 2019)},
  pages =	{40:1--40:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-130-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{149},
  editor =	{Lu, Pinyan and Zhang, Guochuan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2019.40},
  URN =		{urn:nbn:de:0030-drops-115366},
  doi =		{10.4230/LIPIcs.ISAAC.2019.40},
  annote =	{Keywords: generic words, suffix trees, string processing algorithms}
}

Document

DOI: 10.4230/LIPIcs.WABI.2019.8

Finding All Maximal Perfect Haplotype Blocks in Linear Time

Authors: Jarno Alanko, Hideo Bannai, Bastien Cazaux, Pierre Peterlongo, and Jens Stoye

Published in: LIPIcs, Volume 143, 19th International Workshop on Algorithms in Bioinformatics (WABI 2019)

Abstract

Recent large-scale community sequencing efforts allow at an unprecedented level of detail the identification of genomic regions that show signatures of natural selection. Traditional methods for identifying such regions from individuals' haplotype data, however, require excessive computing times and therefore are not applicable to current datasets. In 2019, Cunha et al. (Proceedings of BSB 2019) suggested the maximal perfect haplotype block as a very simple combinatorial pattern, forming the basis of a new method to perform rapid genome-wide selection scans. The algorithm they presented for identifying these blocks, however, had a worst-case running time quadratic in the genome length. It was posed as an open problem whether an optimal, linear-time algorithm exists. In this paper we give two algorithms that achieve this time bound, one conceptually very simple one using suffix trees and a second one using the positional Burrows-Wheeler Transform, that is very efficient also in practice.

Cite as

Jarno Alanko, Hideo Bannai, Bastien Cazaux, Pierre Peterlongo, and Jens Stoye. Finding All Maximal Perfect Haplotype Blocks in Linear Time. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 8:1-8:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{alanko_et_al:LIPIcs.WABI.2019.8,
  author =	{Alanko, Jarno and Bannai, Hideo and Cazaux, Bastien and Peterlongo, Pierre and Stoye, Jens},
  title =	{{Finding All Maximal Perfect Haplotype Blocks in Linear Time}},
  booktitle =	{19th International Workshop on Algorithms in Bioinformatics (WABI 2019)},
  pages =	{8:1--8:9},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-123-8},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{143},
  editor =	{Huber, Katharina T. and Gusfield, Dan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2019.8},
  URN =		{urn:nbn:de:0030-drops-110388},
  doi =		{10.4230/LIPIcs.WABI.2019.8},
  annote =	{Keywords: Population genomics, selection coefficient, haplotype block, positional Burrows-Wheeler Transform}
}

Document

DOI: 10.4230/LIPIcs.CPM.2019.17

Indexing the Bijective BWT

Authors: Hideo Bannai, Juha Kärkkäinen, Dominik Köppl, and Marcin Pia̧tkowski

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)

Abstract

The Burrows-Wheeler transform (BWT) is a permutation whose applications are prevalent in data compression and text indexing. The bijective BWT is a bijective variant of it that has not yet been studied for text indexing applications. We fill this gap by proposing a self-index built on the bijective BWT . The self-index applies the backward search technique of the FM-index to find a pattern P with O(|P| lg|P|) backward search steps.

Cite as

Hideo Bannai, Juha Kärkkäinen, Dominik Köppl, and Marcin Pia̧tkowski. Indexing the Bijective BWT. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 17:1-17:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2019.17,
  author =	{Bannai, Hideo and K\"{a}rkk\"{a}inen, Juha and K\"{o}ppl, Dominik and Pia̧tkowski, Marcin},
  title =	{{Indexing the Bijective BWT}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{17:1--17:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.17},
  URN =		{urn:nbn:de:0030-drops-104887},
  doi =		{10.4230/LIPIcs.CPM.2019.17},
  annote =	{Keywords: Burrows-Wheeler Transform, Lyndon words, Text Indexing}
}

Document

DOI: 10.4230/LIPIcs.CPM.2019.23

Computing Runs on a Trie

Authors: Ryo Sugahara, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)

Abstract

A maximal repetition, or run, in a string, is a maximal periodic substring whose smallest period is at most half the length of the substring. In this paper, we consider runs that correspond to a path on a trie, or in other words, on a rooted edge-labeled tree where the endpoints of the path must be a descendant/ancestor of the other. For a trie with n edges, we show that the number of runs is less than n. We also show an O(n sqrt{log n}log log n) time and O(n) space algorithm for counting and finding the shallower endpoint of all runs. We further show an O(n log n) time and O(n) space algorithm for finding both endpoints of all runs. We also discuss how to improve the running time even more.

Cite as

Ryo Sugahara, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Computing Runs on a Trie. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 23:1-23:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{sugahara_et_al:LIPIcs.CPM.2019.23,
  author =	{Sugahara, Ryo and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Computing Runs on a Trie}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{23:1--23:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.23},
  URN =		{urn:nbn:de:0030-drops-104943},
  doi =		{10.4230/LIPIcs.CPM.2019.23},
  annote =	{Keywords: runs, Lyndon words}
}

Document

DOI: 10.4230/LIPIcs.CPM.2019.27

Faster Queries for Longest Substring Palindrome After Block Edit

Authors: Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)

Abstract

Palindromes are important objects in strings which have been extensively studied from combinatorial, algorithmic, and bioinformatics points of views. Manacher [J. ACM 1975] proposed a seminal algorithm that computes the longest substring palindromes (LSPals) of a given string in O(n) time, where n is the length of the string. In this paper, we consider the problem of finding the LSPal after the string is edited. We present an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(l + log log n) time, after a substring in T is replaced by a string of arbitrary length l. This outperforms the query algorithm proposed in our previous work [CPM 2018] that uses O(l + log n) time for each query.

Cite as

Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster Queries for Longest Substring Palindrome After Block Edit. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 27:1-27:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{funakoshi_et_al:LIPIcs.CPM.2019.27,
  author =	{Funakoshi, Mitsuru and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Faster Queries for Longest Substring Palindrome After Block Edit}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{27:1--27:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.27},
  URN =		{urn:nbn:de:0030-drops-104989},
  doi =		{10.4230/LIPIcs.CPM.2019.27},
  annote =	{Keywords: palindromes, string algorithm, periodicity}
}

Document

DOI: 10.4230/LIPIcs.CPM.2019.29

On the Size of Overlapping Lempel-Ziv and Lyndon Factorizations

Authors: Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)

Abstract

Lempel-Ziv (LZ) factorization and Lyndon factorization are well-known factorizations of strings. Recently, Kärkkäinen et al. studied the relation between the sizes of the two factorizations, and showed that the size of the Lyndon factorization is always smaller than twice the size of the non-overlapping LZ factorization [STACS 2017]. In this paper, we consider a similar problem for the overlapping version of the LZ factorization. Since the size of the overlapping LZ factorization is always smaller than the size of the non-overlapping LZ factorization and, in fact, can even be an O(log n) factor smaller, it is not immediately clear whether a similar bound as in previous work would hold. Nevertheless, in this paper, we prove that the size of the Lyndon factorization is always smaller than four times the size of the overlapping LZ factorization.

Cite as

Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. On the Size of Overlapping Lempel-Ziv and Lyndon Factorizations. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 29:1-29:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{urabe_et_al:LIPIcs.CPM.2019.29,
  author =	{Urabe, Yuki and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{On the Size of Overlapping Lempel-Ziv and Lyndon Factorizations}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{29:1--29:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.29},
  URN =		{urn:nbn:de:0030-drops-105008},
  doi =		{10.4230/LIPIcs.CPM.2019.29},
  annote =	{Keywords: Lyndon factorization, Lyndon words, Lempel-Ziv factorization}
}

Document

DOI: 10.4230/LIPIcs.CPM.2018.2

Order-Preserving Pattern Matching Indeterminate Strings

Authors: Rui Henriques, Alexandre P. Francisco, Luís M. S. Russo, and Hideo Bannai

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

Abstract

Given an indeterminate string pattern p and an indeterminate string text t, the problem of order-preserving pattern matching with character uncertainties (muOPPM) is to find all substrings of t that satisfy one of the possible orderings defined by p. When the text and pattern are determinate strings, we are in the presence of the well-studied exact order-preserving pattern matching (OPPM) problem with diverse applications on time series analysis. Despite its relevance, the exact OPPM problem suffers from two major drawbacks: 1) the inability to deal with indetermination in the text, thus preventing the analysis of noisy time series; and 2) the inability to deal with indetermination in the pattern, thus imposing the strict satisfaction of the orders among all pattern positions. In this paper, we provide the first polynomial algorithms to answer the muOPPM problem when: 1) indetermination is observed on the pattern or text; and 2) indetermination is observed on both the pattern and the text and given by uncertainties between pairs of characters. First, given two strings with the same length m and O(r) uncertain characters per string position, we show that the muOPPM problem can be solved in O(mr lg r) time when one string is indeterminate and r in N^+ and in O(m^2) time when both strings are indeterminate and r=2. Second, given an indeterminate text string of length n, we show that muOPPM can be efficiently solved in polynomial time and linear space.

Cite as

Rui Henriques, Alexandre P. Francisco, Luís M. S. Russo, and Hideo Bannai. Order-Preserving Pattern Matching Indeterminate Strings. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 2:1-2:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{henriques_et_al:LIPIcs.CPM.2018.2,
  author =	{Henriques, Rui and Francisco, Alexandre P. and Russo, Lu{\'\i}s M. S. and Bannai, Hideo},
  title =	{{Order-Preserving Pattern Matching Indeterminate Strings}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{2:1--2:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.2},
  URN =		{urn:nbn:de:0030-drops-87087},
  doi =		{10.4230/LIPIcs.CPM.2018.2},
  annote =	{Keywords: Order-preserving pattern matching, Indeterminate string analysis, Generic pattern matching, Satisfiability}
}

Document

DOI: 10.4230/LIPIcs.CPM.2018.7

Online LZ77 Parsing and Matching Statistics with RLBWTs

Authors: Hideo Bannai, Travis Gagie, and Tomohiro I

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

Abstract

Lempel-Ziv 1977 (LZ77) parsing, matching statistics and the Burrows-Wheeler Transform (BWT) are all fundamental elements of stringology. In a series of recent papers, Policriti and Prezza (DCC 2016 and Algorithmica, CPM 2017) showed how we can use an augmented run-length compressed BWT (RLBWT) of the reverse T^R of a text T, to compute offline the LZ77 parse of T in O(n log r) time and O(r) space, where n is the length of T and r is the number of runs in the BWT of T^R. In this paper we first extend a well-known technique for updating an unaugmented RLBWT when a character is prepended to a text, to work with Policriti and Prezza's augmented RLBWT. This immediately implies that we can build online the LZ77 parse of T while still using O(n log r) time and O(r) space; it also seems likely to be of independent interest. Our experiments, using an extension of Ohno, Takabatake, I and Sakamoto's (IWOCA 2017) implementation of updating, show our approach is both time- and space-efficient for repetitive strings. We then show how to augment the RLBWT further - albeit making it static again and increasing its space by a factor proportional to the size of the alphabet - such that later, given another string S and O(log log n)-time random access to T, we can compute the matching statistics of S with respect to T in O(|S| log log n) time.

Cite as

Hideo Bannai, Travis Gagie, and Tomohiro I. Online LZ77 Parsing and Matching Statistics with RLBWTs. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 7:1-7:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2018.7,
  author =	{Bannai, Hideo and Gagie, Travis and I, Tomohiro},
  title =	{{Online LZ77 Parsing and Matching Statistics with RLBWTs}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{7:1--7:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.7},
  URN =		{urn:nbn:de:0030-drops-87025},
  doi =		{10.4230/LIPIcs.CPM.2018.7},
  annote =	{Keywords: Lempel-Ziv 1977, Matching Statistics, Run-Length Compressed Burrows-Wheeler Transform}
}

Document

DOI: 10.4230/LIPIcs.CPM.2018.9

Faster Online Elastic Degenerate String Matching

Authors: Kotaro Aoyama, Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

Abstract

An Elastic-Degenerate String [Iliopoulus et al., LATA 2017] is a sequence of sets of strings, which was recently proposed as a way to model a set of similar sequences. We give an online algorithm for the Elastic-Degenerate String Matching (EDSM) problem that runs in O(nm sqrt{m log m} + N) time and O(m) working space, where n is the number of elastic degenerate segments of the text, N is the total length of all strings in the text, and m is the length of the pattern. This improves the previous algorithm by Grossi et al. [CPM 2017] that runs in O(nm^2 + N) time.

Cite as

Kotaro Aoyama, Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster Online Elastic Degenerate String Matching. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 9:1-9:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{aoyama_et_al:LIPIcs.CPM.2018.9,
  author =	{Aoyama, Kotaro and Nakashima, Yuto and I, Tomohiro and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Faster Online Elastic Degenerate String Matching}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{9:1--9:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.9},
  URN =		{urn:nbn:de:0030-drops-87016},
  doi =		{10.4230/LIPIcs.CPM.2018.9},
  annote =	{Keywords: elastic degenerate pattern matching, boolean convolution}
}

Document

DOI: 10.4230/LIPIcs.CPM.2018.12

Longest substring palindrome after edit

Authors: Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

Abstract

It is known that the length of the longest substring palindromes (LSPals) of a given string T of length n can be computed in O(n) time by Manacher's algorithm [J. ACM '75]. In this paper, we consider the problem of finding the LSPal after the string is edited. We present an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(log (min {sigma, log n })) time after single character substitution, insertion, or deletion, where sigma denotes the number of distinct characters appearing in T. We also propose an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(l + log n) time, after an existing substring in T is replaced by a string of arbitrary length l.

Cite as

Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Longest substring palindrome after edit. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 12:1-12:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{funakoshi_et_al:LIPIcs.CPM.2018.12,
  author =	{Funakoshi, Mitsuru and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Longest substring palindrome after edit}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{12:1--12:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.12},
  URN =		{urn:nbn:de:0030-drops-86977},
  doi =		{10.4230/LIPIcs.CPM.2018.12},
  annote =	{Keywords: maximal palindromes, edit operations, periodicity, suffix trees}
}

Document

DOI: 10.4230/LIPIcs.CPM.2018.15

Computing longest common square subsequences

Authors: Takafumi Inoue, Shunsuke Inenaga, Heikki Hyyrö, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

Abstract

A square is a non-empty string of form YY. The longest common square subsequence (LCSqS) problem is to compute a longest square occurring as a subsequence in two given strings A and B. We show that the problem can easily be solved in O(n^6) time or O(|M|n^4) time with O(n^4) space, where n is the length of the strings and M is the set of matching points between A and B. Then, we show that the problem can also be solved in O(sigma |M|^3 + n) time and O(|M|^2 + n) space, or in O(|M|^3 log^2 n log log n + n) time with O(|M|^3 + n) space, where sigma is the number of distinct characters occurring in A and B. We also study lower bounds for the LCSqS problem for two or more strings.

Cite as

Takafumi Inoue, Shunsuke Inenaga, Heikki Hyyrö, Hideo Bannai, and Masayuki Takeda. Computing longest common square subsequences. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 15:1-15:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{inoue_et_al:LIPIcs.CPM.2018.15,
  author =	{Inoue, Takafumi and Inenaga, Shunsuke and Hyyr\"{o}, Heikki and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Computing longest common square subsequences}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{15:1--15:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.15},
  URN =		{urn:nbn:de:0030-drops-86946},
  doi =		{10.4230/LIPIcs.CPM.2018.15},
  annote =	{Keywords: squares, subsequences, matching rectangles, dynamic programming}
}

Document

DOI: 10.4230/LIPIcs.CPM.2018.19

Longest Lyndon Substring After Edit

Authors: Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

Abstract

The longest Lyndon substring of a string T is the longest substring of T which is a Lyndon word. LLS(T) denotes the length of the longest Lyndon substring of a string T. In this paper, we consider computing LLS(T') where T' is an edited string formed from T. After O(n) time and space preprocessing, our algorithm returns LLS(T') in O(log n) time for any single character edit. We also consider a version of the problem with block edits, i.e., a substring of T is replaced by a given string of length l. After O(n) time and space preprocessing, our algorithm returns LLS(T') in O(l log sigma + log n) time for any block edit where sigma is the number of distinct characters in T. We can modify our algorithm so as to output all the longest Lyndon substrings of T' for both problems.

Cite as

Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Longest Lyndon Substring After Edit. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 19:1-19:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{urabe_et_al:LIPIcs.CPM.2018.19,
  author =	{Urabe, Yuki and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Longest Lyndon Substring After Edit}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{19:1--19:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.19},
  URN =		{urn:nbn:de:0030-drops-86913},
  doi =		{10.4230/LIPIcs.CPM.2018.19},
  annote =	{Keywords: Lyndon word, Lyndon factorization, Lyndon tree, Edit operation}
}

Document

DOI: 10.4230/LIPIcs.CPM.2018.24

Lyndon Factorization of Grammar Compressed Texts Revisited

Authors: Isamu Furuya, Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

Abstract

We revisit the problem of computing the Lyndon factorization of a string w of length N which is given as a straight line program (SLP) of size n. For this problem, we show a new algorithm which runs in O(P(n, N) + Q(n, N)n log log N) time and O(n log N + S(n, N)) space where P(n, N), S(n,N), Q(n,N) are respectively the pre-processing time, space, and query time of a data structure for longest common extensions (LCE) on SLPs. Our algorithm improves the algorithm proposed by I et al. (TCS '17), and can be more efficient than the O(N)-time solution by Duval (J. Algorithms '83) when w is highly compressible.

Cite as

Isamu Furuya, Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Lyndon Factorization of Grammar Compressed Texts Revisited. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 24:1-24:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{furuya_et_al:LIPIcs.CPM.2018.24,
  author =	{Furuya, Isamu and Nakashima, Yuto and I, Tomohiro and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Lyndon Factorization of Grammar Compressed Texts Revisited}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{24:1--24:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.24},
  URN =		{urn:nbn:de:0030-drops-86855},
  doi =		{10.4230/LIPIcs.CPM.2018.24},
  annote =	{Keywords: Lyndon word, Lyndon factorization, Straight line program}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2017.33

Almost Linear Time Computation of Maximal Repetitions in Run Length Encoded Strings

Authors: Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 92, 28th International Symposium on Algorithms and Computation (ISAAC 2017)

Abstract

We consider the problem of computing all maximal repetitions contained in a string that is given in run-length encoding. Given a run-length encoding of a string, we show that the maximum number of maximal repetitions contained in the string is at most m+k-1, where m is the size of the run-length encoding, and k is the number of run-length factors whose exponent is at least 2. We also show an algorithm for computing all maximal repetitions in O(m \alpha(m)) time and O(m) space, where \alpha denotes the inverse Ackermann function.

Cite as

Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Almost Linear Time Computation of Maximal Repetitions in Run Length Encoded Strings. In 28th International Symposium on Algorithms and Computation (ISAAC 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 92, pp. 33:1-33:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{fujishige_et_al:LIPIcs.ISAAC.2017.33,
  author =	{Fujishige, Yuta and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Almost Linear Time Computation of Maximal Repetitions in Run Length Encoded Strings}},
  booktitle =	{28th International Symposium on Algorithms and Computation (ISAAC 2017)},
  pages =	{33:1--33:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-054-5},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{92},
  editor =	{Okamoto, Yoshio and Tokuyama, Takeshi},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2017.33},
  URN =		{urn:nbn:de:0030-drops-82610},
  doi =		{10.4230/LIPIcs.ISAAC.2017.33},
  annote =	{Keywords: maximal repetitions,run length encoding}
}

Document

DOI: 10.4230/LIPIcs.MFCS.2017.10

Small-Space LCE Data Structure with Constant-Time Queries

Authors: Yuka Tanimura, Takaaki Nishimoto, Hideo Bannai, Shunsuke Inenaga, and Masayuki Takeda

Published in: LIPIcs, Volume 83, 42nd International Symposium on Mathematical Foundations of Computer Science (MFCS 2017)

Abstract

The longest common extension (LCE) problem is to preprocess a given string w of length n so that the length of the longest common prefix between suffixes of w that start at any two given positions is answered quickly. In this paper, we present a data structure of O(z \tau^2 + \frac{n}{\tau}) words of space which answers LCE queries in O(1) time and can be built in O(n \log \sigma) time, where 1 \leq \tau \leq \sqrt{n} is a parameter, z is the size of the Lempel-Ziv 77 factorization of w and \sigma is the alphabet size. The proposed LCE data structure not access the input string w when answering queries, and thus w can be deleted after preprocessing. On top of this main result, we obtain further results using (variants of) our LCE data structure, which include the following: - For highly repetitive strings where the z\tau^2 term is dominated by \frac{n}{\tau}, we obtain a constant-time and sub-linear space LCE query data structure. - Even when the input string is not well compressible via Lempel-Ziv 77 factorization, we still can obtain a constant-time and sub-linear space LCE data structure for suitable \tau and for \sigma \leq 2^{o(\log n)}. - The time-space trade-off lower bounds for the LCE problem by Bille et al. [J. Discrete Algorithms, 25:42-50, 2014] and by Kosolobov [CoRR, abs/1611.02891, 2016] do not apply in some cases with our LCE data structure.

Cite as

Yuka Tanimura, Takaaki Nishimoto, Hideo Bannai, Shunsuke Inenaga, and Masayuki Takeda. Small-Space LCE Data Structure with Constant-Time Queries. In 42nd International Symposium on Mathematical Foundations of Computer Science (MFCS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 83, pp. 10:1-10:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{tanimura_et_al:LIPIcs.MFCS.2017.10,
  author =	{Tanimura, Yuka and Nishimoto, Takaaki and Bannai, Hideo and Inenaga, Shunsuke and Takeda, Masayuki},
  title =	{{Small-Space LCE Data Structure with Constant-Time Queries}},
  booktitle =	{42nd International Symposium on Mathematical Foundations of Computer Science (MFCS 2017)},
  pages =	{10:1--10:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-046-0},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{83},
  editor =	{Larsen, Kim G. and Bodlaender, Hans L. and Raskin, Jean-Francois},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2017.10},
  URN =		{urn:nbn:de:0030-drops-81021},
  doi =		{10.4230/LIPIcs.MFCS.2017.10},
  annote =	{Keywords: longest common extension, truncated suffix trees, t-covers}
}

Document

DOI: 10.4230/LIPIcs.CPM.2017.20

Faster STR-IC-LCS Computation via RLE

Authors: Keita Kuboi, Yuta Fujishige, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Abstract

The constrained LCS problem asks one to find a longest common subsequence of two input strings A and B with some constraints. The STR-IC-LCS problem is a variant of the constrained LCS problem, where the solution must include a given constraint string C as a substring. Given two strings A and B of respective lengths M and N, and a constraint string C of length at most min{M, N}, the best known algorithm for the STR-IC-LCS problem, proposed by Deorowicz (Inf. Process. Lett., 11:423-426, 2012), runs in O(MN) time. In this work, we present an O(mN + nM)-time solution to the STR-IC-LCS problem, where m and n denote the sizes of the run-length encodings of A and B, respectively. Since m <= M and n <= N always hold, our algorithm is always as fast as Deorowicz's algorithm, and is faster when input strings are compressible via RLE.

Cite as

Keita Kuboi, Yuta Fujishige, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster STR-IC-LCS Computation via RLE. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 20:1-20:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{kuboi_et_al:LIPIcs.CPM.2017.20,
  author =	{Kuboi, Keita and Fujishige, Yuta and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Faster STR-IC-LCS Computation via RLE}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{20:1--20:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.20},
  URN =		{urn:nbn:de:0030-drops-73335},
  doi =		{10.4230/LIPIcs.CPM.2017.20},
  annote =	{Keywords: longest common subsequence, STR-IC-LCS, run-length encoding}
}

Document

DOI: 10.4230/LIPIcs.CPM.2017.22

Computing All Distinct Squares in Linear Time for Integer Alphabets

Authors: Hideo Bannai, Shunsuke Inenaga, and Dominik Köppl

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Abstract

Given a string on an integer alphabet, we present an algorithm that computes the set of all distinct squares belonging to this string in time linear to the string length. As an application, we show how to compute the tree topology of the minimal augmented suffix tree in linear time. Asides from that, we elaborate an algorithm computing the longest previous table in a succinct representation using compressed working space.

Cite as

Hideo Bannai, Shunsuke Inenaga, and Dominik Köppl. Computing All Distinct Squares in Linear Time for Integer Alphabets. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 22:1-22:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2017.22,
  author =	{Bannai, Hideo and Inenaga, Shunsuke and K\"{o}ppl, Dominik},
  title =	{{Computing All Distinct Squares in Linear Time for Integer Alphabets}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{22:1--22:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.22},
  URN =		{urn:nbn:de:0030-drops-73224},
  doi =		{10.4230/LIPIcs.CPM.2017.22},
  annote =	{Keywords: tandem repeats, distinct squares, counting algorithms}
}

Document

DOI: 10.4230/LIPIcs.CPM.2017.24

Tight Bounds on the Maximum Number of Shortest Unique Substrings

Authors: Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Abstract

A substring Q of a string S is called a shortest unique substring (SUS) for interval [s,t] in S, if Q occurs exactly once in S, this occurrence of Q contains interval [s,t], and every substring of S which contains interval [s,t] and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query interval [s,t] all the SUSs for interval [s,t] can be answered quickly. When s = t, we call the SUSs for [s, t] as point SUSs, and when s <= t, we call the SUSs for [s, t] as interval SUSs. There exist optimal O(n)-time preprocessing scheme which answers queries in optimal O(k) time for both point and interval SUSs, where n is the length of S and k is the number of outputs for a given query. In this paper, we reveal structural, combinatorial properties underlying the SUS problem: Namely, we show that the number of intervals in S that correspond to point SUSs for all query positions in S is less than 1.5n, and show that this is a matching upper and lower bound. Also, we consider the maximum number of intervals in S that correspond to interval SUSs for all query intervals in S.

Cite as

Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Tight Bounds on the Maximum Number of Shortest Unique Substrings. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 24:1-24:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{mieno_et_al:LIPIcs.CPM.2017.24,
  author =	{Mieno, Takuya and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Tight Bounds on the Maximum Number of Shortest Unique Substrings}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{24:1--24:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.24},
  URN =		{urn:nbn:de:0030-drops-73460},
  doi =		{10.4230/LIPIcs.CPM.2017.24},
  annote =	{Keywords: shortest unique substrings, maximal unique substrings}
}

Document

DOI: 10.4230/LIPIcs.MFCS.2016.38

Computing DAWGs and Minimal Absent Words in Linear Time for Integer Alphabets

Authors: Yuta Fujishige, Yuki Tsujimaru, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 58, 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016)

Abstract

The directed acyclic word graph (DAWG) of a string y is the smallest (partial) DFA which recognizes all suffixes of y and has only O(n) nodes and edges. We present the first O(n)-time algorithm for computing the DAWG of a given string y of length n over an integer alphabet of polynomial size in n. We also show that a straightforward modification to our DAWG construction algorithm leads to the first O(n)-time algorithm for constructing the affix tree of a given string y over an integer alphabet. Affix trees are a text indexing structure supporting bidirectional pattern searches. As an application to our O(n)-time DAWG construction algorithm, we show that the set MAW(y) of all minimal absent words of y can be computed in optimal O(n + |MAW(y)|) time and O(n) working space for integer alphabets.

Cite as

Yuta Fujishige, Yuki Tsujimaru, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Computing DAWGs and Minimal Absent Words in Linear Time for Integer Alphabets. In 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 58, pp. 38:1-38:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{fujishige_et_al:LIPIcs.MFCS.2016.38,
  author =	{Fujishige, Yuta and Tsujimaru, Yuki and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Computing DAWGs and Minimal Absent Words in Linear Time for Integer Alphabets}},
  booktitle =	{41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016)},
  pages =	{38:1--38:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-016-3},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{58},
  editor =	{Faliszewski, Piotr and Muscholl, Anca and Niedermeier, Rolf},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2016.38},
  URN =		{urn:nbn:de:0030-drops-64528},
  doi =		{10.4230/LIPIcs.MFCS.2016.38},
  annote =	{Keywords: string algorithms, DAWGs, suffix trees, minimal absent words}
}

Document

DOI: 10.4230/LIPIcs.MFCS.2016.69

Shortest Unique Substring Queries on Run-Length Encoded Strings

Authors: Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 58, 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016)

Abstract

We consider the problem of answering shortest unique substring (SUS) queries on run-length encoded strings. For a string S, a unique substring u = S[i..j] is said to be a shortest unique substring (SUS) of S containing an interval [s, t] (i <= s <= t <= j) if for any i' <= s <= t <= j' with j-i > j'-i', S[i'..j'] occurs at least twice in S. Given a run-length encoding of size m of a string of length N, we show that we can construct a data structure of size O(m+pi_s(N, m)) in O(m log m + pi_c(N, m)) time such that queries can be answered in O(pi_q(N, m) + k) time, where k is the size of the output (the number of SUSs), and pi_s(N,m), pi_c(N,m), pi_q(N,m) are, respectively, the size, construction time, and query time for a predecessor/successor query data structure of m elements for the universe of [1,N]. Using the data structure by Beam and Fich (JCSS 2002), this results in a data structure of O(m) space that is constructed in O(m log m) time, and answers queries in O(sqrt(log m/loglog m)+k) time.

Cite as

Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Shortest Unique Substring Queries on Run-Length Encoded Strings. In 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 58, pp. 69:1-69:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{mieno_et_al:LIPIcs.MFCS.2016.69,
  author =	{Mieno, Takuya and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Shortest Unique Substring Queries on Run-Length Encoded Strings}},
  booktitle =	{41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016)},
  pages =	{69:1--69:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-016-3},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{58},
  editor =	{Faliszewski, Piotr and Muscholl, Anca and Niedermeier, Rolf},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2016.69},
  URN =		{urn:nbn:de:0030-drops-65033},
  doi =		{10.4230/LIPIcs.MFCS.2016.69},
  annote =	{Keywords: string algorithms, shortest unique substring, run-length encoding}
}

Document

DOI: 10.4230/LIPIcs.MFCS.2016.72

Fully Dynamic Data Structure for LCE Queries in Compressed Space

Authors: Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 58, 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016)

Abstract

A Longest Common Extension (LCE) query on a text T of length N asks for the length of the longest common prefix of suffixes starting at given two positions. We show that the signature encoding G of size w = O(min(z log N log^* M, N)) [Mehlhorn et al., Algorithmica 17(2):183-198, 1997] of T, which can be seen as a compressed representation of T, has a capability to support LCE queries in O(log N + log ell log^* M) time, where ell is the answer to the query, z is the size of the Lempel-Ziv77 (LZ77) factorization of T, and M >= 4N is an integer that can be handled in constant time under word RAM model. In compressed space, this is the fastest deterministic LCE data structure in many cases. Moreover, G can be enhanced to support efficient update operations: After processing G in O(w f_A) time, we can insert/delete any (sub)string of length y into/from an arbitrary position of T in O((y + log Nlog^* M) f_A) time, where f_A = O(min{ (loglog M loglog w)/(logloglog M), sqrt(log w/loglog w)}). This yields the first fully dynamic LCE data structure working in compressed space. We also present efficient construction algorithms from various types of inputs: We can construct G in O(N f_A) time from uncompressed string T; in O(n loglog (n log^* M) log N log^* M) time from grammar-compressed string T represented by a straight-line program of size n; and in O(z f_A log N log^* M) time from LZ77-compressed string T with z factors. On top of the above contributions, we show several applications of our data structures which improve previous best known results on grammar-compressed string processing.

Cite as

Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Fully Dynamic Data Structure for LCE Queries in Compressed Space. In 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 58, pp. 72:1-72:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{nishimoto_et_al:LIPIcs.MFCS.2016.72,
  author =	{Nishimoto, Takaaki and I, Tomohiro and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Fully Dynamic Data Structure for LCE Queries in Compressed Space}},
  booktitle =	{41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016)},
  pages =	{72:1--72:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-016-3},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{58},
  editor =	{Faliszewski, Piotr and Muscholl, Anca and Niedermeier, Rolf},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2016.72},
  URN =		{urn:nbn:de:0030-drops-65045},
  doi =		{10.4230/LIPIcs.MFCS.2016.72},
  annote =	{Keywords: dynamic texts, longest common extension (LCE) queries, straight-line program}
}

Document

DOI: 10.4230/LIPIcs.CPM.2016.1

Deterministic Sub-Linear Space LCE Data Structures With Efficient Construction

Authors: Yuka Tanimura, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, Simon J. Puglisi, and Masayuki Takeda

Published in: LIPIcs, Volume 54, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)

Abstract

Given a string S of n symbols, a longest common extension query LCE(i,j) asks for the length of the longest common prefix of the $i$th and $j$th suffixes of S. LCE queries have several important applications in string processing, perhaps most notably to suffix sorting. Recently, Bille et al. (J. Discrete Algorithms 25:42-50, 2014, Proc. CPM 2015:65-76) described several data structures for answering LCE queries that offers a space-time trade-off between data structure size and query time. In particular, for a parameter 1 <= tau <= n, their best deterministic solution is a data structure of size O(n/tau) which allows LCE queries to be answered in O(tau) time. However, the construction time for all deterministic versions of their data structure is quadratic in n. In this paper, we propose a deterministic solution that achieves a similar space-time trade-off of O(tau * min(log(tau),log(n/tau)) query time using O(n/tau) space, but significantly improve the construction time to O(n*tau).

Cite as

Yuka Tanimura, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, Simon J. Puglisi, and Masayuki Takeda. Deterministic Sub-Linear Space LCE Data Structures With Efficient Construction. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 1:1-1:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{tanimura_et_al:LIPIcs.CPM.2016.1,
  author =	{Tanimura, Yuka and I, Tomohiro and Bannai, Hideo and Inenaga, Shunsuke and Puglisi, Simon J. and Takeda, Masayuki},
  title =	{{Deterministic Sub-Linear Space LCE Data Structures With Efficient Construction}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{1:1--1:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.1},
  URN =		{urn:nbn:de:0030-drops-60655},
  doi =		{10.4230/LIPIcs.CPM.2016.1},
  annote =	{Keywords: longest common extension, longest common prefix, sparse suffix array}
}

Document

DOI: 10.4230/LIPIcs.CPM.2016.27

Factorizing a String into Squares in Linear Time

Authors: Yoshiaki Matsuoka, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, and Florin Manea

Published in: LIPIcs, Volume 54, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)

Abstract

A square factorization of a string w is a factorization of w in which each factor is a square. Dumitran et al. [SPIRE 2015, pp. 54-66] showed how to find a square factorization of a given string of length n in O(n log n) time, and they posed a question whether it can be done in O(n) time. In this paper, we answer their question positively, showing an O(n)-time algorithm for square factorization in the standard word RAM model with machine word size omega = Omega(log n). We also show an O(n + (n log^2 n) / omega)-time (respectively, O(n log n)-time) algorithm to find a square factorization which contains the maximum (respectively, minimum) number of squares.

Cite as

Yoshiaki Matsuoka, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, and Florin Manea. Factorizing a String into Squares in Linear Time. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 27:1-27:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{matsuoka_et_al:LIPIcs.CPM.2016.27,
  author =	{Matsuoka, Yoshiaki and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki and Manea, Florin},
  title =	{{Factorizing a String into Squares in Linear Time}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{27:1--27:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.27},
  URN =		{urn:nbn:de:0030-drops-60645},
  doi =		{10.4230/LIPIcs.CPM.2016.27},
  annote =	{Keywords: Squares, Runs, Factorization of Strings}
}

Document

DOI: 10.4230/LIPIcs.STACS.2014.675

Faster Compact On-Line Lempel-Ziv Factorization

Authors: Jun'ichi Yamamoto, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, and Masayuki Takeda

Published in: LIPIcs, Volume 25, 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014)

Abstract

We present a new on-line algorithm for computing the Lempel-Ziv factorization of a string that runs in O(N.log(N)) time and uses only O(N.log(s)) bits of working space, where N is the length of the string and s is the size of the alphabet. This is a notable improvement compared to the performance of previous on-line algorithms using the same order of working space but running in either O(N.log^3(N)) time [Okanohara and Sadakane, 2009] or O(N.log^2(N)) time [Starikovskaya, 2012]. The key to our new algorithm is in the utilization of an elegant but less popular index structure called Directed Acyclic Word Graphs, or DAWGs [Blumer et al., 1985]. We also present an opportunistic variant of our algorithm, which, given the run length encoding of size m of a string of length N, computes the Lempel-Ziv factorization of the string on-line, in O(m.min{log(log(m)).log(log(N))/(log(log(log(N)))), (log(m))^{1/2}/(log(log(m)))^{1/2})}) time and O(m.log(N)) bits of space.

Cite as

Jun'ichi Yamamoto, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, and Masayuki Takeda. Faster Compact On-Line Lempel-Ziv Factorization. In 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014). Leibniz International Proceedings in Informatics (LIPIcs), Volume 25, pp. 675-686, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)

Copy BibTex To Clipboard

@InProceedings{yamamoto_et_al:LIPIcs.STACS.2014.675,
  author =	{Yamamoto, Jun'ichi and I, Tomohiro and Bannai, Hideo and Inenaga, Shunsuke and Takeda, Masayuki},
  title =	{{Faster Compact On-Line Lempel-Ziv Factorization}},
  booktitle =	{31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014)},
  pages =	{675--686},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-65-1},
  ISSN =	{1868-8969},
  year =	{2014},
  volume =	{25},
  editor =	{Mayr, Ernst W. and Portier, Natacha},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2014.675},
  URN =		{urn:nbn:de:0030-drops-44976},
  doi =		{10.4230/LIPIcs.STACS.2014.675},
  annote =	{Keywords: Lempel-Ziv Factorization, String Index}
}

Search Results

Documents authored by Bannai, Hideo

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as