DROPS

Document

DOI: 10.4230/LIPIcs.ESA.2024.18

Height-Bounded Lempel-Ziv Encodings

Authors: Hideo Bannai, Mitsuru Funakoshi, Diptarama Hendrian, Myuji Matsuda, and Simon J. Puglisi

Published in: LIPIcs, Volume 308, 32nd Annual European Symposium on Algorithms (ESA 2024)

Abstract

We introduce height-bounded LZ encodings (LZHB), a new family of compressed representations that are variants of Lempel-Ziv parsings with a focus on bounding the worst-case access time to arbitrary positions in the text directly via the compressed representation. An LZ-like encoding is a partitioning of the string into phrases of length 1 which can be encoded literally, or phrases of length at least 2 which have a previous occurrence in the string and can be encoded by its position and length. An LZ-like encoding induces an implicit referencing forest on the set of positions of the string. An LZHB encoding is an LZ-like encoding where the height of the implicit referencing forest is bounded. An LZHB encoding with height constraint h allows access to an arbitrary position of the underlying text using O(h) predecessor queries. While computing the optimal (i.e., smallest) LZHB encoding efficiently seems to be difficult [Cicalese & Ugazio 2024, arXiv, to appear at DLT 2024], we give the first linear time algorithm for strings over a constant size alphabet that computes the greedy LZHB encoding, i.e., the string is processed from beginning to end, and the longest prefix of the remaining string that can satisfy the height constraint is taken as the next phrase. Our algorithms significantly improve both theoretically and practically, the very recently and independently proposed algorithms by Lipták et al. (CPM 2024). We also analyze the size of height bounded LZ encodings in the context of repetitiveness measures, and show that there exists a constant c such that the size ẑ_{HB(clog n)} of the optimal LZHB encoding whose height is bounded by clog n for any string of length n is O(ĝ_{rl}), where ĝ_{rl} is the size of the smallest run-length grammar. Furthermore, we show that there exists a family of strings such that ẑ_{HB(clog n)} = o(ĝ_{rl}), thus making ẑ_{HB(clog n)} one of the smallest known repetitiveness measures for which O(polylog n) time access is possible using linear (O(ẑ_{HB(clog n)})) space.

Cite as

Hideo Bannai, Mitsuru Funakoshi, Diptarama Hendrian, Myuji Matsuda, and Simon J. Puglisi. Height-Bounded Lempel-Ziv Encodings. In 32nd Annual European Symposium on Algorithms (ESA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 308, pp. 18:1-18:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.ESA.2024.18,
  author =	{Bannai, Hideo and Funakoshi, Mitsuru and Hendrian, Diptarama and Matsuda, Myuji and Puglisi, Simon J.},
  title =	{{Height-Bounded Lempel-Ziv Encodings}},
  booktitle =	{32nd Annual European Symposium on Algorithms (ESA 2024)},
  pages =	{18:1--18:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-338-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{308},
  editor =	{Chan, Timothy and Fischer, Johannes and Iacono, John and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2024.18},
  URN =		{urn:nbn:de:0030-drops-210899},
  doi =		{10.4230/LIPIcs.ESA.2024.18},
  annote =	{Keywords: Lempel-Ziv parsing, data compression}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.89

Breaking a Barrier in Constructing Compact Indexes for Parameterized Pattern Matching

Authors: Kento Iseri, Tomohiro I, Diptarama Hendrian, Dominik Köppl, Ryo Yoshinaka, and Ayumi Shinohara

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

A parameterized string (p-string) is a string over an alphabet (Σ_s ∪ Σ_p), where Σ_s and Σ_p are disjoint alphabets for static symbols (s-symbols) and for parameter symbols (p-symbols), respectively. Two p-strings x and y are said to parameterized match (p-match) if and only if x can be transformed into y by applying a bijection on Σ_p to every occurrence of p-symbols in x. The indexing problem for p-matching is to preprocess a p-string T of length n so that we can efficiently find the occurrences of substrings of T that p-match with a given pattern. Let σ_s and respectively σ_p be the numbers of distinct s-symbols and p-symbols that appear in T and σ = σ_s + σ_p. Extending the Burrows-Wheeler Transform (BWT) based index for exact string pattern matching, Ganguly et al. [SODA 2017] proposed parameterized BWTs (pBWTs) to design the first compact index for p-matching, and posed an open problem on how to construct the pBWT-based index in compact space, i.e., in O(n lg |Σ_s ∪ Σ_p|) bits of space. Hashimoto et al. [SPIRE 2022] showed how to construct the pBWT for T, under the assumption that Σ_s ∪ Σ_p = [0..O(σ)], in O(n lg σ) bits of space and O(n (σ_p lg n)/(lg lg n)) time in an online manner while reading the symbols of T from right to left. In this paper, we refine Hashimoto et al.’s algorithm to work in O(n lg σ) bits of space and O(n (lg σ_p lg n)/(lg lg n)) time in a more general assumption that Σ_s ∪ Σ_p = [0..n^{O(1)}]. Our result has an immediate application to constructing parameterized suffix arrays in O(n (lg σ_p lg n)/(lg lg n)) time and O(n lg σ) bits of working space. We also show that our data structure can support backward search, a core procedure of BWT-based indexes, at any stage of the online construction, making it the first compact index for p-matching that can be constructed in compact space and even in an online manner.

Cite as

Kento Iseri, Tomohiro I, Diptarama Hendrian, Dominik Köppl, Ryo Yoshinaka, and Ayumi Shinohara. Breaking a Barrier in Constructing Compact Indexes for Parameterized Pattern Matching. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 89:1-89:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{iseri_et_al:LIPIcs.ICALP.2024.89,
  author =	{Iseri, Kento and I, Tomohiro and Hendrian, Diptarama and K\"{o}ppl, Dominik and Yoshinaka, Ryo and Shinohara, Ayumi},
  title =	{{Breaking a Barrier in Constructing Compact Indexes for Parameterized Pattern Matching}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{89:1--89:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.89},
  URN =		{urn:nbn:de:0030-drops-202324},
  doi =		{10.4230/LIPIcs.ICALP.2024.89},
  annote =	{Keywords: Index for parameterized pattern matching, Parameterized Burrows-Wheeler Transform, Online construction}
}

@InProceedings{iseri_et_al:LIPIcs.ICALP.2024.89,
  author =	{Iseri, Kento and I, Tomohiro and Hendrian, Diptarama and K\"{o}ppl, Dominik and Yoshinaka, Ryo and Shinohara, Ayumi},
  title =	{{Breaking a Barrier in Constructing Compact Indexes for Parameterized Pattern Matching}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{89:1--89:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.89},
  URN =		{urn:nbn:de:0030-drops-202324},
  doi =		{10.4230/LIPIcs.ICALP.2024.89},
  annote =	{Keywords: Index for parameterized pattern matching, Parameterized Burrows-Wheeler Transform, Online construction}
}

Document

DOI: 10.4230/LIPIcs.CPM.2024.18

Algorithms for Galois Words: Detection, Factorization, and Rotation

Authors: Diptarama Hendrian, Dominik Köppl, Ryo Yoshinaka, and Ayumi Shinohara

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)

Abstract

Lyndon words are extensively studied in combinatorics on words - they play a crucial role on upper bounding the number of runs a word can have [Bannai+, SIAM J. Comput.'17]. We can determine Lyndon words, factorize a word into Lyndon words in lexicographically non-increasing order, and find the Lyndon rotation of a word, all in linear time within constant additional working space. A recent research interest emerged from the question of what happens when we change the lexicographic order, which is at the heart of the definition of Lyndon words. In particular, the alternating order, where the order of all odd positions becomes reversed, has been recently proposed. While a Lyndon word is, among all its cyclic rotations, the smallest one with respect to the lexicographic order, a Galois word exhibits the same property by exchanging the lexicographic order with the alternating order. Unfortunately, this exchange has a large impact on the properties Galois words exhibit, which makes it a nontrivial task to translate results from Lyndon words to Galois words. Up until now, it has only been conjectured that linear-time algorithms with constant additional working space in the spirit of Duval’s algorithm are possible for computing the Galois factorization or the Galois rotation. Here, we affirm this conjecture as follows. Given a word T of length n, we can determine whether T is a Galois word, in O(n) time with constant additional working space. Within the same complexities, we can also determine the Galois rotation of T, and compute the Galois factorization of T online. The last result settles Open Problem 1 in [Dolce et al., TCS 2019] for Galois words.

Cite as

Diptarama Hendrian, Dominik Köppl, Ryo Yoshinaka, and Ayumi Shinohara. Algorithms for Galois Words: Detection, Factorization, and Rotation. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 18:1-18:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{hendrian_et_al:LIPIcs.CPM.2024.18,
  author =	{Hendrian, Diptarama and K\"{o}ppl, Dominik and Yoshinaka, Ryo and Shinohara, Ayumi},
  title =	{{Algorithms for Galois Words: Detection, Factorization, and Rotation}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{18:1--18:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.18},
  URN =		{urn:nbn:de:0030-drops-201288},
  doi =		{10.4230/LIPIcs.CPM.2024.18},
  annote =	{Keywords: Galois Factorization, Alternating Order, Word Factorization Algorithm, Regularity Detection}
}

Document

DOI: 10.4230/LIPIcs.CPM.2022.28

Parallel Algorithm for Pattern Matching Problems Under Substring Consistent Equivalence Relations

Authors: Davaajav Jargalsaikhan, Diptarama Hendrian, Ryo Yoshinaka, and Ayumi Shinohara

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

Given a text and a pattern over an alphabet, the pattern matching problem searches for all occurrences of the pattern in the text. An equivalence relation ≈ is a substring consistent equivalence relation (SCER), if for two strings X and Y, X ≈ Y implies |X| = |Y| and X[i:j] ≈ Y[i:j] for all 1 ≤ i ≤ j ≤ |X|. In this paper, we propose an efficient parallel algorithm for pattern matching under any SCER using the "duel-and-sweep" paradigm. For a pattern of length m and a text of length n, our algorithm runs in O(ξ_m^t log³ m) time and O(ξ_m^w ⋅ n log² m) work, with O(τ_n^t + ξ_m^t log² m) time and O(τ_n^w + ξ_m^w ⋅ m log² m) work preprocessing on the Priority Concurrent Read Concurrent Write Parallel Random-Access Machines (P-CRCW PRAM), where τ_n^t, τ_n^w, ξ_m^t, and ξ_m^w are parameters dependent on SCERs, which are often linear in n and m, respectively.

Cite as

Davaajav Jargalsaikhan, Diptarama Hendrian, Ryo Yoshinaka, and Ayumi Shinohara. Parallel Algorithm for Pattern Matching Problems Under Substring Consistent Equivalence Relations. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 28:1-28:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{jargalsaikhan_et_al:LIPIcs.CPM.2022.28,
  author =	{Jargalsaikhan, Davaajav and Hendrian, Diptarama and Yoshinaka, Ryo and Shinohara, Ayumi},
  title =	{{Parallel Algorithm for Pattern Matching Problems Under Substring Consistent Equivalence Relations}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{28:1--28:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.28},
  URN =		{urn:nbn:de:0030-drops-161552},
  doi =		{10.4230/LIPIcs.CPM.2022.28},
  annote =	{Keywords: parallel algorithm, substring consistent equivalence relation, pattern matching}
}

Document

DOI: 10.4230/LIPIcs.SEA.2020.13

Fast and Linear-Time String Matching Algorithms Based on the Distances of q-Gram Occurrences

Authors: Satoshi Kobayashi, Diptarama Hendrian, Ryo Yoshinaka, and Ayumi Shinohara

Published in: LIPIcs, Volume 160, 18th International Symposium on Experimental Algorithms (SEA 2020)

Abstract

Given a text T of length n and a pattern P of length m, the string matching problem is a task to find all occurrences of P in T. In this study, we propose an algorithm that solves this problem in O((n + m)q) time considering the distance between two adjacent occurrences of the same q-gram contained in P. We also propose a theoretical improvement of it which runs in O(n + m) time, though it is not necessarily faster in practice. We compare the execution times of our and existing algorithms on various kinds of real and artificial datasets such as an English text, a genome sequence and a Fibonacci string. The experimental results show that our algorithm is as fast as the state-of-the-art algorithms in many cases, particularly when a pattern frequently appears in a text.

Cite as

Satoshi Kobayashi, Diptarama Hendrian, Ryo Yoshinaka, and Ayumi Shinohara. Fast and Linear-Time String Matching Algorithms Based on the Distances of q-Gram Occurrences. In 18th International Symposium on Experimental Algorithms (SEA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 160, pp. 13:1-13:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{kobayashi_et_al:LIPIcs.SEA.2020.13,
  author =	{Kobayashi, Satoshi and Hendrian, Diptarama and Yoshinaka, Ryo and Shinohara, Ayumi},
  title =	{{Fast and Linear-Time String Matching Algorithms Based on the Distances of q-Gram Occurrences}},
  booktitle =	{18th International Symposium on Experimental Algorithms (SEA 2020)},
  pages =	{13:1--13:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-148-1},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{160},
  editor =	{Faro, Simone and Cantone, Domenico},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2020.13},
  URN =		{urn:nbn:de:0030-drops-120878},
  doi =		{10.4230/LIPIcs.SEA.2020.13},
  annote =	{Keywords: String matching algorithm, text processing}
}

Document

DOI: 10.4230/LIPIcs.CPM.2020.21

In-Place Bijective Burrows-Wheeler Transforms

Authors: Dominik Köppl, Daiki Hashimoto, Diptarama Hendrian, and Ayumi Shinohara

Published in: LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

Abstract

One of the most well-known variants of the Burrows-Wheeler transform (BWT) [Burrows and Wheeler, 1994] is the bijective BWT (BBWT) [Gil and Scott, arXiv 2012], which applies the extended BWT (EBWT) [Mantaci et al., TCS 2007] to the multiset of Lyndon factors of a given text. Since the EBWT is invertible, the BBWT is a bijective transform in the sense that the inverse image of the EBWT restores this multiset of Lyndon factors such that the original text can be obtained by sorting these factors in non-increasing order. In this paper, we present algorithms constructing or inverting the BBWT in-place using quadratic time. We also present conversions from the BBWT to the BWT, or vice versa, either (a) in-place using quadratic time, or (b) in the run-length compressed setting using 𝒪(n lg r / lg lg r) time with 𝒪(r lg n) bits of words, where r is the sum of character runs in the BWT and the BBWT.

Cite as

Dominik Köppl, Daiki Hashimoto, Diptarama Hendrian, and Ayumi Shinohara. In-Place Bijective Burrows-Wheeler Transforms. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 21:1-21:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{koppl_et_al:LIPIcs.CPM.2020.21,
  author =	{K\"{o}ppl, Dominik and Hashimoto, Daiki and Hendrian, Diptarama and Shinohara, Ayumi},
  title =	{{In-Place Bijective Burrows-Wheeler Transforms}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{21:1--21:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{G{\o}rtz, Inge Li and Weimann, Oren},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.21},
  URN =		{urn:nbn:de:0030-drops-121463},
  doi =		{10.4230/LIPIcs.CPM.2020.21},
  annote =	{Keywords: In-Place Algorithms, Burrows-Wheeler transform, Lyndon words}
}

Document

DOI: 10.4230/LIPIcs.CPM.2020.26

DAWGs for Parameterized Matching: Online Construction and Related Indexing Structures

Authors: Katsuhito Nakashima, Noriki Fujisato, Diptarama Hendrian, Yuto Nakashima, Ryo Yoshinaka, Shunsuke Inenaga, Hideo Bannai, Ayumi Shinohara, and Masayuki Takeda

Published in: LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

Abstract

Two strings x and y over Σ ∪ Π of equal length are said to parameterized match (p-match) if there is a renaming bijection f:Σ ∪ Π → Σ ∪ Π that is identity on Σ and transforms x to y (or vice versa). The p-matching problem is to look for substrings in a text that p-match a given pattern. In this paper, we propose parameterized suffix automata (p-suffix automata) and parameterized directed acyclic word graphs (PDAWGs) which are the p-matching versions of suffix automata and DAWGs. While suffix automata and DAWGs are equivalent for standard strings, we show that p-suffix automata can have Θ(n²) nodes and edges but PDAWGs have only O(n) nodes and edges, where n is the length of an input string. We also give O(n |Π| log (|Π| + |Σ|))-time O(n)-space algorithm that builds the PDAWG in a left-to-right online manner. As a byproduct, it is shown that the parameterized suffix tree for the reversed string can also be built in the same time and space, in a right-to-left online manner.

Cite as

Katsuhito Nakashima, Noriki Fujisato, Diptarama Hendrian, Yuto Nakashima, Ryo Yoshinaka, Shunsuke Inenaga, Hideo Bannai, Ayumi Shinohara, and Masayuki Takeda. DAWGs for Parameterized Matching: Online Construction and Related Indexing Structures. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 26:1-26:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{nakashima_et_al:LIPIcs.CPM.2020.26,
  author =	{Nakashima, Katsuhito and Fujisato, Noriki and Hendrian, Diptarama and Nakashima, Yuto and Yoshinaka, Ryo and Inenaga, Shunsuke and Bannai, Hideo and Shinohara, Ayumi and Takeda, Masayuki},
  title =	{{DAWGs for Parameterized Matching: Online Construction and Related Indexing Structures}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{26:1--26:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{G{\o}rtz, Inge Li and Weimann, Oren},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.26},
  URN =		{urn:nbn:de:0030-drops-121512},
  doi =		{10.4230/LIPIcs.CPM.2020.26},
  annote =	{Keywords: parameterized matching, suffix trees, DAWGs, suffix automata}
}

@InProceedings{nakashima_et_al:LIPIcs.CPM.2020.26,
  author =	{Nakashima, Katsuhito and Fujisato, Noriki and Hendrian, Diptarama and Nakashima, Yuto and Yoshinaka, Ryo and Inenaga, Shunsuke and Bannai, Hideo and Shinohara, Ayumi and Takeda, Masayuki},
  title =	{{DAWGs for Parameterized Matching: Online Construction and Related Indexing Structures}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{26:1--26:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{G{\o}rtz, Inge Li and Weimann, Oren},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.26},
  URN =		{urn:nbn:de:0030-drops-121512},
  doi =		{10.4230/LIPIcs.CPM.2020.26},
  annote =	{Keywords: parameterized matching, suffix trees, DAWGs, suffix automata}
}

Document

DOI: 10.4230/LIPIcs.CPM.2019.30

Online Algorithms for Constructing Linear-Size Suffix Trie

Authors: Diptarama Hendrian, Takuya Takagi, and Shunsuke Inenaga

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)

Abstract

The suffix trees are fundamental data structures for various kinds of string processing. The suffix tree of a string T of length n has O(n) nodes and edges, and the string label of each edge is encoded by a pair of positions in T. Thus, even after the tree is built, the input text T needs to be kept stored and random access to T is still needed. The linear-size suffix tries (LSTs), proposed by Crochemore et al. [Linear-size suffix tries, TCS 638:171-178, 2016], are a "stand-alone" alternative to the suffix trees. Namely, the LST of a string T of length n occupies O(n) total space, and supports pattern matching and other tasks in the same efficiency as the suffix tree without the need to store the input text T. Crochemore et al. proposed an offline algorithm which transforms the suffix tree of T into the LST of T in O(n log sigma) time and O(n) space, where sigma is the alphabet size. In this paper, we present two types of online algorithms which "directly" construct the LST, from right to left, and from left to right, without constructing the suffix tree as an intermediate structure. Both algorithms construct the LST incrementally when a new symbol is read, and do not access to the previously read symbols. The right-to-left construction algorithm works in O(n log sigma) time and O(n) space and the left-to-right construction algorithm works in O(n (log sigma + log n / log log n)) time and O(n) space. The main feature of our algorithms is that the input text does not need to be stored.

Cite as

Diptarama Hendrian, Takuya Takagi, and Shunsuke Inenaga. Online Algorithms for Constructing Linear-Size Suffix Trie. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 30:1-30:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{hendrian_et_al:LIPIcs.CPM.2019.30,
  author =	{Hendrian, Diptarama and Takagi, Takuya and Inenaga, Shunsuke},
  title =	{{Online Algorithms for Constructing Linear-Size Suffix Trie}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{30:1--30:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.30},
  URN =		{urn:nbn:de:0030-drops-105016},
  doi =		{10.4230/LIPIcs.CPM.2019.30},
  annote =	{Keywords: Indexing structure, Linear-size suffix trie, Online algorithm, Pattern Matching}
}

Search Results

Documents authored by Hendrian, Diptarama

Height-Bounded Lempel-Ziv Encodings

Abstract

Cite as

Breaking a Barrier in Constructing Compact Indexes for Parameterized Pattern Matching

Abstract

Cite as

Algorithms for Galois Words: Detection, Factorization, and Rotation

Abstract

Cite as

Parallel Algorithm for Pattern Matching Problems Under Substring Consistent Equivalence Relations

Abstract

Cite as

Fast and Linear-Time String Matching Algorithms Based on the Distances of q-Gram Occurrences

Abstract

Cite as

In-Place Bijective Burrows-Wheeler Transforms

Abstract

Cite as

DAWGs for Parameterized Matching: Online Construction and Related Indexing Structures

Abstract

Cite as

Online Algorithms for Constructing Linear-Size Suffix Trie

Abstract

Cite as

Thanks for your feedback!

Could not send message