DROPS

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.89

Breaking a Barrier in Constructing Compact Indexes for Parameterized Pattern Matching

Authors: Kento Iseri, Tomohiro I, Diptarama Hendrian, Dominik Köppl, Ryo Yoshinaka, and Ayumi Shinohara

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

A parameterized string (p-string) is a string over an alphabet (Σ_s ∪ Σ_p), where Σ_s and Σ_p are disjoint alphabets for static symbols (s-symbols) and for parameter symbols (p-symbols), respectively. Two p-strings x and y are said to parameterized match (p-match) if and only if x can be transformed into y by applying a bijection on Σ_p to every occurrence of p-symbols in x. The indexing problem for p-matching is to preprocess a p-string T of length n so that we can efficiently find the occurrences of substrings of T that p-match with a given pattern. Let σ_s and respectively σ_p be the numbers of distinct s-symbols and p-symbols that appear in T and σ = σ_s + σ_p. Extending the Burrows-Wheeler Transform (BWT) based index for exact string pattern matching, Ganguly et al. [SODA 2017] proposed parameterized BWTs (pBWTs) to design the first compact index for p-matching, and posed an open problem on how to construct the pBWT-based index in compact space, i.e., in O(n lg |Σ_s ∪ Σ_p|) bits of space. Hashimoto et al. [SPIRE 2022] showed how to construct the pBWT for T, under the assumption that Σ_s ∪ Σ_p = [0..O(σ)], in O(n lg σ) bits of space and O(n (σ_p lg n)/(lg lg n)) time in an online manner while reading the symbols of T from right to left. In this paper, we refine Hashimoto et al.’s algorithm to work in O(n lg σ) bits of space and O(n (lg σ_p lg n)/(lg lg n)) time in a more general assumption that Σ_s ∪ Σ_p = [0..n^{O(1)}]. Our result has an immediate application to constructing parameterized suffix arrays in O(n (lg σ_p lg n)/(lg lg n)) time and O(n lg σ) bits of working space. We also show that our data structure can support backward search, a core procedure of BWT-based indexes, at any stage of the online construction, making it the first compact index for p-matching that can be constructed in compact space and even in an online manner.

Cite as

Kento Iseri, Tomohiro I, Diptarama Hendrian, Dominik Köppl, Ryo Yoshinaka, and Ayumi Shinohara. Breaking a Barrier in Constructing Compact Indexes for Parameterized Pattern Matching. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 89:1-89:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{iseri_et_al:LIPIcs.ICALP.2024.89,
  author =	{Iseri, Kento and I, Tomohiro and Hendrian, Diptarama and K\"{o}ppl, Dominik and Yoshinaka, Ryo and Shinohara, Ayumi},
  title =	{{Breaking a Barrier in Constructing Compact Indexes for Parameterized Pattern Matching}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{89:1--89:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.89},
  URN =		{urn:nbn:de:0030-drops-202324},
  doi =		{10.4230/LIPIcs.ICALP.2024.89},
  annote =	{Keywords: Index for parameterized pattern matching, Parameterized Burrows-Wheeler Transform, Online construction}
}

@InProceedings{iseri_et_al:LIPIcs.ICALP.2024.89,
  author =	{Iseri, Kento and I, Tomohiro and Hendrian, Diptarama and K\"{o}ppl, Dominik and Yoshinaka, Ryo and Shinohara, Ayumi},
  title =	{{Breaking a Barrier in Constructing Compact Indexes for Parameterized Pattern Matching}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{89:1--89:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.89},
  URN =		{urn:nbn:de:0030-drops-202324},
  doi =		{10.4230/LIPIcs.ICALP.2024.89},
  annote =	{Keywords: Index for parameterized pattern matching, Parameterized Burrows-Wheeler Transform, Online construction}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.23

PalFM-Index: FM-Index for Palindrome Pattern Matching

Authors: Shinya Nagashita and Tomohiro I

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

The palindrome pattern matching (pal-matching) is a kind of generalized pattern matching, in which two strings x and y of same length are considered to match (pal-match) if they have the same palindromic structures, i.e., for any possible 1 ≤ i < j ≤ |x| = |y|, x[i..j] is a palindrome if and only if y[i..j] is a palindrome. The pal-matching problem is the problem of searching for, in a text, the occurrences of the substrings that pal-match with a pattern. Given a text T of length n over an alphabet of size σ, an index for pal-matching is to support, given a pattern P of length m, the counting queries that compute the number occ of occurrences of P and the locating queries that compute the occurrences of P. The authors in [I et al., Theor. Comput. Sci., 2013] proposed an O(n lg n)-bit data structure to support the counting queries in O(m lg σ) time and the locating queries in O(m lg σ + occ) time. In this paper, we propose an FM-index type index for the pal-matching problem, which we call the PalFM-index, that occupies 2n lg min(σ, lg n) + 2n + o(n) bits of space and supports the counting queries in O(m) time. The PalFM-indexes can support the locating queries in O(m + Δ occ) time by adding n/Δ lg n + n + o(n) bits of space, where Δ is a parameter chosen from {1, 2, … , n} in the preprocessing phase.

Cite as

Shinya Nagashita and Tomohiro I. PalFM-Index: FM-Index for Palindrome Pattern Matching. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 23:1-23:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{nagashita_et_al:LIPIcs.CPM.2023.23,
  author =	{Nagashita, Shinya and I, Tomohiro},
  title =	{{PalFM-Index: FM-Index for Palindrome Pattern Matching}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{23:1--23:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.23},
  URN =		{urn:nbn:de:0030-drops-179772},
  doi =		{10.4230/LIPIcs.CPM.2023.23},
  annote =	{Keywords: Palindrome matching, Generalized string pattern matching, Indexing}
}

Document

DOI: 10.4230/LIPIcs.CPM.2018.7

Online LZ77 Parsing and Matching Statistics with RLBWTs

Authors: Hideo Bannai, Travis Gagie, and Tomohiro I

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

Abstract

Lempel-Ziv 1977 (LZ77) parsing, matching statistics and the Burrows-Wheeler Transform (BWT) are all fundamental elements of stringology. In a series of recent papers, Policriti and Prezza (DCC 2016 and Algorithmica, CPM 2017) showed how we can use an augmented run-length compressed BWT (RLBWT) of the reverse T^R of a text T, to compute offline the LZ77 parse of T in O(n log r) time and O(r) space, where n is the length of T and r is the number of runs in the BWT of T^R. In this paper we first extend a well-known technique for updating an unaugmented RLBWT when a character is prepended to a text, to work with Policriti and Prezza's augmented RLBWT. This immediately implies that we can build online the LZ77 parse of T while still using O(n log r) time and O(r) space; it also seems likely to be of independent interest. Our experiments, using an extension of Ohno, Takabatake, I and Sakamoto's (IWOCA 2017) implementation of updating, show our approach is both time- and space-efficient for repetitive strings. We then show how to augment the RLBWT further - albeit making it static again and increasing its space by a factor proportional to the size of the alphabet - such that later, given another string S and O(log log n)-time random access to T, we can compute the matching statistics of S with respect to T in O(|S| log log n) time.

Cite as

Hideo Bannai, Travis Gagie, and Tomohiro I. Online LZ77 Parsing and Matching Statistics with RLBWTs. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 7:1-7:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2018.7,
  author =	{Bannai, Hideo and Gagie, Travis and I, Tomohiro},
  title =	{{Online LZ77 Parsing and Matching Statistics with RLBWTs}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{7:1--7:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.7},
  URN =		{urn:nbn:de:0030-drops-87025},
  doi =		{10.4230/LIPIcs.CPM.2018.7},
  annote =	{Keywords: Lempel-Ziv 1977, Matching Statistics, Run-Length Compressed Burrows-Wheeler Transform}
}

Document

DOI: 10.4230/LIPIcs.CPM.2018.9

Faster Online Elastic Degenerate String Matching

Authors: Kotaro Aoyama, Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

Abstract

An Elastic-Degenerate String [Iliopoulus et al., LATA 2017] is a sequence of sets of strings, which was recently proposed as a way to model a set of similar sequences. We give an online algorithm for the Elastic-Degenerate String Matching (EDSM) problem that runs in O(nm sqrt{m log m} + N) time and O(m) working space, where n is the number of elastic degenerate segments of the text, N is the total length of all strings in the text, and m is the length of the pattern. This improves the previous algorithm by Grossi et al. [CPM 2017] that runs in O(nm^2 + N) time.

Cite as

Kotaro Aoyama, Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster Online Elastic Degenerate String Matching. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 9:1-9:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{aoyama_et_al:LIPIcs.CPM.2018.9,
  author =	{Aoyama, Kotaro and Nakashima, Yuto and I, Tomohiro and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Faster Online Elastic Degenerate String Matching}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{9:1--9:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.9},
  URN =		{urn:nbn:de:0030-drops-87016},
  doi =		{10.4230/LIPIcs.CPM.2018.9},
  annote =	{Keywords: elastic degenerate pattern matching, boolean convolution}
}

Document

DOI: 10.4230/LIPIcs.CPM.2018.24

Lyndon Factorization of Grammar Compressed Texts Revisited

Authors: Isamu Furuya, Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

Abstract

We revisit the problem of computing the Lyndon factorization of a string w of length N which is given as a straight line program (SLP) of size n. For this problem, we show a new algorithm which runs in O(P(n, N) + Q(n, N)n log log N) time and O(n log N + S(n, N)) space where P(n, N), S(n,N), Q(n,N) are respectively the pre-processing time, space, and query time of a data structure for longest common extensions (LCE) on SLPs. Our algorithm improves the algorithm proposed by I et al. (TCS '17), and can be more efficient than the O(N)-time solution by Duval (J. Algorithms '83) when w is highly compressible.

Cite as

Isamu Furuya, Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Lyndon Factorization of Grammar Compressed Texts Revisited. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 24:1-24:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{furuya_et_al:LIPIcs.CPM.2018.24,
  author =	{Furuya, Isamu and Nakashima, Yuto and I, Tomohiro and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Lyndon Factorization of Grammar Compressed Texts Revisited}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{24:1--24:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.24},
  URN =		{urn:nbn:de:0030-drops-86855},
  doi =		{10.4230/LIPIcs.CPM.2018.24},
  annote =	{Keywords: Lyndon word, Lyndon factorization, Straight line program}
}

Document

DOI: 10.4230/LIPIcs.ESA.2017.67

A Space-Optimal Grammar Compression

Authors: Yoshimasa Takabatake, Tomohiro I, and Hiroshi Sakamoto

Published in: LIPIcs, Volume 87, 25th Annual European Symposium on Algorithms (ESA 2017)

Abstract

A grammar compression is a context-free grammar (CFG) deriving a single string deterministically. For an input string of length N over an alphabet of size sigma, the smallest CFG is O(log N)-approximable in the offline setting and O(log N log^* N)-approximable in the online setting. In addition, an information-theoretic lower bound for representing a CFG in Chomsky normal form of n variables is log (n!/n^sigma) + n + o(n) bits. Although there is an online grammar compression algorithm that directly computes the succinct encoding of its output CFG with O(log N log^* N) approximation guarantee, the problem of optimizing its working space has remained open. We propose a fully-online algorithm that requires the fewest bits of working space asymptotically equal to the lower bound in O(N log log n) compression time. In addition we propose several techniques to boost grammar compression and show their efficiency by computational experiments.

Cite as

Yoshimasa Takabatake, Tomohiro I, and Hiroshi Sakamoto. A Space-Optimal Grammar Compression. In 25th Annual European Symposium on Algorithms (ESA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 87, pp. 67:1-67:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{takabatake_et_al:LIPIcs.ESA.2017.67,
  author =	{Takabatake, Yoshimasa and I, Tomohiro and Sakamoto, Hiroshi},
  title =	{{A Space-Optimal Grammar Compression}},
  booktitle =	{25th Annual European Symposium on Algorithms (ESA 2017)},
  pages =	{67:1--67:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-049-1},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{87},
  editor =	{Pruhs, Kirk and Sohler, Christian},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2017.67},
  URN =		{urn:nbn:de:0030-drops-78640},
  doi =		{10.4230/LIPIcs.ESA.2017.67},
  annote =	{Keywords: Grammar compression, fully-online algorithm, succinct data structure}
}

Document

DOI: 10.4230/LIPIcs.CPM.2017.18

Longest Common Extensions with Recompression

Authors: Tomohiro I

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Abstract

Given two positions i and j in a string T of length N, a longest common extension (LCE) query asks for the length of the longest common prefix between suffixes beginning at i and j. A compressed LCE data structure stores T in a compressed form while supporting fast LCE queries. In this article we show that the recompression technique is a powerful tool for compressed LCE data structures. We present a new compressed LCE data structure of size O(z lg (N/z)) that supports LCE queries in O(lg N) time, where z is the size of Lempel-Ziv 77 factorization without self-reference of T. Given T as an uncompressed form, we show how to build our data structure in O(N) time and space. Given T as a grammar compressed form, i.e., a straight-line program of size n generating T, we show how to build our data structure in O(n lg (N/n)) time and O(n + z lg (N/z)) space. Our algorithms are deterministic and always return correct answers.

Cite as

Tomohiro I. Longest Common Extensions with Recompression. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 18:1-18:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{i:LIPIcs.CPM.2017.18,
  author =	{I, Tomohiro},
  title =	{{Longest Common Extensions with Recompression}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{18:1--18:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.18},
  URN =		{urn:nbn:de:0030-drops-73234},
  doi =		{10.4230/LIPIcs.CPM.2017.18},
  annote =	{Keywords: Longest Common Extension (LCE) queries, compressed data structure, grammar compressed strings, Straight-Line Program (SLP)}
}

Document

DOI: 10.4230/LIPIcs.MFCS.2016.72

Fully Dynamic Data Structure for LCE Queries in Compressed Space

Authors: Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 58, 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016)

Abstract

A Longest Common Extension (LCE) query on a text T of length N asks for the length of the longest common prefix of suffixes starting at given two positions. We show that the signature encoding G of size w = O(min(z log N log^* M, N)) [Mehlhorn et al., Algorithmica 17(2):183-198, 1997] of T, which can be seen as a compressed representation of T, has a capability to support LCE queries in O(log N + log ell log^* M) time, where ell is the answer to the query, z is the size of the Lempel-Ziv77 (LZ77) factorization of T, and M >= 4N is an integer that can be handled in constant time under word RAM model. In compressed space, this is the fastest deterministic LCE data structure in many cases. Moreover, G can be enhanced to support efficient update operations: After processing G in O(w f_A) time, we can insert/delete any (sub)string of length y into/from an arbitrary position of T in O((y + log Nlog^* M) f_A) time, where f_A = O(min{ (loglog M loglog w)/(logloglog M), sqrt(log w/loglog w)}). This yields the first fully dynamic LCE data structure working in compressed space. We also present efficient construction algorithms from various types of inputs: We can construct G in O(N f_A) time from uncompressed string T; in O(n loglog (n log^* M) log N log^* M) time from grammar-compressed string T represented by a straight-line program of size n; and in O(z f_A log N log^* M) time from LZ77-compressed string T with z factors. On top of the above contributions, we show several applications of our data structures which improve previous best known results on grammar-compressed string processing.

Cite as

Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Fully Dynamic Data Structure for LCE Queries in Compressed Space. In 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 58, pp. 72:1-72:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{nishimoto_et_al:LIPIcs.MFCS.2016.72,
  author =	{Nishimoto, Takaaki and I, Tomohiro and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Fully Dynamic Data Structure for LCE Queries in Compressed Space}},
  booktitle =	{41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016)},
  pages =	{72:1--72:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-016-3},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{58},
  editor =	{Faliszewski, Piotr and Muscholl, Anca and Niedermeier, Rolf},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2016.72},
  URN =		{urn:nbn:de:0030-drops-65045},
  doi =		{10.4230/LIPIcs.MFCS.2016.72},
  annote =	{Keywords: dynamic texts, longest common extension (LCE) queries, straight-line program}
}

Document

DOI: 10.4230/LIPIcs.CPM.2016.1

Deterministic Sub-Linear Space LCE Data Structures With Efficient Construction

Authors: Yuka Tanimura, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, Simon J. Puglisi, and Masayuki Takeda

Published in: LIPIcs, Volume 54, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)

Abstract

Given a string S of n symbols, a longest common extension query LCE(i,j) asks for the length of the longest common prefix of the $i$th and $j$th suffixes of S. LCE queries have several important applications in string processing, perhaps most notably to suffix sorting. Recently, Bille et al. (J. Discrete Algorithms 25:42-50, 2014, Proc. CPM 2015:65-76) described several data structures for answering LCE queries that offers a space-time trade-off between data structure size and query time. In particular, for a parameter 1 <= tau <= n, their best deterministic solution is a data structure of size O(n/tau) which allows LCE queries to be answered in O(tau) time. However, the construction time for all deterministic versions of their data structure is quadratic in n. In this paper, we propose a deterministic solution that achieves a similar space-time trade-off of O(tau * min(log(tau),log(n/tau)) query time using O(n/tau) space, but significantly improve the construction time to O(n*tau).

Cite as

Yuka Tanimura, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, Simon J. Puglisi, and Masayuki Takeda. Deterministic Sub-Linear Space LCE Data Structures With Efficient Construction. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 1:1-1:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{tanimura_et_al:LIPIcs.CPM.2016.1,
  author =	{Tanimura, Yuka and I, Tomohiro and Bannai, Hideo and Inenaga, Shunsuke and Puglisi, Simon J. and Takeda, Masayuki},
  title =	{{Deterministic Sub-Linear Space LCE Data Structures With Efficient Construction}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{1:1--1:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.1},
  URN =		{urn:nbn:de:0030-drops-60655},
  doi =		{10.4230/LIPIcs.CPM.2016.1},
  annote =	{Keywords: longest common extension, longest common prefix, sparse suffix array}
}

Document

DOI: 10.4230/LIPIcs.STACS.2016.39

Efficiently Finding All Maximal alpha-gapped Repeats

Authors: Pawel Gawrychowski, Tomohiro I, Shunsuke Inenaga, Dominik Köppl, and Florin Manea

Published in: LIPIcs, Volume 47, 33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016)

Abstract

For alpha >=1, an alpha-gapped repeat in a word w is a factor uvu of w such that |uv| <= alpha * |u|; the two occurrences of a factor u in such a repeat are called arms. Such a repeat is called maximal if its arms cannot be extended simultaneously with the same symbol to the right nor to the left. We show that the number of all maximal alpha-gapped repeats occurring in words of length n is upper bounded by 18 * alpha * n, allowing us to construct an algorithm finding all maximal alpha-gapped repeats of a word on an integer alphabet of size n^{O}(1)} in {O}(alpha * n) time. This result is optimal as there are words that have Theta(alpha * n) maximal alpha-gapped repeats. Our techniques can be extended to get comparable results in the case of alpha-gapped palindromes, i.e., factors uvu^{T} with |uv| <= alpha |u|.

Cite as

Pawel Gawrychowski, Tomohiro I, Shunsuke Inenaga, Dominik Köppl, and Florin Manea. Efficiently Finding All Maximal alpha-gapped Repeats. In 33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 47, pp. 39:1-39:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.STACS.2016.39,
  author =	{Gawrychowski, Pawel and I, Tomohiro and Inenaga, Shunsuke and K\"{o}ppl, Dominik and Manea, Florin},
  title =	{{Efficiently Finding All Maximal alpha-gapped Repeats}},
  booktitle =	{33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016)},
  pages =	{39:1--39:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-001-9},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{47},
  editor =	{Ollinger, Nicolas and Vollmer, Heribert},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2016.39},
  URN =		{urn:nbn:de:0030-drops-57408},
  doi =		{10.4230/LIPIcs.STACS.2016.39},
  annote =	{Keywords: combinatorics on words, counting algorithms}
}

Document

DOI: 10.4230/LIPIcs.STACS.2014.386

Faster Sparse Suffix Sorting

Authors: Tomohiro I, Juha Kärkkäinen, and Dominik Kempa

Published in: LIPIcs, Volume 25, 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014)

Abstract

The sparse suffix sorting problem is to sort b=o(n) arbitrary suffixes of a string of length n using o(n) words of space in addition to the string. We present an O(n) time Monte Carlo algorithm using O(b.log(b)) space and an O(n.log(b)) time Las Vegas algorithm using O(b) space. This is a significant improvement over the best prior solutions of [Bille et al., ICALP 2013]: a Monte Carlo algorithm running in O(n.log(b)) time and O(b^(1+e)) space or O(n.log^2(b)) time and O(b) space, and a Las Vegas algorithm running in O(n.log^2(b)+b^2.log(b)) time and O(b) space. All the above results are obtained with high probability not just in expectation.

Cite as

Tomohiro I, Juha Kärkkäinen, and Dominik Kempa. Faster Sparse Suffix Sorting. In 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014). Leibniz International Proceedings in Informatics (LIPIcs), Volume 25, pp. 386-396, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)

Copy BibTex To Clipboard

@InProceedings{i_et_al:LIPIcs.STACS.2014.386,
  author =	{I, Tomohiro and K\"{a}rkk\"{a}inen, Juha and Kempa, Dominik},
  title =	{{Faster Sparse Suffix Sorting}},
  booktitle =	{31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014)},
  pages =	{386--396},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-65-1},
  ISSN =	{1868-8969},
  year =	{2014},
  volume =	{25},
  editor =	{Mayr, Ernst W. and Portier, Natacha},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2014.386},
  URN =		{urn:nbn:de:0030-drops-44738},
  doi =		{10.4230/LIPIcs.STACS.2014.386},
  annote =	{Keywords: string algorithms, sparse suffix sorting, sparse suffix trees, Karp-Rabin fingerprints, space-time tradeoffs}
}

Document

DOI: 10.4230/LIPIcs.STACS.2014.675

Faster Compact On-Line Lempel-Ziv Factorization

Authors: Jun'ichi Yamamoto, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, and Masayuki Takeda

Published in: LIPIcs, Volume 25, 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014)

Abstract

We present a new on-line algorithm for computing the Lempel-Ziv factorization of a string that runs in O(N.log(N)) time and uses only O(N.log(s)) bits of working space, where N is the length of the string and s is the size of the alphabet. This is a notable improvement compared to the performance of previous on-line algorithms using the same order of working space but running in either O(N.log^3(N)) time [Okanohara and Sadakane, 2009] or O(N.log^2(N)) time [Starikovskaya, 2012]. The key to our new algorithm is in the utilization of an elegant but less popular index structure called Directed Acyclic Word Graphs, or DAWGs [Blumer et al., 1985]. We also present an opportunistic variant of our algorithm, which, given the run length encoding of size m of a string of length N, computes the Lempel-Ziv factorization of the string on-line, in O(m.min{log(log(m)).log(log(N))/(log(log(log(N)))), (log(m))^{1/2}/(log(log(m)))^{1/2})}) time and O(m.log(N)) bits of space.

Cite as

Jun'ichi Yamamoto, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, and Masayuki Takeda. Faster Compact On-Line Lempel-Ziv Factorization. In 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014). Leibniz International Proceedings in Informatics (LIPIcs), Volume 25, pp. 675-686, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)

Copy BibTex To Clipboard

@InProceedings{yamamoto_et_al:LIPIcs.STACS.2014.675,
  author =	{Yamamoto, Jun'ichi and I, Tomohiro and Bannai, Hideo and Inenaga, Shunsuke and Takeda, Masayuki},
  title =	{{Faster Compact On-Line Lempel-Ziv Factorization}},
  booktitle =	{31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014)},
  pages =	{675--686},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-65-1},
  ISSN =	{1868-8969},
  year =	{2014},
  volume =	{25},
  editor =	{Mayr, Ernst W. and Portier, Natacha},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2014.675},
  URN =		{urn:nbn:de:0030-drops-44976},
  doi =		{10.4230/LIPIcs.STACS.2014.675},
  annote =	{Keywords: Lempel-Ziv Factorization, String Index}
}

Search Results

Documents authored by I, Tomohiro

Breaking a Barrier in Constructing Compact Indexes for Parameterized Pattern Matching

Abstract

Cite as

PalFM-Index: FM-Index for Palindrome Pattern Matching

Abstract

Cite as

Online LZ77 Parsing and Matching Statistics with RLBWTs

Abstract

Cite as

Faster Online Elastic Degenerate String Matching

Abstract

Cite as

Lyndon Factorization of Grammar Compressed Texts Revisited

Abstract

Cite as

A Space-Optimal Grammar Compression

Abstract

Cite as

Longest Common Extensions with Recompression

Abstract

Cite as

Fully Dynamic Data Structure for LCE Queries in Compressed Space

Abstract

Cite as

Deterministic Sub-Linear Space LCE Data Structures With Efficient Construction

Abstract

Cite as

Efficiently Finding All Maximal alpha-gapped Repeats

Abstract

Cite as

Faster Sparse Suffix Sorting

Abstract

Cite as

Faster Compact On-Line Lempel-Ziv Factorization

Abstract

Cite as

Thanks for your feedback!

Could not send message