Document

**Published in:** LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

The palindrome pattern matching (pal-matching) is a kind of generalized pattern matching, in which two strings x and y of same length are considered to match (pal-match) if they have the same palindromic structures, i.e., for any possible 1 ≤ i < j ≤ |x| = |y|, x[i..j] is a palindrome if and only if y[i..j] is a palindrome. The pal-matching problem is the problem of searching for, in a text, the occurrences of the substrings that pal-match with a pattern. Given a text T of length n over an alphabet of size σ, an index for pal-matching is to support, given a pattern P of length m, the counting queries that compute the number occ of occurrences of P and the locating queries that compute the occurrences of P. The authors in [I et al., Theor. Comput. Sci., 2013] proposed an O(n lg n)-bit data structure to support the counting queries in O(m lg σ) time and the locating queries in O(m lg σ + occ) time. In this paper, we propose an FM-index type index for the pal-matching problem, which we call the PalFM-index, that occupies 2n lg min(σ, lg n) + 2n + o(n) bits of space and supports the counting queries in O(m) time. The PalFM-indexes can support the locating queries in O(m + Δ occ) time by adding n/Δ lg n + n + o(n) bits of space, where Δ is a parameter chosen from {1, 2, … , n} in the preprocessing phase.

Shinya Nagashita and Tomohiro I. PalFM-Index: FM-Index for Palindrome Pattern Matching. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 23:1-23:15, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{nagashita_et_al:LIPIcs.CPM.2023.23, author = {Nagashita, Shinya and I, Tomohiro}, title = {{PalFM-Index: FM-Index for Palindrome Pattern Matching}}, booktitle = {34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)}, pages = {23:1--23:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-276-1}, ISSN = {1868-8969}, year = {2023}, volume = {259}, editor = {Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.23}, URN = {urn:nbn:de:0030-drops-179772}, doi = {10.4230/LIPIcs.CPM.2023.23}, annote = {Keywords: Palindrome matching, Generalized string pattern matching, Indexing} }

Document

**Published in:** LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

Lempel-Ziv 1977 (LZ77) parsing, matching statistics and the Burrows-Wheeler Transform (BWT) are all fundamental elements of stringology. In a series of recent papers, Policriti and Prezza (DCC 2016 and Algorithmica, CPM 2017) showed how we can use an augmented run-length compressed BWT (RLBWT) of the reverse T^R of a text T, to compute offline the LZ77 parse of T in O(n log r) time and O(r) space, where n is the length of T and r is the number of runs in the BWT of T^R. In this paper we first extend a well-known technique for updating an unaugmented RLBWT when a character is prepended to a text, to work with Policriti and Prezza's augmented RLBWT. This immediately implies that we can build online the LZ77 parse of T while still using O(n log r) time and O(r) space; it also seems likely to be of independent interest. Our experiments, using an extension of Ohno, Takabatake, I and Sakamoto's (IWOCA 2017) implementation of updating, show our approach is both time- and space-efficient for repetitive strings. We then show how to augment the RLBWT further - albeit making it static again and increasing its space by a factor proportional to the size of the alphabet - such that later, given another string S and O(log log n)-time random access to T, we can compute the matching statistics of S with respect to T in O(|S| log log n) time.

Hideo Bannai, Travis Gagie, and Tomohiro I. Online LZ77 Parsing and Matching Statistics with RLBWTs. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 7:1-7:12, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2018.7, author = {Bannai, Hideo and Gagie, Travis and I, Tomohiro}, title = {{Online LZ77 Parsing and Matching Statistics with RLBWTs}}, booktitle = {29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)}, pages = {7:1--7:12}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-074-3}, ISSN = {1868-8969}, year = {2018}, volume = {105}, editor = {Navarro, Gonzalo and Sankoff, David and Zhu, Binhai}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.7}, URN = {urn:nbn:de:0030-drops-87025}, doi = {10.4230/LIPIcs.CPM.2018.7}, annote = {Keywords: Lempel-Ziv 1977, Matching Statistics, Run-Length Compressed Burrows-Wheeler Transform} }

Document

**Published in:** LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

An Elastic-Degenerate String [Iliopoulus et al., LATA 2017] is a sequence of sets of strings, which was recently proposed as a way to model a set of similar sequences. We give an online algorithm for the Elastic-Degenerate String Matching (EDSM) problem that runs in O(nm sqrt{m log m} + N) time and O(m) working space, where n is the number of elastic degenerate segments of the text, N is the total length of all strings in the text, and m is the length of the pattern. This improves the previous algorithm by Grossi et al. [CPM 2017] that runs in O(nm^2 + N) time.

Kotaro Aoyama, Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster Online Elastic Degenerate String Matching. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 9:1-9:10, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{aoyama_et_al:LIPIcs.CPM.2018.9, author = {Aoyama, Kotaro and Nakashima, Yuto and I, Tomohiro and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki}, title = {{Faster Online Elastic Degenerate String Matching}}, booktitle = {29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)}, pages = {9:1--9:10}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-074-3}, ISSN = {1868-8969}, year = {2018}, volume = {105}, editor = {Navarro, Gonzalo and Sankoff, David and Zhu, Binhai}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.9}, URN = {urn:nbn:de:0030-drops-87016}, doi = {10.4230/LIPIcs.CPM.2018.9}, annote = {Keywords: elastic degenerate pattern matching, boolean convolution} }

Document

**Published in:** LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

We revisit the problem of computing the Lyndon factorization of a string w of length N which is given as a straight line program (SLP) of size n. For this problem, we show a new algorithm which runs in O(P(n, N) + Q(n, N)n log log N) time and O(n log N + S(n, N)) space where P(n, N), S(n,N), Q(n,N) are respectively the pre-processing time, space, and query time of a data structure for longest common extensions (LCE) on SLPs. Our algorithm improves the algorithm proposed by I et al. (TCS '17), and can be more efficient than the O(N)-time solution by Duval (J. Algorithms '83) when w is highly compressible.

Isamu Furuya, Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Lyndon Factorization of Grammar Compressed Texts Revisited. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 24:1-24:10, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{furuya_et_al:LIPIcs.CPM.2018.24, author = {Furuya, Isamu and Nakashima, Yuto and I, Tomohiro and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki}, title = {{Lyndon Factorization of Grammar Compressed Texts Revisited}}, booktitle = {29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)}, pages = {24:1--24:10}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-074-3}, ISSN = {1868-8969}, year = {2018}, volume = {105}, editor = {Navarro, Gonzalo and Sankoff, David and Zhu, Binhai}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.24}, URN = {urn:nbn:de:0030-drops-86855}, doi = {10.4230/LIPIcs.CPM.2018.24}, annote = {Keywords: Lyndon word, Lyndon factorization, Straight line program} }

Document

**Published in:** LIPIcs, Volume 87, 25th Annual European Symposium on Algorithms (ESA 2017)

A grammar compression is a context-free grammar (CFG) deriving a single string deterministically. For an input string of length N over an alphabet of size sigma, the smallest CFG is O(log N)-approximable in the offline setting and O(log N log^* N)-approximable in the online setting. In addition, an information-theoretic lower bound for representing a CFG in Chomsky normal form of n variables is log (n!/n^sigma) + n + o(n) bits. Although there is an online grammar compression algorithm that directly computes the succinct encoding of its output CFG with O(log N log^* N) approximation guarantee, the problem of optimizing its working space has remained open. We propose a fully-online algorithm that requires the fewest bits of working space asymptotically equal to the lower bound in O(N log log n) compression time. In addition we propose several techniques to boost grammar compression and show their efficiency by computational experiments.

Yoshimasa Takabatake, Tomohiro I, and Hiroshi Sakamoto. A Space-Optimal Grammar Compression. In 25th Annual European Symposium on Algorithms (ESA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 87, pp. 67:1-67:15, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{takabatake_et_al:LIPIcs.ESA.2017.67, author = {Takabatake, Yoshimasa and I, Tomohiro and Sakamoto, Hiroshi}, title = {{A Space-Optimal Grammar Compression}}, booktitle = {25th Annual European Symposium on Algorithms (ESA 2017)}, pages = {67:1--67:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-049-1}, ISSN = {1868-8969}, year = {2017}, volume = {87}, editor = {Pruhs, Kirk and Sohler, Christian}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2017.67}, URN = {urn:nbn:de:0030-drops-78640}, doi = {10.4230/LIPIcs.ESA.2017.67}, annote = {Keywords: Grammar compression, fully-online algorithm, succinct data structure} }

Document

**Published in:** LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Given two positions i and j in a string T of length N, a longest common extension (LCE) query asks for the length of the longest common prefix between suffixes beginning at i and j. A compressed LCE data structure stores T in a compressed form while supporting fast LCE queries. In this article we show that the recompression technique is a powerful tool for compressed LCE data structures. We present a new compressed LCE data structure of size O(z lg (N/z)) that supports LCE queries in O(lg N) time, where z is the size of Lempel-Ziv 77 factorization without self-reference of T. Given T as an uncompressed form, we show how to build our data structure in O(N) time and space. Given T as a grammar compressed form, i.e., a straight-line program of size n generating T, we show how to build our data structure in O(n lg (N/n)) time and O(n + z lg (N/z)) space. Our algorithms are deterministic and always return correct answers.

Tomohiro I. Longest Common Extensions with Recompression. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 18:1-18:15, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{i:LIPIcs.CPM.2017.18, author = {I, Tomohiro}, title = {{Longest Common Extensions with Recompression}}, booktitle = {28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)}, pages = {18:1--18:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-039-2}, ISSN = {1868-8969}, year = {2017}, volume = {78}, editor = {K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.18}, URN = {urn:nbn:de:0030-drops-73234}, doi = {10.4230/LIPIcs.CPM.2017.18}, annote = {Keywords: Longest Common Extension (LCE) queries, compressed data structure, grammar compressed strings, Straight-Line Program (SLP)} }

Document

**Published in:** LIPIcs, Volume 58, 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016)

A Longest Common Extension (LCE) query on a text T of length N asks for the length of the longest common prefix of suffixes starting at given two positions. We show that the signature encoding G of size w = O(min(z log N log^* M, N)) [Mehlhorn et al., Algorithmica 17(2):183-198, 1997] of T, which can be seen as a compressed representation of T, has a capability to support LCE queries in O(log N + log ell log^* M) time, where ell is the answer to the query, z is the size of the Lempel-Ziv77 (LZ77) factorization of T, and M >= 4N is an integer that can be handled in constant time under word RAM model. In compressed space, this is the fastest deterministic LCE data structure in many cases. Moreover, G can be enhanced to support efficient update operations: After processing G in O(w f_A) time, we can insert/delete any (sub)string of length y into/from an arbitrary position of T in O((y + log Nlog^* M) f_A) time, where f_A = O(min{ (loglog M loglog w)/(logloglog M), sqrt(log w/loglog w)}). This yields the first fully dynamic LCE data structure working in compressed space. We also present efficient construction algorithms from various types of inputs: We can construct G in O(N f_A) time from uncompressed string T; in O(n loglog (n log^* M) log N log^* M) time from grammar-compressed string T represented by a straight-line program of size n; and in O(z f_A log N log^* M) time from LZ77-compressed string T with z factors. On top of the above contributions, we show several applications of our data structures which improve previous best known results on grammar-compressed string processing.

Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Fully Dynamic Data Structure for LCE Queries in Compressed Space. In 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 58, pp. 72:1-72:14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{nishimoto_et_al:LIPIcs.MFCS.2016.72, author = {Nishimoto, Takaaki and I, Tomohiro and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki}, title = {{Fully Dynamic Data Structure for LCE Queries in Compressed Space}}, booktitle = {41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016)}, pages = {72:1--72:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-016-3}, ISSN = {1868-8969}, year = {2016}, volume = {58}, editor = {Faliszewski, Piotr and Muscholl, Anca and Niedermeier, Rolf}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2016.72}, URN = {urn:nbn:de:0030-drops-65045}, doi = {10.4230/LIPIcs.MFCS.2016.72}, annote = {Keywords: dynamic texts, longest common extension (LCE) queries, straight-line program} }

Document

**Published in:** LIPIcs, Volume 54, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)

Given a string S of n symbols, a longest common extension query LCE(i,j) asks for the length of the longest common prefix of the $i$th and $j$th suffixes of S. LCE queries have several important applications in string processing, perhaps most notably to suffix sorting. Recently, Bille et al. (J. Discrete Algorithms 25:42-50, 2014, Proc. CPM 2015:65-76) described several data structures for answering LCE queries that offers a space-time trade-off between data structure size and query time. In particular, for a parameter 1 <= tau <= n, their best deterministic solution is a data structure of size O(n/tau) which allows LCE queries to be answered in O(tau) time. However, the construction time for all deterministic versions of their data structure is quadratic in n. In this paper, we propose a deterministic solution that achieves a similar space-time trade-off of O(tau * min(log(tau),log(n/tau)) query time using O(n/tau) space, but significantly improve the construction time to O(n*tau).

Yuka Tanimura, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, Simon J. Puglisi, and Masayuki Takeda. Deterministic Sub-Linear Space LCE Data Structures With Efficient Construction. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 1:1-1:10, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{tanimura_et_al:LIPIcs.CPM.2016.1, author = {Tanimura, Yuka and I, Tomohiro and Bannai, Hideo and Inenaga, Shunsuke and Puglisi, Simon J. and Takeda, Masayuki}, title = {{Deterministic Sub-Linear Space LCE Data Structures With Efficient Construction}}, booktitle = {27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)}, pages = {1:1--1:10}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-012-5}, ISSN = {1868-8969}, year = {2016}, volume = {54}, editor = {Grossi, Roberto and Lewenstein, Moshe}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.1}, URN = {urn:nbn:de:0030-drops-60655}, doi = {10.4230/LIPIcs.CPM.2016.1}, annote = {Keywords: longest common extension, longest common prefix, sparse suffix array} }

Document

**Published in:** LIPIcs, Volume 47, 33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016)

For alpha >=1, an alpha-gapped repeat in a word w is a factor uvu of w such that |uv| <= alpha * |u|; the two occurrences of a factor u in such a repeat are called arms. Such a repeat is called maximal if its arms cannot be extended simultaneously with the same symbol to the right nor to the left. We show that the number of all maximal alpha-gapped repeats occurring in words of length n is upper bounded by 18 * alpha * n, allowing us to construct an algorithm finding all maximal alpha-gapped repeats of a word on an integer alphabet of size n^{O}(1)} in {O}(alpha * n) time. This result is optimal as there are words that have Theta(alpha * n) maximal alpha-gapped repeats. Our techniques can be extended to get comparable results in the case of alpha-gapped palindromes, i.e., factors uvu^{T} with |uv| <= alpha |u|.

Pawel Gawrychowski, Tomohiro I, Shunsuke Inenaga, Dominik Köppl, and Florin Manea. Efficiently Finding All Maximal alpha-gapped Repeats. In 33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 47, pp. 39:1-39:14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.STACS.2016.39, author = {Gawrychowski, Pawel and I, Tomohiro and Inenaga, Shunsuke and K\"{o}ppl, Dominik and Manea, Florin}, title = {{Efficiently Finding All Maximal alpha-gapped Repeats}}, booktitle = {33rd Symposium on Theoretical Aspects of Computer Science (STACS 2016)}, pages = {39:1--39:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-001-9}, ISSN = {1868-8969}, year = {2016}, volume = {47}, editor = {Ollinger, Nicolas and Vollmer, Heribert}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2016.39}, URN = {urn:nbn:de:0030-drops-57408}, doi = {10.4230/LIPIcs.STACS.2016.39}, annote = {Keywords: combinatorics on words, counting algorithms} }

Document

**Published in:** LIPIcs, Volume 25, 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014)

The sparse suffix sorting problem is to sort b=o(n) arbitrary suffixes of a string of length n using o(n) words of space in addition to the string. We present an O(n) time Monte Carlo algorithm using O(b.log(b)) space and an O(n.log(b)) time Las Vegas algorithm using O(b) space. This is a significant improvement over the best prior solutions of [Bille et al., ICALP 2013]: a Monte Carlo algorithm running in O(n.log(b)) time and O(b^(1+e)) space or O(n.log^2(b)) time and O(b) space, and a Las Vegas algorithm running in O(n.log^2(b)+b^2.log(b)) time and O(b) space. All the above results are obtained with high probability not just in expectation.

Tomohiro I, Juha Kärkkäinen, and Dominik Kempa. Faster Sparse Suffix Sorting. In 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014). Leibniz International Proceedings in Informatics (LIPIcs), Volume 25, pp. 386-396, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2014)

Copy BibTex To Clipboard

@InProceedings{i_et_al:LIPIcs.STACS.2014.386, author = {I, Tomohiro and K\"{a}rkk\"{a}inen, Juha and Kempa, Dominik}, title = {{Faster Sparse Suffix Sorting}}, booktitle = {31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014)}, pages = {386--396}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-939897-65-1}, ISSN = {1868-8969}, year = {2014}, volume = {25}, editor = {Mayr, Ernst W. and Portier, Natacha}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2014.386}, URN = {urn:nbn:de:0030-drops-44738}, doi = {10.4230/LIPIcs.STACS.2014.386}, annote = {Keywords: string algorithms, sparse suffix sorting, sparse suffix trees, Karp-Rabin fingerprints, space-time tradeoffs} }

Document

**Published in:** LIPIcs, Volume 25, 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014)

We present a new on-line algorithm for computing the Lempel-Ziv factorization of a string that runs in O(N.log(N)) time and uses only O(N.log(s)) bits of working space, where N is the length of the string and s is the size of the alphabet. This is a notable improvement compared to the performance of previous on-line algorithms using the same order of working space but running in either O(N.log^3(N)) time [Okanohara and Sadakane, 2009] or O(N.log^2(N)) time [Starikovskaya, 2012]. The key to our new algorithm is in the utilization of an elegant but less popular index structure called Directed Acyclic Word Graphs, or DAWGs [Blumer et al., 1985]. We also present an opportunistic variant of our algorithm, which, given the run length encoding of size m of a string of length N, computes the Lempel-Ziv factorization of the string on-line, in O(m.min{log(log(m)).log(log(N))/(log(log(log(N)))), (log(m))^{1/2}/(log(log(m)))^{1/2})}) time and O(m.log(N)) bits of space.

Jun'ichi Yamamoto, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, and Masayuki Takeda. Faster Compact On-Line Lempel-Ziv Factorization. In 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014). Leibniz International Proceedings in Informatics (LIPIcs), Volume 25, pp. 675-686, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2014)

Copy BibTex To Clipboard

@InProceedings{yamamoto_et_al:LIPIcs.STACS.2014.675, author = {Yamamoto, Jun'ichi and I, Tomohiro and Bannai, Hideo and Inenaga, Shunsuke and Takeda, Masayuki}, title = {{Faster Compact On-Line Lempel-Ziv Factorization}}, booktitle = {31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014)}, pages = {675--686}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-939897-65-1}, ISSN = {1868-8969}, year = {2014}, volume = {25}, editor = {Mayr, Ernst W. and Portier, Natacha}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2014.675}, URN = {urn:nbn:de:0030-drops-44976}, doi = {10.4230/LIPIcs.STACS.2014.675}, annote = {Keywords: Lempel-Ziv Factorization, String Index} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail