DROPS

Document

DOI: 10.4230/LIPIcs.ESA.2024.18

Height-Bounded Lempel-Ziv Encodings

Authors: Hideo Bannai, Mitsuru Funakoshi, Diptarama Hendrian, Myuji Matsuda, and Simon J. Puglisi

Published in: LIPIcs, Volume 308, 32nd Annual European Symposium on Algorithms (ESA 2024)

Abstract

We introduce height-bounded LZ encodings (LZHB), a new family of compressed representations that are variants of Lempel-Ziv parsings with a focus on bounding the worst-case access time to arbitrary positions in the text directly via the compressed representation. An LZ-like encoding is a partitioning of the string into phrases of length 1, which can be encoded literally, or phrases of length at least 2, which have a previous occurrence in the string and can be encoded by the position and length of that occurrence. An LZ-like encoding induces an implicit referencing forest on the set of positions of the string. An LZHB encoding is an LZ-like encoding where the height of the implicit referencing forest is bounded. An LZHB encoding with height constraint h allows access to an arbitrary position of the underlying text using O(h) predecessor queries. While computing the optimal (i.e., smallest) LZHB encoding efficiently seems to be difficult [Cicalese & Ugazio 2024, arXiv, to appear at DLT 2024], we give the first linear time algorithm for strings over a constant size alphabet that computes the greedy LZHB encoding, i.e., the string is processed from beginning to end, and the longest prefix of the remaining string that can satisfy the height constraint is taken as the next phrase. Our algorithms significantly improve, both theoretically and practically, on the very recently and independently proposed algorithms by Lipták et al. (CPM 2024). We also analyze the size of height-bounded LZ encodings in the context of repetitiveness measures, and show that there exists a constant c such that the size ẑ_{HB(c log n)} of the optimal LZHB encoding whose height is bounded by c log n for any string of length n is O(ĝ_{rl}), where ĝ_{rl} is the size of the smallest run-length grammar. Furthermore, we show that there exists a family of strings such that ẑ_{HB(c log n)} = o(ĝ_{rl}), thus making ẑ_{HB(c log n)} one of the smallest known repetitiveness measures for which O(polylog n) time access is possible using linear (O(ẑ_{HB(c log n)})) space.
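
The referencing forest and the access procedure described in this abstract can be illustrated with a minimal Python sketch. This is not the paper's linear-time greedy algorithm. It assumes a hypothetical phrase representation, ("lit", c) for a literal character and ("copy", src, length) for a phrase copied from 0-indexed position src, and it assumes that literal positions have height 0. The function names are illustrative only.

```python
from bisect import bisect_right

def phrase_starts(phrases):
    """Return the starting position of every phrase and the text length."""
    starts, pos = [], 0
    for ph in phrases:
        starts.append(pos)
        pos += 1 if ph[0] == "lit" else ph[2]
    return starts, pos

def heights(phrases):
    """Height of each text position in the implicit referencing forest."""
    starts, n = phrase_starts(phrases)
    h = [0] * n  # literal positions are roots (assumed height 0)
    for start, ph in zip(starts, phrases):
        if ph[0] == "copy":
            _, src, length = ph
            for k in range(length):
                # Position start+k refers to src+k, which lies strictly to
                # the left, so its height is already known (this also
                # covers self-overlapping copies).
                h[start + k] = h[src + k] + 1
    return h

def access(phrases, starts, i):
    """Return the character at position i by following references."""
    while True:
        j = bisect_right(starts, i) - 1  # predecessor query: phrase holding i
        ph = phrases[j]
        if ph[0] == "lit":
            return ph[1]
        i = ph[1] + (i - starts[j])      # jump to the referenced position

# "abababab" as two literals followed by one self-overlapping copy.
phrases = [("lit", "a"), ("lit", "b"), ("copy", 0, 6)]
starts, n = phrase_starts(phrases)
print(heights(phrases))                                      # [0, 0, 1, 1, 2, 2, 3, 3]
print("".join(access(phrases, starts, i) for i in range(n)))  # abababab
```

Each loop iteration in access performs one predecessor query and moves one step up the forest, so a position of height h is resolved after at most h + 1 queries.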

Cite as

Hideo Bannai, Mitsuru Funakoshi, Diptarama Hendrian, Myuji Matsuda, and Simon J. Puglisi. Height-Bounded Lempel-Ziv Encodings. In 32nd Annual European Symposium on Algorithms (ESA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 308, pp. 18:1-18:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@InProceedings{bannai_et_al:LIPIcs.ESA.2024.18,
  author =	{Bannai, Hideo and Funakoshi, Mitsuru and Hendrian, Diptarama and Matsuda, Myuji and Puglisi, Simon J.},
  title =	{{Height-Bounded Lempel-Ziv Encodings}},
  booktitle =	{32nd Annual European Symposium on Algorithms (ESA 2024)},
  pages =	{18:1--18:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-338-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{308},
  editor =	{Chan, Timothy and Fischer, Johannes and Iacono, John and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2024.18},
  URN =		{urn:nbn:de:0030-drops-210899},
  doi =		{10.4230/LIPIcs.ESA.2024.18},
  annote =	{Keywords: Lempel-Ziv parsing, data compression}
}

Document

DOI: 10.4230/LIPIcs.MFCS.2024.75

Edit and Alphabet-Ordering Sensitivity of Lex-Parse

Authors: Yuto Nakashima, Dominik Köppl, Mitsuru Funakoshi, Shunsuke Inenaga, and Hideo Bannai

Published in: LIPIcs, Volume 306, 49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024)

Abstract

We investigate the compression sensitivity [Akagi et al., 2023] of lex-parse [Navarro et al., 2021] for two operations: (1) single character edit and (2) modification of the alphabet ordering, and give tight upper and lower bounds for both operations (i.e., we show Θ(log n) bounds for strings of length n). For both lower bounds, we use the family of Fibonacci words. For the bounds on edit operations, our analysis makes heavy use of properties of the Lyndon factorization of Fibonacci words to characterize the structure of lex-parse.

Cite as

Yuto Nakashima, Dominik Köppl, Mitsuru Funakoshi, Shunsuke Inenaga, and Hideo Bannai. Edit and Alphabet-Ordering Sensitivity of Lex-Parse. In 49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 306, pp. 75:1-75:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@InProceedings{nakashima_et_al:LIPIcs.MFCS.2024.75,
  author =	{Nakashima, Yuto and K\"{o}ppl, Dominik and Funakoshi, Mitsuru and Inenaga, Shunsuke and Bannai, Hideo},
  title =	{{Edit and Alphabet-Ordering Sensitivity of Lex-Parse}},
  booktitle =	{49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024)},
  pages =	{75:1--75:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-335-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{306},
  editor =	{Kr\'{a}lovi\v{c}, Rastislav and Ku\v{c}era, Anton{\'\i}n},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2024.75},
  URN =		{urn:nbn:de:0030-drops-206314},
  doi =		{10.4230/LIPIcs.MFCS.2024.75},
  annote =	{Keywords: Compression sensitivity, Lex-parse, Fibonacci words}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.3

Optimal LZ-End Parsing Is Hard

Authors: Hideo Bannai, Mitsuru Funakoshi, Kazuhiro Kurita, Yuto Nakashima, Kazuhisa Seto, and Takeaki Uno

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

LZ-End is a variant of the well-known Lempel-Ziv parsing family such that each phrase of the parsing has a previous occurrence, with the additional constraint that the previous occurrence must end at the end of a previous phrase. LZ-End was initially proposed as a greedy parsing, where each phrase is determined greedily from left to right, as the longest factor that satisfies the above constraint [Kreft & Navarro, 2010]. In this work, we consider an optimal LZ-End parsing that has the minimum number of phrases in such parsings. We show that a decision version of computing the optimal LZ-End parsing is NP-complete by showing a reduction from the vertex cover problem. Moreover, we give a MAX-SAT formulation for the optimal LZ-End parsing adapting an approach for computing various NP-hard repetitiveness measures recently presented by [Bannai et al., 2022]. We also consider the approximation ratio of the size of greedy LZ-End parsing to the size of the optimal LZ-End parsing, and give a lower bound of the ratio which asymptotically approaches 2.

Cite as

Hideo Bannai, Mitsuru Funakoshi, Kazuhiro Kurita, Yuto Nakashima, Kazuhisa Seto, and Takeaki Uno. Optimal LZ-End Parsing Is Hard. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 3:1-3:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{bannai_et_al:LIPIcs.CPM.2023.3,
  author =	{Bannai, Hideo and Funakoshi, Mitsuru and Kurita, Kazuhiro and Nakashima, Yuto and Seto, Kazuhisa and Uno, Takeaki},
  title =	{{Optimal LZ-End Parsing Is Hard}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{3:1--3:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.3},
  URN =		{urn:nbn:de:0030-drops-179571},
  doi =		{10.4230/LIPIcs.CPM.2023.3},
  annote =	{Keywords: Data Compression, LZ-End, Repetitiveness measures}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2022.15

Computing Palindromes on a Trie in Linear Time

Authors: Takuya Mieno, Mitsuru Funakoshi, and Shunsuke Inenaga

Published in: LIPIcs, Volume 248, 33rd International Symposium on Algorithms and Computation (ISAAC 2022)

Abstract

A trie 𝒯 is a rooted tree such that each edge is labeled by a single character from the alphabet, and the labels of out-going edges from the same node are mutually distinct. Given a trie 𝒯 with n edges, we show how to compute all distinct palindromes and all maximal palindromes on 𝒯 in O(n) time, in the case of integer alphabets of size polynomial in n. This improves the state-of-the-art O(n log h)-time algorithms by Funakoshi et al. [PSC 2019], where h is the height of 𝒯. Using our new algorithms, the eertree with suffix links for a given trie 𝒯 can readily be obtained in O(n) time. Further, our trie-based O(n)-space data structure allows us to report all distinct palindromes and maximal palindromes in a query string represented in the trie 𝒯, in output optimal time. This is an improvement over an existing (naïve) solution that precomputes and stores all distinct palindromes and maximal palindromes for each and every string in the trie 𝒯 separately, using a total O(n²) preprocessing time and space, and reports them in output optimal time upon query.

Cite as

Takuya Mieno, Mitsuru Funakoshi, and Shunsuke Inenaga. Computing Palindromes on a Trie in Linear Time. In 33rd International Symposium on Algorithms and Computation (ISAAC 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 248, pp. 15:1-15:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

@InProceedings{mieno_et_al:LIPIcs.ISAAC.2022.15,
  author =	{Mieno, Takuya and Funakoshi, Mitsuru and Inenaga, Shunsuke},
  title =	{{Computing Palindromes on a Trie in Linear Time}},
  booktitle =	{33rd International Symposium on Algorithms and Computation (ISAAC 2022)},
  pages =	{15:1--15:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-258-7},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{248},
  editor =	{Bae, Sang Won and Park, Heejin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2022.15},
  URN =		{urn:nbn:de:0030-drops-173006},
  doi =		{10.4230/LIPIcs.ISAAC.2022.15},
  annote =	{Keywords: palindromes, suffix trees, tries, labeled trees, eertrees}
}

Document

DOI: 10.4230/LIPIcs.CPM.2020.12

Detecting k-(Sub-)Cadences and Equidistant Subsequence Occurrences

Authors: Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, and Ayumi Shinohara

Published in: LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

Abstract

The equidistant subsequence pattern matching problem is considered. Given a pattern string P and a text string T, we say that P is an equidistant subsequence of T if P is a subsequence of the text such that consecutive symbols of P in the occurrence are equally spaced. The problem of equidistant subsequences can be viewed as a generalization of (sub-)cadences. We give bit-parallel algorithms that yield o(n²) time algorithms for finding k-(sub-)cadences and equidistant subsequences. Furthermore, O(n log² n) and O(n log n) time algorithms are shown for equidistant and Abelian equidistant matching, respectively, for the case |P| = 3. The algorithms make use of a recently introduced technique that can efficiently compute convolutions with linear constraints.
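
To make the definition of an equidistant subsequence concrete, here is a naive Python sketch that enumerates all occurrences by brute force. It is not the bit-parallel or convolution-based method of the paper. The function name and the (start, gap) output format are assumptions chosen for illustration, and the sketch requires a pattern of length at least 2 so that the gap is well defined.

```python
def equidistant_occurrences(pattern, text):
    """List all (start, gap) pairs with text[start + k*gap] == pattern[k].

    Brute force over every start position and every gap >= 1.
    """
    m, n = len(pattern), len(text)
    if m < 2:
        raise ValueError("pattern must have length at least 2")
    found = []
    for start in range(n):
        # Largest gap that keeps the last pattern symbol inside the text.
        max_gap = (n - 1 - start) // (m - 1)
        for gap in range(1, max_gap + 1):
            if all(text[start + k * gap] == pattern[k] for k in range(m)):
                found.append((start, gap))
    return found

print(equidistant_occurrences("abc", "aXbXc"))  # [(0, 2)]
print(equidistant_occurrences("ab", "aab"))     # [(0, 2), (1, 1)]
```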

Cite as

Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, and Ayumi Shinohara. Detecting k-(Sub-)Cadences and Equidistant Subsequence Occurrences. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 12:1-12:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@InProceedings{funakoshi_et_al:LIPIcs.CPM.2020.12,
  author =	{Funakoshi, Mitsuru and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki and Shinohara, Ayumi},
  title =	{{Detecting k-(Sub-)Cadences and Equidistant Subsequence Occurrences}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{12:1--12:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{G{\o}rtz, Inge Li and Weimann, Oren},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.12},
  URN =		{urn:nbn:de:0030-drops-121375},
  doi =		{10.4230/LIPIcs.CPM.2020.12},
  annote =	{Keywords: string algorithms, pattern matching, bit parallelism, subsequences, cadences}
}

Document

DOI: 10.4230/LIPIcs.STACS.2020.30

Non-Rectangular Convolutions and (Sub-)Cadences with Three Elements

Authors: Mitsuru Funakoshi and Julian Pape-Lange

Published in: LIPIcs, Volume 154, 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)

Abstract

The discrete acyclic convolution computes the 2n+1 sums ∑_{i+j=k|(i,j)∈[0,1,2,… ,n]²} a_i b_j in O(n log n) time. By using suitable offsets and setting some of the variables to zero, this method provides a tool to calculate all non-zero sums ∑_{i+j=k|(i,j)∈ P∩ℤ²} a_i b_j in a rectangle P with perimeter p in O(p log p) time. This paper extends this geometric interpretation in order to allow arbitrary convex polygons P with k vertices and perimeter p. Also, this extended algorithm only needs O(k + p(log p)² log k) time. Additionally, this paper presents fast algorithms for counting sub-cadences and cadences with 3 elements using this extended method.
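
For reference, the sums that the discrete acyclic convolution computes can be written out directly. The Python sketch below is the quadratic-time definition only, with an illustrative function name. The O(n log n) bound mentioned in the abstract requires a fast transform (for example an FFT or number theoretic transform), which is not shown here, and the polygon-restricted extension of the paper is not attempted.

```python
def acyclic_convolution(a, b):
    """Return c with c[k] = sum of a[i] * b[j] over all i + j == k."""
    if not a or not b:
        return []
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

# Two sequences with indices 0..2 (n = 2) give 2n + 1 = 5 sums.
print(acyclic_convolution([1, 2, 3], [4, 5, 6]))  # [4, 13, 28, 27, 18]
```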

Cite as

Mitsuru Funakoshi and Julian Pape-Lange. Non-Rectangular Convolutions and (Sub-)Cadences with Three Elements. In 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 154, pp. 30:1-30:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@InProceedings{funakoshi_et_al:LIPIcs.STACS.2020.30,
  author =	{Funakoshi, Mitsuru and Pape-Lange, Julian},
  title =	{{Non-Rectangular Convolutions and (Sub-)Cadences with Three Elements}},
  booktitle =	{37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)},
  pages =	{30:1--30:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-140-5},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{154},
  editor =	{Paul, Christophe and Bl\"{a}ser, Markus},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2020.30},
  URN =		{urn:nbn:de:0030-drops-118911},
  doi =		{10.4230/LIPIcs.STACS.2020.30},
  annote =	{Keywords: discrete acyclic convolutions, string-cadences, geometric algorithms, number theoretic transforms}
}

Document

DOI: 10.4230/LIPIcs.CPM.2019.27

Faster Queries for Longest Substring Palindrome After Block Edit

Authors: Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)

Abstract

Palindromes are important objects in strings which have been extensively studied from combinatorial, algorithmic, and bioinformatics points of view. Manacher [J. ACM 1975] proposed a seminal algorithm that computes the longest substring palindromes (LSPals) of a given string in O(n) time, where n is the length of the string. In this paper, we consider the problem of finding the LSPal after the string is edited. We present an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(l + log log n) time, after a substring in T is replaced by a string of arbitrary length l. This outperforms the query algorithm proposed in our previous work [CPM 2018] that uses O(l + log n) time for each query.
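
The static baseline mentioned in this abstract, computing the length of the longest substring palindrome in O(n) time, can be sketched in Python as follows. This is a common textbook formulation of Manacher's algorithm with interleaved separators, and it does not cover the edit queries that the paper studies. The function name is illustrative.

```python
def longest_palindrome_length(s):
    """Length of the longest palindromic substring of s (Manacher-style)."""
    # Interleave separators so that odd- and even-length palindromes are
    # both centered on a single position of t.
    t = "#" + "#".join(s) + "#"
    n = len(t)
    radius = [0] * n      # radius[i]: palindrome t[i-r .. i+r] with r maximal
    center = right = 0    # palindrome found so far that reaches furthest right
    for i in range(n):
        if i < right:
            # Reuse the radius of the mirror position, clipped to the known range.
            radius[i] = min(right - i, radius[2 * center - i])
        # Extend by explicit character comparisons.
        while (i - radius[i] - 1 >= 0 and i + radius[i] + 1 < n
               and t[i - radius[i] - 1] == t[i + radius[i] + 1]):
            radius[i] += 1
        if i + radius[i] > right:
            center, right = i, i + radius[i]
    # A radius in t equals the length of the corresponding palindrome in s.
    return max(radius)

print(longest_palindrome_length("banana"))  # 5 ("anana")
print(longest_palindrome_length("abba"))    # 4
```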

Cite as

Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster Queries for Longest Substring Palindrome After Block Edit. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 27:1-27:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{funakoshi_et_al:LIPIcs.CPM.2019.27,
  author =	{Funakoshi, Mitsuru and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Faster Queries for Longest Substring Palindrome After Block Edit}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{27:1--27:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and Pissis, Solon P.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.27},
  URN =		{urn:nbn:de:0030-drops-104989},
  doi =		{10.4230/LIPIcs.CPM.2019.27},
  annote =	{Keywords: palindromes, string algorithm, periodicity}
}

Document

DOI: 10.4230/LIPIcs.CPM.2018.12

Longest substring palindrome after edit

Authors: Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

Abstract

It is known that the length of the longest substring palindromes (LSPals) of a given string T of length n can be computed in O(n) time by Manacher's algorithm [J. ACM '75]. In this paper, we consider the problem of finding the LSPal after the string is edited. We present an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(log(min{σ, log n})) time after single character substitution, insertion, or deletion, where σ denotes the number of distinct characters appearing in T. We also propose an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(l + log n) time, after an existing substring in T is replaced by a string of arbitrary length l.

Cite as

Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Longest substring palindrome after edit. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 12:1-12:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

@InProceedings{funakoshi_et_al:LIPIcs.CPM.2018.12,
  author =	{Funakoshi, Mitsuru and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Longest substring palindrome after edit}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{12:1--12:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.12},
  URN =		{urn:nbn:de:0030-drops-86977},
  doi =		{10.4230/LIPIcs.CPM.2018.12},
  annote =	{Keywords: maximal palindromes, edit operations, periodicity, suffix trees}
}

Search Results

Documents authored by Funakoshi, Mitsuru
