Document

**Published in:** LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)

This paper investigates the (quasi-)periodicity of a string when the string is edited. A string C is called a cover (as known as a quasi-period) of a string T if each character of T lies within some occurrence of C. By definition, a cover of T must be a border of T; that is, it occurs both as a prefix and as a suffix of T. In this paper, we focus on the changes in the longest border and the shortest cover of a string when the string is edited only once. We propose a data structure of size O(n) that computes the longest border and the shortest cover of the string in O(𝓁 log n) time after an edit operation (either insertion, deletion, or substitution of some string) is applied to the input string T of length n, where 𝓁 is the length of the string being inserted or substituted. The data structure can be constructed in O(n) time given string T.

Kazuki Mitani, Takuya Mieno, Kazuhisa Seto, and Takashi Horiyama. Shortest Cover After Edit. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 24:1-24:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{mitani_et_al:LIPIcs.CPM.2024.24, author = {Mitani, Kazuki and Mieno, Takuya and Seto, Kazuhisa and Horiyama, Takashi}, title = {{Shortest Cover After Edit}}, booktitle = {35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)}, pages = {24:1--24:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-326-3}, ISSN = {1868-8969}, year = {2024}, volume = {296}, editor = {Inenaga, Shunsuke and Puglisi, Simon J.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.24}, URN = {urn:nbn:de:0030-drops-201345}, doi = {10.4230/LIPIcs.CPM.2024.24}, annote = {Keywords: string algorithm, border, cover, quasi-periodicity, dynamic string} }

Document

**Published in:** LIPIcs, Volume 248, 33rd International Symposium on Algorithms and Computation (ISAAC 2022)

A trie 𝒯 is a rooted tree such that each edge is labeled by a single character from the alphabet, and the labels of out-going edges from the same node are mutually distinct. Given a trie 𝒯 with n edges, we show how to compute all distinct palindromes and all maximal palindromes on 𝒯 in O(n) time, in the case of integer alphabets of size polynomial in n. This improves the state-of-the-art O(n log h)-time algorithms by Funakoshi et al. [PSC 2019], where h is the height of 𝒯. Using our new algorithms, the eertree with suffix links for a given trie 𝒯 can readily be obtained in O(n) time. Further, our trie-based O(n)-space data structure allows us to report all distinct palindromes and maximal palindromes in a query string represented in the trie 𝒯, in output optimal time. This is an improvement over an existing (naïve) solution that precomputes and stores all distinct palindromes and maximal palindromes for each and every string in the trie 𝒯 separately, using a total O(n²) preprocessing time and space, and reports them in output optimal time upon query.

Takuya Mieno, Mitsuru Funakoshi, and Shunsuke Inenaga. Computing Palindromes on a Trie in Linear Time. In 33rd International Symposium on Algorithms and Computation (ISAAC 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 248, pp. 15:1-15:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{mieno_et_al:LIPIcs.ISAAC.2022.15, author = {Mieno, Takuya and Funakoshi, Mitsuru and Inenaga, Shunsuke}, title = {{Computing Palindromes on a Trie in Linear Time}}, booktitle = {33rd International Symposium on Algorithms and Computation (ISAAC 2022)}, pages = {15:1--15:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-258-7}, ISSN = {1868-8969}, year = {2022}, volume = {248}, editor = {Bae, Sang Won and Park, Heejin}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2022.15}, URN = {urn:nbn:de:0030-drops-173006}, doi = {10.4230/LIPIcs.ISAAC.2022.15}, annote = {Keywords: palindromes, suffix trees, tries, labeled trees, eertrees} }

Document

**Published in:** LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Park et al. [TCS 2020] observed that the similarity between two (numerical) strings can be captured by the Cartesian trees: The Cartesian tree of a string is a binary tree recursively constructed by picking up the smallest value of the string as the root of the tree. Two strings of equal length are said to Cartesian-tree match if their Cartesian trees are isomorphic. Park et al. [TCS 2020] introduced the following Cartesian tree substring matching (CTMStr) problem: Given a text string T of length n and a pattern string of length m, find every consecutive substring S = T[i..j] of a text string T such that S and P Cartesian-tree match. They showed how to solve this problem in Õ(n+m) time. In this paper, we introduce the Cartesian tree subsequence matching (CTMSeq) problem, that asks to find every minimal substring S = T[i..j] of T such that S contains a subsequence S' which Cartesian-tree matches P. We prove that the CTMSeq problem can be solved efficiently, in O(m n p(n)) time, where p(n) denotes the update/query time for dynamic predecessor queries. By using a suitable dynamic predecessor data structure, we obtain O(mn log log n)-time and O(n log m)-space solution for CTMSeq. This contrasts CTMSeq with closely related order-preserving subsequence matching (OPMSeq) which was shown to be NP-hard by Bose et al. [IPL 1998].

Tsubasa Oizumi, Takeshi Kai, Takuya Mieno, Shunsuke Inenaga, and Hiroki Arimura. Cartesian Tree Subsequence Matching. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 14:1-14:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{oizumi_et_al:LIPIcs.CPM.2022.14, author = {Oizumi, Tsubasa and Kai, Takeshi and Mieno, Takuya and Inenaga, Shunsuke and Arimura, Hiroki}, title = {{Cartesian Tree Subsequence Matching}}, booktitle = {33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)}, pages = {14:1--14:18}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-234-1}, ISSN = {1868-8969}, year = {2022}, volume = {223}, editor = {Bannai, Hideo and Holub, Jan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.14}, URN = {urn:nbn:de:0030-drops-161414}, doi = {10.4230/LIPIcs.CPM.2022.14}, annote = {Keywords: string algorithms, pattern matching, Cartesian tree subsequence matching, order preserving matching, episode matching} }

Document

**Published in:** LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Grammar-based compression is a loss-less data compression scheme that represents a given string w by a context-free grammar that generates only w. While computing the smallest grammar which generates a given string w is NP-hard in general, a number of polynomial-time grammar-based compressors which work well in practice have been proposed. RePair, proposed by Larsson and Moffat in 1999, is a grammar-based compressor which recursively replaces all possible occurrences of a most frequently occurring bigrams in the string. Since there can be multiple choices of the most frequent bigrams to replace, different implementations of RePair can result in different grammars. In this paper, we show that the smallest grammars generating the Fibonacci words F_k can be completely characterized by RePair, where F_k denotes the k-th Fibonacci word. Namely, all grammars for F_k generated by any implementation of RePair are the smallest grammars for F_k, and no other grammars can be the smallest for F_k. To the best of our knowledge, Fibonacci words are the first non-trivial infinite family of strings for which RePair is optimal.

Takuya Mieno, Shunsuke Inenaga, and Takashi Horiyama. {RePair} Grammars Are the Smallest Grammars for Fibonacci Words. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 26:1-26:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{mieno_et_al:LIPIcs.CPM.2022.26, author = {Mieno, Takuya and Inenaga, Shunsuke and Horiyama, Takashi}, title = {{\{RePair\} Grammars Are the Smallest Grammars for Fibonacci Words}}, booktitle = {33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)}, pages = {26:1--26:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-234-1}, ISSN = {1868-8969}, year = {2022}, volume = {223}, editor = {Bannai, Hideo and Holub, Jan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.26}, URN = {urn:nbn:de:0030-drops-161530}, doi = {10.4230/LIPIcs.CPM.2022.26}, annote = {Keywords: grammar based compression, Fibonacci words, RePair, smallest grammar problem} }

Document

**Published in:** LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

A string w is called a minimal absent word for another string T if w does not occur (as a substring) in T and all proper substrings of w occur in T. State-of-the-art data structures for reporting the set MAW(T) of MAWs from a given string T of length n require O(n) space, can be built in O(n) time, and can report all MAWs in O(|MAW(T)|) time upon a query. This paper initiates the problem of computing MAWs from a compressed representation of a string. In particular, we focus on the most basic compressed representation of a string, run-length encoding (RLE), which represents each maximal run of the same characters a by a^p where p is the length of the run. Let m be the RLE-size of string T. After categorizing the MAWs into five disjoint sets ℳ₁, ℳ₂, ℳ₃, ℳ₄, ℳ₅ using RLE, we present matching upper and lower bounds for the number of MAWs in ℳ_i for i = 1,2,4,5 in terms of RLE-size m, except for ℳ₃ whose size is unbounded by m. We then present a compact O(m)-space data structure that can report all MAWs in optimal O(|MAW(T)|) time.

Tooru Akagi, Kouta Okabe, Takuya Mieno, Yuto Nakashima, and Shunsuke Inenaga. Minimal Absent Words on Run-Length Encoded Strings. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 27:1-27:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{akagi_et_al:LIPIcs.CPM.2022.27, author = {Akagi, Tooru and Okabe, Kouta and Mieno, Takuya and Nakashima, Yuto and Inenaga, Shunsuke}, title = {{Minimal Absent Words on Run-Length Encoded Strings}}, booktitle = {33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)}, pages = {27:1--27:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-234-1}, ISSN = {1868-8969}, year = {2022}, volume = {223}, editor = {Bannai, Hideo and Holub, Jan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.27}, URN = {urn:nbn:de:0030-drops-161545}, doi = {10.4230/LIPIcs.CPM.2022.27}, annote = {Keywords: string algorithms, combinatorics on words, minimal absent words, run-length encoding} }

Document

**Published in:** LIPIcs, Volume 191, 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)

Let W be a string of length n over an alphabet Σ, k be a positive integer, and 𝒮 be a set of length-k substrings of W. The ETFS problem (Edit distance, Total order, Frequency, Sanitization) asks us to construct a string X_ED such that: (i) no string of 𝒮 occurs in X_ED; (ii) the order of all other length-k substrings over Σ (and thus the frequency) is the same in W and in X_ED; and (iii) X_ED has minimal edit distance to W. When W represents an individual’s data and 𝒮 represents a set of confidential patterns, the ETFS problem asks for transforming W to preserve its privacy and its utility [Bernardini et al., ECML PKDD 2019].
ETFS can be solved in 𝒪(n²k) time [Bernardini et al., CPM 2020]. The same paper shows that ETFS cannot be solved in 𝒪(n^{2-δ}) time, for any δ > 0, unless the Strong Exponential Time Hypothesis (SETH) is false. Our main results can be summarized as follows:
- An 𝒪(n²log²k)-time algorithm to solve ETFS.
- An 𝒪(n²log²n)-time algorithm to solve AETFS (Arbitrary lengths, Edit distance, Total order, Frequency, Sanitization), a generalization of ETFS in which the elements of 𝒮 can have arbitrary lengths. Our algorithms are thus optimal up to subpolynomial factors, unless SETH fails.
In order to arrive at these results, we develop new techniques for computing a variant of the standard dynamic programming (DP) table for edit distance. In particular, we simulate the DP table computation using a directed acyclic graph in which every node is assigned to a smaller DP table. We then focus on redundancy in these DP tables and exploit a tabulation technique according to dyadic intervals to obtain an optimal alignment in 𝒪̃(n²) total time. Beyond string sanitization, our techniques may inspire solutions to other problems related to regular expressions or context-free grammars.

Takuya Mieno, Solon P. Pissis, Leen Stougie, and Michelle Sweering. String Sanitization Under Edit Distance: Improved and Generalized. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 19:1-19:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{mieno_et_al:LIPIcs.CPM.2021.19, author = {Mieno, Takuya and Pissis, Solon P. and Stougie, Leen and Sweering, Michelle}, title = {{String Sanitization Under Edit Distance: Improved and Generalized}}, booktitle = {32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)}, pages = {19:1--19:18}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-186-3}, ISSN = {1868-8969}, year = {2021}, volume = {191}, editor = {Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2021.19}, URN = {urn:nbn:de:0030-drops-139709}, doi = {10.4230/LIPIcs.CPM.2021.19}, annote = {Keywords: string algorithms, data sanitization, edit distance, dynamic programming} }

Document

**Published in:** LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

A substring Q of a string S is called a shortest unique substring (SUS) for interval [s,t] in S, if Q occurs exactly once in S, this occurrence of Q contains interval [s,t], and every substring of S which contains interval [s,t] and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query interval [s,t] all the SUSs for interval [s,t] can be answered quickly. When s = t, we call the SUSs for [s, t] as point SUSs, and when s <= t, we call the SUSs for [s, t] as interval SUSs. There exist optimal O(n)-time preprocessing scheme which answers queries in optimal O(k) time for both point and interval SUSs, where n is the length of S and k is the number of outputs for a given query. In this paper, we reveal structural, combinatorial properties underlying the SUS problem: Namely, we show that the number of intervals in S that correspond to point SUSs for all query positions in S is less than 1.5n, and show that this is a matching upper and lower bound. Also, we consider the maximum number of intervals in S that correspond to interval SUSs for all query intervals in S.

Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Tight Bounds on the Maximum Number of Shortest Unique Substrings. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 24:1-24:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{mieno_et_al:LIPIcs.CPM.2017.24, author = {Mieno, Takuya and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki}, title = {{Tight Bounds on the Maximum Number of Shortest Unique Substrings}}, booktitle = {28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)}, pages = {24:1--24:11}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-039-2}, ISSN = {1868-8969}, year = {2017}, volume = {78}, editor = {K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.24}, URN = {urn:nbn:de:0030-drops-73460}, doi = {10.4230/LIPIcs.CPM.2017.24}, annote = {Keywords: shortest unique substrings, maximal unique substrings} }

Document

**Published in:** LIPIcs, Volume 58, 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016)

We consider the problem of answering shortest unique substring (SUS) queries on run-length encoded strings. For a string S, a unique substring u = S[i..j] is said to be a shortest unique substring (SUS) of S containing an interval [s, t] (i <= s <= t <= j) if for any i' <= s <= t <= j' with j-i > j'-i', S[i'..j'] occurs at least twice in S.
Given a run-length encoding of size m of a string of length N, we show that we can construct a data structure of size O(m+pi_s(N, m)) in O(m log m + pi_c(N, m)) time such that queries can be answered in
O(pi_q(N, m) + k) time, where k is the size of the output (the number of SUSs), and pi_s(N,m), pi_c(N,m), pi_q(N,m) are, respectively, the size, construction time, and query time for a predecessor/successor query data structure of m elements for the universe of [1,N]. Using the data structure by Beam and Fich (JCSS 2002), this results in a data structure of O(m) space that is constructed in O(m log m) time, and answers queries in O(sqrt(log m/loglog m)+k) time.

Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Shortest Unique Substring Queries on Run-Length Encoded Strings. In 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 58, pp. 69:1-69:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{mieno_et_al:LIPIcs.MFCS.2016.69, author = {Mieno, Takuya and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki}, title = {{Shortest Unique Substring Queries on Run-Length Encoded Strings}}, booktitle = {41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016)}, pages = {69:1--69:11}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-016-3}, ISSN = {1868-8969}, year = {2016}, volume = {58}, editor = {Faliszewski, Piotr and Muscholl, Anca and Niedermeier, Rolf}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2016.69}, URN = {urn:nbn:de:0030-drops-65033}, doi = {10.4230/LIPIcs.MFCS.2016.69}, annote = {Keywords: string algorithms, shortest unique substring, run-length encoding} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail