DROPS

Document

DOI: 10.4230/LIPIcs.ISAAC.2019.40

An Improved Data Structure for Left-Right Maximal Generic Words Problem

Authors: Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 149, 30th International Symposium on Algorithms and Computation (ISAAC 2019)

Abstract

For a set D of documents and a positive integer d, a string w is said to be d-left-right maximal, if (1) w occurs in at least d documents in D, and (2) any proper superstring of w occurs in less than d documents. The left-right-maximal generic words problem is, given a set D of documents, to preprocess D so that for any string p and for any positive integer d, all the superstrings of p that are d-left-right maximal can be answered quickly. In this paper, we present an O(n log m) space data structure (in words) which answers queries in O(|p| + o log log m) time, where n is the total length of documents in D, m is the number of documents in D and o is the number of outputs. Our solution improves the previous one by Nishimoto et al. (PSC 2015), which uses an O(n log n) space data structure answering queries in O(|p|+ r * log n + o * log^2 n) time, where r is the number of right-extensions q of p occurring in at least d documents such that any proper right extension of q occurs in less than d documents.

Cite as

Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. An Improved Data Structure for Left-Right Maximal Generic Words Problem. In 30th International Symposium on Algorithms and Computation (ISAAC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 149, pp. 40:1-40:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{fujishige_et_al:LIPIcs.ISAAC.2019.40,
  author =	{Fujishige, Yuta and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{An Improved Data Structure for Left-Right Maximal Generic Words Problem}},
  booktitle =	{30th International Symposium on Algorithms and Computation (ISAAC 2019)},
  pages =	{40:1--40:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-130-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{149},
  editor =	{Lu, Pinyan and Zhang, Guochuan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2019.40},
  URN =		{urn:nbn:de:0030-drops-115366},
  doi =		{10.4230/LIPIcs.ISAAC.2019.40},
  annote =	{Keywords: generic words, suffix trees, string processing algorithms}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2017.33

Almost Linear Time Computation of Maximal Repetitions in Run Length Encoded Strings

Authors: Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 92, 28th International Symposium on Algorithms and Computation (ISAAC 2017)

Abstract

We consider the problem of computing all maximal repetitions contained in a string that is given in run-length encoding. Given a run-length encoding of a string, we show that the maximum number of maximal repetitions contained in the string is at most m+k-1, where m is the size of the run-length encoding, and k is the number of run-length factors whose exponent is at least 2. We also show an algorithm for computing all maximal repetitions in O(m \alpha(m)) time and O(m) space, where \alpha denotes the inverse Ackermann function.

Cite as

Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Almost Linear Time Computation of Maximal Repetitions in Run Length Encoded Strings. In 28th International Symposium on Algorithms and Computation (ISAAC 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 92, pp. 33:1-33:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{fujishige_et_al:LIPIcs.ISAAC.2017.33,
  author =	{Fujishige, Yuta and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Almost Linear Time Computation of Maximal Repetitions in Run Length Encoded Strings}},
  booktitle =	{28th International Symposium on Algorithms and Computation (ISAAC 2017)},
  pages =	{33:1--33:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-054-5},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{92},
  editor =	{Okamoto, Yoshio and Tokuyama, Takeshi},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2017.33},
  URN =		{urn:nbn:de:0030-drops-82610},
  doi =		{10.4230/LIPIcs.ISAAC.2017.33},
  annote =	{Keywords: maximal repetitions,run length encoding}
}

Document

DOI: 10.4230/LIPIcs.CPM.2017.20

Faster STR-IC-LCS Computation via RLE

Authors: Keita Kuboi, Yuta Fujishige, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Abstract

The constrained LCS problem asks one to find a longest common subsequence of two input strings A and B with some constraints. The STR-IC-LCS problem is a variant of the constrained LCS problem, where the solution must include a given constraint string C as a substring. Given two strings A and B of respective lengths M and N, and a constraint string C of length at most min{M, N}, the best known algorithm for the STR-IC-LCS problem, proposed by Deorowicz (Inf. Process. Lett., 11:423-426, 2012), runs in O(MN) time. In this work, we present an O(mN + nM)-time solution to the STR-IC-LCS problem, where m and n denote the sizes of the run-length encodings of A and B, respectively. Since m <= M and n <= N always hold, our algorithm is always as fast as Deorowicz's algorithm, and is faster when input strings are compressible via RLE.

Cite as

Keita Kuboi, Yuta Fujishige, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster STR-IC-LCS Computation via RLE. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 20:1-20:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{kuboi_et_al:LIPIcs.CPM.2017.20,
  author =	{Kuboi, Keita and Fujishige, Yuta and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Faster STR-IC-LCS Computation via RLE}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{20:1--20:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.20},
  URN =		{urn:nbn:de:0030-drops-73335},
  doi =		{10.4230/LIPIcs.CPM.2017.20},
  annote =	{Keywords: longest common subsequence, STR-IC-LCS, run-length encoding}
}

Document

DOI: 10.4230/LIPIcs.MFCS.2016.38

Computing DAWGs and Minimal Absent Words in Linear Time for Integer Alphabets

Authors: Yuta Fujishige, Yuki Tsujimaru, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

Published in: LIPIcs, Volume 58, 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016)

Abstract

The directed acyclic word graph (DAWG) of a string y is the smallest (partial) DFA which recognizes all suffixes of y and has only O(n) nodes and edges. We present the first O(n)-time algorithm for computing the DAWG of a given string y of length n over an integer alphabet of polynomial size in n. We also show that a straightforward modification to our DAWG construction algorithm leads to the first O(n)-time algorithm for constructing the affix tree of a given string y over an integer alphabet. Affix trees are a text indexing structure supporting bidirectional pattern searches. As an application to our O(n)-time DAWG construction algorithm, we show that the set MAW(y) of all minimal absent words of y can be computed in optimal O(n + |MAW(y)|) time and O(n) working space for integer alphabets.

Cite as

Yuta Fujishige, Yuki Tsujimaru, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Computing DAWGs and Minimal Absent Words in Linear Time for Integer Alphabets. In 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 58, pp. 38:1-38:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{fujishige_et_al:LIPIcs.MFCS.2016.38,
  author =	{Fujishige, Yuta and Tsujimaru, Yuki and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Computing DAWGs and Minimal Absent Words in Linear Time for Integer Alphabets}},
  booktitle =	{41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016)},
  pages =	{38:1--38:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-016-3},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{58},
  editor =	{Faliszewski, Piotr and Muscholl, Anca and Niedermeier, Rolf},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2016.38},
  URN =		{urn:nbn:de:0030-drops-64528},
  doi =		{10.4230/LIPIcs.MFCS.2016.38},
  annote =	{Keywords: string algorithms, DAWGs, suffix trees, minimal absent words}
}

Search Results

Documents authored by Fujishige, Yuta

An Improved Data Structure for Left-Right Maximal Generic Words Problem

Abstract

Cite as

Almost Linear Time Computation of Maximal Repetitions in Run Length Encoded Strings

Abstract

Cite as

Faster STR-IC-LCS Computation via RLE

Abstract

Cite as

Computing DAWGs and Minimal Absent Words in Linear Time for Integer Alphabets

Abstract

Cite as