Shortest Unique Substring Queries on Run-Length Encoded Strings

Mieno, Takuya; Inenaga, Shunsuke; Bannai, Hideo; Takeda, Masayuki

doi:10.4230/LIPIcs.MFCS.2016.69

File

Author Details

Takuya Mieno

Shunsuke Inenaga

Hideo Bannai

Masayuki Takeda

Cite AsGet BibTex

Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Shortest Unique Substring Queries on Run-Length Encoded Strings. In 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 58, pp. 69:1-69:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)
https://doi.org/10.4230/LIPIcs.MFCS.2016.69

Abstract

We consider the problem of answering shortest unique substring (SUS) queries on run-length encoded strings. For a string S, a unique substring u = S[i..j] is said to be a shortest unique substring (SUS) of S containing an interval [s, t] (i <= s <= t <= j) if for any i' <= s <= t <= j' with j-i > j'-i', S[i'..j'] occurs at least twice in S. Given a run-length encoding of size m of a string of length N, we show that we can construct a data structure of size O(m+pi_s(N, m)) in O(m log m + pi_c(N, m)) time such that queries can be answered in O(pi_q(N, m) + k) time, where k is the size of the output (the number of SUSs), and pi_s(N,m), pi_c(N,m), pi_q(N,m) are, respectively, the size, construction time, and query time for a predecessor/successor query data structure of m elements for the universe of [1,N]. Using the data structure by Beam and Fich (JCSS 2002), this results in a data structure of O(m) space that is constructed in O(m log m) time, and answers queries in O(sqrt(log m/loglog m)+k) time.

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

References

Amihood Amir, Gary Benson, and Martin Farach. Let sleeping files lie: pattern matching in z-compressed files. Journal of Computer and System Sciences, 52(2):299-307, April 1996.
Alberto Apostolico, Gad M. Landau, and Steven Skiena. Matching for run-length encoded strings. J. Complex., 15(1):4-16, March 1999. URL: http://dx.doi.org/10.1006/jcom.1998.0493.
Ora Arbell, Gad M. Landau, and Joseph S.B. Mitchell. Edit distance of run-length encoded strings. Information Processing Letters, 83(6):307-314, 2002. URL: http://dx.doi.org/10.1016/S0020-0190(02)00215-6.
Golnaz Badkobeh, Gabriele Fici, Steve Kroon, and Zsuzsanna Lipták. Binary jumbled string matching for highly run-length compressible texts. Information Processing Letters, 113(17):604-608, 2013. URL: http://dx.doi.org/10.1016/j.ipl.2013.05.007.
Paul Beame and Faith E. Fich. Optimal bounds for the predecessor problem and related problems. Journal of Computer and System Sciences, 65(1):38-72, 2002. URL: http://dx.doi.org/10.1006/jcss.2002.1822.
Michael A. Bender and Martin Farach-Colton. The LCA problem revisited. In Proceedings of the 4th Latin American Symposium on Theoretical Informatics, LATIN 2000, pages 88-94, 2000. URL: http://dl.acm.org/citation.cfm?id=646388.690192.
Horst Bunke and János Csirik. An algorithm for matching run-length coded strings. Computing, 50(4):297-314, 1993.
Horst Bunke and János Csirik. An improved algorithm for computing the edit distance of run-length coded strings. Information Processing Letters, 54(2):93-96, April 1995.
Kuan-Yu Chen, Ping-Hui Hsu, and Kun-Mao Chao. Efficient retrieval of approximate palindromes in a run-length encoded string. Theoretical Computer Science, 432:28-37, 2012. URL: http://dx.doi.org/10.1016/j.tcs.2012.01.023.
Bernhard Haubold, Nora Pierstorff, Friedrich Möller, and Thomas Wiehe. Genome comparison without alignment using shortest unique substrings. BMC Bioinformatics, 6(1):123, 2005.
Wing-Kai Hon, Sharma V. Thankachan, and Bojian Xu. An in-place framework for exact and approximate shortest unique substring queries. In ISAAC 2015, pages 755-767, 2015.
Xiaocheng Hu, Jian Pei, and Yufei Tao. Shortest unique queries on strings. In Proc. SPIRE 2014, pages 161-172, 2014.
Atalay Mert Ileri, M. Oguzhan Külekci, and Bojian Xu. A simple yet time-optimal and linear-space algorithm for shortest unique substring queries. Theor. Comput. Sci., 562:621-633, 2015.
Jin Wook Kim, Amihood Amir, Gad M. Landau, and Kunsoo Park. Computing similarity of run-length encoded strings with affine gap penalty. Theoretical Computer Science, 395(2–3):268-282, 2008. SAIL - String Algorithms, Information and Learning: Dedicated to Professor Alberto Apostolico on the occasion of his 60th birthday. URL: http://dx.doi.org/10.1016/j.tcs.2008.01.008.
Jia-Jie Liu, Guan-Shieng Huang, and Yue-Li Wang. A fast algorithm for finding the positions of all squares in a run-length encoded string. Theoretical Computer Science, 410(38–40):3942-3948, 2009. URL: http://dx.doi.org/10.1016/j.tcs.2009.05.032.
Mäkinen, Ukkonen, and Navarro. Approximate matching of run-length compressed strings. Algorithmica, 35(4):347-369, 2003. URL: http://dx.doi.org/10.1007/s00453-002-1005-2.
Jian Pei, Wush Chi-Hsuan Wu, and Mi-Yen Yeh. On shortest unique substring queries. In Proc. ICDE 2013, pages 937-948, 2013.
Masayuki Takeda. Encyclopedia of algorithms, chapter "Compressed Pattern Matching", pages 171-174. Springer US, 2008.
Yuya Tamakoshi, Keisuke Goto, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. An opportunistic text indexing structure based on run length encoding. In Proc. CIAC 2015, pages 390-402, 2015.
Kazuya Tsuruta, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Shortest unique substrings queries in optimal time. In Proc. SOFSEM 2014, pages 503-513, 2014.

Shortest Unique Substring Queries on Run-Length Encoded Strings

Authors Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

File

Document Identifiers

Author Details

Cite AsGet BibTex

Abstract

Keywords

Metrics

References

Thanks for your feedback!

Could not send message