Shortest Unique Substring Queries on Run-Length Encoded Strings

Authors Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda



PDF
Thumbnail PDF

File

LIPIcs.MFCS.2016.69.pdf
  • Filesize: 0.69 MB
  • 11 pages

Document Identifiers

Author Details

Takuya Mieno
Shunsuke Inenaga
Hideo Bannai
Masayuki Takeda

Cite AsGet BibTex

Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Shortest Unique Substring Queries on Run-Length Encoded Strings. In 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 58, pp. 69:1-69:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)
https://doi.org/10.4230/LIPIcs.MFCS.2016.69

Abstract

We consider the problem of answering shortest unique substring (SUS) queries on run-length encoded strings. For a string S, a unique substring u = S[i..j] is said to be a shortest unique substring (SUS) of S containing an interval [s, t] (i <= s <= t <= j) if for any i' <= s <= t <= j' with j-i > j'-i', S[i'..j'] occurs at least twice in S. Given a run-length encoding of size m of a string of length N, we show that we can construct a data structure of size O(m+pi_s(N, m)) in O(m log m + pi_c(N, m)) time such that queries can be answered in O(pi_q(N, m) + k) time, where k is the size of the output (the number of SUSs), and pi_s(N,m), pi_c(N,m), pi_q(N,m) are, respectively, the size, construction time, and query time for a predecessor/successor query data structure of m elements for the universe of [1,N]. Using the data structure by Beam and Fich (JCSS 2002), this results in a data structure of O(m) space that is constructed in O(m log m) time, and answers queries in O(sqrt(log m/loglog m)+k) time.
Keywords
  • string algorithms
  • shortest unique substring
  • run-length encoding

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amihood Amir, Gary Benson, and Martin Farach. Let sleeping files lie: pattern matching in z-compressed files. Journal of Computer and System Sciences, 52(2):299-307, April 1996. Google Scholar
  2. Alberto Apostolico, Gad M. Landau, and Steven Skiena. Matching for run-length encoded strings. J. Complex., 15(1):4-16, March 1999. URL: http://dx.doi.org/10.1006/jcom.1998.0493.
  3. Ora Arbell, Gad M. Landau, and Joseph S.B. Mitchell. Edit distance of run-length encoded strings. Information Processing Letters, 83(6):307-314, 2002. URL: http://dx.doi.org/10.1016/S0020-0190(02)00215-6.
  4. Golnaz Badkobeh, Gabriele Fici, Steve Kroon, and Zsuzsanna Lipták. Binary jumbled string matching for highly run-length compressible texts. Information Processing Letters, 113(17):604-608, 2013. URL: http://dx.doi.org/10.1016/j.ipl.2013.05.007.
  5. Paul Beame and Faith E. Fich. Optimal bounds for the predecessor problem and related problems. Journal of Computer and System Sciences, 65(1):38-72, 2002. URL: http://dx.doi.org/10.1006/jcss.2002.1822.
  6. Michael A. Bender and Martin Farach-Colton. The LCA problem revisited. In Proceedings of the 4th Latin American Symposium on Theoretical Informatics, LATIN 2000, pages 88-94, 2000. URL: http://dl.acm.org/citation.cfm?id=646388.690192.
  7. Horst Bunke and János Csirik. An algorithm for matching run-length coded strings. Computing, 50(4):297-314, 1993. Google Scholar
  8. Horst Bunke and János Csirik. An improved algorithm for computing the edit distance of run-length coded strings. Information Processing Letters, 54(2):93-96, April 1995. Google Scholar
  9. Kuan-Yu Chen, Ping-Hui Hsu, and Kun-Mao Chao. Efficient retrieval of approximate palindromes in a run-length encoded string. Theoretical Computer Science, 432:28-37, 2012. URL: http://dx.doi.org/10.1016/j.tcs.2012.01.023.
  10. Bernhard Haubold, Nora Pierstorff, Friedrich Möller, and Thomas Wiehe. Genome comparison without alignment using shortest unique substrings. BMC Bioinformatics, 6(1):123, 2005. Google Scholar
  11. Wing-Kai Hon, Sharma V. Thankachan, and Bojian Xu. An in-place framework for exact and approximate shortest unique substring queries. In ISAAC 2015, pages 755-767, 2015. Google Scholar
  12. Xiaocheng Hu, Jian Pei, and Yufei Tao. Shortest unique queries on strings. In Proc. SPIRE 2014, pages 161-172, 2014. Google Scholar
  13. Atalay Mert Ileri, M. Oguzhan Külekci, and Bojian Xu. A simple yet time-optimal and linear-space algorithm for shortest unique substring queries. Theor. Comput. Sci., 562:621-633, 2015. Google Scholar
  14. Jin Wook Kim, Amihood Amir, Gad M. Landau, and Kunsoo Park. Computing similarity of run-length encoded strings with affine gap penalty. Theoretical Computer Science, 395(2–3):268-282, 2008. SAIL - String Algorithms, Information and Learning: Dedicated to Professor Alberto Apostolico on the occasion of his 60th birthday. URL: http://dx.doi.org/10.1016/j.tcs.2008.01.008.
  15. Jia-Jie Liu, Guan-Shieng Huang, and Yue-Li Wang. A fast algorithm for finding the positions of all squares in a run-length encoded string. Theoretical Computer Science, 410(38–40):3942-3948, 2009. URL: http://dx.doi.org/10.1016/j.tcs.2009.05.032.
  16. Mäkinen, Ukkonen, and Navarro. Approximate matching of run-length compressed strings. Algorithmica, 35(4):347-369, 2003. URL: http://dx.doi.org/10.1007/s00453-002-1005-2.
  17. Jian Pei, Wush Chi-Hsuan Wu, and Mi-Yen Yeh. On shortest unique substring queries. In Proc. ICDE 2013, pages 937-948, 2013. Google Scholar
  18. Masayuki Takeda. Encyclopedia of algorithms, chapter "Compressed Pattern Matching", pages 171-174. Springer US, 2008. Google Scholar
  19. Yuya Tamakoshi, Keisuke Goto, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. An opportunistic text indexing structure based on run length encoding. In Proc. CIAC 2015, pages 390-402, 2015. Google Scholar
  20. Kazuya Tsuruta, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Shortest unique substrings queries in optimal time. In Proc. SOFSEM 2014, pages 503-513, 2014. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail