Tight Bounds on the Maximum Number of Shortest Unique Substrings

Authors Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda






Cite As

Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Tight Bounds on the Maximum Number of Shortest Unique Substrings. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 24:1-24:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LIPIcs.CPM.2017.24

Abstract

A substring Q of a string S is called a shortest unique substring (SUS) for interval [s,t] in S if Q occurs exactly once in S, this occurrence of Q contains interval [s,t], and every substring of S that contains interval [s,t] and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query interval [s,t] all the SUSs for interval [s,t] can be reported quickly. When s = t, we call the SUSs for [s,t] point SUSs, and when s <= t, we call the SUSs for [s,t] interval SUSs. There exists an optimal O(n)-time preprocessing scheme that answers queries in optimal O(k) time for both point and interval SUSs, where n is the length of S and k is the number of outputs for a given query. In this paper, we reveal structural, combinatorial properties underlying the SUS problem: namely, we show that the number of intervals in S that correspond to point SUSs for all query positions in S is less than 1.5n, and that this bound is tight. We also consider the maximum number of intervals in S that correspond to interval SUSs for all query intervals in S.
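
The definition above can be made concrete with a small brute-force sketch. This is not the O(n)-time preprocessing scheme mentioned in the abstract; it is a naive illustration that checks the definition directly, and the function names (`count_occurrences`, `interval_sus`, `all_point_sus_intervals`) are hypothetical, chosen only for this example. Positions are 1-based and intervals are inclusive, as in the abstract.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def interval_sus(text, s, t):
    """Return all SUSs for the 1-based inclusive interval [s, t].

    Each SUS is reported as a (start, end) pair of 1-based positions.
    Candidate lengths are tried in increasing order, so the first length
    that yields a unique substring covering [s, t] is the shortest one.
    """
    n = len(text)
    for length in range(t - s + 1, n + 1):
        found = []
        # A substring starting at i covers [s, t] iff i <= s and
        # i + length - 1 >= t; it must also fit inside the text.
        lo = max(1, t - length + 1)
        hi = min(s, n - length + 1)
        for i in range(lo, hi + 1):
            candidate = text[i - 1:i - 1 + length]
            if count_occurrences(text, candidate) == 1:
                found.append((i, i + length - 1))
        if found:
            return found
    return []


def all_point_sus_intervals(text):
    """Collect the distinct intervals that are point SUSs for some position."""
    intervals = set()
    for p in range(1, len(text) + 1):
        intervals.update(interval_sus(text, p, p))
    return intervals
```

For S = "aab", position 2 has two point SUSs, "aa" at [1,2] and "ab" at [2,3], while position 3 has the single point SUS "b" at [3,3]. Over all positions the string has three distinct point-SUS intervals, which is consistent with the stated bound of less than 1.5n = 4.5.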
Keywords
  • shortest unique substrings
  • maximal unique substrings

