Space-Time Trade-Offs for the Shortest Unique Substring Problem

Authors Arnab Ganguly, Wing-Kai Hon, Rahul Shah, Sharma V. Thankachan



PDF
Thumbnail PDF

File

LIPIcs.ISAAC.2016.34.pdf
  • Filesize: 0.52 MB
  • 13 pages

Document Identifiers

Author Details

Arnab Ganguly
Wing-Kai Hon
Rahul Shah
Sharma V. Thankachan

Cite AsGet BibTex

Arnab Ganguly, Wing-Kai Hon, Rahul Shah, and Sharma V. Thankachan. Space-Time Trade-Offs for the Shortest Unique Substring Problem. In 27th International Symposium on Algorithms and Computation (ISAAC 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 64, pp. 34:1-34:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)
https://doi.org/10.4230/LIPIcs.ISAAC.2016.34

Abstract

Given a string X[1, n] and a position k in [1, n], the Shortest Unique Substring of X covering k, denoted by S_k, is a substring X[i, j] of X which satisfies the following conditions: (i) i leq k leq j, (ii) i is the only position where there is an occurrence of X[i, j], and (iii) j - i is minimized. The best-known algorithm [Hon et al., ISAAC 2015] can find S k for all k in [1, n] in time O(n) using the string X and additional 2n words of working space. Let tau be a given parameter. We present the following new results. For any given k in [1, n], we can compute S_k via a deterministic algorithm in O(n tau^2 log n tau) time using X and additional O(n/tau) words of working space. For every k in [1, n], we can compute S_k via a deterministic algorithm in O(n tau^2 log n/tau) time using X and additional O(n/tau) words and 4n + o(n) bits of working space. For both problems above, we present an O(n tau log^{c+1} n)-time randomized algorithm that uses n/ log c n words in addition to that mentioned above, where c geq 0 is an arbitrary constant. In this case, the reported string is unique and covers k, but with probability at most n^{-O(1)} , may not be the shortest. As a consequence of our techniques, we also obtain similar space-and-time tradeoffs for a related problem of finding Maximal Unique Matches of two strings [Delcher et al., Nucleic Acids Res. 1999].
Keywords
  • Suffix Tree
  • Sparsification
  • Rabin-Karp Fingerprint
  • Probabilistic z-Fast Trie
  • Succinct Data-Structures

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, and Sebastiano Vigna. Monotone minimal perfect hashing: searching a sorted table with O(1) accesses. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2009, New York, NY, USA, January 4-6, 2009, pages 785-794, 2009. Google Scholar
  2. Djamal Belazzougui and Fabio Cunial. Indexed matching statistics and shortest unique substrings. In String Processing and Information Retrieval - 21st International Symposium, SPIRE 2014, Ouro Preto, Brazil, October 20-22, 2014. Proceedings, pages 179-190, 2014. URL: http://dx.doi.org/10.1007/978-3-319-11918-2_18.
  3. Michael A. Bender and Martin Farach-Colton. The LCA problem revisited. In LATIN 2000: Theoretical Informatics, 4th Latin American Symposium, Punta del Este, Uruguay, April 10-14, 2000, Proceedings, pages 88-94, 2000. URL: http://dx.doi.org/10.1007/10719839_9.
  4. Michael A. Bender and Martin Farach-Colton. The level ancestor problem simplified. In LATIN 2002: Theoretical Informatics, 5th Latin American Symposium, Cancun, Mexico, April 3-6, 2002, Proceedings, pages 508-515, 2002. URL: http://dx.doi.org/10.1007/3-540-45995-2_44.
  5. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana A. Starikovskaya. Dictionary matching in a stream. In Algorithms - ESA 2015 - 23rd Annual European Symposium, Patras, Greece, September 14-16, 2015, Proceedings, pages 361-372, 2015. URL: http://dx.doi.org/10.1007/978-3-662-48350-3_31.
  6. Arthur L. Delcher, Simon Kasif, Robert D Fleischmann, Jeremy Peterson, Owen White, and Steven L. Salzberg. Alignment of whole genomes. Nucleic acids research, 27(11):2369-2376, 1999. Google Scholar
  7. Arthur L. Delcher, Adam Phillippy, Jane Carlton, and Steven L. Salzberg. Fast algorithms for large-scale genome alignment and comparison. Nucleic acids research, 30(11):2478-2483, 2002. Google Scholar
  8. Martin Farach. Optimal suffix tree construction with large alphabets. In 38th Annual Symposium on Foundations of Computer Science, FOCS'97, Miami Beach, Florida, USA, October 19-22, 1997, pages 137-143, 1997. URL: http://dx.doi.org/10.1109/SFCS.1997.646102.
  9. Johannes Fischer and Volker Heun. A new succinct representation of rmq-information and improvements in the enhanced suffix array. In Combinatorics, Algorithms, Probabilistic and Experimental Methodologies, First International Symposium, ESCAPE 2007, Hangzhou, China, April 7-9, 2007, Revised Selected Papers, pages 459-470, 2007. URL: http://dx.doi.org/10.1007/978-3-540-74450-4_41.
  10. Michael L. Fredman, János Komlós, and Endre Szemerédi. Storing a sparse table with 0(1) worst case access time. J. ACM, 31(3):538-544, 1984. URL: http://dx.doi.org/10.1145/828.1884.
  11. Dan Gusfield. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology. Cambridge University Press, 1997. Google Scholar
  12. Wing-Kai Hon, Tak Wah Lam, Rahul Shah, Siu-Lung Tam, and Jeffrey Scott Vitter. Compressed index for dictionary matching. In 2008 Data Compression Conference (DCC 2008), 25-27 March 2008, Snowbird, UT, USA, pages 23-32, 2008. URL: http://dx.doi.org/10.1109/DCC.2008.62.
  13. Wing-Kai Hon, Sharma V. Thankachan, and Bojian Xu. An in-place framework for exact and approximate shortest unique substring queries. In Algorithms and Computation - 26th International Symposium, ISAAC 2015, Nagoya, Japan, December 9-11, 2015, Proceedings, pages 755-767, 2015. URL: http://dx.doi.org/10.1007/978-3-662-48971-0_63.
  14. Atalay Mert Ileri, M. Oguzhan Külekci, and Bojian Xu. Shortest unique substring query revisited. In Combinatorial Pattern Matching - 25th Annual Symposium, CPM 2014, Moscow, Russia, June 16-18, 2014. Proceedings, pages 172-181, 2014. URL: http://dx.doi.org/10.1007/978-3-319-07566-2_18.
  15. Richard M. Karp and Michael O. Rabin. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2):249-260, 1987. URL: http://dx.doi.org/10.1147/rd.312.0249.
  16. Stefan Kurtz. Reducing the space requirement of suffix trees. Softw., Pract. Exper., 29(13):1149-1171, 1999. URL: http://dx.doi.org/10.1002/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO;2-O.
  17. Stefan Kurtz, Adam Phillippy, Arthur L. Delcher, Michael Smoot, Martin Shumway, Corina Antonescu, and Steven L. Salzberg. Versatile and open software for comparing large genomes. Genome biology, 5(2):R12, 2004. Google Scholar
  18. J. Ian Munro. Tables. In Foundations of Software Technology and Theoretical Computer Science, 16th Conference, Hyderabad, India, December 18-20, 1996, Proceedings, pages 37-42, 1996. URL: http://dx.doi.org/10.1007/3-540-62034-6_35.
  19. Jian Pei, Wush Chi-Hsuan Wu, and Mi-Yen Yeh. On shortest unique substring queries. In 29th IEEE International Conference on Data Engineering, ICDE 2013, Brisbane, Australia, April 8-12, 2013, pages 937-948, 2013. URL: http://dx.doi.org/10.1109/ICDE.2013.6544887.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail