Truly Subquadratic-Time Extension Queries and Periodicity Detection in Strings with Uncertainties

Authors Costas S. Iliopoulos, Jakub Radoszewski



PDF
Thumbnail PDF

File

LIPIcs.CPM.2016.8.pdf
  • Filesize: 0.51 MB
  • 12 pages

Document Identifiers

Author Details

Costas S. Iliopoulos
Jakub Radoszewski

Cite As Get BibTex

Costas S. Iliopoulos and Jakub Radoszewski. Truly Subquadratic-Time Extension Queries and Periodicity Detection in Strings with Uncertainties. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 8:1-8:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016) https://doi.org/10.4230/LIPIcs.CPM.2016.8

Abstract

Strings with don't care symbols, also called partial words, and more general indeterminate strings are a natural representation of strings containing uncertain symbols. A considerable effort has been made to obtain efficient algorithms for pattern matching and periodicity detection in such strings. Among those, a number of algorithms have been proposed that behave well on random data, but still their worst-case running time is Theta(n^2). We present the first truly subquadratic-time solutions for a number of such problems on partial words that can also be adapted to indeterminate strings over a constant-sized alphabet. We show that $n$ longest common compatible prefix queries (which correspond to longest common extension queries in regular strings) can be answered on-line in O(n * sqrt(n * log(n)) time after O(n * sqrt(n * log(n))-time preprocessing. We also present O(n * sqrt(n * log(n))-time algorithms for computing the prefix array and two types of border array of a partial word.

Subject Classification

Keywords
  • string with don’t cares
  • partial word
  • indeterminate string
  • longest common conservative prefix queries
  • prefix array

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In Venkatesan Guruswami, editor, IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 59-78. IEEE Computer Society, 2015. URL: http://dx.doi.org/10.1109/FOCS.2015.14.
  2. Karl R. Abrahamson. Generalized string matching. SIAM J. Comput., 16(6):1039-1051, 1987. URL: http://dx.doi.org/10.1137/0216067.
  3. Ali Alatabbi, M. Sohel Rahman, and William F. Smyth. Inferring an indeterminate string from a prefix graph. J. Discrete Algorithms, 32:6-13, 2015. URL: http://dx.doi.org/10.1016/j.jda.2014.12.006.
  4. Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In Rocco A. Servedio and Ronitt Rubinfeld, editors, Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, STOC 2015, Portland, OR, USA, June 14-17, 2015, pages 51-58. ACM, 2015. URL: http://dx.doi.org/10.1145/2746539.2746612.
  5. Francine Blanchet-Sadri and Robert A. Hegstrom. Partial words and a theorem of Fine and Wilf revisited. Theor. Comput. Sci., 270(1-2):401-419, 2002. URL: http://dx.doi.org/10.1016/S0304-3975(00)00407-2.
  6. Francine Blanchet-Sadri and Justin Lazarow. Suffix trees for partial words and the longest common compatible prefix problem. In Adrian Horia Dediu, Carlos Martín-Vide, and Bianca Truthe, editors, Language and Automata Theory and Applications - 7th International Conference, LATA 2013, Bilbao, Spain, April 2-5, 2013. Proceedings, volume 7810 of Lecture Notes in Computer Science, pages 165-176. Springer, 2013. URL: http://dx.doi.org/10.1007/978-3-642-37064-9_16.
  7. Manolis Christodoulakis, P. J. Ryan, William F. Smyth, and Shu Wang. Indeterminate strings, prefix arrays & undirected graphs. Theor. Comput. Sci., 600:34-48, 2015. URL: http://dx.doi.org/10.1016/j.tcs.2015.06.056.
  8. Peter Clifford and Raphaël Clifford. Simple deterministic wildcard matching. Inf. Process. Lett., 101(2):53-54, 2007. URL: http://dx.doi.org/10.1016/j.ipl.2006.08.002.
  9. Richard Cole and Ramesh Hariharan. Verifying candidate matches in sparse and wildcard matching. In John H. Reif, editor, Proceedings on 34th Annual ACM Symposium on Theory of Computing, May 19-21, 2002, Montréal, Québec, Canada, pages 592-601. ACM, 2002. URL: http://dx.doi.org/10.1145/509907.509992.
  10. Maxime Crochemore, Christophe Hancart, and Thierry Lecroq. Algorithms on Strings. Cambridge University Press, 2007. Google Scholar
  11. Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Marcin Kubica, Alessio Langiu, Jakub Radoszewski, Wojciech Rytter, Bartosz Szreder, and Tomasz Waleń. A note on the longest common compatible prefix problem for partial words. J. Discrete Algorithms, 34:49-53, 2015. URL: http://dx.doi.org/10.1016/j.jda.2015.05.003.
  12. Maxime Crochemore, Costas S. Iliopoulos, Marcin Kubica, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Extracting powers and periods in a word from its runs structure. Theor. Comput. Sci., 521:29-41, 2014. URL: http://dx.doi.org/10.1016/j.tcs.2013.11.018.
  13. Maxime Crochemore and Wojciech Rytter. Jewels of Stringology. World Scientific, 2003. Google Scholar
  14. Michael J. Fischer and Michael S. Paterson. String matching and other products. In Richard Karp, editor, Proceedings of the 7th SIAM-AMS Complexity of Computation, pages 113-125, 1974. URL: http://dx.doi.org/10.1007/978-3-540-89097-3_14.
  15. Harold N. Gabow and Robert Endre Tarjan. A linear-time algorithm for a special case of disjoint set union. J. Comput. Syst. Sci., 30(2):209-221, 1985. URL: http://dx.doi.org/10.1016/0022-0000(85)90014-5.
  16. Jan Holub and William F. Smyth. Algorithms on indeterminate strings. In Proceedings of 14th Australasian Workshop on Combinatorial Algorithms, pages 36-45, 2003. Google Scholar
  17. Jan Holub, William F. Smyth, and Shu Wang. Fast pattern-matching on indeterminate strings. J. Discrete Algorithms, 6(1):37-50, 2008. URL: http://dx.doi.org/10.1016/j.jda.2006.10.003.
  18. Costas S. Iliopoulos, Manal Mohamed, Laurent Mouchard, Katerina Perdikuri, William F. Smyth, and Athanasios K. Tsakalidis. String regularities with don't cares. Nord. J. Comput., 10(1):40-51, 2003. Google Scholar
  19. Russell Impagliazzo and Ramamohan Paturi. On the complexity of k-SAT. J. Comput. Syst. Sci., 62(2):367-375, 2001. URL: http://dx.doi.org/10.1006/jcss.2000.1727.
  20. Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems have strongly exponential complexity? J. Comput. Syst. Sci., 63(4):512-530, 2001. URL: http://dx.doi.org/10.1006/jcss.2001.1774.
  21. Piotr Indyk. Faster algorithms for string matching problems: Matching the convolution bound. In 39th Annual Symposium on Foundations of Computer Science, FOCS'98, November 8-11, 1998, Palo Alto, California, USA, pages 166-173. IEEE Computer Society, 1998. URL: http://dx.doi.org/10.1109/SFCS.1998.743440.
  22. Adam Kalai. Efficient pattern-matching with don't cares. In David Eppstein, editor, Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 6-8, 2002, San Francisco, CA, USA, pages 655-656. ACM/SIAM, 2002. URL: http://dl.acm.org/citation.cfm?id=545381.545468.
  23. William F. Smyth and Shu Wang. New perspectives on the prefix array. In Amihood Amir, Andrew Turpin, and Alistair Moffat, editors, String Processing and Information Retrieval, 15th International Symposium, SPIRE 2008, Melbourne, Australia, November 10-12, 2008. Proceedings, volume 5280 of Lecture Notes in Computer Science, pages 133-143. Springer, 2008. URL: http://dx.doi.org/10.1007/978-3-540-89097-3_14.
  24. William F. Smyth and Shu Wang. An adaptive hybrid pattern-matching algorithm on indeterminate strings. Int. J. Found. Comput. Sci., 20(6):985-1004, 2009. URL: http://dx.doi.org/10.1142/S0129054109007005.
  25. Ryan Williams. A new algorithm for optimal 2-constraint satisfaction and its implications. Theor. Comput. Sci., 348(2-3):357-365, 2005. URL: http://dx.doi.org/10.1016/j.tcs.2005.09.023.
  26. Sun Wu and Udi Manber. Agrep - a fast approximate pattern-matching tool. In Proceedings USENIX Winter 1992 Technical Conference, page 153–162, 1992. URL: https://www.usenix.org/legacy/publications/library/proceedings/wu.pdf.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail