k-Approximate Quasiperiodicity under Hamming and Edit Distance

Authors Aleksander Kędzierski , Jakub Radoszewski



PDF
Thumbnail PDF

File

LIPIcs.CPM.2020.18.pdf
  • Filesize: 0.57 MB
  • 15 pages

Document Identifiers

Author Details

Aleksander Kędzierski
  • Institute of Informatics, University of Warsaw, Poland
  • Samsung R&D Institute, Warsaw, Poland
Jakub Radoszewski
  • Institute of Informatics, University of Warsaw, Poland
  • Samsung R&D Institute, Warsaw, Poland

Cite AsGet BibTex

Aleksander Kędzierski and Jakub Radoszewski. k-Approximate Quasiperiodicity under Hamming and Edit Distance. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 18:1-18:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.CPM.2020.18

Abstract

Quasiperiodicity in strings was introduced almost 30 years ago as an extension of string periodicity. The basic notions of quasiperiodicity are cover and seed. A cover of a text T is a string whose occurrences in T cover all positions of T. A seed of text T is a cover of a superstring of T. In various applications exact quasiperiodicity is still not sufficient due to the presence of errors. We consider approximate notions of quasiperiodicity, for which we allow approximate occurrences in T with a small Hamming, Levenshtein or weighted edit distance. In previous work Sip et al. (2002) and Christodoulakis et al. (2005) showed that computing approximate covers and seeds, respectively, under weighted edit distance is NP-hard. They, therefore, considered restricted approximate covers and seeds which need to be factors of the original string T and presented polynomial-time algorithms for computing them. Further algorithms, considering approximate occurrences with Hamming distance bounded by k, were given in several contributions by Guth et al. They also studied relaxed approximate quasiperiods that do not need to cover all positions of T. In case of large data the exponents in polynomial time complexity play a crucial role. We present more efficient algorithms for computing restricted approximate covers and seeds. In particular, we improve upon the complexities of many of the aforementioned algorithms, also for relaxed quasiperiods. Our solutions are especially efficient if the number (or total cost) of allowed errors is bounded. We also show NP-hardness of computing non-restricted approximate covers and seeds under Hamming distance. Approximate covers were studied in three recent contributions at CPM over the last three years. However, these works consider a different definition of an approximate cover of T, that is, the shortest exact cover of a string T' with the smallest Hamming distance from T.

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • approximate cover
  • approximate seed
  • enhanced cover
  • Hamming distance
  • edit distance

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amihood Amir, Avivit Levy, Moshe Lewenstein, Ronit Lubin, and Benny Porat. Can we recover the cover? In Juha Kärkkäinen, Jakub Radoszewski, and Wojciech Rytter, editors, 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017, volume 78 of LIPIcs, pages 25:1-25:15. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2017. URL: https://doi.org/10.4230/LIPIcs.CPM.2017.25.
  2. Amihood Amir, Avivit Levy, Moshe Lewenstein, Ronit Lubin, and Benny Porat. Can we recover the cover? Algorithmica, 81(7):2857-2875, 2019. URL: https://doi.org/10.1007/s00453-019-00559-8.
  3. Amihood Amir, Avivit Levy, Ronit Lubin, and Ely Porat. Approximate cover of strings. In Juha Kärkkäinen, Jakub Radoszewski, and Wojciech Rytter, editors, 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017, volume 78 of LIPIcs, pages 26:1-26:14. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2017. URL: https://doi.org/10.4230/LIPIcs.CPM.2017.26.
  4. Amihood Amir, Avivit Levy, Ronit Lubin, and Ely Porat. Approximate cover of strings. Theoretical Computer Science, 793:59-69, 2019. URL: https://doi.org/10.1016/j.tcs.2019.05.020.
  5. Amihood Amir, Avivit Levy, and Ely Porat. Quasi-periodicity under mismatch errors. In Gonzalo Navarro, David Sankoff, and Binhai Zhu, editors, Annual Symposium on Combinatorial Pattern Matching, CPM 2018, volume 105 of LIPIcs, pages 4:1-4:15. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018. URL: https://doi.org/10.4230/LIPIcs.CPM.2018.4.
  6. Alberto Apostolico and Andrzej Ehrenfeucht. Efficient detection of quasiperiodicities in strings. Theoretical Computer Science, 119(2):247-265, 1993. URL: https://doi.org/10.1016/0304-3975(93)90159-Q.
  7. Alberto Apostolico, Martin Farach, and Costas S. Iliopoulos. Optimal superprimitivity testing for strings. Information Processing Letters, 39(1):17-20, 1991. URL: https://doi.org/10.1016/0020-0190(91)90056-N.
  8. Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). SIAM Journal on Computing, 47(3):1087-1097, 2018. URL: https://doi.org/10.1137/15M1053128.
  9. Dany Breslauer. An on-line string superprimitivity test. Information Processing Letters, 44(6):345-347, 1992. URL: https://doi.org/10.1016/0020-0190(92)90111-8.
  10. Manolis Christodoulakis, Costas S. Iliopoulos, Kunsoo Park, and Jeong Seop Sim. Approximate seeds of strings. Journal of Automata, Languages and Combinatorics, 10(5/6):609-626, 2005. URL: https://doi.org/10.25596/jalc-2005-609.
  11. Maxime Crochemore and Wojciech Rytter. Jewels of Stringology. World Scientific, 2003. URL: https://doi.org/10.1142/4838.
  12. Patryk Czajka and Jakub Radoszewski. Experimental evaluation of algorithms for computing quasiperiods. CoRR (accepted to Theoretical Computer Science), 2019. URL: http://arxiv.org/abs/1909.11336.
  13. Tomás Flouri, Emanuele Giaquinta, Kassian Kobert, and Esko Ukkonen. Longest common substrings with k mismatches. Information Processing Letters, 115(6-8):643-647, 2015. URL: https://doi.org/10.1016/j.ipl.2015.03.006.
  14. Tomás Flouri, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Simon J. Puglisi, William F. Smyth, and Wojciech Tyczyński. Enhanced string covering. Theoretical Computer Science, 506:102-114, 2013. URL: https://doi.org/10.1016/j.tcs.2013.08.013.
  15. Moti Frances and Ami Litman. On covering problems of codes. Theory of Computing Systems, 30(2):113-119, 1997. URL: https://doi.org/10.1007/s002240000044.
  16. Ondřej Guth. Searching Regularities in Strings using Finite Automata. PhD thesis, Czech Technical University in Prague, 2014. URL: https://fit.cvut.cz/sites/default/files/PhDThesis_Guth.pdf.
  17. Ondřej Guth. On approximate enhanced covers under Hamming distance. Discrete Applied Mathematics, 274:67-80, 2020. URL: https://doi.org/10.1016/j.dam.2019.01.015.
  18. Ondřej Guth and Bořivoj Melichar. Using finite automata approach for searching approximate seeds of strings. In Xu Huang, Sio-Iong Ao, and Oscar Castillo, editors, Intelligent Automation and Computer Engineering, pages 347-360, Dordrecht, 2010. Springer Netherlands. URL: https://doi.org/10.1007/978-90-481-3517-2_27.
  19. Ondřej Guth, Bořivoj Melichar, and Miroslav Balík. Searching all approximate covers and their distance using finite automata. In Peter Vojtás, editor, Proceedings of the Conference on Theory and Practice of Information Technologies, ITAT 2008, volume 414 of CEUR Workshop Proceedings. CEUR-WS.org, 2008. URL: http://ceur-ws.org/Vol-414/paper4.pdf.
  20. Heikki Hyyrö, Kazuyuki Narisawa, and Shunsuke Inenaga. Dynamic edit distance table under a general weighted cost function. Journal of Discrete Algorithms, 34:2-17, 2015. URL: https://doi.org/10.1016/j.jda.2015.05.007.
  21. Costas S. Iliopoulos, Dennis W. G. Moore, and Kunsoo Park. Covering a string. Algorithmica, 16(3):288-297, 1996. URL: https://doi.org/10.1007/BF01955677.
  22. Haim Kaplan, Ely Porat, and Nira Shafrir. Finding the position of the k-mismatch and approximate tandem repeats. In Lars Arge and Rusins Freivalds, editors, Algorithm Theory - SWAT 2006, 10th ScandinavianWorkshop on Algorithm Theory, volume 4059 of Lecture Notes in Computer Science, pages 90-101. Springer, 2006. URL: https://doi.org/10.1007/11785293_11.
  23. Sung-Ryul Kim and Kunsoo Park. A dynamic edit distance table. Journal of Discrete Algorithms, 2(2):303-312, 2004. URL: https://doi.org/10.1016/S1570-8667(03)00082-0.
  24. Tomasz Kociumaka, Marcin Kubica, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. A linear-time algorithm for seeds computation. ACM Transactions on Algorithms, 16(2):Article 27, April 2020. URL: https://doi.org/10.1145/3386369.
  25. Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Efficient algorithms for shortest partial seeds in words. Theoretical Computer Science, 710:139-147, 2018. URL: https://doi.org/10.1016/j.tcs.2016.11.035.
  26. Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Fast algorithm for partial covers in words. Algorithmica, 73(1):217-233, 2015. URL: https://doi.org/10.1007/s00453-014-9915-3.
  27. Gad M. Landau, Eugene W. Myers, and Jeanette P. Schmidt. Incremental string comparison. SIAM Journal on Computing, 27(2):557-582, 1998. URL: https://doi.org/10.1137/S0097539794264810.
  28. Gad M. Landau and Uzi Vishkin. Efficient string matching with k mismatches. Theoretical Computer Science, 43:239-249, 1986. URL: https://doi.org/10.1016/0304-3975(86)90178-7.
  29. Yin Li and William F. Smyth. Computing the cover array in linear time. Algorithmica, 32(1):95-106, 2002. URL: https://doi.org/10.1007/s00453-001-0062-2.
  30. Dennis W. G. Moore and William F. Smyth. An optimal algorithm to compute all the covers of a string. Information Processing Letters, 50(5):239-246, 1994. URL: https://doi.org/10.1016/0020-0190(94)00045-X.
  31. Dennis W. G. Moore and William F. Smyth. A correction to "An optimal algorithm to compute all the covers of a string". Information Processing Letters, 54(2):101-103, 1995. URL: https://doi.org/10.1016/0020-0190(94)00235-Q.
  32. Jeong Seop Sim, Costas S. Iliopoulos, Kunsoo Park, and William F. Smyth. Approximate periods of strings. Theoretical Computer Science, 262(1):557-568, 2001. URL: https://doi.org/10.1016/S0304-3975(00)00365-0.
  33. Jeong Seop Sim, Kunsoo Park, Sung-Ryul Kim, and Jee-Soo Lee. Finding approximate covers of strings. Journal of Korea Information Science Society, 29(1):16-21, 2002. URL: http://www.koreascience.or.kr/article/ArticleFullRecord.jsp?cn=JBGHG6_2002_v29n1_16.
  34. R. A. Wagner and M. J. Fischer. The string-to-string correction problem. Journal of the ACM, 21(1):168-173, 1974. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail