Linear-Time Algorithm for Long LCF with k Mismatches

Authors Panagiotis Charalampopoulos , Maxime Crochemore , Costas S. Iliopoulos, Tomasz Kociumaka , Solon P. Pissis , Jakub Radoszewski , Wojciech Rytter, Tomasz Walen



PDF
Thumbnail PDF

File

LIPIcs.CPM.2018.23.pdf
  • Filesize: 0.57 MB
  • 16 pages

Document Identifiers

Author Details

Panagiotis Charalampopoulos
  • Department of Informatics, King’s College London, London, UK
Maxime Crochemore
  • Department of Informatics, King’s College London, London, UK
Costas S. Iliopoulos
  • Department of Informatics, King’s College London, London, UK
Tomasz Kociumaka
  • Institute of Informatics, University of Warsaw, Warsaw, Poland
Solon P. Pissis
  • Department of Informatics, King’s College London, London, UK
Jakub Radoszewski
  • Institute of Informatics, University of Warsaw, Warsaw, Poland
Wojciech Rytter
  • Institute of Informatics, University of Warsaw, Warsaw, Poland
Tomasz Walen
  • Institute of Informatics, University of Warsaw, Warsaw, Poland

Cite AsGet BibTex

Panagiotis Charalampopoulos, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Walen. Linear-Time Algorithm for Long LCF with k Mismatches. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 23:1-23:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.CPM.2018.23

Abstract

In the Longest Common Factor with k Mismatches (LCF_k) problem, we are given two strings X and Y of total length n, and we are asked to find a pair of maximal-length factors, one of X and the other of Y, such that their Hamming distance is at most k. Thankachan et al. [Thankachan et al. 2016] show that this problem can be solved in O(n log^k n) time and O(n) space for constant k. We consider the LCF_k(l) problem in which we assume that the sought factors have length at least l. We use difference covers to reduce the LCF_k(l) problem with l=Omega(log^{2k+2}n) to a task involving m=O(n/log^{k+1}n) synchronized factors. The latter can be solved in O(m log^{k+1}m) time, which results in a linear-time algorithm for LCF_k(l) with l=Omega(log^{2k+2}n). In general, our solution to the LCF_k(l) problem for arbitrary l takes O(n + n log^{k+1} n/sqrt{l}) time.

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • longest common factor
  • longest common substring
  • Hamming distance
  • heavy-light decomposition
  • difference cover

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amir Abboud, Richard Ryan Williams, and Huacheng Yu. More applications of the polynomial method to algorithm design. In Piotr Indyk, editor, Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, pages 218-230. SIAM, 2015. URL: http://dx.doi.org/10.1137/1.9781611973730.17.
  2. Hayam Alamro, Lorraine A. K. Ayad, Panagiotis Charalampopoulos, Costas S. Iliopoulos, and Solon P. Pissis. Longest common prefixes with k-mismatches and applications. In A Min Tjoa, Ladjel Bellatreche, Stefan Biffl, Jan van Leeuwen, and Jirí Wiedermann, editors, SOFSEM 2018: Theory and Practice of Computer Science - 44th International Conference on Current Trends in Theory and Practice of Computer Science, Krems, Austria, January 29 - February 2, 2018, Proceedings, volume 10706 of Lecture Notes in Computer Science, pages 636-649. Springer, 2018. URL: http://dx.doi.org/10.1007/978-3-319-73117-9_45.
  3. Lorraine A. K. Ayad, Panagiotis Charalampopoulos, Costas S. Iliopoulos, and Solon P. Pissis. Longest common prefixes with k-errors and applications. CoRR, abs/1801.04425, 2018. URL: http://arxiv.org/abs/1801.04425.
  4. Maxim A. Babenko and Tatiana A. Starikovskaya. Computing the longest common substring with one mismatch. Probl. Inf. Transm., 47(1):28-33, 2011. URL: http://dx.doi.org/10.1134/S0032946011010030.
  5. Michael A. Bender and Martin Farach-Colton. The LCA problem revisited. In Gaston H. Gonnet, Daniel Panario, and Alfredo Viola, editors, LATIN 2000: Theoretical Informatics, 4th Latin American Symposium, Punta del Este, Uruguay, April 10-14, 2000, Proceedings, volume 1776 of Lecture Notes in Computer Science, pages 88-94. Springer, 2000. URL: http://dx.doi.org/10.1007/10719839_9.
  6. Gerth Stølting Brodal. Finger search trees. In Dinesh P. Mehta and Sartaj Sahni, editors, Handbook of Data Structures and Applications. Chapman and Hall/CRC, 2004. URL: http://dx.doi.org/10.1201/9781420035179.ch11.
  7. Mark R. Brown and Robert Endre Tarjan. A fast merging algorithm. J. ACM, 26(2):211-226, 1979. URL: http://dx.doi.org/10.1145/322123.322127.
  8. Stefan Burkhardt and Juha Kärkkäinen. Fast lightweight suffix array construction and checking. In Ricardo A. Baeza-Yates, Edgar Chávez, and Maxime Crochemore, editors, Combinatorial Pattern Matching, 14th Annual Symposium, CPM 2003, Morelia, Michocán, Mexico, June 25-27, 2003, Proceedings, volume 2676 of Lecture Notes in Computer Science, pages 55-69. Springer, 2003. URL: http://dx.doi.org/10.1007/3-540-44888-8_5.
  9. Richard Cole, Lee-Ad Gottlieb, and Moshe Lewenstein. Dictionary matching and indexing with errors and don't cares. In László Babai, editor, Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, June 13-16, 2004, pages 91-100. ACM, 2004. URL: http://dx.doi.org/10.1145/1007352.1007374.
  10. Maxime Crochemore, Christophe Hancart, and Thierry Lecroq. Algorithms on strings. Cambridge University Press, 2007. Google Scholar
  11. Maxime Crochemore, Costas S. Iliopoulos, Manal Mohamed, and Marie-France Sagot. Longest repeats with a block of k don't cares. Theor. Comput. Sci., 362(1-3):248-254, 2006. URL: http://dx.doi.org/10.1016/j.tcs.2006.06.029.
  12. Tomás Flouri, Emanuele Giaquinta, Kassian Kobert, and Esko Ukkonen. Longest common substrings with k mismatches. Inf. Process. Lett., 115(6-8):643-647, 2015. URL: http://dx.doi.org/10.1016/j.ipl.2015.03.006.
  13. Szymon Grabowski. A note on the longest common substring with k-mismatches problem. Inf. Process. Lett., 115(6-8):640-642, 2015. URL: http://dx.doi.org/10.1016/j.ipl.2015.03.003.
  14. Leonidas J. Guibas, Edward M. McCreight, Michael F. Plass, and Janet R. Roberts. A new representation for linear lists. In John E. Hopcroft, Emily P. Friedman, and Michael A. Harrison, editors, Proceedings of the 9th Annual ACM Symposium on Theory of Computing, May 4-6, 1977, Boulder, Colorado, USA, pages 49-60. ACM, 1977. URL: http://dx.doi.org/10.1145/800105.803395.
  15. Dan Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997. Google Scholar
  16. Kurt Hoffman, Kurt Mehlhorn, Pierre Rosenstiehl, and Robert Endre Tarjan. Sorting jordan sequences in linear time using level-linked search trees. Information and Control, 68(1-3):170-184, 1986. URL: http://dx.doi.org/10.1016/S0019-9958(86)80033-X.
  17. Lucas Chi Kwong Hui. Color set size problem with application to string matching. In Alberto Apostolico, Maxime Crochemore, Zvi Galil, and Udi Manber, editors, Combinatorial Pattern Matching, Third Annual Symposium, CPM 92, Tucson, Arizona, USA, April 29 - May 1, 1992, Proceedings, volume 644 of Lecture Notes in Computer Science, pages 230-243. Springer, 1992. URL: http://dx.doi.org/10.1007/3-540-56024-6_19.
  18. Russell Impagliazzo and Ramamohan Paturi. On the complexity of k-sat. J. Comput. Syst. Sci., 62(2):367-375, 2001. URL: http://dx.doi.org/10.1006/jcss.2000.1727.
  19. Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems have strongly exponential complexity? J. Comput. Syst. Sci., 63(4):512-530, 2001. URL: http://dx.doi.org/10.1006/jcss.2001.1774.
  20. Tomasz Kociumaka, Jakub Radoszewski, and Tatiana A. Starikovskaya. Longest common substring with approximately k mismatches. CoRR, abs/1712.08573, 2017. URL: http://arxiv.org/abs/1712.08573.
  21. Tomasz Kociumaka, Tatiana A. Starikovskaya, and Hjalte Wedel Vildhøj. Sublinear space algorithms for the longest common substring problem. In Andreas S. Schulz and Dorothea Wagner, editors, Algorithms - ESA 2014 - 22th Annual European Symposium, Wroclaw, Poland, September 8-10, 2014. Proceedings, volume 8737 of Lecture Notes in Computer Science, pages 605-617. Springer, 2014. URL: http://dx.doi.org/10.1007/978-3-662-44777-2_50.
  22. Chris-André Leimeister and Burkhard Morgenstern. kmacs: the k-mismatch average common substring approach to alignment-free sequence comparison. Bioinformatics, 30(14):2000-2008, 2014. URL: http://dx.doi.org/10.1093/bioinformatics/btu331.
  23. Mamoru Maekawa. A square root N algorithm for mutual exclusion in decentralized systems. ACM Trans. Comput. Syst., 3(2):145-159, 1985. Google Scholar
  24. Tatiana A. Starikovskaya. Longest common substring with approximately k mismatches. In Roberto Grossi and Moshe Lewenstein, editors, 27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016, June 27-29, 2016, Tel Aviv, Israel, volume 54 of LIPIcs, pages 21:1-21:11. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016. URL: http://dx.doi.org/10.4230/LIPIcs.CPM.2016.21.
  25. Tatiana A. Starikovskaya and Hjalte Wedel Vildhøj. Time-space trade-offs for the longest common substring problem. In Johannes Fischer and Peter Sanders, editors, Combinatorial Pattern Matching, 24th Annual Symposium, CPM 2013, Bad Herrenalb, Germany, June 17-19, 2013. Proceedings, volume 7922 of Lecture Notes in Computer Science, pages 223-234. Springer, 2013. URL: http://dx.doi.org/10.1007/978-3-642-38905-4_22.
  26. Robert Endre Tarjan. Applications of path compression on balanced trees. J. ACM, 26(4):690-715, 1979. URL: http://dx.doi.org/10.1145/322154.322161.
  27. Sharma V. Thankachan, Alberto Apostolico, and Srinivas Aluru. A provably efficient algorithm for the k-mismatch average common substring problem. Journal of Computational Biology, 23(6):472-482, 2016. URL: http://dx.doi.org/10.1089/cmb.2015.0235.
  28. Sharma V. Thankachan, Sriram P. Chockalingam, Yongchao Liu, Alberto Apostolico, and Srinivas Aluru. ALFRED: A practical method for alignment-free distance computation. Journal of Computational Biology, 23(6):452-460, 2016. URL: http://dx.doi.org/10.1089/cmb.2015.0217.
  29. Igor Ulitsky, David Burstein, Tamir Tuller, and Benny Chor. The average common substring approach to phylogenomic reconstruction. Journal of Computational Biology, 13(2):336-350, 2006. URL: http://dx.doi.org/10.1089/cmb.2006.13.336.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail