Document Open Access Logo

A Linear-Time n^{0.4}-Approximation for Longest Common Subsequence

Authors Karl Bringmann, Debarati Das



PDF
Thumbnail PDF

File

LIPIcs.ICALP.2021.39.pdf
  • Filesize: 0.71 MB
  • 20 pages

Document Identifiers

Author Details

Karl Bringmann
  • Saarland University, Saarland Informatics Campus, Saarbrücken, Germany
  • Max-Planck-Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
Debarati Das
  • Basic Algorithm Research Copenhagen (BARC), University of Copenhagen, Denmark

Acknowledgements

We thank an anonymous reviewer for suggesting how to turn the near-linear-time algorithm that we obtained in a previous version of this paper into a linear-time algorithm.

Cite AsGet BibTex

Karl Bringmann and Debarati Das. A Linear-Time n^{0.4}-Approximation for Longest Common Subsequence. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 39:1-39:20, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ICALP.2021.39

Abstract

We consider the classic problem of computing the Longest Common Subsequence (LCS) of two strings of length n. While a simple quadratic algorithm has been known for the problem for more than 40 years, no faster algorithm has been found despite an extensive effort. The lack of progress on the problem has recently been explained by Abboud, Backurs, and Vassilevska Williams [FOCS'15] and Bringmann and Künnemann [FOCS'15] who proved that there is no subquadratic algorithm unless the Strong Exponential Time Hypothesis fails. This major roadblock for getting faster exact algorithms has led the community to look for subquadratic approximation algorithms for the problem. Yet, unlike the edit distance problem for which a constant-factor approximation in almost-linear time is known, very little progress has been made on LCS, making it a notoriously difficult problem also in the realm of approximation. For the general setting (where we make no assumption on the length of the optimum solution or the alphabet size), only a naive O(n^{ε/2})-approximation algorithm with running time Õ(n^{2-ε}) has been known, for any constant 0 < ε ≤ 1. Recently, a breakthrough result by Hajiaghayi, Seddighin, Seddighin, and Sun [SODA'19] provided a linear-time algorithm that yields a O(n^{0.497956})-approximation in expectation; improving upon the naive O(√n)-approximation for the first time. In this paper, we provide an algorithm that in time O(n^{2-ε}) computes an Õ(n^{2ε/5})-approximation with high probability, for any 0 < ε ≤ 1. Our result (1) gives an Õ(n^{0.4})-approximation in linear time, improving upon the bound of Hajiaghayi, Seddighin, Seddighin, and Sun, (2) provides an algorithm whose approximation scales with any subquadratic running time O(n^{2-ε}), improving upon the naive bound of O(n^{ε/2}) for any ε, and (3) instead of only in expectation, succeeds with high probability.

Subject Classification

ACM Subject Classification
  • Theory of computation → Approximation algorithms analysis
Keywords
  • approximation algorithm
  • longest common subsequence
  • string algorithm

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amir Abboud and Arturs Backurs. Towards hardness of approximation for polynomial time problems. In ITCS, volume 67 of LIPIcs, pages 11:1-11:26, 2017. Google Scholar
  2. Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In FOCS, pages 59-78. IEEE, 2015. Google Scholar
  3. Amir Abboud and Karl Bringmann. Tighter connections between Formula-SAT and shaving logs. In ICALP, volume 107 of LIPIcs, pages 8:1-8:18, 2018. Google Scholar
  4. Amir Abboud, Thomas Dueholm Hansen, Virginia Vassilevska Williams, and Ryan Williams. Simulating branching programs with edit distance and friends: or: a polylog shaved is a lower bound made. In STOC, pages 375-388. ACM, 2016. Google Scholar
  5. Amir Abboud and Aviad Rubinstein. Fast and deterministic constant factor approximation algorithms for LCS imply new circuit lower bounds. In ITCS, volume 94 of LIPIcs, pages 35:1-35:14, 2018. Google Scholar
  6. Alfred V. Aho, Daniel S. Hirschberg, and Jeffrey D. Ullman. Bounds on the complexity of the longest common subsequence problem. J. ACM, 23(1):1-12, 1976. Google Scholar
  7. Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Polylogarithmic approximation for edit distance and the asymmetric query complexity. In FOCS, pages 377-386. IEEE, 2010. Google Scholar
  8. Alexandr Andoni and Negev Shekel Nosatzki. Edit distance in near-linear time: it’s a constant factor. In FOCS, pages 990-1001. IEEE, 2020. Google Scholar
  9. Alexandr Andoni and Krzysztof Onak. Approximating edit distance in near-linear time. SIAM J. Comput., 41(6):1635-1648, 2012. Google Scholar
  10. Alberto Apostolico. Improving the worst-case performance of the Hunt-Szymanski strategy for the longest common subsequence of two strings. Inf. Process. Lett., 23(2):63-69, 1986. Google Scholar
  11. Alberto Apostolico and Concettina Guerra. The longest common subsequence problem revisited. Algorithmica, 2:316-336, 1987. Google Scholar
  12. Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In STOC, pages 51-58. ACM, 2015. Google Scholar
  13. Ziv Bar-Yossef, T. S. Jayram, Robert Krauthgamer, and Ravi Kumar. Approximating edit distance efficiently. In FOCS, pages 550-559. IEEE, 2004. Google Scholar
  14. Tugkan Batu, Funda Ergün, Joe Kilian, Avner Magen, Sofya Raskhodnikova, Ronitt Rubinfeld, and Rahul Sami. A sublinear algorithm for weakly approximating edit distance. In STOC, pages 316-324. ACM, 2003. Google Scholar
  15. Tugkan Batu, Funda Ergün, and Süleyman Cenk Sahinalp. Oblivious string embeddings and edit distance approximations. In SODA, pages 792-801. ACM, 2006. Google Scholar
  16. Lasse Bergroth, Harri Hakonen, and Timo Raita. A survey of longest common subsequence algorithms. In SPIRE, pages 39-48. IEEE, 2000. Google Scholar
  17. Joshua Brakensiek and Aviad Rubinstein. Constant-factor approximation of near-linear edit distance in near-linear time. In STOC, pages 685-698. ACM, 2020. Google Scholar
  18. Karl Bringmann and Tobias Friedrich. Exact and efficient generation of geometric random variates and random graphs. In ICALP, volume 7965 of LNCS, pages 267-278, 2013. Google Scholar
  19. Karl Bringmann and Marvin Künnemann. Quadratic conditional lower bounds for string problems and dynamic time warping. In FOCS, pages 79-97. IEEE, 2015. Google Scholar
  20. Karl Bringmann and Marvin Künnemann. Multivariate fine-grained complexity of longest common subsequence. In SODA, pages 1216-1235. SIAM, 2018. Google Scholar
  21. Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucký, and Michael E. Saks. Approximating edit distance within constant factor in truly sub-quadratic time. In FOCS, pages 979-990. IEEE, 2018. Google Scholar
  22. David Eppstein, Zvi Galil, Raffaele Giancarlo, and Giuseppe F. Italiano. Sparse dynamic programming I: linear cost functions. J. ACM, 39(3):519-545, 1992. Google Scholar
  23. Elazar Goldenberg, Robert Krauthgamer, and Barna Saha. Sublinear algorithms for gap edit distance. In FOCS, pages 1101-1120. IEEE, 2019. Google Scholar
  24. MohammadTaghi Hajiaghayi, Masoud Seddighin, Saeed Seddighin, and Xiaorui Sun. Approximating LCS in linear time: Beating the √n barrier. In SODA, pages 1181-1200. SIAM, 2019. Google Scholar
  25. MohammadTaghi Hajiaghayi, Masoud Seddighin, Saeed Seddighin, and Xiaorui Sun. Approximating LCS in linear time: Beating the √n barrier. CoRR, abs/2003.07285, 2020. URL: http://arxiv.org/abs/2003.07285.
  26. Daniel S. Hirschberg. Algorithms for the longest common subsequence problem. J. ACM, 24(4):664-675, 1977. Google Scholar
  27. James W. Hunt and Thomas G. Szymanski. A fast algorithm for computing longest subsequences. Commun. ACM, 20(5):350-353, 1977. Google Scholar
  28. Costas S. Iliopoulos and Mohammad Sohel Rahman. A new efficient algorithm for computing the longest common subsequence. Theory Comput. Syst., 45(2):355-371, 2009. Google Scholar
  29. Michal Koucký and Michael E. Saks. Constant factor approximations to edit distance on far input pairs in nearly linear time. In STOC, pages 699-712. ACM, 2020. Google Scholar
  30. William J. Masek and Mike Paterson. A faster algorithm computing string edit distances. J. Comput. Syst. Sci., 20(1):18-31, 1980. Google Scholar
  31. Eugene W. Myers. An O(ND) difference algorithm and its variations. Algorithmica, 1(2):251-266, 1986. Google Scholar
  32. Narao Nakatsu, Yahiko Kambayashi, and Shuzo Yajima. A longest common subsequence algorithm suitable for similar text strings. Acta Informatica, 18:171-179, 1982. Google Scholar
  33. Aviad Rubinstein, Saeed Seddighin, Zhao Song, and Xiaorui Sun. Approximation algorithms for LCS and LIS with truly improved running times. In FOCS, pages 1121-1145. IEEE, 2019. Google Scholar
  34. Aviad Rubinstein and Zhao Song. Reducing approximate longest common subsequence to approximate edit distance. In SODA, pages 1591-1600. SIAM, 2020. Google Scholar
  35. Robert A. Wagner and Michael J. Fischer. The string-to-string correction problem. J. ACM, 21(1):168-173, 1974. Google Scholar
  36. Sun Wu, Udi Manber, Gene Myers, and Webb Miller. An O(NP) sequence comparison algorithm. Inf. Process. Lett., 35(6):317-323, 1990. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail