A Linear-Time n^{0.4}-Approximation for Longest Common Subsequence

Bringmann, Karl; Das, Debarati

doi:10.4230/LIPIcs.ICALP.2021.39

File

LIPIcs.ICALP.2021.39.pdf

Filesize: 0.71 MB
20 pages

Document Identifiers

DOI: 10.4230/LIPIcs.ICALP.2021.39
URN: urn:nbn:de:0030-drops-141082

Author Details

Karl Bringmann

Saarland University, Saarland Informatics Campus, Saarbrücken, Germany
Max-Planck-Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany

Debarati Das

Basic Algorithm Research Copenhagen (BARC), University of Copenhagen, Denmark

Acknowledgements

We thank an anonymous reviewer for suggesting how to turn the near-linear-time algorithm that we obtained in a previous version of this paper into a linear-time algorithm.

Cite AsGet BibTex

Karl Bringmann and Debarati Das. A Linear-Time n^{0.4}-Approximation for Longest Common Subsequence. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 39:1-39:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ICALP.2021.39

Abstract

We consider the classic problem of computing the Longest Common Subsequence (LCS) of two strings of length n. While a simple quadratic algorithm has been known for the problem for more than 40 years, no faster algorithm has been found despite an extensive effort. The lack of progress on the problem has recently been explained by Abboud, Backurs, and Vassilevska Williams [FOCS'15] and Bringmann and Künnemann [FOCS'15] who proved that there is no subquadratic algorithm unless the Strong Exponential Time Hypothesis fails. This major roadblock for getting faster exact algorithms has led the community to look for subquadratic approximation algorithms for the problem. Yet, unlike the edit distance problem for which a constant-factor approximation in almost-linear time is known, very little progress has been made on LCS, making it a notoriously difficult problem also in the realm of approximation. For the general setting (where we make no assumption on the length of the optimum solution or the alphabet size), only a naive O(n^{ε/2})-approximation algorithm with running time Õ(n^{2-ε}) has been known, for any constant 0 < ε ≤ 1. Recently, a breakthrough result by Hajiaghayi, Seddighin, Seddighin, and Sun [SODA'19] provided a linear-time algorithm that yields a O(n^{0.497956})-approximation in expectation; improving upon the naive O(√n)-approximation for the first time. In this paper, we provide an algorithm that in time O(n^{2-ε}) computes an Õ(n^{2ε/5})-approximation with high probability, for any 0 < ε ≤ 1. Our result (1) gives an Õ(n^{0.4})-approximation in linear time, improving upon the bound of Hajiaghayi, Seddighin, Seddighin, and Sun, (2) provides an algorithm whose approximation scales with any subquadratic running time O(n^{2-ε}), improving upon the naive bound of O(n^{ε/2}) for any ε, and (3) instead of only in expectation, succeeds with high probability.

Subject Classification

ACM Subject Classification

Theory of computation → Approximation algorithms analysis

Keywords

approximation algorithm
longest common subsequence
string algorithm

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

References

Amir Abboud and Arturs Backurs. Towards hardness of approximation for polynomial time problems. In ITCS, volume 67 of LIPIcs, pages 11:1-11:26, 2017.
Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In FOCS, pages 59-78. IEEE, 2015.
Amir Abboud and Karl Bringmann. Tighter connections between Formula-SAT and shaving logs. In ICALP, volume 107 of LIPIcs, pages 8:1-8:18, 2018.
Amir Abboud, Thomas Dueholm Hansen, Virginia Vassilevska Williams, and Ryan Williams. Simulating branching programs with edit distance and friends: or: a polylog shaved is a lower bound made. In STOC, pages 375-388. ACM, 2016.
Amir Abboud and Aviad Rubinstein. Fast and deterministic constant factor approximation algorithms for LCS imply new circuit lower bounds. In ITCS, volume 94 of LIPIcs, pages 35:1-35:14, 2018.
Alfred V. Aho, Daniel S. Hirschberg, and Jeffrey D. Ullman. Bounds on the complexity of the longest common subsequence problem. J. ACM, 23(1):1-12, 1976.
Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Polylogarithmic approximation for edit distance and the asymmetric query complexity. In FOCS, pages 377-386. IEEE, 2010.
Alexandr Andoni and Negev Shekel Nosatzki. Edit distance in near-linear time: it’s a constant factor. In FOCS, pages 990-1001. IEEE, 2020.
Alexandr Andoni and Krzysztof Onak. Approximating edit distance in near-linear time. SIAM J. Comput., 41(6):1635-1648, 2012.
Alberto Apostolico. Improving the worst-case performance of the Hunt-Szymanski strategy for the longest common subsequence of two strings. Inf. Process. Lett., 23(2):63-69, 1986.
Alberto Apostolico and Concettina Guerra. The longest common subsequence problem revisited. Algorithmica, 2:316-336, 1987.
Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In STOC, pages 51-58. ACM, 2015.
Ziv Bar-Yossef, T. S. Jayram, Robert Krauthgamer, and Ravi Kumar. Approximating edit distance efficiently. In FOCS, pages 550-559. IEEE, 2004.
Tugkan Batu, Funda Ergün, Joe Kilian, Avner Magen, Sofya Raskhodnikova, Ronitt Rubinfeld, and Rahul Sami. A sublinear algorithm for weakly approximating edit distance. In STOC, pages 316-324. ACM, 2003.
Tugkan Batu, Funda Ergün, and Süleyman Cenk Sahinalp. Oblivious string embeddings and edit distance approximations. In SODA, pages 792-801. ACM, 2006.
Lasse Bergroth, Harri Hakonen, and Timo Raita. A survey of longest common subsequence algorithms. In SPIRE, pages 39-48. IEEE, 2000.
Joshua Brakensiek and Aviad Rubinstein. Constant-factor approximation of near-linear edit distance in near-linear time. In STOC, pages 685-698. ACM, 2020.
Karl Bringmann and Tobias Friedrich. Exact and efficient generation of geometric random variates and random graphs. In ICALP, volume 7965 of LNCS, pages 267-278, 2013.
Karl Bringmann and Marvin Künnemann. Quadratic conditional lower bounds for string problems and dynamic time warping. In FOCS, pages 79-97. IEEE, 2015.
Karl Bringmann and Marvin Künnemann. Multivariate fine-grained complexity of longest common subsequence. In SODA, pages 1216-1235. SIAM, 2018.
Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucký, and Michael E. Saks. Approximating edit distance within constant factor in truly sub-quadratic time. In FOCS, pages 979-990. IEEE, 2018.
David Eppstein, Zvi Galil, Raffaele Giancarlo, and Giuseppe F. Italiano. Sparse dynamic programming I: linear cost functions. J. ACM, 39(3):519-545, 1992.
Elazar Goldenberg, Robert Krauthgamer, and Barna Saha. Sublinear algorithms for gap edit distance. In FOCS, pages 1101-1120. IEEE, 2019.
MohammadTaghi Hajiaghayi, Masoud Seddighin, Saeed Seddighin, and Xiaorui Sun. Approximating LCS in linear time: Beating the √n barrier. In SODA, pages 1181-1200. SIAM, 2019.
MohammadTaghi Hajiaghayi, Masoud Seddighin, Saeed Seddighin, and Xiaorui Sun. Approximating LCS in linear time: Beating the √n barrier. CoRR, abs/2003.07285, 2020. URL: http://arxiv.org/abs/2003.07285.
Daniel S. Hirschberg. Algorithms for the longest common subsequence problem. J. ACM, 24(4):664-675, 1977.
James W. Hunt and Thomas G. Szymanski. A fast algorithm for computing longest subsequences. Commun. ACM, 20(5):350-353, 1977.
Costas S. Iliopoulos and Mohammad Sohel Rahman. A new efficient algorithm for computing the longest common subsequence. Theory Comput. Syst., 45(2):355-371, 2009.
Michal Koucký and Michael E. Saks. Constant factor approximations to edit distance on far input pairs in nearly linear time. In STOC, pages 699-712. ACM, 2020.
William J. Masek and Mike Paterson. A faster algorithm computing string edit distances. J. Comput. Syst. Sci., 20(1):18-31, 1980.
Eugene W. Myers. An O(ND) difference algorithm and its variations. Algorithmica, 1(2):251-266, 1986.
Narao Nakatsu, Yahiko Kambayashi, and Shuzo Yajima. A longest common subsequence algorithm suitable for similar text strings. Acta Informatica, 18:171-179, 1982.
Aviad Rubinstein, Saeed Seddighin, Zhao Song, and Xiaorui Sun. Approximation algorithms for LCS and LIS with truly improved running times. In FOCS, pages 1121-1145. IEEE, 2019.
Aviad Rubinstein and Zhao Song. Reducing approximate longest common subsequence to approximate edit distance. In SODA, pages 1591-1600. SIAM, 2020.
Robert A. Wagner and Michael J. Fischer. The string-to-string correction problem. J. ACM, 21(1):168-173, 1974.
Sun Wu, Udi Manber, Gene Myers, and Webb Miller. An O(NP) sequence comparison algorithm. Inf. Process. Lett., 35(6):317-323, 1990.