Hardness of Approximation of (Multi-)LCS over Small Alphabet
The problem of finding a longest common subsequence (LCS) is one of the fundamental problems in computer science, with applications in fields such as computational biology, text processing, information retrieval, and data compression. It is well known that the decision version of the problem of finding the length of an LCS of an arbitrary number of input sequences (which we refer to as the Multi-LCS problem) is NP-complete. Jiang and Li [SICOMP'95] showed that if Max-Clique is hard to approximate within a factor of s, then Multi-LCS is also hard to approximate within a factor of Θ(s). By the NP-hardness of approximating Max-Clique due to Zuckerman [ToC'07], for any constant δ > 0, the length of an LCS of an arbitrary number of input sequences, each of length n, cannot be approximated within an n^{1-δ}-factor in polynomial time unless P = NP. However, the reduction of Jiang and Li assumes the alphabet size to be Ω(n). So far, no hardness result is known for the problem of approximating Multi-LCS over a sublinear-sized alphabet. On the other hand, it is easy to get a 1/|Σ|-factor approximation for strings over an alphabet Σ.
In this paper, we make significant progress towards proving hardness of approximation over small alphabets by showing a polynomial-time reduction from the well-studied densest k-subgraph problem with perfect completeness to approximating Multi-LCS over an alphabet of size poly(n/k). As a consequence, from the known hardness results for the densest k-subgraph problem (e.g. [Manurangsi, STOC'17]), we get that no polynomial-time algorithm can give an n^{-o(1)}-factor approximation of Multi-LCS over an alphabet of size n^{o(1)}, unless the Exponential Time Hypothesis is false.
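The easy 1/|Σ|-factor approximation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the algorithm; the version below assumes the standard argument that some symbol occurs at least L/|Σ| times in an LCS of length L, so repeating the best single symbol already gives a common subsequence of length at least L/|Σ|. The function name `single_symbol_common_subsequence` is hypothetical and chosen only for illustration.

```python
from collections import Counter


def single_symbol_common_subsequence(strings):
    """Return the longest common subsequence that uses only one symbol.

    Illustrative sketch (not taken from the paper): if an LCS has length L,
    some symbol occurs at least L/|Sigma| times in it, hence at least that
    many times in every input string. Repeating the best single symbol is
    therefore a 1/|Sigma|-factor approximation.
    """
    if not strings:
        return ""
    # Count the occurrences of each symbol in every input string.
    counts = [Counter(s) for s in strings]
    # Only symbols present in all strings can appear in a common subsequence.
    common = set(counts[0])
    for c in counts[1:]:
        common &= set(c)
    best_symbol, best_len = "", 0
    # Iterate in sorted order so that ties are broken deterministically.
    for sym in sorted(common):
        # A symbol can be repeated as often as its minimum count over all strings.
        usable = min(c[sym] for c in counts)
        if usable > best_len:
            best_symbol, best_len = sym, usable
    return best_symbol * best_len


if __name__ == "__main__":
    print(single_symbol_common_subsequence(["abcab", "bacba", "aabbc"]))
```

The sketch runs in time linear in the total input length, which is why the abstract can treat this guarantee as a baseline: a hardness result is only interesting for approximation factors better than 1/|Σ|.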
Longest common subsequence
Hardness of approximation
ETH-hardness
Densest k-subgraph problem
Theory of computation → Problems, reductions and completeness
38:1-38:16
APPROX
A full version of the paper is available at https://arxiv.org/abs/2006.13449.
The authors would like to thank the anonymous reviewers for providing helpful comments on an earlier version of this paper, and especially for pointing out a small technical mistake in the proof of Lemma 14. The authors would also like to thank Pasin Manurangsi for pointing out that for certain regimes no hardness result is known for the densest k-subgraph problem.
Amey
Bhangale
Amey Bhangale
University of California Riverside, CA, USA
Diptarka
Chakraborty
Diptarka Chakraborty
National University of Singapore, Singapore
Supported in part by NUS ODPRT Grant, WBS No. R-252-000-A94-133.
Rajendra
Kumar
Rajendra Kumar
IIT Kanpur, India
National University of Singapore, Singapore
Supported in part by National Research Foundation Singapore under its AI Singapore Programme [Award Number: AISG-RP-2018-005].
10.4230/LIPIcs.APPROX/RANDOM.2020.38
Amir Abboud and Arturs Backurs. Towards hardness of approximation for polynomial time problems. In 8th Innovations in Theoretical Computer Science Conference, ITCS 2017, January 9-11, 2017, Berkeley, CA, USA, pages 11:1-11:26, 2017.
Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 59-78, 2015.
Amir Abboud, Thomas Dueholm Hansen, Virginia Vassilevska Williams, and Ryan Williams. Simulating branching programs with edit distance and friends: or: a polylog shaved is a lower bound made. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016, pages 375-388, 2016.
Amir Abboud and Aviad Rubinstein. Fast and deterministic constant factor approximation algorithms for LCS imply new circuit lower bounds. In 9th Innovations in Theoretical Computer Science Conference, ITCS 2018, January 11-14, 2018, Cambridge, MA, USA, pages 35:1-35:14, 2018.
Lasse Bergroth, Harri Hakonen, and Timo Raita. A survey of longest common subsequence algorithms. In Pablo de la Fuente, editor, Seventh International Symposium on String Processing and Information Retrieval, SPIRE 2000, A Coruña, Spain, September 27-29, 2000, pages 39-48. IEEE Computer Society, 2000.
Aditya Bhaskara, Moses Charikar, Eden Chlamtac, Uriel Feige, and Aravindan Vijayaraghavan. Detecting high log-densities: an O(n^{1/4}) approximation for densest k-subgraph. In Leonard J. Schulman, editor, Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010, Cambridge, Massachusetts, USA, 5-8 June 2010, pages 201-210. ACM, 2010.
Guillaume Blin, Laurent Bulteau, Minghui Jiang, Pedro J. Tejada, and Stéphane Vialette. Hardness of longest common subsequence for sequences with bounded run-lengths. In Combinatorial Pattern Matching - 23rd Annual Symposium, CPM 2012, Helsinki, Finland, July 3-5, 2012. Proceedings, pages 138-148, 2012.
Mark Braverman, Young Kun Ko, Aviad Rubinstein, and Omri Weinstein. ETH hardness for densest-k-subgraph with perfect completeness. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1326-1341. SIAM, 2017.
Kuan Cheng, Bernhard Haeupler, Xin Li, Amirbehshad Shahrasbi, and Ke Wu. Synchronization strings: Highly efficient deterministic constructions over small alphabets. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, pages 2185-2204, 2019.
Uriel Feige, David Peleg, and Guy Kortsarz. The dense k-subgraph problem. Algorithmica, 29(3):410-421, 2001.
Uriel Feige and Michael Seltser. On the densest k-subgraph problem. Algorithmica, 29, 1997.
Szymon Grabowski. New tabulation and sparse dynamic programming based techniques for sequence similarity problems. Discrete Applied Mathematics, 212:96-103, 2016.
Dan Gusfield. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology. Cambridge University Press, 1997.
Bernhard Haeupler and Amirbehshad Shahrasbi. Synchronization strings: explicit constructions, local decoding, and applications. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018, pages 841-854, 2018.
MohammadTaghi Hajiaghayi, Masoud Seddighin, Saeed Seddighin, and Xiaorui Sun. Approximating LCS in linear time: Beating the √n barrier. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, pages 1181-1200, 2019.
D. S. Hirschberg. Recent results on the complexity of common subsequence problems. In D. Sankoff and J. B. Kruskal, editors, Time Warps, String Edits, and Macromolecules, pages 323-328. Addison-Wesley, 1983.
Russell Impagliazzo and Ramamohan Paturi. On the complexity of k-SAT. Journal of Computer and System Sciences, 62(2):367-375, 2001.
Tao Jiang and Ming Li. On the approximation of shortest common supersequences and longest common subsequences. SIAM J. on Computing, 24(5):1122-1139, 1995.
Subhash Khot. Ruling out PTAS for graph min-bisection, dense k-subgraph, and bipartite clique. SIAM Journal on Computing, 36(4):1025-1071, 2006.
Marcos Kiwi, Martin Loebl, and Jiří Matoušek. Expected length of the longest common subsequence for large alphabets. Advances in Mathematics, 197(2):480-498, 2005.
G. Kortsarz and D. Peleg. On choosing a dense subgraph. In Proceedings of the 1993 IEEE 34th Annual Foundations of Computer Science, pages 692-701. IEEE Computer Society, 1993.
S. Lu and K. S. Fu. A sentence-to-sentence clustering procedure for pattern analysis. IEEE Transactions on Systems, Man, and Cybernetics, 8(5):381-389, May 1978.
David Maier. The complexity of some problems on subsequences and supersequences. J. ACM, 25(2):322-336, April 1978.
Pasin Manurangsi. Almost-polynomial ratio ETH-hardness of approximating densest k-subgraph. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 954-961. ACM, 2017.
William J. Masek and Michael S. Paterson. A faster algorithm computing string edit distances. Journal of Computer and System Sciences, 20(1):18-31, 1980.
Christos H. Papadimitriou and Mihalis Yannakakis. Optimization, approximation, and complexity classes. J. Comput. Syst. Sci., 43(3):425-440, 1991.
Pavel A. Pevzner. Multiple alignment with guaranteed error bounds and communication cost. In Combinatorial Pattern Matching, Third Annual Symposium, CPM 92, Tucson, Arizona, USA, April 29 - May 1, 1992, Proceedings, pages 205-213, 1992.
Prasad Raghavendra and David Steurer. Graph expansion and the unique games conjecture. In Proceedings of the forty-second ACM symposium on Theory of computing, pages 755-764. ACM, 2010.
Aviad Rubinstein and Zhao Song. Reducing approximate longest common subsequence to approximate edit distance. In Shuchi Chawla, editor, Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pages 1591-1600. SIAM, 2020.
Robert A. Wagner and Michael J. Fischer. The string-to-string correction problem. J. ACM, 21(1):168-173, January 1974.
David Zuckerman. Linear degree extractors and the inapproximability of max clique and chromatic number. Theory of Computing, 3(6):103-128, 2007.
Amey Bhangale, Diptarka Chakraborty, and Rajendra Kumar
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode