Lower Bounds and Improved Algorithms for Asymmetric Streaming Edit Distance and Longest Common Subsequence
In this paper, we study edit distance (ED) and longest common subsequence (LCS) in the asymmetric streaming model, introduced by Saks and Seshadhri [Saks and Seshadhri, 2013]. As an intermediate model between the random access model and the streaming model, this model allows one to have streaming access to one string and random access to the other string. Meanwhile, ED and LCS are both fundamental problems that are often studied on large strings, thus the (asymmetric) streaming model is ideal for studying these problems.
Our first main contribution is a systematic study of space lower bounds for ED and LCS in the asymmetric streaming model. Previously, there are no explicitly stated results in this context, although some lower bounds about LCS can be inferred from the lower bounds for longest increasing subsequence (LIS) in [Sun and Woodruff, 2007; Gál and Gopalan, 2010; Ergun and Jowhari, 2008]. Yet these bounds only work for large alphabet size. In this paper, we develop several new techniques to handle ED in general and LCS for small alphabet size, thus establishing strong lower bounds for both problems. In particular, our lower bound for ED provides an exponential separation between edit distance and Hamming distance in the asymmetric streaming model. Our lower bounds also extend to LIS and longest non-decreasing subsequence (LNS) in the standard streaming model. Together with previous results, our bounds provide an almost complete picture for these two problems.
As our second main contribution, we give improved algorithms for ED and LCS in the asymmetric streaming model. For ED, we improve the space complexity of the constant factor approximation algorithms in [Farhadi et al., 2020; Cheng et al., 2020] from Õ({n^δ}/δ) to O({d^δ}/δ polylog(n)), where n is the length of each string and d is the edit distance between the two strings. For LCS, we give the first 1/2+ε approximation algorithm with space n^δ for any constant δ > 0, over a binary alphabet. Our work leaves a plethora of intriguing open questions, including establishing lower bounds and designing algorithms for a natural generalization of LIS and LNS, which we call longest non-decreasing subsequence with threshold (LNST).
Asymmetric Streaming Model
Edit Distance
Longest Common Subsequence
Space Lower Bound
Theory of computation~Lower bounds and information complexity
27:1-27:23
Regular Paper
https://arxiv.org/abs/2103.00713
Xin
Li
Xin Li
Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
Supported by NSF CAREER Award CCF-1845349.
Yu
Zheng
Yu Zheng
Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
Supported by NSF CAREER Award CCF-1845349
10.4230/LIPIcs.FSTTCS.2021.27
Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for lcs and other sequence similarity measures. In Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on. IEEE, 2015.
Alexandr Andoni, T.S. Jayram, and Mihai Patrascu. Lower bounds for edit distance and product metrics via poincare type inequalities. In Proceedings of the twenty first annual ACM-SIAM symposium on Discrete algorithms, pages 184-192, 2010.
Alexandr Andoni and Robert Krauthgamer. The computational hardness of estimating edit distance. SIAM Journal on Discrete Mathematics, 39(6):2398-2429, 2010.
Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Polylogarithmic approximation for edit distance and the asymmetric query complexity. In Foundations of Computer Science (FOCS), 2010 IEEE 51st Annual Symposium on. IEEE, 2010.
Alexandr Andoni and Negev Shekel Nosatzki. Edit distance in near-linear time: it’s a constant factor. In Proceedings of the 61st Annual Symposium on Foundations of Computer Science (FOCS), 2020.
Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless seth is false). In Proceedings of the forty-seventh annual ACM symposium on Theory of computing (STOC). IEEE, 2015.
Djamal Belazzougui and Qin Zhang. Edit distance: Sketching, streaming, and document exchange. In Proceedings of the 57th IEEE Annual Symposium on Foundations of Computer Science, pages 51-60. IEEE, 2016.
Joshua Brakensiek and Aviad Rubinstein. Constant-factor approximation of near-linear edit distance in near-linear time. In Proceedings of the 52nd annual ACM symposium on Theory of computing (STOC), 2020.
Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucky, and Michael Saks. Approximating edit distance within constant factor in truly sub-quadratic time. In Foundations of Computer Science (FOCS), 2018 IEEE 59th Annual Symposium on. IEEE, 2019.
Diptarka Chakraborty, Elazar Goldenberg, and Michal Koucký. Low distortion embedding from edit to hamming distance using coupling. In Proceedings of the 48th IEEE Annual Annual ACM SIGACT Symposium on Theory of Computing. ACM, 2016.
Diptarka Chakraborty, Elazar Goldenberg, and Michal Koucký. Streaming algorithms for computing edit distance without exploiting suffix trees. arXiv preprint, 2016. URL: http://arxiv.org/abs/1607.03718.
http://arxiv.org/abs/1607.03718
Kuan Cheng, Alireza Farhadi, MohammadTaghi Hajiaghayi, Zhengzhong Jin, Xin Li, Aviad Rubinstein, Saeed Seddighin, and Yu Zheng. Streaming and small space approximation algorithms for edit distance and longest common subsequence. In International Colloquium on Automata, Languages, and Programming. Springer, 2021.
Kuan Cheng, Zhengzhong Jin, Xin Li, and Yu Zheng. Space efficient deterministic approximation of string measures. arXiv preprint, 2020. URL: http://arxiv.org/abs/2002.08498.
http://arxiv.org/abs/2002.08498
Funda Ergun and Hossein Jowhari. On distance to monotonicity and longest increasing subsequence of a data stream. In Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms, pages 730-736, 2008.
Alireza Farhadi, MohammadTaghi Hajiaghayi, Aviad Rubinstein, and Saeed Seddighin. Streaming with oracle: New streaming algorithms for edit distance and lcs. arXiv preprint, 2020. URL: http://arxiv.org/abs/2002.11342.
http://arxiv.org/abs/2002.11342
Anna Gál and Parikshit Gopalan. Lower bounds on streaming algorithms for approximating the length of the longest increasing subsequence. SIAM Journal on Computing, 39(8):3463-3479, 2010.
Parikshit Gopalan, TS Jayram, Robert Krauthgamer, and Ravi Kumar. Estimating the sortedness of a data stream. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pages 318-327. Society for Industrial and Applied Mathematics, 2007.
MohammadTaghi Hajiaghayi, Masoud Seddighin, Saeed Seddighin, and Xiaorui Sun. Approximating lcs in linear time: beating the √n barrier. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1181-1200. Society for Industrial and Applied Mathematics, 2019.
Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems have strongly exponential complexity. Journal of Computer and System Sciences, 63(4):512-530, 2001.
Michal Koucky and Michael E Saks. Constant factor approximations to edit distance on far input pairs in nearly linear time. In Proceedings of the 52nd annual ACM symposium on Theory of computing (STOC), 2020.
Eyal Kushilevitz and Noam Nisan. Communication Complexity. Cambridge Press, 1997.
David Liben-Nowell, Erik Vee, and An Zhu. Finding longest increasing and common subsequences in streaming data. In COCOON, 2005.
Aviad Rubinstein, Saeed Seddighin, Zhao Song, and Xiaorui Sun. Approximation algorithms for lcs and lis with truly improved running times. In Foundations of Computer Science (FOCS), 2019 IEEE 60th Annual Symposium on. IEEE, 2019.
Aviad Rubinstein and Zhao Song. Reducing approximate longest common subsequence to approximate edit distance. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1591-1600. SIAM, 2020.
Barna Saha. Fast & space-efficient approximations of language edit distance and RNA folding: An amnesic dynamic programming approach. In FOCS, 2017.
Michael Saks and C Seshadhri. Space efficient streaming algorithms for the distance to monotonicity and asymmetric edit distance. In Proceedings of the twenty-fourth annual ACM-SIAM symposium on Discrete algorithms, pages 1698-1709. SIAM, 2013.
Leonard J Schulman and David Zuckerman. Asymptotically good codes correcting insertions, deletions, and transpositions. IEEE transactions on information theory, 45(7):2552-2557, 1999.
Xiaoming Sun and David P Woodruff. The communication and streaming complexity of computing the longest common and increasing subsequences. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pages 336-345, 2007.
Xin Li and Yu Zheng
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode