Lower Bounds and Improved Algorithms for Asymmetric Streaming Edit Distance and Longest Common Subsequence

Authors Xin Li, Yu Zheng



PDF
Thumbnail PDF

File

LIPIcs.FSTTCS.2021.27.pdf
  • Filesize: 0.75 MB
  • 23 pages

Document Identifiers

Author Details

Xin Li
  • Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
Yu Zheng
  • Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA

Cite AsGet BibTex

Xin Li and Yu Zheng. Lower Bounds and Improved Algorithms for Asymmetric Streaming Edit Distance and Longest Common Subsequence. In 41st IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 213, pp. 27:1-27:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.FSTTCS.2021.27

Abstract

In this paper, we study edit distance (ED) and longest common subsequence (LCS) in the asymmetric streaming model, introduced by Saks and Seshadhri [Saks and Seshadhri, 2013]. As an intermediate model between the random access model and the streaming model, this model allows one to have streaming access to one string and random access to the other string. Meanwhile, ED and LCS are both fundamental problems that are often studied on large strings, thus the (asymmetric) streaming model is ideal for studying these problems. Our first main contribution is a systematic study of space lower bounds for ED and LCS in the asymmetric streaming model. Previously, there are no explicitly stated results in this context, although some lower bounds about LCS can be inferred from the lower bounds for longest increasing subsequence (LIS) in [Sun and Woodruff, 2007; Gál and Gopalan, 2010; Ergun and Jowhari, 2008]. Yet these bounds only work for large alphabet size. In this paper, we develop several new techniques to handle ED in general and LCS for small alphabet size, thus establishing strong lower bounds for both problems. In particular, our lower bound for ED provides an exponential separation between edit distance and Hamming distance in the asymmetric streaming model. Our lower bounds also extend to LIS and longest non-decreasing subsequence (LNS) in the standard streaming model. Together with previous results, our bounds provide an almost complete picture for these two problems. As our second main contribution, we give improved algorithms for ED and LCS in the asymmetric streaming model. For ED, we improve the space complexity of the constant factor approximation algorithms in [Farhadi et al., 2020; Cheng et al., 2020] from Õ({n^δ}/δ) to O({d^δ}/δ polylog(n)), where n is the length of each string and d is the edit distance between the two strings. For LCS, we give the first 1/2+ε approximation algorithm with space n^δ for any constant δ > 0, over a binary alphabet. Our work leaves a plethora of intriguing open questions, including establishing lower bounds and designing algorithms for a natural generalization of LIS and LNS, which we call longest non-decreasing subsequence with threshold (LNST).

Subject Classification

ACM Subject Classification
  • Theory of computation → Lower bounds and information complexity
Keywords
  • Asymmetric Streaming Model
  • Edit Distance
  • Longest Common Subsequence
  • Space Lower Bound

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for lcs and other sequence similarity measures. In Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on. IEEE, 2015. Google Scholar
  2. Alexandr Andoni, T.S. Jayram, and Mihai Patrascu. Lower bounds for edit distance and product metrics via poincare type inequalities. In Proceedings of the twenty first annual ACM-SIAM symposium on Discrete algorithms, pages 184-192, 2010. Google Scholar
  3. Alexandr Andoni and Robert Krauthgamer. The computational hardness of estimating edit distance. SIAM Journal on Discrete Mathematics, 39(6):2398-2429, 2010. Google Scholar
  4. Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Polylogarithmic approximation for edit distance and the asymmetric query complexity. In Foundations of Computer Science (FOCS), 2010 IEEE 51st Annual Symposium on. IEEE, 2010. Google Scholar
  5. Alexandr Andoni and Negev Shekel Nosatzki. Edit distance in near-linear time: it’s a constant factor. In Proceedings of the 61st Annual Symposium on Foundations of Computer Science (FOCS), 2020. Google Scholar
  6. Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless seth is false). In Proceedings of the forty-seventh annual ACM symposium on Theory of computing (STOC). IEEE, 2015. Google Scholar
  7. Djamal Belazzougui and Qin Zhang. Edit distance: Sketching, streaming, and document exchange. In Proceedings of the 57th IEEE Annual Symposium on Foundations of Computer Science, pages 51-60. IEEE, 2016. Google Scholar
  8. Joshua Brakensiek and Aviad Rubinstein. Constant-factor approximation of near-linear edit distance in near-linear time. In Proceedings of the 52nd annual ACM symposium on Theory of computing (STOC), 2020. Google Scholar
  9. Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucky, and Michael Saks. Approximating edit distance within constant factor in truly sub-quadratic time. In Foundations of Computer Science (FOCS), 2018 IEEE 59th Annual Symposium on. IEEE, 2019. Google Scholar
  10. Diptarka Chakraborty, Elazar Goldenberg, and Michal Koucký. Low distortion embedding from edit to hamming distance using coupling. In Proceedings of the 48th IEEE Annual Annual ACM SIGACT Symposium on Theory of Computing. ACM, 2016. Google Scholar
  11. Diptarka Chakraborty, Elazar Goldenberg, and Michal Koucký. Streaming algorithms for computing edit distance without exploiting suffix trees. arXiv preprint, 2016. URL: http://arxiv.org/abs/1607.03718.
  12. Kuan Cheng, Alireza Farhadi, MohammadTaghi Hajiaghayi, Zhengzhong Jin, Xin Li, Aviad Rubinstein, Saeed Seddighin, and Yu Zheng. Streaming and small space approximation algorithms for edit distance and longest common subsequence. In International Colloquium on Automata, Languages, and Programming. Springer, 2021. Google Scholar
  13. Kuan Cheng, Zhengzhong Jin, Xin Li, and Yu Zheng. Space efficient deterministic approximation of string measures. arXiv preprint, 2020. URL: http://arxiv.org/abs/2002.08498.
  14. Funda Ergun and Hossein Jowhari. On distance to monotonicity and longest increasing subsequence of a data stream. In Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms, pages 730-736, 2008. Google Scholar
  15. Alireza Farhadi, MohammadTaghi Hajiaghayi, Aviad Rubinstein, and Saeed Seddighin. Streaming with oracle: New streaming algorithms for edit distance and lcs. arXiv preprint, 2020. URL: http://arxiv.org/abs/2002.11342.
  16. Anna Gál and Parikshit Gopalan. Lower bounds on streaming algorithms for approximating the length of the longest increasing subsequence. SIAM Journal on Computing, 39(8):3463-3479, 2010. Google Scholar
  17. Parikshit Gopalan, TS Jayram, Robert Krauthgamer, and Ravi Kumar. Estimating the sortedness of a data stream. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pages 318-327. Society for Industrial and Applied Mathematics, 2007. Google Scholar
  18. MohammadTaghi Hajiaghayi, Masoud Seddighin, Saeed Seddighin, and Xiaorui Sun. Approximating lcs in linear time: beating the √n barrier. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1181-1200. Society for Industrial and Applied Mathematics, 2019. Google Scholar
  19. Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems have strongly exponential complexity. Journal of Computer and System Sciences, 63(4):512-530, 2001. Google Scholar
  20. Michal Koucky and Michael E Saks. Constant factor approximations to edit distance on far input pairs in nearly linear time. In Proceedings of the 52nd annual ACM symposium on Theory of computing (STOC), 2020. Google Scholar
  21. Eyal Kushilevitz and Noam Nisan. Communication Complexity. Cambridge Press, 1997. Google Scholar
  22. David Liben-Nowell, Erik Vee, and An Zhu. Finding longest increasing and common subsequences in streaming data. In COCOON, 2005. Google Scholar
  23. Aviad Rubinstein, Saeed Seddighin, Zhao Song, and Xiaorui Sun. Approximation algorithms for lcs and lis with truly improved running times. In Foundations of Computer Science (FOCS), 2019 IEEE 60th Annual Symposium on. IEEE, 2019. Google Scholar
  24. Aviad Rubinstein and Zhao Song. Reducing approximate longest common subsequence to approximate edit distance. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1591-1600. SIAM, 2020. Google Scholar
  25. Barna Saha. Fast & space-efficient approximations of language edit distance and RNA folding: An amnesic dynamic programming approach. In FOCS, 2017. Google Scholar
  26. Michael Saks and C Seshadhri. Space efficient streaming algorithms for the distance to monotonicity and asymmetric edit distance. In Proceedings of the twenty-fourth annual ACM-SIAM symposium on Discrete algorithms, pages 1698-1709. SIAM, 2013. Google Scholar
  27. Leonard J Schulman and David Zuckerman. Asymptotically good codes correcting insertions, deletions, and transpositions. IEEE transactions on information theory, 45(7):2552-2557, 1999. Google Scholar
  28. Xiaoming Sun and David P Woodruff. The communication and streaming complexity of computing the longest common and increasing subsequences. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pages 336-345, 2007. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail