Document Open Access Logo

An Improved Sketching Algorithm for Edit Distance

Authors Ce Jin , Jelani Nelson , Kewen Wu



PDF
Thumbnail PDF

File

LIPIcs.STACS.2021.45.pdf
  • Filesize: 0.84 MB
  • 16 pages

Document Identifiers

Author Details

Ce Jin
  • MIT, Cambridge, MA, USA
Jelani Nelson
  • University of California at Berkeley, CA, USA
Kewen Wu
  • University of California at Berkeley, CA, USA

Acknowledgements

We thank Qin Zhang for answering several questions about [Djamal Belazzougui and Qin Zhang, 2016]. C. J. thanks Virginia Vassilevska Williams for several helpful discussions. We thank anonymous reviewers for their helpful comments.

Cite AsGet BibTex

Ce Jin, Jelani Nelson, and Kewen Wu. An Improved Sketching Algorithm for Edit Distance. In 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 187, pp. 45:1-45:16, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.STACS.2021.45

Abstract

We provide improved upper bounds for the simultaneous sketching complexity of edit distance. Consider two parties, Alice with input x ∈ Σⁿ and Bob with input y ∈ Σⁿ, that share public randomness and are given a promise that the edit distance ed(x,y) between their two strings is at most some given value k. Alice must send a message sx and Bob must send sy to a third party Charlie, who does not know the inputs but shares the same public randomness and also knows k. Charlie must output ed(x,y) precisely as well as a sequence of ed(x,y) edits required to transform x into y. The goal is to minimize the lengths |sx|, |sy| of the messages sent. The protocol of Belazzougui and Zhang (FOCS 2016), building upon the random walk method of Chakraborty, Goldenberg, and Koucký (STOC 2016), achieves a maximum message length of Õ(k⁸) bits, where Õ(⋅) hides poly(log n) factors. In this work we build upon Belazzougui and Zhang’s protocol and provide an improved analysis demonstrating that a slight modification of their construction achieves a bound of Õ(k³).

Subject Classification

ACM Subject Classification
  • Theory of computation → Sketching and sampling
Keywords
  • edit distance
  • sketching

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Alexandr Andoni and Robert Krauthgamer. The smoothed complexity of edit distance. ACM Trans. Algorithms, 8(4):44:1-44:25, 2012. URL: https://doi.org/10.1145/2344422.2344434.
  2. Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Polylogarithmic approximation for edit distance and the asymmetric query complexity. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 377-386, 2010. URL: https://doi.org/10.1109/FOCS.2010.43.
  3. Alexandr Andoni and Negev Shekel Nosatzki. Edit distance in near-linear time: it’s a constant factor. CoRR, abs/2005.07678, 2020. To appear in FOCS 2020. URL: http://arxiv.org/abs/2005.07678.
  4. Alexandr Andoni and Krzysztof Onak. Approximating edit distance in near-linear time. SIAM J. Comput., 41(6):1635-1648, 2012. URL: https://doi.org/10.1137/090767182.
  5. Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). SIAM J. Comput., 47(3):1087-1097, 2018. URL: https://doi.org/10.1137/15M1053128.
  6. Ziv Bar-Yossef, T. S. Jayram, Robert Krauthgamer, and Ravi Kumar. Approximating edit distance efficiently. In Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 550-559, 2004. URL: https://doi.org/10.1109/FOCS.2004.14.
  7. Djamal Belazzougui and Qin Zhang. Edit distance: Sketching, streaming, and document exchange. In Proceedings of the 57th IEEE Annual Symposium on Foundations of Computer Science (FOCS), pages 51-60. IEEE Computer Society, 2016. Full version at https://arxiv.org/abs/1607.04200. URL: https://doi.org/10.1109/FOCS.2016.15.
  8. Mahdi Boroujeni, Soheil Ehsani, Mohammad Ghodsi, Mohammad Taghi Hajiaghayi, and Saeed Seddighin. Approximating edit distance in truly subquadratic time: Quantum and MapReduce. In Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1170-1189, 2018. URL: https://doi.org/10.1137/1.9781611975031.76.
  9. Mahdi Boroujeni, Masoud Seddighin, and Saeed Seddighin. Improved algorithms for edit distance and LCS: beyond worst case. In Proceedings of the 31st ACM-SIAM Symposium on Discrete Algorithms, (SODA), pages 1601-1620. SIAM, 2020. URL: https://doi.org/10.1137/1.9781611975994.99.
  10. Joshua Brakensiek, Moses Charikar, and Aviad Rubinstein. A simple sublinear algorithm for gap edit distance. CoRR, abs/2007.14368, 2020. URL: http://arxiv.org/abs/2007.14368.
  11. Joshua Brakensiek and Aviad Rubinstein. Constant-factor approximation of near-linear edit distance in near-linear time. In Proccedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 685-698, 2020. URL: https://doi.org/10.1145/3357713.3384282.
  12. Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucký, and Michael E. Saks. Approximating edit distance within constant factor in truly sub-quadratic time. In Proceedings of the 59th IEEE Annual Symposium on Foundations of Computer Science (FOCS), pages 979-990. IEEE Computer Society, 2018. URL: https://doi.org/10.1109/FOCS.2018.00096.
  13. Diptarka Chakraborty, Elazar Goldenberg, and Michal Koucký. Streaming algorithms for embedding and computing edit distance in the low distance regime. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 712-725. ACM, 2016. URL: https://doi.org/10.1145/2897518.2897577.
  14. Moses Charikar and Robert Krauthgamer. Embedding the ulam metric into 𝓁₁. Theory Comput., 2(11):207-224, 2006. URL: https://doi.org/10.4086/toc.2006.v002a011.
  15. Kuan Cheng, Zhengzhong Jin, Xin Li, and Ke Wu. Deterministic document exchange protocols, and almost optimal binary codes for edit errors. In Proceedings of the 59th IEEE Annual Symposium on Foundations of Computer Science (FOCS), pages 200-211, 2018. URL: https://doi.org/10.1109/FOCS.2018.00028.
  16. Kuan Cheng and Xin Li. Efficient document exchange and error correcting codes with asymmetric information. CoRR, abs/2007.00870, 2020. To appear in SODA 2021. URL: http://arxiv.org/abs/2007.00870.
  17. Elazar Goldenberg, Robert Krauthgamer, and Barna Saha. Sublinear algorithms for gap edit distance. In Proceedings of the 60th IEEE Annual Symposium on Foundations of Computer Science (FOCS), pages 1101-1120, 2019. URL: https://doi.org/10.1109/FOCS.2019.00070.
  18. Bernhard Haeupler. Optimal document exchange and new codes for insertions and deletions. In Proceedings of the 60th IEEE Annual Symposium on Foundations of Computer Science (FOCS), pages 334-347, 2019. URL: https://doi.org/10.1109/FOCS.2019.00029.
  19. Subhash Khot and Assaf Naor. Nonembeddability theorems via Fourier analysis. Mathematische Annalen, 334:821-852, 2006. Google Scholar
  20. Tomasz Kociumaka and Barna Saha. Sublinear-time algorithms for computing & embedding gap edit distance. In Proceedings of the 61st IEEE Annual Symposium on Foundations of Computer Science (FOCS), 2020. Google Scholar
  21. Michal Koucký and Michael E. Saks. Constant factor approximations to edit distance on far input pairs in nearly linear time. In Proccedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 699-712. ACM, 2020. URL: https://doi.org/10.1145/3357713.3384307.
  22. Robert Krauthgamer and Yuval Rabani. Improved lower bounds for embeddings into l₁. SIAM J. Comput., 38(6):2487-2498, 2009. URL: https://doi.org/10.1137/060660126.
  23. William J. Masek and Mike Paterson. A faster algorithm computing string edit distances. J. Comput. Syst. Sci., 20(1):18-31, 1980. URL: https://doi.org/10.1016/0022-0000(80)90002-1.
  24. Noam Nisan. Pseudorandom generators for space-bounded computation. Comb., 12(4):449-461, 1992. URL: https://doi.org/10.1007/BF01305237.
  25. Alon Orlitsky. Interactive communication: Balanced distributions, correlated files, and average-case complexity. In Proceedings of the 32nd Annual Symposium on Foundations of Computer Science (FOCS), pages 228-238, 1991. URL: https://doi.org/10.1109/SFCS.1991.185373.
  26. Rafail Ostrovsky and Yuval Rabani. Low distortion embeddings for edit distance. J. ACM, 54(5):23, 2007. URL: https://doi.org/10.1145/1284320.1284322.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail