Improved Sublinear-Time Edit Distance for Preprocessed Strings

Authors Karl Bringmann, Alejandro Cassis, Nick Fischer, Vasileios Nakos



PDF
Thumbnail PDF

File

LIPIcs.ICALP.2022.32.pdf
  • Filesize: 0.77 MB
  • 20 pages

Document Identifiers

Author Details

Karl Bringmann
  • Universität des Saarlandes, Saarland Informatics Campus, Saarbrücken, Germany
  • Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
Alejandro Cassis
  • Universität des Saarlandes, Saarland Informatics Campus, Saarbrücken, Germany
  • Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
Nick Fischer
  • Universität des Saarlandes, Saarland Informatics Campus, Saarbrücken, Germany
  • Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
Vasileios Nakos
  • RelationalAI, Berkeley, CA, USA

Cite As Get BibTex

Karl Bringmann, Alejandro Cassis, Nick Fischer, and Vasileios Nakos. Improved Sublinear-Time Edit Distance for Preprocessed Strings. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 229, pp. 32:1-32:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/LIPIcs.ICALP.2022.32

Abstract

We study the problem of approximating the edit distance of two strings in sublinear time, in a setting where one or both string(s) are preprocessed, as initiated by Goldenberg, Rubinstein, Saha (STOC '20). Specifically, in the (k, K)-gap edit distance problem, the goal is to distinguish whether the edit distance of two strings is at most k or at least K. We obtain the following results:  
- After preprocessing one string in time n^{1+o(1)}, we can solve (k, k ⋅ n^o(1))-gap-gap edit distance in time (n/k + k) ⋅ n^o(1). 
- After preprocessing both strings separately in time n^{1+o(1)}, we can solve (k, k ⋅ n^o(1))-gap edit distance in time kn^o(1).  Both results improve upon some previously best known result, with respect to either the gap or the query time or the preprocessing time.
Our algorithms build on the framework by Andoni, Krauthgamer and Onak (FOCS '10) and the recent sublinear-time algorithm by Bringmann, Cassis, Fischer and Nakos (STOC '22). We replace many complicated parts in their algorithm by faster and simpler solutions which exploit the preprocessing.

Subject Classification

ACM Subject Classification
  • Theory of computation → Streaming, sublinear and near linear time algorithms
Keywords
  • Edit Distance
  • Property Testing
  • Preprocessing
  • Precision Sampling

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In Proceedings of the 56th IEEE Annual Symposium on Foundations of Computer Science, FOCS '15, pages 59-78. IEEE Computer Society, 2015. Google Scholar
  2. Amir Abboud, Thomas Dueholm Hansen, Virginia Vassilevska Williams, and Ryan Williams. Simulating branching programs with edit distance and friends: or: a polylog shaved is a lower bound made. In Proceedings of the 48th ACM Symposium on Theory of Computing, STOC '16, pages 375-388. ACM, 2016. Google Scholar
  3. Alexandr Andoni. High frequency moments via max-stability. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing, ICASSP '17, pages 6364-6368. IEEE, 2017. Google Scholar
  4. Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Polylogarithmic approximation for edit distance and the asymmetric query complexity. In Proceedings of the 51st IEEE Annual Symposium on Foundations of Computer Science, FOCS '10, pages 377-386. IEEE Computer Society, 2010. Google Scholar
  5. Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Streaming algorithms via precision sampling. In Proceedings of the 52nd IEEE Annual Symposium on Foundations of Computer Science, FOCS '11, pages 363-372. IEEE Computer Society, 2011. Google Scholar
  6. Alexandr Andoni and Negev Shekel Nosatzki. Edit distance in near-linear time: it’s a constant factor. In Proceedings of the 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS '20, pages 990-1001. IEEE Computer Society, 2020. Google Scholar
  7. Alexandr Andoni and Krzysztof Onak. Approximating edit distance in near-linear time. SIAM J. Comput., 41(6):1635-1648, 2012. Google Scholar
  8. Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In Proceedings of the 47th ACM Symposium on Theory of Computing, STOC '15, pages 51-58. ACM, 2015. Google Scholar
  9. Ziv Bar-Yossef, S. Thathachar Jayram, Robert Krauthgamer, and Ravi Kumar. Approximating edit distance efficiently. In Proceedings of the 45th IEEE Annual Symposium on Foundations of Computer Science, FOCS '04, pages 550-559. IEEE Computer Society, 2004. Google Scholar
  10. Tugkan Batu, Funda Ergün, Joe Kilian, Avner Magen, Sofya Raskhodnikova, Ronitt Rubinfeld, and Rahul Sami. A sublinear algorithm for weakly approximating edit distance. In Proceedings of the 35th ACM Symposium on Theory of Computing, STOC '03, pages 316-324. ACM, 2003. Google Scholar
  11. Tugkan Batu, Funda Ergun, and Cenk Sahinalp. Oblivious string embeddings and edit distance approximations. In Proceedings of the 17th ACM-SIAM Symposium on Discrete Algorithms, SODA '06, pages 792-801. SIAM, 2006. Google Scholar
  12. Djamal Belazzougui and Qin Zhang. Edit distance: Sketching, streaming, and document exchange. In Proceedings of the 57th IEEE Annual Symposium on Foundations of Computer Science, FOCS '16, pages 51-60. IEEE Computer Society, 2016. Google Scholar
  13. Joshua Brakensiek, Moses Charikar, and Aviad Rubinstein. A simple sublinear algorithm for gap edit distance, 2020. URL: http://arxiv.org/abs/2007.14368.
  14. Joshua Brakensiek and Aviad Rubinstein. Constant-factor approximation of near-linear edit distance in near-linear time. In Proceedings of the 52nd ACM Symposium on Theory of Computing, STOC '20, pages 685-698. ACM, 2020. Google Scholar
  15. Karl Bringmann, Alejandro Cassis, Nick Fischer, and Vasileios Nakos. Almost-optimal sublinear-time edit distance in the low distance regime. In Proceedings of the 54rd ACM Symposium on Theory of Computing, STOC '22. ACM, 2022. Google Scholar
  16. Karl Bringmann and Marvin Künnemann. Quadratic conditional lower bounds for string problems and dynamic time warping. In Proceedings of the 56th IEEE Annual Symposium on Foundations of Computer Science, FOCS '15, pages 79-97. IEEE Computer Society, 2015. Google Scholar
  17. Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucký, and Michael Saks. Approximating edit distance within constant factor in truly sub-quadratic time. J. ACM, 67(6), 2020. Google Scholar
  18. Diptarka Chakraborty, Elazar Goldenberg, and Michal Koucký. Streaming algorithms for embedding and computing edit distance in the low distance regime. In Proceedings of the 48th ACM Symposium on Theory of Computing, STOC '16, pages 712-725. ACM, 2016. Google Scholar
  19. Elazar Goldenberg, Tomasz Kociumaka, Robert Krauthgamer, and Barna Saha. Gap edit distance via non-adaptive queries: Simple and optimal, 2021. URL: http://arxiv.org/abs/2111.12706.
  20. Elazar Goldenberg, Robert Krauthgamer, and Barna Saha. Sublinear algorithms for gap edit distance. In Proceedings of the 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS '20, pages 1101-1120. IEEE Computer Society, 2019. Google Scholar
  21. Elazar Goldenberg, Aviad Rubinstein, and Barna Saha. Does preprocessing help in fast sequence comparisons? In Proceedings of the 52nd ACM Symposium on Theory of Computing, STOC '20, pages 657-670. ACM, 2020. Google Scholar
  22. Bernhard Haeupler. Optimal document exchange and new codes for insertions and deletions. In Proceedings of the 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS '19, pages 334-347. IEEE Computer Society, 2019. Google Scholar
  23. Hossein Jowhari. Efficient communication protocols for deciding edit distance. In Proceedings of the 20th Annual European Conference on Algorithms, ESA '12, pages 648-658. Springer, 2012. Google Scholar
  24. Tomasz Kociumaka and Barna Saha. Sublinear-time algorithms for computing & embedding gap edit distance. In Proceedings of the 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS '20, pages 1168-1179. IEEE Computer Society, 2020. Google Scholar
  25. Michal Koucký and Michael E. Saks. Constant factor approximations to edit distance on far input pairs in nearly linear time. In Proceedings of the 52nd ACM Symposium on Theory of Computing, STOC '20, pages 699-712. ACM, 2020. Google Scholar
  26. Rafail Ostrovsky and Yuval Rabani. Low distortion embeddings for edit distance. J. ACM, 54(5):23, 2007. Google Scholar
  27. Sebastian Wandelt, Dong Deng, Stefan Gerdjikov, Shashwat Mishra, Petar Mitankin, Manish Patil, Enrico Siragusa, Alexander Tiskin, Wei Wang, Jiaying Wang, and Ulf Leser. State-of-the-art in string similarity search and join. SIGMOD Rec., 43(1):64-76, 2014. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail