Improved Sublinear-Time Edit Distance for Preprocessed Strings
We study the problem of approximating the edit distance of two strings in sublinear time, in a setting where one or both string(s) are preprocessed, as initiated by Goldenberg, Rubinstein, Saha (STOC '20). Specifically, in the (k, K)-gap edit distance problem, the goal is to distinguish whether the edit distance of two strings is at most k or at least K. We obtain the following results:
- After preprocessing one string in time n^{1+o(1)}, we can solve (k, k ⋅ n^o(1))-gap-gap edit distance in time (n/k + k) ⋅ n^o(1).
- After preprocessing both strings separately in time n^{1+o(1)}, we can solve (k, k ⋅ n^o(1))-gap edit distance in time kn^o(1). Both results improve upon some previously best known result, with respect to either the gap or the query time or the preprocessing time.
Our algorithms build on the framework by Andoni, Krauthgamer and Onak (FOCS '10) and the recent sublinear-time algorithm by Bringmann, Cassis, Fischer and Nakos (STOC '22). We replace many complicated parts in their algorithm by faster and simpler solutions which exploit the preprocessing.
Edit Distance
Property Testing
Preprocessing
Precision Sampling
Theory of computation~Streaming, sublinear and near linear time algorithms
32:1-32:20
Track A: Algorithms, Complexity and Games
This work is part of the project TIPEA that has received funding from the European Research Council (ERC) under the European Unions Horizon 2020 research and innovation programme (grant agreement No. 850979).
Karl
Bringmann
Karl Bringmann
Universität des Saarlandes, Saarland Informatics Campus, Saarbrücken, Germany
Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
Alejandro
Cassis
Alejandro Cassis
Universität des Saarlandes, Saarland Informatics Campus, Saarbrücken, Germany
Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
Nick
Fischer
Nick Fischer
Universität des Saarlandes, Saarland Informatics Campus, Saarbrücken, Germany
Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
Vasileios
Nakos
Vasileios Nakos
RelationalAI, Berkeley, CA, USA
10.4230/LIPIcs.ICALP.2022.32
Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In Proceedings of the 56th IEEE Annual Symposium on Foundations of Computer Science, FOCS '15, pages 59-78. IEEE Computer Society, 2015.
Amir Abboud, Thomas Dueholm Hansen, Virginia Vassilevska Williams, and Ryan Williams. Simulating branching programs with edit distance and friends: or: a polylog shaved is a lower bound made. In Proceedings of the 48th ACM Symposium on Theory of Computing, STOC '16, pages 375-388. ACM, 2016.
Alexandr Andoni. High frequency moments via max-stability. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing, ICASSP '17, pages 6364-6368. IEEE, 2017.
Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Polylogarithmic approximation for edit distance and the asymmetric query complexity. In Proceedings of the 51st IEEE Annual Symposium on Foundations of Computer Science, FOCS '10, pages 377-386. IEEE Computer Society, 2010.
Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Streaming algorithms via precision sampling. In Proceedings of the 52nd IEEE Annual Symposium on Foundations of Computer Science, FOCS '11, pages 363-372. IEEE Computer Society, 2011.
Alexandr Andoni and Negev Shekel Nosatzki. Edit distance in near-linear time: it’s a constant factor. In Proceedings of the 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS '20, pages 990-1001. IEEE Computer Society, 2020.
Alexandr Andoni and Krzysztof Onak. Approximating edit distance in near-linear time. SIAM J. Comput., 41(6):1635-1648, 2012.
Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In Proceedings of the 47th ACM Symposium on Theory of Computing, STOC '15, pages 51-58. ACM, 2015.
Ziv Bar-Yossef, S. Thathachar Jayram, Robert Krauthgamer, and Ravi Kumar. Approximating edit distance efficiently. In Proceedings of the 45th IEEE Annual Symposium on Foundations of Computer Science, FOCS '04, pages 550-559. IEEE Computer Society, 2004.
Tugkan Batu, Funda Ergün, Joe Kilian, Avner Magen, Sofya Raskhodnikova, Ronitt Rubinfeld, and Rahul Sami. A sublinear algorithm for weakly approximating edit distance. In Proceedings of the 35th ACM Symposium on Theory of Computing, STOC '03, pages 316-324. ACM, 2003.
Tugkan Batu, Funda Ergun, and Cenk Sahinalp. Oblivious string embeddings and edit distance approximations. In Proceedings of the 17th ACM-SIAM Symposium on Discrete Algorithms, SODA '06, pages 792-801. SIAM, 2006.
Djamal Belazzougui and Qin Zhang. Edit distance: Sketching, streaming, and document exchange. In Proceedings of the 57th IEEE Annual Symposium on Foundations of Computer Science, FOCS '16, pages 51-60. IEEE Computer Society, 2016.
Joshua Brakensiek, Moses Charikar, and Aviad Rubinstein. A simple sublinear algorithm for gap edit distance, 2020. URL: http://arxiv.org/abs/2007.14368.
http://arxiv.org/abs/2007.14368
Joshua Brakensiek and Aviad Rubinstein. Constant-factor approximation of near-linear edit distance in near-linear time. In Proceedings of the 52nd ACM Symposium on Theory of Computing, STOC '20, pages 685-698. ACM, 2020.
Karl Bringmann, Alejandro Cassis, Nick Fischer, and Vasileios Nakos. Almost-optimal sublinear-time edit distance in the low distance regime. In Proceedings of the 54rd ACM Symposium on Theory of Computing, STOC '22. ACM, 2022.
Karl Bringmann and Marvin Künnemann. Quadratic conditional lower bounds for string problems and dynamic time warping. In Proceedings of the 56th IEEE Annual Symposium on Foundations of Computer Science, FOCS '15, pages 79-97. IEEE Computer Society, 2015.
Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucký, and Michael Saks. Approximating edit distance within constant factor in truly sub-quadratic time. J. ACM, 67(6), 2020.
Diptarka Chakraborty, Elazar Goldenberg, and Michal Koucký. Streaming algorithms for embedding and computing edit distance in the low distance regime. In Proceedings of the 48th ACM Symposium on Theory of Computing, STOC '16, pages 712-725. ACM, 2016.
Elazar Goldenberg, Tomasz Kociumaka, Robert Krauthgamer, and Barna Saha. Gap edit distance via non-adaptive queries: Simple and optimal, 2021. URL: http://arxiv.org/abs/2111.12706.
http://arxiv.org/abs/2111.12706
Elazar Goldenberg, Robert Krauthgamer, and Barna Saha. Sublinear algorithms for gap edit distance. In Proceedings of the 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS '20, pages 1101-1120. IEEE Computer Society, 2019.
Elazar Goldenberg, Aviad Rubinstein, and Barna Saha. Does preprocessing help in fast sequence comparisons? In Proceedings of the 52nd ACM Symposium on Theory of Computing, STOC '20, pages 657-670. ACM, 2020.
Bernhard Haeupler. Optimal document exchange and new codes for insertions and deletions. In Proceedings of the 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS '19, pages 334-347. IEEE Computer Society, 2019.
Hossein Jowhari. Efficient communication protocols for deciding edit distance. In Proceedings of the 20th Annual European Conference on Algorithms, ESA '12, pages 648-658. Springer, 2012.
Tomasz Kociumaka and Barna Saha. Sublinear-time algorithms for computing & embedding gap edit distance. In Proceedings of the 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS '20, pages 1168-1179. IEEE Computer Society, 2020.
Michal Koucký and Michael E. Saks. Constant factor approximations to edit distance on far input pairs in nearly linear time. In Proceedings of the 52nd ACM Symposium on Theory of Computing, STOC '20, pages 699-712. ACM, 2020.
Rafail Ostrovsky and Yuval Rabani. Low distortion embeddings for edit distance. J. ACM, 54(5):23, 2007.
Sebastian Wandelt, Dong Deng, Stefan Gerdjikov, Shashwat Mishra, Petar Mitankin, Manish Patil, Enrico Siragusa, Alexander Tiskin, Wei Wang, Jiaying Wang, and Ulf Leser. State-of-the-art in string similarity search and join. SIGMOD Rec., 43(1):64-76, 2014.
Karl Bringmann, Alejandro Cassis, Nick Fischer, and Vasileios Nakos
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode