Many Flavors of Edit Distance

Bhattacharya, Sudatta; Dey, Sanjana; Goldenberg, Elazar; Koucký, Michal

doi:10.4230/LIPIcs.FSTTCS.2024.11

Abstract

Several measures exist for string similarity, including notable ones like the edit distance and the indel distance. The former measures the count of insertions, deletions, and substitutions required to transform one string into another, while the latter specifically quantifies the number of insertions and deletions. Many algorithmic solutions explicitly address one of these measures, and frequently techniques applicable to one can also be adapted to work with the other. In this paper, we investigate whether there exists a standardized approach for applying results from one setting to another. Specifically, we demonstrate the capability to reduce questions regarding string similarity over arbitrary alphabets to equivalent questions over a binary alphabet. Furthermore, we illustrate how to transform questions concerning indel distance into equivalent questions based on edit distance. This complements an earlier result of Tiskin (2007) which addresses the inverse direction.
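As a minimal illustration of the two measures defined in the abstract, the following sketch computes both with the textbook quadratic dynamic program. It is not taken from the paper; the function name `distance` and the flag `allow_substitution` are hypothetical names chosen for this example.

```python
def distance(x: str, y: str, allow_substitution: bool = True) -> int:
    """Textbook dynamic program over prefixes of x and y.

    allow_substitution=True  -> edit distance (insert, delete, substitute)
    allow_substitution=False -> indel distance (insert and delete only)
    Both the function name and the flag are illustrative assumptions.
    """
    # prev[j] = distance between the empty prefix of x and y[:j]
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]  # distance between x[:i] and the empty prefix of y
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                # Matching characters cost nothing in either measure.
                best = prev[j - 1]
            else:
                # Delete cx or insert cy.
                best = min(prev[j], curr[j - 1]) + 1
                if allow_substitution:
                    # Replace cx by cy (edit distance only).
                    best = min(best, prev[j - 1] + 1)
            curr.append(best)
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(distance("a", "b"))                            # edit distance
    print(distance("a", "b", allow_substitution=False))  # indel distance
```

For the strings "a" and "b" the edit distance is 1 (one substitution), while the indel distance is 2 (one deletion plus one insertion), which shows how the two measures can differ on the same input.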

Cite As

Sudatta Bhattacharya, Sanjana Dey, Elazar Goldenberg, and Michal Koucký. Many Flavors of Edit Distance. In 44th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 323, pp. 11:1-11:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024) https://doi.org/10.4230/LIPIcs.FSTTCS.2024.11

Author Details

Sudatta Bhattacharya

Charles University, Prague, Czech Republic

Sanjana Dey

National University of Singapore, Singapore

Elazar Goldenberg

The Academic College of Tel-Aviv-Yaffo, Israel

Michal Koucký

Charles University, Prague, Czech Republic

Funding

Bhattacharya, Sudatta: Partially supported by Czech Science Foundation projects no. 19-27871X and 24-10306S, and by project GAUK125424 of the Charles University Grant Agency.
Koucký, Michal: Partially supported by Czech Science Foundation projects no. 19-27871X and 24-10306S.

Acknowledgements

The project started at the EPAC Workshop: Algorithms and Complexity, partially supported by Czech Science Foundation project no. 19-27871X.

References

Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, pages 59-78, 2015. URL: https://doi.org/10.1109/FOCS.2015.14.
Amir Abboud and Karl Bringmann. Tighter connections between formula-sat and shaving logs. In Ioannis Chatzigiannakis, Christos Kaklamanis, Dániel Marx, and Donald Sannella, editors, 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, July 9-13, 2018, Prague, Czech Republic, volume 107 of LIPIcs, pages 8:1-8:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018. URL: https://doi.org/10.4230/LIPICS.ICALP.2018.8.
Alexandr Andoni and Negev Shekel Nosatzki. Edit distance in near-linear time: it’s a constant factor. In Sandy Irani, editor, 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, Durham, NC, USA, November 16-19, 2020, pages 990-1001. IEEE, 2020. URL: https://doi.org/10.1109/FOCS46700.2020.00096.
Alexandr Andoni and Krzysztof Onak. Approximating edit distance in near-linear time. In Proceedings of the Forty-first Annual ACM Symposium on Theory of Computing, STOC '09, pages 199-204, New York, NY, USA, 2009. ACM. URL: https://doi.org/10.1145/1536414.1536444.
Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC '15, pages 51-58, New York, NY, USA, 2015. ACM. URL: https://doi.org/10.1145/2746539.2746612.
Ziv Bar-Yossef, TS Jayram, Robert Krauthgamer, and Ravi Kumar. Approximating edit distance efficiently. In 45th Annual IEEE Symposium on Foundations of Computer Science, pages 550-559. IEEE, 2004.
Tugkan Batu, Funda Ergün, Joe Kilian, Avner Magen, Sofya Raskhodnikova, Ronitt Rubinfeld, and Rahul Sami. A sublinear algorithm for weakly approximating edit distance. In Proceedings of the Thirty-fifth Annual ACM Symposium on Theory of Computing, STOC '03, pages 316-324, New York, NY, USA, 2003. ACM. URL: https://doi.org/10.1145/780542.780590.
Tuğkan Batu, Funda Ergun, and Cenk Sahinalp. Oblivious string embeddings and edit distance approximations. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithm, SODA '06, pages 792-801, Philadelphia, PA, USA, 2006. Society for Industrial and Applied Mathematics.
Mahdi Boroujeni, Soheil Ehsani, Mohammad Ghodsi, MohammadTaghi HajiAghayi, and Saeed Seddighin. Approximating edit distance in truly subquadratic time: Quantum and mapreduce. Journal of the ACM (JACM), 68(3):1-41, 2021. URL: https://doi.org/10.1145/3456807.
Joshua Brakensiek and Aviad Rubinstein. Constant-factor approximation of near-linear edit distance in near-linear time. In Konstantin Makarychev, Yury Makarychev, Madhur Tulsiani, Gautam Kamath, and Julia Chuzhoy, editors, Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, pages 685-698. ACM, 2020. URL: https://doi.org/10.1145/3357713.3384282.
Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucký, and Michael Saks. Approximating edit distance within constant factor in truly sub-quadratic time. Journal of the ACM (JACM), 67(6):1-22, 2020. URL: https://doi.org/10.1145/3422823.
Diptarka Chakraborty, Elazar Goldenberg, and Michal Koucký. Streaming algorithms for embedding and computing edit distance in the low distance regime. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016, pages 712-725, 2016. URL: https://doi.org/10.1145/2897518.2897577.
E. N. Gilbert. A comparison of signalling alphabets. The Bell System Technical Journal, 31(3):504-522, 1952. URL: https://doi.org/10.1002/j.1538-7305.1952.tb01393.x.
Elazar Goldenberg, Aviad Rubinstein, and Barna Saha. Does preprocessing help in fast sequence comparisons? In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pages 657-670, 2020. URL: https://doi.org/10.1145/3357713.3384300.
Szymon Grabowski. New tabulation and sparse dynamic programming based techniques for sequence similarity problems. Discrete Applied Mathematics, 212:96-103, 2016. URL: https://doi.org/10.1016/J.DAM.2015.10.040.
Michal Koucký and Michael E. Saks. Constant factor approximations to edit distance on far input pairs in nearly linear time. In Konstantin Makarychev, Yury Makarychev, Madhur Tulsiani, Gautam Kamath, and Julia Chuzhoy, editors, Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, pages 699-712. ACM, 2020. URL: https://doi.org/10.1145/3357713.3384307.
Gad M. Landau, Eugene W. Myers, and Jeanette P. Schmidt. Incremental string comparison. SIAM J. Comput., 27(2):557-582, April 1998. URL: https://doi.org/10.1137/S0097539794264810.
Vladimir I Levenshtein et al. Binary codes capable of correcting deletions, insertions, and reversals. In Soviet physics doklady, volume 10(8), pages 707-710. Soviet Union, 1966.
William J Masek and Michael S Paterson. A faster algorithm computing string edit distances. Journal of Computer and System sciences, 20(1):18-31, 1980. URL: https://doi.org/10.1016/0022-0000(80)90002-1.
Rafail Ostrovsky and Yuval Rabani. Low distortion embeddings for edit distance. J. ACM, 54(5):23, 2007. URL: https://doi.org/10.1145/1284320.1284322.
Alexander Tiskin. Semi-local string comparison: Algorithmic techniques and applications. Mathematics in Computer Science, 1:571-603, 2008. URL: https://doi.org/10.1007/S11786-007-0033-3.
Rom Rubenovich Varshamov. Estimate of the number of signals in error correcting codes. Doklady Akad. Nauk SSSR, 117:739-741, 1957.
Robert A Wagner and Michael J Fischer. The string-to-string correction problem. Journal of the ACM (JACM), 21(1):168-173, 1974. URL: https://doi.org/10.1145/321796.321811.
