Many Flavors of Edit Distance

Authors: Sudatta Bhattacharya, Sanjana Dey, Elazar Goldenberg, Michal Koucký




File

LIPIcs.FSTTCS.2024.11.pdf
  • Filesize: 0.77 MB
  • 16 pages

Author Details

Sudatta Bhattacharya
  • Charles University, Prague, Czech Republic
Sanjana Dey
  • National University of Singapore, Singapore
Elazar Goldenberg
  • The Academic College of Tel-Aviv-Yaffo, Israel
Michal Koucký
  • Charles University, Prague, Czech Republic

Acknowledgements

The project started at the EPAC Workshop: Algorithms and Complexity, partially supported by the Czech Science Foundation project no. 19-27871X.

Cite As

Sudatta Bhattacharya, Sanjana Dey, Elazar Goldenberg, and Michal Koucký. Many Flavors of Edit Distance. In 44th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 323, pp. 11:1-11:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024) https://doi.org/10.4230/LIPIcs.FSTTCS.2024.11

Abstract

Several measures exist for string similarity, including notable ones like the edit distance and the indel distance. The former measures the count of insertions, deletions, and substitutions required to transform one string into another, while the latter specifically quantifies the number of insertions and deletions. Many algorithmic solutions explicitly address one of these measures, and frequently techniques applicable to one can also be adapted to work with the other. In this paper, we investigate whether there exists a standardized approach for applying results from one setting to another. Specifically, we demonstrate the capability to reduce questions regarding string similarity over arbitrary alphabets to equivalent questions over a binary alphabet. Furthermore, we illustrate how to transform questions concerning indel distance into equivalent questions based on edit distance. This complements an earlier result of Tiskin (2007) which addresses the inverse direction.
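
As an illustration of the two measures defined in the abstract, the following is a minimal sketch of the textbook quadratic-time dynamic programs for edit distance and indel distance. It is not the construction from the paper; the function names `edit_distance` and `indel_distance` are chosen here for illustration only.

```python
def edit_distance(x: str, y: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning x into y."""
    # prev[j] holds the distance between the current prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]  # turning x[:i] into the empty string takes i deletions
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cx
                curr[j - 1] + 1,           # insert cy
                prev[j - 1] + (cx != cy),  # match (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


def indel_distance(x: str, y: str) -> int:
    """Minimum number of insertions and deletions turning x into y (no substitutions)."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                curr.append(prev[j - 1])                   # match, no cost
            else:
                curr.append(min(prev[j], curr[j - 1]) + 1)  # delete cx or insert cy
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    for a, b in [("kitten", "sitting"), ("ab", "ba"), ("a", "b")]:
        print(a, b, edit_distance(a, b), indel_distance(a, b))
```

Since a substitution can be simulated by one deletion followed by one insertion, the two values always satisfy edit distance ≤ indel distance ≤ 2 · edit distance. The indel distance also equals |x| + |y| − 2 · LCS(x, y), which is why LCS appears among the keywords.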

Subject Classification

ACM Subject Classification
  • Theory of computation → Random projections and metric embeddings
Keywords
  • Edit distance
  • Indel distance
  • Embedding
  • LCS
  • Alphabet Reduction


References

  1. Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, pages 59-78, 2015. URL: https://doi.org/10.1109/FOCS.2015.14.
  2. Amir Abboud and Karl Bringmann. Tighter connections between formula-sat and shaving logs. In Ioannis Chatzigiannakis, Christos Kaklamanis, Dániel Marx, and Donald Sannella, editors, 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, July 9-13, 2018, Prague, Czech Republic, volume 107 of LIPIcs, pages 8:1-8:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018. URL: https://doi.org/10.4230/LIPICS.ICALP.2018.8.
  3. Alexandr Andoni and Negev Shekel Nosatzki. Edit distance in near-linear time: it’s a constant factor. In Sandy Irani, editor, 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, Durham, NC, USA, November 16-19, 2020, pages 990-1001. IEEE, 2020. URL: https://doi.org/10.1109/FOCS46700.2020.00096.
  4. Alexandr Andoni and Krzysztof Onak. Approximating edit distance in near-linear time. In Proceedings of the Forty-first Annual ACM Symposium on Theory of Computing, STOC '09, pages 199-204, New York, NY, USA, 2009. ACM. URL: https://doi.org/10.1145/1536414.1536444.
  5. Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC '15, pages 51-58, New York, NY, USA, 2015. ACM. URL: https://doi.org/10.1145/2746539.2746612.
  6. Ziv Bar-Yossef, TS Jayram, Robert Krauthgamer, and Ravi Kumar. Approximating edit distance efficiently. In 45th Annual IEEE Symposium on Foundations of Computer Science, pages 550-559. IEEE, 2004.
  7. Tugkan Batu, Funda Ergün, Joe Kilian, Avner Magen, Sofya Raskhodnikova, Ronitt Rubinfeld, and Rahul Sami. A sublinear algorithm for weakly approximating edit distance. In Proceedings of the Thirty-fifth Annual ACM Symposium on Theory of Computing, STOC '03, pages 316-324, New York, NY, USA, 2003. ACM. URL: https://doi.org/10.1145/780542.780590.
  8. Tuğkan Batu, Funda Ergun, and Cenk Sahinalp. Oblivious string embeddings and edit distance approximations. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithm, SODA '06, pages 792-801, Philadelphia, PA, USA, 2006. Society for Industrial and Applied Mathematics.
  9. Mahdi Boroujeni, Soheil Ehsani, Mohammad Ghodsi, MohammadTaghi HajiAghayi, and Saeed Seddighin. Approximating edit distance in truly subquadratic time: Quantum and mapreduce. Journal of the ACM (JACM), 68(3):1-41, 2021. URL: https://doi.org/10.1145/3456807.
  10. Joshua Brakensiek and Aviad Rubinstein. Constant-factor approximation of near-linear edit distance in near-linear time. In Konstantin Makarychev, Yury Makarychev, Madhur Tulsiani, Gautam Kamath, and Julia Chuzhoy, editors, Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, pages 685-698. ACM, 2020. URL: https://doi.org/10.1145/3357713.3384282.
  11. Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucký, and Michael Saks. Approximating edit distance within constant factor in truly sub-quadratic time. Journal of the ACM (JACM), 67(6):1-22, 2020. URL: https://doi.org/10.1145/3422823.
  12. Diptarka Chakraborty, Elazar Goldenberg, and Michal Koucký. Streaming algorithms for embedding and computing edit distance in the low distance regime. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016, pages 712-725, 2016. URL: https://doi.org/10.1145/2897518.2897577.
  13. E. N. Gilbert. A comparison of signalling alphabets. The Bell System Technical Journal, 31(3):504-522, 1952. URL: https://doi.org/10.1002/j.1538-7305.1952.tb01393.x.
  14. Elazar Goldenberg, Aviad Rubinstein, and Barna Saha. Does preprocessing help in fast sequence comparisons? In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pages 657-670, 2020. URL: https://doi.org/10.1145/3357713.3384300.
  15. Szymon Grabowski. New tabulation and sparse dynamic programming based techniques for sequence similarity problems. Discrete Applied Mathematics, 212:96-103, 2016. URL: https://doi.org/10.1016/J.DAM.2015.10.040.
  16. Michal Koucký and Michael E. Saks. Constant factor approximations to edit distance on far input pairs in nearly linear time. In Konstantin Makarychev, Yury Makarychev, Madhur Tulsiani, Gautam Kamath, and Julia Chuzhoy, editors, Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, pages 699-712. ACM, 2020. URL: https://doi.org/10.1145/3357713.3384307.
  17. Gad M. Landau, Eugene W. Myers, and Jeanette P. Schmidt. Incremental string comparison. SIAM J. Comput., 27(2):557-582, April 1998. URL: https://doi.org/10.1137/S0097539794264810.
  18. Vladimir I Levenshtein et al. Binary codes capable of correcting deletions, insertions, and reversals. In Soviet physics doklady, volume 10(8), pages 707-710. Soviet Union, 1966.
  19. William J Masek and Michael S Paterson. A faster algorithm computing string edit distances. Journal of Computer and System sciences, 20(1):18-31, 1980. URL: https://doi.org/10.1016/0022-0000(80)90002-1.
  20. Rafail Ostrovsky and Yuval Rabani. Low distortion embeddings for edit distance. J. ACM, 54(5):23, 2007. URL: https://doi.org/10.1145/1284320.1284322.
  21. Alexander Tiskin. Semi-local string comparison: Algorithmic techniques and applications. Mathematics in Computer Science, 1:571-603, 2008. URL: https://doi.org/10.1007/S11786-007-0033-3.
  22. Rom Rubenovich Varshamov. Estimate of the number of signals in error correcting codes. Doklady Akad. Nauk SSSR, 117:739-741, 1957.
  23. Robert A Wagner and Michael J Fischer. The string-to-string correction problem. Journal of the ACM (JACM), 21(1):168-173, 1974. URL: https://doi.org/10.1145/321796.321811.