When Is the Normalized Edit Distance over Non-Uniform Weights a Metric?

Authors: Dana Fisman, Ilay Tzarfati



Author Details

Dana Fisman
  • Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel
Ilay Tzarfati
  • Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel

Acknowledgements

We would like to thank Oded Margalit, Elina Sudit and Sandra Zilles for comments on an earlier draft of this paper.

Cite As

Dana Fisman and Ilay Tzarfati. When Is the Normalized Edit Distance over Non-Uniform Weights a Metric? In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 14:1-14:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.CPM.2024.14

Abstract

The well-known Normalized Edit Distance (ned) [Marzal and Vidal 1993] is known to violate the triangle inequality on contrived weight functions, although in practice it often exhibits triangular behavior. Let d be a weight function on basic edit operations, and let ned_{d} be the resulting normalized edit distance. The question of which criteria d should satisfy for ned_{d} to be a metric is long-standing. It was recently shown that when d is the uniform weight function (all operations cost 1, except for the no-op, which costs 0), ned_{d} is a metric. The question regarding non-uniform weights remained open. In this paper we answer this question by providing a necessary and sufficient condition on d under which ned_{d} is a metric.
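To make the object of study concrete, the following is a minimal Python sketch of the normalized edit distance as it is commonly stated after Marzal and Vidal: the minimum, over all edit paths from x to y, of the path's total weight divided by its length, where no-ops count toward the length. The function names `ned` and `uniform_cost`, the use of `None` for the empty symbol, and the value 0 for two empty strings are assumptions made for this illustration, not notation taken from the paper. The cubic-time dynamic program is a straightforward sketch, not an optimized algorithm.

```python
from math import inf


def uniform_cost(a, b):
    """Uniform weights: a no-op (a == b) costs 0, any other operation costs 1.

    None stands for the empty symbol, so (a, None) is a deletion
    and (None, b) is an insertion.
    """
    return 0.0 if a == b else 1.0


def ned(x, y, cost=uniform_cost):
    """Minimum over edit paths from x to y of (path weight / path length)."""
    n, m = len(x), len(y)
    if n == 0 and m == 0:
        return 0.0  # assumed convention for two empty strings

    # prev[i][j]: minimal weight of a path with exactly k-1 operations
    # that turns x[:i] into y[:j] (inf if no such path exists).
    prev = [[inf] * (m + 1) for _ in range(n + 1)]
    prev[0][0] = 0.0
    best = inf

    # Every operation consumes at least one symbol, so no path is longer than n + m.
    for k in range(1, n + m + 1):
        cur = [[inf] * (m + 1) for _ in range(n + 1)]
        for i in range(n + 1):
            for j in range(m + 1):
                c = inf
                if i > 0:  # delete x[i-1]
                    c = min(c, prev[i - 1][j] + cost(x[i - 1], None))
                if j > 0:  # insert y[j-1]
                    c = min(c, prev[i][j - 1] + cost(None, y[j - 1]))
                if i > 0 and j > 0:  # substitute, or no-op when symbols match
                    c = min(c, prev[i - 1][j - 1] + cost(x[i - 1], y[j - 1]))
                cur[i][j] = c
        if cur[n][m] < inf:
            best = min(best, cur[n][m] / k)
        prev = cur
    return best


print(ned("ab", "a"))  # no-op on 'a', then delete 'b': weight 1 over length 2
```

Any callable taking two symbols (or `None`) can be passed as `cost`, which allows experimenting with non-uniform weight functions and checking the triangle inequality on specific triples of strings.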

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
  • Theory of computation → Formal languages and automata theory
Keywords
  • Normalized Edit Distance
  • Non-uniform Weights
  • Triangle Inequality
  • Metric

References

  1. C. Baier and J-P. Katoen. Principles of Model Checking. MIT Press, 2008.
  2. Edmund M. Clarke, Orna Grumberg, Daniel Kroening, Doron A. Peled, and Helmut Veith. Model checking, 2nd Edition. MIT Press, 2018. URL: https://mitpress.mit.edu/books/model-checking-second-edition.
  3. Edmund M. Clarke, Thomas A. Henzinger, Helmut Veith, and Roderick Bloem, editors. Handbook of Model Checking. Springer, 2018. URL: https://doi.org/10.1007/978-3-319-10575-8.
  4. Loris D'Antoni and Margus Veanes. The power of symbolic automata and transducers. In Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part I, pages 47-67, 2017.
  5. Colin de la Higuera and Luisa Micó. A contextual normalised edit distance. In Proceedings of the 24th International Conference on Data Engineering Workshops, ICDE 2008, April 7-12, 2008, Cancún, Mexico, pages 354-361. IEEE Computer Society, 2008.
  6. Emmanuel Filiot, Nicolas Mazzocchi, Jean-François Raskin, Sriram Sankaranarayanan, and Ashutosh Trivedi. Weighted transducers for robustness verification. In 31st International Conference on Concurrency Theory, CONCUR 2020, September 1-4, 2020, Vienna, Austria (Virtual Conference), pages 17:1-17:21, 2020.
  7. Dana Fisman, Joshua Grogin, Oded Margalit, and Gera Weiss. The normalized edit distance with uniform operation costs is a metric. In Hideo Bannai and Jan Holub, editors, 33rd Annual Symposium on Combinatorial Pattern Matching, CPM 2022, June 27-29, 2022, Prague, Czech Republic, volume 223 of LIPIcs, pages 17:1-17:17, 2022.
  8. Dana Fisman, Joshua Grogin, and Gera Weiss. A normalized edit distance on infinite words. In 31st EACSL Annual Conference on Computer Science Logic, CSL 2023, February 13-16, 2023, Warsaw, Poland, pages 20:1-20:20, 2023.
  9. R. W. Hamming. Error detecting and error correcting codes. The Bell System Technical Journal, 29(2):147-160, April 1950. URL: https://doi.org/10.1002/j.1538-7305.1950.tb00463.x.
  10. Karen Kukich. Techniques for automatically correcting words in text. ACM Comput. Surv., 24(4):377-439, December 1992. URL: https://doi.org/10.1145/146370.146380.
  11. Vladimir Iosifovich Levenshtein. Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady, 10(8):707-710, February 1966. Doklady Akademii Nauk SSSR, V163 No4 845-848 1965.
  12. Yujian Li and Bi Liu. A normalized Levenshtein distance metric. IEEE Trans. Pattern Anal. Mach. Intell., 29(6):1091-1095, 2007.
  13. Andrés Marzal and Enrique Vidal. Computation of normalized edit distance and applications. IEEE Trans. Pattern Anal. Mach. Intell., 15(9):926-932, 1993.
  14. Gonzalo Navarro. A guided tour to approximate string matching. ACM Comput. Surv., 33(1):31-88, March 2001. URL: https://doi.org/10.1145/375360.375365.
  15. J. R. Büchi. On a decision method in restricted second order arithmetic. In Int. Congress on Logic, Method, and Philosophy of Science, pages 1-12. Stanford University Press, 1962.
  16. Sandra Zilles. A distance on ℕ. Private communication, 2023.
  17. David Sankoff and Joseph B. Kruskal. Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley, 1983.
  18. Robert A. Wagner and Michael J. Fischer. The string-to-string correction problem. J. ACM, 21(1):168-173, January 1974. URL: https://doi.org/10.1145/321796.321811.
  19. Achim Weigel and Frank Fein. Normalizing the weighted edit distance. In 12th IAPR International Conference on Pattern Recognition, Conference B: Pattern Recognition and Neural Networks, ICPR 1994, Jerusalem, Israel, 9-13 October, 1994, Volume 2, pages 399-402, 1994.