Edit Distance of Finite State Transducers

Aiswarya, C.; Manuel, Amaldev; Sunny, Saina

doi:10.4230/LIPIcs.ICALP.2024.125

Abstract

We lift metrics over words to metrics over word-to-word transductions by defining the distance between two transductions as the supremum of the distances between their respective outputs over all inputs. This allows us to compare transducers beyond equivalence. Two transducers are close (resp. k-close) with respect to a metric if their distance is finite (resp. at most k). Over integer-valued metrics, computing the distance between transducers is equivalent to deciding the closeness and k-closeness problems. For common integer-valued edit distances, such as the Hamming, transposition, conjugacy, and Levenshtein family of distances, we show that the closeness and k-closeness problems are decidable for functional transducers. Hence, the distance with respect to these metrics is also computable. Finally, we relate the notion of distance between functions to the notions of the diameter of a relation and the index of one relation in another. We show that computing the edit distance between functional transducers is equivalent to computing the diameter of a rational relation, and that both are specific instances of the index problem for rational relations.
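To make the lifted distance concrete, here is a minimal Python sketch. It is not the paper's decision procedure. It assumes the two transductions are total functions given as ordinary Python callables (the names `identity`, `drop_first`, and `double` are hypothetical toy examples), and it enumerates inputs only up to a length bound, so the value it returns is merely a lower bound on the supremum.

```python
from itertools import product


def levenshtein(u: str, v: str) -> int:
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(v) + 1))
    for i, a in enumerate(u, start=1):
        curr = [i]
        for j, b in enumerate(v, start=1):
            curr.append(min(prev[j] + 1,              # delete a
                            curr[j - 1] + 1,          # insert b
                            prev[j - 1] + (a != b)))  # substitute or match
        prev = curr
    return prev[-1]


def distance_lower_bound(f, g, alphabet, max_len, metric=levenshtein):
    """Maximum of metric(f(w), g(w)) over all inputs w with |w| <= max_len.

    This only bounds the supremum from below: inputs longer than max_len
    are never examined.
    """
    best = 0
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            best = max(best, metric(f(w), g(w)))
    return best


# Hypothetical toy transductions, assumed total on all words.
def identity(w: str) -> str:
    return w


def drop_first(w: str) -> str:
    return w[1:]


def double(w: str) -> str:
    return w + w


if __name__ == "__main__":
    for k in range(1, 5):
        print(k,
              distance_lower_bound(identity, drop_first, "ab", k),
              distance_lower_bound(identity, double, "ab", k))
```

On these toy functions, the bound for `identity` versus `drop_first` stays at 1 as the length bound grows, which is consistent with the two being 1-close. The bound for `identity` versus `double` grows with the length bound, which suggests that pair is not close. Bounded enumeration cannot prove either claim on its own; establishing closeness for all inputs is what the decidability results in the abstract address.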

