Faster Algorithms for Bounded Tree Edit Distance

Authors Shyan Akmal , Ce Jin



PDF
Thumbnail PDF

File

LIPIcs.ICALP.2021.12.pdf
  • Filesize: 0.79 MB
  • 15 pages

Document Identifiers

Author Details

Shyan Akmal
  • MIT, EECS and CSAIL, Cambridge, MA, USA
Ce Jin
  • MIT, EECS and CSAIL, Cambridge, MA, USA

Acknowledgements

We thank Virginia Vassilevska Williams for several helpful discussions.

Cite AsGet BibTex

Shyan Akmal and Ce Jin. Faster Algorithms for Bounded Tree Edit Distance. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 12:1-12:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ICALP.2021.12

Abstract

Tree edit distance is a well-studied measure of dissimilarity between rooted trees with node labels. It can be computed in O(n³) time [Demaine, Mozes, Rossman, and Weimann, ICALP 2007], and fine-grained hardness results suggest that the weighted version of this problem cannot be solved in truly subcubic time unless the APSP conjecture is false [Bringmann, Gawrychowski, Mozes, and Weimann, SODA 2018]. We consider the unweighted version of tree edit distance, where every insertion, deletion, or relabeling operation has unit cost. Given a parameter k as an upper bound on the distance, the previous fastest algorithm for this problem runs in O(nk³) time [Touzet, CPM 2005], which improves upon the cubic-time algorithm for k≪ n^{2/3}. In this paper, we give a faster algorithm taking O(nk² log n) time, improving both of the previous results for almost the full range of log n ≪ k≪ n/√{log n}.

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • tree edit distance
  • edit distance
  • dynamic programming

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In Proceedings of the 56th IEEE Symposium on Foundations of Computer Science (FOCS), pages 59-78, 2015. URL: https://doi.org/10.1109/FOCS.2015.14.
  2. Tatsuya Akutsu, Daiji Fukagawa, and Atsuhiro Takasu. Approximating tree edit distance through string edit distance. Algorithmica, 57(2):325-348, 2010. URL: https://doi.org/10.1007/s00453-008-9213-z.
  3. Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Polylogarithmic approximation for edit distance and the asymmetric query complexity. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 377-386, 2010. URL: https://doi.org/10.1109/FOCS.2010.43.
  4. Alexandr Andoni and Negev Shekel Nosatzki. Edit distance in near-linear time: it’s a constant factor. In Proceedings of the 61st IEEE Annual Symposium on Foundations of Computer Science (FOCS), 2020. URL: http://arxiv.org/abs/2005.07678.
  5. Alexandr Andoni and Krzysztof Onak. Approximating edit distance in near-linear time. SIAM J. Comput., 41(6):1635-1648, 2012. URL: https://doi.org/10.1137/090767182.
  6. Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). SIAM J. Comput., 47(3):1087-1097, 2018. URL: https://doi.org/10.1137/15M1053128.
  7. John Bellando and Ravi Kothari. Region-based modeling and tree edit distance as a basis for gesture recognition. In Proceedings of the 10th International Conference on Image Analysis and Processing (ICIAP), pages 698-703. IEEE Computer Society, 1999. URL: https://doi.org/10.1109/ICIAP.1999.797676.
  8. Philip Bille. A survey on tree edit distance and related problems. Theor. Comput. Sci., 337(1-3):217-239, 2005. URL: https://doi.org/10.1016/j.tcs.2004.12.030.
  9. Philip Bille and Martin Farach-Colton. Fast and compact regular expression matching. Theor. Comput. Sci., 409(3):486-496, 2008. URL: https://doi.org/10.1016/j.tcs.2008.08.042.
  10. Mahdi Boroujeni, Soheil Ehsani, Mohammad Ghodsi, Mohammad Taghi Hajiaghayi, and Saeed Seddighin. Approximating edit distance in truly subquadratic time: Quantum and MapReduce. In Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1170-1189, 2018. URL: https://doi.org/10.1137/1.9781611975031.76.
  11. Mahdi Boroujeni, Mohammad Ghodsi, MohammadTaghi Hajiaghayi, and Saeed Seddighin. 1+ε approximation of tree edit distance in quadratic time. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 709-720, 2019. URL: https://doi.org/10.1145/3313276.3316388.
  12. Joshua Brakensiek and Aviad Rubinstein. Constant-factor approximation of near-linear edit distance in near-linear time. In Proccedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 685-698, 2020. URL: https://doi.org/10.1145/3357713.3384282.
  13. Karl Bringmann, Paweł Gawrychowski, Shay Mozes, and Oren Weimann. Tree edit distance cannot be computed in strongly subcubic time (unless APSP can). ACM Trans. Algorithms, 16(4):48:1-48:22, 2020. URL: https://doi.org/10.1145/3381878.
  14. Peter Buneman, Martin Grohe, and Christoph Koch. Path queries on compressed XML. In Proceedings of the 29th International Conference on Very Large Data Bases (VLDB), pages 141-152, 2003. URL: https://doi.org/10.1016/B978-012722442-8/50021-5.
  15. Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucký, and Michael E. Saks. Approximating edit distance within constant factor in truly sub-quadratic time. In Proceedings of the 59th IEEE Annual Symposium on Foundations of Computer Science (FOCS), pages 979-990. IEEE Computer Society, 2018. URL: https://doi.org/10.1109/FOCS.2018.00096.
  16. Sudarshan S. Chawathe. Comparing hierarchical data in external memory. In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB), pages 90-101, 1999. URL: http://www.vldb.org/conf/1999/P8.pdf.
  17. Erik D. Demaine, Shay Mozes, Benjamin Rossman, and Oren Weimann. An optimal decomposition algorithm for tree edit distance. ACM Trans. Algorithms, 6(1):2:1-2:19, 2009. URL: https://doi.org/10.1145/1644015.1644017.
  18. Bartłomiej Dudek and Paweł Gawrychowski. Edit distance between unrooted trees in cubic time. In Proceedings of the 45th International Colloquium on Automata, Languages, and Programming (ICALP), pages 45:1-45:14, 2018. URL: https://doi.org/10.4230/LIPIcs.ICALP.2018.45.
  19. Serge Dulucq and Hélène Touzet. Analysis of tree edit distance algorithms. In Proceedings of the 14th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 2676 of Lecture Notes in Computer Science, pages 83-95. Springer, 2003. URL: https://doi.org/10.1007/3-540-44888-8_7.
  20. Serge Dulucq and Hélène Touzet. Decomposition algorithms for the tree edit distance problem. J. Discrete Algorithms, 3(2-4):448-471, 2005. URL: https://doi.org/10.1016/j.jda.2004.08.018.
  21. Paolo Ferragina, Fabrizio Luccio, Giovanni Manzini, and S. Muthukrishnan. Compressing and indexing labeled trees, with applications. J. ACM, 57(1):4:1-4:33, 2009. URL: https://doi.org/10.1145/1613676.1613680.
  22. Dan Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997. URL: https://doi.org/10.1017/CBO9780511574931.
  23. Matthias Höchsmann, Thomas Töller, Robert Giegerich, and Stefan Kurtz. Local similarity in RNA secondary structures. In Proceedings of 2nd IEEE Computer Society Bioinformatics Conference, CSB, pages 159-168. IEEE Computer Society, 2003. URL: https://doi.org/10.1109/CSB.2003.1227315.
  24. Philip N. Klein. Computing the edit-distance between unrooted ordered trees. In Proceedings of the 6th Annual European Symposium on Algorithms (ESA), volume 1461 of Lecture Notes in Computer Science, pages 91-102. Springer, 1998. URL: https://doi.org/10.1007/3-540-68530-8_8.
  25. Philip N. Klein, Thomas B. Sebastian, and Benjamin B. Kimia. Shape matching using edit-distance: an implementation. In Proceedings of the 12th Annual Symposium on Discrete Algorithms (SODA), pages 781-790, 2001. URL: http://dl.acm.org/citation.cfm?id=365411.365779.
  26. Philip N. Klein, Srikanta Tirthapura, Daniel Sharvit, and Benjamin B. Kimia. A tree-edit-distance algorithm for comparing simple, closed shapes. In David B. Shmoys, editor, Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 696-704, 2000. URL: http://dl.acm.org/citation.cfm?id=338219.338628.
  27. Michal Koucký and Michael E. Saks. Constant factor approximations to edit distance on far input pairs in nearly linear time. In Proccedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 699-712. ACM, 2020. URL: https://doi.org/10.1145/3357713.3384307.
  28. Gad M. Landau and Uzi Vishkin. Fast string matching with k differences. J. Comput. Syst. Sci., 37(1):63-78, 1988. URL: https://doi.org/10.1016/0022-0000(88)90045-1.
  29. William J. Masek and Mike Paterson. A faster algorithm computing string edit distances. J. Comput. Syst. Sci., 20(1):18-31, 1980. URL: https://doi.org/10.1016/0022-0000(80)90002-1.
  30. Eugene W. Myers. An O(ND) difference algorithm and its variations. Algorithmica, 1(2):251-266, 1986. URL: https://doi.org/10.1007/BF01840446.
  31. Thomas B. Sebastian, Philip N. Klein, and Benjamin B. Kimia. Recognition of shapes by editing their shock graphs. IEEE Trans. Pattern Anal. Mach. Intell., 26(5):550-571, 2004. URL: https://doi.org/10.1109/TPAMI.2004.1273924.
  32. Bruce A. Shapiro and Kaizhong Zhang. Comparing multiple RNA secondary structures using tree comparisons. Bioinformatics, 6(4):309-318, October 1990. URL: https://doi.org/10.1093/bioinformatics/6.4.309.
  33. Dennis E. Shasha and Kaizhong Zhang. Fast algorithms for the unit cost editing distance between trees. J. Algorithms, 11(4):581-621, 1990. URL: https://doi.org/10.1016/0196-6774(90)90011-3.
  34. Kuo-Chung Tai. The tree-to-tree correction problem. J. ACM, 26(3):422-433, 1979. URL: https://doi.org/10.1145/322139.322143.
  35. Hélène Touzet. A linear tree edit distance algorithm for similar ordered trees. In Proceedings of the 16th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 3537 of Lecture Notes in Computer Science, pages 334-345. Springer, 2005. URL: https://doi.org/10.1007/11496656_29.
  36. Hélène Touzet. Comparing similar ordered trees in linear-time. J. Discrete Algorithms, 5(4):696-705, 2007. URL: https://doi.org/10.1016/j.jda.2006.07.002.
  37. Esko Ukkonen. Algorithms for approximate string matching. Inf. Control., 64(1-3):100-118, 1985. URL: https://doi.org/10.1016/S0019-9958(85)80046-2.
  38. Robert A. Wagner and Michael J. Fischer. The string-to-string correction problem. J. ACM, 21(1):168-173, 1974. URL: https://doi.org/10.1145/321796.321811.
  39. Michael S. Waterman. Introduction to computational biology: maps, sequences and genomes. CRC Press, 1995. Google Scholar
  40. Kaizhong Zhang and Dennis E. Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM J. Comput., 18(6):1245-1262, 1989. URL: https://doi.org/10.1137/0218082.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail