Improved Approximation Algorithms for Dyck Edit Distance and RNA Folding

Authors Debarati Das, Tomasz Kociumaka , Barna Saha



PDF
Thumbnail PDF

File

LIPIcs.ICALP.2022.49.pdf
  • Filesize: 1.01 MB
  • 20 pages

Document Identifiers

Author Details

Debarati Das
  • Pennsylvania State University, University Park, PA, USA
Tomasz Kociumaka
  • Max Planck Institute for Informatics, Saarbrücken, Germany
Barna Saha
  • University of California, San Diego, CA, USA

Cite AsGet BibTex

Debarati Das, Tomasz Kociumaka, and Barna Saha. Improved Approximation Algorithms for Dyck Edit Distance and RNA Folding. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 229, pp. 49:1-49:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.ICALP.2022.49

Abstract

The Dyck language, which consists of well-balanced sequences of parentheses, is one of the most fundamental context-free languages. The Dyck edit distance quantifies the number of edits (character insertions, deletions, and substitutions) required to make a given length-n parenthesis sequence well-balanced. RNA Folding involves a similar problem, where a closing parenthesis can match an opening parenthesis of the same type irrespective of their ordering. For example, in RNA Folding, both () and )( are valid matches, whereas the Dyck language only allows () as a match. Both of these problems have been studied extensively in the literature. Using fast matrix multiplication, it is possible to compute their exact solutions in time O(n^2.687) (Chi, Duan, Xie, Zhang, STOC'22), and a (1+ε)-multiplicative approximation is known with a running time of Ω(n^2.372). The impracticality of fast matrix multiplication often makes combinatorial algorithms much more desirable. Unfortunately, it is known that the problems of (exactly) computing the Dyck edit distance and the folding distance are at least as hard as Boolean matrix multiplication. Thereby, they are unlikely to admit truly subcubic-time combinatorial algorithms. In terms of fast approximation algorithms that are combinatorial in nature, the state of the art for Dyck edit distance is an O(log n)-factor approximation algorithm that runs in near-linear time (Saha, FOCS'14), whereas for RNA Folding only an ε n-additive approximation in Õ(n²/ε) time (Saha, FOCS'17) is known. In this paper, we make substantial improvements to the state of the art for Dyck edit distance (with any number of parenthesis types). We design a constant-factor approximation algorithm that runs in Õ(n^1.971) time (the first constant-factor approximation in subquadratic time). Moreover, we develop a (1+ε)-factor approximation algorithm running in Õ(n²/ε) time, which improves upon the earlier additive approximation. Finally, we design a (3+ε)-approximation that takes Õ(nd/ε) time, where d ≥ 1 is an upper bound on the sought distance. As for RNA folding, for any s ≥ 1, we design a factor-s approximation algorithm that runs in O(n+(n/s)³) time. To the best of our knowledge, this is the first nontrivial approximation algorithm for RNA Folding that can go below the n² barrier. All our algorithms are combinatorial in nature.

Subject Classification

ACM Subject Classification
  • Theory of computation → Approximation algorithms analysis
Keywords
  • Dyck Edit Distance
  • RNA Folding
  • String Algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. If the current clique algorithms are optimal, so is Valiant’s parser. SIAM Journal on Computing, 47(6):2527-2555, 2018. URL: https://doi.org/10.1137/16M1061771.
  2. Alfred V. Aho and Thomas G. Peterson. A minimum distance error-correcting parser for context-free languages. SIAM Journal on Computing, 1(4), 1972. URL: https://doi.org/10.1137/0201022.
  3. Noga Alon, Michael Krivelevich, Ilan Newman, and Mario Szegedy. Regular languages are testable with a constant number of queries. SIAM Journal on Computing, 30(6):1842-1862, 2001. URL: https://doi.org/10.1137/s0097539700366528.
  4. Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Polylogarithmic approximation for edit distance and the asymmetric query complexity. In FOCS 2010, pages 377-386. IEEE, 2010. URL: https://doi.org/10.1109/focs.2010.43.
  5. Alexandr Andoni and Negev Shekel Nosatzki. Edit distance in near-linear time: it’s a constant factor. In FOCS 2020, pages 990-1001. IEEE, 2020. URL: https://doi.org/10.1109/focs46700.2020.00096.
  6. Alexandr Andoni and Krzysztof Onak. Approximating edit distance in near-linear time. SIAM Journal on Computing, 41(6):1635-1648, 2012. URL: https://doi.org/10.1137/090767182.
  7. Arturs Backurs and Krzysztof Onak. Fast algorithms for parsing sequences of parentheses with few errors. In PODS 2016, pages 477-488. ACM, 2016. URL: https://doi.org/10.1145/2902251.2902304.
  8. Ziv Bar-Yossef, T. S. Jayram, Robert Krauthgamer, and Ravi Kumar. Approximating edit distance efficiently. In FOCS 2004, pages 550-559. IEEE, 2004. URL: https://doi.org/10.1109/FOCS.2004.14.
  9. Tugkan Batu, Funda Ergün, and Süleyman Cenk Sahinalp. Oblivious string embeddings and edit distance approximations. In SODA 2006, pages 792-801. ACM, 2006. URL: https://doi.org/10.1145/1109557.1109644.
  10. Djamal Belazzougui and Qin Zhang. Edit distance: Sketching, streaming, and document exchange. In FOCS 2016, pages 51-60. IEEE, 2016. URL: https://doi.org/10.1109/FOCS.2016.15.
  11. Mahdi Boroujeni, Soheil Ehsani, Mohammad Ghodsi, MohammadTaghi Hajiaghayi, and Saeed Seddighin. Approximating edit distance in truly subquadratic time: Quantum and mapreduce. Journal of the ACM, 68(3):19:1-19:41, 2021. URL: https://doi.org/10.1145/3456807.
  12. Mahdi Boroujeni, Masoud Seddighin, and Saeed Seddighin. Improved algorithms for edit distance and LCS: Beyond worst case. In SODA 2020, pages 1601-1620. SIAM, 2020. URL: https://doi.org/10.1137/1.9781611975994.99.
  13. Karl Bringmann, Fabrizio Grandoni, Barna Saha, and Virginia Vassilevska Williams. Truly subcubic algorithms for language edit distance and RNA folding via fast bounded-difference min-plus product. SIAM Journal on Computing, 48(2):481-512, 2019. URL: https://doi.org/10.1137/17M112720X.
  14. Amit Chakrabarti, Graham Cormode, Ranganath Kondapally, and Andrew McGregor. Information cost tradeoffs for augmented index and streaming language recognition. SIAM Journal on Computing, 42(1):61-83, 2013. URL: https://doi.org/10.1137/100816481.
  15. Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucký, and Michael E. Saks. Approximating edit distance within constant factor in truly sub-quadratic time. Journal of the ACM, 67(6):1-22, 2020. URL: https://doi.org/10.1145/3422823.
  16. Diptarka Chakraborty, Elazar Goldenberg, and Michal Koucký. Streaming algorithms for embedding and computing edit distance in the low distance regime. In STOC 2016, pages 712-725. ACM, 2016. URL: https://doi.org/10.1145/2897518.2897577.
  17. Shucheng Chi, Ran Duan, Tianle Xie, and Tianyi Zhang. Faster min-plus product for monotone instances. In STOC 2022. ACM, 2022. http://arxiv.org/abs/2204.04500, URL: https://doi.org/10.48550/arXiv.2204.04500.
  18. Debarati Das, Tomasz Kociumaka, and Barna Saha. Improved approximation algorithms for dyck edit distance and RNA folding, 2021. URL: http://arxiv.org/abs/2112.05866.
  19. Eldar Fischer, Frédéric Magniez, and Tatiana Starikovskaya. Improved bounds for testing Dyck languages. In SODA 2018, pages 1529-1544. SIAM, 2018. URL: https://doi.org/10.1137/1.9781611975031.100.
  20. Johannes Fischer and Volker Heun. Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM Journal on Computing, 40(2):465-492, 2011. URL: https://doi.org/10.1137/090779759.
  21. Dvir Fried, Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, Ely Porat, and Tatiana Starikovskaya. An improved algorithm for the k-Dyck edit distance problem. In SODA 2022. SIAM, 2022. Google Scholar
  22. Elazar Goldenberg, Aviad Rubinstein, and Barna Saha. Does preprocessing help in fast sequence comparisons? In STOC 2020, pages 657-670. ACM, 2020. URL: https://doi.org/10.1145/3357713.3384300.
  23. Michael A. Harrison. Introduction to Formal Language Theory. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1st edition, 1978. Google Scholar
  24. Tomasz Kociumaka. Efficient Data Structures for Internal Queries in Texts. PhD thesis, University of Warsaw, 2018. URL: https://depotuw.ceon.pl/handle/item/3614.
  25. Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Internal pattern matching queries in a text and applications. In SODA 2015, pages 532-551. SIAM, 2015. URL: https://doi.org/10.1137/1.9781611973730.36.
  26. Dexter C. Kozen. Automata and Computability. Springer New York, 1997. URL: https://doi.org/10.1007/978-1-4612-1844-9.
  27. Andreas Krebs, Nutan Limaye, and Srikanth Srinivasan. Streaming algorithms for recognizing nearly well-parenthesized expressions. In MFCS 2011, volume 6907 of LNCS, pages 412-423. Springer, 2011. URL: https://doi.org/10.1007/978-3-642-22993-0_38.
  28. Gad M. Landau, Eugene W. Myers, and Jeanette P. Schmidt. Incremental string comparison. SIAM Journal on Computing, 27(2):557-582, 1998. URL: https://doi.org/10.1137/S0097539794264810.
  29. Gad M. Landau and Uzi Vishkin. Fast string matching with k differences. Journal of Computer and System Sciences, 37(1):63-78, 1988. URL: https://doi.org/10.1016/0022-0000(88)90045-1.
  30. Frédéric Magniez, Claire Mathieu, and Ashwin Nayak. Recognizing well-parenthesized expressions in the streaming model. SIAM Journal on Computing, 43(6):1880-1905, 2014. URL: https://doi.org/10.1137/130926122.
  31. Ruth Nussinov and Ann B Jacobson. Fast algorithm for predicting the secondary structure of single-stranded RNA. Proceedings of the National Academy of Sciences, 77(11):6309-6313, 1980. URL: https://doi.org/10.1073/pnas.77.11.6309.
  32. Michal Parnas, Dana Ron, and Ronitt Rubinfeld. Testing membership in parenthesis languages. Random Structures & Algorithms, 22(1), January 2003. URL: https://doi.org/10.1002/rsa.10067.
  33. Aviad Rubinstein, Saeed Seddighin, Zhao Song, and Xiaorui Sun. Approximation algorithms for LCS and LIS with truly improved running times. In FOCS 2019, pages 1121-1145. IEEE, 2019. URL: https://doi.org/10.1109/focs.2019.00071.
  34. Barna Saha. The Dyck language edit distance problem in near-linear time. In FOCS 2014, pages 611-620. IEEE, 2014. URL: https://doi.org/10.1109/focs.2014.71.
  35. Barna Saha. Language edit distance and maximum likelihood parsing of stochastic grammars: Faster algorithms and connection to fundamental graph problems. In FOCS 2015, pages 118-135. IEEE, 2015. URL: https://doi.org/10.1109/focs.2015.17.
  36. Barna Saha. Fast & space-efficient approximations of language edit distance and RNA folding: An amnesic dynamic programming approach. In FOCS 2017, pages 295-306. IEEE, 2017. URL: https://doi.org/10.1109/FOCS.2017.35.
  37. Masoud Seddighin and Saeed Seddighin. 3+ε approximation of tree edit distance in truly subquadratic time. In ITCS 2022, volume 215, pages 115:1-115:22, 2022. URL: https://doi.org/10.4230/LIPIcs.ITCS.2022.115.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail