Das, Debarati ;
Kociumaka, Tomasz ;
Saha, Barna
Improved Approximation Algorithms for Dyck Edit Distance and RNA Folding
Abstract
The Dyck language, which consists of wellbalanced sequences of parentheses, is one of the most fundamental contextfree languages. The Dyck edit distance quantifies the number of edits (character insertions, deletions, and substitutions) required to make a given lengthn parenthesis sequence wellbalanced. RNA Folding involves a similar problem, where a closing parenthesis can match an opening parenthesis of the same type irrespective of their ordering. For example, in RNA Folding, both () and )( are valid matches, whereas the Dyck language only allows () as a match. Both of these problems have been studied extensively in the literature. Using fast matrix multiplication, it is possible to compute their exact solutions in time O(n^2.687) (Chi, Duan, Xie, Zhang, STOC'22), and a (1+ε)multiplicative approximation is known with a running time of Ω(n^2.372).
The impracticality of fast matrix multiplication often makes combinatorial algorithms much more desirable. Unfortunately, it is known that the problems of (exactly) computing the Dyck edit distance and the folding distance are at least as hard as Boolean matrix multiplication. Thereby, they are unlikely to admit truly subcubictime combinatorial algorithms. In terms of fast approximation algorithms that are combinatorial in nature, the state of the art for Dyck edit distance is an O(log n)factor approximation algorithm that runs in nearlinear time (Saha, FOCS'14), whereas for RNA Folding only an ε nadditive approximation in Õ(n²/ε) time (Saha, FOCS'17) is known.
In this paper, we make substantial improvements to the state of the art for Dyck edit distance (with any number of parenthesis types). We design a constantfactor approximation algorithm that runs in Õ(n^1.971) time (the first constantfactor approximation in subquadratic time). Moreover, we develop a (1+ε)factor approximation algorithm running in Õ(n²/ε) time, which improves upon the earlier additive approximation. Finally, we design a (3+ε)approximation that takes Õ(nd/ε) time, where d ≥ 1 is an upper bound on the sought distance.
As for RNA folding, for any s ≥ 1, we design a factors approximation algorithm that runs in O(n+(n/s)³) time. To the best of our knowledge, this is the first nontrivial approximation algorithm for RNA Folding that can go below the n² barrier. All our algorithms are combinatorial in nature.
BibTeX  Entry
@InProceedings{das_et_al:LIPIcs.ICALP.2022.49,
author = {Das, Debarati and Kociumaka, Tomasz and Saha, Barna},
title = {{Improved Approximation Algorithms for Dyck Edit Distance and RNA Folding}},
booktitle = {49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)},
pages = {49:149:20},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959772358},
ISSN = {18688969},
year = {2022},
volume = {229},
editor = {Boja\'{n}czyk, Miko{\l}aj and Merelli, Emanuela and Woodruff, David P.},
publisher = {Schloss Dagstuhl  LeibnizZentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2022/16390},
URN = {urn:nbn:de:0030drops163902},
doi = {10.4230/LIPIcs.ICALP.2022.49},
annote = {Keywords: Dyck Edit Distance, RNA Folding, String Algorithms}
}
28.06.2022
Keywords: 

Dyck Edit Distance, RNA Folding, String Algorithms 
Seminar: 

49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)

Issue date: 

2022 
Date of publication: 

28.06.2022 