Automated Design of Dynamic Programming Schemes for RNA Folding with Pseudoknots

Authors: Bertrand Marchand, Sebastian Will, Sarah J. Berkemer, Laurent Bulteau, Yann Ponty




File

LIPIcs.WABI.2022.7.pdf
  • Filesize: 1.89 MB
  • 24 pages


Author Details

Bertrand Marchand
  • LIX (UMR 7161), Ecole Polytechnique, Institut Polytechnique de Paris, France
  • LIGM, CNRS, Univ Gustave Eiffel, F-77454 Marne-la-Vallée, France
Sebastian Will
  • LIX (UMR 7161), Ecole Polytechnique, Institut Polytechnique de Paris, France
Sarah J. Berkemer
  • LIX (UMR 7161), Ecole Polytechnique, Institut Polytechnique de Paris, France
Laurent Bulteau
  • LIGM, CNRS, Univ Gustave Eiffel, F-77454 Marne-la-Vallée, France
Yann Ponty
  • LIX (UMR 7161), Ecole Polytechnique, Institut Polytechnique de Paris, France

Cite As

Bertrand Marchand, Sebastian Will, Sarah J. Berkemer, Laurent Bulteau, and Yann Ponty. Automated Design of Dynamic Programming Schemes for RNA Folding with Pseudoknots. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 7:1-7:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.WABI.2022.7

Abstract

Despite being a textbook application of dynamic programming (DP) and a routine task in RNA structure analysis, RNA secondary structure prediction remains challenging whenever pseudoknots come into play. To circumvent the NP-hardness of energy minimization in realistic energy models, specialized algorithms have been proposed for restricted conformation classes that capture the most frequently observed configurations. While these methods rely on hand-crafted DP schemes, we generalize and fully automate the design of DP pseudoknot prediction algorithms. We formalize the problem of designing DP algorithms for an (infinite) class of conformations, modeled by (a finite number of) fatgraphs, and automatically build DP schemes minimizing their algorithmic complexity. We propose an algorithm for the problem, based on the tree decomposition of a well-chosen representative structure, which we simplify and reinterpret as a DP scheme. The algorithm is fixed-parameter tractable in the treewidth tw of the fatgraph, and its output represents an 𝒪(n^{tw+1}) algorithm for predicting the minimum free energy (MFE) folding of an RNA of length n. Our general framework supports general energy models, partition function computations, recursive substructures and partial folding, and could pave the way for algebraic dynamic programming beyond the context-free case.
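
To make the link between treewidth and the 𝒪(n^{tw+1}) bound more concrete, the following is a minimal sketch, not the authors' pipeline. It bounds the treewidth of a small graph with a standard greedy min-degree elimination heuristic, which only yields an upper bound. The function name `greedy_width_upper_bound` and the 8-vertex toy graph (a backbone path plus two crossing helices of two base pairs each) are assumptions made for illustration, and the toy graph is not claimed to be the representative structure defined in the paper.

```python
from itertools import combinations


def greedy_width_upper_bound(n_vertices, edges):
    """Return (width, order) from a greedy min-degree elimination.

    `width` is an upper bound on the treewidth of the graph, not
    necessarily the exact value. Hypothetical helper for illustration.
    """
    adj = {v: set() for v in range(n_vertices)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    width = 0
    order = []
    while adj:
        # Pick a vertex of minimum degree; break ties by smallest label.
        v = min(adj, key=lambda x: (len(adj[x]), x))
        nbrs = adj.pop(v)
        width = max(width, len(nbrs))
        # Eliminating v turns its remaining neighbours into a clique.
        for a, b in combinations(nbrs, 2):
            adj[a].add(b)
            adj[b].add(a)
        for u in nbrs:
            adj[u].discard(v)
        order.append(v)
    return width, order


# Toy graph (assumption): positions 0..7 along the backbone, helix A
# pairing (0,5),(1,4) and a crossing helix B pairing (2,7),(3,6).
BACKBONE = [(i, i + 1) for i in range(7)]
BASE_PAIRS = [(0, 5), (1, 4), (2, 7), (3, 6)]
TOY_EDGES = BACKBONE + BASE_PAIRS

if __name__ == "__main__":
    w, elimination_order = greedy_width_upper_bound(8, TOY_EDGES)
    print("width upper bound:", w)
    print("exponent in the n^(tw+1) bound, using this upper bound:", w + 1)
    print("elimination order:", elimination_order)
```

Each eliminated vertex together with its neighbours at elimination time forms one bag of a tree decomposition, so a DP that indexes its tables by bag contents enumerates at most n^{width+1} index combinations per bag. An exact treewidth solver could replace the heuristic when the tightest exponent is needed.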

Subject Classification

ACM Subject Classification
  • Applied computing → Computational biology
  • Theory of computation → Dynamic programming
Keywords
  • RNA folding
  • treewidth
  • dynamic programming

