Fast and Accurate Structure Probability Estimation for Simultaneous Alignment and Folding of RNAs

Authors Milad Miladi , Martin Raden , Sebastian Will , Rolf Backofen



PDF
Thumbnail PDF

File

LIPIcs.WABI.2019.14.pdf
  • Filesize: 1.41 MB
  • 13 pages

Document Identifiers

Author Details

Milad Miladi
  • Bioinformatics Group, Department of Computer Science, University of Freiburg, Germany
Martin Raden
  • Bioinformatics Group, Department of Computer Science, University of Freiburg, Germany
Sebastian Will
  • Theoretical Biochemistry Group (TBI), Institute for Theoretical Chemistry, University of Vienna, Austria
Rolf Backofen
  • Bioinformatics Group, Department of Computer Science, University of Freiburg, Germany
  • Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Germany

Acknowledgements

The authors would like to thank the anonymous reviewers.

Cite AsGet BibTex

Milad Miladi, Martin Raden, Sebastian Will, and Rolf Backofen. Fast and Accurate Structure Probability Estimation for Simultaneous Alignment and Folding of RNAs. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 14:1-14:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.WABI.2019.14

Abstract

Motivation: Simultaneous alignment and folding (SA&F) of RNAs is the indispensable gold standard for inferring the structure of non-coding RNAs and their general analysis. The original algorithm, proposed by Sankoff, solves the theoretical problem exactly with a complexity of O(n^6) in the full energy model. Over the last two decades, several variants and improvements of the Sankoff algorithm have been proposed to reduce its extreme complexity by proposing simplified energy models or imposing restrictions on the predicted alignments. Results: Here we introduce a novel variant of Sankoff’s algorithm that reconciles the simplifications of PMcomp, namely moving from the full energy model to a simpler base pair-based model, with the accuracy of the loop-based full energy model. Instead of estimating pseudo-energies from unconditional base pair probabilities, our model calculates energies from conditional base pair probabilities that allow to accurately capture structure probabilities, which obey a conditional dependency. Supporting modifications with surgical precision, this model gives rise to the fast and highly accurate novel algorithm Pankov (Probabilistic Sankoff-like simultaneous alignment and folding of RNAs inspired by Markov chains). Pankov benefits from the speed-up of excluding unreliable base-pairing without compromising the loop-based free energy model of the Sankoff’s algorithm. We show that Pankov outperforms its predecessors LocARNA and SPARSE in folding quality and is faster than LocARNA. Pankov is developed as a branch of the LocARNA package and available at https://github.com/mmiladi/Pankov.

Subject Classification

ACM Subject Classification
  • Applied computing → Bioinformatics
  • Applied computing → Computational biology
  • Applied computing → Molecular structural biology
Keywords
  • RNA secondary structure
  • Structural bioinformatics
  • Alignment
  • Algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Athanasius F Bompfünewerer, Rolf Backofen, Stephan H Bernhart, Jana Hertel, Ivo L Hofacker, Peter F Stadler, and Sebastian Will. Variations on RNA folding and alignment: lessons from Benasque. Journal of mathematical biology, 56(1-2):129-144, 2008. Google Scholar
  2. Thomas R Cech and Joan A Steitz. The noncoding RNA revolution—trashing old rules to forge new ones. Cell, 157(1):77-94, 2014. Google Scholar
  3. Padideh Danaee, Mason Rouches, Michelle Wiley, Dezhong Deng, Liang Huang, and David Hendrix. bpRNA: large-scale automated annotation and analysis of RNA secondary structure. Nucleic acids research, 46(11):5381-5394, 2018. Google Scholar
  4. R. D. Dowell and S. R. Eddy. Efficient Pairwise RNA Structure Prediction and Alignment Using Sequence Alignment Constraints. BMC Bioinformatics, 7:400, 2006. Google Scholar
  5. Robin D Dowell and Sean R Eddy. Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC bioinformatics, 5(1):71, 2004. Google Scholar
  6. Paul P. Gardner, Andreas Wilm, and Stefan Washietl. A benchmark of multiple sequence alignment programs upon structural RNAs. Nucleic Acids Res, 33(8):2433-9, 2005. Google Scholar
  7. Jan Gorodkin, Shawn L Stricklin, and Gary D Stormo. Discovering common stem-loop motifs in unaligned RNA sequences. Nucleic Acids Research, 29(10):2135-2144, 2001. Google Scholar
  8. Arif Ozgun Harmanci, Gaurav Sharma, and David H. Mathews. Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign. BMC Bioinformatics, 8:130, 2007. Google Scholar
  9. Jakob Hull Havgaard, Rune B Lyngsø, Gary D Stormo, and Jan Gorodkin. Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%. Bioinformatics, 21(9):1815-1824, 2005. Google Scholar
  10. I. L. Hofacker, S. H. Bernhart, and P. F. Stadler. Alignment of RNA base pairing probability matrices. Bioinformatics, 20(14):2222-7, 2004. Google Scholar
  11. Hisanori Kiryu, Yasuo Tabei, Taishin Kin, and Kiyoshi Asai. Murlet: a practical multiple alignment tool for structural RNA sequences. Bioinformatics, 23(13):1588-1598, 2007. Google Scholar
  12. Benedikt Löwes, Cedric Chauve, Yann Ponty, and Robert Giegerich. The BRaliBase dent-a tale of benchmark design and interpretation. Briefings in bioinformatics, 18(2):306-311, 2016. Google Scholar
  13. David H Mathews and Douglas H Turner. Dynalign: an algorithm for finding the secondary structure common to two RNA sequences. Journal of molecular biology, 317(2):191-203, 2002. Google Scholar
  14. DH Mathews, J Sabina, M Zuker, and DH Turner. Expanded Sequence Dependence of Thermodynamic Parameters Improves Prediction of RNA Secondary Structure. J Mol Biol, 288(5):911-40, 1999. Google Scholar
  15. J. S. McCaskill. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers, 29(6-7):1105-19, 1990. Google Scholar
  16. Milad Miladi, Junge Alexander, Costa Fabrizio, Seemann Stefan E., Havgaard Jakob Hull, Jan Gorodkin, and Rolf Backofen. RNAscClust: clustering rna sequences using structure conservation and graph based motifs. Bioinformatics, 33(14):2089-2096, 2017. Google Scholar
  17. Christina Otto, Mathias Mohl, Steffen Heyne, Mika Amit, Gad M. Landau, Rolf Backofen, and Sebastian Will. ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs. BMC Bioinformatics, 15(1):6602, 2014. Google Scholar
  18. David Sankoff. Simultaneous solution of the RNA folding, alignment and protosequence problems. SIAM J. Appl. Math., 45(5):810-825, 1985. Google Scholar
  19. Matthew G Seetin and David H Mathews. RNA structure prediction: an overview of methods. In Bacterial Regulatory RNA, pages 99-122. Springer, 2012. Google Scholar
  20. Elfar Torarinsson, Jakob H. Havgaard, and Jan Gorodkin. Multiple structural alignment and clustering of RNA sequences. Bioinformatics, 23(8):926-32, 2007. Google Scholar
  21. Sebastian Will, Christina Otto, Milad Miladi, Mathias Möhl, and Rolf Backofen. SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics. Bioinformatics, 31(15):2489-2496, 2015. Google Scholar
  22. Sebastian Will, Kristin Reiche, Ivo L. Hofacker, Peter F. Stadler, and Rolf Backofen. Inferring Non-Coding RNA Families and Classes by Means of Genome-Scale Structure-Based Clustering. PLoS Comput Biol, 3(4):e65, 2007. Google Scholar
  23. Andreas Wilm, Indra Mainz, and Gerhard Steger. An enhanced RNA alignment benchmark for sequence alignment programs. Algorithms Mol Biol, 1:19, 2006. Google Scholar
  24. Stefan Wuchty, Walter Fontana, Ivo L Hofacker, and Peter Schuster. Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers: Original Research on Biomolecules, 49(2):145-165, 1999. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail