Fast and Accurate Structure Probability Estimation for Simultaneous Alignment and Folding of RNAs

Miladi, Milad; Raden, Martin; Will, Sebastian; Backofen, Rolf

doi:10.4230/LIPIcs.WABI.2019.14

File

LIPIcs.WABI.2019.14.pdf

Filesize: 1.41 MB
13 pages

Document Identifiers

DOI: 10.4230/LIPIcs.WABI.2019.14
URN: urn:nbn:de:0030-drops-110446

Author Details

Milad Miladi

Bioinformatics Group, Department of Computer Science, University of Freiburg, Germany

Martin Raden

Bioinformatics Group, Department of Computer Science, University of Freiburg, Germany

Sebastian Will

Theoretical Biochemistry Group (TBI), Institute for Theoretical Chemistry, University of Vienna, Austria

Rolf Backofen

Bioinformatics Group, Department of Computer Science, University of Freiburg, Germany
Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Germany

Acknowledgements

The authors would like to thank the anonymous reviewers.

Cite AsGet BibTex

Milad Miladi, Martin Raden, Sebastian Will, and Rolf Backofen. Fast and Accurate Structure Probability Estimation for Simultaneous Alignment and Folding of RNAs. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 14:1-14:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.WABI.2019.14

Abstract

Motivation: Simultaneous alignment and folding (SA&F) of RNAs is the indispensable gold standard for inferring the structure of non-coding RNAs and their general analysis. The original algorithm, proposed by Sankoff, solves the theoretical problem exactly with a complexity of O(n^6) in the full energy model. Over the last two decades, several variants and improvements of the Sankoff algorithm have been proposed to reduce its extreme complexity by proposing simplified energy models or imposing restrictions on the predicted alignments. Results: Here we introduce a novel variant of Sankoff’s algorithm that reconciles the simplifications of PMcomp, namely moving from the full energy model to a simpler base pair-based model, with the accuracy of the loop-based full energy model. Instead of estimating pseudo-energies from unconditional base pair probabilities, our model calculates energies from conditional base pair probabilities that allow to accurately capture structure probabilities, which obey a conditional dependency. Supporting modifications with surgical precision, this model gives rise to the fast and highly accurate novel algorithm Pankov (Probabilistic Sankoff-like simultaneous alignment and folding of RNAs inspired by Markov chains). Pankov benefits from the speed-up of excluding unreliable base-pairing without compromising the loop-based free energy model of the Sankoff’s algorithm. We show that Pankov outperforms its predecessors LocARNA and SPARSE in folding quality and is faster than LocARNA. Pankov is developed as a branch of the LocARNA package and available at https://github.com/mmiladi/Pankov.

Subject Classification

ACM Subject Classification

Applied computing → Bioinformatics
Applied computing → Computational biology
Applied computing → Molecular structural biology

Keywords

RNA secondary structure
Structural bioinformatics
Alignment
Algorithms

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

References

Athanasius F Bompfünewerer, Rolf Backofen, Stephan H Bernhart, Jana Hertel, Ivo L Hofacker, Peter F Stadler, and Sebastian Will. Variations on RNA folding and alignment: lessons from Benasque. Journal of mathematical biology, 56(1-2):129-144, 2008.
Thomas R Cech and Joan A Steitz. The noncoding RNA revolution—trashing old rules to forge new ones. Cell, 157(1):77-94, 2014.
Padideh Danaee, Mason Rouches, Michelle Wiley, Dezhong Deng, Liang Huang, and David Hendrix. bpRNA: large-scale automated annotation and analysis of RNA secondary structure. Nucleic acids research, 46(11):5381-5394, 2018.
R. D. Dowell and S. R. Eddy. Efficient Pairwise RNA Structure Prediction and Alignment Using Sequence Alignment Constraints. BMC Bioinformatics, 7:400, 2006.
Robin D Dowell and Sean R Eddy. Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC bioinformatics, 5(1):71, 2004.
Paul P. Gardner, Andreas Wilm, and Stefan Washietl. A benchmark of multiple sequence alignment programs upon structural RNAs. Nucleic Acids Res, 33(8):2433-9, 2005.
Jan Gorodkin, Shawn L Stricklin, and Gary D Stormo. Discovering common stem-loop motifs in unaligned RNA sequences. Nucleic Acids Research, 29(10):2135-2144, 2001.
Arif Ozgun Harmanci, Gaurav Sharma, and David H. Mathews. Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign. BMC Bioinformatics, 8:130, 2007.
Jakob Hull Havgaard, Rune B Lyngsø, Gary D Stormo, and Jan Gorodkin. Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%. Bioinformatics, 21(9):1815-1824, 2005.
I. L. Hofacker, S. H. Bernhart, and P. F. Stadler. Alignment of RNA base pairing probability matrices. Bioinformatics, 20(14):2222-7, 2004.
Hisanori Kiryu, Yasuo Tabei, Taishin Kin, and Kiyoshi Asai. Murlet: a practical multiple alignment tool for structural RNA sequences. Bioinformatics, 23(13):1588-1598, 2007.
Benedikt Löwes, Cedric Chauve, Yann Ponty, and Robert Giegerich. The BRaliBase dent-a tale of benchmark design and interpretation. Briefings in bioinformatics, 18(2):306-311, 2016.
David H Mathews and Douglas H Turner. Dynalign: an algorithm for finding the secondary structure common to two RNA sequences. Journal of molecular biology, 317(2):191-203, 2002.
DH Mathews, J Sabina, M Zuker, and DH Turner. Expanded Sequence Dependence of Thermodynamic Parameters Improves Prediction of RNA Secondary Structure. J Mol Biol, 288(5):911-40, 1999.
J. S. McCaskill. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers, 29(6-7):1105-19, 1990.
Milad Miladi, Junge Alexander, Costa Fabrizio, Seemann Stefan E., Havgaard Jakob Hull, Jan Gorodkin, and Rolf Backofen. RNAscClust: clustering rna sequences using structure conservation and graph based motifs. Bioinformatics, 33(14):2089-2096, 2017.
Christina Otto, Mathias Mohl, Steffen Heyne, Mika Amit, Gad M. Landau, Rolf Backofen, and Sebastian Will. ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs. BMC Bioinformatics, 15(1):6602, 2014.
David Sankoff. Simultaneous solution of the RNA folding, alignment and protosequence problems. SIAM J. Appl. Math., 45(5):810-825, 1985.
Matthew G Seetin and David H Mathews. RNA structure prediction: an overview of methods. In Bacterial Regulatory RNA, pages 99-122. Springer, 2012.
Elfar Torarinsson, Jakob H. Havgaard, and Jan Gorodkin. Multiple structural alignment and clustering of RNA sequences. Bioinformatics, 23(8):926-32, 2007.
Sebastian Will, Christina Otto, Milad Miladi, Mathias Möhl, and Rolf Backofen. SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics. Bioinformatics, 31(15):2489-2496, 2015.
Sebastian Will, Kristin Reiche, Ivo L. Hofacker, Peter F. Stadler, and Rolf Backofen. Inferring Non-Coding RNA Families and Classes by Means of Genome-Scale Structure-Based Clustering. PLoS Comput Biol, 3(4):e65, 2007.
Andreas Wilm, Indra Mainz, and Gerhard Steger. An enhanced RNA alignment benchmark for sequence alignment programs. Algorithms Mol Biol, 1:19, 2006.
Stefan Wuchty, Walter Fontana, Ivo L Hofacker, and Peter Schuster. Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers: Original Research on Biomolecules, 49(2):145-165, 1999.

Fast and Accurate Structure Probability Estimation for Simultaneous Alignment and Folding of RNAs

Authors Milad Miladi , Martin Raden , Sebastian Will , Rolf Backofen

File

Document Identifiers

Author Details

Acknowledgements

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message

Fast and Accurate Structure Probability Estimation for Simultaneous Alignment and Folding of RNAs

Authors Milad Miladi , Martin Raden , Sebastian Will , Rolf Backofen

File

Document Identifiers

Author Details

Funding

Acknowledgements

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message