SparseRNAFolD: Sparse RNA Pseudoknot-Free Folding Including Dangles

Gray, Mateo; Will, Sebastian; Jabbari, Hosna

doi:10.4230/LIPIcs.WABI.2023.19

Abstract

Motivation. Computational RNA secondary structure prediction by free energy minimization is indispensable for analyzing structural RNAs and their interactions. These methods find the structure with the minimum free energy (MFE) among exponentially many possible structures. Their time and space complexity (O(n³) time and O(n²) space for pseudoknot-free structures) becomes restrictive for longer RNA sequences. Furthermore, accurate free energy calculations, including dangle contributions, can be difficult and costly to implement, particularly when optimizing for time and space requirements.

Results. Here we introduce a fast and efficient sparsified MFE pseudoknot-free structure prediction algorithm, SparseRNAFolD, that utilizes an accurate energy model that accounts for dangle contributions. While the sparsification technique was previously employed to improve the time and space complexity of a pseudoknot-free structure prediction method with a realistic energy model, SparseMFEFold, it was not extended to include dangle contributions due to the complexity of computation. This omission may come at the cost of prediction accuracy. In this work, we compare three different sparsified implementations of dangle contributions and discuss the pros and cons of each. We also compare our algorithm to LinearFold, a linear time and space algorithm, and find that, in practice, SparseRNAFolD has lower memory consumption across all sequence lengths and a faster running time for sequences of up to 1000 bases.

Conclusion. Our SparseRNAFolD algorithm is an MFE-based algorithm that guarantees an optimal result and employs the most general energy model, including dangle contributions. We provide a basis for applying dangles to sparsified recursions in a pseudoknot-free model that can be extended to pseudoknots.
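
For illustration only, the cubic-time, quadratic-space recursion referred to in the Motivation can be sketched as follows. The sketch is not SparseRNAFolD: it swaps the nearest-neighbor energy model for simple base-pair maximization (a Nussinov-style recursion), ignores dangles, and applies no sparsification. The names `can_pair`, `max_pairs`, and `min_loop` are assumptions made for this example.

```python
# Simplified stand-in for MFE folding: maximize the number of base pairs.
# Assumption: canonical and G-U wobble pairs only; no energy parameters.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """Return True if bases a and b may form a pair in this toy model."""
    return (a, b) in PAIRS


def max_pairs(seq, min_loop=3):
    """Maximum number of nested (pseudoknot-free) base pairs in seq.

    Fills an n x n table (O(n^2) space); each entry scans all split
    points k (O(n) work), giving O(n^3) time overall.
    min_loop is the minimum number of unpaired bases in a hairpin loop.
    """
    n = len(seq)
    if n == 0:
        return 0
    table = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = table[i][j - 1]  # case: base j is unpaired
            for k in range(i, j - min_loop):  # case: base k pairs with j
                if can_pair(seq[k], seq[j]):
                    left = table[i][k - 1] if k > i else 0
                    inner = table[k + 1][j - 1]
                    best = max(best, left + 1 + inner)
            table[i][j] = best
    return table[0][n - 1]
```

For example, `max_pairs("GGGAAAUCC")` returns 3 (three nested pairs closing an AAA hairpin). The inner loop over `k` is the source of the cubic running time; sparsified methods restrict this loop to a smaller set of candidate split points.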

