Sparsification Enables Predicting Kissing Hairpin Pseudoknot Structures of Long RNAs in Practice

Authors: Hosna Jabbari, Ian Wark, Carlo Montemagno, Sebastian Will



Cite As

Hosna Jabbari, Ian Wark, Carlo Montemagno, and Sebastian Will. Sparsification Enables Predicting Kissing Hairpin Pseudoknot Structures of Long RNAs in Practice. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 12:1-12:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LIPIcs.WABI.2017.12

Abstract

While computational RNA secondary structure prediction is an important tool in RNA research, in practice it is still fundamentally limited to pseudoknot-free structures (or, at best, very simple pseudoknots). Here, we make the prediction of complex pseudoknots, including kissing hairpin structures, practically applicable by reducing the originally high space consumption. To this end, we apply sparsification and other space-saving modifications to the recurrences of the pseudoknot prediction algorithm by Chen, Condon and Jabbari (the CCJ algorithm). This reduces the theoretical space complexity of free energy minimization to Theta(n^3 + Z), where n is the sequence length and Z is the number of non-optimally decomposable fragments ("candidates"). The sparsified CCJ algorithm, sparseCCJ, is presented in detail. Moreover, we provide and compare three generations of CCJ implementations, each of which improves on the space requirements of its predecessor: the original CCJ implementation, our first modified implementation, and our final sparsified implementation. The two most recent implementations use the established HotKnots DP09 energy model. In our experiments, using 244 GB of RAM, the original CCJ implementation failed to handle sequences longer than 195 bases, whereas sparseCCJ handles our pseudoknot data set (sequences up to about 400 bases long) within this space limit. All three CCJ implementations are available at https://github.com/HosnaJabbari/CCJ.
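
To give a feel for what "candidates" means, the following is a minimal sketch of sparsification applied to a much simpler problem: pseudoknot-free base-pair maximization (Nussinov-style). It is not the CCJ or sparseCCJ recurrences and uses no energy model. The scoring scheme, the `MIN_LOOP` value, and all function names are assumptions made for illustration only. A fragment [i, j] is stored as a candidate only if closing it with the pair (i, j) is strictly better than every way of splitting it, so the sparse version keeps two table rows plus Z candidates instead of a full quadratic table.

```python
import random

# Illustrative assumptions: canonical and wobble pairs, at least 3 unpaired
# bases inside a hairpin, score = number of base pairs (no energy model).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3


def can_pair(seq, i, j):
    return j - i > MIN_LOOP and (seq[i], seq[j]) in PAIRS


def fold_dense(seq):
    """Reference DP with a full table: L[i][j] = max pairs in seq[i..j]."""
    n = len(seq)
    if n == 0:
        return 0
    L = [[0] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            best = L[i][j - 1]  # case: j is unpaired
            for k in range(i, j):  # case: j pairs with some k
                if can_pair(seq, k, j):
                    left = L[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + L[k + 1][j - 1])
            L[i][j] = best
    return L[0][n - 1]


def fold_sparse(seq):
    """Sparsified DP: two rows of L plus per-j candidate lists.

    Returns (optimal score, Z) where Z is the number of stored candidates.
    """
    n = len(seq)
    if n == 0:
        return 0, 0
    candidates = [[] for _ in range(n)]  # candidates[j] = [(k, closed score)]
    row_next = [0] * n  # holds L[i+1][*]
    for i in range(n - 1, -1, -1):
        row = [0] * n  # holds L[i][*]
        for j in range(i + 1, n):
            best = row[j - 1]  # case: j is unpaired
            # Only candidate fragments [k, j] with k > i need to be tried.
            for k, closed in candidates[j]:
                best = max(best, row[k - 1] + closed)
            if can_pair(seq, i, j):
                closed = 1 + row_next[j - 1]
                # Candidate test: closing (i, j) strictly beats every split.
                if closed > best:
                    candidates[j].append((i, closed))
                    best = closed
            row[j] = best
        row_next = row
    return row_next[n - 1], sum(len(c) for c in candidates)


def random_rna(n, seed):
    """Hypothetical helper: reproducible random sequence over ACGU."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGU") for _ in range(n))
```

Calling `fold_sparse(random_rna(60, 0))` and `fold_dense(random_rna(60, 0))` should give the same optimal score, while the second value returned by `fold_sparse` reports how many candidates were kept. In this toy setting the working space is linear in n plus Z; the abstract's Theta(n^3 + Z) bound refers to the far richer CCJ recurrences.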

Keywords

  • RNA
  • secondary structure prediction
  • pseudoknots
  • space efficiency
  • sparsification

References

  1. T. Akutsu. Dynamic programming algorithms for RNA secondary structure prediction with pseudoknots. Disc. App. Math., 104(1-3):45-62, 2000.
  2. M. S. Andronescu, C. Pop, and A. E. Condon. Improved free energy parameters for RNA pseudoknotted secondary structure prediction. RNA (New York, N.Y.), 16(1):26-42, January 2010.
  3. R. Backofen, D. Tsur, S. Zakov, and M. Ziv-Ukelson. Sparse RNA folding: Time and space efficient algorithms. Journal of Discrete Algorithms, 9(1):12-31, March 2011. URL: http://dx.doi.org/10.1016/j.jda.2010.09.001.
  4. K.-Y. Chang and I. Tinoco. The structure of an RNA kissing hairpin complex of the HIV TAR hairpin loop and its complement. Journal of Molecular Biology, 269(1):52-66, May 1997. URL: http://dx.doi.org/10.1006/jmbi.1997.1021.
  5. H. L. Chen, A. Condon, and H. Jabbari. An O(n^5) algorithm for MFE prediction of kissing hairpins and 4-chains in nucleic acids. Journal of computational biology: a journal of computational molecular cell biology, 16(6):803-815, June 2009. URL: http://dx.doi.org/10.1089/cmb.2008.0219.
  6. H. Jabbari. Algorithms for prediction of RNA pseudoknotted secondary structures. PhD thesis, University of British Columbia, March 2015.
  7. R. B. Lyngsø. Complexity of pseudoknot prediction in simple models. In ICALP'04, pages 919-931, 2004.
  8. R. B. Lyngsø and C. N. Pedersen. RNA pseudoknot prediction in energy-based models. J. Comput. Biol., 7(3-4):409-427, 2000.
  9. W. J. Melchers, J. G. Hoenderop, H. J. Bruins Slot, C. W. Pleij, E. V. Pilipenko, V. I. Agol, and J. M. Galama. Kissing of the two predominant hairpin loops in the coxsackie B virus 3' untranslated region is the essential structural feature of the origin of replication required for negative-strand RNA synthesis. Journal of Virology, 71(1):686-696, January 1997.
  10. T. R. Mercer, M. E. Dinger, and J. S. Mattick. Long non-coding RNAs: insights into functions. Nature Reviews Genetics, 10(3):155-159, March 2009. URL: http://dx.doi.org/10.1038/nrg2521.
  11. M. Möhl, R. Salari, S. Will, R. Backofen, and S. C. Sahinalp. Sparsification of RNA structure prediction including pseudoknots. Algorithms for Molecular Biology, 5(1):39+, December 2010. URL: http://dx.doi.org/10.1186/1748-7188-5-39.
  12. R. Nussinov and A. B. Jacobson. Fast algorithm for predicting the secondary structure of single-stranded RNA. Proceedings of the National Academy of Sciences of the United States of America, 77(11):6309-6313, November 1980. URL: http://dx.doi.org/10.1073/pnas.77.11.6309.
  13. J. Reeder and R. Giegerich. Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics. BMC Bioinformatics, 5, 2004.
  14. E. Rivas and S. R. Eddy. A dynamic programming algorithm for RNA structure prediction including pseudoknots. J. Mol. Biol., 285(5):2053-2068, 1999.
  15. R. Salari, M. Möhl, S. Will, S. C. Sahinalp, and R. Backofen. Time and Space Efficient RNA-RNA Interaction Prediction via Sparse Folding. In Bonnie Berger, editor, Research in Computational Molecular Biology, volume 6044 of Lecture Notes in Computer Science, chapter 31, pages 473-490. Springer Berlin / Heidelberg, Berlin, Heidelberg, 2010. URL: http://dx.doi.org/10.1007/978-3-642-12683-3_31.
  16. K. Sato, Y. Kato, M. Hamada, T. Akutsu, and K. Asai. IPknot: fast and accurate prediction of RNA secondary structures with pseudoknots using integer programming. Bioinformatics, 27(13):i85-i93, July 2011.
  17. J. Sperschneider, A. Datta, and M. J. Wise. Predicting pseudoknotted structures across two RNA sequences. Bioinformatics (Oxford, England), 28(23):3058-3065, December 2012.
  18. Y. Uemura, A. Hasegawa, S. Kobayashi, and T. Yokomori. Tree adjoining grammars for RNA structure prediction. Theor. Comput. Sci., 210(2):277-303, 1999.
  19. M. H. Verheije, R. C. L. Olsthoorn, M. V. Kroese, P. J. M. Rottier, and J. J. M. Meulenberg. Kissing interaction between 3' noncoding and coding sequences is essential for porcine arterivirus RNA replication. Journal of Virology, 76(3):1521-1526, February 2002. URL: http://dx.doi.org/10.1128/jvi.76.3.1521-1526.2002.
  20. Y. Wexler, C. Zilberstein, and M. Ziv-Ukelson. A study of accessible motifs and RNA folding complexity. Journal of computational biology: a journal of computational molecular cell biology, 14(6):856-872, 2007. URL: http://dx.doi.org/10.1089/cmb.2007.r020.
  21. S. Will and H. Jabbari. Sparse RNA folding revisited: space-efficient minimum free energy structure prediction. Algorithms for molecular biology: AMB, 11, 2016.