Safe and Complete Algorithms for Dynamic Programming Problems, with an Application to RNA Folding

Authors Niko Kiirala, Leena Salmela , Alexandru I. Tomescu



PDF
Thumbnail PDF

File

LIPIcs.CPM.2019.8.pdf
  • Filesize: 0.75 MB
  • 16 pages

Document Identifiers

Author Details

Niko Kiirala
  • Department of Computer Science and Helsinki Institute for Information Technology HIIT, University of Helsinki, Finland
Leena Salmela
  • Department of Computer Science and Helsinki Institute for Information Technology HIIT, University of Helsinki, Finland
Alexandru I. Tomescu
  • Department of Computer Science and Helsinki Institute for Information Technology HIIT, University of Helsinki, Finland

Cite As Get BibTex

Niko Kiirala, Leena Salmela, and Alexandru I. Tomescu. Safe and Complete Algorithms for Dynamic Programming Problems, with an Application to RNA Folding. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 8:1-8:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019) https://doi.org/10.4230/LIPIcs.CPM.2019.8

Abstract

Many bioinformatics problems admit a large number of solutions, with no way of distinguishing the correct one among them. One approach of coping with this issue is to look at the partial solutions common to all solutions. Such partial solutions have been called safe, and an algorithm outputting all safe solutions has been called safe and complete. In this paper we develop a general technique that automatically provides a safe and complete algorithm to problems solvable by dynamic programming. We illustrate it by applying it to the bioinformatics problem of RNA folding, assuming the simplistic folding model maximizing the number of paired bases. Our safe and complete algorithm has time complexity O(n^3M(n)) and space complexity O(n^3) where n is the length of the RNA sequence and M(n) in Omega(n) is the time complexity of arithmetic operations on O(n)-bit integers. We also implement this algorithm and show that, despite an exponential number of optimal solutions, our algorithm is efficient in practice.

Subject Classification

ACM Subject Classification
  • Theory of computation → Dynamic programming
  • Applied computing → Life and medical sciences
  • Applied computing → Molecular structural biology
  • Theory of computation → Design and analysis of algorithms
Keywords
  • RNA secondary structure
  • RNA folding
  • Safe solution
  • Safe and complete algorithm
  • Counting problem

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Tatsuya Akutsu. Approximation and Exact Algorithms for RNA Secondary Structure Prediction and Recognition of Stochastic Context-free Languages. Journal of Combinatorial Optimization, 3(2):321-336, July 1999. URL: http://dx.doi.org/10.1023/A:1009898029639.
  2. Tatsuya Akutsu. Dynamic programming algorithms for RNA secondary structure prediction with pseudoknots. Discrete Applied Mathematics, 104(1):45-62, 2000. URL: http://dx.doi.org/10.1016/S0166-218X(00)00186-4.
  3. Mirela Andronescu, Vera Bereg, Holger H Hoos, and Anne Condon. RNA STRAND: the RNA secondary structure and statistical analysis database. BMC Bioinformatics, 9(1):340, 2008. URL: http://dx.doi.org/10.1186/1471-2105-9-340.
  4. Masanori Arita. Graph modeling of metabolism. Journal-Japanese Society for Artificial Intelligence, 15(4):703-710, 2000. Google Scholar
  5. Masanori Arita. Metabolic reconstruction using shortest paths. Simulation Practice and Theory, 8(1-2):109-125, 2000. Google Scholar
  6. J. K. Baker. Trainable grammars for speech recognition. The Journal of the Acoustical Society of America, 65(S1):S132-S132, 1979. URL: http://dx.doi.org/10.1121/1.2017061.
  7. Helen M. Berman, Wilma K. Olson, David L. Beveridge, John Westbrook, Anke Gelbin, Tamas Demeny, Shu-Hsin Hsieh, A.R. Srinivasan, and Bohdan Schneider. The nucleic acid database. A comprehensive relational database of three-dimensional structures of nucleic acids. Biophys J, 63(3):751-759, 1992. Google Scholar
  8. Jamie J. Cannone, Sankar Subramanian, Murray N. Schnare, James R. Collett, Lisa M. D'Souza, Yushi Du, Brian Feng, Nan Lin, Lakshmi V. Madabusi, Kirsten M. Müller, Nupur Pande, Zhidi Shang, Nan Yu, and Robin R. Gutell. The Comparative RNA Web (CRW) Site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinformatics, 3(1):2, January 2002. URL: http://dx.doi.org/10.1186/1471-2105-3-2.
  9. Kun-Mao Chao, Ross C. Hardison, and Webb Miller. Locating well-conserved regions within a pairwise alignment. CABIOS, 9(4):387-396, 1993. URL: http://dx.doi.org/10.1093/bioinformatics/9.4.387.
  10. Richard Durbin, Sean R Eddy, Anders Krogh, and Graeme Mitchison. Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge university press, 1998. Google Scholar
  11. David Eppstein. K-Best Enumeration. Bulletin of the EATCS, 115, 2015. URL: http://eatcs.org/beatcs/index.php/beatcs/article/view/322.
  12. David Eppstein. k-Best Enumeration. In Encyclopedia of Algorithms. Springer, Berlin, Heidelberg, 2015. URL: http://dx.doi.org/10.1007/978-3-642-27848-8_733-1.
  13. Yelena Frid and Dan Gusfield. A simple, practical and complete O-time Algorithm for RNA folding using the Four-Russians Speedup. Algorithms for Molecular Biology, 5(1):13, January 2010. URL: http://dx.doi.org/10.1186/1748-7188-5-13.
  14. Andreas Friemann and Stefan Schmitz. A new approach for displaying identities and differences among aligned amino acid sequences. Comput Appl Biosci, 8(3):261-265, June 1992. Google Scholar
  15. Robert Giegerich, Carsten Meyer, and Peter Steffen. A discipline of dynamic programming over sequence data. Science of Computer Programming, 51(3):215-263, 2004. URL: http://dx.doi.org/10.1016/j.scico.2003.12.005.
  16. Ivo L. Hofacker, Peter Schuster, and Peter F. Stadler. Combinatorics of RNA secondary structures. Discrete Applied Mathematics, 88(1):207-237, 1998. Computational Molecular Biology DAM - CMB Series. URL: http://dx.doi.org/10.1016/S0166-218X(98)00073-0.
  17. Ronny Lorenz, Stephan H Bernhart, Christian Höner zu Siederdissen, Hakim Tafer, Christoph Flamm, Peter F Stadler, and Ivo L Hofacker. ViennaRNA package 2.0. Algorithms for Molecular Biology, 6(26), November 2011. URL: http://dx.doi.org/10.1186/1748-7188-6-26.
  18. David H Mathews, Jeffrey Sabina, Michael Zuker, and Douglas H Turner. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. Journal of Molecular Biology, 288(5):911-940, 1999. URL: http://dx.doi.org/10.1006/jmbi.1999.2700.
  19. John S. McCaskill. The Equilibrium Partition Function and Base Pair Binding Probabilities for RNA Secondary Structure. Biopolymers, 29(6-7):1105-1119, 1990. URL: http://dx.doi.org/10.1002/bip.360290621.
  20. Niranjan Nagarajan and Mihai Pop. Parametric Complexity of Sequence Assembly: Theory and Applications to Next Generation Sequencing. Journal of Computational Biology, 16(7):897-908, 2009. Google Scholar
  21. Ruth Nussinov and Ann B. Jacobson. Fast Algorithm for Predicting the Secondary Structure of Single-Stranded RNA. Proceedings of the National Academy of Sciences of the United States of America, 77(11):6309-6313, November 1980. URL: http://dx.doi.org/10.1073/pnas.77.11.6309.
  22. Leena Salmela and Alexandru I. Tomescu. Safely Filling Gaps with Partial Solutions Common to All Solutions. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2019. URL: http://dx.doi.org/10.1109/TCBB.2017.2785831.
  23. Arnold Schönhage and Volker Strassen. Schnelle Multiplikation großer Zahlen. Computing, 7(3-4):281-292, 1971. URL: http://dx.doi.org/10.1007/BF02242355.
  24. Yu-Keng Shih and Srinivasan Parthasarathy. A single source k-shortest paths algorithm to infer regulatory pathways in a gene network. Bioinformatics, 28(12):i49-i58, 2012. Google Scholar
  25. Mathias Sprinzl and Konstantin S. Vassilenko. Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Research, 33(Database issue):139-140, 2005. Google Scholar
  26. Alexandru I. Tomescu and Paul Medvedev. Safe and Complete Contig Assembly Through Omnitigs. Journal of Computational Biology, 24(6):590-602, 2017. URL: http://dx.doi.org/10.1089/cmb.2016.0141.
  27. Leslie G. Valiant. General context-free recognition in less than cubic time. Journal of Computer and System Sciences, 10(2):308-315, 1975. URL: http://dx.doi.org/10.1016/S0022-0000(75)80046-8.
  28. Martin Vingron. Near-optimal sequence alignment. Curr. Opinion in Structural Biol., 6(3):346-352, June 1996. URL: http://dx.doi.org/10.1016/s0959-440x(96)80054-6.
  29. Martin Vingron and Patrick Argos. Determination of reliable regions in protein sequence alignments. Prot. Engin., 3(7):565-569, 1990. URL: http://dx.doi.org/10.1093/protein/3.7.565.
  30. Amy E Walter, Douglas H Turner, James Kim, Matthew H Lyttle, Peter Müller, David H Mathews, and Michael Zuker. Coaxial stacking of helixes enhances binding of oligoribonucleotides and improves predictions of RNA folding. Proceedings of the National Academy of Sciences, 91(20):9218-9222, 1994. URL: http://dx.doi.org/10.1073/pnas.91.20.9218.
  31. Michael S. Waterman and Thomas H. Byers. A dynamic programming algorithm to find all solutions in a neighborhood of the optimum. Mathematical Biosciences, 77(1):179-188, 1985. URL: http://dx.doi.org/10.1016/0025-5564(85)90096-3.
  32. John Westbrook, Zukang Feng, Li Chen, Huanwang Yang, and Helen M. Berman. The Protein Data Bank and structural genomics. Nucleic Acids Research, 31(1):489-491, January 2003. URL: http://dx.doi.org/10.1093/nar/gkg068.
  33. Stefan Wuchty, Walter Fontana, Ivo L Hofacker, and Peter Schuster. Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers, 49(2):145-165, 1999. URL: http://dx.doi.org/10.1002/(SICI)1097-0282(199902)49:2<145::AID-BIP4>3.0.CO;2-G.
  34. Shay Zakov, Dekel Tsur, and Michal Ziv-Ukelson. Reducing the worst case running times of a family of RNA and CFG problems, using Valiant’s approach. Algorithms for Molecular Biology, 6(1):20, August 2011. URL: http://dx.doi.org/10.1186/1748-7188-6-20.
  35. Michael Zuker. On finding all suboptimal foldings of an RNA molecule. Science, 244(4900):48-52, April 1989. URL: http://dx.doi.org/10.1126/science.2468181.
  36. Michael Zuker. Suboptimal sequence alignment in molecular biology: Alignment with error analysis. J Mol Biol, 221(2):403-420, September 1991. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail