Tree Diet: Reducing the Treewidth to Unlock FPT Algorithms in RNA Bioinformatics

Authors Bertrand Marchand , Yann Ponty , Laurent Bulteau



PDF
Thumbnail PDF

File

LIPIcs.WABI.2021.7.pdf
  • Filesize: 1.75 MB
  • 23 pages

Document Identifiers

Author Details

Bertrand Marchand
  • LIX CNRS UMR 7161, Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
  • LIGM, CNRS, Univ Gustave Eiffel, F77454 Marne-la-vallée, France
Yann Ponty
  • LIX CNRS UMR 7161, Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
Laurent Bulteau
  • LIGM, CNRS, Univ Gustave Eiffel, F77454 Marne-la-vallée, France

Acknowledgements

The authors would like to thank Julien Baste for pointing out prior work on treewidth modulators, and providing valuable input regarding vertex deletion problems.

Cite AsGet BibTex

Bertrand Marchand, Yann Ponty, and Laurent Bulteau. Tree Diet: Reducing the Treewidth to Unlock FPT Algorithms in RNA Bioinformatics. In 21st International Workshop on Algorithms in Bioinformatics (WABI 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 201, pp. 7:1-7:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.WABI.2021.7

Abstract

Hard graph problems are ubiquitous in Bioinformatics, inspiring the design of specialized Fixed-Parameter Tractable algorithms, many of which rely on a combination of tree-decomposition and dynamic programming. The time/space complexities of such approaches hinge critically on low values for the treewidth tw of the input graph. In order to extend their scope of applicability, we introduce the Tree-Diet problem, i.e. the removal of a minimal set of edges such that a given tree-decomposition can be slimmed down to a prescribed treewidth tw'. Our rationale is that the time gained thanks to a smaller treewidth in a parameterized algorithm compensates the extra post-processing needed to take deleted edges into account. Our core result is an FPT dynamic programming algorithm for Tree-Diet, using 2^{O(tw)}n time and space. We complement this result with parameterized complexity lower-bounds for stronger variants (e.g., NP-hardness when tw' or tw-tw' is constant). We propose a prototype implementation for our approach which we apply on difficult instances of selected RNA-based problems: RNA design, sequence-structure alignment, and search of pseudoknotted RNAs in genomes, revealing very encouraging results. This work paves the way for a wider adoption of tree-decomposition-based algorithms in Bioinformatics.

Subject Classification

ACM Subject Classification
  • Theory of computation → Dynamic programming
  • Theory of computation → Parameterized complexity and exact algorithms
  • Applied computing → Bioinformatics
Keywords
  • RNA
  • treewidth
  • FPT algorithms
  • RNA design
  • structure-sequence alignment

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Tatsuya Akutsu. Dynamic programming algorithms for RNA secondary structure prediction with pseudoknots. Discrete Appl. Math., 104(1-3):45-62, 2000. URL: https://doi.org/10.1016/S0166-218X(00)00186-4.
  2. Julien Baste, Christophe Paul, Ignasi Sau, and Celine Scornavacca. Efficient FPT algorithms for (strict) compatibility of unrooted phylogenetic trees. Bulletin of Mathematical Biology, 79(4):920-938, February 2017. URL: https://doi.org/10.1007/s11538-017-0260-y.
  3. Julien Baste, Ignasi Sau, and Dimitrios M. Thilikos. Hitting minors on bounded treewidth graphs. i. general upper bounds. SIAM J. Discret. Math., 34(3):1623-1648, 2020. URL: https://doi.org/10.1137/19M1287146.
  4. H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne. The protein data bank. Nucleic acids research, 28:235-242, 2000. URL: https://doi.org/10.1093/nar/28.1.235.
  5. Guillaume Blin, Alain Denise, Serge Dulucq, Claire Herrbach, and Helene Touzet. Alignments of RNA structures. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 7(2):309-322, April 2010. URL: https://doi.org/10.1109/tcbb.2008.28.
  6. Hans L Bodlaender. A linear-time algorithm for finding tree-decompositions of small treewidth. SIAM Journal on computing, 25(6):1305-1317, 1996. Google Scholar
  7. Hans L Bodlaender and Arie MCA Koster. Combinatorial optimization on graphs of bounded treewidth. The Computer Journal, 51(3):255-269, 2008. Google Scholar
  8. Hans L Bodlaender and Arie MCA Koster. Treewidth computations i. upper bounds. Information and Computation, 208(3):259-275, 2010. Google Scholar
  9. Laurent Bulteau, Guillaume Fertin, Minghui Jiang, and Irena Rusu. Tractability and approximability of maximal strip recovery. Theoretical Computer Science, 440:14-28, 2012. Google Scholar
  10. Laurent Bulteau and Mathias Weller. Parameterized algorithms in bioinformatics: An overview. Algorithms, 12(12):256, December 2019. URL: https://doi.org/10.3390/a12120256.
  11. Leizhen Cai. Fixed-parameter tractability of graph modification problems for hereditary properties. Information Processing Letters, 58(4):171-176, 1996. Google Scholar
  12. Leizhen Cai. Parameterized complexity of vertex colouring. Discrete Applied Mathematics, 127(3):415-429, 2003. Google Scholar
  13. Bruno Courcelle. The monadic second-order logic of graphs. i. recognizable sets of finite graphs. Information and Computation, 85(1):12-75, 1990. URL: https://doi.org/10.1016/0890-5401(90)90043-H.
  14. Christophe Crespelle, Pål Grønås Drange, Fedor V Fomin, and Petr A Golovach. A survey of parameterized algorithms and the complexity of edge modification. arXiv preprint, 2020. URL: http://arxiv.org/abs/2001.06867.
  15. Marek Cygan, Fedor V Fomin, Łukasz Kowalik, Daniel Lokshtanov, Dániel Marx, Marcin Pilipczuk, Michał Pilipczuk, and Saket Saurabh. Parameterized algorithms, volume 5. Springer, 2015. Google Scholar
  16. Marek Cygan, Daniel Lokshtanov, Marcin Pilipczuk, Michał Pilipczuk, and Saket Saurabh. On the hardness of losing width. In International Symposium on Parameterized and Exact Computation, pages 159-168. Springer, 2011. Google Scholar
  17. Rodney G Downey and Michael Ralph Fellows. Parameterized complexity. Springer Science & Business Media, 2012. Google Scholar
  18. Ehab S El-Mallah and Charles J Colbourn. The complexity of some edge deletion problems. IEEE transactions on circuits and systems, 35(3):354-362, 1988. Google Scholar
  19. Vibhav Gogate and Rina Dechter. A complete anytime algorithm for treewidth. arXiv preprint, 2012. URL: http://arxiv.org/abs/1207.4109.
  20. Stefan Hammer, Wei Wang, Sebastian Will, and Yann Ponty. Fixed-parameter tractable sampling for RNA design with multiple target structures. BMC Bioinformatics, 20(1), April 2019. URL: https://doi.org/10.1186/s12859-019-2784-7.
  21. Buhm Han, Banu Dost, Vineet Bafna, and Shaojie Zhang. Structural alignment of pseudoknotted RNA. Journal of Computational Biology, 15(5):489-504, 2008. URL: https://doi.org/10.1089/cmb.2007.0214.
  22. Wenzel Jakob, Jason Rhinelander, and Dean Moldovan. pybind11 - seamless operability between c++11 and python, 2017. URL: https://github.com/pybind/pybind11.
  23. Ioanna Kalvari, Eric P Nawrocki, Nancy Ontiveros-Palacios, Joanna Argasinska, Kevin Lamkiewicz, Manja Marz, Sam Griffiths-Jones, Claire Toffano-Nioche, Daniel Gautheret, Zasha Weinberg, Elena Rivas, Sean R Eddy, Robert D Finn, Alex Bateman, and Anton I Petrov. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Research, 49(D1):D192-D200, November 2020. URL: https://doi.org/10.1093/nar/gkaa1047.
  24. Robert J Klein and Sean R Eddy. Rsearch: finding homologs of single structured RNA sequences. BMC bioinformatics, 4(1):44, 2003. Google Scholar
  25. Neocles B Leontis and Eric Westhof. Geometric nomenclature and classification of RNA base pairs. RNA, 7(4):499-512, 2001. Google Scholar
  26. Yijin Liu, Timothy J Wilson, Scott A McPhee, and David MJ Lilley. Crystal structure and mechanistic investigation of the twister ribozyme. Nature chemical biology, 10(9):739-744, 2014. Google Scholar
  27. László Lovász. Graph minor theory. Bulletin of the American Mathematical Society, 43(1):75-86, 2006. Google Scholar
  28. Xiang-Jun Lu, Harmen J. Bussemaker, and Wilma K. Olson. DSSR: an integrated software tool for dissecting the spatial structure of RNA. Nucleic Acids Research, 43(21):e142-e142, July 2015. URL: https://doi.org/10.1093/nar/gkv716.
  29. R. B. Lyngsø and C. N. S. Pedersen. RNA pseudoknot prediction in energy-based models. Journal of Computational Biology, 7(3-4):409-427, 2000. Google Scholar
  30. Andrzej Proskurowski and Jan Arne Telle. Classes of graphs with restricted interval models. Discrete Mathematics & Theoretical Computer Science, 3(4), 2006. Google Scholar
  31. Vladimir Reinharz, Antoine Soulé, Eric Westhof, Jérôme Waldispühl, and Alain Denise. Mining for recurrent long-range interactions in RNA structures reveals embedded hierarchies in network families. Nucleic Acids Research, 46(8):3841-3851, March 2018. URL: https://doi.org/10.1093/nar/gky197.
  32. Philippe Rinaudo, Yann Ponty, Dominique Barth, and Alain Denise. Tree decomposition and parameterized algorithms for RNA structure-sequence alignment including tertiary interactions and pseudoknots. In Lecture Notes in Computer Science, pages 149-164. Springer Berlin Heidelberg, 2012. URL: https://doi.org/10.1007/978-3-642-33122-0_12.
  33. Elena Rivas and Sean R Eddy. Parameterizing sequence alignment with an explicit evolutionary model. BMC bioinformatics, 16(1):406, 2015. Google Scholar
  34. Neil Robertson and Paul D Seymour. Graph minors. xiii. the disjoint paths problem. Journal of combinatorial theory, Series B, 63(1):65-110, 1995. Google Scholar
  35. Toshiki Saitoh, Ryo Yoshinaka, and Hans L. Bodlaender. Fixed-treewidth-efficient algorithms for edge-deletion to interval graph classes. In WALCOM: Algorithms and Computation - 15th International Conference and Workshops, 2021, Proceedings, volume 12635 of Lecture Notes in Computer Science, pages 142-153. Springer, 2021. URL: https://doi.org/10.1007/978-3-030-68211-8_12.
  36. Roman Sarrazin-Gendron, Hua-Ting Yao, Vladimir Reinharz, Carlos G. Oliver, Yann Ponty, and Jérôme Waldispühl. Stochastic sampling of structural contexts improves the scalability and accuracy of RNA 3d module identification. In Lecture Notes in Computer Science, pages 186-201. Springer International Publishing, 2020. URL: https://doi.org/10.1007/978-3-030-45257-5_12.
  37. Saad Sheikh, Rolf Backofen, and Yann Ponty. Impact Of The Energy Model On The Complexity Of RNA Folding With Pseudoknots. In Juha Kärkkäinen and Jens Stoye, editors, CPM - 23rd Annual Symposium on Combinatorial Pattern Matching - 2012, volume 7354 of Combinatorial Pattern Matching, pages 321-333, Helsinki, Finland, 2012. Juha Kärkkäinen, Springer. URL: https://doi.org/10.1007/978-3-642-31265-6_26.
  38. Kathryn D. Smith, Carly A. Shanahan, Emily L. Moore, Aline C. Simon, and Scott A. Strobel. Structural basis of differential ligand recognition by two classes of bis-(3'-5')-cyclic dimeric guanosine monophosphate-binding riboswitches. Proceedings of the National Academy of Sciences, 108(19):7757-7762, 2011. URL: https://doi.org/10.1073/pnas.1018857108.
  39. Yinglei Song, Chunmei Liu, Russell Malmberg, Fangfang Pan, and Liming Cai. Tree decomposition based fast search of RNA structures including pseudoknots in genomes. In Computational Systems Bioinformatics Conference, 2005. Proceedings. 2005 IEEE, pages 223-234. IEEE, 2005. Google Scholar
  40. N. Sudarsan, E. R. Lee, Z. Weinberg, R. H. Moy, J. N. Kim, K. H. Link, and R. R. Breaker. Riboswitches in eubacteria sense the second messenger cyclic di-gmp. Science, 321(5887):411-413, 2008. URL: https://doi.org/10.1126/science.1159519.
  41. Rita Tamayo. Cyclic diguanylate riboswitches control bacterial pathogenesis mechanisms. PLOS Pathogens, 15(2):1-7, February 2019. URL: https://doi.org/10.1371/journal.ppat.1007529.
  42. Jinsong Tan and Louxin Zhang. The consecutive ones submatrix problem for sparse matrices. Algorithmica, 48(3):287-299, 2007. Google Scholar
  43. J D Thompson, F Plewniak, and O Poch. BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics, 15(1):87-88, January 1999. URL: https://doi.org/10.1093/bioinformatics/15.1.87.
  44. Jelena Vucinic, David Simoncini, Manon Ruffini, Sophie Barbe, and Thomas Schiex. Positive multistate protein design. Bioinformatics, 36(1):122-130, June 2019. URL: https://doi.org/10.1093/bioinformatics/btz497.
  45. Wei Wang. Practical sequence-structure alignment of RNAs with pseudoknots. PhD thesis, Université Paris-Saclay, School of Computer Science, 2017. Google Scholar
  46. Wei Wang, Alain Denise, and Yann Ponty. Licorna: alignment of complex rnas v1.0, 2017. URL: https://licorna.lri.fr.
  47. M. S. Waterman. Secondary structure of single stranded nucleic acids. Advances in Mathematics Supplementary Studies, 1(1):167-212, 1978. Google Scholar
  48. Mathias Weller, Annie Chateau, and Rodolphe Giroudeau. Exact approaches for scaffolding. BMC Bioinformatics, 16(S14), October 2015. URL: https://doi.org/10.1186/1471-2105-16-s14-s2.
  49. A. Xayaphoummine, T. Bucher, F. Thalmann, and H. Isambert. Prediction and statistics of pseudoknots in RNA structures using exactly clustered stochastic simulations. Proc. Natl. Acad. Sci. U. S. A., 100(26):15310-15315, 2003. Google Scholar
  50. Jinbo Xu. Rapid protein side-chain packing via tree decomposition. In Lecture Notes in Computer Science, pages 423-439. Springer Berlin Heidelberg, 2005. URL: https://doi.org/10.1007/11415770_32.
  51. Hua-Ting Yao, Jérôme Waldispühl, Yann Ponty, and Sebastian Will. Taming Disruptive Base Pairs to Reconcile Positive and Negative Structural Design of RNA. In RECOMB 2021 - 25th international conference on research in computational molecular biology, Padova, France, 2021. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail