Accelerating ILP Solvers for Minimum Flow Decompositions Through Search Space and Dimensionality Reductions

Authors: Andreas Grigorjew, Fernando H. C. Dias, Andrea Cracco, Romeo Rizzi, Alexandru I. Tomescu




Author Details

Andreas Grigorjew
  • University of Helsinki, Finland
Fernando H. C. Dias
  • Aalto University, Espoo, Finland
Andrea Cracco
  • University of Verona, Italy
Romeo Rizzi
  • University of Verona, Italy
Alexandru I. Tomescu
  • University of Helsinki, Finland

Acknowledgements

We are very grateful to Manuel Cáceres for a very helpful discussion on sets of independent safe paths.

Cite As

Andreas Grigorjew, Fernando H. C. Dias, Andrea Cracco, Romeo Rizzi, and Alexandru I. Tomescu. Accelerating ILP Solvers for Minimum Flow Decompositions Through Search Space and Dimensionality Reductions. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 14:1-14:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.SEA.2024.14

Abstract

Given a flow network, the Minimum Flow Decomposition (MFD) problem asks for the smallest possible set of weighted paths whose superposition equals the flow. It is a classical, strongly NP-hard problem that has proven useful in RNA transcript assembly and in applications outside of Bioinformatics. We improve an existing ILP (Integer Linear Programming) model by Dias et al. [RECOMB 2022] for DAGs by decreasing the solver’s search space using solution safety and several other optimizations. This results in a significant speedup compared to the original ILP, of up to 34× on average on the hardest instances. Moreover, we show that our optimizations also apply to MFD problem variants, resulting in speedups of up to 219× on the hardest instances. We also develop an ILP model of reduced dimensionality for an MFD variant in which the solution path weights are restricted to a given set. This model finds an optimal MFD solution for most instances and, overall, its accuracy significantly outperforms that of previous greedy algorithms, while being up to an order of magnitude faster than our optimized ILP.
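To make the problem statement concrete, the following is a minimal sketch of a flow decomposition on a toy DAG. It is not the ILP model described in the abstract. It uses a simple greedy-width heuristic, which repeatedly removes a source-to-sink path of largest bottleneck weight, and then checks that the superposition of the resulting weighted paths equals the flow. The function names (`superposition`, `widest_path`, `greedy_width`) and the example network are assumptions made for illustration only.

```python
from collections import defaultdict


def superposition(paths):
    """Sum the weight of every path on each edge it traverses."""
    total = defaultdict(int)
    for weight, path in paths:
        for edge in zip(path, path[1:]):
            total[edge] += weight
    return dict(total)


def widest_path(residual, source, sink):
    """Return (bottleneck, path) for a source-to-sink path of maximum
    bottleneck over edges with positive residual flow, or (0, None)."""
    out = defaultdict(list)
    for (u, v), f in residual.items():
        if f > 0:
            out[u].append((v, f))
    memo = {}

    def best(u):
        if u == sink:
            return (float("inf"), [sink])
        if u not in memo:
            cand = (0, None)
            for v, f in out[u]:
                w, p = best(v)
                if p is not None and min(f, w) > cand[0]:
                    cand = (min(f, w), [u] + p)
            memo[u] = cand
        return memo[u]

    return best(source)


def greedy_width(flow, source, sink):
    """Greedy heuristic: peel off widest paths until no flow remains.
    Assumes `flow` is a valid flow on a DAG (conservation holds)."""
    residual = dict(flow)
    paths = []
    while True:
        weight, path = widest_path(residual, source, sink)
        if path is None:
            break
        for edge in zip(path, path[1:]):
            residual[edge] -= weight
        paths.append((weight, path))
    return paths


# Hypothetical toy flow network: edge -> flow value.
flow = {
    ("s", "a"): 6, ("s", "b"): 3,
    ("a", "b"): 2, ("a", "t"): 4,
    ("b", "t"): 5,
}
paths = greedy_width(flow, "s", "t")
for weight, path in paths:
    print(weight, "->".join(path))
```

On this toy instance the heuristic happens to return three paths, which is also the minimum, since three distinct source-to-sink paths are needed to cover all edges. In general a greedy decomposition can use more paths than an optimal MFD solution, which is the gap that exact approaches such as an ILP are meant to close.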

Subject Classification

ACM Subject Classification
  • Theory of computation → Network flows
  • Applied computing → Bioinformatics
Keywords
  • Flow decomposition
  • Integer Linear Programming
  • Safety
  • RNA-seq
  • RNA transcript assembly
  • isoform


References

  1. Jasmijn A Baaijens, Leen Stougie, and Alexander Schönhuth. Strain-aware assembly of genomes from mixed samples using flow variation graphs. In International Conference on Research in Computational Molecular Biology, pages 221-222. Springer, 2020.
  2. Elsa Bernard, Laurent Jacob, Julien Mairal, and Jean-Philippe Vert. Efficient RNA isoform identification and quantification from RNA-Seq data with network flows. Bioinformatics, 30(17):2447-2455, 2014.
  3. Benjamin Merlin Bumpus, Bart M. P. Jansen, and Jari J. H. de Kroon. Search-Space Reduction via Essential Vertices. In Shiri Chechik, Gonzalo Navarro, Eva Rotenberg, and Grzegorz Herman, editors, 30th Annual European Symposium on Algorithms (ESA 2022), volume 244 of Leibniz International Proceedings in Informatics (LIPIcs), pages 30:1-30:15, Dagstuhl, Germany, 2022. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. URL: https://doi.org/10.4230/LIPIcs.ESA.2022.30.
  4. Fernando HC Dias, Lucia Williams, Brendan Mumey, and Alexandru I Tomescu. Fast, flexible, and exact minimum flow decompositions via ILP. In International Conference on Research in Computational Molecular Biology, pages 230-245. Springer, 2022.
  5. Fernando HC Dias, Lucia Williams, Brendan Mumey, and Alexandru I Tomescu. Minimum flow decomposition in graphs with cycles using integer linear programming. arXiv preprint arXiv:2209.00042, 2022.
  6. Thomas Gatter and Peter F Stadler. Ryūtō: network-flow based transcriptome reconstruction. BMC bioinformatics, 20(1):1-14, 2019.
  7. Thasso Griebel, Benedikt Zacher, Paolo Ribeca, Emanuele Raineri, Vincent Lacroix, Roderic Guigó, and Michael Sammeth. Modelling and simulating generic rna-seq experiments with the flux simulator. Nucleic Acids Research, 40(20):10073-10083, 2012.
  8. Michael Hagemann-Jensen, Christoph Ziegenhain, Ping Chen, Daniel Ramsköld, Gert-Jan Hendriks, Anton JM Larsson, Omid R Faridani, and Rickard Sandberg. Single-cell RNA counting at allele and isoform resolution using Smart-seq3. Nature Biotechnology, 38(6):708-714, 2020.
  9. Tzvika Hartman, Avinatan Hassidim, Haim Kaplan, Danny Raz, and Michal Segalov. How to split a flow? In 2012 Proceedings IEEE INFOCOM, pages 828-836. IEEE, 2012.
  10. Shahbaz Khan, Milla Kortelainen, Manuel Cáceres, Lucia Williams, and Alexandru I. Tomescu. Safety and Completeness in Flow Decompositions for RNA Assembly. In Itsik Pe'er, editor, Research in Computational Molecular Biology - 26th Annual International Conference, RECOMB 2022, San Diego, CA, USA, May 22-25, 2022, Proceedings, volume 13278 of Lecture Notes in Computer Science, pages 177-192. Springer, 2022. URL: https://doi.org/10.1007/978-3-031-04749-7_11.
  11. Kyle Kloster, Philipp Kuinke, Michael P O'Brien, Felix Reidl, Fernando Sánchez Villaamil, Blair D Sullivan, and Andrew van der Poel. A practical fpt algorithm for flow decomposition and transcript assembly. In 2018 Proceedings of the Twentieth Workshop on Algorithm Engineering and Experiments (ALENEX), pages 75-86. SIAM, 2018.
  12. Ralf Möhring. Algorithmic Aspects of Comparability Graphs and Interval Graphs. In Ivan Rival, editor, Graphs and Order: the role of graphs in the theory of ordered sets and its applications. D. Reidel Publishing Company, 1984.
  13. Rolf H Möhring. Algorithmic aspects of comparability graphs and interval graphs. Graphs and order: the role of graphs in the theory of ordered sets and its applications, pages 41-101, 1985.
  14. Brendan Mumey, Samareh Shahmohammadi, Kathryn McManus, and Sean Yaw. Parity balancing path flow decomposition and routing. In 2015 IEEE Globecom Workshops (GC Wkshps), pages 1-6. IEEE, 2015.
  15. Rob Patro, Geet Duggal, and Carl Kingsford. Salmon: accurate, versatile and ultrafast quantification from RNA-seq data using lightweight-alignment. BioRxiv, page 021592, 2015.
  16. Mihaela Pertea, Geo M Pertea, Corina M Antonescu, Tsung-Cheng Chang, Joshua T Mendell, and Steven L Salzberg. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature biotechnology, 33(3):290-295, 2015.
  17. Manuel Ariel Caceres Reyes, Massimo Cairo, Andreas Grigorjew, Shahbaz Khan, Brendan Marshall Mumey, Romeo Rizzi, Alexandru Tomescu, and Lucia Williams. Width helps and hinders splitting flows. In 30th Annual European Symposium on Algorithms (ESA 2022), 2022.
  18. Zhaleh Safikhani, Mehdi Sadeghi, Hamid Pezeshk, and Changiz Eslahchi. SSP: An interval integer linear programming for de novo transcriptome assembly and isoform discovery of RNA-seq reads. Genomics, 102(5-6):507-514, 2013.
  19. Mingfu Shao and Carl Kingsford. Accurate assembly of transcripts through phase-preserving graph decomposition. Nature Biotechnology, 35(12):1167-1169, 2017.
  20. Mingfu Shao and Carl Kingsford. Theory and a heuristic for the minimum path flow decomposition problem. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 16(2):658-670, 2017.
  21. Alexandru I Tomescu, Travis Gagie, Alexandru Popa, Romeo Rizzi, Anna Kuosmanen, and Veli Mäkinen. Explaining a weighted dag with few paths for solving genome-guided multi-assembly. IEEE/ACM transactions on computational biology and bioinformatics, 12(6):1345-1354, 2015.
  22. Alexandru I Tomescu, Anna Kuosmanen, Romeo Rizzi, and Veli Mäkinen. A novel min-cost flow method for estimating transcript expression with RNA-Seq. In BMC bioinformatics, volume 14, pages S15:1-S15:10. Springer, 2013.
  23. Alexandru I Tomescu and Paul Medvedev. Safe and complete contig assembly through omnitigs. Journal of Computational Biology, 24(6):590-602, 2017.
  24. B. Vatinlen, F. Chauvet, P. Chrétienne, and P. Mahey. Simple bounds and greedy algorithms for decomposing a flow into a minimal set of paths. European Journal of Operational Research, 185(3):1390-1401, 2008. URL: https://doi.org/10.1016/j.ejor.2006.05.043.
  25. Lucia Williams, Gillian Reynolds, and Brendan Mumey. RNA Transcript Assembly Using Inexact Flows. In 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 1907-1914. IEEE, 2019.
  26. Lucia Williams, Alexandru Tomescu, Brendan Marshall Mumey, et al. Flow decomposition with subpath constraints. In 21st International Workshop on Algorithms in Bioinformatics (WABI 2021). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2021.