Flow Decomposition with Subpath Constraints

Authors: Lucia Williams, Alexandru I. Tomescu, Brendan Mumey

Author Details

Lucia Williams
  • School of Computing, Montana State University, Bozeman, MT, USA
Alexandru I. Tomescu
  • Department of Computer Science, University of Helsinki, Finland
Brendan Mumey
  • School of Computing, Montana State University, Bozeman, MT, USA

Acknowledgements

Computational efforts were performed on the Hyalite High Performance Computing System, operated and supported by University Information Technology Research Cyberinfrastructure at Montana State University.

Cite As

Lucia Williams, Alexandru I. Tomescu, and Brendan Mumey. Flow Decomposition with Subpath Constraints. In 21st International Workshop on Algorithms in Bioinformatics (WABI 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 201, pp. 16:1-16:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.WABI.2021.16

Abstract

Flow network decomposition is a natural model for problems where we are given a flow network arising from superimposing a set of weighted paths and would like to recover the underlying data, i.e., decompose the flow into the original paths and their weights. Thus, variations on flow decomposition are often used as subroutines in multiassembly problems such as RNA transcript assembly. In practice, we frequently have access to information beyond flow values in the form of subpaths, and many tools incorporate these heuristically. However, although their practical utility is widely acknowledged, previous work has not formally addressed the effect of subpath constraints on the accuracy of flow network decomposition approaches. We formalize the flow decomposition with subpath constraints problem, give the first algorithms for it, and study its usefulness for recovering ground truth decompositions. For finding a minimum decomposition, we propose both a heuristic and an FPT algorithm. Experiments on RNA transcript datasets show that, for instances with larger solution path sets, adding subpath constraints recovers 13% more ground truth solutions when minimal decompositions are found exactly, and 30% more when minimal decompositions are found heuristically.
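To make the problem statement concrete, here is a minimal brute-force sketch. It assumes a formalization in which the flow lives on a DAG with a single source and sink, flows and path weights are positive integers, and every subpath constraint must appear as a contiguous run of nodes in at least one decomposition path. It enumerates all source-sink paths and all integer weightings, so it is exponential and only suitable for toy instances. It is not the heuristic or the FPT algorithm proposed in the paper. All function names and the example network are hypothetical.

```python
from itertools import combinations, product

# Assumptions (hypothetical sketch): DAG, integer flows/weights, and each
# constraint must occur contiguously in at least one chosen path.


def st_paths(adj, node, sink):
    """All node sequences from `node` to `sink` in a DAG (exponential in general)."""
    if node == sink:
        return [[sink]]
    return [[node] + rest
            for nxt in adj.get(node, [])
            for rest in st_paths(adj, nxt, sink)]


def covers(path, sub):
    """True if `sub` occurs as a contiguous run of nodes inside `path`."""
    k = len(sub)
    return any(path[i:i + k] == sub for i in range(len(path) - k + 1))


def decomposes(flow, paths, weights):
    """True if, on every edge, the weights of the paths using it sum to its flow."""
    used = {e: 0 for e in flow}
    for path, w in zip(paths, weights):
        for e in zip(path, path[1:]):
            used[e] += w
    return used == flow


def min_constrained_decomposition(flow, source, sink, constraints, max_k=4):
    """Smallest set of weighted source-sink paths that decomposes `flow` and
    covers every subpath in `constraints`; None if none has <= max_k paths."""
    adj = {}
    for u, v in flow:
        adj.setdefault(u, []).append(v)
    paths = st_paths(adj, source, sink)
    max_w = max(flow.values())
    for k in range(1, max_k + 1):
        for chosen in combinations(paths, k):
            # Prune path sets that leave some subpath constraint uncovered.
            if not all(any(covers(p, c) for p in chosen) for c in constraints):
                continue
            # Try every assignment of positive integer weights.
            for weights in product(range(1, max_w + 1), repeat=k):
                if decomposes(flow, chosen, weights):
                    return list(zip(chosen, weights))
    return None


if __name__ == "__main__":
    # Two "bubbles" in series: s->{a,b}->c->{d,e}->t, every edge carrying flow 2.
    flow = {("s", "a"): 2, ("s", "b"): 2, ("a", "c"): 2, ("b", "c"): 2,
            ("c", "d"): 2, ("c", "e"): 2, ("d", "t"): 2, ("e", "t"): 2}
    # Unconstrained: the first 2-path solution found pairs a with d.
    print(min_constrained_decomposition(flow, "s", "t", []))
    # The constraint a-c-e forces the other 2-path solution.
    print(min_constrained_decomposition(flow, "s", "t", [["a", "c", "e"]]))
```

In this toy network the flow values alone admit two different minimum decompositions, and a single subpath constraint selects one of them. Requiring both a-c-e and a-c-d forces a decomposition with four paths, which illustrates how constraints can change the minimum size as well as which solution is returned.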

ACM Subject Classification
  • Theory of computation → Network flows
  • Applied computing → Computational transcriptomics
Keywords
  • Flow decomposition
  • subpath constraints
  • RNA-Seq
