Optimizing Safe Flow Decompositions in DAGs

Authors: Shahbaz Khan, Alexandru I. Tomescu




File

LIPIcs.ESA.2022.72.pdf
  • Filesize: 1.25 MB
  • 17 pages

Author Details

Shahbaz Khan
  • Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, India
Alexandru I. Tomescu
  • Department of Computer Science, University of Helsinki, Finland

Acknowledgements

We would like to thank Manuel Cáceres from the University of Helsinki for helpful discussions and for highlighting several errors in our previous approaches.

Cite As

Shahbaz Khan and Alexandru I. Tomescu. Optimizing Safe Flow Decompositions in DAGs. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 72:1-72:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.ESA.2022.72

Abstract

Network flow is one of the most studied combinatorial optimization problems, with innumerable applications. Any flow on a directed acyclic graph G with n vertices and m edges can be decomposed into a set of O(m) paths. Applications of such a flow decomposition range from network routing to the assembly of biological sequences. In some applications, however, each solution (decomposition) corresponds to some particular data that generated the original flow. Since multiple optimal solutions may exist, no optimization criterion guarantees that the correct decomposition is identified. Hence, flow decomposition was recently studied [RECOMB22] in the Safe and Complete framework, particularly for RNA assembly. The proposed solution reported all the safe paths, i.e., the paths that occur as a subpath of some path in every possible flow decomposition. That work presented a characterization of the safe paths, resulting in an O(mn + out_R) time algorithm to compute all safe paths, where out_R is the size of the raw output reporting each safe path explicitly. It also showed that out_R can be Ω(mn²) in the worst case but O(m) in the best case. Hence, it further presented an algorithm to report a concise representation of the output, out_C, in O(mn + out_C) time, where out_C can be Ω(mn) in the worst case but O(m) in the best case. In this work, we study how different safe paths interact, resulting in optimal output-sensitive algorithms requiring O(m + out_R) and O(m + out_C) time for computing the existing representations of the safe paths. Our algorithm uses a novel data structure called Path Tries, which may be of independent interest. Further, we propose a new characterization of the safe paths resulting in the optimal representation of safe paths, out_O, which can be Ω(mn) in the worst case but requires optimal O(1) space for every safe path reported. We also present a near-optimal algorithm to compute all the safe paths in O(m + out_O log n) time. The new representation also establishes tighter worst-case bounds of Θ(mn²) and Θ(mn) for out_R and out_C (along with out_O), respectively. Overall, we further develop the theory of safe and complete solutions for the flow decomposition problem, giving an optimal algorithm for the explicit representation and a near-optimal algorithm for the optimal representation of the safe paths.
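
To make the statement that any flow on a DAG decomposes into O(m) paths concrete, the following is a minimal Python sketch of the standard greedy decomposition. It is not the algorithm of this paper, and it does not compute safe paths. The function name decompose_flow, the dictionary input format, and the small example graph are assumptions made for illustration only.

```python
from collections import defaultdict


def decompose_flow(edges, source, sink):
    """Greedily split a flow on a DAG into weighted source-to-sink paths.

    edges: dict mapping (u, v) -> positive integer flow value. The flow is
    assumed to satisfy conservation at every vertex except source and sink.
    Returns a list of (path, weight) pairs.
    """
    residual = dict(edges)            # flow not yet assigned to a path
    out = defaultdict(list)           # adjacency lists in input order
    for (u, v) in edges:
        out[u].append(v)

    paths = []
    while True:
        # Walk from the source along edges that still carry flow.
        path, u = [source], source
        while u != sink:
            nxt = next((v for v in out[u] if residual[(u, v)] > 0), None)
            if nxt is None:
                break
            path.append(nxt)
            u = nxt
        if u != sink:
            break                     # no flow left leaving the source
        # Peel off the bottleneck; this zeroes at least one edge, so the
        # loop produces at most m paths.
        weight = min(residual[(a, b)] for a, b in zip(path, path[1:]))
        for a, b in zip(path, path[1:]):
            residual[(a, b)] -= weight
        paths.append((path, weight))
    return paths


# Hypothetical example: a flow of value 5 from "s" to "t".
example = {("s", "a"): 3, ("s", "b"): 2, ("a", "t"): 1,
           ("a", "b"): 2, ("b", "t"): 4}
for path, weight in decompose_flow(example, "s", "t"):
    print(weight, "->".join(path))
```

In general, the decomposition returned depends on the order in which edges are explored, and a different order can give a different, equally valid set of paths. This non-uniqueness is what motivates reporting only the safe paths.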

Subject Classification

ACM Subject Classification
  • Mathematics of computing → Graph algorithms
  • Mathematics of computing → Network flows
  • Theory of computation → Network flows
  • Networks → Network algorithms
Keywords
  • safety
  • flows
  • networks
  • directed acyclic graphs

References

  1. Jasmijn A. Baaijens, Bastiaan Van der Roest, Johannes Köster, Leen Stougie, and Alexander Schönhuth. Full-length de novo viral quasispecies assembly through variation graph construction. Bioinform., 35(24):5086-5094, 2019. URL: https://doi.org/10.1093/bioinformatics/btz443.
  2. Jasmijn A. Baaijens, Leen Stougie, and Alexander Schönhuth. Strain-aware assembly of genomes from mixed samples using flow variation graphs. In Research in Computational Molecular Biology - 24th Annual International Conference, RECOMB 2020, Padua, Italy, May 10-13, 2020, Proceedings, pages 221-222, 2020.
  3. Georg Baier, Ekkehard Köhler, and Martin Skutella. The k-splittable flow problem. Algorithmica, 42(3-4):231-248, 2005. URL: https://doi.org/10.1007/s00453-005-1167-9.
  4. Michael A. Bender and Martin Farach-Colton. The level ancestor problem simplified. Theor. Comput. Sci., 321(1):5-12, 2004. URL: https://doi.org/10.1016/j.tcs.2003.05.002.
  5. Elsa Bernard, Laurent Jacob, Julien Mairal, and Jean-Philippe Vert. Efficient RNA isoform identification and quantification from RNA-Seq data with network flows. Bioinform., 30(17):2447-2455, 2014. URL: https://doi.org/10.1093/bioinformatics/btu317.
  6. Rami Cohen, Liane Lewin-Eytan, Joseph Seffi Naor, and Danny Raz. On the effect of forwarding table size on SDN network utilization. In IEEE INFOCOM 2014-IEEE conference on computer communications, pages 1734-1742. IEEE, 2014.
  7. L. R. Ford and D. R. Fulkerson. Flows in Networks. Princeton University Press, USA, 2010.
  8. Thomas Gatter and Peter F Stadler. Ryūtō: network-flow based transcriptome reconstruction. BMC bioinformatics, 20(1):190, 2019. URL: https://doi.org/10.1186/s12859-019-2786-5.
  9. Tzvika Hartman, Avinatan Hassidim, Haim Kaplan, Danny Raz, and Michal Segalov. How to split a flow? In 2012 Proceedings IEEE INFOCOM, pages 828-836. IEEE, 2012.
  10. Chi-Yao Hong, Srikanth Kandula, Ratul Mahajan, Ming Zhang, Vijay Gill, Mohan Nanduri, and Roger Wattenhofer. Achieving high utilization with software-driven WAN. In Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM, pages 15-26, 2013.
  11. Shahbaz Khan, Milla Kortelainen, Manuel Cáceres, Lucia Williams, and Alexandru I. Tomescu. Safety and completeness in flow decompositions for RNA assembly. In 26th Annual International Conference, RECOMB 2022, San Diego, CA, USA, May 22-25, 2022, pages 177-192, 2022.
  12. Kyle Kloster, Philipp Kuinke, Michael P O'Brien, Felix Reidl, Fernando Sánchez Villaamil, Blair D Sullivan, and Andrew van der Poel. A practical FPT algorithm for flow decomposition and transcript assembly. In 2018 Proceedings of the Twentieth Workshop on Algorithm Engineering and Experiments (ALENEX), pages 75-86. SIAM, 2018.
  13. Cong Ma, Hongyu Zheng, and Carl Kingsford. Exact transcript quantification over splice graphs. In 20th International Workshop on Algorithms in Bioinformatics, WABI 2020, September 7-9, 2020, Pisa, Italy (Virtual Conference), pages 12:1-12:18, 2020.
  14. Veli Mäkinen, Djamal Belazzougui, Fabio Cunial, and Alexandru I. Tomescu. Genome-Scale Algorithm Design: Biological Sequence Analysis in the Era of High-Throughput Sequencing. Cambridge University Press, 2015. URL: https://doi.org/10.1017/CBO9781139940023.
  15. Brendan Mumey, Samareh Shahmohammadi, Kathryn McManus, and Sean Yaw. Parity balancing path flow decomposition and routing. In 2015 IEEE Globecom Workshops (GC Wkshps), pages 1-6. IEEE, 2015.
  16. Jan Peter Ohst. On the Construction of Optimal Paths from Flows and the Analysis of Evacuation Scenarios. PhD thesis, University of Koblenz and Landau, Germany, 2015.
  17. Nils Olsen, Natalia Kliewer, and Lena Wolbeck. A study on flow decomposition methods for scheduling of electric buses in public transport based on aggregated time-space network models. Central European Journal of Operations Research, 2020. URL: https://doi.org/10.1007/s10100-020-00705-6.
  18. Mihaela Pertea, Geo M Pertea, Corina M Antonescu, Tsung-Cheng Chang, Joshua T Mendell, and Steven L Salzberg. StringTie enables improved reconstruction of a transcriptome from RNA-Seq reads. Nature biotechnology, 33(3):290-295, 2015. URL: https://doi.org/10.1038/nbt.3122.
  19. Krzysztof Pieńkosz and Kamil Kołtyś. Integral flow decomposition with minimum longest path length. European Journal of Operational Research, 247(2):414-420, 2015. URL: https://doi.org/10.1016/j.ejor.2015.06.012.
  20. Mingfu Shao and Carl Kingsford. Theory and a heuristic for the minimum path flow decomposition problem. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 16(2):658-670, 2017.
  21. Vorapong Suppakitpaisarn. An approximation algorithm for multiroute flow decomposition. Electronic Notes in Discrete Mathematics, 52:367-374, 2016. INOC 2015 - 7th International Network Optimization Conference.
  22. Alexandru I. Tomescu, Travis Gagie, Alexandru Popa, Romeo Rizzi, Anna Kuosmanen, and Veli Mäkinen. Explaining a weighted DAG with few paths for solving genome-guided multi-assembly. IEEE ACM Trans. Comput. Biol. Bioinform., 12(6):1345-1354, 2015. URL: https://doi.org/10.1109/TCBB.2015.2418753.
  23. Alexandru I Tomescu, Anna Kuosmanen, Romeo Rizzi, and Veli Mäkinen. A novel min-cost flow method for estimating transcript expression with RNA-Seq. BMC bioinformatics, 14(S5):S15, 2013. URL: https://doi.org/10.1186/1471-2105-14-S5-S15.
  24. Alexandru I. Tomescu and Paul Medvedev. Safe and complete contig assembly through omnitigs. Journal of Computational Biology, 24(6):590-602, 2017. Preliminary version appeared in RECOMB 2016.
  25. Benedicte Vatinlen, Fabrice Chauvet, Philippe Chrétienne, and Philippe Mahey. Simple bounds and greedy algorithms for decomposing a flow into a minimal set of paths. European Journal of Operational Research, 185(3):1390-1401, 2008. URL: https://doi.org/10.1016/j.ejor.2006.05.043.
  26. Zhong Wang, Mark Gerstein, and Michael Snyder. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics, 10(1):57-63, 2009. URL: https://doi.org/10.1038/nrg2484.
  27. Lucia Williams, Gillian Reynolds, and Brendan Mumey. RNA transcript assembly using inexact flows. In 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 1907-1914. IEEE, 2019.
  28. Hongyu Zheng, Cong Ma, and Carl Kingsford. Deriving ranges of optimal estimated transcript expression due to nonidentifiability. J. Comput. Biol., 29(2):121-139, 2022. URL: https://doi.org/10.1089/cmb.2021.0444.