Feasibility of Flow Decomposition with Subpath Constraints in Linear Time

Authors Daniel Gibney, Sharma V. Thankachan, Srinivas Aluru



Author Details

Daniel Gibney
  • Georgia Institute of Technology, Atlanta, GA, USA
Sharma V. Thankachan
  • North Carolina State University, Raleigh, NC, USA
Srinivas Aluru
  • Georgia Institute of Technology, Atlanta, GA, USA

Cite As

Daniel Gibney, Sharma V. Thankachan, and Srinivas Aluru. Feasibility of Flow Decomposition with Subpath Constraints in Linear Time. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 17:1-17:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/LIPIcs.WABI.2022.17

Abstract

The decomposition of flow networks is an essential part of many transcriptome assembly algorithms used in computational biology. The addition of subpath constraints to this decomposition was recently proposed as an effective way to incorporate longer, already known portions of the transcript. The problem is defined as follows: given a weakly connected directed acyclic flow network G = (V, E, f) and a set ℛ of subpaths in G, find a flow decomposition such that every subpath in ℛ is included in some flow path of the decomposition [Williams et al., WABI 2021]. The authors of that work presented an exponential-time algorithm for determining the feasibility of such a flow decomposition, and more recently presented an O(|E| + L + |ℛ|³) time algorithm, where L is the sum of the path lengths in ℛ [Williams et al., TCBB 2022]. Our work provides an improved, linear O(|E| + L) time algorithm for determining the feasibility of such a flow decomposition. We also introduce two natural optimization variants of the feasibility problem: (i) determining the minimum-size subset of ℛ that must be removed to make a flow decomposition feasible, and (ii) determining the maximum-size subset of ℛ that can be retained while keeping a flow decomposition feasible. We show that, under the assumption P ≠ NP, (i) does not admit a polynomial-time o(log |V|)-approximation algorithm and (ii) does not admit a polynomial-time O(|V|^{1/2-ε} + |ℛ|^{1-ε})-approximation algorithm for any constant ε > 0.
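To make the problem statement concrete, the following is a minimal Python sketch of a verifier for a candidate decomposition. It is not the linear-time feasibility algorithm of the paper; it only checks a decomposition that is already given. The function names, the edge-dictionary representation of f, and the requirement that every path runs from a node with no incoming edges to a node with no outgoing edges are illustrative assumptions.

```python
from collections import defaultdict


def is_contiguous_subpath(sub, path):
    """Return True if node sequence `sub` occurs contiguously inside `path`."""
    sub, path = tuple(sub), tuple(path)
    k = len(sub)
    return any(path[i:i + k] == sub for i in range(len(path) - k + 1))


def check_decomposition(flow, paths, constraints):
    """Verify a candidate flow decomposition (hypothetical helper).

    flow        -- dict mapping each edge (u, v) to its flow value f(u, v)
    paths       -- list of (node_list, weight) pairs, the candidate decomposition
    constraints -- list of node lists, the subpath set R
    """
    tails = {u for u, _ in flow}
    heads = {v for _, v in flow}
    sources = tails - heads  # nodes with no incoming edge
    sinks = heads - tails    # nodes with no outgoing edge

    used = defaultdict(int)
    for nodes, weight in paths:
        # Assumption: each path carries positive weight from a source to a sink.
        if weight <= 0 or nodes[0] not in sources or nodes[-1] not in sinks:
            return False
        for edge in zip(nodes, nodes[1:]):
            if edge not in flow:
                return False
            used[edge] += weight

    # The path weights must add up to exactly f on every edge.
    if any(used[edge] != value for edge, value in flow.items()):
        return False

    # Every subpath constraint must lie inside at least one path.
    return all(
        any(is_contiguous_subpath(r, nodes) for nodes, _ in paths)
        for r in constraints
    )


# Toy network: two sources (a, b), two sinks (d, e), all edges carry flow 1.
flow = {("a", "c"): 1, ("b", "c"): 1, ("c", "d"): 1, ("c", "e"): 1}
constraints = [["a", "c", "d"]]
good = [(["a", "c", "d"], 1), (["b", "c", "e"], 1)]
bad = [(["a", "c", "e"], 1), (["b", "c", "d"], 1)]
print(check_decomposition(flow, good, constraints))  # True
print(check_decomposition(flow, bad, constraints))   # False
```

Both candidates in the toy example decompose the flow correctly, but only the first contains the constrained subpath a, c, d in a single path. This shows how subpath constraints can rule out otherwise valid decompositions.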

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • Flow networks
  • flow decomposition
  • subpath constraints


References

  1. Ergude Bao, Tao Jiang, and Thomas Girke. BRANCH: boosting RNA-Seq assemblies with partial or related genomic sequences. Bioinform., 29(10):1250-1259, 2013. URL: https://doi.org/10.1093/bioinformatics/btt127.
  2. Elsa Bernard, Laurent Jacob, Julien Mairal, and Jean-Philippe Vert. Efficient RNA isoform identification and quantification from RNA-Seq data with network flows. Bioinform., 30(17):2447-2455, 2014. URL: https://doi.org/10.1093/bioinformatics/btu317.
  3. Martin Farach. Optimal suffix tree construction with large alphabets. In 38th Annual Symposium on Foundations of Computer Science, FOCS '97, Miami Beach, Florida, USA, October 19-22, 1997, pages 137-143. IEEE Computer Society, 1997. URL: https://doi.org/10.1109/SFCS.1997.646102.
  4. Dan Gusfield. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology. Cambridge University Press, 1997. URL: https://doi.org/10.1017/cbo9780511574931.
  5. Shahbaz Khan, Milla Kortelainen, Manuel Cáceres, Lucia Williams, and Alexandru I. Tomescu. Safety and completeness in flow decompositions for RNA assembly. CoRR, abs/2201.10372, 2022. URL: http://arxiv.org/abs/2201.10372.
  6. Shahbaz Khan and Alexandru I. Tomescu. Safety of flow decompositions in DAGs. CoRR, abs/2102.06480, 2021. URL: http://arxiv.org/abs/2102.06480.
  7. Anna Kuosmanen, Ahmed Sobih, Romeo Rizzi, Veli Mäkinen, and Alexandru I. Tomescu. On using longer RNA-Seq reads to improve transcript prediction accuracy. In James P. Gilbert, Haim Azhari, Hesham H. Ali, Carla Quintão, Jan Sliwa, Carolina Ruiz, Ana L. N. Fred, and Hugo Gamboa, editors, Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2016) - Volume 3: BIOINFORMATICS, Rome, Italy, February 21-23, 2016, pages 272-277. SciTePress, 2016. URL: https://doi.org/10.5220/0005819702720277.
  8. Wei Li, Jianxing Feng, and Tao Jiang. IsoLasso: A LASSO regression approach to RNA-Seq based transcriptome assembly. J. Comput. Biol., 18(11):1693-1707, 2011. URL: https://doi.org/10.1089/cmb.2011.0171.
  9. Carsten Lund and Mihalis Yannakakis. On the hardness of approximating minimization problems. J. ACM, 41(5):960-981, 1994. URL: https://doi.org/10.1145/185675.306789.
  10. Cong Ma, Hongyu Zheng, and Carl Kingsford. Finding ranges of optimal transcript expression quantification in cases of non-identifiability. bioRxiv, 2020.
  11. Edward M. McCreight. A space-economical suffix tree construction algorithm. J. ACM, 23(2):262-272, 1976. URL: https://doi.org/10.1145/321941.321946.
  12. Jelani Nelson. A note on set cover inapproximability independent of universe size. Electron. Colloquium Comput. Complex., 105, 2007. URL: https://eccc.weizmann.ac.il/eccc-reports/2007/TR07-105/index.html.
  13. Mihaela Pertea, Geo M. Pertea, Corina M. Antonescu, Tsung-Cheng Chang, Joshua T. Mendell, and Steven L. Salzberg. StringTie enables improved reconstruction of a transcriptome from RNA-Seq reads. Nature Biotechnology, 33(3):290-295, 2015.
  14. Romeo Rizzi, Alexandru I. Tomescu, and Veli Mäkinen. On the complexity of minimum path cover with subpath constraints for multi-assembly. BMC Bioinform., 15(S-9):S5, 2014. URL: https://doi.org/10.1186/1471-2105-15-S9-S5.
  15. Mingfu Shao and Carl Kingsford. Accurate assembly of transcripts through phase-preserving graph decomposition. Nature Biotechnology, 35(12):1167-1169, 2017.
  16. Mingfu Shao and Carl Kingsford. Theory and a heuristic for the minimum path flow decomposition problem. IEEE ACM Trans. Comput. Biol. Bioinform., 16(2):658-670, 2019. URL: https://doi.org/10.1109/TCBB.2017.2779509.
  17. Alexandru I. Tomescu, Anna Kuosmanen, Romeo Rizzi, and Veli Mäkinen. A novel min-cost flow method for estimating transcript expression with RNA-Seq. BMC Bioinform., 14(S-5):S15, 2013. URL: https://doi.org/10.1186/1471-2105-14-S5-S15.
  18. Esko Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249-260, 1995. URL: https://doi.org/10.1007/BF01206331.
  19. Benedicte Vatinlen, Fabrice Chauvet, Philippe Chrétienne, and Philippe Mahey. Simple bounds and greedy algorithms for decomposing a flow into a minimal set of paths. Eur. J. Oper. Res., 185(3):1390-1401, 2008. URL: https://doi.org/10.1016/j.ejor.2006.05.043.
  20. Peter Weiner. Linear pattern matching algorithms. In 14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, October 15-17, 1973, pages 1-11. IEEE Computer Society, 1973. URL: https://doi.org/10.1109/SWAT.1973.13.
  21. Lucia Williams, Alexandru I. Tomescu, and Brendan Mumey. Flow decomposition with subpath constraints. In Alessandra Carbone and Mohammed El-Kebir, editors, 21st International Workshop on Algorithms in Bioinformatics, WABI 2021, August 2-4, 2021, Virtual Conference, volume 201 of LIPIcs, pages 16:1-16:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. URL: https://doi.org/10.4230/LIPIcs.WABI.2021.16.
  22. Lucia Williams, Alexandru I. Tomescu, and Brendan Mumey. Flow decomposition with subpath constraints. IEEE/ACM Transactions on Computational Biology and Bioinformatics, pages 1-1, 2022. URL: https://doi.org/10.1109/TCBB.2022.3147697.
  23. Ting Yu, Zengchao Mu, Zhaoyuan Fang, Xiaoping Liu, Xin Gao, and Juntao Liu. TransBorrow: genome-guided transcriptome assembly by borrowing assemblies from different assemblers. Genome Research, 30(8):1181-1190, 2020.
  24. David Zuckerman. Linear degree extractors and the inapproximability of max clique and chromatic number. Theory Comput., 3(1):103-128, 2007. URL: https://doi.org/10.4086/toc.2007.v003a006.