Document

**Published in:** LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)

The decomposition of flow-networks is an essential part of many transcriptome assembly algorithms used in Computational Biology. The addition of subpath constraints to this decomposition appeared recently as an effective way to incorporate longer, already known, portions of the transcript. The problem is defined as follows: given a weakly connected directed acyclic flow network G = (V, E, f) and a set ℛ of subpaths in G, find a flow decomposition so that every subpath in ℛ is included in some flow in the decomposition [Williams et al., WABI 2021]. The authors of that work presented an exponential time algorithm for determining the feasibility of such a flow decomposition, and more recently presented an O(|E| + L+|ℛ|³) time algorithm, where L is the sum of the path lengths in ℛ [Williams et al., TCBB 2022]. Our work provides an improved, linear O(|E| + L) time algorithm for determining the feasibility of such a flow decomposition. We also introduce two natural optimization variants of the feasibility problem: (i) determining the minimum sized subset of ℛ that must be removed to make a flow decomposition feasible, and (ii) determining the maximum sized subset of ℛ that can be maintained while making a flow decomposition feasible. We show that, under the assumption P ≠ NP, (i) does not admit a polynomial-time o(log |V|)-approximation algorithm and (ii) does not admit a polynomial-time O(|V|^{1/2-ε} + |ℛ|^{1-ε})-approximation algorithm for any constant ε > 0.

Daniel Gibney, Sharma V. Thankachan, and Srinivas Aluru. Feasibility of Flow Decomposition with Subpath Constraints in Linear Time. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 17:1-17:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{gibney_et_al:LIPIcs.WABI.2022.17, author = {Gibney, Daniel and Thankachan, Sharma V. and Aluru, Srinivas}, title = {{Feasibility of Flow Decomposition with Subpath Constraints in Linear Time}}, booktitle = {22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)}, pages = {17:1--17:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-243-3}, ISSN = {1868-8969}, year = {2022}, volume = {242}, editor = {Boucher, Christina and Rahmann, Sven}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.17}, URN = {urn:nbn:de:0030-drops-170516}, doi = {10.4230/LIPIcs.WABI.2022.17}, annote = {Keywords: Flow networks, flow decomposition, subpath constraints} }

Document

**Published in:** LIPIcs, Volume 143, 19th International Workshop on Algorithms in Bioinformatics (WABI 2019)

Graph based non-linear reference structures such as variation graphs and colored de Bruijn graphs enable incorporation of full genomic diversity within a population. However, transitioning from a simple string-based reference to graphs requires addressing many computational challenges, one of which concerns accurately mapping sequencing read sets to graphs. Paired-end Illumina sequencing is a commonly used sequencing platform in genomics, where the paired-end distance constraints allow disambiguation of repeats. Many recent works have explored provably good index-based and alignment-based strategies for mapping individual reads to graphs. However, validating distance constraints efficiently over graphs is not trivial, and existing sequence to graph mappers rely on heuristics. We introduce a mathematical formulation of the problem, and provide a new algorithm to solve it exactly. We take advantage of the high sparsity of reference graphs, and use sparse matrix-matrix multiplications (SpGEMM) to build an index which can be queried efficiently by a mapping algorithm for validating the distance constraints. Effectiveness of the algorithm is demonstrated using real reference graphs, including a human MHC variation graph, and a pan-genome de-Bruijn graph built using genomes of 20 B. anthracis strains. While the one-time indexing time can vary from a few minutes to a few hours using our algorithm, answering a million distance queries takes less than a second.

Chirag Jain, Haowen Zhang, Alexander Dilthey, and Srinivas Aluru. Validating Paired-End Read Alignments in Sequence Graphs. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 17:1-17:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{jain_et_al:LIPIcs.WABI.2019.17, author = {Jain, Chirag and Zhang, Haowen and Dilthey, Alexander and Aluru, Srinivas}, title = {{Validating Paired-End Read Alignments in Sequence Graphs}}, booktitle = {19th International Workshop on Algorithms in Bioinformatics (WABI 2019)}, pages = {17:1--17:13}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-123-8}, ISSN = {1868-8969}, year = {2019}, volume = {143}, editor = {Huber, Katharina T. and Gusfield, Dan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2019.17}, URN = {urn:nbn:de:0030-drops-110470}, doi = {10.4230/LIPIcs.WABI.2019.17}, annote = {Keywords: Sequence graphs, read mapping, index, sparse matrix-matrix multiplication} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail