Search Results

Documents authored by Jain, Chirag


Document
Co-Linear Chaining on Pangenome Graphs

Authors: Jyotshna Rajput, Ghanshyam Chandra, and Chirag Jain

Published in: LIPIcs, Volume 273, 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)


Abstract
Pangenome reference graphs are useful in genomics because they compactly represent the genetic diversity within a species, a capability that linear references lack. However, efficiently aligning sequences to these graphs with complex topology and cycles can be challenging. The seed-chain-extend based alignment algorithms use co-linear chaining as a standard technique to identify a good cluster of exact seed matches that can be combined to form an alignment. Recent works show how the co-linear chaining problem can be efficiently solved for acyclic pangenome graphs by exploiting their small width [Makinen et al., TALG'19] and how incorporating gap cost in the scoring function improves alignment accuracy [Chandra and Jain, RECOMB'23]. However, it remains open on how to effectively generalize these techniques for general pangenome graphs which contain cycles. Here we present the first practical formulation and an exact algorithm for co-linear chaining on cyclic pangenome graphs. We rigorously prove the correctness and computational complexity of the proposed algorithm. We evaluate the empirical performance of our algorithm by aligning simulated long reads from the human genome to a cyclic pangenome graph constructed from 95 publicly available haplotype-resolved human genome assemblies. While the existing heuristic-based algorithms are faster, the proposed algorithm provides a significant advantage in terms of accuracy.

Cite as

Jyotshna Rajput, Ghanshyam Chandra, and Chirag Jain. Co-Linear Chaining on Pangenome Graphs. In 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 273, pp. 12:1-12:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{rajput_et_al:LIPIcs.WABI.2023.12,
  author =	{Rajput, Jyotshna and Chandra, Ghanshyam and Jain, Chirag},
  title =	{{Co-Linear Chaining on Pangenome Graphs}},
  booktitle =	{23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)},
  pages =	{12:1--12:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-294-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{273},
  editor =	{Belazzougui, Djamal and Ouangraoua, A\"{i}da},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2023.12},
  URN =		{urn:nbn:de:0030-drops-186389},
  doi =		{10.4230/LIPIcs.WABI.2023.12},
  annote =	{Keywords: Sequence alignment, variation graph, genome sequencing, path cover}
}
Document
Validating Paired-End Read Alignments in Sequence Graphs

Authors: Chirag Jain, Haowen Zhang, Alexander Dilthey, and Srinivas Aluru

Published in: LIPIcs, Volume 143, 19th International Workshop on Algorithms in Bioinformatics (WABI 2019)


Abstract
Graph based non-linear reference structures such as variation graphs and colored de Bruijn graphs enable incorporation of full genomic diversity within a population. However, transitioning from a simple string-based reference to graphs requires addressing many computational challenges, one of which concerns accurately mapping sequencing read sets to graphs. Paired-end Illumina sequencing is a commonly used sequencing platform in genomics, where the paired-end distance constraints allow disambiguation of repeats. Many recent works have explored provably good index-based and alignment-based strategies for mapping individual reads to graphs. However, validating distance constraints efficiently over graphs is not trivial, and existing sequence to graph mappers rely on heuristics. We introduce a mathematical formulation of the problem, and provide a new algorithm to solve it exactly. We take advantage of the high sparsity of reference graphs, and use sparse matrix-matrix multiplications (SpGEMM) to build an index which can be queried efficiently by a mapping algorithm for validating the distance constraints. Effectiveness of the algorithm is demonstrated using real reference graphs, including a human MHC variation graph, and a pan-genome de-Bruijn graph built using genomes of 20 B. anthracis strains. While the one-time indexing time can vary from a few minutes to a few hours using our algorithm, answering a million distance queries takes less than a second.

Cite as

Chirag Jain, Haowen Zhang, Alexander Dilthey, and Srinivas Aluru. Validating Paired-End Read Alignments in Sequence Graphs. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 17:1-17:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{jain_et_al:LIPIcs.WABI.2019.17,
  author =	{Jain, Chirag and Zhang, Haowen and Dilthey, Alexander and Aluru, Srinivas},
  title =	{{Validating Paired-End Read Alignments in Sequence Graphs}},
  booktitle =	{19th International Workshop on Algorithms in Bioinformatics (WABI 2019)},
  pages =	{17:1--17:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-123-8},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{143},
  editor =	{Huber, Katharina T. and Gusfield, Dan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2019.17},
  URN =		{urn:nbn:de:0030-drops-110470},
  doi =		{10.4230/LIPIcs.WABI.2019.17},
  annote =	{Keywords: Sequence graphs, read mapping, index, sparse matrix-matrix multiplication}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail