Search Results

Documents authored by Zhang, Haowen


Document
Validating Paired-End Read Alignments in Sequence Graphs

Authors: Chirag Jain, Haowen Zhang, Alexander Dilthey, and Srinivas Aluru

Published in: LIPIcs, Volume 143, 19th International Workshop on Algorithms in Bioinformatics (WABI 2019)


Abstract
Graph based non-linear reference structures such as variation graphs and colored de Bruijn graphs enable incorporation of full genomic diversity within a population. However, transitioning from a simple string-based reference to graphs requires addressing many computational challenges, one of which concerns accurately mapping sequencing read sets to graphs. Paired-end Illumina sequencing is a commonly used sequencing platform in genomics, where the paired-end distance constraints allow disambiguation of repeats. Many recent works have explored provably good index-based and alignment-based strategies for mapping individual reads to graphs. However, validating distance constraints efficiently over graphs is not trivial, and existing sequence to graph mappers rely on heuristics. We introduce a mathematical formulation of the problem, and provide a new algorithm to solve it exactly. We take advantage of the high sparsity of reference graphs, and use sparse matrix-matrix multiplications (SpGEMM) to build an index which can be queried efficiently by a mapping algorithm for validating the distance constraints. Effectiveness of the algorithm is demonstrated using real reference graphs, including a human MHC variation graph, and a pan-genome de-Bruijn graph built using genomes of 20 B. anthracis strains. While the one-time indexing time can vary from a few minutes to a few hours using our algorithm, answering a million distance queries takes less than a second.

Cite as

Chirag Jain, Haowen Zhang, Alexander Dilthey, and Srinivas Aluru. Validating Paired-End Read Alignments in Sequence Graphs. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 17:1-17:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{jain_et_al:LIPIcs.WABI.2019.17,
  author =	{Jain, Chirag and Zhang, Haowen and Dilthey, Alexander and Aluru, Srinivas},
  title =	{{Validating Paired-End Read Alignments in Sequence Graphs}},
  booktitle =	{19th International Workshop on Algorithms in Bioinformatics (WABI 2019)},
  pages =	{17:1--17:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-123-8},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{143},
  editor =	{Huber, Katharina T. and Gusfield, Dan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2019.17},
  URN =		{urn:nbn:de:0030-drops-110470},
  doi =		{10.4230/LIPIcs.WABI.2019.17},
  annote =	{Keywords: Sequence graphs, read mapping, index, sparse matrix-matrix multiplication}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail