Search Results

Documents authored by Ma, Cong


Document
Exact Transcript Quantification Over Splice Graphs

Authors: Cong Ma, Hongyu Zheng, and Carl Kingsford

Published in: LIPIcs, Volume 172, 20th International Workshop on Algorithms in Bioinformatics (WABI 2020)


Abstract
The probability of sequencing a set of RNA-seq reads can be directly modeled using the abundances of splice junctions in splice graphs instead of the abundances of a list of transcripts. We call this model graph quantification, which was first proposed by Bernard et al. (2014). The model can be viewed as a generalization of transcript expression quantification where every full path in the splice graph is a possible transcript. However, the previous graph quantification model assumes the length of single-end reads or paired-end fragments is fixed. We provide an improvement of this model to handle variable-length reads or fragments and incorporate bias correction. We prove that our model is equivalent to running a transcript quantifier with exactly the set of all compatible transcripts. The key to our method is constructing an extension of the splice graph based on Aho-Corasick automata. The proof of equivalence is based on a novel reparameterization of the read generation model of a state-of-art transcript quantification method. This new approach is useful for modeling scenarios where reference transcriptome is incomplete or not available and can be further used in transcriptome assembly or alternative splicing analysis.

Cite as

Cong Ma, Hongyu Zheng, and Carl Kingsford. Exact Transcript Quantification Over Splice Graphs. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 12:1-12:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{ma_et_al:LIPIcs.WABI.2020.12,
  author =	{Ma, Cong and Zheng, Hongyu and Kingsford, Carl},
  title =	{{Exact Transcript Quantification Over Splice Graphs}},
  booktitle =	{20th International Workshop on Algorithms in Bioinformatics (WABI 2020)},
  pages =	{12:1--12:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-161-0},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{172},
  editor =	{Kingsford, Carl and Pisanti, Nadia},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2020.12},
  URN =		{urn:nbn:de:0030-drops-128013},
  doi =		{10.4230/LIPIcs.WABI.2020.12},
  annote =	{Keywords: RNA-seq, alternative splicing, transcript quantification, splice graph, network flow}
}
Document
Detecting Transcriptomic Structural Variants in Heterogeneous Contexts via the Multiple Compatible Arrangements Problem

Authors: Yutong Qiu, Cong Ma, Han Xie, and Carl Kingsford

Published in: LIPIcs, Volume 143, 19th International Workshop on Algorithms in Bioinformatics (WABI 2019)


Abstract
Transcriptomic structural variants (TSVs) - large-scale transcriptome sequence change due to structural variation - are common, especially in cancer. Detecting TSVs is a challenging computational problem. Sample heterogeneity (including differences between alleles in diploid organisms) is a critical confounding factor when identifying TSVs. To improve TSV detection in heterogeneous RNA-seq samples, we introduce the Multiple Compatible Arrangement Problem (MCAP), which seeks k genome rearrangements to maximize the number of reads that are concordant with at least one rearrangement. This directly models the situation of a heterogeneous or diploid sample. We prove that MCAP is NP-hard and provide a 1/4-approximation algorithm for k=1 and a 3/4-approximation algorithm for the diploid case (k=2) assuming an oracle for k=1. Combining these, we obtain a 3/16-approximation algorithm for MCAP when k=2 (without an oracle). We also present an integer linear programming formulation for general k. We characterize the graph structures that require k>1 to satisfy all edges and show such structures are prevalent in cancer samples. We evaluate our algorithms on 381 TCGA samples and 2 cancer cell lines and show improved performance compared to the state-of-the-art TSV-calling tool, SQUID.

Cite as

Yutong Qiu, Cong Ma, Han Xie, and Carl Kingsford. Detecting Transcriptomic Structural Variants in Heterogeneous Contexts via the Multiple Compatible Arrangements Problem. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 18:1-18:5, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{qiu_et_al:LIPIcs.WABI.2019.18,
  author =	{Qiu, Yutong and Ma, Cong and Xie, Han and Kingsford, Carl},
  title =	{{Detecting Transcriptomic Structural Variants in Heterogeneous Contexts via the Multiple Compatible Arrangements Problem}},
  booktitle =	{19th International Workshop on Algorithms in Bioinformatics (WABI 2019)},
  pages =	{18:1--18:5},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-123-8},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{143},
  editor =	{Huber, Katharina T. and Gusfield, Dan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2019.18},
  URN =		{urn:nbn:de:0030-drops-110483},
  doi =		{10.4230/LIPIcs.WABI.2019.18},
  annote =	{Keywords: transcriptomic structural variation, integer linear programming, heterogeneity}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail