Exact Transcript Quantification Over Splice Graphs

Ma, Cong; Zheng, Hongyu; Kingsford, Carl

doi:10.4230/LIPIcs.WABI.2020.12

File

LIPIcs.WABI.2020.12.pdf

Filesize: 0.64 MB
18 pages

Document Identifiers

DOI: 10.4230/LIPIcs.WABI.2020.12
URN: urn:nbn:de:0030-drops-128013

Author Details

Cong Ma

Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

Hongyu Zheng

Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

Carl Kingsford

Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

Acknowledgements

We would also like to thank Natalie Sauerwald, Dr. Guillaume Marçais, Xiangrui Zeng and Dr. Jose Lugo-Martinez for insightful comments on the manuscript. C.K. is a co-founder of Ocean Genomics, Inc.

Cite As

Cong Ma, Hongyu Zheng, and Carl Kingsford. Exact Transcript Quantification Over Splice Graphs. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 12:1-12:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020) https://doi.org/10.4230/LIPIcs.WABI.2020.12

Abstract

The probability of sequencing a set of RNA-seq reads can be directly modeled using the abundances of splice junctions in splice graphs instead of the abundances of a list of transcripts. We call this model graph quantification, which was first proposed by Bernard et al. (2014). The model can be viewed as a generalization of transcript expression quantification where every full path in the splice graph is a possible transcript. However, the previous graph quantification model assumes the length of single-end reads or paired-end fragments is fixed. We provide an improvement of this model to handle variable-length reads or fragments and incorporate bias correction. We prove that our model is equivalent to running a transcript quantifier with exactly the set of all compatible transcripts. The key to our method is constructing an extension of the splice graph based on Aho-Corasick automata. The proof of equivalence is based on a novel reparameterization of the read generation model of a state-of-the-art transcript quantification method. This new approach is useful for modeling scenarios where the reference transcriptome is incomplete or not available and can be further used in transcriptome assembly or alternative splicing analysis.
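
To make concrete the statement that every full path in the splice graph is a possible transcript, the following minimal Python sketch enumerates all source-to-sink paths of a toy graph. The graph, the exon labels, the artificial `S` and `T` nodes, and the `full_paths` helper are all hypothetical and chosen only for illustration. This is not the method of the paper, which works with splice junction abundances precisely so that the transcript list does not have to be enumerated.

```python
from typing import Dict, List

# Hypothetical toy splice graph: nodes are exons, edges are splice junctions.
# "S" and "T" are artificial source and sink nodes added for illustration.
SPLICE_GRAPH: Dict[str, List[str]] = {
    "S": ["e1"],
    "e1": ["e2", "e3"],  # exon e2 may be skipped
    "e2": ["e3"],
    "e3": ["T"],
    "T": [],
}


def full_paths(graph: Dict[str, List[str]],
               source: str = "S", sink: str = "T") -> List[List[str]]:
    """Return every source-to-sink path of a DAG, found by depth-first search."""
    paths: List[List[str]] = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == sink:
            paths.append(path)
            continue
        for successor in graph[node]:
            stack.append((successor, path + [successor]))
    return sorted(paths)


# Drop the artificial source and sink to obtain exon chains.
candidate_transcripts = [path[1:-1] for path in full_paths(SPLICE_GRAPH)]
print(candidate_transcripts)
```

In this toy graph the two full paths correspond to the isoform that includes exon `e2` and the one that skips it. In realistic splice graphs the number of such paths can grow very quickly with the number of alternative splicing events, which is one motivation for parameterizing the model by junctions rather than by an explicit transcript list.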

Subject Classification

ACM Subject Classification

Applied computing → Computational transcriptomics

Keywords

RNA-seq
alternative splicing
transcript quantification
splice graph
network flow

References

Alfred V Aho and Margaret J Corasick. Efficient string matching: an aid to bibliographic search. Communications of the ACM, 18(6):333-340, 1975.
N Akula, J Barb, X Jiang, JR Wendland, KH Choi, SK Sen, L Hou, DTW Chen, G Laje, K Johnson, et al. RNA-sequencing of the brain transcriptome implicates dysregulation of neuroplasticity, circadian rhythms and GTPase binding in bipolar disorder. Molecular Psychiatry, 19(11):1179-1185, 2014.
Elsa Bernard, Laurent Jacob, Julien Mairal, and Jean-Philippe Vert. Efficient RNA isoform identification and quantification from RNA-Seq data with network flows. Bioinformatics, 30(17):2447-2455, 2014.
Nicolas L Bray, Harold Pimentel, Páll Melsted, and Lior Pachter. Near-optimal probabilistic RNA-seq quantification. Nature Biotechnology, 34(5):525-527, 2016.
Adam Frankish, Mark Diekhans, Anne-Maud Ferreira, Rory Johnson, Irwin Jungreis, Jane Loveland, Jonathan M Mudge, Cristina Sisu, James Wright, Joel Armstrong, et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Research, 47(D1):D766-D773, 2018.
James Hensman, Panagiotis Papastamoulis, Peter Glaus, Antti Honkela, and Magnus Rattray. Fast and accurate approximate inference of transcript expression from RNA-seq data. Bioinformatics, 31(24):3881-3889, 2015.
Yarden Katz, Eric T Wang, Edoardo M Airoldi, and Christopher B Burge. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nature Methods, 7(12):1009, 2010.
Laura H LeGault and Colin N Dewey. Inference of alternative splicing from RNA-Seq data with probabilistic splice graphs. Bioinformatics, 29(18):2300-2310, 2013.
Bo Li and Colin N Dewey. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics, 12(1):323, 2011.
Juntao Liu, Ting Yu, Tao Jiang, and Guojun Li. TransComb: genome-guided transcriptome assembly via combing junctions in splicing graphs. Genome Biology, 17(1):213, 2016.
Lior Pachter. Models for transcript quantification from RNA-Seq. arXiv preprint arXiv:1104.3889, 2011.
Rob Patro, Geet Duggal, Michael I Love, Rafael A Irizarry, and Carl Kingsford. Salmon provides fast and bias-aware quantification of transcript expression. Nature Methods, 14(4):417-419, 2017.
Mihaela Pertea, Geo M Pertea, Corina M Antonescu, Tsung-Cheng Chang, Joshua T Mendell, and Steven L Salzberg. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology, 33(3):290-295, 2015.
Mingfu Shao and Carl Kingsford. Accurate assembly of transcripts through phase-preserving graph decomposition. Nature Biotechnology, 35(12):1167-1169, 2017.
Mingfu Shao and Carl Kingsford. Theory and a heuristic for the minimum path flow decomposition problem. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 16(2):658-670, 2019.
Shihao Shen, Juw Won Park, Zhi-xiang Lu, Lan Lin, Michael D Henry, Ying Nian Wu, Qing Zhou, and Yi Xing. rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-Seq data. Proceedings of the National Academy of Sciences, 111(51):E5593-E5601, 2014.
Manuel Tardaguila, Lorena De La Fuente, Cristina Marti, Cécile Pereira, Francisco Jose Pardo-Palacios, Hector Del Risco, Marc Ferrell, Maravillas Mellado, Marissa Macchietto, Kenneth Verheggen, et al. SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification. Genome Research, 28(3):396-411, 2018.
Cole Trapnell, Brian A Williams, Geo Pertea, Ali Mortazavi, Gordon Kwan, Marijke J Van Baren, Steven L Salzberg, Barbara J Wold, and Lior Pachter. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology, 28(5):511-515, 2010.
Juan L Trincado, Juan C Entizne, Gerald Hysenaj, Babita Singh, Miha Skalic, David J Elliott, and Eduardo Eyras. SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions. Genome Biology, 19(1):40, 2018.
Yun C Yung, Nicole C Stoddard, Hope Mirendil, and Jerold Chun. Lysophosphatidic acid signaling in the nervous system. Neuron, 85(4):669-682, 2015.
