An Average-Case Sublinear Exact Li and Stephens Forward Algorithm

Authors: Yohei M. Rosen, Benedict J. Paten


Author Details

Yohei M. Rosen
  • University of California, Santa Cruz, California
  • New York University School of Medicine, New York, New York
Benedict J. Paten
  • University of California, Santa Cruz, California

Cite As

Yohei M. Rosen and Benedict J. Paten. An Average-Case Sublinear Exact Li and Stephens Forward Algorithm. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 9:1-9:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Hidden Markov models of haplotype inheritance, such as the Li and Stephens model, allow for computationally tractable probability calculations using the forward algorithm, as long as the representative reference panel used in the model is sufficiently small. Specifically, the forward algorithm for the monoploid Li and Stephens model and its variants runs in time linear in the reference panel size unless heuristic approximations are used. However, sequencing projects numbering in the thousands to hundreds of thousands of individuals are underway, and others numbering in the millions are anticipated. To make the Li and Stephens forward algorithm computationally tractable for these datasets, we have created a numerically exact version of the algorithm with observed average-case O(nk^{0.35}) runtime in the number of genetic sites n and the reference panel size k. This avoids any tradeoff between runtime and model complexity. We demonstrate that our approach also provides a succinct data structure for general-purpose haplotype data storage. We discuss generalizations of our algorithmic techniques to other hidden Markov models.
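
For orientation, the baseline that the abstract describes as linear in the reference panel size can be sketched as follows. This is a minimal illustration of the standard O(nk) forward recursion for a simplified monoploid Li and Stephens model, not the authors' sublinear algorithm. The function name, the constant per-site recombination probability rho, the mismatch probability mu, and the biallelic 0/1 encoding are all assumptions made for the example.

```python
from typing import Sequence


def li_stephens_forward(
    panel: Sequence[Sequence[int]],
    query: Sequence[int],
    rho: float = 0.01,   # assumed per-site recombination probability
    mu: float = 0.001,   # assumed per-site mismatch (mutation) probability
) -> float:
    """Return P(query | panel) under a simplified Li and Stephens model.

    panel holds k reference haplotypes, each a sequence of n alleles (0 or 1).
    Runtime is O(n * k): every site touches every reference haplotype.
    """
    k = len(panel)
    n = len(query)

    def emission(j: int, i: int) -> float:
        # Probability of the query allele at site i given it copies haplotype j.
        return 1.0 - mu if panel[j][i] == query[i] else mu

    # Site 0: uniform prior over the k haplotypes being copied.
    forward = [emission(j, 0) / k for j in range(k)]

    for i in range(1, n):
        total = sum(forward)  # shared recombination term, computed once per site
        forward = [
            emission(j, i) * ((1.0 - rho) * forward[j] + rho * total / k)
            for j in range(k)
        ]

    return sum(forward)
```

The inner list comprehension is the source of the linear dependence on k. A practical implementation would also rescale or work in log space, since the raw probabilities underflow for large n.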

Subject Classification

ACM Subject Classification
  • Theory of computation → Streaming, sublinear and near linear time algorithms
  • Applied computing → Bioinformatics

Keywords
  • Haplotype
  • Hidden Markov Model
  • Forward Algorithm
  • Lazy Evaluation



