Haplotype Threading Using the Positional Burrows-Wheeler Transform

Authors Ahsan Sanaullah, Degui Zhi, Shaojie Zhang

Author Details

Ahsan Sanaullah
  • Department of Computer Science, University of Central Florida, Orlando, FL, USA
Degui Zhi
  • School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, USA
Shaojie Zhang
  • Department of Computer Science, University of Central Florida, Orlando, FL, USA

Cite As

Ahsan Sanaullah, Degui Zhi, and Shaojie Zhang. Haplotype Threading Using the Positional Burrows-Wheeler Transform. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 4:1-4:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


In the classic model of population genetics, one haplotype (the query) is modeled as a mosaic of segments copied from a number of haplotypes in a panel, a process described as threading the haplotype through the panel. The Li and Stephens model parameterized this problem using a hidden Markov model (HMM). However, HMM algorithms run in time linear in the sample size and can be very expensive for biobank-scale panels. Here, we formulate the haplotype threading problem as the Minimal Positional Substring Cover problem, where a query is represented by a mosaic of a minimal number of substring matches from the panel. We show that this problem can be solved by a sequence of greedy set-maximal matches. Moreover, the solution space can be bounded by the leftmost and rightmost solutions produced by the greedy approach. Based on these results, we formulate and solve several variations of this problem. Although our results are yet to be generalized to cases with mismatches, they offer a theoretical framework for designing methods for genotype imputation and haplotype phasing.
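As an illustration of the greedy idea described above, the following is a minimal sketch that builds a positional substring cover by repeatedly taking the longest exact match that starts where the previous one ended. It is not the PBWT-based algorithm of the paper: it scans every panel haplotype directly, so it takes time proportional to the panel size per step. The names `longest_match_from` and `greedy_cover` and the representation of haplotypes as equal-length strings are assumptions made for this example only.

```python
from typing import List, Optional, Tuple

# (start, end_exclusive, panel_row): query[start:end] equals panel[row][start:end]
Segment = Tuple[int, int, int]


def longest_match_from(panel: List[str], query: str, start: int) -> Tuple[int, int]:
    """Return (end, row) of the longest positional match beginning at `start`.

    A positional match must occupy the same sites in the query and in the
    panel haplotype. If no haplotype matches at `start`, row is -1.
    """
    best_end, best_row = start, -1
    for row, hap in enumerate(panel):
        end = start
        while end < len(query) and hap[end] == query[end]:
            end += 1
        if end > best_end:
            best_end, best_row = end, row
    return best_end, best_row


def greedy_cover(panel: List[str], query: str) -> Optional[List[Segment]]:
    """Greedy left-to-right cover of `query` by positional substrings of `panel`.

    Returns None when some site of the query matches no haplotype, in which
    case no exact cover exists.
    """
    if any(len(hap) != len(query) for hap in panel):
        raise ValueError("all haplotypes must have the same length as the query")
    cover: List[Segment] = []
    pos = 0
    while pos < len(query):
        end, row = longest_match_from(panel, query, pos)
        if row < 0:
            return None  # this site cannot be covered without mismatches
        cover.append((pos, end, row))
        pos = end
    return cover


if __name__ == "__main__":
    panel = ["00110", "11001", "01010"]
    print(greedy_cover(panel, "00001"))
```

Because any substring of a positional match is itself a positional match, jumping to the furthest reachable site at each step uses the fewest possible segments. This sketch handles exact matches only, in line with the abstract's note that the mismatch case is still open.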

Subject Classification

ACM Subject Classification
  • Applied computing → Computational biology
  • Applied computing → Genetics
Keywords
  • Substring Cover
  • PBWT
  • Haplotype Threading
  • Haplotype Matching



References
  1. Olivier Delaneau, Jean-François Zagury, Matthew R Robinson, Jonathan L Marchini, and Emmanouil T Dermitzakis. Accurate, scalable and integrative haplotype estimation. Nature communications, 10(1):1-10, 2019.
  2. Richard Durbin. Efficient haplotype matching and storage using the positional Burrows-Wheeler transform (PBWT). Bioinformatics, 30(9):1266-1272, 2014.
  3. Na Li and Matthew Stephens. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics, 165(4):2213-2233, 2003.
  4. Po-Ru Loh, Petr Danecek, Pier Francesco Palamara, Christian Fuchsberger, Yakir A Reshef, Hilary K Finucane, Sebastian Schoenherr, Lukas Forer, Shane McCarthy, Goncalo R Abecasis, et al. Reference-based phasing using the haplotype reference consortium panel. Nature genetics, 48(11):1443-1448, 2016.
  5. Gerton Lunter. Haplotype matching in large cohorts using the Li and Stephens model. Bioinformatics, 35(5):798-806, 2019.
  6. Ardalan Naseri, Erwin Holzhauser, Degui Zhi, and Shaojie Zhang. Efficient haplotype matching between a query and a panel for genealogical search. Bioinformatics, 35(14):i233-i241, 2019.
  7. Ardalan Naseri, Xiaoming Liu, Kecong Tang, Shaojie Zhang, and Degui Zhi. RaPID: ultra-fast, powerful, and accurate detection of segments identical by descent (IBD) in biobank-scale cohorts. Genome biology, 20(1):1-15, 2019.
  8. Ardalan Naseri, Degui Zhi, and Shaojie Zhang. Multi-allelic positional Burrows-Wheeler transform. BMC bioinformatics, 20(11):1-8, 2019.
  9. Simone Rubinacci, Olivier Delaneau, and Jonathan Marchini. Genotype imputation using the positional Burrows Wheeler transform. PLoS genetics, 16(11):e1009049, 2020.
  10. Ahsan Sanaullah, Degui Zhi, and Shaojie Zhang. d-PBWT: dynamic positional Burrows-Wheeler transform. Bioinformatics, 37(16):2390-2397, 2021.
  11. William Yue, Ardalan Naseri, Victor Wang, Pramesh Shakya, Shaojie Zhang, and Degui Zhi. P-smoother: Efficient PBWT smoothing of large haplotype panels. Bioinformatics Advances, 2022. URL: https://doi.org/10.1093/bioadv/vbac045.