Norri, Tuukka ;
Cazaux, Bastien ;
Kosolobov, Dmitry ;
Mäkinen, Veli
Minimum Segmentation for Pangenomic Founder Reconstruction in Linear Time
Abstract
Given a threshold L and a set R = {R_1, ..., R_m} of m strings (haplotype sequences), each having length n, the minimum segmentation problem for founder reconstruction is to partition [1,n] into set P of disjoint segments such that each segment [a,b] in P has length at least L and the number d(a,b)={R_i[a,b] : 1 <= i <= m} of distinct substrings at segment [a,b] is minimized over [a,b] in P. The distinct substrings in the segments represent founder blocks that can be concatenated to form max{d(a,b) : [a,b] in P} founder sequences representing the original R such that crossovers happen only at segment boundaries. We give an optimal O(mn) time algorithm to solve the problem, improving over earlier O(mn^2). This improvement enables to exploit the algorithm on a pangenomic setting of input strings being aligned haplotype sequences of complete human chromosomes, with a goal of finding a representative set of references that can be indexed for read alignment and variant calling. We implemented the new algorithm and give some experimental evidence on the practicality of the approach on this pangenomic setting.
BibTeX  Entry
@InProceedings{norri_et_al:LIPIcs:2018:9317,
author = {Tuukka Norri and Bastien Cazaux and Dmitry Kosolobov and Veli M{\"a}kinen},
title = {{Minimum Segmentation for Pangenomic Founder Reconstruction in Linear Time}},
booktitle = {18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
pages = {15:115:15},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959770828},
ISSN = {18688969},
year = {2018},
volume = {113},
editor = {Laxmi Parida and Esko Ukkonen},
publisher = {Schloss DagstuhlLeibnizZentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2018/9317},
URN = {urn:nbn:de:0030drops93175},
doi = {10.4230/LIPIcs.WABI.2018.15},
annote = {Keywords: Pangenome indexing, founder reconstruction, dynamic programming, positional BurrowsWheeler transform, range minimum query}
}
2018
Keywords: 

Pangenome indexing, founder reconstruction, dynamic programming, positional BurrowsWheeler transform, range minimum query 
Seminar: 

18th International Workshop on Algorithms in Bioinformatics (WABI 2018)

Issue date: 

2018 
Date of publication: 

2018 