eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2017-06-30
27:1
27:17
10.4230/LIPIcs.CPM.2017.27
article
Beyond Adjacency Maximization: Scaffold Filling for New String Distances
Bulteau, Laurent
Fertin, Guillaume
Komusiewicz, Christian
In Genomic Scaffold Filling, one aims at polishing in silico a draft genome, called scaffold. The scaffold is given in the form of an ordered set of gene sequences, called contigs. This is done by confronting the scaffold to an already complete reference genome from a close species. More precisely, given a scaffold S, a reference genome G and a score function f() between two genomes, the aim is to complete S by adding the missing genes from G so that the obtained complete genome S* optimizes f(S*, G). In this paper, we extend a model of Jiang et al. [CPM 2016] (i) by allowing the insertions of strings instead of single characters (i.e., some groups of genes may be forced to be inserted together) and (ii) by considering two alternative score functions: the first generalizes the notion of common adjacencies by maximizing the number of common k-mers between S* and G (k-Mer Scaffold Filling), the second aims at minimizing the number of breakpoints between S* and G (Min-Breakpoint Scaffold Filling). We study these problems from the parameterized complexity point of view, providing fixed-parameter (FPT) algorithms for both problems. In particular, we show that k-Mer Scaffold Filling is FPT wrt. parameter l, the number of additional k-mers realized by the completion of S—this answers an open question of Jiang et al. [CPM 2016]. We also show that Min-Breakpoint Scaffold Filling is FPT wrt. a parameter combining the number of missing genes, the number of gene repetitions and the target distance.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol078-cpm2017/LIPIcs.CPM.2017.27/LIPIcs.CPM.2017.27.pdf
computational biology
strings
FPT algorithms
kernelization