Abstract
The genomic scaffold filling problem has attracted a lot of attention recently. The problem is on filling an incomplete sequence (scaffold) I into I', with respect to a complete reference genome G, such that the number of adjacencies between G and I' is maximized. The problem is NPcomplete and APXhard, and admits a 1.2approximation. However, the sequence input I is not quite practical and does not fit most of the real datasets (where a scaffold is more often given as a list of contigs). In this paper, we revisit the genomic scaffold filling problem by considering this important case when, (1) a scaffold S is given, the missing genes X = c(G)  c(S) can only be inserted in between the contigs, and the objective is to maximize the number of adjacencies between G and the filled S' and (2) a scaffold S is given, a subset of the missing genes X' subset X = c(G)  c(S) can only be inserted in between the contigs, and the objective is still to maximize the number of adjacencies between G and the filled S''. For problem (1), we present a simple NPcompleteness proof, we then present a factor2 greedy approximation algorithm, and finally we show that the problem is FPT when each gene appears at most d times in G. For problem (2), we prove that the problem is W[1]hard and then we present a factor2 FPTapproximation for the case when each gene appears at most d times in G.
BibTeX  Entry
@InProceedings{jiang_et_al:LIPIcs:2016:6079,
author = {Haitao Jiang and Chenglin Fan and Boting Yang and Farong Zhong and Daming Zhu and Binhai Zhu},
title = {{Genomic Scaffold Filling Revisited}},
booktitle = {27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
pages = {15:115:13},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959770125},
ISSN = {18688969},
year = {2016},
volume = {54},
editor = {Roberto Grossi and Moshe Lewenstein},
publisher = {Schloss DagstuhlLeibnizZentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2016/6079},
URN = {urn:nbn:de:0030drops60791},
doi = {10.4230/LIPIcs.CPM.2016.15},
annote = {Keywords: Computational biology, Approximation algorithms, FPT algorithms, NP completeness}
}
Keywords: 

Computational biology, Approximation algorithms, FPT algorithms, NP completeness 
Collection: 

27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016) 
Issue Date: 

2016 
Date of publication: 

27.06.2016 