License: Creative Commons Attribution 3.0 Unported license (CC-BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2016.15
URN: urn:nbn:de:0030-drops-60791
URL: https://drops.dagstuhl.de/opus/volltexte/2016/6079/
Go to the corresponding LIPIcs Volume Portal


Jiang, Haitao ; Fan, Chenglin ; Yang, Boting ; Zhong, Farong ; Zhu, Daming ; Zhu, Binhai

Genomic Scaffold Filling Revisited

pdf-format:
LIPIcs-CPM-2016-15.pdf (0.4 MB)


Abstract

The genomic scaffold filling problem has attracted a lot of attention recently. The problem is on filling an incomplete sequence (scaffold) I into I', with respect to a complete reference genome G, such that the number of adjacencies between G and I' is maximized. The problem is NP-complete and APX-hard, and admits a 1.2-approximation. However, the sequence input I is not quite practical and does not fit most of the real datasets (where a scaffold is more often given as a list of contigs). In this paper, we revisit the genomic scaffold filling problem by considering this important case when, (1) a scaffold S is given, the missing genes X = c(G) - c(S) can only be inserted in between the contigs, and the objective is to maximize the number of adjacencies between G and the filled S' and (2) a scaffold S is given, a subset of the missing genes X' subset X = c(G) - c(S) can only be inserted in between the contigs, and the objective is still to maximize the number of adjacencies between G and the filled S''. For problem (1), we present a simple NP-completeness proof, we then present a factor-2 greedy approximation algorithm, and finally we show that the problem is FPT when each gene appears at most d times in G. For problem (2), we prove that the problem is W[1]-hard and then we present a factor-2 FPT-approximation for the case when each gene appears at most d times in G.

BibTeX - Entry

@InProceedings{jiang_et_al:LIPIcs:2016:6079,
  author =	{Haitao Jiang and Chenglin Fan and Boting Yang and Farong Zhong and Daming Zhu and Binhai Zhu},
  title =	{{Genomic Scaffold Filling Revisited}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{15:1--15:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Roberto Grossi and Moshe Lewenstein},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2016/6079},
  URN =		{urn:nbn:de:0030-drops-60791},
  doi =		{10.4230/LIPIcs.CPM.2016.15},
  annote =	{Keywords: Computational biology, Approximation algorithms, FPT algorithms, NP- completeness}
}

Keywords: Computational biology, Approximation algorithms, FPT algorithms, NP- completeness
Collection: 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)
Issue Date: 2016
Date of publication: 27.06.2016


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI