Search Results

Documents authored by Lai, Wenfeng


Document
Beyond the Longest Letter-Duplicated Subsequence Problem

Authors: Wenfeng Lai, Adiesha Liyanage, Binhai Zhu, and Peng Zou

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)


Abstract
Motivated by computing duplication patterns in sequences, a new fundamental problem called the longest letter-duplicated subsequence (LLDS) is proposed. Given a sequence S of length n, a letter-duplicated subsequence is a subsequence of S in the form of x₁^{d₁}x₂^{d₂}⋯ x_k^{d_k} with x_i ∈ Σ, x_j≠ x_{j+1} and d_i ≥ 2 for all i in [k] and j in [k-1]. A linear time algorithm for computing the longest letter-duplicated subsequence (LLDS) of S can be easily obtained. In this paper, we focus on two variants of this problem. We first consider the constrained version when Σ is unbounded, each letter appears in S at least 6 times and all the letters in Σ must appear in the solution. We show that the problem is NP-hard (a further twist indicates that the problem does not admit any polynomial time approximation). The reduction is from possibly the simplest version of SAT that is NP-complete, (≤ 2,1, ≤ 3)-SAT, where each variable appears at most twice positively and exact once negatively, and each clause contains at most three literals and some clauses must contain exactly two literals. (We hope that this technique will serve as a general tool to help us proving the NP-hardness for some more tricky sequence problems involving only one sequence - much harder than with at least two input sequences, which we apply successfully at the end of the paper on some extra variations of the LLDS problem.) We then show that when each letter appears in S at most 3 times, then the problem admits a factor 1.5-O(1/n) approximation. Finally, we consider the weighted version, where the weight of a block x_i^{d_i} (d_i ≥ 2) could be any positive function which might not grow with d_i. We give a non-trivial O(n²) time dynamic programming algorithm for this version, i.e., computing an LD-subsequence of S whose weight is maximized.

Cite as

Wenfeng Lai, Adiesha Liyanage, Binhai Zhu, and Peng Zou. Beyond the Longest Letter-Duplicated Subsequence Problem. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 7:1-7:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{lai_et_al:LIPIcs.CPM.2022.7,
  author =	{Lai, Wenfeng and Liyanage, Adiesha and Zhu, Binhai and Zou, Peng},
  title =	{{Beyond the Longest Letter-Duplicated Subsequence Problem}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{7:1--7:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.7},
  URN =		{urn:nbn:de:0030-drops-161348},
  doi =		{10.4230/LIPIcs.CPM.2022.7},
  annote =	{Keywords: Segmental duplications, Tandem duplications, Longest common subsequence, NP-completeness, Dynamic programming}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail