33 Search Results for "Zhu, Binhai"


Volume

LIPIcs, Volume 105

29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

CPM 2018, July 2-4, 2018, Qingdao, China

Editors: Gonzalo Navarro, David Sankoff, and Binhai Zhu

Document
Beyond the Longest Letter-Duplicated Subsequence Problem

Authors: Wenfeng Lai, Adiesha Liyanage, Binhai Zhu, and Peng Zou

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)


Abstract
Motivated by computing duplication patterns in sequences, a new fundamental problem called the longest letter-duplicated subsequence (LLDS) is proposed. Given a sequence S of length n, a letter-duplicated subsequence is a subsequence of S in the form of x₁^{d₁}x₂^{d₂}⋯ x_k^{d_k} with x_i ∈ Σ, x_j≠ x_{j+1} and d_i ≥ 2 for all i in [k] and j in [k-1]. A linear time algorithm for computing the longest letter-duplicated subsequence (LLDS) of S can be easily obtained. In this paper, we focus on two variants of this problem. We first consider the constrained version when Σ is unbounded, each letter appears in S at least 6 times and all the letters in Σ must appear in the solution. We show that the problem is NP-hard (a further twist indicates that the problem does not admit any polynomial time approximation). The reduction is from possibly the simplest version of SAT that is NP-complete, (≤ 2,1, ≤ 3)-SAT, where each variable appears at most twice positively and exact once negatively, and each clause contains at most three literals and some clauses must contain exactly two literals. (We hope that this technique will serve as a general tool to help us proving the NP-hardness for some more tricky sequence problems involving only one sequence - much harder than with at least two input sequences, which we apply successfully at the end of the paper on some extra variations of the LLDS problem.) We then show that when each letter appears in S at most 3 times, then the problem admits a factor 1.5-O(1/n) approximation. Finally, we consider the weighted version, where the weight of a block x_i^{d_i} (d_i ≥ 2) could be any positive function which might not grow with d_i. We give a non-trivial O(n²) time dynamic programming algorithm for this version, i.e., computing an LD-subsequence of S whose weight is maximized.

Cite as

Wenfeng Lai, Adiesha Liyanage, Binhai Zhu, and Peng Zou. Beyond the Longest Letter-Duplicated Subsequence Problem. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 7:1-7:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{lai_et_al:LIPIcs.CPM.2022.7,
  author =	{Lai, Wenfeng and Liyanage, Adiesha and Zhu, Binhai and Zou, Peng},
  title =	{{Beyond the Longest Letter-Duplicated Subsequence Problem}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{7:1--7:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.7},
  URN =		{urn:nbn:de:0030-drops-161348},
  doi =		{10.4230/LIPIcs.CPM.2022.7},
  annote =	{Keywords: Segmental duplications, Tandem duplications, Longest common subsequence, NP-completeness, Dynamic programming}
}
Document
Genomic Problems Involving Copy Number Profiles: Complexity and Algorithms

Authors: Manuel Lafond, Binhai Zhu, and Peng Zou

Published in: LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)


Abstract
Recently, due to the genomic sequence analysis in several types of cancer, genomic data based on copy number profiles (CNP for short) are getting more and more popular. A CNP is a vector where each component is a non-negative integer representing the number of copies of a specific segment of interest. The motivation is that in the late stage of certain types of cancer, the genomes are progressing rapidly by segmental duplications and deletions, and hence obtaining the exact sequences becomes difficult. Instead, the number of copies of important segments can be predicted from expression analysis and carries important biological information. Therefore, significant research has recently been devoted to the analysis of genomic data represented as CNP’s. In this paper, we present two streams of results. The first is the negative results on two open problems regarding the computational complexity of the Minimum Copy Number Generation (MCNG) problem posed by Qingge et al. in 2018. The Minimum Copy Number Generation (MCNG) is defined as follows: given a string S in which each character represents a gene or segment, and a CNP C, compute a string T from S, with the minimum number of segmental duplications and deletions, such that cnp(T)=C. It was shown by Qingge et al. that the problem is NP-hard if the duplications are tandem and they left the open question of whether the problem remains NP-hard if arbitrary duplications and/or deletions are used. We answer this question affirmatively in this paper; in fact, we prove that it is NP-hard to even obtain a constant factor approximation. This is achieved through a general-purpose lemma on set-cover reductions that require an exact cover in one direction, but not the other, which might be of independent interest. We also prove that the corresponding parameterized version is W[1]-hard, answering another open question by Qingge et al. The other result is positive and is based on a new (and more general) problem regarding CNP’s. The Copy Number Profile Conforming (CNPC) problem is formally defined as follows: given two CNP’s C₁ and C₂, compute two strings S₁ and S₂ with cnp(S₁)=C₁ and cnp(S₂)=C₂ such that the distance between S₁ and S₂, d(S₁,S₂), is minimized. Here, d(S₁,S₂) is a very general term, which means it could be any genome rearrangement distance (like reversal, transposition, and tandem duplication, etc). We make the first step by showing that if d(S₁,S₂) is measured by the breakpoint distance then the problem is polynomially solvable. We expect that this will trigger some related research along the line in the near future.

Cite as

Manuel Lafond, Binhai Zhu, and Peng Zou. Genomic Problems Involving Copy Number Profiles: Complexity and Algorithms. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 22:1-22:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{lafond_et_al:LIPIcs.CPM.2020.22,
  author =	{Lafond, Manuel and Zhu, Binhai and Zou, Peng},
  title =	{{Genomic Problems Involving Copy Number Profiles: Complexity and Algorithms}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{22:1--22:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{G{\o}rtz, Inge Li and Weimann, Oren},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.22},
  URN =		{urn:nbn:de:0030-drops-121471},
  doi =		{10.4230/LIPIcs.CPM.2020.22},
  annote =	{Keywords: Computational genomics, cancer genomics, copy number profiles, NP-hardness, approximation algorithms, FPT algorithms}
}
Document
The Tandem Duplication Distance Is NP-Hard

Authors: Manuel Lafond, Binhai Zhu, and Peng Zou

Published in: LIPIcs, Volume 154, 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)


Abstract
In computational biology, tandem duplication is an important biological phenomenon which can occur either at the genome or at the DNA level. A tandem duplication takes a copy of a genome segment and inserts it right after the segment - this can be represented as the string operation AXB ⇒ AXXB. Tandem exon duplications have been found in many species such as human, fly or worm, and have been largely studied in computational biology. The Tandem Duplication (TD) distance problem we investigate in this paper is defined as follows: given two strings S and T over the same alphabet, compute the smallest sequence of tandem duplications required to convert S to T. The natural question of whether the TD distance can be computed in polynomial time was posed in 2004 by Leupold et al. and had remained open, despite the fact that tandem duplications have received much attention ever since. In this paper, we prove that this problem is NP-hard, settling the 16-year old open problem. We further show that this hardness holds even if all characters of S are distinct. This is known as the exemplar TD distance, which is of special relevance in bioinformatics. One of the tools we develop for the reduction is a new problem called the Cost-Effective Subgraph, for which we obtain W[1]-hardness results that might be of independent interest. We finally show that computing the exemplar TD distance between S and T is fixed-parameter tractable. Our results open the door to many other questions, and we conclude with several open problems.

Cite as

Manuel Lafond, Binhai Zhu, and Peng Zou. The Tandem Duplication Distance Is NP-Hard. In 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 154, pp. 15:1-15:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{lafond_et_al:LIPIcs.STACS.2020.15,
  author =	{Lafond, Manuel and Zhu, Binhai and Zou, Peng},
  title =	{{The Tandem Duplication Distance Is NP-Hard}},
  booktitle =	{37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)},
  pages =	{15:1--15:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-140-5},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{154},
  editor =	{Paul, Christophe and Bl\"{a}ser, Markus},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2020.15},
  URN =		{urn:nbn:de:0030-drops-118769},
  doi =		{10.4230/LIPIcs.STACS.2020.15},
  annote =	{Keywords: Tandem duplication, Text processing, Formal languages, Computational genomics, FPT algorithms}
}
Document
A 2-Approximation Algorithm for the Complementary Maximal Strip Recovery Problem

Authors: Haitao Jiang, Jiong Guo, Daming Zhu, and Binhai Zhu

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)


Abstract
The Maximal Strip Recovery problem (MSR) and its complementary (CMSR) are well-studied NP-hard problems in computational genomics. The input of these dual problems are two signed permutations. The goal is to delete some gene markers from both permutations, such that, in the remaining permutations, each gene marker has at least one common neighbor. Equivalently, the resulting permutations could be partitioned into common strips of length at least two. Then MSR is to maximize the number of remaining genes, while the objective of CMSR is to delete the minimum number of gene markers. In this paper, we present a new approximation algorithm for the Complementary Maximal Strip Recovery (CMSR) problem. Our approximation factor is 2, improving the currently best 7/3-approximation algorithm. Although the improvement on the factor is not huge, the analysis is greatly simplified by a compensating method, commonly referred to as the non-oblivious local search technique. In such a method a substitution may not always increase the value of the current solution (it sometimes may even decrease the solution value), though it always improves the value of another function seemingly unrelated to the objective function.

Cite as

Haitao Jiang, Jiong Guo, Daming Zhu, and Binhai Zhu. A 2-Approximation Algorithm for the Complementary Maximal Strip Recovery Problem. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 5:1-5:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{jiang_et_al:LIPIcs.CPM.2019.5,
  author =	{Jiang, Haitao and Guo, Jiong and Zhu, Daming and Zhu, Binhai},
  title =	{{A 2-Approximation Algorithm for the Complementary Maximal Strip Recovery Problem}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{5:1--5:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.5},
  URN =		{urn:nbn:de:0030-drops-104769},
  doi =		{10.4230/LIPIcs.CPM.2019.5},
  annote =	{Keywords: Maximal strip recovery, complementary maximal strip recovery, computational genomics, approximation algorithm, local search}
}
Document
Complete Volume
LIPIcs, Volume 105, CPM'18, Complete Volume

Authors: Gonzalo Navarro, David Sankoff, and Binhai Zhu

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)


Abstract
LIPIcs, Volume 105, CPM'18, Complete Volume

Cite as

29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@Proceedings{navarro_et_al:LIPIcs.CPM.2018,
  title =	{{LIPIcs, Volume 105, CPM'18, Complete Volume}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018},
  URN =		{urn:nbn:de:0030-drops-89341},
  doi =		{10.4230/LIPIcs.CPM.2018},
  annote =	{Keywords: Mathematics of computing, Discrete mathematics, Information theory,Information systems, Information retrieval, Theory of computation}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Gonzalo Navarro, David Sankoff, and Binhai Zhu

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{navarro_et_al:LIPIcs.CPM.2018.0,
  author =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.0},
  URN =		{urn:nbn:de:0030-drops-86849},
  doi =		{10.4230/LIPIcs.CPM.2018.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Maximal Common Subsequence Algorithms

Authors: Yoshifumi Sakai

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)


Abstract
A common subsequence of two strings is maximal, if inserting any character into the subsequence can no longer yield a common subsequence of the two strings. The present article proposes a (sub)linearithmic-time, linear-space algorithm for finding a maximal common subsequence of two strings and also proposes a linear-time algorithm for determining if a common subsequence of two strings is maximal.

Cite as

Yoshifumi Sakai. Maximal Common Subsequence Algorithms. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 1:1-1:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{sakai:LIPIcs.CPM.2018.1,
  author =	{Sakai, Yoshifumi},
  title =	{{Maximal Common Subsequence Algorithms}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{1:1--1:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.1},
  URN =		{urn:nbn:de:0030-drops-87079},
  doi =		{10.4230/LIPIcs.CPM.2018.1},
  annote =	{Keywords: algorithms, string comparison, longest common subsequence, constrained longest common subsequence}
}
Document
Order-Preserving Pattern Matching Indeterminate Strings

Authors: Rui Henriques, Alexandre P. Francisco, Luís M. S. Russo, and Hideo Bannai

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)


Abstract
Given an indeterminate string pattern p and an indeterminate string text t, the problem of order-preserving pattern matching with character uncertainties (muOPPM) is to find all substrings of t that satisfy one of the possible orderings defined by p. When the text and pattern are determinate strings, we are in the presence of the well-studied exact order-preserving pattern matching (OPPM) problem with diverse applications on time series analysis. Despite its relevance, the exact OPPM problem suffers from two major drawbacks: 1) the inability to deal with indetermination in the text, thus preventing the analysis of noisy time series; and 2) the inability to deal with indetermination in the pattern, thus imposing the strict satisfaction of the orders among all pattern positions. In this paper, we provide the first polynomial algorithms to answer the muOPPM problem when: 1) indetermination is observed on the pattern or text; and 2) indetermination is observed on both the pattern and the text and given by uncertainties between pairs of characters. First, given two strings with the same length m and O(r) uncertain characters per string position, we show that the muOPPM problem can be solved in O(mr lg r) time when one string is indeterminate and r in N^+ and in O(m^2) time when both strings are indeterminate and r=2. Second, given an indeterminate text string of length n, we show that muOPPM can be efficiently solved in polynomial time and linear space.

Cite as

Rui Henriques, Alexandre P. Francisco, Luís M. S. Russo, and Hideo Bannai. Order-Preserving Pattern Matching Indeterminate Strings. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 2:1-2:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{henriques_et_al:LIPIcs.CPM.2018.2,
  author =	{Henriques, Rui and Francisco, Alexandre P. and Russo, Lu{\'\i}s M. S. and Bannai, Hideo},
  title =	{{Order-Preserving Pattern Matching Indeterminate Strings}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{2:1--2:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.2},
  URN =		{urn:nbn:de:0030-drops-87087},
  doi =		{10.4230/LIPIcs.CPM.2018.2},
  annote =	{Keywords: Order-preserving pattern matching, Indeterminate string analysis, Generic pattern matching, Satisfiability}
}
Document
On Undetected Redundancy in the Burrows-Wheeler Transform

Authors: Uwe Baier

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)


Abstract
The Burrows-Wheeler-Transform (BWT) is an invertible permutation of a text known to be highly compressible but also useful for sequence analysis, what makes the BWT highly attractive for lossless data compression. In this paper, we present a new technique to reduce the size of a BWT using its combinatorial properties, while keeping it invertible. The technique can be applied to any BWT-based compressor, and, as experiments show, is able to reduce the encoding size by 8-16 % on average and up to 33-57 % in the best cases (depending on the BWT-compressor used), making BWT-based compressors competitive or even superior to today's best lossless compressors.

Cite as

Uwe Baier. On Undetected Redundancy in the Burrows-Wheeler Transform. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 3:1-3:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{baier:LIPIcs.CPM.2018.3,
  author =	{Baier, Uwe},
  title =	{{On Undetected Redundancy in the Burrows-Wheeler Transform}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{3:1--3:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.3},
  URN =		{urn:nbn:de:0030-drops-87049},
  doi =		{10.4230/LIPIcs.CPM.2018.3},
  annote =	{Keywords: Lossless data compression, BWT, Tunneling}
}
Document
Quasi-Periodicity Under Mismatch Errors

Authors: Amihood Amir, Avivit Levy, and Ely Porat

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)


Abstract
Tracing regularities plays a key role in data analysis for various areas of science, including coding and automata theory, formal language theory, combinatorics, molecular biology and many others. Part of the scientific process is understanding and explaining these regularities. A common notion to describe regularity in a string T is a cover or quasi-period, which is a string C for which every letter of T lies within some occurrence of C. In many applications finding exact repetitions is not sufficient, due to the presence of errors. In this paper we initiate the study of quasi-periodicity persistence under mismatch errors, and our goal is to characterize situations where a given quasi-periodic string remains quasi-periodic even after substitution errors have been introduced to the string. Our study results in proving necessary conditions as well as a theorem stating sufficient conditions for quasi-periodicity persistence. As an application, we are able to close the gap in understanding the complexity of Approximate Cover Problem (ACP) relaxations studied by [Amir 2017a, Amir 2017b] and solve an open question.

Cite as

Amihood Amir, Avivit Levy, and Ely Porat. Quasi-Periodicity Under Mismatch Errors. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 4:1-4:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.CPM.2018.4,
  author =	{Amir, Amihood and Levy, Avivit and Porat, Ely},
  title =	{{Quasi-Periodicity Under Mismatch Errors}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{4:1--4:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.4},
  URN =		{urn:nbn:de:0030-drops-87054},
  doi =		{10.4230/LIPIcs.CPM.2018.4},
  annote =	{Keywords: Periodicity, Quasi-Periodicity, Cover, Approximate Cover}
}
Document
Fast Matching-based Approximations for Maximum Duo-Preservation String Mapping and its Weighted Variant

Authors: Brian Brubach

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)


Abstract
We present a new approach to approximating the Maximum Duo-Preservation String Mapping Problem (MPSM) based on massaging the constraints into a tractable matching problem. MPSM was introduced in Chen, Chen, Samatova, Peng, Wang, and Tang [Chen et al., 2014] as the complement to the well-studied Minimum Common String Partition problem (MCSP). Prior work also considers the k-MPSM and k-MCSP variants in which each letter occurs at most k times in each string. The authors of [Chen et al., 2014] showed a k^2-appoximation for k >= 3 and 2-approximation for k = 2. Boria, Kurpisz, Leppänen, and Mastrolilli [Boria et al., 2014] gave a 4-approximation independent of k and showed that even 2-MPSM is APX-Hard. A series of improvements led to the current best bounds of a (2 + epsilon)-approximation for any epsilon > 0 in n^{O(1/epsilon)} time for strings of length n and a 2.67-approximation running in O(n^2) time, both by Dudek, Gawrychowski, and Ostropolski-Nalewaja [Dudek et al., 2017]. Here, we show that a 2.67-approximation can surprisingly be achieved in O(n) time for alphabets of constant size and O(n + alpha^7) for alphabets of size alpha. Recently, Mehrabi [Mehrabi, 2017] introduced the more general weighted variant, Maximum Weight Duo-Preservation String Mapping (MWPSM) and provided a 6-approximation. Our approach gives a 2.67-approximation to this problem running in O(n^3) time. This approach can also find an 8/(3(1-epsilon))-approximation to MWPSM for any epsilon > 0 in O(n^2 epsilon^{-1} lg{epsilon^{-1}}) time using the approximate weighted matching algorithm of Duan and Pettie [Duan and Pettie, 2014]. Finally, we introduce the first streaming algorithm for MPSM. We show that a single pass suffices to find a 4-approximation on the size of an optimal solution using only O(alpha^2 lg{n}) space.

Cite as

Brian Brubach. Fast Matching-based Approximations for Maximum Duo-Preservation String Mapping and its Weighted Variant. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 5:1-5:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{brubach:LIPIcs.CPM.2018.5,
  author =	{Brubach, Brian},
  title =	{{Fast Matching-based Approximations for Maximum Duo-Preservation String Mapping and its Weighted Variant}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{5:1--5:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.5},
  URN =		{urn:nbn:de:0030-drops-87066},
  doi =		{10.4230/LIPIcs.CPM.2018.5},
  annote =	{Keywords: approximation algorithm, maximum duo-preservation string mapping, minimum common string partition, string comparison, streaming algorithm, comparative genomics}
}
Document
Nearest constrained circular words

Authors: Guillaume Blin, Alexandre Blondin Massé, Marie Gasparoux, Sylvie Hamel, and Élise Vandomme

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)


Abstract
In this paper, we study circular words arising in the development of equipment using shields in brachytherapy. This equipment has physical constraints that have to be taken into consideration. From an algorithmic point of view, the problem can be formulated as follows: Given a circular word, find a constrained circular word of the same length such that the Manhattan distance between these two words is minimal. We show that we can solve this problem in pseudo polynomial time (polynomial time in practice) using dynamic programming.

Cite as

Guillaume Blin, Alexandre Blondin Massé, Marie Gasparoux, Sylvie Hamel, and Élise Vandomme. Nearest constrained circular words. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 6:1-6:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{blin_et_al:LIPIcs.CPM.2018.6,
  author =	{Blin, Guillaume and Blondin Mass\'{e}, Alexandre and Gasparoux, Marie and Hamel, Sylvie and Vandomme, \'{E}lise},
  title =	{{Nearest constrained circular words}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{6:1--6:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.6},
  URN =		{urn:nbn:de:0030-drops-87035},
  doi =		{10.4230/LIPIcs.CPM.2018.6},
  annote =	{Keywords: Circular constrained alignments, Manhattan distance, Application to brachytherapy}
}
Document
Online LZ77 Parsing and Matching Statistics with RLBWTs

Authors: Hideo Bannai, Travis Gagie, and Tomohiro I

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)


Abstract
Lempel-Ziv 1977 (LZ77) parsing, matching statistics and the Burrows-Wheeler Transform (BWT) are all fundamental elements of stringology. In a series of recent papers, Policriti and Prezza (DCC 2016 and Algorithmica, CPM 2017) showed how we can use an augmented run-length compressed BWT (RLBWT) of the reverse T^R of a text T, to compute offline the LZ77 parse of T in O(n log r) time and O(r) space, where n is the length of T and r is the number of runs in the BWT of T^R. In this paper we first extend a well-known technique for updating an unaugmented RLBWT when a character is prepended to a text, to work with Policriti and Prezza's augmented RLBWT. This immediately implies that we can build online the LZ77 parse of T while still using O(n log r) time and O(r) space; it also seems likely to be of independent interest. Our experiments, using an extension of Ohno, Takabatake, I and Sakamoto's (IWOCA 2017) implementation of updating, show our approach is both time- and space-efficient for repetitive strings. We then show how to augment the RLBWT further - albeit making it static again and increasing its space by a factor proportional to the size of the alphabet - such that later, given another string S and O(log log n)-time random access to T, we can compute the matching statistics of S with respect to T in O(|S| log log n) time.

Cite as

Hideo Bannai, Travis Gagie, and Tomohiro I. Online LZ77 Parsing and Matching Statistics with RLBWTs. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 7:1-7:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2018.7,
  author =	{Bannai, Hideo and Gagie, Travis and I, Tomohiro},
  title =	{{Online LZ77 Parsing and Matching Statistics with RLBWTs}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{7:1--7:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.7},
  URN =		{urn:nbn:de:0030-drops-87025},
  doi =		{10.4230/LIPIcs.CPM.2018.7},
  annote =	{Keywords: Lempel-Ziv 1977, Matching Statistics, Run-Length Compressed Burrows-Wheeler Transform}
}
Document
Non-Overlapping Indexing - Cache Obliviously

Authors: Sahar Hooshmand, Paniz Abedin, M. Oguzhan Külekci, and Sharma V. Thankachan

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)


Abstract
The non-overlapping indexing problem is defined as follows: pre-process a given text T[1,n] of length n into a data structure such that whenever a pattern P[1,p] comes as an input, we can efficiently report the largest set of non-overlapping occurrences of P in T. The best known solution is by Cohen and Porat [ISAAC, 2009]. Their index size is O(n) words and query time is optimal O(p+nocc), where nocc is the output size. We study this problem in the cache-oblivious model and present a new data structure of size O(n log n) words. It can answer queries in optimal O(p/(B)+log_B n+nocc/B) I/Os, where B is the block size.

Cite as

Sahar Hooshmand, Paniz Abedin, M. Oguzhan Külekci, and Sharma V. Thankachan. Non-Overlapping Indexing - Cache Obliviously. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 8:1-8:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{hooshmand_et_al:LIPIcs.CPM.2018.8,
  author =	{Hooshmand, Sahar and Abedin, Paniz and K\"{u}lekci, M. Oguzhan and Thankachan, Sharma V.},
  title =	{{Non-Overlapping Indexing - Cache Obliviously}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{8:1--8:9},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.8},
  URN =		{urn:nbn:de:0030-drops-87009},
  doi =		{10.4230/LIPIcs.CPM.2018.8},
  annote =	{Keywords: Suffix Trees, Cache Oblivious, Data Structure, String Algorithms}
}
  • Refine by Author
  • 8 Zhu, Binhai
  • 7 Bannai, Hideo
  • 5 Inenaga, Shunsuke
  • 5 Takeda, Masayuki
  • 4 Nakashima, Yuto
  • Show More...

  • Refine by Classification
  • 10 Theory of computation → Pattern matching
  • 4 Mathematics of computing → Combinatorial algorithms
  • 3 Theory of computation
  • 3 Theory of computation → Data compression
  • 2 Mathematics of computing → Combinatorics on words
  • Show More...

  • Refine by Keyword
  • 3 FPT algorithms
  • 2 Computational genomics
  • 2 Data Structure
  • 2 Lyndon factorization
  • 2 Lyndon word
  • Show More...

  • Refine by Type
  • 32 document
  • 1 volume

  • Refine by Publication Year
  • 27 2018
  • 2 2016
  • 2 2020
  • 1 2019
  • 1 2022

Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail