LIPIcs, Volume 105

29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)



Thumbnail PDF

Event

CPM 2018, July 2-4, 2018, Qingdao, China

Editors

Gonzalo Navarro
David Sankoff
Binhai Zhu

Publication Details

  • published at: 2018-05-18
  • Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
  • ISBN: 978-3-95977-074-3
  • DBLP: db/conf/cpm/cpm2018

Access Numbers

Documents

No documents found matching your filter selection.
Document
Complete Volume
LIPIcs, Volume 105, CPM'18, Complete Volume

Authors: Gonzalo Navarro, David Sankoff, and Binhai Zhu


Abstract
LIPIcs, Volume 105, CPM'18, Complete Volume

Cite as

29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@Proceedings{navarro_et_al:LIPIcs.CPM.2018,
  title =	{{LIPIcs, Volume 105, CPM'18, Complete Volume}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018},
  URN =		{urn:nbn:de:0030-drops-89341},
  doi =		{10.4230/LIPIcs.CPM.2018},
  annote =	{Keywords: Mathematics of computing, Discrete mathematics, Information theory,Information systems, Information retrieval, Theory of computation}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Gonzalo Navarro, David Sankoff, and Binhai Zhu


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{navarro_et_al:LIPIcs.CPM.2018.0,
  author =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.0},
  URN =		{urn:nbn:de:0030-drops-86849},
  doi =		{10.4230/LIPIcs.CPM.2018.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Maximal Common Subsequence Algorithms

Authors: Yoshifumi Sakai


Abstract
A common subsequence of two strings is maximal, if inserting any character into the subsequence can no longer yield a common subsequence of the two strings. The present article proposes a (sub)linearithmic-time, linear-space algorithm for finding a maximal common subsequence of two strings and also proposes a linear-time algorithm for determining if a common subsequence of two strings is maximal.

Cite as

Yoshifumi Sakai. Maximal Common Subsequence Algorithms. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 1:1-1:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{sakai:LIPIcs.CPM.2018.1,
  author =	{Sakai, Yoshifumi},
  title =	{{Maximal Common Subsequence Algorithms}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{1:1--1:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.1},
  URN =		{urn:nbn:de:0030-drops-87079},
  doi =		{10.4230/LIPIcs.CPM.2018.1},
  annote =	{Keywords: algorithms, string comparison, longest common subsequence, constrained longest common subsequence}
}
Document
Order-Preserving Pattern Matching Indeterminate Strings

Authors: Rui Henriques, Alexandre P. Francisco, Luís M. S. Russo, and Hideo Bannai


Abstract
Given an indeterminate string pattern p and an indeterminate string text t, the problem of order-preserving pattern matching with character uncertainties (muOPPM) is to find all substrings of t that satisfy one of the possible orderings defined by p. When the text and pattern are determinate strings, we are in the presence of the well-studied exact order-preserving pattern matching (OPPM) problem with diverse applications on time series analysis. Despite its relevance, the exact OPPM problem suffers from two major drawbacks: 1) the inability to deal with indetermination in the text, thus preventing the analysis of noisy time series; and 2) the inability to deal with indetermination in the pattern, thus imposing the strict satisfaction of the orders among all pattern positions. In this paper, we provide the first polynomial algorithms to answer the muOPPM problem when: 1) indetermination is observed on the pattern or text; and 2) indetermination is observed on both the pattern and the text and given by uncertainties between pairs of characters. First, given two strings with the same length m and O(r) uncertain characters per string position, we show that the muOPPM problem can be solved in O(mr lg r) time when one string is indeterminate and r in N^+ and in O(m^2) time when both strings are indeterminate and r=2. Second, given an indeterminate text string of length n, we show that muOPPM can be efficiently solved in polynomial time and linear space.

Cite as

Rui Henriques, Alexandre P. Francisco, Luís M. S. Russo, and Hideo Bannai. Order-Preserving Pattern Matching Indeterminate Strings. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 2:1-2:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{henriques_et_al:LIPIcs.CPM.2018.2,
  author =	{Henriques, Rui and Francisco, Alexandre P. and Russo, Lu{\'\i}s M. S. and Bannai, Hideo},
  title =	{{Order-Preserving Pattern Matching Indeterminate Strings}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{2:1--2:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.2},
  URN =		{urn:nbn:de:0030-drops-87087},
  doi =		{10.4230/LIPIcs.CPM.2018.2},
  annote =	{Keywords: Order-preserving pattern matching, Indeterminate string analysis, Generic pattern matching, Satisfiability}
}
Document
On Undetected Redundancy in the Burrows-Wheeler Transform

Authors: Uwe Baier


Abstract
The Burrows-Wheeler-Transform (BWT) is an invertible permutation of a text known to be highly compressible but also useful for sequence analysis, what makes the BWT highly attractive for lossless data compression. In this paper, we present a new technique to reduce the size of a BWT using its combinatorial properties, while keeping it invertible. The technique can be applied to any BWT-based compressor, and, as experiments show, is able to reduce the encoding size by 8-16 % on average and up to 33-57 % in the best cases (depending on the BWT-compressor used), making BWT-based compressors competitive or even superior to today's best lossless compressors.

Cite as

Uwe Baier. On Undetected Redundancy in the Burrows-Wheeler Transform. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 3:1-3:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{baier:LIPIcs.CPM.2018.3,
  author =	{Baier, Uwe},
  title =	{{On Undetected Redundancy in the Burrows-Wheeler Transform}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{3:1--3:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.3},
  URN =		{urn:nbn:de:0030-drops-87049},
  doi =		{10.4230/LIPIcs.CPM.2018.3},
  annote =	{Keywords: Lossless data compression, BWT, Tunneling}
}
Document
Quasi-Periodicity Under Mismatch Errors

Authors: Amihood Amir, Avivit Levy, and Ely Porat


Abstract
Tracing regularities plays a key role in data analysis for various areas of science, including coding and automata theory, formal language theory, combinatorics, molecular biology and many others. Part of the scientific process is understanding and explaining these regularities. A common notion to describe regularity in a string T is a cover or quasi-period, which is a string C for which every letter of T lies within some occurrence of C. In many applications finding exact repetitions is not sufficient, due to the presence of errors. In this paper we initiate the study of quasi-periodicity persistence under mismatch errors, and our goal is to characterize situations where a given quasi-periodic string remains quasi-periodic even after substitution errors have been introduced to the string. Our study results in proving necessary conditions as well as a theorem stating sufficient conditions for quasi-periodicity persistence. As an application, we are able to close the gap in understanding the complexity of Approximate Cover Problem (ACP) relaxations studied by [Amir 2017a, Amir 2017b] and solve an open question.

Cite as

Amihood Amir, Avivit Levy, and Ely Porat. Quasi-Periodicity Under Mismatch Errors. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 4:1-4:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.CPM.2018.4,
  author =	{Amir, Amihood and Levy, Avivit and Porat, Ely},
  title =	{{Quasi-Periodicity Under Mismatch Errors}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{4:1--4:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.4},
  URN =		{urn:nbn:de:0030-drops-87054},
  doi =		{10.4230/LIPIcs.CPM.2018.4},
  annote =	{Keywords: Periodicity, Quasi-Periodicity, Cover, Approximate Cover}
}
Document
Fast Matching-based Approximations for Maximum Duo-Preservation String Mapping and its Weighted Variant

Authors: Brian Brubach


Abstract
We present a new approach to approximating the Maximum Duo-Preservation String Mapping Problem (MPSM) based on massaging the constraints into a tractable matching problem. MPSM was introduced in Chen, Chen, Samatova, Peng, Wang, and Tang [Chen et al., 2014] as the complement to the well-studied Minimum Common String Partition problem (MCSP). Prior work also considers the k-MPSM and k-MCSP variants in which each letter occurs at most k times in each string. The authors of [Chen et al., 2014] showed a k^2-appoximation for k >= 3 and 2-approximation for k = 2. Boria, Kurpisz, Leppänen, and Mastrolilli [Boria et al., 2014] gave a 4-approximation independent of k and showed that even 2-MPSM is APX-Hard. A series of improvements led to the current best bounds of a (2 + epsilon)-approximation for any epsilon > 0 in n^{O(1/epsilon)} time for strings of length n and a 2.67-approximation running in O(n^2) time, both by Dudek, Gawrychowski, and Ostropolski-Nalewaja [Dudek et al., 2017]. Here, we show that a 2.67-approximation can surprisingly be achieved in O(n) time for alphabets of constant size and O(n + alpha^7) for alphabets of size alpha. Recently, Mehrabi [Mehrabi, 2017] introduced the more general weighted variant, Maximum Weight Duo-Preservation String Mapping (MWPSM) and provided a 6-approximation. Our approach gives a 2.67-approximation to this problem running in O(n^3) time. This approach can also find an 8/(3(1-epsilon))-approximation to MWPSM for any epsilon > 0 in O(n^2 epsilon^{-1} lg{epsilon^{-1}}) time using the approximate weighted matching algorithm of Duan and Pettie [Duan and Pettie, 2014]. Finally, we introduce the first streaming algorithm for MPSM. We show that a single pass suffices to find a 4-approximation on the size of an optimal solution using only O(alpha^2 lg{n}) space.

Cite as

Brian Brubach. Fast Matching-based Approximations for Maximum Duo-Preservation String Mapping and its Weighted Variant. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 5:1-5:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{brubach:LIPIcs.CPM.2018.5,
  author =	{Brubach, Brian},
  title =	{{Fast Matching-based Approximations for Maximum Duo-Preservation String Mapping and its Weighted Variant}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{5:1--5:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.5},
  URN =		{urn:nbn:de:0030-drops-87066},
  doi =		{10.4230/LIPIcs.CPM.2018.5},
  annote =	{Keywords: approximation algorithm, maximum duo-preservation string mapping, minimum common string partition, string comparison, streaming algorithm, comparative genomics}
}
Document
Nearest constrained circular words

Authors: Guillaume Blin, Alexandre Blondin Massé, Marie Gasparoux, Sylvie Hamel, and Élise Vandomme


Abstract
In this paper, we study circular words arising in the development of equipment using shields in brachytherapy. This equipment has physical constraints that have to be taken into consideration. From an algorithmic point of view, the problem can be formulated as follows: Given a circular word, find a constrained circular word of the same length such that the Manhattan distance between these two words is minimal. We show that we can solve this problem in pseudo polynomial time (polynomial time in practice) using dynamic programming.

Cite as

Guillaume Blin, Alexandre Blondin Massé, Marie Gasparoux, Sylvie Hamel, and Élise Vandomme. Nearest constrained circular words. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 6:1-6:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{blin_et_al:LIPIcs.CPM.2018.6,
  author =	{Blin, Guillaume and Blondin Mass\'{e}, Alexandre and Gasparoux, Marie and Hamel, Sylvie and Vandomme, \'{E}lise},
  title =	{{Nearest constrained circular words}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{6:1--6:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.6},
  URN =		{urn:nbn:de:0030-drops-87035},
  doi =		{10.4230/LIPIcs.CPM.2018.6},
  annote =	{Keywords: Circular constrained alignments, Manhattan distance, Application to brachytherapy}
}
Document
Online LZ77 Parsing and Matching Statistics with RLBWTs

Authors: Hideo Bannai, Travis Gagie, and Tomohiro I


Abstract
Lempel-Ziv 1977 (LZ77) parsing, matching statistics and the Burrows-Wheeler Transform (BWT) are all fundamental elements of stringology. In a series of recent papers, Policriti and Prezza (DCC 2016 and Algorithmica, CPM 2017) showed how we can use an augmented run-length compressed BWT (RLBWT) of the reverse T^R of a text T, to compute offline the LZ77 parse of T in O(n log r) time and O(r) space, where n is the length of T and r is the number of runs in the BWT of T^R. In this paper we first extend a well-known technique for updating an unaugmented RLBWT when a character is prepended to a text, to work with Policriti and Prezza's augmented RLBWT. This immediately implies that we can build online the LZ77 parse of T while still using O(n log r) time and O(r) space; it also seems likely to be of independent interest. Our experiments, using an extension of Ohno, Takabatake, I and Sakamoto's (IWOCA 2017) implementation of updating, show our approach is both time- and space-efficient for repetitive strings. We then show how to augment the RLBWT further - albeit making it static again and increasing its space by a factor proportional to the size of the alphabet - such that later, given another string S and O(log log n)-time random access to T, we can compute the matching statistics of S with respect to T in O(|S| log log n) time.

Cite as

Hideo Bannai, Travis Gagie, and Tomohiro I. Online LZ77 Parsing and Matching Statistics with RLBWTs. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 7:1-7:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{bannai_et_al:LIPIcs.CPM.2018.7,
  author =	{Bannai, Hideo and Gagie, Travis and I, Tomohiro},
  title =	{{Online LZ77 Parsing and Matching Statistics with RLBWTs}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{7:1--7:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.7},
  URN =		{urn:nbn:de:0030-drops-87025},
  doi =		{10.4230/LIPIcs.CPM.2018.7},
  annote =	{Keywords: Lempel-Ziv 1977, Matching Statistics, Run-Length Compressed Burrows-Wheeler Transform}
}
Document
Non-Overlapping Indexing - Cache Obliviously

Authors: Sahar Hooshmand, Paniz Abedin, M. Oguzhan Külekci, and Sharma V. Thankachan


Abstract
The non-overlapping indexing problem is defined as follows: pre-process a given text T[1,n] of length n into a data structure such that whenever a pattern P[1,p] comes as an input, we can efficiently report the largest set of non-overlapping occurrences of P in T. The best known solution is by Cohen and Porat [ISAAC, 2009]. Their index size is O(n) words and query time is optimal O(p+nocc), where nocc is the output size. We study this problem in the cache-oblivious model and present a new data structure of size O(n log n) words. It can answer queries in optimal O(p/(B)+log_B n+nocc/B) I/Os, where B is the block size.

Cite as

Sahar Hooshmand, Paniz Abedin, M. Oguzhan Külekci, and Sharma V. Thankachan. Non-Overlapping Indexing - Cache Obliviously. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 8:1-8:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{hooshmand_et_al:LIPIcs.CPM.2018.8,
  author =	{Hooshmand, Sahar and Abedin, Paniz and K\"{u}lekci, M. Oguzhan and Thankachan, Sharma V.},
  title =	{{Non-Overlapping Indexing - Cache Obliviously}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{8:1--8:9},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.8},
  URN =		{urn:nbn:de:0030-drops-87009},
  doi =		{10.4230/LIPIcs.CPM.2018.8},
  annote =	{Keywords: Suffix Trees, Cache Oblivious, Data Structure, String Algorithms}
}
Document
Faster Online Elastic Degenerate String Matching

Authors: Kotaro Aoyama, Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda


Abstract
An Elastic-Degenerate String [Iliopoulus et al., LATA 2017] is a sequence of sets of strings, which was recently proposed as a way to model a set of similar sequences. We give an online algorithm for the Elastic-Degenerate String Matching (EDSM) problem that runs in O(nm sqrt{m log m} + N) time and O(m) working space, where n is the number of elastic degenerate segments of the text, N is the total length of all strings in the text, and m is the length of the pattern. This improves the previous algorithm by Grossi et al. [CPM 2017] that runs in O(nm^2 + N) time.

Cite as

Kotaro Aoyama, Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster Online Elastic Degenerate String Matching. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 9:1-9:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{aoyama_et_al:LIPIcs.CPM.2018.9,
  author =	{Aoyama, Kotaro and Nakashima, Yuto and I, Tomohiro and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Faster Online Elastic Degenerate String Matching}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{9:1--9:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.9},
  URN =		{urn:nbn:de:0030-drops-87016},
  doi =		{10.4230/LIPIcs.CPM.2018.9},
  annote =	{Keywords: elastic degenerate pattern matching, boolean convolution}
}
Document
A Simple Linear-Time Algorithm for Computing the Centroid and Canonical Form of a Plane Graph and Its Applications

Authors: Tatsuya Akutsu, Colin de la Higuera, and Takeyuki Tamura


Abstract
We present a simple linear-time algorithm for computing the topological centroid and the canonical form of a plane graph. Although the targets are restricted to plane graphs, it is much simpler than the linear-time algorithm by Hopcroft and Wong for determination of the canonical form and isomorphism of planar graphs. By utilizing a modified centroid for outerplanar graphs, we present a linear-time algorithm for a geometric version of the maximum common connected edge subgraph (MCCES) problem for the special case in which input geometric graphs have outerplanar structures, MCCES can be obtained by deleting at most a constant number of edges from each input graph, and both the maximum degree and the maximum face degree are bounded by constants.

Cite as

Tatsuya Akutsu, Colin de la Higuera, and Takeyuki Tamura. A Simple Linear-Time Algorithm for Computing the Centroid and Canonical Form of a Plane Graph and Its Applications. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 10:1-10:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{akutsu_et_al:LIPIcs.CPM.2018.10,
  author =	{Akutsu, Tatsuya and de la Higuera, Colin and Tamura, Takeyuki},
  title =	{{A Simple Linear-Time Algorithm for Computing the Centroid and Canonical Form of a Plane Graph and Its Applications}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{10:1--10:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.10},
  URN =		{urn:nbn:de:0030-drops-86992},
  doi =		{10.4230/LIPIcs.CPM.2018.10},
  annote =	{Keywords: Plane graph, Graph isomorphism, Maximum common subgraph}
}
Document
Locally Maximal Common Factors as a Tool for Efficient Dynamic String Algorithms

Authors: Amihood Amir and Itai Boneh


Abstract
There has been recent interest in dynamic string algorithms, i.e. string problems where the input changes dynamically. One such problem is the longest common factor (LCF) problem. It is well known that the LCF of two strings S and D of length n over a fixed constant-sized alphabet Sigma can be computed in time linear in n. Recently, a new challenge was introduced - finding the LCF of two strings in a dynamic setting. The problem is the fully dynamic one sided LCF (FDOS-LCF) problem. In the FDOS-LCF problem we get q consecutive queries of the form <i,alpha >, where each such query means: "replace D[i] by alpha, alpha in Sigma and output the LCF of S and (the updated) D. The goal is to initially preprocess S and D so that we do not need O(n) time to compute an LCF for each such query. The state-of-the-art is an algorithm that preprocesses the two strings S and D in time O(n log^4 n). Subsequently, the algorithm answers in time O(log^3 n) a single query of the form: Given a position i on D and a letter alpha, return an LCF of S and D', where D' is the string resulting from D after substituting D[i] with alpha. That algorithm is not extendable to multiple queries. In this paper we present a tool - Locally Maximal Common Factors (LMCF) - that proves to be quite useful in solving some restricted versions of the FDOS-LCF problem . The versions we solve are the Decremental FDOS-LCS problem, where every change <i,alpha> is of the form <i,omega>, omega !in Sigma, and the Periodic FDOS-LCS problem, where S is a periodic string with period length p. For the decremental problem we provide an algorithm with linear time preprocessing and O(log log n) time per query. For the periodic problem our preprocessing time is linear and the query time is O(p log log n).

Cite as

Amihood Amir and Itai Boneh. Locally Maximal Common Factors as a Tool for Efficient Dynamic String Algorithms. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 11:1-11:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.CPM.2018.11,
  author =	{Amir, Amihood and Boneh, Itai},
  title =	{{Locally Maximal Common Factors as a Tool for Efficient Dynamic String Algorithms}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{11:1--11:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.11},
  URN =		{urn:nbn:de:0030-drops-86983},
  doi =		{10.4230/LIPIcs.CPM.2018.11},
  annote =	{Keywords: Dynamic Algorithms, Periodicity, Longest Common Factor, Priority Queue Data Structures, Suffix
Tree, Balanced Search Tree, Range Maximum Queries}
}
Document
Longest substring palindrome after edit

Authors: Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda


Abstract
It is known that the length of the longest substring palindromes (LSPals) of a given string T of length n can be computed in O(n) time by Manacher's algorithm [J. ACM '75]. In this paper, we consider the problem of finding the LSPal after the string is edited. We present an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(log (min {sigma, log n })) time after single character substitution, insertion, or deletion, where sigma denotes the number of distinct characters appearing in T. We also propose an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(l + log n) time, after an existing substring in T is replaced by a string of arbitrary length l.

Cite as

Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Longest substring palindrome after edit. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 12:1-12:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{funakoshi_et_al:LIPIcs.CPM.2018.12,
  author =	{Funakoshi, Mitsuru and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Longest substring palindrome after edit}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{12:1--12:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.12},
  URN =		{urn:nbn:de:0030-drops-86977},
  doi =		{10.4230/LIPIcs.CPM.2018.12},
  annote =	{Keywords: maximal palindromes, edit operations, periodicity, suffix trees}
}
Document
A Succinct Four Russians Speedup for Edit Distance Computation and One-against-many Banded Alignment

Authors: Brian Brubach and Jay Ghurye


Abstract
The classical Four Russians speedup for computing edit distance (a.k.a. Levenshtein distance), due to Masek and Paterson [Masek and Paterson, 1980], involves partitioning the dynamic programming table into k-by-k square blocks and generating a lookup table in O(psi^{2k} k^2 |Sigma|^{2k}) time and O(psi^{2k} k |Sigma|^{2k}) space for block size k, where psi depends on the cost function (for unit costs psi = 3) and |Sigma| is the size of the alphabet. We show that the O(psi^{2k} k^2) and O(psi^{2k} k) factors can be improved to O(k^2 lg{k}) time and O(k^2) space. Thus, we improve the time and space complexity of that aspect compared to Masek and Paterson [Masek and Paterson, 1980] and remove the dependence on psi. We further show that for certain problems the O(|Sigma|^{2k}) factor can also be reduced. Using this technique, we show a new algorithm for the fundamental problem of one-against-many banded alignment. In particular, comparing one string of length m to n other strings of length m with maximum distance d can be performed in O(n m + m d^2 lg{d} + n d^3) time. When d is reasonably small, this approaches or meets the current best theoretic result of O(nm + n d^2) achieved by using the best known pairwise algorithm running in O(m + d^2) time [Myers, 1986][Ukkonen, 1985] while potentially being more practical. It also improves on the standard practical approach which requires O(n m d) time to iteratively run an O(md) time pairwise banded alignment algorithm. Regarding pairwise comparison, we extend the classic result of Masek and Paterson [Masek and Paterson, 1980] which computes the edit distance between two strings in O(m^2/log{m}) time to remove the dependence on psi even when edits have arbitrary costs from a penalty matrix. Crochemore, Landau, and Ziv-Ukelson [Crochemore, 2003] achieved a similar result, also allowing for unrestricted scoring matrices, but with variable-sized blocks. In practical applications of the Four Russians speedup wherein space efficiency is important and smaller block sizes k are used (notably k < |Sigma|), Kim, Na, Park, and Sim [Kim et al., 2016] showed how to remove the dependence on the alphabet size for the unit cost version, generating a lookup table in O(3^{2k} (2k)! k^2) time and O(3^{2k} (2k)! k) space. Combining their work with our result yields an improvement to O((2k)! k^2 lg{k}) time and O((2k)! k^2) space.

Cite as

Brian Brubach and Jay Ghurye. A Succinct Four Russians Speedup for Edit Distance Computation and One-against-many Banded Alignment. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 13:1-13:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{brubach_et_al:LIPIcs.CPM.2018.13,
  author =	{Brubach, Brian and Ghurye, Jay},
  title =	{{A Succinct Four Russians Speedup for Edit Distance Computation and One-against-many Banded Alignment}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{13:1--13:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.13},
  URN =		{urn:nbn:de:0030-drops-86965},
  doi =		{10.4230/LIPIcs.CPM.2018.13},
  annote =	{Keywords: edit distance, banded alignment, one-against-many alignment, genomics, method of the Four Russians}
}
Document
Can a permutation be sorted by best short swaps?

Authors: Shu Zhang, Daming Zhu, Haitao Jiang, Jingjing Ma, Jiong Guo, and Haodi Feng


Abstract
A short swap switches two elements with at most one element caught between them. Sorting permutation by short swaps asks to find a shortest short swap sequence to transform a permutation into another. A short swap can eliminate at most three inversions. It is still open for whether a permutation can be sorted by short swaps each of which can eliminate three inversions. In this paper, we present a polynomial time algorithm to solve the problem, which can decide whether a permutation can be sorted by short swaps each of which can eliminate 3 inversions in O(n) time, and if so, sort the permutation by such short swaps in O(n^2) time, where n is the number of elements in the permutation. A short swap can cause the total length of two element vectors to decrease by at most 4. We further propose an algorithm to recognize a permutation which can be sorted by short swaps each of which can cause the element vector length sum to decrease by 4 in O(n) time, and if so, sort the permutation by such short swaps in O(n^2) time. This improves upon the O(n^2) algorithm proposed by Heath and Vergara to decide whether a permutation is so called lucky.

Cite as

Shu Zhang, Daming Zhu, Haitao Jiang, Jingjing Ma, Jiong Guo, and Haodi Feng. Can a permutation be sorted by best short swaps?. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 14:1-14:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{zhang_et_al:LIPIcs.CPM.2018.14,
  author =	{Zhang, Shu and Zhu, Daming and Jiang, Haitao and Ma, Jingjing and Guo, Jiong and Feng, Haodi},
  title =	{{Can a permutation be sorted by best short swaps?}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{14:1--14:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.14},
  URN =		{urn:nbn:de:0030-drops-86957},
  doi =		{10.4230/LIPIcs.CPM.2018.14},
  annote =	{Keywords: Algorithm, Complexity, Short Swap, Permutation, Reversal}
}
Document
Computing longest common square subsequences

Authors: Takafumi Inoue, Shunsuke Inenaga, Heikki Hyyrö, Hideo Bannai, and Masayuki Takeda


Abstract
A square is a non-empty string of form YY. The longest common square subsequence (LCSqS) problem is to compute a longest square occurring as a subsequence in two given strings A and B. We show that the problem can easily be solved in O(n^6) time or O(|M|n^4) time with O(n^4) space, where n is the length of the strings and M is the set of matching points between A and B. Then, we show that the problem can also be solved in O(sigma |M|^3 + n) time and O(|M|^2 + n) space, or in O(|M|^3 log^2 n log log n + n) time with O(|M|^3 + n) space, where sigma is the number of distinct characters occurring in A and B. We also study lower bounds for the LCSqS problem for two or more strings.

Cite as

Takafumi Inoue, Shunsuke Inenaga, Heikki Hyyrö, Hideo Bannai, and Masayuki Takeda. Computing longest common square subsequences. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 15:1-15:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{inoue_et_al:LIPIcs.CPM.2018.15,
  author =	{Inoue, Takafumi and Inenaga, Shunsuke and Hyyr\"{o}, Heikki and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Computing longest common square subsequences}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{15:1--15:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.15},
  URN =		{urn:nbn:de:0030-drops-86946},
  doi =		{10.4230/LIPIcs.CPM.2018.15},
  annote =	{Keywords: squares, subsequences, matching rectangles, dynamic programming}
}
Document
Slowing Down Top Trees for Better Worst-Case Compression

Authors: Bartlomiej Dudek and Pawel Gawrychowski


Abstract
We consider the top tree compression scheme introduced by Bille et al. [ICALP 2013] and construct an infinite family of trees on n nodes labeled from an alphabet of size sigma, for which the size of the top DAG is Theta(n/log_sigma n log log_sigma n). Our construction matches a previously known upper bound and exhibits a weakness of this scheme, as the information-theoretic lower bound is Omega(n/log_sigma n}). This settles an open problem stated by Lohrey et al. [arXiv 2017], who designed a more involved version achieving the lower bound. We show that this can be also guaranteed by a very minor modification of the original scheme: informally, one only needs to ensure that different parts of the tree are not compressed too quickly. Arguably, our version is more uniform, and in particular, the compression procedure is oblivious to the value of sigma.

Cite as

Bartlomiej Dudek and Pawel Gawrychowski. Slowing Down Top Trees for Better Worst-Case Compression. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 16:1-16:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{dudek_et_al:LIPIcs.CPM.2018.16,
  author =	{Dudek, Bartlomiej and Gawrychowski, Pawel},
  title =	{{Slowing Down Top Trees for Better Worst-Case Compression}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{16:1--16:8},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.16},
  URN =		{urn:nbn:de:0030-drops-86920},
  doi =		{10.4230/LIPIcs.CPM.2018.16},
  annote =	{Keywords: top trees, compression, tree grammars}
}
Document
On the Maximum Colorful Arborescence Problem and Color Hierarchy Graph Structure

Authors: Guillaume Fertin, Julien Fradin, and Christian Komusiewicz


Abstract
Let G=(V,A) be a vertex-colored arc-weighted directed acyclic graph (DAG) rooted in some vertex r. The color hierarchy graph H(G) of G is defined as follows: the vertex set of H(G) is the color set C of G, and H(G) has an arc from c to c' if G has an arc from a vertex of color c to a vertex of color c'. We study the Maximum Colorful Arborescence (MCA) problem, which takes as input a DAG G such that H(G) is also a DAG, and aims at finding in G a maximum-weight arborescence rooted in r in which no color appears more than once. The MCA problem models the de novo inference of unknown metabolites by mass spectrometry experiments. Although the problem has been introduced ten years ago (under a different name), it was only recently pointed out that a crucial additional property in the problem definition was missing: by essence, H(G) must be a DAG. In this paper, we further investigate MCA under this new light and provide new algorithmic results for this problem, with a focus on fixed-parameter tractability (FPT) issues for different structural parameters of H(G). In particular, we develop an O^*(3^{{x_H}})-time algorithm for solving MCA, where {x_{H}} is the number of vertices of indegree at least two in H(G), thereby improving the O^*(3^{|C|})-time algorithm of Böcker et al. [Proc. ECCB '08]. We also prove that MCA is W[2]-hard with respect to the treewidth t_H of the underlying undirected graph of H(G), and further show that it is FPT with respect to t_H + l_{C}, where l_{C} := |V| - |C|.

Cite as

Guillaume Fertin, Julien Fradin, and Christian Komusiewicz. On the Maximum Colorful Arborescence Problem and Color Hierarchy Graph Structure. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 17:1-17:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{fertin_et_al:LIPIcs.CPM.2018.17,
  author =	{Fertin, Guillaume and Fradin, Julien and Komusiewicz, Christian},
  title =	{{On the Maximum Colorful Arborescence Problem and Color Hierarchy Graph Structure}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{17:1--17:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.17},
  URN =		{urn:nbn:de:0030-drops-86939},
  doi =		{10.4230/LIPIcs.CPM.2018.17},
  annote =	{Keywords: Subgraph problem, computational complexity, algorithms, fixed-parameter tractability, kernelization}
}
Document
Dualities in Tree Representations

Authors: Rayan Chikhi and Alexander Schönhuth


Abstract
A characterization of the tree T^* such that BP(T^*)=ova{DFUDS(T)}, the reversal of DFUDS(T) is given. An immediate consequence is a rigorous characterization of the tree T^ such that BP(T^)=DFUDS(T). In summary, BP and DFUDS are unified within an encompassing framework, which might have the potential to imply future simplifications with regard to queries in BP and/or DFUDS. Immediate benefits displayed here are to identify so far unnoted commonalities in most recent work on the Range Minimum Query problem, and to provide improvements for the Minimum Length Interval Query problem.

Cite as

Rayan Chikhi and Alexander Schönhuth. Dualities in Tree Representations. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 18:1-18:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{chikhi_et_al:LIPIcs.CPM.2018.18,
  author =	{Chikhi, Rayan and Sch\"{o}nhuth, Alexander},
  title =	{{Dualities in Tree Representations}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{18:1--18:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.18},
  URN =		{urn:nbn:de:0030-drops-86901},
  doi =		{10.4230/LIPIcs.CPM.2018.18},
  annote =	{Keywords: Data Structures, Succinct Tree Representation, Balanced Parenthesis Representation, Isomorphisms}
}
Document
Longest Lyndon Substring After Edit

Authors: Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda


Abstract
The longest Lyndon substring of a string T is the longest substring of T which is a Lyndon word. LLS(T) denotes the length of the longest Lyndon substring of a string T. In this paper, we consider computing LLS(T') where T' is an edited string formed from T. After O(n) time and space preprocessing, our algorithm returns LLS(T') in O(log n) time for any single character edit. We also consider a version of the problem with block edits, i.e., a substring of T is replaced by a given string of length l. After O(n) time and space preprocessing, our algorithm returns LLS(T') in O(l log sigma + log n) time for any block edit where sigma is the number of distinct characters in T. We can modify our algorithm so as to output all the longest Lyndon substrings of T' for both problems.

Cite as

Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Longest Lyndon Substring After Edit. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 19:1-19:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{urabe_et_al:LIPIcs.CPM.2018.19,
  author =	{Urabe, Yuki and Nakashima, Yuto and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Longest Lyndon Substring After Edit}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{19:1--19:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.19},
  URN =		{urn:nbn:de:0030-drops-86913},
  doi =		{10.4230/LIPIcs.CPM.2018.19},
  annote =	{Keywords: Lyndon word, Lyndon factorization, Lyndon tree, Edit operation}
}
Document
The Heaviest Induced Ancestors Problem Revisited

Authors: Paniz Abedin, Sahar Hooshmand, Arnab Ganguly, and Sharma V. Thankachan


Abstract
We revisit the heaviest induced ancestors problem, which has several interesting applications in string matching. Let T_1 and T_2 be two weighted trees, where the weight W(u) of a node u in either of the two trees is more than the weight of u's parent. Additionally, the leaves in both trees are labeled and the labeling of the leaves in T_2 is a permutation of those in T_1. A node x in T_1 and a node y in T_2 are induced, iff their subtree have at least one common leaf label. A heaviest induced ancestor query HIA(u_1,u_2) is: given a node u_1 in T_1 and a node u_2 in T_2, output the pair (u_1^*,u_2^*) of induced nodes with the highest combined weight W(u^*_1) + W(u^*_2), such that u_1^* is an ancestor of u_1 and u^*_2 is an ancestor of u_2. Let n be the number of nodes in both trees combined and epsilon >0 be an arbitrarily small constant. Gagie et al. [CCCG' 13] introduced this problem and proposed three solutions with the following space-time trade-offs: - an O(n log^2n)-word data structure with O(log n log log n) query time - an O(n log n)-word data structure with O(log^2 n) query time - an O(n)-word data structure with O(log^{3+epsilon}n) query time. In this paper, we revisit this problem and present new data structures, with improved bounds. Our results are as follows. - an O(n log n)-word data structure with O(log n log log n) query time - an O(n)-word data structure with O(log^2 n/log log n) query time. As a corollary, we also improve the LZ compressed index of Gagie et al. [CCCG' 13] for answering longest common substring (LCS) queries. Additionally, we show that the LCS after one edit problem of size n [Amir et al., SPIRE' 17] can also be reduced to the heaviest induced ancestors problem over two trees of n nodes in total. This yields a straightforward improvement over its current solution of O(n log^3 n) space and O(log^3 n) query time.

Cite as

Paniz Abedin, Sahar Hooshmand, Arnab Ganguly, and Sharma V. Thankachan. The Heaviest Induced Ancestors Problem Revisited. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 20:1-20:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{abedin_et_al:LIPIcs.CPM.2018.20,
  author =	{Abedin, Paniz and Hooshmand, Sahar and Ganguly, Arnab and Thankachan, Sharma V.},
  title =	{{The Heaviest Induced Ancestors Problem Revisited}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{20:1--20:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.20},
  URN =		{urn:nbn:de:0030-drops-86898},
  doi =		{10.4230/LIPIcs.CPM.2018.20},
  annote =	{Keywords: Data Structure, String Algorithms, Orthogonal Range Queries}
}
Document
Superstrings with multiplicities

Authors: Bastien Cazaux and Eric Rivals


Abstract
A superstring of a set of words P = {s_1, ..., s_p } is a string that contains each word of P as substring. Given P, the well known Shortest Linear Superstring problem (SLS), asks for a shortest superstring of P. In a variant of SLS, called Multi-SLS, each word s_i comes with an integer m(i), its multiplicity, that sets a constraint on its number of occurrences, and the goal is to find a shortest superstring that contains at least m(i) occurrences of s_i. Multi-SLS generalizes SLS and is obviously as hard to solve, but it has been studied only in special cases (with words of length 2 or with a fixed number of words). The approximability of Multi-SLS in the general case remains open. Here, we study the approximability of Multi-SLS and that of the companion problem Multi-SCCS, which asks for a shortest cyclic cover instead of shortest superstring. First, we investigate the approximation of a greedy algorithm for maximizing the compression offered by a superstring or by a cyclic cover: the approximation ratio is 1/2 for Multi-SLS and 1 for Multi-SCCS. Then, we exhibit a linear time approximation algorithm, Concat-Greedy, and show it achieves a ratio of 4 regarding the superstring length. This demonstrates that for both measures Multi-SLS belongs to the class of APX problems.

Cite as

Bastien Cazaux and Eric Rivals. Superstrings with multiplicities. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 21:1-21:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{cazaux_et_al:LIPIcs.CPM.2018.21,
  author =	{Cazaux, Bastien and Rivals, Eric},
  title =	{{Superstrings with multiplicities}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{21:1--21:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.21},
  URN =		{urn:nbn:de:0030-drops-86881},
  doi =		{10.4230/LIPIcs.CPM.2018.21},
  annote =	{Keywords: greedy algorithm, approximation, overlap, cyclic cover, APX, subset system}
}
Document
Linear-time algorithms for the subpath kernel

Authors: Kilho Shin and Taichi Ishikawa


Abstract
The subpath kernel is a useful positive definite kernel, which takes arbitrary rooted trees as input, no matter whether they are ordered or unordered, We first show that the subpath kernel can exhibit excellent classification performance in combination with SVM through an intensive experiment. Secondly, we develop a theory of irreducible trees, and then, using it as a rigid mathematical basis, reconstruct a bottom-up linear-time algorithm for the subtree kernel, which is a correction of an algorithm well-known in the literature. Thirdly, we show a novel top-down algorithm, with which we can realize a linear-time parallel-computing algorithm to compute the subpath kernel.

Cite as

Kilho Shin and Taichi Ishikawa. Linear-time algorithms for the subpath kernel. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 22:1-22:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{shin_et_al:LIPIcs.CPM.2018.22,
  author =	{Shin, Kilho and Ishikawa, Taichi},
  title =	{{Linear-time algorithms for the subpath kernel}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{22:1--22:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.22},
  URN =		{urn:nbn:de:0030-drops-86877},
  doi =		{10.4230/LIPIcs.CPM.2018.22},
  annote =	{Keywords: tree, kernel, suffix tree}
}
Document
Linear-Time Algorithm for Long LCF with k Mismatches

Authors: Panagiotis Charalampopoulos, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Walen


Abstract
In the Longest Common Factor with k Mismatches (LCF_k) problem, we are given two strings X and Y of total length n, and we are asked to find a pair of maximal-length factors, one of X and the other of Y, such that their Hamming distance is at most k. Thankachan et al. [Thankachan et al. 2016] show that this problem can be solved in O(n log^k n) time and O(n) space for constant k. We consider the LCF_k(l) problem in which we assume that the sought factors have length at least l. We use difference covers to reduce the LCF_k(l) problem with l=Omega(log^{2k+2}n) to a task involving m=O(n/log^{k+1}n) synchronized factors. The latter can be solved in O(m log^{k+1}m) time, which results in a linear-time algorithm for LCF_k(l) with l=Omega(log^{2k+2}n). In general, our solution to the LCF_k(l) problem for arbitrary l takes O(n + n log^{k+1} n/sqrt{l}) time.

Cite as

Panagiotis Charalampopoulos, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Walen. Linear-Time Algorithm for Long LCF with k Mismatches. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 23:1-23:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.CPM.2018.23,
  author =	{Charalampopoulos, Panagiotis and Crochemore, Maxime and Iliopoulos, Costas S. and Kociumaka, Tomasz and Pissis, Solon P. and Radoszewski, Jakub and Rytter, Wojciech and Walen, Tomasz},
  title =	{{Linear-Time Algorithm for Long LCF with k Mismatches}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{23:1--23:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.23},
  URN =		{urn:nbn:de:0030-drops-86869},
  doi =		{10.4230/LIPIcs.CPM.2018.23},
  annote =	{Keywords: longest common factor, longest common substring, Hamming distance, heavy-light decomposition, difference cover}
}
Document
Lyndon Factorization of Grammar Compressed Texts Revisited

Authors: Isamu Furuya, Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda


Abstract
We revisit the problem of computing the Lyndon factorization of a string w of length N which is given as a straight line program (SLP) of size n. For this problem, we show a new algorithm which runs in O(P(n, N) + Q(n, N)n log log N) time and O(n log N + S(n, N)) space where P(n, N), S(n,N), Q(n,N) are respectively the pre-processing time, space, and query time of a data structure for longest common extensions (LCE) on SLPs. Our algorithm improves the algorithm proposed by I et al. (TCS '17), and can be more efficient than the O(N)-time solution by Duval (J. Algorithms '83) when w is highly compressible.

Cite as

Isamu Furuya, Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Lyndon Factorization of Grammar Compressed Texts Revisited. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 24:1-24:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{furuya_et_al:LIPIcs.CPM.2018.24,
  author =	{Furuya, Isamu and Nakashima, Yuto and I, Tomohiro and Inenaga, Shunsuke and Bannai, Hideo and Takeda, Masayuki},
  title =	{{Lyndon Factorization of Grammar Compressed Texts Revisited}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{24:1--24:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.24},
  URN =		{urn:nbn:de:0030-drops-86855},
  doi =		{10.4230/LIPIcs.CPM.2018.24},
  annote =	{Keywords: Lyndon word, Lyndon factorization, Straight line program}
}

Filters


Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail