Search Results

Documents authored by Amir, Amihood


Document
The k-Mappability Problem Revisited

Authors: Amihood Amir, Itai Boneh, and Eitan Kondratovsky

Published in: LIPIcs, Volume 191, 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)


Abstract
The k-mappability problem has two integers parameters m and k. For every subword of size m in a text S, we wish to report the number of indices in S in which the word occurs with at most k mismatches. The problem was lately tackled by Alzamel et al. [Mai Alzamel et al., 2018]. For a text with constant alphabet Σ and k ∈ O(1), they present an algorithm with linear space and O(nlog^{k+1}n) time. For the case in which k = 1 and a constant size alphabet, a faster algorithm with linear space and O(nlog(n)log log(n)) time was presented in [Mai Alzamel et al., 2020]. In this work, we enhance the techniques of [Mai Alzamel et al., 2020] to obtain an algorithm with linear space and O(n log(n)) time for k = 1. Our algorithm removes the constraint of the alphabet being of constant size. We also present linear algorithms for the case of k = 1, |Σ| ∈ O(1) and m = Ω(√n).

Cite as

Amihood Amir, Itai Boneh, and Eitan Kondratovsky. The k-Mappability Problem Revisited. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 5:1-5:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.CPM.2021.5,
  author =	{Amir, Amihood and Boneh, Itai and Kondratovsky, Eitan},
  title =	{{The k-Mappability Problem Revisited}},
  booktitle =	{32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)},
  pages =	{5:1--5:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-186-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{191},
  editor =	{Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2021.5},
  URN =		{urn:nbn:de:0030-drops-139566},
  doi =		{10.4230/LIPIcs.CPM.2021.5},
  annote =	{Keywords: Pattern Matching, Hamming Distance, Suffix Tree, Suffix Array}
}
Document
Update Query Time Trade-Off for Dynamic Suffix Arrays

Authors: Amihood Amir and Itai Boneh

Published in: LIPIcs, Volume 181, 31st International Symposium on Algorithms and Computation (ISAAC 2020)


Abstract
The Suffix Array SA(S) of a string S[1 … n] is an array containing all the suffixes of S sorted by lexicographic order. The suffix array is one of the most well known indexing data structures, and it functions as a key tool in many string algorithms. In this paper, we present a data structure for maintaining the Suffix Array of a dynamic string. For every 1 ≤ k ≤ n, our data structure reports SA[i] in 𝒪̃(n/k) time and handles text modification in 𝒪̃(k) time. Additionally, our data structure enables the same query time for reporting iSA[i], with iSA being the Inverse Suffix Array of S[1 … n]. Our data structure can be used to construct sub-linear dynamic variants of static strings algorithms or data structures that are based on the Suffix Array and the Inverse Suffix Array.

Cite as

Amihood Amir and Itai Boneh. Update Query Time Trade-Off for Dynamic Suffix Arrays. In 31st International Symposium on Algorithms and Computation (ISAAC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 181, pp. 63:1-63:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.ISAAC.2020.63,
  author =	{Amir, Amihood and Boneh, Itai},
  title =	{{Update Query Time Trade-Off for Dynamic Suffix Arrays}},
  booktitle =	{31st International Symposium on Algorithms and Computation (ISAAC 2020)},
  pages =	{63:1--63:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-173-3},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{181},
  editor =	{Cao, Yixin and Cheng, Siu-Wing and Li, Minming},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2020.63},
  URN =		{urn:nbn:de:0030-drops-134070},
  doi =		{10.4230/LIPIcs.ISAAC.2020.63},
  annote =	{Keywords: String Algorithms, Dynamic Algorithms, Suffix Array, Inverse Suffix Array}
}
Document
Analysis of the Period Recovery Error Bound

Authors: Amihood Amir, Itai Boneh, Michael Itzhaki, and Eitan Kondratovsky

Published in: LIPIcs, Volume 173, 28th Annual European Symposium on Algorithms (ESA 2020)


Abstract
The recovery problem is the problem whose input is a corrupted text T that was originally periodic, and where one wishes to recover its original period. The algorithm’s input is T without any information about either the period’s length or the period itself. An algorithm that solves this problem is called a recovery algorithm. In order to make recovery possible, there must be some assumption that not "too many" errors corrupted the initial periodic string. This is called the error bound. In previous recovery algorithms, it was shown that a given error bound of n/((2+ε)p) can lead to O(log_{1+ε} n) period candidates, that are guaranteed to include the original period, where p is the length of the original period (unknown by the algorithm) and ε > 0 is an arbitrary constant. This paper provides the first analysis of the relationship between the error bound and the number of candidates, as well as identification of the error parameters that still guarantee recovery. We improve the previously known upper error bound on the number of corruptions, n/((2+ε)p), that outputs O(log_{1+ε} n) period candidates. We show how to (1) remove ε from the bound, (2) relax the error bound to allow more errors while keeping the candidates set of size O(log n). It turns out that this relaxation on the previously known upper bound is quite challenging. To achieve this result we provide what, to our knowledge, is the first known non-trivial lower bound on the Hamming distance between two periodic strings. This proof leads to an error bound, that produces a family of period candidates of size 2log₃ n. We show that this result is tight and further provide a compact representation of the period candidates. We call this representation the canonic period seed. In addition to providing less restrictive error bounds that guarantee a smaller candidate set, we also provide a hierarchy of more restrictive upper error bounds that asymptotically reduces the size of the potential period candidate set.

Cite as

Amihood Amir, Itai Boneh, Michael Itzhaki, and Eitan Kondratovsky. Analysis of the Period Recovery Error Bound. In 28th Annual European Symposium on Algorithms (ESA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 173, pp. 5:1-5:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.ESA.2020.5,
  author =	{Amir, Amihood and Boneh, Itai and Itzhaki, Michael and Kondratovsky, Eitan},
  title =	{{Analysis of the Period Recovery Error Bound}},
  booktitle =	{28th Annual European Symposium on Algorithms (ESA 2020)},
  pages =	{5:1--5:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-162-7},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{173},
  editor =	{Grandoni, Fabrizio and Herman, Grzegorz and Sanders, Peter},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2020.5},
  URN =		{urn:nbn:de:0030-drops-128717},
  doi =		{10.4230/LIPIcs.ESA.2020.5},
  annote =	{Keywords: Period Recovery, Period Recovery Hierarchy, Hamming Distance}
}
Document
Double String Tandem Repeats

Authors: Amihood Amir, Ayelet Butman, Gad M. Landau, Shoshana Marcus, and Dina Sokol

Published in: LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)


Abstract
A tandem repeat is an occurrence of two adjacent identical substrings. In this paper, we introduce the notion of a double string, which consists of two parallel strings, and we study the problem of locating all tandem repeats in a double string. The problem introduced here has applications beyond actual double strings, as we illustrate by solving two different problems with the algorithm of the double string tandem repeats problem. The first problem is that of finding all corner-sharing tandems in a 2-dimensional text, defined by Apostolico and Brimkov. The second problem is that of finding all scaled tandem repeats in a 1d text, where a scaled tandem repeat is defined as a string UU' such that U' is discrete scale of U. In addition to the algorithms for exact tandem repeats, we also present algorithms that solve the problem in the inexact sense, allowing up to k mismatches. We believe that this framework will open a new perspective for other problems in the future.

Cite as

Amihood Amir, Ayelet Butman, Gad M. Landau, Shoshana Marcus, and Dina Sokol. Double String Tandem Repeats. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 3:1-3:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.CPM.2020.3,
  author =	{Amir, Amihood and Butman, Ayelet and Landau, Gad M. and Marcus, Shoshana and Sokol, Dina},
  title =	{{Double String Tandem Repeats}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{3:1--3:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{G{\o}rtz, Inge Li and Weimann, Oren},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.3},
  URN =		{urn:nbn:de:0030-drops-121283},
  doi =		{10.4230/LIPIcs.CPM.2020.3},
  annote =	{Keywords: double string, tandem repeat, 2-dimensional, scale}
}
Document
Repetition Detection in a Dynamic String

Authors: Amihood Amir, Itai Boneh, Panagiotis Charalampopoulos, and Eitan Kondratovsky

Published in: LIPIcs, Volume 144, 27th Annual European Symposium on Algorithms (ESA 2019)


Abstract
A string UU for a non-empty string U is called a square. Squares have been well-studied both from a combinatorial and an algorithmic perspective. In this paper, we are the first to consider the problem of maintaining a representation of the squares in a dynamic string S of length at most n. We present an algorithm that updates this representation in n^o(1) time. This representation allows us to report a longest square-substring of S in O(1) time and all square-substrings of S in O(output) time. We achieve this by introducing a novel tool - maintaining prefix-suffix matches of two dynamic strings. We extend the above result to address the problem of maintaining a representation of all runs (maximal repetitions) of the string. Runs are known to capture the periodic structure of a string, and, as an application, we show that our representation of runs allows us to efficiently answer periodicity queries for substrings of a dynamic string. These queries have proven useful in static pattern matching problems and our techniques have the potential of offering solutions to these problems in a dynamic text setting.

Cite as

Amihood Amir, Itai Boneh, Panagiotis Charalampopoulos, and Eitan Kondratovsky. Repetition Detection in a Dynamic String. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 5:1-5:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.ESA.2019.5,
  author =	{Amir, Amihood and Boneh, Itai and Charalampopoulos, Panagiotis and Kondratovsky, Eitan},
  title =	{{Repetition Detection in a Dynamic String}},
  booktitle =	{27th Annual European Symposium on Algorithms (ESA 2019)},
  pages =	{5:1--5:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-124-5},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{144},
  editor =	{Bender, Michael A. and Svensson, Ola and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2019.5},
  URN =		{urn:nbn:de:0030-drops-111265},
  doi =		{10.4230/LIPIcs.ESA.2019.5},
  annote =	{Keywords: string algorithms, dynamic algorithms, squares, repetitions, runs}
}
Document
Longest Common Substring Made Fully Dynamic

Authors: Amihood Amir, Panagiotis Charalampopoulos, Solon P. Pissis, and Jakub Radoszewski

Published in: LIPIcs, Volume 144, 27th Annual European Symposium on Algorithms (ESA 2019)


Abstract
Given two strings S and T, each of length at most n, the longest common substring (LCS) problem is to find a longest substring common to S and T. This is a classical problem in computer science with an O(n)-time solution. In the fully dynamic setting, edit operations are allowed in either of the two strings, and the problem is to find an LCS after each edit. We present the first solution to this problem requiring sublinear time in n per edit operation. In particular, we show how to find an LCS after each edit operation in O~(n^(2/3)) time, after O~(n)-time and space preprocessing. This line of research has been recently initiated in a somewhat restricted dynamic variant by Amir et al. [SPIRE 2017]. More specifically, they presented an O~(n)-sized data structure that returns an LCS of the two strings after a single edit operation (that is reverted afterwards) in O~(1) time. At CPM 2018, three papers (Abedin et al., Funakoshi et al., and Urabe et al.) studied analogously restricted dynamic variants of problems on strings. We show that the techniques we develop can be applied to obtain fully dynamic algorithms for all of these variants. The only previously known sublinear-time dynamic algorithms for problems on strings were for maintaining a dynamic collection of strings for comparison queries and for pattern matching, with the most recent advances made by Gawrychowski et al. [SODA 2018] and by Clifford et al. [STACS 2018]. As an intermediate problem we consider computing the solution for a string with a given set of k edits, which leads us, in particular, to answering internal queries on a string. The input to such a query is specified by a substring (or substrings) of a given string. Data structures for answering internal string queries that were proposed by Kociumaka et al. [SODA 2015] and by Gagie et al. [CCCG 2013] are used, along with new ones, based on ingredients such as the suffix tree, heavy-path decomposition, orthogonal range queries, difference covers, and string periodicity.

Cite as

Amihood Amir, Panagiotis Charalampopoulos, Solon P. Pissis, and Jakub Radoszewski. Longest Common Substring Made Fully Dynamic. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 6:1-6:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.ESA.2019.6,
  author =	{Amir, Amihood and Charalampopoulos, Panagiotis and Pissis, Solon P. and Radoszewski, Jakub},
  title =	{{Longest Common Substring Made Fully Dynamic}},
  booktitle =	{27th Annual European Symposium on Algorithms (ESA 2019)},
  pages =	{6:1--6:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-124-5},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{144},
  editor =	{Bender, Michael A. and Svensson, Ola and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2019.6},
  URN =		{urn:nbn:de:0030-drops-111275},
  doi =		{10.4230/LIPIcs.ESA.2019.6},
  annote =	{Keywords: longest common substring, string algorithms, dynamic algorithms}
}
Document
Sufficient Conditions for Efficient Indexing Under Different Matchings

Authors: Amihood Amir and Eitan Kondratovsky

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)


Abstract
The most important task derived from the massive digital data accumulation in the world, is efficient access to this data, hence the importance of indexing. In the last decade, many different types of matching relations were defined, each requiring an efficient indexing scheme. Cole and Hariharan in a ground breaking paper [Cole and Hariharan, SIAM J. Comput., 33(1):26–42, 2003], formulate sufficient conditions for building an efficient indexing for quasi-suffix collections, collections that behave as suffixes. It was shown that known matchings, including parameterized, 2-D array and order preserving matchings, fit their indexing settings. In this paper, we formulate more basic sufficient conditions based on the order relation derived from the matching relation itself, our conditions are more general than the previously known conditions.

Cite as

Amihood Amir and Eitan Kondratovsky. Sufficient Conditions for Efficient Indexing Under Different Matchings. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 6:1-6:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.CPM.2019.6,
  author =	{Amir, Amihood and Kondratovsky, Eitan},
  title =	{{Sufficient Conditions for Efficient Indexing Under Different Matchings}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{6:1--6:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.6},
  URN =		{urn:nbn:de:0030-drops-104773},
  doi =		{10.4230/LIPIcs.CPM.2019.6},
  annote =	{Keywords: off-the-shelf indexing algorithms, general matching relations, weaker sufficient conditions for indexing}
}
Document
Cartesian Tree Matching and Indexing

Authors: Sung Gwan Park, Amihood Amir, Gad M. Landau, and Kunsoo Park

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)


Abstract
We introduce a new metric of match, called Cartesian tree matching, which means that two strings match if they have the same Cartesian trees. Based on Cartesian tree matching, we define single pattern matching for a text of length n and a pattern of length m, and multiple pattern matching for a text of length n and k patterns of total length m. We present an O(n+m) time algorithm for single pattern matching, and an O((n+m) log k) deterministic time or O(n+m) randomized time algorithm for multiple pattern matching. We also define an index data structure called Cartesian suffix tree, and present an O(n) randomized time algorithm to build the Cartesian suffix tree. Our efficient algorithms for Cartesian tree matching use a representation of the Cartesian tree, called the parent-distance representation.

Cite as

Sung Gwan Park, Amihood Amir, Gad M. Landau, and Kunsoo Park. Cartesian Tree Matching and Indexing. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 16:1-16:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{park_et_al:LIPIcs.CPM.2019.16,
  author =	{Park, Sung Gwan and Amir, Amihood and Landau, Gad M. and Park, Kunsoo},
  title =	{{Cartesian Tree Matching and Indexing}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{16:1--16:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.16},
  URN =		{urn:nbn:de:0030-drops-104879},
  doi =		{10.4230/LIPIcs.CPM.2019.16},
  annote =	{Keywords: Cartesian tree matching, Pattern matching, Indexing, Parent-distance representation}
}
Document
Two-Dimensional Maximal Repetitions

Authors: Amihood Amir, Gad M. Landau, Shoshana Marcus, and Dina Sokol

Published in: LIPIcs, Volume 112, 26th Annual European Symposium on Algorithms (ESA 2018)


Abstract
Maximal repetitions or runs in strings have a wide array of applications and thus have been extensively studied. In this paper, we extend this notion to 2-dimensions, precisely defining a maximal 2D repetition. We provide initial bounds on the number of maximal 2D repetitions that can occur in a matrix. The main contribution of this paper is the presentation of the first algorithm for locating all maximal 2D repetitions in a matrix. The algorithm is efficient and straightforward, with runtime O(n^2 log n log log n+ rho log n), where n^2 is the size of the input, and rho is the number of 2D repetitions in the output.

Cite as

Amihood Amir, Gad M. Landau, Shoshana Marcus, and Dina Sokol. Two-Dimensional Maximal Repetitions. In 26th Annual European Symposium on Algorithms (ESA 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 112, pp. 2:1-2:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.ESA.2018.2,
  author =	{Amir, Amihood and Landau, Gad M. and Marcus, Shoshana and Sokol, Dina},
  title =	{{Two-Dimensional Maximal Repetitions}},
  booktitle =	{26th Annual European Symposium on Algorithms (ESA 2018)},
  pages =	{2:1--2:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-081-1},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{112},
  editor =	{Azar, Yossi and Bast, Hannah and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2018.2},
  URN =		{urn:nbn:de:0030-drops-94652},
  doi =		{10.4230/LIPIcs.ESA.2018.2},
  annote =	{Keywords: pattern matching algorithms, repetitions, periodicity, two-dimensional}
}
Document
Quasi-Periodicity Under Mismatch Errors

Authors: Amihood Amir, Avivit Levy, and Ely Porat

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)


Abstract
Tracing regularities plays a key role in data analysis for various areas of science, including coding and automata theory, formal language theory, combinatorics, molecular biology and many others. Part of the scientific process is understanding and explaining these regularities. A common notion to describe regularity in a string T is a cover or quasi-period, which is a string C for which every letter of T lies within some occurrence of C. In many applications finding exact repetitions is not sufficient, due to the presence of errors. In this paper we initiate the study of quasi-periodicity persistence under mismatch errors, and our goal is to characterize situations where a given quasi-periodic string remains quasi-periodic even after substitution errors have been introduced to the string. Our study results in proving necessary conditions as well as a theorem stating sufficient conditions for quasi-periodicity persistence. As an application, we are able to close the gap in understanding the complexity of Approximate Cover Problem (ACP) relaxations studied by [Amir 2017a, Amir 2017b] and solve an open question.

Cite as

Amihood Amir, Avivit Levy, and Ely Porat. Quasi-Periodicity Under Mismatch Errors. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 4:1-4:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.CPM.2018.4,
  author =	{Amir, Amihood and Levy, Avivit and Porat, Ely},
  title =	{{Quasi-Periodicity Under Mismatch Errors}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{4:1--4:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.4},
  URN =		{urn:nbn:de:0030-drops-87054},
  doi =		{10.4230/LIPIcs.CPM.2018.4},
  annote =	{Keywords: Periodicity, Quasi-Periodicity, Cover, Approximate Cover}
}
Document
Locally Maximal Common Factors as a Tool for Efficient Dynamic String Algorithms

Authors: Amihood Amir and Itai Boneh

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)


Abstract
There has been recent interest in dynamic string algorithms, i.e. string problems where the input changes dynamically. One such problem is the longest common factor (LCF) problem. It is well known that the LCF of two strings S and D of length n over a fixed constant-sized alphabet Sigma can be computed in time linear in n. Recently, a new challenge was introduced - finding the LCF of two strings in a dynamic setting. The problem is the fully dynamic one sided LCF (FDOS-LCF) problem. In the FDOS-LCF problem we get q consecutive queries of the form <i,alpha >, where each such query means: "replace D[i] by alpha, alpha in Sigma and output the LCF of S and (the updated) D. The goal is to initially preprocess S and D so that we do not need O(n) time to compute an LCF for each such query. The state-of-the-art is an algorithm that preprocesses the two strings S and D in time O(n log^4 n). Subsequently, the algorithm answers in time O(log^3 n) a single query of the form: Given a position i on D and a letter alpha, return an LCF of S and D', where D' is the string resulting from D after substituting D[i] with alpha. That algorithm is not extendable to multiple queries. In this paper we present a tool - Locally Maximal Common Factors (LMCF) - that proves to be quite useful in solving some restricted versions of the FDOS-LCF problem . The versions we solve are the Decremental FDOS-LCS problem, where every change <i,alpha> is of the form <i,omega>, omega !in Sigma, and the Periodic FDOS-LCS problem, where S is a periodic string with period length p. For the decremental problem we provide an algorithm with linear time preprocessing and O(log log n) time per query. For the periodic problem our preprocessing time is linear and the query time is O(p log log n).

Cite as

Amihood Amir and Itai Boneh. Locally Maximal Common Factors as a Tool for Efficient Dynamic String Algorithms. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 11:1-11:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.CPM.2018.11,
  author =	{Amir, Amihood and Boneh, Itai},
  title =	{{Locally Maximal Common Factors as a Tool for Efficient Dynamic String Algorithms}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{11:1--11:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.11},
  URN =		{urn:nbn:de:0030-drops-86983},
  doi =		{10.4230/LIPIcs.CPM.2018.11},
  annote =	{Keywords: Dynamic Algorithms, Periodicity, Longest Common Factor, Priority Queue Data Structures, Suffix
Tree, Balanced Search Tree, Range Maximum Queries}
}
Document
Can We Recover the Cover?

Authors: Amihood Amir, Avivit Levy, Moshe Lewenstein, Ronit Lubin, and Benny Porat

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)


Abstract
Data analysis typically involves error recovery and detection of regularities as two different key tasks. In this paper we show that there are data types for which these two tasks can be powerfully combined. A common notion of regularity in strings is that of a cover. Data describing measures of a natural coverable phenomenon may be corrupted by errors caused by the measurement process, or by the inexact features of the phenomenon itself. Due to this reason, different variants of approximate covers have been introduced, some of which are NP-hard to compute. In this paper we assume that the Hamming distance metric measures the amount of corruption experienced, and study the problem of recovering the correct cover from data corrupted by mismatch errors, formally defined as the cover recovery problem (CRP). We show that for the Hamming distance metric, coverability is a powerful property allowing detecting the original cover and correcting the data, under suitable conditions. We also study a relaxation of another problem, which is called the approximate cover problem (ACP). Since the ACP is proved to be NP-hard [Amir,Levy,Lubin,Porat, CPM 2017], we study a relaxation, which we call the candidate-relaxation of the ACP, and show it has a polynomial time complexity. As a result, we get that the ACP also has a polynomial time complexity in many practical situations. An important application of our ACP relaxation study is also a polynomial time algorithm for the cover recovery problem (CRP).

Cite as

Amihood Amir, Avivit Levy, Moshe Lewenstein, Ronit Lubin, and Benny Porat. Can We Recover the Cover?. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 25:1-25:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.CPM.2017.25,
  author =	{Amir, Amihood and Levy, Avivit and Lewenstein, Moshe and Lubin, Ronit and Porat, Benny},
  title =	{{Can We Recover the Cover?}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{25:1--25:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.25},
  URN =		{urn:nbn:de:0030-drops-73190},
  doi =		{10.4230/LIPIcs.CPM.2017.25},
  annote =	{Keywords: periodicity, quasi-periodicity, cover, approximate cover, data recovery}
}
Document
Approximate Cover of Strings

Authors: Amihood Amir, Avivit Levy, Ronit Lubin, and Ely Porat

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)


Abstract
Regularities in strings arise in various areas of science, including coding and automata theory, formal language theory, combinatorics, molecular biology and many others. A common notion to describe regularity in a string T is a cover, which is a string C for which every letter of T lies within some occurrence of C. The alignment of the cover repetitions in the given text is called a tiling. In many applications finding exact repetitions is not sufficient, due to the presence of errors. In this paper, we use a new approach for handling errors in coverable phenomena and define the approximate cover problem (ACP), in which we are given a text that is a sequence of some cover repetitions with possible mismatch errors, and we seek a string that covers the text with the minimum number of errors. We first show that the ACP is NP-hard, by studying the cover-size relaxation of the ACP, in which the requested size of the approximate cover is also given with the input string. We show this relaxation is already NP-hard. We also study another two relaxations of the ACP, which we call the partial-tiling relaxation of the ACP and the full-tiling relaxation of the ACP, in which a tiling of the requested cover is also given with the input string. A given full tiling retains all the occurrences of the cover before the errors, while in a partial tiling there can be additional occurrences of the cover that are not marked by the tiling. We show that the partial-tiling relaxation has a polynomial time complexity and give experimental evidence that the full-tiling also has polynomial time complexity. The study of these relaxations, besides shedding another light on the complexity of the ACP, also involves a deep understanding of the properties of covers, yielding some key lemmas and observations that may be helpful for a future study of regularities in the presence of errors.

Cite as

Amihood Amir, Avivit Levy, Ronit Lubin, and Ely Porat. Approximate Cover of Strings. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 26:1-26:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.CPM.2017.26,
  author =	{Amir, Amihood and Levy, Avivit and Lubin, Ronit and Porat, Ely},
  title =	{{Approximate Cover of Strings}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{26:1--26:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.26},
  URN =		{urn:nbn:de:0030-drops-73189},
  doi =		{10.4230/LIPIcs.CPM.2017.26},
  annote =	{Keywords: periodicity, quasi-periodicity, cover, approximate cover}
}
Document
Mind the Gap: Essentially Optimal Algorithms for Online Dictionary Matching with One Gap

Authors: Amihood Amir, Tsvi Kopelowitz, Avivit Levy, Seth Pettie, Ely Porat, and B. Riva Shalom

Published in: LIPIcs, Volume 64, 27th International Symposium on Algorithms and Computation (ISAAC 2016)


Abstract
We examine the complexity of the online Dictionary Matching with One Gap Problem (DMOG) which is the following. Preprocess a dictionary D of d patterns, where each pattern contains a special gap symbol that can match any string, so that given a text that arrives online, a character at a time, we can report all of the patterns from D that are suffixes of the text that has arrived so far, before the next character arrives. In more general versions the gap symbols are associated with bounds determining the possible lengths of matching strings. Online DMOG captures the difficulty in a bottleneck procedure for cyber-security, as many digital signatures of viruses manifest themselves as patterns with a single gap. In this paper, we demonstrate that the difficulty in obtaining efficient solutions for the DMOG problem, even in the offline setting, can be traced back to the infamous 3SUM conjecture. We show a conditional lower bound of Omega(delta(G_D)+op) time per text character, where G_D is a bipartite graph that captures the structure of D, delta(G_D) is the degeneracy of this graph, and op is the output size. Moreover, we show a conditional lower bound in terms of the magnitude of gaps for the bounded case, thereby showing that some known offline upper bounds are essentially optimal. We also provide matching upper-bounds (up to sub-polynomial factors), in terms of the degeneracy, for the online DMOG problem. In particular, we introduce algorithms whose time cost depends linearly on delta(G_D). Our algorithms make use of graph orientations, together with some additional techniques. These algorithms are of practical interest since although delta(G_D) can be as large as sqrt(d), and even larger if G_D is a multi-graph, it is typically a very small constant in practice. Finally, when delta(G_D) is large we are able to obtain even more efficient solutions.

Cite as

Amihood Amir, Tsvi Kopelowitz, Avivit Levy, Seth Pettie, Ely Porat, and B. Riva Shalom. Mind the Gap: Essentially Optimal Algorithms for Online Dictionary Matching with One Gap. In 27th International Symposium on Algorithms and Computation (ISAAC 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 64, pp. 12:1-12:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.ISAAC.2016.12,
  author =	{Amir, Amihood and Kopelowitz, Tsvi and Levy, Avivit and Pettie, Seth and Porat, Ely and Shalom, B. Riva},
  title =	{{Mind the Gap: Essentially Optimal Algorithms for Online Dictionary Matching with One Gap}},
  booktitle =	{27th International Symposium on Algorithms and Computation (ISAAC 2016)},
  pages =	{12:1--12:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-026-2},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{64},
  editor =	{Hong, Seok-Hee},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2016.12},
  URN =		{urn:nbn:de:0030-drops-67841},
  doi =		{10.4230/LIPIcs.ISAAC.2016.12},
  annote =	{Keywords: Pattern matching, Dictionary matching, 3SUM, Triangle reporting}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail