DROPS

Document

DOI: 10.4230/LIPIcs.ESA.2024.31

String 2-Covers with No Length Restrictions

Authors: Itai Boneh, Shay Golan, and Arseny Shur

Published in: LIPIcs, Volume 308, 32nd Annual European Symposium on Algorithms (ESA 2024)

Abstract

A λ-cover of a string S is a set of strings {C_i}₁^λ such that every index in S is contained in an occurrence of at least one string C_i. The existence of a 1-cover defines a well-known class of quasi-periodic strings. Quasi-periodicity can be decided in linear time, and all 1-covers of a string can be reported in linear time as well. Since in general it is NP-complete to decide whether a string has a λ-cover, the natural next step is the development of efficient algorithms for 2-covers. Radoszewski and Straszyński [ESA 2020] analysed the particular case where the strings in a 2-cover must be of the same length. They provided an algorithm that reports all such 2-covers of S in time near-linear in |S| and in the size of the output. In this work, we consider 2-covers in full generality. Since every length-n string has Ω(n²) trivial 2-covers (every prefix and suffix of total length at least n constitute such a 2-cover), we state the reporting problem as follows: given a string S and a number m, report all 2-covers {C₁,C₂} of S with length |C₁|+|C₂| upper bounded by m. We present an Õ(n + output) time algorithm solving this problem, with output being the size of the output. This algorithm admits a simpler modification that finds a 2-cover of minimum length. We also provide an Õ(n) time construction of a 2-cover oracle which, given two substrings C₁,C₂ of S, reports in poly-logarithmic time whether {C₁,C₂} is a 2-cover of S.
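The definition of a λ-cover above can be checked directly. The following is a minimal brute-force sketch, not the near-linear algorithm or the oracle from the paper; the helper names `find_occurrences` and `is_cover` are hypothetical and chosen only for illustration.

```python
def find_occurrences(s, c):
    """Return the start indices of all (possibly overlapping) occurrences of c in s."""
    starts = []
    i = s.find(c)
    while i != -1:
        starts.append(i)
        i = s.find(c, i + 1)
    return starts


def is_cover(s, cover_strings):
    """Check whether every index of s lies inside an occurrence of some cover string.

    Naive check taking time polynomial in |s|; assumed helper, not the paper's oracle.
    """
    covered = [False] * len(s)
    for c in cover_strings:
        if not c:
            continue  # an empty string covers nothing
        for start in find_occurrences(s, c):
            for j in range(start, start + len(c)):
                covered[j] = True
    return all(covered)
```

For example, a prefix and a suffix of total length at least n always pass this check, which is the trivial family of 2-covers mentioned in the abstract: `is_cover("aabaa", ["aab", "baa"])` holds.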

Cite as

Itai Boneh, Shay Golan, and Arseny Shur. String 2-Covers with No Length Restrictions. In 32nd Annual European Symposium on Algorithms (ESA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 308, pp. 31:1-31:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@InProceedings{boneh_et_al:LIPIcs.ESA.2024.31,
  author =	{Boneh, Itai and Golan, Shay and Shur, Arseny},
  title =	{{String 2-Covers with No Length Restrictions}},
  booktitle =	{32nd Annual European Symposium on Algorithms (ESA 2024)},
  pages =	{31:1--31:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-338-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{308},
  editor =	{Chan, Timothy and Fischer, Johannes and Iacono, John and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2024.31},
  URN =		{urn:nbn:de:0030-drops-211029},
  doi =		{10.4230/LIPIcs.ESA.2024.31},
  annote =	{Keywords: Quasi-periodicity, String cover, Range query, Range stabbing}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.30

Õptimal Dynamic Time Warping on Run-Length Encoded Strings

Authors: Itai Boneh, Shay Golan, Shay Mozes, and Oren Weimann

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

Dynamic Time Warping (DTW) distance is the optimal cost of matching two strings when extending runs of letters is for free. Therefore, it is natural to measure the time complexity of DTW in terms of the number of runs n (rather than the string lengths N). In this paper, we give an Õ(n²) time algorithm for computing the DTW distance. This matches (up to log factors) the known (conditional) lower bound, and should be compared with the previous fastest O(n³) time exact algorithm and the Õ(n²) time approximation algorithm. Our method also immediately implies an Õ(nk) time algorithm when the distance is bounded by k. This should be compared with the previous fastest O(n²k) and O(Nk) time exact algorithms and the Õ(nk) time approximation algorithm.
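For reference, the textbook dynamic program for DTW runs in time quadratic in the string lengths N, not in the number of runs n. The sketch below shows only that baseline and is not the paper's run-length algorithm; the function name `dtw_distance` and the choice of absolute difference as the letter-to-letter cost are assumptions made for illustration.

```python
def dtw_distance(x, y, dist=lambda a, b: abs(a - b)):
    """Classic O(|x| * |y|) DTW over two sequences.

    dist is an assumed letter-to-letter cost; absolute difference on numbers by default.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    prev = [inf] * (m + 1)
    prev[0] = 0  # two empty sequences match at cost 0
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            # Repeat x[i-1], repeat y[j-1], or advance in both sequences.
            cur[j] = dist(x[i - 1], y[j - 1]) + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]
```

Because repeating a letter costs nothing beyond the letter-to-letter cost, `dtw_distance([1, 1, 2, 3], [1, 2, 2, 3])` is 0, which illustrates why the number of runs is the natural input parameter.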

Cite as

Itai Boneh, Shay Golan, Shay Mozes, and Oren Weimann. Õptimal Dynamic Time Warping on Run-Length Encoded Strings. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 30:1-30:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@InProceedings{boneh_et_al:LIPIcs.ICALP.2024.30,
  author =	{Boneh, Itai and Golan, Shay and Mozes, Shay and Weimann, Oren},
  title =	{{\~{O}ptimal Dynamic Time Warping on Run-Length Encoded Strings}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{30:1--30:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.30},
  URN =		{urn:nbn:de:0030-drops-201730},
  doi =		{10.4230/LIPIcs.ICALP.2024.30},
  annote =	{Keywords: Dynamic time warping, Fr\'{e}chet distance, edit distance, run-length encoding}
}

Document

DOI: 10.4230/LIPIcs.CPM.2024.10

Searching 2D-Strings for Matching Frames

Authors: Itai Boneh, Dvir Fried, Shay Golan, Matan Kraus, Adrian Miclăuş, and Arseny Shur

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)

Abstract

We study a natural type of repetitions in 2-dimensional strings. Such a repetition, called a matching frame, is a rectangular substring of size at least 2×2 with equal marginal rows and equal marginal columns. Matching frames first appeared in literature in the context of Wang tiles. We present two algorithms finding a matching frame with the maximum perimeter in a given n×m input string. The first algorithm solves the problem exactly in Õ(n^{2.5}) time (assuming n ≥ m). The second algorithm finds a (1-ε)-approximate solution in Õ((nm)/ε⁴) time, which is near linear in the size of the input for constant ε. In particular, by setting ε = O(1) the second algorithm decides the existence of a matching frame in a given string in Õ(nm) time. Some technical elements and structural properties used in these algorithms can be of independent interest.
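The definition of a matching frame can be made concrete with a brute-force search. This is only a sketch of the definition, far slower than either algorithm in the paper; the function names are hypothetical, the grid is assumed to be a list of equal-length strings, and height plus width is used as a stand-in for the perimeter (an assumption about how the perimeter is measured).

```python
def is_matching_frame(grid, top, left, bottom, right):
    """True if the rectangle is at least 2x2, its first and last rows are equal,
    and its first and last columns are equal."""
    if bottom <= top or right <= left:
        return False
    if grid[top][left:right + 1] != grid[bottom][left:right + 1]:
        return False
    return all(grid[r][left] == grid[r][right] for r in range(top, bottom + 1))


def max_perimeter_frame(grid):
    """Return (top, left, bottom, right) of a frame maximising height + width, or None."""
    n = len(grid)
    m = len(grid[0]) if grid else 0
    best, best_size = None, 0
    for top in range(n):
        for bottom in range(top + 1, n):
            for left in range(m):
                for right in range(left + 1, m):
                    size = (bottom - top + 1) + (right - left + 1)
                    if size > best_size and is_matching_frame(grid, top, left, bottom, right):
                        best, best_size = (top, left, bottom, right), size
    return best
```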

Cite as

Itai Boneh, Dvir Fried, Shay Golan, Matan Kraus, Adrian Miclăuş, and Arseny Shur. Searching 2D-Strings for Matching Frames. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 10:1-10:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@InProceedings{boneh_et_al:LIPIcs.CPM.2024.10,
  author =	{Boneh, Itai and Fried, Dvir and Golan, Shay and Kraus, Matan and Micl\u{a}u\c{s}, Adrian and Shur, Arseny},
  title =	{{Searching 2D-Strings for Matching Frames}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{10:1--10:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.10},
  URN =		{urn:nbn:de:0030-drops-201205},
  doi =		{10.4230/LIPIcs.CPM.2024.10},
  annote =	{Keywords: 2D string, matching frame, LCP, multidimensional range query}
}

Document

DOI: 10.4230/LIPIcs.CPM.2024.11

Hairpin Completion Distance Lower Bound

Authors: Itai Boneh, Dvir Fried, Shay Golan, and Matan Kraus

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)

Abstract

Hairpin completion, derived from the hairpin formation observed in DNA biochemistry, is an operation applied to strings, particularly useful in DNA computing. Conceptually, a right hairpin completion operation transforms a string S into S⋅S' where S' is the reverse complement of a prefix of S. Similarly, a left hairpin completion operation transforms a string S into S'⋅S where S' is the reverse complement of a suffix of S. The hairpin completion distance from S to T is the minimum number of hairpin completion operations needed to transform S into T. Recently Boneh et al. [Itai Boneh et al., 2023] showed an O(n²) time algorithm for finding the hairpin completion distance between two strings of length at most n. In this paper we show that for any ε > 0 there is no O(n^{2-ε})-time algorithm for the hairpin completion distance problem unless the Strong Exponential Time Hypothesis (SETH) is false. Thus, under SETH, the time complexity of the hairpin completion distance problem is quadratic, up to sub-polynomial factors.
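The two operations, as described conceptually in the abstract, can be sketched as follows. The DNA alphabet with the pairing A/T and C/G is an assumption, the function names are hypothetical, and any additional conditions the paper places on when an operation is allowed are not modelled here.

```python
# Assumed complement map for a DNA alphabet; the abstract does not fix the alphabet.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(s):
    """Reverse the string and complement every letter."""
    return "".join(COMPLEMENT[ch] for ch in reversed(s))


def right_hairpin_completion(s, k):
    """Append the reverse complement of the length-k prefix of s."""
    return s + reverse_complement(s[:k])


def left_hairpin_completion(s, k):
    """Prepend the reverse complement of the length-k suffix of s."""
    return reverse_complement(s[len(s) - k:]) + s
```

With these definitions, `right_hairpin_completion("ACG", 2)` gives `"ACGGT"` and `left_hairpin_completion("ACG", 2)` gives `"CGACG"`.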

Cite as

Itai Boneh, Dvir Fried, Shay Golan, and Matan Kraus. Hairpin Completion Distance Lower Bound. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 11:1-11:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@InProceedings{boneh_et_al:LIPIcs.CPM.2024.11,
  author =	{Boneh, Itai and Fried, Dvir and Golan, Shay and Kraus, Matan},
  title =	{{Hairpin Completion Distance Lower Bound}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{11:1--11:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.11},
  URN =		{urn:nbn:de:0030-drops-201215},
  doi =		{10.4230/LIPIcs.CPM.2024.11},
  annote =	{Keywords: Fine-grained complexity, Hairpin completion, LCS}
}

Document

APPROX

DOI: 10.4230/LIPIcs.APPROX/RANDOM.2020.46

Improved Circular k-Mismatch Sketches

Authors: Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, Ely Porat, and Przemysław Uznański

Published in: LIPIcs, Volume 176, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)

Abstract

The shift distance sh(S₁,S₂) between two strings S₁ and S₂ of the same length is defined as the minimum Hamming distance between S₁ and any rotation (cyclic shift) of S₂. We study the problem of sketching the shift distance, which is the following communication complexity problem: Strings S₁ and S₂ of length n are given to two identical players (encoders), who independently compute sketches (summaries) sk(S₁) and sk(S₂), respectively, so that upon receiving the two sketches, a third player (decoder) is able to compute (or approximate) sh(S₁,S₂) with high probability. This paper primarily focuses on the more general k-mismatch version of the problem, where the decoder is allowed to declare a failure if sh(S₁,S₂) > k, where k is a parameter known to all parties. Andoni et al. (STOC'13) introduced exact circular k-mismatch sketches of size Õ(k+D(n)), where D(n) is the number of divisors of n. Andoni et al. also showed that their sketch size is optimal in the class of linear homomorphic sketches. We circumvent this lower bound by designing a (non-linear) exact circular k-mismatch sketch of size Õ(k); this size matches communication-complexity lower bounds. We also design a (1±ε)-approximate circular k-mismatch sketch of size Õ(min(ε^{-2}√k, ε^{-1.5}√n)), which improves upon an Õ(ε^{-2}√n)-size sketch of Crouch and McGregor (APPROX'11).
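The shift distance itself, as opposed to its sketches, can be computed by trying every rotation. The sketch below is that quadratic-time definition only; it says nothing about how the sketches in the paper are built, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def shift_distance(s1, s2):
    """Minimum Hamming distance between s1 and any cyclic rotation of s2 (O(n^2) time)."""
    if len(s1) != len(s2):
        raise ValueError("shift distance is defined for strings of equal length")
    n = len(s1)
    if n == 0:
        return 0
    return min(hamming(s1, s2[r:] + s2[:r]) for r in range(n))
```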

Cite as

Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, Ely Porat, and Przemysław Uznański. Improved Circular k-Mismatch Sketches. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 176, pp. 46:1-46:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@InProceedings{golan_et_al:LIPIcs.APPROX/RANDOM.2020.46,
  author =	{Golan, Shay and Kociumaka, Tomasz and Kopelowitz, Tsvi and Porat, Ely and Uzna\'{n}ski, Przemys{\l}aw},
  title =	{{Improved Circular k-Mismatch Sketches}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)},
  pages =	{46:1--46:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-164-1},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{176},
  editor =	{Byrka, Jaros{\l}aw and Meka, Raghu},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2020.46},
  URN =		{urn:nbn:de:0030-drops-126492},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2020.46},
  annote =	{Keywords: Hamming distance, k-mismatch, sketches, rotation, cyclic shift, communication complexity}
}

Document

DOI: 10.4230/LIPIcs.CPM.2020.5

Time-Space Tradeoffs for Finding a Long Common Substring

Authors: Stav Ben-Nun, Shay Golan, Tomasz Kociumaka, and Matan Kraus

Published in: LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

Abstract

We consider the problem of finding, given two documents of total length n, a longest string occurring as a substring of both documents. This problem, known as the Longest Common Substring (LCS) problem, has a classic 𝒪(n)-time solution dating back to the discovery of suffix trees (Weiner, 1973) and their efficient construction for integer alphabets (Farach-Colton, 1997). However, these solutions require Θ(n) space, which is prohibitive in many applications. To address this issue, Starikovskaya and Vildhøj (CPM 2013) showed that for n^{2/3} ≤ s ≤ n, the LCS problem can be solved in 𝒪(s) space and 𝒪̃(n²/s) time. Kociumaka et al. (ESA 2014) generalized this tradeoff to 1 ≤ s ≤ n, thus providing a smooth time-space tradeoff from constant to linear space. In this paper, we obtain a significant speed-up for instances where the length L of the sought LCS is large. For 1 ≤ s ≤ n, we show that the LCS problem can be solved in 𝒪(s) space and 𝒪̃(n²/(L⋅s) +n) time. The result is based on techniques originating from the LCS with Mismatches problem (Flouri et al., 2015; Charalampopoulos et al., CPM 2018), on space-efficient locally consistent parsing (Birenzwige et al., SODA 2020), and on the structure of maximal repetitions (runs) in the input documents.
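As a point of comparison for the tradeoffs above, the LCS problem also has a simple quadratic-time dynamic program that keeps two rows of the table. This sketch is neither the suffix-tree solution nor the paper's algorithm; the function name is hypothetical.

```python
def longest_common_substring(a, b):
    """Return one longest string that occurs as a substring of both a and b.

    O(|a| * |b|) time, O(|b|) extra space.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Length of the longest common suffix of a[:i] and b[:j].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]
```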

Cite as

Stav Ben-Nun, Shay Golan, Tomasz Kociumaka, and Matan Kraus. Time-Space Tradeoffs for Finding a Long Common Substring. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 5:1-5:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@InProceedings{bennun_et_al:LIPIcs.CPM.2020.5,
  author =	{Ben-Nun, Stav and Golan, Shay and Kociumaka, Tomasz and Kraus, Matan},
  title =	{{Time-Space Tradeoffs for Finding a Long Common Substring}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{5:1--5:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{G{\o}rtz, Inge Li and Weimann, Oren},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.5},
  URN =		{urn:nbn:de:0030-drops-121302},
  doi =		{10.4230/LIPIcs.CPM.2020.5},
  annote =	{Keywords: longest common substring, time-space tradeoff, local consistency, periodicity}
}

Document

DOI: 10.4230/LIPIcs.CPM.2020.15

The Streaming k-Mismatch Problem: Tradeoffs Between Space and Total Time

Authors: Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, and Ely Porat

Published in: LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

Abstract

We revisit the k-mismatch problem in the streaming model on a pattern of length m and a streaming text of length n, both over a size-σ alphabet. The current state-of-the-art algorithm for the streaming k-mismatch problem, by Clifford et al. [SODA 2019], uses Õ(k) space and Õ(√k) worst-case time per character. The space complexity is known to be (unconditionally) optimal, and the worst-case time per character matches a conditional lower bound. However, there is a gap between the total time cost of the algorithm, which is Õ(n√k), and the fastest known offline algorithm, which costs Õ(n + min(nk/√m, σn)) time. Moreover, it is not known whether improvements over the Õ(n√k) total time are possible when using more than O(k) space. We address these gaps by designing a randomized streaming algorithm for the k-mismatch problem that, given an integer parameter k≤s≤m, uses Õ(s) space and costs Õ(n+min(nk²/m, nk/√s, σnm/s)) total time. For s=m, the total runtime becomes Õ(n + min(nk/√m, σn)), which matches the time cost of the fastest offline algorithm. Moreover, the worst-case time cost per character is still Õ(√k).
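The k-mismatch problem can be stated operationally with an offline O(nm) baseline: for each alignment of the pattern, report the Hamming distance if it is at most k. The sketch below is that baseline, not a streaming algorithm; the function name and the use of `None` for alignments exceeding k are assumptions.

```python
def k_mismatch(text, pattern, k):
    """For each alignment, return the Hamming distance if it is at most k, else None."""
    m = len(pattern)
    out = []
    for i in range(len(text) - m + 1):
        d = 0
        for p, t in zip(pattern, text[i:i + m]):
            if p != t:
                d += 1
                if d > k:
                    break  # this alignment already exceeds the threshold
        out.append(d if d <= k else None)
    return out
```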

Cite as

Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, and Ely Porat. The Streaming k-Mismatch Problem: Tradeoffs Between Space and Total Time. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 15:1-15:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@InProceedings{golan_et_al:LIPIcs.CPM.2020.15,
  author =	{Golan, Shay and Kociumaka, Tomasz and Kopelowitz, Tsvi and Porat, Ely},
  title =	{{The Streaming k-Mismatch Problem: Tradeoffs Between Space and Total Time}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{15:1--15:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{G{\o}rtz, Inge Li and Weimann, Oren},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.15},
  URN =		{urn:nbn:de:0030-drops-121406},
  doi =		{10.4230/LIPIcs.CPM.2020.15},
  annote =	{Keywords: Streaming pattern matching, Hamming distance, k-mismatch}
}

Document

DOI: 10.4230/LIPIcs.ICALP.2018.65

Towards Optimal Approximate Streaming Pattern Matching by Matching Multiple Patterns in Multiple Streams

Authors: Shay Golan, Tsvi Kopelowitz, and Ely Porat

Published in: LIPIcs, Volume 107, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)

Abstract

Recently, there has been a growing focus on solving approximate pattern matching problems in the streaming model. Of particular interest are the pattern matching with k-mismatches (KMM) problem and the pattern matching with w-wildcards (PMWC) problem. Motivated by reductions from these problems in the streaming model to the dictionary matching problem, this paper focuses on designing algorithms for the dictionary matching problem in the multi-stream model where there are several independent streams of data (as opposed to just one in the streaming model), and the memory complexity of an algorithm is expressed using two quantities: (1) a read-only shared memory storage area which is shared among all the streams, and (2) local stream memory that each stream stores separately. In the dictionary matching problem in the multi-stream model the goal is to preprocess a dictionary D={P_1,P_2,...,P_d} of d=|D| patterns (strings with maximum length m over alphabet Σ) into a data structure stored in shared memory, so that given multiple independent streaming texts (where characters arrive one at a time) the algorithm reports occurrences of patterns from D in each one of the texts as soon as they appear. We design two efficient algorithms for the dictionary matching problem in the multi-stream model. The first algorithm works when all the patterns in D have the same length m and costs O(d log m) words in shared memory, O(log m log d) words in stream memory, and O(log m) time per character. The second algorithm works for general D, but the time cost per character becomes O(log m+log d log log d). We also demonstrate the usefulness of our first algorithm in solving both the KMM problem and PMWC problem in the streaming model. In particular, we obtain the first almost optimal (up to poly-log factors) algorithm for the PMWC problem in the streaming model. We also design a new algorithm for the KMM problem in the streaming model that, up to poly-log factors, has the same bounds as the most recent results that use different techniques. Moreover, for most inputs, our algorithm for KMM is significantly faster on average.

Cite as

Shay Golan, Tsvi Kopelowitz, and Ely Porat. Towards Optimal Approximate Streaming Pattern Matching by Matching Multiple Patterns in Multiple Streams. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 65:1-65:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

@InProceedings{golan_et_al:LIPIcs.ICALP.2018.65,
  author =	{Golan, Shay and Kopelowitz, Tsvi and Porat, Ely},
  title =	{{Towards Optimal Approximate Streaming Pattern Matching by Matching Multiple Patterns in Multiple Streams}},
  booktitle =	{45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)},
  pages =	{65:1--65:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-076-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{107},
  editor =	{Chatzigiannakis, Ioannis and Kaklamanis, Christos and Marx, D\'{a}niel and Sannella, Donald},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2018.65},
  URN =		{urn:nbn:de:0030-drops-90690},
  doi =		{10.4230/LIPIcs.ICALP.2018.65},
  annote =	{Keywords: Streaming approximate pattern matching, Dictionary matching}
}

Document

DOI: 10.4230/LIPIcs.ESA.2017.41

Real-Time Streaming Multi-Pattern Search for Constant Alphabet

Authors: Shay Golan and Ely Porat

Published in: LIPIcs, Volume 87, 25th Annual European Symposium on Algorithms (ESA 2017)

Abstract

In the streaming multi-pattern search problem, which is also known as the streaming dictionary matching problem, a set D={P_1,P_2,...,P_d} of d patterns (strings over an alphabet Σ), called the dictionary, is given to be preprocessed. Then, a text T arrives one character at a time and the goal is to report, before the next character arrives, the longest pattern in the dictionary that is a current suffix of T. We prove that for a constant size alphabet, there exists a randomized Monte-Carlo algorithm for the streaming dictionary matching problem that takes constant time per character and uses O(d log m) words of space, where m is the length of the longest pattern in the dictionary. In the case where the alphabet size is not constant, we introduce two new randomized Monte-Carlo algorithms with the following complexities:
* O(log log |Σ|) time per character in the worst case and O(d log m) words of space.
* O(1/ε) time per character in the worst case and O(d |Σ|^ε log m/ε) words of space for any 0 < ε ≤ 1.
These results improve upon the algorithm of [Clifford et al., ESA'15] which uses O(d log m) words of space and takes O(log log (m+d)) time per character.
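The required output of the problem, the longest dictionary pattern that is a suffix of the text read so far, can be illustrated with a naive routine that stores the whole text. It ignores the space and time constraints that make the streaming problem interesting and does not reflect the paper's fingerprint-based technique; the function name is hypothetical.

```python
def longest_suffix_patterns(dictionary, text):
    """For each prefix of text, return the longest dictionary pattern that is its suffix,
    or None if no pattern matches. Naive: O(m) set lookups per character."""
    patterns = set(dictionary)
    max_len = max((len(p) for p in patterns), default=0)
    results = []
    for i in range(1, len(text) + 1):
        best = None
        # Try candidate suffix lengths from longest to shortest.
        for length in range(min(i, max_len), 0, -1):
            candidate = text[i - length:i]
            if candidate in patterns:
                best = candidate
                break
        results.append(best)
    return results
```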

Cite as

Shay Golan and Ely Porat. Real-Time Streaming Multi-Pattern Search for Constant Alphabet. In 25th Annual European Symposium on Algorithms (ESA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 87, pp. 41:1-41:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

@InProceedings{golan_et_al:LIPIcs.ESA.2017.41,
  author =	{Golan, Shay and Porat, Ely},
  title =	{{Real-Time Streaming Multi-Pattern Search for Constant Alphabet}},
  booktitle =	{25th Annual European Symposium on Algorithms (ESA 2017)},
  pages =	{41:1--41:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-049-1},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{87},
  editor =	{Pruhs, Kirk and Sohler, Christian},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2017.41},
  URN =		{urn:nbn:de:0030-drops-78550},
  doi =		{10.4230/LIPIcs.ESA.2017.41},
  annote =	{Keywords: multi-pattern, dictionary, streaming pattern matching, fingerprints}
}

Document

DOI: 10.4230/LIPIcs.ESA.2016.44

Streaming Pattern Matching with d Wildcards

Authors: Shay Golan, Tsvi Kopelowitz, and Ely Porat

Published in: LIPIcs, Volume 57, 24th Annual European Symposium on Algorithms (ESA 2016)

Abstract

In the pattern matching with d wildcards problem we are given a text T of length n and a pattern P of length m that contains d wildcard characters, each denoted by a special symbol '?'. A wildcard character matches any other character. The goal is to establish for each m-length substring of T whether it matches P. In the streaming model variant of the pattern matching with d wildcards problem the text T arrives one character at a time and the goal is to report, before the next character arrives, if the last m characters match P while using only o(m) words of space. In this paper we introduce two new algorithms for the d wildcard pattern matching problem in the streaming model. The first is a randomized Monte Carlo algorithm that is parameterized by a constant 0 ≤ δ ≤ 1. This algorithm uses Õ(d^{1-δ}) amortized time per character and Õ(d^{1+δ}) words of space. The second algorithm, which is used as a black box in the first algorithm, is a randomized Monte Carlo algorithm which uses O(d+log m) worst-case time per character and O(d log m) words of space.
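The offline version of the problem has a direct O(nm) solution, sketched below to make the matching rule concrete. It keeps the whole text in memory, so it is not a streaming algorithm and is unrelated to the techniques of the paper; the function name is hypothetical.

```python
def wildcard_matches(text, pattern, wildcard="?"):
    """For each m-length substring of text, report whether it matches pattern,
    where the wildcard symbol in the pattern matches any character."""
    m = len(pattern)
    result = []
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        result.append(all(p == wildcard or p == t for p, t in zip(pattern, window)))
    return result
```

For instance, `wildcard_matches("abcabd", "ab?")` reports a match at the first and last alignments only.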

Cite as

Shay Golan, Tsvi Kopelowitz, and Ely Porat. Streaming Pattern Matching with d Wildcards. In 24th Annual European Symposium on Algorithms (ESA 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 57, pp. 44:1-44:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

@InProceedings{golan_et_al:LIPIcs.ESA.2016.44,
  author =	{Golan, Shay and Kopelowitz, Tsvi and Porat, Ely},
  title =	{{Streaming Pattern Matching with d Wildcards}},
  booktitle =	{24th Annual European Symposium on Algorithms (ESA 2016)},
  pages =	{44:1--44:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-015-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{57},
  editor =	{Sankowski, Piotr and Zaroliagis, Christos},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2016.44},
  URN =		{urn:nbn:de:0030-drops-63561},
  doi =		{10.4230/LIPIcs.ESA.2016.44},
  annote =	{Keywords: wildcards, don't-cares, streaming pattern matching, fingerprints}
}

Search Results

Documents authored by Golan, Shay
