DROPS

Document

DOI: 10.4230/LIPIcs.CPM.2023.19

String Factorization via Prefix Free Families

Authors: Matan Kraus, Moshe Lewenstein, Alexandru Popa, Ely Porat, and Yonathan Sadia

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

A factorization of a string S is a partition of w into substrings u_1,… ,u_k such that S = u_1 u_2 ⋯ u_k. Such a partition is called equality-free if no two factors are equal: u_i ≠ u_j, ∀ i,j with i ≠ j. The maximum equality-free factorization problem is to find for a given string S, the largest integer k for which S admits an equality-free factorization with k factors. Equality-free factorizations have lately received attention because of their applications in DNA self-assembly. The best approximation algorithm known for the problem is the natural greedy algorithm, that chooses iteratively from left to right the shortest factor that does not appear before. This algorithm has a √n approximation ratio (SOFSEM 2020) and it is an open problem whether there is a better solution. Our main result is to show that the natural greedy algorithm is a Θ(n^{1/4}) approximation algorithm for the maximum equality-free factorization problem. Thus, we disprove one of the conjectures of Mincu and Popa (SOFSEM 2020) according to which the greedy algorithm is a Θ(√n) approximation. The most challenging part of the proof is to show that the greedy algorithm is an O(n^{1/4}) approximation. We obtain this algorithm via prefix free factor families, i.e. a set of non-overlapping factors of the string which are pairwise non-prefixes of each other. In the paper we show the relation between prefix free factor families and the maximum equality-free factorization. Moreover, as a byproduct we present another approximation algorithm that achieves an approximation ratio of O(n^{1/4}) that we believe is of independent interest and may lead to improved algorithms. We then show that the natural greedy algorithm has an approximation ratio that is Ω(n^{1/4}) via a clever analysis which shows that the greedy algorithm is Θ(n^{1/4}) for the maximum equality-free factorization problem.

Cite as

Matan Kraus, Moshe Lewenstein, Alexandru Popa, Ely Porat, and Yonathan Sadia. String Factorization via Prefix Free Families. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 19:1-19:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{kraus_et_al:LIPIcs.CPM.2023.19,
  author =	{Kraus, Matan and Lewenstein, Moshe and Popa, Alexandru and Porat, Ely and Sadia, Yonathan},
  title =	{{String Factorization via Prefix Free Families}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{19:1--19:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.19},
  URN =		{urn:nbn:de:0030-drops-179738},
  doi =		{10.4230/LIPIcs.CPM.2023.19},
  annote =	{Keywords: string factorization, NP-hard problem, approximation algorithm}
}

Document

DOI: 10.4230/LIPIcs.CPM.2022.10

Partial Permutations Comparison, Maintenance and Applications

Authors: Avivit Levy, Ely Porat, and B. Riva Shalom

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

This paper focuses on the concept of partial permutations and their use in algorithmic tasks. A partial permutation over Σ is a bijection π_{par}: Σ₁↦Σ₂ mapping a subset Σ₁ ⊂ Σ to a subset Σ₂ ⊂ Σ, where |Σ₁| = |Σ₂| (|Σ| denotes the size of a set Σ). Intuitively, two partial permutations agree if their mapping pairs do not form conflicts. This notion, which is formally defined in this paper, enables a consistent as well as informatively rich comparison between partial permutations. We formalize the Partial Permutations Agreement problem (PPA), as follows. Given two sets A₁, A₂ of partial permutations over alphabet Σ, each of size n, output all pairs (π_i, π_j), where π_i ∈ A₁, π_j ∈ A₂ and π_i agrees with π_j. The possibility of having a data structure for efficiently maintaining a dynamic set of partial permutations enabling to retrieve agreement of partial permutations is then studied, giving both negative and positive results. Applying our study enables to point out fruitful versus futile methods for efficient genes sequences comparison in database or automatic color transformation data augmentation technique for image processing through neural networks. It also shows that an efficient solution of strict Parameterized Dictionary Matching with One Gap (PDMOG) over general dictionary alphabets is not likely, unless the Strong Exponential Time Hypothesis (SETH) fails, thus negatively answering an open question posed lately.

Cite as

Avivit Levy, Ely Porat, and B. Riva Shalom. Partial Permutations Comparison, Maintenance and Applications. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 10:1-10:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{levy_et_al:LIPIcs.CPM.2022.10,
  author =	{Levy, Avivit and Porat, Ely and Shalom, B. Riva},
  title =	{{Partial Permutations Comparison, Maintenance and Applications}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{10:1--10:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.10},
  URN =		{urn:nbn:de:0030-drops-161376},
  doi =		{10.4230/LIPIcs.CPM.2022.10},
  annote =	{Keywords: Partial permutations, Partial words, Genes comparison, Color transformation, Dictionary matching with gaps, Parameterized matching, SETH hypothesis}
}

Document

DOI: 10.4230/LIPIcs.CPM.2022.18

The Dynamic k-Mismatch Problem

Authors: Raphaël Clifford, Paweł Gawrychowski, Tomasz Kociumaka, Daniel P. Martin, and Przemysław Uznański

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

Abstract

The text-to-pattern Hamming distances problem asks to compute the Hamming distances between a given pattern of length m and all length-m substrings of a given text of length n ≥ m. We focus on the well-studied k-mismatch version of the problem, where a distance needs to be returned only if it does not exceed a threshold k. Moreover, we assume n ≤ 2m (in general, one can partition the text into overlapping blocks). In this work, we develop data structures for the dynamic version of the k-mismatch problem supporting two operations: An update performs a single-letter substitution in the pattern or the text, whereas a query, given an index i, returns the Hamming distance between the pattern and the text substring starting at position i, or reports that the distance exceeds k. First, we describe a simple data structure with 𝒪̃(1) update time and 𝒪̃(k) query time. Through considerably more sophisticated techniques, we show that 𝒪̃(k) update time and 𝒪̃(1) query time is also achievable. These two solutions likely provide an essentially optimal trade-off for the dynamic k-mismatch problem with m^{Ω(1)} ≤ k ≤ √m: we prove that, in that case, conditioned on the 3SUM conjecture, one cannot simultaneously achieve k^{1-Ω(1)} time for all operations (updates and queries) after n^{𝒪(1)}-time initialization. For k ≥ √m, the same lower bound excludes achieving m^{1/2-Ω(1)} time per operation. This is known to be essentially tight for constant-sized alphabets: already Clifford et al. (STACS 2018) achieved 𝒪̃(√m) time per operation in that case, but their solution for large alphabets costs 𝒪̃(m^{3/4}) time per operation. We improve and extend the latter result by developing a trade-off algorithm that, given a parameter 1 ≤ x ≤ k, achieves update time 𝒪̃(m/k +√{mk/x}) and query time 𝒪̃(x). In particular, for k ≥ √m, an appropriate choice of x yields 𝒪̃(∛{mk}) time per operation, which is 𝒪̃(m^{2/3}) when only the trivial threshold k = m is provided.

Cite as

Raphaël Clifford, Paweł Gawrychowski, Tomasz Kociumaka, Daniel P. Martin, and Przemysław Uznański. The Dynamic k-Mismatch Problem. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 18:1-18:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{clifford_et_al:LIPIcs.CPM.2022.18,
  author =	{Clifford, Rapha\"{e}l and Gawrychowski, Pawe{\l} and Kociumaka, Tomasz and Martin, Daniel P. and Uzna\'{n}ski, Przemys{\l}aw},
  title =	{{The Dynamic k-Mismatch Problem}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{18:1--18:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.18},
  URN =		{urn:nbn:de:0030-drops-161454},
  doi =		{10.4230/LIPIcs.CPM.2022.18},
  annote =	{Keywords: Pattern matching, Hamming distance, dynamic algorithms}
}

Document

DOI: 10.4230/LIPIcs.ESA.2021.12

Incremental Edge Orientation in Forests

Authors: Michael A. Bender, Tsvi Kopelowitz, William Kuszmaul, Ely Porat, and Clifford Stein

Published in: LIPIcs, Volume 204, 29th Annual European Symposium on Algorithms (ESA 2021)

Abstract

For any forest G = (V, E) it is possible to orient the edges E so that no vertex in V has out-degree greater than 1. This paper considers the incremental edge-orientation problem, in which the edges E arrive over time and the algorithm must maintain a low-out-degree edge orientation at all times. We give an algorithm that maintains a maximum out-degree of 3 while flipping at most O(log log n) edge orientations per edge insertion, with high probability in n. The algorithm requires worst-case time O(log n log log n) per insertion, and takes amortized time O(1). The previous state of the art required up to O(log n / log log n) edge flips per insertion. We then apply our edge-orientation results to the problem of dynamic Cuckoo hashing. The problem of designing simple families ℋ of hash functions that are compatible with Cuckoo hashing has received extensive attention. These families ℋ are known to satisfy static guarantees, but do not come typically with dynamic guarantees for the running time of inserts and deletes. We show how to transform static guarantees (for 1-associativity) into near-state-of-the-art dynamic guarantees (for O(1)-associativity) in a black-box fashion. Rather than relying on the family ℋ to supply randomness, as in past work, we instead rely on randomness within our table-maintenance algorithm.

Cite as

Michael A. Bender, Tsvi Kopelowitz, William Kuszmaul, Ely Porat, and Clifford Stein. Incremental Edge Orientation in Forests. In 29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 12:1-12:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{bender_et_al:LIPIcs.ESA.2021.12,
  author =	{Bender, Michael A. and Kopelowitz, Tsvi and Kuszmaul, William and Porat, Ely and Stein, Clifford},
  title =	{{Incremental Edge Orientation in Forests}},
  booktitle =	{29th Annual European Symposium on Algorithms (ESA 2021)},
  pages =	{12:1--12:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-204-4},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{204},
  editor =	{Mutzel, Petra and Pagh, Rasmus and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2021.12},
  URN =		{urn:nbn:de:0030-drops-145933},
  doi =		{10.4230/LIPIcs.ESA.2021.12},
  annote =	{Keywords: edge orientation, graph algorithms, Cuckoo hashing, hash functions}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2021.21

Approximate Similarity Search Under Edit Distance Using Locality-Sensitive Hashing

Authors: Samuel McCauley

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)

Abstract

Edit distance similarity search, also called approximate pattern matching, is a fundamental problem with widespread database applications. The goal of the problem is to preprocess n strings of length d, to quickly answer queries q of the form: if there is a database string within edit distance r of q, return a database string within edit distance cr of q. Previous approaches to this problem either rely on very large (superconstant) approximation ratios c, or very small search radii r. Outside of a narrow parameter range, these solutions are not competitive with trivially searching through all n strings. In this work we give a simple and easy-to-implement hash function that can quickly answer queries for a wide range of parameters. Specifically, our strategy can answer queries in time Õ(d3^rn^{1/c}). The best known practical results require c ≫ r to achieve any correctness guarantee; meanwhile, the best known theoretical results are very involved and difficult to implement, and require query time that can be loosely bounded below by 24^r. Our results significantly broaden the range of parameters for which there exist nontrivial theoretical bounds, while retaining the practicality of a locality-sensitive hash function.

Cite as

Samuel McCauley. Approximate Similarity Search Under Edit Distance Using Locality-Sensitive Hashing. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 21:1-21:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{mccauley:LIPIcs.ICDT.2021.21,
  author =	{McCauley, Samuel},
  title =	{{Approximate Similarity Search Under Edit Distance Using Locality-Sensitive Hashing}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{21:1--21:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.21},
  URN =		{urn:nbn:de:0030-drops-137299},
  doi =		{10.4230/LIPIcs.ICDT.2021.21},
  annote =	{Keywords: edit distance, approximate pattern matching, approximate nearest neighbor, similarity search, locality-sensitive hashing}
}

Document

APPROX

DOI: 10.4230/LIPIcs.APPROX/RANDOM.2020.46

Improved Circular k-Mismatch Sketches

Authors: Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, Ely Porat, and Przemysław Uznański

Published in: LIPIcs, Volume 176, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)

Abstract

The shift distance sh(S₁,S₂) between two strings S₁ and S₂ of the same length is defined as the minimum Hamming distance between S₁ and any rotation (cyclic shift) of S₂. We study the problem of sketching the shift distance, which is the following communication complexity problem: Strings S₁ and S₂ of length n are given to two identical players (encoders), who independently compute sketches (summaries) sk(S₁) and sk(S₂), respectively, so that upon receiving the two sketches, a third player (decoder) is able to compute (or approximate) sh(S₁,S₂) with high probability. This paper primarily focuses on the more general k-mismatch version of the problem, where the decoder is allowed to declare a failure if sh(S₁,S₂) > k, where k is a parameter known to all parties. Andoni et al. (STOC'13) introduced exact circular k-mismatch sketches of size Õ(k+D(n)), where D(n) is the number of divisors of n. Andoni et al. also showed that their sketch size is optimal in the class of linear homomorphic sketches. We circumvent this lower bound by designing a (non-linear) exact circular k-mismatch sketch of size Õ(k); this size matches communication-complexity lower bounds. We also design (1± ε)-approximate circular k-mismatch sketch of size Õ(min(ε^{-2}√k, ε^{-1.5}√n)), which improves upon an Õ(ε^{-2}√n)-size sketch of Crouch and McGregor (APPROX'11).

Cite as

Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, Ely Porat, and Przemysław Uznański. Improved Circular k-Mismatch Sketches. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 176, pp. 46:1-46:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{golan_et_al:LIPIcs.APPROX/RANDOM.2020.46,
  author =	{Golan, Shay and Kociumaka, Tomasz and Kopelowitz, Tsvi and Porat, Ely and Uzna\'{n}ski, Przemys{\l}aw},
  title =	{{Improved Circular k-Mismatch Sketches}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)},
  pages =	{46:1--46:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-164-1},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{176},
  editor =	{Byrka, Jaros{\l}aw and Meka, Raghu},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2020.46},
  URN =		{urn:nbn:de:0030-drops-126492},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2020.46},
  annote =	{Keywords: Hamming distance, k-mismatch, sketches, rotation, cyclic shift, communication complexity}
}

Document

DOI: 10.4230/LIPIcs.CPM.2020.15

The Streaming k-Mismatch Problem: Tradeoffs Between Space and Total Time

Authors: Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, and Ely Porat

Published in: LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

Abstract

We revisit the k-mismatch problem in the streaming model on a pattern of length m and a streaming text of length n, both over a size-σ alphabet. The current state-of-the-art algorithm for the streaming k-mismatch problem, by Clifford et al. [SODA 2019], uses Õ(k) space and Õ(√k) worst-case time per character. The space complexity is known to be (unconditionally) optimal, and the worst-case time per character matches a conditional lower bound. However, there is a gap between the total time cost of the algorithm, which is Õ(n√k), and the fastest known offline algorithm, which costs Õ(n + min(nk/√m, σn)) time. Moreover, it is not known whether improvements over the Õ(n√k) total time are possible when using more than O(k) space. We address these gaps by designing a randomized streaming algorithm for the k-mismatch problem that, given an integer parameter k≤s≤m, uses Õ(s) space and costs Õ(n+min(nk²/m, nk/√s, σnm/s)) total time. For s=m, the total runtime becomes Õ(n + min(nk/√m, σn)), which matches the time cost of the fastest offline algorithm. Moreover, the worst-case time cost per character is still Õ(√k).

Cite as

Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, and Ely Porat. The Streaming k-Mismatch Problem: Tradeoffs Between Space and Total Time. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 15:1-15:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{golan_et_al:LIPIcs.CPM.2020.15,
  author =	{Golan, Shay and Kociumaka, Tomasz and Kopelowitz, Tsvi and Porat, Ely},
  title =	{{The Streaming k-Mismatch Problem: Tradeoffs Between Space and Total Time}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{15:1--15:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{G{\o}rtz, Inge Li and Weimann, Oren},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.15},
  URN =		{urn:nbn:de:0030-drops-121406},
  doi =		{10.4230/LIPIcs.CPM.2020.15},
  annote =	{Keywords: Streaming pattern matching, Hamming distance, k-mismatch}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2019.7

On the Hardness of Set Disjointness and Set Intersection with Bounded Universe

Authors: Isaac Goldstein, Moshe Lewenstein, and Ely Porat

Published in: LIPIcs, Volume 149, 30th International Symposium on Algorithms and Computation (ISAAC 2019)

Abstract

In the SetDisjointness problem, a collection of m sets S_1,S_2,...,S_m from some universe U is preprocessed in order to answer queries on the emptiness of the intersection of some two query sets from the collection. In the SetIntersection variant, all the elements in the intersection of the query sets are required to be reported. These are two fundamental problems that were considered in several papers from both the upper bound and lower bound perspective. Several conditional lower bounds for these problems were proven for the tradeoff between preprocessing and query time or the tradeoff between space and query time. Moreover, there are several unconditional hardness results for these problems in some specific computational models. The fundamental nature of the SetDisjointness and SetIntersection problems makes them useful for proving the conditional hardness of other problems from various areas. However, the universe of the elements in the sets may be very large, which may cause the reduction to some other problems to be inefficient and therefore it is not useful for proving their conditional hardness. In this paper, we prove the conditional hardness of SetDisjointness and SetIntersection with bounded universe. This conditional hardness is shown for both the interplay between preprocessing and query time and the interplay between space and query time. Moreover, we present several applications of these new conditional lower bounds. These applications demonstrates the strength of our new conditional lower bounds as they exploit the limited universe size. We believe that this new framework of conditional lower bounds with bounded universe can be useful for further significant applications.

Cite as

Isaac Goldstein, Moshe Lewenstein, and Ely Porat. On the Hardness of Set Disjointness and Set Intersection with Bounded Universe. In 30th International Symposium on Algorithms and Computation (ISAAC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 149, pp. 7:1-7:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{goldstein_et_al:LIPIcs.ISAAC.2019.7,
  author =	{Goldstein, Isaac and Lewenstein, Moshe and Porat, Ely},
  title =	{{On the Hardness of Set Disjointness and Set Intersection with Bounded Universe}},
  booktitle =	{30th International Symposium on Algorithms and Computation (ISAAC 2019)},
  pages =	{7:1--7:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-130-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{149},
  editor =	{Lu, Pinyan and Zhang, Guochuan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2019.7},
  URN =		{urn:nbn:de:0030-drops-115036},
  doi =		{10.4230/LIPIcs.ISAAC.2019.7},
  annote =	{Keywords: set disjointness, set intersection, 3SUM, space-time tradeoff, conditional lower bounds}
}

Document

DOI: 10.4230/LIPIcs.ESA.2018.37

Improved Space-Time Tradeoffs for kSUM

Authors: Isaac Goldstein, Moshe Lewenstein, and Ely Porat

Published in: LIPIcs, Volume 112, 26th Annual European Symposium on Algorithms (ESA 2018)

Abstract

In the kSUM problem we are given an array of numbers a_1,a_2,...,a_n and we are required to determine if there are k different elements in this array such that their sum is 0. This problem is a parameterized version of the well-studied SUBSET-SUM problem, and a special case is the 3SUM problem that is extensively used for proving conditional hardness. Several works investigated the interplay between time and space in the context of SUBSET-SUM. Recently, improved time-space tradeoffs were proven for kSUM using both randomized and deterministic algorithms. In this paper we obtain an improvement over the best known results for the time-space tradeoff for kSUM. A major ingredient in achieving these results is a general self-reduction from kSUM to mSUM where m<k, and several useful observations that enable this reduction and its implications. The main results we prove in this paper include the following: (i) The best known Las Vegas solution to kSUM running in approximately O(n^{k-delta sqrt{2k}}) time and using O(n^{delta}) space, for 0 <= delta <= 1. (ii) The best known deterministic solution to kSUM running in approximately O(n^{k-delta sqrt{k}}) time and using O(n^{delta}) space, for 0 <= delta <= 1. (iii) A space-time tradeoff for solving kSUM using O(n^{delta}) space, for delta>1. (iv) An algorithm for 6SUM running in O(n^4) time using just O(n^{2/3}) space. (v) A solution to 3SUM on random input using O(n^2) time and O(n^{1/3}) space, under the assumption of a random read-only access to random bits.

Cite as

Isaac Goldstein, Moshe Lewenstein, and Ely Porat. Improved Space-Time Tradeoffs for kSUM. In 26th Annual European Symposium on Algorithms (ESA 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 112, pp. 37:1-37:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{goldstein_et_al:LIPIcs.ESA.2018.37,
  author =	{Goldstein, Isaac and Lewenstein, Moshe and Porat, Ely},
  title =	{{Improved Space-Time Tradeoffs for kSUM}},
  booktitle =	{26th Annual European Symposium on Algorithms (ESA 2018)},
  pages =	{37:1--37:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-081-1},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{112},
  editor =	{Azar, Yossi and Bast, Hannah and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2018.37},
  URN =		{urn:nbn:de:0030-drops-95000},
  doi =		{10.4230/LIPIcs.ESA.2018.37},
  annote =	{Keywords: kSUM, space-time tradeoff, self-reduction}
}

Document

DOI: 10.4230/LIPIcs.ICALP.2018.65

Towards Optimal Approximate Streaming Pattern Matching by Matching Multiple Patterns in Multiple Streams

Authors: Shay Golan, Tsvi Kopelowitz, and Ely Porat

Published in: LIPIcs, Volume 107, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)

Abstract

Recently, there has been a growing focus in solving approximate pattern matching problems in the streaming model. Of particular interest are the pattern matching with k-mismatches (KMM) problem and the pattern matching with w-wildcards (PMWC) problem. Motivated by reductions from these problems in the streaming model to the dictionary matching problem, this paper focuses on designing algorithms for the dictionary matching problem in the multi-stream model where there are several independent streams of data (as opposed to just one in the streaming model), and the memory complexity of an algorithm is expressed using two quantities: (1) a read-only shared memory storage area which is shared among all the streams, and (2) local stream memory that each stream stores separately. In the dictionary matching problem in the multi-stream model the goal is to preprocess a dictionary D={P_1,P_2,...,P_d} of d=|D| patterns (strings with maximum length m over alphabet Sigma) into a data structure stored in shared memory, so that given multiple independent streaming texts (where characters arrive one at a time) the algorithm reports occurrences of patterns from D in each one of the texts as soon as they appear. We design two efficient algorithms for the dictionary matching problem in the multi-stream model. The first algorithm works when all the patterns in D have the same length m and costs O(d log m) words in shared memory, O(log m log d) words in stream memory, and O(log m) time per character. The second algorithm works for general D, but the time cost per character becomes O(log m+log d log log d). We also demonstrate the usefulness of our first algorithm in solving both the KMM problem and PMWC problem in the streaming model. In particular, we obtain the first almost optimal (up to poly-log factors) algorithm for the PMWC problem in the streaming model. We also design a new algorithm for the KMM problem in the streaming model that, up to poly-log factors, has the same bounds as the most recent results that use different techniques. Moreover, for most inputs, our algorithm for KMM is significantly faster on average.

Cite as

Shay Golan, Tsvi Kopelowitz, and Ely Porat. Towards Optimal Approximate Streaming Pattern Matching by Matching Multiple Patterns in Multiple Streams. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 65:1-65:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{golan_et_al:LIPIcs.ICALP.2018.65,
  author =	{Golan, Shay and Kopelowitz, Tsvi and Porat, Ely},
  title =	{{Towards Optimal Approximate Streaming Pattern Matching by Matching Multiple Patterns in Multiple Streams}},
  booktitle =	{45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)},
  pages =	{65:1--65:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-076-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{107},
  editor =	{Chatzigiannakis, Ioannis and Kaklamanis, Christos and Marx, D\'{a}niel and Sannella, Donald},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2018.65},
  URN =		{urn:nbn:de:0030-drops-90690},
  doi =		{10.4230/LIPIcs.ICALP.2018.65},
  annote =	{Keywords: Streaming approximate pattern matching, Dictionary matching}
}

Document

DOI: 10.4230/LIPIcs.CPM.2018.4

Quasi-Periodicity Under Mismatch Errors

Authors: Amihood Amir, Avivit Levy, and Ely Porat

Published in: LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

Abstract

Tracing regularities plays a key role in data analysis for various areas of science, including coding and automata theory, formal language theory, combinatorics, molecular biology and many others. Part of the scientific process is understanding and explaining these regularities. A common notion to describe regularity in a string T is a cover or quasi-period, which is a string C for which every letter of T lies within some occurrence of C. In many applications finding exact repetitions is not sufficient, due to the presence of errors. In this paper we initiate the study of quasi-periodicity persistence under mismatch errors, and our goal is to characterize situations where a given quasi-periodic string remains quasi-periodic even after substitution errors have been introduced to the string. Our study results in proving necessary conditions as well as a theorem stating sufficient conditions for quasi-periodicity persistence. As an application, we are able to close the gap in understanding the complexity of Approximate Cover Problem (ACP) relaxations studied by [Amir 2017a, Amir 2017b] and solve an open question.

Cite as

Amihood Amir, Avivit Levy, and Ely Porat. Quasi-Periodicity Under Mismatch Errors. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 4:1-4:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.CPM.2018.4,
  author =	{Amir, Amihood and Levy, Avivit and Porat, Ely},
  title =	{{Quasi-Periodicity Under Mismatch Errors}},
  booktitle =	{29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
  pages =	{4:1--4:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Navarro, Gonzalo and Sankoff, David and Zhu, Binhai},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.4},
  URN =		{urn:nbn:de:0030-drops-87054},
  doi =		{10.4230/LIPIcs.CPM.2018.4},
  annote =	{Keywords: Periodicity, Quasi-Periodicity, Cover, Approximate Cover}
}

Document

DOI: 10.4230/OASIcs.SOSA.2018.10

A Simple Algorithm for Approximating the Text-To-Pattern Hamming Distance

Authors: Tsvi Kopelowitz and Ely Porat

Published in: OASIcs, Volume 61, 1st Symposium on Simplicity in Algorithms (SOSA 2018)

Abstract

The algorithmic task of computing the Hamming distance between a given pattern of length m and each location in a text of length n, both over a general alphabet \Sigma, is one of the most fundamental algorithmic tasks in string algorithms. The fastest known runtime for exact computation is \tilde O(n\sqrt m). We recently introduced a complicated randomized algorithm for obtaining a (1 +/- eps) approximation for each location in the text in O( (n/eps) log(1/eps) log n log m log |\Sigma|) total time, breaking a barrier that stood for 22 years. In this paper, we introduce an elementary and simple randomized algorithm that takes O((n/eps) log n log m) time.

Cite as

Tsvi Kopelowitz and Ely Porat. A Simple Algorithm for Approximating the Text-To-Pattern Hamming Distance. In 1st Symposium on Simplicity in Algorithms (SOSA 2018). Open Access Series in Informatics (OASIcs), Volume 61, pp. 10:1-10:5, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{kopelowitz_et_al:OASIcs.SOSA.2018.10,
  author =	{Kopelowitz, Tsvi and Porat, Ely},
  title =	{{A Simple Algorithm for Approximating the Text-To-Pattern Hamming Distance}},
  booktitle =	{1st Symposium on Simplicity in Algorithms (SOSA 2018)},
  pages =	{10:1--10:5},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-064-4},
  ISSN =	{2190-6807},
  year =	{2018},
  volume =	{61},
  editor =	{Seidel, Raimund},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SOSA.2018.10},
  URN =		{urn:nbn:de:0030-drops-83089},
  doi =		{10.4230/OASIcs.SOSA.2018.10},
  annote =	{Keywords: Pattern Matching, Hamming Distance, Approximation Algorithms}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2017.40

Orthogonal Vectors Indexing

Authors: Isaac Goldstein, Moshe Lewenstein, and Ely Porat

Published in: LIPIcs, Volume 92, 28th International Symposium on Algorithms and Computation (ISAAC 2017)

Abstract

In the recent years, intensive research work has been dedicated to prove conditional lower bounds in order to reveal the inner structure of the class P. These conditional lower bounds are based on many popular conjectures on well-studied problems. One of the most heavily used conjectures is the celebrated Strong Exponential Time Hypothesis (SETH). It turns out that conditional hardness proved based on SETH goes, in many cases, through an intermediate problem - the Orthogonal Vectors (OV) problem. Almost all research work regarding conditional lower bound was concentrated on time complexity. Very little attention was directed toward space complexity. In a recent work, Goldstein et al.[WADS '17] set the stage for proving conditional lower bounds regarding space and its interplay with time. In this spirit, it is tempting to investigate the space complexity of a data structure variant of OV which is called OV indexing. In this problem n boolean vectors of size clogn are given for preprocessing. As a query, a vector v is given and we are required to verify if there is an input vector that is orthogonal to it or not. This OV indexing problem is interesting in its own, but it also likely to have strong implications on problems known to be conditionally hard, in terms of time complexity, based on OV. Having this in mind, we study OV indexing in this paper from many aspects. We give some space-efficient algorithms for the problem, show a tradeoff between space and query time, describe how to solve its reporting variant, shed light on an interesting connection between this problem and the well-studied SetDisjointness problem and demonstrate how it can be solved more efficiently on random input.

Cite as

Isaac Goldstein, Moshe Lewenstein, and Ely Porat. Orthogonal Vectors Indexing. In 28th International Symposium on Algorithms and Computation (ISAAC 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 92, pp. 40:1-40:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{goldstein_et_al:LIPIcs.ISAAC.2017.40,
  author =	{Goldstein, Isaac and Lewenstein, Moshe and Porat, Ely},
  title =	{{Orthogonal Vectors Indexing}},
  booktitle =	{28th International Symposium on Algorithms and Computation (ISAAC 2017)},
  pages =	{40:1--40:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-054-5},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{92},
  editor =	{Okamoto, Yoshio and Tokuyama, Takeshi},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2017.40},
  URN =		{urn:nbn:de:0030-drops-82395},
  doi =		{10.4230/LIPIcs.ISAAC.2017.40},
  annote =	{Keywords: SETH, orthogonal vectors, space complexity}
}

Document

DOI: 10.4230/LIPIcs.ITCS.2017.51

Simultaneously Load Balancing for Every p-norm, With Reassignments

Authors: Aaron Bernstein, Tsvi Kopelowitz, Seth Pettie, Ely Porat, and Clifford Stein

Published in: LIPIcs, Volume 67, 8th Innovations in Theoretical Computer Science Conference (ITCS 2017)

Abstract

This paper investigates the task of load balancing where the objective function is to minimize the p-norm of loads, for p\geq 1, in both static and incremental settings. We consider two closely related load balancing problems. In the bipartite matching problem we are given a bipartite graph G=(C\cup S, E) and the goal is to assign each client c\in C to a server s\in S so that the p-norm of assignment loads on S is minimized. In the graph orientation problem the goal is to orient (direct) the edges of a given undirected graph while minimizing the p-norm of the out-degrees. The graph orientation problem is a special case of the bipartite matching problem, but less complex, which leads to simpler algorithms. For the graph orientation problem we show that the celebrated Chiba-Nishizeki peeling algorithm provides a simple linear time load balancing scheme whose output is an orientation that is 2-competitive, in a p-norm sense, for all p\geq 1. For the bipartite matching problem we first provide an offline algorithm that computes an optimal assignment. We then extend this solution to the online bipartite matching problem with reassignments, where vertices from C arrive in an online fashion together with their corresponding edges, and we are allowed to reassign an amortized O(1) vertices from C each time a new vertex arrives. In this online scenario we show how to maintain a single assignment that is 8-competitive, in a p-norm sense, for all p\geq 1.

Cite as

Aaron Bernstein, Tsvi Kopelowitz, Seth Pettie, Ely Porat, and Clifford Stein. Simultaneously Load Balancing for Every p-norm, With Reassignments. In 8th Innovations in Theoretical Computer Science Conference (ITCS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 67, pp. 51:1-51:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{bernstein_et_al:LIPIcs.ITCS.2017.51,
  author =	{Bernstein, Aaron and Kopelowitz, Tsvi and Pettie, Seth and Porat, Ely and Stein, Clifford},
  title =	{{Simultaneously Load Balancing for Every p-norm, With Reassignments}},
  booktitle =	{8th Innovations in Theoretical Computer Science Conference (ITCS 2017)},
  pages =	{51:1--51:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-029-3},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{67},
  editor =	{Papadimitriou, Christos H.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2017.51},
  URN =		{urn:nbn:de:0030-drops-82009},
  doi =		{10.4230/LIPIcs.ITCS.2017.51},
  annote =	{Keywords: Online Matching, Graph Orientation, Minmizing the p-norm}
}

Document

DOI: 10.4230/LIPIcs.ESA.2017.41

Real-Time Streaming Multi-Pattern Search for Constant Alphabet

Authors: Shay Golan and Ely Porat

Published in: LIPIcs, Volume 87, 25th Annual European Symposium on Algorithms (ESA 2017)

Abstract

In the streaming multi-pattern search problem, which is also known as the streaming dictionary matching problem, a set D={P_1,P_2, . . . ,P_d} of d patterns (strings over an alphabet Sigma), called the dictionary, is given to be preprocessed. Then, a text T arrives one character at a time and the goal is to report, before the next character arrives, the longest pattern in the dictionary that is a current suffix of T. We prove that for a constant size alphabet, there exists a randomized Monte-Carlo algorithm for the streaming dictionary matching problem that takes constant time per character and uses O(d log m) words of space, where m is the length of the longest pattern in the dictionary. In the case where the alphabet size is not constant, we introduce two new randomized Monte-Carlo algorithms with the following complexities: * O(log log |Sigma|) time per character in the worst case and O(d log m) words of space. * O(1/epsilon) time per character in the worst case and O(d |\Sigma|^epsilon log m/epsilon) words of space for any 0<epsilon<= 1. These results improve upon the algorithm of [Clifford et al., ESA'15] which uses O(d log m) words of space and takes O(log log (m+d)) time per character.

Cite as

Shay Golan and Ely Porat. Real-Time Streaming Multi-Pattern Search for Constant Alphabet. In 25th Annual European Symposium on Algorithms (ESA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 87, pp. 41:1-41:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{golan_et_al:LIPIcs.ESA.2017.41,
  author =	{Golan, Shay and Porat, Ely},
  title =	{{Real-Time Streaming Multi-Pattern Search for Constant Alphabet}},
  booktitle =	{25th Annual European Symposium on Algorithms (ESA 2017)},
  pages =	{41:1--41:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-049-1},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{87},
  editor =	{Pruhs, Kirk and Sohler, Christian},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2017.41},
  URN =		{urn:nbn:de:0030-drops-78550},
  doi =		{10.4230/LIPIcs.ESA.2017.41},
  annote =	{Keywords: multi-pattern, dictionary, streaming pattern matching, fingerprints}
}

27 Search Results for "Porat, Ely"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Thanks for your feedback!

Could not send message