Document

**Published in:** LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

A factorization of a string S is a partition of w into substrings u_1,… ,u_k such that S = u_1 u_2 ⋯ u_k. Such a partition is called equality-free if no two factors are equal: u_i ≠ u_j, ∀ i,j with i ≠ j. The maximum equality-free factorization problem is to find for a given string S, the largest integer k for which S admits an equality-free factorization with k factors.
Equality-free factorizations have lately received attention because of their applications in DNA self-assembly. The best approximation algorithm known for the problem is the natural greedy algorithm, that chooses iteratively from left to right the shortest factor that does not appear before. This algorithm has a √n approximation ratio (SOFSEM 2020) and it is an open problem whether there is a better solution.
Our main result is to show that the natural greedy algorithm is a Θ(n^{1/4}) approximation algorithm for the maximum equality-free factorization problem. Thus, we disprove one of the conjectures of Mincu and Popa (SOFSEM 2020) according to which the greedy algorithm is a Θ(√n) approximation.
The most challenging part of the proof is to show that the greedy algorithm is an O(n^{1/4}) approximation. We obtain this algorithm via prefix free factor families, i.e. a set of non-overlapping factors of the string which are pairwise non-prefixes of each other. In the paper we show the relation between prefix free factor families and the maximum equality-free factorization. Moreover, as a byproduct we present another approximation algorithm that achieves an approximation ratio of O(n^{1/4}) that we believe is of independent interest and may lead to improved algorithms. We then show that the natural greedy algorithm has an approximation ratio that is Ω(n^{1/4}) via a clever analysis which shows that the greedy algorithm is Θ(n^{1/4}) for the maximum equality-free factorization problem.

Matan Kraus, Moshe Lewenstein, Alexandru Popa, Ely Porat, and Yonathan Sadia. String Factorization via Prefix Free Families. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 19:1-19:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{kraus_et_al:LIPIcs.CPM.2023.19, author = {Kraus, Matan and Lewenstein, Moshe and Popa, Alexandru and Porat, Ely and Sadia, Yonathan}, title = {{String Factorization via Prefix Free Families}}, booktitle = {34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)}, pages = {19:1--19:10}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-276-1}, ISSN = {1868-8969}, year = {2023}, volume = {259}, editor = {Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.19}, URN = {urn:nbn:de:0030-drops-179738}, doi = {10.4230/LIPIcs.CPM.2023.19}, annote = {Keywords: string factorization, NP-hard problem, approximation algorithm} }

Document

**Published in:** LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

This paper focuses on the concept of partial permutations and their use in algorithmic tasks. A partial permutation over Σ is a bijection π_{par}: Σ₁↦Σ₂ mapping a subset Σ₁ ⊂ Σ to a subset Σ₂ ⊂ Σ, where |Σ₁| = |Σ₂| (|Σ| denotes the size of a set Σ). Intuitively, two partial permutations agree if their mapping pairs do not form conflicts. This notion, which is formally defined in this paper, enables a consistent as well as informatively rich comparison between partial permutations. We formalize the Partial Permutations Agreement problem (PPA), as follows. Given two sets A₁, A₂ of partial permutations over alphabet Σ, each of size n, output all pairs (π_i, π_j), where π_i ∈ A₁, π_j ∈ A₂ and π_i agrees with π_j. The possibility of having a data structure for efficiently maintaining a dynamic set of partial permutations enabling to retrieve agreement of partial permutations is then studied, giving both negative and positive results. Applying our study enables to point out fruitful versus futile methods for efficient genes sequences comparison in database or automatic color transformation data augmentation technique for image processing through neural networks. It also shows that an efficient solution of strict Parameterized Dictionary Matching with One Gap (PDMOG) over general dictionary alphabets is not likely, unless the Strong Exponential Time Hypothesis (SETH) fails, thus negatively answering an open question posed lately.

Avivit Levy, Ely Porat, and B. Riva Shalom. Partial Permutations Comparison, Maintenance and Applications. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 10:1-10:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{levy_et_al:LIPIcs.CPM.2022.10, author = {Levy, Avivit and Porat, Ely and Shalom, B. Riva}, title = {{Partial Permutations Comparison, Maintenance and Applications}}, booktitle = {33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)}, pages = {10:1--10:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-234-1}, ISSN = {1868-8969}, year = {2022}, volume = {223}, editor = {Bannai, Hideo and Holub, Jan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.10}, URN = {urn:nbn:de:0030-drops-161376}, doi = {10.4230/LIPIcs.CPM.2022.10}, annote = {Keywords: Partial permutations, Partial words, Genes comparison, Color transformation, Dictionary matching with gaps, Parameterized matching, SETH hypothesis} }

Document

**Published in:** LIPIcs, Volume 204, 29th Annual European Symposium on Algorithms (ESA 2021)

For any forest G = (V, E) it is possible to orient the edges E so that no vertex in V has out-degree greater than 1. This paper considers the incremental edge-orientation problem, in which the edges E arrive over time and the algorithm must maintain a low-out-degree edge orientation at all times. We give an algorithm that maintains a maximum out-degree of 3 while flipping at most O(log log n) edge orientations per edge insertion, with high probability in n. The algorithm requires worst-case time O(log n log log n) per insertion, and takes amortized time O(1). The previous state of the art required up to O(log n / log log n) edge flips per insertion.
We then apply our edge-orientation results to the problem of dynamic Cuckoo hashing. The problem of designing simple families ℋ of hash functions that are compatible with Cuckoo hashing has received extensive attention. These families ℋ are known to satisfy static guarantees, but do not come typically with dynamic guarantees for the running time of inserts and deletes. We show how to transform static guarantees (for 1-associativity) into near-state-of-the-art dynamic guarantees (for O(1)-associativity) in a black-box fashion. Rather than relying on the family ℋ to supply randomness, as in past work, we instead rely on randomness within our table-maintenance algorithm.

Michael A. Bender, Tsvi Kopelowitz, William Kuszmaul, Ely Porat, and Clifford Stein. Incremental Edge Orientation in Forests. In 29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 12:1-12:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{bender_et_al:LIPIcs.ESA.2021.12, author = {Bender, Michael A. and Kopelowitz, Tsvi and Kuszmaul, William and Porat, Ely and Stein, Clifford}, title = {{Incremental Edge Orientation in Forests}}, booktitle = {29th Annual European Symposium on Algorithms (ESA 2021)}, pages = {12:1--12:18}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-204-4}, ISSN = {1868-8969}, year = {2021}, volume = {204}, editor = {Mutzel, Petra and Pagh, Rasmus and Herman, Grzegorz}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2021.12}, URN = {urn:nbn:de:0030-drops-145933}, doi = {10.4230/LIPIcs.ESA.2021.12}, annote = {Keywords: edge orientation, graph algorithms, Cuckoo hashing, hash functions} }

Document

APPROX

**Published in:** LIPIcs, Volume 176, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)

The shift distance sh(S₁,S₂) between two strings S₁ and S₂ of the same length is defined as the minimum Hamming distance between S₁ and any rotation (cyclic shift) of S₂. We study the problem of sketching the shift distance, which is the following communication complexity problem: Strings S₁ and S₂ of length n are given to two identical players (encoders), who independently compute sketches (summaries) sk(S₁) and sk(S₂), respectively, so that upon receiving the two sketches, a third player (decoder) is able to compute (or approximate) sh(S₁,S₂) with high probability.
This paper primarily focuses on the more general k-mismatch version of the problem, where the decoder is allowed to declare a failure if sh(S₁,S₂) > k, where k is a parameter known to all parties. Andoni et al. (STOC'13) introduced exact circular k-mismatch sketches of size Õ(k+D(n)), where D(n) is the number of divisors of n. Andoni et al. also showed that their sketch size is optimal in the class of linear homomorphic sketches.
We circumvent this lower bound by designing a (non-linear) exact circular k-mismatch sketch of size Õ(k); this size matches communication-complexity lower bounds. We also design (1± ε)-approximate circular k-mismatch sketch of size Õ(min(ε^{-2}√k, ε^{-1.5}√n)), which improves upon an Õ(ε^{-2}√n)-size sketch of Crouch and McGregor (APPROX'11).

Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, Ely Porat, and Przemysław Uznański. Improved Circular k-Mismatch Sketches. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 176, pp. 46:1-46:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{golan_et_al:LIPIcs.APPROX/RANDOM.2020.46, author = {Golan, Shay and Kociumaka, Tomasz and Kopelowitz, Tsvi and Porat, Ely and Uzna\'{n}ski, Przemys{\l}aw}, title = {{Improved Circular k-Mismatch Sketches}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)}, pages = {46:1--46:24}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-164-1}, ISSN = {1868-8969}, year = {2020}, volume = {176}, editor = {Byrka, Jaros{\l}aw and Meka, Raghu}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2020.46}, URN = {urn:nbn:de:0030-drops-126492}, doi = {10.4230/LIPIcs.APPROX/RANDOM.2020.46}, annote = {Keywords: Hamming distance, k-mismatch, sketches, rotation, cyclic shift, communication complexity} }

Document

**Published in:** LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

We revisit the k-mismatch problem in the streaming model on a pattern of length m and a streaming text of length n, both over a size-σ alphabet. The current state-of-the-art algorithm for the streaming k-mismatch problem, by Clifford et al. [SODA 2019], uses Õ(k) space and Õ(√k) worst-case time per character. The space complexity is known to be (unconditionally) optimal, and the worst-case time per character matches a conditional lower bound. However, there is a gap between the total time cost of the algorithm, which is Õ(n√k), and the fastest known offline algorithm, which costs Õ(n + min(nk/√m, σn)) time. Moreover, it is not known whether improvements over the Õ(n√k) total time are possible when using more than O(k) space.
We address these gaps by designing a randomized streaming algorithm for the k-mismatch problem that, given an integer parameter k≤s≤m, uses Õ(s) space and costs Õ(n+min(nk²/m, nk/√s, σnm/s)) total time. For s=m, the total runtime becomes Õ(n + min(nk/√m, σn)), which matches the time cost of the fastest offline algorithm. Moreover, the worst-case time cost per character is still Õ(√k).

Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, and Ely Porat. The Streaming k-Mismatch Problem: Tradeoffs Between Space and Total Time. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 15:1-15:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{golan_et_al:LIPIcs.CPM.2020.15, author = {Golan, Shay and Kociumaka, Tomasz and Kopelowitz, Tsvi and Porat, Ely}, title = {{The Streaming k-Mismatch Problem: Tradeoffs Between Space and Total Time}}, booktitle = {31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)}, pages = {15:1--15:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-149-8}, ISSN = {1868-8969}, year = {2020}, volume = {161}, editor = {G{\o}rtz, Inge Li and Weimann, Oren}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.15}, URN = {urn:nbn:de:0030-drops-121406}, doi = {10.4230/LIPIcs.CPM.2020.15}, annote = {Keywords: Streaming pattern matching, Hamming distance, k-mismatch} }

Document

**Published in:** LIPIcs, Volume 149, 30th International Symposium on Algorithms and Computation (ISAAC 2019)

In the SetDisjointness problem, a collection of m sets S_1,S_2,...,S_m from some universe U is preprocessed in order to answer queries on the emptiness of the intersection of some two query sets from the collection. In the SetIntersection variant, all the elements in the intersection of the query sets are required to be reported. These are two fundamental problems that were considered in several papers from both the upper bound and lower bound perspective.
Several conditional lower bounds for these problems were proven for the tradeoff between preprocessing and query time or the tradeoff between space and query time. Moreover, there are several unconditional hardness results for these problems in some specific computational models. The fundamental nature of the SetDisjointness and SetIntersection problems makes them useful for proving the conditional hardness of other problems from various areas. However, the universe of the elements in the sets may be very large, which may cause the reduction to some other problems to be inefficient and therefore it is not useful for proving their conditional hardness.
In this paper, we prove the conditional hardness of SetDisjointness and SetIntersection with bounded universe. This conditional hardness is shown for both the interplay between preprocessing and query time and the interplay between space and query time. Moreover, we present several applications of these new conditional lower bounds. These applications demonstrates the strength of our new conditional lower bounds as they exploit the limited universe size. We believe that this new framework of conditional lower bounds with bounded universe can be useful for further significant applications.

Isaac Goldstein, Moshe Lewenstein, and Ely Porat. On the Hardness of Set Disjointness and Set Intersection with Bounded Universe. In 30th International Symposium on Algorithms and Computation (ISAAC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 149, pp. 7:1-7:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{goldstein_et_al:LIPIcs.ISAAC.2019.7, author = {Goldstein, Isaac and Lewenstein, Moshe and Porat, Ely}, title = {{On the Hardness of Set Disjointness and Set Intersection with Bounded Universe}}, booktitle = {30th International Symposium on Algorithms and Computation (ISAAC 2019)}, pages = {7:1--7:22}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-130-6}, ISSN = {1868-8969}, year = {2019}, volume = {149}, editor = {Lu, Pinyan and Zhang, Guochuan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2019.7}, URN = {urn:nbn:de:0030-drops-115036}, doi = {10.4230/LIPIcs.ISAAC.2019.7}, annote = {Keywords: set disjointness, set intersection, 3SUM, space-time tradeoff, conditional lower bounds} }

Document

**Published in:** LIPIcs, Volume 112, 26th Annual European Symposium on Algorithms (ESA 2018)

In the kSUM problem we are given an array of numbers a_1,a_2,...,a_n and we are required to determine if there are k different elements in this array such that their sum is 0. This problem is a parameterized version of the well-studied SUBSET-SUM problem, and a special case is the 3SUM problem that is extensively used for proving conditional hardness. Several works investigated the interplay between time and space in the context of SUBSET-SUM. Recently, improved time-space tradeoffs were proven for kSUM using both randomized and deterministic algorithms.
In this paper we obtain an improvement over the best known results for the time-space tradeoff for kSUM. A major ingredient in achieving these results is a general self-reduction from kSUM to mSUM where m<k, and several useful observations that enable this reduction and its implications. The main results we prove in this paper include the following: (i) The best known Las Vegas solution to kSUM running in approximately O(n^{k-delta sqrt{2k}}) time and using O(n^{delta}) space, for 0 <= delta <= 1. (ii) The best known deterministic solution to kSUM running in approximately O(n^{k-delta sqrt{k}}) time and using O(n^{delta}) space, for 0 <= delta <= 1. (iii) A space-time tradeoff for solving kSUM using O(n^{delta}) space, for delta>1. (iv) An algorithm for 6SUM running in O(n^4) time using just O(n^{2/3}) space. (v) A solution to 3SUM on random input using O(n^2) time and O(n^{1/3}) space, under the assumption of a random read-only access to random bits.

Isaac Goldstein, Moshe Lewenstein, and Ely Porat. Improved Space-Time Tradeoffs for kSUM. In 26th Annual European Symposium on Algorithms (ESA 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 112, pp. 37:1-37:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{goldstein_et_al:LIPIcs.ESA.2018.37, author = {Goldstein, Isaac and Lewenstein, Moshe and Porat, Ely}, title = {{Improved Space-Time Tradeoffs for kSUM}}, booktitle = {26th Annual European Symposium on Algorithms (ESA 2018)}, pages = {37:1--37:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-081-1}, ISSN = {1868-8969}, year = {2018}, volume = {112}, editor = {Azar, Yossi and Bast, Hannah and Herman, Grzegorz}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2018.37}, URN = {urn:nbn:de:0030-drops-95000}, doi = {10.4230/LIPIcs.ESA.2018.37}, annote = {Keywords: kSUM, space-time tradeoff, self-reduction} }

Document

**Published in:** LIPIcs, Volume 107, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)

Recently, there has been a growing focus in solving approximate pattern matching problems in the streaming model. Of particular interest are the pattern matching with k-mismatches (KMM) problem and the pattern matching with w-wildcards (PMWC) problem. Motivated by reductions from these problems in the streaming model to the dictionary matching problem, this paper focuses on designing algorithms for the dictionary matching problem in the multi-stream model where there are several independent streams of data (as opposed to just one in the streaming model), and the memory complexity of an algorithm is expressed using two quantities: (1) a read-only shared memory storage area which is shared among all the streams, and (2) local stream memory that each stream stores separately.
In the dictionary matching problem in the multi-stream model the goal is to preprocess a dictionary D={P_1,P_2,...,P_d} of d=|D| patterns (strings with maximum length m over alphabet Sigma) into a data structure stored in shared memory, so that given multiple independent streaming texts (where characters arrive one at a time) the algorithm reports occurrences of patterns from D in each one of the texts as soon as they appear.
We design two efficient algorithms for the dictionary matching problem in the multi-stream model. The first algorithm works when all the patterns in D have the same length m and costs O(d log m) words in shared memory, O(log m log d) words in stream memory, and O(log m) time per character. The second algorithm works for general D, but the time cost per character becomes O(log m+log d log log d). We also demonstrate the usefulness of our first algorithm in solving both the KMM problem and PMWC problem in the streaming model. In particular, we obtain the first almost optimal (up to poly-log factors) algorithm for the PMWC problem in the streaming model. We also design a new algorithm for the KMM problem in the streaming model that, up to poly-log factors, has the same bounds as the most recent results that use different techniques. Moreover, for most inputs, our algorithm for KMM is significantly faster on average.

Shay Golan, Tsvi Kopelowitz, and Ely Porat. Towards Optimal Approximate Streaming Pattern Matching by Matching Multiple Patterns in Multiple Streams. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 65:1-65:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{golan_et_al:LIPIcs.ICALP.2018.65, author = {Golan, Shay and Kopelowitz, Tsvi and Porat, Ely}, title = {{Towards Optimal Approximate Streaming Pattern Matching by Matching Multiple Patterns in Multiple Streams}}, booktitle = {45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)}, pages = {65:1--65:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-076-7}, ISSN = {1868-8969}, year = {2018}, volume = {107}, editor = {Chatzigiannakis, Ioannis and Kaklamanis, Christos and Marx, D\'{a}niel and Sannella, Donald}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2018.65}, URN = {urn:nbn:de:0030-drops-90690}, doi = {10.4230/LIPIcs.ICALP.2018.65}, annote = {Keywords: Streaming approximate pattern matching, Dictionary matching} }

Document

**Published in:** LIPIcs, Volume 105, 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

Tracing regularities plays a key role in data analysis for various areas of science, including coding and automata theory, formal language theory, combinatorics, molecular biology and many others. Part of the scientific process is understanding and explaining these regularities. A common notion to describe regularity in a string T is a cover or quasi-period, which is a string C for which every letter of T lies within some occurrence of C. In many applications finding exact repetitions is not sufficient, due to the presence of errors. In this paper we initiate the study of quasi-periodicity persistence under mismatch errors, and our goal is to characterize situations where a given quasi-periodic string remains quasi-periodic even after substitution errors have been introduced to the string. Our study results in proving necessary conditions as well as a theorem stating sufficient conditions for quasi-periodicity persistence. As an application, we are able to close the gap in understanding the complexity of Approximate Cover Problem (ACP) relaxations studied by [Amir 2017a, Amir 2017b] and solve an open question.

Amihood Amir, Avivit Levy, and Ely Porat. Quasi-Periodicity Under Mismatch Errors. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 4:1-4:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.CPM.2018.4, author = {Amir, Amihood and Levy, Avivit and Porat, Ely}, title = {{Quasi-Periodicity Under Mismatch Errors}}, booktitle = {29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018)}, pages = {4:1--4:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-074-3}, ISSN = {1868-8969}, year = {2018}, volume = {105}, editor = {Navarro, Gonzalo and Sankoff, David and Zhu, Binhai}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2018.4}, URN = {urn:nbn:de:0030-drops-87054}, doi = {10.4230/LIPIcs.CPM.2018.4}, annote = {Keywords: Periodicity, Quasi-Periodicity, Cover, Approximate Cover} }

Document

**Published in:** OASIcs, Volume 61, 1st Symposium on Simplicity in Algorithms (SOSA 2018)

The algorithmic task of computing the Hamming distance between a given pattern of length m and each location in a text of length n, both over a general alphabet \Sigma, is one of the most fundamental algorithmic tasks in string algorithms. The fastest known runtime for exact computation is \tilde O(n\sqrt m). We recently introduced a complicated randomized algorithm for obtaining a (1 +/- eps) approximation for each location in the text in O( (n/eps) log(1/eps) log n log m log |\Sigma|) total time, breaking a barrier that stood for 22 years. In this paper, we introduce an elementary and simple randomized algorithm that takes O((n/eps) log n log m) time.

Tsvi Kopelowitz and Ely Porat. A Simple Algorithm for Approximating the Text-To-Pattern Hamming Distance. In 1st Symposium on Simplicity in Algorithms (SOSA 2018). Open Access Series in Informatics (OASIcs), Volume 61, pp. 10:1-10:5, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{kopelowitz_et_al:OASIcs.SOSA.2018.10, author = {Kopelowitz, Tsvi and Porat, Ely}, title = {{A Simple Algorithm for Approximating the Text-To-Pattern Hamming Distance}}, booktitle = {1st Symposium on Simplicity in Algorithms (SOSA 2018)}, pages = {10:1--10:5}, series = {Open Access Series in Informatics (OASIcs)}, ISBN = {978-3-95977-064-4}, ISSN = {2190-6807}, year = {2018}, volume = {61}, editor = {Seidel, Raimund}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SOSA.2018.10}, URN = {urn:nbn:de:0030-drops-83089}, doi = {10.4230/OASIcs.SOSA.2018.10}, annote = {Keywords: Pattern Matching, Hamming Distance, Approximation Algorithms} }

Document

**Published in:** LIPIcs, Volume 92, 28th International Symposium on Algorithms and Computation (ISAAC 2017)

In the recent years, intensive research work has been dedicated to prove conditional lower bounds in order to reveal the inner structure of the class P. These conditional lower bounds are based on many popular conjectures on well-studied problems. One of the most heavily used conjectures is the celebrated Strong Exponential Time Hypothesis (SETH). It turns out that conditional hardness proved based on SETH goes, in many cases, through an intermediate problem - the Orthogonal Vectors (OV) problem.
Almost all research work regarding conditional lower bound was concentrated on time complexity. Very little attention was directed toward space complexity. In a recent work, Goldstein et al.[WADS '17] set the stage for proving conditional lower bounds regarding space and its interplay with time. In this spirit, it is tempting to investigate the space complexity of a data structure variant of OV which is called OV indexing. In this problem n boolean vectors of size clogn are given for preprocessing. As a query, a vector v is given and we are required to verify if there is an input vector that is orthogonal to it or not.
This OV indexing problem is interesting in its own, but it also likely to have strong implications on problems known to be conditionally hard, in terms of time complexity, based on OV. Having this in mind, we study OV indexing in this paper from many aspects. We give some space-efficient algorithms for the problem, show a tradeoff between space and query time, describe how to solve its reporting variant, shed light on an interesting connection between this problem and the well-studied SetDisjointness problem and demonstrate how it can be solved more efficiently on random input.

Isaac Goldstein, Moshe Lewenstein, and Ely Porat. Orthogonal Vectors Indexing. In 28th International Symposium on Algorithms and Computation (ISAAC 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 92, pp. 40:1-40:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{goldstein_et_al:LIPIcs.ISAAC.2017.40, author = {Goldstein, Isaac and Lewenstein, Moshe and Porat, Ely}, title = {{Orthogonal Vectors Indexing}}, booktitle = {28th International Symposium on Algorithms and Computation (ISAAC 2017)}, pages = {40:1--40:12}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-054-5}, ISSN = {1868-8969}, year = {2017}, volume = {92}, editor = {Okamoto, Yoshio and Tokuyama, Takeshi}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2017.40}, URN = {urn:nbn:de:0030-drops-82395}, doi = {10.4230/LIPIcs.ISAAC.2017.40}, annote = {Keywords: SETH, orthogonal vectors, space complexity} }

Document

**Published in:** LIPIcs, Volume 67, 8th Innovations in Theoretical Computer Science Conference (ITCS 2017)

This paper investigates the task of load balancing where the objective function is to minimize the p-norm of loads, for p\geq 1, in both static and incremental settings. We consider two closely related load balancing problems. In the bipartite matching problem we are given a bipartite graph G=(C\cup S, E) and the goal is to assign each client c\in C to a server s\in S so that the p-norm of assignment loads on S is minimized.
In the graph orientation problem the goal is to orient (direct) the edges of a given undirected graph while minimizing the p-norm of the out-degrees. The graph orientation problem is a special case of the bipartite matching problem, but less complex, which leads to simpler algorithms.
For the graph orientation problem we show that the celebrated Chiba-Nishizeki peeling algorithm provides a simple linear time load balancing scheme whose output is an orientation that is 2-competitive, in a p-norm sense, for all p\geq 1. For the bipartite matching problem we first provide an offline algorithm that computes an optimal assignment. We then extend this solution to the online bipartite matching problem with reassignments, where vertices from C arrive in an online fashion together with their corresponding edges, and we are allowed to reassign an amortized O(1) vertices from C each time a new vertex arrives. In this online scenario we show how to maintain a single assignment that is 8-competitive, in a p-norm sense, for all p\geq 1.

Aaron Bernstein, Tsvi Kopelowitz, Seth Pettie, Ely Porat, and Clifford Stein. Simultaneously Load Balancing for Every p-norm, With Reassignments. In 8th Innovations in Theoretical Computer Science Conference (ITCS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 67, pp. 51:1-51:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{bernstein_et_al:LIPIcs.ITCS.2017.51, author = {Bernstein, Aaron and Kopelowitz, Tsvi and Pettie, Seth and Porat, Ely and Stein, Clifford}, title = {{Simultaneously Load Balancing for Every p-norm, With Reassignments}}, booktitle = {8th Innovations in Theoretical Computer Science Conference (ITCS 2017)}, pages = {51:1--51:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-029-3}, ISSN = {1868-8969}, year = {2017}, volume = {67}, editor = {Papadimitriou, Christos H.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2017.51}, URN = {urn:nbn:de:0030-drops-82009}, doi = {10.4230/LIPIcs.ITCS.2017.51}, annote = {Keywords: Online Matching, Graph Orientation, Minmizing the p-norm} }

Document

**Published in:** LIPIcs, Volume 87, 25th Annual European Symposium on Algorithms (ESA 2017)

In the streaming multi-pattern search problem, which is also known as the streaming dictionary matching problem, a set D={P_1,P_2, . . . ,P_d} of d patterns (strings over an alphabet Sigma), called the dictionary, is given to be preprocessed. Then, a text T arrives one character at a time and the goal is to report, before the next character arrives, the longest pattern in the dictionary that is a current suffix of T. We prove that for a constant size alphabet, there exists a randomized Monte-Carlo algorithm for the streaming dictionary matching problem that takes constant time per character and uses O(d log m) words of space, where m is the length of the longest pattern in the dictionary. In the case where the alphabet size is not constant, we introduce two new randomized Monte-Carlo algorithms with the following complexities:
* O(log log |Sigma|) time per character in the worst case and O(d log m) words of space.
* O(1/epsilon) time per character in the worst case and O(d |\Sigma|^epsilon log m/epsilon) words of space for any 0<epsilon<= 1.
These results improve upon the algorithm of [Clifford et al., ESA'15] which uses O(d log m) words of space and takes O(log log (m+d)) time per character.

Shay Golan and Ely Porat. Real-Time Streaming Multi-Pattern Search for Constant Alphabet. In 25th Annual European Symposium on Algorithms (ESA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 87, pp. 41:1-41:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{golan_et_al:LIPIcs.ESA.2017.41, author = {Golan, Shay and Porat, Ely}, title = {{Real-Time Streaming Multi-Pattern Search for Constant Alphabet}}, booktitle = {25th Annual European Symposium on Algorithms (ESA 2017)}, pages = {41:1--41:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-049-1}, ISSN = {1868-8969}, year = {2017}, volume = {87}, editor = {Pruhs, Kirk and Sohler, Christian}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2017.41}, URN = {urn:nbn:de:0030-drops-78550}, doi = {10.4230/LIPIcs.ESA.2017.41}, annote = {Keywords: multi-pattern, dictionary, streaming pattern matching, fingerprints} }

Document

**Published in:** LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Regularities in strings arise in various areas of science, including coding and automata theory, formal language theory, combinatorics, molecular biology and many others. A common notion to describe regularity in a string T is a cover, which is a string C for which every letter of T lies within some occurrence of C. The alignment of the cover repetitions in the given text is called a tiling. In many applications finding exact repetitions is not sufficient, due to the presence of errors. In this paper, we use a new approach for handling errors in coverable phenomena and define the approximate cover problem (ACP), in which we are given a text that is a sequence of some cover repetitions with possible mismatch errors, and we seek a string that covers the text with the minimum number of errors. We first show that the ACP is NP-hard, by studying the cover-size relaxation of the ACP, in which the requested size of the approximate cover is also given with the input string. We show this relaxation is already NP-hard. We also study another two relaxations of the ACP, which we call the partial-tiling relaxation of the ACP and the full-tiling relaxation of the ACP, in which a tiling of the requested cover is also given with the input string. A given full tiling retains all the occurrences of the cover before the errors, while in a partial tiling there can be additional occurrences of the cover that are not marked by the tiling. We show that the partial-tiling relaxation has a polynomial time complexity and give experimental evidence that the full-tiling also has polynomial time complexity. The study of these relaxations, besides shedding another light on the complexity of the ACP, also involves a deep understanding of the properties of covers, yielding some key lemmas and observations that may be helpful for a future study of regularities in the presence of errors.

Amihood Amir, Avivit Levy, Ronit Lubin, and Ely Porat. Approximate Cover of Strings. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 26:1-26:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.CPM.2017.26, author = {Amir, Amihood and Levy, Avivit and Lubin, Ronit and Porat, Ely}, title = {{Approximate Cover of Strings}}, booktitle = {28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)}, pages = {26:1--26:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-039-2}, ISSN = {1868-8969}, year = {2017}, volume = {78}, editor = {K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.26}, URN = {urn:nbn:de:0030-drops-73189}, doi = {10.4230/LIPIcs.CPM.2017.26}, annote = {Keywords: periodicity, quasi-periodicity, cover, approximate cover} }

Document

**Published in:** LIPIcs, Volume 64, 27th International Symposium on Algorithms and Computation (ISAAC 2016)

We examine the complexity of the online Dictionary Matching with One Gap Problem (DMOG) which is the following. Preprocess a dictionary D of d patterns, where each pattern contains a special gap symbol that can match any string, so that given a text that arrives online, a character at a time, we can report all of the patterns from D that are suffixes of the text that has arrived so far, before the next character arrives. In more general versions the gap symbols are associated with bounds determining the possible lengths of matching strings. Online DMOG captures the difficulty in a bottleneck procedure for cyber-security, as many digital signatures of viruses manifest themselves as patterns with a single gap.
In this paper, we demonstrate that the difficulty in obtaining efficient solutions for the DMOG problem, even in the offline setting, can be traced back to the infamous 3SUM conjecture. We show a conditional lower bound of Omega(delta(G_D)+op) time per text character, where G_D is a bipartite graph that captures the structure of D, delta(G_D) is the degeneracy of this graph, and op is the output size. Moreover, we show a conditional lower bound in terms of the magnitude of gaps for the bounded case, thereby showing that some known offline upper bounds are essentially optimal.
We also provide matching upper-bounds (up to sub-polynomial factors), in terms of the degeneracy, for the online DMOG problem. In particular, we introduce algorithms whose time cost depends linearly on delta(G_D). Our algorithms make use of graph orientations, together with some additional techniques. These algorithms are of practical interest since although delta(G_D) can be as large as sqrt(d), and even larger if G_D is a multi-graph, it is typically a very small constant in practice. Finally, when delta(G_D) is large we are able to obtain even more efficient solutions.

Amihood Amir, Tsvi Kopelowitz, Avivit Levy, Seth Pettie, Ely Porat, and B. Riva Shalom. Mind the Gap: Essentially Optimal Algorithms for Online Dictionary Matching with One Gap. In 27th International Symposium on Algorithms and Computation (ISAAC 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 64, pp. 12:1-12:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{amir_et_al:LIPIcs.ISAAC.2016.12, author = {Amir, Amihood and Kopelowitz, Tsvi and Levy, Avivit and Pettie, Seth and Porat, Ely and Shalom, B. Riva}, title = {{Mind the Gap: Essentially Optimal Algorithms for Online Dictionary Matching with One Gap}}, booktitle = {27th International Symposium on Algorithms and Computation (ISAAC 2016)}, pages = {12:1--12:12}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-026-2}, ISSN = {1868-8969}, year = {2016}, volume = {64}, editor = {Hong, Seok-Hee}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2016.12}, URN = {urn:nbn:de:0030-drops-67841}, doi = {10.4230/LIPIcs.ISAAC.2016.12}, annote = {Keywords: Pattern matching, Dictionary matching, 3SUM, Triangle reporting} }

Document

**Published in:** LIPIcs, Volume 55, 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016)

We consider distance labeling schemes for trees: given a tree with n nodes, label the nodes with binary strings such that, given the labels of any two nodes, one can determine, by looking only at the labels, the distance in the tree between the two nodes.
A lower bound by Gavoille et al. [Gavoille et al., J. Alg., 2004] and an upper bound by Peleg [Peleg, J. Graph Theory, 2000] establish that labels must use Theta(log^2(n)) bits. Gavoille et al. [Gavoille et al., ESA, 2001] show that for very small approximate stretch, labels use Theta(log(n) log(log(n))) bits. Several other papers investigate various variants such as, for example, small distances in trees [Alstrup et al., SODA, 2003].
We improve the known upper and lower bounds of exact distance labeling by showing that 1/4*log^2(n) bits are needed and that 1/2*log^2(n) bits are sufficient. We also give (1 + epsilon)-stretch labeling schemes using Theta(log(n)) bits for constant epsilon > 0. (1 + epsilon)-stretch labeling schemes with polylogarithmic label size have previously been established for doubling dimension graphs by Talwar [Talwar, STOC, 2004].
In addition, we present matching upper and lower bounds for distance labeling for caterpillars, showing that labels must have size 2*log(n) - Theta(log(log(n))). For simple paths with k nodes and edge weights in [1,n], we show that labels must have size (k - 1)/k*log(n) + Theta(log(k)).

Stephen Alstrup, Inge Li Gørtz, Esben Bistrup Halvorsen, and Ely Porat. Distance Labeling Schemes for Trees. In 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 55, pp. 132:1-132:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{alstrup_et_al:LIPIcs.ICALP.2016.132, author = {Alstrup, Stephen and G{\o}rtz, Inge Li and Halvorsen, Esben Bistrup and Porat, Ely}, title = {{Distance Labeling Schemes for Trees}}, booktitle = {43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016)}, pages = {132:1--132:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-013-2}, ISSN = {1868-8969}, year = {2016}, volume = {55}, editor = {Chatzigiannakis, Ioannis and Mitzenmacher, Michael and Rabani, Yuval and Sangiorgi, Davide}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2016.132}, URN = {urn:nbn:de:0030-drops-62661}, doi = {10.4230/LIPIcs.ICALP.2016.132}, annote = {Keywords: Distributed computing, Distance labeling, Graph theory, Routing, Trees} }

Document

**Published in:** LIPIcs, Volume 57, 24th Annual European Symposium on Algorithms (ESA 2016)

A distance labeling scheme labels the n nodes of a graph with binary strings such that, given the labels of any two nodes, one can determine the distance in the graph between the two nodes by looking only at the labels. A D-preserving distance labeling scheme only returns precise distances between pairs of nodes that are at distance at least D from each other. In this paper we consider distance labeling schemes for the classical case of unweighted and undirected graphs.
We present a O(n/D * log^2(D)) bit D-preserving distance labeling scheme, improving the previous bound by Bollobás et al. [SIAM J. Discrete Math. 2005]. We also give an almost matching lower bound of Omega(n/D). With our D-preserving distance labeling scheme as a building block, we additionally achieve the following results:
1. We present the first distance labeling scheme of size o(n) for sparse graphs (and hence bounded degree graphs). This addresses an open problem by Gavoille et. al. [J. Algo. 2004], hereby separating the complexity from distance labeling in general graphs which require Omega(n) bits, Moon [Proc. of Glasgow Math. Association 1965].
2. For approximate r-additive labeling schemes, that return distances within an additive error of r we show a scheme of size
O(n/r * polylog(r*log(n))/log(n)) for r >= 2. This improves on the current best bound of O(n/r) by Alstrup et al. [SODA 2016] for sub-polynomial r, and is a generalization of a result by Gawrychowski et al. [arXiv preprint 2015] who showed this for r=2.

Stephen Alstrup, Søren Dahlgaard, Mathias Bæk Tejs Knudsen, and Ely Porat. Sublinear Distance Labeling. In 24th Annual European Symposium on Algorithms (ESA 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 57, pp. 5:1-5:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{alstrup_et_al:LIPIcs.ESA.2016.5, author = {Alstrup, Stephen and Dahlgaard, S{\o}ren and Knudsen, Mathias B{\ae}k Tejs and Porat, Ely}, title = {{Sublinear Distance Labeling}}, booktitle = {24th Annual European Symposium on Algorithms (ESA 2016)}, pages = {5:1--5:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-015-6}, ISSN = {1868-8969}, year = {2016}, volume = {57}, editor = {Sankowski, Piotr and Zaroliagis, Christos}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2016.5}, URN = {urn:nbn:de:0030-drops-63479}, doi = {10.4230/LIPIcs.ESA.2016.5}, annote = {Keywords: Graph labeling schemes, Distance labeling, Graph theory, Sparse graphs} }

Document

**Published in:** LIPIcs, Volume 57, 24th Annual European Symposium on Algorithms (ESA 2016)

In the pattern matching with d wildcards problem we are given a text T of length n and a pattern P of length m that contains d wildcard characters, each denoted by a special symbol '?'. A wildcard character matches any other character. The goal is to establish for each m-length substring of T whether it matches P. In the streaming model variant of the pattern matching with d wildcards problem the text T arrives one character at a time and the goal is to report, before the next character arrives, if the last m characters match P while using only o(m) words of space.
In this paper we introduce two new algorithms for the d wildcard pattern matching problem in the streaming model.
The first is a randomized Monte Carlo algorithm that is parameterized by a constant 0<=delta<=1. This algorithm uses ~O(d^{1-delta}) amortized time per character and ~O(d^{1+delta}) words of space. The second algorithm, which is used as a black box in the first algorithm, is a randomized Monte Carlo algorithm which uses O(d+log m) worst-case time per character and O(d log m) words of space.

Shay Golan, Tsvi Kopelowitz, and Ely Porat. Streaming Pattern Matching with d Wildcards. In 24th Annual European Symposium on Algorithms (ESA 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 57, pp. 44:1-44:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{golan_et_al:LIPIcs.ESA.2016.44, author = {Golan, Shay and Kopelowitz, Tsvi and Porat, Ely}, title = {{Streaming Pattern Matching with d Wildcards}}, booktitle = {24th Annual European Symposium on Algorithms (ESA 2016)}, pages = {44:1--44:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-015-6}, ISSN = {1868-8969}, year = {2016}, volume = {57}, editor = {Sankowski, Piotr and Zaroliagis, Christos}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2016.44}, URN = {urn:nbn:de:0030-drops-63561}, doi = {10.4230/LIPIcs.ESA.2016.44}, annote = {Keywords: wildcards, don't-cares, streaming pattern matching, fingerprints} }

Document

**Published in:** LIPIcs, Volume 57, 24th Annual European Symposium on Algorithms (ESA 2016)

In recent years much effort has been put into developing polynomial-time conditional lower bounds for algorithms and data structures in both static and dynamic settings. Along these lines we introduce a framework for proving conditional lower bounds based on the well-known 3SUM conjecture. Our framework creates a compact representation of an instance of the 3SUM problem using hashing and domain specific encoding. This compact representation admits false solutions to the original 3SUM problem instance which we reveal and eliminate until we find a true solution. In other words, from all witnesses (candidate solutions) we figure out if an honest one (a true solution) exists. This enumeration of witnesses is used to prove conditional lower bounds on reporting problems that generate all witnesses. In turn, these reporting problems are then reduced to various decision problems using special search data structures which are able to enumerate the witnesses while only using solutions to decision variants. Hence, 3SUM-hardness of the decision problems is deduced.
We utilize this framework to show conditional lower bounds for several variants of convolutions, matrix multiplication and string problems. Our framework uses a strong connection between all of these problems and the ability to find witnesses.
Specifically, we prove conditional lower bounds for computing partial outputs of convolutions and matrix multiplication for sparse inputs. These problems are inspired by the open question raised by Muthukrishnan 20 years ago. The lower bounds we show rule out the possibility (unless the 3SUM conjecture is false) that almost linear time solutions to sparse input-output convolutions or matrix multiplications exist. This is in contrast to standard convolutions and matrix multiplications that have, or assumed to have, almost linear solutions.
Moreover, we improve upon the conditional lower bounds of Amir et al. for histogram indexing, a problem that has been of much interest recently. The conditional lower bounds we show apply for both reporting and decision variants. For the well-studied decision variant, we show a full tradeoff between preprocessing and query time for every alphabet size > 2. At an extreme, this implies that no solution to this problem exists with subquadratic preprocessing time and ~O(1) query time for every alphabet size > 2, unless the 3SUM conjecture is false. This is in contrast to a recent result by Chan and Lewenstein for a binary alphabet.
While these specific applications are used to demonstrate the techniques of our framework, we believe that this novel framework is useful for many other problems as well.

Isaac Goldstein, Tsvi Kopelowitz, Moshe Lewenstein, and Ely Porat. How Hard is it to Find (Honest) Witnesses?. In 24th Annual European Symposium on Algorithms (ESA 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 57, pp. 45:1-45:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{goldstein_et_al:LIPIcs.ESA.2016.45, author = {Goldstein, Isaac and Kopelowitz, Tsvi and Lewenstein, Moshe and Porat, Ely}, title = {{How Hard is it to Find (Honest) Witnesses?}}, booktitle = {24th Annual European Symposium on Algorithms (ESA 2016)}, pages = {45:1--45:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-015-6}, ISSN = {1868-8969}, year = {2016}, volume = {57}, editor = {Sankowski, Piotr and Zaroliagis, Christos}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2016.45}, URN = {urn:nbn:de:0030-drops-63575}, doi = {10.4230/LIPIcs.ESA.2016.45}, annote = {Keywords: 3SUM, convolutions, matrix multiplication, histogram indexing} }

Document

**Published in:** LIPIcs, Volume 57, 24th Annual European Symposium on Algorithms (ESA 2016)

All Pairs Shortest Path (APSP) is a classic problem in graph theory. While for general weighted graphs there is no algorithm that computes APSP in O(n^{3-epsilon}) time (epsilon > 0), by using fast matrix multiplication algorithms, we can compute APSP in O(n^{omega}*log(n)) time (omega < 2.373) for undirected unweighted graphs, and in O(n^{2.5302}) time for directed unweighted graphs. In the current state of matters, there is a substantial gap between the upper bounds of the problem for undirected and directed graphs, and for a long time, it is remained an important open question whether it is possible to close this gap.
In this paper we introduce a new parameter that measures the symmetry of directed graphs (i.e. their closeness to undirected graphs), and obtain a new parameterized APSP algorithm for directed unweighted graphs, that generalizes Seidel's O(n^{omega}*log(n)) time algorithm for undirected unweighted graphs. Given a directed unweighted graph G, unless it is highly asymmetric, our algorithms can compute APSP in o(n^{2.5}) time for G, providing for such graphs a faster APSP algorithm than the state-of-the-art algorithms for the problem.

Ely Porat, Eduard Shahbazian, and Roei Tov. New Parameterized Algorithms for APSP in Directed Graphs. In 24th Annual European Symposium on Algorithms (ESA 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 57, pp. 72:1-72:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{porat_et_al:LIPIcs.ESA.2016.72, author = {Porat, Ely and Shahbazian, Eduard and Tov, Roei}, title = {{New Parameterized Algorithms for APSP in Directed Graphs}}, booktitle = {24th Annual European Symposium on Algorithms (ESA 2016)}, pages = {72:1--72:13}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-015-6}, ISSN = {1868-8969}, year = {2016}, volume = {57}, editor = {Sankowski, Piotr and Zaroliagis, Christos}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2016.72}, URN = {urn:nbn:de:0030-drops-64207}, doi = {10.4230/LIPIcs.ESA.2016.72}, annote = {Keywords: Graphs, distances, APSP, fast matrix multiplication} }

Document

**Published in:** LIPIcs, Volume 54, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)

In the online dictionary matching problem the goal is to preprocess a set of patterns D={P_1,...,P_d} over alphabet Sigma, so that given an online text (one character at a time) we report all of the occurrences of patterns that are a suffix of the current text before the following character arrives. We introduce a succinct Aho-Corasick like data structure for the online dictionary matching problem. Our solution uses a new succinct representation for multi-labeled trees, in which each node has a set of labels from a universe of size lambda. We consider lowest labeled ancestor (LLA) queries on multi-labeled trees, where given a node and a label we return the lowest proper ancestor of the node that has the queried label.
In this paper we introduce a succinct representation of multi-labeled trees for lambda=omega(1) that support LLA queries in O(log(log(lambda))) time. Using this representation of multi-labeled trees, we introduce a succinct data structure for the online dictionary matching problem when sigma=omega(1). In this solution the worst case cost per character is O(log(log(sigma)) + occ) time, where occ is the size of the current output.
Moreover, the amortized cost per character is O(1+occ) time.

Tsvi Kopelowitz, Ely Porat, and Yaron Rozen. Succinct Online Dictionary Matching with Improved Worst-Case Guarantees. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 6:1-6:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{kopelowitz_et_al:LIPIcs.CPM.2016.6, author = {Kopelowitz, Tsvi and Porat, Ely and Rozen, Yaron}, title = {{Succinct Online Dictionary Matching with Improved Worst-Case Guarantees}}, booktitle = {27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)}, pages = {6:1--6:13}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-012-5}, ISSN = {1868-8969}, year = {2016}, volume = {54}, editor = {Grossi, Roberto and Lewenstein, Moshe}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.6}, URN = {urn:nbn:de:0030-drops-60825}, doi = {10.4230/LIPIcs.CPM.2016.6}, annote = {Keywords: Succinct indexing, dictionary matching, Aho-Corasick, labeled trees} }

Document

**Published in:** LIPIcs, Volume 18, IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2012)

In this paper we introduce a general framework that exponentially improves the space, the degree of independence, and the time needed by min-wise based algorithms. The authors, in SODA 2011, we introduced an exponential time improvement for min-wise based algorithms by defining and constructing an almost k-min-wise independent family of hash functions. Here we develop an alternative approach that achieves both exponential time and exponential space improvement. The new approach relaxes the need for approximately min-wise hash functions, hence gets around the Omega(log(1/epsilon)) independence lower bound in [Patrascu 2010]. This is done by defining and constructing a d-k-min-wise independent family of hash functions. Surprisingly, for most cases only 8-wise independence is needed for the additional improvement. Moreover, as the degree of independence is a small constant, our function can be implemented efficiently.
Informally, under this definition, all subsets of size d of any fixed set X have an equal probability to have hash values among the minimal k values in X, where the probability is over the random choice of hash function from the family. This property measures the randomness of the family, as choosing a truly random function, obviously, satisfies the definition for d=k=|X|. We define and give an efficient time and space construction of approximately d-k-min-wise independent family of hash functions for the case where d=2, as this is sufficient for the additional exponential improvement.
We discuss how this construction can be used to improve many min-wise based algorithms. To our knowledge such definitions, for hash functions, were never studied and no construction was given before.
As an example we show how to apply it for similarity and rarity estimation over data streams. Other min-wise based algorithms, can be adjusted in the same way.

Guy Feigenblat, Ely Porat, and Ariel Shiftan. Exponential Space Improvement for minwise Based Algorithms. In IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2012). Leibniz International Proceedings in Informatics (LIPIcs), Volume 18, pp. 70-85, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2012)

Copy BibTex To Clipboard

@InProceedings{feigenblat_et_al:LIPIcs.FSTTCS.2012.70, author = {Feigenblat, Guy and Porat, Ely and Shiftan, Ariel}, title = {{Exponential Space Improvement for minwise Based Algorithms}}, booktitle = {IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2012)}, pages = {70--85}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-939897-47-7}, ISSN = {1868-8969}, year = {2012}, volume = {18}, editor = {D'Souza, Deepak and Radhakrishnan, Jaikumar and Telikepalli, Kavitha}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2012.70}, URN = {urn:nbn:de:0030-drops-38495}, doi = {10.4230/LIPIcs.FSTTCS.2012.70}, annote = {Keywords: Streaming, Min-Wise, Hash Functions, Similarity, On line algorithms, Sub-linear algorithms} }

Document

**Published in:** LIPIcs, Volume 14, 29th International Symposium on Theoretical Aspects of Computer Science (STACS 2012)

We present two recursive techniques to construct compressed sensing schemes that can be "decoded" in sub-linear time. The first technique is based on the well studied code composition method called code concatenation where the "outer" code has strong list recoverability properties. This technique uses only one level of recursion and critically uses the power of list recovery. The second recursive technique is conceptually similar, and has multiple recursion levels. The following compressed sensing results are obtained using these techniques:
- Strongly explicit efficiently decodable l_1/l_1 compressed sensing matrices: We present a strongly explicit ("for all") compressed sensing measurement matrix with O(d^2log^2 n) measurements that can output near-optimal d-sparse approximations in time poly(d log n).
- Near-optimal efficiently decodable l_1/l_1 compressed sensing matrices for non-negative signals: We present two randomized constructions of ("for all") compressed sensing matrices with near optimal number of measurements: O(d log n loglog_d n) and O_{m,s}(d^{1+1/s} log n (log^(m) n)^s), respectively, for any integer parameters s,m>=1. Both of these constructions can output near optimal d-sparse approximations for non-negative signals in time poly(d log n).
To the best of our knowledge, none of the results are dominated by existing results in the literature.

Hung Q. Ngo, Ely Porat, and Atri Rudra. Efficiently Decodable Compressed Sensing by List-Recoverable Codes and Recursion. In 29th International Symposium on Theoretical Aspects of Computer Science (STACS 2012). Leibniz International Proceedings in Informatics (LIPIcs), Volume 14, pp. 230-241, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2012)

Copy BibTex To Clipboard

@InProceedings{ngo_et_al:LIPIcs.STACS.2012.230, author = {Ngo, Hung Q. and Porat, Ely and Rudra, Atri}, title = {{Efficiently Decodable Compressed Sensing by List-Recoverable Codes and Recursion}}, booktitle = {29th International Symposium on Theoretical Aspects of Computer Science (STACS 2012)}, pages = {230--241}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-939897-35-4}, ISSN = {1868-8969}, year = {2012}, volume = {14}, editor = {D\"{u}rr, Christoph and Wilke, Thomas}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2012.230}, URN = {urn:nbn:de:0030-drops-34011}, doi = {10.4230/LIPIcs.STACS.2012.230}, annote = {Keywords: Compressed Sensing, Sub-Linear Time Decoding, List-Recoverable Codes} }

Document

**Published in:** Dagstuhl Seminar Proceedings, Volume 9281, Search Methodologies (2009)

Group testing is a long studied problem in combinatorics: A small set of r ill people should be identified out of the whole (n people) by using only queries (tests) of the form "Does set X contain an ill human?". In this paper we provide an explicit construction of a testing scheme which is better (smaller) than any known explicit construction. This scheme has \Theta(min[r2 log n, n])tests which is as many as the best non-explicit schemes have. In our construction we use a fact that may have a value by its own right: Linear error-correction codes with parameters [m, k, \delta m]q meeting the Gilbert-Varshamov bound may be constructed quite efficiently, in \Theta[q^{k}m) time.

Ely Porat and Amir Rotschild. Explicit Non-Adaptive Combinatorial Group Testing Schemes. In Search Methodologies. Dagstuhl Seminar Proceedings, Volume 9281, pp. 1-13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2009)

Copy BibTex To Clipboard

@InProceedings{porat_et_al:DagSemProc.09281.2, author = {Porat, Ely and Rotschild, Amir}, title = {{Explicit Non-Adaptive Combinatorial Group Testing Schemes}}, booktitle = {Search Methodologies}, pages = {1--13}, series = {Dagstuhl Seminar Proceedings (DagSemProc)}, ISSN = {1862-4405}, year = {2009}, volume = {9281}, editor = {Rudolf Ahlswede and Ferdinando Cicalese and Ugo Vaccaro}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.09281.2}, URN = {urn:nbn:de:0030-drops-22414}, doi = {10.4230/DagSemProc.09281.2}, annote = {Keywords: Prime Numbers, Group Testing, Streaming, Pattern Matching} }

Document

**Published in:** Dagstuhl Seminar Proceedings, Volume 9281, Search Methodologies (2009)

We present solutions for the k-mismatch pattern matching problem
with don't cares. Given a text t of length n and a pattern p of length m
with don't care symbols and a bound k, our algorithms find all the places
that the pattern matches the text with at most k mismatches. We first
give an \Theta(n(k + logmlog k) log n) time randomised algorithm which finds
the correct answer with high probability. We then present a new deter-
ministic \Theta(nk^2 log^m)time solution that uses tools originally developed
for group testing. Taking our derandomisation approach further we de-
velop an approach based on k-selectors that runs in \Theta(nk polylogm) time.
Further, in each case the location of the mismatches at each alignment is
also given at no extra cost.

Raphael Clifford, Klim Efremo, Ely Porat, and Amir Rotschild. Pattern matching with don't cares and few errors. In Search Methodologies. Dagstuhl Seminar Proceedings, Volume 9281, pp. 1-19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2009)

Copy BibTex To Clipboard

@InProceedings{clifford_et_al:DagSemProc.09281.5, author = {Clifford, Raphael and Efremo, Klim and Porat, Ely and Rotschild, Amir}, title = {{Pattern matching with don't cares and few errors}}, booktitle = {Search Methodologies}, pages = {1--19}, series = {Dagstuhl Seminar Proceedings (DagSemProc)}, ISSN = {1862-4405}, year = {2009}, volume = {9281}, editor = {Rudolf Ahlswede and Ferdinando Cicalese and Ugo Vaccaro}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.09281.5}, URN = {urn:nbn:de:0030-drops-22442}, doi = {10.4230/DagSemProc.09281.5}, annote = {Keywords: Prime Numbers, Group Testing, Streaming, Pattern Matching} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail