Document

**Published in:** LIPIcs, Volume 283, 34th International Symposium on Algorithms and Computation (ISAAC 2023)

Shannon’s entropy is a definitive lower bound for statistical compression. Unfortunately, no such clear measure exists for the compressibility of repetitive strings. Thus, ad hoc measures are employed to estimate the repetitiveness of strings, e.g., the size z of the Lempel–Ziv parse or the number r of equal-letter runs of the Burrows-Wheeler transform. A more recent one is the size γ of a smallest string attractor. Let T be a string of length n. A string attractor of T is a set of positions of T capturing the occurrences of all the substrings of T. Unfortunately, Kempa and Prezza [STOC 2018] showed that computing γ is NP-hard. Kociumaka et al. [LATIN 2020] considered a new measure of compressibility that is based on the function S_T(k) counting the number of distinct substrings of length k of T, also known as the substring complexity of T. This new measure is defined as δ = sup{S_T(k)/k, k ≥ 1} and lower bounds all the relevant ad hoc measures previously considered. In particular, δ ≤ γ always holds and δ can be computed in 𝒪(n) time using Θ(n) working space. Kociumaka et al. showed that one can construct an 𝒪(δ log n/(δ))-sized representation of T supporting efficient direct access and efficient pattern matching queries on T. Given that for highly compressible strings, δ is significantly smaller than n, it is natural to pose the following question:
Can we compute δ efficiently using sublinear working space?
It is straightforward to show that in the comparison model, any algorithm computing δ using 𝒪(b) space requires Ω(n^{2-o(1)}/b) time through a reduction from the element distinctness problem [Yao, SIAM J. Comput. 1994]. We thus wanted to investigate whether we can indeed match this lower bound. We address this algorithmic challenge by showing the following bounds to compute δ:
- 𝒪((n³log b)/b²) time using 𝒪(b) space, for any b ∈ [1,n], in the comparison model.
- 𝒪̃(n²/b) time using 𝒪̃(b) space, for any b ∈ [√n,n], in the word RAM model. This gives an 𝒪̃(n^{1+ε})-time and 𝒪̃(n^{1-ε})-space algorithm to compute δ, for any 0 < ε ≤ 1/2.
Let us remark that our algorithms compute S_T(k), for all k, within the same complexities.

Giulia Bernardini, Gabriele Fici, Paweł Gawrychowski, and Solon P. Pissis. Substring Complexity in Sublinear Space. In 34th International Symposium on Algorithms and Computation (ISAAC 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 283, pp. 12:1-12:19, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{bernardini_et_al:LIPIcs.ISAAC.2023.12, author = {Bernardini, Giulia and Fici, Gabriele and Gawrychowski, Pawe{\l} and Pissis, Solon P.}, title = {{Substring Complexity in Sublinear Space}}, booktitle = {34th International Symposium on Algorithms and Computation (ISAAC 2023)}, pages = {12:1--12:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-289-1}, ISSN = {1868-8969}, year = {2023}, volume = {283}, editor = {Iwata, Satoru and Kakimura, Naonori}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2023.12}, URN = {urn:nbn:de:0030-drops-193143}, doi = {10.4230/LIPIcs.ISAAC.2023.12}, annote = {Keywords: sublinear-space algorithm, string algorithm, substring complexity} }

Document

**Published in:** LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

We revisit the Heaviest Induced Ancestors (HIA) problem that was introduced by Gagie, Gawrychowski, and Nekrich [CCCG 2013] and has a number of applications in string algorithms. Let T₁ and T₂ be two rooted trees whose nodes have weights that are increasing in all root-to-leaf paths, and labels on the leaves, such that no two leaves of a tree have the same label. A pair of nodes (u, v) ∈ T₁ × T₂ is induced if and only if there is a label shared by leaf-descendants of u and v. In an HIA query, given nodes x ∈ T₁ and y ∈ T₂, the goal is to find an induced pair of nodes (u, v) of the maximum total weight such that u is an ancestor of x and v is an ancestor of y.
Let n be the upper bound on the sizes of the two trees. It is known that no data structure of size 𝒪̃(n) can answer HIA queries in o(log n / log log n) time [Charalampopoulos, Gawrychowski, Pokorski; ICALP 2020]. This (unconditional) lower bound is a polyloglog n factor away from the query time of the fastest 𝒪̃(n)-size data structure known to date for the HIA problem [Abedin, Hooshmand, Ganguly, Thankachan; Algorithmica 2022]. In this work, we resolve the query-time complexity of the HIA problem for the near-linear space regime by presenting a data structure that can be built in 𝒪̃(n) time and answers HIA queries in 𝒪(log n/log log n) time. As a direct corollary, we obtain an 𝒪̃(n)-size data structure that maintains the LCS of a static string and a dynamic string, both of length at most n, in time optimal for this space regime.
The main ingredients of our approach are fractional cascading and the utilization of an 𝒪(log n/ log log n)-depth tree decomposition. The latter allows us to break through the Ω(log n) barrier faced by previous works, due to the depth of the considered heavy-path decompositions.

Panagiotis Charalampopoulos, Bartłomiej Dudek, Paweł Gawrychowski, and Karol Pokorski. Optimal Near-Linear Space Heaviest Induced Ancestors. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 8:1-8:18, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.CPM.2023.8, author = {Charalampopoulos, Panagiotis and Dudek, Bart{\l}omiej and Gawrychowski, Pawe{\l} and Pokorski, Karol}, title = {{Optimal Near-Linear Space Heaviest Induced Ancestors}}, booktitle = {34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)}, pages = {8:1--8:18}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-276-1}, ISSN = {1868-8969}, year = {2023}, volume = {259}, editor = {Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.8}, URN = {urn:nbn:de:0030-drops-179624}, doi = {10.4230/LIPIcs.CPM.2023.8}, annote = {Keywords: data structures, string algorithms, fractional cascading} }

Document

**Published in:** LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

The fundamental question considered in algorithms on strings is that of indexing, that is, preprocessing a given string for specific queries. By now we have a number of efficient solutions for this problem when the queries ask for an exact occurrence of a given pattern P. However, practical applications motivate the necessity of considering more complex queries, for example concerning near occurrences of two patterns. Recently, Bille et al. [CPM 2021] introduced a variant of such queries, called gapped consecutive occurrences, in which a query consists of two patterns P₁ and P₂ and a range [a,b], and one must find all consecutive occurrences (q₁,q₂) of P₁ and P₂ such that q₂-q₁ ∈ [a,b]. By their results, we cannot hope for a very efficient indexing structure for such queries, even if a = 0 is fixed (although at the same time they provided a non-trivial upper bound). Motivated by this, we focus on a text given as a straight-line program (SLP) and design an index taking space polynomial in the size of the grammar that answers such queries in time optimal up to polylog factors.

Paweł Gawrychowski, Garance Gourdel, Tatiana Starikovskaya, and Teresa Anna Steiner. Compressed Indexing for Consecutive Occurrences. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 12:1-12:22, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.CPM.2023.12, author = {Gawrychowski, Pawe{\l} and Gourdel, Garance and Starikovskaya, Tatiana and Steiner, Teresa Anna}, title = {{Compressed Indexing for Consecutive Occurrences}}, booktitle = {34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)}, pages = {12:1--12:22}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-276-1}, ISSN = {1868-8969}, year = {2023}, volume = {259}, editor = {Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.12}, URN = {urn:nbn:de:0030-drops-179666}, doi = {10.4230/LIPIcs.CPM.2023.12}, annote = {Keywords: Compressed indexing, two patterns, consecutive occurrences} }

Document

**Published in:** LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

An order-preserving square in a string is a fragment of the form uv where u ≠ v and u is order-isomorphic to v. We show that a string w of length n over an alphabet of size σ contains 𝒪(σn) order-preserving squares that are distinct as words. This improves the upper bound of 𝒪(σ²n) by Kociumaka, Radoszewski, Rytter, and Waleń [TCS 2016]. Further, for every σ and n we exhibit a string with Ω(σn) order-preserving squares that are distinct as words, thus establishing that our upper bound is asymptotically tight. Finally, we design an 𝒪(σn) time algorithm that outputs all order-preserving squares that occur in a given string and are distinct as words. By our lower bound, this is optimal in the worst case.

Paweł Gawrychowski, Samah Ghazawi, and Gad M. Landau. Order-Preserving Squares in Strings. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 13:1-13:19, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.CPM.2023.13, author = {Gawrychowski, Pawe{\l} and Ghazawi, Samah and Landau, Gad M.}, title = {{Order-Preserving Squares in Strings}}, booktitle = {34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)}, pages = {13:1--13:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-276-1}, ISSN = {1868-8969}, year = {2023}, volume = {259}, editor = {Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.13}, URN = {urn:nbn:de:0030-drops-179676}, doi = {10.4230/LIPIcs.CPM.2023.13}, annote = {Keywords: repetitions, distinct squares, order-isomorphism} }

Document

APPROX

**Published in:** LIPIcs, Volume 245, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)

In a breakthrough work, Kawarabayashi and Thorup (J. ACM'19) gave a near-linear time deterministic algorithm to compute the weight of a minimum cut in a simple graph G = (V,E). A key component of this algorithm is finding the (1+ε)-KT partition of G, the coarsest partition {P_1, …, P_k} of V such that for every non-trivial (1+ε)-near minimum cut with sides {S, ̄{S}} it holds that P_i is contained in either S or ̄{S}, for i = 1, …, k. In this work we give a near-linear time randomized algorithm to find the (1+ε)-KT partition of a weighted graph. Our algorithm is quite different from that of Kawarabayashi and Thorup and builds on Karger’s framework of tree-respecting cuts (J. ACM'00).
We describe a number of applications of the algorithm. (i) The algorithm makes progress towards a more efficient algorithm for constructing the polygon representation of the set of near-minimum cuts in a graph. This is a generalization of the cactus representation, and was initially described by Benczúr (FOCS'95). (ii) We improve the time complexity of a recent quantum algorithm for minimum cut in a simple graph in the adjacency list model from Õ(n^{3/2}) to Õ(√{mn}), when the graph has n vertices and m edges. (iii) We describe a new type of randomized algorithm for minimum cut in simple graphs with complexity 𝒪(m + n log⁶ n). For graphs that are not too sparse, this matches the complexity of the current best 𝒪(m + n log² n) algorithm which uses a different approach based on random contractions.
The key technical contribution of our work is the following. Given a weighted graph G with m edges and a spanning tree T of G, consider the graph H whose nodes are the edges of T, and where there is an edge between two nodes of H iff the corresponding 2-respecting cut of T is a non-trivial near-minimum cut of G. We give a 𝒪(m log⁴ n) time deterministic algorithm to compute a spanning forest of H.

Simon Apers, Paweł Gawrychowski, and Troy Lee. Finding the KT Partition of a Weighted Graph in Near-Linear Time. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 245, pp. 32:1-32:14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{apers_et_al:LIPIcs.APPROX/RANDOM.2022.32, author = {Apers, Simon and Gawrychowski, Pawe{\l} and Lee, Troy}, title = {{Finding the KT Partition of a Weighted Graph in Near-Linear Time}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)}, pages = {32:1--32:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-249-5}, ISSN = {1868-8969}, year = {2022}, volume = {245}, editor = {Chakrabarti, Amit and Swamy, Chaitanya}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2022.32}, URN = {urn:nbn:de:0030-drops-171544}, doi = {10.4230/LIPIcs.APPROX/RANDOM.2022.32}, annote = {Keywords: Graph theory} }

Document

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 229, 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)

We revisit the complexity of the classical Interval Scheduling in the dynamic setting. In this problem, the goal is to maintain a set of intervals under insertions and deletions and report the size of the maximum size subset of pairwise disjoint intervals after each update. Nontrivial approximation algorithms are known for this problem, for both the unweighted and weighted versions [Henzinger, Neumann, Wiese, SoCG 2020]. Surprisingly, it was not known if the general exact version admits an exact solution working in sublinear time, that is, without recomputing the answer after each update.
Our first contribution is a structure for Dynamic Interval Scheduling with amortized 𝒪̃(n^{1/3}) update time. Then, building on the ideas used for the case of one machine, we design a sublinear solution for any constant number of machines: we describe a structure for Dynamic Interval Scheduling on m ≥ 2 machines with amortized 𝒪̃(n^{1 - 1/m}) update time.
We complement the above results by considering Dynamic Weighted Interval Scheduling on one machine, that is maintaining (the weight of) the maximum weight subset of pairwise disjoint intervals. We show an almost linear lower bound (conditioned on the hardness of Minimum Weight k-Clique) for the update/query time of any structure for this problem. Hence, in the weighted case one should indeed seek approximate solutions.

Paweł Gawrychowski and Karol Pokorski. Sublinear Dynamic Interval Scheduling (On One or Multiple Machines). In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 229, pp. 67:1-67:19, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.ICALP.2022.67, author = {Gawrychowski, Pawe{\l} and Pokorski, Karol}, title = {{Sublinear Dynamic Interval Scheduling (On One or Multiple Machines)}}, booktitle = {49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)}, pages = {67:1--67:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-235-8}, ISSN = {1868-8969}, year = {2022}, volume = {229}, editor = {Boja\'{n}czyk, Miko{\l}aj and Merelli, Emanuela and Woodruff, David P.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2022.67}, URN = {urn:nbn:de:0030-drops-164086}, doi = {10.4230/LIPIcs.ICALP.2022.67}, annote = {Keywords: interval scheduling, dynamic problems, data structures, greedy algorithms} }

Document

**Published in:** LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)

The text-to-pattern Hamming distances problem asks to compute the Hamming distances between a given pattern of length m and all length-m substrings of a given text of length n ≥ m. We focus on the well-studied k-mismatch version of the problem, where a distance needs to be returned only if it does not exceed a threshold k. Moreover, we assume n ≤ 2m (in general, one can partition the text into overlapping blocks). In this work, we develop data structures for the dynamic version of the k-mismatch problem supporting two operations: An update performs a single-letter substitution in the pattern or the text, whereas a query, given an index i, returns the Hamming distance between the pattern and the text substring starting at position i, or reports that the distance exceeds k.
First, we describe a simple data structure with 𝒪̃(1) update time and 𝒪̃(k) query time. Through considerably more sophisticated techniques, we show that 𝒪̃(k) update time and 𝒪̃(1) query time is also achievable. These two solutions likely provide an essentially optimal trade-off for the dynamic k-mismatch problem with m^{Ω(1)} ≤ k ≤ √m: we prove that, in that case, conditioned on the 3SUM conjecture, one cannot simultaneously achieve k^{1-Ω(1)} time for all operations (updates and queries) after n^{𝒪(1)}-time initialization. For k ≥ √m, the same lower bound excludes achieving m^{1/2-Ω(1)} time per operation. This is known to be essentially tight for constant-sized alphabets: already Clifford et al. (STACS 2018) achieved 𝒪̃(√m) time per operation in that case, but their solution for large alphabets costs 𝒪̃(m^{3/4}) time per operation. We improve and extend the latter result by developing a trade-off algorithm that, given a parameter 1 ≤ x ≤ k, achieves update time 𝒪̃(m/k +√{mk/x}) and query time 𝒪̃(x). In particular, for k ≥ √m, an appropriate choice of x yields 𝒪̃(∛{mk}) time per operation, which is 𝒪̃(m^{2/3}) when only the trivial threshold k = m is provided.

Raphaël Clifford, Paweł Gawrychowski, Tomasz Kociumaka, Daniel P. Martin, and Przemysław Uznański. The Dynamic k-Mismatch Problem. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 18:1-18:15, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{clifford_et_al:LIPIcs.CPM.2022.18, author = {Clifford, Rapha\"{e}l and Gawrychowski, Pawe{\l} and Kociumaka, Tomasz and Martin, Daniel P. and Uzna\'{n}ski, Przemys{\l}aw}, title = {{The Dynamic k-Mismatch Problem}}, booktitle = {33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)}, pages = {18:1--18:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-234-1}, ISSN = {1868-8969}, year = {2022}, volume = {223}, editor = {Bannai, Hideo and Holub, Jan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.18}, URN = {urn:nbn:de:0030-drops-161454}, doi = {10.4230/LIPIcs.CPM.2022.18}, annote = {Keywords: Pattern matching, Hamming distance, dynamic algorithms} }

Document

**Published in:** LIPIcs, Volume 202, 46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021)

A pattern α is a string of variables and terminal letters. We say that α matches a word w, consisting only of terminal letters, if w can be obtained by replacing the variables of α by terminal words. The matching problem, i.e., deciding whether a given pattern matches a given word, was heavily investigated: it is NP-complete in general, but can be solved efficiently for classes of patterns with restricted structure. In this paper, we approach this problem in a generalized setting, by considering approximate pattern matching under Hamming distance. More precisely, we are interested in what is the minimum Hamming distance between w and any word u obtained by replacing the variables of α by terminal words. Firstly, we address the class of regular patterns (in which no variable occurs twice) and propose efficient algorithms for this problem, as well as matching conditional lower bounds. We show that the problem can still be solved efficiently if we allow repeated variables, but restrict the way the different variables can be interleaved according to a locality parameter. However, as soon as we allow a variable to occur more than once and its occurrences can be interleaved arbitrarily with those of other variables, even if none of them occurs more than once, the problem becomes intractable.

Paweł Gawrychowski, Florin Manea, and Stefan Siemer. Matching Patterns with Variables Under Hamming Distance. In 46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 202, pp. 48:1-48:24, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.MFCS.2021.48, author = {Gawrychowski, Pawe{\l} and Manea, Florin and Siemer, Stefan}, title = {{Matching Patterns with Variables Under Hamming Distance}}, booktitle = {46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021)}, pages = {48:1--48:24}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-201-3}, ISSN = {1868-8969}, year = {2021}, volume = {202}, editor = {Bonchi, Filippo and Puglisi, Simon J.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2021.48}, URN = {urn:nbn:de:0030-drops-144886}, doi = {10.4230/LIPIcs.MFCS.2021.48}, annote = {Keywords: Pattern with variables, Matching algorithms, Hamming distance, Conditional lower bounds, Patterns with structural restrictions} }

Document

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 198, 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)

We consider the problem of preprocessing two strings S and T, of lengths m and n, respectively, in order to be able to efficiently answer the following queries: Given positions i,j in S and positions a,b in T, return the optimal alignment score of S[i..j] and T[a..b]. Let N = mn. We present an oracle with preprocessing time N^{1+o(1)} and space N^{1+o(1)} that answers queries in log^{2+o(1)}N time. In other words, we show that we can efficiently query for the alignment score of every pair of substrings after preprocessing the input for almost the same time it takes to compute just the alignment of S and T. Our oracle uses ideas from our distance oracle for planar graphs [STOC 2019] and exploits the special structure of the alignment graph. Conditioned on popular hardness conjectures, this result is optimal up to subpolynomial factors. Our results apply to both edit distance and longest common subsequence (LCS).
The best previously known oracle with construction time and size 𝒪(N) has slow Ω(√N) query time [Sakai, TCS 2019], and the one with size N^{1+o(1)} and query time log^{2+o(1)}N (using a planar graph distance oracle) has slow Ω(N^{3/2}) construction time [Long & Pettie, SODA 2021]. We improve both approaches by roughly a √ N factor.

Panagiotis Charalampopoulos, Paweł Gawrychowski, Shay Mozes, and Oren Weimann. An Almost Optimal Edit Distance Oracle. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 48:1-48:20, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.ICALP.2021.48, author = {Charalampopoulos, Panagiotis and Gawrychowski, Pawe{\l} and Mozes, Shay and Weimann, Oren}, title = {{An Almost Optimal Edit Distance Oracle}}, booktitle = {48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)}, pages = {48:1--48:20}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-195-5}, ISSN = {1868-8969}, year = {2021}, volume = {198}, editor = {Bansal, Nikhil and Merelli, Emanuela and Worrell, James}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2021.48}, URN = {urn:nbn:de:0030-drops-141175}, doi = {10.4230/LIPIcs.ICALP.2021.48}, annote = {Keywords: longest common subsequence, edit distance, planar graphs, Voronoi diagrams} }

Document

Complete Volume

**Published in:** LIPIcs, Volume 191, 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)

LIPIcs, Volume 191, CPM 2021, Complete Volume

Paweł Gawrychowski and Tatiana Starikovskaya. LIPIcs, Volume 191, CPM 2021, Complete Volume. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 1-404, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@Proceedings{gawrychowski_et_al:LIPIcs.CPM.2021, title = {{LIPIcs, Volume 191, CPM 2021, Complete Volume}}, booktitle = {32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)}, pages = {1--404}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-186-3}, ISSN = {1868-8969}, year = {2021}, volume = {191}, editor = {Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2021}, URN = {urn:nbn:de:0030-drops-139509}, doi = {10.4230/LIPIcs.CPM.2021}, annote = {Keywords: LIPIcs, Volume 191, CPM 2021, Complete Volume} }

Document

Front Matter

**Published in:** LIPIcs, Volume 191, 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)

Front Matter, Table of Contents, Preface, Conference Organization

Paweł Gawrychowski and Tatiana Starikovskaya. Front Matter, Table of Contents, Preface, Conference Organization. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 0:i-0:xiv, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.CPM.2021.0, author = {Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana}, title = {{Front Matter, Table of Contents, Preface, Conference Organization}}, booktitle = {32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)}, pages = {0:i--0:xiv}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-186-3}, ISSN = {1868-8969}, year = {2021}, volume = {191}, editor = {Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2021.0}, URN = {urn:nbn:de:0030-drops-139512}, doi = {10.4230/LIPIcs.CPM.2021.0}, annote = {Keywords: Front Matter, Table of Contents, Preface, Conference Organization} }

Document

**Published in:** LIPIcs, Volume 187, 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021)

Simon’s congruence ∼_k is a relation on words defined by Imre Simon in the 1970s and intensely studied since then. This congruence was initially used in connection to piecewise testable languages, but also found many applications in, e.g., learning theory, databases theory, or linguistics. The ∼_k-relation is defined as follows: two words are ∼_k-congruent if they have the same set of subsequences of length at most k. A long standing open problem, stated already by Simon in his initial works on this topic, was to design an algorithm which computes, given two words s and t, the largest k for which s∼_k t. We propose the first algorithm solving this problem in linear time O(|s|+|t|) when the input words are over the integer alphabet {1,…,|s|+|t|} (or other alphabets which can be sorted in linear time). Our approach can be extended to an optimal algorithm in the case of general alphabets as well.
To achieve these results, we introduce a novel data-structure, called Simon-Tree, which allows us to construct a natural representation of the equivalence classes induced by ∼_k on the set of suffixes of a word, for all k ≥ 1. We show that such a tree can be constructed for an input word in linear time. Then, when working with two words s and t, we compute their respective Simon-Trees and efficiently build a correspondence between the nodes of these trees. This correspondence, which can also be constructed in linear time O(|s|+|t|), allows us to retrieve the largest k for which s∼_k t.

Paweł Gawrychowski, Maria Kosche, Tore Koß, Florin Manea, and Stefan Siemer. Efficiently Testing Simon’s Congruence. In 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 187, pp. 34:1-34:18, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.STACS.2021.34, author = {Gawrychowski, Pawe{\l} and Kosche, Maria and Ko{\ss}, Tore and Manea, Florin and Siemer, Stefan}, title = {{Efficiently Testing Simon’s Congruence}}, booktitle = {38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021)}, pages = {34:1--34:18}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-180-1}, ISSN = {1868-8969}, year = {2021}, volume = {187}, editor = {Bl\"{a}ser, Markus and Monmege, Benjamin}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2021.34}, URN = {urn:nbn:de:0030-drops-136796}, doi = {10.4230/LIPIcs.STACS.2021.34}, annote = {Keywords: Simon’s congruence, Subsequence, Scattered factor, Efficient algorithms} }

Document

**Published in:** LIPIcs, Volume 181, 31st International Symposium on Algorithms and Computation (ISAAC 2020)

The Longest Common Increasing Subsequence (LCIS) is a variant of the classical Longest Common Subsequence (LCS), in which we additionally require the common subsequence to be strictly increasing. While the well-known "Four Russians" technique can be used to find LCS in subquadratic time, it does not seem directly applicable to LCIS. Recently, Duraj [STACS 2020] used a completely different method based on the combinatorial properties of LCIS to design an 𝒪(n²(log log n)²/log^{1/6}n) time algorithm. We show that an approach based on exploiting tabulation (more involved than "Four Russians") can be used to construct an asymptotically faster 𝒪(n² log log n/√{log n}) time algorithm. As our solution avoids using the specific combinatorial properties of LCIS, it can be also adapted for the Longest Common Weakly Increasing Subsequence (LCWIS).

Anadi Agrawal and Paweł Gawrychowski. A Faster Subquadratic Algorithm for the Longest Common Increasing Subsequence Problem. In 31st International Symposium on Algorithms and Computation (ISAAC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 181, pp. 4:1-4:12, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{agrawal_et_al:LIPIcs.ISAAC.2020.4, author = {Agrawal, Anadi and Gawrychowski, Pawe{\l}}, title = {{A Faster Subquadratic Algorithm for the Longest Common Increasing Subsequence Problem}}, booktitle = {31st International Symposium on Algorithms and Computation (ISAAC 2020)}, pages = {4:1--4:12}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-173-3}, ISSN = {1868-8969}, year = {2020}, volume = {181}, editor = {Cao, Yixin and Cheng, Siu-Wing and Li, Minming}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2020.4}, URN = {urn:nbn:de:0030-drops-133487}, doi = {10.4230/LIPIcs.ISAAC.2020.4}, annote = {Keywords: Longest Common Increasing Subsequence, Four Russians} }

Document

**Published in:** LIPIcs, Volume 181, 31st International Symposium on Algorithms and Computation (ISAAC 2020)

Permutation σ appears in permutation π if there exists a subsequence of π that is order-isomorphic to σ. The natural algorithmic question is to check if σ appears in π, and if so count the number of occurrences. Only since very recently we know that for any fixed length k, we can check if a given pattern of length k appears in a permutation of length n in time linear in n, but being able to count all such occurrences in f(k)⋅ n^o(k/log k) time would refute the exponential time hypothesis (ETH). Together with practical applications in statistics, this motivates a systematic study of the complexity of counting occurrences for different patterns of fixed small length k. We investigate this question for k = 4. Very recently, Even-Zohar and Leng [arXiv 2019] identified two types of 4-patterns. For the first type they designed an 𝒪̃(n) time algorithm, while for the second they were able to provide an 𝒪̃(n^1.5) time algorithm. This brings up the question whether the permutations of the second type are inherently harder than the first type.
We establish a connection between counting 4-patterns of the second type and counting 4-cycles (not necessarily induced) in a sparse undirected graph. By designing two-way reductions we show that the complexities of both problems are the same, up to polylogarithmic factors. This allows us to leverage the work done on the latter to provide a reasonable argument for why there is a difference in the complexities for counting 4-patterns of the first and the second type. In particular, even for the seemingly simpler problem of detecting a 4-cycle in a graph on m edges, the best known algorithm works in 𝒪(m^{4/3}) time. Our reductions imply that an 𝒪(n^{4/3-ε}) time algorithm for counting occurrences of any 4-pattern of the second type in a permutation of length n would imply an exciting breakthrough for counting (and hence also detecting) 4-cycles. In the other direction, by plugging in the fastest known algorithm for counting 4-cycles, we obtain an algorithm for counting occurrences of any 4-pattern of the second type in 𝒪(n^1.48) time.

Bartłomiej Dudek and Paweł Gawrychowski. Counting 4-Patterns in Permutations Is Equivalent to Counting 4-Cycles in Graphs. In 31st International Symposium on Algorithms and Computation (ISAAC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 181, pp. 23:1-23:18, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{dudek_et_al:LIPIcs.ISAAC.2020.23, author = {Dudek, Bart{\l}omiej and Gawrychowski, Pawe{\l}}, title = {{Counting 4-Patterns in Permutations Is Equivalent to Counting 4-Cycles in Graphs}}, booktitle = {31st International Symposium on Algorithms and Computation (ISAAC 2020)}, pages = {23:1--23:18}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-173-3}, ISSN = {1868-8969}, year = {2020}, volume = {181}, editor = {Cao, Yixin and Cheng, Siu-Wing and Li, Minming}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2020.23}, URN = {urn:nbn:de:0030-drops-133678}, doi = {10.4230/LIPIcs.ISAAC.2020.23}, annote = {Keywords: Permutations, pattern avoidance, counting cycles} }

Document

**Published in:** LIPIcs, Volume 181, 31st International Symposium on Algorithms and Computation (ISAAC 2020)

We consider labeling nodes of a directed graph for reachability queries. A reachability labeling scheme for such a graph assigns a binary string, called a label, to each node. Then, given the labels of nodes u and v and no other information about the underlying graph, it should be possible to determine whether there exists a directed path from u to v. By a simple information theoretical argument and invoking the bound on the number of partial orders, in any scheme some labels need to consist of at least n/4 bits, where n is the number of nodes. On the other hand, it is not hard to design a scheme with labels consisting of n/2+𝒪(log n) bits. In the classical centralised setting, where a single data structure is stored as a whole, Munro and Nicholson designed a structure for reachability queries consisting of n²/4+o(n²) bits (which is optimal, up to the lower order term). We extend their approach to obtain a scheme with labels consisting of n/3+o(n) bits.

Maciej Dulęba, Paweł Gawrychowski, and Wojciech Janczewski. Efficient Labeling for Reachability in Directed Acyclic Graphs. In 31st International Symposium on Algorithms and Computation (ISAAC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 181, pp. 27:1-27:14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{duleba_et_al:LIPIcs.ISAAC.2020.27, author = {Dul\k{e}ba, Maciej and Gawrychowski, Pawe{\l} and Janczewski, Wojciech}, title = {{Efficient Labeling for Reachability in Directed Acyclic Graphs}}, booktitle = {31st International Symposium on Algorithms and Computation (ISAAC 2020)}, pages = {27:1--27:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-173-3}, ISSN = {1868-8969}, year = {2020}, volume = {181}, editor = {Cao, Yixin and Cheng, Siu-Wing and Li, Minming}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2020.27}, URN = {urn:nbn:de:0030-drops-133710}, doi = {10.4230/LIPIcs.ISAAC.2020.27}, annote = {Keywords: informative labeling scheme, reachability, DAG} }

Document

**Published in:** LIPIcs, Volume 170, 45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020)

We study the computational complexity of solving mean payoff games. This class of games can be seen as an extension of parity games, and they have similar complexity status: in both cases solving them is in NP ∩ coNP and not known to be in P. In a breakthrough result Calude, Jain, Khoussainov, Li, and Stephan constructed in 2017 a quasipolynomial time algorithm for solving parity games, which was quickly followed by a few other algorithms with the same complexity. Our objective is to investigate how these techniques can be extended to mean payoff games.
The starting point is the combinatorial notion of universal trees: all quasipolynomial time algorithms for parity games have been shown to exploit universal trees. Universal graphs extend universal trees to arbitrary (positionally determined) objectives. We show that they yield a family of value iteration algorithms for solving mean payoff games which includes the value iteration algorithm due to Brim, Chaloupka, Doyen, Gentilini, and Raskin.
The contribution of this paper is to prove tight bounds on the complexity of algorithms for mean payoff games using universal graphs. We consider two parameters: the largest weight N in absolute value and the number k of weights. The dependence in N in the existing value iteration algorithm is linear, we show that this can be improved to N^{1 - 1/n} and obtain a matching lower bound. However, we show that we cannot break the linear dependence in the exponent in the number k of weights implying that universal graphs do not yield a quasipolynomial time algorithm for solving mean payoff games.

Nathanaël Fijalkow, Paweł Gawrychowski, and Pierre Ohlmann. Value Iteration Using Universal Graphs and the Complexity of Mean Payoff Games. In 45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 170, pp. 34:1-34:15, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{fijalkow_et_al:LIPIcs.MFCS.2020.34, author = {Fijalkow, Nathana\"{e}l and Gawrychowski, Pawe{\l} and Ohlmann, Pierre}, title = {{Value Iteration Using Universal Graphs and the Complexity of Mean Payoff Games}}, booktitle = {45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020)}, pages = {34:1--34:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-159-7}, ISSN = {1868-8969}, year = {2020}, volume = {170}, editor = {Esparza, Javier and Kr\'{a}l', Daniel}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2020.34}, URN = {urn:nbn:de:0030-drops-127011}, doi = {10.4230/LIPIcs.MFCS.2020.34}, annote = {Keywords: Mean payoff games, Universal graphs, Value iteration} }

Document

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 168, 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)

The longest common substring problem consists in finding a longest string that appears as a (contiguous) substring of two input strings. We consider the dynamic variant of this problem, in which we are to maintain two dynamic strings S and T, each of length at most n, that undergo substitutions of letters, in order to be able to return a longest common substring after each substitution. Recently, Amir et al. [ESA 2019] presented a solution for this problem that needs only 𝒪̃(n^(2/3)) time per update. This brought the challenge of determining whether there exists a faster solution with polylogarithmic update time, or (as is the case for other dynamic problems), we should expect a polynomial (conditional) lower bound. We answer this question by designing a significantly faster algorithm that processes each substitution in amortized log^𝒪(1) n time with high probability. Our solution relies on exploiting the local consistency of the parsing of a collection of dynamic strings due to Gawrychowski et al. [SODA 2018], and on maintaining two dynamic trees with labeled bicolored leaves, so that after each update we can report a pair of nodes, one from each tree, of maximum combined weight, which have at least one common leaf-descendant of each color. We complement this with a lower bound of Ω(log n/ log log n) for the update time of any polynomial-size data structure that maintains the LCS of two dynamic strings, even allowing amortization and randomization.

Panagiotis Charalampopoulos, Paweł Gawrychowski, and Karol Pokorski. Dynamic Longest Common Substring in Polylogarithmic Time. In 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 168, pp. 27:1-27:19, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.ICALP.2020.27, author = {Charalampopoulos, Panagiotis and Gawrychowski, Pawe{\l} and Pokorski, Karol}, title = {{Dynamic Longest Common Substring in Polylogarithmic Time}}, booktitle = {47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)}, pages = {27:1--27:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-138-2}, ISSN = {1868-8969}, year = {2020}, volume = {168}, editor = {Czumaj, Artur and Dawar, Anuj and Merelli, Emanuela}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2020.27}, URN = {urn:nbn:de:0030-drops-124340}, doi = {10.4230/LIPIcs.ICALP.2020.27}, annote = {Keywords: string algorithms, dynamic algorithms, longest common substring} }

Document

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 168, 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)

We give a randomized algorithm that finds a minimum cut in an undirected weighted m-edge n-vertex graph G with high probability in O(m log² n) time. This is the first improvement to Karger’s celebrated O(m log³ n) time algorithm from 1996. Our main technical contribution is a deterministic O(m log n) time algorithm that, given a spanning tree T of G, finds a minimum cut of G that 2-respects (cuts two edges of) T.

Paweł Gawrychowski, Shay Mozes, and Oren Weimann. Minimum Cut in O(m log² n) Time. In 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 168, pp. 57:1-57:15, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.ICALP.2020.57, author = {Gawrychowski, Pawe{\l} and Mozes, Shay and Weimann, Oren}, title = {{Minimum Cut in O(m log² n) Time}}, booktitle = {47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)}, pages = {57:1--57:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-138-2}, ISSN = {1868-8969}, year = {2020}, volume = {168}, editor = {Czumaj, Artur and Dawar, Anuj and Merelli, Emanuela}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2020.57}, URN = {urn:nbn:de:0030-drops-124646}, doi = {10.4230/LIPIcs.ICALP.2020.57}, annote = {Keywords: Minimum cut, Minimum 2-respecting cut} }

Document

**Published in:** LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

The last decade brought a significant increase in the amount of data and a variety of new inference methods for reconstructing the detailed evolutionary history of various cancers. This brings the need of designing efficient procedures for comparing rooted trees representing the evolution of mutations in tumor phylogenies. Bernardini et al. [CPM 2019] recently introduced a notion of the rearrangement distance for fully-labelled trees motivated by this necessity. This notion originates from two operations: one that permutes the labels of the nodes, the other that affects the topology of the tree. Each operation alone defines a distance that can be computed in polynomial time, while the actual rearrangement distance, that combines the two, was proven to be NP-hard.
We answer two open question left unanswered by the previous work. First, what is the complexity of computing the permutation distance? Second, is there a constant-factor approximation algorithm for estimating the rearrangement distance between two arbitrary trees? We answer the first one by showing, via a two-way reduction, that calculating the permutation distance between two trees on n nodes is equivalent, up to polylogarithmic factors, to finding the largest cardinality matching in a sparse bipartite graph. In particular, by plugging in the algorithm of Liu and Sidford [ArXiv 2020], we obtain an 𝒪̃(n^{4/3+o(1}) time algorithm for computing the permutation distance between two trees on n nodes. Then we answer the second question positively, and design a linear-time constant-factor approximation algorithm that does not need any assumption on the trees.

Giulia Bernardini, Paola Bonizzoni, and Paweł Gawrychowski. On Two Measures of Distance Between Fully-Labelled Trees. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 6:1-6:16, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{bernardini_et_al:LIPIcs.CPM.2020.6, author = {Bernardini, Giulia and Bonizzoni, Paola and Gawrychowski, Pawe{\l}}, title = {{On Two Measures of Distance Between Fully-Labelled Trees}}, booktitle = {31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)}, pages = {6:1--6:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-149-8}, ISSN = {1868-8969}, year = {2020}, volume = {161}, editor = {G{\o}rtz, Inge Li and Weimann, Oren}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.6}, URN = {urn:nbn:de:0030-drops-121318}, doi = {10.4230/LIPIcs.CPM.2020.6}, annote = {Keywords: Tree distance, Cancer progression, Approximation algorithms, Fine-grained complexity} }

Document

**Published in:** LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

Given two indeterminate equal-length strings p and t with a set of characters per position in both strings, we obtain a determinate string p_w from p and a determinate string t_w from t by choosing one character per position. Then, we say that p and t match when p_w and t_w match for some choice of the characters. While the most standard notion of a match for determinate strings is that they are simply identical, in certain applications it is more appropriate to use other definitions, with the prime examples being parameterized matching, order-preserving matching, and the recently introduced Cartesian tree matching. We provide a systematic study of the complexity of string matching for indeterminate equal-length strings, for different notions of matching. We use n to denote the length of both strings, and r to be an upper-bound on the number of uncertain characters per position. First, we provide the first polynomial time algorithm for the Cartesian tree version that runs in deterministic 𝒪(nlog² n) and expected 𝒪(nlog nlog log n) time using 𝒪(nlog n) space, for constant r. Second, we establish NP-hardness of the order-preserving version for r=2, thus solving a question explicitly stated by Henriques et al. [CPM 2018], who showed hardness for r=3. Third, we establish NP-hardness of the parameterized version for r=2. As both parameterized and order-preserving indeterminate matching reduce to the standard determinate matching for r=1, this provides a complete classification for these three variants.

Paweł Gawrychowski, Samah Ghazawi, and Gad M. Landau. On Indeterminate Strings Matching. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 14:1-14:14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.CPM.2020.14, author = {Gawrychowski, Pawe{\l} and Ghazawi, Samah and Landau, Gad M.}, title = {{On Indeterminate Strings Matching}}, booktitle = {31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)}, pages = {14:1--14:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-149-8}, ISSN = {1868-8969}, year = {2020}, volume = {161}, editor = {G{\o}rtz, Inge Li and Weimann, Oren}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.14}, URN = {urn:nbn:de:0030-drops-121393}, doi = {10.4230/LIPIcs.CPM.2020.14}, annote = {Keywords: string matching, indeterminate strings, Cartesian trees, order-preserving matching, parameterized matching} }

Document

**Published in:** LIPIcs, Volume 154, 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)

We study the following natural variation on the classical universality problem: given a language L(M) represented by M (e.g., a DFA/RE/NFA/PDA), does there exist an integer ? ≥ 0 such that Σ^? ⊆ L(M)? In the case of an NFA, we show that this problem is NEXPTIME-complete, and the smallest such ? can be doubly exponential in the number of states. This particular case was formulated as an open problem in 2009, and our solution uses a novel and involved construction. In the case of a PDA, we show that it is recursively unsolvable, while the smallest such ? is not bounded by any computable function of the number of states. In the case of a DFA, we show that the problem is NP-complete, and e^{√{n log n} (1+o(1))} is an asymptotically tight upper bound for the smallest such ?, where n is the number of states. Finally, we prove that in all these cases, the problem becomes computationally easier when the length ? is also given in binary in the input: it is polynomially solvable for a DFA, PSPACE-complete for an NFA, and co-NEXPTIME-complete for a PDA.

Paweł Gawrychowski, Martin Lange, Narad Rampersad, Jeffrey Shallit, and Marek Szykuła. Existential Length Universality. In 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 154, pp. 16:1-16:14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.STACS.2020.16, author = {Gawrychowski, Pawe{\l} and Lange, Martin and Rampersad, Narad and Shallit, Jeffrey and Szyku{\l}a, Marek}, title = {{Existential Length Universality}}, booktitle = {37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)}, pages = {16:1--16:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-140-5}, ISSN = {1868-8969}, year = {2020}, volume = {154}, editor = {Paul, Christophe and Bl\"{a}ser, Markus}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2020.16}, URN = {urn:nbn:de:0030-drops-118770}, doi = {10.4230/LIPIcs.STACS.2020.16}, annote = {Keywords: decision problem, deterministic automaton, nondeterministic automaton, pushdown automaton, regular expression, regular language, universality} }

Document

**Published in:** LIPIcs, Volume 154, 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)

In the problem of Generalised Pattern Matching (GPM) [STOC'94, Muthukrishnan and Palem], we are given a text T of length n over an alphabet Σ_T, a pattern P of length m over an alphabet Σ_P, and a matching relationship ⊆ Σ_T × Σ_P, and must return all substrings of T that match P (reporting) or the number of mismatches between each substring of T of length m and P (counting). In this work, we improve over all previously known algorithms for this problem:
- For ? being the maximum number of characters that match a fixed character, we show two new Monte Carlo algorithms, a reporting algorithm with time ?(? n log n log m) and a (1-ε)-approximation counting algorithm with time ?(ε^-1 ? n log n log m). We then derive a (1-ε)-approximation deterministic counting algorithm for GPM with ?(ε^-2 ? n log⁶ n) time.
- For ? being the number of pairs of matching characters, we demonstrate Monte Carlo algorithms for reporting and (1-ε)-approximate counting with running time ?(√? n log m √{log n}) and ?(√{ε^-1 ?} n log m √{log n}), respectively, as well as a (1-ε)-approximation deterministic algorithm for the counting variant of GPM with ?(ε^-1 √{?} n log^{7/2} n) time.
- Finally, for ℐ being the total number of disjoint intervals of characters that match the m characters of the pattern P, we show that both the reporting and the counting variants of GPM can be solved exactly and deterministically in ?(n√{ℐ log m} +n log n) time.
At the heart of our new deterministic upper bounds for ? and ? lies a faster construction of superimposed codes, which solves an open problem posed in [FOCS'97, Indyk] and can be of independent interest.
To conclude, we demonstrate first lower bounds for GPM. We start by showing that any deterministic or Monte Carlo algorithm for GPM must use Ω(?) time, and then proceed to show higher lower bounds for combinatorial algorithms. These bounds show that our algorithms are almost optimal, unless a radically new approach is developed.

Bartłomiej Dudek, Paweł Gawrychowski, and Tatiana Starikovskaya. Generalised Pattern Matching Revisited. In 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 154, pp. 18:1-18:18, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{dudek_et_al:LIPIcs.STACS.2020.18, author = {Dudek, Bart{\l}omiej and Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana}, title = {{Generalised Pattern Matching Revisited}}, booktitle = {37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)}, pages = {18:1--18:18}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-140-5}, ISSN = {1868-8969}, year = {2020}, volume = {154}, editor = {Paul, Christophe and Bl\"{a}ser, Markus}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2020.18}, URN = {urn:nbn:de:0030-drops-118798}, doi = {10.4230/LIPIcs.STACS.2020.18}, annote = {Keywords: pattern matching, superimposed codes, conditional lower bounds} }

Document

**Published in:** LIPIcs, Volume 149, 30th International Symposium on Algorithms and Computation (ISAAC 2019)

We present a compressed representation of tries based on top tree compression [ICALP 2013] that works on a standard, comparison-based, pointer machine model of computation and supports efficient prefix search queries. Namely, we show how to preprocess a set of strings of total length n over an alphabet of size sigma into a compressed data structure of worst-case optimal size O(n/log_sigma n) that given a pattern string P of length m determines if P is a prefix of one of the strings in time O(min(m log sigma,m + log n)). We show that this query time is in fact optimal regardless of the size of the data structure.
Existing solutions either use Omega(n) space or rely on word RAM techniques, such as tabulation, hashing, address arithmetic, or word-level parallelism, and hence do not work on a pointer machine. Our result is the first solution on a pointer machine that achieves worst-case o(n) space. Along the way, we develop several interesting data structures that work on a pointer machine and are of independent interest. These include an optimal data structures for random access to a grammar-compressed string and an optimal data structure for a variant of the level ancestor problem.

Philip Bille, Paweł Gawrychowski, Inge Li Gørtz, Gad M. Landau, and Oren Weimann. Top Tree Compression of Tries. In 30th International Symposium on Algorithms and Computation (ISAAC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 149, pp. 4:1-4:18, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.ISAAC.2019.4, author = {Bille, Philip and Gawrychowski, Pawe{\l} and G{\o}rtz, Inge Li and Landau, Gad M. and Weimann, Oren}, title = {{Top Tree Compression of Tries}}, booktitle = {30th International Symposium on Algorithms and Computation (ISAAC 2019)}, pages = {4:1--4:18}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-130-6}, ISSN = {1868-8969}, year = {2019}, volume = {149}, editor = {Lu, Pinyan and Zhang, Guochuan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2019.4}, URN = {urn:nbn:de:0030-drops-115000}, doi = {10.4230/LIPIcs.ISAAC.2019.4}, annote = {Keywords: pattern matching, tree compression, top trees, pointer machine} }

Document

**Published in:** LIPIcs, Volume 138, 44th International Symposium on Mathematical Foundations of Computer Science (MFCS 2019)

We show that the edit distance between two run-length encoded strings of compressed lengths m and n respectively, can be computed in O(mn log(mn)) time. This improves the previous record by a factor of O(n/log(mn)). The running time of our algorithm is within subpolynomial factors of being optimal, subject to the standard SETH-hardness assumption. This effectively closes a line of algorithmic research first started in 1993.

Raphaël Clifford, Paweł Gawrychowski, Tomasz Kociumaka, Daniel P. Martin, and Przemysław Uznański. RLE Edit Distance in Near Optimal Time. In 44th International Symposium on Mathematical Foundations of Computer Science (MFCS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 138, pp. 66:1-66:13, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{clifford_et_al:LIPIcs.MFCS.2019.66, author = {Clifford, Rapha\"{e}l and Gawrychowski, Pawe{\l} and Kociumaka, Tomasz and Martin, Daniel P. and Uzna\'{n}ski, Przemys{\l}aw}, title = {{RLE Edit Distance in Near Optimal Time}}, booktitle = {44th International Symposium on Mathematical Foundations of Computer Science (MFCS 2019)}, pages = {66:1--66:13}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-117-7}, ISSN = {1868-8969}, year = {2019}, volume = {138}, editor = {Rossmanith, Peter and Heggernes, Pinar and Katoen, Joost-Pieter}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2019.66}, URN = {urn:nbn:de:0030-drops-110109}, doi = {10.4230/LIPIcs.MFCS.2019.66}, annote = {Keywords: String algorithms, Compression, Pattern matching, Run-length encoding} }

Document

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 132, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)

An elastic-degenerate (ED) string is a sequence of n sets of strings of total length N, which was recently proposed to model a set of similar sequences. The ED string matching (EDSM) problem is to find all occurrences of a pattern of length m in an ED text. The EDSM problem has recently received some attention in the combinatorial pattern matching community, and an O(nm^{1.5}sqrt{log m} + N)-time algorithm is known [Aoyama et al., CPM 2018]. The standard assumption in the prior work on this question is that N is substantially larger than both n and m, and thus we would like to have a linear dependency on the former. Under this assumption, the natural open problem is whether we can decrease the 1.5 exponent in the time complexity, similarly as in the related (but, to the best of our knowledge, not equivalent) word break problem [Backurs and Indyk, FOCS 2016].
Our starting point is a conditional lower bound for the EDSM problem. We use the popular combinatorial Boolean matrix multiplication (BMM) conjecture stating that there is no truly subcubic combinatorial algorithm for BMM [Abboud and Williams, FOCS 2014]. By designing an appropriate reduction we show that a combinatorial algorithm solving the EDSM problem in O(nm^{1.5-epsilon} + N) time, for any epsilon>0, refutes this conjecture. Of course, the notion of combinatorial algorithms is not clearly defined, so our reduction should be understood as an indication that decreasing the exponent requires fast matrix multiplication.
Two standard tools used in algorithms on strings are string periodicity and fast Fourier transform. Our main technical contribution is that we successfully combine these tools with fast matrix multiplication to design a non-combinatorial O(nm^{1.381} + N)-time algorithm for EDSM. To the best of our knowledge, we are the first to do so.

Giulia Bernardini, Paweł Gawrychowski, Nadia Pisanti, Solon P. Pissis, and Giovanna Rosone. Even Faster Elastic-Degenerate String Matching via Fast Matrix Multiplication. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 132, pp. 21:1-21:15, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{bernardini_et_al:LIPIcs.ICALP.2019.21, author = {Bernardini, Giulia and Gawrychowski, Pawe{\l} and Pisanti, Nadia and Pissis, Solon P. and Rosone, Giovanna}, title = {{Even Faster Elastic-Degenerate String Matching via Fast Matrix Multiplication}}, booktitle = {46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)}, pages = {21:1--21:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-109-2}, ISSN = {1868-8969}, year = {2019}, volume = {132}, editor = {Baier, Christel and Chatzigiannakis, Ioannis and Flocchini, Paola and Leonardi, Stefano}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2019.21}, URN = {urn:nbn:de:0030-drops-105973}, doi = {10.4230/LIPIcs.ICALP.2019.21}, annote = {Keywords: string algorithms, pattern matching, elastic-degenerate string, matrix multiplication, fast Fourier transform} }

Document

Invited Talk

**Published in:** LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)

Periodicity is a fundamental combinatorial property of strings. We say that p is a period of a string s[1..n] when s[i]=s[i+p] for every i such that both s[i] and s[i+p] are defined. While this notion is interesting on its own, it can be often used as a tool for designing efficient algorithms. At a high level, such algorithms often operate differently depending on whether a given string does or does not have a small period, where small usually means smaller than half of its length (or, say, quarter). In other words, we design an algorithm that is efficient if the given string is repetitive, and another algorithm that is efficient if the given string is non-repetitive, in every case carefully exploiting either the periodicity or the fact that input looks sufficiently “random”, and then choose the appropriate algorithm depending on the input. Of course, in some cases, one needs to proceed in a more complex manner, for example by classifying the whole string look at its substrings and process each of them differently depending on its structure.
I will survey results, mostly connected to different version of pattern matching, that are based on this paradigm. This will include the recent generalization of periodicity that can be applied in approximate pattern matching, and some examples of how the notion of periodicity can be applied to design a better data structure.

Paweł Gawrychowski. How to Exploit Periodicity (Invited Talk). In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, p. 1:1, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{gawrychowski:LIPIcs.CPM.2019.1, author = {Gawrychowski, Pawe{\l}}, title = {{How to Exploit Periodicity}}, booktitle = {30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)}, pages = {1:1--1:1}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-103-0}, ISSN = {1868-8969}, year = {2019}, volume = {128}, editor = {Pisanti, Nadia and P. Pissis, Solon}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.1}, URN = {urn:nbn:de:0030-drops-104727}, doi = {10.4230/LIPIcs.CPM.2019.1}, annote = {Keywords: periodicity, pattern matching, Hamming distance} }

Document

**Published in:** LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)

In the k-mismatch problem we are given a pattern of length m and a text and must find all locations where the Hamming distance between the pattern and the text is at most k. A series of recent breakthroughs have resulted in an ultra-efficient streaming algorithm for this problem that requires only O(k log m/k) space [Clifford, Kociumaka, Porat, SODA 2019]. In this work, we consider a strictly harder problem called dictionary matching with k mismatches, where we are given a dictionary of d patterns of lengths at most m and must find all their k-mismatch occurrences in the text, and show the first streaming algorithm for it. The algorithm uses O(k d log^k d polylog m) space and processes each position of the text in O(k log^k d polylog m + occ) time, where occ is the number of k-mismatch occurrences of the patterns that end at this position. The algorithm is randomised and outputs correct answers with high probability.

Paweł Gawrychowski and Tatiana Starikovskaya. Streaming Dictionary Matching with Mismatches. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 21:1-21:15, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.CPM.2019.21, author = {Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana}, title = {{Streaming Dictionary Matching with Mismatches}}, booktitle = {30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)}, pages = {21:1--21:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-103-0}, ISSN = {1868-8969}, year = {2019}, volume = {128}, editor = {Pisanti, Nadia and P. Pissis, Solon}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.21}, URN = {urn:nbn:de:0030-drops-104925}, doi = {10.4230/LIPIcs.CPM.2019.21}, annote = {Keywords: Streaming, multiple pattern matching, Hamming distance} }

Document

**Published in:** LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)

In this work, we show two streaming algorithms for computing the length of the shortest cover of a string of length n. We start by showing a two-pass algorithm that uses O(log^2 n) space and then show a one-pass streaming algorithm that uses O(sqrt{n log n}) space. Both algorithms run in near-linear time. The algorithms are randomized and compute the answer incorrectly with probability inverse-polynomial in n. We also show that there is no sublinear-space streaming algorithm for computing the length of the shortest seed of a string.

Paweł Gawrychowski, Jakub Radoszewski, and Tatiana Starikovskaya. Quasi-Periodicity in Streams. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 22:1-22:14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.CPM.2019.22, author = {Gawrychowski, Pawe{\l} and Radoszewski, Jakub and Starikovskaya, Tatiana}, title = {{Quasi-Periodicity in Streams}}, booktitle = {30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)}, pages = {22:1--22:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-103-0}, ISSN = {1868-8969}, year = {2019}, volume = {128}, editor = {Pisanti, Nadia and P. Pissis, Solon}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.22}, URN = {urn:nbn:de:0030-drops-104930}, doi = {10.4230/LIPIcs.CPM.2019.22}, annote = {Keywords: Streaming algorithms, quasi-periodicity, covers, seeds} }

Document

**Published in:** LIPIcs, Volume 126, 36th International Symposium on Theoretical Aspects of Computer Science (STACS 2019)

For k >= 3, a k-rollercoaster is a sequence of numbers whose every maximal contiguous subsequence, that is increasing or decreasing, has length at least k; 3-rollercoasters are called simply rollercoasters. Given a sequence of distinct real numbers, we are interested in computing its maximum-length (not necessarily contiguous) subsequence that is a k-rollercoaster. Biedl et al. (2018) have shown that each sequence of n distinct real numbers contains a rollercoaster of length at least ceil[n/2] for n>7, and that a longest rollercoaster contained in such a sequence can be computed in O(n log n)-time (or faster, in O(n log log n) time, when the input sequence is a permutation of {1,...,n}). They have also shown that every sequence of n >=slant (k-1)^2+1 distinct real numbers contains a k-rollercoaster of length at least n/(2(k-1)) - 3k/2, and gave an O(nk log n)-time (respectively, O(n k log log n)-time) algorithm computing a longest k-rollercoaster in a sequence of length n (respectively, a permutation of {1,...,n}).
In this paper, we give an O(nk^2)-time algorithm computing the length of a longest k-rollercoaster contained in a sequence of n distinct real numbers; hence, for constant k, our algorithm computes the length of a longest k-rollercoaster in optimal linear time. The algorithm can be easily adapted to output the respective k-rollercoaster. In particular, this improves the results of Biedl et al. (2018), by showing that a longest rollercoaster can be computed in optimal linear time. We also present an algorithm computing the length of a longest k-rollercoaster in O(n log^2 n)-time, that is, subquadratic even for large values of k <= n. Again, the rollercoaster can be easily retrieved. Finally, we show an Omega(n log k) lower bound for the number of comparisons in any comparison-based algorithm computing the length of a longest k-rollercoaster.

Paweł Gawrychowski, Florin Manea, and Radosław Serafin. Fast and Longest Rollercoasters. In 36th International Symposium on Theoretical Aspects of Computer Science (STACS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 126, pp. 30:1-30:17, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.STACS.2019.30, author = {Gawrychowski, Pawe{\l} and Manea, Florin and Serafin, Rados{\l}aw}, title = {{Fast and Longest Rollercoasters}}, booktitle = {36th International Symposium on Theoretical Aspects of Computer Science (STACS 2019)}, pages = {30:1--30:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-100-9}, ISSN = {1868-8969}, year = {2019}, volume = {126}, editor = {Niedermeier, Rolf and Paul, Christophe}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2019.30}, URN = {urn:nbn:de:0030-drops-102694}, doi = {10.4230/LIPIcs.STACS.2019.30}, annote = {Keywords: sequences, alternating runs, patterns in permutations} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail