DROPS

Volume

LIPIcs, Volume 191

32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)

CPM 2021, July 5-7, 2021, Wrocław, Poland

Editors: Paweł Gawrychowski and Tatiana Starikovskaya

Document

DOI: 10.4230/LIPIcs.SEA.2024.9

Top- k Frequent Patterns in Streams and Parameterized-Space LZ Compression

Authors: Patrick Dinklage, Johnnes Fischer, and Nicola Prezza

Published in: LIPIcs, Volume 301, 22nd International Symposium on Experimental Algorithms (SEA 2024)

Abstract

We present novel online approximations of the Lempel-Ziv 77 (LZ77) and Lempel-Ziv 78 (LZ78) compression schemes [Lempel & Ziv, 1977/1978] with parameterizable space usage based on estimating which k patterns occur the most frequently in the streamed input for parameter k. This new approach overcomes the issue of finding only local repetitions, which is a natural limitation of algorithms that compress using a sliding window or by partitioning the input into blocks. For this, we introduce the top-k trie, a summary for maintaining online the top-k frequent consecutive patterns in a stream of characters based on a combination of the Lempel-Ziv 78 compression scheme and the Misra-Gries algorithm for frequent item estimation in streams. Using straightforward encoding, our implementations yield compression ratios (output over input size) competitive with established general-purpose LZ-based compression utilities such as gzip or xz.

Cite as

Patrick Dinklage, Johnnes Fischer, and Nicola Prezza. Top- k Frequent Patterns in Streams and Parameterized-Space LZ Compression. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 9:1-9:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{dinklage_et_al:LIPIcs.SEA.2024.9,
  author =	{Dinklage, Patrick and Fischer, Johnnes and Prezza, Nicola},
  title =	{{Top- k Frequent Patterns in Streams and Parameterized-Space LZ Compression}},
  booktitle =	{22nd International Symposium on Experimental Algorithms (SEA 2024)},
  pages =	{9:1--9:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-325-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{301},
  editor =	{Liberti, Leo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2024.9},
  URN =		{urn:nbn:de:0030-drops-203748},
  doi =		{10.4230/LIPIcs.SEA.2024.9},
  annote =	{Keywords: compression, streaming, heavy hitters, algorithm engineering}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.39

Optimal Bounds for Distinct Quartics

Authors: Panagiotis Charalampopoulos, Paweł Gawrychowski, and Samah Ghazawi

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

A fundamental concept related to strings is that of repetitions. It has been extensively studied in many versions, from both purely combinatorial and algorithmic angles. One of the most basic questions is how many distinct squares, i.e., distinct strings of the form UU, a string of length n can contain as fragments. It turns out that this is always 𝒪(n), and the bound cannot be improved to sublinear in n [Fraenkel and Simpson, JCTA 1998]. Several similar questions about repetitions in strings have been considered, and by now we seem to have a good understanding of their repetitive structure. For higher-dimensional strings, the basic concept of periodicity has been successfully extended and applied to design efficient algorithms - it is inherently more complex than for regular strings. Extending the notion of repetitions and understanding the repetitive structure of higher-dimensional strings is however far from complete. Quartics were introduced by Apostolico and Brimkov [TCS 2000] as analogues of squares in two dimensions. Charalampopoulos, Radoszewski, Rytter, Waleń, and Zuba [ESA 2020] proved that the number of distinct quartics in an n×n 2D string is 𝒪(n²log²n) and that they can be computed in 𝒪(n²log²n) time. Gawrychowski, Ghazawi, and Landau [SPIRE 2021] constructed an infinite family of n×n 2D strings with Ω(n²log n) distinct quartics. This brings the challenge of determining asymptotically tight bounds. Here, we settle both the combinatorial and the algorithmic aspects of this question: the number of distinct quartics in an n×n 2D string is 𝒪(n²log n) and they can be computed in the worst-case optimal 𝒪(n²log n) time. As expected, our solution heavily exploits the periodic structure implied by occurrences of quartics. However, the two-dimensional nature of the problem introduces some technical challenges. Somewhat surprisingly, we overcome the final challenge for the combinatorial bound using a result of Marcus and Tardos [JCTA 2004] for permutation avoidance on matrices.

Cite as

Panagiotis Charalampopoulos, Paweł Gawrychowski, and Samah Ghazawi. Optimal Bounds for Distinct Quartics. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 39:1-39:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.ICALP.2024.39,
  author =	{Charalampopoulos, Panagiotis and Gawrychowski, Pawe{\l} and Ghazawi, Samah},
  title =	{{Optimal Bounds for Distinct Quartics}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{39:1--39:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.39},
  URN =		{urn:nbn:de:0030-drops-201823},
  doi =		{10.4230/LIPIcs.ICALP.2024.39},
  annote =	{Keywords: 2D strings, quartics, repetitions, periodicity}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.27

The Discrepancy of Shortest Paths

Authors: Greg Bodwin, Chengyuan Deng, Jie Gao, Gary Hoppenworth, Jalaj Upadhyay, and Chen Wang

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

The hereditary discrepancy of a set system is a quantitative measure of the pseudorandom properties of the system. Roughly speaking, hereditary discrepancy measures how well one can 2-color the elements of the system so that each set contains approximately the same number of elements of each color. Hereditary discrepancy has numerous applications in computational geometry, communication complexity and derandomization. More recently, the hereditary discrepancy of the set system of shortest paths has found applications in differential privacy [Chen et al. SODA 23]. The contribution of this paper is to improve the upper and lower bounds on the hereditary discrepancy of set systems of unique shortest paths in graphs. In particular, we show that any system of unique shortest paths in an undirected weighted graph has hereditary discrepancy O(n^{1/4}), and we construct lower bound examples demonstrating that this bound is tight up to polylog n factors. Our lower bounds hold even for planar graphs and bipartite graphs, and improve a previous lower bound of Ω(n^{1/6}) obtained by applying the trace bound of Chazelle and Lvov [SoCG'00] to a classical point-line system of Erdős. As applications, we improve the lower bound on the additive error for differentially-private all pairs shortest distances from Ω(n^{1/6}) [Chen et al. SODA 23] to Ω̃(n^{1/4}), and we improve the lower bound on additive error for the differentially-private all sets range queries problem to Ω̃(n^{1/4}), which is tight up to polylog n factors [Deng et al. WADS 23].

Cite as

Greg Bodwin, Chengyuan Deng, Jie Gao, Gary Hoppenworth, Jalaj Upadhyay, and Chen Wang. The Discrepancy of Shortest Paths. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 27:1-27:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{bodwin_et_al:LIPIcs.ICALP.2024.27,
  author =	{Bodwin, Greg and Deng, Chengyuan and Gao, Jie and Hoppenworth, Gary and Upadhyay, Jalaj and Wang, Chen},
  title =	{{The Discrepancy of Shortest Paths}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{27:1--27:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.27},
  URN =		{urn:nbn:de:0030-drops-201705},
  doi =		{10.4230/LIPIcs.ICALP.2024.27},
  annote =	{Keywords: Discrepancy, hereditary discrepancy, shortest paths, differential privacy}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.30

Õptimal Dynamic Time Warping on Run-Length Encoded Strings

Authors: Itai Boneh, Shay Golan, Shay Mozes, and Oren Weimann

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

Dynamic Time Warping (DTW) distance is the optimal cost of matching two strings when extending runs of letters is for free. Therefore, it is natural to measure the time complexity of DTW in terms of the number of runs n (rather than the string lengths N). In this paper, we give an Õ(n²) time algorithm for computing the DTW distance. This matches (up to log factors) the known (conditional) lower bound, and should be compared with the previous fastest O(n³) time exact algorithm and the Õ(n²) time approximation algorithm. Our method also immediately implies an Õ(nk) time algorithm when the distance is bounded by k. This should be compared with the previous fastest O(n²k) and O(Nk) time exact algorithms and the Õ(nk) time approximation algorithm.

Cite as

Itai Boneh, Shay Golan, Shay Mozes, and Oren Weimann. Õptimal Dynamic Time Warping on Run-Length Encoded Strings. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 30:1-30:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{boneh_et_al:LIPIcs.ICALP.2024.30,
  author =	{Boneh, Itai and Golan, Shay and Mozes, Shay and Weimann, Oren},
  title =	{{\~{O}ptimal Dynamic Time Warping on Run-Length Encoded Strings}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{30:1--30:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.30},
  URN =		{urn:nbn:de:0030-drops-201730},
  doi =		{10.4230/LIPIcs.ICALP.2024.30},
  annote =	{Keywords: Dynamic time warping, Fr\'{e}chet distance, edit distance, run-length encoding}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.64

Minimizing Tardy Processing Time on a Single Machine in Near-Linear Time

Authors: Nick Fischer and Leo Wennmann

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

In this work we revisit the elementary scheduling problem 1||∑ p_j U_j. The goal is to select, among n jobs with processing times and due dates, a subset of jobs with maximum total processing time that can be scheduled in sequence without violating their due dates. This problem is NP-hard, but a classical algorithm by Lawler and Moore from the 60s solves this problem in pseudo-polynomial time O(nP), where P is the total processing time of all jobs. With the aim to develop best-possible pseudo-polynomial-time algorithms, a recent wave of results has improved Lawler and Moore’s algorithm for 1||∑ p_j U_j: First to time Õ(P^{7/4}) [Bringmann, Fischer, Hermelin, Shabtay, Wellnitz; ICALP'20], then to time Õ(P^{5/3}) [Klein, Polak, Rohwedder; SODA'23], and finally to time Õ(P^{7/5}) [Schieber, Sitaraman; WADS'23]. It remained an exciting open question whether these works can be improved further. In this work we develop an algorithm in near-linear time Õ(P) for the 1||∑ p_j U_j problem. This running time not only significantly improves upon the previous results, but also matches conditional lower bounds based on the Strong Exponential Time Hypothesis or the Set Cover Hypothesis and is therefore likely optimal (up to subpolynomial factors). Our new algorithm also extends to the case of m machines in time Õ(P^m). In contrast to the previous improvements, we take a different, more direct approach inspired by the recent reductions from Modular Subset Sum to dynamic string problems. We thereby arrive at a satisfyingly simple algorithm.

Cite as

Nick Fischer and Leo Wennmann. Minimizing Tardy Processing Time on a Single Machine in Near-Linear Time. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 64:1-64:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{fischer_et_al:LIPIcs.ICALP.2024.64,
  author =	{Fischer, Nick and Wennmann, Leo},
  title =	{{Minimizing Tardy Processing Time on a Single Machine in Near-Linear Time}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{64:1--64:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.64},
  URN =		{urn:nbn:de:0030-drops-202079},
  doi =		{10.4230/LIPIcs.ICALP.2024.64},
  annote =	{Keywords: Scheduling, Fine-Grained Complexity, Dynamic Strings}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.95

Fully Dynamic Strongly Connected Components in Planar Digraphs

Authors: Adam Karczmarz and Marcin Smulewicz

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

In this paper we consider maintaining strongly connected components (SCCs) of a directed planar graph subject to edge insertions and deletions. We show a data structure maintaining an implicit representation of the SCCs within Õ(n^{6/7}) worst-case time per update. The data structure supports, in O(log²{n}) time, reporting vertices of any specified SCC (with constant overhead per reported vertex) and aggregating vertex information (e.g., computing the maximum label) over all the vertices of that SCC. Furthermore, it can maintain global information about the structure of SCCs, such as the number of SCCs, or the size of the largest SCC. To the best of our knowledge, no fully dynamic SCCs data structures with sublinear update time have been previously known for any major subclass of digraphs. Our result should be contrasted with the n^{1-o(1)} amortized update time lower bound conditional on SETH, which holds even for dynamically maintaining whether a general digraph has more than two SCCs.

Cite as

Adam Karczmarz and Marcin Smulewicz. Fully Dynamic Strongly Connected Components in Planar Digraphs. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 95:1-95:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{karczmarz_et_al:LIPIcs.ICALP.2024.95,
  author =	{Karczmarz, Adam and Smulewicz, Marcin},
  title =	{{Fully Dynamic Strongly Connected Components in Planar Digraphs}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{95:1--95:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.95},
  URN =		{urn:nbn:de:0030-drops-202388},
  doi =		{10.4230/LIPIcs.ICALP.2024.95},
  annote =	{Keywords: dynamic strongly connected components, dynamic strong connectivity, dynamic reachability, planar graphs}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.89

Breaking a Barrier in Constructing Compact Indexes for Parameterized Pattern Matching

Authors: Kento Iseri, Tomohiro I, Diptarama Hendrian, Dominik Köppl, Ryo Yoshinaka, and Ayumi Shinohara

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

A parameterized string (p-string) is a string over an alphabet (Σ_s ∪ Σ_p), where Σ_s and Σ_p are disjoint alphabets for static symbols (s-symbols) and for parameter symbols (p-symbols), respectively. Two p-strings x and y are said to parameterized match (p-match) if and only if x can be transformed into y by applying a bijection on Σ_p to every occurrence of p-symbols in x. The indexing problem for p-matching is to preprocess a p-string T of length n so that we can efficiently find the occurrences of substrings of T that p-match with a given pattern. Let σ_s and respectively σ_p be the numbers of distinct s-symbols and p-symbols that appear in T and σ = σ_s + σ_p. Extending the Burrows-Wheeler Transform (BWT) based index for exact string pattern matching, Ganguly et al. [SODA 2017] proposed parameterized BWTs (pBWTs) to design the first compact index for p-matching, and posed an open problem on how to construct the pBWT-based index in compact space, i.e., in O(n lg |Σ_s ∪ Σ_p|) bits of space. Hashimoto et al. [SPIRE 2022] showed how to construct the pBWT for T, under the assumption that Σ_s ∪ Σ_p = [0..O(σ)], in O(n lg σ) bits of space and O(n (σ_p lg n)/(lg lg n)) time in an online manner while reading the symbols of T from right to left. In this paper, we refine Hashimoto et al.’s algorithm to work in O(n lg σ) bits of space and O(n (lg σ_p lg n)/(lg lg n)) time in a more general assumption that Σ_s ∪ Σ_p = [0..n^{O(1)}]. Our result has an immediate application to constructing parameterized suffix arrays in O(n (lg σ_p lg n)/(lg lg n)) time and O(n lg σ) bits of working space. We also show that our data structure can support backward search, a core procedure of BWT-based indexes, at any stage of the online construction, making it the first compact index for p-matching that can be constructed in compact space and even in an online manner.

Cite as

Kento Iseri, Tomohiro I, Diptarama Hendrian, Dominik Köppl, Ryo Yoshinaka, and Ayumi Shinohara. Breaking a Barrier in Constructing Compact Indexes for Parameterized Pattern Matching. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 89:1-89:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{iseri_et_al:LIPIcs.ICALP.2024.89,
  author =	{Iseri, Kento and I, Tomohiro and Hendrian, Diptarama and K\"{o}ppl, Dominik and Yoshinaka, Ryo and Shinohara, Ayumi},
  title =	{{Breaking a Barrier in Constructing Compact Indexes for Parameterized Pattern Matching}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{89:1--89:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.89},
  URN =		{urn:nbn:de:0030-drops-202324},
  doi =		{10.4230/LIPIcs.ICALP.2024.89},
  annote =	{Keywords: Index for parameterized pattern matching, Parameterized Burrows-Wheeler Transform, Online construction}
}

@InProceedings{iseri_et_al:LIPIcs.ICALP.2024.89,
  author =	{Iseri, Kento and I, Tomohiro and Hendrian, Diptarama and K\"{o}ppl, Dominik and Yoshinaka, Ryo and Shinohara, Ayumi},
  title =	{{Breaking a Barrier in Constructing Compact Indexes for Parameterized Pattern Matching}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{89:1--89:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.89},
  URN =		{urn:nbn:de:0030-drops-202324},
  doi =		{10.4230/LIPIcs.ICALP.2024.89},
  annote =	{Keywords: Index for parameterized pattern matching, Parameterized Burrows-Wheeler Transform, Online construction}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.115

On the Cut-Query Complexity of Approximating Max-Cut

Authors: Orestis Plevrakis, Seyoon Ragavan, and S. Matthew Weinberg

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

We consider the problem of query-efficient global max-cut on a weighted undirected graph in the value oracle model examined by [Rubinstein et al., 2018]. Graph algorithms in this cut query model and other query models have recently been studied for various other problems such as min-cut, connectivity, bipartiteness, and triangle detection. Max-cut in the cut query model can also be viewed as a natural special case of submodular function maximization: on query S ⊆ V, the oracle returns the total weight of the cut between S and V\S. Our first main technical result is a lower bound stating that a deterministic algorithm achieving a c-approximation for any c > 1/2 requires Ω(n) queries. This uses an extension of the cut dimension to rule out approximation (prior work of [Graur et al., 2020] introducing the cut dimension only rules out exact solutions). Secondly, we provide a randomized algorithm with Õ(n) queries that finds a c-approximation for any c < 1. We achieve this using a query-efficient sparsifier for undirected weighted graphs (prior work of [Rubinstein et al., 2018] holds only for unweighted graphs). To complement these results, for most constants c ∈ (0,1], we nail down the query complexity of achieving a c-approximation, for both deterministic and randomized algorithms (up to logarithmic factors). Analogously to general submodular function maximization in the same model, we observe a phase transition at c = 1/2: we design a deterministic algorithm for global c-approximate max-cut in O(log n) queries for any c < 1/2, and show that any randomized algorithm requires Ω(n/log n) queries to find a c-approximate max-cut for any c > 1/2. Additionally, we show that any deterministic algorithm requires Ω(n²) queries to find an exact max-cut (enough to learn the entire graph).

Cite as

Orestis Plevrakis, Seyoon Ragavan, and S. Matthew Weinberg. On the Cut-Query Complexity of Approximating Max-Cut. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 115:1-115:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{plevrakis_et_al:LIPIcs.ICALP.2024.115,
  author =	{Plevrakis, Orestis and Ragavan, Seyoon and Weinberg, S. Matthew},
  title =	{{On the Cut-Query Complexity of Approximating Max-Cut}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{115:1--115:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.115},
  URN =		{urn:nbn:de:0030-drops-202587},
  doi =		{10.4230/LIPIcs.ICALP.2024.115},
  annote =	{Keywords: query complexity, maximum cut, approximation algorithms, graph sparsification}
}

Document

DOI: 10.4230/LIPIcs.CPM.2024.4

Internal Pattern Matching in Small Space and Applications

Authors: Gabriel Bathie, Panagiotis Charalampopoulos, and Tatiana Starikovskaya

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)

Abstract

In this work, we consider pattern matching variants in small space, that is, in the read-only setting, where we want to bound the space usage on top of storing the strings. Our main contribution is a space-time trade-off for the Internal Pattern Matching (IPM) problem, where the goal is to construct a data structure over a string S of length n that allows one to answer the following type of queries: Compute the occurrences of a fragment P of S inside another fragment T of S, provided that |T| < 2|P|. For any τ ∈ [1 . . n/log² n], we present a nearly-optimal Õ(n/τ)-size data structure that can be built in Õ(n) time using Õ(n/τ) extra space, and answers IPM queries in O(τ+log n log³ log n) time. IPM queries have been identified as a crucial primitive operation for the analysis of algorithms on strings. In particular, the complexities of several recent algorithms for approximate pattern matching are expressed with regards to the number of calls to a small set of primitive operations that include IPM queries; our data structure allows us to port these results to the small-space setting. We further showcase the applicability of our IPM data structure by using it to obtain space-time trade-offs for the longest common substring and circular pattern matching problems in the asymmetric streaming setting.

Cite as

Gabriel Bathie, Panagiotis Charalampopoulos, and Tatiana Starikovskaya. Internal Pattern Matching in Small Space and Applications. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 4:1-4:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{bathie_et_al:LIPIcs.CPM.2024.4,
  author =	{Bathie, Gabriel and Charalampopoulos, Panagiotis and Starikovskaya, Tatiana},
  title =	{{Internal Pattern Matching in Small Space and Applications}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{4:1--4:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.4},
  URN =		{urn:nbn:de:0030-drops-201148},
  doi =		{10.4230/LIPIcs.CPM.2024.4},
  annote =	{Keywords: internal pattern matching, longest common substring, small-space algorithms}
}

Document

DOI: 10.4230/LIPIcs.CPM.2024.8

Faster Sliding Window String Indexing in Streams

Authors: Philip Bille, Paweł Gawrychowski, Inge Li Gørtz, and Simon R. Tarnow

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)

Abstract

The classical string indexing problem asks to preprocess the input string S for efficient pattern matching queries. Bille, Fischer, Gørtz, Pedersen, and Stordalen [CPM 2023] generalized this to the {streaming sliding window string indexing} problem, where the input string S arrives as a stream, and we are asked to maintain an index of the last w characters, called the window. Further, at any point in time, a pattern P might appear, again given as a stream, and all occurrences of P in the current window must be output. We require that the time to process each character of the text or the pattern is worst-case. It appears that standard string indexing structures, such as suffix trees, do not provide an efficient solution in such a setting, as to obtain a good worst-case bound, they necessarily need to work right-to-left, and we cannot reverse the pattern while keeping a worst-case guarantee on the time to process each of its characters. Nevertheless, it is possible to obtain a bound of 𝒪(log w) (with high probability) by maintaining a hierarchical structure of multiple suffix trees. We significantly improve this upper bound by designing a black-box reduction to maintain a suffix tree under prepending characters to the current text. By plugging in the known results, this allows us to obtain a bound of 𝒪(log log w +log log σ) (with high probability), where σ is the size of the alphabet. Further, we introduce an even more general problem, called the {streaming dynamic window string indexing}, where the goal is to maintain the current text under adding and deleting characters at either end and design a similar black-box reduction.

Cite as

Philip Bille, Paweł Gawrychowski, Inge Li Gørtz, and Simon R. Tarnow. Faster Sliding Window String Indexing in Streams. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 8:1-8:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.CPM.2024.8,
  author =	{Bille, Philip and Gawrychowski, Pawe{\l} and G{\o}rtz, Inge Li and Tarnow, Simon R.},
  title =	{{Faster Sliding Window String Indexing in Streams}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{8:1--8:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.8},
  URN =		{urn:nbn:de:0030-drops-201183},
  doi =		{10.4230/LIPIcs.CPM.2024.8},
  annote =	{Keywords: data structures, pattern matching, string indexing}
}

Document

DOI: 10.4230/LIPIcs.CPM.2024.13

Online Context-Free Recognition in OMv Time

Authors: Bartłomiej Dudek and Paweł Gawrychowski

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)

Abstract

One of the classical algorithmic problems in formal languages is the context-free recognition problem: for a given context-free grammar and a length-n string, check if the string belongs to the language described by the grammar. Already in 1975, Valiant showed that this can be solved in {O}̃(n^ω) time, where ω is the matrix multiplication exponent. More recently, Abboud, Backurs, and Vassilevska Williams [FOCS 2015] showed that any improvement on this complexity would imply a breakthrough algorithm for the k-Clique problem. We study the natural online version of this problem, where the input string w[1..n] is given left-to-right, and after having seen every prefix w[1..t] we should output if it belongs to the language. The goal is to maintain the total running time to process the whole input. Even though this version has been extensively studied in the past, the best known upper bound was {O}(n³/log²n). We connect the complexity of online context-free recognition to that of Online Matrix-Vector Multiplication, which allows us to improve the upper bound to n³/2^{Ω(√{log{n}})}.

Cite as

Bartłomiej Dudek and Paweł Gawrychowski. Online Context-Free Recognition in OMv Time. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 13:1-13:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{dudek_et_al:LIPIcs.CPM.2024.13,
  author =	{Dudek, Bart{\l}omiej and Gawrychowski, Pawe{\l}},
  title =	{{Online Context-Free Recognition in OMv Time}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{13:1--13:9},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.13},
  URN =		{urn:nbn:de:0030-drops-201235},
  doi =		{10.4230/LIPIcs.CPM.2024.13},
  annote =	{Keywords: data structures, context-free grammar parsing, online matrix-vector multiplication}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2023.12

Substring Complexity in Sublinear Space

Authors: Giulia Bernardini, Gabriele Fici, Paweł Gawrychowski, and Solon P. Pissis

Published in: LIPIcs, Volume 283, 34th International Symposium on Algorithms and Computation (ISAAC 2023)

Abstract

Shannon’s entropy is a definitive lower bound for statistical compression. Unfortunately, no such clear measure exists for the compressibility of repetitive strings. Thus, ad hoc measures are employed to estimate the repetitiveness of strings, e.g., the size z of the Lempel–Ziv parse or the number r of equal-letter runs of the Burrows-Wheeler transform. A more recent one is the size γ of a smallest string attractor. Let T be a string of length n. A string attractor of T is a set of positions of T capturing the occurrences of all the substrings of T. Unfortunately, Kempa and Prezza [STOC 2018] showed that computing γ is NP-hard. Kociumaka et al. [LATIN 2020] considered a new measure of compressibility that is based on the function S_T(k) counting the number of distinct substrings of length k of T, also known as the substring complexity of T. This new measure is defined as δ = sup{S_T(k)/k, k ≥ 1} and lower bounds all the relevant ad hoc measures previously considered. In particular, δ ≤ γ always holds and δ can be computed in 𝒪(n) time using Θ(n) working space. Kociumaka et al. showed that one can construct an 𝒪(δ log n/(δ))-sized representation of T supporting efficient direct access and efficient pattern matching queries on T. Given that for highly compressible strings, δ is significantly smaller than n, it is natural to pose the following question: Can we compute δ efficiently using sublinear working space? It is straightforward to show that in the comparison model, any algorithm computing δ using 𝒪(b) space requires Ω(n^{2-o(1)}/b) time through a reduction from the element distinctness problem [Yao, SIAM J. Comput. 1994]. We thus wanted to investigate whether we can indeed match this lower bound. We address this algorithmic challenge by showing the following bounds to compute δ: - 𝒪((n³log b)/b²) time using 𝒪(b) space, for any b ∈ [1,n], in the comparison model. - 𝒪̃(n²/b) time using 𝒪̃(b) space, for any b ∈ [√n,n], in the word RAM model. This gives an 𝒪̃(n^{1+ε})-time and 𝒪̃(n^{1-ε})-space algorithm to compute δ, for any 0 < ε ≤ 1/2. Let us remark that our algorithms compute S_T(k), for all k, within the same complexities.

Cite as

Giulia Bernardini, Gabriele Fici, Paweł Gawrychowski, and Solon P. Pissis. Substring Complexity in Sublinear Space. In 34th International Symposium on Algorithms and Computation (ISAAC 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 283, pp. 12:1-12:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{bernardini_et_al:LIPIcs.ISAAC.2023.12,
  author =	{Bernardini, Giulia and Fici, Gabriele and Gawrychowski, Pawe{\l} and Pissis, Solon P.},
  title =	{{Substring Complexity in Sublinear Space}},
  booktitle =	{34th International Symposium on Algorithms and Computation (ISAAC 2023)},
  pages =	{12:1--12:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-289-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{283},
  editor =	{Iwata, Satoru and Kakimura, Naonori},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2023.12},
  URN =		{urn:nbn:de:0030-drops-193143},
  doi =		{10.4230/LIPIcs.ISAAC.2023.12},
  annote =	{Keywords: sublinear-space algorithm, string algorithm, substring complexity}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.8

Optimal Near-Linear Space Heaviest Induced Ancestors

Authors: Panagiotis Charalampopoulos, Bartłomiej Dudek, Paweł Gawrychowski, and Karol Pokorski

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

We revisit the Heaviest Induced Ancestors (HIA) problem that was introduced by Gagie, Gawrychowski, and Nekrich [CCCG 2013] and has a number of applications in string algorithms. Let T₁ and T₂ be two rooted trees whose nodes have weights that are increasing in all root-to-leaf paths, and labels on the leaves, such that no two leaves of a tree have the same label. A pair of nodes (u, v) ∈ T₁ × T₂ is induced if and only if there is a label shared by leaf-descendants of u and v. In an HIA query, given nodes x ∈ T₁ and y ∈ T₂, the goal is to find an induced pair of nodes (u, v) of the maximum total weight such that u is an ancestor of x and v is an ancestor of y. Let n be the upper bound on the sizes of the two trees. It is known that no data structure of size 𝒪̃(n) can answer HIA queries in o(log n / log log n) time [Charalampopoulos, Gawrychowski, Pokorski; ICALP 2020]. This (unconditional) lower bound is a polyloglog n factor away from the query time of the fastest 𝒪̃(n)-size data structure known to date for the HIA problem [Abedin, Hooshmand, Ganguly, Thankachan; Algorithmica 2022]. In this work, we resolve the query-time complexity of the HIA problem for the near-linear space regime by presenting a data structure that can be built in 𝒪̃(n) time and answers HIA queries in 𝒪(log n/log log n) time. As a direct corollary, we obtain an 𝒪̃(n)-size data structure that maintains the LCS of a static string and a dynamic string, both of length at most n, in time optimal for this space regime. The main ingredients of our approach are fractional cascading and the utilization of an 𝒪(log n/ log log n)-depth tree decomposition. The latter allows us to break through the Ω(log n) barrier faced by previous works, due to the depth of the considered heavy-path decompositions.

Cite as

Panagiotis Charalampopoulos, Bartłomiej Dudek, Paweł Gawrychowski, and Karol Pokorski. Optimal Near-Linear Space Heaviest Induced Ancestors. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 8:1-8:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{charalampopoulos_et_al:LIPIcs.CPM.2023.8,
  author =	{Charalampopoulos, Panagiotis and Dudek, Bart{\l}omiej and Gawrychowski, Pawe{\l} and Pokorski, Karol},
  title =	{{Optimal Near-Linear Space Heaviest Induced Ancestors}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{8:1--8:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.8},
  URN =		{urn:nbn:de:0030-drops-179624},
  doi =		{10.4230/LIPIcs.CPM.2023.8},
  annote =	{Keywords: data structures, string algorithms, fractional cascading}
}

Document

DOI: 10.4230/LIPIcs.CPM.2023.12

Compressed Indexing for Consecutive Occurrences

Authors: Paweł Gawrychowski, Garance Gourdel, Tatiana Starikovskaya, and Teresa Anna Steiner

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

Abstract

The fundamental question considered in algorithms on strings is that of indexing, that is, preprocessing a given string for specific queries. By now we have a number of efficient solutions for this problem when the queries ask for an exact occurrence of a given pattern P. However, practical applications motivate the necessity of considering more complex queries, for example concerning near occurrences of two patterns. Recently, Bille et al. [CPM 2021] introduced a variant of such queries, called gapped consecutive occurrences, in which a query consists of two patterns P₁ and P₂ and a range [a,b], and one must find all consecutive occurrences (q₁,q₂) of P₁ and P₂ such that q₂-q₁ ∈ [a,b]. By their results, we cannot hope for a very efficient indexing structure for such queries, even if a = 0 is fixed (although at the same time they provided a non-trivial upper bound). Motivated by this, we focus on a text given as a straight-line program (SLP) and design an index taking space polynomial in the size of the grammar that answers such queries in time optimal up to polylog factors.

Cite as

Paweł Gawrychowski, Garance Gourdel, Tatiana Starikovskaya, and Teresa Anna Steiner. Compressed Indexing for Consecutive Occurrences. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 12:1-12:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.CPM.2023.12,
  author =	{Gawrychowski, Pawe{\l} and Gourdel, Garance and Starikovskaya, Tatiana and Steiner, Teresa Anna},
  title =	{{Compressed Indexing for Consecutive Occurrences}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{12:1--12:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.12},
  URN =		{urn:nbn:de:0030-drops-179666},
  doi =		{10.4230/LIPIcs.CPM.2023.12},
  annote =	{Keywords: Compressed indexing, two patterns, consecutive occurrences}
}

69 Search Results for "Gawrychowski, Paweł"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as