Search Results

Documents authored by Wirth, Anthony


Document
Improved Algorithms for Maximum Coverage in Dynamic and Random Order Streams

Authors: Amit Chakrabarti, Andrew McGregor, and Anthony Wirth

Published in: LIPIcs, Volume 308, 32nd Annual European Symposium on Algorithms (ESA 2024)


Abstract
The maximum coverage problem is to select k sets, from a collection of m sets, such that the cardinality of their union, in a universe of size n, is maximized. We consider (1-1/e-ε)-approximation algorithms for this NP-hard problem in three standard data stream models.

1) Dynamic Model. The stream consists of a sequence of sets being inserted and deleted. Our multi-pass algorithm uses ε^{-2} k ⋅ polylog(n,m) space. The best previous result (Assadi and Khanna, SODA 2018) used (n + ε^{-4} k) polylog(n,m) space. While both algorithms use O(ε^{-1} log m) passes, our analysis shows that, when ε ≤ 1/log log m, it is possible to reduce the number of passes by a 1/log log m factor without incurring additional space.

2) Random Order Model. In this model, there are no deletions, and the sets forming the instance are uniformly randomly permuted to form the input stream. We show that a single pass and k polylog(n,m) space suffice for an arbitrarily small constant ε. The best previous result, by Warneke et al. (ESA 2023), used k² polylog(n,m) space.

3) Insert-Only Model. Our results, along with numerous previous results, use a sub-sampling technique introduced by McGregor and Vu (ICDT 2017) to sparsify the input instance. We explain how this technique, and others used in the paper, can be implemented so that the amortized update time of our algorithm is polylogarithmic. This also improves the state-of-the-art insert-only algorithms in terms of update time: polylog(m,n) update time suffices, whereas the best previous result, by Jaud et al. (SEA 2023), required update time linear in k.
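
For orientation, the (1-1/e) baseline that these streaming algorithms chase is the classic offline greedy; a minimal sketch (our own illustration, not the paper's algorithm):

def greedy_max_coverage(sets, k):
    """Pick k sets, each time taking the one covering the most
    still-uncovered elements; achieves a (1 - 1/e)-approximation."""
    covered, chosen = set(), []
    for _ in range(k):
        best = max(sets, key=lambda s: len(s - covered), default=None)
        if best is None or not (best - covered):
            break  # nothing left to gain
        chosen.append(best)
        covered |= best
    return chosen, len(covered)

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
print(greedy_max_coverage(sets, k=2))  # covers all of {1, ..., 6}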

Cite as

Amit Chakrabarti, Andrew McGregor, and Anthony Wirth. Improved Algorithms for Maximum Coverage in Dynamic and Random Order Streams. In 32nd Annual European Symposium on Algorithms (ESA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 308, pp. 40:1-40:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


BibTeX

@InProceedings{chakrabarti_et_al:LIPIcs.ESA.2024.40,
  author =	{Chakrabarti, Amit and McGregor, Andrew and Wirth, Anthony},
  title =	{{Improved Algorithms for Maximum Coverage in Dynamic and Random Order Streams}},
  booktitle =	{32nd Annual European Symposium on Algorithms (ESA 2024)},
  pages =	{40:1--40:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-338-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{308},
  editor =	{Chan, Timothy and Fischer, Johannes and Iacono, John and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2024.40},
  URN =		{urn:nbn:de:0030-drops-211114},
  doi =		{10.4230/LIPIcs.ESA.2024.40},
  annote =	{Keywords: Data Stream Computation, Maximum Coverage, Submodular Maximization}
}
Document
Maximum Unique Coverage on Streams: Improved FPT Approximation Scheme and Tighter Space Lower Bound

Authors: Philip Cervenjak, Junhao Gan, Seeun William Umboh, and Anthony Wirth

Published in: LIPIcs, Volume 317, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024)


Abstract
We consider the Max Unique Coverage problem, including applications to the data stream model. The input is a universe of n elements, a collection of m subsets of this universe, and a cardinality constraint, k. The goal is to select a subcollection of at most k sets that maximizes unique coverage, i.e., the number of elements contained in exactly one of the selected sets. The Max Unique Coverage problem has applications in wireless networks, radio broadcast, and envy-free pricing. Our first main result is a fixed-parameter tractable approximation scheme (FPT-AS) for Max Unique Coverage, parameterized by k and the maximum element frequency, r, which can be implemented on a data stream. Our FPT-AS finds a (1-ε)-approximation while maintaining a kernel of size Õ(k r/ε), which can be combined with subsampling to use Õ(k² r / ε³) space overall. This significantly improves on the previous-best FPT-AS, which achieves the same approximation with a kernel of size Õ(k² r / ε²). To achieve our first result, we show upper bounds on the ratio of a collection's coverage to the unique coverage of a maximizing subcollection; we do so by constructing explicit algorithms that find a subcollection whose unique coverage is at least a logarithmic fraction of the collection's coverage. We complement our algorithms with our second main result, showing that Ω(m / k²) space is necessary to achieve a (1.5 + o(1))/(ln k - 1)-approximation in the data stream. This dramatically improves the previous-best lower bound, which showed that Ω(m / k²) space is necessary to achieve better than an e^{-1+1/k}-approximation.
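
As a point of reference for the objective itself, here is a brute-force evaluator for unique coverage (exponential in k and purely illustrative; the paper's FPT-AS is what makes the problem tractable on streams):

from itertools import combinations
from collections import Counter

def unique_coverage(chosen):
    """Number of elements lying in exactly one of the chosen sets."""
    counts = Counter(x for s in chosen for x in s)
    return sum(1 for c in counts.values() if c == 1)

def best_unique_coverage(sets, k):
    """Exhaustive baseline: try every k-subcollection."""
    return max(combinations(sets, k), key=unique_coverage)

sets = [{1, 2}, {2, 3}, {3, 4}]
print(best_unique_coverage(sets, k=2))  # ({1, 2}, {3, 4}): unique coverage 4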

Cite as

Philip Cervenjak, Junhao Gan, Seeun William Umboh, and Anthony Wirth. Maximum Unique Coverage on Streams: Improved FPT Approximation Scheme and Tighter Space Lower Bound. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 317, pp. 25:1-25:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


BibTeX

@InProceedings{cervenjak_et_al:LIPIcs.APPROX/RANDOM.2024.25,
  author =	{Cervenjak, Philip and Gan, Junhao and Umboh, Seeun William and Wirth, Anthony},
  title =	{{Maximum Unique Coverage on Streams: Improved FPT Approximation Scheme and Tighter Space Lower Bound}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024)},
  pages =	{25:1--25:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-348-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{317},
  editor =	{Kumar, Amit and Ron-Zewi, Noga},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2024.25},
  URN =		{urn:nbn:de:0030-drops-210183},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2024.25},
  annote =	{Keywords: Maximum unique coverage, maximum coverage, approximate kernel, data streams}
}
Document
Exploiting New Properties of String Net Frequency for Efficient Computation

Authors: Peaker Guo, Patrick Eades, Anthony Wirth, and Justin Zobel

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)


Abstract
Knowing which strings in a massive text are significant - that is, which strings are common and distinct from other strings - is valuable for several applications, including text compression and tokenization. Frequency by itself does not indicate significance, because the commonest strings are also the shortest. A compelling alternative is net frequency, which has the property that strings with positive net frequency are of maximal length. However, net frequency remains relatively unexplored, and there is no prior art showing how to compute it efficiently. We first introduce a characteristic of net frequency that simplifies the original definition. With this, we study strings with positive net frequency in Fibonacci words. We then use our characteristic to solve two key problems related to net frequency. First, single-nf: compute the net frequency of a given string of length m, in an input text of length n over an alphabet of size σ. Second, all-nf: given a length-n input text, report every string of positive net frequency (and its net frequency). Our methods leverage suffix arrays, components of the Burrows-Wheeler transform, and a solution to the coloured range listing problem. We show that, for both problems, our data structure has O(n) construction cost: with this structure, we solve single-nf in O(m + σ) time and all-nf in O(n) time. Experimentally, we find our method to be around 100 times faster than reasonable baselines for single-nf. For all-nf, our results show that, even with prior knowledge of the set of strings with positive net frequency, simply confirming that their net frequency is positive takes longer than running our purpose-designed method. All in all, we show that net frequency is a cogent measure of string significance, that it can be calculated efficiently, and that the set of plausibly significant strings can be reported efficiently.
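
Of the ingredients listed, suffix arrays are the most self-contained; a toy sketch of occurrence counting over a naive suffix array (our illustration only - the paper's construction is O(n), this one is not; the key= arguments require Python 3.10+):

from bisect import bisect_left, bisect_right

def suffix_array(text):
    # Naive O(n^2 log n) construction, for illustration only.
    return sorted(range(len(text)), key=lambda i: text[i:])

def occurrence_count(text, sa, pattern):
    """Count occurrences of pattern by binary search on the suffix array:
    suffixes sharing the pattern as a prefix form a contiguous range."""
    lo = bisect_left(sa, pattern, key=lambda i: text[i:i + len(pattern)])
    hi = bisect_right(sa, pattern, key=lambda i: text[i:i + len(pattern)])
    return hi - lo

text = "abracadabra"
print(occurrence_count(text, suffix_array(text), "abra"))  # 2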

Cite as

Peaker Guo, Patrick Eades, Anthony Wirth, and Justin Zobel. Exploiting New Properties of String Net Frequency for Efficient Computation. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 16:1-16:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


BibTeX

@InProceedings{guo_et_al:LIPIcs.CPM.2024.16,
  author =	{Guo, Peaker and Eades, Patrick and Wirth, Anthony and Zobel, Justin},
  title =	{{Exploiting New Properties of String Net Frequency for Efficient Computation}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{16:1--16:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.16},
  URN =		{urn:nbn:de:0030-drops-201265},
  doi =		{10.4230/LIPIcs.CPM.2024.16},
  annote =	{Keywords: Fibonacci words, suffix arrays, Burrows-Wheeler transform, LCP arrays, irreducible LCP values, coloured range listing}
}
Document
Maximum Coverage in Random-Arrival Streams

Authors: Rowan Warneke, Farhana Choudhury, and Anthony Wirth

Published in: LIPIcs, Volume 274, 31st Annual European Symposium on Algorithms (ESA 2023)


Abstract
Given a collection of m sets, each a subset of a universe {1, …, n}, maximum coverage is the problem of choosing k sets whose union has the largest cardinality. A simple greedy algorithm achieves an approximation factor of 1 - 1/e ≈ 0.632, which is the best possible polynomial-time approximation unless P = NP. In the streaming setting, information about the input is revealed gradually, in an online fashion. In the set-streaming model, each set is listed contiguously in the stream. In the more general edge-streaming model, the stream is composed of set-element pairs, denoting membership. The overall goal in the streaming setting is to design algorithms that use space sublinear in the size of the input. An interesting line of research is to design algorithms with space complexity polylogarithmic in the size of the input (i.e., polylogarithmic in both n and m); we call such algorithms low-space. In the set-streaming model, it is known that 1/2 is the best possible low-space approximation. In the edge-streaming model, no low-space algorithm can achieve a nontrivial approximation factor. We study the problem under the assumption that the order in which the stream arrives is chosen uniformly at random. Our main results are as follows.

- In the random-arrival set-streaming model, we give two new algorithms showing that low space is sufficient to break the 1/2 barrier. The first achieves an approximation factor of 1/2 + c₁ using Õ(k²) space, where c₁ > 0 is a small constant and the Õ(⋅) notation suppresses polylogarithmic factors; the second achieves a factor of 1 - 1/e - ε - o(1) using Õ(k² ε^{-3}) space, where the o(1) term is a function of k. This is essentially the optimal bound, as breaking the 1 - 1/e barrier is known to require high space.

- In the random-arrival edge-streaming model, we show that, for all fixed α > 0 and δ > 0, any algorithm that α-approximates maximum coverage with probability at least 0.9 requires Ω(m^{1-δ}) space (i.e., high space), even for the special case of k = 1.
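
The 1/2 factor that adversarial-order set streaming cannot beat is attained by threshold ("sieve") algorithms in the style of Badanidiyuru et al.; a minimal single-threshold sketch (not this paper's random-arrival algorithms, and in practice opt_guess is enumerated over a geometric grid):

def threshold_greedy(stream, k, opt_guess):
    """One sieve, specialised to coverage: keep an arriving set when its
    marginal coverage clears the threshold. Gives a 1/2-approximation
    when opt_guess is close to the optimum coverage."""
    covered, chosen = set(), []
    for s in stream:
        if len(chosen) == k:
            break
        gain = len(s - covered)
        if gain >= (opt_guess / 2 - len(covered)) / (k - len(chosen)):
            chosen.append(s)
            covered |= s
    return chosen, len(covered)

stream = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
print(threshold_greedy(stream, k=2, opt_guess=6))  # coverage >= 3 = OPT/2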

Cite as

Rowan Warneke, Farhana Choudhury, and Anthony Wirth. Maximum Coverage in Random-Arrival Streams. In 31st Annual European Symposium on Algorithms (ESA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 274, pp. 102:1-102:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


BibTeX

@InProceedings{warneke_et_al:LIPIcs.ESA.2023.102,
  author =	{Warneke, Rowan and Choudhury, Farhana and Wirth, Anthony},
  title =	{{Maximum Coverage in Random-Arrival Streams}},
  booktitle =	{31st Annual European Symposium on Algorithms (ESA 2023)},
  pages =	{102:1--102:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-295-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{274},
  editor =	{G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.102},
  URN =		{urn:nbn:de:0030-drops-187559},
  doi =		{10.4230/LIPIcs.ESA.2023.102},
  annote =	{Keywords: Maximum Coverage, Streaming Algorithm, Random Arrival, Greedy Algorithm, Communication Complexity}
}
Document
Maximum Coverage in Sublinear Space, Faster

Authors: Stephen Jaud, Anthony Wirth, and Farhana Choudhury

Published in: LIPIcs, Volume 265, 21st International Symposium on Experimental Algorithms (SEA 2023)


Abstract
Given a collection of m sets from a universe 𝒰, the Maximum Set Coverage problem consists of finding k sets whose union has largest cardinality. This problem is NP-hard, but its solution can be approximated in polynomial time up to a factor 1-1/e. However, this algorithm does not scale well with the input size. In a streaming context, practical high-quality solutions are found, but with space complexity that scales linearly with the size of the universe, n = |𝒰|. However, one randomized streaming algorithm has been shown to produce a (1-1/e-ε)-approximation of the optimal solution with a space complexity that scales only poly-logarithmically with respect to m and n. To achieve such a low space complexity, the authors used two techniques in their multi-pass approach:

- F₀-sketching, which determines, with great accuracy and in less space than the set itself, the number of distinct elements in a set.

- Subsampling, which solves the problem only on a subset of the universe, implemented using γ-independent hash functions.

This article focuses on that sublinear-space algorithm and highlights the time cost of these two techniques, especially subsampling. We present optimizations that significantly reduce the time complexity of the algorithm. Firstly, we give optimizations that do not alter the space complexity, number of passes, or approximation quality of the original algorithm: we reanalyze the error bounds to show that the original independence factor of Ω(ε^{-2} k log m) can be fine-tuned to Ω(k log m), and we show how F₀-sketching can be removed. Secondly, we derive a new lower bound on the probability of producing a (1-1/e-ε)-approximation using only pairwise independence: 1 - 4/(c k log m), compared to 1 - 2e/m^{ck/6} with Ω(k log m)-independence. Although the theoretical guarantees are weaker, suggesting the approximation quality might suffer, our algorithms perform well in practice on large streams. Finally, our experimental results show that even a pairwise-independent hash-function sampler produces solutions no worse than the original algorithm's, while running faster by several orders of magnitude.
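
A pairwise-independent hash family of the kind the experiments rely on is tiny to implement; an illustrative sketch (names and parameters are ours):

import random

class PairwiseHash:
    """Pairwise-independent hash h(x) = ((a*x + b) mod p) mod m.
    A random degree-(t-1) polynomial mod p gives t-wise independence;
    the paper shows pairwise (t = 2) already works well in practice."""
    def __init__(self, p, m):
        self.p, self.m = p, m
        self.a = random.randrange(1, p)
        self.b = random.randrange(p)
    def __call__(self, x):
        return ((self.a * x + self.b) % self.p) % self.m

p = (1 << 61) - 1          # a Mersenne prime larger than any element
h = PairwiseHash(p, m=8)   # keep roughly a 1/8 fraction of the universe
sample = [x for x in range(100) if h(x) == 0]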

Cite as

Stephen Jaud, Anthony Wirth, and Farhana Choudhury. Maximum Coverage in Sublinear Space, Faster. In 21st International Symposium on Experimental Algorithms (SEA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 265, pp. 21:1-21:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


BibTeX

@InProceedings{jaud_et_al:LIPIcs.SEA.2023.21,
  author =	{Jaud, Stephen and Wirth, Anthony and Choudhury, Farhana},
  title =	{{Maximum Coverage in Sublinear Space, Faster}},
  booktitle =	{21st International Symposium on Experimental Algorithms (SEA 2023)},
  pages =	{21:1--21:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-279-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{265},
  editor =	{Georgiadis, Loukas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2023.21},
  URN =		{urn:nbn:de:0030-drops-183715},
  doi =		{10.4230/LIPIcs.SEA.2023.21},
  annote =	{Keywords: streaming algorithms, subsampling, maximum set cover, k-wise independent hash functions}
}
Document
An Almost Optimal Algorithm for Unbounded Search with Noisy Information

Authors: Junhao Gan, Anthony Wirth, and Xin Zhang

Published in: LIPIcs, Volume 227, 18th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2022)


Abstract
Given a sequence of integers, 𝒮 = s₁, s₂, …, in ascending order, called the search domain, and an integer t, called the target, the predecessor problem asks for the target index N such that s_N is the largest integer in 𝒮 satisfying s_N ≤ t. We consider solving the predecessor problem with the fewest queries to a binary comparison oracle. For each queried index i, the oracle returns whether s_i ≤ t or s_i > t. In particular, we study the predecessor problem in the UnboundedNoisy setting, where (i) the search domain 𝒮 is unbounded, i.e., n = |𝒮| is unknown or infinite, and (ii) the binary comparison oracle is noisy. We denote the former setting by Unbounded and the latter by Noisy. In Noisy, the oracle, for each query, independently returns a wrong answer with a fixed constant probability 0 < p < 1/2; in particular, even two queries on the same index i may receive different answers. Furthermore, with a noisy oracle, the goal is to correctly return the target index with probability at least 1 - Q, where 0 < Q < 1/2 is the failure probability. Our first result is an algorithm for Noisy, called NoS, that improves the previous result of Ben-Or and Hassidim [FOCS 2008] from an expected query-complexity bound to a worst-case bound. We also achieve an expected query-complexity bound whose leading term has an optimal constant factor, matching the lower bound of Ben-Or and Hassidim. Building on NoS, we propose our NoSU algorithm, which correctly solves the predecessor problem in the UnboundedNoisy setting. We prove that the query complexity of NoSU is ∑_{i=1}^k (log^{(i)} N)/(1-H(p)) + o(log N) when log Q^{-1} ∈ o(log N), where N is the target index, k = log^* N is the iterated logarithm, and H(p) is the entropy function. This improves the previous bound of O(log(N/Q)/(1-H(p))) by reducing the coefficient of the leading term from a large constant to 1. Moreover, we show that this upper bound can be further improved to (1-Q) ∑_{i=1}^k (log^{(i)} N)/(1-H(p)) + o(log N) in expectation, with the constant in the leading term reduced to 1 - Q. Finally, we show that an information-theoretic lower bound on the expected query cost of the predecessor problem in UnboundedNoisy is at least (1-Q)(∑_{i=1}^k log^{(i)} N - 2k)/(1-H(p)) - 10. This implies that the constant factor in the leading term of our expected upper bound is indeed optimal.
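
A bare-bones illustration of the two obstacles the paper combines: simulating a reliable comparison by majority vote over noisy queries, and searching an unbounded domain by doubling. This naive combination costs O(log N) amplified queries per search; the point of NoS/NoSU is to spend the query budget far more carefully (all names here are ours):

import random

def noisy_leq(s, t, p=0.3, reps=15):
    """Simulated noisy oracle for 's <= t', wrong with probability p,
    amplified by majority vote over reps independent queries."""
    votes = sum((s <= t) != (random.random() < p) for _ in range(reps))
    return votes > reps / 2

def unbounded_predecessor(seq_at, t):
    """Find the largest N with seq_at(N) <= t when |S| is unknown:
    double to bracket the target, then binary search the bracket.
    Returns 0 if s_1 > t."""
    hi = 1
    while noisy_leq(seq_at(hi), t):
        hi *= 2
    lo = hi // 2
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if noisy_leq(seq_at(mid), t):
            lo = mid
        else:
            hi = mid
    return lo

print(unbounded_predecessor(lambda i: 3 * i, 20))  # 6, since s_6 = 18 <= 20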

Cite as

Junhao Gan, Anthony Wirth, and Xin Zhang. An Almost Optimal Algorithm for Unbounded Search with Noisy Information. In 18th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 227, pp. 25:1-25:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


BibTeX

@InProceedings{gan_et_al:LIPIcs.SWAT.2022.25,
  author =	{Gan, Junhao and Wirth, Anthony and Zhang, Xin},
  title =	{{An Almost Optimal Algorithm for Unbounded Search with Noisy Information}},
  booktitle =	{18th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2022)},
  pages =	{25:1--25:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-236-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{227},
  editor =	{Czumaj, Artur and Xin, Qin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SWAT.2022.25},
  URN =		{urn:nbn:de:0030-drops-161854},
  doi =		{10.4230/LIPIcs.SWAT.2022.25},
  annote =	{Keywords: Fault-tolerant search, noisy binary search, query complexity}
}
Document
Recency Queries with Succinct Representation

Authors: William L. Holland, Anthony Wirth, and Justin Zobel

Published in: LIPIcs, Volume 181, 31st International Symposium on Algorithms and Computation (ISAAC 2020)


Abstract
In the context of the sliding-window set membership problem, and of caching policies that require knowledge of item recency, we formalize the problem of Recency on a stream. Informally, the query asks, "when was the last time I saw item x?" Existing structures, such as hash tables, can support a recency query by augmenting item occurrences with timestamps. To support recency queries on a window of W items, this might require Θ(W log W) bits. We propose a succinct data structure for Recency. By combining sliding-window dictionaries in a hierarchical structure, and through careful design of the underlying hash tables, we achieve a data structure that returns a (1+ε)-approximation to the recency of every item in O(log(εW)) time, using only (1+o(1))(1+ε)(ℬ + W log ε^{-1}) bits. Here, ℬ is the information-theoretic lower bound on the number of bits for a set of size W in a universe of cardinality N.
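
The Θ(W log W)-bit baseline mentioned above is easy to make concrete; a sketch of exact sliding-window recency via timestamps (our illustration, not the paper's succinct structure):

from collections import OrderedDict

class RecencyBaseline:
    """Exact sliding-window recency: map each item in the window of the
    W most recent arrivals to the time it was last seen."""
    def __init__(self, window):
        self.window = window
        self.last_seen = OrderedDict()  # ordered by last-seen time
        self.now = 0
    def observe(self, x):
        self.now += 1
        self.last_seen.pop(x, None)
        self.last_seen[x] = self.now
        # Evict items that fell out of the window.
        while next(iter(self.last_seen.values())) <= self.now - self.window:
            self.last_seen.popitem(last=False)
    def recency(self, x):
        """How many arrivals ago was x last seen? None if outside window."""
        t = self.last_seen.get(x)
        return self.now - t if t is not None else None

r = RecencyBaseline(window=3)
for x in "abca":
    r.observe(x)
print(r.recency("b"))  # 2: 'b' was seen two arrivals before the latest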

Cite as

William L. Holland, Anthony Wirth, and Justin Zobel. Recency Queries with Succinct Representation. In 31st International Symposium on Algorithms and Computation (ISAAC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 181, pp. 49:1-49:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


BibTeX

@InProceedings{holland_et_al:LIPIcs.ISAAC.2020.49,
  author =	{Holland, William L. and Wirth, Anthony and Zobel, Justin},
  title =	{{Recency Queries with Succinct Representation}},
  booktitle =	{31st International Symposium on Algorithms and Computation (ISAAC 2020)},
  pages =	{49:1--49:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-173-3},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{181},
  editor =	{Cao, Yixin and Cheng, Siu-Wing and Li, Minming},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2020.49},
  URN =		{urn:nbn:de:0030-drops-133931},
  doi =		{10.4230/LIPIcs.ISAAC.2020.49},
  annote =	{Keywords: Succinct Data Structures, Data Streams, Sliding Dictionary}
}
Document
Graph Clustering in All Parameter Regimes

Authors: Junhao Gan, David F. Gleich, Nate Veldt, Anthony Wirth, and Xin Zhang

Published in: LIPIcs, Volume 170, 45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020)


Abstract
Resolution parameters in graph clustering control the size and structure of clusters formed by solving a parametric objective function. Typically there is more than one meaningful way to cluster a graph, and solving the same objective function for different resolution parameters produces clusterings at different levels of granularity, each of which can be meaningful depending on the application. In this paper, we address the task of efficiently solving a parameterized graph clustering objective for all values of a resolution parameter. Specifically, we consider a new analysis-friendly objective we call LambdaPrime, involving a parameter λ ∈ (0,1). LambdaPrime is an adaptation of LambdaCC, a significant family of instances of the Correlation Clustering (minimization) problem. Indeed, LambdaPrime and LambdaCC are closely related to other parameterized clustering problems, such as parametric generalizations of modularity. They capture a number of specific clustering problems as special cases, including sparsest cut and cluster deletion. While previous work provides approximation results for a single value of the resolution parameter, we seek a set of approximately optimal clusterings for all values of λ in polynomial time. More specifically, we show that when a graph has m edges and n nodes, there exists a family of at most m clusterings such that, for every λ ∈ (0,1), it contains an optimal solution to the LambdaPrime objective. This bound is tight on star graphs. We obtain a family of O(log n) clusterings by solving the parametric linear programming (LP) relaxation of LambdaPrime at O(log n) values of λ, and rounding each LP solution using existing approximation algorithms. We prove that this is asymptotically tight: for a certain class of ring graphs, for all values of λ, Ω(log n) feasible solutions are required to provide a constant-factor approximation for the LambdaPrime LP relaxation. To minimize the size of the clustering family, we further propose an algorithm that yields a family of solutions at most twice the size of the minimum LP-approximating family.
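
To see why a small family can cover every λ, note that for LambdaCC-style objectives a fixed clustering's cost is linear in λ (our working assumption here, not a claim from the paper); the best clustering at each λ is then a piece of the lower envelope of a set of lines, computable as in this transparent O(n²) sketch:

def lower_envelope(lines, lo=0.0, hi=1.0):
    """Pieces (lam_from, lam_to, line) of the pointwise minimum of
    cost(lam) = a + b*lam, over the candidate lines (a, b)."""
    cuts = {lo, hi}
    for i, (a1, b1) in enumerate(lines):
        for a2, b2 in lines[i + 1:]:
            if b1 != b2:
                lam = (a2 - a1) / (b1 - b2)
                if lo < lam < hi:
                    cuts.add(lam)
    cuts = sorted(cuts)
    pieces = []
    for left, right in zip(cuts, cuts[1:]):
        mid = (left + right) / 2
        best = min(lines, key=lambda ab: ab[0] + ab[1] * mid)
        if pieces and pieces[-1][2] == best:
            pieces[-1] = (pieces[-1][0], right, best)  # extend same piece
        else:
            pieces.append((left, right, best))
    return pieces

print(lower_envelope([(0.0, 3.0), (1.0, 0.0), (0.5, 1.0)]))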

Cite as

Junhao Gan, David F. Gleich, Nate Veldt, Anthony Wirth, and Xin Zhang. Graph Clustering in All Parameter Regimes. In 45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 170, pp. 39:1-39:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


BibTeX

@InProceedings{gan_et_al:LIPIcs.MFCS.2020.39,
  author =	{Gan, Junhao and Gleich, David F. and Veldt, Nate and Wirth, Anthony and Zhang, Xin},
  title =	{{Graph Clustering in All Parameter Regimes}},
  booktitle =	{45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020)},
  pages =	{39:1--39:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-159-7},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{170},
  editor =	{Esparza, Javier and Kr\'{a}l', Daniel},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2020.39},
  URN =		{urn:nbn:de:0030-drops-127065},
  doi =		{10.4230/LIPIcs.MFCS.2020.39},
  annote =	{Keywords: Graph Clustering, Algorithms, Parametric Linear Programming}
}
Document
Result-Sensitive Binary Search with Noisy Information

Authors: Narthana S. Epa, Junhao Gan, and Anthony Wirth

Published in: LIPIcs, Volume 149, 30th International Symposium on Algorithms and Computation (ISAAC 2019)


Abstract
We describe new algorithms for the predecessor problem in the Noisy Comparison Model. In this problem, given a sorted list L of n (distinct) elements and a query q, we seek the predecessor of q in L: denoted by u, the largest element less than or equal to q. In the Noisy Comparison Model, the result of a comparison between two elements is non-deterministic. Moreover, multiple comparisons of the same pair of elements might have different results: each is generated independently, and is correct with probability p > 1/2. Given an overall error tolerance Q, the cost of an algorithm is measured by the total number of noisy comparisons; these must guarantee that the predecessor is returned with probability at least 1 - Q. Feige et al. showed that predecessor queries can be answered by a modified binary search with Θ(log(n/Q)) noisy comparisons. We design result-sensitive algorithms for answering predecessor queries, whose query cost is related to the index, k, of the predecessor u in L. Our first algorithm answers predecessor queries with O(log((log^{*(c)} n)/Q) + log(k/Q)) noisy comparisons, for an arbitrarily large constant c. The function log^{*(c)} n iterates c times the iterated-logarithm function, log^* n. Our second algorithm is a genuinely result-sensitive algorithm whose expected query cost is bounded by O(log(k/Q)), and which is guaranteed to terminate after at most O(log((log n)/Q)) noisy comparisons. Our results strictly improve the state-of-the-art bounds when k ∈ ω(1) ∩ o(n^ε), where ε > 0 is some constant. Moreover, we show that our result-sensitive algorithms immediately improve not only predecessor-query algorithms, but also binary-search-like algorithms for solving key applications.
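
Stripped of noise, result-sensitivity comes from galloping: bracket the predecessor's index k in O(log k) comparisons before binary searching. A noiseless skeleton (our illustration; the paper's algorithms layer noise-tolerance on top):

from bisect import bisect_right

def galloping_predecessor(arr, q):
    """Return the index of the predecessor of q in sorted arr, using
    O(log k) comparisons to find the bracket, where k is that index."""
    if not arr or arr[0] > q:
        return None  # no predecessor
    hi = 1
    while hi < len(arr) and arr[hi] <= q:
        hi *= 2
    lo = hi // 2  # invariant: arr[lo] <= q, and q < arr[hi] if hi is valid
    return bisect_right(arr, q, lo, min(hi, len(arr))) - 1

print(galloping_predecessor([1, 3, 5, 7, 9], 6))  # 2, i.e. the element 5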

Cite as

Narthana S. Epa, Junhao Gan, and Anthony Wirth. Result-Sensitive Binary Search with Noisy Information. In 30th International Symposium on Algorithms and Computation (ISAAC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 149, pp. 60:1-60:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{epa_et_al:LIPIcs.ISAAC.2019.60,
  author =	{Epa, Narthana S. and Gan, Junhao and Wirth, Anthony},
  title =	{{Result-Sensitive Binary Search with Noisy Information}},
  booktitle =	{30th International Symposium on Algorithms and Computation (ISAAC 2019)},
  pages =	{60:1--60:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-130-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{149},
  editor =	{Lu, Pinyan and Zhang, Guochuan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2019.60},
  URN =		{urn:nbn:de:0030-drops-115568},
  doi =		{10.4230/LIPIcs.ISAAC.2019.60},
  annote =	{Keywords: Fault-tolerant search, random walks, noisy comparisons, predecessor queries}
}
Document
Correlation Clustering Generalized

Authors: David F. Gleich, Nate Veldt, and Anthony Wirth

Published in: LIPIcs, Volume 123, 29th International Symposium on Algorithms and Computation (ISAAC 2018)


Abstract
We present new results for LambdaCC and MotifCC, two recently introduced variants of the well-studied correlation clustering problem. Both variants are motivated by applications to network analysis and community detection, and have non-trivial approximation algorithms. We first show that the standard linear programming relaxation of LambdaCC has a Θ(log n) integrality gap for a certain choice of the parameter λ. This sheds light on previous challenges encountered in obtaining parameter-independent approximation results for LambdaCC. We generalize a previous constant-factor algorithm to provide the best results, from the LP-rounding approach, for an extended range of λ. MotifCC generalizes correlation clustering to the hypergraph setting. In the case of hyperedges of degree 3 with weights satisfying probability constraints, we improve the best approximation factor from 9 to 8. We show that, in general, our algorithm gives a 4(k-1)-approximation when hyperedges have maximum degree k and probability weights. We additionally present approximation results for LambdaCC and MotifCC where we restrict to forming only two clusters.
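
For context on the base problem both variants generalize, the classic randomized pivot (KwikCluster) algorithm of Ailon, Charikar, and Newman gives a 3-approximation for unweighted correlation clustering; a sketch (not the LP-rounding algorithms of this paper):

import random

def pivot_correlation_clustering(nodes, positive):
    """KwikCluster: repeatedly pick a random pivot, cluster it with its
    remaining '+' neighbours, and recurse on the rest. `positive` holds
    the similar pairs as frozensets."""
    rest = set(nodes)
    clusters = []
    while rest:
        pivot = random.choice(sorted(rest))
        cluster = {pivot} | {u for u in rest
                             if frozenset((pivot, u)) in positive}
        clusters.append(cluster)
        rest -= cluster
    return clusters

positive = {frozenset(p) for p in [(1, 2), (3, 4)]}
print(pivot_correlation_clustering([1, 2, 3, 4], positive))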

Cite as

David F. Gleich, Nate Veldt, and Anthony Wirth. Correlation Clustering Generalized. In 29th International Symposium on Algorithms and Computation (ISAAC 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 123, pp. 44:1-44:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{gleich_et_al:LIPIcs.ISAAC.2018.44,
  author =	{Gleich, David F. and Veldt, Nate and Wirth, Anthony},
  title =	{{Correlation Clustering Generalized}},
  booktitle =	{29th International Symposium on Algorithms and Computation (ISAAC 2018)},
  pages =	{44:1--44:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-094-1},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{123},
  editor =	{Hsu, Wen-Lian and Lee, Der-Tsai and Liao, Chung-Shou},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2018.44},
  URN =		{urn:nbn:de:0030-drops-99925},
  doi =		{10.4230/LIPIcs.ISAAC.2018.44},
  annote =	{Keywords: Correlation Clustering, Approximation Algorithms}
}
Document
Precedence-Constrained Min Sum Set Cover

Authors: Jessica McClintock, Julián Mestre, and Anthony Wirth

Published in: LIPIcs, Volume 92, 28th International Symposium on Algorithms and Computation (ISAAC 2017)


Abstract
We introduce a version of the Min Sum Set Cover (MSSC) problem in which there are "AND" precedence constraints on the m sets. In the Precedence-Constrained Min Sum Set Cover (PCMSSC) problem, the constraints, when interpreted as directed edges, induce a directed acyclic graph. PCMSSC models the aim of scheduling software tests to prioritize the rate of fault detection subject to dependencies between tests. Our greedy scheme for PCMSSC is similar to the approaches of Feige, Lovász, and Tetali for MSSC, and of Chekuri and Motwani for precedence-constrained scheduling to minimize weighted completion time. With a factor-4 increase in approximation ratio, we reduce PCMSSC to the problem of finding a maximum-density precedence-closed sub-family of sets, where density is the ratio of the sub-family's union size to its cardinality. We provide a greedy factor-√m algorithm for maximizing density; on forests of in-trees, we show this algorithm finds an optimal solution. Harnessing an alternative greedy argument of Chekuri and Kumar for Maximum Coverage with Group Budget Constraints, on forests of out-trees, we design an algorithm with approximation ratio equal to the maximum tree height. Finally, with a reduction from the Planted Dense Subgraph detection problem, we show that its conjectured hardness implies there is no polynomial-time algorithm for PCMSSC with approximation factor in O(m^{1/12-ε}).
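
The unconstrained greedy that the PCMSSC scheme adapts is short; a sketch of the Feige-Lovász-Tetali-style greedy and its min-sum cost (our illustration, ignoring precedence constraints):

def greedy_mssc_cost(sets, universe):
    """Greedy for (unconstrained) Min Sum Set Cover: repeatedly pick the
    set covering the most uncovered elements; each element pays the step
    at which it is first covered. A 4-approximation for MSSC."""
    uncovered = set(universe)
    cost, step = 0, 0
    while uncovered:
        step += 1
        best = max(sets, key=lambda s: len(s & uncovered))
        newly = best & uncovered
        if not newly:
            break  # remaining elements appear in no set
        cost += step * len(newly)
        uncovered -= newly
    return cost

print(greedy_mssc_cost([{1, 2, 3}, {3, 4}, {4, 5}], universe={1, 2, 3, 4, 5}))  # 7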

Cite as

Jessica McClintock, Julián Mestre, and Anthony Wirth. Precedence-Constrained Min Sum Set Cover. In 28th International Symposium on Algorithms and Computation (ISAAC 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 92, pp. 55:1-55:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


BibTeX

@InProceedings{mcclintock_et_al:LIPIcs.ISAAC.2017.55,
  author =	{McClintock, Jessica and Mestre, Juli\'{a}n and Wirth, Anthony},
  title =	{{Precedence-Constrained Min Sum Set Cover}},
  booktitle =	{28th International Symposium on Algorithms and Computation (ISAAC 2017)},
  pages =	{55:1--55:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-054-5},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{92},
  editor =	{Okamoto, Yoshio and Tokuyama, Takeshi},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2017.55},
  URN =		{urn:nbn:de:0030-drops-82648},
  doi =		{10.4230/LIPIcs.ISAAC.2017.55},
  annote =	{Keywords: planted dense subgraph, min sum set cover, precedence constrained}
}
Document
On Approximating Target Set Selection

Authors: Moses Charikar, Yonatan Naamad, and Anthony Wirth

Published in: LIPIcs, Volume 60, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016)


Abstract
We study the Target Set Selection (TSS) problem introduced by Kempe, Kleinberg, and Tardos (2003). This problem models the propagation of influence in a network, over a sequence of rounds. A set of nodes is made "active" initially. In each subsequent round, a vertex is activated if at least a certain number of its neighbors are (already) active. In the minimization version, the goal is to activate a small set of vertices initially - a seed, or target, set - so that activation spreads to the entire graph. In the absence of a sublinear-factor algorithm for the general version, we provide a (sublinear) approximation algorithm for the bounded-round version, where the goal is to activate all the vertices within r rounds. Assuming a known conjecture on the hardness of Planted Dense Subgraph, we establish hardness-of-approximation results for the bounded-round version. We show that they translate to general Target Set Selection, leading to a hardness factor of n^{1/2-ε} for all ε > 0. This is the first polynomial hardness result for Target Set Selection, and the strongest conditional result known for a large class of monotone satisfiability problems. In the maximization version of TSS, the goal is to pick a target set of size k so as to maximize the number of nodes eventually active. We show an n^{1-ε} hardness result for the undirected maximization version of the problem, thus establishing that the undirected case is as hard as the directed case. Finally, we demonstrate an SETH lower bound for the exact computation of the optimal seed set.
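
The propagation process described above is easy to pin down in code; a sketch of round-by-round threshold activation (function and variable names are ours):

def activate(neighbors, thresholds, seed, rounds=None):
    """Threshold propagation: a vertex activates once at least
    thresholds[v] of its neighbours are active. Returns the final
    active set, optionally capped at a number of rounds r."""
    active = set(seed)
    r = 0
    while rounds is None or r < rounds:
        newly = {v for v in neighbors
                 if v not in active
                 and sum(u in active for u in neighbors[v]) >= thresholds[v]}
        if not newly:
            break
        active |= newly
        r += 1
    return active

graph = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
print(activate(graph, {1: 1, 2: 1, 3: 2, 4: 1}, seed={1}))  # {1, 2, 3, 4}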

Cite as

Moses Charikar, Yonatan Naamad, and Anthony Wirth. On Approximating Target Set Selection. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 60, pp. 4:1-4:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


BibTeX

@InProceedings{charikar_et_al:LIPIcs.APPROX-RANDOM.2016.4,
  author =	{Charikar, Moses and Naamad, Yonatan and Wirth, Anthony},
  title =	{{On Approximating Target Set Selection}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016)},
  pages =	{4:1--4:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-018-7},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{60},
  editor =	{Jansen, Klaus and Mathieu, Claire and Rolim, Jos\'{e} D. P. and Umans, Chris},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2016.4},
  URN =		{urn:nbn:de:0030-drops-66274},
  doi =		{10.4230/LIPIcs.APPROX-RANDOM.2016.4},
  annote =	{Keywords: target set selection, influence propagation, approximation algorithms, hardness of approximation, planted dense subgraph}
}