DROPS

Document

DOI: 10.4230/LIPIcs.CPM.2024.16

Exploiting New Properties of String Net Frequency for Efficient Computation

Authors: Peaker Guo, Patrick Eades, Anthony Wirth, and Justin Zobel

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)

Abstract

Knowing which strings in a massive text are significant (that is, which strings are common and distinct from other strings) is valuable for several applications, including text compression and tokenization. Frequency in itself is not helpful for significance, because the commonest strings are the shortest strings. A compelling alternative is net frequency, which has the property that strings with positive net frequency are of maximal length. However, net frequency remains relatively unexplored, and there is no prior art showing how to compute it efficiently. We first introduce a characteristic of net frequency that simplifies the original definition. With this, we study strings with positive net frequency in Fibonacci words. We then use our characteristic to solve two key problems related to net frequency. First, single-nf: how to compute the net frequency of a given string of length m, in an input text of length n over an alphabet of size σ. Second, all-nf: given a length-n input text, how to report every string of positive net frequency (and its net frequency). Our methods leverage suffix arrays, components of the Burrows-Wheeler transform, and a solution to the coloured range listing problem. We show that, for both problems, our data structure has O(n) construction cost: with this structure, we solve single-nf in O(m + σ) time and all-nf in O(n) time. Experimentally, we find our method to be around 100 times faster than reasonable baselines for single-nf. For all-nf, our results show that, even with prior knowledge of the set of strings with positive net frequency, simply confirming that their net frequency is positive takes longer than with our purpose-designed method. All in all, we show that net frequency is a cogent method for identifying significant strings. We show how to calculate net frequency efficiently, and how to report efficiently the set of plausibly significant strings.

Cite as

Peaker Guo, Patrick Eades, Anthony Wirth, and Justin Zobel. Exploiting New Properties of String Net Frequency for Efficient Computation. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 16:1-16:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@InProceedings{guo_et_al:LIPIcs.CPM.2024.16,
  author =	{Guo, Peaker and Eades, Patrick and Wirth, Anthony and Zobel, Justin},
  title =	{{Exploiting New Properties of String Net Frequency for Efficient Computation}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{16:1--16:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.16},
  URN =		{urn:nbn:de:0030-drops-201265},
  doi =		{10.4230/LIPIcs.CPM.2024.16},
  annote =	{Keywords: Fibonacci words, suffix arrays, Burrows-Wheeler transform, LCP arrays, irreducible LCP values, coloured range listing}
}

Document

DOI: 10.4230/LIPIcs.ESA.2023.102

Maximum Coverage in Random-Arrival Streams

Authors: Rowan Warneke, Farhana Choudhury, and Anthony Wirth

Published in: LIPIcs, Volume 274, 31st Annual European Symposium on Algorithms (ESA 2023)

Abstract

Given a collection of m sets, each a subset of a universe {1, …, n}, maximum coverage is the problem of choosing k sets whose union has the largest cardinality. A simple greedy algorithm achieves an approximation factor of 1 - 1 / e ≈ 0.632, which is the best possible polynomial-time approximation unless P = NP. In the streaming setting, information about the input is revealed gradually, in an online fashion. In the set-streaming model, each set is listed contiguously in the stream. In the more general edge-streaming model, the stream is composed of set-element pairs, denoting membership. The overall goal in the streaming setting is to design algorithms that use sublinear space in the size of the input. An interesting line of research is to design algorithms with space complexity polylogarithmic in the size of the input (i.e., polylogarithmic in both n and m); we call such algorithms low-space. In the set-streaming model, it is known that 1/2 is the best possible low-space approximation. In the edge-streaming model, no low-space algorithm can achieve a nontrivial approximation factor. We study the problem under the assumption that the order in which the stream arrives is chosen uniformly at random. Our main results are as follows.
- In the random-arrival set-streaming model, we give two new algorithms to show that low space is sufficient to break the 1/2 barrier. The first achieves an approximation factor of 1/2 + c₁ using Õ(k²) space, where c₁ > 0 is a small constant and Õ(⋅) notation suppresses polylogarithmic factors; the second achieves a factor of 1 - 1 / e - ε - o(1) using Õ(k² ε^{-3}) space, where the o(1) term is a function of k. This is essentially the optimal bound, as breaking the 1 - 1 / e barrier is known to require high space.
- In the random-arrival edge-streaming model, we show that, for all fixed α > 0 and δ > 0, any algorithm that α-approximates maximum coverage with probability at least 0.9 requires Ω(m^{1-δ}) space (i.e., high space), even for the special case of k = 1.
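
The simple greedy algorithm mentioned at the start of the abstract can be sketched as follows. This is the textbook offline greedy, which holds all sets in memory; it is not one of the streaming algorithms of the paper, and the function name and input layout are illustrative assumptions.

```python
from typing import Dict, Hashable, List, Set, Tuple


def greedy_max_coverage(
    sets: Dict[Hashable, Set[int]], k: int
) -> Tuple[List[Hashable], Set[int]]:
    """Pick up to k sets, each time taking the one covering the most new elements."""
    covered: Set[int] = set()
    chosen: List[Hashable] = []
    remaining = dict(sets)  # shallow copy so the caller's dict is not modified
    for _ in range(k):
        if not remaining:
            break
        # Marginal gain of a set = number of its elements not yet covered.
        best = max(remaining, key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # no remaining set adds anything new
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen, covered


if __name__ == "__main__":
    example = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {5, 6}, "D": {1, 6}}
    print(greedy_max_coverage(example, 2))
```

In the example, the first pick is A (four new elements) and the second is C (two new elements, against one each for B and D), so two sets cover the whole universe {1, …, 6}.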

Cite as

Rowan Warneke, Farhana Choudhury, and Anthony Wirth. Maximum Coverage in Random-Arrival Streams. In 31st Annual European Symposium on Algorithms (ESA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 274, pp. 102:1-102:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{warneke_et_al:LIPIcs.ESA.2023.102,
  author =	{Warneke, Rowan and Choudhury, Farhana and Wirth, Anthony},
  title =	{{Maximum Coverage in Random-Arrival Streams}},
  booktitle =	{31st Annual European Symposium on Algorithms (ESA 2023)},
  pages =	{102:1--102:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-295-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{274},
  editor =	{G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.102},
  URN =		{urn:nbn:de:0030-drops-187559},
  doi =		{10.4230/LIPIcs.ESA.2023.102},
  annote =	{Keywords: Maximum Coverage, Streaming Algorithm, Random Arrival, Greedy Algorithm, Communication Complexity}
}

Document

DOI: 10.4230/LIPIcs.SEA.2023.21

Maximum Coverage in Sublinear Space, Faster

Authors: Stephen Jaud, Anthony Wirth, and Farhana Choudhury

Published in: LIPIcs, Volume 265, 21st International Symposium on Experimental Algorithms (SEA 2023)

Abstract

Given a collection of m sets from a universe 𝒰, the Maximum Set Coverage problem consists of finding k sets whose union has largest cardinality. This problem is NP-Hard, but the solution can be approximated by a polynomial time algorithm up to a factor 1-1/e. However, this algorithm does not scale well with the input size. In a streaming context, practical high-quality solutions are found, but with space complexity that scales linearly with respect to the size of the universe n = |𝒰|. However, one randomized streaming algorithm has been shown to produce a 1-1/e-ε approximation of the optimal solution with a space complexity that scales only poly-logarithmically with respect to m and n. In order to achieve such a low space complexity, the authors used two techniques in their multi-pass approach:
- F₀-sketching, which allows the number of distinct elements in a set to be determined with great accuracy using less space than the set itself.
- Subsampling, which consists of solving the problem on only a subspace of the universe. It is implemented using γ-independent hash functions.
This article focuses on the sublinear-space algorithm and highlights the time cost of these two techniques, especially subsampling. We present optimizations that significantly reduce the time complexity of the algorithm. Firstly, we give some optimizations that do not alter the space complexity, number of passes and approximation quality of the original algorithm. In particular, we reanalyze the error bounds to show that the original independence factor of Ω(ε^{-2} k log m) can be fine-tuned to Ω(k log m); we also show how F₀-sketching can be removed. Secondly, we derive a new lower bound for the probability of producing a 1-1/e-ε approximation using only pairwise independence: 1- (4/(c k log m)) compared to 1-(2e/(m^{ck/6})) with Ω(k log m)-independence. Although the theoretical guarantees are weaker, suggesting the approximation quality would suffer, for large streams, our algorithms perform well in practice. Finally, our experimental results show that even a pairwise-independent hash-function sampler does not produce worse solutions than the original algorithm, while running several orders of magnitude faster.
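
To make the subsampling idea concrete, here is a minimal sketch of a pairwise-independent sampler built from the classic family h(x) = (a·x + b) mod p. It is a generic illustration and not the paper's implementation: the prime, the sampling rate, and all function names are assumptions, and element identifiers are assumed to be non-negative integers smaller than the prime.

```python
import random
from typing import Callable, Dict, Hashable, Set

# Assumption: every element id is a non-negative integer below this prime.
MERSENNE_PRIME = (1 << 61) - 1


def make_pairwise_sampler(rate: float, rng: random.Random) -> Callable[[int], bool]:
    """Return a predicate that keeps each element with probability about `rate`.

    The decision for x depends only on h(x) = (a*x + b) mod p, so the same
    element is always kept or always dropped, whichever set it appears in.
    """
    a = rng.randrange(MERSENNE_PRIME)
    b = rng.randrange(MERSENNE_PRIME)
    threshold = int(rate * MERSENNE_PRIME)

    def keep(x: int) -> bool:
        return (a * x + b) % MERSENNE_PRIME < threshold

    return keep


def subsample_sets(
    sets: Dict[Hashable, Set[int]], keep: Callable[[int], bool]
) -> Dict[Hashable, Set[int]]:
    """Restrict every set to the sampled part of the universe."""
    return {name: {x for x in s if keep(x)} for name, s in sets.items()}


if __name__ == "__main__":
    keep_half = make_pairwise_sampler(0.5, random.Random(2023))
    print(subsample_sets({"A": {1, 2, 3, 4}, "B": {3, 4, 5}}, keep_half))
```

Only the two coefficients a and b need to be stored, which is why a hash-based sampler fits a low-space setting: no per-element record of the sample is kept.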

Cite as

Stephen Jaud, Anthony Wirth, and Farhana Choudhury. Maximum Coverage in Sublinear Space, Faster. In 21st International Symposium on Experimental Algorithms (SEA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 265, pp. 21:1-21:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{jaud_et_al:LIPIcs.SEA.2023.21,
  author =	{Jaud, Stephen and Wirth, Anthony and Choudhury, Farhana},
  title =	{{Maximum Coverage in Sublinear Space, Faster}},
  booktitle =	{21st International Symposium on Experimental Algorithms (SEA 2023)},
  pages =	{21:1--21:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-279-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{265},
  editor =	{Georgiadis, Loukas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2023.21},
  URN =		{urn:nbn:de:0030-drops-183715},
  doi =		{10.4230/LIPIcs.SEA.2023.21},
  annote =	{Keywords: streaming algorithms, subsampling, maximum set cover, k-wise independent hash functions}
}

Document

DOI: 10.4230/LIPIcs.SWAT.2022.25

An Almost Optimal Algorithm for Unbounded Search with Noisy Information

Authors: Junhao Gan, Anthony Wirth, and Xin Zhang

Published in: LIPIcs, Volume 227, 18th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2022)

Abstract

Given a sequence of integers, 𝒮 = s₁, s₂,… in ascending order, called the search domain, and an integer t, called the target, the predecessor problem asks for the target index N such that s_N is the largest integer in 𝒮 satisfying s_N ≤ t. We consider solving the predecessor problem with the least number of queries to a binary comparison oracle. For each query index i, the oracle returns whether s_i ≤ t or s_i > t. In particular, we study the predecessor problem under the UnboundedNoisy setting, where (i) the search domain 𝒮 is unbounded, i.e., n = |𝒮| is unknown or infinite, and (ii) the binary comparison oracle is noisy. We denote the former setting by Unbounded and the latter by Noisy. In Noisy, the oracle, for each query, independently returns a wrong answer with a fixed constant probability 0 < p < 1/2. In particular, even for two queries on the same index i, the answers from the oracle may be different. Furthermore, with a noisy oracle, the goal is to correctly return the target index with probability at least 1- Q, where 0 < Q < 1/2 is the failure probability. Our first result is an algorithm, called NoS, for Noisy that improves the previous result by Ben-Or and Hassidim [FOCS 2008] from an expected query complexity bound to a worst-case bound. We also achieve an expected query complexity bound, whose leading term has an optimal constant factor, matching the lower bound of Ben-Or and Hassidim. Building on NoS, we propose our NoSU algorithm, which correctly solves the predecessor problem in the UnboundedNoisy setting. We prove that the query complexity of NoSU is ∑_{i = 1}^k (log^{(i)} N) /(1-H(p))+ o(log N) when log Q^{-1} ∈ o(log N), where N is the target index, k = log^* N, the iterated logarithm, and H(p) is the entropy function. This improves the previous bound of O(log (N/Q) / (1-H(p))) by reducing the coefficient of the leading term from a large constant to 1. 
Moreover, we show that this upper bound can be further improved to (1 - Q) ∑_{i = 1}^k (log^{(i)} N) /(1-H(p))+ o(log N) in expectation, with the constant in the leading term reduced to 1 - Q. Finally, we show an information-theoretic lower bound of (1 - Q)(∑_{i = 1}^k log^{(i)} N - 2k)/(1-H(p)) - 10 on the expected query cost of the predecessor problem in UnboundedNoisy. This implies that the constant factor in the leading term of our expected upper bound is indeed optimal.
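
For orientation, the Unbounded setting without noise can be handled by the classical doubling-then-binary-search strategy, sketched below. This is a simple baseline and not the NoS or NoSU algorithm of the paper; the oracle is assumed to be noiseless and monotone, and the function name is hypothetical.

```python
from typing import Callable


def unbounded_predecessor(le_target: Callable[[int], bool]) -> int:
    """Return the largest index N >= 1 with le_target(N) true, or 0 if none.

    le_target(i) answers "is s_i <= t?". It is assumed to be noiseless,
    true on a prefix of the indices, and false from some index onwards.
    """
    if not le_target(1):
        return 0
    lo, hi = 1, 2  # invariant: le_target(lo) is true
    while le_target(hi):  # doubling phase: overshoot the target index
        lo, hi = hi, hi * 2
    # Binary search phase: le_target(lo) is true and le_target(hi) is false.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if le_target(mid):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Search domain s_i = 3 * i (unbounded), target t = 100.
    print(unbounded_predecessor(lambda i: 3 * i <= 100))
```

With s_i = 3i and t = 100, the doubling phase stops at index 64 and the binary search narrows the interval to N = 33, using a number of queries on the order of 2 log₂ N.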

Cite as

Junhao Gan, Anthony Wirth, and Xin Zhang. An Almost Optimal Algorithm for Unbounded Search with Noisy Information. In 18th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 227, pp. 25:1-25:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

@InProceedings{gan_et_al:LIPIcs.SWAT.2022.25,
  author =	{Gan, Junhao and Wirth, Anthony and Zhang, Xin},
  title =	{{An Almost Optimal Algorithm for Unbounded Search with Noisy Information}},
  booktitle =	{18th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2022)},
  pages =	{25:1--25:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-236-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{227},
  editor =	{Czumaj, Artur and Xin, Qin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SWAT.2022.25},
  URN =		{urn:nbn:de:0030-drops-161854},
  doi =		{10.4230/LIPIcs.SWAT.2022.25},
  annote =	{Keywords: Fault-tolerant search, noisy binary search, query complexity}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2020.49

Recency Queries with Succinct Representation

Authors: William L. Holland, Anthony Wirth, and Justin Zobel

Published in: LIPIcs, Volume 181, 31st International Symposium on Algorithms and Computation (ISAAC 2020)

Abstract

In the context of the sliding-window set membership problem, and caching policies that require knowledge of item recency, we formalize the problem of Recency on a stream. Informally, the query asks, "when was the last time I saw item x?" Existing structures, such as hash tables, can support a recency query by augmenting item occurrences with timestamps. To support recency queries on a window of W items, this might require Θ(W log W) bits. We propose a succinct data structure for Recency. By combining sliding-window dictionaries in a hierarchical structure, and careful design of the underlying hash tables, we achieve a data structure that returns a (1+ε)-approximation to the recency of every item in O(log(ε W)) time, in only (1+o(1))(1+ε)(ℬ + W log(ε^{-1})) bits. Here, ℬ is the information-theoretic lower bound on the number of bits for a set of size W, in a universe of cardinality N.
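
The timestamped hash table that the abstract uses as its point of comparison might look like the sketch below. It is the exact, non-succinct baseline and not the proposed data structure; the class name, the method names, and the convention that recency counts arrivals since the last occurrence are assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class TimestampRecency:
    """Exact recency over the last `window` stream items, via a timestamped hash table."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.now = 0  # number of items seen so far
        self.last_seen: "OrderedDict[Hashable, int]" = OrderedDict()

    def insert(self, item: Hashable) -> None:
        self.now += 1
        self.last_seen.pop(item, None)  # re-inserting moves the item to the end
        self.last_seen[item] = self.now
        # Evict items whose latest occurrence has left the window.
        while self.last_seen:
            oldest, t = next(iter(self.last_seen.items()))
            if t > self.now - self.window:
                break
            del self.last_seen[oldest]

    def recency(self, item: Hashable) -> Optional[int]:
        """Arrivals since `item` was last seen (0 = most recent), or None if not in the window."""
        t = self.last_seen.get(item)
        return None if t is None else self.now - t


if __name__ == "__main__":
    r = TimestampRecency(window=3)
    for x in "abac":
        r.insert(x)
    print(r.recency("c"), r.recency("a"), r.recency("b"))
```

Each stored timestamp needs about log W bits once it is kept relative to the window, which is where the Θ(W log W) figure for this baseline comes from.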

Cite as

William L. Holland, Anthony Wirth, and Justin Zobel. Recency Queries with Succinct Representation. In 31st International Symposium on Algorithms and Computation (ISAAC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 181, pp. 49:1-49:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@InProceedings{holland_et_al:LIPIcs.ISAAC.2020.49,
  author =	{Holland, William L. and Wirth, Anthony and Zobel, Justin},
  title =	{{Recency Queries with Succinct Representation}},
  booktitle =	{31st International Symposium on Algorithms and Computation (ISAAC 2020)},
  pages =	{49:1--49:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-173-3},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{181},
  editor =	{Cao, Yixin and Cheng, Siu-Wing and Li, Minming},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2020.49},
  URN =		{urn:nbn:de:0030-drops-133931},
  doi =		{10.4230/LIPIcs.ISAAC.2020.49},
  annote =	{Keywords: Succinct Data Structures, Data Streams, Sliding Dictionary}
}

Document

DOI: 10.4230/LIPIcs.MFCS.2020.39

Graph Clustering in All Parameter Regimes

Authors: Junhao Gan, David F. Gleich, Nate Veldt, Anthony Wirth, and Xin Zhang

Published in: LIPIcs, Volume 170, 45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020)

Abstract

Resolution parameters in graph clustering control the size and structure of clusters formed by solving a parametric objective function. Typically there is more than one meaningful way to cluster a graph, and solving the same objective function for different resolution parameters produces clusterings at different levels of granularity, each of which can be meaningful depending on the application. In this paper, we address the task of efficiently solving a parameterized graph clustering objective for all values of a resolution parameter. Specifically, we consider a new analysis-friendly objective we call LambdaPrime, involving a parameter λ ∈ (0,1). LambdaPrime is an adaptation of LambdaCC, a significant family of instances of the Correlation Clustering (minimization) problem. Indeed, LambdaPrime and LambdaCC are closely related to other parameterized clustering problems, such as parametric generalizations of modularity. They capture a number of specific clustering problems as special cases, including sparsest cut and cluster deletion. While previous work provides approximation results for a single value of the resolution parameter, we seek a set of approximately optimal clusterings for all values of λ in polynomial time. More specifically, we show that when a graph has m edges and n nodes, there exists a set of at most m clusterings such that, for every λ ∈ (0,1), the family contains an optimal solution to the LambdaPrime objective. This bound is tight on star graphs. We obtain a family of O(log n) clusterings by solving the parametric linear programming (LP) relaxation of LambdaPrime at O(log n) λ values, and rounding each LP solution using existing approximation algorithms. We prove that this is asymptotically tight: for a certain class of ring graphs, for all values of λ, Ω(log n) feasible solutions are required to provide a constant-factor approximation for the LambdaPrime LP relaxation. 
To minimize the size of the clustering family, we further propose an algorithm that yields a family of solutions whose size is at most twice that of the minimum LP-approximating family.

Cite as

Junhao Gan, David F. Gleich, Nate Veldt, Anthony Wirth, and Xin Zhang. Graph Clustering in All Parameter Regimes. In 45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 170, pp. 39:1-39:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@InProceedings{gan_et_al:LIPIcs.MFCS.2020.39,
  author =	{Gan, Junhao and Gleich, David F. and Veldt, Nate and Wirth, Anthony and Zhang, Xin},
  title =	{{Graph Clustering in All Parameter Regimes}},
  booktitle =	{45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020)},
  pages =	{39:1--39:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-159-7},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{170},
  editor =	{Esparza, Javier and Kr\'{a}l', Daniel},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2020.39},
  URN =		{urn:nbn:de:0030-drops-127065},
  doi =		{10.4230/LIPIcs.MFCS.2020.39},
  annote =	{Keywords: Graph Clustering, Algorithms, Parametric Linear Programming}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2019.60

Result-Sensitive Binary Search with Noisy Information

Authors: Narthana S. Epa, Junhao Gan, and Anthony Wirth

Published in: LIPIcs, Volume 149, 30th International Symposium on Algorithms and Computation (ISAAC 2019)

Abstract

We describe new algorithms for the predecessor problem in the Noisy Comparison Model. In this problem, given a sorted list L of n (distinct) elements and a query q, we seek the predecessor of q in L: denoted by u, the largest element less than or equal to q. In the Noisy Comparison Model, the result of a comparison between two elements is non-deterministic. Moreover, multiple comparisons of the same pair of elements might have different results: each is generated independently, and is correct with probability p > 1/2. Given an overall error tolerance Q, the cost of an algorithm is measured by the total number of noisy comparisons; these must guarantee the predecessor is returned with probability at least 1 - Q. Feige et al. showed that predecessor queries can be answered by a modified binary search with Θ(log(n/Q)) noisy comparisons. We design result-sensitive algorithms for answering predecessor queries. The query cost is related to the index, k, of the predecessor u in L. Our first algorithm answers predecessor queries with O(log((log^{*(c)} n)/Q) + log(k/Q)) noisy comparisons, for an arbitrarily large constant c. The function log^{*(c)} n iterates c times the iterated-logarithm function, log^* n. Our second algorithm is a genuinely result-sensitive algorithm whose expected query cost is bounded by O(log(k/Q)), and is guaranteed to terminate after at most O(log((log n)/Q)) noisy comparisons. Our results strictly improve the state-of-the-art bounds when k ∈ ω(1) ∩ o(n^ε), where ε > 0 is some constant. Moreover, we show that our result-sensitive algorithms immediately improve not only predecessor-query algorithms, but also binary-search-like algorithms for solving key applications.
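
As a point of reference for the Noisy Comparison Model, the sketch below runs an ordinary binary search in which every comparison is repeated and decided by majority vote. This naive baseline spends `repeats` comparisons per step, so it is costlier than the bounds quoted above, and it is not one of the paper's algorithms; all names and the default number of repeats are illustrative assumptions.

```python
import random
from typing import Callable, Optional, Sequence


def make_noisy_le(p_correct: float, rng: random.Random) -> Callable[[int, int], bool]:
    """Noisy test of `a <= b`: each call is independently correct with probability p_correct."""

    def noisy_le(a: int, b: int) -> bool:
        truth = a <= b
        return truth if rng.random() < p_correct else not truth

    return noisy_le


def majority_le(noisy_le: Callable[[int, int], bool], a: int, b: int, repeats: int) -> bool:
    """Majority vote over `repeats` (ideally odd) independent noisy comparisons."""
    votes = sum(noisy_le(a, b) for _ in range(repeats))
    return 2 * votes > repeats


def noisy_predecessor(
    items: Sequence[int],
    q: int,
    noisy_le: Callable[[int, int], bool],
    repeats: int = 31,
) -> Optional[int]:
    """Index of the largest element <= q in sorted `items`, or None if there is none."""
    lo, hi = -1, len(items)  # if every vote is right: items[lo] <= q < items[hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if majority_le(noisy_le, items[mid], q, repeats):
            lo = mid
        else:
            hi = mid
    return lo if lo >= 0 else None


if __name__ == "__main__":
    oracle = make_noisy_le(0.8, random.Random(2019))
    print(noisy_predecessor([2, 5, 8, 13, 21], 9, oracle))
```

With p = 0.8 and 31 repeats a single majority vote is wrong only rarely, but the result is still probabilistic; choosing the number of repeats from Q and n is what a full analysis would have to supply.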

Cite as

Narthana S. Epa, Junhao Gan, and Anthony Wirth. Result-Sensitive Binary Search with Noisy Information. In 30th International Symposium on Algorithms and Computation (ISAAC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 149, pp. 60:1-60:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{epa_et_al:LIPIcs.ISAAC.2019.60,
  author =	{Epa, Narthana S. and Gan, Junhao and Wirth, Anthony},
  title =	{{Result-Sensitive Binary Search with Noisy Information}},
  booktitle =	{30th International Symposium on Algorithms and Computation (ISAAC 2019)},
  pages =	{60:1--60:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-130-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{149},
  editor =	{Lu, Pinyan and Zhang, Guochuan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2019.60},
  URN =		{urn:nbn:de:0030-drops-115568},
  doi =		{10.4230/LIPIcs.ISAAC.2019.60},
  annote =	{Keywords: Fault-tolerant search, random walks, noisy comparisons, predecessor queries}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2018.44

Correlation Clustering Generalized

Authors: David F. Gleich, Nate Veldt, and Anthony Wirth

Published in: LIPIcs, Volume 123, 29th International Symposium on Algorithms and Computation (ISAAC 2018)

Abstract

We present new results for LambdaCC and MotifCC, two recently introduced variants of the well-studied correlation clustering problem. Both variants are motivated by applications to network analysis and community detection, and have non-trivial approximation algorithms. We first show that the standard linear programming relaxation of LambdaCC has a Θ(log n) integrality gap for a certain choice of the parameter λ. This sheds light on previous challenges encountered in obtaining parameter-independent approximation results for LambdaCC. We generalize a previous constant-factor algorithm to provide the best results, from the LP-rounding approach, for an extended range of λ. MotifCC generalizes correlation clustering to the hypergraph setting. In the case of hyperedges of degree 3 with weights satisfying probability constraints, we improve the best approximation factor from 9 to 8. We show that in general our algorithm gives a 4(k-1)-approximation when hyperedges have maximum degree k and probability weights. We additionally present approximation results for LambdaCC and MotifCC where we restrict to forming only two clusters.

Cite as

David F. Gleich, Nate Veldt, and Anthony Wirth. Correlation Clustering Generalized. In 29th International Symposium on Algorithms and Computation (ISAAC 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 123, pp. 44:1-44:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

@InProceedings{gleich_et_al:LIPIcs.ISAAC.2018.44,
  author =	{Gleich, David F. and Veldt, Nate and Wirth, Anthony},
  title =	{{Correlation Clustering Generalized}},
  booktitle =	{29th International Symposium on Algorithms and Computation (ISAAC 2018)},
  pages =	{44:1--44:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-094-1},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{123},
  editor =	{Hsu, Wen-Lian and Lee, Der-Tsai and Liao, Chung-Shou},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2018.44},
  URN =		{urn:nbn:de:0030-drops-99925},
  doi =		{10.4230/LIPIcs.ISAAC.2018.44},
  annote =	{Keywords: Correlation Clustering, Approximation Algorithms}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2017.55

Precedence-Constrained Min Sum Set Cover

Authors: Jessica McClintock, Julián Mestre, and Anthony Wirth

Published in: LIPIcs, Volume 92, 28th International Symposium on Algorithms and Computation (ISAAC 2017)

Abstract

We introduce a version of the Min Sum Set Cover (MSSC) problem in which there are "AND" precedence constraints on the m sets. In the Precedence-Constrained Min Sum Set Cover (PCMSSC) problem, when interpreted as directed edges, the constraints induce an acyclic directed graph. PCMSSC models the aim of scheduling software tests to prioritize the rate of fault detection subject to dependencies between tests. Our greedy scheme for PCMSSC is similar to the approaches of Feige, Lovász, and Tetali for MSSC, and Chekuri and Motwani for precedence-constrained scheduling to minimize weighted completion time. With a factor-4 increase in approximation ratio, we reduce PCMSSC to the problem of finding a maximum-density precedence-closed sub-family of sets, where density is the ratio of sub-family union size to cardinality. We provide a greedy factor-√m algorithm for maximizing density; on forests of in-trees, we show this algorithm finds an optimal solution. Harnessing an alternative greedy argument of Chekuri and Kumar for Maximum Coverage with Group Budget Constraints, on forests of out-trees, we design an algorithm with approximation ratio equal to maximum tree height. Finally, with a reduction from the Planted Dense Subgraph detection problem, we show that its conjectured hardness implies there is no polynomial-time algorithm for PCMSSC with approximation factor in O(m^{1/12-ε}).

Cite as

Jessica McClintock, Julián Mestre, and Anthony Wirth. Precedence-Constrained Min Sum Set Cover. In 28th International Symposium on Algorithms and Computation (ISAAC 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 92, pp. 55:1-55:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

@InProceedings{mcclintock_et_al:LIPIcs.ISAAC.2017.55,
  author =	{McClintock, Jessica and Mestre, Juli\'{a}n and Wirth, Anthony},
  title =	{{Precedence-Constrained Min Sum Set Cover}},
  booktitle =	{28th International Symposium on Algorithms and Computation (ISAAC 2017)},
  pages =	{55:1--55:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-054-5},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{92},
  editor =	{Okamoto, Yoshio and Tokuyama, Takeshi},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2017.55},
  URN =		{urn:nbn:de:0030-drops-82648},
  doi =		{10.4230/LIPIcs.ISAAC.2017.55},
  annote =	{Keywords: planted dense subgraph, min sum set cover, precedence constrained}
}

Document

DOI: 10.4230/LIPIcs.APPROX-RANDOM.2016.4

On Approximating Target Set Selection

Authors: Moses Charikar, Yonatan Naamad, and Anthony Wirth

Published in: LIPIcs, Volume 60, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016)

Abstract

We study the Target Set Selection (TSS) problem introduced by Kempe, Kleinberg, and Tardos (2003). This problem models the propagation of influence in a network, in a sequence of rounds. A set of nodes is made "active" initially. In each subsequent round, a vertex is activated if at least a certain number of its neighbors are (already) active. In the minimization version, the goal is to activate a small set of vertices initially (a seed, or target, set) so that activation spreads to the entire graph. In the absence of a sublinear-factor algorithm for the general version, we provide a (sublinear) approximation algorithm for the bounded-round version, where the goal is to activate all the vertices in r rounds. Assuming a known conjecture on the hardness of Planted Dense Subgraph, we establish hardness-of-approximation results for the bounded-round version. We show that they translate to general Target Set Selection, leading to a hardness factor of n^{1/2-ε} for all ε > 0. This is the first polynomial hardness result for Target Set Selection, and the strongest conditional result known for a large class of monotone satisfiability problems. In the maximization version of TSS, the goal is to pick a target set of size k so as to maximize the number of nodes eventually active. We show an n^{1-ε} hardness result for the undirected maximization version of the problem, thus establishing that the undirected case is as hard as the directed case. Finally, we demonstrate an SETH lower bound for the exact computation of the optimal seed set.
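
The activation process described above is easy to simulate, which helps when checking whether a candidate seed set activates the whole graph. The sketch below assumes an undirected graph given as adjacency sets and a per-vertex threshold; the function name and the optional round limit (mirroring the bounded-round version) are illustrative choices, not notation from the paper.

```python
from typing import Dict, Hashable, Iterable, Optional, Set


def spread(
    adj: Dict[Hashable, Set[Hashable]],
    threshold: Dict[Hashable, int],
    seeds: Iterable[Hashable],
    max_rounds: Optional[int] = None,
) -> Set[Hashable]:
    """Simulate threshold activation from `seeds` and return the final active set."""
    active = set(seeds)
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        # A vertex is activated if enough neighbours were active before this round.
        newly = {
            v for v in adj
            if v not in active and len(adj[v] & active) >= threshold[v]
        }
        if not newly:
            break  # the process has stabilised
        active |= newly
        rounds += 1
    return active


if __name__ == "__main__":
    path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
    ones = {v: 1 for v in path}
    print(sorted(spread(path, ones, {"a"})))
    print(sorted(spread(path, ones, {"a"}, max_rounds=2)))
```

On the path a-b-c-d with all thresholds equal to 1, the seed {a} activates every vertex after three rounds, but only a, b, and c within two rounds, which is the distinction the bounded-round version turns on.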

Cite as

Moses Charikar, Yonatan Naamad, and Anthony Wirth. On Approximating Target Set Selection. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 60, pp. 4:1-4:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

@InProceedings{charikar_et_al:LIPIcs.APPROX-RANDOM.2016.4,
  author =	{Charikar, Moses and Naamad, Yonatan and Wirth, Anthony},
  title =	{{On Approximating Target Set Selection}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016)},
  pages =	{4:1--4:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-018-7},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{60},
  editor =	{Jansen, Klaus and Mathieu, Claire and Rolim, Jos\'{e} D. P. and Umans, Chris},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2016.4},
  URN =		{urn:nbn:de:0030-drops-66274},
  doi =		{10.4230/LIPIcs.APPROX-RANDOM.2016.4},
  annote =	{Keywords: target set selection, influence propagation, approximation algorithms, hardness of approximation, planted dense subgraph}
}
