DROPS

Document

DOI: 10.4230/LIPIcs.CCC.2024.2

Streaming Zero-Knowledge Proofs

Authors: Graham Cormode, Marcel Dall'Agnol, Tom Gur, and Chris Hickey

Published in: LIPIcs, Volume 300, 39th Computational Complexity Conference (CCC 2024)

Abstract

Streaming interactive proofs (SIPs) enable a space-bounded algorithm with one-pass access to a massive stream of data to verify a computation that requires large space, by communicating with a powerful but untrusted prover. This work initiates the study of zero-knowledge proofs for data streams. We define the notion of zero-knowledge in the streaming setting and construct zero-knowledge SIPs for the two main algorithmic building blocks in the streaming interactive proofs literature: the sumcheck and polynomial evaluation protocols. To the best of our knowledge all known streaming interactive proofs are based on either of these tools, and indeed, this allows us to obtain zero-knowledge SIPs for central streaming problems such as index, point and range queries, median, frequency moments, and inner product. Our protocols are efficient in terms of time and space, as well as communication: the verifier algorithm’s space complexity is polylog(n) and, after a non-interactive setup that uses a random string of near-linear length, the remaining parameters are n^o(1). En route, we develop an algorithmic toolkit for designing zero-knowledge data stream protocols, consisting of an algebraic streaming commitment protocol and a temporal commitment protocol. Our analyses rely on delicate algebraic and information-theoretic arguments and reductions from average-case communication complexity.

Cite as

Graham Cormode, Marcel Dall'Agnol, Tom Gur, and Chris Hickey. Streaming Zero-Knowledge Proofs. In 39th Computational Complexity Conference (CCC 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 300, pp. 2:1-2:66, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{cormode_et_al:LIPIcs.CCC.2024.2,
  author =	{Cormode, Graham and Dall'Agnol, Marcel and Gur, Tom and Hickey, Chris},
  title =	{{Streaming Zero-Knowledge Proofs}},
  booktitle =	{39th Computational Complexity Conference (CCC 2024)},
  pages =	{2:1--2:66},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-331-7},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{300},
  editor =	{Santhanam, Rahul},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CCC.2024.2},
  URN =		{urn:nbn:de:0030-drops-203988},
  doi =		{10.4230/LIPIcs.CCC.2024.2},
  annote =	{Keywords: Zero-knowledge proofs, streaming algorithms, computational complexity}
}

Document

DOI: 10.4230/LIPIcs.CCC.2024.7

Polynomial Pass Semi-Streaming Lower Bounds for K-Cores and Degeneracy

Authors: Sepehr Assadi, Prantar Ghosh, Bruno Loff, Parth Mittal, and Sagnik Mukhopadhyay

Published in: LIPIcs, Volume 300, 39th Computational Complexity Conference (CCC 2024)

Abstract

The following question arises naturally in the study of graph streaming algorithms: Is there any graph problem which is "not too hard", in that it can be solved efficiently with total communication (nearly) linear in the number n of vertices, and for which, nonetheless, any streaming algorithm with Õ(n) space (i.e., a semi-streaming algorithm) needs a polynomial n^Ω(1) number of passes? Assadi, Chen, and Khanna [STOC 2019] were the first to prove that this is indeed the case. However, the lower bounds that they obtained are for rather non-standard graph problems. Our first main contribution is to present the first polynomial-pass lower bounds for natural "not too hard" graph problems studied previously in the streaming model: k-cores and degeneracy. We devise a novel communication protocol for both problems with near-linear communication, thus showing that k-cores and degeneracy are natural examples of "not too hard" problems. Indeed, previous work have developed single-pass semi-streaming algorithms for approximating these problems. In contrast, we prove that any semi-streaming algorithm for exactly solving these problems requires (almost) Ω(n^{1/3}) passes. The lower bound follows by a reduction from a generalization of the hidden pointer chasing (HPC) problem of Assadi, Chen, and Khanna, which is also the basis of their earlier semi-streaming lower bounds. Our second main contribution is improved round-communication lower bounds for the underlying communication problems at the basis of these reductions: - We improve the previous lower bound of Assadi, Chen, and Khanna for HPC to achieve optimal bounds for this problem. - We further observe that all current reductions from HPC can also work with a generalized version of this problem that we call MultiHPC, and prove an even stronger and optimal lower bound for this generalization. These two results collectively allow us to improve the resulting pass lower bounds for semi-streaming algorithms by a polynomial factor, namely, from n^{1/5} to n^{1/3} passes.

Cite as

Sepehr Assadi, Prantar Ghosh, Bruno Loff, Parth Mittal, and Sagnik Mukhopadhyay. Polynomial Pass Semi-Streaming Lower Bounds for K-Cores and Degeneracy. In 39th Computational Complexity Conference (CCC 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 300, pp. 7:1-7:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{assadi_et_al:LIPIcs.CCC.2024.7,
  author =	{Assadi, Sepehr and Ghosh, Prantar and Loff, Bruno and Mittal, Parth and Mukhopadhyay, Sagnik},
  title =	{{Polynomial Pass Semi-Streaming Lower Bounds for K-Cores and Degeneracy}},
  booktitle =	{39th Computational Complexity Conference (CCC 2024)},
  pages =	{7:1--7:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-331-7},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{300},
  editor =	{Santhanam, Rahul},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CCC.2024.7},
  URN =		{urn:nbn:de:0030-drops-204035},
  doi =		{10.4230/LIPIcs.CCC.2024.7},
  annote =	{Keywords: Graph streaming, Lower bounds, Communication complexity, k-Cores and degeneracy}
}

Document

DOI: 10.4230/LIPIcs.SEA.2024.12

Streaming Matching and Edge Cover in Practice

Authors: S M Ferdous, Alex Pothen, and Mahantesh Halappanavar

Published in: LIPIcs, Volume 301, 22nd International Symposium on Experimental Algorithms (SEA 2024)

Abstract

Graph algorithms with polynomial space and time requirements often become infeasible for massive graphs with billions of edges or more. State-of-the-art approaches therefore employ approximate serial, parallel, and distributed algorithms to tackle these challenges. However, such approaches require storing the entire graph in memory and thus need access to costly computing resources such as clusters and supercomputers. In this paper, we present practical streaming approaches for solving massive graph problems using limited memory for two prototypical graph problems: maximum weighted matching and minimum weighted edge cover. For matching, we conduct a thorough computational study on two of the semi-streaming algorithms including a recent breakthrough result that achieves a 1/(2+ε)-approximation of the weight while using O(n log W /ε) memory (here n is the number of vertices and W is the maximum edge weight), designed by Paz and Schwartzman [SODA, 2017]. Empirically, we show that the semi-streaming algorithms produce matchings whose weight is close to the best 1/2-approximate offline algorithm while requiring less time and an order-of-magnitude less memory. For minimum weighted edge cover, we develop three novel semi-streaming algorithms. Two of these algorithms require a single pass through the input graph, require O(n log n) memory, and provide a 2-approximation guarantee on the objective. We also leverage a relationship between approximate maximum weighted matching and approximate minimum weighted edge cover to develop a two-pass 3/2+ε-approximate algorithm with the memory requirement of Paz and Schwartzman’s semi-streaming matching algorithm. These streaming approaches are compared against the state-of-the-art 3/2-approximate offline algorithm. The semi-streaming matching and the novel edge cover algorithms proposed in this paper can process graphs with several billions of edges in under 30 minutes using 6 GB of memory, which is at least an order of magnitude improvement from the offline (non-streaming) algorithms. For the largest graph, the best alternative offline parallel approximation algorithm (GPA+ROMA) could not finish in three hours even while employing hundreds of processors and 1 TB of memory. We also demonstrate an application of semi-streaming algorithm by computing a matching using linearly bounded memory on intersection graphs derived from three machine learning datasets, while the existing offline algorithms could not complete on one of these datasets since its memory requirement exceeded 1TB.

Cite as

S M Ferdous, Alex Pothen, and Mahantesh Halappanavar. Streaming Matching and Edge Cover in Practice. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 12:1-12:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{ferdous_et_al:LIPIcs.SEA.2024.12,
  author =	{Ferdous, S M and Pothen, Alex and Halappanavar, Mahantesh},
  title =	{{Streaming Matching and Edge Cover in Practice}},
  booktitle =	{22nd International Symposium on Experimental Algorithms (SEA 2024)},
  pages =	{12:1--12:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-325-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{301},
  editor =	{Liberti, Leo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2024.12},
  URN =		{urn:nbn:de:0030-drops-203773},
  doi =		{10.4230/LIPIcs.SEA.2024.12},
  annote =	{Keywords: Matching, Edge Cover, Semi-Streaming Algorithm, Parallel Algorithms, Algorithm Engineering}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.46

On the Streaming Complexity of Expander Decomposition

Authors: Yu Chen, Michael Kapralov, Mikhail Makarov, and Davide Mazzali

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

In this paper we study the problem of finding (ε, ϕ)-expander decompositions of a graph in the streaming model, in particular for dynamic streams of edge insertions and deletions. The goal is to partition the vertex set so that every component induces a ϕ-expander, while the number of inter-cluster edges is only an ε fraction of the total volume. It was recently shown that there exists a simple algorithm to construct a (O(ϕ log n), ϕ)-expander decomposition of an n-vertex graph using Õ(n/ϕ²) bits of space [Filtser, Kapralov, Makarov, ITCS'23]. This result calls for understanding the extent to which a dependence in space on the sparsity parameter ϕ is inherent. We move towards answering this question on two fronts. We prove that a (O(ϕ log n), ϕ)-expander decomposition can be found using Õ(n) space, for every ϕ. At the core of our result is the first streaming algorithm for computing boundary-linked expander decompositions, a recently introduced strengthening of the classical notion [Goranci et al., SODA'21]. The key advantage is that a classical sparsifier [Fung et al., STOC'11], with size independent of ϕ, preserves the cuts inside the clusters of a boundary-linked expander decomposition within a multiplicative error. Notable algorithmic applications use sequences of expander decompositions, in particular one often repeatedly computes a decomposition of the subgraph induced by the inter-cluster edges (e.g., the seminal work of Spielman and Teng on spectral sparsifiers [Spielman, Teng, SIAM Journal of Computing 40(4)], or the recent maximum flow breakthrough [Chen et al., FOCS'22], among others). We prove that any streaming algorithm that computes a sequence of (O(ϕ log n), ϕ)-expander decompositions requires Ω̃(n/ϕ) bits of space, even in insertion only streams.

Cite as

Yu Chen, Michael Kapralov, Mikhail Makarov, and Davide Mazzali. On the Streaming Complexity of Expander Decomposition. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 46:1-46:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{chen_et_al:LIPIcs.ICALP.2024.46,
  author =	{Chen, Yu and Kapralov, Michael and Makarov, Mikhail and Mazzali, Davide},
  title =	{{On the Streaming Complexity of Expander Decomposition}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{46:1--46:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.46},
  URN =		{urn:nbn:de:0030-drops-201890},
  doi =		{10.4230/LIPIcs.ICALP.2024.46},
  annote =	{Keywords: Graph Sketching, Dynamic Streaming, Expander Decomposition}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.54

Nearly Optimal Independence Oracle Algorithms for Edge Estimation in Hypergraphs

Authors: Holger Dell, John Lapinskas, and Kitty Meeks

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

Consider a query model of computation in which an n-vertex k-hypergraph can be accessed only via its independence oracle or via its colourful independence oracle, and each oracle query may incur a cost depending on the size of the query. Several recent results (Dell and Lapinskas, STOC 2018; Dell, Lapinskas, and Meeks, SODA 2020) give efficient algorithms to approximately count the hypergraph’s edges in the colourful setting. These algorithms immediately imply fine-grained reductions from approximate counting to decision, with overhead only log^Θ(k) n over the running time n^α of the original decision algorithm, for many well-studied problems including k-Orthogonal Vectors, k-SUM, subgraph isomorphism problems including k-Clique and colourful-H, graph motifs, and k-variable first-order model checking. We explore the limits of what is achievable in this setting, obtaining unconditional lower bounds on the oracle cost of algorithms to approximately count the hypergraph’s edges in both the colourful and uncoloured settings. In both settings, we also obtain algorithms which essentially match these lower bounds; in the colourful setting, this requires significant changes to the algorithm of Dell, Lapinskas, and Meeks (SODA 2020) and reduces the total overhead to log^{Θ(k-α)}n. Our lower bound for the uncoloured setting shows that there is no fine-grained reduction from approximate counting to the corresponding uncoloured decision problem (except in the case α ≥ k-1): without an algorithm for the colourful decision problem, we cannot hope to avoid the much larger overhead of roughly n^{(k-α)²/4}. The uncoloured setting has previously been studied for the special case k = 2 (Peled, Ramamoorthy, Rashtchian, Sinha, ITCS 2018; Chen, Levi, and Waingarten, SODA 2020), and our work generalises the existing algorithms and lower bounds for this special case to k > 2 and to oracles with cost.

Cite as

Holger Dell, John Lapinskas, and Kitty Meeks. Nearly Optimal Independence Oracle Algorithms for Edge Estimation in Hypergraphs. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 54:1-54:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{dell_et_al:LIPIcs.ICALP.2024.54,
  author =	{Dell, Holger and Lapinskas, John and Meeks, Kitty},
  title =	{{Nearly Optimal Independence Oracle Algorithms for Edge Estimation in Hypergraphs}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{54:1--54:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.54},
  URN =		{urn:nbn:de:0030-drops-201977},
  doi =		{10.4230/LIPIcs.ICALP.2024.54},
  annote =	{Keywords: Graph oracles, Fine-grained complexity, Approximate counting, Hypergraphs}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.92

Algorithms for the Generalized Poset Sorting Problem

Authors: Shaofeng H.-C. Jiang, Wenqian Wang, Yubo Zhang, and Yuhao Zhang

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

We consider a generalized poset sorting problem (GPS), in which we are given a query graph G = (V, E) and an unknown poset 𝒫(V, ≺) that is defined on the same vertex set V, and the goal is to make as few queries as possible to edges in G in order to fully recover 𝒫, where each query (u, v) returns the relation between u, v, i.e., u ≺ v, v ≺ u or u ̸ ∼ v. This generalizes both the poset sorting problem [Faigle et al., SICOMP 88] and the generalized sorting problem [Huang et al., FOCS 11]. We give algorithms with Õ(n poly(k)) query complexity when G is a complete bipartite graph or G is stochastic under the Erdős-Rényi model, where k is the width of the poset, and these generalize [Daskalakis et al., SICOMP 11] which only studies complete graph G. Both results are based on a unified framework that reduces the poset sorting to partitioning the vertices with respect to a given pivot element, which may be of independent interest. Moreover, we also propose novel algorithms to implement this partition oracle. Notably, we suggest a randomized BFS with vertex skipping for the stochastic G, and it yields a nearly-tight bound even for the special case of generalized sorting (for stochastic G) which is comparable to the main result of a recent work [Kuszmaul et al., FOCS 21] but is conceptually different and simplified. Our study of GPS also leads to a new Õ(n^{1 - 1 / (2W)}) competitive ratio for the so-called weighted generalized sorting problem where W is the number of distinct weights in the query graph. This problem was considered as an open question in [Charikar et al., JCSS 02], and our result makes important progress as it yields the first nontrivial sublinear ratio for general weighted query graphs (for any bounded W). We obtain this via an Õ(nk + n^{1.5}) query complexity algorithm for the case where every edge in G is guaranteed to be comparable in the poset, which generalizes a Õ(n^{1.5}) bound for generalized sorting [Huang et al., FOCS 11].

Cite as

Shaofeng H.-C. Jiang, Wenqian Wang, Yubo Zhang, and Yuhao Zhang. Algorithms for the Generalized Poset Sorting Problem. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 92:1-92:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{jiang_et_al:LIPIcs.ICALP.2024.92,
  author =	{Jiang, Shaofeng H.-C. and Wang, Wenqian and Zhang, Yubo and Zhang, Yuhao},
  title =	{{Algorithms for the Generalized Poset Sorting Problem}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{92:1--92:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.92},
  URN =		{urn:nbn:de:0030-drops-202359},
  doi =		{10.4230/LIPIcs.ICALP.2024.92},
  annote =	{Keywords: sorting, poset sorting, generalized sorting}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.93

Streaming Algorithms for Connectivity Augmentation

Authors: Ce Jin, Michael Kapralov, Sepideh Mahabadi, and Ali Vakilian

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

We study the k-connectivity augmentation problem (k-CAP) in the single-pass streaming model. Given a (k-1)-edge connected graph G = (V,E) that is stored in memory, and a stream of weighted edges (also called links) L with weights in {0,1,… ,W}, the goal is to choose a minimum weight subset L' ⊆ L of the links such that G' = (V,E∪ L') is k-edge connected. We give a (2+ε)-approximation algorithm for this problem which requires to store O(ε^{-1} nlog n) words. Moreover, we show the tightness of our result: Any algorithm with better than 2-approximation for the problem requires Ω(n²) bits of space even when k = 2. This establishes a gap between the optimal approximation factor one can obtain in the streaming vs the offline setting for k-CAP. We further consider a natural generalization to the fully streaming model where both E and L arrive in the stream in an arbitrary order. We show that this problem has a space lower bound that matches the best possible size of a spanner of the same approximation ratio. Following this, we give improved results for spanners on weighted graphs: We show a streaming algorithm that finds a (2t-1+ε)-approximate weighted spanner of size at most O(ε^{-1} n^{1+1/t}log n) for integer t, whereas the best prior streaming algorithm for spanner on weighted graphs had size depending on log W. We believe that this result is of independent interest. Using our spanner result, we provide an optimal O(t)-approximation for k-CAP in the fully streaming model with O(nk + n^{1+1/t}) words of space. Finally we apply our results to network design problems such as Steiner tree augmentation problem (STAP), k-edge connected spanning subgraph (k-ECSS) and the general Survivable Network Design problem (SNDP). In particular, we show a single-pass O(tlog k)-approximation for SNDP using O(kn^{1+1/t}) words of space, where k is the maximum connectivity requirement.

Cite as

Ce Jin, Michael Kapralov, Sepideh Mahabadi, and Ali Vakilian. Streaming Algorithms for Connectivity Augmentation. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 93:1-93:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{jin_et_al:LIPIcs.ICALP.2024.93,
  author =	{Jin, Ce and Kapralov, Michael and Mahabadi, Sepideh and Vakilian, Ali},
  title =	{{Streaming Algorithms for Connectivity Augmentation}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{93:1--93:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.93},
  URN =		{urn:nbn:de:0030-drops-202367},
  doi =		{10.4230/LIPIcs.ICALP.2024.93},
  annote =	{Keywords: streaming algorithms, connectivity augmentation}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.115

On the Cut-Query Complexity of Approximating Max-Cut

Authors: Orestis Plevrakis, Seyoon Ragavan, and S. Matthew Weinberg

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

We consider the problem of query-efficient global max-cut on a weighted undirected graph in the value oracle model examined by [Rubinstein et al., 2018]. Graph algorithms in this cut query model and other query models have recently been studied for various other problems such as min-cut, connectivity, bipartiteness, and triangle detection. Max-cut in the cut query model can also be viewed as a natural special case of submodular function maximization: on query S ⊆ V, the oracle returns the total weight of the cut between S and V\S. Our first main technical result is a lower bound stating that a deterministic algorithm achieving a c-approximation for any c > 1/2 requires Ω(n) queries. This uses an extension of the cut dimension to rule out approximation (prior work of [Graur et al., 2020] introducing the cut dimension only rules out exact solutions). Secondly, we provide a randomized algorithm with Õ(n) queries that finds a c-approximation for any c < 1. We achieve this using a query-efficient sparsifier for undirected weighted graphs (prior work of [Rubinstein et al., 2018] holds only for unweighted graphs). To complement these results, for most constants c ∈ (0,1], we nail down the query complexity of achieving a c-approximation, for both deterministic and randomized algorithms (up to logarithmic factors). Analogously to general submodular function maximization in the same model, we observe a phase transition at c = 1/2: we design a deterministic algorithm for global c-approximate max-cut in O(log n) queries for any c < 1/2, and show that any randomized algorithm requires Ω(n/log n) queries to find a c-approximate max-cut for any c > 1/2. Additionally, we show that any deterministic algorithm requires Ω(n²) queries to find an exact max-cut (enough to learn the entire graph).

Cite as

Orestis Plevrakis, Seyoon Ragavan, and S. Matthew Weinberg. On the Cut-Query Complexity of Approximating Max-Cut. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 115:1-115:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{plevrakis_et_al:LIPIcs.ICALP.2024.115,
  author =	{Plevrakis, Orestis and Ragavan, Seyoon and Weinberg, S. Matthew},
  title =	{{On the Cut-Query Complexity of Approximating Max-Cut}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{115:1--115:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.115},
  URN =		{urn:nbn:de:0030-drops-202587},
  doi =		{10.4230/LIPIcs.ICALP.2024.115},
  annote =	{Keywords: query complexity, maximum cut, approximation algorithms, graph sparsification}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.121

Streaming Edge Coloring with Asymptotically Optimal Colors

Authors: Mohammad Saneian and Soheil Behnezhad

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

Given a graph G, an edge-coloring is an assignment of colors to edges of G such that any two edges sharing an endpoint receive different colors. By Vizing’s celebrated theorem, any graph of maximum degree Δ needs at least Δ and at most (Δ + 1) colors to be properly edge colored. In this paper, we study edge colorings in the streaming setting. The edges arrive one by one in an arbitrary order. The algorithm takes a single pass over the input and must output a solution using a much smaller space than the input size. Since the output of edge coloring is as large as its input, the assigned colors should also be reported in a streaming fashion. The streaming edge coloring problem has been studied in a series of works over the past few years. The main challenge is that the algorithm cannot "remember" all the color assignments that it returns. To ensure the validity of the solution, existing algorithms use many more colors than Vizing’s bound. Namely, in n-vertex graphs, the state-of-the-art algorithm with Õ(n s) space requires O(Δ²/s + Δ) colors. Note, in particular, that for an asymptotically optimal O(Δ) coloring, this algorithm requires Ω(nΔ) space which is as large as the input. Whether such a coloring can be achieved with sublinear space has been left open. In this paper, we answer this question in the affirmative. We present a randomized algorithm that returns an asymptotically optimal O(Δ) edge coloring using Õ(n √{Δ}) space. More generally, our algorithm returns a proper O(Δ^{1.5}/s + Δ) edge coloring with Õ(n s) space, improving prior algorithms for the whole range of s.

Cite as

Mohammad Saneian and Soheil Behnezhad. Streaming Edge Coloring with Asymptotically Optimal Colors. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 121:1-121:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{saneian_et_al:LIPIcs.ICALP.2024.121,
  author =	{Saneian, Mohammad and Behnezhad, Soheil},
  title =	{{Streaming Edge Coloring with Asymptotically Optimal Colors}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{121:1--121:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.121},
  URN =		{urn:nbn:de:0030-drops-202640},
  doi =		{10.4230/LIPIcs.ICALP.2024.121},
  annote =	{Keywords: Streaming, Edge coloring, Adversarial order}
}

Document

DOI: 10.4230/LIPIcs.ESA.2023.102

Maximum Coverage in Random-Arrival Streams

Authors: Rowan Warneke, Farhana Choudhury, and Anthony Wirth

Published in: LIPIcs, Volume 274, 31st Annual European Symposium on Algorithms (ESA 2023)

Abstract

Given a collection of m sets, each a subset of a universe {1, …, n}, maximum coverage is the problem of choosing k sets whose union has the largest cardinality. A simple greedy algorithm achieves an approximation factor of 1 - 1 / e ≈ 0.632, which is the best possible polynomial-time approximation unless P = NP. In the streaming setting, information about the input is revealed gradually, in an online fashion. In the set-streaming model, each set is listed contiguously in the stream. In the more general edge-streaming model, the stream is composed of set-element pairs, denoting membership. The overall goal in the streaming setting is to design algorithms that use sublinear space in the size of the input. An interesting line of research is to design algorithms with space complexity polylogarithmic in the size of the input (i.e., polylogarithmic in both n and m); we call such algorithms low-space. In the set-streaming model, it is known that 1/2 is the best possible low-space approximation. In the edge-streaming model, no low-space algorithm can achieve a nontrivial approximation factor. We study the problem under the assumption that the order in which the stream arrives is chosen uniformly at random. Our main results are as follows. - In the random-arrival set-streaming model, we give two new algorithms to show that low space is sufficient to break the 1/2 barrier. The first achieves an approximation factor of 1/2 + c₁ using Õ(k²) space, where c₁ > 0 is a small constant and Õ(⋅) notation suppresses polylogarithmic factors; the second achieves a factor of 1 - 1 / e - ε - o(1) using Õ(k² ε^{-3}) space, where the o(1) term is a function of k. This is essentially the optimal bound, as breaking the 1-1/e barrier is known to require high space. - In the random-arrival edge-streaming model, we show for all fixed α > 0 and δ > 0, any algorithm that α-approximates maximum coverage with probability at least 0.9 in the random-arrival edge-streaming model requires Ω(m^{1-δ}) space (i.e., high space), even for the special case of k = 1.

Cite as

Rowan Warneke, Farhana Choudhury, and Anthony Wirth. Maximum Coverage in Random-Arrival Streams. In 31st Annual European Symposium on Algorithms (ESA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 274, pp. 102:1-102:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{warneke_et_al:LIPIcs.ESA.2023.102,
  author =	{Warneke, Rowan and Choudhury, Farhana and Wirth, Anthony},
  title =	{{Maximum Coverage in Random-Arrival Streams}},
  booktitle =	{31st Annual European Symposium on Algorithms (ESA 2023)},
  pages =	{102:1--102:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-295-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{274},
  editor =	{G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.102},
  URN =		{urn:nbn:de:0030-drops-187559},
  doi =		{10.4230/LIPIcs.ESA.2023.102},
  annote =	{Keywords: Maximum Coverage, Streaming Algorithm, Random Arrival, Greedy Algorithm, Communication Complexity}
}

Document

DOI: 10.4230/LIPIcs.ESA.2022.2

Non-Adaptive Edge Counting and Sampling via Bipartite Independent Set Queries

Authors: Raghavendra Addanki, Andrew McGregor, and Cameron Musco

Published in: LIPIcs, Volume 244, 30th Annual European Symposium on Algorithms (ESA 2022)

Abstract

We study the problem of estimating the number of edges in an n-vertex graph, accessed via the Bipartite Independent Set query model introduced by Beame et al. (TALG '20). In this model, each query returns a Boolean, indicating the existence of at least one edge between two specified sets of nodes. We present a non-adaptive algorithm that returns a (1± ε) relative error approximation to the number of edges, with query complexity Õ(ε^{-5}log⁵ n), where Õ(⋅) hides poly(log log n) dependencies. This is the first non-adaptive algorithm in this setting achieving poly(1/ε,log n) query complexity. Prior work requires Ω(log² n) rounds of adaptivity. We avoid this by taking a fundamentally different approach, inspired by work on single-pass streaming algorithms. Moreover, for constant ε, our query complexity significantly improves on the best known adaptive algorithm due to Bhattacharya et al. (STACS '22), which requires O(ε^{-2} log^{11} n) queries. Building on our edge estimation result, we give the first {non-adaptive} algorithm for outputting a nearly uniformly sampled edge with query complexity Õ(ε^{-6} log⁶ n), improving on the works of Dell et al. (SODA '20) and Bhattacharya et al. (STACS '22), which require Ω(log³ n) rounds of adaptivity. Finally, as a consequence of our edge sampling algorithm, we obtain a Õ(n log^8 n) query algorithm for connectivity, using two rounds of adaptivity. This improves on a three-round algorithm of Assadi et al. (ESA '21) and is tight; there is no non-adaptive algorithm for connectivity making o(n²) queries.

Cite as

Raghavendra Addanki, Andrew McGregor, and Cameron Musco. Non-Adaptive Edge Counting and Sampling via Bipartite Independent Set Queries. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 2:1-2:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{addanki_et_al:LIPIcs.ESA.2022.2,
  author =	{Addanki, Raghavendra and McGregor, Andrew and Musco, Cameron},
  title =	{{Non-Adaptive Edge Counting and Sampling via Bipartite Independent Set Queries}},
  booktitle =	{30th Annual European Symposium on Algorithms (ESA 2022)},
  pages =	{2:1--2:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-247-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{244},
  editor =	{Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2022.2},
  URN =		{urn:nbn:de:0030-drops-169400},
  doi =		{10.4230/LIPIcs.ESA.2022.2},
  annote =	{Keywords: sublinear graph algorithms, bipartite independent set queries, edge sampling and counting, graph connectivity, query adaptivity}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2022.96

Graph Reconstruction from Random Subgraphs

Authors: Andrew McGregor and Rik Sengupta

Published in: LIPIcs, Volume 229, 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)

Abstract

We consider the problem of reconstructing a graph G in two natural sampling models: 1) each sample corresponds to a random induced subgraph and 2) for a fixed adjacency matrix A_G for G, each sample corresponds to a random principal submatrix (i.e., a submatrix formed by deleting the same set of rows and columns) of A_G. We refer to these models as the "unordered" and "ordered" models respectively. The two models are motivated by work on the reconstruction conjecture in combinatorics and trace reconstruction in theoretical computer science. Despite the superficial similarities between the models, we show that the sample complexity of reconstruction can be exponentially different. Our main results are as follows: - In the unordered model, we show that almost all graphs can be reconstructed with Θ(p^{-2} log n) samples if each node is included in the random subgraph with any constant probability p; this is optimal. We show our upper bound extends to smaller values of p as well. In contrast, for arbitrary graphs, we show that exp(Ω(n)) samples are required for reconstruction even for 2-regular graphs. One of the key technical steps in the first result is showing that, with high probability, any subgraph isomorphism in a random graph has at most O(log n) non-fixed points. - In the ordered model, we show that any graph with constant arboricity or degeneracy (i.e., every induced subgraph has constant average degree) can be reconstructed with exp(Õ(n^{1/3})) samples and that arbitrary graphs can be reconstructed with exp(Õ(n^{1/2})) samples. The results about almost all graphs in the first model carry over to the second. One of the key technical steps in the first result is showing that reconstruction of low degeneracy graphs can be reduced to learning a small number of moments of sets of the form {i-j: j < i,(i,j) ∈ E} and {j-i: i < j,(i,j) ∈ E} where G = ([n],E) is the unknown graph.

Cite as

Andrew McGregor and Rik Sengupta. Graph Reconstruction from Random Subgraphs. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 229, pp. 96:1-96:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{mcgregor_et_al:LIPIcs.ICALP.2022.96,
  author =	{McGregor, Andrew and Sengupta, Rik},
  title =	{{Graph Reconstruction from Random Subgraphs}},
  booktitle =	{49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)},
  pages =	{96:1--96:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-235-8},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{229},
  editor =	{Boja\'{n}czyk, Miko{\l}aj and Merelli, Emanuela and Woodruff, David P.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2022.96},
  URN =		{urn:nbn:de:0030-drops-164373},
  doi =		{10.4230/LIPIcs.ICALP.2022.96},
  annote =	{Keywords: graph reconstruction, sample complexity, deletion channel}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2022.7

Improved Approximation and Scalability for Fair Max-Min Diversification

Authors: Raghavendra Addanki, Andrew McGregor, Alexandra Meliou, and Zafeiria Moumoulidou

Published in: LIPIcs, Volume 220, 25th International Conference on Database Theory (ICDT 2022)

Abstract

Given an n-point metric space ({𝒳},d) where each point belongs to one of m = O(1) different categories or groups and a set of integers k₁, …, k_m, the fair Max-Min diversification problem is to select k_i points belonging to category i ∈ [m], such that the minimum pairwise distance between selected points is maximized. The problem was introduced by Moumoulidou et al. [ICDT 2021] and is motivated by the need to down-sample large data sets in various applications so that the derived sample achieves a balance over diversity, i.e., the minimum distance between a pair of selected points, and fairness, i.e., ensuring enough points of each category are included. We prove the following results: 1) We first consider general metric spaces. We present a randomized polynomial time algorithm that returns a factor 2-approximation to the diversity but only satisfies the fairness constraints in expectation. Building upon this result, we present a 6-approximation that is guaranteed to satisfy the fairness constraints up to a factor 1-ε for any constant ε. We also present a linear time algorithm returning an m+1 approximation with exact fairness. The best previous result was a 3m-1 approximation. 2) We then focus on Euclidean metrics. We first show that the problem can be solved exactly in one dimension. {For constant dimensions, categories and any constant ε > 0, we present a 1+ε approximation algorithm that runs in O(nk) + 2^{O(k)} time where k = k₁+…+k_m.} We can improve the running time to O(nk)+poly(k) at the expense of only picking (1-ε) k_i points from category i ∈ [m]. Finally, we present algorithms suitable to processing massive data sets including single-pass data stream algorithms and composable coresets for the distributed processing.

Cite as

Raghavendra Addanki, Andrew McGregor, Alexandra Meliou, and Zafeiria Moumoulidou. Improved Approximation and Scalability for Fair Max-Min Diversification. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 7:1-7:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{addanki_et_al:LIPIcs.ICDT.2022.7,
  author =	{Addanki, Raghavendra and McGregor, Andrew and Meliou, Alexandra and Moumoulidou, Zafeiria},
  title =	{{Improved Approximation and Scalability for Fair Max-Min Diversification}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{7:1--7:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.7},
  URN =		{urn:nbn:de:0030-drops-158812},
  doi =		{10.4230/LIPIcs.ICDT.2022.7},
  annote =	{Keywords: algorithmic fairness, diversity maximization, data selection, approximation algorithms}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2021.12

Maximum Coverage in the Data Stream Model: Parameterized and Generalized

Authors: Andrew McGregor, David Tench, and Hoa T. Vu

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)

Abstract

We present algorithms for the Max Coverage and Max Unique Coverage problems in the data stream model. The input to both problems are m subsets of a universe of size n and a value k ∈ [m]. In Max Coverage, the problem is to find a collection of at most k sets such that the number of elements covered by at least one set is maximized. In Max Unique Coverage, the problem is to find a collection of at most k sets such that the number of elements covered by exactly one set is maximized. These problems are closely related to a range of graph problems including matching, partial vertex cover, and capacitated maximum cut. In the data stream model, we assume k is given and the sets are revealed online. Our goal is to design single-pass algorithms that use space that is sublinear in the input size. Our main algorithmic results are: - If the sets have size at most d, there exist single-pass algorithms using O(d^{d+1} k^d) space that solve both problems exactly. This is optimal up to polylogarithmic factors for constant d. - If each element appears in at most r sets, we present single pass algorithms using Õ(k² r/ε³) space that return a 1+ε approximation in the case of Max Coverage. We also present a single-pass algorithm using slightly more memory, i.e., Õ(k³ r/ε⁴) space, that 1+ε approximates Max Unique Coverage. In contrast to the above results, when d and r are arbitrary, any constant pass 1+ε approximation algorithm for either problem requires Ω(ε^{-2}m) space but a single pass O(ε^{-2}mk) space algorithm exists. In fact any constant-pass algorithm with an approximation better than e/(e-1) and e^{1-1/k} for Max Coverage and Max Unique Coverage respectively requires Ω(m/k²) space when d and r are unrestricted. En route, we also obtain an algorithm for a parameterized version of the streaming Set Cover problem.

Cite as

Andrew McGregor, David Tench, and Hoa T. Vu. Maximum Coverage in the Data Stream Model: Parameterized and Generalized. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 12:1-12:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{mcgregor_et_al:LIPIcs.ICDT.2021.12,
  author =	{McGregor, Andrew and Tench, David and Vu, Hoa T.},
  title =	{{Maximum Coverage in the Data Stream Model: Parameterized and Generalized}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{12:1--12:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.12},
  URN =		{urn:nbn:de:0030-drops-137208},
  doi =		{10.4230/LIPIcs.ICDT.2021.12},
  annote =	{Keywords: Data streams, maximum coverage, maximum unique coverage, set cover}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2021.13

Diverse Data Selection under Fairness Constraints

Authors: Zafeiria Moumoulidou, Andrew McGregor, and Alexandra Meliou

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)

Abstract

Diversity is an important principle in data selection and summarization, facility location, and recommendation systems. Our work focuses on maximizing diversity in data selection, while offering fairness guarantees. In particular, we offer the first study that augments the Max-Min diversification objective with fairness constraints. More specifically, given a universe 𝒰 of n elements that can be partitioned into m disjoint groups, we aim to retrieve a k-sized subset that maximizes the pairwise minimum distance within the set (diversity) and contains a pre-specified k_i number of elements from each group i (fairness). We show that this problem is NP-complete even in metric spaces, and we propose three novel algorithms, linear in n, that provide strong theoretical approximation guarantees for different values of m and k. Finally, we extend our algorithms and analysis to the case where groups can be overlapping.

Cite as

Zafeiria Moumoulidou, Andrew McGregor, and Alexandra Meliou. Diverse Data Selection under Fairness Constraints. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 13:1-13:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{moumoulidou_et_al:LIPIcs.ICDT.2021.13,
  author =	{Moumoulidou, Zafeiria and McGregor, Andrew and Meliou, Alexandra},
  title =	{{Diverse Data Selection under Fairness Constraints}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{13:1--13:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.13},
  URN =		{urn:nbn:de:0030-drops-137216},
  doi =		{10.4230/LIPIcs.ICDT.2021.13},
  annote =	{Keywords: data selection, diversity maximization, fairness constraints, approximation algorithms}
}

22 Search Results for "McGregor, Andrew"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as