Document

APPROX

**Published in:** LIPIcs, Volume 275, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023)

In this work we consider the diversity maximization problem, where given a data set X of n elements, and a parameter k, the goal is to pick a subset of X of size k maximizing a certain diversity measure. Chandra and Halldórsson [Barun Chandra and Magnús M. Halldórsson, 2001] defined a variety of diversity measures based on pairwise distances between the points. A constant factor approximation algorithm was known for all those diversity measures except "remote-matching", where only an O(log k) approximation was known. In this work we present an O(1) approximation for this remaining notion. Further, we consider these notions from the perpective of composable coresets. Indyk et al. [Piotr Indyk et al., 2014] provided composable coresets with a constant factor approximation for all but "remote-pseudoforest" and "remote-matching", which again they only obtained a O(log k) approximation. Here we also close the gap up to constants and present a constant factor composable coreset algorithm for these two notions. For remote-matching, our coreset has size only O(k), and for remote-pseudoforest, our coreset has size O(k^{1+ε}) for any ε > 0, for an O(1/ε)-approximate coreset.

Sepideh Mahabadi and Shyam Narayanan. Improved Diversity Maximization Algorithms for Matching and Pseudoforest. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 275, pp. 25:1-25:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{mahabadi_et_al:LIPIcs.APPROX/RANDOM.2023.25, author = {Mahabadi, Sepideh and Narayanan, Shyam}, title = {{Improved Diversity Maximization Algorithms for Matching and Pseudoforest}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023)}, pages = {25:1--25:22}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-296-9}, ISSN = {1868-8969}, year = {2023}, volume = {275}, editor = {Megow, Nicole and Smith, Adam}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2023.25}, URN = {urn:nbn:de:0030-drops-188503}, doi = {10.4230/LIPIcs.APPROX/RANDOM.2023.25}, annote = {Keywords: diversity maximization, approximation algorithms, composable coresets} }

Document

RANDOM

**Published in:** LIPIcs, Volume 245, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)

We introduce data structures for solving robust regression through stochastic gradient descent (SGD) by sampling gradients with probability proportional to their norm, i.e., importance sampling. Although SGD is widely used for large scale machine learning, it is well-known for possibly experiencing slow convergence rates due to the high variance from uniform sampling. On the other hand, importance sampling can significantly decrease the variance but is usually difficult to implement because computing the sampling probabilities requires additional passes over the data, in which case standard gradient descent (GD) could be used instead. In this paper, we introduce an algorithm that approximately samples T gradients of dimension d from nearly the optimal importance sampling distribution for a robust regression problem over n rows. Thus our algorithm effectively runs T steps of SGD with importance sampling while using sublinear space and just making a single pass over the data. Our techniques also extend to performing importance sampling for second-order optimization.

Sepideh Mahabadi, David P. Woodruff, and Samson Zhou. Adaptive Sketches for Robust Regression with Importance Sampling. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 245, pp. 31:1-31:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{mahabadi_et_al:LIPIcs.APPROX/RANDOM.2022.31, author = {Mahabadi, Sepideh and Woodruff, David P. and Zhou, Samson}, title = {{Adaptive Sketches for Robust Regression with Importance Sampling}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)}, pages = {31:1--31:21}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-249-5}, ISSN = {1868-8969}, year = {2022}, volume = {245}, editor = {Chakrabarti, Amit and Swamy, Chaitanya}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2022.31}, URN = {urn:nbn:de:0030-drops-171531}, doi = {10.4230/LIPIcs.APPROX/RANDOM.2022.31}, annote = {Keywords: Streaming algorithms, stochastic optimization, importance sampling} }

Document

**Published in:** LIPIcs, Volume 189, 37th International Symposium on Computational Geometry (SoCG 2021)

In this paper, we prove a two-sided variant of the Kirszbraun theorem. Consider an arbitrary subset X of Euclidean space and its superset Y. Let f be a 1-Lipschitz map from X to ℝ^m. The Kirszbraun theorem states that the map f can be extended to a 1-Lipschitz map ̃ f from Y to ℝ^m. While the extension ̃ f does not increase distances between points, there is no guarantee that it does not decrease distances significantly. In fact, ̃ f may even map distinct points to the same point (that is, it can infinitely decrease some distances). However, we prove that there exists a (1 + ε)-Lipschitz outer extension f̃:Y → ℝ^{m'} that does not decrease distances more than "necessary". Namely, ‖f̃(x) - f̃(y)‖ ≥ c √{ε} min(‖x-y‖, inf_{a,b ∈ X} (‖x - a‖ + ‖f(a) - f(b)‖ + ‖b-y‖)) for some absolutely constant c > 0. This bound is asymptotically optimal, since no L-Lipschitz extension g can have ‖g(x) - g(y)‖ > L min(‖x-y‖, inf_{a,b ∈ X} (‖x - a‖ + ‖f(a) - f(b)‖ + ‖b-y‖)) even for a single pair of points x and y.
In some applications, one is interested in the distances ‖f̃(x) - f̃(y)‖ between images of points x,y ∈ Y rather than in the map f̃ itself. The standard Kirszbraun theorem does not provide any method of computing these distances without computing the entire map ̃ f first. In contrast, our theorem provides a simple approximate formula for distances ‖f̃(x) - f̃(y)‖.

Arturs Backurs, Sepideh Mahabadi, Konstantin Makarychev, and Yury Makarychev. Two-Sided Kirszbraun Theorem. In 37th International Symposium on Computational Geometry (SoCG 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 189, pp. 13:1-13:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{backurs_et_al:LIPIcs.SoCG.2021.13, author = {Backurs, Arturs and Mahabadi, Sepideh and Makarychev, Konstantin and Makarychev, Yury}, title = {{Two-Sided Kirszbraun Theorem}}, booktitle = {37th International Symposium on Computational Geometry (SoCG 2021)}, pages = {13:1--13:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-184-9}, ISSN = {1868-8969}, year = {2021}, volume = {189}, editor = {Buchin, Kevin and Colin de Verdi\`{e}re, \'{E}ric}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2021.13}, URN = {urn:nbn:de:0030-drops-138129}, doi = {10.4230/LIPIcs.SoCG.2021.13}, annote = {Keywords: Kirszbraun theorem, Lipschitz map, Outer-extension, Two-sided extension} }

Document

APPROX

**Published in:** LIPIcs, Volume 176, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)

We study the space complexity of solving the bias-regularized SVM problem in the streaming model. In particular, given a data set (x_i,y_i) ∈ ℝ^d× {-1,+1}, the objective function is F_λ(θ,b) = λ/2‖(θ,b)‖₂² + 1/n∑_{i=1}ⁿ max{0,1-y_i(θ^Tx_i+b)} and the goal is to find the parameters that (approximately) minimize this objective. This is a classic supervised learning problem that has drawn lots of attention, including for developing fast algorithms for solving the problem approximately: i.e., for finding (θ,b) such that F_λ(θ,b) ≤ min_{(θ',b')} F_λ(θ',b')+ε.
One of the most widely used algorithms for approximately optimizing the SVM objective is Stochastic Gradient Descent (SGD), which requires only O(1/λε) random samples, and which immediately yields a streaming algorithm that uses O(d/λε) space. For related problems, better streaming algorithms are only known for smooth functions, unlike the SVM objective that we focus on in this work.
We initiate an investigation of the space complexity for both finding an approximate optimum of this objective, and for the related "point estimation" problem of sketching the data set to evaluate the function value F_λ on any query (θ, b). We show that, for both problems, for dimensions d = 1,2, one can obtain streaming algorithms with space polynomially smaller than 1/λε, which is the complexity of SGD for strongly convex functions like the bias-regularized SVM [Shalev-Shwartz et al., 2007], and which is known to be tight in general, even for d = 1 [Agarwal et al., 2009]. We also prove polynomial lower bounds for both point estimation and optimization. In particular, for point estimation we obtain a tight bound of Θ(1/√{ε}) for d = 1 and a nearly tight lower bound of Ω̃(d/{ε}²) for d = Ω(log(1/ε)). Finally, for optimization, we prove a Ω(1/√{ε}) lower bound for d = Ω(log(1/ε)), and show similar bounds when d is constant.

Alexandr Andoni, Collin Burns, Yi Li, Sepideh Mahabadi, and David P. Woodruff. Streaming Complexity of SVMs. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 176, pp. 50:1-50:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{andoni_et_al:LIPIcs.APPROX/RANDOM.2020.50, author = {Andoni, Alexandr and Burns, Collin and Li, Yi and Mahabadi, Sepideh and Woodruff, David P.}, title = {{Streaming Complexity of SVMs}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)}, pages = {50:1--50:22}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-164-1}, ISSN = {1868-8969}, year = {2020}, volume = {176}, editor = {Byrka, Jaros{\l}aw and Meka, Raghu}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2020.50}, URN = {urn:nbn:de:0030-drops-126532}, doi = {10.4230/LIPIcs.APPROX/RANDOM.2020.50}, annote = {Keywords: support vector machine, streaming algorithm, space lower bound, sketching algorithm, point estimation} }

Document

**Published in:** LIPIcs, Volume 107, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)

In the Sparse Linear Regression (SLR) problem, given a d x n matrix M and a d-dimensional query q, the goal is to compute a k-sparse n-dimensional vector tau such that the error ||M tau - q|| is minimized. This problem is equivalent to the following geometric problem: given a set P of n points and a query point q in d dimensions, find the closest k-dimensional subspace to q, that is spanned by a subset of k points in P. In this paper, we present data-structures/algorithms and conditional lower bounds for several variants of this problem (such as finding the closest induced k dimensional flat/simplex instead of a subspace).
In particular, we present approximation algorithms for the online variants of the above problems with query time O~(n^{k-1}), which are of interest in the "low sparsity regime" where k is small, e.g., 2 or 3. For k=d, this matches, up to polylogarithmic factors, the lower bound that relies on the affinely degenerate conjecture (i.e., deciding if n points in R^d contains d+1 points contained in a hyperplane takes Omega(n^d) time). Moreover, our algorithms involve formulating and solving several geometric subproblems, which we believe to be of independent interest.

Sariel Har-Peled, Piotr Indyk, and Sepideh Mahabadi. Approximate Sparse Linear Regression. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 77:1-77:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{harpeled_et_al:LIPIcs.ICALP.2018.77, author = {Har-Peled, Sariel and Indyk, Piotr and Mahabadi, Sepideh}, title = {{Approximate Sparse Linear Regression}}, booktitle = {45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)}, pages = {77:1--77:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-076-7}, ISSN = {1868-8969}, year = {2018}, volume = {107}, editor = {Chatzigiannakis, Ioannis and Kaklamanis, Christos and Marx, D\'{a}niel and Sannella, Donald}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2018.77}, URN = {urn:nbn:de:0030-drops-90816}, doi = {10.4230/LIPIcs.ICALP.2018.77}, annote = {Keywords: Sparse Linear Regression, Approximate Nearest Neighbor, Sparse Recovery, Nearest Induced Flat, Nearest Subspace Search} }

Document

**Published in:** LIPIcs, Volume 81, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017)

We study the Fractional Set Cover problem in the streaming model. That is, we consider the relaxation of the set cover problem over a universe of n elements and a collection of m sets, where each set can be picked fractionally, with a value in [0,1]. We present a randomized (1+a)-approximation algorithm that makes p passes over the data, and uses O(polylog(m,n,1/a) (mn^(O(1/(pa)))+n)) memory space. The algorithm works in both the set arrival and the edge arrival models. To the best of our knowledge, this is the first streaming result for the fractional set cover problem. We obtain our results by employing the multiplicative weights update framework in the streaming settings.

Piotr Indyk, Sepideh Mahabadi, Ronitt Rubinfeld, Jonathan Ullman, Ali Vakilian, and Anak Yodpinyanee. Fractional Set Cover in the Streaming Model. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 81, pp. 12:1-12:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{indyk_et_al:LIPIcs.APPROX-RANDOM.2017.12, author = {Indyk, Piotr and Mahabadi, Sepideh and Rubinfeld, Ronitt and Ullman, Jonathan and Vakilian, Ali and Yodpinyanee, Anak}, title = {{Fractional Set Cover in the Streaming Model}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017)}, pages = {12:1--12:20}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-044-6}, ISSN = {1868-8969}, year = {2017}, volume = {81}, editor = {Jansen, Klaus and Rolim, Jos\'{e} D. P. and Williamson, David P. and Vempala, Santosh S.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2017.12}, URN = {urn:nbn:de:0030-drops-75613}, doi = {10.4230/LIPIcs.APPROX-RANDOM.2017.12}, annote = {Keywords: Streaming Algorithms, Fractional Set Cover, LP relaxation, Multiplicative Weight Update} }

Document

**Published in:** LIPIcs, Volume 51, 32nd International Symposium on Computational Geometry (SoCG 2016)

Motivated by applications in computer vision and databases, we introduce and study the Simultaneous Nearest Neighbor Search (SNN) problem. Given a set of data points, the goal of SNN is to design a data structure that, given a collection of queries, finds a collection of close points that are compatible with each other. Formally, we are given k query points Q=q_1,...,q_k, and a compatibility graph G with vertices in Q, and the goal is to return data points p_1,...,p_k that minimize (i) the weighted sum of the distances from q_i to p_i and (ii) the weighted sum, over all edges (i,j) in the compatibility graph G, of the distances between p_i and p_j. The problem has several applications in computer vision and databases, where one wants to return a set of *consistent* answers to multiple related queries. Furthermore, it generalizes several well-studied computational problems, including Nearest Neighbor Search, Aggregate Nearest Neighbor Search and the 0-extension problem.
In this paper we propose and analyze the following general two-step method for designing efficient data structures for SNN. In the first step, for each query point q_i we find its (approximate) nearest neighbor point p'_i; this can be done efficiently using existing approximate nearest neighbor structures. In the second step, we solve an off-line optimization problem over sets q_1,...,q_k and p'_1,...,p'_k; this can be done efficiently given that k is much smaller than n. Even though p'_1,...,p'_k might not constitute the optimal answers to queries q_1,...,q_k, we show that, for the unweighted case, the resulting algorithm satisfies a O(log k/log log k)-approximation guarantee. Furthermore, we show that the approximation factor can be in fact reduced to a constant for compatibility graphs frequently occurring in practice, e.g., 2D grids, 3D grids or planar graphs.
Finally, we validate our theoretical results by preliminary experiments. In particular, we show that the empirical approximation factor provided by the above approach is very close to 1.

Piotr Indyk, Robert Kleinberg, Sepideh Mahabadi, and Yang Yuan. Simultaneous Nearest Neighbor Search. In 32nd International Symposium on Computational Geometry (SoCG 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 51, pp. 44:1-44:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{indyk_et_al:LIPIcs.SoCG.2016.44, author = {Indyk, Piotr and Kleinberg, Robert and Mahabadi, Sepideh and Yuan, Yang}, title = {{Simultaneous Nearest Neighbor Search}}, booktitle = {32nd International Symposium on Computational Geometry (SoCG 2016)}, pages = {44:1--44:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-009-5}, ISSN = {1868-8969}, year = {2016}, volume = {51}, editor = {Fekete, S\'{a}ndor and Lubiw, Anna}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2016.44}, URN = {urn:nbn:de:0030-drops-59360}, doi = {10.4230/LIPIcs.SoCG.2016.44}, annote = {Keywords: Approximate Nearest Neighbor, Metric Labeling, 0-extension, Simultaneous Nearest Neighbor, Group Nearest Neighbor} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail