16 Search Results for "Balcan, Maria-Florina"


Document
Fully Dynamic k-Means Coreset in Near-Optimal Update Time

Authors: Max Dupré la Tour, Monika Henzinger, and David Saulpic

Published in: LIPIcs, Volume 308, 32nd Annual European Symposium on Algorithms (ESA 2024)


Abstract
We study in this paper the problem of maintaining a solution to k-median and k-means clustering in a fully dynamic setting. To do so, we present an algorithm to efficiently maintain a coreset, a compressed version of the dataset, that allows easy computation of a clustering solution at query time. Our coreset algorithm has near-optimal update time of Õ(k) in general metric spaces, which reduces to Õ(d) in the Euclidean space ℝ^d. The query time is O(k²) in general metrics, and O(kd) in ℝ^d. To maintain a constant-factor approximation for k-median and k-means clustering in Euclidean space, this directly leads to an algorithm with update time Õ(d), and query time Õ(kd + k²). To maintain a O(polylog k)-approximation, the query time is reduced to Õ(kd).

Cite as

Max Dupré la Tour, Monika Henzinger, and David Saulpic. Fully Dynamic k-Means Coreset in Near-Optimal Update Time. In 32nd Annual European Symposium on Algorithms (ESA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 308, pp. 100:1-100:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{latour_et_al:LIPIcs.ESA.2024.100,
  author =	{la Tour, Max Dupr\'{e} and Henzinger, Monika and Saulpic, David},
  title =	{{Fully Dynamic k-Means Coreset in Near-Optimal Update Time}},
  booktitle =	{32nd Annual European Symposium on Algorithms (ESA 2024)},
  pages =	{100:1--100:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-338-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{308},
  editor =	{Chan, Timothy and Fischer, Johannes and Iacono, John and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2024.100},
  URN =		{urn:nbn:de:0030-drops-211716},
  doi =		{10.4230/LIPIcs.ESA.2024.100},
  annote =	{Keywords: clustering, fully-dynamic, coreset, k-means}
}
Document
RANDOM
Nearly Optimal Bounds for Sample-Based Testing and Learning of k-Monotone Functions

Authors: Hadley Black

Published in: LIPIcs, Volume 317, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024)


Abstract
We study monotonicity testing of functions f : {0,1}^d → {0,1} using sample-based algorithms, which are only allowed to observe the value of f on points drawn independently from the uniform distribution. A classic result by Bshouty-Tamon (J. ACM 1996) proved that monotone functions can be learned with exp(Õ(min{(1/ε)√d,d})) samples and it is not hard to show that this bound extends to testing. Prior to our work the only lower bound for this problem was Ω(√{exp(d)/ε}) in the small ε parameter regime, when ε = O(d^{-3/2}), due to Goldreich-Goldwasser-Lehman-Ron-Samorodnitsky (Combinatorica 2000). Thus, the sample complexity of monotonicity testing was wide open for ε ≫ d^{-3/2}. We resolve this question, obtaining a nearly tight lower bound of exp(Ω(min{(1/ε)√d,d})) for all ε at most a sufficiently small constant. In fact, we prove a much more general result, showing that the sample complexity of k-monotonicity testing and learning for functions f : {0,1}^d → [r] is exp(Ω(min{(rk/ε)√d,d})). For testing with one-sided error we show that the sample complexity is exp(Ω(d)). Beyond the hypercube, we prove nearly tight bounds (up to polylog factors of d,k,r,1/ε in the exponent) of exp(Θ̃(min{(rk/ε)√d,d})) on the sample complexity of testing and learning measurable k-monotone functions f : ℝ^d → [r] under product distributions. Our upper bound improves upon the previous bound of exp(Õ(min{(k/ε²)√d,d})) by Harms-Yoshida (ICALP 2022) for Boolean functions (r = 2).

Cite as

Hadley Black. Nearly Optimal Bounds for Sample-Based Testing and Learning of k-Monotone Functions. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 317, pp. 37:1-37:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{black:LIPIcs.APPROX/RANDOM.2024.37,
  author =	{Black, Hadley},
  title =	{{Nearly Optimal Bounds for Sample-Based Testing and Learning of k-Monotone Functions}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024)},
  pages =	{37:1--37:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-348-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{317},
  editor =	{Kumar, Amit and Ron-Zewi, Noga},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2024.37},
  URN =		{urn:nbn:de:0030-drops-210308},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2024.37},
  annote =	{Keywords: Property testing, learning, Boolean functions, monotonicity, k-monotonicity}
}
Document
Tenspiler: A Verified-Lifting-Based Compiler for Tensor Operations

Authors: Jie Qiu, Colin Cai, Sahil Bhatia, Niranjan Hasabnis, Sanjit A. Seshia, and Alvin Cheung

Published in: LIPIcs, Volume 313, 38th European Conference on Object-Oriented Programming (ECOOP 2024)


Abstract
Tensor processing infrastructures such as deep learning frameworks and specialized hardware accelerators have revolutionized how computationally intensive code from domains such as deep learning and image processing is executed and optimized. These infrastructures provide powerful and expressive abstractions while ensuring high performance. However, to utilize them, code must be written specifically using the APIs / ISAs of such software frameworks or hardware accelerators. Importantly, given the fast pace of innovation in these domains, code written today quickly becomes legacy as new frameworks and accelerators are developed, and migrating such legacy code manually is a considerable effort. To enable developers in leveraging such DSLs while preserving their current programming paradigm, we present Tenspiler, a verified-lifting-based compiler that uses program synthesis to translate sequential programs written in general-purpose programming languages (e.g., C++ or Python code that does not leverage any specialized framework or accelerator) into tensor operations. Central to Tenspiler is our carefully crafted yet simple intermediate language, named TensIR, that expresses tensor operations. TensIR enables efficient lifting, verification, and code generation. Unlike classical pattern-matching-based compilers, Tenspiler uses program synthesis to translate input code into TensIR, which is then compiled to the target API / ISA. Currently, Tenspiler already supports six DSLs, spanning a broad spectrum of software and hardware environments. Furthermore, we show that new backends can be easily supported by Tenspiler by adding simple pattern-matching rules for TensIR. Using 10 real-world code benchmark suites, our experimental evaluation shows that by translating code to be executed on 6 different software frameworks and hardware devices, Tenspiler offers on average 105× kernel and 9.65× end-to-end execution time improvement over the fully-optimized sequential implementation of the same benchmarks.

Cite as

Jie Qiu, Colin Cai, Sahil Bhatia, Niranjan Hasabnis, Sanjit A. Seshia, and Alvin Cheung. Tenspiler: A Verified-Lifting-Based Compiler for Tensor Operations. In 38th European Conference on Object-Oriented Programming (ECOOP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 313, pp. 32:1-32:28, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{qiu_et_al:LIPIcs.ECOOP.2024.32,
  author =	{Qiu, Jie and Cai, Colin and Bhatia, Sahil and Hasabnis, Niranjan and Seshia, Sanjit A. and Cheung, Alvin},
  title =	{{Tenspiler: A Verified-Lifting-Based Compiler for Tensor Operations}},
  booktitle =	{38th European Conference on Object-Oriented Programming (ECOOP 2024)},
  pages =	{32:1--32:28},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-341-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{313},
  editor =	{Aldrich, Jonathan and Salvaneschi, Guido},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2024.32},
  URN =		{urn:nbn:de:0030-drops-208817},
  doi =		{10.4230/LIPIcs.ECOOP.2024.32},
  annote =	{Keywords: Program Synthesis, Code Transpilation, Tensor DSLs, Verification}
}
Document
Constraint Modelling with LLMs Using In-Context Learning

Authors: Kostis Michailidis, Dimos Tsouros, and Tias Guns

Published in: LIPIcs, Volume 307, 30th International Conference on Principles and Practice of Constraint Programming (CP 2024)


Abstract
Constraint Programming (CP) allows for the modelling and solving of a wide range of combinatorial problems. However, modelling such problems using constraints over decision variables still requires significant expertise, both in conceptual thinking and syntactic use of modelling languages. In this work, we explore the potential of using pre-trained Large Language Models (LLMs) as coding assistants, to transform textual problem descriptions into concrete and executable CP specifications. We present different transformation pipelines with explicit intermediate representations, and we investigate the potential benefit of various retrieval-augmented example selection strategies for in-context learning. We evaluate our approach on 2 datasets from the literature, namely NL4Opt (optimisation) and Logic Grid Puzzles (satisfaction), and a heterogeneous set of exercises from a CP course. The results show that pre-trained LLMs have promising potential for initialising the modelling process, with retrieval-augmented in-context learning significantly enhancing their modelling capabilities.

Cite as

Kostis Michailidis, Dimos Tsouros, and Tias Guns. Constraint Modelling with LLMs Using In-Context Learning. In 30th International Conference on Principles and Practice of Constraint Programming (CP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 307, pp. 20:1-20:27, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{michailidis_et_al:LIPIcs.CP.2024.20,
  author =	{Michailidis, Kostis and Tsouros, Dimos and Guns, Tias},
  title =	{{Constraint Modelling with LLMs Using In-Context Learning}},
  booktitle =	{30th International Conference on Principles and Practice of Constraint Programming (CP 2024)},
  pages =	{20:1--20:27},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-336-2},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{307},
  editor =	{Shaw, Paul},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CP.2024.20},
  URN =		{urn:nbn:de:0030-drops-207053},
  doi =		{10.4230/LIPIcs.CP.2024.20},
  annote =	{Keywords: Constraint Modelling, Constraint Acquisition, Constraint Programming, Large Language Models, In-Context Learning, Natural Language Processing, Named Entity Recognition, Retrieval-Augmented Generation, Optimisation}
}
Document
Combining Constraint Programming Reasoning with Large Language Model Predictions

Authors: Florian Régin, Elisabetta De Maria, and Alexandre Bonlarron

Published in: LIPIcs, Volume 307, 30th International Conference on Principles and Practice of Constraint Programming (CP 2024)


Abstract
Constraint Programming (CP) and Machine Learning (ML) face challenges in text generation due to CP’s struggle with implementing "meaning" and ML’s difficulty with structural constraints. This paper proposes a solution by combining both approaches and embedding a Large Language Model (LLM) in CP. The LLM handles word generation and meaning, while CP manages structural constraints. This approach builds on GenCP, an improved version of On-the-fly Constraint Programming Search (OTFS) using LLM-generated domains. Compared to Beam Search (BS), a standard NLP method, this combined approach (GenCP with LLM) is faster and produces better results, ensuring all constraints are satisfied. This fusion of CP and ML presents new possibilities for enhancing text generation under constraints.

Cite as

Florian Régin, Elisabetta De Maria, and Alexandre Bonlarron. Combining Constraint Programming Reasoning with Large Language Model Predictions. In 30th International Conference on Principles and Practice of Constraint Programming (CP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 307, pp. 25:1-25:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{regin_et_al:LIPIcs.CP.2024.25,
  author =	{R\'{e}gin, Florian and De Maria, Elisabetta and Bonlarron, Alexandre},
  title =	{{Combining Constraint Programming Reasoning with Large Language Model Predictions}},
  booktitle =	{30th International Conference on Principles and Practice of Constraint Programming (CP 2024)},
  pages =	{25:1--25:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-336-2},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{307},
  editor =	{Shaw, Paul},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CP.2024.25},
  URN =		{urn:nbn:de:0030-drops-207109},
  doi =		{10.4230/LIPIcs.CP.2024.25},
  annote =	{Keywords: Solver and Tools, ML-augmented CP, Constrained Text Generation, ML alongside CO}
}
Document
Track A: Algorithms, Complexity and Games
Fully-Scalable MPC Algorithms for Clustering in High Dimension

Authors: Artur Czumaj, Guichen Gao, Shaofeng H.-C. Jiang, Robert Krauthgamer, and Pavel Veselý

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)


Abstract
We design new parallel algorithms for clustering in high-dimensional Euclidean spaces. These algorithms run in the Massively Parallel Computation (MPC) model, and are fully scalable, meaning that the local memory in each machine may be n^σ for arbitrarily small fixed σ > 0. Importantly, the local memory may be substantially smaller than the number of clusters k, yet all our algorithms are fast, i.e., run in O(1) rounds. We first devise a fast MPC algorithm for O(1)-approximation of uniform Facility Location. This is the first fully-scalable MPC algorithm that achieves O(1)-approximation for any clustering problem in general geometric setting; previous algorithms only provide poly(log n)-approximation or apply to restricted inputs, like low dimension or small number of clusters k; e.g. [Bhaskara and Wijewardena, ICML'18; Cohen-Addad et al., NeurIPS'21; Cohen-Addad et al., ICML'22]. We then build on this Facility Location result and devise a fast MPC algorithm that achieves O(1)-bicriteria approximation for k-Median and for k-Means, namely, it computes (1+ε)k clusters of cost within O(1/ε²)-factor of the optimum for k clusters. A primary technical tool that we introduce, and may be of independent interest, is a new MPC primitive for geometric aggregation, namely, computing for every data point a statistic of its approximate neighborhood, for statistics like range counting and nearest-neighbor search. Our implementation of this primitive works in high dimension, and is based on consistent hashing (aka sparse partition), a technique that was recently used for streaming algorithms [Czumaj et al., FOCS'22].

Cite as

Artur Czumaj, Guichen Gao, Shaofeng H.-C. Jiang, Robert Krauthgamer, and Pavel Veselý. Fully-Scalable MPC Algorithms for Clustering in High Dimension. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 50:1-50:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{czumaj_et_al:LIPIcs.ICALP.2024.50,
  author =	{Czumaj, Artur and Gao, Guichen and Jiang, Shaofeng H.-C. and Krauthgamer, Robert and Vesel\'{y}, Pavel},
  title =	{{Fully-Scalable MPC Algorithms for Clustering in High Dimension}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{50:1--50:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.50},
  URN =		{urn:nbn:de:0030-drops-201938},
  doi =		{10.4230/LIPIcs.ICALP.2024.50},
  annote =	{Keywords: Massively parallel computing, high dimension, facility location, k-median, k-means}
}
Document
Track A: Algorithms, Complexity and Games
Simultaneously Approximating All 𝓁_p-Norms in Correlation Clustering

Authors: Sami Davies, Benjamin Moseley, and Heather Newman

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)


Abstract
This paper considers correlation clustering on unweighted complete graphs. We give a combinatorial algorithm that returns a single clustering solution that is simultaneously O(1)-approximate for all 𝓁_p-norms of the disagreement vector; in other words, a combinatorial O(1)-approximation of the all-norms objective for correlation clustering. This is the first proof that minimal sacrifice is needed in order to optimize different norms of the disagreement vector. In addition, our algorithm is the first combinatorial approximation algorithm for the 𝓁₂-norm objective, and more generally the first combinatorial algorithm for the 𝓁_p-norm objective when 1 < p < ∞. It is also faster than all previous algorithms that minimize the 𝓁_p-norm of the disagreement vector, with run-time O(n^ω), where O(n^ω) is the time for matrix multiplication on n × n matrices. When the maximum positive degree in the graph is at most Δ, this can be improved to a run-time of O(nΔ² log n).

Cite as

Sami Davies, Benjamin Moseley, and Heather Newman. Simultaneously Approximating All 𝓁_p-Norms in Correlation Clustering. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 52:1-52:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{davies_et_al:LIPIcs.ICALP.2024.52,
  author =	{Davies, Sami and Moseley, Benjamin and Newman, Heather},
  title =	{{Simultaneously Approximating All 𝓁\underlinep-Norms in Correlation Clustering}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{52:1--52:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.52},
  URN =		{urn:nbn:de:0030-drops-201950},
  doi =		{10.4230/LIPIcs.ICALP.2024.52},
  annote =	{Keywords: Approximation algorithms, correlation clustering, all-norms, lp-norms}
}
Document
Improved Sample Complexity Bounds for Branch-And-Cut

Authors: Maria-Florina Balcan, Siddharth Prasad, Tuomas Sandholm, and Ellen Vitercik

Published in: LIPIcs, Volume 235, 28th International Conference on Principles and Practice of Constraint Programming (CP 2022)


Abstract
The branch-and-cut algorithm for integer programming has a wide variety of tunable parameters that have a huge impact on its performance, but which are challenging to tune by hand. An increasingly popular approach is to use machine learning to configure these parameters based on a training set of integer programs from the application domain. We bound how large the training set should be to ensure that for any configuration, its average performance over the training set is close to its expected future performance. Our guarantees apply to parameters that control the most important aspects of branch-and-cut: node selection, branching constraint selection, and cut selection, and are sharper and more general than those from prior research.

Cite as

Maria-Florina Balcan, Siddharth Prasad, Tuomas Sandholm, and Ellen Vitercik. Improved Sample Complexity Bounds for Branch-And-Cut. In 28th International Conference on Principles and Practice of Constraint Programming (CP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 235, pp. 3:1-3:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{balcan_et_al:LIPIcs.CP.2022.3,
  author =	{Balcan, Maria-Florina and Prasad, Siddharth and Sandholm, Tuomas and Vitercik, Ellen},
  title =	{{Improved Sample Complexity Bounds for Branch-And-Cut}},
  booktitle =	{28th International Conference on Principles and Practice of Constraint Programming (CP 2022)},
  pages =	{3:1--3:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-240-2},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{235},
  editor =	{Solnon, Christine},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CP.2022.3},
  URN =		{urn:nbn:de:0030-drops-166321},
  doi =		{10.4230/LIPIcs.CP.2022.3},
  annote =	{Keywords: Automated algorithm configuration, integer programming, machine learning theory, tree search, branch-and-bound, branch-and-cut, cutting planes, sample complexity, generalization guarantees, data-driven algorithm design}
}
Document
Invited Talk
Generalization Guarantees for Data-Driven Mechanism Design (Invited Talk)

Authors: Maria-Florina Balcan

Published in: LIPIcs, Volume 219, 39th International Symposium on Theoretical Aspects of Computer Science (STACS 2022)


Abstract
Many mechanisms including pricing mechanisms and auctions typically come with a variety of tunable parameters which impact significantly their desired performance guarantees. Data-driven mechanism design is a powerful approach for designing mechanisms, where these parameters are tuned via machine learning based on data. In this talk I will discuss how techniques from machine learning theory can be adapted and extended to analyze generalization guarantees of data-driven mechanism design.

Cite as

Maria-Florina Balcan. Generalization Guarantees for Data-Driven Mechanism Design (Invited Talk). In 39th International Symposium on Theoretical Aspects of Computer Science (STACS 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 219, p. 2:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{balcan:LIPIcs.STACS.2022.2,
  author =	{Balcan, Maria-Florina},
  title =	{{Generalization Guarantees for Data-Driven Mechanism Design}},
  booktitle =	{39th International Symposium on Theoretical Aspects of Computer Science (STACS 2022)},
  pages =	{2:1--2:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-222-8},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{219},
  editor =	{Berenbrink, Petra and Monmege, Benjamin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2022.2},
  URN =		{urn:nbn:de:0030-drops-158127},
  doi =		{10.4230/LIPIcs.STACS.2022.2},
  annote =	{Keywords: mechanism configuration, algorithm configuration, machine learning, generalization guarantees}
}
Document
Track A: Algorithms, Complexity and Games
Robust Communication-Optimal Distributed Clustering Algorithms

Authors: Pranjal Awasthi, Ainesh Bakshi, Maria-Florina Balcan, Colin White, and David P. Woodruff

Published in: LIPIcs, Volume 132, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)


Abstract
In this work, we study the k-median and k-means clustering problems when the data is distributed across many servers and can contain outliers. While there has been a lot of work on these problems for worst-case instances, we focus on gaining a finer understanding through the lens of beyond worst-case analysis. Our main motivation is the following: for many applications such as clustering proteins by function or clustering communities in a social network, there is some unknown target clustering, and the hope is that running a k-median or k-means algorithm will produce clusterings which are close to matching the target clustering. Worst-case results can guarantee constant factor approximations to the optimal k-median or k-means objective value, but not closeness to the target clustering. Our first result is a distributed algorithm which returns a near-optimal clustering assuming a natural notion of stability, namely, approximation stability [Awasthi and Balcan, 2014], even when a constant fraction of the data are outliers. The communication complexity is O~(sk+z) where s is the number of machines, k is the number of clusters, and z is the number of outliers. Next, we show this amount of communication cannot be improved even in the setting when the input satisfies various non-worst-case assumptions. We give a matching Omega(sk+z) lower bound on the communication required both for approximating the optimal k-means or k-median cost up to any constant, and for returning a clustering that is close to the target clustering in Hamming distance. These lower bounds hold even when the data satisfies approximation stability or other common notions of stability, and the cluster sizes are balanced. Therefore, Omega(sk+z) is a communication bottleneck, even for real-world instances.

Cite as

Pranjal Awasthi, Ainesh Bakshi, Maria-Florina Balcan, Colin White, and David P. Woodruff. Robust Communication-Optimal Distributed Clustering Algorithms. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 132, pp. 18:1-18:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{awasthi_et_al:LIPIcs.ICALP.2019.18,
  author =	{Awasthi, Pranjal and Bakshi, Ainesh and Balcan, Maria-Florina and White, Colin and Woodruff, David P.},
  title =	{{Robust Communication-Optimal Distributed Clustering Algorithms}},
  booktitle =	{46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)},
  pages =	{18:1--18:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-109-2},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{132},
  editor =	{Baier, Christel and Chatzigiannakis, Ioannis and Flocchini, Paola and Leonardi, Stefano},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2019.18},
  URN =		{urn:nbn:de:0030-drops-105942},
  doi =		{10.4230/LIPIcs.ICALP.2019.18},
  annote =	{Keywords: robust distributed clustering, communication complexity}
}
Document
Perturbation Resilient Clustering for k-Center and Related Problems via LP Relaxations

Authors: Chandra Chekuri and Shalmoli Gupta

Published in: LIPIcs, Volume 116, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018)


Abstract
We consider clustering in the perturbation resilience model that has been studied since the work of Bilu and Linial [Yonatan Bilu and Nathan Linial, 2010] and Awasthi, Blum and Sheffet [Awasthi et al., 2012]. A clustering instance I is said to be alpha-perturbation resilient if the optimal solution does not change when the pairwise distances are modified by a factor of alpha and the perturbed distances satisfy the metric property - this is the metric perturbation resilience property introduced in [Angelidakis et al., 2017] and a weaker requirement than prior models. We make two high-level contributions. - We show that the natural LP relaxation of k-center and asymmetric k-center is integral for 2-perturbation resilient instances. We belive that demonstrating the goodness of standard LP relaxations complements existing results [Maria{-}Florina Balcan et al., 2016; Angelidakis et al., 2017] that are based on new algorithms designed for the perturbation model. - We define a simple new model of perturbation resilience for clustering with outliers. Using this model we show that the unified MST and dynamic programming based algorithm proposed in [Angelidakis et al., 2017] exactly solves the clustering with outliers problem for several common center based objectives (like k-center, k-means, k-median) when the instances is 2-perturbation resilient. We further show that a natural LP relxation is integral for 2-perturbation resilient instances of k-center with outliers.

Cite as

Chandra Chekuri and Shalmoli Gupta. Perturbation Resilient Clustering for k-Center and Related Problems via LP Relaxations. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 116, pp. 9:1-9:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{chekuri_et_al:LIPIcs.APPROX-RANDOM.2018.9,
  author =	{Chekuri, Chandra and Gupta, Shalmoli},
  title =	{{Perturbation Resilient Clustering for k-Center and Related Problems via LP Relaxations}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018)},
  pages =	{9:1--9:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-085-9},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{116},
  editor =	{Blais, Eric and Jansen, Klaus and D. P. Rolim, Jos\'{e} and Steurer, David},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2018.9},
  URN =		{urn:nbn:de:0030-drops-94136},
  doi =		{10.4230/LIPIcs.APPROX-RANDOM.2018.9},
  annote =	{Keywords: Clustering, Perturbation Resilience, LP Integrality, Outliers, Beyond Worst Case Analysis}
}
Document
Matrix Completion and Related Problems via Strong Duality

Authors: Maria-Florina Balcan, Yingyu Liang, David P. Woodruff, and Hongyang Zhang

Published in: LIPIcs, Volume 94, 9th Innovations in Theoretical Computer Science Conference (ITCS 2018)


Abstract
This work studies the strong duality of non-convex matrix factorization problems: we show that under certain dual conditions, these problems and its dual have the same optimum. This has been well understood for convex optimization, but little was known for non-convex problems. We propose a novel analytical framework and show that under certain dual conditions, the optimal solution of the matrix factorization program is the same as its bi-dual and thus the global optimality of the non-convex program can be achieved by solving its bi-dual which is convex. These dual conditions are satisfied by a wide class of matrix factorization problems, although matrix factorization problems are hard to solve in full generality. This analytical framework may be of independent interest to non-convex optimization more broadly. We apply our framework to two prototypical matrix factorization problems: matrix completion and robust Principal Component Analysis (PCA). These are examples of efficiently recovering a hidden matrix given limited reliable observations of it. Our framework shows that exact recoverability and strong duality hold with nearly-optimal sample complexity guarantees for matrix completion and robust PCA.

Cite as

Maria-Florina Balcan, Yingyu Liang, David P. Woodruff, and Hongyang Zhang. Matrix Completion and Related Problems via Strong Duality. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 94, pp. 5:1-5:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{balcan_et_al:LIPIcs.ITCS.2018.5,
  author =	{Balcan, Maria-Florina and Liang, Yingyu and Woodruff, David P. and Zhang, Hongyang},
  title =	{{Matrix Completion and Related Problems via Strong Duality}},
  booktitle =	{9th Innovations in Theoretical Computer Science Conference (ITCS 2018)},
  pages =	{5:1--5:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-060-6},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{94},
  editor =	{Karlin, Anna R.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2018.5},
  URN =		{urn:nbn:de:0030-drops-83583},
  doi =		{10.4230/LIPIcs.ITCS.2018.5},
  annote =	{Keywords: Non-Convex Optimization, Strong Duality, Matrix Completion, Robust PCA, Sample Complexity}
}
Document
Foundations of Unsupervised Learning (Dagstuhl Seminar 16382)

Authors: Maria-Florina Balcan, Shai Ben-David, Ruth Urner, and Ulrike von Luxburg

Published in: Dagstuhl Reports, Volume 6, Issue 9 (2017)


Abstract
This report documents the program and the outcomes of Dagstuhl Seminar 16382 "Foundations of Unsupervised Learning". Unsupervised learning techniques are frequently used in practice of data analysis. However, there is currently little formal guidance as to how, when and to what effect to use which unsupervised learning method. The goal of the seminar was to initiate a broader and more systematic research on the foundations of unsupervised learning with the ultimate aim to provide more support to practitioners. The seminar brought together academic researchers from the fields of theoretical computer science and statistics as well as some researchers from industry.

Cite as

Maria-Florina Balcan, Shai Ben-David, Ruth Urner, and Ulrike von Luxburg. Foundations of Unsupervised Learning (Dagstuhl Seminar 16382). In Dagstuhl Reports, Volume 6, Issue 9, pp. 94-109, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@Article{balcan_et_al:DagRep.6.9.94,
  author =	{Balcan, Maria-Florina and Ben-David, Shai and Urner, Ruth and von Luxburg, Ulrike},
  title =	{{Foundations of Unsupervised Learning (Dagstuhl Seminar 16382)}},
  pages =	{94--109},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2017},
  volume =	{6},
  number =	{9},
  editor =	{Balcan, Maria-Florina and Ben-David, Shai and Urner, Ruth and von Luxburg, Ulrike},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagRep.6.9.94},
  URN =		{urn:nbn:de:0030-drops-69542},
  doi =		{10.4230/DagRep.6.9.94},
  annote =	{Keywords: Machine learning, theory of computing, unsupervised learning, representation learning}
}
Document
k-Center Clustering Under Perturbation Resilience

Authors: Maria-Florina Balcan, Nika Haghtalab, and Colin White

Published in: LIPIcs, Volume 55, 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016)


Abstract
The k-center problem is a canonical and long-studied facility location and clustering problem with many applications in both its symmetric and asymmetric forms. Both versions of the problem have tight approximation factors on worst case instances: a 2-approximation for symmetric kcenter and an O(log*(k))-approximation for the asymmetric version. Therefore to improve on these ratios, one must go beyond the worst case. In this work, we take this approach and provide strong positive results both for the asymmetric and symmetric k-center problems under a very natural input stability (promise) condition called alpha-perturbation resilience [Bilu Linial, 2012], which states that the optimal solution does not change under any alpha-factor perturbation to the input distances. We show that by assuming 2-perturbation resilience, the exact solution for the asymmetric k-center problem can be found in polynomial time. To our knowledge, this is the first problem that is hard to approximate to any constant factor in the worst case, yet can be optimally solved in polynomial time under perturbation resilience for a constant value of alpha. Furthermore, we prove our result is tight by showing symmetric k-center under (2-epsilon)-perturbation resilience is hard unless NP=RP. This is the first tight result for any problem under perturbation resilience, i.e., this is the first time the exact value of alpha for which the problem switches from being NP-hard to efficiently computable has been found. Our results illustrate a surprising relationship between symmetric and asymmetric k-center instances under perturbation resilience. Unlike approximation ratio, for which symmetric k-center is easily solved to a factor of 2 but asymmetric k-center cannot be approximated to any constant factor, both symmetric and asymmetric k-center can be solved optimally under resilience to 2-perturbations.

Cite as

Maria-Florina Balcan, Nika Haghtalab, and Colin White. k-Center Clustering Under Perturbation Resilience. In 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 55, pp. 68:1-68:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{balcan_et_al:LIPIcs.ICALP.2016.68,
  author =	{Balcan, Maria-Florina and Haghtalab, Nika and White, Colin},
  title =	{{k-Center Clustering Under Perturbation Resilience}},
  booktitle =	{43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016)},
  pages =	{68:1--68:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-013-2},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{55},
  editor =	{Chatzigiannakis, Ioannis and Mitzenmacher, Michael and Rabani, Yuval and Sangiorgi, Davide},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2016.68},
  URN =		{urn:nbn:de:0030-drops-62160},
  doi =		{10.4230/LIPIcs.ICALP.2016.68},
  annote =	{Keywords: k-center, clustering, perturbation resilience}
}
Document
Analysis of Algorithms Beyond the Worst Case (Dagstuhl Seminar 14372)

Authors: Marina-Florina Balcan, Bodo Manthey, Heiko Röglin, and Tim Roughgarden

Published in: Dagstuhl Reports, Volume 4, Issue 9 (2015)


Abstract
This report documents the program and the outcomes of Dagstuhl Seminar 14372 "Analysis of Algorithms Beyond the Worst Case". The theory of algorithms has traditionally focused on worst-case analysis. This focus has led to both a deep theory and many beautiful and useful algorithms. However, there are a number of important problems and algorithms for which worst-case analysis does not provide useful or empirically accurate results. This is due to the fact that worst-case inputs are often rather contrived and occur hardly ever in practical applications. Only in recent years a paradigm shift towards a more realistic and robust algorithmic theory has been initiated. The development of a more realistic theory hinges on finding models that measure the performance of an algorithm not only by its worst-case behavior but rather by its behavior on "typical" inputs. In this seminar, we discussed various recent theoretical models and results that go beyond worst-case analysis. The seminar helped to consolidate the research and to foster collaborations among the researchers working in the different branches of analysis of algorithms beyond the worst case.

Cite as

Marina-Florina Balcan, Bodo Manthey, Heiko Röglin, and Tim Roughgarden. Analysis of Algorithms Beyond the Worst Case (Dagstuhl Seminar 14372). In Dagstuhl Reports, Volume 4, Issue 9, pp. 30-49, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)


Copy BibTex To Clipboard

@Article{balcan_et_al:DagRep.4.9.30,
  author =	{Balcan, Marina-Florina and Manthey, Bodo and R\"{o}glin, Heiko and Roughgarden, Tim},
  title =	{{Analysis of Algorithms Beyond the Worst Case (Dagstuhl Seminar 14372)}},
  pages =	{30--49},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2015},
  volume =	{4},
  number =	{9},
  editor =	{Balcan, Marina-Florina and Manthey, Bodo and R\"{o}glin, Heiko and Roughgarden, Tim},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagRep.4.9.30},
  URN =		{urn:nbn:de:0030-drops-48829},
  doi =		{10.4230/DagRep.4.9.30},
  annote =	{Keywords: analysis of algorithms, probabilistic analysis, smoothed analysis, approximation stability, machine learning}
}
  • Refine by Author
  • 7 Balcan, Maria-Florina
  • 2 White, Colin
  • 2 Woodruff, David P.
  • 1 Awasthi, Pranjal
  • 1 Bakshi, Ainesh
  • Show More...

  • Refine by Classification
  • 3 Theory of computation → Facility location and clustering
  • 2 Theory of computation → Constraint and logic programming
  • 1 Computing methodologies → Discrete space search
  • 1 Computing methodologies → Natural language generation
  • 1 Mathematics of computing → Probabilistic algorithms
  • Show More...

  • Refine by Keyword
  • 2 clustering
  • 2 generalization guarantees
  • 2 k-means
  • 2 machine learning
  • 1 Approximation algorithms
  • Show More...

  • Refine by Type
  • 16 document

  • Refine by Publication Year
  • 7 2024
  • 2 2018
  • 2 2022
  • 1 2007
  • 1 2015
  • Show More...

Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail