Document

**Published in:** LIPIcs, Volume 107, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)

We study k-means clustering in a semi-supervised setting. Given an oracle that returns whether two given points belong to the same cluster in a fixed optimal clustering, we investigate the following question: how many oracle queries are sufficient to efficiently recover a clustering that, with probability at least (1 - delta), simultaneously has a cost of at most (1 + epsilon) times the optimal cost and an accuracy of at least (1 - epsilon)?
We show how to achieve such a clustering on n points with O{((k^2 log n) * m{(Q, epsilon^4, delta / (k log n))})} oracle queries, when the k clusters can be learned with an epsilon' error and a failure probability delta' using m(Q, epsilon',delta') labeled samples in the supervised setting, where Q is the set of candidate cluster centers. We show that m(Q, epsilon', delta') is small both for k-means instances in Euclidean space and for those in finite metric spaces. We further show that, for the Euclidean k-means instances, we can avoid the dependency on n in the query complexity at the expense of an increased dependency on k: specifically, we give a slightly more involved algorithm that uses O{(k^4/(epsilon^2 delta) + (k^{9}/epsilon^4) log(1/delta) + k * m{({R}^r, epsilon^4/k, delta)})} oracle queries.
We also show that the number of queries needed for (1 - epsilon)-accuracy in Euclidean k-means must linearly depend on the dimension of the underlying Euclidean space, and for finite metric space k-means, we show that it must at least be logarithmic in the number of candidate centers. This shows that our query complexities capture the right dependencies on the respective parameters.

Buddhima Gamlath, Sangxia Huang, and Ola Svensson. Semi-Supervised Algorithms for Approximately Optimal and Accurate Clustering. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 57:1-57:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{gamlath_et_al:LIPIcs.ICALP.2018.57, author = {Gamlath, Buddhima and Huang, Sangxia and Svensson, Ola}, title = {{Semi-Supervised Algorithms for Approximately Optimal and Accurate Clustering}}, booktitle = {45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)}, pages = {57:1--57:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-076-7}, ISSN = {1868-8969}, year = {2018}, volume = {107}, editor = {Chatzigiannakis, Ioannis and Kaklamanis, Christos and Marx, D\'{a}niel and Sannella, Donald}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2018.57}, URN = {urn:nbn:de:0030-drops-90612}, doi = {10.4230/LIPIcs.ICALP.2018.57}, annote = {Keywords: Clustering, Semi-supervised Learning, Approximation Algorithms, k-Means, k-Median} }

Document

**Published in:** LIPIcs, Volume 40, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2015)

An instance of the 2-Lin(2) problem is a system of equations of the form "x_i + x_j = b (mod 2)". Given such a system in which it's possible to satisfy all but an epsilon fraction of the equations, we show it is NP-hard to satisfy all but a C*epsilon fraction of the equations, for any C < 11/8 = 1.375 (and any 0 < epsilon <= 1/8). The previous best result, standing for over 15 years, had 5/4 in place of 11/8. Our result provides the best known NP-hardness even for the Unique Games problem, and it also holds for the special case of Max-Cut. The precise factor 11/8 is unlikely to be best possible; we also give a conjecture concerning analysis of Boolean functions which, if true, would yield a larger hardness factor of 3/2.
Our proof is by a modified gadget reduction from a pairwise-independent predicate. We also show an inherent limitation to this type of gadget reduction. In particular, any such reduction can never establish a hardness factor C greater than 2.54. Previously, no such limitation on gadget reductions was known.

Johan Håstad, Sangxia Huang, Rajsekar Manokaran, Ryan O’Donnell, and John Wright. Improved NP-Inapproximability for 2-Variable Linear Equations. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 40, pp. 341-360, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)

Copy BibTex To Clipboard

@InProceedings{hastad_et_al:LIPIcs.APPROX-RANDOM.2015.341, author = {H\r{a}stad, Johan and Huang, Sangxia and Manokaran, Rajsekar and O’Donnell, Ryan and Wright, John}, title = {{Improved NP-Inapproximability for 2-Variable Linear Equations}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2015)}, pages = {341--360}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-939897-89-7}, ISSN = {1868-8969}, year = {2015}, volume = {40}, editor = {Garg, Naveen and Jansen, Klaus and Rao, Anup and Rolim, Jos\'{e} D. P.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2015.341}, URN = {urn:nbn:de:0030-drops-53112}, doi = {10.4230/LIPIcs.APPROX-RANDOM.2015.341}, annote = {Keywords: approximability, unique games, linear equation, gadget, linear programming} }

Document

**Published in:** LIPIcs, Volume 9, 28th International Symposium on Theoretical Aspects of Computer Science (STACS 2011)

We prove a complexity dichotomy theorem for counting weighted Boolean CSP modulo k for any positive integer $k>1$. This generalizes a theorem by Faben for the unweighted setting. In the weighted setting, there are new interesting tractable problems. We first prove a dichotomy theorem for the finite field case where k is a prime. It turns out that the dichotomy theorem for the finite field is very similar to the one for the complex weighted Boolean #CSP, found by [Cai, Lu and Xia, STOC 2009]. Then we further extend the result to an arbitrary integer k.

Heng Guo, Sangxia Huang, Pinyan Lu, and Mingji Xia. The Complexity of Weighted Boolean #CSP Modulo k. In 28th International Symposium on Theoretical Aspects of Computer Science (STACS 2011). Leibniz International Proceedings in Informatics (LIPIcs), Volume 9, pp. 249-260, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2011)

Copy BibTex To Clipboard

@InProceedings{guo_et_al:LIPIcs.STACS.2011.249, author = {Guo, Heng and Huang, Sangxia and Lu, Pinyan and Xia, Mingji}, title = {{The Complexity of Weighted Boolean #CSP Modulo k}}, booktitle = {28th International Symposium on Theoretical Aspects of Computer Science (STACS 2011)}, pages = {249--260}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-939897-25-5}, ISSN = {1868-8969}, year = {2011}, volume = {9}, editor = {Schwentick, Thomas and D\"{u}rr, Christoph}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2011.249}, URN = {urn:nbn:de:0030-drops-30158}, doi = {10.4230/LIPIcs.STACS.2011.249}, annote = {Keywords: #CSP, dichotomy theorem, counting problems, computational complexity} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail