DROPS

Document

DOI: 10.4230/LIPIcs.ICALP.2018.57

Semi-Supervised Algorithms for Approximately Optimal and Accurate Clustering

Authors: Buddhima Gamlath, Sangxia Huang, and Ola Svensson

Published in: LIPIcs, Volume 107, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)

Abstract

We study k-means clustering in a semi-supervised setting. Given an oracle that returns whether two given points belong to the same cluster in a fixed optimal clustering, we investigate the following question: how many oracle queries are sufficient to efficiently recover a clustering that, with probability at least (1 - delta), simultaneously has a cost of at most (1 + epsilon) times the optimal cost and an accuracy of at least (1 - epsilon)? We show how to achieve such a clustering on n points with O{((k^2 log n) * m{(Q, epsilon^4, delta / (k log n))})} oracle queries, when the k clusters can be learned with an epsilon' error and a failure probability delta' using m(Q, epsilon',delta') labeled samples in the supervised setting, where Q is the set of candidate cluster centers. We show that m(Q, epsilon', delta') is small both for k-means instances in Euclidean space and for those in finite metric spaces. We further show that, for the Euclidean k-means instances, we can avoid the dependency on n in the query complexity at the expense of an increased dependency on k: specifically, we give a slightly more involved algorithm that uses O{(k^4/(epsilon^2 delta) + (k^{9}/epsilon^4) log(1/delta) + k * m{({R}^r, epsilon^4/k, delta)})} oracle queries. We also show that the number of queries needed for (1 - epsilon)-accuracy in Euclidean k-means must linearly depend on the dimension of the underlying Euclidean space, and for finite metric space k-means, we show that it must at least be logarithmic in the number of candidate centers. This shows that our query complexities capture the right dependencies on the respective parameters.

Cite as

Buddhima Gamlath, Sangxia Huang, and Ola Svensson. Semi-Supervised Algorithms for Approximately Optimal and Accurate Clustering. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 57:1-57:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{gamlath_et_al:LIPIcs.ICALP.2018.57,
  author =	{Gamlath, Buddhima and Huang, Sangxia and Svensson, Ola},
  title =	{{Semi-Supervised Algorithms for Approximately Optimal and Accurate Clustering}},
  booktitle =	{45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)},
  pages =	{57:1--57:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-076-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{107},
  editor =	{Chatzigiannakis, Ioannis and Kaklamanis, Christos and Marx, D\'{a}niel and Sannella, Donald},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2018.57},
  URN =		{urn:nbn:de:0030-drops-90612},
  doi =		{10.4230/LIPIcs.ICALP.2018.57},
  annote =	{Keywords: Clustering, Semi-supervised Learning, Approximation Algorithms, k-Means, k-Median}
}

Document

DOI: 10.4230/LIPIcs.APPROX-RANDOM.2015.341

Improved NP-Inapproximability for 2-Variable Linear Equations

Authors: Johan Håstad, Sangxia Huang, Rajsekar Manokaran, Ryan O’Donnell, and John Wright

Published in: LIPIcs, Volume 40, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2015)

Abstract

An instance of the 2-Lin(2) problem is a system of equations of the form "x_i + x_j = b (mod 2)". Given such a system in which it's possible to satisfy all but an epsilon fraction of the equations, we show it is NP-hard to satisfy all but a C*epsilon fraction of the equations, for any C < 11/8 = 1.375 (and any 0 < epsilon <= 1/8). The previous best result, standing for over 15 years, had 5/4 in place of 11/8. Our result provides the best known NP-hardness even for the Unique Games problem, and it also holds for the special case of Max-Cut. The precise factor 11/8 is unlikely to be best possible; we also give a conjecture concerning analysis of Boolean functions which, if true, would yield a larger hardness factor of 3/2. Our proof is by a modified gadget reduction from a pairwise-independent predicate. We also show an inherent limitation to this type of gadget reduction. In particular, any such reduction can never establish a hardness factor C greater than 2.54. Previously, no such limitation on gadget reductions was known.

Cite as

Johan Håstad, Sangxia Huang, Rajsekar Manokaran, Ryan O’Donnell, and John Wright. Improved NP-Inapproximability for 2-Variable Linear Equations. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 40, pp. 341-360, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)

Copy BibTex To Clipboard

@InProceedings{hastad_et_al:LIPIcs.APPROX-RANDOM.2015.341,
  author =	{H\r{a}stad, Johan and Huang, Sangxia and Manokaran, Rajsekar and O’Donnell, Ryan and Wright, John},
  title =	{{Improved NP-Inapproximability for 2-Variable Linear Equations}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2015)},
  pages =	{341--360},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-89-7},
  ISSN =	{1868-8969},
  year =	{2015},
  volume =	{40},
  editor =	{Garg, Naveen and Jansen, Klaus and Rao, Anup and Rolim, Jos\'{e} D. P.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2015.341},
  URN =		{urn:nbn:de:0030-drops-53112},
  doi =		{10.4230/LIPIcs.APPROX-RANDOM.2015.341},
  annote =	{Keywords: approximability, unique games, linear equation, gadget, linear programming}
}

@InProceedings{hastad_et_al:LIPIcs.APPROX-RANDOM.2015.341,
  author =	{H\r{a}stad, Johan and Huang, Sangxia and Manokaran, Rajsekar and O’Donnell, Ryan and Wright, John},
  title =	{{Improved NP-Inapproximability for 2-Variable Linear Equations}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2015)},
  pages =	{341--360},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-89-7},
  ISSN =	{1868-8969},
  year =	{2015},
  volume =	{40},
  editor =	{Garg, Naveen and Jansen, Klaus and Rao, Anup and Rolim, Jos\'{e} D. P.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2015.341},
  URN =		{urn:nbn:de:0030-drops-53112},
  doi =		{10.4230/LIPIcs.APPROX-RANDOM.2015.341},
  annote =	{Keywords: approximability, unique games, linear equation, gadget, linear programming}
}

Document

DOI: 10.4230/LIPIcs.STACS.2011.249

The Complexity of Weighted Boolean #CSP Modulo k

Authors: Heng Guo, Sangxia Huang, Pinyan Lu, and Mingji Xia

Published in: LIPIcs, Volume 9, 28th International Symposium on Theoretical Aspects of Computer Science (STACS 2011)

Abstract

We prove a complexity dichotomy theorem for counting weighted Boolean CSP modulo k for any positive integer $k>1$. This generalizes a theorem by Faben for the unweighted setting. In the weighted setting, there are new interesting tractable problems. We first prove a dichotomy theorem for the finite field case where k is a prime. It turns out that the dichotomy theorem for the finite field is very similar to the one for the complex weighted Boolean #CSP, found by [Cai, Lu and Xia, STOC 2009]. Then we further extend the result to an arbitrary integer k.

Cite as

Heng Guo, Sangxia Huang, Pinyan Lu, and Mingji Xia. The Complexity of Weighted Boolean #CSP Modulo k. In 28th International Symposium on Theoretical Aspects of Computer Science (STACS 2011). Leibniz International Proceedings in Informatics (LIPIcs), Volume 9, pp. 249-260, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2011)

Copy BibTex To Clipboard

@InProceedings{guo_et_al:LIPIcs.STACS.2011.249,
  author =	{Guo, Heng and Huang, Sangxia and Lu, Pinyan and Xia, Mingji},
  title =	{{The Complexity of Weighted Boolean #CSP Modulo k}},
  booktitle =	{28th International Symposium on Theoretical Aspects of Computer Science (STACS 2011)},
  pages =	{249--260},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-25-5},
  ISSN =	{1868-8969},
  year =	{2011},
  volume =	{9},
  editor =	{Schwentick, Thomas and D\"{u}rr, Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2011.249},
  URN =		{urn:nbn:de:0030-drops-30158},
  doi =		{10.4230/LIPIcs.STACS.2011.249},
  annote =	{Keywords: #CSP, dichotomy theorem, counting problems, computational complexity}
}

Search Results

Documents authored by Huang, Sangxia

Semi-Supervised Algorithms for Approximately Optimal and Accurate Clustering

Abstract

Cite as

Improved NP-Inapproximability for 2-Variable Linear Equations

Abstract

Cite as

The Complexity of Weighted Boolean #CSP Modulo k

Abstract

Cite as