{"@context":"https:\/\/schema.org\/","@type":"ScholarlyArticle","@id":"#article11067","name":"Semi-Supervised Algorithms for Approximately Optimal and Accurate Clustering","abstract":"We study k-means clustering in a semi-supervised setting. Given an oracle that returns whether two given points belong to the same cluster in a fixed optimal clustering, we investigate the following question: how many oracle queries are sufficient to efficiently recover a clustering that, with probability at least (1 - delta), simultaneously has a cost of at most (1 + epsilon) times the optimal cost and an accuracy of at least (1 - epsilon)?\nWe show how to achieve such a clustering on n points with O((k^2 log n) * m(Q, epsilon^4, delta \/ (k log n))) oracle queries, when the k clusters can be learned with an epsilon' error and a failure probability delta' using m(Q, epsilon', delta') labeled samples in the supervised setting, where Q is the set of candidate cluster centers. We show that m(Q, epsilon', delta') is small both for k-means instances in Euclidean space and for those in finite metric spaces. We further show that, for the Euclidean k-means instances, we can avoid the dependency on n in the query complexity at the expense of an increased dependency on k: specifically, we give a slightly more involved algorithm that uses O(k^4\/(epsilon^2 delta) + (k^9\/epsilon^4) log(1\/delta) + k * m(R^r, epsilon^4\/k, delta)) oracle queries.\nWe also show that the number of queries needed for (1 - epsilon)-accuracy in Euclidean k-means must linearly depend on the dimension of the underlying Euclidean space, and for finite metric space k-means, we show that it must at least be logarithmic in the number of candidate centers. This shows that our query complexities capture the right dependencies on the respective parameters.","keywords":["Clustering","Semi-supervised Learning","Approximation Algorithms","k-Means","k-Median"],"author":[{"@type":"Person","name":"Gamlath, Buddhima","givenName":"Buddhima","familyName":"Gamlath","affiliation":"\u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne, Lausanne, Switzerland"},{"@type":"Person","name":"Huang, Sangxia","givenName":"Sangxia","familyName":"Huang","affiliation":"Sony Mobile Communications, Lund, Sweden"},{"@type":"Person","name":"Svensson, Ola","givenName":"Ola","familyName":"Svensson","affiliation":"\u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne, Lausanne, Switzerland"}],"position":57,"pageStart":"57:1","pageEnd":"57:14","dateCreated":"2018-07-04","datePublished":"2018-07-04","isAccessibleForFree":true,"license":"https:\/\/creativecommons.org\/licenses\/by\/3.0\/legalcode","copyrightHolder":[{"@type":"Person","name":"Gamlath, Buddhima","givenName":"Buddhima","familyName":"Gamlath","affiliation":"\u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne, Lausanne, Switzerland"},{"@type":"Person","name":"Huang, Sangxia","givenName":"Sangxia","familyName":"Huang","affiliation":"Sony Mobile Communications, Lund, Sweden"},{"@type":"Person","name":"Svensson, Ola","givenName":"Ola","familyName":"Svensson","affiliation":"\u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne, Lausanne, Switzerland"}],"copyrightYear":"2018","accessMode":"textual","accessModeSufficient":"textual","creativeWorkStatus":"Published","inLanguage":"en-US","sameAs":"https:\/\/doi.org\/10.4230\/LIPIcs.ICALP.2018.57","funding":"This research was supported by ERC Starting Grant 335288-OptApprox.","publisher":"Schloss Dagstuhl \u2013 Leibniz-Zentrum f\u00fcr Informatik","citation":["http:\/\/dx.doi.org\/10.4230\/LIPIcs.ITCS.2018.40","http:\/\/dl.acm.org\/citation.cfm?id=3157382.3157458","http:\/\/dx.doi.org\/10.1109\/FOCS.2010.36","http:\/\/dx.doi.org\/10.1145\/2450142.2450144","http:\/\/dx.doi.org\/10.1145\/2897518.2897647","https:\/\/books.google.ch\/books?id=riJuAQAACAAJ","http:\/\/dx.doi.org\/10.1145\/1247069.1247072","http:\/\/dx.doi.org\/10.1145\/177424.178042","http:\/\/dx.doi.org\/10.1145\/513400.513402"],"isPartOf":{"@type":"PublicationVolume","@id":"#volume6310","volumeNumber":107,"name":"45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)","dateCreated":"2018-07-04","datePublished":"2018-07-04","editor":[{"@type":"Person","name":"Chatzigiannakis, Ioannis","givenName":"Ioannis","familyName":"Chatzigiannakis"},{"@type":"Person","name":"Kaklamanis, Christos","givenName":"Christos","familyName":"Kaklamanis"},{"@type":"Person","name":"Marx, D\u00e1niel","givenName":"D\u00e1niel","familyName":"Marx"},{"@type":"Person","name":"Sannella, Donald","givenName":"Donald","familyName":"Sannella"}],"isAccessibleForFree":true,"publisher":"Schloss Dagstuhl \u2013 Leibniz-Zentrum f\u00fcr Informatik","hasPart":"#article11067","isPartOf":{"@type":"Periodical","@id":"#series116","name":"Leibniz International Proceedings in Informatics","issn":"1868-8969","isAccessibleForFree":true,"publisher":"Schloss Dagstuhl \u2013 Leibniz-Zentrum f\u00fcr Informatik","hasPart":"#volume6310"}}}