DROPS

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.52

Simultaneously Approximating All 𝓁_p-Norms in Correlation Clustering

Authors: Sami Davies, Benjamin Moseley, and Heather Newman

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

This paper considers correlation clustering on unweighted complete graphs. We give a combinatorial algorithm that returns a single clustering solution that is simultaneously O(1)-approximate for all 𝓁_p-norms of the disagreement vector; in other words, a combinatorial O(1)-approximation of the all-norms objective for correlation clustering. This is the first proof that minimal sacrifice is needed in order to optimize different norms of the disagreement vector. In addition, our algorithm is the first combinatorial approximation algorithm for the 𝓁₂-norm objective, and more generally the first combinatorial algorithm for the 𝓁_p-norm objective when 1 < p < ∞. It is also faster than all previous algorithms that minimize the 𝓁_p-norm of the disagreement vector, with run-time O(n^ω), where O(n^ω) is the time for matrix multiplication on n × n matrices. When the maximum positive degree in the graph is at most Δ, this can be improved to a run-time of O(nΔ² log n).

Cite as

Sami Davies, Benjamin Moseley, and Heather Newman. Simultaneously Approximating All 𝓁_p-Norms in Correlation Clustering. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 52:1-52:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{davies_et_al:LIPIcs.ICALP.2024.52,
  author =	{Davies, Sami and Moseley, Benjamin and Newman, Heather},
  title =	{{Simultaneously Approximating All 𝓁\underlinep-Norms in Correlation Clustering}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{52:1--52:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.52},
  URN =		{urn:nbn:de:0030-drops-201950},
  doi =		{10.4230/LIPIcs.ICALP.2024.52},
  annote =	{Keywords: Approximation algorithms, correlation clustering, all-norms, lp-norms}
}

Document

Invited Paper

DOI: 10.4230/LIPIcs.SWAT.2022.3

Reconstructing the Tree of Life (Fitting Distances by Tree Metrics) (Invited Paper)

Authors: Mikkel Thorup

Published in: LIPIcs, Volume 227, 18th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2022)

Abstract

We consider the numerical taxonomy problem of fitting an S× S distance matrix D with a tree metric T. Here T is a weighted tree spanning S where the path lengths in T induce a metric on S. If there is a tree metric matching D exactly, then it is easily found. If there is no exact match, then for some k, we want to minimize the L_k norm of the errors, that is, pick T so as to minimize ‖D-T‖_k = (∑_{i,j ∈ S} |D(i,j)-T(i,j)|^k) ^{1/k}. This problem was raised in biology in the 1960s for k = 1,2. The biological interpretation is that T represents a possible evolution behind the species in S matching some measured distances in D. Sometimes, it is required that T is an ultrametric, meaning that all species are at the same distance from the root. An evolutionary tree induces a hierarchical classification of species and this is not just tied to biology. Medicine, ecology and linguistics are just some of the fields where this concept appears, and it is even an integral part of machine learning and data science. Fundamentally, if we can approximate distances with a tree, then they are much easier to reason about: many questions that are NP-hard for general metrics can be answered in linear time on tree metrics. In fact, humans have appreciated hierarchical classifications at least since Plato and Aristotle (350 BC). The numerical taxonomy problem is important in practice and many heuristics have been proposed. In this talk we will review the basic algorithmic theory, results and techniques, for the problem, including the most recent result from FOCS'21 [Vincent Cohen-Addad et al., 2021]. They paint a varied landscape with big differences between different moments, and with some very nice open problems remaining. - At STOC'93, Farach, Kannan, and Warnow [Martin Farach et al., 1995] proved that under L_∞, we can find the optimal ultrametric. Almost all other variants of the problem are APX-hard. - At SODA'96, Agarwala, Bafna, Farach, Paterson, and Thorup [Richa Agarwala et al., 1999] showed that for any norm L_k, k ≥ 1, if the best ultrametric can be α-approximated, then the best tree metric can be 3α-approximated. In particular, this implied a 3-approximation for tree metrics under L_∞. - At FOCS'05, Ailon and Charikar [Nir Ailon and Moses Charikar, 2011] showed that for any L_k, k ≥ 1, we can get an approximation factor of O(((log n)(log log n))^{1/k}) for both tree and ultrametrics. Their paper was focused on the L₁ norm, and they wrote "Determining whether an O(1) approximation can be obtained is a fascinating question". - At FOCS'21, Cohen-Addad, Das, Kipouridis, Parotsidis, and Thorup [Vincent Cohen-Addad et al., 2021] showed that indeed a constant factor is possible for L₁ for both tree and ultrametrics. This uses the special structure of L₁ in relation to hierarchies. - The status of L_k is wide open for 1 < k < ∞. All we know is that the approximation factor is between Ω(1) and O((log n)(log log n)).

Cite as

Mikkel Thorup. Reconstructing the Tree of Life (Fitting Distances by Tree Metrics) (Invited Paper). In 18th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 227, pp. 3:1-3:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{thorup:LIPIcs.SWAT.2022.3,
  author =	{Thorup, Mikkel},
  title =	{{Reconstructing the Tree of Life (Fitting Distances by Tree Metrics)}},
  booktitle =	{18th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2022)},
  pages =	{3:1--3:2},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-236-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{227},
  editor =	{Czumaj, Artur and Xin, Qin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SWAT.2022.3},
  URN =		{urn:nbn:de:0030-drops-161631},
  doi =		{10.4230/LIPIcs.SWAT.2022.3},
  annote =	{Keywords: Numerical taxonomy, computational phylogenetics, hierarchical clustering}
}

Document

DOI: 10.4230/LIPIcs.ITCS.2018.40

Approximate Clustering with Same-Cluster Queries

Authors: Nir Ailon, Anup Bhattacharya, Ragesh Jaiswal, and Amit Kumar

Published in: LIPIcs, Volume 94, 9th Innovations in Theoretical Computer Science Conference (ITCS 2018)

Abstract

Ashtiani et al. proposed a Semi-Supervised Active Clustering framework (SSAC), where the learner is allowed to make adaptive queries to a domain expert. The queries are of the kind "do two given points belong to the same optimal cluster?", where the answers to these queries are assumed to be consistent with a unique optimal solution. There are many clustering contexts where such same cluster queries are feasible. Ashtiani et al. exhibited the power of such queries by showing that any instance of the k-means clustering problem, with additional margin assumption, can be solved efficiently if one is allowed to make O(k^2 log{k} + k log{n}) same-cluster queries. This is interesting since the k-means problem, even with the margin assumption, is NP-hard. In this paper, we extend the work of Ashtiani et al. to the approximation setting by showing that a few of such same-cluster queries enables one to get a polynomial-time (1+eps)-approximation algorithm for the k-means problem without any margin assumption on the input dataset. Again, this is interesting since the k-means problem is NP-hard to approximate within a factor (1+c) for a fixed constant 0 < c < 1. The number of same-cluster queries used by the algorithm is poly(k/eps) which is independent of the size n of the dataset. Our algorithm is based on the D^2-sampling technique, also known as the k-means++ seeding algorithm. We also give a conditional lower bound on the number of same-cluster queries showing that if the Exponential Time Hypothesis (ETH) holds, then any such efficient query algorithm needs to make Omega (k/poly log k) same-cluster queries. Our algorithm can be extended for the case where the query answers are wrong with some bounded probability. Another result we show for the k-means++ seeding is that a small modification of the k-means++ seeding within the SSAC framework converts it to a constant factor approximation algorithm instead of the well known O(log k)-approximation algorithm.

Cite as

Nir Ailon, Anup Bhattacharya, Ragesh Jaiswal, and Amit Kumar. Approximate Clustering with Same-Cluster Queries. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 94, pp. 40:1-40:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{ailon_et_al:LIPIcs.ITCS.2018.40,
  author =	{Ailon, Nir and Bhattacharya, Anup and Jaiswal, Ragesh and Kumar, Amit},
  title =	{{Approximate Clustering with Same-Cluster Queries}},
  booktitle =	{9th Innovations in Theoretical Computer Science Conference (ITCS 2018)},
  pages =	{40:1--40:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-060-6},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{94},
  editor =	{Karlin, Anna R.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2018.40},
  URN =		{urn:nbn:de:0030-drops-83358},
  doi =		{10.4230/LIPIcs.ITCS.2018.40},
  annote =	{Keywords: k-means, semi-supervised learning, query bounds}
}

3 Search Results for "Ailon, Nir"

Simultaneously Approximating All 𝓁_p-Norms in Correlation Clustering

Abstract

Cite as

Reconstructing the Tree of Life (Fitting Distances by Tree Metrics) (Invited Paper)

Abstract

Cite as

Approximate Clustering with Same-Cluster Queries

Abstract

Cite as