Hierarchical Clusterings of Unweighted Graphs

Authors Svein Høgemo, Christophe Paul, Jan Arne Telle




File

LIPIcs.MFCS.2020.47.pdf
  • Filesize: 0.49 MB
  • 13 pages

Document Identifiers
  • DOI: 10.4230/LIPIcs.MFCS.2020.47

Author Details

Svein Høgemo
  • Department of Informatics, University of Bergen, Norway
Christophe Paul
  • LIRMM, University of Montpellier, CNRS, France
Jan Arne Telle
  • Department of Informatics, University of Bergen, Norway

Cite As

Svein Høgemo, Christophe Paul, and Jan Arne Telle. Hierarchical Clusterings of Unweighted Graphs. In 45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 170, pp. 47:1-47:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020) https://doi.org/10.4230/LIPIcs.MFCS.2020.47

Abstract

We study the complexity of finding an optimal hierarchical clustering of an unweighted similarity graph under the recently introduced Dasgupta objective function. We introduce a proof technique, called the normalization procedure, that takes any such clustering of a graph G and iteratively improves it until a desired target clustering of G is reached. We use this technique to show both a negative and a positive complexity result. First, we show that in general the problem is NP-complete. Second, we consider min-well-behaved graphs: graphs H with the property that, for every k, the graph H^{(k)} formed as the join of k copies of H has an optimal hierarchical clustering that splits each copy of H in the same optimal way. To optimally cluster such a graph H^{(k)} it thus suffices to optimally cluster the smaller graph H. Co-bipartite graphs are min-well-behaved, but otherwise such graphs seem to be scarce. We use the normalization procedure to show that the cycle on six vertices is also min-well-behaved.
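For readers unfamiliar with the objective: Dasgupta's cost function charges each edge of the similarity graph the size of the smallest cluster in the hierarchy that contains both of its endpoints, i.e. the number of leaves below the least common ancestor of the two endpoints in the clustering tree. The following minimal Python sketch computes this cost for an unweighted graph and a binary hierarchy; the function name dasgupta_cost and the nested-tuple encoding of trees are our own illustration, not the paper's notation.

    def dasgupta_cost(tree, edges):
        """Dasgupta cost of a hierarchical clustering of an unweighted graph.

        `tree` is a nested pair (left, right); any non-tuple is a leaf
        (vertex label).  Each edge {u, v} contributes the number of leaves
        below the least common ancestor of u and v in the tree.
        """
        edge_set = {frozenset(e) for e in edges}

        def walk(node):
            # Returns (leaf set of this subtree, cost accumulated below it).
            if not isinstance(node, tuple):           # a single vertex
                return {node}, 0
            left_leaves, left_cost = walk(node[0])
            right_leaves, right_cost = walk(node[1])
            leaves = left_leaves | right_leaves
            # Edges with one endpoint on each side have their LCA here,
            # so each of them pays |leaves| at this node.
            crossing = sum(1 for u in left_leaves for v in right_leaves
                           if frozenset((u, v)) in edge_set)
            return leaves, left_cost + right_cost + crossing * len(leaves)

        return walk(tree)[1]


    # The 4-cycle 1-2-3-4-1, clustered two different ways.
    edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
    print(dasgupta_cost(((1, 3), (2, 4)), edges))  # 4 edges * 4 leaves = 16
    print(dasgupta_cost(((1, 2), (3, 4)), edges))  # 2*4 + 2*2 = 12

On the 4-cycle, grouping adjacent vertices first (cost 12) beats grouping the two non-adjacent pairs (cost 16), since the latter forces every edge to be cut at the root, where each edge pays for all four leaves.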

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • Hierarchical Clustering

References

  1. N. Bansal, A. Blum, and S. Chawla. Correlation clustering. Machine Learning, 56(1-3):89-113, 2004.
  2. P. Buneman. The recovery of trees from measures of dissimilarity. In Mathematics in the Archaeological and Historical Sciences, pages 387-395, 1971.
  3. S. Chakrabarti, M. Ester, U. Fayyad, J. Gehrke, J. Han, S. Morishita, G. Piatetsky-Shapiro, and W. Wang. Data mining curriculum: A proposal (version 1.0). Technical report, Intensive Working Group of ACM SIGKDD, 2006.
  4. M. Charikar and V. Chatziafratis. Approximate hierarchical clustering via sparsest cut and spreading metrics. In Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 841-854, 2017.
  5. V. Cohen-Addad, V. Kanade, F. Mallmann-Trenn, and C. Mathieu. Hierarchical clustering: Objective functions and algorithms. Journal of the ACM, 66(4):26:1-26:42, 2019.
  6. S. Dasgupta. A cost function for similarity-based hierarchical clustering. In Annual ACM Symposium on Theory of Computing (STOC), pages 118-127, 2016.
  7. S. Dasgupta. Hardness of hierarchical clustering optimization. Private communication, 2019.
  8. R. Diestel. Graph theory. Springer-Verlag, 2005.
  9. J. Hartigan. Clustering algorithms. John Wiley and Sons, 1975.
  10. T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning: data mining, inference, and prediction. Springer Series in Statistics. Springer, second edition, 2009.
  11. K. Koutroumbas and S. Theodoridis. Pattern recognition. Academic Press, fourth edition, 2009.
  12. R. Sokal and P. Sneath. Numerical taxonomy. W.H. Freeman, 1963.