
The Price of Hierarchical Clustering

Authors Anna Arutyunova, Heiko Röglin



File

LIPIcs.ESA.2022.10.pdf
  • Filesize: 0.78 MB
  • 14 pages

Author Details

Anna Arutyunova
  • Universität Bonn, Germany
Heiko Röglin
  • Universität Bonn, Germany

Cite As

Anna Arutyunova and Heiko Röglin. The Price of Hierarchical Clustering. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 10:1-10:14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.ESA.2022.10

Abstract

Hierarchical clustering is a popular tool for understanding the hereditary properties of a data set. Such a clustering is actually a sequence of clusterings that starts with the trivial clustering in which every data point forms its own cluster and then successively merges two existing clusters until all points are in the same cluster. A hierarchical clustering achieves an approximation factor of α if the cost of each k-clustering in the hierarchy is at most α times the cost of an optimal k-clustering. As cost functions we study the maximum (discrete) radius of any cluster (k-center problem) and the maximum diameter of any cluster (k-diameter problem). In general, the optimal clusterings do not form a hierarchy, and hence an approximation factor of 1 cannot be achieved. We call the smallest approximation factor that can be achieved for any instance the price of hierarchy. For the k-diameter problem we improve the upper bound on the price of hierarchy to 3+2√2 ≈ 5.83. Moreover, we significantly improve the lower bounds for k-center and k-diameter, proving a price of hierarchy of exactly 4 and 3+2√2, respectively.
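
The definitions in the abstract can be made concrete on a toy instance. The following is a minimal illustrative sketch, not the method of the paper: it evaluates the two cost functions and measures the approximation factor of one given hierarchy by brute-force enumeration of all k-clusterings. All function names, the four-point instance on the real line, and the chosen hierarchy are assumptions made for illustration, and the enumeration is only feasible for very small inputs.

```python
from itertools import combinations


def diameter_cost(clusters, dist):
    """k-diameter objective: maximum diameter of any cluster."""
    return max(
        max((dist(p, q) for p, q in combinations(c, 2)), default=0.0)
        for c in clusters
    )


def radius_cost(clusters, dist):
    """k-center objective: maximum discrete radius of any cluster.

    The center of a cluster must be one of its own points.
    """
    return max(
        min(max(dist(center, p) for p in cluster) for center in cluster)
        for cluster in clusters
    )


def partitions(points, k):
    """Yield every partition of `points` into exactly k non-empty clusters."""
    if k < 0 or k > len(points):
        return
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    # Case 1: `first` forms a cluster of its own.
    for part in partitions(rest, k - 1):
        yield [[first]] + part
    # Case 2: `first` joins one of the k clusters of the remaining points.
    for part in partitions(rest, k):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def optimal_cost(points, k, dist, cost):
    """Cost of an optimal k-clustering, found by exhaustive search."""
    return min(cost(p, dist) for p in partitions(points, k))


def hierarchy_factor(hierarchy, points, dist, cost):
    """Smallest alpha such that every level is an alpha-approximation."""
    alpha = 1.0
    for clusters in hierarchy:
        opt = optimal_cost(points, len(clusters), dist, cost)
        level_cost = cost(clusters, dist)
        if opt == 0:
            if level_cost > 0:
                return float("inf")
            continue
        alpha = max(alpha, level_cost / opt)
    return alpha


# Hypothetical instance: four points on the real line.
points = [0, 2, 3, 5]
dist = lambda p, q: abs(p - q)

# One possible hierarchy, from singletons down to a single cluster.
hierarchy = [
    [[0], [2], [3], [5]],
    [[0], [2, 3], [5]],
    [[0], [2, 3, 5]],
    [[0, 2, 3, 5]],
]

print(hierarchy_factor(hierarchy, points, dist, diameter_cost))  # 1.5
```

On this instance the unique optimal 3-clustering for the diameter objective merges 2 and 3, while the optimal 2-clustering is {0, 2}, {3, 5}, which is not a coarsening of it. This is the phenomenon described above: the optimal clusterings do not form a hierarchy, so this hierarchy pays a factor of 1.5 at k = 2.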

Subject Classification

ACM Subject Classification
  • Theory of computation → Facility location and clustering
Keywords
  • Hierarchical Clustering
  • Approximation Algorithms
  • k-center Problem

