Upper and Lower Bounds for Complete Linkage in General Metric Spaces

Authors: Anna Arutyunova, Anna Großwendt, Heiko Röglin, Melanie Schmidt, Julian Wargalla



Author Details

Anna Arutyunova
  • Universität Bonn, Germany
Anna Großwendt
  • Universität Bonn, Germany
Heiko Röglin
  • Universität Bonn, Germany
Melanie Schmidt
  • Universität Köln, Germany
Julian Wargalla
  • Universität Köln, Germany

Cite As

Anna Arutyunova, Anna Großwendt, Heiko Röglin, Melanie Schmidt, and Julian Wargalla. Upper and Lower Bounds for Complete Linkage in General Metric Spaces. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 207, pp. 18:1-18:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2021.18

Abstract

In a hierarchical clustering problem the task is to compute a series of mutually compatible clusterings of a finite metric space (P, dist). Starting with the clustering in which every point forms its own cluster, one iteratively merges two clusters until only one cluster remains. Complete linkage is a well-known and popular algorithm for computing such clusterings: in every step it merges the two clusters whose union has the smallest radius (or diameter) among all currently possible merges. We prove that the radius (or diameter) of every k-clustering computed by complete linkage is worse than that of an optimal k-clustering minimizing the radius (or diameter) by at most a factor of O(k) (or O(k²)). Furthermore, we give a negative answer to the question posed by Dasgupta and Long [Sanjoy Dasgupta and Philip M. Long, 2005], who show a lower bound of Ω(log(k)) and ask whether the approximation guarantee is in fact Θ(log(k)). We present instances on which complete linkage performs poorly in the sense that the k-clustering it computes is off by a factor of Ω(k) from an optimal solution for both radius and diameter. We conclude that in general metric spaces complete linkage does not perform asymptotically better than single linkage, which merges the two clusters with the smallest inter-cluster distance and for which we prove an approximation guarantee of O(k).
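To make the merge rule concrete, the following is a minimal Python sketch of the diameter-based variant of complete linkage described above. It is an illustration only; the names complete_linkage, points, dist and k are ours, and the paper does not prescribe any implementation. Single linkage is obtained by replacing the diameter of the union with the smallest distance between the two clusters.

    from itertools import combinations

    def complete_linkage(points, dist, k):
        """Greedily merge clusters until only k remain, always picking the
        pair of clusters whose union has the smallest diameter."""
        # Start with every point in its own (singleton) cluster.
        clusters = [frozenset([p]) for p in points]

        def diameter(cluster):
            # Largest pairwise distance inside the cluster (0 for singletons).
            return max((dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)

        while len(clusters) > k:
            # Among all currently possible merges, pick the one whose union
            # has the smallest diameter.
            i, j = min(combinations(range(len(clusters)), 2),
                       key=lambda ij: diameter(clusters[ij[0]] | clusters[ij[1]]))
            merged = clusters[i] | clusters[j]
            clusters = [c for t, c in enumerate(clusters) if t not in (i, j)] + [merged]

        return clusters

    # Example on the real line with the absolute-difference metric:
    # complete_linkage([0.0, 1.0, 1.5, 5.0, 5.2], lambda a, b: abs(a - b), k=2)
    # -> the two clusters {0.0, 1.0, 1.5} and {5.0, 5.2}

This naive sketch recomputes diameters from scratch in every step; it is meant to clarify the greedy merge criterion analyzed in the paper, not to be efficient.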

Subject Classification

ACM Subject Classification
  • Theory of computation → Facility location and clustering
Keywords
  • Hierarchical Clustering
  • Complete Linkage
  • Agglomerative Clustering
  • k-Center

References

  1. Marcel R. Ackermann, Johannes Blömer, Daniel Kuntze, and Christian Sohler. Analysis of agglomerative clustering. Algorithmica, 69(1):184-215, 2014. URL: https://doi.org/10.1007/s00453-012-9717-4.
  2. Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and Justin Ward. Better guarantees for k-means and Euclidean k-median by primal-dual algorithms. SIAM J. Comput., 49(4), 2020. URL: https://doi.org/10.1137/18M1171321.
  3. Jaroslaw Byrka, Thomas W. Pensyl, Bartosz Rybicki, Aravind Srinivasan, and Khoa Trinh. An improved approximation for k-median and positive correlation in budgeted optimization. ACM Trans. Algorithms, 13(2):23:1-23:31, 2017. URL: https://doi.org/10.1145/2981561.
  4. Moses Charikar, Chandra Chekuri, Tomás Feder, and Rajeev Motwani. Incremental clustering and dynamic information retrieval. SIAM J. Comput., 33(6):1417-1440, 2004. URL: https://doi.org/10.1137/S0097539702418498.
  5. Aparna Das and Claire Kenyon-Mathieu. On hierarchical diameter-clustering and the supplier problem. Theory Comput. Syst., 45(3):497-511, 2009. URL: https://doi.org/10.1007/s00224-009-9186-6.
  6. Sanjoy Dasgupta and Philip M. Long. Performance guarantees for hierarchical clustering. J. Comput. Syst. Sci., 70(4):555-569, 2005. URL: https://doi.org/10.1016/j.jcss.2004.10.006.
  7. Teofilo F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theor. Comput. Sci., 38:293-306, 1985. URL: https://doi.org/10.1016/0304-3975(85)90224-5.
  8. Anna Großwendt and Heiko Röglin. Improved analysis of complete-linkage clustering. Algorithmica, 78(4):1131-1150, 2017. URL: https://doi.org/10.1007/s00453-017-0284-6.
  9. Anna Großwendt, Heiko Röglin, and Melanie Schmidt. Analysis of Ward's method. In Timothy M. Chan, editor, Proc. of the Thirtieth Annu. ACM-SIAM Symp. on Discrete Algorithms, SODA, pages 2939-2957. SIAM, 2019. URL: https://doi.org/10.1137/1.9781611975482.182.
  10. Anna-Klara Großwendt. Theoretical Analysis of Hierarchical Clustering and the Shadow Vertex Algorithm. PhD thesis, University of Bonn, 2020. URL: http://hdl.handle.net/20.500.11811/8348.
  11. D. Ellis Hershkowitz and Gregory Kehne. Reverse greedy is bad for k-center. Inf. Process. Lett., 158:105941, June 2020. URL: https://doi.org/10.1016/j.ipl.2020.105941.
  12. Dorit S. Hochbaum. When are NP-hard location problems easy? Ann. Oper. Res., 1(3):201-214, 1984. URL: https://doi.org/10.1007/BF01874389.
  13. Dorit S. Hochbaum and David B. Shmoys. A best possible heuristic for the k-center problem. Math. Oper. Res., 10(2):180-184, 1985. URL: https://doi.org/10.1287/moor.10.2.180.
  14. Wen-Lian Hsu and George L. Nemhauser. Easy and hard bottleneck location problems. Discret. Appl. Math., 1(3):209-215, 1979. URL: https://doi.org/10.1016/0166-218X(79)90044-1.
  15. Guolong Lin, Chandrashekhar Nagarajan, Rajmohan Rajaraman, and David P. Williamson. A general approach for incremental approximation and hierarchical clustering. SIAM J. Comput., 39(8):3633-3669, 2010. URL: https://doi.org/10.1137/070698257.
  16. Joe H. Ward, Jr. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc., 58:236-244, 1963. URL: https://doi.org/10.1080/01621459.1963.10500845.