It’s Hard to HAC Average Linkage!

Authors MohammadHossein Bateni, Laxman Dhulipala, Kishen N. Gowda, D. Ellis Hershkowitz, Rajesh Jayaram, Jakub Łącki

Author Details

MohammadHossein Bateni
  • Google Research, New York, NY, USA
Laxman Dhulipala
  • University of Maryland, College Park, MD, USA
Kishen N. Gowda
  • University of Maryland, College Park, MD, USA
D. Ellis Hershkowitz
  • Brown University, Providence, RI, USA
Rajesh Jayaram
  • Google Research, New York, NY, USA
Jakub Łącki
  • Google Research, New York, NY, USA


We thank the anonymous reviewers for their useful comments.

MohammadHossein Bateni, Laxman Dhulipala, Kishen N. Gowda, D. Ellis Hershkowitz, Rajesh Jayaram, and Jakub Łącki. It’s Hard to HAC Average Linkage!. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 18:1-18:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Average linkage Hierarchical Agglomerative Clustering (HAC) is an extensively studied and applied method for hierarchical clustering. Recent applications to massive datasets have driven significant interest in near-linear-time and efficient parallel algorithms for average linkage HAC.
We provide hardness results that rule out such algorithms. On the sequential side, we establish a runtime lower bound of n^{3/2-ε} on n-node graphs for sequential combinatorial algorithms under standard fine-grained complexity assumptions. This essentially matches the best-known running time for average linkage HAC. On the parallel side, we prove that average linkage HAC likely cannot be parallelized even on simple graphs by showing that it is CC-hard on trees of diameter 4. On the positive side, we demonstrate that average linkage HAC can be efficiently parallelized (i.e., it is in NC) on paths and can be solved in near-linear time when the height of the output cluster hierarchy is small.
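
For readers unfamiliar with the problem, the following is a minimal sketch of a naive average linkage HAC procedure on a weighted graph. It is not one of the algorithms from the paper. It assumes the common graph-based definition in which the similarity of two clusters is the total weight of the edges between them divided by the product of their sizes, and in which the most similar pair of clusters is merged until no edges remain. The function name `average_linkage_hac` and the edge-list input format are illustrative choices, not taken from the source.

```python
def average_linkage_hac(n, edges):
    """Naive average linkage HAC on a weighted graph with nodes 0..n-1.

    edges: iterable of (u, v, weight) tuples with u != v (assumed input format).
    Returns the merges in order as (members_a, members_b, similarity).
    """
    members = {i: {i} for i in range(n)}  # cluster id -> set of nodes
    cut = {}  # (a, b) with a < b -> total edge weight between clusters a and b
    for u, v, w in edges:
        key = (min(u, v), max(u, v))
        cut[key] = cut.get(key, 0.0) + w

    def similarity(item):
        (a, b), w = item
        # Assumed definition: cut weight divided by |A| * |B|.
        return w / (len(members[a]) * len(members[b]))

    merges = []
    next_id = n
    while cut:
        # Pick the adjacent pair of clusters with the highest average similarity.
        (a, b), _ = best = max(cut.items(), key=similarity)
        merges.append((sorted(members[a]), sorted(members[b]), similarity(best)))

        # Merge a and b into a fresh cluster id.
        new = next_id
        next_id += 1
        members[new] = members.pop(a) | members.pop(b)

        # Rebuild cut weights, redirecting edges of a and b to the new cluster.
        new_cut = {}
        for (x, y), w in cut.items():
            if (x, y) == (a, b):
                continue  # this edge is now internal to the merged cluster
            x2 = new if x in (a, b) else x
            y2 = new if y in (a, b) else y
            key = (min(x2, y2), max(x2, y2))
            new_cut[key] = new_cut.get(key, 0.0) + w
        cut = new_cut
    return merges


if __name__ == "__main__":
    # Path 0 - 1 - 2 with weights 3 and 1.
    for step in average_linkage_hac(3, [(0, 1, 3.0), (1, 2, 1.0)]):
        print(step)
```

Each iteration of this sketch scans every remaining inter-cluster edge, so it takes roughly O(nm) time on a graph with n nodes and m edges. It also shows why the merges form a chain of dependencies: the similarity used in a later merge depends on cluster sizes produced by earlier merges, which is the kind of sequential structure the hardness results above concern.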

Subject Classification

ACM Subject Classification
  • Theory of computation → Parallel algorithms
  • Theory of computation → Streaming, sublinear and near linear time algorithms
  • Theory of computation → Graph algorithms analysis

Keywords
  • Clustering
  • Hierarchical Graph Clustering
  • HAC
  • Fine-Grained Complexity
  • Parallel Algorithms
  • CC


