Testing graph cluster structure has been a central object of study in property testing since the foundational work of Goldreich and Ron [STOC'96] on expansion testing, i.e. the problem of distinguishing between a single cluster (an expander) and a graph that is far from a single cluster. More generally, a (k, ε)-clusterable graph G is a graph whose vertex set admits a partition into k induced expanders, each with outer conductance bounded by ε. A recent line of work initiated by Czumaj, Peng and Sohler [STOC'15] has shown how to test whether a graph is close to (k, ε)-clusterable, and to locally determine which cluster a given vertex belongs to with misclassification rate ≈ ε, but no sublinear time algorithms for learning the structure of inter-cluster connections are known. As a simple example, can one locally distinguish between the "cluster graph" forming a line and a clique? In this paper, we consider the problem of testing the hierarchical cluster structure of (k, ε)-clusterable graphs in sublinear time. Our measure of hierarchical clusterability is the well-established Dasgupta cost, and our main result is an algorithm that approximates Dasgupta cost of a (k, ε)-clusterable graph in sublinear time, using a small number of randomly chosen seed vertices for which cluster labels are known. Our main result is an O(√{log k}) approximation to Dasgupta cost of G in ≈ n^{1/2+O(ε)} time using ≈ n^{1/3} seeds, effectively giving a sublinear time simulation of the algorithm of Charikar and Chatziafratis [SODA'17] on clusterable graphs. To the best of our knowledge, ours is the first result on approximating the hierarchical clustering properties of such graphs in sublinear time.
@InProceedings{kapralov_et_al:LIPIcs.ICALP.2025.103, author = {Kapralov, Michael and Kumar, Akash and Lattanzi, Silvio and Mousavifar, Aida and Wrzos-Kaminska, Weronika}, title = {{Approximating Dasgupta Cost in Sublinear Time from a Few Random Seeds}}, booktitle = {52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)}, pages = {103:1--103:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-372-0}, ISSN = {1868-8969}, year = {2025}, volume = {334}, editor = {Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.103}, URN = {urn:nbn:de:0030-drops-234804}, doi = {10.4230/LIPIcs.ICALP.2025.103}, annote = {Keywords: Sublinear algorithms, Hierarchical Clustering, Dasgupta’s Cost} }
Feedback for Dagstuhl Publishing