, Shai Dorian Peretz, Christian Schulz
Creative Commons Attribution 4.0 International license
We present CluStRE, a novel streaming graph clustering algorithm that balances computational efficiency with high-quality clustering using multi-stage refinement. Unlike traditional in-memory clustering approaches, CluStRE processes graphs in a streaming setting, significantly reducing memory overhead while leveraging re-streaming and evolutionary heuristics to improve solution quality. Our method dynamically constructs a quotient graph, enabling modularity-based optimization while efficiently handling large-scale graphs. We introduce multiple configurations of CluStRE to provide trade-offs between speed, memory consumption, and clustering quality. Experimental evaluations demonstrate that, on average and compared with the state-of-the-art streaming clustering algorithm, CluStRE improves solution quality by 89.8%, runs 2.6× faster, and uses less than two-thirds of the memory. Moreover, our strongest mode enhances solution quality by up to 150% on average. With this, CluStRE achieves solution quality comparable to that of in-memory algorithms, i.e., over 96% of the quality of in-memory clustering approaches, including Louvain, effectively bridging the gap between streaming and traditional clustering methods.
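To make the idea of streaming, modularity-based clustering with a quotient graph more concrete, here is a minimal Python sketch. It is not the CluStRE algorithm: it is a plain one-pass greedy heuristic with no re-streaming or evolutionary refinement. It assumes that nodes arrive one at a time with their full adjacency lists and that the total number of edges is known in advance. The names `stream_cluster` and `modularity` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def stream_cluster(node_stream, num_edges):
    """One pass over (node, neighbors) pairs.

    Illustrative greedy heuristic (an assumption, not the paper's method):
    each node joins the neighboring cluster with the largest positive
    modularity gain, or opens a new cluster if no gain is positive.
    Returns (cluster_of, quotient), where quotient maps a sorted pair of
    cluster ids to the number of edges between those clusters.
    """
    cluster_of = {}                # node -> cluster id
    volume = defaultdict(int)      # cluster id -> sum of member degrees
    quotient = defaultdict(int)    # (cluster a, cluster b), a <= b -> edge count
    next_id = 0
    for node, neighbors in node_stream:
        degree = len(neighbors)
        # Count edges from this node to clusters of already streamed neighbors.
        links = defaultdict(int)
        for u in neighbors:
            if u in cluster_of:
                links[cluster_of[u]] += 1
        # Modularity gain of joining cluster c: k/m - deg * vol(c) / (2 m^2).
        best, best_gain = None, 0.0
        for c, k in links.items():
            gain = k / num_edges - degree * volume[c] / (2 * num_edges ** 2)
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:           # no positive gain: open a new cluster
            best = next_id
            next_id += 1
        cluster_of[node] = best
        volume[best] += degree
        # Update the quotient graph; each edge is recorded once, when its
        # later endpoint arrives.
        for c, k in links.items():
            quotient[(min(c, best), max(c, best))] += k
    return cluster_of, quotient


def modularity(quotient, num_edges):
    """Modularity computed from the quotient graph alone."""
    internal = defaultdict(int)    # cluster id -> number of intra-cluster edges
    volume = defaultdict(int)      # cluster id -> sum of member degrees
    for (a, b), w in quotient.items():
        if a == b:
            internal[a] += w
            volume[a] += 2 * w
        else:
            volume[a] += w
            volume[b] += w
    return sum(
        internal[c] / num_edges - (volume[c] / (2 * num_edges)) ** 2
        for c in volume
    )


# Usage: two triangles joined by a single bridge edge (7 edges in total).
stream = [
    (0, [1, 2]), (1, [0, 2]), (2, [0, 1, 3]),
    (3, [2, 4, 5]), (4, [3, 5]), (5, [3, 4]),
]
cluster_of, quotient = stream_cluster(stream, num_edges=7)
score = modularity(quotient, num_edges=7)
```

The sketch keeps only per-cluster volumes and the quotient graph in memory rather than the input graph itself, which is the general reason a quotient graph is attractive in a streaming setting; the refinement stages described in the abstract are deliberately left out.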
@InProceedings{chhabra_et_al:LIPIcs.SEA.2025.11,
author = {Chhabra, Adil and Dorian Peretz, Shai and Schulz, Christian},
title = {{CluStRE: Streaming Graph Clustering with Multi-Stage Refinement}},
booktitle = {23rd International Symposium on Experimental Algorithms (SEA 2025)},
pages = {11:1--11:20},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-375-1},
ISSN = {1868-8969},
year = {2025},
volume = {338},
editor = {Mutzel, Petra and Prezza, Nicola},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2025.11},
URN = {urn:nbn:de:0030-drops-232493},
doi = {10.4230/LIPIcs.SEA.2025.11},
annote = {Keywords: graph clustering, community, streaming, online, memetic, evolutionary}
}