Search Results

Documents authored by Parekh, Sujay


Document
Bounded Size Graph Clustering with Applications to Stream Processing

Authors: Rohit Khandekar, Kirsten Hildrum, Sujay Parekh, Deepak Rajan, Jay Sethuraman, and Joel Wolf

Published in: LIPIcs, Volume 4, IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (2009)


Abstract
We introduce a graph clustering problem motivated by a stream processing application. Input to our problem is an undirected graph with vertex and edge weights. A cluster is a subset of the vertices. The {\em size} of a cluster is defined as the total vertex weight in the subset plus the total edge weight at the boundary of the cluster. The bounded size graph clustering problem ($\GC$) is to partition the vertices into clusters of size at most a given budget and minimize the total edge-weight across the clusters. In the {\em multiway cut} version of the problem, we are also given a subset of vertices called {\em terminals}. No cluster is allowed to contain more than one terminal. Our problem differs from most of the previously studied clustering problems in that the number of clusters is not specified. We first show that the feasibility version of the multiway cut $\GC$ problem, i.e., determining if there exists a clustering with bounded-size clusters satisfying the multiway cut constraint, can be solved in polynomial time. Our algorithm is based on the min-cut subroutine and an uncrossing argument. This result is in contrast with the NP-hardness of the min-max multiway cut problem, considered by Svitkina and Tardos (2004), in which the number of clusters must equal the number of terminals. Our results for the feasibility version also generalize to any symmetric submodular function. We next show that the optimization version of $\GC$ is NP-hard by showing an approximation-preserving reduction from the $\frac 13$-balanced cut problem. Our main result is an $O(\log^2 n)$-approximation to the optimization version of the multiway cut $\GC$ problem violating the budget by an $O(\log n)$ factor, where $n$ denotes the number of vertices. Our algorithm is based on a set-cover-like greedy approach which iteratively computes bounded-size clusters to maximize the number of new vertices covered.

Cite as

Rohit Khandekar, Kirsten Hildrum, Sujay Parekh, Deepak Rajan, Jay Sethuraman, and Joel Wolf. Bounded Size Graph Clustering with Applications to Stream Processing. In IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science. Leibniz International Proceedings in Informatics (LIPIcs), Volume 4, pp. 275-286, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2009)


Copy BibTex To Clipboard

@InProceedings{khandekar_et_al:LIPIcs.FSTTCS.2009.2325,
  author =	{Khandekar, Rohit and Hildrum, Kirsten and Parekh, Sujay and Rajan, Deepak and Sethuraman, Jay and Wolf, Joel},
  title =	{{Bounded Size Graph Clustering with Applications to Stream Processing}},
  booktitle =	{IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science},
  pages =	{275--286},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-13-2},
  ISSN =	{1868-8969},
  year =	{2009},
  volume =	{4},
  editor =	{Kannan, Ravi and Narayan Kumar, K.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2009.2325},
  URN =		{urn:nbn:de:0030-drops-23250},
  doi =		{10.4230/LIPIcs.FSTTCS.2009.2325},
  annote =	{Keywords: Graph partitioning, uncrossing, Gomory-Hu trees, symmetric submodular functions}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail