Sublinear-time distributed algorithms for detecting small cliques and even cycles

In this paper we give sublinear-time distributed algorithms in the CONGEST model for finding or listing cliques and even-length cycles. We show for the first time that all copies of 4-cliques and 5-cliques in the network graph can be detected and listed in sublinear time: O(n^{5/6+o(1)}) rounds and O(n^{73/75+o(1)}) rounds, respectively. For even-length cycles C_{2k}, we give an improved sublinear-time algorithm, which exploits a new connection to extremal combinatorics.
For example, for 6-cycles we improve the running time from Õ(n^{5/6}) to Õ(n^{3/4}) rounds. We also show two obstacles to proving lower bounds for C_{2k}-freeness: first, we use the new connection to extremal combinatorics to show that the current lower bound of Ω̃(√n) rounds for 6-cycle freeness cannot be improved using partition-based reductions from 2-party communication complexity, the technique by which all known lower bounds on subgraph detection have been proven to date.
Second, we show that there is some fixed constant δ ∈ (0, 1/2) such that for any k, a lower bound of Ω(n^{1/2+δ}) on C_{2k}-freeness would imply new lower bounds in circuit complexity. We use the same technique to show a barrier for proving any polynomial lower bound on triangle-freeness. For general subgraphs, it was shown by Fischer et al. that for any fixed k, there exists a subgraph H of size k such that H-freeness requires Ω̃(n^{2−Θ(1/k)}) rounds. It was left as an open problem whether this is tight, or whether some constant-sized subgraph requires truly quadratic time to detect.
We show that in fact, for any subgraph H of constant size k, the H-freeness problem can be solved in O(n^{2−Θ(1/k)}) rounds, nearly matching the lower bound.


Introduction
In the subgraph-freeness problem, a network must decide whether its communication graph contains a copy of some fixed subgraph H or not. If the network is H-free, then all nodes should accept, but if the graph contains a copy of H, then at least one node should reject. The subgraph-freeness problem has received significant attention in the sequential world, and recently also in the distributed community. Other than being a fundamental graph problem, it has many applications in other scientific fields such as biology and the social sciences (e.g., [39,41,42]).
From the theoretical perspective, distributed subgraph freeness is especially interesting because it is an extremely local problem: to solve H-freeness for a graph H of size k, each node only needs to examine its own k-hop neighborhood. However, it is known that in bandwidth-constrained networks (the CONGEST model), subgraph freeness cannot always be solved efficiently [13,14,17,26,29,34]. In fact, it is not even known which classes of subgraphs H admit a sublinear-round distributed algorithm for H-freeness: some simple subgraphs, such as constant-sized odd-length cycles of length at least 5, are known to require linear time [14], and for any constant ε > 0 there are some constant-sized subgraphs that require Ω(n^{2−ε}) time [17]. In contrast, for triangles [7] and for constant even-length cycles [17], sublinear-time algorithms are known.
In this work, we seek to improve our understanding of sublinear-time algorithms for two classes of subgraphs: cliques and even-length cycles. We also show that any constant-sized subgraph can be detected in sub-quadratic time.

Small cliques. We show for the first time that 4-cliques and 5-cliques can be detected in sublinear time; previously, no nontrivial algorithm for K_4-freeness or listing was known, and the same is true for K_5 (the trivial solution is simply to have each node send its entire neighborhood to all its neighbors, which requires Θ(n) rounds). In fact, our algorithm is even able to list all copies of K_4 and K_5: the problem of enumerating all 4-cliques in the graph can be solved in Õ(n^{5/6+o(1)}) rounds, and all 5-cliques can be enumerated in Õ(n^{73/75+o(1)}) rounds in CONGEST using randomization.
Our algorithm builds on a recent approach of [7], which decomposes the graph into well-connected clusters and a sparse set of edges, and uses this to give an elegant algorithm that enumerates all triangles. A triangle that has two or three edges in the sparse part of the graph can be found using the sparsity of this edge set, while a triangle that has two edges in a well-connected cluster can be found by the cluster nodes. Unlike triangles, however, a 4-clique could be "split" between two clusters, with one edge in one cluster, another edge in another cluster, and the remaining edges crossing between the two clusters in the sparse part of the graph. With a 5-clique the situation becomes even more complex. Thus, listing all 4-cliques and 5-cliques requires significant effort and new ideas beyond [7].

Even-length cycles. Turning our attention to even-length cycles, C_{2k} for constant k, we give an improved sublinear-time algorithm for C_{2k}-freeness (it is known that odd-length cycles require linear time [14]). Our improved algorithm exploits a new connection to extremal combinatorics: we show that the Zarankiewicz number of the cycle C_{2k}, which is the maximum number of edges in a bipartite graph that does not contain C_{2k}, plays a role in testing C_{2k}-freeness, even for non-bipartite graphs. This allows us to modify the algorithm from [17] and improve its running time:

Theorem 2 For any constant integer k, C_{2k}-freeness can be solved using randomization in at most Õ_k(n^{1−2/(k²−k+2)}) rounds for odd k ≥ 3, and at most Õ_k(n^{1−2/(k²−2k+4)}) rounds for even k ≥ 4.
We remark that while our K_4 and K_5 algorithms list all copies of K_4 and K_5 in the graph, our algorithm for even-length cycles is only able to detect whether the graph contains a copy of C_4. In general, cliques seem like a more "local" type of subgraph than cycles: the presence of a clique implies that all its nodes can communicate with each other, which is obviously not true for cycles. We formalize this intuition by showing that short cycles really are different from small cliques: it is not possible to enumerate all of them in sublinear time.

Theorem 3 Any randomized algorithm in CONGEST enumerating all C_4-copies in the graph requires Ω̃(n) rounds.
Obstacles to proving lower bounds for even-length cycles and triangles. For any k ≥ 2, the lower bound for C_{2k}-freeness has been "stuck" at Ω̃(√n) for a long time [14,29]. We give two reasons why improving this lower bound might be hard.
First, using the same connection to Zarankiewicz numbers, we show that reductions to two-party communication complexity of the type used to prove all known lower bounds on H-freeness for any subgraph H [11,14,17,29] cannot be used to give a lower bound better than Ω̃(√n) on C_6-freeness. Following [11], which showed a similar result for cliques, we show:

Related work
The problems of subgraph-freeness and listing have been extensively studied in both the centralized and the distributed settings; for conciseness, we mention here the most directly related results.
In [7,8], randomized algorithms based on expander decompositions for listing all triangles were shown, culminating in an Õ(n^{1/3})-round algorithm in the CONGEST model. This improved the previous algorithm of [27]. The algorithms of [7,8] find a conductance decomposition of the graph, and then use a routing result of [23,24] to quickly find all triangles contained in or adjacent to some cluster in the decomposition. Our K_4-listing algorithm uses the decomposition from [7], albeit in a somewhat different manner from the way it is used in [7]. We also use the K_s-listing algorithm for the Congested Clique of [13] as a subroutine; [13] shows that for any subgraph H of size s, all copies of H can be found in O(n^{1−2/s}) rounds in the Congested Clique.
To our knowledge, prior to our work, no sublinear-time K_s-freeness algorithm was known for any s ≥ 4 in CONGEST. However, in [3], it is shown that if the network contains an ε-near-clique of linear size, then an ε³-near-clique of linear size can be found in constant time.
For 4-cycles, C_4, a nearly-tight bound of Θ̃(√n) was shown in [14]. The lower bound was extended to an Ω̃(√n) lower bound for any even-length cycle C_{2k} in [29]. In the Congested Clique, an O(n^{0.158})-round algorithm for C_k-detection was shown by [6], based on algebraic methods. The first sublinear-time algorithm for C_{2k}-freeness for k ≥ 3 was given in [17], and we improve this algorithm here. It is known that odd-length cycles require nearly-linear time to detect [14].
It is shown in [17] that some subgraphs of size O(k) require Ω̃(n^{2−1/k}) rounds to detect (for any constant k ≥ 2). There are algorithms for clique-freeness and cycle-freeness in related models, e.g., [5,6,13,16,18,19], but they are not directly relevant to our work, except as mentioned above.

Preliminaries
The CONGEST model. The CONGEST model is a synchronous network model, where computation proceeds in rounds. The network is modeled as a graph, G = (V , E). Each graph node v ∈ V initially knows its own neighborhood, denoted N (v). It is assumed that nodes have unique identifiers, which we conflate with V . In each round, each node of the network may send O(log n) bits on each of its edges, and these messages are received by neighbors in the current round.
Some of our algorithms rely on results from the Congested Clique model. As in CONGEST, we have an input graph G = (V , E), where each vertex V is a separate computing node, which initially knows its neighborhood. However, unlike CONGEST, in the Congested Clique, all the nodes can talk directly to each other: in each round, each node can send O(log n) bits to every other node in V , even if the edge between them is not in E. The Congested Clique admits very efficient algorithms for many distributed tasks, and our algorithms build on a clique detection algorithm for this model [13] by simulating it in regular CONGEST.
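To make the round structure concrete, the following is a minimal centralized sketch of one synchronous CONGEST round; all names are ours, and for simplicity the O(log n)-bit bound on messages is not enforced.

```python
from collections import defaultdict

def congest_round(adj, outboxes):
    """Deliver one synchronous CONGEST round: each node may place one
    message per incident edge in its outbox; all messages arrive at
    the corresponding neighbors by the end of the round."""
    inboxes = defaultdict(dict)
    for v, per_edge in outboxes.items():
        for u, msg in per_edge.items():
            assert u in adj[v], "messages travel only along graph edges"
            inboxes[u][v] = msg
    return inboxes

# Tiny example: a triangle where every node announces its own ID
# to each of its neighbors.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
outboxes = {v: {u: v for u in adj[v]} for v in adj}
inboxes = congest_round(adj, outboxes)
```

After the round, every node's inbox holds one message per neighbor, which is exactly the information pattern the model allows per round.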
The main problem we are concerned with in this paper is the following:

Definition 1 (Subgraph freeness and enumeration) Fix a connected constant-sized graph H.
In the H-freeness problem, the goal is to determine whether the input graph G contains a copy of H as a subgraph or not, namely, if there is a subset V′ ⊆ V and a subset E′ ⊆ E such that the graph (V′, E′) is isomorphic to H. If G is H-free, then all nodes should accept, but if G contains H as a subgraph, then at least one node should reject.
In the H -enumeration (or listing) problem, each node outputs a (possibly empty) set of copies of H in G, such that together the nodes output all copies of H in G.
In the induced H-freeness problem, the goal is to determine whether the input graph G has an induced subgraph that is isomorphic to H, namely, whether there is a subset V′ ⊂ V for which (V′, E(V′)) is isomorphic to H, where E(V′) denotes the set of edges with both endpoints in V′.
For a graph G, we let V (G) denote the vertex set of G, and E(G) denote the edges of G, and denote their sizes as n = |V (G)|, m = |E(G)| respectively. For a vertex v, we denote its degree by d (v). We denote the diameter of the graph by D.
The arboricity of the graph is defined as the minimum number of edge-disjoint forests required to cover the graph edges. A graph with m edges has arboricity at most O( √ m) [9].
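The O(√m) bound can be checked through the degeneracy, which upper-bounds the arboricity: if every vertex of an m-edge graph had degree above √(2m), the graph would have more than m edges, so repeatedly removing a minimum-degree vertex witnesses degeneracy at most √(2m). A small sketch (the helper below is ours, for illustration):

```python
import math

def degeneracy(adj):
    """Peel minimum-degree vertices; the largest degree observed at
    removal time is the degeneracy, an upper bound on arboricity."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    deg = 0
    while adj:
        v = min(adj, key=lambda x: len(adj[x]))
        deg = max(deg, len(adj[v]))
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return deg

# Sanity check on K_5: m = 10 edges, degeneracy 4 <= sqrt(2m) ~ 4.47.
k5 = {v: {u for u in range(5) if u != v} for v in range(5)}
m = sum(len(nbrs) for nbrs in k5.values()) // 2
assert degeneracy(k5) <= math.sqrt(2 * m)
```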
For functions g, f, we say that g(n) = Õ(f(n)) if for some constant c it holds that g(n) = O(f(n) log^c n).

Mixing time and conductance
For a graph G and a vertex v ∈ V, an l-length lazy random walk starting at vertex v is a random walk on the vertices of G starting at v, defined as follows. The walk is performed for l steps; in each step, the walk remains at the current vertex with probability 1/2, and otherwise moves to a uniformly random neighbor. Denote by P^l(v, u) the probability that an l-length lazy random walk starting at v ends at u.
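A minimal Monte-Carlo sketch of this definition (the helper names and trial counts below are ours, for illustration):

```python
import random
from collections import Counter

def lazy_walk(adj, v, l, rng):
    """One l-step lazy random walk from v: stay put w.p. 1/2,
    otherwise move to a uniformly random neighbor."""
    for _ in range(l):
        if rng.random() >= 0.5:
            v = rng.choice(adj[v])
    return v

def estimate_P(adj, v, l, trials=20000, seed=0):
    """Monte-Carlo estimate of P^l(v, u) for every vertex u."""
    rng = random.Random(seed)
    counts = Counter(lazy_walk(adj, v, l, rng) for _ in range(trials))
    return {u: counts[u] / trials for u in adj}

# On a triangle the walk mixes very quickly: the stationary
# distribution is uniform, so P^l(0, u) approaches 1/3 for every u.
tri = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
est = estimate_P(tri, 0, l=20)
```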

Definition 2
The mixing time τ_mix of a graph G is defined as the minimum l such that, for all v, u ∈ V,

|P^l(v, u) − d(u)/2m| ≤ d(u)/(2mn).

Informally, the mixing time of a graph is the time required for a lazy random walk on the graph to approach its stationary distribution, in which each vertex u has probability d(u)/2m. The mixing time of a graph is closely related to its conductance, defined next. For a subset S ⊆ V, let Vol(S) denote the sum of degrees of the vertices in S, Vol(S) = Σ_{v∈S} d(v), and let ∂(S) denote the set of edges with one endpoint in S and one endpoint in V \ S. For S ≠ ∅, S ≠ V, the conductance of the cut (S, V \ S) is defined as Φ(S) = |∂(S)| / min(Vol(S), Vol(V \ S)), and the conductance of G is Φ(G) = min_{∅ ≠ S ⊊ V} Φ(S). The following theorem by Jerrum and Sinclair [28] states the relation between the mixing time of a graph and its conductance.

Theorem 7 (Corollary 2.3 in [28]) Let τ_mix(G) denote the mixing time of a graph G, and let Φ(G) denote its conductance. It holds that τ_mix(G) = O(log n / Φ(G)²).

Remark 1 It is a well-known property that D = O(log n / Φ(G)) (see e.g. [10]). In particular, if the conductance of a graph is at least Ω(1/polylog(n)) (or, equivalently, the mixing time is O(polylog n)), then the diameter is at most O(polylog n).
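For intuition, conductance can be computed by brute force on small graphs (the function below is ours and is exponential in n, for illustration only):

```python
from itertools import combinations

def conductance(adj):
    """Brute-force conductance: minimum over all nontrivial cuts
    (S, V \\ S) of |crossing edges| / min(Vol(S), Vol(V \\ S))."""
    V = list(adj)
    vol = {v: len(adj[v]) for v in V}
    best = float("inf")
    for r in range(1, len(V)):
        for S in combinations(V, r):
            S = set(S)
            cut = sum(1 for v in S for u in adj[v] if u not in S)
            best = min(best, cut / min(sum(vol[v] for v in S),
                                       sum(vol[v] for v in V if v not in S)))
    return best

# A 4-cycle: the balanced cut crosses 2 of the 4 edge-endpoints'
# volume on each side, giving conductance 2/4 = 1/2.
c4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
```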

Enumerating all 4-cliques in sublinear time
In this section we show how to find all copies of K_4 in the network graph in O(n^{5/6+o(1)}) rounds. Throughout the section, we will use two parameters, ε = 1/2 and δ = 5/6. Since some parts of the algorithm will be reused in later sections with different values for ε and δ, we leave our results parameterized.
On a very high level, the algorithm works as follows: first, we decompose the edges E of the graph into two sets, E_s and E_m, by recursively applying a graph decomposition algorithm from [7]. The set E_s is oriented (i.e., each edge is assigned a direction), and every node v has at most Õ(n^δ) outgoing edges in E_s; therefore, cliques contained entirely in E_s are easy to find in Õ(n^δ) rounds, by simply having each node announce all its outgoing edges in E_s to all its neighbors.
The set E_m is the edge-disjoint union of Õ(n^{1−δ}) "very well-connected" clusters of nodes. On each such cluster, we can fairly efficiently simulate the Congested Clique algorithm for finding 4-cliques of [13]. However, we need to find all 4-cliques that have at least one edge in E_m (since we already dispensed with all cliques contained entirely in E_s). This includes cliques {u_0, u_1, u_2, u_3} such that {u_0, u_1} ∈ E_m, so that nodes u_0, u_1 are together in some cluster C, but nodes u_2, u_3 are outside cluster C; while nodes u_0, u_1 together know about almost all edges of the clique, the edge {u_2, u_3} is not necessarily known to any node of C. If we want to find such cliques by simulating a Congested Clique algorithm on C, we must first "bring in" edges from outside C, so that the cluster nodes know about them and can use them in the simulation.
There can be Θ(n²) edges outside of C, and we cannot afford to have all of them sent to nodes of C, because this could require too much time: for example, suppose that outside the cluster C we have a large set U of nodes that are all connected to each other (i.e., a large clique), and each node of U is connected to some node of C by one edge. In this case, we have roughly |U|² edges that could participate in 4-cliques with the nodes of the cluster C, but there are only |U| edges over which we can send these edges to C, so sending all |U|² edges would take |U| time. However, in this case, precisely because each node in U has few neighbors in C, there are not many possible 4-cliques it can form with the nodes of C, and we can use this fact to quickly find any such 4-cliques.
Thus, our algorithm considers two types of "external edges": if u_2, u_3 do not have many neighbors in C, then we can use this fact to efficiently find all cliques containing u_2, u_3 and two nodes from C. On the other hand, if either u_2 or u_3 does have many neighbors in C, it can quickly send its entire neighborhood to the nodes of C, by splitting its neighborhood and sending each part of it on a different edge to C. Then, the cluster nodes, having learned about the external edges, will use them in their simulation of the Congested Clique, and find any 4-clique containing u_2, u_3.
We now describe our algorithm in more detail. We begin by showing how we apply the expander decomposition from [7] in a slightly different manner than [7] uses it, and then how we find 4-cliques on the resulting decomposed graph.
Using our "diameter reduction" technique, given by Theorem 15, we assume w.l.o.g. that the diameter of G is O(polylog(n)). This reduction is very similar to the diameter reduction in distributed property testing problems [16].

Expander decomposition
A main ingredient in the algorithm is the expander decomposition developed in [7], which partitions the edges E of the graph into three sets, E_m, E_s, E_r, such that E_m induces a set of "well-connected clusters", E_s induces a directed graph with low out-degree, and |E_r| ≤ |E|/6. The well-connected clusters in E_m each satisfy the following:

Definition 4 (n^δ-cluster) Let c_1, c_2, c_3 > 0 be constants. A subgraph G′ = (V′, E′) of G is called an n^δ-cluster (or simply "cluster" for short) if it satisfies the following conditions: (1) it has mixing time at most c_1 · log^{c_2}(n), and (2) each vertex v ∈ V′ has degree at least c_3 · n^δ in E′.

An n^δ-decomposition is a partition of E into E_m, E_s, E_r such that: (1) E_m is a vertex-disjoint union of n^δ-clusters; (2) E_s is oriented so that each node v has at most Õ(n^δ) edges oriented away from it (we call E_{s,v} the edges oriented away from v in E_s); and (3) |E_r| ≤ |E|/6.
In the sequel we abuse notation slightly, by thinking of a cluster as both a set of edges and the set of nodes that belong to the cluster.
It is shown in [7] that an n^δ-decomposition can be computed efficiently in CONGEST. "Computing" the decomposition means that each node learns which of its edges are in E_m, E_s or E_r, and also learns the orientation of its edges in E_s.

Theorem 8 ([7], Theorem 1) Given a graph G = (V, E) with |V| = n, we can find, w.h.p., an n^δ-decomposition in Õ(n^{1−δ}) rounds in the CONGEST model.
In [7], the algorithm finds triangles by (a) computing an n^δ-decomposition, (b) finding triangles that include an edge in either E_s or E_m, and (c) recursing on E_r. (Edges are sometimes moved from E_m to E_r, but not too many.) The key property is that any triangle that includes at least one edge from E_s or E_m can be found efficiently by the algorithm of [7]; then these edges can be deleted, and we can recurse on the remaining edges (E_r). In contrast, when it comes to 4-cliques, we do not know how to efficiently find a 4-clique that includes an edge from E_s or E_m if the remaining edges are all in E_r. Therefore we cannot simply delete E_s, E_m and recurse (we would miss cliques whose edges are split between E_s, E_m and E_r). Instead, before we begin searching for 4-cliques, we eliminate all the edges in E_r, by recursively applying the decomposition of Theorem 8 until no edges remain. We obtain the following decomposition of the graph.
Let C_i^1, . . . , C_i^{k_i} be the connected components of E_m^i; by Theorem 8, these are n^δ-clusters. Note that C_i^1, . . . , C_i^{k_i} are vertex- and edge-disjoint, and by definition, E_m is the edge-disjoint union of the clusters C_i^j over all levels i. Also, since |E_{s,v}^i| = O(n^δ) for each level i and each node v, and there are O(log n) levels, we have |E_{s,v}| = O(n^δ log n), as required.
Unique cluster leaders and identifiers are assigned as follows. We iterate through the levels, and for each level i, every cluster formed in step i learns the ID of the smallest node in the cluster; this is done by having all nodes forward the smallest ID they have heard so far, along all edges in E_m^i, for O(polylog n) rounds (recall that by Remark 1, the diameter of each cluster is O(polylog n)). Let u_i^1, . . . , u_i^{k_i} be the smallest nodes in each of the clusters C_i^1, . . . , C_i^{k_i} of level i. These are the cluster leaders for step i.
Next, we disseminate all the IDs u_i^1, . . . , u_i^{k_i} of the leaders throughout the network, using pipelining [30]. This requires O(k_i + D) = O(n^{1−δ} + D) rounds, because there are at most k_i ≤ n^{1−δ} clusters (as each cluster has minimum degree n^δ). Finally, each cluster C_i^j is assigned the identifier (i, r_i^j), where r_i^j is the rank of the leader u_i^j among step i's leaders, u_i^1, . . . , u_i^{k_i}, sorted as natural numbers. Each node can compute the ID of the level-i cluster it belongs to (if any), because it knows the IDs of all the level-i leaders and of its own cluster's level-i leader.
The diameter reduction technique described in Sect. 6 allows us to assume w.l.o.g. that the diameter of the network is D = O(polylog(n)). Thus, in Õ(n^{1−δ}) rounds, we can select the smallest node in each cluster, and disseminate the IDs of these nodes throughout the network.

Finding 4-cliques
In the remainder of the section we describe our 4-clique listing algorithm, analyze its round complexity and prove its correctness.
We begin by computing the decomposition from Lemma 1. Next, we look for cliques entirely contained in E_s, the "low-degree" part of the graph. Then we turn to the harder task, finding cliques that include at least one edge of E_m (i.e., a cluster edge). As we explained above, we divide these cliques into two types: those that have two nodes external to the cluster with few neighbors in the cluster, and those that do not.
Step 1: Finding cliques contained in E_s. To find 4-cliques contained entirely in E_s, we simply have each node v send E_{s,v} to all its neighbors. This requires O(n^δ log n) rounds, because |E_{s,v}| = O(n^δ log n) for each v ∈ V. Then, any node that sees a 4-clique outputs it.
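A centralized stand-in for Step 1 (the function and instance below are illustrative and ours, not the distributed implementation): once all outgoing sparse edges have been announced, a clique in the sparse part is visible from its announced edges alone.

```python
from itertools import combinations

def list_k4_in_sparse(out_edges):
    """Step-1 sketch: every node v announces its outgoing edge list
    E_{s,v} to all its neighbors.  Here we simply collect all announced
    edges and scan 4-subsets of their endpoints for K_4 copies that lie
    entirely inside the oriented sparse part."""
    edges = {frozenset((v, u)) for v, nbrs in out_edges.items()
             for u in nbrs}
    nodes = sorted({x for e in edges for x in e})
    return [q for q in combinations(nodes, 4)
            if all(frozenset(p) in edges for p in combinations(q, 2))]

# K_4 on {0,1,2,3}, oriented by increasing ID, plus a pendant edge.
out = {0: [1, 2, 3], 1: [2, 3], 2: [3], 3: [4], 4: []}
found = list_k4_in_sparse(out)
```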
Step 2: Finding cliques containing an edge from E m .
Next, we search for copies of K 4 that have at least one edge in some cluster. We divide these cliques into two types.
Let ε ∈ (0, 1) be a parameter (as we said above, in this section we will use ε = 1/2, but the next section re-uses some of the machinery we develop here with a different ε).
For a cluster C and a node v, let M_C(v) denote the set of v's neighbors in C. We say that v is C-light if |M_C(v)| ≤ n^ε. If v is not C-light, then we say that v is C-heavy.
Now let H be a copy of K 4 that has at least one edge in E m (that is, in some cluster). We say that H is light if H contains at least two nodes that are C-light for some cluster C, and the two other nodes are contained in C. Otherwise, we say that H is heavy.
Step 2(a): Finding light cliques. To find cliques containing at least two nodes that are light with respect to some cluster, we iterate through the clusters sequentially, in lexicographic order of their cluster IDs. For each cluster C, all nodes that belong to C (i.e., have at least one edge in C) announce this fact to their neighbors. Next, each node v that is C-light sends M_C(v) to all its neighbors; this requires n^ε rounds, as v is C-light.
Upon receiving M_C(u) from each C-light neighbor u ∈ N(v), node v forms a list of "candidates", triplets of nodes that may complete a K_4 with v: a triplet {u, c_1, c_2} is a candidate if u is a C-light neighbor of v and c_1, c_2 ∈ M_C(u) ∩ M_C(v). For each candidate {u, c_1, c_2}, node v already knows that all edges of the 4-clique on {v, u, c_1, c_2} are present, except for the edge {c_1, c_2}, which may or may not be present. To check, node v goes through its candidates, and sends each cluster neighbor c_1 ∈ C a list Q_{v,c_1} of "edge queries", the list of potential edges of c_1 that, if present, would complete a 4-clique. There are at most n^ε such edges, since each queried edge has its other endpoint c_2 in M_C(v), and |M_C(v)| ≤ n^ε when v is C-light. Node c_1 responds with the subset ({c_1} × N(c_1)) ∩ Q_{v,c_1} of the edges that are actually present, and node v outputs the 4-cliques it has found.
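The candidate-forming logic at a single node can be sketched as follows, assuming the received lists M_C(·) are stored in a dictionary (all names below are ours, for illustration):

```python
from itertools import combinations

def light_clique_candidates(v, light_nbrs, M_C):
    """Step 2(a) bookkeeping at node v: for every C-light neighbor u
    whose cluster-neighbor list M_C(u) was received, every pair c1, c2
    of cluster nodes adjacent to both u and v yields a candidate
    4-clique {v, u, c1, c2}; only the edge {c1, c2} is still
    unverified, and becomes an edge query sent to c1."""
    cands = []
    for u in light_nbrs:
        common = sorted(set(M_C[u]) & set(M_C[v]))
        for c1, c2 in combinations(common, 2):
            cands.append((u, c1, c2))
    return cands

# Hypothetical instance: v's cluster neighbors are a, b, c, and its
# C-light neighbor u is adjacent to a and b in the cluster.
cands = light_clique_candidates('v', ['u'], {'v': ['a', 'b', 'c'],
                                             'u': ['a', 'b']})
```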
Step 2(b): Finding heavy cliques. Finally, we look for 4-cliques where at least one edge is in a cluster, and the other two nodes are not light w.r.t. that cluster; they might be in the cluster, or they might be outside it but have many neighbors in the cluster.

For a cluster C, let F(C) be the set of all edges incident to a node in C or to some C-heavy node. We iterate through the levels i = 1, . . . , s of the decomposition; our goal is to find 4-cliques contained entirely in F(C_i^j), for all the level-i clusters C_i^1, . . . , C_i^{k_i}. To do this, we run in parallel a procedure on each level-i cluster C, in which we have the cluster nodes of each cluster simulate the execution of the Congested Clique algorithm for K_4-enumeration from [13] on the n-vertex graph (V, F(C)). This part of the algorithm is carried out in four steps:

I. Pull: the nodes of the cluster C "pull" the edges of F(C) from nodes outside C, such that every edge of F(C) is learned by at least one node in C.

II. Partition: in this step, we compute a partition {V_c}_{c∈C} of V, which is roughly balanced (i.e., |V_{c_1}| ≈ |V_{c_2}| for all c_1, c_2 ∈ C). Each node c ∈ C will be responsible for simulating the nodes in V_c.

III. Shuffle: the cluster nodes shuffle the edges of F(C) between themselves, so that each node c ∈ C learns the F(C)-neighborhood N_{F(C)}(v) of each node v ∈ V_c it needs to simulate. This is carried out using the routing algorithm from [23].

IV. Simulate: the nodes of C simulate an n-vertex Congested Clique, and run the K_4-enumeration algorithm of [13] on the graph (V, F(C)) to list all the 4-cliques in F(C).

Remark 2
It is worth noting that [7] gives an extension of the Congested Clique algorithm of [13] to any graph with a low mixing time. This is unfortunately unsuited for our purposes, because in our case we do not run the algorithm only on edges incident to cluster nodes, but instead on a potentially much larger set, F(C). In addition, we simulate a Congested Clique of size n on an n^δ-cluster, while in [7] the clusters do not need to simulate any external nodes.
As we said, these four steps are carried out in parallel for all the level-i clusters. Other than the first step, the remaining steps involve only intra-cluster communication (i.e., communication between nodes of the same cluster over the edges belonging to the cluster). Because the level-i clusters are vertex-disjoint components, we incur no extra congestion from simulating all level-i clusters in parallel; and since there are only O(log n) levels, we can easily afford to go through the levels one after the other.
We elaborate on each step below. Pull. As we explained, our goal in this step is to "pull" the edges F(C) into the cluster C, such that each edge in F(C) will be known to at least one node in C.
Each "internal" cluster edge {u, v} ∈ C is already known to its endpoints u, v ∈ C (recall that a node is said to belong to a cluster if it has at least one edge in the cluster). Thus, we need only worry about the "outside" edges, F(C) \ C: the edges incident to a C-heavy node which are not contained inside the cluster.
To pull these edges into C, each C-heavy node v that is not in C partitions its edges arbitrarily into |M_C(v)| sets, each of size at most ⌈n/|M_C(v)|⌉ ≤ n^{1−ε}, and assigns each such set to a cluster neighbor c ∈ M_C(v) in a one-to-one manner. We denote the set assigned to a cluster neighbor c by S_{v→c}. Then, in parallel for all c ∈ M_C(v), node v sends S_{v→c} to node c. This requires O(n^{1−ε}) rounds.
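The splitting done by a heavy node can be sketched as follows (a hypothetical instance; the helper is ours):

```python
import math

def pull_assignment(edges_v, cluster_nbrs):
    """Pull-step sketch for a C-heavy node v: split v's incident edge
    list into |M_C(v)| chunks of size at most ceil(|edges|/|M_C(v)|),
    one chunk per cluster neighbor, so all chunks can be sent in
    parallel in O(chunk size) rounds."""
    chunk = math.ceil(len(edges_v) / len(cluster_nbrs))
    return {c: edges_v[i * chunk:(i + 1) * chunk]
            for i, c in enumerate(cluster_nbrs)}

# Hypothetical heavy node with 7 incident edges and 3 cluster neighbors.
S = pull_assignment([('v', x) for x in range(7)], ['c1', 'c2', 'c3'])
```

Since a heavy node has |M_C(v)| > n^ε cluster neighbors and fewer than n incident edges, each chunk has size below n^{1−ε}.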
For a node c ∈ C, let F(c) be the edges of F(C) that are learned by (or already known to) node c. Following the Pull step, each edge in F(C) is known to at least one node, and at most two nodes, in C. We define the initial load of node c ∈ C to be |F(c)|, the number of F(C)-edges node c knows (including its own edges). The initial load of every cluster node is at most 2n^{2−ε}:
Proof Each cluster node c ∈ C receives at most |S_{v→c}| ≤ n^{1−ε} edges from each heavy neighbor v, and since c has at most n heavy neighbors, it receives at most n^{2−ε} edges in total. In addition to the edges it receives from heavy neighbors, node c has up to n − 1 edges of its own, and so |F(c)| ≤ n^{2−ε} + n − 1 < 2n^{2−ε}.
Partition. The nodes of C must now partition all the graph nodes V among themselves, in a roughly-balanced way, quickly and without a lot of communication. Every cluster node needs to learn the entire partition, not just its own part, so that it knows which node of V will be simulated by whom.
Ideally, we would assign each node of V to a uniformly random node in C, but representing such a random assignment could require up to n log n bits (depending on the size of C), so disseminating it to the nodes of C would require a lot of communication. Instead, we use a family of O(log n)-wise independent hash functions to ensure a relatively balanced partition, allowing us to represent the entire partition using only O(log 2 n) bits.
For convenience, we use only n^δ nodes in C, which we call the active nodes, to carry out the simulation; the cluster leader selects these nodes by choosing n^δ of its neighbors, u_1, . . . , u_{n^δ} (recall that the minimum degree in each n^δ-cluster is at least n^δ, so the leader has enough neighbors). The leader also selects a λ-wise independent hash function f : U → [n^δ], where λ = O(log n) and U is the domain from which IDs for the graph are drawn. Then the leader disseminates the assignment of active nodes, {(i, u_i)}_{i=1,...,n^δ}, and the O(log² n)-bit representation of f to all the cluster nodes. Using pipelining [30], this requires Õ(n^δ + diam(C)) rounds, where diam(C) = polylog(n) is the diameter of the cluster C. (By Remark 1, as C has polylogarithmic mixing time, its diameter is also polylogarithmic.) Now let V_{u_i} = {v | f(v) = i} be the set of nodes that active node u_i will simulate. Using a concentration result from [38] for O(log n)-wise independent random variables, we obtain:

Lemma 2 Let λ = 4 log n, and assume that f is λ-wise independent. Then with probability at least 1 − 1/n, we have |V_{u_i}| ≤ 2n^{1−δ} for each active node u_i.
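A standard way to realize such a hash family, sketched here under the usual polynomial construction (not necessarily the exact construction behind [38]), is a random polynomial over a prime field; its short coefficient list is the description that the leader disseminates.

```python
import random
from collections import Counter

def make_kwise_hash(k, p, buckets, rng):
    """Sample a degree-(k-1) polynomial over F_p; the map
    x -> (poly(x) mod p) mod buckets is k-wise independent over F_p
    (and approximately uniform on buckets when p >> buckets), with an
    O(k log p)-bit description."""
    coeffs = [rng.randrange(p) for _ in range(k)]
    def f(x):
        acc = 0
        for a in coeffs:          # Horner evaluation mod p
            acc = (acc * x + a) % p
        return acc % buckets
    return f

rng = random.Random(7)
p = 2 ** 61 - 1                   # a Mersenne prime larger than the ID space
f = make_kwise_hash(k=8, p=p, buckets=16, rng=rng)
loads = Counter(f(x) for x in range(4096))
```

With 4096 IDs hashed into 16 buckets, each bucket receives close to the expected 256 items, mirroring the 2n^{1−δ} balance guarantee of Lemma 2.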
We defer the proof of this technical lemma until the end of the section. Shuffle and Simulate. Next, we must route the edges of F(C), from the cluster nodes that know them initially to the nodes that need them for the simulation. Then we run a simulation of the algorithm of [13], where each node in C simulates at most O(n^{1−δ}) nodes of G. For both purposes we use the routing scheme of [23], which allows us to deliver roughly n^δ messages from each cluster node to other cluster nodes in n^{o(1)} rounds.
There are some technical details involved in the simulation, because we must ensure that nodes do not try to send or receive too many messages at once. Ultimately, we show the following result (which will also be used in the next section):

Lemma 3 For a constant 0 < ε ≤ 1, suppose an edge set E is partitioned between the nodes of an n^δ-cluster C, so that each node u ∈ C initially knows a subset E_u of size at most O(n^{2−ε}). Then t rounds of the Congested Clique on the graph (V, E) can be simulated inside C in O(t · n^{2−2δ+o(1)} + n^{2−δ−ε+o(1)}) rounds, with high probability.
The proof is again quite technical, and it is deferred to the end of the section.
Using the simulation from Lemma 3, each cluster simulates the K_4 enumeration algorithm of [13]. In the full Congested Clique, this algorithm runs in O(√n) rounds, but since we are using clusters that simulate one round of the Congested Clique in roughly n^{2−2δ} rounds, our running time will be O(n^{2−2δ+1/2}).

Correctness and running time of the algorithm
The algorithm's success hinges on the success of several steps: the decomposition of Lemma 1, and for each level of the decomposition, the Partition step (Lemma 2) and the Simulation step (Lemma 3). Since every step succeeds with probability at least 1 − 1/n², by a union bound all steps succeed w.h.p. Let S be the event that all these steps succeeded.

Lemma 4 Conditioned on S, any copy of K 4 in G is outputted by at least one node.
Proof First, consider 4-cliques contained entirely in E_s. These cliques are detected in Step (1): if there is a 4-clique on vertices u_1, …, u_4 whose edges are all in E_s, then for each i ≠ j, either {u_i, u_j} ∈ E_{s,u_i} or {u_i, u_j} ∈ E_{s,u_j}, so either u_i or u_j will send {u_i, u_j} to all its neighbors, including the other three clique nodes. It follows that each node of the 4-clique will learn all edges of the 4-clique and will output it.
Next, let H be a 4-clique that is not contained in E_s; then H has at least one edge {c_1, c_2} in some cluster C. Let v_1, v_2 be the other two nodes of H. If v_1 and v_2 are both C-light (i.e., if H is a light clique), then in Step 2(a), when we reach cluster C in the lexicographic ordering, v_1 and v_2 will send each other their edges into C; in particular, v_1 learns the edges {v_2, c_1} and {v_2, c_2}. Then, v_1 will form the candidate {v_2, c_1, c_2}, causing it to query c_1 and ask it whether the edge {c_1, c_2} is present. Since this edge is present, c_1 will send back a positive answer, and at this point node v_1 will output H, as it will now know about the presence of all of its edges.
Finally, let H be a heavy 4-clique: it has some edge {c_1, c_2} in a cluster C, and of the other two nodes of H, at least one is C-heavy. Call this node u, and let v be the fourth node of H. In Step 2(b), in the Pull step corresponding to cluster C's level, node u sends the edge {u, v} to some C-neighbor w (i.e., {u, v} ∈ S_{u→w}), and at this point each edge of H is known to at least one node of C (since c_1, c_2 are themselves in C). Conditioned on S, the Partition and Simulation steps succeed in simulating the 4-clique detection algorithm of [13] (which is deterministic), and H is found by some node of C.

Lemma 5
The running time of the K 4 algorithm is O(n 5/6+o(1) ) rounds.
Proof Recall that we set ε = 1/2, δ = 5/6. The decomposition, Step (1), Step 2(a), and the Pull and Partition steps each take at most Õ(n^{5/6}) rounds, and the dominant cost is the O(n^{2−2δ+1/2+o(1)}) = O(n^{5/6+o(1)}) rounds for the Shuffle and Simulate steps.
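The exponent bookkeeping can be double-checked mechanically (our own arithmetic, using the costs stated above):

```python
from fractions import Fraction

eps, delta = Fraction(1, 2), Fraction(5, 6)

# One Congested Clique round costs roughly n^(2 - 2*delta) cluster
# rounds, and the K4 enumeration algorithm of [13] takes O(sqrt(n))
# Congested Clique rounds, giving the dominant term below.
simulate = 2 - 2 * delta + Fraction(1, 2)
shuffle = 2 - delta - eps  # cost of the Shuffle step (Lemma 7)

assert simulate == Fraction(5, 6)  # the claimed O(n^{5/6+o(1)})
assert shuffle == Fraction(2, 3)   # dominated by the simulation
```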

Proof of Lemma 2
To analyze the partition used in the simulation, we rely on the following concentration bound:

Theorem 11 ([38]) If X is the sum of k-wise independent random variables, each of which is confined to the interval [0, 1], and μ = E[X], then for α ≥ 1 and assuming k ≤ αμe^{−1/3}, we have Pr(|X − μ| ≥ αμ) ≤ e^{−⌊k/2⌋}.

Proof of Lemma 2
Fix an active node u_i, and for each v ∈ V, let X_{i,v} be an indicator for the event that f(v) = i. Then |V_{u_i}| = Σ_{v∈V} X_{i,v}, with μ = E[|V_{u_i}|] = n/n^δ = n^{1−δ}. By the tail bound given in Theorem 11, applied with α = 1 and k = ℓ = 4 log n, we get Pr(|V_{u_i}| > 2n^{1−δ}) ≤ e^{−⌊ℓ/2⌋} = n^{−2+o(1)}. By a union bound over the at most n active nodes, the probability that any active node has |V_{u_i}| > 2n^{1−δ} is bounded by 1/n.

Proof of Lemma 3
In the proof of the lemma, we use the routing algorithm of [23] for graphs with a low mixing time.
In an (s, t)-routing instance, each M_{u,v} is a set of messages that node u wants to send to node v, and we have Σ_v |M_{u,v}| ≤ s for every node u, and Σ_u |M_{u,v}| ≤ t for every node v. That is, each node is the sender of at most s messages, and the target of at most t messages. The length of each message is assumed to be O(log n), so that a single message can be sent in one round. The routing theorem of [23] states, roughly, that in a graph with mixing time τ_mix, a routing instance in which each node u is the sender and receiver of at most O(deg(u)) messages can be fulfilled, w.h.p., in τ_mix · n^{o(1)} rounds; we refer to this guarantee as Theorem 12.

Lemma 6 Let C be an n^δ-cluster, and let α, β ≥ 0. Any (n^α, n^β)-routing demand between the nodes of C can be fulfilled, w.h.p., in O(n^{max(α,β)−δ+o(1)}) rounds.

Proof
In the cluster C, every node has degree at least n^δ, and the mixing time is polylog(n); therefore, applying Theorem 12, we see that a set of routing requests in which each node has at most n^δ messages to send and receive can be fulfilled, w.h.p., in n^{o(1)} rounds.
Given an (n^α, n^β)-routing demand, let M_v be the set of messages node v needs to send. We assume w.l.o.g. that |M_v| ≥ n^α/2 (otherwise, node v can pad M_v by adding empty messages).
The naïve strategy to route all the messages would be to have each cluster node v divide its messages M_v into batches of n^δ messages, and try to route each batch to its destinations; however, the resulting set of routing requests would not necessarily obey the requirement of at most n^δ messages received at every node. Instead, for N = 16c·n^{max(α,β)−δ} log n iterations, in the i'th iteration each vertex v selects a random subset S_v^i ⊆ M_v, including each message independently with probability p = 1/(2n^{max(α,β)−δ}). Then, all nodes attempt to route their selected messages {S_v^i}_{v∈C} to their destinations, using Theorem 12.
We say that an iteration i succeeds if (a) all selected messages are delivered to their destinations; (b) each node is the target of at most n^δ selected messages; and (c) each node v sends at most n^δ selected messages, that is, |S_v^i| ≤ n^δ. Now, let us analyze the probability that in N iterations, all iterations succeed. First, consider the third condition: that no node is the sender of too many selected messages. Each node v has |M_v| ≤ n^α, and every message in M_v is selected with probability p = 1/(2n^{max(α,β)−δ}), so E[|S_v^i|] ≤ n^α · p ≤ n^δ/2. By Chernoff, assuming sufficiently large n, Pr(|S_v^i| > n^δ) ≤ 1/(1000n^{c+4}). Similarly, since every node is the target of at most n^β messages in total, and since n^β · p ≤ n^δ/2, the probability that in a given iteration a given node is the target of more than n^δ messages is bounded by 1/(1000n^{c+4}). For sufficiently large n, by union bound over all n nodes and all N iterations, the probability that the second or third condition fails to hold in any iteration is bounded by (2/(1000n^{c+4})) · n · 16c·n^{max(α,β)−δ} log n < 1/(3n^c). For each message m ∈ ⋃_{v∈C} M_v, let I_m be the number of iterations in which m is selected.
By Chernoff, since E[I_m] = pN = 8c log n, the probability that a given message m is never selected is at most e^{−pN/2} = n^{−4c}. As there are at most n³ messages in total, all messages are sent at least once with probability at least 1 − 1/(4n^c). Therefore, by union bound, all iterations succeed, and all messages are sent, with probability at least 1 − 1/n^c.
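A quick numerical sanity check of the coverage bound (our own constants, with natural logarithms assumed):

```python
import math

def never_selected_prob(n, M, c):
    """Probability a fixed message is never chosen when each of
    N = 16*c*M*ln(n) iterations selects it independently w.p. 1/(2M),
    where M stands for n^(max(alpha,beta)-delta)."""
    p = 1.0 / (2 * M)
    N = math.ceil(16 * c * M * math.log(n))
    return (1 - p) ** N

# (1-p)^N <= exp(-p*N) = exp(-8c ln n) = n^(-8c): with c = 1 and
# n = 100 the failure probability is far below n^(-7), so a union
# bound over the <= n^3 messages still leaves high success probability.
assert never_selected_prob(100, M=10, c=1) < 100 ** -7
```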
Lemma 7 Let C be an n^δ-cluster. For a constant 0 < ε ≤ 1, suppose an edge set E is partitioned between the nodes of C, so that each node u ∈ C initially knows a subset E_u of size at most O(n^{2−ε}). Then in O(n^{2−δ−ε+o(1)}) rounds, every cluster node u ∈ C can learn N(V_u).

Proof
In the cluster C, every node has degree at least n^δ, and the mixing time is O(polylog(n)). Applying Lemma 6, each node v sends each edge (u, v) ∈ E_v to the active node w with u ∈ V_w. As each node sends at most O(n^{2−ε} + n) = O(n^{2−ε}) messages, and each active node w is the recipient of at most O(n · |V_w|) = O(n^{2−δ}) messages, the number of rounds taken for the routing to finish is O(n^{2−δ−ε+o(1)}) (Lemma 6 with α = 2 − ε, β = 2 − δ). Finally, once each cluster node u ∈ C has learned the edges in F(C) incident to all nodes in V_u, we can simulate the Congested Clique and find any 4-cliques contained in F(C).
After the procedure of Lemma 7, each active node u_i has learned N(V_{u_i}), where |V_{u_i}| ≤ 2n^{1−δ}. Following this, a round of the Congested Clique on G can be simulated in O(n^{2−2δ+o(1)}) rounds by using Lemma 6 with α = β = 2 − δ (i.e., n · O(n^{1−δ}) = O(n^{2−δ}) messages sent and received per node).

Enumerating all 5-cliques in sublinear time
In this section we show how to find all copies of K_5 in the network graph in O(n^{73/75+o(1)}) rounds in CONGEST, with high probability.
The algorithm is very similar to the K_4 enumeration algorithm from Sect. 3: it begins by decomposing the network into dense well-connected clusters, and then searches for cliques using different strategies, depending on whether the vertices of the clique are entirely contained, partially contained, or entirely outside some dense cluster. The "hard case" with K_5, which we did not have with K_4, is when two vertices v_1, v_2 are in some dense cluster C_1, and outside C_1 we have two vertices v_3, v_4 which have few edges to C_1 ("light vertices") and one vertex v_5 with many edges to C_1 (a "heavy vertex"). Although the two vertices v_1, v_2 inside C_1 learn the edge {v_3, v_4} at Step (1), they might not be able to propagate the light-light edges they learned inside the cluster in Step (2), due to having many light vertices adjacent to them. This differs from K_4-freeness, in which, if both outer vertices are light, the K_4-copy is detected at Step (1). To handle this difficulty, we use a different decomposition, such that the number of "remainder" edges is a sub-constant fraction of the total number of edges in the graph. This implies a (slightly) sublinear bound on the arboricity of the edges outside any given cluster.

The decomposition
Our K_5-listing algorithm begins by decomposing the graph into well-connected components, but this time we use a slightly different decomposition: instead of recursively applying the decomposition from [7], we use a variant of the decomposition introduced in [12], and we use it only once (not recursively). This yields a decomposition into vertex-disjoint components (unlike in Sect. 3), which has the following properties:
1. Each connected component (cluster) of the subgraph induced by E_m has minimum degree at least c′·n^δ and mixing time O(n^{12γ}).
2. E_s has arboricity O(n^δ).
3. |E_r| ≤ c·m^{1−γ} for some constant c > 0, and each edge of E_r has endpoints in different connected components in the subgraph induced by the edge set E_m.
(Footnote 1: We present the theorem in terms of mixing time, and not conductance. In addition, the minimum degree requirement is not mentioned explicitly in the original theorem, despite the fact that it is proven and strongly used throughout [12]. Footnote 2: Unfortunately, there is an error in the analysis of the conductance of a cluster in [12], and the theorem is cited with corrected parameters. This issue causes the round complexity of our algorithm to become n^{73/75+o(1)} instead of the n^{21/22+o(1)} claimed in the conference version of this paper. We thank the anonymous reviewer who brought this to our attention.)
As in Sect. 3, we refer to the connected components induced by E_m as clusters. Also, it is convenient to view the set E_s as the union E_s = ⋃_{v∈V} E_{s,v}, where E_{s,v} are the edges of E_s that are incident to node v. We think of the edges in E_{s,v} as oriented away from v.
We say that an algorithm computes a tripartition T_{δ,γ}(G) if at the end of the algorithm, every node v ∈ V knows which of its incident edges belongs to each of the sets E_m, E_s, E_r.
It is shown in [12] that a tripartition can be computed quickly:

Properties of low-arboricity graphs
We rely on two well-known properties of graph arboricity, whose proofs can be found in [1].

Lemma 8 A graph with r edges has arboricity at most O(√r).
Lemma 9 For a graph of arboricity a, it is possible to orient its edges such that each vertex has out-degree at most a. Moreover, there is a CONGEST algorithm with round complexity O(log n) for finding an orientation such that each vertex has at most 2a outgoing edges.
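A centralized sketch of the orientation guarantee behind Lemma 9 (sequential peeling; the actual O(log n)-round CONGEST procedure is distributed and not shown here):

```python
from collections import defaultdict

def orient_edges(edges, a):
    """Orient an undirected graph of arboricity <= a so that every
    vertex has out-degree <= 2a: repeatedly peel a vertex of residual
    degree <= 2a (one must exist, since any subgraph of a graph of
    arboricity <= a has average degree < 2a) and orient its remaining
    edges outward."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    oriented, alive = [], set(adj)
    while alive:
        v = min(alive, key=lambda x: len(adj[x] & alive))
        assert len(adj[v] & alive) <= 2 * a, "arboricity exceeds a"
        for u in adj[v] & alive:
            oriented.append((v, u))  # orient v -> u
        alive.remove(v)
    return oriented
```

Each edge is oriented exactly once, by whichever endpoint is peeled first, so the result is a full orientation with the claimed out-degree bound.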
We note that the round complexity of the algorithm orienting the edges is independent of the value of the arboricity.
A converse lemma is also true, in a slightly weaker form:

Lemma 10
If the edges of a graph can be oriented such that each vertex has at most a outgoing edges, then the arboricity of the graph is at most 2a.

The algorithm
First, we apply the decomposition of Theorem 13 with parameters (δ, γ), to obtain a tripartition of the edges (E_m, E_s, E_r) with E_s having arboricity O(n^δ), |E_r| ≤ c·m^{1−γ}, and where each connected component (cluster) in E_m has minimum degree at least c′·n^δ and mixing time O(n^{12γ}). Following the decomposition, we go through a very similar procedure to that of Sect. 3.
Step 1: learning clique edges from E_s ∪ E_r. E_r contains at most c·m^{1−γ} edges, and therefore, by Lemma 8, has arboricity O(√(m^{1−γ})) = O(n^{1−γ}); E_s has arboricity O(n^δ). By Lemma 10, we see that E_r ∪ E_s has arboricity at most O(n^δ + n^{1−γ}).
The network finds an orientation of the edges of E_s ∪ E_r such that the number of outgoing edges of each node is O(n^δ + n^{1−γ}). This can be done in O(log n) rounds using Lemma 9.
To find 5-cliques contained entirely in E_s ∪ E_r, we simply have each node v send all its outgoing edges in the arboricity orientation to all its neighbors. As the arboricity is O(n^δ + n^{1−γ}), this can be done in O(n^δ + n^{1−γ}) rounds. The nodes output the K_5 copies they detect in this step (copies fully contained in E_s ∪ E_r).
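The broadcast-and-detect step can be sketched as a toy centralized simulation (function names and structure are ours, not the paper's): after every node forwards its outgoing edges, each node knows every edge oriented away from one of its neighbors, which covers all edges of any clique containing it.

```python
from itertools import combinations

def detect_cliques_after_broadcast(oriented_edges, r=5):
    """Simulate: each node sends its outgoing edges to all neighbors,
    then outputs every K_r it can certify from the edges it knows."""
    out, nbrs = {}, {}
    for u, v in oriented_edges:
        out.setdefault(u, set()).add((u, v))
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    # After the broadcast, node w knows its own outgoing edges plus
    # every edge oriented away from one of its neighbors.
    known = {w: set(out.get(w, ())) for w in nbrs}
    for u in nbrs:
        for w in nbrs[u]:
            known[w] |= out.get(u, set())
    found = set()
    for w in nbrs:
        und = {frozenset(e) for e in known[w]}
        cand = sorted(nbrs[w] | {w})
        for clique in combinations(cand, r):
            if all(frozenset(p) in und for p in combinations(clique, 2)):
                found.add(clique)
    return found
```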
Even if no clique is found, nodes remember the edges that they learned during this part of the algorithm as they move on to the next step.
Step 2: finding cliques containing an edge from E_m. Next, we search for copies of K_5 that have at least one edge in some cluster. As in the K_4 algorithm from Sect. 3, given a cluster C, we say that a node v is C-light if it has at most n^ε neighbors in C, and C-heavy otherwise. We do not use the same partition into "light" and "heavy" cliques that was used in Sect. 3. Instead, we use a combination of the strategies employed in Sect. 3, and then show by case analysis that any 5-clique will be detected.

Step 2(a): finding cliques using light vertices
This step is similar to the corresponding part in our K_4 algorithm. We iterate through the clusters sequentially, in lexicographic order of their cluster IDs. For each cluster C, all nodes that belong to C (i.e., have at least one edge in C) announce this to their neighbors. Let M_C(v) = N(v) ∩ C denote the neighbors of vertex v that belong to C.
Next, each node v that is C-light sends M_C(v) to all its neighbors; as v is C-light, this requires O(n^ε) rounds.
In parallel, for each neighbor c_1 of v that is contained in a cluster C for which v is light, v sends c_1 a list of "edge queries", asking, for each other neighbor c_2 ∈ M_C(v), whether the edge {c_1, c_2} is present. There are at most O(n^{1−δ+ε}) such queries: as the minimum degree inside a cluster is Ω(n^δ), and the clusters are vertex disjoint, there are O(n^{1−δ}) clusters, and for any C for which v is light, |M_C(v)| ≤ n^ε. Each queried node c_1 answers with the sublist of queried edges that are actually present, and node v outputs the 5-cliques it has found using the edges it just learned and the edges it remembers from Step (1).

Step 2(b): finding cliques using heavy vertices
This last part of the algorithm is identical to the last part of our K 4 algorithm. We repeat it here for the reader's convenience.
Let F(C) = {{u, v} | u ∈ C or u is C-heavy} be the set of all edges incident to a node in C or to some C-heavy node. Our goal is to find 5-cliques contained entirely in F(C), in parallel for all clusters. To do this, we have the cluster nodes simulate the execution of the Congested Clique algorithm for K_5 enumeration from [13] on the n-vertex graph (V, F(C)). This part of the algorithm is carried out in four steps: I. Pull: the nodes of the cluster C "pull" the edges of F(C) from nodes outside C, such that every edge of F(C) is learned by at least one node in C.
II. Partition: the nodes of V are partitioned among the cluster nodes, as in Sect. 3. Each node c ∈ C will be responsible for simulating the nodes in V_c. III. Shuffle: the cluster nodes shuffle the edges of F(C) between themselves, so that each node c ∈ C learns the edges incident to the nodes it needs to simulate. This is carried out by using the routing algorithm from [23]. IV. Simulate: the nodes of C simulate an n-vertex Congested Clique, and run the K_5-enumeration algorithm of [13] on the graph (V, F(C)) to list all the 5-cliques in F(C).
All phases are performed as described in Sect. 3.

Algorithm correctness and analysis
Round complexity. By an analysis very similar to Sect. 3, the round complexity of the decomposition is Õ(n^{1−δ+20γ}); the round complexity of Step (1) is O(n^δ + n^{1−γ}), as the arboricity of E_s ∪ E_r is O(n^δ + n^{1−γ}); the round complexity of Step (2a) is O(n^{1−δ+ε}); the round complexity of the "Pull" phase is O(n^{1−ε}); the "Partition" phase is done locally; the round complexity of the "Shuffle" phase is O(n^{2−δ−ε+12γ+o(1)}); and the round complexity of the "Simulate" phase is O(n^{2−2δ+3/5+12γ+o(1)}), where the factor n^{12γ+o(1)} in both the "Shuffle" and "Simulate" phases comes from the mixing time of the new decomposition.
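One consistent setting of the parameters can be verified mechanically; the values below are our own back-of-the-envelope choice matching the stated n^{73/75+o(1)} bound (the paper's optimization may differ, e.g., in the exact choice of ε):

```python
from fractions import Fraction as F

delta, gamma, eps = F(73, 75), F(2, 75), F(1, 2)

# Exponents of the round complexity of each phase, as listed above.
phases = {
    "decomposition":       1 - delta + 20 * gamma,
    "step 1 (arboricity)": max(delta, 1 - gamma),
    "step 2a (light)":     1 - delta + eps,
    "pull":                1 - eps,
    "shuffle":             2 - delta - eps + 12 * gamma,
    "simulate":            2 - 2 * delta + F(3, 5) + 12 * gamma,
}
# The binding phases (step 1 and simulate) hit exactly 73/75.
assert max(phases.values()) == F(73, 75)
```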

Correctness.
Lemma 12 All copies of K 5 are outputted by the algorithm with high probability.
Proof For a copy H of K_5, there are several possible cases:
I. H is entirely contained in E_s ∪ E_r.
II. H has one edge in some cluster (i.e., in E_m), and the remaining edges are in E_s ∪ E_r.
III. H has a triangle in some cluster, and the other edges are all in E_s ∪ E_r.
IV. H has a triangle in some cluster, one edge in another cluster, and all other edges are in E_s ∪ E_r.
V. H has one edge in one cluster, one edge in another cluster, and all the other edges are in E_s ∪ E_r.
VI. H has a K_4 in some cluster, or is fully contained in one.
We note that any K_5-copy must fall within at least one of these cases: if all vertices of H are in distinct clusters, it falls under case (I). If H has two vertices in the same cluster, then depending on whether there are two other vertices that share another cluster, it falls within case (V) or (II). If H has 3 vertices belonging to the same cluster, then depending on whether the two remaining vertices share another cluster, it falls within case (III) or (IV). Finally, if H has 4 or 5 vertices belonging to the same cluster, it falls within case (VI).
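The claim that the six cases are exhaustive can be checked by brute force over all set partitions of five vertices (a quick script of our own; the "shape" of a partition is the multiset of its cluster sizes):

```python
def partitions(elems):
    """All set partitions of a list, as lists of blocks."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

CASES = {
    (1, 1, 1, 1, 1): "I",   # all vertices in distinct clusters
    (2, 1, 1, 1):    "II",  # one edge in a cluster
    (3, 1, 1):       "III", # a triangle in a cluster
    (3, 2):          "IV",  # a triangle plus an edge in another cluster
    (2, 2, 1):       "V",   # two edges in two different clusters
    (4, 1):          "VI",  # a K4 inside a cluster
    (5,):            "VI",  # fully contained in one cluster
}

shapes = {tuple(sorted((len(b) for b in p), reverse=True))
          for p in partitions(list("abcde"))}
assert shapes == set(CASES)  # every shape is handled by some case
```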
We show that in each case, H is detected by our algorithm.
Observe that in the first step of the algorithm, all edges in (E s ∪ E r ) ∩ H (that is, all edges of the 5-clique that lie in E s or in E r ) are learned by all vertices of H : consider an edge {u, v} ∈ (E s ∪ E r ) ∩ H , and suppose that in the arboricity orientation of E s ∪ E r , the edge is oriented from u to v. In the first step of the algorithm, u sends its outgoing edges to all neighbors, and in particular, to all other nodes of the clique. Thus, all clique nodes learn about the edge {u, v}.
We immediately see that in Case I, where the clique H is entirely contained in E s ∪ E r , all clique vertices will find the clique.
In Case II, let {u, v} ∈ E m ∩ H be the edge of the clique that is in a cluster. Since (E s ∪ E r )∩ H is learned by all clique vertices, nodes u, v know about all edges of the clique, and they output it.
In Cases III and IV , there are three vertices u, v, w inside a cluster C, and two other vertices x, y outside C. If both vertices x, y outside C are light with regards to C, then in Step (2a), they query the edges {u, v} , {v, w} , {u, w}, and find the clique. If at least one of x, y is heavy, all clique edges will be known to the cluster C after the "Pull" procedure, so the clique will be detected by the cluster in the "Simulate" procedure of Step (2b).
In Case V, there are two clique edges in different clusters: {u_1, v_1} in a cluster C_1 and {u_2, v_2} in a cluster C_2. Every other edge of the clique lies in E_s ∪ E_r and is learned by all clique vertices in Step (1), so the vertices u_1, v_1 already know all edges of the clique except {u_2, v_2}. If one of the vertices u_1, v_1 is light with respect to C_2, then in Step 2(a), that vertex queries in particular the edge {u_2, v_2}, and finds H.
If both u 1 , v 1 are C 2 -heavy, then in the "Pull" phase of Step (2b), the edge {u 1 , v 1 } will be sent into the cluster C 2 .
All other clique edges are incident to either u 2 or v 2 (or both, in case of the edge {u 2 , v 2 }), so they will also be used in Step (2b) when we simulate the Congested Clique. Thus, in the "Simulate" phase of Step (2b), the cluster C 2 will find H .
In Case VI, all edges of H are either contained in the cluster or incident to a node in the cluster, so H is found in the "Simulate" phase of Step (2b).

Improved algorithm for C 2k -freeness
We now turn our attention to even-length cycles, and give an improved algorithm for C 2k -freeness.
Our main technical contribution is to show that, because of a new connection to Zarankiewicz numbers, if we have a graph that is C 2k -free, then this graph cannot have too many high-degree nodes. The bound we obtain is tighter than a bound used in [17], and it yields an improved algorithm for C 2k -freeness.
Fix a threshold n^δ, and call a node u a high-degree node if deg(u) > n^δ. Intuitively, a graph that is C_{2k}-free cannot have "too many" high-degree nodes, because it is known that such graphs are sparse: a graph that has no copies of C_{2k} can have at most O(n^{1+1/k}) edges [20]. Thus, if our graph is C_{2k}-free, there are not too many high-degree nodes. We do not know in advance whether the graph is C_{2k}-free, but we execute the algorithm under the optimistic assumption that there are few high-degree nodes, and if this assumption does not hold, the algorithm will detect this and immediately reject. (Alternatively, we can first apply the diameter-reduction technique from Sect. 6, count the high-degree nodes, and reject if there are too many.) The C_{2k}-algorithm from [17] uses a bound on the number of high-degree nodes as follows. First, we search for a C_{2k} that contains a high-degree node: we go over these nodes one after the other, and start a BFS from each one to check if it participates in a copy of C_{2k}. Subsequently, the high-degree nodes are removed from the graph, together with all their edges. Next, we rely on an observation already used in [14], which is that a C_{2k}-free graph has arboricity at most O(n^{1/k}); in particular, its vertices can be quickly partitioned into ℓ = Θ(log n) layers V_1, …, V_ℓ, such that the number of edges from any node in layer V_i to all higher layers ⋃_{j>i} V_j is O(n^{1/k}). Together with the fact that all high-degree nodes have been removed from the graph, this partition allows us to quickly find any remaining copies of C_{2k} in the graph.
For an integer d, let V_d = {v | d(v) ≥ d} be the set of vertices with degree at least d. Putting both parts together, the running time of the C_{2k} algorithm from [17] is given by
Õ(min_{δ≥0} max(n^{(k−2)δ+1/k}, |V_{n^δ}|)),   (1)
where again, δ ∈ (0, 1) is the parameter that determines the threshold for what is considered "high degree". In [17], the value of δ is chosen fairly naïvely: since a C_{2k}-free graph has at most O(n^{1+1/k}) edges, for any δ we have |V_{n^δ}| ≤ O(n^{1+1/k−δ}), and balancing the two terms in (1) yields the result.
Our C_{2k} algorithm retains the framework described above, but uses a tighter bound on |V_d|, which we prove in Lemma 13 below. This allows us to choose a smaller value for δ, lowering the threshold for what we consider a "high-degree node". Bounding the number of high-degree vertices. We say that a bipartite graph G = (A ∪ B, E) is (a, b)-regular if for all v ∈ A it holds that d(v) = a and for all v ∈ B it holds that d(v) = b. Our tighter bound relies on known bounds for the Zarankiewicz number of even-length cycles, i.e., the maximum possible number of edges z(a, b; C_{2k}) in a bipartite C_{2k}-free graph whose sides have sizes a and b. We rely on the following upper bound:
Theorem 14 (Zarankiewicz numbers for cycles [32, 40]) For any integers a, b ≥ 0 and k ≥ 2,
z(a, b; C_{2k}) ≤ (2k − 3)·((ab)^{(k+1)/(2k)} + a + b) for odd k, and
z(a, b; C_{2k}) ≤ (2k − 3)·(a^{(k+2)/(2k)}·b^{1/2} + a + b) for even k.
Perhaps surprisingly, even though Zarankiewicz numbers only bound the size of bipartite H -free graphs, it turns out that we can use them to solve even cycle freeness in general graphs, because they allow us to better bound the number of high-degree vertices in a C 2k -free graph.

Lemma 13 Let c = c(k) be a large enough constant, and let G be a C_{2k}-free graph. For every d ≥ c·n^{1/k},
|V_d| ≤ O_k(max(n^{(k−1)/(k+1)}, n^{(k+1)/(k−1)}/d^{2k/(k−1)})) for odd k ≥ 3, and
|V_d| ≤ O_k(max(n^{k/(k+2)}, (n^{1/2}/d)^{2k/(k−2)})), for even k ≥ 4.
Here, the notation O_k(·) hides constants that depend on k. For an integer N and a graph H, denote by ex(N, H) the maximum number of edges an H-free graph with N vertices can contain.

Proof of Lemma 13
Define the following sets of edges: let E_d be the set of edges with at least one endpoint in V_d, and let E′_d ⊆ E_d be the set of edges with both endpoints in V_d. Observe that |E_d| ≥ d · |V_d|/2, since every node in V_d has degree at least d. On the other hand, since G is C_{2k}-free, |E′_d| ≤ ex(|V_d|, C_{2k}) ≤ c′·|V_d|^{1+1/k} for a constant c′ = c′(k) large enough in k (e.g., [4]). But this implies that for a constant c ≥ 4c′, we have |E′_d| ≤ c′·|V_d|·n^{1/k} ≤ (d/4)·|V_d|, and hence |E_d ∖ E′_d| ≥ d·|V_d|/4. The bipartite graph induced on (V_d, V ∖ V_d) is C_{2k}-free and contains the edges of E_d ∖ E′_d, so
d·|V_d|/4 ≤ z(|V_d|, n; C_{2k}).
(We rely on the fact that Zarankiewicz numbers are non-decreasing: for any a and b′ > b, an H-free bipartite graph with sides of sizes a and b and e edges is trivially extended to an H-free bipartite graph with sides of sizes a and b′ and e edges, by simply adding nodes on the right side that have no edges.) For even k ≥ 4, by Theorem 14,
d·|V_d|/4 ≤ (2k − 3)·(|V_d|^{(k+2)/(2k)}·n^{1/2} + |V_d| + n).   (2)
If |V_d| ≤ (4n)^{k/(k+2)} then we are done, as the statement of the lemma is satisfied, so assume not. In that case, |V_d|^{(k+2)/(2k)}·n^{1/2} ≥ (4n)^{1/2}·n^{1/2} = 2n ≥ |V_d| + n, so for the right-hand side of (2) we obtain:
d·|V_d|/4 ≤ 3(2k − 3)·|V_d|^{(k+2)/(2k)}·n^{1/2}.
Together with (2), and by rearrangement, |V_d|^{(k−2)/(2k)} ≤ O_k(n^{1/2}/d), i.e., |V_d| ≤ O_k((n^{1/2}/d)^{2k/(k−2)}). For odd k ≥ 3, by Theorem 14,
d·|V_d|/4 ≤ (2k − 3)·((|V_d|·n)^{(k+1)/(2k)} + |V_d| + n).   (4)
Again, if |V_d| ≤ n^{(k−1)/(k+1)} then we are done, so assume |V_d| > n^{(k−1)/(k+1)}. In this case, (|V_d|·n)^{(k+1)/(2k)} > (n^{(k−1)/(k+1)}·n)^{(k+1)/(2k)} = n ≥ (|V_d| + n)/2. For the right-hand side of (4), we get d·|V_d|/4 ≤ O_k((|V_d|·n)^{(k+1)/(2k)}), and by (4), |V_d|^{(k−1)/(2k)} ≤ O_k(n^{(k+1)/(2k)}/d), i.e., |V_d| ≤ O_k(n^{(k+1)/(k−1)}/d^{2k/(k−1)}). Next, combined with the framework of [17], we show how to obtain an Õ_k(n^{1−2/(k²−k+2)})-round algorithm for odd k ≥ 3, and an Õ_k(n^{1−2/(k²−2k+4)})-round algorithm for even k ≥ 4 (Theorem 2).
Bounding the running time of our improved algorithm is somewhat hairy (due to the form of the bounds in Theorem 14). Before we prove the theorem, we state the following two technical lemmas, which are proven at the end of the section:
We obtain the following improved C 2k -freeness algorithm.
Proof of Theorem 2 As we explained above, it is shown implicitly in [17] that the round complexity of C_{2k}-freeness is Õ(min_{δ≥0} max(n^{(k−2)δ+1/k}, |V_{n^δ}|)). We now instantiate this algorithm with a different choice of δ than the one used in [17].
For even k ≥ 4, plugging in the bound on |V_{n^δ}| from Lemma 13, we see that the running time is
Õ_k(min_{δ: n^δ ≥ cn^{1/k}} max(n^{(k−2)δ+1/k}, n^{k/(k+2)}, (n^{1/2}/n^δ)^{2k/(k−2)})).
The first term increases with δ, taking its minimum value when n^δ = cn^{1/k}; the second term does not depend on δ; and the third decreases as δ increases. As the first term is increasing in δ and the latter two are non-increasing in δ, by Lemma 15, the minimum is attained at the δ that balances the first term against the larger of the other two. Setting (k−2)δ + 1/k = (1/2 − δ)·2k/(k−2) yields δ = (k²−k+2)/(k(k²−2k+4)) and a running time of Õ_k(n^{1−2/(k²−2k+4)}) rounds; one can verify that for this δ the term n^{k/(k+2)} is dominated as well. The odd case is analogous, balancing the first term against (n^{(k+1)/(2k)}/n^δ)^{2k/(k−1)} to obtain Õ_k(n^{1−2/(k²−k+2)}) rounds.
For the other direction, suppose for the sake of contradiction that min_{δ∈[0,1]} max(f(δ), g(δ)) < max(f(x), g(x)), and let γ ∈ [0, x) be some value that minimizes the right-hand side, such that max(f(γ), g(γ)) < max(f(x), g(x)). Since γ < x and f is increasing, f(γ) < f(x). But since g is decreasing and γ < x, we have g(γ) > g(x), and therefore max(f(γ), g(γ)) ≥ g(γ) > g(x) ≥ min_{δ∈[0,1]} max(f(δ), g(δ)), contradicting (10).
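The balancing step can be verified with exact rational arithmetic (a sketch of our own bookkeeping; `new_exponent` and its parameters are illustrative names, not the paper's):

```python
from fractions import Fraction as F

def new_exponent(k):
    """Balance n^{(k-2)d + 1/k} against the Lemma 13 term:
    ((n^{1/2})/n^d)^{2k/(k-2)} for even k, and
    (n^{(k+1)/(2k)}/n^d)^{2k/(k-1)} for odd k."""
    if k % 2 == 0:
        slope, top = F(2 * k, k - 2), F(1, 2)
    else:
        slope, top = F(2 * k, k - 1), F(k + 1, 2 * k)
    # Solve (k-2)*d + 1/k == slope*(top - d) for d.
    d = (slope * top - F(1, k)) / (k - 2 + slope)
    return (k - 2) * d + F(1, k)

for k in range(3, 9):
    target = 1 - F(2, k*k - k + 2) if k % 2 else 1 - F(2, k*k - 2*k + 4)
    assert new_exponent(k) == target
    assert target < 1 - F(1, k*k - k)  # strictly better than [17]

assert new_exponent(3) == F(3, 4)  # 6-cycles: from n^{5/6} to n^{3/4}
```

The last line matches the improvement for 6-cycles stated in the abstract: the naive balancing of [17] gives exponent 1 − 1/(k²−k) (5/6 for k = 3), while the new bound gives 3/4.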

Reducing the network diameter in subgraph-detection problems
In this section we show a reduction from general networks to networks with small diameter, for H-freeness and H-enumeration. This result uses a similar approach to that of [16], which gives a similar result for property-testing of subgraph-freeness. We show two different reductions: a randomized reduction that reduces the diameter to O(k log n) for |H| = k, and a deterministic reduction reducing the diameter to k · 2^{O(√(log n log log n))}.

Theorem 15 Consider a problem P which is either H-freeness or H-enumeration, where |H| = k. Let A be a protocol that solves P in time T(n, D) with error probability at most ρ. Then there is an algorithm A′ that solves P with round complexity Õ(T(n, O(k log n)) + k log² n) and error probability at most cρn log n + 1/poly(n), for some constant c.
A (d, c)-network decomposition is a decomposition in which each node is given a color in [c], such that each connected component induced by the nodes of a single color has diameter at most d. The randomized algorithm to compute the decomposition is a straightforward adaptation of the strong-diameter (1-hop) network decomposition of Elkin and Neiman [15], which itself is a variation of the parallel graph decomposition algorithm of Miller, Peng, and Xu [31]. As we have to change some details in the proof of [15], we give the proof later in this section for completeness. Note that in this variant of the decomposition a node may have more than one color.

Theorem 17 Let G = (V , E) be an n-node graph and let k ≥ 2 be an integer. There is a randomized CONGEST-model algorithm that w.h.p. constructs a set of clusters of diameter O(k log n) such that each node is in at least one cluster, the clusters are colored with O(log n) colors, and clusters of the same color are at distance at least k from each other in G.
The deterministic decomposition is taken to be the k-hop (d, c)-network decomposition of [22].

Theorem 18 ([22], Theorem 3) Let G = (V, E) be an n-node graph and let k ≥ 1 be an integer. There is a deterministic CONGEST-model algorithm that computes a k-hop network decomposition of G with cluster diameter k · 2^{O(√(log n log log n))} and 2^{O(√(log n log log n))} colors.
Our reduction works as follows: Consider a problem P that is either H-freeness or H-enumeration, for an H with |H| = k. Let A be a protocol that solves P in time T(n, D). The network calculates a (2k+1)-hop decomposition (either using Theorem 17 or Theorem 18).
Denote by V_i the set of vertices with color i, and for a vertex set A, let G_k(A) be the graph induced by A and its k-neighborhood.
Next, sequentially for each color i, the network runs algorithm A on each cluster of G_k(V_i). If P is an H-freeness problem, a node accepts if it accepts in A for all colors i, and rejects otherwise. If P is the H-enumeration problem, a node outputs the set of all H-copies detected by it in any G_k(V_i).

Remark 3
We note that there is a known poly-logarithmic deterministic network decomposition [36], and assuming it can be extended to assure distance ≥ k between two clusters of the same color (which we believe should be possible), Theorem 16 can be improved to have only a poly-logarithmic overhead in round complexity by using this decomposition instead of the decomposition of [22].

Correctness and analysis
If the decomposition used was Theorem 17, the diameter in each G_k(V_i) is O(k log n) + k = O(k log n), so the round complexity is at most O(k log n · T(n, O(k log n)) + k log² n).
If the decomposition used was Theorem 18, the diameter in each G_k(V_i) is k · 2^{O(√(log n log log n))} + k = k · 2^{O(√(log n log log n))}, so the round complexity is at most O(f(n) · T(n, k · f(n))), where f(n) = 2^{O(√(log n log log n))}.
Next, we show correctness:

Lemma 16 H_0 is a copy of H in G if and only if there exists i ∈ [c] such that H_0 is a copy of H in G_k(V_i).
Proof Let v be an arbitrary vertex in H_0, and let c(v) be its color. As H_0 has diameter at most k, it is fully contained in G_k(V_{c(v)}). On the other hand, if there exists an i such that H_0 is a copy of H in G_k(V_i), then clearly H_0 is a copy in G as well.
By Lemma 16, G is H-free if and only if all G_k(V_i) are H-free, and the set of copies of H in G is equal to the union of the sets of copies of H in the graphs G_k(V_i). In the randomized decomposition, the error probability of O(ρn log n + 1/poly(n)) in Theorem 15 is obtained by taking a union bound over the event that any instance of A in a cluster in one of the O(log n) graphs fails, and the event that the decomposition itself fails.

In order to make sure that clusters of the same color are at distance more than k in G, each phase is run (with new independent randomness) on the original graph G. That is, unlike in the algorithm of [15], which guarantees disjoint clusters, we do not remove nodes from the graph as soon as they are in some cluster. Together, (I)-(IV) and the fact that different phases are independent directly imply the claim of the theorem.

The algorithm for a single phase is defined as follows. First, each node v ∈ V draws an independent random variable X_v according to the exponential distribution with rate 1/k, i.e., X_v ∼ Exp(1/k). For each node v ∈ V, we define a random radius R_v := ⌈X_v⌉. Note that w.h.p., each of the values R_v is upper bounded by R_max = O(k log n). The construction of the clusters is defined as follows. For any two nodes u, v ∈ V, we define m_{u,v} := R_v − d_G(u, v).

Randomized decomposition
and we thus get that m_{u,v′} − m_{u,v″} ≤ k − 1, a contradiction to the assumption that u joins the cluster of v′.
We next show (III), i.e., that the algorithm can be implemented in O(k log n) rounds in the CONGEST model. First note that a node u can only join the cluster of a node v if m_{u,v} ≥ 1, and it thus suffices for u to collect m_{u,v}-values that are positive. Each node v forwards its ID and its radius R_v to all nodes within distance R_v. In order for each node to receive the largest two m_{u,v}-values without congestion, each node v starts flooding its ID at time R_max − R_v. Node u then gets v's ID exactly at time R_max − R_v + d_G(u, v) = R_max − m_{u,v}. Because the largest two values m_{u,v_1} and m_{u,v_2} are also among the largest two values on their respective shortest paths from v_1 to u and from v_2 to u, the largest two values m_{u,v_1} and m_{u,v_2} reach node u without congestion, and thus at time at most R_max = O(k log n).
It remains to show that each node joins a cluster with probability at least 1/e. To prove this, we use a technical lemma from [31], which (slightly restated) shows the following: Lemma 17 [31] Let d_1 ≤ · · · ≤ d_n be arbitrary values and let X_1, …, X_n be independent random variables drawn from Exp(β).

Then the probability that there is a gap of at least c between the largest and the second largest values of X_1 − d_1, …, X_n − d_n is at least e^{−βc}.
For any two nodes u, v, let m′_{u,v} := X_v − d_G(u, v) (i.e., we replace R_v by the original exponential random variable X_v). Note that if the difference between the largest two m′_{u,v} values for u is more than k, then the difference between the largest two m_{u,v} values is more than k − 1, and u joins a cluster. Claim (IV) now follows directly from Lemma 17 by choosing c = k and β = 1/k.
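A quick Monte Carlo sanity check (ours, not a proof) of this guarantee for equal shifts d_i: with c = k and β = 1/k, Lemma 17 promises a gap of at least c with probability at least e^{−βc} = 1/e ≈ 0.37.

```python
import random

def gap_probability(n, beta, c, d, trials, seed=0):
    """Estimate P( max_i (X_i - d_i) - (2nd max) >= c ) for X_i ~ Exp(beta)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        vals = sorted(rng.expovariate(beta) - d[i] for i in range(n))
        if vals[-1] - vals[-2] >= c:
            hits += 1
    return hits / trials

k = 3
# With equal shifts, memorylessness makes the gap between the two largest
# i.i.d. Exp(1/k) values itself Exp(1/k)-distributed, so the true
# probability is exactly e^{-c/k} = 1/e here.
p = gap_probability(n=20, beta=1.0 / k, c=k, d=[0.0] * 20, trials=20000)
```

The estimate lands near 0.368, matching the 1/e bound with equality in this symmetric case.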

Lower bound on listing all copies of C_4
We show that listing all copies of C_4 in the graph requires Θ̃(n) rounds.
The upper bound is easy: each node sends all of its edges to its neighbors, and outputs all 4-cycles known to it. As the diameter of C_4 is 2, and at the end of the process each node knows its 2-neighborhood, it knows all 4-cycles it participates in.
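The local listing step can be sketched centrally as follows (a hypothetical helper of ours, exploiting the fact that every 4-cycle u-x-w-y is determined by its two diagonal pairs {u, w} and {x, y}, which are pairs of vertices with at least two common neighbors):

```python
from itertools import combinations

def four_cycles(adj):
    """List all 4-cycles of an undirected graph given as {v: set of neighbors}.
    Each cycle u-x-w-y is recorded as the pair of diagonals {{u,w},{x,y}}."""
    cycles = set()
    for u, w in combinations(sorted(adj), 2):
        # every pair of common neighbors x, y closes the cycle u-x-w-y
        for x, y in combinations(sorted(adj[u] & adj[w]), 2):
            cycles.add(frozenset((frozenset((u, w)), frozenset((x, y)))))
    return cycles
```

The outer frozenset deduplicates the two symmetric roles of the diagonals, so each copy of C_4 is listed exactly once.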
For our lower bound, our reduction graph is the same as in [14], but we reduce from a different problem: let SetIntersection_m be the 2-party problem where Alice and Bob receive sets X, Y ⊆ [m] and must output the intersection X ∩ Y: each player outputs some elements, such that the union of the elements output by the two players is exactly X ∩ Y. The SetIntersection_m problem is a generalization of the Disjointness_m problem, in which the players need only determine whether X ∩ Y = ∅, and which is known to require Ω(m) bits for randomized protocols that succeed with high constant probability [35].

Lemma 18 If there is a protocol that lists all copies of C_4 in r rounds in graphs on 4n vertices, then there is a 2-party protocol for SetIntersection_{n²} using O(r · n log n) bits.
Proof Given a protocol for listing C_4, we can solve SetIntersection_{n²} as follows: identify the universe of the SetIntersection problem with [n] × [n]. Alice and Bob construct a graph G_{X,Y} on the nodes [n] × [4]. For each i ∈ [n], there is an edge between nodes (i, 1), (i, 2) and between nodes (i, 3), (i, 4). If (i, j) ∈ X, we add the edge (i, 2), (j, 3), and if (i, j) ∈ Y, we add the edge (i, 1), (j, 4), so that each pair (i, j) ∈ X ∩ Y closes the 4-cycle (i, 1), (i, 2), (j, 3), (j, 4).
We can simulate the execution of the distributed algorithm on G_{X,Y} just as in [14]: Alice maintains the state of the vertices of the form (i, 2) and (i, 3) for i ∈ [n] (whose incident edges she knows, as they depend only on X), and Bob maintains the state of the vertices of the form (i, 1) and (i, 4). For t = 1, …, r rounds, each player sends the other player the bits sent in the t'th round on the edges of the form (i, 1), (i, 2) or of the form (i, 3), (i, 4) for i ∈ [n] (these edges form the cut between the vertices maintained by Alice and the vertices maintained by Bob), and locally updates the states of the vertices it simulates to their states at the end of the t'th round. As there are O(n) such edges, and the number of bits sent on each edge per round is O(log n), the communication cost of this protocol is O(r · n log n). When the simulation ends, Alice outputs the elements (i, j) corresponding to the 4-cycles output by nodes on her side of the graph, and Bob does the same for his side of the graph.
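A small sanity check of the construction (our sketch, with the Y-edges oriented as (i, 1), (j, 4) so that each element of X ∩ Y closes a 4-cycle): listing the 4-cycles of G_{X,Y} recovers exactly the intersection.

```python
def reduction_graph(n, X, Y):
    """Build G_{X,Y} on nodes (i, layer), layer in 1..4, for X, Y subsets of [n] x [n]."""
    edges = set()
    for i in range(n):
        edges.add(frozenset(((i, 1), (i, 2))))   # fixed matchings
        edges.add(frozenset(((i, 3), (i, 4))))
    for (i, j) in X:
        edges.add(frozenset(((i, 2), (j, 3))))   # Alice's input edges
    for (i, j) in Y:
        edges.add(frozenset(((i, 1), (j, 4))))   # Bob's input edges
    return edges

def intersection_from_cycles(n, X, Y):
    """Recover X ∩ Y: (i, j) is in the intersection iff the 4-cycle
    (i,1)-(i,2)-(j,3)-(j,4) is present in G_{X,Y}."""
    edges = reduction_graph(n, X, Y)
    out = set()
    for i in range(n):
        for j in range(n):
            cyc = [((i, 1), (i, 2)), ((i, 2), (j, 3)),
                   ((j, 3), (j, 4)), ((j, 4), (i, 1))]
            if all(frozenset(e) in edges for e in cyc):
                out.add((i, j))
    return out
```

Since the graph is layered (layer 1 touches only layers 2 and 4, etc.), every 4-cycle has this canonical form, so no spurious elements are output.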

Two-party protocol for detecting 6-cycles
We show that a certain type of reduction from two-party communication complexity cannot prove a lower bound stronger than Ω̃(√n) for C_6-freeness. This type of reduction is the one used in [14,29] to prove a lower bound of Ω̃(√n) for 4-cycles (in [14]) and for C_{2k} for any k (in [29]), as well as many other distributed lower bounds.
The reduction has the following structure (we call such reductions "partition-based"): suppose we want to prove that a problem P is hard to solve in a class G of graphs, over a fixed set V of nodes, by reduction from some 2-party communication complexity problem f : {0, 1}^N × {0, 1}^N → {0, 1}. We fix a partition V_A, V_B of the vertices V, and assume that given their inputs X, Y ∈ {0, 1}^N (respectively), the two players jointly construct a graph G_{X,Y} ∈ G, such that Alice knows the edges E_A incident to the nodes of V_A (even without knowing Bob's input Y), and Bob knows the edges E_B incident to V_B (even without knowing Alice's input X). Let S_A, S_B be the sets of nodes in V_A (resp. V_B) that have at least one edge to a node in V_B (resp. V_A). We prove that Alice and Bob can simulate the execution of a distributed algorithm in the graph G_{X,Y} without using a lot of communication per round, and furthermore, that from a solution to the distributed problem P they can compute a solution to the function f that they want to solve. Then, if a communication lower bound on f is known, it implies a round lower bound for solving P.
We show, using Theorem 14, that for any partition V_A, V_B of a vertex set V, if it is assumed that Alice knows all edges incident to V_A and Bob knows all edges incident to V_B, then Alice and Bob can check whether the graph is C_6-free using O((|S_A| + |S_B|) · √n log n) bits. This protocol rules out improving the existing lower bound of Ω̃(√n) for C_6-freeness of [29] by more than a logarithmic factor using partition-based reductions from 2-party communication complexity. (This result is inspired by [11], which showed a similar result for cliques. Our protocol, however, is completely different.) Alice and Bob begin the protocol by checking whether their own input contains a copy of C_6, and if so they reject and terminate. Otherwise, the players search for a 6-cycle that crosses the cut between them, and contains at least one edge from E_A \ E_B and at least one edge from E_B \ E_A.
For an edge set F and i ≥ 1, let F^i ⊆ V^2 be the set of all pairs of nodes connected by a path of i edges from F.
Let A_1 = {{u, v} ∈ E_A | u, v ∈ S_A} be the edges Alice sees between her cut nodes; Alice sends these edges to Bob. In addition, Alice wants Bob to learn the set A_2 of pairs of nodes in S_A that are connected by a path of length two whose middle node is not in S_A. She does this in one of two ways:
- If the edge set A_3 = {{u, w} ∈ E_A | u ∈ S_A, w ∉ S_A} is of size at most |S_A| · √n, Alice sends A_3 to Bob, and he computes A_2 from it.
- Otherwise, Alice computes A_2 herself and sends it to Bob.
Define B_1, B_2, B_3 similarly; Bob acts symmetrically to Alice, so that she learns B_1 and B_2. Finally, both players check if they "see" a 6-cycle: Alice rejects if the edges of E_A, together with the edges of B_1 and the length-two paths represented by B_2, close a cycle of length six, and similarly for Bob. The correctness of the protocol is proven by a case analysis, going over all possible ways a 6-cycle could be split between the two players.
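A sketch of Alice's side of the message computation (ours; edges are frozensets, and we assume A_1 consists of the E_A-edges with both endpoints in S_A, and A_2 of the pairs of S_A-nodes joined by a 2-path through a node outside S_A, as above):

```python
from math import isqrt
from itertools import combinations

def alice_messages(E_A, S_A, n):
    """Return A_1 plus whichever of A_3 / A_2 is cheaper to send."""
    A1 = {e for e in E_A if all(v in S_A for v in e)}
    A3 = {e for e in E_A if len(e & S_A) == 1}
    if len(A3) <= len(S_A) * isqrt(n):
        return A1, ("A3", A3)
    # A2: pairs of cut nodes joined by a 2-path through a node outside S_A
    nbrs_out = {}
    for e in A3:
        (u,) = e & S_A          # the cut-node endpoint
        (w,) = e - S_A          # the outside endpoint
        nbrs_out.setdefault(w, set()).add(u)
    A2 = {frozenset(p) for us in nbrs_out.values()
          for p in combinations(sorted(us), 2)}
    return A1, ("A2", A2)
```

Bob would run the symmetric computation; the case split mirrors the |A_3| ≤ |S_A| · √n threshold in the protocol.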

Lemma 19 If G contains a 6-cycle C = (u_0, u_1, …, u_5), then some player rejects.
Proof C has 6 nodes, each in either V_A or V_B; therefore, either |V_A ∩ V(C)| ≥ 3 or |V_B ∩ V(C)| ≥ 3. Assume w.l.o.g. that |V_A ∩ V(C)| ≥ 3. Then, up to symmetry, one of the following three conditions must occur: either u_0, u_2, u_4 ∈ V_A, or u_0, u_1, u_3 ∈ V_A, or u_0, u_1, u_2 ∈ V_A.
u_0, u_2, u_4 ∈ V_A: since every edge of C is incident to a vertex of Alice, E(C) ⊆ E_A, and Alice immediately sees C. u_0, u_1, u_3 ∈ V_A: we split into two cases. If at least one of u_2, u_4, u_5 is in V_A, then E(C) ⊆ E_A, and Alice immediately sees C. Otherwise, the only edge of C that Alice does not have in her input is {u_4, u_5}. But since u_4, u_5 ∈ S_B, Bob sends {u_4, u_5} in B_1. u_0, u_1, u_2 ∈ V_A: we divide into sub-cases.
- If u_3, u_4, u_5 ∈ V_A, then E(C) ⊆ E_A, and Alice immediately sees C.
- If u_3, u_4 ∈ V_A but u_5 ∈ V_B, then Alice again sees C, because E_A includes all edges incident to u_0, …, u_4, and this includes all edges of C.
- If u_3 ∈ V_A but u_4, u_5 ∈ V_B, then we have u_4, u_5 ∈ S_B (both nodes have edges to nodes in V_A) and {u_4, u_5} ∈ E_B, and therefore {u_4, u_5} ∈ B_1. After Bob sends B_1 to Alice, she will find C.
- If u_3, u_4, u_5 ∈ V_B, then u_3, u_5 ∈ S_B (as these nodes have edges to V_A). If u_4 ∈ S_B, then {u_3, u_4}, {u_4, u_5} ∈ B_1, and therefore Alice will find C. Otherwise u_4 ∉ S_B, and the pair {u_3, u_5} is connected by a path of length two through u_4 ∉ S_B, so {u_3, u_5} ∈ B_2, and again Alice finds C.

More interestingly, let us show how Theorem 14 is used to prove that the players send each other a total of O((|S_A| + |S_B|) · √n log n) bits.

Lemma 20 If E_A and E_B are C_6-free, then the players send each other sets of total size at most O((|S_A| + |S_B|) · √n).
Proof Consider first A_1 (and similarly, B_1). Recall that A_1 is a set of edges on the vertex set S_A, and since E_A is C_6-free, so is A_1. Since the Turán number of 6-cycles is ex(m, C_6) = O(m^{4/3}), we get |A_1| = O(|S_A|^{4/3}) = O(|S_A| · √n). Next, let us bound the size of the second set Alice sends. If |A_3| ≤ |S_A| · √n, then Alice sends A_3; so, assume that |A_3| > |S_A| · √n. In this case Alice sends A_2, the set of pairs in S_A connected by a path of length 2 through some node not in S_A. Note that |A_2| ≤ |S_A|^2 (as A_2 is a set of pairs from S_A). Consider the bipartite graph between S_A and V_A \ S_A; the set A_3 is the edge set of this graph. Using the bound on Zarankiewicz numbers in Theorem 14, because it is a C_6-free graph, it has at most O((|S_A| · |V_A \ S_A|)^{2/3}) = O((|S_A| · n)^{2/3}) edges. However, since we assumed that |A_3| > |S_A| · √n, we have |S_A| · √n < c · (|S_A| · n)^{2/3}, where c is the constant from Theorem 14. But this implies that |S_A| < c^3 · √n, and hence |A_2| ≤ |S_A|^2 = O(|S_A| · √n), completing the proof.

Subquadratic algorithm for subgraph-freeness problems
In [17], it was shown that for any constant k ≥ 1, there exists a subgraph H_k of size |V(H_k)| = O(k) such that any randomized algorithm solving H_k-freeness in CONGEST has round complexity Ω̃(n^{2−1/k}). As k grows, the bound becomes arbitrarily close to quadratic. This leads to the natural question of whether there exists a single subgraph H of constant size for which the round complexity of H-freeness is truly quadratic, Ω̃(n^2).
In this section, we answer this question in the negative, by showing an almost matching upper bound:

Theorem 19 For any constant integer k and any subgraph H of size k, there exists a randomized algorithm for H-freeness in CONGEST with round complexity Õ(n^{2−2/(3k+1)}), for both the induced and non-induced variants.

Preliminaries
Before presenting the algorithm, we describe subroutines from prior work, which we use in our algorithm.
The algorithm requires an efficient distributed scheduling routine, which allows us to run several distributed protocols in parallel on the same network, as we define next. The problem of efficient scheduling has been extensively studied in the past; for more background on this topic, see the introduction of [21] and the references therein.
The Distributed Algorithm Scheduling (DAS) Problem: Given protocols A_1, …, A_m, the network must produce an execution such that, for each algorithm, each vertex outputs the same value as if that algorithm were run alone.
For a protocol A, let dilation(A) be its maximal round complexity, and let congestion(A, e) be the maximal number of messages of A passing over edge e. For a DAS instance, its dilation is the maximal dilation over all protocols, and its congestion is max_{e∈E} Σ_i congestion(A_i, e). We use the result of Ghaffari [21].
Theorem 20 (Theorem 1.3 [21]) For any instance of DAS, there is a randomized distributed algorithm using only private randomness that, with high probability, produces a schedule that runs all the algorithms in O(congestion + dilation · log n) rounds, after O(dilation · log 2 (n)) rounds of pre-computation.
In addition, in our algorithm we use [12]'s variant of the decomposition of [7], which was previously stated above in Theorem 13.

High-level overview of the algorithm
The main idea of our algorithm is to construct several centralized components: connected subgraphs of the network graph with high internal connectivity, which allows some node of each component to learn all of the component's internal edges. Some edges of the graph cross between the centralized components, but not too many, so we can afford to have all nodes learn all cross-component edges. We can then think of each centralized component as a "single player", with the ability to perform any local computation on the graph of the centralized component, and use these "players" to solve the subgraph-freeness problem by searching for copies of the subgraph that are partitioned between the players.
In the spirit of the notation used for expander decompositions, the inter-component edges of the centralized component decomposition are denoted by E_r, and the intra-component edges are denoted by E_m.
Let A be the set of vertices that are incident to some inter-player (that is, inter-cluster) edge. The number of inter-cluster edges is very small, and hence A is also small. In order to find copies of the subgraph H, we enumerate over all possible partial mappings r of the vertices of H onto vertices in A (except the "empty" mapping, which maps no vertices), and check whether each such mapping can be completed into a valid mapping of H onto G. There are only n^{2−Θ(1/k)} different partial mappings of up to k vertices from H onto A, and each mapping can be checked in constant time.
We say that a partial mapping r is partially consistent if the following condition holds: for every edge {u 1 , u 2 } ∈ H , if the vertices u 1 , u 2 of H are mapped onto vertices v 1 , v 2 of G that lie in different centralized components, then the edge {v 1 , v 2 } is in E r . (To check for induced subgraphs, we also need to verify the converse direction.) Since all interplayer edges are known to the entire graph, each player can locally verify that a partial mapping is partially consistent, without any communication. Mappings that are not partially consistent are discarded.
Next, the players try to complete the partial mapping r into a full mapping, using only internal vertices and edges of the centralized components to fill in the "missing pieces". (The mapping r already "decided" which edges of H are mapped onto inter-player edges, and we verified that those edges are present, so it remains to fill in the rest of H using only internal edges.) After removing the edges of H that r mapped onto inter-player edges, we are left with vertex-disjoint, connected subgraphs H_1, …, H_k of H. Each such "missing piece" H_i can be completed by only one centralized component: there is at least one vertex u ∈ H_i for which r(u) is defined, and it belongs to some centralized component C (otherwise H_i would not be a separate connected component when we remove the edges of H that are mapped onto E_r). Component C is the only component that can "fill in" the vertices of H_i for which r is not defined: any attempt to use a vertex from a different centralized component would require a new inter-player edge, but we are not allowed to use any such edges at this point. Thus, we can assign ownership of each piece H_i to some player (centralized component), and have that player check whether or not it can extend r so as to map H_i onto its vertices and edges. Each player announces to the other players whether or not it was able to complete all the pieces for which it is responsible; if all pieces were completed, then we have found a copy of H, and we reject. Otherwise we move on to the next partial mapping.
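The two local checks just described, partial consistency of r across components and the splitting of H into pieces with unique owners, can be sketched centrally as follows (our helper names; `comp` maps each G-vertex to its component id, and r is a partial map from V(H) to V(G)):

```python
from collections import deque

def partially_consistent(r, H_edges, comp, E_r):
    """Every H-edge mapped across two components must land on a known cross edge."""
    for u1, u2 in H_edges:
        if u1 in r and u2 in r and comp[r[u1]] != comp[r[u2]]:
            if frozenset((r[u1], r[u2])) not in E_r:
                return False
    return True

def pieces_and_owners(r, H_vertices, H_edges, comp):
    """Remove H-edges mapped across components; return each remaining connected
    piece of H with the unique component that must complete it (None if unmapped)."""
    adj = {u: set() for u in H_vertices}
    for u1, u2 in H_edges:
        crossing = u1 in r and u2 in r and comp[r[u1]] != comp[r[u2]]
        if not crossing:
            adj[u1].add(u2)
            adj[u2].add(u1)
    seen, pieces = set(), []
    for s in H_vertices:
        if s in seen:
            continue
        piece, q = {s}, deque([s])
        seen.add(s)
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    piece.add(w)
                    q.append(w)
        owners = {comp[r[u]] for u in piece if u in r}
        pieces.append((piece, owners.pop() if owners else None))
    return pieces
```

As the text argues, every piece containing a mapped vertex has exactly one possible owner, which is why the ownership assignment is well defined.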

Decomposition into centralized components
Formally, the properties we require of the decomposition are as follows:

Definition 9 (Centralized component decomposition) Given two parameters δ ∈ (0, 1) and an integer S > 0, a centralized component decomposition is a decomposition of E(G) into two parts, E_m and E_r, such that
- the graph induced by E_m is composed of connected components {R_i}, such that in each R_i there is a special root vertex that knows all edges of the induced subgraph G[R_i]. Each such connected component is called a centralized component;
- |E_r| ≤ n^{2−2δ} · S, and all vertices in G know all edges in E_r.
In order to construct the centralized component decomposition, we rely on the following two lemmas, which, very roughly speaking, show that the relation of two clusters having many disjoint paths between them behaves like an equivalence relation (i.e., transitive, symmetric and reflexive). Furthermore, if we have a set of clusters with high conductance and "sufficiently large" minimum cuts between them, then the nodes of these clusters can efficiently find many edge-disjoint paths between the clusters.
The first lemma is a well-known fact about high-conductance graphs; for the sake of completeness, we include a proof.

Lemma 21
Let C be a graph with conductance Φ and minimum degree n^δ. Then for any two vertices x, y ∈ C, there exists a set of Φ · n^δ edge-disjoint paths between x and y in C.
Proof Let P be the edges of a minimum cut between x and y, and denote by V_x, V_y the vertices on x's and y's sides of the cut, respectively. As the minimum degree of both x and y is at least n^δ, min(Vol(V_x), Vol(V_y)) ≥ n^δ. By the definition of conductance (Definition 3), Φ ≤ |P|/n^δ, or equivalently, |P| ≥ Φ · n^δ. By Menger's theorem, this yields Φ · n^δ edge-disjoint paths between x and y.

Lemma 22
For ε ∈ (0, 1), δ > 6ε, and S ∈ [n], let U be a graph, let C be a set of n^δ-clusters in U with conductance at least 1/n^{6ε}, and let p = |C|. Further assume that there is an ordering C_1, …, C_p of the clusters of C with the property that for each j = 2, …, p, there exists an index i ∈ {1, …, j − 1} such that the minimum cut separating C_j from C_i has size at least S. Then for each cluster vertex x ∈ ∪_{i=1}^p C_i and for each y ∈ C_1, (1) there exist t = min(S, n^{δ−6ε}/2) edge-disjoint paths P^ℓ_{x,y}, ℓ ∈ [t], connecting x and y in U.
(2) Suppose that each vertex of U knows all edges which are not internal to a cluster in C, and that every cluster vertex v ∈ C_j knows all edges of C_j (for all j = 1, …, p). Then each vertex u in U can locally decide which of its edges (if any) participate in each path P^ℓ_{x,y}, for any ℓ ∈ [t].
Proof To prove (1), we prove equivalently that there does not exist a cut of size smaller than t separating a vertex in one of the clusters of C from a vertex in C_1. Assume by contradiction that this is not true, and let C_j be the first cluster in the ordering such that there exists a set R of edges with |R| < t whose removal separates a vertex v_j ∈ C_j from a vertex v ∈ C_1. If j = 1, then by Lemma 21 there exists a set of n^δ · (1/n^{6ε}) = n^{δ−6ε} edge-disjoint paths between v_j and v, more than |R| < t ≤ n^{δ−6ε}/2 of them, a contradiction. For j > 1, let i < j be an index such that C_i and C_j are connected by S ≥ t edge-disjoint paths. Therefore, they must be connected by at least one path, P_{j,i}, that is disjoint from R. Let u_j ∈ C_j be the origin of P_{j,i} in C_j, and let u_i ∈ C_i be its endpoint. Also note that as C_j has high conductance and high min-degree, it does not have an induced cut smaller than n^{δ−6ε}; therefore, there is a path from v_j to u_j inside C_j that is disjoint from R, and hence a path P_{v_j,u_i} from v_j to u_i that is disjoint from R. As we assumed that C_j is the first cluster containing a vertex that R separates from C_1, removing R leaves u_i connected to v ∈ C_1, and thus v_j is still connected to v, which is a contradiction, thus proving (1).
To prove (2) we use a theorem by Nash-Williams [33], stating the following: for any undirected graph C with minimum cut c, there exists a set of c/2 edge-disjoint spanning trees of C. Hence, each cluster has t edge-disjoint spanning trees. Let U_c be a graph in which every cluster C_i is represented by a node v*_i, and every edge with an endpoint v ∈ C_i is replaced by an edge with the endpoint v*_i. Since all vertices in U know all edges that are not internal to a cluster, all vertices know the graph U_c. For every x, y, all vertices in U compute the paths P^ℓ_{x,y}, ℓ ∈ [t], in U_c, identically and locally. If the path P^ℓ_{x,y} enters the cluster C_j via an edge that in U leads to vertex u ∈ C_j, and leaves it via an edge leaving v ∈ C_j, then the vertices of C_j complete the path in U using the path from u to v on the ℓ-th spanning tree of C_j.
Proof Assume the network is given parameters δ ∈ (0, 1) and S > 0 (which informally represent the min-degree inside clusters and the inter-connectivity between clusters inside the same equivalence class, respectively), and let ε = δ/14. First, using Theorem 13 the network computes the decomposition of [12], setting E_m, E_s, E_r ← T_{δ,ε}(G) in Õ(n^{1−δ+20ε}) rounds. Following this, each vertex broadcasts to the entire network its ID, its cluster-ID, and all of the edges incident to it that are in E_s ∪ E_r. As there are at most O(n^{1+δ} + m^{1−ε}) = Õ(n^{1+δ} + n^{2−2ε}) edges in E_s ∪ E_r, and at most n cluster-vertices, broadcasting this information can be done in Õ(n^{1+δ} + n^{2−2ε}) rounds. In addition, all vertices learn all edges in their cluster. This can be done using the routing scheme described in Theorem 12 in Õ(n^{2−δ+12ε}) rounds.
Let G(E_s ∪ E_r) be the graph induced by the edges of E_s and E_r. Define a min-cut between two clusters C_i, C_j as a minimum-sized set of edges in G(E_s ∪ E_r) whose removal disconnects C_i and C_j in G. Two clusters are said to have a large cut if their min-cut is larger than S, and otherwise they are said to have a small cut. Let M(C_i, C_j) be a minimum-sized cut between C_i and C_j if it is small, and otherwise M(C_i, C_j) = ∅. Let E′_r = ∪_{i≠j} M(C_i, C_j), and let G′ = (V, E \ E′_r) be the graph obtained from G by removing the set of small-cut edges. Note that |E′_r| ≤ (n^{1−δ})^2 · S = O(n^{2−2δ}) · S, as there are at most n^{1−δ} clusters, and for each pair we add to E′_r at most S edges.
Let the cluster graph G c be the graph where each cluster is a node, and each two nodes have an edge iff their clusters have a large min-cut. Note that since the edges of E s and E r are known to all vertices in G, all vertices know G c .
The vertices of G compute an ordering on the clusters by locally and identically computing a BFS of the graph G_c, and order the clusters by the order in which their respective nodes were discovered in the BFS. This order defines a partial ordering P_i on the clusters of each connected component CC_i. For every cluster C_j ∈ CC_i except the first cluster according to P_i, the cluster that discovered C_j in the BFS precedes it in P_i, and the minimal separating cut between them in G is larger than S.
By applying Lemma 22(1) with U = G and with C being the clusters of a connected component CC_i of G′, we get that for each connected component CC_i of G′, all vertices in the first cluster of P_i are connected by t edge-disjoint paths to every other cluster vertex in the component.
For each connected component CC_i of G′, the vertices of the component choose the cluster vertex with the highest ID to be its "root" vertex, v*_i, and each vertex computes t = min(n^{δ−6ε}/2, S) edge-disjoint paths from every cluster vertex in CC_i to v*_i using the algorithm described in Lemma 22(2).
Next, each cluster-vertex sends all of its edges that are in E_m to v*_i, by dividing the edges (almost) equally between the t edge-disjoint paths using Theorem 20. We define the set of algorithms the network has to schedule to be {A_e}_{e∈CC_i}, where A_e is the task of sending e to v*_i. The congestion on any given edge resulting from a single cluster-vertex v is at most O(d(v)/t), and the dilation is at most O(n). The total congestion on an edge is therefore at most Σ_{v∈V} d(v)/t = O(n^2/t). By Theorem 20, sending all edges can be done in Õ(n^2/t + n) rounds.

Detailed description of the H-freeness algorithm
Let 0 < δ ≤ 1 and S ∈ [n] be parameters of the algorithm that will be set subsequently. First, the network computes the decomposition of Lemma 23 with δ and S. Let {CC i } denote the centralized components of G m = (V , E m ). Detecting an instance of H that is completely contained in one of the centralized components CC i can now be done in zero rounds, by having the root vertex check for it. Hence, we only need to consider instances of H that contain vertices from different centralized components.
Let A be the set of vertices incident to an edge in E_r. The network sequentially considers each set A′ ⊆ A of vertices of size at most k, and each mapping r_{A′} : A′ → V(H). Each root vertex checks whether r_{A′} can be extended, using only vertices of its own centralized component, to cover the pieces of H assigned to it; if there is such a partial extension, it sends it to the smallest-ID root vertex. Observe that the union of all partial extensions with r_{A′} is a mapping of an induced instance of H into G. This is the case since every edge of H is mapped either onto an edge between two vertices in A′, or onto an edge between two vertices of a single centralized component CC_i of G_m. In the first case, the mapping r_{A′} ensures that this edge of H is mapped onto an edge of G, and in the latter case this is ensured by the partial extension computed by CC_i. As for "non-edges", assume towards a contradiction that there exists an edge {u, v} of G onto which a non-edge of H is mapped. If both u and v belong to the same CC_i, then this contradicts the extension computed by the root vertex of CC_i. Otherwise, u and v belong to two different centralized components, implying that u, v ∈ A and therefore u, v ∈ A′, and this contradicts the partial mapping r_{A′}. Hence, the smallest-ID root vertex can detect whether the union of all the partial mappings is a mapping of an induced instance.

Analysis and round complexity
Correctness: We show that the algorithm accepts if and only if there is an instance, or an induced instance (depending on what we wish to check), of H in G. We give the proof for induced instances, as the proof for non-induced is very similar.
If there exists an induced instance of H in G that is contained in a connected component of G_m, then it is found by the root vertex of that connected component. Otherwise, let A′ be the set of vertices of A that are part of the instance, and let r_{A′} be their mapping to H. When the algorithm considers this choice of A′ and r_{A′}, it accepts, as each connected piece H_i of H is discovered by the centralized component responsible for it.
If the algorithm accepts, then there is a choice of A′ and mapping r_{A′}, and some extensions of mappings in the connected components of G_m that together make up H. It remains to show that these mappings do not contradict one another. As different extensions map vertices from different connected components, no vertex can appear in two extensions, and vertices of different extensions cannot be neighbors. The parts of H that are mapped in different extensions of r_{A′} also do not share vertices, and their edges are disjoint. Therefore, the mappings do not contradict one another.

Round complexity: Each choice of a set A′ with |A′| ≤ k, and each mapping from A′ to V(H), requires at most |A′| root vertices to send at most O(1) messages to the smallest-ID root vertex. Therefore, the process of verifying each choice of A′, r_{A′} has dilation at most O(n) and congestion at most O(1), so all choices can be run in (n^{2−2δ}S choose ≤ k) = Σ_{i=0}^{k} (n^{2−2δ}S choose i) = O(n^{k(2−2δ)} · S^k) rounds using the scheduling algorithm of Theorem 20. Therefore the total complexity is Õ(n^{k(2−2δ)}S^k + n^2/S + n^{2−δ/7+o(1)} + n^{1+δ}). Taking δ = 1 − 2/(3k+1) and S = n^{2/(3k+1)}, we get that the round complexity is Õ(n^{2−2/(3k+1)} + n^{2−1/7+2/(21k+7)+o(1)}). In particular, if k ≥ 5 this gives Õ(n^{2−2/(3k+1)} + n^{1.88}) = Õ(n^{2−2/(3k+1)}) rounds. We note that for k ≤ 4 there is a trivial O(n) algorithm for H-freeness, in which every node sends its entire neighborhood to each of its neighbors; a simple case analysis shows that if the network is not H-free, some node can detect a copy of H.
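The exponent bookkeeping in the round-complexity calculation can be verified with exact rational arithmetic; the following sketch (ours) recomputes the exponents of the dominant terms under the stated parameter choice:

```python
from fractions import Fraction as F

def exponents(k):
    """Exponents (of n) of the complexity terms, as exact rationals."""
    delta = 1 - F(2, 3 * k + 1)            # delta = 1 - 2/(3k+1)
    s = F(2, 3 * k + 1)                     # S = n^{2/(3k+1)}
    listing = k * (2 - 2 * delta) + k * s   # n^{k(2-2*delta)} * S^k
    routing = 2 - s                         # n^2 / S
    decomp = 2 - delta / 7                  # n^{2 - delta/7}
    target = 2 - F(2, 3 * k + 1)            # claimed bound n^{2 - 2/(3k+1)}
    return listing, routing, decomp, target
```

For every k ≥ 5 the listing and routing terms equal the target exponent exactly, and the decomposition term 2 − δ/7 = 2 − 1/7 + 2/(21k+7) is at most the target, which is exactly the k ≥ 5 threshold used above.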

Hardness of lower bounds for C_{2r}-freeness and triangle-freeness
We show that any sufficiently strong lower bound on C_{2r}-freeness or on triangle-freeness in CONGEST would imply new circuit lower bounds: specifically, an explicit polynomial depth lower bound for circuits with constant fan-in and fan-out, input size N, and N^{1+α} wires (where α > 0 is a constant). This means that such bounds are unlikely to be proven without a major breakthrough in circuit complexity. In comparison, the best known size lower bound for an explicit problem for O(log N)-depth circuits with arbitrary binary Boolean gates is only linear in N (see e.g. [25]), and it is considered a major open problem to obtain an explicit super-linear wire lower bound for these circuits. The barrier is proven by showing a reduction from solving the problem in general graphs to solving it in high-conductance graphs, and then showing that high-conductance clusters can simulate the class C of circuits described above. This implies that in order to rule out an efficient CONGEST algorithm, one must also rule out the existence of a low-depth circuit in the class C.
Throughout this section, we assume that we are working with graphs that have diameter O(log n); as we showed in Sect. 6, this assumption is without loss of generality. Having small diameter allows us to quickly perform some global operations, such as computing the number of edges in the graph.

Hardness of lower bounds for C_{2r}-freeness
In this subsection we show that an Ω̃(n^{1−1/361²}) lower bound on C_{2r}-freeness in the CONGEST model would imply a circuit lower bound. The constant 1/361² in the exponent results mostly from the parameters of the expander decomposition, which we use in our reduction.
As we said above, the reduction has two steps. We begin by showing that solving C_{2r}-freeness on any graph G can be reduced, with high constant probability, to solving C_{2r}-freeness on a graph G′, such that G′ has small mixing time.
One ingredient in the reduction is the following lemma, which shows that for every edge e ∈ G, one can decide, with failure probability at most p, whether e participates in an r-cycle, in O(r^r log(1/p)) rounds. The proof is a simplified version of the proof of Theorem 3 in [16].
Lemma 24 (Simplified version of Thm. 3 in [16]) Given a graph G = (V, E), an edge e ∈ E, a value r ∈ N and a value p < 1, there exists a randomized O(r^r log(1/p))-round algorithm that detects, with probability at least 1 − p, whether e participates in an r-cycle. At the end of the algorithm, the endpoints of e know whether or not they participate in an r-cycle.
Proof We follow the proof of Theorem 3 in [16].
We first assign each node w ∈ V a uniformly random value color(w) ∈ [r], and remove every directed edge (u, v) that is not colored (i, (i+1) mod r) for some i ∈ [r]. Note that any directed cycle that remains must have length at least r.
Next, we start an (r − 1)-round BFS from the smaller-ID endpoint of e. The BFS may cross only edges that were not discarded. After r − 1 rounds, if the larger-ID endpoint of e has received the BFS token, then we have found an r-cycle that includes e.
Each invocation of the procedure above has one-sided error probability of 1 − 1/r^{r−1}: if edge e does not participate in an r-cycle, then the larger-ID endpoint never receives the BFS token, because we have eliminated all cycles of length smaller than r; and if e does participate in an r-cycle, then with probability 1/r^{r−1} the cycle is colored in ascending cyclic order (mod r), starting from the smaller-ID endpoint of e and ending at the larger-ID endpoint. When this event occurs, the cycle is detected. To boost the success probability to 1 − p, we repeat O(r^{r−1} · ln(1/p)) times, using fresh independently-chosen colors each time.
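A centralized sketch of this color-coding test (ours, not the CONGEST implementation; `adj` is an adjacency-set dict and e must be an actual edge of the graph). The test is one-sided: it returns True only if e really lies on an r-cycle, since an ascending-color walk of length r − 1 visits r distinct colors and is therefore a simple path.

```python
import math
import random

def detect_r_cycle_through(adj, e, r, p, seed=0):
    """Return True only if edge e lies on an r-cycle; if it does,
    True is returned with probability at least 1 - p."""
    rng = random.Random(seed)
    a, b = sorted(e)
    reps = int(r ** (r - 1) * math.log(1 / p)) + 1
    for _ in range(reps):
        color = {v: rng.randrange(r) for v in adj}
        # keep only directed edges (u, w) colored (i, i+1 mod r);
        # walk r-1 hops from the smaller endpoint a
        frontier = {a}
        for _ in range(r - 1):
            frontier = {w for u in frontier for w in adj[u]
                        if color[w] == (color[u] + 1) % r}
        if b in frontier:
            # the ascending walk a -> ... -> b is simple, and edge e = {a, b}
            # closes it into an r-cycle through e
            return True
    return False
```

Each repetition succeeds with probability at least 1/r^{r−1}, so O(r^{r−1} ln(1/p)) repetitions drive the failure probability below p, as in the lemma.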
We are now ready to prove the first part of the reduction, where we reduce from general graphs to graphs with high conductance (i.e., low mixing time).
Proof By Theorem 1 of Bondy and Simonovits [2], any graph G = (V, E) with |E| ≥ 100r · n^{1+1/r} has a cycle C_{2j} for every j ∈ [r, rn^{1/r}]. Therefore, we can assume that G has |E| < 100r · n^{1+1/r}, as otherwise we can simply reject. (Recall that we assume logarithmic diameter, so we can afford to count the edges of the graph.) We use the (ε, φ)-expander decomposition of Chang and Saranurak [8], defined next. An (ε, φ)-expander decomposition of a graph G = (V, E) is a partition of the vertex set into clusters, V = V_1 ∪ … ∪ V_ℓ, such that each cluster induces a subgraph of conductance at least φ, and at most an ε-fraction of the edges cross between clusters. We set k = 2 and ε = n^{−1/r − 1/α}, and compute an (ε, φ)-decomposition of G; let E_ic denote the set of inter-cluster edges. For every inter-cluster edge e ∈ E_ic, we invoke the algorithm described in Lemma 24 with error parameter p = 1/(10|E|) in order to decide if e participates in a 2r-cycle. This is done sequentially, and since there are at most |E_ic| = O(n^{1−1/α}) inter-cluster edges, this process terminates in O((2r)^{2r} · |E_ic| · log(1/p)) = Õ(n^{1−1/α}) rounds. By taking a union bound over the edges in E_ic, it holds that if any edge e ∈ E_ic participates in a 2r-cycle in G, then it is detected with high probability.
If no edge in E_ic detects a cycle, then it remains to decide whether there exists a 2r-cycle within any of the clusters V_i. This is true since all the inter-cluster edges were removed, so that any remaining 2r-cycle must be completely contained in one of the clusters. Therefore, solving C_{2r}-freeness on G is reduced to solving C_{2r}-freeness on the clusters in parallel, each having conductance Φ = Ω((ε/polylog n)^{20·3^k}) = Ω(n^{−180/r−180/α}). The number of rounds required for the reduction is Õ(n^{1−1/α} + n^{1/2+180/r+180/α}), which for r > 360 and α = 362r/(r−360) is bounded by Õ(n^{1−1/α}).

Next, we show that "simple" circuits can be simulated in the CONGEST model in a number of rounds that depends on the mixing time of G, as well as the number of wires and the depth of the circuit. We first formally define the notion of a circuit.
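The choice α = 362r/(r − 360) is exactly the value that balances the two exponents 1 − 1/α and 1/2 + 180/r + 180/α, as can be verified with exact arithmetic (an illustrative sketch):

```python
from fractions import Fraction

def reduction_exponents(r):
    """The two exponents of n in the reduction's round complexity,
    for the choice alpha = 362r/(r - 360), r > 360."""
    alpha = Fraction(362 * r, r - 360)
    checking = 1 - 1 / alpha                                  # n^(1 - 1/alpha)
    decomposing = Fraction(1, 2) + Fraction(180, r) + 180 / alpha
    return checking, decomposing

# The two exponents coincide exactly, so the total is O~(n^(1 - 1/alpha)).
for r in range(361, 1000):
    checking, decomposing = reduction_exponents(r)
    assert checking == decomposing
```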

Definition 10
A circuit C is a directed acyclic graph, where the nodes of the graph represent gates from some class of Boolean functions, and the edges represent wires. The depth of C is the length of the longest path from any input node to any output node. The fan-in of the circuit is the maximal in-degree of any node in the graph, and the fan-out is the maximal out-degree.
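For concreteness, the depth of a circuit represented as a DAG can be computed by a longest-path recursion (a minimal sketch; the adjacency representation is ours):

```python
def circuit_depth(gates):
    """Depth of a circuit given as a DAG: gates maps each gate to the
    list of gates feeding into it (input nodes have no predecessors).
    Depth = number of wires on the longest input-to-output path."""
    memo = {}
    def depth(g):
        if g not in memo:
            preds = gates[g]
            memo[g] = 0 if not preds else 1 + max(depth(p) for p in preds)
        return memo[g]
    return max(depth(g) for g in gates)

# x0, x1 -> AND -> NOT: the longest path uses 2 wires.
assert circuit_depth({'x0': [], 'x1': [],
                      'and': ['x0', 'x1'],
                      'not': ['and']}) == 2
```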
In [14], Drucker, Kuhn and Oshman prove that if a function f can be computed by a circuit C of depth R with b-separable gates and N = n^2 · s wires, then there exists an O(R)-round protocol P for f in the CLIQUE-UCAST model on graphs with n vertices and bandwidth O(b + s), where b-separability is a measure of a gate's simplicity.
Here we prove a weaker variant of the above that is suitable for the CONGEST model.

Lemma 26
Let U = (V, E) be a graph with |V| = n vertices and |E| = n^{1+δ} edges, with mixing time τ_mix. Suppose that f : {0, 1}^{c·n^{1+δ}·log n} → {0, 1} for some c > 1 is computed by a circuit C of depth R, consisting of gates with constant fan-in and fan-out, and at most O(c · s · n^{1+δ} log n) wires for s ≤ n. Then for any input partition that assigns to each vertex v in U no more than c · deg(v) · log n input wires, there is an O(R · c · s · τ_mix · 2^{O(√(log n log log n))})-round protocol in the CONGEST model on U that computes f under the input partition.
Proof The proof follows the steps of the proof of Theorem 2 in [14].

Assigning gates to vertices. Since each gate has at least one wire, and the total number of wires is at most O(c · s · n^{1+δ} log n) = O(c · s · |E| log n), there can be at most O(c · s · |E| log n) gates. Therefore, we can partition the gates between the vertices, such that each vertex v is assigned O(c · s · deg(v) log n) gates, where deg(v) is the degree of v.

Evaluating the circuit. As in [14], we partition the gates of C into R + 1 layers L_0, ..., L_R according to their depth in the circuit, with L_0 being the input wires and L_R the output gate. We evaluate the circuit in R stages, each corresponding to one layer of the circuit. We shall prove that we can evaluate all gates in a single layer in O(c · s · τ_mix · 2^{O(√(log n log log n))}) rounds, thus concluding the proof.
We proceed by induction on the layers. The input layer L_0 requires no evaluation. Therefore, assume that we have already evaluated the circuit up to layer L_{i−1}, and we wish to evaluate L_i. In order to compute all gates in L_i, each vertex needs to learn the values of the input wires of its assigned gates. Since each vertex v is assigned at most O(c · s · deg(v) log n) gates and each gate has constant fan-in, v needs to learn Õ(c · s · deg(v)) bits. As U has mixing time τ_mix, by Theorem 12 this takes O(c · s · τ_mix · 2^{O(√(log n log log n))}) rounds.
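The layer-by-layer schedule can be sketched as follows, with the routing abstracted away (in the CONGEST simulation, gathering each layer's wire values is what costs the O(c · s · τ_mix · 2^{O(√(log n log log n))}) rounds; this centralized toy only mirrors the R-stage evaluation order):

```python
def evaluate_layered_circuit(layers, inputs):
    """Evaluate a circuit given as layers L_1..L_R, where each gate is
    (function, list of wire indices into all previously computed values).
    All gates of one layer are evaluated before the next layer starts."""
    values = list(inputs)            # L_0: the input wires
    for layer in layers:
        new_values = [fn(*(values[i] for i in srcs)) for fn, srcs in layer]
        values.extend(new_values)    # outputs become wires for later layers
    return values[-1]                # single output gate in L_R

# A toy circuit computing (x0 AND x1) OR (NOT x2):
AND = lambda a, b: a & b
OR = lambda a, b: a | b
NOT = lambda a: 1 - a
layers = [[(AND, [0, 1]), (NOT, [2])],   # L_1
          [(OR, [3, 4])]]                # L_2
assert evaluate_layered_circuit(layers, [1, 1, 1]) == 1
```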
Corollary 3 Let C be a constant fan-in and fan-out circuit with depth R and N = s · n^{1+δ} log n wires for s ≤ n that computes C_{2r}-freeness for any given graph over n vertices. Then there exists a CONGEST protocol that solves C_{2r}-freeness on G in Õ(n^{1−1/α} + s · R · n^{360/r+360/α} · 2^{O(√(log n log log n))}) rounds.

Proof By the reduction above, solving C_{2r}-freeness on G reduces, in Õ(n^{1−1/α}) rounds, to solving C_{2r}-freeness on clusters G[V_i] of conductance Φ = Ω(n^{−180/r−180/α}), and hence of mixing time τ_mix = Õ(n^{360/r+360/α}). By Lemma 26, simulating the circuit C on a cluster gives an Õ(s · R · n^{360/r+360/α} · 2^{O(√(log n log log n))})-round protocol for solving C_{2r}-freeness on G[V_i]. As the clusters are edge-disjoint, these protocols can be computed in parallel. Therefore, C_{2r}-freeness on G can be computed in Õ(n^{1−1/α} + s · R · n^{360/r+360/α} · 2^{O(√(log n log log n))}) rounds.
The above implies that there exists a constant c = 1/361^2 such that any lower bound of the form Ω̃(n^{1−c}) on C_{2r}-freeness implies lower bounds far beyond the reach of current circuit complexity techniques.

Corollary 4
For any constant c ≤ 1/361^2, if C_{2r}-freeness for graphs over n vertices requires Ω(n^{1−c}/2^{O(√(log n log log n))}) rounds in CONGEST, then C_{2r}-freeness cannot be computed by constant fan-in and fan-out circuits with s · n^{1+δ} log n wires for s = n^{(1−c)/2} and depth R = õ(n^{(1−c)/2} · 2^{O(√(log n log log n))}).
Proof By Corollary 3, C_{2r}-freeness on G can be computed in Õ(n^{1−1/α} + s · R · n^{360/r+360/α} · 2^{O(√(log n log log n))}) rounds. Since α is decreasing in r and r > 360, we have that α ≤ 362^2. Therefore, the above bound is at most Õ(s · R · n^{360/361+1/362} · 2^{O(√(log n log log n))} + n^{1−1/361^2}). Hence, any lower bound of the form Ω̃(n^{1−c}/2^{O(√(log n log log n))}) with c ≤ 1/361^2 on the number of rounds required to compute, with high constant probability, C_{2r}-freeness on a graph G over n vertices in the CONGEST model implies a lower bound of Ω(n^{(1−c)/2} · 2^{O(√(log n log log n))}) on R for constant fan-in and fan-out circuits with s · n^{1+δ} log n wires.

Barrier of Ω(n^α) for lower bounds on triangle freeness
We now turn our attention to triangles, and show that for any constant α > 0, a round complexity lower bound of Ω(n^α) on triangle freeness implies strong circuit complexity lower bounds. This reduction is strongly based on the algorithm of [7,8] for triangle enumeration, which in essence reduces the problem to solving a related problem on high-conductance graphs. In combination with Lemma 26 above, we obtain the barrier. The first step is to reduce from general graphs to graphs with high conductance: let A_2 be an algorithm that solves triangle freeness in a communication network with conductance Φ and n nodes, where every node v is given O(deg(v)) edges as additional input (which are not necessarily edges v is incident to). Let T_2(n, Φ) be the round complexity of A_2. We show that given such an algorithm, we can solve triangle freeness in O(T_2(n, 1/polylog n) · log n) rounds in CONGEST.
First, the network computes the decomposition of Theorem 21 with ε = 1/6, taking k to be a large enough constant so that the round complexity of the decomposition is less than O(n^α). A node is called good if it has more edges in E_m than in E_r, and otherwise bad. We call an edge e ∈ E_m bad if at least one of its endpoints is bad.

Lemma 27 ([7]) The number of bad edges is at most 2εm.
After computing the decomposition, each cluster C calculates the number of vertices and the number of edges in the cluster. Since Φ = Ω(1/polylog n), the diameter of each cluster is O(polylog n), and therefore this can be done in O(polylog n) rounds (for example, by constructing a spanning tree of the cluster, and aggregating the number of edges and nodes up the tree). Then, each cluster runs A_2 in parallel, where the input of each good node is all its edges (including edges leaving the cluster), and the input of each bad node is its edges inside the cluster. We note that indeed, by the definition of a good node, every node has O(deg_C(v)) edges as input for A_2, where deg_C(v) is v's degree in its cluster. If A_2 outputs that there exists a triangle, the cluster rejects and terminates. Otherwise, each cluster C removes all good edges that are contained in C. The network then recurses on the remaining edges until O(1) edges remain. Since |E_good| ≥ m − |E_r| − |E_bad| ≥ m − 3εm = m/2, in each iteration the network removes at least half of its edges, so the number of iterations is at most O(log n).
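Since each iteration removes at least half of the remaining edges, the recursion bottoms out after at most ⌈log m⌉ = O(log n) iterations; a minimal sketch:

```python
import math

def iterations_until_constant(m, keep_fraction=0.5):
    """Number of iterations until O(1) edges remain, when each
    iteration removes at least half of the remaining edges."""
    count = 0
    while m > 1:
        m = math.floor(m * keep_fraction)
        count += 1
    return count

# O(log n) iterations: for m = 2^20 edges, exactly 20 halving rounds.
assert iterations_until_constant(2 ** 20) == 20
```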
Clearly, if a node rejects, then the graph contains a triangle with high probability (by the correctness of A_2). On the other hand, recall that the input of A_2 includes all edges incident to good nodes in C; therefore, if A_2 returns that there is no triangle, the network may safely remove all edges between two good nodes, as any triangle containing such an edge is contained in the union of the inputs of its two good endpoints. This concludes the reduction from general graphs to graphs with high conductance.
Next, we show how to simulate any circuit from a "simple" family of circuits on graphs with high conductance.
Consider the following family of functions f_N : {0, 1}^{2N log N} → {0, 1}: given an encoding of a graph with at most N edges, f_N returns 1 iff the graph contains a triangle.

Corollary 5
If triangle freeness cannot be solved in less than c_1 · n^α rounds for any c_1 > 0, then there exist constants c_2, c_3 > 0 such that there is no family of circuits that solves, for all N, the function f_N with depth c_2 · N^{α/4}/2^{c_3·√(log n log log n)} and at most c_2 · N^{1+α/4}/2^{c_3·√(log n log log n)} wires.
Proof Let F be an infinite family of graphs for which triangle freeness cannot be solved in less than c_1 · n^α rounds. Let c_4 > 0 be a constant such that the conductance of the clusters obtained by Theorem 21 is greater than 1/log^{c_4} n. Assume towards a contradiction that for sufficiently large c_2, c_3 there is an infinite family of circuits solving, for any N, the function f_N with depth c_2 · N^{α/4}/2^{c_3·√(log n log log n)} and c_2 · N^{1+α/4}/2^{c_3·√(log n log log n)} wires. Then by Lemma 26, taking s = n^{α/2}/2^{c_3·√(log n log log n)} and R = n^{α/2}/2^{c_3·√(log n log log n)} (both of which are larger than c_2 · (n^2)^{α/4}/2^{c_3·√(log n log log n)}), there exists an algorithm with round complexity c_1 · n^α / log n that solves triangle freeness on graphs with conductance at least Φ ≥ 1/log^{c_4} n, where the size of the input of each node v is bounded by Õ(deg(v)). By the reduction, we get that there is a c_1 > 0 such that triangle freeness can be solved in any network within c_1 · n^α rounds, which is a contradiction.