Abstract 1 Introduction 2 Techniques 3 Preliminaries 4 Slackness profile and densification 5 Application to clique 6 Application to balanced biclique 7 A lower bound on densification 8 Conclusions and Future Work References Appendix A Concentration inequalities

On Finding Randomly Planted Cliques in Arbitrary Graphs

Francesco Agrimonti ORCID Parma, Italy Marco Bressan ORCID Department of Computer Science, University of Milan, Italy Tommaso d’Orsi ORCID Department of Computing Sciences, Bocconi University, Milan, Italy
Abstract

We study a planted clique model introduced by Feige [18] where a complete graph of size cn is planted uniformly at random in an arbitrary n-vertex graph. We give a simple deterministic algorithm that, in almost linear time, recovers a clique of size (c/3)O(1/c)n as long as the original graph has maximum degree at most (1p)n for some fixed p>0. The proof hinges on showing that the degrees of the final graph are correlated with the planted clique, in a way similar to (but more intricate than) the classical G(n,1/2)+Kn planted clique model. Our algorithm suggests a separation from the worst-case model, where, assuming the Unique Games Conjecture, no polynomial algorithm can find cliques of size Ω(n) for every fixed c>0, even if the input graph has maximum degree (1p)n. Our techniques extend beyond the planted clique model. For example, when the planted graph is a balanced biclique, we recover a balanced biclique of size larger than the best guarantees known for the worst case.

Keywords and phrases:
Computational Complexity, Planted Clique, Semi-random, Unique Games Conjecture, Approximation Algorithms
Category:
APPROX
Copyright and License:
[Uncaptioned image] © Francesco Agrimonti, Marco Bressan, and Tommaso d’Orsi; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Computational complexity and cryptography
; Mathematics of computing Discrete mathematics ; Mathematics of computing Approximation algorithms ; Mathematics of computing Graph algorithms
Related Version:
Previous Version: https://arxiv.org/abs/2505.06725
Editors:
Alina Ene and Eshan Chattopadhyay

1 Introduction

Finding large cliques in a graph is a notoriously hard problem. Its decision version was among the first problems shown to be 𝖭𝖯-complete [31]. In fact, it turns out that for any ε>0 it is 𝖭𝖯-hard to find a clique of size nε even in graphs containing cliques of size n1ε [26, 39, 32].

A large body of work [14, 25, 2, 30, 17, 9, 29] focused on designing polynomial time algorithms to find large cliques given an n-vertex graph containing a clique of size cn. When c<1/logn, the best algorithm known only returns a clique of size O~(log(n)3) [17]. For larger values of c it is possible to find a clique of size O(cn)O(c), which is of order nΩ(1) when the largest clique in the graph contains a constant fraction of the vertices [9, 2]. The current algorithmic landscape further suggests a phase-transition phenomenon around c=12. For sufficiently small ε>0 and c=12ε there exists an algorithm finding a clique of size n1O(ε) [30]. Instead, when c=12+ε, one can efficiently find a complete graph of size 2εn via a reduction to the classical 2-approximation algorithm for vertex cover. Finally, finding a clique of size Ω(εn) for c=12ε was shown to be UGC-hard in [33, 7].

Table 1: Performance of state-of-the-art efficient algorithms for clique when a clique of size cn exists in the graph (note that c can depend on n).
Regime Output clique References
c12+ε 2ϵn [14]
cΩ(1) (cn)Ω(c) [2]
c1/logn Ω(ncc) [9, 25]
any c>0 Ω(log3(cn)log2log(cn)) [17]

Given the grim worst-case picture, a substantial body of work has focused on designing algorithms that perform well under structural or distributional assumptions on the input graph. One research direction has investigated clique and related problems on graphs satisfying expansion or colorability properties [4, 15, 34, 6]. Another line of work has explored planted average case models [31, 27, 35, 3, 20, 19, 13, 22]. In the planted clique model, the input graph is generated by sampling a graph from the Erdős-Rényi distribution ER(n,12) and then embedding a clique of size cn by fully connecting a randomly chosen subset of vertices. Here, basic semidefinite programming relaxations [20, 21], as well a simple rounding of the second smallest eigenvector of the Laplacian [3], are known to efficiently recover the planted clique whenever c1/n. Lower bounds against restricted computational models further provide evidence that these algorithmic guarantees may be tight [23, 8].

In an effort to bridge the worst-case settings and the average-case settings, Feige and Kilian [19] introduced a semi-random model in which the above planted clique instance is further perturbed by: (i) arbitrarily removing edges between the planted clique K and the remainder of the graph GK, and (ii) arbitrarily modifying the subgraph induced by GK. The randomness of this model lies in the cut (K,GK) which separates the clique from the rest of the graph. A flurry of works [12, 37, 10] led to an algorithm that, leveraging the randomness of this cut, can recover a planted clique of size n12+ε in time nO(1/ε). This picture suggests that, from a computational perspective, this semi-random model may be closer to the planted average case model than to worst-case graphs. (Information theoretically the semi-random model differs drastically from the planted clique model [38].)

To better understand the role of randomness in the clique problem, Feige [18] proposed another model in which a clique is randomly planted in an arbitrary graph, and asked what approximation guarantees are efficiently achievable in this setting. In comparison to the aforementioned semi-random case, here the randomness only affects the location of the clique but not the topology of the rest of the graph.

Investigating this model is the main focus of this paper. We provide a first positive answer to Feige’s question, showing that a surprisingly simple deterministic algorithm achieves significantly stronger guarantees than those known for the worst-case settings, for a wide range of parameters. Our results suggest that this model may sit in between the average case and the worst case regimes.

1.1 Results

To present our contributions we first formally state our random planting model. In fact, as our results extend beyond clique, the model we state is a generalization of the one in [18].

Definition 1 (Random planting in arbitrary graphs).

Let G and H be graphs with |V(H)||V(G)|. 𝒢(G,H) describes the following distribution over graphs:

  1. 1.

    Sample a random uniform injective mapping ϕ:V(H)V(G).

  2. 2.

    Return G^ with V(G^)=V(G) and E(G^)=E(G){{ϕ(u),ϕ(u)}:{u,u}E(H)}.

When H is the cn-sized complete graph Kcn, Definition 1 corresponds to the planted clique model of [18]. In this specific setting we obtain the following result.

Theorem 2 (Simplified version).

There exists a deterministic algorithm 𝒜 with the following guarantees. For every c(0,1) and every n-vertex graph G, if G^𝒢(G,Kcn) then 𝒜(G^) with probability at least 11n2 returns a clique of size at least:

n5(c3)4clog2p

where p=1Δn and Δ is the maximum degree of G. Moreover 𝒜 runs in time O~(G^).

To appreciate the guarantees of Theorem 2 consider the setting p=Ω(1), so that Δ=(1p)n is bounded away from n. In this case, for every fixed c>0 the algorithm of Theorem 2 finds with high probability a clique of size Ω(n). In the worst case, however, this is not possible unless the Unique Games Conjecture fails. More precisely, assuming UGC, no polynomial-time algorithm can find a clique of size εn even when one of size (12ε)n exists [33, 7]; indeed, state-of-the-art algorithms [2, 9, 25] are only known to return cliques of size nO(c). By adding np1p isolated vertices to the graph, it also follows that under UGC one cannot efficiently find a clique of size εn even when one of size (1p2ε)n exists and the input graph has degree Δ(1p)n, as in the statement of Theorem 2. Thus, unless UGC fails, we cannot expect Theorem 2 to hold in the worst case. We remark that Theorem 2 also guarantees to recover cliques of size nΩ(1) for cΩ(loglognlogn), a regime in which worst-case algorithms are only known to find cliques of size polylog(n). Finally, as one can expect, the failure probability can be actually made smaller than na for any desired a1; see the full formal version of Theorem 2 in Section 5.

Note that the performance of our algorithm deteriorates as p approaches 0; that is, as the maximum degree approaches n. While it remains an open question whether some assumption on the degree is inherently necessary, we provide some preliminary evidence in Theorem 5, see Section 2 and Section 7.

Our results extend beyond the case where the planted graph is a complete graph. To illustrate this, we also consider the balanced biclique problem, where the goal is to find a largest complete balanced bipartite subgraph. The balanced biclique problem has a long history [24, 28, 1] and a strong connection to clique [11]. Assuming the Small Set Expansion Hypothesis, there is no polynomial-time algorithm that can find a balanced biclique within a factor n1ε of the optimum for every ε>0, unless 𝖭𝖯𝖡𝖯𝖯 [36]. Remarkably, in the worst case, the bicliques that existing algorithms are known to return are significantly smaller than the complete graphs found in the context of clique. In fact, the best algorithm known [11] works through a reduction to clique which constructs an instance with a complete graph of size O(c2n) from a balanced biclique instance with a biclique of size cn.

In comparison, under Definition 1, we obtain the following guarantees.

Theorem 3 (Simplified version).

There exists a deterministic polynomial-time algorithm 𝒜 with the following guarantees. For every c(0,1) and every n-vertex graph G, if G^𝒢(G,Kcn2,cn2) then 𝒜(G^) with probability at least 11n2 returns a balanced biclique of size at least:

c482clogn2.

Moreover 𝒜(G^) runs in time O~(G^).

The main point of Theorem 3 is again the difference with the worst case bounds. In the worst case, existing algorithms are known to find a biclique of size (logn)ω(1) only if there exists one of size cnω(loglognlogn)n in the input graph. In contrast, Theorem 3 states that in typical instances from 𝒢(G,Kcn2,cn2) we can efficiently find a biclique of size (logn)ω(1) whenever there exists one of size cnω((loglogn)2logn)n; that is, for value of c up to loglognlogn times smaller than for the worst case. Furthermore, unlike the bounds of Theorem 2, the ones of Theorem 3 are insensitive to the structure of G, and in particular to its maximum degree.

2 Techniques

This section gives an intuitive description of our techniques, using the planted clique problem as a running example. Let G be an arbitrary n-vertex graph, and let G^𝒢(G,Kcn). For simplicity, we suppose that c>0 and p>0 are fixed constants, and that G has maximum degree Δ(1p)n. Let ϕ:V(Kcn)V(G) be the injective mapping sampled in the process of constructing G^. Because we have almost no knowledge of the global structure of G, it appears difficult to recover the planted clique via the topology of G^ without running into any of the barriers observed in worst-case instances.

On the other hand, since the clique is planted randomly, we can expect certain basic statistics to change in a convenient and somewhat predictable way between G and G^. Our approach focuses on perhaps the simplest such statistic – the degree profile – guided by the intuition that vertices with higher degree in G^ are more likely to belong to the planted clique than those with lower degree. For notational convenience we use the degree in the complement graph, which we call slack. To be precise, for a vertex vV(G), the slack of v in G is sv=(n1)dv where dv is the degree of v in G. In the same way we define the slack of v in G^ as s^v=(n1)d^v, where d^v is the degree of v in G^.

To formalize the intuition above, suppose G contains a subset VV such that (i) the vertices of V have approximately the same slack, in the sense that if s:=minvVsv, then any vV satisfies

sv(1c2)<s,

and (ii) the set V<s(G) of vertices in G with slack smaller than s has size at most, say, c10|V|. Because the map ϕ is chosen uniformly at random, we expect a c fraction of V will be in the image of ϕ. Furthermore, every vV in the image of ϕ acquires csv new neighbors in expectation, which by (i) gives:

𝔼ϕ[s^v]sv(1c)<s.

In fact, as long as |V| and s are large enough (roughly Ω(c1logn)), by standard concentration bounds at least c2|V| vertices of V will be in the image of ϕ, and all those vertices v will satisfy s^v<s. Under these circumstances, by (ii) we conclude that, in G^, the set |V<s(G^)| of vertices having slack smaller than s has size at least c10|V|, and moreover a fraction at least 1/21/2+1/10>0.8 of those vertices form a clique. We can then immediately recover a clique of size Ω(c|V|) via the standard reduction to vertex cover applied to the subgraph of G^ induced by |V<s(G^)|.

The above discussion suggests our intuition is correct whenever a sufficiently large set satisfying (i) and (ii) exists.111We remark that our algorithm does not need to find this set. While arbitrary graphs may not contain such a set, it turns out that the only obstacle towards the existence of a linear size set V is the presence of a large set of vertices of slack strictly smaller than s. Choosing V so that spn we deduce that such a V must exists.

 Remark 4.

The above reasoning works beyond the parameters regime of our example and, in fact, does not require the planted graph to be a clique. In the context of balanced biclique the existence of a large set with slack <s makes the problem easier. Therefore, we are able to drop the assumption on the maximum degree in G.

We complement the intuition above with a lower bound on the performance of degree profiling. Essentially this states that, if we have no guarantees on the maximum degree of G, then the degree profile of G^𝒢(G,Kcn) is uncorrelated with Kcn. Formally:

Theorem 5.

For every c(0,12) and n3 there exists an n-vertex graph G such that G^𝒢(G,Kcn) satisfies what follows with probability at least 11n. For every ordering v1,,vn of the vertices of G^ by nonincreasing degree, and for every j[n], the largest clique in the induced subgraph G^[{v1,,vj}] has size at most

O(nlnnc+cj). (1)

To appreciate Theorem 5 let t=Ω(c2nlnn). Then the theorem says that, if one takes the first t vertices of G^ in order of degree, the largest clique therein has size O(ct) with high probability. In other words, for all t not too small compared to n, the t vertices of highest degree have roughly the same clique density of the entire graph. This suggests that, using degree statistics alone, one has little hope to find cliques larger than O~(n) even for constant c. Note that there is no contradiction with the upper bounds of Theorem 2: those bounds become trivial for large Δ, and the graph behind the proof of Theorem 5 has indeed a large Δ.

3 Preliminaries

Let G be a graph. We let V(G) be its set of vertices and E(G) its set of edges. For VV(G) we let G[V] be the subgraph induced by V. We often use n=|V(G)|. We let G=|V(G)|+|E(G)|. For vV(G), let dv be its degree and sv:=n1dv its slack. For VV(G), let sV:=minvVsv. We write V<s(G):={vV(G)|sv<s}. We do not specify the graph when the context is clear and we define V<s(G^) as V^<s. We let Kn be the complete graph of size n and Ka,b be the biclique with sides of size a and b. We let [n]:={1,,n}, log=log2 and ln=loge.

The computational model is the standard RAM model with words of logarithmic size. Unless otherwise stated, all our graphs are given as adjacency list. By performing a O(n) preprocessing we henceforth assume the adjacency lists are sorted, so that one can perform binary search and check the existence of any given edge in time O(logn).

The following theorem says that, for every fixed c>12, one can efficiently find a clique of size Ω(n) in an n-vertex graph that contains one of size cn.

Theorem 6.

There exists an algorithm, DenseCliqueFinder, with the following guarantees. For every ε>0, if 𝒜 is given in input an n-vertex graph G=(V,E) that contains a clique of size (12+ε)n, then 𝒜 finds in deterministic O~(n2)-time a clique of size 2εn.

The proof is folklore – take the complement of G, find a 2-approximation of the smallest vertex cover through a maximal matching, and return its complement. See also [14].

4 Slackness profile and densification

In this section we prove our structural results on the degree and slackness profile of graphs from Definition 1. We start with a definition.

Definition 7 (Bulging set).

Let G=(V,E) be a graph and α,β>0. A set UV is (α,β)-bulging if:

  1. 1.

    sv<sU1β for all vU.

  2. 2.

    |V<sU|<1α|U|.

The next statement characterizes the existence of (α,β)-bulging sets in any graph based on the value of α and β and the slackness of its vertices.

Lemma 8.

Let G=(V,E) be an n-vertex graph. Then for every β(0,1/2), α2, and s>0 at least one of the following facts holds:

  1. (i)

    |V<s|nα2+1βlogns.

  2. (ii)

    G contains an (α,β)-bulging set U such that |U|nα1+1βlogns and sUs.

Proof.

Let η=β1β>0 and h=log1+ηns. We define a partition of V into h+1 possibly empty sets, as follows:

V0 :=V<s (2)
Vj :=V<s(1+η)jV<s(1+η)j1={vV|s(1+η)j1sv<s(1+η)j}j[h] (3)

It is immediate to see that this is indeed a partition of V, since 0sv<n for every vV. We prove the statement by contradiction. Suppose (ii) does not hold. Then it must be that for j1:

|Vj|<nα2+1βlognsαj, (4)

Indeed, if this was not the case, then one can check that for the smallest j[h] violating Equation 4 the set Vj would be (α,β)-bulging, and moreover every vertex in Vj would have slack at least s (since j1). Suppose further (i) is not verified. Then as the sets Vj form a partition of V, and as α2,

n=j=0h|Vj|<nα2+1βlognsj=0hαj<nα2+1βlognsαh+1 (5)

Now observe that

h1+lognslog(1+η)1+1+ηηlogns=1+1βlogns (6)

where we used the facts that log(1+x)x1+x for all x0, and that η1+η=β. Substituting this bound in Equation 5 yields the absurd n<n. Thus at least one among (i) and (ii) holds. Our next key result states that the subgraph of G^ induced by the set of vertices v with slack s^v<sU, where UV is the bulging set that exists in G for Lemma 8, will contain a large number of vertices of H with high probability.

Lemma 9 (Densification Lemma).

Let G be an n-vertex graph, α2 and c(0,1). Let H be a regular graph with |V(H)|n and minimum degree at least cn10. Let U be an (α,c2)-bulging set of G with min{sU,|U|}12+29alnnc for some a1. Finally, let G^𝒢(G,H), and let H^ be the image of H in G^. Then, with probability at least 1na the set V^<sU satisfies:

  1. (i)

    |V^<sUH^|>c2|U|.

  2. (ii)

    |V^<sUH^|>cα2|V^<sUH^|.

Proof.

First, we claim that H^UV^<sU with high probability. Consider any vU, and note that vV^<sU means s^vsU. Now, if vH^, then s^v=svX, where X=i=1svXi is the sum of non-positively correlated Bernoulli random variables of parameter c=c1n. The event s^vsU is therefore the event XsvsU; since svsUc2sv, as vU and U is (α,c2)-bulging, this implies the event Xc2sv. Now, as cn10, then c910c, and c2sv59csv. Since moreover 𝔼X=csv, we conclude that s^vsU implies the event X(14/9)𝔼X. We then use Lemma 16 with ε=4/9. To this end note that:

𝔼X=csv910csU>10+26alnn13(ln2+(1+a)lnn)=13ln(2na+1) (7)

Therefore:

Pr[vV^<sU]=Pr[s^vsU]Pr[X(14/9)𝔼X]e(4/9)22+4/9𝔼X<e𝔼X13<12na+1 (8)

By a union bound over all vU we conclude that Pr[H^UV^<sU]12na.

Next, consider |H^U|. Note that |H^U|=X where again X=i=1|U|Xi is the sum of non-positively correlated Bernoulli random variables of parameter c. Using again Lemma 16 with ε=4/9, and noting as done above that 𝔼X=c|U|>13ln(2na+1), we obtain:

Pr[|H^U|c2|U|]=Pr[X(14/9)𝔼X]<12na+1<12na (9)

Finally, let S:=V^<sU. The bounds above show that, with probability at least 1na, we have UH^SH^ and |UH^|>c2|U|, which implies |SH^|>c2|U|, that is, (i).

Moreover SH^V<sU, which implies |SH^|<1α|U| as U is (α,c2)-bulging. Together with (i) we conclude that:

|SH^||SH^|>cα2 (10)

which proves (ii).

5 Application to clique

In this section we prove Theorem 2, which we restate in a fully formal way and with more general probabilistic guarantees.

Theorem 10.

There exists a deterministic algorithm 𝒜 with the following guarantees. Fix any a1. Let c:=c(n)ω(1logn), and define:

K(n,c,p)n5(c3)2+2clog2p.

Then for every n large enough and every n-vertex graph G what follows holds. Letting p=1Δn where Δ is the maximum degree of G, if K(n,c,p)1+2alnn, then 𝒜 on input G^𝒢(G,Kcn) returns a clique of G^ whose size is at least K(n,c,p) with probability at least 1na. Moreover 𝒜(G^) runs in time O~(G^) for every input graph G^.

Proof.

We start by proving that Algorithm 1 runs in time O~(n3) and guarantees a clique of size 75K(n,c,p) with the prescribed probability. We then show how to lower the running time to O~(G^) while reducing the clique size to K(n,c,p).

Algorithm 1 CliqueFinder(G^).

The inequalities we are going to claim assume n is indeed sufficiently large (formally, larger than some n0 that may depend on a). To begin with, observe that if c1logn or pn1/2 then K(n,c,p)1 and therefore our algorithm certainly satisfies the bound of Theorem 10. Indeed, if c1logn then the second multiplicative term in the expression of K(n,c,p) satisfies:

(c3)2+2clog2p(13logn)2+2logn<9logn5n (11)

If instead pn1/2 then the same term satisfies:

(c3)2+2clog2p(13)2logn=3logn5n (12)

Thus we may assume cp1nlogn, and therefore:

cp13+29alnnn12+29alnnn+cn (13)

Now let s=pn1. Then s=(n1)Δ; hence all vertices of G have slack at least s, and therefore |V<s|=0. Now apply Lemma 8 with α=3c and β=c2. Note that item (i) fails, thus item (ii) holds. Therefore G contains a (3c,c2)-bulging set U such that:

|U|n(3c)1+2clogns=n(c3)1+2clognsn(c3)1+2clog2p15cK(n,c,p) (14)

where we used the fact that p1n and that n is large enough to obtain that ns2p. Thus, when K(n,c,p)1+2alnn we have |U|12+29alnnc.

Moreover sUs=pn; using Equation 13 this yields sU12+29alnnc, too. We can then apply Lemma 9. It follows that, with probability at least 1na, we have |V^<sUH||V^<sUH|>cα2>32. We deduce that G[V^<sU] contains a clique whose density is at least:

|V^<sUH||V^<sU|=|V^<sUH||V^<sUH|+|V^<sUH|>|V^<sUH|(23+1)|V^<sUH|=12+110. (15)

With the same probability we have simultaneously that |V^<sU|c2|U|>7K(n,c,p). Now consider the invocation of DenseCliqueFinder on G[V^<sU]. By Theorem 6, that invocation finds a clique of size at least:

7K(n,c,p)(2110)=75K(n,c,p) (16)

Next, we bring the running time in O~(n2) while ensuring an output clique of size K(n,c,p). To this end, change the loop at line 3 so as to iterate only over i in the form i=(1+η)j for some η>0. For the smallest i=(1+η)j such that V<sU{v1,,vi} the subgraph G^[v1,,vi] will then have clique density 12+1101+η and will contain V^<sU plus at most η|V^<sU| other vertices. By choosing η>0 sufficiently small one can then ensure that DenseCliqueFinder when ran on G^[v1,,vi] returns a clique of size at least K(n,c,p). The total number of iterations is in O(log1+ηn)=O(logn), and by Theorem 6 every iteration takes time O~(n2), giving a total time of O~(n2) too.

To finally bring the running time in O~(G^), upon receiving G^ we check whether G^(n/logn2). If that is the case then c1logn and K(n,c,p)1 as shown above; in this case we return any vertex of G^. Otherwise, we run the algorithm above. In both cases the bounds are satisfied and the running time is in O~(G^).

6 Application to balanced biclique

In this section we restate and prove a more formal and general version of Theorem 3:

Theorem 11.

There exists a deterministic polynomial-time algorithm 𝒜 with the following guarantees. Fix any a1. Let c:=c(n)ω(1logn). For every n large enough and every n-vertex graph G, when given G^𝒢(G,Kcn2,cn2) in input, 𝒜 with probability at least 1na returns a balanced biclique of size at least:

c482clogn2.

Moreover 𝒜(G^) runs in time O~(G^) for every input graph G^.

The algorithm behind the theorem, Algorithm 2, is based on the following intuition. Observe that our main technical result, Lemma 8, essentially says that every graph G contains either (i) a large number of vertices of small slack (and thus large degree), or (ii) a large bulging set. If (i) holds, then we can hope to find a large biclique by just intersecting the neighborhoods of those vertices (namely, of the k vertices with largest degree for some suitable value of k). If (ii) holds, then we can hope to find a large biclique by exploiting the “densification” phenomenon used by our clique algorithm (see Section 5). The structure of the algorithm follows this intuition, with a first phase that finds a large biclique if (i) holds and a second phase that finds a large biclique if (ii) holds.

Algorithm 2 BalancedBicliqueFinder(G^).

Before delving into the proof, we need a certain subroutine that “extracts” a large balanced biclique of a graph G when given a subset S of vertices of some (larger) balanced biclique of G. This is the subroutine BicliqueExtractor appearing at line 11 of Algorithm 2, and it plays a role similar to the one played by DenseCliqueFinder in the case of clique.

Lemma 12.

There exists a deterministic algorithm 𝒜 with the following guarantees. Let G=(V,E) be an n-vertex graph containing a balanced biclique with sides A and B. Given in input G and SAB, algorithm 𝒜 returns a biclique of G with sides L,R such that min(|L|,|R|)|S|3. The running time of 𝒜 is O(|S|nlogn).

Proof.

We prove that Algorithm 3 satisfies the statement.

Algorithm 3 BicliqueExtractor(G,S).

First, let us prove that the algorithm returns sets L,R that are sides of a complete biclique and such that min(|L|,|R|)>|S|3. We begin by noting the following crucial fact: for each i=1,,r we have V(Gi)A or V(Gi)B. Suppose, in fact, that there exist uV(Gi)A and vV(Gi)B. Since Gi is connected, along any path from u to v in Gi there must exist an edge whose vertices belong to A and B, respectively. Without loss of generality we can thus assume that u and v are such vertices. By definition of Gi this means that u and v are not adjacent in G, a contradiction. Now we distinguish the two cases on which the algorithm branches.

  1. 1.

    |V(G1)|>|S|3.

    In this case, as V(G1)A or V(G1)B, by construction of R we have RB or RA. Therefore:

    |R||A|=|AB|2|S|2|S|3 (17)

    Hence, min(|L|,|R|)>|S|3. Moreover note that L,R are sides of a biclique by construction of R.

  2. 2.

    |V(G1)||S|3. Then by the ordering of G1,,Gr we have |V(Gi)||S|3 for all i=1,,r. Note how this implies that the index i computed by the algorithm satisfies:

    |j=1i1V(Gj)||S|3 (18)

    Therefore:

    |S|3<|j=1iV(Gj)|=|j=1i1V(Gj)|+|V(Gi)||S|3+|S|3=2|S|3 (19)

    This implies again min(|L|,|R|)|S|3. Moreover L,R form again the sides of a biclique; this is because G1,,Gr are connected components of G¯[S], hence in G all edges are present between V(Gj) and V(Gj) for every distinct j,j.

We now analyze the running time of the algorithm. Computing G¯[S] takes time O(|S|2logn) by checking for each of the edges in the sorted adjacency lists of G. Computing and sorting the connected components G1,,Gr takes time O(|S|2+|S|log|S|). The case |V(G1)|>|S|3 requires time O(|V(G1)|n)=O(|S|n) if the intersection of the neighborhoods is done using a bitmap indexed by V(G). The case |V(G1)||S|3 takes time O(|S|). We conclude that the algorithm runs in time O(|S|2logn+|S|n)=O(|S|nlogn). We are now ready to prove Theorem 11.

Proof of Theorem 11..

We prove the biclique size guarantees and the running time bounds separately.

Guarantees.

Let β=c2, and define:

f(α,n,s):=nα2+1βlogns (20)

We begin by showing that, whenever snf(α,n,s) and αmax(2,f(α,n,s)),

  1. 1.

    If item (i) of Lemma 8 holds, then the first phase of Algorithm 1 finds a biclique with at least 23f(α,n,s) vertices per side.

  2. 2.

    If item (ii) of Lemma 8 holds, then the second phase of Algorithm 1 finds with high probability a biclique with at least c6f(α,n,s) vertices per side.

Since by Lemma 8 itself at least one of items (i) and (ii) holds, the algorithm finds with high probability a balanced biclique of size Ω(cf(α,n,s)). We will then choose α,s that satisfy the constraints above while (roughly) maximizing f(α,n,s).

We prove 1. To ease the notation let f=f(α,n,s). If item (i) of Lemma 8 holds, then |V<s|f, hence in G (and thus in G^) there are at least f vertices of slack smaller than sn/f. By a simple counting argument, any k of those vertices have at least nk(s+1) neighbors in common. Choosing k=2f3, and using the fact that sn/f, the common neighbors are at least n2f3(n/f+1)=n2f3. Now observe that, since α2, then n4f, hence n2f32f3. We conclude that the loop at line 3 of Algorithm 2 eventually returns a biclique whose smallest side has at least 2f3 vertices.

We prove 2. If item (ii) of Lemma 8 holds, then G contains an (α,c/2)-bulging set U of size at least αf. Let S=V^<sU. Leveraging Lemma 9 through the same arguments used in the proof of Theorem 10, as long as αf and s are in Ω(c1alogn) and sufficiently large, with probability at least 1na we have |SH|>c2|U| and |SH|>cα2|SH|, where H is the set of vertices of the planted biclique. Now consider any ordering of S (in particular the one given by the degrees). If |SH|=, then the ordering itself is a sequence of elements of H of length |S|=|SH|>c2|U|cαf2. If instead |SH|, as |SH|>cα2|SH| the pigeonhole principle implies that the ordering contains a contiguous sequence of vertices of H of length at least cα2, and therefore are least cf2 as we are assuming αf. We conclude that in any case the ordering of S contains a contiguous sequence of vertices of H of length min(cf2,cαf2)=cf2. By construction Algorithm 2 eventually runs BicliqueExtractor on that sequence and thus, by Lemma 12, finds a biclique with at least cf6 vertices per side.

It remains to choose suitable values of α,s so as to approximately maximize f subject to the constraints snf(α,n,s) and αmax(2,f(α,n,s)). The argument above then yields with probability 1na a biclique with Ω(cf(α,n,s)) vertices per side. We set:

s =nf (21)
α =f (22)

This yields the equation:

α=f(α,n,s)=nα2+1βlogns=nα2+1βlogα (23)

Recalling that β=c/2, rearranging, and taking logarithms yields:

2clog2α+3logαlogn=0. (24)

Solving for logα gives:

logα=3+9+8lognc4c=916c2+c2logn34c>c2logn1 (25)

We conclude that:

α>2c2logn1 (26)

Notice that by definition s,α satisfy the constraint αmax(2,f(α,n,s)) as long as α=f2. Since we are assuming that cω(1logn), Equation 26 guarantees that f=α2 holds for large enough n.

Finally, the lower bound above on the size of each side of the biclique is thus:

cf6=cα6>c122c2logn (27)

Running time.

We describe a variant of BalancedBicliqueFinder that runs in time O~(G^) and finds a biclique of size at least 1/4-th of that of the original algorithm. As a first thing, we compute |E(G^)|. If |E(G^)|<(cn2)2, then necessarily c1logn, and the bound of Theorem 11 is smaller than 1. In this case we immediately return any edge of G, satisfying the bounds. If instead |E(G^)|(cn2)2 then we run the O~(n2)-time variant of Algorithm 2 described below. This makes the running time in O~(G^) in every case. As a byproduct, the lower bound on the biclique size will shrink by a factor 45.

The variant of Algorithm 2 is as follows. First, observe that the loop of line 3 can be implemented in O~(n2) total time by computing R incrementally (this can be done either via a bitmap or using binary search over the sorted adjacency lists). For the loop at line 10, we reduce the running time by coarsening. Instead of iterating over all 1i<jn, for each h=1,,logn we iterate over all subsequences vi,,vj with i=k2h and j=k2h+k1, for k=0,1,2,. Clearly, for every contiguous subsequence S of v1,,vn, we will iterate over some subsequence SS with |S||S|/4. The bound on the size of the biclique thus decreases by a factor of 4. The running time can be easily bounded by noting that, for every h=1,2,, the total cost of invoking BicliqueExtractor on all the subsequences of size 2i is in O~(n2) by Lemma 12. As the loop iterates over O(logn) values of i, we conclude that the second phase takes O~(n2) time overall.

7 A lower bound on densification

In this section we prove Theorem 5. This shows that, whenever c<1/2, there exist arbitrarily large graphs G such that the high degree profiles of typical instances from 𝒢(G,Kcn) are essentially uncorrelated with the planted clique.

Throughout the section, for a graph G we let κ(G) be the size of the largest clique in G. We start by defining a graph H that has between one and two vertex for every degree (or, equivalently, every slack) from 1 to n1. Let H=(V,E) where V=[n] for n3, and

E={{u,v}:u,vV,uv,u+vn+1}. (28)

Note that NH(u)=[1,nu+1]{u}; hence

su={u1un+12u2u>n+12 (31)

This implies that, for every 0sn1,

Vs[s+1,s+2]. (32)

The graph G of Theorem 5 is a perturbation of H as given by the next result.

Lemma 13.

Let G be an n-vertex graph, let η[0,1], and let G be obtained from G by deleting each edge independently with probability η. For every a>1, with probability at least 12n1a:

  1. 1.

    κ(G)<2alnnη+1.

  2. 2.

    |Vs||Vs| for all s0, where s=s+ηn+anlnn.

Proof.

Item 1. Fix UV on k2alnnη+1 vertices. Then:

Pr[G[U] is a clique](1η)(k2)<eη(k2)=ekηk12ekalnn=nak. (33)

Taking a union bound over all U yields Pr[κ(G)k]<n(1a)kn1a.

Item 2. Fix uV. Then su=su+i=1duXi, where the Xi are independent Bernoulli random variables with parameter η. By Hoeffding’s inequality, for every t0,

Pr[susu+ηn+t]Pr[susu+ηdu+t]et2du<et2n (34)

For t=anlnn we obtain Pr[susu+ηn+anlnn]na. This implies that, for every s0, every vVs satisfies vVs with probability 1na, where s=s+ηn+anlnn. By a union bound we conclude that, with probability 1n1a, we have |Vs||Vs| for all s0. As a corollary we get the graph G used in the proof of Theorem 5:

Corollary 14.

For every η[0,1] and every n3 there exists an n-vertex graph G such that:

  1. 1.

    κ(G)4lnnη+1.

  2. 2.

    sηn2nlnn|Vs|s+2 for all s0.

Proof.

Apply Lemma 13 to the graph H defined above for a=2, noting that 12n1a>0. The next result bounds the number of vertices of the planted clique that end up having a certain slack in G^.

Lemma 15.

Let c(0,1), let G be any graph, and let G^𝒢(G,Kcn). With probability at least 13n we have simultaneously for all s0:

c|Vs|2nlnn|KV^s|c|Vs|+2nlnn. (35)

where s=s+2nlnn1c.

Proof.

Lower bound. Note that |KV^s||KVs|, and |KVs|=i=1|Vs|Xi where the Xi are non-positively correlated Bernoulli random variables of parameter c. By Hoeffding’s inequality, then, the probability that the lower bound of the claim fails is at most 1n2 for any given s0. By a union bound, thus, the lower bound holds fails for some s with probability at most 1n.

Upper bound. Let vVs, so sv>s. Note that s^v=svi=1svXi, with the Xi non-positively correlated Bernoulli random variables of parameter c. Therefore 𝔼[s^v]=(1c)sv, and:

s=(1c)s2nln2n<(1c)sv2nlnn=𝔼[s^v]2nln2n (36)

By Hoeffding’s inequality we then get Pr[s^vs]1n2. By a union bound this implies that, with probability at least 11n,

KV^sKVss=1,,n1 (37)

Consider then |KVs|. Note that this is a sum of |Vs| non-positively correlated Bernoulli random variables of parameter c. Another application of Hoeffding’s inequality yields with probability at least 11n2:

|KVs|c|Vs|+2nlnn (38)

A final union bound over all s0 and the three events above concludes the proof. We are now ready to prove Theorem 5.

Proof of Theorem 5.

Let η=ac1lnnn for some a>0 to be defined. Let G be the corresponding graph given by Corollary 14, and let G^𝒢(G,Kcn). We begin by observing that it is sufficient to prove Theorem 5 for the case {v1,,vj}=V^s for some s0.

Consider indeed any ordering v1,,vn of the vertices of G^ by nonincreasing degree. Observe that for every j=1,,n there exists s0 and S^V^sV^s1 such that

{v1,,vj}=V^s1˙S^ (39)

Now suppose the bound of Lemma 15 holds. We claim that |S^|(2+a)nlnnc. Indeed:

|S^| |V^s||V^s1| (40)
|V^s||Vs1| Vs1V^s1 (41)
(cs+2nlnn1c+2nlnn)(s1ηn) Lemma˜15 andCorollary˜14 (42)
=(cs+2nlnn1c+2nlnn)(s1anlnnc) definition of η (43)
(2+a)2nlnnc (44)

where in the last inequality we used c12. Now notice that the upper bound of Theorem 5 has an O(2nlnnc) additive term. Therefore, as said, it is sufficient to prove the theorem for the case {v1,,vj}=V^s for some s0.

Consider then any 0sn1. If |V^s|ac1nlnn then Equation 1 is trivially true. Suppose then |V^s|>ac1nlnn. We have:

|KV^s| c|Vs|+2nlnn (45)
c(s+2)+2nlnn item 2 of Corollary 14 (46)
=O(cs+nlnn) definition of s and c12 (47)

By item 1 of Corollary 14, and since VsV^s, we have s|V^s|. As |V^s|>ac1nlnn, we have nlnn<ca|V^s|. Plugging these bounds in the inequality above gives |KV^s|=O(c|V^s|). To conclude, observe that:

κ(G^[V^s])κ(G)+|KV^s| (48)

and that κ(G)lnnη=acnlnn by Corollary 14 and our choice of η. Together with our bound on |KV^s| this gives the claim.

8 Conclusions and Future Work

In this paper, we considered Feige’s semi-random planted clique model, where the input graph is obtained by planting uniformly at random a clique of size cn in an arbitrary n-vertex graph. We presented a simple, deterministic, almost-linear-time algorithm that successfully recovers a clique of size (c/3)O(1/c)n under the mild assumption that the original graph has maximum degree at most (1p)n for some constant p>0. This result suggests a separation from worst-case scenarios: assuming the Unique Games Conjecture, no polynomial-time algorithm can recover cliques of size Ω(n) in general graphs, even when the maximum degree bound holds. Our technique also extends to the case of planting balanced bicliques. Overall, our results show that even the limited randomness provided by planting a clique or biclique in an arbitrary graph can be successfully harnessed by rather simple algorithms.

We leave open three main questions. The first one had already been posed by Feige in the original formulation of the problem: is it possible to prove any hardness result under any complexity assumption? The main obstacle seems to be the presence of randomness in the formulation of the problem, which makes the design of suitable reductions a non-trivial task. The second one concerns removing the degree constraint: is it possible to retain guarantees similar to our, but without any constraint on the maximum degree of the original graph? If this the case, the results in Section 7 suggest that different techniques are required. The third question is whether our results can be extended or generalized to other kinds of planted subgraphs, beyond cliques and balanced bicliques. This may include dense or multipartite graphs, potentially shedding light on how randomly planting a subgraph alone affects the complexity of this kind of problems.

References

  • [1] Noga Alon. The algorithmic aspects of the regularity lemma. In Proceedings., 33rd Annual Symposium on Foundations of Computer Science, pages 473–481, October 1992. doi:10.1109/SFCS.1992.267804.
  • [2] Noga Alon and Nabil Kahale. Approximating the independence number via the ϑ-function. Mathematical Programming, 80:253–264, February 1998. doi:10.1007/BF01581168.
  • [3] Noga Alon, Michael Krivelevich, and Benny Sudakov. Finding a large hidden clique in a random graph. Random Structures & Algorithms, 13(3-4):457–466, 1998. doi:10.1002/(SICI)1098-2418(199810/12)13:3/4<457::AID-RSA14>3.0.CO;2-W.
  • [4] Sanjeev Arora and Rong Ge. New tools for graph coloring. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 1–12, 2011. doi:10.1007/978-3-642-22935-0_1.
  • [5] Anne Auger and Benjamin Doerr. Theory of Randomized Search Heuristics. WORLD SCIENTIFIC, 2011. doi:10.1142/7438.
  • [6] Mitali Bafna, Jun-Ting Hsieh, and Pravesh K. Kothari. Rounding large independent sets on expanders. In Proceedings of the 57th Annual ACM Symposium on Theory of Computing, STOC ’25, pages 631–642, New York, NY, USA, 2025. Association for Computing Machinery. doi:10.1145/3717823.3718137.
  • [7] Nikhil Bansal and Subhash Khot. Optimal long code test with one free bit. In Proceedings of the 2009 50th Annual IEEE Symposium on Foundations of Computer Science, FOCS ’09, pages 453–462, USA, 2009. IEEE Computer Society. doi:10.1109/FOCS.2009.23.
  • [8] Boaz Barak, Samuel B. Hopkins, Jonathan Kelner, Pravesh Kothari, Ankur Moitra, and Aaron Potechin. A nearly tight sum-of-squares lower bound for the planted clique problem. In Proceedings - 57th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2016, pages 428–437, United States, 2016. IEEE Computer Society. doi:10.1109/FOCS.2016.53.
  • [9] Ravi Boppana and Magnús Halldórsson. Approximating maximum independent sets by excluding subgraphs. BIT Numerical Mathematics, 32:13–25, January 2006. doi:10.1007/3-540-52846-6_74.
  • [10] Rares Darius Buhai, Pravesh K. Kothari, and David Steurer. Algorithms approaching the threshold for semi-random planted clique. In STOC 2023 - Proceedings of the 55th Annual ACM Symposium on Theory of Computing, pages 1918–1926, 2023. doi:10.1145/3564246.3585184.
  • [11] Parinya Chalermsook, Wanchote Po Jiamjitrak, and Ly Orgo. On finding balanced bicliques via matchings. In Graph-Theoretic Concepts in Computer Science: 46th International Workshop, WG 2020, Leeds, UK, June 24–26, 2020, Revised Selected Papers, pages 238–247, Berlin, Heidelberg, 2020. Springer-Verlag. doi:10.1007/978-3-030-60440-0_19.
  • [12] Moses Charikar, Jacob Steinhardt, and Gregory Valiant. Learning from untrusted data. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, pages 47–60, New York, NY, USA, 2017. Association for Computing Machinery. doi:10.1145/3055399.3055491.
  • [13] Amin Coja-Oghlan. Finding large independent sets in polynomial expected time. Combinatorics, Probability and Computing, 15(5):731–751, 2006. doi:10.1017/S0963548306007553.
  • [14] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Third Edition. The MIT Press, 3rd edition, 2009.
  • [15] Roee David and Uriel Feige. On the effect of randomness on planted 3-coloring models. In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, STOC ’16, pages 77–90, New York, NY, USA, 2016. Association for Computing Machinery. doi:10.1145/2897518.2897561.
  • [16] Devdatt Dubhashi and Alessandro Panconesi. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, October 2009. doi:10.1017/CBO9780511581274.
  • [17] Uriel Feige. Approximating maximum clique by removing subgraphs. SIAM J. Discrete Math., 18:219–225, October 2004. doi:10.1137/S089548010240415X.
  • [18] Uriel Feige. Introduction to semirandom models. In Tim Roughgarden, editor, Beyond the Worst-Case Analysis of Algorithms, pages 189–211. Cambridge University Press, United Kingdom, January 2021. doi:10.1017/9781108637435.013.
  • [19] Uriel Feige and Joe Kilian. Heuristics for semirandom graph problems. J. Comput. Syst. Sci., 63(4):639–671, December 2001. doi:10.1006/jcss.2001.1773.
  • [20] Uriel Feige and Robert Krauthgamer. Finding and certifying a large hidden clique in a semirandom graph. Random Structures & Algorithms, 16(2):195–208, 2000. doi:10.1002/(SICI)1098-2418(200003)16:2<195::AID-RSA5>3.0.CO;2-A.
  • [21] Uriel Feige and Robert Krauthgamer. The probable value of the lovász–schrijver relaxations for maximum independent set. SIAM Journal on Computing, 32(2):345–370, 2003. doi:10.1137/S009753970240118X.
  • [22] Uriel Feige and Eran Ofek. Finding a maximum independent set in a sparse random graph. SIAM Journal on Discrete Mathematics, 22(2):693–718, 2008. doi:10.1137/060661090.
  • [23] Vitaly Feldman, Elena Grigorescu, Lev Reyzin, Santosh S. Vempala, and Ying Xiao. Statistical algorithms and a lower bound for detecting planted cliques. J. ACM, 64(2), April 2017. doi:10.1145/3046674.
  • [24] Michael R. Garey and David S. Johnson. Computers and Intractability; A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., USA, 1990.
  • [25] Magnús M. Halldórsson. A still better performance guarantee for approximate graph coloring. Information Processing Letters, 45(1):19–23, 1993. doi:10.1016/0020-0190(93)90246-6.
  • [26] Johan Håstad. Clique is hard to approximate within n1ε. In Proceedings of 37th Conference on Foundations of Computer Science, pages 627–636, 1996. doi:10.1109/SFCS.1996.548522.
  • [27] Mark Jerrum. Large cliques elude the metropolis process. Random Structures & Algorithms, 3(4):347–359, 1992. doi:10.1002/rsa.3240030402.
  • [28] David S. Johnson. The NP-completeness column: An ongoing guide. J. Algorithms, 8(5):438–448, 1987. doi:10.1016/0196-6774(87)90021-6.
  • [29] George Karakostas. A better approximation ratio for the vertex cover problem. ACM Trans. Algorithms, 5(4), November 2009. doi:10.1145/1597036.1597045.
  • [30] David Karger, Rajeev Motwani, and Madhu Sudan. Approximate graph coloring by semidefinite programming. J. ACM, 45(2):246–265, March 1998. doi:10.1145/274787.274791.
  • [31] Richard Karp. Reducibility among combinatorial problems. Complexity of Computer Computations, 40:85–103, January 1972. doi:10.1007/978-3-540-68279-0_8.
  • [32] S. Khot. Improved inapproximability results for maxclique, chromatic number and approximate graph coloring. In Proceedings 42nd IEEE Symposium on Foundations of Computer Science, pages 600–609, 2001. doi:10.1109/SFCS.2001.959936.
  • [33] Subhash Khot and Oded Regev. Vertex cover might be hard to approximate to within 2ε. Journal of Computer and System Sciences, 74(3):335–349, 2008. Computational Complexity 2003. doi:10.1016/j.jcss.2007.06.019.
  • [34] Akash Kumar, Anand Louis, and Madhur Tulsiani. Finding pseudorandom colorings of pseudorandom graphs. In 37th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2017), volume 93 of Leibniz International Proceedings in Informatics (LIPIcs), pages 37:1–37:12, Dagstuhl, Germany, 2018. doi:10.4230/LIPIcs.FSTTCS.2017.37.
  • [35] Luděk Kučera. Expected complexity of graph partitioning problems. Discrete Appl. Math., 57(2–3):193–212, 1995. doi:10.1016/0166-218X(94)00103-K.
  • [36] Pasin Manurangsi. Inapproximability of maximum biclique problems, minimum k-cut and densest at-least-k-subgraph from the small set expansion hypothesis. Algorithms, 11(1), 2018. doi:10.3390/a11010010.
  • [37] Theo McKenzie, Hermish Mehta, and Luca Trevisan. A new algorithm for the robust semi-random independent set problem. In Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 738–746, 2020. doi:10.1137/1.9781611975994.45.
  • [38] Jacob Steinhardt. Does robustness imply tractability? a lower bound for planted clique in the semi-random model. arXiv e-prints, 2017. doi:10.48550/arXiv.1704.05120.
  • [39] David Zuckerman. Linear degree extractors and the inapproximability of max clique and chromatic number. In Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing, STOC ’06, pages 681–690, New York, NY, USA, 2006. Association for Computing Machinery. doi:10.1145/1132516.1132612.

Appendix A Concentration inequalities

The following bounds can be found in [5] or derived from [16]. Let X1,,Xn be binary random variables. We say that X1,,Xn are non-positively correlated if for all I{1,,n}:

Pr(iIXi=0)iIPr(Xi=0) (49)

and

Pr(iIXi=1)iIPr(Xi=1). (50)

Then:

Lemma 16.

Let X1,,Xn be independent or, more generally, non-positively correlated binary random variables. Let a1,,an[0,1] and X=i=1naiXi. Then, for any ε>0, we have:

Pr(X(1ε)𝔼[X])eε22𝔼[X] (51)

and

Pr(X(1+ε)𝔼[X])eε22+ε𝔼[X] (52)