Abstract 1 Introduction 2 Model and main result 3 Volume and edge density 4 Auxiliary lemmas 5 Edge density and volume results for 𝑮𝒏𝒉 6 Modularity of 𝑮𝒏𝒉 vanishes with 𝒉 7 Concluding remarks References

Modularity of Preferential Attachment Graphs

Katarzyna Rybarczyk ORCID Adam Mickiewicz University, Poznań, Poland Małgorzata Sulkowska ORCID Department of Fundamentals of Computer Science, Wrocław University of Science and Technology, Poland
Abstract

We study a preferential attachment model Gnh. The graph Gnh is generated from a finite initial graph by adding new vertices one at a time. Each new vertex connects to h1 already existing vertices, and these are chosen with probability proportional to their current degrees. We are particularly interested in the community structure of Gnh, which is expressed in terms of the so–called modularity. We prove that the modularity of Gnh is, with high probability, upper bounded by a function that tends to 0 as h tends to infinity. This resolves a conjecture of Prokhorenkova, Prałat, and Raigorodskii from 2016.

As a byproduct, we obtain novel concentration results (which are interesting in their own right) for the volume and edge density parameters of vertex subsets of Gnh. The key ingredient here is the definition of a function μ, which serves as a natural measure for vertex subsets, and is proportional to the average size of their volumes. This extends previous results on the topic by Frieze, Pérez-Giménez, Prałat, and Reiniger from 2019.

Keywords and phrases:
Modularity, preferential attachment model, edge expansion
Copyright and License:
[Uncaptioned image] © Katarzyna Rybarczyk and Małgorzata Sulkowska; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Mathematics of computing Random graphs
; Mathematics of computing Stochastic processes
Related Version:
Full Version: https://arxiv.org/pdf/2501.06771 [33]
Funding:
This research was funded in whole or in part by National Science Centre, Poland, grant OPUS-25 no 2023/49/B/ST6/02517. For the purpose of Open Access, the authors have applied a CC-BY public copyright licence to any Author Accepted Manuscript (AAM) version arising from this submission.
Editors:
Meena Mahajan, Florin Manea, Annabelle McIver, and Nguyễn Kim Thắng

1 Introduction

Real-world networks, ranging from social and information networks to biological and technological infrastructures, often exhibit a rich community structure. Detecting and analyzing such communities has far-reaching applications: identifying groups of common interest in social media, classifying spam and misinformation, retrieving related content, uncovering proteins with similar biological functions, optimizing large-scale infrastructures, improving network visualization, etc. [19, 28].

To model these networks mathematically, preferential attachment graphs have become one of the main paradigms. Their early forms appeared as random recursive trees [22, 35]. However, the in-depth study of preferential attachment models was initiated in 1999 by the work of Barabási and Albert [3], who indicated the applications of such graphs in network modeling. The preferential attachment model was subsequently formally defined and analyzed by Bollobás, Riordan, Spencer, and Tusnády [8], and Bollobás and Riordan in [6, 7]. It relies on two mechanisms: growth (the graph is growing over time, gaining a new vertex and a bunch of h1 edges at each time step) and preferential attachment (an arriving vertex is more likely to attach to other vertices with high degree rather than with low degree), for a precise definition check Section 2. Its degree distribution as well as diameter often fit in with the ones spotted in reality [15, 28]. Nevertheless, an experimental study shows that, unlike real networks, it lacks apparent community structure.

Quantifying community structure itself is a subtle task. Among the many measures proposed, modularity, introduced by Newman and Girvan in 2004 [28, 29], has emerged as a central metric. The vertices of a graph with high modularity may be partitioned into subsets in which there are much more internal edges than we would expect by chance (see Definition 1). Nowadays, modularity is widely used not only as a quality function judging the performance of community detection algorithms [19], but also as a central ingredient of such algorithms, like in the Louvain algorithm [5], the Leiden algorithm [36] or the Tel-Aviv algorithm [13]. Early theoretical results on modularity were given for trees [2] and regular graphs [24]. For a summary of results for various families of graphs check the appendix of [26] by McDiarmid and Skerman from 2020. More recent discoveries include [9] by Chellig, Fountoulakis, and Skerman (for random graphs on the hyperbolic plane), [20] by Lasoń and Sulkowska (for minor-free graphs), [21] by Lichev and Mitsche (for 3-regular graphs and graphs with a given degree sequence), or [32] by Rybarczyk (for random intersection graphs).

Despite being so widely used in practice, modularity still suffers from a narrow theoretical study in the families of random graphs devoted to modeling real-life networks. The first results for the well-known and most studied random graph, the binomial G(n,p), were given by McDiarmid and Skerman just in 2020 [26]. It is commonly known that G(n,p) is a poor fit to real networks [19]. Preferential attachment models perform here much better. Prokhorenkova, Prałat, and Raigorodskii opened the preliminary study on modularity of a standard preferential attachment graph in [30]. They obtained non-trivial upper and lower bounds, however, the gap to close remained big. They conjectured that the modularity of such a graph with high probability tends to 0 with h (the number of edges added per step) tending to infinity (see Conjecture 3). In this paper we prove their conjecture, confirming the supposition that a standard preferential attachment model might have too small modularity to mirror well the behavior of real networks.

As a result, we derive new and interesting concentration results for the volume and edge density parameters of a given subset of vertices in the preferential attachment graph. To this end, we introduce a new function μ, which serves as a natural measure for vertex subsets, and is proportional to the average size of their volumes. These findings are noteworthy on their own and could have potential applications to other problems related to the model in the future. They extend previous results from [12] by Frieze, Pérez-Giménez, Prałat, and Reiniger (see lemmas 3 and 4 therein), which were utilized in the context of Hamilton cycles in the preferential attachment model.

In the following section we give the formal definition of the preferential attachment model and state the main result. Section 3 is devoted to presenting the results regarding the volume and the edge density parameters of subsets of vertices in Gnh. Section 4 is technical, it contains several facts and auxiliary lemmas used in the latter parts of the paper. In Section 5 we derive concentration results stated in Section 3. These results are used in Section 6 to prove the main theorem about vanishing modularity in the standard preferential attachment graph. Section 7 contains concluding remarks.

2 Model and main result

Let denote the set of natural numbers, ={1,2,3,}. For n let [n]={1,2,,n}. For functions f(n) and g(n) we write f(n)g(n) if limnf(n)/g(n)=1. We say that an event occurs with high probability (whp) if the probability [] depends on a certain number n and tends to 1 as n tends to infinity.

All of the graphs considered in this paper are finite, undirected, and loops and multiple edges are allowed. Thus a graph is a pair G=(V,E), where V is a finite set of vertices and E is a finite multiset of elements from V(1)V(2) with V(k) being a set of all k-element subsets of V. Let e(G)=|E| and for S,UV set eG(S)=|{eE(S(1)S(2))| and eG(S,U)=|{eE:eSeU}|. The degree of a vertex vV in G, denoted by degG(v), is the number of edges to which v belongs but loops are counted twice, i.e., degG(v)=2|{eE:veeV(1)}|+|{eE:veeV(2)}|. We define the volume of SV in G by volG(S)=vSdegG(v). By the volume of a graph, vol(G), we understand volG(V). Whenever the context is clear we write e(S) instead of eG(S), e(S,U) instead of eG(S,U), deg(v) instead of degG(v) and vol(S) instead of volG(S).

We focus on a particular random graph model, called here simply the preferential attachment graph (consult [3, 6, 15]). Given h,n, we construct a preferential attachment graph Gnh in two phases. In the first phase we sample a particular random tree Thn, whose vertices are called mini-vertices. (We call Thn a tree, however it might be disconnected, and loops, i.e. single-vertex edges, are allowed in Thn.) Next, the appropriate mini-vertices of Thn are grouped to form vertices of Gnh. Let us describe this procedure in detail.

Phase 1.

We start the whole process with T1 which is a graph consisting of a single mini-vertex 1 with a single loop (thus the degree of vertex 1 is 2). For t1, the graph Tt+1 is built upon Tt by adding a mini-vertex (t+1) and joining it by an edge with a mini-vertex i according to the following probability distribution:

(i=s)={degTt(s)2t+1for1st12t+1fors=t+1.

Note that we allow a newly arrived vertex to connect to itself. We continue the process until we get the random tree Thn.

Phase 2.

A random multigraph Gnh is obtained from Thn by merging each set of mini-vertices {h(i1)+1,h(i1)+2,,h(i1)+h} into a single vertex i for i{1,2,,n}, keeping loops and multiple edges.

Note that if Gnh=(V,E) then V=[n], |V|=n and |E|=hn. Since we will refer very often to the number of edges of Gnh, it will be also denoted by M, i.e., M:=hn. Given Gnh, by Gth, where t[n], we understand the subgraph of Gnh induced by the set of vertices [t].

Our main goal is to upper bound the graph parameter called modularity for Gnh. Its formal definition is given just below.

Definition 1 (Modularity, [29]).

Let G be a graph with at least one edge. For a partition 𝒜 of V define a modularity score of G as

mod𝒜(G)=S𝒜(e(S)e(G)(vol(S)vol(G))2).

Modularity of G is given by

mod(G)=max𝒜mod𝒜(G),

where maximum runs over all the partitions of the set V.

Conventionally, a graph with no edges has the modularity equal to 0. A single summand of the modularity score is the difference between the fraction of edges within S and the expected fraction of edges within S in a certain random multigraph on V with the expected degree sequence given by G (see, e.g., [18]). It is easy to check that mod(G)[0,1).

Non-trivial lower and upper bounds for the modularity of Gnh obtained by Prokhorenkova, Prałat and Raigorodskii in [30] are the following.

Theorem 2 ([30], Theorem 4.2, Section 4.2).

Let Gnh=(V,E) be a preferential attachment graph. Then whp by n

mod(Gnh)=Ωh(1/h)

and whp by n

mod(Gnh)1min{δ(Gnh)/(2h),1/16},

where δ(Gnh)=minSV,1|S||V|/2e(S,VS)|S| is the edge expansion of Gnh.

Applying the results for the edge expansion of Gnh by Mihail, Papadimitriou and Saberi from [27] to the upper bound one obtains that whp mod(Gnh)1O(1/h). Indeed, the gap between the upper and the lower bound remained big. The authors stated the following two conjectures suggesting that the upper bound could be improved.

Conjecture 3 ([30]).

Let Gnh be a preferential attachment graph. Then whp by n

mod(Gnh)h 0.
Conjecture 4 ([30]).

Let Gnh be a preferential attachment graph. Then whp by n

mod(Gnh)=Θh(1/h).

In this paper we present a much better upper bound for the modularity of Gnh than the one from Theorem 2 when h is large, resolving, in the positive, Conjecture 3. Conjecture 4 still remains open. The main result of the paper may be presented as follows.

Theorem 5.

Let Gnh be a preferential attachment graph. Then for every ε>0, whp by n

mod(Gnh)(1+ε)f(h)h,

where

f(h)=6g𝒱(h)+42ln2g𝒱(h)2/h

with

g𝒱(h)=162ln2(9lnh+8ln2)+(2/3)ln2+2.
 Remark.

Note that f(h)32ln2lnh as h thus f(h)/h0 as h. The value of f(h)/h drops below 1 for h810.

Corollary 6.

Let Gnh be a preferential attachment graph. Then whp by n

mod(Gnh)3.54lnh+0.62+19.49h.
 Remark.

The value 3.54lnh+0.62+19.49h drops below 1 for h847.

 Remark.

Some new results on the fact that mod(Gnh) is whp separated from 1 by a constant even for small values of h can be found in [23].

3 Volume and edge density

When talking about Gnh=(V,E) we will very often refer to its corresponding random tree Thn=(V~,E~). Recall that V=[n], V~=[hn] and |E~|=|E|=hn=:M. For SV the corresponding set of its mini-vertices in V~ will be denoted by S~, thus |S~|=h|S|. For i[n] and SV let Si=S[i], in particular Sn=S. Analogously, for i[M] and S~V~ set S~i=S~[i], in particular S~M=S~. Note that for SV we have volGnh(S)=volThn(S~) and eGnh(S)=eThn(S~).

When working with modularity we need to have a control over eGnh(S) and volGnh(S), where SV. Those values depend a lot on the arrival times of vertices from S. To capture this phenomenon we define a special measure μ:2V~[0,), where 2V~ stands for the set of all subsets of V~.

Definition 7 (Measure μ).

Let Gnh=(V,E) be a preferential attachment graph and Thn=(V~,E~) its corresponding random tree. Let SV thus S~V~ is the set of its corresponding mini-vertices. Associate S~ with the set of indicator functions

δiS~={1ifiS~0ifiS~,

where i[M] (whenever the context is clear we write δi instead of δiS~). Define a function μ:2V~[0,) as follows:

μ(S~)=π2j=1MδjS~cj1

with cj=i=1j2i12i for j1 and c0=1.

 Remark.

Let Gnh=(V,E) be a preferential attachment graph, SV and t[M]. Note that

μ(S~t)=π2j=1tδjS~cj1.

We use the measure μ to express the following novel concentration results for volGnh(S), eGnh(S), and eGnh(S,VS), where S is an arbitrary subset of V.

Theorem 8.

Let Gnh=(V,E) be a preferential attachment graph and Thn=(V~,E~) its corresponding random tree. Then for every ε>0 whp by n

SV|vol(S)2Mμ(S~)|(1+ε)g𝒱(h)Mh,

where g𝒱(h)=162ln2(9lnh+8ln2)+(2/3)ln2+2.

Theorem 9.

Let Gnh be a preferential attachment graph and Thn=(V~,E~) its corresponding random tree. Then for every ε>0 whp by n

SV|e(S)μ(S~)2|(1+ε)g(h)Mh,

where

g(h)=g𝒱(h)2+2ln2

with

g𝒱(h)=162ln2(9lnh+8ln2)+(2/3)ln2+2.
Theorem 10.

Let Gnh be a preferential attachment graph and Thn=(V~,E~) its corresponding random tree. Then for every ε>0 whp by n

SV|e(S,VS)2μ(S~)(Mμ(S~))|(1+ε)(32g𝒱(h)+2ln2)Mh,

where

g𝒱(h)=162ln2(9lnh+8ln2)+(2/3)ln2+2.

To grasp the intuition hidden behind the above concentration results, it is helpful to know that there is a relation between the structure of the graph Thn and the structure of a random graph G^ on the vertex set [M] in which every edge {i,j} (for i,j[M]) is present with probability 1/(2ij), independently of the other possible edges (in particular, a loop at vertex i is present with probability 1/(2i), consult Section 4 in [7] by Bollobás and Riordan). We will see that, for any set S~, the values μ(S~) and μ(S~)2 are closely related to the expected value of volG^(S~) and eG^(S~), respectively.

Let S~[M]. The number of inner edges of S~ in G^, eG^(S~), satisfies

𝔼[eG^(S~)] =12iS~jSji12ij+iS~12i=iS~12ijS~ji12j+iS~12i
=(iS~12i)2iS~14i=(iS~12i)2+O(lnM).

Analogously one shows that the number of edges between S~ and [M]S~, eG^(S~,[M]S~), fulfills

𝔼[eG^(S~,[M]S~)]=2(iS~12i)(i[M]S~12i).

Since volG^(S~)=2eG^(S~)+eG^(S~,V~S~) one also gets

𝔼[volG^(S~)]=2(i[M]12i)(iS~12i)+O(lnM).

We will see later (consult Lemma 14) that the value of πcj is asymptotically close to 1/j thus the measure μ is constructed in such a way that μ(S~) mimics the behavior of iS~12i in G^. In particular μ({i})12i for in and μ([M])M. Therefore we may expect that in Thn we will get e(S~)μ(S~)2, e(S~,VS~)2μ(S~)(Mμ(S~)) and vol(S~)2Mμ(S~).

4 Auxiliary lemmas

The current section gathers all technical lemmas needed in the latter parts of the paper.

The concentration results presented in Section 3 and proved in Section 5 are based on two variants of the Azuma-Hoeffding martingale inequality. The first one is standard. We state it as it appears in [16] by Janson, Łuczak and Ruciński.

Lemma 11 (Azuma-Hoeffding inequality, [1, 14]).

If X0,X1,,Xn is a martingale and there exist b1,,bn such that |XjXj1|bj for each j[n], then, for every x>0,

[XnX0+x]exp{x22j=1nbj2}.

The second one is Freedman’s inequality. We state it below in the form very similar to the one presented in Lemma 2.2 of [37] by Warnke (one may consult also [4] by Bennett and Dudek).

Lemma 12 (Freedman’s inequality, [11]).

Let X0,X1,,Xn be a martingale with respect to a filtration 01n. Set Ak=maxi[k](XiXi1) and Wk=i=1kVar[XiXi1|i1]. Then for every λ>0 and W,A>0 we have

[k[n]XkX0+λ,WkW,AkA]exp{λ22W+2Aλ/3}.

Next, we present bounds for the values of cj and an upper bound for the function μ(S~), both introduced in Definition 7. These results will be referred to very often later on. They are derived using Stirling’s approximation.

Lemma 13 (Stirling’s approximation, [31]).

Let n. Then

2πn(ne)nexp{112n+1}<n!<2πn(ne)nexp{112n}.
Lemma 14.

For j1 let cj=i=1j2i12i. Then

exp{18j14144j2}1πjcj1πj.

Proof.

By Stirling’s approximation (Lemma 13) we get

cj=i=1j2i12i=(2j)!22j(j!)21πjexp{124j212j+1}1πj.

Analogously, since 1122j+11122j1144(2j)2,

cj1πjexp{124j14144j2212j}=1πjexp{18j14144j2}.

Lemma 15.

Let Gnh be a preferential attachment graph and Thn=(V~,E~) its corresponding random tree. Let also t[M] and S~V~. Then

μ(S~t)t+12.

Proof of Lemma 15.

By Lemma 14 we get

μ(S~t)π2j=1tcj1π2+12j=2t1j1π2+12+1t12j𝑑jt+12.

Lemmas 16, 17, and 18 are auxiliary calculations for expressions involving the volumes of subsets of V~ (however the reader might not notice the connection with volumes at this point). They will be directly used in Section 5 in the proof of Theorem 9 stating the result on the concentration of eGnh(S) for SV. For the proofs check the full version of this paper [33].

Lemma 16.

Let Gnh be a preferential attachment graph and Thn=(V~,E~) its corresponding random tree. Fix S~V~ and let t0[M] be such that t0=t0(n)n. Then

i=t0+1Mδi2i1=π2i=1M(δici1)2+O(lnM).
Lemma 17.

Let Gnh be a preferential attachment graph and Thn=(V~,E~) its corresponding random tree. Fix S~V~ and let t0[M] be such that t0=t0(n)n. Then

i=t0+1Mδi2i1μ(S~i1)2i1=π2i=1M(δici1j=1i1δjcj1)+O(lnM+t0).
Lemma 18.

Let Gnh be a preferential attachment graph and Thn=(V~,E~) its corresponding random tree. Fix S~V~ and let t0[M]. Then for any constant C>0

i=t0+1MδiC(i1)2i1C2(Mt0).

5 Edge density and volume results for 𝑮𝒏𝒉

In this section we use martingale techniques to prove theorems 8, 9, and 10 stated in Section 3, i.e., we derive concentration results for vol(S), e(S), and e(S,VS) for an arbitrary subset SV in Gnh. A series of results will lead us to Corollary 22 which implies, as a special case, Theorem 8. We start with analyzing the volumes.

Lemma 19.

Let Gnh be a preferential attachment graph. Consider the process of constructing its corresponding random tree Thn=(V~,E~). Fix S~V~ and for t[M] let Zt=volTt(S~t). Set

Z^t=ctZtj=1tδjcj1

(recall that δj and cj were introduced in Definition 7). Let t be a σ-algebra associated with all the events that happened till time t. Then Z^1,Z^2,,Z^M is a martingale with respect to the filtration 1M. Moreover, for t[M1]

|Z^t+1Z^t|2πtandVar[(Z^t+1Z^t)|t]14π(t+1).
 Remark.

The formula for Z^t was inspired by the martingale constructed in Lemma 4 of [12] by Frieze, Prałat, Pérez-Giménez, and Reiniger. The Frieze et al.’s martingale, in contrast to ours, consisted only of the first term, thus of ctZt=(j=1t2j12j)Zt, and therefore addressed only those sets of mini-vertices that formed compact intervals. By introducing the second term, i.e., by subtracting j=1tδjcj1, we are able to handle all types of sets, including those scattered throughout the entire interval [1,M].

Proof.

Let t[M1]. Recall that when mini-vertex (t+1) arrives, it may also connect to itself. Therefore, conditioned on t,

Zt+1={Zt+δt+1+1with probabilityZt+δt+12t+1Zt+δt+1otherwise. (1)

Additionally, since ct=ct+12t+22t+1, we get

𝔼[Z^t+1|t]=𝔼[ct+1Zt+1j=1t+1δjcj1|t]=ct+1(Zt+δt+1+Zt+δt+12t+1)j=1t+1δjcj1=ct+12t+22t+1(Zt+δt+1)j=1t+1δjcj1=ctZtj=1tδjcj1=Z^t,

thus Z^1,,Z^M is a martingale with respect to the filtration 1M. Next, since ct+1=ct2t+12t+2

|Z^t+1Z^t|=|ct+1Zt+1ctZtδt+1ct|=ct|2t+12t+2Zt+1Ztδt+1|=ct|(Zt+1Zt)(Zt+12t+2+δt+1)|.

Note that (Zt+1Zt){0,1,2}. Moreover, the volume of S~t+1 in Tt+1 may be at most 2t+2. Thus Zt+12t+2, which also implies that (Zt+12t+2+δt+1)[0,2]. Now use Lemma 14 and the fact that |ab|max{a,b} for non-negative a,b to get

|Z^t+1Z^t|2ct2πt.

By the fact that Z^1,,Z^M is a martingale with respect to the filtration 1M, ct=2t+22t+1ct+1 and by (1) we also have

Var[(Z^t+1Z^t)|t]=𝔼[(Z^t+1Z^t)2|t]=𝔼[(ct+1Zt+1ctZtδt+1ct)2|t]=ct+12𝔼[(Zt+12t+22t+1(Zt+δt+1))2|t]=ct+12((Zt+δt+1+12t+22t+1(Zt+δt+1))2Zt+δt+12t+1+(Zt+δt+12t+22t+1(Zt+δt+1))2(1Zt+δt+12t+1))=ct+12(Zt+δt+12t+1(1Zt+δt+12t+1))ct+12414π(t+1),

where the last inequalities follow from the fact that Zt+δt+12t+1[0,1] and Lemma 14, respectively. The next proof utilizes the following well-known approximation.

Lemma 20 (See [17]).

Let n and Hn=k=1n1k. Then Hn=lnn+γ+12nαn, where γ0.5772 is known as Euler-Mascheroni constant and 0αn1/(8n2).

Theorem 21.

Let Gnh be a preferential attachment graph and Thn=(V~,E~) its corresponding random tree. Fix S~V~ and let t[M]. Then for every ε>0, for sufficiently large t and for sufficiently large n we get

[|volTt(S~t)2tμ(S~t)|(1+ε)g𝒱(h)th]22(1+ε/2)t/h,

where g𝒱(h)=162ln2(9lnh+8ln2)+(2/3)ln2+2.
(Recall that μ(S~t) was introduced in Definition 7).

Proof.

Throughout the proof we refer to the process of constructing the random tree Thn. For t[M] let t be a σ-algebra associated with all the events that happened till time t. Fix ε>0, set t0=t/h and for j{t0,t0+1,,t} consider

Z^j=cjZji=1jδici1,

where Zj=volTj(S~j) and ci’s and δi’s are as in Definition 7. By Lemma 19 we know that Z^t0,,Z^t is a martingale with respect to the filtration t0t such that |Z^jZ^j1|2π(j1) and Var[(Z^jZ^j1)|j1]14πj. Therefore

maxj{t0+1,,t}(Z^jZ^j1)2π(t01)=2π(t/h)(1+O(1/t))

and, by Lemma 20,

j=t0+1tVar[(Z^jZ^j1)|j1]j=t0+1t14πj14πln(t/t0)+14π18t/h2=14πlnh+O(1/t).

Applying Freedman’s inequality (Lemma 12) to Z^t0,,Z^t with A=2π(t/h)(1+O(1/t)), W=14πlnh+O(1/t) and λ=(1+ε)g¯𝒱(h)πt/h, where g¯𝒱(h)=g𝒱(h)2, we get

[Z^tZ^t0+(1+ε)g¯𝒱(h)πt/h]exp{(1+ε)2g¯𝒱(h)2πth214πlnh+O(1/t)+232π(1+ε)g¯𝒱(h)(1+O(1/t))}exp{(1+ε/2)2g¯𝒱(h)212lnh+43(1+ε/2)g¯𝒱(h)th},

where the last inequality holds for sufficiently large t and n. One can verify that

(1+ε/2)2g¯𝒱(h)212lnh+43(1+ε/2)g¯𝒱(h)(1+ε/2)ln2

thus for sufficiently large t and n we get

[Z^tZ^t0+(1+ε)g¯𝒱(h)πt/h]2(1+ε/2)t/h. (2)

Let us now analyze the complementary event {Z^tZ^t0+(1+ε)g¯𝒱(h)πt/h}. It is equivalent to

{ctZti=1tδici1ct0Zt0i=1t0δici1+(1+ε)g¯𝒱(h)πt/h}

which is

{Zt1cti=1tδici1ct0ctZt01cti=1t0δici1+1ct(1+ε)g¯𝒱(h)πt/h}.

By the definition of μ(S~t), Lemma 14 and Lemma 15 we have

1cti=1tδici1=2πctμ(S~t)2tμ(S~t)e18t+14144t2=2tμ(S~t)(1+O(1/t))=2tμ(S~t)+O(1). (3)

Note that Zt02t0, therefore by Lemma 14

ct0ctZt02tt0(1+O(1/t))=2th+O(1), (4)

and, again by Lemma 14,

1ct(1+ε)g¯𝒱(h)πt/h(1+ε)g¯𝒱(h)th(1+O(1/t))=(1+ε)g¯𝒱(h)th+O(1). (5)

Thus by (3), (4) and (5) we get that the event {Z^tZ^t0+(1+ε)g¯𝒱(h)πt/h} implies

{Zt2tμ(S~t)2th+(1+ε)g¯𝒱(h)th+O(1)}

which, by (2), for sufficiently large t and n, gives (recall that g¯𝒱(h)+2=g𝒱(h))

[volTt(S~t)2tμ(S~t)(1+ε)g𝒱(h)th]2(1+ε/2)t/h. (6)

To get the opposite bound we repeat the reasoning for the martingale Z^t0,,Z^t (check the full version of this paper for the details [33]).

Corollary 22.

Let Gnh=(V,E) be a preferential attachment graph and Thn=(V~,E~) its corresponding random tree. Then for every ε>0 we have

[i{log2n,,n}SV|volGih(Si)2hiμ(S~hi)|(1+ε)g𝒱(h)hih]=1o(1),

where g𝒱(h)=162ln2(9lnh+8ln2)+(2/3)ln2+2.

Proof.

Fix ε>0. Recall that volGih(Si)=volThi(S~hi). For SV and i{log2n,,n} define the event S,i as follows

S,i={|volThi(S~hi)2hiμ(S~hi)|(1+ε)g𝒱(h)hih}.

For i{log2n,,n}, by Theorem 21 and the union bound, for sufficiently large n we have

[SVS,iC]2i22(1+ε/2)i=22(ε/2)i.

Indeed, note that i iterates over the vertices of Gnh and at time i there are 2i possible configurations for Si, thus also for S~hi. Next, again by the union bound, for sufficiently large n we get

[i{log2n,,n}SVS,iC]i=log2nn22(ε/2)i22εlog2n12ε2(12ε)nε,

which implies

[i{log2n,,n}SVS,i]=1o(1).

Note that considering only i=n in Corollary 22 we get the statement of Theorem 8, which finishes its proof.

Now we will again use martingale inequalities which, together with concentration results for volumes from Corollary 22, will lead to the proof of Theorem 9. Consider the process of constructing the random tree Thn=(V~,E~). Let S~V~ and j{1,,M}. The result stated in Corollary 22 gives the concentration of the volumes of S~j at time j only for j=hi, where i{log2n,,n}. In particular, it says nothing about the concentration of the volumes of the sets S~hi+k, where k{1,2,h1}, at time hi+k. Such intermediate concentrations will be needed to prove Theorem 9, i.e., to draw a conclusion about the concentration of the number of edges within S~. We derive those intermediate concentrations in Lemma 23.

Lemma 23.

Let Gnh=(V,E) be a preferential attachment graph and Thn=(V~,E~) its corresponding random tree. Then for every ε>0 we have

[t{log2nh,,M}SV|volTt(S~t)2tμ(S~t)|(1+ε)g𝒱(h)th]=1o(1),

where g𝒱(h)=162ln2(9lnh+8ln2)+(2/3)ln2+2.

Proof.

Fix ε>0. Define the events and 𝒱 as follows

={t{log2nh,,M}SV|volTt(S~t)2tμ(S~t)|(1+ε)g𝒱(h)th}𝒱={j=ih such that i{log2n,,n}SV|volTj(S~j)2jμ(S~j)|(1+ε/2)g𝒱(h)jh}.

By Corollary 22 we know that [𝒱]=1o(1) thus it is enough to show that the event 𝒱 implies the event .

Assume that 𝒱 holds. For t[M] and SV let Zt=volTt(S~t). Consider all j=ih where i{log2n,,n1} and let k{1,2,h1}. By the fact that 1+k/j=1+O(1/j), δ{0,1}, Lemma 14 and Lemma 15 we get

2j+kμ(S~j+k)=2j1+k/j(μ(S~j)+=j+1j+kδc1)2j(1+O(1/j))(μ(S~j)+=j+1j+k1π(1))=(2j+O(1/j))(μ(S~j)+O(1/j))=2jμ(S~j)+O(1).

Therefore, if 𝒱 holds, then for all j=ih where i{log2n,,n1}, for all SV, and for all k{1,2,,h1}, for sufficiently large n on one hand,

Zj+kZj2jμ(S~j)(1+ε/2)g𝒱(h)jh2j+kμ(S~j+k)O(1)(1+ε/2)g𝒱(h)j+kh2j+kμ(S~j+k)(1+ε)g𝒱(h)j+kh

and on the other hand

Zj+kZj+2k2jμ(S~j)+(1+ε/2)g𝒱(h)jh+2h2j+kμ(S~j+k)+(1+ε)g𝒱(h)j+kh.

Thus for sufficiently large n the event 𝒱 implies the event , therefore [𝒱]=1o(1) implies []=1o(1) and the proof is finished.

Now, we move on to the concentration of the number of edges within subsets of V.

Lemma 24.

Let Gnh be a preferential attachment graph. Consider the process of constructing its corresponding random tree Thn=(V~,E~). Fix S~V~ and for t[M] let Xt=e(S~t) and t be a σ-algebra associated with all the events that happened till time t. For j{2,,M} set Dj=𝔼[XjXj1|j1] and define

X^t=Xtj=2tDj.

Then X^1,X^2,,X^M is a martingale with respect to the filtration 1M. Moreover, for t{2,,M}

|X^tX^t1|δt.

Proof.

Let t{2,,M}. Note that

𝔼[X^tX^t1|t1]=𝔼[XtXt1Dt|t1]=𝔼[XtXt1|t1]𝔼[𝔼[XtXt1|t1]|t1]=𝔼[XtXt1|t1]𝔼[XtXt1|t1]=0,

thus X^1,,X^M is a martingale with respect to the filtration 1M.

Now, for t[M] let Zt=volTt(S~t). Recall that when the mini-vertex t arrives it may also connect to itself thus for t{2,,M} we have

Dt=𝔼[XtXt1|t1]=δtZt1+12t1.

Therefore

|X^tX^t1|=|XtXt1Dt|=|XtXt1δtZt1+12t1|max{XtXt1,δtZt1+12t1}δt,

where we used the fact that 0XtXt1δt, Zt12t2 thus 0δtZt12t1δt and |ab|max{a,b} for non-negative a,b.

Lemma 25.

Let Gnh be a preferential attachment graph and Thn=(V~,E~) its corresponding random tree. For t[M] and SV let Zt=volTt(S~t). Then for t0[M] divisible by h and for every ε>0 we have

[SV|e(S)(e(St0/h)+i=t0+1MδiZi1+12i1)|BεMh]=1o(1),

where Bε=(1+ε)2ln2.

Proof.

Throughout the proof we again refer to the process of constructing the random tree Thn. For t[M] let t be a σ-algebra associated with all the events that happened till time t. For SV let Xt=eTt(S~t) and Dt=𝔼[XtXt1|t1]. Fix ε>0 and for j{t0,t0+1,,M} and SV consider

X^j=Xji=2jDi.

By Lemma 24 we know that X^t0,,X^M is a martingale with respect to the filtration t0M such that |X^jX^j1|δj. Moreover

j=t0Mδj2=j=t0MδjM.

Thus applying Azuma-Hoeffding inequality (Lemma 11) to X^t0,,X^M with bj=δj and x=BεMh, where Bε=(1+ε)2ln2 we get

[X^MX^t0+BεMh]exp{(1+ε)2ln2M2/h2M}=2(1+ε)n. (7)

Let us now analyze the event {X^MX^t0+BεMh}. It is equivalent to

{XMi=2MDiXt0i=2t0Di+BεMh}

which is

{XMXt0+i=t0+1MDi+BεMh}

and, by the definition of Di (check the proof of Lemma 24),

{XMXt0+i=t0+1MδiZi1+12i1+BεMh}.

Thus, by (7), we get

[XM(Xt0+i=t0+1MδiZi1+12i1)BεMh]2(1+ε)n. (8)

Acting analogously for the martingale X^t0,,X^M we get the opposite bound (check the full version of this paper for the details [33]).

We are ready to prove Theorem 9.

Proof of Theorem 9.

Throughout the proof we again refer to the process of constructing the random tree Thn. Fix ε>0. For t[M] and SV let Xt=eTt(S~t) and Zt=volTt(S~t). For t0=hlog2n we define the events , and 𝒱 as follows

={SV|e(S)μ(S~)2|(1+ε)g(h)Mh},={SV|e(S)(e(St0/h)+i=t0+1MδiZi1+12i1)|BεMh},𝒱={t{t0,,M}SV|Zt2tμ(S~t)|(1+ε)g𝒱(h)th},

where Bε=(1+ε)2ln2. By lemmas 23 and 25 we know that [𝒱]=1o(1). Thus it is enough to show that the event 𝒱 implies the event .

Assume that 𝒱 holds. By lemmas 16, 17, and 18, for any SV, as t0=O(lnM), we can write

i=t0+1MδiZi1+12i1i=t0+1Mδi2i1+i=t0+1Mδi2i1μ(S~i1)2i1+i=t0+1Mδi(1+ε)g𝒱(h)i1h2i1π2i=1M(δici1)2+π2i=1M(δici1j=1i1δjcj1)+(1+ε)g𝒱(h)2Mt0h+O(lnM)=(π2)2(i=1M(δici1)2+2i=1M(δici1j=1i1δjcj1))+π4i=1M(δici1)2+(1+ε)g𝒱(h)2Mt0h+O(lnM)μ(S~M)2+π4i=1M1π(i1)+(1+ε)g𝒱(h)2Mt0h+O(lnM)=μ(S~M)2+(1+ε)g𝒱(h)2Mt0h+O(lnM),

where the last inequality follows from Lemma 14, the fact that δi{0,1}, and the fact that

μ(S~M)2=(π2i=1Mδici1)2=(π2)2(i=1M(δici1)2+2i=1M(δici1j=1i1δjcj1)).

Analogously, again by lemmas 16, 17, and 18, for any SV we can get

i=t0+1MδiZi1+12i1i=t0+1Mδi2i1+i=t0+1Mδi2i1μ(S~i1)2i1i=t0+1Mδi(1+ε)g𝒱(h)i1h2i1μ(S~M)2(1+ε)g𝒱(h)2Mt0hO(lnM).

Note that for any SV we have e(St0/h)t0 and recall that t0=hlog2n. Thus for all SV, for sufficiently large n we may write (recall that 𝒱 holds)

e(S)t0+BεMh+μ(S~)2+(1+ε)g𝒱(h)2Mt0h+O(lnM)μ(S~)2+(1+ε)2ln2Mh+(1+ε)g𝒱(h)2Mh=μ(S~)2+(1+ε)g(h)Mh,

where the term O(lnM)+t0 vanishes after the second inequality as the factor 1+ε from Bε is replaced by (1+ε). Analogously we get

e(S)BεMh+μ(S~)2(1+ε)g𝒱(h)2Mt0hO(lnM)μ(S~)2(1+ε)g(h)Mh.

This means that for sufficiently large n, the event 𝒱 implies the event . Thus [𝒱]=1o(1) implies []=1o(1), and the proof is finished. Mimicking the reasoning from the proofs of Lemma 24, Lemma 25, and Theorem 9 one can prove Theorem 10. We leave it without the proof details as this would be very repetitive.

6 Modularity of 𝑮𝒏𝒉 vanishes with 𝒉

Recall that the main result of the paper (Theorem 5) states that the modularity of a preferential attachment graph Gnh is with high probability upper bounded by a function tending to 0 with h tending to infinity. The whole current section is devoted to its proof.

The first step in the proof of Theorem 5 follows from an interesting general result on modularity by Dinh and Thai.

Lemma 26 ([10], Lemma 1).

Let G be a graph with at least one edge and let k{1}. Then

mod(G)kk1max𝒜:|𝒜|kmod𝒜(G).

In particular,

mod(G)2max𝒜:|𝒜|2mod𝒜(G).
Corollary 27.

Let G=(V,E) be a graph with at least one edge. Then

mod(G)4maxSV(e(S)e(G)vol(S)2vol(G)2).

Proof.

Consider 2-element partitions of V. We have

max𝒜:|𝒜|=2mod𝒜(G)=max𝒜={S,VS}(e(S)e(G)vol(S)2vol(G)2+e(VS)e(G)vol(VS)2vol(G)2)2maxSV(e(S)e(G)vol(S)2vol(G)2).

The conclusion follows by Lemma 26. (Note that for S=V the argument of the maximum equals 0 thus the bound is non-negative.)

The above corollary frees us from considering all the partitions of V when analyzing modularity of Gnh. We can simply concentrate on upper bounding the values of (e(S)e(Gnh)vol(S)2vol(Gnh)2) over all SV. To do so, we use the concentration results for e(S) and vol(S) obtained in Section 5.

Proof of Theorem 5.

Fix ε>0 and let g(h)=g𝒱(h)2+2ln2. Let us define the events , and 𝒱 as follows

={mod(Gnh)(1+ε)f(h)h},
={SVe(S)μ(S~)2+(1+ε)g(h)Mh},
𝒱={SVvol(S)2Mμ(S~)(1+ε)g𝒱(h)Mh}.

By Theorem 8 and Theorem 9 we know that [𝒱]=1o(1). Thus it is enough to show that the event 𝒱 implies the event .

Assume that 𝒱 holds. Recall that e(Gnh)=M and vol(Gnh)=2M. For any SV we may write

e(S)e(Gnh)vol(S)2vol(Gnh)2=4Me(S)vol(S)24M214M2(4Mμ(S~)2+4(1+ε)g(h)M2h(2Mμ(S~)(1+ε)g𝒱(h)Mh)2)=(1+ε)g(h)h+μ(S~)(1+ε)g𝒱(h)Mh(1+ε)2g𝒱(h)24h(1+ε)(g(h)+g𝒱(h)g𝒱(h)2/(4h))h,

where the last inequality follows from Lemma 15 and is valid for sufficiently large n. By Corollary 27, for sufficiently large n we obtain

mod(Gnh)4maxSV(e(S)e(Gnh)vol(S)2vol(Gnh)2)(1+ε)(4g(h)+4g𝒱(h)g𝒱(h)2/h)h=(1+ε)f(h)h.

We got that for sufficiently large n the event 𝒱 implies the event thus [𝒱]=1o(1) implies []=1o(1) and the proof is finished.

We finish by proving Corollary 6.

Proof of Corollary 6.

The corollary follows from Theorem 5 by considering f~(h)=6g𝒱(h)+42ln2 instead of f(h) in the upper bound (note that 32ln23.54, (8/9)ln20.62 and 4ln2+12+42ln219.49).

7 Concluding remarks

We showed that the modularity of a preferential attachment graph Gnh is, with high probability, upper bounded by a function of the order Θ(lnh/h). This proves Conjecture 3 but means that Conjecture 4, saying that modularity of Gnh is, with high probability, of the order Θ(1/h), still remains open. Note that the term Θ(1/h) can also be seen as Θ(1/d¯h), where d¯h states for the average vertex degree in Gnh. Such behavior of modularity has already been reported in other random graphs. For Gn,r being a random r-regular graph it is known that whp mod(Gn,r)=Θ(1/r) (see [25]). For binomial random graph G(n,p) it is known that mod(G(n,p))=Θ(1/np) when np1 and p is bounded below 1 (see [26, 34]). These might be premises supporting Conjecture 4.

To the best of our knowledge, this paper provides the first concentration results for vol(S), e(S), and e(S,VS) in Gnh, where S can be an arbitrary subset of V. The analogous results obtained so far, e.g. in [7], [12] or [30] always addressed “compact” subsets of vertices, i.e., sets of the form {i,i+1,i+2,,j}. (In Lemma 4 of [12] the authors investigate the volume of any set S[t] of size 1kt at time t but in fact in their proof the volume of S is upper bounded by the volume of the k “oldest” vertices in [t], i.e., the volume of the set [k].) In this paper more accurate analysis was possible thanks to introducing indicator functions δiS~ in Definition 7. We believe that the proof methods leading to results obtained in Section 5 might help in the future to get bounds for edge expansion or conductance of Gnh that are stronger than those currently known.

References

  • [1] K. Azuma. Weighted sums of certain dependent random variables. Tohoku Mathematical Journal, 19(3):357–367, 1967. doi:10.2748/tmj/1178243286.
  • [2] J.P. Bagrow. Communities and bottlenecks: Trees and treelike networks have high modularity. Physical Review E, 85:066118, 2012. doi:10.1103/PhysRevE.85.066118.
  • [3] A.L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999. doi:10.1126/science.286.5439.509.
  • [4] P. Bennett and A. Dudek. A gentle introduction to the differential equation method and dynamic concentration. Discrete Mathematics, 345(12):113071, 2022. doi:10.1016/j.disc.2022.113071.
  • [5] V.D. Blondel, J.L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 2008. doi:10.1088/1742-5468/2008/10/P10008.
  • [6] B. Bollobás and O. Riordan. Handbook of Graphs and Networks: From the Genome to the Internet. Wiley-VCH, 2003. Pages 1–34. doi:10.1002/3527602755.
  • [7] B. Bollobás and O. Riordan. The diameter of a scale-free random graph. Combinatorica, 24:5–34, 2004. doi:10.1007/s00493-004-0002-2.
  • [8] B. Bollobás, O. Riordan, J. Spencer, and G. Tusnády. The degree sequence of a scale-free random graph process. Random Structures & Algorithms, 18(3):279–290, April 2001. doi:10.1002/rsa.1009.
  • [9] J. Chellig, N. Fountoulakis, and F. Skerman. The modularity of random graphs on the hyperbolic plane. Journal of Complex Networks, 10(1):cnab051, 2021. doi:10.1093/comnet/cnab051.
  • [10] T.N. Dinh and M.T. Thai. Finding community structure with performance guarantees in scale-free networks. In 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing, pages 888–891, 2011. doi:10.1109/PASSAT/SocialCom.2011.185.
  • [11] D. Freedman. On tail probabilities for martingales. The Annals of Probability, 3(1):100–118, 1975. doi:10.1214/aop/1176996452.
  • [12] A. Frieze, X. Pérez-Giménez, P. Prałat, and B. Reiniger. Perfect matchings and Hamiltonian cycles in the preferential attachment model. Random Structures & Algorithms, 54(2):258–288, 2019. doi:10.1002/rsa.20778.
  • [13] G. Gilad and R. Sharan. From Leiden to Tel-Aviv University (TAU): exploring clustering solutions via a genetic algorithm. PNAS Nexus, 2(6):pgad180, 2023. doi:10.1093/pnasnexus/pgad180.
  • [14] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963. doi:10.1080/01621459.1963.10500830.
  • [15] R. van der Hofstad. Random Graphs and Complex Networks. Vol.1. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2017. doi:10.1017/9781316779422.
  • [16] S. Janson, T. Łuczak, and A. Ruciński. Random Graphs. John Wiley & Sons, Inc., 2000. doi:10.1002/9781118032718.
  • [17] R. P. Boas Jr. and J. W. Wrench Jr. Partial sums of the harmonic series. The American Mathematical Monthly, 78(8):864–870, 1971. doi:10.2307/2316476.
  • [18] B. Kamiński, V. Poulin, P. Prałat, P. Szufel, and F. Théberge. Clustering via hypergraph modularity. Plos One, 14:e0224307, February 2019. doi:10.1371/journal.pone.0224307.
  • [19] B. Kamiński, P. Prałat, and F.Théberge. Mining Complex Networks. Chapman and Hall/CRC, 2021. doi:10.1201/9781003218869.
  • [20] M. Lasoń and M. Sulkowska. Modularity of minor-free graphs. Journal of Graph Theory, 102(4):728–736, 2023. doi:10.1002/jgt.22896.
  • [21] L. Lichev and D. Mitsche. On the modularity of 3-regular random graphs and random graphs with given degree sequences. Random Structures & Algorithms, 61(4):754–802, 2022. doi:10.1002/rsa.21080.
  • [22] Hosam M. Mahmoud, R. T. Smythe, and J. Szymański. On the structure of random plane-oriented recursive trees and their branches. Random Structures & Algorithms, 4(2):151–176, 1993. doi:10.1002/rsa.3240040204.
  • [23] C. McDiarmid, K. Rybarczyk, F. Skerman, and M. Sulkowska. Note on edge expansion and modularity in preferential attachment graphs, 2025. Preprint.
  • [24] C. McDiarmid and F. Skerman. Modularity in random regular graphs and lattices. Electronic Notes in Discrete Mathematics, 43:431–437, 2013. doi:10.1016/j.endm.2013.07.063.
  • [25] C. McDiarmid and F. Skerman. Modularity of regular and treelike graphs. Journal of Complex Networks, 6(4):596–619, 2018. doi:10.1093/comnet/cnx046.
  • [26] C. McDiarmid and F. Skerman. Modularity of Erdős-Rényi random graphs. Random Structures & Algorithms, 57(1):211–243, 2020. doi:10.1002/rsa.20910.
  • [27] M. Mihail, C. Papadimitriou, and A. Saberi. On certain connectivity properties of the internet topology. Journal of Computer and System Sciences, 72:239–251, 2006. doi:10.1016/j.jcss.2005.06.009.
  • [28] M.E.J. Newman. Networks. Oxford University Press, 2018. doi:10.1093/oso/9780198805090.001.0001.
  • [29] M.E.J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, 2004. doi:10.1103/physreve.69.026113.
  • [30] L. Prokhorenkova, A. Raigorodskii, and P. Prałat. Modularity of complex networks models. Internet Mathematics, 2017. doi:10.24166/im.12.2017.
  • [31] H. Robbins. A remark on Stirling’s formula. The American Mathematical Monthly, 62(1):26–29, 1955. doi:10.2307/2308012.
  • [32] K. Rybarczyk. Modularity of random intersection graphs. In M. Bloznelis, P. Drungilas, B. Kamiński, P. Prałat, M. Šileikis, F. Théberge, and R. Vaicekauskas, editors, Modelling and Mining Networks, pages 30–44, Cham, 2025. Springer Nature Switzerland. doi:10.1007/978-3-031-92898-7_3.
  • [33] K. Rybarczyk and M. Sulkowska. Modularity of preferential attachment graphs, 2025. doi:10.48550/arXiv.2501.06771.
  • [34] K. Rybarczyk and M. Sulkowska. New bounds on the modularity of G(n,p), 2025. doi:10.48550/arXiv.2504.16254.
  • [35] J. Szymański. On a nonuniform random recursive tree. In A. Barlotti, M. Biliotti, A. Cossu, G. Korchmaros, and G. Tallini, editors, Annals of Discrete Mathematics (33), volume 144 of North-Holland Mathematics Studies, pages 297–306. North-Holland, 1987. doi:10.1016/S0304-0208(08)73062-7.
  • [36] V.A. Traag, L. Waltman, and N.J. van Eck. From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports, 9(5233), 2019. doi:10.1038/s41598-019-41695-z.
  • [37] L. Warnke. On the method of typical bounded differences. Combinatorics, Probability & Computing, 25:269–299, 2022. doi:10.1017/S0963548315000103.