Additive Approximation Algorithms for Modularity Maximization

The modularity is a quality function for community detection, introduced by Newman and Girvan (2004). Community detection in graphs is now often conducted through modularity maximization: given an undirected graph $G=(V,E)$, we are asked to find a partition $\mathcal{C}$ of $V$ that maximizes the modularity. Although numerous algorithms have been developed to date, most of them have no theoretical approximation guarantee. Recently, to overcome this issue, the design of modularity maximization algorithms with provable approximation guarantees has attracted significant attention in the computer science community. In this study, we further investigate the approximability of modularity maximization. More specifically, we propose a polynomial-time $\left(\cos\left(\frac{3-\sqrt{5}}{4}\pi\right) - \frac{1+\sqrt{5}}{8}\right)$-additive approximation algorithm for the modularity maximization problem. Note here that $\cos\left(\frac{3-\sqrt{5}}{4}\pi\right) - \frac{1+\sqrt{5}}{8}<0.42084$ holds. This improves the current best additive approximation error of $0.4672$, which was recently provided by Dinh, Li, and Thai (2015). Interestingly, our analysis also demonstrates that the proposed algorithm obtains a nearly-optimal solution for any instance with a very high modularity value. Moreover, we propose a polynomial-time $0.16598$-additive approximation algorithm for the maximum modularity cut problem. It should be noted that this is the first non-trivial approximability result for the problem. Finally, we demonstrate that our approximation algorithm can be extended to some related problems.


Introduction
Identifying community structure is a fundamental primitive in graph mining [16]. Roughly speaking, a community (also referred to as a cluster or module) in a graph is a subset of vertices densely connected to each other but sparsely connected to the vertices outside the subset. Community detection in graphs is a powerful way to discover components that play special roles or possess important functions. For example, consider the graph representing the World Wide Web, where vertices correspond to web pages and edges represent hyperlinks between pages. Communities in this graph are likely to be sets of web pages dealing with the same or similar topics, or sometimes link spam [19]. As another example, consider a protein-protein interaction graph, where vertices correspond to proteins within a cell and edges represent interactions between proteins. Communities in this graph are likely to be sets of proteins that have the same or similar functions within the cell [31].
To date, numerous community detection algorithms have been developed, most of which are designed to maximize a quality function. Quality functions in community detection return a value that represents the community-degree of a given partition of the set of vertices. The best known and most widely used quality function is the modularity, which was introduced by Newman and Girvan [29]. Let $G=(V,E)$ be an undirected graph consisting of $n=|V|$ vertices and $m=|E|$ edges. The modularity of a partition $\mathcal{C}=\{C_1,\ldots,C_k\}$ of $V$ (i.e., $\bigcup_{i=1}^k C_i=V$ and $C_i\cap C_j=\emptyset$ for $i\neq j$) can be written as
$$Q(\mathcal{C})=\sum_{C\in\mathcal{C}}\left[\frac{m_C}{m}-\left(\frac{D_C}{2m}\right)^2\right],$$
where $m_C$ represents the number of edges whose endpoints are both in $C$, and $D_C$ represents the sum of the degrees of the vertices in $C$. The modularity represents the sum, over all communities, of the fraction of edges within the community minus the expected fraction of such edges assuming that the edges are placed at random with the same degree distribution. The modularity is based on the idea that the greater this surplus, the more community-like the partition $\mathcal{C}$.
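As a concrete illustration (not part of the paper's algorithms), the definition above can be evaluated directly from an edge list; the following Python sketch does so, with the function name `modularity` being ours:

```python
# Illustrative sketch (not the paper's algorithm): evaluate the modularity
# Q(C) = sum over communities C of [ m_C / m - (D_C / (2m))^2 ].
def modularity(n, edges, partition):
    """n: number of vertices; edges: list of (u, v) pairs;
    partition: list of disjoint vertex sets covering range(n)."""
    m = len(edges)
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    total = 0.0
    for comm in partition:
        comm = set(comm)
        m_c = sum(1 for u, v in edges if u in comm and v in comm)  # edges inside C
        d_c = sum(deg[i] for i in comm)                            # sum of degrees in C
        total += m_c / m - (d_c / (2 * m)) ** 2
    return total
```

For instance, on two triangles joined by a single edge, the natural bipartition into the two triangles attains $Q=6/7-1/2\approx 0.357$, while the trivial single-community partition attains $Q=0$.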
Although the modularity is known to have some drawbacks (e.g., the resolution limit [17] and degeneracies [21]), community detection is now often conducted through modularity maximization: given an undirected graph G = (V, E), we are asked to find a partition C of V that maximizes the modularity. Note that the modularity maximization problem has no restriction on the number of communities in the output partition; thus, the algorithms are allowed to specify the best number of communities by themselves. Brandes et al. [6] proved that modularity maximization is NP-hard. This implies that unless P = NP, there exists no polynomial-time algorithm that finds a partition with maximum modularity for any instance. A wide variety of applications (and this hardness result) have promoted the development of modularity maximization heuristics. In fact, there are numerous algorithms based on various techniques such as greedy procedure [5,11,29], simulated annealing [22,24], spectral optimization [28,30], extremal optimization [15], and mathematical programming [1,7,8,25]. Although some of them are known to perform well in practice, they have no theoretical approximation guarantee at all.
Recently, to overcome this issue, the design of modularity maximization algorithms with provable approximation guarantees has attracted significant attention in the computer science community. DasGupta and Desai [12] designed a polynomial-time $\epsilon$-additive approximation algorithm for dense graphs (i.e., graphs with $m=\Omega(n^2)$) using an algorithmic version of the regularity lemma [18], where $\epsilon>0$ is an arbitrary constant. Moreover, Dinh, Li, and Thai [13] very recently developed a polynomial-time 0.4672-additive approximation algorithm. This is the first polynomial-time additive approximation algorithm with a non-trivial approximation guarantee for modularity maximization that is applicable to any instance. To our knowledge, this is the current best additive approximation error. Their algorithm is based on the semidefinite programming (SDP) relaxation and the hyperplane separation technique.

Our Contribution
In this study, we further investigate the approximability of modularity maximization. Our contribution can be summarized as follows: 1. We propose a polynomial-time $\left(\cos\left(\frac{3-\sqrt{5}}{4}\pi\right)-\frac{1+\sqrt{5}}{8}\right)$-additive approximation algorithm for the modularity maximization problem. Note that $\cos\left(\frac{3-\sqrt{5}}{4}\pi\right)-\frac{1+\sqrt{5}}{8}<0.42084$ holds; thus, this improves the current best additive approximation error of 0.4672, which was recently provided by Dinh, Li, and Thai [13]. Interestingly, our analysis also demonstrates that the proposed algorithm obtains a nearly-optimal solution for any instance with a very high modularity value.
2. We propose a polynomial-time 0.16598-additive approximation algorithm for the maximum modularity cut problem. It should be noted that this is the first non-trivial approximability result for the problem.
3. We demonstrate that our additive approximation algorithm for the modularity maximization problem can be extended to some related problems.
First result. Let us describe our first result in detail. Our additive approximation algorithm is also based on the SDP relaxation and the hyperplane separation technique. However, as described below, our algorithm is essentially different from the one proposed by Dinh, Li, and Thai [13]. Their algorithm reduces the SDP relaxation for the modularity maximization problem to the one for the MaxAgree problem arising in correlation clustering (e.g., see [3] or [9]) by adding an appropriate constant to the objective function. Then, the algorithm adopts the SDP-based 0.7664-approximation algorithm for the MaxAgree problem, which was developed by Charikar, Guruswami, and Wirth [9]. In fact, the additive approximation error of 0.4672 is derived simply as $2(1-\kappa)$, where $\kappa$ represents the approximation ratio of the SDP-based algorithm for the MaxAgree problem (i.e., $\kappa=0.7664$). It should be noted that the analysis of the SDP-based algorithm for the MaxAgree problem [9] aims at multiplicative approximation rather than additive approximation. As a result, the analysis by Dinh, Li, and Thai [13] leaves a gap in terms of additive approximation.
In contrast, our algorithm does not depend on such a reduction. In fact, our algorithm just solves the SDP relaxation for the modularity maximization problem without any transformation. Moreover, our algorithm employs a hyperplane separation procedure different from the one used in their algorithm. The algorithm by Dinh, Li, and Thai [13] generates 2 and then 3 random hyperplanes to obtain two feasible solutions, and returns the better one. On the other hand, our algorithm chooses an appropriate number of hyperplanes, using information from the optimal solution to the SDP relaxation, so that the lower bound on the expected modularity value is maximized. Note that this modification does not improve the worst-case performance relative to the algorithm by Dinh, Li, and Thai [13]; in fact, as shown in our analysis, their algorithm already achieves an additive approximation error of $\cos\left(\frac{3-\sqrt{5}}{4}\pi\right)-\frac{1+\sqrt{5}}{8}$. However, we demonstrate that the proposed algorithm has a much better lower bound on the expected modularity value for many instances. In particular, for any instance with optimal value close to 1 (a trivial upper bound), our algorithm obtains a nearly-optimal solution. At the end of our analysis, we summarize a lower bound on the expected modularity value with respect to the optimal value of a given instance.
Second result. Here we describe our second result in detail. The modularity maximization problem has no restriction on the number of clusters in the output partition. On the other hand, there also exist a number of problem variants with such a restriction. The maximum modularity cut problem is a typical one: given an undirected graph $G=(V,E)$, we are asked to find a partition $\mathcal{C}$ of $V$ consisting of at most two components (i.e., a bipartition $\mathcal{C}$ of $V$) that maximizes the modularity. This problem appears in many contexts in community detection. For example, several hierarchical divisive heuristics for the modularity maximization problem repeatedly solve this problem either exactly [7,8] or heuristically [1] to obtain a partition $\mathcal{C}$ of $V$. Brandes et al. [6] proved that the maximum modularity cut problem is NP-hard (even on dense graphs). More recently, DasGupta and Desai [12] showed that the problem is NP-hard even on $d$-regular graphs for any fixed $d\geq 9$. However, to our knowledge, there exists no approximability result for the problem.
Our additive approximation algorithm adopts the SDP relaxation and the hyperplane separation technique, which is identical to the subroutine of the hierarchical divisive heuristic proposed by Agarwal and Kempe [1]. Specifically, our algorithm first solves the SDP relaxation for the maximum modularity cut problem (rather than the modularity maximization problem), and then generates a random hyperplane to obtain a feasible solution for the problem. Although the computational experiments by Agarwal and Kempe [1] demonstrate that their hierarchical divisive heuristic maximizes the modularity quite well in practice, the approximation guarantee of the subroutine in terms of the maximum modularity cut was not analyzed. Our analysis shows that the proposed algorithm is a 0.16598-additive approximation algorithm for the maximum modularity cut problem. At the end of our analysis, we again present a lower bound on the expected modularity value with respect to the optimal value of a given instance. This reveals that for any instance with optimal value close to 1/2 (a trivial upper bound in the case of bipartition), our algorithm obtains a nearly-optimal solution.
Third result. Finally, we describe our third result. In addition to the above problem variants with a bounded number of clusters, there are many other variations of modularity maximization [16]. We demonstrate that our additive approximation algorithm for the modularity maximization problem can be extended to the following three problems: the weighted modularity maximization problem [27], the directed modularity maximization problem [23], and Barber's bipartite modularity maximization problem [4].

Related Work
SDP relaxation. The seminal work by Goemans and Williamson [20] has opened the door to the design of approximation algorithms using the SDP relaxation and the hyperplane separation technique. To date, this approach has succeeded in developing approximation algorithms for various NP-hard problems [32]. As mentioned above, Agarwal and Kempe [1] introduced the SDP relaxation for the maximum modularity cut problem. For the original modularity maximization problem, the SDP relaxation was recently used by Dinh, Li, and Thai [13].
Multiplicative approximation algorithms. As mentioned above, the design of approximation algorithms for modularity maximization has recently become an active research area in the computer science community. Indeed, in addition to the additive approximation algorithms described above, there also exist multiplicative approximation algorithms.
DasGupta and Desai [12] designed an $\Omega(1/\log d)$-approximation algorithm for $d$-regular graphs with $d\leq \frac{n}{2\log n}$. Moreover, they developed an approximation algorithm for the weighted modularity maximization problem. The approximation ratio is logarithmic in the maximum weighted degree of the edge-weighted graph (where the edge weights are normalized so that their sum is equal to the number of edges). This algorithm requires that the maximum weighted degree be less than about $\sqrt[5]{n\log n}$. These algorithms are not derived directly from logarithmic approximation algorithms for quadratic forms (e.g., see [2] or [10]), because the quadratic form for modularity maximization has negative diagonal entries. To overcome this difficulty, they designed a more specialized algorithm using a graph decomposition technique.
Dinh and Thai [14] developed multiplicative approximation algorithms for the modularity maximization problem on scale-free graphs with a prescribed degree sequence. In their graphs, the number of vertices with degree $d$ is fixed to some value proportional to $d^{-\gamma}$, where $-\gamma$ is called the power-law exponent. For such scale-free graphs with $\gamma>2$, they developed a polynomial-time constant-factor approximation algorithm whose approximation ratio is expressed in terms of $\zeta(\gamma)$, where $\zeta$ is the Riemann zeta function. For graphs with $1<\gamma\leq 2$, they developed a polynomial-time $\Omega(1/\log n)$-approximation algorithm using the logarithmic approximation algorithm for quadratic forms [10].
Inapproximability results. There are some inapproximability results for the modularity maximization problem. DasGupta and Desai [12] showed that it is NP-hard to obtain a $(1-\epsilon)$-approximate solution for some constant $\epsilon>0$ (even for complements of 3-regular graphs). More recently, Dinh, Li, and Thai [13] proved a much stronger statement: there exists no polynomial-time $(1-\epsilon)$-approximation algorithm for any $\epsilon>0$, unless P = NP. It should be noted that these results concern multiplicative approximation rather than additive approximation. In fact, there exist no inapproximability results in terms of additive approximation for modularity maximization.

Preliminaries
Here we introduce the definitions and notation used in this paper. Let $G=(V,E)$ be an undirected graph consisting of $n=|V|$ vertices and $m=|E|$ edges. Let $P=V\times V$. By a simple calculation, as mentioned in Brandes et al. [6], the modularity can be rewritten as
$$Q(\mathcal{C})=\sum_{(i,j)\in P}\frac{1}{2m}\left(A_{ij}-\frac{d_id_j}{2m}\right)\delta(C(i),C(j)),$$
where $A_{ij}$ is the $(i,j)$ component of the adjacency matrix of $G$, $d_i$ is the degree of $i\in V$, $C(i)$ is the (unique) community to which $i\in V$ belongs, and $\delta$ represents the Kronecker symbol, equal to 1 if the two arguments are identical and 0 otherwise. This form is useful for writing mathematical programming formulations of modularity maximization. For convenience, we define
$$q_{ij}=\frac{1}{2m}\left(A_{ij}-\frac{d_id_j}{2m}\right)\quad\text{for each }(i,j)\in P.$$
We can divide the set $P$ into the following two disjoint subsets:
$$P^+=\{(i,j)\in P : q_{ij}\geq 0\}\quad\text{and}\quad P^-=\{(i,j)\in P : q_{ij}<0\}.$$
Clearly, we have $\sum_{(i,j)\in P}q_{ij}=0$, and thus $\sum_{(i,j)\in P^+}q_{ij}=-\sum_{(i,j)\in P^-}q_{ij}$. We denote this value by $q$, i.e., $q=\sum_{(i,j)\in P^+}q_{ij}$. Note that for any instance, we have $q<1$.
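The quantities $q_{ij}$, $P^+$, $P^-$, and $q$ can be computed directly from these definitions; the following is a minimal sketch (the function name `modularity_coefficients` is ours, not the paper's):

```python
# Illustrative sketch: the coefficients q_ij = (1/2m)(A_ij - d_i d_j / 2m) over
# ordered pairs P = V x V, and q = sum of the nonnegative coefficients.
def modularity_coefficients(n, edges):
    m = len(edges)
    deg = [0] * n
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        adj[u][v] = 1
        adj[v][u] = 1
    qmat = [[(adj[i][j] - deg[i] * deg[j] / (2 * m)) / (2 * m)
             for j in range(n)] for i in range(n)]
    q = sum(qmat[i][j] for i in range(n) for j in range(n) if qmat[i][j] >= 0)
    return qmat, q
```

Since $\sum_{(i,j)\in P}q_{ij}=0$, all entries of the matrix sum to zero; on a triangle, for example, $q=1/3$.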

Paper Organization
This paper is structured as follows. In Section 2, we revisit the SDP relaxation for the modularity maximization problem, and then describe an outline of our algorithm. In Section 3, the approximation guarantee of the proposed algorithm is carefully analyzed. In Section 4, we propose an additive approximation algorithm for the maximum modularity cut problem. We extend our additive approximation algorithm to some related problems in Section 5. Finally, conclusions and future work are presented in Section 6.

Algorithm
In this section, we revisit the SDP relaxation for the modularity maximization problem, and then describe an outline of our algorithm. The modularity maximization problem can be formulated as follows:
$$\text{maximize}\quad \sum_{(i,j)\in P}q_{ij}\,\langle v_i,v_j\rangle\qquad\text{subject to}\quad v_i\in\{e_1,\ldots,e_n\}\quad(\forall i\in V),$$
where $e_k$ ($1\leq k\leq n$) represents the vector that has 1 in the $k$th coordinate and 0 elsewhere. We denote by OPT the optimal value of this original problem. Note that for any instance, we have $\mathrm{OPT}\in[0,1)$. We introduce the following semidefinite relaxation problem (SDP):
$$\text{maximize}\quad \sum_{(i,j)\in P}q_{ij}x_{ij}\qquad\text{subject to}\quad x_{ii}=1\ (\forall i\in V),\quad x_{ij}\geq 0\ (\forall (i,j)\in P),\quad X\in S^n_+,$$
where $S^n_+$ represents the cone of $n\times n$ symmetric positive semidefinite matrices. It is easy to see that every feasible solution $X=(x_{ij})$ of SDP satisfies $x_{ij}\leq 1$ for any $(i,j)\in P$. Although the algorithm by Dinh, Li, and Thai [13] reduces SDP to the relaxation for the MaxAgree problem by adding an appropriate constant to the objective function, our algorithm just solves SDP without any transformation. Let $X^*=(x^*_{ij})$ be an optimal solution to SDP, which can be computed (with an arbitrarily small error) in time polynomial in $n$ and $m$. Using the optimal solution $X^*$, we define the following two values:
$$z^*_+=\frac{1}{q}\sum_{(i,j)\in P^+}q_{ij}x^*_{ij}\quad\text{and}\quad z^*_-=\frac{1}{q}\sum_{(i,j)\in P^-}q_{ij}x^*_{ij},$$
both of which are useful in the analysis of the approximation guarantee of our algorithm. Clearly, we have $0\leq z^*_+\leq 1$ and $-1\leq z^*_-\leq 0$.

Algorithm 1 Hyperplane($k$)
Input: Graph $G=(V,E)$
Output: Partition $\mathcal{C}_k$ of $V$
1: Obtain an optimal solution $X^*=(x^*_{ij})$ to SDP
2: Generate $k$ random hyperplanes and obtain a partition $\mathcal{C}_k=\{C_1,\ldots,C_{2^k}\}$ of $V$
3: return $\mathcal{C}_k$

We apply the hyperplane separation technique to obtain a feasible solution of the modularity maximization problem. Specifically, we consider the following general procedure: generate $k$ random hyperplanes to separate the vectors corresponding to the optimal solution $X^*$, and then obtain a partition $\mathcal{C}_k=\{C_1,\ldots,C_{2^k}\}$ of $V$. For reference, the procedure is described in Algorithm 1. Note that at this point we have not yet specified how to determine the number $k$ of hyperplanes to generate. As revealed in our analysis, we can choose an appropriate number of hyperplanes using the value of $z^*_+$ so that the lower bound on the expected modularity value of the output of Hyperplane($k$) is maximized.
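Solving the SDP itself requires an off-the-shelf solver and is omitted here; assuming unit vectors $v_i$ obtained from a factorization $X^*=V^\top V$ (e.g., a Cholesky factorization), the $k$-hyperplane rounding step can be sketched as follows (all names are illustrative):

```python
import random

# Illustrative sketch of the rounding step only: solving SDP is omitted, and the
# rows `vectors` are assumed to come from a factorization X* = V^T V.
def hyperplane_rounding(vectors, k, seed=0):
    """Group vertices by their sign pattern w.r.t. k random hyperplanes,
    yielding a partition into at most 2^k clusters."""
    rng = random.Random(seed)
    d = len(vectors[0])
    # each hyperplane is represented by a Gaussian normal vector
    normals = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    clusters = {}
    for i, v in enumerate(vectors):
        pattern = tuple(sum(r_t * v_t for r_t, v_t in zip(r, v)) >= 0
                        for r in normals)
        clusters.setdefault(pattern, []).append(i)
    return list(clusters.values())
```

Vertices whose vectors coincide always end up in the same cluster, and the output is always a partition of the vertex set into at most $2^k$ clusters.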

Analysis
In this section, we first analyze the additive approximation error of Hyperplane($k$) for each $k\in\mathbb{Z}_{>0}$. Then, we provide an appropriate number $k^*\in\mathbb{Z}_{>0}$ of hyperplanes to generate, which completes the design of our algorithm. Finally, we present a lower bound on the expected modularity value of the output of Hyperplane($k^*$) with respect to the value of OPT.
When $k$ random hyperplanes are generated independently, the probability that two vertices $i,j\in V$ are in the same cluster is given by
$$\left(1-\frac{\arccos(x^*_{ij})}{\pi}\right)^k,$$
as mentioned in previous works (e.g., see [9] or [20]). For simplicity, we define the function
$$f_k(x)=\left(1-\frac{\arccos(x)}{\pi}\right)^k\quad\text{for }x\in[0,1].$$
Here we present the lower convex envelope of each of $f_k(x)$ and $-f_k(x)$.
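The function $f_k$ is straightforward to compute; a small sketch (the function name `same_cluster_prob` is ours):

```python
import math

# Sketch: f_k(x) = (1 - arccos(x)/pi)^k, the probability that two vertices whose
# vectors have inner product x are not separated by any of k random hyperplanes.
def same_cluster_prob(x, k):
    return (1.0 - math.acos(x) / math.pi) ** k
```

For instance, $f_k(1)=1$ for every $k$ (identical vectors are never separated), while orthogonal vectors survive all $k$ cuts together with probability $2^{-k}$.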
The following lemma lower bounds the expected modularity value of the output of Hyperplane($k$). Here, $\hat{f}_k$ and $\widehat{(-f_k)}$ denote the lower convex envelopes of $f_k$ and $-f_k$ over $[0,1]$, respectively.
Lemma 2. Let $\mathcal{C}_k$ be the output of Hyperplane($k$). For any positive integer $k$, it holds that
$$\mathbb{E}[Q(\mathcal{C}_k)]\geq q\left(\hat{f}_k(z^*_+)+\widehat{(-f_k)}(-z^*_-)\right).$$
Proof. Recall that $C_k(i)$ for each $i\in V$ denotes the (unique) cluster in $\mathcal{C}_k$ that includes the vertex $i$. Note that $\delta(C_k(i),C_k(j))$ for each $(i,j)\in P$ is a random variable that takes 1 with probability $f_k(x^*_{ij})$ and 0 with probability $1-f_k(x^*_{ij})$. The expectation $\mathbb{E}[Q(\mathcal{C}_k)]$ is lower bounded as follows:
$$\mathbb{E}[Q(\mathcal{C}_k)]=\sum_{(i,j)\in P}q_{ij}f_k(x^*_{ij})\geq q\sum_{(i,j)\in P^+}\frac{q_{ij}}{q}\,\hat{f}_k(x^*_{ij})+q\sum_{(i,j)\in P^-}\frac{-q_{ij}}{q}\,\widehat{(-f_k)}(x^*_{ij})\geq q\left(\hat{f}_k(z^*_+)+\widehat{(-f_k)}(-z^*_-)\right),$$
where the last inequality follows from Jensen's inequality.
The following lemma provides an additive approximation error of Hyperplane(k) by evaluating the above lower bound on E[Q(C k )] using the value of OPT.
For simplicity, we define the function $g_k(x)$ for $x\in[0,1]$ as the resulting upper bound on the additive error; the value $\cos\left(\frac{3-\sqrt{5}}{4}\pi\right)$ is the intersection point of the functions $g_2(x)$ and $g_3(x)$. The inequality of the above lemma can then be rewritten as $\mathrm{OPT}-\mathbb{E}[Q(\mathcal{C}_k)]\leq g_k(z^*_+)$. Figure 1 plots this additive approximation error of Hyperplane($k$) with respect to the value of $z^*_+$. As can be seen, the appropriate number of hyperplanes (i.e., the number of hyperplanes that minimizes the additive approximation error) depends on the value of $z^*_+$. Intuitively, we wish to choose $k^{**}$ that satisfies $k^{**}\in\arg\min_{k\in\mathbb{Z}_{>0}}g_k(z^*_+)$. However, it is not clear whether Hyperplane($k^{**}$) runs in polynomial time; in fact, $k^{**}$ tends to infinity as the value of $z^*_+$ approaches 1. Therefore, our algorithm instead chooses
$$k^*\in\arg\min_{k\in\{1,\ldots,\max\{3,\lceil\log_2 n\rceil\}\}}g_k(z^*_+).$$
Our analysis demonstrates that the worst-case performance of Hyperplane(k * ) is exactly the same as that of Hyperplane(k * * ), and moreover, the lower bound on the expected modularity value with respect to the value of OPT is not affected by this change.
The following lemma analyzes the worst-case performance of Hyperplane($k^*$); thus, it provides the additive approximation error of Hyperplane($k^*$).
Lemma 4. It holds that
$$\max_{x\in[0,1]}\ \min_{k\in\{1,\ldots,\max\{3,\lceil\log_2 n\rceil\}\}}g_k(x)=\cos\left(\frac{3-\sqrt{5}}{4}\pi\right)-\frac{1+\sqrt{5}}{8}.$$
Proof. Let $x_0=\cos\left(\frac{3-\sqrt{5}}{4}\pi\right)$. It suffices to show that for any $k\in\mathbb{Z}_{>0}$,
$$g_k(x_0)\geq \cos\left(\frac{3-\sqrt{5}}{4}\pi\right)-\frac{1+\sqrt{5}}{8}.$$
For $k=1,2,3$, and 4, this can be verified by direct calculation. For $k\geq 5$, a direct estimate yields the same bound. Thus, for any $k\in\mathbb{Z}_{>0}$, we obtain the desired inequality at $x_0$. Next, we show that
$$\min\{g_2(x),g_3(x)\}\leq \cos\left(\frac{3-\sqrt{5}}{4}\pi\right)-\frac{1+\sqrt{5}}{8}\quad\text{for all }x\in[0,1].$$
By simple calculation, we obtain the derivatives of $g_2$ and $g_3$. Let us take an arbitrary $x$ with $0\leq x\leq x_0$. Since $g_2'(x)>0$, we have $g_2(x)\leq g_2(x_0)=\cos\left(\frac{3-\sqrt{5}}{4}\pi\right)-\frac{1+\sqrt{5}}{8}$. On the other hand, take $x$ with $x_0\leq x<1$. Since $g_3'(x)<0$, we have $g_3(x)\leq g_3(x_0)=\cos\left(\frac{3-\sqrt{5}}{4}\pi\right)-\frac{1+\sqrt{5}}{8}$. Note finally that $g_3(1)<\cos\left(\frac{3-\sqrt{5}}{4}\pi\right)-\frac{1+\sqrt{5}}{8}$, as desired.
Remark 1. From the proof of the above lemma, it follows directly that the same worst-case bound holds even when $k$ ranges over all of $\mathbb{Z}_{>0}$. This implies that the worst-case performance of Hyperplane($k^{**}$) is no better than that of Hyperplane($k^*$).

Remark 2. Here we consider the algorithm that executes Hyperplane(2) and Hyperplane(3), and then returns the better solution. Note that this algorithm is essentially the same as that proposed by Dinh, Li, and Thai [13]. From the proof of the above lemma, it follows immediately that $\min\{g_2(x),g_3(x)\}\leq\cos\left(\frac{3-\sqrt{5}}{4}\pi\right)-\frac{1+\sqrt{5}}{8}$ for all $x\in[0,1]$. This implies that the algorithm by Dinh, Li, and Thai [13] already has worst-case performance exactly the same as that of Hyperplane($k^*$). However, as shown below, Hyperplane($k^*$) has a much better lower bound on the expected modularity value for many instances.
Finally, we present a lower bound on the expected modularity value of the output of Hyperplane(k * ) with respect to the value of OPT (rather than z * + ). The following lemma is useful to show that the lower bound on the expected modularity value with respect to the value of OPT is not affected by the change from k * * to k * . The proof can be found in Appendix A.
We are now ready to prove the following theorem.
Theorem 1. Let $\mathcal{C}_{k^*}$ be the output of Hyperplane($k^*$). It holds that
$$\mathbb{E}[Q(\mathcal{C}_{k^*})]\geq \mathrm{OPT}-\left(\cos\left(\frac{3-\sqrt{5}}{4}\pi\right)-\frac{1+\sqrt{5}}{8}\right).$$
In particular, if $\mathrm{OPT}\geq \cos\left(\frac{3-\sqrt{5}}{4}\pi\right)$ holds, then
$$\mathbb{E}[Q(\mathcal{C}_{k^*})]\geq \mathrm{OPT}-\min_{k\in\{1,\ldots,\max\{3,\lceil\log_2 n\rceil\}\}}g_k(\mathrm{OPT}).$$
Note here that $q<1$ and $\cos\left(\frac{3-\sqrt{5}}{4}\pi\right)<0.82534$ hold.

Proof. From Lemmas 3 and 4, it follows directly that
$$\mathbb{E}[Q(\mathcal{C}_{k^*})]\geq \mathrm{OPT}-g_{k^*}(z^*_+)\geq \mathrm{OPT}-\left(\cos\left(\frac{3-\sqrt{5}}{4}\pi\right)-\frac{1+\sqrt{5}}{8}\right).$$
Here we prove the remaining part of the theorem. Assume that $\mathrm{OPT}\geq \cos\left(\frac{3-\sqrt{5}}{4}\pi\right)$ holds. By simple calculation, for any $k\in\mathbb{Z}_{>0}$, the second derivative $g_k''(x)$ is negative for $x\in(0,1)$. This means that for any $k\in\mathbb{Z}_{>0}$, the function $g_k(x)$ is strictly concave, and hence so is the function $\min_{k\in\{1,\ldots,\max\{3,\lceil\log_2 n\rceil\}\}}g_k(x)$. From the proof of Lemma 4, the function $\min_{k\in\{1,\ldots,\max\{3,\lceil\log_2 n\rceil\}\}}g_k(x)$ attains its maximum (i.e., $\cos\left(\frac{3-\sqrt{5}}{4}\pi\right)-\frac{1+\sqrt{5}}{8}$) at $x=\cos\left(\frac{3-\sqrt{5}}{4}\pi\right)$. Thus, the function $\min_{k\in\{1,\ldots,\max\{3,\lceil\log_2 n\rceil\}\}}g_k(x)$ is strictly monotonically decreasing over the interval $\left[\cos\left(\frac{3-\sqrt{5}}{4}\pi\right),1\right)$. Therefore, we have
$$\mathbb{E}[Q(\mathcal{C}_{k^*})]\geq \mathrm{OPT}-\min_{k\in\{1,\ldots,\max\{3,\lceil\log_2 n\rceil\}\}}g_k(z^*_+)\geq \mathrm{OPT}-\min_{k\in\{1,\ldots,\max\{3,\lceil\log_2 n\rceil\}\}}g_k(\mathrm{OPT}),$$
where the second inequality follows from $z^*_+\geq \mathrm{OPT}/q>\mathrm{OPT}$, the monotonicity shown above, and Lemma 5.

Figure 2 depicts the above lower bound on $\mathbb{E}[Q(\mathcal{C}_{k^*})]$. As can be seen, if OPT is close to 1, then Hyperplane($k^*$) obtains a nearly-optimal solution. For example, for any instance with $\mathrm{OPT}\geq 0.99900$, it holds that $\mathbb{E}[Q(\mathcal{C}_{k^*})]>0.90193$, i.e., the additive approximation error is less than 0.09807.

Remark 3. The additive approximation error of Hyperplane($k^*$) depends on the value of $q<1$. We see that the smaller the value of $q$, the better the additive approximation error. Thus, it is interesting to find graphs that have a small value of $q$. For instance, for any regular graph $G$ that satisfies $m=\frac{\alpha}{2}n^2$, it holds that $q=1-\alpha$, where $\alpha$ is an arbitrary constant in $(0,1)$. Here we prove the statement. Since $G$ is regular, we have $d_i=2m/n=\alpha n$ for any $i\in V$. Moreover, for any $\{i,j\}\in E$, it holds that
$$q_{ij}=\frac{1}{2m}\left(1-\frac{(\alpha n)^2}{2m}\right)=\frac{1-\alpha}{2m}>0,$$
whereas $q_{ij}<0$ whenever $\{i,j\}\notin E$ or $i=j$. Hence $q=\sum_{(i,j)\in P^+}q_{ij}=2m\cdot\frac{1-\alpha}{2m}=1-\alpha$.
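Remark 3 can be checked numerically on a concrete regular graph: the complete graph $K_n$ is $(n-1)$-regular with $m=\frac{\alpha}{2}n^2$ for $\alpha=(n-1)/n$, so $q$ should equal $1-\alpha=1/n$. The following self-contained sketch recomputes $q$ from its definition (names are illustrative):

```python
from itertools import combinations

# Sanity check of Remark 3 (illustrative): recompute q from its definition and
# compare with 1 - alpha on the complete graph K_n.
def q_value(n, edges):
    m = len(edges)
    deg = [0] * n
    adj = set()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        adj.add((u, v))
        adj.add((v, u))
    q = 0.0
    for i in range(n):
        for j in range(n):
            a = 1 if (i, j) in adj else 0
            q_ij = (a - deg[i] * deg[j] / (2 * m)) / (2 * m)
            if q_ij >= 0:  # only the pairs in P^+ contribute to q
                q += q_ij
    return q

# K_n is (n-1)-regular with m = (alpha/2) n^2 for alpha = (n-1)/n,
# so q should equal 1 - alpha = 1/n.
```

For example, $q(K_6)=1/6$ and $q(K_4)=1/4$, in agreement with the remark.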

Maximum Modularity Cut
In this section, we propose a polynomial-time 0.16598-additive approximation algorithm for the maximum modularity cut problem.

Figure 2: A brief illustration of the lower bound on the expected modularity value of the output of Hyperplane($k^*$) with respect to the value of OPT. For simplicity, we replace $q$ by its upper bound 1. Note that $k\in\arg\min_{k\in\mathbb{Z}_{>0}}g_k(\mathrm{OPT})$.

Algorithm
In this subsection, we revisit the SDP relaxation for the maximum modularity cut problem, and then describe our algorithm. Representing a bipartition by $y_i\in\{-1,+1\}$ for each $i\in V$ (so that $\delta(C(i),C(j))=(1+y_iy_j)/2$), the maximum modularity cut problem can be formulated as follows:
$$\text{maximize}\quad \sum_{(i,j)\in P}q_{ij}\,\frac{1+y_iy_j}{2}\qquad\text{subject to}\quad y_i\in\{-1,+1\}\quad(\forall i\in V).$$
We denote by $\mathrm{OPT}_{\mathrm{cut}}$ the optimal value of this original problem. Note that for any instance, it holds that $\mathrm{OPT}_{\mathrm{cut}}\in[0,1/2]$, as shown in DasGupta and Desai [12]. We introduce the following semidefinite relaxation problem ($\mathrm{SDP}_{\mathrm{cut}}$):
$$\text{maximize}\quad \sum_{(i,j)\in P}q_{ij}\,\frac{1+x_{ij}}{2}\qquad\text{subject to}\quad x_{ii}=1\ (\forall i\in V),\quad X\in S^n_+,$$
where recall that $S^n_+$ represents the cone of $n\times n$ symmetric positive semidefinite matrices. Let $X^*=(x^*_{ij})$ be an optimal solution to $\mathrm{SDP}_{\mathrm{cut}}$, which can be computed (with an arbitrarily small error) in time polynomial in $n$ and $m$. Note here that $x^*_{ij}$ may be negative for $(i,j)\in P$ with $i\neq j$, unlike SDP in the previous section. The objective function value of $X^*$ can be divided into the two terms $qz^*_+$ and $qz^*_-$, where
$$z^*_+=\frac{1}{q}\sum_{(i,j)\in P^+}q_{ij}\,\frac{1+x^*_{ij}}{2}\quad\text{and}\quad z^*_-=\frac{1}{q}\sum_{(i,j)\in P^-}q_{ij}\,\frac{1+x^*_{ij}}{2}.$$

Algorithm 2 Modularity Cut
Input: Graph $G=(V,E)$
Output: Bipartition $\mathcal{C}$ of $V$
1: Obtain an optimal solution $X^*=(x^*_{ij})$ to $\mathrm{SDP}_{\mathrm{cut}}$
2: Generate a random hyperplane and obtain a bipartition $\mathcal{C}=\{C_1,C_2\}$ of $V$
3: return $\mathcal{C}$

We generate a random hyperplane to separate the vectors corresponding to the optimal solution $X^*$, and then obtain a bipartition $\mathcal{C}=\{C_1,C_2\}$ of $V$. For reference, the procedure is described in Algorithm 2. As mentioned above, this algorithm is identical to the subroutine of the hierarchical divisive heuristic for the modularity maximization problem proposed by Agarwal and Kempe [1].
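As with Hyperplane($k$), the SDP solve is omitted here; given unit vectors from a factorization of an optimal solution to the cut relaxation, the single-hyperplane rounding step can be sketched as follows (all names are illustrative):

```python
import random

# Illustrative sketch of the rounding step of Algorithm 2 only: the SDP solve is
# omitted, and `vectors` is assumed to come from a factorization of X*.
def bipartition_by_hyperplane(vectors, seed=0):
    """Split V into two sides by the sign of <r, v_i> for one Gaussian normal r."""
    rng = random.Random(seed)
    d = len(vectors[0])
    r = [rng.gauss(0.0, 1.0) for _ in range(d)]
    side = [sum(r_t * v_t for r_t, v_t in zip(r, v)) >= 0 for v in vectors]
    c1 = [i for i, s in enumerate(side) if s]
    c2 = [i for i, s in enumerate(side) if not s]
    return [c for c in (c1, c2) if c]  # drop an empty side, if any
```

Antipodal vectors are always separated by the hyperplane, while identical vectors always land on the same side.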

Analysis
In this subsection, we show that Algorithm 2 obtains a 0.16598-additive approximate solution for any instance. At the end of our analysis, we present a lower bound on the expected modularity value of the output of Algorithm 2 with respect to the value of OPT cut .
We start with the following lemma (Lemma 6), which shows that $-1\leq z^*_-\leq -1/2$.
In Algorithm 2, the probability that two vertices $i,j\in V$ are in the same cluster is given by $1-\arccos(x^*_{ij})/\pi$. For simplicity, we define the following two functions for $x\in[-1,1]$:
$$p_+(x)=1-\frac{\arccos(x)}{\pi}\quad\text{and}\quad p_-(x)=-\left(1-\frac{\arccos(x)}{\pi}\right).$$
Here we present the lower convex envelope of each of $p_+(x)$ and $p_-(x)$. The following lemma lower bounds the expected modularity value of the output of Algorithm 2.
Lemma 8. Let $\mathcal{C}_{\mathrm{out}}$ be the output of Algorithm 2, and let $\hat{p}_+$ and $\hat{p}_-$ denote the lower convex envelopes of $p_+$ and $p_-$, respectively. It holds that
$$\mathbb{E}[Q(\mathcal{C}_{\mathrm{out}})]\geq q\left(\hat{p}_+(2z^*_+-1)+\hat{p}_-(-1-2z^*_-)\right).$$
Proof. Recall that $C_{\mathrm{out}}(i)$ for each $i\in V$ denotes the (unique) cluster in $\mathcal{C}_{\mathrm{out}}$ that includes the vertex $i$. Note that $\delta(C_{\mathrm{out}}(i),C_{\mathrm{out}}(j))$ for each $(i,j)\in P$ is a random variable that takes 1 with probability $p_+(x^*_{ij})$ and 0 with probability $1-p_+(x^*_{ij})$. The expectation $\mathbb{E}[Q(\mathcal{C}_{\mathrm{out}})]$ is lower bounded as follows:
$$\mathbb{E}[Q(\mathcal{C}_{\mathrm{out}})]=\sum_{(i,j)\in P}q_{ij}\,p_+(x^*_{ij})\geq q\sum_{(i,j)\in P^+}\frac{q_{ij}}{q}\,\hat{p}_+(x^*_{ij})+q\sum_{(i,j)\in P^-}\frac{-q_{ij}}{q}\,\hat{p}_-(x^*_{ij})\geq q\left(\hat{p}_+(2z^*_+-1)+\hat{p}_-(-1-2z^*_-)\right),$$
where the last inequality follows from Jensen's inequality.
The following lemma provides an additive approximation error of Algorithm 2 by evaluating the above lower bound on $\mathbb{E}[Q(\mathcal{C}_{\mathrm{out}})]$ using the value of $\mathrm{OPT}_{\mathrm{cut}}$. Lemma 9. The additive error $\mathrm{OPT}_{\mathrm{cut}}-\mathbb{E}[Q(\mathcal{C}_{\mathrm{out}})]$ can be bounded from above by a function of $z^*_+$ alone. Proof. Since $-1\leq z^*_-\leq -1/2$ by Lemma 6, we have $0\leq -1-2z^*_-\leq 1$, and hence the term of the bound in Lemma 8 involving $p_-$ can be bounded from below on this interval. Thus, recalling $z^*_+ + z^*_-\geq \mathrm{OPT}_{\mathrm{cut}}$, we can bound the term involving $p_+(2z^*_+-1)$ from below using $\mathrm{OPT}_{\mathrm{cut}}$. Combining this with Lemma 8, we obtain the claim, as desired.
For simplicity, we define the function $g(x)$ for $x\in[1/2,1]$ as this upper bound. Then, the inequality of the above lemma can be rewritten as $\mathbb{E}[Q(\mathcal{C}_{\mathrm{out}})]\geq \mathrm{OPT}_{\mathrm{cut}}-g(z^*_+)$. The following lemma analyzes the worst-case performance of Algorithm 2; thus, it provides the additive approximation error of Algorithm 2.
Lemma 10. It holds that $\max_{1/2\leq x\leq 1}g(x)\leq 0.16598$. Proof. If $1/2\leq x\leq(\beta+1)/2$ ($\approx 0.844579$), then $0\leq 2x-1\leq\beta$ holds, and hence $g(x)$ is bounded on this piece by direct calculation. Otherwise (i.e., $(\beta+1)/2<x\leq 1$), it holds that $\beta<2x-1\leq 1$, and the other piece of the envelope applies. Summarizing the above, we have $\max_{1/2\leq x\leq 1}g(x)\leq 0.16598$.

Finally, we present a lower bound on $\mathbb{E}[Q(\mathcal{C}_{\mathrm{out}})]$ with respect to the value of $\mathrm{OPT}_{\mathrm{cut}}$ (rather than $z^*_+$). Specifically, we have the following theorem.

Theorem 2. Let $\mathcal{C}_{\mathrm{out}}$ be the output of Algorithm 2. It holds that
$$\mathbb{E}[Q(\mathcal{C}_{\mathrm{out}})]\geq \mathrm{OPT}_{\mathrm{cut}}-0.16598.$$
In particular, if $\mathrm{OPT}_{\mathrm{cut}}\geq \frac{\sqrt{\pi^2-4}}{2\pi}$ holds, then $\mathbb{E}[Q(\mathcal{C}_{\mathrm{out}})]\geq \mathrm{OPT}_{\mathrm{cut}}-g\left(\mathrm{OPT}_{\mathrm{cut}}+\frac{1}{2}\right)$.

Proof. From Lemmas 9 and 10, it follows immediately that $\mathbb{E}[Q(\mathcal{C}_{\mathrm{out}})]\geq \mathrm{OPT}_{\mathrm{cut}}-0.16598$. Here we prove the remaining part of the theorem. Assume that $\mathrm{OPT}_{\mathrm{cut}}\geq \frac{\sqrt{\pi^2-4}}{2\pi}$ holds. Clearly, the function $g(x)$ is monotonically decreasing over the interval $\left[\frac{1}{2}+\frac{\sqrt{\pi^2-4}}{2\pi},1\right]$. Therefore, we have
$$\mathbb{E}[Q(\mathcal{C}_{\mathrm{out}})]\geq \mathrm{OPT}_{\mathrm{cut}}-g(z^*_+)\geq \mathrm{OPT}_{\mathrm{cut}}-g\left(\mathrm{OPT}_{\mathrm{cut}}+\frac{1}{2}\right),$$
where the second inequality follows from $z^*_+\geq \mathrm{OPT}_{\mathrm{cut}}+1/2$. Figure 4 depicts the above lower bound on $\mathbb{E}[Q(\mathcal{C}_{\mathrm{out}})]$. As can be seen, if $\mathrm{OPT}_{\mathrm{cut}}$ is close to $1/2$, then Algorithm 2 obtains a nearly-optimal solution.

Related Problems
In this section, we demonstrate that our additive approximation algorithm for the modularity maximization problem can be extended to some related problems.

Modularity maximization on edge-weighted graphs. First, we consider community detection in edge-weighted graphs. Let $G=(V,E,w)$ be a weighted undirected graph consisting of $n=|V|$ vertices, $m=|E|$ edges, and a weight function $w:E\to\mathbb{R}_{>0}$. For simplicity, let $w_{ij}=w(\{i,j\})$ and $W=\sum_{\{i,j\}\in E}w_{ij}$. The weighted modularity, which was introduced by Newman [27], can be written as
$$Q_w(\mathcal{C})=\sum_{(i,j)\in P}\frac{1}{2W}\left(w_{ij}-\frac{s_is_j}{2W}\right)\delta(C(i),C(j)),$$
where $w_{ij}=0$ for $\{i,j\}\notin E$, and $s_i$ represents the weighted degree of $i\in V$ (i.e., $s_i=\sum_{j:\{i,j\}\in E}w_{ij}$). We consider the weighted modularity maximization problem: given a weighted undirected graph $G=(V,E,w)$, we are asked to find a partition $\mathcal{C}$ of $V$ that maximizes the weighted modularity. Since this problem is a generalization of the modularity maximization problem, it is also NP-hard.
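The weighted modularity can be evaluated directly from the definition; the following minimal sketch (names are ours) reduces to the unweighted modularity when all weights are 1:

```python
# Illustrative sketch: evaluate the weighted modularity Q_w of a partition.
def weighted_modularity(n, wedges, partition):
    """wedges: list of (u, v, w) triples with w > 0."""
    W = sum(w for _, _, w in wedges)
    s = [0.0] * n  # weighted degrees
    for u, v, w in wedges:
        s[u] += w
        s[v] += w
    total = 0.0
    for comm in partition:
        comm = set(comm)
        w_c = sum(w for u, v, w in wedges if u in comm and v in comm)
        s_c = sum(s[i] for i in comm)
        total += w_c / W - (s_c / (2 * W)) ** 2
    return total
```

With unit weights, $W=m$ and $s_i=d_i$, so the value coincides with $Q(\mathcal{C})$.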
Our additive approximation algorithm for the modularity maximization problem can be generalized to the weighted modularity maximization problem. In fact, it suffices to set
$$q_{ij}=\frac{1}{2W}\left(w_{ij}-\frac{s_is_j}{2W}\right)\quad\text{for each }(i,j)\in P.$$
The analysis of the additive approximation error is similar; thus, we have the following corollary.

Corollary 1. There exists a polynomial-time $\left(\cos\left(\frac{3-\sqrt{5}}{4}\pi\right)-\frac{1+\sqrt{5}}{8}\right)$-additive approximation algorithm for the weighted modularity maximization problem.
Modularity maximization on directed graphs. Next, we consider community detection in directed graphs. Let $G=(V,A)$ be a directed graph consisting of $n=|V|$ vertices and $m=|A|$ arcs. The directed modularity, which was introduced by Leicht and Newman [23], can be written as
$$Q_d(\mathcal{C})=\frac{1}{m}\sum_{(i,j)\in P}\left(A_{ij}-\frac{d^{\mathrm{in}}_id^{\mathrm{out}}_j}{m}\right)\delta(C(i),C(j)),$$
where $A_{ij}$ is the $(i,j)$ component of the (directed) adjacency matrix of $G$, and $d^{\mathrm{in}}_i$ and $d^{\mathrm{out}}_i$, respectively, represent the in-degree and out-degree of $i\in V$. Note that there is no factor of 2 in the denominators, unlike the (undirected) modularity; this is due to the directed counterpart of the null model used in the definition [23]. We consider the directed modularity maximization problem: given a directed graph $G=(V,A)$, we are asked to find a partition $\mathcal{C}$ of $V$ that maximizes the directed modularity. As mentioned in DasGupta and Desai [12], this problem is also a generalization of the modularity maximization problem, and thus NP-hard.
Our additive approximation algorithm for the modularity maximization problem can also be generalized to the directed modularity maximization problem. In fact, it suffices to set the symmetrized coefficients
$$q_{ij}=\frac{1}{2m}\left(A_{ij}+A_{ji}-\frac{d^{\mathrm{in}}_id^{\mathrm{out}}_j+d^{\mathrm{in}}_jd^{\mathrm{out}}_i}{m}\right)\quad\text{for each }(i,j)\in P.$$
The analysis of the additive approximation error is also similar; thus, we have the following corollary.

Corollary 2. There exists a polynomial-time $\left(\cos\left(\frac{3-\sqrt{5}}{4}\pi\right)-\frac{1+\sqrt{5}}{8}\right)$-additive approximation algorithm for the directed modularity maximization problem.
Barber's bipartite modularity maximization. Finally, we consider community detection in bipartite graphs. Let G = (V, E) be an undirected bipartite graph consisting of n = |V | vertices and m = |E| edges, where V can be divided into V 1 and V 2 so that every edge in E has one endpoint in V 1 and the other in V 2 . Although the modularity is applicable to community detection in bipartite graphs, the null model used in the definition does not reflect the structural property of bipartite graphs. Thus, if we know that the input graphs are bipartite, the modularity is not an appropriate quality function.
To overcome this concern, Barber [4] introduced a variant of the modularity for community detection in bipartite graphs, called the bipartite modularity. The bipartite modularity can be written as
$$Q_b(\mathcal{C})=\frac{1}{m}\sum_{i\in V_1}\sum_{j\in V_2}\left(A_{ij}-\frac{d_id_j}{m}\right)\delta(C(i),C(j)).$$
Note again that there is no factor of 2 in the denominators; this is due to the bipartite counterpart of the null model used in the definition [4]. We consider Barber's bipartite modularity maximization problem: given an undirected bipartite graph $G=(V,E)$, we are asked to find a partition $\mathcal{C}$ of $V$ that maximizes the bipartite modularity. This problem is known to be NP-hard [26]. Our additive approximation algorithm for the modularity maximization problem is applicable to Barber's bipartite modularity maximization problem. For each $i,j\in V$, we set
$$q_{ij}=\begin{cases}\dfrac{1}{2m}\left(A_{ij}-\dfrac{d_id_j}{m}\right)&\text{if }i\in V_1,\ j\in V_2\ \text{or}\ i\in V_2,\ j\in V_1,\\[4pt]0&\text{otherwise.}\end{cases}$$
The analysis of the additive approximation error is again similar; thus, we have the following corollary.

Corollary 3. There exists a polynomial-time $\left(\cos\left(\frac{3-\sqrt{5}}{4}\pi\right)-\frac{1+\sqrt{5}}{8}\right)$-additive approximation algorithm for Barber's bipartite modularity maximization problem.

Conclusions
In this study, we have investigated the approximability of modularity maximization. Specifically, we have proposed a polynomial-time $\left(\cos\left(\frac{3-\sqrt{5}}{4}\pi\right)-\frac{1+\sqrt{5}}{8}\right)$-additive approximation algorithm for the modularity maximization problem. Note that $\cos\left(\frac{3-\sqrt{5}}{4}\pi\right)-\frac{1+\sqrt{5}}{8}<0.42084$ holds; thus, this improves the current best additive approximation error of 0.4672, which was recently provided by Dinh, Li, and Thai [13]. Interestingly, our analysis has also demonstrated that the proposed algorithm obtains a nearly-optimal solution for any instance with a very high modularity value. Moreover, we have proposed a polynomial-time 0.16598-additive approximation algorithm for the maximum modularity cut problem. It should be noted that this is the first non-trivial approximability result for the problem. Finally, we have demonstrated that our additive approximation algorithm for the modularity maximization problem can be extended to some related problems.
There are several directions for future research. It is quite interesting to investigate additive approximation algorithms for the modularity maximization problem more deeply. For example, it is challenging to design an algorithm that has a better additive approximation error than that of Hyperplane($k^*$). As another approach, is it possible to improve the additive approximation error of Hyperplane($k^*$) by a completely different analysis? Our analysis implies that if we lower bound the expectation $\mathbb{E}[Q(\mathcal{C}_k)]$ by the form in Lemma 3, our additive approximation error of $\cos\left(\frac{3-\sqrt{5}}{4}\pi\right)-\frac{1+\sqrt{5}}{8}$ is the best possible. As another future direction, the inapproximability of the modularity maximization problem in terms of additive approximation should be investigated, as mentioned in Dinh, Li, and Thai [13]. Does there exist some constant $\epsilon>0$ such that computing an $\epsilon$-additive approximate solution for the modularity maximization problem is NP-hard?