
A 0.51-Approximation of Maximum Matching in Sublinear $n^{1.5}$ Time

Sepideh Mahabadi, Microsoft Research, Redmond, WA, USA; Mohammad Roghani, Stanford University, CA, USA; Jakub Tarnawski, Microsoft Research, Redmond, WA, USA
Abstract

We study the problem of estimating the size of a maximum matching in sublinear time. The problem has been studied extensively in the literature, and various algorithms and lower bounds are known for it. Our result is a 0.5109-approximation algorithm with a running time of $\tilde{O}(n\sqrt{n})$.

All previous algorithms either provide only a marginal improvement (e.g., $2^{-280}$) over the 0.5-approximation that arises from estimating a maximal matching, or have a running time that is nearly $n^2$. Our approach is also arguably much simpler than other algorithms beating the 0.5-approximation.

Keywords and phrases:
Sublinear Algorithms, Maximum Matching, Maximal Matching, Approximation Algorithm
Category:
Track A: Algorithms, Complexity and Games
Copyright and License:
© Sepideh Mahabadi, Mohammad Roghani, and Jakub Tarnawski; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Streaming, sublinear and near linear time algorithms
Acknowledgements:
This work was done while Mohammad Roghani was an intern at Microsoft Research. The work was conducted in part while Sepideh Mahabadi was long-term visitor at the Simons Institute for the Theory of Computing as part of the Sublinear Algorithms program.
Related Version:
Full Version: https://arxiv.org/pdf/2506.01669
Editors:
Keren Censor-Hillel, Fabrizio Grandoni, Joël Ouaknine, and Gabriele Puppis

1 Introduction

Given a graph $G=(V,E)$, a matching is a set of edges with no common endpoints, and the maximum matching problem asks for a largest such set. Matching is a fundamental combinatorial optimization problem and a benchmark for new algorithmic techniques in all major computational models. It also has a wide range of applications, such as ad allocation, social network recommendations, and information retrieval, among others. Given that many of these applications need to handle large volumes of data, the study of sublinear time algorithms for estimating the maximum matching size has received considerable attention over the past two decades.¹ A sublinear time algorithm is not allowed to read the entire graph, which would take $\Omega(n^2)$ time where $n=|V|$; instead, it is provided oracle access to the input graph and must run in $o(n^2)$ time. There are two main oracles for graph problems considered in the literature, both of which we also consider in this work.

¹ It is impossible to find the edges of any constant-approximate matching in time sublinear in the size of the input [20].

  • Adjacency list oracle. Here, the algorithm can query a pair $(v,i)$, where $v\in V$ and $i\in[n]$, and the oracle reports the $i$-th neighbor of the vertex $v$ in its adjacency list, or NULL if $i$ is larger than the number of $v$'s neighbors.

  • Adjacency matrix oracle. Here, the algorithm can query a pair $(u,v)$, where $u,v\in V$, and the oracle reports whether there exists an edge between $u$ and $v$.
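As a concrete reference point, the two query models can be captured by minimal in-memory stand-ins (the class and method names below are ours, purely illustrative):

```python
class AdjListOracle:
    """Adjacency-list oracle: query(v, i) returns the i-th neighbor of v
    (1-indexed, as in the text), or None if i exceeds v's degree."""
    def __init__(self, adj):           # adj: dict vertex -> list of neighbors
        self.adj = adj

    def query(self, v, i):
        nbrs = self.adj[v]
        return nbrs[i - 1] if i <= len(nbrs) else None


class AdjMatrixOracle:
    """Adjacency-matrix oracle: query(u, v) reports whether edge {u, v} exists."""
    def __init__(self, edges):
        self.edges = {frozenset(e) for e in edges}

    def query(self, u, v):
        return frozenset((u, v)) in self.edges
```

A sublinear algorithm interacts with the graph only through such queries, each charged unit cost.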

Earlier results on estimating the size of the maximum matching in sublinear time mostly focused on graphs with bounded degree $\Delta$, starting with the pioneering work of Parnas and Ron [20], and later works of [19, 22, 21, 1, 17, 14]. However, for $\Delta=\Omega(n)$ these do not lead to sublinear time algorithms. Thus, later works focused on general graphs with arbitrary maximum degree and managed to obtain sublinear time algorithms for them [13, 15, 3, 8, 11, 9, 10]. In particular, the state-of-the-art results can be categorized into two regimes:

  • Algorithms that run in slightly sublinear time, i.e., $n^{2-\Omega_\varepsilon(1)}$. For example, the works of [6, 11] gave a $(2/3-\varepsilon)$-approximation algorithm that runs in time $n^{2-\Omega_\varepsilon(1)}$. Later, [10] improved the approximation factor to $1-\varepsilon$.

  • Algorithms whose approximation factor is $0.5+\varepsilon$. The state-of-the-art result in this category is the work of [8], whose running time is $n^{1+\varepsilon}$ for an approximation factor of $0.5+\Omega_\varepsilon(1)$. However, the best approximation factor one can get using their trade-off is only $0.5+2^{-280}$.² Indeed, they mention: "We do not expect our techniques to lead to a better than, say, 0.51-approximation in $n^{2-\Omega(1)}$ time."

² More specifically, for $\varepsilon\in(0,1/4)$, they get an algorithm with approximation factor $0.5+2^{-70/\varepsilon}$ that runs in time $O(n^{1+\varepsilon})$.

Thus, all known algorithms either give only a marginal improvement over the 0.5-approximation arising from estimating the size of a maximal matching (which can be done even in $\tilde{O}(n)$ time [3]), or have a running time that is nearly $n^2$. Making a significant improvement on both fronts simultaneously has remained an elusive open question.

Our results.

In this work, we show how to beat the factor 0.5 in time that is strongly sublinear. Specifically, we present an algorithm that runs in $\tilde{O}(n\sqrt{n})$ time and achieves an approximation factor of 0.5109 for estimating the size of the maximum matching.

It is worth noting that our algorithm is much simpler, both in terms of implementation and analysis, compared to the prior works that beat 0.5-approximation [8].

Theorem 1.

There exists an algorithm that, given access to the adjacency list of a graph, estimates the size of the maximum matching with a multiplicative approximation factor of 0.5109 and runs in $\tilde{O}(n\sqrt{n})$ time with high probability.

Theorem 2.

There exists an algorithm that, given access to the adjacency matrix of a graph, estimates the size of the maximum matching with a multiplicative-additive approximation factor of $(0.5109, o(n))$ and runs in $\tilde{O}(n\sqrt{n})$ time with high probability.

Moreover, our algorithm can be employed as a subroutine for Theorem 2 of [4] to obtain an improved approximation ratio in the dynamic setting. More precisely, that result requires a subroutine for estimating the maximum matching size in $\tilde{O}(n\sqrt{n})$ time, for which it uses the 0.5-approximation of [3]. Our algorithm can be used instead, resulting in a very slight improvement to the overall approximation guarantee for [4].

We note that the framework of [10] can also be used to obtain a similar result. Their algorithm, which performs a single iteration to find a constant fraction of augmenting paths of length three on top of a maximal matching, likewise yields a better-than-1/2-approximate matching with $n^{2-\Omega(1)}$ running time. However, the trade-off in this approach is worse in terms of both the approximation ratio and the running time.

Related work.

On the lower bound front, Parnas and Ron [20] demonstrated that any algorithm getting a constant approximation of the maximum matching size needs at least $\Omega(n)$ time. More recently, the work of [6] established that any algorithm providing a $(2/3+\Omega(1),\varepsilon n)$-multiplicative-additive approximation requires at least $n^{6/5-o(1)}$ time. For sparse graphs, a lower bound of $\Delta^{\Omega(1/\varepsilon)}$ was shown for any $\varepsilon n$ additive approximation [5]. For dense graphs, [7] showed a lower bound of $n^{2-O_\varepsilon(1)}$ on the runtime of algorithms achieving such additive approximations.

Paper organization.

In Section 2, we provide an overview of the challenges encountered while designing our algorithm and the techniques used to address them. We first develop an algorithm for bipartite graphs with a multiplicative-additive error in Section 4, avoiding additional challenges that arise from general graphs, from obtaining a multiplicative guarantee (in the adjacency list model), or from working with the adjacency matrix. In the full version of the paper, we extend our algorithm to handle general graphs. In addition, we show how to obtain a multiplicative approximation guarantee. Finally, we present a simple reduction demonstrating that our algorithm also applies in the adjacency matrix model, achieving a multiplicative-additive approximation.

2 Technical Overview

In this section, we provide an overview of the techniques used in this paper to design our algorithm. We begin with the two-pass semi-streaming algorithm of Konrad and Naidu [16] for bipartite graphs. In the first pass, the algorithm constructs a maximal matching $M$. In the second pass, it constructs a maximal $b$-matching between vertices matched in $M$ and those unmatched in $M$. More specifically, each vertex in $V(M)$ has a capacity of $k$, while each vertex in $V\setminus V(M)$ has a capacity of $\lfloor kb\rfloor$, where $b=1+\sqrt{2}$ and $k$ is a large constant. The idea is that if $M$ is far from maximum, the $b$-matching will contain many length-3 augmenting paths that can be used to augment $M$. This algorithm obtains a $(2-\sqrt{2})\approx 0.585$-approximation.

Our goal is to develop a sublinear-time algorithm by translating this semi-streaming two-pass algorithm to the sublinear time model. When trying to do so, several challenges arise. In this section we describe them step by step, and show how to overcome them.

Challenge (1): constructing a maximal matching in sublinear time is not possible.

In fact, finding all edges of any constant-factor approximation of the maximum matching is impossible in sublinear time due to [20]. Dynamic algorithms for maximum matching [4, 12] use the following approach: they maintain a maximal matching $M$ and then apply the sublinear-time random greedy maximal matching (RGMM) algorithm of Behnezhad [3] to estimate the size of the maximal $b$-matching. In our setting, we cannot afford to explicitly construct $M$. However, we can obtain oracle access to $M$ using the sublinear-time RGMM algorithm of [3]. More specifically, we can query whether a vertex $v$ is matched in $M$ or not in $\tilde{O}(n)$ time. Therefore, a possible solution to address the first challenge is to design two nested oracles: the outer oracle attempts to build a maximal $b$-matching, whereas the inner oracle checks the status of vertices (matched or not in $M$) to correctly filter edges and assign capacities to each vertex.

Challenge (2): two nested oracles require $\Omega(n^2)$ time.

The algorithm of [3] runs in $\tilde{O}(\bar{d}(G))$ time, where $\bar{d}(G)$ denotes the average degree of the graph $G$. Additionally, the outer oracle requires $\tilde{O}(\bar{d}(G[V(M),V\setminus V(M)]))$ queries to the inner oracle. Unfortunately, it is possible for both $\bar{d}(G)$ and $\bar{d}(G[V(M),V\setminus V(M)])$ to be as large as $\Omega(n)$. For example, consider a graph with a vertex set $A\cup B$, where $|A|=|B|=n/2$. The edges within $A$ form a complete bipartite graph, while there is an $\varepsilon n/2$-regular graph between $A$ and $B$. After running the RGMM algorithm, most edges in the maximal matching belong to $G[A]$, and most vertices in $B$ are unmatched. Consequently, we have $\bar{d}(G[V(M),V\setminus V(M)])=\Omega(n)$.

To address this issue, we sparsify the original graph while manually constructing a matching $M$. In a preprocessing step, starting from an empty $M$, for each unmatched vertex in the graph we sample $\tilde{\Theta}(\sqrt{n})$ neighbors uniformly at random. If an unmatched neighbor exists, we match the two vertices and add this edge to $M$. Using this preprocessing step, we show that after spending $\tilde{O}(n\sqrt{n})$ time, the induced subgraph of vertices that remain unmatched in $M$ has maximum degree $\sqrt{n}$ with high probability. Moreover, since we explicitly materialize $M$, we are able to check whether any vertex is matched in $M$ in $O(1)$ time, eliminating the need for costly oracle calls. Note that $M$ need not be maximal in $G$; therefore we next extend it to a maximal matching.

Let $M'$ be a maximal matching in $G[V\setminus V(M)]$ obtained by running the sublinear time RGMM algorithm of [3]. Now $M\cup M'$ is a maximal matching for $G$. Inspired by the two-pass semi-streaming algorithm of [16], we attempt to augment the maximal matching $M\cup M'$ in two possible ways (see also Figure 1):

  1. Augment $M'$ using a $b$-matching between $V(M')$ and $V\setminus V(M)\setminus V(M')$. The algorithm then outputs the size of the augmented matching, plus the size of the previously constructed matching $M$.

  2. Augment $M$ using a $b$-matching between $V(M)$ and $V\setminus V(M)$. The algorithm then outputs the size of the augmented matching.

Figure 1: Our algorithm explicitly constructs a matching $M$ (blue), which need not be maximal in $G$. We extend it with another matching $M'$ (red), such that $M\cup M'$ is maximal. The highlighted (light blue) subgraph $G[V\setminus V(M)]$ has degree at most $\sqrt{n}$ with high probability. In case 1, our algorithm augments $M'$ using a $b$-matching $B_1$ (zigzag edges, brown). In case 2, our algorithm augments $M$ using a $b$-matching $B_2$ (swirly edges, green).

The key intuition here (think of the case when $M\cup M'$ yields only a 0.5-approximation) is that either $M'$ is sufficiently large, making $|M|+(2-\sqrt{2})\cdot 2|M'|$ larger than the approximation guarantee, or $M$ itself is large enough so that $(2-\sqrt{2})\cdot 2|M|$ meets our approximation guarantee (note that $2-\sqrt{2}$ is the approximation guarantee of [16]). Augmenting $M$ using a $b$-matching is easier, since we have explicit access to $M$ and only need to run a single RGMM oracle to estimate the size of the $b$-matching. Our first estimate, which requires finding a $b$-matching between $V(M')$ and $V\setminus V(M)\setminus V(M')$, is more challenging, since we do not have explicit access to $M'$. To avoid the $\Omega(n^2)$ running time of the two nested oracles, we make crucial use of the aforementioned property that the subgraph of vertices unmatched in $M$ has low induced degree (at most $\sqrt{n}$); this is the reason why we only try to augment $M'$ rather than $M\cup M'$. We will discuss this in the next paragraphs.

Challenge (3): the algorithm does not have access to the adjacency list of $G[V\setminus V(M)]$.

After the sparsification step, the average degree $d$ of $G[V\setminus V(M)]$ is at most $\sqrt{n}$. Hence, if the algorithm had access to the adjacency list of $G[V\setminus V(M)]$, it could run the nested oracles in $\tilde{O}(d^2)=\tilde{O}(n)$ time by executing two RGMM algorithms: an inner oracle for computing $M'$ and an outer oracle for the $b$-matching used to augment $M'$. But since the nested oracles may visit up to $n$ vertices, and retrieving the full adjacency list of a vertex of $G[V\setminus V(M)]$ requires $\Omega(n)$ time, it seems that the overall running time of the algorithm could still be as high as $\Omega(n^2)$.

Here, we leverage two key properties of the RGMM algorithm to refine the runtime analysis. The first property is that at each step, the algorithm requires only a random neighbor of the currently visited vertex. Intuitively, if a vertex has degree $\Theta(n)$ in $G$, in expectation it takes $O(n/d)$ samples from the adjacency list of the original graph to encounter a vertex from $G[V\setminus V(M)]$. Thus, if all vertices in $G[V\setminus V(M)]$ had degree $d$, one could easily argue that the running time of the algorithm is $\tilde{O}(d^2\cdot n/d)=\tilde{O}(n\sqrt{n})$. However, vertex degrees can vary, and for a vertex with constant degree, we would need $\Omega(n)$ samples from the adjacency list of $G$ to find a single neighbor in $G[V\setminus V(M)]$. To address this challenge, we utilize another property of the RGMM algorithm, recently proven by [18]. Informally, this result shows that during oracle calls for RGMM, a vertex is visited proportionally to its degree, implying that low-degree vertices are visited only a small number of times.

Challenge (4): the outer oracle creates biased inner-oracle queries.

The final main challenge we discuss here is that the simple $\tilde{O}(n\sqrt{n})$ bound, which we informally proved in the previous paragraph, relies on the tacit assumption that the inner-oracle queries generated by the outer oracle correspond to $\tilde{O}(n)$ uniformly random calls to the inner oracle. Indeed, the running time of the algorithm of [3] is analyzed for a uniformly random query vertex; however, there may exist a vertex $v$ in the graph for which calling the inner oracle takes significantly more than $\tilde{O}(\bar{d})$ time. Consequently, if all outer-oracle calls end up querying $v$, the running time could be significantly worse than $\tilde{O}(n\sqrt{n})$. To overcome this issue, we use the result of [18] along with the fact that the maximum degree of $G[V\setminus V(M)]$ is at most $\sqrt{n}$. We show that for any vertex $v$, the outer oracle queries the inner oracle for $v$ at most $\tilde{O}(\deg_{G[V\setminus V(M)]}(v)/\sqrt{n})$ times in expectation. This enables us to formally prove that the total running time of the algorithm remains at most $\tilde{O}(n\sqrt{n})$.

General graphs and the adjacency matrix model.

There are additional challenges when dealing with general graphs as opposed to bipartite graphs, such as the fact that the sizes of the maximal matching and the $b$-matching alone are insufficient to achieve a good approximation ratio. For general graphs, our algorithm estimates the maximum matching in the union of the maximal matching and the $b$-matching, which requires using the $(1-\varepsilon)$-approximate local computation algorithm (LCA) of [17] on the subgraph formed by this union, to which we only have oracle access. We encourage readers to refer to the full version of the paper for more details about the techniques used there.

Additionally, for more information on the extension of the algorithm to the adjacency matrix model, we likewise refer readers to the full version of the paper.

3 Preliminaries

Given a graph $G$, we use $V(G)$ to denote its set of vertices and $E(G)$ to refer to its set of edges. For a vertex $v\in V(G)$, we use $\deg_G(v)$ to denote the degree of the vertex, i.e., the number of edges with one endpoint equal to $v$. We use $\Delta(G)$ to denote the maximum degree over all vertices in the graph, and $\bar{d}(G)$ to denote the average degree of the graph. Further, we use $\mu(G)$ to denote the size of the maximum matching in $G$.

Given a graph $G=(V,E)$ and a subset of vertices $A\subseteq V$, $G[A]$ is defined to be the induced subgraph consisting of all edges with both endpoints in $A$. Further, given disjoint subsets $A,B\subseteq V$ of vertices, $G[A,B]$ is defined to be the bipartite subgraph of $G$ consisting of all edges between $A$ and $B$.

Given a matching M in G, an augmenting path is a simple path starting and ending at different vertices, such that the first and the last vertices are unmatched in M, and the edges of the path alternate between not belonging to M and belonging to M.

Given a vector $b$ of integer capacities of dimension $|V(G)|$, a $b$-matching in $G$ is a multi-set $F$ of edges of $G$ such that each vertex $v\in V$ appears no more than $b_v$ times as an endpoint of an edge in $F$.

Given a graph $G$ and a permutation $\pi$ over its edges, $\mathrm{GMM}(G,\pi)$ is used to refer to the unique matching $M$ obtained by the following process. We let $M$ be initialized as empty, and consider the edges of $G$ one by one according to the permutation $\pi$. We add an edge to the matching $M$ if none of its endpoints are already matched in $M$. A random greedy maximal matching, i.e., $\mathrm{RGMM}(G)$, is the matching obtained by picking a permutation $\pi$ uniformly at random and outputting $\mathrm{GMM}(G,\pi)$.
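This definition translates directly into code; the following is a minimal sketch (function names are ours):

```python
import random

def gmm(edges, perm):
    """Greedy maximal matching: scan edges in the order given by perm,
    adding an edge whenever both of its endpoints are still free."""
    matched, M = set(), []
    for idx in perm:
        u, v = edges[idx]
        if u not in matched and v not in matched:
            M.append((u, v))
            matched.update((u, v))
    return M

def rgmm(edges, seed=None):
    """Random greedy maximal matching: GMM under a uniformly random permutation."""
    perm = list(range(len(edges)))
    random.Random(seed).shuffle(perm)
    return gmm(edges, perm)
```

For example, on the path with edges $(1,2),(2,3),(3,4)$, processing edge $(2,3)$ first yields the maximal matching $\{(2,3)\}$, while the identity permutation yields $\{(1,2),(3,4)\}$.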

Proposition 3 ([3]).

There exists an algorithm that, given adjacency list access to a graph $G$ of average degree $\bar{d}$, for a random vertex $v$ and a random permutation $\pi$, determines whether $v$ is matched in $\mathrm{GMM}(G,\pi)$ in $\tilde{O}(\bar{d})$ expected time.
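The logic behind such an oracle can be mimicked by the standard recursive edge oracle: an edge is in $\mathrm{GMM}(G,\pi)$ iff no adjacent edge of lower rank is. The toy sketch below (our naming; i.i.d. random ranks stand in for the permutation) reproduces this logic but not the running-time guarantee of Proposition 3:

```python
import random

def make_matched_oracle(adj, seed=0):
    """adj: dict vertex -> list of neighbors. Returns matched(v) answering
    whether v is matched in a random greedy maximal matching."""
    rng, rank, memo = random.Random(seed), {}, {}

    def r(e):  # lazily drawn i.i.d. rank simulates a uniformly random permutation
        return rank.setdefault(e, rng.random())

    def in_matching(e):
        if e not in memo:
            # e is in GMM iff no adjacent edge of strictly lower rank is in GMM
            lower = [tuple(sorted((w, x))) for w in e for x in adj[w]]
            lower = [f for f in lower if f != e and r(f) < r(e)]
            lower.sort(key=r)  # explore in rank order, as the real oracle does
            memo[e] = all(not in_matching(f) for f in lower)
        return memo[e]

    return lambda v: any(in_matching(tuple(sorted((v, u)))) for u in adj[v])
```

The recursion terminates because ranks strictly decrease along every call chain; the output is always a maximal matching.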

Given the problem of maximizing a function $f:D\to\mathbb{R}$ defined over a domain $D$, with optimal value $f^*$, an $(\alpha,\beta)$-multiplicative-additive approximation of $f^*$ is a solution $s\in D$ such that $\alpha\cdot f^*-\beta\le f(s)\le f^*$.

4 Algorithm for Bipartite Graphs

We begin by describing our algorithm for bipartite graphs. We focus on implementing an algorithm with a multiplicative-additive approximation guarantee. Also, we assume that we have access to the adjacency list of the graph. These assumptions will help us avoid certain complications and challenges that arise when working with general graphs, the adjacency matrix model, or when trying to obtain a multiplicative approximation guarantee. To lift these assumptions, we can leverage strong tools and methods from the literature, which, with slight modifications, can be applied here. This section contains the main novelties of our approach and proofs. Our algorithm for bipartite graphs can be seen as a translation and implementation of a two-pass streaming algorithm, which we discuss in the next subsection.

4.1 Two-Pass Streaming Algorithm for Bipartite Graphs

Our starting point is the two-pass streaming algorithm described in Algorithm 1. This algorithm, or its variations, has appeared in previous works on designing streaming or dynamic algorithms for maximum matching [16, 12, 4]. In words, the first pass of the algorithm only finds a maximal matching $M$. In the second pass, the algorithm finds a maximal $b$-matching $B$ in $G[V(M),V\setminus V(M)]$, where $V(M)$ is the set of vertices matched by $M$. The capacities of vertices in $V(M)$ and in $V\setminus V(M)$ for the $b$-matching are $k$ and $\lfloor kb\rfloor$, respectively. Moreover, in the second pass of the algorithm, when an edge $(u,v)$ arrives in the stream, we add multiple copies of the edge to the subgraph $B$, as long as doing so does not violate the capacity constraints.

Algorithm 1 Two-pass Streaming Algorithm for Bipartite Graphs.
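A minimal sketch of this two-pass procedure (our simplification of Algorithm 1, with names of our choosing and $\lfloor kb\rfloor$ as the unmatched-side capacity):

```python
from collections import defaultdict

def two_pass_bipartite(stream, k=10, b=1 + 2**0.5):
    """Pass 1: build a maximal matching M greedily over the stream.
    Pass 2: build a maximal b-matching B between V(M) and the unmatched
    vertices, with capacities k (matched side) and floor(k*b) (unmatched side);
    copies of an edge are added while both capacities allow."""
    in_M, M = set(), []
    for u, v in stream:                      # pass 1
        if u not in in_M and v not in in_M:
            M.append((u, v))
            in_M.update((u, v))
    used, B = defaultdict(int), []
    for u, v in stream:                      # pass 2, same stream again
        if (u in in_M) != (v in in_M):       # edge between V(M) and V \ V(M)
            cap_u = k if u in in_M else int(k * b)
            cap_v = k if v in in_M else int(k * b)
            while used[u] < cap_u and used[v] < cap_v:
                used[u] += 1
                used[v] += 1
                B.append((u, v))             # one more copy of (u, v)
    return M, B
```

On the stream $(l_1,r_1),(l_1,r_2),(l_2,r_1)$, pass 1 matches $(l_1,r_1)$, and pass 2 saturates the two pendant edges up to the capacity $k$ of the matched endpoints.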

Intuitively, the algorithm tries to find length-3 augmenting paths using the b-matching that it finds in the second pass. The following lemma shows the approximation guarantee of Algorithm 1.

Lemma 4 (Lemma 3.3 in [12]).

For any $\varepsilon\in(0,1)$, the output of Algorithm 1 is a $(2-\sqrt{2}-\varepsilon)$-approximation for the maximum matching of $G$.

4.2 Sublinear Time Implementation of Algorithm 1

In this section, we demonstrate how to implement a modification of Algorithm 1 in the sublinear time model, as outlined in Section 2.

Sparsification.

In order to be able to use two levels of recursive oracle calls, we need to sparsify the graph. We first sample $\tilde{O}(n\sqrt{n})$ edges and construct a maximal matching on the sampled edges to sparsify the induced subgraph of unmatched vertices. This sparsification step is formalized in Algorithm 2. In Lemma 6, we show that if we sample enough edges, then the vertices that remain unmatched after this phase have induced degree at most $\sqrt{n}$. This step is very similar to the algorithm of [13] for approximating a maximal matching.

Algorithm 2 Sparsification of the Induced Subgraph of Unmatched Vertices.
Claim 5.

Algorithm 2 runs in $\tilde{O}(n\sqrt{n})$ time.

Proof.

For each vertex $v$ in the graph, the algorithm samples $\tilde{O}(\sqrt{n})$ vertices from the adjacency list of $v$ if $v$ is not matched by the time the algorithm processes it in Line 3. Thus, the total running time is at most $\tilde{O}(n\sqrt{n})$.
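Since the pseudocode of Algorithm 2 is stated only by reference here, the sparsification step can be sketched roughly as follows (a simplification of ours, not the authors' exact pseudocode; we take $c=2\sqrt{n}\log n$ samples per vertex):

```python
import math
import random

def sparsify(adj, seed=0):
    """For each still-unmatched vertex, sample ~2*sqrt(n)*log(n) random
    neighbors; match it to the first unmatched neighbor found, if any."""
    rng = random.Random(seed)
    n = len(adj)
    c = int(2 * math.sqrt(n) * math.log(n)) + 1
    matched, M = set(), []
    for v in adj:
        if v in matched or not adj[v]:
            continue
        for _ in range(c):
            u = rng.choice(adj[v])       # one adjacency-list sample
            if u not in matched:
                M.append((v, u))
                matched.update((v, u))
                break
    return M, matched
```

The per-vertex work is $\tilde{O}(\sqrt{n})$ samples, matching the claim of the proof above.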

Lemma 6.

With high probability, we have $\Delta(G[V\setminus V(M)])\le\sqrt{n}$.

Proof.

We will show that for every $v\in V$, the probability that $v\in V\setminus V(M)$ and the degree of $v$ in $G[V\setminus V(M)]$ is larger than $\sqrt{n}$ is at most $1/n^2$. The lemma then follows by a union bound over all $v\in V$.

Consider the moment before $v$ is processed. Assume that at this time, still $v\in V\setminus V(M)$ and the degree of $v$ in $G[V\setminus V(M)]$ is larger than $\sqrt{n}$. Then each of the $c$ samples has a probability of at least $\sqrt{n}/n=1/\sqrt{n}$ of hitting one of the unmatched neighbors, in which case $v$ will become matched. Thus the probability that $v$ remains unmatched after it is processed is at most

$$\left(1-\frac{1}{\sqrt{n}}\right)^{c}=\left(1-\frac{1}{\sqrt{n}}\right)^{2\sqrt{n}\log n}\le\left(\frac{1}{e}\right)^{2\log n}=\frac{1}{n^2}.$$
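As a quick numeric sanity check of this bound (our choice of $n$; $c=2\sqrt{n}\log n$ with the natural logarithm):

```python
import math

n = 10_000
s = math.sqrt(n)
c = 2 * s * math.log(n)           # number of samples per vertex
p_fail = (1 - 1 / s) ** c         # prob. v stays unmatched despite > sqrt(n) free neighbors
assert p_fail <= 1 / n**2         # matches the 1/n^2 bound in the proof
```

The inequality holds for every $n>1$ since $(1-x)\le e^{-x}$.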

Augmenting $M$ using nested oracles.

Now we are ready to present our sublinear algorithm. After sparsifying the graph by finding a partial matching $M$, we try to augment it in the two different ways that we outlined in Section 2 and which are formalized in Algorithm 3. See also Figure 1.

For simplicity, we pretend throughout the paper that $kb$ is an integer. Since $k$ is an arbitrarily large constant, using $\lfloor kb\rfloor$ instead of $kb$ leads to an arbitrarily small error in the calculations. We also note that a maximal $b$-matching can be viewed as a maximal matching if we duplicate each vertex a number of times equal to its capacity.

First, we try to augment the matching by designing a maximal matching oracle for $G[V\setminus V(M)]$, and then another oracle for finding a $b$-matching between the vertices newly matched by the first oracle and the remaining unmatched vertices. Let $M'$ be the maximal matching of $G[V\setminus V(M)]$ that can be obtained by the oracle. We try to augment it with a $b$-matching $B_1$.

However, it is also possible that $|M'|$ is small compared to $|M|$, which implies that in the previous case the $b$-matching does not help to find many augmenting paths, as the maximal matching that we try to augment is too small. To account for this case, the algorithm also finds a $b$-matching $B_2$ between $V(M)$ and $V\setminus V(M)$.

Note that because the algorithm finds the initial matching $M$ explicitly, checking whether a vertex belongs to $V(M)$ can be done in $O(1)$ time.

Algorithm 3 Sublinear Time Algorithm for Bipartite Graphs with Access to the Adjacency List (see Figure 1).
Algorithm 4 Algorithm for the First Case.
Algorithm 5 Algorithm for the Second Case.

Implementation details of the algorithm.

There are some technical details in the implementation of the algorithm that are not included in the pseudocode:

  • Access to the adjacency list of an induced subgraph: Both Algorithm 4 and Algorithm 5 run the algorithm of Proposition 3 on some induced subgraph of $G$ (for example, Line 5 of Algorithm 5). However, Proposition 3 works with access to the adjacency list of its input graph. To address this issue, we leverage an important property of the algorithm in Proposition 3, namely that it only needs to find a random neighbor of a given vertex at each step of its execution. Now, whenever the algorithm requires a random neighbor of a vertex $v$ in a subgraph $H$, it queries random neighbors of $v$ in the original graph $G$ until it finds one that belongs to $H$. This increases the running time of the algorithm, as it may take $\omega(1)$ time to locate a valid neighbor in $H$, which we will formally bound in our runtime analysis.

  • Nested oracles in Line 8 of Algorithm 4: Unlike $M$, we do not explicitly construct the maximal matching $M'$ in Algorithm 4. Moreover, the edges of the subgraph $G_1$ connect vertices matched by $M'$ with those that remain unmatched in both $M$ and $M'$. Hence, to verify whether an edge belongs to $G_1$, we need to determine whether its endpoints are matched or unmatched in $M'$ by accessing the algorithm of Proposition 3. This again increases the algorithm's runtime, which we will also formally bound in our runtime analysis.
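The rejection-sampling step from the first bullet can be sketched as follows (a hypothetical helper of ours; the predicate `in_H` abstracts the check that a sampled neighbor lies in the subgraph $H$):

```python
import random

def random_neighbor_in_subgraph(v, adj, in_H, rng=random):
    """Repeatedly sample a uniformly random neighbor of v in G until one
    lies in the subgraph H; the expected number of samples is
    deg_G(v) / deg_H(v). Returns None if v has no neighbor in H."""
    if not any(in_H(u) for u in adj[v]):
        return None                      # guard so the loop below terminates
    while True:
        u = rng.choice(adj[v])           # one adjacency-list query in G
        if in_H(u):
            return u
```

The explicit guard is for the sketch only; the paper's analysis instead bounds the expected number of rejected samples probabilistically.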

4.3 Analysis of the Approximation Ratio

The following lemma, an analogue of Observation 3.1 in [12], substantiates the soundness of the estimates $\mu_1$ and $\mu_2$ produced in Algorithm 3.

Lemma 7.

Let $M$, $M'$, $B_1$ and $B_2$ be as in the description of Algorithm 3. Then

  • $\mu(G)\ \ge\ \mu(M\cup M'\cup B_1)\ \ge\ |M|+\left(1-\frac{1}{b}\right)|M'|+\frac{1}{kb}|B_1|$,

  • $\mu(G)\ \ge\ \mu(M\cup B_2)\ \ge\ \left(1-\frac{1}{b}\right)|M|+\frac{1}{kb}|B_2|$.

Proof.

Since $G$ is bipartite, by integrality of the bipartite matching polytope it is enough to exhibit a fractional matching $x$ of the appropriate value $\sum_e x_e$. For case 1, we set

$$x_e=\begin{cases}1 & e\in M,\\ 1-\frac{1}{b} & e\in M',\\ \frac{1}{kb} & e\in B_1.\end{cases}$$

Note that $M$, $M'$ and $B_1$ are pairwise disjoint, and $B_1$ has no edge incident to $V(M)$. Therefore it is easy to verify that the degree constraints for $x$ are satisfied. For case 2, we similarly set

$$x_e=\begin{cases}1-\frac{1}{b} & e\in M,\\ \frac{1}{kb} & e\in B_2.\end{cases}$$
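A quick arithmetic check that these weights respect the degree constraints, for the two extreme vertex types (a vertex of $V(M')$ saturated by $k$ copies of $B_1$-edges, and an unmatched vertex saturated to capacity $\lfloor kb\rfloor$; the value of $k$ below is arbitrary):

```python
import math

b = 1 + math.sqrt(2)
k = 1000  # any large constant

# vertex in V(M'): one M'-edge of weight 1 - 1/b plus k B1-copies of weight 1/(k*b)
assert abs((1 - 1/b) + k * (1 / (k * b)) - 1) < 1e-12

# vertex unmatched by M and M': at most floor(k*b) copies of weight 1/(k*b)
assert math.floor(k * b) * (1 / (k * b)) <= 1
```

Vertices of $V(M)$ carry a single edge of weight at most 1 in either case, so all degree constraints hold.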

The following lemma states the $(2-\sqrt{2})$-approximation guarantee of the "maximal matching plus $b$-matching" approach obtained in prior work, for both bipartite and general graphs. We will invoke it for appropriate subgraphs of $G$ to obtain our guarantee.

Lemma 8.

Let $G$ be a graph, $M$ be any maximal matching in $G$, and $B$ be a maximal $b$-matching in $G[V(M),V(G)\setminus V(M)]$ with vertex capacities $k$ for vertices in $V(M)$ and $\lfloor kb\rfloor$ for vertices in $V(G)\setminus V(M)$, where $k>\frac{1}{b\varepsilon^3}$ and $b=1+\sqrt{2}$. Then:

  • for bipartite $G$, we have $\mu(M\cup B)\ \ge\ \left(1-\frac{1}{b}\right)|M|+\frac{1}{kb}|B|\ \ge\ (2-\sqrt{2}-\varepsilon)\,\mu(G)$,

  • for general $G$, if $B$ is a random greedy maximal $b$-matching, we still have $\mathbb{E}[\mu(M\cup B)]\ \ge\ (2-\sqrt{2}-\varepsilon)\,\mu(G)$.

Proof.

The first statement is the same as Lemma 4, and shown as Lemma 3.3 in [12]. The second statement is shown as Claim 5.5 in [2].

The following lemma is the crux of our approximation ratio analysis.

Lemma 9.

In a bipartite graph $G$, let $M$, $M'$, $B_1$ and $B_2$ be as in the description of Algorithm 3. Then

$$\max\left[\,|M|+\left(1-\tfrac{1}{b}\right)|M'|+\tfrac{1}{kb}|B_1|,\ \ \left(1-\tfrac{1}{b}\right)|M|+\tfrac{1}{kb}|B_2|\,\right]\ \ge\ 0.5109\,\mu(G).$$

Proof.

Fix a maximum matching $M^*$ in $G$, and partition its edges as follows (see Figure 2):

  • $M_2^*$ are those with both endpoints in $V(M)$,

  • $M_2^{*\prime}$ are those with one endpoint in $V(M)$ and the other in $V(M')$,

  • $M_2^{*\prime\prime}$ are those with both endpoints in $V(M')$,

  • $M_1^*$ are those with one endpoint in $V(M)$ and the other in $V\setminus V(M)\setminus V(M')$,

  • $M_1^{*\prime}$ are those with one endpoint in $V(M')$ and the other in $V\setminus V(M)\setminus V(M')$.

Figure 2: Illustration for the proof of Lemma 9. The thick edges belong to a fixed maximum matching $M^*$. Each of them is labeled with its part ($M_2^*$, $M_2^{*\prime}$, $M_2^{*\prime\prime}$, $M_1^*$, or $M_1^{*\prime}$). The two subgraphs for which we invoke Lemma 8 are marked (case 1 highlighted in light blue, case 2 indicated by a dashed line).

Since $M\cup M'$ is maximal, $M^*$ cannot have any edge with both endpoints outside $V(M)\cup V(M')$. We thus have

$$\mu(G)=|M^*|=|M_1^*|+|M_1^{*\prime}|+|M_2^*|+|M_2^{*\prime}|+|M_2^{*\prime\prime}|. \tag{1}$$

Also, by maximality of M and simple counting,

|M| =|M2|+12|M1|+12|M2|. (2)

We will use Lemma 8 to analyze both cases in Algorithm 3. For case 1, we apply it with $G'=G[V\setminus V(M)]$; in this graph, $M'$ is a maximal matching and $B_1$ is a $b$-matching as in the statement of Lemma 8 (see Figure 2), thus we have

$$\left(1-\tfrac{1}{b}\right)|M'|+\tfrac{1}{kb}|B_1|\ \ge\ (2-\sqrt{2}-\varepsilon)\,\mu(G[V\setminus V(M)])\ \ge\ (2-\sqrt{2}-\varepsilon)\left(|M_1^{*\prime}|+|M_2^{*\prime\prime}|\right)$$

since $M_1^{*\prime}\cup M_2^{*\prime\prime}$ is a matching in $G'=G[V\setminus V(M)]$. With Equation (2) this gives

$$|M|+\left(1-\tfrac{1}{b}\right)|M'|+\tfrac{1}{kb}|B_1|\ \ge\ |M_2^*|+\tfrac{1}{2}|M_1^*|+\tfrac{1}{2}|M_2^{*\prime}|+(2-\sqrt{2}-\varepsilon)\left(|M_1^{*\prime}|+|M_2^{*\prime\prime}|\right). \tag{3}$$

For case 2, we instead apply Lemma 8 with $G'=G[V(M)]\cup G[V(M),V\setminus V(M)]$; in this graph, $M$ is a maximal matching and $B_2$ is a $b$-matching as in the statement of Lemma 8 (see Figure 2), thus

$$\left(1-\tfrac{1}{b}\right)|M|+\tfrac{1}{kb}|B_2|\ \ge\ (2-\sqrt{2}-\varepsilon)\,\mu(G')\ \ge\ (2-\sqrt{2}-\varepsilon)\left(|M_2^*|+|M_2^{*\prime}|+|M_1^*|\right) \tag{4}$$

since $M_2^*\cup M_2^{*\prime}\cup M_1^*$ is a matching in $G'$. Using Equations (3) and (4), we can bound the left-hand side of the lemma's statement as

$$\max[\cdots]\ \ge\ \max\Big[\,|M_2^*|+\tfrac{1}{2}|M_1^*|+\tfrac{1}{2}|M_2^{*\prime}|+(2-\sqrt{2}-\varepsilon)\big(|M_1^{*\prime}|+|M_2^{*\prime\prime}|\big),\ \ (2-\sqrt{2}-\varepsilon)\big(|M_2^*|+|M_2^{*\prime}|+|M_1^*|\big)\Big].$$

We bound the maximum from below by a weighted average with weights $\beta$ and $1-\beta$, for some $\beta\in[0,1]$ to be determined (using $\max(x,y)\ge\beta x+(1-\beta)y$):

$$\begin{aligned}
&\ \ |M_2^*|\,\big(\beta+(2-\sqrt{2}-\varepsilon)(1-\beta)\big)\\
&+\big(|M_1^*|+|M_2^{*\prime}|\big)\big(\tfrac{\beta}{2}+(2-\sqrt{2}-\varepsilon)(1-\beta)\big)\\
&+\big(|M_1^{*\prime}|+|M_2^{*\prime\prime}|\big)(2-\sqrt{2}-\varepsilon)\,\beta\\
&\ge\ \big(|M_2^*|+|M_1^*|+|M_2^{*\prime}|\big)\big(\tfrac{\beta}{2}+(2-\sqrt{2}-\varepsilon)(1-\beta)\big)+\big(|M_1^{*\prime}|+|M_2^{*\prime\prime}|\big)(2-\sqrt{2}-\varepsilon)\,\beta\\
&\overset{(\star)}{\ge}\ \big(|M_2^*|+|M_1^*|+|M_2^{*\prime}|\big)\,\gamma+\big(|M_1^{*\prime}|+|M_2^{*\prime\prime}|\big)\,\gamma\\
&\overset{(1)}{=}\ \gamma\,\mu(G).
\end{aligned}$$

We want $(\star)$ to hold for some $\gamma$ (the approximation ratio) as large as possible. For $(\star)$ to hold, we need to satisfy:

$$\tfrac{\beta}{2}+(2-\sqrt{2}-\varepsilon)(1-\beta)\ \ge\ \gamma,\qquad (2-\sqrt{2}-\varepsilon)\,\beta\ \ge\ \gamma.$$

By solving $\tfrac{\beta}{2}+(2-\sqrt{2})(1-\beta)=(2-\sqrt{2})\beta$ we get $\beta=\frac{12+2\sqrt{2}}{17}$ and $\gamma=\frac{4(5-2\sqrt{2})}{17}-O(\varepsilon)>0.5109$.
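The closed forms for $\beta$ and $\gamma$ can be verified numerically (ignoring the $O(\varepsilon)$ term):

```python
import math

c = 2 - math.sqrt(2)                  # the (2 - sqrt(2)) guarantee of Lemma 8
beta = c / (2 * c - 0.5)              # solves beta/2 + c*(1 - beta) = c*beta
gamma = c * beta                      # the resulting approximation ratio

assert abs(beta - (12 + 2 * math.sqrt(2)) / 17) < 1e-12
assert abs(gamma - 4 * (5 - 2 * math.sqrt(2)) / 17) < 1e-12
assert gamma > 0.5109                 # gamma is about 0.51095
```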

Lemma 10.

Let $\max(\mu_1,\mu_2)$ be the output of Algorithm 3. With high probability, it holds that

$$0.5109\,\mu(G)-o(n)\ \le\ \max(\mu_1,\mu_2)\ \le\ \mu(G).$$

The proof of Lemma 10 is routine. We defer it to the full version of the paper.

4.4 Running Time Analysis

For the analysis of the running time, we use a crucial property of the random greedy maximal matching algorithm that was proved recently in [18].

Proposition 11 (Lemma 5.14 of [18]).

Let $Q(v)$ be the expected number of times that the oracle queries an edge adjacent to $v$ if we start the oracle calls from a random vertex, for a random permutation over the edges of the graph $G$, when running random greedy maximal matching. It holds that $Q(v)=\tilde{O}(\deg_G(v)/|V(G)|)$.

Proposition 12 (Corollary 5.15 of [18]).

Let $T(v)$ be the expected time needed to return a random neighbor of vertex $v$. Then, the expected time to run the random greedy maximal matching oracle for a random vertex and a random permutation in graph $G$ is $\sum_{v\in V(G)}\tilde{O}\big(T(v)\deg_G(v)/|V(G)|\big)$.

Lemma 13.

Algorithm 4 runs in $\tilde{O}(\bar{d}(G)\sqrt{n})$ time in expectation.

Proof.

Without loss of generality, we assume that $|V\setminus V(M)|=\Omega(n)$; otherwise, Algorithm 3 does not need to execute Algorithm 4, as the estimate from Algorithm 5 suffices.

The proof consists of two parts: first we show that Line 6 of the algorithm can be implemented in $\tilde{O}(\bar{d}(G))$ expected time, and then we prove that Line 8 of the algorithm can be implemented in $\tilde{O}(\bar{d}(G)\sqrt{n})$ expected time.

For the first part, let $G''=G[V\setminus V(M)]$ and let $v$ be a vertex in $G''$. In the adjacency list of $v$ in the original graph, which contains $\deg_G(v)$ elements, only $\deg_{G''}(v)$ of them are neighbors in $G''$. Thus, to find a neighbor in $G''$, we need to randomly sample $T(v)=\deg_G(v)/\deg_{G''}(v)$ elements from $v$'s adjacency list in expectation. Consequently, using Proposition 12, the expected time required to execute the random greedy maximal matching oracle for a randomly chosen vertex and permutation in $G''$ is

$$\sum_{v\in V(G'')}\tilde{O}\!\left(\frac{T(v)\deg_{G''}(v)}{|V(G'')|}\right)=\sum_{v\in V(G'')}\tilde{O}\!\left(\frac{(\deg_G(v)/\deg_{G''}(v))\cdot\deg_{G''}(v)}{|V(G'')|}\right)=\sum_{v\in V(G'')}\tilde{O}\!\left(\frac{\deg_G(v)}{|V(G'')|}\right)=\tilde{O}\!\left(\frac{|E(G)|}{|V(G'')|}\right)=\tilde{O}\!\left(\frac{|E(G)|}{|V(G)|}\right)=\tilde{O}(\bar{d}(G)).$$

Note that this bound is for a random permutation and a random vertex. However, in the algorithm, we choose only one permutation $\pi$ and then run the random greedy maximal matching for $r$ randomly chosen vertices. Let $S$ be the set of $r$ vertices chosen uniformly at random, and let $F(v,\pi)$ be the running time of random greedy maximal matching on vertex $v$ and permutation $\pi$. Therefore, the expected running time can be written as

$$\sum_{\pi}\sum_{S}\sum_{v\in S}\frac{F(v,\pi)}{|E(G'')|!\,\binom{|V(G'')|}{r}}
\le \sum_{\pi}\sum_{v\in V(G'')}\binom{|V(G'')|-1}{r-1}\frac{F(v,\pi)}{|E(G'')|!\,\binom{|V(G'')|}{r}}
=\sum_{\pi}\sum_{v\in V(G'')}\frac{r\,F(v,\pi)}{|E(G'')|!\,|V(G'')|}
=r\cdot\mathbb{E}_{\pi,\,v\in V(G'')}\big[F(v,\pi)\big].$$

Since $r=\tilde{O}(1)$, the expected running time of the first part is $\tilde{O}(\bar{d}(G))$.
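The middle step of the chain above relies on the identity $\binom{n-1}{r-1}/\binom{n}{r}=r/n$: each fixed vertex lies in exactly an $r/n$ fraction of the size-$r$ subsets. A quick numeric check:

```python
from math import comb

# Each fixed vertex lies in comb(n-1, r-1) of the comb(n, r) size-r subsets,
# so the fraction is exactly r/n -- the identity used in the displayed chain.
# Verified here in cross-multiplied (integer) form to avoid floating point.
for n in range(2, 40):
    for r in range(1, n + 1):
        assert comb(n - 1, r - 1) * n == comb(n, r) * r
```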

We prove the second part in a few steps. As a first step, for simplicity, assume that we have access to the adjacency list of $G_1$ and there is no need to run nested oracles. Then, using Proposition 3, the expected running time of line 8 is bounded by $\tilde{O}(\bar{d}(G_1))$ for $r=\tilde{O}(1)$ random permutations and random vertices.

Now, suppose that we have access to the adjacency list of $G''=G[V\setminus V(M)]$ instead of $G_1$. By Proposition 12, the total expected runtime of the outer ($b$-matching) oracle is

$$\tilde{O}\!\left(\sum_{v\in G_1}T_{\mathrm{outer}}(v)\deg_{G_1}(v)/|V(G_1)|\right),$$

where $T_{\mathrm{outer}}(v)$ is the expected time needed to find a random neighbor of $v$ in $G_1$. To find such a neighbor, we sample neighbors $w$ of $v$ in $G''$ and check whether $w$ is matched in $M$ using the inner oracle. The expected number of these checks until a vertex $w$ with $(v,w)\in E(G_1)$ is found is $\tilde{O}(\deg_{G''}(v)/\deg_{G_1}(v))$. The expected cost of each check is $\mathbb{E}_{w:(v,w)\in E(G'')}[\mathrm{cost}_{\mathrm{inner}}(w)]$, where $\mathrm{cost}_{\mathrm{inner}}(w)$ is the runtime of invoking the inner oracle to check if $w$ is matched in $M$. Putting this together, the expected total runtime is

$$\tilde{O}\!\left(\sum_{v\in G_1}\frac{T_{\mathrm{outer}}(v)\deg_{G_1}(v)}{|V(G_1)|}\right)
=\tilde{O}\!\left(\frac{1}{|V(G_1)|}\sum_{v\in G_1}\frac{\deg_{G''}(v)}{\deg_{G_1}(v)}\cdot\left(\sum_{w:(v,w)\in E(G'')}\frac{\mathrm{cost}_{\mathrm{inner}}(w)}{\deg_{G''}(v)}\right)\cdot\deg_{G_1}(v)\right)$$
$$=\tilde{O}\!\left(\frac{1}{|V(G_1)|}\sum_{w\in G''}\mathrm{cost}_{\mathrm{inner}}(w)\deg_{G''}(w)\right)
=\tilde{O}\!\left(\sum_{w\in G''}\frac{\mathrm{cost}_{\mathrm{inner}}(w)}{|V(G'')|}\deg_{G''}(w)\right)$$
$$\overset{\text{Lemma 6}}{=}\tilde{O}\!\left(\sqrt{n}\cdot\mathbb{E}_{w\in G''}\big[\mathrm{cost}_{\mathrm{inner}}(w)\big]\right)
\overset{\text{Proposition 12}}{=}\tilde{O}\!\left(\sqrt{n}\cdot\sum_{v\in G''}\deg_{G''}(v)\,T_{\mathrm{inner}}(v)\cdot\frac{1}{|V(G'')|}\right),$$

where $T_{\mathrm{inner}}(v)$ is the expected time needed to find a random neighbor of $v$ in $G''$. Under our assumption of having access to the adjacency list of $G''$, we have $T_{\mathrm{inner}}(v)=O(1)$, and thus we can finish bounding with $\tilde{O}(\sqrt{n}\,\bar{d}(G''))=\tilde{O}(\sqrt{n}\,\bar{d}(G))$.

Now we lift the assumption of having access to the adjacency list of $G''$. This means that, similarly as in the first part, we need to sample $T_{\mathrm{inner}}(v)=\deg_G(v)/\deg_{G''}(v)$ vertices from the adjacency list of $v$ in expectation (checking if a vertex is matched in $M$ is done in $O(1)$ time). Then the total bound becomes

$$\tilde{O}\!\left(\sqrt{n}\cdot\sum_{v\in G''}\deg_{G''}(v)\cdot\frac{\deg_G(v)}{\deg_{G''}(v)}\cdot\frac{1}{|V(G'')|}\right)
=\tilde{O}\!\left(\sqrt{n}\cdot\frac{|E(G)|}{|V(G'')|}\right)
=\tilde{O}(\sqrt{n}\,\bar{d}(G)).$$

Also, $T_{\mathrm{outer}}(v)$ increases by $\deg_G(v)/\deg_{G_1}(v)$, which increases the total runtime only by

$$\tilde{O}\!\left(\frac{1}{|V(G_1)|}\sum_{v\in G_1}\frac{\deg_G(v)}{\deg_{G_1}(v)}\cdot\deg_{G_1}(v)\right)=\tilde{O}(\bar{d}(G)).$$

Finally, since we have a fixed permutation π instead of r random permutations, we can use the exact same argument as in the proof of the first part to conclude the proof.

Lemma 14.

Algorithm 5 runs in $\tilde{O}(\bar{d}(G))$ time in expectation.

Proof.

It is enough to show that it takes $\tilde{O}(\bar{d}(G))$ time to run the algorithm of Proposition 3 (the $b$-matching oracle) for each sampled vertex and permutation, as we have $r=\tilde{O}(1)$. We repeat the same arguments as in the proof of Lemma 13.

Let $v$ be a vertex in the graph $G_2=G[V(M),V\setminus V(M)]$. Note that in the adjacency list of $v$ in the original graph, which contains $\deg_G(v)$ elements, only $\deg_{G_2}(v)$ of the elements are neighbors in $G_2$. Thus, we need to randomly sample $T(v)=\deg_G(v)/\deg_{G_2}(v)$ elements of the adjacency list of $v$ to find a neighbor in $G_2$. Recall that we can check whether a vertex is matched in $M$ in $O(1)$ time. Therefore, using Proposition 12, the expected time needed to run the random greedy maximal matching for a random vertex and permutation in $G_2$ is

$$\sum_{v\in V(G_2)}\tilde{O}\!\left(\frac{T(v)\deg_{G_2}(v)}{|V(G_2)|}\right)
=\sum_{v\in V}\tilde{O}\!\left(\frac{(\deg_G(v)/\deg_{G_2}(v))\cdot\deg_{G_2}(v)}{n}\right)
=\sum_{v\in V}\tilde{O}\!\left(\frac{\deg_G(v)}{n}\right)
=\tilde{O}\!\left(\frac{|E(G)|}{n}\right)
=\tilde{O}(\bar{d}(G)).$$

Lemma 15.

Algorithm 3 runs in $\tilde{O}(n\sqrt{n})$ time with high probability.

Proof.

By Claim 5, the sparsification step takes $\tilde{O}(n\sqrt{n})$ time. Also, by Lemmas 13 and 14, Algorithms 4 and 5 run in $\tilde{O}(\bar{d}(G)\sqrt{n})$ and $\tilde{O}(\bar{d}(G))$ expected time, respectively; both bounds are $\tilde{O}(n\sqrt{n})$ since $\bar{d}(G)\le n$. To establish a high-probability bound on the time complexity, we execute $O(\log n)$ instances of the algorithm in parallel and halt as soon as the first one completes. By Markov's inequality, each individual instance terminates within $\tilde{O}(n\sqrt{n})$ time with constant probability. As a result, with high probability, at least one of these instances finishes within $\tilde{O}(n\sqrt{n})$ time. This concludes the proof.
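The boosting step in this proof (independent instances plus Markov's inequality) follows a standard pattern; a minimal Python sketch, with a hypothetical `run_once` interface that reports its own cost, is:

```python
import random

def boost_to_high_probability(run_once, budget, attempts, rng):
    """Turn an expected-time guarantee into a high-probability one.

    run_once(rng) performs one independent randomized execution and returns
    (result, cost). By Markov's inequality, a single execution exceeds twice
    its expected cost with probability at most 1/2, so with budget set to
    2 * E[cost] and attempts = Theta(log n), all runs exceed the budget with
    probability at most n^{-Theta(1)}. Sequential retries stand in for the
    parallel instances in the proof, which halt when the first one completes.
    """
    for _ in range(attempts):
        result, cost = run_once(rng)
        if cost <= budget:
            return result
    return None  # every attempt exceeded the budget: probability <= 2**(-attempts)
```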

Now we are ready to prove the final theorem of this section.

Theorem 16.

There exists an algorithm that, given access to the adjacency list of a bipartite graph, estimates the size of the maximum matching with a multiplicative-additive approximation factor of $(0.5109, o(n))$ and runs in $\tilde{O}(n\sqrt{n})$ time with high probability.

Proof.

By Lemma 10, Algorithm 3 achieves a multiplicative-additive approximation of $(0.5109, o(n))$. Moreover, the running time of Algorithm 3 is $\tilde{O}(n\sqrt{n})$ with high probability by Lemma 15.

References

  • [1] Noga Alon, Ronitt Rubinfeld, Shai Vardi, and Ning Xie. Space-Efficient Local Computation Algorithms. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Kyoto, Japan, January 17-19, 2012, pages 1132–1139, 2012. doi:10.1137/1.9781611973099.89.
  • [2] Amir Azarmehr, Soheil Behnezhad, and Mohammad Roghani. Fully dynamic matching: $(2-\sqrt{2})$-approximation in polylog update time. In David P. Woodruff, editor, Proceedings of the 2024 ACM-SIAM Symposium on Discrete Algorithms, SODA 2024, Alexandria, VA, USA, January 7-10, 2024, pages 3040–3061. SIAM, 2024. doi:10.1137/1.9781611977912.109.
  • [3] Soheil Behnezhad. Time-optimal sublinear algorithms for matching and vertex cover. In 62nd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2021, Denver, CO, USA, February 7-10, 2022, pages 873–884. IEEE, 2021. doi:10.1109/FOCS52979.2021.00089.
  • [4] Soheil Behnezhad. Dynamic algorithms for maximum matching size. In Nikhil Bansal and Viswanath Nagarajan, editors, Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms, SODA 2023, Florence, Italy, January 22-25, 2023, pages 129–162. SIAM, 2023. doi:10.1137/1.9781611977554.CH6.
  • [5] Soheil Behnezhad, Mohammad Roghani, and Aviad Rubinstein. Local computation algorithms for maximum matching: New lower bounds. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 2322–2335. IEEE, 2023. doi:10.1109/FOCS57990.2023.00143.
  • [6] Soheil Behnezhad, Mohammad Roghani, and Aviad Rubinstein. Sublinear time algorithms and complexity of approximate maximum matching. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, pages 267–280, 2023. doi:10.1145/3564246.3585231.
  • [7] Soheil Behnezhad, Mohammad Roghani, and Aviad Rubinstein. Approximating maximum matching requires almost quadratic time. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pages 444–454, 2024. doi:10.1145/3618260.3649785.
  • [8] Soheil Behnezhad, Mohammad Roghani, Aviad Rubinstein, and Amin Saberi. Beating greedy matching in sublinear time. In Nikhil Bansal and Viswanath Nagarajan, editors, Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms, SODA 2023, Florence, Italy, January 22-25, 2023, pages 3900–3945. SIAM, 2023. doi:10.1137/1.9781611977554.CH151.
  • [9] Soheil Behnezhad, Mohammad Roghani, Aviad Rubinstein, and Amin Saberi. Sublinear algorithms for TSP via path covers. CoRR, abs/2301.05350, 2023. doi:10.48550/arXiv.2301.05350.
  • [10] Sayan Bhattacharya, Peter Kiss, and Thatchaphol Saranurak. Dynamic (1+ε)-approximate matching size in truly sublinear update time. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 1563–1588. IEEE, 2023.
  • [11] Sayan Bhattacharya, Peter Kiss, and Thatchaphol Saranurak. Sublinear algorithms for (1.5+ε)-approximate matching. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, pages 254–266, 2023. doi:10.1145/3564246.3585252.
  • [12] Sayan Bhattacharya, Peter Kiss, Thatchaphol Saranurak, and David Wajc. Dynamic matching with better-than-2 approximation in polylogarithmic update time. In Nikhil Bansal and Viswanath Nagarajan, editors, Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms, SODA 2023, Florence, Italy, January 22-25, 2023, pages 100–128. SIAM, 2023. doi:10.1137/1.9781611977554.CH5.
  • [13] Yu Chen, Sampath Kannan, and Sanjeev Khanna. Sublinear algorithms and lower bounds for metric tsp cost estimation. arXiv preprint arXiv:2006.05490, 2020. arXiv:2006.05490.
  • [14] Mohsen Ghaffari. Local computation of maximal independent set. In 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022, Denver, CO, USA, October 31 - November 3, 2022, pages 438–449, 2022. doi:10.1109/FOCS54457.2022.00049.
  • [15] Michael Kapralov, Slobodan Mitrović, Ashkan Norouzi-Fard, and Jakab Tardos. Space efficient approximation to maximum matching size from uniform edge samples. In Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, pages 1753–1772. SIAM, 2020. doi:10.1137/1.9781611975994.107.
  • [16] Christian Konrad and Kheeran K. Naidu. On two-pass streaming algorithms for maximum bipartite matching. In Mary Wootters and Laura Sanità, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2021, August 16-18, 2021, University of Washington, Seattle, Washington, USA (Virtual Conference), volume 207 of LIPIcs, pages 19:1–19:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPICS.APPROX/RANDOM.2021.19.
  • [17] Reut Levi, Ronitt Rubinfeld, and Anak Yodpinyanee. Local computation algorithms for graphs of non-constant degrees. Algorithmica, 77(4):971–994, 2017. doi:10.1007/S00453-016-0126-Y.
  • [18] Sepideh Mahabadi, Mohammad Roghani, Jakub Tarnawski, and Ali Vakilian. Sublinear Metric Steiner Tree via Improved Bounds for Set Cover. In Raghu Meka, editor, 16th Innovations in Theoretical Computer Science Conference (ITCS 2025), volume 325 of Leibniz International Proceedings in Informatics (LIPIcs), pages 74:1–74:24, Dagstuhl, Germany, 2025. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ITCS.2025.74.
  • [19] Huy N Nguyen and Krzysztof Onak. Constant-time approximation algorithms via local improvements. In 2008 49th annual IEEE symposium on foundations of computer science, pages 327–336. IEEE, 2008. doi:10.1109/FOCS.2008.81.
  • [20] Michal Parnas and Dana Ron. Approximating the Minimum Vertex Cover in Sublinear Time and a Connection to Distributed Algorithms. Theor. Comput. Sci., 381(1-3):183–196, 2007. doi:10.1016/J.TCS.2007.04.040.
  • [21] Ronitt Rubinfeld, Gil Tamir, Shai Vardi, and Ning Xie. Fast local computation algorithms. In Innovations in Computer Science - ICS 2011, Tsinghua University, Beijing, China, January 7-9, 2011. Proceedings, pages 223–238, 2011. URL: http://conference.iiis.tsinghua.edu.cn/ICS2011/content/papers/36.html.
  • [22] Yuichi Yoshida, Masaki Yamamoto, and Hiro Ito. An improved constant-time approximation algorithm for maximum matchings. In Michael Mitzenmacher, editor, Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, Bethesda, MD, USA, May 31 - June 2, 2009, pages 225–234. ACM, 2009. doi:10.1145/1536414.1536447.