Maximum Matching sans Maximal Matching: A New Approach for Finding Maximum Matchings in the Data Stream Model

The problem of finding a maximum size matching in a graph (known as the maximum matching problem) is one of the most classical problems in computer science. Despite a significant body of work dedicated to the study of this problem in the data stream model, the state-of-the-art single-pass semi-streaming algorithm for it is still a simple greedy algorithm that computes a maximal matching, and in this way obtains a 1/2-approximation. Some previous works described two/three-pass algorithms that improve over this approximation ratio by using their second and third passes to improve the above-mentioned maximal matching. One contribution of this paper continues this line of work by presenting new three-pass semi-streaming algorithms that work along these lines and obtain improved approximation ratios of 0.6111 and 0.5694 for triangle-free and general graphs, respectively.
Unfortunately, a recent work (Konrad and Naidu, 2021) shows that the strategy of constructing a maximal matching in the first pass and then improving it in further passes has limitations. Additionally, this technique is unlikely to get us closer to single-pass semi-streaming algorithms obtaining a better than 1/2-approximation. Therefore, it is interesting to come up with algorithms that do something else with their first pass (we term such algorithms non-maximal-matching-first algorithms). No such algorithms are currently known (to the best of our knowledge), and the main contribution of this paper is describing such algorithms that obtain approximation ratios of 0.5384 and 0.5555 in two and three passes, respectively, for general graphs (the result for three passes improves over the previous state-of-the-art, but is worse than the result of this paper mentioned in the previous paragraph for general graphs).


Introduction
The problem of finding a maximum size matching in a graph (known as the maximum matching problem) is one of the most classical problems in computer science, and many polynomial time algorithms have been designed for it over the years (see, e.g., [9,15,23]). Due to its central role, the maximum matching problem is often one of the first problems considered when new computational models are suggested. One such model is the data stream model, which is motivated by Big-Data applications, and has been the subject of an enormous amount of research over the last couple of decades.
In the data stream model, the algorithm receives the input in the form of a stream which it can read sequentially, but due to memory restrictions, the algorithm can store only a small part of this stream. This means that the algorithm has to process (in some sense) the input stream while reading it, and never gets an opportunity to see all the parts of the input at the same time. Traditional algorithms for this model, known as streaming algorithms, are allowed only memory that is poly-logarithmic in the natural parameters of the problem. Obtaining a streaming algorithm for a problem is very desirable, but is often not possible. In particular, many graph problems provably do not admit streaming algorithms, and the maximum matching problem is among these problems if one would like an algorithm for the problem to output an (approximately) maximum matching because such a matching might be of linear size in the number of vertices. Nevertheless, non-trivial streaming algorithms have been designed for the maximum matching problem when only the (approximate) size of a maximum matching is desired (see Section 1.1 for details).
The resistance of many graph problems to streaming algorithms has motivated Feigenbaum et al. [19] to suggest semi-streaming algorithms, which are algorithms for the data stream model that are allowed a space complexity of O(n log^c n) for some constant c ≥ 0, where n is the number of vertices in the graph. Such algorithms turn out to be a sweet spot that, on the one hand, allows many results of interest and, on the other hand, does not lead to triviality because O(n log^c n) is less than the space necessary for storing the input graph (unless this graph is very sparse). In particular, Feigenbaum et al. [19] observed that one can obtain a 1/2-approximation for the maximum matching problem using a simple semi-streaming algorithm that greedily constructs a maximal matching. The above 1/2-approximation semi-streaming algorithm for the maximum matching problem also has the desirable property that it reads the input stream only once (i.e., it makes a single pass over it). Surprisingly, no single-pass semi-streaming algorithm improving over the approximation ratio of this simple algorithm was suggested in the decade and a half that has passed since the work of [19] (in contrast, Kapralov [26] showed that no such algorithm can have an approximation ratio better than 1/(1 + ln 2) ≈ 0.59, improving over previous inapproximability results due to [22,25]). Given this lack of progress, interest arose in obtaining improved approximation ratios for relaxed versions of the above problem. Perhaps one of the simplest such relaxations is to allow the algorithm to make a few (usually two or three) sequential passes over the input stream.
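The greedy single-pass algorithm mentioned above can be sketched in a few lines. This is only an illustrative sketch: the function name `greedy_maximal_matching` and the representation of the stream as an iterable of vertex pairs are assumptions made here, not notation from the paper.

```python
from typing import Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def greedy_maximal_matching(stream: Iterable[Edge]) -> List[Edge]:
    """Single pass: keep an edge iff both of its end points are still unmatched."""
    matched: Set[Hashable] = set()   # vertices covered so far, at most n entries
    matching: List[Edge] = []        # at most n/2 edges
    for u, v in stream:
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching


# Path a-b-c-d whose middle edge arrives first: greedy keeps one edge,
# while the maximum matching {(a, b), (c, d)} has two edges.
print(greedy_maximal_matching([("b", "c"), ("a", "b"), ("c", "d")]))
# → [('b', 'c')]
```

The example stream shows that the 1/2 guarantee can be tight: the output is maximal (no remaining edge can be added), yet it is half the size of a maximum matching.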
The last line of work was introduced by Konrad et al. [29], and was later studied by [18]. The state-of-the-art results for it are summarized in Table 1. We note that besides the state-of-the-art results for general input graphs, Table 1 also gives improved results for bipartite and triangle-free graphs.
Table 1. The state-of-the-art approximation ratios for semi-streaming algorithms using two or three passes, and our improvements over these ratios (the number to the right of each improvement is the number of the theorem formally stating it).
All the known results in this line of work (to the best of our knowledge) start by greedily constructing a maximal matching during the first pass over the input stream, and then augmenting this matching in the subsequent passes. Recently, Konrad and Naidu [30] showed that this technique has limitations (specifically, even for bipartite graphs, a two-pass semi-streaming algorithm based on this technique cannot obtain a better than 2/3-approximation, which is much stricter than the inapproximability known for general two-pass semi-streaming algorithms [2]). Additionally, and arguably more importantly, multi-pass algorithms that use their first pass for constructing a maximal matching are unlikely to be a step towards a single-pass semi-streaming algorithm with a better than 1/2-approximation guarantee. Given the above observations, it is natural to believe that the future of the study of semi-streaming algorithms for the maximum matching problem lies in algorithms that use their first pass in a more sophisticated way than simply constructing the traditional maximal matching. We term such algorithms non-maximal-matching-first algorithms (or non-MMF algorithms for short). In this paper, we present the first non-MMF algorithms, which lead to improvements over the state-of-the-art both for two and three passes. Admittedly, the improvements we obtain are numerically not very impressive, but their main importance (in our opinion) is in demonstrating the potential of non-MMF algorithms.
To intuitively understand our non-MMF algorithms, one should note that greedily constructing a maximal matching is equivalent to greedily constructing a graph whose connected components are of size at most 2 (where the size of a connected component is defined as the number of vertices in it). Therefore, a natural generalization is to greedily construct in the first pass a graph whose connected components are of size at most 3. There are two intuitive advantages for doing that compared to constructing a maximal matching.
A connected component of size 3 can contribute two edges to the output matching if it is "augmented" during the next passes with a single additional edge. In contrast, doing the same with a connected component of size 2 requires "augmenting" it with two additional edges. It is important to note that there is a significant conceptual difference between an augmentation of a connected component with one or two edges. Augmenting a connected component with two edges requires finding pairs of edges that augment the same connected component, while augmenting with a single edge does not require such a synchronization.
If we are not able to enjoy the above advantage because many connected components end up being of size 2 rather than 3, then the fact that this has happened despite us only restricting the components to be of size at most 3 implies that few edges of a maximum matching intersect only a single connected component of the constructed graph; therefore, the constructed graph must have many connected components compared to the size of a maximum matching.
Using the above ideas, we prove the following two theorems. The proof of Theorem 1 appears in Section 3, while the adaptation of this proof leading to Theorem 2 is deferred to the full version of this paper [20].
The algorithms used to prove Theorems 3 and 4 are strongly based on the algorithms suggested by Kale and Tirodkar [24]. For example, the first two passes of the algorithm suggested by Theorem 3 are identical to a two-pass algorithm presented by [24], and the third pass of this algorithm is very similar to the third pass of the three-pass algorithm of [24]. Our novelty, however, is in our ability to analyze the algorithm obtained by putting these two components together.

Related Work
As mentioned in Section 1, streaming algorithms are not appropriate for the maximum matching problem when the algorithm is required to output an (approximately) maximum matching. However, some non-trivial streaming algorithms are known for this problem when the algorithm is only required to estimate the size of the maximum matching. Kapralov et al. [27] designed a poly-log approximation streaming algorithm for this problem under the assumption that the edges in the input stream are ordered in a uniformly random order. A different line of work [13, 17, 32] considered graphs of bounded arboricity α, culminating with the work of McGregor and Vorotnikova [33], who designed an (α + 2)(1 + ε)-approximation streaming algorithm for this problem requiring only O(ε^{-2} log n) space. In contrast, Assadi et al. [5] showed that a (1 − ε)-approximation of the size of the maximum matching cannot be obtained by a single-pass algorithm, even if this algorithm is allowed a semi-streaming space complexity, and [6, 8] lower bounded the number of passes required to obtain such a good approximation using a sub-polynomial space complexity.
Recall that, to date, the best single-pass semi-streaming algorithm for the maximum matching problem is still the natural greedy algorithm, which guarantees a 1/2-approximation. Chitnis et al. [12] presented an exact single-pass algorithm for this problem. However, this algorithm requires Õ(k^2) memory, where k is an upper bound on the size of the maximum matching (which the algorithm needs to know upfront), and thus, this algorithm is a semi-streaming algorithm only when k = Õ(√n). Given the difficulty of improving over the guarantee of the greedy algorithm using single-pass semi-streaming algorithms, people also consider relaxed versions of the maximum matching problem. One standard relaxation is to allow the algorithm to make multiple passes over the input stream. Section 1 surveys algorithms of this kind that use two or three passes. Another line of work considers algorithms that assume a constant (but possibly large) number of passes. The first result of this kind was presented by Feigenbaum et al. [19] (in the same paper that also introduced the notion of semi-streaming algorithms), and guaranteed a (2/3 − ε)-approximation using O(ε^{-1} log ε^{-1}) passes for bipartite graphs. Later [31] showed how to obtain a (1 − ε)-approximation for general graphs using (ε^{-1})^{O(ε^{-1})} passes, and the number of passes necessary to obtain this guarantee was improved by many further works (see, e.g., [1,4,7,21]). Another standard relaxation for the maximum matching problem is to assume that the edges of the input stream appear in a uniformly random order. The state-of-the-art for this relaxation is a (2/3 + ε_0)-approximation single-pass semi-streaming algorithm, where ε_0 > 0 is some absolute constant [3, 10] (see also the references therein for previous works on this relaxation).
The related maximum weight matching problem was also studied heavily in the context of the data stream model. Here, it is not immediately clear that one can obtain a constant approximation ratio using a single-pass semi-streaming algorithm. However, Feigenbaum et al. [19] presented the first such algorithm guaranteeing a 1/6-approximation, and this ratio was improved in a series of works [14,16,31,35]. The current state-of-the-art for the problem is a (1/2 − ε)-approximation due to Paz and Schwartzman [34]. Since this approximation ratio is essentially identical to the state-of-the-art for the (unweighted) maximum matching problem, any further progress on the maximum weight matching problem (beyond removing the ε) will imply an improvement over the guarantee of the greedy algorithm for the (unweighted) maximum matching problem. It is also worth mentioning that a recent reduction due to Bernstein et al. [11] shows that the reverse is also true in a sense. More specifically, any semi-streaming algorithm for bipartite unweighted graphs can be translated into such an algorithm for weighted graphs with the same number of passes and a loss of only 1 − ε in the approximation guarantee. Naturally, this reduction automatically extends some of our results to the weighted case.

Preliminaries
In this section we present the problem that we study more formally, and also introduce the notation used throughout the rest of the paper. We are interested in semi-streaming algorithms for the problem of finding a maximum size matching in a graph G = (V, E) of n vertices. A semi-streaming algorithm for this problem is an algorithm with a space complexity of O(n log^c n) (for some constant c ≥ 0) that initially has no knowledge about the edges of E. Instead, the edges of E appear sequentially in an "input stream", and the algorithm may make one or more passes over this input stream. In each pass the algorithm sees the edges one by one, and may do arbitrary calculations after viewing each edge. It is important to note that the space complexity allowed for the algorithm does not suffice for storing all the edges of the graph (unless the graph is very sparse), and this is the reason that the algorithm might benefit from doing multiple passes over the input stream. It is standard to assume that the vertices of V are known upfront, and that each vertex of V can be stored using O(log n) bits (which implies that every edge of E can also be stored using this asymptotic number of bits). Throughout the paper, we consider only unweighted graphs and matchings. We also denote by M* an arbitrary maximum matching of G (i.e., an arbitrary optimal solution for our problem). Notation-wise, we treat M* (and any other matching considered in the paper) as a set of the edges included in it. Similarly, when considering a connected component C of a graph, we treat it as a set of the vertices in it, which in particular, implies that |C| is the number of such vertices.
Given a set of edges S or a path P in a graph, we denote by V(S) and V(P) the set of vertices intersecting any edge of S or P, respectively. Similarly, the set of edges included in the path P is denoted by E(P). Often we need to consider collections of paths (or triangles) in a given graph. For clarity, such collections are always denoted using calligraphic letters, and we extend the above notation to such collections. In other words, if 𝒫 is a collection of paths, then V(𝒫) and E(𝒫) are the sets of vertices and edges, respectively, that are included in these paths. Finally, given a set S of edges and a vertex v, we use deg_S(v) to denote the degree of the vertex v in the subgraph (V, S).

Two-Pass Non-MMF Algorithm
In this section we prove Theorem 1, which we repeat below for convenience.
The algorithm whose existence is guaranteed by Theorem 1 appears as Algorithm 1. In its first pass, this algorithm greedily grows a set P of edges that form either triangles or partial triangles (i.e., isolated edges or paths of length 2). For simplicity, we refer below to the connected components of (V, P) that are not isolated vertices as partial triangles although, technically, they can also be full triangles.
In the second pass of Algorithm 1, the algorithm tries to convert the partial triangles of P into more involved structures in one of two ways. To understand these ways, we need to define some terms. First, we designate some of the vertices of every partial triangle as "connection vertices". Specifically, all the vertices of a triangle are considered connection vertices; in a path of length 2 only the two end points are considered to be connection vertices; and finally, in an isolated edge there are no connection vertices. We refer to a partial triangle that was not converted yet into a more involved structure as a "naïve" partial triangle. The first way in which Algorithm 1 tries to convert the partial triangles of P into more involved structures is by greedily adding edges that connect a connection vertex of a naïve partial triangle with an isolated vertex. The set A_1 in the algorithm includes the edges that were added in this way. In parallel, the algorithm also tries a second way to convert the partial triangles of P into more involved structures, which is to greedily add edges that connect a connection vertex of a naïve partial triangle either to a connection vertex of another naïve partial triangle or to an isolated vertex. The set A_2 in the algorithm includes the edges that were added in this way. Upon termination, Algorithm 1 outputs a maximum matching in the set of all the edges that it kept. We recall that given a connected component C of a graph, the notation |C| represents the number of vertices in C.
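The first pass and the notion of connection vertices described above can be illustrated with the following sketch. It is not the paper's pseudocode: it assumes that the greedy rule is "keep an arriving edge whenever every connected component of (V, P) still has at most 3 vertices afterwards", which matches the description of triangles and partial triangles, and the names `first_pass` and `connection_vertices` are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set, Tuple

Vertex = Hashable
Edge = Tuple[Vertex, Vertex]


def first_pass(vertices: Iterable[Vertex], stream: Iterable[Edge]):
    """Greedily grow P while every component of (V, P) keeps at most 3 vertices.

    Assumed rule (not taken verbatim from the paper): an edge is kept iff the
    component obtained by adding it has at most 3 vertices.
    """
    comp: Dict[Vertex, FrozenSet[Vertex]] = {v: frozenset([v]) for v in vertices}
    P: List[Edge] = []
    kept: Set[FrozenSet[Vertex]] = set()  # guards against repeated edges
    for u, v in stream:
        key = frozenset((u, v))
        if u == v or key in kept:
            continue
        merged = comp[u] | comp[v]
        if len(merged) <= 3:
            P.append((u, v))
            kept.add(key)
            for w in merged:
                comp[w] = merged
    return P, comp


def connection_vertices(component: FrozenSet[Vertex], P: List[Edge]) -> Set[Vertex]:
    """Triangle: all three vertices; path of length 2: its two end points;
    isolated edge or isolated vertex: none."""
    if len(component) < 3:
        return set()
    inside = [e for e in P if e[0] in component and e[1] in component]
    if len(inside) == 3:
        return set(component)
    deg = Counter(x for e in inside for x in e)
    return {x for x in component if deg[x] == 1}


P, comp = first_pass(range(1, 7), [(1, 2), (2, 3), (3, 4), (1, 3), (5, 6)])
print(P)  # → [(1, 2), (2, 3), (1, 3), (5, 6)]
print(sorted(connection_vertices(comp[1], P)))  # → [1, 2, 3]
print(sorted(connection_vertices(comp[5], P)))  # → []
```

In the example, the edge (3, 4) is rejected because it would create a component with four vertices, and the later edge (1, 3) closes the path 1-2-3 into a triangle. Since every component has at most 3 vertices, the sketch stores O(n) edges.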
We begin the analysis of Algorithm 1 by noting that it is indeed a semi-streaming algorithm. The proof of the next observation can be found in the full version of this paper [20].

▶ Observation 5. Algorithm 1 is a semi-streaming algorithm.
Algorithm 1 (fragment):
  // Second Pass
  Let C_u and C_v be the connected components of u and v, respectively, in (V, P).
  We assume without loss of generality that |C_u| > 1, otherwise we swap the roles of u and v. // Note that we cannot have |C_u| = |C_v| = 1 because the edge (u, v) was not added to P in the first pass.
  else if no edge of A_2 intersects C_u and C_v, and u and v are connection vertices of C_u and C_v, respectively then
    Add the edge (u, v) to A_2.
  return a maximum matching in the graph (V, P ∪ A_1 ∪ A_2).
In the rest of this section we analyze the approximation ratio of Algorithm 1. Recall that we use M* to denote some maximum matching of G. Our first objective in the analysis of the approximation ratio of Algorithm 1 is to lower bound the number of edges of M* that can potentially be added either to A_1 or to A_2. Towards this goal, we define a charging scheme π. Under the charging scheme π, every edge (u, v) ∈ M* charges the connected components of u and v in (V, P). Each one of these connected components is charged one unit by (u, v), unless it is an isolated edge or an isolated vertex, in which case it is charged only half a unit or nothing by (u, v), respectively. We note that when u and v belong to the same connected component of (V, P), then this connected component is charged twice by (u, v). The following observation provides an upper bound on the total amount charged by all the edges of M* together. Let (#single) be the number of isolated edges in P, (#double) be the number of connected components in (V, P) that are paths of length 2 and (#triangle) be the number of triangles in P.
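As a small worked example of the charging scheme π, the sketch below computes the total charge for a given matching and compares it with the sum of the per-component caps (one unit per isolated edge, three units per path of length 2 or triangle). The function name `total_charge`, the component labels, and the dictionary-based input format are assumptions made only for this illustration.

```python
from fractions import Fraction
from typing import Dict, Hashable, Iterable, Tuple

# Units charged to a component by one end point of an edge of M*.
UNIT = {
    "vertex": Fraction(0),      # isolated vertex: charged nothing
    "edge": Fraction(1, 2),     # isolated edge: charged half a unit
    "path": Fraction(1),        # path of length 2: charged one unit
    "triangle": Fraction(1),    # triangle: charged one unit
}


def total_charge(
    kind_of: Dict[str, str],
    component_of: Dict[Hashable, str],
    matching: Iterable[Tuple[Hashable, Hashable]],
) -> Fraction:
    """Sum the charges made by every edge (u, v) of the matching.

    An edge with both end points in one component charges it twice,
    because each end point is handled separately.
    """
    total = Fraction(0)
    for u, v in matching:
        total += UNIT[kind_of[component_of[u]]] + UNIT[kind_of[component_of[v]]]
    return total


# (V, P) has an isolated edge {a, b}, a path c-d-e and an isolated vertex f.
kind_of = {"E": "edge", "Q": "path", "F": "vertex"}
component_of = {"a": "E", "b": "E", "c": "Q", "d": "Q", "e": "Q", "f": "F"}
m_star = [("a", "b"), ("c", "d"), ("e", "f")]  # plays the role of M*

charge = total_charge(kind_of, component_of, m_star)
caps = 1 * 1 + 3 * 1 + 3 * 0  # (#single) + 3(#double) + 3(#triangle)
print(charge, caps)  # → 4 4
```

Here the edge (a, b) charges its isolated edge twice by half a unit, (c, d) charges the path twice by one unit, and (e, f) charges only the path, so the total charge of 4 meets the cap exactly.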
▶ Observation 6. The total charge of all the edges of M* according to the charging scheme π is at most (#single) + 3(#double) + 3(#triangle).
Proof. Every positive amount charged by π is charged to some connected component of (V, P) which is not an isolated vertex. Therefore, to prove the observation we only need to show that every isolated edge of (V, P) is charged at most one unit, and every connected component of (V, P) that is either a path of length 2 or a triangle is charged at most 3 units. Below we argue that this is indeed the case.
Each connected component C of (V, P) can be charged at most once for every one of its vertices since the fact that M* is a matching implies that every vertex of C can appear in at most a single edge of M*. For isolated edges of (V, P), this implies that they can be charged at most twice, and therefore, they are charged at most one unit because they are charged half a unit in each charge. Similarly, connected components of (V, P) that are either paths of length 2 or triangles contain 3 vertices, and therefore, can be charged at most three times. Since every one of these charges is of a single unit, the total charge to each connected component of these kinds is at most 3. ◀
To complement the last observation, let us now describe a simple lower bound on the total charging done by all the edges of M* according to π. Let (#component-free) be the number of edges of M* that connect a connection vertex of a connected component of (V, P) to an isolated vertex of (V, P), (#component-component) be the number of edges of M* that connect connection vertices of two different connected components of (V, P), (#single-single) be the number of edges of M* whose two end points belong to (not necessarily distinct) isolated edges of (V, P), (#single-component) be the number of edges of M* that connect a vertex of an isolated edge of (V, P) with a connection vertex of some (other) connected component of (V, P) and (#middle) be the number of edges of M* that either intersect the middle vertex of a length 2 path connected component of (V, P) or are included within a triangle connected component of (V, P). For convenience, the definitions of the notation we use are summarized in Appendix B.
▶ Observation 7. The total charge of all the edges of M* according to the charging scheme π is at least (#component-free)
Proof. Since the edges of M* counted by (#component-free) intersect a connection vertex, they must intersect a connected component of (V, P) which is not an isolated vertex or an isolated edge, and therefore, they charge this connected component one unit. Hence, the total charge by all the edges counted by (#component-free) is at least (#component-free). Similar logic shows that the total charge by all the edges counted by (#component-component),
Combining Observations 6 and 7, we get the following inequality.
(#component-free) + 2(#component-component) + (#single-single)    (1)
In its current form, Inequality (1) is not very useful. We later derive from it a more convenient inequality, but before doing this we need to prove a few other inequalities. Let (#non-M*-triangles) denote the number of triangle connected components of (V, P) that do not include any edge of M* within them.
▶ Lemma 8. The following inequalities hold and they imply together
Proof. Since every edge that is included in a connected component of (V, P) which is a path of length 2 must include the middle vertex of this path, every edge e ∈ M* that is not counted by either (#component-free), (#component-component), (#single-single), (#single-component) or (#middle) must either connect a vertex of an isolated edge of (V, P) to an isolated vertex or connect two isolated vertices of (V, P). However, such edges cannot exist. Specifically, assume towards a contradiction that (u, v) is an edge of M* such that u is an isolated vertex of (V, P) and v is either another isolated vertex of (V, P) or belongs to an isolated edge of this graph. Then, the edge (u, v) should have been added by Algorithm 1 to P upon arrival, which contradicts the fact that its end point u ended up as an isolated vertex of (V, P). Hence, every edge e ∈ M* is counted by either (#component-free), (#component-component), (#single-single), (#single-component) or (#middle), which implies Inequality (2).
Recall that every edge counted by (#middle) must either be included in a triangle connected component of (V, P) or intersect the middle vertex of a path of length 2 connected component of (V, P). Since M* is a matching, only one edge of M* can intersect the middle vertex of a given length 2 path or be included in a given triangle, and therefore, every edge counted by (#middle) can be associated with a distinct path of length 2 or triangle component of (V, P) that is not counted by (#non-M*-triangles), which implies Inequality (3).
Every edge counted by (#single-single) touches two end points of isolated edges of (V, P). Similarly, every edge counted by (#single-component) intersects an end point of an isolated edge of (V, P). Since every end point of an isolated edge of (V, P) can be touched by at most a single edge of M* because M* is a matching, this implies that the number of end points of the isolated edges of (V, P) is at least 2(#single-single) + (#single-component). However, this number is also equal to 2(#single), which implies Inequality (4). ◀
The last inequality in the previous lemma provides a lower bound on (#component-free) + (#component-component), and one can view (#component-free) + (#component-component) as a count of edges of M* that have the potential to be added to A_2 in Algorithm 1. The next lemma is the promised derivative of Inequality (1), and it provides a lower bound on (#component-free). Observe that (#component-free) is a count of edges of M* that have the potential to be added to A_1.
Proof. We say that an edge e of M* counted by (#component-free) is excluded by an edge f ∈ A_1 if e and f intersect the same connected component of (V, P). One can observe that every edge e counted by (#component-free) is excluded by some edge of A_1 (possibly itself) when Algorithm 1 terminates because otherwise Algorithm 1 would have added e to A_1, which would have resulted in e excluding itself. Therefore, we can upper bound (#component-free) by counting the number of edges excluded by the edges of A_1.
Let (u, v) be an edge of A_1, and assume without loss of generality that v is the end point of this edge which is an isolated vertex of (V, P). This implies that u is a connection vertex of a connected component C_u of (V, P) which is either a path of length 2 or a triangle. If C_u is a path of length 2, then the edge (u, v) can exclude only edges counted by (#component-free) that intersect either v or a connection vertex of C_u, and there can be only 3 such edges because M* is a matching (see Figure 1a). Next, consider the case in which C_u is a triangle which is not counted by (#non-M*-triangles). In this case there can be at most 2 edges of M* intersecting C_u (see Figure 1b), and since (u, v) can exclude only edges that intersect either C_u or v, we get that it can exclude at most 3 edges. It remains to consider the case in which C_u is a triangle counted by (#non-M*-triangles). In this case, (u, v) can again exclude every edge of M* that intersects C_u or v, and this time there can be at most 4 such edges (see Figure 1c). Combining all the above, we get that the number of edges excluded by all the edges of A_1 is at most
3|A_1| + |{e ∈ A_1 | e intersects a triangle counted by (#non-M*-triangles)}| .
As explained above, this expression is an upper bound on (#component-free). Furthermore, since A_1 includes at most a single edge intersecting every connected component of (V, P), the second term in this expression is upper bounded by (#non-M*-triangles). Therefore, we get
(#component-free) ≤ 3|A_1| + (#non-M*-triangles) .
The lemma now follows by rearranging this inequality. ◀
The next corollary now follows by combining Lemmata 9 and 10.
▶ Lemma 12. It holds that
The proof of Lemma 12 is quite similar to the proof of Lemma 10. Therefore, and due to space constraints, we defer it to Appendix C. The next corollary now follows by combining Lemma 12 and the final inequality in Lemma 8.
Let us now denote L = (#single) + (#double) + (#triangle) + max{|A_1|, |A_2|}. We argue below that L is a lower bound on the size of the solution produced by Algorithm 1. However, before proving this, let us show first that L is large.

▶ Lemma 15. Algorithm 1 outputs a matching of size at least L.
Proof. Since Algorithm 1 outputs a maximum matching in (V, P ∪ A_1 ∪ A_2), to prove the lemma it suffices to show that the graph (V, P ∪ A_1) includes a matching of size (#single) + (#double) + (#triangle) + |A_1| and the graph (V, P ∪ A_2) includes a matching of size (#single) + (#double) + (#triangle) + |A_2|. We prove below only the claim regarding (V, P ∪ A_2). The claim regarding (V, P ∪ A_1) can be proved analogously.
Let H be the number of edges in A_2 that connect two non-isolated vertices of (V, P). Then, we classify the connected components of (V, P ∪ A_2) as follows, and show how to build a large matching M based on this classification.
(V, P ∪ A_2) includes (#single) + (#double) + (#triangle) − |A_2| − H connected components that are (i) not an isolated node, and (ii) appear also in (V, P). Each one of these connected components contains at least one edge, and therefore, can contribute some edge to M.
(V, P ∪ A_2) includes |A_2| − H connected components that consist of a connected component C of (V, P) that has connection vertices and an edge e connecting a connection vertex of C to an isolated vertex of (V, P). One can observe that the combination of C and e must be either a path of length 3 or a triangle and an edge attached to one of its vertices, and in both cases this combined connected component contains two vertex disjoint edges which it can contribute to the matching M.
(V, P ∪ A_2) includes H connected components that consist of two connected components C_1, C_2 of (V, P) that have connection vertices and an edge e connecting a connection vertex of C_1 with a connection vertex of C_2. There are three shapes that the connected component obtained in this way can take: a path of length 5, a triangle with a path of length 3 attached to one of its vertices or two triangles and an edge connecting them. However, one can observe that all these shapes include three vertex disjoint edges that can be contributed to the matching M.
Summing the contributions of the three kinds of connected components, the size of M is at least [(#single) + (#double) + (#triangle) − |A_2| − H] + 2(|A_2| − H) + 3H = (#single) + (#double) + (#triangle) + |A_2|. ◀
Before concluding this section, we note that Theorem 2 is proved in the full version of this paper [20] by splitting the second pass of Algorithm 1 into two passes: one pass that constructs A_1, and a second pass that constructs A_2 while making sure not to use again connected components of (V, P) already used by A_1.

Three-Pass Algorithm for Triangle-Free Graphs
In this section we prove Theorem 3, which we repeat here for convenience.
We refer to the algorithm whose existence is guaranteed by Theorem 3 as TriangleFreeAlg. In its first pass, TriangleFreeAlg constructs a maximal matching M_0 of G. Formally, the pseudocode for this pass appears as Algorithm 2. We say that an edge e ∈ E is a wing if e includes exactly one vertex of V(M_0). Intuitively, the reason we are interested in wings is that one can obtain an augmenting path for M_0 by combining an edge (u, v) ∈ M_0 with two wings: one wing that intersects u and one wing that intersects v. The second pass of TriangleFreeAlg grows a set W of wings. Since we hope to construct multiple augmenting paths using these wings, the algorithm makes sure to limit the number of wings in W that intersect any given vertex u (specifically, the algorithm allows only a single wing in W to intersect u if u ∈ V(M_0), and otherwise it allows up to two wings of W to intersect u). The pseudocode of this second pass appears as Algorithm 3.
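The wing-collection rule of the second pass can be sketched as follows. This is an illustration under stated assumptions rather than a transcription of Algorithm 3: it assumes that a wing is accepted greedily whenever both of its end points are still below their caps (one wing for a vertex of V(M_0), two wings otherwise), and the name `select_wings` is hypothetical. The post-processing step that builds augmenting paths is not included.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, List, Tuple

Vertex = Hashable
Edge = Tuple[Vertex, Vertex]


def select_wings(m0: Iterable[Edge], stream: Iterable[Edge]) -> List[Edge]:
    """Greedily collect wings subject to the per-vertex caps described above."""
    matched = {x for e in m0 for x in e}          # V(M_0)
    deg: DefaultDict[Vertex, int] = defaultdict(int)  # deg_W(v) so far
    wings: List[Edge] = []
    for u, v in stream:
        # A wing has exactly one end point in V(M_0).
        if (u in matched) == (v in matched):
            continue
        cap_u = 1 if u in matched else 2
        cap_v = 1 if v in matched else 2
        if deg[u] < cap_u and deg[v] < cap_v:
            wings.append((u, v))
            deg[u] += 1
            deg[v] += 1
    return wings


m0 = [("a", "b")]
stream = [("x", "a"), ("y", "a"), ("a", "b"), ("b", "y"), ("b", "z")]
print(select_wings(m0, stream))
# → [('x', 'a'), ('b', 'y')]
```

In the example, (y, a) and (b, z) are rejected because a and b already carry one wing each, and the two kept wings together with the edge (a, b) form the path x, a, b, y of length 3. Because every vertex is touched by at most two wings, W contains O(n) edges.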
Algorithm 3 also includes a post-processing step in which a set P 1 of augmenting paths (with respect to M 0 ) is constructed using W . This is done by constructing an auxiliary multi-graph G A over the vertices of V \ V (M 0 ) in which there is an edge between two nodes u, v ∈ V \ V (M 0 ) for every path P u,v of length 3 in W ∪ M 0 between them. One can note that every such path P u,v must be an augmenting path consisting of an edge e ∈ M 0 and two wings from W : one intersecting u and an end-point of e, and the other intersecting v and the other end-point of e. Algorithm 3 finds a maximum size matching M A in G A , and then sets P 1 to be the collection of (augmenting) paths corresponding to the edges of M A .
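The post-processing step can be sketched as follows. This is an illustration under our own assumptions: wings are given as (free_vertex, matched_vertex) pairs, the function names are hypothetical, and the exponential-time brute-force search is only a stand-in for whichever maximum matching routine is actually used on G A .

```python
from itertools import combinations

def build_auxiliary_edges(m0, wings):
    """Edges of G_A as (u, v, path) triples, one per length-3 path in W ∪ M0."""
    # Every vertex of V(M0) intersects at most one wing of W.
    wing_at = {a: free for free, a in wings}
    edges = []
    for a, b in m0:
        if a in wing_at and b in wing_at and wing_at[a] != wing_at[b]:
            u, v = wing_at[a], wing_at[b]
            edges.append((u, v, (u, a, b, v)))
    return edges

def maximum_matching_bruteforce(edges):
    """Maximum matching by exhaustive search; suitable for tiny inputs only."""
    for k in range(len(edges), 0, -1):
        for subset in combinations(edges, k):
            endpoints = [x for u, v, _ in subset for x in (u, v)]
            if len(endpoints) == len(set(endpoints)):
                return list(subset)
    return []

m0 = [(1, 2), (5, 6)]
wings = [(0, 1), (3, 2), (3, 5), (7, 6)]
aux_edges = build_auxiliary_edges(m0, wings)
p1 = [path for _, _, path in maximum_matching_bruteforce(aux_edges)]
```

In this example both candidate paths use the free vertex 3, so the two edges of G A share an end-point and only one of the paths can be placed in P 1 .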
Consider now an edge e ∈ M 0 that does not appear in any path of P 1 and is connected by some wing w ∈ W to some vertex u ̸ ∈ V (M 0 ) ∪ V (P 1 ). The pair e, w can be extended into an augmenting path if one can find another wing w ′ connecting the other end of e (the end that does not intersect w) to a vertex v ̸ ∈ V (M 0 ) ∪ V (P 1 ) that is not u. The third pass of TriangleFreeAlg greedily constructs a collection P 2 of augmenting paths in this way. A pseudocode of this pass appears as Algorithm 4. After completing the pass, Algorithm 4 returns the matching obtained by augmenting M 0 with the augmenting paths of P 1 and P 2 .
Add the path u, a, b, v to P 2 . // Note that u ̸ = v because otherwise u, a, b, v would have been a triangle.
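The greedy logic of the third pass can be sketched as below. This is a hedged illustration rather than a transcription of Algorithm 4: the function name, the (free_vertex, matched_vertex) representation of wings, and the exact order of the feasibility checks are assumptions made for the example.

```python
def third_pass(stream, m0, wings, p1):
    """Greedily complete (wing of W, edge of M0) pairs into augmenting paths."""
    mate = {}
    for a, b in m0:
        mate[a], mate[b] = b, a
    wing_at = {a: free for free, a in wings}
    used = {x for path in p1 for x in path}  # vertices of P1, and later of P2
    p2 = []
    for x, y in stream:
        if (x in mate) == (y in mate):
            continue  # the arriving edge is not a wing
        b, v = (x, y) if x in mate else (y, x)
        a = mate[b]
        u = wing_at.get(a)
        if u is None or u == v:
            continue  # no wing of W at the other end (u == v would be a triangle)
        if {u, a, b, v} & used:
            continue  # the path would intersect a path of P1 or P2
        p2.append((u, a, b, v))
        used.update((u, a, b, v))
    return p2

m0 = [(1, 2)]
wings = [(0, 1)]
p2 = third_pass([(1, 2), (0, 1), (2, 3)], m0, wings, p1=[])
```

Here the arriving wing (2, 3) completes the stored wing (0, 1) and the matching edge (1, 2) into the augmenting path 0, 1, 2, 3.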
We begin the analysis of TriangleFreeAlg with the following lemma, which shows that this algorithm returns a matching, and also gives a basic lower bound on the size of this matching. Due to space constraints, the proof of this lemma is deferred to Appendix C.
▶ Lemma 17. The paths in P 1 and P 2 are vertex disjoint, and therefore, the output of TriangleFreeAlg is a matching of size |M 0 | + |P 1 | + |P 2 |.
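The size claim of Lemma 17 can be illustrated with a short sketch that applies vertex disjoint length-3 augmenting paths to M 0 . The helper names augment and is_matching, and the representation of each path as a tuple (u, a, b, v) with (a, b) ∈ M 0 , are assumptions made for this example.

```python
def augment(m0, paths):
    """Replace the middle edge of every path (u, a, b, v) by its two end edges."""
    matching = {frozenset(e) for e in m0}
    for u, a, b, v in paths:
        matching.remove(frozenset((a, b)))
        matching.add(frozenset((u, a)))
        matching.add(frozenset((b, v)))
    return matching

def is_matching(edges):
    """Check that no vertex appears in two edges."""
    vertices = [x for e in edges for x in e]
    return len(vertices) == len(set(vertices))

m0 = [(1, 2), (5, 6)]
p1 = [(0, 1, 2, 3)]
p2 = [(4, 5, 6, 7)]
result = augment(m0, p1 + p2)
```

Each augmenting path removes one edge and adds two, so the result has |M 0 | + |P 1 | + |P 2 | = 4 edges here.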
Using the last lemma we can also bound the space complexity of Algorithm 4. The technical proof of the next corollary can be found in the full version of this paper [20].

▶ Corollary 18. TriangleFreeAlg is a semi-streaming algorithm.
It remains to analyze the approximation ratio of TriangleFreeAlg. Our analysis roughly follows the flow of the algorithm, and thus, we begin by observing that the matching M 0 constructed in the first pass of this algorithm is of size at least |M * |/2 (recall that M * is a maximum size matching of G) because M 0 is a maximal matching of G by construction.

APPROX/RANDOM 2022
In its second pass, TriangleFreeAlg constructs the set W of wings. Our next objective is to lower bound the size of W . Towards this goal, we define W M to be the set of all edges of M * that are wings (we recall that an edge e is a wing if exactly one of its end-points appears in V (M 0 )).
where the equality holds since every edge of W M is a wing, and therefore, intersects a single vertex of V (M 0 ). The lemma now follows by adding two copies of Inequality (6) to Inequality (5). ◀
We now get to the analysis of the third pass of TriangleFreeAlg, and our first goal in this analysis is to identify a set of paths that have the potential (in some sense) to end up in P 2 . Let P ′ be the set of paths of length 3 in G that consist of a wing of W M followed by an edge of M 0 and then a wing of W . We think of the paths in P ′ as directed from their W M edge to their W edge, and consider two paths that differ only in their direction to be different paths. This is important because if there is an edge e ∈ M 0 incident to two edges w 1 , w 2 ∈ W ∩ W M , then the path w 1 , e, w 2 fulfills the requirements to belong to P ′ both when w 1 is considered the first edge in it and when w 2 is considered the first edge of the path. Thus, the fact that we treat the direction of the path as part of the path's definition allows both the paths w 1 , e, w 2 and w 2 , e, w 1 to appear in P ′ .
| (a, b) ∈ M 0 }. One can now observe that P ′ includes a (distinct) path for every wing of W M that intersects b(a) for some vertex a ∈ V W . Therefore, where the first equality holds since {b(a) | a ∈ V W } is a subset of V (M 0 ), and the last inequality follows from Observation 19 and Lemma 20. ◀
A path in P ′ has the potential to be added to P 2 only if none of its vertices appears in P 1 . Let P ′′ be the set of such paths (formally, P ′′ = {P ∈ P ′ | V (P ) ∩ V (P 1 ) = ∅}). The following lemma lower bounds the size of P ′′ .
Proof. The second inequality of the lemma follows from Observation 21, and therefore, we concentrate on proving the first inequality. Towards this goal, assume that P ′ ∈ P ′ is a path that intersects a path P 1 ∈ P 1 on an internal vertex. Since the middle edge of both paths is an edge of M 0 , this implies that the two paths intersect on both their internal vertices. Furthermore, since both end-edges of P 1 and one end-edge of P ′ belong to W , there must be an internal vertex a ∈ V (M 0 ) of both paths that intersects an edge of W in both paths. However, since deg W (a) ≤ 1, the edges of W intersecting a in both paths must be identical, which implies that the paths P ′ and P 1 intersect also on some end-point. Since P ′ and P 1 were chosen as general paths of P ′ and P 1 , respectively, that intersect on an internal node, this implies that the difference |P ′ | − |P ′′ | is equal to the number of paths in P ′ that intersect a path of P 1 in an end-point. The rest of the proof is devoted to proving that the last number is at most 6|P 1 |.
Since each path of P 1 has only two end points, to prove that the paths of P 1 intersect at most 6|P 1 | paths of P ′ at an end-point, it suffices to show that every vertex of V \ V (M 0 ) can appear in at most 3 paths of P ′ . To see why that is the case, consider an arbitrary vertex u ∈ V \ V (M 0 ). If u belongs to some path P ′ ∈ P ′ , then it must be in one of two roles as follows.
If u is the last vertex of the path, then the last edge of the path is an edge e ∈ W that includes u, and the other edges of the path P ′ are the single edge of M 0 intersecting e and the single edge of W M intersecting e. Note that this means that the identity of the entire path is determined by the edge e, and therefore, the number of paths of P ′ in which u is the last vertex can be upper bounded by deg W (u) ≤ 2.
If u is the first vertex of the path, then the first edge of the path is the single edge e ∈ W M that includes u, and the other edges of the path are the single edge e ′ ∈ M 0 that intersects e and the single edge e ′′ ∈ W that intersects e ′ . Hence, the entire path is determined by the fact that u is its first vertex, and therefore, there can be only a single path in P ′ in which u is the first vertex. ◀
Originally, all the paths of P ′′ can be picked in the third pass of TriangleFreeAlg (Algorithm 4) since they are vertex disjoint from the paths of P 1 . However, as Algorithm 4 starts to add paths to P 2 , it stops being possible to add some paths of P ′′ to P 2 . Still, we can lower bound the size of P 2 in terms of the size of P ′′ . The proof of the next lemma is based on logic similar to that used in the previous proof. Thus, we defer this proof to Appendix C.
▶ Corollary 24. The size of the output of TriangleFreeAlg is |M 0 | + |P 1 | + |P 2 | ≥ (11/18)|M * | = (1/2 + 1/9)|M * |.
Proof. The size of the output of TriangleFreeAlg is |M 0 | + |P 1 | + |P 2 | by Lemma 17; thus, we only need to lower bound this sum. To do this, note that
\[ |M_0| + |P_1| + |P_2| \ge \tfrac{5}{9}|M^*| + \tfrac{1}{9}|M_0| \ge \tfrac{5}{9}|M^*| + \tfrac{1}{18}|M^*| = \tfrac{11}{18}|M^*| , \]
where the first inequality follows from Lemma 23, and the second inequality follows from the observation made at the beginning of this section (namely, that |M 0 | is a 1/2-approximation for |M * | because M 0 is a maximal matching). ◀
Theorem 3 now follows from Corollaries 18 and 24.

Conclusion and Future Work
We have presented in this paper a new approach for semi-streaming algorithms for the maximum matching problem, and showed that this approach can be used to improve the state-of-the-art in two and three passes. Our approach calls for a more sophisticated logic in the first pass rather than simply building a maximal matching in a greedy fashion, as is done by previous algorithms. In our implementation of this approach, we greedily built in this pass connected components of size 3 (recall that greedily building a maximal matching is equivalent to greedily building connected components of size 2). Similarly, one can try to greedily construct larger connected components in the first pass, which we believe is likely to yield even better approximation guarantees. However, the analysis of algorithms based on such a first pass is likely to be inelegant, and to require a lot of case analysis, since larger components allow many more configurations than smaller components. It might also be interesting to come up with an algorithm that uses a completely different kind of logic in its first pass. In addition to the above, we have used in this paper the traditional technique to improve over the state-of-the-art for three passes. Further improving the approximation ratio of two-pass and three-pass algorithms (or proving that this is not possible) is a nice question that is still open. We conclude by recalling that the most basic open question in this field of research is still breaking the (almost trivial) 1/2-approximation for single-pass algorithms. We hope that our new approach will lead to progress on both of the above open questions.

A Three-Pass Algorithm for General Graphs
In this section we prove Theorem 4, which we repeat here for convenience.
The algorithm that we use to prove Theorem 4 is given as Algorithm 5. Since this algorithm is very similar to the algorithm TriangleFreeAlg presented in Section 4, we use below the terminology and notation defined in the last section.
Intuitively, the reason why TriangleFreeAlg does not apply to general graphs is that given an edge (a, b) ∈ M 0 , a wing (u, a) ∈ W M and a wing (b, v) ∈ W , we are not guaranteed that these three edges form an augmenting path for the matching M 0 because they might represent a triangle. To overcome this hurdle, Algorithm 5 constructs two sets of edges in its second pass: a set W 1 constructed exactly like the set W in TriangleFreeAlg, and a set W 2 constructed in the same way, but while excluding the edges of W 1 . Since W 1 and W 2 are disjoint, given an edge (a, b) ∈ M 0 and a wing (u, a) ∈ W M , at most one of the sets W 1 or W 2 can contain a wing that forms a triangle together with these two edges, which intuitively allows us to bound the deterioration in the approximation guarantee resulting from the existence of such triangles.
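One way to picture the construction of W 1 and W 2 is the following sketch. It is not Algorithm 5 itself: we assume here that both sets are grown greedily in a single scan, with an arriving wing offered to W 2 only when W 1 rejects it, and that each set has its own degree caps (1 for vertices of V (M 0 ), 2 otherwise). The function name and the (free_vertex, matched_vertex) representation of wings are also our own.

```python
from collections import defaultdict

def collect_two_wing_sets(stream, m0):
    """Grow disjoint wing sets W1 and W2, each with its own degree caps."""
    matched = {x for e in m0 for x in e}
    deg = [defaultdict(int), defaultdict(int)]
    sets = [[], []]
    for u, v in stream:
        if (u in matched) == (v in matched):
            continue  # not a wing
        a, free = (u, v) if u in matched else (v, u)
        for i in (0, 1):  # offer the wing to W1 first, then to W2
            if deg[i][a] < 1 and deg[i][free] < 2:
                sets[i].append((free, a))
                deg[i][a] += 1
                deg[i][free] += 1
                break
    return sets[0], sets[1]

m0 = [(1, 2)]
stream = [(1, 2), (0, 1), (0, 2), (3, 2)]
w1, w2 = collect_two_wing_sets(stream, m0)
```

In this example the two wings of W 1 close a triangle 0, 1, 2 with the matching edge, so they do not form an augmenting path, while the wing (3, 2) stored in W 2 completes the path 0, 1, 2, 3.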
We note that the analysis of TriangleFreeAlg up to Lemma 20 applies to Algorithm 5 with a single modification. Namely, Lemma 20 provides a lower bound on the size of the set W , which translates into an identical lower bound on the size of the corresponding set W 1 in Algorithm 5.
In the rest of this section, it will be convenient to work with the set W ′ 2 constructed by Algorithm 6 (note that Algorithm 6 is used for analysis purposes only). Intuitively, W ′ 2 is constructed in the same general way in which W 1 and W 2 are constructed; however, while all the edges of the input stream are considered in the construction of W 1 , and only the edges of E \ W 1 are considered in the construction of W 2 , the construction of W ′ 2 takes into account the edges of (E \ W 1 ) ∪ W M .
W M and W M is a subset of the matching M * . However, this leads to a contradiction because one of the edges e ′ 1 or e ′ 2 must belong to W 1 , and the other of these edges must belong to W ′ 2 , and the sets W 1 and W ′ 2 can intersect only on edges of W M .
It remains to consider the case in which e 1 ̸ = e 2 . Let u 1 , u 2 be the end-points of these edges, respectively, that do not belong to the edge e of M 0 . Since e 1 ̸ = e 2 are edges of W M , which is a subset of the matching M * , u 1 and u 2 must be distinct. Consider now the path e ′ 1 , e, e ′ 2 . One can observe that this is indeed a path because (i) u 1 ̸ = u 2 and (ii) the fact that e 1 and e 2 are vertex disjoint implies that e ′ 1 and e ′ 2 intersect different end-points of e. Furthermore, since T 1 , T 2 ∈ P ′ , this path does not intersect any vertex of P 1 , and thus, its existence contradicts the maximality of the matching M A constructed by Algorithm 5 because both e ′ 1 and e ′ 2 belong to
We are now ready to lower bound the number of augmenting paths found by Algorithm 5 during its third pass.
▶ Lemma 28. |P 2 | ≥ |P ′′ |/12 ≥ (5/9)|M * | − (35/36)|M 0 | − |P 1 |.
Proof. 
The proof of the lemma is very similar to the proof of Lemma 23, except that now every path of P 2 might get a charge of up to 12 because the paths of P ′′ originally added to P ′ by Line 4 of Algorithm 7 can contribute up to 6 to this charge, and the same goes for the paths of P ′′ originally added to P ′ by Line 5 of this algorithm. ◀ Theorem 4 now follows from Corollary 18 and the next corollary.
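▶ Corollary 29. The size of the matching produced by Algorithm 5 is at least (1/2 + 1/14.4)|M * |.
Proof. By Lemma 17, the size of the matching produced by Algorithm 5 is at least
\[ |M_0| + |P_1| + |P_2| \ge \tfrac{5}{9}|M^*| + \tfrac{1}{36}|M_0| \ge \left(\tfrac{5}{9} + \tfrac{1}{72}\right)|M^*| = \left(\tfrac{1}{2} + \tfrac{1}{14.4}\right)|M^*| , \]
where the first inequality holds by Lemma 28, and the second inequality holds since M 0 (as a maximal matching) is of size at least (1/2)|M * |. ◀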
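The arithmetic behind Corollaries 24 and 29 can be double-checked with exact fractions. The sketch below assumes only the bounds stated in Lemmata 23 and 28, namely |P 2 | ≥ a|M * | − b|M 0 | − |P 1 | with (a, b) = (5/9, 8/9) and (5/9, 35/36), together with |M 0 | ≥ |M * |/2; the helper name guarantee is ours.

```python
from fractions import Fraction as F

def guarantee(a, b):
    """Ratio implied by |M0| + |P1| + |P2| >= a|M*| + (1 - b)|M0| and |M0| >= |M*|/2."""
    assert b <= 1  # so the coefficient of |M0| is non-negative
    return a + (1 - b) * F(1, 2)

triangle_free = guarantee(F(5, 9), F(8, 9))    # Corollary 24
general = guarantee(F(5, 9), F(35, 36))        # Corollary 29

# 1/14.4 equals 10/144, so the general-graph ratio can be compared exactly.
matches_statement = general == F(1, 2) + F(10, 144)
```

The two values are 11/18 and 41/72, whose decimal expansions begin 0.6111 and 0.5694, in agreement with the ratios quoted for triangle-free and general graphs.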

B Notation Summary
The next table summarizes the notation used in the analyses of our non-MMF algorithms.

Notation: Explanation
(#single): The number of isolated edges in (V, P ).
(#double): The number of connected components in (V, P ) that are paths of length 2.
(#triangle): The number of triangles in (V, P ).
(#component-free): The number of edges of M * that connect a connection vertex of a connected component of (V, P ) to an isolated vertex of (V, P ).
(#component-component): The number of edges of M * that connect connection vertices of two different connected components of (V, P ).
(#single-single): The number of edges of M * whose two end-points belong to (not necessarily distinct) isolated edges of (V, P ).
(#single-component): The number of edges of M * that connect a vertex of an isolated edge of (V, P ) with a connection vertex of some (other) connected component of (V, P ).
(#middle): The number of edges that either (i) intersect the middle vertex of a length 2 path connected component of (V, P ), or (ii) are included within a triangle connected component of (V, P ).
Proof. The proof of Lemma 12 is very similar to the proof of Lemma 10, and therefore, we only sketch it. We first define that an edge e ∈ A 2 excludes an edge f of M * counted by either (#component-component) or (#component-free) if they both intersect the same connected component of (V, P ). Like in the proof of Lemma 10, it can be argued that where the second inequality holds since every connected component of (V, P ) intersects only a single edge of A 2 . The lemma now follows by rearranging the last inequality. ◀
▶ Lemma 17. The paths in P 1 and P 2 are vertex disjoint, and therefore, the output of TriangleFreeAlg is a matching of size |M 0 | + |P 1 | + |P 2 |.
Proof. Given the above discussion, it is clear that all the paths in P 1 ∪ P 2 are augmenting paths with respect to M 0 , which implies that the first part of the lemma indeed implies the second part. Furthermore, one can observe that the condition in Line 3 of Algorithm 4 guarantees that the paths in P 2 are vertex disjoint from each other and from the paths of P 1 . Thus, to complete the proof of the lemma, it remains to argue that the paths in P 1 are also vertex disjoint.
Recall that the end-points of every path in P 1 belong to V \ V (M 0 ) and the internal points of these paths belong to V (M 0 ). Hence, to show that the paths in P 1 are vertex disjoint, it suffices to argue this separately for their end-points and their internal nodes. Every path P u,v ∈ P 1 corresponds to an edge (u, v) in the matching M A . Since the end-points of the path P u,v are also the end-points of this edge, we get that the paths in P 1 must have disjoint end-points because M A is a matching. Consider now some path P u,v ∈ P 1 , and let us denote the internal nodes of this path by a and b. Since a and b appear only in the edge (a, b) of M 0 (because M 0 is a matching), we get that if one of them belongs to a path of P 1 , then the other belongs to this path as well. Furthermore, by Line 5 of Algorithm 3, deg W (a) = deg W (b) = 1, which implies that any path of P 1 that includes the nodes a and b as internal nodes must in fact be identical to P u,v itself. Hence, no two paths in P 1 share internal nodes. ◀
▶ Lemma 23. |P 2 | ≥ (1/6)|P ′′ | ≥ (5/9)|M * | − (8/9)|M 0 | − |P 1 |.
Proof. We begin the proof by observing that no edge e ∈ M 0 is connected by two distinct wings w 1 , w 2 ∈ W to vertices of V \ (V (M 0 ) ∪ V (P 1 )). Assume towards a contradiction that this is not true; then there is an edge in G A corresponding to the path P defined as w 1 , e, w 2 . Since M A is a maximum matching in G A , it must include at least one edge that contains some end-point of P (otherwise, the edge corresponding to P could be added to M A , which violates its maximality). That end-point therefore belongs to V (P 1 ), which contradicts the definition of either w 1 or w 2 .
For every path P ′′ ∈ P ′′ , let us charge a cost of 1 to some path of P 2 that intersects it. To see why such a path must exist, let us denote by e M the edge of P ′′ that belongs to W M (the first edge of P ′′ ). When e M arrives, the path P ′′ is one candidate to be added to P 2 by Algorithm 4. If this candidate is still feasible at this time (in the sense that it is vertex disjoint from P 2 ), then Algorithm 4 must add either P ′′ to P 2 or another path that includes e M . In either case, following the arrival of e M , some path intersecting P ′′ (which is possibly P ′′ itself) appears in P 2 , and can be charged.
Our next goal is to show that the total cost charged to any single path of P 2 is at most 6, which implies the lemma because the total cost charged to all the paths of P 2 is exactly |P ′′ |. We do that by making two observations.
Our first observation is that, since P ′′ ⊆ P ′ , we get by the proof of Lemma 22 that at most 3 paths of P ′′ can include any given vertex u ∈ V \ V (M 0 ). Our second observation is that, if a path P ′′ ∈ P ′′ intersects a path P 2 ∈ P 2 , then they must intersect on an end-point of P 2 . Assume towards a contradiction that they only intersect on an internal node a. Since the middle edges of both paths are edges of M 0 that include a, both internal edges must be the same. Let us denote this internal edge by e. Furthermore, as explained above, there can be only a single edge w ∈ W that intersects e and does not include a vertex of V (P 1 ). This edge must belong to both paths, and therefore, the end-point of w that does not belong to V (M 0 ) is an end-point of both P ′′ and P 2 .
Combining the above two observations, we get that, for every path P 2 ∈ P 2 , only paths of P ′′ intersecting an end-point of P 2 can charge a cost to P 2 , and there can be at most 3 paths of P ′′ intersecting each such end-point. Since P 2 has only two end-points, this implies that at most 6 paths of P ′′ can charge P 2 . ◀