Improved Streaming Edge Coloring
Abstract
Given a graph, an edge coloring assigns colors to edges so that no pairs of adjacent edges share the same color. We are interested in edge coloring algorithms under the W-streaming model. In this model, the algorithm does not have enough memory to hold the entire graph, so the edges of the input graph are read from a data stream one by one in an unknown order, and the algorithm needs to print a valid edge coloring in an output stream. The performance of the algorithm is measured by the amount of space and the number of different colors it uses.
This streaming edge coloring problem has been studied by several works in recent years. When the input graph contains vertices and has maximum vertex degree , it is known that in the W-streaming model, an -edge coloring can be computed deterministically with space [Ansari, Saneian, and Zarrabi-Zadeh, 2022], or an -edge coloring can be computed by a -space randomized algorithm [Behnezhad, Saneian, 2024] [Chechik, Mukhtar, Zhang, 2024].
In this paper, we achieve polynomial improvement over previous results. Specifically, we show how to improve the number of colors to using space deterministically, for any constant . This is the first deterministic result that bypasses the quadratic bound on the number of colors while using near-linear space.
Keywords and phrases:
edge coloring, streamingCategory:
Track A: Algorithms, Complexity and GamesFunding:
Shiri Chechik: European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 803118 UncertainENV).Copyright and License:
![[Uncaptioned image]](x1.png)
2012 ACM Subject Classification:
Theory of computation Streaming, sublinear and near linear time algorithms ; Theory of computation Graph algorithms analysisEditors:
Keren Censor-Hillel, Fabrizio Grandoni, Joël Ouaknine, and Gabriele PuppisSeries and Publisher:

1 Introduction
Let be an undirected graph on vertices with maximum vertex degree . An edge coloring of is an assignment of colors to edges in such that no pairs of adjacent edges share the same color, and the basic objective is to understand the smallest possible number of colors that are needed in any edge coloring, which is called the edge-chromatic number of . It is clear that the total number of colors should be at least , and a simple greedy algorithm can always find an edge coloring using colors. By the celebrated Vizing’s theorem [43] and Shannon’s theorem [40], -edge coloring and -edge coloring always exist in simple and multi-graphs respectively, and these two upper bounds are tight in some hard cases.
The edge coloring problem has been studied widely from the algorithmic perspective. There have been efficient algorithms for finding good edge coloring in various computational models, including sequential [2, 30, 41, 3, 9, 12, 4], dynamic [6, 10, 26, 21, 11, 20], online [22, 13, 38, 36, 14, 15, 27], and distributed [37, 28, 29, 31, 5, 16, 8, 21, 24] models. In this paper, we are particularly interested in computing edge coloring under the streaming model where we assume the input graph does not fit in the memory of the algorithm and can only be accessed via one pass over a stream of all edges in the graph. Since the output of edge coloring is as large as the graph size, the algorithm cannot store it in its memory. To address this limitation, the streaming model is augmented with an output stream in which the algorithm can write its answers during execution, and this augmented streaming model is thus called the W-streaming model.
Edge coloring in the W-streaming model was first studied in [7], and improved by follow-up works [17, 1] which led to a deterministic edge coloring algorithm using colors and space, or colors and space as a general trade-off. In [32], the authors improved the trade-off to colors and 111 hides factors. memory, yet it does not break the quadratic bound in the most natural memory regime; this algorithm is randomized but can be derandomized in exponential time. The quadratic upper bound of colors was subsequently bypassed in [39, 19] where it was shown that an -edge coloring can be computed by a randomized algorithm using space. Furthermore, in [39] the authors obtained a general trade-off of colors and space which is an improvement over [32].
1.1 Our Results
In this paper, we focus on the basic -memory setting and improve the recent randomized upper bound to .
Theorem 1.
Given a simple graph on vertices with maximum vertex degree , for any constant , there is a randomized W-streaming algorithm that outputs a proper edge coloring of using space and different colors; both upper bounds hold in expectation.
Furthermore, we also show that our algorithm can be derandomized using bipartite expanders based on error correcting codes at the cost of slightly worse bounds, as stated below. The deterministic algorithm and proof can be found in the full version [18].
Theorem 2.
Given a simple graph on vertices with maximum vertex degree , for any constant , there is a deterministic W-streaming algorithm that outputs a proper edge coloring of using colors and space.
1.2 Technical Overview
Previous Approaches
Using a deterministic general-to-bipartite reduction from [32], we can assume the input graph is bipartite. Also, it suffices to color only a constant fraction of all edges in , because we can recurse on the rest fraction of which only incurs an extra factor of on the total number of different colors.
Let us begin by recapping the randomized -edge coloring from [39]. Since the algorithm has bits of memory, we can assume that input graph is read from stream in batches of size . If the subgraph formed by every batch has maximum degree at most , then we can allocate new colors for each batch, using colors in total. Thus, the main challenge arises when some batches contain subgraphs with maximum degree exceeding .
To simplify the problem, let us assume that in each batch of edges, every vertex in has degree . To assign colors to edges, organize a table of colors of size , represented as a matrix with indices and . Then, for every vertex , draw a random shift uniformly at random from . During the algorithm, each vertex keeps a counter of its degree in the stream so far, and each vertex keeps a counter of the number of batches in which it has appeared so far. Then, to color a single batch, for edge , the algorithm tentatively assigns the color (indices are under modulo ).
Clearly, edges incident on the same vertex receive different colors, because the counter is incremented for each edge ; also edges incident on the same vertex but arriving in different batches are different, because the values of counter are different. Randomization ensures that edges incident on within the same batch receive mostly different colors from the same column in . Consider all the neighbors of in a single batch . Since all the random shifts are independent and all the counters are deterministic (they only depend on the input stream), with high probability, most row indices will be different.
Bypassing the Bound
To better understand the bottleneck in the approach of [39], consider the following case. If every batch forms a regular subgraph with uniform degree , then we can reduce the size of the color table from to , since each vertex could only appear in different batches. So the main difficult case is when the batches are subgraphs of unbalanced degrees. As an extreme example, consider when vertices in have degree , while the vertices in have degree . For the rest, our main focus will be on this extreme case, and show how to obtain a -edge coloring which is almost optimal.
The flavor of our approach is similar to [19]. Divide all batches into phases, each phase consisting of consecutive batches. Let be a parameter which upper bounds the number of batches in which any vertex could appear during a single phase. Then, the maximum degree of a single phase would be bounded by , so in principle we should be able to color all edges within this phase using different colors. To implement the coloring procedure, at the beginning of each phase, prepare fresh palettes , each of size , and assign each batch in this phase a palette where is chosen from uniformly at random. To keep track of the colors already used around each vertex, we maintain the following data structures.
-
Each vertex keeps a list of used palettes.
-
Each vertex initializes a random shift at the beginning of the algorithm.
When a batch of edges arrives, we will use the assigned palette to color this batch. For every edge , if , then tentatively assign the color in to edge , where denotes the number of times that palette has been assigned to previous batches. If , we mark the edge as uncolored. Since the palettes are assigned to batches randomly, in expectation, a constant fraction of edges will be successfully colored. In the case that multiple edges incident on are assigned the same color, we retain only one such edge and mark the remaining edges as uncolored. Since all the random shifts ’s are uniformly random and independent of the randomness of counters , most tentative colors around any vertex in this batch should be different. Once the batch is processed, add to all lists such that .
To reason about the space usage, we can argue that the total size of the lists does not exceed , because a palette appears in the list only when receives edges in a batch which uses palette . Since the total number of edges in a phase is , the total list size should be bounded by . In this way, we are able to store all the used palettes of vertices in ; this is not new to our approach, and a similar argument was also used in [19]. The new ingredient is the way we store the used colors around vertices in . Here we have exploited the fact that vertices in have low degrees in each batch, so their progress in every palette is synchronized; that is, we only need to store a single counter which is the number of times that appears, and then the next tentative color of would be .
The above scheme would use different colors in a single phase, so colors across all phases. When , this bound would be much better than . So, what if ? To deal with this case, we will apply a two-layer approach. Specifically, let us further group every consecutive phases as a super-phase, so there are at most super-phases in total (recall that we were assuming ). Within a super-phase, we will allocate fresh colors which are divided into different color packages , where each color package is further divided into palettes of size as . In this way, the total number of colors would be .
Then, like what we did before, for each phase in a super-phase, assign a color package from uniformly at random. Within each phase, we will stick to its assigned color package and reuse the algorithm within a phase we have described before. Since a color package is shared by multiple phases, each vertex needs to store a list which stores all the packages it has used so far, and in any phase where is assigned but is already contained in , we would not color any edges incident on in this particular phase. By repeating the same argument, we can argue that the total size of all the lists is bounded by as well.
We can further generalize this two-layer approach to multiple layers and reduce the total number of colors to . However, this only works with the most unbalanced setting where vertices on always have constant degrees in each batch of input. In general, when low-degree side has degree , our algorithm needs colors. If is large, then we would switch to the color table approach from [39]. Balancing the two cases, we end up with colors overall.
Derandomization using Bipartite Expanders
Randomization is used both in the unbalanced case and in the regular case. To replace the choices of the random shifts and random color package assignments, we will show one can apply unbalanced bipartite expanders [42, 34, 35] in a black box manner. However, for the random shifts used in the regular case where we apply the color table idea from [39], it would not be enough to use an arbitrary bipartite expander, because the counters ’s could be different and possibly damage the expansion guarantee; for example, it is not clear to us how to apply the bipartite expander construction based on Parvaresh–Vardy Codes [34]. To fix this issue, it turns out that it would be most convenient to use the bipartite expander construction based on multiplicity codes [35].
2 Preliminaries
Basic Terminologies
For any integer , let us conventionally define . For any set and integer , let be the multi-set that contains exactly copies of each element in .
Let denote the simple input graph on vertices and edges with maximum degree . For any subset of edges and any vertex , let be the number of edges in incident on . Sometimes we will also refer to the subgraph simply as .
Problem Definition
In the edge coloring problem, we need to find an assignment of colors to edges such that adjacent edges have distinct colors, and the objective is to minimize the total number of different colors.
In the W-streaming model introduced by [25, 33], all edges of the graph arrive one by one in the stream in an arbitrary order, and the algorithm makes one pass over the stream to perform its computation. For the task of edge coloring, since the algorithm has much less space than the total size of the graph, it cannot store all the edge colors in its memory. To output the answer, the algorithm is given a write stream in which it can print all colors.
Next, to set the stage in a convenient way, we will state several reductions for the problem.
General-to-Bipartite Reduction
Let us first simplify the problem by a deterministic reduction from edge coloring in general graphs to bipartite graphs.
Lemma 3 (Corollary 3.2 in [32]).
Given an algorithm for streaming edge coloring on bipartite graphs using colors and space, there is an algorithm using colors and space in general graphs. Furthermore, this reduction is deterministic.
Reduction to Fixed Degree Pairs
By the above reduction, we focus on edge coloring for bipartite graphs , where . Since the algorithm has space , we can divide the input stream into at most batches, each batch containing edges. For any vertex and any batch , let be the number of edges in incident on . For every pair of integers and every batch , let , and define over all batches in the stream and .
In the main text, we will devise an algorithm to handle edges in for any fixed pair of . The final coloring is obtained by taking the union of the colors over all , which will blow up the number of colors and space by a factor of .
Adapting to an Unknown
So far we have assumed that the maximum vertex degree in is known in advance. Our algorithm can be adapted to an unknown value in a standard way. Specifically, we can maintain the value which is the maximum degree of the subgraph containing the first edges in the input stream. Whenever , we apply our algorithm with to color all the edges. When increments at some point, we restart a new instance of the algorithm with a new choice and continue to color the edges with fresh colors. Overall, the total number of colors will not change asymptotically.
3 Randomized Edge Coloring
This section is devoted to the proof of Theorem 1.
Reduction to Partial Coloring
To find an edge coloring of a graph, it was shown in [19] that it suffices to color a constant fraction of edges in if we do not care about blow-ups in the number of colors. Roughly speaking, for the uncolored edges, we can view them as another instance of edge coloring, and solve it recursively. The following statement formalizes this reduction.
Lemma 4 (implicit in [19]).
Suppose there is a randomized streaming algorithm with space space, such that for each edge in , it assigns a color from or marks it as (uncolored) and print this information in the output stream, with the guarantee that there are at least edges in expectation which receive actual colors, for some value . Then, there is a randomized edge coloring algorithm which uses at most colors and space in expectation.
Proof sketch.
The idea is to recursively apply the streaming algorithm on all edges marked with (uncolored). In expectation, each time we reapply the algorithm, the number of uncolored edges decrease by a factor of . So the expected recursion depth would be before all uncolored edges fit in memory.
According to the reductions in the preliminaries, we will focus on partial edge coloring in bipartite graphs with fixed degree pairs. More specifically, our algorithm consists of the following two components.
Lemma 5.
Fix a parameter . Given an graph on vertices with maximum vertex degree , for any constant , and fix an integer pair , there is a randomized W-streaming algorithm that outputs a partial coloring of edges such that least edges receive colors in expectation; here is also a constant. The algorithm uses colors and space.
Lemma 6.
Given an graph on vertices with maximum vertex degree , and fix an integer pair , there is a randomized W-streaming algorithm that outputs a partial coloring of edges such that least edges receive colors in expectation. The algorithm uses colors and space.
Proof of Theorem 1.
3.1 Proof of Lemma 5
3.1.1 Data Structures
Before presenting the details of the data structures we will use in the main algorithm, let us start with a brief technical overview. The data structures consist of three components.
-
Forest structures on batches. This part organizes edge input batches into parameterized forests in a way similar to B-trees. The height of the forests will be , and the branch size of each level will be an integer power of in the range , and so the total number of different forests will be bounded by .
-
Color allocation on forests. Each forest will be associated with a separate color set of size roughly . On each level of a forest, we will randomly partition the color set of this forest into color subsets (packages) and assign them to the tree nodes on this level. We will also make sure that the color packages on tree nodes are nested; that is, the color package of a node is a subset of the color package of its parent node.
Each vertex will choose colors for its incident edges according to its own frequencies of accumulating edges in the stream. For example, when a vertex is gathering a large number of incident edges in a short interval of batches, it would use color packages on tree nodes with large branch sizes.
-
Vertex-wise data structures. To ensure that we never assign the same color to adjacent edges, each vertex needs to remember which colors it has already used around it. To efficiently store all the previously used colors, we will show that the used colors are actually concentrated in color packages, so each vertex only needs to store the tree nodes corresponding to those color packages, instead of storing every used colors individually.
Next, let us turn to the details of the data structures we have outlined above.
Forest Structures on Batches
Without loss of generality, assume , , and is an integer power of two; note that if , then the maximum degree inside each batch is at most , so we can use a fresh palette of size , so the total number of colors would be .
As we have done in the preliminaries, partition the entire input stream into at most batches of size . We will create at most different forest structures, where each leaf represents a batch, and the internal tree nodes represent consecutive batches. Each forest is parameterized by an integer vector such that:
-
is an integer power of two;
-
, and .
We can assume the total number of batches in the input stream is an integer multiple of by padding empty batches. Given such a vector , we will define a forest structure with levels by a bottom-up procedure; basically we will build a forest on all the batches with branching factors bottom-up. More specifically, consider the following inductive procedure.
-
All the batches will be leaf nodes on level . Partition the sequence of all batches into groups of consecutive batches. For each group, create a tree node at level- connecting to all leaf nodes in that group.
-
Given any , assume we have defined all the tree nodes on levels . List all the tree nodes on level following the same ordering of the batches, and partition this sequence of level- nodes into groups of size . For each group, create a tree node at level- that connects to all nodes in the group.
According to the definition, in general, tree levels up to have the same topological structure for all frequency vectors which share the same first coordinates . For any node , let be the subtree rooted at node . By the above definition, for any , for any level- node , the set of all leaf nodes contained in the subtree form a sub-interval of the batch sequence with length .
Color Allocation on Forests
Next, we will allocate color packages at each tree node of each forest . By construction, each forest structure is a tree of levels (from level- to level-), and each tree is rooted at a level- node. Go over each tree and we will allocate color packages to tree nodes in a top-down manner.
-
Basis. At the root of the tree , allocate a color package with new colors. Divide package evenly into smaller packages (colors are ordered alphabetically in a package):
Here, symbol means disjoint union. By construction of tree , has different children . Let sequence be a random permutation of . For each , define color package .
-
Induction. In general, suppose we have defined color packages for tree nodes on levels . Go over all tree nodes on level . Inductively, assume . Divide color package into smaller packages (colors are ordered alphabetically in a package):
Let be a random permutation of . For each such level- node , by construction, it has children . Then, for each , define color package .
By the above construction, each leaf node is allocated a color package of size . We will call each such smallest color package a palette. Notice that, by definition, the same palette may appear at multiple leaf nodes (which represent input batches). For any leaf node (or equivalently, a batch), let count the total number of times that palette is also allocated to previous leaf nodes (batches). Note that the counters do not depend on the input stream and can be computed in advance.
Vertex-Wise Data Structures
To carry out the streaming algorithm, we will also maintain some data structures for vertices in . At the beginning of the streaming algorithm, for each vertex , draw a random shift uniformly at random; these random shifts ’s will remain fixed throughout the entire execution of the algorithm.
The main part is to specify the data structures associated with each vertex . For each forest , we will maintain a set of marked nodes throughout the streaming algorithm.
Invariant 7.
We will ensure the following properties regarding the marked nodes throughout the execution of the streaming algorithm.
-
(1)
No two nodes in lie on the same root-to-leaf path in the forest . Furthermore, suppose the current input batch corresponds to a leaf , and let be the unique tree path from to the tree root. Then, any node is a child of some node on .
-
(2)
For any node on level- such that , let be all the input batches which correspond to leaf nodes in subtree . Take the union of the batches . Then, we have .
-
(3)
For any previous input batch before such that:
-
and are in the same tree component in ,
-
,
-
used some colors in during the algorithm,
we guarantee that must be contained in some subtree for some .
-
-
(4)
The choices of is independent of the randomness of and the randomness of color packages , and they only depend deterministically on the input stream.
Let us explain the purpose of the above properties. Invariant 7(1)(2) ensures that the algorithm only uses space in total. Invariant 7(3) allows us to rule out all colors used previously, preventing duplicate assignments to edges incident on the same vertex. Invariant 7(4) will be technically important for the analysis of randomization.
3.1.2 Algorithm Description
Next, let us turn to describe the coloring procedure. At the beginning, all the marked sets are empty for any and any frequency vector . Upon the arrival of a new input batch , we will describe the procedures that update the marked sets and assign edge colors.
Preprocessing Marked Sets
Since the current input batch has changed due to the arrival of , we may have violated Invariant 7(2) as the root-to-leaf tree path may have changed. Therefore, we first need to update all the marked sets with the following procedure named .
Go over every vertex and every frequency vector . Consider the position of in the forest , and let be the root-to-leaf path in ending at leaf . First, remove all marked nodes which are not in the same tree as ; note that this may happen when is the first leaf in a new tree component of .
Next, go over every node lying on the tree path , for any child node of , if (1) and (2) , then remove all nodes in from and add to . In other words, we elevate the positions of all the marked nodes in to their ancestors which are children of . See Figure 1 for an illustration.
Coloring
To find colors for edges in , we first need to find a proper palette for each vertex such that . We will describe an algorithm which finds a proper frequency vector for each such and use the palette associated with leaf node in the forest . To identify the proper frequency vector , we will figure out each coordinate one by one inductively.
-
Basis. Let be the unique level- node containing (recall that level- nodes are defined irrespective of frequency vectors). Find which is the smallest integer such that there exists a frequency vector , so that contains less than children of . Note that such a vector must exist, because always satisfies this requirement.
If , return the 1-dimensional vector .
-
Induction. In general, assume we have specified a prefix for some . Note that all the forests share the same topological structures from level up to . Let be the unique level- node containing conditioning on . Check all the frequency vectors that begin with the prefix , and find the smallest possible such that there exists a frequency vector under the condition that contains less than children of . Note that such a vector must exist, because always satisfies this requirement.
If , then return the vector .
-
Choosing palette . Once we have finished the above inductive process, we need to choose a palette for which contains different colors. Basically, we will choose the palette associated with leaf node in forest as , but must ensure that has not been used previously.
To make sure of this, let denote the root-to-leaf tree path ending at leaf node in . Travel down the path from root to leaf and enumerate all the encountered tree nodes. For every such tree node , if already contains a sibling and , then abort this procedure and assign .
In the end, if this procedure terminates without abortion, then assign ; here refers to the palette of size associated with leaf node in forest .
In this way, we can select a frequency vector for every vertex such that . Let be the palette assigned to leaf node by forest . Next, we are going to color all edges in using colors from for edges incident on . Go over each vertex , and list its neighbors in . For any index , if , then reserve a tentative color to edge , which is the -th color in palette and
Recall that counts the number of times that palette has appeared at previous leaf nodes under . Notice that , these tentative colors are different around .
On the side of , it checks all the tentative colors on all of its edges in . If a tentative color is used more than once, then it keeps only one of them, and turns other tentative colors back to . Finally, for each edge , assign its own tentative color and print it in the output stream.
Postprocessing Marked Sets
After processing the input batch , for every vertex such that , add node to ; note that we mark irrespective of whether is or not as will be important for establishing Invariant 7(4). The whole algorithm is summarized in Algorithm 1.
3.1.3 Proof of Correctness
To begin with, let us first bound the total number of different colors that we could possibly use.
Lemma 8.
The total number of colors that the algorithm could use is .
Proof.
By the design of the forest structures, for each frequency vector , the number of colors assigned to each tree in the forest is bounded by Since each tree in covers batches, and there are at most batches, the total number of colors in is bounded by
Since there are different choices for , the total number of colors used by the algorithm is
Let us next state some basic properties of any forest .
Lemma 9.
Each palette is used by at most different batches.
Proof.
Consider a palette allocated to a leaf node . By the construction of color packages:
-
At the root, exactly one package contains this palette, and children of the root inherit the palette. Therefore, at level , the palette is used in at most nodes.
-
Suppose at level , the palette is used in nodes. Each of these nodes has children that inherit the palette. Thus, at level , the palette is used in at most nodes.
By induction, the palette is used in at most different batches at level 0, as required.
Corollary 10.
During the algorithm, the values of the counters never exceed for any palette .
To bound the total space, it is clear that the bottleneck is storing all the marked sets , since all the forest structures and color assignments only require space .
Lemma 11.
If Invariant 7 is satisfied, then at any point during the execution of the algorithm, for any given frequency vector , the total size of marked sets is bounded by . Consequently, the total space usage is bounded by .
Proof.
For any level- vertex , the subtree contains exactly batches, and thus edges. Let denote the union of all batches in . For any vertex , if , by Invariant 7(2), we have . Thus, the number of vertices such that is bounded by:
Suppose the current working batch is . By Invariant 7(1), any marked vertex is a child of a node on the root-to-leaf path . Thus, among all vertices at level , there are vertices that are candidates for being marked. Taking the summation over all levels, the total size of is bounded by
Since the depth of the tree is at most , and there are distinct frequency vectors , the total space usage is therefore bounded by . Now, our main focus will be on verifying all the properties in Invariant 7.
Lemma 12.
Invariant 7 is preserved by the algorithm throughout its execution.
Proof.
Property (1) is preserved in a straightforward manner by the way we maintain all the marked sets in the pre- and post-processing step for each input batch. Property (4) is also guaranteed because our updates to marked sets only depend on the input stream, not on the randomness used by the algorithm.
As for property (3), according to the coloring algorithm, for any previous input batch where , if vertex used some colors associated with leaf node in forest , then the algorithm must have added to back then. Then, according to the preprocessing rules, would always be contained in the subtree of some marked as long as the current leaf node belongs to the same connected tree component as in .
Next, we will focus on property (2). We will prove this by an induction on time. As the basis, property (2) holds trivially because all the marked sets are empty. Consider the arrival of any new input batch and any vertex such that .
First, we argue that the pre-processing step does not harm property (2). This is because the inequality in property (2) is required for all ancestors of any marked node (including itself), so if it held right before the arrival of , then it should also hold after the pre-processing step as we are only raising the positions of marked nodes to their ancestors.
Next, we consider the coloring step and the post-processing step. According to the coloring procedure, since will only use colors and mark nodes in forest , we will only need to worry about property (2) in forest . Before the post-processing step, let be the root-to-leaf path ending at leaf node in , and let be the highest (in terms of levels) node on such that , namely the subtree does not contain any marked nodes; this node always exists because itself is a possible candidate.
To prove property (2) after we add to , we only need to verify the inequality for all nodes on between and . List all these nodes as . We will prove the inequality of property (2) for each index .
-
When , we already know that , so the inequality holds.
-
For any index , consider the procedure that chose the coordinate . Because of the minimality of , we know that for some alternate frequency vector , there are at least marked children of (before the post-processing step), where sits at the same topological position as (or in other words, and are isomorphic). Let be the union of the batches which are all leaf nodes of . Therefore, by our inductive assumption regarding property (2), we know that:
This verifies the inequality at node .
Next, let us turn to the color assignment part. First, we need to verify that the algorithm never assigns the same color twice around the neighborhood of a single vertex.
Lemma 13.
In the output stream, the algorithm never prints the same color for two adjacent edges.
Proof.
First, let us rule out color conflicts for edges around the same vertex . This is rather straightforward, because according to the algorithm description, within each batch, we only assign tentative colors with distinct color indices around each vertex . For two different batches , if we happen to use the same color palette in around the same vertex , then the counter values must be different in these two batches. According to Corollary 10 and that , the value of is always in the range . Together with the fact that , the algorithm must use distinct colors in from .
Next, let us rule out color conflicts for edges around the same vertex . For any color palette provided by which was used before in some previous batch , according to Invariant 7(3), there must be a marked node whose subtree contains . According to our coloring procedure, is nonempty only when is disjoint from all the color packages of its marked siblings , for any node on the root-to-leaf path to in . Therefore, could not reuse any palettes previously assigned. Furthermore, since we discard repeated colors within a single palette, the assigned colors must be distinct.
Finally, let us prove that the algorithm successfully colors a good fraction of all edges in the input stream.
Lemma 14.
The total number of colored edges in the output stream is at least in expectation, where .
Proof.
Fix any input batch and any vertex such that , it suffices to lower bound the expected number of colored edges in incident on .
First, we need to analyze the probability that the color palette is nonempty, based on the randomness of the distribution of color packages on forest . As before, let denote the root-to-leaf path in ending at , and let be all the nodes on the tree path. According to the selection of the frequency vector , for any , has less than marked siblings. By design, the color packages of are determined by the length- prefix of a random permutation . Therefore, by the independence guarantee from Invariant 7(4), the probability that does not conflict with the color packages of any other marked siblings is at least . By applying this bound over all levels, the probability that is nonempty is at least .
Next, conditioning on the event that , let us analyze the amount of edges in that are colored around . Let be the neighbors of in graph , where . For any fixed , as was chosen uniformly at random from , the probability that for all is at least . This ensures that the tentative colors between and survive in the output stream with probability at least .
Overall, the expected number of colored edges in would be at least , which concludes the proof.
3.2 Proof of Lemma 6
Without loss of generality, assume is a power of 2.
3.2.1 Data Structures
At the beginning, for each vertex , draw a random number uniformly at random. For each vertex , draw a random number uniformly at random.
As in the preliminary steps, the input stream is divided into batches of size (except for the last one). For each , let count the number of previous batches where , and symmetrically let count the number of previous batches where for .
Additionally, we will use a palette matrix of size , where each entry in corresponds to a distinct palette of size , with . All the colors can be represented as integers in the range naturally.
3.2.2 Algorithm Description
Let us describe the coloring procedure upon the arrival of a new input batch . For each vertex , propose a tentative row index . For each vertex , propose a tentative column index . For each edge , we will assign a color from the matrix in the following manner.
Let be the set of edges where , and let be the subgraph whose edge set is . To color with only colors, we need to prune it so that its maximum degree does not exceed , which is done in this way: for each edge , if , mark it as (uncolored) and remove it from . Finally, since is a bipartite graph, we can apply the -edge coloring algorithm [23] to color using the palette which has size .
Finally, for each vertex such that , increment the counter by ; also, increment the counters for in a symmetric way. The whole algorithm is summarized in Algorithm 2.
3.2.3 Proof of Correctness
Proper Coloring
To prove that the colored edges form a proper coloring, we need to show that for any two distinct edges and sharing a common vertex , their colors are different. Since the algorithm is symmetric for and , we can assume . There are several cases below.
-
If and use different matrix entries in , their colors are already distinct.
-
Suppose both edges use palette . Then, there are two sub-cases below.
-
–
If and belong to different batches and with batch counters and , we have . Since , never exceeds . Thus, , which leads to a contradiction.
-
–
If and belong to the same batch, they belong to the same subgraph . The correctness of the offline coloring algorithm guarantees they get distinct colors.
-
–
Space Usage
For each vertex , we maintain its batch counter and random shift for (or for ), requiring space in total. During the batch coloring process, we also store the indices and , which require additional space. Furthermore, each subgraph has at most edges, so the offline coloring algorithm requires space. These spaces are reused across different subgraph coloring processes and different batches. Therefore, the overall space complexity is .
Number of Colors
The total number of colors is given by
Next, we show that at least half of the edges get colored in expectation.
Lemma 15.
During the algorithm, at least a fraction of the edges are colored in expectation.
Proof.
For any edge such that , we estimate the probability that is colored. Define . It suffices to lower bound the probability that .
Let be all edges incident on other than . Since is a simple graph, all the vertices are distinct and are different from . Since is uniformly distributed in , the probability that is at most . Using Markov’s inequality, we have:
Symmetrically, we can argue that . Hence, the probability that remains in after the pruning procedure would be at least . This concludes the proof.
References
- [1] Mohammad Ansari, Mohammad Saneian, and Hamid Zarrabi-Zadeh. Simple streaming algorithms for edge coloring. In 30th Annual European Symposium on Algorithms (ESA 2022). Schloss-Dagstuhl-Leibniz Zentrum für Informatik, 2022.
- [2] Eshrat Arjomandi. An efficient algorithm for colouring the edges of a graph with colours. INFOR: Information Systems and Operational Research, 20(2):82–101, 1982.
- [3] Sepehr Assadi. Faster Vizing and Near-Vizing Edge Coloring Algorithms. In Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2025.
- [4] Sepehr Assadi, Soheil Behnezhad, Sayan Bhattacharya, Martín Costa, Shay Solomon, and Tianyi Zhang. Vizing’s theorem in near-linear time. arXiv preprint arXiv:2410.05240, 2024. doi:10.48550/arXiv.2410.05240.
- [5] Alkida Balliu, Sebastian Brandt, Fabian Kuhn, and Dennis Olivetti. Distributed edge coloring in time polylogarithmic in . In Proceedings of the 2022 ACM Symposium on Principles of Distributed Computing, pages 15–25, 2022. doi:10.1145/3519270.3538440.
- [6] Leonid Barenboim and Tzalik Maimon. Fully-dynamic graph algorithms with sublinear time inspired by distributed computing. In International Conference on Computational Science (ICCS), volume 108 of Procedia Computer Science, pages 89–98. Elsevier, 2017. doi:10.1016/J.PROCS.2017.05.098.
- [7] Soheil Behnezhad, Mahsa Derakhshan, MohammadTaghi Hajiaghayi, Marina Knittel, and Hamed Saleh. Streaming and massively parallel algorithms for edge coloring. In 27th Annual European Symposium on Algorithms (ESA 2019). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019.
- [8] Anton Bernshteyn. A fast distributed algorithm for -edge-coloring. J. Comb. Theory, Ser. B, 152:319–352, 2022.
- [9] Sayan Bhattacharya, Din Carmon, Martín Costa, Shay Solomon, and Tianyi Zhang. Faster -Edge Coloring: Breaking the Time Barrier. In 65th IEEE Symposium on Foundations of Computer Science (FOCS), 2024.
- [10] Sayan Bhattacharya, Deeparnab Chakrabarty, Monika Henzinger, and Danupon Nanongkai. Dynamic Algorithms for Graph Coloring. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1–20. SIAM, 2018. doi:10.1137/1.9781611975031.1.
- [11] Sayan Bhattacharya, Martín Costa, Nadav Panski, and Shay Solomon. Nibbling at Long Cycles: Dynamic (and Static) Edge Coloring in Optimal Time. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA). SIAM, 2024. doi:10.1137/1.9781611977912.122.
- [12] Sayan Bhattacharya, Martín Costa, Shay Solomon, and Tianyi Zhang. Even Faster -Edge Coloring via Shorter Multi-Step Vizing Chains. In Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2025. (to appear).
- [13] Sayan Bhattacharya, Fabrizio Grandoni, and David Wajc. Online edge coloring algorithms via the nibble method. In Proceedings of theACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2830–2842. SIAM, 2021. doi:10.1137/1.9781611976465.168.
- [14] Joakim Blikstad, Ola Svensson, Radu Vintan, and David Wajc. Online edge coloring is (nearly) as easy as offline. In Proceedings of the Annual ACM Symposium on Theory of Computing (STOC). ACM, 2024. doi:10.1145/3618260.3649741.
- [15] Joakim Blikstad, Ola Svensson, Radu Vintan, and David Wajc. Deterministic Online Bipartite Edge Coloring. In Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2025.
- [16] Yi-Jun Chang, Qizheng He, Wenzheng Li, Seth Pettie, and Jara Uitto. Distributed Edge Coloring and a Special Case of the Constructive Lovász Local Lemma. ACM Trans. Algorithms, 16(1):8:1–8:51, 2020. doi:10.1145/3365004.
- [17] Moses Charikar and Paul Liu. Improved algorithms for edge colouring in the w-streaming model. In Symposium on Simplicity in Algorithms (SOSA), pages 181–183. SIAM, 2021. doi:10.1137/1.9781611976496.20.
- [18] Shiri Chechik, Hongyi Chen, and Tianyi Zhang. Improved streaming edge coloring, 2025. arXiv:2504.16470.
- [19] Shiri Chechik, Doron Mukhtar, and Tianyi Zhang. Streaming Edge Coloring with Subquadratic Palette Size. In Karl Bringmann, Martin Grohe, Gabriele Puppis, and Ola Svensson, editors, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024), volume 297 of Leibniz International Proceedings in Informatics (LIPIcs), pages 40:1–40:12, Dagstuhl, Germany, 2024. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ICALP.2024.40.
- [20] Aleksander B. G. Christiansen. Deterministic dynamic edge-colouring. CoRR, abs/2402.13139, 2024. doi:10.48550/arXiv.2402.13139.
- [21] Aleksander Bjørn Grodt Christiansen. The Power of Multi-step Vizing Chains. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing (STOC), pages 1013–1026. ACM, 2023. doi:10.1145/3564246.3585105.
- [22] Ilan Reuven Cohen, Binghui Peng, and David Wajc. Tight bounds for online edge coloring. In 60th IEEE Annual Symposium on Foundations of Computer Science (FOCS), pages 1–25. IEEE Computer Society, 2019. doi:10.1109/FOCS.2019.00010.
- [23] Richard Cole, Kirstin Ost, and Stefan Schirra. Edge-coloring bipartite multigraphs in o (e logd) time. Combinatorica, 21(1):5–12, 2001. doi:10.1007/S004930170002.
- [24] Peter Davies. Improved distributed algorithms for the lovász local lemma and edge coloring. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 4273–4295. SIAM, 2023. doi:10.1137/1.9781611977554.CH163.
- [25] Camil Demetrescu, Irene Finocchi, and Andrea Ribichini. Trading off space for passes in graph streaming problems. ACM Transactions on Algorithms (TALG), 6(1):1–17, 2009. doi:10.1145/1644015.1644021.
- [26] Ran Duan, Haoqing He, and Tianyi Zhang. Dynamic edge coloring with improved approximation. In 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2019.
- [27] Aditi Dudeja, Rashmika Goswami, and Michael Saks. Randomized Greedy Online Edge Coloring Succeeds for Dense and Randomly-Ordered Graphs. In Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2025.
- [28] Michael Elkin, Seth Pettie, and Hsin-Hao Su. -Edge-Coloring is Much Easier than Maximal Matching in the Distributed Setting. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 355–370. SIAM, 2014.
- [29] Manuela Fischer, Mohsen Ghaffari, and Fabian Kuhn. Deterministic distributed edge-coloring via hypergraph maximal matching. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 180–191. IEEE, 2017. doi:10.1109/FOCS.2017.25.
- [30] Harold N Gabow, Takao Nishizeki, Oded Kariv, Daneil Leven, and Osamu Terada. Algorithms for edge coloring. Technical Rport, 1985.
- [31] Mohsen Ghaffari, Fabian Kuhn, Yannic Maus, and Jara Uitto. Deterministic distributed edge-coloring with fewer colors. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 418–430, 2018. doi:10.1145/3188745.3188906.
- [32] Prantar Ghosh and Manuel Stoeckl. Low-memory algorithms for online edge coloring. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024.
- [33] Christian Glazik, Jan Schiemann, and Anand Srivastav. Finding euler tours in one pass in the w-streaming model with o (n log (n)) ram. arXiv preprint arXiv:1710.04091, 2017. arXiv:1710.04091.
- [34] Venkatesan Guruswami, Christopher Umans, and Salil Vadhan. Unbalanced expanders and randomness extractors from parvaresh–vardy codes. Journal of the ACM (JACM), 56(4):1–34, 2009. doi:10.1145/1538902.1538904.
- [35] Itay Kalev and Amnon Ta-Shma. Unbalanced expanders from multiplicity codes. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022.
- [36] Janardhan Kulkarni, Yang P. Liu, Ashwin Sah, Mehtaab Sawhney, and Jakub Tarnawski. Online edge coloring via tree recurrences and correlation decay. In 54th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 104–116. ACM, 2022. doi:10.1145/3519935.3519986.
- [37] Alessandro Panconesi and Romeo Rizzi. Some simple distributed algorithms for sparse networks. Distributed computing, 14(2):97–100, 2001. doi:10.1007/PL00008932.
- [38] Amin Saberi and David Wajc. The greedy algorithm is not optimal for on-line edge coloring. In 48th International Colloquium on Automata, Languages, and Programming (ICALP), volume 198 of LIPIcs, pages 109:1–109:18, 2021. doi:10.4230/LIPICS.ICALP.2021.109.
- [39] Mohammad Saneian and Soheil Behnezhad. Streaming edge coloring with asymptotically optimal colors. In 51st International Colloquium on Automata, Languages, and Programming, ICALP 2024, July 8-12, 2024, Tallinn, Estonia, volume 297 of LIPIcs, pages 121:1–121:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.ICALP.2024.121.
- [40] Claude E Shannon. A theorem on coloring the lines of a network. Journal of Mathematics and Physics, 28(1-4):148–152, 1949.
- [41] Corwin Sinnamon. Fast and simple edge-coloring algorithms. arXiv preprint arXiv:1907.03201, 2019.
- [42] Amnon Ta-Shma, Christopher Umans, and David Zuckerman. Loss-less condensers, unbalanced expanders, and extractors. In Proceedings of the thirty-third annual ACM symposium on Theory of computing, pages 143–152, 2001. doi:10.1145/380752.380790.
- [43] Vadim G Vizing. The chromatic class of a multigraph. Cybernetics, 1(3):32–41, 1965.