Canonical Labeling of Sparse Random Graphs
Abstract
We show that if , then the Erdős-Rényi random graph with high probability admits a canonical labeling computable in time . Combined with the previous results on the canonization of random graphs, this implies that with high probability admits a polynomial-time canonical labeling whatever the edge probability function . Our algorithm combines the standard color refinement routine with simple post-processing based on the classical linear-time tree canonization. Noteworthy, our analysis of how well color refinement performs in this setting allows us to complete the description of the automorphism group of the 2-core of .
Keywords and phrases:
Graph isomorphism, random graphs, canonical labeling, color refinementFunding:
Oleg Verbitsky: Supported by DFG grant KO 1053/8–2. On leave from the IAPMM, Lviv, Ukraine.Copyright and License:
![[Uncaptioned image]](x1.png)
2012 ACM Subject Classification:
Mathematics of computing Random graphs ; Mathematics of computing Graph algorithmsAcknowledgements:
The authors would like to thank Michael Krivelevich for helpful discussions.Editors:
Olaf Beyersdorff, Michał Pilipczuk, Elaine Pimentel, and Nguyễn Kim ThắngSeries and Publisher:

1 Introduction
On an -vertex input graph , a canonical labeling algorithm computes a bijection such that if another graph is isomorphic to , then the isomorphic images of and under respective permutations and are equal. Given the labelings and , it takes linear time to check whether and are isomorphic. The existence of polynomial-time algorithms for testing isomorphism of two given graphs and, in particular, for producing a canonical labeling remain open. Babai’s breakthrough quasi-polynomial algorithm for testing graph isomorphism [7] was subsequently extended to a canonical labeling algorithm of the same time complexity [8]. In the present paper, we address the canonical labeling problem for the Erdős-Rényi (or binomial) random graph . Recall that the vertex set of is , and each pair of vertices is adjacent with probability , independently of the other pairs.
Babai, Erdős, and Selkow [5] proved that the simple algorithmic routine known as color refinement (CR for brevity) with high probability produces a discrete coloring of the vertices of , that is, a coloring where the vertex colors are pairwise different. Since the vertex colors are isomorphism-invariant, this yields a canonical labeling of by numbering the color names in the lexicographic order. Here and throughout, we say that an event happens for with high probability (whp for brevity) if the probability of this event tends to 1 as . The result of [5] has a fundamental meaning: almost all graphs admit an easily computable canonical labeling and, hence, the graph isomorphism problem has low average-case complexity.
The argument of [5] can be extended to show [14, Theorem 3.17] that the CR coloring of is whp discrete for all . Note that it is enough to consider the case of since has the same distribution as the complement of . Remarkably, the algorithm of Babai, Erdős, and Selkow performs only a bounded number of color refinement steps and, due to this, works in linear time.
A different algorithm suggested by Bollobás [12] works in polynomial time and whp produces a canonical labeling of in a much sparser regime, namely when for any positive constant . The next improvement was obtained by Czajka and Pandurangan [16] who proved that, in a bounded number of rounds, CR yields a discrete coloring of whp when . Finally, Linial and Mosheiff [27] designed a polynomial time algorithm for canonical labeling of when . As shown by Gaudio, Rácz, and Sridhar [21], in the subdiapason for any fixed , a canonical labeling can still be provided by CR in a bounded number of rounds.
The decades-long line of research summarized above leaves open the question whether a random graph admits efficient canonization in the regime . Note that the case of is easy. Indeed, as long as , whp is a vertex-disjoint union of trees and unicyclic graphs (i.e., connected graphs containing exactly one cycle). Canonization of such graphs is tractable due to the classical linear-time canonical labeling algorithms for trees [1] and even planar graphs (see [6] for a survey of the early work on graph isomorphism covering these graph classes). Thus, efficient canonization remains unknown for all such that, for some and all , (even though stays planar with a non-negligible probability as long as ; see [32]). Our first result closes this gap.
Theorem 1.
If , then whp admits a canonical labeling computable in time .
Edge probability | Algorithm |
---|---|
Babai, Erdős, and Selkow [5]∗ | |
Bollobás [14]∗ | |
Bollobás [12] | |
Czajka and Pandurangan [16]∗ | |
Gaudio, Rácz, and Sridhar [21]∗ | |
Linial and Mosheiff [27] | |
this paper |
The development of canonical labeling algorithms for is summarized in Table 1. The sources marked by ∗ show that canonical labeling in the corresponding range can be obtained by CR in a constant number of refinement rounds. An inspection of the other algorithms reveals that all of them can be implemented as a combination of the 2-WL (2-dimensional Weisfeiler-Leman) algorithm [34] with tree canonization. The 2-WL is an extension of CR which computes a canonical coloring of pairs of vertices. Thus, in these cases, canonical labeling can be obtained in time , matching the running time bound for 2-WL (see [23]). If , the running time is actually because our algorithm, as discussed below, uses CR along with simple pre- and post-processing.
A simple argument shows that Theorem 1, combined with the previous results, implies that the Erdős-Rényi random graph whp admits an efficiently computable canonical labeling whatever the edge probability function .
Corollary 2.
There exists a polynomial time algorithm that, for any function with values in , whp produces a canonical labeling of .
We now recall some highlights of the evolution of the random graph. Erdős and Rényi [19] proved their spectacular result that when passes a certain threshold around , then the size of the largest connected component in rapidly grows from to . A systematic study of the structure of connected components in the random graph when is around the critical value was initiated in the influential paper of Bollobás [13]. For more details about the phase transition, see, e.g., [24, Chapter 5].
A connected graph is called complex, if it has more than one cycle. The union of all complex components of a graph will be called the complex part of , and the union of the other components will be referred to as the simple part. As we already mentioned, if then the complex part of is whp empty. This is the so-called subcritical phase. In the critical phase, when , the complex part of whp has size and its structure is thoroughly described in [29, 30]. Here and below, for a sequence of random variables and a sequence of reals we write if the sequence is stochastically bounded111I.e. for every , there exists and such that for all .. Finally, in the supercritical phase, when , whp contains a unique complex connected component and this component has size . In particular, when and for a constant (we refer to this case as strictly supercritical regime), whp this component has linear size . It is called the giant component as all the other connected components have size .
In general, the simple part of a graph is easily canonizable by the known techniques, which reduces our problem to finding a canonical labeling for the complex part of . Furthermore, recall that the 2-core of a graph , which we will for brevity call just core and denote by , is the maximal subgraph of that does not have vertices of degree 1. Equivalently, can be defined as the subgraph of obtained by iteratively pruning all vertices in that have degree at most 1 until there are no more such vertices. Thus, if is the (non-empty) complex part of , then consists of and some (possibly empty) rooted trees planted at the vertices of the core. It follows that if we manage to canonically label the vertices of , then this labeling easily extends to a canonical labeling of the entire graph .
Suppose that CR is run on . In the most favorable case, it would output a vertex coloring discrete on . It turns out that, though not exactly true, this is indeed the case to a very large extent.
Theorem 3.
Let and assume that . Let denote the complex part of and . When CR is run on , then
-
1.
CR assigns individual colors to all but vertices in ;
-
2.
the other color classes whp have size 2;
-
3.
whp, every such color class is an orbit of the automorphism group consisting of two vertices with degree 2 in .
Remark 4.
From our proofs it is easy to derive that, when , then CR distinguishes between all vertices of whp.
Theorem 3 allows us to obtain an efficient canonical labeling algorithm for , as stated in Theorem 1, by combining CR with simple post-processing whose most essential part is invoking the linear-time tree canonization. Another consequence of Theorem 3 is that CR alone is powerful enough to solve the standard version of the graph isomorphism problem for the complex part of . Specifically, we say that a graph is identifiable by CR if CR distinguishes from any non-isomorphic graph (in the sense that CR outputs different multisets of vertex colors on inputs and ). It is not hard to see that is identifiable by CR whenever the CR coloring of is discrete. Fortunately, the properties of the CR coloring ensured by Theorem 3 are still sufficient for CR-identifiability.
Corollary 5.
Under the assumption of Theorem 3,
-
1.
is whp identifiable by CR and, consequently,
-
2.
whp, is identifiable by CR exactly when the simple part of is identifiable.
The CR-identifiability of the simple part of a graph admits an explicit, efficiently verifiable characterization, which we give in Theorem 14. This characterization can be used to show that the random graph is identifiable by CR with probability asymptotically bounded away from 0 and 1.
Our techniques for proving Theorems 1 and 3 can also be used for deriving a structural information about the automorphisms of a random graph. As proved by Erdős and Rényi [20] and by Wright [35], for is asymmetric, i.e., has no non-identity automorphism, if as . This result is best possible because if, for some constant , then the random graph has at least 2 isolated vertices with non-vanishing probability. It is noteworthy that the asymmetry of in the regime can be certified by the fact that CR coloring of is discrete due to the aforementioned result of Gaudio, Rácz, and Sridhar [21]. In the diapason of forcing to be disconnected, the action of the automorphism group can be easily understood on the simple part and on the tree-like pieces of the complex part, and full attention should actually be given to the core of the complex part. Theorem 3 provides a pretty much precise information about the action of on . More subtle questions arise if, instead of considering the action of on , we want to understand the automorphisms of itself. It is quite remarkable that the CR algorithm, applied to rather than to , can serve as a sharp instrument for tackling this problem (and, in fact, the proof of Theorem 3 is based on an analysis of the output of CR on ).
We begin with describing simple types of potential automorphisms of (with the intention of showing that, whp, all automorphism of are actually of this kind). If a vertex has degree 2 in , then it belongs to a (unique) path from a vertex of degree at least 3 to a vertex of degree at least 3 with all intermediate vertices having degree 2. We call such a path in pendant. It is possible that , and in this case we speak of a pendant cycle. Clearly, the reflection of a pendant cycle fixing its unique vertex of degree more than 2 is an automorphism of . Furthermore, call two pendant paths transposable if they have the same length and share the endvertices. Note that has an automorphism transposing such paths (and fixing their endvertices). Let denote the set of the automorphisms provided by pendant cycles, and let be the set of the automorphisms provided by transposable pairs of pendant paths. Moreover, can have a connected component consisting of two vertices of degree 3 and three pendant paths of pairwise different lengths between these vertices. Such a component has a single non-trivial automorphism, which contributes in . The set of such automorphisms of will be denoted by .
Recall that an elementary abelian 2-group is a group in which all non-identity elements have order 2 or, equivalently, a group isomorphic to the group for some .
Theorem 6.
Let and assume that . Let be the core of the complex part of .
-
1.
The order of is stochastically bounded, i.e., .
-
2.
Whp, is an elementary abelian 2-group. Moreover, is a minimum generating set of .222Consequently, whp contains neither a triple of pairwise transposable paths, nor two isomorphic components with an automorphism in , nor a cyclic component with a single chord between diametrically opposite vertices. Moreover, whp no two pendant cycles in share a vertex.
-
3.
In addition,
-
(a)
if for a constant , then both and are non-empty with non-negligible probability, while whp.
-
(b)
If and , then with non-negligible probability, while whp.
-
(c)
If , then both and are non-empty with non-negligible probability, while whp.
-
(a)
This theorem makes a final step in the study of the automorphisms group of a random graph. Recall that is whp empty when and that is connected and asymmetric when . We, therefore, focus on the intermediate diapason. If as , then the core of the giant component of is whp still asymmetric, as proved independently by Łuczak [28] and Linial and Mosheiff [27]. Moreover, Łuczak described the automorphisms group of the core of the giant component of when for a large enough constant , and obtained an analogue of Theorem 6 for this case; see [28, Theorem 4]. Our Theorem 6 not only extends [28, Theorem 4] to the full range of but also refines this result even for by showing that is actually an elementary 2-group. Another interesting observation is that some automorphisms of the core do not extend to automorphisms of the entire . Indeed, if , then whp acts trivially on the core; see Remark 4.
Related work.
As we already mentioned, Theorem 1 combined with the previous results on canonical labeling of for implies the existence of a polynomial-time canonical labeling algorithm succeeding on whp for an arbitrary edge probability function . In this form, this result has been independently obtained by Michael Anastos, Matthew Kwan, and Benjamin Moore [2]. Another result in their paper describes the action of on the core of , which follows also from our Theorem 3 and the results of Łuczak [28] and Linial and Mosheiff [27]. The techniques used in [2] and in our paper are completely different, though both proofs rely on color refinement. The two approaches have their own advantages. The method developed in [2] is used there also to show that color refinement is helpful for canonical labeling of the random graph when and to study the smoothed complexity of graph isomorphism. Our method allows obtaining precise results on the automorphism group of the core (Theorem 6).
Immerman and Lander [23] showed a tight connection between CR-identifiability and definability of a graph in first-order logic with counting quantifiers. Corollary 5 can, therefore, be recast in logical terms as follows. If , then is whp definable in this logic with using only two first-order variables (where the definability of a graph means the existence of a formula which is true on and false on any graph non-isomorphic to ). Definability of the giant component of in the standard first-order logic (without counting quantifiers) was studied by Bohman et al. [11].
The rest of the paper and proof strategy.
Section 2 begins with formal description of the color refinement algorithm in Subsection 2.1 and then, in Subsection 2.2, presents a useful criterion of CR-distinguishability in terms of universal covers. The concept of a universal cover appeared in algebraic and topological graph theory [10, 15, 31], and its tight connection to CR was observed in [3]. Subsection 2.3 pays special attention to the CR-identifiability of unicyclic graphs, which in Subsection 2.4 allows us to obtain an explicit criterion of CR-identifiability for general graphs in terms of the complex and the simple part of a graph. Finally, in Subsection 2.5 we use the relationship between CR and universal covers to prove useful properties of the CR-coloring of the core of an arbitrary graph.
Theorem 1 and Corollaries 2 and 5 are proved in Section 3. The proofs of Theorem 1 and Corollary 5 are based on Theorem 3. A crucial ingredient of the proof of Theorem 3 is our Main Lemma (Lemma 20). This lemma says that CR is unable to distinguishe between two vertices in the core only if they lie either on pendant paths (with the same endvertices) transposable by an automorphism of the graph or on a pendant cycle admitting a reflection by an automorphism. Note that this statement alone, which is a part of the information provided by Theorem 3, is enough to derive Theorem 1 and Corollary 5.
The proof of Main Lemma is outlined in Section 4. It heavily relies on the notion of a kernel. The kernel of a graph is a multigraph obtained from by contracting all pendant paths. That is, is obtained via the following iterative procedure: at every step if there exists a vertex with only two neighbors , we remove with both incident edges and add the edge instead. Note that this transformation can lead to appearance of multiple edges and loops.
To prove that CR colors vertices of the core in the described manner, we use the contiguous models due to Ding, Lubetzky, and Peres [18] in strictly supercritical regime and due to Ding, Kim, Lubetzky, and Peres in critical regime [17]. They allow to transfer properties of random multigraphs with given degree sequences to the kernel of the giant component in the random graph. Another important ingredient in our proofs is the fact that in the kernel of the supercritical random graph there are whp no small complex subgraphs. We consider separately two types of vertices: first, we prove that CR colors differently all vertices such that their neighborhoods induce trees. This is done in Sections 4.1 and 4.2 for and respectively. Then, in Section 4.3, we prove that these vertices are helpful to distinguish between all the remaining vertices.
2 Color refinement: From identifiability to canonical labeling
2.1 Description of the CR algorithm
We now give a formal description of the color refinement algorithm (CR for short). CR operates on vertex-colored graphs but applies also to uncolored graphs by assuming that their vertices are colored uniformly. An input to the algorithm consists either of a single graph or a pair of graphs. Consider the former case first. For an input graph with initial coloring , CR iteratively computes new colorings
(1) |
where denotes a multiset and is the neighborhood of a vertex . Denote the partition of into the color classes of by . Note that each subsequent partition is either finer than or equal to . If , then for all . Suppose that the color partition stabilizes in the -th round, that is, is the minimum number such that . CR terminates at this point and outputs the coloring . Note that if the colors are computed exactly as defined by (1), they will require exponentially long color names. To prevent this, the algorithm renames the colors after each refinement step, using the same set of no more than color names.
If an input consists of two graphs and , then it is convenient to assume that their vertex sets and are disjoint. The vertex colorings of and define an initial coloring of the union , which is iteratively refined according to (1). The color partition is defined exactly as above but now on the whole set . As soon as the color partition of stabilizes, CR terminates and outputs the current coloring of . The color names are renamed for both graphs synchronously.
We say that CR distinguishes and if . If CR fails to distinguish and , then we call these graphs CR-equivalent and write . A graph is called CR-identifiable if always implies .
2.2 Covering maps and universal covers
A surjective homomorphism from a graph onto a graph is a covering map if its restriction to the neighborhood of each vertex in is bijective. We suppose that is a finite graph, while can be an infinite graph. If there is a covering map from to (in other terms, covers ), then is called a covering graph of . Let be connected. We say that a graph is a universal cover of a graph if covers every connected covering graph of . A universal cover of is unique up to isomorphism. Alternatively, can be defined as a tree that covers . If is itself a tree, then ; otherwise the tree is infinite.
A straightforward inductive argument shows that a covering map preserves the coloring produced by CR, that is, for all , where is defined by (1). It follows that, if two connected graphs and have a common universal cover, i.e., , then . The converse implication is also true, as a consequence of the following lemma.
Lemma 7 (cf. Lemmas 2.3 and 2.4 in [26]).
Let and be universal covers of connected graphs and respectively. Furthermore, let be a covering map from to and be a covering map from to . For a vertex of and a vertex of , let and be the rooted versions of and with roots at and respectively. Then (isomorphism of rooted trees) if and only if .
The union of CR-identifiable graphs does not need be CR-identifiable. However, the concept of a universal cover allows us to state the following criterion, which is an extension of [4, Thm. 5.4] (see [4, p. 649] for details).
Lemma 8.
Let be connected CR-identifiable graphs and be their vertex-disjoint union. Then is CR-identifiable if and only if, for every pair of distinct and such that neither nor is a tree, the universal covers of and are non-isomorphic.
2.3 Unicyclic graphs
2.3.1 Universal covers of unicyclic graphs
For a unicyclic graph , its is the set of vertices lying on the unique cycle of . We use the notation for the length of this cycle. For a vertex in , let denote the subgraph of induced by the vertices reachable from along a path avoiding the other vertices in . This is obviously a tree. Moreover, we define as a rooted tree with root at . Let denote the isomorphism class of the rooted tree . We treat as a coloring of and write to denote the cycle of endowed with this coloring. Thus, is defined as a vertex-colored cycle graph. It will also be useful to see as a circular word over the alphabet ; see, e.g., [22] and the references therein for more details on this concept in combinatorics on words. In fact, is associated with two circular words, depending on one of the two directions in which we go along . However, the choice of one of the two words is immaterial in what follows.
Speaking about a word, we mean a standard, non-circular word. Two words are conjugated if they are obtainable from one another by cyclic shifts. A circular word is formally defined as the conjugacy class of a word. A word is a period of a word if for some . A word is a period of a circular word if is a period of some representative in the conjugacy class of . Note that if is a period of a word , then any conjugate of is a period of some conjugate of . This allows us to consider periods of circular words themselves being circular words. We define the periodicity of a circular word to be the minimum length of a period of . It may be useful to keep in mind that a period of length is also a period of every period of ; cf. [22, Proposition 1] (note that our terminology is different from [22]).
For a unicyclic graph , we define its periodicity by , where is seen as a circular word as explained above. Note that is a divisor of and that .
Like trees, unicyclic graphs are also characterizable in terms of universal covers.
Claim 9.
A connected graph is unicyclic if and only if has a unique infinite path.
The unique infinite path subgraph of will be denoted by . The structure of is clear: The cycle of is unfolded into the infinite path . Moreover, let be a covering map from to . Then is obtained by planting a copy of the rooted tree at each vertex on . The path will be considered being a vertex colored graph, with each vertex colored by .
The following observation is quite useful in what follows. Let be a covering map from to . The restriction of to is a covering map from the vertex-colored path to the vertex-colored cycle . Note that a covering map must preserve vertex colors.
Lemma 10.
Let and be connected unicyclic graphs. Then if and only if the circular words and have a common period. Moreover, if , then .
Proof.
In one direction the statement is clear: if and have a common period, then by the definition. Let be a common universal cover of and . We can naturally see as an infinite word. An arbitrary subword of length of is a period of , and the same is true for an arbitrary subword of length of . It follows that has a period of length . Indeed, it is sufficient to note that there exist integers such that . Thus, for any at distance in , we get that .
If and are covering maps from to and respectively, then is a period of and is a period of . Since and preserve the vertex colors, we have the equality . This also implies that, in fact, .
2.3.2 CR-identifiability of unicyclic graphs
Claim 11.
Let be a connected unicyclic graph. Suppose that and consists of connected components . Then
-
1.
for all ,
-
2.
every is unicyclic, and
-
3.
.
Proof.
1. Fix . Let and and be the respective covering maps. Let be a vertex in . Since and are CR-equivalent, must contain a vertex such that . By Lemma 7, . It follows that .
2. Immediately by Part 1 and Claim 9.
3. Fix a period of and set where the summation goes over all in this period (this definition obviously does not depend on the choice of the period). Let . Note that . By Part 1 and Lemma 10, we similarly have . The required equality now follows from the trivial equality .
Lemma 12.
A connected unicyclic graph is CR-identifiable if and only if one of the following conditions is true:
-
and ,
-
and ,
-
.
Proof.
( ) Suppose that a connected unicyclic graph satisfies one of the three conditions and show that it is CR-identifiable. Assuming that is CR-equivalent to , we have to check that and are actually isomorphic.
Assume first that is connected. If is not unicyclic, then and have equal number of vertices but different number of edges. This implies that and have different degree sequences, contradicting the assumptions that . Therefore, must be unicyclic. By Part 3 of Claim 11, we have . Along with Lemma 10, which is applicable because whenever , this implies that . The last relation, in its turn, implies that .
Assume now that is disconnected. Let be the connected components of .
Combining Claim 11 and Lemma 10, we see that
. It follows that satisfies one of the first two
conditions. The restrictions on , however, rule out the equality in Part 3 of Claim 11. In particular, if , then the only possible case is and . However, it contradicts the equality .
Thus, the case of disconnected is actually impossible, that is, all such
are distinguishable from by CR.
( )
Suppose that all three conditions are false. That is,
either and , or and (note that is even in this case),
or (in the last case, is a proper divisor of ).
In each case, is CR-equivalent to a disjoint union of two shorter vertex-colored
cycles and , both sharing the same period of length with .
Taking the connected unicyclic graphs and such that
and , we see that is CR-equivalent to the disjoint union of and
and is, therefore, not CR-identifiable.
Lemma 13.
Let and be connected unicyclic graphs with . Assume that both and are CR-identifiable. Then if and only if and one of the following conditions is true:
-
,
-
and ,
-
and while .
Proof.
( )
By Lemma 10.
( )
Let and be CR-identifiable connected unicyclic graphs with .
Assume that . The equality
immediately follows from Lemma 10. By the same corollary, .
If , then Lemma 12 yields the equality ,
which readily implies by using Lemma 10 once again.
If , then either and or and then, by Lemma 12,
and are as claimed.
2.4 A general criterion of CR-identifiability
Deciding whether a given graph is CR-identifiable is an efficiently solvable problem [4, 25]. For our purposes, it is beneficial to have a more explicit description of CR-identifiable graphs in terms of the complex and the simple part of a graph. We now derive such a description from the facts obtained for unicyclic graphs in the preceding subsection.
Theorem 14.
-
1.
A graph is CR-identifiable if and only if both the complex and the simple parts of are CR-identifiable.
-
2.
The simple part of is CR-identifiable if and only if both of the following two conditions are true:
Proof.
-
1.
If is CR-identifiable, then its complex and simple parts are both CR-identifiable as a consequence a more general fact: The vertex-disjoint union of any set of connected components of is CR-identifiable. This fact is easy to see directly, and it also immediately follows from Lemma 8.
In the other direction, assuming that the complex and the simple parts of are CR-identifiable, we have to conclude that is CR-identifiable. Lemma 8 reduces our task to verification that if is a complex connected component of and is a simple connected component of (a tree or a unicyclic graph), then the universal covers of and are non-isomorphic. The last condition follows from the fact that the universal cover of a tree is the tree itself and from Claim 9.
- 2.
2.5 Coloring the cores of general graphs
We conclude this section by collecting useful general facts about the CR-colors of vertices in the core of a graph. Let be an arbitrary graph. If is a vertex in , then in we have a tree growing from the root that shares with only the vertex . We denote this rooted tree by .
Claim 15.
Let and be graphs. Let be a vertex in and be a vertex in . If , then .
Proof.
Clearly, it suffices to prove this for connected and . The condition readily implies that , and the claim follows from Lemma 7.
Claim 16.
Let and be two graphs (it is not excluded that ). For vertices and assume that . Then if and only if .
Proof.
Assume that and are connected (the general case will easily follow). Let be a covering map from to , and be a covering map from to . Consider and such that and . Note that if and only if there is a cycle in containing , and the same is true about any . This proves the claim because by Lemma 7.
In our proofs, we will deal with cores that locally have a tree structure, that is, the balls of sufficiently large radii around most of its vertices induce trees. In this case, CR distinguishes vertices that have non-isomorphic neighborhoods.
Claim 17.
Let denote the set of vertices at distance at most from a vertex . Let . If, for some , the -neighborhoods and induce non-isomorphic trees rooted in and respectively, then .
Proof.
This is a direct consequence of Lemma 7.
Throughout the paper, we identify the vertex set of the kernel of with the set of vertices of having degrees at least 3 in the core. We now state another consequence of Lemma 7.
Claim 18.
Let be a graph with minimum degree at least 2 and let be its kernel. Let be a positive integer. For , let be the subgraph of induced by the set of vertices at distance at most from in . Let be the subdivided version of . Let be vertices of such that, for some , graphs are non-isomorphic trees rooted in . Then .
Finally, we need the fact that the partition produced by CR on a graph refines the partition produced by CR on its core.
Claim 19.
Let and be vertices in . Let and be the colorings produced by CR run on and respectively. If , then also .
Proof.
3 Proofs of main results
3.1 Derivation of Corollary 5 from Theorem 3
Part 1.
Any isomorphism of graphs obviously respects their cores; cf. Claim 16. Note that the CR-color of any vertex in the core contains a complete information about the isomorphism type of the rooted tree “growing” from this vertex (cf. Claim 15). This has the following consequence. Let denote the colored version of where each vertex is colored by the isomorphism type of . Then is CR-identifiable if and only if is CR-identifiable. In order to show that is CR-identifiable it suffices to show that is reconstructible up to isomorphism from the multiset of the vertex colors produced by CR on input . The CR-color partition of is equal to the restriction of the CR-color partition of to (recall Claim 16). Theorem 3, therefore, provides us with the following information (whp):333Note that Conditions (b) and (c) are provided by Main Lemma. Condition (a) is not essential for the argument in Subsections 3.1 and 3.2, which easily extends to the case of more than two mutually transposable pendant paths.
-
(a)
every CR-color class of has size either 1 or 2,
-
(b)
every two equally colored vertices have degree 2,
-
(c)
every two equally colored vertices are transposable by an automorphism of .
Moreover, our Main Lemma (Lemma 20) ensures that whp has no involutory automorphism of type described in Section 1. Along with this fact, the above conditions readily imply that the color classes of size 2 occur either “along” a pair of transposable pendant paths between two vertices of degree at least 3 or correspond to the reflectional symmetry of a pendant cycle. Here we use the notions introduced in Section 1 in the context of , which should now be refined by taking into account the coloring of .
If and are two color classes of size 1, then the colors and yield the information on whether the vertices and are adjacent or not. For color classes and , note that and are adjacent if and only if and are adjacent. This adjacency pattern is as well reconstructible from the colors and . If and are two color classes of size 2, then they span either a complete or empty bipartite graph or a matching (for example, is adjacent to , is adjacent to , and there is no other edges between these color classes). Each of these three possible adjacency patterns is reconstructible from the colors and . A crucial observation, completing the proof, is that all ways to put a matching between and lead to isomorphic graphs.
Part 2.
This follows from part 1 by part 1 of Theorem 14.
3.2 Derivation of Theorem 1 from Theorem 3
Before proceeding to the proof, we remark that when we say that a canonical labeling algorithm succeeds on a random graph , we mean that the algorithm works correctly on a certain efficiently recognizable (closed under isomorphisms) class of graphs such that belongs to whp. Though not explicitly stated in the argument below, it will be clear that, in our case, is the class of all graphs satisfying the conditions of Theorem 3. Note that these conditions are easy to check after running CR on a graph.
First of all, we distinguish the complex and the simple parts of and compute a canonical labeling of the simple part separately. This is doable in linear time. It remains to handle the complex part .
It is enough to compute a suitable injective coloring of and subsequently to rename the colors in their lexicographic order by using the labels that were not used for the simple part. To this end, we run CR on . This takes time as CR can be implemented [9] in time , where denotes the number of edges (which is linear for the sparse random graph under consideration). Then we begin with coloring the vertices of the core . Theorem 3 along with Claim 16 ensures that the vertices of degree at least 3 already received individual colors. The duplex colors occur along transposable pendant paths and pendant cycle (like in Section 3.1, these notions are understood with respect to rather than to alone). To make such vertex colors unique, we keep the original colors along one of two transposable paths and concatenate their counterparts in the other path with a special symbol. We proceed similarly with symmetric pendant cycles. In this way, every vertex in the core receives an individual color . In the last phase, we compute a canonical labeling for each tree part of , regarding as a tree rooted at . This coloring is not injective yet because some and can be isomorphic. This is rectified by concatenating all vertex colors in with .
3.3 Proof of Corollary 2
As well known, if then the core of the giant component of coincides with the core of the entire graph. Due to the classical linear-time algorithms for canonical labeling of trees, this observation reduces canonical labeling of with to canonical labeling of its core.
Linial and Mosheiff [27] suggested an algorithm that, for any with , whp labels canonically in time by distinguishing between all vertices of the core. In [16], it was proved that, if , then CR whp distinguishes between all vertices of the entire . Finally, our Theorem 1 provides an algorithm that, for any , whp labels canonically in time . Now, consider the following algorithm :
-
1.
Run CR. If it colors differently all vertices, then halt and output the canonical labeling produced by CR.
-
2.
If the algorithm does not halt in Step 1, then run . If it succeeds (i.e., colors differently all vertices in the core of the input graph), then halt and output the labeling produced by .
-
3.
If the algorithm does not halt in Steps 1 and 2, then run and output the labeling it produces (or give up if fails).
Let us show that the algorithm succeeds whp for any with . Assume, to the contrary, that there exist a constant and a sequence such that
for all . If there is a subsequence and a constant such that for all , then we get a contradiction with the performance of the algorithm . Therefore, . If there is a subsequence such that for all , then we get a contradiction with the performance of the algorithm . It follows that for all . This, however, contradicts the result of [16] that CR in this regime produces a discrete coloring of whp.
In order to obtain canonical labeling, whp, for all with , we run the algorithm on input and if it fails, then we run once again on the complement of .
4 CR-coloring of the random graph
In this section, we state and prove our Main Lemma that describes the output of CR on the random graph. Given a graph , we call vertices and in interchangeable, if
-
they both have degree 2 in ,
-
and belong to a cycle with the following property: there exists a vertex on the cycle such that has degree at least 3 in , , and all the other vertices on the cycle, but the vertex opposite to when is even, have degree 2 in . In other words, and either belong to a pendant cycle or to two transposable pendant paths, and the respective transposition replaces and .
Lemma 20 (Main Lemma).
Let be a constant, , and . Let be the union of complex components in , and be its core. If CR is run on , then whp any pair of vertices in receiving the same color is interchangeable. Under the condition , this is true also if CR is run on .
The proof of Main Lemma is given in Sections 4.1–4.3. We consider separately large (supercritical phase) and small (critical phase). In both cases, we specify good sets of vertices and show that all vertices from good sets are distinguished by CR. This is done in Section 4.1 for large and Section 4.2 for small . Finally, in Section 4.3 we complete the proof: we show that distinguishing between good vertices in the core is sufficient to distinguish between all pairs in the core that are not interchangeable.
For a graph , is the shortest-path distance between and in . Sometimes, when the graph is clear from the context, we omit the subscript . For a vertex and a real number , we denote by the ball of radius around in , i.e., the graph induced on the set of all vertices at distance at most from in . For a non-negative integer , we denote by the sphere of radius around in , i.e., the graph induced on the set of all vertices at distance exactly from in .
For a connected graph , its excess is the difference between the number of edges and the number of vertices. In particular, a tree has excess . We call -complex a connected graph with excess . The total excess of a graph without unicyclic components is the sum of excesses of all its components.
4.1 Distinguishing good vertices in the core in the supercritical and strictly supercritical phases
In this subsection, we let be such that for some constant . Denote . We denote the kernel and the core of the giant component of by and . Let be the coloring produced by CR on .
We assign to every edge of the weight , where is the number of vertices that subdivide the edge of the kernel belongs to. The weight of a path is the sum of weights of its edges. For , let be the fractional distance between and , i.e. the minimum weight of a path between and . We denote the respective metric space by .
Fix a positive real . Let be the set of all such that the ball around in of radius induces an acyclic graph. For every vertex and integer , let be the multiset of lengths of edge-disjoint paths from that are produced by subdividing edges , where while . We will need the following facts.
Claim 21.
Let . Whp for any two different , there exists an integer such that the multisets and are different.
Proof.
We fix and let . We prove this claim in the contiguous models , defined in [17, Thm. 2] and [18, Thm. 1] and then use these theorems to conclude that it also holds in . So, in what follows, , , and .
Let us expose and let . As proved in [17, 18], whp . Assume first that the distance between and is at most in . Let be the shortest path between and – it is unique due to the definition of . Let be a neighbor of in that does not belong to . Then, by the definition of , for all . Since , we have that and are trees. It immediately implies, that for every such ,
We then generate subdivisions of the edges of the kernel from the definition of in the following order: for every , we, first, subdivide all edges growing from outside of the ball, and then all edges growing from outside of . Notice that all sets are disjoint for different . For every , as soon as the edges that correspond to the vertex are subdivided, the event that immediately implies that the random multiset of lengths of paths from , that are produced by subdividing edges from that grow from outside, should be equal to a predefined value. This multiset has size at least . Since the geometric random variables considered in [17, 18] do not have atoms with probability measure , the latter event has probability at most due to the de Moivre–Laplace local limit theorem. Eventually,
Assume now that the distance between and is bigger than in . Then, by the definition of , sets and are disjoint and sets have size at least for all . As above, we get that for all with probability at most .
Claim 22.
Let and . Whp, for any distinct .
Proof.
Assume that the assertion of Claim 21 holds for and deterministically. Let . Let and be the subdivided versions of and in . Since due to the conclusion of Claim 21, we get that due to Claim 18. It remains to consider the case and . Assume towards contradiction that . Then, both and have degree 2 in . In particular, . Consider the edges of that and subdivide. Let be the subdivided versions of . Due to the assertion of Claim 21 applied to , we get that all vertices of from have different colors. On the other hand, by the definition of CR, the neighbors of should have exactly the same color as the neighbors of . Thus, by induction, we get that the entire paths are colored identically. It may only happen if the endpoints of coincide with the endpoints of . By the definition of , it means that . Since the endpoints of are colored in different colors, it can be easily shown by induction that all vertices in are also colored in different colors. Thus, , yielding a contradiction.
4.2 Distinguishing good vertices in the core in the critical regime
Let be a large positive number. Let . Whp any complex component in has size at least due to the following well-known fact.
Claim 23.
Let , , and . There exists such that as and whp any connected subgraph of of size at most is not complex.
Let us say that a path extends the path if, for some the sets and are disjoint and . For convenience, we assume that this notion is closed under rotations of paths, i.e. if extends , then it also extends and we also say that extends both and in this case. We call two paths and weakly disjoint, if they are either vertex-disjoint or one paths extends the other one.
Claim 24.
Whp in there are no two weakly disjoint paths and of length such that, for every , has degree 2 if and only if has degree 2.
Proof.
Due to Claim 23, whp in there are no complex subgraphs with at most vertices.
For a path in , let us consider a binary word defined as follows: if and only if has degree in . Notice that, if a path extends the path so that and , then is periodic and defined by .
Let be the number of pairs of paths as in the statement of the claim and such that there are at most 2 edges between the paths (we are allowed to assume this since there are no complex subgraphs of size at most ). Fix two weakly disjoint path and and assume without loss of generality that either extends , or they are disjoint. Let be such that . If there is no such , i.e. the paths are disjoint, set . Then, expose edges from all , , and assume that they send at most 2 edges to , other than the edge . Then, probability that for every , has degree 2 if and only if has degree 2, is at most
We then get
for an appropriate choice of . Due to Markov’s inequality, , completing the proof.
Let be the set of all vertices in that belong to a complex component of and such that is a tree. Let be the coloring produced by CR on .
Claim 25.
Whp, for any .
Proof.
Assume that the statement of Claim 24 holds deterministically in . Fix two different . Let us show towards contradiction that trees and are not isomorphic.
Take an arbitrary path of length , where . Since, by assumption, , there exists a path such that and, for every , has degree 2 if and only if has degree 2. In the same way, since and is a tree, we may consider a path that shares only the vertex with . Since , there should be a path that shares with the only vertex and such that, for every , has degree 2 if and only if has degree 2. Since is acyclic and since pairs of paths and cannot be disjoint due to Claim 24, must lie on the path . Moreover, since is acyclic, once or leave , they never meet with again. Thus, the path is divided by in at most 3 parts: the first part does not have common vertices with , the second part is a subpath of , and the third part does not have common vertices with again. Let be the longest part of the three. Then has length . Moreover, since by assumption and , there should be a subpath such that and are weakly disjoint, and the degrees of internal vertices in and are aligned in the sense that the -th inner vertex of have degree 2 if and only if the -th vertex of has degree 2. This is impossible due to Claim 24. Thus, implying due to Claim 17.
4.3 Completing the proof of Main Lemma (Lemma 20)
Due to Claims 22, 25, and 19, it remains to prove the following:
-
1.
If , then
-
for every and ;
-
for any non-interchangeable pair ;
-
-
2.
If , then
-
for every and ;
-
for any non-interchangeable pair .
-
Claim 26.
Let be a constant, , and . Then whp in there are no complex subgraphs of size at most .
For the sake of brevity, below we prove both statements in two different regimes simultaneously. Thus, with some abuse of notation, in the supercritical phase (i.e., ), we let and as we only consider CR on . We also assume that when , the core is equipped with the fractional distance , constituting the metric space . If , then is equipped with the usual shortest-path distance, that we denote by as well. We also use the following notation: when we prove the assertion for and when we prove it for . In what follows, we assume that the assertions of Claims 22, 23, 25, and 26 hold deterministically in .
-
1.
Assume that some and have . We know that is -close to a cycle of length at most . If , then let be the closest to vertex on that has degree more than 2 in the core. Otherwise, let . Let be the shortest path from to . Let us extend this path by a path of length beyond . Due to Claim 23 and Claim 26, it has a subpath of length consisting of vertices from only such that . We know that all elements of the vector are different. Then, due to our assumption, must have a vertex at distance at most such that is the first vertex of a path with , .
We now consider separately two cases: and . In the first case, we have that the distance from to the closest cycle (which is , the same as for ) is at most
Let be the shortest path between and . Due to Claim 23 and Claim 26, there exists a path of length that does not meet and consists of vertices from only. Due to Claim 22 and Claim 25 all elements of the vector are different and no element of equals to any element of . Moreover, by construction, . Then, due to our assumption, must have a neighbor such that is the first vertex of a path with , . Note that due to Claim 23 and Claim 26. Since all vertices in are distinguished by , we conclude that all vertices must be outside . Due to Claim 23, Claim 26, and the definition of , they constitute a (self-avoiding) path and are -close to a cycle of length at most . Since the path has length , we get a contradiction with Claim 23 or Claim 26.
We then assume . It may only happen when . Moreover, all are not in . Indeed, otherwise, different paths and have common vertices. Then the path from to that goes through has length greater than . However, due to Claim 23 and Claim 26, there are no two different paths from to , both of length at most and, also, there is no other cycle of length at most such that a path from to has at most vertices. This contradicts the fact that . Thus, we again get a (self-avoiding) path consisting of vertices that are -close to a cycle of length at most . This contradicts Claim 23 or Claim 26 again, since the path has length . We conclude that every vertex has that does not equal to the color of any other vertex in the core.
-
2.
It remains to prove that, for any two distinct that are not interchangeable, . Fix two such vertices and . We may assume that since otherwise due to Claim 15. Let and be two cycles of length at most that are closest to and respectively (both are at distance at most from the respective vertices). If , then set . In this case, we let when and let be the closest vertex of degree 3 in to otherwise. If , then, without loss of generality we assume that either is not in or both are in . Let when and let be a vertex of that has degree at least 3 and such that otherwise. Note that such a vertex exists due to the definition of an interchangeable pair. Consider a path of length that starts at and does not meet . Due to Claim 23 and Claim 26, this path has a vertex in such that . If , then
due to Claim 23 and Claim 26. Finally, let . Assume, in addition, . Then the only possibility for to be equal to is to have a path between and of length . Let us extend by a path of length beyond . Due to Claim 23 and Claim 26 this path has a vertex from such that . But then , implying . If , then
In either case, we get or . Recalling that has a unique color, we readily conclude that , completing the proof.
References
- [1] Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullmann. The Design and Analysis of Computer Algorithms. Addison-Wesley Longman Publishing Co., 1974.
- [2] Michael Anastos, Matthew Kwan, and Benjamin Moore. Smoothed analysis for graph isomorphism, 2024. arXiv:2410.06095.
- [3] D. Angluin. Local and global properties in networks of processors. In The 12th Annual ACM Symposium on Theory of Computing, pages 82–93, 1980.
- [4] Vikraman Arvind, Johannes Köbler, Gaurav Rattan, and Oleg Verbitsky. Graph isomorphism, color refinement, and compactness. Comput. Complex., 26(3):627–685, 2017. doi:10.1007/s00037-016-0147-6.
- [5] L. Babai, P. Erdős, and S. M. Selkow. Random graph isomorphism. SIAM Journal on Computing, 9(3):628–635, 1980. doi:10.1137/0209047.
- [6] László Babai. Moderately exponential bound for Graph Isomorphism. In Proc. of the 3rd Int. Conf. on Fundamentals of Computation Theory (FCT’81), volume 117 of Lecture Notes in Computer Science, pages 34–50. Springer, 1981. doi:10.1007/3-540-10854-8_4.
- [7] László Babai. Graph isomorphism in quasipolynomial time. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing (STOC’16), pages 684–697, 2016. doi:10.1145/2897518.2897542.
- [8] László Babai. Canonical form for graphs in quasipolynomial time: preliminary report. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing (STOC’19), pages 1237–1246, 2019. doi:10.1145/3313276.3316356.
- [9] Cristoph Berkholz, Paul Bonsma, and Martin Grohe. Tight lower and upper bounds for the complexity of canonical colour refinement. Theory of Computing Systems, 60:581–614, 2017. doi:10.1007/S00224-016-9686-0.
- [10] N. Biggs. Algebraic graph theory. Cambridge University Press, 2nd edition, 1994.
- [11] Tom Bohman, Alan M. Frieze, Tomasz Luczak, Oleg Pikhurko, Clifford D. Smyth, Joel Spencer, and Oleg Verbitsky. First-order definability of trees and sparse random graphs. Comb. Probab. Comput., 16(3):375–400, 2007. doi:10.1017/S0963548306008376.
- [12] Bela Bollobás. Distinguishing vertices of random graphs. Ann. Discrete Math., 13:33–50, 1982.
- [13] Bela Bollobás. The evolution of random graphs. Transactions of the American Mathematical Society, 286(1):257–274, 1984.
- [14] Bela Bollobás. Random graphs. Cambridge University Press, 2001.
- [15] D. M. Cvetković, M. Doob, and H. Sachs. Spectra of graphs. Theory and applications. Leipzig: J. A. Barth Verlag, 3rd edition, 1995.
- [16] Tomek Czajka and Gopal Pandurangan. Improved random graph isomorphism. Journal of Discrete Algorithms, 6:85–92, 2008. doi:10.1016/J.JDA.2007.01.002.
- [17] Jian Ding, Jeong Han Kim, Eyal Lubetzky, and Yuval Peres. Anatomy of young giant component in the random graph. Random Structures & Algorithms, 39(2):139–178, 2011. doi:10.1002/RSA.20342.
- [18] Jian Ding, Eyal Lubetzky, and Yuval Peres. Anatomy of the giant component: The strictly supercritical regime. European Journal of Combinatorics, 35:155–168, 2014. doi:10.1016/J.EJC.2013.06.004.
- [19] Paul Erdős and Alfred Rényi. On the evolution of random graphs. Publ. Math. Inst. Hungar. Acad. Sci., 5:17–61, 1960.
- [20] Paul Erdős and Alfred Rényi. Asymmetric graphs. Acta Math Acad Sci Hung, 14:295–315, 1963.
- [21] Julia Gaudio, Miklós Z. Rácz, and Anirudh Sridhar. Average-case and smoothed analysis of graph isomorphism, 2023. arXiv:2211.16454.
- [22] László Hegedüs and Benedek Nagy. On periodic properties of circular words. Discrete Mathematics, 339(3):1189–1197, 2016. doi:10.1016/j.disc.2015.10.043.
- [23] Neil Immerman and Eric Lander. Describing Graphs: A First-Order Approach to Graph Canonization, pages 59–81. Springer New York, 1990.
- [24] Svante Janson, Tomasz Łuczak, and Andrzej Ruciński. Random graphs. John Wiley & Sons, 2000.
- [25] Sandra Kiefer, Pascal Schweitzer, and Erkal Selman. Graphs identified by logics with counting. ACM Trans. Comput. Log., 23(1):1:1–1:31, 2022. doi:10.1145/3417515.
- [26] Andreas Krebs and Oleg Verbitsky. Universal covers, color refinement, and two-variable counting logic: Lower bounds for the depth. In 30th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS’15), pages 689–700. IEEE Computer Society, 2015. doi:10.1109/LICS.2015.69.
- [27] N. Linial and J. Mosheiff. On the rigidity of sparse random graphs. Journal of Graph Theory, 85(2):466–480, 2017. doi:10.1002/JGT.22073.
- [28] T. Łuczak. The automorphism group of random graphs with a given number of edges. Math Proc Camb Phil Soc, 104:441–449, 1988.
- [29] Tomasz Łuczak. The phase transition in a random graph. In D. Miklós, V.T. Sós, and T. Szőnyi, editors, Combinatorics, Paul Erdős is Eighty, volume 2, pages 399–422. Bolyai Soc. Math. Stud. 2, J. Bolyai Math. Soc., Budapest, 1996.
- [30] Tomasz Łuczak, Boris Pittel, and John C. Wierman. The structure of a random graph at the point of the phase transition. Transactions of the American Mathematical Society, 341(2):721–748, 1994.
- [31] W. S. Massey. Algebraic topology: An introduction, volume 56 of Graduate Texts in Mathematics. Springer, 5th edition, 1981.
- [32] Marc Noy, Vlady Ravelomanana, and Juanjo Rué. On the probability of planarity of a random graph near the critical point. Proc. Am. Math. Soc., 143(3):925–936, 2015. doi:10.1090/S0002-9939-2014-12141-1.
- [33] Oleg Verbitsky and Maksim Zhukovskii. Canonical labeling of sparse random graphs, 2024. arXiv:2409.18109.
- [34] B.Yu. Weisfeiler and A.A. Leman. The reduction of a graph to canonical form and the algebra which appears therein. NTI, Ser. 2, 9:12–16, 1968. In Russian. English translation is available at https://www.iti.zcu.cz/wl2018/pdf/wl_paper_translation.pdf.
- [35] E. M. Wright. Asymmetric and symmetric graphs. Glasgow Math J, 15:69–73, 1974.