Abstract 1 Introduction 2 Color refinement: From identifiability to canonical labeling 3 Proofs of main results 4 CR-coloring of the random graph References

Canonical Labeling of Sparse Random Graphs

Oleg Verbitsky ORCID Institut für Informatik, Humboldt-Universität zu Berlin, Germany Maksim Zhukovskii ORCID School of Computer Science, University of Sheffield, UK
Abstract

We show that if p=O(1/n), then the Erdős-Rényi random graph G(n,p) with high probability admits a canonical labeling computable in time O(nlogn). Combined with the previous results on the canonization of random graphs, this implies that G(n,p) with high probability admits a polynomial-time canonical labeling whatever the edge probability function p. Our algorithm combines the standard color refinement routine with simple post-processing based on the classical linear-time tree canonization. Noteworthy, our analysis of how well color refinement performs in this setting allows us to complete the description of the automorphism group of the 2-core of G(n,p).

Keywords and phrases:
Graph isomorphism, random graphs, canonical labeling, color refinement
Funding:
Oleg Verbitsky: Supported by DFG grant KO 1053/8–2. On leave from the IAPMM, Lviv, Ukraine.
Copyright and License:
[Uncaptioned image] © Oleg Verbitsky and Maksim Zhukovskii; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Mathematics of computing Random graphs
; Mathematics of computing Graph algorithms
Related Version:
Full Version: https://arxiv.org/abs/2409.18109 [33]
Acknowledgements:
The authors would like to thank Michael Krivelevich for helpful discussions.
Editors:
Olaf Beyersdorff, Michał Pilipczuk, Elaine Pimentel, and Nguyễn Kim Thắng

1 Introduction

On an n-vertex input graph G, a canonical labeling algorithm computes a bijection λG:V(G){1,,n} such that if another graph G is isomorphic to G, then the isomorphic images of G and G under respective permutations λG and λG are equal. Given the labelings λG and λG, it takes linear time to check whether G and G are isomorphic. The existence of polynomial-time algorithms for testing isomorphism of two given graphs and, in particular, for producing a canonical labeling remain open. Babai’s breakthrough quasi-polynomial algorithm for testing graph isomorphism [7] was subsequently extended to a canonical labeling algorithm of the same time complexity [8]. In the present paper, we address the canonical labeling problem for the Erdős-Rényi (or binomial) random graph G(n,p). Recall that the vertex set of G(n,p) is {1,,n}, and each pair of vertices is adjacent with probability p=p(n), independently of the other pairs.

Babai, Erdős, and Selkow [5] proved that the simple algorithmic routine known as color refinement (CR for brevity) with high probability produces a discrete coloring of the vertices of G(n,1/2), that is, a coloring where the vertex colors are pairwise different. Since the vertex colors are isomorphism-invariant, this yields a canonical labeling of G(n,1/2) by numbering the color names in the lexicographic order. Here and throughout, we say that an event happens for G(n,p) with high probability (whp for brevity) if the probability of this event tends to 1 as n. The result of [5] has a fundamental meaning: almost all graphs admit an easily computable canonical labeling and, hence, the graph isomorphism problem has low average-case complexity.

The argument of [5] can be extended to show [14, Theorem 3.17] that the CR coloring of G(n,p) is whp discrete for all n1/5lnnp1/2. Note that it is enough to consider the case of p1/2 since G(n,1p) has the same distribution as the complement of G(n,p). Remarkably, the algorithm of Babai, Erdős, and Selkow performs only a bounded number of color refinement steps and, due to this, works in linear time.

A different algorithm suggested by Bollobás [12] works in polynomial time and whp produces a canonical labeling of G(n,p) in a much sparser regime, namely when (1+δ)lnnnp2n11/12 for any positive constant δ. The next improvement was obtained by Czajka and Pandurangan [16] who proved that, in a bounded number of rounds, CR yields a discrete coloring of G(n,p) whp when ln4nnlnlnnp12. Finally, Linial and Mosheiff [27] designed a polynomial time algorithm for canonical labeling of G(n,p) when 1np12. As shown by Gaudio, Rácz, and Sridhar [21], in the subdiapason p(1+δ)lnnn for any fixed δ>0, a canonical labeling can still be provided by CR in a bounded number of rounds.

The decades-long line of research summarized above leaves open the question whether a random graph G(n,p) admits efficient canonization in the regime p=O(1/n). Note that the case of p=o(1/n) is easy. Indeed, as long as pn=1ω(n1/3), whp G(n,p) is a vertex-disjoint union of trees and unicyclic graphs (i.e., connected graphs containing exactly one cycle). Canonization of such graphs is tractable due to the classical linear-time canonical labeling algorithms for trees [1] and even planar graphs (see [6] for a survey of the early work on graph isomorphism covering these graph classes). Thus, efficient canonization remains unknown for all p=p(n) such that, for some C>0 and all n, 1Cn1/3pnC (even though G(n,p) stays planar with a non-negligible probability as long as pn=1+O(n1/3); see [32]). Our first result closes this gap.

Theorem 1.

If p=O(1/n), then G(n,p) whp admits a canonical labeling computable in time O(nlogn).

Table 1: Canonical labeling algorithms for random graphs in the full-scale Erdős-Rényi evolutional model G(n,p).
Edge probability Algorithm
p=12 Babai, Erdős, and Selkow [5]
n1/5lnnp1/2 Bollobás [14]
(1+δ)lnnnp2n11/12 Bollobás [12]
ln4nnlnlnnp12 Czajka and Pandurangan [16]
(1+δ)lnnnpn5/6 Gaudio, Rácz, and Sridhar [21]
1np12 Linial and Mosheiff [27]
p=O(1n) this paper

The development of canonical labeling algorithms for G(n,p) is summarized in Table 1. The sources marked by  show that canonical labeling in the corresponding range can be obtained by CR in a constant number of refinement rounds. An inspection of the other algorithms reveals that all of them can be implemented as a combination of the 2-WL (2-dimensional Weisfeiler-Leman) algorithm [34] with tree canonization. The 2-WL is an extension of CR which computes a canonical coloring of pairs of vertices. Thus, in these cases, canonical labeling can be obtained in time O(n3logn), matching the running time bound for 2-WL (see [23]). If p=O(1/n), the running time is actually O(nlogn) because our algorithm, as discussed below, uses CR along with simple pre- and post-processing.

A simple argument shows that Theorem 1, combined with the previous results, implies that the Erdős-Rényi random graph G(n,p) whp admits an efficiently computable canonical labeling whatever the edge probability function p(n).

Corollary 2.

There exists a polynomial time algorithm that, for any function p=p(n) with values in [0,1], whp produces a canonical labeling of G(n,p).

We now recall some highlights of the evolution of the random graph. Erdős and Rényi [19] proved their spectacular result that when p passes a certain threshold around 1/n, then the size of the largest connected component in G(n,p) rapidly grows from Θ(logn) to Θ(n). A systematic study of the structure of connected components in the random graph when p is around the critical value 1/n was initiated in the influential paper of Bollobás [13]. For more details about the phase transition, see, e.g., [24, Chapter 5].

A connected graph is called complex, if it has more than one cycle. The union of all complex components of a graph G will be called the complex part of G, and the union of the other components will be referred to as the simple part. As we already mentioned, if pn=1ω(n1/3) then the complex part of G(n,p) is whp empty. This is the so-called subcritical phase. In the critical phase, when pn=1±O(n1/3), the complex part of G(n,p) whp has size OP(n2/3) and its structure is thoroughly described in [29, 30]. Here and below, for a sequence of random variables ξn and a sequence of reals an we write ξn=OP(an) if the sequence ξn/an is stochastically bounded111I.e. for every ε>0, there exists C>0 and n0 such that (|ξn/an|>C)<ε for all nn0.. Finally, in the supercritical phase, when pn=1+ω(n1/3), whp G(n,p) contains a unique complex connected component and this component has size Θ(n2(p1/n)). In particular, when p=O(1/n) and p>(1+δ)/n for a constant δ>0 (we refer to this case as strictly supercritical regime), whp this component has linear size Ω(n). It is called the giant component as all the other connected components have size O(logn).

In general, the simple part of a graph G is easily canonizable by the known techniques, which reduces our problem to finding a canonical labeling for the complex part of G. Furthermore, recall that the 2-core of a graph H, which we will for brevity call just core and denote by core(H), is the maximal subgraph of H that does not have vertices of degree 1. Equivalently, core(H) can be defined as the subgraph of H obtained by iteratively pruning all vertices in H that have degree at most 1 until there are no more such vertices. Thus, if H is the (non-empty) complex part of G, then H consists of core(H) and some (possibly empty) rooted trees planted at the vertices of the core. It follows that if we manage to canonically label the vertices of core(H), then this labeling easily extends to a canonical labeling of the entire graph G.

Suppose that CR is run on H. In the most favorable case, it would output a vertex coloring discrete on core(H). It turns out that, though not exactly true, this is indeed the case to a very large extent.

Theorem 3.

Let Gn=G(n,p) and assume that p=O(1/n). Let 𝖧n denote the complex part of Gn and 𝖢n=core(𝖧n). When CR is run on 𝖧n, then

  1. 1.

    CR assigns individual colors to all but OP(1) vertices in 𝖢n;

  2. 2.

    the other color classes whp have size 2;

  3. 3.

    whp, every such color class is an orbit of the automorphism group Aut(𝖧n) consisting of two vertices with degree 2 in 𝖢n.

 Remark 4.

From our proofs it is easy to derive that, when np=1+o(1), then CR distinguishes between all vertices of 𝖢n whp.

Theorem 3 allows us to obtain an efficient canonical labeling algorithm for G(n,p), as stated in Theorem 1, by combining CR with simple post-processing whose most essential part is invoking the linear-time tree canonization. Another consequence of Theorem 3 is that CR alone is powerful enough to solve the standard version of the graph isomorphism problem for the complex part of G(n,p). Specifically, we say that a graph H is identifiable by CR if CR distinguishes H from any non-isomorphic graph H (in the sense that CR outputs different multisets of vertex colors on inputs H and H). It is not hard to see that H is identifiable by CR whenever the CR coloring of H is discrete. Fortunately, the properties of the CR coloring ensured by Theorem 3 are still sufficient for CR-identifiability.

Corollary 5.

Under the assumption of Theorem 3,

  1. 1.

    𝖧n is whp identifiable by CR and, consequently,

  2. 2.

    whp, Gn is identifiable by CR exactly when the simple part of Gn is identifiable.

The CR-identifiability of the simple part of a graph admits an explicit, efficiently verifiable characterization, which we give in Theorem 14. This characterization can be used to show that the random graph Gn is identifiable by CR with probability asymptotically bounded away from 0 and 1.

Our techniques for proving Theorems 1 and 3 can also be used for deriving a structural information about the automorphisms of a random graph. As proved by Erdős and Rényi [20] and by Wright [35], G(n,p) for p1/2 is asymmetric, i.e., has no non-identity automorphism, if nplnn as n. This result is best possible because if, nplnnγ for some constant γ>0, then the random graph has at least 2 isolated vertices with non-vanishing probability. It is noteworthy that the asymmetry of G(n,p) in the regime p(1+δ)lnnn can be certified by the fact that CR coloring of G(n,p) is discrete due to the aforementioned result of Gaudio, Rácz, and Sridhar [21]. In the diapason of p forcing G(n,p) to be disconnected, the action of the automorphism group can be easily understood on the simple part and on the tree-like pieces of the complex part, and full attention should actually be given to the core of the complex part. Theorem 3 provides a pretty much precise information about the action of Aut(Gn) on 𝖢n. More subtle questions arise if, instead of considering the action of Aut(Gn) on 𝖢n, we want to understand the automorphisms of 𝖢n itself. It is quite remarkable that the CR algorithm, applied to 𝖢n rather than to 𝖧n, can serve as a sharp instrument for tackling this problem (and, in fact, the proof of Theorem 3 is based on an analysis of the output of CR on 𝖢n).

We begin with describing simple types of potential automorphisms of 𝖢n (with the intention of showing that, whp, all automorphism of 𝖢n are actually of this kind). If a vertex x has degree 2 in 𝖢n, then it belongs to a (unique) path from a vertex s of degree at least 3 to a vertex t of degree at least 3 with all intermediate vertices having degree 2. We call such a path in 𝖢n pendant. It is possible that s=t, and in this case we speak of a pendant cycle. Clearly, the reflection of a pendant cycle fixing its unique vertex of degree more than 2 is an automorphism of 𝖢n. Furthermore, call two pendant paths transposable if they have the same length and share the endvertices. Note that 𝖢n has an automorphism transposing such paths (and fixing their endvertices). Let A1 denote the set of the automorphisms provided by pendant cycles, and let A2 be the set of the automorphisms provided by transposable pairs of pendant paths. Moreover, 𝖢n can have a connected component consisting of two vertices of degree 3 and three pendant paths of pairwise different lengths between these vertices. Such a component has a single non-trivial automorphism, which contributes in Aut(𝖢n). The set of such automorphisms of 𝖢n will be denoted by A3.

Recall that an elementary abelian 2-group is a group in which all non-identity elements have order 2 or, equivalently, a group isomorphic to the group (2)k for some k.

Theorem 6.

Let Gn=G(n,p) and assume that p=O(1/n). Let 𝖢n be the core of the complex part of Gn.

  1. 1.

    The order of Aut(𝖢n) is stochastically bounded, i.e., |Aut(𝖢n)|=OP(1).

  2. 2.

    Whp, Aut(𝖢n) is an elementary abelian 2-group. Moreover, A1A2A3 is a minimum generating set of Aut(𝖢n).222Consequently, whp 𝖢n contains neither a triple of pairwise transposable paths, nor two isomorphic components with an automorphism in A3, nor a cyclic component with a single chord between diametrically opposite vertices. Moreover, whp no two pendant cycles in 𝖢n share a vertex.

  3. 3.

    In addition,

    1. (a)

      if pn1+δ for a constant δ>0, then both A1 and A2 are non-empty with non-negligible probability, while A3= whp.

    2. (b)

      If pn=1+o(1) and pn=1+ω(n1/3), then A1 with non-negligible probability, while A2=A3= whp.

    3. (c)

      If pn=1±O(n1/3), then both A1 and A3 are non-empty with non-negligible probability, while A2= whp.

This theorem makes a final step in the study of the automorphisms group of a random graph. Recall that 𝖧n is whp empty when np=1ω(n1/3) and that G(n,p) is connected and asymmetric when np=lnn+ω(1). We, therefore, focus on the intermediate diapason. If np as n, then the core of the giant component of G(n,p) is whp still asymmetric, as proved independently by Łuczak [28] and Linial and Mosheiff [27]. Moreover, Łuczak described the automorphisms group of the core of the giant component of G(n,p) when np>γ for a large enough constant γ, and obtained an analogue of Theorem 6 for this case; see [28, Theorem 4]. Our Theorem 6 not only extends [28, Theorem 4] to the full range of p=O(1/n) but also refines this result even for np>γ by showing that Aut(𝖢n) is actually an elementary 2-group. Another interesting observation is that some automorphisms of the core do not extend to automorphisms of the entire G(n,p). Indeed, if np=1+o(1), then whp Aut(G(n,p)) acts trivially on the core; see Remark 4.

Related work.

As we already mentioned, Theorem 1 combined with the previous results on canonical labeling of G(n,p) for 1/np1/2 implies the existence of a polynomial-time canonical labeling algorithm succeeding on G(n,p) whp for an arbitrary edge probability function p=p(n). In this form, this result has been independently obtained by Michael Anastos, Matthew Kwan, and Benjamin Moore [2]. Another result in their paper describes the action of Aut(G(n,p)) on the core of G(n,p), which follows also from our Theorem 3 and the results of Łuczak [28] and Linial and Mosheiff [27]. The techniques used in [2] and in our paper are completely different, though both proofs rely on color refinement. The two approaches have their own advantages. The method developed in [2] is used there also to show that color refinement is helpful for canonical labeling of the random graph when p1/n and to study the smoothed complexity of graph isomorphism. Our method allows obtaining precise results on the automorphism group of the core (Theorem 6).

Immerman and Lander [23] showed a tight connection between CR-identifiability and definability of a graph in first-order logic with counting quantifiers. Corollary 5 can, therefore, be recast in logical terms as follows. If p=O(1/n), then 𝖧n is whp definable in this logic with using only two first-order variables (where the definability of a graph H means the existence of a formula which is true on H and false on any graph non-isomorphic to H). Definability of the giant component of G(n,p) in the standard first-order logic (without counting quantifiers) was studied by Bohman et al. [11].

The rest of the paper and proof strategy.

Section 2 begins with formal description of the color refinement algorithm in Subsection 2.1 and then, in Subsection 2.2, presents a useful criterion of CR-distinguishability in terms of universal covers. The concept of a universal cover appeared in algebraic and topological graph theory [10, 15, 31], and its tight connection to CR was observed in [3]. Subsection 2.3 pays special attention to the CR-identifiability of unicyclic graphs, which in Subsection 2.4 allows us to obtain an explicit criterion of CR-identifiability for general graphs in terms of the complex and the simple part of a graph. Finally, in Subsection 2.5 we use the relationship between CR and universal covers to prove useful properties of the CR-coloring of the core of an arbitrary graph.

Theorem 1 and Corollaries 2 and 5 are proved in Section 3. The proofs of Theorem 1 and Corollary 5 are based on Theorem 3. A crucial ingredient of the proof of Theorem 3 is our Main Lemma (Lemma 20). This lemma says that CR is unable to distinguishe between two vertices in the core only if they lie either on pendant paths (with the same endvertices) transposable by an automorphism of the graph or on a pendant cycle admitting a reflection by an automorphism. Note that this statement alone, which is a part of the information provided by Theorem 3, is enough to derive Theorem 1 and Corollary 5.

The proof of Main Lemma is outlined in Section 4. It heavily relies on the notion of a kernel. The kernel K(G) of a graph G is a multigraph obtained from core(G) by contracting all pendant paths. That is, K(G) is obtained via the following iterative procedure: at every step if there exists a vertex u with only two neighbors v1,v2, we remove u with both incident edges and add the edge {v1,v2} instead. Note that this transformation can lead to appearance of multiple edges and loops.

To prove that CR colors vertices of the core in the described manner, we use the contiguous models due to Ding, Lubetzky, and Peres [18] in strictly supercritical regime and due to Ding, Kim, Lubetzky, and Peres in critical regime [17]. They allow to transfer properties of random multigraphs with given degree sequences to the kernel of the giant component in the random graph. Another important ingredient in our proofs is the fact that in the kernel of the supercritical random graph there are whp no small complex subgraphs. We consider separately two types of vertices: first, we prove that CR colors differently all vertices such that their neighborhoods induce trees. This is done in Sections 4.1 and 4.2 for p=1+ω(n1/3) and p=1+O(n1/3) respectively. Then, in Section 4.3, we prove that these vertices are helpful to distinguish between all the remaining vertices.

A complete proof of Theorem 3 and the proof of Theorem 6 are omitted due to the space constraints and can be found in the full version of the paper [33].

2 Color refinement: From identifiability to canonical labeling

2.1 Description of the CR algorithm

We now give a formal description of the color refinement algorithm (CR for short). CR operates on vertex-colored graphs but applies also to uncolored graphs by assuming that their vertices are colored uniformly. An input to the algorithm consists either of a single graph or a pair of graphs. Consider the former case first. For an input graph G with initial coloring C0, CR iteratively computes new colorings

Ci(x)=(Ci1(x),{{Ci1(y)}}yN(x)), (1)

where {{}} denotes a multiset and N(x) is the neighborhood of a vertex x. Denote the partition of V(G) into the color classes of Ci by 𝒫i. Note that each subsequent partition 𝒫i+1 is either finer than or equal to 𝒫i. If 𝒫i+1=𝒫i, then 𝒫j=𝒫i for all ji. Suppose that the color partition stabilizes in the t-th round, that is, t is the minimum number such that 𝒫t=𝒫t1. CR terminates at this point and outputs the coloring C=Ct. Note that if the colors are computed exactly as defined by (1), they will require exponentially long color names. To prevent this, the algorithm renames the colors after each refinement step, using the same set of no more than n color names.

If an input consists of two graphs G and H, then it is convenient to assume that their vertex sets V(G) and V(H) are disjoint. The vertex colorings of G and H define an initial coloring C0 of the union V(G)V(H), which is iteratively refined according to (1). The color partition 𝒫i is defined exactly as above but now on the whole set V(G)V(H). As soon as the color partition of V(G)V(H) stabilizes, CR terminates and outputs the current coloring C=Ct of V(G)V(H). The color names are renamed for both graphs synchronously.

We say that CR distinguishes G and H if {{C(x)}}xV(G){{C(x)}}xV(H). If CR fails to distinguish G and H, then we call these graphs CR-equivalent and write GCRH. A graph G is called CR-identifiable if GCRH always implies GH.

2.2 Covering maps and universal covers

A surjective homomorphism from a graph K onto a graph G is a covering map if its restriction to the neighborhood of each vertex in K is bijective. We suppose that G is a finite graph, while K can be an infinite graph. If there is a covering map from K to G (in other terms, K covers G), then K is called a covering graph of G. Let G be connected. We say that a graph U is a universal cover of a graph G if U covers every connected covering graph of G. A universal cover U=UG of G is unique up to isomorphism. Alternatively, UG can be defined as a tree that covers G. If G is itself a tree, then UGG; otherwise the tree UG is infinite.

A straightforward inductive argument shows that a covering map α preserves the coloring produced by CR, that is, Ci(u)=Ci(α(u)) for all i, where Ci is defined by (1). It follows that, if two connected graphs G and H have a common universal cover, i.e., UGUH, then {C(u):uV(G)}={C(v):vV(H)}. The converse implication is also true, as a consequence of the following lemma.

Lemma 7 (cf. Lemmas 2.3 and 2.4 in [26]).

Let UG and UH be universal covers of connected graphs G and H respectively. Furthermore, let α be a covering map from UG to G and β be a covering map from UH to H. For a vertex x of UG and a vertex y of UH, let UxG and UyH be the rooted versions of UG and UH with roots at x and y respectively. Then UxGUyH (isomorphism of rooted trees) if and only if C(α(x))=C(β(y)).

The union of CR-identifiable graphs does not need be CR-identifiable. However, the concept of a universal cover allows us to state the following criterion, which is an extension of [4, Thm. 5.4] (see [4, p. 649] for details).

Lemma 8.

Let G1,,Gk be connected CR-identifiable graphs and G be their vertex-disjoint union. Then G is CR-identifiable if and only if, for every pair of distinct i and j such that neither Gi nor Gj is a tree, the universal covers of Gi and Gj are non-isomorphic.

2.3 Unicyclic graphs

2.3.1 Universal covers of unicyclic graphs

For a unicyclic graph G, its core(G) is the set of vertices lying on the unique cycle of G. We use the notation c(G)=|core(G)| for the length of this cycle. For a vertex x in core(G), let Gx denote the subgraph of G induced by the vertices reachable from x along a path avoiding the other vertices in core(G). This is obviously a tree. Moreover, we define Gx as a rooted tree with root at x. Let t(x) denote the isomorphism class of the rooted tree Gx. We treat t as a coloring of core(G) and write R(G) to denote the cycle of G endowed with this coloring. Thus, R(G) is defined as a vertex-colored cycle graph. It will also be useful to see R(G) as a circular word over the alphabet {t(x):xcore(G)}; see, e.g., [22] and the references therein for more details on this concept in combinatorics on words. In fact, R(G) is associated with two circular words, depending on one of the two directions in which we go along R(G). However, the choice of one of the two words is immaterial in what follows.

Speaking about a word, we mean a standard, non-circular word. Two words are conjugated if they are obtainable from one another by cyclic shifts. A circular word is formally defined as the conjugacy class of a word. A word u is a period of a word v if v=uk for some k1. A word u is a period of a circular word w if u is a period of some representative in the conjugacy class of w. Note that if u is a period of a word v, then any conjugate of u is a period of some conjugate of v. This allows us to consider periods of circular words themselves being circular words. We define the periodicity p(w) of a circular word w to be the minimum length of a period of w. It may be useful to keep in mind that a period of length p(w) is also a period of every period of w; cf. [22, Proposition 1] (note that our terminology is different from [22]).

For a unicyclic graph G, we define its periodicity by p(G)=p(R(G)), where R(G) is seen as a circular word as explained above. Note that p(G) is a divisor of c(G) and that 1|{t(x)}xcore(G)|p(G)c(G).

Like trees, unicyclic graphs are also characterizable in terms of universal covers.

Claim 9.

A connected graph G is unicyclic if and only if UG has a unique infinite path.

The unique infinite path subgraph of UG will be denoted by P(UG). The structure of UG is clear: The cycle of G is unfolded into the infinite path P(UG). Moreover, let α be a covering map from UG to G. Then UG is obtained by planting a copy of the rooted tree Gα(x) at each vertex x on P(UG). The path P(UG) will be considered being a vertex colored graph, with each vertex x colored by t(α(x)).

The following observation is quite useful in what follows. Let α be a covering map from UG to G. The restriction of α to P(UG) is a covering map from the vertex-colored path P(UG) to the vertex-colored cycle R(G). Note that a covering map must preserve vertex colors.

Lemma 10.

Let G and H be connected unicyclic graphs. Then UGUH if and only if the circular words R(G) and R(H) have a common period. Moreover, if UGUH, then p(G)=p(H).

Proof.

In one direction the statement is clear: if R(G) and R(H) have a common period, then UGUH by the definition. Let UUGUH be a common universal cover of G and H. We can naturally see P(U) as an infinite word. An arbitrary subword of length p(G) of P(U) is a period of P(U), and the same is true for an arbitrary subword of length p(H) of P(U). It follows that P(U) has a period u=u1uq of length q=gcd(p(G),p(H)). Indeed, it is sufficient to note that there exist integers β1,β2 such that q=β1p(G)β2p(H). Thus, for any u0,uq at distance q in P(U), we get that u0=uβ1p(G)=uq+β2p(H)=uq.

If α and β are covering maps from U to G and H respectively, then α(u1)α(uq) is a period of R(G) and β(u1)β(uq) is a period of R(H). Since α and β preserve the vertex colors, we have the equality α(u1)α(uq)=β(u1)β(uq). This also implies that, in fact, q=p(G)=p(H).

2.3.2 CR-identifiability of unicyclic graphs

Claim 11.

Let G be a connected unicyclic graph. Suppose that GCRH and H consists of connected components H1,,Hm. Then

  1. 1.

    UHiUG for all i,

  2. 2.

    every Hi is unicyclic, and

  3. 3.

    c(G)=c(H1)++c(Hm).

Proof.

1. Fix i[m]. Let UG and W=UHi and αU,αW be the respective covering maps. Let y be a vertex in UHi. Since G and H are CR-equivalent, UG must contain a vertex x such that C(αU(x))=C(αW(y)). By Lemma 7, UxWy. It follows that UW.

2. Immediately by Part 1 and Claim 9.

3. Fix a period of R(G) and set T=x|V(Gx)| where the summation goes over all x in this period (this definition obviously does not depend on the choice of the period). Let p=p(G). Note that |V(G)|=c(G)pT. By Part 1 and Lemma 10, we similarly have |V(Hi)|=c(Hi)pT. The required equality now follows from the trivial equality |V(G)|=|V(H1)|++|V(Hm)|.

Lemma 12.

A connected unicyclic graph G is CR-identifiable if and only if one of the following conditions is true:

  • p(G)=1 and c(G){3,4,5},

  • p(G)=2 and c(G){4,6},

  • p(G)=c(G).

Proof.

( ) Suppose that a connected unicyclic graph G satisfies one of the three conditions and show that it is CR-identifiable. Assuming that H is CR-equivalent to G, we have to check that G and H are actually isomorphic.

Assume first that H is connected. If H is not unicyclic, then G and H have equal number of vertices but different number of edges. This implies that G and H have different degree sequences, contradicting the assumptions that GCRH. Therefore, H must be unicyclic. By Part 3 of Claim 11, we have c(G)=c(H). Along with Lemma 10, which is applicable because UGUH whenever GCRH, this implies that R(G)R(H). The last relation, in its turn, implies that GH.

Assume now that H is disconnected. Let H1,,Hm be the connected components of H. Combining Claim 11 and Lemma 10, we see that p(G)=p(H1)c(H1)<c(G). It follows that G satisfies one of the first two conditions. The restrictions on c(G), however, rule out the equality in Part 3 of Claim 11. In particular, if c(G)=6, then the only possible case is m=2 and c(H1)=c(H2)=3. However, it contradicts the equality p(H1)=p(H2)=p(G)=2. Thus, the case of disconnected H is actually impossible, that is, all such H are distinguishable from G by CR.
( ) Suppose that all three conditions are false. That is, either p(G)=1 and c(G)6, or p(G)=2 and c(G)8 (note that c(G) is even in this case), or 3p(G)<c(G) (in the last case, p(G) is a proper divisor of c(G)). In each case, R(G) is CR-equivalent to a disjoint union of two shorter vertex-colored cycles R1 and R2, both sharing the same period of length p(G) with R(G). Taking the connected unicyclic graphs H1 and H2 such that R(H1)R1 and R(H2)R2, we see that G is CR-equivalent to the disjoint union of H1 and H2 and is, therefore, not CR-identifiable.

Lemma 13.

Let G and H be connected unicyclic graphs with c(H)c(G). Assume that both G and H are CR-identifiable. Then UGUH if and only if {t(x):xcore(G)}={t(x):xcore(H)} and one of the following conditions is true:

  • GH,

  • p(G)=p(H)=1 and 3c(H)<c(G)5,

  • p(G)=p(H)=2 and c(H)=4 while c(G)=6.

Proof.

( ) By Lemma 10.
( ) Let G and H be CR-identifiable connected unicyclic graphs with c(H)c(G). Assume that UGUH. The equality {t(x):xcore(G)}={t(x):xcore(H)} immediately follows from Lemma 10. By the same corollary, p(G)=p(H)=p. If p3, then Lemma 12 yields the equality c(G)=p(G)=p(H)=c(H), which readily implies GH by using Lemma 10 once again. If p2, then either c(G)=c(H) and GH or c(H)<c(G) and then, by Lemma 12, c(H) and c(G) are as claimed.

2.4 A general criterion of CR-identifiability

Deciding whether a given graph is CR-identifiable is an efficiently solvable problem [4, 25]. For our purposes, it is beneficial to have a more explicit description of CR-identifiable graphs in terms of the complex and the simple part of a graph. We now derive such a description from the facts obtained for unicyclic graphs in the preceding subsection.

Theorem 14.
  1. 1.

    A graph G is CR-identifiable if and only if both the complex and the simple parts of G are CR-identifiable.

  2. 2.

    The simple part of G is CR-identifiable if and only if both of the following two conditions are true:

    1. (a)

      every unicyclic component of G is CR-identifiable, i.e., is as described in Lemma 12;

    2. (b)

      every two unicyclic components of G have non-isomorphic universal covers, i.e., there is no pair of connected components as described in Lemma 13.

Proof.
  1. 1.

    If G is CR-identifiable, then its complex and simple parts are both CR-identifiable as a consequence a more general fact: The vertex-disjoint union of any set of connected components of G is CR-identifiable. This fact is easy to see directly, and it also immediately follows from Lemma 8.

    In the other direction, assuming that the complex and the simple parts of G are CR-identifiable, we have to conclude that G is CR-identifiable. Lemma 8 reduces our task to verification that if H is a complex connected component of G and S is a simple connected component of G (a tree or a unicyclic graph), then the universal covers of H and S are non-isomorphic. The last condition follows from the fact that the universal cover of a tree is the tree itself and from Claim 9.

  2. 2.

    The second part of the theorem follows immediately from Lemma 8 due to the well-known fact [23] that every tree is CR-identifiable.

2.5 Coloring the cores of general graphs

We conclude this section by collecting useful general facts about the CR-colors of vertices in the core of a graph. Let G be an arbitrary graph. If x is a vertex in core(G), then in G we have a tree growing from the root x that shares with core(G) only the vertex x. We denote this rooted tree by Tx.

Claim 15.

Let G and H be graphs. Let x be a vertex in core(G) and y be a vertex in core(H). If Tx≇Ty, then C(x)C(y).

Proof.

Clearly, it suffices to prove this for connected G and H. The condition Tx≇Ty readily implies that UxG≇UyH, and the claim follows from Lemma 7.

Claim 16.

Let G and H be two graphs (it is not excluded that G=H). For vertices uV(G) and vV(H) assume that C(u)=C(v). Then ucore(G) if and only if vcore(H).

Proof.

Assume that G and H are connected (the general case will easily follow). Let αG be a covering map from UG to G, and αH be a covering map from UH to H. Consider xV(UG) and yV(UH) such that αG(x)=u and αH(y)=v. Note that ucore(G) if and only if there is a cycle in UG containing x, and the same is true about v any y. This proves the claim because UxGUyH by Lemma 7.

In our proofs, we will deal with cores that locally have a tree structure, that is, the balls of sufficiently large radii around most of its vertices induce trees. In this case, CR distinguishes vertices that have non-isomorphic neighborhoods.

Claim 17.

Let Br(v) denote the set of vertices at distance at most r from a vertex v. Let v1,v2V(G). If, for some r, the r-neighborhoods Br(v1) and Br(v1) induce non-isomorphic trees rooted in v1 and v2 respectively, then C(v1)C(v2).

Proof.

This is a direct consequence of Lemma 7.

Throughout the paper, we identify the vertex set of the kernel of G with the set of vertices of core(G) having degrees at least 3 in the core. We now state another consequence of Lemma 7.

Claim 18.

Let G be a graph with minimum degree at least 2 and let K be its kernel. Let r be a positive integer. For vV(K), let rK(v) be the subgraph of K induced by the set of vertices at distance at most r from v in K. Let r(v)G be the subdivided version of rK(v). Let v1,v2 be vertices of K such that, for some r, graphs r(v1),r(v2)G are non-isomorphic trees rooted in v1,v2. Then CG(v1)CG(v2).

Finally, we need the fact that the partition produced by CR on a graph refines the partition produced by CR on its core.

Claim 19.

Let u and v be vertices in core(G). Let C and C be the colorings produced by CR run on G and core(G) respectively. If C(u)C(v), then also C(u)C(v).

Proof.

Clearly, it is enough to prove the claim for a connected graph G. Let us assume towards a contradiction that C(u)=C(v). Let α be a covering map from UG to G. Let x,yV(UG) be such that α(x)=u and α(y)=v. Due to Lemma 7, UxGUyG. Therefore, Uxcore(G)=core(UxG)core(UyH)=Uycore(H). But then, again by Lemma 7, C(u)=C(v), a contradiction.

3 Proofs of main results

3.1 Derivation of Corollary 5 from Theorem 3

Part 1.

Any isomorphism of graphs obviously respects their cores; cf. Claim 16. Note that the CR-color of any vertex x in the core 𝖢n contains a complete information about the isomorphism type of the rooted tree Tx “growing” from this vertex (cf. Claim 15). This has the following consequence. Let 𝖢n denote the colored version of 𝖢n where each vertex x is colored by the isomorphism type of Tx. Then 𝖧n is CR-identifiable if and only if 𝖢n is CR-identifiable. In order to show that 𝖢n is CR-identifiable it suffices to show that 𝖢n is reconstructible up to isomorphism from the multiset of the vertex colors produced by CR on input 𝖢n. The CR-color partition of 𝖢n is equal to the restriction of the CR-color partition of 𝖧n to 𝖢n (recall Claim 16). Theorem 3, therefore, provides us with the following information (whp):333Note that Conditions (b) and (c) are provided by Main Lemma. Condition (a) is not essential for the argument in Subsections 3.1 and 3.2, which easily extends to the case of more than two mutually transposable pendant paths.

  1. (a)

    every CR-color class of 𝖢n has size either 1 or 2,

  2. (b)

    every two equally colored vertices have degree 2,

  3. (c)

    every two equally colored vertices are transposable by an automorphism of 𝖢n.

Moreover, our Main Lemma (Lemma 20) ensures that 𝖧n whp has no involutory automorphism of type A3 described in Section 1. Along with this fact, the above conditions readily imply that the color classes of size 2 occur either “along” a pair of transposable pendant paths between two vertices of degree at least 3 or correspond to the reflectional symmetry of a pendant cycle. Here we use the notions introduced in Section 1 in the context of Aut(𝖢n), which should now be refined by taking into account the coloring of 𝖢n.

If {u} and {v} are two color classes of size 1, then the colors C(u) and C(v) yield the information on whether the vertices u and v are adjacent or not. For color classes {u} and {v,v}, note that u and v are adjacent if and only if u and v are adjacent. This adjacency pattern is as well reconstructible from the colors C(u) and C(v)=C(v). If {u,u} and {v,v} are two color classes of size 2, then they span either a complete or empty bipartite graph or a matching (for example, u is adjacent to v, u is adjacent to v, and there is no other edges between these color classes). Each of these three possible adjacency patterns is reconstructible from the colors C(u)=C(u) and C(v)=C(v). A crucial observation, completing the proof, is that all ways to put a matching between {u,u} and {v,v} lead to isomorphic graphs.

Part 2.

This follows from part 1 by part 1 of Theorem 14.

3.2 Derivation of Theorem 1 from Theorem 3

Before proceeding to the proof, we remark that when we say that a canonical labeling algorithm succeeds on a random graph Gn, we mean that the algorithm works correctly on a certain efficiently recognizable (closed under isomorphisms) class of graphs 𝒞 such that Gn belongs to 𝒞 whp. Though not explicitly stated in the argument below, it will be clear that, in our case, 𝒞 is the class of all graphs satisfying the conditions of Theorem 3. Note that these conditions are easy to check after running CR on a graph.

First of all, we distinguish the complex and the simple parts of Gn and compute a canonical labeling of the simple part separately. This is doable in linear time. It remains to handle the complex part 𝖧n.

It is enough to compute a suitable injective coloring of 𝖧n and subsequently to rename the colors in their lexicographic order by using the labels that were not used for the simple part. To this end, we run CR on 𝖧n. This takes time O(nlogn) as CR can be implemented [9] in time O((n+m)logn), where m denotes the number of edges (which is linear for the sparse random graph under consideration). Then we begin with coloring the vertices of the core 𝖢n. Theorem 3 along with Claim 16 ensures that the vertices of degree at least 3 already received individual colors. The duplex colors occur along transposable pendant paths and pendant cycle (like in Section 3.1, these notions are understood with respect to 𝖧n rather than to 𝖢n alone). To make such vertex colors unique, we keep the original colors along one of two transposable paths and concatenate their counterparts in the other path with a special symbol. We proceed similarly with symmetric pendant cycles. In this way, every vertex x in the core 𝖢n receives an individual color (x). In the last phase, we compute a canonical labeling for each tree part Tx of 𝖧n, regarding Tx as a tree rooted at x. This coloring is not injective yet because some Tx and Ty can be isomorphic. This is rectified by concatenating all vertex colors in Tx with (x).

3.3 Proof of Corollary 2

As well known, if 1/np1/2 then the core of the giant component of G(n,p) coincides with the core of the entire graph. Due to the classical linear-time algorithms for canonical labeling of trees, this observation reduces canonical labeling of G(n,p) with 1/np1/2 to canonical labeling of its core.

Linial and Mosheiff [27] suggested an algorithm 𝖠1 that, for any p with 1np(n)<n2/3, whp labels canonically G(n,p) in time O(n4) by distinguishing between all vertices of the core. In [16], it was proved that, if ln4nnp12, then CR whp distinguishes between all vertices of the entire G(n,p). Finally, our Theorem 1 provides an algorithm 𝖠2 that, for any p=O(1/n), whp labels canonically G(n,p) in time O(nlnn). Now, consider the following algorithm 𝖠:

  1. 1.

    Run CR. If it colors differently all vertices, then halt and output the canonical labeling produced by CR.

  2. 2.

    If the algorithm does not halt in Step 1, then run 𝖠1. If it succeeds (i.e., colors differently all vertices in the core of the input graph), then halt and output the labeling produced by 𝖠1.

  3. 3.

    If the algorithm does not halt in Steps 1 and 2, then run 𝖠2 and output the labeling it produces (or give up if 𝖠2 fails).

Let us show that the algorithm 𝖠 succeeds whp for any p with p(n)1/2. Assume, to the contrary, that there exist a constant ε>0 and a sequence (nk)k such that

(𝖠 fails on G(nk,p(nk)))>ε

for all k. If there is a subsequence (nki)i and a constant C>0 such that p(nki)<C/nki for all i, then we get a contradiction with the performance of the algorithm 𝖠2. Therefore, p(nk)1nk. If there is a subsequence (nki)i such that p(nki)<nki2/3 for all i, then we get a contradiction with the performance of the algorithm 𝖠1. It follows that p(nk)nk2/3 for all k. This, however, contradicts the result of [16] that CR in this regime produces a discrete coloring of G(n,p) whp.

In order to obtain canonical labeling, whp, for all p with p(n)[0,1], we run the algorithm 𝖠 on input G and if it fails, then we run 𝖠 once again on the complement of G.

4 CR-coloring of the random graph

In this section, we state and prove our Main Lemma that describes the output of CR on the random graph. Given a graph G, we call vertices u and v in core(G) interchangeable, if

  • they both have degree 2 in core(G),

  • u and v belong to a cycle Fcore(G) with the following property: there exists a vertex w on the cycle such that w has degree at least 3 in core(G), dF(u,w)=dF(v,w), and all the other vertices on the cycle, but the vertex opposite to w when |V(F)| is even, have degree 2 in core(G). In other words, u and v either belong to a pendant cycle or to two transposable pendant paths, and the respective transposition replaces u and v.

Lemma 20 (Main Lemma).

Let γ>1 be a constant, pnγ, and Gn=G(n,p). Let Hn be the union of complex components in Gn, and Cn be its core. If CR is run on Hn, then whp any pair of vertices in Cn receiving the same color is interchangeable. Under the condition pn=1+ω(n1/3), this is true also if CR is run on Cn.

The proof of Main Lemma is given in Sections 4.14.3. We consider separately large p (supercritical phase) and small p (critical phase). In both cases, we specify good sets of vertices and show that all vertices from good sets are distinguished by CR. This is done in Section 4.1 for large p and Section 4.2 for small p. Finally, in Section 4.3 we complete the proof: we show that distinguishing between good vertices in the core is sufficient to distinguish between all pairs in the core that are not interchangeable.

For a graph G, dG(u,v) is the shortest-path distance between u and v in G. Sometimes, when the graph is clear from the context, we omit the subscript G. For a vertex v and a real number r, we denote by rG(v) the ball of radius r around v in G, i.e., the graph induced on the set of all vertices at distance at most r from v in G. For a non-negative integer r, we denote by 𝒮rG(v)rG(v) the sphere of radius r around v in G, i.e., the graph induced on the set of all vertices at distance exactly r from v in G.

For a connected graph G, its excess is the difference between the number of edges and the number of vertices. In particular, a tree has excess 1. We call -complex a connected graph with excess . The total excess of a graph without unicyclic components is the sum of excesses of all its components.

4.1 Distinguishing good vertices in the core in the supercritical and strictly supercritical phases

In this subsection, we let p=p(n) be such that γnp=1+ω(n1/3) for some constant γ>1. Denote δn:=np1. We denote the kernel and the core of the giant component of GnG(n,p) by Kn and Cn. Let Ccore be the coloring produced by CR on Cn.

We assign to every edge e of Cn the weight 1/, where 1 is the number of vertices that subdivide the edge of the kernel e belongs to. The weight of a path is the sum of weights of its edges. For u,vV(Cn), let df(u,v) be the fractional distance between u and v, i.e. the minimum weight of a path between u and v. We denote the respective metric space by n.

Fix a positive real s. Let Ds be the set of all vV(Cn) such that the ball around v in n of radius s induces an acyclic graph. For every vertex vDs and integer r<s, let 𝒫r(v) be the multiset of lengths of edge-disjoint paths from Cn that are produced by subdividing edges {x,y}E(Kn), where df(x,v)r while df(y,v)>r. We will need the following facts.

Claim 21.

Let s0.6(ln(δn3n))2/3. Whp for any two different u,vDsV(Kn), there exists an integer r0.5(ln(δn3n))2/3 such that the multisets 𝒫r(u) and 𝒫r(v) are different.

Proof.

We fix s0.6(ln(δn3n))2/3 and let D:=Ds. We prove this claim in the contiguous models G~n, defined in [17, Thm. 2] and [18, Thm. 1] and then use these theorems to conclude that it also holds in Gn. So, in what follows, K~n=K(G~n), C~n=C(G~n), and D~=D(K~n).

Let us expose K~n and let u,vD~V(K~n). As proved in [17, 18], whp N=|V(K~n)|=Θ(δn3n). Assume first that the distance between u and v is at most 0.4(ln(δn3n))2/3 in K~n. Let P be the shortest path between u and v – it is unique due to the definition of D~. Let v be a neighbor of v in K~n that does not belong to P. Then, by the definition of D~, |𝒮rK~n(v)rK~n(v)|2r for all r[0.4(ln(δn3n))2/3,0.5(ln(δn3n))2/3]. Since u,vD~, we have that sK~n(u) and sK~n(v) are trees. It immediately implies, that for every such r, |𝒮r+1K~n(v)r+1K~n(u)|2r.

We then generate subdivisions of the edges of the kernel from the definition of G~n in the following order: for every r=0.4(ln(δn3n))2/3,,0.5(ln(δn3n))2/3, we, first, subdivide all edges growing from r+1K~n(u) outside of the ball, and then all edges growing from 𝒮r+1K~n(v) outside of r+1K~n(v). Notice that all sets 𝒮r+1K~n(v) are disjoint for different r. For every r, as soon as the edges that correspond to the vertex u are subdivided, the event that 𝒫r+1(u)=𝒫r+1(v) immediately implies that the random multiset of lengths of paths from C~n, that are produced by subdividing edges from K~n that grow from r+1K~n(v) outside, should be equal to a predefined value. This multiset has size at least 2r. Since the geometric random variables considered in [17, 18] do not have atoms with probability measure 1o(1), the latter event has probability at most 2Θ(r) due to the de Moivre–Laplace local limit theorem. Eventually,

(𝒫r+1(u)=𝒫r+1(v) for all r[0.4(ln(δn3n))2/3,0.5(ln(δn3n))2/3])exp(Θ((log(δn3n))4/3)).

Assume now that the distance between u and v is bigger than 0.4(ln(δn3n))2/3 in K~n. Then, by the definition of D~, sets 0.2(ln(δn3n))2/3K~n(v) and 0.2(ln(δn3n))2/3K~n are disjoint and sets 𝒮rK~n(v) have size at least 2r for all r[0.15(ln(δn3n))2/3,0.2(ln(δn3n))2/31]. As above, we get that 𝒫r(u)=𝒫r(v) for all r[0.15(ln(δn3n))2/3,0.2(ln(δn3n))2/31] with probability at most exp(Θ((log(δn3n))4/3)).

The union bound over all pairs u,vD~ along with [17, Thm. 2] and [18, Thm. 1] completes the proof.

Claim 22.

Let s:=(ln(δn3n))2/3 and D=Ds. Whp, Ccore(u)Ccore(v) for any distinct u,vD.

Proof.

Assume that the assertion of Claim 21 holds for s=s and s=s1 deterministically. Let u,vDV(Kn). Let Bu and Bv be the subdivided versions of sKn(u) and sKn(v) in Cn. Since BuBv due to the conclusion of Claim 21, we get that Ccore(u)Ccore(v) due to Claim 18. It remains to consider the case vDV(Kn) and vuD. Assume towards contradiction that Ccore(u)=Ccore(v). Then, both u and v have degree 2 in Cn. In particular, uV(Kn). Consider the edges eu,ev of Kn that u and v subdivide. Let Pu,Pv be the subdivided versions of eu,ev. Due to the assertion of Claim 21 applied to s=s1, we get that all vertices of Kn from euev have different colors. On the other hand, by the definition of CR, the neighbors of u should have exactly the same color as the neighbors of v. Thus, by induction, we get that the entire paths Pu,Pv are colored identically. It may only happen if the endpoints of Pu coincide with the endpoints of Pv. By the definition of D, it means that Pu=Pv=:P. Since the endpoints of P are colored in different colors, it can be easily shown by induction that all vertices in P are also colored in different colors. Thus, u=v, yielding a contradiction.

4.2 Distinguishing good vertices in the core in the critical regime

Let A be a large positive number. Let 1n1/3lnnpn=1+o(1). Whp any complex component in GnG(n,p) has size at least 100Alnn due to the following well-known fact.

Claim 23.

Let γ>1, npγ, and GnG(n,p). There exists ε=ε(γ) such that ε as γ1 and whp any connected subgraph of Gn of size at most εlnn is not complex.

Let us say that a path u1uk extends the path v1vk if, for some i{2,,k} the sets {v1,,vi1} and {uki+2,,uk} are disjoint and u1=vi,,uki+1=vk. For convenience, we assume that this notion is closed under rotations of paths, i.e. if u1uk extends v1vk, then it also extends vkv1 and we also say that uku1 extends both v1vk and vkv1 in this case. We call two paths v1vk and u1uk weakly disjoint, if they are either vertex-disjoint or one paths extends the other one.

Claim 24.

Whp in Gn there are no two weakly disjoint paths v1vk and u1uk of length k=Alnn such that, for every i{2,,k1}, vi has degree 2 if and only if ui has degree 2.

Proof.

Due to Claim 23, whp in Gn there are no complex subgraphs with at most 2k vertices.

For a path P=v1vk in Gn, let us consider a binary word w(P)=(w2,,wk1) defined as follows: wi=1 if and only if vi has degree 2 in Gn. Notice that, if a path u1uk extends the path v1vk so that u1=vi,,uki+1=vk and w(v1vk)=w(u1uk), then w(u1uk) is periodic and defined by w(v1vi+1)=(w2,,wi).

Let X be the number of pairs of paths as in the statement of the claim and such that there are at most 2 edges between the paths (we are allowed to assume this since there are no complex subgraphs of size at most 2k). Fix two weakly disjoint path 𝐯=v1vk and 𝐮=u1uk and assume without loss of generality that either 𝐮 extends 𝐯, or they are disjoint. Let i be such that u1=vi. If there is no such i, i.e. the paths are disjoint, set i=k+1. Then, expose edges from all vj, ji, and assume that they send at most 2 edges to u2,,uk1, other than the edge {u1,u2}. Then, probability that for every j{2,,k1}, vj has degree 2 if and only if uj has degree 2, is at most

max{(1p)n2k,(1(1p)n)}k4(1e(1+o(1)))k4=o((23)k).

We then get

𝔼Xi=2k+1nk+i1pk+i3(23)kkn2(1+o(1))2k(23)k=o(1),

for an appropriate choice of A. Due to Markov’s inequality, (X1)𝔼X=o(1), completing the proof.

Let D be the set of all vertices v in Cn=core(Gn) that belong to a complex component of Gn and such that Bv:=3AlnnGn(v) is a tree. Let C be the coloring produced by CR on Gn.

Claim 25.

Whp, C(u)C(v) for any u,vD.

Proof.

Assume that the statement of Claim 24 holds deterministically in Gn. Fix two different u,vD. Let us show towards contradiction that trees Bv and Bu are not isomorphic.

Take an arbitrary path v1vk of length k:=1.9Alnn, where v1=v. Since, by assumption, BvBu, there exists a path 𝐮=u1uk such that u1=u and, for every i{2,,k1}, vi has degree 2 if and only if ui has degree 2. In the same way, since vcore(Gn) and Bv is a tree, we may consider a path v1vk that shares only the vertex v1=v=v1 with v1vk. Since BuBv, there should be a path 𝐮=u1uk that shares with u1uk the only vertex u1=u=u1 and such that, for every i{2,,k1}, vi has degree 2 if and only if ui has degree 2. Since Bv is acyclic and since pairs of paths v1vk,u1uk and v1vk,u1uk cannot be disjoint due to Claim 24, u must lie on the path P:=vkv1vk. Moreover, since Bu is acyclic, once 𝐮 or 𝐮 leave P, they never meet with P again. Thus, the path uku1uk is divided by P in at most 3 parts: the first part does not have common vertices with P, the second part is a subpath of P, and the third part does not have common vertices with P again. Let Q be the longest part of the three. Then Q has length 13(2k1)>Alnn. Moreover, since degGnu=degGnv by assumption and uv, there should be a subpath PP such that P and Q are weakly disjoint, and the degrees of internal vertices in P and Q are aligned in the sense that the i-th inner vertex of P have degree 2 if and only if the i-th vertex of Q has degree 2. This is impossible due to Claim 24. Thus, BvBu implying C(u)C(v) due to Claim 17.

4.3 Completing the proof of Main Lemma (Lemma 20)

Due to Claims 22, 25, and 19, it remains to prove the following:

  1. 1.

    If 1+ω(n1/3)=pnγ, then

    • Ccore(u)Ccore(v) for every vV(Cn)D and uD;

    • Ccore(u)Ccore(v) for any non-interchangeable pair u,vV(Cn)D;

  2. 2.

    If 1n1/3lnnpn=1+o(1), then

    • C(u)C(v) for every vV(Cn)D and uD;

    • C(u)C(v) for any non-interchangeable pair u,vV(Cn)D.

We will use the following technical fact, which follows from [17, Thm. 2] and [18, Thm. 1].

Claim 26.

Let δ>0 be a constant, n1/3δn:=pn1δ, and GnG(n,p). Then whp in K(Gn) there are no complex subgraphs of size at most (ln(nδn3))3/4.

For the sake of brevity, below we prove both statements in two different regimes simultaneously. Thus, with some abuse of notation, in the supercritical phase (i.e., 1+ω(n1/3)=pnγ), we let Gn:=Cn and C:=Ccore as we only consider CR on Cn. We also assume that when 1+ω(n1/3)=pnγ, the core is equipped with the fractional distance df, constituting the metric space n. If 1n1/3lnnpn1+o(1), then Gn is equipped with the usual shortest-path distance, that we denote by df as well. We also use the following notation: d=(ln(δn3n))2/3 when we prove the assertion for 1+ω(n1/3)=pnγ and d=3Alnn when we prove it for 1n1/3lnnpn=1+o(1). In what follows, we assume that the assertions of Claims 22, 23, 25, and 26 hold deterministically in Gn.

  1. 1.

    Assume that some vD and uD have C(u)=C(v). We know that v is d-close to a cycle F of length at most 2d. If vV(F), then let v be the closest to v vertex on F that has degree more than 2 in the core. Otherwise, let v=v. Let P be the shortest path from v to F. Let us extend this path by a path P of length 10d beyond v. Due to Claim 23 and Claim 26, it has a subpath ww of length 5d consisting of vertices from D only such that df(w,v)d. We know that all elements of the vector 𝐜:=(C(w),,C(w)) are different. Then, due to our assumption, u must have a vertex z at distance at most df(w,v) such that z is the first vertex of a path zz with C(w)=C(z), , C(w)=C(z).

    We now consider separately two cases: w=z and wz. In the first case, we have that the distance from u to the closest cycle (which is F, the same as for v) is at most

    df(w,u)+df(w,v)+df(v,F)length of F+2(df(w,v)+df(v,F))6d.

    Let P be the shortest path between u and v. Due to Claim 23 and Claim 26, there exists a path uw~w~ of length 5d+1 that does not meet P and consists of vertices from D only. Due to Claim 22 and Claim 25 all elements of the vector 𝐜~:=(C(w~),,C(w~)) are different and no element of 𝐜~ equals to any element of 𝐜. Moreover, by construction, df(v,w~)>df(u,w~). Then, due to our assumption, v must have a neighbor z~w~ such that z~ is the first vertex of a path z~z~ with C(w~)=C(z~), , C(w~)=C(z~). Note that w~z~,,w~z~ due to Claim 23 and Claim 26. Since all vertices in D are distinguished by C(), we conclude that all vertices z~,,z~ must be outside D. Due to Claim 23, Claim 26, and the definition of D, they constitute a (self-avoiding) path and are d-close to a cycle of length at most 2d. Since the path has length 5d, we get a contradiction with Claim 23 or Claim 26.

    We then assume wz. It may only happen when zD. Moreover, all z,,z are not in D. Indeed, otherwise, different paths ww and zz have common vertices. Then the path from z to F that goes through w has length greater than d. However, due to Claim 23 and Claim 26, there are no two different paths from z to F, both of length at most 12d and, also, there is no other cycle F of length at most 2d such that a path from z to F has at most d vertices. This contradicts the fact that zD. Thus, we again get a (self-avoiding) path consisting of vertices z,,z that are d-close to a cycle of length at most 2d. This contradicts Claim 23 or Claim 26 again, since the path has length 5d. We conclude that every vertex uD has C(u) that does not equal to the color of any other vertex in the core.

  2. 2.

    It remains to prove that, for any two distinct u,vD that are not interchangeable, C(u)C(v). Fix two such vertices u and v. We may assume that TuTv since otherwise C(u)C(v) due to Claim 15. Let Fu and Fv be two cycles of length at most 2d that are closest to u and v respectively (both are at distance at most d from the respective vertices). If FuFv, then set F:=Fu. In this case, we let u=u when uV(F) and let u be the closest vertex of degree 3 in F to u otherwise. If Fu=Fv=:F, then, without loss of generality we assume that either u is not in F or both u,v are in F. Let u=u when uV(F) and let u be a vertex of F that has degree at least 3 and such that df(u,u)df(v,u) otherwise. Note that such a vertex exists due to the definition of an interchangeable pair. Consider a path P of length 10d that starts at u and does not meet F. Due to Claim 23 and Claim 26, this path has a vertex w in D such that d(w,u)d. If FuFv, then

    df(v,w)df(Fu,Fv)df(v,Fv)df(w,Fu)>df(u,w)

    due to Claim 23 and Claim 26. Finally, let Fu=Fv. Assume, in addition, uV(F). Then the only possibility for C(u) to be equal to C(v) is to have a path P between v and w of length df(u,w). Let us extend P by a path of length 10d beyond v. Due to Claim 23 and Claim 26 this path has a vertex w from D such that df(w,v)d. But then df(u,w)>df(v,w), implying C(u)C(v). If u,vV(F), then

    df(v,w)=df(v,u)+df(u,w)df(u,u)+df(u,w)=df(u,w).

    In either case, we get df(v,w)df(u,w) or C(u)C(v). Recalling that w has a unique color, we readily conclude that C(u)C(v), completing the proof.

References

  • [1] Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullmann. The Design and Analysis of Computer Algorithms. Addison-Wesley Longman Publishing Co., 1974.
  • [2] Michael Anastos, Matthew Kwan, and Benjamin Moore. Smoothed analysis for graph isomorphism, 2024. arXiv:2410.06095.
  • [3] D. Angluin. Local and global properties in networks of processors. In The 12th Annual ACM Symposium on Theory of Computing, pages 82–93, 1980.
  • [4] Vikraman Arvind, Johannes Köbler, Gaurav Rattan, and Oleg Verbitsky. Graph isomorphism, color refinement, and compactness. Comput. Complex., 26(3):627–685, 2017. doi:10.1007/s00037-016-0147-6.
  • [5] L. Babai, P. Erdős, and S. M. Selkow. Random graph isomorphism. SIAM Journal on Computing, 9(3):628–635, 1980. doi:10.1137/0209047.
  • [6] László Babai. Moderately exponential bound for Graph Isomorphism. In Proc. of the 3rd Int. Conf. on Fundamentals of Computation Theory (FCT’81), volume 117 of Lecture Notes in Computer Science, pages 34–50. Springer, 1981. doi:10.1007/3-540-10854-8_4.
  • [7] László Babai. Graph isomorphism in quasipolynomial time. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing (STOC’16), pages 684–697, 2016. doi:10.1145/2897518.2897542.
  • [8] László Babai. Canonical form for graphs in quasipolynomial time: preliminary report. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing (STOC’19), pages 1237–1246, 2019. doi:10.1145/3313276.3316356.
  • [9] Cristoph Berkholz, Paul Bonsma, and Martin Grohe. Tight lower and upper bounds for the complexity of canonical colour refinement. Theory of Computing Systems, 60:581–614, 2017. doi:10.1007/S00224-016-9686-0.
  • [10] N. Biggs. Algebraic graph theory. Cambridge University Press, 2nd edition, 1994.
  • [11] Tom Bohman, Alan M. Frieze, Tomasz Luczak, Oleg Pikhurko, Clifford D. Smyth, Joel Spencer, and Oleg Verbitsky. First-order definability of trees and sparse random graphs. Comb. Probab. Comput., 16(3):375–400, 2007. doi:10.1017/S0963548306008376.
  • [12] Bela Bollobás. Distinguishing vertices of random graphs. Ann. Discrete Math., 13:33–50, 1982.
  • [13] Bela Bollobás. The evolution of random graphs. Transactions of the American Mathematical Society, 286(1):257–274, 1984.
  • [14] Bela Bollobás. Random graphs. Cambridge University Press, 2001.
  • [15] D. M. Cvetković, M. Doob, and H. Sachs. Spectra of graphs. Theory and applications. Leipzig: J. A. Barth Verlag, 3rd edition, 1995.
  • [16] Tomek Czajka and Gopal Pandurangan. Improved random graph isomorphism. Journal of Discrete Algorithms, 6:85–92, 2008. doi:10.1016/J.JDA.2007.01.002.
  • [17] Jian Ding, Jeong Han Kim, Eyal Lubetzky, and Yuval Peres. Anatomy of young giant component in the random graph. Random Structures & Algorithms, 39(2):139–178, 2011. doi:10.1002/RSA.20342.
  • [18] Jian Ding, Eyal Lubetzky, and Yuval Peres. Anatomy of the giant component: The strictly supercritical regime. European Journal of Combinatorics, 35:155–168, 2014. doi:10.1016/J.EJC.2013.06.004.
  • [19] Paul Erdős and Alfred Rényi. On the evolution of random graphs. Publ. Math. Inst. Hungar. Acad. Sci., 5:17–61, 1960.
  • [20] Paul Erdős and Alfred Rényi. Asymmetric graphs. Acta Math Acad Sci Hung, 14:295–315, 1963.
  • [21] Julia Gaudio, Miklós Z. Rácz, and Anirudh Sridhar. Average-case and smoothed analysis of graph isomorphism, 2023. arXiv:2211.16454.
  • [22] László Hegedüs and Benedek Nagy. On periodic properties of circular words. Discrete Mathematics, 339(3):1189–1197, 2016. doi:10.1016/j.disc.2015.10.043.
  • [23] Neil Immerman and Eric Lander. Describing Graphs: A First-Order Approach to Graph Canonization, pages 59–81. Springer New York, 1990.
  • [24] Svante Janson, Tomasz Łuczak, and Andrzej Ruciński. Random graphs. John Wiley & Sons, 2000.
  • [25] Sandra Kiefer, Pascal Schweitzer, and Erkal Selman. Graphs identified by logics with counting. ACM Trans. Comput. Log., 23(1):1:1–1:31, 2022. doi:10.1145/3417515.
  • [26] Andreas Krebs and Oleg Verbitsky. Universal covers, color refinement, and two-variable counting logic: Lower bounds for the depth. In 30th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS’15), pages 689–700. IEEE Computer Society, 2015. doi:10.1109/LICS.2015.69.
  • [27] N. Linial and J. Mosheiff. On the rigidity of sparse random graphs. Journal of Graph Theory, 85(2):466–480, 2017. doi:10.1002/JGT.22073.
  • [28] T. Łuczak. The automorphism group of random graphs with a given number of edges. Math Proc Camb Phil Soc, 104:441–449, 1988.
  • [29] Tomasz Łuczak. The phase transition in a random graph. In D. Miklós, V.T. Sós, and T. Szőnyi, editors, Combinatorics, Paul Erdős is Eighty, volume 2, pages 399–422. Bolyai Soc. Math. Stud. 2, J. Bolyai Math. Soc., Budapest, 1996.
  • [30] Tomasz Łuczak, Boris Pittel, and John C. Wierman. The structure of a random graph at the point of the phase transition. Transactions of the American Mathematical Society, 341(2):721–748, 1994.
  • [31] W. S. Massey. Algebraic topology: An introduction, volume 56 of Graduate Texts in Mathematics. Springer, 5th edition, 1981.
  • [32] Marc Noy, Vlady Ravelomanana, and Juanjo Rué. On the probability of planarity of a random graph near the critical point. Proc. Am. Math. Soc., 143(3):925–936, 2015. doi:10.1090/S0002-9939-2014-12141-1.
  • [33] Oleg Verbitsky and Maksim Zhukovskii. Canonical labeling of sparse random graphs, 2024. arXiv:2409.18109.
  • [34] B.Yu. Weisfeiler and A.A. Leman. The reduction of a graph to canonical form and the algebra which appears therein. NTI, Ser. 2, 9:12–16, 1968. In Russian. English translation is available at https://www.iti.zcu.cz/wl2018/pdf/wl_paper_translation.pdf.
  • [35] E. M. Wright. Asymmetric and symmetric graphs. Glasgow Math J, 15:69–73, 1974.