Color Refinement for Relational Structures
Abstract
Color Refinement, also known as Naive Vertex Classification, is a classical method to distinguish graphs by iteratively computing a coloring of their vertices. While it is traditionally used as an imperfect way to test for isomorphism, the algorithm has permeated many other, seemingly unrelated, areas of computer science. The method is algorithmically simple, and it has a well-understood distinguishing power: it has been logically characterized by Immerman and Lander (1990) and Cai, Fürer, Immerman (1992), who showed that it distinguishes precisely those graphs that can be distinguished by a sentence of first-order logic with counting quantifiers and only two variables. A combinatorial characterization was given by Dvořák (2010), who showed that it distinguishes precisely those graphs that differ in the number of homomorphisms from some tree.
In this paper, we introduce Relational Color Refinement (RCR, for short), a generalization of the Color Refinement method from graphs to arbitrary relational structures, whose distinguishing power admits the equivalent combinatorial and logical characterizations as Color Refinement has on graphs: we show that RCR distinguishes precisely those structures that differ in the number of homomorphisms from an acyclic connected relational structure. Further, we show that RCR distinguishes precisely those structures that are distinguished by a sentence of the guarded fragment of first-order logic with counting quantifiers. Additionally, we show that for every fixed finite relational signature, RCR can be implemented to run on structures of that signature in time , where denotes the number of tuples present in the structure.
Keywords and phrases:
color refinement, counting logics, homomorphism counts, homomorphism indistinguishability, guarded logics, pebble games, relational structures, alpha-acyclicity, join-treesCopyright and License:
2012 ACM Subject Classification:
Theory of computation Finite Model Theory ; Theory of computation Graph algorithms analysis ; Mathematics of computing Graph theoryEditors:
Paweł Gawrychowski, Filip Mazowiecki, and Michał SkrzypczakSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
Color Refinement (CR, for short) constitutes a simple procedure to classify the vertices of a graph ; it is well-understood and widely used in many areas of computer science. The idea is a simple iteration: given a coloring of the vertices of a graph , one computes a new coloring of following a certain procedure. The new coloring is then used to compute another coloring following the same procedure, and so on, until the coloring stabilizes, i.e., the partitioning of induced by the new coloring is the same as the one induced by the previous coloring. The procedure to compute the new coloring is very simple: two vertices shall get different colors if they already have different colors, or they have a different number of neighbors of some color. Otherwise, they receive the same color. To start the iteration, one either uses the colors of the vertices (if is a colored graph), or the uniform coloring that assigns to every vertex the same color. This approach is sometimes also called “naive vertex classification” or the “1-dimensional Weisfeiler-Leman algorithm”. CR is often formalized using multisets in the following way, see e.g. [19, Chapter 3]. Considering an uncolored, undirected, simple graph , we start with for all , and for all , we let Note that this formalizes the procedure described above: if , then ; and if and disagree in the number of neighbors of some color, then we have , hence, . This formalization has the additional advantage that the colorings assign canonical colors, i.e., the colors themselves do not depend on the vertex set of . Cardon and Crochemore [11] showed that CR can be implemented to run in time , where denotes the number of vertices, and the number of edges; and Berkholz, Bonsma, Grohe [5] showed that even a canonical coloring can be computed within the same running time.
Applications.
An obvious application of CR is to test for graph isomorphism: if there is an and a such that (we say CR distinguishes and if this is true), then and cannot be isomorphic. However, this test is not perfect, since there exist non-isomorphic pairs of graphs that are not distinguished by CR. Nevertheless, it is a common subroutine in practical isomorphism testers and even plays a part in Babai’s seminal result that graph isomorphism is decidable in quasi-polynomial time [3]. In recent years, the classification of “similar vertices” that CR establishes has been applied to other problems as well: it was used in [23, 22] to reduce the cost of solving linear programs, in [31] it was used to speed up the evaluation of binary acyclic conjunctive queries, and in the area of machine learning, it is used as a graph kernel [37] and was proven to be equivalent to so-called Graph Neural Networks (GNNs) w.r.t. vertex classification [38, 21].
The power of CR.
The power of CR is well-understood [2, 5, 10, 15, 26, 28] – consult e.g. [19, 27] for an overview. Some key results on the distinguishing power of CR are a logical characterization due to Immerman and Lander [26] and Cai, Fürer, Immerman [10], and a combinatorial characterization w.r.t. the concept of “homomorphism indistinguishability” due to Dvořák [15] and Dell, Grohe, Rattan [13]. In [10, 26] it is shown that CR distinguishes and if, and only if, there is a sentence of first-order logic with counting quantifiers with at most 2 variables (, for short) such that and . In [15, 13] it is shown that CR distinguishes two graphs and if, and only if, and differ in the number of homomorphisms from some tree into and . If no such exists, one says that and are homomorphism indistinguishable over the class of trees. This result sparked active research in recent years exploring the concept of homomorphism indistinguishability over various graph classes, see e.g. [12, 16, 20, 29, 30, 32, 36, 33, 34]. These characterizations can explain the success of the CR method, and in particular, they give us a hint on why the vertex classification produced by CR is so powerful: two vertices get classified as “similar” by CR if the number of homomorphisms from every rooted tree into that map the root to is equal to the number of homomorphisms from into that map the root to .
Contributions.
With the success of CR in mind, it is an obvious question how one could devise a method to color arbitrary finite relational structures, not just graphs.
In particular, we would like a method that admits a combinatorial characterization w.r.t. homomorphism counts from the class of acyclic connected relational structures (for a sensible, broad definition of acyclicity) and a logical characterization for a sensible logic, analogously to the ones of CR mentioned above.
We propose Relational Color Refinement (RCR, for short) as such a method and show that it indeed admits the desired characterizations and, at the same time, can be implemented to run in time comparable to the running time of classical CR.
Our main result reads as follows.
Theorem A.
Let be a finite relational signature.
-
(a)
RCR can be implemented to run in time upon input of a -structure , where denotes the number of tuples present in .
-
(b)
For all -structures and , the following statements are equivalent:
-
(1)
RCR distinguishes and .
-
(2)
There exists an acyclic and connected -structure such that .
-
(3)
There exists a sentence such that and .
-
(4)
Spoiler wins the Guarded Game on .
-
(1)
Here, denotes the number of homomorphisms from to (cf. Section 4), while is the guarded fragment of the logic (cf. Section 5.1), and the Guarded Game is a particular variant of Ehrenfeucht-Fraïssé games (cf. Section 5.2). The technically most challenging parts are proving (a) and proving the equivalence of (1) and (2) in (b).
Section 2 introduces the necessary basic concepts and notations used throughout the paper. In Section 3 we introduce RCR, discuss its connection to CR, and provide a proof of Theorem A (a). Section 4 is devoted to the proof of the equivalence of statements (1) and (2) of Theorem A (b). Section 5 starts by introducing the logic and the Guarded Game, followed by the proof of the equivalence of statements (1), (3) and (4) of Theorem A (b). Finally, we conclude in Section 6 with a summary and an outlook on future work. Due to space limitations, many details had to be deferred to the paper’s full version.
Further related work.
The articles [2, 28] studied related, but different questions that do not concern the power of distinguishing two given graphs, but rather the power of distinguishing one given graph from all other graphs. Arvind, Köbler, Rattan, Verbitsky [2] provided a characterization of those graphs that are amenable by CR in the sense that CR distinguishes from every graph that is not isomorphic to . Kiefer, Schweitzer, Selman [28] provided a characterization of those graphs and, more generally, finite relational structures that can be identified (i.e., described up to isomorphism) by a sentence of the logic . In [2] it is noted that by the result of [26, 10] it follows that “amenability by CR” and “identifiability by ” are equivalent notions.
The work by Dell, Grohe, Rattan [13] has been generalized to relational structures by Butti and Dalmau [8]. However, they apply classical CR on the incidence graph of the relational structure and use the weaker notion of Berge-acyclicity that is subsumed by our notion of acyclicity; consequently, the distinguishing power of their algorithm is considerably weaker than that of RCR.
There is also related work in this direction on hypergraphs, which are conceptually similar to relational structures. Böker [9] introduced a variant of CR that works on hypergraphs, and showed that it distinguishes two hypergraphs if, and only if, there is a Berge-acyclic hypergraph that has a different number of homomorphisms to them. Again, the distinguishing power yielded by Berge-acyclicity is considerably weaker than that of the more general acyclicity notion considered in our work. The connection between the logic and homomorphism counts from trees due to [15, 13] was generalized to the logic (a logic similar to the guarded fragment , but tailored towards hypergraphs) and homomorphism counts from acyclic hypergraphs by Scheidt and Schweikardt [34].
2 Preliminaries
Basic notation.
We write for the set of non-negative integers, and we let . For we write to denote the set (i.e., , , and for ). For a set we write to denote the power set (i.e., the set of all subsets) of ; and for we let . We use bold letters to denote tuples . The tuple’s arity is , and denotes the tuple’s -th entry (for ). We let . We write to describe the function with for .
A multiset is a tuple , where is a set and is a function ; the number indicates the multiplicity with which the element occurs in the multiset . We write to denote the multiplicity of in the multiset ; in particular, denotes that . We adopt the usual notation for multisets using brackets in which each is listed exactly times. E.g., denotes the multiset .
A coloring of a set is a function for some set . Let and be two colorings of the same set . We say that refines , if for all : . The colorings and are equivalent, if for all : .
An (uncolored, undirected, simple) graph is a tuple , where is a finite set of vertices and is a set of edges. A forest is an acyclic graph; and a tree is a connected forest.
Relational Structures.
A (finite, relational) signature is a finite, non-empty set; the elements in are called relation symbols. Every has an associated arity . The arity of is defined as . By , for , we denote the subset of relation symbols of with arity exactly .
A structure of signature (for short, -structure) consists of a finite, non-empty set (called the universe of ), and a relation for every . We additionally require that every occurs as an entry in at least one tuple of at least one relation of – note that this assumption can easily be met, e.g. by inserting into a new relation symbol of arity 1 with (here, we identify a tuple of arity 1 with the element ). By we denote the set of all tuples that belong to some relation of . We define the size111In the literature, usually denotes the size of a reasonable representation of as input for an algorithm. Since we consider to be a fixed signature and restrict attention to structures where each occurs in at least one tuple in , within the -notation our notion of is equivalent to the one used in the literature. of as . We say that two -structures , have strictly equal size, if for every .
The Gaifman graph of is defined as the (undirected, simple) graph with and where consists of all for which there is a tuple with . A -structure is called connected if its Gaifman graph is connected.
A binary signature is a signature where every has arity . A colored multigraph is a structure of a binary signature. The binary relations of can be viewed as directed edge relations that carry specific labels, and the unary relations of can be viewed as assigning specific labels to the vertices of .
Color Refinement (CR).
Color Refinement (CR) can be adapted to colored multigraphs by including the vertex labels and the loops in the base color, and the edge labels in the iteration. This can be formalized as follows. Let be a binary signature, and let be a -structure. For every , let , and for all let
where denotes the Gaifman graph of and
Types.
The notion of atomic type and the similarity type of tuples will be crucial for the definition of RCR. Let be an arbitrary signature. For a -structure and a tuple of arity , the atomic type is the set . For every tuple of arity , the similarity type between and is defined as the set . We use as shorthand for .
In general, an atomic type of arity (over signature ) is a subset of . A similarity type of arity (over signature ) is a subset of that satisfies the following condition of “rectangularity”: for all and , if then . We write and to denote the arity of and . If has arity , we simply say that has arity and write . We let be the set of all similarity types of arity for any . We say that a tuple has atomic type if . Analogously, we say that has similarity type , if ; and we say that , have similarity type , if .
3 Color Refinement on Relational Structures
The goal of this section is to introduce RCR as a generalization of CR from graphs to relational structures. Let be an arbitrary (relational) signature; this will be fixed throughout the rest of the paper.
3.1 Relational Color Refinement (RCR, for short) – Definition
Consider an arbitrary -structure . The key idea of RCR is to color the tuples in and take into account the tuples’ atomic type and their mutual overlap (the latter is formalized by their similarity type). The details are as follows.
We iteratively compute colors for every tuple . For every , the initial color consists of the atomic type and the similarity type of , i.e., it is . For , the color after iterations is defined as , where
Note that . By definition, refines for all . The -th coloring is stable, if for all we have . It is easy to see that for every -structure there is an such that the -th coloring is stable; we let be the smallest such number, and we write to denote .
For we write to denote the set of colors produced in the -th refinement round, i.e., . For each we let , i.e., is the number of tuples with color . We let ; and we will call this the set of stable colors on produced by RCR.
We say that RCR distinguishes two -structures and in round , if there is a color such that . Furthermore, we say that RCR distinguishes and if there is an such that RCR distinguishes and in round . It is straightforward to see that if and are not of strictly equal size, then RCR distinguishes and in round 0.
A run of RCR on particular example structures can be found in the paper’s full version.
3.2 Connection between RCR and CR
For the signature with and , the following is straightforward to see. For any (simple, undirected) graph let be the -structure that represents as follows: and consists of the tuples and for all . RCR on produces a stable coloring that is equivalent to the stable coloring produced by CR on . Therefore, RCR can be viewed as a generalization of CR from graphs to -structures for arbitrary (relational) signatures .
Next, we point out that running RCR on a -structure produces the same result as running CR on a suitably defined colored multigraph representation of . The colored multigraph can be viewed as a suitably colored generalization, from graphs to relational structures, of the notion of the line graph (cf. [14]) associated with an undirected graph . It will be crucial for the proof of Theorem A (b).
Definition 3.1.
We represent a -structure by a colored multigraph of the signature
where for all and for all .
The universe of consists of a new element for every tuple . Furthermore, , for all . And for all we have .
Example 3.2.
Consider with and , and let be the -structure with the universe , where , and . The representation of as a colored multigraph is depicted in Figure 1(a). To keep the figure easy to grasp, we labeled the vertices with the tuples they represent, omitted the self-loops, and contracted multi-edges into a single one with a combined edge label, where denotes the tuple .
It is easy to see that running RCR on a -structure produces (in the same number of rounds) a stable coloring of that is equivalent (via identifying with ) to the stable coloring produced by classical CR on the colored multigraph .
Let us define the cohesion of as the number of all tuples with and . Obviously, .
It is known that classical CR can be implemented to run in time on colored multigraphs, where denotes the number of vertices and denotes the total number of edges (cf., [11, 5]). Thus, on the colored multigraph representing , CR runs in time where and (for each fixed signature ). Thus, by running CR on , we obtain an implementation of RCR on that, for each fixed signature , runs in time . In the next subsection, we will improve the running time to by using a representation of by a colored multigraph different from .
3.3 Implementing RCR in Time
Theorem B.
For each fixed signature , RCR can be implemented to run in time upon input of a -structure .
The factor hidden by the -notation is of size , where is the maximum arity of the relation symbols in .
Recall from Section 3.2 that by performing classical CR on the colored multigraph we can implement RCR on a -structure with runtime , where denotes the number of vertices and denotes the total number of edges of . Note that the nodes for all tuples that share an element form a clique in . This causes a blow-up of the number of edges in . We will alleviate this by resolving every clique by inserting a constant number of fresh vertices that are connected to all tuples participating in a clique. This will drastically reduce the number of edges, yielding a new colored multigraph whose number of nodes and edges is in . For the precise definition of we need the following notation.
Definition 3.3.
Let be a -structure. We call a tuple a slice (over ), if its elements are pairwise distinct (i.e., ) and . We call a slice of , if is a slice over and . For an , we write for the set of all slices of , i.e., . Conversely, for a slice over we denote by the set of tuples in that is a slice of.
Definition 3.4.
Let be a -structure. Let be the colored multigraph of signature defined as follows. The universe consists of the nodes for all and a new node for every slice . I.e., . Furthermore, , for all . And for all we let
Example 3.5.
Consider the signature with . Then, consists of the unary relation symbol and binary relation symbols for .
Let be the -structure where and . Then, and . The colored multigraph is the -structure with . The unary symbol is interpreted by the set . See Figure 1(b) for an illustration of .
Note that for each of arity , the number of slices of is . Thus, for we have and , for all . Hence, the total number of edges of is at most . I.e., for the fixed relational signature , the number of nodes and edges of the colored multigraph associated with a -structure is of size , where the factor hidden by the -notation is bounded by for .
Since the number of vertices and the number of edges in both are of size , ˜B is obtained as an immediate consequence of the following Theorem 3.6 and the known running time of CR (cf. Sections 1 and 3.2):
Theorem 3.6.
Let be a -structure. Let be the color assigned to tuple in the -th round of Relational Color Refinement RCR on , and let be the color assigned to node of in round of conventional Color Refinement CR on the colored multigraph . For all and all we have: .
The remainder of this section is devoted to the proof of Theorem 3.6. See the paper’s full version for detailed proofs of all subsequent lemmas and claims. We start with a lemma that summarizes some obvious facts.
Lemma 3.7.
Let be a -structure. Let and let and be elements in , and let be a slice over .
-
(a)
.
-
(b)
and the function with for all is well-defined and bijective.
-
(c)
for every there exists a such that for every there exists an such that .
-
(d)
Let . For all we have: .
Two nodes are called neighbors in if for some .
Remark 3.8.
The following characterization of tuples with will be crucial for our proof of Theorem 3.6.
Lemma 3.9.
Let be a -structure. For all we have: there is a bijection such that for all we have .
It follows from Remark 3.8 that the bijection is unique, if it exists. The following lemma summarizes straightforward properties of the mapping obtained from Lemma 3.9.
Lemma 3.10.
Let be a -structure, let with , and let be the bijection obtained from Lemma 3.9. For all and for and we have: (1) , and (2) , and (3) .
For let . We proceed with the main technical lemma that will enable us to prove Theorem 3.6.
Lemma 3.11.
Let be a -structure. Let be a non-empty set and let be a mapping . Consider with and . Let be the bijection obtained by Lemma 3.9. The following are equivalent:
-
1.
.
-
2.
For all we have:
.
The proof makes heavy use of Lemma 3.10 and is combinatorially quite involved;
in particular, the proof of direction 21
proceeds by an intricate induction that starts with tuples
in with , and the induction step considers tuples
in with a decreasing size of the intersection of and
.
The following fact will be helpful for the subsequent proofs.
Fact 3.11.
For all we have
-
(a)
.
-
(b)
For all and we have:
-
(c)
For all , and for all , we have:
. -
(d)
For all nodes of we have:
and are neighbors in for some .
The first three statements follow immediately from the definition of , see ˜3.4. The last statement follows from Remark 3.8 and the definition of . The following lemma relates the color that CR assigns to the node to the colors it assigns to the nodes for the slices of .
Lemma 3.12.
For all
with and all
we have:
(a) and (b) ,
for all and .
Here, is the bijection from Lemma 3.9.
Finally, we are ready for the proof of Theorem 3.6.
Proof of Theorem 3.6.
We proceed by induction on .
For the induction base , the proof is straightforward by using Section 3.3 and Lemma 3.9.
For the inductive Step, consider an , and let .
Induction hypothesis:
For all and we have: .
Induction Claim:
.
In case that , we have , and hence, by the definition of RCR, holds for all . Furthermore, by the induction base, implies that . Hence, by the definition of CR, holds for all . This yields: and , completing the induction step.
In the following, we consider the case where . From Lemma 3.9 we obtain a bijection satisfying for all .
If , then by definition of RCR and by induction hypothesis. It follows from the definition of CR that as well. Thus, from now on we consider the case that .
Using Lemma 3.12 we get that if, and only if, and for all and it holds that . Applying the same lemma again yields that if, and only if and for all and it holds that and .
Thus, we must show that if, and only if, and for all and it holds that and . Recall that we have . Since , we get that from the induction hypothesis. Hence, it remains to show that if, and only if, for all and it holds that and . It is now easy to see that the following two claims finish the proof.
Claim 3.13.
If , then the following holds for all and :
If , then .
Claim 3.14.
for all and we have .
To prove both claims we use Lemma 3.11, Section 3.3 and the definitions of CR and RCR.
4 Connection to Homomorphism Counts
This section is devoted to proving the equivalence of statements (1) and (2) of Theorem A (b). I.e., we relate the distinguishing power of RCR to distinguishability via homomorphism counts from acyclic -structures.
Acyclic -structures.
Let be a -structure. A join-tree for is a tree (i.e., an undirected, simple graph that is connected and acyclic) with vertex set (i.e., the tuples in serve as vertices of ) and which satisfies the following connectivity condition: for all the set induces a connected subgraph of ; we will denote this subgraph (which in fact is a tree) by .
We call a -structure acyclic if there exists a join-tree for . This definition of acyclicity of -structures is equivalent to acyclicity as defined in the textbook [1], it is equivalent to the notion of alpha-acyclicity as defined in [4, 6] and, finally, is also equivalent to having (generalized or fractional) hypertree width 1 as defined in [17, 18, 24]. In the literature, also other notions of acyclicity for relational structures (and hypergraphs) have been considered; but alpha-acyclicity arguably is the most common and the least restrictive one. Consult [7] for a detailed survey on this topic.
For the special case of binary signatures , i.e., where -structures are colored multigraphs, it is known [7] that a -structure is acyclic if, and only if, its Gaifman graph is acyclic (w.r.t. the usual notion of acyclicity of undirected simple graphs). It is well-known that for non-binary signatures there exist acyclic -structures whose Gaifman graph is not acyclic.
Homomorphisms.
A homomorphism from a -structure to a -structure is a mapping such that for all , for , and all we have . We write for the set of homomorphisms from to , and we let denote the number of homomorphisms from to .
The remainder of this section is dedicated to proving the equivalence of statements (1) and (2) of Theorem A (b), i.e., proving the following theorem.
Theorem C.
For all -structures and , the following statements are equivalent.
-
1.
RCR distinguishes and .
-
2.
There exists an acyclic and connected -structure such that .
This can be viewed as a generalization of the following result by Dvořák [15] and Dell, Grohe, Rattan [13] to arbitrary signatures . While [15, 13] state the theorem just for graphs, it easily extends to colored multigraphs (as noted in [13]). A colored multitree is an acyclic and connected colored multigraph, i.e., a colored multigraph whose Gaifman graph is a tree.
Theorem 4.1 ([15, 13]).
Let and be colored multigraphs. The following statements are equivalent.
-
1.
CR distinguishes and .
-
2.
There exists a colored multitree such that .
Theorem 4.1 will serve as the first key ingredient of our proof of ˜C. The second key ingredient is to use the following notion of a join-tree representation . Recall from ˜3.1 the binary signature and the colored multigraph of signature that represents a -structure . For an acyclic -structure and a join-tree for we define the colored multigraph of signature to have universe where is a new vertex for each tuple , and
for all and all . Via identifying and for all , the Gaifman graph of is isomorphic to a subgraph of . Thus, since is a tree, is acyclic.
The last two ingredients for our proof of ˜C are the following lemmas:
Lemma 4.2.
For -structures and and any join-tree for
we have:
.
Lemma 4.3.
Let and be -structures, and let be a colored multitree of signature such that . There exists an acyclic and connected -structure and a join-tree for such that .
Before proving these lemmas, we first show how to use the four key ingredients for proving ˜C.
Proof of ˜C.
As pointed out in Section 3.2, running RCR on a -structure produces a stable coloring that is equivalent (via identifying with ) to the stable coloring produced by the classical CR on the colored multigraph . Thus, RCR distinguishes the -structures and if, and only if, classical CR distinguishes and . According to Theorem 4.1 the latter is the case if, and only if, there is a colored multitree with .
Hence, for the direction “1 2”, if RCR distinguishes and , then there exists a colored multitree with . By Lemma 4.3, there also exists an acyclic and connected -structure and a join-tree for such that . According to Lemma 4.2, this implies that .
For the direction “2 1”, if there exists an acyclic and connected -structure such that , then according to Lemma 4.2 we have , for any join-tree for . By construction, the Gaifman graph of is a subgraph of . In fact, the Gaifman graph of being connected implies that the Gaifman graph of is exactly . Hence, is a colored multitree. Thus, as pointed out in the first paragraph of the proof, RCR distinguishes and .
The remainder of this section is devoted to proving the Lemmas 4.2 and 4.3. The following notation will be convenient. If is a mapping from at set to a set , and is a tuple in for some , then we write for the tuple .
Proof of Lemma 4.2.
We prove the lemma by providing a bijection between and . Recall that and .
For all let , where is defined by for all .
Since and , we obtain that .
Hence, .
To prove the lemma, it suffices to verify that:
(a) , i.e.,
is a homomorphism for every ;
(b) is injective; and
(c) is surjective.
The proofs of statements (a) and (b) are straightforward.
For the proof of statement (c) let .
Our aim is to find an such that .
By definition of (recall our assumption on -structures described in Section 2), for every there exists an and a tuple such that .
For each let us choose arbitrary, but from now on fixed such and which we henceforth will denote by and , and let us fix an such that is the -th component of the tuple .
Since , by definition of we have .
Since , we obtain .
By definition of there is a tuple such that .
Let us write to denote the -th component of the tuple .
Clearly, .
Using these notions, we define the mapping by letting for every .
Claim 1: For all and with we have: .
Claim 2: .
Proof of Lemma 4.3.
By assumption, , for a colored multitree of signature and -structures , . We will show that the homomorphisms from provide us with “templates” for acyclic connected -structures, one of which must have a number of homomorphisms into that is different from its number of homomorphisms into .
Let us write to denote the Gaifman graph of . Since is a colored multitree, is a tree. For an the print of in is the colored multigraph of signature defined by and, for all and all :
Note that is also the Gaifman graph of . We let . The notion of the print of in for and the set are defined analogously. Note that and are not necessarily disjoint, and that different homomorphisms may have the same print.
For every we let . The number is defined analogously. Note that and – see the paper’s full version for a proof.
For any two prints we say that is a subprint of (for short: ) if and , for all and all . Obviously, is a partial order on .
It can be verified that for every print we have and .
Since and and, by assumption, , there must be a such that . We choose a largest such w.r.t. the partial order . I.e., , but for all with and . Combining this with the fact that and , we obtain: .
Now, all that remains to be done is to show that there exists an acyclic and connected -structure and a join-tree for ( will have exactly the same shape as ) such that is isomorphic to – then implies that . Details on how to construct are given in the paper’s full version. This then completes the proof of Lemma 4.3.
5 Connection to Logic
This section’s goal is to provide a logical characterization of the distinguishing power of RCR. We aim for a theorem that is analogous to the following result due to Immerman and Lander [26] and Cai, Fürer, Immerman [10] concerning the logic , a syntactic extension of first-order logic with counting quantifiers of the form (for every fixed ), expressing “there exist at least values for such that holds”. The restriction of to two variables is denoted by .
Theorem 5.1 ([10, 26]).
Let and be graphs. The following statements are equivalent:
-
1.
CR distinguishes and .
-
2.
There exists a sentence such that and .
5.1 The Guarded Fragment of Counting Logic
This section introduces the guarded fragment of the logic , for short: . Its definition is in the same spirit as the logic (the guarded fragment of first-order logic; see [25]) and the logic (the guarded fragment of any logic that is a subset of first-order logic; see [18]). Here, we use a similar notation as in [18], but adapt it in order to obtain a reasonable notion of “guarded fragment of ”. As in [25, 18], the term “guarded” refers to the fact that quantifiers are appropriately relativized by relational atoms.
We have available a countably infinite set of variables. We call a tuple of distinct variables of the form a variable tuple, and we let . Recall that at the beginning of Section 3 we have chosen an arbitrary (relational) signature that is fixed throughout the rest of this paper.
Definition 5.2 (Syntax of ).
The logic is inductively defined along with the free variables and the guard-depth, formalized by the functions and .
- Atomic Formulas:
-
For all with , all and all , the following formulas (of signature ) are in : is of the form
-
1.
with and ;
-
2.
with and .
-
1.
- Inductive Rules:
-
Let be formulas (of signature ) in . The following formulas (of signature ) are in : is of the form
-
3.
with and ;
-
4.
with and .
An atomic formula (of signature ) of the form in is called a guard for , if . Let and let be a guard for . For every variable tuple with , the following formula (of signature ) is in : is of the form
-
5.
with and .
-
3.
In this paper we assume w.l.o.g. that the variable tuple after a quantifier is ordered, i.e., . This has no effect on the semantics, but simplifies some arguments. We write as shorthand for . We omit parentheses in the usual way.
Definition 5.3 (Semantics of ).
A -interpretation is a tuple consisting of a -structure and a function . Formulas (of signature ) in are evaluated on -interpretations . We write to denote that satisfies , and to denote that does not satisfy . By we denote the -interpretation with for all , and for all . The semantics of formulas in are inductively defined as follows:
We will use the following conventions throughout the paper: denotes that ; and denotes that where is an assignment where holds for all . A sentence is a formula that has no free variable, i.e., . If is a sentence, we write to denote that for any assignment (since does not matter in this case). We write to denote that the -structures and satisfy the same sentences (of signature ) in . Finally, we say that and are distinguishable in if .
Example 5.4.
Clearly, is a formula in with and . Thus, is a guard for . Hence, with , and .
The formula states that there is at least one tuple (of arity ) in such that the first 3 entries of this tuple form a triangle w.r.t. relation . Hence, and , which means and are distinguishable in .
5.2 The Guarded Game
Our ultimate goal in Section 5 is to prove for any two -structures and that RCR distinguishes and if, and only if, . Similarly to the proof of Theorem 5.1, our proof will use, as an intermediate step, a game characterization of (in)distinguishability of two -structures in . We call this game the Guarded Game; it is defined as follows. It is played on two -structures , . A configuration of the Guarded Game is a tuple of the form , where and are the given -structures and , for some . A configuration is called distinguishing, if or there are an and indices such that . We may omit parentheses if they are clear from the context. If , we write for the configuration , and we call this the empty configuration; note that this configuration is not distinguishing.
A round of the Guarded Game is played as follows: consider , to be the configuration at the beginning of the round. Spoiler picks a relation symbol . Then, Duplicator provides a bijection between and . If no such bijection exists (i.e., ), the round ends and Spoiler wins this round; otherwise the round proceeds as follows. Spoiler picks some and creates the new configuration , where . Duplicator wins this round if the new configuration is not distinguishing and . Otherwise, Spoiler wins this round.
Duplicator has a -round winning strategy on , if the configuration is not distinguishing. For , Duplicator has an -round winning strategy on , if this configuration is not distinguishing, and she can provide a bijection for every that Spoiler may pick, such that for every and that Spoiler may choose, she wins the current round and has an -round winning strategy on the resulting configuration , . Spoiler has an -round winning-strategy on , , if Duplicator does not have one. In particular, if Spoiler has a winning strategy for rounds, he also has a winning strategy for more than rounds. If and are not of strictly equal size, then Spoiler has a trivial -round winning strategy, because Duplicator is unable to give a bijection in the first round. We say that Duplicator wins the Guarded Game on , if she has an -round winning strategy on , for every .
5.3 RCR is equivalent to and the Guarded Game
This section is devoted to proving the equivalence of statements (1), (3) and (4) of Theorem A (b), i.e., we prove the following theorem.
Theorem D.
For all -structures and , the following statements are equivalent.
-
1.
RCR distinguishes and .
-
2.
There exists a sentence such that and .
-
3.
Spoiler wins the Guarded Game on .
We prove the theorem by showing that the implication chain holds. For this, we use the following three lemmas; their proofs are inductive and quite similar to the way the analogous result on graphs is shown, thus we defer their proofs to the paper’s full version. The arity (atomic type , similarity type ) of a color is defined as the arity (atomic type, similarity type) of the tuples in that receive this color. Recall that we denote the color of a tuple after iterations of RCR as .
Lemma 5.5.
For every -structure , every , and every of arity , there exists a formula with such that for every -structure of size strictly equal to and every of arity we have: .
Lemma 5.6.
Let and be -structures of strictly equal size and let , be arbitrary tuples of arity . Let be a tuple of distinct variables. If there exists a formula with such that , then Spoiler has a -round winning strategy for the Guarded Game on , .
Lemma 5.7.
Let , let and be -structures of strictly equal size, and let , be tuples of arity . If , then the configuration , is not distinguishing. Further, if and for all , then Duplicator has an -round winning strategy for the Guarded Game on , .
The proof of ˜D proceeds as follows. If and are not of strictly equal size, it is straightforward to see that each of the theorem’s three statements is fulfilled. For the case where and are of strictly equal size, “12” easily follows from Lemma 5.5, and “23” is obtained from Lemma 5.6. Concerning “31”, one proves the contraposition and uses Lemma 5.7 to obtain a winning strategy for Duplicator in the Guarded Game; this winning strategy ensures that after each round, the configuration is of the form , where and have the same color in the stable coloring produced by RCR on and .
6 Final Remarks
We introduced Relational Color Refinement (RCR) as an adaptation of
the classical Color Refinement (CR) procedure for arbitrary relational
structures. We showed that it can be implemented with the same running time
as CR (˜B).
Furthermore, we showed that the
distinguishing power of RCR admits an
analogous combinatorial (˜C) and logical
(˜D) characterization as CR.
Combining the Theorems B, C
and D yields our main result, ˜A, formulated in Section 1.
There are multiple directions for further research:
-
One interesting task is to lift the results of [2, 28] from CR and to RCR and . This is non-trivial, because is capable of identifying certain -structures that cannot be identified in : consider, e.g., the signature with and the -structure with and . It is easy to construct a -sentence that is satisfied by but by no -structure that is not isomorphic to and where every node is contained in some relation (recall the assumption on structures we adopted in Section 2). But there does not exist any -sentence with the same property – in fact, according to [28, Proof of Corollary 5.9], the logic does not identify any -structure whose universe contains elements and where contains a relation symbol of arity .
-
Considering that CR is equivalent to the 1-dimensional Weisfeiler-Leman algorithm (WL), RCR might be a good basis to devise a generalization of the -dimensional WL to arbitrary relational structures.
-
Given the close relationship between relational structures and hypergraphs, a coloring method similar to RCR should exist for hypergraphs, too. However, RCR relies heavily on the order that the tuples provide – and this order is absent in hyperedges. Thus, it is not clear how to adapt the refinement of the colors in an iteration step from tuples to hyperedges.
-
Another promising direction is to think about the applications of RCR, given that there are so many applications of classical CR, also apart from isomorphism testing (cf. Section 1). In particular, we conjecture that some of the techniques developed in this paper can be used to lift the result by Riveros, Scheidt, Schweikardt [31] from binary structures to arbitrary structures. Further, the tight connection between CR and Graph Neural Networks suggests interesting applications for RCR in machine learning as well. We plan to investigate this.
References
- [1] Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995. URL: http://webdam.inria.fr/Alice/.
- [2] V. Arvind, Johannes Köbler, Gaurav Rattan, and Oleg Verbitsky. Graph Isomorphism, Color Refinement, and Compactness. Computational Complexity, 26(3):627–685, September 2017. doi:10.1007/s00037-016-0147-6.
- [3] László Babai. Graph isomorphism in quasipolynomial time [extended abstract]. In Daniel Wichs and Yishay Mansour, editors, Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016, pages 684–697. ACM, 2016. doi:10.1145/2897518.2897542.
- [4] Catriel Beeri, Ronald Fagin, David Maier, and Mihalis Yannakakis. On the desirability of acyclic database schemes. J. ACM, 30(3):479–513, 1983. doi:10.1145/2402.322389.
- [5] Christoph Berkholz, Paul S. Bonsma, and Martin Grohe. Tight lower and upper bounds for the complexity of canonical colour refinement. Theory Comput. Syst., 60(4):581–614, 2017. doi:10.1007/S00224-016-9686-0.
- [6] Philip A. Bernstein and Nathan Goodman. Power of natural semijoins. SIAM J. Comput., 10(4):751–771, 1981. doi:10.1137/0210059.
- [7] Johann Brault-Baron. Hypergraph Acyclicity Revisited. ACM Comput. Surv., 49(3):54:1–54:26, 2016. doi:10.1145/2983573.
- [8] Silvia Butti and Víctor Dalmau. Fractional Homomorphism, Weisfeiler-Leman Invariance, and the Sherali-Adams Hierarchy for the Constraint Satisfaction Problem. In Filippo Bonchi and Simon J. Puglisi, editors, 46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021), volume 202 of Leibniz International Proceedings in Informatics (LIPIcs), pages 27:1–27:19, Dagstuhl, Germany, 2021. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.MFCS.2021.27.
- [9] Jan Böker. Color Refinement, Homomorphisms, and Hypergraphs. In Ignas Sau and Dimitrios M. Thilikos, editors, Graph-Theoretic Concepts in Computer Science - 45th International Workshop, WG 2019, Vall de Núria, Spain, June 19-21, 2019, Revised Papers, volume 11789 of Lecture Notes in Computer Science, pages 338–350. Springer, 2019. doi:10.1007/978-3-030-30786-8_26.
- [10] Jin-Yi Cai, Martin Fürer, and Neil Immerman. An optimal lower bound on the number of variables for graph identification. Combinatorica, 12(4):389–410, December 1992. doi:10.1007/BF01305232.
- [11] A. Cardon and Maxime Crochemore. Partitioning a Graph in . Theor. Comput. Sci., 19:85–98, 1982. doi:10.1016/0304-3975(82)90016-0.
- [12] Anuj Dawar, Tomáš Jakl, and Luca Reggio. Lovász-type theorems and game comonads. In 36th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2021, Rome, Italy, June 29 - July 2, 2021, pages 1–13. IEEE, 2021. doi:10.1109/LICS52264.2021.9470609.
- [13] Holger Dell, Martin Grohe, and Gaurav Rattan. Lovász meets Weisfeiler and Leman. In 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, July 9-13, 2018, Prague, Czech Republic, volume 107 of LIPIcs, pages 40:1–40:14. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2018. doi:10.4230/LIPIcs.ICALP.2018.40.
- [14] Reinhard Diestel. Graph Theory, volume 173 of Graduate Texts in Mathematics. Springer-Verlag, Heidelberg, 6th edition, 2025. URL: https://diestel-graph-theory.com/.
- [15] Zdeněk Dvořák. On recognizing graphs by numbers of homomorphisms. Journal of Graph Theory, 64(4):330–342, 2010. doi:10.1002/jgt.20461.
- [16] Eva Fluck, Tim Seppelt, and Gian Luca Spitzer. Going deep and going wide: Counting logic and homomorphism indistinguishability over graphs of bounded treedepth and treewidth. In Aniello Murano and Alexandra Silva, editors, 32nd EACSL Annual Conference on Computer Science Logic, CSL 2024, February 19-23, 2024, Naples, Italy, volume 288 of Leibniz International Proceedings in Informatics (LIPIcs), pages 27:1–27:17, Dagstuhl, Germany, 2024. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.CSL.2024.27.
- [17] Georg Gottlob, Nicola Leone, and Francesco Scarcello. Hypertree decompositions and tractable queries. J. Comput. Syst. Sci., 64(3):579–627, 2002. doi:10.1006/jcss.2001.1809.
- [18] Georg Gottlob, Nicola Leone, and Francesco Scarcello. Robbers, marshals, and guards: Game theoretic and logical characterizations of hypertree width. Journal of Computer and System Sciences, 66(4):775–808, 2003. doi:10.1016/S0022-0000(03)00030-8.
- [19] Martin Grohe. Descriptive Complexity, Canonisation, and Definable Graph Structure Theory, volume 47 of Lecture Notes in Logic. Cambridge University Press, 2017. doi:10.1017/9781139028868.
- [20] Martin Grohe. Counting Bounded Tree Depth Homomorphisms. In Holger Hermanns, Lijun Zhang, Naoki Kobayashi, and Dale Miller, editors, Proceedings of the 35th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS ’20, pages 507–520, New York, NY, USA, 2020. Association for Computing Machinery. doi:10.1145/3373718.3394739.
- [21] Martin Grohe. Word2vec, Node2vec, Graph2vec, X2vec: Towards a Theory of Vector Embeddings of Structured Data. In Dan Suciu, Yufei Tao, and Zhewei Wei, editors, Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS’20, pages 1–16. ACM, 2020. doi:10.1145/3375395.3387641.
- [22] Martin Grohe, Kristian Kersting, Martin Mladenov, and Pascal Schweitzer. Color Refinement and Its Applications. In Guy Van den Broeck, Kristian Kersting, Sriraam Natarajan, and David Poole, editors, An Introduction to Lifted Probabilistic Inference. The MIT Press, 2021. doi:10.7551/mitpress/10548.003.0023.
- [23] Martin Grohe, Kristian Kersting, Martin Mladenov, and Erkal Selman. Dimension Reduction via Colour Refinement. In Andreas S. Schulz and Dorothea Wagner, editors, Algorithms - ESA 2014, Lecture Notes in Computer Science, pages 505–516, Berlin, Heidelberg, 2014. Springer. doi:10.1007/978-3-662-44777-2_42.
- [24] Martin Grohe and Dániel Marx. Constraint solving via fractional edge covers. ACM Trans. Algorithms, 11(1):4:1–4:20, 2014. doi:10.1145/2636918.
- [25] Erich Grädel. On the Restraining Power of Guards. Journal of Symbolic Logic, 64(4):1719–1742, 1999. doi:10.2307/2586808.
- [26] Neil Immerman and Eric Lander. Describing Graphs: A First-Order Approach to Graph Canonization. In Alan L. Selman, editor, Complexity Theory Retrospective: In Honor of Juris Hartmanis on the Occasion of His Sixtieth Birthday, July 5, 1988, pages 59–81. Springer, New York, NY, 1990. doi:10.1007/978-1-4612-4478-3_5.
- [27] Sandra Kiefer. The Weisfeiler-Leman Algorithm: An Exploration of Its Power. ACM SIGLOG News, 7(3):5–27, November 2020. doi:10.1145/3436980.3436982.
- [28] Sandra Kiefer, Pascal Schweitzer, and Erkal Selman. Graphs Identified by Logics with Counting. ACM Transactions on Computational Logic, 23(1):1:1–1:31, October 2021. doi:10.1145/3417515.
- [29] Laura Mančinska and David E. Roberson. Quantum isomorphism is equivalent to equality of homomorphism counts from planar graphs. In Sandy Irani, editor, 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 661–672. IEEE, 2020. doi:10.1109/FOCS46700.2020.00067.
- [30] Yoàv Montacute and Nihil Shah. The Pebble-Relation Comonad in Finite Model Theory. In Christel Baier and Dana Fisman, editors, Proceedings of the 37th Annual ACM/IEEE Symposium on Logic in Computer Science, number 13 in LICS ’22, pages 1–11, New York, NY, USA, 2022. Association for Computing Machinery. doi:10.1145/3531130.3533335.
- [31] Cristian Riveros, Benjamin Scheidt, and Nicole Schweikardt. Using Color Refinement to Boost Enumeration and Counting for Acyclic CQs of Binary Schemas. CoRR, 2024. doi:10.48550/arXiv.2405.12358.
- [32] David E. Roberson. Oddomorphisms and homomorphism indistinguishability over graphs of bounded degree. CoRR, 2022. arXiv:2206.10321.
- [33] Benjamin Scheidt. On Homomorphism Indistinguishability and Hypertree Depth. In Karl Bringmann, Martin Grohe, Gabriele Puppis, and Ola Svensson, editors, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024), volume 297 of Leibniz International Proceedings in Informatics (LIPIcs), pages 152:1–152:18, Dagstuhl, Germany, 2024. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ICALP.2024.152.
- [34] Benjamin Scheidt and Nicole Schweikardt. Counting Homomorphisms from Hypergraphs of Bounded Generalised Hypertree Width: A Logical Characterisation. In Jérôme Leroux, Sylvain Lombardy, and David Peleg, editors, 48th International Symposium on Mathematical Foundations of Computer Science (MFCS 2023), volume 272 of Leibniz International Proceedings in Informatics (LIPIcs), pages 79:1–79:15, Dagstuhl, Germany, 2023. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.MFCS.2023.79.
- [35] Benjamin Scheidt and Nicole Schweikardt. Color Refinement for Relational Structures. CoRR, 2024. doi:10.48550/arXiv.2407.16022.
- [36] Tim Seppelt. Logical equivalences, homomorphism indistinguishability, and forbidden minors. In Jérôme Leroux, Sylvain Lombardy, and David Peleg, editors, 48th International Symposium on Mathematical Foundations of Computer Science (MFCS 2023), volume 272 of Leibniz International Proceedings in Informatics (LIPIcs), pages 82:1–82:15, Dagstuhl, Germany, 2023. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.MFCS.2023.82.
- [37] Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M. Borgwardt. Weisfeiler-Lehman Graph Kernels. J. Mach. Learn. Res., 12:2539–2561, 2011. doi:10.5555/1953048.2078187.
- [38] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How Powerful are Graph Neural Networks? In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019. URL: https://openreview.net/forum?id=ryGs6iA5Km.
