Color Refinement for Relational Structures

Scheidt, Benjamin; Schweikardt, Nicole

doi:10.4230/LIPIcs.MFCS.2025.88


Benjamin Scheidt, Humboldt-Universität zu Berlin, Germany

Nicole Schweikardt, Humboldt-Universität zu Berlin, Germany

Abstract

Color Refinement, also known as Naive Vertex Classification, is a classical method to distinguish graphs by iteratively computing a coloring of their vertices. While it is traditionally used as an imperfect way to test for isomorphism, the algorithm has permeated many other, seemingly unrelated, areas of computer science. The method is algorithmically simple, and it has a well-understood distinguishing power: it has been logically characterized by Immerman and Lander (1990) and Cai, Fürer, Immerman (1992), who showed that it distinguishes precisely those graphs that can be distinguished by a sentence of first-order logic with counting quantifiers and only two variables. A combinatorial characterization was given by Dvořák (2010), who showed that it distinguishes precisely those graphs that differ in the number of homomorphisms from some tree.

In this paper, we introduce Relational Color Refinement (RCR, for short), a generalization of the Color Refinement method from graphs to arbitrary relational structures. Its distinguishing power admits combinatorial and logical characterizations analogous to those that Color Refinement has on graphs. We show that RCR distinguishes precisely those structures that differ in the number of homomorphisms from some acyclic connected relational structure. Further, we show that RCR distinguishes precisely those structures that are distinguished by a sentence of the guarded fragment of first-order logic with counting quantifiers. Additionally, we show that for every fixed finite relational signature, RCR can be implemented to run on structures of that signature in time $\mathcal{O}(N\cdot\log N)$, where $N$ denotes the number of tuples present in the structure.

Keywords and phrases:

color refinement, counting logics, homomorphism counts, homomorphism indistinguishability, guarded logics, pebble games, relational structures, alpha-acyclicity, join-trees


2012 ACM Subject Classification:

Theory of computation → Finite Model Theory; Theory of computation → Graph algorithms analysis; Mathematics of computing → Graph theory

Editors:

Paweł Gawrychowski, Filip Mazowiecki, and Michał Skrzypczak

Series and Publisher:

Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik

1 Introduction

Color Refinement (CR, for short) is a simple procedure for classifying the vertices of a graph $G$. It is well understood and widely used in many areas of computer science. The idea is a simple iteration. Given a coloring $\operatorname{\gamma}$ of the vertices $V(G)$ of a graph $G$, one computes a new coloring $\operatorname{\gamma}^{\prime}$ of $V(G)$ following a certain procedure. The new coloring $\operatorname{\gamma}^{\prime}$ is then used to compute another coloring $\operatorname{\gamma}^{\prime\prime}$ following the same procedure, and so on, until the coloring stabilizes, i.e., until the partition of $V(G)$ induced by the new coloring is the same as the one induced by the previous coloring. The procedure for computing the new coloring is very simple. Two vertices $u, v$ get different colors if they already have different colors, or if they have a different number of neighbors of some color; otherwise, they receive the same color. To start the iteration, one uses either the colors of the vertices (if $G$ is a colored graph) or the uniform coloring that assigns the same color to every vertex. This approach is sometimes also called “naive vertex classification” or the “1-dimensional Weisfeiler-Leman algorithm”. CR is often formalized using multisets in the following way, see e.g. [19, Chapter 3].
For an uncolored, undirected, simple graph $G$, we start with $\operatorname{\gamma}_{0}(v)=0$ for all $v\in V(G)$, and for all $i\in\mathbb{N}$ we let

$$\operatorname{\gamma}_{i+1}(v)\;\coloneqq\;\bigl(\operatorname{\gamma}_{i}(v),\,\{\!\!\{\,\operatorname{\gamma}_{i}(w)\;:\;\{v,w\}\in E(G)\,\}\!\!\}\bigr).$$

This formalizes the procedure described above. If $\operatorname{\gamma}_{i}(u)\neq\operatorname{\gamma}_{i}(v)$, then $\operatorname{\gamma}_{i+1}(u)\neq\operatorname{\gamma}_{i+1}(v)$. If $u$ and $v$ disagree in the number of neighbors of some color, then $\{\!\!\{\operatorname{\gamma}_{i}(w)\;:\;\{u,w\}\in E(G)\}\!\!\}\neq\{\!\!\{\operatorname{\gamma}_{i}(w)\;:\;\{v,w\}\in E(G)\}\!\!\}$, and hence $\operatorname{\gamma}_{i+1}(u)\neq\operatorname{\gamma}_{i+1}(v)$. This formalization has the additional advantage that the colorings $\operatorname{\gamma}_{i}$ assign canonical colors, i.e., the colors themselves do not depend on the vertex set of $G$. Cardon and Crochemore [11] showed that CR can be implemented to run in time $\mathcal{O}((n+m)\cdot\log(n))$, where $n$ denotes the number of vertices and $m$ the number of edges. Berkholz, Bonsma, Grohe [5] showed that even a canonical coloring can be computed within the same running time.
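The multiset formulation translates almost literally into code. The following Python sketch uses illustrative function names that do not come from the paper. It encodes each multiset of neighbor colors as a sorted tuple, so the colors stay canonical nested tuples. This is adequate for small examples, but the nesting grows with every round. Practical implementations therefore typically replace such colors by integer labels.

```python
def cr_round(adj, colors):
    """One CR round: new color = (old color, sorted tuple of neighbor colors).

    Sorting turns the multiset {{gamma_i(w) : {v, w} in E(G)}} into a canonical
    tuple, so equal multisets get equal encodings.
    """
    return {v: (colors[v], tuple(sorted(colors[w] for w in adj[v]))) for v in adj}


def color_refinement(vertices, edges):
    """Run CR from the uniform coloring gamma_0 = 0 until the partition is stable.

    Returns [gamma_0, ..., gamma_s], where gamma_s is the first stable coloring.
    """
    adj = {v: [] for v in vertices}
    for u, w in edges:
        adj[u].append(w)
        adj[w].append(u)
    history = [{v: 0 for v in adj}]
    while True:
        nxt = cr_round(adj, history[-1])
        # gamma_{i+1} refines gamma_i, so the two partitions coincide exactly
        # when the number of color classes does not grow.
        if len(set(nxt.values())) == len(set(history[-1].values())):
            return history
        history.append(nxt)


# Path 0 - 1 - 2 - 3: endpoints and inner vertices end up in different classes.
stable = color_refinement(range(4), [(0, 1), (1, 2), (2, 3)])[-1]
print(stable[0] == stable[3], stable[1] == stable[2], stable[0] == stable[1])  # True True False
```

Stability is detected by comparing the number of classes. This test suffices because each coloring refines the previous one.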

Applications.

An obvious application of CR is testing for graph isomorphism. Suppose there exist an $i\in\mathbb{N}$ and a color $c$ such that $\lvert\{v\in V(G)\;:\;\operatorname{\gamma}_{i}(v)=c\}\rvert\neq\lvert\{v\in V(H)\;:\;\operatorname{\gamma}_{i}(v)=c\}\rvert$; in this case, we say that CR distinguishes $G$ and $H$. Then $G$ and $H$ cannot be isomorphic. However, this test is not perfect, since there exist pairs of non-isomorphic graphs that are not distinguished by CR. Nevertheless, CR is a common subroutine in practical isomorphism testers, and it even plays a part in Babai’s seminal result that graph isomorphism is decidable in quasi-polynomial time [3]. In recent years, the classification of “similar vertices” that CR establishes has been applied to other problems as well. It was used in [23, 22] to reduce the cost of solving linear programs, and in [31] to speed up the evaluation of binary acyclic conjunctive queries. In machine learning, it is used as a graph kernel [37], and it was proven to be equivalent to so-called Graph Neural Networks (GNNs) w.r.t. vertex classification [38, 21].
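A hedged sketch of the resulting isomorphism test follows. It computes the canonical colorings of both graphs round by round and compares the color counts. All function names are illustrative. The 6-cycle and two disjoint triangles are both 2-regular on six vertices. They form a standard pair of non-isomorphic graphs that CR does not distinguish.

```python
from collections import Counter


def cr_colorings(vertices, edges, rounds):
    """Canonical CR colorings gamma_0, ..., gamma_rounds of a simple graph."""
    adj = {v: [] for v in vertices}
    for u, w in edges:
        adj[u].append(w)
        adj[w].append(u)
    colors = {v: 0 for v in adj}
    result = [colors]
    for _ in range(rounds):
        colors = {v: (colors[v], tuple(sorted(colors[w] for w in adj[v]))) for v in adj}
        result.append(colors)
    return result


def cr_distinguishes(g, h):
    """Return True if some round i and color c have different counts in g and h.

    g and h are (vertices, edges) pairs. Checking rounds 0..n, with n the larger
    vertex count, is enough: each coloring is stable after at most n - 1 rounds,
    and one further round settles the comparison.
    """
    n = max(len(g[0]), len(h[0]))
    for cg, ch in zip(cr_colorings(*g, n), cr_colorings(*h, n)):
        if Counter(cg.values()) != Counter(ch.values()):
            return True
    return False


cycle6 = (list(range(6)), [(i, (i + 1) % 6) for i in range(6)])
two_triangles = (list(range(6)), [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)])
print(cr_distinguishes(cycle6, two_triangles))  # False: CR misses this pair
```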

The power of CR.

The power of CR is well understood [2, 5, 10, 15, 26, 28]; consult e.g. [19, 27] for an overview. Key results on the distinguishing power of CR include a logical characterization due to Immerman and Lander [26] and Cai, Fürer, Immerman [10]. They also include a combinatorial characterization w.r.t. the concept of “homomorphism indistinguishability” due to Dvořák [15] and Dell, Grohe, Rattan [13]. In [10, 26] it is shown that CR distinguishes $G$ and $H$ if, and only if, there is a sentence $\varphi$ of first-order logic with counting quantifiers and at most 2 variables ($\mathsf{C}^{2}$, for short) such that $G\models\varphi$ and $H\not\models\varphi$. In [15, 13] it is shown that CR distinguishes two graphs $G$ and $H$ if, and only if, $G$ and $H$ differ in the number of homomorphisms from some tree $T$. If no such $T$ exists, one says that $G$ and $H$ are homomorphism indistinguishable over the class of trees. This result sparked active research in recent years exploring the concept of homomorphism indistinguishability over various graph classes, see e.g. [12, 16, 20, 29, 30, 32, 36, 33, 34]. These characterizations help explain the success of the CR method. In particular, they hint at why the vertex classification produced by CR is so powerful. Two vertices $u,v\in V(G)$ are classified as “similar” by CR if, and only if, for every rooted tree $T$, the number of homomorphisms from $T$ into $G$ that map the root to $u$ equals the number of homomorphisms from $T$ into $G$ that map the root to $v$.
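To make the tree characterization concrete, the sketch below counts homomorphisms from a rooted tree using the usual bottom-up dynamic program. For comparison, it also counts homomorphisms from a triangle by brute force. The helper names are hypothetical. Consider again the 6-cycle and two disjoint triangles, which CR does not distinguish. Every tree $T$ has $6\cdot 2^{\lvert E(T)\rvert}$ homomorphisms into a 2-regular graph on six vertices, so all tree counts agree. The triangle, which is not a tree, separates the two graphs.

```python
from itertools import product


def hom_tree(tree_adj, root, graph_adj):
    """Number of homomorphisms from a tree into a graph, by dynamic programming.

    h(t, v) counts homomorphisms of the subtree below t that map t to v: it is the
    product over the children c of t of the sum of h(c, w) over neighbors w of v.
    """
    def h(t, parent, v):
        result = 1
        for c in tree_adj[t]:
            if c != parent:
                result *= sum(h(c, t, w) for w in graph_adj[v])
        return result

    return sum(h(root, None, v) for v in graph_adj)


def hom_brute(pattern_vertices, pattern_edges, graph_adj):
    """Number of homomorphisms from an arbitrary small pattern, by trying all maps."""
    count = 0
    for image in product(list(graph_adj), repeat=len(pattern_vertices)):
        f = dict(zip(pattern_vertices, image))
        if all(f[y] in graph_adj[f[x]] for x, y in pattern_edges):
            count += 1
    return count


def adjacency(n, edges):
    adj = {v: set() for v in range(n)}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    return adj


cycle6 = adjacency(6, [(i, (i + 1) % 6) for i in range(6)])
two_triangles = adjacency(6, [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)])
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}  # K_{1,3}, rooted at its center
print(hom_tree(star, 0, cycle6), hom_tree(star, 0, two_triangles))  # 48 48
triangle = ([0, 1, 2], [(0, 1), (1, 2), (2, 0)])
print(hom_brute(*triangle, cycle6), hom_brute(*triangle, two_triangles))  # 0 12
```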

Contributions.

With the success of CR in mind, a natural question is how to devise a method that colors arbitrary finite relational structures, not just graphs. In particular, we would like a method that admits two characterizations analogous to those of CR mentioned above. The first is a combinatorial characterization w.r.t. homomorphism counts from the class of acyclic connected relational structures, for a sensible, broad definition of acyclicity. The second is a logical characterization for a sensible logic. We propose Relational Color Refinement (RCR, for short) as such a method. We show that it indeed admits the desired characterizations and, at the same time, can be implemented to run in time comparable to that of classical CR.
Our main result reads as follows.

Theorem A.

Let $\sigma$ be a finite relational signature.

(a) RCR can be implemented to run in time $\mathcal{O}(N\cdot\log N)$ upon input of a $\sigma$-structure $\mathcal{A}$, where $N$ denotes the number of tuples present in $\mathcal{A}$.

(b) For all $\sigma$-structures $\mathcal{A}$ and $\mathcal{B}$, the following statements are equivalent:

(1) RCR distinguishes $\mathcal{A}$ and $\mathcal{B}$.

(2) There exists an acyclic and connected $\sigma$-structure $\mathcal{C}$ such that $\operatorname{hom}(\mathcal{C},\mathcal{A})\neq\operatorname{hom}(\mathcal{C},\mathcal{B})$.

(3) There exists a sentence $\varphi\in{\mathsf{GF}}(\mathsf{C})$ such that $\mathcal{A}\models\varphi$ and $\mathcal{B}\not\models\varphi$.

(4) Spoiler wins the Guarded Game on $\mathcal{A},\mathcal{B}$.

Here, $\operatorname{hom}(\mathcal{C},\mathcal{A})$ denotes the number of homomorphisms from $\mathcal{C}$ to $\mathcal{A}$ (cf. Section 4), while ${\mathsf{GF}}(\mathsf{C})$ is the guarded fragment of the logic $\mathsf{C}$ (cf. Section 5.1), and the Guarded Game is a particular variant of Ehrenfeucht-Fraïssé games (cf. Section 5.2). The technically most challenging parts are proving (a) and proving the equivalence of (1) and (2) in (b).

Section 2 introduces the necessary basic concepts and notations used throughout the paper. In Section 3 we introduce RCR, discuss its connection to CR, and provide a proof of Theorem A (a). Section 4 is devoted to the proof of the equivalence of statements (1) and (2) of Theorem A (b). Section 5 starts by introducing the logic ${\mathsf{GF}}(\mathsf{C})$ and the Guarded Game, followed by the proof of the equivalence of statements (1), (3) and (4) of Theorem A (b). Finally, we conclude in Section 6 with a summary and an outlook on future work. Due to space limitations, many details had to be deferred to the paper’s full version.

Further related work.

The articles [2, 28] studied related but different questions. These concern not the power of distinguishing two given graphs, but rather the power of distinguishing one given graph from all other graphs. Arvind, Köbler, Rattan, Verbitsky [2] provided a characterization of those graphs $G$ that are amenable to CR, in the sense that CR distinguishes $G$ from every graph $H$ that is not isomorphic to $G$. Kiefer, Schweitzer, Selman [28] provided a characterization of those graphs and, more generally, finite relational structures that can be identified (i.e., described up to isomorphism) by a sentence of the logic $\mathsf{C}^{2}$. In [2] it is noted that, by the results of [26, 10], “amenability to CR” and “identifiability by $\mathsf{C}^{2}$” are equivalent notions.

The work by Dell, Grohe, Rattan [13] has been generalized to relational structures by Butti and Dalmau [8]. However, they apply classical CR on the incidence graph of the relational structure and use the weaker notion of Berge-acyclicity that is subsumed by our notion of acyclicity; consequently, the distinguishing power of their algorithm is considerably weaker than that of RCR.

There is also related work in this direction on hypergraphs, which are conceptually similar to relational structures. Böker [9] introduced a variant of CR that works on hypergraphs, and showed that it distinguishes two hypergraphs if, and only if, there is a Berge-acyclic hypergraph that has a different number of homomorphisms to them. Again, the distinguishing power yielded by Berge-acyclicity is considerably weaker than that of the more general acyclicity notion considered in our work. The connection between the logic $\mathsf{C}^{2}$ and homomorphism counts from trees due to [15, 13] was generalized to the logic $\mathsf{GC}$ (a logic similar to the guarded fragment ${\mathsf{GF}}(\mathsf{C})$ , but tailored towards hypergraphs) and homomorphism counts from acyclic hypergraphs by Scheidt and Schweikardt [34].

Riveros, Scheidt, Schweikardt [31] use CR to speed up the evaluation of acyclic conjunctive queries on edge- and vertex-labeled graphs. Beyond CR itself, there is no overlap in the technical contributions between this paper and [31].

2 Preliminaries

Basic notation.

We write $\mathbb{N}$ for the set of non-negative integers, and we let $\mathbb{N}_{\geqslant{}1}\coloneqq\mathbb{N}\setminus\{{0}\}$ . For $n\in\mathbb{N}$ we write $[n]$ to denote the set $\{\,{i\in\mathbb{N}\;:\;1\leqslant i\leqslant n}\,\}$ (i.e., $[0]=\varnothing$ , $[1]=\{{1}\}$ , and $[n]=\{\,{1,\ldots,n}\,\}$ for $n\geqslant 2$ ). For a set $S$ we write $2^{S}$ to denote the power set (i.e., the set of all subsets) of $S$ ; and for $k\in\mathbb{N}$ we let $\binom{S}{k}\coloneqq\{\,{X\subseteq S\;:\;|X|=k}\,\}$ . We use bold letters ${\bm{a}}$ to denote tuples $(a_{1},\dots,a_{k})$ . The tuple’s arity $\operatorname{ar}({\bm{a}})$ is $k$ , and $a_{i}$ denotes the tuple’s $i$ -th entry (for $i\in[k]$ ). We let $\operatorname{set}({\bm{a}})=\{\,{a_{1},\ldots,a_{k}}\,\}$ . We write $\{\,{a_{1}\to b_{1},\;\ldots,\;a_{k}\to b_{k}}\,\}$ to describe the function $f\colon\{\,{a_{1},\ldots,a_{k}}\,\}\to\{\,{b_{1},\ldots,b_{k}}\,\}$ with $f(a_{i})=b_{i}$ for $i\in[k]$ .

A multiset $M$ is a pair $(S,f)$, where $S$ is a set and $f$ is a function $f\colon S\to\mathbb{N}_{\scriptscriptstyle\geqslant 1}$. The number $f(s)$ indicates the multiplicity with which the element $s\in S$ occurs in the multiset $M$. We write $\operatorname{mult}_{M}(x)$ to denote the multiplicity of $x$ in the multiset $M$, and we let $\operatorname{mult}_{M}(x)=0$ for $x\not\in S$. We adopt the usual notation for multisets using brackets $\{\!\!\{\,\cdots\,\}\!\!\}$, in which each $s\in S$ is listed exactly $\operatorname{mult}_{M}(s)$ times. E.g., $\{\!\!\{\,a,a,b\,\}\!\!\}$ denotes the multiset $(\{a,b\},\{a\to 2,b\to 1\})$.

A coloring of a set $S$ is a function $\gamma\colon S\to C$ for some set $C$ . Let $\alpha\colon S\to C_{\alpha}$ and $\beta\colon S\to C_{\beta}$ be two colorings of the same set $S$ . We say that $\alpha$ refines $\beta$ , if for all $u,v\in S$ : $\alpha(u)=\alpha(v)\implies\beta(u)=\beta(v)$ . The colorings $\alpha$ and $\beta$ are equivalent, if for all $u,v\in S$ : $\alpha(u)=\alpha(v)\iff\beta(u)=\beta(v)$ .
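Both notions depend only on the partitions induced by the colorings, so they can be checked in a single pass. The following is a minimal Python sketch. It represents a coloring as a dict from the elements of $S$ to colors, and its function names are illustrative.

```python
def refines(alpha, beta):
    """True iff alpha(u) == alpha(v) implies beta(u) == beta(v) for all u, v.

    Equivalently, every alpha-class lies inside one beta-class, so it suffices
    to remember, for each alpha-color, the beta-color seen first.
    """
    beta_of = {}
    for u in alpha:
        if beta_of.setdefault(alpha[u], beta[u]) != beta[u]:
            return False
    return True


def equivalent(alpha, beta):
    """True iff alpha and beta induce the same partition of S."""
    return refines(alpha, beta) and refines(beta, alpha)


alpha = {1: "a", 2: "b", 3: "b"}
beta = {1: 0, 2: 0, 3: 0}
gamma = {1: 7, 2: 9, 3: 9}
print(refines(alpha, beta), refines(beta, alpha), equivalent(alpha, gamma))  # True False True
```

Note that the actual color values are irrelevant: `alpha` and `gamma` use different colors but are equivalent.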

An (uncolored, undirected, simple) graph is a tuple $G\coloneqq(V(G),E(G))$ , where $V(G)$ is a finite set of vertices and $E(G)\subseteq\binom{V(G)}{2}$ is a set of edges. A forest is an acyclic graph; and a tree is a connected forest.

Relational Structures.

A (finite, relational) signature $\sigma$ is a finite, non-empty set; the elements of $\sigma$ are called relation symbols. Every $R\in\sigma$ has an associated arity $\operatorname{ar}(R)\in\mathbb{N}_{\scriptscriptstyle\geqslant 1}$. The arity of $\sigma$ is defined as $\operatorname{ar}(\sigma)\coloneqq\max\{\operatorname{ar}(R)\;:\;R\in\sigma\}$. For $k\in\mathbb{N}_{\scriptscriptstyle\geqslant 1}$, we denote by ${\sigma|_{k}}$ the subset $\{R\in\sigma\;:\;\operatorname{ar}(R)=k\}$ of relation symbols of $\sigma$ with arity exactly $k$.

A structure $\mathcal{A}$ of signature $\sigma$ (for short, $\sigma$-structure) consists of a finite, non-empty set $V(\mathcal{A})$ (called the universe of $\mathcal{A}$) and a relation $R^{\mathcal{A}}\subseteq{V(\mathcal{A})}^{\operatorname{ar}(R)}$ for every $R\in\sigma$. We additionally require that every $v\in V(\mathcal{A})$ occurs as an entry in at least one tuple of at least one relation of $\mathcal{A}$. This assumption can easily be met, e.g., by inserting into $\sigma$ a new relation symbol $U$ of arity 1 with $U^{\mathcal{A}}=V(\mathcal{A})$ (here, we identify a tuple $(v)$ of arity 1 with the element $v$). By ${\bm{A}}$ we denote the set $\bigcup_{R\in\sigma}R^{\mathcal{A}}$ of all tuples that belong to some relation of $\mathcal{A}$. We define the size of $\mathcal{A}$ as $\lVert\mathcal{A}\rVert\coloneqq\lvert{\bm{A}}\rvert$.¹ We say that two $\sigma$-structures $\mathcal{A}$, $\mathcal{B}$ have strictly equal size if $\lvert R^{\mathcal{A}}\rvert=\lvert R^{\mathcal{B}}\rvert$ for every $R\in\sigma$.

¹ In the literature, $\lVert\mathcal{A}\rVert$ usually denotes the size of a reasonable representation of $\mathcal{A}$ as input for an algorithm. We consider $\sigma$ to be a fixed signature and restrict attention to structures where each $v\in V(\mathcal{A})$ occurs in at least one tuple in ${\bm{A}}$. Hence, within the $\mathcal{O}$-notation, our notion of $\lVert\mathcal{A}\rVert$ is equivalent to the one used in the literature.

The Gaifman graph of $\mathcal{A}$ is defined as the (undirected, simple) graph $G$ with $V(G)=V(\mathcal{A})$ and where $E(G)$ consists of all $\{\,{u,v}\,\}\in\binom{V(G)}{2}$ for which there is a tuple ${\bm{a}}\in{\bm{A}}$ with $u,v\in\operatorname{set}({\bm{a}})$ . A $\sigma$ -structure is called connected if its Gaifman graph is connected.
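The sketch below assumes that a structure is given as a Python dict mapping each relation symbol to a set of tuples. As required above, it also assumes that every element of the universe occurs in some tuple, so the universe can be read off the tuples. It computes ${\bm{A}}$, $\lVert\mathcal{A}\rVert$, the Gaifman graph, and connectivity. It is an illustration with hypothetical names, not an implementation from the paper. The usage example consists of two triangles joined by a single 6-ary tuple.

```python
from itertools import combinations


def tuples_of(structure):
    """The set A of all tuples occurring in some relation of the structure."""
    return set().union(*structure.values())


def size(structure):
    """||A|| = |A|; a tuple that belongs to several relations is counted once."""
    return len(tuples_of(structure))


def gaifman_edges(structure):
    """Edges {u, v} (u != v) of the Gaifman graph: u and v occur in a common tuple."""
    return {frozenset(p) for a in tuples_of(structure) for p in combinations(set(a), 2)}


def is_connected(structure):
    """Connectivity of the Gaifman graph (the universe is read off the tuples)."""
    universe = {x for a in tuples_of(structure) for x in a}
    adj = {x: set() for x in universe}
    for u, v in map(tuple, gaifman_edges(structure)):
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(universe))
    seen, stack = {start}, [start]
    while stack:
        for y in adj[stack.pop()] - seen:
            seen.add(y)
            stack.append(y)
    return seen == universe


triangles = {(1, 2), (2, 3), (3, 1), ("u", "v"), ("v", "w"), ("w", "u")}
A1 = {"E": triangles, "R": {(1, 2, 3, "u", "v", "w")}}
print(size(A1), is_connected(A1), is_connected({"E": triangles}))  # 7 True False
```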

A binary signature is a signature $\sigma$ where every $R\in\sigma$ has arity $\leqslant 2$ . A colored multigraph $\mathcal{G}$ is a structure of a binary signature. The binary relations of $\mathcal{G}$ can be viewed as directed edge relations that carry specific labels, and the unary relations of $\mathcal{G}$ can be viewed as assigning specific labels to the vertices of $\mathcal{G}$ .

Color Refinement (CR).

Color Refinement (CR) can be adapted to colored multigraphs by including the vertex labels and the loops in the base color, and the edge labels in the iteration. This can be formalized as follows. Let $\sigma$ be a binary signature, and let $\mathcal{G}$ be a $\sigma$-structure. For every $v\in V(\mathcal{G})$, let $\operatorname{\gamma}_{0}(v)\coloneqq\bigl(\{C\in{\sigma|_{1}}\;:\;v\in C^{\mathcal{G}}\},\,\{R\in{\sigma|_{2}}\;:\;(v,v)\in R^{\mathcal{G}}\}\bigr)$, and for all $i\in\mathbb{N}$ let

$$\operatorname{\gamma}_{i+1}(v)\;\coloneqq\;\bigl(\operatorname{\gamma}_{i}(v),\,\{\!\!\{\,(\lambda(v,w),\operatorname{\gamma}_{i}(w))\;:\;\{v,w\}\in E(G)\,\}\!\!\}\bigr),$$

where $G$ denotes the Gaifman graph of $\mathcal{G}$ and

$$\lambda(v,w)\;\coloneqq\;\bigl\{R^{+}\;:\;R\in{\sigma|_{2}},\ (v,w)\in R^{\mathcal{G}}\bigr\}\;\cup\;\bigl\{R^{-}\;:\;R\in{\sigma|_{2}},\ (w,v)\in R^{\mathcal{G}}\bigr\}.$$
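A minimal Python sketch of this variant follows. It assumes that unary relations are given as sets of 1-tuples `(v,)` and binary relations as sets of pairs. It encodes $R^{+}$ and $R^{-}$ as the pairs `(R, "+")` and `(R, "-")`. All names are illustrative. In an undirected path the two endpoints would share a color, but on the directed path used below, the signs in $\lambda$ separate the source from the sink.

```python
from collections import Counter


def cr_multigraph(unary, binary):
    """CR on a colored multigraph given by unary and binary relations.

    unary maps symbols to sets of 1-tuples (v,); binary maps symbols to sets of
    pairs. Returns [gamma_0, ..., gamma_s] with gamma_s the first stable coloring.
    """
    universe = {x for rel in (*unary.values(), *binary.values()) for t in rel for x in t}

    def lam(v, w):
        # lambda(v, w): R+ for (v, w) in R and R- for (w, v) in R, as (R, sign) pairs.
        return frozenset([(R, "+") for R, rel in binary.items() if (v, w) in rel]
                         + [(R, "-") for R, rel in binary.items() if (w, v) in rel])

    nbrs = {v: set() for v in universe}  # Gaifman graph, loops excluded
    for rel in binary.values():
        for v, w in rel:
            if v != w:
                nbrs[v].add(w)
                nbrs[w].add(v)
    # gamma_0: unary labels of v, and binary labels of the loop (v, v).
    gamma = {v: (frozenset(C for C, rel in unary.items() if (v,) in rel),
                 frozenset(R for R, rel in binary.items() if (v, v) in rel))
             for v in universe}
    history = [gamma]
    while True:
        # The multiset of (lambda, color) pairs is encoded as a frozenset of Counter items.
        nxt = {v: (gamma[v], frozenset(Counter((lam(v, w), gamma[w]) for w in nbrs[v]).items()))
               for v in universe}
        if len(set(nxt.values())) == len(set(gamma.values())):
            return history
        gamma = nxt
        history.append(gamma)


h = cr_multigraph({}, {"E": {(0, 1), (1, 2)}})  # directed path 0 -> 1 -> 2
print(len(set(h[-1].values())))  # 3
```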

Types.

The notions of atomic type and similarity type of tuples will be crucial for the definition of RCR. Let $\sigma$ be an arbitrary signature. For a $\sigma$-structure $\mathcal{A}$ and a tuple ${\bm{a}}\in{V(\mathcal{A})}^{k}$ of arity $k$, the atomic type $\mathsf{atp}({\bm{a}})$ is the set $\{R\in\sigma\;:\;{\bm{a}}\in R^{\mathcal{A}}\}$. For every tuple ${\bm{b}}\in{V(\mathcal{A})}^{\ell}$ of arity $\ell$, the similarity type $\mathsf{stp}({\bm{a}},{\bm{b}})$ between ${\bm{a}}$ and ${\bm{b}}$ is defined as the set $\{(i,j)\in[k]\times[\ell]\;:\;a_{i}=b_{j}\}$. We use $\mathsf{stp}({\bm{a}})$ as shorthand for $\mathsf{stp}({\bm{a}},{\bm{a}})$.

In general, an atomic type $\rho$ of arity $k$ (over signature $\sigma$ ) is a subset of ${\sigma|_{k}}$ . A similarity type $\tau$ of arity $(k,\ell)$ (over signature $\sigma$ ) is a subset of $[k]\times[\ell]$ that satisfies the following condition of “rectangularity”: for all $i,i^{\prime}\in[k]$ and $j,j^{\prime}\in[\ell]$ , if $\{\,{(i,j),(i^{\prime},j),(i,j^{\prime})}\,\}\subseteq\tau$ then $(i^{\prime},j^{\prime})\in\tau$ . We write $\operatorname{ar}(\rho)$ and $\operatorname{ar}(\tau)$ to denote the arity of $\rho$ and $\tau$ . If $\tau$ has arity $(k,k)$ , we simply say that $\tau$ has arity $k$ and write $\operatorname{ar}(\tau)=k$ . We let $\mathsf{STP}_{\sigma}$ be the set of all similarity types of arity $(\operatorname{ar}(R),\operatorname{ar}(S))$ for any $R,S\in\sigma$ . We say that a tuple ${\bm{a}}\in{V(\mathcal{A})}^{k}$ has atomic type $\rho$ if $\rho=\mathsf{atp}({\bm{a}})$ . Analogously, we say that ${\bm{a}}$ has similarity type $\tau$ , if $\tau=\mathsf{stp}({\bm{a}})$ ; and we say that ${\bm{a}}$ , ${\bm{b}}$ have similarity type $\tau$ , if $\tau=\mathsf{stp}({\bm{a}},{\bm{b}})$ .
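The two type notions and the rectangularity condition can be checked mechanically. The following Python sketch uses hypothetical helper names and 1-indexed positions as in the text. The condition holds for every $\mathsf{stp}({\bm{a}},{\bm{b}})$: from $a_i=b_j$, $a_{i'}=b_j$ and $a_i=b_{j'}$ it follows that $a_{i'}=b_{j'}$. The tuples below are those of Example 3.5.

```python
from itertools import product


def atp(structure, a):
    """Atomic type of a: the relation symbols R with a in R^A."""
    return frozenset(R for R, rel in structure.items() if a in rel)


def stp(a, b=None):
    """Similarity type of (a, b): pairs (i, j), 1-indexed, with a_i == b_j; stp(a) = stp(a, a)."""
    if b is None:
        b = a
    return frozenset((i, j) for i, x in enumerate(a, 1) for j, y in enumerate(b, 1) if x == y)


def is_rectangular(tau):
    """If (i, j), (i2, j) and (i, j2) lie in tau, then (i2, j2) must lie in tau."""
    return all((i2, j2) in tau
               for (i, j), (i2, jj), (ii, j2) in product(tau, repeat=3)
               if jj == j and ii == i)


a, b = (1, 1, 2), (2, 3, 2)
print(sorted(stp(a, b)), sorted(stp(a)))
# [(3, 1), (3, 3)] [(1, 1), (1, 2), (2, 1), (2, 2), (3, 3)]
```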

3 Color Refinement on Relational Structures

The goal of this section is to introduce RCR as a generalization of CR from graphs to relational structures. Let $\sigma$ be an arbitrary (relational) signature; this $\sigma$ will be fixed throughout the rest of the paper.

3.1 Relational Color Refinement (RCR, for short) – Definition

Consider an arbitrary $\sigma$ -structure $\mathcal{A}$ . The key idea of RCR is to color the tuples in ${\bm{A}}$ and take into account the tuples’ atomic type and their mutual overlap (the latter is formalized by their similarity type). The details are as follows.

We iteratively compute colors for every tuple ${\bm{a}}\in{\bm{A}}$. For every ${\bm{a}}\in{\bm{A}}$, the initial color consists of the atomic type and the similarity type of ${\bm{a}}$, i.e., it is $\operatorname{\varrho}_{0}({\bm{a}})\coloneqq(\mathsf{atp}({\bm{a}}),\mathsf{stp}({\bm{a}}))$. For $i\in\mathbb{N}_{\geqslant{}1}$, the color after $i$ iterations is defined as $\operatorname{\varrho}_{i}({\bm{a}})\coloneqq(\operatorname{\varrho}_{i-1}({\bm{a}}),N_{i}^{\mathcal{A}}({\bm{a}}))$, where

$$N_{i}^{\mathcal{A}}({\bm{a}})\;\coloneqq\;\big\{\!\!\big\{\,\bigl(\mathsf{stp}({\bm{a}},{\bm{b}}),\,\operatorname{\varrho}_{i-1}({\bm{b}})\bigr)\;:\;{\bm{b}}\in{\bm{A}},\ \mathsf{stp}({\bm{a}},{\bm{b}})\neq\varnothing\,\big\}\!\!\big\}.$$

Note that $\mathsf{stp}({\bm{a}},{\bm{b}})\neq\varnothing\iff\operatorname{set}({\bm{a}})\cap\operatorname{set}({\bm{b}})\neq\varnothing$. In particular, ${\bm{b}}={\bm{a}}$ always contributes to $N_{i}^{\mathcal{A}}({\bm{a}})$. By definition, $\operatorname{\varrho}_{i}$ refines $\operatorname{\varrho}_{i-1}$ for all $i\in\mathbb{N}_{\geqslant{}1}$. The $i$-th coloring is stable if for all ${\bm{a}},{\bm{b}}\in{\bm{A}}$ we have $\operatorname{\varrho}_{i}({\bm{a}})=\operatorname{\varrho}_{i}({\bm{b}})\iff\operatorname{\varrho}_{i+1}({\bm{a}})=\operatorname{\varrho}_{i+1}({\bm{b}})$. For every $\sigma$-structure $\mathcal{A}$ there is an $i\in\mathbb{N}$ such that the $i$-th coloring is stable. This holds because $\operatorname{\varrho}_{i+1}$ refines $\operatorname{\varrho}_{i}$, so every round that is not stable strictly increases the number of color classes, and there are at most $\lVert\mathcal{A}\rVert$ classes. We let $i_{\mathcal{A}}$ be the smallest such number (so $i_{\mathcal{A}}<\lVert\mathcal{A}\rVert$), and we write $\operatorname{\varrho}_{\infty}({\bm{a}})$ to denote $\operatorname{\varrho}_{i_{\mathcal{A}}}({\bm{a}})$.
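The definition can be executed directly. The Python sketch below is a naive illustration that takes quadratic time per round; it is not the $\mathcal{O}(N\log N)$ implementation of Section 3.3. It assumes the dict-of-sets representation of a structure and 1-indexed similarity types. It encodes each multiset as a frozenset of (element, multiplicity) pairs, so that colors are hashable and canonical. The function names are hypothetical.

```python
from collections import Counter


def atp(structure, a):
    """Atomic type of a: all relation symbols R with a in R^A."""
    return frozenset(R for R, rel in structure.items() if a in rel)


def stp(a, b):
    """Similarity type of (a, b): all pairs (i, j) with a_i == b_j, 1-indexed."""
    return frozenset((i, j) for i, x in enumerate(a, 1) for j, y in enumerate(b, 1) if x == y)


def multiset(items):
    """Hashable, canonical encoding of a multiset."""
    return frozenset(Counter(items).items())


def rcr(structure):
    """Naive Relational Color Refinement.

    structure maps each relation symbol to a set of tuples. Returns the
    colorings [rho_0, ..., rho_{i_A}] of the tuple set A as dicts.
    """
    tuples = list(set().union(*structure.values()))  # the set A
    # For each a: all b in A (including a itself) sharing an element with a.
    nbrs = {a: [(stp(a, b), b) for b in tuples if set(a) & set(b)] for a in tuples}
    rho = {a: (atp(structure, a), stp(a, a)) for a in tuples}
    history = [rho]
    while True:
        nxt = {a: (rho[a], multiset((t, rho[b]) for t, b in nbrs[a])) for a in tuples}
        # rho_{i+1} refines rho_i, so equal class counts mean equal partitions.
        if len(set(nxt.values())) == len(set(rho.values())):
            return history
        rho = nxt
        history.append(rho)


h = rcr({"E": {(1, 2), (2, 3)}})
print(len(h) - 1, h[-1][(1, 2)] == h[-1][(2, 3)])  # 1 False
```

Here $i_{\mathcal{A}}=1$. The two tuples are separated in round 1 because $(1,2)$ meets $(2,3)$ with similarity type $\{(2,1)\}$, whereas $(2,3)$ meets $(1,2)$ with $\{(1,2)\}$.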

For $i\in\mathbb{N}$ we write $\mathsf{RC}_{i}(\mathcal{A})$ to denote the set of colors produced in the $i$-th refinement round, i.e., $\mathsf{RC}_{i}(\mathcal{A})=\{\operatorname{\varrho}_{i}({\bm{a}})\;:\;{\bm{a}}\in{\bm{A}}\}$. For each $c\in\mathsf{RC}_{i}(\mathcal{A})$ we let $\operatorname{mult}_{\mathcal{A}}(c)\coloneqq\lvert\{{\bm{a}}\in{\bm{A}}\;:\;\operatorname{\varrho}_{i}({\bm{a}})=c\}\rvert$, i.e., $\operatorname{mult}_{\mathcal{A}}(c)$ is the number of tuples with color $c$. We let $\mathsf{RC}(\mathcal{A})\coloneqq\mathsf{RC}_{\infty}(\mathcal{A})\coloneqq\mathsf{RC}_{i_{\mathcal{A}}}(\mathcal{A})$, and we call this the set of stable colors on $\mathcal{A}$ produced by RCR.

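We say that RCR distinguishes two $\sigma$-structures $\mathcal{A}$ and $\mathcal{B}$ in round $i$ if there is a color $c\in\mathsf{RC}_{i}(\mathcal{A})\cup\mathsf{RC}_{i}(\mathcal{B})$ such that $\operatorname{mult}_{\mathcal{A}}(c)\neq\operatorname{mult}_{\mathcal{B}}(c)$, where $\operatorname{mult}_{\mathcal{A}}(c)=0$ if $c\notin\mathsf{RC}_{i}(\mathcal{A})$. Furthermore, we say that RCR distinguishes $\mathcal{A}$ and $\mathcal{B}$ if there is an $i\in\mathbb{N}$ such that RCR distinguishes $\mathcal{A}$ and $\mathcal{B}$ in round $i$. It suffices to check the rounds $i\leqslant\max\{i_{\mathcal{A}},i_{\mathcal{B}}\}+1$. Once both colorings are stable, agreement of all color counts in one further round carries over to all later rounds. Stopping at $\max\{i_{\mathcal{A}},i_{\mathcal{B}}\}$ would not be enough. For $\sigma=\{R\}$ with $\operatorname{ar}(R)=2$, the structures with $R^{\mathcal{A}}=\{(1,2),(3,4)\}$ and $R^{\mathcal{B}}=\{(1,2),(2,1)\}$ satisfy $i_{\mathcal{A}}=i_{\mathcal{B}}=0$ but are distinguished in round 1. It is straightforward to see that if $\mathcal{A}$ and $\mathcal{B}$ are not of strictly equal size, then RCR distinguishes $\mathcal{A}$ and $\mathcal{B}$ in round 0.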
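The round-0 claim follows from a simple count. Since $\operatorname{\varrho}_{0}({\bm{a}})=(\mathsf{atp}({\bm{a}}),\mathsf{stp}({\bm{a}}))$ records the atomic type, every tuple of $R^{\mathcal{A}}$ is counted exactly once on the right-hand side below (with $\operatorname{mult}_{\mathcal{A}}(c)=0$ for colors not occurring in $\mathcal{A}$):

$$\lvert R^{\mathcal{A}}\rvert\;=\;\sum_{\substack{c=(\rho,\tau)\in\mathsf{RC}_{0}(\mathcal{A})\cup\mathsf{RC}_{0}(\mathcal{B})\\ R\in\rho}}\operatorname{mult}_{\mathcal{A}}(c)\qquad\text{for every }R\in\sigma.$$

The same identity holds with $\mathcal{B}$ in place of $\mathcal{A}$. Hence, if $\operatorname{mult}_{\mathcal{A}}(c)=\operatorname{mult}_{\mathcal{B}}(c)$ for all round-0 colors $c$, then $\lvert R^{\mathcal{A}}\rvert=\lvert R^{\mathcal{B}}\rvert$ for all $R\in\sigma$. Contrapositively, structures that are not of strictly equal size are distinguished in round 0.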

A run of RCR on particular example structures can be found in the paper’s full version.

3.2 Connection between RCR and CR

For the signature $\sigma=\{\,{E,U}\,\}$ with $\operatorname{ar}(E)=2$ and $\operatorname{ar}(U)=1$ , the following is straightforward to see. For any (simple, undirected) graph $G$ let $\mathcal{A}_{G}$ be the $\sigma$ -structure $\mathcal{A}$ that represents $G$ as follows: $V(\mathcal{A})=U^{\mathcal{A}}=V(G)$ and $E^{\mathcal{A}}$ consists of the tuples $(u,v)$ and $(v,u)$ for all $\{\,{u,v}\,\}\in E(G)$ . RCR on $\mathcal{A}_{G}$ produces a stable coloring $\gamma$ that is equivalent to the stable coloring $\gamma^{\prime}$ produced by CR on $G$ . Therefore, RCR can be viewed as a generalization of CR from graphs to $\sigma$ -structures for arbitrary (relational) signatures $\sigma$ .
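A small sanity check of this correspondence follows. It assumes that the comparison is made on the unary tuples $(v)$, identified with the vertices $v$. On the path with four vertices, RCR on $\mathcal{A}_{G}$ separates the endpoints from the inner vertices, which is also the stable CR partition of that path. The code re-implements a naive RCR so that it is self-contained, and all names are illustrative.

```python
from collections import Counter


def stp(a, b):
    """Similarity type: 1-indexed pairs (i, j) with a_i == b_j."""
    return frozenset((i, j) for i, x in enumerate(a, 1) for j, y in enumerate(b, 1) if x == y)


def rcr_stable(structure):
    """Naive RCR until stable; returns the stable coloring of all tuples."""
    tuples = list(set().union(*structure.values()))
    nbrs = {a: [(stp(a, b), b) for b in tuples if set(a) & set(b)] for a in tuples}
    rho = {a: (frozenset(R for R, rel in structure.items() if a in rel), stp(a, a))
           for a in tuples}
    while True:
        nxt = {a: (rho[a], frozenset(Counter((t, rho[b]) for t, b in nbrs[a]).items()))
               for a in tuples}
        if len(set(nxt.values())) == len(set(rho.values())):
            return rho
        rho = nxt


def graph_as_structure(vertices, edges):
    """A_G over {E, U}: U holds every vertex as a 1-tuple, E holds (u, v) and (v, u)."""
    return {"U": {(v,) for v in vertices},
            "E": {(u, v) for u, v in edges} | {(v, u) for u, v in edges}}


def vertex_partition(coloring):
    """Partition of the vertices induced by the colors of the unary tuples (v)."""
    classes = {}
    for t, c in coloring.items():
        if len(t) == 1:
            classes.setdefault(c, set()).add(t[0])
    return sorted(sorted(cls) for cls in classes.values())


P4 = graph_as_structure(range(4), [(0, 1), (1, 2), (2, 3)])
print(vertex_partition(rcr_stable(P4)))  # [[0, 3], [1, 2]]
```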

Next, we point out that running RCR on a $\sigma$ -structure $\mathcal{A}$ produces the same result as running CR on a suitably defined colored multigraph representation $\mathcal{G}_{\mathcal{A}}$ of $\mathcal{A}$ . The colored multigraph $\mathcal{G}_{\mathcal{A}}$ can be viewed as a suitably colored generalization, from graphs to relational structures, of the notion of the line graph $L(G)$ (cf. [14]) associated with an undirected graph $G$ . It will be crucial for the proof of Theorem A (b).

Definition 3.1.

We represent a $\sigma$ -structure $\mathcal{A}$ by a colored multigraph $\mathcal{G}_{\mathcal{A}}$ of the signature

$$\widehat{\sigma}\;\coloneqq\;\{E_{i,j}\;:\;i,j\in[\operatorname{ar}(\sigma)]\}\cup\{U_{R}\;:\;R\in\sigma\},$$

where $\operatorname{ar}(E_{i,j})=2$ for all $i,j\in[\operatorname{ar}(\sigma)]$ and $\operatorname{ar}(U_{R})=1$ for all $R\in\sigma$ .

The universe $V(\mathcal{G}_{\mathcal{A}})$ of $\mathcal{G}_{\mathcal{A}}$ consists of a new element $w_{{\bm{a}}}$ for every tuple ${\bm{a}}\in{\bm{A}}$. Furthermore, ${(U_{R})}^{\mathcal{G}_{\mathcal{A}}}\coloneqq\{w_{{\bm{a}}}\;:\;{\bm{a}}\in R^{\mathcal{A}}\}$ for all $R\in\sigma$. For all $i,j\in[\operatorname{ar}(\sigma)]$ we let ${(E_{i,j})}^{\mathcal{G}_{\mathcal{A}}}\coloneqq\{(w_{{\bm{a}}},w_{{\bm{b}}})\;:\;{\bm{a}},{\bm{b}}\in{\bm{A}},\ (i,j)\in\mathsf{stp}({\bm{a}},{\bm{b}})\}$.

Example 3.2.

Consider $\sigma_{1}\coloneqq\{\,{E,R}\,\}$ with $\operatorname{ar}(E)=2$ and $\operatorname{ar}(R)=6$ , and let $\mathcal{A}_{1}$ be the $\sigma_{1}$ -structure with the universe $\{\,{1,2,3,u,v,w}\,\}$ , where $E^{\mathcal{A}_{1}}\coloneqq\{(1,2),(2,3),(3,1),(u,v),(v,w),\allowbreak(w,u)\}$ , and $R^{\mathcal{A}_{1}}\coloneqq\{\,{(1,2,3,u,v,w)}\,\}$ . The representation $\mathcal{G}_{\mathcal{A}_{1}}$ of $\mathcal{A}_{1}$ as a colored multigraph is depicted in Figure 1(a). To keep the figure easy to grasp, we labeled the vertices with the tuples they represent, omitted the self-loops, and contracted multi-edges into a single one with a combined edge label, where $x y$ denotes the tuple $(x,y)$ .
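Definition 3.1 can be sketched in a few lines of Python. For illustration, the node $w_{\bm{a}}$ is represented by the tuple ${\bm{a}}$ itself, and only non-empty relations $E_{i,j}$ are stored. The function name is hypothetical. On $\mathcal{A}_{1}$, the construction yields 7 nodes and 54 labeled edges, self-loops and parallel edges included.

```python
def stp(a, b):
    """Similarity type: 1-indexed pairs (i, j) with a_i == b_j."""
    return frozenset((i, j) for i, x in enumerate(a, 1) for j, y in enumerate(b, 1) if x == y)


def line_multigraph(structure):
    """G_A from Definition 3.1; node w_a is represented by the tuple a itself.

    Returns (nodes, labels, edges): labels maps "U_R" to its node set, and edges
    maps (i, j) to the set of pairs (a, b) forming the relation E_{i,j}.
    """
    nodes = set().union(*structure.values())
    labels = {f"U_{R}": set(rel) for R, rel in structure.items()}
    edges = {}
    for a in nodes:
        for b in nodes:
            for ij in stp(a, b):
                edges.setdefault(ij, set()).add((a, b))
    return nodes, labels, edges


triangles = {(1, 2), (2, 3), (3, 1), ("u", "v"), ("v", "w"), ("w", "u")}
A1 = {"E": triangles, "R": {(1, 2, 3, "u", "v", "w")}}
nodes, labels, edges = line_multigraph(A1)
print(len(nodes), sum(len(pairs) for pairs in edges.values()))  # 7 54
```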

It is easy to see that running RCR on a $\sigma$ -structure $\mathcal{A}$ produces (in the same number of rounds) a stable coloring of ${\bm{A}}$ that is equivalent (via identifying ${\bm{a}}\in{\bm{A}}$ with $w_{{\bm{a}}}\in V(\mathcal{G}_{\mathcal{A}})$ ) to the stable coloring produced by classical CR on the colored multigraph $\mathcal{G}_{\mathcal{A}}$ .

Let us define the cohesion of $\mathcal{A}$ as the number $\Gamma(\mathcal{A})$ of all tuples $({\bm{a}},{\bm{b}})\in{\bm{A}}\times{\bm{A}}$ with ${\bm{a}}\neq{\bm{b}}$ and $\operatorname{set}({\bm{a}})\cap\operatorname{set}({\bm{b}})\neq\varnothing$ . Obviously, $\Gamma(\mathcal{A})<\lVert{}{\mathcal{A}}\rVert^{2}$ .

It is known that classical CR can be implemented to run in time $\mathcal{O}((n+m)\cdot\log(n))$ on colored multigraphs, where $n$ denotes the number of vertices and $m$ the total number of edges (cf. [11, 5]). Consider the colored multigraph $\mathcal{G}_{\mathcal{A}}$ representing $\mathcal{A}$. There, we have $n=\lvert{\bm{A}}\rvert=\lVert\mathcal{A}\rVert$ and $m=\sum_{i,j\in[\operatorname{ar}(\sigma)]}\lvert(E_{i,j})^{\mathcal{G}_{\mathcal{A}}}\rvert=\mathcal{O}(\lVert\mathcal{A}\rVert+\Gamma(\mathcal{A}))$ for each fixed signature $\sigma$; the term $\lVert\mathcal{A}\rVert$ accounts for the self-loops $(w_{\bm{a}},w_{\bm{a}})$. Thus, running CR on $\mathcal{G}_{\mathcal{A}}$ yields an implementation of RCR on $\mathcal{A}$ that, for each fixed signature $\sigma$, runs in time $\mathcal{O}\bigl((\lVert\mathcal{A}\rVert+\Gamma(\mathcal{A}))\cdot\log\lVert\mathcal{A}\rVert\bigr)=\mathcal{O}(\lVert\mathcal{A}\rVert^{2}\cdot\log\lVert\mathcal{A}\rVert)$. In the next subsection, we improve the running time to $\mathcal{O}(\lVert\mathcal{A}\rVert\cdot\log\lVert\mathcal{A}\rVert)$ by using a different colored-multigraph representation of $\mathcal{A}$ instead of $\mathcal{G}_{\mathcal{A}}$.
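The quadratic term can be observed on a simple family of structures. Let $n$ binary tuples all share one element. The Python sketch below (hypothetical names) computes $\Gamma(\mathcal{A})$ and the total number $m$ of $E_{i,j}$-edges of $\mathcal{G}_{\mathcal{A}}$, which is $\sum_{{\bm{a}},{\bm{b}}}\lvert\mathsf{stp}({\bm{a}},{\bm{b}})\rvert$. The pairs with ${\bm{a}}={\bm{b}}$ contribute the self-loops.

```python
def stp(a, b):
    """Similarity type: 1-indexed pairs (i, j) with a_i == b_j."""
    return frozenset((i, j) for i, x in enumerate(a, 1) for j, y in enumerate(b, 1) if x == y)


def cohesion(tuples):
    """Gamma(A): ordered pairs (a, b) of distinct tuples sharing an element."""
    return sum(1 for a in tuples for b in tuples if a != b and set(a) & set(b))


def line_edge_count(tuples):
    """m: total number of E_{i,j}-edges of G_A, i.e., the sum of |stp(a, b)|.

    Pairs with a == b contribute the self-loops, so m also grows with |A|.
    """
    return sum(len(stp(a, b)) for a in tuples for b in tuples)


# n binary tuples sharing the element 0: Gamma grows quadratically in n.
for n in (10, 20):
    star = {(0, i) for i in range(1, n + 1)}
    print(n, cohesion(star), line_edge_count(star))
# 10 90 110
# 20 380 420
```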

3.3 Implementing RCR in Time $\mathcal{O}(\lVert{}{\mathcal{A}}\rVert\cdot\log(\lVert{}{\mathcal{A}}\rVert))$

This subsection is devoted to proving Theorem A (a), i.e., we prove the following theorem.

Theorem B.

For each fixed signature $\sigma$ , RCR can be implemented to run in time $\mathcal{O}(\lVert{}{\mathcal{A}}\rVert\cdot\log(\lVert{}{\mathcal{A}}\rVert))$ upon input of a $\sigma$ -structure $\mathcal{A}$ .

The factor hidden by the $\mathcal{O}$ -notation is of size $2^{\mathcal{O}(k\cdot\log k)}$ , where $k$ is the maximum arity of the relation symbols in $\sigma$ .

Recall from Section 3.2 that by performing classical CR on the colored multigraph $\mathcal{G}_{\mathcal{A}}$ we can implement RCR on a $\sigma$-structure $\mathcal{A}$ with running time $\mathcal{O}((n+m)\cdot\log(n))$, where $n$ denotes the number of vertices and $m$ the total number of edges of $\mathcal{G}_{\mathcal{A}}$. Note that, for every element of $V(\mathcal{A})$, the nodes $w_{\bm{a}}$ of all tuples ${\bm{a}}\in{\bm{A}}$ containing that element form a clique in $\mathcal{G}_{\mathcal{A}}$. This causes a blow-up of the number of edges in $\mathcal{G}_{\mathcal{A}}$. We alleviate this by resolving every such clique: we insert a constant number of fresh vertices that are connected to all tuples participating in the clique. This drastically reduces the number of edges and yields a new colored multigraph $\mathcal{H}_{\mathcal{A}}$ whose number of nodes and edges is in $\mathcal{O}(\lVert\mathcal{A}\rVert)$. For the precise definition of $\mathcal{H}_{\mathcal{A}}$ we need the following notation.

Definition 3.3.

Let $\mathcal{A}$ be a $\sigma$-structure. We call a tuple ${\bm{s}}=(s_{1},\dots,s_{\ell})\in{V(\mathcal{A})}^{\ell}$ a slice (over $V(\mathcal{A})$) if $\ell\geqslant 1$ and its entries are pairwise distinct (i.e., $\ell=\lvert\operatorname{set}({\bm{s}})\rvert$). We call ${\bm{s}}$ a slice of ${\bm{a}}\in{V(\mathcal{A})}^{k}$ if ${\bm{s}}$ is a slice over $V(\mathcal{A})$ and $\operatorname{set}({\bm{s}})\subseteq\operatorname{set}({\bm{a}})$. For ${\bm{a}}\in{\bm{A}}$, we write $\mathcal{S}({\bm{a}})$ for the set of all slices of ${\bm{a}}$, i.e., $\mathcal{S}({\bm{a}})=\{{\bm{s}}\in{V(\mathcal{A})}^{\ell}\;:\;\ell\geqslant 1,\ \lvert\operatorname{set}({\bm{s}})\rvert=\ell,\ \operatorname{set}({\bm{s}})\subseteq\operatorname{set}({\bm{a}})\}$. Conversely, for a slice ${\bm{s}}$ over $V(\mathcal{A})$ we denote by $\mathcal{S}^{-1}({\bm{s}})\coloneqq\{{\bm{a}}\in{\bm{A}}\;:\;{\bm{s}}\in\mathcal{S}({\bm{a}})\}$ the set of tuples in ${\bm{A}}$ of which ${\bm{s}}$ is a slice.
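Slices are exactly the non-empty arrangements without repetition of the elements of $\operatorname{set}({\bm{a}})$. In Python, `itertools.permutations(elems, length)` enumerates them. The helper names below are hypothetical. The usage example reproduces the slice set of the structure $\mathcal{A}_{2}$ from Example 3.5.

```python
from itertools import permutations


def slices(a):
    """S(a): tuples of pairwise distinct elements drawn from set(a), length >= 1."""
    elems = set(a)
    return {s for length in range(1, len(elems) + 1) for s in permutations(elems, length)}


def all_slices(tuples):
    """S(A): the union of S(a) over all tuples a."""
    return set().union(*(slices(a) for a in tuples))


def slice_preimage(tuples, s):
    """S^{-1}(s) for a slice s: the tuples a with set(s) a subset of set(a)."""
    return {a for a in tuples if set(s) <= set(a)}


A2 = {(1, 1, 2), (2, 3, 2)}
print(sorted(all_slices(A2)))
# [(1,), (1, 2), (2,), (2, 1), (2, 3), (3,), (3, 2)]
```

A tuple with three distinct entries already has $3+6+6=15$ slices, which hints at how the constant in Theorem B depends on the arity.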

Definition 3.4.

Let $\mathcal{A}$ be a $\sigma$ -structure. Let $\mathcal{H}_{\mathcal{A}}$ be the colored multigraph of signature $\widehat{\sigma}$ defined as follows. The universe $V(\mathcal{H}_{\mathcal{A}})$ consists of the nodes $w_{{\bm{a}}}$ for all ${\bm{a}}\in{\bm{A}}$ and a new node $v_{{\bm{s}}}$ for every slice ${\bm{s}}\in\mathcal{S}({\bm{A}})\coloneqq\bigcup_{{\bm{a}}\in{\bm{A}}}\mathcal{S}({\bm{a}})$ . I.e., $V(\mathcal{H}_{\mathcal{A}})=\{\,w_{{\bm{a}}}\;:\;{\bm{a}}\in{\bm{A}}\,\}\mathrel{\dot{\cup}}\{\,v_{{\bm{s}}}\;:\;{\bm{s}}\in\mathcal{S}({\bm{A}})\,\}$ . Furthermore, ${(U_{R})}^{\mathcal{H}_{\mathcal{A}}}\coloneqq\{\,w_{{\bm{a}}}\;:\;{\bm{a}}\in R^{\mathcal{A}}\,\}$ , for all $R\in\sigma$ . And for all $i,j\in[\operatorname{ar}(\sigma)]$ we let ${(E_{i,j})}^{\mathcal{H}_{\mathcal{A}}}\coloneqq\bigl\{\,(w_{{\bm{a}}},v_{{\bm{s}}})\;:\;{\bm{a}}\in{\bm{A}},\,{\bm{s}}\in\mathcal{S}({\bm{a}}),\,(i,j)\in\mathsf{stp}({\bm{a}},{\bm{s}})\,\bigr\}\;\cup\;\bigl\{\,(v_{{\bm{s}}},w_{{\bm{b}}})\;:\;{\bm{b}}\in{\bm{A}},\,{\bm{s}}\in\mathcal{S}({\bm{b}}),\,(i,j)\in\mathsf{stp}({\bm{s}},{\bm{b}})\,\bigr\}$ .

Figure 1: Visualization of the colored multigraph representations from Sections 3.2 and 3.3: (a) $\mathcal{G}_{\mathcal{A}_{1}}$ for the structure $\mathcal{A}_{1}$ from Example 3.2; (b) $\mathcal{H}_{\mathcal{A}_{2}}$ for the structure $\mathcal{A}_{2}$ from Example 3.5.

Example 3.5.

Consider the signature $\sigma_{2}\coloneqq\{\,{R}\,\}$ with $\operatorname{ar}(R)=3$ . Then, $\widehat{\sigma}_{2}$ consists of the unary relation symbol $U_{R}$ and binary relation symbols $E_{i,j}$ for $i,j\in\{\,{1,2,3}\,\}$ .

Let $\mathcal{A}_{2}$ be the $\sigma_{2}$ -structure where $V(\mathcal{A}_{2})\coloneqq\{\,1,2,3\,\}$ and $R^{\mathcal{A}_{2}}\coloneqq\{\,(1,1,2),\;(2,3,2)\,\}$ . Then, ${\bm{A}}_{2}=R^{\mathcal{A}_{2}}$ and $\mathcal{S}({\bm{A}}_{2})=\{\,(1),\;(2),\;(3),\;(1,2),\;(2,1),\;(2,3),\;(3,2)\,\}$ . The colored multigraph $\mathcal{H}_{\mathcal{A}_{2}}$ is the $\widehat{\sigma}_{2}$ -structure with $V(\mathcal{H}_{\mathcal{A}_{2}})=\{\,w_{(1,1,2)},\;w_{(2,3,2)}\,\}\cup\{\,v_{\bm{s}}\;:\;{\bm{s}}\in\mathcal{S}({\bm{A}}_{2})\,\}$ . The unary symbol $U_{R}$ is interpreted by the set ${(U_{R})}^{\mathcal{H}_{\mathcal{A}_{2}}}=\{\,w_{(1,1,2)},\;w_{(2,3,2)}\,\}$ . See Figure 1(b) for an illustration of $\mathcal{H}_{\mathcal{A}_{2}}$ .
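As a cross-check of Definition 3.4 against Example 3.5, here is a minimal Python sketch that builds $\mathcal{H}_{\mathcal{A}}$. It assumes that $\mathsf{stp}({\bm{x}},{\bm{y}})$ is the set of 1-based position pairs $(i,j)$ with $x_{i}=y_{j}$, a reading consistent with Lemma 3.7; the representation (a dict from relation names to sets of tuples, nodes tagged `'w'` or `'v'`) and all function names are our own, hypothetical choices.

```python
from itertools import permutations

def stp(x, y):
    # Assumed definition: all (i, j), 1-based, with x_i == y_j.
    return {(i, j) for i, xi in enumerate(x, 1)
            for j, yj in enumerate(y, 1) if xi == yj}

def slices(a):
    elems = set(a)
    return {s for n in range(1, len(elems) + 1) for s in permutations(elems, n)}

def build_H(structure):
    """Return (nodes, U, E) for H_A; `structure` maps R to the set R^A."""
    A = set().union(*structure.values())
    S = set().union(*(slices(a) for a in A))
    nodes = {('w', a) for a in A} | {('v', s) for s in S}
    U = {R: {('w', a) for a in ts} for R, ts in structure.items()}
    E = {}  # (i, j) -> set of directed edges of E_{i,j}
    for a in A:
        for s in slices(a):
            for ij in stp(a, s):        # edges (w_a, v_s)
                E.setdefault(ij, set()).add((('w', a), ('v', s)))
            for ij in stp(s, a):        # edges (v_s, w_a)
                E.setdefault(ij, set()).add((('v', s), ('w', a)))
    return nodes, U, E

# Example 3.5: R^{A_2} = {(1,1,2), (2,3,2)}
nodes, U, E = build_H({'R': {(1, 1, 2), (2, 3, 2)}})
print(len(nodes))   # 9: two tuple nodes and seven slice nodes
```

For instance, under this reading $\mathsf{stp}((1,1,2),(1,2))=\{(1,1),(2,1),(3,2)\}$, so the edge $(w_{(1,1,2)},v_{(1,2)})$ lies in $E_{1,1}$, $E_{2,1}$ and $E_{3,2}$.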

Note that for each ${\bm{a}}\in{\bm{A}}$ of arity $k$ , the number of slices of ${\bm{a}}$ is $\lvert\mathcal{S}({\bm{a}})\rvert\leqslant k{\cdot}k!$ , since each of the at most $k$ possible lengths $\ell$ contributes at most $k!$ slices. Thus, for $k\coloneqq\operatorname{ar}(\sigma)=\max\{\,\operatorname{ar}(R)\,:\,R\in\sigma\,\}$ we have $\lvert V(\mathcal{H}_{\mathcal{A}})\rvert=\lvert{\bm{A}}\rvert+\lvert\mathcal{S}({\bm{A}})\rvert\leqslant(1+k{\cdot}k!){\cdot}\lvert{\bm{A}}\rvert$ and $\lvert{(E_{i,j})}^{\mathcal{H}_{\mathcal{A}}}\rvert\leqslant 2{\cdot}k{\cdot}k!{\cdot}\lvert{\bm{A}}\rvert$ , for all $i,j\in[k]$ . Hence, since there are $k^{2}$ pairs $(i,j)$ , the total number of edges of $\mathcal{H}_{\mathcal{A}}$ is at most $2{\cdot}k^{3}{\cdot}k!{\cdot}\lvert{\bm{A}}\rvert$ . I.e., for the fixed relational signature $\sigma$ , the number of nodes and edges of the colored multigraph $\mathcal{H}_{\mathcal{A}}$ associated with a $\sigma$ -structure $\mathcal{A}$ is $\mathcal{O}(\lVert\mathcal{A}\rVert)$ , where the factor hidden by the $\mathcal{O}$ -notation is bounded by $2{\cdot}k^{3}{\cdot}k!=2^{\mathcal{O}(k\cdot\log k)}$ for $k\coloneqq\operatorname{ar}(\sigma)$ .
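The slice bound can be checked numerically. A tuple with $m$ pairwise distinct elements has exactly $\sum_{\ell=1}^{m} m!/(m-\ell)!$ slices, and $m\leqslant k$. The short Python check below (function names are hypothetical) compares this exact count with $k\cdot k!$ and evaluates the edge bound $2\cdot k^{3}\cdot k!$ for small arities; it only illustrates the arithmetic above.

```python
from math import factorial, perm

def num_slices(m):
    """Exact number of slices of a tuple with m distinct elements:
    ordered selections without repetition of length l = 1..m."""
    return sum(perm(m, l) for l in range(1, m + 1))

def edge_bound(k):
    """The bound 2 * k^3 * k! on the edges of H_A per tuple of A."""
    return 2 * k**3 * factorial(k)

for k in range(1, 7):
    assert num_slices(k) <= k * factorial(k)   # the bound used in the text
    print(k, num_slices(k), k * factorial(k), edge_bound(k))
```

The exact count grows like $e\cdot k!$, so the factor $k$ in $k\cdot k!$ is generous but harmless for the $2^{\mathcal{O}(k\cdot\log k)}$ estimate.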

Since the number of vertices and the number of edges in $\mathcal{H}_{\mathcal{A}}$ are both $\mathcal{O}(\lVert\mathcal{A}\rVert)$ , Theorem B is obtained as an immediate consequence of the following Theorem 3.6 and the known running time of CR (cf. Sections 1 and 3.2):

Theorem 3.6.

Let $\mathcal{A}$ be a $\sigma$ -structure. Let $\operatorname{\varrho}_{i}({\bm{a}})$ be the color assigned to tuple ${\bm{a}}\in{\bm{A}}$ in the $i$ -th round of Relational Color Refinement RCR on $\mathcal{A}$ , and let $\operatorname{\gamma}_{i}(u)$ be the color assigned to node $u$ of $\mathcal{H}_{\mathcal{A}}$ in round $i$ of conventional Color Refinement CR on the colored multigraph $\mathcal{H}_{\mathcal{A}}$ . For all $i\in\mathbb{N}$ and all ${\bm{a}},{\bm{b}}\in{\bm{A}}$ we have: $\operatorname{\varrho}_{i}({\bm{a}})=\operatorname{\varrho}_{i}({\bm{b}})$ $\iff$ $\operatorname{\gamma}_{2i+1}(w_{{\bm{a}}})=\operatorname{\gamma}_{2i+1}(w_{{\bm{b}}})$ .
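Theorem 3.6 can be tested on small inputs. The Python sketch below runs RCR on $\mathcal{A}$ and classical CR on $\mathcal{H}_{\mathcal{A}}$ and compares the partitions induced by $\operatorname{\varrho}_{i}$ and $\operatorname{\gamma}_{2i+1}$ on the tuple nodes. Since RCR is defined in an earlier section, the update rule used here is an assumption modelled on statement 1 of Lemma 3.11: $\operatorname{\varrho}_{0}({\bm{a}})$ records the relations containing ${\bm{a}}$ together with $\mathsf{stp}({\bm{a}})$, and $\operatorname{\varrho}_{i+1}({\bm{a}})$ pairs $\operatorname{\varrho}_{i}({\bm{a}})$ with the multiset of $(\mathsf{stp}({\bm{a}},{\bm{c}}),\operatorname{\varrho}_{i}({\bm{c}}))$ over all tuples ${\bm{c}}$ sharing an element with ${\bm{a}}$. The reading of $\mathsf{stp}$ as position pairs $(i,j)$ with $x_{i}=y_{j}$ and all function names are likewise our own.

```python
from itertools import permutations

def stp(x, y):
    # Assumed: sorted 1-based position pairs (i, j) with x_i == y_j.
    return tuple(sorted((i, j) for i, xi in enumerate(x, 1)
                        for j, yj in enumerate(y, 1) if xi == yj))

def slices(a):
    elems = set(a)
    return [s for n in range(1, len(elems) + 1) for s in permutations(elems, n)]

def relabel(signatures):
    """Replace signatures by small integers (injective within one round)."""
    ids = {}
    return {x: ids.setdefault(sig, len(ids)) for x, sig in signatures.items()}

def atp(structure, a):
    # Relation symbols R with a in R^A.
    return tuple(sorted(R for R, ts in structure.items() if a in ts))

def rcr(structure, rounds):
    """RCR colors rho_0, ..., rho_rounds under the assumed update rule."""
    A = set().union(*structure.values())
    N = {a: [c for c in A if set(a) & set(c)] for a in A}
    col = relabel({a: (atp(structure, a), stp(a, a)) for a in A})
    history = [col]
    for _ in range(rounds):
        col = relabel({a: (col[a], tuple(sorted((stp(a, c), col[c]) for c in N[a])))
                       for a in A})
        history.append(col)
    return history

def cr_on_H(structure, rounds):
    """Classical CR on H_A; the label of an edge (u, v) is stp of the two tuples."""
    A = set().union(*structure.values())
    adj = {}
    for a in A:
        for s in slices(a):
            adj.setdefault(('w', a), []).append((stp(a, s), ('v', s)))
            adj.setdefault(('v', s), []).append((stp(s, a), ('w', a)))
    col = relabel({u: (atp(structure, u[1]) if u[0] == 'w' else ()) for u in adj})
    history = [col]
    for _ in range(rounds):
        col = relabel({u: (col[u], tuple(sorted((lab, col[v]) for lab, v in adj[u])))
                       for u in adj})
        history.append(col)
    return history

def agrees_with_theorem_3_6(structure, rounds=3):
    """Compare the partitions of rho_i and gamma_{2i+1} on tuple nodes."""
    A = list(set().union(*structure.values()))
    rho, gamma = rcr(structure, rounds), cr_on_H(structure, 2 * rounds + 1)
    return all((rho[i][a] == rho[i][b]) ==
               (gamma[2 * i + 1][('w', a)] == gamma[2 * i + 1][('w', b)])
               for i in range(rounds + 1) for a in A for b in A)

path = {'E': {(1, 2), (2, 3), (3, 4), (4, 5)}}
print(agrees_with_theorem_3_6(path), len(set(rcr(path, 1)[1].values())))
```

Colors are replaced by small integers in every round; this keeps signatures short and does not affect which nodes share a color within a round.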

The remainder of this section is devoted to the proof of Theorem 3.6. See the paper’s full version for detailed proofs of all subsequent lemmas and claims. We start with a lemma that summarizes some obvious facts.

Lemma 3.7.

Let $\mathcal{A}$ be a $\sigma$ -structure. Let $k,k^{\prime},\ell\geqslant 1$ and let ${\bm{a}}=(a_{1},\ldots,a_{k})$ and ${\bm{b}}=(b_{1},\ldots,b_{k^{\prime}})$ be elements in ${\bm{A}}$ , and let ${\bm{s}}=(s_{1},\ldots,s_{\ell})$ be a slice over $V(\mathcal{A})$ .

(a)

$\mathsf{stp}({\bm{a}},{\bm{b}})\neq\varnothing\iff\operatorname{set}({\bm{a}})\cap\operatorname{set}({\bm{b}})\neq\varnothing$ .
(b)

$\mathsf{stp}({\bm{a}})=\mathsf{stp}({\bm{b}})$ $\iff$ $\operatorname{ar}({\bm{a}})=\operatorname{ar}({\bm{b}})$ and the function $\beta\colon\operatorname{set}({\bm{a}})\to\operatorname{set}({\bm{b}})$ with $\beta(a_{i})\coloneqq b_{i}$ for all $i\in[k]$ is well-defined and bijective.
(c)

${\bm{s}}\in\mathcal{S}({\bm{a}})$ $\iff$ for every $i\in[\ell]$ there exists a $j\in[k]$ such that $(i,j)\in\mathsf{stp}({\bm{s}},{\bm{a}})$ $\iff$ for every $j\in[\ell]$ there exists an $i\in[k]$ such that $(i,j)\in\mathsf{stp}({\bm{a}},{\bm{s}})$ .
(d)

Let ${\bm{s}}\in\mathcal{S}({\bm{a}})$ . For all ${\bm{s}}^{\prime}\in\mathcal{S}({\bm{a}})$ we have: $\mathsf{stp}({\bm{a}},{\bm{s}})=\mathsf{stp}({\bm{a}},{\bm{s}}^{\prime})$ $\iff$ ${\bm{s}}={\bm{s}}^{\prime}$ .

Two nodes $u,u^{\prime}\in V(\mathcal{H}_{\mathcal{A}})$ are called neighbors in $\mathcal{H}_{\mathcal{A}}$ if $(u,u^{\prime})\in{(E_{i,j})}^{\mathcal{H}_{\mathcal{A}}}$ for some $i,j\in[\operatorname{ar}(\sigma)]$ .

$\blacktriangleright$ Remark 3.8.

Note that for all ${\bm{a}}\in{\bm{A}}$ and ${\bm{s}}\in\mathcal{S}({\bm{A}})$ we have: $w_{\bm{a}}$ and $v_{{\bm{s}}}$ are neighbors in $\mathcal{H}_{\mathcal{A}}$ $\iff$ ${\bm{s}}\in\mathcal{S}({\bm{a}})$ . Also, by Lemma 3.7 (d) we obtain: For all ${\bm{a}}\in{\bm{A}}$ and all ${\bm{s}},{\bm{s}}^{\prime}\in\mathcal{S}({\bm{a}})$ with ${\bm{s}}\neq{\bm{s}}^{\prime}$ , we have $\mathsf{stp}({\bm{a}},{\bm{s}})\neq\mathsf{stp}({\bm{a}},{\bm{s}}^{\prime})$ .

The following characterization of tuples ${\bm{a}},{\bm{b}}$ with $\mathsf{stp}({\bm{a}})=\mathsf{stp}({\bm{b}})$ will be crucial for our proof of Theorem 3.6.

Lemma 3.9.

Let $\mathcal{A}$ be a $\sigma$ -structure. For all ${\bm{a}},{\bm{b}}\in{\bm{A}}$ we have: $\mathsf{stp}({\bm{a}})=\mathsf{stp}({\bm{b}})$ $\iff$ there is a bijection $\pi_{\mathcal{S}}\colon\mathcal{S}({\bm{a}})\to\mathcal{S}({\bm{b}})$ such that for all ${\bm{s}}\in\mathcal{S}({\bm{a}})$ we have $\mathsf{stp}({\bm{a}},{\bm{s}})=\mathsf{stp}({\bm{b}},\pi_{\mathcal{S}}({\bm{s}}))$ .
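Lemma 3.9 only asserts that $\pi_{\mathcal{S}}$ exists. One natural candidate, which we believe follows from Lemma 3.7 (b) but which is our own illustration rather than the paper's construction, applies the element map $\beta(a_{i})\coloneqq b_{i}$ componentwise to each slice: since $\beta$ is a bijection, $a_{i}=s_{j}$ holds if and only if $b_{i}=\beta(s_{j})$. The Python sketch below assumes $\mathsf{stp}({\bm{a}})=\mathsf{stp}({\bm{a}},{\bm{a}})$ and the position-pair reading of $\mathsf{stp}$; the names are hypothetical.

```python
from itertools import permutations

def stp(x, y):
    # Assumed: 1-based position pairs (i, j) with x_i == y_j.
    return frozenset((i, j) for i, xi in enumerate(x, 1)
                     for j, yj in enumerate(y, 1) if xi == yj)

def slices(a):
    elems = set(a)
    return {s for n in range(1, len(elems) + 1) for s in permutations(elems, n)}

def pi_S(a, b):
    """Candidate bijection S(a) -> S(b) via beta(a_i) = b_i,
    or None if stp(a) != stp(b)."""
    if stp(a, a) != stp(b, b):
        return None
    beta = dict(zip(a, b))  # well defined because stp(a) == stp(b)
    return {s: tuple(beta[x] for x in s) for s in slices(a)}

a, b = (1, 1, 2), (5, 5, 7)
pi = pi_S(a, b)
print(all(stp(a, s) == stp(b, t) for s, t in pi.items()))   # True
print(pi_S(a, (2, 3, 2)))                                    # None
```

By Remark 3.8 the bijection is unique, so whenever this candidate satisfies the condition of Lemma 3.9 it is the map $\pi_{\mathcal{S}}$.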

It follows from Remark 3.8 that the bijection $\pi_{\mathcal{S}}$ is unique, if it exists. The following lemma summarizes straightforward properties of the mapping $\pi_{\mathcal{S}}$ obtained from Lemma 3.9.

Lemma 3.10.

Let $\mathcal{A}$ be a $\sigma$ -structure, let ${\bm{a}},{\bm{b}}\in{\bm{A}}$ with $\mathsf{stp}({\bm{a}})=\mathsf{stp}({\bm{b}})$ , and let $\pi_{\mathcal{S}}\colon\mathcal{S}({\bm{a}})\to\mathcal{S}({\bm{b}})$ be the bijection obtained from Lemma 3.9. For all ${\bm{s}},{\bm{s}}^{\prime}\in\mathcal{S}({\bm{a}})$ and for ${\bm{t}}\coloneqq\pi_{\mathcal{S}}({\bm{s}})$ and ${\bm{t}}^{\prime}\coloneqq\pi_{\mathcal{S}}({\bm{s}}^{\prime})$ we have: (1) $\operatorname{ar}({\bm{s}})=\operatorname{ar}({\bm{t}})$ , and (2) $\operatorname{set}({\bm{s}})\subseteq\operatorname{set}({\bm{s}}^{\prime})$ $\iff$ $\operatorname{set}({\bm{t}})\subseteq\operatorname{set}({\bm{t}}^{\prime})$ , and (3) $\operatorname{set}({\bm{s}})=\operatorname{set}({\bm{s}}^{\prime})$ $\iff$ $\operatorname{set}({\bm{t}})=\operatorname{set}({\bm{t}}^{\prime})$ .

For ${\bm{a}}\in{\bm{A}}$ let $N({\bm{a}})\coloneqq\{\,{\bm{c}}^{\prime}\in{\bm{A}}\;:\;\mathsf{stp}({\bm{a}},{\bm{c}}^{\prime})\neq\varnothing\,\}=\{\,{\bm{c}}^{\prime}\in{\bm{A}}\;:\;\operatorname{set}({\bm{a}})\cap\operatorname{set}({\bm{c}}^{\prime})\neq\varnothing\,\}$ , where the equality holds by Lemma 3.7 (a). We proceed with the main technical lemma that will enable us to prove Theorem 3.6.

Lemma 3.11.

Let $\mathcal{A}$ be a $\sigma$ -structure. Let $Z$ be a non-empty set and let $f$ be a mapping $f\colon{\bm{A}}\to Z$ . Consider ${\bm{a}},{\bm{b}}\in{\bm{A}}$ with $f({\bm{a}})=f({\bm{b}})$ and $\mathsf{stp}({\bm{a}})=\mathsf{stp}({\bm{b}})$ . Let $\pi_{\mathcal{S}}\colon\mathcal{S}({\bm{a}})\to\mathcal{S}({\bm{b}})$ be the bijection obtained by Lemma 3.9. The following are equivalent:

1.

$\{\!\!\{\,(\mathsf{stp}({\bm{a}},{\bm{c}}),f({\bm{c}}))\;:\;{\bm{c}}\in N({\bm{a}})\,\}\!\!\}\;=\;\{\!\!\{\,(\mathsf{stp}({\bm{b}},{\bm{c}}),f({\bm{c}}))\;:\;{\bm{c}}\in N({\bm{b}})\,\}\!\!\}$ .
2.

For all ${\bm{s}}\in\mathcal{S}({\bm{a}})$ we have:
$\{\!\!\{\,(\mathsf{stp}({\bm{s}},{\bm{c}}),f({\bm{c}}))\;:\;{\bm{c}}\in\mathcal{S}^{-1}({\bm{s}})\,\}\!\!\}\;=\;\{\!\!\{\,(\mathsf{stp}(\pi_{\mathcal{S}}({\bm{s}}),{\bm{d}}),f({\bm{d}}))\;:\;{\bm{d}}\in\mathcal{S}^{-1}(\pi_{\mathcal{S}}({\bm{s}}))\,\}\!\!\}$ .

The proof makes heavy use of Lemma 3.10 and is combinatorially quite involved; in particular, the proof of direction 2 $\Rightarrow$ 1 proceeds by an intricate induction that starts with tuples ${\bm{c}}$ in $N({\bm{a}})$ with $\operatorname{set}({\bm{c}})=\operatorname{set}({\bm{a}})$ , and the induction step considers tuples ${\bm{c}}$ in $N({\bm{a}})$ with a decreasing size of the intersection of $\operatorname{set}({\bm{c}})$ and $\operatorname{set}({\bm{a}})$ .
The following fact will be helpful for the subsequent proofs.

Fact 3.11.

For all ${\bm{a}},{\bm{b}}\in{\bm{A}}$ we have

(a)

$\mathsf{atp}({\bm{a}})=\mathsf{atp}({\bm{b}})$ $\iff$ $\operatorname{\gamma}_{0}(w_{{\bm{a}}})=\operatorname{\gamma}_{0}(w_{{\bm{b}}})$ .
(b)

For all ${\bm{s}}\in\mathcal{S}({\bm{a}})$ and ${\bm{t}}\in\mathcal{S}({\bm{b}})$ we have: $\mathsf{stp}({\bm{a}},{\bm{s}})=\mathsf{stp}({\bm{b}},{\bm{t}})$ $\iff$ $\lambda(w_{\bm{a}},v_{\bm{s}})=\lambda(w_{\bm{b}},v_{{\bm{t}}})$ .
(c)

For all ${\bm{s}}\in\mathcal{S}({\bm{a}})$ , ${\bm{t}}\in\mathcal{S}({\bm{b}})$ and for all ${\bm{c}}\in\mathcal{S}^{-1}({\bm{s}})$ , ${\bm{d}}\in\mathcal{S}^{-1}({\bm{t}})$ we have:
$\mathsf{stp}({\bm{s}},{\bm{c}})=\mathsf{stp}({\bm{t}},{\bm{d}})$ $\iff$ $\lambda(v_{\bm{s}},w_{\bm{c}})=\lambda(v_{{\bm{t}}},w_{{\bm{d}}})$ .
(d)

For all nodes $v$ of $\mathcal{H}_{\mathcal{A}}$ we have:
$v$ and $w_{\bm{a}}$ are neighbors in $\mathcal{H}_{\mathcal{A}}$ $\iff$ $v=v_{\bm{s}}$ for some ${\bm{s}}\in\mathcal{S}({\bm{a}})$ .

The first three statements follow immediately from the definition of $\mathcal{H}_{\mathcal{A}}$ , see Definition 3.4. The last statement follows from Remark 3.8 and the definition of $\mathcal{H}_{\mathcal{A}}$ . The following lemma relates the color that CR assigns to the node $w_{\bm{a}}$ to the colors it assigns to the nodes $v_{\bm{s}}$ for the slices ${\bm{s}}$ of ${\bm{a}}$ .

Lemma 3.12.

For all ${\bm{a}},{\bm{b}}\in{\bm{A}}$ with $\mathsf{stp}({\bm{a}})=\mathsf{stp}({\bm{b}})$ and all $i\in\mathbb{N}_{\geqslant{}1}$ we have:
$\operatorname{\gamma}_{i}(w_{{\bm{a}}})=\operatorname{\gamma}_{i}(w_{{\bm{b}}})\iff$
(a) $\operatorname{\gamma}_{i-1}(w_{{\bm{a}}})=\operatorname{\gamma}_{i-1}(w_{{\bm{b}}})$ and (b) $\operatorname{\gamma}_{i-1}(v_{{\bm{s}}})=\operatorname{\gamma}_{i-1}(v_{{\bm{t}}})$ , for all ${\bm{s}}\in\mathcal{S}({\bm{a}})$ and ${\bm{t}}\coloneqq\pi_{\mathcal{S}}({\bm{s}})$ .
Here, $\pi_{\mathcal{S}}\colon\mathcal{S}({\bm{a}})\to\mathcal{S}({\bm{b}})$ is the bijection from Lemma 3.9.

Finally, we are ready for the proof of Theorem 3.6.

Proof of Theorem 3.6.

We proceed by induction on $i$ . For the induction base $i{=}0$ , the proof is straightforward by using Section 3.3 and Lemma 3.9.
For the inductive step, consider an $i\in\mathbb{N}_{\geqslant 1}$ , and let ${\bm{a}},{\bm{b}}\in{\bm{A}}$ .
Induction hypothesis: For all $j<i$ and ${\bm{c}},{\bm{d}}\in{\bm{A}}$ we have: $\operatorname{\varrho}_{j}({\bm{c}})=\operatorname{\varrho}_{j}({\bm{d}})\iff\operatorname{\gamma}_{2j+1}(w_{{\bm{c}}})=\operatorname{\gamma}_{2j+1}(w_{{\bm{d}}})$ .
Induction claim: $\operatorname{\varrho}_{i}({\bm{a}})=\operatorname{\varrho}_{i}({\bm{b}})\iff\operatorname{\gamma}_{2i+1}(w_{{\bm{a}}})=\operatorname{\gamma}_{2i+1}(w_{{\bm{b}}})$ .

If $\mathsf{stp}({\bm{a}})\neq\mathsf{stp}({\bm{b}})$ , we have $\operatorname{\varrho}_{0}({\bm{a}})\neq\operatorname{\varrho}_{0}({\bm{b}})$ , and hence, by the definition of RCR, $\operatorname{\varrho}_{j}({\bm{a}})\neq\operatorname{\varrho}_{j}({\bm{b}})$ holds for all $j\geqslant 0$ . Furthermore, by the induction base, $\operatorname{\varrho}_{0}({\bm{a}})\neq\operatorname{\varrho}_{0}({\bm{b}})$ implies that $\operatorname{\gamma}_{1}(w_{{\bm{a}}})\neq\operatorname{\gamma}_{1}(w_{\bm{b}})$ . Hence, by the definition of CR, $\operatorname{\gamma}_{j}(w_{{\bm{a}}})\neq\operatorname{\gamma}_{j}(w_{\bm{b}})$ holds for all $j\geqslant 1$ . This yields $\operatorname{\varrho}_{i}({\bm{a}})\neq\operatorname{\varrho}_{i}({\bm{b}})$ and $\operatorname{\gamma}_{2i+1}(w_{{\bm{a}}})\neq\operatorname{\gamma}_{2i+1}(w_{\bm{b}})$ , completing the induction step.

In the following, we consider the case where $\mathsf{stp}({\bm{a}})=\mathsf{stp}({\bm{b}})$ . From Lemma 3.9 we obtain a bijection $\pi_{\mathcal{S}}\colon\mathcal{S}({\bm{a}})\to\mathcal{S}({\bm{b}})$ satisfying $\mathsf{stp}({\bm{a}},{\bm{s}})=\mathsf{stp}({\bm{b}},\pi_{\mathcal{S}}({\bm{s}}))$ for all ${\bm{s}}\in\mathcal{S}({\bm{a}})$ .

If $\operatorname{\varrho}_{i-1}({\bm{a}})\neq\operatorname{\varrho}_{i-1}({\bm{b}})$ , then $\operatorname{\varrho}_{i}({\bm{a}})\neq\operatorname{\varrho}_{i}({\bm{b}})$ by definition of RCR and $\operatorname{\gamma}_{2(i-1)+1}(w_{{\bm{a}}})\neq\operatorname{\gamma}_{2(i-1)+1}(w_{{\bm{b}}})$ by the induction hypothesis. It follows from the definition of CR that $\operatorname{\gamma}_{2i+1}(w_{{\bm{a}}})\neq\operatorname{\gamma}_{2i+1}(w_{{\bm{b}}})$ as well. Thus, from now on we consider the case that $\operatorname{\varrho}_{i-1}({\bm{a}})=\operatorname{\varrho}_{i-1}({\bm{b}})$ .

Using Lemma 3.12 we get that $\operatorname{\gamma}_{2i+1}(w_{{\bm{a}}})=\operatorname{\gamma}_{2i+1}(w_{{\bm{b}}})$ if, and only if, $\operatorname{\gamma}_{2i}(w_{{\bm{a}}})=\operatorname{\gamma}_{2i}(w_{{\bm{b}}})$ and for all ${\bm{s}}\in\mathcal{S}({\bm{a}})$ and ${\bm{t}}\coloneqq\pi_{\mathcal{S}}({\bm{s}})$ it holds that $\operatorname{\gamma}_{2i}(v_{{\bm{s}}})=\operatorname{\gamma}_{2i}(v_{{\bm{t}}})$ . Applying the same lemma again, now to $\operatorname{\gamma}_{2i}(w_{{\bm{a}}})=\operatorname{\gamma}_{2i}(w_{{\bm{b}}})$ , yields that $\operatorname{\gamma}_{2i+1}(w_{{\bm{a}}})=\operatorname{\gamma}_{2i+1}(w_{{\bm{b}}})$ if, and only if, $\operatorname{\gamma}_{2i-1}(w_{{\bm{a}}})=\operatorname{\gamma}_{2i-1}(w_{{\bm{b}}})$ and for all ${\bm{s}}\in\mathcal{S}({\bm{a}})$ and ${\bm{t}}\coloneqq\pi_{\mathcal{S}}({\bm{s}})$ it holds that $\operatorname{\gamma}_{2i}(v_{{\bm{s}}})=\operatorname{\gamma}_{2i}(v_{{\bm{t}}})$ and $\operatorname{\gamma}_{2i-1}(v_{{\bm{s}}})=\operatorname{\gamma}_{2i-1}(v_{{\bm{t}}})$ .

Thus, we must show that $\operatorname{\varrho}_{i}({\bm{a}})=\operatorname{\varrho}_{i}({\bm{b}})$ if, and only if, $\operatorname{\gamma}_{2i-1}(w_{{\bm{a}}})=\operatorname{\gamma}_{2i-1}(w_{{\bm{b}}})$ and for all ${\bm{s}}\in\mathcal{S}({\bm{a}})$ and ${\bm{t}}\coloneqq\pi_{\mathcal{S}}({\bm{s}})$ it holds that $\operatorname{\gamma}_{2i}(v_{{\bm{s}}})=\operatorname{\gamma}_{2i}(v_{{\bm{t}}})$ and $\operatorname{\gamma}_{2i-1}(v_{{\bm{s}}})=\operatorname{\gamma}_{2i-1}(v_{{\bm{t}}})$ . Recall that we have $\operatorname{\varrho}_{i-1}({\bm{a}})=\operatorname{\varrho}_{i-1}({\bm{b}})$ . Since $2i-1=2(i-1)+1$ , we get that $\operatorname{\gamma}_{2i-1}(w_{{\bm{a}}})=\operatorname{\gamma}_{2i-1}(w_{{\bm{b}}})$ from the induction hypothesis. Hence, it remains to show that $\operatorname{\varrho}_{i}({\bm{a}})=\operatorname{\varrho}_{i}({\bm{b}})$ if, and only if, for all ${\bm{s}}\in\mathcal{S}({\bm{a}})$ and ${\bm{t}}\coloneqq\pi_{\mathcal{S}}({\bm{s}})$ it holds that $\operatorname{\gamma}_{2i}(v_{{\bm{s}}})=\operatorname{\gamma}_{2i}(v_{{\bm{t}}})$ and $\operatorname{\gamma}_{2i-1}(v_{{\bm{s}}})=\operatorname{\gamma}_{2i-1}(v_{{\bm{t}}})$ . The following two claims finish the proof: Claim 3.14 characterizes $\operatorname{\varrho}_{i}({\bm{a}})=\operatorname{\varrho}_{i}({\bm{b}})$ by the equalities in round $2i-1$ , and Claim 3.13 shows that, in this case, the equalities in round $2i$ follow.

Claim 3.13.

If $\operatorname{\varrho}_{i}({\bm{a}})=\operatorname{\varrho}_{i}({\bm{b}})$ , then the following holds for all ${\bm{s}}\in\mathcal{S}({\bm{a}})$ and ${\bm{t}}\coloneqq\pi_{\mathcal{S}}({\bm{s}})$ :
If $\operatorname{\gamma}_{2i-1}(v_{{\bm{s}}})=\operatorname{\gamma}_{2i-1}(v_{{\bm{t}}})$ , then $\operatorname{\gamma}_{2i}(v_{{\bm{s}}})=\operatorname{\gamma}_{2i}(v_{{\bm{t}}})$ .

Claim 3.14.

$\operatorname{\varrho}_{i}({\bm{a}})=\operatorname{\varrho}_{i}({\bm{b}})$ $\iff$ for all ${\bm{s}}\in\mathcal{S}({\bm{a}})$ and ${\bm{t}}\coloneqq\pi_{\mathcal{S}}({\bm{s}})$ we have $\operatorname{\gamma}_{2i-1}(v_{{\bm{s}}})=\operatorname{\gamma}_{2i-1}(v_{{\bm{t}}})$ .

To prove both claims we use Lemma 3.11, Section 3.3 and the definitions of CR and RCR. $\hfill\blacktriangleleft$

4 Connection to Homomorphism Counts

This section is devoted to proving the equivalence of statements (1) and (2) of Theorem A (b). I.e., we relate the distinguishing power of RCR to distinguishability via homomorphism counts from acyclic $\sigma$ -structures.

Acyclic $\sigma$ -structures.

Let $\mathcal{C}$ be a $\sigma$ -structure. A join-tree for $\mathcal{C}$ is a tree (i.e., an undirected, simple graph that is connected and acyclic) $J$ with vertex set $V(J)\coloneqq{\bm{C}}$ (i.e., the tuples in ${\bm{C}}$ serve as vertices of $J$ ) and which satisfies the following connectivity condition: for all $v\in V(\mathcal{C})$ the set $\{\,{{{\bm{c}}}\in V(J)\;:\;v\in\operatorname{set}({\bm{c}})}\,\}$ induces a connected subgraph of $J$ ; we will denote this subgraph (which in fact is a tree) by $J_{v}$ .
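The join-tree conditions are straightforward to verify mechanically. The following Python sketch (our own; the names `is_join_tree` and `_connected` are hypothetical) checks that the given edges form a tree on the tuples of $\mathcal{C}$ and that, for every element $v$, the tuples containing $v$ induce a connected subgraph, i.e., that every $J_{v}$ is connected.

```python
def _connected(vertices, adj):
    """True iff the subgraph induced by `vertices` is connected (or empty)."""
    if not vertices:
        return True
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj.get(u, set()) & vertices:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices

def is_join_tree(tuples, edges):
    """Check that (tuples, edges) is a tree satisfying the connectivity condition."""
    nodes = set(tuples)
    adj = {c: set() for c in nodes}
    for b, c in edges:
        adj[b].add(c)
        adj[c].add(b)
    # A connected graph with n vertices and n - 1 distinct edges is a tree.
    n_edges = len({frozenset(e) for e in edges})
    if n_edges != len(nodes) - 1 or not _connected(nodes, adj):
        return False
    elements = {v for c in nodes for v in c}
    return all(_connected({c for c in nodes if v in c}, adj) for v in elements)

C = [(1, 2), (2, 3), (3, 4)]
print(is_join_tree(C, [((1, 2), (2, 3)), ((2, 3), (3, 4))]))   # True
print(is_join_tree(C, [((1, 2), (3, 4)), ((3, 4), (2, 3))]))   # False: J_2 is disconnected
```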

We call a $\sigma$ -structure $\mathcal{C}$ acyclic if there exists a join-tree for $\mathcal{C}$ . This definition of acyclicity of $\sigma$ -structures is equivalent to acyclicity as defined in the textbook [1], to the notion of alpha-acyclicity as defined in [4, 6], and to $\mathcal{C}$ having (generalized or fractional) hypertree width 1 as defined in [17, 18, 24]. Other notions of acyclicity for relational structures (and hypergraphs) have also been considered in the literature, but alpha-acyclicity is arguably the most common and the least restrictive one. Consult [7] for a detailed survey on this topic.
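Deciding whether a join-tree exists does not require guessing $J$. A classical test for alpha-acyclicity, not the method of this paper, is the GYO reduction (after Graham, and Yu and Özsoyoğlu) applied to the hypergraph $\{\operatorname{set}({\bm{c}}) : {\bm{c}}\in{\bm{C}}\}$. The Python sketch below relies on this standard equivalence, which we assume here rather than prove; the function name is hypothetical.

```python
def is_alpha_acyclic(tuples):
    """GYO reduction on the hypergraph {set(c) : c in tuples}.

    Repeatedly (1) delete elements occurring in exactly one hyperedge and
    (2) delete a hyperedge that is empty or contained in another one.
    The hypergraph is alpha-acyclic iff every hyperedge gets deleted."""
    edges = [set(c) for c in tuples]
    changed = True
    while changed:
        changed = False
        for e in edges:                                      # rule (1)
            lonely = {v for v in e if sum(v in f for f in edges) == 1}
            if lonely:
                e -= lonely
                changed = True
        for idx, e in enumerate(edges):                      # rule (2)
            if not e or any(e <= f for jdx, f in enumerate(edges) if jdx != idx):
                del edges[idx]
                changed = True
                break
    return not edges

print(is_alpha_acyclic([(1, 2), (2, 3), (3, 1)]))             # False
print(is_alpha_acyclic([(1, 2), (2, 3), (3, 1), (1, 2, 3)]))  # True
print(is_alpha_acyclic([(1, 2, 3)]))                          # True
```

The last example is a single ternary tuple: it is acyclic, although its Gaifman graph is a triangle.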

For the special case of binary signatures $\widehat{\sigma}$ , i.e., where $\widehat{\sigma}$ -structures are colored multigraphs, it is known [7] that a $\widehat{\sigma}$ -structure $\mathcal{C}$ is acyclic if, and only if, its Gaifman graph is acyclic (w.r.t. the usual notion of acyclicity of undirected simple graphs). It is well known that for non-binary signatures $\sigma$ there exist acyclic $\sigma$ -structures whose Gaifman graph is not acyclic; for instance, a structure consisting of a single ternary tuple $(1,2,3)$ is acyclic, but its Gaifman graph is a triangle.

Homomorphisms.

A homomorphism from a $\sigma$ -structure $\mathcal{C}$ to a $\sigma$ -structure $\mathcal{A}$ is a mapping $h\colon V(\mathcal{C})\to V(\mathcal{A})$ such that for all $R\in\sigma$ , for $k\coloneqq\operatorname{ar}(R)$ , and all ${\bm{c}}=(c_{1},\dots,c_{k})\in R^{\mathcal{C}}$ we have $(h(c_{1}),\dots,h(c_{k}))\in R^{\mathcal{A}}$ . We write $\operatorname{Hom}(\mathcal{C},\mathcal{A})$ for the set of homomorphisms from $\mathcal{C}$ to $\mathcal{A}$ , and we let $\operatorname{hom}(\mathcal{C},\mathcal{A})\coloneqq\lvert{}{\operatorname{Hom% }(\mathcal{C},\mathcal{A})}\rvert$ denote the number of homomorphisms from $\mathcal{C}$ to $\mathcal{A}$ .
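For very small structures, $\operatorname{hom}(\mathcal{C},\mathcal{A})$ can be computed directly from the definition by enumerating all $\lvert V(\mathcal{A})\rvert^{\lvert V(\mathcal{C})\rvert}$ mappings. The Python sketch below does exactly that; the encoding of a structure as a pair (universe, dict from relation symbols to sets of tuples) and the name `hom` are our own assumptions, and the exponential running time makes it suitable only for toy examples.

```python
from itertools import product

def hom(C, A):
    """Number of homomorphisms from C to A, by brute force over all maps.

    A structure is a pair (universe, relations), where relations maps each
    relation symbol R to the set of tuples R^C (resp. R^A)."""
    (VC, RC), (VA, RA) = C, A
    VC, VA = list(VC), list(VA)
    count = 0
    for image in product(VA, repeat=len(VC)):
        h = dict(zip(VC, image))
        # h is a homomorphism iff it maps every tuple of R^C into R^A
        if all(tuple(h[x] for x in c) in RA.get(R, set())
               for R, tuples in RC.items() for c in tuples):
            count += 1
    return count

path = ({1, 2, 3}, {'E': {(1, 2), (2, 3)}})
two_cycle = ({'x', 'y'}, {'E': {('x', 'y'), ('y', 'x')}})
print(hom(path, two_cycle))   # 2
```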

The remainder of this section is dedicated to proving the equivalence of statements (1) and (2) of Theorem A (b), i.e., proving the following theorem.

Theorem C.

For all $\sigma$ -structures $\mathcal{A}$ and $\mathcal{B}$ , the following statements are equivalent.

1.

RCR distinguishes $\mathcal{A}$ and $\mathcal{B}$ .
2.

There exists an acyclic and connected $\sigma$ -structure $\mathcal{C}$ such that $\operatorname{hom}(\mathcal{C},\mathcal{A})\neq\operatorname{hom}(\mathcal{C},\mathcal{B})$ .

This can be viewed as a generalization of the following result by Dvořák [15] and Dell, Grohe, Rattan [13] to arbitrary signatures $\sigma$ . While [15, 13] state the theorem just for graphs, it easily extends to colored multigraphs (as noted in [13]). A colored multitree is an acyclic and connected colored multigraph, i.e., a colored multigraph whose Gaifman graph is a tree.

Theorem 4.1 ([15, 13]).

Let $\mathcal{G}$ and $\mathcal{H}$ be colored multigraphs. The following statements are equivalent.

1.

CR distinguishes $\mathcal{G}$ and $\mathcal{H}$ .
2.

There exists a colored multitree $\mathcal{T}$ such that $\operatorname{hom}(\mathcal{T},\mathcal{G})\neq\operatorname{hom}(\mathcal{T},\mathcal{H})$ .
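For trees, $\operatorname{hom}(\mathcal{T},\mathcal{G})$ can be computed by a standard bottom-up dynamic program instead of brute force. The sketch below is our own illustration and, for brevity, handles only uncolored simple undirected graphs given as adjacency dicts, not colored multigraphs; all names are hypothetical. It also illustrates Theorem 4.1 on a classical example: a 6-cycle and two disjoint triangles, which CR does not distinguish, have equal homomorphism counts from every tree (for a $d$-regular graph on $n$ vertices, this count is $n\cdot d^{m}$ for a tree with $m$ edges).

```python
from collections import defaultdict

def hom_tree(tree_edges, root, G):
    """Count homomorphisms from the tree T (edge list, rooted at `root`) into
    the undirected graph G, given as a dict vertex -> set of neighbours."""
    T = defaultdict(set)
    for u, v in tree_edges:
        T[u].add(v)
        T[v].add(u)

    def table(u, parent):
        # result[x] = number of homomorphisms of the subtree below u with u -> x
        child_tables = [table(c, u) for c in T[u] if c != parent]
        result = {}
        for x in G:
            count = 1
            for tab in child_tables:
                count *= sum(tab[y] for y in G[x])
            result[x] = count
        return result

    return sum(table(root, None).values())

def cycle(n, offset=0):
    return {offset + i: {offset + (i - 1) % n, offset + (i + 1) % n} for i in range(n)}

C6 = cycle(6)
two_K3 = {**cycle(3), **cycle(3, offset=3)}
star = [(0, 1), (0, 2), (0, 3)]
print(hom_tree(star, 0, C6), hom_tree(star, 0, two_K3))   # 48 48
```

A triangle, which is not a tree, does separate the two graphs, since $C_6$ contains no triangle; this matches the fact that CR cannot detect it.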

Theorem 4.1 will serve as the first key ingredient of our proof of Theorem C. The second key ingredient is the following notion of a join-tree representation $\mathcal{G}_{\mathcal{C}}^{J}$ . Recall from Definition 3.1 the binary signature $\widehat{\sigma}\coloneqq\{\,E_{i,j}\;:\;i,j\in[\operatorname{ar}(\sigma)]\,\}\cup\{\,U_{R}\;:\;R\in\sigma\,\}$ and the colored multigraph $\mathcal{G}_{\mathcal{A}}$ of signature $\widehat{\sigma}$ that represents a $\sigma$ -structure $\mathcal{A}$ . For an acyclic $\sigma$ -structure $\mathcal{C}$ and a join-tree $J$ for $\mathcal{C}$ we define the colored multigraph $\mathcal{G}_{\mathcal{C}}^{J}$ of signature $\widehat{\sigma}$ to have universe $V(\mathcal{G}_{\mathcal{C}}^{J})\coloneqq\{\,v_{{\bm{c}}}\;:\;{\bm{c}}\in{\bm{C}}\,\}$ , where $v_{{\bm{c}}}$ is a new vertex for each tuple ${\bm{c}}\in{\bm{C}}$ , and

$${(U_{R})}^{\mathcal{G}_{\mathcal{C}}^{J}}\coloneqq\{\,v_{{\bm{c}}}\;:\;{\bm{c}}\in R^{\mathcal{C}}\,\}\quad\text{and}\quad{(E_{i,j})}^{\mathcal{G}_{\mathcal{C}}^{J}}\coloneqq\{\,(v_{{\bm{b}}},v_{{\bm{c}}})\;:\;\{{\bm{b}},{\bm{c}}\}\in E(J)\;\text{and}\;(i,j)\in\mathsf{stp}({\bm{b}},{\bm{c}})\,\}$$

for all $R\in\sigma$ and all $i,j\in[\operatorname{ar}(\sigma)]$ . Via identifying ${\bm{c}}$ and $v_{{\bm{c}}}$ for all ${\bm{c}}\in{\bm{C}}$ , the Gaifman graph of $\mathcal{G}_{\mathcal{C}}^{J}$ is isomorphic to a subgraph of $J$ . Thus, since $J$ is a tree, $\mathcal{G}_{\mathcal{C}}^{J}$ is acyclic.

The last two ingredients for our proof of Theorem C are the following lemmas:

Lemma 4.2.

For $\sigma$ -structures $\mathcal{A}$ and $\mathcal{C}$ and any join-tree $J$ for $\mathcal{C}$ we have:
$\operatorname{hom}(\mathcal{C},\mathcal{A})\;=\;\operatorname{hom}(\mathcal{G}_{\mathcal{C}}^{J},\mathcal{G}_{\mathcal{A}})$ .
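Lemma 4.2 can be sanity-checked by brute force on toy inputs. The Python sketch below builds $\mathcal{G}_{\mathcal{C}}^{J}$ as defined above and a version of $\mathcal{G}_{\mathcal{A}}$, then counts homomorphisms on both sides. Since $\mathcal{G}_{\mathcal{A}}$ is defined in Section 3.2, its construction here is an assumption: we put $(w_{\bm{a}},w_{\bm{b}})$ into $E_{i,j}$ whenever $(i,j)\in\mathsf{stp}({\bm{a}},{\bm{b}})$, including ${\bm{a}}={\bm{b}}$, and we identify $w_{\bm{a}}$ with the tuple ${\bm{a}}$. The self-loops matter: without them, two adjacent tuples of $\mathcal{C}$ could not be mapped to the same tuple of $\mathcal{A}$. All names are hypothetical.

```python
from itertools import product

def stp(x, y):
    # Assumed: 1-based position pairs (i, j) with x_i == y_j.
    return {(i, j) for i, xi in enumerate(x, 1)
            for j, yj in enumerate(y, 1) if xi == yj}

def hom(C, A):
    """Brute-force homomorphism count; a structure is (universe, {symbol: tuples})."""
    (VC, RC), (VA, RA) = C, A
    VC = list(VC)
    total = 0
    for image in product(list(VA), repeat=len(VC)):
        h = dict(zip(VC, image))
        total += all(tuple(h[x] for x in c) in RA.get(R, set())
                     for R, ts in RC.items() for c in ts)
    return total

def G_of(structure):
    """G_A with w_a identified with a; (a, b) in E_{i,j} iff (i, j) in stp(a, b)."""
    _, rels = structure
    A = set().union(*rels.values())
    G = {('U', R): {(a,) for a in ts} for R, ts in rels.items()}
    for a, b in product(A, repeat=2):
        for i, j in stp(a, b):
            G.setdefault(('E', i, j), set()).add((a, b))
    return A, G

def G_CJ(structure, J):
    """Join-tree representation G_C^J: E_{i,j} edges only along edges of J."""
    _, rels = structure
    C = set().union(*rels.values())
    G = {('U', R): {(c,) for c in ts} for R, ts in rels.items()}
    for b, c in J:
        for x, y in ((b, c), (c, b)):
            for i, j in stp(x, y):
                G.setdefault(('E', i, j), set()).add((x, y))
    return C, G

C = ({1, 2, 3}, {'E': {(1, 2), (2, 3)}})
J = [((1, 2), (2, 3))]
A = ({0, 1}, {'E': {(0, 0), (0, 1), (1, 0)}})
print(hom(C, A), hom(G_CJ(C, J), G_of(A)))   # 5 5
```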

Lemma 4.3.

Let $\mathcal{A}$ and $\mathcal{B}$ be $\sigma$ -structures, and let $\mathcal{T}$ be a colored multitree of signature $\widehat{\sigma}$ such that $\operatorname{hom}(\mathcal{T},\mathcal{G}_{\mathcal{A}})\neq\operatorname{hom}(\mathcal{T},\mathcal{G}_{\mathcal{B}})$ . Then there exist an acyclic and connected $\sigma$ -structure $\mathcal{C}$ and a join-tree $J$ for $\mathcal{C}$ such that $\operatorname{hom}(\mathcal{G}_{\mathcal{C}}^{J},\mathcal{G}_{\mathcal{A}})\neq\operatorname{hom}(\mathcal{G}_{\mathcal{C}}^{J},\mathcal{G}_{\mathcal{B}})$ .

Before proving these lemmas, we first show how to use the four key ingredients to prove Theorem C.

Proof of Theorem C.

As pointed out in Section 3.2, running RCR on a $\sigma$ -structure $\mathcal{A}$ produces a stable coloring that is equivalent (via identifying ${\bm{a}}\in{\bm{A}}$ with $w_{\bm{a}}\in V(\mathcal{G}_{\mathcal{A}})$ ) to the stable coloring produced by the classical CR on the colored multigraph $\mathcal{G}_{\mathcal{A}}$ . Thus, RCR distinguishes the $\sigma$ -structures $\mathcal{A}$ and $\mathcal{B}$ if, and only if, classical CR distinguishes $\mathcal{G}_{\mathcal{A}}$ and $\mathcal{G}_{\mathcal{B}}$ . According to Theorem 4.1 the latter is the case if, and only if, there is a colored multitree $\mathcal{T}$ with $\operatorname{hom}(\mathcal{T},\mathcal{G}_{\mathcal{A}})\neq\operatorname{hom}(\mathcal{T},\mathcal{G}_{\mathcal{B}})$ .

Hence, for the direction “1 $\Rightarrow$ 2”, if RCR distinguishes $\mathcal{A}$ and $\mathcal{B}$ , then there exists a colored multitree $\mathcal{T}$ with $\operatorname{hom}(\mathcal{T},\mathcal{G}_{\mathcal{A}})\neq\operatorname{hom}(\mathcal{T},\mathcal{G}_{\mathcal{B}})$ . By Lemma 4.3, there also exist an acyclic and connected $\sigma$ -structure $\mathcal{C}$ and a join-tree $J$ for $\mathcal{C}$ such that $\operatorname{hom}(\mathcal{G}_{\mathcal{C}}^{J},\mathcal{G}_{\mathcal{A}})\neq\operatorname{hom}(\mathcal{G}_{\mathcal{C}}^{J},\mathcal{G}_{\mathcal{B}})$ . According to Lemma 4.2, this implies that $\operatorname{hom}(\mathcal{C},\mathcal{A})\neq\operatorname{hom}(\mathcal{C},\mathcal{B})$ .

For the direction “2 $\Rightarrow$ 1”, if there exists an acyclic and connected $\sigma$ -structure $\mathcal{C}$ such that $\operatorname{hom}(\mathcal{C},\mathcal{A})\neq\operatorname{hom}(\mathcal{C},\mathcal{B})$ , then according to Lemma 4.2 we have $\operatorname{hom}(\mathcal{G}_{\mathcal{C}}^{J},\mathcal{G}_{\mathcal{A}})\neq\operatorname{hom}(\mathcal{G}_{\mathcal{C}}^{J},\mathcal{G}_{\mathcal{B}})$ , for any join-tree $J$ for $\mathcal{C}$ . By construction, the Gaifman graph of $\mathcal{G}_{\mathcal{C}}^{J}$ is a subgraph of $J$ . In fact, since the Gaifman graph of $\mathcal{C}$ is connected, the Gaifman graph of $\mathcal{G}_{\mathcal{C}}^{J}$ is exactly $J$ : if some edge $\{{\bm{b}},{\bm{c}}\}$ of $J$ had $\mathsf{stp}({\bm{b}},{\bm{c}})=\varnothing$ , then no subtree $J_{v}$ would contain this edge (no element lies in both tuples), so removing it would split $J$ into two subtrees whose tuples share no element, contradicting connectedness. Hence, $\mathcal{G}_{\mathcal{C}}^{J}$ is a colored multitree. Thus, as pointed out in the first paragraph of the proof, RCR distinguishes $\mathcal{A}$ and $\mathcal{B}$ . $\hfill\blacktriangleleft$

The remainder of this section is devoted to proving Lemmas 4.2 and 4.3. The following notation will be convenient. If $f$ is a mapping from a set $V$ to a set $V^{\prime}$ , and ${\bm{a}}=(a_{1},\ldots,a_{k})$ is a tuple in $V^{k}$ for some $k\in\mathbb{N}_{\scriptscriptstyle\geqslant 1}$ , then we write $f({\bm{a}})$ for the tuple $(f(a_{1}),\ldots,f(a_{k}))$ .

Proof of Lemma 4.2.

We prove the lemma by providing a bijection $\pi$ between $\operatorname{Hom}(\mathcal{C},\mathcal{A})$ and $\operatorname{Hom}(\mathcal{G}_{\mathcal{C}}^{J},\mathcal{G}_{\mathcal{A}})$ . Recall that $V(\mathcal{G}_{\mathcal{C}}^{J})=\{\,v_{{\bm{c}}}\;:\;{\bm{c}}\in{\bm{C}}\,\}$ and $V(\mathcal{G}_{\mathcal{A}})=\{\,w_{\bm{a}}\;:\;{\bm{a}}\in{\bm{A}}\,\}$ .

For all $h\in\operatorname{Hom}(\mathcal{C},\mathcal{A})$ let $\pi(h)\coloneqq h^{\prime}$ , where $h^{\prime}$ is defined by $h^{\prime}(v_{\bm{c}})\coloneqq w_{h({\bm{c}})}$ for all ${\bm{c}}\in{\bm{C}}$ . Since ${\bm{c}}\in{\bm{C}}$ and $h\in\operatorname{Hom}(\mathcal{C},\mathcal{A})$ , we obtain that $h({\bm{c}})\in{\bm{A}}$ . Hence, $h^{\prime}(v_{\bm{c}})=w_{h({\bm{c}})}\in V(\mathcal{G}_{\mathcal{A}})$ . To prove the lemma, it suffices to verify that: (a) $\operatorname{img}(\pi)\subseteq\operatorname{Hom}(\mathcal{G}_{\mathcal{C}}^{J},\mathcal{G}_{\mathcal{A}})$ , i.e., $\pi(h)$ is a homomorphism for every $h\in\operatorname{Hom}(\mathcal{C},\mathcal{A})$ ; (b) $\pi$ is injective; and (c) $\pi$ is surjective. The proofs of statements (a) and (b) are straightforward. For the proof of statement (c) let $h^{\prime\prime}\in\operatorname{Hom}(\mathcal{G}_{\mathcal{C}}^{J},\mathcal{G}_{\mathcal{A}})$ . Our aim is to find an $h\in\operatorname{Hom}(\mathcal{C},\mathcal{A})$ such that $h^{\prime\prime}=\pi(h)$ . By definition of $\mathcal{C}$ (recall our assumption on $\sigma$ -structures described in Section 2), for every $z\in V(\mathcal{C})$ there exist an $R\in\sigma$ and a tuple ${\bm{c}}\in R^{\mathcal{C}}$ such that $z\in\operatorname{set}({\bm{c}})$ . For each $z\in V(\mathcal{C})$ we choose an arbitrary but fixed such $R$ and ${\bm{c}}$ , which we henceforth denote by $R_{z}$ and ${\bm{c}}_{z}$ , and we fix an $i_{z}\in[\operatorname{ar}(R_{z})]$ such that $z$ is the $i_{z}$ -th component of the tuple ${\bm{c}}_{z}$ . Since ${\bm{c}}_{z}\in{(R_{z})}^{\mathcal{C}}$ , by definition of $\mathcal{G}_{\mathcal{C}}^{J}$ we have $v_{{\bm{c}}_{z}}\in{(U_{R_{z}})}^{\mathcal{G}_{\mathcal{C}}^{J}}$ . Since $h^{\prime\prime}\in\operatorname{Hom}(\mathcal{G}_{\mathcal{C}}^{J},\mathcal{G}_{\mathcal{A}})$ , we obtain $h^{\prime\prime}(v_{{\bm{c}}_{z}})\in{(U_{R_{z}})}^{\mathcal{G}_{\mathcal{A}}}$ .
By definition of $\mathcal{G}_{\mathcal{A}}$ there is a tuple ${\bm{a}}_{z}\in{(R_{z})}^{\mathcal{A}}$ such that $h^{\prime\prime}(v_{{\bm{c}}_{z}})=w_{{\bm{a}}_{z}}$ . Let us write $x_{z}$ to denote the $i_{z}$ -th component of the tuple ${\bm{a}}_{z}$ . Clearly, $x_{z}\in V(\mathcal{A})$ . Using these notions, we define the mapping $h\colon V(\mathcal{C})\to V(\mathcal{A})$ by letting $h(z)\coloneqq x_{z}$ for every $z\in V(\mathcal{C})$ .
Claim 1: For all ${\bm{c}}\in{\bm{C}}$ and ${\bm{a}}\in{\bm{A}}$ with $h^{\prime\prime}(v_{\bm{c}})=w_{\bm{a}}$ we have: $h({\bm{c}})={\bm{a}}$ .
Claim 2: $h\in\operatorname{Hom}(\mathcal{C},\mathcal{A})$ .

See the paper’s full version for proofs of both claims. To complete the proof of Lemma 4.2 it suffices to show that $\pi(h)=h^{\prime\prime}$ . By definition of $\pi$ we have: $\pi(h)=h^{\prime}$ , where $h^{\prime}$ is defined by $h^{\prime}(v_{\bm{c}})\coloneqq w_{h({\bm{c}})}$ for all ${\bm{c}}\in{\bm{C}}$ . From Claim 1 we obtain that $h^{\prime\prime}(v_{\bm{c}})=h^{\prime}(v_{{\bm{c}}})$ for all ${\bm{c}}\in{\bm{C}}$ . Hence, $h^{\prime\prime}=h^{\prime}=\pi(h)$ . This completes the proof of statement (c) and the proof of Lemma 4.2. $\hfill\blacktriangleleft$

Proof of Lemma 4.3.

By assumption, $\operatorname{hom}(\mathcal{T},\mathcal{G}_{\mathcal{A}})\neq\operatorname{hom}(\mathcal{T},\mathcal{G}_{\mathcal{B}})$ for a colored multitree $\mathcal{T}$ of signature $\widehat{\sigma}$ and $\sigma$ -structures $\mathcal{A}$ , $\mathcal{B}$ . We will show that the homomorphisms from $\mathcal{T}$ provide us with “templates” for acyclic connected $\sigma$ -structures, one of which must have a number of homomorphisms into $\mathcal{A}$ that differs from its number of homomorphisms into $\mathcal{B}$ .

Let us write $T$ to denote the Gaifman graph of $\mathcal{T}$ . Since $\mathcal{T}$ is a colored multitree, $T$ is a tree. For an $h\in\operatorname{Hom}(\mathcal{T},\mathcal{G}_{\mathcal{A}})$ the print $P_{h}$ of $h$ in $\mathcal{G}_{\mathcal{A}}$ is the colored multigraph of signature $\widehat{\sigma}$ defined by $V(P_{h})\coloneqq V(\mathcal{T})=V(T)$ and, for all $R\in\sigma$ and all $i,j\in[\operatorname{ar}(\sigma)]$ :

$$\begin{array}{rcl}{(U_{R})}^{P_{h}}&\coloneqq&\{\,v\in V(T)\;:\;h(v)\in{(U_{R})}^{\mathcal{G}_{\mathcal{A}}}\,\}\,,\\{(E_{i,j})}^{P_{h}}&\coloneqq&\{\,(u,v)\;:\;(h(u),h(v))\in{(E_{i,j})}^{\mathcal{G}_{\mathcal{A}}}\text{ and }\big(u=v\text{ or }\{u,v\}\in E(T)\big)\,\}\,.\end{array}$$

Note that $T$ is also the Gaifman graph of $P_{h}$ . We let $P_{\mathcal{A}}=\{\,P_{h}\,:\,h\in\operatorname{Hom}(\mathcal{T},\mathcal{G}_{\mathcal{A}})\,\}$ . The notion of the print $P_{h}$ of $h$ in $\mathcal{G}_{\mathcal{B}}$ for $h\in\operatorname{Hom}(\mathcal{T},\mathcal{G}_{\mathcal{B}})$ and the set $P_{\mathcal{B}}$ are defined analogously. Note that $P_{\mathcal{A}}$ and $P_{\mathcal{B}}$ are not necessarily disjoint, and that different homomorphisms may have the same print.

For every $P\in P_{\mathcal{A}}\cup P_{\mathcal{B}}$ we let $\#(P,\mathcal{A})\coloneqq|\{\,h\in\operatorname{Hom}(\mathcal{T},\mathcal{G}_{\mathcal{A}})\,:\,P_{h}=P\,\}|$. The number $\#(P,\mathcal{B})$ is defined analogously. Note that $\operatorname{hom}(\mathcal{T},\mathcal{G}_{\mathcal{A}})=\sum_{P\in P_{\mathcal{A}}}\#(P,\mathcal{A})$ and $\operatorname{hom}(\mathcal{T},\mathcal{G}_{\mathcal{B}})=\sum_{P\in P_{\mathcal{B}}}\#(P,\mathcal{B})$; see the paper’s full version for a proof.

For any two prints $P,P^{\prime}$ we say that $P$ is a subprint of $P^{\prime}$ (for short: $P\preceq P^{\prime}$ ) if ${(U_{R})}^{P}\subseteq{(U_{R})}^{P^{\prime}}$ and ${(E_{i,j})}^{P}\subseteq{(E_{i,j})}^{P^{\prime}}$ , for all $R\in\sigma$ and all $i,j\in[\operatorname{ar}(\sigma)]$ . Obviously, $\preceq$ is a partial order on $P_{\mathcal{A}}\cup P_{\mathcal{B}}$ .

It can be verified that for every print $P$ we have $\operatorname{hom}(P,\mathcal{G}_{\mathcal{A}})=\sum_{P^{\prime}:P\preceq P^{\prime}}\#(P^{\prime},\mathcal{A})$ and $\operatorname{hom}(P,\mathcal{G}_{\mathcal{B}})=\sum_{P^{\prime}:P\preceq P^{\prime}}\#(P^{\prime},\mathcal{B})$.
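The two counting identities above can be sanity-checked by brute force on a small instance. The following Python sketch is only an illustration under simplifying assumptions that are not part of the paper: instead of the full families $U_{R}$ and $E_{i,j}$ of signature $\widehat{\sigma}$, it uses a single unary relation `U` and a single binary relation `E`, and the tree, the target graph and all function names (`homomorphisms`, `print_of`, `is_subprint`, `check_decomposition`) are hypothetical. It enumerates all homomorphisms from the tree, groups them by their print, and compares $\operatorname{hom}(P,\mathcal{G})$ with the sum of $\#(P^{\prime},\cdot)$ over all superprints $P^{\prime}$ of $P$.

```python
from collections import Counter
from itertools import product


def homomorphisms(dom, U_src, E_src, cod, U_tgt, E_tgt):
    """Brute force: yield every map h: dom -> cod (as a dict) that preserves U and E."""
    for image in product(cod, repeat=len(dom)):
        h = dict(zip(dom, image))
        if all(h[v] in U_tgt for v in U_src) and all(
            (h[u], h[v]) in E_tgt for (u, v) in E_src
        ):
            yield h


def print_of(h, tree_edges, U_tgt, E_tgt):
    """Print P_h: the U- and E-facts realized by h, with E restricted to loops and tree edges."""
    allowed = {(v, v) for v in h}
    allowed |= set(tree_edges) | {(v, u) for (u, v) in tree_edges}
    U_P = frozenset(v for v in h if h[v] in U_tgt)
    E_P = frozenset(p for p in allowed if (h[p[0]], h[p[1]]) in E_tgt)
    return (U_P, E_P)


def is_subprint(P, Q):
    """P is a subprint of Q iff every U-fact and every E-fact of P also occurs in Q."""
    return P[0] <= Q[0] and P[1] <= Q[1]


def check_decomposition(dom, tree_edges, U_src, cod, U_tgt, E_tgt):
    """Return (both identities hold?, hom(T, G), number of distinct prints)."""
    E_src = set(tree_edges)  # assumption: T has one E-fact per tree edge, oriented as listed
    homs = list(homomorphisms(dom, U_src, E_src, cod, U_tgt, E_tgt))
    counts = Counter(print_of(h, tree_edges, U_tgt, E_tgt) for h in homs)  # #(P, .)
    ok = len(homs) == sum(counts.values())  # hom(T, G) = sum over P of #(P, .)
    for P in counts:  # hom(P, G) = sum over all superprints P' of #(P', .)
        lhs = sum(1 for _ in homomorphisms(dom, P[0], P[1], cod, U_tgt, E_tgt))
        rhs = sum(c for Q, c in counts.items() if is_subprint(P, Q))
        ok = ok and lhs == rhs
    return ok, len(homs), len(counts)


# Hypothetical instance: the path a -> b -> c mapped into a 3-vertex graph with a loop at 1.
ok, n_homs, n_prints = check_decomposition(
    dom=["a", "b", "c"],
    tree_edges=[("a", "b"), ("b", "c")],
    U_src=set(),
    cod=[0, 1, 2],
    U_tgt={1},
    E_tgt={(0, 1), (1, 2), (2, 0), (1, 1)},
)
```

On this toy instance there are 6 homomorphisms and both identities hold for every print; a finite check like this only illustrates the statement and is no substitute for the proof in the full version.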

Since $\operatorname{hom}(\mathcal{T},\mathcal{G}_{\mathcal{A}})=\sum_{P\in P_{\mathcal{A}}}\#(P,\mathcal{A})$ and $\operatorname{hom}(\mathcal{T},\mathcal{G}_{\mathcal{B}})=\sum_{P\in P_{\mathcal{B}}}\#(P,\mathcal{B})$ and, by assumption, $\operatorname{hom}(\mathcal{T},\mathcal{G}_{\mathcal{A}})\neq\operatorname{hom}(\mathcal{T},\mathcal{G}_{\mathcal{B}})$, there must be a $P\in P_{\mathcal{A}}\cup P_{\mathcal{B}}$ such that $\#(P,\mathcal{A})\neq\#(P,\mathcal{B})$ (note that $\#(P,\mathcal{A})=0$ if $P\notin P_{\mathcal{A}}$, and analogously for $\mathcal{B}$). We choose such a $P$ that is maximal w.r.t. the partial order $\preceq$, i.e., $\#(P,\mathcal{A})\neq\#(P,\mathcal{B})$, but $\#(P^{\prime},\mathcal{A})=\#(P^{\prime},\mathcal{B})$ for all $P^{\prime}$ with $P\preceq P^{\prime}$ and $P^{\prime}\neq P$. Combining this with the identities $\operatorname{hom}(P,\mathcal{G}_{\mathcal{A}})=\sum_{P^{\prime}:P\preceq P^{\prime}}\#(P^{\prime},\mathcal{A})$ and $\operatorname{hom}(P,\mathcal{G}_{\mathcal{B}})=\sum_{P^{\prime}:P\preceq P^{\prime}}\#(P^{\prime},\mathcal{B})$, whose right-hand sides agree in every term except the one for $P^{\prime}=P$, we obtain $\operatorname{hom}(P,\mathcal{G}_{\mathcal{A}})\neq\operatorname{hom}(P,\mathcal{G}_{\mathcal{B}})$.
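The choice of a maximal $P$ is what makes this step work: the two up-set sums then differ by exactly $\#(P,\mathcal{A})-\#(P,\mathcal{B})\neq 0$. The Python sketch below illustrates the step on abstract data only; it is our own toy, not taken from the paper. Prints are modeled as frozensets of hypothetical “facts” ordered by inclusion, the count tables are made up, and the names `maximal_witness` and `upset_sum` are hypothetical.

```python
def maximal_witness(count_A, count_B, leq):
    """Return a leq-maximal P with count_A[P] != count_B[P] (missing entries count as 0), or None."""
    prints = set(count_A) | set(count_B)
    differing = [P for P in prints if count_A.get(P, 0) != count_B.get(P, 0)]
    for P in differing:
        # P is maximal iff no strictly larger print also has differing counts
        if not any(Q != P and leq(P, Q) for Q in differing):
            return P
    return None


def upset_sum(P, counts, leq):
    """Right-hand side of hom(P, G) = sum of #(P', .) over all P' with P <= P'."""
    return sum(c for Q, c in counts.items() if leq(P, Q))


def subset(P, Q):
    return P <= Q


# Made-up counts: the totals differ (4 vs. 5), so a witness must exist.
bottom, x, xy = frozenset(), frozenset({"x"}), frozenset({"x", "y"})
count_A = {bottom: 1, x: 2, xy: 1}
count_B = {bottom: 3, x: 1, xy: 1}
P = maximal_witness(count_A, count_B, subset)
print(P, upset_sum(P, count_A, subset), upset_sum(P, count_B, subset))  # frozenset({'x'}) 3 2
```

Here `bottom` also has differing counts but is not maximal, since `x` lies above it and differs as well; picking `bottom` would not work, because its up-set sums (4 and 5) mix several differing terms.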

Now, all that remains to be done is to show that there exist an acyclic and connected $\sigma$-structure $\mathcal{C}$ and a join-tree $J$ for $\mathcal{C}$ ($J$ will have exactly the same shape as $T$) such that $\mathcal{G}_{\mathcal{C}}^{J}$ is isomorphic to $P$; then $\operatorname{hom}(P,\mathcal{G}_{\mathcal{A}})\neq\operatorname{hom}(P,\mathcal{G}_{\mathcal{B}})$ implies that $\operatorname{hom}(\mathcal{G}_{\mathcal{C}}^{J},\mathcal{G}_{\mathcal{A}})\neq\operatorname{hom}(\mathcal{G}_{\mathcal{C}}^{J},\mathcal{G}_{\mathcal{B}})$. Details on how to construct $\mathcal{C}$ are given in the paper’s full version. This completes the proof of Lemma 4.3. $\hfill\blacktriangleleft$

5 Connection to Logic

This section’s goal is to provide a logical characterization of the distinguishing power of RCR. We aim for a theorem that is analogous to the following result due to Immerman and Lander [26] and Cai, Fürer, Immerman [10] concerning the logic $\mathsf{C}$, a syntactic extension of first-order logic with counting quantifiers of the form $\exists^{\geqslant n}x\,\psi$ (for every fixed $n\in\mathbb{N}_{\geqslant 1}$), expressing “there exist at least $n$ values for $x$ such that $\psi$ holds”. The restriction of $\mathsf{C}$ to two variables is denoted by $\mathsf{C}^{2}$.

Theorem 5.1 ([10, 26]).

Let $G$ and $H$ be graphs. The following statements are equivalent:

1. CR distinguishes $G$ and $H$.
2. There exists a sentence $\varphi\in\mathsf{C}^{2}$ such that $G\models\varphi$ and $H\not\models\varphi$.
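For readers less familiar with the graph case, the following Python sketch shows the classical CR procedure that Theorem 5.1 refers to (not RCR): every vertex is repeatedly recolored by its old color together with the multiset of its neighbors’ colors until the partition stops splitting, and two graphs are compared by running CR on their disjoint union and comparing color histograms. The adjacency-list encoding (undirected edges listed in both directions) and the function names `color_refinement` and `cr_distinguishes` are our own choices for illustration.

```python
from collections import Counter


def color_refinement(adj):
    """Classical CR on an adjacency-list graph; returns the stable coloring as a dict."""
    colors = {v: 0 for v in adj}  # start with a uniform coloring
    while True:
        # new color = (old color, sorted multiset of neighbor colors)
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj
        }
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        new_colors = {v: palette[signatures[v]] for v in adj}
        # the new partition refines the old one, so equal class counts mean it is stable
        if len(set(new_colors.values())) == len(set(colors.values())):
            return new_colors
        colors = new_colors


def cr_distinguishes(G, H):
    """Run CR on the disjoint union of G and H and compare their color histograms."""
    union = {("G", v): [("G", u) for u in nbrs] for v, nbrs in G.items()}
    union.update({("H", v): [("H", u) for u in nbrs] for v, nbrs in H.items()})
    colors = color_refinement(union)
    hist_G = Counter(c for (side, _), c in colors.items() if side == "G")
    hist_H = Counter(c for (side, _), c in colors.items() if side == "H")
    return hist_G != hist_H


# The 6-cycle and two disjoint triangles are both 2-regular: CR cannot tell them apart.
C6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
# A star with three leaves and a path on four vertices differ in their degrees.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(cr_distinguishes(C6, two_triangles), cr_distinguishes(star, path))  # False True
```

The first pair is a standard example of non-isomorphic graphs that CR does not distinguish; by Theorem 5.1, no $\mathsf{C}^{2}$-sentence separates them either.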

5.1 The Guarded Fragment of Counting Logic

This section introduces the guarded fragment of the logic $\mathsf{C}$, ${\mathsf{GF}}(\mathsf{C})$ for short. Its definition is in the same spirit as that of the logic $\mathsf{GF}$ (the guarded fragment of first-order logic; see [25]) and the logic $\mathsf{GF(L)}$ (the guarded fragment of any logic $\mathsf{L}$ that is a subset of first-order logic; see [18]). We use a notation similar to that of [18], but adapt it in order to obtain a reasonable notion of “guarded fragment of $\mathsf{C}$”. As in [25, 18], the term “guarded” refers to the fact that quantifiers are appropriately relativized by relational atoms.

We have available a countably infinite set $\mathsf{Var}\coloneqq\{\,\mathtt{v}_{i}\;:\;i\in\mathbb{N}_{\geqslant 1}\,\}$ of variables. We call a tuple ${\bm{\mathtt{v}}}=(\mathtt{v}_{i_{1}},\dots,\mathtt{v}_{i_{m}})\in\mathsf{Var}^{m}$ of $m$ distinct variables a variable tuple, and we let $\operatorname{vars}({\bm{\mathtt{v}}})\coloneqq\{\,\mathtt{v}_{i_{1}},\dots,\mathtt{v}_{i_{m}}\,\}$. Recall that at the beginning of Section 3 we fixed an arbitrary relational signature $\sigma$ for the rest of this paper.

Definition 5.2 (Syntax of ${\mathsf{GF}}(\mathsf{C})$ ).

The logic ${\mathsf{GF}}(\mathsf{C})$ is inductively defined along with the free variables and the guard-depth, formalized by the functions $\operatorname{free}\colon{\mathsf{GF}}(\mathsf{C})\to 2^{\mathsf{Var}}$ and $\operatorname{gd}\colon{\mathsf{GF}}(\mathsf{C})\to\mathbb{N}$ .

Atomic Formulas:

For all $R\in\sigma$ with $\ell\coloneqq\operatorname{ar}(R)$ , all $x_{1},\dots,x_{\ell}\in\mathsf{Var}$ and all $x,y\in\mathsf{Var}$ , the following formulas $\varphi$ (of signature $\sigma$ ) are in ${\mathsf{GF}}(\mathsf{C})$ : $\varphi$ is of the form

1. $R(x_{1},\dots,x_{\ell})$ with $\operatorname{free}(\varphi)\coloneqq\{\,x_{1},\dots,x_{\ell}\,\}$ and $\operatorname{gd}(\varphi)\coloneqq 0$;
2. $x=y$ with $\operatorname{free}(\varphi)\coloneqq\{\,x,y\,\}$ and $\operatorname{gd}(\varphi)\coloneqq 0$.

Inductive Rules:

Let $\chi,\psi$ be formulas (of signature $\sigma$ ) in ${\mathsf{GF}}(\mathsf{C})$ . The following formulas $\varphi$ (of signature $\sigma$ ) are in ${\mathsf{GF}}(\mathsf{C})$ : $\varphi$ is of the form

3. $\lnot\chi$ with $\operatorname{free}(\varphi)\coloneqq\operatorname{free}(\chi)$ and $\operatorname{gd}(\varphi)\coloneqq\operatorname{gd}(\chi)$;
4. $(\chi\land\psi)$ with $\operatorname{free}(\varphi)\coloneqq\operatorname{free}(\chi)\cup\operatorname{free}(\psi)$ and $\operatorname{gd}(\varphi)\coloneqq\max(\operatorname{gd}(\chi),\operatorname{gd}(\psi))$.

An atomic formula $\Delta$ (of signature $\sigma$) of the form $R(x_{1},\ldots,x_{\ell})$ in ${\mathsf{GF}}(\mathsf{C})$ is called a guard for $\psi$ if $\operatorname{free}(\psi)\subseteq\operatorname{free}(\Delta)$. Let $n\in\mathbb{N}_{\geqslant 1}$ and let $\Delta$ be a guard for $\psi$. For every variable tuple ${\bm{\mathtt{v}}}$ with $\operatorname{vars}({\bm{\mathtt{v}}})\subseteq\operatorname{free}(\Delta)$, the following formula $\varphi$ (of signature $\sigma$) is in ${\mathsf{GF}}(\mathsf{C})$: $\varphi$ is of the form

5. $\exists^{\geqslant n}\,{\bm{\mathtt{v}}}\mathbin{.}(\Delta\land\psi)$ with $\operatorname{free}(\varphi)\coloneqq\operatorname{free}(\Delta)\setminus\operatorname{vars}({\bm{\mathtt{v}}})$ and $\operatorname{gd}(\varphi)\coloneqq\operatorname{gd}(\psi)+1$.

In this paper we assume w.l.o.g. that the variable tuple ${\bm{\mathtt{v}}}=(\mathtt{v}_{i_{1}},\dots,\mathtt{v}_{i_{m}})$ after a quantifier $\exists^{\geqslant n}$ is ordered, i.e., $i_{1}<\cdots<i_{m}$. This has no effect on the semantics, but simplifies some arguments. We write $\exists^{=n}\,{\bm{\mathtt{v}}}\mathbin{.}(\Delta\land\varphi)$ as shorthand for $\big(\,\exists^{\geqslant n}\,{\bm{\mathtt{v}}}\mathbin{.}(\Delta\land\varphi)\;\land\;\lnot\,\exists^{\geqslant n+1}\,{\bm{\mathtt{v}}}\mathbin{.}(\Delta\land\varphi)\,\big)$. We omit parentheses in the usual way.

Definition 5.3 (Semantics of ${\mathsf{GF}}(\mathsf{C})$ ).

A $\sigma$-interpretation is a tuple $\mathcal{I}=(\mathcal{A},\alpha)$ consisting of a $\sigma$-structure $\mathcal{A}$ and a function $\alpha\colon\mathsf{Var}\to V(\mathcal{A})$. Formulas (of signature $\sigma$) in ${\mathsf{GF}}(\mathsf{C})$ are evaluated on $\sigma$-interpretations $\mathcal{I}$. We write $\mathcal{I}\models\varphi$ to denote that $\mathcal{I}$ satisfies $\varphi$, and $\mathcal{I}\not\models\varphi$ to denote that $\mathcal{I}$ does not satisfy $\varphi$. By $\mathcal{I}\!\tfrac{(a_{1},\dots,a_{\ell})}{(\mathtt{v}_{i_{1}},\dots,\mathtt{v}_{i_{\ell}})}$ we denote the $\sigma$-interpretation $(\mathcal{A},\alpha^{\prime})$ with $\alpha^{\prime}(\mathtt{v}_{i_{j}})\coloneqq a_{j}$ for all $j\in[\ell]$, and $\alpha^{\prime}(x)\coloneqq\alpha(x)$ for all $x\in\mathsf{Var}\setminus\{\,\mathtt{v}_{i_{1}},\dots,\mathtt{v}_{i_{\ell}}\,\}$. The semantics of formulas in ${\mathsf{GF}}(\mathsf{C})$ is inductively defined as follows:

Item 1: $\mathcal{I}\models R(x_{1},\dots,x_{\ell})\iff(\alpha(x_{1}),\dots,\alpha(x_{\ell}))\in R^{\mathcal{A}}$.
Item 2: $\mathcal{I}\models x=y\iff\alpha(x)=\alpha(y)$.
Items 3 and 4: $\mathcal{I}\models\lnot\chi\iff\mathcal{I}\not\models\chi$; $\mathcal{I}\models(\chi\land\psi)\iff\mathcal{I}\models\chi$ and $\mathcal{I}\models\psi$.
Item 5: $\mathcal{I}\models\exists^{\geqslant n}{\bm{\mathtt{v}}}\mathbin{.}(\Delta\land\psi)\iff$ there are at least $n$ tuples ${\bm{a}}\in{V(\mathcal{A})}^{\operatorname{ar}({\bm{\mathtt{v}}})}$ such that $\mathcal{I}\!\tfrac{{\bm{a}}}{{\bm{\mathtt{v}}}}\models(\Delta\land\psi)$.

We will use the following conventions throughout the paper: $\varphi(x_{1},\dots,x_{k})$ denotes that $\operatorname{free}(\varphi)\subseteq\{\,x_{1},\dots,x_{k}\,\}$; and $\mathcal{A},(a_{1},\dots,a_{k})\models\varphi(x_{1},\dots,x_{k})$ denotes that $(\mathcal{A},\alpha)\models\varphi$ for an assignment $\alpha$ with $\alpha(x_{i})=a_{i}$ for all $i\in[k]$ (the values of $\alpha$ on the remaining variables do not matter). A sentence is a formula $\varphi\in{\mathsf{GF}}(\mathsf{C})$ that has no free variable, i.e., $\operatorname{free}(\varphi)=\varnothing$. If $\varphi$ is a sentence, we write $\mathcal{A}\models\varphi$ to denote that $(\mathcal{A},\alpha)\models\varphi$ for any assignment $\alpha$ (since $\alpha$ does not matter in this case). We write $\mathcal{A}\equiv_{{\mathsf{GF}}(\mathsf{C})}\mathcal{B}$ to denote that the $\sigma$-structures $\mathcal{A}$ and $\mathcal{B}$ satisfy the same sentences (of signature $\sigma$) in ${\mathsf{GF}}(\mathsf{C})$. Finally, we say that $\mathcal{A}$ and $\mathcal{B}$ are distinguishable in ${\mathsf{GF}}(\mathsf{C})$ if $\mathcal{A}\not\equiv_{{\mathsf{GF}}(\mathsf{C})}\mathcal{B}$.

Example 5.4.

Consider the formula $\varphi\coloneqq$

$$\exists^{\geqslant 1}(\mathtt{v}_{1},\mathtt{v}_{2},\mathtt{v}_{3},\mathtt{v}_{4},\mathtt{v}_{5},\mathtt{v}_{6})\mathbin{.}\bigl(R(\mathtt{v}_{1},\mathtt{v}_{2},\mathtt{v}_{3},\mathtt{v}_{4},\mathtt{v}_{5},\mathtt{v}_{6})\land\bigl(E(\mathtt{v}_{1},\mathtt{v}_{2})\land(E(\mathtt{v}_{2},\mathtt{v}_{3})\land E(\mathtt{v}_{3},\mathtt{v}_{1}))\bigr)\bigr),$$

the $\sigma_{1}$-structure $\mathcal{A}_{1}$ from Example 3.2, and the $\sigma_{1}$-structure $\mathcal{B}_{1}$ with $V(\mathcal{B}_{1})=\{1,2,3,u,v,w\}$, $E^{\mathcal{B}_{1}}\coloneqq\{(1,2),(2,w),(w,u),(u,v),(v,3),(3,1)\}$ and $R^{\mathcal{B}_{1}}\coloneqq\{(1,2,3,u,v,w)\}$.

Clearly, $\psi\coloneqq\bigl(E(\mathtt{v}_{1},\mathtt{v}_{2})\land(E(\mathtt{v}_{2},\mathtt{v}_{3})\land E(\mathtt{v}_{3},\mathtt{v}_{1}))\bigr)$ is a formula in ${\mathsf{GF}}(\mathsf{C})$ with $\operatorname{free}(\psi)=\{\mathtt{v}_{1},\mathtt{v}_{2},\mathtt{v}_{3}\}$ and $\operatorname{gd}(\psi)=0$. Thus, $R(\mathtt{v}_{1},\mathtt{v}_{2},\mathtt{v}_{3},\mathtt{v}_{4},\mathtt{v}_{5},\mathtt{v}_{6})$ is a guard for $\psi$. Hence, $\varphi\in{\mathsf{GF}}(\mathsf{C})$ with $\operatorname{free}(\varphi)=\varnothing$ and $\operatorname{gd}(\varphi)=1$.

The formula $\varphi$ states that there is at least one tuple (of arity $6$ ) in $R$ such that the first 3 entries of this tuple form a triangle w.r.t. relation $E$ . Hence, $\mathcal{A}_{1}\models\varphi$ and $\mathcal{B}_{1}\not\models\varphi$ , which means $\mathcal{A}_{1}$ and $\mathcal{B}_{1}$ are distinguishable in ${\mathsf{GF}}(\mathsf{C})$ .
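Since $\varphi$ has guard-depth 1 and no free variables, checking it amounts to a single scan over $R$. The following Python sketch (our own illustration; the function name `satisfies_phi` is hypothetical) evaluates $\varphi$ directly on $\mathcal{B}_{1}$ and confirms that its only $R$-tuple fails because $(2,3)\notin E^{\mathcal{B}_{1}}$. The structure $\mathcal{A}_{1}$ of Example 3.2 is not restated in this section, so the second call uses a made-up structure containing an $E$-triangle instead.

```python
def satisfies_phi(E, R):
    """Example 5.4: is there a 6-tuple in R whose first three entries form an E-triangle?"""
    return any(
        (t[0], t[1]) in E and (t[1], t[2]) in E and (t[2], t[0]) in E for t in R
    )


# B_1 from Example 5.4 (the vertices u, v, w are encoded as strings).
E_B1 = {(1, 2), (2, "w"), ("w", "u"), ("u", "v"), ("v", 3), (3, 1)}
R_B1 = {(1, 2, 3, "u", "v", "w")}
print(satisfies_phi(E_B1, R_B1))  # False

# A made-up structure (not A_1) whose R-tuple starts with the triangle 1 -> 2 -> 3 -> 1.
print(satisfies_phi({(1, 2), (2, 3), (3, 1)}, {(1, 2, 3, 1, 2, 3)}))  # True
```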

5.2 The Guarded Game

Our ultimate goal in Section 5 is to prove for any two $\sigma$-structures $\mathcal{A}$ and $\mathcal{B}$ that RCR distinguishes $\mathcal{A}$ and $\mathcal{B}$ if, and only if, $\mathcal{A}\not\equiv_{{\mathsf{GF}}(\mathsf{C})}\mathcal{B}$. As in the proof of Theorem 5.1, our proof uses, as an intermediate step, a game characterization of (in)distinguishability of two $\sigma$-structures in ${\mathsf{GF}}(\mathsf{C})$. We call this game the Guarded Game; it is played on two $\sigma$-structures $\mathcal{A}$, $\mathcal{B}$ and is defined as follows. A configuration of the Guarded Game is a tuple of the form $((\mathcal{A},{\bm{a}}),(\mathcal{B},{\bm{b}}))$, where $\mathcal{A}$ and $\mathcal{B}$ are the given $\sigma$-structures and ${\bm{a}}\in{V(\mathcal{A})}^{k}$, ${\bm{b}}\in{V(\mathcal{B})}^{k}$ for some $k\in\mathbb{N}$. A configuration $((\mathcal{A},{\bm{a}}),(\mathcal{B},{\bm{b}}))$ is called distinguishing if $\mathsf{stp}({\bm{a}})\neq\mathsf{stp}({\bm{b}})$, or if there exist an $\ell\in[\operatorname{ar}(\sigma)]$ and indices $i_{1},\ldots,i_{\ell}\in[k]$ such that $\mathsf{atp}((a_{i_{1}},\dots,a_{i_{\ell}}))\neq\mathsf{atp}((b_{i_{1}},\dots,b_{i_{\ell}}))$. We may omit parentheses if they are clear from the context. If $k=0$, we write $\mathcal{A},\mathcal{B}$ for the configuration $((\mathcal{A},()),(\mathcal{B},()))$ and call it the empty configuration; note that this configuration is not distinguishing.

A round of the Guarded Game is played as follows. Let $(\mathcal{A},{\bm{a}})$, $(\mathcal{B},{\bm{b}})$ be the configuration at the beginning of the round. Spoiler picks a relation symbol $R\in\sigma$. Then, Duplicator provides a bijection $\pi$ between $R^{\mathcal{A}}$ and $R^{\mathcal{B}}$. If no such bijection exists (i.e., $|R^{\mathcal{A}}|\neq|R^{\mathcal{B}}|$), the round ends and Spoiler wins this round; otherwise the round proceeds as follows. Spoiler picks some ${\bm{a}}^{\prime}\in R^{\mathcal{A}}$, which yields the new configuration $(\mathcal{A},{\bm{a}}^{\prime})$, $(\mathcal{B},{\bm{b}}^{\prime})$ with ${\bm{b}}^{\prime}\coloneqq\pi({\bm{a}}^{\prime})$. Duplicator wins this round if the new configuration is not distinguishing and $\mathsf{stp}({\bm{a}},{\bm{a}}^{\prime})=\mathsf{stp}({\bm{b}},{\bm{b}}^{\prime})$. Otherwise, Spoiler wins this round.

Duplicator has a $0$-round winning strategy on $(\mathcal{A},{\bm{a}})$, $(\mathcal{B},{\bm{b}})$ if the configuration is not distinguishing. For $i\geqslant 1$, Duplicator has an $i$-round winning strategy on $(\mathcal{A},{\bm{a}})$, $(\mathcal{B},{\bm{b}})$ if this configuration is not distinguishing and, for every $R\in\sigma$ that Spoiler may pick, she can provide a bijection $\pi$ such that for every ${\bm{a}}^{\prime}$ (and ${\bm{b}}^{\prime}\coloneqq\pi({\bm{a}}^{\prime})$) that Spoiler may choose, she wins the current round and has an $(i-1)$-round winning strategy on the resulting configuration $(\mathcal{A},{\bm{a}}^{\prime})$, $(\mathcal{B},{\bm{b}}^{\prime})$. Spoiler has an $i$-round winning strategy on $(\mathcal{A},{\bm{a}})$, $(\mathcal{B},{\bm{b}})$ if Duplicator does not have one. Note that if Spoiler has a winning strategy for $i$ rounds, he also has a winning strategy for more than $i$ rounds, since, by induction on $i$, every $(i+1)$-round winning strategy of Duplicator yields an $i$-round winning strategy. If $\mathcal{A}$ and $\mathcal{B}$ are not of strictly equal size, then Spoiler has a trivial $1$-round winning strategy, because Duplicator is unable to provide a bijection in the first round. We say that Duplicator wins the Guarded Game on $(\mathcal{A},{\bm{a}})$, $(\mathcal{B},{\bm{b}})$ if she has an $i$-round winning strategy on $(\mathcal{A},{\bm{a}})$, $(\mathcal{B},{\bm{b}})$ for every $i\in\mathbb{N}$.

5.3 RCR is equivalent to ${\mathsf{GF}}(\mathsf{C})$ and the Guarded Game

This section is devoted to proving the equivalence of statements (1), (3) and (4) of Theorem A (b), i.e., we prove the following theorem.

Theorem D.

For all $\sigma$ -structures $\mathcal{A}$ and $\mathcal{B}$ , the following statements are equivalent.

1. RCR distinguishes $\mathcal{A}$ and $\mathcal{B}$.
2. There exists a sentence $\varphi\in{\mathsf{GF}}(\mathsf{C})$ such that $\mathcal{A}\models\varphi$ and $\mathcal{B}\not\models\varphi$.
3. Spoiler wins the Guarded Game on $\mathcal{A},\mathcal{B}$.

We prove the theorem by showing that the implication chain $1\Rightarrow 2\Rightarrow 3\Rightarrow 1$ holds. For this, we use the following three lemmas; since their proofs are inductive and closely follow the proof of the analogous result on graphs, we defer them to the paper’s full version. The arity $\operatorname{ar}(c)$ (atomic type $\mathsf{atp}(c)$, similarity type $\mathsf{stp}(c)$) of a color $c\in\mathsf{RC}_{i}(\mathcal{A})$ is defined as the arity (atomic type, similarity type) of the tuples in ${\bm{A}}$ that receive this color. Recall that we denote the color of a tuple ${\bm{a}}\in{\bm{A}}$ after $i$ iterations of RCR by $\operatorname{\varrho}_{i}({\bm{a}})$.

Lemma 5.5.

For every $\sigma$ -structure $\mathcal{A}$ , every $i\in\mathbb{N}$ , and every $c\in\mathsf{RC}_{i}(\mathcal{A})$ of arity $k$ , there exists a formula $\varphi^{i}_{c}({\bm{x}})\in{\mathsf{GF}}(\mathsf{C})$ with ${\bm{x}}=(x_{1},\dots,x_{k})$ such that for every $\sigma$ -structure $\mathcal{B}$ of size strictly equal to $\mathcal{A}$ and every ${\bm{b}}\in{\bm{B}}$ of arity $k$ we have: $\mathcal{B},{\bm{b}}\models\varphi^{i}_{c}({\bm{x}})$ $\iff$ $\operatorname{\varrho}_{i}({\bm{b}})=c$ .

Lemma 5.6.

Let $\mathcal{A}$ and $\mathcal{B}$ be $\sigma$-structures of strictly equal size and let ${\bm{a}}\in{V(\mathcal{A})}^{k}$, ${\bm{b}}\in{V(\mathcal{B})}^{k}$ be arbitrary tuples of arity $k$. Let ${\bm{x}}=(x_{1},\ldots,x_{k})$ be a tuple of $k$ distinct variables. If there exists a formula $\varphi\in{\mathsf{GF}}(\mathsf{C})$ with $\operatorname{free}(\varphi)\subseteq\{\,x_{1},\ldots,x_{k}\,\}$ such that $\mathcal{A},{\bm{a}}\models\varphi({\bm{x}})\iff\mathcal{B},{\bm{b}}\not\models\varphi({\bm{x}})$, then Spoiler has a $\operatorname{gd}(\varphi)$-round winning strategy for the Guarded Game on $(\mathcal{A},{\bm{a}})$, $(\mathcal{B},{\bm{b}})$.

Lemma 5.7.

Let $i\in\mathbb{N}$ , let $\mathcal{A}$ and $\mathcal{B}$ be $\sigma$ -structures of strictly equal size, and let ${\bm{a}}\in{\bm{A}}$ , ${\bm{b}}\in{\bm{B}}$ be tuples of arity $k$ . If $\operatorname{\varrho}_{1}({\bm{a}})=\operatorname{\varrho}_{1}({\bm{b}})$ , then the configuration $(\mathcal{A},{\bm{a}})$ , $(\mathcal{B},{\bm{b}})$ is not distinguishing. Further, if $\operatorname{\varrho}_{i+1}({\bm{a}})=\operatorname{\varrho}_{i+1}({\bm{b}})$ and $\operatorname{mult}_{\mathcal{A}}(c)=\operatorname{mult}_{\mathcal{B}}(c)$ for all $c\in\mathsf{RC}_{i}(\mathcal{A})\cup\mathsf{RC}_{i}(\mathcal{B})$ , then Duplicator has an $i$ -round winning strategy for the Guarded Game on $(\mathcal{A},{\bm{a}})$ , $(\mathcal{B},{\bm{b}})$ .

The proof of Theorem D proceeds as follows. If $\mathcal{A}$ and $\mathcal{B}$ are not of strictly equal size, it is straightforward to see that each of the theorem’s three statements is fulfilled. For the case where $\mathcal{A}$ and $\mathcal{B}$ are of strictly equal size, “1 $\Rightarrow$ 2” easily follows from Lemma 5.5, and “2 $\Rightarrow$ 3” is obtained from Lemma 5.6. Concerning “3 $\Rightarrow$ 1”, one proves the contrapositive and uses Lemma 5.7 to obtain a winning strategy for Duplicator in the Guarded Game; this winning strategy ensures that after each round, the configuration is of the form $(\mathcal{A},{\bm{a}})$, $(\mathcal{B},{\bm{b}})$ where ${\bm{a}}$ and ${\bm{b}}$ have the same color in the stable coloring produced by RCR on $\mathcal{A}$ and $\mathcal{B}$.

6 Final Remarks

We introduced Relational Color Refinement (RCR) as an adaptation of the classical Color Refinement (CR) procedure to arbitrary relational structures. We showed that it can be implemented with the same running time as CR (Theorem B). Furthermore, we showed that the distinguishing power of RCR admits a combinatorial (Theorem C) and a logical (Theorem D) characterization analogous to those of CR. Combining Theorems B, C and D yields our main result, Theorem A, formulated in Section 1.
There are multiple directions for further research:

$\blacksquare$

One interesting task is to lift the results of [2, 28] from CR and $\mathsf{C}^{2}$ to RCR and ${\mathsf{GF}}(\mathsf{C})$. This is non-trivial, because ${\mathsf{GF}}(\mathsf{C})$ is capable of identifying certain $\sigma$-structures that cannot be identified in $\mathsf{C}^{2}$: consider, e.g., the signature $\sigma_{3}\coloneqq\{\,R\,\}$ with $\operatorname{ar}(R)=4$ and the $\sigma_{3}$-structure $\mathcal{A}_{3}$ with $V(\mathcal{A}_{3})=\{\,1,2,3\,\}$ and $R^{\mathcal{A}_{3}}=\{\,(1,2,3,3)\,\}$. It is easy to construct a ${\mathsf{GF}}(\mathsf{C})$-sentence $\varphi$ that is satisfied by $\mathcal{A}_{3}$ but by no $\sigma_{3}$-structure $\mathcal{B}_{3}$ that is not isomorphic to $\mathcal{A}_{3}$ and in which every node is contained in some relation (recall the assumption on structures we adopted in Section 2). But there does not exist any $\mathsf{C}^{2}$-sentence $\varphi$ with the same property; in fact, according to [28, Proof of Corollary 5.9], the logic $\mathsf{C}^{2}$ does not identify any $\sigma$-structure whose universe contains $\geqslant 3$ elements and where $\sigma$ contains a relation symbol of arity $\geqslant 3$.
$\blacksquare$

Considering that CR is equivalent to the 1-dimensional Weisfeiler-Leman algorithm (WL), RCR might be a good basis to devise a generalization of the $k$ -dimensional WL to arbitrary relational structures.
$\blacksquare$

Given the close relationship between relational structures and hypergraphs, a coloring method similar to RCR should exist for hypergraphs, too. However, RCR relies heavily on the order that the tuples provide, and this order is absent in hyperedges. Thus, it is not clear how to adapt the refinement of the colors in an iteration step from tuples to hyperedges.
$\blacksquare$

Another promising direction is to think about applications of RCR, given the many applications of classical CR beyond isomorphism testing (cf. Section 1). In particular, we conjecture that some of the techniques developed in this paper can be used to lift the result by Riveros, Scheidt, Schweikardt [31] from binary structures to arbitrary structures. Further, the tight connection between CR and Graph Neural Networks suggests interesting applications for RCR in machine learning as well. We plan to investigate this.

References

[1] Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995. URL: http://webdam.inria.fr/Alice/.
[2] V. Arvind, Johannes Köbler, Gaurav Rattan, and Oleg Verbitsky. Graph Isomorphism, Color Refinement, and Compactness. Computational Complexity, 26(3):627–685, September 2017. doi:10.1007/s00037-016-0147-6.
[3] László Babai. Graph isomorphism in quasipolynomial time [extended abstract]. In Daniel Wichs and Yishay Mansour, editors, Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016, pages 684–697. ACM, 2016. doi:10.1145/2897518.2897542.
[4] Catriel Beeri, Ronald Fagin, David Maier, and Mihalis Yannakakis. On the desirability of acyclic database schemes. J. ACM, 30(3):479–513, 1983. doi:10.1145/2402.322389.
[5] Christoph Berkholz, Paul S. Bonsma, and Martin Grohe. Tight lower and upper bounds for the complexity of canonical colour refinement. Theory Comput. Syst., 60(4):581–614, 2017. doi:10.1007/S00224-016-9686-0.
[6] Philip A. Bernstein and Nathan Goodman. Power of natural semijoins. SIAM J. Comput., 10(4):751–771, 1981. doi:10.1137/0210059.
[7] Johann Brault-Baron. Hypergraph Acyclicity Revisited. ACM Comput. Surv., 49(3):54:1–54:26, 2016. doi:10.1145/2983573.
[8] Silvia Butti and Víctor Dalmau. Fractional Homomorphism, Weisfeiler-Leman Invariance, and the Sherali-Adams Hierarchy for the Constraint Satisfaction Problem. In Filippo Bonchi and Simon J. Puglisi, editors, 46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021), volume 202 of Leibniz International Proceedings in Informatics (LIPIcs), pages 27:1–27:19, Dagstuhl, Germany, 2021. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.MFCS.2021.27.
[9] Jan Böker. Color Refinement, Homomorphisms, and Hypergraphs. In Ignas Sau and Dimitrios M. Thilikos, editors, Graph-Theoretic Concepts in Computer Science - 45th International Workshop, WG 2019, Vall de Núria, Spain, June 19-21, 2019, Revised Papers, volume 11789 of Lecture Notes in Computer Science, pages 338–350. Springer, 2019. doi:10.1007/978-3-030-30786-8_26.
[10] Jin-Yi Cai, Martin Fürer, and Neil Immerman. An optimal lower bound on the number of variables for graph identification. Combinatorica, 12(4):389–410, December 1992. doi:10.1007/BF01305232.
[11] A. Cardon and Maxime Crochemore. Partitioning a Graph in $O(|A|\log_{2}|V|)$ . Theor. Comput. Sci., 19:85–98, 1982. doi:10.1016/0304-3975(82)90016-0.
[12] Anuj Dawar, Tomáš Jakl, and Luca Reggio. Lovász-type theorems and game comonads. In 36th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2021, Rome, Italy, June 29 - July 2, 2021, pages 1–13. IEEE, 2021. doi:10.1109/LICS52264.2021.9470609.
[13] Holger Dell, Martin Grohe, and Gaurav Rattan. Lovász meets Weisfeiler and Leman. In 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, July 9-13, 2018, Prague, Czech Republic, volume 107 of LIPIcs, pages 40:1–40:14. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2018. doi:10.4230/LIPIcs.ICALP.2018.40.
[14] Reinhard Diestel. Graph Theory, volume 173 of Graduate Texts in Mathematics. Springer-Verlag, Heidelberg, 6th edition, 2025. URL: https://diestel-graph-theory.com/.
[15] Zdeněk Dvořák. On recognizing graphs by numbers of homomorphisms. Journal of Graph Theory, 64(4):330–342, 2010. doi:10.1002/jgt.20461.
[16] Eva Fluck, Tim Seppelt, and Gian Luca Spitzer. Going deep and going wide: Counting logic and homomorphism indistinguishability over graphs of bounded treedepth and treewidth. In Aniello Murano and Alexandra Silva, editors, 32nd EACSL Annual Conference on Computer Science Logic, CSL 2024, February 19-23, 2024, Naples, Italy, volume 288 of Leibniz International Proceedings in Informatics (LIPIcs), pages 27:1–27:17, Dagstuhl, Germany, 2024. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.CSL.2024.27.
[17] Georg Gottlob, Nicola Leone, and Francesco Scarcello. Hypertree decompositions and tractable queries. J. Comput. Syst. Sci., 64(3):579–627, 2002. doi:10.1006/jcss.2001.1809.
[18] Georg Gottlob, Nicola Leone, and Francesco Scarcello. Robbers, marshals, and guards: Game theoretic and logical characterizations of hypertree width. Journal of Computer and System Sciences, 66(4):775–808, 2003. doi:10.1016/S0022-0000(03)00030-8.
[19] Martin Grohe. Descriptive Complexity, Canonisation, and Definable Graph Structure Theory, volume 47 of Lecture Notes in Logic. Cambridge University Press, 2017. doi:10.1017/9781139028868.
[20] Martin Grohe. Counting Bounded Tree Depth Homomorphisms. In Holger Hermanns, Lijun Zhang, Naoki Kobayashi, and Dale Miller, editors, Proceedings of the 35th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS ’20, pages 507–520, New York, NY, USA, 2020. Association for Computing Machinery. doi:10.1145/3373718.3394739.
[21] Martin Grohe. Word2vec, Node2vec, Graph2vec, X2vec: Towards a Theory of Vector Embeddings of Structured Data. In Dan Suciu, Yufei Tao, and Zhewei Wei, editors, Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS’20, pages 1–16. ACM, 2020. doi:10.1145/3375395.3387641.
[22] Martin Grohe, Kristian Kersting, Martin Mladenov, and Pascal Schweitzer. Color Refinement and Its Applications. In Guy Van den Broeck, Kristian Kersting, Sriraam Natarajan, and David Poole, editors, An Introduction to Lifted Probabilistic Inference. The MIT Press, 2021. doi:10.7551/mitpress/10548.003.0023.
[23] Martin Grohe, Kristian Kersting, Martin Mladenov, and Erkal Selman. Dimension Reduction via Colour Refinement. In Andreas S. Schulz and Dorothea Wagner, editors, Algorithms - ESA 2014, Lecture Notes in Computer Science, pages 505–516, Berlin, Heidelberg, 2014. Springer. doi:10.1007/978-3-662-44777-2_42.
[24] Martin Grohe and Dániel Marx. Constraint solving via fractional edge covers. ACM Trans. Algorithms, 11(1):4:1–4:20, 2014. doi:10.1145/2636918.
[25] Erich Grädel. On the Restraining Power of Guards. Journal of Symbolic Logic, 64(4):1719–1742, 1999. doi:10.2307/2586808.
[26] Neil Immerman and Eric Lander. Describing Graphs: A First-Order Approach to Graph Canonization. In Alan L. Selman, editor, Complexity Theory Retrospective: In Honor of Juris Hartmanis on the Occasion of His Sixtieth Birthday, July 5, 1988, pages 59–81. Springer, New York, NY, 1990. doi:10.1007/978-1-4612-4478-3_5.
[27] Sandra Kiefer. The Weisfeiler-Leman Algorithm: An Exploration of Its Power. ACM SIGLOG News, 7(3):5–27, November 2020. doi:10.1145/3436980.3436982.
[28] Sandra Kiefer, Pascal Schweitzer, and Erkal Selman. Graphs Identified by Logics with Counting. ACM Transactions on Computational Logic, 23(1):1:1–1:31, October 2021. doi:10.1145/3417515.
[29] Laura Mančinska and David E. Roberson. Quantum isomorphism is equivalent to equality of homomorphism counts from planar graphs. In Sandy Irani, editor, 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 661–672. IEEE, 2020. doi:10.1109/FOCS46700.2020.00067.
[30] Yoàv Montacute and Nihil Shah. The Pebble-Relation Comonad in Finite Model Theory. In Christel Baier and Dana Fisman, editors, Proceedings of the 37th Annual ACM/IEEE Symposium on Logic in Computer Science, number 13 in LICS ’22, pages 1–11, New York, NY, USA, 2022. Association for Computing Machinery. doi:10.1145/3531130.3533335.
[31] Cristian Riveros, Benjamin Scheidt, and Nicole Schweikardt. Using Color Refinement to Boost Enumeration and Counting for Acyclic CQs of Binary Schemas. CoRR, 2024. doi:10.48550/arXiv.2405.12358.
[32] David E. Roberson. Oddomorphisms and homomorphism indistinguishability over graphs of bounded degree. CoRR, 2022. arXiv:2206.10321.
[33] Benjamin Scheidt. On Homomorphism Indistinguishability and Hypertree Depth. In Karl Bringmann, Martin Grohe, Gabriele Puppis, and Ola Svensson, editors, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024), volume 297 of Leibniz International Proceedings in Informatics (LIPIcs), pages 152:1–152:18, Dagstuhl, Germany, 2024. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ICALP.2024.152.
[34] Benjamin Scheidt and Nicole Schweikardt. Counting Homomorphisms from Hypergraphs of Bounded Generalised Hypertree Width: A Logical Characterisation. In Jérôme Leroux, Sylvain Lombardy, and David Peleg, editors, 48th International Symposium on Mathematical Foundations of Computer Science (MFCS 2023), volume 272 of Leibniz International Proceedings in Informatics (LIPIcs), pages 79:1–79:15, Dagstuhl, Germany, 2023. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.MFCS.2023.79.
[35] Benjamin Scheidt and Nicole Schweikardt. Color Refinement for Relational Structures. CoRR, 2024. doi:10.48550/arXiv.2407.16022.
[36] Tim Seppelt. Logical equivalences, homomorphism indistinguishability, and forbidden minors. In Jérôme Leroux, Sylvain Lombardy, and David Peleg, editors, 48th International Symposium on Mathematical Foundations of Computer Science (MFCS 2023), volume 272 of Leibniz International Proceedings in Informatics (LIPIcs), pages 82:1–82:15, Dagstuhl, Germany, 2023. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.MFCS.2023.82.
[37] Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M. Borgwardt. Weisfeiler-Lehman Graph Kernels. J. Mach. Learn. Res., 12:2539–2561, 2011. doi:10.5555/1953048.2078187.
[38] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How Powerful are Graph Neural Networks? In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019. URL: https://openreview.net/forum?id=ryGs6iA5Km.