A Game for Counting Logic Formula Size and an Application to Linear Orders
Abstract
Ehrenfeucht–Fraïssé (EF) games are a basic tool in finite model theory for proving definability lower bounds, with many applications in complexity theory and related areas. They have been applied to study various logics, giving insights on quantifier rank and other logical complexity measures. In this paper, we present an EF game to capture formula size in counting logic with a bounded number of variables. The game combines games introduced previously for counting logic quantifier rank due to Immerman and Lander, and for first-order formula size due to Adler and Immerman, and Hella and Väänänen. The game is used to prove an extension of a formula size lower bound of Grohe and Schweikardt for distinguishing linear orders, from 3-variable first-order logic to 3-variable counting logic.
Keywords and phrases:
Finite Model Theory, Logical Aspects of Computational ComplexityCopyright and License:
2012 ACM Subject Classification:
Theory of computation Finite Model Theory ; Theory of computation Models of learning ; Theory of computation Complexity theory and logicFunding:
Support from grants NSF 2217023 and NSF 2240532 is acknowledged. Support from Project 2024-1.2.3-HU-RIZONT-2024-00017 is acknowledged, financed by the Ministry of Culture and Innovation of Hungary from the National Research, Development and Innovation Fund, under the a 2024-1.2.3-HU-RIZONT funding scheme.Editors:
Stefano Guerrini and Barbara KönigSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
Ehrenfeucht–Fraïssé (EF) games [11, 7] are a basic tool of finite model theory for proving definability lower bounds [26, 6]. Combined with logical characterizations of complexity classes, they provide a logic-based approach to problems in complexity theory. The original form of EF games gives bounds for quantifier rank in first-order logic. The games have been extended and modified for many logics and formula complexity measures. An EF game for formula size in first-order logic (FO) is given by Hella and Väänänen [19], building on Adler and Immerman [1]. In what follows, we refer to this game as the HV-game.
Counting logic extends first-order logic by adding the counting quantifier , and is frequently used in complexity theory and combinatorics. Counting logic turns out to be relevant for understanding the computational power of graph neural networks (Gnn) as well [12]. An EF game for distinguishing graphs in counting logic with a bounded number of variables is formulated by Immerman and Lander [22]. It extends the basic EF setup by an additional phase in each round involving the choice of subsets of the same cardinality in the two structures.
In this paper, we formulate a game for capturing formula-size complexity for counting logic. The game is a combination of the Immerman-Lander and Hella-Väänänen games. Restricted versions characterize formula size for fragments of counting logic where the number of variables and the counting rank are bounded. The game for formulas with a bounded number of variables could be a useful tool for exploring the implications of the logical characterization of Gnns (Barceló et al. [2]).
While the standard EF game is played on two structures, the HV game is played on two sets of structures, referred to in this paper as families. Grohe and Schweikardt [14] prove a formula size lower bound for linear orders for the 3-variable fragment of first-order logic, using this technique implicitly. They show that every first-order 3-variable formula that distinguishes a linear order of size from a larger one has size . Their proof is based on the notion of a separator and an involved weighting technique, which allows for a refined analysis of the syntax tree of a formula, by gauging the progress made in subformulas of a distinguishing formula. The proof is a detailed case analysis, with numerous subcases required to deal with quantifiers.
The formula size characterization is given in Theorem 10. The main result of the paper is that every 3-variable counting logic formula with counting rank distinguishing a linear order of size from a larger one has size at least (Theorem 16). This result extends the result of [14] from first-order logic to counting logic. The theorem is proved using the game characterization. There is a simple distinguishing formula of size (Proposition 15). For the case , i.e., for 3-variable FO, our result improves the formula size lower bound of [14] from to .
The proof of Theorem 16 adapts the framework of [14]. The main difference is in the most technical part, the quantifier case. We present the main lemma dealing with this case, with proofs of lemmas used in this proof given in the appendix. Other lemmas used in the lower bound are stated without proof due to lack of space.
The paper is structured as follows. After reviewing related work in Section 2, we describe the counting and HV games in Section 3. Section 4 describes the game and Section 5 gives the correspondence between the game and counting logic formula size. Section 6 contains the application on linear orders, with the proof of the lower bound in Section 7. The proof of the main lemma on counting quantifiers is presented separately in Section 7.3 and additional details are given in Appendix C. Section 8 contains further remarks.
2 Related Work
Counting logic has been discussed in several different forms. Grohe [12] defines counting logic as first-order logic (FO) extended by counting quantifiers of the form , and as its fragment using at most variables. This is the counting logic we consider in this paper. Previous work using this kind of counting logic includes Immerman and Lander [22] and Cai et al. [3]. Grohe [13], on the other hand, considers a more powerful counting logic, where formulae can include arithmetic operations on the number of elements satisfying a formula (see also Kuske and Schweikardt [25]).
EF games using sets of structures, capturing the number of quantifiers as opposed to quantifier rank, have been proposed by Immerman [20]. These multi-structural (MS) games receive increasing current attention (Fagin et al. [10], Carmosino et al. [4], Vinall-Smeeth [32]). HV games are essentially extensions of MS games, also modelling Boolean connectives in the formulae.
A graph neural network (Gnn) is a variant of neural networks for machine learning problems involving graphs [31]. Such a network allows the use of deep learning techniques to classify graphs (graph classification), or to classify the nodes of a large graph (node classification). The computational power of Gnn is closely related to the Weisfeiler–Leman (WL) graph isomorphism algorithm (Morris et al. [27], Xu et al. [33]). The connection of the WL algorithm to counting logic with a bounded number of variables [3, 12] brings these logics into the Gnn picture as well. Barceló et al. [2] gave logical characterizations in terms of counting logic with a bounded number of variables using results established in modal logic (Otto [29]). The complexity aspects of the characterizations are not discussed in [2], and studying this aspect (also pointed out in Grohe [12]) has been a motivation for the topic of this paper (a brief further discussion is given at the end of the paper).
The Gnn characterizations of Grohe [13] establish a connection of Gnn to threshold circuits, a Boolean circuit model of neural networks. The computational power of such circuits corresponds to counting logic with an arbitrary built-in predicate. Proving superpolynomial lower bounds for threshold circuits is an open problem. Hajnal et al. [15, 16] prove an exponential lower bound for depth-2 circuits with polynomial weights. The same papers prove a quantifier rank lower bound for counting logic with successor as the built-in relation. Similar results are also given in Etessami [8, 9]. Karchmer and Wigderson [24] formulate an approach, related to HV games, to proving monotone formula depth lower bounds. They also prove a depth version of the Krapchenko formula size lower bound. Krapchenko’s Theorem is proved in [19] as an application of HV games.
3 Background
In this section we introduce basic notation used in the paper and review EF and HV games.
3.1 Basic Definitions
3.1.1 Logics
We consider relational structures over a fixed vocabulary. Counting logic is obtained by extending first-order logic with counting quantifiers and . Here means that there are at least distinct assignments to the variable that satisfy . Thus is logically equivalent to . The quantifier stands for . In and , is referred to as the counting rank of the quantifier. The counting rank of a formula is the maximum counting rank of its quantifiers.
As a counting quantifier can be replaced by standard quantifiers, adding counting quantifiers does not change the expressivity of first-order logic. It does, however, impact the succinctness, the minimum size of formulae expressing a property. In applications to finite model theory one usually considers a sequence of pairs of structures. Complexity bounds to be proven are also functions of . Note that is equivalent to for an -element structure. The transformation increases counting rank and thus it cannot be used in formula size bounds for the bounded counting rank case.
Parameters to be considered are the bounds on the number of variables, on the counting rank, and on formula size. The fragment of containing at most variables is denoted by , and by if, in addition, counting rank is at most .
3.1.2 Structures and Families
The universe of a structure is denoted by . We use , , to denote variables. A variable assignment for a structure is a finite partial mapping , where is the set of variables. The finite domain of is denoted by .
An interpretation is a pair . For a formula , means that the assignment satisfies the formula in the structure , with containing all the for which the variable is free in . A formula distinguishes interpretations and , denoted by , if and .
A family is a set of interpretations where and for every and is some set. When the context is clear, we drop the subscript. We write to express that for all it holds that , and we say that distinguishes . For a structure , we denote by the family , containing a single interpretation with the empty assignment.
3.1.3 Operations
If is an assignment on , and , then is the assignment that maps to and agrees with otherwise.
Given a family , a choice function is of the form . The set of all choice functions on the family is denoted by . We define two operations on families [19].
-
Change: Given a family , a choice function and , the change operation on with for the variable produces the family
In the new family, the assignment to is changed based on the choice function . If then is a new variable, and is updated to . Thus a change operation may either leave the domain unchanged or add a new element to it.
-
Multiplication: Given a family and , the multiplication operation for the variable produces the family
The new family consists of interpretations with assigned to all possible values in . Here, again, the domain is either unchanged or a new element is added to it.
3.1.4 Formula Complexity
The size of a formula is defined inductively: if is an atomic formula, ; and for formulae , ; ; .
3.2 Review of Games
In this section we review the counting logic game [22] and the first-order logic formula size game [19], referred to as the HV game.
Definition 1 (-round IL -pebbling game).
The game IL is played on two relational structures and . There are two players, Spoiler and Duplicator, and pairs of pebbles for . It goes as follows:
-
For rounds:
-
–
Spoiler picks a set of elements of or and a number . Duplicator selects a set of elements of the same cardinality in the other structure.
-
–
Spoiler picks an element in and Duplicator selects an element in . The pebble (resp., ) holds the value of the element picked in (resp., ).
-
–
-
The -round game ends in the position , . Duplicator wins if the mapping from to is a partial isomorphism between and , where denotes the constants of the language.
Theorem 2.
The following are equivalent:
-
and satisfy the same sentences of of quantifier rank at most .
-
Duplicator has a winning strategy in the -round -pebbling game.
A similar result holds for the bounded counting rank fragment , by restricting the cardinality of the sets picked to be at most .
Definition 3 (HV game for formula size on first-order logic).
The game HV is played on two families, and , by Spoiler and Duplicator, for a positive integer . The initial position is . Spoiler chooses one of five possibilities for the continuation of the game:
-
-move: the game continues from the position .
-
-move: Spoiler chooses such that and partitions to get . Duplicator sets the next position as or , from which the game goes on.
-
-move: similar but played on .
-
-move: Spoiler chooses and a choice function from . Then the game continues from .
-
-move: similar but Spoiler chooses on .
Spoiler wins if the game reaches a position for and there is an atomic formula that distinguishes and . Duplicator wins if the game reaches a position and Spoiler does not win.
Theorem 4.
Let be a pair of families, and let be a positive integer. Then the following are equivalent:
-
1.
Spoiler has a winning strategy in the game HV.
-
2.
There is a formula of FO of size such that .
Note that for any , one can define the variant of the game HV, for which the moves and are restricted by . This game then characterizes FOm, the fragment of FO logic, with variables. An illustration of the -move is given in Fig. 1.
In Table 1, we summarize the game characterizations discussed above and the game we are about to introduce.
4 The Counting Logic Formula Size Game
In this section we define the game for counting logic formula size. We start by extending the operations of the HV game to this setting.
4.1 Extended Operations
Definition 5 (-choice function).
Given a family , a -choice function is of the form , where each is a choice function, and for every the elements are pairwise distinct. The set of all the -choice functions on the family is denoted by .
Definition 6 (selection).
A mapping is a selection if for every . The set of all the selection functions of is denoted by .
The extended set of operations is the following:
-
-Change: Given a family and , the -change operation associated to produces the family
Thus each interpretation gives rise to interpretations composing the new family, where the assignments to are changed based on the -choice function .
-
-Multiplication: Given a family and , the -multiplication operation associated to the selection produces the family
To avoid overloading the notation, we will keep the implicit with the notation , and use the term “selects” when referring to the action of .
In plain language, for every and , one interpretation is selected from to be part of . Thus each interpretation generates interpretations that compose the family .
4.2 The CS Game
We now define the game for counting logic formula size (referred to as the CS game for “counting formula size”). It is presented in the version, as this will be used in the rest of the paper. The new -move is illustrated in Fig. 2.
Definition 7 (CS game for formula size on counting logic).
The game CS has two players, Spoiler and Duplicator, is the number of variables. are two families with of size at most . Suppose after moves we reach the position . Depending on Spoiler’s choice, the game continues as follows:
-
-move: the game continues from the position .
-
-move: Spoiler first chooses numbers and such that 1 and . Then Spoiler partitions into a pair of families and . The game continues either from the position , , or from the position , , according to Duplicator’s choice.
-
-move: Spoiler first chooses numbers and such that and . Then Spoiler partitions into a pair of families and . The game continues either from the position , , or from the position , , according to Duplicator’s choice.
-
-move: Spoiler chooses and a -choice function on . For every -choice function on , Spoiler selects 111Note that formally a selection chosen by Spoiler is considered here as in Definition 6. from . The union of the over forms . Then the game continues from the position , , .
-
-move: Spoiler chooses and a -choice function on . For every -choice function on , Spoiler selects from . The union of the over forms . Then the game continues from the position , , .
(Atomic) The game ends in a position if either there is an atomic formula such that , in which case Spoiler wins, or, otherwise if Duplicator wins if there is no such .
In the application to linear orders we use the version of the game for bounded counting rank.
Definition 8 (CS game for -formula size).
The CS game is the version of the CS game where for the and moves.
4.3 An Illustration of the CS Game
In this section we present a small example illustrating the game.
Example 9.
Consider playing the game CS on structures and of 4 and 3 elements respectively with unary relations and , as illustrated in Fig. 3.
Formulas of size cannot distinguish the structures, as those can only contain a single atomic formula and a quantifier.
The sentence distinguishes and and is of size . We show that Spoiler has a winning strategy when and illustrate an strategy.
Consider the starting position with no assignment CS.
Spoiler starts with an -move and picks the 2-choice function ,
corresponding to two copies of , with and . This is shown on the left of Fig. 3 by showing only the assignments in the two copies.
The -choice functions on are (the three boxes on the right of Fig. 2).
Spoiler then selects one of the two possible choices in each box. For example, picking the selection function (picking 1 from the first and second boxes and 3 from the third box) is a valid choice for Spoiler.
To win at this step (bottom of the Fig.), and make sufficient for the initial game, Spoiler must distinguish all elements on the left from the element of the right with or in one move, since this is not possible no matter what Spoiler had picked from the three boxes, the game has to go on for Spoiler to be able to win.
From , in the position shown in the bottom of Fig. 3, , and families , need to be distinguished. Spoiler can make a move with to get to position on one side, and position on the other side. On the first side, Spoiler wins in with and on the other side wins in with .
5 Game Characterization Theorem
In this section, we state the game characterization theorem and formulate some corollaries. The proof of the characterization theorem is given in Appendix A.
Theorem 10 (Characterization Theorem).
Let be families and be a positive integer. Then the following are equivalent:
-
1.
Spoiler has a winning strategy in the game CS.
-
2.
There is a formula of size such that .
A corollary for structures is as follows:
Corollary 11.
Let and be structures and let be a positive integer. Then the following conditions are equivalent:
-
1.
Spoiler has a winning strategy in the game CS.
-
2.
There is a -sentence of size such that .
The game for can be used to prove analogous results for bounded counting rank.
Corollary 12.
Suppose and are families and let be a positive integer. Then the following are equivalent:
-
1.
Spoiler has a winning strategy in the game CS.
-
2.
There is a -formula of size such that .
The proof is similar to the CS game characterization of formula size presented in Appendix A, but with the aforementioned parameter bounded, and is omitted.
We derive a variation of the counting game for guarded counting logic [12], where the guarding binary relation is denoted by . The resulting game is denoted by -CS. By the equivalence between the expressivity of Gnns and guarded counting logic established in [2, Theorem 4.2], we obtain the following corollary linking the distinguishing power of Gnns and the guarded counting game:
Corollary 13.
Let and be families and let be a positive integer. Then the following are equivalent:
-
1.
Spoiler has a winning strategy in the game -CS.
-
2.
There is a Gnn that expresses an FO formula of size that is capable of distinguishing from .
6 Distinguishing Linear Orders with Counting Logics
In the rest of the paper we deal with counting logic definability on linear orders and apply CS games in this context. A linear order is defined over the signature , where is a linear ordering. denotes the structure ( , ), where is the standard linear ordering. For , let d and -type be or reflecting the order between and .
The first result gives a lower bound for the quantifier rank needed to distinguish linear orders in counting logic with bounded counting rank. It extends the standard lower bound for first-order logic (see, e.g., [26]). The proof uses the IL pebble game of Definition 1 with bounded counting rank [22] and is presented in Appendix B. The argument provides intuition for the framework developed for Theorem 16.
Theorem 14.
Let , and let . Then and cannot be distinguished by sentences of counting rank at most and quantifier rank at most .
We now turn to the problem of distinguishing two linear orders using counting logic sentences with a bounded number of variables and bounded counting rank, starting with an upper bound using two variables.
Proposition 15.
There is a -sentence of size distinguishing and , where .
Proof.
Consider the formulae:
-
.
-
.
-
For with :
Let Then satisfies iff . The size of is . As discussed earlier, the following lower bound extends [14] to counting logic. The proof is presented in the remainder of the paper and in Appendix C. For , i.e. for FO3, this theorem improves on lower bound of [14] from to .
Theorem 16.
If is a -sentence distinguishing and , where , then .
7 Proof of Theorem 16
In this section we first introduce the framework of [14]. Extended syntax trees are adapted to counting logic. The definitions of separators and the weighting scheme are unchanged. The lower bound for formula size in terms of separator weights and the proof of Theorem 16 based on it are given in Section 7.2. The main technical result is Lemma 22, Part 3, which is proved in Section 7.3. Proofs of lemmas used in this section are given in Appendix C.
7.1 Description of the Framework
We extend the framework for FO of [14] to counting logic. In the following, we write for the three variables used. We start by describing the main concepts: the counting game derived extended syntax trees, separators, and weighting scheme.
7.1.1 Extended Syntax Tree
Given two families , the extended syntax tree represents a winning strategy for Spoiler in the CS game on . It assigns to each tree node a pair of families (“interpretation label”), along with Spoiler’s move (“syntax label”) in this position such that:
-
1.
A node is a child of node if position can be obtained in one move from ,
-
2.
A node is a leaf if satisfies the atomic win condition.
For each node some additional information is provided, including the family distinguished by the interpretation label of the node. The root is associated to the starting position of the game CS, and we consider the nodes associated to positions reached throughout a winning strategy.
Definition 17.
Let be an -formula and and be families such that . The extended syntax tree is defined as follows:
-
If is an atomic formula, then is a single node with syntax label and interpretation label . Comment: .
-
If is of the form , then has root with and . The unique child of is the root of . Comment: .
-
If is of the form , then has root with and . The first child of is the root of and the second child of is the root of , where for . Comment: and .
-
If is of the form , then has root with and . The first child of is the root of and the second child of is the root of , where for . Comment: and .
-
If is of the form , for a variable and , then has root with and . The unique child of is the root of , where is chosen so that , and for every , is selected from so that . Comment: , as .
-
If is of the form , for a variable and , then has a root node with and . The unique child of is the root of , where is chosen so that , and for every , is selected from so that . Comment: , as .
7.1.2 Separators and Weighting Scheme
Separators are the key concept in [14] to study succinctness of formulas with a bounded number of variables. They provide a different “yardstick” for every pair of variables and constants for distances to be represented exactly. In contrast to the standard approach (as in Theorem 14), this “context-dependent” approach allows for a detailed representation of the progress made in the game.
Definition 18 (separator).
A separator for families is a mapping
such that for every and , there are with one of the following hold:
-
1.
-type-type
-
2.
both of the following two conditions hold:
-
MIN
-
.
-
In this case the separator distinguishes the two families. We will also use the notion of the type of an element, in the following specific form.
Definition 19.
(type of an element) Let be an interpretation and . Then -type = (-type, -type, -type, -type.
The definition of the weight of a separator is motivated by the quantities appearing in the computations [14].
Definition 20 (weight of a separator).
Let be a separator, we define its:
-
1.
border-distance MAX
-
2.
center-distance MAX
-
3.
weight .
The weight is a measure of the distinguishing power of separators. A separator is minimal if it has minimal weight.
7.2 Proof of Theorem 16
The proof is based Theorem 21, which gives a formula size lower bound in terms of minimal separator weight. Theorem 21, in turn, is based on two lemmas, Lemma 22 and 23. The proofs of these lemmas are similar for FO and counting logic and are omitted due to lack of space, with the exception of quantifiers, handled in Lemma 22 Part 3. This is the most technical part of [14] and the problem we consider is to prove a version of this part for counting quantifiers. This is the point where counting rank enters the bounds. Other than this difference, the rest of the argument, like handling propositional connectives, involves minor changes in the computation to account for . The proof Lemma 22 Part 3 is based on Lemma 24. This lemma is proved in Section 7.3 and Appendix C. The derivation of Lemma 22 from Lemma 24 is again similar to the FO case and is omitted.
Theorem 21.
Suppose and is a minimal separator for , then .
The following lemma links separator weights to the structure of extended syntax trees.
Lemma 22.
Suppose and let be the extended syntax tree , and be a minimal separator for . For every node of the following is true:
-
1.
If is a leaf, then .
-
2.
If has 2 children and , and is a minimal separator for , for , then .
-
3.
If has exactly one child , and is a minimal separator for , then .
The bound of part 3. extends the corresponding bound of [14] to counting logic. In fact, for counting rank , i.e., for first-order logic without counting, it improves the bound from to . This leads to the slight improvement of the lower bound of [14] mentioned after Theorem 16.
Lemma 23.
Let be a finite binary tree where each node is equipped with a weight such that the following is true:
-
If v is a leaf, then .
-
If has 2 children and , then .
-
If has exactly one child , then .
Then, , where is the root of and is the number of nodes of .
Theorem 21 follows directly from the previous lemmas.
Proof of Theorem 21.
Consider the syntax tree for the pair , .
We associate to each node of a weight , where is a minimal separator
for . From , we get that , where is a minimal separator of .
Finally, as a consequence, we get Theorem 16.
Proof of Theorem 16.
7.3 Proof of Lemma 22
The proof of Lemma 22 Part 3. relies on the following key lemma, which gives a bound on how much a counting quantifier can increase the distinguishing power of separators.
Lemma 24.
Let be a node of that has syntax-label for , and . Let be a separator for , where is the unique child of in .
Let be the separator defined via:
-
, for all and,
-
MAX for all .
Then, is a separator for .
7.3.1 Proof of Lemma 24
Due to symmetry, we only consider the case . It has to be shown that is a separator for . By definition, . Consider and . We need to show that separates . Let:
-
be the set of interpretations generated from in , and let and , .
-
For any -choice function on , let be the choice function selected from , and let be the interpretation selected from . We define ; note that corresponds to some , .
Note that for all , and for all , . So we will omit subscripts and just write for the assignment of and for the minimum in the copies of (and similarly with ). Similarly, for all , for all , and for all , , so we will also write for the assignment of and for the minimum in the copies of (and similarly with ). We will say that separates to signify that witnesses the separation property of on .
We formulate several indistinguishability properties for the choice functions and the interpretations. Lemma 30 shows that if each property holds then does not separate as, using the indistinguishabilities, a -choice function can be constructed for which and are not separated by . Thus some of the assumptions fail. Lemmas 26-29 combined show that if any of the properties fail, then separates .
The first property is about the role of in separation.
Property 1.
- Part a)
-
For every , none of the is separated from by for .
- Part b)
-
If there are such that MIN dd then dd.
We may assume w.l.o.g. that . Property 1 implies that holds as well.
The other properties refer to“closeness” of elements to in , gauged by for , depending on whether d or d. The information about the location of is completed by its type -type (Definition 19).
The next two properties deal with implications of closeness to the existence of a “twin” in the other structure.
Property 2.
For every and such that d there is a such that d d and -type -type.
Let d be the set of variables to which is close.
Property 3.
For every such that , there is a such that -type -type and for every it holds that
-
a)
d d if
-
b)
d otherwise.
The last property is about elements which are not in any closeness relation. We define quantities, called gaps, to characterize those elements.
Definition 25 (Gap).
For and
define:
A separator cannot distinguish using a pair of variables in between an element in and . This is the point where the additive term enters the bounds.
Property 4.
If and there are choices of in then .
Now we formulate the lemmas which conclude the separator property of in case some subsets of the properties fail. The first lemma notes that a separating pair in not involving also separated in .
Lemma 26.
If Property 1 fails then separates .
In view of Lemma 26, the next two lemmas say that if Property 1 holds, then the non-existence of a twin implies separation.
Lemma 27.
If Properties 1 or 2 fails then separates .
Lemma 28.
If Properties 1, 2 or 3 fails then separates .
The last lemma of this kind concludes separation from the sizes of gaps.
Lemma 29.
If Properties 1 or 4 fails then separates .
The final Lemma 30 provides the step needed to complete the proof of Lemma 24. It shows that, on the other hand, the conjunction of all the indistinguishabilities contradicts the property that is a separator,
Lemma 30.
If Properties 1,2,3,4 hold, then there is a such that does not separate .
Proof.
As Property 3 holds, for every such that , we have that there exists a unique such that -type -type and d d for all , and d for all .
Additionally, since Property 4 holds, for every in for , there exists a distinct .
Consider the choice function consisting of the ’s matching the ’s as described above. Suppose index from the choice function is selected to create .
-
If , i.e., is not in a gap, then satisfies the conditions described by Property 3, so cannot be separated by for .
-
If is in , then by construction is in , and cannot be distinguished by for .
Property 1 states that pairs with cannot distinguish either. Thus does not distinguish .
Proof.
8 Conclusion and Perspectives
In this paper an EF game is formulated for counting logic formula size. It is used to prove a lower bound for the size of 3-variable counting logic formulae with counting rank , distinguishing a linear order of size from a larger one. The lower bound extends a lower bound of [14] for FO3. The proof is based on the approach of [14], with a different argument for the central case of handling counting quantifiers. Closing the gap between the lower bound and the upper bound of size is an open problem. This is open even in the FO case where, as far as we know, no improvement is known of the linear upper bound.
Comparing the succinctness of various knowledge representation formalisms is studied in detail in knowledge compilation [5], Boolean complexity theory [23] and other areas, but, as noted in [14], perhaps less so for predicate logic. Comparing -variable counting logic formula sizes for seems to be an interesting problem.
We conclude with a brief description of the connection between Gnn and counting logic formula size. A Gnn works on a graph with feature vectors assigned to the nodes. In each round these are updated by applying a combination function to the previous vector and an aggregate of the feature vectors of the neighbors [17]. A logical classifier computes a unary query on graphs (e.g., assigning to every graph the set of red vertices with all blue neighbors). Barceló et al. [2] showed that an FO logical classifier is computable by a Gnn iff it is definable in 2-variable guarded counting logic. See also [12, 13], also noting that this is a “uniform” model. Every such formula has a Gnn simulation with complexity (number of features and rounds) depending on the complexity of the formula. One of the many questions raised by this connection is: how is the complexity of the formula related to the complexity of learning the Gnn? A related question, recently considered in [30], is whether the underlying formula can be extracted from a learned Gnn using explainability techniques.
References
- [1] Micah Adler and Neil Immerman. An n! lower bound on formula size. ACM Trans. Comput. Log., 4(3):296–314, 2003. doi:10.1145/772062.772064.
- [2] Pablo Barceló, Egor V. Kostylev, Mikaël Monet, Jorge Pérez, Juan L. Reutter, and Juan Pablo Silva. The logical expressiveness of graph neural networks. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020, 2020. URL: https://openreview.net/forum?id=r1lZ7AEKvB.
- [3] Jin-yi Cai, Martin Fürer, and Neil Immerman. An optimal lower bound on the number of variables for graph identification. Comb., 12(4):389–410, 1992. doi:10.1007/BF01305232.
- [4] Marco Carmosino, Ronald Fagin, Neil Immerman, Phokion G. Kolaitis, Jonathan Lenchner, and Rik Sengupta. A finer analysis of multi-structural games and beyond. CoRR, abs/2301.13329, 2023. doi:10.48550/arXiv.2301.13329.
- [5] Adnan Darwiche and Pierre Marquis. A knowledge compilation map. J. Artif. Intell. Res., 17:229–264, 2002. doi:10.1613/jair.989.
- [6] Heinz-Dieter Ebbinghaus and Jörg Flum. Finite model theory. Perspectives in Mathematical Logic. Springer, 1995.
- [7] Andrzej Ehrenfeucht. An application of games to the completeness problem for formalized theories. Fundamenta Mathematicae, 1961.
- [8] Kousha Etessami. Counting quantifiers, successor relations, and logarithmic space. In Proceedings of the Tenth Annual Structure in Complexity Theory Conference, Minneapolis, Minnesota, USA, June 19-22, 1995, pages 2–11. IEEE Computer Society, 1995. doi:10.1109/SCT.1995.514723.
- [9] Kousha Etessami. Counting quantifiers, successor relations, and logarithmic space. J. Comput. Syst. Sci., 54(3):400–411, 1997. doi:10.1006/jcss.1997.1485.
- [10] Ronald Fagin, Jonathan Lenchner, Nikhil Vyas, and R. Ryan Williams. On the number of quantifiers as a complexity measure. In Stefan Szeider, Robert Ganian, and Alexandra Silva, editors, 47th International Symposium on Mathematical Foundations of Computer Science, MFCS 2022, Vienna, Austria, August 22-26, 2022, volume 241 of LIPIcs, pages 48:1–48:14. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPIcs.MFCS.2022.48.
- [11] Roland Fraisse. Sur quelques classifications des systemes de relations, 1954.
- [12] Martin Grohe. The logic of graph neural networks. In 36th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2021, Rome, Italy, June 29 - July 2, 2021, pages 1–17. IEEE, 2021. doi:10.1109/LICS52264.2021.9470677.
- [13] Martin Grohe. The descriptive complexity of graph neural networks. In 38th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2023, Boston, MA, USA, June 26-29, 2023, pages 1–14. IEEE, 2023. doi:10.1109/LICS56636.2023.10175735.
- [14] Martin Grohe and Nicole Schweikardt. The succinctness of first-order logic on linear orders. Log. Methods Comput. Sci., 1(1), 2005. doi:10.2168/LMCS-1(1:6)2005.
- [15] András Hajnal, Wolfgang Maass, Pavel Pudlák, Mario Szegedy, and György Turán. Threshold circuits of bounded depth. In 28th Annual Symposium on Foundations of Computer Science, Los Angeles, California, USA, 27-29 October 1987, pages 99–110. IEEE Computer Society, 1987. doi:10.1109/SFCS.1987.59.
- [16] András Hajnal, Wolfgang Maass, Pavel Pudlák, Mario Szegedy, and György Turán. Threshold circuits of bounded depth. J. Comput. Syst. Sci., 46(2):129–154, 1993. doi:10.1016/0022-0000(93)90001-D.
- [17] William L. Hamilton. Graph Representation Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2020. doi:10.2200/S01045ED1V01Y202009AIM046.
- [18] Lauri Hella. Logical hierarchies in PTIME. Inf. Comput., 129(1):1–19, 1996. doi:10.1006/inco.1996.0070.
- [19] Lauri Hella and Jouko Väänänen. The size of a formula as a measure of complexity. In Åsa Hirvonen, Juha Kontinen, Roman Kossak, and Andrés Villaveces, editors, Logic Without Borders - Essays on Set Theory, Model Theory, Philosophical Logic and Philosophy of Mathematics, volume 5. De Gruyter, 2015. doi:10.1515/9781614516873.193.
- [20] Neil Immerman. Number of quantifiers is better than number of tape cells. J. Comput. Syst. Sci., 22(3):384–406, 1981. doi:10.1016/0022-0000(81)90039-8.
- [21] Neil Immerman. Descriptive complexity. Graduate texts in computer science. Springer, 1999. doi:10.1007/978-1-4612-0539-5.
- [22] Neil Immerman and Eric Lander. Describing graphs: A first-order approach to graph canonization, 1990.
- [23] Stasys Jukna. Boolean Function Complexity - Advances and Frontiers, volume 27 of Algorithms and combinatorics. Springer, 2012. doi:10.1007/978-3-642-24508-4.
- [24] Mauricio Karchmer and Avi Wigderson. Monotone circuits for connectivity require super-logarithmic depth. SIAM J. Discret. Math., 3(2):255–265, 1990. doi:10.1137/0403021.
- [25] Dietrich Kuske and Nicole Schweikardt. First-order logic with counting: At least, weak hanf normal forms always exist and can be computed! In Proceedings of the 32nd Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), volume abs/1703.01122, pages 1–12. IEEE, 2017. doi:10.48550/arXiv.1703.01122.
- [26] Leonid Libkin. Elements of Finite Model Theory. Texts in Theoretical Computer Science. An EATCS Series. Springer, 2004. doi:10.1007/978-3-662-07003-1.
- [27] Christopher Morris, Martin Ritzert, Matthias Fey, William L. Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. Weisfeiler and Leman go neural: Higher-order graph neural networks. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, pages 4602–4609. AAAI Press, 2019. doi:10.1609/aaai.v33i01.33014602.
- [28] Martin Otto. Bounded Variable Logics and Counting: A Study in Finite Models, volume 9 of Lecture Notes in Logic. Cambridge University Press, 2017. doi:10.1017/9781316716878.
- [29] Martin Otto. Graded modal logic and counting bisimulation. CoRR, abs/1910.00039, 2019. doi:10.48550/arXiv.1910.00039.
- [30] Alexander Pluska, Pascal Welke, Thomas Gärtner, and Sagar Malhotra. Logical distillation of graph neural networks. In Pierre Marquis, Magdalena Ortiz, and Maurice Pagnucco, editors, Proceedings of the 21st International Conference on Principles of Knowledge Representation and Reasoning, KR 2024, Hanoi, Vietnam. November 2-8, 2024, 2024. doi:10.24963/kr.2024/86.
- [31] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Trans. Neural Networks, 20(1):61–80, 2009. doi:10.1109/TNN.2008.2005605.
- [32] Harry Vinall-Smeeth. From quantifier depth to quantifier number: Separating structures with k variables, 2024. doi:10.48550/arXiv.2311.15885.
- [33] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, 2019. URL: https://openreview.net/forum?id=ryGs6iA5Km.
Appendix A Proof of Theorem 10
Proof.
We use induction on . If , Spoiler wins in CS only if there is an atomic formula , so verifying , such that . Reciprocally, if and , then is an atomic formula and spoiler wins in CS.
We now assume the equivalence for .
To prove the forward direction of the equivalence, suppose Spoiler has a winning strategy in the game CS starting with:
-
-move; Spoiler has a winning strategy in CS, so by induction hypothesis there is a formula verifying and . Finally and so we get .
-
-move, choosing and such that with , splitting into and . Spoiler has a winning strategy in CS and CS, so by the induction hypothesis, there are formulae and such that , and .
We have that and , so . Conversely, and , and consequently .
It follows that , and since , we get . -
-move; and by a symmetrical argument to the -move we get .
-
-move; choosing , a -choice function on and for every , selecting from to form . Spoiler has a winning strategy in CS, , so by the induction hypothesis, there is a formula such that and , .
Define . We have that , and so for all , and therefore .
Now suppose that there exists a -choice function on and a such that . Then there is an such that is part of the family through the selected , but , which is a contradiction. Therefore .
Finally and , so we get . -
-move; a symmetrical argument to the -move yields .
To prove the backward direction, assume there is a formula of size such that . We show that Spoiler has a winning strategy in CS. Let us consider the form of the formula :
-
: Spoiler plays the -move and gets in the position CS. Since and , Spoiler has a winning strategy in CS by induction hypothesis, and therefore Spoiler has also a winning strategy in CS.
-
: Define , , and such that , and . Spoiler plays the -move associated to and gets to CS or CS according to Duplicator’s choice. We have that , and therefore by induction hypothesis Spoiler has a winning strategy in CS or CS, and therefore Spoiler has also a winning strategy in CS.
-
: a symmetrical argument shows that Spoiler has a winning strategy in CS.
-
: Since , there is a -choice function on such that
, for every and for all . Thus, .
On the other hand, , thus from every on we can select such that and therefore, .
Finally we have that and , so by induction hypothesis, Spoiler has a winning strategy in the game CS and therefore Spoiler has also a winning strategy in CS. -
: a symmetrical argument shows that Spoiler has a winning strategy in CS.
Appendix B Proof of Theorem 14
We want to show that Duplicator can win a -round EF game on and taken to be of length at least . The key idea behind the proof is that, from a given assignment, Spoiler can “only look up to a certain distance” in the two directions on the line.
Proof.
Suppose and are linear orders of length at least , on which the EF game will be played.
After moves, we denote by the “position”: is a tuple consisting of concatenated to the moves played on : , , . Similarly, we define the tuple of moves played on .
For , we prove that regardless of Spoiler’s choices, Duplicator can maintain the following inequalities:
-
1.
if , then .
-
2.
if , then .
-
3.
iff
Using those inequalities for moves, Property 3. yields that Duplicator win the -round EF game on and . Since this happens no matter what Spoiler plays, the game characterization theorem for bounded counting rank implies that and verify the same sentences.
We now prove by induction on the moves (on ) that the inequalities can be maintained. The base case of is immediate.
For the induction step, assume the inequalities hold for moves, and suppose without loss of generality that Spoiler makes his move on . Spoiler plays a set of cardinality at most in . We describe Duplicator’s response, a set on . In the second part of the move, if Spoiler picks on for (i.e. ), Duplicator will respond with on (i.e. ).
If for and , Duplicator sets to be . Suppose there is an element which has not been selected earlier. We define such that and there is no other previously played moves on inside this interval. By Property 3., the interval between and contains no other elements of . Then we have two cases regarding the length of the interval:
-
. Then by Property 1. and the inductive hypothesis, , and the intervals and are isomorphic. Duplicator picks so that and , which ensures that the three properties hold for moves.
-
. In this case by Property 2., . We have three possibilities:
-
–
. Then , and Duplicator picks on so that maintaining Properties 1 and 3. Since this maintains .
-
–
, in which case a similar reasoning applies.
-
–
Otherwise, Spoiler has picked such that both: and . There can be at most such as .
Since , there are at least distinct elements in between and . Duplicator picks the first on that interval that is not already in . This ensures that, and , therefore satisfying all three properties.
-
–
Thus, in all the cases, the induction is preserved.
Appendix C Proof of the lemmas of Section 7
C.1 Proof of Lemma 26
If Part a) fails then there is for which one of the is separated from by for . As for these pairs, then
separates .∎
If Part b) fails then there are such that
MIN dd and dd.
Since we have that:
dd and
MINdd, i.e., separates .∎
C.2 Proof of Lemma 27
If Property 1 fails then we already know from the previous lemma that separates . So assume Property 1 holds and Property 2 fails. Then there is an and a such that d and there is no that satisfies d d and -type -type.
Note that there is at most one such that d d and -type -type. We distinguish two cases depending on whether such an element exists or not.
Case 1.
There are and such that d and there is no such that d d and -type -type.
There are four cases, we consider and (the other two follow by symmetry).
-
is : It holds that d d. Here the first inequality uses the definition of , the second is a property of this case and third holds as there is no such . But d d, so d d. With d this implies that separates .
-
is : The reasoning is similar. If , d d. So d d and d. Then separates . If , we reach a similar conclusion.
Note that contradicts Property 1. ∎
Case 2.
Case 1 does not hold, so there is and such that d and there is with d d, -type -type, and -type -type. As noted above, given , is unique.
Let be such that -type -type, which exists by -type -type.
By Property 1, it holds that -type -type. As is on a different side of than of , this implies that
both: MINd d d and d d. Since d we get that:
d d and MINd d. So separates.∎
C.3 Proof of Lemma 28
If Property 1 or 2 fails, then we already know that separates . So assume Properties 1, 2 hold and Property 3 fails. For any such that , we pick . By Property 2, we can define such that d d and -type -type. We consider two cases.
- Case 1 (Part a) fails):
-
Suppose there is such that and does not satisfy d d for every . Let be such that d d. Thus d d.
Notice that d d d . So d d and d. Therefore separates .
- Case 2 (Part b) fails):
-
Suppose there are and such that d and does not satisfy d.
Then d, so we must have that d d.Notice that d dd. So d d and d. Therefore separates . ∎
C.4 Proof of Lemma 29
We first define “gap boundaries”, a pair from bounding a given gap.
Definition 31 (Gap boundaries).
Let be an interpretation, and let be one of . A pair is the gap boundaries of if:
-
1.
belongs to the corresponding admissible set:
-
2.
the following coverage inequality holds:
Among all such pairs, we choose the one with the smallest left component and the largest right component.
The definition is illustrated in Fig. 5.
The gap boundaries for gap are always defined, and may differ from . For example, the gap might have gap boundaries when the neighborhood of contains the neighborhood of , hence covering a bigger chunk of the interval. If the gap is nonempty, the inequalities in Definition 31 become equalities, as the two segments corresponding to the gap boundaries are non-overlapping, and so the two values added to the size of the gap add up to the length of the interval.
We derive a result for gap boundaries before proceeding to the next case.
Proposition 32.
If Property 1 holds, then
the gap boundaries associated with
and are the same.
Proof.
Let be the gap boundaries associated with , and be the gap boundaries associated with .
We will first reason from to , and suppose that . By the definition of the gap on , it must be that .
Property 1 implies that d since . So . Therefore cannot be a gap variable for , as otherwise would have chosen on instead of .
If , then by Property 1 , and the same reasoning applies on the gap on . By symmetry, the same analysis holds for .
We now proceed to the proof of the lemma. Suppose again that Property 1 holds and Property 4 fails.
There is such that there are choices of in and .
By Proposition 32, there is a unique pair of gap boundaries for both and .
On one hand by the definition of gap boundaries on , we have
On the other hand by the definition of gap boundaries on ,
Thus separates . ∎
