Abstract 1 Introduction 2 Preliminaries 3 Upper bound formulas 4 Lower bounds via formula size games 5 Expected description complexity 6 Entropy and description complexity 7 Conclusion References Appendix A Appendix

Description Complexity of Unary Structures in First-Order Logic with Links to Entropy

Reijo Jaakkola ORCID Mathematics Research Centre, Tampere University, Finland Antti Kuusisto ORCID Mathematics Research Centre, Tampere University, Finland Miikka Vilander ORCID Mathematics Research Centre, Tampere University, Finland
Abstract

The description complexity of a model is the length of the shortest formula that defines the model. We study the description complexity of unary structures in first-order logic FO, also drawing links to semantic complexity in the form of entropy. The class of unary structures provides, e.g., a simple way to represent tabular Boolean data sets as relational structures. We define structures with FO-formulas that are strictly linear in the size of the model as opposed to using the naive quadratic ones, and we use arguments based on formula size games to obtain related lower bounds for description complexity. For a typical structure the upper and lower bounds in fact match up to a sublinear term, leading to a precise asymptotic result on the expected description complexity of a randomly selected structure. We then give bounds on the relationship between Shannon entropy and description complexity. We extend this relationship also to Boltzmann entropy by establishing an asymptotic match between the two entropies. Despite the simplicity of unary structures, our arguments require the use of formula size games, Stirling’s approximation and Chernoff bounds.

Keywords and phrases:
formula size, finite model theory, formula size games, entropy, randomness
Copyright and License:
[Uncaptioned image] © Reijo Jaakkola, Antti Kuusisto, and Miikka Vilander; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Finite Model Theory
; Mathematics of computing Information theory
Related Version:
Full Version: https://arxiv.org/abs/2406.02108
Funding:
Antti Kuusisto and Miikka Vilander were supported by the Academy of Finland projects Explaining AI via Logic (XAILOG), grant number 345612 and Theory of computational logics, grant numbers 352419, 352420, 353027, 324435 and 328987.
Editors:
Jörg Endrullis and Sylvain Schmitz

1 Introduction

This paper investigates the resources needed to define finite models with a unary relational vocabulary. While unary models are very simple, it turns out that proving limits on the formula sizes for defining them is non-trivial. Furthermore, unary models are important as they give a direct relational representation of Boolean data sets, consisting simply of data points and their properties – thereby providing one of the simplest data representation schemes available. In practice all tabular data can be discretized and modeled via a Boolean data set. This relates to applications in, e.g., explainability and compression.

Given a logic and a class of models, the description complexity C(𝔐) of a model 𝔐 is the minimum length of a formula φ that defines 𝔐 with respect to . In the main scenario of this paper, is the class of models with the same domain of a finite size n and with the same unary vocabulary τ. We mostly study the setting via first-order logic FO. However, as description complexity links to the themes of compressibility and compression, we also investigate the restricted languages FOd where the quantifier rank of every formula is limited to a positive integer d. This will lead to dramatically shorter description lengths (cf. Section 3) via a natural lossy compression phenomenon.

We also investigate how the Shannon entropies of unary structures are linked to their description complexities, the general trend being that higher entropy relates to higher description complexity. Shannon entropy is a well-known measure of intrinsic complexity, or randomness, from information theory. The Shannon entropy of a probability distribution :X[0,1] over a finite set X is given by xX(x)log2(x). A relational structure 𝔐 of size n over a unary vocabulary τ naturally defines a probability distribution over its domain. Indeed, let T be the set of unary quantifier-free types over τ, i.e., subsets of τ. A point a of a model 𝔐 realizes a type πτ if π is the set of relation symbols corresponding to exactly those unary relations that contain the point a. Now a τ-model 𝔐 of size n naturally defines the probability distribution :T[0,1] such that (π)=|π|n, where |π| is the number of points of 𝔐 realizing the type π. The Shannon entropy of 𝔐 is then naturally defined to be equal to the Shannon entropy of the distribution :T[0,1].

While the Shannon entropy of 𝔐 gives an intrinsic measure of complexity (or randomness) of 𝔐, another entropy measure may perhaps be easier to grasp intuitively. Boltzmann entropy has its origins in statistical mechanics, and it was originally defined as klnΩ, where k is the Boltzmann constant and Ω the number of microstates of a system. In our setting, we follow [14] and define Boltzmann entropy of a model class 𝒜 as log2|𝒜|, thus dropping the Boltzmann constant k, using binary logarithms and associating models with microstates. Now, it is natural to then define the Boltzmann entropy of a model 𝔐 as log2||, where is the isomorphism class of 𝔐 (recall here that in our setting, all models have the same domain of size n, so is finite). The reason why the Boltzmann entropy of 𝔐 is a reasonable measure of intrinsic complexity of 𝔐 is now easy to motivate. Firstly, consider a τ-model 𝔐0 of size n where each Pτ is interpreted as the empty relation. This is a very simple model whose isomorphism class has size 1 and the Boltzmann entropy of 𝔐0 is thus very low: log21=0. On the other hand, models with the predicates in τ distributed in more disordered ways have larger isomorphism classes and thus greater Boltzmann entropies.

1.1 Contributions

Concerning upper bounds on description complexity, we show how to define unary structures via FO-formulas that are linear in model size. This contrasts the standard quadratic formulas that use equalities for counting cardinalities in a naive way. We also give analogous formulas for FOd with quantifier rank at most d. Concerning lower bounds, we use formula size games to provide bounds with a worst case gap of a constant factor of 2 in relation to the upper bounds. This is done both for full FO and FOd.

For a random structure the upper and lower bounds in fact match up to a sublinear additive term. Using this, we show that – asymptotically – the expected description complexity of a random unary structure of size n and over the vocabulary τ is exactly 3n/2|τ| .

We then turn our attention to entropy. We show a close relationship between the Shannon entropy and Boltzmann entropy of a unary structure. We obtain related upper and lower bounds and thereby also establish the following asymptotic equivalence for every sequence 𝔐n of models of increasing size n: HS(𝔐n)1nHB(𝔐n). We note that a result bearing a resemblance to this one has been obtained in a slightly different framework in [15].

Finally, we relate the description complexity of a model to its entropy. We investigate the general picture of the relationship by giving upper and lower bounds on the description complexity of a model in terms of its entropy. See Figure 1(a) for the case of FO and Figure 1(b) for FOd. The bounds allow us to exclude a large portion of the (a priori) possible combinations of description complexity and entropy. In particular, we see that models with very high entropy have higher description complexity than models with very low entropy. Moreover, models with a very low entropy are guaranteed to have a reasonably low description complexity, while models with very high entropies must have a notable description complexity.

1.2 Related work, techniques and applications

Description complexity is conceptually related to Kolmogorov complexity, and it is also well known that entropy and Kolmogorov complexity are linked. Indeed, for computable distributions, Shannon entropy links to Kolmogorov complexity to within a constant. This is discussed, e.g., in [17, 9, 16]. However, [23] shows that the general link fails for Rényi and Tsallis entropies. See, e.g., [9, 16, 23] for discussions on Rényi and Tsallis entropies.

Concerning work in the intersection of logic and entropy, the recent article [14] by Jaakkola et al. provides related results for a graded modal logic GMLU over Kripke-models with the universal accessibility relation.

They show that the expected Boltzmann entropy of the equivalence classes of GMLU is asymptotically equivalent to the expected description complexity times the vocabulary size. While [14] concerns GMLU, the current paper studies (monadic) FO. Because of the multi-variable nature of FO, this leads to some major differences in the techniques required. The upper bound formulas of the current paper use some clever tricks that are not possible in the modal logic GMLU. Indeed, together with the results of [14], our upper bound formulas show that FO is more succinct than GMLU. Furthermore, the techniques used for the lower bounds for GMLU do not suffice for FO, necessitating new arguments.

Surprisingly, the relationship to entropy also turns out to be different. Indeed, in the case of GMLU, models with maximal entropy have maximal description complexity, while in the case of FO this is no longer the case.

For proving bounds on formula sizes, we use formula size games for FO. Indeed, variants of standard Ehrenfeucht-Fraïssé games would not suffice, as we need to deal with formula length, and thereby with all logical operators, including connectives. The formula size game that we use for FO is a slight modification of the game of Hella and Väänänen [10]. The first formula size game, developed by Razborov in [20], dealt with propositional logic. A later variant of the game was defined by Adler and Immerman for CTL in [1]. In [11] the formula size game for modal logic ML was used by Hella and Vilander to establish that bisimulation invariant FO is non-elementarily more succinct than ML. For a further example, we also mention the frame validity games of Balbiani et al. [2]. Recently, Fagin et al. in [7, 8] and Carmosino et al. in [4, 5, 6] have developed and used multi-structural games to prove lower bounds on the number of quantifiers that are needed for separating two structures in a given logic. In [8] they have also pointed out that strong lower bounds on the number of quantifiers would imply new lower bounds in circuit complexity.

Description complexity is relevant in many applications, one interesting link being data compression. It is natural to consider unary models 𝔐 as data sets to be compressed into corresponding FO-sentences. To give a simplified example, let be the class of models over the unary alphabet τ={P,Q} and with domain M={1,,10}. Let 𝔐1 be the model where P𝔐1=Q𝔐1=M and 𝔐2 be the model where P𝔐2={1,2,3} and Q𝔐2 is, say, {3,4,5,6,7}. Now, the simple formula x(P(x)Q(x)) fully defines the model 𝔐1 with respect to , while the model 𝔐2 clearly requires a more complex formula. Suppose then that our models are represented as tabular Boolean data, meaning that each model corresponds to a 0-1-matrix with ten rows (one row for each domain element mM) and two columns, one column for P and another one for Q. In this framework, when using FO as a compression language, the Boolean matrix for 𝔐1 then compresses nicely into the formula x(P(x)Q(x)), while the matrix for 𝔐2 compresses to a notably more complex formula.

Many of the technical goals in explainable artificial intelligence (XAI) relate to compression [22], often revolving around issues of compressing information given by probability distributions. It is natural to expect representations of distributions with very high values of Shannon entropy to be more difficult to compress than ones with very low values. Concerning formula length, recent articles on XAI using minimum length formulas of logics as explanations of longer specifications include, e.g.,[3, 18, 12, 13], and numerous others. For work on using short Boolean formulas as general explanations of real-life data given in the form of unary relational structures (i.e., tabular Boolean data sets), see [13]. In that paper, surprisingly short Boolean formulas are shown to give similar error rates to ones obtained by more sophisticated classifiers, e.g., neural networks and naive Bayesian classifiers.

Concerning further directions in explainability, minimum size descriptions ψ of unary relational models 𝔐 can be useful for finding explanations in the context of the special explainability problem [12]. The positive case of this problem amounts to finding formulas χ with a given bound k on length such that 𝔐χφ, where φ acts as a classifier. In this context, it often suffices to find a short interpolant χ such that ψχφ, where ψ is a minimum description of 𝔐. In applications, this latter task can often be more efficient than the first one, especially when ψ is significantly smaller than 𝔐. One way to ensure ψ is short enough is to describe 𝔐 in a sufficiently incomplete way, such as with FOd with small d.

Finally, in applications, it is typically easy to compute the Shannon entropy of structures, while description complexity and thereby issues relating to compressibility and explainability are much more difficult to determine. Therefore, even a rough picture of the links between entropy and description complexity can be useful.

The plan of the paper is as follows. After the preliminaries in Section 2, we provide upper bounds for the description complexity of unary structures in Section 3. In Section 4 we establish related lower bounds using games. In Section 5 we determine asymptotically the expected description complexity of a random unary structure. In Section 6 we give bounds on the relationship between entropy and description complexity. In Section 7 we conclude.

2 Preliminaries

Let τ={P1,,Pk} be a monadic vocabulary and let 𝑉𝑎𝑟={x1,x2,} be a countably infinite set of variables. The syntax of first-order logic FO[τ] is generated by the grammar: φ::=x=yP(x)¬φφφφφxφxφ, where x,y𝑉𝑎𝑟 and Pτ. The quantifier rank of a formula φFO[τ] is the maximum number of nested quantifiers in the formula. We denote by FOd[τ] the fragment of FO[τ] that only includes the formulas with quantifier rank at most d. A formula φFO[τ] is in negation normal form if negations are only applied to atomic formulas x=y or P(x). We assume all formulas are in negation normal form and treat the notation ¬φ as shorthand for the negation normal form formula obtained from φ by pushing the negation to the level of atomic formulas.

The size of a formula φFO[τ] is defined as the number of atomic formulas, conjunctions, disjunctions and quantifiers in φ. Note that negations do not contribute to the size of φ. This choice together with using negation normal form means that positive and negative atomic information is treated as equal in terms of formula size. In line with this thinking, we will refer also to xy and ¬P(x) as atomic formulas in the sequel.

A formula φFO[τ] is in prenex normal form if it is of the form Q1x1Qmxmψ, where Qi{,} for i{1,,m} and ψFO[τ] has no quantifiers. It is well-known that every FO-formula can be transformed into an equivalent formula in prenex normal form which has the same size as the original formula.

A 𝝉-model is a tuple 𝔐=(M,P1𝔐,,Pk𝔐), where M={1,,n} and Pi𝔐M for i{1,,k}. A model 𝔐 is a model of size n if |M|=n. A partial function s:𝑉𝑎𝑟M is called an interpretation. We also call pairs (𝔐,s) models and identify the pair (𝔐,) with the model 𝔐. The truth relation (𝔐,s)φ is defined in the usual way for FO[τ].

Let 𝔐=(M,P1𝔐,,Pk𝔐) be a τ-model of size n. We say that a formula φFO[τ] defines 𝔐 if for all τ-models 𝔐 of size n we have (𝔐,)φ iff 𝔐 is isomorphic to 𝔐. As first-order logic cannot distinguish between isomorphic structures, we can in some sense identify the model 𝔐 with the class of models isomorphic to 𝔐. The description complexity C(𝔐) of 𝔐 is the size of the smallest formula in FO[τ] that defines 𝔐.

Note that our definition of description complexity concerns separating 𝔐 only from other models of the same size n. Requiring separation from all other models would unduly emphasize the size of the model, making even very simple models have a high description complexity. For example, the model 𝔐=(M,P𝔐) of size n, where P𝔐=M, would already require a formula with size in the order of n. In our setting, C(𝔐)=2, because 𝔐 is defined by the formula xP(x).

A τ-type π is a subset of τ. A point aM realizes a τ-type π if for all Pτ we have aP𝔐 iff Pπ. We let |π|𝔐 denote the number of points in 𝔐 realizing π. We often omit the subscript when the model is clear from the context. Note that two τ-models 𝔐 and 𝔐 are isomorphic iff each type is realized in the same number of points in both models.

We also consider more coarse ways to divide models into classes than isomorphism. For each positive integer d we can define an equivalence relation d over τ-models of size n as follows. Given two τ-models 𝔐 and 𝔐 of size n, we define that 𝔐d𝔐 iff for each τ-type π with |π|𝔐<d, we have that |π|𝔐=|π|𝔐. In other words, 𝔐d𝔐 iff each type that is realized in less than d points in 𝔐 is realized in the same number of points in both models. It is easy to show that 𝔐d𝔐 iff they satisfy the same sentences of FOd[τ]. The 𝒅-description complexity Cd(𝔐) of a τ-model 𝔐 is the size of the smallest FOd[τ]-formula that defines the equivalence class of 𝔐 in d.

To characterize model classes, we use tuples with t=2|τ| numbers. For an isomorphism class, the tuple is simply (|π1|,,|πt|). For an equivalence class of d, we only use numbers up to d. For a tuple m¯=(m1,,mt), if mi=d, then there are at least d realizing points of type πi in models of the class . If mi<d, then each model has exactly mi points realizing the type πi. The notation m¯ refers to classes of d via these tuples. The tuples that correspond to some class of d are characterized by the conditions mid for i{1,,t}, i=1tmin and if i=1tmi<n, then mj=d for some j{1,,t}. If i=1tmi=n, then m¯ is an isomorphism class.

Since τ-types partition the points of a τ-model 𝔐, we may consider a natural probability distribution over the types in 𝔐. The probability pπ of a type π is simply |π|/n, that is, the probability of hitting a point of type π when selecting a point from 𝔐 randomly. The Shannon entropy of 𝔐 is the quantity HS(𝔐):=i=1tpπilog(pπi)=i=1t|πi|nlog(|πi|n). Here we follow the convention 0log(0)=0. Shannon entropy is an information theoretic way of measuring randomness of probability distributions. Uniform distributions have maximal Shannon entropy, as the uncertainty of the outcome of choosing a random point is maximized. Conversely, for a distribution that places all of the probability mass on a single event, Shannon entropy is zero. Hence, a model realizing each type the same number of times (or as close as possible) has maximal Shannon entropy, while for a model that realizes only a single type Shannon entropy is zero.

Another way to define entropy of a model 𝔐 uses the model class 𝔐 belongs to. Given an equivalence relation over models of size n (and thus domain {1,,n}), the Boltzmann entropy of 𝔐 with respect to is HB(𝔐):=log(||), where is the equivalence class of 𝔐. In this paper the equivalence relation is either isomorphism in the case of full FO or d for FOd. For isomorphism, we write HB(𝔐) and for d we write HBd(𝔐).

Boltzmann entropy originates from statistical mechanics, where it measures the randomness of a macrostate (= a model class) via the number of microstates (= models) that correspond to it. The idea is that a larger macrostate is “more random” (or “less specific”) since it is more likely to be hit by a random selection. We show in Section 6 that HS(𝔐)1nHB(𝔐), where n is the size of the domain of 𝔐. Thus the two notions of entropy are asymptotically equivalent up to normalization. This shows that both entropies indeed measure the randomness of a model from different points of view.

3 Upper bound formulas

In this section we define arbitrary τ-models via formulas of size linear in the size of the model. Recall that defining a model means separating it from all non-isomorphic models with the same domain size. To see why linear size formulas are quite succinct, note that the following naive formula =12|τ|x1x|π|(i=1|π|π(xi)j=i+1|π|xixj), which expresses that for each 12|τ| the type π is realized by at least |π| distinct points, is of quadratic size in the size n of the model.

For clean results on formula size, we define a constant cτ:=15|τ|2|τ|. Note that we consider cτ to be constant as it only depends on the size of the alphabet τ, which in our context is constant.

Theorem 1.

Let 𝔐 be a model of size n. Let T={π1,,π} be the types realized in 𝔐, enumerated in ascending order of numbers of realizing points. Now we have the bound C(𝔐)min(3|π|+cτ,6|π1|+cτ).

Proof.

We obtain two different upper bound formulas. Due to lack of space, we only give one of them in full here; see A.1 for details on the second formula.

We begin with an easy formula we use extensively below. For a type π and x𝑉𝑎𝑟, let

π(x):=PπP(x)Pπ¬P(x).

The formula π(x) states that the point x realizes the type π.

Let T={π1,,π} be a set of τ-types and let m¯ be a sequence of r positive integers with 0<m1mr. Let 𝔐 be a model of size n, where exactly the types in T are realized. We will make sure of this with a separate formula later. The formula φ(T,m¯) below is satisfied by such a model 𝔐 if and only if for every i{1,,r}, the model 𝔐 has at least mi points that realize the type πi. Note that we do not assert anything about the types πr+1,,π, but we still need to mention them in the formula. We define

ψmr:=yxmr1j{1,,r}mj=mr(πj(x1)πj(y))
ψi:=yxi1ψi+1, if mji for all j{1,,r}, and
ψi:=yxi1(j{1,,r}mj=i(πj(x1)πj(y))ψi+1), otherwise.
ψ1:=ψ2, if mj1 for all j{1,,r}, and
ψ1:=j{1,,r}mj=1πj(x1)ψ2, otherwise.
φ(T,m¯):=x1xmr1y(j{r+1,,}πj(x1)ψ1)

We proceed with an explanation of how the formula φ(T,m¯) works. We assume that precisely the types in T are realized in the model 𝔐 to be evaluated, so we know that the first universal variable x1 is always attached to a point that realizes one of the types in T. The formula first checks if x1 realizes one of the types πr+1,,π that we wish to ignore. The recursion then handles the rest of the types, starting with the smallest ones. If the type πj of x1 has mj=1, nothing further is stated as we already know the type is realized in 𝔐 by our assumption.

Now, consider a type πj with, say, mj=5. Up to the subformula ψ5, the recursion of our formula has insisted that yxi for i{1,2,3,4}. Note that the formula does not contain any atomic formulas xi1xi2. The crucial point is that since the variables x1,,x4 are universally quantified, the existence of y must hold also in the case, where x1,x4 happen to all be different points of the same type πj. If the evaluated model 𝔐 has at least 5 points that realize πj, then the formula holds as another point y that realizes πj can be found. If, however, 𝔐 has only 4 points that realize πj, then one of the universally quantified tuples includes precisely those 4 points and another y of the same type cannot be found.

We adopt the notation k=|τ| and compute the size of φ(T,m¯). The formula has mr quantifiers. For each type πT, there are at most two occurrences of the subformula π(x) (with different variables x). Each subformula π(x) contains k atomic formulas. Thus there are at most 2k|T| atomic formulas of the form P(x) or ¬P(x). Each inequality yxi for 1imr1 occurs exactly once, so there are mr1 atomic formulas that are equalities or inequalities. Finally we multiply the number of atomic formulas by two and subtract one to also account for the binary connectives. The size of φ(T,m¯) is thus at most

mr+2(mr1+2k|T|)1=3mr+4k|T|3.

We proceed to define our first complete upper bound formula that defines an isomorphism class of models. Let 𝔐 be a τ-model with domain M={1,,n}. Let T={π1,,π} be the set of τ-types realized in 𝔐 and let m¯=(|π1|,,|π|). Assume further that m¯ is increasing. The full formula φ(𝔐) is based on bounding the size of every type in T from below, thus separating it from all non-isomorphic models with the same domain size.

φ(𝔐):=i=1xπi(x)xi=1πi(x)φ(T,m¯)

In addition to the size of φ(T,m¯) computed above, φ(𝔐) includes |T|+1 quantifiers and two occurrences of π(x) for each type πT, resulting in 2k|T| atomic formulas. Accounting for the added binary connectives, the size of φ(𝔐) is thus at most

|T|+1+22k|T|+3|π|+4k|T|3=3|π|+8k|T|+|T|23|π|+cτ.

The second formula ψ(𝔐) of size at most 6|π1|+cτ states that each type πi with i has exactly |πi| points. See A.1 for details. Both formulas define any model 𝔐 so we can always use whichever is smaller, thus proving the claim.

Corollary 2.

Let 𝔐 be a model of size n. Now C(𝔐)2n+cτ.

Proof.

A model 𝔐 corresponding to the tuple (0,,0,n/3,2n/3) maximises the value of the expression min(3|π|+cτ,6|π1|+cτ), getting the value 2n+cτ.

We now consider defining equivalence classes of d. Recall that an equivalence class of d corresponds to a tuple m¯=(m1,,mt), where t=2|τ|, mid for all i{1,,t}, i=1tmin and if i=1tmi<n, then mj=d for some j{1,,t}.

Theorem 3.

Let 𝔐 be a τ-model of size n. Let m¯ be the equivalence class of 𝔐 in d, where m¯=(m1,,mt) is the corresponding tuple with the numbers in ascending order. Let mr be the highest number in m¯ below d. Now Cd(𝔐)3d+3mr+cτ. Additionally, if mt1<d, then Cd(𝔐)6mt1+cτ.

Proof.

We use the same subformulas from Theorem 1 to obtain two linear size formulas. See A.2 for details. The first formula of size 3d+3mr+cτ works for any tuple m¯ and states that each type πi has exactly mi points if mi<d and at least d points if mi=d. The second formula of size 6mt1+cτ states that each type πi with it has exactly mi points and works only if all types except possibly πt have less than d points.

Note that since mr<d, we have 6mr<3d+3mr so the bound for the special case is tighter than the general one. While we must use the more general bound for any m¯ with at least two instances of d, the tighter bound is significantly better for small classes with only one instance of d in their tuple. For example, the class with the tuple (0,,0,1,d) gets an upper bound of 6+cτ regardless of the number d. At the other extreme, the class with the tuple (0,,0,d1,d,d) gets an upper bound of 3d+3(d1)+cτ=6d3+cτ.

We again directly obtain a global upper bound on description complexity.

Corollary 4.

Let 𝔐 be a τ-model of size n. Now Cd(𝔐)6d3+cτ.

4 Lower bounds via formula size games

In this section, we show lower bounds that match the upper bounds of Section 3 up to a factor of 2. We use the formula size game for first-order logic defined in [10]. We modify the game slightly to correspond to formulas in prenex normal form as this form does not affect the size of the formula. In addition, we introduce a second resource parameter q that corresponds to the number of quantifiers in the separating formula. The game consists of two phases: a quantifier phase, where only -moves and -moves can be made by S, and an atomic phase, where only -moves, -moves and atomic moves can be made. Before the definition of the game, we define some notation.

Let 𝒜 be a set of τ-models and let φFO[τ]. We denote 𝒜φ to mean (𝔐,s)φ for all (𝔐,s)𝒜. Similarly, we denote 𝒜¬φ to mean (𝔐,s)φ for all (𝔐,s)𝒜.

For an interpretation s, a point aM and a variable x𝑉𝑎𝑟, we denote by s[a/x] the interpretation s such that s(x)=a and s(y)=s(y) for all ydom(s), yx. Let 𝒜 be a set of τ-models with the same domain M and let f:𝒜M be a function. We denote by 𝒜[f/x] the set {(𝔐,s[f(𝔐,s)/x])(𝔐,s)𝒜}. Intuitively, the function f gives the new interpretation of the variable x for each model (𝔐,s)𝒜. Additionally, we denote 𝒜[M/x]:={(𝔐,s[a/x])(𝔐,s)𝒜,aM}. Here the variable x is given all possible interpretations, usually leading to a larger set of models. We next define the game.

Let 𝒜0 and 0 be sets of τ-models and let r0,q0 with r0>q0. The FO prenex formula size game FSτ(r0,q0,𝒜0,0) has two players: Samson (S) and Delilah (D). Positions of the game are of the form (r,q,𝒜,), where r,q and 𝒜 and are sets of τ-models. The starting position is (r0,q0,𝒜0,0). In a position (r,q,𝒜,), if r=0, then the game ends and D wins. Otherwise, if q>0, the game is said to be in the quantifier phase and S can choose from the following three moves:

  • -move: S chooses f:𝒜M and xi𝑉𝑎𝑟. The new position is
    (r1,q1,𝒜[f/xi],[M/xi]).

  • -move: The same as the -move with the roles of 𝒜 and switched.

  • Phase change: S moves on to the atomic phase and the new position is (r,0,𝒜,).

In a position (r,q,𝒜,), if q=0, the game is said to be in the atomic phase and S can choose from the following three moves:

  • -move: S chooses r1,r2 and 1,2 such that r1+r2+1=r and 12=. Then D chooses the next position from the options (r1,0,𝒜,1) and (r2,0,𝒜,2).

  • -move: The same as the -move with the roles of 𝒜 and switched.

  • Atomic move: S chooses an atomic formula α. The game ends. If 𝒜α and ¬α, then S wins. Otherwise, D wins.

The prenex formula size game characterizes separation of model classes with formulas of limited size in the following way.

Theorem 5.

Let 𝒜0 and 0 be sets of τ-models and let r0,q0 with r0>q0. The following are equivalent

  1. 1.

    S has a winning strategy in the game FSτ(r0,q0,𝒜0,0),

  2. 2.

    there is an FO[τ]-formula φ in prenex normal form with size at most r0 and at most q0 quantifiers such that 𝒜0φ and 0¬φ,

  3. 3.

    there is an FO[τ]-formula φ with size at most r0 and at most q0 quantifiers such that 𝒜0φ and 0¬φ.

Proof.

For the simple inductive proof on how the game works, see [10]. The slight modifications of the separate parameter q for quantifiers and prenex normal form do not change the proof in any meaningful way so we omit it. For the equivalence between the second and third item, note that transforming a formula into prenex form and renaming variables as needed, does not increase its size in full FO with no restrictions on, say, the number of variables.

We take a moment to build some intuition on the formula size game. The role of player S is to show that the model sets 𝒜0 and 0 can be separated by some FO formula with restrictions on size and number of quantifiers. To achieve this, S starts building the supposedly separating formula, starting from the quantifiers.

Each move of the game corresponds to an operator or atomic formula. When making a move, S makes choices for each model that reflect how that particular model is going to satisfy the formula, in the case of models in 𝒜, or not satisfy it, in the case of models in . For example, for an -move, S must choose for each model in 𝒜 the point to quantify. This is done via the function f. For a -move, S chooses for each model in one of the conjuncts, asserting that the model will not satisfy that conjunct.

The resources r0 and q0 restrict the moves of S. He can only make at most q0 quantifier moves in the quantifier phase of the game. The resource r0 limits the size of the entire separating formula, including the quantifiers. In the atomic phase, for -moves, S must divide the remaining resource r between the two conjuncts. It is then the role of D to choose the conjunct she thinks cannot be completed in such a way that the models present are separated. Once D has chosen a conjunct, the other conjunct not chosen is discarded for the rest of the game. Thus, the entire separating formula need not be constructed.

We move on to our lower bounds. Let 𝔐 be a τ-model with domain M={1,,n} and let T={π1,,π} be the types realized in 𝔐, enumerated in ascending order of numbers of realizing points, like in the previous section. We assume that 2 as a model, where all points are of the same type, is easily defined by a constant-sized formula. We use the formula size game to show a lower bound of the order 3|π1| for the description complexity of 𝔐.

Let 𝔐 be the model obtained from 𝔐 by changing the type of one point from π1 to π. We define 𝒜0={(𝔐,)} and 0={(𝔐,)}. We will show that separating the sets 𝒜0 and 0 requires a formula of size at least 3|π1|3. We begin with an easy lemma on the number of quantifiers required to separate 𝒜0 from 0.

Lemma 6.

If φ separates 𝒜0 from 0, then φ has at least |π1| quantifiers.

Proof.

Let r0>|π1|1. We show that D has a winning strategy for the formula size game FSτ(r0,|π1|1,𝒜0,0). By Theorem 5, this proves the claim.

We show that in any position of such a game, there is a pair (𝔐,s)𝒜 and (𝔐,s) of models that cannot be separated by any atomic formula. At the starting position, the single models in 𝒜0 and 0 are such a pair as no variables have been quantified. We proceed to show that D can maintain this pair of models through any move of S. We only treat one of each pair of dual moves as the other is handled the same way.

-move:

S chooses a function f:𝒜M. We focus on the point a=f(𝔐,s) chosen for the model (𝔐,s)𝒜. On the other side, copies of (𝔐,s) are generated for each point bM, but we restrict attention to only one as follows. If there is a previously quantified variable x with s(x)=a, then we choose b=s(x). Otherwise we choose a new point b of the same type as a. If the type of a is πi with i<1, then 𝔐 and 𝔐 have the same points of type πi so we may choose b=a. If i{1,}, then both 𝔐 and 𝔐 have at least |π1|1 points of the type πi so we may choose a fresh b of the same type. The new pair of models found in this manner is clearly atomic-equivalent.

Phase change:

With no changes to the sets of models 𝒜 and , the important pair of models is still clearly present in the next position.

-move:

S chooses splits r1+r2+1=r and 12=. Now the model (𝔐,s) is in 1 or 2 and 𝒜 remains unchanged. Thus our model pair is present in one of the positions (r1,0,𝒜,1) and (r2,0,𝒜,2). By choosing such a position, D maintains the pair of models.

Atomic move:

The model pair is atomic-equivalent, so D wins after any atomic move.

The next lemma concerns the atomic phase. We show that if the number of different atomic formulas required to separate the model sets 𝒜 and is too large, D wins the game.

Lemma 7.

In a game FSτ(r0,q0,𝒜0,0), let (r,0,𝒜,) be the first position of the atomic phase and let Γ be a minimum size set of atomic formulas such that for every (𝔐,s)𝒜 and (𝔐,s), there is αΓ with (𝔐,s)α and (𝔐,s)α. If r<2|Γ|1, then D has a winning strategy from the position (r,0,𝒜,).

Proof.

We show that every move of S either ends the game in a win for D, or maintains the condition r<2|Γ|1. Assume this condition holds in position (r,0,𝒜,).

Atomic move:

S chooses an atomic formula α. Since 1r<2|Γ|1, we have |Γ|2 so the single atomic formula α does not separate 𝒜 from and D wins.

-move:

S chooses splits r1+r2+1=r and 12=. Assume for contradiction that there are sets Γ1 and Γ2 of atomic formulas such that Γi separates 𝒜 from i and ri2|Γi|1. Now for every pair of models (𝔐,s)𝒜 and (𝔐,s) we have (𝔐,s)1 or (𝔐,s)2 so the set Γ1Γ2 separates 𝒜 from . Recalling that Γ is a separating set of minimum size and r<2|Γ|1, we also have r<2|Γ1Γ2|12(|Γ1|+|Γ2|)1r1+r2+1=r, which is a contradiction. Thus we have r1<2|Γ1|1 or r2<2|Γ2|1. By choosing the correct position D can maintain the required condition.

-move:

Identical to the -move with the roles of 𝒜 and switched.

We are now ready for the main theorem of this section.

Theorem 8.

Let 𝔐 be a model of size n. Let T={π1,,π} be the types realized in 𝔐, enumerated in ascending order of numbers of realizing points, where 2. Now C(𝔐)3|π1|3.

Proof.

We begin with a definition. Let Γ be a set of atomic FO-formulas. We denote the set of variables occurring in formulas of Γ by V(Γ). We define the variable graph of 𝚪 as G(Γ)=(V(Γ),E(Γ)), where (x,y)E(Γ) iff x=yΓ or xyΓ. We say that ΔΓ is a connected component of 𝚪 if G(Δ) is a maximal connected subgraph of G(Γ).

For convenience, we denote here m:=|π1|. Consider a formula size game FSτ(3m4,q0,𝒜0,0). We show that D has a winning strategy for this game, thus proving the claim by Theorem 5. By Lemma 6 we see that to have a chance of winning, S must begin the game with at least m quantifiers. We then move on to the first position (r,0,𝒜,) of the atomic phase, where r2m4. Let Γ be a set of atomic formulas such that for every (𝔐,s)𝒜 and (𝔐,s), there is αΓ such that (𝔐,s)α and (𝔐,s)α. If |Γ|m1 for every such Γ, then r2m4=2(m1)2<2|Γ|1 so D has a winning strategy by Lemma 7. We now assume for contradiction that there exists such a Γ with |Γ|m2.

Consider the connected components Δ of Γ. Since a connected graph with k edges has at most k+1 vertices, for every Δ at most m1 variables occur in the formulas of Δ.

We now explain why there is a single pair of models (𝔐,s)𝒜 and (𝔐,s) such that they are atomic equivalent with respect to the variables in V(Δ) for every connected component Δ of Γ. We consider the quantifier moves S made in the quantifier phase in the order the moves were made. For every variable x used in a -move, we consider Δ such that xV(Δ). We proceed as in the proof of Lemma 6, with respect to only the variables in V(Δ). That is, if there is a previously quantified variable yV(Δ) such that s(y)=s(x), we choose the opposing model where s(x)=s(y). Otherwise, we choose a point with no variables of V(Δ) attached. Each Δ uses at most m1 variables so we do not run out of fresh points of any type. The same protocol works for -moves as well.

Note that the choices of models are made based on the connected component Δ of x, completely independently of other components. Since every variable x is in exactly one component Δ, this means that the resulting pair of models is simultaneously atomic equivalent with regards to each component separately. Thus this model pair cannot be separated by any atomic formula in Γ. This contradiction with the definition of Γ proves the claim.

We now consider lower bounds in the setting of FOd. Recall that an equivalence class of d is characterized by a tuple (m1,,mt), where t=2|τ|, mid, i=1tmin and if i=1tmi<n, then mj=d for some j. Let m¯=(m1,,mt) be such a tuple in ascending order of the numbers mi. If i=1tmi=n, then m¯ corresponds to an isomorphism class and the lower bounds above work as is. Thus we assume that i=1tmi<n and consequently mt=d. By taking a model 𝔐 in the equivalence class m¯ with a maximal number of points of the type πt, we can directly obtain the model 𝔐 as above and get a lower bound on defining the class m¯ in full FO. This bound directly extends also to FOd, as limiting quantifier rank gives no advantage in terms of formula size.

Corollary 9.

Let m¯ be an equivalence class of d, where m¯=(m1,,mt) is the corresponding tuple with the numbers in ascending order. Now C(m¯)3mt13.

5 Expected description complexity

Using Theorems 1 and 8, we can determine asymptotically the expected description complexity of a random τ-model. Here by random we mean that the model is sampled uniformly at random from the set of all τ-models of size n. That is, we determine the asymptotic behavior of the quantity 𝔼n[C]:=12|τ|n𝔐C(𝔐) as n, where the sum is taken over all the τ-models 𝔐 of size n.

We say that a τ-model 𝔐 is balanced, if for every τ-type π, we have ||π|𝔐n2|τ||=o(n). In other words, a model is balanced if every type is realized roughly the same number of times, allowing for a sublinear discrepancy. We use the well-known Chernoff bounds to establish that a random model is very likely balanced.

Proposition 10 (Multiplicative Chernoff bound).

Let X:=i=1nXi be a sum of independent 0-1-valued random variables, where Xi=1 with probability p and Xi=0 with probability 1p. Let μ:=𝔼[X]. Now, for every 0δ<1 we have that Pr[|Xμ|δμ]2eδ2μ/3

Proof.

See for example Corollary 4.6 in [19].

Lemma 11.

The probability that a random τ-model of size n is balanced is at least 12|τ|+1/n.

Proof.

A routine calculation using Proposition 10. See A.3 for details.

The previous lemma gives a rough characterization of random τ-models. Using this characterization together with Theorem 8 we can determine asymptotically the expected description complexity of a random τ-model.

Theorem 12.

𝔼n[C]3n2|τ|

Proof.

To give an upper bound on 𝔼n[C] we first rewrite it as follows:

𝔼n[C]=12|τ|n𝔐 balancedC(𝔐)+12|τ|n𝔐 not balancedC(𝔐) (1)

Using Corollary 2 and Lemma 11 we see that

12|τ|n𝔐 not balancedC(𝔐)12|τ|n𝔐 not balanced2n+cτ=Pr[𝔐 is not balanced](2n+cτ)
2|τ|+1n(2n+cτ)=2|τ|+2+cτ2|τ|+1n=𝒪(1).

Since we are interested in the asymptotic behavior of 𝔼n[C], the above shows that we can safely concentrate on the first sum in Equation (1). Using Theorems 1 and 8 we see that if 𝔐 is balanced, then 3n2|τ|o(n)C(𝔐)3n2|τ|+o(n). Hence

Pr[𝔐 is balanced](3n2|τ|o(n))12|τ|n𝔐 balancedC(𝔐)Pr[𝔐 is balanced](3n2|τ|+o(n)).

Since Pr[𝔐 is balanced] goes to one as n, we see that 12|τ|n𝔐 balancedC(𝔐)3n2|τ|, which is what we wanted to show.

6 Entropy and description complexity

In this section we establish results that illustrate how entropy and description complexity relate to each other. As one can already imagine after seeing our results on description complexity, there can be models with very close entropies and quite different description complexities. We can nevertheless use our results to exclude many a priori possible combinations of description complexity and entropy. For notational simplicity, we adopt the notation t:=2|τ|.

We begin by showing that the Boltzmann and Shannon entropies of a single model are essentially the same up to normalization. This underlines the fact that both entropies measure the same thing: the randomness of a model.

Theorem 13.

Let 𝔐 be a τ-model of size n. Now

HS(𝔐)1nHB(𝔐)<(t1)log(2πn)nlog(e)12n2+tlog(e)12n2+n.

Proof.

Using the quantitative version of Stirling’s approximation given in [21], we obtain

HB(𝔐)= log(nn1nt)=logn!n1!nt!=log(n!)i=1tlog(ni!)
< log(2πn(ne)ne112n)i=1tlog(2πni(nie)nie112n+1)
= log(2πn)+nlog(n)nlog(e)+log(e)12n
i=1t(log(2πni)+nilog(ni)nilog(e)+log(e)12n+1)
nlog(n)i=1tnilog(ni)(t1)log(2πn)+log(e)12ntlog(e)12n+1.

Note that the term nlog(e) is cancelled out above because n1++nt=n. Using this same fact we also easily see that

HS(𝔐) =i=1tninlognin=i=1tninlog(n)i=1tninlog(ni)=log(n)i=1tninlog(ni).

Finally, by dividing HB(𝔐) with n we obtain

HS(𝔐)1nHB(𝔐)<(t1)log(2πn)nlog(e)12n2+tlog(e)12n2+n.

The above quantitative result readily implies that the Boltzmann and Shannon entropies of a single model are asymptotically the same up to normalization. A connection that bears a similarity to the one pointed out here has also been noted briefly in [15].

Corollary 14.

Let (𝔐n)n+ be a sequence of τ-models where each 𝔐n has size n. Now HS(𝔐n)1nHB(𝔐n) as n.

The above results show that for the connections to description complexity, we could use either of the two notions of entropy. We opt for Shannon entropy here.

We will next use results from Sections 3 and 4 to prove two theorems that give bounds on description complexity in terms of Shannon entropy. Recall from Section 3 the constant cτ:=15|τ|2|τ|. The first of our two theorems gives global upper and lower bounds on description complexity based on the same edge case distributions.

Theorem 15.

Let p[0,1t[. If HS(𝔐)>((t1)p1)log(1(t1)p)(t1)plog(p), then 3np3<C(𝔐)<3n(1(t1)p)+cτ.

Proof.

Let f(p):=((t1)p1)log(1(t1)p)(t1)plog(p). The function f(p) gives the entropy of a τ-model 𝔐 corresponding to the tuple (np,,np,n(1(t1)p)), where n(1(t1)p)>np for the given values of p. Since all types but the largest are evenly distributed, any model, where the largest type has at least n(1(t1)p) realizing points has entropy at most HS(𝔐)=f(p). Therefore if HS(𝔐)>f(p), then the largest type of 𝔐 has less than n(1(t1)p) realizing points. By Theorem 1, we obtain C(𝔐)<3n(1(t1)p)+cτ. On the other hand, since the largest type of 𝔐 has less realizing points than in 𝔐, those points realize some other type. Therefore the second largest type of 𝔐 has more than np realizing points. By Theorem 8, we obtain C(𝔐)>3np3.

The next theorem uses low entropy models with only two realized types to show a better upper bound on description complexity for low entropy models than the above global one.

Theorem 16.

Let p[0,12]. If HS(𝔐)<(p1)log(1p)plog(p), then C(𝔐)<6np+cτ.

Proof.

Let h(p):=(p1)log(1p)plog(p). The function h(p) gives the entropy of a τ-model 𝔐 corresponding to the tuple (0,,0,np,n(1p)). If HS(𝔐)<h(p), then the second largest type of 𝔐 must be smaller than np. Thus, by Theorem 1, C(𝔐)<6np+cτ.

(a)
(b)
Figure 1: Figure 1(a) on the left shows an area that encapsulates all combinations of Shannon entropy and FO-description complexity for the values |τ|=2 and n=1000. Figure 1(b) on the right concerns the case of FOd and shows bounds on description complexity in terms of Boltzmann entropy for values |τ|=2, n=100 and d=10 with the constants 3 and cτ omitted.

Figure 1(a) incorporates both of the above theorems as well as Corollary 2 to show an area, where all possible combinations of Shannon entropy and description complexity must fall. First, comparing the left side of the plot to the right, we can see that models with very high entropy have significantly higher description complexity than models with very low entropy.

We can also see from Figure 1(a) that the gap between our upper bounds and lower bounds is only constant at both extremes of entropy. For models with middling entropy, the gap is at its largest. This is because middling values of entropy can be realized by models with very different distributions of types, leading to different description complexity.

We conjecture that the upper bound given by Theorem 1 is in reality tight up to the constant cτ. Now, recall that for any single model, our upper and lower bounds have a worst case gap of a factor of 2. Therefore, assuming that our conjecture is true, the lower bound would only rise to at most double its current height. In other words, the general picture illustrated by Figure 1(a) would not be significantly different under our conjecture.

We proceed to show that similar relationships between description complexity and entropy hold also in the case of limited quantifier rank. As the classes of d contain multiple different isomorphism types of models, it is not clear how to define Shannon entropy. Boltzmann entropy, however, is still straightforward so we use Boltzmann entropy here. We formulate similar theorems to those above for full FO.

Theorem 17.

Let h{1,,d1}. If HBd(𝔐)>log(nhhn(t1)h),then Cd(𝔐)>3h3.

Proof.

Let f(n,h)=log(nhhn(t1)h). The function f(n,h) gives the Boltzmann entropy of the class of models m¯, where m¯=(h,,h,d). Any class of models obtained from this one by lowering any of the numbers in the tuple is clearly smaller than m¯ and thus has lower Boltzmann entropy. Thus, for any larger class of models the second largest number in its tuple must be greater than h. By Corollary 9, we obtain Cd(𝔐)>3h3.

Theorem 18.

Let h{1,,d1}. If HBd(𝔐)<log(nh), then Cd(𝔐)<6h+cτ.

Proof.

The function g(n,h)=log(nh) gives the Boltzmann entropy of a class m¯ of models, where m¯=(0,,0,h,d). Now every class of models, where the second largest number in the tuple is at least h, is larger than or equal to m¯. Thus if HBd(𝔐)<g(n,h), then the class of 𝔐 is smaller and the second largest number in its tuple is smaller than h. By Theorem 3 we obtain Cd(𝔐)<6h+cτ.

We again have a plot in Figure 1(b), where the possible combinations of entropy and description complexity lie between the two chopped lines. This time, we plotted from the above theorems 3h for the lower bound and 6h for the upper bound, omitting the constants 3 and cτ. For these low values of n and d, the constants would have warped the picture in a significant way. With high enough n and d, the constants are clearly negligible, but for such values, the Boltzmann entropy quickly becomes impractical to calculate as the model class sizes explode. We provide a plot of the leading terms for the values n=100 and d=10 without the constants to illustrate the trends one would see for higher values of n and d.

We see that the first observation we made for full FO still holds. The models with very high entropy have significantly higher description complexity than those with very low entropy. Concerning the gap between the upper and lower bounds, it is again constant at the extremes. The largest gap can now be found significantly before the halfway point of entropy, unlike for full FO. This is because the limit d of quantifier rank quite quickly cuts short the growth of the upper bound while the lower bound grows slower.

7 Conclusion

We have studied the description complexity of unary models, obtaining bounds for FO and FOd. We have found the asymptotic description complexity of a random unary structure and studied the relation between Shannon entropy and description complexity – also observing a connection between Boltzmann and Shannon entropy. Links to entropy can be useful as computing entropy is significantly easier than determining description complexity.

An obvious future goal would be to close the gaps between the upper and lower bounds. Generalizing to full relational vocabularies is also interesting, although this seems to require highly involved arguments. The part on entropy would there relate to Boltzmann entropy, as there is no obvious unique definition for Shannon entropy in the k-ary scenario.

References

  • [1] Micah Adler and Neil Immerman. An n! lower bound on formula size. ACM Trans. Comput. Log., 4(3):296–314, 2003. doi:10.1145/772062.772064.
  • [2] Philippe Balbiani, David Fernández-Duque, Andreas Herzig, and Petar Iliev. Frame-validity games and lower bounds on the complexity of modal axioms. Log. J. IGPL, 30(1):155–185, 2022. doi:10.1093/jigpal/jzaa068.
  • [3] Pablo Barceló, Mikaël Monet, Jorge Pérez, and Bernardo Subercaseaux. Model interpretability through the lens of computational complexity. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020. URL: https://proceedings.neurips.cc/paper/2020/hash/b1adda14824f50ef24ff1c05bb66faf3-Abstract.html.
  • [4] Marco Carmosino, Ronald Fagin, Neil Immerman, Phokion Kolaitis, Jonathan Lenchner, and Rik Sengupta. On the number of quantifiers needed to define boolean functions, 2024.
  • [5] Marco Carmosino, Ronald Fagin, Neil Immerman, Phokion G. Kolaitis, Jonathan Lenchner, and Rik Sengupta. A finer analysis of multi-structural games and beyond. CoRR, abs/2301.13329, 2023. doi:10.48550/arXiv.2301.13329.
  • [6] Marco Leandro Carmosino, Ronald Fagin, Neil Immerman, Ph. G. Kolaitis, Jonathan Lenchner, Rik Sengupta, and Ryan Williams. Parallel play saves quantifiers. ArXiv, 2024.
  • [7] Ronald Fagin, Jonathan Lenchner, Kenneth W. Regan, and Nikhil Vyas. Multi-structural games and number of quantifiers. In 2021 36th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pages 1–13, 2021. doi:10.1109/LICS52264.2021.9470756.
  • [8] Ronald Fagin, Jonathan Lenchner, Nikhil Vyas, and Ryan Williams. On the Number of Quantifiers as a Complexity Measure. In Stefan Szeider, Robert Ganian, and Alexandra Silva, editors, 47th International Symposium on Mathematical Foundations of Computer Science (MFCS 2022), volume 241 of Leibniz International Proceedings in Informatics (LIPIcs), pages 48:1–48:14, Dagstuhl, Germany, 2022. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.MFCS.2022.48.
  • [9] Peter Grünwald and Paul M. B. Vitányi. Shannon information and Kolmogorov complexity. CoRR, cs.IT/0410002, 2004. URL: http://arxiv.org/abs/cs.IT/0410002, doi:10.48550/arXiv.cs/0410002.
  • [10] Lauri Hella and Jouko Väänänen. The size of a formula as a measure of complexity. In Åsa Hirvonen, Juha Kontinen, Roman Kossak, and Andrés Villaveces, editors, Logic Without Borders - Essays on Set Theory, Model Theory, Philosophical Logic and Philosophy of Mathematics, volume 5 of Ontos Mathematical Logic, pages 193–214. De Gruyter, 2015. doi:10.1515/9781614516873.193.
  • [11] Lauri Hella and Miikka Vilander. Formula size games for modal logic and μ-calculus. J. Log. Comput., 29(8):1311–1344, 2019. doi:10.1093/logcom/exz025.
  • [12] Reijo Jaakkola, Tomi Janhunen, Antti Kuusisto, Masood Feyzbakhsh Rankooh, and Miikka Vilander. Explainability via short formulas: the case of propositional logic with implementation. In Joint Proceedings of (HYDRA 2022) and the RCRA Workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion, volume 3281 of CEUR Workshop Proceedings, pages 64–77, 2022. URL: https://ceur-ws.org/Vol-3281/paper6.pdf.
  • [13] Reijo Jaakkola, Tomi Janhunen, Antti Kuusisto, Masood Feyzbakhsh Rankooh, and Miikka Vilander. Short boolean formulas as explanations in practice. In Sarah Alice Gaggl, Maria Vanina Martinez, and Magdalena Ortiz, editors, Logics in Artificial Intelligence - 18th European Conference, JELIA 2023, Dresden, Germany, September 20-22, 2023, Proceedings, volume 14281 of Lecture Notes in Computer Science, pages 90–105. Springer, 2023. doi:10.1007/978-3-031-43619-2_7.
  • [14] Reijo Jaakkola, Antti Kuusisto, and Miikka Vilander. Relating description complexity to entropy. In Petra Berenbrink, Patricia Bouyer, Anuj Dawar, and Mamadou Moustapha Kanté, editors, 40th International Symposium on Theoretical Aspects of Computer Science, STACS 2023, March 7-9, 2023, Hamburg, Germany, volume 254 of LIPIcs, pages 38:1–38:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.STACS.2023.38.
  • [15] Andrey Kolmogorov. The theory of transmission of information. In Selected Works of A. N. Kolmogorov: Volume III: Information Theory and the Theory of Algorithms, pages 6–32. Springer Netherlands, 1993. doi:10.1007/978-94-017-2973-4_3.
  • [16] Sik K. Leung-Yan-Cheong and Thomas M. Cover. Some equivalences between Shannon entropy and Kolmogorov complexity. IEEE Trans. Inf. Theory, 24(3):331–338, 1978. doi:10.1109/TIT.1978.1055891.
  • [17] Ming Li and Paul M. B. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications, 4th Edition. Texts in Computer Science. Springer, 2019. doi:10.1007/978-3-030-11298-1.
  • [18] João Marques-Silva, Thomas Gerspacher, Martin C. Cooper, Alexey Ignatiev, and Nina Narodytska. Explanations for monotonic classifiers. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pages 7469–7479. PMLR, 2021. URL: http://proceedings.mlr.press/v139/marques-silva21a.html.
  • [19] Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005. doi:10.1017/CBO9780511813603.
  • [20] Alexander A. Razborov. Applications of matrix methods to the theory of lower bounds in computational complexity. Comb., 10(1):81–93, 1990. doi:10.1007/BF02122698.
  • [21] Herbert Robbins. A remark on stirling’s formula. The American Mathematical Monthly, 62(1):26–29, 1955. doi:10.2307/2308012.
  • [22] Advait Sarkar. Is explainable AI a race against model complexity? In Workshop on Transparency and Explanations in Smart Systems (TeXSS), in conjunction with ACM Intelligent User Interfaces (IUI 2022), volume 3124 of CEUR Workshop Proceedings, pages 192–199, 2022.
  • [23] Andreia Teixeira, Armando Matos, Andre Souto, and Luis Filipe Coelho Antunes. Entropy measures vs. Kolmogorov complexity. Entropy, 13(3):595–611, 2011. doi:10.3390/e13030595.

Appendix A Appendix

A.1 Proof of Theorem 1 continued

We define here the second upper bound formula ψ(𝔐) of size at most 6|π1|+cτ, along with required subformulas.

Let T, m¯ and 𝔐 be as in the proof so far. We define another formula χ(T,m¯) below. Now the model 𝔐 satisfies χ(T,m¯) if and only if for every i{1,,r}, the model 𝔐 has at most mi points that realize the type πi. We again do not assert anything about the types πj with no corresponding mj.

θmr :=y=xmrj{1,,r}mj=mr(πj(x1)¬πj(y))
θi :=y=xiθi+1, if mji for all j{1,,r}, and
θi :=y=xi(j{1,,r}mj=i(πj(x1)¬πj(y))(j{1,,r}mj=i¬πj(x1)θi+1), otherwise.
χ(T,m¯) :=x1x2xmry(j{r+1,,}πj(x1)θ1)

We again explain how the above formula works. Note that directly taking the negation of the formula φ(T,m¯) would not work as we are dealing with all types at once. We instead again start with a universally quantified variable x1 that is attached to a point realizing a type πjT. We first check if πj is one of the types we can safely ignore. Assume then that mj=5. The existentially quantified variables x2,,x5 are then chosen to be of the same type πj as x1 in such a way that every point of the type πj has at least one xi attached to it. Since mj=5, the first step of the recursion insists that either y is the same as x1 or the recursion continues. When the recursion arrives at θ5, we cannot go any further, as to continue, we would need mj5. We are instead left with the two options of either y=x5 or y realizes a different type than x1. This amounts to saying that there are no more than 5 points that realize the type πj.

The crucial point of the formula χ(T,m¯) is that the first universally quantified variable x1 allows us to use the same existential quantifiers to count all types at once. To ensure that we do not require all of the types to be the same size, we restrict the type realized by x1 before continuing with the recursion.

We compute the size of χ(T,m¯). The formula has mr+1 quantifiers. For each type π, the subformula π(x) occurs at most three times and for at least one type with |π|=mr, only two times. This results in 3k|T|k atomic formulas of the form P(x) or ¬P(x). For the equalities and inequalities, each equality y=xi for 1imr occurs exactly once, for a total of mr such atomic formulas. Accounting for the binary connectives, the size of χ(T,m¯) is thus at most

mr+1+2(mr+3k|T|k)1= 3mr+6k|T|2k.

Our second complete upper bound formula ψ(𝔐) avoids counting the type π with the most realizing points by bounding the size of all other types from above and from below. For this formula we denote by m¯|π| the sequence (|π1|,,|π1|). We define

ψ(𝔐):= i=1xπi(x)xi=1πi(x)φ(T,m¯|π|)χ(T,m¯|π|).

The numbers of new quantifiers and atomic formulas are the same as for φ(𝔐). Accounting for the binary connectives, including the one connecting φ(T,m¯|π|) and χ(T,m¯|π|), the size of ψ(𝔐) is now at most

|T|+1+2(k|T|+k|T|)+3|π1|+4k|T|3+3|π1|+6k|T|2k+1
= 6|π1|+14k|T|+|T|2k16|π1|+cτ.

A.2 Proof of Theorem 3

Let m¯=(m1,,mt) be a tuple corresponding to a class of d, ordered in the following way. The first numbers m1,,mr are the ones greater than 0 and smaller than d in ascending order. The numbers mr+1,,m are all equal to d, and finally the numbers m+1,,mt are all equal to 0.

Using this order for the types, the set T={π1,,π} is now the set of types realized in models of the class and the first r types are each realized exactly mi<d times. This is in line with the notation of the formulas for full FO above.

Our first formula works for any m¯. The formula states that each type πj is realized at least mj times and furthermore, the ones with mj<d are realized at most mj times.

φd(m¯):= i=1xπi(x)xi=1πi(x)φ(T,(m1,,m))χ(T,(m1,,mr))

In the same way as for ψ(𝔐) in the proof of Theorem 1, the size of φd(m¯) is at most

|T|+1+2(k|T|+k|T|)+3d+4k|T|3+3mr+6k|T|2k+1
= 3d+3mr+14k|T|+|T|2k13d+3mr+cτ.

Our second formula is only for the special case, where there is exactly one mj equal to d. In this case, as with full FO, we can avoid counting the type with the most realizing points. The rest of the types πj have mj<d and the formula states that each πj is realized at least and at most mj times.

ψd(m¯):= i=1xπi(x)xi=1πi(x)φ(T,(m1,,mr))χ(T,(m1,,mr))

Again in the same way as for ψ(𝔐) in the proof of Theorem 1, the size of ψd(m¯) is at most

|T|+1+2(k|T|+k|T|)+3mr+4k|T|3+3mr+6k|T|2k+1
= 6mr+14k|T|+|T|2k16mr+cτ.

The upper bounds of the claim follow.

A.3 Proof of Lemma 11

We will use Proposition 10. For every type π and 1in we associate a 0-1-valued random variable Xπ,i such that Xπ,i=1 with probability 2|τ| and Xπ,i=0 with probability 12|τ|. Intuitively this is an indicator random variable for the event “the ith element received the type π”. Now Xπ=i=1nXπ,i is a random variable that counts the number of times π is realized. Clearly 𝔼[Xπ]=n/2|τ|, which also holds for every type π. Set μ:=n/2|τ| and δ(n):=32|τ|ln(n)n. Now

2eδ(n)2μ/3=2n1

and

δ(n)μ=32|τ|2|τ|ln(n)n.

Thus, by Proposition 10, we know that

Pr[|Xπμ|32|τ|2|τ|ln(n)n]2n1

Applying the union bound, we also see that

Pr(π:|Xπμ|32|τ|2|τ|ln(n)n)
πPr[|Xπμ|32|τ|2|τ|ln(n)n]
2|τ|+1n1

Thus, with probability at least 12|τ|+1/n in a random model 𝔐 of size n we have for every type π that

||π|𝔐n2|τ||32|τ|2|τ|ln(n)n.

Hence, with probability at least 12|τ|+1/n a random model of size n is balanced.