
Query Languages for Neural Networks

Martin Grohe, RWTH Aachen University, Aachen, Germany; Christoph Standke, RWTH Aachen University, Aachen, Germany; Juno Steegmans, Data Science Institute, UHasselt, Diepenbeek, Belgium; Jan Van den Bussche, Data Science Institute, UHasselt, Diepenbeek, Belgium
Abstract

We lay the foundations for a database-inspired approach to interpreting and understanding neural network models by querying them using declarative languages. Towards this end we study different query languages, based on first-order logic, that mainly differ in their access to the neural network model. First-order logic over the reals naturally yields a language which views the network as a black box; only the input–output function defined by the network can be queried. This is essentially the approach of constraint query languages. On the other hand, a white-box language can be obtained by viewing the network as a weighted graph, and extending first-order logic with summation over weight terms. The latter approach is essentially an abstraction of SQL. In general, the two approaches are incomparable in expressive power, as we will show. Under natural circumstances, however, the white-box approach can subsume the black-box approach; this is our main result. We prove the result concretely for linear constraint queries over real functions definable by feedforward neural networks with a fixed number of hidden layers and piecewise linear activation functions.

Keywords and phrases:
Expressive power of query languages, Machine learning models, languages for interpretability, explainable AI
Funding:
Martin Grohe: Funded by the European Union (ERC, SymSim, 101054974). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.
Christoph Standke: Funded by the German Research Foundation (DFG) under grants GR 1492/16-1 and GRK 2236 (UnRAVeL).
Juno Steegmans: Supported by the Special Research Fund (BOF) of UHasselt.
Jan Van den Bussche: Partially supported by the Flanders AI Program (FAIR).
Copyright and License:
© Martin Grohe, Christoph Standke, Juno Steegmans, and Jan Van den Bussche; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Database query languages (principles)
Related Version:
Full Version: https://arxiv.org/abs/2408.10362  [14]
Editors:
Sudeepa Roy and Ahmet Kara

1 Introduction

Neural networks [11] are a popular and successful representation model for real functions learned from data. Once deployed, the neural network is “queried” by supplying it with inputs and obtaining the outputs. In the field of databases, however, we have a much richer conception of querying than simply applying a function to given arguments. For example, in querying a database relation Employee(name,salary), we can not only ask for Anne’s salary; we can also ask how many salaries are below Anne’s; we can ask whether no two employees have the same salary; and so on.

In this paper, we consider the querying of neural networks from this more general perspective. We see many potential applications: obvious ones are in explanation, verification, and interpretability of neural networks and other machine-learning models [8, 2, 25]. These are huge areas [31, 7] where it is important [29, 22] to have formal, logical definitions for the myriad notions of explanation that are being considered. Another potential application is in managing machine-learning projects, where we are testing many different architectures and training datasets, leading to a large number of models, most of which become short-term legacy code. In such a context it would be useful if the data scientist could search the repository for earlier generated models having certain characteristics in their architecture or in their behavior, which were perhaps not duly documented.

The idea of querying machine learning models with an expressive, declarative query language comes naturally to database researchers, and indeed, Arenas et al. already proposed a language for querying boolean functions over an unbounded set of boolean features [3]. In the modal logic community, similar languages are being investigated [26, references therein].

In the present work, we focus on real, rather than boolean, functions and models, as is indeed natural in the setting of verifying neural networks [2].

The constraint query language approach.

A natural language for querying real functions on a fixed number of arguments (features) is obtained by simply using first-order logic over the reals, with a function symbol F representing the function to be queried. We denote this by FO(𝐑). For example, consider functions F with three arguments. The formula ∀b′ |F(a,b,c) − F(a,b′,c)| < ϵ expresses that the output on (a,b,c) does not depend strongly on the second feature, i.e., F(a,b,c) is ϵ-close to F(a,b′,c) for any b′. Here, a, b, c and ϵ can be real constants or parameters (free variables).

The language FO(𝐑) (also known as FO+Poly) and its restriction FO(𝐑lin) to linear arithmetic (aka FO+Lin) were intensively investigated in database theory around the turn of the century, under the heading of constraint query languages, with applications to spatial and temporal databases. See the compendium volume [21] and book chapters [24, chapter 13], [13, chapter 5]. Linear formulas with only universal quantifiers over the reals, in front of a quantifier-free condition involving only linear arithmetic (as the above example formula), can already model many properties considered in the verification of neural networks [2]. This universal fragment of FO(𝐑lin) can be evaluated using linear programming techniques [2].

Full FO(𝐑) allows alternation of quantifiers over the reals, and multiplication in arithmetic. Because the first-order theory of the reals is decidable [5], FO(𝐑) queries can still be effectively evaluated on any function that is semi-algebraic, i.e., itself definable in first-order logic over the reals. Although the complexity of this theory is high, if the function is presented as a quantifier-free formula, FO(𝐑) query evaluation actually has polynomial-time data complexity; here, the “data” consists of the given quantifier-free formula [18].

Functions that can be represented by feedforward neural networks with ReLU hidden units and linear output units are clearly semi-algebraic; in fact, they are piecewise linear. For most of our results, we will indeed focus on this class of networks, which are widespread in practice [11], and denote them by ReLU-FNN.

The SQL approach.

Another natural approach to querying neural networks is to query them directly, as graphs of neurons with weights on the nodes and edges. For this purpose one represents such graphs as relational structures with numerical values and uses SQL to query them. As an abstraction of this approach, in this paper, we model neural networks as weighted finite structures. As a query language we use FO(SUM): first-order logic over weighted structures, allowing order comparisons between weight terms, where weight terms can be built up using rational arithmetic, if-then-else, and, importantly, summation.

Based on logics originally introduced by Grädel and Gurevich [12], the language FO(SUM) is comparable to the relational calculus with aggregates [19] and, thus, to SQL [23]. Logics close to FO(SUM), but involving arithmetic in different semirings, were recently also used for unifying different algorithmic problems in query processing [35], as well as for expressing hypotheses in the context of learning over structures [36]. The well-known FAQ framework [28], restricted to the real semiring, can be seen as the conjunctive fragment of FO(SUM).

To give a simple example of an FO(SUM) formula, consider ReLU-FNNs with a single input unit, one hidden layer of ReLU units, and a single linear output unit. The following formula expresses the query that asks if the function evaluation on a given input value is positive:

0 < b(out) + Σ_{x : E(in,x)} w(x,out) · ReLU(w(in,x) · 𝑣𝑎𝑙 + b(x)).

Here, E is the edge relation between neurons, and constants in and out hold the input and output unit, respectively. Thus, variable x ranges over the neurons in the hidden layer. Weight functions w and b indicate the weights of edges and the biases of units, respectively; the weight constant 𝑣𝑎𝑙 stands for a given input value. We assume for clarity that ReLU is given, but it is definable in FO(SUM).

Just like the relational calculus with aggregates, or SQL select statements, query evaluation for FO(SUM) has polynomial time data complexity, and techniques for query processing and optimization from database systems directly apply.

Comparing expressive powers.

Expressive power of query languages has been a classical topic in database theory and finite model theory [1, 24], so, with the advent of new models, it is natural to revisit questions concerning expressivity. The goal of this paper is to understand and compare the expressive power of the two query languages FO(𝐑) and FO(SUM) on neural networks over the reals. The two languages are quite different. FO(𝐑) sees the model as a black-box function F, but can quantify over the reals. FO(SUM) can see the model as a white box, a finite weighted structure, but can quantify only over the elements of the structure, i.e., the neurons.

In general, indeed the two expressive powers are incomparable. In FO(SUM), we can express queries about the network topology; for example, we may ask to return the hidden units that do not contribute much to the function evaluation on a given input value. (Formally, leaving them out of the network would yield an output within some ϵ of the original output.) Or, we may ask whether there are more than a million neurons in the first hidden layer. For FO(𝐑), being a black box language, such queries are obviously out of scope.

A more interesting question is how the two languages compare in expressing model-agnostic queries: these are queries that return the same result on any two neural networks that represent the same input–output function. For example, when restricting attention to networks with one hidden layer, the example FO(SUM) formula seen earlier, which evaluates the network, is model agnostic. FO(𝐑) is model agnostic by design, and, indeed, serves as a very natural declarative benchmark of expressiveness for model-agnostic queries. It turns out that FO(SUM), still restricting to networks of some fixed depth, can express model-agnostic queries that FO(𝐑) cannot. For example, for any fixed depth d, we will show that FO(SUM) can express the integral of a function given by a ReLU-FNN of depth d. In contrast, we will show that this cannot be done in FO(𝐑) (Theorem 6.1).

The depth of a neural network can be taken as a crude notion of “schema”. Standard relational query languages typically cannot be used without knowledge of the schema of the data. Similarly, we will show that without knowledge of the depth, FO(SUM) cannot express any nontrivial model-agnostic query (Theorem 6.2). Indeed, since FO(SUM) lacks recursion, function evaluation can only be expressed if we know the depth. (Extensions with recursion are one of the many interesting directions for further research.)

When the depth is known, however, for model-agnostic queries, the expressiveness of FO(SUM) exceeds the benchmark of expressiveness provided by FO(𝐑lin). Specifically, we show that every FO(𝐑lin) query over functions representable by ReLU-FNNs is also expressible in FO(SUM) evaluated on the networks directly (Theorem 7.1). This is our main technical result, and can be paraphrased as “SQL can verify neural networks.” The proof involves showing that the required manipulations of higher-dimensional piecewise linear functions, and the construction of cylindrical cell decompositions in ℝⁿ, can all be expressed in FO(SUM). To allow for a modular proof, we also develop the notion of FO(SUM) translation, generalizing the classical notion of first-order interpretations [16].

This paper is organized as follows. Section 2 provides preliminaries on neural networks. Section 3 introduces FO(𝐑). Section 4 introduces weighted structures and FO(SUM), after which Section 5 introduces white-box querying. Section 6 considers model-agnostic queries. Section 7 presents the main technical result. Section 8 concludes with a discussion of topics for further research.

A full version of this paper with full proofs is available [14].

2 Preliminaries on neural networks

A feedforward neural network [11], in general, could be defined as a finite, directed, weighted, acyclic graph, with some additional aspects which we discuss next. The nodes are also referred to as neurons or units. Some of the source nodes are designated as inputs, and some of the sink nodes are designated as outputs. Both the inputs, and the outputs, are linearly ordered. Neurons that are neither inputs nor outputs are said to be hidden. All nodes, except for the inputs, carry a weight, a real value, called the bias. All directed edges also carry a weight.

In this paper, we focus on ReLU-FNNs: networks with ReLU activations and linear outputs. This means the following. Let 𝒩 be a neural network with m inputs. Then every node u in 𝒩 represents a function F_u^𝒩 : ℝ^m → ℝ defined as follows. We proceed inductively based on some topological ordering of 𝒩. For input nodes u, simply F_u^𝒩(x₁,…,x_m) := x_i, if u is the ith input node. Now let u be a hidden neuron and assume F_v^𝒩 is already defined for all predecessors v of u, i.e., nodes v with an edge to u. Let v₁,…,v_l be these predecessors, let w₁,…,w_l be the weights on the respective edges, and let b be the bias of u. Then

F_u^𝒩(𝒙) := ReLU(b + Σ_i w_i · F_{v_i}^𝒩(𝒙)),

where ReLU : ℝ → ℝ : z ↦ max(0,z).

Finally, for an output node u, we define F_u^𝒩 similarly to hidden neurons, except that the application of ReLU is omitted. The upshot is that a neural network 𝒩 with m inputs and n outputs u₁,…,u_n represents a function F^𝒩 : ℝ^m → ℝ^n mapping 𝒙 to (F_{u₁}^𝒩(𝒙),…,F_{u_n}^𝒩(𝒙)). For any node u in the network, F_u^𝒩 is always a continuous piecewise linear function. We denote the class of all continuous piecewise linear functions F : ℝ^m → ℝ by 𝒫(m); that is, continuous functions F that admit a partition of ℝ^m into finitely many polytopes such that F is affine linear on each of them.
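To make the inductive definition of F_u^𝒩 concrete, the following Python sketch evaluates a ReLU-FNN given as a weighted DAG by visiting the nodes in a topological order; the adjacency-list encoding and all names are illustrative and not taken from the paper.

```python
# A minimal sketch, assuming the network is given by a topological order of its
# nodes, predecessor lists with edge weights, biases, and ordered input/output nodes.
def eval_network(topo_order, preds, bias, inputs, outputs, x):
    """preds[u] = list of (v, weight) for the edges v -> u; returns F^N(x)."""
    value = {}
    for u in topo_order:
        if u in inputs:
            value[u] = x[inputs.index(u)]        # the i-th input node returns x_i
        else:
            s = bias[u] + sum(wt * value[v] for v, wt in preds[u])
            value[u] = s if u in outputs else max(0.0, s)   # no ReLU on output nodes
    return tuple(value[u] for u in outputs)
```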

Hidden layers.

Commonly, the hidden neurons are organized in disjoint blocks called layers. The layers are ordered, such that the neurons in the first layer have only edges from inputs, and the neurons in any later layer have only edges from neurons in the previous layer. Finally, outputs have only edges from neurons in the last layer.

We will use 𝐅(m,ℓ) to denote the class of layered networks with m inputs of depth ℓ, that is, with an input layer with m nodes, ℓ−1 hidden layers, and an output layer with a single node. Recall that the nodes in all hidden layers use ReLU activations and the output node uses the identity function.

It is easy to see that networks in 𝐅(1,1) just compute linear functions and that for every ℓ ≥ 2 we have {F^𝒩 ∣ 𝒩 ∈ 𝐅(1,ℓ)} = 𝒫(1), that is, the class of functions that can be computed by a network in 𝐅(1,ℓ) is the class of all continuous piecewise linear functions. The well-known Universal Approximation Theorem [9, 17] says that every continuous function f : K → ℝ defined on a compact domain K ⊆ ℝ^m can be approximated to any additive error by a network in 𝐅(m,2).
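To illustrate one direction of this equality (every continuous piecewise linear function ℝ → ℝ is computed by some network in 𝐅(1,2)), here is a hedged Python sketch of the standard construction from breakpoints and slopes; the function and variable names are illustrative.

```python
# A minimal sketch, assuming the CPWL function is given by its breakpoints, the
# slopes of its successive pieces (len(slopes) == len(breaks) + 1), and its value
# y0 at the first breakpoint.
def cpwl_to_network(breaks, slopes, y0):
    # The global linear part slopes[0]*x is written as slopes[0]*(ReLU(x) - ReLU(-x));
    # each breakpoint b contributes its slope change via a unit computing ReLU(x - b).
    w_in  = [1.0, -1.0] + [1.0] * len(breaks)
    b_hid = [0.0,  0.0] + [-b for b in breaks]
    w_out = [slopes[0], -slopes[0]] + [s1 - s0 for s0, s1 in zip(slopes, slopes[1:])]
    b_out = y0 - slopes[0] * breaks[0]
    return w_in, b_hid, w_out, b_out

def eval_net(w_in, b_hid, w_out, b_out, x):
    """One input, one hidden ReLU layer, linear output."""
    return b_out + sum(wo * max(0.0, wi * x + bh)
                       for wi, bh, wo in zip(w_in, b_hid, w_out))
```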

3 A black-box query language

First-order logic over the reals, denoted here by FO(𝐑), is, syntactically, just first-order logic over the vocabulary of elementary arithmetic, i.e., with binary function symbols + and · for addition and multiplication, binary predicate <, and constant symbols 0 and 1 [5]. Constants for rational numbers, or even algebraic numbers, can be added as an abbreviation (since they are definable in the logic).

The fragment FO(𝐑lin) of linear formulas uses multiplication only for scalar multiplication, i.e., multiplication of variables with rational number constants. For example, the formula y = 3x₁ − 4x₂ + 7 is linear, but the formula y = 5x₁x₂ − 3 is not. In practice, linear queries are often sufficiently expressive, as witnessed both by earlier applications to temporal and spatial data [21] and by the queries on neural networks in the examples to follow. The only caveat is that many applications assume a distance function on vectors. When using distances based on absolute value differences between real numbers, e.g., the Manhattan distance or the max norm, we still fall within FO(𝐑lin).

We will add to FO(𝐑) extra relation or function symbols; in this paper, we will mainly consider FO(𝐑,F), which is FO(𝐑) with an extra function symbol F. The structure on the domain of reals, with the arithmetic symbols having their obvious interpretation, will be denoted here by 𝐑. Semantically, for any vocabulary τ of extra relation and function symbols, FO(𝐑,τ) formulas are interpreted over structures that expand 𝐑 with additional relations and functions on ℝ of the right arities, that interpret the symbols in τ. In this way, FO(𝐑,F) expresses queries about functions F : ℝ^m → ℝ.

This language can express a wide variety of properties (queries) considered in interpretable machine learning and neural-network verification. Let us see some examples.

Example 3.1.

To check whether F : ℝ^m → ℝ is robust around an m-vector 𝐚 [32], using parameters ϵ and δ, we can write the formula ∀𝐱 (d(𝐱,𝐚) < ϵ → |F(𝐱) − F(𝐚)| < δ). Here 𝐱 stands for a tuple of m variables, and d stands for some distance function which is assumed to be expressible.

Example 3.2.

Counterfactual explanation methods [37] aim to find the closest 𝐱 to an input 𝐚 such that F(𝐱) is “expected,” assuming that F(𝐚) was unexpected. A typical example is credit denial; what should we change minimally to be granted credit? Typically we can define expectedness by some formula, e.g., F(𝐱) > 0.9. Then we can express the counterfactual explanation as F(𝐱) > 0.9 ∧ ∀𝐲 (F(𝐲) > 0.9 → d(𝐱,𝐚) ≤ d(𝐲,𝐚)).

Example 3.3.

We may define the contribution of an input feature i on an input 𝐚 = (a₁,…,a_m) as the inverse of the smallest change we have to make to that feature for the output to change significantly. We can express that r is such a change by writing (taking i = 1 for clarity) r > 0 ∧ (d(F(a₁−r, a₂,…,a_m), F(𝐚)) > ϵ ∨ d(F(a₁+r, a₂,…,a_m), F(𝐚)) > ϵ). Denoting this formula by 𝑐ℎ𝑎𝑛𝑔𝑒(r), the smallest change is then expressed as 𝑐ℎ𝑎𝑛𝑔𝑒(r) ∧ ∀r′ (𝑐ℎ𝑎𝑛𝑔𝑒(r′) → r ≤ r′).

Example 3.4.

We finally illustrate that FO(𝐑,F) can express gradients and many other notions from calculus. For simplicity assume F to be unary. Consider the definition F′(c) = lim_{x→c} (F(x) − F(c))/(x − c) of the derivative in a point c. So it suffices to show how to express that l = lim_{x→c} G(x) for a function G that is continuous in c. We can write down the textbook definition literally as ∀ϵ > 0 ∃δ > 0 ∀x (|x − c| < δ → |G(x) − l| < ϵ).

Evaluating FO(𝐑) queries.

Black box queries can be effectively evaluated using the decidability and quantifier elimination properties of FO(𝐑). This is the constraint query language approach [18, 21], which we briefly recall next.

A function f : ℝ^m → ℝ is called semialgebraic [5] (or semilinear) if there exists an FO(𝐑) (or FO(𝐑lin)) formula φ(x₁,…,x_m,y) such that for any m-vector 𝒂 and real value b, we have 𝐑 ⊧ φ(𝒂,b) if and only if f(𝒂) = b.

Now consider the task of evaluating an FO(𝐑,F) formula ψ on a semialgebraic function f, given by a defining formula φ. By introducing auxiliary variables, we may assume that the function symbol F is used in ψ only in subformulas of the form z = F(u₁,…,u_m). Then replace in ψ each such subformula by φ(u₁,…,u_m,z), obtaining a pure FO(𝐑) formula χ.

Now famously, the first-order theory of ℝ is decidable [33, 5]. In other words, there is an algorithm that decides, for any FO(𝐑) formula χ(x₁,…,x_k) and k-vector 𝒄, whether 𝐑 ⊧ χ(𝒄). Actually, a stronger property holds, to the effect that every FO(𝐑) formula is equivalent to a quantifier-free formula. The upshot is that there is an algorithm that, given an FO(𝐑,F) query ψ(x₁,…,x_k) and a semialgebraic function f given by a defining formula, outputs a quantifier-free formula defining the result set {𝒄 ∈ ℝ^k ∣ 𝐑,f ⊧ ψ(𝒄)}. If f is given by a quantifier-free formula, the evaluation can be done in polynomial time in the length of the description of f, i.e., with polynomial-time data complexity. This is because there are algorithms for quantifier elimination with complexity p(n)^{e(q)}, where n is the size of the formula, p is a polynomial, q is the number of quantifiers, and e is a doubly exponential function [18, 5].

Complexity.

Of course, we want to evaluate queries on the functions represented by neural networks. From the definition given in Section 2, it is clear that the functions representable by ReLU-FNNs are always semialgebraic (actually, semilinear). For every output feature j, it is straightforward to compile, from the network, a quantifier-free formula defining the jth output component function. In this way we see that FO(𝐑,F) queries on ReLU-FNNs are, in principle, computable in polynomial time.

However, the algorithms are notoriously complex, and we stress again that FO(𝐑,F) should be mostly seen as a declarative benchmark of expressiveness. Moreover, we assume here for convenience that ReLU is a primitive function. ReLU can be expressed in FO(𝐑) using disjunction, but this may blow up the query formula, e.g., when converting to disjunctive normal form [2]. Symbolic constraint solving algorithms for the reals have been extended to deal with ReLU natively [2].
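As a concrete illustration of this constraint-solving view (not of the quantifier-elimination algorithms themselves), the following hedged sketch uses the SMT solver z3, which handles linear real arithmetic and if-then-else terms, to check the robustness property of Example 3.1 on a tiny hand-made network by searching for a counterexample; the network and all parameter values are made up for illustration.

```python
# A minimal sketch, assuming z3 is installed (pip install z3-solver).
from z3 import Real, If, And, Or, Solver, sat

def relu(e):
    return If(e > 0, e, 0)

def F(v):                                   # one input, two hidden ReLU units
    return 1.0 * relu(2 * v - 1) - 0.5 * relu(3 - v)

x, a = Real('x'), Real('a')
eps, delta = 0.1, 0.5

s = Solver()
s.add(a == 0)
s.add(And(x - a < eps, a - x < eps))                    # |x - a| < eps
s.add(Or(F(x) - F(a) >= delta, F(a) - F(x) >= delta))   # |F(x) - F(a)| >= delta
if s.check() == sat:                                     # a counterexample exists
    print("not robust:", s.model())
else:
    print("robust around a = 0 for these eps and delta")
```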

 Remark 3.5.

In closing this section, we remark that, to query the entire network function, we would strictly speaking not use just a single function symbol F, but rather the language FO(𝐑,F₁,…,F_n), with function symbols for the n outputs. In this paper, for the sake of clarity, we will often stick to a single output, but our treatment generalizes to multiple outputs.

4 Weighted structures and FO(SUM)

Weighted structures are standard abstract structures equipped with one or more weight functions from tuples of domain elements to values from some separate, numerical domain. Here, as numerical domain, we will use ℝ_⊥ = ℝ ∪ {⊥}, the set of “lifted reals”, where ⊥ is an extra element representing an undefined value. Neural networks are weighted graph structures. Hence, since we are interested in declarative query languages for neural networks, we are interested in logics over weighted structures. Such logics were introduced by Grädel and Gurevich [12]. We consider here a concrete instantiation of their approach, which we denote by FO(SUM).

Recall that a (finite, relational) vocabulary is a finite set of function symbols and relation symbols, where each symbol comes with an arity (a natural number). We extend the notion of vocabulary to also include a number of weight function symbols, again with associated arities. We allow 0-ary weight function symbols, which we call weight constant symbols.

A (finite) structure 𝒜 over such a vocabulary Υ consists of a finite domain A, and functions and relations on A of the right arities, interpreting the standard function symbols and relation symbols from Υ. So far this is standard. Now additionally, 𝒜 interprets every weight function symbol w, of arity k, by a function w^𝒜 : A^k → ℝ_⊥.

The syntax of FO(SUM) formulas (over some vocabulary) is defined exactly as for standard first-order logic, with one important extension. In addition to formulas (taking Boolean values) and standard terms (taking values in the structure), the logic contains weight terms taking values in ℝ_⊥. Weight terms t are defined by the following grammar:

t ::= ⊥ ∣ w(s₁,…,s_n) ∣ r(t,…,t) ∣ if φ then t else t ∣ Σ_{𝒙:φ} t

Here, w is a weight function symbol of arity n and the si are standard terms; r is a rational function applied to weight terms, with rational coefficients; φ is a formula; and 𝒙 is a tuple of variables. The syntax of weight terms and formulas is mutually recursive. As just seen, the syntax of formulas φ is used in the syntax of weight terms; conversely, weight terms t1 and t2 can be combined to form formulas t1=t2 and t1<t2.

Recall that a rational function is a fraction between two polynomials. Thus, the arithmetic operations that we consider are addition, scalar multiplication by a rational number, multiplication, and division.

The free variables of a weight term are defined as follows. The weight term ⊥ has no free variables. The free variables of w(s₁,…,s_n) are simply the variables occurring in the s_i. A variable occurs free in r(t₁,…,t_n) if it occurs free in some t_i. A variable occurs free in ‘if φ then t₁ else t₂’ if it occurs free in t₁, t₂, or φ. The free variables of Σ_{𝒙:φ} t are those of φ and t, except for the variables in 𝒙. A formula or (weight) term is closed if it has no free variables.

We can evaluate a weight term t(x₁,…,x_k) on a structure 𝒜 and a tuple 𝒂 ∈ A^k providing values to the free variables. The result of the evaluation, denoted by t^{𝒜,𝒂}, is a value in ℝ_⊥, defined in the obvious manner. In particular, when t is of the form Σ_{𝒚:φ} t′, we have

t^{𝒜,𝒂} = Σ_{𝒃 : 𝒜 ⊧ φ(𝒂,𝒃)} (t′)^{𝒜,𝒂,𝒃}.

Division by zero, which can happen when evaluating terms of the form r(t,…,t), is given the value ⊥. The arithmetical operations are extended so that x + ⊥, q · ⊥ (scalar multiply), x · ⊥, and x/⊥ and ⊥/x always equal ⊥. Also, ⊥ < a holds for all a.
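To make the semantics of summation terms concrete, here is a toy Python evaluator for a term Σ_{𝒙:φ} t over a finite weighted structure, with the condition φ and the body t passed as Python callables; this encoding is purely illustrative and not part of the paper.

```python
# A minimal sketch: sum the value of t over all tuples of domain elements that
# satisfy the condition phi, propagating the undefined value ⊥ (here None).
import itertools

BOT = None                                   # plays the role of ⊥

def eval_sum(domain, k, phi, t):
    """Value of the summation term over tuples (x_1, ..., x_k) with phi(x_1, ..., x_k)."""
    total = 0.0
    for tup in itertools.product(domain, repeat=k):
        if phi(*tup):
            v = t(*tup)
            if v is BOT:                     # x + ⊥ = ⊥, so ⊥ absorbs the whole sum
                return BOT
            total += v
    return total
```

For instance, the summation term in the formula from the Introduction would take k = 1, φ(x) = E(in,x), and t(x) = w(x,out) · ReLU(w(in,x) · 𝑣𝑎𝑙 + b(x)).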

5 White-box querying

For any natural numbers m and n, we introduce a vocabulary for neural networks with m inputs and n outputs. We denote this vocabulary by Υnet(m,n), or just Υnet if m and n are understood. It has a binary relation symbol E for the edges; constant symbols in1, …, inm and out1, …, outn for the input and output nodes; a unary weight function b for the biases, and a binary weight function symbol w for the weights on the edges.

Any ReLU-FNN 𝒩, being a weighted graph, is an Υnet-structure in the obvious way. When there is no edge from node u₁ to u₂, we put w^𝒩(u₁,u₂) = 0. Since inputs have no bias, we put b^𝒩(u) = ⊥ for any input u.

Depending on the application, we may want to enlarge Υnet with some additional parameters. For example, we can use additional weight constant symbols to provide input values to be evaluated, or output values to be compared with, or interval bounds, etc.

The logic FO(SUM) over the vocabulary Υnet (possibly enlarged as just mentioned) serves as a “white-box” query language for neural networks, since the entire model is given and can be directly queried, just like an SQL query can be evaluated on a given relational database. Contrast this with the language FO(𝐑,F) from Section 3, which only has access to the function F represented by the network, as a black box.

Example 5.1.

While the language FO(𝐑,F) cannot see inside the model, at least it has direct access to the function represented by the model. When we use the language FO(SUM), we must compute this function ourselves. At least when we know the depth of the network, this is indeed easy. In the Introduction, we already showed a weight term expressing the evaluation of a one-layer neural network on a single input and output. We can easily generalize this to a weight term expressing the value of any of a fixed number of outputs, with any fixed number m of inputs, and any fixed number of layers. Let 𝑣𝑎𝑙₁, …, 𝑣𝑎𝑙_m be additional weight constant symbols representing input values. Then the weight term ReLU(b(u) + w(in₁,u) · 𝑣𝑎𝑙₁ + ⋯ + w(in_m,u) · 𝑣𝑎𝑙_m) expresses the value of any neuron u in the first hidden layer (u is a variable). Denote this term by t₁(u). Next, for any subsequent layer numbered l > 1, we inductively define the weight term t_l(u) as

ReLU(b(u) + Σ_{x : E(x,u)} w(x,u) · t_{l−1}(x)).

Here, ReLU(c) can be taken to be the weight term ‘if c > 0 then c else 0’. Finally, the value of the jth output is given by the weight term 𝑒𝑣𝑎𝑙_j := b(out_j) + Σ_{x : E(x,out_j)} w(x,out_j) · t_l(x), where l is the number of the last hidden layer.
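Read operationally, these weight terms are nested sums over neurons. The sketch below mirrors them in Python for a layered network given through its graph relation and weight functions (E, w, b), as in the vocabulary Υnet; the concrete data layout and names are only illustrative.

```python
# A minimal sketch, assuming network = {'E': set of edges, 'w': edge-weight dict,
# 'b': bias dict, 'inputs': [...], 'outputs': [...], 'layers': {1: [...], 2: [...], ...}}.
def t(network, layer, u, vals):
    """Value of hidden neuron u in the given layer, mirroring the term t_l(u)."""
    E, w, b, inputs = network['E'], network['w'], network['b'], network['inputs']
    if layer == 1:
        s = b[u] + sum(w[(inp, u)] * vals[i] for i, inp in enumerate(inputs))
    else:
        s = b[u] + sum(w[(x, u)] * t(network, layer - 1, x, vals)
                       for x in network['layers'][layer - 1] if (x, u) in E)
    return max(0.0, s)                        # ReLU(c) = "if c > 0 then c else 0"

def eval_output(network, j, vals):
    """Value of output j, mirroring the term eval_j."""
    E, w, b = network['E'], network['w'], network['b']
    out, last = network['outputs'][j], max(network['layers'])
    return b[out] + sum(w[(x, out)] * t(network, last, x, vals)
                        for x in network['layers'][last] if (x, out) in E)
```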

Example 5.2.

We can also look for useless neurons: neurons that can be removed from the network without altering the output too much on given values. Recall the weight term 𝑒𝑣𝑎𝑙_j from the previous example; for clarity we just write 𝑒𝑣𝑎𝑙. Let z be a fresh variable, and let 𝑒𝑣𝑎𝑙′ be the term obtained from 𝑒𝑣𝑎𝑙 by altering the summing conditions E(x,u) and E(x,out) by adding the conjunct x ≠ z. Then the formula |𝑒𝑣𝑎𝑙 − 𝑒𝑣𝑎𝑙′| < ϵ expresses that z is useless. (For |c| we can take the weight term ‘if c > 0 then c else −c’.)
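For the one-hidden-layer networks of Example 5.1, the same query can be sketched in a few lines of Python by re-evaluating the network with one hidden neuron skipped; the parameter layout is illustrative.

```python
# A minimal sketch: neuron u is "useless" on input x if dropping it changes the
# output by less than eps.
def useless_neurons(w_in, b_hid, w_out, b_out, x, eps):
    def output(skip=None):
        return b_out + sum(wo * max(0.0, wi * x + bh)
                           for u, (wi, bh, wo) in enumerate(zip(w_in, b_hid, w_out))
                           if u != skip)
    full = output()
    return [u for u in range(len(b_hid)) if abs(output(skip=u) - full) < eps]
```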

Another interesting example is computing integrals. Recall that 𝐅(m,ℓ) is the class of networks with m inputs, one output, and depth ℓ.

Lemma 5.3.

Let m and ℓ be natural numbers. There exists an FO(SUM) term t over Υnet(m,1) with m additional pairs of weight constant symbols 𝑚𝑖𝑛_i and 𝑚𝑎𝑥_i for i ∈ {1,…,m}, such that for any network 𝒩 in 𝐅(m,ℓ), and values a_i and b_i for the 𝑚𝑖𝑛_i and 𝑚𝑎𝑥_i, we have t^{𝒩,a₁,b₁,…,a_m,b_m} = ∫_{a₁}^{b₁} ⋯ ∫_{a_m}^{b_m} F^𝒩 dx₁ ⋯ dx_m.

Proof (sketch).

We sketch here a self-contained and elementary proof for m = 1 and ℓ = 2 (one input, one hidden layer). This case already covers all continuous piecewise linear functions ℝ → ℝ.

Every hidden neuron u may represent a “quasi breakpoint” in the piecewise linear function (that is, a point where its slope may change). Concretely, we consider the hidden neurons with nonzero input weights, to avoid dividing by zero. The breakpoint’s x-coordinate is given by the weight term 𝑏𝑟𝑒𝑎𝑘_x(u) := −b(u)/w(in₁,u). The y-value at the breakpoint is then given by 𝑏𝑟𝑒𝑎𝑘_y(u) := 𝑒𝑣𝑎𝑙₁(𝑏𝑟𝑒𝑎𝑘_x(u)), where 𝑒𝑣𝑎𝑙₁ is the weight term from Example 5.1 and we substitute 𝑏𝑟𝑒𝑎𝑘_x(u) for 𝑣𝑎𝑙₁.

Pairs (u1,u2) of neurons representing successive breakpoints are easy to define by a formula succ(u1,u2). Such pairs represent the pieces of the function, except for the very first and very last pieces. For this proof sketch, assume we simply want the integral between the first breakpoint and the last breakpoint.

The area (positive or negative) contributed to the integral by the piece (u₁,u₂) is easy to write as a weight term: 𝑎𝑟𝑒𝑎(u₁,u₂) = ½ (𝑏𝑟𝑒𝑎𝑘_y(u₁) + 𝑏𝑟𝑒𝑎𝑘_y(u₂)) · (𝑏𝑟𝑒𝑎𝑘_x(u₂) − 𝑏𝑟𝑒𝑎𝑘_x(u₁)). We sum these to obtain the desired integral. However, since different neurons may represent the same quasi breakpoint, we must divide by the number of duplicates. Hence, our desired term t equals Σ_{u₁,u₂ : succ(u₁,u₂)} 𝑎𝑟𝑒𝑎(u₁,u₂) / (Σ_{u₁′,u₂′ : γ} 1), where γ is the formula succ(u₁′,u₂′) ∧ 𝑏𝑟𝑒𝑎𝑘_x(u₁′) = 𝑏𝑟𝑒𝑎𝑘_x(u₁) ∧ 𝑏𝑟𝑒𝑎𝑘_x(u₂′) = 𝑏𝑟𝑒𝑎𝑘_x(u₂).
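The following Python sketch carries out the same computation numerically for a one-input, one-hidden-layer network, summing trapezoid areas between successive breakpoints; it merges duplicate breakpoints directly instead of dividing by a duplicate count as the weight term does, and all names are illustrative.

```python
# A minimal sketch of the m = 1, depth-2 case of Lemma 5.3.
def eval_net(w_in, b_hid, w_out, b_out, x):
    """One input, one hidden ReLU layer, linear output."""
    return b_out + sum(wo * max(0.0, wi * x + bh)
                       for wi, bh, wo in zip(w_in, b_hid, w_out))

def integral_between_breakpoints(w_in, b_hid, w_out, b_out):
    # Quasi breakpoints: -b(u)/w(in,u) for hidden neurons u with nonzero input weight.
    xs = sorted({-bh / wi for wi, bh in zip(w_in, b_hid) if wi != 0.0})
    total = 0.0
    for x1, x2 in zip(xs, xs[1:]):
        y1 = eval_net(w_in, b_hid, w_out, b_out, x1)
        y2 = eval_net(w_in, b_hid, w_out, b_out, x2)
        total += 0.5 * (y1 + y2) * (x2 - x1)     # exact area of one linear piece
    return total
```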

Example 5.4.

A popular alternative to Example 3.3 for measuring the contribution of an input feature i to an input 𝐲=(y1,,ym) is the Shap score [27]. It assumes a probability distribution on the input space and quantifies the change to the expected value of F𝒩 caused by fixing input feature i to yi in a random fixation order of the input features:

Shap(i) = Σ_{I ⊆ {1,…,m}∖{i}} (|I|! (m − 1 − |I|)! / m!) · (𝔼(F^𝒩(𝒙) ∣ 𝒙_{I∪{i}} = 𝒚_{I∪{i}}) − 𝔼(F^𝒩(𝒙) ∣ 𝒙_I = 𝒚_I)).

When we assume that the distribution is the product of uniform distributions over the intervals (a_j,b_j), we can write the conditional expectation 𝔼(F^𝒩(𝒙) ∣ 𝒙_J = 𝒚_J) for some J ⊆ {1,…,m}, by setting {1,…,m} ∖ J =: {j₁,…,j_r}, as follows.

𝔼(F^𝒩(𝒙) ∣ 𝒙_J = 𝒚_J) = (Π_{k=1}^{r} 1/(b_{j_k} − a_{j_k})) ∫_{a_{j₁}}^{b_{j₁}} ⋯ ∫_{a_{j_r}}^{b_{j_r}} F^𝒩(𝒙 | 𝒙_J = 𝒚_J) dx_{j_r} ⋯ dx_{j₁}

where 𝒙 | 𝒙_J = 𝒚_J is a short notation for the tuple of variables obtained from 𝒙 by replacing x_j with y_j for all j ∈ J. With Lemma 5.3, this conditional expectation can be expressed in FO(SUM), and by taking J to be I or I ∪ {i}, respectively, we can express the Shap score.
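For orientation, the displayed subset sum can be written out directly; the sketch below computes the Shap score of feature i for any black-box function F, estimating the conditional expectations under the product of uniform distributions by Monte Carlo sampling. All names are illustrative and the sampling is only an approximation of the exact integrals.

```python
# A minimal sketch, assuming F takes a list of m feature values, y is the input
# to explain, and bounds[j] = (a_j, b_j) for each feature j.
import itertools, math, random

def cond_expectation(F, y, fixed, bounds, samples=10_000):
    """Estimate E[F(x) | x_J = y_J] for J = fixed, x_j ~ Uniform(a_j, b_j) elsewhere."""
    m = len(y)
    total = 0.0
    for _ in range(samples):
        x = [y[j] if j in fixed else random.uniform(*bounds[j]) for j in range(m)]
        total += F(x)
    return total / samples

def shap(F, y, i, bounds, samples=10_000):
    m = len(y)
    others = [j for j in range(m) if j != i]
    score = 0.0
    for r in range(m):
        for I in itertools.combinations(others, r):
            weight = math.factorial(r) * math.factorial(m - 1 - r) / math.factorial(m)
            score += weight * (cond_expectation(F, y, set(I) | {i}, bounds, samples)
                               - cond_expectation(F, y, set(I), bounds, samples))
    return score
```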

More examples.

Our main result will be that, over networks of a given depth, all of FO(𝐑lin,F) can be expressed in FO(SUM). So the examples from Section 3 (which are linear if a Manhattan or max distance is used) apply here as well. Moreover, the techniques by which we show our main result readily adapt to queries not about the final function F represented by the network, but about the function Fz represented by a neuron z given as a parameter to the query, much as in Example 5.2. For example, in feature visualization [8] we want to find the input that maximizes the activation of some neuron z. Since this is expressible in FO(𝐑lin,F), it is also expressible in FO(SUM).

6 Model-agnostic queries

We have already indicated that FO(𝐑,F) is “black box” while FO(SUM) is “white box”. Black-box queries are commonly called model agnostic [8]. Some FO(SUM) queries may, and others may not, be model agnostic.

Formally, for some ℓ ≥ 1, let us call a closed FO(SUM) formula φ, possibly using weight constants c₁,…,c_k, depth-ℓ model agnostic if for all m ≥ 1, all neural networks 𝒩, 𝒩′ ∈ ⋃_{i=1}^{ℓ} 𝐅(m,i) such that F^𝒩 = F^{𝒩′}, and all a₁,…,a_k, we have 𝒩, a₁,…,a_k ⊧ φ if and only if 𝒩′, a₁,…,a_k ⊧ φ. A similar definition applies to closed FO(SUM) weight terms.

For example, the term of Example 5.1 evaluating the function of a neural network of depth at most ℓ is depth-ℓ model agnostic. By comparison, the formula stating that a network has useless neurons (cf. Example 5.2) is not model agnostic. The term t from Lemma 5.3, computing the integral, is depth-ℓ model agnostic.

Theorem 6.1.

The query ∫₀¹ f = 0 for functions f ∈ 𝒫(1) is expressible by a depth-2 model agnostic FO(SUM) formula, but not in FO(𝐑,F).

Proof.

We have already seen the expressibility in FO(SUM). We prove nonexpressibility in FO(𝐑,F).

Consider the equal-cardinality query Q about disjoint pairs (S1,S2) of finite sets of reals, asking whether |S1|=|S2|. Over abstract ordered finite structures, equal cardinality is well-known not to be expressible in order-invariant first-order logic [24]. Hence, by the generic collapse theorem for constraint query languages over the reals [21, 24], query Q is not expressible in FO(𝐑,S1,S2).

Figure 1: The function f_{S1,S2} of the proof of Theorem 6.1, for the set S1 consisting of the three red points and the set S2 consisting of the three white points.

Now for any given S1 and S2, we construct a continuous piecewise linear function f_{S1,S2} as follows. We first apply a suitable affine transformation so that S1 ∪ S2 falls within the open interval (0,1). Now f_{S1,S2} is a sawtooth-like function, with positive teeth at elements from S1, negative teeth (of the same height, say 1) at elements from S2, and zero everywhere else. To avoid teeth that overlap the zero boundary at the left or that overlap each other, we make them of width min{m,M}/2, where m is the minimum of S1 ∪ S2 and M is the minimum distance between any two distinct elements in S1 ∪ S2.

Expressing the above construction uniformly in FO(𝐑,S1,S2) poses no difficulties; let ψ(x,y) be a formula defining f_{S1,S2}. Now assume, for the sake of contradiction, that ∫₀¹ F = 0 would be expressible by a closed FO(𝐑,F) formula φ. Then composing φ with ψ would express query Q in FO(𝐑,S1,S2). Indeed, clearly, ∫₀¹ f_{S1,S2} = 0 if and only if |S1| = |S2|. So, φ cannot exist.
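A small numeric sketch of this construction: after S1 ∪ S2 has been scaled into (0,1), every tooth is a triangle of the same base width and height 1, so the integral of f_{S1,S2} over [0,1] is zero exactly when |S1| = |S2|. The code below, with illustrative names, computes that integral from the teeth.

```python
# A minimal sketch, assuming S1 and S2 are disjoint finite sets of reals in (0,1).
def sawtooth_integral(S1, S2):
    pts = sorted(S1 | S2)
    m = min(pts)                                           # distance to the left boundary
    M = min(b - a for a, b in zip(pts, pts[1:])) if len(pts) > 1 else m
    width = min(m, M) / 2
    # Each tooth is a triangle of base `width` and height 1, pointing up for S1
    # and down for S2, so it contributes +width/2 or -width/2 to the integral.
    return sum(0.5 * width * (1 if p in S1 else -1) for p in pts)

assert abs(sawtooth_integral({0.2, 0.5}, {0.7, 0.9})) < 1e-12
assert abs(sawtooth_integral({0.2, 0.5, 0.6}, {0.9})) > 1e-12
```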

It seems awkward that in the definition of model agnosticity we need to bound the depth. Let us call an FO(SUM) term or formula fully model agnostic if it is depth-ℓ model agnostic for every ℓ. It turns out that there are no nontrivial fully model agnostic FO(SUM) formulas.

Theorem 6.2.

Let φ be a fully model agnostic closed FO(SUM) formula over Υnet(m,1). Then either 𝒩 ⊧ φ for all 𝒩 ∈ ⋃_{ℓ≥1} 𝐅(m,ℓ), or 𝒩 ⊭ φ for all 𝒩 ∈ ⋃_{ℓ≥1} 𝐅(m,ℓ).

We omit the proof. The idea is that FO(SUM) is Hanf-local [23, 24]. No formula φ can distinguish a long enough structure consisting of two chains whose middle nodes are marked by two distinct constants c1 and c2, from its sibling structure where the markings are swapped. We can turn the two structures into neural networks by replacing the markings by two gadget networks N1 and N2, representing different functions, that φ is supposed to distinguish. However, the construction is done so that the function represented by each structure is the one represented by the gadget in its left chain. Still, FO(SUM) cannot distinguish these two structures. So, φ is either not fully model agnostic, or N1 and N2 cannot exist and φ is trivial.

Corollary 6.3.

The FO(𝐑,F) query F(0)=0 is not expressible in FO(SUM).

7 From FO(𝐑𝐥𝐢𝐧) to FO(SUM)

In practice, the number of layers in the employed neural network architecture is often fixed and known. Our main result then is that FO(SUM) can express all FO(𝐑lin) queries.

Theorem 7.1.

Let m and ℓ be natural numbers. For every closed FO(𝐑lin,F) formula ψ there exists a closed FO(SUM) formula φ such that for every network 𝒩 in 𝐅(m,ℓ), we have 𝐑, F^𝒩 ⊧ ψ iff 𝒩 ⊧ φ.

The challenge in proving this result is to simulate, using quantification and summation over neurons, the unrestricted access to real numbers that is available in FO(𝐑lin). To this end, we will divide the relevant real space into a finite number of cells, which we can represent by finite tuples of neurons.

The proof involves several steps that transform weighted structures. Before presenting the proof, we formalize such transformations in the notion of FO(SUM) translation, which generalize the classical notion of first-order interpretation [16] to weighted structures.

7.1 FO(SUM) translations

Let Υ and Γ be vocabularies for weighted structures, and let n be a natural number. An n-ary FO(SUM) translation φ from Υ to Γ consists of a number of formulas and weight terms over Υ, described next. There are formulas φ_dom(𝒙) and φ_=(𝒙₁,𝒙₂); formulas φ_R(𝒙₁,…,𝒙_k) for every k-ary relation symbol R of Γ; and formulas φ_f(𝒙₀,𝒙₁,…,𝒙_k) for every k-ary standard function symbol f of Γ. Furthermore, there are weight terms φ_w(𝒙₁,…,𝒙_k) for every k-ary weight function symbol w of Γ.

In the above description, bold 𝒙 denote n-tuples of distinct variables. Thus, the formulas and weight terms of φ define relations or weight functions of arities that are a multiple of n.

We say that φ maps a weighted structure 𝒜 over Υ to a weighted structure ℬ over Γ if there exists a surjective function h from φ_dom(𝒜) ⊆ A^n to B such that:

  • h(𝒂₁) = h(𝒂₂) ⟺ 𝒜 ⊧ φ_=(𝒂₁,𝒂₂);

  • (h(𝒂₁),…,h(𝒂_k)) ∈ R^ℬ ⟺ 𝒜 ⊧ φ_R(𝒂₁,…,𝒂_k);

  • h(𝒂₀) = f^ℬ(h(𝒂₁),…,h(𝒂_k)) ⟺ 𝒜 ⊧ φ_f(𝒂₀,𝒂₁,…,𝒂_k);

  • w^ℬ(h(𝒂₁),…,h(𝒂_k)) = φ_w^𝒜(𝒂₁,…,𝒂_k).

In the above, the bold 𝒂 denote n-tuples in φ_dom(𝒜).

For any given 𝒜, if φ maps 𝒜 to ℬ, then ℬ is unique up to isomorphism. Indeed, the elements of B can be understood as representing the equivalence classes of the equivalence relation φ_=(𝒜) on φ_dom(𝒜). In particular, for ℬ to exist, φ must be admissible on 𝒜, which means that φ_=(𝒜) is indeed an equivalence relation on φ_dom(𝒜), and all relations and functions φ_R(𝒜), φ_f(𝒜) and φ_w(𝒜) are invariant under this equivalence relation.

If 𝐊 is a class of structures over Υ, and T is a transformation of structures in 𝐊 to structures over Γ, we say that φ expresses T if φ is admissible on every 𝒜 in 𝐊, and maps 𝒜 to T(𝒜).

The relevant reduction theorem for translations is the following:

Theorem 7.2.

Let φ be an n-ary FO(SUM) translation from Υ to Γ, and let ψ(y₁,…,y_k) be a formula over Γ. Then there exists a formula φ^ψ(𝒙₁,…,𝒙_k) over Υ such that whenever φ maps 𝒜 to ℬ through h, we have ℬ ⊧ ψ(h(𝒂₁),…,h(𝒂_k)) iff 𝒜 ⊧ φ^ψ(𝒂₁,…,𝒂_k). Furthermore, for any weight term t over Γ, there exists a weight term φ^t over Υ such that t^ℬ(h(𝒂₁),…,h(𝒂_k)) = (φ^t)^𝒜(𝒂₁,…,𝒂_k).

Proof (sketch).

As this result is well known and straightforward to prove for classical first-order interpretations, we only deal here with summation terms, which are the main new aspect. Let t be of the form Σ_{y:γ} t′. Then for φ^t we take Σ_{𝒙 : φ^γ} φ^{t′}(𝒙₁,…,𝒙_k,𝒙) / (Σ_{𝒙′ : φ_=(𝒙,𝒙′)} 1).

7.2 Proof of Theorem 7.1

We sketch the proof of Theorem 7.1. For clarity of exposition, we present it first for single inputs, i.e., the case m=1. We present three Lemmas which can be chained together to obtain the theorem.

Piecewise linear functions.

We can naturally model piecewise linear (PWL) functions from ℝ to ℝ as weighted structures, where the elements are simply the pieces. Each piece p is defined by a line y = ax + b and left and right endpoints. Accordingly, we use a vocabulary Υ1pwl with four unary weight functions indicating a, b, and the x-coordinates of the endpoints. (The leftmost piece has no left endpoint and the rightmost piece has no right endpoint; we set the missing x-coordinate to ⊥.)

For m = 1 and ℓ = 2, the proof of the following Lemma is based on the same ideas as in the proof sketch we gave for Lemma 5.3. For m > 1, PWL functions from ℝ^m to ℝ are more complex; the vocabulary Υmpwl and a general proof of the lemma will be described in Section 7.3.

Lemma 7.3.

Let m and ℓ be natural numbers. There is an FO(SUM) translation from Υnet(m,1) to Υmpwl that transforms every network 𝒩 in 𝐅(m,ℓ) into a proper weighted structure representing F^𝒩.

Hyperplane arrangements.

An affine function on ℝ^d is a function of the form a₀ + a₁x₁ + ⋯ + a_dx_d. An affine hyperplane is the set of zeros of some non-constant affine function (i.e., one where at least one of the a_i with i > 0 is non-zero). A hyperplane arrangement is a collection of affine hyperplanes.

We naturally model a hyperplane arrangement as a weighted structure, where the elements are the hyperplanes. The vocabulary Υdarr simply consists of unary weight functions a0, a1, …, ad indicating the coefficients of the affine function defining each hyperplane.

 Remark 7.4.

An Υdarr-structure may have duplicates, i.e., different elements representing the same hyperplane. This happens when they have the same coefficients up to a constant factor. In our development, we will allow structures with duplicates as representations of hyperplane arrangements.

Cylindrical decomposition.

We will make use of a linear version of the notion of cylindrical decomposition (CD) [5], which we call affine CD. An affine CD of ℝ^d is a sequence 𝒟 = 𝒟₀,…,𝒟_d, where each 𝒟_i is a partition of ℝ^i. The blocks of partition 𝒟_i are referred to as i-cells or simply cells. The precise definition is by induction on d. For the base case, there is only one possibility 𝒟₀ = {ℝ⁰}. Now let d > 0. Then 𝒟₀,…,𝒟_{d−1} should already be an affine CD of ℝ^{d−1}. Furthermore, for every cell C of 𝒟_{d−1}, there must exist finitely many affine functions ξ₁, …, ξ_r from ℝ^{d−1} to ℝ, where r may depend on C. These are called the section mappings above C, and must satisfy ξ₁ < ⋯ < ξ_r on C. In this way, the section mappings induce a partition of the cylinder C × ℝ into sections and sectors. Each section is the graph of a section mapping, restricted to C. Each sector is the volume above C between two consecutive sections. Now 𝒟_d must consist of all sections and sectors above the cells C ∈ 𝒟_{d−1}.

The ordered sequence of cells formed by the sections and sectors above C is called the stack above C, and C is called the base cell for these cells.

An affine CD of ℝ^d is compatible with a hyperplane arrangement 𝒜 in ℝ^d if every d-cell C lies entirely on, above, or below every hyperplane h = 0 in 𝒜. (Formally, the affine function h is everywhere zero, everywhere positive, or everywhere negative on C.)

We can represent a CD compatible with a hyperplane arrangement as a weighted structure with elements of two sorts: cells and hyperplanes. There is a constant o for the “origin cell” ℝ⁰. Binary relations link every (i+1)-cell to its base i-cell, and to its delineating section mappings. (Sections are viewed as degenerate sectors where the two delineating section mappings are identical.) Ternary relations give the order of two hyperplanes in ℝ^{i+1} above an i-cell, and whether they are equal. The vocabulary for CDs of ℝ^d is denoted by Υdcell.

Lemma 7.5.

Let d be a natural number. There is an FO(SUM) translation from Υdarr to Υdcell that maps any hyperplane arrangement 𝒜 to a CD that is compatible with 𝒜.

Proof (sketch).

We follow the method of vertical decomposition [15]. There is a projection phase, followed by a buildup phase. For the projection phase, let 𝒜_d := 𝒜. For i = d,…,1, take all intersections between hyperplanes in 𝒜_i, and project one dimension down, i.e., project onto the first i−1 components. The result is a hyperplane arrangement 𝒜_{i−1} in ℝ^{i−1}. For the buildup phase, let 𝒟₀ := {ℝ⁰}. For i = 0,…,d−1, build a stack above every cell C in 𝒟_i, formed by intersecting C × ℝ with all hyperplanes in 𝒜_{i+1}. The result is a partition 𝒟_{i+1} such that 𝒟₀,…,𝒟_{i+1} is a CD of ℝ^{i+1} compatible with 𝒜_{i+1}. This algorithm is implementable in FO(SUM).
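To make the two phases concrete, here is a hedged Python sketch of the vertical decomposition for d = 2, i.e., an arrangement of lines a₀ + a₁x + a₂y = 0 in the plane; the data layout, the bounding box used in place of unbounded cells, and all names are simplifications for illustration, not the FO(SUM) construction itself.

```python
# A minimal sketch: projection phase (x-coordinates of pairwise intersections and of
# vertical lines), then buildup phase (stacks of section mappings above each 1-cell).
from itertools import combinations

def vertical_decomposition_2d(lines, lo=-1e6, hi=1e6):
    """lines: list of coefficient triples (a0, a1, a2) with (a1, a2) != (0, 0)."""
    # Projection phase: the arrangement A_1 in R^1.
    xs = set()
    for (a0, a1, a2), (b0, b1, b2) in combinations(lines, 2):
        det = a1 * b2 - a2 * b1
        if det != 0:                                  # the two lines meet in one point
            xs.add((a2 * b0 - a0 * b2) / det)
    for a0, a1, a2 in lines:
        if a2 == 0:                                   # vertical line x = -a0/a1
            xs.add(-a0 / a1)
    xs = sorted(xs)

    # Buildup phase: 1-cells are the points of A_1 and the intervals between them;
    # above each 1-cell, the non-vertical lines act as section mappings, ordered by
    # their value at a sample point of the cell.
    bounds = [lo] + xs + [hi]
    one_cells = []
    for i, x in enumerate(xs):
        one_cells += [('interval', bounds[i], x), ('point', x, x)]
    one_cells.append(('interval', bounds[-2], hi))

    stacks = {}
    for kind, l, r in one_cells:
        sample_x = l if kind == 'point' else (l + r) / 2
        sections = sorted((-(a0 + a1 * sample_x) / a2, (a0, a1, a2))
                          for a0, a1, a2 in lines if a2 != 0)
        stacks[(kind, l, r)] = [line for _, line in sections]
    return stacks
```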

Ordered formulas and cell selection.

Let ψ be the FO(𝐑lin,F) formula under consideration. Let x₁,…,x_d be an enumeration of the set of variables in ψ, free or bound. We may assume that ψ is in prenex normal form Q₁x₁ ⋯ Q_dx_d χ, where each Q_i is ∃ or ∀, and χ is quantifier-free.

We will furthermore assume that ψ is ordered, meaning that every atomic subformula is of the form F(x_{i₁},…,x_{i_m}) = x_j with i₁ < ⋯ < i_m < j, or is a linear constraint of the form a₀ + a₁x₁ + ⋯ + a_dx_d > 0. By using extra variables, every FO(𝐑lin,F) formula can be brought into ordered normal form.

Consider a PWL function f : ℝ → ℝ. Every piece is a segment of a line ax + b = y in ℝ². We define the hyperplane arrangement corresponding to f in d dimensions to consist of all hyperplanes ax_i + b = x_j, for all lines ax + b = y of f, where i < j (in line with the ordered assumption on the formula ψ). We denote this arrangement by 𝒜_f.

Also the query ψ gives rise to a hyperplane arrangement, denoted by 𝒜_ψ, which simply consists of all hyperplanes corresponding to the linear constraints in ψ.

For the following statement, we use the disjoint union of two weighted structures. Such a disjoint union can itself be represented as a weighted structure over the disjoint union of the two vocabularies, with two extra unary relations to distinguish the two domains.

Lemma 7.6.

Let ψ ≡ Q₁x₁ ⋯ Q_dx_d χ be an ordered closed FO(𝐑lin,F) formula with function symbol F of arity m. Let k ∈ {0,…,d}, and let ψ_k be Q_{k+1}x_{k+1} ⋯ Q_dx_d χ. There exists a unary FO(SUM) query over Υmpwl ∪ Υdcell that returns, on any piecewise linear function f : ℝ^m → ℝ and any CD 𝒟 of ℝ^d compatible with 𝒜_f ∪ 𝒜_ψ, a set of cells in ℝ^k whose union equals {(v₁,…,v_k) ∣ 𝐑,f ⊧ ψ_k(v₁,…,v_k)}.

Proof (sketch).

As already mentioned, we focus first on m = 1. The proof is by downward induction on k. The base case k = d deals with the quantifier-free part of ψ. We focus on the atomic subformulas. Subformulas of the form F(x_i) = x_j are dealt with as follows. For every piece p of f, with line y = ax + b, select all i-cells where x_i lies between p’s endpoints. For each such cell, repeatedly take all cells in the stacks above it until we reach (j−1)-cells. Now for each of these cells, take the section in its stack given by the section mapping x_j = ax_i + b. For each of these sections, again repeatedly take all cells in the stacks above it until we reach d-cells. Denote the obtained set of d-cells by S_p; the desired set of cells is ⋃_p S_p.

Subformulas that are linear constraints, where i is the largest index such that a_i is nonzero, can be dealt with by taking, above every (i−1)-cell, all sections that lie above the hyperplane corresponding to the constraint, if a_i > 0, or, if a_i < 0, all sections that lie below it. The described algorithm for the quantifier-free part can be implemented in FO(SUM).

For the inductive case, if Q_{k+1} is ∃, we must show that we can project a set of cells down one dimension, which is easy given the cylindrical nature of the decomposition; we just move to the underlying base cells. If Q_{k+1} is ∀, we treat it as ¬∃¬, so we complement the current set of cells, project down, and complement again.

To conclude, let us summarise the structure of the whole proof. We are given a neural network 𝒩 in 𝐅(m,), and we want to evaluate a closed FO(𝐑lin,F) formula ψ. We assume the query to be in prenex normal form and ordered. We start with an interpretation that transforms 𝒩 to a structure representing the piecewise linear function F𝒩 (Lemma 7.3). Then, using another interpretation, we expand the structure by the hyperplane arrangement obtained from the linear pieces of F𝒩 as well as the query. Using Lemma 7.5, we expand the current structure by a cell decomposition compatible with the hyperplane arrangement. Finally, using Lemma 7.6 we inductively process the query on this cell decomposition, at each step selecting the cells representing all tuples satisfying the current formula. Since the formula ψ is closed, we eventually either get the single 0-dimensional cell, in which case ψ holds, or the empty set, in which case ψ does not hold.

7.3 Extension to multiple inputs

For m > 1, the notion of PWL function f : ℝ^m → ℝ is more complex. We can conceptualize our representation of f as a decomposition of ℝ^m into polytopes where, additionally, every polytope p is accompanied by an affine function f_p such that f = ⋃_p f_p|_p. We call f_p the component function of f on p. Where for m = 1 each piece of f was delineated by just two breakpoints, a polytope in ℝ^m may now be delineated by many hyperplanes, called breakplanes. Thus, the vocabulary Υmpwl includes the position of a polytope relative to the breakplanes, indicating whether the polytope is on the breakplane, or on the positive or negative side of it. We next sketch how to prove Lemma 7.3 in its generality. The proof of Lemma 7.6 poses no additional problems.

We will define a PWL function f_u for every neuron u in the network; the final result is then f_out. To represent these functions for every neuron, we simply add one extra relation symbol, indicating to which function each element of a Υmpwl-structure belongs. The construction is by induction on the layer number. At the base of the induction are the input neurons. The i-th input neuron defines the PWL function with only one polytope (ℝ^m itself), whose component function is 𝒙 ↦ x_i.

Scaling.

For any hidden neuron u and incoming edge v → u with weight w, we define an auxiliary function f_{v,u} which simply scales f_v by w.

To represent the function defined by u, we need to sum the f_{v,u}’s, add u’s bias, and apply ReLU. We describe these two steps below; both can be implemented in FO(SUM). For u = out, the ReLU step is omitted.

Summing.

For each edge v → u, let 𝒟_{v,u} be the CD for f_{v,u}, and let 𝒜_{v,u} be the set of hyperplanes in ℝ^m that led to 𝒟_{v,u}. We define the arrangements 𝒜_u = ⋃_v 𝒜_{v,u} and 𝒜 = ⋃_u 𝒜_u. We apply Lemma 7.5 to 𝒜 to obtain a CD 𝒟 compatible with 𝒜, and hence also compatible with each 𝒜_u. Every m-cell C in 𝒟 is contained in a unique polytope p_{v,u}^C of f_{v,u}, for every edge v → u. We can define p_{v,u}^C as the polytope that is positioned the same with respect to the hyperplanes in 𝒜_{v,u} as C is. Two m-cells C and C′ are called u-equivalent if p_{v,u}^C = p_{v,u}^{C′} for every edge v → u. We can partition ℝ^m into polytopes formed by merging each u-equivalence class [C]. Over this partition we define a PWL function g_u: on each equivalence class [C], we define g_u as the sum, over all edges v → u, of the component function of f_{v,u} on p_{v,u}^C, plus u’s bias. The constructed function g_u equals b(u) + Σ_{v→u} f_{v,u}.

ReLU.

To represent ReLU(g_u), we construct the new arrangements ℬ_u, formed by the union of 𝒜_u with all hyperplanes given by the component functions of g_u, and ℬ = ⋃_u ℬ_u. Again apply Lemma 7.5 to ℬ to obtain a CD of ℬ, which is compatible with each ℬ_u. Again, every m-cell C in this CD is contained in a unique polytope p_u^C of g_u with respect to 𝒜_u. Now two m-cells C and C′ are called strongly u-equivalent if they are positioned the same with respect to the hyperplanes in ℬ_u. This implies p_u^C = p_u^{C′} but is stronger. We can partition ℝ^m into polytopes formed by merging each strong u-equivalence class [C]. Over this partition we define the PWL function f_u. Let ξ_u^C be the component function of g_u on p_u^C. On each equivalence class [C], we define f_u as ξ_u^C if it is positive on C; otherwise it is set to be zero. The constructed function f_u indeed equals ReLU(g_u), as desired.

8 Conclusion

The immediate motivation for this research is explainability and the verification of machine learning models. In this sense, our paper can be read as an application to machine learning of classical query languages known from database theory. The novelty compared to earlier proposals [3, 26] is our focus on real-valued weights and input and output features. More speculatively, we may envision machine learning models as genuine data sources, maybe in combination with more standard databases, and we want to provide a uniform interface. For example, practical applications of large language models will conceivably also need to store a lot of hard facts. However, just being able to query them through natural-language prompts may be suboptimal for integrating them into larger systems. Thus query languages for machine learning models may become a highly relevant research direction.

FO(SUM) queries will likely be very complex, so our result opens up challenges for query processing of complex, analytical SQL queries. Such queries are the focus of much current database systems research, and are supported by recent systems such as DuckDB [30] and Umbra [20]. It remains to be investigated to what extent white-box querying can be made useful in practice. The construction of a cell decomposition of variable space turned out to be crucial in the proof of our main result. Such cell decompositions might be precomputed by a query processor as a novel kind of index data structure.

While the language FO(𝐑) should mainly be seen as an expressiveness benchmark, techniques from SMT solving and linear programming are being adapted in the context of verifying neural networks [2]. Given the challenge, it is conceivable that for specific classes of applications, FO(𝐑) querying can be made practical.

Many follow-up questions remain open. Does the simulation of FO(𝐑lin) by FO(SUM) extend to FO(𝐑)? Importantly, how about other activations than ReLU [34]? If we extend FO(SUM) with quantifiers over weights, i.e., real numbers, what is the expressiveness gain? Expressing FO(𝐑) on bounded-depth neural networks now becomes immediate, but do we get strictly more expressivity? Also, to overcome the problem of being unable to even evaluate neural networks of unbounded depth, it seems natural to add recursion to FO(SUM). Fixed-point languages with real arithmetic can be difficult to handle [6, 10].

The language FO(SUM) can work with weighted relational structures of arbitrary shapes, so it is certainly not restricted to the FNN architecture. Thus, looking at other NN architectures is another direction for further research. Finally, we mention the question of designing flexible model query languages where the number of inputs, or outputs, need not be known in advance [3, 4].

References

  • [1] Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995. URL: http://webdam.inria.fr/Alice/.
  • [2] A. Albarghouthi. Introduction to neural network verification. Foundations and Trends in Programming Languages, 7(1–2):1–157, 2021. doi:10.1561/2500000051.
  • [3] Marcelo Arenas, Daniel Báez, Pablo Barceló, Jorge Pérez, and Bernardo Subercaseaux. Foundations of symbolic languages for model interpretability. In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, editors, Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pages 11690–11701, 2021. URL: https://proceedings.neurips.cc/paper/2021/hash/60cb558c40e4f18479664069d9642d5a-Abstract.html.
  • [4] Marcelo Arenas, Pablo Barceló, Diego Bustamante, Jose Caraball, and Bernardo Subercaseaux. A uniform language to explain decision trees. In Pierre Marquis, Magdalena Ortiz, and Maurice Pagnucco, editors, Proceedings of the 21st International Conference on Principles of Knowledge Representation and Reasoning, KR 2024, Hanoi, Vietnam. November 2-8, 2024, pages 60–70, 2024. doi:10.24963/kr.2024/6.
  • [5] S. Basu, R. Pollack, and M.-F. Roy. Algorithms in Real Algebraic Geometry. Springer, second edition, 2008.
  • [6] Michael Benedikt, Martin Grohe, Leonid Libkin, and Luc Segoufin. Reachability and connectivity queries in constraint databases. J. Comput. Syst. Sci., 66(1):169–206, 2003. doi:10.1016/S0022-0000(02)00034-X.
  • [7] F. Bodria, F. Giannotti, R. Guidotti, F. Naretto, D. Pedreschi, and S. Rinzivillo. Benchmarking and survey of explanation methods for black box models. Data Mining and Knowledge Discovery, 37:1719–1778, 2023. doi:10.1007/s10618-023-00933-9.
  • [8] Christoph Molnar. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Leanpub, second edition, 2022. URL: https://christophm.github.io/interpretable-ml-book.
  • [9] George Cybenko. Approximation by superpositions of a sigmoidal function. Math. Control. Signals Syst., 2(4):303–314, 1989. doi:10.1007/BF02551274.
  • [10] Floris Geerts and Bart Kuijpers. On the decidability of termination of query evaluation in transitive-closure logics for polynomial constraint databases. Theor. Comput. Sci., 336(1):125–151, 2005. doi:10.1016/j.tcs.2004.10.034.
  • [11] Ian J. Goodfellow, Yoshua Bengio, and Aaron C. Courville. Deep Learning. Adaptive computation and machine learning. MIT Press, 2016. URL: http://www.deeplearningbook.org/.
  • [12] E. Grädel and Y. Gurevich. Metafinite model theory. Information and Computation, 140(1):26–81, 1998. doi:10.1006/inco.1997.2675.
  • [13] Erich Grädel, Phokion G. Kolaitis, Leonid Libkin, Maarten Marx, Joel Spencer, Moshe Y. Vardi, Yde Venema, and Scott Weinstein. Finite Model Theory and Its Applications. Texts in Theoretical Computer Science. An EATCS Series. Springer, 2007. doi:10.1007/3-540-68804-8.
  • [14] Martin Grohe, Christoph Standke, Juno Steegmans, and Jan Van den Bussche. Query languages for neural networks. CoRR, abs/2408.10362, 2024. doi:10.48550/arXiv.2408.10362.
  • [15] Dan Halperin. Arrangements. In Jacob E. Goodman and Joseph O’Rourke, editors, Handbook of Discrete and Computational Geometry, Second Edition, chapter 24, pages 529–562. Chapman and Hall/CRC, second edition, 2004. doi:10.1201/9781420035315.ch24.
  • [16] Wilfrid Hodges. Model Theory, volume 42 of Encyclopedia of mathematics and its applications. Cambridge University Press, 1993.
  • [17] Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257, 1991. doi:10.1016/0893-6080(91)90009-T.
  • [18] P.C. Kanellakis, G.M. Kuper, and P.Z. Revesz. Constraint query languages. Journal of Computer and System Sciences, 51(1):26–52, August 1995. doi:10.1006/jcss.1995.1051.
  • [19] A. Klug. Equivalence of relational algebra and relational calculus query languages having aggregate functions. Journal of the ACM, 29(3):699–717, 1982. doi:10.1145/322326.322332.
  • [20] A. Kohn, V. Leis, and Th. Neumann. Tidy tuples and flying start: fast compilation and fast execution of relational queries in Umbra. VLDB Journal, 30(5):883–905, 2021. doi:10.1007/s00778-020-00643-4.
  • [21] Gabriel M. Kuper, Leonid Libkin, and Jan Paredaens, editors. Constraint Databases. Springer, 2000. doi:10.1007/978-3-662-04031-7.
  • [22] Marta Kwiatkowska and Xiyue Zhang. When to trust AI: advances and challenges for certification of neural networks. In Maria Ganzha, Leszek A. Maciaszek, Marcin Paprzycki, and Dominik Slezak, editors, Proceedings of the 18th Conference on Computer Science and Intelligence Systems, FedCSIS 2023, Warsaw, Poland, September 17-20, 2023, volume 35 of Annals of Computer Science and Information Systems, pages 25–37. Polish Information Processing Society, 2023. doi:10.15439/2023F2324.
  • [23] L. Libkin. Expressive power of SQL. Theoretical Computer Science, 296:379–404, 2003. doi:10.1016/S0304-3975(02)00736-3.
  • [24] Leonid Libkin. Elements of Finite Model Theory. Texts in Theoretical Computer Science. An EATCS Series. Springer, 2004. doi:10.1007/978-3-662-07003-1.
  • [25] Changliu Liu, Tomer Arnon, Christopher Lazarus, Christopher Strong, Clark Barrett, Mykel J Kochenderfer, et al. Algorithms for verifying deep neural networks. Foundations and Trends® in Optimization, 4(3-4):244–404, 2021. doi:10.1561/2400000035.
  • [26] X. Liu and E. Lorini. A unified logical framework for explanations in classifier systems. Journal of Logic and Computation, 33(2):485–515, 2023. doi:10.1093/logcom/exac102.
  • [27] Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In NIPS, pages 4765–4774, 2017. URL: https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html.
  • [28] M. Abo Khamis, H.Q. Ngo, and A. Rudra. Juggling functions inside a database. SIGMOD Record, 46(1):6–13, 2017. doi:10.1145/3093754.3093757.
  • [29] João Marques-Silva. Logic-based explainability in machine learning. In Leopoldo E. Bertossi and Guohui Xiao, editors, Reasoning Web. Causality, Explanations and Declarative Knowledge - 18th International Summer School 2022, Berlin, Germany, September 27-30, 2022, Tutorial Lectures, volume 13759 of Lecture Notes in Computer Science, pages 24–104. Springer, 2022. doi:10.1007/978-3-031-31414-8_2.
  • [30] Mark Raasveldt and Hannes Mühleisen. DuckDB: an embeddable analytical database. In Peter A. Boncz, Stefan Manegold, Anastasia Ailamaki, Amol Deshpande, and Tim Kraska, editors, Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019, pages 1981–1984. ACM, 2019. doi:10.1145/3299869.3320212.
  • [31] C. Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1:206–215, 2019. doi:10.1038/s42256-019-0048-x.
  • [32] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In Yoshua Bengio and Yann LeCun, editors, 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, 2014. doi:10.48550/arXiv.1312.6199.
  • [33] A. Tarski. A Decision Method for Elementary Algebra and Geometry. University of California Press, 1951.
  • [34] Vincent Tjeng, Kai Yuanqing Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019. URL: https://openreview.net/forum?id=HyGIdiRqtm.
  • [35] Szymon Torunczyk. Aggregate queries on sparse databases. In Dan Suciu, Yufei Tao, and Zhewei Wei, editors, Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2020, Portland, OR, USA, June 14-19, 2020, pages 427–443. ACM, 2020. doi:10.1145/3375395.3387660.
  • [36] Steffen van Bergerem and Nicole Schweikardt. Learning concepts described by weight aggregation logic. In Christel Baier and Jean Goubault-Larrecq, editors, 29th EACSL Annual Conference on Computer Science Logic, CSL 2021, January 25-28, 2021, Ljubljana, Slovenia (Virtual Conference), volume 183 of LIPIcs, pages 10:1–10:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPIcs.CSL.2021.10.
  • [37] Sandra Wachter, Brent D. Mittelstadt, and Chris Russell. Counterfactual explanation without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology, 31(2):841–887, 2018. doi:10.2139/ssrn.3063289.