Query Languages for Neural Networks
Abstract
We lay the foundations for a database-inspired approach to interpreting and understanding neural network models by querying them using declarative languages. Towards this end we study different query languages, based on first-order logic, that mainly differ in their access to the neural network model. First-order logic over the reals naturally yields a language which views the network as a black box; only the input–output function defined by the network can be queried. This is essentially the approach of constraint query languages. On the other hand, a white-box language can be obtained by viewing the network as a weighted graph, and extending first-order logic with summation over weight terms. The latter approach is essentially an abstraction of SQL. In general, the two approaches are incomparable in expressive power, as we will show. Under natural circumstances, however, the white-box approach can subsume the black-box approach; this is our main result. We prove the result concretely for linear constraint queries over real functions definable by feedforward neural networks with a fixed number of hidden layers and piecewise linear activation functions.
Keywords and phrases:
Expressive power of query languages, Machine learning models, languages for interpretability, explainable AI
Funding:
Martin Grohe: Funded by the European Union (ERC, SymSim, 101054974). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.
2012 ACM Subject Classification:
Theory of computation → Database query languages (principles)
1 Introduction
Neural networks [11] are a popular and successful representation model for real functions learned from data. Once deployed, the neural network is “queried” by supplying it with inputs and then obtaining the outputs. In the field of databases, however, we have a much richer conception of querying than simply applying a function to given arguments. For example, in querying a database relation of employees and their salaries, we can not only ask for Anne’s salary; we can also ask how many salaries are below Anne’s; we can ask whether no two employees have the same salary; and so on.
In this paper, we consider the querying of neural networks from this more general perspective. We see many potential applications: obvious ones are in explanation, verification, and interpretability of neural networks and other machine-learning models [8, 2, 25]. These are huge areas [31, 7] where it is important [29, 22] to have formal, logical definitions for the myriad notions of explanation that are being considered. Another potential application is in managing machine-learning projects, where we are testing many different architectures and training datasets, leading to a large number of models, most of which become short-term legacy code. In such a context it would be useful if the data scientist could search the repository for earlier generated models having certain characteristics in their architecture or in their behavior, which were perhaps not duly documented.
The idea of querying machine learning models with an expressive, declarative query language comes naturally to database researchers, and indeed, Arenas et al. already proposed a language for querying boolean functions over an unbounded set of boolean features [3]. In the modal logic community, similar languages are being investigated [26, references therein].
In the present work, we focus on real, rather than boolean, functions and models, as is indeed natural in the setting of verifying neural networks [2].
The constraint query language approach.
A natural language for querying real functions on a fixed number of arguments (features) is obtained by simply using first-order logic over the reals, with a function symbol representing the function to be queried. We denote this by . For example, consider functions with three arguments. The formula expresses that the output on does not depend strongly on the second feature, i.e., is -close to for any . Here, , , and can be real constants or parameters (free variables).
The language (also known as ) and its restriction to linear arithmetic (aka ) were intensively investigated in database theory around the turn of the century, under the heading of constraint query languages, with applications to spatial and temporal databases. See the compendium volume [21] and book chapters [24, chapter 13], [13, chapter 5]. Linear formulas with only universal quantifiers over the reals, in front of a quantifier-free condition involving only linear arithmetic (as the above example formula), can already model many properties considered in the verification of neural networks [2]. This universal fragment of can be evaluated using linear programming techniques [2].
Full allows alternation of quantifiers over the reals, and multiplication in arithmetic. Because the first-order theory of the reals is decidable [5], queries can still be effectively evaluated on any function that is semi-algebraic, i.e., itself definable in first-order logic over the reals. Although the complexity of this theory is high, if the function is presented as a quantifier-free formula, query evaluation actually has polynomial-time data complexity; here, the “data” consists of the given quantifier-free formula [18].
Functions that can be represented by feedforward neural networks with ReLU hidden units and linear output units are clearly semi-algebraic; in fact, they are piecewise linear. For most of our results, we will indeed focus on this class of networks, which are widespread in practice [11], and denote them by ReLU-FNN.
The SQL approach.
Another natural approach to querying neural networks is to query them directly, as graphs of neurons with weights on the nodes and edges. For this purpose one represents such graphs as relational structures with numerical values and uses SQL to query them. As an abstraction of this approach, in this paper, we model neural networks as weighted finite structures. As a query language we use FO(SUM): first-order logic over weighted structures, allowing order comparisons between weight terms, where weight terms can be built up using rational arithmetic, if-then-else, and, importantly, summation.
Based on logics originally introduced by Grädel and Gurevich [12], the language FO(SUM) is comparable to the relational calculus with aggregates [19] and, thus, to SQL [23]. Logics close to FO(SUM), but involving arithmetic in different semirings, were recently also used for unifying different algorithmic problems in query processing [35], as well as for expressing hypotheses in the context of learning over structures [36]. The well-known FAQ framework [28], restricted to the real semiring, can be seen as the conjunctive fragment of FO(SUM).
To give a simple example of an FO(SUM) formula, consider ReLU-FNNs with a single input unit, one hidden layer of ReLU units, and a single linear output unit. The following formula expresses the query that asks if the function evaluation on a given input value is positive:
$$\Big(\sum_{y \,:\, E(y,\mathsf{out})} w(y,\mathsf{out}) \cdot \mathrm{ReLU}\big(w(\mathsf{in},y)\cdot x + b(y)\big)\Big) + b(\mathsf{out}) > 0.$$
Here, $E$ is the edge relation between neurons, and constants $\mathsf{in}$ and $\mathsf{out}$ hold the input and output unit, respectively. Thus, variable $y$ ranges over the neurons in the hidden layer. Weight functions $w$ and $b$ indicate the weights of edges and the biases of units, respectively; the weight constant $x$ stands for a given input value. We assume for clarity that ReLU is given, but it is definable in FO(SUM).
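To make the SQL analogy concrete, here is a minimal runnable sketch (our own illustration, not from the paper) that stores such a one-hidden-layer network as two tables and evaluates the positivity query with a single aggregate SQL statement; the schema and names (`edge`, `bias`, `src`, `dst`, `w`, `b`) are illustrative choices, and SQLite's two-argument `MAX` plays the role of ReLU.

```python
import sqlite3

# A tiny ReLU-FNN with one input ("in"), two hidden neurons, and one output ("out"),
# stored as a weighted graph: edge(src, dst, w) and bias(node, b).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE edge(src TEXT, dst TEXT, w REAL);
    CREATE TABLE bias(node TEXT, b REAL);
""")
con.executemany("INSERT INTO edge VALUES (?, ?, ?)", [
    ("in", "h1", 1.0), ("in", "h2", -1.0),    # input -> hidden layer
    ("h1", "out", 1.0), ("h2", "out", 1.0),   # hidden layer -> output
])
con.executemany("INSERT INTO bias VALUES (?, ?)", [
    ("h1", 0.0), ("h2", 0.0), ("out", -0.5),
])

# Mirror of the FO(SUM) term: sum over hidden neurons y of
# w(y, out) * ReLU(w(in, y) * x + b(y)), plus the output bias; then test positivity.
QUERY = """
    SELECT SUM(e_out.w * MAX(0.0, e_in.w * :x + b.b))
           + (SELECT bias.b FROM bias WHERE bias.node = 'out') > 0
    FROM edge AS e_in
    JOIN edge AS e_out ON e_out.src = e_in.dst
    JOIN bias AS b ON b.node = e_in.dst
    WHERE e_in.src = 'in' AND e_out.dst = 'out'
"""
for x in (-2.0, 0.0, 2.0):
    (positive,) = con.execute(QUERY, {"x": x}).fetchone()
    print(f"f({x}) > 0:", bool(positive))    # this network computes |x| - 0.5
```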
Just like the relational calculus with aggregates, or SQL select statements, query evaluation for FO(SUM) has polynomial time data complexity, and techniques for query processing and optimization from database systems directly apply.
Comparing expressive powers.
Expressive power of query languages has been a classical topic in database theory and finite model theory [1, 24], so, with the advent of new models, it is natural to revisit questions concerning expressivity. The goal of this paper is to understand and compare the expressive power of the two query languages and FO(SUM) on neural networks over the reals. The two languages are quite different. sees the model as a black-box function , but can quantify over the reals. FO(SUM) can see the model as a white box, a finite weighted structure, but can quantify only over the elements of the structure, i.e., the neurons.
In general, the two expressive powers are indeed incomparable. In FO(SUM), we can express queries about the network topology; for example, we may ask to return the hidden units that do not contribute much to the function evaluation on a given input value. (Formally, leaving them out of the network would yield an output within some of the original output.) Or, we may ask whether there are more than a million neurons in the first hidden layer. For , being a black box language, such queries are obviously out of scope.
A more interesting question is how the two languages compare in expressing model agnostic queries: these are queries that return the same result on any two neural networks that represent the same input–output function. For example, when restricting attention to networks with one hidden layer, the example FO(SUM) formula seen earlier, which evaluates the network, is model agnostic. is model agnostic by design, and, indeed, serves as a very natural declarative benchmark of expressiveness for model-agnostic queries. It turns out that FO(SUM), still restricting to networks of some fixed depth, can express model-agnostic queries that cannot. For example, for any fixed depth , we will show that FO(SUM) can express the integral of a function given by a ReLU-FNN of depth . In contrast, we will show that this cannot be done in (Theorem 6.1).
The depth of a neural network can be taken as a crude notion of “schema”. Standard relational query languages typically cannot be used without knowledge of the schema of the data. Similarly, we will show that without knowledge of the depth, FO(SUM) cannot express any nontrivial model-agnostic query (Theorem 6.2). Indeed, since FO(SUM) lacks recursion, function evaluation can only be expressed if we know the depth. (Extending FO(SUM) with recursion is one of the many interesting directions for further research.)
When the depth is known, however, for model-agnostic queries, the expressiveness of FO(SUM) exceeds the benchmark of expressiveness provided by . Specifically, we show that every query over functions representable by ReLU-FNNs is also expressible in FO(SUM) evaluated on the networks directly (Theorem 7.1). This is our main technical result, and can be paraphrased as “SQL can verify neural networks.” The proof involves showing that the required manipulations of higher-dimensional piecewise linear functions, and the construction of cylindrical cell decompositions in , can all be expressed in FO(SUM). To allow for a modular proof, we also develop the notion of FO(SUM) translation, generalizing the classical notion of first-order interpretations [16].
This paper is organized as follows. Section 2 provides preliminaries on neural networks. Section 3 introduces . Section 4 introduces weighted structures and FO(SUM), after which Section 5 introduces white-box querying. Section 6 considers model-agnostic queries. Section 7 presents the main technical result. Section 8 concludes with a discussion of topics for further research.
A full version of this paper with full proofs is available [14].
2 Preliminaries on neural networks
A feedforward neural network [11], in general, could be defined as a finite, directed, weighted, acyclic graph, with some additional aspects which we discuss next. The nodes are also referred to as neurons or units. Some of the source nodes are designated as inputs, and some of the sink nodes are designated as outputs. Both the inputs, and the outputs, are linearly ordered. Neurons that are neither inputs nor outputs are said to be hidden. All nodes, except for the inputs, carry a weight, a real value, called the bias. All directed edges also carry a weight.
In this paper, we focus on ReLU-FNNs: networks with ReLU activations and linear outputs. This means the following. Let $N$ be a neural network with $p$ inputs. Then every node $v$ in $N$ represents a function $f_v : \mathbb{R}^p \to \mathbb{R}$ defined as follows. We proceed inductively based on some topological ordering of $N$. For input nodes $v$, simply $f_v(x_1,\dots,x_p) = x_i$, if $v$ is the $i$th input node. Now let $v$ be a hidden neuron and assume $f_u$ is already defined for all predecessors $u$ of $v$, i.e., nodes with an edge to $v$. Let $u_1,\dots,u_k$ be these predecessors, let $w_1,\dots,w_k$ be the weights on the respective edges, and let $b$ be the bias of $v$. Then
$$f_v(\mathbf{x}) = \mathrm{ReLU}\Big(\sum_{i=1}^{k} w_i \cdot f_{u_i}(\mathbf{x}) + b\Big),$$
where $\mathrm{ReLU}(t) = \max(t, 0)$.
Finally, for an output node , we define similarly to hidden neurons, except that the application of ReLU is omitted. The upshot is that a neural network with inputs and outputs represents a function mapping to . For any node in the network, is always a continuous piecewise linear function. We denote the class of all continuous piecewise linear functions by ; that is, continuous functions that admit a partition of into finitely many polytopes such that is affine linear on each of them.
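As a minimal illustration of this inductive definition (our own sketch, not taken from the paper), the following code evaluates a ReLU-FNN given as a weighted DAG by processing the nodes in a topological order; hidden nodes apply ReLU, output nodes do not.

```python
from graphlib import TopologicalSorter

# A ReLU-FNN as a weighted DAG: edges carry weights, non-input nodes carry biases.
edges = {("x1", "h1"): 1.0, ("x1", "h2"): -1.0,   # input -> hidden layer
         ("h1", "y"): 1.0, ("h2", "y"): 1.0}      # hidden layer -> output
bias = {"h1": 0.0, "h2": 0.0, "y": -0.5}
inputs, outputs = ["x1"], ["y"]

def evaluate(x):
    """Compute f_v for every node v, following a topological order of the DAG."""
    preds = {}                                  # node -> list of its predecessors
    for (u, v) in edges:
        preds.setdefault(v, []).append(u)
    val = dict(zip(inputs, x))                  # f_v for the i-th input node is x_i
    for v in TopologicalSorter(preds).static_order():
        if v in val:                            # input nodes are already set
            continue
        s = bias[v] + sum(edges[u, v] * val[u] for u in preds[v])
        val[v] = s if v in outputs else max(0.0, s)   # ReLU on hidden nodes only
    return [val[o] for o in outputs]

print(evaluate([2.0]))    # this example network computes |x| - 0.5, so [1.5]
```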
Hidden layers.
Commonly, the hidden neurons are organized in disjoint blocks called layers. The layers are ordered, such that the neurons in the first layer have only edges from inputs, and the neurons in any later layer have only edges from neurons in the previous layer. Finally, outputs have only edges from neurons in the last layer.
We will use to denote the class of layered networks with inputs of depth , that is, with an input layer with nodes, hidden layers, and an output layer with a single node. Recall that the nodes in all hidden layers use ReLU activations and the output node uses the identity function.
It is easy to see that networks in just compute linear functions and that for every we have , that is, the class of functions that can be computed by a network in is the class of all continuous piecewise linear functions. The well-known Universal Approximation Theorem [9, 17] says that every continuous function defined on a compact domain can be approximated to any additive error by a network in .
3 A black-box query language
First-order logic over the reals, denoted here by , is, syntactically, just first-order logic over the vocabulary of elementary arithmetic, i.e., with binary function symbols and for addition and multiplication, binary predicate , and constant symbols and [5]. Constants for rational numbers, or even algebraic numbers, can be added as an abbreviation (since they are definable in the logic).
The fragment of linear formulas uses multiplication only for scalar multiplication, i.e., multiplication of variables with rational number constants. For example, the formula is linear, but the formula is not. In practice, linear queries are often sufficiently expressive, both in earlier applications to temporal or spatial data [21] and for querying neural networks (see examples to follow). The only caveat is that many applications assume a distance function on vectors. When using distances based on absolute value differences between real numbers, e.g., the Manhattan distance or the max norm, we still fall within .
We will add to extra relation or function symbols; in this paper, we will mainly consider , which is with an extra function symbol . The structure on the domain of reals, with the arithmetic symbols having their obvious interpretation, will be denoted here by . Semantically, for any vocabulary of extra relation and function symbols, formulas are interpreted over structures that expand with additional relations and functions on of the right arities, that interpret the symbols in . In this way, expresses queries about functions .
This language can express a wide variety of properties (queries) considered in interpretable machine learning and neural-network verification. Let us see some examples.
Example 3.1.
To check whether is robust around an -vector [32], using parameters and , we can write the formula . Here stands for a tuple of variables, and stands for some distance function which is assumed to be expressible.
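For concreteness, here is a hedged sketch of how such a robustness query could be checked with an off-the-shelf solver (the paper treats the logic as a declarative benchmark and does not prescribe this back end); it assumes the `z3-solver` package, a toy one-hidden-layer network, the max norm as distance, and reads robustness as: every z with |z − x| ≤ δ satisfies |F(z) − F(x)| ≤ ε. The solver decides the existentially quantified negation.

```python
from z3 import Real, RealVal, Solver, If, sat

def relu(t):
    return If(t > 0, t, RealVal(0))

def F(x):
    # A fixed toy one-hidden-layer ReLU network; it computes |x| - 0.5.
    return relu(x) + relu(-x) - 0.5

x0, delta, eps = 2.0, 0.25, 0.5       # reference input, radius, tolerance
z = Real("z")

# Robustness fails iff some z with |z - x0| <= delta has |F(z) - F(x0)| > eps.
diff = F(z) - F(RealVal(x0))
s = Solver()
s.add(z - x0 <= delta, x0 - z <= delta)    # the max-norm ball (here 1-dimensional)
s.add(If(diff >= 0, diff, -diff) > eps)    # |F(z) - F(x0)| > eps

if s.check() == sat:
    print("not robust; counterexample z =", s.model()[z])
else:
    print("robust within", delta, "of", x0)
```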
Example 3.2.
Counterfactual explanation methods [37] aim to find the closest to an input such that is “expected,” assuming that was unexpected. A typical example is credit denial; what should we change minimally to be granted credit? Typically we can define expectedness by some formula, e.g., . Then we can express the counterfactual explanation as .
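A hedged illustration of this query as an optimization problem (again our own sketch, using z3's optimization engine rather than pure first-order evaluation): find the input closest to the given one, in absolute-value distance, whose output reaches a chosen “expectedness” threshold.

```python
from z3 import Real, RealVal, Optimize, If, sat

def relu(t):
    return If(t > 0, t, RealVal(0))

def F(x):
    # The same toy network: F(x) = |x| - 0.5; think of F(x) >= 0 as "credit granted".
    return relu(x) + relu(-x) - 0.5

x0 = 0.1                                  # the rejected input: F(0.1) = -0.4 < 0
z, dist = Real("z"), Real("dist")

opt = Optimize()
opt.add(dist >= z - x0, dist >= x0 - z)   # dist >= |z - x0|
opt.add(F(z) >= 0)                        # "expectedness": the output crosses 0
opt.minimize(dist)
if opt.check() == sat:
    m = opt.model()
    print("closest accepted input:", m[z], "at distance", m[dist])
```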
Example 3.3.
We may define the contribution of an input feature on an input as the inverse of the smallest change we have to make to that feature for the output to change significantly. We can express that is such a change by writing (taking for clarity) . Denoting this formula by , the smallest change is then expressed as .
Example 3.4.
We finally illustrate that can express gradients and many other notions from calculus. For simplicity assume to be unary. Consider the definition of the derivative at a point . So it suffices to show how to express that for a function that is continuous in . We can write down the textbook definition literally as .
Evaluating queries.
Black box queries can be effectively evaluated using the decidability and quantifier elimination properties of . This is the constraint query language approach [18, 21], which we briefly recall next.
A function is called semialgebraic [5] (or semilinear) if there exists an (or ) formula such that for any -vector and real value , we have if and only if .
Now consider the task of evaluating an formula on a semialgebraic function , given by a defining formula . By introducing auxiliary variables, we may assume that the function symbol is used only in subformulas of the form . Then replace in each such subformula by , obtaining a pure formula .
Now famously, the first-order theory of is decidable [33, 5]. In other words, there is an algorithm that decides, for any formula and -vector , whether . Actually, a stronger property holds, to the effect that every -formula is equivalent to a quantifier-free formula. The upshot is that there is an algorithm that, given a query and a semialgebraic function given by a defining formula, outputs a quantifier-free formula defining the result set . If is given by a quantifier-free formula, the evaluation can be done in polynomial time in the length of the description of , so polynomial-time data complexity. This is because there are algorithms for quantifier elimination with complexity , where is the size of the formula, is a polynomial, is the number of quantifiers, and is a doubly exponential function [18, 5].
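The following sketch walks through this pipeline on a toy example, with z3 standing in for the decision procedure (one possible back end; the classical route is quantifier elimination): the graph of the network function is written as a quantifier-free formula φ(x, y), the use of the function symbol in the query is replaced by that definition, and the resulting pure sentence over the reals is decided.

```python
from z3 import Reals, ForAll, Exists, Implies, Or, And, Solver, sat

x, y, m = Reals("x y m")

# Quantifier-free definition of the graph of the toy network F(x) = |x| - 0.5:
# phi holds iff y = F(x).
phi = Or(And(x >= 0, y == x - 0.5),
         And(x < 0, y == -x - 0.5))

# The query "there is a lower bound m on F", i.e. exists m forall x: F(x) >= m,
# with the subformula "F(x) = y" replaced by its definition phi(x, y).
sentence = Exists([m], ForAll([x, y], Implies(phi, y >= m)))

s = Solver()
s.add(sentence)
print("query holds:", s.check() == sat)    # expected: True (m = -1/2 works)
```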
Complexity.
Of course, we want to evaluate queries on the functions represented by neural networks. From the definition given in Section 2, it is clear that the functions representable by ReLU-FNNs are always semialgebraic (actually, semilinear). For every output feature , it is straightforward to compile, from the network, a quantifier-free formula defining the th output component function. In this way we see that queries on ReLU-FNNs are, in principle, computable in polynomial time.
However, the algorithms are notoriously complex, and we stress again that should be mostly seen as a declarative benchmark of expressiveness. Moreover, we assume here for convenience that ReLU is a primitive function. ReLU can be expressed in using disjunction, but this may blow up the query formula, e.g., when converting to disjunctive normal form [2]. Symbolic constraint solving algorithms for the reals have been extended to deal with ReLU natively [2].
Remark 3.5.
In closing this section we remark that, to query the entire network function, we would strictly speaking not use just a single function symbol , but rather the language , with function symbols for the outputs. In this paper, for the sake of clarity, we will often stick to a single output, but our treatment generalizes to multiple outputs.
4 Weighted structures and FO(SUM)
Weighted structures are standard abstract structures equipped with one or more weight functions from tuples of domain elements to values from some separate, numerical domain. Here, as numerical domain, we will use the set of “lifted reals” $\mathbb{R} \cup \{\bot\}$, where $\bot$ is an extra element representing an undefined value. Neural networks are weighted graph structures. Hence, since we are interested in declarative query languages for neural networks, we are interested in logics over weighted structures. Such logics were introduced by Grädel and Gurevich [12]. We consider here a concrete instantiation of their approach, which we denote by FO(SUM).
Recall that a (finite, relational) vocabulary is a finite set of function symbols and relation symbols, where each symbol comes with an arity (a natural number). We extend the notion of vocabulary to also include a number of weight function symbols, again with associated arities. We allow -ary weight function symbols, which we call weight constant symbols.
A (finite) structure over such a vocabulary consists of a finite domain , and functions and relations on of the right arities, interpreting the standard function symbols and relation symbols from . So far this is standard. Now additionally, interprets every weight function symbol , of arity , by a function .
The syntax of FO(SUM) formulas (over some vocabulary) is defined exactly as for standard first order logic, with one important extension. In addition to formulas (taking Boolean values) and standard terms (taking values in the structure), the logic contains weight terms taking values in . Weight terms are defined by the following grammar:
Here, is a weight function symbol of arity and the are standard terms; is a rational function applied to weight terms, with rational coefficients; is a formula; and is a tuple of variables. The syntax of weight terms and formulas is mutually recursive. As just seen, the syntax of formulas is used in the syntax of weight terms; conversely, weight terms and can be combined to form formulas and .
Recall that a rational function is a fraction between two polynomials. Thus, the arithmetic operations that we consider are addition, scalar multiplication by a rational number, multiplication, and division.
The free variables of a weight term are defined as follows. The weight term has no free variables. The free variables of are simply the variables occurring in the . A variable occurs free in if it occurs free in some . A variable occurs free in ‘if then else ’ if it occurs free in , , or . The free variables of are those of and , except for the variables in . A formula or (weight) term is closed if it has no free variables.
We can evaluate a weight term on a structure and a tuple providing values to the free variables. The result of the evaluation, denoted by , is a value in , defined in the obvious manner. In particular, when is of the form , we have
Division by zero, which can happen when evaluating terms of the form , is given the value . The arithmetical operations are extended so that , (scalar multiply), , and and always equal . Also, holds for all .
5 White-box querying
For any natural numbers $p$ and $q$, we introduce a vocabulary for neural networks with $p$ inputs and $q$ outputs. We denote this vocabulary by , or just if $p$ and $q$ are understood. It has a binary relation symbol $E$ for the edges; constant symbols $\mathsf{in}_1$, …, $\mathsf{in}_p$ and $\mathsf{out}_1$, …, $\mathsf{out}_q$ for the input and output nodes; a unary weight function symbol $b$ for the biases, and a binary weight function symbol $w$ for the weights on the edges.
Any ReLU-FNN, being a weighted graph, is a structure over this vocabulary in the obvious way. When there is no edge from node $u$ to node $v$, we put $w(u,v) = \bot$. Since inputs have no bias, we put $b(u) = \bot$ for any input $u$.
Depending on the application, we may want to enlarge with some additional parameters. For example, we can use additional weight constant symbols to provide input values to be evaluated, or output values to be compared with, or interval bounds, etc.
The logic FO(SUM) over the vocabulary (possibly enlarged as just mentioned) serves as a “white-box” query language for neural networks, since the entire model is given and can be directly queried, just like an SQL query can be evaluated on a given relational database. Contrast this with the language from Section 3, which only has access to the function represented by the network, as a black box.
Example 5.1.
While the language cannot see inside the model, at least it has direct access to the function represented by the model. When we use the language FO(SUM), we must compute this function ourselves. At least when we know the depth of the network, this is indeed easy. In the Introduction, we already showed a weight term expressing the evaluation of a one-layer neural network on a single input and output. We can easily generalize this to a weight term expressing the value of any of a fixed number of outputs, with any fixed number of inputs, and any fixed number of layers. Let , …, be additional weight constant symbols representing input values. Then the weight term expresses the value of any neuron in the first hidden layer ( is a variable). Denote this term by . Next, for any subsequent layer numbered , we inductively define the weight term as
Here, can be taken to be the weight term if then else 0. Finally, the value of the th output is given by the weight term , where is the number of the last hidden layer.
Example 5.2.
We can also look for useless neurons: neurons that can be removed from the network without altering the output too much on given values. Recall the weight term from the previous example; for clarity we just write . Let be a fresh variable, and let be the term obtained from by altering the summing conditions and by adding the conjunct . Then the formula expresses that is useless. (For we can take the weight term if then else .)
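As a naive executable counterpart of this query (our own sketch), the code below drops one hidden neuron at a time from a small network and compares outputs on a finite set of probe inputs; the FO(SUM) formula instead works symbolically on the given input values.

```python
def net(w_in, b_hid, w_out, b_out, x, drop=None):
    """A 1-input, one-hidden-layer ReLU network, optionally leaving out neuron `drop`."""
    return b_out + sum(wo * max(0.0, wi * x + bj)
                       for j, (wi, bj, wo) in enumerate(zip(w_in, b_hid, w_out))
                       if j != drop)

# Hidden neuron 2 has outgoing weight 0, so leaving it out never changes the output.
w_in, b_hid, w_out, b_out = [1.0, -1.0, 3.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0], -0.5

eps = 1e-9
probes = [i / 10 for i in range(-50, 51)]          # the "given input values"
for j in range(len(w_in)):
    useless = all(abs(net(w_in, b_hid, w_out, b_out, x) -
                      net(w_in, b_hid, w_out, b_out, x, drop=j)) <= eps
                  for x in probes)
    print(f"neuron {j} useless on the probes: {useless}")
```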
Another interesting example is computing integrals. Recall that is the class of networks with inputs, one output, and depth .
Lemma 5.3.
Let and be natural numbers. There exists an FO(SUM) term over with additional pairs of weight constant symbols and for , such that for any network in , and values and for the and , we have .
Proof (sketch).
We sketch here a self-contained and elementary proof for and (one input, one hidden layer). This case already covers all continuous piecewise linear functions .
Every hidden neuron $y$ may represent a “quasi breakpoint” in the piecewise linear function (that is, a point where its slope may change). Concretely, we only consider the hidden neurons with a nonzero input weight, to avoid dividing by zero. The breakpoint’s $x$-coordinate is given by the weight term $-b(y)/w(\mathsf{in}, y)$. The function value at the breakpoint is then given by the weight term from Example 5.1, with this $x$-coordinate substituted for the input value.
Pairs of neurons representing successive breakpoints are easy to define by a formula . Such pairs represent the pieces of the function, except for the very first and very last pieces. For this proof sketch, assume we simply want the integral between the first breakpoint and the last breakpoint.
The area (positive or negative) contributed to the integral by the piece is easy to write as a weight term: . We sum these to obtain the desired integral. However, since different neurons may represent the same quasi breakpoint, we must divide by the number of duplicates. Hence, our desired term equals where is the formula .
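The construction sketched above can be mirrored directly in code. The following illustration (our own, for the same toy one-input, one-hidden-layer representation used earlier) collects the quasi breakpoints −b/w, deduplicates them with a set (playing the role of the division by the number of duplicates in the FO(SUM) term), and sums the trapezoid areas of the pieces between the first and last breakpoint.

```python
# One-hidden-layer network: hidden neuron j computes ReLU(w_in[j]*x + b_hid[j]);
# the output is b_out + sum_j w_out[j] * (that value).
w_in, b_hid, w_out, b_out = [1.0, -1.0, 1.0], [1.0, 0.0, -2.0], [1.0, 1.0, 0.5], 0.0

def f(x):
    return b_out + sum(wo * max(0.0, wi * x + bj)
                       for wi, bj, wo in zip(w_in, b_hid, w_out))

# Quasi breakpoints: x-coordinates -b/w of hidden neurons with nonzero input weight.
xs = sorted({-bj / wi for wi, bj in zip(w_in, b_hid) if wi != 0})

# Between consecutive breakpoints f is affine, so each piece contributes a trapezoid.
integral = sum((x2 - x1) * (f(x1) + f(x2)) / 2 for x1, x2 in zip(xs, xs[1:]))
print("breakpoints:", xs)                                     # [-1.0, 0.0, 2.0]
print("integral from first to last breakpoint:", integral)   # 5.0
```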
Example 5.4.
A popular alternative to Example 3.3 for measuring the contribution of an input feature to an input is the SHAP score [27]. It assumes a probability distribution on the input space and quantifies the change to the expected value of caused by fixing input feature to in a random fixation order of the input features:
When we assume that is the product of uniform distributions over the intervals , we can write the conditional expectation for some by setting as follows.
where is a short notation for the variable obtained from by replacing with for all . With Lemma 5.3, this conditional expectation can be expressed in FO(SUM), and by replacing with or respectively, we can express the score.
More examples.
Our main result will be that, over networks of a given depth, all of can be expressed in FO(SUM). So the examples from Section 3 (which are linear if a Manhattan or max distance is used) apply here as well. Moreover, the techniques by which we show our main result readily adapt to queries not about the final function represented by the network, but about the function represented by a neuron given as a parameter to the query, much as in Example 5.2. For example, in feature visualization [8] we want to find the input that maximizes the activation of some neuron . Since this is expressible in , it is also expressible in FO(SUM).
6 Model-agnostic queries
We have already indicated that is “black box” while FO(SUM) is “white box”. Black-box queries are commonly called model agnostic [8]. Some FO(SUM) queries may, and others may not, be model agnostic.
Formally, for some , let us call a closed FO(SUM) formula , possibly using weight constants , depth- model agnostic if for all neural networks such that , and all we have . A similar definition applies to closed FO(SUM) weight terms.
For example, the term of Example 5.1 evaluating the function of a neural network of depth at most is depth- model agnostic. By comparison the formula stating that a network has useless neurons (cf. Example 5.2) is not model agnostic. The term from Lemma 5.3, computing the integral, is depth- model agnostic.
Theorem 6.1.
The query for functions is expressible by a depth- agnostic FO(SUM) formula, but not in .
Proof.
We have already seen the expressibility in FO(SUM). We prove nonexpressibility in .
Consider the equal-cardinality query about disjoint pairs of finite sets of reals, asking whether . Over abstract ordered finite structures, equal cardinality is well-known not to be expressible in order-invariant first-order logic [24]. Hence, by the generic collapse theorem for constraint query languages over the reals [21, 24], query is not expressible in .
Now for any given and , we construct a continuous piecewise linear function as follows. We first apply a suitable affine transformation so that falls within the open interval . Now is a sawtooth-like function, with positive teeth at elements from , negative teeth (of the same height, say 1) at elements from , and zero everywhere else. To avoid teeth that overlap the zero boundary at the left or that overlap each other, we make them of width , where is the minimum of and is the minimum distance between any two distinct elements in .
Expressing the above construction uniformly in poses no difficulties; let be a formula defining . Now assume, for the sake of contradiction, that would be expressible by a closed formula . Then composing with would express query in . Indeed, clearly, if and only if . So, cannot exist.
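To make the reduction tangible, here is a sketch of the sawtooth construction for concrete disjoint finite sets A, B ⊂ (0, 1) (the affine transformation step is assumed already done): each element contributes a triangular tooth built from three ReLU terms, positive for A and negative for B, so the exact integral equals (|A| − |B|) times half the tooth width. A crude numerical integral stands in for the FO(SUM) integral term, and the tooth width chosen below is a simplification of the bound in the proof.

```python
def tent(x, a, w):
    """A triangular tooth of height 1 and width w centred at a, built from ReLUs."""
    r = lambda t: max(0.0, t)
    return (2 / w) * (r(x - (a - w / 2)) - 2 * r(x - a) + r(x - (a + w / 2)))

def equal_cardinality(A, B, steps=100_000):
    """Decide |A| == |B| via the integral of the sawtooth function for A and B."""
    pts = sorted(A | B)
    # Teeth narrow enough not to overlap each other or the boundary of (0, 1).
    w = min([pts[0], 1 - pts[-1]] + [q - p for p, q in zip(pts, pts[1:])]) / 2
    f = lambda x: (sum(tent(x, a, w) for a in A) - sum(tent(x, b, w) for b in B))
    integral = sum(f(i / steps) for i in range(steps)) / steps   # crude Riemann sum
    return abs(integral) < w / 4       # exact value is (|A| - |B|) * w / 2

print(equal_cardinality({0.2, 0.5}, {0.7, 0.9}))        # True
print(equal_cardinality({0.2, 0.5, 0.6}, {0.9}))        # False
```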
It seems awkward that in the definition of model agnosticism we need to bound the depth. Let us call an FO(SUM) term or formula fully model agnostic if is depth- model agnostic for every . It turns out that there are no nontrivial fully model agnostic FO(SUM) formulas.
Theorem 6.2.
Let be a fully model agnostic closed FO(SUM) formula over . Then either for all or for all .
We omit the proof. The idea is that FO(SUM) is Hanf-local [23, 24]. No formula can distinguish a long enough structure consisting of two chains where the middle nodes are marked by two distinct constants and , from its sibling structure where the markings are swapped. We can turn the two structures into neural networks by replacing the markings by two gadget networks and , representing different functions, that is supposed to distinguish. However, the construction is done so that the function represented by the structure is the same as that represented by the gadget in the left chain. Still, FO(SUM) cannot distinguish these two structures. So, is either not fully model agnostic, or and cannot exist and is trivial.
Corollary 6.3.
The query is not expressible in FO(SUM).
7 From FO() to FO(SUM)
In practice, the number of layers in the employed neural network architecture is often fixed and known. Our main result then is that FO(SUM) can express all queries.
Theorem 7.1.
Let and be natural numbers. For every closed formula there exists a closed FO(SUM) formula such that for every network in , we have iff .
The challenge in proving this result is to simulate, using quantification and summation over neurons, the unrestricted access to real numbers that is available in . To this end, we will divide the relevant real space into a finite number of cells, which we can represent by finite tuples of neurons.
The proof involves several steps that transform weighted structures. Before presenting the proof, we formalize such transformations in the notion of FO(SUM) translation, which generalizes the classical notion of first-order interpretation [16] to weighted structures.
7.1 FO(SUM) translations
Let and be vocabularies for weighted structures, and let be a natural number. An -ary FO(SUM) translation from to consists of a number of formulas and weight terms over , described next. There are formulas and ; formulas for every -ary relation symbol of ; and formulas for every -ary standard function symbol of . Furthermore, there are weight terms for every -ary weight function of .
In the above description, bold denote -tuples of distinct variables. Thus, the formulas and weight terms of define relations or weight functions of arities that are a multiple of .
We say that maps a weighted structure over to a weighted structure over if there exists a surjective function from to such that:
In the above, the bold denote -tuples in .
For any given , if maps to , then is unique up to isomorphism. Indeed, the elements of can be understood as representing the equivalence classes of the equivalence relation on . In particular, for to exist, must be admissible on , which means that is indeed an equivalence relation on , and all relations and all functions , and are invariant under this equivalence relation.
If is a class of structures over , and is a transformation of structures in to structures over , we say that expresses if is admissible on every in , and maps to .
The relevant reduction theorem for translations is the following:
Theorem 7.2.
Let be an -ary FO(SUM) translation from to , and let be a formula over . Then there exists a formula over such that whenever maps to through , we have iff . Furthermore, for any weight term over , there exists a weight term over such that .
Proof (sketch).
As this result is well known and straightforward to prove for classical first-order interpretations, we only deal here with summation terms, which are the main new aspect. Let be of the form . Then for we take .
7.2 Proof of Theorem 7.1
We sketch the proof of Theorem 7.1. For clarity of exposition, we present it first for single inputs, i.e., the case . We present three Lemmas which can be chained together to obtain the theorem.
Piecewise linear functions.
We can naturally model piecewise linear (PWL) functions from $\mathbb{R}$ to $\mathbb{R}$ as weighted structures, where the elements are simply the pieces. Each piece is defined by a line and left and right endpoints. Accordingly, we use a vocabulary with four unary weight functions indicating the slope and intercept of the line and the $x$-coordinates of the two endpoints. (The leftmost and rightmost pieces have no left, respectively right, endpoint; we set the missing coordinate to $\bot$.)
For and , the proof of the following Lemma is based on the same ideas as in the proof sketch we gave for Lemma 5.3. For , PWL functions from to are more complex; the vocabulary and a general proof of the lemma will be described in Section 7.3.
Lemma 7.3.
Let and be natural numbers. There is an FO(SUM) translation from to that transforms every network in into a proper weighted structure representing .
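In the simplest case (one input, one hidden layer), the translation essentially amounts to the breakpoint computation from the proof sketch of Lemma 5.3. The following illustrative code (our own; the piece record (slope, intercept, left, right) mirrors the vocabulary described above, with ±∞ standing in for the missing endpoints of the outer pieces) extracts such a structure from a one-hidden-layer network.

```python
import math
from dataclasses import dataclass

@dataclass
class Piece:
    slope: float        # a in y = a*x + c on this piece
    intercept: float    # c
    left: float         # delimiting x-coordinates; +-inf stands in for a missing endpoint
    right: float

def pieces_of(w_in, b_hid, w_out, b_out):
    """The pieces of the PWL function of a 1-input, one-hidden-layer ReLU network."""
    xs = sorted({-b / w for w, b in zip(w_in, b_hid) if w != 0})
    bounds = [-math.inf] + xs + [math.inf]
    result = []
    for lo, hi in zip(bounds, bounds[1:]):
        if lo == -math.inf and hi == math.inf:
            sample = 0.0                 # no breakpoints at all: one global piece
        elif lo == -math.inf:
            sample = hi - 1              # any point inside the unbounded left piece
        elif hi == math.inf:
            sample = lo + 1
        else:
            sample = (lo + hi) / 2
        # On this piece, neuron j is active iff w_in[j]*sample + b_hid[j] > 0.
        active = [(wi, bj, wo) for wi, bj, wo in zip(w_in, b_hid, w_out)
                  if wi * sample + bj > 0]
        slope = sum(wo * wi for wi, bj, wo in active)
        intercept = b_out + sum(wo * bj for wi, bj, wo in active)
        result.append(Piece(slope, intercept, lo, hi))
    return result

# F(x) = ReLU(x + 1) + ReLU(x - 1) has three pieces, with breakpoints -1 and 1.
for p in pieces_of([1.0, 1.0], [1.0, -1.0], [1.0, 1.0], 0.0):
    print(p)
```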
Hyperplane arrangements.
An affine function on $\mathbb{R}^n$ is a function of the form $x \mapsto a_0 + a_1 x_1 + \dots + a_n x_n$. An affine hyperplane is the set of zeros of some non-constant affine function (i.e., one where at least one of the $a_i$ with $i \geq 1$ is non-zero). A hyperplane arrangement is a collection of affine hyperplanes.
We naturally model a hyperplane arrangement as a weighted structure, where the elements are the hyperplanes. The vocabulary simply consists of unary weight functions indicating the coefficients $a_0$, $a_1$, …, $a_n$ of the affine function defining each hyperplane.
Remark 7.4.
An -structure may have duplicates, i.e., different elements representing the same hyperplane. This happens when they have the same coefficients up to a constant factor. In our development, we will allow structures with duplicates as representations of hyperplane arrangements.
Cylindrical decomposition.
We will make use of a linear version of the notion of cylindrical decomposition (CD) [5], which we call affine CD. An affine CD of is a sequence , where each is a partition of . The blocks of partition are referred to as -cells or simply cells. The precise definition is by induction on . For the base case, there is only one possibility . Now let . Then should already be an affine CD of . Furthermore, for every cell of , there must exist finitely many affine functions , …, from to , where may depend on . These are called the section mappings above , and must satisfy on . In this way, the section mappings induce a partition of the cylinder in sections and sectors. Each section is the graph of a section mapping, restricted to . Each sector is the volume above between two consecutive sections. Now must equal and is a section or sector above .
The ordered sequence of cells formed by the sectors and sections of is called the stack above , and is called the base cell for these cells.
An affine CD of is compatible with a hyperplane arrangement in if every -cell lies entirely on, or above, or below every hyperplane . (Formally, the affine function is everywhere zero, or everywhere positive, or everywhere negative, on .)
We can represent a CD compatible with a hyperplane arrangement as a weighted structure with elements of two sorts: cells and hyperplanes. There is a constant for the “origin cell” . Binary relations link every -cell to its base -cell, and to its delineating section mappings. (Sections are viewed as degenerate sectors where the two delineating section mappings are identical.) Ternary relations give the order of two hyperplanes in above an -cell, and whether they are equal. The vocabulary for CDs of is denoted by .
Lemma 7.5.
Let be a natural number. There is an FO(SUM) translation from to that maps any hyperplane arrangement to a CD that is compatible with .
Proof (sketch).
We follow the method of vertical decomposition [15]. There is a projection phase, followed by a buildup phase. For the projection phase, let . For , take all intersections between hyperplanes in , and project one dimension down, i.e., project on the first components. The result is a hyperplane arrangement in . For the buildup phase, let . For , build a stack above every cell in formed by intersecting with all hyperplanes in . The result is a partition such that is a CD of compatible with . This algorithm is implementable in FO(SUM).
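A small illustration of the projection and build-up phases for lines in the plane (our own sketch, ignoring vertical lines and duplicate hyperplanes for brevity): lines are given as coefficient triples (a, b, c) of a·x + b·y + c = 0; the intersection points are projected onto the x-axis to obtain the 1-D cells, and above a sample point of each cell the lines are ordered to form the stack of sections and sectors.

```python
import math

def vertical_decomposition(lines):
    """lines: list of triples (a, b, c) with b != 0, each representing a*x + b*y + c = 0."""
    # Projection phase: x-coordinates of all pairwise intersection points.
    xs = set()
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            (a1, b1, c1), (a2, b2, c2) = lines[i], lines[j]
            det = a1 * b2 - a2 * b1
            if det != 0:                                  # non-parallel lines
                xs.add((b1 * c2 - b2 * c1) / det)
    xs = sorted(xs)

    # 1-D cells: the projected points and the open intervals between and around them.
    bounds = [-math.inf] + xs + [math.inf]
    cells = [("point", x) for x in xs] + \
            [("interval", lo, hi) for lo, hi in zip(bounds, bounds[1:])]

    # Build-up phase: above a sample point of each 1-D cell, order the lines by their
    # y-value; consecutive sections delimit the sectors of the stack above the cell.
    decomposition = []
    for cell in cells:
        if cell[0] == "point":
            sample = cell[1]
        else:
            lo, hi = cell[1], cell[2]
            if lo == -math.inf and hi == math.inf:
                sample = 0.0
            elif lo == -math.inf:
                sample = hi - 1
            elif hi == math.inf:
                sample = lo + 1
            else:
                sample = (lo + hi) / 2
        stack = sorted((-(a * sample + c) / b, (a, b, c)) for a, b, c in lines)
        decomposition.append((cell, [line for _, line in stack]))
    return decomposition

arrangement = [(1.0, -1.0, 0.0), (1.0, 1.0, -2.0)]        # the lines y = x and y = 2 - x
for cell, ordered_lines in vertical_decomposition(arrangement):
    print(cell, "->", ordered_lines)
```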
Ordered formulas and cell selection.
Let be the formula under consideration. Let be an enumeration of the set of variables in , free or bound. We may assume that is in prenex normal form , where each is or , and is quantifier-free.
We will furthermore assume that is ordered, meaning that every atomic subformula is of the form with , or is a linear constraint of the form . By using extra variables, every formula can be brought into ordered normal form.
Consider a PWL function . Every piece is a segment of a line in . We define the hyperplane arrangement corresponding to in dimensions to consist of all hyperplanes , for all lines of , where (in line with the ordered assumption on the formula ). We denote this arrangement by .
Also the query gives rise to a hyperplane arrangement, denoted by , which simply consists of all hyperplanes corresponding to the linear constraints in .
For the following statement, we use the disjoint union of two weighted structures. Such a disjoint union can itself be represented as a weighted structure over the disjoint union of the two vocabularies, with two extra unary relations to distinguish the two domains.
Lemma 7.6.
Let be an ordered closed formula with function symbol of arity . Let , and let be . There exists a unary FO(SUM) query over that returns, on any piecewise linear function and any CD of compatible with , a set of cells in whose union equals .
Proof (sketch).
As already mentioned we focus first on . The proof is by downward induction on . The base case deals with the quantifier-free part of . We focus on the atomic subformulas. Subformulas of the form are dealt with as follows. For every piece of , with line , select all -cells where lies between ’s endpoints. For each such cell, repeatedly take all cells in the stacks above it until we reach -cells. Now for each of these cells, take the section in its stack given by the section mapping . For each of these sections, again repeatedly take all cells in the stacks above it until we reach -cells. Denote the obtained set of -cells by ; the desired set of cells is .
Subformulas that are linear constraints, where is the largest index such that is nonzero, can be dealt with by taking, above every -cell all sections that lie above the hyperplane corresponding to the constraint, if , or, if , all sections that lie below it. The described algorithm for the quantifier-free part can be implemented in FO(SUM).
For the inductive case, if is , we must show that we can project a set of cells down one dimension, which is easy given the cylindrical nature of the decomposition; we just move to the underlying base cells. If is , we treat it as , so we complement the current set of cells, project down, and complement again.
To conclude, let us summarise the structure of the whole proof. We are given a neural network in , and we want to evaluate a closed formula . We assume the query to be in prenex normal form and ordered. We start with an interpretation that transforms to a structure representing the piecewise linear function (Lemma 7.3). Then, using another interpretation, we expand the structure by the hyperplane arrangement obtained from the linear pieces of as well as the query. Using Lemma 7.5, we expand the current structure by a cell decomposition compatible with the hyperplane arrangement. Finally, using Lemma 7.6 we inductively process the query on this cell decomposition, at each step selecting the cells representing all tuples satisfying the current formula. Since the formula is closed, we eventually either get the single -dimensional cell, in which case holds, or the empty set, in which case does not hold.
7.3 Extension to multiple inputs
For , the notion of PWL function is more complex. We can conceptualize our representation of as a decomposition of into polytopes where, additionally, every polytope is accompanied by an affine function such that . We call the component function of on . Where for $n = 1$ each piece was delineated by just two breakpoints, now a polytope in $\mathbb{R}^n$ may be delineated by many hyperplanes, called breakplanes. Thus, the vocabulary includes the position of a polytope relative to the breakplanes, indicating whether the polytope is on the breakplane, or on the positive or negative side of it. We next sketch how to prove Lemma 7.3 in its generality. The proof of Lemma 7.6 poses no additional problems.
We will define a PWL function for every neuron in the network; the final result is then . To represent these functions for every neuron, we simply add one extra relation symbol, indicating to which function each element of a -structure belongs. The construction is by induction on the layer number. At the base of the induction are the input neurons. The -th input neuron defines the PWL function with only one polytope ( itself), whose section mapping is the function .
Scaling.
For any hidden neuron and incoming edge with weight , we define an auxiliary function which simply scales by .
To represent the function defined by , we need to sum the ’s and add ’s bias; and apply ReLU. We describe these two steps below, which can be implemented in FO(SUM). For , the ReLU step is omitted.
Summing.
For each , let be the CD for , and let be the set of hyperplanes in that led to . We define the arrangements and . We apply Lemma 7.5 to to obtain a CD of , which is also compatible with each . Every -cell in is contained in a unique polytope for every . We can define as the polytope that is positioned the same with respect to the hyperplanes in as is. Two -cells and are called -equivalent if for every . We can partition into polytopes formed by merging each -equivalence class . Over this partition we define a PWL function . On each equivalence class , we define as , plus ’s bias. The constructed function equals .
ReLU.
To represent , we construct the new arrangements formed by the union of with all hyperplanes given by component functions of , and . Again apply Lemma 7.5 to to obtain a CD of , which is compatible with each . Again every -cell in is contained in a unique polytope of with respect to . Now two -cells and are called strongly -equivalent if they are positioned the same with respect to the hyperplanes in . This implies but is stronger. We can partition into polytopes formed by merging each -equivalence class . Over this partition we define a PWL function . Let be the component function of on . On each equivalence class , we define as if it is positive on ; otherwise it is set to be zero. The constructed function equals as desired.
8 Conclusion
The immediate motivation for this research is explainability and the verification of machine learning models. In this sense, our paper can be read as an application to machine learning of classical query languages known from database theory. The novelty compared to earlier proposals [3, 26] is our focus on real-valued weights and input and output features. More speculatively, we may envision machine learning models as genuine data sources, maybe in combination with more standard databases, for which we want to provide a uniform query interface. For example, practical applications of large language models will conceivably also need to store a lot of hard facts. However, just being able to query them through natural-language prompts may be suboptimal for integrating them into larger systems. Thus, query languages for machine learning models may become a highly relevant research direction.
FO(SUM) queries will likely be very complex, so our result opens up challenges for the processing of complex, analytical SQL queries. Such queries are the focus of much current database systems research, and are supported by recent systems such as DuckDB [30] and Umbra [20]. It remains to be investigated to what extent white-box querying can be made useful in practice. The construction of a cell decomposition of variable space turned out to be crucial in the proof of our main result. Such cell decompositions might be precomputed by a query processor as a novel kind of index data structure.
While the language should mainly be seen as an expressiveness benchmark, techniques from SMT solving and linear programming are being adapted in the context of verifying neural networks [2]. Given the challenge, it is conceivable that for specific classes of applications, querying can be made practical.
Many follow-up questions remain open. Does the simulation of by FO(SUM) extend to ? Importantly, what about activation functions other than ReLU [34]? If we extend FO(SUM) with quantifiers over weights, i.e., real numbers, what is the expressiveness gain? Expressing on bounded-depth neural networks now becomes immediate, but do we get strictly more expressivity? Also, to overcome the problem of being unable to even evaluate neural networks of unbounded depth, it seems natural to add recursion to FO(SUM). Fixed-point languages with real arithmetic can be difficult to handle [6, 10].
The language FO(SUM) can work with weighted relational structures of arbitrary shapes, so it is certainly not restricted to the FNN architecture. Thus, looking at other NN architectures is another direction for further research. Finally, we mention the question of designing flexible model query languages where the number of inputs, or outputs, need not be known in advance [3, 4].
References
- [1] Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995. URL: http://webdam.inria.fr/Alice/.
- [2] A. Albarghouthi. Introduction to neural network verification. Foundations and Trends in Programming Languages, 7(1–2):1–157, 2021. doi:10.1561/2500000051.
- [3] Marcelo Arenas, Daniel Báez, Pablo Barceló, Jorge Pérez, and Bernardo Subercaseaux. Foundations of symbolic languages for model interpretability. In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, editors, Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pages 11690–11701, 2021. URL: https://proceedings.neurips.cc/paper/2021/hash/60cb558c40e4f18479664069d9642d5a-Abstract.html.
- [4] Marcelo Arenas, Pablo Barceló, Diego Bustamante, Jose Caraball, and Bernardo Subercaseaux. A uniform language to explain decision trees. In Pierre Marquis, Magdalena Ortiz, and Maurice Pagnucco, editors, Proceedings of the 21st International Conference on Principles of Knowledge Representation and Reasoning, KR 2024, Hanoi, Vietnam. November 2-8, 2024, pages 60–70, 2024. doi:10.24963/kr.2024/6.
- [5] S. Basu, R. Pollack, and M.-F. Roy. Algorithms in Real Algebraic Geometry. Springer, second edition, 2008.
- [6] Michael Benedikt, Martin Grohe, Leonid Libkin, and Luc Segoufin. Reachability and connectivity queries in constraint databases. J. Comput. Syst. Sci., 66(1):169–206, 2003. doi:10.1016/S0022-0000(02)00034-X.
- [7] F. Bodria, F. Giannotti, R. Guidotti, F. Naretto, D. Pedreschi, and S. Rinzivillo. Benchmarking and survey of explanation methods for black box models. Data Mining and Knowledge Discovery, 37:1719–1778, 2023. doi:10.1007/s10618-023-00933-9.
- [8] Christoph Molnar. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Leanpub, second edition, 2022. URL: https://christophm.github.io/interpretable-ml-book.
- [9] George Cybenko. Approximation by superpositions of a sigmoidal function. Math. Control. Signals Syst., 2(4):303–314, 1989. doi:10.1007/BF02551274.
- [10] Floris Geerts and Bart Kuijpers. On the decidability of termination of query evaluation in transitive-closure logics for polynomial constraint databases. Theor. Comput. Sci., 336(1):125–151, 2005. doi:10.1016/j.tcs.2004.10.034.
- [11] Ian J. Goodfellow, Yoshua Bengio, and Aaron C. Courville. Deep Learning. Adaptive computation and machine learning. MIT Press, 2016. URL: http://www.deeplearningbook.org/.
- [12] E. Grädel and Y. Gurevich. Metafinite model theory. Information and Computation, 140(1):26–81, 1998. doi:10.1006/inco.1997.2675.
- [13] Erich Grädel, Phokion G. Kolaitis, Leonid Libkin, Maarten Marx, Joel Spencer, Moshe Y. Vardi, Yde Venema, and Scott Weinstein. Finite Model Theory and Its Applications. Texts in Theoretical Computer Science. An EATCS Series. Springer, 2007. doi:10.1007/3-540-68804-8.
- [14] Martin Grohe, Christoph Standke, Juno Steegmans, and Jan Van den Bussche. Query languages for neural networks. CoRR, abs/2408.10362, 2024. doi:10.48550/arXiv.2408.10362.
- [15] Dan Halperin. Arrangements. In Jacob E. Goodman and Joseph O’Rourke, editors, Handbook of Discrete and Computational Geometry, Second Edition, chapter 24, pages 529–562. Chapman and Hall/CRC, second edition, 2004. doi:10.1201/9781420035315.ch24.
- [16] Wilfrid Hodges. Model Theory, volume 42 of Encyclopedia of mathematics and its applications. Cambridge University Press, 1993.
- [17] Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257, 1991. doi:10.1016/0893-6080(91)90009-T.
- [18] P.C. Kanellakis, G.M. Kuper, and P.Z. Revesz. Constraint query languages. Journal of Computer and System Sciences, 51(1):26–52, August 1995. doi:10.1006/jcss.1995.1051.
- [19] A. Klug. Equivalence of relational algebra and relational calculus query languages having aggregate functions. Journal of the ACM, 29(3):699–717, 1982. doi:10.1145/322326.322332.
- [20] A. Kohn, V. Leis, and Th. Neumann. Tidy tuples and flying start: fast compilation and fast execution of relational queries in Umbra. VLDB Journal, 30(5):883–905, 2021. doi:10.1007/s00778-020-00643-4.
- [21] Gabriel M. Kuper, Leonid Libkin, and Jan Paredaens, editors. Constraint Databases. Springer, 2000. doi:10.1007/978-3-662-04031-7.
- [22] Marta Kwiatkowska and Xiyue Zhang. When to trust AI: advances and challenges for certification of neural networks. In Maria Ganzha, Leszek A. Maciaszek, Marcin Paprzycki, and Dominik Slezak, editors, Proceedings of the 18th Conference on Computer Science and Intelligence Systems, FedCSIS 2023, Warsaw, Poland, September 17-20, 2023, volume 35 of Annals of Computer Science and Information Systems, pages 25–37. Polish Information Processing Society, 2023. doi:10.15439/2023F2324.
- [23] L. Libkin. Expressive power of SQL. Theoretical Computer Science, 296:379–404, 2003. doi:10.1016/S0304-3975(02)00736-3.
- [24] Leonid Libkin. Elements of Finite Model Theory. Texts in Theoretical Computer Science. An EATCS Series. Springer, 2004. doi:10.1007/978-3-662-07003-1.
- [25] Changliu Liu, Tomer Arnon, Christopher Lazarus, Christopher Strong, Clark Barrett, Mykel J Kochenderfer, et al. Algorithms for verifying deep neural networks. Foundations and Trends® in Optimization, 4(3-4):244–404, 2021. doi:10.1561/2400000035.
- [26] X. Liu and E. Lorini. A unified logical framework for explanations in classifier systems. Journal of Logic and Computation, 33(2):485–515, 2023. doi:10.1093/logcom/exac102.
- [27] Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In NIPS, pages 4765–4774, 2017. URL: https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html.
- [28] M. Abo Khamis, H.Q. Ngo, and A. Rudra. Juggling functions inside a database. SIGMOD Record, 46(1):6–13, 2017. doi:10.1145/3093754.3093757.
- [29] João Marques-Silva. Logic-based explainability in machine learning. In Leopoldo E. Bertossi and Guohui Xiao, editors, Reasoning Web. Causality, Explanations and Declarative Knowledge - 18th International Summer School 2022, Berlin, Germany, September 27-30, 2022, Tutorial Lectures, volume 13759 of Lecture Notes in Computer Science, pages 24–104. Springer, 2022. doi:10.1007/978-3-031-31414-8_2.
- [30] Mark Raasveldt and Hannes Mühleisen. Duckdb: an embeddable analytical database. In Peter A. Boncz, Stefan Manegold, Anastasia Ailamaki, Amol Deshpande, and Tim Kraska, editors, Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019, pages 1981–1984. ACM, 2019. doi:10.1145/3299869.3320212.
- [31] C. Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1:206–215, 2019. doi:10.1038/s42256-019-0048-x.
- [32] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In Yoshua Bengio and Yann LeCun, editors, 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, 2014. doi:10.48550/arXiv.1312.6199.
- [33] A. Tarski. A Decision Method for Elementary Algebra and Geometry. University of California Press, 1951.
- [34] Vincent Tjeng, Kai Yuanqing Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019. URL: https://openreview.net/forum?id=HyGIdiRqtm.
- [35] Szymon Torunczyk. Aggregate queries on sparse databases. In Dan Suciu, Yufei Tao, and Zhewei Wei, editors, Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2020, Portland, OR, USA, June 14-19, 2020, pages 427–443. ACM, 2020. doi:10.1145/3375395.3387660.
- [36] Steffen van Bergerem and Nicole Schweikardt. Learning concepts described by weight aggregation logic. In Christel Baier and Jean Goubault-Larrecq, editors, 29th EACSL Annual Conference on Computer Science Logic, CSL 2021, January 25-28, 2021, Ljubljana, Slovenia (Virtual Conference), volume 183 of LIPIcs, pages 10:1–10:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPIcs.CSL.2021.10.
- [37] Sandra Wachter, Brent D. Mittelstadt, and Chris Russell. Counterfactual explanation without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology, 31(2):841–887, 2018. doi:10.2139/ssrn.3063289.