
Query Languages for Neural Networks

Martin Grohe, RWTH Aachen University, Aachen, Germany; Christoph Standke, RWTH Aachen University, Aachen, Germany; Juno Steegmans, Data Science Institute, UHasselt, Diepenbeek, Belgium; Jan Van den Bussche, Data Science Institute, UHasselt, Diepenbeek, Belgium
Abstract

We lay the foundations for a database-inspired approach to interpreting and understanding neural network models by querying them using declarative languages. Towards this end we study different query languages, based on first-order logic, that mainly differ in their access to the neural network model. First-order logic over the reals naturally yields a language which views the network as a black box; only the input–output function defined by the network can be queried. This is essentially the approach of constraint query languages. On the other hand, a white-box language can be obtained by viewing the network as a weighted graph, and extending first-order logic with summation over weight terms. The latter approach is essentially an abstraction of SQL. In general, the two approaches are incomparable in expressive power, as we will show. Under natural circumstances, however, the white-box approach can subsume the black-box approach; this is our main result. We prove the result concretely for linear constraint queries over real functions definable by feedforward neural networks with a fixed number of hidden layers and piecewise linear activation functions.

Keywords and phrases:
Expressive power of query languages, Machine learning models, languages for interpretability, explainable AI
Funding:
Martin Grohe: Funded by the European Union (ERC, SymSim, 101054974). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.
Christoph Standke: Funded by the German Research Foundation (DFG) under grants GR 1492/16-1 and GRK 2236 (UnRAVeL).
Juno Steegmans: Supported by the Special Research Fund (BOF) of UHasselt.
Jan Van den Bussche: Partially supported by the Flanders AI Program (FAIR).
Copyright and License:
© Martin Grohe, Christoph Standke, Juno Steegmans, and Jan Van den Bussche; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Database query languages (principles)
Related Version:
Full Version: https://arxiv.org/abs/2408.10362  [14]
Editors:
Sudeepa Roy and Ahmet Kara

1 Introduction

Neural networks [11] are a popular and successful representation model for real functions learned from data. Once deployed, the neural network is “queried” by supplying it with inputs and obtaining the outputs. In the field of databases, however, we have a much richer conception of querying than simply applying a function to given arguments. For example, in querying a database relation Employee(name,salary), we can not only ask for Anne’s salary; we can also ask how many salaries are below Anne’s; we can ask whether no two employees have the same salary; and so on.

In this paper, we consider the querying of neural networks from this more general perspective. We see many potential applications: obvious ones are in explanation, verification, and interpretability of neural networks and other machine-learning models [8, 2, 25]. These are huge areas [31, 7] where it is important [29, 22] to have formal, logical definitions for the myriad notions of explanation that are being considered. Another potential application is in managing machine-learning projects, where we are testing many different architectures and training datasets, leading to a large number of models, most of which become short-term legacy code. In such a context it would be useful if the data scientist could search the repository for earlier generated models having certain characteristics in their architecture or in their behavior, which were perhaps not duly documented.

The idea of querying machine learning models with an expressive, declarative query language comes naturally to database researchers, and indeed, Arenas et al. already proposed a language for querying boolean functions over an unbounded set of boolean features [3]. In the modal logic community, similar languages are being investigated [26, references therein].

In the present work, we focus on real, rather than boolean, functions and models, as is indeed natural in the setting of verifying neural networks [2].

The constraint query language approach.

A natural language for querying real functions on a fixed number of arguments (features) is obtained by simply using first-order logic over the reals, with a function symbol F representing the function to be queried. We denote this by FO(𝐑). For example, consider functions F with three arguments. The formula ∀b′ |F(a,b,c) − F(a,b′,c)| < ϵ expresses that the output on (a,b,c) does not depend strongly on the second feature, i.e., F(a,b,c) is ϵ-close to F(a,b′,c) for any b′. Here, a, b, c and ϵ can be real constants or parameters (free variables).

The language FO(𝐑) (also known as FO+Poly) and its restriction FO(𝐑lin) to linear arithmetic (aka FO+Lin) were intensively investigated in database theory around the turn of the century, under the heading of constraint query languages, with applications to spatial and temporal databases. See the compendium volume [21] and book chapters [24, chapter 13], [13, chapter 5]. Linear formulas with only universal quantifiers over the reals, in front of a quantifier-free condition involving only linear arithmetic (as the above example formula), can already model many properties considered in the verification of neural networks [2]. This universal fragment of FO(𝐑lin) can be evaluated using linear programming techniques [2].

Full FO(𝐑) allows alternation of quantifiers over the reals, and multiplication in arithmetic. Because the first-order theory of the reals is decidable [5], FO(𝐑) queries can still be effectively evaluated on any function that is semi-algebraic, i.e., itself definable in first-order logic over the reals. Although the complexity of this theory is high, if the function is presented as a quantifier-free formula, FO(𝐑) query evaluation actually has polynomial-time data complexity; here, the “data” consists of the given quantifier-free formula [18].

Functions that can be represented by feedforward neural networks with ReLU hidden units and linear output units are clearly semi-algebraic; in fact, they are piecewise linear. For most of our results, we will indeed focus on this class of networks, which are widespread in practice [11], and denote them by ReLU-FNN.

The SQL approach.

Another natural approach to querying neural networks is to query them directly, as graphs of neurons with weights on the nodes and edges. For this purpose one represents such graphs as relational structures with numerical values and uses SQL to query them. As an abstraction of this approach, in this paper, we model neural networks as weighted finite structures. As a query language we use FO(SUM): first-order logic over weighted structures, allowing order comparisons between weight terms, where weight terms can be built up using rational arithmetic, if-then-else, and, importantly, summation.

Based on logics originally introduced by Grädel and Gurevich [12], the language FO(SUM) is comparable to the relational calculus with aggregates [19] and, thus, to SQL [23]. Logics close to FO(SUM), but involving arithmetic in different semirings, were recently also used for unifying different algorithmic problems in query processing [35], as well as for expressing hypotheses in the context of learning over structures [36]. The well-known FAQ framework [28], restricted to the real semiring, can be seen as the conjunctive fragment of FO(SUM).

To give a simple example of an FO(SUM) formula, consider ReLU-FNNs with a single input unit, one hidden layer of ReLU units, and a single linear output unit. The following formula expresses the query that asks if the function evaluation on a given input value is positive:

0 < b(out) + Σ_{x : E(in,x)} w(x,out) · ReLU(w(in,x) · 𝑣𝑎𝑙 + b(x)).

Here, E is the edge relation between neurons, and constants in and out hold the input and output unit, respectively. Thus, variable x ranges over the neurons in the hidden layer. Weight functions w and b indicate the weights of edges and the biases of units, respectively; the weight constant 𝑣𝑎𝑙 stands for a given input value. We assume for clarity that ReLU is given, but it is definable in FO(SUM).

Just like the relational calculus with aggregates, or SQL select statements, query evaluation for FO(SUM) has polynomial time data complexity, and techniques for query processing and optimization from database systems directly apply.

Comparing expressive powers.

Expressive power of query languages has been a classical topic in database theory and finite model theory [1, 24], so, with the advent of new models, it is natural to revisit questions concerning expressivity. The goal of this paper is to understand and compare the expressive power of the two query languages FO(𝐑) and FO(SUM) on neural networks over the reals. The two languages are quite different. FO(𝐑) sees the model as a black-box function F, but can quantify over the reals. FO(SUM) can see the model as a white box, a finite weighted structure, but can quantify only over the elements of the structure, i.e., the neurons.

In general, indeed the two expressive powers are incomparable. In FO(SUM), we can express queries about the network topology; for example, we may ask to return the hidden units that do not contribute much to the function evaluation on a given input value. (Formally, leaving them out of the network would yield an output within some ϵ of the original output.) Or, we may ask whether there are more than a million neurons in the first hidden layer. For FO(𝐑), being a black box language, such queries are obviously out of scope.

A more interesting question is how the two languages compare in expressing model-agnostic queries: these are queries that return the same result on any two neural networks that represent the same input–output function. For example, when restricting attention to networks with one hidden layer, the example FO(SUM) formula seen earlier, which evaluates the network, is model agnostic. FO(𝐑) is model agnostic by design, and, indeed, serves as a very natural declarative benchmark of expressiveness for model-agnostic queries. It turns out that FO(SUM), still restricting to networks of some fixed depth, can express model-agnostic queries that FO(𝐑) cannot. For example, for any fixed depth d, we will show that FO(SUM) can express the integral of a function given by a ReLU-FNN of depth d. In contrast, we will show that this cannot be done in FO(𝐑) (Theorem 6.1).

The depth of a neural network can be taken as a crude notion of “schema”. Standard relational query languages typically cannot be used without knowledge of the schema of the data. Similarly, we will show that without knowledge of the depth, FO(SUM) cannot express any nontrivial model-agnostic query (Theorem 6.2). Indeed, since FO(SUM) lacks recursion, function evaluation can only be expressed if we know the depth. (Extensions with recursion are one of the many interesting directions for further research.)

When the depth is known, however, for model-agnostic queries, the expressiveness of FO(SUM) exceeds the benchmark of expressiveness provided by FO(𝐑lin). Specifically, we show that every FO(𝐑lin) query over functions representable by ReLU-FNNs is also expressible in FO(SUM) evaluated on the networks directly (Theorem 7.1). This is our main technical result, and can be paraphrased as “SQL can verify neural networks.” The proof involves showing that the required manipulations of higher-dimensional piecewise linear functions, and the construction of cylindrical cell decompositions in ℝⁿ, can all be expressed in FO(SUM). To allow for a modular proof, we also develop the notion of FO(SUM) translation, generalizing the classical notion of first-order interpretations [16].

This paper is organized as follows. Section 2 provides preliminaries on neural networks. Section 3 introduces FO(𝐑). Section 4 introduces weighted structures and FO(SUM), after which Section 5 introduces white-box querying. Section 6 considers model-agnostic queries. Section 7 presents the main technical result. Section 8 concludes with a discussion of topics for further research.

A full version of this paper with full proofs is available [14].

2 Preliminaries on neural networks

A feedforward neural network [11], in general, could be defined as a finite, directed, weighted, acyclic graph, with some additional aspects which we discuss next. The nodes are also referred to as neurons or units. Some of the source nodes are designated as inputs, and some of the sink nodes are designated as outputs. Both the inputs, and the outputs, are linearly ordered. Neurons that are neither inputs nor outputs are said to be hidden. All nodes, except for the inputs, carry a weight, a real value, called the bias. All directed edges also carry a weight.

In this paper, we focus on ReLU-FNNs: networks with ReLU activations and linear outputs. This means the following. Let 𝒩 be a neural network with m inputs. Then every node u in 𝒩 represents a function F_u^𝒩 : ℝ^m → ℝ defined as follows. We proceed inductively based on some topological ordering of 𝒩. For input nodes u, simply F_u^𝒩(x₁,…,x_m) := x_i, if u is the ith input node. Now let u be a hidden neuron and assume F_v^𝒩 is already defined for all predecessors v of u, i.e., nodes v with an edge to u. Let v₁,…,v_l be these predecessors, let w₁,…,w_l be the weights on the respective edges, and let b be the bias of u. Then

F_u^𝒩(𝒙) := ReLU(b + Σ_i w_i · F_{v_i}^𝒩(𝒙)),

where ReLU : ℝ → ℝ : z ↦ max(0,z).

Finally, for an output node u, we define F_u^𝒩 similarly to hidden neurons, except that the application of ReLU is omitted. The upshot is that a neural network 𝒩 with m inputs and n outputs u₁,…,u_n represents a function F^𝒩 : ℝ^m → ℝ^n mapping 𝒙 to (F_{u₁}^𝒩(𝒙),…,F_{u_n}^𝒩(𝒙)). For any node u in the network, F_u^𝒩 is always a continuous piecewise linear function. We denote the class of all continuous piecewise linear functions F : ℝ^m → ℝ by 𝒫(m); that is, continuous functions F that admit a partition of ℝ^m into finitely many polytopes such that F is affine linear on each of them.
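To make the inductive definition of F_u^𝒩 concrete, the following Python sketch evaluates a ReLU-FNN given as a weighted DAG by visiting the nodes in a topological order; the adjacency-list encoding and all names are illustrative and not taken from the paper.

```python
# A minimal sketch, assuming the network is given by a topological order of its
# nodes, predecessor lists with edge weights, biases, and ordered input/output nodes.
def eval_network(topo_order, preds, bias, inputs, outputs, x):
    """preds[u] = list of (v, weight) for the edges v -> u; returns F^N(x)."""
    value = {}
    for u in topo_order:
        if u in inputs:
            value[u] = x[inputs.index(u)]        # the i-th input node returns x_i
        else:
            s = bias[u] + sum(wt * value[v] for v, wt in preds[u])
            value[u] = s if u in outputs else max(0.0, s)   # no ReLU on output nodes
    return tuple(value[u] for u in outputs)
```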

Hidden layers.

Commonly, the hidden neurons are organized in disjoint blocks called layers. The layers are ordered, such that the neurons in the first layer have only edges from inputs, and the neurons in any later layer have only edges from neurons in the previous layer. Finally, outputs have only edges from neurons in the last layer.

We will use 𝐅(m,ℓ) to denote the class of layered networks with m inputs of depth ℓ, that is, with an input layer with m nodes, ℓ−1 hidden layers, and an output layer with a single node. Recall that the nodes in all hidden layers use ReLU activations and the output node uses the identity function.

It is easy to see that networks in 𝐅(1,1) just compute linear functions and that for every ℓ ≥ 2 we have {F^𝒩 ∣ 𝒩 ∈ 𝐅(1,ℓ)} = 𝒫(1), that is, the class of functions that can be computed by a network in 𝐅(1,ℓ) is the class of all continuous piecewise linear functions. The well-known Universal Approximation Theorem [9, 17] says that every continuous function f : K → ℝ defined on a compact domain K ⊆ ℝ^m can be approximated to any additive error by a network in 𝐅(m,2).
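To illustrate one direction of this equality (every continuous piecewise linear function ℝ → ℝ is computed by some network in 𝐅(1,2)), here is a hedged Python sketch of the standard construction from breakpoints and slopes; the function and variable names are illustrative.

```python
# A minimal sketch, assuming the CPWL function is given by its breakpoints, the
# slopes of its successive pieces (len(slopes) == len(breaks) + 1), and its value
# y0 at the first breakpoint.
def cpwl_to_network(breaks, slopes, y0):
    # The global linear part slopes[0]*x is written as slopes[0]*(ReLU(x) - ReLU(-x));
    # each breakpoint b contributes its slope change via a unit computing ReLU(x - b).
    w_in  = [1.0, -1.0] + [1.0] * len(breaks)
    b_hid = [0.0,  0.0] + [-b for b in breaks]
    w_out = [slopes[0], -slopes[0]] + [s1 - s0 for s0, s1 in zip(slopes, slopes[1:])]
    b_out = y0 - slopes[0] * breaks[0]
    return w_in, b_hid, w_out, b_out

def eval_net(w_in, b_hid, w_out, b_out, x):
    """One input, one hidden ReLU layer, linear output."""
    return b_out + sum(wo * max(0.0, wi * x + bh)
                       for wi, bh, wo in zip(w_in, b_hid, w_out))
```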

3 A black-box query language

First-order logic over the reals, denoted here by FO(𝐑), is, syntactically, just first-order logic over the vocabulary of elementary arithmetic, i.e., with binary function symbols + and · for addition and multiplication, binary predicate <, and constant symbols 0 and 1 [5]. Constants for rational numbers, or even algebraic numbers, can be added as an abbreviation (since they are definable in the logic).

The fragment FO(𝐑lin) of linear formulas uses multiplication only for scalar multiplication, i.e., multiplication of variables with rational number constants. For example, the formula y = 3x₁ − 4x₂ + 7 is linear, but the formula y = 5x₁x₂ − 3 is not. In practice, linear queries are often sufficiently expressive, as witnessed both by earlier applications to temporal and spatial data [21] and by the queries on neural networks in the examples to follow. The only caveat is that many applications assume a distance function on vectors. When using distances based on absolute value differences between real numbers, e.g., the Manhattan distance or the max norm, we still fall within FO(𝐑lin).

We will add to FO(𝐑) extra relation or function symbols; in this paper, we will mainly consider FO(𝐑,F), which is FO(𝐑) with an extra function symbol F. The structure on the domain of reals, with the arithmetic symbols having their obvious interpretation, will be denoted here by 𝐑. Semantically, for any vocabulary τ of extra relation and function symbols, FO(𝐑,τ) formulas are interpreted over structures that expand 𝐑 with additional relations and functions on ℝ of the right arities, that interpret the symbols in τ. In this way, FO(𝐑,F) expresses queries about functions F : ℝ^m → ℝ.

This language can express a wide variety of properties (queries) considered in interpretable machine learning and neural-network verification. Let us see some examples.

Example 3.1.

To check whether F : ℝ^m → ℝ is robust around an m-vector 𝐚 [32], using parameters ϵ and δ, we can write the formula ∀𝐱 (d(𝐱,𝐚) < ϵ → |F(𝐱) − F(𝐚)| < δ). Here 𝐱 stands for a tuple of m variables, and d stands for some distance function which is assumed to be expressible.

Example 3.2.

Counterfactual explanation methods [37] aim to find the closest 𝐱 to an input 𝐚 such that F(𝐱) is “expected,” assuming that F(𝐚) was unexpected. A typical example is credit denial; what should we change minimally to be granted credit? Typically we can define expectedness by some formula, e.g., F(𝐱) > 0.9. Then we can express the counterfactual explanation as F(𝐱) > 0.9 ∧ ∀𝐲 (F(𝐲) > 0.9 → d(𝐱,𝐚) ≤ d(𝐲,𝐚)).

Example 3.3.

We may define the contribution of an input feature i on an input 𝐚 = (a₁,…,a_m) as the inverse of the smallest change we have to make to that feature for the output to change significantly. We can express that r is such a change by writing (taking i = 1 for clarity) r > 0 ∧ (d(F(a₁−r, a₂,…,a_m), F(𝐚)) > ϵ ∨ d(F(a₁+r, a₂,…,a_m), F(𝐚)) > ϵ). Denoting this formula by 𝑐ℎ𝑎𝑛𝑔𝑒(r), the smallest change is then expressed as 𝑐ℎ𝑎𝑛𝑔𝑒(r) ∧ ∀r′ (𝑐ℎ𝑎𝑛𝑔𝑒(r′) → r ≤ r′).

Example 3.4.

We finally illustrate that FO(𝐑,F) can express gradients and many other notions from calculus. For simplicity assume F to be unary. Consider the definition F′(c) = lim_{x→c} (F(x) − F(c))/(x − c) of the derivative in a point c. So it suffices to show how to express that l = lim_{x→c} G(x) for a function G that is continuous in c. We can write down the textbook definition literally as ∀ϵ > 0 ∃δ > 0 ∀x (|x − c| < δ → |G(x) − l| < ϵ).

Evaluating FO(𝐑) queries.

Black box queries can be effectively evaluated using the decidability and quantifier elimination properties of FO(𝐑). This is the constraint query language approach [18, 21], which we briefly recall next.

A function f : ℝ^m → ℝ is called semialgebraic [5] (or semilinear) if there exists an FO(𝐑) (or FO(𝐑lin)) formula φ(x₁,…,x_m,y) such that for any m-vector 𝒂 and real value b, we have 𝐑 ⊧ φ(𝒂,b) if and only if f(𝒂) = b.

Now consider the task of evaluating an FO(𝐑,F) formula ψ on a semialgebraic function f, given by a defining formula φ. By introducing auxiliary variables, we may assume that the function symbol F is used in ψ only in subformulas of the form z = F(u₁,…,u_m). Then replace in ψ each such subformula by φ(u₁,…,u_m,z), obtaining a pure FO(𝐑) formula χ.

Now famously, the first-order theory of ℝ is decidable [33, 5]. In other words, there is an algorithm that decides, for any FO(𝐑) formula χ(x₁,…,x_k) and k-vector 𝒄, whether 𝐑 ⊧ χ(𝒄). Actually, a stronger property holds, to the effect that every FO(𝐑) formula is equivalent to a quantifier-free formula. The upshot is that there is an algorithm that, given an FO(𝐑,F) query ψ(x₁,…,x_k) and a semialgebraic function f given by a defining formula, outputs a quantifier-free formula defining the result set {𝒄 ∈ ℝ^k ∣ 𝐑,f ⊧ ψ(𝒄)}. If f is given by a quantifier-free formula, the evaluation can be done in polynomial time in the length of the description of f, i.e., with polynomial-time data complexity. This is because there are algorithms for quantifier elimination with complexity p(n)^{e(q)}, where n is the size of the formula, p is a polynomial, q is the number of quantifiers, and e is a doubly exponential function [18, 5].

Complexity.

Of course, we want to evaluate queries on the functions represented by neural networks. From the definition given in Section 2, it is clear that the functions representable by ReLU-FNNs are always semialgebraic (actually, semilinear). For every output feature j, it is straightforward to compile, from the network, a quantifier-free formula defining the jth output component function. In this way we see that FO(𝐑,F) queries on ReLU-FNNs are, in principle, computable in polynomial time.

However, the algorithms are notoriously complex, and we stress again that FO(𝐑,F) should be mostly seen as a declarative benchmark of expressiveness. Moreover, we assume here for convenience that ReLU is a primitive function. ReLU can be expressed in FO(𝐑) using disjunction, but this may blow up the query formula, e.g., when converting to disjunctive normal form [2]. Symbolic constraint solving algorithms for the reals have been extended to deal with ReLU natively [2].
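As a concrete illustration of this constraint-solving view (not of the quantifier-elimination algorithms themselves), the following hedged sketch uses the SMT solver z3, which handles linear real arithmetic and if-then-else terms, to check the robustness property of Example 3.1 on a tiny hand-made network by searching for a counterexample; the network and all parameter values are made up for illustration.

```python
# A minimal sketch, assuming z3 is installed (pip install z3-solver).
from z3 import Real, If, And, Or, Solver, sat

def relu(e):
    return If(e > 0, e, 0)

def F(v):                                   # one input, two hidden ReLU units
    return 1.0 * relu(2 * v - 1) - 0.5 * relu(3 - v)

x, a = Real('x'), Real('a')
eps, delta = 0.1, 0.5

s = Solver()
s.add(a == 0)
s.add(And(x - a < eps, a - x < eps))                    # |x - a| < eps
s.add(Or(F(x) - F(a) >= delta, F(a) - F(x) >= delta))   # |F(x) - F(a)| >= delta
if s.check() == sat:                                     # a counterexample exists
    print("not robust:", s.model())
else:
    print("robust around a = 0 for these eps and delta")
```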

 Remark 3.5.

In closing this section, we remark that, to query the entire network function, we would strictly speaking not use just a single function symbol F, but rather the language FO(𝐑,F₁,…,F_n), with function symbols for the n outputs. In this paper, for the sake of clarity, we will often stick to a single output, but our treatment generalizes to multiple outputs.

4 Weighted structures and FO(SUM)

Weighted structures are standard abstract structures equipped with one or more weight functions from tuples of domain elements to values from some separate, numerical domain. Here, as numerical domain, we will use ℝ_⊥ = ℝ ∪ {⊥}, the set of “lifted reals”, where ⊥ is an extra element representing an undefined value. Neural networks are weighted graph structures. Hence, since we are interested in declarative query languages for neural networks, we are interested in logics over weighted structures. Such logics were introduced by Grädel and Gurevich [12]. We consider here a concrete instantiation of their approach, which we denote by FO(SUM).

Recall that a (finite, relational) vocabulary is a finite set of function symbols and relation symbols, where each symbol comes with an arity (a natural number). We extend the notion of vocabulary to also include a number of weight function symbols, again with associated arities. We allow 0-ary weight function symbols, which we call weight constant symbols.

A (finite) structure 𝒜 over such a vocabulary Υ consists of a finite domain A, and functions and relations on A of the right arities, interpreting the standard function symbols and relation symbols from Υ. So far this is standard. Now additionally, 𝒜 interprets every weight function symbol w, of arity k, by a function w^𝒜 : A^k → ℝ_⊥.

The syntax of FO(SUM) formulas (over some vocabulary) is defined exactly as for standard first-order logic, with one important extension. In addition to formulas (taking Boolean values) and standard terms (taking values in the structure), the logic contains weight terms taking values in ℝ_⊥. Weight terms t are defined by the following grammar:

t ::= ⊥ ∣ w(s₁,…,s_n) ∣ r(t,…,t) ∣ if φ then t else t ∣ Σ_{𝒙:φ} t

Here, w is a weight function symbol of arity n and the si are standard terms; r is a rational function applied to weight terms, with rational coefficients; φ is a formula; and 𝒙 is a tuple of variables. The syntax of weight terms and formulas is mutually recursive. As just seen, the syntax of formulas φ is used in the syntax of weight terms; conversely, weight terms t1 and t2 can be combined to form formulas t1=t2 and t1<t2.

Recall that a rational function is a fraction between two polynomials. Thus, the arithmetic operations that we consider are addition, scalar multiplication by a rational number, multiplication, and division.

The free variables of a weight term are defined as follows. The weight term ⊥ has no free variables. The free variables of w(s₁,…,s_n) are simply the variables occurring in the s_i. A variable occurs free in r(t₁,…,t_n) if it occurs free in some t_i. A variable occurs free in ‘if φ then t₁ else t₂’ if it occurs free in t₁, t₂, or φ. The free variables of Σ_{𝒙:φ} t are those of φ and t, except for the variables in 𝒙. A formula or (weight) term is closed if it has no free variables.

We can evaluate a weight term t(x₁,…,x_k) on a structure 𝒜 and a tuple 𝒂 ∈ A^k providing values to the free variables. The result of the evaluation, denoted by t^{𝒜,𝒂}, is a value in ℝ_⊥, defined in the obvious manner. In particular, when t is of the form Σ_{𝒚:φ} t′, we have

t^{𝒜,𝒂} = Σ_{𝒃 : 𝒜 ⊧ φ(𝒂,𝒃)} (t′)^{𝒜,𝒂,𝒃}.

Division by zero, which can happen when evaluating terms of the form r(t,…,t), is given the value ⊥. The arithmetical operations are extended so that x + ⊥, q · ⊥ (scalar multiply), x · ⊥, and x/⊥ and ⊥/x always equal ⊥. Also, ⊥ < a holds for all a.
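To make the semantics of summation terms concrete, here is a toy Python evaluator for a term Σ_{𝒙:φ} t over a finite weighted structure, with the condition φ and the body t passed as Python callables; this encoding is purely illustrative and not part of the paper.

```python
# A minimal sketch: sum the value of t over all tuples of domain elements that
# satisfy the condition phi, propagating the undefined value ⊥ (here None).
import itertools

BOT = None                                   # plays the role of ⊥

def eval_sum(domain, k, phi, t):
    """Value of the summation term over tuples (x_1, ..., x_k) with phi(x_1, ..., x_k)."""
    total = 0.0
    for tup in itertools.product(domain, repeat=k):
        if phi(*tup):
            v = t(*tup)
            if v is BOT:                     # x + ⊥ = ⊥, so ⊥ absorbs the whole sum
                return BOT
            total += v
    return total
```

For instance, the summation term in the formula from the Introduction would take k = 1, φ(x) = E(in,x), and t(x) = w(x,out) · ReLU(w(in,x) · 𝑣𝑎𝑙 + b(x)).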

5 White-box querying

For any natural numbers m and n, we introduce a vocabulary for neural networks with m inputs and n outputs. We denote this vocabulary by Υnet(m,n), or just Υnet if m and n are understood. It has a binary relation symbol E for the edges; constant symbols in1, …, inm and out1, …, outn for the input and output nodes; a unary weight function b for the biases, and a binary weight function symbol w for the weights on the edges.

Any ReLU-FNN 𝒩, being a weighted graph, is an Υnet-structure in the obvious way. When there is no edge from node u₁ to u₂, we put w^𝒩(u₁,u₂) = 0. Since inputs have no bias, we put b^𝒩(u) = ⊥ for any input u.

Depending on the application, we may want to enlarge Υnet with some additional parameters. For example, we can use additional weight constant symbols to provide input values to be evaluated, or output values to be compared with, or interval bounds, etc.

The logic FO(SUM) over the vocabulary Υnet (possibly enlarged as just mentioned) serves as a “white-box” query language for neural networks, since the entire model is given and can be directly queried, just like an SQL query can be evaluated on a given relational database. Contrast this with the language FO(𝐑,F) from Section 3, which only has access to the function F represented by the network, as a black box.

Example 5.1.

While the language FO(𝐑,F) cannot see inside the model, at least it has direct access to the function represented by the model. When we use the language FO(SUM), we must compute this function ourselves. At least when we know the depth of the network, this is indeed easy. In the Introduction, we already showed a weight term expressing the evaluation of a one-layer neural network on a single input and output. We can easily generalize this to a weight term expressing the value of any of a fixed number of outputs, with any fixed number m of inputs, and any fixed number of layers. Let 𝑣𝑎𝑙₁, …, 𝑣𝑎𝑙_m be additional weight constant symbols representing input values. Then the weight term ReLU(b(u) + w(in₁,u) · 𝑣𝑎𝑙₁ + ⋯ + w(in_m,u) · 𝑣𝑎𝑙_m) expresses the value of any neuron u in the first hidden layer (u is a variable). Denote this term by t₁(u). Next, for any subsequent layer numbered l > 1, we inductively define the weight term t_l(u) as

ReLU(b(u) + Σ_{x : E(x,u)} w(x,u) · t_{l−1}(x)).

Here, ReLU(c) can be taken to be the weight term ‘if c > 0 then c else 0’. Finally, the value of the jth output is given by the weight term 𝑒𝑣𝑎𝑙_j := b(out_j) + Σ_{x : E(x,out_j)} w(x,out_j) · t_l(x), where l is the number of the last hidden layer.
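Read operationally, these weight terms are nested sums over neurons. The sketch below mirrors them in Python for a layered network given through its graph relation and weight functions (E, w, b), as in the vocabulary Υnet; the concrete data layout and names are only illustrative.

```python
# A minimal sketch, assuming network = {'E': set of edges, 'w': edge-weight dict,
# 'b': bias dict, 'inputs': [...], 'outputs': [...], 'layers': {1: [...], 2: [...], ...}}.
def t(network, layer, u, vals):
    """Value of hidden neuron u in the given layer, mirroring the term t_l(u)."""
    E, w, b, inputs = network['E'], network['w'], network['b'], network['inputs']
    if layer == 1:
        s = b[u] + sum(w[(inp, u)] * vals[i] for i, inp in enumerate(inputs))
    else:
        s = b[u] + sum(w[(x, u)] * t(network, layer - 1, x, vals)
                       for x in network['layers'][layer - 1] if (x, u) in E)
    return max(0.0, s)                        # ReLU(c) = "if c > 0 then c else 0"

def eval_output(network, j, vals):
    """Value of output j, mirroring the term eval_j."""
    E, w, b = network['E'], network['w'], network['b']
    out, last = network['outputs'][j], max(network['layers'])
    return b[out] + sum(w[(x, out)] * t(network, last, x, vals)
                        for x in network['layers'][last] if (x, out) in E)
```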

Example 5.2.

We can also look for useless neurons: neurons that can be removed from the network without altering the output too much on given values. Recall the weight term 𝑒𝑣𝑎𝑙_j from the previous example; for clarity we just write 𝑒𝑣𝑎𝑙. Let z be a fresh variable, and let 𝑒𝑣𝑎𝑙′ be the term obtained from 𝑒𝑣𝑎𝑙 by altering the summing conditions E(x,u) and E(x,out) by adding the conjunct x ≠ z. Then the formula |𝑒𝑣𝑎𝑙 − 𝑒𝑣𝑎𝑙′| < ϵ expresses that z is useless. (For |c| we can take the weight term ‘if c > 0 then c else −c’.)
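For the one-hidden-layer networks of Example 5.1, the same query can be sketched in a few lines of Python by re-evaluating the network with one hidden neuron skipped; the parameter layout is illustrative.

```python
# A minimal sketch: neuron u is "useless" on input x if dropping it changes the
# output by less than eps.
def useless_neurons(w_in, b_hid, w_out, b_out, x, eps):
    def output(skip=None):
        return b_out + sum(wo * max(0.0, wi * x + bh)
                           for u, (wi, bh, wo) in enumerate(zip(w_in, b_hid, w_out))
                           if u != skip)
    full = output()
    return [u for u in range(len(b_hid)) if abs(output(skip=u) - full) < eps]
```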

Another interesting example is computing integrals. Recall that 𝐅(m,ℓ) is the class of networks with m inputs, one output, and depth ℓ.

Lemma 5.3.

Let m and ℓ be natural numbers. There exists an FO(SUM) term t over Υnet(m,1) with m additional pairs of weight constant symbols 𝑚𝑖𝑛_i and 𝑚𝑎𝑥_i for i ∈ {1,…,m}, such that for any network 𝒩 in 𝐅(m,ℓ), and values a_i and b_i for the 𝑚𝑖𝑛_i and 𝑚𝑎𝑥_i, we have t^{𝒩,a₁,b₁,…,a_m,b_m} = ∫_{a₁}^{b₁} ⋯ ∫_{a_m}^{b_m} F^𝒩 dx₁ ⋯ dx_m.

Proof (sketch).

We sketch here a self-contained and elementary proof for m = 1 and ℓ = 2 (one input, one hidden layer). This case already covers all continuous piecewise linear functions ℝ → ℝ.

Every hidden neuron u may represent a “quasi breakpoint” in the piecewise linear function (that is, a point where its slope may change). Concretely, we consider the hidden neurons with nonzero input weights, to avoid dividing by zero. The breakpoint’s x-coordinate is given by the weight term 𝑏𝑟𝑒𝑎𝑘_x(u) := −b(u)/w(in₁,u). The y-value at the breakpoint is then given by 𝑏𝑟𝑒𝑎𝑘_y(u) := 𝑒𝑣𝑎𝑙₁(𝑏𝑟𝑒𝑎𝑘_x(u)), where 𝑒𝑣𝑎𝑙₁ is the weight term from Example 5.1 and we substitute 𝑏𝑟𝑒𝑎𝑘_x(u) for 𝑣𝑎𝑙₁.

Pairs (u1,u2) of neurons representing successive breakpoints are easy to define by a formula succ(u1,u2). Such pairs represent the pieces of the function, except for the very first and very last pieces. For this proof sketch, assume we simply want the integral between the first breakpoint and the last breakpoint.

The area (positive or negative) contributed to the integral by the piece (u₁,u₂) is easy to write as a weight term: 𝑎𝑟𝑒𝑎(u₁,u₂) = ½ (𝑏𝑟𝑒𝑎𝑘_y(u₁) + 𝑏𝑟𝑒𝑎𝑘_y(u₂)) · (𝑏𝑟𝑒𝑎𝑘_x(u₂) − 𝑏𝑟𝑒𝑎𝑘_x(u₁)). We sum these to obtain the desired integral. However, since different neurons may represent the same quasi breakpoint, we must divide by the number of duplicates. Hence, our desired term t equals Σ_{u₁,u₂ : succ(u₁,u₂)} 𝑎𝑟𝑒𝑎(u₁,u₂) / (Σ_{u₁′,u₂′ : γ} 1), where γ is the formula succ(u₁′,u₂′) ∧ 𝑏𝑟𝑒𝑎𝑘_x(u₁′) = 𝑏𝑟𝑒𝑎𝑘_x(u₁) ∧ 𝑏𝑟𝑒𝑎𝑘_x(u₂′) = 𝑏𝑟𝑒𝑎𝑘_x(u₂).
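The following Python sketch carries out the same computation numerically for a one-input, one-hidden-layer network, summing trapezoid areas between successive breakpoints; it merges duplicate breakpoints directly instead of dividing by a duplicate count as the weight term does, and all names are illustrative.

```python
# A minimal sketch of the m = 1, depth-2 case of Lemma 5.3.
def eval_net(w_in, b_hid, w_out, b_out, x):
    """One input, one hidden ReLU layer, linear output."""
    return b_out + sum(wo * max(0.0, wi * x + bh)
                       for wi, bh, wo in zip(w_in, b_hid, w_out))

def integral_between_breakpoints(w_in, b_hid, w_out, b_out):
    # Quasi breakpoints: -b(u)/w(in,u) for hidden neurons u with nonzero input weight.
    xs = sorted({-bh / wi for wi, bh in zip(w_in, b_hid) if wi != 0.0})
    total = 0.0
    for x1, x2 in zip(xs, xs[1:]):
        y1 = eval_net(w_in, b_hid, w_out, b_out, x1)
        y2 = eval_net(w_in, b_hid, w_out, b_out, x2)
        total += 0.5 * (y1 + y2) * (x2 - x1)     # exact area of one linear piece
    return total
```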

Example 5.4.

A popular alternative to Example 3.3 for measuring the contribution of an input feature i to an input 𝐲=(y1,,ym) is the Shap score [27]. It assumes a probability distribution on the input space and quantifies the change to the expected value of F𝒩 caused by fixing input feature i to yi in a random fixation order of the input features:

Shap(i) = Σ_{I ⊆ {1,…,m}∖{i}} (|I|! (m − 1 − |I|)! / m!) · (𝔼(F^𝒩(𝒙) ∣ 𝒙_{I∪{i}} = 𝒚_{I∪{i}}) − 𝔼(F^𝒩(𝒙) ∣ 𝒙_I = 𝒚_I)).

When we assume that the distribution is the product of uniform distributions over the intervals (a_j,b_j), we can write the conditional expectation 𝔼(F^𝒩(𝒙) ∣ 𝒙_J = 𝒚_J) for some J ⊆ {1,…,m}, by setting {1,…,m} ∖ J =: {j₁,…,j_r}, as follows.

𝔼(F^𝒩(𝒙) ∣ 𝒙_J = 𝒚_J) = (Π_{k=1}^{r} 1/(b_{j_k} − a_{j_k})) ∫_{a_{j₁}}^{b_{j₁}} ⋯ ∫_{a_{j_r}}^{b_{j_r}} F^𝒩(𝒙 | 𝒙_J = 𝒚_J) dx_{j_r} ⋯ dx_{j₁}

where 𝒙 | 𝒙_J = 𝒚_J is a short notation for the tuple of variables obtained from 𝒙 by replacing x_j with y_j for all j ∈ J. With Lemma 5.3, this conditional expectation can be expressed in FO(SUM), and by taking J to be I or I ∪ {i}, respectively, we can express the Shap score.
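For orientation, the displayed subset sum can be written out directly; the sketch below computes the Shap score of feature i for any black-box function F, estimating the conditional expectations under the product of uniform distributions by Monte Carlo sampling. All names are illustrative and the sampling is only an approximation of the exact integrals.

```python
# A minimal sketch, assuming F takes a list of m feature values, y is the input
# to explain, and bounds[j] = (a_j, b_j) for each feature j.
import itertools, math, random

def cond_expectation(F, y, fixed, bounds, samples=10_000):
    """Estimate E[F(x) | x_J = y_J] for J = fixed, x_j ~ Uniform(a_j, b_j) elsewhere."""
    m = len(y)
    total = 0.0
    for _ in range(samples):
        x = [y[j] if j in fixed else random.uniform(*bounds[j]) for j in range(m)]
        total += F(x)
    return total / samples

def shap(F, y, i, bounds, samples=10_000):
    m = len(y)
    others = [j for j in range(m) if j != i]
    score = 0.0
    for r in range(m):
        for I in itertools.combinations(others, r):
            weight = math.factorial(r) * math.factorial(m - 1 - r) / math.factorial(m)
            score += weight * (cond_expectation(F, y, set(I) | {i}, bounds, samples)
                               - cond_expectation(F, y, set(I), bounds, samples))
    return score
```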

More examples.

Our main result will be that, over networks of a given depth, all of FO(𝐑lin,F) can be expressed in FO(SUM). So the examples from Section 3 (which are linear if a Manhattan or max distance is used) apply here as well. Moreover, the techniques by which we show our main result readily adapt to queries not about the final function F represented by the network, but about the function Fz represented by a neuron z given as a parameter to the query, much as in Example 5.2. For example, in feature visualization [8] we want to find the input that maximizes the activation of some neuron z. Since this is expressible in FO(𝐑lin,F), it is also expressible in FO(SUM).

6 Model-agnostic queries

We have already indicated that FO(𝐑,F) is “black box” while FO(SUM) is “white box”. Black-box queries are commonly called model agnostic [8]. Some FO(SUM) queries may, and others may not, be model agnostic.

Formally, for some ℓ ≥ 1, let us call a closed FO(SUM) formula φ, possibly using weight constants c₁,…,c_k, depth-ℓ model agnostic if for all m ≥ 1, all neural networks 𝒩, 𝒩′ ∈ ⋃_{i=1}^{ℓ} 𝐅(m,i) such that F^𝒩 = F^{𝒩′}, and all a₁,…,a_k, we have 𝒩, a₁,…,a_k ⊧ φ if and only if 𝒩′, a₁,…,a_k ⊧ φ. A similar definition applies to closed FO(SUM) weight terms.

For example, the term of Example 5.1 evaluating the function of a neural network of depth at most ℓ is depth-ℓ model agnostic. By comparison, the formula stating that a network has useless neurons (cf. Example 5.2) is not model agnostic. The term t from Lemma 5.3, computing the integral, is depth-ℓ model agnostic.

Theorem 6.1.

The query ∫₀¹ f = 0 for functions f ∈ 𝒫(1) is expressible by a depth-2 model agnostic FO(SUM) formula, but not in FO(𝐑,F).

Proof.

We have already seen the expressibility in FO(SUM). We prove nonexpressibility in FO(𝐑,F).

Consider the equal-cardinality query Q about disjoint pairs (S1,S2) of finite sets of reals, asking whether |S1|=|S2|. Over abstract ordered finite structures, equal cardinality is well-known not to be expressible in order-invariant first-order logic [24]. Hence, by the generic collapse theorem for constraint query languages over the reals [21, 24], query Q is not expressible in FO(𝐑,S1,S2).

Figure 1: The function f_{S1,S2} of the proof of Theorem 6.1, for the set S1 consisting of the three red points and the set S2 consisting of the three white points.

Now for any given S1 and S2, we construct a continuous piecewise linear function f_{S1,S2} as follows. We first apply a suitable affine transformation so that S1 ∪ S2 falls within the open interval (0,1). Now f_{S1,S2} is a sawtooth-like function, with positive teeth at elements from S1, negative teeth (of the same height, say 1) at elements from S2, and zero everywhere else. To avoid teeth that overlap the zero boundary at the left or that overlap each other, we make them of width min{m,M}/2, where m is the minimum of S1 ∪ S2 and M is the minimum distance between any two distinct elements in S1 ∪ S2.

Expressing the above construction uniformly in FO(𝐑,S1,S2) poses no difficulties; let ψ(x,y) be a formula defining f_{S1,S2}. Now assume, for the sake of contradiction, that ∫₀¹ F = 0 would be expressible by a closed FO(𝐑,F) formula φ. Then composing φ with ψ would express query Q in FO(𝐑,S1,S2). Indeed, clearly, ∫₀¹ f_{S1,S2} = 0 if and only if |S1| = |S2|. So, φ cannot exist.
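A small numeric sketch of this construction: after S1 ∪ S2 has been scaled into (0,1), every tooth is a triangle of the same base width and height 1, so the integral of f_{S1,S2} over [0,1] is zero exactly when |S1| = |S2|. The code below, with illustrative names, computes that integral from the teeth.

```python
# A minimal sketch, assuming S1 and S2 are disjoint finite sets of reals in (0,1).
def sawtooth_integral(S1, S2):
    pts = sorted(S1 | S2)
    m = min(pts)                                           # distance to the left boundary
    M = min(b - a for a, b in zip(pts, pts[1:])) if len(pts) > 1 else m
    width = min(m, M) / 2
    # Each tooth is a triangle of base `width` and height 1, pointing up for S1
    # and down for S2, so it contributes +width/2 or -width/2 to the integral.
    return sum(0.5 * width * (1 if p in S1 else -1) for p in pts)

assert abs(sawtooth_integral({0.2, 0.5}, {0.7, 0.9})) < 1e-12
assert abs(sawtooth_integral({0.2, 0.5, 0.6}, {0.9})) > 1e-12
```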

It seems awkward that in the definition of model agnosticity we need to bound the depth. Let us call an FO(SUM) term or formula fully model agnostic if it is depth-ℓ model agnostic for every ℓ. It turns out that there are no nontrivial fully model agnostic FO(SUM) formulas.

Theorem 6.2.

Let φ be a fully model agnostic closed FO(SUM) formula over Υnet(m,1). Then either 𝒩 ⊧ φ for all 𝒩 ∈ ⋃_{ℓ≥1} 𝐅(m,ℓ), or 𝒩 ⊭ φ for all 𝒩 ∈ ⋃_{ℓ≥1} 𝐅(m,ℓ).

We omit the proof. The idea is that FO(SUM) is Hanf-local [23, 24]. No formula φ can distinguish a long enough structure consisting of two chains whose middle nodes are marked by two distinct constants c1 and c2, from its sibling structure where the markings are swapped. We can turn the two structures into neural networks by replacing the markings by two gadget networks N1 and N2, representing different functions, that φ is supposed to distinguish. However, the construction is done so that the function represented by each structure is the one represented by the gadget in its left chain. Still, FO(SUM) cannot distinguish these two structures. So, φ is either not fully model agnostic, or N1 and N2 cannot exist and φ is trivial.

Corollary 6.3.

The FO(𝐑,F) query F(0)=0 is not expressible in FO(SUM).

7 From FO(𝐑𝐥𝐢𝐧) to FO(SUM)

In practice, the number of layers in the employed neural network architecture is often fixed and known. Our main result then is that FO(SUM) can express all FO(𝐑lin) queries.

Theorem 7.1.

Let m and ℓ be natural numbers. For every closed FO(𝐑lin,F) formula ψ there exists a closed FO(SUM) formula φ such that for every network 𝒩 in 𝐅(m,ℓ), we have 𝐑, F^𝒩 ⊧ ψ iff 𝒩 ⊧ φ.

The challenge in proving this result is to simulate, using quantification and summation over neurons, the unrestricted access to real numbers that is available in FO(𝐑lin). To this end, we will divide the relevant real space into a finite number of cells, which we can represent by finite tuples of neurons.

The proof involves several steps that transform weighted structures. Before presenting the proof, we formalize such transformations in the notion of FO(SUM) translation, which generalize the classical notion of first-order interpretation [16] to weighted structures.

7.1 FO(SUM) translations

Let Υ and Γ be vocabularies for weighted structures, and let n be a natural number. An n-ary FO(SUM) translation φ from Υ to Γ consists of a number of formulas and weight terms over Υ, described next. There are formulas φ_dom(𝒙) and φ_=(𝒙₁,𝒙₂); formulas φ_R(𝒙₁,…,𝒙_k) for every k-ary relation symbol R of Γ; and formulas φ_f(𝒙₀,𝒙₁,…,𝒙_k) for every k-ary standard function symbol f of Γ. Furthermore, there are weight terms φ_w(𝒙₁,…,𝒙_k) for every k-ary weight function symbol w of Γ.

In the above description, bold 𝒙 denote n-tuples of distinct variables. Thus, the formulas and weight terms of φ define relations or weight functions of arities that are a multiple of n.

We say that φ maps a weighted structure 𝒜 over Υ to a weighted structure ℬ over Γ if there exists a surjective function h from φ_dom(𝒜) ⊆ A^n to B such that:

  • h(𝒂₁) = h(𝒂₂) ⟺ 𝒜 ⊧ φ_=(𝒂₁,𝒂₂);

  • (h(𝒂₁),…,h(𝒂_k)) ∈ R^ℬ ⟺ 𝒜 ⊧ φ_R(𝒂₁,…,𝒂_k);

  • h(𝒂₀) = f^ℬ(h(𝒂₁),…,h(𝒂_k)) ⟺ 𝒜 ⊧ φ_f(𝒂₀,𝒂₁,…,𝒂_k);

  • w^ℬ(h(𝒂₁),…,h(𝒂_k)) = φ_w^𝒜(𝒂₁,…,𝒂_k).

In the above, the bold 𝒂 denote n-tuples in φ_dom(𝒜).

For any given 𝒜, if φ maps 𝒜 to ℬ, then ℬ is unique up to isomorphism. Indeed, the elements of B can be understood as representing the equivalence classes of the equivalence relation φ_=(𝒜) on φ_dom(𝒜). In particular, for ℬ to exist, φ must be admissible on 𝒜, which means that φ_=(𝒜) is indeed an equivalence relation on φ_dom(𝒜), and all relations and functions φ_R(𝒜), φ_f(𝒜) and φ_w(𝒜) are invariant under this equivalence relation.

If 𝐊 is a class of structures over Υ, and T is a transformation of structures in 𝐊 to structures over Γ, we say that φ expresses T if φ is admissible on every 𝒜 in 𝐊, and maps 𝒜 to T(𝒜).

The relevant reduction theorem for translations is the following:

Theorem 7.2.

Let φ be an n-ary FO(SUM) translation from Υ to Γ, and let ψ(y₁,…,y_k) be a formula over Γ. Then there exists a formula φ^ψ(𝒙₁,…,𝒙_k) over Υ such that whenever φ maps 𝒜 to ℬ through h, we have ℬ ⊧ ψ(h(𝒂₁),…,h(𝒂_k)) iff 𝒜 ⊧ φ^ψ(𝒂₁,…,𝒂_k). Furthermore, for any weight term t over Γ, there exists a weight term φ^t over Υ such that t^ℬ(h(𝒂₁),…,h(𝒂_k)) = (φ^t)^𝒜(𝒂₁,…,𝒂_k).

Proof (sketch).

As this result is well known and straightforward to prove for classical first-order interpretations, we only deal here with summation terms, which are the main new aspect. Let t be of the form Σ_{y:γ} t′. Then for φ^t we take Σ_{𝒙 : φ^γ} φ^{t′}(𝒙₁,…,𝒙_k,𝒙) / (Σ_{𝒙′ : φ_=(𝒙,𝒙′)} 1).

7.2 Proof of Theorem 7.1

We sketch the proof of Theorem 7.1. For clarity of exposition, we present it first for single inputs, i.e., the case m=1. We present three Lemmas which can be chained together to obtain the theorem.

Piecewise linear functions.

We can naturally model piecewise linear (PWL) functions from ℝ to ℝ as weighted structures, where the elements are simply the pieces. Each piece p is defined by a line y = ax + b and left and right endpoints. Accordingly, we use a vocabulary Υ1pwl with four unary weight functions indicating a, b, and the x-coordinates of the endpoints. (The leftmost piece has no left endpoint and the rightmost piece has no right endpoint; we set the missing x-coordinate to ⊥.)

For m = 1 and ℓ = 2, the proof of the following Lemma is based on the same ideas as in the proof sketch we gave for Lemma 5.3. For m > 1, PWL functions from ℝ^m to ℝ are more complex; the vocabulary Υmpwl and a general proof of the lemma will be described in Section 7.3.

Lemma 7.3.

Let m and ℓ be natural numbers. There is an FO(SUM) translation from Υnet(m,1) to Υmpwl that transforms every network 𝒩 in 𝐅(m,ℓ) into a proper weighted structure representing F^𝒩.

Hyperplane arrangements.

An affine function on ℝ^d is a function of the form a₀ + a₁x₁ + ⋯ + a_dx_d. An affine hyperplane is the set of zeros of some non-constant affine function (i.e., one where at least one of the a_i with i > 0 is non-zero). A hyperplane arrangement is a collection of affine hyperplanes.

We naturally model a hyperplane arrangement as a weighted structure, where the elements are the hyperplanes. The vocabulary Υdarr simply consists of unary weight functions a0, a1, …, ad indicating the coefficients of the affine function defining each hyperplane.

 Remark 7.4.

An Υdarr-structure may have duplicates, i.e., different elements representing the same hyperplane. This happens when they have the same coefficients up to a constant factor. In our development, we will allow structures with duplicates as representations of hyperplane arrangements.

Cylindrical decomposition.

We will make use of a linear version of the notion of cylindrical decomposition (CD) [5], which we call affine CD. An affine CD of ℝ^d is a sequence 𝒟 = 𝒟₀,…,𝒟_d, where each 𝒟_i is a partition of ℝ^i. The blocks of partition 𝒟_i are referred to as i-cells or simply cells. The precise definition is by induction on d. For the base case, there is only one possibility 𝒟₀ = {ℝ⁰}. Now let d > 0. Then 𝒟₀,…,𝒟_{d−1} should already be an affine CD of ℝ^{d−1}. Furthermore, for every cell C of 𝒟_{d−1}, there must exist finitely many affine functions ξ₁, …, ξ_r from ℝ^{d−1} to ℝ, where r may depend on C. These are called the section mappings above C, and must satisfy ξ₁ < ⋯ < ξ_r on C. In this way, the section mappings induce a partition of the cylinder C × ℝ into sections and sectors. Each section is the graph of a section mapping, restricted to C. Each sector is the volume above C between two consecutive sections. Now 𝒟_d must consist of all sections and sectors above the cells C ∈ 𝒟_{d−1}.

The ordered sequence of cells formed by the sections and sectors above C is called the stack above C, and C is called the base cell for these cells.

An affine CD of ℝ^d is compatible with a hyperplane arrangement 𝒜 in ℝ^d if every d-cell C lies entirely on, above, or below every hyperplane h = 0 in 𝒜. (Formally, the affine function h is everywhere zero, everywhere positive, or everywhere negative on C.)

We can represent a CD compatible with a hyperplane arrangement as a weighted structure with elements of two sorts: cells and hyperplanes. There is a constant o for the “origin cell” ℝ⁰. Binary relations link every (i+1)-cell to its base i-cell, and to its delineating section mappings. (Sections are viewed as degenerate sectors where the two delineating section mappings are identical.) Ternary relations give the order of two hyperplanes in ℝ^{i+1} above an i-cell, and whether they are equal. The vocabulary for CDs of ℝ^d is denoted by Υdcell.

Lemma 7.5.

Let d be a natural number. There is an FO(SUM) translation from Υdarr to Υdcell that maps any hyperplane arrangement 𝒜 to a CD that is compatible with 𝒜.

Proof (sketch).

We follow the method of vertical decomposition [15]. There is a projection phase, followed by a buildup phase. For the projection phase, let 𝒜_d := 𝒜. For i = d,…,1, take all intersections between hyperplanes in 𝒜_i, and project one dimension down, i.e., project onto the first i−1 components. The result is a hyperplane arrangement 𝒜_{i−1} in ℝ^{i−1}. For the buildup phase, let 𝒟₀ := {ℝ⁰}. For i = 0,…,d−1, build a stack above every cell C in 𝒟_i, formed by intersecting C × ℝ with all hyperplanes in 𝒜_{i+1}. The result is a partition 𝒟_{i+1} such that 𝒟₀,…,𝒟_{i+1} is a CD of ℝ^{i+1} compatible with 𝒜_{i+1}. This algorithm is implementable in FO(SUM).
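To make the two phases concrete, here is a hedged Python sketch of the vertical decomposition for d = 2, i.e., an arrangement of lines a₀ + a₁x + a₂y = 0 in the plane; the data layout, the bounding box used in place of unbounded cells, and all names are simplifications for illustration, not the FO(SUM) construction itself.

```python
# A minimal sketch: projection phase (x-coordinates of pairwise intersections and of
# vertical lines), then buildup phase (stacks of section mappings above each 1-cell).
from itertools import combinations

def vertical_decomposition_2d(lines, lo=-1e6, hi=1e6):
    """lines: list of coefficient triples (a0, a1, a2) with (a1, a2) != (0, 0)."""
    # Projection phase: the arrangement A_1 in R^1.
    xs = set()
    for (a0, a1, a2), (b0, b1, b2) in combinations(lines, 2):
        det = a1 * b2 - a2 * b1
        if det != 0:                                  # the two lines meet in one point
            xs.add((a2 * b0 - a0 * b2) / det)
    for a0, a1, a2 in lines:
        if a2 == 0:                                   # vertical line x = -a0/a1
            xs.add(-a0 / a1)
    xs = sorted(xs)

    # Buildup phase: 1-cells are the points of A_1 and the intervals between them;
    # above each 1-cell, the non-vertical lines act as section mappings, ordered by
    # their value at a sample point of the cell.
    bounds = [lo] + xs + [hi]
    one_cells = []
    for i, x in enumerate(xs):
        one_cells += [('interval', bounds[i], x), ('point', x, x)]
    one_cells.append(('interval', bounds[-2], hi))

    stacks = {}
    for kind, l, r in one_cells:
        sample_x = l if kind == 'point' else (l + r) / 2
        sections = sorted((-(a0 + a1 * sample_x) / a2, (a0, a1, a2))
                          for a0, a1, a2 in lines if a2 != 0)
        stacks[(kind, l, r)] = [line for _, line in sections]
    return stacks
```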

Ordered formulas and cell selection.

Let ψ be the FO(𝐑lin,F) formula under consideration. Let x₁,…,x_d be an enumeration of the set of variables in ψ, free or bound. We may assume that ψ is in prenex normal form Q₁x₁ ⋯ Q_dx_d χ, where each Q_i is ∃ or ∀, and χ is quantifier-free.

We will furthermore assume that ψ is ordered, meaning that every atomic subformula is of the form F(x_{i₁},…,x_{i_m}) = x_j with i₁ < ⋯ < i_m < j, or is a linear constraint of the form a₀ + a₁x₁ + ⋯ + a_dx_d > 0. By using extra variables, every FO(𝐑lin,F) formula can be brought into ordered normal form.

Consider a PWL function f : ℝ → ℝ. Every piece is a segment of a line ax + b = y in ℝ². We define the hyperplane arrangement corresponding to f in d dimensions to consist of all hyperplanes ax_i + b = x_j, for all lines ax + b = y of f, where i < j (in line with the ordered assumption on the formula ψ). We denote this arrangement by 𝒜_f.

Also the query ψ gives rise to a hyperplane arrangement, denoted by 𝒜_ψ, which simply consists of all hyperplanes corresponding to the linear constraints in ψ.

For the following statement, we use the disjoint union of two weighted structures. Such a disjoint union can itself be represented as a weighted structure over the disjoint union of the two vocabularies, with two extra unary relations to distinguish the two domains.

Lemma 7.6.

Let ψ ≡ Q₁x₁ ⋯ Q_dx_d χ be an ordered closed FO(𝐑lin,F) formula with function symbol F of arity m. Let k ∈ {0,…,d}, and let ψ_k be Q_{k+1}x_{k+1} ⋯ Q_dx_d χ. There exists a unary FO(SUM) query over Υmpwl ∪ Υdcell that returns, on any piecewise linear function f : ℝ^m → ℝ and any CD 𝒟 of ℝ^d compatible with 𝒜_f ∪ 𝒜_ψ, a set of cells in ℝ^k whose union equals {(v₁,…,v_k) ∣ 𝐑,f ⊧ ψ_k(v₁,…,v_k)}.

Proof (sketch).

As already mentioned, we focus first on m = 1. The proof is by downward induction on k. The base case k = d deals with the quantifier-free part of ψ. We focus on the atomic subformulas. Subformulas of the form F(x_i) = x_j are dealt with as follows. For every piece p of f, with line y = ax + b, select all i-cells where x_i lies between p’s endpoints. For each such cell, repeatedly take all cells in the stacks above it until we reach (j−1)-cells. Now for each of these cells, take the section in its stack given by the section mapping x_j = ax_i + b. For each of these sections, again repeatedly take all cells in the stacks above it until we reach d-cells. Denote the obtained set of d-cells by S_p; the desired set of cells is ⋃_p S_p.

Subformulas that are linear constraints, where i is the largest index such that a_i is nonzero, can be dealt with by taking, above every (i−1)-cell, all sections that lie above the hyperplane corresponding to the constraint, if a_i > 0, or, if a_i < 0, all sections that lie below it. The described algorithm for the quantifier-free part can be implemented in FO(SUM).

For the inductive case, if Q_{k+1} is ∃, we must show that we can project a set of cells down one dimension, which is easy given the cylindrical nature of the decomposition; we just move to the underlying base cells. If Q_{k+1} is ∀, we treat it as ¬∃¬, so we complement the current set of cells, project down, and complement again.

To conclude, let us summarise the structure of the whole proof. We are given a neural network 𝒩 in 𝐅(m,), and we want to evaluate a closed FO(𝐑lin,F) formula ψ. We assume the query to be in prenex normal form and ordered. We start with an interpretation that transforms 𝒩 to a structure representing the piecewise linear function F𝒩 (Lemma 7.3). Then, using another interpretation, we expand the structure by the hyperplane arrangement obtained from the linear pieces of F𝒩 as well as the query. Using Lemma 7.5, we expand the current structure by a cell decomposition compatible with the hyperplane arrangement. Finally, using Lemma 7.6 we inductively process the query on this cell decomposition, at each step selecting the cells representing all tuples satisfying the current formula. Since the formula ψ is closed, we eventually either get the single 0-dimensional cell, in which case ψ holds, or the empty set, in which case ψ does not hold.

7.3 Extension to multiple inputs

For m > 1, the notion of PWL function f : ℝ^m → ℝ is more complex. We can conceptualize our representation of f as a decomposition of ℝ^m into polytopes where, additionally, every polytope p is accompanied by an affine function f_p such that f = ⋃_p f_p|_p. We call f_p the component function of f on p. Where for m = 1 each piece of f was delineated by just two breakpoints, a polytope in ℝ^m may now be delineated by many hyperplanes, called breakplanes. Thus, the vocabulary Υmpwl includes the position of a polytope relative to the breakplanes, indicating whether the polytope is on the breakplane, or on the positive or negative side of it. We next sketch how to prove Lemma 7.3 in its generality. The proof of Lemma 7.6 poses no additional problems.

We will define a PWL function f_u for every neuron u in the network; the final result is then f_out. To represent these functions for every neuron, we simply add one extra relation symbol, indicating to which function each element of a Υmpwl-structure belongs. The construction is by induction on the layer number. At the base of the induction are the input neurons. The i-th input neuron defines the PWL function with only one polytope (ℝ^m itself), whose component function is 𝒙 ↦ x_i.

Scaling.

For any hidden neuron u and incoming edge v → u with weight w, we define an auxiliary function f_{v,u} which simply scales f_v by w.

To represent the function defined by u, we need to sum the f_{v,u}’s, add u’s bias, and apply ReLU. We describe these two steps below; both can be implemented in FO(SUM). For u = out, the ReLU step is omitted.

Summing.

For each edge v → u, let 𝒟_{v,u} be the CD for f_{v,u}, and let 𝒜_{v,u} be the set of hyperplanes in ℝ^m that led to 𝒟_{v,u}. We define the arrangements 𝒜_u = ⋃_v 𝒜_{v,u} and 𝒜 = ⋃_u 𝒜_u. We apply Lemma 7.5 to 𝒜 to obtain a CD 𝒟 compatible with 𝒜, and hence also compatible with each 𝒜_u. Every m-cell C in 𝒟 is contained in a unique polytope p_{v,u}^C of f_{v,u}, for every edge v → u. We can define p_{v,u}^C as the polytope that is positioned the same with respect to the hyperplanes in 𝒜_{v,u} as C is. Two m-cells C and C′ are called u-equivalent if p_{v,u}^C = p_{v,u}^{C′} for every edge v → u. We can partition ℝ^m into polytopes formed by merging each u-equivalence class [C]. Over this partition we define a PWL function g_u: on each equivalence class [C], we define g_u as the sum, over all edges v → u, of the component function of f_{v,u} on p_{v,u}^C, plus u’s bias. The constructed function g_u equals b(u) + Σ_{v→u} f_{v,u}.

ReLU.

To represent ReLU(g_u), we construct the new arrangements ℬ_u, formed by the union of 𝒜_u with all hyperplanes given by the component functions of g_u, and ℬ = ⋃_u ℬ_u. Again apply Lemma 7.5 to ℬ to obtain a CD of ℬ, which is compatible with each ℬ_u. Again, every m-cell C in this CD is contained in a unique polytope p_u^C of g_u with respect to 𝒜_u. Now two m-cells C and C′ are called strongly u-equivalent if they are positioned the same with respect to the hyperplanes in ℬ_u. This implies p_u^C = p_u^{C′} but is stronger. We can partition ℝ^m into polytopes formed by merging each strong u-equivalence class [C]. Over this partition we define the PWL function f_u. Let ξ_u^C be the component function of g_u on p_u^C. On each equivalence class [C], we define f_u as ξ_u^C if it is positive on C; otherwise it is set to be zero. The constructed function f_u indeed equals ReLU(g_u), as desired.

8 Conclusion

The immediate motivation for this research is explainability and the verification of machine learning models. In this sense, our paper can be read as an application to machine learning of classical query languages known from database theory. The novelty compared to earlier proposals [3, 26] is our focus on real-valued weights and input and output features. More speculatively, we may envision machine learning models as genuine data sources, maybe in combination with more standard databases, and we want to provide a uniform interface. For example, practical applications of large language models will conceivably also need to store a lot of hard facts. However, just being able to query them through natural-language prompts may be suboptimal for integrating them into larger systems. Thus query languages for machine learning models may become a highly relevant research direction.

FO(SUM) queries will likely be very complex, so our result opens up challenges for query processing of complex, analytical SQL queries. Such queries are the focus of much current database systems research, and are supported by recent systems such as DuckDB [30] and Umbra [20]. It remains to be investigated to what extent white-box querying can be made useful in practice. The construction of a cell decomposition of variable space turned out to be crucial in the proof of our main result. Such cell decompositions might be precomputed by a query processor as a novel kind of index data structure.

While the language FO(𝐑) should mainly be seen as an expressiveness benchmark, techniques from SMT solving and linear programming are being adapted in the context of verifying neural networks [2]. Given the challenge, it is conceivable that for specific classes of applications, FO(𝐑) querying can be made practical.

Many follow-up questions remain open. Does the simulation of FO(𝐑lin) by FO(SUM) extend to FO(𝐑)? Importantly, how about other activations than ReLU [34]? If we extend FO(SUM) with quantifiers over weights, i.e., real numbers, what is the expressiveness gain? Expressing FO(𝐑) on bounded-depth neural networks now becomes immediate, but do we get strictly more expressivity? Also, to overcome the problem of being unable to even evaluate neural networks of unbounded depth, it seems natural to add recursion to FO(SUM). Fixed-point languages with real arithmetic can be difficult to handle [6, 10].

The language FO(SUM) can work with weighted relational structures of arbitrary shapes, so it is certainly not restricted to the FNN architecture. Thus, looking at other NN architectures is another direction for further research. Finally, we mention the question of designing flexible model query languages where the number of inputs, or outputs, need not be known in advance [3, 4].

References

  • [1] Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995. URL: http://webdam.inria.fr/Alice/.
  • [2] A. Albarghouthi. Introduction to neural network verification. Foundations and Trends in Programming Languages, 7(1–2):1–157, 2021. doi:10.1561/2500000051.
  • [3] Marcelo Arenas, Daniel Báez, Pablo Barceló, Jorge Pérez, and Bernardo Subercaseaux. Foundations of symbolic languages for model interpretability. In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, editors, Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pages 11690–11701, 2021. URL: https://proceedings.neurips.cc/paper/2021/hash/60cb558c40e4f18479664069d9642d5a-Abstract.html.
  • [4] Marcelo Arenas, Pablo Barceló, Diego Bustamante, Jose Caraball, and Bernardo Subercaseaux. A uniform language to explain decision trees. In Pierre Marquis, Magdalena Ortiz, and Maurice Pagnucco, editors, Proceedings of the 21st International Conference on Principles of Knowledge Representation and Reasoning, KR 2024, Hanoi, Vietnam. November 2-8, 2024, pages 60–70, 2024. doi:10.24963/kr.2024/6.
  • [5] S. Basu, R. Pollack, and M.-F. Roy. Algorithms in Real Algebraic Geometry. Springer, second edition, 2008.
  • [6] Michael Benedikt, Martin Grohe, Leonid Libkin, and Luc Segoufin. Reachability and connectivity queries in constraint databases. J. Comput. Syst. Sci., 66(1):169–206, 2003. doi:10.1016/S0022-0000(02)00034-X.
  • [7] F. Bodria, F. Giannotti, R. Guidotti, F. Naretto, D. Pedreschi, and S. Rinzivillo. Benchmarking and survey of explanation methods for black box models. Data Mining and Knowledge Discovery, 37:1719–1778, 2023. doi:10.1007/s10618-023-00933-9.
  • [8] Christoph Molnar. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Leanpub, second edition, 2022. URL: https://christophm.github.io/interpretable-ml-book.
  • [9] George Cybenko. Approximation by superpositions of a sigmoidal function. Math. Control. Signals Syst., 2(4):303–314, 1989. doi:10.1007/BF02551274.
  • [10] Floris Geerts and Bart Kuijpers. On the decidability of termination of query evaluation in transitive-closure logics for polynomial constraint databases. Theor. Comput. Sci., 336(1):125–151, 2005. doi:10.1016/j.tcs.2004.10.034.
  • [11] Ian J. Goodfellow, Yoshua Bengio, and Aaron C. Courville. Deep Learning. Adaptive computation and machine learning. MIT Press, 2016. URL: http://www.deeplearningbook.org/.
  • [12] E. Grädel and Y. Gurevich. Metafinite model theory. Information and Computation, 140(1):26–81, 1998. doi:10.1006/inco.1997.2675.
  • [13] Erich Grädel, Phokion G. Kolaitis, Leonid Libkin, Maarten Marx, Joel Spencer, Moshe Y. Vardi, Yde Venema, and Scott Weinstein. Finite Model Theory and Its Applications. Texts in Theoretical Computer Science. An EATCS Series. Springer, 2007. doi:10.1007/3-540-68804-8.
  • [14] Martin Grohe, Christoph Standke, Juno Steegmans, and Jan Van den Bussche. Query languages for neural networks. CoRR, abs/2408.10362, 2024. doi:10.48550/arXiv.2408.10362.
  • [15] Dan Halperin. Arrangements. In Jacob E. Goodman and Joseph O’Rourke, editors, Handbook of Discrete and Computational Geometry, Second Edition, chapter 24, pages 529–562. Chapman and Hall/CRC, second edition, 2004. doi:10.1201/9781420035315.ch24.
  • [16] Wilfrid Hodges. Model Theory, volume 42 of Encyclopedia of mathematics and its applications. Cambridge University Press, 1993.
  • [17] Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257, 1991. doi:10.1016/0893-6080(91)90009-T.
  • [18] P.C. Kanellakis, G.M. Kuper, and P.Z. Revesz. Constraint query languages. Journal of Computer and System Sciences, 51(1):26–52, August 1995. doi:10.1006/jcss.1995.1051.
  • [19] A. Klug. Equivalence of relational algebra and relational calculus query languages having aggregate functions. Journal of the ACM, 29(3):699–717, 1982. doi:10.1145/322326.322332.
  • [20] A. Kohn, V. Leis, and Th. Neumann. Tidy tuples and flying start: fast compilation and fast execution of relational queries in Umbra. VLDB Journal, 30(5):883–905, 2021. doi:10.1007/s00778-020-00643-4.
  • [21] Gabriel M. Kuper, Leonid Libkin, and Jan Paredaens, editors. Constraint Databases. Springer, 2000. doi:10.1007/978-3-662-04031-7.
  • [22] Marta Kwiatkowska and Xiyue Zhang. When to trust AI: advances and challenges for certification of neural networks. In Maria Ganzha, Leszek A. Maciaszek, Marcin Paprzycki, and Dominik Slezak, editors, Proceedings of the 18th Conference on Computer Science and Intelligence Systems, FedCSIS 2023, Warsaw, Poland, September 17-20, 2023, volume 35 of Annals of Computer Science and Information Systems, pages 25–37. Polish Information Processing Society, 2023. doi:10.15439/2023F2324.
  • [23] L. Libkin. Expressive power of SQL. Theoretical Computer Science, 296:379–404, 2003. doi:10.1016/S0304-3975(02)00736-3.
  • [24] Leonid Libkin. Elements of Finite Model Theory. Texts in Theoretical Computer Science. An EATCS Series. Springer, 2004. doi:10.1007/978-3-662-07003-1.
  • [25] Changliu Liu, Tomer Arnon, Christopher Lazarus, Christopher Strong, Clark Barrett, Mykel J Kochenderfer, et al. Algorithms for verifying deep neural networks. Foundations and Trends® in Optimization, 4(3-4):244–404, 2021. doi:10.1561/2400000035.
  • [26] X. Liu and E. Lorini. A unified logical framework for explanations in classifier systems. Journal of Logic and Computation, 33(2):485–515, 2023. doi:10.1093/logcom/exac102.
  • [27] Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In NIPS, pages 4765–4774, 2017. URL: https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html.
  • [28] M. Abo Khamis, H.Q. Ngo, and A. Rudra. Juggling functions inside a database. SIGMOD Record, 46(1):6–13, 2017. doi:10.1145/3093754.3093757.
  • [29] João Marques-Silva. Logic-based explainability in machine learning. In Leopoldo E. Bertossi and Guohui Xiao, editors, Reasoning Web. Causality, Explanations and Declarative Knowledge - 18th International Summer School 2022, Berlin, Germany, September 27-30, 2022, Tutorial Lectures, volume 13759 of Lecture Notes in Computer Science, pages 24–104. Springer, 2022. doi:10.1007/978-3-031-31414-8_2.
  • [30] Mark Raasveldt and Hannes Mühleisen. DuckDB: an embeddable analytical database. In Peter A. Boncz, Stefan Manegold, Anastasia Ailamaki, Amol Deshpande, and Tim Kraska, editors, Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019, pages 1981–1984. ACM, 2019. doi:10.1145/3299869.3320212.
  • [31] C. Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1:206–215, 2019. doi:10.1038/s42256-019-0048-x.
  • [32] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In Yoshua Bengio and Yann LeCun, editors, 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, 2014. doi:10.48550/arXiv.1312.6199.
  • [33] A. Tarski. A Decision Method for Elementary Algebra and Geometry. University of California Press, 1951.
  • [34] Vincent Tjeng, Kai Yuanqing Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019. URL: https://openreview.net/forum?id=HyGIdiRqtm.
  • [35] Szymon Torunczyk. Aggregate queries on sparse databases. In Dan Suciu, Yufei Tao, and Zhewei Wei, editors, Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2020, Portland, OR, USA, June 14-19, 2020, pages 427–443. ACM, 2020. doi:10.1145/3375395.3387660.
  • [36] Steffen van Bergerem and Nicole Schweikardt. Learning concepts described by weight aggregation logic. In Christel Baier and Jean Goubault-Larrecq, editors, 29th EACSL Annual Conference on Computer Science Logic, CSL 2021, January 25-28, 2021, Ljubljana, Slovenia (Virtual Conference), volume 183 of LIPIcs, pages 10:1–10:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPIcs.CSL.2021.10.
  • [37] Sandra Wachter, Brent D. Mittelstadt, and Chris Russell. Counterfactual explanation without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology, 31(2):841–887, 2018. doi:10.2139/ssrn.3063289.