Probabilistic and Causal Satisfiability:
Constraining the Model
Abstract
We study the complexity of satisfiability problems in probabilistic and causal reasoning. Given random variables over finite domains, the basic terms are probabilities of propositional formulas over atomic events $X = x$, such as $\Pr(X = 1)$ or $\Pr(X = 1 \wedge Y = 0)$. The basic terms can be combined using addition (yielding linear terms) or multiplication (polynomial terms). The probabilistic satisfiability problem asks whether a joint probability distribution satisfies a Boolean combination of (in)equalities over such terms. Fagin et al. [11] showed that for basic and linear terms, this problem is NP-complete, making it no harder than Boolean satisfiability, while Mossé et al. [22] proved that for polynomial terms, it is complete for the existential theory of the reals, $\exists\mathbb{R}$.
Pearl’s Causal Hierarchy (PCH) extends the probabilistic setting with interventional and counterfactual reasoning, enriching the expressiveness of the languages. However, Mossé et al. [22] found that the complexity of satisfiability remains unchanged. Van der Zander et al. [38] showed that introducing a marginalization operator to languages induces a significant increase in complexity.
We extend this line of work by adding two new dimensions to the problem, both of which constrain the model. First, we fix the graph structure of the underlying structural causal model, motivated by settings like Pearl's do-calculus, and give a nearly complete landscape across different arithmetics and PCH levels. Second, we study small models. While earlier work showed that satisfiable instances admit polynomial-size models, this is no longer guaranteed with compact marginalization. We characterize the complexity of satisfiability under small-model constraints across different settings.
Keywords and phrases: Existential theory of the real numbers, Computational complexity, Probabilistic logic, Structural Causal Models
Category: Track B: Automata, Logic, Semantics, and Theory of Programming
Funding: Benito van der Zander: Work supported by the Deutsche Forschungsgemeinschaft (DFG) grant 471183316 (ZA 1244/1-1).
2012 ACM Subject Classification: Theory of computation → Complexity theory and logic
1 Introduction
Reasoning about probability is essential in many research fields. In computer science, it plays a crucial role in analyzing probabilistic programs, understanding a program's behavior under probabilistic input assumptions, and handling uncertain information in expert systems. In a seminal paper, Fagin et al. [11] introduced a logic to reason about probabilities. Their aim was to formulate a calculus that allows one to phrase statements like "$\Pr(X = 1) \ge 1/2$" or "$\Pr(X = 1 \wedge Y = 0) \le \Pr(Y = 1)$". They study (among other things) the satisfiability problem for Boolean combinations of such terms, i.e., is there a probability distribution that satisfies all the terms? It turns out that the expressiveness of the underlying calculus has a great influence on the complexity of such satisfiability problems. For instance, in their original work, Fagin et al. investigated three types of terms in the (in)equalities: basic ones, like those above, which consist of only single probabilities; linear combinations of basic probabilities; and polynomial expressions in basic probabilities. This choice parameterizes the satisfiability problem, and the type of terms has a direct impact on the complexity of the corresponding satisfiability problem.
We will study the satisfiability of such formulas from a multi-parametric view: there are several parameters in the underlying calculus, like the type of terms mentioned above, and by adjusting these parameters, we get a large family of satisfiability problems of varying complexity, depending on the choice of our parameters. There has been a lot of work along these lines, see e.g. [11, 15, 22, 17, 38, 4, 16, 8], and the main contribution of this paper is that we finally (almost) complete the whole picture.
Another dimension is that we can enhance the basic probability terms, inspired by causal theory in AI. The development of modern causal theory in AI and the empirical sciences has greatly benefited from an influential structured approach to inference about causal phenomena, which is based on a reasoning hierarchy named the "Ladder of Causation", also often referred to as "Pearl's Causal Hierarchy" (PCH) ([36, 28, 3]; see also [29] for a gentle introduction to the topic). This three-level framework formalizes various types of reasoning that reflect the progressive sophistication of human thought regarding causation. It arises from a collection of causal mechanisms that model the "ground truth" of unobserved nature, formalized within a Structural Causal Model (SCM). These mechanisms are then combined with three patterns of reasoning concerning observed phenomena, expressed at the corresponding layers of the hierarchy, known as probabilistic (also called associational in the AI literature), interventional, and counterfactual (for formal definitions of these concepts as well as illustrative examples, see Section 2).
A basic term at the probabilistic (observational) layer is expressed as a common probability, such as $\Pr(D = 1 \wedge B = 1)$.¹ This may represent queries like "How likely is it that a patient has both diabetes ($D = 1$) and high blood pressure ($B = 1$)?" This layer corresponds precisely to the calculus developed by Fagin et al. mentioned above. The interventional patterns extend the basic probability terms by allowing the use of Pearl's do-operator [28], which models an experiment like a randomized controlled trial [12]. For instance, $\Pr([X{=}x]\, Y{=}y)$,² which in general differs from $\Pr(Y{=}y \mid X{=}x)$, allows one to ask hypothetical questions such as, e.g., "How likely is it that a patient's headache will be cured if he or she takes aspirin?" An example formula at this layer is $\Pr([x]\, y) = \sum_z \Pr(y \mid x, z) \Pr(z)$, which estimates the causal effect of the intervention $X = x$ (all patients take aspirin) on the outcome variable $Y$ (headache cure). It illustrates the use of the prominent back-door adjustment to eliminate the confounding effect of a factor represented by variable $Z$ [28]. The basic terms at the highest level of the hierarchy enable us to formulate queries related to counterfactual situations. For example, $\Pr([X{=}1]\, Y{=}1 \mid X{=}0 \wedge Y{=}0)$ expresses the probability that a patient who did not receive a vaccine ($X = 0$) and died ($Y = 0$) would have lived ($Y = 1$) if he or she had been vaccinated ($X = 1$).

¹ In our paper, we consider random variables over discrete, finite domains. By an event, we mean a propositional formula over atomic events of the form $X = x$, such as $(X = 1 \wedge Y = 0)$ or $\neg(Z = 1)$. Moreover, by $\Pr(x)$, etc., we mean, as usual, $\Pr(X = x)$. Finally, by using lowercase letters as in $\Pr(x, y)$, we abbreviate $\Pr(X = x \wedge Y = y)$.
² A common and popular notation for the post-interventional probability is $\Pr(Y{=}y \mid \mathit{do}(X{=}x))$. In this paper, however, we use the notation $\Pr([X{=}x]\, Y{=}y)$ since it is more convenient for our analysis.
The computational complexity aspects of reasoning about uncertainty in this framework have been the subject of intensive studies in the past decades. The research has resulted in a large number of significant achievements, especially in the case of probabilistic inference with the input probability distributions encoded by Bayesian networks [26]. These problems are of great interest in AI and important results from this perspective include [6, 7, 34, 25, 18]. However, despite intensive research, many fundamental problems in the field remain open.
The main interest of our research is focused on the precise characterization of the computational complexity of satisfiability problems (and their validity counterparts) for languages of all PCH layers, combined with increasing the expressiveness of (in)equalities by enabling the use of more complex operators. Our primary concern is the effect of constraining the model in these satisfiability problems, on the one hand by fixing the graph structure of the model, and on the other by bounding the model size to be polynomial.
1.1 Reasoning about probabilities: a multi-parametric view
One starting point for our investigations is the pioneering paper by Fagin et al. [11], who introduce a logic to reason about probabilities. We are given a set of random variables over some fixed finite domain, often $\{0, 1\}$. They consider formulas consisting of Boolean combinations of (in)equalities of basic and linear terms, like $\Pr(X{=}1) + \Pr(Y{=}0 \wedge Z{=}1) \le 1/2$ with binary variables $X, Y, Z$. The authors provide a complete axiomatization for the used logic, which is essentially a formalization of Nilsson's probabilistic logic [24], and they show that the problem of deciding satisfiability is NP-complete. Thus, surprisingly, the complexity is no worse than that of propositional logic. Fagin et al. then extend the language to (in)equalities of polynomial terms, with the goal of reasoning about conditional probabilities. They prove that the latter variant is NP-hard and contained in PSPACE. Recently, Mossé et al. [22] have given the exact complexity of this problem, showing that deciding satisfiability is $\exists\mathbb{R}$-complete, where $\exists\mathbb{R}$ is the well-studied class defined as the closure of the Existential Theory of the Reals (ETR) under polynomial-time many-one reductions.
In this work, we study the satisfiability (and validity) problem for such formulas involving probabilities. As already seen above, the choice of the admissible operations has an impact on the complexity. There are further choices we can make, which we discuss in the following; all of them affect the complexity. Therefore, we will phrase these satisfiability problems as multi-parametric or multi-dimensional problems. The first dimension is the expressiveness of the underlying arithmetic:
- Arithmetic: basic, linear, or polynomial.
The second dimension is the layer in Pearl's Causal Hierarchy, which we have already discussed in the previous section. This setting is of great interest in AI since it allows one not only to reason about observations, but also about causality, and even counterfactuals, which is for instance important in fairness considerations. The three layers form a hierarchy of increasing expressiveness (for a more detailed description of the concepts, see Section 2).
- PCH layer: probabilistic (observational), causal (interventional), or counterfactual.
The languages used in [11, 22] are capable of fully expressing probabilistic reasoning; in particular, they allow one to express marginalization, which is a common paradigm in this field. However, in the frameworks discussed above, this ability is very limited since the languages do not allow the use of a unary summation operator $\sum$. Thus, for instance,³ to express the marginal distribution of a random variable $Y$ over a subset of (binary) variables $X_1, \dots, X_n$ as $\Pr(y) = \sum_{x_1, \dots, x_n} \Pr(y, x_1, \dots, x_n)$, an encoding without summation requires an expansion into $\Pr(y) = \Pr(y, X_1{=}0, \dots, X_n{=}0) + \dots + \Pr(y, X_1{=}1, \dots, X_n{=}1)$, which is of exponential size in $n$. Consequently, to analyze the complexity aspects of the studied problems in languages that exploit the standard notation of probability theory and statistics, one requires an encoding that represents marginalization via the sum operator $\sum$. In the recent paper [38], the authors present the first systematic study in this setting. They introduce a new natural class, named $\mathsf{succ}\text{-}\exists\mathbb{R}$, which can be viewed as a succinct variant of $\exists\mathbb{R}$.

³ We have chosen this example because it is short. In this particular example, the joint probability could be directly written as $\Pr(y)$ in the calculus of Fagin et al., see the formal definition in Section 2. This is however not true anymore if we sum over more complicated expressions.
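To make the size gap concrete, the following small Python sketch (ours, for illustration only; the string syntax is ad hoc and not part of any calculus defined in this paper) prints the expanded encoding of a marginal next to its compact counterpart:

```python
# Illustration: the expanded encoding of the marginal
# Pr(y) = sum_{x1..xn} Pr(y, x1, ..., xn) enumerates all 2^n value
# combinations, while the compact encoding uses n summation operators.
from itertools import product

def expanded_marginal(n):
    """All basic terms of the expanded (summation-free) encoding."""
    return [f"Pr(y, {', '.join(f'X{i + 1}={b}' for i, b in enumerate(bits))})"
            for bits in product([0, 1], repeat=n)]

def compact_marginal(n):
    """The same marginal written with the unary summation operator of [38]."""
    sums = " ".join(f"sum_x{i + 1}" for i in range(n))
    args = ", ".join(f"X{i + 1}=x{i + 1}" for i in range(n))
    return f"{sums} Pr(y, {args})"

print(len(expanded_marginal(10)))  # 1024 summands: exponential in n
print(compact_marginal(10))        # a single term of size linear in n
```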
Given the impact of the compact marginalization operator, the way we can perform marginalization will be the third dimension:
- Marginalization: expanded or compact.
For expanded marginalization, the complexity results by Fagin et al. [11], as well as by Mossé et al. [22], can be easily adapted to capture all other parameter ranges; see the full version of our work. Therefore, throughout this paper, we only deal with the case where we have a compact marginalization operator in our calculus.
The fact that the decision problem for basic and linear arithmetic without summation is in NP is not obvious. A model for a formula is a joint probability distribution of, say, $n$ binary variables, which is a table with $2^n$ entries. Furthermore, it is a priori not clear that the bit size of the entries is polynomial. It turns out that these problems have what is called a small model property: if there is a model, that is, a joint probability distribution satisfying the expression, then there is one with only polynomially many non-zero entries. This follows by linear algebra arguments; a feasible system of $m$ linear (in)equalities always has a solution with at most $m$ non-zero variables. If the model is small, then it can be guessed efficiently in NP. In the case of polynomial arithmetic, we still have the small model property, but one cannot guarantee that the entries have polynomial bit size. However, the compact marginalization operator destroys the small model property that was crucial in the work of [11] and [22] and in fact increases the complexity of the satisfiability problems dramatically [38]. So the next dimension is the model size, which can either be polynomially bounded or arbitrary:
- Model size: small (polynomially many non-zero entries) or unbounded.
The impact of restricting the model size in the presence of a compact marginalization operator will be one of the main topics in this work (see the results in Section 5).
Causal models are often represented as a graph or Bayesian network in which the random variables are represented as nodes and the edges encode which variables influence each other. E.g., at the probabilistic level, the graph encodes which variables are conditionally independent of each other. In the satisfiability problems described so far, no graph was given, so the task was to decide whether there exists any model with any graph structure that satisfies the formula. In many studies, a researcher can infer information about the graph structure, combining observed and experimental data with existing knowledge. Indeed, learning graphical, causal structures from data has been the subject of a considerable amount of research (see, e.g., [21, 9, 13, 37] for recent reviews). Thus, we will study the satisfiability (resp. validity) of formulas with the additional requirement that a graph structure $G$ is also given in the input. The goal is then to determine whether the formula is satisfied in some model (resp. is valid in all models) having the structure $G$.
In fact, such a constrained validity problem is one of the central problems of causality. E.g., given a graph structure, Pearl's prominent do-calculus [28] is often applied to show the validity of the equivalence between causal and probabilistic expressions. The impact of including the graph in the input on the complexity is the second important question that we study in this work (see Section 4). Thus, as the fifth dimension, we study the model structure:
- Model structure: specified by a graph given as part of the input, or unconstrained.
1.2 A brief overview of our results
We continue by giving an informal description of our results. A detailed description can be found in Section 3.
We first show that constraining the graph structure of the model makes the problem only harder, in the sense that the unconstrained problem can be reduced to the constrained one: the unconstrained problem is equivalent to the constrained one with the complete directed acyclic graph as the given structure. It turns out that for the probabilistic and interventional layers together with basic or linear arithmetic, there is indeed a jump. The probabilistic layer becomes in addition $\exists\mathbb{R}$-hard, while the interventional layer jumps from PSPACE-completeness to NEXP-completeness. For all other combinations of PCH layer and arithmetic, no such jump happens and the complexity stays the same; however, new proof techniques are required. The exact results are given compactly in Table 2 (for comparison, the complexity landscape for unconstrained models can be found in Table 1).
Second, we consider the case when the underlying model is small; small here means that for the underlying probability distribution $P$, the number of tuples $\mathbf{u}$ with $P(\mathbf{u}) > 0$ is polynomially bounded in the input size. For the interventional and counterfactual layers with polynomial arithmetic, the complexity becomes NEXP-complete (again, new proof techniques are required), but for the probabilistic layer, the complexity drops down to $\exists\mathbb{R}^{\Sigma}$, a new class between $\exists\mathbb{R}$ and $\mathsf{succ}\text{-}\exists\mathbb{R}$. The full results are given compactly in Table 3.
2 Preliminaries
2.1 Pearl’s Causal Hierarchy: An example
To illustrate the main ideas behind Pearl's Causal Hierarchy (PCH), we present below an example that, we hope, will make it easier to understand the formal definitions given in Section 2.2. In this example, we consider a hypothetical scenario involving three attributes represented by binary random variables: Age, modeled by $A$, with $A = 0$ (young) and $A = 1$ (old); (COVID-19) Vaccination, represented by $X$, with $X = 1$ if vaccinated and $X = 0$ if not; and Recovery, represented by $Y$, with $Y = 1$ meaning recovery and $Y = 0$ meaning mortality. Below, we describe a structural causal model (SCM) that represents an unobserved true mechanism underlying this scenario and illustrates the canonical patterns of reasoning expressible at different levels of the PCH.
Structural Causal Model.
An SCM is defined as a tuple $\mathfrak{M} = (\mathcal{F}, P)$ which is of unobserved nature from the perspective of an empirical researcher. It specifies the distribution $P$ of the population and the mechanism $\mathcal{F}$. In our example, the model assumes three independent binary random variables $U_1, U_2, U_3$, whose probabilities $P(U_i = 1)$ are fixed by $P$ (the concrete values are shown in Fig. 1). They affect the endogenous (observed) random variables $A, X, Y$ via the mechanism $\mathcal{F} = \{f_A, f_X, f_Y\}$, specified as $A = f_A(U_1)$, $X = f_X(A, U_2)$, and $Y = f_Y(A, X, U_3)$.
Thus, the model determines the distribution $P(a, x, y)$, for $a, x, y \in \{0, 1\}$, and the values for the observed variables, as can be seen in Fig. 1.
The unobserved random variable $U_1$ reflects the age of the population, and $A$ is a function of $U_1$ (which may be more complex in a real population). Getting a COVID-19 vaccination depends on age but also on other circumstances (severe allergic reaction to a component of the COVID-19 vaccine, moderate or severe acute illness, etc.), and this is modeled by $U_2$. So $X$ is a function of $A$ and $U_2$. Finally, $Y$ depends on age, on being vaccinated, and on further circumstances like having other diseases, which are modeled by $U_3$. So $Y$ is a function of $A$, $X$, and $U_3$. The functions define a directed graph structure on the (observed) random variables: there is an edge from one variable to another if the latter's function depends on the former. We always assume that the dependency graph of the SCM is acyclic. This property is also called semi-Markovian. In our example, we get the DAG with edges $A \to X$, $A \to Y$, and $X \to Y$.
Layer 1 (probabilistic).
Empirical sciences primarily rely on observed data, typically represented as probability distributions over measurable variables. In our example, this corresponds to the distribution $P(a, x, y)$. The unobserved variables $U_1, U_2, U_3$, along with the causal mechanism $\mathcal{F}$, remain hidden from direct observation. A researcher thus receives the probabilities (shown in Fig. 1) $P(a, x, y) = \sum_{\mathbf{u} : \mathcal{F}(\mathbf{u}) = (a, x, y)} P(\mathbf{u})$, where $\mathbf{u} = (u_1, u_2, u_3)$ ranges over the exogenous assignments and $\mathcal{F}(\mathbf{u})$ denotes the values of $(A, X, Y)$ computed from $\mathbf{u}$. The relevant query in our scenario, $P(Y{=}1 \mid X{=}1)$, evaluates to a high value, which says that the probability for recovery ($Y = 1$) is high given that the patient was vaccinated ($X = 1$). On the other hand, the query for the unvaccinated patients, $P(Y{=}1 \mid X{=}0)$, evaluates to a smaller value, which may lead to the opinion that the vaccine is relevant to recovery. However, these results do not reflect a randomized controlled trial [12] that could establish the differences between a COVID-19 vaccination and a placebo vaccination.
Layer 2 (interventional).
Consider a randomized trial in which each patient is vaccinated, denoted as $[X = 1]$, regardless of age ($U_1$) and other conditions ($U_2$). We model this by performing a hypothetical intervention in which we replace $f_X$ in the mechanism by the constant function $X = 1$, leaving the remaining functions unchanged (see Fig. 1). If $\mathcal{F}_{[X=1]}$ denotes the new mechanism, then the post-interventional distribution is specified as $P([X{=}1]\; a, y) = \sum_{\mathbf{u} : \mathcal{F}_{[X=1]}(\mathbf{u}) = (a, 1, y)} P(\mathbf{u})$, where $\mathcal{F}_{[X=1]}(\mathbf{u})$ denotes the values of the observed variables as above, but for the new mechanism (the distribution is shown in Fig. 1).⁴ To determine the causal effect of the COVID-19 vaccination on recovery, we compute, in an analogous way, the distribution after the intervention $[X = 0]$, which means that all patients receive a placebo vaccination. Then, comparing the value $P([X{=}1]\, Y{=}1)$ with $P([X{=}0]\, Y{=}1)$, we can conclude that $P([X{=}1]\, Y{=}1) > P([X{=}0]\, Y{=}1)$. This can be interpreted as a positive (average) effect of the vaccine in the population. Note that it is not obvious how to compute the post-interventional distributions from the observed probability $P(a, x, y)$; indeed, this is a challenging task in the field of causality.

⁴ A common and popular notation for the post-interventional probability is $P(y \mid \mathit{do}(x))$. In this paper, we use the notation $P([x]\, y)$ since it is more convenient for analyses involving counterfactuals.
Layer 3 (counterfactual).
The key phenomena that can be modeled and analyzed at this level are counterfactual situations. Imagine, e.g., that in our scenario there is a group of patients who have not been vaccinated ($X = 0$) and died ($Y = 0$). One may ask what the outcome ($Y$) would have been had they been vaccinated ($[X = 1]$). In particular, one can ask what the probability of recovery would be if we had vaccinated the patients in this group. Using the formalism of Layer 3, we can express this as a counterfactual query: $P([X{=}1]\, Y{=}1 \mid X{=}0 \wedge Y{=}0)$. Note that the event incorporates simultaneously two mechanisms: the original $\mathcal{F}$, under which $X = 0$ and $Y = 0$ are observed, and the modified mechanism $\mathcal{F}_{[X=1]}$, under which the counterfactual outcome $Y = 1$ is evaluated. This is the key difference to Layer 2, where we can only have one.
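To make the three layers concrete, here is a small Python sketch. The probabilities of $U_1, U_2, U_3$ and the concrete mechanism below are our own illustrative assumptions (the paper's values are those of Fig. 1, not reproduced here); only the method of computation follows the definitions above: probabilities are sums of $P(\mathbf{u})$ over exogenous assignments, an intervention replaces $f_X$ by a constant, and a counterfactual evaluates the factual and the intervened world on the same $\mathbf{u}$.

```python
from itertools import product

P_U = {1: 0.5, 2: 0.6, 3: 0.8}  # hypothetical Pr(U_i = 1); U_i independent

def p_of(u):
    """Probability of one exogenous assignment u = (u1, u2, u3)."""
    p = 1.0
    for i, ui in zip((1, 2, 3), u):
        p *= P_U[i] if ui == 1 else 1 - P_U[i]
    return p

def mechanism(u, do_x=None):
    """Compute (A, X, Y) from u; [X := do_x] replaces f_X by a constant."""
    u1, u2, u3 = u
    a = u1                                                      # f_A: age
    x = do_x if do_x is not None else int(a == 1 and u2 == 1)   # f_X (assumed)
    y = int((x == 1 or a == 0) and u3 == 1)                     # f_Y (assumed)
    return a, x, y

def prob(event, do_x=None):
    """Pr of an event (a predicate on (A, X, Y)), summing P(u) over all u."""
    return sum(p_of(u) for u in product([0, 1], repeat=3)
               if event(*mechanism(u, do_x)))

# Layer 1 (observational): Pr(Y=1 | X=1) and Pr(Y=1 | X=0).
obs1 = prob(lambda a, x, y: x == 1 and y == 1) / prob(lambda a, x, y: x == 1)
obs0 = prob(lambda a, x, y: x == 0 and y == 1) / prob(lambda a, x, y: x == 0)

# Layer 2 (interventional): Pr([X=1] Y=1) and Pr([X=0] Y=1).
do1 = prob(lambda a, x, y: y == 1, do_x=1)
do0 = prob(lambda a, x, y: y == 1, do_x=0)

# Layer 3 (counterfactual): Pr([X=1] Y=1 | X=0, Y=0) -- evaluate the factual
# and the intervened world on the *same* exogenous assignment u.
num = den = 0.0
for u in product([0, 1], repeat=3):
    _, x, y = mechanism(u)                 # factual world
    _, _, y_cf = mechanism(u, do_x=1)      # world under [X = 1]
    if x == 0 and y == 0:
        den += p_of(u)
        num += p_of(u) * (y_cf == 1)
print(obs1, obs0, do1, do0, num / den)     # 0.8, ~0.571, 0.8, 0.4, ~0.533
```

With these (assumed) parameters, the intervention halves the recovery probability when the vaccine is withheld ($0.8$ vs. $0.4$), even though the observational difference is much smaller, illustrating why Layer 1 alone can be misleading.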
2.2 Syntax and semantics of probabilistic calculi
In this section, we describe the underlying formalism of the satisfiability problems that we consider. This will cover the first three parameter dimensions: the underlying arithmetic, with or without compact marginalization, and the PCH level. The other two dimensions then naturally follow since they are obtained by restricting the underlying model.
We always consider discrete distributions in the probabilistic and causal languages studied in this paper. We represent the values of the random variables as natural numbers and denote by $\mathbf{X} = \{X_1, \dots, X_n\}$ the set of random variables used in a system. Here, we use bold letters for sets of variables or sets of values. By capital letters $X_1, X_2, \dots$, we denote the individual variables and assume, w.l.o.g., that they all share the same domain $\mathrm{Val} = \{0, 1, \dots, c - 1\}$. A value of $X_i$ is often denoted by the corresponding lowercase letter $x_i$ or a natural number. In this section, we describe the syntax and semantics of the languages, starting with the probabilistic ones, and then we provide extensions to the causal systems.
By an atomic event, we mean an event of the form $X = x$, where $X$ is a random variable and $x$ is a value in the domain of $X$. The language $\mathcal{E}_{\mathrm{prop}}$ of propositional formulas over atomic events is defined as the closure of such events under the Boolean operators $\wedge$ and $\neg$. To specify the syntax of interventional and counterfactual events, we define the intervention $[\mathbf{X} = \mathbf{x}]$ and extend the syntax of $\mathcal{E}_{\mathrm{prop}}$ to $\mathcal{E}_{\mathrm{int}}$ and $\mathcal{E}_{\mathrm{count}}$, respectively, using the following grammars: $\mathcal{E}_{\mathrm{int}} ::= [\mathbf{X} = \mathbf{x}]\, \delta$, where $\delta \in \mathcal{E}_{\mathrm{prop}}$ and $\mathbf{X}$ is a (possibly empty) set of variables, and $\mathcal{E}_{\mathrm{count}} ::= \mathcal{E}_{\mathrm{int}} \mid \neg\, \mathcal{E}_{\mathrm{count}} \mid \mathcal{E}_{\mathrm{count}} \wedge \mathcal{E}_{\mathrm{count}}$.
Note that since the empty intervention $[\,]\, \delta$ means that no intervention has been applied, we can assume that $\mathcal{E}_{\mathrm{prop}} \subseteq \mathcal{E}_{\mathrm{int}} \subseteq \mathcal{E}_{\mathrm{count}}$.
The PCH consists of three languages $\mathcal{L}_1, \mathcal{L}_2, \mathcal{L}_3$, each of which is based on terms of the form $\Pr(\delta)$. For the (observational or associational) language $\mathcal{L}_1$, we have $\delta \in \mathcal{E}_{\mathrm{prop}}$; for the (interventional) language $\mathcal{L}_2$, we have $\delta \in \mathcal{E}_{\mathrm{int}}$; and for the (counterfactual) language $\mathcal{L}_3$, $\delta \in \mathcal{E}_{\mathrm{count}}$. The expressive power and computational complexity properties of the languages depend largely on the operations that we are allowed to apply to the basic terms. Allowing gradually more complex operators, we describe the languages which are the subject of our studies below. We start with the description of the languages of terms $\mathsf{T}^{\star}(\mathcal{L}_i)$, with $\star \in \{\mathsf{base}, \mathsf{lin}, \mathsf{poly}\}$ and their summation variants $\star^{\Sigma}$, using the following grammars:⁵

| without marginalization | with marginalization |
| --- | --- |
| $\mathsf{T}^{\mathsf{base}}$: $\mathsf{t} ::= \Pr(\delta)$ | $\mathsf{T}^{\mathsf{base}^{\Sigma}}$: $\mathsf{t} ::= \Pr(\delta) \mid \sum_{x} \mathsf{t}$ |
| $\mathsf{T}^{\mathsf{lin}}$: $\mathsf{t} ::= \Pr(\delta) \mid \mathsf{t} + \mathsf{t}$ | $\mathsf{T}^{\mathsf{lin}^{\Sigma}}$: $\mathsf{t} ::= \Pr(\delta) \mid \mathsf{t} + \mathsf{t} \mid \sum_{x} \mathsf{t}$ |
| $\mathsf{T}^{\mathsf{poly}}$: $\mathsf{t} ::= \Pr(\delta) \mid \mathsf{t} + \mathsf{t} \mid \mathsf{t} \cdot \mathsf{t}$ | $\mathsf{T}^{\mathsf{poly}^{\Sigma}}$: $\mathsf{t} ::= \Pr(\delta) \mid \mathsf{t} + \mathsf{t} \mid \mathsf{t} \cdot \mathsf{t} \mid \sum_{x} \mathsf{t}$ |

where $\delta$ are formulas in $\mathcal{E}_{\mathrm{prop}}$, $\mathcal{E}_{\mathrm{int}}$, or $\mathcal{E}_{\mathrm{count}}$, depending on the language $\mathcal{L}_i$, and $x$ is a dummy variable ranging over $\mathrm{Val}$.

⁵ In the given grammars, we omit the brackets for readability, but we assume that they can be used in a standard way.
The probabilities of the form $\Pr(\delta)$ are called primitives or basic terms. In the summation operator $\sum_{x}$, we have a dummy variable $x$ which ranges over all values in $\mathrm{Val}$. The summation is a purely syntactical concept which represents the sum $\mathsf{t}[x/0] + \mathsf{t}[x/1] + \dots + \mathsf{t}[x/c{-}1]$, where by $\mathsf{t}[x/v]$, we mean the expression $\mathsf{t}$ in which all occurrences of $x$ are replaced with the value $v$. For example, for $\mathrm{Val} = \{0, 1\}$, the expression $\sum_{x} \Pr(Y = x)$ semantically represents $\Pr(Y = 0) + \Pr(Y = 1)$.
We note that the dummy variable $x$ is not a (random) variable in the usual sense and that its scope is defined in the standard way.
In the table above, the terms in $\mathsf{T}^{\mathsf{base}}$ are just basic probabilities with the events given by the corresponding languages $\mathcal{E}_{\mathrm{prop}}$, $\mathcal{E}_{\mathrm{int}}$, or $\mathcal{E}_{\mathrm{count}}$. Next, we extend the terms by being able to compute sums of probabilities; by adding the same term several times, we also allow for weighted sums with weights given in unary. Note that this is enough to state all our hardness results. All matching upper bounds also work when we allow for explicit weights given in binary. In the case of $\mathsf{T}^{\mathsf{poly}}$, we are allowed to build polynomial terms in the primitives. On the right-hand side of the table, we have the same three kinds of terms, but to each of them, we add a marginalization operator as a building block.
The polynomial calculus was originally introduced by Fagin, Halpern, and Megiddo [11] (for $\mathcal{L}_1$) to be able to express conditional probabilities by clearing denominators. While this works for $\mathsf{T}^{\mathsf{poly}}$, it does not work in the case of $\mathsf{T}^{\mathsf{poly}^{\Sigma}}$, since clearing denominators with exponential sums creates expressions that are too large. But we could introduce basic terms of the form $\Pr(\delta \mid \delta')$ explicitly. All our hardness proofs work without conditional probabilities, but all our matching upper bounds are still true with explicit conditional probabilities.
Expressions like $\Pr(X{=}1) + \Pr(X{=}0 \wedge Y{=}1)$ are valid terms in $\mathsf{T}^{\mathsf{lin}}$, and $\sum_{x} \Pr(X{=}x) \cdot \Pr(Y{=}1 \wedge X{=}x)$ is a valid term in the language $\mathsf{T}^{\mathsf{poly}^{\Sigma}}$, for example.
Now, let $\mathsf{Lab} = \{\mathsf{base}, \mathsf{lin}, \mathsf{poly}, \mathsf{base}^{\Sigma}, \mathsf{lin}^{\Sigma}, \mathsf{poly}^{\Sigma}\}$ denote the labels of all variants of the languages. Then for each $\star \in \mathsf{Lab}$ and $i \in \{1, 2, 3\}$, we define the languages $\mathcal{L}_i^{\star}$ of Boolean combinations of inequalities in a standard way: $\mathsf{f} ::= \mathsf{t} \le \mathsf{t}' \mid \neg \mathsf{f} \mid (\mathsf{f} \wedge \mathsf{f})$, where $\mathsf{t}, \mathsf{t}'$ are terms in $\mathsf{T}^{\star}(\mathcal{L}_i)$.
Although the language and its operations can appear rather restricted, all the usual elements of probabilistic and causal formulas can be encoded. Namely, equality is encoded as greater-or-equal in both directions; e.g., $\mathsf{t} = \mathsf{t}'$ means $\mathsf{t} \le \mathsf{t}' \wedge \mathsf{t}' \le \mathsf{t}$.
The number $0$ can be encoded as an inconsistent probability, i.e., $0 = \Pr(X{=}0 \wedge X{=}1)$. In a language allowing addition and multiplication, any positive integer can be easily encoded from the fact $1 = \Pr(X{=}0 \vee \neg X{=}0)$, e.g., $3 = 1 + 1 + 1$. If a language does not allow multiplication, one can show that the encoding is still possible. Note that these encodings barely change the size of the expressions, so allowing or disallowing these additional operators does not affect any complexity results involving these expressions.
To define the semantics of the languages, we use a structural causal model (SCM) as in [28, Sec. 3.2]. An SCM is a tuple $\mathfrak{M} = (\mathbf{V}, \mathcal{F}, P)$, such that $\mathbf{V}$ is a set of variables partitioned into exogenous (unobserved) variables $\mathbf{U} = \{U_1, \dots, U_m\}$ and endogenous variables $\mathbf{X} = \{X_1, \dots, X_n\}$. The tuple $\mathcal{F} = (F_1, \dots, F_n)$ consists of functions such that function $F_i$ calculates the value of variable $X_i$ from the values of other variables in $\mathbf{V}$ as $x_i = F_i(\mathbf{pa}_i, \mathbf{u}_i)$,⁶ where $\mathbf{Pa}_i \subseteq \mathbf{X} \setminus \{X_i\}$ and $\mathbf{U}_i \subseteq \mathbf{U}$. $P$ specifies a probability distribution of all exogenous variables $\mathbf{U}$. Since the variables $\mathbf{X}$ depend deterministically on the exogenous variables via the functions, $\mathcal{F}$ and $P$ obviously define the joint probability distribution of $\mathbf{X}$.

⁶ We consider recursive models, that is, we assume the endogenous variables are ordered such that variable $X_i$ (i.e., function $F_i$) is not affected by any $X_j$ with $j > i$. Here, we also use the usual notation with capital letters $\mathbf{Pa}_i, \mathbf{U}_i$ for sets of variables and lowercase letters $\mathbf{pa}_i, \mathbf{u}_i$ for assignments of values to those variables.
Throughout this paper, we assume that domains of endogenous variables are discrete and finite. In this setting, exogenous variables could take values in any domain, including infinite and continuous ones. A recent paper [39] shows, however, that any SCM over discrete endogenous variables is equivalent for evaluating post-interventional probabilities to an SCM where all exogenous variables are discrete with finite domains.
As a consequence, throughout this paper, we assume that domains of exogenous variables are discrete and finite, too.
For any basic $\mathcal{E}_{\mathrm{int}}$-formula $[X_i = x]\, \delta$ (which, in our notation, means $[\{X_i\} = \{x\}]\, \delta$), we denote by $\mathcal{F}_{[X_i = x]}$ the functions obtained from $\mathcal{F}$ by replacing $F_i$ with the constant function $F_i \equiv x$. We generalize this definition for all interventions $[\mathbf{X}' = \mathbf{x}']$ specified by $\mathcal{E}_{\mathrm{int}}$ in a natural way and denote the resulting functions as $\mathcal{F}_{[\mathbf{x}']}$. For any $\delta \in \mathcal{E}_{\mathrm{prop}}$, we write $\mathcal{F}, \mathbf{u} \models \delta$ if $\delta$ is satisfied for the values of $\mathbf{X}$ calculated from the values $\mathbf{u}$. For $[\mathbf{x}']\, \delta \in \mathcal{E}_{\mathrm{int}}$, we write $\mathcal{F}, \mathbf{u} \models [\mathbf{x}']\, \delta$ if $\mathcal{F}_{[\mathbf{x}']}, \mathbf{u} \models \delta$. And for all $\psi, \psi' \in \mathcal{E}_{\mathrm{count}}$, we write $\mathcal{F}, \mathbf{u} \models \neg \psi$ if $\mathcal{F}, \mathbf{u} \not\models \psi$, and $\mathcal{F}, \mathbf{u} \models \psi \wedge \psi'$ if $\mathcal{F}, \mathbf{u} \models \psi$ and $\mathcal{F}, \mathbf{u} \models \psi'$.
Finally, for $\psi \in \mathcal{E}_{\mathrm{count}}$, let $S_{\mathfrak{M}}(\psi) = \{\mathbf{u} \mid \mathcal{F}, \mathbf{u} \models \psi\}$.
We define $\llbracket \mathsf{t} \rrbracket_{\mathfrak{M}}$, for some expression $\mathsf{t}$, recursively in a natural way, starting with basic terms as follows: $\llbracket \Pr(\psi) \rrbracket_{\mathfrak{M}} = \sum_{\mathbf{u} \in S_{\mathfrak{M}}(\psi)} P(\mathbf{u})$ and, for conditional probabilities, $\llbracket \Pr(\psi \mid \psi') \rrbracket_{\mathfrak{M}} = \llbracket \Pr(\psi \wedge \psi') \rrbracket_{\mathfrak{M}} / \llbracket \Pr(\psi') \rrbracket_{\mathfrak{M}}$, assuming that the expression is undefined if $\llbracket \Pr(\psi') \rrbracket_{\mathfrak{M}} = 0$. For two expressions $\mathsf{t}_1$ and $\mathsf{t}_2$, we define $\mathfrak{M} \models \mathsf{t}_1 \le \mathsf{t}_2$ if and only if $\llbracket \mathsf{t}_1 \rrbracket_{\mathfrak{M}} \le \llbracket \mathsf{t}_2 \rrbracket_{\mathfrak{M}}$. The semantics for negation and conjunction are defined in the usual way, giving the semantics $\mathfrak{M} \models \varphi$ for any formula $\varphi$ in $\mathcal{L}_i^{\star}$.
2.3 Probabilistic and causal satisfiability problems
The (decision) satisfiability problems for languages of the PCH, denoted by $\mathsf{Sat}_i(\star)$, with $i \in \{\mathrm{prob}, \mathrm{int}, \mathrm{count}\}$ (corresponding to $\mathcal{L}_1, \mathcal{L}_2, \mathcal{L}_3$) and $\star \in \mathsf{Lab}$, take as input a formula $\varphi$ in $\mathcal{L}_i^{\star}$ and ask whether there exists a model $\mathfrak{M}$ such that $\mathfrak{M} \models \varphi$. Analogously, the validity problems for $\mathcal{L}_i^{\star}$ consist in deciding whether, for a given $\varphi$, $\mathfrak{M} \models \varphi$ holds for all models $\mathfrak{M}$. From the definitions, it is obvious that the variants of the problems for a higher level of the PCH are at least as hard as their counterparts at the lower levels.
In many studies, a researcher can infer information about the graph structure of the SCM, combining observed and experimental data with existing knowledge. Indeed, learning causal structures from data has been the subject of a considerable amount of research (see, e.g., [9, 13, 37] for recent reviews and [30, 23, 20, 33, 31] for current achievements). Thus, the problem of interest in this setting, which is a subject of this section, is to determine the satisfiability, resp., validity of formulas in SCMs with the additional requirement on the graph structure of the models.
Let $\mathfrak{M} = (\mathbf{V}, \mathcal{F}, P)$ be an SCM. We will assume that the models are semi-Markovian in the general case and Markovian in the graph-constrained case.⁷ Markovian means that the exogenous arguments of $F_i$ and $F_j$ are independent whenever $i \ne j$. We define that a DAG $G$ represents the graph structure of $\mathfrak{M}$ if, for every $X_j$ appearing as an argument of $F_i$, $X_j \to X_i$ is an edge in $G$. The DAG is called a causal diagram of the model [28, 3]. The satisfiability problems with an additional requirement on the causal diagram take as input a formula $\varphi$ and a DAG $G$, with the task of deciding whether there exists an SCM $\mathfrak{M}$ with the structure $G$ such that $\mathfrak{M} \models \varphi$. In this way, for the problems $\mathsf{Sat}_i(\star)$, we get the corresponding problems denoted as $\mathsf{Sat}^G_i(\star)$, with $i \in \{\mathrm{prob}, \mathrm{int}, \mathrm{count}\}$ and $\star \in \mathsf{Lab}$.

⁷ We note that the general semi-Markovian model, which allows for the sharing of exogenous arguments and allows for arbitrary dependencies among the exogenous variables, can be reduced in a standard way to the Markovian model by introducing new auxiliary "latent" variables which belong to the endogenous variables of the model, but which are non-measurable in contrast to the "standard" observed/measurable variables. Then we can use such latent variables as nodes of a DAG, but the joint probability distribution is restricted to only the measurable variables and ignores the latent ones.
An important feature of $\mathsf{Sat}_i(\star)$ instances, over all levels $i$, is the small model property, which says that, for every satisfiable formula, there is a model whose size is bounded polynomially with respect to the length of the input. This property was used to prove the memberships of $\mathsf{Sat}_i(\mathsf{base})$ and $\mathsf{Sat}_i(\mathsf{lin})$ in NP and of $\mathsf{Sat}_i(\mathsf{poly})$ in $\exists\mathbb{R}$ [11, 15, 22, 16]. Interestingly, the membership proofs of $\mathsf{Sat}_{\mathrm{prob}}(\mathsf{lin}^{\Sigma})$ in $\mathsf{NP}^{\mathsf{PP}}$ and of $\mathsf{Sat}_{\mathrm{count}}(\mathsf{lin}^{\Sigma})$ in NEXP [8] rely on the property as well.
Apart from these advantages, the small model property is interesting in itself. For example, on the probabilistic layer, if a formula $\varphi$ (without the summation) over $n$ variables is satisfiable, then there exists a Bayesian network $(G, P)$, with a DAG $G$ over $n$ nodes and conditional probability tables for each variable, such that $\varphi$ is true in $(G, P)$ and the total size of the tables is bounded by a polynomial in the length of $\varphi$.
This has motivated the introduction of the small-model problems, denoted as $\mathsf{Sat}^{\mathrm{small}}_i(\star)$, which are defined like $\mathsf{Sat}_i(\star)$ with the additional constraint that a satisfying distribution should only have polynomially large support, that is, only polynomially many entries in the exponentially large table of probabilities are non-zero [4]. In that paper, the authors achieve this by extending an instance with an additional unary input $s$ and requiring that the satisfying distribution has a support of size at most $s$. We define the interventional and counterfactual variants in a similar way. Formally, we use the following:
Definition 1.
The decision problems $\mathsf{Sat}^{\mathrm{small}}_i(\star)$, with $i \in \{\mathrm{prob}, \mathrm{int}, \mathrm{count}\}$ and $\star \in \mathsf{Lab}$, take as input a formula $\varphi \in \mathcal{L}_i^{\star}$ and a unary encoded number $s$ and ask whether there exists a model $\mathfrak{M}$ such that $\mathfrak{M} \models \varphi$ and $|\{\mathbf{u} : P(\mathbf{u}) > 0\}| \le s$.
2.4 The (succinct) existential theory of the reals
For two computational problems $A$ and $B$, we will write $A \le B$ if $A$ can be reduced to $B$ in polynomial time, which means $A$ is not harder to solve than $B$. A problem $B$ is complete for a complexity class $\mathcal{C}$ if $B \in \mathcal{C}$ and, for every other problem $A \in \mathcal{C}$, it holds that $A \le B$. By $\mathsf{co}\text{-}\mathcal{C}$, we denote the class of all problems such that their complements belong to $\mathcal{C}$.
To measure the computational complexity of the problems $\mathsf{Sat}_i(\star)$, a central role is played by well-known Boolean complexity classes such as NP, PP, PSPACE, and NEXP (for formal definitions see, e.g., [2]). Recent research has shown that the precise complexity of several natural satisfiability problems can be expressed in terms of the classes over the real numbers $\exists\mathbb{R}$ and $\mathsf{succ}\text{-}\exists\mathbb{R}$. Recall that the existential theory of the reals (ETR) is the set of true sentences of the form

\[
\exists x_1 \dots \exists x_n \; \varphi(x_1, \dots, x_n), \tag{1}
\]

where $\varphi$ is a quantifier-free Boolean formula over the basis $\{\vee, \wedge, \neg\}$ and a signature consisting of the constants $0$ and $1$, the functional symbols $+$ and $\cdot$, and the relational symbols $<$, $\le$, and $=$. The sentence is interpreted over the real numbers in the standard way. The theory forms its own complexity class $\exists\mathbb{R}$, which is defined as the closure of ETR under polynomial-time many-one reductions [14, 5, 35]. A succinct variant of ETR, denoted as succ-ETR, and the corresponding class $\mathsf{succ}\text{-}\exists\mathbb{R}$, have been introduced in [38]. succ-ETR is the set of all Boolean circuits $C$ that encode a true sentence $\varphi$ as in (1) as follows. Assume that $C$ computes a function $\{0,1\}^N \to \{0,1\}^M$. Then $\{0,1\}^N$ represents the node set of the tree underlying $\varphi$, and $C(v)$ is an encoding of the description of node $v$, consisting of the label of $v$, its parent, and its two children. The variables in $\varphi$ are $x_1, \dots, x_{2^N}$. As in the case of $\exists\mathbb{R}$, to $\mathsf{succ}\text{-}\exists\mathbb{R}$ belong all languages which are polynomial-time many-one reducible to succ-ETR. $\exists\mathbb{R}$ and $\mathsf{succ}\text{-}\exists\mathbb{R}$ can also be defined in terms of real RAMs [10, 4]. These two classes correspond to nondeterministic polynomial and exponential time, respectively, on such machines.
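For illustration (our example, not from the paper), the following is a true ETR sentence over this signature, asserting the existence of $\sqrt{2}$:

\[
\exists x \, \big( x \cdot x = 1 + 1 \;\wedge\; 0 \le x \big).
\]

Its quantifier-free part uses only the allowed constants, operations, and relations; a succ-ETR instance would instead provide a circuit describing such a (possibly exponentially large) formula node by node.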
In a breakthrough result, Renegar [32] showed that $\exists\mathbb{R} \subseteq \mathsf{PSPACE}$. In addition, his result also shows that even if the formula contains an exponential number of arithmetic terms of exponential size with exponential degree, it is still possible to solve the formula in PSPACE, as long as the number of variables is polynomially bounded.
3 Results
Intuitively, one would expect that when we make the arithmetic of the underlying satisfiability problem more powerful, then the complexity of the problem will also increase. The same is true when we go up in the PCH. While this is sometimes the case, we will also see many interesting cases where the complexity remains unchanged when we change a certain parameter. Our main focus will be on the effect of constraining the model on the complexity of the problem. As mentioned earlier, we will always assume a compact marginalization operator since this is a very natural operation that frequently appears in expressions involving probabilities. Furthermore, without compact marginalization, everything follows from the results of [11, 22], since the models appearing in their proofs are already very constrained (see the full version).
3.1 Previous work
Table 1: The complexity landscape for unconstrained models (all entries are completeness results).

| Terms | $\mathcal{L}_1$ (prob.) | $\mathcal{L}_2$ (interv.) | $\mathcal{L}_3$ (count.) |
| --- | --- | --- | --- |
| basic | NP | NP | NP |
| lin | NP | NP | NP |
| poly | $\exists\mathbb{R}$ | $\exists\mathbb{R}$ | $\exists\mathbb{R}$ |
| basic & marg. | $\mathsf{NP}^{\mathsf{PP}}$ | PSPACE | NEXP |
| lin & marg. | $\mathsf{NP}^{\mathsf{PP}}$ | PSPACE | NEXP |
| poly & marg. | $\mathsf{succ}\text{-}\exists\mathbb{R}$ | $\mathsf{succ}\text{-}\exists\mathbb{R}$ | $\mathsf{succ}\text{-}\exists\mathbb{R}$ |
For unconstrained models, we have a clear picture of the satisfiability landscape; see Table 1. Fagin et al. [11] prove the NP-completeness for the probabilistic layer with basic and linear terms and the NP-hardness with polynomial terms. Mossé et al. [22] prove that the latter is in fact complete for the existential theory of the reals, $\exists\mathbb{R}$. They also prove that the same complexity results hold for the interventional and counterfactual layers of the PCH. This means that the layer of the PCH does not have any impact on the complexity of the corresponding satisfiability problem. Things change, however, when we add a marginalization operator. As shown in [8], if the arithmetic is basic or linear, then the satisfiability problem is complete for $\mathsf{NP}^{\mathsf{PP}}$, PSPACE, and NEXP as the PCH layer increases from probabilistic to counterfactual. If, however, the arithmetic is polynomial, then all three layers have the same complexity again: they are complete for the new class $\mathsf{succ}\text{-}\exists\mathbb{R}$ [38, 17, 8].
3.2 Our results
As our main result, we complete the complexity landscape of the probabilistic and causal satisfiability problems when we either constrain the graph structure of the SCM or when we force the model to be small.
Table 2: The complexity landscape when the graph structure of the SCM is part of the input.

| Terms | $\mathcal{L}_1$ (prob.) | $\mathcal{L}_2$ (interv.) | $\mathcal{L}_3$ (count.) |
| --- | --- | --- | --- |
| basic & marg. | $\mathsf{NP}^{\mathsf{PP}}$-hard and $\exists\mathbb{R}$-hard | NEXP-complete | NEXP-complete |
| lin & marg. | $\mathsf{NP}^{\mathsf{PP}}$-hard and $\exists\mathbb{R}$-hard | NEXP-complete | NEXP-complete |
| poly & marg. | $\mathsf{succ}\text{-}\exists\mathbb{R}$-complete | $\mathsf{succ}\text{-}\exists\mathbb{R}$-complete | $\mathsf{succ}\text{-}\exists\mathbb{R}$-complete |
As the first part of our main results, we classify the difficulty of the satisfiability problems when the graph structure is given as a part of the input. We will show that these cases are always at least as hard as the corresponding problems without specified graph structures. This is due to the fact that we can always take the complete acyclic graph. The most notable difference to the unconstrained problems happens at the interventional layer. Here we have an increase in complexity from PSPACE to NEXP. We also compare our results to the model checking problem. Here we are not only given a graph structure but the full SCM, and we need to check whether the model satisfies the input formula. It turns out that model checking is considerably easier. For instance, when the arithmetic is polynomial with marginalization, then even at the probabilistic layer, the satisfiability problem is $\mathsf{succ}\text{-}\exists\mathbb{R}$-complete, while the corresponding model checking problem is in $\mathsf{P}^{\mathsf{PP}}$ (Theorem 4).
Table 3: The complexity landscape for small models (all entries are completeness results).

| Terms | $\mathcal{L}_1$ (prob.) | $\mathcal{L}_2$ (interv.) | $\mathcal{L}_3$ (count.) |
| --- | --- | --- | --- |
| basic & marg. | $\mathsf{NP}^{\mathsf{PP}}$ | PSPACE | NEXP |
| lin & marg. | $\mathsf{NP}^{\mathsf{PP}}$ | PSPACE | NEXP |
| poly & marg. | $\exists\mathbb{R}^{\Sigma}$ | NEXP | NEXP |
Secondly, we consider the satisfiability problems with the small-model requirement and investigate how the complexity increases across the PCH, in the presence of summation operators. In Definition 1, we defined the small-model property for SCMs. In the original work of Fagin et al., there was no underlying SCM, since they only dealt with the probabilistic layer, and a small model meant that we only have polynomially many non-zero entries in the table of the joint probability distribution of the (observed) variables. Since each observed variable is a function of the unobserved variables, both definitions are equivalent for semi-Markovian models, as we show in Fact 3. Most notably, with polynomial arithmetic at the probabilistic layer, having a small model reduces the complexity to $\exists\mathbb{R}^{\Sigma}$-completeness, where $\exists\mathbb{R}^{\Sigma}$ is the existential theory of the reals enhanced with a compact summation operator. This was shown in [4] for a probabilistic model without an underlying SCM, and with Fact 3 we show that it still holds for Definition 1. As soon as we are at the interventional or even the counterfactual layer, the complexity jumps, but only to NEXP-completeness. Thus, when we have a small model, the problem is hard for a Boolean class, although we have polynomial arithmetic and would have expected that the problem is hard for some theory over the reals. For linear arithmetic with a compact summation operator, it is known that the complexity increases along the PCH, and we show that the corresponding proofs of [8] are still valid if one requires the small-model property. The results are summarized in Table 3.
4 Satisfiability and validity with requirements on the graph structure of SCMs
4.1 Probabilistic layer
A Bayesian network (BN) consists of a DAG $G$ over nodes representing random variables $X_1, \dots, X_n$ and a distribution $P$ which factorizes over $G$, meaning that the joint probability can be written as a product $P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathbf{pa}_i)$, where $\mathbf{pa}_i$ denotes the values of the parents $\mathbf{Pa}_i$ of $X_i$ in $G$. Then the distribution is specified as a set of conditional probability tables for each variable conditioned on its parents in $G$ [18]. So, for example, in the case of $n$ binary variables, even if the array representing the joint distribution has $2^n$ non-zero elements, it can still be compactly represented by a set of arrays of total size $\sum_{i=1}^{n} 2^{|\mathbf{Pa}_i| + 1}$. This representation has many additional advantages, for example, enabling efficient probabilistic inference.
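The following minimal Python sketch (ours, with ad-hoc data structures) illustrates this compact representation: the joint probability of a full assignment is computed as the product of one CPT entry per variable.

```python
# Joint probability from a Bayesian network: product of CPT entries.
def joint_prob(assignment, parents, cpt):
    """assignment: {var: value}; parents: {var: tuple of parent vars} (the DAG);
    cpt: {var: {(value, parent_values): Pr(var=value | parents=parent_values)}}."""
    p = 1.0
    for v, val in assignment.items():
        pa_vals = tuple(assignment[u] for u in parents[v])
        p *= cpt[v][(val, pa_vals)]
    return p

# Chain X -> Y over binary domains: 2 + 4 CPT entries encode all 4 joint entries.
parents = {"X": (), "Y": ("X",)}
cpt = {"X": {(0, ()): 0.3, (1, ()): 0.7},
       "Y": {(0, (0,)): 0.9, (1, (0,)): 0.1,
             (0, (1,)): 0.2, (1, (1,)): 0.8}}
print(joint_prob({"X": 1, "Y": 1}, parents, cpt))  # 0.7 * 0.8 = 0.56
```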
In the case of purely probabilistic languages, the problem $\mathsf{Sat}^G_{\mathrm{prob}}(\star)$ can be read as follows: given a DAG $G$ over the $n$ nodes of a Bayesian network (but not being given the distribution $P$) and a formula $\varphi$ in the language $\mathcal{L}_1^{\star}$, decide whether there exists a $P$ such that $\varphi$ is true in the BN $(G, P)$, where $P$ factorizes over $G$.
We start with the observation that $\mathsf{Sat}_i(\star)$ is not harder than $\mathsf{Sat}^G_i(\star)$, for any $i$ and $\star$.
Proposition 2.
For all $i \in \{\mathrm{prob}, \mathrm{int}, \mathrm{count}\}$, all $\star \in \mathsf{Lab}$, and any $\varphi$ over variables $X_1, \dots, X_n$, it is true that $\varphi \in \mathsf{Sat}_i(\star)$ if and only if $(\varphi, K_n) \in \mathsf{Sat}^G_i(\star)$, where $K_n$ denotes the complete DAG over nodes $X_1, \dots, X_n$, with edges $X_i \to X_j$ for all $i < j$.
Proof.
Assume $\varphi$ is satisfiable in a model with joint distribution $P$. We define a BN $(K_n, P')$ where, for every $i$, the conditional probability table is given by $P'(x_i \mid x_1, \dots, x_{i-1}) = P(x_i \mid x_1, \dots, x_{i-1})$ if $P(x_1, \dots, x_{i-1}) > 0$; otherwise we set the value arbitrarily, e.g., to $1/|\mathrm{Val}|$, where $|\mathrm{Val}|$ is the cardinality of the set of values of $X_i$. Due to the chain rule, we get that $P' = P$, which means that $\varphi$ is satisfiable in $(K_n, P')$.
The reverse implication is obvious.
Due to their great importance in probabilistic reasoning, a lot of attention in AI research has been given to both function and decision problems in which the entire BN is given as input, with the goal of answering specific queries about the distribution. Certainly, among the most basic ones is the BN-Pr problem: compute, for a given BN $(G, P)$ over variables $\mathbf{X}$, a variable $X$ in $\mathbf{X}$, and a value $x$, the probability $P(X = x)$. The next important primitive of probabilistic reasoning consists in finding the Maximum a Posteriori Hypothesis (MAP). To study its computational complexity, the natural decision problem, named D-MAP, has been investigated, which asks if, for a given rational number $q$, evidence $\mathbf{e}$, and some subset of variables $\mathbf{X}' \subseteq \mathbf{X}$, there is an instantiation $\mathbf{x}'$ of $\mathbf{X}'$ such that $P(\mathbf{x}', \mathbf{e}) > q$. It is well known that (the decision version of) BN-Pr is PP-complete and that D-MAP is $\mathsf{NP}^{\mathsf{PP}}$-complete [34, 25]. To relate our setting to standard Bayesian reasoning, we introduce a Model Checking problem:
Definition 3.
The Bayesian Network Model Checking problem, denoted as BN-MC, gets a BN $(G, P)$ and a formula $\varphi$ in $\mathcal{L}_1^{\mathsf{poly}^{\Sigma}}$, and verifies whether the formula is satisfied in the BN $(G, P)$.
This problem can be considered as a generalization of the decision version of BN-Pr.
Theorem 4.
BN-MC is in $\mathsf{P}^{\mathsf{PP}}$. Equivalently, it is in $\mathsf{P}^{\#\mathsf{P}}$.
Combining this theorem with [8, Thm. 4], the result of [25], and Proposition 2, we get a chain of polynomial-time reductions leading from BN-MC via D-MAP to the graph-constrained satisfiability problems, which presents the relationships between the model checking, D-MAP, and satisfiability problems for probabilistic languages from a complexity perspective. In the general case, i.e., in the case of polynomial languages with summation, the constraints on DAGs do not make the problems more difficult, but restricting the models to fully specified BNs leads to different complexities, under a standard complexity assumption.
Lemma 5.
$\mathsf{Sat}^G_{\mathrm{prob}}(\mathsf{base}^{\Sigma})$ (and hence also $\mathsf{Sat}^G_{\mathrm{prob}}(\mathsf{lin}^{\Sigma})$) is $\exists\mathbb{R}$-hard.
This means that $\mathsf{Sat}^G_{\mathrm{prob}}(\mathsf{lin}^{\Sigma})$, which is at least as hard as either D-MAP or ETR, is both $\mathsf{NP}^{\mathsf{PP}}$-hard and $\exists\mathbb{R}$-hard. Since these are very distinct classes, one should not expect $\mathsf{Sat}^G_{\mathrm{prob}}(\mathsf{lin}^{\Sigma})$ to be complete for either class.
Proposition 6.
$\mathsf{Sat}^G_{\mathrm{prob}}(\mathsf{poly}^{\Sigma})$ is $\mathsf{succ}\text{-}\exists\mathbb{R}$-complete.
Proof.
From Theorem 11, we know that $\mathsf{Sat}^G_{\mathrm{count}}(\mathsf{poly}^{\Sigma})$ is $\mathsf{succ}\text{-}\exists\mathbb{R}$-complete. Since any probabilistic formula is a special case of a counterfactual one, the membership $\mathsf{Sat}^G_{\mathrm{prob}}(\mathsf{poly}^{\Sigma}) \in \mathsf{succ}\text{-}\exists\mathbb{R}$ follows. On the other hand, from Proposition 2 and the $\mathsf{succ}\text{-}\exists\mathbb{R}$-completeness of the unconstrained problem $\mathsf{Sat}_{\mathrm{prob}}(\mathsf{poly}^{\Sigma})$ [38], we can conclude that the hardness holds as well.
Thus, $\mathsf{Sat}^G_{\mathrm{prob}}(\mathsf{poly}^{\Sigma})$ and the problems above it in the PCH are computationally harder than BN-MC if $\mathsf{P}^{\mathsf{PP}} \ne \mathsf{succ}\text{-}\exists\mathbb{R}$.
4.2 Interventional and counterfactual reasoning
One of the key components in the development of structural causal models is the do-calculus introduced by Pearl [27, 28], which allows one to estimate causal effects from observational data. The calculus can be expressed as a validity problem for intervention-level languages with the graph structure requirements on SCMs. In particular, the rules of the calculus can be seen as instances of validity problems for which equivalent graphical conditions in terms of $d$-separation are given. For example, the "insertion/deletion of observations" rule requires, for a given DAG $G$, that, for all SCMs with the causal structure $G$ and for all values, an equality between two interventional conditional probabilities holds; see the displayed statement below. This rule can be written in a compact way as a validity instance of our languages. In this section, we prove that, in general, the validity problem with polynomial arithmetic is very hard; namely, we will see that it is complete for $\mathsf{co}\text{-}\mathsf{succ}\text{-}\exists\mathbb{R}$.
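For concreteness, rule 1 of the do-calculus in our bracket notation is the following standard statement from [28]: for disjoint variable sets $\mathbf{X}, \mathbf{Y}, \mathbf{Z}, \mathbf{W}$,

\[
\Pr([\mathbf{x}]\, \mathbf{y} \mid \mathbf{z}, \mathbf{w}) \;=\; \Pr([\mathbf{x}]\, \mathbf{y} \mid \mathbf{w})
\quad \text{for all values, whenever } (\mathbf{Y} \perp \mathbf{Z} \mid \mathbf{X}, \mathbf{W}) \text{ holds in } G_{\overline{\mathbf{X}}},
\]

where $G_{\overline{\mathbf{X}}}$ denotes $G$ with all edges into $\mathbf{X}$ removed. As a hedged sketch of the compactness claim (our formulation; the paper's encoding may differ), the quantification over all values can be folded into a single polynomial equation with summation operators after clearing denominators (shown here for single variables; sets are handled by nested sums):

\[
\sum_{x} \sum_{y} \sum_{z} \sum_{w} \Big( \Pr([x]\,(y \wedge z \wedge w)) \cdot \Pr([x]\, w) \;-\; \Pr([x]\,(y \wedge w)) \cdot \Pr([x]\,(z \wedge w)) \Big)^{2} = 0 .
\]

Validity of this formula over all models with causal diagram $G$ is then exactly the conclusion of the rule.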
To this end, we study the complexity of the satisfiability problems with graph structure requirements in the case of the interventional languages. The basic and linear languages are neither weak enough to be independent of the graph structure nor strong enough to be able to distinguish graph structures exactly. In particular, the complexity of $\mathsf{Sat}^G_{\mathrm{int}}(\mathsf{base}^{\Sigma})$ and $\mathsf{Sat}^G_{\mathrm{int}}(\mathsf{lin}^{\Sigma})$ increases above the PSPACE-completeness of $\mathsf{Sat}_{\mathrm{int}}(\mathsf{base}^{\Sigma})$ and $\mathsf{Sat}_{\mathrm{int}}(\mathsf{lin}^{\Sigma})$. On the other hand, the polynomial languages are strong enough to encode the DAG structure of the model, which implies that the complexity for these languages is the same as for the unconstrained problems.
Theorem 7.
$\mathsf{Sat}^G_{\mathrm{int}}(\mathsf{base}^{\Sigma})$ and $\mathsf{Sat}^G_{\mathrm{int}}(\mathsf{lin}^{\Sigma})$ are NEXP-complete. Thus, the validity problems for the corresponding languages are complete for coNEXP.
An important tool for the proof of this theorem is the satisfiability of a Schönfinkel-Bernays sentence. The class of Schönfinkel-Bernays sentences (also called Effectively Propositional Logic, EPR) is a fragment of first-order logic formulas where satisfiability is decidable. Each sentence in the class is of the form $\exists x_1 \dots \exists x_n \forall y_1 \dots \forall y_m \; \varphi$, where $\varphi$ can contain the logical operations $\wedge, \vee, \neg$, the variables $x_i$ and $y_j$, equalities, and relation symbols applied to these variables, but cannot contain any quantifiers or function symbols. Determining whether a Schönfinkel-Bernays sentence is satisfiable is a NEXP-complete problem [19], even if all variables are restricted to binary values [1].
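An illustrative instance (our example, not from the paper): the sentence

\[
\exists x_1 \exists x_2 \, \forall y_1 \forall y_2 \; \big( R(x_1, x_2) \wedge (R(y_1, y_2) \rightarrow R(y_2, y_1)) \big)
\]

is a Schönfinkel-Bernays sentence; it is satisfiable, e.g., over a one-element universe with $R$ the total relation.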
Corollary 8.
$\mathsf{Sat}^G_{\mathrm{int}}(\mathsf{lin}^{\Sigma})$ is computationally harder than $\mathsf{Sat}_{\mathrm{int}}(\mathsf{lin}^{\Sigma})$ unless $\mathsf{PSPACE} = \mathsf{NEXP}$.
If the graph is complete, that is, all nodes are connected by an edge, it is equivalent to a causal ordering: an ordering $X_1 < X_2 < \dots < X_n$ such that variable $X_i$ can only depend on variables $X_j$ with $j < i$. If a causal ordering is given as a constraint, it does not change the complexity of the problem:
Lemma 9.
In $\mathsf{Sat}_{\mathrm{int}}(\mathsf{lin}^{\Sigma})$ one can encode a causal ordering.
Proof.
Given (in-)equalities in $\mathcal{L}_2$, we add a new variable $X_0$ and add to each primitive the intervention $[X_0 = 0]$. This does not change the satisfiability of the (in-)equalities.
Given a causal order $X_1, \dots, X_n$, we add, for each variable $X_i$ with $0 \le i < n$ and all relevant values, equations stating that an additional intervention on a variable occurring later in the order does not change the value of $X_{i+1}$.
The equations ensure that, if one later variable is changed by an intervention while $X_0$ is set, the next variable in the causal ordering has the same value, thus fixing an order from the first to the last variable.
The equivalence between causal orderings and complete graphs thus results in:
Corollary 10.
$\mathsf{Sat}^G_{\mathrm{int}}(\mathsf{lin}^{\Sigma})$ remains PSPACE-complete in the special case that $G$ happens to be a complete graph.
Counterfactual languages, on the other hand, are strong enough to be able to enforce an explicit graph structure directly in the formula using exponential sums. That is, for each variable $X_i$, the formula can encode which variables are the parents of $X_i$ by requiring that $X_i$ remains constant if the parents remain constant. This condition is encoded with an intervention on the parents and an intervention on the other variables, and a counterfactual query ensuring that the second intervention does not change the probability distribution of $X_i$ after the first intervention. An exponential sum can run through all possible interventions.
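A hedged sketch of one such constraint (our formulation; the paper's exact encoding may differ): let $X_i$ be a variable with parents $\mathbf{Pa}_i$ in the given DAG $G$ and let $\mathbf{Z}_i$ be the remaining endogenous variables. Then

\[
\sum_{\mathbf{pa}} \sum_{\mathbf{z}} \sum_{x} \Pr\Big( [\mathbf{Pa}_i = \mathbf{pa}]\, X_i = x \;\wedge\; [\mathbf{Pa}_i = \mathbf{pa}, \mathbf{Z}_i = \mathbf{z}]\, \neg(X_i = x) \Big) = 0
\]

states that, once the parents are fixed by an intervention, additionally intervening on all remaining variables never changes the value of $X_i$. The conjunction of the two differently intervened events is a genuine $\mathcal{E}_{\mathrm{count}}$ event, and the sums, although they range over exponentially many interventions, have only polynomial length thanks to the summation operator.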
Theorem 11.
$\mathsf{Sat}^G_{\mathrm{count}}(\star)$ and $\mathsf{Sat}_{\mathrm{count}}(\star)$ are polynomial-time equivalent for any $\star \in \{\mathsf{base}^{\Sigma}, \mathsf{lin}^{\Sigma}, \mathsf{poly}^{\Sigma}\}$.
Corollary 12.
$\mathsf{Sat}^G_{\mathrm{count}}(\mathsf{poly}^{\Sigma})$ is $\mathsf{succ}\text{-}\exists\mathbb{R}$-complete.
5 Probabilistic and causal reasoning with a small model
In [4], the complexity of a small-model version of the probabilistic satisfiability problem is investigated. The authors consider purely probabilistic models (without an underlying SCM), where "small" means that $|\{\mathbf{x} : P(\mathbf{X} = \mathbf{x}) > 0\}|$, the support of the observed distribution, is polynomially bounded in the input size. Before talking further about this problem, we need to show that it is equivalent to our problem according to Definition 1, where $|\{\mathbf{u} : P(\mathbf{U} = \mathbf{u}) > 0\}|$, the support of the exogenous distribution, is small (we cannot use their definition directly, as it is not meaningful to give constraints on the observed distribution for causal models, where interventions can change this distribution in various ways).
Fact 2.
For a given model $\mathfrak{M}$, let $s_{\mathbf{X}}$ be the number of assignments to the observed variables with positive probability and $s_{\mathbf{U}}$ be the number of assignments to the unobserved variables with positive probability.
Then $s_{\mathbf{X}} \le s_{\mathbf{U}}$.
The reverse does not hold: all functions in the model might be constant, in which case $s_{\mathbf{X}} = 1$, regardless of $s_{\mathbf{U}}$. Thus a model that is a solution for the problem of [4] might not be a solution to our problem. Still:
Fact 3.
For a given probabilistic formula $\varphi$ and a number $s$, there exists a semi-Markovian model $\mathfrak{M}$ such that $\mathfrak{M} \models \varphi$ and $s_{\mathbf{X}} \le s$ if and only if there exists a model $\mathfrak{M}'$ such that $\mathfrak{M}' \models \varphi$ and $s_{\mathbf{U}} \le s$.
Here it is critical to consider semi-Markovian models, as the unobserved variables constructed in the proof might not be independent, as would be required in Markovian models.
Fact 4.
Fact 3 does not hold for Markovian models.
Nevertheless, Fact 3 implies that the completeness proof for the small-model version restricted to the observed distribution in [4] also applies to our $\mathsf{Sat}^{\mathrm{small}}_{\mathrm{prob}}(\mathsf{poly}^{\Sigma})$. Thus we know that $\mathsf{Sat}^{\mathrm{small}}_{\mathrm{prob}}(\mathsf{poly}^{\Sigma})$ is complete for a new complexity class $\exists\mathbb{R}^{\Sigma}$, a class that extends the existential theory of the reals with summation operators. This is not a succinct class, so it is different from $\mathsf{succ}\text{-}\exists\mathbb{R}$; in fact, they show
\[
\exists\mathbb{R} \;\subseteq\; \exists\mathbb{R}^{\Sigma} \;\subseteq\; \mathsf{NEXP}_{\mathbb{R}},
\]
where $\mathsf{NEXP}_{\mathbb{R}}$ is the class of problems decided by nondeterministic real Random Access Machines (RAMs) in exponential time (see [4] for the exact details).
Not allowing multiplication reduces the complexity. However, once we allow for interventions, the complexity increases again. We know the following complexities for linear arithmetic from [8], whose results translate to small models:
Lemma 13.
The following completeness results hold for small models:
- $\mathsf{Sat}^{\mathrm{small}}_{\mathrm{prob}}(\mathsf{base}^{\Sigma})$ and $\mathsf{Sat}^{\mathrm{small}}_{\mathrm{prob}}(\mathsf{lin}^{\Sigma})$ are $\mathsf{NP}^{\mathsf{PP}}$-complete.
- $\mathsf{Sat}^{\mathrm{small}}_{\mathrm{int}}(\mathsf{base}^{\Sigma})$ and $\mathsf{Sat}^{\mathrm{small}}_{\mathrm{int}}(\mathsf{lin}^{\Sigma})$ are PSPACE-complete.
- $\mathsf{Sat}^{\mathrm{small}}_{\mathrm{count}}(\mathsf{base}^{\Sigma})$ and $\mathsf{Sat}^{\mathrm{small}}_{\mathrm{count}}(\mathsf{lin}^{\Sigma})$ are NEXP-complete.
Allowing multiplication again leads to NEXP-completeness, even on the second (interventional) level. Note, however, that this is likely still weaker than the $\mathsf{succ}\text{-}\exists\mathbb{R}$-completeness of the unconstrained problems $\mathsf{Sat}_{\mathrm{int}}(\mathsf{poly}^{\Sigma})$ and $\mathsf{Sat}_{\mathrm{count}}(\mathsf{poly}^{\Sigma})$.
Theorem 14.
$\mathsf{Sat}^{\mathrm{small}}_{\mathrm{int}}(\mathsf{poly}^{\Sigma})$ and $\mathsf{Sat}^{\mathrm{small}}_{\mathrm{count}}(\mathsf{poly}^{\Sigma})$ are NEXP-complete.
This follows because a causal model consists of a probabilistic part $P$ and a deterministic, causal part $\mathcal{F}$. In the small model, $P$ is restricted to have polynomial size, but $\mathcal{F}$ can still have exponential size. In $\mathcal{L}_1$, one cannot reason about $\mathcal{F}$, but $\mathcal{L}_2$ and $\mathcal{L}_3$ can access the full power of $\mathcal{F}$. Since NEXP subsumes $\exists\mathbb{R}$, reasoning about the polynomially many real numbers (probabilities) cannot increase the complexity beyond NEXP.
6 Discussion
We investigated the computational complexities of probabilistic satisfiability problems from a multi-parametric point of view. Our new completeness results nicely extend and complement the previous achievements of [11, 22, 38, 4]. Our main focus was on the effect of constraining the model. We have shown that including a DAG in the input increases the complexity on the second level for linear languages (i.e., $\mathsf{Sat}^G_{\mathrm{int}}(\mathsf{lin}^{\Sigma})$) to NEXP-completeness, while it does not change the complexity on the third level or for the full polynomial languages. We leave the exact complexity of the first linear level (i.e., $\mathsf{Sat}^G_{\mathrm{prob}}(\mathsf{lin}^{\Sigma})$) for future work. On the other hand, including a full Bayesian network, i.e., a DAG together with its probability tables, reduces the complexity to membership in $\mathsf{P}^{\mathsf{PP}}$. Here, the exact completeness is also left for future work.⁸

⁸ As the main computation in the proof of Theorem 4 is performed by the PP-oracle, one can extend the proof to show the containment of BN-MC in a space-limited complexity class and in a circuit complexity class below $\mathsf{P}^{\mathsf{PP}}$. Thus, it is unlikely to be $\mathsf{P}^{\mathsf{PP}}$-complete, although it might be possible that all these classes are identical.
Another interesting feature is that when the model is unconstrained, then the completeness for linear languages is expressed in terms of standard Boolean classes while the completeness of satisfiability for languages involving polynomials over the probabilities requires classes over the reals. However, once we constrain the model to be small, then the complexity with polynomial arithmetic is again described by a Boolean complexity class (for PCH layers two and three). This is due to the fact that the running time of Renegar’s algorithm is essentially determined by the number of variables (which is small in the case of a small model) and not by the number of equations (which still can be large).
References
- [1] Antonis Achilleos. NEXP-completeness and universal hardness results for justification logic. In International Computer Science Symposium in Russia, pages 27–52. Springer, 2015. doi:10.1007/978-3-319-20297-6_3.
- [2] Sanjeev Arora and Boaz Barak. Computational complexity: a modern approach. Cambridge University Press, 2009.
- [3] Elias Bareinboim, Juan D. Correa, Duligur Ibeling, and Thomas Icard. On Pearl’s Hierarchy and the Foundations of Causal Inference, pages 507–556. Association for Computing Machinery, New York, NY, USA, 2022. doi:10.1145/3501714.3501743.
- [4] Markus Bläser, Julian Dörfler, Maciej Liśkiewicz, and Benito van der Zander. The existential theory of the reals with summation operators. In 35th Int. Symposium on Algorithms and Computation, ISAAC 2024, volume 322 of LIPIcs, pages 13:1–13:19. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.ISAAC.2024.13.
- [5] John Canny. Some algebraic and geometric computations in PSPACE. In Proceedings of the twentieth annual ACM symposium on Theory of computing, pages 460–467. ACM, 1988. doi:10.1145/62212.62257.
- [6] Gregory F. Cooper. The computational complexity of probabilistic inference using Bayesian belief networks. Artificial intelligence, 42(2-3):393–405, 1990. doi:10.1016/0004-3702(90)90060-D.
- [7] Paul Dagum and Michael Luby. Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artificial Intelligence, 60(1):141–153, 1993. doi:10.1016/0004-3702(93)90036-B.
- [8] Julian Dörfler, Benito van der Zander, Markus Bläser, and Maciej Liśkiewicz. From probability to counterfactuals: the increasing complexity of satisfiability in Pearl's Causal Hierarchy. In International Conference on Learning Representations (ICLR), to appear, 2025. Available as ArXiv TR 2405.07373.
- [9] Mathias Drton and Marloes H. Maathuis. Structure learning in graphical modeling. Annual Review of Statistics and Its Application, 4:365–393, 2017.
- [10] Jeff Erickson, Ivor Van Der Hoog, and Tillmann Miltzow. Smoothing the gap between NP and ER. SIAM Journal on Computing, 53:FOCS20–102–FOCS20–138, 2024.
- [11] Ronald Fagin, Joseph Y. Halpern, and Nimrod Megiddo. A logic for reasoning about probabilities. Information and computation, 87(1-2):78–128, 1990. doi:10.1016/0890-5401(90)90060-U.
- [12] Ronald Aylmer Fisher. Design of experiments. British Medical Journal, 1(3923):554, 1936.
- [13] Clark Glymour, Kun Zhang, and Peter Spirtes. Review of causal discovery methods based on graphical models. Frontiers in genetics, 10:524, 2019.
- [14] Dima Grigoriev and Nicolai Vorobjov. Solving systems of polynomial inequalities in subexponential time. J. Symb. Comput., 5(1/2):37–64, 1988. doi:10.1016/S0747-7171(88)80005-1.
- [15] Duligur Ibeling and Thomas Icard. Probabilistic reasoning across the causal hierarchy. In The 34th AAAI Conference on Artificial Intelligence, AAAI 2020, pages 10170–10177. AAAI Press, 2020. doi:10.1609/AAAI.V34I06.6577.
- [16] Duligur Ibeling, Thomas Icard, Krzysztof Mierzewski, and Milan Mossé. Probing the quantitative–qualitative divide in probabilistic reasoning. Annals of Pure and Applied Logic, 175(9):103339, 2024. doi:10.1016/J.APAL.2023.103339.
- [17] Duligur Ibeling, Thomas Icard, and Milan Mossé. On probabilistic and causal reasoning with summation operators. Journal of Logic and Computation, 2024.
- [18] Daphne Koller and Nir Friedman. Probabilistic graphical models: principles and techniques. MIT press, 2009.
- [19] Harry R. Lewis. Complexity results for classes of quantificational formulas. Journal of Computer and System Sciences, 21(3):317–353, 1980. doi:10.1016/0022-0000(80)90027-6.
- [20] Ni Y. Lu, Kun Zhang, and Changhe Yuan. Improving causal discovery by optimal Bayesian network learning. In Proc. of the AAAI Conference on Artificial Intelligence, volume 35(10), pages 8741–8748, 2021. doi:10.1609/AAAI.V35I10.17059.
- [21] Nicolai Meinshausen, Alain Hauser, Joris M. Mooij, Jonas Peters, Philip Versteeg, and Peter Bühlmann. Methods for causal inference from gene perturbation experiments and validation. Proceedings of the National Academy of Sciences, 113(27):7361–7368, 2016.
- [22] Milan Mossé, Duligur Ibeling, and Thomas Icard. Is causal reasoning harder than probabilistic reasoning? The Review of Symbolic Logic, pages 1–26, 2022.
- [23] Ignavier Ng, Yujia Zheng, Jiji Zhang, and Kun Zhang. Reliable causal discovery with improved exact search and weaker assumptions. Advances in Neural Information Processing Systems, NeurIPS, 34:20308–20320, 2021. URL: https://proceedings.neurips.cc/paper/2021/hash/a9b4ec2eb4ab7b1b9c3392bb5388119d-Abstract.html.
- [24] Nils J. Nilsson. Probabilistic logic. Artificial Intelligence, 28(1):71–87, 1986. doi:10.1016/0004-3702(86)90031-7.
- [25] James D. Park and Adnan Darwiche. Complexity results and approximation strategies for MAP explanations. Journal of Artificial Intelligence Research, 21:101–133, 2004. doi:10.1613/JAIR.1236.
- [26] Judea Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, 1988.
- [27] Judea Pearl. Causal diagrams for empirical research. Biometrika, 82(4):669–688, 1995.
- [28] Judea Pearl. Causality. Cambridge University Press, 2009.
- [29] Judea Pearl and Dana Mackenzie. The book of why: the new science of cause and effect. Basic books, 2018.
- [30] Alexander Reisach, Christof Seiler, and Sebastian Weichwald. Beware of the simulated DAG! Causal discovery benchmarks may be easy to game. Advances in Neural Information Processing Systems, NeurIPS, 34:27772–27784, 2021. URL: https://proceedings.neurips.cc/paper/2021/hash/e987eff4a7c7b7e580d659feb6f60c1a-Abstract.html.
- [31] Alexander Reisach, Myriam Tami, Christof Seiler, Antoine Chambaz, and Sebastian Weichwald. A scale-invariant sorting criterion to find a causal order in additive noise models. Advances in Neural Information Processing Systems, NeurIPS, 36:785–807, 2023.
- [32] James Renegar. On the computational complexity and geometry of the first-order theory of the reals. Part I: Introduction. Preliminaries. The geometry of semi-algebraic sets. The decision problem for the existential theory of the reals. Journal of symbolic computation, 13(3):255–299, 1992. doi:10.1016/S0747-7171(10)80003-3.
- [33] Paul Rolland, Volkan Cevher, Matthäus Kleindessner, Chris Russell, Dominik Janzing, Bernhard Schölkopf, and Francesco Locatello. Score matching enables causal discovery of nonlinear additive noise models. In International Conference on Machine Learning, ICML, pages 18741–18753. PMLR, 2022. URL: https://proceedings.mlr.press/v162/rolland22a.html.
- [34] Dan Roth. On the hardness of approximate reasoning. Artificial Intelligence, 82(1-2):273–302, 1996. doi:10.1016/0004-3702(94)00092-1.
- [35] Marcus Schaefer. Complexity of some geometric and topological problems. In International Symposium on Graph Drawing, pages 334–344. Springer, 2009. doi:10.1007/978-3-642-11805-0_32.
- [36] Ilya Shpitser and Judea Pearl. Complete identification methods for the causal hierarchy. Journal of Machine Learning Research, 9(Sep):1941–1979, 2008. doi:10.5555/1390681.1442797.
- [37] Chandler Squires and Caroline Uhler. Causal structure learning: A combinatorial perspective. Foundations of Computational Mathematics, pages 1–35, 2022.
- [38] Benito van der Zander, Markus Bläser, and Maciej Liśkiewicz. The hardness of reasoning about probabilities and causality. In Proc. Joint Conference on Artificial Intelligence (IJCAI 2023), 2023.
- [39] Junzhe Zhang, Jin Tian, and Elias Bareinboim. Partial counterfactual identification from observational and experimental data. In International Conference on Machine Learning, pages 26548–26558. PMLR, 2022. URL: https://proceedings.mlr.press/v162/zhang22ab.html.