
Probabilistic and Causal Satisfiability:
Constraining the Model

Markus Bläser, Saarland University, Saarbrücken, Germany · Julian Dörfler, Saarland University, Saarbrücken, Germany · Maciej Liśkiewicz, University of Lübeck, Germany · Benito van der Zander, University of Lübeck, Germany
Abstract

We study the complexity of satisfiability problems in probabilistic and causal reasoning. Given random variables X1, X2, … over finite domains, the basic terms are probabilities of propositional formulas over atomic events Xi=xi, such as P(X1=x1) or P(X1=x1 ∧ X2=x2). The basic terms can be combined using addition (yielding linear terms) or multiplication (polynomial terms). The probabilistic satisfiability problem asks whether a joint probability distribution satisfies a Boolean combination of (in)equalities over such terms. Fagin et al. [11] showed that for basic and linear terms, this problem is 𝖭𝖯-complete, making it no harder than Boolean satisfiability, while Mossé et al. [22] proved that for polynomial terms, it is complete for the existential theory of the reals.

Pearl’s Causal Hierarchy (PCH) extends the probabilistic setting with interventional and counterfactual reasoning, enriching the expressiveness of the languages. However, Mossé et al. [22] found that the complexity of satisfiability remains unchanged. Van der Zander et al. [38] showed that introducing a marginalization operator to languages induces a significant increase in complexity.

We extend this line of work by adding two new dimensions to the problem by constraining the models. First, we fix the graph structure of the underlying structural causal model, motivated by settings like Pearl’s do-calculus, and give a nearly complete landscape across different arithmetics and PCH levels. Second, we study small models. While earlier work showed that satisfiable instances admit polynomial-size models, this is no longer guaranteed with compact marginalization. We characterize the complexities of satisfiability under small-model constraints across different settings.

Keywords and phrases:
Existential theory of the real numbers, Computational complexity, Probabilistic logic, Structural Causal Models
Category:
Track B: Automata, Logic, Semantics, and Theory of Programming
Funding:
Benito van der Zander: Work supported by the Deutsche Forschungsgemeinschaft (DFG) grant 471183316 (ZA 1244/1-1).
Copyright and License:
© Markus Bläser, Julian Dörfler, Maciej Liśkiewicz, and Benito van der Zander; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Complexity theory and logic
Related Version:
Full Version: http://arxiv.org/abs/2504.19944
Editors:
Keren Censor-Hillel, Fabrizio Grandoni, Joël Ouaknine, and Gabriele Puppis

1 Introduction

Reasoning about probability is essential in many research fields. In computer science, it plays a crucial role in analyzing probabilistic programs, understanding a program’s behavior under probabilistic input assumptions, and handling uncertain information in expert systems. In a seminal paper, Fagin et al. [11] introduced a logic to reason about probabilities. Their aim was to formulate a calculus that allows one to phrase statements like “P(X=x)=1/2” or “P(X=x) < P(Y=y)”. They study (among other things) the satisfiability problem for Boolean combinations of such terms, i.e., whether there is a probability distribution that satisfies all the terms. It turns out that the expressiveness of the underlying calculus has a great influence on the complexity of such satisfiability problems. For instance, in their original work, Fagin et al. investigated three types of terms in the (in)equalities: basic ones, like those above, which consist of only single basic probabilities; linear combinations of basic probabilities; and polynomial expressions in basic probabilities. This choice parameterizes the satisfiability problem, and the type of terms has a direct impact on its complexity.

We will study the satisfiability of such formulas from a multi-parametric view: there are several parameters in the underlying calculus, like the type of terms mentioned above, and by adjusting these parameters, we obtain a large family of satisfiability problems of varying complexity. There has been a lot of work along these lines, see e.g. [11, 15, 22, 17, 38, 4, 16, 8], and the main contribution of this paper is that we finally (almost) complete the whole picture.

Another dimension is that we can enhance the basic probability terms. This is inspired by the causal theory in AI. The development of the modern causal theory in AI and empirical sciences has greatly benefited from an influential structured approach to inference about causal phenomena, which is based on a reasoning hierarchy named “Ladder of Causation”, also often referred to as “Pearl’s Causal Hierarchy” (PCH) ([36, 28, 3], see also [29] for a gentle introduction to the topic). This three-level framework formalizes various types of reasoning that reflect the progressive sophistication of human thought regarding causation. It arises from a collection of causal mechanisms that model the “ground truth” of unobserved nature formalized within a Structural Causal Model (SCM). These mechanisms are then combined with three patterns of reasoning concerning observed phenomena expressed at the corresponding layers of the hierarchy, known as probabilistic (also called associational in the AI literature), interventional, and counterfactual (for formal definitions of these concepts as well as for illustrative examples, see Section 2).

A basic term at the probabilistic (observational) layer is expressed as a common probability such as P(x,y). (In our paper, we consider random variables over discrete, finite domains. By an event, we mean a propositional formula over atomic events of the form X=x, such as (X=x ∧ Y=y) or (X=x ∨ Y≠y). Moreover, by P(Y=y, X=x), etc., we mean, as usual, P(X=x ∧ Y=y). Finally, by using lowercase letters as in P(x,y), we abbreviate P(X=x, Y=y).) This may represent queries like “How likely is it that a patient has both diabetes (X=x) and high blood pressure (Y=y)?” This layer corresponds precisely to the calculus developed by Fagin et al. mentioned above. The interventional patterns extend the basic probability terms by allowing the use of Pearl’s do-operator [28], which models an experiment like a randomized controlled trial [12]. For instance, P([x]y), which in general differs from P(y|x), allows one to ask hypothetical questions such as, e.g., “How likely is it that a patient’s headache will be cured (Y=y) if he or she takes aspirin (X=x)?” (A common and popular notation for the post-interventional probability is P(Y=y | do(X=x)). In this paper, however, we use the notation P([X=x]Y=y) since it is more convenient for our analysis.) An example formula at this layer is P([x]y) = Σz P(y|x,z)·P(z), which estimates the causal effect of the intervention do(X=x) (all patients take aspirin) on the outcome variable Y=y (headache cure). It illustrates the use of the prominent back-door adjustment to eliminate the confounding effect of a factor represented by the variable Z [28]. The basic terms at the highest level of the hierarchy enable us to formulate queries about counterfactual situations. For example, P([X=x′]Y=y′ | X=x, Y=y) expresses the probability that, for instance, a patient who did not receive a vaccine (X=x) and died (Y=y) would have lived (Y=y′) if he or she had been vaccinated (X=x′).

The computational complexity aspects of reasoning about uncertainty in this framework have been the subject of intensive studies in the past decades. The research has resulted in a large number of significant achievements, especially in the case of probabilistic inference with the input probability distributions encoded by Bayesian networks [26]. These problems are of great interest in AI and important results from this perspective include [6, 7, 34, 25, 18]. However, despite intensive research, many fundamental problems in the field remain open.

The main interest of our research is focused on the precise characterization of the computational complexity of satisfiability problems (and their validity counterparts) for languages of all PCH layers, combined with increasing the expressiveness of (in)equalities by enabling the use of more complex operators. Our primary concern is the effect of constraining the model in these satisfiability problems, on the one hand by fixing the graph structure of the model, and on the other by bounding the model size to be polynomial.

1.1 Reasoning about probabilities: a multi-parametric view

One starting point for our investigations is a pioneering work by Fagin et al. [11], who introduced a logic to reason about probabilities. We are given a set of random variables over some fixed finite domain D, often {0,1}. They consider formulas consisting of Boolean combinations of (in)equalities of basic and linear terms, like P((X=0 ∧ Y=1) ∨ (X=0 ∧ Y=0)) = 1 ∧ (P(X=0)=0 ∨ P(X=0)=1) ∧ (P(Y=0)=0 ∨ P(Y=0)=1) with binary variables X, Y. The authors provide a complete axiomatization for the used logic, which is essentially a formalization of Nilsson’s probabilistic logic [24], and they show that the problem of deciding satisfiability is 𝖭𝖯-complete. Thus, surprisingly, the complexity is no worse than that of propositional logic. Fagin et al. then extend the language to (in)equalities of polynomial terms, with the goal of reasoning about conditional probabilities. They prove that the latter variant is 𝖭𝖯-hard and contained in 𝖯𝖲𝖯𝖠𝖢𝖤. Recently, Mossé et al. [22] have determined the exact complexity of this problem, showing that deciding satisfiability is ∃ℝ-complete, where ∃ℝ is the well-studied class defined as the closure of the Existential Theory of the Reals (ETR) under polynomial-time many-one reductions.

In this work, we study the satisfiability (and validity) problem for such formulas involving probabilities. As already seen above, the choice of the admissible operations has an impact on the complexity. There are further choices that we can make, which we discuss in the following; all of them affect the complexity. Therefore, we will phrase these satisfiability problems as multi-parametric or multi-dimensional problems. The first dimension is the expressiveness of the underlying arithmetic:

Arithmetic:

Basic, linear, or polynomial.

The second dimension is the layer of Pearl’s Causal Hierarchy, which we have already discussed in the previous section. This setting is of great interest in AI since it allows one not only to reason about observations, but also about causality, and even counterfactuals, which is important, for instance, in fairness considerations. The three layers form a hierarchy of increasing expressiveness (for a more detailed description of the concepts, see Section 2).

PCH layer:

Probabilistic (observational), causal (interventional), or counterfactual.

The languages used in [11, 22] are capable of fully expressing probabilistic reasoning; in particular, they allow one to express marginalization, which is a common paradigm in this field. However, in the frameworks discussed above, this ability is very limited, since the languages do not allow the use of a unary summation operator Σ. Thus, for instance, to express the marginal distribution of a random variable Y over a subset of (binary) variables {Z1,…,Zm} as Σ_{z1,…,zm} P(y, z1,…,zm), an encoding without summation requires an expansion into P(y, Z1=0,…,Zm=0) + … + P(y, Z1=1,…,Zm=1), which is of exponential size in m. (We have chosen this example because it is short. In this particular example, the joint probability could be directly written as P(y) in the calculus of Fagin et al., see the formal definition in Section 2. This is, however, no longer true if we sum over more complicated expressions.) Consequently, to analyze the complexity of languages that exploit the standard notation of probability theory and statistics, one requires an encoding that represents marginalization via the sum operator Σ. In the recent paper [38], the authors present the first systematic study in this setting. They introduce a new natural class, named succ-∃ℝ, which can be viewed as a succinct variant of ∃ℝ.
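
To see the blowup concretely, the following small Python sketch (ours, not from the paper; the function names are illustrative) contrasts the expanded and the compact encodings of the marginal Σ_{z1,…,zm} P(y, z1,…,zm):

```python
# Illustration of why the compact summation operator matters: expanding a
# marginalization over m binary variables yields 2^m summands, while the
# compact form with Σ stays linear in m.
from itertools import product

def expanded_marginal(m):
    """All summands of sum_{z1,...,zm} P(y, z1, ..., zm), fully expanded."""
    terms = []
    for bits in product((0, 1), repeat=m):
        args = ", ".join(f"Z{i+1}={b}" for i, b in enumerate(bits))
        terms.append(f"P(y, {args})")
    return terms

def compact_marginal(m):
    """The same term written with the summation operator."""
    sums = " ".join(f"Σ_z{i+1}" for i in range(m))
    args = ", ".join(f"Z{i+1}=z{i+1}" for i in range(m))
    return f"{sums} P(y, {args})"

print(len(expanded_marginal(10)))  # 1024 summands for m = 10
print(compact_marginal(3))         # Σ_z1 Σ_z2 Σ_z3 P(y, Z1=z1, Z2=z2, Z3=z3)
```

The expanded encoding has 2^m summands, whereas the compact one grows only linearly in m.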

Given the impact of the compact marginalization operator, the way we can perform marginalization will be the third dimension:

Marginalization:

Expanded or compact.

For expanded marginalization, the complexity results by Fagin et al. [11], as well as Mossé et al. [22], can easily be adapted to capture all other parameter ranges; see the full version of our work. Therefore, throughout this paper, we only deal with the case where we have a compact marginalization operator in our calculus.

The fact that the decision problem for basic and linear arithmetic without summation is in 𝖭𝖯 is not obvious. A model for a formula is a joint probability distribution of, say, n binary variables, which is a table with 2^n entries. Furthermore, it is a priori not clear that the bit size of the entries is polynomial. It turns out that these problems have what is called a small model property: if there is a model, that is, a joint probability distribution satisfying the expression, then there is one with only polynomially many non-zero entries. This follows by linear algebra arguments. If the model is small, then it can be guessed efficiently in 𝖭𝖯. In the case of polynomial arithmetic, we still have the small model property, but one cannot guarantee that the entries have polynomial bit size. However, the compact marginalization operator destroys the small model property that was crucial in the work of [11] and [22], and in fact increases the complexity of the satisfiability problems dramatically [38]. So the next dimension is the model size, which can either be polynomially bounded or arbitrary:

Model size:

Small (polynomially many entries) or unbounded.

The impact of restricting the model size in the presence of a compact marginalization operator will be one of the main topics in this work (see the results in Section 5).

Causal models are often represented as a graph or Bayesian network where all random variables are represented as nodes and the edges encode which variables influence each other. E.g., at the probabilistic level, the graph would encode which variables are conditionally independent of each other. In the satisfiability problems described so far, the graph was not mentioned, so the task was to decide whether there exists any model with any graph structure that satisfies the formula. In many studies, a researcher can infer information about the graph structure, combining observed and experimental data with existing knowledge. Indeed, learning graphical, causal structures from data has been the subject of a considerable amount of research (see, e.g., [21, 9, 13, 37] for recent reviews). Thus, we will study the satisfiability (resp. validity) of formulas with the additional requirement that the graph structure 𝒢 is also given in the input. The goal is then to determine whether the formula is satisfied in some model (resp. is valid in all models) having the structure 𝒢.

In fact, such a constrained validity problem is one of the central problems of causality. E.g., given a graph structure, Pearl’s prominent do-calculus [28] is often applied to show the validity of the equivalence between causal and probabilistic expressions. The impact of including the graph in the input on the complexity will be the second important question that we will study in this work (see Section 4). Thus, as a fifth dimension, we study the model structure:

Model structure:

Specified by a graph as a part of the input or unconstrained.

1.2 A brief overview of our results

We continue by giving an informal description of our results. A detailed description can be found in Section 3.

We first show that constraining the graph structure of the model can only make the problem harder: the unconstrained problem reduces to the constrained one by taking the complete directed acyclic graph as the required structure. It turns out that for the probabilistic and interventional layers together with basic or linear arithmetic, there is indeed a jump: the complexity of the probabilistic layer strictly increases, while the interventional layer jumps from 𝖯𝖲𝖯𝖠𝖢𝖤 to 𝖭𝖤𝖷𝖯. For all other combinations of PCH layer and arithmetic, no such jump happens and the complexity stays the same; however, new proof techniques are required. The exact results are given compactly in Table 2 (for comparison, the complexity landscape for unconstrained models can be found in Table 1).

Second, we consider the case when the underlying model is small; small here means that, for the underlying probability distribution P, the number of tuples (u1,…,um) with P(U1=u1,…,Um=um) > 0 is polynomially bounded in the input size. For the interventional and counterfactual layers with polynomial arithmetic, this does not affect the complexity of the problem (again, new proof techniques are required), but for the probabilistic layer, the complexity drops down to a new class contained in 𝖯𝖲𝖯𝖠𝖢𝖤. The full results are given compactly in Table 3.

2 Preliminaries

2.1 Pearl’s Causal Hierarchy: An example

To illustrate the main ideas behind Pearl’s Causal Hierarchy (PCH), we present below an example that, we hope, will make it easier to understand the formal definitions given in Section 2.2. In this example, we consider a hypothetical scenario involving three attributes represented by binary random variables: Age, modeled by Z=1 (young) and Z=0 (old); (COVID-19) Vaccination, represented by X=1 if vaccinated and X=0 if not; and Recovery, represented by Y=1 (with Y=0 meaning mortality). Below, we describe a structural causal model (SCM) that represents the unobserved true mechanism underlying this scenario, and we use it to illustrate the canonical patterns of reasoning expressible at the different levels of the PCH.

(a) Hidden (true nature):
   U1 U2 U3 | P(u)   | Z X Y
   0  0  0  | 0.0474 | 0 1 0
   0  0  1  | 0.4266 | 0 1 1
   0  1  0  | 0.0126 | 0 0 1
   0  1  1  | 0.1134 | 0 0 0
   1  0  0  | 0.0316 | 1 0 1
   1  0  1  | 0.2844 | 1 0 0
   1  1  0  | 0.0084 | 1 1 1
   1  1  1  | 0.0756 | 1 1 1

(b) Observed P:
   Z X Y | P(z,x,y)
   0 0 0 | 0.1134
   0 0 1 | 0.0126
   0 1 0 | 0.0474
   0 1 1 | 0.4266
   1 0 0 | 0.2844
   1 0 1 | 0.0316
   1 1 1 | 0.0840

(c) do(X=1):
   P(u)   | Z X Y
   0.0474 | 0 1 0
   0.4266 | 0 1 1
   0.0126 | 0 1 0
   0.1134 | 0 1 1
   0.0316 | 1 1 1
   0.2844 | 1 1 1
   0.0084 | 1 1 1
   0.0756 | 1 1 1

(d) Post-int. P:
   Z Y | P([X=1]z,y)
   0 0 | 0.06
   0 1 | 0.54
   1 0 | 0.00
   1 1 | 0.40

Figure 1: (a) Unobserved true nature: probabilities of the unobserved variables and the outcomes of the observed variables induced by the mechanism ℱ. (b) Observational layer: probability distribution P(z,x,y) of the observed variables. (c) Interventional layer: the mechanism modified by the operator do(X=1) leads to the post-interventional distribution (d), denoted as P([X=1]z,y), for z,y ∈ {0,1}. A challenging task here is to compute the post-interventional distribution (d) from the observed distribution (b).

Structural Causal Model.

An SCM is defined as a tuple (ℱ, P, 𝐔, 𝐗), which, from the perspective of an empirical researcher, represents the unobserved nature. It specifies the distribution P(𝐔) of the population and the mechanism ℱ. In our example, the model assumes three independent binary random variables 𝐔 = {U1, U2, U3}, with probabilities P(U1=1)=0.4, P(U2=1)=0.21, P(U3=1)=0.9. They affect the endogenous (observed) random variables Z, X, Y via the mechanism ℱ = {F1, F2, F3}, specified as follows: Z := F1(U1) = U1; X := F2(Z,U2) = Z·U2 + (1−Z)(1−U2); Y := F3(Z,X,U3) = X·Z + (1−X)(1−U3) + X(1−Z)U3.

Thus, the model determines the distribution P(𝐮), for 𝐮=(u1,u2,u3), and the values of the observed variables, as can be seen in Fig. 1(a).
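
The table in Fig. 1(a) can be reproduced mechanically. The following Python sketch (ours, not from the paper; names are illustrative) enumerates the exogenous vectors 𝐮, computes P(𝐮) from the three independent biases, and evaluates the mechanism ℱ to obtain (Z, X, Y):

```python
# Reproduce Fig. 1(a): P(u) and the induced outcomes (Z, X, Y) for all u.
from itertools import product

BIAS = (0.4, 0.21, 0.9)  # P(U1=1), P(U2=1), P(U3=1)

def p_u(u1, u2, u3):
    """Probability of the exogenous vector u = (u1, u2, u3)."""
    p = 1.0
    for ui, b in zip((u1, u2, u3), BIAS):
        p *= b if ui == 1 else 1 - b
    return p

def mechanism(u1, u2, u3):
    """The structural functions F1, F2, F3 from the example."""
    z = u1
    x = z * u2 + (1 - z) * (1 - u2)
    y = x * z + (1 - x) * (1 - u3) + x * (1 - z) * u3
    return z, x, y

for u in product((0, 1), repeat=3):
    print(u, round(p_u(*u), 4), mechanism(*u))
```

Each printed row matches the corresponding row of Fig. 1(a).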

The unobserved random variable U1 reflects the age of the population, and Z is a function of U1 (which may be more complex in a real population). Getting a COVID-19 vaccination depends on age but also on other circumstances (severe allergic reaction to a component of the COVID-19 vaccine, moderate or severe acute illness, etc.), and this is modeled by U2. So X is a function of Z and U2. Finally, Y depends on age, being vaccinated, and on further circumstances, like having other diseases, which are modeled by U3. So Y is a function of Z, X, and U3. The functions Fi define a directed graph structure on the (observed) random variables: there is an edge from A to B if B depends on A. We always assume that the dependency graph of the SCM is acyclic. This property is also called semi-Markovian. In our example, we get the DAG with the edges Z → X, Z → Y, and X → Y.

Layer 1 (probabilistic).

Empirical sciences primarily rely on observed data, typically represented as probability distributions over measurable variables. In our example, this corresponds to the distribution P(Z,X,Y). The unobserved variables U1, U2, U3, along with the causal mechanism ℱ, remain hidden from direct observation. A researcher thus receives the probabilities (shown in Fig. 1(b)) P(z,x,y) = Σ_𝐮 δ_{ℱ,𝐮}(z,x,y) P(𝐮), where the vectors 𝐮=(u1,u2,u3) range over {0,1}³ and δ_{ℱ,𝐮}(z,x,y)=1 if F1(u1)=z, F2(z,u2)=x, and F3(z,x,u3)=y; otherwise δ_{ℱ,𝐮}(z,x,y)=0. The relevant query in our scenario, P(Y=1|X=1), can be evaluated as P(Y=1|X=1) = P(Y=1,X=1)/P(X=1) = 0.5106/0.558 ≈ 0.915, which says that the probability of recovery (Y=1) is high given that the patient was vaccinated (X=1). On the other hand, the query for X=0 can be evaluated as P(Y=1|X=0) = 0.0442/0.442 = 0.1, which may lead to the opinion that the vaccine is relevant to recovery. However, these results do not reflect a randomized controlled trial [12] that could establish the differences between a COVID-19 vaccination and a placebo vaccination.
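
These Layer-1 computations can be checked directly. The sketch below (ours) derives the observed distribution P(z,x,y) of Fig. 1(b) by marginalizing out the hidden 𝐮 and then evaluates both conditional queries:

```python
# Derive Fig. 1(b) from the hidden model, then evaluate the two queries.
from itertools import product
from collections import defaultdict

BIAS = (0.4, 0.21, 0.9)  # P(U1=1), P(U2=1), P(U3=1)

def p_u(u):
    p = 1.0
    for ui, b in zip(u, BIAS):
        p *= b if ui else 1 - b
    return p

def mechanism(u1, u2, u3):
    z = u1
    x = z * u2 + (1 - z) * (1 - u2)
    y = x * z + (1 - x) * (1 - u3) + x * (1 - z) * u3
    return z, x, y

# Observed distribution P(z, x, y): sum P(u) over all u mapped to (z, x, y).
obs = defaultdict(float)
for u in product((0, 1), repeat=3):
    obs[mechanism(*u)] += p_u(u)

def cond(event, given):
    """P(event | given) for predicates on outcome triples (z, x, y)."""
    num = sum(p for zxy, p in obs.items() if event(zxy) and given(zxy))
    den = sum(p for zxy, p in obs.items() if given(zxy))
    return num / den

p1 = cond(lambda t: t[2] == 1, lambda t: t[1] == 1)  # P(Y=1 | X=1) ≈ 0.915
p0 = cond(lambda t: t[2] == 1, lambda t: t[1] == 0)  # P(Y=1 | X=0) = 0.1
print(round(p1, 3), round(p0, 3))
```
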

Layer 2 (interventional).

Consider a randomized trial in which each patient is vaccinated, denoted as do(X=1), regardless of age (Z) and other conditions (U2). We model this by performing a hypothetical intervention in which we replace the function F2(Z,U2) in the mechanism ℱ by the constant function 1 and leave the remaining functions unchanged (see Fig. 1(c)). If ℱ_{X=1} = {F1′=F1, F2′=1, F3′=F3} denotes the new mechanism, then the post-interventional distribution P([X=1]Z,Y) is specified as P([X=1]z,y) = Σ_𝐮 δ_{ℱ_{X=1},𝐮}(z,y) P(𝐮), where δ_{ℱ_{X=1},𝐮} denotes the function δ as above, but for the new mechanism ℱ_{X=1} (the distribution is shown in Fig. 1(d)). (A common and popular notation for the post-interventional probability is P(Z,Y | do(X=1)). In this paper, we use the notation P([X=1]Z,Y) since it is more convenient for analyses involving counterfactuals.) To determine the causal effect of the COVID-19 vaccination on recovery, we compute, in an analogous way, the distribution P([X=0]Z,Y) after the intervention do(X=0), which means that all patients receive a placebo vaccination. Then, comparing the value P([X=1]Y=1) = 0.94 with P([X=0]Y=1) = 0.10, we can conclude that P([X=1]Y=1) − P([X=0]Y=1) > 0. This can be interpreted as a positive (average) effect of the vaccine in the population. Note that it is not obvious how to compute the post-interventional distributions from the observed probability P(Z,X,Y); indeed, this is a challenging task in the field of causality.
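
The interventional quantities can be computed by literally replacing F2 with a constant, as described above. In the sketch below (ours; the values follow mechanically from the stated mechanism), we recompute Fig. 1(d) and both causal effects, and, as a sanity check, we also verify the back-door adjustment Σ_z P(y|x,z)P(z) from the introduction, which here coincides with the interventional probability:

```python
# Post-interventional distributions via surgery on F2, plus a back-door check.
from itertools import product
from collections import defaultdict

BIAS = (0.4, 0.21, 0.9)  # P(U1=1), P(U2=1), P(U3=1)

def p_u(u):
    p = 1.0
    for ui, b in zip(u, BIAS):
        p *= b if ui else 1 - b
    return p

def mechanism(u, do_x=None):
    """Evaluate (Z, X, Y); do_x replaces F2 by a constant (an intervention)."""
    u1, u2, u3 = u
    z = u1
    x = do_x if do_x is not None else z * u2 + (1 - z) * (1 - u2)
    y = x * z + (1 - x) * (1 - u3) + x * (1 - z) * u3
    return z, x, y

def post_int(do_x):
    """Post-interventional distribution P([X=do_x] z, y)."""
    dist = defaultdict(float)
    for u in product((0, 1), repeat=3):
        z, _, y = mechanism(u, do_x)
        dist[(z, y)] += p_u(u)
    return dist

d1, d0 = post_int(1), post_int(0)
p_do1 = d1[(0, 1)] + d1[(1, 1)]   # P([X=1] Y=1) = 0.94
p_do0 = d0[(0, 1)] + d0[(1, 1)]   # P([X=0] Y=1) = 0.10
print(round(p_do1, 4), round(p_do0, 4))

# Back-door adjustment on the observed data: sum_z P(Y=1 | X=1, Z=z) * P(Z=z).
obs = defaultdict(float)
for u in product((0, 1), repeat=3):
    obs[mechanism(u)] += p_u(u)
adj = sum(
    (sum(p for (z2, x, y), p in obs.items() if z2 == z and x == 1 and y == 1)
     / sum(p for (z2, x, y), p in obs.items() if z2 == z and x == 1))
    * sum(p for (z2, x, y), p in obs.items() if z2 == z)
    for z in (0, 1))
print(round(adj, 4))  # agrees with P([X=1] Y=1)
```

Here Z blocks the only back-door path between X and Y, which is why the adjusted observational quantity matches the interventional one.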

Layer 3 (counterfactual).

The key phenomena that can be modeled and analyzed at this level are counterfactual situations. Imagine, e.g., that in our scenario there is a group of patients who were not vaccinated and died (X=0, Y=0). One may ask what the outcome (Y) would have been had they been vaccinated (X=1). In particular, one can ask what the probability of recovery would be if we had vaccinated the patients in this group. Using the formalism of Layer 3, we can express this as a counterfactual query: P([X=1]Y=1 | X=0, Y=0) = P([X=1](Y=1) ∧ (X=0, Y=0)) / P(X=0, Y=0). Note that the event [X=1](Y=1) ∧ (X=0, Y=0) simultaneously incorporates two mechanisms: ℱ_{X=1} and ℱ. This is the key difference to Layer 2, where we can only have one.
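
This query can be evaluated by the standard three steps of abduction, action, and prediction on the example SCM. A sketch (ours; the value follows from the mechanism given earlier):

```python
# Counterfactual P([X=1] Y=1 | X=0, Y=0) on the example SCM.
from itertools import product

BIAS = (0.4, 0.21, 0.9)  # P(U1=1), P(U2=1), P(U3=1)

def p_u(u):
    p = 1.0
    for ui, b in zip(u, BIAS):
        p *= b if ui else 1 - b
    return p

def mechanism(u, do_x=None):
    u1, u2, u3 = u
    z = u1
    x = do_x if do_x is not None else z * u2 + (1 - z) * (1 - u2)
    y = x * z + (1 - x) * (1 - u3) + x * (1 - z) * u3
    return z, x, y

# Abduction: keep the exogenous worlds u consistent with the evidence X=0, Y=0.
evidence = [u for u in product((0, 1), repeat=3)
            if mechanism(u)[1] == 0 and mechanism(u)[2] == 0]
w = sum(p_u(u) for u in evidence)                       # = P(X=0, Y=0)

# Action + prediction: apply do(X=1) to those same worlds and read off Y.
num = sum(p_u(u) for u in evidence if mechanism(u, do_x=1)[2] == 1)

print(num / w)  # P([X=1] Y=1 | X=0, Y=0)
```

In this particular toy mechanism the result is 1: Y=0 under X=0 forces U3=1, and every world with U3=1 yields Y=1 under do(X=1).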

2.2 Syntax and semantics of probabilistic calculi

In this section, we describe the underlying formalism of the satisfiability problems that we consider. This will cover the first three parameter dimensions: the underlying arithmetic, with or without compact marginalization, and the PCH level. The other two dimensions then naturally follow since they are obtained by restricting the underlying model.

We always consider discrete distributions in the probabilistic and causal languages studied in this paper. We represent the values of the random variables as Val = {0,1,…,c−1} and denote by 𝐗 the set of random variables used in a system. Here, we use bold letters for sets of variables or sets of values. By capital letters X1, X2, …, we denote the individual variables and assume, w.l.o.g., that they all share the same domain Val. A value of Xi is often denoted by the corresponding lowercase letter xi or a natural number. In this section, we describe the syntax and semantics of the languages, starting with the probabilistic ones, and then we provide the extensions to the causal systems.

By an atomic event, we mean an event of the form X=x, where X is a random variable and x is a value in the domain of X. The language prop of propositional formulas over atomic events is defined as the closure of such events under the Boolean operators ∧ and ¬. To specify the syntax of interventional and counterfactual events, we define the language int of interventions and extend the syntax of prop to post-int and counterfact, respectively, using the following grammars:

prop is defined by 𝐩 ::= X=x | ¬𝐩 | 𝐩∧𝐩
int is defined by 𝐢 ::= ⊤ | X=x | 𝐢∧𝐢
post-int is defined by 𝐩𝐢 ::= [𝐢]𝐩
counterfact is defined by 𝐜 ::= 𝐩𝐢 | ¬𝐜 | 𝐜∧𝐜.

Note that since ⊤ means that no intervention has been applied, we can assume that prop ⊆ post-int.

The PCH consists of three languages ℒ1, ℒ2, ℒ3, each of which is based on terms of the form P(δ). For the (observational or associational) language ℒ1, we have δ ∈ prop; for the (interventional) language ℒ2, we have δ ∈ post-int; and for the (counterfactual) language ℒ3, δ ∈ counterfact. The expressive power and computational complexity properties of the languages depend largely on the operations that we are allowed to apply to the basic terms. Allowing gradually more complex operators, we describe the languages which are the subject of our studies below. We start with the description of the languages 𝒯i of terms, with i=1,2,3, using the following grammars (in the given grammars, we omit the brackets for readability, but we assume that they can be used in a standard way):

𝒯i^base:  𝐭 ::= P(δi)
𝒯i^baseΣ: 𝐭 ::= P(δi) | Σx 𝐭
𝒯i^lin:   𝐭 ::= P(δi) | 𝐭+𝐭
𝒯i^linΣ:  𝐭 ::= P(δi) | 𝐭+𝐭 | Σx 𝐭
𝒯i^poly:  𝐭 ::= P(δi) | 𝐭+𝐭 | 𝐭−𝐭 | 𝐭·𝐭
𝒯i^polyΣ: 𝐭 ::= P(δi) | 𝐭+𝐭 | 𝐭−𝐭 | 𝐭·𝐭 | Σx 𝐭

where δ1 are formulas in prop, δ2 ∈ post-int, and δ3 ∈ counterfact.

The probabilities of the form P(δi) are called primitives or basic terms. In the summation operator Σx, we have a dummy variable x which ranges over all values 0,1,…,c−1. The summation Σx 𝐭 is a purely syntactic concept which represents the sum 𝐭[0/x] + 𝐭[1/x] + … + 𝐭[c−1/x], where by 𝐭[v/x] we mean the expression in which all occurrences of x are replaced with the value v. For example, for Val={0,1}, the expression Σx P(Y=1, X=x) semantically represents P(Y=1, X=0) + P(Y=1, X=1).

We note that the dummy variable x is not a (random) variable in the usual sense and that its scope is defined in the standard way.
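
Since Σx 𝐭 is purely syntactic, the expansion can be implemented as textual substitution. A toy sketch (ours), assuming a binary domain and a dummy variable that is a plain identifier:

```python
# Syntactic expansion of the summation operator: Σ_x t -> t[0/x] + t[1/x].
import re

def expand_sum(term, dummy, val=(0, 1)):
    """Expand Σ_dummy term over the domain val by textual substitution.
    Assumes the dummy is a plain identifier (word-boundary matching)."""
    pattern = rf"\b{dummy}\b"
    return " + ".join(re.sub(pattern, str(v), term) for v in val)

print(expand_sum("P(Y=1, X=x)", "x"))
# P(Y=1, X=0) + P(Y=1, X=1)
```

Note that the lowercase dummy x is matched case-sensitively, so the random variable X is left untouched.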

In the table above, the terms in 𝒯i^base are just basic probabilities with the events given by the corresponding languages prop, post-int, or counterfact. Next, we extend the terms to sums of probabilities; by adding the same term several times, we also allow weighted sums with weights given in unary. Note that this is enough to state all our hardness results. All matching upper bounds also work when we allow explicit weights given in binary. In the case of 𝒯i^poly, we are allowed to build polynomial terms in the primitives. On the right-hand side of the table, we have the same three kinds of terms, but to each of them we add a marginalization operator as a building block.

The polynomial calculus 𝒯i^poly was originally introduced by Fagin, Halpern, and Megiddo [11] (for i=1) to be able to express conditional probabilities by clearing denominators. While this works for 𝒯i^poly, it does not work in the case of 𝒯i^polyΣ, since clearing denominators with exponential sums creates expressions that are too large. But we could introduce basic terms of the form P(δi | δ), with δ ∈ prop, explicitly. All our hardness proofs work without conditional probabilities, and all our matching upper bounds remain true with explicit conditional probabilities.

Expressions like P(X=1) + P(Y=2)·P(Y=3) are valid terms in 𝒯1^poly, and Σz P([X=0](Y=1, Z=z)) and Σz P(([X=0]Y=1) ∧ Z=z) are valid terms in the language 𝒯3^polyΣ, for example.

Now, let Lab = {base, baseΣ, lin, linΣ, poly, polyΣ} denote the labels of all variants of the languages. Then for each ∗ ∈ Lab and i=1,2,3, we define the language ℒi^∗ of Boolean combinations of inequalities in a standard way:

ℒi^∗ is defined by 𝐟 ::= 𝐭 ≥ 𝐭′ | ¬𝐟 | 𝐟∧𝐟,  where 𝐭, 𝐭′ are terms in 𝒯i^∗.

Although the language and its operations can appear rather restricted, all the usual elements of probabilistic and causal formulas can be encoded. Namely, equality is encoded as greater-or-equal in both directions, e.g. P(x)=P(y) means P(x)≥P(y) ∧ P(y)≥P(x).

The number 0 can be encoded as the probability of an inconsistent event, i.e., P(X=1 ∧ X=2). In a language allowing addition and multiplication, any positive integer can easily be encoded from the fact P(⊤)=1, e.g. 4 = (1+1)·(1+1) ≡ (P(⊤)+P(⊤))·(P(⊤)+P(⊤)). If a language does not allow multiplication, one can show that the encoding is still possible. Note that these encodings barely change the size of the expressions, so allowing or disallowing these additional operators does not affect any complexity results involving these expressions.
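
The repeated-doubling idea behind encodings like 4 ≡ (1+1)(1+1) gives every positive integer n an expression of size O(log n). A sketch (ours; the string "P(T)" stands for the probability of a sure event, so it evaluates to 1):

```python
# Encode a positive integer n as an O(log n)-size term built from P(T) = 1
# using only + and *, by Horner evaluation of the binary expansion of n.
def encode_int(n):
    """Return (expression, value) for the positive integer n."""
    ONE = "P(T)"                 # the unit term, evaluating to 1
    expr, val = ONE, 1
    for bit in bin(n)[3:]:       # binary digits of n after the leading 1
        expr = f"({expr})*({ONE}+{ONE})"    # double
        val *= 2
        if bit == "1":
            expr = f"({expr})+{ONE}"        # add one
            val += 1
    return expr, val

expr, val = encode_int(4)
print(val)   # 4
print(expr)  # ((P(T))*(P(T)+P(T)))*(P(T)+P(T))
```

The expression length grows linearly in the number of binary digits of n, i.e., logarithmically in n.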

To define the semantics of the languages, we use a structural causal model (SCM) as in [28, Sec. 3.2]. An SCM is a tuple 𝔐 = (ℱ, P, 𝐔, 𝐗), such that 𝐕 = 𝐔 ∪ 𝐗 is a set of variables partitioned into exogenous (unobserved) variables 𝐔 = {U1, U2, …} and endogenous variables 𝐗. The tuple ℱ = {F1, …, Fn} consists of functions such that the function Fi calculates the value of the variable Xi from the values (𝐱,𝐮) of the other variables in 𝐕 as Fi(𝐩𝐚i, 𝐮i), where 𝐏𝐚i ⊆ 𝐗 and 𝐔i ⊆ 𝐔. (We consider recursive models, that is, we assume the endogenous variables are ordered such that the variable Xi, i.e. the function Fi, is not affected by any Xj with j > i. Here, we also use the usual notation with capital and lowercase letters, where 𝐏𝐚i are variables and 𝐩𝐚i is an assignment of values to those variables.) P specifies a probability distribution of all exogenous variables 𝐔. Since the variables 𝐗 depend deterministically on the exogenous variables via the functions Fi, ℱ and P together define the joint probability distribution of 𝐗.

Throughout this paper, we assume that the domains of the endogenous variables 𝐗 are discrete and finite. In this setting, the exogenous variables 𝐔 could take values in any domain, including infinite and continuous ones. A recent paper [39] shows, however, that any SCM over discrete endogenous variables is equivalent, for evaluating post-interventional probabilities, to an SCM in which all exogenous variables are discrete with finite domains.

As a consequence, throughout this paper, we assume that domains of exogenous variables 𝐔 are discrete and finite, too.

For any basic int-formula Xi=xi (which, in our notation, means do(Xi=xi)), we denote by Xi=xi the function obtained from by replacing Fi with the constant function Fi(𝐯):=xi. We generalize this definition for all interventions specified by αint in a natural way and denote as α the resulting functions. For any φprop, we write ,𝐮φ if φ is satisfied for values of 𝐗 calculated from the values 𝐮. For αint, we write ,𝐮[α]φ if α,𝐮φ. And for all ψ,ψ1,ψ2counterfact, we write (i) ,𝐮¬ψ if ,𝐮⊧̸ψ and (ii) ,𝐮ψ1ψ2 if ,𝐮ψ1 and ,𝐮ψ2.

Finally, for ψ∈ℒcounterfact, let S𝔐(ψ)={𝐮 : ℱ,𝐮 ⊨ ψ}.

We define ⟦𝐞⟧𝔐, for an expression 𝐞, recursively in the natural way, starting with the basic terms as follows: ⟦P(ψ)⟧𝔐 = Σ_{𝐮∈S𝔐(ψ)} P(𝐮) and, for δ∈ℒprop, ⟦P(ψ|δ)⟧𝔐 = ⟦P(ψ∧δ)⟧𝔐 / ⟦P(δ)⟧𝔐, where the expression is undefined if ⟦P(δ)⟧𝔐 = 0. For two expressions 𝐞1 and 𝐞2, we define 𝔐 ⊨ 𝐞1 ≤ 𝐞2 if and only if ⟦𝐞1⟧𝔐 ≤ ⟦𝐞2⟧𝔐, and analogously for the other relational symbols. The semantics for negation and conjunction are defined in the usual way, giving the semantics of 𝔐 ⊨ φ for any formula φ in ℒ3.
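The semantics above can be made concrete with a small executable sketch. The following is our own toy illustration (not code from the paper): endogenous values are computed from exogenous values via the functions Fi, an intervention do(Xi=xi) replaces Fi by a constant (modelling ℱα), and a basic term P(ψ) is evaluated by summing P(𝐮) over all 𝐮 with ℱ,𝐮 ⊨ ψ. All variable names and the example model are hypothetical.

```python
from itertools import product

def evaluate(F, u, do=None):
    """Compute endogenous values from exogenous values u.

    F maps each variable name to a function of the dict of values computed
    so far (variables are assumed to be topologically ordered, i.e. the
    model is recursive). `do` maps intervened variables to constants.
    """
    do = do or {}
    v = dict(u)  # exogenous values are fixed
    for name, f in F.items():  # insertion order = causal order
        v[name] = do[name] if name in do else f(v)
    return v

def prob(P_U, F, psi, do=None):
    """Evaluate the term P([do] psi): sum P(u) over all u satisfying psi."""
    return sum(p for u, p in P_U.items() if psi(evaluate(F, dict(u), do)))

# Toy SCM: X1 := U1, X2 := X1 XOR U2, with U1, U2 independent fair coins.
P_U = {(("U1", a), ("U2", b)): 0.25 for a, b in product((0, 1), repeat=2)}
F = {"X1": lambda v: v["U1"], "X2": lambda v: v["X1"] ^ v["U2"]}

p_obs = prob(P_U, F, lambda v: v["X2"] == 1)                # P(X2 = 1)
p_int = prob(P_U, F, lambda v: v["X2"] == 1, do={"X1": 0})  # P([X1=0] X2 = 1)
```

In this particular model both terms evaluate to 1/2; the point of the sketch is only the shape of the recursion, in which the distribution lives on 𝐔 and interventions act on ℱ alone.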

2.3 Probabilistic and causal satisfiability problems

The (decision) satisfiability problems for languages of the PCH, denoted by Satiℒ, with i=1,2,3 and ℒ∈Lab, take as input a formula φ∈ℒiℒ and ask whether there exists a model 𝔐 such that 𝔐 ⊨ φ. Analogously, the validity problems for ℒiℒ consist in deciding whether, for a given φ, 𝔐 ⊨ φ holds for all models 𝔐. From the definitions, it is obvious that the variants of the problems at level i are at least as hard as their counterparts at lower levels.

In many studies, a researcher can infer information about the graph structure of the SCM by combining observed and experimental data with existing knowledge. Indeed, learning causal structures from data has been the subject of a considerable amount of research (see, e.g., [9, 13, 37] for recent reviews and [30, 23, 20, 33, 31] for current achievements). Thus, the problem of interest in this setting, which is the subject of this section, is to determine the satisfiability, resp. validity, of formulas in SCMs with an additional requirement on the graph structure of the models.

Let 𝔐=(ℱ={F1,…,Fn},P,𝐔,𝐗={X1,…,Xn}) be an SCM. We will assume that the models are semi-Markovian in the general case and Markovian in the graph-constrained case. (We note that the general semi-Markovian model, which allows the sharing of exogenous arguments and arbitrary dependencies among the exogenous variables, can be reduced in a standard way to the Markovian model by introducing new auxiliary “latent” variables L1,…,Lk which belong to the endogenous variables 𝐗 of the model, but which are non-measurable in contrast to the “standard” observed/measurable variables. Then we can use such latent variables as nodes of a DAG, but the joint probability distribution on 𝐗 is restricted to the measurable variables only and ignores L1,…,Lk.) Markovian means that the exogenous arguments 𝐔i,𝐔j of Fi, resp. Fj, are independent whenever i≠j. We say that a DAG 𝒢=(𝐗,E) represents the graph structure of 𝔐 if, for every Xj appearing as an argument of Fi, Xj→Xi is an edge in E. The DAG 𝒢 is called a causal diagram of the model 𝔐 [28, 3]. The satisfiability problems with an additional requirement on the causal diagram take as input a formula φ and a DAG 𝒢, with the task of deciding whether there exists an SCM with the structure 𝒢 for φ. In this way, for the problems Satiℒ, we get the corresponding problems denoted as SatDAG,iℒ, with ℒ∈Lab.

An important feature of Satipoly instances, over all levels i=1,2,3, is the small model property, which says that, for every satisfiable formula, there is a model whose size is bounded polynomially in the length of the input. This property was used to prove the memberships of the problems Satiℒ in 𝖭𝖯 and in ∃ℝ [11, 15, 22, 16]. Interestingly, the membership proofs of Sat1linΣ in 𝖭𝖯𝖯𝖯 and of Sat2linΣ in 𝖯𝖲𝖯𝖠𝖢𝖤 [8] rely on this property as well.

Apart from these advantages, the small model property is interesting in itself. For example, on the probabilistic layer, if a formula φ (without summation) over variables X1,…,Xn is satisfiable, then there exists a Bayesian network ℬ=(𝒢,P), with a DAG 𝒢 over the nodes X1,…,Xn and conditional probability tables P for each variable Xi, such that φ is true in ℬ and the total size of the tables is bounded by a polynomial in |φ|.

These observations have motivated the introduction of the small-model problem, denoted as Satsm,1polyΣ, which is defined like Sat1polyΣ with the additional constraint that a satisfying distribution may only have polynomially large support, that is, only polynomially many entries in the exponentially large table of probabilities are non-zero [4]. In that paper, the authors achieve this by extending an instance with an additional unary input p and requiring that the satisfying distribution has support of size at most p. We define Satsm,2polyΣ and Satsm,3polyΣ in a similar way. Formally, we use the following:

Definition 1.

The decision problems Satsm,ipolyΣ, with i=1,2,3, take as input a formula φ∈ℒipolyΣ and a unary encoded number p and ask whether there exists a model 𝔐=(ℱ,P,𝐔={U1,…,Um},𝐗) such that 𝔐 ⊨ φ and #{(u1,…,um) : P(U1=u1,…,Um=um)>0} ≤ p.
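The side condition of Definition 1 is a plain counting constraint, which the following minimal sketch (our own illustration, with hypothetical names) makes explicit: a model is “small” if the distribution over the exogenous variables has at most p non-zero entries, no matter how large the full table is.

```python
def support_size(P_U):
    """Number of joint exogenous assignments with positive probability."""
    return sum(1 for prob in P_U.values() if prob > 0)

def is_small_model(P_U, p):
    """Check the support constraint #{u : P(u) > 0} <= p of Definition 1."""
    return support_size(P_U) <= p

# A distribution over (U1, U2) with only two non-zero entries: even if the
# full table had exponentially many rows, the support would stay small.
P_U = {(0, 0): 0.5, (1, 1): 0.5, (0, 1): 0.0, (1, 0): 0.0}
assert is_small_model(P_U, 2) and not is_small_model(P_U, 1)
```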

2.4 The (succinct) existential theory of the reals

For two computational problems A,B, we will write A ≤P B if A can be reduced to B in polynomial time, which means that A is not harder to solve than B. A problem A is complete for a complexity class 𝒞 if A∈𝒞 and, for every problem B∈𝒞, it holds that B ≤P A. By 𝚌𝚘-𝒞, we denote the class of all problems A whose complements A̅ belong to 𝒞.

To measure the computational complexity of Satiℒ, a central role is played by the following well-known Boolean complexity classes: 𝖭𝖯, 𝖯𝖲𝖯𝖠𝖢𝖤, 𝖭𝖤𝖷𝖯, and 𝖤𝖷𝖯𝖲𝖯𝖠𝖢𝖤 (for formal definitions see, e.g., [2]). Recent research has shown that the precise complexity of several natural satisfiability problems can be expressed in terms of the classes ∃ℝ and 𝗌𝗎𝖼𝖼-∃ℝ over the real numbers. Recall that the existential theory of the reals (ETR) is the set of true sentences of the form

∃x1 ⋯ ∃xn φ(x1,…,xn), (1)

where φ is a quantifier-free Boolean formula over the basis {∧,∨,¬} and a signature consisting of the constants 0 and 1, the functional symbols + and ·, and the relational symbols <, ≤, and =. The sentence is interpreted over the real numbers in the standard way. The theory forms its own complexity class, ∃ℝ, which is defined as the closure of ETR under polynomial-time many-one reductions [14, 5, 35]. A succinct variant of ETR, denoted as succ-ETR, and the corresponding class 𝗌𝗎𝖼𝖼-∃ℝ, have been introduced in [38]. succ-ETR is the set of all Boolean circuits C that encode a true sentence as in (1) as follows. Assume that C computes a function {0,1}^N→{0,1}^M. Then {0,1}^N represents the node set of the tree underlying φ and C(i) is an encoding of the description of node i, consisting of the label of i, its parent, and its two children. The variables in φ are x1,…,x_{2^N}. As in the case of ∃ℝ, to 𝗌𝗎𝖼𝖼-∃ℝ belong all languages which are polynomial-time many-one reducible to succ-ETR. ∃ℝ and 𝗌𝗎𝖼𝖼-∃ℝ can also be defined in terms of real RAMs [10, 4]. These two classes correspond to nondeterministic polynomial and exponential time, respectively, on such machines.

In a breakthrough result, Renegar [32] showed that ETR∈𝖯𝖲𝖯𝖠𝖢𝖤. In addition, his result shows that even if the formula contains an exponential number of arithmetic terms of exponential size and exponential degree, it is still possible to decide the formula in 𝖯𝖲𝖯𝖠𝖢𝖤, as long as the number of variables is polynomially bounded.

3 Results

Intuitively, one would expect that when we make the arithmetic of the underlying satisfiability problem more powerful, the complexity of the problem also increases. The same holds when we go up in the PCH. While this is sometimes the case, we will also see many interesting cases where the complexity remains unchanged when we change a certain parameter. Our main focus will be on the effect of constraining the model on the complexity of the problem. As mentioned earlier, we will always assume a compact marginalization operator, since this is a very natural operation that frequently appears in expressions involving probabilities. Furthermore, without compact marginalization, everything follows from the results of [11, 22], since the models appearing in their proofs are already very constrained (see the full version).

3.1 Previous work

Table 1: The complexity landscape for unconstrained models. Sources: (a) for ℒ1 [11], for ℒ2 and ℒ3 [22], (b) [22], (c) [8], (d) for ℒ1 and ℒ2 [38], for ℒ3 [17, 8].

Terms          | ℒ1 (prob.)  | ℒ2 (interv.) | ℒ3 (count.)
basic          | 𝖭𝖯 (a)      | 𝖭𝖯 (a)       | 𝖭𝖯 (a)
lin            | 𝖭𝖯 (a)      | 𝖭𝖯 (a)       | 𝖭𝖯 (a)
poly           | ∃ℝ (b)      | ∃ℝ (b)       | ∃ℝ (b)
basic & marg.  | 𝖭𝖯𝖯𝖯 (c)    | 𝖯𝖲𝖯𝖠𝖢𝖤 (c)   | 𝖭𝖤𝖷𝖯 (c)
lin & marg.    | 𝖭𝖯𝖯𝖯 (c)    | 𝖯𝖲𝖯𝖠𝖢𝖤 (c)   | 𝖭𝖤𝖷𝖯 (c)
poly & marg.   | 𝗌𝗎𝖼𝖼-∃ℝ (d) | 𝗌𝗎𝖼𝖼-∃ℝ (d)  | 𝗌𝗎𝖼𝖼-∃ℝ (d)

For unconstrained models, we have a clear picture of the satisfiability landscape, see Table 1. Fagin et al. [11] prove the 𝖭𝖯-completeness for the probabilistic layer with basic and linear terms and the 𝖭𝖯-hardness with polynomial terms. Mossé et al. [22] prove that the latter is in fact complete for the existential theory of the reals, ∃ℝ. They also prove that the same complexity results hold for the interventional and counterfactual layers of the PCH. This means that the layer of the PCH does not have any impact on the complexity of the corresponding satisfiability problem. Things change, however, when we add a marginalization operator. As shown in [8], if the arithmetic is basic or linear, then the satisfiability problem is complete for 𝖭𝖯𝖯𝖯, 𝖯𝖲𝖯𝖠𝖢𝖤, and 𝖭𝖤𝖷𝖯 as the PCH layer increases from probabilistic to counterfactual. If, however, the arithmetic is polynomial, then all three layers have the same complexity again: they are complete for the new class 𝗌𝗎𝖼𝖼-∃ℝ [38, 17, 8].

3.2 Our results

As our main result, we complete the complexity landscape of the probabilistic and causal satisfiability problems when we either constrain the graph structure of the SCM or when we force the model to be small.

Table 2: The complexity landscape for constrained graph structures. Sources: (a) Lemma 5, Proposition 2, and [8], (b) Theorem 7, (c) Proposition 2 and Theorem 11.

Terms          | ℒ1 (prob.)          | ℒ2 (interv.) | ℒ3 (count.)
basic & marg.  | (𝖭𝖯𝖯𝖯∪∃ℝ)-hard (a)  | 𝖭𝖤𝖷𝖯 (b)     | 𝖭𝖤𝖷𝖯 (b)
lin & marg.    | (𝖭𝖯𝖯𝖯∪∃ℝ)-hard (a)  | 𝖭𝖤𝖷𝖯 (b)     | 𝖭𝖤𝖷𝖯 (b)
poly & marg.   | 𝗌𝗎𝖼𝖼-∃ℝ (c)         | 𝗌𝗎𝖼𝖼-∃ℝ (c)  | 𝗌𝗎𝖼𝖼-∃ℝ (c)

As the first part of our main results, we classify the difficulty of the satisfiability problems when the graph structure is given as a part of the input. We will show that these cases are always at least as hard as the corresponding problems without a specified graph structure. This is due to the fact that we can always take the complete acyclic graph. The most notable difference to the unconstrained problems occurs at the interventional layer. Here we have an increase in complexity from 𝖯𝖲𝖯𝖠𝖢𝖤 to 𝖭𝖤𝖷𝖯. We also compare our results to the model checking problem. Here we are given not only a graph structure but the full SCM, and we need to check whether the model satisfies the input formula. It turns out that model checking is considerably easier. For instance, when the arithmetic is polynomial with marginalization, then even at the probabilistic layer the satisfiability problem is 𝗌𝗎𝖼𝖼-∃ℝ-complete, while the corresponding model checking problem is in 𝖯𝖯𝖯 (Theorem 4).

Table 3: The complexity landscape for satisfiability with small models. Sources: (a) Lemma 13 and [8], (b) Fact 3 and [4], and (c) Theorem 14.

Terms          | ℒ1 (prob.) | ℒ2 (interv.) | ℒ3 (count.)
basic & marg.  | 𝖭𝖯𝖯𝖯 (a)   | 𝖯𝖲𝖯𝖠𝖢𝖤 (a)   | 𝖭𝖤𝖷𝖯 (a)
lin & marg.    | 𝖭𝖯𝖯𝖯 (a)   | 𝖯𝖲𝖯𝖠𝖢𝖤 (a)   | 𝖭𝖤𝖷𝖯 (a)
poly & marg.   | ∃ℝΣ (b)    | 𝖭𝖤𝖷𝖯 (c)     | 𝖭𝖤𝖷𝖯 (c)

Secondly, we consider the satisfiability problems with the small model property and investigate how the complexity increases across the PCH, in the presence of summation operators. In Definition 1, we defined the small model property for SCMs. In the original work of Fagin et al., there was no underlying SCM, since they only dealt with the probabilistic layer, and a small model meant that only polynomially many entries in the table of the joint probability distribution of the (observed) variables are non-zero. Since each observed variable is a function of the unobserved variables, both definitions are equivalent for semi-Markovian models, as we show in Fact 3. Most notably, with polynomial arithmetic at the probabilistic layer, having a small model reduces the complexity to ∃ℝΣ, the existential theory of the reals enhanced with a compact summation operator. This was shown in [4] for a probabilistic model without an underlying SCM, and with Fact 3 we show that it still holds for Definition 1. As soon as we are at the interventional or even counterfactual layer, the complexity jumps, but only to 𝖭𝖤𝖷𝖯. Thus, when we have a small model, the problem is hard for a Boolean class, although we have polynomial arithmetic and would have expected hardness for some theory over the reals. For linear arithmetic with a compact summation operator, it is known that the complexity increases along the PCH, and we show that the corresponding proofs of [8] are still valid if one requires the small-model property. The results are summarized in Table 3.

4 Satisfiability and validity with requirements on the graph structure of SCMs

4.1 Probabilistic layer

A Bayesian network (BN) ℬ=(𝒢,P) consists of a DAG 𝒢 over nodes representing random variables X1,…,Xn and a distribution P which factorizes over 𝒢, meaning that the joint probability P(x1,…,xn) can be written as a product ∏_{i=1}^{n} P(xi | 𝐩𝐚i), where 𝐩𝐚i denotes the values xj1,…,xjk of the parents 𝐏𝐚i of Xi in 𝒢. The distribution P is then specified as a set of conditional probability tables for each variable Xi conditioned on its parents in 𝒢 [18]. So, for example, in the case of binary variables, even if the array representing the joint distribution has 2^n non-zero entries, it can still be compactly represented by a set of arrays of total size Σ_{i=1}^{n} 2^{|𝐏𝐚i|}. This representation has many additional advantages, for example, enabling efficient probabilistic inference.
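The factorization ∏_{i=1}^{n} P(xi | 𝐩𝐚i) can be sketched directly. The following toy example (ours, with a hypothetical three-variable chain X1→X2→X3 over binary domains) recovers joint probabilities from conditional probability tables of total size Σ_i 2^{|𝐏𝐚i|} instead of a full 2^n table.

```python
from itertools import product

# Chain X1 -> X2 -> X3; each CPT maps parent values to P(X_i = 1 | pa_i).
parents = {"X1": (), "X2": ("X1",), "X3": ("X2",)}
cpt = {
    "X1": {(): 0.6},
    "X2": {(0,): 0.1, (1,): 0.9},
    "X3": {(0,): 0.5, (1,): 0.5},
}

def joint(assignment):
    """P(x1, x2, x3) as the product of the matching CPT entries."""
    p = 1.0
    for var, pa in parents.items():
        pa_vals = tuple(assignment[q] for q in pa)
        p1 = cpt[var][pa_vals]  # probability that var = 1 given pa_vals
        p *= p1 if assignment[var] == 1 else 1.0 - p1
    return p

# Sanity check: the factorized entries sum to 1 over all 2^3 assignments.
total = sum(joint(dict(zip(parents, bits)))
            for bits in product((0, 1), repeat=3))
```

Here the CPTs hold 1+2+2 entries, while the explicit joint table would need 2^3; the gap widens exponentially with n.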

In the case of purely probabilistic languages, the problem SatDAG,1ℒ can be read as follows: given a DAG 𝒢 over the nodes 𝐗 of a Bayesian network (but not the distribution P) and a formula φ in the language ℒ1, decide whether there exists P such that φ is true in the BN ℬ=(𝒢,P), where P factorizes over 𝒢.

We start with the observation that Sat1ℒ is not harder than SatDAG,1ℒ for any ℒ∈Lab.

Proposition 2.

For all ℒ∈Lab and for any φ∈ℒ1ℒ over variables X1,…,Xn, it is true that φ∈Sat1ℒ if and only if (φ,𝒢n)∈SatDAG,1ℒ, where 𝒢n denotes the complete DAG over nodes X1,…,Xn, with edges Xi→Xj for all 1≤i<j≤n.

Proof.

Assume φ is satisfiable in a BN ℬ=(𝒢,P). Let ℬ′=(𝒢n,P′), where, for every i, we define the conditional probability table as P′(xi | x1,…,xi−1):=P(xi | x1,…,xi−1) if P(x1,…,xi−1)>0; otherwise we set the value arbitrarily, e.g., to 1/c, where c is the cardinality of the set of values of Xi. Due to the chain rule, we get that P′(x1,…,xn)=P(x1,…,xn), which means that φ is satisfiable in ℬ′.

The reverse implication is obvious.
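The chain-rule construction in the proof above can be sketched in code. This is our own toy version for binary variables (all names hypothetical): build the CPTs of the complete DAG from an arbitrary joint distribution, filling conditions of probability zero with an arbitrary value, and check that the chain rule reproduces the joint.

```python
from itertools import product

def complete_dag_cpts(P, n):
    """CPTs P'(x_i = 1 | x_1, ..., x_{i-1}) for the complete DAG.

    P maps length-n 0/1 tuples to probabilities.  For prefixes of
    probability zero the conditional is set arbitrarily (0.5 here),
    exactly as in the proof of Proposition 2.
    """
    cpts = []
    for i in range(n):
        table = {}
        for prefix in product((0, 1), repeat=i):
            denom = sum(p for x, p in P.items() if x[:i] == prefix)
            num = sum(p for x, p in P.items()
                      if x[:i] == prefix and x[i] == 1)
            table[prefix] = num / denom if denom > 0 else 0.5
        cpts.append(table)
    return cpts

def chain_rule(cpts, x):
    """P'(x_1, ..., x_n) as the product of the conditional entries."""
    p = 1.0
    for i, xi in enumerate(x):
        p1 = cpts[i][x[:i]]
        p *= p1 if xi == 1 else 1.0 - p1
    return p

# A joint distribution with a zero entry; the reconstruction still matches.
P = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.25, (1, 1): 0.25}
cpts = complete_dag_cpts(P, 2)
```

On this example, chain_rule(cpts, x) equals P(x) for every assignment x, which is the content of the proposition: the complete DAG never restricts the achievable joint distributions.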

Due to their great importance in probabilistic reasoning, a lot of attention in AI research has been given to both function and decision problems in which the entire BN ℬ=(𝒢,P) is given as input, with the goal to answer specific queries about the distribution. Certainly among the most basic ones is the BN-Pr problem: compute, for a given BN over 𝐗, a variable X in 𝐗, and a value x, the probability P(X=x). The next important primitive of probabilistic reasoning consists in finding the Maximum a Posteriori Hypothesis (MAP). To study its computational complexity, the natural decision problem, named D-MAP, has been investigated, which asks whether, for a given rational number τ, evidence 𝐞, and some subset of variables 𝐐, there is an instantiation 𝐪 of 𝐐 such that P(𝐪,𝐞)=Σ_𝐲 P(𝐪,𝐲,𝐞)>τ. It is well known that BN-Pr is #𝖯-complete and D-MAP is 𝖭𝖯𝖯𝖯-complete [34, 25]. To relate our setting to standard Bayesian reasoning, we introduce a Model Checking problem:

Definition 3.

The Bayesian Network Model Checking problem, denoted as BN-MC, gets a BN ℬ=(𝒢,P) and a formula φ in ℒ1, and verifies whether the formula φ is satisfied in the BN ℬ.

This problem can be considered as a generalization of the decision version of BN-Pr.

Theorem 4.

BN-MCpolyΣ is in 𝖯#𝖯. Equivalently, it is in 𝖯𝖯𝖯.

Combining this theorem with [8, Thm. 4], the result of [25], and Proposition 2, we get the following:

BN-MCpolyΣ ≤P D-MAP ≤P Sat1baseΣ ≤P Sat1linΣ ≤P SatDAG,1baseΣ,

which presents the relationships between the model checking, D-MAP, and satisfiability problems for probabilistic languages from a complexity perspective. In the general case, i.e., in the case of polynomial languages with summation, the constraints on DAGs do not make the problems more difficult, but restricting the models to BNs leads to different complexities, under a standard complexity assumption.

Lemma 5.

SatDAG,1base is ∃ℝ-hard.

This means that SatDAG,1baseΣ, which is at least as hard as both D-MAP and SatDAG,1base, is both 𝖭𝖯𝖯𝖯-hard and ∃ℝ-hard. Since these are very different classes, one should not expect SatDAG,1baseΣ to be complete for either class.

Proposition 6.

SatDAG,1polyΣ is 𝗌𝗎𝖼𝖼-∃ℝ-complete.

Proof.

From Theorem 11, we know that SatDAG,3polyΣ is 𝗌𝗎𝖼𝖼-∃ℝ-complete. Since any probabilistic formula is a special case of a counterfactual one, we get SatDAG,1polyΣ ≤P SatDAG,3polyΣ, and thus membership in 𝗌𝗎𝖼𝖼-∃ℝ. On the other hand, since Sat1polyΣ is 𝗌𝗎𝖼𝖼-∃ℝ-complete, Proposition 2 yields the hardness.

Thus Sat1polyΣ ≡P SatDAG,1polyΣ, and the problems are computationally harder than BN-MCpolyΣ if 𝖯#𝖯 ≠ 𝖭𝖤𝖷𝖯.

4.2 Interventional and counterfactual reasoning

One of the key components in the development of structural causal models is the do-calculus introduced by Pearl [27, 28], which allows one to estimate causal effects from observational data. The calculus can be expressed as a validity problem for intervention-level languages with graph structure requirements on the SCMs. In particular, the rules of the calculus can be seen as instances of validity problems for which equivalent graphical conditions in terms of d-separation are given. For example, the “insertion/deletion of observations” rule requires, for a given DAG 𝒢, that, for all SCMs 𝔐 with the causal structure 𝒢, and for all values 𝐱,𝐲,𝐳,𝐰, it holds that P([𝐱]𝐲 | [𝐱](𝐳,𝐰)) = P([𝐱]𝐲 | [𝐱]𝐰). This rule can be written in a compact way as Σ_{𝐱,𝐲,𝐳,𝐰} (P([𝐱]𝐲 | [𝐱](𝐳,𝐰)) − P([𝐱]𝐲 | [𝐱]𝐰))² = 0. In this section, we prove that, in general, the validity problem with polynomial arithmetic is very hard; namely, we will see that it is complete for 𝚌𝚘-𝗌𝗎𝖼𝖼-∃ℝ.

To this end, we study the complexity of the satisfiability problems with graph structure requirements in the case of the interventional languages. The base and linear languages are neither weak enough to be independent of the graph structure nor strong enough to distinguish graph structures exactly. In particular, the complexity of SatDAG,2baseΣ and SatDAG,2linΣ increases above the 𝖯𝖲𝖯𝖠𝖢𝖤-completeness of Sat2baseΣ and Sat2linΣ. On the other hand, the polynomial languages are strong enough to encode the DAG structure of the model, which implies that the complexity for these languages is the same as for Sat2polyΣ.

Theorem 7.

SatDAG,2baseΣ and SatDAG,2linΣ are 𝖭𝖤𝖷𝖯-complete. Thus, the validity problems for the corresponding languages are complete for 𝚌𝚘-𝖭𝖤𝖷𝖯.

An important tool for the proof of this theorem is the satisfiability problem for Schönfinkel–Bernays sentences. The class of Schönfinkel–Bernays sentences (also called Effectively Propositional Logic, EPR) is a fragment of first-order logic for which satisfiability is decidable. Each sentence in the class is of the form ∃𝐱∀𝐲 ψ, where ψ may contain the logical operations ∧,∨,¬, the variables 𝐱 and 𝐲, equalities, and relations Ri(𝐱,𝐲) depending on a set of variables, but ψ may not contain any quantifiers or function symbols. Determining whether a Schönfinkel–Bernays sentence is satisfiable is an 𝖭𝖤𝖷𝖯-complete problem [19], even if all variables are restricted to binary values [1].
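For intuition, such a sentence can be decided by brute force once an interpretation of the relation symbols is fixed. The sketch below is our own illustration over binary domains (all names and the example sentence are hypothetical): guessing the relation tables, which are exponentially large in general, and verifying ∃𝐱∀𝐲 ψ by enumeration is what gives the problem its 𝖭𝖤𝖷𝖯 flavour.

```python
from itertools import product

def holds(R, n_x=1, n_y=1):
    """Decide the toy sentence  Ex x Ay y: R(x, y) or x = y  over {0, 1}.

    R is an explicit truth table for the single relation symbol,
    i.e. the 'guessed' interpretation; the quantifiers are checked
    by exhaustive enumeration of the binary domain.
    """
    def psi(x, y):
        return R[x + y] or x == y
    return any(all(psi(x, y) for y in product((0, 1), repeat=n_y))
               for x in product((0, 1), repeat=n_x))

# With R(x, y) = "x != y", the matrix R(x,y) or x = y is always true,
# so the sentence holds under this interpretation.
R = {(x, y): x != y for x, y in product((0, 1), repeat=2)}
assert holds(R)
```

With polynomially many variables, the relation tables already have exponential size, so a nondeterministic machine guesses them and verifies in exponential time.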

Corollary 8.

SatDAG,2baseΣ is computationally harder than Sat2linΣ unless 𝖭𝖤𝖷𝖯=𝖯𝖲𝖯𝖠𝖢𝖤.

If the graph 𝒢 is complete, that is, all nodes are connected by an edge, it is equivalent to a causal ordering, an ordering Xi1 ≺ Xi2 ≺ ⋯ ≺ Xin such that variable Xij can only depend on variables Xik with k<j. If a causal ordering is given as a constraint, it does not change the complexity of the problem:

Lemma 9.

In ℒ2 one can encode a causal ordering.

Proof.

Given (in)equalities in ℒ2, we add a new variable C and add to each primitive the intervention [C=0]. This does not change the satisfiability of the (in)equalities.

Given a causal order Vi1 ≺ Vi2 ≺ ⋯, we add c equations for each variable Vij, j>1:

P([C=1, Vij−1=k] Vij=k)=1  for k=1,…,c.

The equations ensure that, if one variable is changed, and C=1 is set, the next variable in the causal ordering has the same value, thus fixing an order from the first to the last variable.

The equivalence between causal orderings and complete graphs thus results in:

Corollary 10.

SatDAG,2linΣ remains 𝖯𝖲𝖯𝖠𝖢𝖤-complete in the special case that 𝒢 happens to be a complete graph.

Counterfactual languages, on the other hand, are strong enough to enforce an explicit graph structure directly in the formula using exponential sums. That is, for each variable Xi, the formula can encode which variables are the parents of Xi by requiring that Xi remains constant if the parents remain constant. This condition is encoded with an intervention on the parents and an intervention on the other variables, together with a counterfactual query ensuring that the second intervention does not change the probability distribution of Xi after the first intervention. An exponential sum can range over all possible interventions.

Theorem 11.

SatDAG,3ℒ ≡P Sat3ℒ for any ℒ∈{baseΣ, linΣ, polyΣ}.

Since we now know SatDAG,1polyΣ ≤P SatDAG,2polyΣ ≤P SatDAG,3polyΣ ≤P Sat3polyΣ, Proposition 6, together with Sat3polyΣ being succ-ETR-complete [8], yields:

Corollary 12.

SatDAG,2polyΣ is succ-ETR complete.

5 Probabilistic and causal reasoning with a small model

In [4], the complexity of a problem named Satsm,1polyΣ is investigated. The authors consider purely probabilistic models, where sm means that

#{(x1,…,xn) : P(X1=x1,…,Xn=xn)>0}

is small, i.e., polynomially bounded in the input size. Before discussing Satsm,1polyΣ further, we need to show that their problem is equivalent to our problem Satsm,1polyΣ according to Definition 1, where

#{(u1,…,um) : P(U1=u1,…,Um=um)>0}

is small. (We cannot use their definition directly, as it is not meaningful to put constraints on the distribution P(𝐗) for causal models, where interventions can change this distribution in various ways.)

Fact 2.

For a given model 𝔐, let #𝐗+(𝔐)=#{(x1,…,xn) : P(X1=x1,…,Xn=xn)>0} be the number of assignments to the observed variables with positive probability and #𝐔+(𝔐)=#{(u1,…,um) : P(U1=u1,…,Um=um)>0} be the number of assignments to the unobserved variables with positive probability.

Then #𝐗+(𝔐) ≤ #𝐔+(𝔐).

The converse does not hold: all functions in the model might be constant, in which case #𝐗+=1 regardless of #𝐔+. Thus a model that is a solution to the Satsm,1polyΣ problem of [4] might not be a solution to our problem Satsm,1polyΣ. Still:

Fact 3.

For a given probabilistic formula φ and a number p, there exists a semi-Markovian model 𝔐=(ℱ,P,𝐔={U1,…,Um},𝐗) such that 𝔐 ⊨ φ and #𝐔+(𝔐) ≤ p if and only if there exists a model 𝔐′=(ℱ′,P′,𝐔′,𝐗) such that 𝔐′ ⊨ φ and #𝐗+(𝔐′) ≤ p.

Here it is critical to consider semi-Markovian models, as the unobserved variables constructed in the proof might not be independent, as would be required in Markovian models.

Fact 4.

Fact 3 does not hold for Markovian models.
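The inequality of Fact 2, and the collapse behind its failing converse, can be sketched concretely. The following is our own toy illustration (hypothetical model, not from the paper): every exogenous assignment 𝐮 with P(𝐮)>0 determines exactly one assignment to 𝐗, so #𝐗+ ≤ #𝐔+, and with constant functions ℱ the observed support shrinks to 1 however large #𝐔+ is.

```python
from itertools import product

def supports(P_U, F):
    """Return (#U+, #X+) for exogenous distribution P_U and functions F.

    F is a list of functions, one per endogenous variable, each mapping
    the full exogenous assignment u to the variable's value.
    """
    u_plus = {u for u, p in P_U.items() if p > 0}
    x_plus = {tuple(f(u) for f in F) for u in u_plus}
    return len(u_plus), len(x_plus)

P_U = {u: 1 / 8 for u in product((0, 1), repeat=3)}  # uniform, #U+ = 8

copy_F = [lambda u, i=i: u[i] for i in range(3)]  # X_i := U_i, #X+ = #U+
const_F = [lambda u: 0 for _ in range(3)]         # constant F, #X+ = 1

assert supports(P_U, copy_F) == (8, 8)
assert supports(P_U, const_F) == (8, 1)
```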

Nevertheless, Fact 3 implies that the ∃ℝΣ-completeness proof for the version of “Satsm,1polyΣ” restricted to the observed distribution in [4] also applies to our Satsm,1polyΣ. Thus we know that Satsm,1polyΣ is complete for a new complexity class ∃ℝΣ, a class that extends the existential theory of the reals with summation operators. This is not a succinct class, so it is different from 𝗌𝗎𝖼𝖼-∃ℝ; in fact, they show

𝖭𝖯𝖯𝖯 ⊆ ∃ℝΣ ⊆ 𝖯𝖲𝖯𝖠𝖢𝖤 ⊆ 𝖭𝖤𝖷𝖯 ⊆ 𝗌𝗎𝖼𝖼-∃ℝ = 𝖭𝖤𝖷𝖯real,

where 𝖭𝖤𝖷𝖯real is the class of problems decided by nondeterministic real Random Access Machines (RAMs) in exponential time (see [4] for the exact details).

Not allowing multiplication reduces the complexity; however, once we allow interventions, the complexity increases again. We know the following complexities for linear arithmetic from [8], whose results translate to small models:

Lemma 13.

The following completeness results hold for small models:

  • Satsm,1basicΣ and Satsm,1linΣ are 𝖭𝖯𝖯𝖯-complete.

  • Satsm,2basicΣ and Satsm,2linΣ are 𝖯𝖲𝖯𝖠𝖢𝖤-complete.

  • Satsm,3basicΣ and Satsm,3linΣ are 𝖭𝖤𝖷𝖯-complete.

Allowing multiplication again leads to 𝖭𝖤𝖷𝖯-completeness, even on the second (interventional) level. Note, however, that this is likely still weaker than the 𝖭𝖤𝖷𝖯real-completeness of Sat2polyΣ and Sat3polyΣ.

Theorem 14.

Satsm,2polyΣ and Satsm,3polyΣ are 𝖭𝖤𝖷𝖯-complete.

This follows because a causal model consists of a probabilistic part P(𝐮) and a deterministic, causal part ℱ. In a small model, P(𝐮) is restricted to have polynomial size, but ℱ can still have exponential size. In Satsm,1polyΣ, one cannot reason about ℱ, but Satsm,2polyΣ and Satsm,3polyΣ can access the full power of ℱ. Since 𝖭𝖤𝖷𝖯 subsumes ∃ℝ, reasoning about the polynomially many real numbers (probabilities) cannot increase the complexity.

6 Discussion

We investigated the computational complexities of probabilistic satisfiability problems from a multi-parametric point of view. Our new completeness results nicely extend and complement the previous achievements of [11, 22, 38, 4]. Our main focus was on the effect of constraining the model. We have shown that including a DAG in the input increases the complexity on the second level for linear languages (i.e., SatDAG,2linΣ) to 𝖭𝖤𝖷𝖯-completeness, while it does not change the complexity on the third level or for the full languages. We leave the exact complexity of the first linear level (i.e., SatDAG,1linΣ) for future work, but we conjecture that it is ∃ℝΣ-complete. On the other hand, including a Bayesian network, i.e., a DAG together with its probability tables, reduces the complexity to membership in 𝖯𝖯𝖯. Here, completeness is left for future work. (As the main computation in the proof of Theorem 4 is performed by the 𝖯𝖯 oracle, one can extend the proof to show the containment of BN-MCpolyΣ in the space-limited complexity class 𝖫𝖯𝖯 and in the circuit complexity class 𝖭𝖢1𝖯𝖯. Thus, it is unlikely to be 𝖯𝖯𝖯-complete, although it is possible that all three classes coincide.)

Another interesting feature is that when the model is unconstrained, the completeness results for linear languages are expressed in terms of standard Boolean classes, while the completeness of satisfiability for languages involving polynomials over the probabilities requires classes over the reals. However, once we constrain the model to be small, the complexity with polynomial arithmetic is again described by a Boolean complexity class (for PCH layers two and three). This is due to the fact that the running time of Renegar’s algorithm is essentially determined by the number of variables (which is small in the case of a small model) and not by the number of equations (which can still be large).

References

  • [1] Antonis Achilleos. NEXP-completeness and universal hardness results for justification logic. In International Computer Science Symposium in Russia, pages 27–52. Springer, 2015. doi:10.1007/978-3-319-20297-6_3.
  • [2] Sanjeev Arora and Boaz Barak. Computational complexity: a modern approach. Cambridge University Press, 2009.
  • [3] Elias Bareinboim, Juan D. Correa, Duligur Ibeling, and Thomas Icard. On Pearl’s Hierarchy and the Foundations of Causal Inference, pages 507–556. Association for Computing Machinery, New York, NY, USA, 2022. doi:10.1145/3501714.3501743.
  • [4] Markus Bläser, Julian Dörfler, Maciej Liśkiewicz, and Benito van der Zander. The existential theory of the reals with summation operators. In 35th Int. Symposium on Algorithms and Computation, ISAAC 2024, volume 322 of LIPIcs, pages 13:1–13:19. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.ISAAC.2024.13.
  • [5] John Canny. Some algebraic and geometric computations in PSPACE. In Proceedings of the twentieth annual ACM symposium on Theory of computing, pages 460–467. ACM, 1988. doi:10.1145/62212.62257.
  • [6] Gregory F. Cooper. The computational complexity of probabilistic inference using Bayesian belief networks. Artificial intelligence, 42(2-3):393–405, 1990. doi:10.1016/0004-3702(90)90060-D.
  • [7] Paul Dagum and Michael Luby. Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artificial Intelligence, 60(1):141–153, 1993. doi:10.1016/0004-3702(93)90036-B.
  • [8] Julian Dörfler, Benito van der Zander, Markus Bläser, and Maciej Liśkiewicz. From probability to counterfactuals: the increasing complexity of satisfiability in Pearl’s Causal Hierarchy. In International Conference on Learning Representations (ICLR). PMLR, to appear, 2025. Available as ArXiv TR 2405.07373.
  • [9] Mathias Drton and Marloes H. Maathuis. Structure learning in graphical modeling. Annual Review of Statistics and Its Application, 4:365–393, 2017.
  • [10] Jeff Erickson, Ivor Van Der Hoog, and Tillmann Miltzow. Smoothing the gap between NP and ER. SIAM Journal on Computing, 53:FOCS20–102–FOCS20–138, 2024.
  • [11] Ronald Fagin, Joseph Y. Halpern, and Nimrod Megiddo. A logic for reasoning about probabilities. Information and computation, 87(1-2):78–128, 1990. doi:10.1016/0890-5401(90)90060-U.
  • [12] Ronald Aylmer Fisher. Design of experiments. British Medical Journal, 1(3923):554, 1936.
  • [13] Clark Glymour, Kun Zhang, and Peter Spirtes. Review of causal discovery methods based on graphical models. Frontiers in genetics, 10:524, 2019.
  • [14] Dima Grigoriev and Nicolai Vorobjov. Solving systems of polynomial inequalities in subexponential time. J. Symb. Comput., 5(1/2):37–64, 1988. doi:10.1016/S0747-7171(88)80005-1.
  • [15] Duligur Ibeling and Thomas Icard. Probabilistic reasoning across the causal hierarchy. In The 34th AAAI Conference on Artificial Intelligence, AAAI 2020, pages 10170–10177. AAAI Press, 2020. doi:10.1609/AAAI.V34I06.6577.
  • [16] Duligur Ibeling, Thomas Icard, Krzysztof Mierzewski, and Milan Mossé. Probing the quantitative–qualitative divide in probabilistic reasoning. Annals of Pure and Applied Logic, 175(9):103339, 2024. doi:10.1016/J.APAL.2023.103339.
  • [17] Duligur Ibeling, Thomas Icard, and Milan Mossé. On probabilistic and causal reasoning with summation operators. Journal of Logic and Computation, 2024.
  • [18] Daphne Koller and Nir Friedman. Probabilistic graphical models: principles and techniques. MIT press, 2009.
  • [19] Harry R. Lewis. Complexity results for classes of quantificational formulas. Journal of Computer and System Sciences, 21(3):317–353, 1980. doi:10.1016/0022-0000(80)90027-6.
  • [20] Ni Y. Lu, Kun Zhang, and Changhe Yuan. Improving causal discovery by optimal Bayesian network learning. In Proc. of the AAAI Conference on Artificial Intelligence, volume 35(10), pages 8741–8748, 2021. doi:10.1609/AAAI.V35I10.17059.
  • [21] Nicolai Meinshausen, Alain Hauser, Joris M. Mooij, Jonas Peters, Philip Versteeg, and Peter Bühlmann. Methods for causal inference from gene perturbation experiments and validation. Proceedings of the National Academy of Sciences, 113(27):7361–7368, 2016.
  • [22] Milan Mossé, Duligur Ibeling, and Thomas Icard. Is causal reasoning harder than probabilistic reasoning? The Review of Symbolic Logic, pages 1–26, 2022.
  • [23] Ignavier Ng, Yujia Zheng, Jiji Zhang, and Kun Zhang. Reliable causal discovery with improved exact search and weaker assumptions. Advances in Neural Information Processing Systems, NeurIPS, 34:20308–20320, 2021. URL: https://proceedings.neurips.cc/paper/2021/hash/a9b4ec2eb4ab7b1b9c3392bb5388119d-Abstract.html.
  • [24] Nils J. Nilsson. Probabilistic logic. Artificial Intelligence, 28(1):71–87, 1986. doi:10.1016/0004-3702(86)90031-7.
  • [25] James D. Park and Adnan Darwiche. Complexity results and approximation strategies for MAP explanations. Journal of Artificial Intelligence Research, 21:101–133, 2004. doi:10.1613/JAIR.1236.
  • [26] Judea Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, 1988.
  • [27] Judea Pearl. Causal diagrams for empirical research. Biometrika, 82(4):669–688, 1995.
  • [28] Judea Pearl. Causality. Cambridge University Press, 2009.
  • [29] Judea Pearl and Dana Mackenzie. The book of why: the new science of cause and effect. Basic books, 2018.
  • [30] Alexander Reisach, Christof Seiler, and Sebastian Weichwald. Beware of the simulated DAG! Causal discovery benchmarks may be easy to game. Advances in Neural Information Processing Systems, NeurIPS, 34:27772–27784, 2021. URL: https://proceedings.neurips.cc/paper/2021/hash/e987eff4a7c7b7e580d659feb6f60c1a-Abstract.html.
  • [31] Alexander Reisach, Myriam Tami, Christof Seiler, Antoine Chambaz, and Sebastian Weichwald. A scale-invariant sorting criterion to find a causal order in additive noise models. Advances in Neural Information Processing Systems, NeurIPS, 36:785–807, 2023.
  • [32] James Renegar. On the computational complexity and geometry of the first-order theory of the reals. Part I: Introduction. Preliminaries. The geometry of semi-algebraic sets. The decision problem for the existential theory of the reals. Journal of Symbolic Computation, 13(3):255–299, 1992. doi:10.1016/S0747-7171(10)80003-3.
  • [33] Paul Rolland, Volkan Cevher, Matthäus Kleindessner, Chris Russell, Dominik Janzing, Bernhard Schölkopf, and Francesco Locatello. Score matching enables causal discovery of nonlinear additive noise models. In International Conference on Machine Learning, ICML, pages 18741–18753. PMLR, 2022. URL: https://proceedings.mlr.press/v162/rolland22a.html.
  • [34] Dan Roth. On the hardness of approximate reasoning. Artificial Intelligence, 82(1-2):273–302, 1996. doi:10.1016/0004-3702(94)00092-1.
  • [35] Marcus Schaefer. Complexity of some geometric and topological problems. In International Symposium on Graph Drawing, pages 334–344. Springer, 2009. doi:10.1007/978-3-642-11805-0_32.
  • [36] Ilya Shpitser and Judea Pearl. Complete identification methods for the causal hierarchy. Journal of Machine Learning Research, 9(Sep):1941–1979, 2008. doi:10.5555/1390681.1442797.
  • [37] Chandler Squires and Caroline Uhler. Causal structure learning: A combinatorial perspective. Foundations of Computational Mathematics, pages 1–35, 2022.
  • [38] Benito van der Zander, Markus Bläser, and Maciej Liśkiewicz. The hardness of reasoning about probabilities and causality. In Proc. International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023.
  • [39] Junzhe Zhang, Jin Tian, and Elias Bareinboim. Partial counterfactual identification from observational and experimental data. In International Conference on Machine Learning, pages 26548–26558. PMLR, 2022. URL: https://proceedings.mlr.press/v162/zhang22ab.html.