Parallel-Correctness and Containment for Conjunctive Queries with Union and Negation

Single-round multiway join algorithms first reshuffle data over many servers and then evaluate the query at hand in a parallel and communication-free way. A key question is whether a given distribution policy for the reshuffle is adequate for computing a given query, also referred to as parallel-correctness. This paper extends the study of the complexity of parallel-correctness and its constituents, parallel-soundness and parallel-completeness, to unions of conjunctive queries with and without negation. As a by-product it is shown that the containment problem for conjunctive queries with negation is coNEXPTIME-complete.


Introduction
Motivated by recent in-memory systems like Spark [1] and Shark [20], Koutris and Suciu introduced the massively parallel communication model (MPC) [14] where computation proceeds in a sequence of parallel steps each followed by global synchronisation of all servers. Of particular interest in the MPC model are queries that can be evaluated in one round of communication [9]. In its most naïve setting, a query Q is evaluated by reshuffling the data over many servers, according to some distribution policy, and then computing Q at each server in a parallel but communication-free manner. A notable family of distribution policies is formed within the Hypercube algorithm [2,9,11]. A property of Hypercube distributions is that for any instance I, the central execution of Q(I) always equals the union of the evaluations of Q at every computing node (or server). The latter guarantees the correctness of the distributed evaluation for any conjunctive query by the Hypercube algorithm.
Ameloot et al. [5] introduced a general framework for reasoning about one-round evaluation algorithms under arbitrary distribution policies. They introduced parallel-correctness as a property of a query w.r.t. a distribution policy which states that central execution always equals distributed execution, that is, equals the union of the evaluations of the query at each server under the given distribution policy. One of the main results of [5] is that deciding parallel-correctness for conjunctive queries (CQs) is Π^p_2-complete under arbitrary distribution policies. The upper bound follows rather directly from a semantical characterisation of parallel-correctness in terms of properties of minimal valuations. Specifically, it was shown that a conjunctive query is parallel-correct w.r.t. a distribution policy if and only if the distribution policy sends, for every minimal valuation, its required facts to at least one node.
As union and negation are fundamental operators, we extend in this paper the study of parallel-correctness to unions of conjunctive queries (UCQ), conjunctive queries with negation (CQ^¬) and unions of conjunctive queries with negation (UCQ^¬). In fact, we study two additional but related notions: parallel-soundness and parallel-completeness. While parallel-correctness implies equivalence between centralised and distributed execution, parallel-soundness (respectively, parallel-completeness) requires that distributed execution is contained in (respectively, contains) centralised execution. Of course, parallel-soundness and parallel-completeness together are equivalent to parallel-correctness. Furthermore, since all monotone queries are parallel-sound, on this class parallel-correctness is equivalent to parallel-completeness.
We start by investigating parallel-correctness for UCQ. Interestingly, for a UCQ to be parallel-correct under a certain distribution policy it is not required that every disjunct is parallel-correct. We extend the characterisation for parallel-correctness in terms of minimal valuations for CQs to UCQs and thereby obtain membership in Π^p_2. The matching lower bound follows, of course, from the lower bound for CQs [5].
Next, we study parallel-correctness for (unions of) conjunctive queries with negation. Sadly, when negation comes into play, parallel-correctness can no longer be characterised in terms of properties of valuations. Instead our algorithms are based on counter-examples of exponential size, yielding coNEXPTIME upper bounds. It turns out that this is optimal, though, as our corresponding lower bounds show. The proof of the lower bounds comes along an unexpected route: we exhibit a reduction from query containment for CQ^¬ to parallel-correctness of CQ^¬ (and its two variants) and show that query containment for CQ^¬ is coNEXPTIME-complete. This is considerably different from what we thought was folklore knowledge of the community. Indeed, the Π^p_2-completeness result for query containment for CQ^¬ mentioned in [18] only seems to hold for fixed database schemas (or a fixed arity bound, for that matter). We note that Mugnier et al. [16] provide a Π^p_2 upper bound proof for CQ^¬ containment and explicitly mention that it holds under the assumption that the arity of predicates is bounded by a constant. Altogether, parallel-correctness (and its variants) for (unions of) conjunctive queries with negation is thus complete for coNEXPTIME.
Finally, a natural question is how the high complexity of parallel-correctness in the presence of negation can be lowered. We identify two cases in which the complexity drops. More specifically, the complexity decreases from coNEXPTIME to Π^p_2 if the database schema is fixed or the arity of relations is bounded, and to coNP for unions of full conjunctive queries with negation. In the latter case, we again employ a reduction from containment of full conjunctive queries (with negation) and obtain novel results on the containment problem in this setting as well. All upper bounds hold for queries with inequalities.
Outline. This paper is further organised as follows. In Section 2, we discuss related work. In Section 3, we introduce the necessary definitions. We address parallel-correctness for unions of conjunctive queries in Section 4. We consider containment of conjunctive queries with negation in Section 5 and parallel-correctness together with its variants in Section 6. We discuss the restriction to full conjunctive queries in Section 7. We conclude in Section 8.

Related work
As mentioned in the introduction, Koutris and Suciu introduced the massively parallel communication model (MPC) [14]. A key property is that computation proceeds in a sequence of parallel steps, each followed by global synchronisation of all computing nodes. In this model, evaluation of conjunctive queries [8,14] and skyline queries [4] has been considered. Beame, Koutris and Suciu [9] proved a matching upper and lower bound for the amount of communication needed to compute a full conjunctive query without self-joins in one communication round. The upper bound is provided by a randomised algorithm called Hypercube which uses a technique that can be traced back to Ganguly, Silberschatz, and Tsur [13] and is described in the context of map-reduce by Afrati and Ullman [2].
Ameloot et al. [5] introduced a general framework for reasoning about one-round evaluation algorithms under arbitrary distribution policies. They introduced the notion of parallel-correctness and proved its associated decision problem to be Π^p_2-complete for conjunctive queries. In addition, towards optimisation in MPC, they considered parallel-correctness transfer. Here, parallel-correctness transfers from Q to Q′ when Q′ is parallel-correct under every distribution policy for which Q is parallel-correct. The associated decision problem for conjunctive queries is shown to be Π^p_3-complete. In addition, some restricted cases (e.g., transferability under Hypercube distributions) are shown to be NP-complete.
Our definition of a distribution policy is borrowed from Ameloot et al. [6] (but already surfaces in the work of Zinn et al. [21]), where distribution policies are used to define the class of policy-aware transducer networks. The work by Ameloot et al. [7,6] relates coordination-free computation with definability in variants of Datalog. One-round communication algorithms in MPC can be seen as very restrictive coordination-free computation.
The complexity of query containment for conjunctive queries is proved to be NP-complete by Chandra and Merlin [10]. Levy and Sagiv provide a test for query containment of conjunctive queries with negation [15] that involves exploring an exponential number of possible counter-example instances. In the context of information integration, Ullman [18] gives a comprehensive overview of query containment (with and without negation) and states the complexity of query containment for CQ^¬ to be Π^p_2-complete. As mentioned in the introduction, the latter apparently only holds when the database schema is fixed or the arity of relations is considered to be bounded. A proof for the Π^p_2 lower bound is given by Farré et al. [12]. Based on [15], Wei and Lausen [19] study a method for testing containment that exploits containment mappings for the positive parts of queries, and additionally provide a characterisation for UCQ^¬ containment.

Queries and instances
We assume an infinite set dom of data values that can be represented by strings over some fixed alphabet. By dom_n we denote the set of data values represented by strings of length at most n. A database schema D is a finite set of relation names R, each with some arity ar(R). We also write R^(k) as a shorthand to denote that R is a relation of arity k. We call R(t) a fact when R is a relation name and t a tuple over dom of appropriate arity. We say that a fact R(t) is over a database schema D if R ∈ D. For a subset U ⊆ dom we write facts(D, U) for the set of possible facts over schema D and U, and by facts(D) we denote facts(D, dom). A (database) instance I over D is a finite set of facts over D. By adom(I) we denote the set of data values occurring in I. A query Q over input schema D_1 and output schema D_2 is a generic mapping from instances over D_1 to instances over D_2. Genericity means that for every permutation π of dom and every instance I, Q(π(I)) = π(Q(I)). We say that Q is contained in Q′, denoted Q ⊆ Q′, iff for all instances I, Q(I) ⊆ Q′(I).

Unions of conjunctive queries with negation
Let var be an infinite set of variables, disjoint from dom. An atom over schema D is of the form R(x), where R is a relation name from D and x = (x_1, ..., x_k) is a tuple of variables in var with k = ar(R). A conjunctive query Q with negation and inequalities over input schema D is an expression of the form T(x) ← R_1(y_1), ..., R_m(y_m), ¬S_1(z_1), ..., ¬S_n(z_n), β_1, ..., β_p where all R_i(y_i) and S_j(z_j) are atoms over D, every β_i is an inequality of the form s ≠ s′ where s, s′ are distinct variables occurring in some y_i or z_j, and T(x) is an atom for which T ∉ D. Additionally, for safety, we require that every variable in x occurs in some y_i and that every variable occurring in a negated atom occurs in a positive atom as well (safe negation). We refer to the head atom T(x) as head_Q, to the set {R_1(y_1), ..., R_m(y_m), S_1(z_1), ..., S_n(z_n)} as body_Q, and to the set {β_1, ..., β_p} as ineq_Q. Specifically, we refer to {R_1(y_1), ..., R_m(y_m)} as the positive atoms in Q, denoted pos_Q, and to {S_1(z_1), ..., S_n(z_n)} as the negated atoms of Q, denoted neg_Q. We denote by vars(Q) the set of all variables occurring in Q. We refer to the class of conjunctive queries with negation and inequalities by CQ^{¬,≠}, and to its restrictions to queries without inequalities, without negated atoms, and without both by CQ^¬, CQ^≠, and CQ, respectively. As a shorthand we refer to queries from CQ^{¬,≠} as CQ^{¬,≠}s and similarly for the other classes.
A pre-valuation for a CQ^{¬,≠} Q is a total function V : vars(Q) → dom, which naturally extends to atoms and sets of atoms. It is consistent for Q if V(pos_Q) ∩ V(neg_Q) = ∅ and V(s) ≠ V(s′) for every inequality s ≠ s′ of Q, in which case it is called a valuation. Of course, for a conjunctive query without negated atoms and without inequalities, every pre-valuation is also a valuation. We refer to V(pos_Q) as the facts required by V, and to V(neg_Q) as the facts prohibited by V.
A valuation V satisfies Q on instance I if all facts required by V are in I while no fact prohibited by V is in I, that is, if V(pos_Q) ⊆ I and V(neg_Q) ∩ I = ∅. In that case, V derives the fact V(head_Q). The result of Q on instance I, denoted Q(I), is defined as the set of facts that can be derived by satisfying valuations for Q on I.
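The semantics above can be made concrete with a small brute-force evaluator. The following Python sketch is illustrative only: the dictionary encoding of queries, the function name `evaluate`, and the example query and instance are assumptions made here, not notation from the paper. It enumerates pre-valuations over the active domain, which suffices because safety forces every variable to occur in a positive atom.

```python
from itertools import product

def evaluate(query, instance):
    """Return Q(I) for a conjunctive query with negation and inequalities.

    query: dict with 'head' (an atom), 'pos' and 'neg' (lists of atoms) and
           'ineq' (list of variable pairs); an atom is (name, variables).
    instance: set of facts, each fact being (name, values).
    """
    atoms = query["pos"] + query["neg"]
    variables = sorted({v for _, vs in atoms for v in vs})
    adom = list({a for _, vals in instance for a in vals})
    ground = lambda atom, V: (atom[0], tuple(V[v] for v in atom[1]))
    result = set()
    for values in product(adom, repeat=len(variables)):
        V = dict(zip(variables, values))
        if any(V[s] == V[t] for s, t in query["ineq"]):
            continue  # an inequality is violated, so V is not a valuation
        required = {ground(a, V) for a in query["pos"]}
        prohibited = {ground(a, V) for a in query["neg"]}
        # Satisfaction: required facts present, prohibited facts absent.
        # This also implies consistency (required and prohibited are disjoint).
        if required <= instance and not (prohibited & instance):
            result.add(ground(query["head"], V))
    return result

# Hypothetical example: H(x) <- R(x, y), not S(y), x != y
Q = {"head": ("H", ("x",)), "pos": [("R", ("x", "y"))],
     "neg": [("S", ("y",))], "ineq": [("x", "y")]}
I = {("R", (1, 2)), ("R", (2, 2)), ("R", (3, 4)), ("S", (4,))}
print(evaluate(Q, I))
```

In the example, only the valuation x = 1, y = 2 is satisfying: R(2, 2) violates the inequality and R(3, 4) is blocked by the prohibited fact S(4).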
A union of conjunctive queries with negation and inequalities is a finite union of CQ^{¬,≠}s. That is, Q is of the form ⋃_{i=1}^{n} Q_i where all subqueries Q_1, ..., Q_n have the same relation name in their head atoms. We assume disjoint variable sets among different disjuncts in Q. That is, vars(Q_i) ∩ vars(Q_j) = ∅ for i ≠ j and, in particular, vars(head_{Q_i}) ≠ vars(head_{Q_j}). By varmax(Q) we denote the maximum number of variables that occurs in any disjunct of Q. By UCQ^{¬,≠} we denote the class of unions of conjunctive queries with negation and inequalities; its fragments are denoted correspondingly.
A CQ^{¬,≠} is called full if all of its variables occur in its head. A UCQ^{¬,≠} is full if all its subqueries are full.
The result of Q on instance I is Q(I) = ⋃_{i=1}^{n} Q_i(I). Accordingly, a mapping from variables to data values is a valuation for a UCQ^{¬,≠} Q if it is a valuation for one of its subqueries.

Networks, data distribution, and policies
A network N is a nonempty finite set of values from dom, which we call (computing) nodes (or servers). A distribution policy P = (U, rfacts_P) for a database schema D and a network N consists of a universe U and a total function rfacts_P that maps each node of N to a set of facts from facts(D, U). A node κ is responsible for fact f (under policy P) if f ∈ rfacts_P(κ). As a shorthand (and slight abuse of notation), we denote the set of nodes κ that are responsible for some given fact f by P(f). For a distribution policy P and an instance I over D, let loc-inst_{P,I} denote the function that maps each κ ∈ N to I ∩ rfacts_P(κ), that is, the set of facts in I for which κ is responsible. We sometimes refer to a given instance I as the global instance and to loc-inst_{P,I}(κ) as the local instance at node κ.
We note that for some facts from facts(D, U ) there are no responsible nodes. This gives our framework some additional flexibility. However, it does not affect our results: in the lower bound proofs we only use distributions for which all facts from facts(D, U ) have some responsible nodes. Each distribution policy implicitly induces a network and each query implicitly defines a database (sub-) schema. Therefore, we often omit the explicit notation for networks and schemas.
Given some policy P that is defined over a network N, the result [Q, P](I) of the distributed evaluation of a query Q on an instance I in one round is defined as the union of the results of the query evaluated on each node's local instance. Formally,
[Q, P](I) = ⋃_{κ∈N} Q(loc-inst_{P,I}(κ)).
In the decision problem for parallel-correctness (to be formalised later), the input consists of a query Q and a distribution policy P. However, it is not obvious how distribution policies should be specified. In principle, they could be defined in an arbitrary fashion, but it is reasonable to assume that given a potential fact f, a node κ and a policy P, it is not too hard to find out whether κ is responsible for f under P.
For UCQ^≠s, which are monotone, our complexity results are remarkably robust with respect to the choice of the representation of distribution policies. In fact, the complexity results coincide for the two extreme possible choices that we consider in this article. In the first case, distribution policies are specified by an explicit list of tuple-node pairs, whereas in the second case the test whether a given node is responsible for a given tuple can be carried out by a non-deterministic polynomial-time algorithm. However, we do require that some bound n on the length of strings that represent node names and data values is given. Without such a restriction, no upper complexity bounds would be possible, as nodes with names of super-polynomial length in the size of the input would not be accessible.
Considering queries with negated atoms, however, these two settings (seem to) differ, complexity-wise. Thus, we consider a third option, P_rule, in which the universe U of a policy is explicitly enumerated and the responsibilities are defined by simple constraints (described below). The latter representation enjoys the same complexity properties as the full NP-test based case.
Now we give more precise definitions of classes of policies and their representations as inputs of algorithmic problems. As said before, policies P = (U, rfacts_P) from P_fin are specified by an explicit enumeration of U and of all pairs (κ, f) where κ ∈ P(f). A policy P = (U, rfacts_P) from P_rule is given by an explicit enumeration of U and a list of rules of the form ρ = (A, κ), where A is an atom with variables and/or constants from U, and κ is a network node. The semantics of such a rule is as follows: for every substitution µ : var ∪ dom → dom that maps variables to values from U and leaves constants from U unchanged, the node κ is responsible for the fact µ(A). A rule is a fact rule if its atom does not contain any variables, that is, A = R(a_1, ..., a_n), where a_1, ..., a_n ∈ U. In particular, P_fin ⊆ P_rule.
On global instance I = {Rel(1, 7, 7), Rel(1, 7, 8), Rel(2, 9, 8), Rel(2, 9, 9)}, policy P induces local instances loc-inst_{P,I}(κ_1) = {Rel(1, 7, 7)} and loc-inst_{P,I}(κ_2) = {Rel(2, 9, 8), Rel(2, 9, 9)}.
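The rule semantics can be sketched in a few lines of Python. The two rules below are hypothetical: the text does not show the rules of policy P, so they were chosen here only because they yield the local instances stated above. The encoding (variables as strings, constants as integers, node names `k1` and `k2`) and the function names are likewise assumptions.

```python
def matches(atom, fact, universe):
    """True if some substitution of variables by values from `universe`
    maps `atom` to `fact` (constants must match exactly)."""
    (rel, terms), (frel, values) = atom, fact
    if rel != frel or len(terms) != len(values):
        return False
    binding = {}
    for term, value in zip(terms, values):
        if value not in universe:
            return False
        if isinstance(term, str):            # variables are strings here
            if binding.setdefault(term, value) != value:
                return False                 # repeated variable, different values
        elif term != value:                  # a constant
            return False
    return True

def local_instances(rules, universe, instance):
    """Map every node to the facts of `instance` it is responsible for."""
    nodes = {node for _, node in rules}
    return {node: {f for f in instance
                   if any(n == node and matches(a, f, universe) for a, n in rules)}
            for node in nodes}

U = {1, 2, 7, 8, 9}
# Hypothetical rules (Rel(1, x, x), k1) and (Rel(2, x, y), k2).
rules = [(("Rel", (1, "x", "x")), "k1"), (("Rel", (2, "x", "y")), "k2")]
I = {("Rel", (1, 7, 7)), ("Rel", (1, 7, 8)), ("Rel", (2, 9, 8)), ("Rel", (2, 9, 9))}
loc = local_instances(rules, U, I)
```

Under these assumed rules no node is responsible for Rel(1, 7, 8), which illustrates that a policy may leave some facts without a responsible node.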
The most general classes of policies allow policies to be specified by means of a "test algorithm" with time bound ℓ^k, where ℓ is the length of the input and k some constant. Such an algorithm decides, for an input consisting of a node κ and fact f, whether κ is responsible for f.¹ A policy P = (U, rfacts_P) from P^k_npoly is specified by a pair (n, A_P), where n is a natural number in unary representation and A_P is a non-deterministic algorithm.² The universe U of P is the set of all data values that can be represented by strings of length at most n (for some given fixed alphabet) and the underlying network consists of all nodes which are represented by strings of length at most n, that is, N = dom_n. A node κ is responsible for a fact f if A_P, on input (κ, f), has an accepting run of at most |(κ, f)|^k steps. Clearly, each policy of P_fin can be described in P^2_npoly. Let P_npoly denote the set³ {P^k_npoly | k ≥ 2} of distribution policies and by P the set {P_fin, P_rule} ∪ P_npoly.

Parallel-correctness, soundness, and completeness
In this paper, we mainly consider the one-round evaluation algorithm for a query Q that first distributes (reshuffles) the data over the computing nodes according to P, then evaluates Q in a parallel step at every computing node, and finally outputs all facts that are obtained in this way.⁴ As formalised next, the one-round evaluation algorithm is correct (sound, complete) if the query Q is parallel-correct (parallel-sound, parallel-complete) under P.
A query Q is parallel-sound on an instance I under distribution policy P if [Q, P](I) ⊆ Q(I), and parallel-complete on I under P if Q(I) ⊆ [Q, P](I). It is parallel-correct on I under P if Q(I) = [Q, P](I), that is, if it is parallel-sound and parallel-complete.

Definition 3.3.
A query Q is parallel-correct (respectively, parallel-sound and parallel-complete) under distribution policy P = (U, rfacts_P), if Q is parallel-correct (respectively, parallel-sound and parallel-complete) on all instances I ⊆ facts(D, U).
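For a very small universe, the definition can be checked directly by enumerating all instances. The Python sketch below does this for a CQ with negation (no inequalities) and a policy given as an explicit node-to-facts map. The encodings, function names, and the example query and policy are assumptions chosen for illustration; the enumeration is exponential in the number of possible facts and is not one of the algorithms analysed in this paper.

```python
from itertools import chain, combinations, product

def evaluate(query, instance):
    """Q(I) for a query (head, pos, neg) whose atoms are (name, variables)."""
    head, pos, neg = query
    variables = sorted({v for _, vs in pos + neg for v in vs})
    adom = {a for _, vals in instance for a in vals}
    ground = lambda atom, V: (atom[0], tuple(V[v] for v in atom[1]))
    out = set()
    for values in product(adom, repeat=len(variables)):
        V = dict(zip(variables, values))
        if ({ground(a, V) for a in pos} <= instance
                and not {ground(a, V) for a in neg} & instance):
            out.add(ground(head, V))
    return out

def distributed(query, rfacts, instance):
    """[Q, P](I): union of the results on the local instances."""
    return set().union(*(evaluate(query, instance & facts)
                         for facts in rfacts.values()))

def check(query, rfacts, all_facts):
    """Return (sound, complete) by testing every instance I within all_facts."""
    sound = complete = True
    sizes = range(len(all_facts) + 1)
    for subset in chain.from_iterable(combinations(all_facts, r) for r in sizes):
        I = set(subset)
        central, parallel = evaluate(query, I), distributed(query, rfacts, I)
        sound &= parallel <= central
        complete &= central <= parallel
    return sound, complete

# Hypothetical example: H(x) <- R(x), not S(x) over universe {0}.
Q = (("H", ("x",)), [("R", ("x",))], [("S", ("x",))])
rfacts = {"k1": {("R", (0,))}, "k2": {("R", (0,)), ("S", (0,))}}
all_facts = [("R", (0,)), ("S", (0,))]
print(check(Q, rfacts, all_facts))
```

In this example node k1 never sees S(0), so on the instance {R(0), S(0)} it derives H(0) although the central evaluation does not: the query is parallel-complete but not parallel-sound under the assumed policy.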
In [5], parallel-correctness is characterised in terms of minimal valuations as defined next:
Definition 3.4. A valuation V for a CQ Q is minimal if there is no valuation V′ for Q that agrees with V on the head variables and requires a strict subset of the facts required by V, that is, V′(head_Q) = V(head_Q) and V′(body_Q) ⊊ V(body_Q).
¹ We note that it is important that for each class of policies there is a fixed k that bounds the exponent in the test algorithm, as otherwise we could not expect a polynomial bound for all policies of that class.
² For concreteness, say, a non-deterministic Turing machine.
³ Since "linear time" is a subtle notion, we rather do not consider P^1_npoly.
⁴ We note that, since P is defined on the granularity of a fact, the reshuffling does not depend on the current distribution of the data and can be done in parallel as well.
The following lemma is key in obtaining the Π^p_2 upper bound on the complexity of testing parallel-correctness for conjunctive queries:
Lemma 3.5 (Characterisation of parallel-correctness for CQs [5]). A CQ Q is parallel-correct under distribution policy P = (U, rfacts_P) if and only if the following holds:
(C1) for every minimal valuation V for Q over U, there is a node κ ∈ N such that V(body_Q) ⊆ rfacts_P(κ).
Remark 3.6. Informally, condition (C1) states that there is a node in the network where all facts required for V meet.
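The condition described in Remark 3.6 can be tested by brute force on a small universe. The following Python sketch is an illustration under stated assumptions: minimality is read as "no valuation with the same head fact requires a strict subset of facts", the query and the two policies are hypothetical, and the names `minimal_requirements` and `satisfies_c1` are introduced here.

```python
from itertools import product

def ground(atom, V):
    return (atom[0], tuple(V[v] for v in atom[1]))

def minimal_requirements(query, universe):
    """Pairs (derived fact, required facts) of all minimal valuations of a CQ."""
    head, body = query
    variables = sorted({v for _, vs in body for v in vs})
    vals = []
    for values in product(sorted(universe), repeat=len(variables)):
        V = dict(zip(variables, values))
        vals.append((ground(head, V), frozenset(ground(a, V) for a in body)))
    # Keep a valuation unless another one derives the same fact from a strict subset.
    return {(h, req) for h, req in vals
            if not any(h2 == h and req2 < req for h2, req2 in vals)}

def satisfies_c1(query, universe, rfacts):
    """Do the required facts of every minimal valuation meet at some node?"""
    return all(any(req <= facts for facts in rfacts.values())
               for _, req in minimal_requirements(query, universe))

# Hypothetical CQ: H(x, z) <- R(x, y), R(y, z), R(x, x)
Q = (("H", ("x", "z")), [("R", ("x", "y")), ("R", ("y", "z")), ("R", ("x", "x"))])
U = {0, 1}
R = lambda a, b: ("R", (a, b))
P_good = {"k1": {R(0, 0), R(0, 1)}, "k2": {R(1, 1), R(1, 0)}}
P_bad = {"k1": {R(0, 0), R(1, 1)}, "k2": {R(0, 1), R(1, 0)}}
print(satisfies_c1(Q, U, P_good), satisfies_c1(Q, U, P_bad))
```

Under `P_bad`, the minimal valuation x = 0, y = 0, z = 1 requires R(0, 0) and R(0, 1), which are never located at the same node, so the check fails.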

Algorithmic problems
We consider the following decision problems for various sub-classes C and C′ of UCQ^{¬,≠} and classes P of distribution policies from {P_fin, P_rule} ∪ P_npoly.
Parallel-Sound(C, P). Input: Q ∈ C, P ∈ P. Question: Is Q parallel-sound under P?
Parallel-Complete(C, P). Input: Q ∈ C, P ∈ P. Question: Is Q parallel-complete under P?
Parallel-Correct(C, P). Input: Q ∈ C, P ∈ P. Question: Is Q parallel-correct under P?

Parallel-correctness: unions of conjunctive queries
Parallel-correctness of unions of conjunctive queries (without negation) reduces to parallel-completeness for the simple reason that these queries are monotone and therefore parallel-sound for every distribution policy. We show below that parallel-completeness remains in Π^p_2. Hardness already follows from Π^p_2-hardness of Parallel-Correct(CQ, P_fin) [5].
As a UCQ is parallel-complete under a policy P when all its disjuncts are, it might be tempting to assume that this condition is also necessary. However, as the following example illustrates, this is not the case.
where Q_1 and Q_2 are the following CQs:
Further, let P be the policy over network {κ_1, κ_2} that maps facts R(a, a) to node κ_1, for all a ∈ dom, and all other R-facts and all S-facts to node κ_2.
We argue that Q is parallel-complete under P on all instances. Indeed, assume H(a, b) ∈ Q(I) for some instance I and a, b ∈ dom. If a = b, only the valu-
We recall from Section 3.2 that disjuncts in unions of conjunctive queries use disjoint variable sets and a valuation for Q is a valuation for exactly one disjunct. As formalised next, the notion of minimality for valuations given in Definition 3.4 naturally extends to UCQ^≠.
are as follows:
The notion of minimality leads to basically the same simple characterisation of parallel-completeness:
Lemma 4.4. A UCQ^≠ Q is parallel-correct under distribution policy P = (U, rfacts_P) if and only if the following holds:
(C1′) for every minimal valuation V for Q over U, there is a node κ ∈ N such that V(body_Q) ⊆ rfacts_P(κ).
Proof. (If) Because of monotonicity, we only need to show that Q(I) ⊆ ⋃_{κ∈N} Q(loc-inst_{P,I}(κ)) for every instance I. To this end, let f be an arbitrary fact that is derived by some valuation V for Q on I. Then, there is also a minimal valuation V′ that is satisfying on I and which derives f. Because of (C1′), there is a node κ ∈ N where all facts required by V′ meet (cf. Remark 3.6). Hence, f ∈ ⋃_{κ∈N} Q(loc-inst_{P,I}(κ)), i.e., query Q is parallel-correct under policy P.
(Only if) For a proof by contraposition, suppose that there is a minimal valuation V′ for Q for which the required facts do not meet under P. Consider the input instance I = V′(body_Q). By definition of minimality, there is no valuation that agrees on the head variables and is satisfying for Q on a strict subset of V′(body_Q). Therefore, V′(head_Q) is in Q(I) but it is not derived on any node and thus query Q is not parallel-complete under policy P.
The characterisation in Lemma 4.4, in turn, can be used to prove a Π^p_2 upper bound.
for every P ∈ P.
Proof. It suffices to show that the complement of Parallel-Complete(UCQ^≠, P^k_npoly) is in Σ^p_2 for arbitrary k ≥ 2. Let P = (n, T) be a policy from P^k_npoly. We have to consider only instances whose data values can be represented by strings of length at most n over networks whose nodes can be represented by strings of length at most n.
By Lemma 4.4, a query Q is not parallel-correct under distribution policy P if and only if there exists a minimal valuation V that satisfies Q on some instance I with adom(I) ⊆ dom_n such that no node in dom_n is responsible for all facts from V(body_Q).
First, the algorithm non-deterministically guesses a valuation V, which can be represented by a string of length polynomial in the size of Q and n. Subsequently, it checks for all valuations V′, all nodes κ, and all strings x of polynomial length whether V′ contradicts minimality of V (in which case the algorithm rejects the input) and, by use of algorithm T, whether node κ is not responsible for at least one fact from V(body_Q) (if so, the algorithm continues, otherwise it rejects). All tests can be done in polynomial time.
From [5] we know the following result.

Containment of CQ ¬ and UCQ ¬
In this section, we establish the complexity of containment for CQ^¬ and UCQ^¬. We need these results to establish lower bounds on parallel-correctness and its constituents in the next section. Whereas containment for CQ has been intensively studied in the literature, the analogous problems for CQ^¬ and UCQ^¬ have hardly been addressed and seem to belong to folklore. In fact, we only found a reference to a complexity result for containment of CQ^¬ in [18], where a Π^p_2-algorithm for the problem is given, based on observations in [15], and the existence of a matching lower bound is mentioned. However, as we show below, although the problem is indeed in Π^p_2 for queries defined over a fixed schema (or when the arity of relations is bounded), it is coNEXPTIME-complete in the general case. We first show the lower bounds. They actually already hold for Boolean queries. We show that Containment(BCQ^¬, UBCQ^¬) is coNEXPTIME-hard by a reduction from the succinct 3-colorability problem and afterwards that Containment(BCQ^¬, UBCQ^¬) can be reduced to Containment(BCQ^¬, BCQ^¬). Here, BCQ^¬ and UBCQ^¬ denote the class of Boolean CQ^¬s and unions of Boolean CQ^¬s, respectively. Together this establishes that Containment(BCQ^¬, BCQ^¬) and therefore also Containment(CQ^¬, CQ^¬) are coNEXPTIME-hard.
Proposition 5.1. Containment(BCQ^¬, UBCQ^¬) is coNEXPTIME-hard.
Proof. The proof is by a reduction from the succinct 3-colorability problem, which asks whether a graph G, which is implicitly given by a circuit with binary AND- and OR-gates and unary NEG-gates, is 3-colorable. The latter problem is known to be NEXPTIME-complete [17]. We say that a circuit C, with 2ℓ Boolean inputs, describes a graph G = (N, E), when N = {0, 1}^ℓ, and there is an edge (n_1, n_2) ∈ N^2 if and only if C outputs true on input n_1 n_2.
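To illustrate how a circuit describes a graph, the sketch below models a circuit as a Python function over 2ℓ bits, expands it into the explicit graph, and tests 3-colorability by brute force. The two example circuits and all function names are assumptions made for illustration; the explicit graph has 2^ℓ nodes, which is exactly why the succinct problem is harder than ordinary 3-colorability.

```python
from itertools import product

def described_graph(circuit, ell):
    """Expand a circuit with 2*ell Boolean inputs into the graph it describes."""
    nodes = list(product((0, 1), repeat=ell))
    edges = [(u, v) for u in nodes for v in nodes if circuit(*u, *v)]
    return nodes, edges

def three_colorable(nodes, edges):
    """Brute-force test over all 3**len(nodes) colorings."""
    for colors in product(range(3), repeat=len(nodes)):
        coloring = dict(zip(nodes, colors))
        if all(coloring[u] != coloring[v] for u, v in edges):
            return True
    return False

# Hypothetical circuits for ell = 2 (a real input would be built from
# AND-, OR- and NEG-gates; Python comparisons stand in for them here).
def cycle_circuit(y1, y2, z1, z2):
    # Edge iff the two node addresses differ in exactly one bit: a 4-cycle.
    return (y1 != z1) != (y2 != z2)

def clique_circuit(y1, y2, z1, z2):
    # Edge iff the two node addresses differ: the complete graph on 4 nodes.
    return (y1, y2) != (z1, z2)

print(three_colorable(*described_graph(cycle_circuit, 2)))
print(three_colorable(*described_graph(clique_circuit, 2)))
```

The 4-cycle is 3-colorable while the complete graph on four nodes is not, so the two calls give different answers.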
Let C be an input for the succinct 3-colorability problem with 2ℓ Boolean inputs. We construct queries Q_1 and Q_2 such that Q_1 ⊈ Q_2 if and only if the graph described by C is 3-colorable.
Both queries are over schema D, which consists of relation names DomainValues^(3), Bool^(1), And^(3), Or^(3), Neg^(2), and Label^(ℓ+1). Intuitively, satisfaction of Q_1 will guarantee that there is a tuple (a_0, a_1, a_2) with three different values in relation DomainValues. We will use, for some such tuple, a_0, a_1, a_2 as colors and a_0, a_1 as truth values. We will often assume without loss of generality that (a_0, a_1, a_2) = (0, 1, 2). In particular, for such a tuple, a_0 is interpreted as false while a_1 is interpreted as true. The unary relation Bool will be forced by Q_1 to contain at least a_0 and a_1.
Relations And, Or, and Neg are intended to represent the respective logical functions. The first two attributes represent input values, and the last attribute represents the output. Again, Q_1 will guarantee that at least all triples of Boolean values that are consistent with the semantics of AND, OR, and NEG are present in these relations. Tuples in relation Label represent nodes together with their respective color (one can think of the representation of a node by ℓ-ary addresses over a ternary alphabet). We define query Q_1 as follows:
It is easy to see that Q_1 enforces the conditions mentioned above.
In the following, we denote sequences x_1, ..., x_ℓ of ℓ variables by x.
We define Q_2 as the union of the queries Q^1_2 and Q^2_2, where subquery Q^1_2 is defined as:
Intuitively, Q^1_2 can be satisfied in a database if for some node, represented by x, there is no color.
Subquery Q^2_2 deals with the correctness of a coloring and uses a set of atoms, denoted circuit, that is intended to check whether for two nodes u and v, represented by y and z, respectively, there is an edge between u and v.
To this end, circuit uses the variables y_1, ..., y_ℓ, z_1, ..., z_ℓ, representing the input and, at the same time, the 2ℓ input gates of C, and an additional variable u_i, for each gate of C, with the exception of the output gate. The output gate is represented by variable w_1. For each AND-gate represented by variable v_1 with incoming edges from gates represented by variables u_1 and u_2, circuit contains an atom And(u_1, u_2, v_1). Likewise for OR- and NEG-gates.
Subquery Q^2_2 is defined as:
Intuitively, Q^2_2 returns true when two nodes, witnessed to be adjacent by the circuit, have the same color.
We show in the appendix that the graph described by C is 3-colorable if and only if Q_1 ⊈ Q_2.
Next, we provide the above-mentioned reduction.
Recall our assumption that each disjunct is defined over a disjoint set of variables. Next, we construct CQs
We explain the intuition behind the reduction by means of an example. To this end, let Q_1 be H() ← A(x, y) and let Q 2 be the Q 1
The query Q′_2 takes the following form:
where α(w, Q) denotes the modification of the body of Q by replacing every atom R(x) by R′(w, x). Both queries are defined over the schema
Notice that Q′_2 contains a concatenation of the disjuncts of Q_2. In addition, relations A and B are extended with a new first column with the purpose of labelling tuples. This labelling allows us to encode two (or even more) instances over D by one instance over D′. Specifically, body_{Q′_1} (not shown) is constructed in such a way that when there is a satisfying valuation for Q′_1 there are two different data values, say 0 and 1. So, an instance I over D can be encoded as 1, a, b) | B(a, b) ∈ I}. In addition, when there is a satisfying valuation for Q′_1, there is an instance I_2 on which every disjunct of Q_2 is true, and there is an instance I_1 on which Q_1 is true. So, both Q′_{2,1} and Q′_{2,2} evaluate to true on I^0_2 when ℓ_1 and ℓ_2 are interpreted by label 0. However, for Q_1 to be contained in Q_2, we need that at least one of the disjuncts Q′_{2,1} or Q′_{2,2} evaluates to true over I^1_1, that is, when its labelling variable is interpreted as 1. Atom Active(x_0, x_1; ℓ_1, ℓ_2) will ensure that x_0 and x_1 correspond to the values 0 and 1, and that at least one of the labelling variables ℓ_1 or ℓ_2 is equal to 1. In other words, Active chooses which disjunct to activate over I_1. So, at least one disjunct of Q_2 evaluates to true on the instance I_1 on which Q_1 is satisfied.
We explain the reduction in more detail in the appendix.
Combining Propositions 5.1 and 5.2 we get the following corollary: Containment(BCQ^¬, BCQ^¬) and therefore also Containment(CQ^¬, CQ^¬) are coNEXPTIME-hard.
The corresponding upper bounds also hold in the presence of inequalities and are shown by small model (i.e., counter-example) properties. To this end, we make use of a restricted monotonicity property of UCQ^{¬,≠}s which was already observed in Proposition 2.4 of [3]. For an instance I and a set D of data values we denote by I|_D the restriction of I to facts that only use values from D.
Now we can establish the following small model property for testing containment.
Proof. Let I be as in the lemma and let f be a fact with f ∈ Q_1(I) and f ∉ Q_2(I). Let V be a valuation that derives f via some disjunct
1. A NEXPTIME algorithm, on input Q_1, Q_2, can simply guess an instance J with a domain of at most m elements and a fact f, and verify that f ∈ Q_1(J) but f ∉ Q_2(J). For the latter tests, it can simply cycle, in exponential time, through all valuations over J for Q_1 and Q_2.
2. For a fixed arity bound k, the minimal counter-example J is of size at most m^k. It can thus be guessed in polynomial time. That f ∈ Q_1(J) can be verified non-deterministically. That f ∉ Q_2(J) can be verified by a universal computation in polynomial time.
A claim of a Π p 2 upper bound for containment of CQs with negation can be found in [18]. It was not made clear there, that this claim assumes bounded arity of the schema. That the containment problem is Π p 2 -complete for schemas of bounded arity has been explicitly shown in [16]. Clearly, Proposition 5.6.2 follows directly and 5.6.1 is only a variation of it. From Proposition 5.6 and Corollary 5.3 the main result of this section immediately follows.
Of course, the theorem also holds for all classes C of queries with BCQ ¬ ⊆ C ⊆ UCQ ¬,≠ .

Parallel-correctness: unions of conjunctive queries with negation
As mentioned in Section 4, for conjunctive queries without negation parallel-soundness always holds and thus parallel-correctness and parallel-completeness coincide, thanks to monotonicity. For queries with negation the situation is different. Distributed evaluation can be complete but not sound, or vice versa. For this reason, we have to treat all three problems separately: correctness, soundness, and completeness. However, the complexity is the same in all three cases.
Our results show a second, more crucial difference. Whereas parallel-completeness for CQs without negation could be characterised in terms of valuations, that is, objects of polynomial size, our algorithms for CQs with negation involve counter-examples of exponential size (if the arity of schemas is not bounded) and the coNEXPTIME lower bound results indicate that this is unavoidable. We illustrate the observation that counter-examples might need an exponential number of tuples by the following example.
However, there is no smaller instance: let I * be some instance over universe U that has a locally satisfying valuation V . The combination of atoms Bool(w 0 , w 0 ), Bool(w 1 , w 1 ), and ¬Bool(w 0 , w 1 ) in query Q then implies existence of both facts Bool(0, 0) and Bool(1, 1) because variables w 0 and w 1 cannot be mapped onto the same data value.
Assume that fact Rel(a 1 , . . . , a n ), for some (a 1 , . . . , a n ) ∈ {0, 1} n is missing from I * . Then the valuation W that maps w 0 → 0, w 1 → 1 and x i → a i , for every i ∈ {1, . . . , n}, satisfies Q also globally, on instance I * , and can therefore be no example against parallel-soundness, which contradicts our choice of I * . Thus, Rel(a 1 , . . . , a n ) ∈ I * , for every (a 1 , . . . , a n ) ∈ {0, 1} n . We therefore have I ⊆ I * and, in particular, instance I * contains at least as many facts as instance I.
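The size of the counter-example in this argument can be made concrete with a small sketch. It assumes, as an illustration only, that instance I consists of exactly the facts named in the argument above, namely Bool(0, 0), Bool(1, 1) and Rel(a 1 , . . . , a n ) for every tuple in {0, 1}^n (the definition of I itself is not reproduced in this section). The function names are hypothetical.

```python
from itertools import product


def exponential_instance(n):
    """Assumed shape of I: Bool(0,0), Bool(1,1) and Rel(a) for all a in {0,1}^n."""
    facts = {("Bool", (0, 0)), ("Bool", (1, 1))}
    facts |= {("Rel", bits) for bits in product((0, 1), repeat=n)}
    return facts


def count_facts(n):
    """Number of facts in the assumed instance, i.e. 2**n + 2."""
    return len(exponential_instance(n))


for n in (1, 2, 3, 10):
    print(n, count_facts(n))
```

Under this assumption the number of facts grows as 2^n + 2, while the query only needs n + 2 variables.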
The results of this section are summarised in the following theorem:
Theorem 6.2. For every class P ∈ {P rule } ∪ P npoly of distribution policies, the following problems are coNEXPTIME-complete.
• Parallel-Sound(UCQ ¬ , P)
• Parallel-Complete(UCQ ¬ , P)
• Parallel-Correct(UCQ ¬ , P)
Theorem 6.2 follows from Propositions 6.3 and 6.5 below. It also holds for UCQ ¬,≠ . It is easy to show that, when restricted to schemas with some fixed (but sufficiently large, for hardness) arity bound, all these problems are Π p 2 -complete.

Upper bounds
In this section, we show the upper bounds of Theorem 6.2, summarised in the following proposition.
Proposition 6.3. Parallel-Sound(UCQ ¬,≠ , P), Parallel-Complete(UCQ ¬,≠ , P), and Parallel-Correct(UCQ ¬,≠ , P) are in coNEXPTIME, for every class P ∈ P of distribution policies. If the arity of schemas is bounded by some fixed number, these problems are in Π p 2 .
Proof. As already indicated above, the proof relies on a bound on the size of a smallest counter-example. More specifically, we first show the following claim.
Claim 6.4. Let Q ∈ UCQ ¬,≠ and let P be an arbitrary distribution policy. Then the following statements hold:
1. If Q is not parallel-complete under P , then there is an instance J over a domain with at most varmax (Q) elements such that Q is not parallel-complete on J under P .
2. If Q is not parallel-sound under P , then there is an instance J over a domain with at most varmax (Q) elements such that Q is not parallel-sound on J under P .
Towards (1) let us assume that Q is not parallel-complete on some instance I under P . Let V be a valuation of a disjunct Q i of Q that derives a fact f globally that is not derived on any node of the network. Let D def = adom(V (pos Qi )) and J def = I |D . Clearly, |D| ≤ varmax (Q) and V still derives f globally on instance J via Q i . On the other hand, for every node κ, Q(loc-inst P ,J (κ)) = Q(loc-inst P ,I (κ) |D ) ⊆ Q(loc-inst P ,I (κ)), thanks to Lemma 5.4. Therefore f is not derived on κ, and thus J witnesses the lack of parallel-completeness of Q under P .
The proof of (2) is completely analogous. Given a counter-example I and a valuation V that derives a fact f on some node κ via Q i , for which f is not derived globally, we define D def = adom(V (pos Qi )) and show that J def = I |D is the desired counter-example.
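The restriction step used in both parts of the claim can be illustrated as follows. This is a minimal sketch that assumes facts are encoded as pairs of a relation name and a tuple of values; the helper names adom and restrict are chosen for this illustration only.

```python
def adom(facts):
    """Active domain: all data values occurring in a set of facts."""
    return {value for _, args in facts for value in args}


def restrict(instance, D):
    """I|_D: keep exactly the facts that only use values from D."""
    return {(rel, args) for (rel, args) in instance
            if all(value in D for value in args)}


# A global instance I and the positive facts required by some valuation V.
I = {("R", (1, 2)), ("R", (2, 3)), ("S", (1,))}
required = {("R", (1, 2))}          # plays the role of V(pos_Qi)

D = adom(required)                  # at most varmax(Q) values
J = restrict(I, D)
print(sorted(J))
```

Here J keeps R(1, 2) and S(1) and drops R(2, 3), so the facts needed by V survive while the domain shrinks to D.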
In the appendix, we describe an algorithm that tests the complement of parallel-completeness non-deterministically.
Proof. Interestingly, all three results are shown by the same reduction from decision problem Containment(BCQ ¬ , BCQ ¬ ).
The basic idea for this reduction is very simple: it combines both queries Q 1 , Q 2 ∈ BCQ ¬ of the given containment instance into a single query Q ∈ BCQ ¬ and infers an appropriate distribution policy P . To emulate separate derivation for both queries in the combined query, an activation mechanism is used that resembles the proof of Proposition 5.2. In this fashion, the two queries can be evaluated over different subsets of the considered instance by annotating both the facts in the instance as well as the atoms of the query.
We next describe the reduction in detail. Let Q 1 , Q 2 ∈ BCQ ¬ be queries over some schema D and let m def = max{varmax (Q 1 ), varmax (Q 2 )}. Without loss of generality, we assume the variable sets of Q 1 and Q 2 to be disjoint. We will also assume in the following that both Q 1 and Q 2 are satisfiable. This is the case (for Q 1 ) if and only if pos Q1 ∩ neg Q1 = ∅ and can therefore be easily tested in polynomial time. If one of the tests fails, some appropriate constant instance of Parallel-Complete(CQ ¬ , P rule ) or one of the other problem variants, respectively, can be computed.
We define a (Boolean) query Q ∈ BCQ ¬ and a policy P ∈ P rule over domain {1, . . . , m} that can be computed from Q 1 and Q 2 in polynomial time.
That is, each relation name R of D occurs as R ′ in D ′ with an arity incremented by one. Additionally, Q uses relation names Type, Start 1 , Start 2 , and Stop, which we assume not to occur in schema D. Besides the variables of Q 1 and Q 2 , query Q uses variables ℓ 1 , ℓ 2 , t.
We use the function α, defined in the proof of Proposition 5.2, which adds its first parameter as first component to every tuple in its second parameter and translates relation names R into R ′ . In Proposition 5.2, the first parameter was always a variable and the second a set of atoms, but we use α also for a data value as first and a set of facts as second parameter in the obvious way. We write α −1 a for the function mapping sets of facts over D ′ to sets of facts over D, by selecting, from a set of facts, all facts with first parameter a, deleting this parameter and replacing each name R ′ by R. Finally, π a (I) def = α(a, α −1 a (I)) is the restriction of I to all facts with a in their first component.
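The three operations on facts can be sketched in a few lines. The encoding is an assumption made for illustration: a fact is a pair of a relation name and a tuple of values, and the primed name R ′ is written as the string R followed by an apostrophe. The sketch also assumes that only primed relations carry a label, so other relations are ignored by alpha_inv.

```python
def alpha(a, facts):
    """Prefix every tuple with label a and rename R to R'."""
    return {(rel + "'", (a,) + args) for (rel, args) in facts}


def alpha_inv(a, facts):
    """Select primed facts labelled a, drop the label, rename R' back to R."""
    return {(rel[:-1], args[1:]) for (rel, args) in facts
            if rel.endswith("'") and args and args[0] == a}


def pi(a, facts):
    """Restriction to the facts that carry label a in their first component."""
    return alpha(a, alpha_inv(a, facts))


I1 = {("A", (1, 2))}
I2 = {("B", (3, 4))}
I = alpha(1, I1) | alpha(2, I2)     # two instances encoded in one
print(alpha_inv(1, I), pi(2, I))
```

Labelling in this way is what lets a single instance over D ′ carry one instance per query.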
The combined query Q has head Q def = H() and body
Policy P is defined over universe U
• Every node κ i is responsible for the facts Type(1), Start 1 (i), Start 2 (i), Stop(i), and all facts from facts(D ′ , U ).
• Every node σ i is responsible for the facts Type(2), Start 1 (i), Stop(i), all Start 2 -facts, and all facts from facts(D ′ , U ).
It is easy to see that P can be expressed by a polynomial number of rules and that Q and P can be computed in polynomial time. In the appendix, we show that the described function is indeed the desired reduction.

Full conjunctive queries
In this section, we focus attention on full conjunctive queries, in an attempt to lower the complexity of testing parallel-correctness. Requiring queries to be full is a very natural restriction which is known to have practical benefits. For example, the Hypercube algorithm, which describes an optimal way to compute CQs in a setting very similar to ours, completely ignores projections when shuffling data, and only applies them when computing the query locally. The latter is possible because correctness for the full variant of a query is in a sense stricter than correctness for the query itself. Formally, a (union of) conjunctive queries is called full if all variables of the body also occur in the head. We denote by FCQ ¬,≠ and UFCQ ¬,≠ the classes of full CQ ¬,≠ and full UCQ ¬,≠ queries, respectively, and likewise for other fragments.
The presentation is similar to that of Sections 5 and 6. First, we establish the complexity of query containment. Then, we show that containment reduces to parallel-correctness (and variants). Finally, we obtain matching upper bounds.
The following theorem shows that, unlike for general conjunctive queries, the complexities of deciding containment for FCQ ¬ and UFCQ ¬ do not coincide.
3. Containment(UFCQ ¬ , UFCQ ¬ ) is coNP-complete.
All these results also hold for queries with inequalities.
The following theorem determines the complexity for the upper bounds: The result also holds for queries with inequalities.

Discussion
In this paper, we continued the study of parallel-correctness initiated by Ameloot et al. [5] as a framework for reasoning about one-round evaluation algorithms for conjunctive queries under arbitrary distribution policies. Specifically, we considered the case with union and negation. While parallel-correctness for unions of conjunctive queries can be tested by examining properties of single valuations, just like in the union-free case, this no longer holds true when negation is present. Consequently, we obtained that deciding parallel-correctness for unions of conjunctive queries remains in Π p 2 , while the analogous problem in the presence of negation is hard for coNEXPTIME. Since conjunctive queries with negation are no longer monotone, we considered the related problems of parallel-completeness and parallel-soundness as well and obtained the same bounds. Interestingly, when negation is present, containment of conjunctive queries can be reduced to parallel-correctness (and its variants), allowing the transfer of lower bounds. We prove that containment for conjunctive queries with negation is hard for coNEXPTIME, which, to the best of our knowledge, is a novel result. In an attempt to lower complexity, we show that parallel-correctness for unions of full conjunctive queries with negation is coNP-complete.
There are quite a number of directions for future work. While parallel-correctness for first-order logic is undecidable, it would be interesting to determine the exact frontier for decidability. As the considered problem is a static analysis problem that relates to the size of the queries and not to the size of the instances (at least in the setting of P rule ), exponential lower bounds do not necessarily exclude practical application. It could still be interesting to identify settings that would make parallel-correctness tractable. Possibly independent of tractability considerations, such settings could incorporate bag semantics, integrity constraints, or specific classes (and representations) of distribution policies. We also plan to consider evaluation algorithms that use knowledge about the distribution policy to compute better query results, locally. Another direction for future work is to investigate transferability of parallel-correctness for conjunctive queries as defined in [5] in the presence of union and negation.
We claim that label is a valid coloring of the graph represented by C. Towards a contradiction let us assume that there are two nodes n and n ′ that are connected by an edge and for which label(n) = label(n ′ ) = c, for some c. Then Q 2 2 could be satisfied over I by choosing a valuation that corresponds to a computation of C that witnesses that there is an edge between n and n ′ and mapping u to c, the desired contradiction.
(Only if ) Let, for some ℓ, C be a circuit with input length 2ℓ, that describes a 3-colorable graph G. Let label : {0, 1} ℓ → {0, 1, 2} be a valid coloring for G. Let I be the database with the following facts: Obviously, T () ∈ Q 1 (I) and T () ∉ Q 1 2 (I). However, since I only contains the "correct" logical facts, to satisfy Q 2 2 it would be necessary to find two nodes with the same label whose adjacency is witnessed by the canonical valuation corresponding to the semantics of C, which does not exist.
Thus, Q 1 ⊈ Q 2 .
follows: It is easy to verify that V ′ 2 satisfies Q ′ 2 : the first atom and all conjuncts α(z j , Q j 2 ) with j ≠ i become true since the respective facts were guaranteed by Q ′ 1 . Finally, Therefore, let I be an instance, and V 1 a valuation such that V 1 satisfies Q 1 over I, but no Q i 2 has a satisfying valuation over I. Since every query Q i 2 is satisfiable, there is, for every i, a satisfying valuation V i . In fact, these valuations can be chosen with pairwise disjoint range. Now, we define I ′ as the following set of facts: 1; 1, 0, . . . , 0), . . . , Active(0, 1; 0, . . . , 0, 1)}.
It is easy to check⁷ that from V 1 and the V i a satisfying valuation for Q ′ 1 over I ′ can be constructed. On the other hand, any satisfying valuation of Q ′ 2 over I ′ would require using facts from α(1, I) for at least one α(z i , Q i 2 ) and would thus induce a valuation of Q i 2 over I, the desired contradiction.

Proofs for Section 6: Parallel-correctness: unions of conjunctive queries with negation
Proof of Proposition 6.3 (continued). It only remains to describe the algorithm that tests the complement of parallel-completeness non-deterministically. On input Q and P (specified by (n, T ) ∈ P k npoly ), the algorithm simply guesses an instance J over a domain with at most varmax (Q) values from dom n , and verifies that J is a counter-example showing that Q is not parallel-complete under P . From Claim 6.4 it follows that this algorithm is correct, since a counter-example must exist if Q is not parallel-complete under P , and the actual data values do not matter. It remains to show the complexity bounds and, in particular, to describe how the verification part can be done.
In the general case, without a bound on the arity of the schema, the verification is done as follows. The algorithm guesses a valuation V that produces some fact f globally, where f is not derived at any node. To test that f is not derived at any node, the algorithm cycles through all nodes and all valuations V over dom n . The number of combinations is bounded⁸ by 2^n × (2^n)^{varmax (Q)} = 2^{n(varmax (Q)+1)} . Each test can be performed by a simulation of all runs of T , which amounts to at most 2^{n^k} simulations of at most n^k steps each. Altogether the algorithm needs time at most 2^{|(Q,P )|^{k+2}} .
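The counting step can be checked numerically. The sketch below assumes, following the text, that there are at most 2^n nodes and 2^n data values in dom n (binary alphabet), and that a valuation assigns one value to each of at most varmax (Q) variables. The function name is hypothetical.

```python
def node_valuation_combinations(n, varmax):
    """Upper bound on (node, valuation) pairs the verifier cycles through."""
    nodes = 2 ** n                     # node names of length n, binary alphabet
    valuations = (2 ** n) ** varmax    # one of 2^n values per variable
    return nodes * valuations


for n, varmax in [(1, 1), (3, 2), (4, 5)]:
    bound = node_valuation_combinations(n, varmax)
    # The product collapses to a single power of two, as stated in the text.
    assert bound == 2 ** (n * (varmax + 1))
    print(n, varmax, bound)
```

The bound is exponential in the input size, which is consistent with the exponential running time claimed for the general case.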
If there is a fixed bound ℓ on the arity of the underlying schema, then the maximum size of the minimal counter-example becomes polynomial. The test that f is derived globally can then be done non-deterministically in polynomial time and the test that it is not derived locally can be done universally in polynomial time, altogether yielding a Π p 2 -computation. The case of parallel-soundness is completely analogous (using the second statement of Claim 6.4), and the case of parallel-correctness follows since it suffices to test parallel-completeness and parallel-soundness.
Proof of Proposition 6.5 (continued). It remains to show that the described function is a reduction from Containment(BCQ ¬ , BCQ ¬ ) to all of Parallel-Complete(CQ ¬ , P rule ), Parallel-Sound(CQ ¬ , P rule ), and Parallel-Correct(CQ ¬ , P rule ).
To this end, we show first that containment Q 1 ⊆ Q 2 implies parallel-completeness and parallel-soundness of query Q under policy P , and that lack of containment, Q 1 ⊈ Q 2 , implies that query Q is neither parallel-complete nor parallel-sound under policy P .
In both directions, we will make use of the following easy observations. Claim 10.1. Let I be an arbitrary instance and i ∈ {1, 2}.
1. Let V be a valuation for Q, let a def = V (ℓ i ) and V i be the restriction of valuation V to variables in Q i . If V satisfies Q on I, then V i satisfies Q i on α −1 a (I).
2. Let V 1 , V 2 be valuations for queries Q 1 , Q 2 . Let a, b be data values such that V 1 and V 2 satisfy Q 1 and Q 2 , respectively, on α −1 a (I), and such that Type(b), Start 1 (a), Start 2 (a) ∈ I, and Stop(a) ∉ I. Then the valuation W b a , that agrees with V 1 and V 2 on all variables in Q 1 and Q 2 , respectively, and maps ℓ 1 , ℓ 2 → a, and t → b, satisfies Q on I.
Let us now assume Q 1 ⊆ Q 2 . To show that Q is parallel-complete under P , let V be a valuation that globally satisfies Q on some arbitrary instance I over U . Let  Start 1 (a), Start 2 (a)}. If W (t) = 1, then node κ a is responsible for these facts; if W (t) = 2, then node σ a is responsible for these facts; and otherwise node ρ is responsible for these facts. Hence, query Q is parallel-complete under policy P .
The proof that Q is parallel-sound under P is similar. To this end, let I be a global instance and V be a valuation that satisfies Q on the local instance I κ of some node κ ∈ N . Let a def = V (ℓ 1 ) and b def = V (t). By definition of P , we can infer Type(b), Start 1 (a), Start 2 (a) ∈ I and Stop(a) ∉ I from satisfaction of Q by V . Let V 1 be the satisfying valuation for Q 1 on instance α −1 a (I κ ), as given by Claim 10.1.1. Since Q 1 ⊆ Q 2 , there exists a valuation V 2 that satisfies Q 2 on instance α −1 a (I κ ). By definition of P , we have α −1 a (I κ ) = α −1 a (I), and therefore V 2 satisfies Q 2 on α −1 a (I). Thus, by Claim 10.1.2, valuation W def = W b a satisfies Q on I. Hence, query Q is parallel-sound under policy P .
⁸ We assume a binary alphabet here.
Let now Q 1 ⊈ Q 2 and let I 1 be an instance such that Q 1 (I 1 ) ≠ ∅ and Q 2 (I 1 ) = ∅. Thanks to Lemma 5.5, we can assume that I 1 is over a domain of size at most varmax (Q 1 ). Thanks to genericity, we can assume that the domain is a subset of U . Let V 1 be a valuation that satisfies query Q 1 on I 1 . Furthermore, let V 2 be some consistent valuation for Q 2 and I 2 def = V 2 (pos Q2 ), which exists thanks to our assumption that Q 2 is satisfiable.
To show that Q is not parallel-complete under P , we define I def = α 1 (I 1 ) ∪ α 2 (I 2 ) ∪ {Type(1), Start 1 (1), Start 2 (1), Start 2 (2)}. Then, valuation V which maps all variables in Q 1 and Q 2 as V 1 and V 2 , respectively, and ℓ 1 → 1, ℓ 2 → 2, and t → 1 satisfies query Q on instance I. However, there is no locally satisfying valuation for Q. For a contradiction, assume existence of such a valuation W . Since Type(1) and Start 1 (1) are the only Type- and Start 1 -facts contained in the global instance, we have W (t) = W (ℓ 1 ) = 1. By definition of P , this valuation can only be satisfying on node κ 1 . Since Start 2 (1) is the only Start 2 -fact in loc-inst P ,I (κ 1 ), Claim 10.1.1 implies existence of a satisfying valuation W 2 for Q 2 on α −1 1 (I) = I 1 , which contradicts the choice of instance I 1 . Hence, query Q is not parallel-complete under policy P .
To show that Q is also not parallel-sound under P , let instance I def = α 1 (I 1 ) ∪ α 2 (I 2 ) ∪ {Type(2), Start 1 (1), Start 2 (1), Start 2 (2), Stop(2)}. Then, valuation V which maps all variables in Q 1 and Q 2 as V 1 and V 2 , respectively, and ℓ 1 → 1, ℓ 2 → 2, and t → 2 satisfies query Q on the local instance loc-inst P ,I (σ 1 ) of node σ 1 because this node is not responsible for fact Stop(2). However, there is no globally satisfying valuation for Q. Towards a contradiction, assume existence of such a valuation W . Since Start 2 (1), Start 2 (2) are the only Start 2 -facts in the global instance, we have W (ℓ 2 ) = 1 or W (ℓ 2 ) = 2. The latter cannot hold because the valuation then prohibits the present fact Stop(2). The former implies, by Claim 10.1.1, a satisfying valuation W 2 for Q 2 on α −1 1 (I) = I 1 , which contradicts the choice of instance I 1 . Hence, query Q is not parallel-sound under policy P .
This completes the proof that problem Containment(BCQ ¬ , BCQ ¬ ) is reducible to all three problems Parallel-Complete(CQ ¬ , P rule ), Parallel-Sound(CQ ¬ , P rule ), and Parallel-Correct(CQ ¬ , P rule ) in polynomial time and thus shows coNEXPTIME-hardness via Corollary 5.3.

Proofs for Section 7: Full conjunctive queries
In the proofs below we drop the convention that in UCQs all disjuncts have pairwise disjoint variable sets (which was introduced to simplify the proofs in other sections). However, a UCQ that does not comply with this convention can easily be transformed into one that does, for example by adding distinct indices to variables in separate disjuncts.

Proof for Theorem 7.1
Theorem 7.1 follows from the hardness results in Proposition 11.1, and the upper bound in Proposition 11.3, which are given below.
We show that Q 1 ⊆ Q 2 if and only if h(pos Q2 ) ⊆ pos Q1 and h(neg Q2 ) ⊆ neg Q1 for the substitution h that identifies the head relation,⁹ h(head Q2 ) = head Q1 . It then immediately follows that Containment(FCQ ¬,≠ , FCQ ¬,≠ ) is in P. To show the claim, let Q 1 ⊆ Q 2 . Let I − be the minimal canonical database for Q 1 , i.e., I − consists of the frozen atoms in pos Q1 . By I + we denote the maximal canonical database for Q 1 , i.e., I + contains every frozen atom over vars(Q 1 ) (and over the relations that Q 1 and Q 2 are defined over) that is not in neg Q1 . By construction, head Q1 ∈ Q 1 (I − ) and head Q1 ∈ Q 1 (I + ), which implies by containment that head Q1 ∈ Q 2 (I − ) and head Q1 ∈ Q 2 (I + ). By fullness of Q 2 , both are derived by the same valuation V . Now, the former implies V (pos Q2 ) ⊆ I − = pos Q1 , and the latter implies V (neg Q2 ) ∩ I + = ∅. Thus, V (neg Q2 ) ⊆ neg Q1 . Hence, V describes the desired substitution.
For the other direction, suppose that substitution h has the desired properties. Let I be an arbitrary instance, and f ∈ Q 1 (I). Thus, there is a valu- . The latter implies V 2 (neg Q2 ) ∩ I = ∅, and thus f ∈ Q 2 (I).
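The syntactic characterisation above translates directly into a polynomial-time test. The following is a minimal sketch under several assumptions: queries are full, satisfiable and free of inequalities, and they are encoded as dictionaries with a head atom and sets of positive and negated atoms. The encoding and the function names are chosen for this illustration only.

```python
def head_substitution(q1, q2):
    """The substitution h with h(head_Q2) = head_Q1, or None if none exists."""
    (name1, args1), (name2, args2) = q1["head"], q2["head"]
    if name1 != name2 or len(args1) != len(args2):
        return None
    h = {}
    for var2, var1 in zip(args2, args1):
        if h.setdefault(var2, var1) != var1:
            return None                # one variable would need two images
    return h


def apply(h, atoms):
    """Apply h to every atom; fullness guarantees h covers all variables."""
    return {(rel, tuple(h[x] for x in args)) for (rel, args) in atoms}


def full_cq_neg_contained(q1, q2):
    """Test h(pos_Q2) <= pos_Q1 and h(neg_Q2) <= neg_Q1."""
    h = head_substitution(q1, q2)
    if h is None:
        return False
    return (apply(h, q2["pos"]) <= q1["pos"]
            and apply(h, q2["neg"]) <= q1["neg"])


# Q1: H(x, y) <- R(x, y), S(x), not T(y)     Q2: H(u, v) <- R(u, v), not T(v)
q1 = {"head": ("H", ("x", "y")),
      "pos": {("R", ("x", "y")), ("S", ("x",))}, "neg": {("T", ("y",))}}
q2 = {"head": ("H", ("u", "v")),
      "pos": {("R", ("u", "v"))}, "neg": {("T", ("v",))}}
print(full_cq_neg_contained(q1, q2), full_cq_neg_contained(q2, q1))
```

In this example Q 1 is contained in Q 2 because it only adds the requirement S(x), while the converse containment fails.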
2. The reduction is from the graph 3-colorability problem, which is well-known to be NP-complete, and asks for a given graph G whether there is a coloring of the nodes in G, using only 3 colors, such that adjacent nodes have distinct colors. Let G be an arbitrary input for the described problem.
We construct queries Q 1 ∈ FCQ ¬,≠ and Q 2 ∈ UFCQ ¬,≠ , and show Q 1 ⊆ Q 2 if and only if G is not 3-colorable. Intuitively, this is done by letting Q 1 derive tuples that each represent a complete labelling function for (a substitution of) graph G. Semantically, query Q 2 is very similar to Q 1 , but derives only invalid labelling functions, that is, those which either give two adjacent nodes the same color, or use more than three colors. The latter is implemented as a union of queries where each disjunct detects a particular issue.
Queries Q 1 and Q 2 are defined over database schema D def = {E (2) , L (2) }, where E represents the edges of G, and L is used to model mappings from nodes in G onto colors.
Before going to the construction itself, we first define edges as the set of atoms describing the edges of G. We do this by taking for every node in G a unique variable in var, and adding the atom E(x, y) to edges if and only if the nodes represented by x and y are adjacent in G. Second, we define the set labels, consisting of atoms L(x, y x ) for every node-representing variable x, where y x denotes a fresh variable. We call y x a color-representing variable.
Henceforth, we denote by x the node-representing variables, and by y the color-representing variables. Both are assumed to have a fixed order. Now, we define query Q 1 as: H(x, y) ← edges, labels.
Notice that, on a given instance I, Q 1 outputs every possible labelling function for substitutions of graph G representable with the facts in I. Query Q 2 is slightly more complex; we therefore split Q 2 into two semantically meaningful UFCQ ¬ s: Q ∀h , which detects whether a described labelling function assigns the same color to some pair of adjacent nodes; and Q >3 , which detects whether a described labelling function uses more than three colors.
We describe the sketched queries in more detail. To this end, let x 1 , x 2 ∈ x be two distinct variables, representing adjacent nodes in G. Based on x 1 and x 2 , we can define a substitution h mapping all the variables of x and y onto themselves, except for y x1 and y x2 , which are mapped onto a fresh variable y. We call h a collision-revealing substitution for x and y. Now Q ∀h is the query defined as the union of queries: for all collision-revealing substitutions h for x and y as described above. Query Q ∀h outputs exactly those labelling functions output by Q 1 that assign the same color to at least one pair of adjacent nodes.
Query Q >3 is defined as the union of queries: for every combination of distinct variables y 1 , y 2 , y 3 , y 4 in y.
As the construction of the individual FCQ ¬ s can be done in polynomial time in the size of G, and there are at most n^2 collision-revealing substitutions, where n is the number of nodes of G, the construction of Q 1 and Q 2 can be done in polynomial time in the size of G.
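The building blocks of this construction can be sketched for a concrete graph. The encoding below is an assumption for illustration: variables are strings x_n and y_n for node n, the graph is given as a list of nodes and a list of node pairs, and each listed pair yields one atom E(x, y) and one collision-revealing substitution. The disjuncts of Q ∀h and Q >3 themselves are not built here, since their bodies are not reproduced in this section.

```python
def build_q1(nodes, edge_list):
    """Return (head, body) of Q1: H(x, y) <- edges, labels."""
    x = {n: f"x_{n}" for n in nodes}      # node-representing variables
    y = {n: f"y_{n}" for n in nodes}      # color-representing variables
    edges = {("E", (x[u], x[v])) for (u, v) in edge_list}
    labels = {("L", (x[n], y[n])) for n in nodes}
    head = ("H", tuple(x[n] for n in nodes) + tuple(y[n] for n in nodes))
    return head, edges | labels


def collision_substitutions(nodes, edge_list):
    """One substitution per edge: identity, except y_u and y_v map to 'y'."""
    variables = [f"x_{n}" for n in nodes] + [f"y_{n}" for n in nodes]
    subs = []
    for (u, v) in edge_list:
        h = {var: var for var in variables}
        h[f"y_{u}"] = h[f"y_{v}"] = "y"   # 'y' plays the fresh variable
        subs.append(h)
    return subs


triangle_nodes = [1, 2, 3]
triangle_edges = [(1, 2), (2, 3), (1, 3)]
head, body = build_q1(triangle_nodes, triangle_edges)
subs = collision_substitutions(triangle_nodes, triangle_edges)
print(len(body), len(subs))
```

For the triangle the body of Q 1 has six atoms and there are three collision-revealing substitutions, one per edge, which stays within the quadratic bound mentioned above.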
Correctness. We show that Q 1 ⊈ Q 2 if and only if G is 3-colorable.
(If ) Suppose G is 3-colorable. So, there is a labelling function ℓ mapping the nodes in G onto three colors, such that no two adjacent nodes are assigned the same color. We abuse notation and assume nodes in G, as well as the colors in the image of ℓ, are over dom. Let I def = {E(n 1 , n 2 ) | (n 1 , n 2 ) is an edge in G} ∪ {L(n, ℓ(n)) | n is a node in G}. Now, let V 1 be the valuation mapping the node-representing variables in Q 1 onto the nodes of G that they represent, and the color-representing variables onto colors as defined by ℓ. Obviously, V 1 satisfies Q 1 on I and derives a fact f . By choice of ℓ, f ∉ Q ∀h (I), because no two adjacent nodes are labelled with the same color. Further, ℓ introduces only three colors, and thus f ∉ Q >3 (I), which implies f ∉ Q 2 (I). Hence, Q 1 ⊈ Q 2 .
(Only if ) Suppose Q 1 ⊈ Q 2 . Thus, there is an instance I and fact f , where f ∈ Q 1 (I), and f ∉ Q 2 (I). Let V 1 be the valuation that derives f for Q 1 on I.
The former implies that I contains an interpretation of the described graph. Notice that f does not necessarily describe G itself, but rather a substitution of G under which some nodes might have been collapsed. We consider the labelling function ℓ, defined as ℓ(n) def = V 1 (y x ), for every x ∈ x, where x is the variable representing node n in G.
By fullness of the considered queries, for each of the disjuncts of Q 2 , there is only one valuation that can possibly derive f , and f uniquely defines this valuation for the respective disjunct. Therefore, from f ∉ Q ∀h (I) we directly obtain that for every pair of adjacent nodes n 1 , n 2 in G: ℓ(n 1 ) ≠ ℓ(n 2 ). So, ℓ describes a valid labelling function for G. It remains to argue that ℓ does not use more than three colors. This follows from f ∉ Q >3 (I), and thus the substitution of G described by f must be 3-colorable. From this it immediately follows that G must be 3-colorable as well.
We next show that the problem remains coNP-hard, even for queries without inequalities.
Proof. To see this, let Q 1 ∈ FCQ ¬,≠ and Q 2 ∈ UFCQ ¬,≠ be queries over some schema D. Let y represent the variables used in Q 1 , in some fixed order. We construct a query Q ′ 2 ∈ UFCQ ¬ such that Q 1 ⊆ Q 2 if and only if Q 1 ⊆ Q ′ 2 . For this, we extend D to a schema D ′ that also defines relations Eq (2) and Neq (2) . Intuitively, Eq models the equality relation =, and Neq models the inequality relation ≠.
We assume Eq and Neq are not in D.
For the construction, we divide Q ′ 2 into two subqueries Q * 2 and Q Neq . Query Q * 2 results from Q 2 by replacing every inequality x ≠ y by atom Neq(x, y). The purpose of query Q Neq is to allow derivation of a fact derivable by Q 1 on every instance where Eq and Neq do not represent the equality relation = or the inequality relation ≠, respectively. To this end, we further divide Q Neq into the following queries: Q Neq equal , which detects values that are wrongly identified as being unequal (by Neq); Q Eq,Neq amb , which detects that some values occur in both Eq and Neq; Q Eq asym , which detects values for which Neq is not symmetric; Q Neq asym , which detects values for which Eq is not symmetric; Q Eq,Neq undef , which detects that certain values are neither in Neq nor in Eq; and finally, Q Neq , in which all occurrences of x ≠ y are replaced by atoms Neq(x, y).
More formally, Q Neq equal is defined as the union of the queries: for all collision-revealing substitutions h for x and y. Thus, Q Neq equal outputs on a given instance exactly those facts output by Q 1 in which some values are wrongly identified as being unequal.
Correctness. It remains to show Q 1 ⊆ Q 2 if and only if Q 1 ⊆ Q ′ 2 . (If ) Suppose Q 1 ⊈ Q 2 . Thus, for some instance I and fact f , f ∈ Q 1 (I), while f ∉ Q 2 (I). We consider instance I ′ over D ′ , which consists of all facts in I, and for every value a ∈ adom(I) a fact Eq(a, a), and for every pair of distinct values a, b ∈ adom(I) a fact Neq(a, b).
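The extension of I to I ′ described above is easy to state operationally. The sketch assumes the same fact encoding as before (relation name plus tuple of values); the function name is hypothetical.

```python
def extend_with_eq_neq(instance):
    """Add Eq(a, a) for every value a and Neq(a, b) for all distinct a, b."""
    values = {v for _, args in instance for v in args}   # adom(I)
    eq = {("Eq", (a, a)) for a in values}
    neq = {("Neq", (a, b)) for a in values for b in values if a != b}
    return instance | eq | neq


I = {("R", (1, 2))}
I_prime = extend_with_eq_neq(I)
print(sorted(I_prime))
```

By construction Eq and Neq are symmetric, disjoint, and cover every pair of values of adom(I), which are the properties used in the remainder of the argument.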
Because Q 1 does not reference relations Eq or Neq, it follows that f ∈ Q 1 (I ′ ). As relations Eq and Neq express exactly = and ≠ over the values in adom(I), we obtain Q 2 (I) = Q * 2 (I ′ ). Consequently, f ∉ Q 2 (I) implies f ∉ Q * 2 (I ′ ). Further, as Eq and Neq are symmetric, consistent, and completely defined over adom(I ′ ),
Thus, for some instance I and fact f , f ∈ Q 1 (I), and f ∉ Q ′ 2 (I). Let V be the valuation deriving f for Q 1 on I and D def = adom(V (body Q1 )). From f ∉ Q ′ 2 (I) it now follows that Eq and Neq are well-defined over D, that is, the relations Eq and Neq are identical to = and ≠ over values in D. Indeed, for all a, b ∈ D:
• there are no facts Neq(a, a) (because f ∉ Q Neq equal (I));
• Eq(a, b) in I implies Eq(b, a) (because f ∉ Q Neq asym (I));
• Neq(a, b) in I implies Neq(b, a) (because f ∉ Q Eq asym (I)); and
• there is a fact Eq(a, b) or Neq(a, b) in I (from f ∉ Q Eq,Neq undef (I)).
As a result, we can simply replace every occurrence of Neq(x, y) in query Q * 2 by x ≠ y without changing its semantics over D. More formally, we obtain Q * 2 (I |D ) = Q 2 (I |D ). By fullness of Q 2 and the definition of D, f ∈ Q 2 (I) would imply f ∈ Q 2 (I |D ) and thus f ∈ Q ′ 2 (I). Consequently, it must be that f ∉ Q 2 (I).
Proof. We observe that when Q ⊈ Q ′ , there is an instance I and fact f , such that f ∈ Q(I), and f ∉ Q ′ (I). In particular, by fullness of Q and Q ′ , there is such an instance of size at most ℓ def = max i∈{1,...,n} {|pos Qi |} + m, where n denotes the number of disjuncts of Q, and m the number of disjuncts in Q ′ .
Indeed, to see this, let V and Q i be the valuation and disjunct of Q by which f is derived on I. Let J def = V (pos Qi ). By fullness it follows that for each disjunct Q j of Q ′ there is at most one valuation V j eligible to derive f for Q j . As, by choice of f , V j does not satisfy Q j on I, it must be that either V j (pos Qj ) ⊈ I, or V j (neg Qj ) ∩ I ≠ ∅. We ignore the former. In the latter case we choose one fact from V j (neg Qj ) ∩ I and add it to J. One can now easily verify that J is as desired.
As the above shows that there always is a witnessing instance I of size at most ℓ, given I, V , and Q i as a polynomial-size certificate, we can easily verify that Q is indeed not contained in Q ′ , by simply verifying that V indeed satisfies disjunct Q i of Q on I, and that for all m eligible valuations for disjuncts of Q ′ , either not all required facts are in I, or at least one of the prohibited facts is present.
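The verification of such a certificate can be sketched as follows. The sketch assumes full queries without inequalities, encoded as dictionaries with a head atom and sets of positive and negated atoms, and it represents the certificate by the instance I and the fact f (by fullness, f determines the only eligible valuation of each disjunct). All names are hypothetical.

```python
def valuation_from_fact(head, fact):
    """The unique valuation of a full query that can derive the fact, if any."""
    (name, variables), (fact_name, values) = head, fact
    if name != fact_name or len(variables) != len(values):
        return None
    V = {}
    for var, val in zip(variables, values):
        if V.setdefault(var, val) != val:
            return None                # repeated head variable, different values
    return V


def derives(query, fact, instance):
    """True iff the eligible valuation satisfies the query on the instance."""
    V = valuation_from_fact(query["head"], fact)
    if V is None:
        return False
    ground = lambda atoms: {(r, tuple(V[x] for x in args)) for (r, args) in atoms}
    required, prohibited = ground(query["pos"]), ground(query["neg"])
    return required <= instance and not (prohibited & instance)


def is_counterexample(union_q, union_q_prime, instance, fact):
    """Check f in Q(I) and f not in Q'(I) for unions of full queries."""
    return (any(derives(d, fact, instance) for d in union_q)
            and not any(derives(d, fact, instance) for d in union_q_prime))


# Q: H(x) <- R(x)        Q': H(x) <- R(x), not S(x)
Q = [{"head": ("H", ("x",)), "pos": {("R", ("x",))}, "neg": set()}]
Q_prime = [{"head": ("H", ("x",)), "pos": {("R", ("x",))}, "neg": {("S", ("x",))}}]
f = ("H", (1,))
print(is_counterexample(Q, Q_prime, {("R", (1,)), ("S", (1,))}, f))
print(is_counterexample(Q, Q_prime, {("R", (1,))}, f))
```

Each disjunct is inspected once with a single valuation, so the check runs in time polynomial in the size of the queries and the certificate.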

Proof of Proposition 7.2
The result follows from Proposition 11.1 and the reductions below.
Proof. The idea underlying the following reductions is simple: extend each of the CQ¬s Q and Q′ of the containment problem by either a nullary atom Global() or its negation ¬Global(), and combine them (by union) into a single query Q*. By mapping the fact Global() onto an isolated node, the distribution policy then makes it possible to control global and local derivability on behalf of Q and Q′. Without loss of generality, we always assume that queries Q and Q′ do not use the auxiliary relation Global.
For a CQ¬ Q, let Q_Global and Q_¬Global denote the queries obtained by adding the literal Global() or ¬Global() to Q, respectively. For a union of CQ¬s, this means adding Global() or ¬Global() to every disjunct of Q. The following identities can be easily proven to hold for every query Q and every instance I:
(1d)
We only argue the reductions for policy class P_rule. It is obvious that such a policy can also be represented by a policy from any class P^k_npoly ∈ P_npoly.
1. We start with Containment(FCQ¬, FCQ¬) ≤_p Parallel-Sound(UFCQ¬, P_rule).
Let Q and Q′ be CQ¬s. We define a UCQ¬ Q* and a policy P as follows. Let Q* := Q_¬Global ∪ Q′_Global. We construct a subset D of dom with |D| = |vars(Q)|. Since the actual data values do not matter, we can choose those with the shortest representation length and thus also represent the set D polynomially in the size of query Q. Now, P is defined as a distribution policy over the network N = {κ_1, κ_2} that forwards every fact over D except Global() to node κ_1, and Global() to node κ_2. As the described distribution policy can be straightforwardly expressed with a distribution policy in P_fin, the construction of both Q* and P can be done in polynomial time.
Correctness. It remains to show that Q ⊆ Q′ if and only if Q* is parallel-sound under P. The following observations are crucial for the correctness argument.
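First, for each instance I that contains Global(), Q* is equivalent to Q′, that is, Q*(I) = Q′(I). (2)
Second, for each instance I that does not contain Global(), Q* is equivalent to Q, that is, Q*(I) = Q(I). (3)
Further, as κ_2 can only contain the fact Global(), it follows that for each instance I we have Q(loc-inst_{P,I}(κ_2)) = ∅.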
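The behaviour of Q* with and without the fact Global() follows by unfolding the definition Q* = Q_¬Global ∪ Q′_Global and using that neither Q nor Q′ mentions the relation Global. Spelled out as a short derivation (a sketch from the definitions above, not a quotation of the paper's numbered equations):

```latex
\begin{align*}
Q^*(I) &= Q_{\neg\mathit{Global}}(I) \cup Q'_{\mathit{Global}}(I) \\
       &= \begin{cases}
            \emptyset \cup Q'(I) = Q'(I) & \text{if } \mathit{Global}() \in I,\\
            Q(I) \cup \emptyset = Q(I)   & \text{if } \mathit{Global}() \notin I.
          \end{cases}
\end{align*}
```

In the first case the literal ¬Global() blocks every valuation for Q_¬Global, while the atom Global() is satisfied for Q′_Global. In the second case the roles are exchanged.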
(Only if) Assume Q ⊆ Q′. Let I be an arbitrary subset of facts(P). If Global() ∉ I, the local instance of node κ_1 is identical to the global instance, that is, loc-inst_{P,I}(κ_1) = I. By definition of P, the result sets then coincide as well: Q*(loc-inst_{P,I}(κ_1)) = Q*(I). In particular, this implies Q*(loc-inst_{P,I}(κ_1)) ⊆ Q*(I), that is, parallel-soundness of Q* under P on instance I. If Global() ∈ I, then loc-inst_{P,I}(κ_1) = I \ {Global()}, and since neither Q nor Q′ refers to relation Global, the assumption Q ⊆ Q′ yields Q*(loc-inst_{P,I}(κ_1)) = Q(I) ⊆ Q′(I) = Q*(I).
(If) For a proof by contraposition, assume Q ⊈ Q′. This implies the existence of an instance I where Q(I) ⊈ Q′(I). Without loss of generality we can assume adom(I) ⊆ D. The latter is a safe assumption, as from Lemma 5.5 it follows that an instance J ⊆ I exists that preserves the desired property and where |adom(J)| ≤ |D|. Further, by genericity of Q and Q′, we can injectively rename the data values in J to data values in D, resulting in an instance with the desired properties.
Additionally, we may safely assume that Global() ∈ I because neither Q nor Q′ refers to relation Global. This results in the local instance loc-inst_{P,I}(κ_1) = I \ {Global()}. Again, by Equations (2) and (3), we conclude Q*(loc-inst_{P,I}(κ_1)) = Q(I) ⊈ Q′(I) = Q*(I), that is, Q* is not parallel-sound under P on instance I. Therefore, by contraposition, parallel-soundness of query Q* under policy P and domain D implies containment Q ⊆ Q′.
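The construction and the (If) direction can be replayed on a toy pair of queries. The following is a minimal sketch under simplifying assumptions: atoms and facts are pairs of a relation name and an argument tuple, queries are evaluated naively over the active domain, and the restriction of the policy to facts over D is ignored. The names `CQNeg`, `q_star`, `evaluate`, `local_instance`, and `parallel_sound_on`, and the toy queries, are hypothetical and only serve the illustration.

```python
from itertools import product
from typing import NamedTuple

class CQNeg(NamedTuple):
    head: tuple  # head variables
    pos: tuple   # positive atoms, each (relation, tuple of variables)
    neg: tuple   # negated atoms, same shape

GLOBAL_ATOM = ("Global", ())  # nullary, so atom and fact coincide

def q_star(q, qp):
    """Q* = Q_notGlobal union Q'_Global."""
    return (q._replace(neg=q.neg + (GLOBAL_ATOM,)),
            qp._replace(pos=qp.pos + (GLOBAL_ATOM,)))

def evaluate(union, inst):
    """Naive evaluation: try every valuation over the active domain."""
    adom = sorted({c for _, args in inst for c in args})
    out = set()
    for q in union:
        variables = sorted(set(q.head) | {x for _, a in q.pos + q.neg for x in a})
        for values in product(adom, repeat=len(variables)):
            v = dict(zip(variables, values))
            pos = {(r, tuple(v[x] for x in a)) for r, a in q.pos}
            neg = {(r, tuple(v[x] for x in a)) for r, a in q.neg}
            if pos <= inst and not (neg & inst):
                out.add(tuple(v[x] for x in q.head))
    return out

def local_instance(inst, node):
    """Policy P: Global() goes to kappa2 only, every other fact to kappa1 only."""
    return {f for f in inst if (f == GLOBAL_ATOM) == (node == "kappa2")}

def parallel_sound_on(union, inst):
    """Every local result must be contained in the central result."""
    central = evaluate(union, inst)
    return all(evaluate(union, local_instance(inst, n)) <= central
               for n in ("kappa1", "kappa2"))

# Toy queries (hypothetical): Q = H(x) <- R(x), not S(x); Q' = H(x) <- R(x), not T(x)
Q = CQNeg(("x",), (("R", ("x",)),), (("S", ("x",)),))
QP = CQNeg(("x",), (("R", ("x",)),), (("T", ("x",)),))
Q_STAR = q_star(Q, QP)

# Q is not contained in Q': on {R(a), T(a)} only Q derives H(a).
I = {("R", ("a",)), ("T", ("a",)), GLOBAL_ATOM}
central = evaluate(Q_STAR, I)                             # behaves like Q'
local_k1 = evaluate(Q_STAR, local_instance(I, "kappa1"))  # behaves like Q
local_k2 = evaluate(Q_STAR, local_instance(I, "kappa2"))  # only Global() there
```

On this instance the central result is empty while node κ_1 derives H(a), so Q* is not parallel-sound under P, matching Q ⊈ Q′. Replacing Q′ by Q itself, so that containment holds trivially, makes the same check succeed.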