SQL's Three-Valued Logic and Certain Answers

SQL uses three-valued logic for evaluating queries on databases with nulls. The standard theoretical approach to evaluating queries on incomplete databases is to compute certain answers. While these two cannot coincide, due to a signiﬁcant complexity mismatch, we can still ask whether the two schemes are related in any way. For instance, does SQL always produce answers we can be certain about? This is not so: SQL’s and certain answers semantics could be totally unrelated. We show, however, that a slight modiﬁcation of the three-valued semantics for relational calculus queries can provide the required certainty guarantees. The key point of the new scheme is to fully utilize the three-valued semantics, and classify answers not into certain or non-certain, as was done before, but rather into certainly true, certainly false, or unknown. This yields relatively small changes to the evaluation procedure, which we consider at the level of both declarative (relational calculus) and procedural (relational algebra) queries. We also introduce a new notion of certain answers with nulls, which properly accounts for queries returning tuples containing null values.


Introduction
SQL's query evaluation engine uses three-valued logic when it comes to handling incomplete information: comparisons involving null values have the truth value unknown [7].This results in a number of well known paradoxes.Consider, for instance, two relations R and S with a single numerical attribute A, and assume that S contains a single row with a null value in it.Then select S.A from S where S.A <= 0 or S.A > 0 returns nothing despite the condition in the where clause being a tautology.This is because both null <= 0 and null > 0 evaluate to unknown and so does their disjunction.Worse yet, for the same reason, the query computing R − S: select R.A from R where R.A not in (select S.A from S) returns nothing if S contains a single null, no matter what R is, telling us that might well have |R| > |S| and R − S = ∅ at the same time.
However unintuitive these answers are (which led to very severe criticism of the design of null-related features of SQL [6,7]) they at least seem not to give us any false positives.To understand what it means, we appeal to the standard theoretical notion of query answering in the presence of incompleteness, certain answers [1,12].Each incomplete database D has an associated semantics [[D]].We can think of [[D]] as the set of possible complete databases that D can represent, i.e., all databases obtained by substituting values for nulls.Then certain answers contain tuples that will be in the answer to Q over all possible complete databases represented by D: How does SQL evaluation of queries relate to certain answers?There is a simple argument that they cannot coincide for relational calculus queries: SQL's evaluation is tractable (very tractable, in fact, of AC 0 data complexity), but data complexity of certain answers is intractable: at least coNP-complete for commonly considered semantics [2].Examples (1) and ( 2) seem to suggest that we at least get a subset of certain answers, but this is not the case: false positives are possible.Consider the query: select R.A from R where R.A not in (select R1.A from R R1 where R1.A not in (select * from S)) expressing R − (R − S) and a database R = {1} and S = {⊥}.SQL's evaluation results in {1}.At the same time the certain answer is empty: if ⊥ is interpreted as any value other than 1, the query produces ∅.
Can we remedy this?Clearly we cannot modify SQL's evaluation rules to generate certain answers due to the complexity mismatch.So the best we can hope for is a reasonable approximation without false positives.The idea itself is not new: in fact for the first time it was expressed in [22], even before complexity bounds for certain answers were known.Despite this, we do not yet have such approximation schemes for SQL query evaluation.Providing them is our goal here.Specifically, we want to achieve the following: find query answers fast, without a significant modification of the existing evaluation techniques, and at the same time guarantee that no false positives occur, i.e., every returned tuple is a certain answer.
We achieve this by providing a small modification to the three-valued logic approach of SQL that restores correctness guarantees: query evaluation no longer produces false positives, and all returned results are guaranteed to be certain answers.
To understand the idea of the modification, notice that SQL's query evaluation actually mixes three-and two-valued logic.Three-valued logic is used to evaluate conditions, but then query results return only those tuples for which conditions evaluate to true, effectively collapsing unknown and false.This works fine for positive queries, but once negation, especially negation in subqueries (e.g., not in or not exists) enters the picture, we have a problem, as it flips truth values.Now true flips to false, but both unknown and false (which were collapsed to one value when a subquery was evaluated) flip to true!This is how unintended tuples end up in the answer.
So to get correctness guarantees, we just need to be faithful to the three-valued approach.This means that there will be three possible outcomes for each candidate answer tuple: it can be either certainly in the answer (truth value true); or certainly not in the answer (truth value false); or possibly in the answer, or possibly not (truth value unknown).
The second modification that we need is using marked, or naïve nulls [1,12] in tables.Such nulls can appear multiple times in tables, and they are often required by applications such as data integration and exchange [3,13].In fact they have already been implemented in I C D T 2 0 1 5 connection with such applications [11,19].Generally, SQL's nulls can be modeled with naïve nulls, simply by forbidding repetition.The reason we need marked nulls is twofold.Firstly, we want to produce more general results.Secondly, we need to overcome an additional (and quite unreasonable) deficiency of SQL's handling of nulls: even comparing whether a null value equals itself produces truth value unknown.Indeed, consider a table T(A,B) with a single tuple (1,null) and a query select T1.A from T T1, T T2 where T1.A=T2.A and T1.B=T2.B, i.e., π A (T ∩ T ).Instead of the expected 1, it gives the empty result, as comparing a value with itself does not evaluate to true.
We remark that using the multi-valued approach has proved very useful in two closely related areas: model-checking [4,10], and knowledge representation [14,18].In fact the procedure of [14] that uses three-valued reasoning with knowledge bases is similar in spirit with the modification of SQL query evaluation that we propose (although the technical details of our procedure are quite different from [14]), and its modifications to achieve tractable reasoning [18] relied on database query evaluation techniques.In the database field the three-valued approach has, by and large, belonged to the practice rather than the theory.

Organization.
In Section 2 we present basic definitions.Section 3 describes the evaluation procedure for relational calculus and SQL's three-valued approach in the presence of nulls.Section 4 presents the modified evaluation procedure and states its correctness.In Section 5 we prove a generalization of that result, relying on a new notion of certain answers with nulls.This generalization properly accounts for all three possible outcomes of query evaluation (certainly true, certainly false, unknown).In Section 6 we look at certainty guarantees for relational algebra queries.Concluding remarks are in Section 7. Due to space limitations, only proof sketches are presented here; complete proofs are available in the full version.

Preliminaries
Incomplete databases.We begin with some standard definitions [1,12].Incomplete databases are populated by constants and nulls.The sets of constants and nulls are countably infinite sets denoted by Const and Null respectively.Nulls are denoted by ⊥, sometimes with sub-or superscripts.
A relational schema (vocabulary) is a set of relation names with associated arities.An incomplete relational instance D assigns to each k-ary relation symbol S from the vocabulary a k-ary relation S D over Const ∪ Null, i.e., a finite subset of (Const ∪ Null) k .When the instance is clear from the context we shall write S, rather than S D , for the relation itself as well.
The sets of constants and nulls that occur By h(D) we denote the image of D, i.e., the set of all tuples S(h(ū)) where S(ū) is in D.
is a constant for every x ∈ adom(D); in other words, it provides a valuation of nulls as constant values.If h is a valuation, then h(D) is complete.We now define the semantics of incomplete databases by means of valuations: This is often referred to as the closed-world assumption, or cwa semantics of incompleteness [12,21].Another common semantics uses the open-world assumption, or owa, and allows adding complete tuples to h(D).In the study of incompleteness, the closed-world semantics is a bit more common [1,2,12] since it is better behaved.We shall offer some comments on the owa semantics in Section 5.2.

Query languages.
As our basic query languages we consider relational calculus and its fragments.Relational calculus has exactly the power of first-order logic, or FO.Its formulae are built from relational atoms R(x), equality atoms x = y, by closing them under conjunction ∧, disjunction ∨, negation ¬, existential ∃ and universal ∀ quantifiers.If x is the list of free variables of a formula ϕ, we write ϕ(x) to indicate this.We write |x| for the length of x.
Conjunctive queries (CQs, also known as select-project-join queries) are defined as queries expressed in the ∃, ∧-fragment of FO.The class UCQ of unions of conjunctive queries is the class of formulae of the form ϕ 1 ∨ . . .∨ ϕ m , where each ϕ i is a conjunctive query.In terms of its expressive power, this is the existential-positive fragment of FO, i.e., the ∃, ∨, ∧-fragment.
We shall use relational algebra, the procedural language equivalent to FO, that has operations of selection σ, projection π, cartesian product ×, union ∪, and difference −.We use the unnamed perspective of relational algebra which does not require the renaming operator [1] (more on this in Section 6, where we shall add explicit intersection to relational algebra).The fragment without the difference operator is referred to as positive relational algebra; it has the same expressiveness as existential positive formulae (and thus unions of conjunctive queries).

Evaluation procedures for FO queries
We shall look at different query evaluation procedures.Each such procedure Eval will take a query (an FO formula) ϕ(x), a database D, and an assignment ν of values to the free variables x.The output Eval(ϕ, D, ν) is a truth value.For the standard Boolean logic, the domain of truth values is {0, 1}, with 0 meaning false and 1 meaning true.For three-valued logic, the domain is {0, 1 2 , 1}, with 1 2 interpreted as unknown.An assignment ν maps each free variable to an element of adom(D).Note that such an element could be a constant or a null; assignments thus are not valuations.We write ν[a/x] for the assignment that changes ν by mapping x to a. Also, given a tuple x = (x 1 , . . ., x n ) of free variables, and a tuple ā = (a 1 , . . ., a n ), we write simply Eval(ϕ, D, ā) if the assignment ν is such that ν(x i ) = a i for all i ≤ n.
Given an evaluation procedure Eval, the outcome of query evaluation for ϕ(x) For all of the evaluation procedures that we use (except two in Subsection 5.2), the I C D T 2 0 1 5 evaluation of the Boolean connectives and quantifiers is completely standard: Thus, from now we only explain the valuation of atomic formulae R(x) and equalities x = y.The classical FO evaluation gives us the procedure Eval FO with the range {0, 1} defined by ( 5) and: SQL's evaluation has {0, 1 2 , 1} as the range of values.Again it uses rules (5), and the rule for Eval SQL (R(x), D, ν) is exactly the same as for Eval FO , but for equality atoms the rule differs: Indeed, SQL's approach is to declare every comparison as unknown if a null is involved.Note that over complete databases, Eval FO and Eval SQL coincide.Also, over incomplete databases, Eval FO is usually referred to as naïve evaluation [1,12].
How do these relate to certain answers?We now examine FO and SQL evaluation.But first note that the definition (3) ensures that only tuples of constants are present in certain answers.There is no such restriction on the standard evaluation procedures.So to do a fair comparison we only compare sets of constant tuples returned by evaluation procedures (this will be relaxed later in the paper).Definition 1.Given a class Q of queries, an evaluation procedure Eval has certainty guarantees for Q if for every query ϕ(x) ∈ Q, every database D, and every tuple ā of constants with |ā| = |x|, we have In other words, Certain answers and Eval FO .The first observation is immediate: The converse in general is not true, we can have However, sometimes certainty guarantees can be established.It has long been known [12] that we get them by excluding universal quantification and negation from first-order logic: Eval FO has certainty guarantees for the class UCQ.This was recently extended in [8] which showed that the same is true for queries from a rather significant expansion of the class UCQ, by adding universal quantification and a limited form of implication.More precisely, we look at the class Q cert FO defined as follows: atomic formulae R(x) and x = y are in , where R is a relation symbol in the schema, and x does not have a repetition of variables.
Then Eval FO has certainty guarantees for Q cert FO queries [8].From the point of view of relational algebra, the class Q cert FO corresponds to operations σ, π, ∪, × and the division operation Q ÷ Q , where Q is written in the π, ∪, ×-fragment of relational algebra, see [16].
Certain answers and Eval SQL .How does SQL change things?Actually, it changes them for the worse: now there is no connection between Eval SQL (ϕ, D) and certain(ϕ, D) whatsoever.Indeed, we saw that for the query ϕ(x) = R(x) ∧ ¬(R(x) ∧ ¬S(x)) and database D with R D = {1} and S D = {⊥}, the certain answer is empty while Eval SQL (ϕ, D) = {1}, and for In a restricted case we provide correctness guarantees: Proposition 2. Eval SQL has certainty guarantees for unions of conjunctive queries.

Evaluation procedures with certainty guarantees
We now introduce an evaluation procedure that comes with certainty guarantees for all relational calculus queries.For that, we have to explain what is wrong with FO and SQL evaluation procedures shown above, particularly for evaluation of atomic formulae.

Atomic relational formulae R(x).
For both SQL and FO, one simply checks, for a given assignment ν, whether ν(x) belongs to R. However, returning 0 if ν(x) ∈ R is too strong if we view 0 as saying that the tuple certainly cannot belong to R.
} and let ν be the identity (recall that the range of ν is the whole active domain).Consider a tuple . Thus, the correct value for evaluating the membership of x in R seems to be 1 2 , not 0. Value 0 should be reserved for cases when no valuation h makes h(x) ∈ h(R) possible.
The Eval FO and Eval SQL procedures return 0 too eagerly, and this becomes a problem when negation is applied to a formula, as 0 becomes a 1, and suddenly we have a false positive answer that in fact is not certain at all.If the value is kept at 1  2 , applying negation still results in 1 − 1 2 = 1 2 , and thus no false 'certain answers' appear.
Equality formulae x = y FO evaluation results in 0 if ν(x) and ν(y) are different nulls, but they could still be mapped to the same constant, so the right value should be 1 2 , not 0. On the other hand, SQL evaluation produces 1  2 if one of ν(x) or ν(y) is a null.But if we know ν(x) = ν(y), then for every valuation h we will have h(ν(x)) = h(ν(y)), so the evaluation procedure must return 1 and not 1  2 in this case, or else it will miss some certain answers.Now with this in mind, we introduce a proper 3-valued evaluation procedure Eval 3v .For this, we need one additional concept.Given two tuples t1 and t2 of the same length over Const ∪ Null, we say that they unify if there is a homomorphism h such that h( t1 ) = h( t2 ).We then write t1 ⇑ t2 .
It is easy to see that we can define t1 ⇑ t2 by asking for a valuation h so that h( t1 ) = h( t2 ).By classical results on unification, it is known that t1 ⇑ t2 can be tested in linear time [20].
These modifications turn out to be sufficient to ensure certainty guarantees for all relational calculus queries.

Certain answers with nulls
While the definition of certain answers (3) has been with us for 30+ years [17], recently it has been questioned [15,16].One of the problems with this definition is that it only returns tuples containing constants.For instance, if a query is given by an FO formula with k free variables, then We now summarize properties of certain answers with nulls.The usual certain answers can be obtained from certain answers with nulls by dropping tuples containing nulls, and certain answers with nulls are always contained in the result of the simple FO evaluation of formulae.Sometimes, but not always, they may coincide with the result of such an evaluation.
Formally, we have the following.We can now state a more general description of the evaluation procedure Eval 3v : the output value 1 guarantees that a tuple belongs to certain answers with nulls for query ϕ, the output value 0 guarantees that it belongs to certain answers with nulls for the negation ¬ϕ, and output value 1  2 comes with no guarantees.Theorem 6.For every FO query ϕ(x) and every database D, Theorem 3 is now an immediate corollary: if ā is a tuple of constants and Eval 3v (ϕ, D, ā) = 1, then by Theorem 6, ā ∈ certain ⊥ (ϕ, D), and by Proposition 5, ā ∈ certain(ϕ, D).

Proof sketch. We first show an auxiliary result that ū ∈ certain ⊥ (ϕ, D) if and only if Eval FO (ϕ, h(D), h(ū)) = 1 for every homomorphism h (rather than every valuation h).
Then the theorem is a consequence of the following: This is shown by induction on ϕ; we provide the proof for the case of atomic formulae (for which Eval 3v differs from Eval FO ) here.
Another corollary says that we can use Eval 3v to find overapproximations of certain answers: Corollary 7.For every FO query ϕ(x) we have As for the complexity of the procedure, one can easily show the following.Proposition 8.For each relational vocabulary σ and α ∈ {0, 1  2 , 1}, from every FO query ϕ(x) one can compute FO queries ϕ α (x) in the vocabulary that extends σ with a unary predicate const(•) interpreted as the set of constants, such that, for every database D,

Consequently, data complexity of computing Eval
This gives us a complexity argument showing that there are cases when Eval 3v fails to produce all certain answers.A concrete example of strict containment of Eval 3v in certain ⊥ will be shown below in Section 5.1.

CQs and UCQs with inequalities
A common extension of conjunctive queries and their unions is by adding inequalities [1].This is a very mild form of negation; essentially, we only allow negation to be applied to equality atoms.Instead of writing them as ¬(x = y), it is common to use x = y in formulae, and refer to them as inequality atoms.Then the ∃, ∧-closure of relational, equality and inequality atoms is referred to as CQs with inequalities, and the ∃, ∧, ∨-closure as UCQs with inequalities.This class of queries is denoted by UCQ = .
We now present a particularly easy evaluation procedure that correctly accounts for Eval 3v producing value 1 for UCQs with inequalities, and thus gives us correctness guarantees for those queries.This procedure uses two-valued, rather than three-valued, logic and only one rule that separates it from Eval FO .To understand it, note for an inequality atom x = y, FO evaluation returns true if x and y are assigned different values -even if they are different nulls.But actually the evaluation of conditions such as ⊥ 1 = ⊥ 2 must be false, since ⊥ 1 and ⊥ 2 can be mapped, by a valuation, to the same element.For UCQ = , there is no risk with assigning false rather than unknown, since negation will never be applied further on.This lets us define the evaluation procedure for UCQ = by adding the following explicit rule for = formulae to the Eval FO rules: This evaluation is particularly easy to implement in SQL with the usual is not null conditions in the where clause.And it has the desired correctness guarantees.Theorem 9.For every UCQ = query ϕ, we have In particular, Eval UCQ = has certainty guarantees for UCQ = queries.
In fact there could be no polynomial-time evaluation procedure for finding certain answers for UCQ = queries since they have coNP-complete data complexity, even without free variables.Indeed, suppose we have a graph G = V, E where the set of vertices is {a 1 , . . ., a n }.Create a binary relation D G with adom(D G ) = {⊥ 1 , . . ., ⊥ n } and pairs (⊥ i , ⊥ j ) for every edge (a i , a j ) ∈ E. Let ϕ ∈ UCQ = be given by ∃x D G (x, x) ∨ ∃x, y, z, u

Open world semantics
Another commonly used semantics of incompleteness is based on the open-world assumption, or owa [1,12,21].Under this assumption, after applying a valuation h to a database, finitely many complete tuples can be added to it.That is, Certain answers under owa are defined as The evaluation procedure Eval 3v no longer has certainty guarantees under owa.To see this, consider D with relations y) ∧ ¬S(x, y).Since the tuple (1, 2) does not unify with either tuple in S D , we have (1, 2) ∈ Eval 3v (ϕ, D).However, under owa, it is not a certain answer: for instance, the database Thus, our question is whether the approach of Eval 3v , guaranteeing correctness for all FO queries under cwa, can be extended to owa.Of course there is always a trivial positive answer: the evaluation procedure that always returns 0 vacuously has correctness guarantees.

Since [[D]] cwa ⊆ [[D]
] owa , certain answers under owa will be included in certain answers under cwa, so the question really is how much we eliminate from the latter so that the result is still meaningful, and provides certainty guarantees under owa.Note also that finding certain answers under owa is undecidable [2] (even for data complexity [9]) which ties our hands even more in terms of finding suitable approximations.
To understand the changes that need to be made under owa, consider again relational atoms.For them, there is no way to assert with certainty that a tuple does not belong to a relation, since each relation can be expanded under owa.Hence, the case when evaluation produces 0 must go.
Next, look at existential formulae.Again we cannot state with certainty that the result of evaluation of those is 0, as perhaps in some extension of the database there is a witness for the existential formula, so the lowest value for evaluating such a formula is 1  2 , not 0. Likewise, for universal formulae, one cannot state with certainty that the result of evaluation is 1, as it requires checking the universal conditions in all extensions of the database, which is an undecidable problem.Hence, the highest value in this case is 1  2 and not 1.This explains the three changes that we make for the evaluation procedure.The procedure Eval owa 3v has the range {0, 1 2 , 1} and differs from Eval 3v in three rules: Note that this procedure is the only one that modifies rules (5).These modifications are sufficient for correctness under owa.
Proposition 10.The evaluation algorithm Eval owa 3v has correctness guarantees under owa.
A remark on equivalence of queries under Eval 3v .Under the usual FO semantics, called Eval FO here, we are used to a number of equivalences that are not necessarily true when Eval 3v is used instead.Consider, for instance, a formula ϕ(x) = ∃y R(x, y) ∧ (y = 1 ∨ y = 1) .Of course we expect it to be equivalent to ϕ (x) = ∃yR(x, y).However, under the three-valued semantics these are not equivalent: if 2 .This point will be important for us in the next section, where we present an evaluation procedure of relational algebra for databases with nulls.

6
Evaluation procedure for relational algebra Queries that get executed in a DBMS are procedural queries, in particular, in the relational case, they are written in relational algebra, or some of its extensions.We now present an algorithm that provides an evaluation with correctness guarantees for relational algebra expressions.Even though from the point of view of expressiveness, relational algebra is equivalent to FO, the equivalence itself, established under the standard two-valued semantics, is not yet a guarantee that it will provide us with a desired evaluation procedure in the three-valued world.
To expand on this, note that by Proposition 8, for every FO query ϕ(x) we have a relational algebra expression e ϕ which has access to the extra predicate const(•) so that e ϕ faithfully implements Eval 3v (ϕ, •).So it seems that starting with a relational algebra query Q, we could find an equivalent FO query ϕ Q and then consider e ϕ Q to evaluate Q.
Reasoning of this sort, however, mixes the equivalence of FO and relational algebra (that is true with respect to the usual two-valued FO evaluation) with the three-valued evaluation.Still, from the equivalence of Eval FO (ϕ Q , •) and Q one can easily derive e ϕ Q (D) = Eval 3v (ϕ Q , D) ⊆ certain ⊥ (Q, D), so we do in fact get correctness guarantees with this approach.Nonetheless, it not satisfactory for two reasons.First, the detour via translation into FO and back to algebra may produce unnecessarily complicated expressions.Second, this approach assumes a particular translation between relational algebra and FO (which of course is not unique), and the quality of the resulting query depends on that translation.For instance, we view expressions R and σ A=1 (R) ∪ σ A =1 (R) as equivalent, but using the latter in e ϕ Q can miss some answers with certainty guarantees due to the presence of nulls.
The bottom line is that it is better to have a direct evaluation procedure for relational algebra that gives us correctness guarantees without going through both algebra-to-FO and FO-to-algebra translations.
In the two-valued world sound translations for relational algebra have been considered in the past [22].Our goal is a bit different though as we have to provide specific correctness guarantees, and relate them to SQL's way of evaluating queries; in fact we shall produce approximations for sets of tuples on which Eval 3v returns 1 and 0.
We now explain the procedure for correct evaluation of relational algebra queries.First, recall the operations of relational algebra.These are selection σ, projection π, cartesian product ×, union ∪, intersection ∩, and difference −.To avoid the clutter, and in particular to avoid renaming, we use the unnamed perspective for presenting relational algebra [1], that is, for each expression returning an m-attribute relation, we simply assume that the names of those attributes are 1, . . ., m.As conditions θ in selections, we use positive Boolean combinations of equalities and inequalities between attribute values and constants.For instance, ( 1 = 2) ∨ ( 3 = 1) is a condition that can be used in selection.Note that such conditions are closed under negation, simply by propagating it all the way to (in)equalities, so we shall also refer sometimes to conditions ¬θ, meaning the result of such a propagation.We refer to this standard relational algebra as RA.
We also consider an extension called RA null .In this extension, conditions θ are positive Boolean combinations of equalities and inequalities between attributes, and conditions const( n) and null( n) stating that the value of attribute n is a constant or a null, respectively.
Our goal is to provide a translation RA → RA null that associates with each query As noticed already, due to coNP-data complexity of certain ⊥ (Q, D), we cannot hope for equality, so this correctness guarantee is the best we can count on.
We shall actually produce more.Let Q be the query that computes the complement of Q, i.e., for an n-ary Q, the result of Q(D) is adom(D) n − Q(D).Then we actually provide a translation Proof sketch.Again, we show the following, by induction on relational algebra expressions: We provide a couple of sample cases.Consider, for instance, the case when Q is R. Then Q − = R .Assume ū ∈ R and let h be a homomorphism.By definition, ū does not unify with any of t ∈ R, in particular, h(ū) cannot equal h( t), thus implying h(ū) ∈ h(R).
Let θ = ( n = m), and assume , and let h be a homomorphism.Since ū ∈ Q + 1 (D), we see, by the hypothesis, that h(ū) ∈ Q 1 (h(D)).Furthermore, since θ * holds, we know that u n and u m , the nth and the mth components of ū, are constants, and The translation in Figure 1 is not just one translation but rather a family of translations, due to the following observation.A translation can be viewed as a mapping F that assigns to each relational algebra operation ω (including nullary operations for base relations) two queries F + ω and F − ω .These queries are simply the queries that appear on the right in the translation; for instance, for the translation scheme we used, F + ∩ is the intersection (since the result of (Q Q are queries of the same type as Q (i.e., they operate on databases of the same schema and have the same arity).Intuitively, these are analogs of Q + and Q − that we had for the translation in Figure 1.
Formally, they are defined as follows.
If ω is a base relation R, then F + R and F − R take no arguments and That is, F + R and F − R are queries that give us certainly positive and certainly negative information about R.
If ω is a unary operation (σ or π), then F + ω and F − ω take two arguments and That is, if we already have queries F + Q and F − Q describing certainly positive and certainly negative answers for Q, the queries describing such answers for ω(Q) are obtained by applying F + ω and F − ω to those.If ω is a binary operation (∪, ∩, −, ×), then F + ω and F − ω take four arguments and That is, if we already have queries F + Qi and F − Qi describing certainly positive and certainly negative answers for Q i , with i = 1, 2, the queries describing such answers for ω(Q 1 , Q 2 ) are again obtained by applying F + ω and F − ω to those.
Given a translation F and another translation G that assigns to each operation ω queries G + ω and G − ω , we say that F is contained in G if F + ω ⊆ G + ω and F − ω ⊆ G − ω , where ⊆ refers to the usual query containment.Proposition 12. Every translation that is contained in the translation of Figure 1 provides correctness guarantees.

I C D T 2 0 1 5
This proposition lets us adjust translations for the sake of efficiency without having to worry about correctness guarantees.For instance, consider the rule in Figure 1.This results in a rather expensive query, as one needs to compute a power of the active domain.But we can replace it with the much simpler rule ( that again requires computing the active domain with the very simple rule (σ θ (Q)) − = Q − .In both cases the result is that the translated queries are significantly more efficient and they still guarantee correctness of the overall translation in the sense that they produce subsets of certain answers with nulls, or the usual certain answers if tuples with nulls are removed.There is a price to pay for the efficiency though: we can get fewer answers in the result.Hence one should decide how to resolve the efficiency vs the quality of approximation tradeoff.
Another corollary concerns positive relational algebra, even extended with inequalities, and it just follows from examining the basic translation of Figure 1.Define PosRA = as the positive fragment of RA (i.e., σ, π, ×, ∪) where conditions in selections are allowed to use inequalities.In terms of its expressiveness, this fragment corresponds to UCQ = .
Corollary 13.Let Q be a PosRA = query, and let Q * be obtained from it by changing each selection condition θ to θ * .Then, for every database D, we have Q * (D) ⊆ certain ⊥ (Q, D).

Conclusions
We have shown that small changes to the 3-valued query evaluation used in SQL produce sound query answers, i.e., answers without false positives.We have presented such evaluation procedures at the levels of both relational calculus and algebra, and also specialized them for unions of conjunctive queries with inequalities.The theoretical complexity of these procedures is very low, in fact it is as low as evaluating relational calculus and algebra themselves, in terms of data complexity.The next obvious step is to implement these algorithms to study their real-life applicability.As indicated at the end of the last section, our translations -especially at the procedural level -are really families of algorithms, with the efficiency vs quality of approximation tradeoff, so there is a lot to play with, to find those that provide a good combination of both.Another natural question is to consider other features of SQL.They include not only such common features as aggregation and grouping, but also derived operations of relational algebra that are used in implementation of SQL queries: for instance, the division operation for the implementation of some universal queries, or semi-joins and anti-joins that can be used for implementing subqueries.
in D are denoted by Const(D) and Null(D).If Null(D) is empty, we refer to D as complete.That is, complete databases are those without nulls.The active domain of D is adom(D) = Const(D) ∪ Null(D).Homomorphisms, valuations, and semantics.Given two relational structures D and D , a homomorphism h : D → D is a map from the active domain of D to the active domain of D such that: 1. for every relation symbol S, if a tuple ū is in relation S in D, then the tuple h(ū) is in the relation S in D ; and 2. h(c) = c for every c ∈ Const(D).

Definition 4 .
Consider a database D with a relation R D = {(1, 2), (3, ⊥)} and a query ψ(x, y) = R(x, y).Then certain(ψ, D) = {(1, 2)} but intuitively we should return the entire relation R D since we are certain its tuples are in the answer.The reason we are certain about it is that for every valuation h, the tuple (3, h(⊥)) is in h(D).We turn this reasoning into a definition.Given an incomplete database D and a k-ary query Q defined over complete databases, certain answers with nulls certain ⊥ (Q, D) is defined as the set of all tuples ū ∈ adom(D) k such that h(ū) ∈ Q(h(D)) for all valuations h.