Optimal Broadcasting Strategies for Conjunctive Queries over Distributed Data

In a distributed context where data is dispersed over many computing nodes, monotone queries can be evaluated in an eventually consistent and coordination-free manner through a simple but naive broadcasting strategy which makes all data available on every computing node. In this paper, we investigate more economical broadcasting strategies for full conjunctive queries without self-joins that only transmit a part of the local data necessary to evaluate the query at hand. We consider oblivious broadcasting strategies which determine which local facts to broadcast independent of the data at other computing nodes. We introduce the notion of broadcast dependency set (BDS) as a sound and complete formalism to represent locally optimal oblivious broadcasting functions. We provide algorithms to construct a BDS for a given conjunctive query and study the complexity of various decision problems related to these algorithms.


Introduction
We assume the setting introduced in the context of declarative networking [6,14], where queries are specified on a logical level over a global schema and are evaluated by multiple computing nodes over which the input database is distributed. These nodes can perform local computations and communicate asynchronously with each other via messages. The model operates under the assumption that messages can never be lost but can be arbitrarily delayed. It is known that every monotone query can be evaluated in an eventually consistent and coordination-free manner through a naive broadcasting strategy that makes all data available to all nodes [14]. Indeed, every computing node sends all its local data to every other node and reevaluates the query every time new data arrives. This evaluation is eventually consistent because, by monotonicity, no facts will be derived which later have to be retracted and, furthermore, when all transmitted data has arrived, the output of every node will correspond to the result of the query. In addition, the computation requires no coordination between the nodes.
Obviously, the above strategy leads to a wasteful evaluation as the whole database is sent to every node and every node independently computes the complete answer for the targeted query. In the present paper, we are interested in more economical broadcasting strategies where only a subset of the local data is transmitted and where each computing node contributes to the answer of the query by outputting only a subset of the answer tuples. The result of the query then is the union of the tuples output by the computing nodes. In particular, we focus on full conjunctive queries without self-joins and we consider oblivious broadcasting strategies where every computing node determines which facts will be broadcast solely on the basis of the content of its own local database (so, oblivious of the data at other nodes). These facts are referred to as broadcast facts. Facts that are not initially broadcast are called static. We illustrate the ideas behind such strategies by means of an example.
Naive broadcasting strategy. The naive broadcasting algorithm outlined above sends all facts in I(c) to c′ and all facts in I(c′) to c. Eventually, both c and c′ receive all data and both of them compute the result of the query, that is, Q_1(I) = {(1, 2, 3)}.
Improved oblivious broadcasting strategy. The strategy just described is clearly oblivious but also rather wasteful. Therefore, consider the following strategy which broadcasts all of the C-facts but none of the A-facts. Furthermore, a B-fact B(i, j) is broadcast only when A(j, i) does not occur in the local database. Executing this strategy for every computing node in our example results in c broadcasting the set {B(2, 1)} while c′ broadcasts {B(4, 4), C(1, 3)}. So, eventually, I*(c) = {A(2, 2), B(2, 1), B(2, 2), B(4, 4), C(1, 3)} and I*(c′) = {A(1, 2), B(2, 1), B(4, 4), C(1, 3)}. Here, we denote by I*(d) the instance at node d when all transmitted messages have arrived. Therefore, Q_1(I*(c)) = ∅ and Q_1(I*(c′)) = {(1, 2, 3)}, and Q_1(I) equals Q_1(I*(c)) ∪ Q_1(I*(c′)). Intuitively, this strategy is correct in general as the following invariant holds for every computing node d: when a fact B(i, j) is not broadcast at a node d, then every satisfying valuation V for Q on I that maps (x, y) to (i, j) can be realized locally in I*(d). Notice that a similar strategy reversing the roles of A- and B-facts would work as well.
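The improved strategy can be replayed mechanically on the running example. The following Python sketch is only an illustration under stated assumptions: the local instances of c and c′ are not restated in this section and are inferred here from the broadcast sets and the instances I*(c) and I*(c′) given above, the query is taken to be Q_1(x, y, z) ← A(x, y), B(y, x), C(x, z), and all function names are hypothetical.

```python
# Facts are pairs (predicate, (value, value)); all relations here are binary.

def q1(instance):
    """Evaluate Q1(x, y, z) <- A(x, y), B(y, x), C(x, z) on a set of facts."""
    return {(x, y, z)
            for pa, (x, y) in instance if pa == "A"
            if ("B", (y, x)) in instance
            for pc, (x2, z) in instance if pc == "C" and x2 == x}

def broadcast(local):
    """Broadcast all C-facts, no A-facts, and B(i, j) only if A(j, i) is not local."""
    sent = set()
    for pred, (i, j) in local:
        if pred == "C" or (pred == "B" and ("A", (j, i)) not in local):
            sent.add((pred, (i, j)))
    return sent

# Assumed local instances, inferred from the broadcast sets stated in the text.
H = {
    "c": {("A", (2, 2)), ("B", (2, 1)), ("B", (2, 2))},
    "c'": {("A", (1, 2)), ("B", (4, 4)), ("C", (1, 3))},
}

sent = set().union(*(broadcast(local) for local in H.values()))
final = {node: local | sent for node, local in H.items()}      # I*(node)
answers = {node: q1(instance) for node, instance in final.items()}
print(answers)
```

Under these assumptions, node c derives nothing, node c′ derives (1, 2, 3), and the union of both outputs coincides with the result of the query on the complete instance, while only three facts are transmitted.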
We will formalize oblivious broadcasting functions as generic mappings. This means that decisions on whether to broadcast facts depend not only on the name of the predicate but can also depend on the equality type of the fact under consideration. Therefore, the following strategy would be valid as well: always broadcast facts of the form C(i, j) with i ≠ j and keep all facts of the form C(i, i) static; broadcast all B-facts; broadcast a fact A(i, j) only when the fact C(i, i) is not present in the local database. While not immediately obvious, this strategy correctly computes Q on every distributed database.
Both strategies are presented formally in Section 5 in terms of broadcast dependency sets; see Example 12(1) and 12(3).
In this paper, we make the following contributions:
(i) We provide a semantical characterization of when an oblivious broadcasting function (OBF) correctly evaluates a given conjunctive query. While it is desirable to construct OBFs that minimize the overall amount of transmitted facts over all distributed databases, we show that there is no optimal OBF for any conjunctive query with at least two distinct atoms in its body. Therefore, we turn to a slightly weaker notion of optimality, called local optimal, which requires that an OBF is optimal w.r.t. the local instance at every computing node. Intuitively, this means that no broadcast fact can be made static without sacrificing correctness. We provide a semantical characterization for when an OBF is local optimal for a given conjunctive query.
(ii) We introduce the notion of a broadcast dependency set (BDS) as a formalism to specify OBFs. In brief, a BDS S is a set of pairs (τ, T) where τ is a partial atomic type and T is a set of partial atomic types. Every such pair encodes a rule that can be interpreted roughly as follows: when a fact 𝐟 matches type τ, it will be broadcast at a computing node c when the set of facts induced by T is not present at c. We present necessary and sufficient syntactic conditions for when a BDS is correct for a given query and also for when it is local optimal w.r.t. that query. Furthermore, we study the complexity of deciding whether a BDS is correct for a query and whether it is local optimal. Finally, and most importantly, we show that the formalism of BDS is expressively complete w.r.t. local optimal OBFs by obtaining that every local optimal OBF can be represented by a BDS. In fact, every local optimal OBF can already be represented by a BDS that only uses complete types, that is, types where the equalities between all variables are fully specified.
(iii) Based on the syntactic criteria of when a BDS is correct for Q and when it is local optimal, we obtain an algorithm bds-build(Q) that computes a local optimal OBF (represented as a BDS) for a given conjunctive query Q. When restricting to open types (these are types without restrictions on the equalities between variables), bds-build(Q) computes a local optimal OBF in time polynomial in the size of Q. When considering complete types, bds-build(Q) computes a local optimal OBF in time exponential in the size of Q simply because there are exponentially many complete types.
Outline. We discuss related work in Section 2 and introduce the necessary definitions and concepts in Section 3. In Section 4, we discuss oblivious broadcasting functions and local optimality. In Section 5, we discuss broadcast dependency sets and study their properties. In Section 6, we provide an algorithm to construct a local optimal oblivious broadcasting function for a given conjunctive query. We conclude in Section 7.

Related Work
CALM. The approach in this paper is motivated by the work on the CALM-conjecture. Hellerstein [14] formulated the CALM-principle, which suggests a link between logical monotonicity and distributed consistency without the need for coordination. The latter principle is, for instance, embedded in BLOOM, a declarative language for distributed programming, for which practical program analysis techniques have been developed that detect potential consistency anomalies [3,4,11]. Ameloot et al. [6] formalized (and proved) the CALM-conjecture in terms of relational transducer networks. Zinn et al. [19] showed that the generalization of the conjecture to stronger variants of relational transducer networks fails. Ameloot et al. [5] subsequently provided a more fine-grained answer to the CALM-conjecture by relating these stronger variants of relational transducer networks to weaker notions of monotonicity. All of these works considered naive evaluation strategies that broadcast all of the local data. In particular, none of these works considered more economical broadcasting strategies for the evaluation of conjunctive queries.
Massively parallel model. The networked relational transducer model is just one paradigm for studying distributed query evaluation. In the massively parallel (MP) model, introduced by Koutris and Suciu [15], computation proceeds in a sequence of parallel steps, each followed by global synchronization of all servers. In this model, evaluation of conjunctive queries [15,7] as well as skyline queries [2] has been considered. Recently, Beame et al. [8] proved a matching upper and lower bound for the amount of communication needed to compute a full conjunctive query without self-joins in one communication round. The upper bound is provided by a randomized algorithm called Hypercube which dates back to Ganguly et al. [13] and was described by Afrati and Ullman [1] in the context of MapReduce algorithms. Hypercube is motivated by modern massively distributed systems like, for instance, Spark [18], where entire computations reside in main memory, replay is used to recover, and the dominant cost is that of communication. We note that one-round Hypercube is coordination-free and can be easily employed within the framework of relational transducer networks as well. A characteristic of Hypercube-style algorithms is that the space of computing nodes (over which the input data will be distributed) needs to be known in advance. The broadcasting strategies considered in this paper are motivated by a cloud computing setting where data is already initially distributed and the complete space of computing nodes is not necessarily known in advance. In this respect, Hypercube-style and broadcasting algorithms are orthogonal.
Relevance. One approach to minimize data transfer for a query Q is to find the smallest subset J of a distributed instance I for which Q(I) = Q(J) and then only broadcast the relevant subset J. Determining which part of a database is relevant for answering a query is a problem that arises in different contexts. For instance, causality in databases aims to determine which tuples in the database instance caused the output to a query [16,17]. Then, the contingency set asks for the smallest set K such that Q(I) = Q(I − K). So, any set I − K extended with one element is relevant. Similarly, "where" and "why" provenance refer to the location(s) in the source databases from which the output was extracted or by which the output was influenced [10,9]. Fan et al. [12] study the problem of scale independence where, through access patterns, the result of a query depends only on a bounded part of the database. It would be interesting to investigate how these different approaches translate to a distributed setting. Certainly, any lower bounds for the sequential setting imply lower bounds for the distributed setting, but upper bounds need to take into account that the initial database instance I is distributed as well.

Preliminaries
Instances and queries. For a finite set S, we denote by |S| its cardinality and by 2^S its powerset. We denote {1, ..., n} by [n], for n ∈ ℕ. We assume an infinite set dom of data values. A database schema σ is a collection of relation names R where every R has arity ar(R) > 0. We call R(d̄) a fact when R is a relation name and d̄ is a tuple over dom. We say that a fact R(d_1, ..., d_k) is over a database schema σ if R ∈ σ and ar(R) = k. A (database) instance I over σ is simply a finite set of facts over σ. We denote by Adom(I) the set of all values that occur in facts of I. When I = {𝐟}, we simply write Adom(𝐟) rather than Adom({𝐟}). A query over a schema σ to a schema σ′ is a generic mapping Q from instances over σ to instances over σ′. Genericity means that for every permutation π of dom and every instance I, Q(π(I)) = π(Q(I)). For the remainder of the paper, we assume given a database schema σ over which all queries are defined and do not refer to it anymore. A query Q is monotone if Q(I) ⊆ Q(J) for all instances I, J with I ⊆ J. We only consider monotone queries in the sequel.
Conjunctive queries. Let var be the universe of variables, disjoint from dom. An atom A is of the form R(u_1, ..., u_k) where R is a relation name and each u_i ∈ var. We call R the predicate and denote it by pred(A). We denote the variables occurring in A by Vars(A) = {u_1, ..., u_k}. We say that A is an atom over the database schema σ if pred(A) ∈ σ and k = ar(pred(A)). A conjunctive query Q (CQ) is an expression of the form A_0 ← A_1, ..., A_n, where for every i ∈ [n], A_i is an atom over the schema and A_0 is an atom not over the schema. In particular, A_0 is the head of Q, denoted head_Q, and A_1, ..., A_n is the body of Q, denoted body_Q. By Vars(Q) we denote all the variables occurring in Q. A valuation for Q is a total function V from Vars(Q) to dom, and V is satisfying for Q on an instance I when V(A_i) ∈ I for every i ∈ [n]. The result Q(I) of Q on I is defined as the set of facts V(head_Q) that can be derived by satisfying valuations V for Q on I.
In what follows, we assume that every CQ is full and does not contain self-joins. Formally, we require that pred(A_i) ≠ pred(A_j) for all i, j ∈ [n] with i ≠ j, and that Vars(A_i) ⊆ Vars(A_0) for every i ∈ [n]. That is, every atom has a unique relation symbol and all variables occurring in the body occur in the head as well. For instance, Q_1(x, y, z) ← A(x, y), B(x, z), C(y, y) is full and does not contain self-joins, while A(x, z), C(y, y) contains a self-join.
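To make the semantics of full conjunctive queries concrete, the following Python sketch enumerates satisfying valuations by backtracking over the body atoms. It is a minimal illustration and not part of the formal development: the encoding of facts and atoms as tuples, the function name evaluate, and the sample instance are all assumptions made for the example.

```python
# An atom is (predicate, tuple of variables); a fact is (predicate, tuple of values).
HEAD = ("x", "y", "z")
BODY = [("A", ("x", "y")), ("B", ("x", "z")), ("C", ("y", "y"))]

def evaluate(head, body, instance):
    """Return the head tuples derived by all satisfying valuations on `instance`."""
    results = set()

    def extend(i, valuation):
        if i == len(body):                      # every body atom has been matched
            results.add(tuple(valuation[v] for v in head))
            return
        pred, variables = body[i]
        for fact_pred, values in instance:
            if fact_pred != pred or len(values) != len(variables):
                continue
            candidate = dict(valuation)
            # setdefault binds a fresh variable or returns its earlier value
            if all(candidate.setdefault(v, d) == d
                   for v, d in zip(variables, values)):
                extend(i + 1, candidate)

    extend(0, {})
    return results

# Hypothetical instance: C(2, 5) cannot match the atom C(y, y).
I = {("A", (1, 2)), ("B", (1, 3)), ("C", (2, 2)), ("C", (2, 5))}
print(evaluate(HEAD, BODY, I))
```

Because the query is full, every head variable is bound once all body atoms are matched, so the lookup in the head never fails. Adding facts to the instance can only add answers, which is the monotonicity the broadcasting strategies rely on.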

Distributed database. A network N is a nonempty finite set of values from dom, which we call nodes. A distribution of an instance I over N is a function H that maps each c ∈ N to an instance such that I = ⋃_{c∈N} H(c). Notice that facts can be replicated. We also refer to each of the H(c) as the local instances. We consider a model where nodes have unlimited computational power and can send messages to all other nodes. These messages can never be lost but can be arbitrarily delayed.

Oblivious broadcasting
We refrain from introducing the formalism of relational transducer networks from [6], but present a simpler setting more suitable for our needs. In particular, the relational transducer networks needed in this paper only perform two actions: decide which facts to broadcast (and transmit those) and evaluate the query under consideration whenever new data arrives. The only parameter is the broadcasting strategy used, which therefore forms the focus of our formalization. In brief, we consider broadcasting strategies where computing nodes partition their local database into static and broadcast facts. Static facts are kept local while broadcast facts, as the name already indicates, are sent to all other nodes in the network. As we only consider conjunctive queries, which are monotone, the target query can be recomputed whenever new data arrives.

Oblivious broadcasting functions
We now formally define oblivious broadcasting functions.

Definition 2. An oblivious broadcasting function (OBF) f is a generic mapping that maps instances to instances such that f(J) ⊆ J for all instances J.
An OBF specifies which local facts are broadcast. Specifically, f(J) are the broadcast facts while J \ f(J) are the static facts. We use the term oblivious as broadcast facts only depend on the local database instance and their choice is independent of the facts at other computing nodes. An OBF f is naive when there are no static facts, that is, f(J) = J for all instances J.
Given a CQ Q, an instance I, a distribution H of I, and a network N, an OBF f implies a broadcasting algorithm in the following way. Let B(f, H) = ⋃_{c∈N} f(H(c)) be the set of broadcast facts. Then eval(Q, f, H) = ⋃_{c∈N} Q(H(c) ∪ B(f, H)) is defined as the union of the query result at every computing node over the local instance extended with all broadcast facts. We say that f is correct for Q when eval(Q, f, H) = Q(I) for every instance I, every network N, and every distribution H of I over N.
Remark. We note that the function eval(Q, f, H) implies an evaluation that can be executed by a transducer program π_{f,Q} at every node c as follows: c broadcasts the facts in f(H(c)) and reevaluates Q on its local instance, extended with all facts received so far, whenever new data arrives. Correctness then follows from the genericity and monotonicity of f. We refer to the execution strategy induced by eval(Q, f, H) as a broadcasting algorithm. Coordination-freeness intuitively follows as π_{f,Q} never waits. Formally, a transducer is coordination-free [6] if there is a so-called ideal distribution, on which the query is already computed by a prefix of a run that does not process any of the incoming facts. For π_{f,Q} this is the distribution that puts the complete instance at every node. We refer to [6] for a more formal treatment of coordination-freeness.
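The evaluation induced by an OBF can be sketched in a few lines of Python. The example below is an illustration under assumptions: it uses a hypothetical two-atom query Q(x, y) ← R(x, y), S(y), represents a distribution H as a dictionary from nodes to sets of facts, and compares three hypothetical OBFs on a single distribution. Agreement with Q(I) on one distribution does not establish correctness, which quantifies over all distributions, but a single disagreement refutes it.

```python
def query(instance):
    """Q(x, y) <- R(x, y), S(y) over facts of the form (predicate, values)."""
    return {args for pred, args in instance
            if pred == "R" and ("S", (args[1],)) in instance}

def eval_broadcast(query, obf, H):
    """Union over all nodes c of Q(H(c) ∪ B(f, H))."""
    broadcast_facts = set().union(*(obf(local) for local in H.values()))  # B(f, H)
    return set().union(*(query(local | broadcast_facts) for local in H.values()))

def naive(local):        # f(J) = J: there are no static facts
    return set(local)

def only_s(local):       # broadcast S-facts, keep R-facts static
    return {fact for fact in local if fact[0] == "S"}

def silent(local):       # broadcast nothing at all
    return set()

H = {1: {("R", (1, 2))}, 2: {("S", (2,))}}     # one fact per node
I = set().union(*H.values())
for obf in (naive, only_s, silent):
    print(obf.__name__, eval_broadcast(query, obf, H) == query(I))
```

On this distribution, the naive function and the function that broadcasts only S-facts both reproduce Q(I), the latter with a single transmitted fact, whereas broadcasting nothing misses the answer because the two facts never meet at one node.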
When f is correct for Q, we also say that f is an OBF for Q. The following lemma characterizes correctness in that two compatible facts residing at different computing nodes can never be both static. Indeed, if they are, then the valuation witnessing compatibility is never realized at any computing node and consequently f cannot be correct for Q.
We say that two distinct facts 𝐟 and 𝐠 are compatible w.r.t. Q, denoted 𝐟 ∼_Q 𝐠, when they are assigned to two atoms from the body of Q under one valuation, i.e., there is a valuation V for Q and atoms A, B ∈ body_Q such that V(A) = 𝐟 and V(B) = 𝐠.
Lemma 4. Let Q be a CQ and f be an OBF. Then, the following are equivalent:
1. f is correct for Q; and
2. there are no instances I, J, and facts 𝐟, 𝐠, with 𝐟 ∼_Q 𝐠, 𝐠 ∉ I, 𝐟 ∉ J such that 𝐟 ∉ f(I ∪ {𝐟}) and 𝐠 ∉ f(J ∪ {𝐠}).
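For a CQ without self-joins, compatibility of two facts can be tested by unifying each fact with the unique body atom of its predicate under one shared valuation. The Python sketch below illustrates this test; the body used is that of Q_1(x, y, z) ← A(x, y), B(y, x), C(x, z), and the names bind and compatible are hypothetical helpers introduced only for this example.

```python
# Body atoms keyed by predicate (possible because there are no self-joins).
BODY = {"A": ("x", "y"), "B": ("y", "x"), "C": ("x", "z")}

def bind(variables, values, valuation):
    """Extend `valuation` so that it maps `variables` to `values`, or return None."""
    if len(variables) != len(values):
        return None
    extended = dict(valuation)
    for v, d in zip(variables, values):
        if extended.setdefault(v, d) != d:      # variable already bound differently
            return None
    return extended

def compatible(f, g, body=BODY):
    """True when distinct facts f and g are images of two body atoms under one valuation."""
    (pred_f, values_f), (pred_g, values_g) = f, g
    if pred_f == pred_g or pred_f not in body or pred_g not in body:
        return False                            # same atom would force f = g
    first = bind(body[pred_f], values_f, {})
    return first is not None and bind(body[pred_g], values_g, first) is not None

print(compatible(("A", (1, 2)), ("B", (2, 1))))
print(compatible(("A", (1, 2)), ("B", (1, 2))))
```

Variables that occur in neither atom can be assigned arbitrary values, so a consistent partial binding always extends to a valuation for the whole query.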

Proof. (1)⇒(2) We start by showing that every OBF for Q satisfies the above condition. The proof is by contraposition, so we assume that there are instances I and J and compatible facts 𝐟 and 𝐠 w.r.t. Q, where 𝐠 ∉ I and 𝐟 ∉ J, but 𝐟 ∉ f(I ∪ {𝐟}) and 𝐠 ∉ f(J ∪ {𝐠}). Let K be an instance and let V be a satisfying valuation for Q on K witnessing compatibility of 𝐟 and 𝐠. Then consider a network N = {1, 2, 3} and an instance because none of the computing nodes contain both 𝐟 and 𝐠, and 𝐟 and 𝐠 are not broadcast. Thus,
(2)⇒(1) It remains to show that if the above condition is satisfied, then f is an OBF for Q. For this, let I be an instance, N a network, and H a distribution of I over N. We prove that and c a node for which |H(c) ∩ J| is maximal. We claim that J ⊆ H(c), obviously implying that 𝐟 will be derived at node c. Towards a contradiction, assume there is an Moreover, by choice of c, |H(d) ∩ J| ≤ |H(c) ∩ J| and thus there must be a fact

Local optimality
We are interested in OBFs that transmit as little data as possible. To this end, we investigate sensible notions of optimality. We fix a query Q, an instance I, a distribution H of I, and a network N. The total number of transmitted facts equals ||B(f, H)||. We say that an OBF f for Q is optimal if ||B(f, H)|| ≤ ||B(g, H)|| for every other OBF g for Q and for every instance I and distribution H.
Intuitively, an OBF is optimal when it transmits the least amount of data over all instances and all distributions. The next result, however, shows that this notion of optimality, although desirable, is unattainable.
Lemma 6. There is no optimal OBF for any conjunctive query with at least two distinct atoms in its body.
Proof. Let Q be the conjunctive query A_0 ← A_1, ..., A_n with n ≥ 2. Towards a contradiction, assume there is an optimal OBF f for Q. Let I be the canonical instance for Q where for every i ∈ [n], the relation pred(A_i) is interpreted by the fact A_i. Now, consider a network N = [n] and a distribution H that places every fact in I on a distinct node. As all of the n facts in I need to be gathered at one node, at least n − 1 facts must be broadcast. Let 𝐠 be the fact in I that is not broadcast by f and assume w.l.o.g. that pred(𝐠) = pred(A_n). As the OBF that broadcasts all A_i-facts for i < n and keeps all A_n-facts static is correct for Q and only transmits n − 1 facts on I, by assumption on the minimality of f, ||B(f, H)|| = n − 1. Now, consider I′ = I \ {𝐠}, and let H′ equal H restricted to only facts in I′ over N. Then, as 𝐠 is not broadcast in H, ||B(f, H)|| = ||B(f, H′)||. However, the OBF that broadcasts all A_i-facts for i > 1 and keeps all A_1-facts static is correct for Q and only broadcasts n − 2 facts on I′, contradicting the optimality of f.
We next turn to a different form of optimality. For two OBFs f and g, we say that f is included in g, denoted f ⊆ g, iff f(I) ⊆ g(I) for every instance I.
Definition 7. An OBF f for a CQ Q is local optimal iff for every other broadcasting function g for Q, g ⊆ f implies f = g.
Intuitively, when f is local optimal, there is no OBF for Q that transmits only a strict subset of the facts broadcast by f.
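The trade-off behind Lemma 6 can be observed on the smallest case n = 2. The Python sketch below is an illustration under assumptions: it uses a hypothetical query Q(x) ← A1(x), A2(x), two OBFs that each keep one predicate static, and it counts transmitted facts per node, which is only one assumed reading of ||B(f, H)||.

```python
def broadcast_a1(local):   # broadcast A1-facts, keep A2-facts static
    return {fact for fact in local if fact[0] == "A1"}

def broadcast_a2(local):   # broadcast A2-facts, keep A1-facts static
    return {fact for fact in local if fact[0] == "A2"}

def transmitted(obf, H):
    """Number of facts sent, summed over all nodes (assumed reading of ||B(f, H)||)."""
    return sum(len(obf(local)) for local in H.values())

# Canonical instance with one fact per node, and the two instances
# obtained by removing one of the facts.
H = {1: {("A1", ("x",))}, 2: {("A2", ("x",))}}
H_without_a2 = {1: {("A1", ("x",))}, 2: set()}
H_without_a1 = {1: set(), 2: {("A2", ("x",))}}

for name, dist in [("H", H), ("H without A2", H_without_a2),
                   ("H without A1", H_without_a1)]:
    print(name, transmitted(broadcast_a1, dist), transmitted(broadcast_a2, dist))
```

Each function sends one fact on H, but each is beaten by the other on one of the reduced distributions, so neither transmits the least on all inputs. The two functions are also incomparable with respect to inclusion, which is why a notion based on inclusion such as local optimality remains meaningful.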
The next lemma gives a sufficient criterion for when an OBF cannot be local optimal. Specifically, a condition is given for when a broadcast fact 𝐟 can be kept static and a more economical OBF f′ can be derived.
Lemma 8. Let Q be a CQ and let f be an OBF for Q. If there is an instance I and fact 𝐟 for which 𝐟 ∈ f(I ∪ {𝐟}), but there is no instance J and no fact 𝐠 for which 𝐟 ∼_Q 𝐠, 𝐠 ∉ I, 𝐟 ∉ J, and 𝐠 ∉ f(J ∪ {𝐠}), then there is an OBF f′ for Q for which f′ ⊊ f.
Proof. Assume f, I, and 𝐟 as given by the statement of the lemma. The proof is now by construction. Let I_{𝐟,J} be the set of facts that (by genericity) relate to J in the same way as 𝐟 relates to I. That is, I_{𝐟,J} = {π(𝐟) | π a permutation s.t. π(I) = J}. Then, define f′ as the mapping where for every instance J, f′(J) = f(J) \ I_{𝐟,J}. Notice that f′ ⊊ f by construction of f′. Furthermore, f′ is generic and is an oblivious broadcasting function. It remains to show that f′ is an oblivious broadcasting function for Q. Towards a contradiction, assume that f′ is not an oblivious broadcasting function for Q. Then, by Lemma 4, there are instances J_1 and J_2 and facts 𝐠_1 and 𝐠_2, for which 𝐠_1 ∼_Q 𝐠_2, 𝐠_2 ∉ J_1, 𝐠_1 ∉ J_2, 𝐠_1 ∉ f′(J_1 ∪ {𝐠_1}), and 𝐠_2 ∉ f′(J_2 ∪ {𝐠_2}). As f is an oblivious broadcasting function for Q, it holds that 𝐠_1 ∈ f(J_1 ∪ {𝐠_1}) or 𝐠_2 ∈ f(J_2 ∪ {𝐠_2}). Say that 𝐠_1 ∈ f(J_1 ∪ {𝐠_1}). Then, 𝐠_1 ∈ I_{𝐟,J_1}, implying J_1 = π(I) and 𝐠_1 = π(𝐟) for some permutation π. As Q does not contain self-joins and 𝐠_1 ∼_Q 𝐠_2, this means that 𝐠_2 ∉ I_{𝐟,J_2}. Therefore, 𝐠_2 ∉ f(J_2 ∪ {𝐠_2}), which contradicts the condition of the lemma (taking π⁻¹(𝐠_2) and π⁻¹(J_2) as 𝐠 and J, respectively).
The following lemma now characterizes when an OBF for a query is local optimal.
Lemma 9. Let Q be a CQ and let f be an OBF for Q. The following are equivalent:
1. f is local optimal; and
2. for every instance I and fact 𝐟 for which 𝐟 ∈ f(I ∪ {𝐟}), there is an instance J and a fact 𝐠 such that 𝐟 ∼_Q 𝐠, 𝐠 ∉ I, 𝐟 ∉ J, and 𝐠 ∉ f(J ∪ {𝐠}).
Proof. We can assume that Q contains at least two atoms. Indeed, when Q contains one atom, the only local optimal OBF is the one that broadcasts no facts and the lemma trivially holds. The direction from (1) to (2) follows from Lemma 8.
(2)⇒(1) Let f be an OBF for Q. Towards a contradiction, assume that f is not local optimal. That is, there exists another OBF f′ for Q such that f′ ⊊ f. In particular, there is an instance I and a fact 𝐟 such that 𝐟 ∈ f(I ∪ {𝐟}), while 𝐟 ∉ f′(I ∪ {𝐟}). By Lemma 4, for every fact 𝐠 with 𝐟 ∼_Q 𝐠 where 𝐠 ∉ I, and for every instance J where 𝐟 ∉ J, it must be that 𝐠 ∈ f′(J ∪ {𝐠}). The latter then implies that for every such 𝐠 and J, 𝐠 ∈ f(J ∪ {𝐠}), which contradicts condition (2) of the present lemma.

Broadcasting functions based on dependency sets
In this section, we introduce the notion of a broadcast dependency set (BDS) as a formalism to specify OBFs. We present necessary and sufficient conditions for when a BDS induces an OBF which is correct for a given query and also for when it is local optimal. Furthermore, we study the complexity of the corresponding decision problems. Finally, we show that every local optimal OBF can be represented by a BDS, thereby obtaining that BDS is complete as a representation formalism for local optimal OBFs.

Broadcast dependency sets
Let Q be the CQ A_0 ← A_1, ..., A_n. We assume Q is full and does not contain self-joins. Therefore an atom A_i in body_Q is uniquely identified by its predicate pred(A_i). For a predicate R, we denote by atom(R) the unique atom A ∈ body_Q for which pred(A) = R. For a finite set of variables X, a partial (equality) type over X is a pair of binary relations ϕ = (E_ϕ, I_ϕ) representing equalities and inequalities among elements in X. Formally, we require that E_ϕ ∪ I_ϕ ⊆ X × X, E_ϕ is an equivalence relation, and I_ϕ is irreflexive and symmetric. We abuse notation and also use ϕ to denote the formula ⋀{x = y | (x, y) ∈ E_ϕ} ∧ ⋀{x ≠ y | (x, y) ∈ I_ϕ}. We tacitly assume that partial types are always consistent. That is, we always assume that there is a tuple ā such that the formula ϕ(ā) evaluates to true. When for all (x, y) ∈ X × X, either (x, y) ∈ E_ϕ or (x, y) ∈ I_ϕ, then ϕ completely specifies all relations between variables in X and we call ϕ a type. For emphasis, we sometimes say complete type rather than just type even though type always means complete type.
A partial atomic type (over Q) is a pair τ = (R_τ, ϕ_τ), where R_τ is a database predicate and ϕ_τ is a partial type over Vars(atom(R_τ)), that is, the variables occurring in the unique atom A ∈ body_Q for which pred(A) = R_τ. By Vars(τ) we denote the variables over which τ is defined, i.e., Vars(τ) = Vars(atom(R_τ)). Sometimes we write atom(τ) to abbreviate atom(R_τ). We say that τ is an atomic type when ϕ_τ is a type. To improve readability, we denote partial atomic types with τ and (complete) types with ω. We denote by PTypes(Q) and Types(Q) the set of all partial atomic types and atomic types over Q, respectively.
A fact 𝐟 is of type τ or satisfies τ, denoted 𝐟 ⊨ τ, when there is a valuation h from the variables in atom(R_τ) onto Adom(𝐟) such that h(atom(R_τ)) = 𝐟 and the formula ϕ_τ evaluates to true where each x_i is substituted by h(x_i). Notice that h is unique for 𝐟. Hereafter we will refer to h as V_𝐟. By type(𝐟), we denote the unique atomic type satisfied by 𝐟 when it exists. As atomic types are defined w.r.t. Q, type(𝐟) is not always defined. Indeed, when 𝐟 = R(a, b) (with a ≠ b) and atom(R) = R(x, x), then there is no τ with 𝐟 ⊨ τ. Two partial atomic types τ, τ′ are compatible w.r.t. Q, denoted τ ∼_Q τ′, when there are facts 𝐟 and 𝐠 with 𝐟 ⊨ τ and 𝐠 ⊨ τ′ such that 𝐟 ∼_Q 𝐠. We say that τ implies τ′, denoted τ ⊨ τ′, if for all facts 𝐟, 𝐟 ⊨ τ implies 𝐟 ⊨ τ′. We can think of a partial atomic type as a disjunction of types for a shared predicate symbol. Define Types(τ) = {ω ∈ Types(Q) | ω ⊨ τ} as the set of all atomic types ω which imply τ. Notice that ω ⊨ τ iff ω ∈ Types(τ) for any atomic type ω. For a set of partial atomic types T, we use Types(T) as an abbreviation for ⋃_{τ∈T} Types(τ).
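The valuation V_𝐟 and the atomic type of a fact can be computed directly from the fact and its body atom. The Python sketch below illustrates this under assumptions: the body atoms R(x, x) and C(x, z) belong to a hypothetical query, a complete type is encoded as a triple of predicate, equality pairs, and inequality pairs, and the function names are chosen only for the example.

```python
# Body atoms of a hypothetical query, keyed by predicate (no self-joins).
BODY = {"R": ("x", "x"), "C": ("x", "z")}

def valuation_of(pred, values, body=BODY):
    """Return V_f as a dict, or None when the fact does not match its atom."""
    variables = body.get(pred)
    if variables is None or len(variables) != len(values):
        return None
    valuation = {}
    for v, d in zip(variables, values):
        if valuation.setdefault(v, d) != d:     # repeated variable, different values
            return None
    return valuation

def type_of(pred, values, body=BODY):
    """Return (predicate, equalities, inequalities), or None when type(f) is undefined."""
    valuation = valuation_of(pred, values, body)
    if valuation is None:
        return None
    pairs = [(a, b) for a in valuation for b in valuation]
    equal = frozenset(p for p in pairs if valuation[p[0]] == valuation[p[1]])
    unequal = frozenset(p for p in pairs if valuation[p[0]] != valuation[p[1]])
    return (pred, equal, unequal)

print(type_of("R", (1, 2)))                        # no type: R(x, x) needs equal values
print(type_of("C", (1, 3)) == type_of("C", (7, 9)))
```

Since the type only records which variables receive equal values, it is invariant under permutations of dom, which is what allows generic OBFs to base their decisions on types.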
For sets of variables X and Y, and a partial atomic type τ, we write X ⊆_τ Y if for all x ∈ X either x ∈ Y or there is a y ∈ Y such that (x, y) ∈ E_{ϕ_τ}. That is, X is a subset of Y when taking the equalities in E_{ϕ_τ} into account. For instance, let τ be a type such that (y, z) ∈ E_{ϕ_τ}, then {x, y, z} ⊆_τ {x, y}.
Definition 10 states that (1) every key can have at most one value in S; (2) every complete type implies at most one partial type τ ∈ Keys(S); and, (3) the set of variables of atom(τ′) is included in the set of variables of atom(τ) taking into account the equalities in E_{τ′}. We first explain informally how a BDS represents an OBF. Let 𝐟 be a fact in the local instance at a computing node. When type(𝐟) is undefined, then 𝐟 is static as 𝐟 can never participate in any satisfying valuation. For instance, this happens when 𝐟 = R(a, b) with a ≠ b and Q contains the atom R(x, x). Every pair (τ, T) ∈ S now specifies a condition on facts: when 𝐟 ⊨ τ then 𝐟 is broadcast only if a set of facts implied by T (to be formalized below) is not present at the local instance. Furthermore, when there is no τ ∈ Keys(S) for which 𝐟 ⊨ τ, 𝐟 is broadcast as well. In this light, conditions (1) and (2) ensure that every local fact 𝐟 is matched by at most one partial type τ ∈ Keys(S); and, condition (3) ensures that when 𝐟 ⊨ τ then V_𝐟 can be extended in a unique way to a valuation for every τ′ ∈ T that is consistent with 𝐟, that is, for which type(𝐟) ∼_Q τ′.
Next, we formally define how every BDS S implies an OBF f_S. Given a fact 𝐟, if there is no τ ∈ Keys(S) for which 𝐟 ⊨ τ then 𝐟 is always broadcast. Otherwise, by conditions (1) and (2) of Definition 10, there is exactly one τ ∈ Keys(S) such that 𝐟 ⊨ τ. Recall that V_𝐟 is the valuation (defined above) such that V_𝐟(atom(τ)) = 𝐟. Then, by condition (3) of Definition 10, V_𝐟 can also be interpreted as a valuation for atom(τ′) for every τ′ ∈ T for which type(𝐟) ∼_Q τ′. Indeed, for every y ∈ Vars(τ′) \ Vars(τ) there is a variable x ∈ Vars(τ) for which (x, y) ∈ E_{τ′}. Therefore, define for every y ∈ Vars(τ′), V_{𝐟,τ′}(y) = V_𝐟(y) when y ∈ Vars(τ), and V_{𝐟,τ′}(y) = V_𝐟(x) otherwise, where x ∈ Vars(τ) is a variable with (x, y) ∈ E_{τ′}. As we only consider V_{𝐟,τ′} for which type(𝐟) ∼_Q τ′, the above is well-defined. Now, 𝐟 is broadcast when the local instance does not contain all the facts V_{𝐟,τ′}(atom(τ′)) for which τ′ ∈ T and type(𝐟) ∼_Q τ′. We refer to these facts as the dependency fact set. To formally define f_S, we set Dep(𝐟, T) = {V_{𝐟,τ′}(atom(τ′)) | τ′ ∈ T and type(𝐟) ∼_Q τ′}. Then, define Dep(𝐟, S) as Dep(𝐟, T) when there is a (τ, T) ∈ S for which 𝐟 ⊨ τ. Otherwise, Dep(𝐟, S) is undefined.
Definition 11. For a CQ Q and a BDS S for Q, define f_S as the function that maps every instance J to the set f_S(J) of those facts 𝐟 ∈ J for which (1) type(𝐟) ∈ Types(Q); and, (2) Dep(𝐟, S) is undefined or Dep(𝐟, S) ⊈ J.
Intuitively, 𝐟 is static only when type(𝐟) ∉ Types(Q) (𝐟 cannot participate in any satisfying valuation) or the dependency fact set Dep(𝐟, S) is present at the local instance.

Example 12. (1) For a simple example of a BDS S and OBF f_S, recall query Q_1 from Example 1, being Q_1(x, y, z) ← A(x, y), B(y, x), C(x, z). Let ϕ = (∅, ∅), that is, ϕ imposes no restrictions. Let τ_A = (A, ϕ) and τ_B = (B, ϕ). Then, S = {(τ_B, {τ_A}), (τ_A, ∅)} is a BDS for Q_1. Indeed, every partial atomic type occurs at most once as a key. There is no (complete) atomic type that implies both τ_A and τ_B. Furthermore, the variable containment condition between τ_A and τ_B is satisfied. Notice that f_S simulates exactly the broadcasting strategy which is described in Example 1.
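The OBF induced by the BDS of Example 12(1) can be sketched in Python as follows. This is a simplified illustration rather than a general implementation: it handles only open partial atomic types, so a key and its dependencies are identified by their predicates, and it assumes that every dependency atom only uses variables of the key atom, which holds for this example. The dictionary encoding of S and the function names are hypothetical.

```python
BODY = {"A": ("x", "y"), "B": ("y", "x"), "C": ("x", "z")}    # body of Q1
S = {"B": {"A"}, "A": set()}       # open BDS: key predicate -> dependency predicates

def valuation_of(pred, values):
    """Return V_f as a dict, or None when type(f) is undefined."""
    variables = BODY.get(pred)
    if variables is None or len(variables) != len(values):
        return None
    valuation = {}
    for v, d in zip(variables, values):
        if valuation.setdefault(v, d) != d:
            return None
    return valuation

def f_S(local):
    """Return the facts of the set `local` that are broadcast under the open BDS S."""
    sent = set()
    for pred, values in local:
        valuation = valuation_of(pred, values)
        if valuation is None:              # type(f) undefined: f stays static
            continue
        if pred in S:
            # Dependency fact set Dep(f, S), built by reusing V_f on each dependency atom.
            dep = {(p, tuple(valuation[v] for v in BODY[p])) for p in S[pred]}
            if dep <= local:               # all dependency facts are local: f is static
                continue
        sent.add((pred, values))           # no matching key, or a dependency is missing
    return sent

print(f_S({("A", (2, 2)), ("B", (2, 1)), ("B", (2, 2))}))
```

With this encoding, A-facts have an empty dependency set and are therefore never broadcast, C-facts match no key and are always broadcast, and a fact B(i, j) is broadcast exactly when A(j, i) is missing locally.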
Note that not every BDS for Q induces an OBF which is correct for Q. Indeed, the following lemma provides equivalent semantic and syntactic conditions for an OBF f_S to be correct for a query.
Lemma 13. Let Q be a CQ and let S be a BDS for Q. Then the following are equivalent:
1. f_S is an OBF for Q;
2. there are no instances I, J, and facts 𝐟, 𝐠, with 𝐟 ∼_Q 𝐠, 𝐠 ∉ I, 𝐟 ∉ J such that 𝐟 ∉ f_S(I ∪ {𝐟}) and 𝐠 ∉ f_S(J ∪ {𝐠}); and
3. there are no (complete) atomic types ω_1, ω_2, and pairs and ω_2 ∈ Types(T_1).
Proof.
Notice that the OBFs of Example 12 are all correct for the given query. Two partial atomic types τ_1, τ_2 are said to be equal, denoted τ_1 = τ_2, when Types(τ_1) = Types(τ_2). We say that a BDS S is harmonious when every two partial types in S are either disjoint or equal. That is, when for every two partial atomic types τ_1, τ_2 ∈ Keys(S) ∪ {τ ∈ T | T ∈ Values(S)}, either τ_1 = τ_2 or Types(τ_1) ∩ Types(τ_2) = ∅.
Theorem 14. Let Q be a CQ and let S be a BDS for Q. Deciding whether f_S is correct for Q is coNP-complete, and in PTIME when S is harmonious.

Local optimality
Next, we turn to local optimal OBFs. The following lemma provides equivalent semantic and syntactic conditions for an OBF to be local optimal. Regarding condition (3), the intuition is as follows. While condition (3c) is the syntactic counterpart of condition (2), conditions (3a) and (3b) specify optimality requirements which are inherent to the formalism of BDS. More specifically, condition (3a) specifies that every atomic type implying a partial type in a dependency set in S must also imply a key in S. Indeed, when an atomic type does not imply a key, every local fact of this type is always broadcast and therefore present at every computing node. The atomic type can therefore be removed from every dependency set it occurs in. When condition (3b) fails for an atomic type ω, S can be adapted to broadcast less while preserving correctness for Q by adding the pair (ω, {τ | τ ∼_Q ω, τ ∈ Types(Keys(S))}).
Lemma 15. Let Q be a CQ, S a BDS for Q, and f_S an OBF for Q. The following are equivalent:
1. f_S is local optimal;
2. for every instance I and fact 𝐟 for which 𝐟 ∈ f_S(I ∪ {𝐟}), there is an instance J and a fact 𝐠 such that 𝐟 ∼_Q 𝐠, 𝐠 ∉ I, 𝐟 ∉ J, and 𝐠 ∉ f_S(J ∪ {𝐠}); and,
3. S satisfies the following conditions:
(a) for (τ, T) ∈ S and ω ∈ Types(T), ω ∼_Q τ implies ω ⊨ τ′ for some τ′ ∈ Keys(S);
(b) for every ω ∈ Types(Q) \ Types(Keys(S)), there is a partial atomic type τ_1 ∈ Keys(S) and an ω_1 ∈ Types(τ_1) such that ω ∼_Q ω_1 and Vars(ω_1) ⊆_{ω_1} Vars(ω); and
(c) for (τ_1, T_1), (τ_2, T_2) ∈ S, where ω_1 ∈ Types(τ_1), ω_2 ∈ Types(τ_2), and ω_1 ∼_Q ω_2: ω_1 ∈ Types(T_2) implies ω_2 ∈ Types(T_1).
Deciding whether f_S is local optimal for an arbitrary BDS S turns out to be hard (cf. Theorem 16). Therefore, we also consider the special case of open BDSs. We say that a partial type ϕ = (E, I) is open when it enforces no restrictions, that is, when E = I = ∅. A partial atomic type (R, ϕ) is open when ϕ is. We say that a BDS S is open when it only contains open partial atomic types. Notice that a BDS that is open is also harmonious (but not vice versa).
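As an illustration of the definition of openness, the following Python sketch models partial types and partial atomic types as small immutable records. The encoding is an assumption made for this example only: E and I are represented as sets of pairs of attribute positions, and a BDS as a dictionary from keys to dependency sets. Neither choice is prescribed by the formalism.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set, Tuple

# Assumed encoding of a single restriction: a pair of attribute positions.
Constraint = Tuple[int, int]

@dataclass(frozen=True)
class PartialType:
    """A partial type (E, I); both components default to the empty set."""
    E: FrozenSet[Constraint] = frozenset()
    I: FrozenSet[Constraint] = frozenset()

    def is_open(self) -> bool:
        # Open means that no restriction is enforced: E = I = empty set.
        return not self.E and not self.I

@dataclass(frozen=True)
class PartialAtomicType:
    """A partial atomic type (R, phi)."""
    relation: str
    phi: PartialType = field(default_factory=PartialType)

    def is_open(self) -> bool:
        return self.phi.is_open()

def is_open_bds(bds: Dict[PartialAtomicType, Set[PartialAtomicType]]) -> bool:
    """A BDS is open when every partial atomic type it contains is open."""
    members = set(bds) | {t for deps in bds.values() for t in deps}
    return all(t.is_open() for t in members)
```

Under this encoding, two open partial atomic types over the same relation compare equal and two over different relations share no atomic type, which is in line with the observation that every open BDS is harmonious.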
Similarly to Theorem 14, we have the following decidability result for local optimal OBFs.
Theorem 16. Let Q be a CQ and let S be a BDS for Q for which f_S is correct for Q. Deciding whether f_S is local optimal is in coNP, and in PTIME when S is open.
It remains open, though, whether deciding local optimality is coNP-complete or in PTIME (even for harmonious BDSs). For harmonious BDSs, condition (1) and (3) of Lemma 15 are verifiable in polynomial time.
Next, we show that every local optimal OBF can be represented by a BDS, thereby obtaining that BDSs (satisfying the conditions in Lemma 15) are a complete representation of local optimal OBFs. Let Q be a CQ and let f be an OBF for Q. We call a fact f semi-static for f when there is an atomic type ω and an instance I such that f ∉ f(I ∪ {f}) and type(f) = ω. That is, f has an atomic type and there is an instance for which f is not broadcast. We say that a semi-static fact f (for f) depends on a fact g when f ∉ f(I ∪ {f}) implies g ∈ I for every instance I. With every semi-static fact f, we associate the set D_f containing exactly all facts on which f depends. Thus, f ∉ f(I ∪ {f}) implies D_f ⊆ I.
We make use of the following lemma in the proof of Theorem 18.
Lemma 17. Let Q be a CQ, and f be a local optimal OBF for Q. Let f be semi-static for f. Then, f ∉ f(D_f ∪ {f}). Furthermore, g ∈ D_f implies
1. g is semi-static and g ∼_Q f;
2. Adom(g) ⊆ Adom(f);
3. Vars(atom(g)) ⊆_{type(g)} Vars(atom(f)); and
4. g = V_{f,type(g)}(atom(g)).
We are now ready to prove completeness. The proof of the following theorem shows that the formalism of BDS that only uses complete atomic types can already represent every local optimal OBF.
Theorem 18 (Completeness). Let Q be a CQ and f a local optimal OBF for Q. Then, there is a BDS S for Q such that f = f_S.
Proof. We start by noting that if f is semi-static for f, then every g with type(f) = type(g) is semi-static for f as well. Therefore, we say that an atomic type τ is semi-static for f when there is a semi-static fact f with type(f) = τ. The proof is by construction. Let S be the set of pairs (τ, D_τ) where τ is semi-static for f and D_τ = Types(D_f), where f is a fact with atomic type τ.
We first show that S is a BDS and then that f = f_S. Notice that S has only finitely many pairs, because there are only finitely many distinct atomic types, and every set in Values(S) is finite by construction. Let (τ, T) ∈ S and τ′ ∈ T. By construction of S, τ is a semi-static atomic type for f and for every atomic type τ there is at most one pair (τ, T) ∈ S. Furthermore, T = D_τ. Let f be a fact of type τ. Then, f is a semi-static fact for f and there is a g ∈ D_f such that type(g) = τ′. By Lemma 17(3), Vars(atom(τ′)) = Vars(atom(g)) ⊆_{type(g)} Vars(atom(f)) = Vars(atom(τ)). So, S is a broadcast dependency set for query Q.
Next, we show that f = f_S. For this, we use that D_f = Dep(f, D_{type(f)}) for every semi-static fact f, which is shown at the end of the proof. Let f be a fact and I an instance such that f ∉ f(I ∪ {f}). If f has no atomic type, then it is never broadcast by f_S. So, assume f has an atomic type. Then it must be that D_f ⊆ I. However, because (type(f), D_{type(f)}) ∈ S and D_f = Dep(f, D_{type(f)}), Dep(f, S) ⊆ I. Hence, by definition of f_S, f ∉ f_S(I ∪ {f}).
For a fact f and an instance I where f ∈ f(I ∪ {f}), Lemma 9 implies that f has an atomic type. Either f is always broadcast by f, or it is semi-static for f. The former implies that there is no pair in S of the form (type(f), T). So, f is broadcast by f_S as well. The latter implies by Lemma 17 that D_f ⊄ I and there is a pair (type(f), D_{type(f)}) ∈ S. It remains to relate D_f and Dep(f, D_{type(f)}). In particular, for g ∈ D_f, it follows by Lemma 17(4) that g ∈ Dep(f, D_{type(f)}). For the reverse direction, let g ∈ Dep(f, D_{type(f)}), which implies type(g) ∈ D_{type(f)}. So, there must be some fact g′, which is of the same type as g, in D_f. In particular, because D_f ⊆ Dep(f, D_{type(f)}), g′ = V_{f,type(g′)}(atom(g′)). However, because g = V_{f,type(g)}(atom(g)), atom(g) = atom(g′), and type(g′) = type(g), it must be that g = g′. So, indeed g ∈ D_f.

Algorithms for constructing a BDS
Lemma 13 and Lemma 15 yield a natural algorithm for constructing a local optimal OBF for a given conjunctive query Q by simply starting from S = ∅ and adding new pairs one at a time until no more pairs can be added. More formally, we introduce the algorithm bds-build(Q).
Remark. By construction, bds-build(Q) prevents any circular dependencies by stratifying the construction of S so that partial atomic types can only depend on partial atomic types that were added before. As illustrated in Example 12(4), dependencies in a BDS can also be circular. To allow for these, bds-build can be modified as follows: as an alternative to adding pairs (τ, T) where every existing key that is compatible with τ is included in T, we can allow adding pairs where some keys that are compatible with τ are in T, and for every other compatible key, the respective value set is expanded to contain τ; i.e., allowing pairs of the form (ω, D), where D is a subset of C = {ω′ ∈ Keys(S) | ω′ ∼_Q ω} satisfying Vars(ω′) ⊆_{ω′} Vars(ω) for every ω′ ∈ D, and where every existing pair (ω′, T), where ω′ ∈ C \ D, is expanded to (ω′, T ∪ {ω}). Notice in particular that when a given BDS S is changed to S′ by adding a pair and expanding at least one of the existing pairs as described above, the inherent nature of the described OBF changes, so that not necessarily f S f S .
Remark. Although the machinery developed throughout this paper is motivated by gaining a better understanding of the spectrum of local optimal OBFs, the reader may notice that when no (statistical) information on the actual distribution of the data is available, there is no basis to favor one local optimal OBF over another.
In fact, there is already a very simple algorithm to find an arbitrary local optimal OBF for a given CQ Q which is as good as any local optimal one (when no additional information on the distribution of the data is available). Indeed, consider an arbitrary order on the predicates of Q. For every local fact f, with predicate R, if there is an earlier predicate S such that some variable in Vars(S) is not in Vars(R), f is broadcast; otherwise, f is broadcast only if all the facts induced by V_f on query Q are in the local instance.
Of course, not every local optimal OBF can take this form.

Discussion
We investigated local optimal oblivious broadcasting functions represented by the formalism of broadcast dependency sets. We obtained semantic and syntactic characterizations, showed completeness of BDSs for representing local optimal OBFs, and gave an algorithm for constructing local optimal OBFs for a given conjunctive query. We present several directions for future work: more expressive query languages, incorporating background knowledge, and non-oblivious broadcast functions. An obvious question is how to generalize our results to the class of all conjunctive queries (possibly extended with negation) or even to (subsets of) Datalog. Of course, to evaluate non-monotonic queries in a coordination-free manner, computing nodes need more information on how data is distributed (cf. [6]).
We only discussed how to build a BDS when no information about the way data is distributed is available. Indeed, the best one can do is to let a BDS cover as many types as possible, but at the same time introduce as few dependencies as possible, as these are likely to fail when data is arbitrarily distributed. It would be interesting to devise optimal broadcasting algorithms that take more background knowledge into account, such as information about clustering of attributes, foreign keys, or cardinality of relations.
Another interesting direction for future work is to investigate non-oblivious broadcasting functions where, over time, when new messages arrive, static facts can become broadcast facts (but not vice versa). Such functions are initially more conservative, keeping more facts static and only broadcasting facts when there is some evidence that they can be used at another computing node. For instance, consider the setting of Example 1. Rather than immediately sending B(i, j) whenever A(j, i) is locally absent, broadcasting is suspended until a C-fact of the form C(j, k) is received. The rationale is that a B-fact that cannot contribute to a locally satisfying valuation should only be broadcast when some evidence is received that it could potentially contribute to a satisfying valuation on a remote node. For our example this means that c waits to send B(2, 1) until C(1, 3) arrives. Moreover, B(4, 4) is never sent. While non-oblivious strategies might seem more attractive as they transmit fewer tuples, such strategies, while remaining coordination-free, can increase the overall evaluation time.
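The contrast between the two policies for B-facts can be sketched in a few lines of Python. This is only an illustration of the rule described in the paragraph above, not a general non-oblivious strategy: facts are encoded as hypothetical triples (predicate, first value, second value), and the local instance used in the usage note is an assumed one chosen to match the behaviour described for B(2, 1) and B(4, 4).

```python
from typing import Set, Tuple

# Hypothetical encoding of a binary fact: (predicate, first, second).
Fact = Tuple[str, int, int]

def oblivious_b_facts(local: Set[Fact]) -> Set[Fact]:
    """Oblivious rule: B(i, j) is sent whenever A(j, i) is locally absent."""
    return {
        f for f in local
        if f[0] == "B" and ("A", f[2], f[1]) not in local
    }

def non_oblivious_b_facts(local: Set[Fact], received: Set[Fact]) -> Set[Fact]:
    """Non-oblivious rule: a candidate B(i, j) is held back until some
    C(j, k) has been received from another node."""
    # First components j of all C(j, k) facts received so far.
    seen_j = {f[1] for f in received if f[0] == "C"}
    return {f for f in oblivious_b_facts(local) if f[2] in seen_j}
```

With the assumed local instance {B(2, 1), B(4, 4)}, the oblivious rule sends both facts at once, whereas the non-oblivious rule sends nothing until C(1, 3) is received, after which it sends only B(2, 1). B(4, 4) stays local as long as no fact of the form C(4, k) arrives.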