
PAC: Computing Join Queries with Semi-Covers

Heba Aamer, Vrije Universiteit Brussel, Belgium
Bas Ketsman, Vrije Universiteit Brussel, Belgium
Abstract

An increased and growing interest in large-scale data processing has triggered a demand for specialized algorithms that thrive in massively parallel shared-nothing systems. To answer the question of how to efficiently compute join queries in this setting, a rich line of research has emerged specifically for the Massively Parallel Communication (MPC) model. In the MPC model, algorithms are executed in rounds, with each round consisting of a synchronized communication phase and a separate local computation phase. The main cost measure is the load of the algorithm, defined as the maximum number of messages received by any server in any round.

We study worst-case optimal algorithms for the join query evaluation problem in the constant-round MPC model. In the single-round variant of MPC, the worst-case optimal load for this problem is well understood and algorithms exist that guarantee this load for any join query. In the constant-round variant of MPC, queries can often be computed with a lower load compared to the single-round variant, but the worst-case optimal load is only known for specific classes of join queries, including graph-like and acyclic join queries, and the associated algorithms use very different techniques. In this paper, we propose a new constant-round MPC algorithm for computing join queries. Our algorithm is correct for every join query and its load matches (up to a polylog factor) the worst-case optimal load for at least all join queries that are acyclic or graph-like.

Keywords and phrases:
Worst-case optimal load, MPC model, join queries
Copyright and License:
© Heba Aamer and Bas Ketsman; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Distributed computing models; Theory of computation → Logic and databases; Theory of computation → Abstract machines
Funding:
This work is partially funded by FWO-grant G062721N.
Editors:
Sudeepa Roy and Ahmet Kara

1 Introduction

In this paper, we study the query evaluation problem in distributed shared-nothing systems with very large numbers of servers. More precisely, we present a new algorithm for the computation of join queries (also called full self-join free conjunctive queries) in the MPC model. The MPC model is a well-established model within the database theory community for studying distributed algorithms. The model takes the number p as a parameter, representing the number of available servers. Servers have no direct access to each other’s memory but they are connected through a network and can communicate with each other via private (one-to-one) message-passing. Algorithms for the MPC model are executed in rounds, with each round consisting of a local computation phase followed by a global communication phase. During the local computation phase, every server performs a computation over the data it has locally. During this phase, no messages are sent or received. During the communication phase, every server can send messages to other servers in the network. During this phase, no computation is performed by the servers. The communication phase ends with a global synchronization step after which all messages sent during the communication phase arrive in batch at their destination. We refer to the number of messages (i.e., tuples) received by a server during the communication phase as the server’s load in that round.

The cost of an algorithm in the MPC model is measured in terms of maximum load and number of rounds (the number of synchronization barriers, to be precise). By maximum load, we mean the maximum load of any server during any of the algorithm’s rounds. In this paper, we focus on worst-case optimality results for join queries. That is, we are interested in algorithms that optimize the maximum load w.r.t. given cardinality constraints rather than for the specific database instance over which the query is computed. We assume that initially the database instance D is evenly (and randomly) partitioned over the p servers, hence requiring a linear load of |D|/p with |D| being the number of tuples in D, which is essentially the best load one can hope for. Clearly, for the join query evaluation problem, a linear load of |D|/p in the worst-case will be unrealistically low for most queries. In this context, we remark that some care is necessary with respect to the number of allowed rounds, as with p rounds one server can learn the entire database with a load as low as |D|/p by having every other server send all its data in a round, after which computing any query over D becomes trivial. For this reason, it is common to add a restriction on the number of rounds to the model, with single-round and constant-round algorithms being of particular interest in the literature.

(a) The hypergraph of q.
(b) Optimal fractional vertex cover for q.
(c) Query q∖{e,g} (thin and thick hyperedges) and (q∖{e,g}) (only thin hyperedges).
Figure 1: Illustration of concepts on join query q := R1[a,b] ⋈ R2[b,c,d,e] ⋈ R3[b,e,f] ⋈ R4[e,f,g] ⋈ R5[g,h] ⋈ R6[g,i] ⋈ R7[h,i].

An algorithm in the MPC model evaluates a join query correctly when every tuple in the output is produced by at least one of the servers in some round. Several lower bounds for the join query evaluation problem in the MPC model are known. Since many of these bounds [10, 7, 12, 6, 13] rely on the assumption that relation instances have equal cardinalities m, we will also assume this condition throughout the paper. In other words, |D| = r·m, with r the number of relations of the query and m the cardinality of the individual relation instances. The bounds relevant to this paper are all of the form |D|/p^{1/k}, with k a number depending on the structure of the target query, where k varies depending on the considered variant of the problem. It is noteworthy that the bounds are in terms of |D| (and not the number of bits) because we only consider tuple-based algorithms, similar to other works.

Early work mostly focuses on single-round MPC algorithms [2, 4, 5, 3, 10] and in this variant of the model the worst-case load is now well understood. An algorithm of particular importance in this context is HyperCube [2, 4], also called the shares algorithm. The HyperCube algorithm is parameterized with an assignment of values for the query attributes that represent their shares. These shares are often computed based on a fractional vertex cover, which is a function assigning rational weights between 0 and 1 to the attributes of the query such that, for every relation, the sum of weights assigned to its attributes is at least 1 (i.e., the relation is covered). An example of a fractional vertex cover of the query q, whose hypergraph is in Figure 1(a), is given in Figure 1(b). The sum of the assigned weights is called the weight of the fractional vertex cover. The choice of cover influences the load that HyperCube will guarantee. More precisely, the lower the weight of the chosen vertex cover, the lower the load. The optimal (i.e., least) weight is called the fractional vertex covering number, denoted τ. The fractional vertex cover in Figure 1(b) is an example of an optimal cover for q and hence τ(q)=3. When HyperCube is parameterized with an optimal fractional vertex cover, the algorithm is optimal in the sense that its load matches a |D|/p^{1/τ} lower bound [5] up to a polylog factor and with high probability when the input database is skew-free. Skew-freeness means that the degrees of attribute values (and by extension, partial tuples over attributes) do not exceed certain threshold values. While the precise definition of the threshold values is unimportant for the discussion, we remark that they depend on the considered fractional vertex cover. Intuitively: a higher weight on an attribute requires a lower degree for the attribute's values. In particular, a weight of 0 means that no constraints apply to the attribute's values.
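The routing step of a shares-style algorithm can be sketched as follows. This is our own minimal reading of HyperCube, not code from the paper: servers form a grid with one axis per attribute, a tuple's known attributes fix the corresponding coordinates by hashing, and the tuple is replicated along the axes of the attributes it misses. The toy triangle query and all names are illustrative assumptions.

```python
import itertools

def hypercube_destinations(tup, rel_attrs, shares, order):
    """Return the coordinate tuples (servers) that receive `tup`, a tuple of a
    relation with attributes `rel_attrs`. `shares` maps attribute -> share,
    `order` fixes the coordinate order of the server grid."""
    coords = []
    for alpha in order:
        if alpha in rel_attrs:
            # fixed coordinate: hash the tuple's value for this attribute
            coords.append([hash(tup[alpha]) % shares[alpha]])
        else:
            # free coordinate: replicate the tuple along the whole axis
            coords.append(range(shares[alpha]))
    return list(itertools.product(*coords))

# Toy triangle query R(a,b), S(b,c), T(a,c) with shares 2*2*2 = p = 8 servers.
shares = {"a": 2, "b": 2, "c": 2}
order = ("a", "b", "c")
# An R-tuple misses attribute c, so it is replicated along the c-axis.
dests = hypercube_destinations({"a": 1, "b": 7}, {"a", "b"}, shares, order)
```

The load analysis in the text then bounds how many tuples can hash to any single coordinate combination.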

Without constraints on the database, the worst-case optimal load of single-round MPC algorithms is |D|/p^{1/ψ} and an algorithm exists that can achieve it w.h.p. and up to a polylog factor [10]. The number ψ is the edge quasi-packing number and equals the maximum fractional vertex covering number of residuals of the target query. A residual of a query q is the query obtained after removing some of the attributes of q (from all relations that have these attributes). Since every query is a residual of itself, it is immediate that τ ≤ ψ. An example of a residual query of q is given in Figure 1(c). A simple algorithm to compute join queries with load |D|/p^{1/ψ} [10] goes as follows: first partition the database into fragments such that in each fragment, for every attribute of the query, the values for that attribute either all have a low degree or all have a high degree (with respect to some threshold value). Given a specific such database fragment, say with H the set of heavy attributes, HyperCube is applied parameterized with a fractional vertex cover that assigns weight 0 to all the attributes in H. Since the threshold values will guarantee that the fragment is skew-free w.r.t. any fractional vertex cover assigning weight 0 to the attributes in H, the load of this computation is decided by the weight of the considered fractional vertex cover, and is thus at best dependent on the fractional vertex covering number of the residual of q based on H. Since the total number of considered fragments is constant (under data complexity), the overall algorithm computes the target query over all fragments in parallel using the same set of p servers.

The main advantage of using multiple rounds lies in the key insight that the evaluation of some (residual) queries can be simplified by computing semi-joins in one round with linear load. For example, the evaluation of query q∖{e,g} in Figure 1(c) over some fragment can be simplified into the evaluation of query (q∖{e,g}) in the same figure (over a modification of the fragment with similar size) by applying semi-joins, hence lowering the load from |D|/p^{1/4} (as in the single-round algorithm, because τ = 4) to |D|/p^{1/3}. Leveraging this technique, multi-round join algorithms rely on the following reduction procedure: a set H of heavy attributes is chosen based on some threshold values. Now, suppose there are s possible heavy tuples over the H attributes in the considered fragment. The set of available servers gets equally divided into s (disjoint) groups such that each heavy tuple 𝒉 gets a dedicated group of servers that is responsible for computing the query over the considered fragment only when the attributes of H are fixed as per the values in 𝒉. The H attributes are then dropped from the query and, over every group of servers, semi-joins are applied. How to proceed and evaluate the simplified query afterwards varies depending on the algorithm under consideration.
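The server-allocation step of the reduction procedure just described can be sketched concretely. This is a hypothetical helper of our own (not from the paper): each heavy tuple over the attributes in H receives an equal, disjoint group of the p servers.

```python
def allocate_server_groups(heavy_tuples, p):
    """Evenly split servers 0..p-1 into one disjoint group per heavy tuple.
    Assumes p is large relative to the number of heavy tuples; any remainder
    servers are simply left idle in this sketch."""
    s = len(heavy_tuples)
    size = p // s
    return {h: list(range(i * size, (i + 1) * size))
            for i, h in enumerate(heavy_tuples)}

# Three heavy tuples over H share p = 12 servers: 4 servers each.
groups = allocate_server_groups([("h1",), ("h2",), ("h3",)], 12)
```

Each group then evaluates the residual query (with the H attributes fixed to its heavy tuple) independently of the other groups.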

For the multi-round variant, a few results are known and several questions remain open. It is known that the worst-case optimal load is at least |D|/p^{1/ρ}, where the number ρ is the fractional edge covering number of the query. This lower bound was obtained in [10], and for some join queries it is tight. This is the case for all graph-like queries (i.e., join queries involving relations with at most two attributes) [10, 7, 12, 11, 8] as well as for all join queries that are acyclic [6, 13]. For a while, it was thought that |D|/p^{1/ρ} could be the optimal load in general, but this idea was recently debunked. In particular, it is shown in [6] that |D|/p^{1/τ} is a tighter lower bound for a specific class of cyclic queries over relations with an arbitrary number of attributes. We remark that both graph-like and acyclic join queries have an important property in common, namely τ ≤ ρ (the property τ ≤ ρ for reduced acyclic queries was only conjectured in [6]; interestingly, our construction in Section 6.2 proves this conjecture as a side result), while for the class of queries considered in [6] we have ρ < τ. It is worth noting that the worst-case optimal algorithms proposed in the literature for either the graph-like or the acyclic queries cannot straightforwardly be generalized to arbitrary join queries. Thus, it remained open whether there exists a unified multi-round MPC algorithm that works on arbitrary join queries. Later, in [11] an algorithm was introduced that can compute all join queries with a load based on a newly specified number. This number strictly exceeds ψ for some queries, and hence on such queries using more rounds incurs more load. For this reason, and for brevity, we do not mention this algorithm further in this paper; we will compare it against ours in future work.

In this paper, we propose a new distributed join algorithm whose load is bounded by |D|/p^{1/γ} up to a polylog factor and with high probability, with γ a new query-related number that we call PAC. Our algorithm runs in three rounds and is relatively simple, yet it is powerful enough to show the following:

  1. We show that ρ ≤ γ ≤ ψ for all join queries. Moreover, we show that γ ≤ ρ for acyclic and graph-like join queries, i.e., our algorithm is worst-case optimal for all classes of queries for which the worst-case optimal load is currently known (with one exception discussed in Section 7).

  2. We also show that ρ = γ < ψ (and hence worst-case optimality) for some join queries that are not considered by any previous algorithm.

PAC refers to three main concepts (Patches, Anchors, and semi-Covers) that our number, γ, is based on and our algorithm is parameterized with. A high-level overview: both patches and anchors are sets of sets of attributes from the query, and semi-covers are a relaxation of normal fractional vertex covers in which only a subset of the relations of the query should be covered. Similar to before, our algorithm evaluates a query over a database by considering the different fragments separately. Over a particular fragment, the algorithm is parameterized with a choice of patches, anchors, and a semi-cover. The algorithm then generalizes the aforementioned reduction procedure in several aspects. The attributes in the chosen patches are simply the attributes of the set H. This results in evenly dividing the set of available servers among the possible heavy tuples over the attributes in H. Before applying the semi-joins, the algorithm performs the following extra step: every group of servers, for some heavy tuple, gets unevenly divided based on the attributes in the chosen anchors such that the higher the degree of the values in the anchors, the larger the subgroup of servers it gets assigned. The semi-join step gets applied in every subgroup separately, and afterwards, the simplified query gets evaluated using HyperCube parameterized with the chosen semi-cover and extra shares to cover the rest of the query.

The definition of γ is more complicated than the earlier considered numbers, but links quite directly to the underlying algorithm, similar to how the worst-case optimal single-round MPC algorithm relates to the definition of ψ. While this number is not quite as elegant as τ, ρ, or ψ, we believe that it brings us another step closer to a worst-case optimal algorithm for all join queries.

The paper is organized as follows: We give the preliminaries in Section 2. In Section 3 we introduce PAC, the central number in this paper. In Section 4 and Section 5 we introduce algorithmic techniques followed by the algorithm underlying γ. In Section 6 we relate γ to τ, ρ, and ψ. We conclude in Section 7. (Due to space limitations, most of the proofs are omitted and will appear in the full version of the paper.)

2 Preliminaries

For a set S, 𝒫+(S) := {A | ∅ ≠ A ⊆ S} denotes the set of all non-empty subsets of S. We write [0,1] as abbreviation for the set {i ∈ ℚ≥0 | 0 ≤ i ≤ 1} of rationals between (and including) 0 and 1.

Relations and join queries

Let rels and attrs be, respectively, an infinite domain of relation names and attribute names. For convenience of notation, specifically to avoid non-determinacy in some of our notations, we assume a total order ⪯rels over rels.

A join query q is a pair (relsq, attrsq) consisting of a finite set relsq ⊆ rels of relation names and a mapping attrsq associating each relation name R ∈ relsq to a finite set attrsq(R) of attribute names from attrs. (Our definition of a join query disallows self-joins and repeated occurrences of attributes in relations, which are restrictions also imposed by all previous papers on the topic.)

For a join query q, we write attrs(q) and ins(q) (where ins stands for incidence (attribute) subsets) to denote:

attrs(q) := {α | α ∈ attrsq(R), R ∈ relsq} and
ins(q) := {A ∈ 𝒫+(attrs(q)) | A ⊆ attrsq(R) for some R ∈ relsq},

where the former is the set of all attributes appearing in the relations of q, and the latter is the set of sets of attributes of q appearing together in at least one relation of q. For consistency, we also write rels(q) instead of relsq.

Two attributes α, β from attrs(q) are called adjacent if they are different and there is a relation R ∈ rels(q) with {α,β} ⊆ attrsq(R). Moreover, for two different relations R1, R2 ∈ rels(q), we say that R1 is reducible into R2 subject to a set of attributes A ⊆ attrs(q), if attrsq(R1) ∖ A ⊆ attrsq(R2). When A = ∅ we simply say that R1 is reducible into R2. Finally, we say that q is reduced if it has no reducible relations. When q is not reduced, we will write (q) to denote the reduced query of q, which is the query obtained from q by removing all reducible relations. (We assume that reductions are done using some deterministic procedure, making use of ⪯rels to break ties, such that (q) defines a unique query.)

Tuples and instances

A tuple 𝒕 over a finite set of attributes A ⊆ attrs is a function from A to dom. We will use the term A-tuple to refer to a tuple 𝒕 with specific domain A (in other words, 𝒕 is defined precisely over the attributes of A). A finite set of tuples over the same set of attributes A is called a relation instance over A. For a tuple 𝒕 over A and a subset B ⊆ A of its attributes, we write 𝒕[B] to denote the tuple over B with, for every α ∈ B, 𝒕[B](α) = 𝒕(α), in which case we say that 𝒕 is an extension of 𝒕[B]. By convention, we consider every tuple an extension of itself and of the empty tuple. We say that an A-tuple 𝒕 and a B-tuple 𝒖 are consistent if 𝒕[A∩B] = 𝒖[A∩B].
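The tuples-as-functions view above translates directly into code. A minimal sketch, with dicts standing in for tuples (our own illustration, not from the paper): projection 𝒕[B] restricts the dict to B, and consistency checks agreement on the common attributes.

```python
def project(t, B):
    """t[B]: restrict tuple t (a dict attr -> value) to the attributes in B."""
    return {a: t[a] for a in B}

def consistent(t, u):
    """An A-tuple and a B-tuple are consistent iff they agree on A ∩ B."""
    common = t.keys() & u.keys()
    return all(t[a] == u[a] for a in common)

t = {"a": 1, "b": 2, "c": 3}   # a tuple over {a, b, c}
u = {"b": 2, "d": 4}           # a tuple over {b, d}
```

Note that two tuples with disjoint attribute sets are vacuously consistent, matching the convention that every tuple extends the empty tuple.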

A (database) instance D for join query q is a mapping associating every relation name R ∈ rels(q) with a relation instance R^D over attrsq(R). We write |D| := Σ_{R∈rels(q)} |R^D| to denote the total number of tuples in D.

For a tuple 𝒕 over A ∈ ins(q), we use the following notations:

deg_R^D(𝒕) := |{𝒖 ∈ R^D | 𝒖[A] = 𝒕}| if A ⊆ attrsq(R), and 0 otherwise;
deg^D(𝒕) := Σ_{R∈rels(q)} deg_R^D(𝒕); and
tup^D(A) := {𝒖[A] | 𝒖 ∈ R^D, A ⊆ attrsq(R), R ∈ rels(q)},

denoting, respectively, the number of tuples in R^D that are extensions of 𝒕; the number of tuples in D that are extensions of 𝒕; and the set of all A-tuples for which an extension in D exists.
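The three notations can be sketched directly over dict-based tuples. The database layout (a dict from relation names to lists of tuples) and the sample data are our own illustrative assumptions.

```python
def deg_R(D, q_attrs, R, t):
    """deg_R^D(t): extensions of t in R^D, or 0 if t's attributes don't all
    occur in R. `q_attrs` maps relation name -> set of its attributes."""
    if not set(t) <= q_attrs[R]:
        return 0
    return sum(1 for u in D[R] if all(u[a] == v for a, v in t.items()))

def deg(D, q_attrs, t):
    """deg^D(t): extensions of t summed over all relation instances."""
    return sum(deg_R(D, q_attrs, R, t) for R in D)

def tup(D, q_attrs, A):
    """tup^D(A): all A-tuples having an extension in D (as hashable items)."""
    out = set()
    for R in D:
        if A <= q_attrs[R]:
            for u in D[R]:
                out.add(tuple(sorted((a, u[a]) for a in A)))
    return out

q_attrs = {"R1": {"a", "b"}, "R2": {"b", "c"}}
D = {"R1": [{"a": 1, "b": 2}, {"a": 3, "b": 2}], "R2": [{"b": 2, "c": 5}]}
```

For instance, the tuple (b: 2) has two extensions in R1 and one in R2.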

Join query semantics

Given an instance D for some join query q and a tuple 𝒕 over A ⊆ attrs(q), we say that 𝒕 is consistent with D if, for every R ∈ rels(q), R^D contains a tuple that is consistent with 𝒕. We denote the set of all A-tuples consistent with D as joins^D(A) := {𝒕 | 𝒕 is an A-tuple consistent with D}.

The output of a join query q over an instance D can then be defined as q^D := joins^D(attrs(q)).
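These semantics can be checked by brute force: enumerate candidate A-tuples over the active domain and keep those consistent with every relation instance. This is our own exponential-time illustration of the definition, not an efficient evaluation method.

```python
import itertools

def consistent(t, u):
    return all(t[a] == u[a] for a in t.keys() & u.keys())

def joins(D, A):
    """joins^D(A): all A-tuples consistent with every relation instance in D.
    Brute force over the active domain, for illustration only."""
    A = sorted(A)
    dom = sorted({v for rows in D.values() for u in rows for v in u.values()})
    out = []
    for vals in itertools.product(dom, repeat=len(A)):
        t = dict(zip(A, vals))
        if all(any(consistent(t, u) for u in D[R]) for R in D):
            out.append(t)
    return out

# Toy instance for a two-relation query R1(a,b), R2(b,c).
D = {"R1": [{"a": 1, "b": 2}], "R2": [{"b": 2, "c": 3}]}
result = joins(D, {"a", "b", "c"})  # the full join output q^D
```

Taking A = attrs(q) recovers q^D, as in the definition above.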

Weight mappings

In this paper we will often consider weight mappings for join queries q, which are functions f : A → [0,1] over some set of attributes A ⊆ attrs(q) associating a rational weight f(α) ∈ [0,1] to every attribute α ∈ A. Remark that we do not require a weight mapping to assign weights to all attributes of the query. Henceforth, given a particular weight mapping f for q, we will use Af to denote the set of attributes f is defined over. For simplicity of notation, we often write f(B), with B ⊆ Af a set of attributes, to denote the sum f(B) = Σ_{α∈B} f(α) of the respective attribute weights. We remark that f(B) can be larger than 1. Moreover, we associate a cost c(f) to a weight mapping f with c(f) := max{1, f(Af)}.

We say that a weight mapping f for q covers a relation R ∈ rels(q) if f(attrsq(R) ∩ Af) ≥ 1. Given a weight mapping f and a join query q (where f is not necessarily a weight mapping for q), we define cover_{q,f} to be the join query with:

rels(cover_{q,f}) := {R ∈ rels(q) | R is covered by f} and
attrs_{cover_{q,f}}(R) := attrsq(R), for every R ∈ rels(cover_{q,f}).

Consistent with the literature, we call f a fractional vertex cover for q if Af = attrs(q) and f covers every relation of q (in which case cover_{q,f} is query q itself and we say that f is a cover for q). (We note that every fractional vertex cover for a query q is a cover for q and that every cover f for q can be extended to a fractional vertex cover f′ of q with the same cost (i.e., f(Af) = f′(attrs(q))) by assigning a weight f′(α) = 0 to attributes α ∈ attrs(q) ∖ Af.)
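The covering definitions above are simple enough to sketch directly; the helper names below are our own, not notation from the paper.

```python
def f_sum(f, B):
    """f(B): sum of weights of the attributes of B that f is defined on."""
    return sum(f.get(a, 0) for a in B)

def covers(f, rel_attrs):
    """f covers a relation iff its attributes' weights sum to at least 1."""
    return f_sum(f, rel_attrs) >= 1

def cover_query(q_attrs, f):
    """rels(cover_{q,f}): the relations of q covered by f."""
    return {R for R, A in q_attrs.items() if covers(f, A)}

def cost(f):
    """c(f) := max(1, f(A_f))."""
    return max(1, sum(f.values()))

q_attrs = {"R1": {"a", "b"}, "R2": {"b", "c"}}
```

For example, the weight mapping {b: 1} covers both relations of this toy query, while {a: 0.5} covers neither (yet still has cost 1, since the cost is at least 1 by definition).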

Residual queries

For a join query q and subset A ⊆ attrs(q), the residual query q∖A is the join query with properties

rels(q∖A) := {R ∈ rels(q) | attrsq(R) ⊈ A} and
attrs_{q∖A}(R) := attrsq(R) ∖ A, for every R ∈ rels(q∖A).

In Figure 1(c), we give the residual query q∖{e,g}. Notice that q∖{e,g} also exemplifies that the residual query of a reduced query is not necessarily reduced.
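Both the residual operation and the reduction step from Section 2 can be sketched on hypergraphs given as dicts from relation names to attribute sets. The code below is our own illustration; it reproduces, on the paper's query q from Figure 1, the observation that the residual w.r.t. {e,g} is no longer reduced.

```python
def residual(q_attrs, A):
    """The residual query: drop A from every relation; relations whose
    attributes are fully contained in A disappear."""
    return {R: attrs - A for R, attrs in q_attrs.items() if not attrs <= A}

def reduce_query(q_attrs):
    """Drop every relation reducible into another (deterministic tie-break
    by relation name, standing in for the total order on rels)."""
    def reducible(R):
        return any(S != R and q_attrs[R] <= q_attrs[S] and
                   (q_attrs[R] < q_attrs[S] or S < R)
                   for S in q_attrs)
    return {R: A for R, A in q_attrs.items() if not reducible(R)}

# The query q of Figure 1.
q = {"R1": {"a", "b"}, "R2": {"b", "c", "d", "e"}, "R3": {"b", "e", "f"},
     "R4": {"e", "f", "g"}, "R5": {"g", "h"}, "R6": {"g", "i"},
     "R7": {"h", "i"}}
res = residual(q, {"e", "g"})
```

After removing {e,g}, relation R4 shrinks to {f} ⊆ {b,f} = R3, and R5, R6 shrink into R7, so reduction removes R4, R5, and R6, matching the thin hyperedges of Figure 1(c).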

3 Towards PAC

We define the PAC number for join queries. The PAC number is defined relative to a sequence 𝒓 of rationals that we call a spectrum. More precisely, a spectrum 𝒓 = r1, …, rk is a finite sequence of strictly increasing weights from [0,1] with r1 = 0 and rk = 1.

3.1 Configurations

Figure 2: A visual depiction of configuration 𝒞 of q in which grey-scale shades represent spectrum values, with white, grey, and black representing, respectively, the values 1, 1/2, and 0. Remark that the absence of a shade between sets of attributes thus means a weight of 1. For instance, 𝒞({g}) = 0, 𝒞({b}) = 1/2, 𝒞({d}) = 1, 𝒞({b,c}) = 1/2, while 𝒞({c,e}) = 1.

A configuration 𝒞 of a join query q is a monotone mapping 𝒞 : ins(q) → [0,1] that assigns a weight to every set of attributes in q appearing together in at least one relation of q. By monotone we mean that for sets A, B ∈ ins(q) with B ⊆ A: 𝒞(B) ≤ 𝒞(A). In addition, we require 𝒞(attrsq(R)) = 1 for relations R ∈ rels(q) for which no relation R′ with attrsq(R) ⊊ attrsq(R′) exists. We say that 𝒞 adheres to 𝒓 if its image contains only weights occurring in 𝒓, i.e., 𝒞(A) ∈ 𝒓 for every A ∈ ins(q).

We will use configurations as a type of histogram for database fragments. More precisely, 𝒞(B) = ri will mean that the number of tuples extending any specific B-tuple is below a threshold value based on ri and above a threshold value based on the element following ri in 𝒓 (if 𝒞(B) < 1). The precise threshold values are unimportant for the moment and will be defined in Section 4.

We call a set of attributes A ∈ ins(q) heavy w.r.t. 𝒞 if 𝒞(A) < 1, and we call an individual attribute α ∈ attrs(q) heavy if {α} is heavy. We say that a relation R ∈ rels(q) is heavy w.r.t. 𝒞 if all its attributes are heavy w.r.t. 𝒞. We denote the sets of all heavy attributes and heavy relations of q w.r.t. 𝒞 by Heavyattrs(q,𝒞) and Heavyrels(q,𝒞), respectively.

Example 1.

An example of a configuration for q is depicted in Figure 2. For this configuration, 𝒞, Heavyattrs(q,𝒞)={a,b,c,e,g} and Heavyrels(q,𝒞)={R1}. The sets {b,c}, {a},{b},{c},{e},{g} of ins(q) are the (only) heavy sets of attributes for q w.r.t. 𝒞.

Under the presence of a configuration, we will not be interested in arbitrary weight mappings for q, but only in those that are compatible with 𝒞 in the following sense: a weight mapping f for q is compatible with 𝒞 if min{1, f(B)} ≤ 𝒞(B) for every B ∈ ins(q) with B ⊆ Af.
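The compatibility condition is a straightforward check, sketched below with configurations as dicts from frozensets of attributes to weights (our own encoding; the inequality direction follows the definition above, under which higher weights are only allowed on sets with higher configuration values, i.e., lower degrees).

```python
def weight(f, B):
    return sum(f[a] for a in B)

def compatible(f, C):
    """f (dict attr -> weight) is compatible with configuration C
    (dict frozenset-of-attrs -> weight) iff min(1, f(B)) <= C(B)
    for every B in ins(q) contained in A_f."""
    Af = set(f)
    return all(min(1, weight(f, B)) <= w for B, w in C.items() if B <= Af)

# Tiny configuration: {a} is maximally heavy (value 0), {b} is light (value 1).
C = {frozenset({"a"}): 0, frozenset({"b"}): 1}
```

This mirrors Example 2: any positive weight on a maximally heavy attribute such as a breaks compatibility, while weight 0 is always allowed.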

Example 2.

The weight mapping f for q given in Figure 1(b) is not compatible with configuration 𝒞 for q in Figure 2 because f(a) = 1/4 > 0 = 𝒞({a}). An example of a weight mapping for q that is compatible with 𝒞 is given in Figure 3. We remark though that this weight mapping is not a fractional vertex cover for q.

In our algorithm, compatible weight mappings will be used like fractional vertex covers in HyperCube over skew-free databases (a formal argument is given in Section 4.4). Since compatible weight mappings covering all relations of the query do not always exist, we next introduce two new concepts, anchors and patches, that, intuitively, make it possible to cover the remaining relations of the query. In fact, even when compatible weight mappings exist that cover the entire query, they are not necessarily the best choice cost-wise.

3.2 Anchors

We call a pair (X,Z) of non-empty sets of attributes of q an anchor for q and refer to the relations R ∈ rels(q) with X ∪ Z ⊆ attrsq(R) as the relations anchored by (X,Z). From now on we will only consider anchors for which at least one anchored relation exists. We remark that X and Z need not be disjoint.

For a set 𝒜 of anchors for q we write rels(𝒜) to denote the set of all anchored relations, that is, rels(𝒜) := {R ∈ rels(q) | X ∪ Z ⊆ attrsq(R) for some (X,Z) ∈ 𝒜}. We write H(𝒜) to denote the union of the Z's in 𝒜, that is, H(𝒜) := {α | α ∈ Z for some (X,Z) ∈ 𝒜}.

We say that an anchor (X,Z) of q is compatible with 𝒞 if 𝒞(X)=1 and 𝒞(Z)<1, and we call a (possibly empty) set of anchors for q that are compatible with 𝒞 an anchoring of q w.r.t. 𝒞. Moreover, we associate a cost c(𝒜) to such an anchoring 𝒜 with c(𝒜):=|𝒜|.

Example 3.

Set 𝒜 := {({b,e},{e})} is an example anchoring of q w.r.t. 𝒞 with rels(𝒜) = {R2, R3} and c(𝒜) = 1. Pairs ({b},{e}) and ({h},{i}) are examples of anchors for q not compatible with 𝒞.

3.3 Patches

Formally, a patch 𝒫 for q (w.r.t. 𝒞) is a subset of ins(q) whose elements are all heavy and pairwise disjoint. We write attrs(𝒫) for the set of all attributes occurring in elements of 𝒫, i.e., attrs(𝒫) := ∪_{B∈𝒫} B. We also associate a cost to every set B ∈ 𝒫 with c(B) := r_{i+1} where r_i = 𝒞(B). We recall that, by definition of a patch, 𝒞(B) < 1, hence r_{i+1} indeed always exists. The cost of a patch 𝒫, denoted c(𝒫), is then defined as c(𝒫) := Σ_{B∈𝒫} c(B).

Example 4.

For an example of a patch for q, consider the set 𝒫 = {{g}, {e}, {b,c}}. Then, for q w.r.t. 𝒞 adhering to spectrum 0, 1/2, 1, we have c(𝒫) = 5/2. However, if the considered spectrum had weights 0, 1/4, 1/2, 3/4, 1, then c(𝒫) would be 7/4. Furthermore, we remark that the sets {{b,e},{b}} and {{b,e}} cannot be chosen as patches for q w.r.t. 𝒞 because their elements are not disjoint (for the former) or not all heavy (for the latter).
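The patch-cost computation is mechanical enough to verify Example 4 directly. A minimal sketch (our own encoding, with exact rationals to avoid floating-point noise): each heavy set B costs the spectrum value following 𝒞(B), and the patch costs the sum.

```python
from fractions import Fraction

def next_in_spectrum(r, w):
    """The spectrum element following weight w; exists whenever w < 1."""
    return r[r.index(w) + 1]

def patch_cost(P, C, r):
    """c(P) = sum over B in P of the spectrum value following C(B)."""
    return sum(next_in_spectrum(r, C[B]) for B in P)

# Configuration values from Figure 2 for the sets used in Example 4.
C = {frozenset({"g"}): Fraction(0),
     frozenset({"e"}): Fraction(1, 2),
     frozenset({"b", "c"}): Fraction(1, 2)}
P = [frozenset({"g"}), frozenset({"e"}), frozenset({"b", "c"})]

coarse = [Fraction(0), Fraction(1, 2), Fraction(1)]
fine = [Fraction(0), Fraction(1, 4), Fraction(1, 2),
        Fraction(3, 4), Fraction(1)]
```

Under the coarse spectrum the costs are 1/2 + 1 + 1 = 5/2, and under the fine one 1/4 + 3/4 + 3/4 = 7/4, matching the example.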

3.4 Semi-Covers and Solutions

Finally, we combine all concepts in a solution for a query q w.r.t. a configuration 𝒞. Before giving the definition, we make formal what we expect from a weight mapping not covering all relations:

Figure 3: A visual depiction of the configuration 𝒞 along with a solution for q := R1[a,b] ⋈ R2[b,c,d,e] ⋈ R3[b,e,f] ⋈ R4[e,f,g] ⋈ R5[g,h] ⋈ R6[g,i] ⋈ R7[h,i] w.r.t. 𝒞.
Definition 5.

We call a weight mapping f a semi-cover for q w.r.t. a configuration 𝒞 of q, a set H ⊆ attrs(q), and a set S ⊆ rels(q), if Af ∩ H = ∅ and, for every relation R ∈ rels(q) ∖ Heavyrels(q,𝒞), there is a relation R′ ∈ rels(q) such that R is reducible into R′ subject to H, and R′ is either covered by f or in the set S.

Example 6.

Remark that f is a semi-cover for q w.r.t. any configuration of q, any set H, and any set S because f covers q (i.e., f is a fractional vertex cover for q).

For a more interesting example, consider the weight mapping f in Figure 3. We can see that R7 is the only relation that is covered by f. Now, take H to be {e,g} and S to be {R2, R3}. Then f is a semi-cover for q w.r.t. 𝒞, H, and S. Indeed, R1 ∈ Heavyrels(q,𝒞), R4 is reducible into R3 ∈ S subject to H, both R5 and R6 are reducible into R7 subject to H, and R7 is covered by f.

A solution for q w.r.t. 𝒞 is a triple (𝒜, f, 𝒫) with 𝒜 an anchoring of q w.r.t. 𝒞, 𝒫 a patch for q w.r.t. 𝒞, and f a compatible semi-cover for q w.r.t. 𝒞, the set H(𝒜) ∪ attrs(𝒫), and the set rels(𝒜). The cost we associate with a solution 𝒮 is defined as c(𝒮) := c(𝒜) + c(f) + c(𝒫).

3.5 The PAC Number

As mentioned earlier, configurations correspond to database fragments. We will show later that the cost of a solution w.r.t. a given configuration is the cost of computing the query over the corresponding fragment using the different components of the solution. We can now define the PAC number γ(q) for a join query q w.r.t. a spectrum 𝒓 (to be precise, γ(q) should be parameterized with the considered spectrum 𝒓; we omit this parameterization as it would significantly overload notation throughout the paper, and except where specified differently all results hold for any choice of 𝒓) as

γ(q) := max_𝒞 min_{(𝒜,f,𝒫)} { c(𝒜) + c(f) + c(𝒫) }

with 𝒞 ranging over all configurations of q adhering to 𝒓 and (𝒜, f, 𝒫) ranging over all possible solutions for q w.r.t. 𝒞. Intuitively, the formula for the PAC number states that the cost of a query q is determined (and bounded) by its most computationally difficult configuration. Since there can be multiple solutions w.r.t. the same configuration, we are only interested in a solution of minimal cost.

Example 7.

Following from Examples 3 and 6, we see that 𝒮 := (𝒜, f, 𝒫) with 𝒫 := {{g}} is a solution for q w.r.t. 𝒞 with c(𝒮) = 5/2. A visual depiction of 𝒮 is given in Figure 3. Notice that no other solution for q w.r.t. 𝒞 has a lower cost.

The reader may wonder whether the complexity of the PAC number is justified and whether all of its concepts are necessary. The answer to this question is open, but we will show in the full version of the paper that all obvious simplifications of our number (like restricting or forbidding anchors, considering only the spectrum 𝒓 = 0, 1, etc.) give a number that is strictly less tight than γ for some join queries.

The remainder of the paper is devoted to showing the below theorem and how γ relates to the existing numbers τ, ρ, and ψ.

Theorem 8.

Let q be a join query, D an instance, and 𝒓 a fixed spectrum. Then, q can be computed over D using p servers in three rounds with load 𝒪̃(|D|/p^{1/γ(q)}) with high probability.

4 General Techniques

In our algorithm, we will make use of techniques and insights that follow directly or with minor modifications from classical results in the literature. In this section, we state these results.

4.1 Database Fragments

Like several of the earlier proposed algorithms in the literature [10, 7, 12, 8], our algorithm will first divide the input database instance into sub-databases. This is where configurations adhering to a spectrum become useful: the consecutive weights in a spectrum 𝒓 = r1, …, rk define intervals of threshold values that can be used to partition sets of tuples based on their degrees. With this intuition in place, a configuration adhering to 𝒓 thus pinpoints an interval 𝒞(A) = ri for every set A ∈ ins(q). The intervals implied by a spectrum are based on the weights in 𝒓, but also take D, p, and q into account. Formally, a tuple 𝒕 ∈ tup^D(A) is in the interval ri of 𝒓 if its degree deg^D(𝒕) is below (or equal to) |D|/p^{ri/γ(q)} and (if i < k) strictly above |D|/p^{r_{i+1}/γ(q)}. Henceforth, we will refer to the subset of A-tuples in D in the interval 𝒞(A) as F^D(A,𝒞).
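The interval assignment above can be sketched as a small helper (our own illustration; γ(q) is simply passed in as a number). Since the weights in 𝒓 increase, the thresholds |D|/p^{r/γ} decrease, so a tuple's interval is the first ri whose lower threshold its degree exceeds.

```python
def interval(d, r, size_D, p, gamma):
    """The spectrum value r_i whose interval contains a tuple of degree d:
    |D|/p^(r_{i+1}/gamma) < d <= |D|/p^(r_i/gamma), with the last interval
    r_k = 1 catching all lower degrees."""
    for i in range(len(r) - 1):
        if d > size_D / p ** (r[i + 1] / gamma):
            return r[i]
    return r[-1]

# Toy numbers: |D| = 10000, p = 100, gamma = 2, spectrum 0, 1/2, 1.
# Thresholds: |D|/p^(1/4) ~ 3162 and |D|/p^(1/2) = 1000.
r = [0, 0.5, 1]
```

A fragment then keeps exactly the tuples whose degrees land in the intervals prescribed by the configuration.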

We can now define a database instance D′ for q with R^{D′} := {𝒕 ∈ R^D | 𝒕[A] ∈ F^D(A,𝒞) for every A ∈ ins(q) with A ⊆ attrsq(R)}. Intuitively, D′ contains all tuples from D that respect all degree constraints imposed by 𝒞. Henceforth, we will refer to D′ as Fragment(D,𝒓,𝒞). We remark that some care must be taken when interpreting the imposed degree constraints in the context of D′, as they are defined w.r.t. D. For example, for a tuple 𝒕 ∈ R^{D′} it does not necessarily follow from deg^D(𝒕) > m that deg^{D′}(𝒕) > m.

The next proposition, which states that the output of q over D is the same as the combined output of q over the different fragments of D, justifies our definition:

Proposition 9.

For a join query q, an instance D for q, and a spectrum 𝐫, we have

q^D = ∪_{𝒞 ∈ confs(𝒓)} q^{Fragment(D,𝒓,𝒞)}

with 𝒞 ranging over all configurations for q adhering to 𝐫. Moreover, when 𝐫 is fixed, |confs(𝐫)| is constant, with confs(𝐫) denoting the set of configurations adhering to 𝐫.

4.2 Tuple-Restricted Database Fragments

Sometimes we will consider a value assignment for some of the attributes of q (defined as a tuple 𝒉 over a subset A of attrs(q)), and then be interested in the subinstance of an instance D for q containing all and only those tuples of D consistent with 𝒉. Formally, this instance for q is defined as D_𝒉 := (R^{D_𝒉})_{R∈rels(q)} with R^{D_𝒉} := {𝒕 ∈ R^D ∣ 𝒕 consistent with 𝒉}. Henceforth, we refer to D_𝒉 as Subfragment(D,𝒉). Accordingly, we can easily see that computing q(D) amounts to the following:

Proposition 10.

Let q be a join query, D an instance for q, and A ⊆ attrs(q). Then q(D) = ⋃_{𝒕∈joins_D(A)} q(Subfragment(D,𝒕)).
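A minimal sketch of Subfragment and the consistency check it relies on (again an illustration of ours, with tuples as dicts): a tuple is consistent with 𝒉 when it agrees with 𝒉 on every attribute of 𝒉 that the tuple possesses.

```python
def consistent(t, h):
    """True iff tuple t agrees with the value assignment h on all shared attributes."""
    return all(t.get(a, v) == v for a, v in h.items())

def subfragment(db, h):
    """Subfragment(D, h): keep, per relation, only the tuples consistent with h."""
    return {rel: [t for t in tuples if consistent(t, h)] for rel, tuples in db.items()}
```

Proposition 10 then says that evaluating q over D equals the union of evaluating q over `subfragment(db, h)` for all joining A-tuples h.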

4.3 Semi-Joins and Reductions

It is well known that semi-joins and intersections can be computed with a single round of communication using p servers with a load not exceeding |D|/p w.h.p., and that guarded join queries (i.e., join queries having a single guard relation into which all other relations of the query are reducible) can be computed in precisely two rounds with the same load guarantee, independent of the number of reducible relations [10]. The latter follows from the observation that reducing a relation into its guard is a special type of semi-join, and all these semi-joins can be computed simultaneously in one round. Since after this step there exist many copies of the guard relation (one for each reducible relation), a second round is needed to compute the intersection of these copies. Formally:

Proposition 11.

A database D for a join query q can be translated into a database for the corresponding reduced query using p servers in two rounds with load not exceeding |D|/p w.h.p.
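The two-round scheme behind Proposition 11 can be sketched as follows (our own simplified, sequential illustration; in the MPC setting the semi-joins of round one run in parallel on disjoint shares of the p servers, and round two intersects the resulting copies of the guard):

```python
def semijoin(guard, other, shared):
    """guard ⋉ other: keep the guard tuples that have a matching tuple in
    `other` on the shared attributes."""
    keys = {tuple(t[a] for a in shared) for t in other}
    return [t for t in guard if tuple(t[a] for a in shared) in keys]

def reduce_guarded(guard, others):
    """Round 1: one semi-join per reducible relation, given as a pair
    (relation, shared attribute list). Round 2: intersect the guard copies."""
    copies = [semijoin(guard, rel, shared) for rel, shared in others]
    frozen = [{tuple(sorted(t.items())) for t in c} for c in copies]
    common = set.intersection(*frozen) if frozen else set()
    return [dict(items) for items in common]
```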

4.4 HyperCube-Style Query Evaluation

Compatible weight mappings f for q w.r.t. a configuration 𝒞 are essentially generalizations of the weight mappings used by the HyperCube algorithm [9], together with the degree constraints associated with so-called skew-free databases w.r.t. these weight mappings, as the proposition below makes formal.

Proposition 12.

Let q be a join query, D an instance for q, 𝒓 a spectrum, 𝒞 a configuration of q adhering to 𝒓, ℱ the fragment Fragment(D,𝒓,𝒞), and f a compatible weight mapping for q w.r.t. 𝒞 with f(A_f) ≤ γ(q). Then, cover_{q,f} can be computed over ℱ using p^{f(A_f)/γ(q)} servers in one round with load 𝒪~(|D|/p^{1/γ(q)}) w.h.p.

Proposition 12 is based on the analysis of the HyperCube algorithm [9]. A full proof is in Appendix A.1.

4.5 Join Query Decompositions

Some join queries can be decomposed into a set of subqueries and then computed in one round by computing each subquery, in that round, over its own share of servers. This is formulated in Proposition 13. For this, we say that a set of queries Q = {q_1,…,q_n} is a decomposition of a join query q if rels(q) = ⋃_{q_i∈Q} rels(q_i) and, for every R ∈ rels(q_i) with i = 1,…,n, we have attrs_{q_i}(R) = attrs_q(R).

Proposition 13.

Let q be a join query, D an instance for q, and Q = {q_1,…,q_n} a decomposition of q. Suppose that, for i = 1,…,n, the query q_i is computable over D using p_i servers in one round with load 𝒪~(L_i). Then, q can be computed over D using ∏_i p_i servers in one round with load 𝒪~(max_i{L_i}) w.h.p.

The observation follows from viewing the evaluation of a join query as a Cartesian product of the subqueries in its decomposition. When distinct subqueries share common attributes, the Cartesian product returns copies of those attributes (one for each occurrence in a subquery).

The formal argument for the above load analysis is based on a result in [8, Lemma 3.2], where it is proven for decompositions into join queries that do not share any attributes. It follows immediately, however, that the technique also works when the involved queries share attributes: all that is needed is an additional local computation step during which tuples are combined taking the values of common attributes into account (see [6] for further discussion).
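The grid organization behind Proposition 13, and the extra local combination step for shared attributes, can be sketched as follows (hypothetical helper names of ours; each coordinate of the grid addresses the servers of one subquery):

```python
from itertools import product

def grid_servers(shares):
    """Organize prod_i p_i servers as a grid: one coordinate per subquery."""
    return list(product(*(range(p) for p in shares)))

def merge(tuples):
    """Local step: combine one tuple per subquery into an output tuple,
    rejecting combinations that disagree on a shared attribute."""
    merged = {}
    for t in tuples:
        for a, v in t.items():
            if a in merged and merged[a] != v:
                return None
            merged[a] = v
    return merged
```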

5 Algorithm

In this section, we show how to compute a join query q over a specific fragment ℱ := Fragment(D,𝒓,𝒞) of the database instance D using p servers, based on some spectrum 𝒓, a configuration 𝒞 for q adhering to 𝒓, and a specific solution 𝒮 for q w.r.t. 𝒞. More precisely, this section is entirely devoted to showing the following:

Theorem 14.

Let q be a join query, D an instance, and 𝒞 a configuration for q adhering to a spectrum 𝒓. For any solution 𝒮 for q w.r.t. 𝒞, we can compute q over Fragment(D,𝒓,𝒞) using p servers in three rounds with load 𝒪~(|D|/p^{1/c(𝒮)}) with high probability.

When 𝒓 is fixed, Theorem 8 then follows as a corollary of Theorem 14 and Proposition 9.

It is worth mentioning that the number “three” refers to the number of communication rounds of the described algorithm. For the sake of precision, we remark that extra rounds are needed if the required degree information is not readily present. It is not uncommon to assume that this information is available to each server at the start of the algorithm [7].

5.1 Server Organization

Towards the algorithm proving Theorem 14, we first show how the p servers are organized. For this, let H := H(𝒜) ∪ attrs(𝒫). We partition the available p servers into disjoint groups, with as many groups as there are H-tuples in the fragment ℱ. We refer to this set of H-tuples by 𝑯, formally defined as follows:

𝑯 := {𝒕 ∣ 𝒕[Z] ∈ tup_ℱ(Z) for every (X,Z) ∈ 𝒜, and 𝒕[B] ∈ tup_ℱ(B) for every B ∈ 𝒫}.

For each such H-tuple 𝒉 ∈ 𝑯, the associated group consists of precisely p_𝒉 servers:

p_𝒉 := p_f × ∏_{a=(X,Z)∈𝒜} p_a,  where p_f := p^{c(f)/c(𝒮)} and p_a := p^{1/c(𝒮)} · deg_D(𝒉[Z]) / degs_ℱ(Z),

with degs_ℱ(Z) := ∑_{𝒕∈tup_ℱ(Z)} deg_D(𝒕) (intuitively denoting the total number of Z-tuples having values in fragment ℱ). The next lemma shows some desirable properties of the server organization (a proof is given in Appendix B.1):

Lemma 15.
  1. ∑_{𝒉∈𝑯} p_𝒉 ≤ p; and

  2. p_𝒉 ≥ p^{1/c(𝒮)}.
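A toy numeric sketch of this allocation (our own; the degree statistics are supplied as callbacks, and a single anchor is assumed) illustrates both properties of Lemma 15: the group sizes sum to at most p, and every group gets at least p^{1/c(𝒮)} servers.

```python
def group_sizes(p, c_f, c_s, anchors, h_tuples, deg_d, degs_f):
    """p_h = p^(c(f)/c(S)) * prod over anchors of p^(1/c(S)) * deg_D(h[Z]) / degs_F(Z)."""
    sizes = {}
    for h in h_tuples:
        p_h = p ** (c_f / c_s)  # the share p_f of the semi-cover
        for z in anchors:
            p_h *= p ** (1 / c_s) * deg_d(h, z) / degs_f(z)  # the share p_a
        sizes[h] = p_h
    return sizes
```

For instance, with p = 16, c(f) = 1, c(𝒮) = 2, and two H-tuples of degrees 3 and 1 (so degs_ℱ(Z) = 4), the groups get 12 and 4 servers, summing to exactly p.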

5.2 Communication and Computation

Our algorithm proceeds with a computation of q over each instance Subfragment(ℱ,𝒉) separately, using its designated group of p_𝒉 servers. Correctness of this approach follows directly from Lemma 15(1), Proposition 10, and the following lemma.

Lemma 16.

⋃_{𝒕∈𝑯} q(Subfragment(ℱ,𝒕)) = ⋃_{𝒕∈joins_ℱ(H)} q(Subfragment(ℱ,𝒕)).

Now the computation of q over F := Subfragment(ℱ,𝒉) for a specific 𝒉 ∈ 𝑯 goes as follows. Here, we assume w.l.o.g. that all servers are aware of the tuples in 𝑯 as well as their associated server groups.

Step 1:

Instance F is transformed into an instance F_𝒉 for the reduced query q_𝒉, obtained by dropping the attributes of 𝒉 from all relations in F and computing semi-joins as explained in Section 4.3, using the p_𝒉 servers.

Step 2:

The join query q_𝒉 is now viewed as a set of join queries {q_f, q_{a_1},…,q_{a_k}, q_b} (clearly a decomposition of q_𝒉), with a query q_{a_i} for every anchor a_i ∈ 𝒜 having precisely the relations of q_𝒉 anchored by a_i; with q_b the query consisting of all relations from Heavyrels(q,𝒞) remaining in q_𝒉; and with q_f := cover_{q_𝒉,f}. In this step, we broadcast all tuples from relations in q_b to all servers of the group. Moreover, we hash-partition the relations of q_f over the share p_f using HyperCube parameterized with f, and, for each anchor a = (X,Z) ∈ 𝒜, we hash-partition the relations of the query q_a over share p_a using the values of the attributes X (pretending they are a single value). We remark that it is not an issue if this requires applying hash functions over attributes α from H (which are present in neither q_𝒉 nor F_𝒉), as in that case there is only one value to consider, namely 𝒉[α].

5.3 Load Analysis

We argue that the outlined algorithm computes q over F using p_𝒉 servers with the desired load. A key aspect of the analysis is that our algorithm requires communication only between servers within a group. In other words, although this algorithm will be applied in parallel for all tuples of 𝑯 over their respective database fragments, the load of the servers is determined entirely by the specific run of the algorithm for the tuple 𝒉 they are responsible for.

In the remainder of this subsection, we will thus focus on the load of the individual Steps 1 and 2. For Step 1, the load follows directly from Proposition 11 and Lemma 15. We also remark that the result of a semi-join (and, by extension, the size of the relations in F_𝒉) does not exceed the size of the relations it is based on (the size of the relations in F, respectively). For Step 2, the desired load follows from Proposition 13 and the fact that the individual queries q_{a_i}, as well as q_f and q_b, can be computed with the desired load over their respective shares.

Lemma 17.

The load of computing each of the following in one round is 𝒪~(|D|/p^{1/c(𝒮)}) w.h.p.

  1. q_b(F_𝒉) using 1 server;

  2. q_f(F_𝒉) using p_f servers; and

  3. q_a(F_𝒉) using p_a servers, for each anchor a.

A proof for Lemma 17 is given in Appendix B.2.

6 PAC in Relation to Other Numbers

In this section, we show how γ relates to the different existing numbers ψ, ρ, and τ that have been considered in the literature. The main result of this section is the following:

Theorem 18.

For any choice of spectrum:

  (a) ρ(q) ≤ γ(q) ≤ ψ(q) for all join queries q; and

  (b) τ(q) ≤ γ(q) for reduced join queries q.

For any spectrum including weight 1/2:

  (i) ρ(q) = γ(q) for join queries q that are acyclic or graph-like;

  (ii) ρ(q) = γ(q) < ψ(q) for some join queries q that are neither acyclic nor graph-like.

Property ρ(q) ≤ γ(q) is a direct consequence of the correctness of our algorithm (see Theorem 8) and the known lower bound on the load of such algorithms based on ρ(q) [10]. Properties τ(q) ≤ γ(q) (for reduced queries q) and γ(q) ≤ ψ(q) follow from Lemmas 19 and 20 below. Intuitively, Lemma 19 shows that a solution w.r.t. the configuration that has no heavy attribute sets always makes use of a semi-cover f that is also a cover for q, thus with c(f) = c(𝒮).

Lemma 19.

Let 𝒞 be a configuration for some join query q with 𝒞(A) = 1 for every A ∈ ins(q). Then, for every solution 𝒮 := (𝒜,f,𝒫) for q w.r.t. 𝒞, we have the following properties:

  1. 𝒜 = 𝒫 = ∅; and

  2. f is a cover for q.

To see γ(q) ≤ ψ(q), it suffices to consider a minimal spectrum 𝒓 = 0,1 and observe that, for every configuration 𝒞 adhering to 𝒓, a fractional vertex cover for q ∖ H (with H the set of heavy attributes implied by 𝒞) is a semi-cover for q compatible with 𝒞.

Lemma 20.

Let f be a fractional vertex cover for q ∖ H. Then, (∅, f, ∅) is a solution for q w.r.t. 𝒞. In other words:

  1. f is a semi-cover for q w.r.t. 𝒞, ∅, and ∅; and

  2. f is compatible with 𝒞.

For graph-like and acyclic join queries (which are the main classes of join queries for which the worst-case optimal load is known), we have an even stronger result, namely that the load of our algorithm matches the optimal load.

Theorem 21.

For join queries q that are graph-like or acyclic, we have that ρ(q) = γ(q).

In the next two subsections, we show why γ(q) ≤ ρ(q) for graph-like, respectively, acyclic join queries.

As for item (ii) in Theorem 18, it suffices to give an example of a join query showing that our algorithm is also optimal for some join queries that are neither acyclic nor graph-like. A simple example is

q := R_1[a,b] ⋈ R_2[a,c] ⋈ R_3[b,c,d] ⋈ R_4[d,e].

Notice that τ(q) = 2 and ρ(q) = 2 1/2. With spectrum 𝒓 = 0, 1/2, 1, we obtain that γ(q) = 2 1/2, which is clearly the worst-case optimal cost for this join query. Verifying that γ(q) yields this cost is left as an exercise for the reader. We remark that ψ(q) = 3, and thus our algorithm has a strictly better load than any previously known algorithm that can compute this query.

6.1 Optimality for Graph-like Joins

To show γ(q) ≤ ρ(q) for reduced graph-like join queries, it is sufficient to consider the spectrum 𝒓 = 0, 1/2, 1. The remainder of this subsection is devoted to showing, for an arbitrary configuration 𝒞 adhering to 𝒓, that there exists a solution 𝒮 with c(𝒮) ≤ ρ(q).

For the construction, we need extra terminology and call an attribute α ∈ attrs(q), based on 𝒞:

  • relevant if it occurs in a relation R ∈ rels(q) ∖ Heavyrels(q,𝒞);

  • heavy if it is relevant and 𝒞({α}) = 0;

  • isolated if 𝒞({α}) = 1 and all adjacent attributes are heavy;

  • light if it is relevant, non-isolated, and 𝒞({α}) > 0.

We denote the set of isolated attributes by I, the set of light attributes by L, and the set of heavy attributes by H.

For the construction of 𝒮 = (𝒜,f,𝒫), we first choose an anchoring 𝒜 for q. For this, let us call a relation R ∈ rels(q) an anchor candidate if it has an attribute from I (and therefore also an attribute from H). Now let N be any maximal set of anchor candidates with the property that the relations in N share no attributes. From N we then construct the desired anchoring 𝒜 := {({α},{β}) ∣ R ∈ N, {α} = I ∩ attrs_q(R), β ∈ attrs_q(R) ∩ H} of q. For convenience, we will refer by I′ ⊆ I to the isolated attributes occurring in N and by H′ ⊆ H to the remaining attributes in N. As patch, we choose 𝒫 = {{α} ∣ α ∈ H′}. Finally, as weight mapping we take f : L ∪ (I ∖ I′) → [0,1], assigning weight 1 to all attributes in I ∖ I′ and weight 1/2 to the attributes in L. Now, the desired result follows from the next proposition.

Proposition 22.
  1. f is compatible with 𝒞;

  2. f is a semi-cover for q w.r.t. 𝒞, H(𝒜) ∪ attrs(𝒫), and rels(𝒜);

  3. c(𝒮) ≤ ρ(q).

6.2 Optimality for Acyclic Joins

To show γ(q) ≤ ρ(q) for reduced acyclic join queries q, it is sufficient to consider the spectrum 𝒓 = 0,1. In the remainder of the section, we show how, for an arbitrary configuration 𝒞 adhering to 𝒓, a solution 𝒮 for q and 𝒞 can be constructed with c(𝒮) ≤ ρ(q).

Before going into the details of the construction, we remark that every reduced acyclic join query has an integral fractional edge cover [6]. That is, there is a set EC ⊆ rels(q) with |EC| = ρ(q) such that every attribute α ∈ attrs(q) appears in at least one relation in EC. Since q is acyclic, it also has a join tree T [1]: a tree having the relations of q as nodes such that any two relations R_1, R_3 ∈ rels(q) sharing an attribute α ∈ attrs_q(R_1) ∩ attrs_q(R_3) are connected by a path in the tree with the property that every relation R_2 on this path has α ∈ attrs_q(R_2).

In the special case EC ⊆ Heavyrels(q,𝒞), it follows inevitably (by definition of Heavyrels(q,𝒞)) that rels(q) = Heavyrels(q,𝒞), and hence that 𝒮 = (∅, f : ∅ → [0,1], ∅) is the desired solution, with trivial cost c(𝒮) = 1 ≤ ρ(q).

Therefore, we continue under the assumption that there is a relation R ∈ EC ∖ Heavyrels(q,𝒞), and we use this assumption to also assume w.l.o.g. an orientation of T making R its root. The solution that we will construct for this case is of the form 𝒮 = (𝒜, f, ∅) with f : A_f → [0,1] : α ↦ 1, i.e., with empty patch and a weight mapping using only weight 1.

Next, we describe an iterative procedure applied on T. During this procedure we pinpoint the anchors of 𝒜 as well as the attributes of A_f. The procedure keeps track of a set V of already visited relations from rels(q), which is initially the empty set. The procedure continues as long as V ≠ rels(q), but will eventually terminate as every step increases V by one or more relations. The procedure goes as follows: take a relation R_c ∈ rels(q) ∖ V whose ancestors (according to T) are all already in V. We distinguish between three cases:

  1. R_c is the root of T;

  2. a relation R_EC ∈ EC exists that is a descendant of R_c (according to T) and there is a non-empty set X ⊆ attrs_q(R_c) ∩ attrs_q(R_EC) of attributes with 𝒞(X) = 1; or

  3. there is no such set of attributes for R_c.

In case (1), we know R_c ∈ EC ∖ Heavyrels(q,𝒞). We add R_c to V, and one of its attributes α ∈ attrs_q(R_c) with 𝒞({α}) = 1 to A_f.

In case (2), note that it is possible that R_EC and R_c are the same relation. We can assume w.l.o.g. that none of the relations between R_EC and R_c in T are in EC (besides R_EC itself). Indeed, if there is such a relation R′ ∈ EC, we know that X ⊆ attrs_q(R′) by definition of join tree, and thus we can substitute R_EC by R′. If there are multiple choices for the set X, we look at the relation R_X with X ⊆ attrs_q(R_X) occurring closest to the root in T (remark that, by definition of join tree, R_X is unique). We then choose the set X ⊆ attrs_q(R_c) ∩ attrs_q(R_EC) with 𝒞(X) = 1 whose R_X occurs closest to the root of T.

Now, if there is an attribute α ∈ X with 𝒞({α}) = 1 and either R_X has no parent, or R_X has a parent R_p with attrs_q(R_p) ∩ attrs_q(R_EC) = ∅, then in both cases we add α to A_f. Otherwise, we add (X,Z) as an anchor to 𝒜, with Z a singleton containing an attribute α ∈ X with 𝒞({α}) < 1 (if R_X has no parent) or Z = attrs_q(R_p) ∩ attrs_q(R_EC) (if R_X has a parent R_p). We add all relations on the path from R_EC to R_c in T to V.

In case (3), we simply add Rc to V.

The construction described above indeed yields a solution w.r.t. 𝒞, as stated in the following lemma.

Lemma 23.

(𝒜, f, ∅) is a solution for q w.r.t. 𝒞.

For the cost c(𝒮), we remark that every step of the iterative procedure contributing to 𝒮 (namely, cases (1) and (2)) either creates one anchor or adds one attribute to A_f. Since both kinds of steps add exactly one relation from EC ∖ V to V, it is immediate that |𝒜| + |A_f| ≤ |EC|. Furthermore, since |A_f| ≥ 1 (by the choice of the root of T, which falls in case (1)), we have that c(𝒮) = |𝒜| + |A_f| ≤ |EC|, as desired.

7 Conclusion and Future Work

Our algorithm can compute every join query q in the constant-round MPC model with a load bounded by the PAC number γ(q). Since γ(q) ≤ ψ(q), this load is at least as low as the optimal load for q in the single-round MPC model. Moreover, γ(q) = ρ(q) for any join query for which the worst-case optimal load is known, including all join queries that are graph-like or acyclic. We remark that the queries in [6] shown to have an optimal load w.r.t. τ(q) have the property τ(q) = ψ(q) and hence are also computed optimally by our algorithm. There is one exception: the Loomis-Whitney (LW) join [10]. For some configurations of an LW join query q, we only have to apply Step 1 of our algorithm, since the reduced query consists of exactly one relation. In that case, our algorithm requires more load than what is expected “optimally”. The reason for this is that our analysis of Step 1 is not fine-grained enough to show that, in some peculiar cases, Step 1 can be computed with a better load guarantee.

Although our algorithm is relatively simple (compared to other worst-case optimal algorithms described in the literature), the number γ is complicated and relative to a spectrum that needs to be decided in advance. Although the spectrum 0, 1/2, 1 is sufficient for all the proven relationships with ψ, τ, and ρ, there exist queries for which this choice of spectrum is suboptimal. In general, it is not known whether a bound on spectra can be assumed or whether an optimal spectrum is computable from the structure of the query. We remark that it is currently also still unknown what the worst-case optimal load for arbitrary join queries in the constant-round MPC model is.

References

  • [1] Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995. URL: http://webdam.inria.fr/Alice/.
  • [2] Foto N. Afrati and Jeffrey D. Ullman. Optimizing multiway joins in a map-reduce environment. IEEE Trans. Knowl. Data Eng., 23(9):1282–1298, 2011. doi:10.1109/TKDE.2011.47.
  • [3] Tom J. Ameloot, Gaetano Geck, Bas Ketsman, Frank Neven, and Thomas Schwentick. Parallel-correctness and transferability for conjunctive queries. In Tova Milo and Diego Calvanese, editors, Proceedings of the 34th ACM Symposium on Principles of Database Systems, PODS, 2015, pages 47–58. ACM, 2015. doi:10.1145/2745754.2745759.
  • [4] Paul Beame, Paraschos Koutris, and Dan Suciu. Communication steps for parallel query processing. In Richard Hull and Wenfei Fan, editors, Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2013, New York, NY, USA, 2013, pages 273–284. ACM, 2013. doi:10.1145/2463664.2465224.
  • [5] Paul Beame, Paraschos Koutris, and Dan Suciu. Skew in parallel query processing. In Richard Hull and Martin Grohe, editors, Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS’14, Snowbird, UT, USA, June 22-27, 2014, pages 212–223. ACM, 2014. doi:10.1145/2594538.2594558.
  • [6] Xiao Hu. Cover or pack: New upper and lower bounds for massively parallel joins. In Leonid Libkin, Reinhard Pichler, and Paolo Guagliardo, editors, PODS’21: Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 2021, pages 181–198. ACM, 2021. doi:10.1145/3452021.3458319.
  • [7] Bas Ketsman and Dan Suciu. A worst-case optimal multi-round algorithm for parallel computation of conjunctive queries. In Emanuel Sallinger, Jan Van den Bussche, and Floris Geerts, editors, Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2017, Chicago, IL, USA, May 14-19, 2017, pages 417–428. ACM, 2017. doi:10.1145/3034786.3034788.
  • [8] Bas Ketsman, Dan Suciu, and Yufei Tao. A near-optimal parallel algorithm for joining binary relations. Log. Methods Comput. Sci., 18(2), 2022. doi:10.46298/lmcs-18(2:6)2022.
  • [9] Paraschos Koutris. Query Processing for Massively Parallel Systems. PhD thesis, University of Washington, USA, 2015. URL: https://hdl.handle.net/1773/33697.
  • [10] Paraschos Koutris, Paul Beame, and Dan Suciu. Worst-case optimal algorithms for parallel query processing. In Wim Martens and Thomas Zeume, editors, 19th International Conference on Database Theory, ICDT 2016, volume 48 of LIPIcs, pages 8:1–8:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2016. doi:10.4230/LIPIcs.ICDT.2016.8.
  • [11] Miao Qiao and Yufei Tao. Two-attribute skew free, isolated CP theorem, and massively parallel joins. In Leonid Libkin, Reinhard Pichler, and Paolo Guagliardo, editors, PODS'21: Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 2021, pages 166–180. ACM, 2021. doi:10.1145/3452021.3458321.
  • [12] Yufei Tao. A simple parallel algorithm for natural joins on binary relations. In Carsten Lutz and Jean Christoph Jung, editors, 23rd International Conference on Database Theory, ICDT 2020, volume 155 of LIPIcs, pages 25:1–25:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020. doi:10.4230/LIPICS.ICDT.2020.25.
  • [13] Yufei Tao. Parallel acyclic joins with canonical edge covers. In Dan Olteanu and Nils Vortmeier, editors, 25th International Conference on Database Theory, ICDT 2022, volume 220 of LIPIcs, pages 9:1–9:19. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPICS.ICDT.2022.9.

Appendix A Proofs for Section 4

A.1 Proof of Proposition 12

One of the most well-known algorithms in the literature is the HyperCube algorithm [2, 9], which computes join queries with a load based on shares assigned to the attributes of q, provided that certain degree constraints on the relations in the input database hold. Shares assigned based on fractional vertex covers for q are of special interest, because they are sufficient to find an optimal assignment of shares. The following result (translated to our terminology) is known:

Theorem 24 ([9]).

Let q be a join query and D an instance for q. Let p_{α_1},…,p_{α_k} be the shares of the HyperCube algorithm, with each p_{α_i} for a specific attribute α_i ∈ attrs(q). Suppose that for every relation R ∈ rels(q) and every B-tuple 𝒕 with B ⊆ attrs_q(R), we have deg_{R^D}(𝒕) ≤ |R^D| / (β^{|B|} ∏_{α∈B} p_α) for some constant β > 0. Then w.h.p. the maximum load per server is 𝒪~(max_{R∈rels(q)} {|R^D| / ∏_{α∈attrs_q(R)} p_α}).

Its proof follows from the next lemma:

Lemma 25 ([9]).

Let q be a join query and D an instance for q. Let R ∈ rels(q), with α_1,…,α_r the attributes of R listed in some order. Let p_{α_1},…,p_{α_r} be integers and let p = ∏_i p_{α_i}. Suppose that we hash each tuple 𝒕 ∈ R^D to the bin (h_1(𝒕[{α_1}]),…,h_r(𝒕[{α_r}])), where h_1,…,h_r are independent and perfectly random hash functions from the domain dom to [p_{α_1}],…,[p_{α_r}], respectively. Suppose moreover that for every B-tuple 𝒕 with B ⊆ attrs_q(R) we have

deg_{R^D}(𝒕) ≤ |R^D| / (β^{|B|} ∏_{α∈B} p_α)

for some constant β > 0. Then the probability that the maximum load exceeds 𝒪~(|R^D|/p) is exponentially small in p.
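The hashing scheme of Lemma 25 can be sketched as follows (an illustration of ours; a salted application of Python's built-in `hash` stands in for the perfectly random hash functions of the lemma):

```python
import random

def hypercube_partition(tuples, attrs, shares, seed=0):
    """Send each tuple t to the bin (h_1(t[a_1]), ..., h_r(t[a_r])), with one
    independent hash function (here: one salt) per attribute.
    Returns a mapping bin-coordinate -> list of tuples."""
    rng = random.Random(seed)
    salts = {a: rng.getrandbits(32) for a in attrs}
    bins = {}
    for t in tuples:
        coord = tuple(hash((salts[a], t[a])) % shares[a] for a in attrs)
        bins.setdefault(coord, []).append(t)
    return bins
```

Note that the coordinate of a tuple depends only on its attribute values, so matching tuples of different relations meet at the same server; a relation mentioning only a subset of the attributes would be replicated along the missing dimensions.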

Next, we show that compatible weight mappings resemble the degree constraints required by the HyperCube algorithm. More precisely, let q be a join query, D an instance for q, 𝒓 a spectrum, 𝒞 a configuration of q adhering to 𝒓, ℱ the fragment Fragment(D,𝒓,𝒞), and f a compatible weight mapping for q w.r.t. 𝒞 with f(A_f) ≤ γ(q) and with A_f = attrs(q). (Notice that in case A_f ⊊ attrs(q), we simply assume f(α) = 0 for every α ∈ attrs(q) ∖ A_f.) Moreover, let p be some integer and p_α := p^{f(α)/γ(q)} be the share used by HyperCube for attribute α. Now, suppose that a relation R ∈ rels(q) is covered by f (i.e., f(attrs_q(R)) ≥ 1 and R ∈ rels(cover_{q,f})); we argue next that w.h.p. the load of R over ℱ using HyperCube does not exceed 𝒪~(|D|/p^{1/γ(q)}).

To compute the load of R over ℱ, we will rely on HyperCube, as already mentioned, by assigning a share of p_α := p^{f(α)/γ(q)} to every attribute α. When f(attrs_q(R)) ≤ 1, we can verify that R satisfies the degree constraints of Lemma 25, and hence the load of R is in 𝒪~(|D|/p^{1/γ(q)}). Indeed, since f is compatible w.r.t. 𝒞 and f(attrs_q(R)) ≤ 1, we know that f(B) ≤ 𝒞(B) ≤ 1 for every B ⊆ attrs_q(R). Accordingly, the following holds for any m: m/p^{𝒞(B)/γ(q)} ≤ m/p^{f(B)/γ(q)}.

Since every B-tuple 𝒕 in ℱ is light w.r.t. 𝒞(B), we obtain that

deg_{R^ℱ}(𝒕) ≤ deg_ℱ(𝒕) ≤ |D|/p^{𝒞(B)/γ(q)} ≤ |D|/p^{f(B)/γ(q)}.

Moreover, we can see that

|D|/p^{f(B)/γ(q)} = k·|R^D|/p^{f(B)/γ(q)} = |R^D|/((1/k)·p^{f(B)/γ(q)}) ≤ |R^D|/((1/k)^{|B|}·p^{f(B)/γ(q)}) = |R^D|/(β^{|B|}·p^{f(B)/γ(q)})

with k = |rels(q)| and β = 1/k, which yields that deg_{R^ℱ}(𝒕) ≤ |R^D|/(β^{|B|}·p^{f(B)/γ(q)}). Thus, we obtain that the load of R does not exceed 𝒪~(|R^D|/p^{f(attrs_q(R))/γ(q)}) = 𝒪~(|D|/p^{f(attrs_q(R))/γ(q)}) w.h.p. by Lemma 25 and the fact that |D| = k·|R^D|. Moreover, from the fact that R is covered by f (i.e., f(attrs_q(R)) = 1), we conclude that the load of R does not exceed 𝒪~(|D|/p^{1/γ(q)}).

Now, consider the case when f(attrs_q(R)) > 1. In order to show that the load will not exceed the required bound, we show that if we instead hash-partition R using slightly smaller shares than those assigned by f (i.e., fewer bins), we can still guarantee a load of 𝒪~(|D|/p^{1/γ(q)}). Hence, the load can only improve when we assign the larger shares.

Thereto, for a relation R that is covered by f, we use the shares p′_α := p^{f′(α)/γ(q)} with f′(α) := f(α)/f(attrs_q(R)) for every α ∈ attrs_q(R). Notice that, for each α, we can verify that p′_α ≤ p_α since f(attrs_q(R)) > 1. From the compatibility of f, we know that, for every B ⊆ attrs_q(R), min{1, f(B)} ≤ 𝒞(B), which implies that f′(B) ≤ 𝒞(B). Remark that f′(attrs_q(R)) = 1 according to this construction. Therefore, using the argument mentioned above for the case f(attrs_q(R)) ≤ 1, we see that with p^{f′(attrs_q(R))/γ(q)} servers the load of R is indeed 𝒪~(|D|/p^{1/γ(q)}).

Appendix B Proofs for Section 5

B.1 Proof of Lemma 15

In what follows, let a_1 = (X_1,Z_1), …, a_k = (X_k,Z_k) be the anchors in 𝒜, listed in some order. Recall that each group of p_𝒉 servers is defined as p_𝒉 := p^{c(f)/c(𝒮)} × ∏_{a_i∈𝒜} p_{a_i}, where p_{a_i} := p^{1/c(𝒮)} · deg_D(𝒉[Z_i]) / degs_ℱ(Z_i). Moreover, the set 𝑯 that 𝒉 ranges over is defined as

{𝒕 ∣ 𝒕[Z] ∈ tup_ℱ(Z) for every (X,Z) ∈ 𝒜, and 𝒕[B] ∈ tup_ℱ(B) for every B ∈ 𝒫}.

First, we establish item (1) in Lemma 15 in the following lemma.

Lemma 26.

For a join query q, a fragment ℱ of D w.r.t. 𝒓 and 𝒞, and a solution 𝒮 = (𝒜,f,𝒫) for q w.r.t. 𝒞, we have

  1. the number of tuples over attrs(𝒫) in ℱ is at most p^{c(𝒫)/c(𝒮)};

  2. ∑_{𝒉[H(𝒜)]} ∏_{a_i∈𝒜} p_{a_i} ≤ p^{c(𝒜)/c(𝒮)}; and

  3. ∑_{𝒉∈𝑯} p_𝒉 ≤ p.

Proof.

(1) By definition, we know that c(B) > 𝒞(B) for any set of attributes B in 𝒫, and that all B-tuples in ℱ are heavy for c(B). Consequently, we have at most p^{c(B)/c(𝒮)} heavy tuples in ℱ. Now recall that the sets of attributes in 𝒫 are pairwise disjoint. Hence, the total number of heavy tuples over attrs(𝒫) is at most ∏_{B∈𝒫} p^{c(B)/c(𝒮)} = p^{c(𝒫)/c(𝒮)}.

(2) First remark that the value of 𝒉[Z_i] ranges over the tuples in S_i = tup_ℱ(Z_i), while 𝒉[H(𝒜)] ranges over the tuples from 𝑯[H(𝒜)]. We can see that 𝑯[H(𝒜)] ⊆ S_1 × ⋯ × S_k. Indeed, 𝑯[H(𝒜)] is a restriction of the values in S_1 × ⋯ × S_k.

∑_{𝒉[H(𝒜)]} ∏_{a_i∈𝒜} p_{a_i}
  = ∑_{𝒉[H(𝒜)]} ∏_{a_i∈𝒜} p^{1/c(𝒮)} · deg_D(𝒉[Z_i]) / degs_ℱ(Z_i)
  = p^{c(𝒜)/c(𝒮)} ∑_{𝒉[H(𝒜)]} ∏_{a_i∈𝒜} deg_D(𝒉[Z_i]) / degs_ℱ(Z_i)
  ≤ p^{c(𝒜)/c(𝒮)} ∑_{(𝒉[Z_1],…,𝒉[Z_k])∈S_1×⋯×S_k} ∏_{a_i∈𝒜} deg_D(𝒉[Z_i]) / degs_ℱ(Z_i)
  = p^{c(𝒜)/c(𝒮)} ∏_{a_i∈𝒜} ∑_{𝒉[Z_i]∈S_i} deg_D(𝒉[Z_i]) / degs_ℱ(Z_i)
  = p^{c(𝒜)/c(𝒮)} ∏_{a_i∈𝒜} degs_ℱ(Z_i) / degs_ℱ(Z_i)
  = p^{c(𝒜)/c(𝒮)}.

The inequality holds since the possible values for 𝒉[H(𝒜)] are a subset of the values possible for (𝒉[Z_1],…,𝒉[Z_k]). Moreover, the second-to-last equality follows from the definition of degs_ℱ(Z_i) (recall that degs_ℱ(Z) := ∑_{𝒕∈tup_ℱ(Z)} deg_D(𝒕)).

(3) Now, we verify that ∑_𝒉 p_𝒉 ≤ p as follows:

∑_𝒉 p_𝒉 = ∑_𝒉 p^{c(f)/c(𝒮)} × ∏_{a_i∈𝒜} p_{a_i}
  = p^{c(f)/c(𝒮)} ∑_𝒉 ∏_{a_i∈𝒜} p_{a_i}
  ≤ p^{c(f)/c(𝒮)} ∑_{𝒉[attrs(𝒫)]} ∑_{𝒉[H(𝒜)]} ∏_{a_i∈𝒜} p_{a_i}
  ≤ p^{c(f)/c(𝒮)} · p^{c(𝒫)/c(𝒮)} ∑_{𝒉[H(𝒜)]} ∏_{a_i∈𝒜} p_{a_i}
  ≤ p^{c(f)/c(𝒮)} · p^{c(𝒫)/c(𝒮)} · p^{|𝒜|/c(𝒮)} = p,

where the first inequality follows from the fact that the possible values of 𝒉 are a subset of those possible over attrs(𝒫) and H(𝒜) taken independently. The second inequality follows from item (1), and the last inequality follows from item (2).

Next, we establish item (2) in Lemma 15 through the following lemma.

Lemma 27.

For a join query q, a fragment ℱ of D w.r.t. 𝒓 and 𝒞, and a solution 𝒮 = (𝒜,f,𝒫) for q w.r.t. 𝒞, we have

  1. p_{a_i} ≥ 1 for every anchor a_i ∈ 𝒜 and every tuple 𝒉 over H(𝒜) ∪ attrs(𝒫) in ℱ; and

  2. p_𝒉 ≥ p^{1/c(𝒮)}.

Proof.

(1) Equivalently, we want to show that deg_D(𝒉[Z_i]) ≥ degs_ℱ(Z_i)/p^{1/c(𝒮)}. Since 𝒞(Z_i) < 1, we obtain that indeed deg_D(𝒉[Z_i]) ≥ |D|/p^{1/γ(q)}. Moreover, it is clear that |D| ≥ degs_ℱ(Z_i), and it is safe to assume that c(𝒮) ≤ γ(q).

(2) Next, we show that p_𝒉 ≥ p^{1/c(𝒮)}. Indeed, by definition, it is clear that p^{c(f)/c(𝒮)} ≥ p^{1/c(𝒮)}. Moreover, from item (1), we know that ∏_{a_i∈𝒜} p_{a_i} ≥ 1. Hence,

p_𝒉 = p^{c(f)/c(𝒮)} × ∏_{a_i∈𝒜} p_{a_i} ≥ p^{1/c(𝒮)}.

B.2 Proof of Lemma 17

Proof.

Since rels(q_b) ⊆ Heavyrels(q,𝒞), and given that the size of a relation in F_𝒉 cannot exceed its size in ℱ, it is sufficient to show that |R^ℱ| < |D|/p^{1/γ(q)} for every relation R ∈ Heavyrels(q,𝒞) in order to establish item (1). Let R be a relation from the set Heavyrels(q,𝒞). By definition, 𝒞({α}) < 1 for every α ∈ attrs_q(R). Moreover, by definition, we know that deg_D(𝒕) > |D|/p^{1/γ(q)} for any {α}-tuple 𝒕 in ℱ. Therefore, the total number of possible {α}-tuples in D respecting 𝒞({α}) < 1 is clearly less than p^{1/γ(q)}. Consequently, the total number of (attrs_q(R))-tuples in ℱ is less than p^{|attrs_q(R)|/γ(q)}. Since ℱ is the instance that corresponds to such a configuration, we see that |R^ℱ| < p^{|attrs_q(R)|/γ(q)}, which is a low load compared to |D|/p^{1/γ(q)} by our assumption that |D| is sufficiently large w.r.t. p (precisely, that |D|/p^{1/γ(q)} is larger than max_{R∈rels(q)}{p^{|attrs_q(R)|/γ(q)}}).

The proof of item (2) directly follows from Proposition 12. As for item (3), it is sufficient to show that the set of relations anchored by a = (X,Z) ∈ 𝒜 can be hash-partitioned based on the attributes of X with the required load. Accordingly, Lemma 25 requires that deg_ℱ(𝒕) ≤ |D|/p_a for every X-tuple 𝒕. Recall that, by definition of anchor, we know that deg_ℱ(𝒕) ≤ deg_D(𝒕) ≤ |D|/p^{1/γ(q)} ≤ |D|/p^{1/c(𝒮)}, and we also know that |D|/p^{1/c(𝒮)} ≤ |D|/p_a because p_a ≤ p^{1/c(𝒮)} (recall that p_a = p^{1/c(𝒮)} · deg_D(𝒉[Z])/degs_ℱ(Z) for some tuple 𝒉).