
Optimal Oblivious Algorithms for Multi-Way Joins

Xiao Hu, University of Waterloo, Canada; Zhiang Wu, University of Waterloo, Canada
Abstract

In cloud databases, cloud computation over sensitive data uploaded by clients inevitably causes concern about data security and privacy. Even if cryptographic primitives and trusted computing environments are integrated into query processing to safeguard the actual contents of the data, access patterns of algorithms can still leak private information about data. Oblivious RAM (ORAM) and circuits are two generic approaches to address this issue, ensuring that access patterns of algorithms remain oblivious to the data. However, deploying these methods on insecure algorithms, particularly for multi-way join processing, is computationally expensive and inherently challenging.

In this paper, we propose a novel sorting-based algorithm for multi-way join processing that operates without relying on ORAM simulations or other security assumptions. Our algorithm is a non-trivial, provably oblivious composition of basic primitives, with time complexity matching the insecure worst-case optimal join algorithm, up to a logarithmic factor. Furthermore, it is cache-agnostic, with cache complexity matching the insecure lower bound, also up to a logarithmic factor. This clean and straightforward approach has the potential to be extended to other security settings and implemented in practical database systems.

Keywords and phrases:
oblivious algorithms, multi-way joins, worst-case optimality
Copyright and License:
© Xiao Hu and Zhiang Wu; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Security and privacy → Management and querying of encrypted data; Information systems → Join algorithms
Related Version:
Full Version: https://arxiv.org/pdf/2501.04216 [1]
Editors:
Sudeepa Roy and Ahmet Kara

1 Introduction

In outsourced query processing, a client entrusts sensitive data to a cloud service provider, such as Amazon, Google, or Microsoft, and subsequently issues queries to the provider. The service provider performs the required computations and returns the results to the client. Since these computations are carried out on remote infrastructure, ensuring the security and privacy of query evaluation is a critical requirement. Specifically, servers must remain oblivious to any information about the underlying data throughout the computation process. To achieve this, advanced cryptographic techniques and trusted computing hardware are employed to prevent servers from inferring the actual contents of the data [34, 19]. However, the memory accesses during execution may still lead to information leakage, posing an additional challenge to achieving comprehensive privacy. For example, consider the basic (natural) join operator on two database instances: R1 = {(a_i, b_i) : i ∈ [N]} ⋈ S1 = {(b_i, c_i) : i ∈ [N]} and R2 = {(a_i, b_1) : i ∈ [N]} ⋈ S2 = {(b_1, c_i) : i ∈ [N]} for some N ∈ ℤ+, where each pair of tuples can be joined if and only if they have the same b-value. Suppose each relation is sorted by its b-values. Using the merge join algorithm, there is only one access to S1 between two consecutive accesses to R1, but there are N accesses to S2 between two consecutive accesses to R2. Hence, the server can distinguish the degree information of join keys of the input data by observing the sequence of memory accesses. Moreover, if the server counts the total number of memory accesses, it can further infer the number of join results of the input data.
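To make this leak concrete, the following minimal sketch (not part of the paper's algorithms; the data layout is illustrative, with each relation stored as a list of (b-value, payload) pairs sorted by b-value) simulates a textbook merge join and records which tuple each read touches; running it on the two instances above yields visibly different access sequences and totals.

```python
def merge_join_accesses(R, S):
    """Simulate a sort-merge join on inputs sorted by join key and record the
    sequence of reads, e.g. ("R", 3) means the 4th tuple of R was read."""
    accesses, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        accesses += [("R", i), ("S", j)]
        if R[i][0] < S[j][0]:
            i += 1
        elif R[i][0] > S[j][0]:
            j += 1
        else:
            k = j  # emit all S-partners of R[i] before advancing R
            while k < len(S) and S[k][0] == R[i][0]:
                accesses.append(("S", k))
                k += 1
            i += 1
    return accesses

N = 4
# Instance 1: distinct join keys -- O(1) S-accesses between consecutive R-accesses.
R1 = [(b, f"a{b}") for b in range(N)]; S1 = [(b, f"c{b}") for b in range(N)]
# Instance 2: a single join key  -- N S-accesses between consecutive R-accesses.
R2 = [(0, f"a{i}") for i in range(N)]; S2 = [(0, f"c{i}") for i in range(N)]
print(len(merge_join_accesses(R1, S1)), len(merge_join_accesses(R2, S2)))  # 18 24
```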

The notion of obliviousness was proposed to formally capture such a privacy guarantee on the memory access pattern of algorithms [31, 30]. This concept has inspired a substantial body of research focused on developing algorithms that achieve obliviousness in practical database systems [55, 24, 20, 17]. A generic approach to achieving obliviousness is Oblivious RAM (ORAM) [31, 41, 29, 52, 23, 48, 8], which translates each logical access into a poly-logarithmic (in terms of the data size) number of physical accesses to random locations of the memory, but the poly-logarithmic additional cost per memory access is very expensive in practice [16]. Another generic approach involves leveraging circuits [53, 26]. Despite their theoretical promise, generating circuits is inherently complex and resource-intensive, and integrating such constructions into database systems often proves to be inefficient. These challenges highlight the advantages of designing algorithms that are inherently oblivious to the input data, eliminating the need for ORAM frameworks or circuit constructions.

In this paper, we take on this question for multi-way join processing and examine the insecure worst-case optimal join (WCOJ) algorithm [43, 44, 50], which can compute any join query in time proportional to the maximum number of join results. Our objective is to investigate the intrinsic properties of the WCOJ algorithm and transform it into an oblivious version while preserving its optimal complexity guarantee.

1.1 Problem Definition

Multi-way join.

A (natural) join query can be represented as a hypergraph 𝒬=(𝒱,ℰ) [2], where 𝒱 models the set of attributes, and ℰ ⊆ 2^𝒱 models the set of relations. Let dom(x) be the domain of attribute x ∈ 𝒱. An instance ℐ of 𝒬 is a function that maps each e ∈ ℰ to a set of tuples R_e, where each tuple t ∈ R_e specifies a value in dom(x) for each attribute x ∈ e. The result of a join query 𝒬 over an instance ℐ, denoted by 𝒬(ℐ), is the set of all combinations of tuples, one from each relation R_e, that share the same values in their common attributes, i.e., 𝒬(ℐ) = {t ∈ ∏_{x∈𝒱} dom(x) : ∀e ∈ ℰ, ∃t_e ∈ R_e, π_e t = t_e}.

Let N = ∑_{e∈ℰ} |R_e| be the input size of instance ℐ, i.e., the total number of tuples over all relations. Let OUT = |𝒬(ℐ)| be the output size of the join query 𝒬 over instance ℐ. We study the data complexity [2] of join algorithms by measuring their running time in terms of the input and output size of the instance. We consider the size of 𝒬, i.e., |𝒱| and |ℰ|, as constant.

Model of computation.

We consider a two-level hierarchical memory model [40, 18]. The computation is performed within trusted memory, which consists of M registers of the same width. For simplicity, we assume that the trusted memory size is cM, where c is a constant. This assumption will not change our results by more than a constant factor. Since we assume the query size as a constant, the arity of each relation is irrelevant. Each tuple is assumed to fit into a single register, with one register allocated per tuple, including those from input relations as well as intermediate results. We further assume that cM tuples from any set of relations can fit into the trusted memory. Input data and all intermediate results generated during the execution are encrypted and stored in an untrusted memory of unlimited size. Both trusted and untrusted memory are divided into blocks of size B. One memory access moves a block of B consecutive tuples from trusted to untrusted memory or vice versa. The complexity of an algorithm is measured by the number of such memory accesses.

An algorithm typically operates by repeating the following three steps: (1) read encrypted data from the untrusted memory into the trusted memory, (2) perform computation inside the trusted memory, and (3) encrypt necessary data and write it back to the untrusted memory. Adversaries can only observe the addresses of the blocks read from or written to the untrusted memory in (1) and (3), but not the data contents. They also cannot interfere with the execution of the algorithm. The sequence of memory accesses to the untrusted memory in the execution is referred to as the “access pattern” of the algorithm. In this context, we focus on two specific scenarios of interest:

  • Random Access Model (RAM). This model can simulate the classic RAM model with M=O(1) and B=1, where the trusted memory corresponds to O(1) registers and the untrusted memory corresponds to the main memory. The time complexity in this model is defined as the number of accesses to the main memory by a RAM algorithm.

  • External Memory Model (EM). This model can naturally simulate the classic EM model [4, 51], where the trusted memory corresponds to the main memory and the untrusted memory corresponds to the disk. Following prior work [28, 21, 18], we focus on cache-agnostic EM algorithms, which are unaware of the values of M (memory size) and B (block size), a property commonly referred to as cache-oblivious in the literature. To avoid ambiguity, we use the term “cache-agnostic” to refer to “cache-oblivious” and “oblivious” to refer to “access-pattern-oblivious” throughout this paper. The advantages of cache-agnostic algorithms have been extensively studied, particularly in multi-level memory hierarchies. A cache-agnostic algorithm can seamlessly adapt to operate efficiently between any two adjacent levels of the hierarchy. We adopt the tall cache assumption, M = Ω(B²) and further M = Ω(log^{1+ϵ} N) for an arbitrarily small constant ϵ ∈ (0,1) (in this work, log(·) always means log₂(·) and should be distinguished from log_{M/B}(·)), and the wide block assumption, B = Ω(log^{0.55} N). These are standard assumptions widely adopted in the literature of EM algorithms [4, 51, 7, 28, 21, 18]. The cache complexity in this model is defined as the number of accesses to the disk by an EM algorithm.

Oblivious Algorithms.

The notion of obliviousness is defined based on the access pattern of an algorithm. Memory accesses to the trusted memory are invisible to the adversary and, therefore, have no impact on security. Let 𝒜 be an algorithm, 𝒬 a join query, and ℐ an arbitrary input instance of 𝒬. We denote by Access_𝒜(𝒬, ℐ) the sequence of memory accesses made by 𝒜 to the untrusted memory when given (𝒬, ℐ) as the input, where each memory access is a read or write operation associated with a physical address. The join query 𝒬 and the size N of the input instance are considered non-sensitive information and can be safely exposed to the adversary. In contrast, all input tuples are considered sensitive information and should be hidden from adversaries. This way, the access pattern of an oblivious algorithm 𝒜 should only depend on 𝒬 and N, ensuring no leakage of sensitive information.

Definition 1 (Obliviousness [30, 31, 15]).

An algorithm 𝒜 is oblivious for a join query 𝒬 if, given an arbitrary parameter N ∈ ℤ+, for any pair of instances ℐ, ℐ′ of 𝒬 with input size N, Access_𝒜(𝒬, ℐ) ≡_δ Access_𝒜(𝒬, ℐ′), where δ is a negligible function in terms of N. Specifically, for any positive constant c, there exists N_c such that δ(N) < 1/N^c for any N > N_c. The notation ≡_δ indicates that the statistical distance between the two distributions is at most δ.

This notion of obliviousness applies to both deterministic and randomized algorithms. For a randomized algorithm, different execution states may arise from the same input instance due to the algorithm’s inherent randomness. Each execution state corresponds to a specific sequence of memory accesses, allowing the access pattern to be modeled as a random variable with an associated probability distribution over the set of all possible access patterns. The statistical distance between two probability distributions is typically quantified using standard metrics, such as the total variation distance. A randomized algorithm is thus oblivious if its access pattern exhibits statistically indistinguishable distributions across all input instances of the same size. More simply, a deterministic algorithm is oblivious if it exhibits an identical access pattern for all input instances of the same size.

1.2 Review of Existing Results

Oblivious RAM.

ORAM is a general randomized framework designed to protect access patterns [31]. In ORAM, each logical access is translated into a poly-logarithmic number of random physical accesses, thereby incurring a poly-logarithmic overhead. Goldreich and Ostrovsky [31] established a lower bound of Ω(log N) on the access overhead of ORAMs in the RAM model. Subsequently, Asharov et al. [8] proposed a theoretically optimal ORAM construction with an overhead of O(log N) in the RAM model under the assumption of the existence of a one-way function, which is rather impractical [47]. It remains unknown whether a cache complexity better than O(log N) can be shown for such a construction. Path ORAM [48] is currently the most practical ORAM construction, but it introduces an O(log² N) overhead and requires Ω(1) trusted memory. In the EM model, one can place the tree data structures for ORAM in a van Emde Boas layout, resulting in a memory access overhead of O(log N · log_B N).

Insecure Join Algorithms.

WCOJ algorithms [43] have been developed to compute any join query in O(N^{ρ*}) time (a hashing-based algorithm achieves O(N^{ρ*}) time in the worst case using the lazy array technique [27]), where ρ* is the fractional edge cover number of the join query (formally defined in Section 2.1). The optimality is implied by the AGM bound [9]: the maximum number of join results produced by any instance of input size N is O(N^{ρ*}), which is tight in the sense that there exists some instance of input size N that can produce Θ(N^{ρ*}) join results. However, these WCOJ algorithms are not oblivious. In Section 4, we use the triangle join as an example to illustrate the information leakage from the WCOJ algorithm. Another line of research has explored output-sensitive join algorithms. A join query can be computed in O((N^{subw} + OUT) · polylog N) time [54, 3], where subw is the submodular width of the join query. For example, subw = 1 if and only if the join query is acyclic [12, 25]. These algorithms are also not oblivious due to various potential information leakages. For instance, the total number of memory accesses is influenced by the output size, which can range from a constant to a polynomially large value relative to the input size. A possible mitigation strategy is worst-case padding, which involves padding dummy accesses to match the worst case. However, this approach does not necessarily result in oblivious algorithms, as their access patterns may still vary significantly across instances with the same input size.

In contrast, there has been significantly less research on multi-way join processing in the EM model. First of all, we note that an EM version of the WCOJ algorithm incurs at least Ω(N^{ρ*}/B) cache complexity, since there are Θ(N^{ρ*}) join results in the worst case and all join results should be written back to disk. For the basic two-way join, the nested-loop algorithm has cache complexity O(N²/B) and the sort-merge algorithm has cache complexity O((N/B) log_{M/B}(N/B) + OUT/B). For multi-way join queries, an EM algorithm with cache complexity O((N^{ρ*}/(M^{ρ*−1}B)) log_{M/B}(N/B) + OUT/B) has been achieved for Berge-acyclic joins [37], α-acyclic joins [36, 39], graph joins [38, 22] and Loomis-Whitney joins [39] (some of these algorithms have been developed for the Massively Parallel Computation (MPC) model [11] and can be adapted to the EM model through the MPC-to-EM reduction [39]). These results were previously stated without including the output-dependent term OUT/B since they do not consider the cost of writing join results back to disk. Again, even after padding the output size to be as large as the worst case, these algorithms remain non-oblivious since their access patterns heavily depend on the input data. Furthermore, even in the insecure setting, no algorithm with a cache complexity of O(N^{ρ*}/B) is known for general join queries.

Oblivious Join Algorithms.

Oblivious algorithms have been studied for join queries in both the RAM and EM models. In the RAM model, the naive nested-loop algorithm can be transformed into an oblivious one by incorporating some dummy writes, as it enumerates all possible combinations of tuples from the input relations in a fixed order. This algorithm runs in O(N^{|ℰ|}) time, where |ℰ| is the number of relations in the join query. Wang et al. [53] designed circuits for conjunctive queries - capturing all join queries as a special case - whose time complexity matches the AGM bound up to poly-logarithmic factors. Running such a circuit will automatically yield an oblivious join algorithm with O(N^{ρ*} · polylog N) time complexity. By integrating the insecure WCOJ algorithm [44] with the optimal ORAM [8], it is possible to achieve an oblivious algorithm with O(N^{ρ*} log N) time complexity, albeit under restrictive theoretical assumptions. Alternatively, incorporating the insecure WCOJ algorithm into Path ORAM yields an oblivious join algorithm with O(N^{ρ*} log² N) time complexity.

In the EM model, He et al. [35] proposed a cache-agnostic nested-loop join algorithm for the basic two-way join R ⋈ S with O(|R||S|/B) cache complexity, which is also oblivious. Applying worst-case padding and the optimal ORAM construction to the existing EM join algorithms, we can derive an oblivious join algorithm with O((N^{ρ*}/B) log_{M/B}(N/B) · log N) cache complexity for specific cases such as acyclic joins, graph joins and Loomis-Whitney joins. However, these algorithms are not cache-agnostic. For general join queries, no specific oblivious algorithm has been proposed for the EM model, aside from results derived from the oblivious RAM join algorithm. These results yield cache complexities of either O(N^{ρ*} log N) or O(N^{ρ*} log N · log_B N), as they rely heavily on retrieving tuples from hash tables or range search indices.

Table 1: Comparison between previous and new oblivious algorithms for multi-way joins. N is the input size. ρ* and ρ are the input join query's fractional and integral edge cover numbers, respectively. M is the trusted memory size. B is the block size.

                          Previous Results                                          New Results
RAM model                 O(N^{ρ*} log N) [44, 8]                                   O(N^{ρ*} log N)
                          (one-way function assumption)                             (no assumption)
Cache-agnostic EM model   O((N^{min{ρ*+1,ρ}}/B) log_{M/B}(N^{min{ρ*+1,ρ}}/B))       O((N^{ρ*}/B) log_{M/B}(N^{ρ*}/B))
                          (no assumption)                                           (tall cache and wide block assumptions)

Relaxed Variants of Oblivious Join Algorithms.

Beyond fully oblivious algorithms, researchers have explored relaxed notions of obliviousness by allowing specific types of leakage, such as the join size, the multiplicity of join values, and the size of intermediate results. One relevant line of work examines join processing with revealed input and output sizes. For example, integrating an insecure output-sensitive join algorithm into an ORAM framework produces a relaxed oblivious algorithm with O((N^{subw} + OUT) · polylog N) time complexity. Relaxed oblivious algorithms with O((N + OUT) log N) time complexity have also been proposed without requiring ORAM [6, 40] for the basic two-way join as well as acyclic joins. Although not fully oblivious, these algorithms serve as fundamental building blocks for developing our oblivious algorithms for general join queries. Another line of work considered differentially oblivious algorithms [15, 13, 18], which require only that access patterns appear similar across neighboring input instances. However, differentially oblivious algorithms have so far been limited to the basic two-way join [18]. This paper does not pursue this direction further.

1.3 Our Contribution

Our main contribution can be summarized as follows (see Table 1):

  • We give a nested-loop-based algorithm for general join queries with O(N^{min{ρ*+1,ρ}} log N) time complexity and O((N^{min{ρ*+1,ρ}}/B) log_{M/B}(N^{min{ρ*+1,ρ}}/B)) cache complexity, where ρ* and ρ are the fractional and integral edge cover numbers of the join query, respectively (formally defined in Section 2.1). This algorithm is also cache-agnostic. For classes of join queries with ρ* = ρ, such as acyclic joins, even-length cycle joins and boat joins (see Section 3), this is almost optimal up to logarithmic factors.

  • We design an oblivious algorithm for general join queries with O(N^{ρ*} log N) time complexity, which matches the insecure counterpart up to a logarithmic factor and recovers the previous ORAM-based result that assumes the existence of one-way functions. This algorithm is also cache-agnostic, with O((N^{ρ*}/B) log_{M/B}(N^{ρ*}/B)) cache complexity. This cache complexity can be simplified to O((N^{ρ*}/B) log_{M/B}(N/B)) when B < N^{(c−ρ*)/(c−1)} for some sufficiently large constant c. This result establishes the first worst-case near-optimal join algorithm in the insecure EM model when all join results are returned to disk.

  • We develop an improved algorithm for relaxed two-way joins with better cache complexity, which is also cache-agnostic. By integrating our oblivious algorithm with the generalized hypertree decomposition [33], we obtain a relaxed oblivious algorithm for general join queries with O((N^{fhtw} + OUT) log N) time complexity and O(((N^{fhtw} + OUT)/B) log_{M/B}((N^{fhtw} + OUT)/B)) cache complexity, where fhtw is the fractional hypertree width of the input query.

Roadmap.

This paper is organized as follows. In Section 2, we introduce the preliminaries for building our algorithms. In Section 3, we present our first algorithm, which is based on the nested-loop algorithm. While effective, this algorithm is not always optimal, as demonstrated by the triangle join. In Section 4, we use the triangle join to demonstrate the leakage of the insecure WCOJ algorithm and show how to transform it into an oblivious algorithm. We introduce our oblivious WCOJ algorithm for general join queries in Section 5, and conclude in Section 6.

2 Preliminaries

2.1 Fractional and Integral Edge Cover Number

For a join query 𝒬=(𝒱,ℰ), a function W : ℰ → [0,1] is a fractional edge cover for 𝒬 if ∑_{e: x∈e} W(e) ≥ 1 for every x ∈ 𝒱. An optimal fractional edge cover is one minimizing ∑_{e∈ℰ} W(e), which is captured by the following linear program:

min ∑_{e∈ℰ} W(e)   s.t.   ∑_{e: x∈e} W(e) ≥ 1, ∀x ∈ 𝒱;   W(e) ∈ [0,1], ∀e ∈ ℰ.   (1)

The optimal value of (1) is the fractional edge cover number of 𝒬, denoted as ρ*(𝒬). Similarly, a function W : ℰ → {0,1} is an integral edge cover if ∑_{e: x∈e} W(e) ≥ 1 for every x ∈ 𝒱. The optimal integral edge cover is one minimizing ∑_{e∈ℰ} W(e), which can be captured by a linear program similar to (1), except that W(e) ∈ [0,1] is replaced with W(e) ∈ {0,1}. The optimal value of this linear program is the integral edge cover number of 𝒬, denoted as ρ(𝒬).
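As a small worked example (using the triangle query, whose cover numbers are also stated in Section 4), the following LaTeX snippet spells out both cover numbers.

```latex
% Triangle query (cf. Section 4):  Q = R_1(x_2,x_3) \Join R_2(x_1,x_3) \Join R_3(x_1,x_2).
% Each attribute lies in exactly two relations, so the fractional cover
% W(R_1) = W(R_2) = W(R_3) = 1/2 is feasible and, in fact, optimal:
\rho^*(\mathcal{Q}) = \tfrac12 + \tfrac12 + \tfrac12 = \tfrac32 .
% No single relation covers all three attributes, so an integral cover needs
% two relations, e.g. W(R_2) = W(R_3) = 1 and W(R_1) = 0:
\rho(\mathcal{Q}) = 2 .
```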

2.2 Oblivious Primitives

We introduce the following oblivious primitives, which form the foundation of our algorithms. Each primitive displays an identical access pattern across instances of the same input size.

Linear Scan.

Given an input array of N elements, a linear scan of all elements can be done with O(N) time complexity and O(N/B) cache complexity in a cache-agnostic way.

Sort [5, 10].

Given an input array of N elements, the goal is to output an array sorted according to some predetermined ordering. The classical bitonic sorting network [10] requires O(N log² N) time. This was later improved to O(N log N) [5] in 1983; however, due to the large constant hidden in the O(N log N) bound, the classical bitonic sorting network is more commonly used in practice, particularly when the size N is not too large. Ramachandran and Shi [45] showed a randomized algorithm for sorting with O(N log N) time complexity and O((N/B) log_{M/B}(N/B)) cache complexity under the tall cache assumption.
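To illustrate why sorting networks are oblivious, here is a minimal sketch of the bitonic sorting network (assuming, for simplicity, that the array length is a power of two); the positions compared at every step are fixed by N alone, so the access pattern carries no information about the data.

```python
def bitonic_sort(a, lo=0, n=None, ascending=True):
    """Minimal bitonic sorting network; n must be a power of two.
    The compare-exchange positions depend only on n, never on the data."""
    if n is None:
        n = len(a)
    if n > 1:
        k = n // 2
        bitonic_sort(a, lo, k, True)        # build an ascending half ...
        bitonic_sort(a, lo + k, k, False)   # ... and a descending half
        bitonic_merge(a, lo, n, ascending)  # then merge the bitonic sequence

def bitonic_merge(a, lo, n, ascending):
    if n > 1:
        k = n // 2
        for i in range(lo, lo + k):         # fixed compare-exchange pattern
            if (a[i] > a[i + k]) == ascending:
                a[i], a[i + k] = a[i + k], a[i]
        bitonic_merge(a, lo, k, ascending)
        bitonic_merge(a, lo + k, k, ascending)

data = [7, 3, 6, 1, 8, 2, 5, 4]
bitonic_sort(data)
print(data)  # [1, 2, 3, 4, 5, 6, 7, 8]
```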

Compact [32, 46].

Given an input array of N elements, some of which are marked as dummies (⊥), the goal is to output an array with all non-dummy elements moved to the front, before any ⊥, while preserving the ordering of the non-dummy elements. Lin et al. [42] showed a randomized algorithm for compaction with O(N log log N) time complexity and O(N/B) cache complexity under the tall cache assumption.
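A simple, though suboptimal (O(N log N)-time), way to realize order-preserving compaction is to sort on the pair (dummy flag, original position); the sketch below uses Python's built-in sort purely to show the logic, whereas an oblivious implementation would substitute the Sort primitive above.

```python
def compact_via_sort(arr, is_dummy):
    """Order-preserving compaction: real elements first (original order kept),
    dummies last.  The output length equals the input length, as required."""
    keyed = [(is_dummy(x), pos, x) for pos, x in enumerate(arr)]
    keyed.sort(key=lambda t: (t[0], t[1]))   # oblivious Sort in the real algorithm
    return [x for _, _, x in keyed]

print(compact_via_sort([5, None, 7, None, 2], lambda x: x is None))
# [5, 7, 2, None, None]
```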

We use the above primitives to construct additional building blocks for our algorithms. To ensure obliviousness, the output of each of these primitives has a fixed size equal to the worst-case scenario, i.e., N, comprising both real and dummy elements. All these primitives achieve O(N log N) time complexity and O((N/B) log_{M/B}(N/B)) cache complexity. Further details are provided in the full version [1].

SemiJoin.

Given two input relations R, S of at most N tuples and their common attribute(s) x, the goal is to output the set of tuples in R that can be joined with at least one tuple in S.

Project.

Given an input relation R of N tuples defined over attributes e, and a subset of attributes x ⊆ e, the goal is to output {π_x t : t ∈ R}, ensuring no duplication.

Intersect.

Given two input arrays R, S of at most N elements, the goal is to output R ∩ S.

Augment.

Given a relation R and k additional relations S1, S2, …, Sk (each with at most N tuples) sharing common attribute(s) x, the goal is to attach to each tuple t ∈ R the number of tuples in Si (for each i ∈ [k]) that can be joined with t on x.

We note that any sequential composition of oblivious primitives yields more complex algorithms that remain oblivious, which is the key principle underlying our approach.
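For example, the SemiJoin primitive defined above can be assembled from Sort, a linear scan, and Compact. The following sketch is illustrative only (plain Python sorts stand in for the oblivious Sort and Compact primitives, and the function name is ours): it marks, in one scan of the sorted tagged array, every R-tuple whose key also occurs in S, and pads the output to the fixed size |R|.

```python
def semijoin_sketch(R, S, key):
    """Output the tuples of R joining with at least one tuple of S, followed by
    dummies (None), so the output always has exactly |R| entries."""
    tagged = [(key(s), 0, None) for s in S] + \
             [(key(r), 1, i) for i, r in enumerate(R)]
    tagged.sort(key=lambda t: (t[0], t[1]))   # S-entries precede R-entries per key
    survives = [False] * len(R)
    cur, seen_s = object(), False
    for k, tag, idx in tagged:                # one linear scan
        if k != cur:
            cur, seen_s = k, False
        if tag == 0:
            seen_s = True
        elif seen_s:
            survives[idx] = True
    real = [r for i, r in enumerate(R) if survives[i]]
    return real + [None] * (len(R) - len(real))   # Compact + fixed-size padding

R = [("a", 1), ("b", 2), ("c", 3)]
S = [("b", 9), ("d", 4)]
print(semijoin_sketch(R, S, key=lambda t: t[0]))  # [('b', 2), None, None]
```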

2.3 Oblivious Two-way Join

NestedLoop.

The nested-loop algorithm can compute R ⋈ S with O(|R||S|) time complexity: it iterates over all combinations of tuples from R, S and writes a join result (or a dummy result, if necessary, to maintain obliviousness) for each combination. He et al. [35] proposed a cache-agnostic version in the EM model with O(|R||S|/B) cache complexity, which is also oblivious.

Theorem 2 ([35]).

For R ⋈ S, there is a cache-agnostic algorithm that can compute R ⋈ S with O(|R||S|) time complexity and O(|R||S|/B) cache complexity, whose access pattern only depends on M, B, |R| and |S|.
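A minimal RAM-model sketch of this idea (the joinable predicate and names are ours, not from [35]): the loop structure, and hence the sequence of writes, is fixed by |R| and |S| alone.

```python
def oblivious_nested_loop(R, S, joinable):
    """For every pair we write exactly one entry -- a real join result or a
    dummy (None) -- so |R| * |S| writes occur regardless of the data."""
    out = []
    for r in R:
        for s in S:
            out.append((r, s) if joinable(r, s) else None)
    return out

# Example: join on equal first components.
print(oblivious_nested_loop([(1, "x")], [(1, "u"), (2, "v")],
                            lambda r, s: r[0] == s[0]))
# [((1, 'x'), (1, 'u')), None]
```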

RelaxedTwoWay.

The relaxed two-way join algorithm [6, 40] takes as input two relations R, S and a parameter τ ≥ |R ⋈ S|, and outputs a table of τ elements containing the join results of R ⋈ S, whose access pattern only depends on |R|, |S| and τ. This algorithm can also be easily transformed into a cache-agnostic version with O((|R| + |S| + τ) log(|R| + |S| + τ)) time complexity and O(((|R| + |S| + τ)/B) log τ) cache complexity. In the full version [1], we show how to improve the cache complexity of this algorithm without sacrificing the time complexity.

Theorem 3.

For R ⋈ S and a parameter τ ≥ |R ⋈ S|, there is a cache-agnostic algorithm that can compute R ⋈ S with O((|R| + |S| + τ) log(|R| + |S| + τ)) time complexity and O(((|R| + |S| + τ)/B) log_{M/B}((|R| + |S| + τ)/B)) cache complexity under the tall cache and wide block assumptions, whose access pattern only depends on M, B, |R|, |S| and τ.

3 Beyond Oblivious Nested-loop Join

Although the nested-loop join algorithm is described for the two-way join, it can be extended to multi-way joins. For a general join query with k relations, the nested-loop primitive can be recursively invoked k−1 times, resulting in an oblivious algorithm with O(N^k/B) cache complexity. Careful inspection reveals that we do not necessarily feed all input relations into the nested loop; instead, we can restrict enumeration to combinations of tuples from relations included in an integral edge cover of the join query. Recall that for 𝒬=(𝒱,ℰ), an integral edge cover of 𝒬 is a function W : ℰ → {0,1} such that ∑_{e: x∈e} W(e) ≥ 1 holds for every x ∈ 𝒱. While enumerating combinations of tuples from relations “chosen” by W, we can apply semi-joins using the remaining relations to filter intermediate join results.

As described in Algorithm 1, the algorithm first chooses an optimal integral edge cover W of 𝒬 (line 1), and then invokes the NestedLoop primitive to iteratively compute the combinations of tuples from relations with W(e)=1 (line 7); the output is denoted as L. Meanwhile, we apply semi-joins between L and the remaining relations (line 8).

Below, we analyze the complexity of this algorithm. First, as |ℰ′| ≤ ρ, the intermediate join results materialized in the while-loop number at most O(N^ρ). After semi-join filtering, the number of surviving results is at most O(N^{ρ*}). In this way, the number of intermediate results materialized by line 7 is at most O(N^{ρ*+1}). Putting everything together, we obtain:

Theorem 4.

For a general join query 𝒬, there is an oblivious and cache-agnostic algorithm that can compute 𝒬(ℐ) for an arbitrary instance ℐ of input size N with O(N^{min{ρ,ρ*+1}}) time complexity and O((N^{min{ρ,ρ*+1}}/B) log_{M/B}(N^{min{ρ,ρ*+1}}/B)) cache complexity under the tall cache and wide block assumptions, where ρ* and ρ are the optimal fractional and integral edge cover numbers of 𝒬, respectively.

It is important to note that any oblivious join algorithm incurs a cache complexity of Ω(N^{ρ*}/B), so Theorem 4 is optimal up to a logarithmic factor for join queries where ρ = ρ*. Below, we list several important classes of join queries that exhibit this desirable property:

Algorithm 1 ObliviousNestedLoopJoin(𝒬, ℐ).
1  W ← an optimal integral edge cover of 𝒬, L ← ∅;
2  ℰ′ ← {e ∈ ℰ : W(e) = 1};
3  while ℰ′ ≠ ∅ do
4    e ← an arbitrary relation in ℰ′;
5    ℰ′ ← ℰ′ − {e};
6    if L = ∅ then L ← R_e;
7    else L ← NestedLoop(L, R_e);
8    foreach e′ ∈ ℰ − {e} do L ← SemiJoin(L, R_{e′});
9  return L;
Example 5 (α-acyclic Join).

A join query 𝒬 is α-acyclic [12, 25] if there is a tree structure 𝒯 of 𝒬=(𝒱,ℰ) such that (1) there is a one-to-one correspondence between relations in 𝒬 and nodes in 𝒯; and (2) for every attribute x ∈ 𝒱, the set of nodes containing x forms a connected subtree of 𝒯. Any α-acyclic join admits an optimal fractional edge cover that is integral [36].

Example 6 (Even-length Cycle Join).

An even-length cycle join is defined as 𝒬 = R1(x1,x2) ⋈ R2(x2,x3) ⋈ ⋯ ⋈ Rk−1(xk−1,xk) ⋈ Rk(xk,x1) for some even integer k. It has two integral edge covers {R1, R3, …, Rk−1} and {R2, R4, …, Rk}, both of which are also optimal fractional edge covers. Hence, ρ = ρ* = k/2.

Example 7 (Boat Join).

A boat join is defined as 𝒬 = R1(x1,y1) ⋈ R2(x2,y2) ⋈ ⋯ ⋈ Rk(xk,yk) ⋈ Rk+1(x1,x2,…,xk) ⋈ Rk+2(y1,y2,…,yk). It has an integral edge cover {Rk+1, Rk+2} that is also an optimal fractional edge cover. Hence, ρ = ρ* = 2.

4 Warm Up: Triangle Join

The simplest join query that the oblivious nested-loop join algorithm cannot solve optimally is the triangle join 𝒬 = R1(x2,x3) ⋈ R2(x1,x3) ⋈ R3(x1,x2), which has ρ = 2 and ρ* = 3/2. While various worst-case optimal algorithms for the triangle join have been proposed in the RAM model, none of these are oblivious due to their inherent leakage of intermediate statistics. Below, we outline the issues with existing insecure algorithms and propose a strategy to make them oblivious.

Insecure Triangle Join Algorithm 2.

We start with attribute x1. Each value a ∈ dom(x1) induces a subquery 𝒬_a = R1 ⋈ (σ_{x1=a}R2) ⋈ (σ_{x1=a}R3). Moreover, a value a ∈ dom(x1) is heavy if |π_{x3}σ_{x1=a}R2| · |π_{x2}σ_{x1=a}R3| is greater than |R1|, and light otherwise. If a is light, 𝒬_a is computed by materializing the Cartesian product between π_{x3}σ_{x1=a}R2 and π_{x2}σ_{x1=a}R3, and then filtering the intermediate result by a semi-join with R1. Every surviving tuple forms a join result with a, which will be written back to untrusted memory. If a is heavy, 𝒬_a is computed by applying semi-joins between R1 and each of σ_{x1=a}R2 and σ_{x1=a}R3. This algorithm achieves a time complexity of O(N^{3/2}) (see [43] for a detailed analysis), but it leaks sensitive information through the following mechanisms:

  • |(π_{x1}R2) ∩ (π_{x1}R3)| is leaked by the number of for-loop iterations in line 2;

  • |π_{x2}σ_{x1=a}R3| and |π_{x3}σ_{x1=a}R2| are leaked by the number of for-loop iterations in line 4;

  • The numbers of heavy and light values in (π_{x1}R2) ∩ (π_{x1}R3) are leaked by the if-else condition in lines 3 and 6;

To protect intermediate statistics, we pad dummy tuples to every intermediate result (such as (π_{x1}R2) ∩ (π_{x1}R3), π_{x3}σ_{x1=a}R2 and π_{x2}σ_{x1=a}R3) to match the worst-case size N. To hide heavy and light values, we replace the conditional if-else branches with a unified execution plan that visits every possible combination of (π_{x2}σ_{x1=a}R3) × (π_{x3}σ_{x1=a}R2) and every tuple of R1. By integrating these techniques, this strategy leads to N² memory accesses, hence destroying the power of two choices that is a key advantage of the insecure WCOJ algorithm.

Algorithm 2 Compute 𝒬 by power of two choices.
1  L ← ∅;
2  foreach a ∈ (π_{x1}R2) ∩ (π_{x1}R3) do
3    if |σ_{x1=a}R2| · |σ_{x1=a}R3| ≤ |R1| then
4      foreach (b,c) ∈ (π_{x2}σ_{x1=a}R3) × (π_{x3}σ_{x1=a}R2) do
5        if (b,c) ∈ R1 then write (a,b,c) to L;
6    else
7      foreach (b,c) ∈ R1 do
8        if (a,b) ∈ R3 and (a,c) ∈ R2 then write (a,b,c) to L;
9  return L;
Algorithm 3 Inject Obliviousness to Algorithm 2.
1  A ← (π_{x1}R2) ∩ (π_{x1}R3) by Project and Intersect;
2  A ← Augment(A, {R2, R3}, x1);
3  A1, A2 ← ∅;
4  while read (a, Δ1, Δ2) from A do   // Δ1 = |π_{x3}σ_{x1=a}R2| and Δ2 = |π_{x2}σ_{x1=a}R3|
5    if Δ1 · Δ2 ≤ |R1| then write a to A1, write ⊥ to A2;
6    else write a to A2, write ⊥ to A1;
7  L1 ← RelaxedTwoWay(A2, R1, N^{3/2});
8  L1 ← SemiJoin(L1, R2), L1 ← SemiJoin(L1, R3);
9  R2 ← SemiJoin(R2, A1), R3 ← SemiJoin(R3, A1);
10 L2 ← RelaxedTwoWay(R2, R3, N^{3/2});
11 L2 ← SemiJoin(L2, R1);
12 return Compact L1 ∪ L2 while keeping the first N^{3/2} tuples;

Inject Obliviousness to Algorithm 2.

To inject obliviousness into Algorithm 2, Algorithm 3 leverages oblivious primitives to ensure the same access pattern across all instances of the same input size. Here is a breakdown of how this is achieved and why it works. We start by computing A = (π_{x1}R2) ∩ (π_{x1}R3) with the Intersect primitive. Then, we partition the values in A into two subsets A1, A2, depending on the relative order between |π_{x3}σ_{x1=a}R2| · |π_{x2}σ_{x1=a}R3| and |R1|. We next compute the two-way joins A2 ⋈ R1 and (R2 ⋉ A1) ⋈ (R3 ⋉ A1) by invoking the RelaxedTwoWay primitive separately, each with the upper bound N^{3/2}. Finally, we filter the intermediate join results by the SemiJoin primitive and remove unnecessary dummy tuples by the Compact primitive.
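The partition step of Algorithm 3 (lines 4–6) can be sketched as follows; the data layout is hypothetical (each entry of A arrives already augmented with the two degrees), and the point is that every element causes one real write and one dummy write, so the lengths of A1 and A2 never reveal how many values are heavy.

```python
def partition_heavy_light(A_aug, r1_size):
    """Route each x1-value to A1 (light) or A2 (heavy) and write a dummy
    (None) to the other array, so both arrays always have length |A|."""
    A1, A2 = [], []
    for a, d1, d2 in A_aug:        # d1 = |pi_x3 sigma_{x1=a} R2|, d2 = |pi_x2 sigma_{x1=a} R3|
        if d1 * d2 <= r1_size:     # light: handled by the Cartesian-product branch (R2, R3)
            A1.append(a); A2.append(None)
        else:                      # heavy: handled by joining with R1
            A1.append(None); A2.append(a)
    return A1, A2

print(partition_heavy_light([("a1", 2, 3), ("a2", 10, 10)], r1_size=8))
# (['a1', None], [None, 'a2'])
```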

Analysis of Algorithm 3.

It suffices to show that |(R2 ⋉ A1) ⋈ (R3 ⋉ A1)| ≤ N^{3/2} and |A2 ⋈ R1| ≤ N^{3/2}, which directly follows from the query decomposition lemma [44]:

∑_{a∈A} min{|σ_{x1=a}R2| · |σ_{x1=a}R3|, |R1|} ≤ ∑_{a∈A} (|R2 ⋉ a| · |R3 ⋉ a|)^{1/2} · |R1 ⋉ a|^{1/2} ≤ N^{3/2}.

All other primitives have O(N log N) time complexity and O((N/B) log_{M/B}(N/B)) cache complexity. Hence, the whole algorithm incurs O(N^{3/2} log N) time complexity and O((N^{3/2}/B) log_{M/B}(N^{3/2}/B)) cache complexity. As each step is oblivious, the composition of all these steps is also oblivious.

Insecure Triangle Join Algorithm 4.

We start with attribute x1. We first compute the candidate values of x1 that appear in some join results, i.e., (π_{x1}R2) ∩ (π_{x1}R3). For each candidate value a, we retrieve the candidate values of x2 that can appear together with a in some join results, i.e., (π_{x2}σ_{x1=a}R3) ∩ (π_{x2}R1). Furthermore, for each candidate value b, we explore the possible values of x3 that can appear together with (a,b) in some join results. More precisely, every value c that appears in π_{x3}σ_{x2=b}R1 as well as in π_{x3}σ_{x1=a}R2 forms a triangle with a, b. This algorithm runs in O(N^{3/2}) time (see [44] for a detailed analysis). Similarly, it is not oblivious, as the following intermediate statistics may be leaked:

  • |(π_{x1}R2) ∩ (π_{x1}R3)| is leaked by the number of for-loop iterations in line 2;

  • |(π_{x2}σ_{x1=a}R3) ∩ (π_{x2}R1)| is leaked by the number of for-loop iterations in line 3;

  • |(π_{x3}σ_{x2=b}R1) ∩ (π_{x3}σ_{x1=a}R2)| is leaked by the number of for-loop iterations in line 4;

Algorithm 4 Compute 𝒬 by delaying computation.
1  L ← ∅;
2  foreach a ∈ (π_{x1}R2) ∩ (π_{x1}R3) do
3    foreach b ∈ (π_{x2}σ_{x1=a}R3) ∩ (π_{x2}R1) do
4      foreach c ∈ (π_{x3}σ_{x2=b}R1) ∩ (π_{x3}σ_{x1=a}R2) do
5        write (a,b,c) to L;
6  return L;
Algorithm 5 Inject Obliviousness to Algorithm 4.
1  R3 ← Augment(R3, R1, x2), R3 ← Augment(R3, R2, x1);
2  K1, K2 ← ∅;
3  while read (t, Δ1, Δ2) from R3 do   // Suppose Δ_i = |R_i ⋉ {t}|
4    if Δ1 ≤ Δ2 then write t to K1, write ⊥ to K2;
5    else write t to K2, write ⊥ to K1;
6  L1 ← RelaxedTwoWay(K1, R1, N^{3/2}), L1 ← SemiJoin(L1, R2);
7  L2 ← RelaxedTwoWay(K2, R2, N^{3/2}), L2 ← SemiJoin(L2, R1);
8  return Compact L1 ∪ L2 while keeping the first N^{3/2} tuples;

To achieve obliviousness, a straightforward solution is to pad every intermediate result with dummy tuples to match the worst-case size N. However, this would result in N³ memory accesses, which is even less efficient than the nested-loop-based algorithm in Section 3.

Inject Obliviousness to Algorithm 4.

We transform Algorithm 4 into an oblivious version, presented as Algorithm 5, by employing oblivious primitives. The first modification merges the first two for-loops (lines 2–3 in Algorithm 4) into one step (line 1 in Algorithm 5). This is achieved by applying semi-joins on R3 using R1 and R2 separately. Then, the third for-loop (line 4 in Algorithm 4) is replaced with a strategy based on the power of two choices. Specifically, for each surviving tuple (a,b) ∈ R3, we first compute the sizes of two lists, |π_{x3}σ_{x2=b}R1| and |π_{x3}σ_{x1=a}R2|, and put (a,b) into either K1 or K2, based on the relative order between |π_{x3}σ_{x2=b}R1| and |π_{x3}σ_{x1=a}R2|. We next compute the two-way joins K1 ⋈ R1 and K2 ⋈ R2 by invoking the RelaxedTwoWay primitive, each with the upper bound N^{3/2} separately. Finally, we filter the intermediate join results by the SemiJoin primitive and remove unnecessary dummy tuples by the Compact primitive.

Complexity of Algorithm 5.

It suffices to show that |K1 ⋈ R1| ≤ N^{3/2} and |K2 ⋈ R2| ≤ N^{3/2}, which directly follows from the query decomposition lemma [44]:

∑_{(a,b)∈R3} min{|π_{x3}σ_{x2=b}R1|, |π_{x3}σ_{x1=a}R2|} ≤ ∑_{(a,b)∈R3} |R1 ⋉ (a,b)|^{1/2} · |R2 ⋉ (a,b)|^{1/2} ≤ N^{3/2}.

All other primitives incur O(N log N) time complexity and O((N/B) log_{M/B}(N/B)) cache complexity. Hence, this algorithm incurs O(N^{3/2} log N) time complexity and O((N^{3/2}/B) log_{M/B}(N^{3/2}/B)) cache complexity. As each step is oblivious, the composition of all these steps is also oblivious.

Theorem 8.

For the triangle join 𝒬, there is an oblivious and cache-agnostic algorithm that can compute 𝒬(ℐ) for any instance ℐ of input size N with O(N^{3/2} log N) time complexity and O((N^{3/2}/B) log_{M/B}(N^{3/2}/B)) cache complexity under the tall cache and wide block assumptions.

5 Oblivious Worst-case Optimal Join Algorithm

Algorithm 6 GenericJoin(𝒬=(𝒱,ℰ), ℐ) [44].
1  if |𝒱| = 1 then return ∩_{e∈ℰ} R_e by Intersect;
2  (I, J) ← an arbitrary partition of 𝒱;
3  𝒬_I ← GenericJoin((I, ℰ[I]), {π_I R_e : e ∈ ℰ});
4  foreach t ∈ 𝒬_I do 𝒬_t ← GenericJoin((J, ℰ[J]), {π_J(R_e ⋉ t) : e ∈ ℰ});
5  return ∪_{t∈𝒬_I} {t} × 𝒬_t;

In this section, we start by revisiting the insecure WCOJ algorithm in Section 5.1, present our oblivious algorithm in Section 5.2, and analyze it in Section 5.3. Subsequently, in Section 5.4, we explore the implications of our oblivious algorithm for relaxed oblivious algorithms designed for cyclic join queries.

5.1 Generic Join Revisited

In a join query 𝒬=(𝒱,ℰ), for a subset of attributes S ⊆ 𝒱, we use 𝒬[S]=(S, ℰ[S]) to denote the sub-query induced by the attributes in S, where ℰ[S] = {e ∩ S : e ∈ ℰ}. For each attribute x ∈ 𝒱, we use ℰ_x = {e ∈ ℰ : x ∈ e} to denote the set of relations containing x. The insecure WCOJ algorithm described in [44] is outlined in Algorithm 6, which takes as input a general join query 𝒬=(𝒱,ℰ) and an instance ℐ. In the base case, when only one attribute exists, it computes the intersection of all relations. For the general case, it partitions the attributes into two disjoint subsets, I and J, such that I ∩ J = ∅ and I ∪ J = 𝒱. The algorithm first computes the sub-query 𝒬[I], induced by the attributes in I, whose join result is denoted 𝒬_I. Then, for each tuple t ∈ 𝒬_I, it recursively invokes the whole algorithm to compute the sub-query 𝒬[J] induced by the attributes in J, over the tuples that can be joined with t. The resulting join result for each tuple t is denoted as 𝒬_t. Finally, it attaches each tuple in 𝒬_t with t, representing the join results in which t participates. The algorithm ultimately returns the union of all join results for tuples in 𝒬_I. However, Algorithm 6 exhibits significant leakage of data statistics that violates the obliviousness constraint, for example:

  • |∩_{e∈ℰ} R_e| is leaked in line 1;

  • |π_I R_e| for each relation e ∈ ℰ is leaked in line 3;

  • |𝒬_I|, |π_J(R_e ⋉ t)|, and |𝒬_t| for each tuple t ∈ 𝒬_I are leaked in line 4.

More importantly, this algorithm heavily relies on hash indexes or range search indexes for retrieving tuples, so that the intersection at line 1 can be computed in O(min_{e∈ℰ} |R_e|) time. However, these indexes do not work well in the external memory model, since naively extending this algorithm could result in O(N^{ρ*}) cache complexity, which is too expensive. Consequently, designing a WCOJ algorithm that simultaneously maintains cache locality and achieves obliviousness remains a significant challenge.

Algorithm 7 ObliviousGenericJoin(𝒬=(𝒱,ℰ), ℐ).
1  if |𝒱| = 1 then return ∩_{e∈ℰ} R_e by Intersect;
2  (I, J) ← a partition of 𝒱 such that (1) |J| = 1; or (2) |J| = 2 (say J = {y, z}) and ℰ_y ⊈ ℰ_z and ℰ_z ⊈ ℰ_y;
3  foreach e ∈ ℰ do S_e ← Project(R_e, e ∩ I);
4  𝒬_I ← ObliviousGenericJoin((I, ℰ[I]), {S_e : e ∈ ℰ});
5  if |J| = 1 then   // Suppose J = {x}
6    foreach e ∈ ℰ_x do 𝒬_I ← Augment(𝒬_I, R_e, e ∩ I);
7    {𝒬_I^e}_{e∈ℰ_x} ← Partition-I(𝒬_I, x);
8    foreach e ∈ ℰ_x do
9      L_e ← RelaxedTwoWay(𝒬_I^e, R_e, N^{ρ*(𝒬)});
10     foreach e′ ∈ ℰ_x − {e} do L_e ← SemiJoin(L_e, R_{e′});
11   L ← ∪_{e∈ℰ_x} L_e;
12 else   // Suppose J = {y, z}
13   foreach e ∈ ℰ_y ∪ ℰ_z do 𝒬_I ← Augment(𝒬_I, R_e, e ∩ I);
14   {𝒬_I^{e1,e2}}_{(e1,e2)∈(ℰ_y−ℰ_z)×(ℰ_z−ℰ_y)}, {𝒬_I^{e3}}_{e3∈ℰ_y∩ℰ_z} ← Partition-II(𝒬_I, y, z);
15   foreach (e1, e2) ∈ (ℰ_y − ℰ_z) × (ℰ_z − ℰ_y) do
16     L_{e1,e2} ← RelaxedTwoWay(𝒬_I^{e1,e2}, R_{e1}, N^{ρ*(𝒬)});
17     L_{e1,e2} ← RelaxedTwoWay(L_{e1,e2}, R_{e2}, N^{ρ*(𝒬)});
18     foreach e′ ∈ ℰ − {e1, e2} do L_{e1,e2} ← SemiJoin(L_{e1,e2}, R_{e′});
19   foreach e3 ∈ ℰ_y ∩ ℰ_z do
20     L_{e3} ← RelaxedTwoWay(𝒬_I^{e3}, R_{e3}, N^{ρ*(𝒬)});
21     foreach e′ ∈ ℰ − {e3} do L_{e3} ← SemiJoin(L_{e3}, R_{e′});
22   L ← (∪_{(e1,e2)∈(ℰ_y−ℰ_z)×(ℰ_z−ℰ_y)} L_{e1,e2}) ∪ (∪_{e3∈ℰ_y∩ℰ_z} L_{e3});
23 return Compact L while keeping the first N^{ρ*(𝒬)} tuples;
Algorithm 8 Partition-I(𝒬_I, x).
1  foreach e ∈ ℰ_x do 𝒬_I^e ← ∅;
2  while read (t, {Δ_e(t)}_{e∈ℰ_x}) from 𝒬_I do   // Suppose Δ_e(t) = |R_e ⋉ {t}|
3    e′ ← argmin_{e∈ℰ_x} Δ_e(t);
4    write t to 𝒬_I^{e′} and write ⊥ to 𝒬_I^{e″} for each e″ ∈ ℰ_x − {e′};
5  return {𝒬_I^e}_{e∈ℰ_x};
Algorithm 9 Partition-II(𝒬_I, y, z).
1  foreach (e1, e2) ∈ (ℰ_y − ℰ_z) × (ℰ_z − ℰ_y) do 𝒬_I^{e1,e2} ← ∅;
2  foreach e3 ∈ ℰ_y ∩ ℰ_z do 𝒬_I^{e3} ← ∅;
3  while read (t, {Δ_e(t)}_{e∈ℰ_y∪ℰ_z}) from 𝒬_I do   // Suppose Δ_e(t) = |R_e ⋉ {t}|
4    e1, e2, e3 ← argmin_{e∈ℰ_y−ℰ_z} Δ_e(t), argmin_{e∈ℰ_z−ℰ_y} Δ_e(t), argmin_{e∈ℰ_y∩ℰ_z} Δ_e(t);
5    if Δ_{e1}(t) · Δ_{e2}(t) ≤ Δ_{e3}(t) then
6      write t to 𝒬_I^{e1,e2};
7      foreach (e1′, e2′) ∈ (ℰ_y − ℰ_z) × (ℰ_z − ℰ_y) − {(e1, e2)} do write ⊥ to 𝒬_I^{e1′,e2′};
8      foreach e3′ ∈ ℰ_y ∩ ℰ_z do write ⊥ to 𝒬_I^{e3′};
9    else
10     write t to 𝒬_I^{e3};
11     foreach (e1′, e2′) ∈ (ℰ_y − ℰ_z) × (ℰ_z − ℰ_y) do write ⊥ to 𝒬_I^{e1′,e2′};
12     foreach e3′ ∈ ℰ_y ∩ ℰ_z − {e3} do write ⊥ to 𝒬_I^{e3′};
13 return {𝒬_I^{e1,e2}}_{(e1,e2)∈(ℰ_y−ℰ_z)×(ℰ_z−ℰ_y)}, {𝒬_I^{e3}}_{e3∈ℰ_y∩ℰ_z};

5.2 Our Algorithm

Now, we extend our oblivious triangle join algorithms from Section 4 to general join queries, as described in Algorithm 7. It is built on a recursive framework:

Base Case: |𝓥|=𝟏.

In this case, the join degenerates to the set intersection of all input relations, which can be efficiently computed by the Intersect primitive.

General Case: |𝓥|>𝟏.

In general, we partition 𝒱 into two subsets I and J, with the constraint that |J| = 1, or |J| = 2 and the two attributes y, z in J satisfy ℰ_y ⊈ ℰ_z and ℰ_z ⊈ ℰ_y. Similar to Algorithm 6, we compute the sub-query 𝒬[I] by invoking the whole algorithm recursively, whose join result is denoted as 𝒬_I. To prevent potential leakage, we must be careful about the projection of each relation involved in this subquery, which is computed by the Project primitive. We further distinguish two cases based on |J|:

General Case 1: |𝑱|=𝟏.

Suppose J = {x}. Recall that for each tuple t ∈ 𝒬_I, Algorithm 6 computes the intersection ∩_{e∈ℰ_x}(R_e ⋉ t) on x in the base case. To ensure this step remains oblivious, we must conceal the size of each R_e ⋉ t. To achieve this, we augment each tuple t ∈ 𝒬_I with its degree in R_e, defined as Δ_e(t) = |R_e ⋉ t|, using the Augment primitive. Then, we partition the tuples in 𝒬_I into |ℰ_x| subsets based on their smallest degree across all relations in ℰ_x. The details are described in Algorithm 8. Let 𝒬_I^e ⊆ 𝒬_I denote the set of tuples whose degree is smallest in R_e, i.e., e = argmin_{e′∈ℰ_x} Δ_{e′}(t) for each t ∈ 𝒬_I^e. Whenever we write one tuple t ∈ 𝒬_I to one subset, we also write a dummy tuple to the other |ℰ_x| − 1 subsets. At last, for each e ∈ ℰ_x, we compute R_e ⋉ 𝒬_I^e by invoking the RelaxedTwoWay primitive (line 9), with upper bound N^{ρ*}, and further filter the results by the remaining relations with semi-joins (line 10).
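A sketch of Partition-I (Algorithm 8) in the same spirit, with a hypothetical in-memory layout in which each tuple of 𝒬_I arrives with its degree in every relation of ℰ_x: each input tuple triggers exactly one real write and |ℰ_x| − 1 dummy writes, so all buckets have the same, data-independent length.

```python
def partition_I(QI_aug, relations):
    """Route each tuple of Q_I to the bucket of the relation where its degree
    is smallest; write a dummy (None) to every other bucket."""
    buckets = {e: [] for e in relations}
    for t, degree in QI_aug:                       # degree[e] = |R_e semijoined with {t}|
        best = min(relations, key=lambda e: degree[e])
        for e in relations:
            buckets[e].append(t if e == best else None)
    return buckets

QI_aug = [(("u1",), {"R_a": 5, "R_b": 2}), (("u2",), {"R_a": 1, "R_b": 4})]
print(partition_I(QI_aug, ["R_a", "R_b"]))
# {'R_a': [None, ('u2',)], 'R_b': [('u1',), None]}
```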

General Case 2: |𝑱|=𝟐.

Suppose J = {y, z}. Consider an arbitrary tuple t ∈ 𝒬_I. Algorithm 6 computes the residual query over {R_e ⋉ t : e ∈ ℰ_y ∩ ℰ_z} ∪ {R_e ⋉ t : e ∈ ℰ_y − ℰ_z} ∪ {R_e ⋉ t : e ∈ ℰ_z − ℰ_y}. Like the case above, we first compute the degree of t in each such R_e as Δ_e(t), by the Augment primitive. We then partition the tuples in 𝒬_I into |ℰ_y ∩ ℰ_z| + |ℰ_y − ℰ_z| · |ℰ_z − ℰ_y| subsets based on their degrees, in a more involved way than in Case 1. The details are described in Algorithm 9. More specifically, for each e3 ∈ ℰ_y ∩ ℰ_z, let

𝒬_I^{e3} = {t ∈ 𝒬_I : Δ_{e3}(t) = min_{e″∈ℰ_y∩ℰ_z} Δ_{e″}(t),  Δ_{e3}(t) < min_{e′∈ℰ_y−ℰ_z, e″∈ℰ_z−ℰ_y} Δ_{e′}(t) · Δ_{e″}(t)};

and for each pair (e1, e2) ∈ (ℰ_y − ℰ_z) × (ℰ_z − ℰ_y), let

𝒬_I^{e1,e2} = {t ∈ 𝒬_I : Δ_{e1}(t) · Δ_{e2}(t) = min_{e′∈ℰ_y−ℰ_z, e″∈ℰ_z−ℰ_y} Δ_{e′}(t) · Δ_{e″}(t) ≤ min_{e″∈ℰ_y∩ℰ_z} Δ_{e″}(t)}.

For each (e1, e2) ∈ (ℰ_y − ℰ_z) × (ℰ_z − ℰ_y), we compute R_{e1} ⋈ R_{e2} ⋉ 𝒬_I^{e1,e2} by invoking the RelaxedTwoWay primitive iteratively (lines 16–17), with the upper bound N^{ρ*(𝒬)}, and filter these results by the remaining relations with semi-joins (line 18). For each e3 ∈ ℰ_y ∩ ℰ_z, we compute R_{e3} ⋉ 𝒬_I^{e3} by invoking the RelaxedTwoWay primitive (line 20), with the upper bound N^{ρ*(𝒬)}, and filter these results by the remaining relations with semi-joins (line 21).

5.3 Analysis of Algorithm 7

Base Case: |𝓥|=𝟏.

The obliviousness is guaranteed by the Intersect primitive. The cache complexity is O((N/B) log_{M/B}(N/B)). In this case, ρ* = 1. Hence, Theorem 9 holds.

General Case: |𝓥|>𝟏.

By the induction hypothesis, the recursive invocation of ObliviousGenericJoin at line 4 takes O(N^{ρ*(𝒬)} log N) time and O((N^{ρ*(𝒬)}/B) log_{M/B}(N/B)) cache complexity, since ρ*((I, ℰ[I])) ≤ ρ*(𝒬). We then show the correctness and complexity of all invocations of the RelaxedTwoWay primitive. Let ρ*(·) be an optimal fractional edge cover of 𝒬. The real size of the two-way joins at line 9 can be first rewritten as:

∑_{e∈ℰ_x} |R_e ⋉ 𝒬_I^e| = ∑_{e∈ℰ_x} ∑_{t∈𝒬_I^e} |R_e ⋉ t| = ∑_{e∈ℰ_x} ∑_{t∈𝒬_I^e} min_{e′∈ℰ_x} |R_{e′} ⋉ t| ≤ ∑_{t∈𝒬_I} ∏_{e∈ℰ_x} |R_e ⋉ t|^{ρ*(e)} ≤ N^{ρ*}

where the inequalities follow from the facts that ∑_{e∈ℰ_x} ρ*(e) ≥ 1, ∪_{e∈ℰ_x} 𝒬_I^e = 𝒬_I, and the query decomposition lemma [44]. Hence, N^{ρ*(𝒬)} is a valid upper bound on |R_e ⋉ 𝒬_I^e| for each e ∈ ℰ_x. The real size of the two-way joins at lines 16–17 and line 20 can be rewritten as

∑_{e1∈ℰ_y−ℰ_z, e2∈ℰ_z−ℰ_y} |R_{e1} ⋈ R_{e2} ⋉ 𝒬_I^{e1,e2}| + ∑_{e3∈ℰ_y∩ℰ_z} |R_{e3} ⋉ 𝒬_I^{e3}|
= ∑_{e1∈ℰ_y−ℰ_z, e2∈ℰ_z−ℰ_y} ∑_{t∈𝒬_I^{e1,e2}} |(R_{e1} ⋉ t) ⋈ (R_{e2} ⋉ t)| + ∑_{e3∈ℰ_y∩ℰ_z} ∑_{t∈𝒬_I^{e3}} |R_{e3} ⋉ t|
= ∑_{t∈𝒬_I} min{ min_{e1∈ℰ_y−ℰ_z, e2∈ℰ_z−ℰ_y} |R_{e1} ⋉ t| · |R_{e2} ⋉ t|,  min_{e3∈ℰ_y∩ℰ_z} |R_{e3} ⋉ t| }   (2)

Let ρ1 = ∑_{e∈ℰ_y−ℰ_z} ρ*(e), ρ2 = ∑_{e∈ℰ_z−ℰ_y} ρ*(e) and ρ3 = ∑_{e∈ℰ_y∩ℰ_z} ρ*(e). Note that ρ3 ≥ 1 − min{ρ1, ρ2}, as ρ*(·) is a valid fractional edge cover for both y and z. For each tuple t ∈ 𝒬_I, we have

min{ min_{e1∈ℰ_y−ℰ_z, e2∈ℰ_z−ℰ_y} |R_{e1} ⋉ t| · |R_{e2} ⋉ t|,  min_{e3∈ℰ_y∩ℰ_z} |R_{e3} ⋉ t| }
≤ (min_{e1∈ℰ_y−ℰ_z} |R_{e1} ⋉ t|)^{ρ1} · (min_{e2∈ℰ_z−ℰ_y} |R_{e2} ⋉ t|)^{ρ2} · (min_{e3∈ℰ_y∩ℰ_z} |R_{e3} ⋉ t|)^{ρ3}
≤ ∏_{e∈ℰ_y−ℰ_z} |R_e ⋉ t|^{ρ*(e)} · ∏_{e∈ℰ_z−ℰ_y} |R_e ⋉ t|^{ρ*(e)} · ∏_{e∈ℰ_y∩ℰ_z} |R_e ⋉ t|^{ρ*(e)} = ∏_{e∈ℰ_y∪ℰ_z} |R_e ⋉ t|^{ρ*(e)},

where the first inequality follows from min{a,b} ≤ a^p · b^{1−p} for a, b ≥ 0 and p ∈ [0,1] (applied with p = min{ρ1, ρ2}), together with ρ1, ρ2 ≥ min{ρ1, ρ2} and ρ3 ≥ 1 − min{ρ1, ρ2}, and the second inequality holds since a minimum is at most each individual term. Now, we can further bound (2) as

(2) ≤ ∑_{t∈𝒬_I} ∏_{e∈ℰ_y∪ℰ_z} |R_e ⋉ t|^{ρ*(e)} ≤ ∏_{e∈ℰ} |R_e|^{ρ*(e)} ≤ N^{ρ*},

where the second-to-last inequality follows from the query decomposition lemma [44].

Theorem 9.

For a general join query 𝒬, there is an oblivious and cache-agnostic algorithm that can compute 𝒬(ℐ) for any instance ℐ of input size N with O(N^{ρ*} log N) time complexity and O((N^{ρ*}/B) log_{M/B}(N^{ρ*}/B)) cache complexity under the tall cache and wide block assumptions, where ρ* is the optimal fractional edge cover number of 𝒬.

5.4 Implications to Relaxed Oblivious Algorithms

Our oblivious WCOJ algorithm can be combined with the generalized hypertree decomposition framework [33] to develop a relaxed oblivious algorithm for general join queries.

Definition 10 (Generalized Hypertree Decomposition (GHD)).

Given a join query 𝒬=(𝒱,ℰ), a GHD of 𝒬 is a pair (𝒯, λ), where 𝒯 is a tree over an ordered set of nodes and λ : 𝒯 → 2^𝒱 is a labeling function which associates to each node u ∈ 𝒯 a subset of attributes λ_u ⊆ 𝒱, such that (1) for each e ∈ ℰ, there is a node u ∈ 𝒯 such that e ⊆ λ_u; and (2) for each x ∈ 𝒱, the set of nodes {u ∈ 𝒯 : x ∈ λ_u} forms a connected subtree of 𝒯. The fractional hypertree width of 𝒬 is defined as min_{(𝒯,λ)} max_{u∈𝒯} ρ*((λ_u, {e ∩ λ_u : e ∈ ℰ})).

The pseudocode of our algorithm is given in the full version [1]. Suppose we take as input a join query 𝒬=(𝒱,ℰ), an instance ℐ, and an upper bound on the output size τ ≥ |𝒬(ℐ)|. Let (𝒯, λ) be an arbitrary GHD of 𝒬. We first invoke Algorithm 7 to compute the subquery 𝒬_u = (λ_u, ℰ_u) defined by each node u ∈ 𝒯, where ℰ_u = {e ∩ λ_u : e ∈ ℰ}, and materialize its join result as one relation. We then apply the classic Yannakakis algorithm [54] on the materialized relations by invoking the SemiJoin primitive for semi-joins and the RelaxedTwoWay primitive for pairwise joins. After removing dangling tuples, the size of each two-way join is upper bounded by the size of the final join result and, therefore, by τ. This leads to a relaxed oblivious algorithm whose access pattern only depends on N and τ.

Theorem 11.

For a join query 𝒬, an instance ℐ of input size N, and a parameter τ ≥ |𝒬(ℐ)|, there is a cache-agnostic algorithm that can compute 𝒬(ℐ) with O((N^w + τ) log(N^w + τ)) time complexity and O(((N^w + τ)/B) log_{M/B}((N^w + τ)/B)) cache complexity, whose access pattern only depends on N and τ, where w is the fractional hypertree width of 𝒬.

6 Conclusion

This paper has introduced a general framework for oblivious multi-way join processing, achieving near-optimal time and cache complexity. However, several intriguing questions remain open for future exploration:

  • Balancing Privacy and Efficiency. Recent research has investigated improved trade-offs between privacy and efficiency, aiming to overcome the challenges of worst-case scenarios, such as differentially oblivious algorithms [15].

  • Emit model for EM algorithms. In the context of EM join algorithms, the emit model - where join results are directly outputted without writing back to disk - has been considered. It remains open whether oblivious, worst-case optimal join algorithms can be developed without requiring all join results to be written back to disk.

  • Communication-oblivious join algorithm for MPC model. A natural connection exists between the MPC and EM models in join processing. While recent work has explored communication-oblivious algorithms in the MPC model [14, 49], extending these ideas to multi-way join processing remains an open challenge.

References

  • [1] Xiao Hu and Zhiang Wu. Optimal oblivious algorithms for multi-way joins (full version). https://arxiv.org/pdf/2501.04216.
  • [2] Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of databases, volume 8. Addison-Wesley Reading, 1995.
  • [3] Mahmoud Abo Khamis, Hung Q Ngo, and Atri Rudra. Faq: questions asked frequently. In PODS, pages 13–28, 2016. doi:10.1145/2902251.2902280.
  • [4] Alok Aggarwal and Jeffrey S. Vitter. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116–1127, 1988. doi:10.1145/48529.48535.
  • [5] Miklós Ajtai, János Komlós, and Endre Szemerédi. An O(n log n) sorting network. In STOC, pages 1–9, 1983.
  • [6] Arvind Arasu and Raghav Kaushik. Oblivious query processing. ICDT, 2013.
  • [7] Lars Arge, Michael A Bender, Erik D Demaine, Bryan Holland-Minkley, and J Ian Munro. An optimal cache-oblivious priority queue and its application to graph algorithms. SIAM Journal on Computing, 36(6):1672–1695, 2007. doi:10.1137/S0097539703428324.
  • [8] Gilad Asharov, Ilan Komargodski, Wei-Kai Lin, Kartik Nayak, Enoch Peserico, and Elaine Shi. Optorama: Optimal oblivious ram. In Eurocrypt, pages 403–432. Springer, 2020. doi:10.1007/978-3-030-45724-2_14.
  • [9] Albert Atserias, Martin Grohe, and Dániel Marx. Size bounds and query plans for relational joins. In FOCS, pages 739–748. IEEE, 2008. doi:10.1109/FOCS.2008.43.
  • [10] Kenneth E Batcher. Sorting networks and their applications. In Proceedings of the April 30–May 2, 1968, spring joint computer conference, pages 307–314, 1968. doi:10.1145/1468075.1468121.
  • [11] Paul Beame, Paraschos Koutris, and Dan Suciu. Communication steps for parallel query processing. JACM, 64(6):1–58, 2017. doi:10.1145/3125644.
  • [12] C. Beeri, R. Fagin, D. Maier, and M. Yannakakis. On the desirability of acyclic database schemes. JACM, 30(3):479–513, 1983. doi:10.1145/2402.322389.
  • [13] Amos Beimel, Kobbi Nissim, and Mohammad Zaheri. Exploring differential obliviousness. In APPROX/RANDOM, 2019.
  • [14] TH Chan, Kai-Min Chung, Wei-Kai Lin, and Elaine Shi. Mpc for mpc: secure computation on a massively parallel computing architecture. In ITCS, 2020.
  • [15] TH Hubert Chan, Kai-Min Chung, Bruce M Maggs, and Elaine Shi. Foundations of differentially oblivious algorithms. In SODA, pages 2448–2467. SIAM, 2019.
  • [16] Zhao Chang, Dong Xie, and Feifei Li. Oblivious ram: A dissection and experimental evaluation. Proc. VLDB Endow., 9(12):1113–1124, 2016. doi:10.14778/2994509.2994528.
  • [17] Zhao Chang, Dong Xie, Sheng Wang, and Feifei Li. Towards practical oblivious join. In SIGMOD, 2022.
  • [18] Shumo Chu, Danyang Zhuo, Elaine Shi, and T-H. Hubert Chan. Differentially Oblivious Database Joins: Overcoming the Worst-Case Curse of Fully Oblivious Algorithms. In ITC, volume 199, pages 19:1–19:24, 2021. doi:10.4230/LIPICS.ITC.2021.19.
  • [19] Victor Costan and Srinivas Devadas. Intel sgx explained. Cryptology ePrint Archive, 2016.
  • [20] Natacha Crooks, Matthew Burke, Ethan Cecchetti, Sitar Harel, Rachit Agarwal, and Lorenzo Alvisi. Obladi: Oblivious serializable transactions in the cloud. In OSDI, pages 727–743, 2018. URL: https://www.usenix.org/conference/osdi18/presentation/crooks.
  • [21] Erik D Demaine. Cache-oblivious algorithms and data structures. Lecture Notes from the EEF Summer School on Massive Data Sets, 8(4):1–249, 2002.
  • [22] Shiyuan Deng and Yufei Tao. Subgraph enumeration in optimal i/o complexity. In ICDT. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024.
  • [23] Srinivas Devadas, Marten van Dijk, Christopher W Fletcher, Ling Ren, Elaine Shi, and Daniel Wichs. Onion oram: A constant bandwidth blowup oblivious ram. In TCC, pages 145–174. Springer, 2016. doi:10.1007/978-3-662-49099-0_6.
  • [24] Saba Eskandarian and Matei Zaharia. Oblidb: Oblivious query processing for secure databases. Proc. VLDB Endow., 13(2), 2019. doi:10.14778/3364324.3364331.
  • [25] R. Fagin. Degrees of acyclicity for hypergraphs and relational database schemes. JACM, 30(3):514–550, 1983. doi:10.1145/2402.322390.
  • [26] Austen Z Fan, Paraschos Koutris, and Hangdong Zhao. Tight bounds of circuits for sum-product queries. SIGMOD, 2(2):1–20, 2024.
  • [27] Jörg Flum, Markus Frick, and Martin Grohe. Query evaluation via tree-decompositions. JACM, 49(6):716–752, 2002. doi:10.1145/602220.602222.
  • [28] Matteo Frigo, Charles E Leiserson, Harald Prokop, and Sridhar Ramachandran. Cache-oblivious algorithms. In FOCS, pages 285–297. IEEE, 1999. doi:10.1109/SFFCS.1999.814600.
  • [29] Craig Gentry, Kenny A Goldman, Shai Halevi, Charanjit Julta, Mariana Raykova, and Daniel Wichs. Optimizing oram and using it efficiently for secure computation. In PETs, pages 1–18. Springer, 2013. doi:10.1007/978-3-642-39077-7_1.
  • [30] Oded Goldreich. Towards a theory of software protection and simulation by oblivious rams. In STOC, pages 182–194, 1987. doi:10.1145/28395.28416.
  • [31] Oded Goldreich and Rafail Ostrovsky. Software protection and simulation on oblivious rams. JACM, 43(3):431–473, 1996. doi:10.1145/233551.233553.
  • [32] Michael T Goodrich. Data-oblivious external-memory algorithms for the compaction, selection, and sorting of outsourced data. In SPAA, pages 379–388, 2011. doi:10.1145/1989493.1989555.
  • [33] Georg Gottlob, Nicola Leone, and Francesco Scarcello. Hypertree decompositions and tractable queries. JCSS, 64(3):579–627, 2002. doi:10.1006/JCSS.2001.1809.
  • [34] Hakan Hacigümüş, Bala Iyer, Chen Li, and Sharad Mehrotra. Executing sql over encrypted data in the database-service-provider model. In SIGMOD, pages 216–227, 2002.
  • [35] Bingsheng He and Qiong Luo. Cache-oblivious nested-loop joins. In CIKM, pages 718–727, 2006. doi:10.1145/1183614.1183717.
  • [36] Xiao Hu. Cover or pack: New upper and lower bounds for massively parallel joins. In PODS, pages 181–198, 2021. doi:10.1145/3452021.3458319.
  • [37] Xiaocheng Hu, Miao Qiao, and Yufei Tao. I/o-efficient join dependency testing, loomis–whitney join, and triangle enumeration. JCSS, 82(8):1300–1315, 2016. doi:10.1016/J.JCSS.2016.05.005.
  • [38] Bas Ketsman and Dan Suciu. A worst-case optimal multi-round algorithm for parallel computation of conjunctive queries. In PODS, pages 417–428, 2017. doi:10.1145/3034786.3034788.
  • [39] Paraschos Koutris, Paul Beame, and Dan Suciu. Worst-case optimal algorithms for parallel query processing. In ICDT. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2016.
  • [40] Simeon Krastnikov, Florian Kerschbaum, and Douglas Stebila. Efficient oblivious database joins. VLDB, 13(12):2132–2145, 2020. URL: http://www.vldb.org/pvldb/vol13/p2132-krastnikov.pdf.
  • [41] Eyal Kushilevitz, Steve Lu, and Rafail Ostrovsky. On the (in) security of hash-based oblivious ram and a new balancing scheme. In SODA, pages 143–156. SIAM, 2012. doi:10.1137/1.9781611973099.13.
  • [42] Wei-Kai Lin, Elaine Shi, and Tiancheng Xie. Can we overcome the n log n barrier for oblivious sorting? In SODA, pages 2419–2438. SIAM, 2019. doi:10.1137/1.9781611975482.148.
  • [43] Hung Q Ngo, Ely Porat, Christopher Ré, and Atri Rudra. Worst-case optimal join algorithms. JACM, 65(3):1–40, 2018. doi:10.1145/3180143.
  • [44] Hung Q Ngo, Christopher Ré, and Atri Rudra. Skew strikes back: New developments in the theory of join algorithms. ACM SIGMOD Record, 42(4):5–16, 2014. doi:10.1145/2590989.2590991.
  • [45] Vijaya Ramachandran and Elaine Shi. Data oblivious algorithms for multicores. In SPAA, pages 373–384, 2021. doi:10.1145/3409964.3461783.
  • [46] Sajin Sasy, Aaron Johnson, and Ian Goldberg. Fast fully oblivious compaction and shuffling. In CCS, pages 2565–2579, 2022. doi:10.1145/3548606.3560603.
  • [47] Elaine Shi. Path oblivious heap: Optimal and practical oblivious priority queue. In SP, pages 842–858. IEEE, 2020. doi:10.1109/SP40000.2020.00037.
  • [48] Emil Stefanov, Marten Van Dijk, Elaine Shi, T-H Hubert Chan, Christopher Fletcher, Ling Ren, Xiangyao Yu, and Srinivas Devadas. Path oram: an extremely simple oblivious ram protocol. JACM, 65(4):1–26, 2018. doi:10.1145/3177872.
  • [49] Yufei Tao, Ru Wang, and Shiyuan Deng. Parallel communication obliviousness: One round and beyond. Proceedings of the ACM on Management of Data, 2(5):1–24, 2024. doi:10.1145/3695832.
  • [50] Todd L Veldhuizen. Leapfrog triejoin: A simple, worst-case optimal join algorithm. In ICDT, 2014.
  • [51] Jeffrey Scott Vitter. External memory algorithms and data structures: Dealing with massive data. CsUR, 33(2):209–271, 2001.
  • [52] Xiao Wang, Hubert Chan, and Elaine Shi. Circuit oram: On tightness of the goldreich-ostrovsky lower bound. In CCS, pages 850–861, 2015. doi:10.1145/2810103.2813634.
  • [53] Yilei Wang and Ke Yi. Query evaluation by circuits. In PODS, 2022.
  • [54] Mihalis Yannakakis. Algorithms for acyclic database schemes. In VLDB, pages 82–94, 1981.
  • [55] Wenting Zheng, Ankur Dave, Jethro G Beekman, Raluca Ada Popa, Joseph E Gonzalez, and Ion Stoica. Opaque: An oblivious and encrypted distributed analytics platform. In NSDI 17, pages 283–298, 2017. URL: https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/zheng.