
Information-Theoretic Random-Index PIR

Sebastian Kolby, Aarhus University, Denmark; Lawrence Roy, Aarhus University, Denmark; Jure Sternad, Aarhus University, Denmark; Sophia Yakoubov, Aarhus University, Denmark
Abstract

A Private Information Retrieval (PIR) protocol allows a client to learn the ith row of a database held by one or more servers, without revealing i to the servers. A Random-Index PIR (RPIR) protocol, introduced by Gentry et al. (TCC 2021), is a PIR protocol where, instead of being chosen by the client, i is random. This has applications in e.g. anonymous committee selection. Both PIR and RPIR protocols are interesting only if the communication complexity is smaller than the database size; otherwise, the trivial solution where the servers send the entire database suffices.

Unlike PIR, where the client must send at least one message (to encode information about i), RPIR can be executed in a single round of server-to-client communication. In this paper, we study such one-round, information-theoretic RPIR protocols. The only known construction in this setting is SimpleMSRPIR (Gentry et al.), which requires the servers to communicate approximately N/2 bits, N being the database size. We show an Ω(√N) lower bound on communication complexity for one-round two-server information-theoretic RPIR, and a sublinear upper bound. Finally, we show how to use a sublinear amount of database-independent correlated randomness among multiple servers to get near-optimal online communication complexity (the size of one row plus the size of one index description per server).

Keywords and phrases:
Private information retrieval, Multi-server, Lower bounds
Copyright and License:
© Sebastian Kolby, Lawrence Roy, Jure Sternad, and Sophia Yakoubov; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Security and privacy → Information-theoretic techniques
Acknowledgements:
Supported by the Danish Independent Research Council under Grant-ID DFF-2032-00122B and DFF-2064-00016B (YOSO).
Funding:
Danish Independent Research Council under Grant-ID DFF-2032-00122B and DFF-2064-00016B (YOSO)
Editor:
Niv Gilboa

1 Introduction

PIR (private information retrieval [4]) enables a client to retrieve the ith row of a database from one or more servers in such a way that i remains unknown to the server(s). The trivial solution involves the server(s) sending the client the entire database; however, it is desirable to bring the communication complexity down so that it is strictly smaller than the size of the database.

RPIR (random-index PIR [7]) is a relaxation of PIR, where instead of learning a database row of her choice, the client learns a random row. Just like PIR, RPIR guarantees that the index i of that row is unknown to the server(s). Any PIR construction can be converted to an RPIR construction, simply by having the client choose the index i at random. However, while PIR constructions require at least one message from the client (to communicate something about i) followed by at least one message from the server(s) (to communicate something about the database), not all RPIR constructions require the client to communicate.

Random-index PIR constructions that only take one round of server-client communication are termed “non-interactive”. Gentry et al. [7] show how to build one-server non-interactive RPIR from homomorphic encryption. They also show how to build two-server non-interactive information-theoretic RPIR (which they dub SimpleMSRPIR), the communication complexity of which is approximately dw/2 + log(d) (where the database contains d rows, each of size w).

Our Contributions

In this paper, we take steps towards understanding the communication complexity of non-interactive information-theoretic RPIR. We prove that non-interactive two-server information-theoretic RPIR must have communication complexity at least Ω(√d·w).

On the positive side, we give an upper bound with communication complexity

dw·d^{−loglog(d)/log(d)} + dw/(log(d)/loglog(d) + 1) + Θ(d·log(d))

bits, showing that the communication complexity can in fact be sub-linear in the database size dw (when w = ω(log(d))). (Note that there is a qualitative gap between our lower and upper bounds as they are currently presented: our lower bound relies on perfect correctness, while our upper bound allows a failure to occur with some probability. The lower bound can be extended to account for a fixed probability of failure.)

Finally, we describe a construction that uses a sublinear amount of correlated randomness, but gets the near-optimal online communication complexity of n·(log(d) + w) bits (where n is the number of servers, d is the number of database rows, and w is the size of each row).

Related Work

Information-theoretic Private Information Retrieval (PIR) was first introduced by Chor et al. [4], who provided a general construction for the case of n servers, achieving a total communication complexity of O(d^{1/n}). Additionally, they proposed a specialized construction for the two-server case with a communication complexity of O(d^{1/3}). Shortly after, Ambainis [1] significantly improved the upper bound for the n-server setting, reducing the complexity to O(d^{1/(2n−1)}). Beimel and Ishai [2] made further progress, considering a scenario in which up to t servers may collude. They presented a construction achieving a communication complexity of O(d^{1/⌊(2n−1)/t⌋}).

Wehner and de Wolf [11] made a key contribution to lower bounds in information-theoretic PIR by parametrizing the communication complexity by the probe complexity b – the number of bits the client actually reads from the messages sent by the servers. In particular, they established a lower bound of Ω(d^{1/(b+1)}) for the two-server case. Complementing this, they presented a scheme with b = 1 and communication complexity O(√d). Their results also demonstrated that the original scheme by Chor et al. [4], which with some modifications can be adapted to have b = 3, already achieves near-optimal complexity, nearly matching the resulting Ω(d^{1/4}) bound. Moreover, in the general case where the probe complexity equals the length of the messages sent by the two servers, their result implies a lower bound of 5·log(d), which remains the best-known lower bound for two servers.

For the two-server setting, the upper bound of Chor et al. was finally improved by Dvir and Gopi [5], who established an upper bound of d^{O(√(loglog(d)/log(d)))}, which remains the best known result for this case. In the multi-server setting, the currently best-known upper bounds are due to the works [6, 9, 3]. Efremenko [6] provided a three-server construction with communication complexity exp(O(√(log(d)·loglog(d)))) and a general 2^r-server construction with a communication complexity of exp(O((log(d)·log^{r−1}log(d))^{1/r})). Recently, Ghasemi, Kopparty and Sudan [8] improved Efremenko’s three-server construction, reducing the exponent from the square root to the cube root. Other improvements – such as a reduction of the number of required servers of Efremenko’s 2^r-server construction – were made by Itoh and Suzuki [9], and Chee et al. [3].

1.1 Technical Overview

In Section 2 we describe our definitions of non-interactive multi-server RPIR, adapted from Gentry et al. [7]. In Section 3 we describe our lower bound for the communication complexity of such protocols. We use the fact that in non-interactive two-server RPIR without correlated randomness, the client can combine any message 𝚖𝚜𝚐1 from the first server with any message 𝚖𝚜𝚐2 from the second server to retrieve a database row. So, if we take √d messages 𝚖𝚜𝚐1 and √d messages 𝚖𝚜𝚐2, the client can retrieve √d × √d = d database rows. Some of these may be the same, but we prove that, in expectation, the client ends up with c·d different database rows for some constant c, which amounts to c·d·w bits of information. Communicating c·d·w bits of information naturally requires that many bits; so, if this much information can be extracted from 2√d messages, on average each of these messages must have size at least c·d·w/(2√d) = Ω(√d·w).
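To get intuition for the counting step, a quick back-of-envelope computation shows why √d × √d message pairs already pin down a constant fraction of the rows, if we pretend each pair yields an independent uniform index (the actual proof cannot assume independence, which is why it uses the second moment method). The function below is our own illustration, not part of the protocol.

```python
def expected_distinct_rows(d, k1, k2):
    """Expected number of distinct indices among k1*k2 independent
    uniform draws from a set of d indices."""
    return d * (1 - (1 - 1 / d) ** (k1 * k2))

# With k1 = k2 = sqrt(d), the k1*k2 = d draws cover roughly a (1 - 1/e)
# fraction of the d rows -- comfortably more than half.
```

For example, with d = 10^6 and k1 = k2 = 1000, about 63% of the rows are covered.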

In Section 4 we describe our sublinear upper bound (without correlated randomness). We build on the SimpleMSRPIR construction of Gentry et al. In SimpleMSRPIR, one server sends a random database row, and the other server randomly pairs the rows and sends an XOR of each pair (as well as a concise description of the pairing). The client uses row i sent by the first server and the XOR of row i and row j sent by the second server to recover row j as well. With probability 1/d it returns row i; the rest of the time, it returns row j. In SimpleMSRPIR, the servers send a total of dw/2 + 2log(d) + w bits, which is linear in the size dw of the database. In our construction (which we dub “Bucket-RPIR”), instead of sending a single row, the first server sends each row with probability p. Instead of pairing the database rows, the second server partitions them into buckets of size b. By setting p = (1/d)^{loglog(d)/log(d)} and b = log(d)/loglog(d) + 1, we ensure that the client retrieves a database row with probability at least half, while pushing the total communication complexity to be sublinear in the size dw of the database as long as w = ω(log(d)). (The probability that the client retrieves a row can be boosted arbitrarily close to 1 via parallel repetition; notice that the number of repetitions depends only on the desired probability of success, and not on the database size, so it does not impact the sublinearity of our communication complexity.)

In Section 5, we describe our final construction, which uses correlated randomness. We consider a setting with n ≥ 2 servers, each of which holds Shamir shares of u random one-hot vectors (with zeros in all but one place) of length s. By locally taking the tensor product of all of its vectors of shares, each server ends up with shares of a size-d random one-hot vector (where d = s^u), with a one at a random position i. Each server takes the dot product of this vector of shares with the database, and sends the resulting value to the client. The client can treat the messages it receives as shares which reconstruct exactly to the i-th database row.

2 Definitions

We use a definition of multi-server random-index PIR based on that of Gentry et al. [7, Definition 3]. Consider n servers 𝒮_1, …, 𝒮_n, and a client 𝒞. Since we limit ourselves to non-interactive (one-round) protocols, we can describe the entire protocol in terms of algorithms 𝚂𝚎𝚛𝚟𝚎𝚛_p (for p ∈ [n]) and 𝙲𝚕𝚒𝚎𝚗𝚝. Each server 𝒮_p runs the algorithm 𝚂𝚎𝚛𝚟𝚎𝚛_p on the database D and randomness ρ_p (drawn from distribution 𝖱_𝒮) to obtain message 𝚖𝚜𝚐_p. The client 𝒞 runs the algorithm 𝙲𝚕𝚒𝚎𝚗𝚝 on messages 𝚖𝚜𝚐_1, …, 𝚖𝚜𝚐_n and randomness ρ_𝒞 (drawn from distribution 𝖱_𝒞) to obtain (i, D_i).

Sometimes, the servers use database-independent correlated randomness. We use 𝖼𝗋_p to denote server 𝒮_p’s correlated randomness, and 𝖢𝖱 to denote the distribution from which (𝖼𝗋_1, …, 𝖼𝗋_n) is drawn. We use grey to denote elements of an MS-RPIR scheme that are only present when correlated randomness is used.

Definition 1 (MS-RPIR: Multi-Server Random-Index PIR for Databases With d Rows of Size w).

A semi-honest, non-interactive (one-round) c-correct n-server threshold-t random-index PIR scheme ({𝚂𝚎𝚛𝚟𝚎𝚛_p}_{p∈[n]}, 𝙲𝚕𝚒𝚎𝚗𝚝, 𝖢𝖱) must satisfy the following properties:

c-Correctness (0 < c ≤ 1):

For every database D ∈ {0,1}^{w×d} and index i ∈ [d], the client’s probability of outputting D_i is c/d.

More formally, for every database D ∈ {0,1}^{w×d},

Pr[(i, D[i]) = 𝙲𝚕𝚒𝚎𝚗𝚝(𝚖𝚜𝚐_1, …, 𝚖𝚜𝚐_n; ρ_𝒞) where ρ_p ← 𝖱_𝒮 for p ∈ [n], ρ_𝒞 ← 𝖱_𝒞; (𝖼𝗋_1, …, 𝖼𝗋_n) ← 𝖢𝖱; 𝚖𝚜𝚐_p ← 𝚂𝚎𝚛𝚟𝚎𝚛_p(D, 𝖼𝗋_p; ρ_p) for p ∈ [n]] = c/d,

for every index i ∈ [d]. (Any other output of 𝙲𝚕𝚒𝚎𝚗𝚝 is interpreted to be ⊥.)

Client Privacy

For every database D ∈ {0,1}^{w×d}, for every set of corrupt servers I ⊆ [n] s.t. |I| ≤ t, the index of the row output by the client is independent of the view of the corrupt servers.

More formally, the following two distributions should be statistically indistinguishable:

{(i, {ρ_p, 𝖼𝗋_p}_{p∈I}) where ρ_p ← 𝖱_𝒮 for p ∈ [n], ρ_𝒞 ← 𝖱_𝒞; (𝖼𝗋_1, …, 𝖼𝗋_n) ← 𝖢𝖱; 𝚖𝚜𝚐_p ← 𝚂𝚎𝚛𝚟𝚎𝚛_p(D, 𝖼𝗋_p; ρ_p) for p ∈ [n]; (i, D[i]) = 𝙲𝚕𝚒𝚎𝚗𝚝(𝚖𝚜𝚐_1, …, 𝚖𝚜𝚐_n; ρ_𝒞)}
{(i, {ρ_p, 𝖼𝗋_p}_{p∈I}) where ρ_p ← 𝖱_𝒮 for p ∈ [n]; (𝖼𝗋_1, …, 𝖼𝗋_n) ← 𝖢𝖱; i ← [d] with probability c; i = ⊥ otherwise}
 Remark 2.

Notice that we allow the client to output ⊥ with constant probability 1 − c. Parallel executions can bring this failure probability down arbitrarily close to 0.

Finally, the trivial solution of the server(s) sending the client the entire database, and the client selecting a random index i and outputting (i, D[i]), satisfies the above definitions. In order to be interesting, an MS-RPIR scheme must also be non-trivial:

Definition 3 (Non-Triviality).

An MS-RPIR scheme is non-trivial if the communication complexity of the protocol is asymptotically smaller than the size of the database.

More formally, let 𝙲𝙲(n, w, d) be the expected communication complexity of the protocol (that is, the expected sum of |𝚖𝚜𝚐_1|, …, |𝚖𝚜𝚐_n|). The scheme is non-trivial if there exists a d_0 s.t. for all d > d_0, there exists a w_0 s.t. for all w > w_0,

𝙲𝙲(n, w, d) < w·d.

3 An Ω(√d·w) Lower Bound on Communication Complexity

In this section, we focus on the scenario where we have two servers, one of which is corrupt. In this setting, the servers must send Ω(√d·w) bits. We will prove this by showing that choosing a random size-√d subset from the space of each server’s possible messages in expectation allows recovering a constant fraction of the database.

Theorem 4 (1-out-of-2 IT MS-RPIR Lower Bound).

Any MS-RPIR protocol with perfect security for two servers, where one may be corrupt, must have at least Ω(√d·w) expected communication.

Proof.

Let D ∈ ({0,1}^w)^d be the database held by the two servers. For i ∈ {1,2}, let s_i be the number of possible random tapes for 𝒮_i, and let s_0 be the number of possible random tapes for 𝒞. For a fixed random tape a server will always send the same message; some distinct tapes may result in the same message.

For servers 𝒮_1 and 𝒮_2 we define a matrix M ∈ [d]^{s_1×s_2}, indexed by the choices of 𝒮_1 and 𝒮_2 randomness. Entry M_{r,c} encodes the index i ∈ [d] which may be recovered by the client for server randomnesses r and c. More formally,

(i, D[i]) ← 𝙲𝚕𝚒𝚎𝚗𝚝(𝚂𝚎𝚛𝚟𝚎𝚛_1(D; r), 𝚂𝚎𝚛𝚟𝚎𝚛_2(D; c); T_{r,c}), where i = M_{r,c},

and the client randomness T_{r,c} is chosen uniformly and independently for each of the s_1·s_2 entries of M.

First, we address the case where 𝒮_1 has fewer than √d possible choices of randomness. If we take all messages that might be sent by 𝒮_1 and any single message sent by 𝒮_2, it must be possible to recover the entire database. This follows by correctness and privacy, as the client must always be able to recover some row of the database given a pair of messages, and all rows must still be equally likely even when fixing the message of one server. Iterating over all (fewer than √d) messages of 𝒮_1 and all client randomness will allow the recovery of the database, and therefore the messages of 𝒮_1 (together with any one message of 𝒮_2) on average contain d·w bits of information; some message must then have size at least d·w/√d = √d·w. A symmetric argument can be made for 𝒮_2 when it has fewer than √d possible random tapes.

From now on we will only consider the case where s_1, s_2 ≥ √d. We will analyse how much of the database is recovered given server messages for k_1 choices of 𝒮_1 randomness and k_2 choices of 𝒮_2 randomness. Sample k_i distinct choices of randomness for server i ∈ {1,2} as

R = (R_1, …, R_{k_1}), C = (C_1, …, C_{k_2}).

For all rows i ∈ [d] in the database, and a ∈ [k_1], b ∈ [k_2], define the random variable:

M^i_{R_a,C_b} = 1 if (i, D[i]) = 𝙲𝚕𝚒𝚎𝚗𝚝(𝚂𝚎𝚛𝚟𝚎𝚛_1(D; R_a), 𝚂𝚎𝚛𝚟𝚎𝚛_2(D; C_b); T_{a,b}), and 0 otherwise.

We may count the occurrences of each row i ∈ [d] across the sampled rows and columns,

A^i_{R,C} = ∑_{a∈[k_1], b∈[k_2]} M^i_{R_a,C_b},

and define an indicator variable for whether row i ∈ [d] is recovered for any pair:

B^i_{R,C} = 0 if A^i_{R,C} = 0, and 1 otherwise.

To prove Theorem 4, it suffices to show that

E[∑_{i∈[d]} B^i_{R,C}] ≥ d/2,

since this expectation describes how many database rows can be retrieved in expectation from our 2√d messages. (It would suffice to show that E[∑_{i∈[d]} B^i_{R,C}] ≥ c·d for any constant c; the math happens to work out for c = 1/2.)

By linearity of expectation, E[∑_{i∈[d]} B^i_{R,C}] = ∑_{i∈[d]} E[B^i_{R,C}]; so, we can focus on E[B^i_{R,C}] = Pr[A^i_{R,C} > 0] for a single i.
Going forward we suppress the superscript i, as we will only handle one row i ∈ [d] at a time. By the second moment method,

Pr[A_{R,C} > 0] ≥ E[A_{R,C}]^2 / E[A_{R,C}^2].
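For completeness, this second-moment bound is an instance of the Cauchy–Schwarz inequality (a standard derivation, not from the paper):

```latex
\mathbb{E}[A_{R,C}]
  = \mathbb{E}\!\left[A_{R,C}\cdot\mathbf{1}_{\{A_{R,C}>0\}}\right]
  \le \sqrt{\mathbb{E}\!\left[A_{R,C}^{2}\right]\cdot
            \mathbb{E}\!\left[\mathbf{1}_{\{A_{R,C}>0\}}^{2}\right]}
  = \sqrt{\mathbb{E}\!\left[A_{R,C}^{2}\right]\cdot\Pr\!\left[A_{R,C}>0\right]},
```

and squaring both sides and dividing by E[A_{R,C}^2] gives the bound.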

We can tackle E[A_{R,C}] and E[A_{R,C}^2] separately.

Bounding the expectation of A_{R,C}

First, we handle the simple case of E[A_{R,C}]. By linearity of expectation,

E[A_{R,C}] = ∑_{a∈[k_1], b∈[k_2]} E[M_{R_a,C_b}] = ∑_{a∈[k_1], b∈[k_2]} Pr[M_{R_a,C_b} = 1].

For all c ∈ [s_2], privacy implies Pr[M_{R_a,c} = 1] = 1/d. Therefore, Pr[M_{R_a,C_b} = 1] = 1/d, and it follows that

E[A_{R,C}] = ∑_{a∈[k_1], b∈[k_2]} 1/d = k_1·k_2/d.

Bounding the expectation of A_{R,C}^2

Expanding the definition and using linearity of expectation,

E[A_{R,C}^2] = E[(∑_{a∈[k_1], b∈[k_2]} M_{R_a,C_b})^2] = ∑_{a,c∈[k_1]} ∑_{b,d∈[k_2]} E[M_{R_a,C_b}·M_{R_c,C_d}].

We wish to derive an upper bound for this expectation. The terms in the sum may be split into four disjoint cases: (1) when a = c and b = d, (2) when a = c but b ≠ d, (3) when a ≠ c but b = d, and (4) when a ≠ c and b ≠ d.
We start with case (1), where a = c and b = d:

∑_{a=c∈[k_1]} ∑_{b=d∈[k_2]} E[M_{R_a,C_b}·M_{R_c,C_d}] = ∑_{a∈[k_1]} ∑_{b∈[k_2]} Pr[M_{R_a,C_b} = 1]
= ∑_{a∈[k_1], b∈[k_2]} 1/d = k_1·k_2/d.

Before we proceed to the next cases, it will be helpful to define some notation. Recall that s_0 is the number of choices of client randomness. For r ∈ [s_1] and c ∈ [s_2], let p_{r,c} be the number of client randomness choices for which the client outputs (i, D[i]) given server messages 𝚂𝚎𝚛𝚟𝚎𝚛_1(D; r), 𝚂𝚎𝚛𝚟𝚎𝚛_2(D; c).

For any fixed c ∈ [s_2], privacy implies that the index i must occur for a 1/d fraction of the choices of client and server-1 randomness:

∑_{r=1}^{s_1} p_{r,c} = s_1·s_0/d.

Similarly, for any fixed r ∈ [s_1],

∑_{c=1}^{s_2} p_{r,c} = s_2·s_0/d.

Let t_1 = s_1·s_0/d and t_2 = s_2·s_0/d. It will be convenient to consider a random variable p_{X,Y} which is equal to p_{r,c} when X = r and Y = c.
Now we may proceed to case (2), when a = c but b ≠ d:

E[M_{R_a,C_b}·M_{R_a,C_d}] = Pr[M_{R_a,C_b} = 1, M_{R_a,C_d} = 1].

For any outcome r ∈ [s_1] such that R_a = r,

Pr[M_{r,C_b} = 1, M_{r,C_d} = 1] = ∑_{u≠v∈[s_2]} (p_{r,u}/s_0)·(p_{r,v}/s_0)·Pr[C_b = u, C_d = v] = (1/(s_0^2·s_2·(s_2−1))) ∑_{u≠v∈[s_2]} p_{r,u}·p_{r,v}.

The sum may be simplified as

∑_{u≠v∈[s_2]} p_{r,u}·p_{r,v} = (∑_{u∈[s_2]} p_{r,u})^2 − ∑_{u∈[s_2]} p_{r,u}^2 = t_2^2 − ∑_{u∈[s_2]} p_{r,u}^2.

As f(x) = x^2 is convex, we may apply Jensen’s inequality to see that t_2^2/s_2 ≤ ∑_{u∈[s_2]} p_{r,u}^2:

(t_2/s_2)^2 = (∑_{u=1}^{s_2} p_{r,u}/s_2)^2 ≤ ∑_{u=1}^{s_2} (p_{r,u})^2/s_2.

Putting things together, we see

Pr[M_{r,C_b} = 1, M_{r,C_d} = 1] ≤ (1/(s_0^2·s_2·(s_2−1)))·(t_2^2 − t_2^2/s_2) = (t_2^2/(s_0^2·s_2·(s_2−1)))·(1 − 1/s_2)
= (s_0^2·s_2^2/d^2)·(1/(s_0^2·s_2·(s_2−1)))·((s_2−1)/s_2) = 1/d^2.

For this case we may conclude that E[M_{R_a,C_b}·M_{R_a,C_d}] ≤ 1/d^2.

Moving on to case (3), where a ≠ c but b = d, an argument analogous to that of case (2) gives

E[M_{R_a,C_b}·M_{R_c,C_b}] ≤ 1/d^2.

We proceed to the final case, (4): a ≠ c and b ≠ d.

E[M_{R_a,C_b}·M_{R_c,C_d}]
= ∑_{u≠v∈[s_1]} ∑_{n≠m∈[s_2]} (p_{u,n}·p_{v,m}/s_0^2)·Pr[R_a = u, R_c = v, C_b = n, C_d = m]
= (1/(s_1·(s_1−1)·s_2·(s_2−1))) ∑_{u≠v∈[s_1]} ∑_{n≠m∈[s_2]} p_{u,n}·p_{v,m}/s_0^2.

Using Jensen’s inequality as we did previously, for any u ≠ v ∈ [s_1],

∑_{n≠m∈[s_2]} p_{u,n}·p_{v,m} ≤ (∑_{n∈[s_2]} p_{u,n})^2 − ∑_{n∈[s_2]} p_{u,n}^2 ≤ t_2^2·((s_2−1)/s_2).

Therefore,

E[M_{R_a,C_b}·M_{R_c,C_d}] ≤ (1/(s_1·(s_1−1)·s_2·(s_2−1))) ∑_{u≠v∈[s_1]} t_2^2·((s_2−1)/(s_2·s_0^2))
= (s_1·(s_1−1)·(s_2−1)/(s_1·(s_1−1)·s_2^2·(s_2−1)·s_0^2))·t_2^2 = (1/(s_2^2·s_0^2))·(s_2·s_0/d)^2 = 1/d^2.

Having bounded all terms for cases (2), (3) and (4) by 1/d^2, we see that

∑_{a,c∈[k_1]} ∑_{b,d∈[k_2], (a,b)≠(c,d)} E[M_{R_a,C_b}·M_{R_c,C_d}] ≤ (k_2·k_1·(k_1−1) + k_1·k_2·(k_2−1) + k_1·(k_1−1)·k_2·(k_2−1))/d^2;

letting k_1 = k_2 = √d, this simplifies to

∑_{a,c∈[k_1]} ∑_{b,d∈[k_2], (a,b)≠(c,d)} E[M_{R_a,C_b}·M_{R_c,C_d}] ≤ (2·(√d−1) + (√d−1)^2)/d = (d−1)/d.

Combining this with our bound from case (1), we arrive at

∑_{a,c∈[√d]} ∑_{b,d∈[√d]} E[M_{R_a,C_b}·M_{R_c,C_d}] ≤ √d·√d/d + (d−1)/d = 2 − 1/d,

which approaches 2 for increasing d.
So, we get

E[B^i_{R,C}] = Pr[A^i_{R,C} > 0] ≥ E[A^i_{R,C}]^2 / E[(A^i_{R,C})^2] ≥ 1/(2 − 1/d) ≥ 1/2,
E[∑_{i∈[d]} B^i_{R,C}] ≥ d/(2 − 1/d) ≥ d/2,

as desired.

4 Bucket-RPIR

In this section, we present a construction based on splitting the database into random subsets, which we refer to as buckets. This construction is inspired by SimpleMSRPIR of Gentry et al. [7], in which 𝒮_1 sends a single random row, and 𝒮_2 pairs the rows and sends the XOR of each pair, for a total communication of dw/2 + O(log(d) + w) bits. The client returns the row with which 𝒮_1’s row was paired. We reproduce SimpleMSRPIR in Figure 1.

SimpleMSRPIR [7]

 
𝚂𝚎𝚛𝚟𝚎𝚛1(D):

Pick a random i ∈ [d] and return 𝚖𝚜𝚐_1 = (i, D[i]).

𝚂𝚎𝚛𝚟𝚎𝚛2(D):

Choose a random mask δ ∈ {0, 1, …, d−1}. If δ = 0, set D′ = ⊥. Otherwise, let p_1, …, p_{d/2} be the list of pairs of indices p_k = (j_{k,1}, j_{k,2}) such that j_{k,1} ⊕ j_{k,2} = δ (ordered by increasing smallest value in the pair). Define D′ such that D′[k] = D[j_{k,1}] ⊕ D[j_{k,2}]. (D′ has size dw/2.) Return 𝚖𝚜𝚐_2 = (δ, D′).

𝙲𝚕𝚒𝚎𝚗𝚝(𝚖𝚜𝚐_1 = (i, D[i]), 𝚖𝚜𝚐_2 = (δ, D′)):

If δ = 0, return (i, D[i]). Else, find p_k such that i ∈ p_k. Return (i ⊕ δ, D′[k] ⊕ D[i]).

Figure 1: Description of SimpleMSRPIR [7].
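As a concrete illustration, the following sketch simulates one run of SimpleMSRPIR for a database whose size d is a power of two (so XOR-ing indices with δ gives a perfect pairing). Names and encodings are our own, not the paper’s.

```python
import secrets

def simple_ms_rpir(D):
    """One run of SimpleMSRPIR on database D, a list of equal-length
    bytes rows; len(D) must be a power of two."""
    d = len(D)
    # Server 1: a uniformly random row, together with its index.
    i = secrets.randbelow(d)
    msg1 = (i, D[i])
    # Server 2: pair index j with j ^ delta and XOR each pair.
    delta = secrets.randbelow(d)
    if delta == 0:
        msg2 = (0, None)
    else:
        pairs = {}
        for j in range(d):
            k = min(j, j ^ delta)  # canonical representative of the pair
            pairs[k] = bytes(a ^ b for a, b in zip(D[j], D[j ^ delta]))
        msg2 = (delta, pairs)
    # Client: recover the row that was paired with i.
    (i, Di), (delta, pairs) = msg1, msg2
    if delta == 0:
        return i, Di
    k = min(i, i ^ delta)
    return i ^ delta, bytes(a ^ b for a, b in zip(pairs[k], Di))
```

XOR-ing the pair’s XOR with D[i] cancels D[i] and leaves the missing row D[i ⊕ δ].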

SimpleMSRPIR has communication complexity linear in the size of the database. In this section, we describe a generalization of SimpleMSRPIR where, instead of pairing rows, 𝒮_2 distributes the rows into larger buckets. In order to enable the client to retrieve a row from XORs of larger sets, 𝒮_1 sends many random rows instead of just one. Balancing the size of 𝒮_2’s buckets and the number of rows 𝒮_1 sends allows us to get communication complexity sublinear in the database size. We describe our protocol in Figure 2, where p is the probability that 𝒮_1 sends any given row, and b is the size of 𝒮_2’s buckets.

Bucket-RPIR

 
𝚂𝚎𝚛𝚟𝚎𝚛1(D):

Let I = ∅. For each row index i ∈ [d], add i to I independently with probability p. Return 𝚖𝚜𝚐_1 = {(i, D[i])}_{i∈I}.

𝚂𝚎𝚛𝚟𝚎𝚛2(D):

Pick a random mapping π: [d] → [d/b] subject to the constraint that exactly b indices map to each value. (π can be represented in d·log(d/b) bits.) Define D′ such that D′[k] = D[i_1] ⊕ ⋯ ⊕ D[i_b], where π(i_1) = ⋯ = π(i_b) = k. (D′ has size dw/b.) Let 𝚖𝚜𝚐_2 = (π, D′).

𝙲𝚕𝚒𝚎𝚗𝚝(𝚖𝚜𝚐_1 = {(i, D[i])}_{i∈I}, 𝚖𝚜𝚐_2 = (π, D′)):

Let π^{−1}(k) denote the set of indices i_1, …, i_b such that π(i_1) = ⋯ = π(i_b) = k.

If there does not exist a bucket k where exactly all but one of π^{−1}(k) are included in 𝚖𝚜𝚐_1, abort.

Otherwise, with probability |𝚖𝚜𝚐_1|/d, output a random value from 𝚖𝚜𝚐_1.

With probability 1 − |𝚖𝚜𝚐_1|/d, pick a random bucket k for which all but one of π^{−1}(k) are included in 𝚖𝚜𝚐_1. Let i be such that π(i) = k and i is missing from 𝚖𝚜𝚐_1. Return (i, D′[k] ⊕ (⊕_{j∈π^{−1}(k)∖{i}} D[j])).

Figure 2: Description of Bucket-RPIR, Parametrized by p and b.
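A runnable sketch of one execution of Bucket-RPIR follows, under our own simplifications: the buckets are sent explicitly rather than as a compact encoding of π, and Python’s `random` module stands in for true randomness.

```python
import random
from functools import reduce

def xor_rows(rows):
    """Bytewise XOR of a list of equal-length bytes rows."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*rows))

def bucket_rpir(D, p, b):
    """One run of Bucket-RPIR (Figure 2); b must divide len(D).
    Returns (index, row) on success, or None if the client aborts."""
    d = len(D)
    # Server 1: include each row independently with probability p.
    msg1 = {i: D[i] for i in range(d) if random.random() < p}
    # Server 2: random partition into d/b buckets of size b, XOR each bucket.
    perm = random.sample(range(d), d)
    buckets = [perm[k * b:(k + 1) * b] for k in range(d // b)]
    msg2 = [(bucket, xor_rows([D[i] for i in bucket])) for bucket in buckets]
    # Client: buckets from which exactly one row is missing in msg1.
    usable = [k for k, (bucket, _) in enumerate(msg2)
              if sum(i not in msg1 for i in bucket) == 1]
    if not usable:
        return None  # abort
    if random.randrange(d) < len(msg1):  # with probability |msg1|/d
        i = random.choice(list(msg1))
        return i, msg1[i]
    bucket, xored = msg2[random.choice(usable)]
    (i,) = [j for j in bucket if j not in msg1]
    # XOR the bucket's XOR with the b-1 known rows to expose the missing one.
    return i, xor_rows([xored] + [msg1[j] for j in bucket if j != i])
```

For intuition: SimpleMSRPIR is the special case p = 1/d (one expected row) and b = 2.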

Next, we need to determine what p and b should be. The probability that the client is able to retrieve a row is

(number of buckets) × Pr[all but one row in a given bucket is sent by 𝒮_1]
= (d/b) × b·(1−p)·p^{b−1} = d·(1−p)·p^{b−1}.

We need to pick b and p such that this probability is greater than half. Set

b = log(d)/loglog(d) + 1,  p = (1/d)^{loglog(d)/log(d)}.

Let’s check that with these parameters, the probability of success is indeed greater than half:

Pr[successful row retrieval] = d·(1−p)·p^{b−1}
= d·(1−p)·((1/d)^{loglog(d)/log(d)})^{log(d)/loglog(d)+1−1} = 1 − p.

For large enough d, p becomes smaller than half, and thus 1 − p becomes greater than half.
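These parameter choices can be sanity-checked numerically; note that with base-2 logarithms, p = d^{−loglog(d)/log(d)} simplifies to exactly 1/log(d). The helper below is our own illustration.

```python
import math

def retrieval_prob(d):
    """d * (1 - p) * p**(b - 1) with the paper's choices of p and b."""
    p = d ** (-math.log2(math.log2(d)) / math.log2(d))  # equals 1 / log2(d)
    b = math.log2(d) / math.log2(math.log2(d)) + 1
    return d * (1 - p) * p ** (b - 1)
```

For d = 2^16 this gives 1 − 1/16 ≈ 0.94, and the probability tends to 1 as d grows.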

Circling back to what each server sends:

𝒮_1

sends p·d = d·d^{−loglog(d)/log(d)} = o(d) separate rows, which amounts to o(d)·w bits.

𝒮_2

sends d/b = d/(log(d)/loglog(d) + 1) XORed buckets, as well as a description of the function π. This amounts to o(d)·w + d·log(d/b) bits.

The total amount of communicated information is thus sublinear in dw.

Theorem 5.

Bucket-RPIR (Figure 2) is a secure MS-RPIR protocol (Definition 1) when p and b have the values described above.

Proof.

We start with correctness. The probability that the client outputs ⊥ is equal to the probability that there is no bucket k such that 𝚖𝚜𝚐_1 contains all but one row from π^{−1}(k). We have set this probability to be p. Next, subject to the constraint that there is such a bucket, because of the randomness of π, every row i not in 𝚖𝚜𝚐_1 is equally likely to be the row missing from bucket k. The client chooses a row in 𝚖𝚜𝚐_1 and a row not in 𝚖𝚜𝚐_1 with proportional probabilities. So, Bucket-RPIR is (1 − p)-correct.

We argue privacy against 𝒮_1 and 𝒮_2 separately. 𝒮_1 learns nothing about i because, since the client chooses a row in 𝚖𝚜𝚐_1 and a row not in 𝚖𝚜𝚐_1 with proportional probabilities, whether a row is included in or excluded from 𝚖𝚜𝚐_1 does not correlate with its probability of being chosen. If a row in 𝚖𝚜𝚐_1 is chosen, it is chosen randomly from 𝚖𝚜𝚐_1; if a row outside of 𝚖𝚜𝚐_1 is chosen, that choice is dictated by the choice of π, where any row outside of 𝚖𝚜𝚐_1 is equally likely to be grouped with b − 1 rows in 𝚖𝚜𝚐_1.

𝒮2 similarly learns nothing about i. This is because each row is included in 𝚖𝚜𝚐1 with the same probability, and due to the fact that π partitions the rows into buckets of equal size, each bucket is equally likely to be almost full (and each row is equally likely to be excluded from the almost-full bucket).

Theorem 6.

Bucket-RPIR (Figure 2) is non-trivial (Definition 3) when p and b have the values described above.

Proof.

𝒮_1 communicates p·d·(log(d) + w) bits; when p = (1/d)^{loglog(d)/log(d)} and w = ω(log(d)), this is sublinear in dw. 𝒮_2 communicates d·log(d/b) + dw/b bits; when b = log(d)/loglog(d) + 1, this is also sublinear in dw.

5 OneHot-RPIR

In this section, we present a construction based on one-hot vectors. We start by recalling the Shamir secret sharing scheme, after which we provide a high-level overview of the protocol and then offer a formal description in Figure 3.

5.1 Building Block: Secret Sharing

Definition 7 (t-out-of-n secret sharing scheme).

A t-out-of-n secret sharing scheme is a pair of algorithms (𝚜𝚑𝚊𝚛𝚎,𝚛𝚎𝚌𝚘𝚗𝚜𝚝𝚛𝚞𝚌𝚝), such that:

  • (s_1, …, s_n) ←$ 𝚜𝚑𝚊𝚛𝚎(s, n, t; ρ) is a randomized algorithm that, on input a secret s, outputs a set of n shares (s_1, …, s_n) using randomness ρ and fixes the threshold t.

  • s ← 𝚛𝚎𝚌𝚘𝚗𝚜𝚝𝚛𝚞𝚌𝚝(s_{i_1}, …, s_{i_t}) is a deterministic algorithm that, on input t shares
    (s_{i_1}, …, s_{i_t}), where {i_1, …, i_t} ⊆ [n], outputs the secret s.

A secret sharing scheme must satisfy the following properties:

Correctness:

∀s, ∀T = {i_1, …, i_t} ⊆ [n], ∀ρ we have:

Pr_{(s_1, …, s_n) ← 𝚜𝚑𝚊𝚛𝚎(s, n, t; ρ)}[𝚛𝚎𝚌𝚘𝚗𝚜𝚝𝚛𝚞𝚌𝚝(s_{i_1}, …, s_{i_t}) = s] = 1.
Perfect Security:

∀s, s′ and ∀I ⊆ {1, …, n} with |I| < t, the following distributions (over the randomness ρ) are identical:

{(s_i : i ∈ I) | (s_1, …, s_n) ← 𝚜𝚑𝚊𝚛𝚎(s, n, t; ρ)}
{(s′_i : i ∈ I) | (s′_1, …, s′_n) ← 𝚜𝚑𝚊𝚛𝚎(s′, n, t; ρ)}

We describe the Shamir secret sharing scheme [10], which we use to instantiate our construction from correlated randomness.

Let 𝔽_q be a finite field, where q is chosen as the smallest prime greater than n, and let s ∈ 𝔽_q be the secret we want to share. We start by picking a uniformly random polynomial p ∈ 𝔽_q[X] of degree t − 1 such that p(0) = s, and for i ∈ [n] we define the shares s_i = p(i), distributing p(i) to the i-th party. Now, observe that since p is a degree-(t − 1) polynomial, any set of at least t shares uniquely determines p. Upon collecting such a set of at least t shares, we can reconstruct s via polynomial interpolation.

For the sake of completeness, we formally describe the Shamir secret sharing scheme using the 𝚜𝚑𝚊𝚛𝚎 and 𝚛𝚎𝚌𝚘𝚗𝚜𝚝𝚛𝚞𝚌𝚝 algorithms:

Definition 8 (Shamir secret sharing scheme [10]).
  • 𝚜𝚑𝚊𝚛𝚎(s, n, t; ρ): Using randomness ρ, sample a_1, …, a_{t−1} ← 𝔽_q, build the polynomial p(x) = s + ∑_{i=1}^{t−1} a_i·x^i, and construct the shares (s_1, …, s_n) as s_i ← p(i).

  • 𝚛𝚎𝚌𝚘𝚗𝚜𝚝𝚛𝚞𝚌𝚝(s_{i_1}, …, s_{i_t}): Interpolate the polynomial p(x) from the t points (i_1, s_{i_1}), …, (i_t, s_{i_t}) corresponding to the secret shares and output s = p(0).

Properties

  • Additive homomorphism: Let s^1 and s^2 be two secrets that we secret share using polynomials p_1, p_2 ∈ 𝔽_q[X], both of degree t − 1. In order to compute a sharing of s^1 + s^2, each party can simply add its two shares locally: party P_i computes s_i^1 + s_i^2, which is the i-th share of the polynomial p_1 + p_2 (of the same degree t − 1), and therefore of the secret s^1 + s^2.

  • Multiplicative homomorphism: Let s^1 and s^2 be two secrets that we secret share using polynomials p_1, p_2 ∈ 𝔽_q[X] of degrees t_1 − 1 and t_2 − 1, respectively. Now observe that if each party locally multiplies its two shares, it obtains a valid share of the polynomial p_1·p_2, but of degree t_1 + t_2 − 2. In order for reconstruction to still naively work (that is, without techniques such as degree reduction, which require additional communication), n must be such that t_1 + t_2 − 1 ≤ n.
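A minimal sketch of Shamir sharing over a toy field, illustrating both homomorphisms (the field size, variable names, and API are our own, and the code is not constant-time or production-grade):

```python
import secrets

Q = 97  # a small prime field, for illustration only

def share(s, n, t):
    """n shares of s with threshold t: evaluations at x = 1..n of a
    random degree-(t-1) polynomial with constant term s."""
    coeffs = [s] + [secrets.randbelow(Q) for _ in range(t - 1)]
    return [sum(c * pow(x, k, Q) for k, c in enumerate(coeffs)) % Q
            for x in range(1, n + 1)]

def reconstruct(points):
    """Lagrange interpolation of p(0) from (x, p(x)) pairs."""
    total = 0
    for xi, yi in points:
        num = den = 1
        for xj, _ in points:
            if xj != xi:
                num = num * (-xj) % Q
                den = den * (xi - xj) % Q
        total = (total + yi * num * pow(den, -1, Q)) % Q
    return total

# Additive homomorphism: sharewise sums reconstruct to the sum of secrets.
a, b = share(3, 5, 2), share(5, 5, 2)
add = [(x + 1, (a[x] + b[x]) % Q) for x in range(5)]
assert reconstruct(add[:2]) == 8   # degree 1: two shares suffice

# Multiplicative homomorphism: sharewise products have degree t1+t2-2 = 2,
# so t1+t2-1 = 3 shares are needed to recover the product 3*5 = 15.
mul = [(x + 1, a[x] * b[x] % Q) for x in range(5)]
assert reconstruct(mul[:3]) == 15
```

`pow(den, -1, Q)` (Python 3.8+) computes the modular inverse needed for interpolation.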

5.2 Overview of the construction

Let 𝒮_1, …, 𝒮_n denote the set of servers, where each server holds a copy of the database D ∈ ({0,1}^w)^d.

Let v_1, …, v_u represent a set of randomly generated one-hot vectors, each of length s, and let v_{i,j} denote the j-th component of the i-th one-hot vector. As the correlated randomness setup, we assume that each server holds a secret share of every component of every one-hot vector. More formally, server 𝒮_p holds a set of secret shares {v_{i,j}^p}_{i∈[u], j∈[s]}, where v_{i,j}^p is the p-th share of the component v_{i,j}, generated using the secret sharing algorithm 𝚜𝚑𝚊𝚛𝚎(v_{i,j}, n, t + 1; ρ). Although we rely heavily on this correlated randomness, it can easily be established in a pre-processing phase, for example using a multi-party computation protocol.

Note that the tensor product of one-hot vectors results in another one-hot vector, but of multiplied length (that is, the tensor product of our one-hot vectors will have length s^u). We refer to the resulting vector as the tensored one-hot vector v. In the online phase, each server 𝒮_p computes the tensor product over its shares of the one-hot vectors. Specifically, each server computes

v^p = v_1^p ⊗ ⋯ ⊗ v_u^p,

where for i ∈ [u], the vector v_i^p is formed by stacking the shares {v_{i,j}^p}_{j∈[s]} into a single column vector.

Because Shamir sharing has a limited form of multiplicative homomorphism, by multiplying their shares locally as described above, the servers end up with shares of each component of the tensored one-hot vector. However, the degree of the sharing of the tensored one-hot vector will be the sum of the degrees of the sharings of the individual one-hot vectors. We must tune the threshold t (the degree of the polynomials used to share the individual components of the one-hot vectors) such that, after multiplication, there are still enough shares to reconstruct the vector. This can be achieved by setting the parameters t and u such that the number of servers n > t·u.

For the final step of the protocol, each server computes the inner product of its shares of the tensored one-hot vector with the database. This corresponds to obtaining a share of the database row at the position where v is non-zero. Specifically, server 𝒮_p computes

𝚖𝚜𝚐_p = ⟨D, v^p⟩

and sends 𝚖𝚜𝚐_p to the client. To ensure correctness, we set the size s of the original one-hot vectors such that s^u = d, where d is the number of rows in the database.

Upon receiving the messages 𝚖𝚜𝚐_1, …, 𝚖𝚜𝚐_n, the client obtains a random row of the database by computing

D_i ← 𝚛𝚎𝚌𝚘𝚗𝚜𝚝𝚛𝚞𝚌𝚝(𝚖𝚜𝚐_1, …, 𝚖𝚜𝚐_n).

MS-RPIR Protocol based on One-Hot Vectors

 

Servers 𝒮_1, …, 𝒮_n each hold the database D ∈ ({0,1}^w)^d.

Setup: Each server 𝒮_p for p ∈ [n] holds shares of one-hot vectors v_1, …, v_u of length s. These shares are computed over the individual components of the vectors as 𝚜𝚑𝚊𝚛𝚎(v_{i,j}, n, t + 1; ρ), for i ∈ [u], j ∈ [s], where v_{i,j} is the j-th component of the vector v_i. Namely, each server 𝒮_p holds {v_{i,j}^p}_{i∈[u], j∈[s]}.

 

Online Phase:

𝚂𝚎𝚛𝚟𝚎𝚛_p(D, {v_{i,j}^p}_{i∈[u], j∈[s]}): Compute v^p = v_1^p ⊗ ⋯ ⊗ v_u^p, where for i ∈ [u] the vector v_i^p is formed by stacking the shares {v_{i,j}^p}_{j∈[s]} into a single column vector. Set 𝚖𝚜𝚐_p = ⟨D, v^p⟩ and send 𝚖𝚜𝚐_p to the client.

𝙲𝚕𝚒𝚎𝚗𝚝(𝚖𝚜𝚐_1, …, 𝚖𝚜𝚐_n): Compute D_i ← 𝚛𝚎𝚌𝚘𝚗𝚜𝚝𝚛𝚞𝚌𝚝(𝚖𝚜𝚐_1, …, 𝚖𝚜𝚐_n) and output (i, D_i). (We assume that the database has the index of each row encoded inside that row, so the client can read i off the reconstructed row.)

Figure 3: Description of the One-Hot Vectors Protocol.
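Putting the pieces together, the following self-contained sketch simulates one run of the protocol of Figure 3 for small parameters. Rows are single field elements, and the Shamir helpers, field size, and parameter names are our own simplifications.

```python
import secrets
from itertools import product

Q = 10007  # prime field; in the paper, large enough to hold index + row

def share(sec, n, t):
    """Shares of sec under a random degree-t polynomial (threshold t+1)."""
    coeffs = [sec] + [secrets.randbelow(Q) for _ in range(t)]
    return [sum(c * pow(x, k, Q) for k, c in enumerate(coeffs)) % Q
            for x in range(1, n + 1)]

def reconstruct(points):
    """Lagrange interpolation of the constant term from (x, y) pairs."""
    total = 0
    for xi, yi in points:
        num = den = 1
        for xj, _ in points:
            if xj != xi:
                num = num * (-xj) % Q
                den = den * (xi - xj) % Q
        total = (total + yi * num * pow(den, -1, Q)) % Q
    return total

def onehot_rpir(D, n, t, u, s):
    """One run of OneHot-RPIR; D has s**u rows, each a field element."""
    assert len(D) == s ** u and n > t * u
    # Setup: shares of u random one-hot vectors of length s (degree-t polys).
    hot = [secrets.randbelow(s) for _ in range(u)]
    shr = [[share(int(j == hot[i]), n, t) for j in range(s)]
           for i in range(u)]  # shr[i][j][p] = server p's share of v_{i,j}
    # Online phase: each server tensors its shares, inner-products with D.
    msgs = []
    for p in range(n):
        acc = 0
        for row, js in enumerate(product(range(s), repeat=u)):
            term = 1
            for i, j in enumerate(js):
                term = term * shr[i][j][p] % Q
            acc = (acc + D[row] * term) % Q
        msgs.append(acc)
    # Client: the product sharing has degree t*u, so t*u + 1 shares suffice.
    return reconstruct(list(zip(range(1, n + 1), msgs))[: t * u + 1])
```

With n = 5 servers, t = 2 and u = 2 vectors of length s = 4 (so d = 16 and n > t·u), the client reconstructs the row at the uniformly random position selected by the tensored one-hot vector.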
Theorem 9.

Let q be a max{log(d) + w, log(n)}-bit prime and 𝔽_q a finite field. Then the one-hot vectors protocol from Figure 3 is a non-trivial n-server threshold-t MS-RPIR protocol, satisfying Definition 1, with communication complexity 𝙲𝙲(n, w, d) = n·(log(d) + w + 1).

The protocol is non-trivial (Definition 3), since we replace communication complexity with pre-processing size. During the online phase, each server sends one message of size log(q); since the client obtains n messages (one from each server), the communication complexity is

𝙲𝙲(n, w, d) = n·log(q) = n·max{log(d) + w, log(n)}.

The protocol also requires pre-processing of size non-trivially less than the number of rows d as long as u > 1, which can only happen as long as n > 2t.

Proof.
Correctness.

Let f_{i,j}(x) be the random polynomial used for the secret sharing of the j-th component of the i-th one-hot vector. After establishing the setup, the server 𝒮_p has shares {f_{i,j}(p)}_{i∈[u], j∈[s]} = {v_{i,j}^p}_{i∈[u], j∈[s]}, which can be rearranged as a set of shared one-hot vectors {v_1^p, v_2^p, …, v_u^p}, where

v_i^p = (v_{i,1}^p, v_{i,2}^p, …, v_{i,s}^p)^T. (1)

Note, however, that only one of the entries corresponds to a share of 1, whereas the other entries correspond to shares of 0.

Next, each server 𝒮p computes the tensor product vp=v1pv2pvup, which more closely looks like

v^p = [ v_{1,1}^p · v_{2,1}^p ⋯ v_{u−1,1}^p · v_{u,1}^p ,
        v_{1,1}^p · v_{2,1}^p ⋯ v_{u−1,1}^p · v_{u,2}^p ,
        ⋯ ,
        v_{1,1}^p · v_{2,1}^p ⋯ v_{u−1,1}^p · v_{u,s}^p ,
        v_{1,1}^p · v_{2,1}^p ⋯ v_{u−1,2}^p · v_{u,1}^p ,
        v_{1,1}^p · v_{2,1}^p ⋯ v_{u−1,2}^p · v_{u,2}^p ,
        ⋯ ,
        v_{1,1}^p · v_{2,1}^p ⋯ v_{u−1,2}^p · v_{u,s}^p ,
        ⋯ ,
        v_{1,s}^p · v_{2,s}^p ⋯ v_{u−1,s}^p · v_{u,1}^p ,
        v_{1,s}^p · v_{2,s}^p ⋯ v_{u−1,s}^p · v_{u,2}^p ,
        ⋯ ,
        v_{1,s}^p · v_{2,s}^p ⋯ v_{u−1,s}^p · v_{u,s}^p ]^⊤.   (2)

Lastly, the server computes the message 𝚖𝚜𝚐^p as the inner product ⟨D, v^p⟩. Let the indices of the database range from 0 to d−1. Then, we can rewrite

𝚖𝚜𝚐^p = ⟨D, v^p⟩ = Σ_{s_1,…,s_u ∈ [s]} D[ Σ_{i∈[u]} (s_i − 1) · s^{u−i} ] · Π_{j∈[u]} v_{j,s_j}^p.

Notice that this expression is a sum of s^u terms, where each term is a database row (a constant in 𝔽_q) multiplied by a product of exactly u shares. Since every share lies on a polynomial of degree t, the product of u shares is a share of a polynomial of degree tu. As the client obtains n > tu shares, it can successfully interpolate this degree-tu polynomial and recover its constant term.
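The degree argument can be checked directly. Below is a toy sketch (field, parameters, and helper names are our own choices) showing that the componentwise product of two degree-t sharings is a degree-2t sharing of the product of the secrets, which reconstructs correctly from tu + 1 = 2t + 1 shares.

```python
import random

Q = 97  # toy prime field

def share(secret, n, t):
    """Shamir-share `secret` among n parties using a random degree-t polynomial."""
    coeffs = [secret] + [random.randrange(Q) for _ in range(t)]
    return [sum(c * pow(x, k, Q) for k, c in enumerate(coeffs)) % Q
            for x in range(1, n + 1)]

def interp0(pts):
    """Lagrange-interpolate f(0) from (x, f(x)) pairs over F_Q."""
    total = 0
    for xi, yi in pts:
        num, den = 1, 1
        for xj, _ in pts:
            if xj != xi:
                num = num * (-xj) % Q
                den = den * (xi - xj) % Q
        total = (total + yi * num * pow(den, -1, Q)) % Q
    return total

t, u = 2, 2
n = t * u + 1                 # n > t*u shares suffice for a degree-(t*u) polynomial
a_sh = share(3, n, t)
b_sh = share(5, n, t)
prod_sh = [x * y % Q for x, y in zip(a_sh, b_sh)]  # degree-2t sharing of 3*5 = 15
assert interp0(list(enumerate(prod_sh, 1))) == 15
```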

Observe again that since v^p is a componentwise sharing of the tensor product v = v_1 ⊗ ⋯ ⊗ v_u of u one-hot vectors, only one row of v^p is a share of 1 while the others are shares of 0. The correctness of the client’s output follows.

The argument that any row in the database is obtained with the same probability 1/d follows directly from the randomness of the one-hot vectors v_1, …, v_u: since the position of the non-zero entry in each vector is picked uniformly at random, the position of the non-zero entry in the tensor product v = v_1 ⊗ ⋯ ⊗ v_u is also uniformly random.
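The underlying fact is that the map from the u per-vector positions to the position of the non-zero entry of v is a bijection onto [0, s^u), so u independent uniform choices yield a uniform index. A quick exhaustive check with toy dimensions of our choosing:

```python
from itertools import product

s, u = 3, 2
# Every combination of per-vector non-zero positions maps to a distinct
# mixed-radix index in [0, s**u), so the combined index is uniform.
idx = {sum(p * s ** (u - 1 - i) for i, p in enumerate(pos))
       for pos in product(range(s), repeat=u)}
assert idx == set(range(s ** u))
```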

Client Privacy.

To argue for client privacy, we need to show that the distribution of the index output by the client, along with the view of up to threshold-t colluding servers, is indistinguishable from the distribution where the index is sampled uniformly at random. This follows from the randomness of the one-hot vectors: the index of the row received by the client corresponds to the position of the non-zero entry in the one-hot vector v, which is computed as the tensor product v = v_1 ⊗ ⋯ ⊗ v_u. As in the correctness argument, each vector v_1, …, v_u has its non-zero position chosen uniformly at random. Consequently, the non-zero position in v is also uniformly random, ensuring that the client’s obtained index remains uniformly distributed.

The view of up to t colluding servers consists of exactly t shares of each vector v_1, …, v_u. Since we use the Shamir sharing scheme, which has perfect security, this set of shares reveals no information about the non-zero positions in the vectors, and hence none about the index obtained by the client. Therefore, the two distributions are indistinguishable.
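The perfect security of Shamir sharing can be verified exhaustively in a tiny case (toy field and t = 1, our own choices): a single share of 0 and a single share of 1 have exactly the same distribution over the polynomial's random coefficient, so one share carries no information about the secret.

```python
Q = 7  # toy prime field

def share_values(secret):
    """All possible values of server 1's share f(1) = secret + c,
    over every choice of the random degree-1 coefficient c."""
    return sorted((secret + c) % Q for c in range(Q))

# The distributions coincide for every secret: one share is uniform over F_Q.
assert share_values(0) == share_values(1) == list(range(Q))
```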

References

  • [1] Andris Ambainis. Upper bound on communication complexity of private information retrieval. In Pierpaolo Degano, Roberto Gorrieri, and Alberto Marchetti-Spaccamela, editors, ICALP 97, volume 1256 of LNCS, pages 401–407. Springer, Heidelberg, July 1997. doi:10.1007/3-540-63165-8_196.
  • [2] Amos Beimel and Yuval Ishai. Information-theoretic private information retrieval: A unified construction. In Fernando Orejas, Paul G. Spirakis, and Jan van Leeuwen, editors, ICALP 2001, volume 2076 of LNCS, pages 912–926. Springer, Heidelberg, July 2001. doi:10.1007/3-540-48224-5_74.
  • [3] Yeow Meng Chee, Tao Feng, San Ling, Huaxiong Wang, and Liang Feng Zhang. Query-efficient locally decodable codes of subexponential length, 2010. arXiv:1008.1617.
  • [4] Benny Chor, Oded Goldreich, Eyal Kushilevitz, and Madhu Sudan. Private information retrieval. In 36th FOCS, pages 41–50. IEEE Computer Society Press, October 1995. doi:10.1109/SFCS.1995.492461.
  • [5] Zeev Dvir and Sivakanth Gopi. 2-server PIR with sub-polynomial communication. In Rocco A. Servedio and Ronitt Rubinfeld, editors, 47th ACM STOC, pages 577–584. ACM Press, June 2015. doi:10.1145/2746539.2746546.
  • [6] Klim Efremenko. 3-query locally decodable codes of subexponential length. In Michael Mitzenmacher, editor, 41st ACM STOC, pages 39–44. ACM Press, May / June 2009. doi:10.1145/1536414.1536422.
  • [7] Craig Gentry, Shai Halevi, Bernardo Magri, Jesper Buus Nielsen, and Sophia Yakoubov. Random-index PIR and applications. In Kobbi Nissim and Brent Waters, editors, TCC 2021, Part III, volume 13044 of LNCS, pages 32–61. Springer, Heidelberg, November 2021. doi:10.1007/978-3-030-90456-2_2.
  • [8] Fatemeh Ghasemi, Swastik Kopparty, and Madhu Sudan. Improved PIR schemes using matching vectors and derivatives, 2024. doi:10.48550/arXiv.2411.11611.
  • [9] Toshiya Itoh and Yasuhiro Suzuki. Improved constructions for query-efficient locally decodable codes of subexponential length. IEICE Transactions on Information and Systems, E93-D(2):263–270, 2010. doi:10.1587/transinf.e93.d.263.
  • [10] Adi Shamir. How to share a secret. Commun. ACM, 22(11):612–613, November 1979. doi:10.1145/359168.359176.
  • [11] Stephanie Wehner and Ronald de Wolf. Improved lower bounds for locally decodable codes and private information retrieval. In Luís Caires, Giuseppe F. Italiano, Luís Monteiro, Catuscia Palamidessi, and Moti Yung, editors, ICALP 2005, volume 3580 of LNCS, pages 1424–1436. Springer, Heidelberg, July 2005. doi:10.1007/11523468_115.