
On the Definition of Malicious
Private Information Retrieval

Bar Alon ORCID Department of Computer Science, Georgetown University, Washington DC, USA Amos Beimel ORCID Department of Computer Science, Ben Gurion University, Be’er-Sheva, Israel
Abstract

A multi-server private information retrieval (PIR) protocol allows a client to obtain an entry of its choice from a database, held by one or more servers, while hiding the identity of the entry from small enough coalitions of servers. In this paper, we study PIR protocols in which some of the servers are malicious and may not send messages according to the prescribed protocol. In previous papers, such protocols were defined by requiring that they are correct, private, and robust to malicious servers, i.e., by listing 3 properties that they should satisfy. However, 40 years of experience in studying secure multiparty protocols taught us that defining the security of protocols by a list of required properties is problematic.

In this paper, we rectify this situation and define the security of PIR protocols with malicious servers using the real vs. ideal paradigm. We study the relationship between the property-based definition of PIR protocols and the real vs. ideal definition, showing the following results:

  • We prove that if we require full security from PIR protocols, i.e., the client outputs the correct value of the database entry with high probability even if a minority of the servers are malicious, then the two definitions are equivalent. This implies that constructions of such protocols that were proven secure using the property-based definition are actually secure under the “correct” definition of security.

  • We show that if we require security-with-abort from PIR protocols (called PIR protocols with error-detection in previous papers), i.e., protocols in which the user either outputs the correct value or an abort symbol, then there are protocols that are secure under the property-based definition; however, they do not satisfy the real vs. ideal definition, that is, they can be attacked allowing selective abort. This shows that the property-based definition of PIR protocols with security-with-abort is problematic.

  • We consider the compiler of Eriguchi et al. (TCC 22) that starts with a PIR protocol that is secure against semi-honest servers and constructs a PIR protocol with security-with-abort; this compiler implies the best-known PIR protocols with security-with-abort. We show that applying this compiler does not result in PIR protocols that are secure according to the real vs. ideal definition. However, we prove that a simple modification of this compiler results in PIR protocols that are secure according to the real vs. ideal definition.

Keywords and phrases:
Private information retrieval, secure multiparty computation
Funding:
Amos Beimel: Partially supported by ISF grant 391/21. Co-funded by the European Union (ERC, NFITSC, 101097959). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.
Copyright and License:
© Bar Alon and Amos Beimel; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Security and privacy Cryptography
Related Version:
Full Version: https://eprint.iacr.org/2025/683
Editor:
Niv Gilboa

1 Introduction

A private information retrieval (PIR) protocol [16] allows a client to obtain an entry of its choice from a database held by one or more servers, such that nothing is revealed to any small enough coalition of servers about the item being retrieved. For example, an investor might want to know the value of a specific stock without revealing which stock they are interested in. This is modeled by considering an N-bit string D held by each server, and a client holding a retrieval index i who wishes to learn the ith entry of D, i.e., $D_i$, without revealing i. The trivial solution is to let a single server send the entire database to the client; Chor et al. [16] showed that this is optimal in the information-theoretic setting when there is a single server. However, better protocols are known when the database is held by two or more servers. PIR protocols were used to construct secure multiparty protocols [30, 7] and locally decodable codes [34], and inspired constructions of conditional disclosure of secrets [39, 40]. Most constructions of PIR protocols guarantee that any single server learns no information on the client’s retrieval index [16, 2, 30, 31, 6, 8, 35, 46, 48, 22, 33, 14, 20, 27, 1]; e.g., there is a 3-server PIR protocol secure against a single server with communication complexity $2^{\tilde{O}(\sqrt[3]{\log N})}$ [27]. When there are many servers, it is natural to require that the protocol guarantees the privacy of the client against colluding subsets of the servers. Constructions of such PIR protocols are given in [30, 6, 8, 46, 5]. It is also natural to consider malicious servers that may deviate from the protocol. In the stock market example above, one may consider the case where some of the servers try to give wrong information on the stock. Such PIR protocols were first discussed in [9] and further studied in [47, 46, 28, 18, 49, 43, 44, 37, 4, 23, 24, 25].

The security of PIR protocols in the literature is defined by listing the desired properties from the protocol. However, PIR is a special case of secure multiparty computation (MPC), where it is well-known that separating the security properties is problematic. Indeed, there are MPC protocols that satisfy the natural security properties (such as privacy and correctness) yet should clearly be considered insecure [42, 13, 29]. Instead, the security of MPC protocols is defined using the real vs. ideal world paradigm. Intuitively, we consider an ideal process for computing a function via a trusted party that the adversary cannot corrupt. It is then required that any attack performed in the real world can be simulated in the ideal world, where the adversary is more limited. Different ideal processes capture different security properties, e.g., full security and security-with-abort.

Defining the malicious security of PIR protocols via a list of properties is problematic, and it is unclear whether such protocols admit attacks not covered by the listed properties. In this work, we rectify this situation and define the security of PIR protocols using the real vs. ideal world paradigm. Specifically, we are interested in the notions of full security (also known as guaranteed output delivery) and security-with-abort. The former captures the requirement that the client always obtains the “correct” output, and the latter allows the client to abort but never output an “incorrect” value. We ask how such definitions relate to the property-based definitions used so far, and whether there are efficient PIR protocols satisfying the real vs. ideal security definition. We consider the perfect setting, the statistical setting, and the computational setting.

1.1 Our Contributions

Our conceptual contribution is defining the ideal processes of the PIR functionality for full security and security-with-abort against malicious adversaries (see Section 3 for the formal definitions). We then compare the simulation-based security definition to the property-based security definition.

1.1.1 The Ideal World Processes

In this section, we describe the ideal world process for each of the desired security notions. A PIR protocol is then considered secure according to an ideal world if any adversary corrupting a subset of the servers can be simulated in the ideal world. We stress that definitions for PIR protocols make no privacy requirements against the client; in particular, the client may learn the whole database. Thus, we only consider adversaries that do not corrupt the client.

Full Security of PIR Protocols

We first describe the definition of fully secure PIR protocols. Roughly, the definition captures the requirement that the client outputs the correct value $D_i$, regardless of the messages sent by the malicious servers. Observe that this only makes sense when the majority of the servers are honest. (Indeed, if the number of servers is ℓ and t of them might be corrupted, where ℓ ≤ 2t, then the client cannot distinguish an execution in which the first t servers are corrupted and behave honestly on input $0^N$ while the remaining ℓ−t honest servers hold $1^N$, from an execution in which the last ℓ−t servers are corrupted and behave honestly on input $1^N$ while the first t servers are honest and hold $0^N$.) The ideal world for full security proceeds as follows. The client sends i to the trusted party and each honest server sends D to the trusted party. The corrupted servers may change their databases (based on the instructions of the ideal-world adversary) and send them to the trusted party. Let D′ denote the database that was sent by more than ℓ/2 of the servers (note that D′ = D if there is an honest majority). The trusted party then sends $D'_i$ to the client. The ideal functionality is only defined when there is an honest majority.

Secure-With-Abort PIR Protocols

We next describe the definition of secure-with-abort PIR protocols. Here, we allow the client to output ⊥ (indicating abort). The security definition captures the requirement that the client never outputs $1-D_i$. Note that, unlike the ideal world for full security, here the definition allows a majority of corrupted servers.

The ideal-world computation proceeds as follows. The client sends i to the trusted party and each honest server sends D to the trusted party. The corrupted servers may change their databases (based on the instructions of the ideal-world adversary) and send them to the trusted party. If the trusted party receives the same database D from all servers, then it sends $D_i$ to the client; otherwise, it sends ⊥ to the client.
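The abort rule of the trusted party can be sketched in a few lines of Python (an illustrative model of the ideal process, not part of any protocol; the function name is ours):

```python
# A minimal sketch of the trusted party's logic in the security-with-abort
# ideal world: it reveals D_i only if all servers submitted the same database.
def trusted_party_abort(i, databases):
    """databases: one N-bit string per server (honest or substituted)."""
    if all(d == databases[0] for d in databases):
        return databases[0][i]   # unanimous database D: reveal D_i
    return "abort"               # any disagreement: the client receives ⊥
```

The full-security trusted party described above differs only in that it takes a majority vote instead of requiring unanimity.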

1.1.2 Comparing Simulation-Based Security to Property-Based Security

We compare the simulation-based security definitions to the property-based security definitions. In the following, we fix the total number of servers to be ℓ, and let t bound the number of corrupted servers. We refer the reader to Section 4 for the formal statements and proofs of the theorems.

Full Security

We first compare full security with the property-based definition. In addition to correctness and t-privacy, here we also consider the notion of t-Byzantine robust PIR protocols (also known as t-error-correcting [24]) defined by Beimel and Stahl [9]. Roughly, it requires that the client outputs $D_i$ even if t malicious servers arbitrarily deviate from the protocol. Note that, similarly to full security, this definition only makes sense when the majority of the servers are honest. As would be expected, full security implies all 3 security properties. We show that the converse also holds.

Theorem 1 (Informal, equivalence of full security and the property-based definition).

Any ℓ-server PIR protocol is t-fully secure if and only if it is correct, t-private, and t-Byzantine robust.

Thus, to prove full security it suffices to show that each property holds individually. In particular, this shows that previous Byzantine robust constructions, e.g., [9, 47, 46, 43, 44, 37, 4, 23, 24], are also fully secure.

Secure-With-Abort

Finally, we consider security-with-abort. Here, instead of t-Byzantine robustness, we consider the notion of t-error-detecting PIR protocols defined by Eriguchi et al. [23, 24]. In a t-error-detecting PIR protocol, the client can abort (i.e., output ⊥), but security requires that no set of t malicious servers can force the client to output $1-D_i$. Clearly, error-detecting is weaker than Byzantine robust; however, it is possible to construct t-error-detecting PIR protocols even when t ≥ ℓ/2 [23, 24].

Similarly to the previous definitions, the simulation-based definition implies the property-based definition. However, we show that here the converse is not true.

Theorem 2 (Informal, security-with-abort and the property-based definition).

Any ℓ-server PIR protocol that is t-secure-with-abort is correct, t-private, and t-error-detecting. However, the converse does not hold.

To see why the properties do not imply simulation-based security, consider the following protocol Π, in which a single malicious server can cause a “selective abort”, i.e., the client will abort if and only if i=1. In the protocol, each server sends D to the client, and additionally, the first server sends a bit b∈{0,1}. An honest first server sends b=0. If the client received different databases, or if it has input i=1 and it received b=1, then it outputs ⊥. Otherwise, the client received a single database D, and it outputs $D_i$. Clearly, Π is correct, 1-private, and 1-error-detecting. However, it is not 1-secure-with-abort, because the first server can force different abort probabilities on different indices i. Specifically, if the server sends b=1, then the client always aborts on index i=1, and it never aborts on index i≠1. Note that this holds even though the server knows nothing about i. See Example 18 for the formal argument. In Example 19, we show that even if we require the same abort probability on all indices i, this is still insufficient for security-with-abort.
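The selective-abort attack on this toy protocol can be made concrete in a few lines (our illustrative model; a 1-indexed database and the function name are ours):

```python
# Toy model of the protocol above: every server sends D; the first server
# also sends a bit b (honest b = 0). The malicious flag b = 1 makes the
# client abort exactly when i = 1, without touching the database.
def client_output(i, databases, b):
    if any(d != databases[0] for d in databases):
        return "abort"               # inconsistent databases: output ⊥
    if i == 1 and b == 1:
        return "abort"               # the selective-abort trigger
    return databases[0][i - 1]       # otherwise output D_i (1-indexed)

D = "0110"
assert client_output(2, [D, D], b=0) == "1"       # honest run: correct D_2
assert client_output(1, [D, D], b=1) == "abort"   # attack: abort iff i = 1
assert client_output(2, [D, D], b=1) == "1"       # attack: no abort otherwise
```

No ideal-world simulator can reproduce this behavior: in the ideal world the decision to abort is made without any dependence on i.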

Colombo et al. [17] noticed that standard definitions for the security of PIR might allow for selective abort (however, they did not provide any concrete example). This motivated them to define authenticated PIR to capture security against such attacks. However, their definition is property-based, and it is unclear whether their definition is equivalent to our simulation-based security, or whether it captures security against attacks not mentioned in their paper.

A Compiler From a Private PIR Protocol to a Secure-With-Abort Protocol

Eriguchi et al. [24] showed a compiler that takes any ℓ-server correct and t-private PIR protocol and transforms it into a correct, t-private, and t-error-detecting ℓ-server PIR protocol. In light of Theorem 2, it is natural to ask whether their compiler satisfies the stronger requirement of t-security-with-abort. We show that this is not the case; however, we prove that a simple modification to their construction results in a secure-with-abort protocol.

Theorem 3 (Informal, compiler to secure-with-abort).

There exists a compiler that transforms any ℓ-server correct and t-private PIR protocol Π into an ℓ-server t-secure-with-abort PIR protocol. The total communication complexity (i.e., the total number of bits sent) of the new protocol is $2\omega(\log N)\cdot(c+\log\ell)$, where c is the communication complexity of Π.

This result implies that for a constant number of servers, the communication complexity of t-secure-with-abort PIR protocols is equivalent to that of PIR protocols secure against t semi-honest servers (up to a factor of ω(log N)). We refer to Section 5 for the formal statement and proof. In Table 1 we summarize the PIR protocols obtained from our results.

Table 1: A summary of the best PIR protocols tolerating t corrupted servers for various parameters. The results in the second row are obtained by applying our compiler on the protocols from the first row. These results are also obtained by [24] for the weaker property-based security. The last row follows from [24] and Theorem 1.
| | $2^t$-server PIR from [5] + [20] | $3^t$-server PIR from [5] + [22] | ℓ-server PIR, ℓ=O(1) [46] |
| t-private | $2^{\tilde{O}(t\sqrt{\log N})}$ | $\max\{t\cdot 2^{\tilde{O}(\sqrt{\log N})},\,2^{O(t)}\}$ | $N^{1/\lfloor(2\ell-1)/t\rfloor}$ |
| t-security-with-abort | $2^{\max\{\tilde{O}(t\sqrt{\log N}),\,t\log t\}}$ | $3^{2t}\max\{t\cdot 2^{\tilde{O}(\sqrt{\log N})},\,t\cdot 3^t\}$ | $N^{1/\lfloor(2\ell-1)/t\rfloor+o(1)}$ |
| t-full security | $2^{t^2+\max\{\tilde{O}(t\sqrt{\log N}),\,t\log t\}}$ | $3^{t^2+2t}\max\{t\cdot 2^{\tilde{O}(\sqrt{\log N})},\,2^{O(t)}\}$ | $N^{1/(\lfloor(2\ell-1)/t\rfloor-2)+o(1)}$ |

1.2 Our Techniques

To illustrate our techniques, we next show that t-semi-honest security is equivalent to correctness and t-privacy. Since simulation-based security is clearly stronger than property-based security, we only show that correctness and t-privacy imply t-semi-honest security. We do it for statistical security (other cases are handled similarly). This is a special case of a known result that the property-based definition is equivalent to the simulation-based definition for deterministic functionalities with semi-honest security against unbounded adversaries.

Fix a real-world adversary corrupting a set 𝒯 of at most t servers. We define its simulator 𝖲𝗂𝗆 by running the client on index i=1 and outputting the queries of the corrupted servers. We now show that the statistical distance between the real and ideal worlds is negligible. For a retrieval index i∈[N] and a database D∈{0,1}^N, let $y_{D,i}$ denote the output of the client in the real world, and let $\mathbf{q}^i_{\mathcal{T}}$ denote the queries the corrupted servers receive. By the construction of the simulator and the fact that the client always outputs $D_i$ in the ideal world, we need to show that $(\mathbf{q}^i_{\mathcal{T}}, y_{D,i})$ and $(\mathbf{q}^1_{\mathcal{T}}, D_i)$ are statistically close. First, by the t-privacy of the protocol, $(\mathbf{q}^1_{\mathcal{T}}, D_i)$ and $(\mathbf{q}^i_{\mathcal{T}}, D_i)$ are statistically close. Second, observe that correctness implies that $(\mathbf{q}^i_{\mathcal{T}}, D_i)$ and $(\mathbf{q}^i_{\mathcal{T}}, y_{D,i})$ are statistically close. Thus, $(\mathbf{q}^i_{\mathcal{T}}, y_{D,i})$ and $(\mathbf{q}^1_{\mathcal{T}}, D_i)$ are statistically close.

1.3 Related Works

1.3.1 1-Private PIR

We next state the best known PIR protocols. Efremenko [22] constructed a 3-server 1-private PIR protocol with query length $2^{\tilde{O}(\sqrt{\log N})}$ and $2^r$-server 1-private PIR protocols with query length $2^{\tilde{O}(\sqrt[r]{\log N})}$; the answer length in these protocols is O(1). Dvir and Gopi [20] constructed a 2-server PIR protocol with query and answer length $2^{\tilde{O}(\sqrt{\log N})}$. Very recently, Ghasemi, Kopparty, and Sudan [27] constructed a 3-server PIR protocol with query and answer length $2^{\tilde{O}(\sqrt[3]{\log N})}$. These constructions were simplified by Alon, Beimel, and Lasri [1]. The communication complexity of ℓ-server PIR protocols for ℓ ≥ 6 was improved by Itoh and Suzuki [32, 33], Chee, Feng, Ling, Wang, and Zhang [14], Dvir and Gopi [20], and Ghasemi et al. [27]. For example, there is a 6-server PIR protocol with communication complexity $2^{\tilde{O}(\sqrt[4]{\log N})}$ [32, 3, 27, 1]. The best known lower bound on the total communication complexity of 2-server PIR protocols is $5\log N$, proved by Wehner and de Wolf [45] (improving on [41, 34, 36]).

1.3.2 t-Private PIR

t-private PIR protocols were constructed in [16, 2, 30, 6, 8, 46]. In particular, Woodruff and Yekhanin [46] presented a t-private ℓ-server PIR construction for general ℓ and t with communication complexity $\ell^2 t\log\ell\cdot N^{1/\lfloor(2\ell-1)/t\rfloor}$. Barkol, Ishai, and Weinreb [5] presented a general transformation from 1-private PIR protocols to t-private PIR protocols; given a 1-private ℓ-server PIR protocol with query length $m_q$ and answer length $m_a$, they constructed a t-private $\ell^t$-server PIR protocol with query length $O(t\cdot m_q)$ and answer length $O(m_a^t)$. When t is small compared to ℓ, this gives better protocols than [46], e.g., a t-private $2^t$-server PIR protocol with communication complexity $2^{\tilde{O}(t\sqrt{\log N})}$ (using [20]) and a t-private $3^t$-server PIR protocol with query length $t\cdot 2^{\tilde{O}(\sqrt{\log N})}$ and answer length $2^{O(t)}$ (using [22]).

1.3.3 Robust and Byzantine-Robust PIR

Beimel and Stahl [9] introduced robust and Byzantine robust PIR protocols. A PIR protocol is t-robust if the client can recover the correct value even if t of the servers go offline. The generalized notion of t-Byzantine robust requires robustness to hold even if t of the servers are malicious. This was further studied in subsequent works [46, 37, 23, 24]. In particular, Eriguchi et al. [24] showed that for a constant number of servers:

  • The communication complexity of perfect t-Byzantine robust and t-private ℓ-server PIR is equivalent to the communication complexity of t-private (ℓ−2t)-server PIR protocols;

  • The communication complexity of statistical t-Byzantine robust and t-private ℓ-server PIR is equivalent to that of t-private (ℓ−t)-server PIR protocols.

Combining the above results and [46] yields a t-private and t-Byzantine robust ℓ-server statistical PIR protocol with total communication $\omega(\log N)\cdot N^{1/(\lfloor(2\ell-1)/t\rfloor-2)}$. When t is relatively small compared to ℓ, the results of [5, 20, 22] yield a t-private and t-Byzantine robust $2^t$-server statistical PIR protocol with total communication complexity $2^{t^2+\max\{\tilde{O}(t\sqrt{\log N}),\,t\log t\}}$ and a t-Byzantine robust $3^t$-server statistical PIR protocol with total communication complexity $3^{t^2+2t}\max\{t\cdot 2^{\tilde{O}(\sqrt{\log N})},\,t\cdot 3^t\}$. Byzantine robust PIR protocols where the database entries are large were considered in [47, 43, 44, 4]; these works measure efficiency relative to the size of the entries rather than the size of the database (as we do in our work).

Eriguchi et al. [23, 24] introduced error-detecting PIR, where the client may abort, and it is required that it never outputs the wrong value. In [23], they showed that error-detecting PIR with perfect security and communication sublinear in the database size is impossible. In particular, there is no non-trivial perfectly secure-with-abort PIR protocol. In [24], they constructed a compiler transforming any t-private ℓ-server PIR protocol with communication $c_0$ into a t-error-detecting one with total communication $2\omega(\log N)\cdot(c_0+\log\ell)$. When t is very small compared to ℓ, this has significantly better communication complexity than the Byzantine robust protocols discussed above. For example, there is a t-error-detecting $2^t$-server PIR protocol with total communication complexity $2^{\max\{\tilde{O}(t\sqrt{\log N}),\,t\log t\}}$ and a t-error-detecting $3^t$-server PIR protocol with total communication complexity $3^{2t}\max\{t\cdot 2^{\tilde{O}(\sqrt{\log N})},\,t\cdot 3^t\}$. Additional constructions of error-detecting PIR protocols appear in Eriguchi et al. [25].

Colombo et al. [17] also observed that standard PIR security definitions allow for selective abort attacks. This motivated them to define authenticated PIR protocols, where, similarly to error-detection, it is required that either the client aborts or its output is consistent with the honest server’s database. Additionally, the privacy of the client’s retrieval index is required to hold even if the adversary knows whether the client aborts or not, which prevents selective abort attacks. Unlike our definitions, Colombo et al. [17] defined security by listing the desired security properties.

1.3.4 Computational PIR

Computational PIR protocols consider computationally-bounded servers. This model was first considered by Chor and Gilboa [15]. Computational single-server PIR protocols can be constructed with non-trivial communication complexity [38, 12, 26]. Computational PIR can be seen as a special case of fully homomorphic encryption; the result of Brakerski and Vaikuntanathan [11] used this to construct a single-server PIR protocol with total communication poly(κ)+logN, where κ is the security parameter.

1.3.5 Spooky PIR

Dwork et al. [21] studied two-round succinct arguments for NP by composing PCP proofs with computational single-server PIR protocols. They showed that such heuristics may be insecure. More generally, they argued that executing PIR protocols in parallel may introduce “spooky interactions”: even though the queries made by the client are independent and the server knows nothing about the indexes of the client, the server may introduce correlations between the queries and outputs of the client. Dodis et al. [19] later showed that this holds even when using multi-prover interactive proofs instead of PCPs. These results show the necessity of defining the security of PIR protocols via the real vs. ideal world paradigm.

1.3.6 Verifiable PIR

Ben-David et al. [10] introduced verifiable PIR, where a (single) server should be able to prove that the database satisfies various properties. Although one of their definitions follows the real vs. ideal world paradigm, it does not require privacy or security against selective abort attacks.

2 Preliminaries

2.1 Notations

We use bold characters to denote vectors. For $n\in\mathbb{N}$, let $[n]=\{1,2,\dots,n\}$. For a set 𝒮, we write $s\leftarrow\mathcal{S}$ to indicate that s is selected uniformly at random from 𝒮. Given a random variable (or a distribution) X, we write $x\leftarrow X$ to indicate that x is selected according to X. ppt stands for probabilistic polynomial time. A function ε(·) is called negligible if for every positive polynomial p(·) and all sufficiently large n, ε(n)<1/p(n). For a vector 𝐯 of dimension n, we write $v_i$ for its ith coordinate, and for $\mathcal{S}\subseteq[n]$ we write $\mathbf{v}_{\mathcal{S}}=(v_i)_{i\in\mathcal{S}}$.

A distribution ensemble $X=\{X_{a,n}\}_{a\in\mathcal{D}_n,n\in\mathbb{N}}$ is an infinite sequence of random variables indexed by $a\in\mathcal{D}_n$ and $n\in\mathbb{N}$, where $\mathcal{D}_n$ is a domain that might depend on n. Computational indistinguishability is defined as follows.

Definition 4.

Let $X=\{X_{a,n}\}_{a\in\mathcal{D}_n,n\in\mathbb{N}}$ and $Y=\{Y_{a,n}\}_{a\in\mathcal{D}_n,n\in\mathbb{N}}$ be two ensembles. We say that X and Y are computationally indistinguishable, denoted $X\equiv_C Y$, if for every ppt distinguisher 𝖣, there exists a negligible function ε(·), such that for all n and $a\in\mathcal{D}_n$,

$|\Pr[\mathsf{D}(X_{a,n})=1]-\Pr[\mathsf{D}(Y_{a,n})=1]|\le\varepsilon(n)$.
Definition 5.

The statistical distance between two finite random variables X and Y is defined as

$\mathrm{SD}(X,Y)=\max_{\mathcal{T}}\big(\Pr[X\in\mathcal{T}]-\Pr[Y\in\mathcal{T}]\big)$.

Two ensembles $X=\{X_{a,n}\}_{a\in\mathcal{D}_n,n\in\mathbb{N}}$ and $Y=\{Y_{a,n}\}_{a\in\mathcal{D}_n,n\in\mathbb{N}}$ are said to be statistically close, denoted $X\equiv_S Y$, if there exists a negligible function ε(·), such that for all n and $a\in\mathcal{D}_n$,

$\mathrm{SD}(X_{a,n},Y_{a,n})\le\varepsilon(n)$.

We say that X and Y are identically distributed, denoted $X\equiv_P Y$, if the above holds with ε(n)=0 for all n and $a\in\mathcal{D}_n$.
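For finite distributions, the maximum over tests 𝒯 in Definition 5 is attained by the set of outcomes where X is more likely than Y, which gives the familiar half-L1 formula. A small helper (ours, for intuition only):

```python
# Statistical distance of two finite distributions given as dictionaries
# mapping outcomes to probabilities: SD(P, Q) = (1/2) * sum |P(x) - Q(x)|.
def statistical_distance(P, Q):
    support = set(P) | set(Q)
    return 0.5 * sum(abs(P.get(x, 0.0) - Q.get(x, 0.0)) for x in support)

P = {"a": 0.5, "b": 0.5}
Q = {"a": 0.75, "b": 0.25}
assert statistical_distance(P, Q) == 0.25   # attained by the test T = {"a"}
```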

Theorem 6 (Hoeffding’s inequality for the hypergeometric distribution).

Let $n\in\mathbb{N}$, $m,k\in[n]$, and $A\in\binom{[n]}{m}$. Then for every t>0,

$\Pr_{X\leftarrow\binom{[n]}{k}}\Big[|X\cap A|\ge \frac{km}{n}+t\Big]\le e^{-2t^2/k}$

and

$\Pr_{X\leftarrow\binom{[n]}{k}}\Big[|X\cap A|\le \frac{km}{n}-t\Big]\le e^{-2t^2/k}$.
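A quick empirical check of the upper-tail bound (our sketch; the concrete parameters are arbitrary): sample X as a uniform k-subset of [n] and compare the observed tail frequency with $e^{-2t^2/k}$.

```python
import math, random

# Theorem 6, upper tail: |X ∩ A| exceeds its mean km/n by t with probability
# at most exp(-2 t^2 / k). Here n=100, m=30, k=20, t=5.
rng = random.Random(1)
n, m, k, t = 100, 30, 20, 5.0
A = set(range(1, m + 1))                 # any fixed m-subset of [n]
trials = 20000
hits = sum(len(A & set(rng.sample(range(1, n + 1), k))) >= k * m / n + t
           for _ in range(trials))
bound = math.exp(-2 * t ** 2 / k)        # = e^{-2.5}
assert hits / trials <= bound            # empirical tail is well below the bound
```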

2.2 Private Information Retrieval

In this section, we start with the definition of single-round private information retrieval (PIR) protocols [16]. We then define property-based security taken from previous papers [9, 23, 24]. To simplify the presentation, we present the definition for a constant number of servers (i.e., independent of the database size). The definition readily extends to a non-constant number of servers. Intuitively, the PIR protocol starts with the client running a query algorithm 𝒬 and sending the jth output qj to the jth server Sj. The jth server, holding the database D and the query qj, responds with the answer 𝒜(j,qj,D). Finally, the client computes the output by applying the reconstruction algorithm 𝒞. We next formalize this.

Definition 7 (PIR protocols).

Let $\ell\in\mathbb{N}$. An ℓ-server PIR protocol is given by a 3-tuple Π=(𝒬,𝒜,𝒞) of algorithms with the following syntax.

  1. The query algorithm $(q_1,q_2,\dots,q_\ell,\mathsf{st})\leftarrow\mathcal{Q}(1^N,i)$ is a randomized algorithm that is given the size of the database N and a retrieval index i∈[N]. It outputs a query $q_j$ for every server j∈[ℓ] and a state $\mathsf{st}\in\{0,1\}^*$. We denote $\mathbf{q}=(q_1,\dots,q_\ell)$.

  2. The answer algorithm $a_j=\mathcal{A}(j,q,D)$ is a deterministic algorithm that is given an index j∈[ℓ], a query q, and a database $D\in\{0,1\}^N$ for some $N\in\mathbb{N}$. It outputs a response $a_j$. We denote $\mathbf{a}=(a_1,\dots,a_\ell)$.

  3. The reconstruction algorithm $y=\mathcal{C}(1^N,\mathbf{a},\mathsf{st})$ is a deterministic algorithm that is given the size of the database N, the answers $\mathbf{a}=(a_j)_{j\in[\ell]}$, and the state 𝗌𝗍. It outputs $y\in\{0,1,\perp\}$.

The protocol is efficient if all three algorithms run in time polynomial in N. The total communication complexity is the maximum (taken over the randomness of the client and the databases D) of the total number of bits sent in the protocol, i.e., $\max_{r,D}\big\{\sum_{j=1}^{\ell}(|q_j|+|a_j|)\big\}$, where r is the randomness of 𝒬.
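As a concrete instance of this syntax, the classic 2-server protocol of Chor et al. [16] can be written as a (𝒬, 𝒜, 𝒞) triple. The code below is our illustrative rendering (1-indexed database, Python names ours): the client sends a random subset S to server 1 and S△{i} to server 2, and XORs the two parity answers.

```python
import random

def Q(N, i, rng=random):
    """Query algorithm: two subset queries plus an (empty) state."""
    S = set(j for j in range(1, N + 1) if rng.random() < 0.5)
    return (S, S ^ {i}), None       # each query alone is a uniform subset of [N]

def A(j, q, D):
    """Answer algorithm: XOR of the database bits indexed by the query set."""
    bit = 0
    for idx in q:
        bit ^= int(D[idx - 1])
    return bit

def C(N, answers, st):
    """Reconstruction: the symmetric difference of the two sets is {i}."""
    a1, a2 = answers
    return a1 ^ a2

D = "0110"
for i in range(1, 5):               # perfectly correct on every index
    (q1, q2), st = Q(len(D), i)
    assert C(len(D), (A(1, q1, D), A(2, q2, D)), st) == int(D[i - 1])
```

The total communication here is Θ(N), which is far from optimal, but the protocol is 1-private: each server's query is a uniformly random subset regardless of i.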

2.3 Defining Security for PIR via Security Properties

We next define the security properties of PIR protocols as they appear in the literature. We first define correctness, which states that in an honest execution, the client outputs the correct value.

Definition 8 (Correctness).

Let $\ell\in\mathbb{N}$ and let Π=(𝒬,𝒜,𝒞) be an ℓ-server PIR protocol. We say that Π is statistically correct if for any $N\in\mathbb{N}$, any database $D\in\{0,1\}^N$, and any retrieval index i∈[N],

$\Pr_{(\mathbf{q},\mathsf{st})\leftarrow\mathcal{Q}(1^N,i)}\big[\mathcal{C}(1^N,(\mathcal{A}(j,q_j,D))_{j\in[\ell]},\mathsf{st})=D_i\big]\ge 1-\varepsilon(N)$,

for some negligible function ε(·), where the probability is over the randomness of 𝒬. If ε≡0, then we say that Π is perfectly correct.

We now define t-privacy, stating that no set of t servers learns anything about the retrieval index of the client.

Definition 9 (Privacy).

Let $\ell\in\mathbb{N}$, let t∈[ℓ], and let Π=(𝒬,𝒜,𝒞) be an ℓ-server PIR protocol. We say that Π is statistically t-private if for any set $\mathcal{T}\in\binom{[\ell]}{t}$ of t servers and any two sequences of indices $\{i_N\}_{N\in\mathbb{N}}$ and $\{i'_N\}_{N\in\mathbb{N}}$,

$\{\mathbf{q}_{\mathcal{T}}\}_{N\in\mathbb{N},D\in\{0,1\}^N}\equiv_S\{\mathbf{q}'_{\mathcal{T}}\}_{N\in\mathbb{N},D\in\{0,1\}^N}$,

where $(\mathbf{q},\mathsf{st})\leftarrow\mathcal{Q}(1^N,i_N)$ and $(\mathbf{q}',\mathsf{st}')\leftarrow\mathcal{Q}(1^N,i'_N)$. We define computational/perfect t-privacy by replacing $\equiv_S$ with $\equiv_C$ and $\equiv_P$, respectively, in the above equation.

We next define security properties that should hold against malicious behavior by a subset of the servers. Following [23, 24], we first define tampering algorithms (Eriguchi et al. [23, 24] defined tampering functions rather than algorithms); this is an adversary that can modify some of the answers.

Definition 10 (A tampering algorithm [23, 24]).

Let $\ell\in\mathbb{N}$ and let Π=(𝒬,𝒜,𝒞) be an ℓ-server PIR protocol. A tampering algorithm for Π is a randomized algorithm 𝖬 that is given a set $\mathcal{T}\subseteq[\ell]$, queries 𝐪, and a database $D\in\{0,1\}^N$ for some $N\in\mathbb{N}$. It outputs answers $\mathbf{a}=(a_j)_{j\in[\ell]}\leftarrow\mathsf{M}(\mathcal{T},\mathbf{q},D)$ such that $a_j=\mathcal{A}(j,q_j,D)$ for all $j\notin\mathcal{T}$, and $a_j$ depends only on 𝒯, $\mathbf{q}_{\mathcal{T}}$, and D for all $j\in\mathcal{T}$.

We now define t-error-detecting PIR. Roughly, it means that no set of t cheating servers can force the client to output the wrong value. Note, however, that the client is allowed to output ⊥ (even with probability 1).

Definition 11 (Error-detecting PIR [23, 24]).

Let $\ell\in\mathbb{N}$, t∈[ℓ], and let Π=(𝒬,𝒜,𝒞) be an ℓ-server PIR protocol. We say that Π is statistically t-error-detecting if for any $N\in\mathbb{N}$, any database $D\in\{0,1\}^N$, any retrieval index i∈[N], any subset $\mathcal{T}\in\binom{[\ell]}{t}$ of t servers, and any tampering algorithm 𝖬 for Π,

$\Pr_{(\mathbf{q},\mathsf{st})\leftarrow\mathcal{Q}(1^N,i)}\big[\mathcal{C}(1^N,\mathsf{M}(\mathcal{T},\mathbf{q},D),\mathsf{st})\in\{D_i,\perp\}\big]\ge 1-\varepsilon(N)$,

for some negligible function ε(·), where the probability is over the randomness of 𝒬 and 𝖬. We define perfect t-error-detecting by requiring the above to hold with ε(N)=0 for all N. We define computational t-error-detecting by requiring the above to hold only for ppt algorithms 𝖬.

Finally, we define t-Byzantine robustness, which roughly means that the client outputs the correct value (i.e., $D_i$) even if at most t servers cheat. Note that any Byzantine robust PIR protocol is also correct, by taking $\mathcal{T}=\emptyset$.

Definition 12 (Byzantine robust PIR [9]).

Let $\ell\in\mathbb{N}$, let t∈[ℓ], and let Π=(𝒬,𝒜,𝒞) be an ℓ-server PIR protocol. We say that Π is statistically t-Byzantine robust if for any $N\in\mathbb{N}$, any database $D\in\{0,1\}^N$, any retrieval index i∈[N], any subset $\mathcal{T}\in\binom{[\ell]}{t}$ of t servers, and any tampering algorithm 𝖬 for Π,

$\Pr_{(\mathbf{q},\mathsf{st})\leftarrow\mathcal{Q}(1^N,i)}\big[\mathcal{C}(1^N,\mathsf{M}(\mathcal{T},\mathbf{q},D),\mathsf{st})=D_i\big]\ge 1-\varepsilon(N)$,

for some negligible function ε(·), where the probability is over the randomness of 𝒬 and 𝖬. We define perfect t-Byzantine robustness by requiring the above to hold with ε(N)=0 for all N. We define computational t-Byzantine robustness by requiring the above to hold only for ppt algorithms 𝖬.

3 Defining Security via The Real vs. Ideal World Paradigm

In this section, we present the security definition for PIR via the real vs. ideal world paradigm. We present the definition for both full security (i.e., with guaranteed output delivery) and for security-with-abort. We first define the real world execution, followed by the ideal worlds for full security and security-with-abort. Then, we present the security definitions. Our main contribution in the definition is defining the appropriate functionalities that are computed in the ideal world.

The Real World

Let $\ell\in\mathbb{N}$ and let Π=(𝒬,𝒜,𝒞) be an ℓ-server PIR protocol. Toward defining security, we first describe an execution of Π in the presence of an adversary 𝖠𝖽𝗏. The adversary controls a subset $\mathcal{T}\subseteq[\ell]$ of the servers. It receives the database $D\in\{0,1\}^N$ and the queries the corrupted servers receive from the client, and instructs each corrupted server how to respond. We consider malicious adversaries that can instruct the corrupted servers to respond to the client in an arbitrary way. At the end of the execution, the adversary outputs some function of its view (i.e., its randomness, the database, and the queries it received from the client). For a database size N, a database $D\in\{0,1\}^N$, and a retrieval index i∈[N], we let $\mathrm{VIEW}^{\mathsf{real}}_{\Pi,\mathsf{Adv}}(N,D,i)$ and $\mathrm{OUT}^{\mathsf{real}}_{\Pi,\mathsf{Adv}}(N,D,i)$ denote the outputs of 𝖠𝖽𝗏 and the client, respectively, in an execution of Π. Finally, let

$\mathrm{REAL}_{\Pi,\mathsf{Adv}}(N,D,i)=\big(\mathrm{VIEW}^{\mathsf{real}}_{\Pi,\mathsf{Adv}}(N,D,i),\mathrm{OUT}^{\mathsf{real}}_{\Pi,\mathsf{Adv}}(N,D,i)\big)$.

The Ideal World – Full Security

We next describe an ideal computation with guaranteed output delivery (also referred to as full security) for PIR, where a trusted party 𝖳 performs the computation on behalf of the parties, and the ideal model adversary cannot abort the computation. Note that we only consider adversaries that do not corrupt the client and corrupt a minority of the servers. The input of the adversary is the input of the servers it corrupts, i.e., the database D. An ideal computation with database size N, on input a database D ∈ {0,1}N for the servers and a retrieval index i ∈ [N] for the client, in the presence of an adversary (a simulator) 𝖲𝗂𝗆 controlling a set ℐ ⊆ [ℓ], proceeds as follows.

Parties send inputs to the trusted party:

Each honest server Sj, i.e., j ∉ ℐ, sends Dj := D to the trusted party 𝖳. For each corrupted server Sj, i.e., j ∈ ℐ, the adversary 𝖲𝗂𝗆 sends to 𝖳 a database Dj ∈ {0,1}N of its choice. If the server does not send any database, then 𝖳 sets Dj = 0N. The client sends i to 𝖳.

The trusted party performs the computation:

If more than 1/2 of the databases are equal to some database D′, then the trusted party sends D′i to the client. Otherwise, it sends the client the ith bit of the database received from the first server.

Output:

The client outputs whatever it received from 𝖳 and the adversary outputs some function of its view (i.e., its random string and the database D).

For a database size N, a database D{0,1}N, and a retrieval index i[N], let VIEW𝖲𝗂𝗆𝗀𝗈𝖽(N,D,i) and OUT𝖲𝗂𝗆𝗀𝗈𝖽(N,D,i) denote the outputs of 𝖲𝗂𝗆 and the client respectively in an execution of the above ideal world. Finally, let

IDEAL𝖲𝗂𝗆𝗀𝗈𝖽(N,D,i)=(VIEW𝖲𝗂𝗆𝗀𝗈𝖽(N,D,i),OUT𝖲𝗂𝗆𝗀𝗈𝖽(N,D,i)).

Note that the client always outputs Di if there is an honest majority.
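The trusted party's rule can be sketched as follows (our own pseudocode mirroring the description above; the function and argument names are illustrative, and `None` models a server that sent nothing):

```python
from collections import Counter

# Sketch of the trusted party's rule in the full-security ideal world.
def trusted_party_full_security(databases, i):
    # databases: the l databases sent to T, as bitstrings; None models a
    # server that sent nothing (replaced by 0^N).
    N = len(next(d for d in databases if d is not None))
    dbs = [d if d is not None else "0" * N for d in databases]
    D_star, count = Counter(dbs).most_common(1)[0]
    if count > len(dbs) / 2:   # strict majority agrees on some database D'
        return D_star[i]       # send D'_i to the client
    return dbs[0][i]           # no majority: fall back to the first database

# With an honest majority the client always receives D_i:
assert trusted_party_full_security(["0110", "0110", "1001"], 1) == "1"
```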

The Ideal World – Security-With-Abort

We next describe an ideal computation with security-with-abort for PIR. Unlike the ideal model for full security, here the adversary can abort the computation. An ideal computation with database size N, on input a database D ∈ {0,1}N for the servers and an index i ∈ [N] for the client, in the presence of an adversary (a simulator) 𝖲𝗂𝗆 controlling a set ℐ ⊆ [ℓ], proceeds as follows.

Parties send inputs to the trusted party:

Each honest server Sj sends Dj := D to the trusted party 𝖳. For each corrupted server Sj, the adversary 𝖲𝗂𝗆 sends to 𝖳 a database Dj ∈ {0,1}N of its choice. If the server does not send any database, then 𝖳 sets Dj = 0N. The client sends i to 𝖳.

The trusted party performs the computation:

If the databases that 𝖳 received are not all identical, then 𝖳 sends ⊥ to the client. Otherwise, it sends it the ith bit of the common database.

Output:

The client outputs whatever it received from 𝖳 and the adversary outputs some function of its view (i.e., its random string and the database D).

For a database size N, a database D{0,1}N, and a retrieval index i[N], let VIEW𝖲𝗂𝗆𝗌𝗐𝖺(N,D,i) and OUT𝖲𝗂𝗆𝗌𝗐𝖺(N,D,i) denote the outputs of 𝖲𝗂𝗆 and the client respectively in an execution of the above ideal world. Finally, let

IDEAL𝖲𝗂𝗆𝗌𝗐𝖺(N,D,i)=(VIEW𝖲𝗂𝗆𝗌𝗐𝖺(N,D,i),OUT𝖲𝗂𝗆𝗌𝗐𝖺(N,D,i)).
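The trusted party's rule in this ideal world can be sketched analogously (our own pseudocode; names are illustrative):

```python
# Sketch of the trusted party's rule in the security-with-abort ideal world.
def trusted_party_with_abort(databases, i):
    # databases: the l databases T received (a server that sent nothing is
    # assumed already replaced by 0^N, as in the description above).
    if any(d != databases[0] for d in databases):
        return "⊥"              # the databases disagree: the client aborts
    return databases[0][i]      # all identical: return the ith bit

assert trusted_party_with_abort(["0110", "0110", "0110"], 2) == "1"
assert trusted_party_with_abort(["0110", "0110", "0111"], 2) == "⊥"
```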

Defining Security

Next, we define the security of a PIR protocol via the real vs. ideal world paradigm. We define full security and security-with-abort.

Definition 13 (Security via real vs. ideal paradigm).

Let ℓ ∈ ℕ, t ∈ [ℓ], and let Π=(𝒬,𝒜,𝒞) be an ℓ-server PIR protocol. We say that Π is statistically t-fully secure if for every adversary ℬ controlling a set ℐ ⊆ [ℓ] of at most t servers, there exists a simulator 𝖲𝗂𝗆 controlling the same subset in the ideal world, such that

{IDEAL𝖲𝗂𝗆𝗀𝗈𝖽(N,D,i)}N∈ℕ,D∈{0,1}N,i∈[N] ≈S {REALΠ,ℬ(N,D,i)}N∈ℕ,D∈{0,1}N,i∈[N].

We say that Π is statistically t-secure-with-abort if the above holds with IDEAL𝖲𝗂𝗆𝗌𝗐𝖺(N,D,i) replacing IDEAL𝖲𝗂𝗆𝗀𝗈𝖽(N,D,i).

Perfect t-security is defined by replacing ≈S with ≈P in the above equations. Computational t-security is defined by replacing ≈S with ≈C and limiting both the real-world adversary ℬ and the ideal-world simulator 𝖲𝗂𝗆 to be ppt algorithms.

 Remark 14.

Standard security definitions following the real vs. ideal world paradigm consider adversaries that receive auxiliary information. The same auxiliary information is given to the real-world adversary and its simulator. This is necessary to argue that composition of secure protocols remains secure. If we were to add this to the security definition, then in order to argue that full security is equivalent to the privacy and Byzantine robustness properties (see Theorem 15 below), we would have had to add the auxiliary information to the privacy requirement (the statement and proof of equivalence remain the same). We decided not to include the auxiliary information in the privacy definition since this is not a common definition in the literature.

4 Comparing the Definitions

In this section, we compare the definitions given in the previous section. We start by showing the equivalence between full security and the property-based security definition. Specifically, we show that a PIR protocol is fully secure if and only if it is both private and Byzantine robust (recall that Byzantine robustness implies correctness). The interesting direction is showing that the property-based security definition implies the real vs. ideal definition.

Theorem 15.

Let ℓ,t ∈ ℕ be such that t < ℓ/2, let Π=(𝒬,𝒜,𝒞) be an ℓ-server PIR protocol, and let 𝗍𝗒𝗉𝖾 ∈ {computationally, statistically, perfectly}. Then, Π is 𝗍𝗒𝗉𝖾 t-fully secure if and only if it is 𝗍𝗒𝗉𝖾 t-private and 𝗍𝗒𝗉𝖾 t-Byzantine robust.

Proof.

We prove the result for 𝗍𝗒𝗉𝖾 = statistically. The other cases are similar. We first prove that if Π is both statistically t-private and statistically t-Byzantine robust, then it is statistically t-fully secure. Fix a real-world adversary ℬ corrupting a set ℐ of at most t servers. We define its simulator 𝖲𝗂𝗆 as follows. First, it sends to the trusted party the database D for every corrupted server. It then computes (𝐪1,𝗌𝗍1) ← 𝒬(1N,1) (i.e., queries for the index 1), sends the corrupted servers' queries in 𝐪1 to the adversary, and outputs whatever the adversary outputs. Note that if ℬ and 𝒬 are ppt algorithms, then 𝖲𝗂𝗆 is a ppt algorithm.

We now show that the statistical difference between the real and ideal worlds is negligible. For a retrieval index i ∈ [N] and a database D ∈ {0,1}N, let yD,i denote the output of the client in the real world. Note that it suffices to show that

{(𝐪i, yD,i)}N∈ℕ,D∈{0,1}N,i∈[N] ≈S {(𝐪1, Di)}N∈ℕ,D∈{0,1}N,i∈[N], (1)

where (𝐪i,𝗌𝗍i) ← 𝒬(1N,i).

Fix an event 𝒯 and let

Δ = Pr[(𝐪i, yD,i) ∈ 𝒯] − Pr[(𝐪1, Di) ∈ 𝒯],

where the probabilities are over the execution of the real/ideal world. Then

Δ = Pr[(𝐪i, yD,i) ∈ 𝒯] − Pr[(𝐪1, Di) ∈ 𝒯]
= Pr[(𝐪i, yD,i) ∈ 𝒯] − Pr[(𝐪i, Di) ∈ 𝒯] + Pr[(𝐪i, Di) ∈ 𝒯] − Pr[(𝐪1, Di) ∈ 𝒯].

Observe that by the t-privacy of Π,

Pr[(𝐪i, Di) ∈ 𝒯] − Pr[(𝐪1, Di) ∈ 𝒯] = neg(N).

Indeed, if this were not the case, then

Pr[𝐪i ∈ 𝒯Di] − Pr[𝐪1 ∈ 𝒯Di]

is non-negligible, where 𝒯Di = {𝐪 : (𝐪, Di) ∈ 𝒯}. Therefore,

Δ ≤ Pr[(𝐪i, yD,i) ∈ 𝒯] − Pr[(𝐪i, Di) ∈ 𝒯] + neg(N).

Expanding the first term according to the value of yD,i,

Pr[(𝐪i, yD,i) ∈ 𝒯] = Pr[(𝐪i, Di) ∈ 𝒯 ∧ yD,i = Di] + Pr[(𝐪i, 1−Di) ∈ 𝒯 ∧ yD,i = 1−Di]
≤ Pr[(𝐪i, Di) ∈ 𝒯] + Pr[yD,i = 1−Di].

By the t-Byzantine robustness of Π,

Pr[yD,i = 1−Di] = neg(N).

Thus,

Δ ≤ Pr[(𝐪i, Di) ∈ 𝒯] + neg(N) − Pr[(𝐪i, Di) ∈ 𝒯] + neg(N) = neg(N).

We now prove the second direction. Assume that Π is statistically t-fully secure. We first prove that Π is statistically t-private. Fix a set ℐ of size t and consider the real-world adversary ℬ controlling the servers in ℐ that outputs the queries it received from the client. Then by assumption, there exists an ideal-world simulator 𝖲𝗂𝗆 such that

{IDEAL𝖲𝗂𝗆𝗀𝗈𝖽(N,D,i)}N∈ℕ,D∈{0,1}N,i∈[N] ≈S {REALΠ,ℬ(N,D,i)}N∈ℕ,D∈{0,1}N,i∈[N]. (2)

In particular,

{VIEW𝖲𝗂𝗆𝗀𝗈𝖽(N,D,i)}N∈ℕ,D∈{0,1}N,i∈[N] ≈S {VIEWΠ,ℬ𝗋𝖾𝖺𝗅(N,D,i)}N∈ℕ,D∈{0,1}N,i∈[N].

Now, as 𝖲𝗂𝗆 does not receive any message from the trusted party, it follows that its output is independent of i. Thus,

VIEW𝖲𝗂𝗆𝗀𝗈𝖽(N,D,i) ≡ VIEW𝖲𝗂𝗆𝗀𝗈𝖽(N,D,i′)

for all D ∈ {0,1}N and i, i′ ∈ [N]. Therefore,

{VIEWΠ,ℬ𝗋𝖾𝖺𝗅(N,D,i)}N∈ℕ,D∈{0,1}N,i,i′∈[N] ≈S {VIEWΠ,ℬ𝗋𝖾𝖺𝗅(N,D,i′)}N∈ℕ,D∈{0,1}N,i,i′∈[N].

Privacy follows from the fact that VIEWΠ,ℬ𝗋𝖾𝖺𝗅(N,D,i) consists of the corrupted servers' queries in 𝐪, where (𝐪,𝗌𝗍) ← 𝒬(1N,i).

We now show that Π is statistically t-Byzantine robust. Fix a tampering algorithm ℳ and consider the adversary ℬ that instructs the corrupted servers to respond by computing ℳ. Then by Equation 2,

|Pr[OUT𝖲𝗂𝗆𝗀𝗈𝖽(N,D,i) = Di] − Pr[OUTΠ,ℬ𝗋𝖾𝖺𝗅(N,D,i) = Di]| ≤ ε(N),

for some negligible function ε(⋅). Since t < ℓ/2, in the ideal world, the client always outputs Di. Thus, Pr[OUTΠ,ℬ𝗋𝖾𝖺𝗅(N,D,i) = Di] ≥ 1 − ε(N).

 Remark 16.

Note that the above result holds for PIR protocols with more than a single round. To see this, let the simulator run the client on input 1 and simulate the entire execution of the protocol to generate the adversary’s view. The same analysis shows that privacy and Byzantine robustness suffice to argue that the simulator succeeds.

For security-with-abort, we show that the real vs. ideal definition is strictly stronger than the property-based security. The next theorem states that security-with-abort implies the corresponding property-based security.

Theorem 17.

Let ℓ ∈ ℕ and t ∈ [ℓ], let Π=(𝒬,𝒜,𝒞) be an ℓ-server PIR protocol, and let 𝗍𝗒𝗉𝖾 ∈ {computationally, statistically, perfectly}. If Π is 𝗍𝗒𝗉𝖾 t-secure-with-abort, then it is 𝗍𝗒𝗉𝖾 correct, 𝗍𝗒𝗉𝖾 t-private, and 𝗍𝗒𝗉𝖾 t-error-detecting.

Proof.

We assume that 𝗍𝗒𝗉𝖾 = statistically since the other cases are handled similarly. For correctness, consider an adversary that does not corrupt any server. Then the output of the client in the ideal world is Di. Therefore, in the real world, the client outputs Di except with negligible probability. The proof that Π is statistically t-private is identical to the proof given in Theorem 15 and is therefore omitted. We now show that Π is statistically t-error-detecting. Fix a tampering algorithm ℳ and consider the adversary ℬ that instructs the corrupted servers to respond by computing ℳ. Then there is a simulator 𝖲𝗂𝗆 such that

|Pr[OUT𝖲𝗂𝗆𝗌𝗐𝖺(N,D,i) = y] − Pr[OUTΠ,ℬ𝗋𝖾𝖺𝗅(N,D,i) = y]| ≤ ε(N),

for all y ∈ {0, 1, ⊥}, where ε(⋅) is a negligible function. Since in the ideal world the client never outputs 1−Di, it follows that Pr[OUTΠ,ℬ𝗋𝖾𝖺𝗅(N,D,i) ∈ {Di, ⊥}] ≥ 1 − ε(N).

We next show that the converse does not hold: there exists an ℓ-server PIR protocol Π′ that is correct, 1-private, and 1-error-detecting but is not computationally 1-secure-with-abort. Intuitively, we construct a protocol in which a malicious server can force a "selective abort" of the client; that is, the adversary forces the client to abort if and only if the client holds certain indices i (e.g., i=1). This is impossible to simulate since the simulation is independent of i. We next formalize this intuition.

Example 18.

Let Π be a perfectly 1-secure-with-abort PIR protocol (e.g., each server sends D to the client; if all databases are the same, then the client outputs Di, and otherwise it aborts). Consider the following PIR protocol Π′. The client computes the same queries as in Π, all servers respond with the same messages, and additionally, the first server sends an additional bit b ∈ {0,1}. An honest server sends b=0. If the client has input i=1 and it received b=1, then it outputs ⊥. Otherwise, it proceeds as in Π. Clearly, Π′ is correct, 1-private, and 1-error-detecting.

We next show that Π′ is not 1-secure-with-abort. Intuitively, this is because the adversary can force a probability of abort on i=1 that is different from the probability on i≠1 (although each server does not know i). Fix an adversary ℬ that corrupts S1 and sends b=1, and assume towards a contradiction that there exists a simulator 𝖲𝗂𝗆 for it. Consider an execution of Π′ with i ← {1,2} (i.e., i=1 with probability 1/2 and i=2 with probability 1/2) and D=1N. Then in the real world,

Pri←{1,2}[i=1, OUTΠ′,ℬ𝗋𝖾𝖺𝗅(N,1N,i) = ⊥] = Pri←{1,2}[i=2, OUTΠ′,ℬ𝗋𝖾𝖺𝗅(N,1N,i) = 1] = 1/2.

Let us analyze the ideal world. Let p denote the probability that 𝖲𝗂𝗆 sends D=1N to the trusted party (and thus the client outputs 1). Since in the ideal world the simulator is independent of i,

Pri←{1,2}[i=1, OUT𝖲𝗂𝗆𝗌𝗐𝖺(N,1N,i) = ⊥] = (1−p)/2

and

Pri←{1,2}[i=2, OUT𝖲𝗂𝗆𝗌𝗐𝖺(N,1N,i) = 1] = p/2.

Clearly, it cannot be the case that both values are 1/2 as in the real world; thus the two worlds can be easily distinguished.
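The gap in Example 18 can be checked numerically (our own sketch; the function names are illustrative):

```python
# Numerical check of Example 18.
# Real world: the corrupted first server always sends b = 1, so with D = 1^N
# the client aborts iff i = 1 and otherwise outputs D_i = 1.
def real_world_output(i):
    return "⊥" if i == 1 else 1

assert real_world_output(1) == "⊥" and real_world_output(2) == 1
# Hence Pr[i=1, out=⊥] = Pr[i=2, out=1] = 1/2 over a uniform i in {1, 2}.

# Ideal world: if Sim sends D = 1^N with probability p, the client gets 1;
# otherwise it gets ⊥.  Since p cannot depend on i:
for p in [0.0, 0.25, 0.5, 0.75, 1.0]:
    pr_abort_on_i1 = (1 - p) / 2   # Pr[i=1, out=⊥]
    pr_one_on_i2 = p / 2           # Pr[i=2, out=1]
    # No single p makes both probabilities equal 1/2 simultaneously, so no
    # simulator matches the real world.
    assert not (pr_abort_on_i1 == 0.5 and pr_one_on_i2 == 0.5)
```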

One might expect that the reason the above example worked is that the client aborts with different probabilities for different inputs. We next show that adding such a requirement is still insufficient to argue for simulation-based security. Roughly, this is done by letting the client send to the first server a random index i′ that informs the server that the client will abort on input i′ (if the additional bit b from the previous example is 1). Thus, in the real world, the adversary can choose for which inputs the client will abort, which cannot be simulated in the ideal world. We next formalize this.

Example 19.

Let Π be a perfectly 1-secure-with-abort PIR protocol (e.g., the servers send D to the client, who then proceeds like the trusted party). Consider the following PIR protocol Π′. The client computes the same queries as in Π and appends to the first query a random index i′ ∈ [N] sampled uniformly at random. All servers respond with the same messages, and additionally, the first server sends an additional bit b ∈ {0,1}. An honest server sends b=0. If the client has input i=i′ and it received b=1, then it outputs ⊥. Otherwise, it proceeds as in Π. Clearly, Π′ is correct, 1-private, and 1-error-detecting, and the probability that the client aborts is 1/N if the first server cheats (and 0 otherwise). The formal argument that Π′ is not secure-with-abort is similar to the previous example and is therefore omitted.

5 A Generic Transformation From Private PIR to Secure-With-Abort PIR

In this section, we show a generic transformation of a private PIR protocol to a secure-with-abort protocol, based on Eriguchi et al. [24]. The original construction of [24] is insecure according to Definition 13 (see Remark 26 below). However, we prove that a modification of the construction is secure. We prove the following.

Theorem 20.

Let ℓ ∈ ℕ, t ∈ [ℓ], and 𝗍𝗒𝗉𝖾 ∈ {statistically, computationally}. There exists a compiler that transforms any statistically correct and 𝗍𝗒𝗉𝖾 t-private ℓ-server PIR protocol Π into a 𝗍𝗒𝗉𝖾 t-secure-with-abort protocol. Moreover, if the total communication complexity of Π is c, then the new protocol has total communication complexity ℓ2⋅ω(logN)⋅(c+logℓ).

Proof.

We prove the result for 𝗍𝗒𝗉𝖾 = statistically. The case where 𝗍𝗒𝗉𝖾 = computationally follows from the fact that the simulator we construct is efficient (if the adversary is) and from simple hybrid arguments. Let Π=(𝒬,𝒜,𝒞) be a statistically correct and statistically t-private PIR protocol. The idea of the transformation is the following. First, the client generates the queries 𝐪 as in Π. Then, with probability 1/2, the client samples two distinct indices m1, m2 ∈ [ℓ] uniformly at random, sends qj to the jth server Sj for every j ≠ m1, and sends (m2, qm2) to server Sm1 (i.e., query qm2 is a duplicate query). With probability 1/2, the client sends qj to Sj for every j ∈ [ℓ] (without duplicating any of them). We refer to the first kind of execution, where the client duplicated a query, as a test execution, and to the second kind as a real execution. The jth server, upon receiving qj, responds as Sj does in Π; upon receiving (m2, qm2), Sm1 responds as Sm2 does in Π.

Notice that if in a test execution, Sm1 is honest, then the view of the adversary in this case is identical to its view in a real execution. Thus, if the adversary cheats in the test execution, then it cheats in the real execution.

We then let the parties repeat this process in parallel sufficiently many times. The output of the client is defined as follows. If in one of the test executions, the client receives from the servers Sm1 and Sm2 different answers, then one of them is malicious and so the client aborts. Otherwise, the client computes the output of all of the real executions and outputs the majority. Intuitively, if we repeat this sufficiently many times, then the adversary cannot distinguish the real executions from most of the test executions. Therefore, if it cheats on too many real executions, then the client will catch it in one of the test executions. Otherwise, it will cheat on a minority of the real executions; hence, the correctness of the underlying PIR protocol implies that the client will compute the correct value most of the time.

For convenience, instead of deciding at random and independently which execution is a test and which is a real execution, we let the client randomly sample exactly half of the executions to be real. (Note that if Π is perfectly correct, our protocol achieves only statistical correctness, since it could be the case that the client duplicates a query in every execution of Π.) We next formalize this.


Protocol 21 (Π𝗌𝗐𝖺=(𝒬𝗌𝗐𝖺,𝒜𝗌𝗐𝖺,𝒞𝗌𝗐𝖺)).

Inputs: The client holds the database length 1N and a retrieval index i ∈ [N]. Each server holds a database D ∈ {0,1}N.
Let λ ∈ ℕ be an integer such that λ/2 is odd, to be determined by the analysis below.

The query algorithm:

The client samples a set 𝒦 ⊆ [λ] of size λ/2 uniformly at random and computes (𝐪1,𝗌𝗍1),…,(𝐪λ,𝗌𝗍λ) ← 𝒬(1N,i) independently. It then does the following for all k ∈ [λ].

  1. Sample a pair m1k, m2k ∈ [ℓ] of distinct elements independently and uniformly at random.

  2. If k ∈ 𝒦, then set q̂jk = qjk for all j ∈ [ℓ]. Otherwise, for every j ∈ [ℓ]∖{m1k} set q̂jk = qjk, and set q̂m1kk = (m2k, qm2kk).

Let 𝗌𝗍 = 𝒦 || ((m1k, m2k, 𝗌𝗍k))k∈[λ]. The client sends (q̂j1,…,q̂jλ) to the jth server.

The answer algorithm:

For every k ∈ [λ] and j ∈ [ℓ]: if server Sj receives q̂jk = qjk from the client, it computes ajk = 𝒜(j, q̂jk, D); otherwise, it receives q̂jk = (m2k, qm2kk) and computes ajk = 𝒜(m2k, qm2kk, D). It sends (aj1,…,ajλ) to the client.

The reconstruction algorithm:

The client computes yk = 𝒞(1N, a1k,…,aℓk, 𝗌𝗍k) for all k ∈ 𝒦. It outputs ⊥ if there exists k ∈ [λ]∖𝒦 such that am1kk ≠ am2kk. Otherwise, the client outputs 𝗆𝖺𝗃k∈𝒦{yk}.

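The mechanics of Protocol 21 can be sketched in code (our own toy instantiation: the underlying protocol is trivial replication, which has no privacy; the point is only the duplicate-query consistency test and the majority vote, and names such as `run_compiled` are illustrative):

```python
import random

# Executable sketch of Protocol 21's mechanics over a trivial replication
# base protocol (no privacy): the honest answer to every query, duplicated
# or not, is D itself, while a cheating server sends the flipped database.
def run_compiled(D, i, l=3, lam=64, cheaters=frozenset(), seed=0):
    rng = random.Random(seed)
    K = set(rng.sample(range(lam), lam // 2))               # "real" executions
    pairs = [tuple(rng.sample(range(l), 2)) for _ in range(lam)]  # (m1, m2)
    answers = [
        [tuple(D) if j not in cheaters else tuple(1 - b for b in D)
         for j in range(l)]
        for _ in range(lam)
    ]
    # Reconstruction: abort if any test execution has mismatched duplicates.
    for k in range(lam):
        if k not in K:
            m1, m2 = pairs[k]
            if answers[k][m1] != answers[k][m2]:
                return "abort"
    # y_k: majority database over the servers in execution k, then entry i.
    votes = [max(set(answers[k]), key=answers[k].count)[i] for k in K]
    return max(set(votes), key=votes.count)

D = [0, 1, 1, 0]
assert run_compiled(D, i=1) == D[1]                           # honest run
assert run_compiled(D, i=1, cheaters=frozenset({0})) in ("abort", D[1])
```

In this toy instantiation a server that always cheats is caught in a test execution whenever it is involved in a duplicated pair, so with many repetitions the client aborts with overwhelming probability.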

We now show the security of Π𝗌𝗐𝖺. Fix an adversary ℬ corrupting a set ℐ ⊆ [ℓ] of at most t servers. We define its simulator 𝖲𝗂𝗆, whose input is D, as follows. Sample (𝐪̂,𝗌𝗍) ← 𝒬𝗌𝗐𝖺(1N,1) and send the corrupted servers' queries in 𝐪̂ to ℬ. For every j ∈ ℐ, let aj denote the response ℬ made for server Sj, and for every j ∉ ℐ, let aj = 𝒜𝗌𝗐𝖺(j, q̂j, D). Write 𝗌𝗍 = 𝒦 || ((m1k, m2k, 𝗌𝗍k))k∈[λ]. If there exists k ∈ [λ]∖𝒦 such that am1kk ≠ am2kk, then send to the trusted party a different database for the corrupted servers (causing the client to output ⊥). Otherwise, send D as the input of every corrupted server. Finally, output whatever ℬ outputs and halt. We show that

{IDEAL𝖲𝗂𝗆𝗌𝗐𝖺(N,D,i)}N∈ℕ,D∈{0,1}N,i∈[N] ≈S {REALΠ𝗌𝗐𝖺,ℬ(N,D,i)}N∈ℕ,D∈{0,1}N,i∈[N]. (3)

To prove this, we first prove the following lemma, stating that ℬ cannot force the client to output 1−Di with noticeable probability.

Lemma 22.

Let 𝗇𝗈𝖠𝖻𝗈𝗋𝗍 be the event that am1kk = am2kk for every k ∈ [λ]∖𝒦 (i.e., the client did not output ⊥), and let 𝖥𝖺𝗂𝗅 denote the event that 𝗆𝖺𝗃k∈𝒦{yk} = 1−Di (i.e., the client outputs the wrong value). Then

Pr[𝗇𝗈𝖠𝖻𝗈𝗋𝗍 ∧ 𝖥𝖺𝗂𝗅] ≤ 2e^{−λ/64} + e^{−λ/(8ℓ(ℓ−1))} + neg(N), (4)

where the probability is over the random coins of the client and the adversary.

Proof.

We prove that the bound holds for any fixed randomness of the adversary. In the following, we assume that the randomness of ℬ and the queries of the client before duplication (i.e., 𝐪1,…,𝐪λ) are fixed. Observe that for every k ∈ [λ], the adversary cannot distinguish between the case where k ∈ 𝒦 and the case where k ∉ 𝒦 and Sm1k is honest. Therefore, if it instructs a server to cheat in one case, then it will instruct it to cheat in the other case.

To this end, we say that a server Sj cheated in execution k if its answer is different from 𝒜(j, qjk, D); that is, if it were to receive its original query (before duplication), it would send a different answer. Let

A = {k ∈ [λ] : ∃ j ∈ ℐ s.t. Sj cheated in execution k},

i.e., A denotes the set of all executions in which the adversary cheated. As we fixed the randomness of the adversary and 𝐪1,…,𝐪λ, the set A is well defined. We show that if |A| is too large, then the client will catch the adversary with overwhelming probability in one of the test executions, and if it is too small, then it is unlikely that the majority of the outputs computed in the real executions will be incorrect. We separate the proof into two cases.

We first analyze the probability in (4) assuming |A| < 3λ/8. We show that it is unlikely that the client outputs the wrong value, i.e., that 𝖥𝖺𝗂𝗅 occurs with low probability. Observe that the expectation of |A ∩ 𝒦| is at most |A|/2 < 3λ/16. By Hoeffding's inequality (Theorem 6), it follows that

Pr𝒦[|A ∩ 𝒦| ≥ λ/4] ≤ Pr𝒦[|A ∩ 𝒦| − 3λ/16 ≥ λ/16] ≤ e^{−λ/64},

where 𝒦 is a uniformly random subset of [λ] of size λ/2.

Let k ∈ 𝒦∖A, i.e., the kth execution is an honest real execution. By the statistical correctness of the original PIR protocol, the probability that the client outputs 1−Di in the kth execution of the protocol is negligible. Therefore,

Pr[𝖥𝖺𝗂𝗅] = Pr[𝗆𝖺𝗃k∈𝒦{yk} = 1−Di]
= Pr[|A∩𝒦| ≥ λ/4] ⋅ Pr[𝗆𝖺𝗃k∈𝒦{yk} = 1−Di | |A∩𝒦| ≥ λ/4]
+ Pr[|A∩𝒦| < λ/4] ⋅ Pr[𝗆𝖺𝗃k∈𝒦{yk} = 1−Di | |A∩𝒦| < λ/4]
≤ Pr[|A∩𝒦| ≥ λ/4] + Pr[𝗆𝖺𝗃k∈𝒦{yk} = 1−Di | |A∩𝒦| < λ/4]
≤ e^{−λ/64} + Pr[∃ k ∈ 𝒦∖A : yk = 1−Di | |A∩𝒦| < λ/4]
≤ e^{−λ/64} + neg(N).

We now analyze the probability in (4) assuming |A| ≥ 3λ/8. We show that in this case, the client will catch the adversary cheating with high probability. Since the expectation of |A ∩ 𝒦̄| is at least |A|/2 ≥ 3λ/16, where 𝒦̄ = [λ]∖𝒦, by Hoeffding's inequality it follows that

Pr𝒦[|A ∩ 𝒦̄| ≤ λ/8] ≤ Pr𝒦[|A ∩ 𝒦̄| − 3λ/16 ≤ −λ/16] ≤ e^{−λ/64}.

Next, for every k ∈ A ∩ 𝒦̄, let 𝖢𝖺𝗎𝗀𝗁𝗍k be the event that Sm1k is honest and Sm2k cheated in execution k (that is, the event that the adversary got caught cheating in the kth execution). Fix an honest server h ∈ [ℓ]∖ℐ, and for every k ∈ A ∩ 𝒦̄ fix a server ck that cheated in execution k. Observe that for every k ∈ A ∩ 𝒦̄, the client will catch the adversary with probability at least

Pr[𝖢𝖺𝗎𝗀𝗁𝗍k | k ∈ A] ≥ Pr[m1k = h, m2k = ck | k ∈ A] = 1/(ℓ(ℓ−1)).

Now, observe that for every k ∈ [λ]∖𝒦, if am1kk = am2kk, then either the adversary did not cheat in the kth execution or 𝖢𝖺𝗎𝗀𝗁𝗍k did not occur. Therefore,

Pr[𝗇𝗈𝖠𝖻𝗈𝗋𝗍] = Pr[∀ k ∈ [λ]∖𝒦 : am1kk = am2kk]
≤ Pr[∀ k ∈ ([λ]∖𝒦) ∩ A : ¬𝖢𝖺𝗎𝗀𝗁𝗍k]
= Pr[|A∩𝒦̄| ≤ λ/8] ⋅ Pr[∀ k ∈ ([λ]∖𝒦) ∩ A : ¬𝖢𝖺𝗎𝗀𝗁𝗍k | |A∩𝒦̄| ≤ λ/8]
+ Pr[|A∩𝒦̄| > λ/8] ⋅ Pr[∀ k ∈ ([λ]∖𝒦) ∩ A : ¬𝖢𝖺𝗎𝗀𝗁𝗍k | |A∩𝒦̄| > λ/8]
≤ e^{−λ/64} + Pr[∀ k ∈ A ∩ 𝒦̄ : ¬𝖢𝖺𝗎𝗀𝗁𝗍k | |A∩𝒦̄| > λ/8]
≤ e^{−λ/64} + (1 − 1/(ℓ(ℓ−1)))^{λ/8}
≤ e^{−λ/64} + e^{−λ/(8ℓ(ℓ−1))}.

We conclude that

Pr[𝗇𝗈𝖠𝖻𝗈𝗋𝗍 ∧ 𝖥𝖺𝗂𝗅] ≤ 2e^{−λ/64} + e^{−λ/(8ℓ(ℓ−1))} + neg(N).
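A quick numerical reading of this bound (our own sketch, ignoring the neg(N) term; the function name is illustrative):

```python
import math

# The concrete part of the bound in Lemma 22: 2e^{-lam/64} + e^{-lam/(8l(l-1))}.
def lemma22_bound(lam, l):
    return 2 * math.exp(-lam / 64) + math.exp(-lam / (8 * l * (l - 1)))

# For l = 3 servers, lam = 4096 repetitions already push the bound below
# 2^-40, matching the requirement that lam grow faster than l^2 * log N.
assert lemma22_bound(4096, 3) < 2 ** -40
```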

We now use Lemma 22 to show that Equation 3 holds. Specifically, we show that conditioned on the event 𝗇𝗈𝖠𝖻𝗈𝗋𝗍 ∧ 𝖥𝖺𝗂𝗅 not occurring, the two ensembles are statistically close. First, note that by the t-privacy of Π, the queries that ℬ receives in the real world are statistically close to the queries it receives from 𝖲𝗂𝗆 in the ideal world; therefore, its responses are statistically close as well. Now, if the client aborts in the real world, then there exists a test execution in which it gets different responses. Thus, the same holds in the ideal world (except with negligible probability), hence the simulator changes its database. This means that in both worlds, the client outputs ⊥. Otherwise, since we condition on 𝗇𝗈𝖠𝖻𝗈𝗋𝗍 ∧ 𝖥𝖺𝗂𝗅 not occurring, the client outputs Di in both worlds. Finally, setting λ = ℓ2⋅ω(logN) implies that 𝗇𝗈𝖠𝖻𝗈𝗋𝗍 ∧ 𝖥𝖺𝗂𝗅 occurs with negligible probability, which completes the proof.

Using the t-private ℓ-server PIR protocol of Woodruff and Yekhanin [46], we obtain the following.

Corollary 23.

For every ℓ ∈ ℕ and t ∈ [ℓ−1], there exists a t-secure-with-abort ℓ-server PIR protocol with communication complexity

ω(logN) ⋅ ℓ^4 ⋅ t ⋅ logℓ ⋅ N^{1/⌊(2ℓ−1)/t⌋}.

Applying Theorem 20 to the protocol resulting from the compiler of Barkol et al. [5] applied to the 2-server protocol of Dvir and Gopi [20] yields the following.

Corollary 24.

For every t ∈ ℕ, there exists a t-secure-with-abort 2t-server PIR protocol with communication complexity

2^{max{Õ(√logN), t⋅logt}}.

Finally, for 3t-server PIR protocols, Barkol et al. [5] and Efremenko [22] give the following.

Corollary 25.

For every t ∈ ℕ, there exists a t-secure-with-abort 3t-server PIR protocol with communication complexity

3^{2t} ⋅ max{t⋅2^{Õ(√logN)}, t⋅3^t}.
 Remark 26 (On the insecurity of the compiler of [24]).

In the original construction of Eriguchi et al. [24], instead of computing 𝗆𝖺𝗃k∈𝒦{yk}, the client also tests whether all of these values are equal. If not, it outputs ⊥, and if they are equal, the client outputs this value. Although their protocol is correct, t-private, and t-error-detecting, it is not t-secure-with-abort. Intuitively, this is because in the underlying protocol Π the adversary can cause the client to output an incorrect value for different inputs i (recall that Π is not guaranteed to satisfy any robustness property). This causes the client to abort only on certain inputs; thus it has the same selective-abort issue presented in Examples 18 and 19. We next give details.

Consider the protocol Π, where each server sends D to the client, and the client computes the output as follows. If i=1 then output the ith entry of the database sent by S1, and otherwise, output the ith entry of the database sent by S2. Clearly this protocol is correct and private. However, the compiled protocol of Eriguchi et al. [24] is not 1-secure-with-abort since an adversary corrupting S1 can force the client to output an incorrect value only on i=1. Therefore, the client will abort only when i=1.

References

  • [1] Bar Alon, Amos Beimel, and Or Lasri. Simplified PIR and CDS protocols and improved linear secret-sharing schemes. Cryptology ePrint Archive, Paper 2024/1599, 2024. URL: https://eprint.iacr.org/2024/1599.
  • [2] Andris Ambainis. Upper bound on the communication complexity of private information retrieval. In P. Degano, R. Gorrieri, and A. Marchetti-Spaccamela, editors, Proc. of the 24th International Colloquium on Automata, Languages and Programming, volume 1256 of Lecture Notes in Computer Science, pages 401–407, 1997. doi:10.1007/3-540-63165-8_196.
  • [3] Hadar Amran. Constructing multi-servers private information retrieval protocols. Master’s thesis, Ben-Gurion University, 2016.
  • [4] Karim A. Banawan and Sennur Ulukus. The capacity of private information retrieval from byzantine and colluding databases. IEEE Trans. Inf. Theory, 65(2):1206–1219, 2019. doi:10.1109/TIT.2018.2869154.
  • [5] Omer Barkol, Yuval Ishai, and Enav Weinreb. On locally decodable codes, self-correctable codes, and t -private PIR. In Moses Charikar, Klaus Jansen, Omer Reingold, and José D. P. Rolim, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, 10th International Workshop, APPROX 2007, and 11th International Workshop, RANDOM 2007, volume 4627 of Lecture Notes in Computer Science, pages 311–325. Springer, 2007. doi:10.1007/978-3-540-74208-1_23.
  • [6] Amos Beimel and Yuval Ishai. Information-theoretic private information retrieval: A unified construction. In Fernando Orejas, Paul G. Spirakis, and Jan van Leeuwen, editors, Automata, Languages and Programming, 28th International Colloquium, ICALP 2001, Crete, Greece, July 8-12, 2001, Proceedings, volume 2076 of Lecture Notes in Computer Science, pages 912–926. Springer, 2001. doi:10.1007/3-540-48224-5_74.
  • [7] Amos Beimel, Yuval Ishai, Ranjit Kumaresan, and Eyal Kushilevitz. On the cryptographic complexity of the worst functions. In Yehuda Lindell, editor, TCC 2014, volume 8349 of Lecture Notes in Computer Science, pages 317–342. Springer, 2014. doi:10.1007/978-3-642-54242-8_14.
  • [8] Amos Beimel, Yuval Ishai, and Eyal Kushilevitz. General constructions for information-theoretic private information retrieval. Journal of Computer and System Sciences, 71(2):213–247, 2005. doi:10.1016/j.jcss.2005.03.002.
  • [9] Amos Beimel and Yoav Stahl. Robust information-theoretic private information retrieval. J. Cryptol., 20(3):295–321, 2007. doi:10.1007/S00145-007-0424-2.
  • [10] Shany Ben-David, Yael Tauman Kalai, and Omer Paneth. Verifiable private information retrieval. In Eike Kiltz and Vinod Vaikuntanathan, editors, Theory of Cryptography – 20th International Conference, TCC 2022, Part III, volume 13749 of Lecture Notes in Computer Science, pages 3–32. Springer, 2022. doi:10.1007/978-3-031-22368-6_1.
  • [11] Zvika Brakerski and Vinod Vaikuntanathan. Efficient fully homomorphic encryption from (standard) LWE. In Rafail Ostrovsky, editor, IEEE 52nd Annual Symposium on Foundations of Computer Science, FOCS 2011, pages 97–106. IEEE Computer Society, 2011. doi:10.1109/FOCS.2011.12.
  • [12] Christian Cachin, Silvio Micali, and Markus Stadler. Computationally private information retrieval with polylogarithmic communication. In Jacques Stern, editor, Advances in Cryptology – EUROCRYPT ’99. Springer, 1999. doi:10.1007/3-540-48910-X_28.
  • [13] Ran Canetti. Security and composition of multiparty cryptographic protocols. Journal of Cryptology, 13(1):143–202, 2000. doi:10.1007/S001459910006.
  • [14] Yeow Meng Chee, Tao Feng, San Ling, Huaxiong Wang, and Liang Feng Zhang. Query-efficient locally decodable codes of subexponential length. Computational Complexity, 22(1):159–189, 2013. doi:10.1007/s00037-011-0017-1.
  • [15] Benny Chor and Niv Gilboa. Computationally private information retrieval (extended abstract). In Frank Thomson Leighton and Peter W. Shor, editors, Proceedings of the Twenty-Ninth Annual ACM Symposium on the Theory of Computing, pages 304–313. ACM, 1997. doi:10.1145/258533.258609.
  • [16] Benny Chor, Oded Goldreich, Eyal Kushilevitz, and Madhu Sudan. Private information retrieval. In 36th Annual Symposium on Foundations of Computer Science, pages 41–50. IEEE Computer Society, 1995. doi:10.1109/SFCS.1995.492461.
  • [17] Simone Colombo, Kirill Nikitin, Henry Corrigan-Gibbs, David J. Wu, and Bryan Ford. Authenticated private information retrieval. In Joseph A. Calandrino and Carmela Troncoso, editors, 32nd USENIX Security Symposium, USENIX Security 2023, pages 3835–3851. USENIX Association, 2023. URL: https://www.usenix.org/conference/usenixsecurity23/presentation/colombo.
  • [18] Casey Devet, Ian Goldberg, and Nadia Heninger. Optimally robust private information retrieval. In Tadayoshi Kohno, editor, Proceedings of the 21th USENIX Security Symposium, pages 269–283. USENIX Association, 2012. URL: https://www.usenix.org/conference/usenixsecurity12/technical-sessions/presentation/devet.
  • [19] Yevgeniy Dodis, Shai Halevi, Ron D. Rothblum, and Daniel Wichs. Spooky encryption and its applications. In Matthew Robshaw and Jonathan Katz, editors, Advances in Cryptology – CRYPTO 2016 – 36th Annual International Cryptology Conference, Part III, volume 9816 of Lecture Notes in Computer Science, pages 93–122. Springer, 2016. doi:10.1007/978-3-662-53015-3_4.
  • [20] Zeev Dvir and Sivakanth Gopi. 2-server PIR with subpolynomial communication. J. ACM, 63(4):39:1–39:15, 2016. doi:10.1145/2968443.
  • [21] Cynthia Dwork, Michael Langberg, Moni Naor, Kobbi Nissim, and Omer Reingold. Succinct proofs for np and spooky interactions. Unpublished manuscript, https://www.wisdom.weizmann.ac.il/˜naor/PAPERS/spooky.pdf, 2004.
  • [22] Klim Efremenko. 3-query locally decodable codes of subexponential length. SIAM J. Comput., 41(6):1694–1703, 2012. doi:10.1137/090772721.
  • [23] Reo Eriguchi, Kaoru Kurosawa, and Koji Nuida. Multi-server PIR with full error detection and limited error correction. In Dana Dachman-Soled, editor, 3rd Conference on Information-Theoretic Cryptography, ITC 2022, volume 230 of LIPIcs, pages 1:1–1:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPICS.ITC.2022.1.
  • [24] Reo Eriguchi, Kaoru Kurosawa, and Koji Nuida. On the optimal communication complexity of error-correcting multi-server PIR. In Eike Kiltz and Vinod Vaikuntanathan, editors, Theory of Cryptography – 20th International Conference, TCC 2022, Part III, volume 13749 of Lecture Notes in Computer Science, pages 60–88. Springer, 2022. doi:10.1007/978-3-031-22368-6_3.
  • [25] Reo Eriguchi, Kaoru Kurosawa, and Koji Nuida. Efficient and generic methods to achieve active security in private information retrieval and more advanced database search. In Marc Joye and Gregor Leander, editors, Advances in Cryptology – EUROCRYPT 2024, Part V, volume 14655 of Lecture Notes in Computer Science, pages 92–121. Springer, 2024. doi:10.1007/978-3-031-58740-5_4.
  • [26] Craig Gentry and Zulfikar Ramzan. Single-database private information retrieval with constant communication rate. In Luís Caires, Giuseppe F. Italiano, Luís Monteiro, Catuscia Palamidessi, and Moti Yung, editors, Automata, Languages and Programming, 32nd International Colloquium, ICALP 2005, volume 3580 of Lecture Notes in Computer Science, pages 803–815. Springer, 2005. doi:10.1007/11523468_65.
  • [27] Fatemeh Ghasemi, Swastik Kopparty, and Madhu Sudan. Improved PIR schemes using matching vectors and derivatives. CoRR, abs/2411.11611, 2024. doi:10.48550/arXiv.2411.11611.
  • [28] Ian Goldberg. Improving the robustness of private information retrieval. In 2007 IEEE Symposium on Security and Privacy (S&P 2007), pages 131–148. IEEE Computer Society, 2007. doi:10.1109/SP.2007.23.
  • [29] Oded Goldreich. Foundations of Cryptography – Volume 2: Basic Applications. Cambridge University Press, 2004. doi:10.1017/CBO9780511721656.
  • [30] Yuval Ishai and Eyal Kushilevitz. Improved upper bounds on information-theoretic private information retrieval (extended abstract). In Jeffrey Scott Vitter, Lawrence L. Larmore, and Frank Thomson Leighton, editors, Proceedings of the Thirty-First Annual ACM Symposium on Theory of Computing, pages 79–88. ACM, 1999. doi:10.1145/301250.301275.
  • [31] Toshiya Itoh. Efficient private information retrieval. IEICE Trans. Fundamentals of Electronics, Communications and Computer Sciences, E82-A(1):11–20, 1999.
  • [32] Toshiya Itoh and Yasuhiro Suzuki. New constructions for query-efficient locally decodable codes of subexponential length. CoRR, abs/0810.4576, 2008. arXiv:0810.4576.
  • [33] Toshiya Itoh and Yasuhiro Suzuki. Improved constructions for query-efficient locally decodable codes of subexponential length. IEICE Transactions on Information and Systems, E93-D(2):263–270, 2010. doi:10.1587/transinf.e93.d.263.
  • [34] Jonathan Katz and Luca Trevisan. On the efficiency of local decoding procedures for error-correcting codes. In F. Frances Yao and Eugene M. Luks, editors, Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, pages 80–86. ACM, 2000. doi:10.1145/335305.335315.
  • [35] Kiran S. Kedlaya and Sergey Yekhanin. Locally decodable codes from nice subsets of finite fields and prime factors of Mersenne numbers. Technical Report TR07-040, Electronic Colloquium on Computational Complexity, 2007. URL: https://eccc.weizmann.ac.il/report/2007/040/.
  • [36] Iordanis Kerenidis and Ronald de Wolf. Exponential lower bound for 2-query locally decodable codes via a quantum argument. Journal of Computer and System Sciences, 69(3):395–420, 2004. doi:10.1016/J.JCSS.2004.04.007.
  • [37] Kaoru Kurosawa. How to correct errors in multi-server PIR. In Steven D. Galbraith and Shiho Moriai, editors, Advances in Cryptology – ASIACRYPT 2019, Part II, volume 11922 of Lecture Notes in Computer Science, pages 564–574. Springer, 2019. doi:10.1007/978-3-030-34621-8_20.
  • [38] Eyal Kushilevitz and Rafail Ostrovsky. Replication is not needed: Single database, computationally-private information retrieval. In 38th Annual Symposium on Foundations of Computer Science, FOCS ’97, pages 364–373. IEEE Computer Society, 1997. doi:10.1109/SFCS.1997.646125.
  • [39] Tianren Liu, Vinod Vaikuntanathan, and Hoeteck Wee. Conditional disclosure of secrets via non-linear reconstruction. In Jonathan Katz and Hovav Shacham, editors, Advances in Cryptology – CRYPTO 2017, Part I, volume 10401 of Lecture Notes in Computer Science, pages 758–790. Springer, 2017. doi:10.1007/978-3-319-63688-7_25.
  • [40] Tianren Liu, Vinod Vaikuntanathan, and Hoeteck Wee. Towards breaking the exponential barrier for general secret sharing. In Advances in Cryptology – EUROCRYPT 2018, volume 10820 of Lecture Notes in Computer Science, pages 567–596. Springer, 2018. doi:10.1007/978-3-319-78381-9_21.
  • [41] Eran Mann. Private access to distributed information. Master’s thesis, Technion – Israel Institute of Technology, Haifa, 1998.
  • [42] Silvio Micali and Phillip Rogaway. Secure computation (abstract). In Joan Feigenbaum, editor, Advances in Cryptology – CRYPTO ’91, 11th Annual International Cryptology Conference, volume 576 of Lecture Notes in Computer Science, pages 392–404. Springer, 1991. doi:10.1007/3-540-46766-1_32.
  • [43] Hua Sun and Syed Ali Jafar. The capacity of private information retrieval. IEEE Trans. Inf. Theory, 63(7):4075–4088, 2017. doi:10.1109/TIT.2017.2689028.
  • [44] Hua Sun and Syed Ali Jafar. The capacity of robust private information retrieval with colluding databases. IEEE Trans. Inf. Theory, 64(4):2361–2370, 2018. doi:10.1109/TIT.2017.2777490.
  • [45] Stephanie Wehner and Ronald de Wolf. Improved lower bounds for locally decodable codes and private information retrieval. In Luís Caires, Giuseppe F. Italiano, Luís Monteiro, Catuscia Palamidessi, and Moti Yung, editors, Automata, Languages and Programming, 32nd International Colloquium, ICALP 2005, volume 3580 of Lecture Notes in Computer Science, pages 1424–1436. Springer, 2005. doi:10.1007/11523468_115.
  • [46] David P. Woodruff and Sergey Yekhanin. A geometric approach to information-theoretic private information retrieval. In 20th Annual IEEE Conference on Computational Complexity (CCC 2005), pages 275–284. IEEE Computer Society, 2005. doi:10.1109/CCC.2005.2.
  • [47] E. Y. Yang, Jie Xu, and K. H. Bennett. Private information retrieval in the presence of malicious failures. In Proceedings of the 26th Annual International Computer Software and Applications Conference (COMPSAC 2002), pages 805–810, 2002. doi:10.1109/CMPSAC.2002.1045104.
  • [48] Sergey Yekhanin. Towards 3-query locally decodable codes of subexponential length. In Proceedings of the 39th Annual ACM Symposium on Theory of Computing (STOC), pages 266–274, 2007. doi:10.1145/1250790.1250830.
  • [49] Liang Feng Zhang and Reihaneh Safavi-Naini. Verifiable multi-server private information retrieval. In Ioana Boureanu, Philippe Owesarski, and Serge Vaudenay, editors, Applied Cryptography and Network Security – 12th International Conference, ACNS 2014, Lausanne, Switzerland, volume 8479 of Lecture Notes in Computer Science, pages 62–79. Springer, 2014. doi:10.1007/978-3-319-07536-5_5.