
Witness Encryption and NP-Hardness of Learning

Halley Goldberg, Simon Fraser University, Burnaby, Canada
Valentine Kabanets, Simon Fraser University, Burnaby, Canada
Abstract

We study connections between two fundamental questions of theoretical computer science. (1) Is witness encryption possible for 𝖭𝖯 [11]? That is, given an instance x of an 𝖭𝖯-complete language L, can one encrypt a secret message with security contingent on the ability to provide a witness for x ∈ L? (2) Is computational learning (in the sense of [46, 30]) hard for 𝖭𝖯? That is, is there a polynomial-time reduction from instances of L to instances of learning?

Our main contribution is that certain formulations of 𝖭𝖯-hardness of learning characterize the existence of witness encryption for 𝖭𝖯. More specifically, we show:

  • witness encryption for a language L ∈ 𝖭𝖯 is equivalent to a half-Levin reduction from L to the Computational Gap Learning problem (denoted 𝖢𝖦𝖫 [2]),

where a half-Levin reduction is the same as a Levin reduction but only required to preserve witnesses in one direction, and 𝖢𝖦𝖫 formalizes agnostic learning as a decision problem. We show versions of the statement above for witness encryption secure against non-uniform and uniform adversaries. We also show that witness encryption for 𝖭𝖯 with ciphertexts of logarithmic length, along with a circuit lower bound for 𝖤, are together equivalent to 𝖭𝖯-hardness of a generalized promise version of 𝖬𝖢𝖲𝖯.

We complement the above with a number of unconditional 𝖭𝖯-hardness results for agnostic PAC learning. Extending a result of [16] to the standard setting of boolean circuits, we show 𝖭𝖯-hardness of “semi-proper” learning. Namely:

  • for some polynomial s, it is 𝖭𝖯-hard to agnostically learn circuits of size s(n) by circuits of size s(n)·n^{1/(log log n)^{O(1)}}.

Looking beyond the computational model of standard boolean circuits enables us to prove 𝖭𝖯-hardness of improper learning (i.e., without any restriction on the size of the hypothesis returned by the learner). We obtain such results for:

  • learning circuits with oracle access to a given randomly sampled string, and

  • learning RAM programs.

In particular, we show that a variant of 𝖬𝖨𝖭𝖫𝖳 [31] for RAM programs is 𝖭𝖯-hard with parameters corresponding to the setting of improper learning. We view these results as partial progress toward the ultimate goal of showing 𝖭𝖯-hardness of learning boolean circuits in an improper setting.

Lastly, we give some consequences of 𝖭𝖯-hardness of learning for private- and public-key cryptography. Improving a main result of [2], we show that if improper agnostic PAC learning is 𝖭𝖯-hard under a randomized non-adaptive reduction (with some restrictions), then 𝖭𝖯 ⊄ 𝖡𝖯𝖯 implies the existence of i.o. one-way functions. In contrast, if 𝖢𝖦𝖫 is 𝖭𝖯-hard under a half-Levin reduction, then 𝖭𝖯 ⊄ 𝖡𝖯𝖯 implies the existence of i.o. public-key encryption.

Keywords and phrases:
agnostic PAC learning, witness encryption, NP-hardness
Copyright and License:
© Halley Goldberg and Valentine Kabanets; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Computational complexity and cryptography
Editors:
Srikanth Srinivasan

1 Introduction and Background

It has long been recognized that learning and cryptography are two sides of the same coin. Already in the paper introducing PAC learning, Valiant observed that easiness of learning would preclude the existence of pseudorandom functions [46]. In 1993, building on a work of Impagliazzo and Levin [24], Blum, Furst, Kearns, and Lipton sharpened Valiant’s observation by showing that private-key cryptography can be broken if and only if PAC learning is easy in an average-case setting [6]. A number of more recent results have developed such connections further (e.g., [41, 39, 18]).

In the present work, we turn to more structured kinds of hardness of learning (namely, 𝖭𝖯-hardness) and cryptography (namely, witness encryption), which we show to be likewise equivalent.

NP-hardness of PAC learning

Learning theory asks whether an efficient entity can hope to learn general truths about reality from necessarily limited input from experience. Valiant formalized this idea in 1984 with his definition of Probably Approximately Correct (PAC) learning [46]. In this framework, a learner for some representation class 𝒞 gets access to labeled examples (x, f(x)) for some unknown concept f ∈ 𝒞 and x randomly sampled according to an arbitrary distribution 𝒟. The learner is asked to produce, with high probability, a hypothesis that approximates f well over 𝒟. One can distinguish between “proper” and “improper” settings: in proper learning, the learner should produce a hypothesis that also belongs to 𝒞, whereas in improper learning, there is no restriction on the complexity of the hypothesis. One can also consider various “semi-proper” settings, in which the hypothesis must belong to some specific concept class 𝒞′ containing 𝒞. Lastly, one can relax the model of learning to be “agnostic”, in that the labeled examples (x,b) do not necessarily reflect the values of a function f ∈ 𝒞, and the learner is only asked to do as well as the best function in 𝒞 does [30].
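The interaction just described can be made concrete in a short, purely illustrative sketch; the dictator concept class, the 10% label noise, and the brute-force proper learner below are our own toy choices, not constructions from this paper.

```python
import random

n = 4
# Toy concept class: the n "dictator" functions x -> x[i].
concepts = [lambda x, i=i: x[i] for i in range(n)]

def sample_example(rng):
    # Agnostic setting: labels follow x[0] but are flipped with
    # probability 0.1, so no concept fits the examples perfectly.
    x = tuple(rng.randint(0, 1) for _ in range(n))
    b = int(x[0] ^ (rng.random() < 0.1))
    return x, b

def agnostic_learner(examples):
    # A proper learner: return the concept with the best empirical agreement.
    # (An improper learner could return any small hypothesis instead.)
    return max(concepts, key=lambda c: sum(c(x) == b for x, b in examples))

rng = random.Random(0)
h = agnostic_learner([sample_example(rng) for _ in range(2000)])
test_set = [sample_example(rng) for _ in range(2000)]
acc = sum(h(x) == b for x, b in test_set) / len(test_set)
```

With noise rate 0.1, the best concept in the class achieves agreement about 0.9 over 𝒟, and the agnostic learner is judged against that benchmark rather than against perfect agreement.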

A long line of work has established 𝖭𝖯-hardness of PAC learning for various concept classes (e.g., [43, 1, 16, 32]). Early results obtained 𝖭𝖯-hardness of proper learning of restricted circuit classes. Over time, progress has pushed toward 𝖭𝖯-hardness of “less proper” learning of less restricted circuit classes. However, 𝖭𝖯-hardness of learning unrestricted boolean circuits in the improper setting remains elusive. By this, we refer to the problem of learning the concept class 𝖲𝖨𝖹𝖤[s(n)] for some fixed polynomial s by a hypothesis of arbitrary polynomial size. Proving 𝖭𝖯-hardness in this setting seems especially challenging in light of a 2008 result of Applebaum et al.:

Theorem 1 ([2]).

Suppose there is a randomized non-adaptive honest¹ reduction R and a polynomial s : ℕ → ℕ such that, for every constant c, R reduces 𝖲𝖠𝖳 to agnostically PAC learning 𝖲𝖨𝖹𝖤[s(n)] by 𝖲𝖨𝖹𝖤[s(n)^c]. Then, if 𝖭𝖯 is hard on average, infinitely-often one-way functions exist.

¹ Roughly, “honest” means that the input length of the learning instance is polynomially related to the length of the input to the reduction. See Definition 28.

Of course, there are various ways in which one could formulate reductions to PAC learning. The aforementioned paper [2] describes two (of many) possibilities. The statement of Theorem 1 refers to one kind of 𝖭𝖯-hardness reduction described there, in which a randomized oracle machine on an input of length n makes non-adaptive queries to its oracle, with each query 𝒟 being a 𝗉𝗈𝗅𝗒(n)-size circuit sampling example-label pairs (x,b) ∈ {0,1}^{n^{Ω(1)}} × {0,1}. The oracle returns, for each query, a polynomial-size circuit h having high agreement over 𝒟 (i.e., Pr_{(x,b)∼𝒟}[h(x)=b] close to 1), which the reduction may use in any way.

In the other kind of 𝖭𝖯-hardness reduction that [2] describes, the reduction only uses its learning oracle to decide whether a small circuit consistent with 𝒟 exists. The relevant decision problem introduced by [2] is the Computational Gap Learning problem (𝖢𝖦𝖫). The authors show that if 𝖢𝖦𝖫 is 𝖭𝖯-hard under a deterministic many-one reduction, then every language in 𝖭𝖯 has a statistical zero-knowledge argument.

 Remark 2.

The results in [2] indicate that it is a challenge to show 𝖭𝖯-hardness of agnostic PAC learning, as doing so would yield breakthrough results in cryptography, which, although conjectured to be true, seem beyond our reach at the moment. However, we should note that the assumption in Theorem 1 is actually quite strong in its requirement that the same reduction must work for every constant c. Theorem 1 also requires that queries made by the reduction sample pairs (x,b) with the length of x polynomially related to the length of the given 𝖲𝖠𝖳 instance. It is possible that PAC learning is 𝖭𝖯-hard in the improper setting without either of these conditions being met. For example, if a different reduction exists for each c (perhaps taking longer for larger c), then the easiness of improper learning would still imply the easiness of 𝖭𝖯.

Witness encryption for NP

First introduced in a 2013 paper of Garg, Gentry, Sahai, and Waters [11], witness encryption (WE) is a cryptographic primitive in which security is contingent on the ability to provide a proof of an answer to a hard problem. As Garg et al. put it,

What if we don’t really care if [the receiver] knows a secret key, but we do care if he knows a solution to a crossword puzzle that we saw in the Times? [11]

More formally, the functionality of a witness encryption scheme is based on the structure of a particular language L ∈ 𝖭𝖯. An instance x of L is made public, and anyone can encrypt a secret message using the witness encryption scheme along with x. It is guaranteed that if x belongs to L, then anyone holding a witness for x ∈ L can decrypt, but if x does not belong to L, then no one can do so.

Witness encryption is intimately connected to another cryptographic primitive known as indistinguishability obfuscation (iO) [3], which has been studied extensively (e.g., [33, 10, 23, 37]) and has many candidate constructions (see [28]). iO is known to imply WE, though the converse is not known [10]. (We give a sketch of the proof that iO implies WE in Appendix A.) WE is also closely related to secret sharing, another important cryptographic tool [4, 34]. In fact, the candidate construction of WE in [11] also yields a Rudich-type secret sharing scheme. As with much of cryptography, we have no unconditional proof that WE is possible.

WE alone does not imply the existence of one-way functions, since it is trivially possible if 𝖭𝖯 ⊆ 𝖡𝖯𝖯. However, WE along with one-way functions becomes very powerful, together yielding public-key encryption, as demonstrated in [11]. By combining this result with two subsequent works, one obtains the following.

Theorem 3 ([11, 33, 19]; see [35]).

Assume 𝖭𝖯 ⊄ 𝗂𝗈-𝖯/𝗉𝗈𝗅𝗒. If 𝖯/𝗉𝗈𝗅𝗒-secure witness encryption exists for 𝖭𝖯, then public-key encryption is possible.

In this paper, we show that 𝖢𝖦𝖫 is 𝖭𝖯-hard if and only if witness encryption exists for 𝖭𝖯. We prove statements along these lines for a few different settings: WE secure against PPT adversaries, WE secure against polynomial-size circuits, and WE with logarithmic-length ciphertexts; the latter roughly characterizes the 𝖭𝖯-hardness of a promise version of the Minimum Circuit Size Problem [29].

In contrast to the aforementioned results of [2] (such as Theorem 1), not only do we show that proving 𝖭𝖯-hardness of learning is a challenge as it would imply a breakthrough cryptographic construction (namely, WE), but we also show the converse: progress in cryptography (showing that WE is possible) would imply that agnostic PAC learning is 𝖭𝖯-hard. Hence, if one believes in cryptography such as iO or WE, then one should also believe in the 𝖭𝖯-hardness of improper agnostic learning, even in the extremely restrictive sense of a deterministic, many-one, half-Levin reduction to 𝖢𝖦𝖫. We interpret our results as evidence that improper learning is likely 𝖭𝖯-hard.

As a complement, we prove a number of unconditional 𝖭𝖯-hardness results for agnostic PAC learning. For one, we obtain 𝖭𝖯-hardness of agnostic learning for unrestricted boolean circuits in a semi-proper setting that improves on the previous state of the art. In other settings, we obtain 𝖭𝖯-hardness of improper agnostic learning. Though we have made some partial progress toward showing 𝖭𝖯-hardness unconditionally, we feel that it remains a promising direction for further research.

Lastly, we give some applications of our results for the possibility of private- and public-key cryptography. We improve Theorem 1 to conditionally obtain infinitely-often one-way functions from worst-case hardness of 𝖭𝖯. In contrast, if learning is 𝖭𝖯-hard in a more restricted sense (namely, via a half-Levin reduction to 𝖢𝖦𝖫), then one obtains infinitely-often public-key encryption assuming 𝖭𝖯 ⊄ 𝖡𝖯𝖯. Along the way, we prove Theorem 3 under the weaker uniform assumption 𝖭𝖯 ⊄ 𝖡𝖯𝖯.²

² We believe that this last statement follows from a combination of techniques used in prior work ([11, 33, 19]; see [35]), but we have not seen the uniform version stated. In any case, we offer an alternative proof that does not rely on properties of statistical zero-knowledge arguments.

1.1 Our Results

Connections between witness encryption and NP-hardness of learning

We first discuss a set of results that characterize witness encryption in terms of 𝖭𝖯-hardness reductions to PAC learning.

Witness encryption for an 𝖭𝖯 language L consists of a pair of algorithms (𝖤𝗇𝖼, 𝖣𝖾𝖼) such that, for an L-instance x and a secret bit b ∈ {0,1}, 𝖤𝗇𝖼(x,b) outputs a ciphertext string c using some randomness. It is guaranteed that, if x ∈ L and w is a witness for the membership of x under a fixed witness relation for L, then 𝖣𝖾𝖼(c,x,w) deterministically recovers the bit b. On the other hand, if x ∉ L, then 𝖤𝗇𝖼(x,0) and 𝖤𝗇𝖼(x,1) are indistinguishable (up to a negligible difference) by polynomial-size circuits (if the witness encryption is 𝖯/𝗉𝗈𝗅𝗒-secure) or PPT algorithms (if it is 𝖡𝖯𝖯-secure). See Definition 39.
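To pin down this syntax, here is a minimal illustrative sketch of (𝖤𝗇𝖼, 𝖣𝖾𝖼) for an easy language of our own choosing, where witness encryption is trivially realizable; for an 𝖭𝖯-complete language, constructing such a pair is precisely the open problem.

```python
import secrets

# Toy language L = {x : x has an even number of 1s}, decidable in
# polynomial time, so this witness encryption scheme is trivial.
def in_L(x: str) -> bool:
    return x.count("1") % 2 == 0

def Enc(x: str, b: int) -> int:
    # If x is in L, the ciphertext determines b; if x is not in L, it is a
    # uniformly random bit, so Enc(x, 0) and Enc(x, 1) are identically
    # distributed (perfect hiding, far stronger than needed).
    return b if in_L(x) else secrets.randbelow(2)

def Dec(c: int, x: str, w) -> int:
    # With any valid witness w for x in L, decryption recovers b.
    return c
```

Correctness here holds exactly when x ∈ L, and security is vacuous to verify; the content of the definition is that both properties must hold simultaneously for a hard language.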

Our first result characterizes 𝖯/𝗉𝗈𝗅𝗒-secure witness encryption in terms of a deterministic many-one reduction to a promise problem known as 𝖢𝖦𝖫, introduced in a work of Applebaum et al. [2]. Inputs to 𝖢𝖦𝖫 are distributions 𝒟 over string-label pairs (x,b) ∈ {0,1}ⁿ × {0,1}, where 𝒟 is represented by a 𝗉𝗈𝗅𝗒(n)-size sampling circuit. For parameters s₁ < s₂ and 0 < ε < 1/2, a distribution 𝒟 is a yes-instance of 𝖢𝖦𝖫_{s₁,s₂}[ε] if there is a circuit C of size at most s₁ such that Pr_{(x,b)∼𝒟}[C(x)=b] = 1; 𝒟 is a no-instance of 𝖢𝖦𝖫_{s₁,s₂}[ε] if, for every circuit C of size at most s₂, Pr_{(x,b)∼𝒟}[C(x)=b] < 1/2 + ε.
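Both promises of 𝖢𝖦𝖫 are conditions on the agreement Pr_{(x,b)∼𝒟}[C(x)=b], which can be estimated simply by running the sampler; the sampler and the two candidate "circuits" in this illustrative sketch are our own toy choices.

```python
import random

def D(rng):
    # Stand-in for a poly(n)-size sampling circuit: labels are the parity of x.
    x = tuple(rng.randint(0, 1) for _ in range(8))
    return x, sum(x) % 2

def agreement(C, trials=5000, seed=0):
    # Monte Carlo estimate of Pr_{(x,b) ~ D}[C(x) = b].
    rng = random.Random(seed)
    return sum(C(x) == b for x, b in (D(rng) for _ in range(trials))) / trials

parity = lambda x: sum(x) % 2   # consistent with D: yes-side behavior
const0 = lambda x: 0            # agrees only half the time: no-side behavior
```

A yes-instance demands some size-s₁ circuit with agreement exactly 1, while a no-instance demands that every size-s₂ circuit stays below 1/2 + ε.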

Note that a many-one reduction to 𝖢𝖦𝖫 is quite a restrictive notion of reducibility to agnostic PAC learning: the reduction only uses its learning oracle to decide if a small circuit exists for the given distribution. As discussed earlier, one can imagine other kinds of reduction to learning that make more robust use of a learner: for example, actually examining the circuits returned by the learning oracle.

We further restrict the definition of a many-one reduction to 𝖢𝖦𝖫 by requiring that it be a half-Levin reduction. This is a term that we introduce here for a kind of reduction intermediate between a many-one reduction and a Levin reduction. A half-Levin reduction (R, Rwit) from L₁ to L₂ is a pair of polynomial-time machines such that R is a many-one reduction from L₁ to L₂, and Rwit transforms witnesses for x ∈ L₁ into witnesses for R(x) ∈ L₂. In contrast to a standard Levin reduction, we do not require a third algorithm transforming L₂-witnesses into L₁-witnesses. We emphasize that the half-Levin reductions considered in this work are deterministic unless stated otherwise.
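As a purely illustrative sketch of the (R, Rwit) format, consider the classic reduction from Clique to Independent-Set by graph complementation (our own example, not one used in the paper): R maps the instance, Rwit forwards the witness, and no reverse witness map is supplied.

```python
from itertools import combinations

def is_clique(graph, S):
    _, edges = graph
    return all(frozenset(p) in edges for p in combinations(S, 2))

def is_independent_set(graph, S):
    _, edges = graph
    return all(frozenset(p) not in edges for p in combinations(S, 2))

def R(instance):
    # Many-one part: complement the edge set, keep the target size k.
    (vertices, edges), k = instance
    all_pairs = {frozenset(p) for p in combinations(vertices, 2)}
    return (vertices, all_pairs - edges), k

def Rwit(instance, witness):
    # Witness-forwarding part: a k-clique in G is a k-independent set
    # in the complement of G, so the witness maps to itself.
    return witness
```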

We call a reduction to 𝖢𝖦𝖫 honest if, on inputs of length n, it outputs a distribution supported over {0,1}^{n^{Ω(1)}} × {0,1}.

We are now ready to state our main result. In general, fixing parameters s₁, s₂, and ε, we show that a half-Levin reduction from a language L to 𝖢𝖦𝖫_{s₁,s₂}[ε] is equivalent to a witness encryption scheme for L with running time roughly s₁, security against circuits of size roughly s₂, and advantage parameter roughly ε; see Lemmas 52 and 54. For the special case of 𝖯/𝗉𝗈𝗅𝗒-secure witness encryption, we get the following.

Theorem 4.

Consider any language L𝖭𝖯. The following are equivalent.

  1.

    𝖯/𝗉𝗈𝗅𝗒-secure witness encryption exists for L;

  2.

    there exist a polynomial s₁ and a pair of machines (R, Rwit) such that, for all polynomials s₂ and ε⁻¹, (R, Rwit) is an honest half-Levin reduction from L to 𝖢𝖦𝖫_{s₁,s₂}[ε].
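To convey the flavor of the direction from witness encryption to a reduction, here is an illustrative, self-contained toy (using a trivial scheme for an easy language; the actual argument, with its careful parameters, is in Lemmas 52 and 54): R(x) is the sampler that encrypts a fresh random bit b and outputs the pair (ciphertext, b), and Rwit turns a witness into the small decryption circuit, which predicts b perfectly exactly when x ∈ L.

```python
import random

# Trivial witness encryption for the easy language L = {x : even parity}.
def in_L(x): return x.count("1") % 2 == 0
def Enc(x, b, rng): return b if in_L(x) else rng.randint(0, 1)
def Dec(c, x, w): return c

def R(x):
    # The CGL instance: a sampler that encrypts a random bit b and
    # labels the resulting ciphertext with b.
    def sampler(rng):
        b = rng.randint(0, 1)
        return Enc(x, b, rng), b
    return sampler

def Rwit(x, w):
    # The witness yields a small circuit with agreement 1 when x is in L.
    return lambda c: Dec(c, x, w)

def agreement(sampler, C, trials=4000, seed=1):
    rng = random.Random(seed)
    return sum(C(c) == b for c, b in (sampler(rng) for _ in range(trials))) / trials
```

When x ∉ L, security makes the ciphertext carry (almost) no information about b, so every small circuit has agreement near 1/2, matching the no-instance condition of 𝖢𝖦𝖫.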

We also note that the above characterization is related to a very recent result of Liu, Mazor, and Pass, which shows that 𝖯/𝗉𝗈𝗅𝗒-secure witness encryption for an 𝖭𝖯-language L is equivalent to the existence of a laconic special-honest verifier zero-knowledge argument for L [36]. We refer to that paper for the definition of this kind of protocol, though we mention that the term “laconic” means that the total length of the prover’s messages is bounded by O(log n). As a corollary of [36] and our Theorem 4 above, we obtain the following.

Corollary 5.

Consider any language L𝖭𝖯. The following are all equivalent.

  1.

    𝖯/𝗉𝗈𝗅𝗒-secure witness encryption exists for L;

  2.

    there exist a polynomial s₁ and a pair of machines (R, Rwit) such that, for all polynomials s₂ and ε⁻¹, (R, Rwit) is an honest half-Levin reduction from L to 𝖢𝖦𝖫_{s₁,s₂}[ε];

  3.

    there exists a laconic special-honest verifier zero-knowledge argument for L.

One may also be interested in the possibility that 𝖭𝖯 has witness encryption schemes secure against PPT algorithms but not against polynomial-size circuits. We therefore present a kind of reduction to learning that is equivalent to 𝖡𝖯𝖯-secure witness encryption. This kind of reduction is intermediate between a half-Levin reduction to 𝖢𝖦𝖫 and the more general kind of reduction described in [2] wherein the reduction may actually inspect the hypotheses returned by the learner and use them in any way.

More specifically, in our case, the reduction makes a single query 𝒟 to the search version of 𝖢𝖦𝖫, getting back a small circuit consistent with 𝒟 if 𝒟 is a yes-instance of 𝖢𝖦𝖫. We still require the reduction to be half-Levin. Furthermore, we require the reduction to be 𝖡𝖯𝖯-black-box;³ see Definition 32.

³ The standard notion of a class-specific black-box reduction from a language L₁ to a language L₂ was defined by [13] as a way to formalize an intermediate case between treating L₂ as an oracle (“black box”) and requiring source code for an L₂ algorithm implemented in some complexity class (“white box”); see Definitions 30 and 31.

Theorem 6.

Consider any language L𝖭𝖯. The following are equivalent.

  1.

    𝖡𝖯𝖯-secure witness encryption exists for L;

  2.

    there exist a polynomial s₁ and a pair of machines (R, Rwit) such that, for all polynomials s₂ and ε⁻¹, (R, Rwit) is an honest 𝖡𝖯𝖯-black-box half-Levin reduction from L to 𝗌𝖾𝖺𝗋𝖼𝗁-𝖢𝖦𝖫_{s₁,s₂}[ε].

So far, we have only discussed learning in the “polynomial regime”, where the distribution over pairs (x,b) is in 𝖯𝖲𝖠𝖬𝖯/𝗉𝗈𝗅𝗒. However, the original PAC learning framework of Valiant makes no such restriction: a PAC learner should be able to learn over arbitrary distributions [46]. We thus consider reductions to learning that make further use of this capacity. In particular, imagine a polynomial-time reduction that, on input length n, queries a learner for a function on O(log n)-length inputs. In this case, it is possible to ask the learner to work on inefficiently samplable distributions, and a truth-table of the function in question can be given to the learner explicitly. In this particular setup, the distributions queried represent well-defined functions, which may not be the case in the setup above with 𝖢𝖦𝖫. In this sense, we get closer to standard PAC learning rather than agnostic PAC learning.

This notion of learning in the “exponential regime” is closely related to 𝖬𝖢𝖲𝖯, which asks, given a truth-table T and a size threshold s, whether there exists a circuit of size s consistent with T. We generalize the problem to the context of learning by providing, along with T, a probability distribution μ represented as a table. We call this problem “Gap Distributional 𝖬𝖢𝖲𝖯”, denoted 𝖦𝖺𝗉𝖣𝗂𝗌𝗍𝖬𝖢𝖲𝖯. Specifically, (T,μ) is a yes-instance of 𝖦𝖺𝗉𝖣𝗂𝗌𝗍𝖬𝖢𝖲𝖯_{s₁,s₂}[ε] if there exists a circuit C of size at most s₁ such that C is consistent with T, and (T,μ) is a no-instance if, for every circuit C of size at most s₂, the probability over x ∼ μ that C(x)=T(x) is at most 1/2 + ε. Note that this definition naturally generalizes 𝖠𝗏𝖾𝖬𝖢𝖲𝖯 as defined in [45] (and denoted 𝖬𝖠𝖢𝖲𝖯 in [7]), where the definition is the same except that the distribution is fixed to uniform.
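Since both T and μ are given explicitly, the μ-weighted agreement appearing in the no-instance condition can be computed exactly; the table, distribution, and candidate circuits in this sketch are our own toy choices.

```python
def weighted_agreement(T, mu, C, k):
    # Pr_{x ~ mu}[C(x) = T(x)] for an explicit truth table T (2^k bits)
    # and an explicit distribution mu (2^k probabilities).
    total = 0.0
    for x in range(2 ** k):
        bits = tuple((x >> i) & 1 for i in range(k))
        if C(bits) == T[x]:
            total += mu[x]
    return total

k = 2
T = [0, 0, 0, 1]                  # AND of two bits (bit 0 is the LSB of x)
mu = [0.4, 0.3, 0.1, 0.2]         # weights on the four inputs
AND = lambda bits: bits[0] & bits[1]    # consistent with T everywhere
ZERO = lambda bits: 0                   # correct only where T is 0
```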

We show that 𝖭𝖯-hardness of learning in this sense is equivalent to witness encryption for 𝖭𝖯 in which the ciphertexts output by 𝖤𝗇𝖼 have only logarithmic length, together with a circuit lower bound for 𝖤. Below we use 𝖲𝖨𝖹𝖤[s(n)] to denote the class of n-input boolean functions computable by circuits of size s(n); for 0 ≤ α ≤ 1, we use 𝖲𝖨𝖹𝖤^D_α[s(n)] to denote the class of n-input boolean functions that can be computed by circuits of size s(n) on average, with probability at least α over inputs x ∈ {0,1}ⁿ sampled according to the distribution D.

Theorem 7.

Consider any language L ∈ 𝖭𝖯, a constant c, a polynomial s₁, and functions s₂, ε⁻¹ : ℕ → ℕ. The following are equivalent.

  1.
    • (𝖲𝖨𝖹𝖤[Ω(s₂(n))], O(ε(n)))-secure, (c·log n + O(1))-length, Õ(s₁(n))-decryption-time witness encryption exists for L; and

    • 𝖤 ⊄ 𝗂𝗈-𝖲𝖨𝖹𝖤^D_{1/2+O(ε(n))}[Ω(s₂(n))] for some 𝖤-computable distribution family D.

  2.

    There is a half-Levin reduction (R, Rwit) mapping n-length instances of L to O(n^c)-length instances of 𝖦𝖺𝗉𝖣𝗂𝗌𝗍𝖬𝖢𝖲𝖯_{Õ(s₁(n)), Ω(s₂(n))}[O(ε)], where Rwit runs in time at most Õ(s₁(n)).

Note that we obtain correspondences between the output length of the reduction R and the output length of 𝖤𝗇𝖼; the s₁ parameter of 𝖦𝖺𝗉𝖣𝗂𝗌𝗍𝖬𝖢𝖲𝖯 and the running time of 𝖣𝖾𝖼; the s₂ parameter of 𝖦𝖺𝗉𝖣𝗂𝗌𝗍𝖬𝖢𝖲𝖯 and the security of the witness encryption; and the ε parameter of 𝖦𝖺𝗉𝖣𝗂𝗌𝗍𝖬𝖢𝖲𝖯 and the advantage parameter of the witness encryption.

Unconditional NP-hardness of semi-proper learning for 𝗣/𝗽𝗼𝗹𝘆

As discussed in the introduction, 𝖭𝖯-hardness of proper agnostic PAC-learning for polynomial-size circuits is known. However, 𝖭𝖯-hardness of improper learning is a major open question that remains elusive. In this section, we advance toward 𝖭𝖯-hardness of “less proper” agnostic PAC learning for polynomial-size circuits.

First, we show that for polynomials s₁ and ε⁻¹, letting s₂(n) = s₁(n)·n^{1/(log log n)^{O(1)}}, 𝖢𝖦𝖫_{s₁,s₂}[ε] is 𝖭𝖯-hard under a randomized many-one reduction. In particular:

Theorem 8.

For some constant c and some g : ℕ → ℕ with g(n) = n^{1/(log log n)^{O(1)}} for all n, and for any polynomials ε⁻¹ and s₁ such that s₁(n) ≥ ε(n)^{-c} for all n, there is a randomized many-one reduction from 𝖲𝖠𝖳 to 𝖢𝖦𝖫_{s₁, s₁·g}[ε].

As a straightforward consequence of Theorem 8, we obtain 𝖭𝖯-hardness of agnostic PAC learning in the semi-proper setting of learning size-s(n) circuits by size-s(n)·n^{1/(log log n)^{O(1)}} circuits.

Corollary 9.

For some polynomial s, it is 𝖭𝖯-hard under a randomized one-query reduction to agnostically PAC learn 𝖲𝖨𝖹𝖤[s(n)] by 𝖲𝖨𝖹𝖤[s(n)·n^{1/(log log n)^{O(1)}}] over a flat 𝖯/𝗉𝗈𝗅𝗒-samplable distribution.

This improves on a prior work of Hirahara [16], which contains two main results. The first shows that, for a polynomial s, it is 𝖭𝖯-hard to agnostically learn programs of size s(n) by programs of size s(n)·n^{1/(log log n)^{O(1)}}. This is accomplished by reduction from the “minimum monotone satisfying assignment” problem, 𝖬𝖬𝖲𝖠, which is known to be 𝖭𝖯-hard to approximate by way of a PCP theorem [8, 9]. We note that Hirahara’s proof makes use of efficient secret-sharing schemes, a cryptographic primitive closely related to witness encryption and known to exist unconditionally (with statistical security) for access structures that are monotone formulae [27, 5] (see [4, 11]).

The second main result of [16] shows that 𝖬𝖢𝖲𝖯⋆, a partial-function version of 𝖬𝖢𝖲𝖯, is likewise 𝖭𝖯-hard. This proof requires even more sophisticated techniques in order to produce a truth-table on logarithmic-length inputs. In particular, Hirahara reduces from a gap version of the “collective minimum (weight) monotone satisfying assignment problem” (𝖢𝖬𝖬𝖲𝖠), which generalizes 𝖬𝖬𝖲𝖠. Using the techniques of that proof and padding the inputs to have polynomial length, one can obtain 𝖭𝖯-hardness of learning 𝖯/𝗉𝗈𝗅𝗒, but with a smaller gap.

Theorem 10 (Implicit in [16]).

For any constant β > 0, there exists a constant α > 0 such that for some polynomial s, it is 𝖭𝖯-hard under a randomized one-query reduction to agnostically PAC learn 𝖲𝖨𝖹𝖤[s(n)] by 𝖲𝖨𝖹𝖤[s(n)·2^{α(log n)^{1−β}}] over a flat 𝖯/𝗉𝗈𝗅𝗒-samplable distribution.

However, those ideas do not extend to obtain the larger s(n) vs. s(n)·n^{1/(log log n)^{O(1)}} gap for circuits rather than programs, as in Corollary 9.

 Remark 11.

We note that Theorem 8 is in fact proved via a randomized half-Levin reduction from 𝖦𝖺𝗉𝖬𝖬𝖲𝖠 to 𝖢𝖦𝖫! That is, there is a poly-time machine Rwit such that, with high probability over the randomness r: if 𝒟 is the output of the reduction of Theorem 8 run on a 𝖦𝖺𝗉𝖬𝖬𝖲𝖠 instance φ with randomness r, and α is a witness for φ, then Rwit(φ, α, r) outputs a circuit certifying that 𝒟 is a yes-instance of 𝖢𝖦𝖫. However, unfortunately, we do not know how to extend the results of the previous section to obtain non-trivial witness encryption from a randomized half-Levin reduction, particularly since the amount of randomness required is greater than the parameter s₂ (which translates to the security of the witness encryption scheme). In any case, our results show that if the reduction of Theorem 8 can be suitably derandomized (with the running time of Rwit less than the parameter s₂), then one would unconditionally obtain non-trivial witness encryption.

Unconditional NP-hardness of improper learning for other concept classes

Though in the case of 𝖯/𝗉𝗈𝗅𝗒 we are unable to obtain 𝖭𝖯-hardness with a gap greater than n^{1/(log log n)^{O(1)}}, we show in this section that for some related concept classes, improper learning is 𝖭𝖯-hard. We demonstrate such results for learning oracle circuits and learning RAM programs.

In the following, for a fixed λ : ℕ → ℕ and a distribution family D = {D_m}_{m∈ℕ}, where each D_m is supported over strings of length 2^m and samplable uniformly in time 𝗉𝗈𝗅𝗒(2^m), D-𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫_{s₁,s₂}[ε,λ] is a generalization of 𝖢𝖦𝖫_{s₁,s₂}[ε] in which all circuits and samplers get access to an oracle O ∈ {0,1}^{2^{λ(n)}} randomly sampled from D_{λ(n)}. A distribution 𝒟 is a yes-instance of D-𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫_{s₁,s₂}[ε,λ] if there exists a circuit C of size s₁(n) such that, for every O in the support of D_{λ(n)},

Pr_{(x,b)∼𝒟^O}[C^O(x) = b] = 1,

and 𝒟 is a no-instance of D-𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫_{s₁,s₂}[ε,λ] if, with probability 1 − ε(n) over O ∼ D_{λ(n)}, for every circuit C of size s₂(n),

Pr_{(x,b)∼𝒟^O}[C^O(x) = b] < 1/2 + ε(n).

We prove the following 𝖭𝖯-hardness of 𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫, from which we will derive further corollaries.

Theorem 12.

There exist a distribution family D ∈ 𝖯𝖲𝖠𝖬𝖯, an 𝖭𝖯-complete language L, and a polynomial s₁ such that, for all polynomials s₂ and ε⁻¹, there is a function λ : ℕ → ℕ with λ(n) = O(log n) for all n such that L reduces to D-𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫_{s₁,s₂}[ε,λ] under an honest half-Levin reduction.

Theorem 12 is proved using a framework developed by Huang, Ilango, and Ren for “witness encryption in oracle worlds” [21]. More specifically, drawing on the candidate witness encryption scheme proposed in [11], the authors show that there exist an 𝖭𝖯-complete language L and a distribution family D𝖯𝖲𝖠𝖬𝖯 such that, with high probability over oracles O sampled randomly from D, witness encryption for L exists with respect to O. That is, 𝖤𝗇𝖼, 𝖣𝖾𝖼, and adversaries all get oracle access to O.

We may then apply the techniques of one direction of our main equivalence, Theorem 4, to obtain 𝖭𝖯-hardness of learning with respect to O. Crucially, in [21], an oracle O with truth-table length n gives rise to a witness encryption scheme secure against non-uniform adversaries of size polynomially related to n. So, to achieve 𝖭𝖯-hardness of 𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫 with an arbitrary polynomial gap, one only requires oracle truth-tables of polynomial length.

As a consequence of Theorem 12, we show that agnostic learning is 𝖭𝖯-hard in an “oracle-PAC” model that we introduce here, which generalizes standard agnostic PAC learning of 𝖯/𝗉𝗈𝗅𝗒. In this model, a learner for 𝖲𝖨𝖹𝖤[s] on input length n is explicitly given the whole truth table of an oracle function 𝒪 : {0,1}^{O(log n)} → {0,1} sampled randomly from some distribution D ∈ 𝖯𝖲𝖠𝖬𝖯. The learner is asked to output an oracle circuit h such that h^𝒪 approximates the target function almost as well as the best size-s 𝒪-oracle circuit does; see Definition 23.

Corollary 13.

For every constant c and every sufficiently large polynomial s, it is 𝖭𝖯-hard under a randomized one-query reduction to agnostically oracle-PAC learn 𝖲𝖨𝖹𝖤[s(n)] by 𝖲𝖨𝖹𝖤[s(n)^c].

We stress that the above holds under a standard polynomial-time reduction that does not itself require any oracle.

As a second consequence of the 𝖭𝖯-hardness of 𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫, we show that it is 𝖭𝖯-hard to agnostically learn RAM programs in an improper setting. In particular, we focus on a decision problem based on the problem 𝖬𝖨𝖭𝖫𝖳 defined in [31]. In Ko’s definition, the input to 𝖬𝖨𝖭𝖫𝖳 consists of pairs (xᵢ, bᵢ)_{i∈[m]} ∈ ({0,1}ⁿ × {0,1})^m for some m = 𝗉𝗈𝗅𝗒(n), along with size and running-time parameters s, t represented in unary. The input is a yes-instance if and only if there exists a program M of size at most s and running time at most t such that M(xᵢ) = bᵢ for all i ∈ [m]. Ko exhibited an oracle relative to which 𝖬𝖨𝖭𝖫𝖳 is not 𝖭𝖯-hard [31].

Hirahara, using non-relativizing techniques, showed that a certain generalization of 𝖬𝖨𝖭𝖫𝖳 (called 𝖦𝖺𝗉𝖫𝖾𝖺𝗋𝗇) is in fact 𝖭𝖯-hard [16]. In this promise problem, the pairs (xᵢ, bᵢ) are represented implicitly by a sampling circuit (as in the definition of 𝖢𝖦𝖫), and there is a gap, e.g., in program size between yes-instances and no-instances. It is easy to see that 𝖭𝖯-hardness of this problem (Theorem 1.1 in [16]) implies 𝖭𝖯-hardness of Ko’s 𝖬𝖨𝖭𝖫𝖳.

We formulate our definition of 𝖦𝖺𝗉𝖱𝖠𝖬-𝖬𝖨𝖭𝖫𝖳 analogously to the definition of 𝖦𝖺𝗉𝖫𝖾𝖺𝗋𝗇 in [16], but considering programs that are granted random access to their inputs. Namely, the input to 𝖦𝖺𝗉_{g,ε}𝖱𝖠𝖬-𝖬𝖨𝖭𝖫𝖳 is a sampler 𝒟 over pairs (x,b), along with parameters s, t. The input (𝒟, 1^s, 1^t) is a yes-instance if there exists a program M of size at most s running in time at most t such that Pr_{(x,b)∼𝒟}[M^x = b] = 1, and (𝒟, 1^s, 1^t) is a no-instance if, for every program M of size at most g(s) running in time at most g(t), Pr_{(x,b)∼𝒟}[M^x = b] < 1/2 + ε(n). Here, M^x denotes the program M with random access to its input x.
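The notation M^x can be illustrated by exposing the input only through a probe callback, so that the number of positions read, rather than |x|, reflects the running time; the toy program below is our own example, not a construction from the paper.

```python
def run_with_random_access(M, x):
    # Run M with random access to x, counting how many positions it reads.
    reads = [0]
    def probe(i):
        reads[0] += 1
        return x[i]
    return M(probe, len(x)), reads[0]

def M_endpoints(probe, n):
    # A toy RAM program: output the XOR of the first and last input bits.
    # It probes 2 positions regardless of n, i.e., it runs in time o(|x|).
    return probe(0) ^ probe(n - 1)
```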

We obtain the following 𝖭𝖯-hardness of 𝖦𝖺𝗉𝖱𝖠𝖬-𝖬𝖨𝖭𝖫𝖳. Note that the parameters we obtain imply 𝖭𝖯-hardness of learning RAM programs in an improper setting.

Corollary 14.

For all polynomials g and ε⁻¹, 𝖦𝖺𝗉_{g,ε}𝖱𝖠𝖬-𝖬𝖨𝖭𝖫𝖳 is 𝖭𝖯-hard under a randomized many-one reduction.

In contrast, [16] considers programs that do not have random access to their inputs; the gap in program size achieved there is small (namely, s(n) vs. s(n)·n^{1/(log log n)^{O(1)}}), though the no-instances extend to time-unbounded programs.

Applications for the possibility of cryptography

Finally, we prove a few results demonstrating what cryptography is possible assuming worst-case hardness of 𝖭𝖯 along with various assumptions concerning witness encryption and 𝖭𝖯-hardness of learning.

Assuming a randomized non-adaptive reduction from 𝖲𝖠𝖳 to improper agnostic learning as described in [2] (see Definition 28), we show that the worst-case assumption 𝖭𝖯𝖡𝖯𝖯 is sufficient to imply one-way functions secure infinitely often against uniform PPT adversaries. This improves a main result of [2] (Theorem 1 in the introduction), which requires average-case hardness of 𝖭𝖯. In terms of Impagliazzo’s five worlds [22], the result of [2] conditionally excludes Pessiland. In contrast, we conditionally exclude both Heuristica and Pessiland.

Theorem 15.

Suppose there is a randomized non-adaptive honest reduction R and a polynomial s : ℕ → ℕ such that, for every constant c, R reduces 𝖲𝖠𝖳 to agnostically PAC learning 𝖲𝖨𝖹𝖤[s(n)] by 𝖲𝖨𝖹𝖤[s(n)^c]. Then, unless 𝖭𝖯 ⊆ 𝖡𝖯𝖯, there exist infinitely-often one-way functions.

As discussed in the introduction, Garg et al. show that WE for an 𝖭𝖯-complete language along with a one-way function is sufficient for public-key encryption [11]. Combining this result, our main equivalence in Theorem 4, and Theorem 15, we obtain that witness encryption simultaneously excludes Heuristica, Pessiland, and Minicrypt.

Theorem 16.

Suppose 𝖯/𝗉𝗈𝗅𝗒-secure witness encryption exists for 𝖭𝖯. Then, unless 𝖭𝖯 ⊆ 𝖡𝖯𝖯, there exists public-key encryption secure infinitely often against polynomial-time adversaries.

We believe that this statement can also be proved by combining prior works [33, 19, 11] (see also [35]), though we have not seen the uniform version stated (i.e., the version with the assumption 𝖭𝖯 ⊄ 𝖡𝖯𝖯 rather than 𝖭𝖯 ⊄ 𝖯/𝗉𝗈𝗅𝗒). In any case, we provide an alternative proof that does not rely on properties of zero-knowledge, such as the “SZK/OWF” characterization from [42].

Theorem 16 and our main equivalence in Theorem 4 together imply that if learning is 𝖭𝖯-hard under a half-Levin reduction to 𝖢𝖦𝖫s1s2[ε] working for arbitrary polynomials s2 and ε^{-1}, one can rule out Minicrypt in addition to Heuristica and Pessiland. Thus, we obtain a stark distinction between the two kinds of reduction to learning defined in [2]. Namely, assuming worst-case hardness of 𝖭𝖯, a randomized non-adaptive reduction to agnostic learning over 𝖯𝖲𝖠𝖬𝖯/𝗉𝗈𝗅𝗒 yields one-way functions, whereas a deterministic many-one reduction to 𝖢𝖦𝖫, provided it is half-Levin, yields public-key encryption.

Theorem 17.

Suppose there exist a polynomial s1 and a pair of machines (R,Rwit) such that, for all polynomials s2 and ε^{-1}, (R,Rwit) is an honest half-Levin reduction from 𝖲𝖠𝖳 to 𝖢𝖦𝖫s1s2[ε]. Then, unless 𝖭𝖯 ⊆ 𝖡𝖯𝖯, there exists public-key encryption secure infinitely often against polynomial-time adversaries.

1.2 Related Work

As mentioned above, a very recent work of Liu, Mazor, and Pass shows that witness encryption for 𝖭𝖯 is equivalent to “laconic special-honest verifier zero-knowledge arguments” for 𝖭𝖯 [36]. Our main result shows that half-Levin reductions from 𝖭𝖯 to 𝖢𝖦𝖫 are equivalent to both. This sharpens the result of Applebaum et al. showing that 𝖭𝖯-hardness of 𝖢𝖦𝖫 under a many-one (not necessarily half-Levin) reduction implies 𝖭𝖯𝖲𝖹𝖪𝖠 [2].

Huang, Ilango, and Ren [21] used an assumption of the existence of witness encryption secure against subexponential-size circuits to conclude the 𝖭𝖯-hardness of approximating the time-bounded conditional Kolmogorov complexity 𝖪t(xy), in the regime where t>|y|, with essentially optimal gap parameters. This is similar to one direction of our Theorem 4 showing an equivalence between the existence of witness encryption (secure against 𝖯/𝗉𝗈𝗅𝗒) and 𝖭𝖯-hardness of a learning problem 𝖢𝖦𝖫 (under restricted reductions). We note that in our framework, proving such a connection between witness encryption and 𝖭𝖯-hardness of learning is much simpler than in the setting considered by [21].

Our Theorem 12 relies on a framework for “witness encryption in oracle worlds” developed in [21] (building on ideas from [11]). The authors of [21] obtain 𝖭𝖯-hardness of the Minimum Oracle Circuit Size Problem with a large circuit size gap [21]. This is somewhat like an “exponential regime” analogue of our oracle-PAC learning model, but we stress that, in [21], an algorithm for 𝖬𝖮𝖢𝖲𝖯 needs to work in the worst case over oracle truth-tables. In our case, a learner only needs to work with high probability over oracle truth-tables sampled from some distribution in 𝖯𝖲𝖠𝖬𝖯.

An earlier example of an equivalence between 𝖭𝖯-hardness of a meta-complexity problem and the existence of a secure cryptographic primitive is the result by Hirahara [17]. It shows an equivalence between the existence of one-way functions and 𝖭𝖯-hardness of distributional time-bounded Kolmogorov complexity d𝖪t under a certain kind of restricted reductions.

An earlier example of an equivalence between hardness of PAC learning and the existence of a cryptographic primitive is due to Nanashima [39]. In that work, Nanashima shows that auxiliary-input one-way functions exist if and only if a certain average-case, or “heuristic”, variant of PAC learning is hard for efficient algorithms. In contrast, we focus on 𝖭𝖯-hardness of worst-case agnostic PAC learning.

1.3 Overview of Techniques

Proof of Theorem 4

We start with an informal overview of the proof of our main equivalence, Theorem 4. In both directions, the constructions employed and the proofs of correctness are straightforward. We first show how to obtain a half-Levin reduction from L to 𝖢𝖦𝖫 from a 𝖯/𝗉𝗈𝗅𝗒-secure witness encryption scheme (𝖤𝗇𝖼,𝖣𝖾𝖼) for L. In particular, given an L-instance z, the reduction outputs a distribution 𝒟 as follows.

Sample b𝒰. Let x:=𝖤𝗇𝖼(z,b). Output (x,b).

Note that we can assume that the output length of 𝖤𝗇𝖼 is at least n^{Ω(1)} without loss of generality, so the reduction described here is honest.

In the case that z∈L with witness w, 𝒟 is a yes-instance of 𝖢𝖦𝖫, since the circuit defined by 𝖣𝖾𝖼(⋅,z,w) recovers the bit b from x with probability 1 over 𝒟. This is true by the correctness of (𝖤𝗇𝖼,𝖣𝖾𝖼). Also note that, since 𝖣𝖾𝖼 is a uniform polynomial-time algorithm, the circuit 𝖣𝖾𝖼(⋅,z,w) can be constructed in polynomial time given z and w. Thus, our reduction is half-Levin. In the case that z∉L, we observe that if there were a circuit C of polynomial size such that

Pr_{(x,b)∼𝒟}[C(x)=b] ≥ 1/2 + 1/𝗉𝗈𝗅𝗒(|x|),

then by definition of 𝒟,

|Pr[C(𝖤𝗇𝖼(z,1))=1] − Pr[C(𝖤𝗇𝖼(z,0))=1]| ≥ 1/𝗉𝗈𝗅𝗒(n),

violating the security of 𝖤𝗇𝖼. So, 𝒟 must be a no-instance of 𝖢𝖦𝖫.
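The wiring of this direction can be sketched in code. The sketch below uses a trivially insecure placeholder scheme, since a real witness encryption scheme cannot fit in a few lines; `check_witness`, `enc`, and `dec` are hypothetical stand-ins chosen only so that correctness (decryption with a valid witness) holds.

```python
import random

# Hypothetical toy NP-style relation: z is a yes-instance if some list w
# of positive integers sums to z. Purely illustrative.
def check_witness(z, w):
    return sum(w) == z

# Placeholder "witness encryption": CORRECT (Dec recovers b given a valid
# witness) but with NO security whatsoever; it only illustrates interfaces.
def enc(z, b, rng):
    pad = rng.randrange(2)
    return (pad, pad ^ b)  # ciphertext x

def dec(x, z, w):
    pad, masked = x
    return pad ^ masked    # with a valid witness, recover b

# The reduction: map an L-instance z to a sampler D over labeled pairs (x, b).
def reduction_sampler(z, rng):
    b = rng.randrange(2)   # sample b ~ U
    x = enc(z, b, rng)     # x := Enc(z, b)
    return (x, b)

# For z in L with witness w, the circuit Dec(., z, w) predicts b perfectly,
# so D is a yes-instance of CGL, as in the correctness argument above.
rng = random.Random(1)
z, w = 6, [1, 2, 3]
assert check_witness(z, w)
agree = sum(dec(x, z, w) == b
            for (x, b) in (reduction_sampler(z, rng) for _ in range(200)))
print(agree)  # 200: perfect agreement over all samples
```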

Now we show how to obtain a 𝖯/𝗉𝗈𝗅𝗒-secure witness encryption scheme from a half-Levin reduction (R,Rwit) to 𝖢𝖦𝖫. We define a witness encryption scheme as follows. Let z be an instance of L, and if z∈L, let w be a witness. Let a be a secret bit to encrypt.

𝖤𝗇𝖼(z,a) computes 𝒟=R(z), samples (x,b)∼𝒟, and then outputs (x,a⊕b).

𝖣𝖾𝖼((x,c),z,w) computes A:=Rwit(z,w) and then outputs A(x)⊕c.

The correctness of (𝖤𝗇𝖼,𝖣𝖾𝖼) follows from the fact that, for z∈L, Rwit(z,w) is a witness for 𝒟=R(z) being a yes-instance of 𝖢𝖦𝖫: that is, a circuit A of polynomial size such that

Pr_{(x,b)∼𝒟}[A(x)=b]=1.

It follows that A(x)⊕c = b⊕(a⊕b) = a. The security of 𝖤𝗇𝖼 follows from Yao’s argument that a distinguisher implies a next-bit predictor. Specifically, for (x,b)∼𝒟, from a circuit of size s(n) distinguishing 𝖤𝗇𝖼(z,0)=(x,b) from 𝖤𝗇𝖼(z,1)=(x,1⊕b), we would obtain a circuit of size O(s(n)) predicting (with inverse polynomial advantage) b from x. By the honesty of the reduction, we have O(s(n))=𝗉𝗈𝗅𝗒(|x|). This violates the fact that 𝒟 is a no-instance of 𝖢𝖦𝖫s1s2 for every polynomial s2, a contradiction.
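The XOR-based scheme in this direction is mechanical enough to sketch. Below, `R` and `R_wit` are toy stand-ins (the sampler labels each x by its parity, and the "witness circuit" is a perfect parity predictor); only the wiring of `Enc`/`Dec` around them follows the construction above.

```python
import random

# Toy stand-in for a half-Levin reduction to CGL: R(z) yields a sampler D
# whose label is parity(x), and R_wit returns a perfect predictor A.
def R(z):
    def sampler(rng):
        x = tuple(rng.randrange(2) for _ in range(8))
        return (x, sum(x) % 2)          # label b = parity(x)
    return sampler

def R_wit(z, w):
    return lambda x: sum(x) % 2         # "circuit" A with Pr[A(x)=b] = 1

# Witness encryption built from (R, R_wit), exactly as described above:
def Enc(z, a, rng):
    x, b = R(z)(rng)                    # sample (x, b) ~ D = R(z)
    return (x, a ^ b)                   # output (x, a XOR b)

def Dec(ct, z, w):
    x, c = ct
    A = R_wit(z, w)
    return A(x) ^ c                     # A(x) XOR c = b XOR (a XOR b) = a

rng = random.Random(7)
z, w = "instance", "witness"
assert all(Dec(Enc(z, a, rng), z, w) == a for a in (0, 1) for _ in range(50))
print("decryption recovers the secret bit")
```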

Proof of Theorem 6

To modify the argument above for the case of 𝖡𝖯𝖯-secure witness encryption, we note that, in the first direction discussed, the security of WE is only violated if the circuit C can be constructed uniformly in polynomial time. Therefore, we consider 𝖡𝖯𝖯-black-box reductions to search-𝖢𝖦𝖫. That is, roughly, the reduction is only guaranteed to work when its oracle is a PPT algorithm, and we require the oracle to actually construct a small circuit with inverse polynomial advantage over the queried distribution if one exists. For the other direction, we employ the observation that, from a uniform PPT distinguisher algorithm A, one can construct a next-bit predictor PA uniformly.

Proof of Theorem 7

For the case of 𝖦𝖺𝗉𝖣𝗂𝗌𝗍𝖬𝖢𝖲𝖯 in Theorem 7, a challenge is that an arbitrary witness encryption scheme (𝖤𝗇𝖼,𝖣𝖾𝖼) may not yield a distribution 𝒟 that represents a well-defined function. On one hand, if z∈L, then there is guaranteed to be a unique bit b_x such that (x,b_x) is in the support of 𝒟, by the correctness of the witness encryption scheme. However, if z∉L, this may not be true. Luckily, since 𝒟 is supported over logarithmic-length strings, we can check in polynomial time whether each x has a unique label. If not, we use our assumption of a circuit lower bound for 𝖤 to output a no-instance of 𝖦𝖺𝗉𝖣𝗂𝗌𝗍𝖬𝖢𝖲𝖯. In the other direction, from a (deterministic) half-Levin reduction from 𝖲𝖠𝖳 to 𝖦𝖺𝗉𝖣𝗂𝗌𝗍𝖬𝖢𝖲𝖯, we obtain a circuit lower bound for 𝖤 by applying the reduction to a trivial no-instance of 𝖲𝖠𝖳, as in [29].

Proof of Theorem 8 and corollary

Our proof of 𝖭𝖯-hardness of 𝖢𝖦𝖫 with parameters corresponding to semi-proper learning works within the framework developed in [16]. It differs from Theorem 7.2 in that paper mainly in the “completeness” argument. Referring to the notation in the proof of that theorem and of our Lemma 64, we need to construct a circuit of size not exceeding O(θλ).

For one thing, in the proof of [16], using 𝖣𝖯k generators to encode the high-complexity truth-tables fi, the number of input gates to the circuit alone would exceed the desired threshold. Therefore, we use Nisan-Wigderson generators rather than 𝖣𝖯k generators to allow for a short random seed. (A similar idea is used elsewhere in [16].)

More importantly, we apply a recent result of Holmgren and Rothblum showing that “multiselection” of t indices from a string of length m can be done by a circuit of size at most O(m+tlog3(m)) [20]. See Lemma 61. So, we can make 𝗉𝗈𝗅𝗒(n) queries to θ truth-tables fj, each of length λ, with total circuit size at most O(θ(λ+o(λ))). This allows us to compute values of the amplified functions f^j and recover the secret labels b without exceeding the desired size threshold. Note that Uhlig’s Theorem as in [16] cannot accommodate 𝗉𝗈𝗅𝗒(n) queries.

To obtain Corollary 9, which states that semi-proper agnostic learning is 𝖭𝖯-hard, we give a simple randomized reduction from 𝖢𝖦𝖫 to learning that works by empirical estimation of the success probability of the hypothesis produced by the learner. See Lemma 33.

Proof of Theorem 12 and corollaries

To obtain 𝖭𝖯-hardness of 𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫, we apply an argument analogous to the first direction of our main equivalence, along with the fact that, for some distribution family D∈𝖯𝖲𝖠𝖬𝖯, witness encryption for 𝖭𝖯 exists unconditionally (with high probability) with respect to oracles sampled from D. This was demonstrated in [21]. More specifically, the authors show the following.

Lemma 18 ([21] – informal; see Lemma 41 and Definition 40).

For some 𝖭𝖯-complete language L, distribution D∈𝖯𝖲𝖠𝖬𝖯, and constant ℓ, there is a witness encryption scheme (𝖤𝗇𝖼,𝖣𝖾𝖼) for L such that, for all m, with high probability over O∼D_m (with O∈{0,1}^m), (𝖤𝗇𝖼^O,𝖣𝖾𝖼^O) is secure against O-oracle adversaries of size m^{1/ℓ}.

Our idea is to apply the foregoing “oracle witness encryption” along with our main technique for proving that witness encryption implies 𝖭𝖯-hardness of 𝖢𝖦𝖫. We modify the definition of 𝖢𝖦𝖫 accordingly so that all circuits and samplers have access to an oracle randomly sampled from D. See Definition 25. Crucially, to achieve 𝖭𝖯-hardness of D-𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫s(n)s(n)c for a constant c, we require witness encryption with security against adversaries of size s(n)^c, and thus we require an oracle O with truth-table length only 𝗉𝗈𝗅𝗒(s(n)^c)=𝗉𝗈𝗅𝗒(n).

Corollary 13 follows from Theorem 12 analogously to Corollary 9 and Theorem 8, as discussed above: a straightforward reduction from 𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫 to oracle-PAC learning.

To prove Corollary 14, we give a randomized reduction from D-𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫 to the problem 𝖦𝖺𝗉𝖱𝖠𝖬-𝖬𝖨𝖭𝖫𝖳. Given an instance 𝒟 of D-𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫, the reduction starts by sampling a random truth-table O∼D. Then, it defines an instance 𝒟′ of 𝖦𝖺𝗉𝖱𝖠𝖬-𝖬𝖨𝖭𝖫𝖳 which always outputs the entire truth-table O along with a string (x,b) sampled according to 𝒟^O; one can think of 𝒟′ as sampling pairs (x′,b) with x′=(x,O). Clearly, this results in a non-oracle instance of 𝖦𝖺𝗉𝖱𝖠𝖬-𝖬𝖨𝖭𝖫𝖳. In the proof of correctness, we also rely on the fact that a program of size and running time at most s can be simulated by a circuit of size 𝗉𝗈𝗅𝗒(s) (and vice versa). Since the oracles obtained from [21] require length polynomially greater than the size of the adversary they are secure against, we need to consider machines running in time less than the length of their inputs. Given these observations and the definitions of 𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫 and 𝖦𝖺𝗉𝖱𝖠𝖬-𝖬𝖨𝖭𝖫𝖳, the corollary follows easily.

Proof of Theorem 15

We prove Theorem 15 with an argument from [2] as a starting point. (We include a proof below for the reader’s convenience; see Lemma 72.) Let R be a randomized non-adaptive reduction from an 𝖭𝖯-complete language L to agnostic learning over 𝖯𝖲𝖠𝖬𝖯/𝗉𝗈𝗅𝗒. Assume that i.o. one-way functions do not exist. Then, for any D∈𝖯𝖲𝖠𝖬𝖯 and auxiliary-input function {f_z}_{z∈{0,1}*}, there is a machine I that distributionally inverts f_z with high probability over auxiliary inputs z∼D. We think of instances z of L as auxiliary inputs for the function f_z that, roughly, simulates R(z) to obtain a sampling circuit 𝒟 over string/label pairs and then samples from 𝒟. Applebaum et al. show that there is a polynomial-time-computable hypothesis h, using a distributional inverter for f_z, having near-maximum agreement over 𝒟. Running the reduction R and answering oracle queries with hypotheses h defined in this way, we obtain

𝖣𝗂𝗌𝗍𝖭𝖯 ⊆ 𝖧𝖾𝗎𝗋𝖡𝖯𝖯,

where “𝖧𝖾𝗎𝗋”, informally, refers to average-case heuristics that may err by incorrectly outputting 1 or 0.

We next use the fact that the inversion of polynomial-time-computable functions is verifiable. That is, if a machine fails to invert on some instance, it can be defined to “realize this” in polynomial time. More concretely, the machine I above can be defined to output ⊥ on input z if it fails to invert often on some particular f_z, using empirical estimation. From this observation, we further obtain

𝖣𝗂𝗌𝗍𝖭𝖯 ⊆ 𝖠𝗏𝗀𝖡𝖯𝖯,

where 𝖠𝗏𝗀 refers to heuristics that do not err but may output ⊥ rather than 1 or 0 with inverse polynomial probability. Lastly, we apply a recent result of [12], which, from 𝖣𝗂𝗌𝗍𝖭𝖯 ⊆ 𝖠𝗏𝗀𝖡𝖯𝖯, yields worst-case easiness of agnostic PAC learning over distributions in 𝖯𝖲𝖠𝖬𝖯/𝗉𝗈𝗅𝗒. We then run the reduction R for a second time, answering oracle queries with the worst-case agnostic learner guaranteed by [12], to obtain 𝖭𝖯 ⊆ 𝖡𝖯𝖯.

Proofs of Theorem 16 and Theorem 17

To obtain Theorem 16, start by assuming the existence of witness encryption for an 𝖭𝖯-complete language. Then, apply the characterization given by Theorem 4 to obtain a half-Levin reduction from 𝖭𝖯 to 𝖢𝖦𝖫. Since 𝖢𝖦𝖫 reduces to agnostic PAC learning, we may apply Theorem 15 as described above to exclude Heuristica and Pessiland. To summarize, informally,

WE ⟹ 𝖭𝖯 ≤_hl 𝖢𝖦𝖫 (Theorem 4)
⟹ 𝖭𝖯 ≤_tt agnostic PAC-learning over 𝖯𝖲𝖠𝖬𝖯/𝗉𝗈𝗅𝗒
⟹ (𝖭𝖯 ⊄ 𝖡𝖯𝖯 ⟹ i.o. OWF), (Theorem 15)

where ≤_hl refers to a half-Levin reduction and ≤_tt refers to a randomized non-adaptive (i.e., “truth-table”) reduction. Furthermore, Garg et al. show that witness encryption for 𝖭𝖯, together with the existence of a one-way function, yields public-key encryption [11]. This, together with the above, conditionally yields public-key encryption. To summarize,

WE and 𝖭𝖯 ⊄ 𝖡𝖯𝖯 ⟹ WE and i.o. OWF
⟹ i.o. PKE ([11])

Theorem 17 is immediate from the reasoning above together with another application of the characterization in Theorem 4. That is,

𝖭𝖯 ≤_hl 𝖢𝖦𝖫 ⟹ WE (Theorem 4)
⟹ (𝖭𝖯 ⊄ 𝖡𝖯𝖯 ⟹ i.o. PKE) (Theorem 16)

2 Preliminaries

Definition 19.

We say that a distribution is flat if it is equal to the uniform distribution over its support.

Definition 20.

For a distribution family D={D_n}_{n∈ℕ}, s:ℕ→ℕ, and δ:ℕ→[0,1], 𝖲𝖨𝖹𝖤Dδ[s] denotes the class of function families f={f_n:{0,1}^n→{0,1}}_{n∈ℕ} such that, for all sufficiently large n, there exists a circuit C_n of size at most s(n) such that

Pr_{x∼D_n}[C_n(x)=f_n(x)] ≥ δ(n).

2.1 PAC Learning Problems

Definition 21 (Optimal hypotheses).

For n∈ℕ, a function s:ℕ→ℕ, a distribution 𝒟_n supported over {0,1}^n×{0,1}, a parameter ε>0, and a hypothesis h:{0,1}^n→{0,1}, we say that h is (s,ε)-optimal for 𝒟_n if

Pr_{(x,b)∼𝒟_n}[h(x)=b] ≥ max_{f∈𝖲𝖨𝖹𝖤[s(n)]}{Pr_{(x,b)∼𝒟_n}[f(x)=b]} − ε.

For λ∈ℕ, an oracle function O:{0,1}^λ→{0,1}, and a hypothesis g^{(⋅)}:{0,1}^n→{0,1}, we say that g is (s,ε,O)-optimal for 𝒟_n if

Pr_{(x,b)∼𝒟_n}[g^O(x)=b] ≥ max_{f∈𝖲𝖨𝖹𝖤[s(n)]}{Pr_{(x,b)∼𝒟_n}[f^O(x)=b]} − ε.
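For concreteness, the (s,ε)-optimality condition can be checked directly on small explicit examples. The sketch below is a finite stand-in: 𝖲𝖨𝖹𝖤[s(n)] is replaced by an explicit list of candidate functions, and the distribution is given as an explicit table, both simplifications for illustration.

```python
def agreement(h, dist):
    """Pr_{(x,b)~dist}[h(x)=b], for dist given explicitly as
    a list of ((x, b), probability) pairs summing to 1."""
    return sum(p for (x, b), p in dist if h(x) == b)

def is_optimal(h, candidates, dist, eps):
    """The (s, eps)-optimality test, with SIZE[s(n)] replaced by an
    explicit finite candidate list standing in for the circuit class."""
    best = max(agreement(f, dist) for f in candidates)
    return agreement(h, dist) >= best - eps

# A toy distribution over (x, b) with x a single bit and label b = x:
dist = [((0, 0), 0.5), ((1, 1), 0.5)]
candidates = [lambda x: 0, lambda x: 1, lambda x: x, lambda x: 1 - x]
print(is_optimal(lambda x: x, candidates, dist, eps=0.0))   # True: agreement 1
print(is_optimal(lambda x: 0, candidates, dist, eps=0.25))  # False: 0.5 < 0.75
```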
Definition 22 (Agnostic PAC learning circuits).

Consider s1,s2:ℕ→ℕ. We say that an algorithm A agnostically PAC learns 𝖲𝖨𝖹𝖤[s1(n)] by 𝖲𝖨𝖹𝖤[s2(n)] if, for every n∈ℕ, δ>0, ε>0, and joint distribution family 𝒟={𝒟_n}_{n∈ℕ} where each 𝒟_n is supported over {0,1}^n×{0,1}, the following holds. With probability at least 1−δ, A^{𝒟_n} outputs an (s1,ε)-optimal hypothesis h for 𝒟_n of size at most s2(n).

A is given the parameters n, 1/ε, and 1/δ in unary.

We say that A agnostically PAC learns 𝖲𝖨𝖹𝖤[s1] in the improper setting if the above is true except there is no restriction on the complexity of h (except that its size cannot exceed the running time of A).

Definition 23 (Agnostic Oracle-PAC learning circuits).

The definition is analogous to Definition 22, except we also consider distribution families D={D_λ}_{λ∈ℕ} for sampling oracle truth-tables, where D_λ is supported over {0,1}^{2^λ} and uniformly samplable in time 𝗉𝗈𝗅𝗒(2^λ). We say that an algorithm A agnostically oracle-PAC learns 𝖲𝖨𝖹𝖤[s1] by 𝖲𝖨𝖹𝖤[s2] if, for any such D, for every ε,δ>0 and n,λ∈ℕ, the following holds with probability at least 1−δ over O∼D_λ.

For any joint distribution family 𝒟={𝒟_n}_{n∈ℕ} where each 𝒟_n is supported over {0,1}^n×{0,1}, with probability at least 1−δ over the randomness of A, A^{𝒟_n}(𝗍𝗍(O)) outputs an (s1,ε,O)-optimal hypothesis h for 𝒟_n of circuit size at most s2(n).

A is also given the parameters n, 1/ε, and 1/δ in unary.

Applebaum et al. define the following decision version of the PAC learning problem for circuits.

Definition 24 (Computational gap-learning problem (CGL) [2]).

The computational gap-learning problem with size parameters s1,s2:ℕ→ℕ and security parameter ε:ℕ→[0,1], denoted 𝖢𝖦𝖫s1s2[ε], is the promise problem defined as follows. (We note that [2] define 𝖢𝖦𝖫 for a given completeness parameter α:ℕ→[0,1], i.e., a yes-instance has C such that Pr[C(x)=b]≥α(n); in this work, we only consider α=1.) The input consists of a distribution 𝒟 over pairs (x,b)∈{0,1}^n×{0,1} represented as a circuit of size 𝗉𝗈𝗅𝗒(n). (In this work, we abuse notation by taking 𝒟 to refer both to the distribution and the circuit sampling it.)

  • 𝒟 is a yes-instance if there exists a circuit C of size at most s1(n) such that

    Pr_{(x,b)∼𝒟}[C(x)=b]=1.

  • 𝒟 is a no-instance if for every circuit C of size at most s2(n),

    Pr_{(x,b)∼𝒟}[C(x)=b] < 1/2 + ε(n).
Definition 25 (Oracle computational gap-learning problem (𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫)).

Let D={D_λ}_{λ∈ℕ} be a polynomial-time-samplable distribution family such that each D_λ is supported over truth-tables of oracle functions O_λ:{0,1}^λ→{0,1}. The D-oracle computational gap-learning problem with size parameters s1,s2:ℕ→ℕ, security parameter ε:ℕ→[0,1], and oracle parameter λ:ℕ→ℕ, denoted D-𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫s1s2[ε,λ], is defined analogously to 𝖢𝖦𝖫s1s2[ε] except all circuits (including the sampler 𝒟) have access to an oracle randomly drawn from D_{λ(n)}. More specifically,

  • 𝒟^{(⋅)} is a yes-instance if there exists an oracle circuit C^{(⋅)} of size at most s1(n) such that, for every oracle O in the support of D_{λ(n)},

    Pr_{(x,b)∼𝒟^O}[C^O(x)=b]=1.

  • 𝒟^{(⋅)} is a no-instance if, with probability at least 1−ε(n) over O∼D_{λ(n)}, for every oracle circuit C^{(⋅)} of size at most s2(n),

    Pr_{(x,b)∼𝒟^O}[C^O(x)=b] < 1/2 + ε(n).
Definition 26 (𝖦𝖺𝗉g,ε𝖱𝖠𝖬-𝖬𝖨𝖭𝖫𝖳).

𝖦𝖺𝗉𝖱𝖠𝖬-𝖬𝖨𝖭𝖫𝖳 with gap parameter g:ℕ→ℕ and soundness parameter ε^{-1}:ℕ→ℕ, denoted 𝖦𝖺𝗉g,ε𝖱𝖠𝖬-𝖬𝖨𝖭𝖫𝖳, is the decision problem defined as follows. The input consists of a distribution 𝒟 over pairs (x,b)∈{0,1}^n×{0,1} represented as a circuit of size 𝗉𝗈𝗅𝗒(n). The input also includes a threshold parameter s and a running time parameter t, represented in unary.

  • (𝒟,1^s,1^t) is a yes-instance of 𝖦𝖺𝗉g,ε𝖱𝖠𝖬-𝖬𝖨𝖭𝖫𝖳 if there exists a t-time program M of size at most s such that M^x=b for all (x,b) in the support of 𝒟.

  • (𝒟,1^s,1^t) is a no-instance of 𝖦𝖺𝗉g,ε𝖱𝖠𝖬-𝖬𝖨𝖭𝖫𝖳 if for every g(t)-time program M of size at most g(s),

    Pr_{(x,b)∼𝒟}[M^x=b] < 1/2 + ε(n).

2.2 Reductions to Learning

Here we define a form of reduction that is intermediate between the standard notions of a Karp reduction and a Levin reduction.

Definition 27 (Half-Levin reduction).

Consider languages L1,L2∈𝖭𝖯 with witness relations V_{L1},V_{L2}. A half-Levin reduction from L1 to L2 consists of polynomial-time machines R and Rwit with the following properties.

  • For any L1-instance x, x∈L1 ⟺ R(x)∈L2;

  • For any (x,w)∈V_{L1}, it holds that (R(x),Rwit(x,w))∈V_{L2}.

A number of the results in this paper involve half-Levin reductions to 𝖢𝖦𝖫, which belongs to (a promise version of) 𝖬𝖠 but not necessarily 𝖭𝖯. Note that the definition of a half-Levin reduction can be extended to 𝖢𝖦𝖫 in the natural way. Specifically, in a half-Levin reduction from L∈𝖭𝖯 to 𝖢𝖦𝖫s1s2[ε], the machine Rwit, given some z∈L and witness w, is required to produce a circuit C of size at most s1 such that

Pr_{(x,b)∼𝒟}[C(x)=b]=1,

where 𝒟 is the output of R(z). In addition, we say that a reduction to 𝖢𝖦𝖫 is honest if, on an input z of length n, it outputs a distribution supported over {0,1}^{n^{Ω(1)}}×{0,1}.

Definition 28 (Reduction to agnostic PAC learning circuits [2]).

For polynomials s1 and s2 and δ:ℕ→[0,1], a reduction from a language L to agnostic PAC learning 𝖲𝖨𝖹𝖤[s1] by 𝖲𝖨𝖹𝖤[s2] over 𝖯𝖲𝖠𝖬𝖯/𝗉𝗈𝗅𝗒 with failure probability δ consists of a PPT machine R which, on input z∈{0,1}^n, makes queries that are joint distributions 𝒟^{(i)} over {0,1}^{m(n)}×{0,1} for some m:ℕ→ℕ, each represented by a sampling circuit. The oracle provides a hypothesis h_i, represented as a 𝗉𝗈𝗅𝗒(n)-size circuit, as a response to each query. Based on the hypotheses h_i and its own computation, R accepts or rejects the input z.

It is guaranteed that, for some inverse polynomial ε, if the oracle returns an (s1,ε)-optimal hypothesis h of size at most s2(m(n)) for each query 𝒟^{(i)}, then with probability at least 1−δ(n), R(z)=L(z).

If the failure probability is omitted, we take it to be 1/3.

If m(n) ≥ n^{Ω(1)}, we call the reduction honest.

Definition 29 (Reduction to agnostic oracle-PAC learning circuits).

We define reductions to agnostic oracle-PAC learning analogously to Definition 28. Specifically, the reduction samples random O^{(i)}∼D_{λ(n)} and then queries pairs (𝒟^{(i)},𝗍𝗍(O^{(i)})) to its oracle. The guarantee of the reduction holds if the oracle responds with an (s1,ε,O^{(i)})-optimal hypothesis of size at most s2(m(n)) for each query 𝒟^{(i)}.

For our results concerning 𝖡𝖯𝖯-secure witness encryption, we will require non-black-box notions of reduction. We start by recalling the standard definition of a class-specific black-box reduction.

Definition 30 (Class-specific black-box reduction [13]).

For a complexity class C and languages L1 and L2, a PPT oracle machine R is a C-black-box reduction from L1 to L2 if, for any oracle 𝒪∈C deciding L2, R^𝒪 decides L1.

We will actually require a slightly less strict kind of reduction: rather than insisting that L2∈C everywhere, to be correct on a given input z, the reduction will only require that an oracle 𝒪∈C agree with L2 on the queries actually made given that input z. For our purposes, we only need to consider the case of a reduction that generates its queries deterministically.

Definition 31 (Instance-wise class-specific black-box reduction).

Consider a complexity class C, languages L1 and L2, and a non-adaptive oracle machine R that generates its queries deterministically but may use randomness elsewhere. R is an instance-wise C-black-box reduction from L1 to L2 if the following holds. For any oracle 𝒪∈C and instance z of L1, if q1,…,q_m are the oracle queries made by R on input z and 𝒪(q_i)=L2(q_i) for all i∈[m], then R^𝒪(z) correctly decides L1(z).

We now explicitly define our required notion of an instance-wise 𝖡𝖯𝖯-black-box half-Levin reduction to the search version of 𝖢𝖦𝖫. Note that we may drop the qualifier “instance-wise” elsewhere in the presentation. Also note that the definition is extended from those above to the case of a reduction to a search problem.

Definition 32 (Instance-wise 𝖡𝖯𝖯-black-box half-Levin reduction to 𝗌𝖾𝖺𝗋𝖼𝗁-𝖢𝖦𝖫s1s2[ε]).

For polynomials s1 and s2 and a language L, an instance-wise 𝖡𝖯𝖯-black-box half-Levin reduction from L to 𝗌𝖾𝖺𝗋𝖼𝗁-𝖢𝖦𝖫s1s2[ε] is an oracle machine R as follows. On an input z∈{0,1}*, R deterministically makes one query, 𝒟_z: a joint distribution supported on {0,1}^{m(|z|)}×{0,1} for some polynomial m, represented by a sampling circuit.

(Instance-wise 𝗕𝗣𝗣-black-box) Consider an input z∈{0,1}^n for some n∈ℕ, and suppose the oracle A is a PPT algorithm meeting the following condition:

Given the query (z,𝒟_z) made by R on input z, if 𝒟_z is not a no-instance of 𝖢𝖦𝖫s1s2[ε], then with probability at least 1−2^{−Ω(n)} over its internal randomness, A returns a hypothesis h of size at most s2(m(n)) such that

Pr_{(x,b)∼𝒟_z}[h(x)=b] ≥ 1/2 + ε(m(n))/2.

Then, the following are guaranteed for such z.

  • if z∈L, 𝒟_z is a yes-instance of 𝖢𝖦𝖫s1s2[ε], and R^A(z) accepts with probability at least 1−2^{−Ω(n)};

  • if z∉L, 𝒟_z is a no-instance of 𝖢𝖦𝖫s1s2[ε], and R^A(z) rejects with probability at least n^{−O(1)}.

(Half-Levin) Further, there exists a polynomial-time computable function Rwit such that, for any input z∈{0,1}*, if z is a yes-instance of L with witness w, then Rwit(z,w) outputs a witness for 𝒟_z∈𝖢𝖦𝖫s1s2[ε]: that is, a circuit C of size at most s1(m(n)) such that Pr_{(x,b)∼𝒟_z}[C(x)=b]=1.

The lemma below shows that a reduction to 𝖢𝖦𝖫 implies a reduction to agnostic PAC learning as in Definition 28.

Lemma 33.

For a language L, δ:ℕ→[0,1], and polynomials s1,s2,ε^{-1} with ε(n)≤1/8 for n∈ℕ, suppose there is a randomized many-one reduction from L to 𝖢𝖦𝖫s1s2[ε] with failure probability δ. Then, there is a randomized poly-time reduction from L to agnostic PAC learning 𝖲𝖨𝖹𝖤[s1] by 𝖲𝖨𝖹𝖤[s2] with failure probability δ′, where δ′(n):=δ(n)+2^{−Ω(n)} for n∈ℕ.

Proof.

Let R be the assumed reduction to 𝖢𝖦𝖫s1s2[ε]. Define a reduction R′ as follows:

On input an instance z∈{0,1}^n of L and a choice of randomness r, simulate R(z,r) to produce a polynomial-size sampling circuit 𝒟 supported over {0,1}^m×{0,1}. Query 𝒟 to the oracle, obtaining a hypothesis h in return. If the size of h is greater than s2(m), reject. By sampling 𝗉𝗈𝗅𝗒(ε^{-1}(m)⋅n) times from 𝒟, obtain an empirical estimate of Pr_{(x,b)∼𝒟}[h(x)=b]. If the value obtained is less than 1−2ε(m), reject. Otherwise, accept.

We will show that R′ constitutes a reduction to agnostic PAC learning 𝖲𝖨𝖹𝖤[s1] by 𝖲𝖨𝖹𝖤[s2].

Consider an L-instance z. First suppose z∈L. Then, with probability at least 1−δ(n) over r, 𝒟 is a yes-instance of 𝖢𝖦𝖫s1s2[ε]. In this case,

max_{C∈𝖲𝖨𝖹𝖤[s1(m)]}{Pr_{(x,b)∼𝒟}[C(x)=b]} = 1.

Assume that the oracle A returns an (s1,ε(m))-optimal hypothesis h of size at most s2(m): that is,

Pr_{(x,b)∼𝒟}[h(x)=b] ≥ 1−ε(m).

Then, by a Chernoff bound, with probability at least 1−2^{−Ω(n)}, R′ accepts. Overall, for z∈L, R′ accepts with probability greater than 1−(δ(n)+2^{−Ω(n)}).

Now suppose z is a no-instance of L. In this case, with probability at least 1−δ(n) over r, 𝒟 is a no-instance of 𝖢𝖦𝖫s1s2[ε]: that is, for every circuit C of size at most s2(m),

Pr_{(x,b)∼𝒟}[C(x)=b] < 1/2 + ε(m).

Thus, A cannot return a hypothesis h∈𝖲𝖨𝖹𝖤[s2(m)] with success better than 1/2+ε(m) < 1−2ε(m) over 𝒟, so by a Chernoff bound the empirical estimate clears the threshold only with probability 2^{−Ω(n)}, and R′ rejects with probability at least 1−2^{−Ω(n)}. Overall, for z∉L, R′ accepts with probability at most δ(n)+2^{−Ω(n)}, as desired.
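The empirical-estimation step of R′ is simple enough to sketch. The sampler, hypotheses, and parameters below are toy stand-ins; the accept/reject threshold of 1−2ε matches the proof's separation between success ≥ 1−ε and success < 1/2+ε (which are split by 1−2ε whenever ε ≤ 1/8).

```python
import random

def estimate_and_decide(sampler, h, eps, trials, rng):
    """R' after receiving hypothesis h: estimate Pr[h(x)=b] by sampling
    and accept iff the estimate clears the 1 - 2*eps threshold."""
    hits = 0
    for _ in range(trials):
        x, b = sampler(rng)
        hits += (h(x) == b)
    return hits / trials >= 1 - 2 * eps

# Toy yes-instance: labels are a parity function of x.
def yes_sampler(rng):
    x = tuple(rng.randrange(2) for _ in range(6))
    return (x, sum(x) % 2)

good_h = lambda x: sum(x) % 2   # agreement 1 (an optimal hypothesis)
bad_h = lambda x: 0             # agreement ~1/2 on balanced labels

rng = random.Random(0)
eps, trials = 1 / 8, 4000
print(estimate_and_decide(yes_sampler, good_h, eps, trials, rng))  # True
print(estimate_and_decide(yes_sampler, bad_h, eps, trials, rng))   # False
```

With enough samples, a hypothesis of agreement ~1/2 falls below the 0.75 threshold except with exponentially small probability, mirroring the Chernoff-bound step of the proof.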

By a very similar argument, we obtain the following for the oracle-PAC setting. We omit the proof here.

Lemma 34.

For a language L, δ:ℕ→[0,1], D∈𝖯𝖲𝖠𝖬𝖯, λ:ℕ→ℕ with λ(n)=O(log n) for n∈ℕ, and polynomials s1,s2,ε^{-1} with ε(n)≤1/8 for n∈ℕ, suppose there is a randomized many-one reduction from L to D-𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫s1s2[ε,λ] with failure probability δ. Then, there is a randomized poly-time reduction from L to agnostic oracle-PAC learning 𝖲𝖨𝖹𝖤[s1] by 𝖲𝖨𝖹𝖤[s2] with failure probability δ′, where δ′(n):=δ(n)+ε(n)+2^{−Ω(n)} for n∈ℕ.

2.3 Meta-complexity

The following is a natural generalization of 𝖬𝖢𝖲𝖯 to the distributional version, where in addition to the truth table of a function f:{0,1}^n→{0,1}, one is also given an explicit distribution μ over {0,1}^n (as a table of probabilities), and is asked to distinguish “easy” f’s that can be well approximated by small boolean circuits from “hard” f’s that are almost impossible to approximate even by much larger boolean circuits.

Definition 35 (𝖦𝖺𝗉𝖣𝗂𝗌𝗍𝖬𝖢𝖲𝖯).

The gap Distributional Minimum Circuit Size problem with size parameters s1,s2:ℕ→ℕ and advantage parameter ε:ℕ→[0,1], denoted 𝖦𝖺𝗉𝖣𝗂𝗌𝗍𝖬𝖢𝖲𝖯s1s2[ε], is the decision problem defined as follows. The input consists of the truth table of a function f:{0,1}^n→{0,1} (represented as a truth-table of length 2^n bits) and a probability distribution μ:{0,1}^n→[0,1] (represented as a table of length 2^{O(n)} bits).

  • (f,μ) is a yes-instance if there exists a circuit C of size at most s1(n) such that

    Pr_{x∼μ}[C(x)=f(x)]=1.

  • (f,μ) is a no-instance if for every circuit C of size at most s2(n),

    Pr_{x∼μ}[C(x)=f(x)] < 1/2 + ε(n).

We note that, for an appropriate choice of the approximation parameters and distribution μ, 𝖦𝖺𝗉𝖣𝗂𝗌𝗍𝖬𝖢𝖲𝖯 generalizes other variants of 𝖬𝖢𝖲𝖯, e.g., 𝖬𝖢𝖲𝖯 of [16] and an average-case version 𝖠𝗏𝖾𝖬𝖢𝖲𝖯 of [45].

2.4 Secret Sharing

See [4] for a survey.

Definition 36 (Access Structures Represented by Monotone Formulae).

An access structure is a monotone collection 𝒜⊆2^{[n]} of non-empty subsets of [n]. For T⊆[n], we say that T is authorized if it belongs to 𝒜 and unauthorized otherwise. We say that a monotone formula φ:{0,1}^n→{0,1} represents 𝒜 if φ(χ_T)=1 exactly when T∈𝒜, where χ_T is the characteristic vector of T.

Definition 37 (Efficient Secret Sharing Scheme).

An efficient secret sharing scheme for an access structure 𝒜⊆2^{[n]} represented by a monotone formula φ consists of a pair of poly-time algorithms Share and Rec with the following properties. Share takes three inputs; the first is the formula φ:{0,1}^n→{0,1}, the second is a “secret” b∈{0,1}, and the third is a choice of “randomness” r∈{0,1}^{|φ|}. Rec also takes three inputs: the formula φ, a subset of “shares” v_T=(v_i)_{i∈T}∈({0,1}^{|φ|})^{|T|}, and the set T⊆[n].

  • Correctness: Authorized parties can recover the secret. Consider any b∈{0,1} and r∈{0,1}^{|φ|}, let (v_1,…,v_n) be the output of Share(φ,b;r), and let T be an authorized set. Then Rec(φ,v_T,T)=b.

  • Privacy: Unauthorized parties cannot learn anything about the secret, in an information-theoretic sense. Let T be an unauthorized set. Then, for uniformly random b∈{0,1} and r∈{0,1}^{|φ|}, the random variable Share(φ,b;r)_T is statistically independent of the random variable b.

Lemma 38 ([27, 5]).

For every access structure 𝒜, there exists an efficient secret sharing scheme as in Definition 37. The circuit size required to simulate Rec is at most O(|v_T|^2).

Proof sketch.

We will describe the construction given in [5]. Let φ:{0,1}^n→{0,1} be a monotone formula representing 𝒜. Starting at the root of the tree representation of φ, the Share algorithm assigns values to the nodes in a recursive manner as follows. For a secret b∈{0,1}, assign b to the root, and instantiate all shares v_i for i∈[n] to be empty. Then:

  • If the current gate is ∨ and the current value is s, assign s to both children.

  • If the current gate is ∧ and the current value is s, randomly choose t∈{0,1}. Assign t to one child and t⊕s to the other.

  • If the current gate is a variable i∈[n] and the current value is s, include s in the share v_i.

Note that the amount of randomness required is at most |φ|. The Rec algorithm proceeds from the leaves of the tree back to the root. Start by assigning shares (v_i)_{i∈T} to leaves using the set T. Then:

  • If the current gate is ∨ and a child has been assigned a value s, assign s to the parent.

  • If the current gate is ∧ and the children have values s and t, assign t⊕s to the parent.

  • If the current gate is the root, return the value assigned.
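The recursive Share/Rec procedure above translates directly into code. Below is a minimal sketch of this construction, with formulas represented as nested tuples (a representation chosen here for brevity); each party's share is the list of bits reaching its leaves, in a fixed leaf order.

```python
import random
from collections import defaultdict

# Monotone formula as nested tuples: ("var", i), ("or", l, r), ("and", l, r).

def share(phi, secret, rng):
    """Sharing over the formula tree: an OR gate duplicates its value,
    an AND gate splits it as t and t XOR s."""
    shares = defaultdict(list)
    def go(node, s):
        kind = node[0]
        if kind == "var":
            shares[node[1]].append(s)       # leaf: append bit to party i's share
        elif kind == "or":
            go(node[1], s); go(node[2], s)  # either child suffices
        else:                               # "and": both children needed
            t = rng.randrange(2)
            go(node[1], t); go(node[2], t ^ s)
    go(phi, secret)
    return shares

def rec(phi, shares, T):
    """Reconstruct from the shares of parties in T; None if T is unauthorized."""
    idx = defaultdict(int)                  # per-party cursor into its share list
    def go(node):
        kind = node[0]
        if kind == "var":
            i = node[1]; k = idx[i]; idx[i] += 1
            return shares[i][k] if i in T else None
        l, r = go(node[1]), go(node[2])
        if kind == "or":
            return l if l is not None else r
        return l ^ r if l is not None and r is not None else None
    return go(phi)

# Access structure represented by (1 AND 2) OR 3:
phi = ("or", ("and", ("var", 1), ("var", 2)), ("var", 3))
rng = random.Random(0)
for secret in (0, 1):
    sh = share(phi, secret, rng)
    assert rec(phi, sh, {1, 2}) == secret   # authorized
    assert rec(phi, sh, {3}) == secret      # authorized
    assert rec(phi, sh, {1}) is None        # unauthorized: no value recovered
print("sharing and reconstruction consistent")
```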

2.5 Witness Encryption

Definition 39 (Witness Encryption [11]).

Consider a language L∈𝖭𝖯 with witness relation R_L, a class of function families Γ, a security function ε:ℕ→[0,1], and a length function ℓ:ℕ→ℕ. A (Γ,ε)-secure ℓ-length witness encryption scheme for L consists of polynomial-time algorithms 𝖤𝗇𝖼 and 𝖣𝖾𝖼 as follows.

  • Given an L-instance x∈{0,1}^n, a message bit b∈{0,1}, and randomness r∼𝒰_{𝗉𝗈𝗅𝗒(n)}, 𝖤𝗇𝖼(x,b;r) outputs a ciphertext c of length at most ℓ(n).

  • Given a ciphertext c and an instance/witness pair (x,w)∈R_L, 𝖣𝖾𝖼(c,x,w) deterministically outputs a bit b′∈{0,1}.

Moreover, 𝖤𝗇𝖼 and 𝖣𝖾𝖼 have the properties below.

  • Correctness: For all sufficiently large n, any b∈{0,1}, x∈{0,1}^n∩L, and w such that (x,w)∈R_L,

    Pr_r[𝖣𝖾𝖼(𝖤𝗇𝖼(x,b;r),x,w)=b]=1.

  • Security: For any A={A_n}_{n∈ℕ}∈Γ, sufficiently large n, and x∈{0,1}^n∖L,

    |Pr[A_n(x,𝖤𝗇𝖼(x,1;r))=1] − Pr[A_n(x,𝖤𝗇𝖼(x,0;r))=1]| < ε(n).

If ε is omitted, we assume it is some negligible function. If ℓ is omitted, we allow the output length of 𝖤𝗇𝖼 to be any polynomial.

Definition 40 (Witness Encryption in Oracle Worlds [21]).

Consider a language L∈𝖭𝖯 with witness relation RL, a class of function families Γ, a security function ε:ℕ→[0,1], and a family of distributions D={Dλ}λ∈ℕ, where each Dλ is supported over {0,1}^{2^λ}. A (Γ,ε)-secure witness encryption scheme for L with respect to D is defined as in Definition 39, except the algorithms 𝖤𝗇𝖼 and 𝖣𝖾𝖼 both take an oracle, and they satisfy the following properties.

  • Correctness: For all sufficiently large n,λ, O in the support of Dλ, b∈{0,1}, x∈{0,1}n∩L, and w such that (x,w)∈RL,

    Prr[𝖣𝖾𝖼O(1λ,𝖤𝗇𝖼O(1λ,x,b;r),x,w)=b]=1.
  • Security: For all sufficiently large n,λ, with probability 1−ε(λ) over O∼Dλ, for any A={An,λ}n,λ∈Γ and any x∈{0,1}n∖L,

    |Prr[An,λO(𝖤𝗇𝖼O(1λ,x,1;r))=1]−Prr[An,λO(𝖤𝗇𝖼O(1λ,x,0;r))=1]|<ε(λ).

The following result from Huang, Ilango, and Ren was originally stated with security against oracle programs of polynomial size and query complexity; we observe that the same holds for polynomial-size circuits.

Lemma 41 ([21]).

There exist an 𝖭𝖯-complete language L, a constant ℓ, and a family of distributions D={Dλ}λ∈ℕ, where each Dλ is supported over {0,1}^{2^λ} and samplable in time 𝗉𝗈𝗅𝗒(2^λ), such that there is a (𝖲𝖨𝖹𝖤[2^{λ/ℓ}],2^{−λ/ℓ})-secure witness encryption scheme for L with respect to D.

2.6 Hardness of Approximation

Definition 42 (𝖬𝖬𝖲𝖠).

For g,θ,Δ:ℕ→ℕ, 𝖦𝖺𝗉g𝖬𝖬𝖲𝖠θΔ denotes the following promise problem.

The input consists of a monotone formula φ over variables [n] and of length at most Δ(n) as a binary string. Let 𝖬𝖬𝖲𝖠(φ) denote min{∑i∈[n]α(i) ∣ α:[n]→{0,1} satisfies φ}.

  • ΠY={φ ∣ 𝖬𝖬𝖲𝖠(φ)≤θ(n)}

  • ΠN={φ ∣ 𝖬𝖬𝖲𝖠(φ)>g(n)θ(n)}

Lemma 43 ([8, 9, 16]).

There exist g,θ,Δ:ℕ→ℕ with g(n)=n^{1/(loglogn)^{O(1)}}, θ(n)≤n, and Δ a polynomial with Δ(n)≥n for n∈ℕ as follows. For every language L∈𝖭𝖯, there is a half-Levin reduction from L to 𝖦𝖺𝗉g𝖬𝖬𝖲𝖠θΔ.

Definition 44 (𝖢𝖬𝖬𝖲𝖠 [16]).

For g,ε^{−1},θ,Δ:ℕ→ℕ, 𝖦𝖺𝗉g,ε𝖢𝖬𝖬𝖲𝖠θΔ denotes the following promise problem. The input consists of a collection Φ={φ1,…,φm} of monotone formulas over variables [n], where each φi has length at most Δ(n) as a binary string, along with a weight function w:[n]→ℕ. For an assignment α:[n]→{0,1}, let w(α) denote ∑i∈[n]α(i)w(i).

  • ΠY={(Φ,w) ∣ there exists α:[n]→{0,1} such that 
    w(α)≤θ(n) and Prφ∼Φ[φ(α)=1]=1}

  • ΠN={(Φ,w) ∣ for all α:[n]→{0,1} such that w(α)≤θ(n)g(n),
    Prφ∼Φ[φ(α)=1]<ε}

Lemma 45 ([9, 16]).

There exist a constant γ>0 and a function θ:ℕ→ℕ such that for all Δ:ℕ→ℕ with Δ(n)=O(logn), for every language L∈𝖭𝖯, there is a half-Levin reduction from L to 𝖦𝖺𝗉Δ^γ,Δ^{−γ}𝖢𝖬𝖬𝖲𝖠θΔ.

2.7 One-way functions and Inversion

Definition 46 (δ-invertible (auxiliary-input) functions).

For n,m∈ℕ, a function f:{0,1}n→{0,1}m, a PPT algorithm A, and δ:ℕ→[0,1], we say that A is a δ-inverter for f if

Prx∼𝒰n[A(1n,f(x))∈f^{−1}(f(x))]≥1−δ(n).

In the case of an auxiliary-input function fz:{0,1}ℓ→{0,1}m, where ℓ,m∈ℕ, we say that A is a δ-inverter for fz if

Prx∼𝒰ℓ[A(z,fz(x))∈fz^{−1}(fz(x))]≥1−δ(|z|).
Definition 47 (Infinitely-often one-way function).

A polynomial-time computable function family f={fn:{0,1}n→{0,1}m(n)}n∈ℕ is an infinitely-often one-way function if, for every PPT algorithm A, there exists an inverse polynomial δ such that, for infinitely many n, A is not a δ-inverter for fn. (It is known that a weak one-way function is equivalent to a standard (strong) one-way function [47], so we will not distinguish between the two in this work.)

Definition 48 (Auxiliary-input one-way function).

A polynomial-time computable auxiliary-input function family f={fz:{0,1}ℓ(|z|)→{0,1}m(|z|)}z∈{0,1}∗ is an auxiliary-input one-way function if, for every PPT algorithm A, there exists an inverse polynomial δ such that, for infinitely many z∈{0,1}∗, A is not a δ-inverter for fz.

Definition 49 (Statistical Distance).

The statistical distance between two probability distributions D(1) and D(2) over domain {0,1}n, denoted Δ(D(1),D(2)), is defined as

(1/2)·∑x∈{0,1}n|D(1)(x)−D(2)(x)|,

or equivalently,

maxT⊆{0,1}n|Prx∼D(1)[x∈T]−Prx∼D(2)[x∈T]|.
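The equivalence of the two formulas is easy to check numerically. The following sketch, on a toy domain with hypothetical probability values, brute-forces the maximum over all test sets T:

```python
from itertools import chain, combinations

# Two toy distributions over a 3-element domain (probabilities sum to 1).
D1 = {'00': 0.5, '01': 0.3, '10': 0.2}
D2 = {'00': 0.2, '01': 0.3, '10': 0.5}

# First definition: half the L1 distance.
half_l1 = 0.5 * sum(abs(D1[x] - D2[x]) for x in D1)

# Second definition: maximize the probability gap over every subset T of the domain.
domain = list(D1)
subsets = chain.from_iterable(combinations(domain, k) for k in range(len(domain) + 1))
max_test = max(abs(sum(D1[x] for x in T) - sum(D2[x] for x in T)) for T in subsets)

assert abs(half_l1 - max_test) < 1e-12   # the two definitions agree (here: 0.3)
```

The maximum is attained by the set T of points where D(1) exceeds D(2), which is why the two quantities coincide.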
Definition 50 (Distributionally invertible auxiliary-input functions).

Consider an auxiliary input function fz computable uniformly in polynomial time given z∈{0,1}n and an input x∈{0,1}ℓ(n). The function fz is said to be distributionally invertible if for every constant b>0 there is a PPT algorithm I such that

Δ((x,fz(x)),(I(z,fz(x)),fz(x)))≤1/n^b,

where x∼𝒰ℓ(n). We refer to the machine I as an n^{−b}-distributional inverter for fz.

Lemma 51 (Strong inversion implies distributional inversion [25]).

For every polynomial-time computable auxiliary-input function fz and constant b, there is a constant c such that the following holds. If there exists an n^{−c}-inverter for fz, then there exists an n^{−b}-distributional inverter for fz.

3 Proofs of Main Results

3.1 Connections between WE and learning

3.1.1 Proof of Theorem 4

Lemma 52.

Consider a language L∈𝖭𝖯. For s2,ε^{−1},ℓ:ℕ→ℕ, suppose a (𝖲𝖨𝖹𝖤[s2],ε)-secure ℓ-length witness encryption scheme (𝖤𝗇𝖼,𝖣𝖾𝖼) exists for L. Let the polynomial s1 be such that s1(n) upper bounds the running times of 𝖤𝗇𝖼 and 𝖣𝖾𝖼 on L-instances of length n. Let s1′(ℓ(n))=s1(n)^2, s2′(ℓ(n))=s2(n), and 2ε′(ℓ(n))=ε(n).

Then, there is a half-Levin reduction from L to 𝖢𝖦𝖫s1′s2′[ε′] mapping L-instances of length n to distributions supported over {0,1}ℓ(n)×{0,1}.

Proof.

Given an L-instance z∈{0,1}n, the reduction to 𝖢𝖦𝖫s1′s2′[ε′] will output a distribution 𝒟 as follows.

Sample b∼𝒰. Let x:=𝖤𝗇𝖼(z,b)∈{0,1}ℓ(n). Output (x,b).

First consider the case that z∈L. Let w be a witness for z. Let C be the circuit that, with z and w hard-wired, given x as input, outputs b:=𝖣𝖾𝖼(x,z,w). Note that the size of C is less than s1(n)^2=s1′(ℓ(n)). Moreover, since C is efficiently computable given z and w, the reduction is a half-Levin reduction. By the correctness of the witness encryption scheme, we have

Pr(x,b)∼𝒟[C(x)=b]=1.

Thus, 𝒟 is a yes-instance of 𝖢𝖦𝖫s1′s2′[ε′].

Now consider the case that z∉L. Toward a contradiction, suppose there exists a circuit C of size at most s2′(ℓ(n))=s2(n) such that

Pr(x,b)∼𝒟[C(x)=b]≥1/2+ε′(ℓ(n))=1/2+ε(n)/2.

Then,

1/2+ε(n)/2 ≤ (1/2)·Pr𝖤𝗇𝖼,b[C(x)=0∣b=0]+(1/2)·Pr𝖤𝗇𝖼,b[C(x)=1∣b=1]
=1/2+(1/2)·(Pr[C(x)=1∣b=1]−Pr[C(x)=1∣b=0]),

which implies that

|Pr[C(𝖤𝗇𝖼(z,1))=1]−Pr[C(𝖤𝗇𝖼(z,0))=1]|≥ε(n),

contradicting the security of the witness encryption scheme. Thus, 𝒟 is a no-instance of 𝖢𝖦𝖫s1′s2′[ε′].

This completes the proof of the lemma.
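The identity used above — that prediction advantage over the labeled distribution is half the distinguishing advantage — can be checked empirically. The sketch below uses a hypothetical leaky "encryption" (a stand-in for 𝖤𝗇𝖼 on a no-instance that is not fully secure) and a toy predictor:

```python
import random

def toy_enc(z, b, leak=0.8):
    """Hypothetical 'encryption': the first output bit equals b with
    probability `leak`. Not a real WE scheme; a stand-in for Enc(z, b)."""
    first = b if random.random() < leak else 1 - b
    return (first, random.randrange(2))

def predictor(x):
    """The circuit C of the proof: read the leaked bit."""
    return x[0]

random.seed(0)
N = 200_000
# Agreement Pr[C(x) = b] over (x, b) ~ D, with b uniform.
agree = sum(predictor(toy_enc('z', b)) == b
            for b in (random.randrange(2) for _ in range(N))) / N
# Distinguishing advantage Pr[C(Enc(z,1)) = 1] - Pr[C(Enc(z,0)) = 1].
dist = (sum(predictor(toy_enc('z', 1)) for _ in range(N)) -
        sum(predictor(toy_enc('z', 0)) for _ in range(N))) / N
# Agreement advantage is half the distinguishing advantage.
assert abs((agree - 0.5) - dist / 2) < 0.02
```

Here agreement is about 0.8, so its advantage over 1/2 is about 0.3, while the distinguishing advantage is about 0.6.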

Observe that a 𝖯/𝗉𝗈𝗅𝗒-secure encryption scheme is (𝖲𝖨𝖹𝖤[s],ε)-secure for all polynomials s and ε^{−1}. Thus, we obtain the corollary below, which gives the first direction of Theorem 4.

Corollary 53.

Consider any language L∈𝖭𝖯, and suppose a 𝖯/𝗉𝗈𝗅𝗒-secure witness encryption scheme (𝖤𝗇𝖼,𝖣𝖾𝖼) exists for L. Then, there exist a polynomial s1 and a pair of machines (R,Rwit) such that, for all polynomials s2,ε^{−1}, (R,Rwit) is an honest half-Levin reduction from L to 𝖢𝖦𝖫s1s2[ε].

Lemma 54.

Consider any polynomials s1,m:ℕ→ℕ, any s2,ε^{−1}:ℕ→ℕ, and any language L∈𝖭𝖯. Suppose there is a half-Levin reduction (R,Rwit) from L to 𝖢𝖦𝖫s1s2[ε], where on L-instances of length n, R outputs distributions supported over {0,1}m(n)×{0,1} in time tR(n), and Rwit runs in time tRwit(n).

Then, there is a (𝖲𝖨𝖹𝖤[s2′],ε′)-secure (m(n)+1)-length witness encryption scheme for L, where s2′(n)=s2(m(n))/2, ε′(n)=2ε(m(n)), 𝖤𝗇𝖼 runs in time O(tR(n)), and 𝖣𝖾𝖼 runs in time O(tRwit(n)).

Proof.

We define a witness encryption scheme for L as follows.

For z∈{0,1}n and a∈{0,1}, 𝖤𝗇𝖼(z,a) computes 𝒟=R(z), samples (x,b)∼𝒟, and then outputs (x,a⊕b).

𝖣𝖾𝖼((x,c),z,w) computes A:=Rwit(z,w) and then outputs A(x)⊕c.

We first show the correctness of (𝖤𝗇𝖼,𝖣𝖾𝖼). Suppose z is a yes-instance of L with witness w. Then, by the correctness of the half-Levin reduction, A=Rwit(z,w) is a witness for 𝒟=R(z)∈𝖢𝖦𝖫s1s2[ε]; that is, A is a circuit of size at most s1(m(n)) such that

Pr(x,b)∼𝒟[A(x)=b]=1.

Then, for any a∈{0,1} and any random choice of (x,b) made by 𝖤𝗇𝖼, it holds that A(x)=b, so 𝖣𝖾𝖼((x,a⊕b),z,w) correctly outputs a=b⊕(a⊕b).

We now argue for security. Suppose z is a no-instance of L. Then 𝒟=R(z) is a no-instance of 𝖢𝖦𝖫s1s2[ε]; that is, for any circuit A of size at most s2(m(n)),

Pr(x,b)∼𝒟[A(x)=b]<1/2+ε(m(n)).

Let A′ be any circuit of size at most s2′(n)=s2(m(n))/2. Toward a contradiction, suppose (we remove the absolute value signs without loss of generality: a similar argument applies if the left-hand side is at most −ε′(n))

Pr[A′(𝖤𝗇𝖼(z,0))=1]−Pr[A′(𝖤𝗇𝖼(z,1))=1]≥ε′(n)=2ε(m(n)).

By definition of 𝖤𝗇𝖼, the above can be written as

Pr(x,b)∼𝒟[A′(x,b)=1]−Pr(x,b)∼𝒟[A′(x,1⊕b)=1]≥2ε(m(n)).

Let PA′ denote Yao’s next-bit predictor applied to A′. In particular,

On input x∈{0,1}m(n), PA′ samples a random bit c∼𝒰 and outputs c if A′(x,c)=1, and 1−c otherwise.

By a standard argument, it holds for some fixed choice of c∈{0,1} that

Pr(x,b)∼𝒟[PA′(x)=b]≥1/2+ε(m(n)).

Finally, observe that PA′ with randomness fixed as above can be implemented as a circuit of size at most s2(m(n)). This contradicts the assumption that 𝒟 is a no-instance of 𝖢𝖦𝖫s1s2[ε]. We conclude that 𝖤𝗇𝖼 is (𝖲𝖨𝖹𝖤[s2′],ε′)-secure.

This completes the proof of the lemma.
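The scheme defined in this proof is a one-time pad whose key is the label of the reduction's sample. A minimal sketch, with a synthetic sampler (the label is the parity of x) standing in for the reduction output R(z) and for the hypothesis Rwit(z,w):

```python
import random

def sample_D(n=8):
    """Stand-in for the distribution D = R(z) on a yes-instance: the label b
    is perfectly computed from x by a small 'circuit' (here, parity)."""
    x = tuple(random.randrange(2) for _ in range(n))
    return x, sum(x) % 2

def enc(a):
    """Enc(z, a): pad the message bit a with the sample's label b."""
    x, b = sample_D()
    return x, a ^ b

def dec(ct, hypothesis):
    """Dec with A = Rwit(z, w); here the hypothesis is parity itself."""
    x, c = ct
    return hypothesis(x) ^ c

parity = lambda x: sum(x) % 2
random.seed(0)
for a in (0, 1):
    # Correctness: A(x) = b always, so A(x) XOR (a XOR b) recovers a.
    assert all(dec(enc(a), parity) == a for _ in range(100))
```

Security (not modeled here) rests on the no-instance case: any small circuit that decrypts noticeably better than guessing would yield a next-bit predictor for the label.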

Since a witness encryption scheme that is (𝖲𝖨𝖹𝖤[s],ε)-secure for all polynomials s and ε^{−1} is 𝖯/𝗉𝗈𝗅𝗒-secure, we obtain the corollary below, which gives the second direction of Theorem 4.

Corollary 55.

Consider any language L∈𝖭𝖯. Suppose there exist a polynomial s1 and a pair of machines (R,Rwit) such that, for all polynomials s2,ε^{−1}, (R,Rwit) is an honest half-Levin reduction from L to 𝖢𝖦𝖫s1s2[ε]. Then, there is a 𝖯/𝗉𝗈𝗅𝗒-secure witness encryption scheme for L.

3.1.2 Proof of Theorem 6

Lemma 56.

Consider any language L∈𝖭𝖯, and suppose a 𝖡𝖯𝖯-secure witness encryption scheme (𝖤𝗇𝖼,𝖣𝖾𝖼) exists for L.

Then, there exist a polynomial s1 and a pair of machines (R,Rwit) such that, for all polynomials s2 and ε^{−1}, (R,Rwit) is an honest 𝖡𝖯𝖯-black-box half-Levin reduction from L to 𝗌𝖾𝖺𝗋𝖼𝗁-𝖢𝖦𝖫s1s2[ε].

Proof.

Consider a language L∈𝖭𝖯 with witness relation RL⊆{0,1}n×{0,1}n^c for some constant c and an L-instance z∈{0,1}n. Let (𝖤𝗇𝖼, 𝖣𝖾𝖼) be a 𝖡𝖯𝖯-secure ℓ-length witness encryption scheme for L. Let the polynomial s1 be such that s1(ℓ(n)) upper bounds both n^c and the running time of 𝖣𝖾𝖼 on L-instances of length n. As in Lemma 52, we consider the following distribution 𝒟.

Sample b∼𝒰. Let x:=𝖤𝗇𝖼(z,b)∈{0,1}ℓ(n) and output (x,b).

Our reduction R will make one query to its oracle: namely, (z,𝒟). After receiving a hypothesis h in return, R will sample from 𝒟 and simulate h to obtain an empirical estimate of Pr(x,b)∼𝒟[h(x)=b]. R will accept iff the estimate of this value is greater than 1/2+ε(ℓ(n))/3.

Let A be a PPT algorithm, z∈{0,1}n an instance of L, and s2,ε^{−1} arbitrary polynomials. Assume that A meets the condition in Definition 32 on z: namely, letting (z,𝒟) be the query made by R(z), if 𝒟 is not a no-instance of 𝖢𝖦𝖫s1s2[ε], then with probability at least 1−2^{−Ω(n)} over its internal randomness, A returns a hypothesis h with agreement at least 1/2+ε(ℓ(n))/2 over 𝒟.

First consider the case that z∈L. Let w∈{0,1}n^c be a witness for z. Let C be the circuit that, with z and w hard-wired, given x as input, outputs b:=𝖣𝖾𝖼(x,z,w). Note that the size of C is at most s1(ℓ(n))^2. Without loss of generality, assume s2(ℓ(n))≥s1(ℓ(n))^2. By the correctness of the witness encryption scheme, we have

maxC′∈𝖲𝖨𝖹𝖤[s2(ℓ(n))]{Pr(x,b)∼𝒟[C′(x)=b]} ≥ maxC′∈𝖲𝖨𝖹𝖤[s1(ℓ(n))^2]{Pr(x,b)∼𝒟[C′(x)=b]}
≥ Pr(x,b)∼𝒟[C(x)=b]
=1.

This implies that 𝒟 is a yes-instance of 𝖢𝖦𝖫(s1)^2s2[ε]. Moreover, by the assumption on A, A outputs a hypothesis with agreement at least 1/2+ε(ℓ(n))/2 over 𝒟 with probability at least 1−2^{−Ω(n)}, so the reduction accepts with probability at least 1−2^{−Ω(n)} overall.

Now consider the case that z∉L. Toward a contradiction, suppose that with probability at least 1−ε(ℓ(n))/4, A returns a hypothesis h such that

Pr(x,b)∼𝒟[h(x)=b]≥1/2+ε(ℓ(n))/4.

Let A′ be the algorithm that, on input (z,x), constructs the distribution 𝒟 from z as above, simulates A on (z,𝒟) to obtain a hypothesis h, and then outputs h(x). Then,

PrA′;(x,b)∼𝒟[A′(z,x)=b]≥(1−ε(ℓ(n))/4)·(1/2+ε(ℓ(n))/4)>1/2+ε(ℓ(n))/16.

By the reasoning in Lemma 52 and the definition of 𝒟, this implies that

|Pr[A′(z,𝖤𝗇𝖼(z,1))=1]−Pr[A′(z,𝖤𝗇𝖼(z,0))=1]|≥ε(ℓ(n))/8,

contradicting the security of the witness encryption scheme on input z. So, with probability greater than ε(ℓ(n))/4, A returns a hypothesis h with agreement over 𝒟 less than 1/2+ε(ℓ(n))/4, in which case the reduction rejects with probability 1−2^{−Ω(n)}. Overall, the reduction rejects with probability greater than ε(ℓ(n))/8=n^{−O(1)}, as desired. Finally, by the assumption on A, 𝒟 must be a no-instance of 𝖢𝖦𝖫(s1)^2s2[ε].

Note also that the reduction is half-Levin, since a circuit C simulating 𝖣𝖾𝖼 can always be constructed from an instance z and witness w in polynomial time.

The correctness of the 𝖡𝖯𝖯-black-box half-Levin reduction follows. This completes the proof of the lemma.

Lemma 57.

Consider any language L∈𝖭𝖯. Suppose there exist a polynomial s1 and a pair of machines (R,Rwit) such that, for all polynomials s2,ε^{−1}, (R,Rwit) is a 𝖡𝖯𝖯-black-box half-Levin reduction from L to 𝗌𝖾𝖺𝗋𝖼𝗁-𝖢𝖦𝖫s1s2[ε]. Then, 𝖡𝖯𝖯-secure witness encryption exists for L.

Proof.

Let (R,Rwit) be a 𝖡𝖯𝖯-black-box half-Levin reduction from L to 𝗌𝖾𝖺𝗋𝖼𝗁-𝖢𝖦𝖫, and let the polynomial m:ℕ→ℕ be such that R maps L-instances of length n to distributions supported over {0,1}m(n)×{0,1}. We define a witness encryption scheme for L as in Lemma 54. In particular:

For z∈{0,1}n and a∈{0,1}, 𝖤𝗇𝖼(z,a) computes the query 𝒟 made by R(z), samples (x,b)∼𝒟, and then outputs (x,a⊕b)∈{0,1}m(n)+1.

𝖣𝖾𝖼((x,b),z,w) computes C:=Rwit(z,w) and then outputs C(x)⊕b.

We will first show the correctness of (𝖤𝗇𝖼,𝖣𝖾𝖼). Suppose z∈{0,1}n is a yes-instance of L with witness w. By the definition of a half-Levin reduction to 𝗌𝖾𝖺𝗋𝖼𝗁-𝖢𝖦𝖫, C=Rwit(z,w) is a circuit of size at most s1(m(n)) such that

Pr(x,b)∼𝒟[C(x)=b]=1.

Then, for any a∈{0,1},

Pr(x,b)∼𝒟[𝖣𝖾𝖼((x,a⊕b),z,w)=a]=1,

as desired.

We will now show the security of (𝖤𝗇𝖼,𝖣𝖾𝖼). Toward a contradiction, suppose that for some PPT algorithm A and polynomial q, for infinitely many n and no-instances z∈{0,1}n of L,

Pr[A(z,𝖤𝗇𝖼(z,0))=1]−Pr[A(z,𝖤𝗇𝖼(z,1))=1]≥1/q(n).

Let the polynomial s2 be such that s2(m(n))^{1/2} upper bounds the running time of A on inputs of length m(n)+1+n, and let ε(m(n)):=1/(2q(n)).

Consider the PPT algorithm A′ that, on input (z,𝒟), outputs a circuit describing PA as in Lemma 54, fixing the bit c that maximizes success probability over 𝒟, which A′ determines by sampling from 𝒟 and taking empirical estimates. By the same reasoning as in Lemma 54, with probability at least 1−2^{−Ω(n)}, A′ outputs a hypothesis PA of size at most s2(m(n)) such that

Pr(x,b)∼𝒟[PA(x)=b]≥1/2+ε(m(n)).

Thus, for infinitely many inputs z, A′ meets the condition on the oracle of an instance-wise 𝖡𝖯𝖯-black-box reduction to 𝗌𝖾𝖺𝗋𝖼𝗁-𝖢𝖦𝖫s1s2[ε]: namely, given the query (z,𝒟) made by R(z), A′ returns a hypothesis with non-trivial success probability over 𝒟. However, 𝒟 is not a no-instance of 𝖢𝖦𝖫s1s2[ε] for the no-instance z of L. This contradicts the correctness of R.

3.1.3 Proof of Theorem 7

Lemma 58.

Consider a language L∈𝖭𝖯. For s2,ε^{−1}:ℕ→ℕ and a constant c, assume that (𝖲𝖨𝖹𝖤[s2],ε)-secure (clogn)-length witness encryption exists for L. Let the polynomial s1 be such that s1(n) upper bounds the running time of 𝖣𝖾𝖼 on L-instances of length n. Let s1′(clogn)=4s1(n)logn, s2′(clogn)=s2(n), and 2ε′(clogn)=ε(n). Assume, for some 𝖤-computable distribution family D, that 𝖤 ⊄ 𝗂𝗈-𝖲𝖨𝖹𝖤D^{1/2+ε(n)}[s2(n)].

Then, there is a half-Levin reduction (R,Rwit) from L to 𝖦𝖺𝗉𝖣𝗂𝗌𝗍𝖬𝖢𝖲𝖯s1′s2′[ε′] mapping L-instances of length n to truth-tables of length n^c, where Rwit runs in time at most s1′(clogn).

Proof.

The reduction R will first define a distribution 𝒟 as in the proof of Lemma 52: namely,

Sample b∼𝒰. Let x:=𝖤𝗇𝖼(z,b)∈{0,1}clogn. Output (x,b).

Let 𝒟x denote the marginal distribution of 𝒟 over x∈{0,1}clogn.

The reduction R then proceeds as follows. If, for all x∈𝗌𝗎𝗉𝗉(𝒟x), there is a unique bx∈{0,1} such that (x,bx)∈𝗌𝗎𝗉𝗉(𝒟), R constructs a truth-table f:{0,1}clogn→{0,1} such that

f(x) = bx if x∈𝗌𝗎𝗉𝗉(𝒟x), and f(x) = 0 otherwise,

and outputs (f,𝒟x). If, for any x, both (x,0) and (x,1) are in 𝗌𝗎𝗉𝗉(𝒟), then R outputs (fclogn,Dclogn), where f∈𝖤 and f∉𝖲𝖨𝖹𝖤D^{1/2+ε(n)}[s2(n)]. This function exists by our assumption, and both it and a table representing Dclogn can be constructed uniformly in time 𝗉𝗈𝗅𝗒(n). Note that (fclogn,Dclogn) is a no-instance of 𝖦𝖺𝗉𝖣𝗂𝗌𝗍𝖬𝖢𝖲𝖯s1′s2′[ε′].

In the case that z∈L, for all x∈𝗌𝗎𝗉𝗉(𝒟x), there is a unique bx∈{0,1} such that (x,bx)∈𝗌𝗎𝗉𝗉(𝒟); this follows from the correctness of the witness encryption scheme. Thus, using a circuit for 𝖣𝖾𝖼 as in the proof of Lemma 52, we get that (f,𝒟x) is a yes-instance of 𝖦𝖺𝗉𝖣𝗂𝗌𝗍𝖬𝖢𝖲𝖯s1′s2′[ε′]. Let Rwit be the function that, given z and a witness w for z∈L, outputs a circuit describing 𝖣𝖾𝖼(⋅,z,w). It follows that (R,Rwit) is a half-Levin reduction. Note that Rwit runs in time at most 4s1(n)logn.

In the case that z∉L, if both (x,0),(x,1)∈𝗌𝗎𝗉𝗉(𝒟), then R correctly outputs a no-instance of 𝖦𝖺𝗉𝖣𝗂𝗌𝗍𝖬𝖢𝖲𝖯. Now supposing that the labels associated with strings x are unique, let C be any circuit of size s2′(clogn)=s2(n). Toward a contradiction, suppose

Pr(x,bx)∼𝒟[C(x)=bx]=Prx∼𝒟x[C(x)=f(x)]≥1/2+ε′(clogn).

As in Lemma 52, this implies

|Prr[C(𝖤𝗇𝖼(z,1;r))=1]−Prr[C(𝖤𝗇𝖼(z,0;r))=1]|≥ε(n),

contradicting the security of the witness encryption scheme. Thus, (f,𝒟x) is a no-instance of 𝖦𝖺𝗉𝖣𝗂𝗌𝗍𝖬𝖢𝖲𝖯s1′s2′[ε′].

This completes the proof of the lemma.
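On a yes-instance, the reduction's first step — turning the short-ciphertext distribution into a truth table — amounts to the following sketch; the samplers here are hypothetical stand-ins, one with a unique label per string and one without:

```python
import random

def build_truth_table(sample, log_len, trials=2000):
    """Enumerate samples (x, b). If some x appears with both labels, fail
    (the reduction would instead output a hard function as a no-instance).
    Otherwise return the table with f(x) = b_x on the support, 0 elsewhere."""
    table = {}
    for _ in range(trials):
        x, b = sample()
        if table.get(x, b) != b:
            return None                 # both (x,0) and (x,1) in the support
        table[x] = b
    return [table.get(x, 0) for x in range(2 ** log_len)]

random.seed(0)
# Unique labels: b_x is a fixed function of x (here, x mod 2).
unique = lambda: (lambda x: (x, x % 2))(random.randrange(8))
f = build_truth_table(unique, 3)
assert f == [0, 1, 0, 1, 0, 1, 0, 1]

# Ambiguous labels: the same x can carry either label.
ambiguous = lambda: (random.randrange(8), random.randrange(2))
assert build_truth_table(ambiguous, 3) is None
```

In the lemma the support is only polynomially large (ciphertexts have length clog n), which is what makes writing out the full table feasible.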

Lemma 59.

Consider a constant c and s1,s2,ε^{−1}:ℕ→ℕ. Suppose that there is a half-Levin reduction (R,Rwit) from 𝖲𝖠𝖳 to 𝖦𝖺𝗉𝖣𝗂𝗌𝗍𝖬𝖢𝖲𝖯s1s2[ε], where R maps 𝖲𝖠𝖳-instances of length n to truth-tables of length n^c, and Rwit runs in time at most O~(s1(clogn)) on 𝖲𝖠𝖳-instances of length n. Let s1′(n)=s1(clogn), s2′(n)=s2(clogn)/2, and ε′(n)=2ε(clogn).

Then, there is a (𝖲𝖨𝖹𝖤[s2′],ε′)-secure O~(s1′(n))-decryption-time (clogn+1)-length witness encryption scheme for 𝖲𝖠𝖳. Moreover, there exists an 𝖤-computable distribution family D such that 𝖤 ⊄ 𝗂𝗈-𝖲𝖨𝖹𝖤D^{1/2+ε′(n)}[s2′(n)].

Proof.

We argue as in the proof of Lemma 54. In particular, the witness encryption scheme for 𝖲𝖠𝖳 is defined as follows.

For φ∈{0,1}n and a∈{0,1}, 𝖤𝗇𝖼(φ,a) computes (f,D)=R(φ), samples x∼D, and then outputs (x,a⊕f(x)).

𝖣𝖾𝖼((x,c),φ,w) computes A:=Rwit(φ,w) and then outputs A(x)⊕c.

Note that the output length of 𝖤𝗇𝖼 is clogn+1 and that the running time of 𝖣𝖾𝖼 is at most O~(s1′(n)).

For φ∈𝖲𝖠𝖳, A=Rwit(φ,w) is a circuit of size at most s1(clogn) such that

Prx∼D[A(x)=f(x)]=1.

The correctness of (𝖤𝗇𝖼,𝖣𝖾𝖼) follows as in Lemma 54.

For φ∉𝖲𝖠𝖳, by the same reasoning as in Lemma 54, we have that 𝖤𝗇𝖼 is (𝖲𝖨𝖹𝖤[s2′],ε′)-secure.

To see the second conclusion of the lemma, let φ∈{0,1}n be a trivial no-instance of 𝖲𝖠𝖳, constructible uniformly in time O(n). Let (f,D)=R(φ). By the correctness of the reduction (R,Rwit) and the definition of 𝖦𝖺𝗉𝖣𝗂𝗌𝗍𝖬𝖢𝖲𝖯, we have that, for all circuits C of size s2(clogn),

Prx∼D[C(x)=f(x)]<1/2+ε(clogn).

Also note that tables representing both f and D are uniformly constructible in time 2^{O(clogn)}=𝗉𝗈𝗅𝗒(n), as desired.

3.2 NP-hardness of learning polynomial-size circuits

We start with the following auxiliary lemmas.

Lemma 60 ([40, 44]).

For all sufficiently large ℓ,m∈ℕ with m<2^ℓ, for some d=O(ℓ^2·logm) there are sets 𝒮1,…,𝒮m⊆[d] constructible in time 𝗉𝗈𝗅𝗒(m,d) such that |𝒮i|=ℓ and

∑j<i 2^{|𝒮i∩𝒮j|} ≤ m−1

for all i∈[m].

Lemma 61 ([20]).

For n,t∈ℕ, consider the “multiselector” function Selnt that, given a string x∈{0,1}n and indices i1,…,it∈[n], outputs (xi1,…,xit).

For all n,t∈ℕ, there is a logspace-uniform boolean circuit of size O(n+t·log^3(n)) computing Selnt.
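Functionally the multiselector is just a projection; a one-line sketch follows. (The content of the lemma is that this projection has a small logspace-uniform circuit, which the sketch does not capture.)

```python
def multiselect(x, idxs):
    """Sel_n^t: project the string x onto positions i_1, ..., i_t (1-indexed)."""
    return tuple(x[i - 1] for i in idxs)

# Repeated indices are allowed, as when a circuit queries the same bit twice.
assert multiselect("010011", (1, 4, 4, 6)) == ('0', '0', '0', '1')
```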

Lemma 62 ([26]; see [16] and [15]).

There exist constants a,c∈ℕ as follows. Consider a function f:{0,1}n→{0,1} and a parameter ε=o(1). There is a function f̂:{0,1}^{an}→{0,1} such that the truth-table of f̂ can be computed from that of f in time 𝗉𝗈𝗅𝗒(2^n/ε) given ε, and f̂ has the following properties.

  • Local encodability: There is a non-adaptive f-oracle circuit of size at most (n/ε)^c computing f̂, making at most 1/ε^2 queries to f.

  • Decodability: For any oracle 𝒪:{0,1}∗→{0,1} and sufficiently large time bound t,

    𝖪^{(2^n·t/ε)^c,𝒪}(f) ≤ 𝖪_{1/2+ε}^{t,𝒪}(f̂)+2^{n/2}·(1/ε)^c+o(2^n).

The following is the “algorithmic information extraction” lemma of [16] modified to give a set B defined in terms of 𝖪𝗉𝗈𝗅𝗒 rather than 𝖪. It is easy to see from the proof in [16] that B can be defined in this way.

Lemma 63.

Consider an algorithm D running in time tD, parameters n,Δ,ε^{−1},ℓ,d∈ℕ, and gi:{0,1}ℓ→{0,1} for i∈[n]. Let

((𝒮1(1),…,𝒮Δ(1)),…,(𝒮1(n),…,𝒮Δ(n)))

be a family of ℓ-sized subsets of [d]. Define

G(z)=(z,w(1)(z),…,w(n)(z)),

where for z∈{0,1}d,

w(i)(z)=(gi(z|𝒮1(i)),…,gi(z|𝒮Δ(i)))∈{0,1}Δ.

Define a set

B:={i∈[n] ∣ 𝖪_{1/2+ε/4}^{2tD,D}(gi) ≤ 4(Δn)·∑j<i 2^{|𝒮i∩𝒮j|}}.

Then, for any advice string β∈{0,1}^{O(Δn)},

|Pr[D(β,G(z))]−Pr[D(β,G(z)|B)]|<2Δnε,

where G(z)|B=(z,u(1),…,u(n)) is such that u(i)=w(i)(z) for i∈B and u(i)∼𝒰Δ for i∉B. The probabilities above are over z∼𝒰d and u(i)∼𝒰Δ.

Lemma 64.

For some polynomial p and constant k, for any polynomials g,Δ,θ,γ^{−1} with θ(n)≤n, Δ(n)≥n, and γ(n)≥4n^{−4} for n∈ℕ, there exists a polynomial-time machine R satisfying the following. For a polynomial s1 with s1(n)≥γ^{−1}(Δ(n))^k for n∈ℕ, let

λ(n)=s1(n)/(θ(n)·logn),

and consider an advice string f=(fi∈{0,1}λ(n))i∈[n] such that, for every subset B⊆[n],

𝖪^{p(λ(n))}(fB)≥|B|·λ(n)−2nlogn.

Then, given as input f and an instance φ of 𝖦𝖺𝗉g𝖬𝖬𝖲𝖠θΔ over n variables, R outputs a circuit sampling a flat distribution 𝒟 supported over {0,1}^{2Δ(n)n}×{0,1} as follows.

  • If φ is a yes-instance of 𝖦𝖺𝗉g𝖬𝖬𝖲𝖠θΔ, then there exists a circuit C of size at most s1(n) such that

    Pr(x,b)∼𝒟[C(x)=b]=1.
  • If φ is a no-instance of 𝖦𝖺𝗉g𝖬𝖬𝖲𝖠θΔ, then for every circuit C of size at most

    s2(n):=s1(n)·g(n)/(logn)^3,

    it holds that

    Pr(x,b)∼𝒟[C(x)=b]<1/2+γ(n).
Proof.

Let φ be a given instance of 𝖦𝖺𝗉g𝖬𝖬𝖲𝖠θΔ, where φ is over n variables. Let a,c be as in Lemma 62. Consider an arbitrary polynomial s1 such that s1(n)≥γ(Δ)^{−8c}. Define ℓ:=a·log(λ). Let f=(fi∈{0,1}λ)i∈[n] satisfy the condition stated in the lemma.

The reduction 𝑹.

Identifying each fi with a function fi:{0,1}^{logλ}→{0,1}, let f̂i:{0,1}ℓ→{0,1} be the encoded function as in Lemma 62 with ε=γ(Δ)/(2Δ^2). Let (𝒮1(1),…,𝒮Δ(1)),…, (𝒮1(n),…,𝒮Δ(n)) be the sets of size ℓ guaranteed by Lemma 60 with d=O(ℓ^2·log(Δn))=o(Δn).

For any choice of z∈{0,1}d, r∈{0,1}Δ, and b∈{0,1}, define (v1,…,vn)=Share(φ,b;r)∈({0,1}Δ)^n, and define the string x(z,r,b) as

(z,(f̂1(z|𝒮1(1)),…,f̂1(z|𝒮Δ(1)))⊕v1,…,(f̂n(z|𝒮1(n)),…,f̂n(z|𝒮Δ(n)))⊕vn),

padded to have length 2Δn. The output of the reduction is a circuit sampling the following distribution 𝒟:

For uniformly random z∼𝒰d, r∼𝒰Δ, and b∼𝒰, output (x(z,r,b),b).

Note that 𝒟 has a 𝗉𝗈𝗅𝗒(n)-size sampling circuit and that the marginal distribution of 𝒟 over x is flat.

Completeness.

To see the completeness of the reduction, suppose 𝖬𝖬𝖲𝖠(φ)≤θ, and let Tα={j∈[n]∣α(j)=1} where α:[n]→{0,1} is an assignment of Hamming weight at most θ satisfying φ.

We will construct a circuit C of size at most s1(n) such that C(x)=b for all (x,b) in the support of 𝒟. C takes as input a string x=(z,w1,…,wn) with z∈{0,1}d and wi∈{0,1}Δ for i∈[n], padded to have length 2nΔ. C uses θΔ encoding circuits as in Lemma 62 to compute the values

{f̂j(z|𝒮m(j)) ∣ j∈Tα, m∈[Δ]}

by making at most θΔ·(1/ε)^2<γ^{−4}(Δ) total queries to fTα. C answers these queries with the string fTα hard-wired along with a multiselector circuit as in Lemma 61. C then XORs the outputs of the encoding circuits with the appropriate inputs wj to obtain an authorized set of shares (vj)j∈Tα. Finally, C applies Rec with φ and Tα hard-wired to obtain the secret b.

C consists of 2nΔ input gates, θλ gates hard-wiring fTα, O(θλ+γ^{−4}(Δ)·log^3(θλ))=O(θλ) gates for multiselection, θΔ encoding circuits each of size (logλ/ε)^c=o(λ), O(θΔ) gates to XOR, and O((θΔ)^2) gates for Rec. This amounts to at most O(θλ)<s1(n) gates in total.

Thus, there exists a circuit C of size at most s1(n) such that

Pr(x,b)∼𝒟[C(x)=b]=1.
Soundness.

We now argue for the soundness of the reduction. Suppose 𝖬𝖬𝖲𝖠(φ)>θ·g(n). For a fixed circuit C of size s2(n)=s1(n)·g(n)/(logn)^3, define a program D as follows. D takes as advice a string β(b,r):=(b,v1,…,vn)∈{0,1}^{O(nΔ)}, where (v1,…,vn)=Share(φ,b;r), and accepts its input (z,w1,…,wn) if and only if

C(z,w1⊕v1,…,wn⊕vn)=b.

Note that D runs in time at most tD:=4s2(n)·log(s2(n)). Now, define a set B as in Lemma 63, applying the lemma with gi=f̂i for i∈[n]. In particular,

B:={i∈[n] ∣ 𝖪_{1/2+ε/4}^{λ^a·tD,D}(f̂i) ≤ (Δn)^2}.

By Lemma 63,

|Pr[D(β,G(z))]−Pr[D(β,G(z)|B)]| < 2Δnε
≤ γ(Δ), (1)

where G and G(z)|B are as defined in that lemma.

We claim that B is not too large. Note that for i∈B, by the decodability statement of Lemma 62, for some polynomial q,

𝖪^{q(λ),D}(fi) ≤ 𝖪_{1/2+ε/4}^{λ^a·tD,D}(f̂i)+λ^{1/2}·(1/ε)^c+o(λ)
= o(λ). (definition of B)

Thus, for p(λ):=n^2·q(λ)·tD,

|fB|=|B|·λ ≤ 𝖪^{p(λ)}(fB)+2nlogn
≤ 𝖪^{n^2·q(λ),D}(fB)+|D|+2nlogn
≤ ∑i∈B 𝖪^{q(λ),D}(fi)+O(|C|log|C|)+|β|+O(nlogn)
< s1(n)·g(n)/(2logn)+o(|B|λ)
= λ·θ·g(n)/2+o(|B|λ).

It follows that

|B| ≤ (θ·g(n)/2)·(1+o(1))
< θ·g(n),

so B is not an authorized set.

Now observe that

Prz,r,b[C(x(z,r,b))=b] = Pr[D(β(b,r),G(z))=1] (definition of D)
< Pr[D(β(b,r),G(z)|B)=1]+γ(Δ) (Eq. (1))
≤ 1/2+γ(Δ),

where the last line follows from the security of Share. We conclude that for any circuit C of size at most s2(n),

Pr(x,b)∼𝒟[C(x)=b]<1/2+γ(Δ)≤1/2+γ(n).

This completes the proof of the lemma.

Lemma 65 (Claim 7.3 of [16]).

For a uniformly random string f=(fi)i∈[n]∼(𝒰λ)^n, it holds with probability ≥1−2^{−n} that for all subsets B⊆[n],

𝖪(fB)≥|B|·λ−2nlogn.
Theorem 66 (Restatement of Theorem 8).

For some constant c and g:ℕ→ℕ with g(n)=n^{1/(loglogn)^{O(1)}} for n∈ℕ, for any polynomials γ^{−1},s1 such that s1(n)≥γ^{−c}(n) for n∈ℕ, there is a randomized many-one reduction from 𝖲𝖠𝖳 to 𝖢𝖦𝖫s1s1·g[γ].

Proof.

By Lemma 43, there is a reduction R0 from 𝖲𝖠𝖳 to 𝖦𝖺𝗉g0𝖬𝖬𝖲𝖠θΔ for some

g0(n)=n^{1/(loglogn)^{O(1)}},

polynomial Δ(n)≥n, and θ(n)≤n. Let k be the constant of Lemma 64, and define the constant c such that n^c>Δ(n)^k. For arbitrary polynomials γ^{−1},s1, define γ′(n):=γ(2nΔ(n)) and s1′(n):=s1(2nΔ(n)) for n∈ℕ. We will apply Lemma 64 with parameters s1′,γ′, g0, Δ, and θ.

More specifically, our reduction will first apply R0 to obtain an instance φ of 𝖦𝖺𝗉g0𝖬𝖬𝖲𝖠θΔ on n variables, and then it will sample a random string f=(fi)i∈[n]∼(𝒰λ)^n. It will then apply R as in Lemma 64 to obtain an instance 𝒟 of 𝖢𝖦𝖫s1′s2[γ′] supported over {0,1}^{2nΔ(n)}×{0,1}, where

s2(2nΔ(n)) = s1(2nΔ(n))·g0(n)/(logn)^3
= s1(2nΔ(n))·g(2nΔ(n)),

where we define g(2nΔ(n)):=g0(n)/(logn)^3.

By Lemma 65, with probability ≥1−2^{−n} over the choice of f, f satisfies the condition of Lemma 64. The desired statement is then immediate from the correctness of R and the definition of 𝖢𝖦𝖫.

Corollary 9 now follows by combining Theorem 8 with Lemma 33.

3.3 NP-hardness of improper learning in other settings

Theorem 67 (Restatement of Theorem 12).

There exist a distribution D∈𝖯𝖲𝖠𝖬𝖯, an 𝖭𝖯-complete language L, and a polynomial s1 such that, for all polynomials s2,ε^{−1}, there is a function λ:ℕ→ℕ such that λ(n)=O(logn) for n∈ℕ and L reduces to D-𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫s1s2[ε,λ] under an honest half-Levin reduction.

Proof.

Let D={Dλ}λ∈ℕ∈𝖯𝖲𝖠𝖬𝖯 be the distribution family and L the 𝖭𝖯-complete language of Lemma 41 with witness relation VL. That is, each Dλ is supported over truth-tables of functions O:{0,1}λ→{0,1}, and there is an oracle witness encryption scheme (𝖤𝗇𝖼,𝖣𝖾𝖼) for L with respect to D secure against programs of size and query complexity 2^{λ/ℓ} and with advantage 2^{−λ/ℓ} for some constant ℓ.

Let d be such that n^d upper-bounds the running time of 𝖣𝖾𝖼(⋅) on L-instances of length n, and define s1(n)=n^{2d}. Let s2 and ε be given, and define λ(n)=ℓ·log(s2(n)/ε(n))=O(logn) for n∈ℕ.

We now give a reduction from L to D-𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫s1s2[ε,λ]. Given an L-instance z∈{0,1}n for some n∈ℕ, the reduction outputs 𝒟(⋅) as follows.

Given access to an oracle O:{0,1}λ(n)→{0,1}, sample b∼𝒰, and let x=𝖤𝗇𝖼O(1λ(n),z,b). Output (x,b).

First suppose z∈L. Let w be such that (z,w)∈VL. Consider the oracle circuit C(⋅) that, with (z,w) hard-wired, given input x and oracle O, simulates 𝖣𝖾𝖼O(1λ(n),x,z,w) and returns its output b. Note that C(⋅) has size at most s1(n). By the correctness guarantee of (𝖤𝗇𝖼,𝖣𝖾𝖼), for every O in the support of Dλ(n) and every (x,b) in the support of 𝒟O, it holds that CO(x)=b. Therefore, in this case, the reduction outputs a yes-instance of D-𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫s1s2[ε,λ].

Now suppose z∉L, and let C(⋅) be any oracle circuit of size at most s2(n). Note that

2^{λ/ℓ}=s2(n)/ε(n)>s2(n),

so (𝖤𝗇𝖼,𝖣𝖾𝖼) is secure against C. Suppose for some oracle O that

Pr(x,b)∼𝒟O[CO(x)=b]≥1/2+ε(n).

By the same reasoning as in Lemma 52,

|Pr[CO(𝖤𝗇𝖼O(1λ(n),z,1))=1]−Pr[CO(𝖤𝗇𝖼O(1λ(n),z,0))=1]|≥ε(n)>2^{−λ(n)/ℓ}.

By the security of the witness encryption scheme, this occurs with probability at most 2^{−λ(n)/ℓ}<ε(n) over O∼Dλ(n). Thus, in this case, the reduction outputs a no-instance of D-𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫s1s2[ε,λ].

By combining Theorem 12 with Lemma 34, we obtain the following.

Corollary 68 (Restatement of Corollary 13).

For some polynomial s, for any constant c, it is 𝖭𝖯-hard under a randomized one-query reduction to agnostically oracle-PAC learn 𝖲𝖨𝖹𝖤[s(n)] by 𝖲𝖨𝖹𝖤[s(n)^c].

Theorem 12 also implies 𝖭𝖯-hardness of learning RAMs in the improper setting, as indicated below.

Corollary 69 (Restatement of Corollary 14).

For all polynomials g and ε^{−1}, the problem 𝖦𝖺𝗉g,ε𝖱𝖠𝖬-𝖬𝖨𝖭𝖫𝖳 is 𝖭𝖯-hard under a randomized many-one reduction.

Proof.

We show that D-𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫 as in Theorem 12 reduces to 𝖦𝖺𝗉𝖱𝖠𝖬-𝖬𝖨𝖭𝖫𝖳. The main idea is to include the whole truth-table of an oracle O sampled from Dλ(n) in all the inputs to a RAM.

Let the polynomials g and ε^{−1} be given. Let s1 be as in Theorem 12. Let s2(n)=g(s1(n))^4, and let λ be as in Theorem 12 applied with s2 and ε. Let D be the distribution guaranteed by Theorem 12, and recall that Dλ(n) is supported over truth-tables of length 2^{λ(n)}, where s2(n)/ε(n)<2^{λ(n)}≤𝗉𝗈𝗅𝗒(n).

The reduction, given an instance 𝒟(⋅) of D-𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫s1s2[ε,λ], samples O∼Dλ(n) and then defines the following distribution 𝒟′:

Sample (x,b)∼𝒟O, let x′=(x,O), and output (x′,b).

The reduction outputs (𝒟′,1^{s1(n)^2},1^{s1(n)^2}).

First suppose that 𝒟(⋅) is a yes-instance of D-𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫. In this case, there is an oracle circuit C(⋅) of size at most s1(n) such that, for all O in the support of Dλ(n) and all (x,b) in the support of 𝒟O, CO(x)=b. Note that C can be simulated by a program M of size and running time at most s1(n)^2. We obtain that for all (x′,b)∈𝗌𝗎𝗉𝗉(𝒟′),

M(x′)=M(x,O)=b,

so (𝒟′,1^{s1(n)^2},1^{s1(n)^2}) is a yes-instance of 𝖦𝖺𝗉g,ε𝖱𝖠𝖬-𝖬𝖨𝖭𝖫𝖳.

Now suppose that 𝒟(⋅) is a no-instance of D-𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫. Consider any random-access machine M of size and running time at most g(s1(n)^2). Note that M can be simulated by an oracle circuit C of size at most s2(n), where we may let the “x” part of the input x′=(x,O)∈{0,1}n×{0,1}^{2^{λ(n)}} be a standard (non-oracle) input to C. Then, since 𝒟(⋅) is a no-instance of D-𝗈𝗋𝖺𝖼𝗅𝖾-𝖢𝖦𝖫, with probability at least 1−ε(n) over O∼Dλ(n),

Pr(x,b)∼𝒟O[CO(x)=b]<1/2+ε(n).

This implies, for such oracles O, that

Pr(x′,b)∼𝒟′[M(x′)=b]<1/2+ε(n).

Thus, with probability at least 1−ε(n) over the randomness of the reduction,
(𝒟′,1^{s1(n)^2},1^{s1(n)^2}) is a no-instance of 𝖦𝖺𝗉g,ε𝖱𝖠𝖬-𝖬𝖨𝖭𝖫𝖳, as desired.

3.4 Excluding Pessiland, Heuristica, and Minicrypt

We show that if the hardness of learning can be based on the hardness of 𝖭𝖯 (under a non-adaptive reduction), then so can the existence of OWFs. Applying our main results connecting witness encryption and the 𝖭𝖯-hardness of learning, we obtain Theorems 16 and 17 as consequences.

3.4.1 Proof of Theorem 15

We will make use of the following lemmas.

Lemma 70.

Let f={fz}z∈{0,1}∗ be a polynomial-time computable auxiliary input function. If infinitely-often one-way functions do not exist, then for every distribution family D∈𝖯𝖲𝖠𝖬𝖯 and constant b, there exists a PPT machine I such that for all sufficiently large n,

Prz∼Dn[I is an n^{−b}-inverter for fz]≥1−1/n^b.

Moreover, if I does not n^{−b}-invert fz for some z∈{0,1}n, then for any y∈{0,1}∗,

PrI[I(z,y) outputs ⊥]≥1−2^{−n}.
Proof.

Assume that i.o. one-way functions do not exist. Let S be a sampler for D. For a constant b, let A be an n^{−4b}-inverter for the function g defined as g(r,x):=fS(r)(x). Define the PPT machine I as follows.

On input (z,y), sample uniformly random strings x1,…,xn^{8b} and let yi:=fz(xi). Compute the fraction γ of i∈[n^{8b}] such that A(z,yi) fails to output a pre-image of yi under fz. If γ is less than 2/n^{4b}, output A(z,y). Otherwise, output ⊥.

We first claim that, with probability at least 1n2b over zD, A(z,) is an nb inverter for fz. Toward a contradiction, suppose the claim does not hold. That is, with probability greater than n2b over zDn, A(z,) only inverts fz with success probability 1nb. But then the overall probability of A inverting g, for uniformly random r and y, is at most

(1 − n^{−2b}) + n^{−2b}(1 − n^{−b}) < 1 − n^{−4b},

contradicting our assumption on A. Moreover, by a Chernoff bound, the probability that γ is greater than 2/n^{4b} is less than 2^{−n}. By the definition of I and a union bound, this proves the first part of the lemma.

For the second part, suppose that A(z,·) does not n^b-invert f_z. Then the expected value of γ in the definition of I is at least n^{−b}. By a Chernoff bound, the probability that γ is less than 2/n^{4b} is less than 2^{−n}. This completes the proof of the lemma.

We will require the following lemma, which states that if 𝖭𝖯 is easy on average for randomized algorithms, then agnostic PAC learning over efficiently samplable distributions is possible.

Lemma 71 ([12]).

If 𝖣𝗂𝗌𝗍𝖭𝖯 ⊆ 𝖠𝗏𝗀𝖡𝖯𝖯, then there is a PPT machine that, for any polynomial s and all sufficiently large n, agnostically learns 𝖲𝖨𝖹𝖤[s(n)] in the improper setting over distributions in 𝖯𝖲𝖠𝖬𝖯/𝗉𝗈𝗅𝗒.

It remains to exhibit a reduction from 𝖲𝖠𝖳 to inverting an auxiliary-input one-way function, assuming 𝖭𝖯-hardness of learning. The following lemma is essentially as in [2]. We include a proof for completeness.

Lemma 72 ([2]).

Suppose there is a randomized non-adaptive honest reduction R and a polynomial s : ℕ → ℕ such that, for every constant c, R reduces 𝖲𝖠𝖳 to agnostically PAC learning 𝖲𝖨𝖹𝖤[s(n)] by 𝖲𝖨𝖹𝖤[s(n)^c]. Then there exist a poly-time computable auxiliary-input function family f = {f_φ}_{φ∈{0,1}^*}, a constant b, and a PPT oracle machine A satisfying the following: for any n, any φ ∈ {0,1}^n, and any PPT machine I that n^b-distributionally inverts f_φ,

A^I(φ) = 𝖲𝖠𝖳(φ).
Proof.

Let R be the assumed reduction from 𝖲𝖠𝖳 to agnostic PAC learning, let the polynomial t_R denote its running time, and let the inverse polynomial ε be such that R requires an (s, 2ε)-optimal hypothesis in response to each query. Define δ(n) := ε(n)/t_R(n)^2. Define an auxiliary-input function

f_φ(r_0, i, r) := (r_0, i, D_i(φ, r_0)(r)[1]),

where D_i(φ, r_0) (henceforth denoted simply D_i) is the ith query of R on input φ and randomness r_0, a sampler for a distribution (X_i, Y_i). In particular, for r ∼ 𝒰_{t_R}, D_i(r)[1] represents a random sample x ∼ X_i. Note that all such samplers D_i can be simulated uniformly in polynomial time given (φ, r_0, i), by the efficiency of R.

Let I be a PPT machine that distributionally inverts f_φ. In particular, we require the success probability of I to be such that, with probability 1 − o(1) over r_0 ∼ 𝒰_{t_R(n)}, for every i ∈ [t_R(n)], I δ(n)^{10}-distributionally inverts f_φ(r_0, i, ·).

For i ∈ [t_R(n)], define hypotheses h_i as follows:

On input x, invoke I to find a pre-image r of (r_0, i, x) under f_φ(r_0, i, ·), and then return y := D_i(r)[2],

and define a machine A (taking oracle access to I) as follows:

On input φ ∈ {0,1}^n and randomness r_0 ∈ {0,1}^{t_R(n)}, simulate R(φ; r_0), obtaining queries D_1, …, D_m. Answer the ith query of R with the hypothesis H_i, which evaluates h_i on δ(n)^{−8} independent choices of the randomness (used by I) and then outputs the majority value y ∈ {0,1}. After answering all the queries this way and completing the simulation of R, accept φ iff R accepts.
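The majority-vote amplification inside H_i can be sketched as follows (a toy illustration; `trials` plays the role of δ(n)^{−8}, and the noisy hypothesis is a stand-in):

```python
import random

def majority_hypothesis(h, trials):
    """Amplify a randomized hypothesis h by majority vote over
    independent runs, as H_i does with h_i over delta(n)^{-8} fresh
    choices of the inverter's randomness."""
    def H(x):
        ones = sum(h(x) for _ in range(trials))
        return 1 if 2 * ones > trials else 0
    return H

# Example: a hypothesis correct with probability 0.9 per run yields,
# after majority vote, an error probability exponentially small in
# the number of trials (this is the Chernoff argument of Case I).
noisy = lambda x: 1 if random.random() < 0.9 else 0
amplified = majority_hypothesis(noisy, 1001)
```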

For i ∈ [t_R(n)], let f_opt^{(i)} be a fixed function (of arbitrary complexity) that maximizes

Pr_{(x,y)∼(X_i,Y_i)}[f_opt^{(i)}(x) = y].

We will argue that H_i(x) predicts y almost as well as f_opt^{(i)} does over (X_i, Y_i).

For any x ∈ 𝗌𝗎𝗉𝗉(X_i), let the random variable Y_i^x denote the conditional distribution of y ∼ D_i(r)[2] given that D_i(r)[1] = x, for r ∼ 𝒰_{t_R(n)}. For simplicity, start by considering the hypotheses ĥ_i and Ĥ_i, which are the same as h_i and H_i except that they use an oracle Î that "perfectly" distributionally inverts f_φ(r_0, i, ·). That is,

Δ((r, f_φ(r_0, i, r)), (Î(φ, f_φ(r_0, i, r)), f_φ(r_0, i, r))) = 0,

for r ∼ 𝒰_{t_R(n)}. Observe that for x ∼ X_i, the outputs of ĥ_i(x) are distributed exactly according to Y_i^x. In particular, for r ∼ 𝒰_{t_R(n)} and x = D_i(r)[1],

Δ((x, Y_i^x), (x, ĥ_i(x))) = Δ((x, D_i(r)[2]), (x, D_i(Î(φ, (r_0, i, x)))[2]))   (def. of ĥ_i)
  ≤ Δ((x, r), (x, Î(φ, (r_0, i, x))))
  = Δ((r, f_φ(r_0, i, r)), (Î(φ, f_φ(r_0, i, r)), f_φ(r_0, i, r)))
  = 0.   (2)

Now, for i ∈ [t_R(n)] and strings x in the support of X_i, define

α_i(x) := max_{y∈{0,1}} Pr[Y_i^x = y].

That is, the maximizing label y ∈ {0,1} is the one most likely to be associated with x by (X_i, Y_i), and α_i(x) is the corresponding conditional probability. Note that f_opt^{(i)}(x) must always output this optimal label y, so α_i(x) is the success probability of f_opt^{(i)} on input x. Moreover, by Eq. (2), for every x ∈ 𝗌𝗎𝗉𝗉(X_i),

Pr_{ĥ_i}[ĥ_i(x) = f_opt^{(i)}(x)] = α_i(x).   (3)
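Concretely, f_opt^{(i)} and α_i can be computed from an explicit joint distribution; a small illustrative sketch, where `samples` stands in for draws from (X_i, Y_i):

```python
from collections import defaultdict

def optimal_predictor(samples):
    """Given labeled samples (x, y) with y in {0,1}, return the
    optimal predictor f_opt (the majority label for each x) and the
    map alpha(x) = conditional probability of that label."""
    counts = defaultdict(lambda: [0, 0])
    for x, y in samples:
        counts[x][y] += 1
    f_opt = {x: (1 if c[1] >= c[0] else 0) for x, c in counts.items()}
    alpha = {x: max(c) / (c[0] + c[1]) for x, c in counts.items()}
    return f_opt, alpha
```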

We now fix i and x and break the subsequent analysis into two cases: either f_opt^{(i)} predicts the label of x with probability substantially bounded away from 1/2, or not.

Case I: α_i(x) ≥ 1/2 + δ(n)^2.

For j ∈ [δ(n)^{−8}], let z_j be the Bernoulli random variable that equals 1 if ĥ_i(x) = f_opt^{(i)}(x) in the jth iteration of Ĥ_i(x), and 0 otherwise. Recall that Ĥ_i(x) = f_opt^{(i)}(x) iff the former event holds in a majority of the trials j ∈ [δ(n)^{−8}]. Let

Z := Σ_{j∈[δ(n)^{−8}]} z_j,

and note that, by Eq. (3),

𝔼[Z] = δ(n)^{−8}·α_i(x) ≥ δ(n)^{−8}(1/2 + δ(n)^2).

Then, by a Chernoff bound,

Pr[Ĥ_i(x) ≠ f_opt^{(i)}(x)] = Pr[Z ≤ δ(n)^{−8}/2]
  ≤ Pr[Z ≤ 𝔼[Z](1 − δ(n)^3)]
  ≤ 2^{−Ω(n)}.
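The Chernoff step can be spelled out; a sketch, assuming the standard multiplicative bound Pr[Z ≤ (1−λ)𝔼[Z]] ≤ exp(−λ²𝔼[Z]/2) and that δ(n) ≤ 1/n (δ is inverse-polynomial):

```latex
\Pr\bigl[Z \le \mathbb{E}[Z](1-\delta(n)^3)\bigr]
  \le \exp\!\left(-\frac{\delta(n)^6\,\mathbb{E}[Z]}{2}\right)
  \le \exp\!\left(-\frac{\delta(n)^6 \cdot \delta(n)^{-8}/2}{2}\right)
  = \exp\!\left(-\frac{\delta(n)^{-2}}{4}\right)
  \le 2^{-\Omega(n)}.
```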
Case II: α_i(x) < 1/2 + δ(n)^2.

Now,

Pr_{y∼Y_i^x}[Ĥ_i(x) ≠ y] − Pr_{y∼Y_i^x}[f_opt^{(i)}(x) ≠ y] ≤ 1 − 2(1 − α_i(x)) < 2δ(n)^2.

Thus, in both Cases I and II, it holds that

Pr_{y∼Y_i^x}[Ĥ_i(x) ≠ y] − Pr_{y∼Y_i^x}[f_opt^{(i)}(x) ≠ y] ≤ 2δ(n)^2.

Moreover,

Δ(Ĥ_i(x), H_i(x)) ≤ δ(n)^{−8} · δ(n)^{10} = δ(n)^2

for x ∼ X_i. Overall, by a union bound and the definition of δ,

Pr_A[A(φ) = 𝖲𝖠𝖳(φ)] ≥ 2/3 − 2t_R(n)·δ(n)^2 − o(1) = 2/3 − o(1),

as desired.

We are now ready to finish the proof of Theorem 15, restated below.

Theorem 73 (Restatement of Theorem 15).

Suppose there is a randomized non-adaptive honest reduction R and a polynomial s : ℕ → ℕ such that, for every constant c, R reduces 𝖲𝖠𝖳 to agnostically PAC learning 𝖲𝖨𝖹𝖤[s(n)] by 𝖲𝖨𝖹𝖤[s(n)^c]. Then, unless 𝖭𝖯 ⊆ 𝖡𝖯𝖯, there exist infinitely-often one-way functions.

Proof.

Assume the non-existence of i.o. one-way functions. Let R be the assumed randomized non-adaptive reduction from 𝖲𝖠𝖳 to agnostic learning 𝖲𝖨𝖹𝖤[s], running in polynomial time t_R. Let the inverse polynomial ε be such that R requires (s, ε)-optimal hypotheses in response to each query, and let the constant b be as in Lemma 72. Combining Lemma 70 with Lemma 51, for every distribution family D ∈ 𝖯𝖲𝖠𝖬𝖯, there exists a PPT machine I such that, for all sufficiently large n,

Pr_{z∼D_n}[I is an n^b-distributional inverter for f_z] ≥ 1 − 1/n^b,

and if I does not n^b-invert f_z for some z ∈ {0,1}^*, then for any y ∈ {0,1}^*,

Pr_I[I(z,y) outputs ⊥] ≥ 1 − 2^{−n}.

Along with Lemma 72, this implies 𝖣𝗂𝗌𝗍𝖭𝖯 ⊆ 𝖠𝗏𝗀𝖡𝖯𝖯. Let B be the PPT agnostic learner for 𝖲𝖨𝖹𝖤[s] guaranteed by Lemma 71, with accuracy and confidence parameter δ chosen such that δ(n)^{−1} ≥ max{t_R(n)^2, ε(n)^{−1}} for all n.

Now consider the PPT machine R^B, defined as follows.

On input φ ∈ {0,1}^n and randomness r_0 ∼ 𝒰_{t_R(n)}, simulate R(φ; r_0), obtaining queries D_1, …, D_m, where each D_i samples a distribution (X_i, Y_i). For each i ∈ [m], let h_i be the output of B on input D_i. Answer the ith query of R with h_i. Accept iff R accepts.
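Schematically, R^B answers each oracle query with the learner's hypothesis; a toy Python sketch, where the query D, the "learner" B, and the decision rule are all illustrative stand-ins rather than the actual reduction:

```python
import random

def run_RB(queries, B, decide):
    """Answer each sampler query D_i of the reduction with the
    hypothesis produced by the learner B, then return R's decision."""
    hypotheses = [B(D) for D in queries]
    return decide(hypotheses)

# Toy instantiation: one query sampling (x, parity(x)); a stand-in
# learner that returns the parity function; the "reduction" accepts
# iff the hypothesis fits 100 fresh samples well.
def D():
    x = random.randrange(1024)
    return (x, x % 2)

def B(sampler):
    return lambda x: x % 2

def decide(hyps):
    h = hyps[0]
    samples = [D() for _ in range(100)]
    return sum(h(x) == y for x, y in samples) >= 90
```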

By Lemma 71, for all sufficiently large n and each i ∈ [m], with probability at least 1 − δ(n) over the randomness of B, B returns an (s, ε)-optimal hypothesis h_i for D_i.

By a union bound and the definition of R,

Pr_{R^B}[R^B(φ) = 𝖲𝖠𝖳(φ)] ≥ 2/3 − 1/t_R(n).

We conclude that 𝖲𝖠𝖳 ∈ 𝖡𝖯𝖯, as desired.

3.4.2 Proofs of Theorems 16 and 17

We will need the following lemma. For the reader's convenience, we give a proof sketch in Appendix B. Though the lemma was not originally stated for the infinitely-often setting, that version is easily seen to follow from the same argument.

Lemma 74 ([11]).

Suppose 𝖡𝖯𝖯-secure witness encryption exists for 𝖭𝖯. Then, if infinitely-often one-way functions exist, there exists public-key encryption secure infinitely often against polynomial-time adversaries.

Theorem 16 is now immediate from Theorem 4, Lemma 33, Theorem 15, and Lemma 74.

Theorem 17 follows from Theorem 4 and Theorem 16.

4 Concluding Remarks

We have shown that the existence of secure witness encryption is equivalent to 𝖭𝖯-hardness of a certain learning problem (𝖢𝖦𝖫) under restricted deterministic reductions. This may be taken as evidence that computational learning is indeed 𝖭𝖯-hard, if one believes in cryptography. At the same time, this result suggests that one may want to consider a more general kind of reduction when trying to prove 𝖭𝖯-hardness of learning, as establishing the existence of secure witness encryption seems hard. On the other hand, actually proving 𝖭𝖯-hardness of learning under restricted deterministic reductions would yield a breakthrough result: public-key cryptography based on the worst-case assumption that 𝖭𝖯 ⊄ 𝖡𝖯𝖯.

There are many interesting research directions to explore in the context of cryptography and 𝖭𝖯-hardness of meta-complexity problems. For example, Mazor and Pass [38] have recently shown that the existence of secure Indistinguishability Obfuscation (a cryptographic primitive that implies Witness Encryption) would mean that a polynomial-gap version of 𝖬𝖢𝖲𝖯 cannot be shown 𝖭𝖯-complete under Levin reductions, unless 𝖭𝖯 ⊆ 𝖡𝖯𝖯. Can one get similar results for some computational learning problems? What if one has secure Witness Encryption only? Another possible research direction is to find out whether existing tools from 𝖭𝖯-hardness of learning could unconditionally yield new non-trivial forms of witness encryption (e.g., with security against restricted circuit classes).

Lastly, of course, the 𝖭𝖯-hardness of improper learning for 𝖯/𝗉𝗈𝗅𝗒 remains open. We have made some partial progress, and we have demonstrated cryptographic consequences of 𝖭𝖯-hardness under certain strong formulations, but we are optimistic that further progress can be made.

References

  • [1] Michael Alekhnovich, Samuel R. Buss, Shlomo Moran, and Toniann Pitassi. Minimum propositional proof length is NP-hard to linearly approximate. J. Symb. Log., 66(1):171–191, 2001. doi:10.2307/2694916.
  • [2] Benny Applebaum, Boaz Barak, and David Xiao. On basing lower-bounds for learning on worst-case assumptions. In 49th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2008, October 25-28, 2008, Philadelphia, PA, USA, pages 211–220. IEEE Computer Society, 2008. doi:10.1109/FOCS.2008.35.
  • [3] Boaz Barak, Oded Goldreich, Russell Impagliazzo, Steven Rudich, Amit Sahai, Salil P. Vadhan, and Ke Yang. On the (im)possibility of obfuscating programs. J. ACM, 59(2):6:1–6:48, 2012. doi:10.1145/2160158.2160159.
  • [4] Amos Beimel. Secret-sharing schemes: A survey. In Yeow Meng Chee, Zhenbo Guo, San Ling, Fengjing Shao, Yuansheng Tang, Huaxiong Wang, and Chaoping Xing, editors, Coding and Cryptology – Third International Workshop, IWCC 2011, Qingdao, China, May 30-June 3, 2011. Proceedings, volume 6639 of Lecture Notes in Computer Science, pages 11–46. Springer, 2011. doi:10.1007/978-3-642-20901-7_2.
  • [5] Josh Cohen Benaloh and Jerry Leichter. Generalized secret sharing and monotone functions. In Shafi Goldwasser, editor, Advances in Cryptology – CRYPTO ’88, 8th Annual International Cryptology Conference, Santa Barbara, California, USA, August 21-25, 1988, Proceedings, volume 403 of Lecture Notes in Computer Science, pages 27–35. Springer, 1988. doi:10.1007/0-387-34799-2_3.
  • [6] Avrim Blum, Merrick Furst, Michael Kearns, and Richard J. Lipton. Cryptographic primitives based on hard learning problems. In Douglas R. Stinson, editor, Advances in Cryptology — CRYPTO’ 93, pages 278–291, Berlin, Heidelberg, 1994. Springer Berlin Heidelberg.
  • [7] Marco L. Carmosino, Russell Impagliazzo, Valentine Kabanets, and Antonina Kolokolova. Agnostic learning from tolerant natural proofs. In Klaus Jansen, José D. P. Rolim, David Williamson, and Santosh S. Vempala, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2017, August 16-18, 2017, Berkeley, CA, USA, volume 81 of LIPIcs, pages 35:1–35:19. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2017. doi:10.4230/LIPICS.APPROX-RANDOM.2017.35.
  • [8] Irit Dinur, Prahladh Harsha, and Guy Kindler. Polynomially low error PCPs with polyloglog n queries via modular composition. In Rocco A. Servedio and Ronitt Rubinfeld, editors, Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC 2015, Portland, OR, USA, June 14-17, 2015, pages 267–276. ACM, 2015. doi:10.1145/2746539.2746630.
  • [9] Irit Dinur and Shmuel Safra. On the hardness of approximating label-cover. Inf. Process. Lett., 89(5):247–254, 2004. doi:10.1016/J.IPL.2003.11.007.
  • [10] Sanjam Garg, Craig Gentry, Shai Halevi, Mariana Raykova, Amit Sahai, and Brent Waters. Candidate indistinguishability obfuscation and functional encryption for all circuits. SIAM J. Comput., 45(3):882–929, 2016. doi:10.1137/14095772X.
  • [11] Sanjam Garg, Craig Gentry, Amit Sahai, and Brent Waters. Witness encryption and its applications. In Dan Boneh, Tim Roughgarden, and Joan Feigenbaum, editors, Symposium on Theory of Computing Conference, STOC’13, Palo Alto, CA, USA, June 1-4, 2013, pages 467–476. ACM, 2013. doi:10.1145/2488608.2488667.
  • [12] Halley Goldberg, Valentine Kabanets, Zhenjian Lu, and Igor C. Oliveira. Probabilistic kolmogorov complexity with applications to average-case complexity. In Shachar Lovett, editor, 37th Computational Complexity Conference, CCC 2022, July 20-23, 2022, Philadelphia, PA, USA, volume 234 of LIPIcs, pages 16:1–16:60. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPICS.CCC.2022.16.
  • [13] Dan Gutfreund and Amnon Ta-Shma. Worst-case to average-case reductions revisited. In Moses Charikar, Klaus Jansen, Omer Reingold, and José D. P. Rolim, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, 10th International Workshop, APPROX 2007, and 11th International Workshop, RANDOM 2007, Princeton, NJ, USA, August 20-22, 2007, Proceedings, volume 4627 of Lecture Notes in Computer Science, pages 569–583. Springer, 2007. doi:10.1007/978-3-540-74208-1_41.
  • [14] Johan Håstad, Russell Impagliazzo, Leonid A. Levin, and Michael Luby. A pseudorandom generator from any one-way function. SIAM J. Comput., 28(4):1364–1396, 1999. doi:10.1137/S0097539793244708.
  • [15] Shuichi Hirahara. Non-disjoint promise problems from meta-computational view of pseudorandom generator constructions. In Shubhangi Saraf, editor, 35th Computational Complexity Conference, CCC 2020, July 28-31, 2020, Saarbrücken, Germany (Virtual Conference), volume 169 of LIPIcs, pages 20:1–20:47. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020. doi:10.4230/LIPICS.CCC.2020.20.
  • [16] Shuichi Hirahara. NP-hardness of learning programs and partial MCSP. In 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022, Denver, CO, USA, October 31 – November 3, 2022, pages 968–979. IEEE, 2022. doi:10.1109/FOCS54457.2022.00095.
  • [17] Shuichi Hirahara. Capturing one-way functions via NP-hardness of meta-complexity. In Barna Saha and Rocco A. Servedio, editors, Proceedings of the 55th Annual ACM Symposium on Theory of Computing, STOC 2023, Orlando, FL, USA, June 20-23, 2023, pages 1027–1038. ACM, 2023. doi:10.1145/3564246.3585130.
  • [18] Shuichi Hirahara and Mikito Nanashima. Learning in pessiland via inductive inference. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 447–457, 2023. doi:10.1109/FOCS57990.2023.00033.
  • [19] Shuichi Hirahara and Mikito Nanashima. One-way functions and zero knowledge. In Bojan Mohar, Igor Shinkar, and Ryan O’Donnell, editors, Proceedings of the 56th Annual ACM Symposium on Theory of Computing, STOC 2024, Vancouver, BC, Canada, June 24-28, 2024, pages 1731–1738. ACM, 2024. doi:10.1145/3618260.3649701.
  • [20] Justin Holmgren and Ron Rothblum. Linear-size boolean circuits for multiselection. In Rahul Santhanam, editor, 39th Computational Complexity Conference, CCC 2024, July 22-25, 2024, Ann Arbor, MI, USA, volume 300 of LIPIcs, pages 11:1–11:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.CCC.2024.11.
  • [21] Yizhi Huang, Rahul Ilango, and Hanlin Ren. NP-hardness of approximating meta-complexity: A cryptographic approach. In Barna Saha and Rocco A. Servedio, editors, Proceedings of the 55th Annual ACM Symposium on Theory of Computing, STOC 2023, Orlando, FL, USA, June 20-23, 2023, pages 1067–1075. ACM, 2023. doi:10.1145/3564246.3585154.
  • [22] Russell Impagliazzo. A personal view of average-case complexity. In Proceedings of the Tenth Annual Structure in Complexity Theory Conference, Minneapolis, Minnesota, USA, June 19-22, 1995, pages 134–147. IEEE Computer Society, 1995. doi:10.1109/SCT.1995.514853.
  • [23] Russell Impagliazzo, Valentine Kabanets, and Ilya Volkovich. Synergy between circuit obfuscation and circuit minimization. In Nicole Megow and Adam D. Smith, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2023, September 11-13, 2023, Atlanta, Georgia, USA, volume 275 of LIPIcs, pages 31:1–31:21. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPICS.APPROX/RANDOM.2023.31.
  • [24] Russell Impagliazzo and Leonid A. Levin. No better ways to generate hard NP instances than picking uniformly at random. In 31st Annual Symposium on Foundations of Computer Science, St. Louis, Missouri, USA, October 22-24, 1990, Volume II, pages 812–821. IEEE Computer Society, 1990. doi:10.1109/FSCS.1990.89604.
  • [25] Russell Impagliazzo and Michael Luby. One-way functions are essential for complexity based cryptography (extended abstract). In 30th Annual Symposium on Foundations of Computer Science, Research Triangle Park, North Carolina, USA, 30 October – 1 November 1989, pages 230–235. IEEE Computer Society, 1989. doi:10.1109/SFCS.1989.63483.
  • [26] Russell Impagliazzo and Avi Wigderson. P = BPP if E requires exponential circuits: Derandomizing the XOR lemma. In Frank Thomson Leighton and Peter W. Shor, editors, Proceedings of the Twenty-Ninth Annual ACM Symposium on the Theory of Computing, El Paso, Texas, USA, May 4-6, 1997, pages 220–229. ACM, 1997. doi:10.1145/258533.258590.
  • [27] Mitsuru Ito, Akira Saio, and Takao Nishizeki. Multiple assignment scheme for sharing secret. J. Cryptol., 6(1):15–20, 1993. doi:10.1007/BF02620229.
  • [28] Aayush Jain, Huijia Lin, and Amit Sahai. Indistinguishability obfuscation from well-founded assumptions. Commun. ACM, 67(3):97–105, 2024. doi:10.1145/3611095.
  • [29] Valentine Kabanets and Jin-yi Cai. Circuit minimization problem. In F. Frances Yao and Eugene M. Luks, editors, Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, May 21-23, 2000, Portland, OR, USA, pages 73–79. ACM, 2000. doi:10.1145/335305.335314.
  • [30] Michael J. Kearns, Robert E. Schapire, and Linda Sellie. Toward efficient agnostic learning. Mach. Learn., 17(2-3):115–141, 1994. doi:10.1007/BF00993468.
  • [31] Ker-I Ko. On the complexity of learning minimum time-bounded turing machines. SIAM J. Comput., 20(5):962–986, 1991. doi:10.1137/0220059.
  • [32] Caleb Koch, Carmen Strassle, and Li-Yang Tan. Properly learning decision trees with queries is NP-hard. In 64th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2023, Santa Cruz, CA, USA, November 6-9, 2023, pages 2383–2407. IEEE, 2023. doi:10.1109/FOCS57990.2023.00146.
  • [33] Ilan Komargodski, Tal Moran, Moni Naor, Rafael Pass, Alon Rosen, and Eylon Yogev. One-way functions and (im)perfect obfuscation. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18-21, 2014, pages 374–383. IEEE Computer Society, 2014. doi:10.1109/FOCS.2014.47.
  • [34] Ilan Komargodski, Moni Naor, and Eylon Yogev. Secret-sharing for NP. In Palash Sarkar and Tetsu Iwata, editors, Advances in Cryptology – ASIACRYPT 2014 – 20th International Conference on the Theory and Application of Cryptology and Information Security, Kaoshiung, Taiwan, R.O.C., December 7-11, 2014, Proceedings, Part II, volume 8874 of Lecture Notes in Computer Science, pages 254–273. Springer, 2014. doi:10.1007/978-3-662-45608-8_14.
  • [35] Yanyi Liu, Noam Mazor, and Rafael Pass. A note on zero-knowledge for NP and one-way functions. Electron. Colloquium Comput. Complex., TR24-095, 2024. URL: https://eccc.weizmann.ac.il/report/2024/095.
  • [36] Yanyi Liu, Noam Mazor, and Rafael Pass. On witness encryption and laconic zero-knowledge arguments. Electron. Colloquium Comput. Complex., TR24-194, 2024. URL: https://eccc.weizmann.ac.il/report/2024/194.
  • [37] Zhenjian Lu, Noam Mazor, Igor C. Oliveira, and Rafael Pass. Lower bounds on the overhead of indistinguishability obfuscation. Electron. Colloquium Comput. Complex., TR24-146, 2024. URL: https://eccc.weizmann.ac.il/report/2024/146.
  • [38] Noam Mazor and Rafael Pass. Gap MCSP is not (Levin) NP-complete in Obfustopia. In Rahul Santhanam, editor, 39th Computational Complexity Conference, CCC 2024, July 22-25, 2024, Ann Arbor, MI, USA, volume 300 of LIPIcs, pages 36:1–36:21. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.CCC.2024.36.
  • [39] Mikito Nanashima. A theory of heuristic learnability. In Mikhail Belkin and Samory Kpotufe, editors, Proceedings of Thirty Fourth Conference on Learning Theory, volume 134 of Proceedings of Machine Learning Research, pages 3483–3525. PMLR, August 2021. URL: https://proceedings.mlr.press/v134/nanashima21a.html.
  • [40] Noam Nisan and Avi Wigderson. Hardness vs randomness. J. Comput. Syst. Sci., 49(2):149–167, 1994. doi:10.1016/S0022-0000(05)80043-1.
  • [41] Igor C. Oliveira and Rahul Santhanam. Conspiracies between learning algorithms, circuit lower bounds, and pseudorandomness. In Ryan O’Donnell, editor, 32nd Computational Complexity Conference, CCC 2017, July 6-9, 2017, Riga, Latvia, volume 79 of LIPIcs, pages 18:1–18:49. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2017. doi:10.4230/LIPICS.CCC.2017.18.
  • [42] Shien Jin Ong and Salil P. Vadhan. Zero knowledge and soundness are symmetric. In Moni Naor, editor, Advances in Cryptology – EUROCRYPT 2007, 26th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Barcelona, Spain, May 20-24, 2007, Proceedings, volume 4515 of Lecture Notes in Computer Science, pages 187–209. Springer, 2007. doi:10.1007/978-3-540-72540-4_11.
  • [43] Leonard Pitt and Leslie G. Valiant. Computational limitations on learning from examples. J. ACM, 35(4):965–984, 1988. doi:10.1145/48014.63140.
  • [44] Ran Raz, Omer Reingold, and Salil P. Vadhan. Extracting all the randomness and reducing the error in Trevisan's extractors. J. Comput. Syst. Sci., 65(1):97–128, 2002. doi:10.1006/JCSS.2002.1824.
  • [45] Rahul Santhanam. Pseudorandomness and the minimum circuit size problem. In Thomas Vidick, editor, 11th Innovations in Theoretical Computer Science Conference, ITCS 2020, January 12-14, 2020, Seattle, Washington, USA, volume 151 of LIPIcs, pages 68:1–68:26. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020. doi:10.4230/LIPICS.ITCS.2020.68.
  • [46] Leslie G. Valiant. A theory of the learnable. Commun. ACM, 27(11):1134–1142, 1984. doi:10.1145/1968.1972.
  • [47] Andrew Chi-Chih Yao. Theory and applications of trapdoor functions (extended abstract). In 23rd Annual Symposium on Foundations of Computer Science, Chicago, Illinois, USA, 3-5 November 1982, pages 80–91. IEEE Computer Society, 1982. doi:10.1109/SFCS.1982.45.

Appendix A WE from Indistinguishability Obfuscation

In this section, we give a sketch of the construction showing that indistinguishability obfuscation for 𝖯/𝗉𝗈𝗅𝗒 implies witness encryption for 𝖭𝖯.

Definition 75 (Indistinguishability obfuscation (iO)).

A PPT algorithm i𝒪 is an indistinguishability obfuscator for a circuit class 𝒞 if the following conditions are met.

  • Correctness: for all C ∈ 𝒞 and all inputs x, letting C′ = i𝒪(C), it holds that C′(x) = C(x).

  • Security: for any efficient adversary A, for any circuits C0 and C1 computing the same function, i𝒪(C0) and i𝒪(C1) are indistinguishable for A.

Theorem 76 ([10]).

For any L𝖭𝖯, iO for 𝖯/𝗉𝗈𝗅𝗒 implies WE for L.

Proof sketch..

Let V be a poly-time verifier for L. For an instance x of L and a bit b ∈ {0,1}, define

F_{x,b}(w) = b if V(x,w) = 1, and F_{x,b}(w) = 0 otherwise.

Consider the witness encryption scheme defined as follows:

  • 𝖤𝗇𝖼(x,b) = i𝒪(F_{x,b});

  • 𝖣𝖾𝖼(c,x,w)=c(w).

By the correctness of the iO, if V(x,w) = 1, then for C := 𝖤𝗇𝖼(x,b) = i𝒪(F_{x,b}), it holds that 𝖣𝖾𝖼(C,x,w) = C(w) = b. Therefore, the correctness of WE is satisfied.

We leave the proof of security to the reader, but we note that if x ∉ L, then F_{x,0} and F_{x,1} compute the same (all-zero) function.
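The construction above can be sketched in code, with a toy identity function standing in for i𝒪 (this illustrates correctness only; no real obfuscation or security is involved):

```python
def toy_iO(circuit):
    """Stand-in for an indistinguishability obfuscator: the identity.
    A real iO would hide everything about the circuit except its
    input/output behavior."""
    return circuit

def we_enc(V, x, b):
    """Enc(x, b) = iO(F_{x,b}), where F_{x,b}(w) = b if V(x, w) = 1
    and 0 otherwise."""
    def F(w):
        return b if V(x, w) else 0
    return toy_iO(F)

def we_dec(c, x, w):
    """Dec(c, x, w) = c(w)."""
    return c(w)
```

With a verifier V(x, w) checking, say, that w² = x, a valid witness (w = 3 for x = 9) recovers the encrypted bit, while a non-witness yields 0.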

Appendix B Public-key Encryption from WE and OWFs

In this section, we give a sketch of the construction showing that witness encryption, together with the existence of a one-way function, implies the possibility of public-key encryption.

Definition 77 (Public-key encryption).

A public-key encryption scheme is a triple of algorithms (𝖦𝖾𝗇,𝖤𝗇𝖼,𝖣𝖾𝖼) meeting the following conditions.

  • Correctness: for all sufficiently large λ and any bit b ∈ {0,1},

    Pr_{(sk,pk)∼𝖦𝖾𝗇(1^λ)}[𝖣𝖾𝖼(sk, 𝖤𝗇𝖼(pk, b)) = b] = 1.
  • Security: for all sufficiently large λ and any efficient adversary A,

    Pr_{(sk,pk)∼𝖦𝖾𝗇(1^λ); b∼𝒰}[A(pk, 𝖤𝗇𝖼(pk, b)) = b] ≤ 1/2 + 𝗇𝖾𝗀𝗅(λ).
Theorem 78 ([11]).

Suppose that WE exists for every language in 𝖭𝖯 and that a one-way function exists. Then, public-key encryption exists.

Proof sketch..

By [14], we may assume the existence of a PRG G : {0,1}^λ → {0,1}^{2λ} for all sufficiently large λ. Consider the language

L = { t ∈ {0,1}^{2λ} : ∃ s ∈ {0,1}^λ, t = G(s) },

and let (𝖤𝗇𝖼_{𝖶𝖤}, 𝖣𝖾𝖼_{𝖶𝖤}) be a witness encryption scheme for L. Define a public-key encryption scheme as follows.

  • 𝖦𝖾𝗇(1^λ): sample s ∼ 𝒰_λ and let t = G(s). Let sk = s and pk = t.

  • 𝖤𝗇𝖼(pk, b): return (𝖤𝗇𝖼_{𝖶𝖤}(pk, b), pk).

  • 𝖣𝖾𝖼(sk, c = (c′, pk)): return the output of 𝖣𝖾𝖼_{𝖶𝖤}(c′, pk, sk).

By the correctness of the witness encryption, for any pair (sk, pk) = (s, t) in the support of 𝖦𝖾𝗇, s is a witness for t ∈ L. Therefore, 𝖣𝖾𝖼(sk, 𝖤𝗇𝖼(pk, b)) = 𝖣𝖾𝖼_{𝖶𝖤}(𝖤𝗇𝖼_{𝖶𝖤}(t, b), t, s) = b. This shows the correctness of the PKE scheme.

We leave the proof of security to the reader.
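A toy end-to-end sketch of this PKE construction, with a SHA-256-based stand-in for the PRG G and a trivial witness-checking stand-in for the WE scheme (correctness only; no security is claimed by this sketch):

```python
import hashlib
import secrets

LAM = 16  # security parameter in bytes (illustrative)

def G(s: bytes) -> bytes:
    """Toy length-doubling 'PRG' built from SHA-256."""
    return (hashlib.sha256(b"0" + s).digest()[:LAM]
            + hashlib.sha256(b"1" + s).digest()[:LAM])

def we_enc(t: bytes, b: int):
    """Toy WE for L = {t : exists s, t = G(s)}: the ciphertext
    releases b only on a valid witness s with G(s) = t."""
    return lambda s: b if G(s) == t else None

def gen():
    s = secrets.token_bytes(LAM)        # sk = s, pk = t = G(s)
    return s, G(s)

def enc(pk: bytes, b: int):
    return (we_enc(pk, b), pk)

def dec(sk: bytes, ct):
    c, pk = ct
    return c(sk)                        # Dec_WE(c', pk, sk)
```

Decrypting with the matching secret key recovers the bit, while a key for a different public key fails (returns the failure value).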