On White-Box Learning and Public-Key Encryption
Abstract
We consider a generalization of the Learning With Errors problem, referred to as the white-box learning problem: you are given the code of a sampler that with high probability produces samples of the form $(x, f(x) + e)$ where $e$ is small and $f$ is computable by a polynomial-size circuit, and the computational task consists of outputting a polynomial-size circuit $C$ that, with probability, say, $2/3$ over a new sample $x'$ drawn according to the same distribution, approximates $f(x')$ (i.e., $|C(x') - f(x')|$ is small). This problem can be thought of as a generalization of the Learning With Errors problem (LWE) from linear functions to polynomial-size computable functions.
We demonstrate that worst-case hardness of the white-box learning problem, conditioned on the instances satisfying a notion of computational shallowness (a concept from the study of Kolmogorov complexity), not only suffices to get public-key encryption, but is also necessary; as such, this yields the first problem whose worst-case hardness characterizes the existence of public-key encryption. Additionally, our results highlight the extent to which LWE “overshoots” the task of public-key encryption.
We complement these results by noting that worst-case hardness of the same problem, but restricting the learner to only get black-box access to the sampler, characterizes one-way functions.
Keywords and phrases: Public-Key Encryption, White-Box Learning
Funding: Yanyi Liu: Research partly supported by NSF CNS-2149305.
2012 ACM Subject Classification: Theory of computation → Computational complexity and cryptography
Editors: Raghu Meka
1 Introduction
Public-Key Encryption (PKE) [12, 31] is one of the most central cryptographic primitives enabling secure communication on the Internet: it is the primitive that enables entities that have not physically met to engage in confidential conversations and collaborations.
In contrast to private-key primitives, such as symmetric-key encryption and digital signatures, that can be securely built from the minimal primitive of one-way functions (and for which many candidate problems are known), we only know of a handful of candidate hard problems from which public-key encryption can be constructed. More specifically, these include (a) number-theory problems based on either factoring [31, 29] or discrete logarithms [12, 13], (b) coding-theory based problems [27], (c) lattice problems such as finding shortest vectors in lattices [1, 30, 8], and (d) noisy linear-algebra based problems [2, 5]. Out of these, the number-theory based problems can be efficiently solved by quantum algorithms [32], and the coding-theory, lattice and noisy linear-algebra problems are all very related – in essence, they can all be viewed as different instances of solving noisy systems of linear equations (on either particular natural distributions, or even in the worst case, when restricting attention to systems of equations satisfying the condition that appropriate solutions exist; in more detail, we require worst-case hardness to hold when conditioning on instances that define a lattice where the shortest vector is long compared to the amount of noise).
The main purpose of this paper is to provide an assumption that generalizes all these assumptions (i.e., is implied by all of them), yet suffices for obtaining secure PKE. Indeed, the main result of this paper is that hardness of a notion of white-box learning achieves this goal.
White-box Learning
Perhaps the most common noisy linear-algebra based assumption is the hardness of the Learning With Errors (LWE) problem [30], which, in essence, stipulates the hardness of recovering a vector $s$ given $(A, As + e)$, where $A$ is a random matrix, $e$ is some “small” noise vector, and all arithmetic is modulo some prime $q$. In more detail, we typically require a stronger condition: not only that it is hard to recover $s$, but also that it is hard to compute a value “close” to $\langle a, s \rangle$ for a random vector $a$. In other words, we can think of $s$ as the description of a linear function $f_s(a) = \langle a, s\rangle$ that we are trying to improperly approximately learn given the noisy samples $(a_i, \langle a_i, s\rangle + e_i)$ provided by the rows of $(A, As+e)$ – thus the name “learning with errors”.
In fact, there is also a different way to think about the LWE problem that will be useful for our purposes (which follows from the construction of Regev’s PKE [30]): we are given the code of a sampler that enables providing samples of the form $(a, \langle a, s\rangle + e)$, and the goal is to approximate $\langle a', s\rangle$ on a new fresh sample $a'$ according to the same distribution as $a$. (The reason for this is that given $(A, As+e)$, we can generate noisy samples of $f_s$ by taking small random linear combinations of the rows; see [30] and the full version of this paper for more details.) We refer to this alternative way of thinking of LWE as an instance of (improper) white-box learning, where, more generally, we are given the code of a sampler that generates noisy samples of the form $(x, f(x)+e)$ and the goal is to approximate $f(x')$ for a fresh sample $x'$ according to the same distribution, using a polynomial-size circuit. In essence, this problem generalizes the LWE problem from (improperly) learning linear functions from noisy samples, to (improperly) learning a polynomial-size circuit from noisy samples, given white-box access to the sampler (as we will discuss in more detail later, it is also more general than the LWE problem in that the LWE sampler has a particular form, whereas we here may consider more general classes of samplers); the white-box access feature can be viewed as a generalization of Valiant’s classic PAC-learning model [35] to a setting where the learner not only gets random samples, but also gets the code of the sampler.
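To make the two views concrete, the following toy sketch (with arbitrarily chosen, insecure parameters; the names and constants below are our own, not the paper's) first generates LWE samples and then re-randomizes them into a fresh noisy sample of the same linear function, as described above:

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, q = 8, 64, 97                     # toy dimensions/modulus (insecure)

    def lwe_samples():
        """m noisy samples (a_i, <a_i, s> + e_i mod q) of the linear f_s."""
        s = rng.integers(0, q, n)           # the secret linear function
        A = rng.integers(0, q, (m, n))
        e = rng.integers(-2, 3, m)          # "small" noise
        return A, (A @ s + e) % q, s

    def fresh_sample(A, b, weight=4):
        """White-box resampling: summing a few random rows of (A, b) yields a
        new pair (a', <a', s> + e') whose noise e' is a sum of `weight` small
        terms, hence still small."""
        idx = rng.choice(m, size=weight, replace=False)
        return A[idx].sum(axis=0) % q, int(b[idx].sum()) % q

    A, b, s = lwe_samples()
    a_new, y_new = fresh_sample(A, b)
    # y_new is close to <a_new, s> mod q, up to the accumulated small noise.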
In more detail, given a sampler circuit $\mathsf{Samp}$ that samples “labeled instances” $(x, y)$ (where we think of $y$ as a, perhaps noisy, label for $x$), let $\mathcal{C}^{\mathsf{Samp}}_{\beta}[\rho]$ denote the set of circuits $C$ that with probability $\rho$ over $(x, y) \leftarrow \mathsf{Samp}$ satisfy the property that $|C(x) - y| \leq \beta$ (when interpreting both $C(x)$ and $y$ as integers). For functions $\beta$ and $\gamma$, let $(\beta, \gamma)$-$\mathsf{WBLearn}$ denote the following learning problem:
-
Input: Circuit $\mathsf{Samp}$, with the promise that there exists a circuit $f$ of size at most $|\mathsf{Samp}|$ such that $f \in \mathcal{C}^{\mathsf{Samp}}_{\beta}[1 - n^{-7}]$, where $n = |\mathsf{Samp}|$.
-
Output: Circuit $C \in \mathcal{C}^{\mathsf{Samp}}_{\gamma\beta}[2/3]$. (We remark that for efficient algorithms, outputting a circuit that approximates a function is essentially equivalent to approximating the value of the function on a random sample; indeed, the circuit implementation of an algorithm for the latter task is a valid output for the first task.)
In other words, we are given a sampler that with very high ($1 - n^{-7}$) probability outputs labelled samples where the label is very close (within $\beta$) to being correct, and the goal is to, given the code of the sampler $\mathsf{Samp}$, simply find a circuit that with probability $2/3$ approximates the label within $\gamma\beta$ (i.e., with a factor-$\gamma$ higher error). We use $\mathsf{WBLearn}$ to denote $(\beta,\gamma)$-$\mathsf{WBLearn}$ when $\beta = 0$.
Hardness of Learning vs. (Public-Key) Cryptography
Roughly speaking, the main result of this paper will be to show that under certain restrictions on samplers (which come from the study of time-bounded Kolmogorov complexity [23, 34, 11, 22, 17, 33] and in particular the notion of computational depth [4]), this generalization of the LWE problem not only suffices to realize a PKE, but is also necessary. In more detail, worst-case hardness of $(\beta,\gamma)$-$\mathsf{WBLearn}$ under these conditions will characterize the existence of public-key encryption. As such, our results yield insight into the extent to which LWE “overshoots” the task of public-key encryption.
We highlight that we are not the first ones to consider connections between learning theory and cryptography; however, as far as we know, all earlier connections were between the hardness of learning and private-key cryptography (i.e., the notion of one-way functions), as opposed to public-key cryptography. Indeed, classic results from [21, 7] demonstrate the equivalence of the hardness of a notion of average-case PAC-learning of polynomial-size circuits (i.e., black-box learning) and one-way functions. (The results of [20] can be thought of as characterizing one-way functions through a different type of learning.) In contrast, our focus here is on PKE, and instead of considering black-box learning, we consider white-box learning. An additional difference is that we consider worst-case hardness, as opposed to average-case hardness, of the learning-theory problem. As was recently shown in [18], this issue can be overcome (in the context of characterizing one-way functions) through the use of an (alternative) notion of computational depth (more details on the relationship with this work below). To put our result in context, we additionally show that exactly the same problem that we demonstrate characterizes PKE also characterizes one-way functions once modified to only provide the learner black-box access to the sampler.
1.1 Our Results
Towards explaining our results in more detail, let us first recall the notion of computational depth [4]. Let the $t$-computational depth of $x$, denoted $\mathsf{cd}^{t}(x)$, be defined as $\mathsf{cd}^{t}(x) = K^{t}(x) - K(x)$, where $K(x)$ denotes the Kolmogorov complexity of $x$ and $K^{t}(x)$ denotes the $t$-bounded Kolmogorov complexity of $x$. That is, $\mathsf{cd}^{t}(x)$ measures how much more $x$ could be compressed if the decompression algorithm may be computationally unbounded, as opposed to it running in time bounded by $t$.
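For illustration, note that a uniformly random string is computationally shallow: with probability at least $1 - 2^{-k}$ over $x \leftarrow \{0,1\}^{n}$ we have $K(x) \geq n - k$, while $K^{t}(x) \leq n + O(1)$ for any $t(n) \geq O(n)$ (via the program that simply prints $x$), so
\[
\mathsf{cd}^{t}(x) \;=\; K^{t}(x) - K(x) \;\leq\; k + O(1)
\]
except with probability $2^{-k}$.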
We will focus on instances of the problem having the property that $\mathsf{Samp}\|f$ is “computationally shallow”, where $f$ is a circuit that agrees with $\mathsf{Samp}$ with high probability: let $(\beta,\gamma)$-$\mathsf{cdWBLearn}^{t}$ denote $(\beta,\gamma)$-$\mathsf{WBLearn}$ with the additional promise that $\mathsf{cd}^{t}(\mathsf{Samp}\|f) \leq 2\log|\mathsf{Samp}|$. (We note that there is nothing special about the constant 2; it can be anything that is strictly larger than 1.)
Characterizing PKE through Hardness of White-Box Learning
Our main result is that the hardness of this problem characterizes the existence of public-key encryption:
Theorem 1.
Let $\gamma \geq 1$ be a constant, let $\beta$ be an efficiently computable function, and let $t$ be a sufficiently large polynomial. Then, the following are equivalent:
-
PKE exists.
-
$(\beta,\gamma)$-$\mathsf{cdWBLearn}^{t} \notin \mathsf{ioFBPP}$.
Computational depth is typically thought of as a measure of the “unnaturality” of strings: strings with low computational depth are considered “natural” and those with high computational depth are considered “unnatural”. (The reason why low computational depth captures “natural” strings is as follows: random strings are known to have low computational depth; furthermore, known results (cf. slow-growth laws [18]) show, at least under derandomization assumptions, that one needs to have a long running time to even find a string with high computational depth. So strings with high computational depth are rare and “hard to find”, which is why they can be thought of as “unnatural”.) Given this interpretation, our characterization of public-key encryption is thus in terms of the worst-case hardness of white-box learning for “natural” instances.
As far as we know, this thus yields the first problem whose worst-case hardness not only suffices for public-key encryption (as does, e.g., the problem underlying [30]) but also is necessary.
On the Use of Computational Depth
We note that Antunes and Fortnow [3] elegantly used computational depth to connect the worst-case hardness of a problem, when restricting attention to instances with small computational depth, with errorless average-case hardness over sampleable distributions; errorless hardness, however, is not sufficient for cryptographic applications. Nevertheless, inspired by the work of [3], worst-case hardness conditioned on instances with small computational depth was used in [26] (and independently, using a variant of this notion, in [18]) to characterize one-way functions; additionally, an (interactive) variant of such a notion was also implicitly used in [6] to characterize key-exchange protocols. Our techniques are similar to those employed in [26, 6], but instead of applying them to study the hardness of a time-bounded Kolmogorov complexity problem (following [25]), we here instead apply them to study a learning-theory problem (namely, white-box learning).
We note that learning-theory problems conditioned on small computational depth were recently used in [18] to characterize one-way functions, but our techniques here are more similar to [26, 6]. In particular, [18] does not actually use the standard notion of computational depth but instead defines a new alternative variant; in contrast, we here rely on just the standard notion.
Relating Exact and Approximate White-Box Learning
Note that in Theorem 1, the equivalence holds for any (sufficiently small) choice of $\beta$ and any constant $\gamma$; as such, we directly get as a corollary the equivalence of exact and approximate white-box learning:
Corollary 2.
Let $\gamma \geq 1$ be a constant, let $\beta$ be any efficiently computable function, and let $t$ be any sufficiently large polynomial. Then the following are equivalent:
-
$\mathsf{cdWBLearn}^{t} \notin \mathsf{ioFBPP}$.
-
$(\beta,\gamma)$-$\mathsf{cdWBLearn}^{t} \notin \mathsf{ioFBPP}$.
Bounded-Degree Learning and LWE
While in the $(\beta,\gamma)$-$\mathsf{WBLearn}$ problem we allow the function we are trying to learn to be any polynomial-size circuit, we may also consider a restricted version of the problem, denoted $(\beta,\gamma)$-$\mathsf{WBLearn}_{d}$, where we restrict attention to functions that can be computed by a degree-$d$ polynomial, and we assume that arithmetic is now over $\mathbb{Z}_{q}$ for a prime $q$.
We first remark that our main theorem can next be generalized (basically using padding) to show that it suffices to use learning of degree-$d$ polynomials, for a sufficiently large polynomial $d$, to characterize PKE:
Theorem 3.
Let $\gamma \geq 1$ be a constant, let $\beta$ and $d$ be efficiently computable functions with $d$ a sufficiently large polynomial, and let $t$ be a sufficiently large polynomial. The following are equivalent:
-
PKE exists.
-
$\mathsf{cdWBLearn}^{t} \notin \mathsf{ioFBPP}$.
-
$\mathsf{cdWBLearn}^{t}_{d} \notin \mathsf{ioFBPP}$.
-
$(\beta,\gamma)$-$\mathsf{cdWBLearn}^{t}_{d} \notin \mathsf{ioFBPP}$.
Additionally, as informally discussed above, we note that the hardness of the LWE problem implies hardness of the $\mathsf{cdWBLearn}^{t}_{1}$ problem (i.e., white-box learning of linear functions).
Lemma 4 (Informal).
Assuming the hardness of LWE, there exists a polynomial $t$ such that $\mathsf{cdWBLearn}^{t}_{1} \notin \mathsf{ioFBPP}$, where $\mathsf{cdWBLearn}^{t}_{1}$ denotes the restriction of $\mathsf{cdWBLearn}^{t}$ to linear (degree-$1$) functions.
Lemma 4 thus shows that white-box learning of linear functions is at least as weak an assumption as LWE. At first sight, one would expect that a converse result may also hold, due to the known worst-case to average-case reductions for the LWE problem [30, 28, 9], and thus that LWE is equivalent to white-box learning of linear functions. However, the problem with proving the converse direction is that the known worst-case to average-case reductions for LWE only work when the LWE instance defines a lattice where the shortest vector is long compared to the amount of noise. Instances of $\mathsf{cdWBLearn}^{t}_{1}$ may not necessarily satisfy this promise, and thus it is not clear how to use an LWE oracle to solve white-box learning of linear functions in general. Thus, it would seem that hardness of even just $\mathsf{cdWBLearn}^{t}_{1}$ is a weaker (and therefore more general) assumption than LWE. (Of course, it may be that a stronger worst-case to average-case reduction can be established for the LWE problem, in which case equivalence would hold.)
We finally investigate what happens in the regime of “intermediate-degree” polynomials. We remark that using standard linearization techniques, the constant-degree problem is equivalent to the case of degree 1:
Lemma 5.
For every constant $d$, and all functions $\beta, \gamma$ and polynomials $t$, there exist $\beta', \gamma'$ and a polynomial $t'$ such that if $(\beta',\gamma')$-$\mathsf{cdWBLearn}^{t'}_{1} \in \mathsf{ioFBPP}$ then $(\beta,\gamma)$-$\mathsf{cdWBLearn}^{t}_{d} \in \mathsf{ioFBPP}$, and vice versa.
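To illustrate the linearization step (shown here for degree $2$; the general case is analogous): a degree-$2$ polynomial over $\mathbb{Z}_q$ becomes a linear function over the expanded monomial vector,
\[
f(x) \;=\; \sum_{i \leq j} c_{ij}\, x_i x_j \;=\; \langle c, z(x)\rangle,
\qquad z(x) = (x_i x_j)_{i \leq j} \in \mathbb{Z}_q^{\binom{n+1}{2}},
\]
so a sampler for the degree-$2$ problem can be rewritten as a sampler for the linear problem that outputs $(z(x), y)$ in place of $(x, y)$; the number of variables grows from $n$ to $O(n^{2})$ (and to $O(n^{d})$ for degree $d$, which is why this collapse only works for constant degree).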
Black-box Learning
Finally, to put our results in context, we consider the standard PAC learning model [35] where the learner only gets access to samples: let $(\beta,\gamma)$-$\mathsf{BBLearn}$ denote identically the same problem as $(\beta,\gamma)$-$\mathsf{WBLearn}$ with the exception that the learner gets oracle access (i.e., black-box access) to the sampler (as opposed to white-box access to the sampler). This notion is equivalent to a notion of improper approximate PAC learning for polynomial-size circuits (and when $\beta = 0$, to simply improper PAC learning).
As before, let $(\beta,\gamma)$-$\mathsf{cdBBLearn}^{t}$ denote the problem $(\beta,\gamma)$-$\mathsf{BBLearn}$ with the additional promise that $\mathsf{cd}^{t}(\mathsf{Samp}\|f) \leq 2\log|\mathsf{Samp}|$, and let $\mathsf{BBLearn}$ and $\mathsf{cdBBLearn}^{t}$ denote $(\beta,\gamma)$-$\mathsf{BBLearn}$ and $(\beta,\gamma)$-$\mathsf{cdBBLearn}^{t}$ when $\beta = 0$.
The following theorem can be viewed as the worst-case analog of the classic result of [21, 7] characterizing one-way functions through the hardness of average-case PAC learning.
Theorem 6.
Let $\gamma \geq 1$ be a constant, let $\beta$ be any efficiently computable function, and let $t$ be any sufficiently large polynomial. Then the following are equivalent:
-
One-way functions exist.
-
$\mathsf{cdBBLearn}^{t} \notin \mathsf{ioFBPP}$.
-
$(\beta,\gamma)$-$\mathsf{cdBBLearn}^{t} \notin \mathsf{ioFBPP}$.
As mentioned, [18] also recently obtained a worst-case characterization of one-way functions through a learning problem, using a notion of computational depth. The problems, however, are somewhat different. As opposed to [18], we here consider the standard PAC learning problem (whereas they consider a more general learning problem), and condition on the standard notion of low computational depth (whereas they condition on a new notion of low computational depth that they introduce). (Of course, on a conceptual level, these results are similar; the key point we are trying to make here is that exactly the same learning problem characterizes either one-way functions or PKE, depending on whether the learner gets black-box or white-box access to the sampler.)
We remark that our Theorem 6 differs from [21, 7] not only in the worst-case condition, but also generalizes those results in the sense that we handle the hardness of $(\beta,\gamma)$-approximate learning for any such choice of parameters. As a consequence, we again get an equivalence of approximate and exact black-box learning:
Corollary 7.
Let $\gamma \geq 1$ be a constant, let $\beta$ be any efficiently computable function, and let $t$ be any sufficiently large polynomial. Then the following are equivalent:
-
$\mathsf{cdBBLearn}^{t} \notin \mathsf{ioFBPP}$.
-
$(\beta,\gamma)$-$\mathsf{cdBBLearn}^{t} \notin \mathsf{ioFBPP}$.
Open Problems
We leave as an intriguing open problem the question of whether white-box learning of polynomial-degree, or even logarithmic-degree, polynomials can also be collapsed down to the case of constant-degree functions (and thus to linear functions); if this were possible, it would show that PKE, in essence, inherently requires the structure of the LWE problem.
Additionally, as discussed above, even when just restricting attention to learning linear functions, our learning problem generalizes LWE. It would appear that despite this, cryptographic applications of LWE (e.g., to obtain fully homomorphic encryption [14, 10]) nevertheless may still be possible from this generalized version; we leave an exploration of this for future work.
Finally, it is an intriguing open problem to relate black-box and white-box learning. By our results, doing so is equivalent to relating public-key and secret-key encryption. We note that relating black-box and white-box learning is interesting even just for the case of linear functions (which by our results is equivalent to constant-degree polynomials). Indeed, Regev’s construction of a PKE [30] can be thought of as a reduction from black-box learning to white-box learning of linear functions for a specific distribution; it is possible that a similar reduction may be applicable more generally.
1.2 Proof Overview
We here provide a detailed proof overview for the proof of Theorem 1. For simplicity, we will show the equivalence between PKE and the worst-case hardness of the exact version, $\mathsf{cdWBLearn}^{t}$. We start with the construction of PKE based on the hardness of $\mathsf{cdWBLearn}^{t}$.
Weak PKE
First, we use the well-known fact that a PKE is simply a two-round key-agreement protocol. Moreover, by the key-agreement amplification theorem of Holenstein [19] and an application of the Goldreich–Levin theorem [15], to obtain (full-fledged) PKE, it suffices to obtain a weak form of two-round key-agreement, which we simply refer to as Weak PKE, defined as follows: there exist bounds $\alpha > \beta$ (with a noticeable gap) such that agreement between A and B happens with probability at least $\alpha$. Security requires that Eve cannot guess the key (output by Alice) with probability better than $\beta$.
The Weak PKE protocol
We will next define the weak PKE protocol. We note that this construction resembles the universal key-agreement construction from [16], but with some crucial differences that enable our security proof. The parties A and B, on common input $1^{n}$, perform the following steps:
-
Sample random program: A samples a random length $\ell \leftarrow [n]$, and a random program $\Pi \leftarrow \{0,1\}^{\ell}$ of length $\ell$.
-
Run random program: Next, A runs the program $\Pi$ for at most $t(n)$ steps to get an output, and interprets the output as a pair of circuits $\mathsf{Samp}$ and $f$. (Think of $\mathsf{Samp}$ as the sampler for a white-box learning instance, and of $f$ as a potential solution to the problem.) If the output of $\Pi$ is not a valid encoding of such a pair, A sets $\mathsf{Samp} = \mathsf{Samp}_0$ and $f = f_0$ for two fixed circuits that always output $0$.
-
Agreement estimation: A estimates the “agreement probability” of $\mathsf{Samp}$ and $f$ (i.e., checking whether $f$ indeed is a good solution): it samples random inputs $r_1, \dots, r_k$ for $\mathsf{Samp}$, and computes $(x_i, y_i) = \mathsf{Samp}(r_i)$. It then lets $p = \frac{1}{k}|\{i : f(x_i) = y_i\}|$. If $p < 1 - 2n^{-7}$, A resets the pair $(\mathsf{Samp}, f) = (\mathsf{Samp}_0, f_0)$.
-
First message: A sends $\mathsf{Samp}$ (the “sampler”) to B.
-
Second message: B applies $\mathsf{Samp}$ on a random input $r$, and computes $(x, y) = \mathsf{Samp}(r)$. It then sends $x$ to A.
-
Outputs: A outputs $f(x)$ and Bob outputs $y$.
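To make the message flow concrete, here is a minimal executable sketch of the protocol. The constants (`K_SAMPLES`, `THRESHOLD`) and the helper `run_program` are our own illustrative placeholders; in particular, `run_program` stands in for emulating a universal machine on a random program, which we do not implement:

    import random

    K_SAMPLES = 1000      # number of agreement-estimation samples (illustrative)
    THRESHOLD = 0.9       # acceptance threshold (illustrative)

    def samp0(r):         # fixed fallback sampler: always outputs (0, 0)
        return 0, 0

    def f0(x):            # fixed fallback solution: always outputs 0
        return 0

    def run_program(pi, steps):
        """Placeholder for running program pi for `steps` steps and parsing the
        output as a pair of circuits (Samp, f); here we always fail to parse."""
        return None

    def alice(n):
        ell = random.randint(1, n)
        pi = random.getrandbits(ell)             # random program of length ell
        out = run_program(pi, steps=n ** 3)
        samp, f = out if out is not None else (samp0, f0)
        # Agreement estimation: is f a good solution for samp?
        hits = sum(f(x) == y for x, y in
                   (samp(random.getrandbits(n)) for _ in range(K_SAMPLES)))
        if hits / K_SAMPLES < THRESHOLD:
            samp, f = samp0, f0
        return samp, f                           # samp is the first message

    def bob(samp, n):
        x, y = samp(random.getrandbits(n))       # fresh sample
        return x, y                              # x is the second message

    n = 16
    samp, f = alice(n)
    x, y_bob = bob(samp, n)
    assert f(x) == y_bob                         # with the fallback pair, both are 0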
Agreement
We claim that with probability at least $1 - 4n^{-7}$, Alice and Bob will agree (i.e., the final outputs are the same). Note that if $(\mathsf{Samp}, f) = (\mathsf{Samp}_0, f_0)$, Alice and Bob always agree. Moreover, let $p^{*} = \Pr_{r}[f(x) = y \text{ for } (x, y) = \mathsf{Samp}(r)]$. Then, the probability that Alice and Bob agree, given that Alice uses $(\mathsf{Samp}, f)$ as the circuits in the protocol, is exactly $p^{*}$. We observe that by the Chernoff bound, the probability that $p$ is far from $p^{*}$ is small, and thus Alice uses $(\mathsf{Samp}, f)$ only when $p^{*}$ is larger than $1 - 3n^{-7}$ (except with exponentially small probability).
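Concretely, by Hoeffding’s inequality (see Fact 21), with $k$ samples the estimate $p$ satisfies
\[
\Pr\bigl[\,|p - p^{*}| \geq \varepsilon\,\bigr] \;\leq\; 2e^{-2k\varepsilon^{2}},
\]
so taking, say, $k = n^{15}$ samples and $\varepsilon = n^{-7}$ makes the estimation error as unlikely as $2e^{-2n}$.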
Security
We claim that an Eve that can guess Alice's output with probability $1 - \delta$, for $\delta = n^{-6}$, can be used to solve
$\mathsf{cdWBLearn}^{t}$.
In more detail, consider such an Eve, and let $\mathsf{Samp}^{*}$ be an input for
$\mathsf{cdWBLearn}^{t}$. The idea is to construct an algorithm $A$, that on input $\mathsf{Samp}^{*}$ outputs a circuit $C$, such that $C(x)$ simply simulates Eve on the messages $\mathsf{Samp}^{*}$ and $x$. $C$ then outputs Eve's output. For simplicity, in the following we assume that Eve (and thus $C$) is deterministic. (We note that this is a non-black-box reduction: we are using the code of Eve to generate $C$ – in particular, we are including the code of Eve in this circuit. This particular non-black-box usage of Eve is not inherent: we could have considered a different formalization of the learning-theory problem which simply requires the attacker to succeed on a randomly sampled instance. Subsequent parts of the argument, however, will use non-black-box access to Eve more inherently.)
Let $\mathbf{S}$ be a random variable distributed according to the same distribution as the first message in the above protocol. By assumption, we have that
\[ \Pr[\mathsf{Eve}(\mathbf{S}, x) = y] \geq 1 - \delta. \]
It follows by a simple averaging argument that
\[ \Pr_{\mathsf{Samp} \leftarrow \mathbf{S}}\left[\Pr_{x}[\mathsf{Eve}(\mathsf{Samp}, x) = y] \geq 2/3\right] \geq 1 - 3\delta. \]
Namely, $A$ solves $\mathsf{cdWBLearn}^{t}$ with probability at least $1 - 3\delta$ over the distribution of $\mathbf{S}$. We next use ideas from [26, 6] to show this implies that $A$ solves $\mathsf{cdWBLearn}^{t}$ in the worst case.
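The averaging step is just Markov’s inequality applied to Eve’s failure probability:
\[
\Pr_{\mathsf{Samp}\leftarrow\mathbf{S}}\Bigl[\Pr_{x}\bigl[\mathsf{Eve}(\mathsf{Samp}, x) \neq y\bigr] > \tfrac{1}{3}\Bigr]
\;\leq\; \frac{\mathbb{E}_{\mathsf{Samp}\leftarrow\mathbf{S}}\bigl[\Pr_{x}[\mathsf{Eve}(\mathsf{Samp}, x) \neq y]\bigr]}{1/3}
\;\leq\; 3\delta.
\]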
Indeed, assume that $A$ fails on some instance $\mathsf{Samp}^{*}$ of $\mathsf{cdWBLearn}^{t}$. By the promise of the problem, there exists a circuit $f^{*}$ such that $\mathsf{cd}^{t}(\mathsf{Samp}^{*}\|f^{*}) \leq 2\log|\mathsf{Samp}^{*}|$, and $f^{*}$ agrees with $\mathsf{Samp}^{*}$ with probability at least $1 - n^{-7}$. Let $\ell^{*} = K^{t}(\mathsf{Samp}^{*}\|f^{*})$. Our goal is to show that $K(\mathsf{Samp}^{*}\|f^{*}) < \ell^{*} - 2\log|\mathsf{Samp}^{*}|$, which is a contradiction.
Toward this goal, let $S$ be the set of all pairs of circuits $(\mathsf{Samp}, f)$ with $K^{t}(\mathsf{Samp}\|f) = \ell^{*}$ that agree with probability at least $1 - n^{-7}$, and on which $A$ fails, so that $(\mathsf{Samp}^{*}, f^{*}) \in S$. Fix $(\mathsf{Samp}, f) \in S$, and let $\Pi$ be the length-$\ell^{*}$ program that outputs $(\mathsf{Samp}, f)$ in time $t$. Observe that the probability that A samples $(\mathsf{Samp}, f)$ in the first step of the above protocol is at least its probability to sample the program $\Pi$, which is $\frac{1}{n}\cdot 2^{-\ell^{*}}$. Since $\mathsf{Samp}$ and $f$ agree with high probability, the agreement test in the third step of the protocol will pass with high probability, and thus A will send $\mathsf{Samp}$ to B with probability at least $\frac{1}{2n}\cdot 2^{-\ell^{*}}$. In other words,
\[ \Pr[\mathbf{S} = \mathsf{Samp}] \geq \frac{1}{2n}\cdot 2^{-\ell^{*}} \]
for every $(\mathsf{Samp}, f) \in S$, and thus
\[ \Pr[\mathbf{S} \in S] \geq |S| \cdot \frac{1}{2n}\cdot 2^{-\ell^{*}} \]
(here we say that $\mathbf{S} \in S$ if there is some $f'$ such that $(\mathbf{S}, f') \in S$).
On the other hand, by definition of $S$, $A$ fails on every $\mathsf{Samp} \in S$. Since $A$ fails with probability at most $3\delta$ on $\mathbf{S}$, it must hold that
\[ \Pr[\mathbf{S} \in S] \leq 3\delta. \]
Combining the above, we get that
\[ |S| \leq 3\delta \cdot 2n \cdot 2^{\ell^{*}} = 6\delta n \cdot 2^{\ell^{*}}. \]
We can now use the bound on $|S|$ to bound the Kolmogorov complexity of $(\mathsf{Samp}^{*}, f^{*})$. To describe $(\mathsf{Samp}^{*}, f^{*})$, it is enough to describe the set $S$, and the index of the pair $(\mathsf{Samp}^{*}, f^{*})$ in this set. That is,
\[ K(\mathsf{Samp}^{*}\|f^{*}) \leq \log|S| + K(S) + O(1). \]
We conclude the proof with the observation that to describe $S$ it is enough to describe $\ell^{*}$ (which can be done using $O(\log n)$ bits), $n$ and $t$ ($O(\log n)$ bits) and the algorithm $A$ (that can be described with constantly many bits; this non-black-box usage of Eve, which is taken from [26, 6], is seemingly inherent to our proof technique – note that we are here relying on the fact that Eve is a uniform algorithm, but as we discuss in the formal section, the argument can be extended to work also in the non-uniform setting). Thus, $K(S) = O(\log n)$, and we get that
\[ K(\mathsf{Samp}^{*}\|f^{*}) \leq \ell^{*} + \log(6\delta n) + O(\log n) \leq \ell^{*} - 5\log n + O(\log n) < \ell^{*} - 2\log|\mathsf{Samp}^{*}|, \]
as we wanted to show.
Hardness of $\mathsf{cdWBLearn}^{t}$ from PKE
We next show that the existence of PKE implies the hardness of $\mathsf{cdWBLearn}^{t}$. We now sketch the proof. Let $\mathsf{Gen}$ be the algorithm that generates a pair of public and secret keys, and let $\mathsf{Enc}$ and $\mathsf{Dec}$ be the encryption and decryption algorithms. For a random pair of keys $(pk, sk)$, we construct two circuits, $\mathsf{Samp}(r, b) = (\mathsf{Enc}_{pk}(b; r), b)$ and $f = \mathsf{Dec}_{sk}$.
By the security of the PKE scheme, it follows that with high probability over the randomness of $\mathsf{Gen}$, it is hard to learn the function of $\mathsf{Samp}$, as the circuit $\mathsf{Samp}$ only uses the public key, and the function it computes is the decryption of a random encryption. This already implies that $\mathsf{WBLearn}$ is hard, but we still need to show that $\mathsf{Samp}$ is inside the promise of the problem $\mathsf{cdWBLearn}^{t}$.
It follows by the correctness of the PKE scheme that $\mathsf{Dec}_{sk}$ computes the function that is sampled by $\mathsf{Samp}$. Thus, to be in the promise of $\mathsf{cdWBLearn}^{t}$, we only need that $(\mathsf{Samp}, \mathsf{Dec}_{sk})$ has small computational depth. This, however, is not necessarily the case. To solve this, we simply pad $\mathsf{Samp}$ with the randomness used by $\mathsf{Gen}$. That is, we let $\widetilde{\mathsf{Samp}}$ be a circuit functionally equivalent to $\mathsf{Samp}$, with $r_{G}$ encoded into it, where $r_{G}$ is the randomness such that $(pk, sk) = \mathsf{Gen}(1^{n}; r_{G})$. It follows that when $r_{G}$ is long enough, $K^{t}(\widetilde{\mathsf{Samp}}\|\mathsf{Dec}_{sk}) \leq |r_{G}| + O(\log n)$ (as we can describe them by simply describing the randomness $r_{G}$ and the algorithms $\mathsf{Gen}$, $\mathsf{Enc}$ and $\mathsf{Dec}$). On the other hand, $K(\widetilde{\mathsf{Samp}}\|\mathsf{Dec}_{sk}) \geq K(r_{G}) - O(\log n)$ (since $r_{G}$ can be obtained from $\widetilde{\mathsf{Samp}}$), which is at least $|r_{G}| - O(\log n)$ with high probability. Together, we conclude that with high probability over the randomness of $\mathsf{Gen}$, the circuits $(\widetilde{\mathsf{Samp}}, \mathsf{Dec}_{sk})$ are in the promise of $\mathsf{cdWBLearn}^{t}$.
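To illustrate the shape of this reduction (only the shape – the “PKE” below is a placeholder with no security whatsoever, and all names are ours), the instance $(\mathsf{Samp}, f)$ is built from a key pair as follows:

    import secrets

    def gen(n):
        sk = secrets.randbits(n)
        pk = sk ^ 1                    # placeholder key relation -- NOT secure
        return pk, sk

    def enc(pk, b, n):
        r = secrets.randbits(n)
        return (r, ((r ^ pk) & 1) ^ b) # placeholder "ciphertext" of the bit b

    def dec(sk, c):
        r, u = c
        return ((r ^ sk ^ 1) & 1) ^ u  # pk = sk ^ 1, so this inverts enc

    def make_instance(n):
        pk, sk = gen(n)
        def samp(_r=None):             # Samp: encrypt a random bit, label it
            b = secrets.randbits(1)
            return enc(pk, b, n), b
        def f(x):                      # f = Dec_sk: the promised solution
            return dec(sk, x)
        return samp, f

    samp, f = make_instance(16)
    x, y = samp()
    assert f(x) == y                   # PKE correctness = the WBLearn promise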
Comparison with [6]
We remark that at a high level, our construction of a two-message key-agreement protocol shares many similarities with the key-agreement protocol developed in [6], relying on the hardness of an interactive notion of time-bounded Kolmogorov complexity, conditioned on an analog of computational shallowness. The central difference is that the protocol in [6] requires at least 3 rounds, due to the use of an equality-check protocol to determine whether Alice and Bob managed to agree on a key. In contrast, our protocol does not rely on such an equality-check step (and thus it can be executed in only two rounds, which is crucial to get PKE); indeed, we replaced the equality protocol with a step where Alice on her own can determine whether the message she sends will enable agreement to happen. Of course, the reason why we can do this is that we are reducing security from a different problem.
Comparison with Universal Constructions
Universal constructions of PKE are known (see [16]); that is, constructions having the property that they are secure if (any) PKE exists. We emphasize that while the details of the protocol from [16] are somewhat different, the “agreement estimation” step performed there is very similar to what we do. Furthermore, we can interpret our protocol as an alternative (variant) universal PKE protocol in which Alice chooses a random (key-generation) program and executes it to get an encryption scheme $\mathsf{Enc}$ and a decryption scheme $\mathsf{Dec}$, with the keys hardcoded into the schemes. (In the protocol of [16], Alice would instead choose random programs for Alice and Bob to run; we do not know how to prove the security of such a protocol under our assumption.) Alice then estimates the agreement probability of the encryption, and if it is high enough, she sends the encryption scheme as the public key, and uses the decryption scheme as the private key. If PKE exists, then with a noticeable probability over the choice of the program, this scheme will be secure. We emphasize that since we base the security of our protocol (and thus also the above universal one) on the hardness of the white-box learning problem, it enables an approach for measuring the concrete security of the protocol by relating it to the security of the learning-theory problem.
2 Preliminaries
2.1 Notations
All logarithms are taken in base $2$. We use calligraphic letters to denote sets and distributions, bold uppercase for random variables, and lowercase for values and functions. Let $\mathrm{poly}$ stand for the set of all polynomials. Let ppt stand for probabilistic poly-time, and n.u.-poly-time stand for non-uniform poly-time. An n.u.-poly-time algorithm $A$ is equipped with a (fixed) poly-size advice string set $\{z_n\}_{n \in \mathbb{N}}$ (that we typically omit from the notation), and we let $A_n$ stand for $A$ equipped with the advice $z_n$ (used for inputs of length $n$). For a randomized algorithm $A$, we denote by $A_z$ the algorithm $A$ with fixed randomness $z$. Let $\mathrm{neg}$ stand for a negligible function. Given a vector $v$, let $v_i$ denote its $i$-th entry, let $v_{<i} = (v_1, \dots, v_{i-1})$ and $v_{\leq i} = (v_1, \dots, v_i)$. Similarly, for a set $\mathcal{I} \subseteq \mathbb{N}$, let $v_{\mathcal{I}}$ be the ordered sequence $(v_i)_{i \in \mathcal{I}}$. For $x, y \in \{0,1\}^{*}$, we use $x\|y$ to denote the concatenation of $x$ and $y$. For a set $\mathcal{S} \subseteq \{0,1\}^{*}$, we use $\mathcal{S}_n$ to denote the set $\mathcal{S} \cap \{0,1\}^{n}$.
2.2 Distributions and Random Variables
When unambiguous, we will naturally view a random variable as its marginal distribution. The support of a finite distribution $\mathcal{P}$ is defined by $\mathrm{Supp}(\mathcal{P}) = \{x : \mathcal{P}(x) > 0\}$. For a (discrete) distribution $\mathcal{P}$, let $x \leftarrow \mathcal{P}$ denote that $x$ was sampled according to $\mathcal{P}$. Similarly, for a set $\mathcal{S}$, let $x \leftarrow \mathcal{S}$ denote that $x$ is drawn uniformly from $\mathcal{S}$. For $n \in \mathbb{N}$, we use $U_n$ to denote a uniform random variable over $\{0,1\}^{n}$ (that is independent from other random variables in consideration). The statistical distance (also known as variation distance) of two distributions $\mathcal{P}$ and $\mathcal{Q}$ over a discrete domain $\mathcal{X}$ is defined by $\mathrm{SD}(\mathcal{P}, \mathcal{Q}) = \max_{\mathcal{S} \subseteq \mathcal{X}} |\mathcal{P}(\mathcal{S}) - \mathcal{Q}(\mathcal{S})| = \frac{1}{2}\sum_{x \in \mathcal{X}} |\mathcal{P}(x) - \mathcal{Q}(x)|$.
The following lemma is proven in the full version of this paper.
Lemma 8.
There exists an efficient oracle-aided algorithm $\mathsf{Pred}$ such that the following holds. Let $(X, Y)$ be a pair of jointly distributed random variables over $\{0,1\}^{n} \times \{0,1\}$, and let $U$ be a uniform independent random variable over $\{0,1\}$. Let $D$ be an algorithm such that $\Pr[D(X, Y) = 1] - \Pr[D(X, U) = 1] \geq \epsilon$, for some $\epsilon > 0$. Then $\Pr[\mathsf{Pred}^{D}(X) = Y] \geq \frac{1}{2} + \epsilon$.
2.3 Circuits
In this paper we consider circuits over the De Morgan basis, which contains the following gates: $\wedge$ (“and” gate with fan-in two), $\vee$ (“or” gate with fan-in two), and $\neg$ (“not” gate with fan-in one). The size of a circuit $C$ is the number of gates in $C$.
We consider encodings of a circuit as a string over $\{0,1\}$ in the following natural way: given a circuit $C$, we first encode the size of $C$ using a prefix-free encoding, and then for every gate $g$, according to a topological order, we encode its type (input, output, $\wedge$, $\vee$, or $\neg$), and the (up to two) gates wired into $g$.
Observe that every circuit of size $s$ can be encoded as a string of length $O(s \log s)$. Moreover, given the encoding of $C$ and an input $x$ to $C$, the value $C(x)$ can be computed efficiently (in time polynomial in the encoding length).
We identify a string in $\{0,1\}^{*}$ with the circuit it encodes. For example, for a string $C$ we use $C(x)$ to denote the value of the circuit encoded by $C$ applied on the input $x$. We denote by $|C|$ the length of the encoding of $C$.
Finally, by encoding first the size of $C$, we assume that for every encoding of a circuit $C$, and for every $z \in \{0,1\}^{*}$, the string $C\|z$ encodes the same circuit as $C$. (We actually only use the fact that we can encode $z$ into the circuit $C$; this can be done by adding dummy gates that do not change the output of $C$, where the $i$-th added gate is either $\wedge$ or $\vee$ according to the $i$-th bit of $z$.)
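For concreteness, here is one standard self-delimiting encoding of the kind assumed above (our own illustrative choice, not necessarily the paper’s); note how trailing bits after a codeword are ignored, matching the padding convention:

    def prefix_free(s: str) -> str:
        """Self-delimiting encoding of a bit-string s: the length of s in
        binary with every bit doubled, then the terminator '01', then s.
        No valid codeword is a prefix of another."""
        length = format(len(s), "b")
        return "".join(bit * 2 for bit in length) + "01" + s

    def decode(w: str) -> tuple[str, str]:
        """Inverse of prefix_free; returns (s, remaining suffix)."""
        i, length_bits = 0, ""
        while w[i:i + 2] != "01":
            length_bits += w[i]
            i += 2
        n = int(length_bits, 2) if length_bits else 0
        return w[i + 2: i + 2 + n], w[i + 2 + n:]

    enc = prefix_free("10110")
    assert decode(enc + "111")[0] == "10110"   # trailing bits are ignored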
2.4 Entropy
For a random variable $X$, let $H(X) = \mathbb{E}_{x \leftarrow X}\left[\log \frac{1}{\Pr[X = x]}\right]$ denote the (Shannon) entropy of $X$, and let $H_{\infty}(X) = \min_{x \in \mathrm{Supp}(X)} \log \frac{1}{\Pr[X = x]}$ denote the min-entropy of $X$.
For a random variable $X$ and an event $E$, we use $H_{\infty}(X \mid E)$ to denote the min-entropy of the distribution of $X$ conditioned on $E$. We will use the following facts.
Fact 9.
Let $X$ and $Y$ be independent random variables. Then $H_{\infty}((X, Y)) = H_{\infty}(X) + H_{\infty}(Y)$.
2.5 Complexity Classes
We define the complexity classes $\mathsf{ioFBPP}$ and $\mathsf{ioFBPP}/\mathsf{poly}$.
Definition 10 (Infinitely-often FBPP ($\mathsf{ioFBPP}$)).
A binary relation $R$ is in $\mathsf{ioFBPP}$ if there exists a ppt algorithm $A$ such that the following holds for infinitely many $n$'s:
For every $x \in \{0,1\}^{n}$ such that there exists $y$ with $(x, y) \in R$,
\[ \Pr\left[(x, A(x)) \in R\right] \geq 2/3. \]
$R$ is in $\mathsf{ioFBPP}/\mathsf{poly}$ if the above holds with respect to n.u.-poly-time algorithms $A$.
2.6 One-Way Function
Definition 11 (One-way function).
A polynomial-time computable function $f : \{0,1\}^{*} \to \{0,1\}^{*}$ is called a one-way function if for every polynomial-time algorithm $A$,
\[ \Pr_{x \leftarrow \{0,1\}^{n}}\left[A(1^{n}, f(x)) \in f^{-1}(f(x))\right] = \mathrm{neg}(n). \]
Definition 12 (Weak one-way function).
Let $\alpha$ be a polynomial-time computable function. A polynomial-time computable function $f$ is called an $\alpha$-weak one-way function if for every polynomial-time algorithm $A$, for every large enough $n$,
\[ \Pr_{x \leftarrow \{0,1\}^{n}}\left[A(1^{n}, f(x)) \in f^{-1}(f(x))\right] < 1 - \alpha(n). \]
$f$ is a weak one-way function if it is an $\alpha$-weak one-way function, for some $\alpha(n) = 1/\mathrm{poly}(n)$.
Theorem 13 (Weak to strong OWFs, [36]).
One-way functions exist if and only if weak one-way functions exist.
2.7 Public-Key Encryption
Definition 14 (Public-key encryption scheme (PKE)).
A triplet of randomized, efficiently computable algorithms $(\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec})$ is an $(\alpha, \beta)$-public-key encryption scheme (PKE) if the following holds:
-
Correctness: For every large enough $n$ and any $b \in \{0,1\}$,
\[ \Pr_{(pk, sk) \leftarrow \mathsf{Gen}(1^{n})}\left[\mathsf{Dec}_{sk}(\mathsf{Enc}_{pk}(b)) = b\right] \geq \alpha(n). \]
-
Security: For every PPT $\mathsf{Eve}$, for every large enough $n$,
\[ \Pr_{(pk, sk) \leftarrow \mathsf{Gen}(1^{n}),\, b \leftarrow \{0,1\}}\left[\mathsf{Eve}(pk, \mathsf{Enc}_{pk}(b)) = b\right] \leq \beta(n). \]
Such a scheme is a PKE if it is a $(1 - n^{-c}, \frac{1}{2} + n^{-c})$-PKE for every constant $c$.
The following lemma shows that it is possible to amplify an $(\alpha, \beta)$-PKE into a full-fledged PKE. This lemma is a simple case of the more general result of [19].
Lemma 15 (PKE amplification, [19]).
The following holds for every constant $c$ and efficiently computable $\alpha, \beta$ with $\alpha(n) \geq \beta(n) + n^{-c}$. Assume there exists an $(\alpha, \beta)$-PKE. Then, there exists a PKE.
We also define weak-PKE.
Definition 16 (Weak public-key encryption scheme (weak-PKE)).
For efficiently computable functions $\alpha, \beta$, a triplet of randomized, efficiently computable algorithms $(\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec})$ is an $(\alpha, \beta)$-weak-public-key encryption scheme (weak-PKE) if the following holds:
-
Syntax: For every $n$ and every public key $pk$, given $pk$ and randomness $r$, $\mathsf{Enc}$ outputs a message $c$ and an output (key) $k$.
-
Correctness: For every large enough $n$,
\[ \Pr_{(pk, sk) \leftarrow \mathsf{Gen}(1^{n}),\, (c, k) \leftarrow \mathsf{Enc}(pk)}\left[\mathsf{Dec}_{sk}(c) = k\right] \geq \alpha(n). \]
-
Security: For every PPT $\mathsf{Eve}$, for every large enough $n$,
\[ \Pr\left[\mathsf{Eve}(pk, c) = k\right] \leq \beta(n). \]
In the full version of this paper we prove the following lemma, stating that weak-PKE can be used to construct PKE.
Lemma 17 (Weak-PKE amplification).
The following holds for every constant $c$ and efficiently computable $\alpha, \beta$ with $\alpha(n) \geq \beta(n) + n^{-c}$. Assume there exists an $(\alpha, \beta)$-weak-PKE. Then, there exists a PKE.
2.8 Kolmogorov Complexity and Computational Depth
Roughly speaking, the $t$-time-bounded Kolmogorov complexity, $K^{t}(x)$, of a string $x$ is the length of the shortest program $\Pi$ such that, when simulated by a universal Turing machine, $\Pi$ outputs $x$ within $t(|x|)$ steps. Here, a program is simply a pair $(M, w)$ of a Turing machine $M$ and an input $w$, where the output of the program is defined as the output of $M(w)$. When there is no running-time bound (i.e., the program can run for an arbitrary number of steps), we obtain the notion of Kolmogorov complexity.
In the following, fix a universal TM $\mathsf{U}$ with polynomial simulation overhead, and let $\mathsf{U}(\Pi, 1^{t})$ denote the output of $\mathsf{U}$ when emulating $\Pi$ for $t$ steps. We now define the notion of Kolmogorov complexity with respect to the universal TM $\mathsf{U}$.
Definition 18.
Let $t$ be a polynomial. For all $x \in \{0,1\}^{*}$, define the $t$-bounded Kolmogorov complexity of $x$ as
\[ K^{t}(x) = \min_{\Pi \in \{0,1\}^{*}}\left\{ |\Pi| : \mathsf{U}(\Pi, 1^{t(|x|)}) = x \right\}, \]
where $|\Pi|$ is referred to as the description length of $\Pi$. When there is no time bound, the Kolmogorov complexity of $x$ is defined as
\[ K(x) = \min_{\Pi \in \{0,1\}^{*}}\left\{ |\Pi| : \mathsf{U}(\Pi) = x \right\}. \]
The computational depth of $x$ [4], denoted by $\mathsf{cd}^{t}(x)$, is defined to be $\mathsf{cd}^{t}(x) = K^{t}(x) - K(x)$.
We use $K(x, y)$ to denote the Kolmogorov complexity of some generic self-delimiting encoding of the pair $(x, y)$. Recall that we use $x\|y$ to denote the concatenation of $x$ and $y$. We will use the following well-known fact:
Fact 19.
For every $x, y \in \{0,1\}^{*}$, $K(x\|y) \leq K(x, y) + O(1)$ and $K(x, y) \leq K(x) + K(y) + O(\log(|x| + |y|))$.
We will also use the following bound on the Kolmogorov complexity of strings sampled from distributions with high min-entropy.
Lemma 20.
For every $k \in \mathbb{N}$ and every distribution $X$, it holds that
\[ \Pr_{x \leftarrow X}\left[K(x) < H_{\infty}(X) - k\right] < 2^{-k}. \]
Proof.
There are at most $2^{H_{\infty}(X) - k}$ strings $x$ with $K(x) < H_{\infty}(X) - k$, and each such $x$ has probability at most $2^{-H_{\infty}(X)}$ under $X$. Thus,
\[ \Pr_{x \leftarrow X}\left[K(x) < H_{\infty}(X) - k\right] \leq 2^{H_{\infty}(X) - k} \cdot 2^{-H_{\infty}(X)} = 2^{-k}. \]
We will also use the well-known Hoeffding bound in our proof.
Fact 21 (Hoeffding's inequality).
Let $X_{1}, \dots, X_{k}$ be independent random variables s.t. $X_{i} \in [0,1]$. Let $\bar{X} = \frac{1}{k}\sum_{i=1}^{k} X_{i}$ and $\mu = \mathbb{E}[\bar{X}]$. For every $\epsilon > 0$ it holds that:
\[ \Pr\left[\left|\bar{X} - \mu\right| \geq \epsilon\right] \leq 2e^{-2k\epsilon^{2}}. \]
3 White-Box Distributional Learning
Let $\mathsf{Samp}$ be a circuit that, given $\ell$ bits of randomness, samples labeled instances $(x, y)$. In the following we view $y$ as a binary representation of a number in $\{0, 1, \dots, 2^{m} - 1\}$ (and respectively all the operations below are over $\mathbb{Z}$, and we use $|\cdot|$ to denote the absolute value). We define the set of all circuits that approximate $\mathsf{Samp}$ with high probability,
\[ \mathcal{C}^{\mathsf{Samp}}_{\beta}[\rho] = \left\{ C : \Pr_{(x, y) \leftarrow \mathsf{Samp}}\left[\,|C(x) - y| \leq \beta\,\right] \geq \rho \right\}. \]
We define the following white-box learning problem WBLearn:
Definition 22 ($(\beta,\gamma)$-WBLearn).
For functions $\beta$ and $\gamma$, let $(\beta,\gamma)$-$\mathsf{WBLearn}$ be the following learning problem:
-
Input: Circuit $\mathsf{Samp}$, with the promise that there exists a circuit $f$ of size at most $|\mathsf{Samp}|$ such that $f \in \mathcal{C}^{\mathsf{Samp}}_{\beta}[1 - n^{-7}]$, where $n = |\mathsf{Samp}|$.
-
Output: Circuit $C \in \mathcal{C}^{\mathsf{Samp}}_{\gamma\beta}[2/3]$.
In this work we are focusing on inputs $\mathsf{Samp}$ for which the circuit $f$ that agrees with $\mathsf{Samp}$ with high probability is such that the description of $\mathsf{Samp}$ and $f$ is computationally shallow. Formally, for a time function $t$, we denote by $(\beta,\gamma)$-$\mathsf{cdWBLearn}^{t}$ the problem $(\beta,\gamma)$-$\mathsf{WBLearn}$ with the additional promise that $K^{t}(\mathsf{Samp}\|f) \leq |\mathsf{Samp}|$ and $\mathsf{cd}^{t}(\mathsf{Samp}\|f) \leq 2\log|\mathsf{Samp}|$.
We use $\mathsf{WBLearn}$ and $\mathsf{cdWBLearn}^{t}$ to denote $(\beta,\gamma)$-$\mathsf{WBLearn}$ and $(\beta,\gamma)$-$\mathsf{cdWBLearn}^{t}$ when $\beta = 0$. Note that $\mathsf{WBLearn}$ and $(\beta,\gamma)$-$\mathsf{WBLearn}$ are incomparable: while in $(\beta,\gamma)$-$\mathsf{WBLearn}$ we only need to find a circuit that approximates the labels, the promise in $\mathsf{WBLearn}$ is stronger. Yet, there is a simple reduction from $\mathsf{WBLearn}$ to
$(\beta,\gamma)$-$\mathsf{WBLearn}$.
Lemma 23.
For every efficiently computable $\beta$ and $\gamma$, it holds that if $(\beta,\gamma)$-$\mathsf{WBLearn} \in \mathsf{ioFBPP}$ then $\mathsf{WBLearn} \in \mathsf{ioFBPP}$. Similarly, for any such $\beta, \gamma$ and a function $t$, there exists $t'$ such that if $(\beta,\gamma)$-$\mathsf{cdWBLearn}^{t} \in \mathsf{ioFBPP}$ then $\mathsf{cdWBLearn}^{t'} \in \mathsf{ioFBPP}$.
Proof.
We start with the first reduction. Given a circuit $\mathsf{Samp}$ of length $n$, the reduction outputs a circuit $\mathsf{Samp}'$, which is equivalent to $\mathsf{Samp}$, with $k = \lceil \log(2\gamma\beta + 2)\rceil$ additional output gates that always output $0$, appended as the low-order bits of the label. That is, $\mathsf{Samp}'(r) = (x, 2^{k}\cdot y)$, where $(x, y) = \mathsf{Samp}(r)$. As we only added $k$ gates to $\mathsf{Samp}$, $\mathsf{Samp}'$ can be encoded using $|\mathsf{Samp}| + O(k)$ bits. Using padding we may assume that $|\mathsf{Samp}'| \geq |\mathsf{Samp}|$.
For correctness, observe that if there exists $f \in \mathcal{C}^{\mathsf{Samp}}_{0}[1 - n^{-7}]$, then there exists $f' \in \mathcal{C}^{\mathsf{Samp}'}_{\beta}[1 - n^{-7}]$ (where $f' = 2^{k}\cdot f$ is defined similarly to $f$ using the scaled labels). Moreover, an approximation of the output of $\mathsf{Samp}'$ within a distance of $\gamma\beta < 2^{k-1}$ is equivalent to the exact output of $\mathsf{Samp}$. Indeed, given a $\gamma\beta$-approximation $\tilde{y}$ of $2^{k}y$ we can find $y$ by simply dividing $\tilde{y}$ by $2^{k}$ and rounding to the closest integer.
Finally, to see the second reduction, it is enough to show that $\mathsf{cd}^{t'}(\mathsf{Samp}'\|f')$ is not much larger than $\mathsf{cd}^{t}(\mathsf{Samp}\|f)$. Since $\mathsf{Samp}'$ can be efficiently constructed given $\mathsf{Samp}$, and similarly $\mathsf{Samp}$ can be efficiently constructed given $\mathsf{Samp}'$, it holds that for a large enough polynomial $t'$, $K^{t'}(\mathsf{Samp}'\|f') \leq K^{t}(\mathsf{Samp}\|f) + O(\log n)$ and $K(\mathsf{Samp}\|f) \leq K(\mathsf{Samp}'\|f') + O(\log n)$.
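A two-line sanity check of the scaling-and-rounding step (the numbers are arbitrary):

    # Exact labels are rescaled by 2**k; any approximation within gamma * beta
    # < 2**(k-1) then determines the original label by rounding.
    gamma_beta, k, y = 6, 4, 37        # 2**(k-1) = 8 > 6
    approx = (y << k) + gamma_beta     # worst-case approximation of 2**k * y
    assert round(approx / 2 ** k) == y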
We prove the following theorem.
Theorem 24.
Let $\gamma \geq 1$ be a constant, let $\beta$ be any efficiently computable function, and let $t$ be any sufficiently large polynomial. Then the following are equivalent:
-
PKE exists.
-
$\mathsf{cdWBLearn}^{t} \notin \mathsf{ioFBPP}$.
-
$(\beta,\gamma)$-$\mathsf{cdWBLearn}^{t} \notin \mathsf{ioFBPP}$.
As a corollary, we get a result of independent interest relating exact and approximate white-box learning:
Corollary 25.
Let $\gamma \geq 1$ be a constant, let $\beta$ be any efficiently computable function, and let $t$ be any sufficiently large polynomial. Then the following are equivalent:
-
$\mathsf{cdWBLearn}^{t} \notin \mathsf{ioFBPP}$.
-
$(\beta,\gamma)$-$\mathsf{cdWBLearn}^{t} \notin \mathsf{ioFBPP}$.
Theorem 24 follows from the following two theorems, together with Lemma 23.
Theorem 26.
Let $t$ be a sufficiently large polynomial. Then if $\mathsf{cdWBLearn}^{t} \notin \mathsf{ioFBPP}$, PKE exist.
Theorem 27.
Assume there exists a PKE. Then for any constant $\gamma \geq 1$, any sufficiently large polynomial $t$, and any efficiently computable function $\beta$, $(\beta,\gamma)$-$\mathsf{cdWBLearn}^{t} \notin \mathsf{ioFBPP}$.
Theorem 26 is proven in Section 4, and Theorem 27 is proven in the full version of this paper.
4 Worst-Case Hardness of $\mathsf{cdWBLearn}^{t}$ Implies PKE
In this section we prove Theorem 26, which states that the worst-case hardness of $\mathsf{cdWBLearn}^{t}$ implies the existence of a public-key encryption scheme. In the following, let $\mathsf{Samp}_{0}$ be the circuit that always outputs $(0, 0)$, and $f_{0}$ be the circuit that always outputs $0$.
To prove the above theorem, fix a sufficiently large polynomial $t$, and consider the following scheme $(\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec})$:
Algorithm 28 ($\mathsf{Gen}$).
Parameter: polynomial $t$.
Input: $1^{n}$.
Operation:
-
1.
Sample $\ell \leftarrow [n]$ and $\Pi \leftarrow \{0,1\}^{\ell}$.
-
2.
Run $\mathsf{U}(\Pi)$ for $t(n)$ steps to get circuits $(\mathsf{Samp}, f)$. If the output of $\Pi$ is not an encoding of two such circuits, set $\mathsf{Samp} = \mathsf{Samp}_{0}$, $f = f_{0}$.
-
3.
Randomly sample $r_{1}, \dots, r_{n^{15}}$, and compute
\[ p = \frac{1}{n^{15}}\left|\left\{ i : f(x_{i}) = y_{i} \text{ for } (x_{i}, y_{i}) = \mathsf{Samp}(r_{i}) \right\}\right|. \]
If $p < 1 - 2n^{-7}$, reset $\mathsf{Samp} = \mathsf{Samp}_{0}$ and $f = f_{0}$.
-
4.
Output $sk = f$ as the secret key, and $pk = \mathsf{Samp}$ as the public key.
Algorithm 29 ($\mathsf{Enc}$).
Input: public-key $pk = \mathsf{Samp}$.
Operation:
-
1.
Sample randomness $r$.
-
2.
Compute $\mathsf{Samp}(r)$ to get $(x, y)$.
-
3.
Output the message $c = x$ and the output (key) $k = y$.
Algorithm 30 ($\mathsf{Dec}$).
Input: secret-key $sk = f$, cipher $c = x$.
Operation:
-
1.
Compute $f(x)$.
-
2.
Output $f(x)$.
Observe that the size of the circuits $\mathsf{Samp}$ and $f$ sampled by $\mathsf{Gen}$ is at most $t(n)$. Thus, all of the above algorithms can be implemented efficiently. Below we bound the correctness and the security of the above scheme. For every $n$, let $\mathbf{SK}_{n}$ and $\mathbf{PK}_{n}$ be the random variables distributed according to the secret and public keys in a random execution of $\mathsf{Gen}(1^{n})$. Let $\mathbf{C}_{n}$ and $\mathbf{K}_{n}$ be the outputs of $\mathsf{Enc}(\mathbf{PK}_{n})$ in a random execution.
4.1 Correctness
We start by analyzing the correctness probability of the scheme.
Lemma 31.
For every $n$, it holds that
\[ \Pr\left[\mathsf{Dec}_{\mathbf{SK}_{n}}(\mathbf{C}_{n}) = \mathbf{K}_{n}\right] \geq 1 - 4n^{-7}. \]
To prove Lemma 31 we will use the following simple claim, which is immediate from the Hoeffding bound.
Claim 32.
Let $\mathsf{Samp}$ and $f$ be two circuits. Let $P$ be a random variable distributed according to the following process:
Sample $r_{1}, \dots, r_{n^{15}}$, and let $P = \frac{1}{n^{15}}\left|\left\{ i : f(x_{i}) = y_{i} \text{ for } (x_{i}, y_{i}) = \mathsf{Samp}(r_{i}) \right\}\right|$.
Let $p^{*} = \Pr_{r}\left[f(x) = y \text{ for } (x, y) = \mathsf{Samp}(r)\right]$.
Then $\Pr\left[\left|P - p^{*}\right| \geq n^{-7}\right] \leq 2e^{-2n}$.
Proof.
Immediate by Fact 21.
Proof of Lemma 31.
If the pair was reset to $(\mathsf{Samp}_{0}, f_{0})$ in Step 3 of $\mathsf{Gen}$, then $\mathsf{Dec}_{\mathbf{SK}_{n}}(\mathbf{C}_{n}) = 0 = \mathbf{K}_{n}$ with probability $1$. Otherwise, the pair passed the test, so $P \geq 1 - 2n^{-7}$, and by Claim 32, except with probability $2e^{-2n}$ over the coins of $\mathsf{Gen}$, the true agreement satisfies $p^{*} \geq 1 - 3n^{-7}$. Since, conditioned on the keys, $\Pr[\mathsf{Dec}_{\mathbf{SK}_{n}}(\mathbf{C}_{n}) = \mathbf{K}_{n}] = p^{*}$, the overall agreement probability is at least $1 - 3n^{-7} - 2e^{-2n} \geq 1 - 4n^{-7}$.
4.2 Security
We next bound the leakage of the scheme.
Lemma 33.
Assume there exists an algorithm $\mathsf{Eve}$ such that
\[ \Pr\left[\mathsf{Eve}(\mathbf{PK}_{n}, \mathbf{C}_{n}) = \mathbf{K}_{n}\right] \geq 1 - n^{-6} \]
for infinitely many $n$'s. Then $\mathsf{cdWBLearn}^{t} \in \mathsf{ioFBPP}$.
In the following, let $\mathsf{Eve}$ be an algorithm that uses $m(n)$ bits of randomness and guesses $\mathbf{K}_{n}$ with probability at least $1 - n^{-6}$. Recall that for $z \in \{0,1\}^{m(n)}$, $\mathsf{Eve}_{z}$ denotes the execution of $\mathsf{Eve}$ when its randomness is fixed to be $z$.
Algorithm 34.
Input: a circuit $\mathsf{Samp}$.
Operation:
-
1.
Let $n = |\mathsf{Samp}|$ (so that, by the promise and the padding convention of Section 2.3, $K^{t(n)}(\mathsf{Samp}\|f) \leq n$ for the promised circuit $f$).
-
2.
Sample $z \leftarrow \{0,1\}^{m(n)}$ uniformly at random.
-
3.
Construct a circuit $C_{z}$ such that $C_{z}(x) = \mathsf{Eve}_{z}(\mathsf{Samp}, x)$.
-
4.
Return $C_{z}$.
We prove the following lemma.
Lemma 35.
Assume that
\[ \Pr\left[\mathsf{Eve}(\mathbf{PK}_{n}, \mathbf{C}_{n}) = \mathbf{K}_{n}\right] \geq 1 - n^{-6} \]
for infinitely many $n$'s. Then the following holds for infinitely many $n$'s. For every input $\mathsf{Samp} \in \{0,1\}^{n}$ satisfying the promise of $\mathsf{cdWBLearn}^{t}$, Algorithm 34 outputs a circuit in $\mathcal{C}^{\mathsf{Samp}}_{0}[3/4]$ with probability at least $1/2$.
Proof of Lemma 33.
Assume there exists an algorithm $\mathsf{Eve}$ such that
\[ \Pr\left[\mathsf{Eve}(\mathbf{PK}_{n}, \mathbf{C}_{n}) = \mathbf{K}_{n}\right] \geq 1 - n^{-6} \]
for infinitely many $n$'s.
By Lemma 35, for infinitely many $n$'s, Algorithm 34 outputs a circuit in $\mathcal{C}^{\mathsf{Samp}}_{0}[3/4]$ with probability at least $1/2$, for every valid instance $\mathsf{Samp}$. We want to construct an algorithm Sol that, given such an instance $\mathsf{Samp}$, outputs a circuit in $\mathcal{C}^{\mathsf{Samp}}_{0}[2/3]$ with probability at least $2/3$, for every such $\mathsf{Samp}$.
Let Sol be the algorithm that, given input $\mathsf{Samp}$, first runs Algorithm 34 on $\mathsf{Samp}$ for $n$ times, to get circuits $C_{1}, \dots, C_{n}$. Then, for every circuit $C_{i}$, Sol samples $r_{1}, \dots, r_{n^{2}}$, and computes
\[ q_{i} = \frac{1}{n^{2}}\left|\left\{ j : C_{i}(x_{j}) = y_{j} \text{ for } (x_{j}, y_{j}) = \mathsf{Samp}(r_{j}) \right\}\right|. \]
Finally, Sol outputs the circuit $C_{i}$ for the index $i$ with maximal value of $q_{i}$.
We now analyze the success probability of Sol. Using Fact 21, for every $i$, the probability that $|q_{i} - \Pr_{r}[C_{i}(x) = y]| \geq 1/48$ is at most $2e^{-n^{2}/1152}$. Thus, by the union bound, the probability that $|q_{i} - \Pr_{r}[C_{i}(x) = y]| \geq 1/48$ for some $i$ is at most $2n \cdot e^{-n^{2}/1152}$. Next, by the success probability of Algorithm 34, with probability at least $1 - 2^{-n}$,
at least one of the circuits is in $\mathcal{C}^{\mathsf{Samp}}_{0}[3/4]$. Let $i^{*}$ be the index of such a circuit. Then, with probability at least $2/3$, such $i^{*}$ exists, and $q_{i^{*}}$ is at least $3/4 - 1/48$, while for every $i$ with $\Pr_{r}[C_{i}(x) = y] < 2/3$, $q_{i}$ is at most $2/3 + 1/48 < 3/4 - 1/48$,
which implies that the output of Sol is in $\mathcal{C}^{\mathsf{Samp}}_{0}[2/3]$.
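In code, the selection procedure of Sol looks as follows (a sketch; `learner` plays the role of Algorithm 34, and all parameters are illustrative):

    import random

    def sol(samp, learner, runs, samples, rand_bits=32):
        """Sketch of Sol: run the (randomized) learner several times, score
        every candidate circuit on fresh samples, and return the best one."""
        candidates = [learner(samp) for _ in range(runs)]
        def score(c):
            hits = 0
            for _ in range(samples):
                x, y = samp(random.getrandbits(rand_bits))
                hits += int(c(x) == y)
            return hits / samples
        return max(candidates, key=score)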
4.3 Proving Lemma 35
To prove Lemma 35, we start with the following claim.
Claim 36.
Let $\mathsf{Samp}$ be a circuit such that $\Pr_{z, r}\left[\mathsf{Eve}_{z}(\mathsf{Samp}, x) = y \text{ for } (x, y) = \mathsf{Samp}(r)\right] \geq 7/8$. Then $\Pr_{z}\left[C_{z} \in \mathcal{C}^{\mathsf{Samp}}_{0}[3/4]\right] \geq 1/2$, where $C_{z}(x) = \mathsf{Eve}_{z}(\mathsf{Samp}, x)$.
Proof of Claim 36.
Let $\mathsf{Samp}$ be a circuit such that
\[ \Pr_{z, r}\left[\mathsf{Eve}_{z}(\mathsf{Samp}, x) = y\right] \geq 7/8. \]
Then, by definition it holds that
\[ \mathbb{E}_{z}\left[\Pr_{r}\left[C_{z}(x) \neq y\right]\right] \leq 1/8. \]
Using Markov's inequality, we get that
\[ \Pr_{z}\left[\Pr_{r}\left[C_{z}(x) \neq y\right] > 1/4\right] \leq \frac{1/8}{1/4} = \frac{1}{2}, \]
which implies that the circuit $C_{z}$ is in $\mathcal{C}^{\mathsf{Samp}}_{0}[3/4]$ with probability at least $1/2$ over the choice of $z$, as we wanted to show.
Proof of Lemma 35.
Assume that
\[ \Pr\left[\mathsf{Eve}(\mathbf{PK}_{n}, \mathbf{C}_{n}) = \mathbf{K}_{n}\right] \geq 1 - n^{-6} \]
for infinitely many $n$'s. In the following, fix such a large enough $n$. We show that
\[ \Pr_{z, r}\left[\mathsf{Eve}_{z}(\mathsf{Samp}, x) = y\right] \geq 7/8 \tag{1} \]
for every $\mathsf{Samp} \in \{0,1\}^{n}$ satisfying the promise of $\mathsf{cdWBLearn}^{t}$. The proof then follows by Claim 36. To see the above, notice that
\[ \mathbb{E}_{\mathsf{Samp} \leftarrow \mathbf{PK}_{n}}\left[\Pr_{z, r}\left[\mathsf{Eve}_{z}(\mathsf{Samp}, x) \neq y\right]\right] \leq n^{-6}. \tag{2} \]
Indeed, it holds that the left-hand side of Equation 2 is exactly $\Pr\left[\mathsf{Eve}(\mathbf{PK}_{n}, \mathbf{C}_{n}) \neq \mathbf{K}_{n}\right]$,
which implies that Equation 2 holds. Next, we use Equation 2 and the upper bound on the computational depth of instances in the promise of $\mathsf{cdWBLearn}^{t}$, to show that Equation 1 holds for every such instance. To do so, fix such a $\mathsf{Samp}$ and let $f$ with $\mathsf{cd}^{t(n)}(\mathsf{Samp}\|f) \leq 2\log n$ be the circuit promised by the definition of $\mathsf{cdWBLearn}^{t}$. Assume towards a contradiction that
\[ \Pr_{z, r}\left[\mathsf{Eve}_{z}(\mathsf{Samp}, x) \neq y\right] > 1/8. \]
We want to upper bound $K(\mathsf{Samp}\|f)$, to get a lower bound on the computational depth of $\mathsf{Samp}\|f$. To this end, let $\ell = K^{t(n)}(\mathsf{Samp}\|f)$, and let $S$ be the set of all pairs $(\mathsf{Samp}', f')$, such that $K^{t(n)}(\mathsf{Samp}'\|f') = \ell$, $f' \in \mathcal{C}^{\mathsf{Samp}'}_{0}[1 - n^{-7}]$, and on which $\mathsf{Eve}$ fails to predict the label with probability more than $1/8$.
By our assumption on $(\mathsf{Samp}, f)$, it holds that $(\mathsf{Samp}, f) \in S$. We next bound the size of $S$. First, we claim that for every $(\mathsf{Samp}', f') \in S$,
\[ \Pr\left[(\mathbf{PK}_{n}, \mathbf{SK}_{n}) = (\mathsf{Samp}', f')\right] \geq \frac{1}{2n} \cdot 2^{-\ell}. \tag{3} \]
Indeed, by definition there exists a program of length $\ell$ that outputs $(\mathsf{Samp}', f')$ in at most $t(n)$ steps. Thus, $\mathsf{Gen}$ samples $(\mathsf{Samp}', f')$ in its second step with probability at least $\frac{1}{n} \cdot 2^{-\ell}$. Next, by Claim 32, the test in Step 3 of $\mathsf{Gen}$ passes with probability at least $1/2$, since by definition of $S$, $f' \in \mathcal{C}^{\mathsf{Samp}'}_{0}[1 - n^{-7}]$. In the case that the test passes, $\mathsf{Samp}'$ and $f'$ are the output of $\mathsf{Gen}$, and thus Equation 3 holds.
By Equation 3 and the definition of $S$, we get that
\[ \mathbb{E}_{\mathsf{Samp}' \leftarrow \mathbf{PK}_{n}}\left[\Pr_{z, r}\left[\mathsf{Eve}_{z}(\mathsf{Samp}', x) \neq y\right]\right] \geq |S| \cdot \frac{1}{2n} \cdot 2^{-\ell} \cdot \frac{1}{8}. \]
Combining the above with Equation 2 yields that
\[ |S| \leq 16n \cdot n^{-6} \cdot 2^{\ell} = 16 n^{-5} \cdot 2^{\ell}. \]
Observe that $S$ can be (inefficiently) computed given $\ell$, $n$ and the code of $\mathsf{Eve}$. Thus, to encode $\mathsf{Samp}\|f$, it is enough to describe $\ell$ and $n$ (which can be done with $O(\log n)$ bits, as $\mathsf{Eve}$ has constant description length) and the index of $(\mathsf{Samp}, f)$ in $S$ (according to the lexicographic order). We conclude that,
\[ K(\mathsf{Samp}\|f) \leq \log|S| + O(\log n) \leq \ell - 5\log n + O(\log n) \leq \ell - 3\log n, \]
where the last inequality holds for every large enough $n$. By the above, $\mathsf{cd}^{t(n)}(\mathsf{Samp}\|f) = \ell - K(\mathsf{Samp}\|f) \geq 3\log n > 2\log n$, in contradiction to the choice of $(\mathsf{Samp}, f)$. This yields that Equation 1 holds for every $\mathsf{Samp}$ in the promise, as we wanted to show.
4.4 Proving Theorem 26
We are now ready to use Lemma 17 in order to prove Theorem 26.
Proof of Theorem 26.
By Lemmas 31 and 33, $(\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec})$ is a $(1 - 4n^{-7}, 1 - n^{-6})$-weak-PKE, and thus it is also an $(\alpha, \beta)$-weak-PKE with $\alpha(n) \geq \beta(n) + n^{-7}$. Thus, by Lemma 17, it can be amplified into a PKE.
Remark 37 (The non-uniform setting).
A similar theorem can be proven when assuming that $\mathsf{cdWBLearn}^{t} \notin \mathsf{ioFBPP}/\mathsf{poly}$, and when the PKE is secure against non-uniform adversaries. In this case, we assume that $\mathsf{Eve}$ is a non-uniform algorithm that breaks the PKE protocol, and want to construct a non-uniform (randomized) algorithm that solves $\mathsf{cdWBLearn}^{t}$.
The issue with the above proof is that we cannot simply use $\mathsf{Eve}$ to bound the Kolmogorov complexity of $\mathsf{Samp}\|f$ as done in the proof of Lemma 35, as $\mathsf{Eve}$ does not have constant size. However, we can find $\mathsf{Eve}$ using a small Turing machine: let $M$ be the (inefficient) Turing machine that, given a constant $c$ such that $n^{c}$ is a bound on the size of $\mathsf{Eve}$, and an input for $\mathsf{Eve}$, first finds the circuit of size at most $n^{c}$ that maximizes the advantage in predicting $\mathbf{K}_{n}$ given an encryption by $\mathsf{Enc}(\mathbf{PK}_{n})$, and then executes this circuit on the input. Observe that $M$ has prediction advantage at least the advantage of $\mathsf{Eve}$. The theorem now follows using the same proof, by replacing $\mathsf{Eve}$ in the proof of Lemma 35 with $M$, and replacing $\mathsf{Eve}_{z}$ in Algorithm 34 with $M_{z}$.
References
- [1] Miklós Ajtai and Cynthia Dwork. A public-key cryptosystem with worst-case/average-case equivalence. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing (STOC), pages 284–293, 1997. See also ECCC TR96-065. doi:10.1145/258533.258604.
- [2] Michael Alekhnovich. More on average case vs approximation complexity. In 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings., pages 298–307. IEEE, 2003. doi:10.1109/SFCS.2003.1238204.
- [3] Luis Antunes and Lance Fortnow. Worst-case running times for average-case algorithms. In 2009 24th Annual IEEE Conference on Computational Complexity, pages 298–303. IEEE, 2009. doi:10.1109/CCC.2009.12.
- [4] Luis Antunes, Lance Fortnow, Dieter Van Melkebeek, and N Variyam Vinodchandran. Computational depth: concept and applications. Theoretical Computer Science, 354(3):391–404, 2006. doi:10.1016/J.TCS.2005.11.033.
- [5] Benny Applebaum, Boaz Barak, and Avi Wigderson. Public-key cryptography from different assumptions. In Proceedings of the forty-second ACM symposium on Theory of computing, pages 171–180, 2010. doi:10.1145/1806689.1806715.
- [6] Marshall Ball, Yanyi Liu, Noam Mazor, and Rafael Pass. Kolmogorov comes to cryptomania: On interactive Kolmogorov complexity and key-agreement. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 458–483. IEEE, 2023. doi:10.1109/FOCS57990.2023.00034.
- [7] Avrim Blum, Merrick Furst, Michael Kearns, and Richard J Lipton. Cryptographic primitives based on hard learning problems. In Annual International Cryptology Conference, pages 278–291. Springer, 1993. doi:10.1007/3-540-48329-2_24.
- [8] Andrej Bogdanov, Miguel Cueto Noval, Charlotte Hoffmann, and Alon Rosen. Public-key encryption from homogeneous CLWE. In Theory of Cryptography: 20th International Conference, TCC 2022, Chicago, IL, USA, November 7–10, 2022, Proceedings, Part II, pages 565–592. Springer, 2022. doi:10.1007/978-3-031-22365-5_20.
- [9] Zvika Brakerski, Adeline Langlois, Chris Peikert, Oded Regev, and Damien Stehlé. Classical hardness of learning with errors. In Proceedings of the forty-fifth annual ACM symposium on Theory of computing, pages 575–584, 2013. doi:10.1145/2488608.2488680.
- [10] Zvika Brakerski and Vinod Vaikuntanathan. Efficient fully homomorphic encryption from (standard) LWE. SIAM Journal on Computing, 43(2):831–871, 2014.
- [11] Gregory J. Chaitin. On the simplicity and speed of programs for computing infinite sets of natural numbers. J. ACM, 16(3):407–422, 1969. doi:10.1145/321526.321530.
- [12] Whitfield Diffie and Martin E. Hellman. New directions in cryptography. IEEE Transactions on Information Theory, pages 644–654, 1976. doi:10.1109/TIT.1976.1055638.
- [13] Taher ElGamal. A public key cryptosystem and a signature scheme based on discrete logarithms. In Annual International Cryptology Conference (CRYPTO), pages 10–18, 1984.
- [14] Craig Gentry. Fully homomorphic encryption using ideal lattices. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 169–178, 2009. doi:10.1145/1536414.1536440.
- [15] Oded Goldreich and Leonid A. Levin. A hard-core predicate for all one-way functions. In Proceedings of the twenty-first annual ACM symposium on Theory of computing (STOC), pages 25–32, 1989. doi:10.1145/73007.73010.
- [16] Danny Harnik, Joe Kilian, Moni Naor, Omer Reingold, and Alon Rosen. On robust combiners for oblivious transfer and other primitives. In Advances in Cryptology–EUROCRYPT 2005: 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Aarhus, Denmark, May 22-26, 2005. Proceedings 24, pages 96–113. Springer, 2005. doi:10.1007/11426639_6.
- [17] J. Hartmanis. Generalized Kolmogorov complexity and the structure of feasible computations. In 24th Annual Symposium on Foundations of Computer Science (SFCS 1983), pages 439–445, 1983. doi:10.1109/SFCS.1983.21.
- [18] Shuichi Hirahara and Mikito Nanashima. Learning in pessiland via inductive inference. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 447–457. IEEE, 2023. doi:10.1109/FOCS57990.2023.00033.
- [19] Thomas Holenstein. Strengthening key agreement using hard-core sets. PhD thesis, ETH Zurich, 2006. doi:10.3929/ETHZ-A-005205852.
- [20] Russell Impagliazzo and Leonid A. Levin. No better ways to generate hard NP instances than picking uniformly at random. In Proceedings of the 31st Annual Symposium on Foundations of Computer Science (FOCS), pages 812–821, 1990. doi:10.1109/FSCS.1990.89604.
- [21] Michael Kearns and Leslie Valiant. Cryptographic limitations on learning boolean formulae and finite automata. Journal of the ACM (JACM), 41(1):67–95, 1994. doi:10.1145/174644.174647.
- [22] Ker-I Ko. On the notion of infinite pseudorandom sequences. Theor. Comput. Sci., 48(3):9–33, 1986. doi:10.1016/0304-3975(86)90081-2.
- [23] A. N. Kolmogorov. Three approaches to the quantitative definition of information. International Journal of Computer Mathematics, 2(1-4):157–168, 1968.
- [24] Yanyi Liu, Noam Mazor, and Rafael Pass. On white-box learning and public-key encryption. Cryptology ePrint Archive, Paper 2024/1931, 2024. URL: https://eprint.iacr.org/2024/1931.
- [25] Yanyi Liu and Rafael Pass. On one-way functions and kolmogorov complexity. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 1243–1254. IEEE, 2020. doi:10.1109/FOCS46700.2020.00118.
- [26] Yanyi Liu and Rafael Pass. On one-way functions and the worst-case hardness of time-bounded Kolmogorov complexity. Cryptology ePrint Archive, 2023.
- [27] Robert J. McEliece. A public-key cryptosystem based on algebraic coding theory. DSN Progress Report 42–44, pages 114–116, 1978.
- [28] Chris Peikert. Public-key cryptosystems from the worst-case shortest vector problem. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 333–342, 2009.
- [29] Michael O Rabin. Digitalized signatures and public-key functions as intractable as factorization. Technical report, Massachusetts Inst of Tech Cambridge Lab for Computer Science, 1979.
- [30] Oded Regev. On lattices, learning with errors, random linear codes, and cryptography. Journal of the ACM (JACM), 56(6):1–40, 2009. doi:10.1145/1568318.1568324.
- [31] Ronald L Rivest, Adi Shamir, and Leonard Adleman. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21(2):120–126, 1978. doi:10.1145/359340.359342.
- [32] Peter W Shor. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM review, 41(2):303–332, 1999. doi:10.1137/S0036144598347011.
- [33] Michael Sipser. A complexity theoretic approach to randomness. In Proceedings of the fifteenth annual ACM symposium on Theory of computing, pages 330–335, 1983. doi:10.1145/800061.808762.
- [34] R.J. Solomonoff. A formal theory of inductive inference. part i. Information and Control, 7(1):1–22, 1964. doi:10.1016/S0019-9958(64)90223-2.
- [35] Leslie G Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984. doi:10.1145/1968.1972.
- [36] Andrew C. Yao. Theory and applications of trapdoor functions. In Annual Symposium on Foundations of Computer Science (FOCS), pages 80–91, 1982. doi:10.1109/SFCS.1982.45.