
Local Computation Algorithms for Knapsack: Impossibility Results, and How to Avoid Them

Clément L. Canonne, The University of Sydney, Australia; Yun Li, The University of Sydney, Australia; Seeun William Umboh, The University of Melbourne, Australia
ARC Training Centre in Optimisation Technologies, Integrated Methodologies, and Applications (OPTIMA), Melbourne, Australia
Abstract

Local Computation Algorithms (LCA), as introduced by Rubinfeld, Tamir, Vardi, and Xie (2011), are a type of ultra-efficient algorithms which, given access to a (large) input for a given computational task, are required to provide fast query access to a consistent output solution, without maintaining a state between queries. This paradigm of computation in particular allows for hugely distributed algorithms, where independent instances of a given LCA provide consistent access to a common output solution.

The past decade has seen a significant amount of work on LCAs, by and large focusing on graph problems. In this paper, we initiate the study of Local Computation Algorithms for perhaps the archetypal combinatorial optimization problem, Knapsack. We first establish strong impossibility results, ruling out the existence of any non-trivial LCA for Knapsack as well as for several of its relaxations. We then show how equipping the LCA with additional access to the Knapsack instance, namely weighted item sampling, allows one to circumvent these impossibility results and obtain LCAs with sublinear time and query complexity. Our positive result draws on a connection to the recent notion of reproducibility for learning algorithms (Impagliazzo, Lei, Pitassi, and Sorrell, 2022), a connection we believe to be of independent interest for the design of LCAs.

Keywords and phrases:
Local computation algorithms, Knapsack, algorithms, lower bounds
Category:
RANDOM
Funding:
Clément L. Canonne: Supported by the Australian Government through the Australian Research Council DE230101329.
Yun Li: Supported in part by the Australian Government through the Australian Research Council DE230101329 and DP240101353.
Seeun William Umboh: Supported by the Australian Government through the Australian Research Council DP240101353.
Copyright and License:
© Clément L. Canonne, Yun Li, and Seeun William Umboh; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Distributed algorithms; Theory of computation → Parallel algorithms; Theory of computation → Streaming, sublinear and near linear time algorithms; Theory of computation → Packing and covering problems
Related Version:
Full Version: https://arxiv.org/abs/2504.01543
Acknowledgements:
The authors thank the anonymous reviewers for their helpful suggestions.
Editors:
Alina Ene and Eshan Chattopadhyay

1 Introduction

Most classical algorithms can be divided into three phases: (1) read and pre-process the entire input; (2) perform the computations to solve the problem; and (3) finally output the solution. However, several constraints can impact the effectiveness of these algorithms in practice. For instance, computation might become infeasible for massive inputs that do not fit in memory; finding an optimal solution may be exceptionally difficult when a quick response is needed; or the solution itself might be too large to be output in full. Different algorithmic paradigms have been introduced to address these issues, e.g., the streaming model [1] to address memory constraints, or property testing [7] to address very large inputs in the context of sublinear-time algorithms.

Local Computation Algorithms (LCAs) are sublinear time and space algorithms for search problems first introduced by [15]. The LCA model seeks to capture settings where both input and output are massive, so even efficient algorithms are too time-consuming, as both the time required to fully read the input and to describe the output are too large. LCAs implement efficient query access to a small portion of a valid solution of the underlying problem without computing the whole output, while maintaining no state between the various queries they are asked to answer (for a full definition of the model, see Section 2). Due to this last requirement, LCAs are particularly well-suited for the distributed and parallel settings, as they allow for many instances of the algorithm to be run independently, each providing local query access to the same solution to the computational problem at hand.

Most of the work on LCAs, since their introduction, has focused on graph problems, such as Hypergraph Coloring, Set Cover, Maximal Independent Set [15, 2, 6], and Maximum Matching [11, 3], to name a few. Yet, far fewer works have considered other computational tasks in the context of LCAs, and in particular combinatorial optimization questions. In this work, we revisit this state of affairs, focusing on one of the most fundamental combinatorial optimization problems: Knapsack. In this setting, the algorithm is provided with a (read-only) random seed, and given query access to an instance I of Knapsack: upon receiving a query asking whether item i is part of an optimal solution, the algorithm is allowed to make a small (sublinear in the instance size) number of lookups to I (to distinguish between the queries the LCA receives and the queries it makes to the input Knapsack instance, we refer to the latter as "lookups"), and must answer according to some feasible solution C to Knapsack (with high probability). The algorithm must be able to answer as many such queries as desired, in arbitrary order and without maintaining a state between queries, while providing consistent access to the same solution C.

1.1 Our results and contributions

Our first set of results investigates the very possibility of obtaining an efficient local computation algorithm for Knapsack. Our first theorem rules out any sublinear-time LCA that provides query access to an optimal solution. (In what follows, and in particular in our lower bounds, we ignore the computational complexity aspects of the algorithms and focus on their query complexity, that is, the worst-case number of lookups an LCA needs to perform in order to answer a query about the output. The query complexity clearly lower bounds the time complexity; this distinction is important to avoid vacuous conditional results (assuming P ≠ NP), as the search version of Knapsack is NP-Hard.)

Theorem 1 (Informal; see Theorem 13).

There is no sublinear-time LCA for Knapsack that provides consistent query access to an optimal solution.

This hardness result, however, only rules out algorithms for the exact version of Knapsack: one could still hope for non-trivial approximation algorithms in the local computation model. Our second result closes this door as well:

Theorem 2 (Informal; see Theorem 14).

There is no sublinear-time LCA for Knapsack that provides consistent query access to an α-approximate solution, for any fixed α ∈ (0,1].

One could still relax the goal in another direction: instead of requiring optimality or near-optimality of the solution provided by the LCAs, one could ask for access to any maximal feasible solution, regardless of its value – that is, any solution which cannot be improved by adding further items. Unfortunately, we show that even this somewhat modest goal is impossible:

Theorem 3 (Informal; see Theorem 15).

There is no sublinear-time LCA for Knapsack that provides consistent query access to a maximal feasible solution.

These strong impossibility results, at first glance, seem to close our line of inquiry: any LCA providing access to a reasonable solution of Knapsack must essentially look up the whole input. But this relies on the fact that the algorithm has only very limited access to the instance of the problem: while random access to the instance I might seem powerful, it does not allow the algorithm to easily leverage any feature of the instance, as the weights and profits across items are essentially independent. One avenue to circumvent these impossibility results is to equip the algorithm with a stronger, and natural, type of lookups to I: following [10] (in the classical setting), we consider a weighted sampling model, where the LCA can randomly sample items from the instance I proportionally to their profit. Intuitively, this should enable the algorithm to focus on the most relevant items, which are more likely to be sampled. We show that this is indeed the case, and are able to obtain a query-efficient LCA for Knapsack in this weighted sampling model:

Theorem 4 (Main theorem (Informal; see Theorem 17)).

There exists an LCA which, given weighted sampling access to the Knapsack instance, provides consistent query access to a (1/2 − ε)-approximation, for any fixed ε > 0. The LCA has query complexity

(1/ε)^{O(log* n)},

where log* denotes the iterated logarithm.

We elaborate on the techniques, and provide an outline of the proofs, below.

Technical overview

Our first two impossibility results, Theorems 1 and 2, follow a very similar approach: namely, we reduce the Knapsack problem on n items to computing the OR function on n bits, before invoking the known randomized query complexity lower bound for OR_n. While the resulting reduction itself is quite simple, the key aspect is to maintain the type of query access provided (i.e., simulate query access to the constructed Knapsack instance I given query access to the input x ∈ {0,1}^n for the OR_n function) in a local way: that is, for the lower bound to carry through with no loss in parameters, we want to be able to answer any LCA query to I with only a small number – ideally O(1) – of lookups to x.

The third impossibility result, Theorem 3, is more involved, and does not follow from a reduction. Instead, we prove it directly, first restricting ourselves to deterministic algorithms using Yao’s Principle, and defining a suitable distribution of “hard Knapsack instances.” These hard instances have the following structure: the weight limit is K=1, and all items, except for a uniformly randomly chosen pair, have weight zero (recall that since we consider the maximal feasibility version of Knapsack in this theorem, we do not have to define or care about the items’ profits). The remaining two, items i and j, have weights randomly chosen to be either 3/4,1/4 or 3/4,3/4. Note that in the first case, a maximally feasible solution must include both items i,j; while in the second case, only one of the two can – and must – be part of the solution. The crux of the argument is then to use both the consistency and memorylessness requirements of an LCA to show that, unless it makes enough lookups to find the (randomly chosen) location of both special items i,j, any LCA must err on some sequence of (constantly many) queries, either by including both items when it should include only one, or only one when it should include both. This gives an Ω(n) lower bound for the number of lookups required to answer such an O(1)-length sequence of queries, and so an Ω(n) lower bound on the query complexity.
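To make this hard distribution concrete, the following minimal Python sketch samples such an instance (the function and variable names are ours; profits are irrelevant for maximal feasibility, so they are set to zero):

```python
import random

def sample_hard_instance(n):
    """Sample a hard instance for the maximal-feasibility lower bound.

    The weight limit is K = 1.  Two special items i, j get weights
    (3/4, 1/4) or (3/4, 3/4), each with probability 1/2; every other
    item has weight 0.  Profits play no role here and are set to 0.
    """
    i, j = random.sample(range(n), 2)          # uniformly random distinct pair
    weights = [0.0] * n
    weights[i] = 3 / 4
    weights[j] = 1 / 4 if random.random() < 1 / 2 else 3 / 4
    profits = [0.0] * n
    return profits, weights, 1.0               # (profits, weights, weight limit K)
```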

The starting point for our positive result, Theorem 4, is the work of Ito, Kiyoshima, and Yoshida [10], which provides a constant-time randomized algorithm for approximating the optimal value of a solution to a Knapsack instance. At a high level, given a parameter ε their algorithm works by first (implicitly) partitioning the n items, each given by a profit p_i and weight w_i, into 3 sets: (1) the set L of large items, with profit p_i > ε²; (2) the set S of small items, with profit p_i ≤ ε² but large "efficiency ratio" p_i/w_i ≥ ε²; and (3) the set G of garbage items, with low profit and low efficiency. (To interpret the notion of efficiency, recall the greedy algorithm for Fractional Knapsack, which first sorts items by non-increasing efficiency ratio before selecting as many as possible in this order.) The algorithm of [10] then samples Õ(1/ε⁴) items proportionally to their profit, discarding all items from G: by a coupon collector argument, with high probability all items from L will be sampled, and further the algorithm will obtain a good approximation of the frequency distribution of items in S: a set of efficiency thresholds e_1, …, e_k, for k = O(1/ε), with the guarantee that the items whose efficiency ratio falls between consecutive thresholds carry approximately an ε fraction of the total profit. Using these two facts, the algorithm can build a new constant-size (i.e., O_ε(1)-size) instance I′ of Knapsack by including all large items and a suitable number of "representatives" for each efficiency threshold of the small items, whose optimal value ε-approximates that of the original instance, and solve this new instance optimally (in exponential time in the size of the new instance) before returning its value.

To adapt this approach to the LCA setting, we must overcome several obstacles. The first is that the above algorithm does not actually solve the original instance of Knapsack (which would be a hard problem), nor an approximation of it: it solves a different, constructed instance, which just happens to have approximately the same optimal value. But our LCA needs to answer very specific queries on the original instance: "is item i in the approximately optimal solution?" While running the algorithm of Ito, Kiyoshima, and Yoshida to obtain an optimal solution to I′ would let us answer this type of query for "large" items (as these are included verbatim in the constructed instance I′) and "garbage" items (as these are discarded, and not part of any solution), it does not readily give us this information for any of the "small" items. To address this, we leverage the structure of the standard 1/2-approximation algorithm for Knapsack (which takes the best of two solutions: that returned by the greedy algorithm mentioned above, and the singleton solution obtained by including the first item left out by this greedy algorithm). By running this 1/2-approximation algorithm on the constructed instance I′, we end up with a threshold τ and a very simple rule to decide whether a (constructed) item i is in the approximate solution to the (constructed) instance I′: check its efficiency ratio against this threshold τ. But this threshold, by construction, also allows us (after carefully handling some corner cases) to decide whether an item i of the original instance is in a good feasible solution of the original instance I: check if it is a large item (then we can decide easily if it is in the solution), a garbage item (then it is not), and otherwise it is a small item: check its efficiency ratio against τ.
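For intuition, the resulting query-answering rule can be sketched as follows (a simplified Python rendering of the rule just described, ignoring the corner cases; the threshold tau and the membership callback for large items are assumed to come from running the 1/2-approximation on the constructed instance I′, and their names are ours):

```python
def answer_query(p, w, eps, tau, large_in_solution):
    """Decide whether item (p, w) of the original instance is in the answer.

    - large item (p > eps^2): defer to its status in the constructed instance,
      via the hypothetical callback `large_in_solution`;
    - garbage item (low profit and efficiency below eps^2): never included;
    - small item: include it iff its efficiency ratio clears the threshold tau.
    """
    if p > eps ** 2:                 # large item, appears verbatim in I'
        return large_in_solution(p, w)
    if p / w < eps ** 2:             # garbage item
        return False
    return p / w >= tau              # small item: compare efficiency to tau
```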

The above outline almost gets us where we want, and would lead (if correct) to a poly(1/ε)-query LCA for (1/2 − ε)-approximate Knapsack. Unfortunately, it has one major issue, leading to our second obstacle: the sampling step used to approximate the distribution of efficiency profiles of the small items needs to be performed for each query; since the LCA is not allowed to maintain state between queries, this random sampling will lead to inconsistent answers. Indeed, even small variations in the efficiency thresholds e_1, …, e_k computed from this sampling step may lead to a different threshold τ, and as a result to our LCA failing to answer consistently with a single feasible solution. The solution would be to somehow have our sampling step give the same results across repetitions, which is of course not possible: one can easily design examples where randomly sampling items gives different outcomes with high probability, even across only a few repetitions; and naive attempts at rounding the sampled values suffer the same issue.

To solve this second major hurdle, we turn to the recent framework of reproducible learning (Impagliazzo, Lei, Pitassi, and Sorrell [9]), originally proposed to ensure that learning algorithms output (with high probability) the same answer when run on a fresh sample of training data. Making this connection between the LCA consistency requirement and the reproducible learning definition, we are able to leverage the reproducible median algorithm of [9] (suitably generalized to quantiles instead of simply the median) to perform the frequency estimation step of our algorithm in a consistent fashion. Combining these building blocks together then yields our final algorithm; the log* n dependence stems from the use of the reproducible median, which provably requires such a dependence on the domain size.

1.2 Related Work

Local Computation Algorithms

Since the introduction of Local Computation Algorithms in [15], many classical graph problems have been studied under this lens [15, 2, 6, 11, 8, 3], both from the upper and lower bound sides.

There is a strong connection between distributed algorithms and LCAs, which follows from earlier work by Parnas and Ron [14]: broadly speaking, any d-round LOCAL algorithm on graphs with degree bound Δ can be simulated by a q-query LCA, where q = O(Δ^d).

Another important design technique for LCAs (and one our algorithms leverage) is that of simulating greedy algorithms. For instance, [13] showed that some subclasses of greedy algorithms for graph problems can be simulated by first assigning a random number to each vertex, inducing a random ordering of the vertices. This technique was later generalized and improved by [18]. The connection to LCAs was first made explicit in [12], which used it to show how to convert certain online algorithms into LCAs while preserving the same approximation ratio. This led, among others, to LCAs for maximal matching with O(log³ n) time and space complexity, and to LCAs for several load balancing problems with O(log n) time and space complexity.

We conclude this short review of LCAs by mentioning the recent work of Biswas, Cao, Pyne, and Rubinfeld [4], which introduced a variant of the LCA model where the input is assumed to come from some probabilistic process (e.g., random Erdős–Rényi graph). It would be interesting to see if such average-case assumptions could lead to faster LCAs for Knapsack on “almost all instances,” or even allow one to bypass our impossibility results.

Knapsack

The Knapsack problem is perhaps the archetypal combinatorial optimization problem, and one of Karp’s original 21 NP-Hard problems. Many variants and relaxations have been studied over the years, including under stochastic assumptions, or various restrictions on the input. Given the vast body of work on Knapsack, we only discuss below the two most relevant to this paper, and refer the interested reader to [16, Section 3.1] for a more extensive coverage of the question and some of its approximation algorithms.

One important relaxation is when the quantity of each item to be included, instead of being binary, is allowed to take any value in [0,1]. This is known as Fractional Knapsack, and can be solved optimally either by linear programming or by a very simple greedy algorithm which first sorts items by non-increasing profit-to-weight ratio p_i/w_i, and then greedily picks items in this order until the Knapsack capacity is exhausted. This greedy algorithm, while it does not by itself yield any non-trivial approximation ratio for the original Knapsack problem, can be easily modified to yield a 1/2-approximation: see, e.g., [16, Exercise 3.1]. We will draw on this simple 1/2-approximation algorithm for our main algorithmic result. The second result on Knapsack we will rely on is the aforementioned work of Ito, Kiyoshima, and Yoshida [10], which, for any constant ε > 0, shows how to approximate (with high probability) the value of an optimal solution to a Knapsack instance to an additive ±ε in constant time (i.e., poly(1/ε)). Their work, which we draw inspiration from, relies on the ability to sample items from the instance with probability proportional to their profit.
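For reference, here is a minimal Python sketch of this textbook 1/2-approximation (the standard greedy-plus-best-singleton rule, not code from the paper), assuming positive weights each at most the capacity K:

```python
def half_approx_knapsack(items, K):
    """Return item indices forming a 1/2-approximate Knapsack solution.

    `items` is a list of (profit, weight) pairs, each weight in (0, K].
    Take the better of (a) the greedy prefix in non-increasing efficiency
    order, and (b) the first item that the greedy pass cannot fit.
    """
    order = sorted(range(len(items)),
                   key=lambda i: items[i][0] / items[i][1], reverse=True)
    chosen, used = [], 0.0
    for i in order:
        p, w = items[i]
        if used + w <= K:
            chosen.append(i)
            used += w
        elif p > sum(items[j][0] for j in chosen):
            return [i]               # the single "break" item beats the greedy prefix
        else:
            return chosen
    return chosen
```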

2 Preliminaries

Throughout the paper, we use [n] to denote the set of integers {1, 2, …, n}, and standard asymptotic notation O(·), Ω(·), Θ(·), as well as the (slightly) less standard Õ(·), which omits polylogarithmic factors in its argument. We recall the definition of log*, the iterated logarithm:

log* n = 0 if n ≤ 1, and log* n = 1 + log*(log n) otherwise.
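For instance, a direct Python transcription of this recursion (using the base-2 logarithm):

```python
import math

def log_star(n):
    """Iterated logarithm: how many times log2 must be applied before n <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count

# log_star(2) == 1, log_star(16) == 3, log_star(65536) == 4
```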

As our work will be concerned with approximation algorithms, we recall the general definition:

Definition 5 ((α,β)-approximation algorithm).

For any α ∈ [0,1] and β ≥ 0, an (α,β)-approximation algorithm for an optimization problem is an algorithm that, for all instances of the problem, produces a solution whose value is

  1. at least α·OPT − β if the problem is a maximization problem;

  2. at most α·OPT + β if the problem is a minimization problem,

where OPT is the value of the optimal solution.

We now formally define Local Computation Algorithms. This definition is the standard definition of [15], tailored to the Knapsack problem. (The usual definition of LCAs includes a first parameter, s(n), for the space complexity of the LCA; we omit it for simplicity, as it is not the focus of our work, neither for lower nor for upper bounds, which is the query complexity.)

Definition 6 (LCA for Knapsack Problem).

A (t(n),δ(n))-LCA 𝒜 for Knapsack is a (randomized) algorithm which is given access to a read-only random seed r ∈ {0,1}^*, and query access to a Knapsack instance I = (S,K) on n = |S| items, where the total profit of the items in S is normalized to 1, and the (integer) weight of any item in S is at most K. The algorithm must support the following type of queries: on input i ∈ [n], after making lookups to the instance I and running in time at most t(n), 𝒜 outputs whether item i is part of a feasible solution C, and must be correct with probability at least 1 − δ(n). Importantly, C only depends on the input instance I and the random bits r used during computation. Additionally, each run has no access to previous computation results. The quantity t(n) is referred to as the time complexity, and δ(n) as the failure probability. (Note that one would want to set δ(n) to be 1/poly(n), or at most O(1/q) when the LCA is expected to answer q queries, so that all queries are answered successfully with high probability, by a union bound. Our lower bounds hold even for δ(n) = Ω(1), making them even stronger.)

We also recall two desirable properties of LCAs, which are often taken to be part of the definition:

Definition 7 (Parallelizable).

We say an LCA 𝒜 is parallelizable if multiple copies of 𝒜 can be run at the same time. All runs are consistent with the same solution as long as they are given the same input Knapsack instance I and use the same random seed r during computation.

Definition 8 (Query-Order Oblivious).

We say an LCA 𝒜 is query-order oblivious if the outputs of 𝒜 do not depend on the order of the queries but depend only on the input and the random seed r.

In what follows, when discussing LCAs we will always refer to parallelizable, query-order-oblivious LCAs. Of course, this definition by itself is not enough, as one could trivially achieve an LCA with t(n) = O(1) and δ(n) = 0 by always answering "no" to any query (which would be consistent with the empty feasible solution ∅). We will typically require a guarantee on the profit of the solution as well.
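To illustrate the stateless, seed-driven nature of the model (and why independently running copies sharing (I, r) automatically agree), here is a minimal Python sketch of such an interface; the class and method names are ours, and the trivial "always no" behaviour is exactly the one mentioned above:

```python
import random

class TrivialKnapsackLCA:
    """Skeleton of a parallelizable, query-order-oblivious LCA.

    No state is kept between queries: every answer is recomputed from the
    instance oracle and the read-only seed r alone, so all answers (across
    queries and across copies) are consistent with one solution C determined
    by (I, r).  This trivial version always answers "no", i.e. it is
    consistent with the empty feasible solution.
    """
    def __init__(self, instance_oracle, seed):
        self.lookup = instance_oracle  # i -> (profit_i, weight_i); one lookup per call
        self.seed = seed               # read-only random seed r

    def query(self, i):
        rng = random.Random(self.seed) # any randomness is re-derived from r each time
        return False                   # consistent with the empty solution
```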

We will also require the definition of a reproducible algorithm, as introduced in [9]. A reproducible algorithm returns the same output on two distinct runs with high probability, provided that (1) the input samples it takes in both runs come from the same distribution, and (2) it uses the same internal randomness in both runs:

Definition 9 (Reproducibility, [9]).

Let D be a probability distribution over a universe 𝒳, and let 𝒜 be a randomized algorithm with sample access to D. An algorithm 𝒜 is ρ-reproducible if

Pr_{s_1, s_2, r}[𝒜(s_1; r) = 𝒜(s_2; r)] ≥ 1 − ρ,

where s_1 and s_2 denote sequences of samples drawn i.i.d. from D, and r denotes a random binary string representing the internal randomness used by 𝒜.

In their work, Impagliazzo et al. provide a reproducible algorithm to compute an approximate median, defined as follows:

Definition 10 (τ-approximate median).

Let D be a distribution over a well-ordered domain 𝒳. For τ ∈ [0, 1/2], an element x ∈ 𝒳 is a τ-approximate median of D if Pr_{X∼D}[X ≤ x] ≥ 1/2 − τ and Pr_{X∼D}[X ≥ x] ≥ 1/2 − τ.

The guarantees of their algorithm are given in the theorem below:

Theorem 11 ([9, Theorem 4.2]).

Let τ, ρ ∈ [0,1] and let δ = 1/3. Let D be a distribution over 𝒳, where |𝒳| = 2^d. Then there exists an algorithm 𝚛𝙼𝚎𝚍𝚒𝚊𝚗_{ρ,d,τ,δ} which is ρ-reproducible, outputs a τ-approximate median of D with success probability 1 − ρ/2, and has sample complexity

Õ((1/(τ²ρ²)) · (3/τ²)^{log* |𝒳|}).

Finally, recall that in the Knapsack problem, an instance I consists of a list of n items (a_1, …, a_n), each with its own non-negative profit p_i and weight w_i ≥ 0 (so that a_i = (p_i, w_i)), and a weight limit K ≥ 0. The task is to select a set S ⊆ [n] of items which maximizes the total profit Σ_{i∈S} p_i without exceeding the weight limit: Σ_{i∈S} w_i ≤ K.

3 Impossibility Results

To establish our lower bounds, we will reduce from a query complexity problem. Recall that in query complexity, given a function f: {0,1}^n → {0,1}, the objective is to analyse the minimum number of queries (bits) to an input x ∈ {0,1}^n an algorithm has to read in order to successfully compute the value f(x). The randomised query complexity R(f) is the worst-case query complexity (over all possible inputs x ∈ {0,1}^n) of any randomised algorithm which correctly computes f, on any input, with probability at least 2/3. For more on query complexity, the reader is referred to the survey [5]. In particular, we will use the following standard result on the OR function, OR_n, defined as OR_n(x) = ⋁_{i=1}^{n} x_i:

Lemma 12.

The randomised query complexity of the OR function on n bits is R(OR_n) = Ω(n).

Theorem 13.

Any (t(n),1/3)-LCA for Knapsack which provides query access to an optimal solution must satisfy t(n)=Ω(n).

Proof.

Let 𝒜 be any (t(n),1/3)-LCA for Knapsack. We will show how to use it to compute the function OR_{n−1} with at most t(n) queries to the input (and success probability at least 2/3): by Lemma 12, this will imply t(n) = Ω(n).

On input x ∈ {0,1}^{n−1}, we will simulate access to the Knapsack instance I(x) on n items and weight limit K = 1 defined as follows:

(p_i, w_i) = { (x_i, 1) if 1 ≤ i ≤ n−1;  (1/2, 1) if i = n }.

That is, all n items have weight equal to the weight limit, and so any feasible solution can only contain at most one item. It is easy to see that OR_{n−1}(x) = 1 if, and only if, an optimal solution to the Knapsack instance has value 1; otherwise, the optimal value is 1/2 (as x_i = 0 for all 1 ≤ i ≤ n−1), and the only optimal solution is the singleton {s_n}, which has profit 1/2.

Figure 1: An illustration of the reduction. Given query access to an input x ∈ {0,1}^{n−1} (top), we simulate query access to a Knapsack instance I(x) (bottom) with weight limit K = 1, such that the n-th item belongs to the optimal solution (which is then unique) if, and only if, OR_{n−1}(x) = 0.

Moreover, one can simulate any lookup made by 𝒜 to this instance I(x) at the cost of (at most) one query to x:

  • If 𝒜 looks up item s_n, answer (1/2, 1) (no query to x needed);

  • If 𝒜 looks up item s_i for some 1 ≤ i ≤ n−1, query x_i and answer (x_i, 1) (one query to x needed).

Given the above, to compute OR_{n−1}(x), it suffices to make a single query to the LCA 𝒜: namely, the query asking if s_n is in the optimal solution it gives access to. By the above discussion, s_n is part of an optimal solution if, and only if, OR_{n−1}(x) = 0, so the answer given by 𝒜 on this single query provides (with probability at least 2/3) the value OR_{n−1}(x).
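A minimal Python sketch of this argument, where `lca_query` stands for a hypothetical implementation of 𝒜's single-query interface taking the queried index and a lookup oracle:

```python
def compute_or(x, lca_query):
    """Compute OR_{n-1}(x) from one LCA query on the simulated instance I(x).

    `x` is a 0/1 list of length n-1; `lca_query(i, lookup)` asks the
    (hypothetical) LCA whether item s_i is in the optimal solution, given
    lookup access to the instance.
    """
    n = len(x) + 1

    def lookup(i):                    # one simulated lookup to I(x): returns (p_i, w_i)
        if i == n:
            return (0.5, 1)           # item s_n: no query to x needed
        return (x[i - 1], 1)          # item s_i, i <= n-1: costs one query to x

    s_n_in_opt = lca_query(n, lookup)
    return 0 if s_n_in_opt else 1     # s_n is optimal iff OR_{n-1}(x) = 0
```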

To conclude, note that since the LCA 𝒜 can answer any single query to an optimal solution to our instance I(x) (of size n) in time at most t(n), it makes at most t(n) lookups to the instance I(x): indeed, each lookup takes unit time, so the number of lookups is a lower bound on the time complexity. Overall,

t(n) ≥ (number of lookups by 𝒜 to the instance I(x)) ≥ (number of queries made to the input x).

So our reduction allows us, on any input x, to compute (with probability at least 2/3) OR_{n−1}(x) with at most t(n) queries to x: by Lemma 12, it follows that t(n) = Ω(n).

The above result rules out LCAs which provide local access to an optimal Knapsack solution. We will next reduce from the same query complexity problem, using a slightly different Knapsack instance, to show that any LCA that gives query access to any non-trivial approximate Knapsack solution also requires Ω(n) time.

Theorem 14.

Fix any α ∈ (0,1]. Any (t(n),1/3)-LCA for Knapsack which provides query access to an α-approximate feasible solution must satisfy t(n) = Ω(n).

Proof.

Let 𝒜 be any (t(n),1/3)-LCA for α-approximate Knapsack, and fix any 0 < β < α. On input x ∈ {0,1}^{n−1}, we simulate access to the Knapsack instance I(x) on n items and weight limit K = 1 defined as follows:

(p_i, w_i) = { (x_i, 1) if 1 ≤ i ≤ n−1;  (β, 1) if i = n },

i.e., the same instance as in the proof of Theorem 13, but with the profit of the last item set to β. As before, any feasible solution of I(x) contains at most one item, as all n items have weight equal to the weight limit. The value of the optimal solution is then either 1 (if at least one of the x_i's is 1) or β (if all x_i's are 0, in which case {s_n} is the optimal solution). Thus {s_n} is an α-approximate solution if, and only if, OR_{n−1}(x) = 0: in this case, s_n = (β, 1) has higher profit than any other item, which makes it the optimal solution, hence also the unique α-approximate solution for I(x). Otherwise the optimal solution contains an item with profit 1, and {s_n} is not an α-approximate solution as β < α = α·OPT. Again, we can simulate any lookup by 𝒜 to this instance I(x) at the cost of one query to x:

  • If 𝒜 looks up item s_n, answer (β, 1) (no query to x needed);

  • If 𝒜 looks up item s_i for some 1 ≤ i ≤ n−1, query x_i and answer (x_i, 1) (one query to x needed).

To compute OR_{n−1}(x), it thus suffices to make a single query to the LCA 𝒜: namely, asking whether s_n is in the α-approximate solution it gives access to. Since the LCA 𝒜 can answer any single query to an α-approximate solution to our instance I(x) (of size n) in at most t(n) time, 𝒜 makes at most t(n) lookups to the instance I(x). And any lookup made to I(x) can be simulated by at most one query to x. Therefore, t(n) is lower bounded by the query complexity of OR_{n−1}: by Lemma 12, it follows that t(n) = Ω(n). Since we can choose α arbitrarily close to 0, we get that there is no sublinear-time LCA that answers consistently with any non-trivial approximate Knapsack solution with failure probability at most 1/3.

Having ruled out in Theorems 13 and 14 any sublinear-time LCAs for optimal and approximate Knapsack, we further relax the requirement and show that even a different relaxation of Knapsack is impossible. That is, instead of asking for answers consistent with an approximate solution, we ask whether an LCA can give access to a maximal feasible Knapsack solution in sublinear time. Instead of proving the impossibility result directly for LCAs, we will show that no sublinear-time deterministic algorithm can provide query access to a maximal feasible solution with good probability over some difficult input distribution. By Yao's Principle [17], this will imply that no randomized algorithm can provide sublinear-time query access to a maximal feasible Knapsack solution.

Theorem 15.

Any (t(n),1/5)-LCA for Knapsack which provides query access to a maximal feasible solution must satisfy t(n) = Ω(n).

Proof.

We define the distribution of items S of the input Knapsack instance I as follows:

  1. select uniformly at random a pair of indices (i,j) ∈ {1, 2, …, n}²;

  2. assign w_i = 3/4, and w_j = 1/4 with probability 1/2, w_j = 3/4 with probability 1/2;

  3. every other item k ∉ {i,j} is assigned w_k = 0.

Let 𝒜 be a deterministic algorithm for Maximal Feasible Knapsack with time complexity t(n) < n/11 and probability of success at least 4/5.

The weight limit K is 1. (Since we are looking at maximal feasible solutions, the profits of the items are irrelevant: we set them all to 0, i.e., set p_k = 0 for all k, and ignore them for the rest of the proof.) Note that if w_j = 1/4, then the (unique) maximal solution is C = {s_1, …, s_n} (all items); while if w_j = 3/4, then the two distinct maximal solutions are C^(i) = {s_1, …, s_n} ∖ {s_i} and C^(j) = {s_1, …, s_n} ∖ {s_j}. The following claim will be crucial:

Lemma 16.

Suppose 𝒜 receives query s_k, such that w_k = 3/4. Then with probability at least 9/10 over the choice of the instance I, 𝒜 outputs yes (i.e., that this item is in the maximal feasible solution).

Proof.

Given such a query s_k, in view of our choice of input distribution, there are two options: either k = i (since w_i = 3/4 always), or k = j and w_j = 3/4. So k ∈ {i,j}: let k′ be the other value of the pair (that is, if k = i then k′ = j, and vice-versa).

On query s_k, we assume that the algorithm knows that w_k = 3/4 (since it only takes one lookup to the instance I to learn this, given k). The algorithm deterministically makes a sequence of at most q ≤ t(n) < n/11 lookups to the instance I, where lookup ℓ reveals the weight w_ℓ of item ℓ. Without loss of generality, we can assume that the algorithm does not look up an item it already knows the weight of; so it looks up at most q < n/11 new items. Since k′ is uniformly distributed among the remaining n−1 items, the probability that any of these q lookups (even made adaptively) finds the other non-zero-weight item s_{k′} is at most

(n/(n−1)) · (1/11) ≤ 1/10,

since every single lookup made (until s_{k′} is found) is answered by "w_ℓ = 0." We denote by E_k the event

E_k = {on query k, 𝒜 does not look up any other item with non-zero weight}.

By the above, Pr[E_k] ≥ 9/10. Now, the claim is that conditioned on E_k, the algorithm must say yes to the query (i.e., answer that s_k is in the maximal solution it provides access to). To see why, note that given only the information available (namely, the fact that the k-th item has weight 3/4, and that all the other items looked up have weight 0), the three following cases are possible given our distribution over instances I:

  • the k-th item is s_j, and there is somewhere another item, s_i, with weight 3/4 (this has probability 1/3);

  • the k-th item is s_i, and there is somewhere another item, s_j, with weight 3/4 (this has probability 1/3);

  • the k-th item is s_i, and there is somewhere another item, s_j, with weight 1/4 (this has probability 1/3).

In the first two cases, the algorithm could reply either yes or no and still be consistent with one of the two maximal solutions; but in the third case, the algorithm must respond yes, since all items are in the (unique) maximal solution. Since we are in the third case with probability 1/3 (and the algorithm gets no useful information except with probability at most 1/10), to achieve error less than 1/5 < 1/3 − 1/10 overall the algorithm must respond yes. By the principle of deferred decisions, we can make the random choice of the value of w_j (either 1/4 or 3/4, uniformly at random) only when 𝒜 looks up s_j.

Consider what happens when the LCA 𝒜 receives queries for item s_i, then s_j. By the above claim (for k = i), on query s_i, 𝒜 will output yes with probability at least 9/10 (over the choice of the instance).

On the second query s_j, there are then two cases: with probability 1/2, w_j = 1/4 and the algorithm can safely respond yes. With probability 1/2, however, w_j = 3/4, and by the same claim (for k = j) the algorithm will then output yes with probability at least 9/10.

This means, by a union bound, that the algorithm is correct with probability at most

Pr[Ē_i] + Pr[w_j ≠ 3/4] + Pr[Ē_j] ≤ 1/10 + 1/2 + 1/10 < 4/5,

since on the event E_i ∩ {w_j = 3/4} ∩ E_j it says yes both to item s_i, which has weight 3/4, and to item s_j, which also has weight 3/4; i.e., the solution it is consistent with is not feasible.

This implies that any deterministic algorithm for maximal feasible Knapsack which is correct with probability at least 4/5 on this specific input distribution must make at least n/11 lookups. By Yao's Principle, this implies that no randomized algorithm can answer Maximal Feasible Knapsack queries with success probability 4/5 unless t(n) ≥ n/11, i.e., t(n) = Ω(n).

4 Algorithmic Result: an LCA given Weighted Sampling Access

We have shown in the last section that it is impossible to obtain a (non-trivial) LCA for Knapsack, or even some relaxations of it. The crux of the issue lies in the “needle in a haystack” phenomenon: in order to be consistent across queries, an LCA seemingly needs to get a global sense of the distribution of weights and profits of the instance, and in particular the largest profits. Yet, given only query access to the instance, finding these requires a very large number of lookups.

To mitigate this, we adapt the weighted sampling model considered in [10], which equips the algorithm with the ability to sample items with probability proportional to their profit (assuming, essentially without loss of generality, that the total profit and weight are both normalized to 1). In this setting, we establish the following result:

Theorem 17.

Algorithm 2 (𝙻𝙲𝙰𝙺𝙿) is a (t(n),ε)-LCA for Knapsack which provides query access to a (1/2, 6ε)-approximate solution, where t(n) = (1/ε)^{O(log* n)}.

The remainder of this section is dedicated to proving this theorem. Before we describe the general idea of the algorithm, we first cover the weighted sampling model and the results from [10] we will use in our algorithm. Fixing ε ∈ (0,1], the items of the Knapsack instance I = (S,K) are partitioned into three sets:

L(I) := {(p,w) ∈ S : p > ε²}  (items with high profit)
S(I) := {(p,w) ∈ S : p ≤ ε² and p/w ≥ ε²}  (items with low profit but high efficiency)
G(I) := {(p,w) ∈ S : p ≤ ε² and p/w < ε²}  (items with low profit and low efficiency)

The items in L(I), S(I), G(I) are referred to as large items, small items and garbage items, respectively. The following lemma, which follows from a coupon-collector type argument, ensures that one can sample all sufficiently heavy elements with high probability.

Lemma 18 (Lemma 2, [10]).

Let B = {(p,w) ∈ I : p ≥ δ}. By taking 6δ⁻¹(log δ⁻¹ + 1) samples using weighted sampling, all items of B are sampled at least once with probability at least 5/6.

Importantly, this success probability can be easily amplified by repetition from 5/6 to 1 − ε at the cost of a logarithmic factor in 1/ε, which we will rely on later.
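To make the weighted-sampling access and the three-way partition above concrete, here is a minimal Python sketch (with `random.choices` playing the role of the weighted sampling oracle; the function names are ours):

```python
import math
import random

def weighted_samples(items, num):
    """Draw `num` items (with replacement) with probability proportional to profit."""
    profits = [p for (p, w) in items]
    return random.choices(items, weights=profits, k=num)

def partition(items, eps):
    """Split (profit, weight) pairs into large / small / garbage items."""
    large   = [(p, w) for (p, w) in items if p > eps ** 2]
    small   = [(p, w) for (p, w) in items if p <= eps ** 2 and p / w >= eps ** 2]
    garbage = [(p, w) for (p, w) in items if p <= eps ** 2 and p / w <  eps ** 2]
    return large, small, garbage

def catch_heavy_items(items, delta):
    """Lemma 18: ~6/delta * (log(1/delta) + 1) weighted samples catch every item
    of profit >= delta with probability at least 5/6."""
    num = math.ceil(6 / delta * (math.log(1 / delta) + 1))
    return set(weighted_samples(items, num))
```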

Small items are partitioned over intervals of efficiency such that the total profit of the items within each interval is approximately ε. The sets of small items in the efficiency intervals are denoted by A_0(I), A_1(I), …, A_t(I), and the corresponding efficiency thresholds are given by a non-increasing efficiency sequence e_1 ≥ e_2 ≥ ⋯ ≥ e_t, such that:

A_0(I) := {(p,w) ∈ S(I) : p/w ≥ e_1}
A_k(I) := {(p,w) ∈ S(I) : e_k > p/w ≥ e_{k+1}}  (1 ≤ k ≤ t−1)
A_t(I) := {(p,w) ∈ S(I) : e_t > p/w}

A crucial definition is then that of an equally partitioning sequence, which is a sequence of efficiency thresholds such that the total profit of each efficiency interval satisfies Σ_{i ∈ A_k(I)} p_i ≈ ε:

Definition 19 (Equally Partitioning Sequence (EPS), [10]).

We say the efficiency sequence e_1, e_2, …, e_t is equally partitioning with respect to I if, for all 0 ≤ i ≤ t−1, A_i(I) has total profit within [ε, ε+ε²) and A_t(I) has total profit within [0, ε+ε²).

The authors of [10] also show how to obtain an EPS, with probability at least 1 − O(1/m²), from a sample of m = O((1/ε⁴) log(1/ε)) items. (The statement in [10] only claims success probability 5/6; however, inspection of the proof shows the authors establish the stronger claim.) Now, their algorithm, given weighted sampling access to the original instance I and a parameter ε, constructs a different instance I~ (of size poly(1/ε)) as follows:

  1. It samples a multiset of O((1/ε⁴) log(1/ε)) items, and puts all the large items into a set M. (By Lemma 18, all items of L(I) are included in M, except with probability 1/6.)

  2. It samples a multiset of O((1/ε⁴) log(1/ε)) items, and uses it to obtain an Equally Partitioning Sequence e~_1, …, e~_t, except with probability 1/6.

  3. From the above set M and the EPS, it constructs a new instance I~ = (S~, K~) as follows. It creates three disjoint sets L(I~), S(I~), G(I~) mimicking the sets L(I), S(I), G(I) defined for I:

     L(I~) := M,
     S(I~) := ⋃_{k=0}^{t−1} A_k(I~), where A_k(I~) contains exactly ε⁻¹ copies of (ε², ε²/e~_{k+1}),
     G(I~) := ∅,

     and defines S~ := L(I~) ∪ S(I~) ∪ G(I~) and K~ = K.

Note that t = O(1/ε) and |M| ≤ |L(I)| ≤ 1/ε², and so I~ is an instance on only O(1/ε²) items. We call the above procedure the I~-construction algorithm. The central lemma they show then relates the value of the optimal solution to the new Knapsack instance I~ to that of the original instance I.
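A minimal Python sketch of this construction, assuming the set M of sampled large items and an EPS e~_1 ≥ ⋯ ≥ e~_t are already available (the function name is ours):

```python
import math

def build_reduced_instance(M, eps_seq, eps, K):
    """Construct the reduced instance I~ from the large items M and an
    equally partitioning sequence eps_seq = [e_1, ..., e_t].

    Each efficiency class of I~ consists of ~1/eps copies of a representative
    small item with profit eps^2 and weight eps^2 / e (i.e. efficiency e);
    garbage items are dropped entirely, and the capacity is unchanged.
    """
    copies = math.ceil(1 / eps)
    small_reps = []
    for e in eps_seq:                              # thresholds e~_1, ..., e~_t
        small_reps += [(eps ** 2, eps ** 2 / e)] * copies
    return list(M) + small_reps, K                 # (item list S~, weight limit K~ = K)
```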

Lemma 20 (Lemma 1, [10]).

Let I = (S,K) be an instance of Knapsack, and let I~ be the instance constructed from I using an equally partitioning sequence with respect to I. Then, OPT(I~) − ε is a (1, 6ε)-approximation to OPT(I).

4.1 General Idea of the Algorithm

As in the algorithm of [10], we will use weighted sampling to gather the items with high profit and to learn the distribution of the efficiencies of the small items, with high probability. Then we will compute the greedy solution of the corresponding simplified Knapsack instance I~, as defined above. (Recall that the greedy algorithm for the Knapsack problem first sorts the items by their efficiency, then includes the items in descending order until the output set reaches the weight limit; we will refer to the efficiency of the first item that the greedy solution cannot fully include as the efficiency cut-off.) The greedy solution will either be the set of items with efficiency higher than the efficiency cut-off, or the single item at the efficiency cut-off if that item's profit is higher than the total profit of the items with higher efficiency.

Our LCA will determine whether the queried item is chosen based on the efficiency cut-off of the greedy solution of the simplified Knapsack instance computed in that run. As discussed in Section 1.1, one challenge lies in the fact that the simplified instance I~ is constructed by sampling, hence the items of the simplified instance and the corresponding efficiency cut-off may (and likely will) differ between runs, jeopardizing the consistency requirement of the LCA. To address this issue, we generalize the reproducible median algorithm from [9] to a reproducible quantile algorithm that will output the exact same approximate quantiles of the item efficiencies with high probability. We start with the reproducible quantile algorithm.

4.2 Reproducible Quantile Algorithm

We will leverage the reproducible algorithm of [9] (Theorem 11) to compute the quantiles in a reproducible fashion. The idea is to reduce the task of computing quantiles to that of computing the median. Given an array T of n elements, to find the p-quantile of T, we can add x many −∞ and y many +∞ elements to T, such that x + pn = (1−p)n + y and x + y = n; let the new array be T′. The median of the array T′ will be the same value as the p-quantile of the array T.

Similarly, we can compute the reproducible p-quantile of a given distribution D by computing the reproducible median of a new distribution D′ such that (a sketch of this reduction follows the list below):

  1. the domain of D′ is the union of the domain of D (assumed to be of size 2^d) and {+∞, −∞}; note that we can bound the domain size of D′ by 2^{d+1},

  2. for any element e in the domain of D, Pr_{i∼D′}[i = e] = Pr_{i∼D}[i = e]/2,

  3. Pr_{i∼D′}[i = −∞] = x and Pr_{i∼D′}[i = +∞] = y, where x = (1−p)/2 and y = p/2.
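A minimal Python sketch of this reduction: each sample from D′ is either a fresh sample from D (with probability 1/2) or one of the two sentinels, so any median routine applied to D′-samples yields a p-quantile of D. Here `median_algorithm` stands in for 𝚛𝙼𝚎𝚍𝚒𝚊𝚗, and the function names are ours:

```python
import random

NEG_INF, POS_INF = float("-inf"), float("inf")

def sample_from_D_prime(sample_D, p):
    """One sample from D': w.p. 1/2 a fresh D-sample, w.p. (1-p)/2 the symbol
    -infinity, and w.p. p/2 the symbol +infinity."""
    u = random.random()
    if u < 0.5:
        return sample_D()
    if u < 0.5 + (1 - p) / 2:
        return NEG_INF
    return POS_INF

def quantile_via_median(sample_D, p, num_samples, median_algorithm):
    """Reduce p-quantile estimation on D to median estimation on D'."""
    samples = [sample_from_D_prime(sample_D, p) for _ in range(num_samples)]
    return median_algorithm(samples)
```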

Algorithm 1 𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎(s,p).
Theorem 21 (𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎 Correctness and Query Complexity).

Let τ, β, ρ ∈ [0,1]. Let D be a distribution over 𝒳, with |𝒳| = 2^d. Then 𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎_{ρ,d,τ,β} (Algorithm 1) is a reproducible algorithm with sample complexity

Õ((1/(τ²(ρ−β)²)) · (12/τ²)^{log* |𝒳| + 1})

which, on input p, outputs a τ-approximate p-quantile of D with probability at least 1 − β.

Proof.

We first argue the correctness of 𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎. 𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎 returns a value v which is a ρ-reproducible τ/2-approximate median of the distribution D′, except with probability β. Thus,

Pr_{i∼D′}[i ≤ v] ≥ 1/2 − τ/2 and Pr_{i∼D′}[i ≥ v] ≥ 1/2 − τ/2.

By the definition of D′,

Pr_{i∼D}[i ≤ v] = 2·Pr_{i∼D′}[i ≤ v and i ≠ −∞] ≥ 2·(1/2 − τ/2 − (1−p)/2) = p − τ

and

Pr_{i∼D}[i ≥ v] = 2·Pr_{i∼D′}[i ≥ v and i ≠ +∞] ≥ 2·(1/2 − τ/2 − p/2) = 1 − p − τ.

Therefore, 𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎 returns a τ-approximate p-quantile of the distribution D. By [9, Theorem 4.8], 𝚛𝙼𝚎𝚍𝚒𝚊𝚗_{ρ,d,τ,β} has sample complexity

Õ((1/(τ²(ρ−β)²)) · (3/τ²)^{log* |𝒳|}).

Since 𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎 runs 𝚛𝙼𝚎𝚍𝚒𝚊𝚗 with target accuracy τ/2 and domain size 2^{d+1}, the sample complexity of 𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎 is

Õ((1/(τ²(ρ−β)²)) · (12/τ²)^{log* |𝒳| + 1}),

as desired.

Mapping to a finite domain

𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎 can only compute quantiles with respect to a distribution D over a finite domain 𝒳 (of some size of the form 2^d), since the recursion of 𝚛𝙼𝚎𝚍𝚒𝚊𝚗 terminates when the domain size becomes 2. However, in our case the distribution D is the distribution of efficiencies, which (since both profits and weights are a priori arbitrary positive numbers) has domain ℝ_{>0}. To address this, we first note that the weights are not quite arbitrary: before the normalization assumption, they were all positive integers. Assuming a poly(n) bound on the number of bits needed to describe any given weight, we get that the domain of the (normalized) weights is of the form 𝒲 = {1/B, 2/B, …, 1}, where B = 2^{poly(n)}.

We can assume the same for the profits, i.e., that all profits take value in the domain 𝒫 = {0, 1/B′, 2/B′, …, 1} for some B′ = 2^{poly(n)}. (Another possibility is to avoid this bit complexity argument and instead use a rounding procedure, standard in the fully polynomial-time approximation scheme (FPTAS) for Knapsack (see [16, Section 3.2]), which rounds each profit "on-the-fly" to a multiple of ε/n, at the price of a (1−ε) approximation factor loss in the solution. This would bound the size of the domain of the profits by n/ε, for any fixed choice of ε > 0. We here describe the bit-complexity argument, as it mimics that of the weights.) Overall, this ensures that each efficiency is of the form

(B/B′) · (a/b)

for some 0 ≤ a ≤ B′ and 1 ≤ b ≤ B. This leads to the domain 𝒳 of the efficiencies being known and finite (but huge), of size bounded by |𝒳| ≤ (B′+1)·B = 2^{poly(n)}. Note that this implies that the dependence on the domain size from 𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎, which involves log* |𝒳| (in the exponent), will be sublinear in n, as

log* |𝒳| = log*(2^{poly(n)}) = O(log* n).

4.3 From Reproducible Quantiles to our Knapsack LCA

We now put all the pieces together to analyze our final algorithm, Algorithm 2, thus establishing Theorem 17. We will show that the solution C the algorithm answers according to is (1) feasible (Lemma 23), (2) a good approximate solution (Lemma 24), and (3) consistent across queries (Lemma 25); before bounding the sample (query) complexity in Lemma 26.

Algorithm 2 𝙻𝙲𝙰𝙺𝙿.
Algorithm 3 𝙲𝙾𝙽𝚅𝙴𝚁𝚃𝙶𝚁𝙴𝙴𝙳𝚈.
Algorithm 4 𝙼𝙰𝙿𝙿𝙸𝙽𝙶𝙶𝚁𝙴𝙴𝙳𝚈.

Without loss of generality, we will refer to the items by their index after sorting in 𝙲𝙾𝙽𝚅𝙴𝚁𝚃𝙶𝚁𝙴𝙴𝙳𝚈. We separate 𝙲𝙾𝙽𝚅𝙴𝚁𝚃𝙶𝚁𝙴𝙴𝙳𝚈 and 𝙼𝙰𝙿𝙿𝙸𝙽𝙶𝙶𝚁𝙴𝙴𝙳𝚈 as we will only call 𝙲𝙾𝙽𝚅𝙴𝚁𝚃𝙶𝚁𝙴𝙴𝙳𝚈 in 𝙻𝙲𝙰𝙺𝙿. We start with the below lemma, whose proof is deferred to the full version due to space constraints.

Lemma 22.

Conditioned on the sample R containing all elements of L(I) after line 1, with probability at least 1 − 13ε/36 the approximate quantile sequence e~_1, …, e~_t in 𝙻𝙲𝙰𝙺𝙿 (Algorithm 2) is an EPS with respect to I.

Lemma 23.

The output C of 𝙼𝙰𝙿𝙿𝙸𝙽𝙶𝙶𝚁𝙴𝙴𝙳𝚈 (Algorithm 4) is a feasible solution of I.

Proof.

We argue the feasibility of C, that is, w(C) ≤ K. First we consider the special case where C = {(p_{j+1}, w_{j+1})}. In this case p_{j+1} > Σ_{i=1}^{j} p_i; since all items in S(I~) have the same profit ε², (p_{j+1}, w_{j+1}) must be an item with profit greater than ε², which implies (p_{j+1}, w_{j+1}) ∈ L(I~). Since L(I~) ⊆ L(I) and all items in I have weight at most K, w(C) = w_{j+1} ≤ K and so C is feasible.

Now we consider the general case where

C = {(p,w) ∈ L(I) : p/w ≥ p_j/w_j} ∪ {(p,w) ∈ S(I) : p/w ≥ e~_{k−2}}.

Let A_0(I), A_1(I), …, A_t(I) be the partition of the items in S(I) with respect to the efficiency sequence e~_1, e~_2, …, e~_t. Let A_0(I~), A_1(I~), …, A_{t−1}(I~) be the partition of the items in S(I~) with respect to the efficiency sequence e~_1, e~_2, …, e~_t. By construction, A_i(I~) contains ε⁻¹ copies of the item (ε², ε²/e~_{i+1}) for i = 0, 1, …, t−1. We consider the total weight of the items in S(I~) that are included in the greedy solution 𝙶𝚁𝙴𝙴𝙳𝚈(I~). Since e~_k > p_j/w_j, all items in A_i(I~) must be included in the greedy solution 𝙶𝚁𝙴𝙴𝙳𝚈(I~), for i = 0, 1, …, k−1. Therefore,
w(S(I~) ∩ 𝙶𝚁𝙴𝙴𝙳𝚈(I~)) ≥ Σ_{i=0}^{k−1} w(A_i(I~))
 = ε⁻¹ε²/e~_1 + ε⁻¹ε²/e~_2 + ⋯ + ε⁻¹ε²/e~_k
 = ε⁻¹ε² Σ_{i=1}^{k} (1/e~_i)
 ≥ (ε⁻¹ − 1)ε² Σ_{i=1}^{k} (1/e~_i)
 = (ε + ε²) Σ_{i=1}^{k−2} (1/e~_i) − 2ε² Σ_{i=1}^{k−2} (1/e~_i) + (ε − ε²)(1/e~_{k−1} + 1/e~_k)
 > (ε + ε²) Σ_{i=1}^{k−2} (1/e~_i) − 2ε² Σ_{i=1}^{k−2} (1/e~_{k−2}) + (ε − ε²)(1/e~_{k−2} + 1/e~_{k−2})
 = (ε + ε²) Σ_{i=1}^{k−2} (1/e~_i) − 2ε²(k − 1 − ε⁻¹)/e~_{k−2}.

Since the total profit of the items in I is 1 and p(A_i(I)) ≥ ε, the length t of the efficiency sequence is at most ε⁻¹. Since k ≤ t and t ≤ ε⁻¹, combining both inequalities gives us (k − 1 − ε⁻¹) < 0. Therefore,

w(S(I~) ∩ 𝙶𝚁𝙴𝙴𝙳𝚈(I~)) ≥ (ε + ε²) Σ_{i=1}^{k−2} (1/e~_i) = Σ_{i=1}^{k−2} ((ε + ε²)/e~_i) ≥ Σ_{i=0}^{k−3} w(A_i(I)).

The last inequality is implied by the fact that e~_{i+1} lower bounds the efficiency of the items in A_i(I), while the total profit of A_i(I) is upper bounded by ε + ε². Now we consider the total weight of the items in L(I~) that are included in 𝙶𝚁𝙴𝙴𝙳𝚈(I~). Since L(I~) ⊆ L(I), we get w(L(I~) ∩ 𝙶𝚁𝙴𝙴𝙳𝚈(I~)) = w(L(I) ∩ 𝙶𝚁𝙴𝙴𝙳𝚈(I~)). Combining these inequalities, we have

w(C) = w({(p,w) ∈ L(I) : p/w ≥ p_j/w_j}) + w({(p,w) ∈ S(I) : p/w ≥ e~_{k−2}})
 ≤ w(L(I~) ∩ 𝙶𝚁𝙴𝙴𝙳𝚈(I~)) + w(S(I~) ∩ 𝙶𝚁𝙴𝙴𝙳𝚈(I~))
 = w(𝙶𝚁𝙴𝙴𝙳𝚈(I~))
 ≤ K~ = K,

concluding the proof.

Lemma 24.

If I~ contains all large items from I, then C is a (1/2,6ε)-approximation solution of I.

Proof.

We consider the difference between the total profit of C and that of 𝙶𝚁𝙴𝙴𝙳𝚈(I~). Let OPT(I) be the optimal solution of the instance I and OPT(I~) be the optimal solution of the instance I~. We first consider the case where C = {(p_{j+1}, w_{j+1})}. In this case p(C) = p(𝙶𝚁𝙴𝙴𝙳𝚈(I~)). By Lemma 20,

p(C) = p(𝙶𝚁𝙴𝙴𝙳𝚈(I~)) ≥ (1/2)·p(OPT(I~)) ≥ (1/2)·(p(OPT(I)) − 5ε)
 = (1/2)·p(OPT(I)) − 5ε/2.

Now we consider the case where C = {(p,w) ∈ L(I) : p/w ≥ p_j/w_j} ∪ {(p,w) ∈ S(I) : p/w ≥ e~_{k−2}}. Since we assume I~ contains all large items from I, we have {(p,w) ∈ L(I) : p/w ≥ p_j/w_j} = 𝙶𝚁𝙴𝙴𝙳𝚈(I~) ∩ L(I~). We consider the difference between the profit of 𝙶𝚁𝙴𝙴𝙳𝚈(I~) ∩ S(I~) and that of {(p,w) ∈ S(I) : p/w ≥ e~_{k−2}}. The least efficient item (p_j, w_j) is either in S(I~) or in L(I~).

Subcase 1.

If (p_j, w_j) ∈ S(I~), then ⋃_{i=0}^{k−1} A_i(I~) ⊆ 𝙶𝚁𝙴𝙴𝙳𝚈(I~) ∩ S(I~) ⊆ ⋃_{i=0}^{k} A_i(I~). Since p(A_i(I)) ∈ [ε, ε+ε²) and p(A_i(I~)) = ε for i = 0, 1, …, t−1,

p({(p,w) ∈ S(I) : p/w ≥ e~_{k−2}}) = Σ_{i=0}^{k−3} p(A_i(I)) ≥ Σ_{i=0}^{k−3} p(A_i(I~)) ≥ Σ_{i=0}^{k} p(A_i(I~)) − 3ε
 ≥ p(𝙶𝚁𝙴𝙴𝙳𝚈(I~) ∩ S(I~)) − 3ε.
Subcase 2.

If (p_j, w_j) ∈ L(I~), then ⋃_{i=0}^{k−1} A_i(I~) = 𝙶𝚁𝙴𝙴𝙳𝚈(I~) ∩ S(I~). Since p(A_i(I)) ∈ [ε, ε+ε²) and p(A_i(I~)) = ε for i = 0, 1, …, t−1,

p({(p,w) ∈ S(I) : p/w ≥ e~_{k−2}}) = Σ_{i=0}^{k−3} p(A_i(I)) ≥ Σ_{i=0}^{k−3} p(A_i(I~)) ≥ Σ_{i=0}^{k−1} p(A_i(I~)) − 2ε
 = p(𝙶𝚁𝙴𝙴𝙳𝚈(I~) ∩ S(I~)) − 2ε.

In both cases, p({(p,w) ∈ S(I) : p/w ≥ e~_{k−2}}) ≥ p(𝙶𝚁𝙴𝙴𝙳𝚈(I~) ∩ S(I~)) − 3ε. Using the assumption that I~ contains L(I), we obtain the following lower bound on p(C):

p(C) = p({(p,w) ∈ L(I) : p/w ≥ p_j/w_j}) + p({(p,w) ∈ S(I) : p/w ≥ e~_{k−2}})
 ≥ p(𝙶𝚁𝙴𝙴𝙳𝚈(I~) ∩ S(I~)) + p(𝙶𝚁𝙴𝙴𝙳𝚈(I~) ∩ L(I~)) − 3ε
 = p(𝙶𝚁𝙴𝙴𝙳𝚈(I~)) − 3ε
 ≥ (1/2)·p(OPT(I~)) − 3ε
 ≥ (1/2)·(p(OPT(I)) − 5ε) − 3ε
 ≥ (1/2)·p(OPT(I)) − 6ε.

Therefore, C is a (1/2,6ε)-approximation solution of I.

Lemma 25 (Consistency).

𝙻𝙲𝙰𝙺𝙿 provides query access consistent with a (1/2, 6ε)-approximation solution of I with probability 1 − ε.

Proof.

The output of 𝙻𝙲𝙰𝙺𝙿 is determined by I~ only. Thus, as long as two runs of 𝙻𝙲𝙰𝙺𝙿 construct the same I~, their answers for any index i will be the same. Every run satisfying the following conditions constructs the same instance I~:

  1. all items in L(I) are included in R,

  2. |E| is at least the sample complexity of 𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎,

  3. the approximate quantiles returned by 𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎 are both reproducible and within error τ.

We consider the failure probability of Condition (1). By Lemma 18, all items in L(I) can be gathered by taking m = O((1/ε⁴) log(1/ε)) samples with probability 1 − ε/3. Therefore, the failure probability of Condition (1) is at most ε/3. Conditioned on the success of (1), the failure probability of Condition (2) is at most ε/3, as proved in Lemma 22.

Conditioned on the success of (2), we consider Condition (3). Fix k and the internal randomness r, and let E_1, E_2 be two independent samples given to 𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎_{ρ,d,τ,β} on two distinct runs of 𝙻𝙲𝙰𝙺𝙿. By the definition of reproducibility in [9],

Pr_{E_1,E_2,r}[𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎_{ρ,d,τ,β}(E_1, k) = 𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎_{ρ,d,τ,β}(E_2, k)] ≥ 1 − ρ.

Let D_k be the set of all possible outputs of 𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎_{ρ,d,τ,β}(E, k). Note that D_k is finite, since these outputs lie in the finite efficiency domain 𝒳. Since E_1 and E_2 are two independent samples drawn from S(I) ∪ G(I), we have:

Pr_{E_1,E_2,r}[𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎_{ρ,d,τ,β}(E_1, k) = 𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎_{ρ,d,τ,β}(E_2, k)]
= Σ_{x ∈ D_k} (Pr_{E∼S(I)∪G(I)}[𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎_{ρ,d,τ,β}(E, k) = x])².

Let e_k ∈ D_k be such that, for any x ∈ D_k,

Pr_{E∼S(I)∪G(I)}[𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎_{ρ,d,τ,β}(E, k) = e_k] ≥ Pr_{E∼S(I)∪G(I)}[𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎_{ρ,d,τ,β}(E, k) = x].

Let y = Pr_{E∼S(I)∪G(I)}[𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎_{ρ,d,τ,β}(E, k) = e_k], and recall that Σ_{x ∈ D_k} Pr_{E∼S(I)∪G(I)}[𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎_{ρ,d,τ,β}(E, k) = x] = 1. Therefore,

y = y · Σ_{x ∈ D_k} Pr_{E∼S(I)∪G(I)}[𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎_{ρ,d,τ,β}(E, k) = x] ≥ Σ_{x ∈ D_k} (Pr_{E∼S(I)∪G(I)}[𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎_{ρ,d,τ,β}(E, k) = x])² ≥ 1 − ρ,

and so Pr_{E,r}[𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎_{ρ,d,τ,β}(E, k) = e_k] ≥ 1 − ρ. The probability that e_k is not a τ-approximate quantile is at most β. By a union bound, 𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎_{ρ,d,τ,β} outputs the value e_k, which is a τ-approximate (1 − k/q)-quantile, except with probability ρ + β. Moreover, 𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎_{ρ,d,τ,β} is called t times, where t = q − 1 and q − 1 = (1 − p(L(I)))/(ε + ε²/2) ≤ ε⁻¹. By a union bound, all calls to 𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎 are consistent with some τ-approximate quantile sequence e_1, e_2, …, e_t, except with probability ε⁻¹(ρ + β) = ε/12.

By a union bound, all three conditions succeed except with probability ε/3 + ε/3 + ε/12 < ε. By Lemma 24, 𝙻𝙲𝙰𝙺𝙿 gives query access consistent with a (1/2, 6ε)-approximation solution of I with probability 1 − ε.

Lemma 26 (Sample Complexity).

The sample complexity of 𝙻𝙲𝙰𝙺𝙿 is (1/ε)^{O(log* n)}.

Proof.

𝙻𝙲𝙰𝙺𝙿 requires two samples to be computed: Q and R. On line 4, the algorithm checks whether 1 − p(L(I~)) ≥ ε, so |Q| is at most 3/(2ε) times the sample complexity of 𝚛𝚀𝚞𝚊𝚗𝚝𝚒𝚕𝚎. As a result,

|R| + |Q| = O((1/ε⁴) log(1/ε)) + (3/(2ε)) · Õ((1/(τ²(ρ−β)²)) · (12/τ²)^{log* |𝒳| + 1})
 = Õ(ε⁻⁷ · (ε⁻⁶)^{O(log* n)})
 = (1/ε)^{O(log* n)},

as desired. Combining Lemma 25 and Lemma 26 establishes Theorem 17.

5 Discussion and Future Work

The obvious question left open by our work is whether the exp(O(log* n)) dependence on the instance size can be improved, or removed altogether. Doing so would require an entirely different approach to deal with the consistency issue, as the reproducible median algorithm does require a dependence (albeit a very mild one) on the domain size. Another parameter of interest is the probability of failure of the LCA, here set to δ = ε: ideally, one would keep δ as a free parameter, with a runtime growing as polylog(1/δ); unfortunately, the use of replicable algorithms appears to require a polynomial dependence on 1/δ, so achieving this would require new ideas and techniques. Relatedly, it would be interesting to improve the approximation ratio to obtain a (1 − ε)-approximation, for arbitrary constant ε > 0.

The connection we made between LCAs and reproducible algorithms was key to our main algorithmic result. We believe that exploring this interplay further, and how insights and techniques from reproducible learning algorithms can be used to design new or better LCAs for a variety of tasks, would be a fruitful direction of study.

Another, orthogonal direction of research would be to see if allowing a very small amount of memory (i.e., polylogarithmic in the input size) between computations, as allowed in the original definition of LCAs of [15], would enable significantly better algorithms for Knapsack – or even bypass our impossibility results entirely. If so, it would be interesting to characterize the minimum amount of local memory sufficient for this purpose.

Finally, as mentioned in the introduction, it would be interesting to see if the relaxation to average-case local computation, as introduced in [4], would lead to either faster LCAs for Knapsack with weighted sampling access, or help circumvent our impossibility results absent this type of access.

References

  • [1] Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. In STOC, pages 20–29. ACM, 1996. doi:10.1145/237814.237823.
  • [2] Noga Alon, Ronitt Rubinfeld, Shai Vardi, and Ning Xie. Space-efficient local computation algorithms. In SODA, pages 1132–1139. SIAM, 2012. doi:10.1137/1.9781611973099.89.
  • [3] Soheil Behnezhad, Mohammad Roghani, and Aviad Rubinstein. Local computation algorithms for maximum matching: New lower bounds. In FOCS, pages 2322–2335. IEEE, 2023. doi:10.1109/FOCS57990.2023.00143.
  • [4] Amartya Shankha Biswas, Ruidi Cao, Edward Pyne, and Ronitt Rubinfeld. Average-case local computation algorithms. CoRR, abs/2403.00129, 2024. doi:10.48550/arXiv.2403.00129.
  • [5] Harry Buhrman and Ronald de Wolf. Complexity measures and decision tree complexity: a survey. Theor. Comput. Sci., 288(1):21–43, 2002. doi:10.1016/S0304-3975(01)00144-X.
  • [6] Mohsen Ghaffari. Local computation of maximal independent set. In FOCS, pages 438–449. IEEE, 2022. doi:10.1109/FOCS54457.2022.00049.
  • [7] Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. In FOCS, pages 339–348. IEEE Computer Society, 1996. doi:10.1109/SFCS.1996.548493.
  • [8] Christoph Grunau, Slobodan Mitrovic, Ronitt Rubinfeld, and Ali Vakilian. Improved local computation algorithm for set cover via sparsification. In Shuchi Chawla, editor, Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pages 2993–3011. SIAM, 2020. doi:10.1137/1.9781611975994.181.
  • [9] Russell Impagliazzo, Rex Lei, Toniann Pitassi, and Jessica Sorrell. Reproducibility in learning. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 818–831, 2022. doi:10.1145/3519935.3519973.
  • [10] Hiro Ito, Susumu Kiyoshima, and Yuichi Yoshida. Constant-time approximation algorithms for the knapsack problem. In Theory and Applications of Models of Computation: 9th Annual Conference, TAMC 2012, Beijing, China, May 16-21, 2012. Proceedings 9, pages 131–142. Springer, 2012. doi:10.1007/978-3-642-29952-0_17.
  • [11] Reut Levi, Ronitt Rubinfeld, and Anak Yodpinyanee. Local computation algorithms for graphs of non-constant degrees. Algorithmica, 77(4):971–994, 2017. doi:10.1007/S00453-016-0126-Y.
  • [12] Yishay Mansour, Aviad Rubinstein, Shai Vardi, and Ning Xie. Converting online algorithms to local computation algorithms. In Automata, Languages, and Programming: 39th International Colloquium, ICALP 2012, Warwick, UK, July 9-13, 2012, Proceedings, Part I 39, pages 653–664. Springer, 2012. doi:10.1007/978-3-642-31594-7_55.
  • [13] Huy N Nguyen and Krzysztof Onak. Constant-time approximation algorithms via local improvements. In 2008 49th Annual IEEE Symposium on Foundations of Computer Science, pages 327–336. IEEE, 2008. doi:10.1109/FOCS.2008.81.
  • [14] Michal Parnas and Dana Ron. Approximating the minimum vertex cover in sublinear time and a connection to distributed algorithms. Theoretical Computer Science, 381(1-3):183–196, 2007. doi:10.1016/J.TCS.2007.04.040.
  • [15] Ronitt Rubinfeld, Gil Tamir, Shai Vardi, and Ning Xie. Fast local computation algorithms. In Bernard Chazelle, editor, Innovations in Computer Science - ICS 2011, Tsinghua University, Beijing, China, January 7-9, 2011. Proceedings, pages 223–238. Tsinghua University Press, 2011. URL: http://conference.iiis.tsinghua.edu.cn/ICS2011/content/papers/36.html.
  • [16] David P Williamson and David B Shmoys. The design of approximation algorithms. Cambridge university press, 2011.
  • [17] Andrew Chi-Chin Yao. Probabilistic computations: Toward a unified measure of complexity. In 18th Annual Symposium on Foundations of Computer Science (sfcs 1977), pages 222–227, 1977. doi:10.1109/SFCS.1977.24.
  • [18] Yuichi Yoshida, Masaki Yamamoto, and Hiro Ito. Improved constant-time approximation algorithms for maximum matchings and other optimization problems. SIAM Journal on Computing, 41(4):1074–1093, 2012. doi:10.1137/110828691.