Abstract 1 Introduction 2 A Summary of Jin and Wu’s Algorithm 3 Our Improvement for Pigeonhole Equal Subset Sum 4 A Polynomial-Space Algorithm 5 Open Questions References Appendix A Proof of Lemma 4 by Jin and Wu

New Algorithms for Pigeonhole Equal Subset Sum

Ce Jin ORCID MIT, Cambridge, MA, USA Ryan Williams ORCID MIT, Cambridge, MA, USA Stan Zhang MIT, Cambridge, MA, USA
Abstract

We study the Pigeonhole Equal Subset Sum problem, which is a total-search variant of the Subset Sum problem introduced by Papadimitriou (1994): we are given a set of n positive integers {w1,,wn} with the additional restriction that i=1nwi<2n1, and want to find two different subsets A,B[n] such that iAwi=iBwi.

Very recently, Jin and Wu (ICALP 2024) gave a randomized algorithm solving Pigeonhole Equal Subset Sum in O(20.4n) time, beating the classical meet-in-the-middle algorithm with O(2n/2) runtime. In this paper, we refine Jin and Wu’s techniques to improve the runtime even further to O(2n/3).

Keywords and phrases:
pigeonhole principle, subset sums
Funding:
Ce Jin: Supported by the Jane Street Graduate Research Fellowship, NSF grant CCF-2330048, and a Simons Investigator Award.
Ryan Williams: Work supported in part by NSF CCF-2420092.
Stan Zhang: Self Funded
Copyright and License:
[Uncaptioned image] © Ce Jin, Ryan Williams, and Stan Zhang; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Design and analysis of algorithms
Related Version:
This paper is based on Stan Zhang’s Master’s thesis.
Editors:
Anne Benoit, Haim Kaplan, Sebastian Wild, and Grzegorz Herman

1 Introduction

Subset Sum is a fundamental NP-hard problem: given positive integers w1,w2,,wn and a target integer t, we want to find a subset A[n] such that iAwi=t.111Throughout this paper, we write [n]={1,,n}. Subset Sum has a simple meet-in-the-middle algorithm in about O(2n/2) time, given by Horowitz and Sahni in 1974 [13]. There has been a long line of research attempting to improve that meet-in-the-middle running time for Subset Sum [4, 5, 22, 3, 14, 8, 9, 12], but it remains open whether there exists an algorithm running in O(2(1/2ε)n) time for any constant ε>0. The fastest known algorithm for Subset Sum only has a poly(n)-factor improvement over O(2n/2) [12].

Despite the lack of progress on the time complexity of Subset Sum, there has been significant progress for solving variants of the problem. Examples include Equal Subset Sum [16], 2-Subset Sum and Shifted Sums [2], more general subset balancing problems [11], algorithms solving either Equal Subset Sum or Subset Sum [20], Merlin–Arthur protocols for Subset Sum [17, 1], Subset Sum in low space [7, 18], and quantum algorithms for Subset Sum [2]. There is also a large body of works on pseudo-polynomial-time algorithms and approximation schemes for Subset Sum and related problems (see e.g., [10] and the references therein). In this work, we focus on exact exponential-time algorithms.

An important variant of Subset Sum is the Equal Subset Sum problem:

Equal Subset Sum [24] Input: integers w1,w2,,wn where |wi|2poly(n) Output: two different subsets A,B[n] such that iAwi=iBwi (if exist).

Woeginger and Yu [24] first studied the Equal Subset Sum problem and showed that it is NP-Complete, similarly to Subset Sum. Since we can assume the solution satisfies AB= (otherwise, we can return (AB,BA) instead), a standard meet-in-the-middle approach can solve Equal Subset Sum in O(3n/2)O(1.7321n) time. Mucha, Nederlof, Pawlewicz, and Węgrzycki [16] gave a faster O(1.7088n)-time algorithm in 2019, answering an open question of [23].222Throughout this paper, we use O(),Ω() to hide poly(n) factors, where n is the number of input integers.

A special case of the Equal Subset Sum problem is the Pigeonhole Equal Subset Sum problem (also sometimes called Pigeonhole Equal Sums). This problem is the focus of our paper.

Pigeonhole Equal Subset Sum [19] Input: positive integers w1,w2,,wn, with promise i=1nwi<2n1. Output: two different subsets A,B[n] such that iAwi=iBwi.

Pigeonhole Equal Subset Sum is a total search problem, in that there always exists a solution; our goal is merely to find one. To see why this is true, note that there are 2n subsets S[n], which can only attain 2n1 possible subset sums iSwi{0,1,,2n2} due to the promise i=1nwi<2n1, so by the pigeonhole principle there must exist a pair of subsets with the same subset sum.

The Pigeonhole Equal Subset Sum problem was introduced by Papadimitriou in 1994 [19], and has been studied in the TFNP literature [6, 21]. It belongs to the total search complexity class PPP, and is conjectured to be PPP-complete [19].

Pigeonhole Equal Subset Sum can be solved faster than the best known Equal Subset Sum algorithm. Until very recently, the fastest known algorithm for Pigeonhole Equal Subset Sum had time complexity O(2n/2), either by a simple binary search combined with the classical meet-in-the-middle approach by Horowitz and Sahni [13] (see a short description in [15, Section 2]), or by a mod-p dynamic programming algorithm by Allcock, Hamoudi, Joux, Klingelhöfer, and Santha [2, Theorem 6.2]. Recently, a major advance by Jin and Wu [15] gave a randomized algorithm for Pigeonhole Equal Subset Sum running in O(20.4n) time. Roughly speaking, the key idea of Jin and Wu was to show that a Pigeonhole Equal Subset Sum instance with very few solutions must satisfy a certain structural property: namely, the input numbers should resemble the geometric progression 1,2,4,,2n1 (see the formal statement in Lemma 4).

1.1 Our contribution

In this paper, we improve the time complexity of Jin and Wu [15]’s Pigeonhole Equal Subset Sum algorithm from O(20.4n) to the clean bound O(2n/3):

Theorem 1 (Main).

Pigeonhole Equal Subset Sum can be solved by a randomized algorithm in O(2n/3) time.

Jin and Wu used similar techniques to give a polynomial-space algorithm for Pigeonhole Equal Subset Sum in O(20.75n) time, improving the straightforward O(2n)-time polynomial-space algorithm based on binary search and meet-in-the-middle. We show that their polynomial-space algorithm can also be further improved:

Theorem 2.

Pigeonhole Equal Subset Sum can be solved by a randomized algorithm in O(22n/3) time and poly(n) space.

Our improved algorithms are based on the same structural property (see the formal statement in Lemma 4) proved by Jin and Wu [15], but we exploit it in a more efficient way. Jin and Wu [15] only exploited the property in the case when there are few solutions; if there are many solutions, they simply switched to a subsampling approach to try to hit a solution randomly, without using the structural property. In contrast, we make use of the structural property even in the case where there are many solutions, thereby improving the overall time complexity.

The rest of the paper is organized as follows: In Section 2, we give an overview of Jin and Wu’s algorithm, describe their aforementioned structural property precisely, and extract a few lemmas from their work which are useful to us. Then in Section 3, we present our improved algorithm for Pigeonhole Equal Subset Sum, proving Theorem 1. In Section 4 we prove Theorem 2. Finally, we discuss some future research directions in Section 5.

2 A Summary of Jin and Wu’s Algorithm

Throughout this paper, we use w1,,wn to denote the input integers, and we use the shorthand w(A)=iAwi for the subset sum attained by the subset of indices A[n].

For an Equal Subset Sum instance (w1,,wn) (not necessarily satisfying the pigeonhole promise), Jin and Wu [15] considered the parameter F(w1,,wn) defined as follows:

Definition 3 (ft() and F()).

For w1,,wn and t, let

ft(w1,,wn)|{S[n]:w(S)=t}|,

be the number of ways to achieve subset sum t. When w1,,wn are clear from the context, we abbreviate ft=ft(w1,,wn).

Then, define

F(w1,,wn)tmax{0,ft1}.

Observe that F(w1,,wn){0,1,,2n1}.

Note that an Equal Subset Sum instance (w1,,wn) has a solution A,B[n],AB,w(A)=w(B) if and only if F(w1,,wn)1.

Suppose the Pigeonhole Equal Subset Sum input instance consists of positive integers w1<w2<<wn (assuming no trivial solution wi=wj,ij exists) where w([n])<2n1 (recall that this upper bound on the sum of all weights is the promise given by Pigeonhole Equal Subset Sum). Jin and Wu’s key structural property relies on the following extra condition [15, Equation (1)]:

w([i])2i1, for all i[n1]. (1)

That is, Equation 1 says that the total weight of the set {1,,i} is at least 2i1. We may assume without loss of generality that Equation 1 holds. To see why, note that if w([i])<2i1 for some in1, then {w1,,wi} is also a Pigeonhole Equal Subset Sum instance. We can solve this strictly smaller instance instead, and obtain a solution A,B[i][n],AB,w(A)=w(B). Hence, we may assume such i does not exist, so Equation 1 holds.

Using Equation 1, Jin and Wu [15] proved a key structural lemma for Pigeonhole Equal Subset Sum instances (w1,,wn). Their lemma says that if F(w1,,wn) is small, then the sequence w1,,wn is close to the geometric progression 1,2,4,,2n1 with small additive error.

Lemma 4 ([15, Equation (8)]).

Suppose positive integers w1<w2<<wn satisfy w([n])<2n1 and Equation 1. Then, for all i[n],

iF(w1,,wn)wi2i1F(w1,,wn). (2)

Lemma 4 will be used in our improved algorithm as well. For completeness, in Appendix A we include a short proof of Lemma 4 given by Jin and Wu [15]. Jin and Wu gave two different algorithms, which we summarize below in Lemma 5 and Lemma 6. The first algorithm is suitable when the Pigeonhole Equal Subset Sum instance (w1,,wn) has small F(w1,,wn). It was obtained by using Lemma 4 to speed up the standard binary search and meet-in-the-middle approach:

Lemma 5 ([15, Lemma 4]).

Given parameter 1Δ2n/(3n2), a Pigeonhole Equal Subset Sum instance (w1,,wn) (where wi<wi+1) satisfying F(w1,,wn)Δ and Equation 1 can be solved deterministically in O(Δ) time.

The second algorithm of Jin and Wu works well when F(w1,,wn) is large. The intuition is that, in this case there are many solutions, so one can perform subsampling and still expect at least one solution to survive. Jin and Wu used this subsampling idea to speed up the mod-p dynamic programming algorithm (which had previously been used for Subset Sum and related problems, e.g., [2, 4]):

Lemma 6 ([15, Lemma 5]).

Given parameter 2n/2Δ<2n, an Equal Subset Sum instance (w1,,wn) with F(w1,,wn)Δ can be solved in O((22n/Δ)1/3) time by a randomized algorithm.

We remark that originally [15, Lemma 5] was stated for Pigeonhole Equal Subset Sum only. However, upon inspecting their proof, one can easily verify that the pigeonhole promise was never used, so that the algorithm actually works for Equal Subset Sum as well. This point turns out to be useful for our improvement later.

Jin and Wu’s final algorithm for Pigeonhole Equal Subset Sum combines Lemma 5 and Lemma 6, and the overall runtime is balanced at the bottleneck case Δ20.8n, leading to an overall worst case runtime of O(20.4n).

In Jin and Wu’s algorithm, Lemma 4 was used in developing the algorithm of Lemma 5 (the algorithm for the small F(w1,,wn) case), but was ignored in the case where F(w1,,wn) is large. In our improved algorithm, we show how the structural property of Lemma 4 can be exploited even when F(w1,,wn) is large. This will allow us to reduce the original Pigeonhole Equal Subset Sum instance to a small number of Equal Subset Sum instances, which have sizes much smaller than n (and still have large F-values).

3 Our Improvement for Pigeonhole Equal Subset Sum

In this section we prove our main Theorem 1. Let w1<w2<<wn be the input Pigeonhole Equal Subset Sum instance. Throughout, let k{0,1,,n1} be such that

2kF(w1,,wn)<2k+1. (3)

Our algorithm does not know the value of k in advance; instead, it tries all n possible values of k in parallel, and terminates as soon as any one of them succeeds. This only increases the runtime by an n factor. Hence, in the following we assume that the value of k is known.

We will separately consider the input items in [n][k]={k+1,,n} and in [k]. At a high level, our algorithm performs the following two steps:

  1. 1.

    Guess a suitable pair of subsets X,Y[n][k].

  2. 2.

    Find subsets A0,B0[k] such that w(A0)w(B0)=w(Y)w(X), i.e., (A0X,B0Y) is a solution for Pigeonhole Equal Subset Sum.

For the first step, we will use Lemma 4 and Equation 3 to show that there are only poly(n) many pairs of X,Y[n][k] that we need to consider. For the second step, we will reduce the task of finding A0,B0 to a smaller Equal Subset Sum instance with (k+1) input integers and large F-value, which can be solved by Lemma 6. (In comparison, Jin and Wu’s original algorithm applied Lemma 6 to solve Equal Subset Sum on n integers.)

Now we define the close pairs, which are the pairs of subsets (X,Y) that we need to consider for the first step:

Definition 7 (Close pairs).

Define the set of close pairs as

𝒞{(X,Y):X,Y[n][k] and |w(X)w(Y)|(k+1)2k+1}.

Define the set of disjoint close pairs as

𝒟{(X,Y)𝒞:XY=}.

By the identity w(X)w(Y)=w(XY)w(YX), we have

𝒟={(XY,YX):(X,Y)𝒞}. (4)

The following simple lemma explains why it is sufficient to only consider close pairs:

Lemma 8.

If A,B[n] and w(A)=w(B), then (A[k],B[k])𝒞.

Proof.

Since w(A[k])+w(A[k])=w(A)=w(B)=w(B[k])+w(B[k]), we have

|w(A[k])w(B[k])|=|w(B[k])w(A[k])|w([k]).

By the upper bounds in Lemma 4 and Equation 3, we have

w([k])i=1k(2i1+F(w1,,wn))<(k+1)2k+1.

The claim then follows from the definition of 𝒞.

Now we bound the number of close pairs and disjoint close pairs:

Lemma 9.

We have |𝒟|200n5 and |𝒞|200n52nk. Moreover, 𝒟 can be computed in poly(n) time.

Proof.

First, we prove the claimed upper bound on |𝒟|. For X[n][k], denote w~(X)iX2i1. By Lemma 4 and Equation 3, for all i[n],

|wi2i1|iF(w1,,wn)n2k+1.

Hence, |w~(X)w(X)|n22k+1 for all X[n][k]. Therefore, for all (X,Y)𝒟, we have |w~(X)w~(Y)||w(X)w(Y)|+2n22k+1(k+1)2k+1+2n22k+16n22k. Now it remains to characterize all members of 𝒟 defined as

𝒟{(X,Y)2[n][k]×2[n][k]:XY= and |w~(X)w~(Y)|6n22k}𝒟.

Note that (,)𝒟. Now consider any (X,Y)𝒟 not equal to (,), and let i be the maximum element in XY. By symmetry, we assume iX without loss of generality, and thus Y[i1][k]. Then,

w~(X)w~(Y) =(2i1+jX{i}2j1)(j=k+1i12j1j{k+1,,i1}Y2j1)
=2k+jX{i}2j1+j{k+1,,i1}Y2j10.

On the other hand, w~(X)w~(Y)6n22k by definition of 𝒟. Hence, for every j[i1][k] with 2j16n22k, we must have jY and jX (otherwise, 2j1 would appear as a summand above, violating w~(X)w~(Y)6n22k). Then, to fully determine (X,Y), it only remains to decide for each integer k+1j<log2(6n22k)+1 whether j belongs to X, Y, or neither. There are at most 3log2(6n22k)+1k=3log2(6n2)+1<60n4 possible choices in total. Since there are up to n choices of i and two choices for whether i belongs to X or Y, we conclude that |𝒟|1+2n60n4<200n5.

The above argument can be easily converted into an algorithm that computes the set 𝒟 in poly(n) time. Since 𝒟𝒟, we have |𝒟|200n5 as claimed, and we can also compute 𝒟 in poly(n) time by checking every pair in 𝒟 for containment in 𝒞.

Now we bound |𝒞|. Recall from Equation 4 that every (X,Y)𝒞2[n][k]×2[n][k] can be mapped to (XY,YX)𝒟, and note that every (X,Y)𝒟 can only have at most 2nk pre-images in 𝒞 under this mapping. Hence, |𝒞|2nk|𝒟|200n52nk.

The following lemma allows us to reduce the original problem to Equal Subset Sum instances on k or k+1 integers with large F-value:

Lemma 10.

For sets X,Y[n][k], define the k- or (k+1)-tuple WX,Y as

WX,Y{(w1,,wk) if X=Y,(w1,,wk,w(X)w(Y)) if XY. (5)

Then, there must exist (X,Y)𝒟, such that F(WX,Y)F(w1,,wn)/|𝒞|.333Note that in our definition of WX,Y for (X,Y)𝒟, the first case X=Y happens only if X=Y=. However we later will consider WX,Y where (X,Y) may not belong to 𝒟.

Before proving Footnote˜3, we first use it to finish the proof of our main theorem:

Proof of Theorem 1 (assuming Footnote˜3).

As mentioned in the beginning of this section, we can assume the integer k defined by Equation 3 is known. We assume k satisfies

2k200n522n/3, (6)

since otherwise we can already use Lemma 5 to solve the Pigeonhole Equal Subset Sum instance (w1,,wn) in O(2k+1)=O(2n/3) time as required.

Compute 𝒟 with |𝒟|poly(n) by Lemma 9 in poly(n) time. Enumerate each (X,Y)𝒟, and then solve the Equal Subset Sum instance WX,Y (defined in Equation 5) using Lemma 6. If the algorithm in Lemma 6 successfully finds a solution (U,V),UV in the instance WX,Y (where we can assume UV= without loss of generality), then we distinguish between two cases:

  • Case where U,V[k]:

    Then, since w(U)=w(V) and UV, we can return (U,V) as a solution for the original Pigeonhole Equal Subset Sum instance w1,,wn.

  • Case where k+1UV:

    This can only happen in the XY case, where WX,Y contains the extra (k+1)-th element w(X)w(Y). Without loss of generality assume k+1U,k+1V. Then, define A=(U{k+1})X and B=VY, both of which are subsets of [n]. Note that AB due to XY. Then we have w(A)=w(U{k+1})+w(X)=(w(V)(w(X)w(Y)))+w(X)=w(V)+w(Y)=w(B). So we can return (A,B) as a solution for the original Pigeonhole Equal Subset Sum instance w1,,wn.

Now we analyze the runtime. By Footnote˜3, for at least one (X,Y)𝒟, F(WX,Y)F(w1,,wn)/|𝒞|(200n5)122knΔ (where we used Lemma 9 and Equation 3). Then, applying Lemma 6 to solve WX,Y takes time O((22(k+1)/Δ)1/3)=O(2n/3). Note that the precondition Δ2(k+1)/2 required by Lemma 6 is satisfied due to Equation 6. We can try all |𝒟|poly(n) possibilities for the pair (X,Y) in parallel and run the algorithm of Lemma 6 until the earliest of them succeeds. The total time complexity is still O(2n/3).

It remains to prove Footnote˜3.

Proof of Footnote˜3.

We will show that there exists a pair (X,Y)𝒞 satisfying the desired bound F(WX,Y)F(w1,,wn)/|𝒞|. Once this is shown, by w(X)w(Y)=w(XY)w(YX) we have WXY,YX=WX,Y, so the pair (XY,YX)𝒟 also satisfies the desired bound, which would finish the proof.

We start by rewriting F(WX,Y) as follows:

Claim 11.

For sets X,Y[n][k] and integer t, define

h(X,t)|{S[n]:S[k]=X and w(S)=t}|,

which is the number of ways to extend X into [n] to achieve a sum of t, and define

g(X,Y,t){max{h(X,t)+h(Y,t)1,0}if XY,max{h(X,t)1,0}if X=Y.

Then, F(WX,Y)=tg(X,Y,t).

Proof of Claim 11.

By definition of h(X,t), observe that h(X,t)=ftw(X)(w1,,wk) (where ft() was defined in Definition 3). Hence,

F(WX,X) =tmax{ft(w1,,wk)1,0}
=tmax{ftw(X)(w1,,wk)1,0}
=tmax{h(X,t)1,0}.

This finishes the proof for the X=Y case. Now observe that

h(X,t)+h(Y,t) =ftw(X)(w1,,wk)+ftw(Y)(w1,,wk)
=ftw(Y)(w1,,wk,w(X)w(Y)),

where the last equality can be seen from separately considering subsets of {w1,,wk,w(X)w(Y)} which include w(X)w(Y) and those which do not. Therefore, if XY, then we similarly have

F(WX,Y) =tmax{ft(w1,,wk,w(X)w(Y))1,0}
=tmax{h(X,t)+h(Y,t)1,0}

as desired.

Now we relate F(w1,,wn)=tmax{ft(w1,,wn)1,0} to the expressions from Claim 11. For any t with ft(w1,,wn)1, fix an arbitrary At[n] with w(At)=t, and let XtAt[k]. Note that h(Xt,t)1. Then, by Lemma 8, every Y[n][k] with h(Y,t)1 must satisfy (Xt,Y)𝒞. Hence,

ft(w1,,wn)1 =1+Y[n][k]:h(Y,t)1h(Y,t) (by definition of ft() and h())
1+Y:(Xt,Y)𝒞h(Y,t)
=(h(Xt,t)1)+Y:(Xt,Y)𝒞,YXth(Y,t)
(h(Xt,t)1)+Y:(Xt,Y)𝒞,YXt(h(Xt,t)+h(Y,t)1)
Y:(Xt,Y)𝒞g(Xt,Y,t)
(X,Y)𝒞g(X,Y,t).

This means that for all t,

max{ft(w1,,wn)1,0}(X,Y)𝒞g(X,Y,t).

By summing over all t and substituting using Claim 11, we get

F(w1,,wn)(X,Y)𝒞F(WX,Y).

Then, by averaging, there exists (X,Y)𝒞 such that F(WX,Y)F(w1,,wn)/|𝒞| as claimed.

4 A Polynomial-Space Algorithm

In this section we prove Theorem 2. We use the following two lemmas from Jin and Wu [15], which can be viewed as polynomial-space versions of Lemma 5 and Lemma 6 respectively:

Lemma 12 ([15, Lemma 11]).

Given parameter 1Δ2n/(3n2), a Pigeonhole Equal Subset Sum instance (w1,,wn) (where wi<wi+1) satisfying F(w1,,wn)Δ and Equation 1 can be solved deterministically in poly(n) space and O(Δ) time.

Lemma 13 ([15, Lemma 13]).

Given parameter 1Δ2n, an Equal Subset Sum instance (w1,,wn) with F(w1,,wn)Δ can be solved in O(21.5n/Δ) time and poly(n) space by a randomized algorithm.

Again, although [15, Lemma 13] was originally stated for the Pigeonhole Equal Subset Sum problem, it actually works for Equal Subset Sum as well because the proof does not use the pigeonhole promise.

Proof sketch of Theorem 2.

We follow the same proof outline of Theorem 1, except that we replace the exponential-space subroutines, Lemma 5 and Lemma 6, by their polynomial-space counterparts, Lemma 12 and Lemma 13, respectively. The resulting algorithm is correct and runs in polynomial space. It only remains to re-analyze the time complexity.

Recall that we can assume the integer k{0,1,,n1} satisfying 2kF(w1,,wn)<2k+1 (Equation 3) is known. Let M[1,2n/(3n2)] be determined later. If 2kM, we choose to use Lemma 12 and solve the given Pigeonhole Equal Subset Sum instance in O(2k+1)=O(M) time. If 2k>M, we follow the main proof of Theorem 1 to reduce the original problem to poly(n) many Equal Subset Sum instances of size k+1 (or k) in which at least one instance has F-value Ω(22kn), so the total time complexity of applying Lemma 13 becomes O(21.5k/22kn)=O(2n0.5k)O(2n/M). By choosing M=22n/3, the overall time complexity becomes O(M+2n/M)=O(22n/3).

5 Open Questions

We list a few open questions related to this work.

  • Can we improve the O(2n/3) time complexity for Pigeonhole Equal Subset Sum? Can we improve the O(1.7088n) time complexity for Equal Subset Sum [16]?

  • Can we obtain fast deterministic algorithms for Pigeonhole Equal Subset Sum? Our algorithm (and Jin and Wu’s algorithm [15]) uses Lemma 6 which is based on random subsampling. To the best of our knowledge, the O(2n/2)-time algorithms via meet-in-the-middle ([2] or [15, Section 2]) remain the fastest known deterministic algorithms.

  • The authors of [2] proposed a more difficult Modular version of Pigeonhole Equal Subset Sum problem, and obtained a O(2n/2)-time algorithm. In this problem, we are given integers w1,,wn and a modulus m2n1, and need to find two distinct subsets A,B[n] such that iAwiiBwi(modm). Can we improve the O(2n/2) time complexity for Modular Pigeonhole Equal Subset Sum?

  • The k-SUM problem can be viewed as the fine-grained or parameterized version of the Subset Sum problem. Analogously, we propose the following Pigeonhole Equal k-SUM problem (for fixed integers k2):

    Pigeonhole Equal k-SUM [19] Input: k length-n arrays A1,,Ak of integers from the range [0,nk1k). Output: two different k-tuples (i1,,ik),(j1,,jk)[n]k such that A1[i1]++Ak[ik]=A1[j1]++Ak[jk].

    This is a total search problem, and the brute force algorithm runs in O(nk) time. It is easy to improve it to O~(nk/2) time, using the standard binary search with meet-in-the-middle (following the same idea as the warm-up Pigeonhole Equal Subset Sum algorithm described in [15, Section 2], except that we replace the O(2n/2)-time subroutine for counting Subset Sum solutions by the analogous O~(nk/2)-time algorithm for k-SUM). Can we improve the runtime to faster than O~(nk/2), say for the k=3 case?

References

  • [1] Shyan Akmal, Lijie Chen, Ce Jin, Malvika Raj, and R. Ryan Williams. Improved merlin-arthur protocols for central problems in fine-grained complexity. Algorithmica, 85(8):2395–2426, 2023. doi:10.1007/S00453-023-01102-6.
  • [2] Jonathan Allcock, Yassine Hamoudi, Antoine Joux, Felix Klingelhöfer, and Miklos Santha. Classical and quantum algorithms for variants of subset-sum via dynamic programming. In 30th Annual European Symposium on Algorithms, ESA 2022, September 5-9, 2022, Berlin/Potsdam, Germany, volume 244 of LIPIcs, pages 6:1–6:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPIcs.ESA.2022.6.
  • [3] Per Austrin, Petteri Kaski, Mikko Koivisto, and Jesper Nederlof. Subset sum in the absence of concentration. In 32nd International Symposium on Theoretical Aspects of Computer Science, STACS 2015, March 4-7, 2015, Garching, Germany, volume 30 of LIPIcs, pages 48–61. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2015. doi:10.4230/LIPIcs.STACS.2015.48.
  • [4] Per Austrin, Petteri Kaski, Mikko Koivisto, and Jesper Nederlof. Dense subset sum may be the hardest. In Proceedings of the 33rd Symposium on Theoretical Aspects of Computer Science (STACS), volume 47 of LIPIcs, pages 13:1–13:14. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2016. doi:10.4230/LIPIcs.STACS.2016.13.
  • [5] Per Austrin, Petteri Kaski, Mikko Koivisto, and Jesper Nederlof. Sharper upper bounds for unbalanced uniquely decodable code pairs. IEEE Trans. Inf. Theory, 64(2):1368–1373, 2018. doi:10.1109/TIT.2017.2688378.
  • [6] Frank Ban, Kamal Jain, Christos H. Papadimitriou, Christos-Alexandros Psomas, and Aviad Rubinstein. Reductions in PPP. Inf. Process. Lett., 145:48–52, 2019. doi:10.1016/j.ipl.2018.12.009.
  • [7] Nikhil Bansal, Shashwat Garg, Jesper Nederlof, and Nikhil Vyas. Faster space-efficient algorithms for subset sum, k-sum, and related problems. SIAM J. Comput., 47(5):1755–1777, 2018. doi:10.1137/17M1158203.
  • [8] Anja Becker, Jean-Sébastien Coron, and Antoine Joux. Improved generic algorithms for hard knapsacks. In Advances in Cryptology – EUROCRYPT 2011 – 30th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Tallinn, Estonia, May 15-19, 2011. Proceedings, volume 6632 of Lecture Notes in Computer Science, pages 364–385. Springer, 2011. doi:10.1007/978-3-642-20465-4_21.
  • [9] Xavier Bonnetain, Rémi Bricout, André Schrottenloher, and Yixin Shen. Improved classical and quantum algorithms for subset-sum. In Advances in Cryptology – ASIACRYPT 2020 – 26th International Conference on the Theory and Application of Cryptology and Information Security, Daejeon, South Korea, December 7-11, 2020, Proceedings, Part II, volume 12492 of Lecture Notes in Computer Science, pages 633–666. Springer, 2020. doi:10.1007/978-3-030-64834-3_22.
  • [10] Lin Chen, Jiayi Lian, Yuchen Mao, and Guochuan Zhang. An improved pseudopolynomial time algorithm for subset sum. In 65th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2024, Chicago, IL, USA, October 27-30, 2024, pages 2202–2216. IEEE, 2024. doi:10.1109/FOCS61266.2024.00129.
  • [11] Xi Chen, Yaonan Jin, Tim Randolph, and Rocco A. Servedio. Average-case subset balancing problems. In Proceedings of the 2022 ACM-SIAM Symposium on Discrete Algorithms, SODA 2022, Virtual Conference / Alexandria, VA, USA, January 9 – 12, 2022, pages 743–778. SIAM, 2022. doi:10.1137/1.9781611977073.33.
  • [12] Xi Chen, Yaonan Jin, Tim Randolph, and Rocco A. Servedio. Subset sum in time 2n/2 / poly(n). In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2023, September 11-13, 2023, Atlanta, Georgia, USA, volume 275 of LIPIcs, pages 39:1–39:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.APPROX/RANDOM.2023.39.
  • [13] Ellis Horowitz and Sartaj Sahni. Computing partitions with applications to the knapsack problem. Journal of the ACM, 21(2):277–292, 1974. doi:10.1145/321812.321823.
  • [14] Nick Howgrave-Graham and Antoine Joux. New generic algorithms for hard knapsacks. In Advances in Cryptology – EUROCRYPT 2010, 29th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Monaco / French Riviera, May 30 – June 3, 2010. Proceedings, volume 6110 of Lecture Notes in Computer Science, pages 235–256. Springer, 2010. doi:10.1007/978-3-642-13190-5_12.
  • [15] Ce Jin and Hongxun Wu. A faster algorithm for pigeonhole equal sums. In 51st International Colloquium on Automata, Languages, and Programming, ICALP 2024, July 8-12, 2024, Tallinn, Estonia, volume 297 of LIPIcs, pages 94:1–94:11. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.ICALP.2024.94.
  • [16] Marcin Mucha, Jesper Nederlof, Jakub Pawlewicz, and Karol Węgrzycki. Equal-subset-sum faster than the meet-in-the-middle. In 27th Annual European Symposium on Algorithms, ESA 2019, September 9-11, 2019, Munich/Garching, Germany, volume 144 of LIPIcs, pages 73:1–73:16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPIcs.ESA.2019.73.
  • [17] Jesper Nederlof. A short note on merlin-arthur protocols for subset sum. Inf. Process. Lett., 118:15–16, 2017. doi:10.1016/j.ipl.2016.09.002.
  • [18] Jesper Nederlof and Karol Węgrzycki. Improving Schroeppel and Shamir’s algorithm for subset sum via orthogonal vectors. In STOC ’21: 53rd Annual ACM SIGACT Symposium on Theory of Computing, Virtual Event, Italy, June 21-25, 2021, pages 1670–1683. ACM, 2021. doi:10.1145/3406325.3451024.
  • [19] Christos H. Papadimitriou. On the complexity of the parity argument and other inefficient proofs of existence. J. Comput. Syst. Sci., 48(3):498–532, 1994. doi:10.1016/S0022-0000(05)80063-7.
  • [20] Tim Randolph. Exact and Parameterized Algorithms for Subset Sum Problems. PhD thesis, Columbia University, 2024. doi:10.7916/baym-5m55.
  • [21] Katerina Sotiraki, Manolis Zampetakis, and Giorgos Zirdelis. PPP-completeness with connections to cryptography. In 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2018, Paris, France, October 7-9, 2018, pages 148–158. IEEE Computer Society, 2018. doi:10.1109/FOCS.2018.00023.
  • [22] Henk C. A. van Tilborg. An upper bound for codes in a two-access binary erasure channel (corresp.). IEEE Trans. Inf. Theory, 24(1):112–116, 1978. doi:10.1109/TIT.1978.1055814.
  • [23] Gerhard J. Woeginger. Open problems around exact algorithms. Discret. Appl. Math., 156(3):397–405, 2008. doi:10.1016/J.DAM.2007.03.023.
  • [24] Gerhard J. Woeginger and Zhongliang Yu. On the equal-subset-sum problem. Inf. Process. Lett., 42(6):299–302, 1992. doi:10.1016/0020-0190(92)90226-L.
  • [25] Stan Zhang. Pigeonhole Equal Subset Sum in O(2n/3). Master’s thesis, Massachusetts Institute of Technology, 2024. URL: https://hdl.handle.net/1721.1/156830.

Appendix A Proof of Lemma 4 by Jin and Wu

Lemma 4 ([15, Equation (8)]). [Restated, see original statement.]

Suppose positive integers w1<w2<<wn satisfy w([n])<2n1 and Equation 1. Then, for all i[n],

iF(w1,,wn)wi2i1F(w1,,wn). (2)

The following proof of Lemma 4 was given by [15].

Proof.

(Jin and Wu [15]) Recall our notations w(A)=iAwi, ft=|{A[n]:w(A)=t}|. We abbreviate F=F(w1,,wn)=tmax{ft1,0}. Recall our assumption from Equation 1:

w([i])2i1 for all i[n1]. (1)

Since w([n])<2n1, we know ft=0 for all t[0,2n1], and 0t<2nft=2n. Then,

F=0t<2nmax{ft1,0}=0t<2n(ft𝟏[ft1])=2n0t<2n𝟏[ft1],

and thus

F=|{0t<2n:ft=0}|. (7)

Since ft=0 for all w([n])<t<2n, from Equation 7 we know F2n1w([n]), and hence w([n])2n1F. Combined with Equation 1 for i[n1], we get

w([i])2i1F for all i[n]. (8)

Now we have the following claim:

Claim 14.

For all i[n],

wi2i1+F. (9)

Summing Equation 9 over i gives

w([i])2i1+iF (10)

for all i[n].

Proof of Claim 14.

Fix i[n]. Let M be the number of subsets S[n] with w(S)<wi. Since wi<wi+1<<wn, any such S must be contained in [i1], and thus M2i1. On the other hand, M=t=0wi1ftwi|{0t<wi:ft=0}|wiF by Equation 7. Hence, wiM+F2i1+F.

Comparing Equation 8 with Equation 10 gives

wi=w([i])w([i1])(2i1F)(2i11+(i1)F)=2i1iF.

Together with Equation 9 we get the claimed bound

wi2i1[iF,F]

for all i[n].