Near-Linear Time Algorithm for n-fold ILPs via Color Coding

We study an important case of ILPs $\max\{c^Tx \ \vert\ \mathcal Ax = b, \ell \leq x \leq u,\, x \in \mathbb{Z}^{n t} \} $ with $n\cdot t$ variables and lower and upper bounds $\ell, u\in\mathbb Z^{nt}$. In $n$-fold ILPs non-zero entries only appear in the first $r$ rows of the matrix $\mathcal A$ and in small blocks of size $s\times t$ along the diagonal underneath. Despite this restriction many optimization problems can be expressed in this form. It is known that $n$-fold ILPs can be solved in FPT time regarding the parameters $s, r,$ and $\Delta$, where $\Delta$ is the greatest absolute value of an entry in $\mathcal A$. The state-of-the-art technique is a local search algorithm that repeatedly moves in an improving direction. Both the number of iterations and the search for such an improving direction take time $\Omega(n)$, leading to a quadratic running time in $n$. We introduce a technique based on Color Coding, which allows us to compute these improving directions in logarithmic time after a single initialization step. This leads to the first algorithm for $n$-fold ILPs with a running time that is near-linear in the number $nt$ of variables, namely $(rs\Delta)^{O(r^2s + s^2)} L^2 \cdot nt \log^{O(1)}(nt)$, where $L$ is the encoding length of the largest integer in the input. In contrast to the algorithms in recent literature, we do not need to solve the LP relaxation in order to handle unbounded variables. Instead, we give a structural lemma to introduce appropriate bounds. If, on the other hand, we are given such an LP solution, the running time can be decreased by a factor of $L$.


Introduction
Solving integer linear programs of the form $\max\{c^Tx \mid \mathcal Ax = b,\ x \in \mathbb{Z}_{\geq 0}\}$ is one of the most fundamental tasks in optimization. This problem is very general and broadly applicable, but unfortunately also very hard. In this paper we consider $n$-fold ILPs, a class of integer linear programs with a specific block structure: non-zero entries appear only in the first $r$ rows of $\mathcal A$ and in blocks of size $s \times t$ along the diagonal underneath. More precisely, an $n$-fold matrix has the form
$$\mathcal A = \begin{pmatrix} A_1 & A_2 & \cdots & A_n \\ B_1 & 0 & \cdots & 0 \\ 0 & B_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & B_n \end{pmatrix},$$
where $A_1, \dots, A_n$ are $r \times t$ matrices and $B_1, \dots, B_n$ are $s \times t$ matrices. In $n$-fold ILPs we also allow upper and lower bounds on the variables. Throughout the paper we subdivide a solution $x$ into bricks of length $t$ and denote by $x^{(i)}$ the $i$-th one. The corresponding columns in $\mathcal A$ will be called blocks.
Lately, $n$-fold ILPs received great attention [2,7,12,14,16] and were studied intensively for two reasons. Firstly, many optimization problems are expressible as $n$-fold ILPs [5,10,12,14]. Secondly, $n$-fold ILPs can indeed be solved much more efficiently than arbitrary ILPs [7,10,16]. The previously best algorithm has a running time of $(rs\Delta)^{O(r^2s+rs^2)} L \cdot (nt)^2 \log^2(nt) + \mathrm{LP}$ and is due to Eisenbrand et al. [7]. Here LP is the running time required for solving the corresponding LP relaxation. This augmentation algorithm is the last one in a line of research, where local improvement/augmenting steps are used to converge to an optimal solution. Clever insights about the structure of the improving directions allow them to be computed fast. Nevertheless, the dependence on $n$ in the algorithm above is still high. Indeed, in practice a quadratic running time is simply not suitable for large data sets [3,6,13]. For example, when analyzing big data, large real-world graphs as in telecommunication networks, or DNA strings in biology, the duration of the computation would go far beyond the scope of an acceptable running time [3,6,13].
For this reason even problems which have an algorithm of quadratic running time are still studied from the viewpoint of approximation algorithms with the objective to obtain results in subquadratic time, even at the cost of a worse solution quality [3,6,13]. Hence, it is an intriguing question whether the quadratic dependency on the number $nt$ of variables can be eliminated. In this paper, we answer this question affirmatively. The technical novelty comes from a surprising area: We use a combinatorial structure called splitter, which has been used to derandomize Color Coding algorithms. It allows us to build a powerful data structure that is maintained during the local search and from which we can derive an improving direction in logarithmic time. Handling unbounded variables in an $n$-fold is a non-trivial issue in the previous algorithms from the literature. They had to solve the corresponding LP relaxation and use proximity results. Unfortunately, it is not known whether linear programming can be solved in near-linear time in the number of variables. Hence, solving the LP is an obstacle for obtaining a near-linear running time. We manage to circumvent the necessity of solving the LP by introducing artificial bounds as a function of the finite upper bounds and the right-hand side of the $n$-fold.

Summary of Results
We present an algorithm which solves $n$-fold ILPs in time $(rs\Delta)^{O(r^2s+s^2)} L \cdot nt \log^4(nt) + \mathrm{LP}$, where LP is the time to solve the LP relaxation of the $n$-fold. This is the first algorithm with a near-linear dependence on the number of variables. The crucial step is to speed up the computation of the improving directions. We circumvent the need for solving the LP relaxation. This leads to a purely combinatorial algorithm with running time $(rs\Delta)^{O(r^2s+s^2)} L^2 \cdot nt \log^6(nt)$.
In the running times above the dependence on the parameters, i.e., $(rs\Delta)^{O(r^2s+s^2)}$, improves on the function $(rs\Delta)^{O(r^2s+rs^2)}$ in the previous best algorithms.

Outline of New Techniques
We will briefly elaborate on the main technical novelty in this paper. Let $x$ be some feasible, non-optimal solution for the $n$-fold. It is clear that when $y^*$ is an optimal solution for $\max\{c^Ty \mid \mathcal Ay = 0,\ \ell - x \le y \le u - x,\ y \in \mathbb{Z}^{nt}\}$, then $x + y^*$ is optimal for the initial $n$-fold. In other words, $y^*$ is a particularly good improving step. A sensible approximation of $y^*$ is to consider directions $y$ of small size and to multiply them by some step length, i.e., to find some $\lambda \cdot y$ with $\|y\|_1 \le k$ for a value $k$ depending only on $\Delta$, $r$, and $s$. This implies that at most $k$ of the $n$ blocks are used for $y$. If we randomly color the blocks with $k^2$ colors, then with high probability at most one block of every color is used. This reduces the problem to choosing a solution of a single brick for every color and aggregating them. We add data structures for every color to implement this efficiently. There is of course a chance that the colors do not split $y$ perfectly. We handle this by using a deterministic structure of multiple colorings (instead of one) such that it is guaranteed that at least one of them has the desired property.
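The coloring argument can be checked numerically. The following is a minimal sketch, not part of the algorithm itself: the function names and the use of independent uniform colors are assumptions made for illustration. It computes the exact probability that $k$ used blocks receive pairwise distinct colors among $k^2$ colors; by a union bound over the $\binom{k}{2}$ pairs this probability is greater than $1/2$.

```python
import random
from fractions import Fraction


def prob_all_distinct(k: int, colors: int) -> Fraction:
    """Exact probability that k items with independent uniform colors are pairwise distinct."""
    p = Fraction(1)
    for i in range(k):
        # The (i+1)-th item must avoid the i colors already taken.
        p *= Fraction(colors - i, colors)
    return p


def random_coloring_splits(n: int, used_blocks: list, colors: int, rng: random.Random) -> bool:
    """Color n blocks uniformly at random and report whether the used blocks get distinct colors."""
    coloring = [rng.randrange(colors) for _ in range(n)]
    return len({coloring[b] for b in used_blocks}) == len(used_blocks)


# Hypothetical example: k = 5 used blocks, k^2 = 25 colors.
k = 5
exact = prob_all_distinct(k, k * k)
rng = random.Random(0)
trials = 2000
hits = sum(random_coloring_splits(100, [3, 17, 42, 58, 99], k * k, rng) for _ in range(trials))
print(float(exact), hits / trials)
```

A single random coloring therefore succeeds with constant probability; the deterministic family of colorings described above removes the remaining failure probability.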

Related Work
The first XP-time algorithm for solving $n$-fold integer programs is due to De Loera et al. [5] with a running time of $n^{g(\mathcal A)} L$. Here $g(\mathcal A)$ denotes the so-called Graver complexity of the constraint matrix $\mathcal A$ and $L$ is the encoding length of the largest number in the input. This algorithm already uses the idea of iteratively converging to the optimal solution by finding improving directions. Nevertheless, the Graver complexity appears to be huge even for small $n$-fold integer linear programs and thus this algorithm was of no practical use [10]. The exponent of this algorithm was then greatly improved by Hemmecke et al. in [10] to a constant, yielding the first cubic time algorithm for solving $n$-fold ILPs. More precisely, the running time of their algorithm is $\Delta^{O(t(rs+st))} L \cdot (nt)^3$, i.e., FPT-time parameterized by $\Delta$, $r$, $s$, and $t$. Lately, two more breakthroughs were obtained. One of the results is due to Koutecký et al. [16], who gave a strongly polynomial algorithm with running time $\Delta^{O(r^2s+rs^2)} (nt)^6 \cdot \log(nt) + \mathrm{LP}$. Here LP is the running time for solving the corresponding LP relaxation, which is possible in strongly polynomial time, since the entries of the matrix are bounded. Simultaneously, Eisenbrand et al. reduced in [7] the running time from a cubic factor to a quadratic one by introducing new proximity and sensitivity results. This leads to an algorithm with running time $(\Delta rs)^{O(r^2s+rs^2)} L \cdot (nt)^2 \log^2(nt) + \mathrm{LP}$. Note that both results require only polynomial dependency on $t$.
As for applications, n-fold ILPs are broadly used to model various problems. We refer to the works [5,10,11,12,15,18] and the references therein for an overview.

Structure of the Document
In Section 2 we introduce the necessary preliminaries. Section 3 gives the algorithm for efficiently computing the augmenting steps. This is then integrated into an algorithm for n-fold ILPs in Section 4. At first we require finite variable bounds and then discuss how to eliminate this requirement using the solution of the LP relaxation. Finally, in Section 5 we discuss how to handle infinite variable bounds without the LP relaxation and give new structural results.

Preliminaries
In the following we introduce $n$-folds formally and state the main results regarding them. Further, we introduce splitters, a technique known from Color Coding.
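◮ Definition 1. Let $n, r, s, t \in \mathbb{N}$. Furthermore let $A_1, \dots, A_n$ be $r \times t$ integer matrices and $B_1, \dots, B_n$ be $s \times t$ integer matrices. Then an $n$-fold $\mathcal A$ is of the following form:
$$\mathcal A = \begin{pmatrix} A_1 & A_2 & \cdots & A_n \\ B_1 & 0 & \cdots & 0 \\ 0 & B_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & B_n \end{pmatrix}.$$
The matrix $\mathcal A$ is of dimension $(r + n \cdot s) \times n \cdot t$. We will divide $\mathcal A$ into blocks of size $(r + n \cdot s) \times t$. Similarly, the variables of a solution $x$ are partitioned into bricks of length $t$. This means each brick $x^{(i)}$ corresponds to the columns of one submatrix $A_i$ and therefore also $B_i$. Given $c, \ell, u \in \mathbb{Z}^{n \cdot t}$ and $b \in \mathbb{Z}^{r + n \cdot s}$, the corresponding $n$-fold Integer Linear Programming problem is defined by:
$$\max\{c^Tx \mid \mathcal Ax = b,\ \ell \le x \le u,\ x \in \mathbb{Z}^{n \cdot t}\}.$$
The main idea of the state-of-the-art algorithms relies on some insight about the Graver basis of $n$-folds, whose elements are special elements of the kernel of $\mathcal A$. More formally, we introduce the following definitions:
◮ Definition 2. The kernel of a matrix $A$ is defined as the set of integral vectors $x$ with $Ax = 0$. We write $\operatorname{kern}(A)$ for it.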
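To make the block structure of Definition 1 concrete, the following sketch assembles an $n$-fold matrix from its blocks and splits a solution into bricks. It uses plain nested lists; the function names `nfold_matrix` and `bricks` are hypothetical and chosen only for this illustration.

```python
def nfold_matrix(A_blocks, B_blocks):
    """Assemble the (r + n*s) x (n*t) n-fold matrix from r x t blocks A_i and s x t blocks B_i."""
    n = len(A_blocks)
    assert len(B_blocks) == n
    r, t = len(A_blocks[0]), len(A_blocks[0][0])
    s = len(B_blocks[0])
    rows = []
    # First r rows: the blocks A_1, ..., A_n side by side.
    for row in range(r):
        rows.append([A_blocks[i][row][col] for i in range(n) for col in range(t)])
    # Below: B_i on the diagonal, zeros everywhere else.
    for i in range(n):
        for row in range(s):
            line = [0] * (n * t)
            line[i * t:(i + 1) * t] = B_blocks[i][row]
            rows.append(line)
    return rows


def bricks(x, t):
    """Split a solution vector of length n*t into its n bricks of length t."""
    return [x[i:i + t] for i in range(0, len(x), t)]


# Toy instance with n = 2, r = 1, s = 1, t = 2.
M = nfold_matrix([[[1, 2]], [[3, 4]]], [[[5, 6]], [[7, 8]]])
print(M)
```

Each constraint row below the first $r$ rows has its support inside a single brick, which is the property the later arguments rely on.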
◮ Definition 3. A Graver basis element $g$ is a minimal non-zero element of $\operatorname{kern}(A)$. An element is minimal if it is not the sum of two sign-compatible non-zero elements $u, v \in \operatorname{kern}(A)$.
Here, sign-compatible means that $u_i < 0$ if and only if $v_i < 0$ for every $i$.
◮ Theorem 4 ([4]). Let $A \in \mathbb{Z}^{n \times m}$ and let $x \in \operatorname{kern}(A)$. Then there exist $2n - 1$ Graver basis elements $g_1, \dots, g_{2n-1}$, which are sign-compatible with $x$, such that
$$x = \sum_{i=1}^{2n-1} \lambda_i g_i$$
for some $\lambda_1, \dots, \lambda_{2n-1} \in \mathbb{Z}_{\ge 0}$.
Many results for $n$-fold ILPs rely on the fact that the $\ell_1$-norm of Graver basis elements of $n$-fold matrices is small. The best bound known for the $\ell_1$-norm is due to [7].
◮ Theorem 5 ([7]). The $\ell_1$-norm of the Graver basis elements of an $n$-fold matrix $\mathcal A$ is bounded by $O(rs\Delta)^{rs}$.
Next, we will introduce a technique called splitters (see e.g. [17]), which has its origins in the FPT community and was used to derandomize the Color Coding technique [1]. So far it has not been used with n-fold ILPs. We refer the reader to the outline of techniques in the introduction for the idea on how we apply the splitters.
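◮ Definition 6. An $(n, k, \ell)$ splitter is a family of hash functions $H$ from $\{1, \dots, n\}$ to $\{1, \dots, \ell\}$ such that for every $S \subseteq \{1, \dots, n\}$ with $|S| = k$ there exists a function $h \in H$ that splits $S$ evenly, i.e., the numbers $|h^{-1}(j) \cap S|$ and $|h^{-1}(j') \cap S|$ differ by at most 1 for all $j, j' \le \ell$.
If $\ell \ge k$, the above means that there is some hash function that has no collisions when restricted to $S$. Interestingly, there exist splitters of very small size.
◮ Theorem 7 (see e.g. [17]). There exists an $(n, k, k^2)$ splitter of size $k^{O(1)} \log(n)$, which can be computed in time $k^{O(1)} \cdot n \log(n)$.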
We note that an alternative approach to the result above is to use FKS hashing. Although it has an extra factor of log(n), it is particularly easy to implement.
◮ Theorem 8 (Corollary 2 and Lemma 2 in [9]). Define for every prime $q < k^2 \log(n)$ and every prime $p < q$ the corresponding hash function from [9]. The family of these hash functions is an $(n, k, k^2)$ splitter of size $O(k^4 \log^2(n))$.
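For small parameters the splitter property with at least $k$ colors can be verified by brute force. The sketch below is only an illustration: the helper names are hypothetical, and the toy family $x \mapsto (x \bmod q) \bmod \ell$ is an assumption made for the example, not the hash function of Theorem 8.

```python
from itertools import combinations


def is_collision_free_family(family, n, k):
    """Check that every k-subset of {0, ..., n-1} is mapped injectively by some function in the family."""
    for subset in combinations(range(n), k):
        if not any(len({h(x) for x in subset}) == k for h in family):
            return False
    return True


def mod_hash(q, num_colors):
    """Toy hash function x -> (x mod q) mod num_colors (an assumption for illustration)."""
    return lambda x: (x % q) % num_colors


# n = 6 elements, k = 2, k^2 = 4 colors.
single = [mod_hash(7, 4)]
pair = [mod_hash(3, 4), mod_hash(7, 4)]
print(is_collision_free_family(single, 6, 2), is_collision_free_family(pair, 6, 2))
```

With $q = 7$ alone the pairs $\{0, 4\}$ and $\{1, 5\}$ collide, while adding $q = 3$ separates them. This is the role of having several colorings: no single one has to work for every set $S$.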

Efficient Computation of Improving Directions
The backbone of our algorithm is the efficient computation of augmenting steps. The important aspect is the fact that we can update the augmenting steps very efficiently if the input changes only slightly. In other words, whenever we change the current solution by applying an augmenting step, we do not have to recompute the next augmenting step from scratch. The augmenting steps depend on a partition of the bricks. In the following we define the notion of a best step based on a fixed partition. Later, we will independently find steps for a number of partitions and take the best among them.
◮ Definition 9. Let $P$ be a partition of the $n$ bricks into $k^2$ disjoint sets $P_1, P_2, \dots, P_{k^2}$. Let $u \in \mathbb{Z}^{nt}_{\ge 0}$ and $\ell \in \mathbb{Z}^{nt}_{\le 0}$ be some upper and lower bounds on the variables (not necessarily the same as in the $n$-fold). A $(P, k)$-best step is an optimal solution of
$$\begin{aligned} \max\ & c^Tx \\ & \mathcal Ax = 0 \\ & \ell \le x \le u \\ & \|x^{(i)}\|_1 \le k && \text{for every brick } i \\ & x^{(i)} \ne 0 \text{ for at most one } i \in P_j && \text{for every } P_j \in P \\ & x \in \mathbb{Z}^{nt}. \end{aligned}$$
This means a $(P, k)$-best step is an element of $\operatorname{kern}(\mathcal A)$ which uses only one brick of every $P_j \in P$. Within that brick the norm of the solution must be at most $k$.
◮ Theorem 10. A $(P, k)$-best step can be computed in time $k^{O(r)} \cdot \Delta^{O(r^2+s^2)} \cdot nt$. If the bounds of the variables in a single brick change, it can be recomputed in time $k^{O(r)} \cdot \Delta^{O(r^2+s^2)} \cdot \log(nt)$.
Proof. Let $P$ be a partition of the bricks of the matrix $\mathcal A$ into $k^2$ disjoint sets $P_1, P_2, \dots, P_{k^2}$. Solving the $(P, k)$-best step problem requires that from each set $P_j \in P$ we choose at most one brick and set this brick's variables. All variables in other bricks of $P_j$ must be 0. Let $x$ be a $(P, k)$-best step and, within this proof, let $x^{(j)}$ have the values of $x$ in the variables of $P_j$ and 0 in all other variables. Then by definition, $\|x^{(j)}\|_1 \le k$. This implies that the right-hand side regarding $x^{(j)}$, that is to say, $\mathcal Ax^{(j)}$, is also small. Since the absolute value of an entry in $\mathcal A$ is at most $\Delta$, we have that $\|\mathcal Ax^{(j)}\|_\infty \le k\Delta$. Let $a_i$ be the $i$-th row of $\mathcal A$. If $i > r$, then $a_i x^{(j)} = 0$. This is because $\mathcal Ax = 0$ and $a_i$ has its support either completely inside $P_j$ or completely outside $P_j$. Hence, the value of $\mathcal Ax^{(j)}$ is one of the $(2k\Delta + 1)^r$ many values we get by enumerating all possibilities for the first $r$ rows. Furthermore, since $P$ has only $k^2$ sets, the partial sum $\mathcal A(x^{(1)} + \cdots + x^{(j)})$ is always one of $(2k^3\Delta + 1)^r = (k\Delta)^{O(r)}$ many candidates. Hence, to find a $(P, k)$-best step we can restrict our search to solutions whose partial sums stay in this range. To do so, we set up a graph containing $k^2 + 2$ layers $L_0, L_1, \dots, L_{k^2}, L_{k^2+1}$. An example is given in Figure 1. The first layer $L_0$ consists of just one node marking the starting point with partial sum zero. Similarly, the last layer $L_{k^2+1}$ contains just the target point, also having partial sum zero, since a $(P, k)$-best step is an element of $\operatorname{kern}(\mathcal A)$.
Each layer $L_j$ with $1 \le j \le k^2$ contains $(2k^3\Delta + 1)^r$ many nodes, each representing one possible value of $\mathcal A(x^{(1)} + \cdots + x^{(j)})$. Two points $v, w$ from adjacent layers $L_{j-1}, L_j$ are connected if the difference of the corresponding partial sums, namely $w - v$, can be obtained by a solution $y$ of variables from only one brick of $P_j$ (with $\|y\|_1 \le k$). The weight of the edge is the largest gain for the objective function $c^Ty$ over all possible bricks. Hence, it could be necessary to compute and compare up to $n$ values for each $P_j$ and each difference in the partial sums to insert one edge into the graph. Finally, we just have to find the longest path in this graph, as it corresponds to a $(P, k)$-best step. The out-degree of each node is bounded by $(2k^3\Delta + 1)^r$, since at most this many nodes are reachable in the next layer. Therefore the overall number of edges is bounded by
$$(k^2 + 1) \cdot (2k^3\Delta + 1)^{2r} = (k\Delta)^{O(r)}.$$
Using the Bellman-Ford algorithm we can solve the Longest Path problem for a graph with $N$ vertices and $M$ edges in time $N \cdot M$, as the graph does not contain any cycles. This gives a running time of $(k\Delta)^{O(r)} \cdot (k\Delta)^{O(r)} = (k\Delta)^{O(r)}$ for solving the problem. Constructing the graph, however, requires solving a number of IPs of the form
$$\max\{c'^Tx \mid A_i x = b',\ B_i x = 0,\ \|x\|_1 \le k,\ \ell' \le x \le u',\ x \in \mathbb{Z}^t\},$$
where $b' \in \mathbb{Z}^r$ is the corresponding right-hand side of the top rows and $\ell', u', c'$ are the lower and upper bounds and the objective of the block. This is an IP with $r + s$ constraints, $t$ variables, lower and upper bounds, and entries of the matrix bounded by $\Delta$ in absolute value. These IPs can be solved with the algorithm by Eisenbrand and Weismantel [8]. In fact, a little thought allows us to reduce the dependency of the running time on $t$ to a logarithmic one: Since the number of constraints in the ILP above is very small, there are only $\Delta^{O(r+s)}$ many different columns. Because of the cardinality constraint $\|x\|_1 \le k$, we only have to consider $2k$ variables for each type of column.

Figure 1
This figure shows an example of a layered graph obtained while solving the $(P, k)$-best step problem. There are $k^2 + 2$ layers, visually separated by gray dashed lines. This includes one source layer $L_0$ and one target layer $L_{k^2+1}$, both with just a single node representing the zero sum. Further there are $k^2$ layers with $(2k^3\Delta + 1)^r$ nodes each, where in one layer the nodes stand for all reachable partial sums. Two points $v, w$ from adjacent layers $L_{j-1}, L_j$ are connected if the difference of the corresponding partial sums, namely $w - v$, can be obtained by a solution $y$ of variables from only one brick of $P_j$ (with $\|y\|_1 \le k$). The weight of the edge is the largest gain for the objective function $c^Ty$ over all possible bricks. For the sake of clarity, the values of the nodes and the edges are not illustrated.
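The longest path in such a layered graph can be found by processing the layers from left to right. The sketch below is an illustration under assumptions: the text uses the Bellman-Ford algorithm, whereas this version is a simple layer-by-layer dynamic program, and the representation of the graph as a list of dictionaries `{(v, w): weight}` as well as the function name are hypothetical.

```python
def longest_layered_path(edge_layers, source, target):
    """Longest path from source to target in a layered acyclic graph.

    edge_layers[j] maps a pair (v, w), with v in layer j and w in layer j + 1,
    to the weight of the edge. Returns (value, path) or None if target is unreachable.
    """
    best = {source: (0, [source])}
    for edges in edge_layers:
        reached = {}
        for (v, w), weight in edges.items():
            if v not in best:
                continue  # v cannot be reached from the source
            value = best[v][0] + weight
            if w not in reached or value > reached[w][0]:
                reached[w] = (value, best[v][1] + [w])
        best = reached
    return best.get(target)


# Toy graph with r = 1: node labels are partial sums of the single top row.
layers = [
    {(0, 1): 3, (0, -1): 1},   # first color class contributes +1 or -1
    {(1, 0): -1, (-1, 0): 2},  # second color class returns to partial sum 0
]
print(longest_layered_path(layers, 0, 0))
```

Every edge is inspected once, so the running time is linear in the number of edges, which is within the $(k\Delta)^{O(r)}$ bound.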
The variables to consider for each type of column are the $k$ many with $u'_i > 0$ and maximal $c'_i$ and the $k$ many with $\ell'_i < 0$ and minimal $c'_i$. If some solution uses a variable not in this set, then by the pigeonhole principle there is a variable in the set with the same column values and an objective value that is at least as good, and which can still be increased/decreased. We can reduce the variable outside this set and increase the corresponding variable inside this set until all variables outside the set are 0. We can use an appropriate data structure (e.g. AVL trees) to maintain a set of all variables with $u'_i > 0$ ($\ell'_i < 0$) such that we can find the $k$ best among them in time $O(k \log(t))$. Whenever the bounds of some variable change, we might have to add or remove entries, which also takes only logarithmic time. After initialization in time $O(nt)$ (in total for all bricks), the running time for solving such an IP therefore depends only logarithmically on $t$. The number of IPs to solve is at most $n$ times the number of edges, since we have to compare the values of up to $n$ bricks. Constructing the graph therefore requires solving at most $n \cdot (k\Delta)^{O(r)}$ such IPs.
To obtain the update time from the premise of the theorem, it is perfectly fine to solve the Longest Path problem again, but we cannot construct the graph from scratch. However, in order to construct the graph we still have to find the best value over all bricks for each edge. Fortunately, if only a few bricks are updated (in their lower and upper bounds), it is not necessary to recompute all values. Each edge corresponds to a particular $P_j \in P$ and a fixed right-hand side (a possible value of $\mathcal Ax^{(j)}$). We require an appropriate data structure $D_e$ for every edge $e$, which supports fast computation of the operations FindMax, Insert, and Delete. Again, an AVL tree computes each of these operations in time $O(\log(N))$, where $N$ is the number of elements. In $D_e$ we store pairs $(v, i)$ where $i$ is a brick in $P_j$ and $v$ is the maximum gain of brick $i$ for the right-hand side of $e$. The pairs are stored in lexicographical order.
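The per-edge data structure $D_e$ only needs FindMax, Insert, and Delete. The text suggests an AVL tree; the sketch below swaps in a binary heap with lazy deletion from the Python standard library, which offers the same interface with amortized logarithmic time. The class name `EdgeStore` and its method names are hypothetical.

```python
import heapq


class EdgeStore:
    """Stores pairs (gain, brick) for one edge and reports the pair of maximum gain."""

    def __init__(self):
        self._heap = []     # entries (-gain, brick); may contain outdated entries
        self._current = {}  # brick -> gain that is currently valid

    def insert(self, gain, brick):
        """Insert (gain, brick); replaces any earlier gain of the same brick."""
        self._current[brick] = gain
        heapq.heappush(self._heap, (-gain, brick))

    def delete(self, brick):
        """Delete the entry of a brick; its heap entries are discarded lazily."""
        self._current.pop(brick, None)

    def find_max(self):
        """Return the valid pair (gain, brick) of maximum gain, or None if empty."""
        while self._heap:
            neg_gain, brick = self._heap[0]
            if self._current.get(brick) == -neg_gain:
                return (-neg_gain, brick)
            heapq.heappop(self._heap)  # outdated entry
        return None


# A brick changes its gain: remove the old pair, insert the new one.
store = EdgeStore()
for brick, gain in enumerate([4, 7, 5]):
    store.insert(gain, brick)
store.delete(1)
store.insert(2, 1)
print(store.find_max())
```

Ties between equal gains are broken by the brick index here, which differs slightly from the lexicographical order mentioned above but does not affect the maximum gain.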
Since there are at most $n$ bricks in $P_j$, the data structure will have at most $n$ elements. Initially, we can build $D_e$ in time $nt \cdot \Delta^{O(r^2+s^2)}$ (this is replicated for each edge). Now consider a change to the instance. Recall that we are looking at changes that affect only a single brick, namely the upper and lower bounds within that brick change. We are going to update the data structure $D_e$ (for each edge) to reflect the changes and we are going to recompute the edge value of each edge $e$ using $D_e$. Then we simply solve the Longest Path problem again. Let $P_j \in P$ be the set that contains the brick $i$ that has changed in some variable. We only have to consider edges from $L_{j-1}$ to $L_j$, since none of the other edges are affected by the change. For a relevant edge $e$ we compute the previous value $v$ and the current value $v'$ that the brick $i$ would produce (before and after the bounds have changed). In $D_e$ we have to remove $(v, i)$ and insert $(v', i)$. Both operations need only $O(\log(n))$ time. The running time to update $D_e$ for one edge therefore consists of solving two such IPs and $O(\log(n))$ for the two tree operations. In order to update the edge value of $e$ using $D_e$, we simply have to find the maximum element in $D_e$, which again takes time $O(\log(n))$. To summarize, the total time to update the $(P, k)$-best step after a change to a single brick consists of (1) updating each $D_e$, (2) finding the maximum in each $D_e$, and (3) solving the Longest Path problem. We conclude that the update time is $k^{O(r)} \cdot \Delta^{O(r^2+s^2)} \cdot \log(nt)$.

The Augmenting Step Algorithm
In this section we will assume that all lower and upper bounds are finite and give a complete algorithm for this case. Later, we will explain how to cope with infinite bounds. We start by showing how to converge to an optimal solution when an initial feasible solution is given.
To compute the initial solution, we also apply this algorithm on a slightly modified instance. The approach resembles the procedure in previous literature, although we apply the results from the previous section to speed up the computation of augmenting steps. Let $x$ be a feasible solution for the $n$-fold, in particular $\mathcal Ax = b$. Let $x^*$ be an optimal one. Theorem 4 states that we can decompose the difference vector $x' = x^* - x$ into at most $2nt$ weighted Graver basis elements, that is
$$x^* - x = \sum_{i=1}^{2nt} \lambda_i g_i \quad \text{with } \lambda_i \in \mathbb{Z}_{\ge 0}.$$
For intuition, consider the following simple approach (this is similar to the algorithm in [10]). Suppose we are able to guess the best vector $\lambda_i g_i = \operatorname{argmax}_j \{c^T(\lambda_j g_j)\}$ regarding the gain for the objective function. This pair of step length $\lambda_i$ and Graver element $g_i$ is called the Graver best step. Then we can augment the current solution $x$ by adding $\lambda_i g_i$ to it, i.e., we set $x \leftarrow x + \lambda_i g_i$. Feasibility follows because all $g_j$ are sign-compatible. This procedure is repeated until no improving step is possible and therefore $x$ must be optimal. In each iteration this decreases the gap to the optimal solution by a factor of at least $1 - 1/(2nt)$ by the pigeonhole principle. It may be costly to guess the precise Graver best step, but for our purposes it will suffice to find an augmenting step that is approximately as good.
We will now describe how to guess $\lambda_i$. Since $x + \lambda_i g_i$ is feasible, we have that $\ell \le x + \lambda_i g_i \le u$. As $g_i$ is a non-zero integral vector, it suffices to check all values in the range $\{1, \dots, \Gamma\}$, where $\Gamma = \max_j \{u_j - \ell_j\}$. Proceeding like in [7], we lower the time a bit further by not taking every value into consideration. Instead, we look at guesses of the form $\lambda' = 2^e$ for $e \in \{0, \dots, \lfloor \log(\Gamma) \rfloor\}$. Doing so we lose a factor of at most 2 regarding the improvement of the objective function, since $c^T(\lambda' g_i) > 0.5 \cdot c^T(\lambda_i g_i)$ when taking $\lambda' = 2^{\lfloor \log(\lambda_i) \rfloor} > \lambda_i / 2$. Fix $\lambda'$ to the value above. Next we describe how to compute an augmenting step that is at least as good as $\lambda' g_i$. Note that $g_i$ is a solution of
$$\mathcal Ay = 0,\quad \ell - x \le \lambda' y \le u - x,\quad \|y\|_1 \le k,\quad y \in \mathbb{Z}^{nt},$$
where $k = O(rs\Delta)^{rs}$ is the bound on the norm of Graver elements from Theorem 5. Suppose we have guessed some partition $P = \{P_1, \dots, P_{k^2}\}$ of the bricks such that of each $P_j$ only a single brick has non-zero variables in $g_i$. Clearly, the augmenting step $\lambda' y^*$, where $y^*$ is a $(P, k)$-best step with lower bounds $\lceil (\ell - x)/\lambda' \rceil$ and upper bounds $\lfloor (u - x)/\lambda' \rfloor$, would be at least as good as $\lambda' g_i$. Indeed, Theorem 10 explains how to compute such a $(P, k)$-best step dynamically, and when we add $\lambda' y^*$ to $x$ we only change the bounds of at most $k^3$ many variables. Hence, it is very efficient to recompute $(P, k)$-best steps until we have converged to the optimal solution. However, valid choices of $\lambda'$ and $P$ might be different in every iteration. Regarding $\lambda'$, we simply compute $(P, k)$-best steps for each of the $O(\log(\Gamma))$ many guesses and take the best among them. We proceed similarly for $P$. We guess a small number of partitions and guarantee that always at least one of them is valid. For this purpose we employ splitters. More precisely, we compute an $(n, k, k^2)$ splitter of the $n$ bricks. Since $g_i$ has a norm bounded by $k$, it can use at most $k$ bricks. Therefore the splitter always contains a partition $P = \{P_1, \dots, P_{k^2}\}$ where $g_i$ only uses a single brick in every $P_j$.
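The step-length guesses and the rescaled bounds handed to the $(P, k)$-best step problem can be sketched as follows. This is a minimal illustration with hypothetical function names; it uses exact integer arithmetic for the rounding $\lceil (\ell - x)/\lambda' \rceil$ and $\lfloor (u - x)/\lambda' \rfloor$.

```python
def step_length_guesses(gamma):
    """Powers of two 2^0, ..., 2^floor(log2(gamma)) for an integer gamma >= 1."""
    return [1 << e for e in range(gamma.bit_length())]


def scaled_bounds(lower, upper, x, lam):
    """Bounds for a direction y such that x + lam * y stays within [lower, upper]."""
    # ceil((l - x) / lam) computed as -floor((x - l) / lam) with integers only
    lo = [-((xi - li) // lam) for li, xi in zip(lower, x)]
    # floor((u - x) / lam)
    hi = [(ui - xi) // lam for ui, xi in zip(upper, x)]
    return lo, hi


# Toy instance: Gamma = max_j (u_j - l_j) = 10.
lower, upper, x = [0, -3], [10, 5], [3, 1]
gamma = max(u - l for l, u in zip(lower, upper))
for lam in step_length_guesses(gamma):
    print(lam, scaled_bounds(lower, upper, x, lam))
```

Because $x$ is feasible, the rescaled lower bounds are non-positive and the rescaled upper bounds are non-negative, as required in Definition 9.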
To recap, in every iteration we solve a $(P, k)$-best step problem for every guess $\lambda'$ and every partition $P$ in the splitter and take the overall best solution as an improving direction $\lambda' y^*$. Then we update our solution $x$ by adding $\lambda' y^*$ to it. At most $k^2$ many bricks change (and within each brick only $k$ variables can change) and therefore we can efficiently recompute the $(P, k)$-best steps for every guess for the next iteration. This way we guarantee that we reduce the gap to the optimal solution by a factor of at least $1 - 1/(4nt)$ in every iteration. The explicit running time of these steps will be analyzed in the next theorem.

Initial Solution
Recall that we still have to find an initial solution. This solution indeed can be computed by using the augmenting step algorithm described above. We construct a new n-fold ILP which has a trivial feasible solution and whose optimal solution corresponds to a feasible solution of the original problem.
First we extend our $n$-fold $\mathcal A$ by adding $(r + s)n$ new columns as follows: After the first block $(A_1, B_1, 0, \dots, 0)^T$ we add $r + s$ columns. The first $r$ ones contain an $r \times r$ identity matrix, which we call $I_r$. This matrix $I_r$ has all ones on the diagonal. All other entries are zero. The next $s$ columns contain an $s \times s$ identity matrix $I_s$. This submatrix starts at row $r + 1$. Again all other entries are zero in these columns. After the next block we again introduce $r + s$ new columns, the first $r$ ones containing just zeros, the next $s$ containing an $I_s$ matrix at the height of $B_2$. We repeat this procedure of adding $r + s$ columns after each block, the first $r$ having solely zero entries and the next $s$ containing $I_s$ at the height of $B_i$, until our resulting matrix $\mathcal A_{\text{init}}$ for finding the initial solution looks like the following:
$$\mathcal A_{\text{init}} = \begin{pmatrix} A_1 & I_r & 0 & A_2 & 0 & 0 & \cdots & A_n & 0 & 0 \\ B_1 & 0 & I_s & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 0 & 0 & B_2 & 0 & I_s & \cdots & 0 & 0 & 0 \\ \vdots & & & & & & \ddots & & & \vdots \\ 0 & 0 & 0 & 0 & 0 & 0 & \cdots & B_n & 0 & I_s \end{pmatrix}.$$
Due to our careful extension, $\mathcal A_{\text{init}}$ again has $n$-fold structure with blocks of width $t + r + s$. Note that zero entries inside of a block do no harm, as solely the zeros outside of the blocks are necessary for an $n$-fold structure. At first glance, it seems that for the right-hand side $b$ we now have a trivial solution consisting only of the new columns. Keep in mind, however, that the old variables have upper and lower bounds and that 0 might be outside these bounds. In order to handle this case we subtract $\ell$, the lower bound, from all upper and lower bounds and set the right-hand side to $b' = b - \mathcal A\ell$.
We get an equivalent $n$-fold where every solution is shifted by $\ell$. Now we can find a feasible solution (for $b'$) using solely the new variables: all old variables are set to 0, and each new variable belonging to a column of $I_r$ or $I_s$ is set to the entry of the remaining right-hand side $b'$ in the row where this column has its single 1. Next we introduce an objective function $c'$ that penalizes using the new columns by having non-zero entries $c'_i$ at the positions of the new variables and zero entries at the positions of the old variables. The values $c'_i$ and the lower and upper bounds for the new variables depend on the sign of the right-hand side.
If $b'_i \ge 0$, then we set $c'_i = -1$, the lower bound to 0, and the upper bound to $b'_i$. This way the variable can only be non-negative. If $b'_i < 0$, we set $c'_i = 1$, the lower bound to $b'_i$, and the upper bound to 0. Hence this variable must be non-positive. Clearly a solution has a value of 0 if and only if none of the new columns are used, and no solution of better value is possible. Hence, if we use our augmenting step algorithm and solve this problem optimally, we either find a solution with value 0 or one with a negative value. In the former case, we indeed have not taken any of the new columns into our solution, therefore we can delete the new columns and obtain a solution for the original problem (after adding $\ell$ to it). Otherwise, there is no feasible solution for the original problem, as we solved the problem optimally regarding the objective function.
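The sign-dependent choice of objective and bounds for the new variables can be written down directly. The sketch below is an illustration with a hypothetical function name; it assumes one new variable per constraint row (those with a column of $I_r$ or $I_s$) and takes the shifted right-hand side $b'$ as a list of integers.

```python
def auxiliary_variables(b_shifted):
    """Objective, bounds, and trivial starting values for the new variables.

    b_shifted is the right-hand side b' = b - A*l, one entry per constraint row.
    """
    costs, lower, upper, start = [], [], [], []
    for value in b_shifted:
        if value >= 0:
            # variable may only be non-negative and is penalized with -1
            costs.append(-1)
            lower.append(0)
            upper.append(value)
        else:
            # variable may only be non-positive and is penalized with +1
            costs.append(1)
            lower.append(value)
            upper.append(0)
        start.append(value)  # trivial feasible solution: cover b' with new columns only
    return costs, lower, upper, start


costs, lower, upper, start = auxiliary_variables([3, -2, 0])
objective = sum(c * v for c, v in zip(costs, start))
print(costs, lower, upper, start, objective)
```

The starting solution has objective value $-\|b'\|_1$, and the value 0 is reached exactly when all new variables are 0.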
◮ Theorem 11. The dynamic augmenting step algorithm described above computes an optimal solution for the $n$-fold Integer Linear Program problem in time $(rs\Delta)^{O(r^2s+s^2)} \cdot O(L^2 \cdot nt \log^4(nt))$ when finite variable bounds are given for each variable. Here $L$ is the encoding length of the largest occurring number in the input.
Proof. Due to Theorem 4 we know that the difference vector of an optimal solution $x^*$ to our current solution $x$, i.e. $x' = x^* - x$, can be decomposed into $2nt$ weighted Graver basis elements. Hence, if we adjust our solution $x$ with the Graver best step, we reduce the gap between the value of an optimal solution and our current solution by a factor of at least $1 - 1/(2nt)$ due to the pigeonhole principle. Our algorithm finds an augmenting step that is at least half as good as the Graver best step. Therefore, the gap to the optimal solution is still reduced by at least a factor of $1 - 1/(4nt)$.
Regarding the running time, we first have to compute the splitter. Theorem 7 says that this can be done in time $k^{O(1)} \cdot n \log(n) = (rs\Delta)^{O(rs)} \cdot n \log(n)$. Next we have to try all values for the weight $\lambda$. Due to our choice of step lengths we get $O(\log(\Gamma))$ guesses. Recall that $\Gamma$ denotes the largest difference between an upper bound and the corresponding lower bound, i.e., $\Gamma = \max_j \{u_j - \ell_j\}$. Fixing one guess, we have to find the best improving direction regarding each of the $((rs\Delta)^{O(rs)})^{O(1)} \log(n) = (rs\Delta)^{O(rs)} \log(n)$ partitions. In the first iteration we have to set up the tables in time $k^{O(r)} \cdot \Delta^{O(r^2+s^2)} \cdot nt = (rs\Delta)^{O(r^2s)} \cdot \Delta^{O(r^2+s^2)} \cdot nt$ by computing the gain for each possible summand for each set and setting up the data structure. In each following iteration we update each table and search for the optimum in time $k^{O(r)} \cdot \Delta^{O(r^2+s^2)} \cdot \log(nt) = (rs\Delta)^{O(r^2s)} \cdot \Delta^{O(r^2+s^2)} \cdot \log(nt)$. Now it remains to bound the number $I$ of iterations needed to converge to an optimal solution. Since all objective values are integral, we have reached an optimal solution as soon as the gap is smaller than 1. To obtain such a bound we calculate:
$$\left(1 - \frac{1}{4nt}\right)^I \cdot |c^T(x^* - x)| < 1.$$
By reordering the term, we get
$$I > \frac{\log(1/|c^T(x^* - x)|)}{\log(1 - 1/(4nt))}.$$
As $\log(1 + x) = \Theta(x)$ for small $|x|$, we can bound $\log(1 - 1/(4nt))$ by $\Theta(-1/(4nt))$ and thus
$$I = O(nt \log(|c^T(x^* - x)|)).$$
As the maximal difference between the current solution $x$ and an optimal one $x^*$ can be at most the maximal value of $c$ times the largest number in between the bounds for each variable, we get $|c^T(x^* - x)| \le nt \max_i |c_i| \cdot \Gamma$ and thus
$$I = O(nt \log(nt \Gamma \max_i |c_i|)).$$
Let $L$ denote the encoding length of the largest integer in the input. Clearly $2^L$ bounds the largest absolute value in $c$ and thus we get
$$I = O(nt \log(nt \Gamma 2^L)).$$
Hence after this number of steps, by always reducing the gap by a factor of at least $1 - 1/(4nt)$, we close the gap between the initial solution and an optimal one. Given this and $\Gamma \le 2^{L+1}$, we can now bound the overall running time with:
$$\begin{aligned} & \text{Splitter} + \text{Partitions} \cdot \lambda\ \text{Guesses} \cdot (\text{First Iteration} + I \cdot \text{Update Time}) \\ ={} & (rs\Delta)^{O(rs)} \cdot n \log(n) + (rs\Delta)^{O(rs)} \log(n) \cdot O(\log(\Gamma)) \cdot (rs\Delta)^{O(r^2s)} \Delta^{O(r^2+s^2)} \cdot \big(nt + O(nt \log(nt \Gamma 2^L)) \cdot \log(nt)\big) \\ ={} & (rs\Delta)^{O(r^2s+s^2)} \cdot O(L^2 \cdot nt \log^4(nt)). \end{aligned}$$
Here Splitter denotes the time to compute the initial set $\mathcal P$ of partitions and Partitions denotes the cardinality of $\mathcal P$. First Iteration is the time to solve the first iteration of the $(P, k)$-best step problem. Further, $\lambda$ Guesses is the number of guesses we have to make to get the right weight, and lastly Update Time is the time needed to solve each following $(P, k)$-best step problem including updating the bounds and data structures.
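The iteration bound can be illustrated numerically. The following sketch, with hypothetical function names, simulates a gap that shrinks by exactly the guaranteed factor $1 - 1/(4nt)$ per iteration and compares the number of iterations until the gap drops below 1 with the closed form $4nt \ln(\text{gap}) + 1$, which follows from $-\ln(1 - \varepsilon) \ge \varepsilon$.

```python
import math


def iterations_until_gap_closed(nt, gap):
    """Number of iterations until gap * (1 - 1/(4nt))^I < 1 in the worst case."""
    shrink = 1.0 - 1.0 / (4 * nt)
    remaining = float(gap)
    count = 0
    while remaining >= 1.0:
        remaining *= shrink
        count += 1
    return count


def iteration_bound(nt, gap):
    """Closed-form upper bound 4nt * ln(gap) + 1 on the number of iterations."""
    return math.ceil(4 * nt * math.log(gap)) + 1


# Toy numbers: nt = 10 variables and an initial gap of 1000.
print(iterations_until_gap_closed(10, 1000), iteration_bound(10, 1000))
```

The simulated count stays below the closed form, in line with the bound $I = O(nt \log(|c^T(x^* - x)|))$; an actual run may converge faster, since the factor is only a worst-case guarantee.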
Note that we still have to argue about finding the initial solution, since in the construction the parameters of the $n$-fold slightly change. The length of a brick changes to $t' = t + r + s$. This, however, can be hidden in the $O$-notation of $(rs\Delta)^{O(r^2s+s^2)}$. Further, $\Gamma'$, the biggest difference in upper and lower bounds, can be bounded by a function in $\Gamma$, $\Delta$, $L$, $t$, and $n$. Recall that the difference between the bounds of old variables does not change. For the new variables, however, the difference can be as large as $\|b'\|_\infty$. Thus we bound this value by
$$\|b'\|_\infty = \|b - \mathcal A\ell\|_\infty \le \|b\|_\infty + nt \cdot \Delta \cdot \|\ell\|_\infty \le (1 + nt\Delta) \cdot 2^L.$$
We conclude that the running time for finding an initial solution (and also the overall running time) is $(rs\Delta)^{O(r^2s+s^2)} \cdot O(L^2 \cdot nt \log^4(nt))$.

Handling Infinite Bounds
Note that if finite bounds are not given for all variables, we have to introduce some artificial bounds first. Here we can proceed as in [7], where first the LP relaxation is solved to obtain an optimal fractional solution $z^*$. Using the proximity results from [7], we know that an optimal integral solution $x^*$ exists such that $\|x^* - z^*\|_1 \le nt(rs\Delta)^{O(rs)}$. This allows us to introduce artificial upper bounds for the unbounded variables. Note that this comes at the price of solving the corresponding relaxation of the $n$-fold Integer Linear Program problem. However, we also lessen the dependency from $L^2$ to $L$, as the finite upper and lower bounds can also be bounded more strictly due to the same proximity result. This yields an overall running time of $(rs\Delta)^{O(r^2s+s^2)} \cdot L \cdot nt \log^4(nt) + \mathrm{LP}$. Nevertheless, solving this LP can be very costly; indeed, it is not clear whether a potential algorithm even runs in time linear in $n$. Thus, it may even dominate the running time of solving the $n$-fold ILP with finite upper bounds. Fortunately, we can circumvent the necessity of solving the LP, as we will describe in the following section using new structural results.
◮ Theorem 12. The dynamic augmenting step algorithm described above computes an optimal solution for the n-fold Integer Linear Program problem in time $(rs\Delta)^{O(r^2s + s^2)} \cdot L \cdot nt \log^4(nt) + \mathrm{LP}$ when some variables have infinite upper bounds. Here LP is the running time to solve the corresponding relaxation of the n-fold ILP problem.

Bounds on $\ell_1$-norm
In the following, we prove that even with infinite variable bounds in an n-fold there always exists a solution of small norm (if the n-fold has a finite optimum). Therefore, we can apply the algorithm for finite variable bounds by replacing every infinite one with this value.
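As a minimal sketch of this replacement step, the snippet below substitutes every infinite bound by $\pm M$ for a given finite value $M$. The helper `clamp_infinite_bounds` and the concrete value of `M` are assumptions for illustration; in the text, $M$ is the norm bound established by the following lemmas.

```python
import math

def clamp_infinite_bounds(lower, upper, M):
    """Replace infinite bounds by -M / +M (hypothetical helper).

    Finite bounds are left untouched; M is assumed to be a valid bound
    on the entries of some optimal solution.
    """
    new_lower = [(-M if lo == -math.inf else lo) for lo in lower]
    new_upper = [(M if up == math.inf else up) for up in upper]
    return new_lower, new_upper

# Illustrative bounds for three variables; M = 1000 is a made-up value.
lower = [0, -math.inf, 2]
upper = [5, math.inf, math.inf]
print(clamp_infinite_bounds(lower, upper, M=1000))
```

After this step every variable has finite bounds, so an algorithm that requires finite bounds can be applied unchanged.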
◮ Lemma 13. If the n-fold is feasible and $y$ is some (possibly infeasible) solution satisfying the variable bounds, then there exists a feasible solution $x$ with
Proof. We take the same construction as in the algorithm for finding a feasible solution in Section 4. Admittedly, this construction was not set up for infinite bounds, but we consider the straightforward adaptation where infinite bounds simply stay the same. The useful property is that an optimal solution for this n-fold is a feasible solution for the original n-fold. Recall that the construction has a right-hand side $b'$ with $\|b'\|_1 \leq \|Ay\|_1 + \|b\|_1$, the value of $t$ becomes $t' = t + r + s$, and the objective function $c'$ consists only of the values $\{-1, 0, 1\}$. Moreover, there is a feasible solution $y$ with $\|y\|_1 = \|b'\|_1$. Let $x^*$ be an optimal solution for this altered n-fold that minimizes $\|x^* - y\|_1$. We consider the decomposition into Graver elements $\sum_i \lambda_i g_i = x^* - y$. Then $c'^T g_i > 0$ or $\lambda_i = 0$ for all $i$, since otherwise $x^* - g_i$ would be a better solution than $x^*$. It follows that $c'^T g_i \geq 1$ by discreteness of $c'$. Also, by Theorem 5, $\|g_i\|_1 \leq O(rs\Delta)^{rs}$. Recall that by construction $c'^T x^* = 0$ and $c'^T y$
Here we use that $\|b'\|_1 \leq \|b\|_1 + \|Ay\|_1 \leq \|b\|_1 + (r + s)\Delta \cdot \|y\|_1$.
Proof. Clearly there exists a (possibly infeasible) solution $y$ satisfying the bounds with $\|y\|_1 \leq nt\zeta$. By the previous lemma we know that there is a feasible solution $y$ with $\|y\|_1 \leq (rs\Delta)^{O(rs)} \cdot (\|b\|_1 + nt\zeta)$. Let $x^*$ be an optimal solution of minimal norm. W.l.o.g. assume that $x^* - y$ has only non-negative entries. If there is a negative entry, consider the equivalent n-fold problem with the corresponding column negated and its bounds negated and swapped. We know that there is a decomposition of $x^* - y$ into weighted Graver basis elements $\sum_i \lambda_i g_i = x^* - y$. Since every $g_i$ is sign-compatible with $x^* - y$, we have that all $g_i$ are non-negative as well.
Furthermore, it holds that $c^T g_i > 0$ or $\lambda_i = 0$ for every $g_i$, since otherwise $x^* - g_i$ would be a solution of smaller norm with an objective value that is not worse. Now suppose toward contradiction that there is some $g_i$ where all variables in $\mathrm{supp}(g_i)$ have infinite upper bounds. Then the n-fold is clearly unbounded, since $y + \alpha \cdot g_i$ is feasible for every $\alpha > 0$ and in this way we can scale the objective value beyond any bound. Thus, every Graver basis element adds at least the value 1 to some finitely bounded variable. This implies that $\sum_i \lambda_i \leq \|y\|_1 + nt\zeta$: If not, then by the pigeonhole principle there is some finitely bounded variable $x^*_j$ with $x^*_j = y_j + (\sum_i \lambda_i g_i)_j > y_j + \zeta + |y_j| \geq \zeta$.
Since $x^*$ is feasible, this cannot be the case. We conclude,
This yields an alternative to solving the LP relaxation, because now we can simply replace all infinite bounds with $\pm(rs\Delta)^{O(rs)} \cdot nt \cdot 2^L$. Then we can apply the algorithm that works only on finite variable bounds. The new encoding length $L'$ of the largest integer in the input can be bounded by $L' \leq \log((rs\Delta)^{O(rs)} \cdot 2^L \cdot nt) \leq O(rs \cdot \log(rs\Delta) \cdot L \cdot \log(nt))$.
This way we obtain the following.
◮ Corollary 15. We can compute an optimal solution for an n-fold in time $(rs\Delta)^{O(r^2s + s^2)} L^2 \cdot nt \log^6(nt)$.
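The effect of the artificial bounds on the encoding length can be checked numerically. The sketch below assumes a hypothetical constant `c` in place of the hidden constant in $(rs\Delta)^{O(rs)}$ and computes the bit length of $(rs\Delta)^{c \cdot rs} \cdot nt \cdot 2^L$. It illustrates that the new encoding length grows only additively in $rs \log(rs\Delta)$, $L$ and $\log(nt)$; all names and sample values are assumptions for illustration.

```python
def artificial_bound(r, s, delta, n, t, L, c=1):
    """(r*s*delta)**(c*r*s) * n*t * 2**L, with c a hypothetical constant."""
    return (r * s * delta) ** (c * r * s) * n * t * 2 ** L

def encoding_length(value):
    """Number of bits needed to write a non-negative integer."""
    return value.bit_length()

r, s, delta, n, t, L = 2, 3, 4, 1000, 5, 32  # illustrative values only
M = artificial_bound(r, s, delta, n, t, L)
L_new = encoding_length(M)
# The bit length of a product is at most the sum of the bit lengths,
# so L_new is at most c*r*s*bits(r*s*delta) + bits(n*t) + L (with c = 1).
upper = r * s * (r * s * delta).bit_length() + (n * t).bit_length() + L
print(L_new, upper)
```

Python integers have arbitrary precision, so the bound can be evaluated exactly even when it far exceeds 64 bits.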
In a similar way, we can derive the following bound on the sensitivity of an n-fold ILP. This bound is not needed in our algorithm, but may be of independent interest, since it implies small sensitivity for problems that can be expressed as n-fold.
◮ Theorem 16. Let $x$ be an optimal solution of an n-fold with right-hand side $b$, in particular, $Ax = b$. If the right-hand side changes to $b'$ and the n-fold still has a finite optimum, then there exists an optimal solution
It is notable that this bound does not depend on $n$. This is in contrast to the known bounds for the distance between LP and ILP solutions of an n-fold [7].
Proof. Consider the matrix $A_{\mathrm{init}}$ from the construction used for finding an initial solution, that is, identity matrices are added after every block. As opposed to the proof of Lemma 13, we leave everything except for the matrix the same. In particular, we do not change the values in the objective function $c$, and new columns get a value of 0. As the right-hand side of the n-fold we use $b'$. For some solution $x$, we write $x_{\mathrm{old}}$ and $x_{\mathrm{new}}$ for the vector restricted to the old variables (with all others 0) and the variables added in the matrix $A_{\mathrm{init}}$, respectively. This means $x = x_{\mathrm{old}} + x_{\mathrm{new}}$. Let $x$ be an optimal solution with $A_{\mathrm{init}} \cdot x_{\mathrm{new}} = b' - b$ and $x'$ one with $A_{\mathrm{init}} \cdot x'_{\mathrm{new}} = 0$. Here we assume that $x'$ is chosen so as to minimize $\|x - x'\|_1$. Those solutions naturally correspond to solutions of the original n-fold with right-hand side $b$ and $b'$, respectively. Let $\sum_i \lambda_i g_i = x' - x$ be the decomposition into Graver basis elements. Suppose toward contradiction there is some $g_i$ where all of $\mathrm{supp}(g_i)$ are old variables. If $c^T g_i > 0$, then $x$ is not optimal, because $x + g_i$ is feasible and has a better objective value. If on the other hand $c^T g_i \leq 0$, then $x' - g_i$ is a solution of at least the same value as $x'$ and thus $\|x - x'\|_1$ is not minimal. Hence, $\|(g_i)_{\mathrm{new}}\|_1 \geq 1$ for all $g_i$. In other words, each Graver element contains a non-zero new variable. Recall that $A_{\mathrm{init}}$ is the identity matrix when restricted to the new variables (plus some zero columns). Due to the sign-compatibility we get
We conclude,
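The role of sign-compatibility in such decompositions can be illustrated on a toy example. The sketch below uses made-up vectors (not actual Graver basis elements of any n-fold) to check that for pairwise sign-compatible vectors $g_i$ and weights $\lambda_i \geq 0$ the $\ell_1$-norm is additive, i.e. $\|\sum_i \lambda_i g_i\|_1 = \sum_i \lambda_i \|g_i\|_1$. The helper names are assumptions for illustration.

```python
def sign_compatible(u, v):
    """True if u and v never have opposite signs in the same coordinate."""
    return all(a * b >= 0 for a, b in zip(u, v))

def l1(v):
    """The l1-norm of an integer vector."""
    return sum(abs(a) for a in v)

def weighted_sum(weights, vectors):
    """Coordinate-wise sum of weights[i] * vectors[i]."""
    return [sum(w * g[j] for w, g in zip(weights, vectors))
            for j in range(len(vectors[0]))]

# Toy data: made-up vectors, not Graver elements of a real instance.
gs = [[1, 0, -2], [0, 3, -1], [2, 1, 0]]
lams = [2, 1, 3]
total = weighted_sum(lams, gs)
assert all(sign_compatible(g, total) for g in gs)
# No cancellation happens, so the norms simply add up.
print(l1(total), sum(w * l1(g) for w, g in zip(lams, gs)))
```

Because no coordinate cancels, a bound on the norm of the sum directly bounds the weighted number of summands, which is the kind of counting used in the proofs above.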