Coordinated Motion Planning: Reconfiguring a Swarm of Labeled Robots with Bounded Stretch

We present a number of breakthroughs for coordinated motion planning, in which the objective is to reconfigure a swarm of labeled convex objects by a combination of parallel, continuous, collision-free translations into a given target arrangement. Problems of this type can be traced back to the classic work of Schwartz and Sharir (1983), who gave a method for deciding the existence of a coordinated motion for a set of disks between obstacles; their approach is polynomial in the complexity of the obstacles, but exponential in the number of disks. Other previous work has largely focused on {\em sequential} schedules, in which one robot moves at a time. We provide constant-factor approximation algorithms for minimizing the execution time of a coordinated, {\em parallel} motion plan for a swarm of robots in the absence of obstacles, provided some amount of separability. Our algorithm achieves {\em constant stretch factor}: If all robots are at most $d$ units from their respective starting positions, the total duration of the overall schedule is $O(d)$. Extensions include unlabeled robots and different classes of robots. We also prove that finding a plan with minimal execution time is NP-hard, even for a grid arrangement without any stationary obstacles. On the other hand, we show that for densely packed disks that cannot be well separated, a stretch factor $\Omega(N^{1/4})$ may be required. On the positive side, we establish a stretch factor of $O(N^{1/2})$ even in this case.


Introduction
Since the beginning of computational geometry, robot motion planning has been at the focus of algorithmic research. Planning the relocation of a geometric object among geometric obstacles leads to intricate scientific challenges, requiring the combination of deep geometric and mathematical insights with algorithmic techniques. With the broad and ongoing progress in robotics, the increasing importance of intelligent global planning with performance guarantees requires more sophisticated algorithmic reasoning, in particular when it comes to the higher-level task of coordinating the motion of many robots.
From the early days, multi-robot coordination has received attention from the algorithmic side. Even in the groundbreaking work by Schwartz and Sharir [52] from the 1980s, one of the challenges was coordinating the motion of several disk-shaped objects among obstacles. Their algorithms run in time polynomial in the complexity of the obstacles, but exponential in the number of disks. This illustrates the significant challenge of coordinating many individual robots. In addition, a growing number of applications focus primarily on robot interaction in the absence of obstacles, such as air traffic control or swarm robotics, where the goal is overall efficiency, rather than individual navigation.
With the challenges of multi-robot coordination being well known, there is still a huge demand for positive results with provable performance guarantees. In this paper, we provide significant progress in this direction, with a broad spectrum of results.

Our Results
• For the problem of minimizing the total time for reconfiguring a system of labeled circular robots in a grid environment, we show that it is strongly NP-complete to compute an optimal solution; see Theorem 1.
• We give an O(1)-approximation for the long-standing open problems of parallel motion-planning with minimum makespan in a grid setting. This result is based on establishing an absolute performance guarantee: We prove that for any labeled arrangement of robots, there is always an overall schedule that gets each robot to its target destination with bounded stretch, i.e., within a constant factor of the largest individual distance. See Theorem 3 for the base case of grid-based configurations, which is extended later on.
• For our approach, we make use of a technique to separate planar (cyclic) flows into so-called subflows whose thickness can be controlled by the number of subflows, see Definition 7 and Lemma 8. This is of independent interest for the area of packet routing with bounded memory: Our Theorem 4 implies that O(D) steps are sufficient to route any permutation of dilation D on the grid, even with a buffer size of 1, resolving an open question by Scheideler [51] dating back to 1998.
• We extend our approach to establish constant stretch for the generalization of colored robot classes, for which unlabeled robots are another special case; see Theorem 13.
• We extend our results to the scenario with continuous motion and arbitrary coordinates, provided the distance between a robot's start and target positions is at least one diameter; see Theorem 15. This implies that efficient multi-robot coordination is always possible under relatively mild separability conditions; this includes non-convex robots.
• For the continuous case of N unit disks and weaker separability, we establish a lower bound of Ω(N 1/4 ) and an upper bound of O( √ N ) on the achievable stretch, see Theorem 14 and Theorem 15.
We also highlight the geometric difficulty of computing optimal trajectories even in seemingly simple cases; due to limited space, this can be found in Appendix E.

Related Work
Different variants of multiple-object motion planning problems have received a large amount of attention from researchers in various areas of computer science and engineering; see [18] for a survey. Their practical relevance is reflected by the fact that there are industrial solutions used in automated warehouses for certain restricted forms of these problems [73]. There are different orthogonal criteria by which these problems can be characterized. A very important distinction is between discrete and continuous scenarios. In the discrete case, the input is a graph in which no two objects may use a vertex or edge at the same time; depending on the scenario, it may be allowed to rotate fully populated cycles. In the continuous or geometric setting, the objects are shapes in some geometric space which must be moved to a given target position in such a way that their interiors do not intersect at any time. Depending on the scenario, the shapes may or may not touch. Moreover, the objects may be confined to a certain region and there may be stationary obstacles.
Under these restrictions, it is unclear whether the target configuration is reachable at all. Aronov et al. [4] demonstrate that, for up to three robots of constant complexity, a path can be constructed efficiently if one exists. Ramanathan and Alagar [45] as well as Schwartz and Sharir [52] consider the case of several disk-shaped objects moving amongst polygonal obstacles. They both find algorithms deciding whether a given target configuration is reachable. Their algorithms run in time polynomial in the complexity of the obstacles, but exponential in the number of disks. Hopcroft et al. [32] and Hopcroft and Wilfong [33] demonstrate the reachability of a given target configuration is PSPACE-complete to decide; this already holds when restricted to rectangular objects moving in a rectangular region. Their proof was later generalized by Hearn and Demaine [29,30], who proved that rectangles of size 1 × 2 and 2 × 1 are sufficient and introduced a more general framework to prove PSPACE-hardness of certain block sliding games. Moreover, this problem is similar to the well-known Rush Hour Problem, which was shown to be PSPACE-complete by Flake and Baum [23]. For moving disks, Spirakis and Yap [62] have proven strong NP-hardness of the same problem; however, their proof makes use of disks of varying size. Bereg et al. [6] as well as Abellanas et al. [1] consider minimizing the number of moves of a set of disks into a target arrangement without obstacles. They provide simple algorithms and establish upper and lower bounds on the number of moves, where a move consists of sliding one disk along some curve without intersecting other disks. These bounds were later improved on by Dumitrescu and Jiang [19], who also prove that the problem remains NP-hard for congruent disks even when the motion is restricted to sliding.
Kirkpatrick and Liu [37] consider the case of moving two disks of arbitrary radius from a start into a target configuration in an otherwise obstacle-free plane, minimizing the sum of distances travelled by the disks. They provide optimal solutions for two disks moving from an arbitrary initial configuration into an arbitrary goal configuration. Their arguments do not seem to generalize to the makespan.
Díaz-Báñez et al. [17] considered the task of extracting a single object from a group of convex objects, moving a minimal number of objects out of the way. They present an algorithm that finds the optimal direction for extracting the object in polynomial time.
On the practical side, there are several approaches to solving multi-object motion planning problems, both optimally and heuristically. For discrete instances with a moderate number of objects, optimal solutions can be found using standard search strategies like A * [28] in the high-dimensional search space of possible configurations. Numerous techniques can be used to improve the efficiency of these strategies [22,26,63]. Moreover, there is some work employing SAT solvers [34,36] to solve multi-object motion planning problems to optimality. More recently, Yu and LaValle [74] present an IP-based exact algorithm for minimizing the makespan that works for hundreds of robots, even for challenging configurations with densities of up to 100%.
For larger instances, one has to resort to heuristic solutions. In priority planning [9,16,21,25,47,67,69], the paths are planned one-by-one by assigning priorities to the objects and planning the movement in decreasing order of priority, treating all objects with higher priority as moving obstacles. Kant and Zucker [35] decompose the problem into planning the paths for all objects and avoiding collisions by adapting the velocity of the objects appropriately, an approach which several papers are based on [11,41,42,44,56]. Another approach is to compute paths for the objects individually and resolve collisions locally [24].
Between these simple decoupled heuristics which only consider individual objects at a time and highdimensional coupled search algorithms lie dynamically-coupled algorithms [3, 5, 53-55, 57, 68] which aim for better solutions at the price of higher computational costs. These algorithms typically consider individual objects and only increase the dimension of the search space once a non-trivial interaction between objects is discovered. Recently, Wagner and Choset [71] provided a complete algorithm based on a similar principle.
With the advent of robot swarms, practical solutions to these problems became more important and the robotics community started to develop practical sampling-based algorithms [31,49,50,59,60,64,70] which, while working well in practice, are not guaranteed to find an (optimal) solution. In another recent work, Yu and Rus [75] present a practical algorithm based on a fine-grained discretization combined with an IP for the resulting discrete problem to provide near-optimal solutions even for densely populated environments. Other related work includes Rubenstein et al. [48], who demonstrated how to reconfigure a large swarm of simple, disk-shaped Kilobots; however, their method is sequential, relocating one robot at a time, so a full reconfiguration of 1000 robots takes about a day, highlighting the relevance of truly parallel motion planning.
Further extensions to higher-dimensional problems (with a wide range of additional motion constraints) are swarms of drones (e.g., the work by Kumar [66]) and even air traffic control (see Delahaye et al. [12] for a recent survey).
In both discrete and geometric variants of the problem, the objects can be labeled, colored or unlabeled. In the labeled case, the objects are all distinguishable and each object has its own, uniquely defined target position. This is the most extensively studied scenario among the three. In the colored case, the objects are partitioned into k groups and each target position can only be covered by an object with the right color. This case was recently considered by Solovey and Halperin [57], who present and evaluate a practical sampling-based algorithm. In the unlabeled case, the objects are indistinguishable and each target position can be covered by any object. This scenario was first considered by Kloder and Hutchinson [38], who presented a practical sampling-based algorithm. In this situation, Turpin et al. [65] prove that it is possible to find a solution in polynomial time, if one exists. This solution is optimal with respect to the longest distance traveled by any one robot. However, their results only hold for disk-shaped robots under additional restrictive assumptions on the free space. For unit disks and simple polygons, Adler et al. [2] provide a polynomial-time algorithm under the additional assumption that the start and target positions have some minimal distance from each other. Under similar separability assumptions, Solovey et al. [61] provide a polynomial time-algorithm that produces a set of paths that is no longer than OPT + 4m, where m is the number of robots. However, they do not consider the makespan, but only the total path length. On the negative side, Solovey and Halperin [58] prove that the unlabeled multiple-object motion planning problem is PSPACE-hard, even when restricted to unit square objects in a polygonal environment.
Regarding discrete multiple-object motion planning, Cǎlinescu et al. [8] consider the non-parallel motion planning problem on graphs, where each object can be moved along an unoccupied path in one move. They prove that both in the unlabeled and in the labeled case, minimizing the number of moves required is APX-hard. They provide 3-approximation algorithms for the unlabeled case on general graphs. Moreover, they prove that the problem remains NP-complete on the infinite rectangular grid. Their results are different from our results because the objective they consider is not closely related to the makespan. For other work, see [7,14,15,27] for particular examples.
On grid graphs, the problem can be cast as a very restrictive variant of mesh-connected routing, where each processor can only hold one packet at any time. However, approaches developed for this problem (see Kunde [40] and Cheung and Lau [10]) typically assume that at least a constant number of packets can be held at any processor. On the other hand, on grid graphs, the problem resembles the generalization of the 15-puzzle, for which Wagner [72] and Kornhauser et al. [39] have given an efficient algorithm that decides reachability of a target configuration and provided both lower and upper bounds on the number of moves required. However, Ratner and Warmuth [46] proved finding a shortest solution for this puzzle remains NP-hard. Demaine et al. [13] also consider various grids. For the triangular grid, they give efficiently verifiable conditions for checking whether a solution exists.

Preliminaries
In the grid setting of Section 3 we consider an n 1 × n 2 -grid G = (V, E), which is dual to an n 1 × n 2 -rectangle P in which the considered robots are arranged. A configuration of P is a mapping C : V → {1, . . . , N, ⊥}, which is injective w.r.t. the labels {1, . . . , N } of the N ≤ |P | robots to be moved, where ⊥ denotes the empty square. The inverse image of a robot's label is C −1 ( ). In the following, we consider a start configuration C s and target configuration C t ; for i ∈ {1, . . . , N }, we call C −1 s (i) and C −1 t (i) the start and target position of the robot i. Given the (minimum) Manhattan distance between each robot's start/target positions for each robot, we denote by d the maximum such distance over all robots.
A configuration C 1 : V → {1, . . . , N, ⊥} can be transformed within one single transformation step into another configuration . . , N }, i.e., if each robot does not move or moves to one of the at most four adjacent squares. Furthermore, two robots cannot exchange their squares in one transformation step, i.e., for all occupied squares v = w ∈ V , we require that C 2 (v) = C 1 (w) implies C 2 (w) = C 1 (v). For M ∈ N, a schedule is a sequence C 1 → · · · → C M of transformations. The number of steps in a schedule is called its makespan. Given a start configuration C s and a target configuration C t , the optimal makespan is the minimum number of steps in a schedule starting with C s and ending with C t . Let n > 1. Note that for the 2 × 2-, 1 × n-and n × 1-rectangles, there are pairs of start and target configurations where no such sequence exists. For all other rectangles, such configurations do not exist; we provide an O(1)-approximation of the makespan in Section 3.
For the continuous setting of Section 5, we consider N robots R := {1, . . . , N } ⊆ N. The Euclidean distance between two points p, q ∈ R 2 is |pq| := ||p − q|| 2 . Every robot r has a start and target position s r , t r ∈ R 2 with |s i s j |, |t i t j | ≥ 2 for all i = j. In the following, d := max r∈R |s r t r | is the maximum distance a robot has to cover. A trajectory of a robot r is a curve m r : [0, T r ] → R 2 , where T r ∈ R + denotes the travel time of r. This curve m r does not have to be totally differentiable, but must be totally left-and right-differentiable. Intuitively, at any point in time, a robot has a unique past and future direction that are not necessarily identical. This allows the robot to make sharp turns, but does not allow jumps. We

Labeled Grid Permutation
Let n 1 ≥ n 2 ≥ 2, n 1 ≥ 3 and let P be an n 1 × n 2 -rectangle. In this section, we show that computing the optimal makespan of arbitrarily chosen start and target configurations C s and C t of k robots in P is strongly NP-complete. This is followed by a O(1)-approximation for the makespan. Theorem 1. The minimum makespan parallel motion planning problem on a grid is strongly NP-hard.
We prove hardness using a reduction from Monotone 3-Sat. Intuitively speaking, given a formula, we construct a parallel motion planning instance with a variable robot for each variable in the formula. To encode a truth assignment, each variable robot is forced to move on one of two paths. This is done by employing two groups of auxiliary robots that have to move towards their goal in a straight line in order to realize the given makespan. These auxiliary robots form moving obstacles whose position is known at any point in time.
The variable robots cross paths with checker robots, one for each literal of the formula, forcing the checker to wait for one time step if the assignment does not satisfy the literal. The checker robots then cross paths with clause robots; each clause robot has to move to its goal without delay and can only do so if at least one of the checkers did not wait. In order to ensure that the checkers meet with the clauses at the right time, further auxiliary robots force the checkers to perform a sequence of side steps in the beginning. Figure 1 gives a rough overview of the construction; full details of the proof are given in Section 6.
In the proof of NP-completeness, we use a pair of start and target configurations in which the corresponding grids are not fully occupied. However, for our constant-factor approximation, we assume in Theorem 3 that the grid is fully occupied. This assumption is without loss of generality; our approximation algorithm works for any grid population, see Theorem 4.
Our constant-factor approximation is based on an algorithm that computes a schedule with a makespan upper-bounded by O(n 1 + n 2 ) described by Lemma 2. Based on Lemma 2, we give a constant factor approximation of the makespan, see Theorem 3. Finally, we embed the algorithm of Theorem 3 into a more general approach to ensure simultaneously a polynomial running time w.r.t. the number N of input robots and a constant approximation factor, see Theorem 4.
Lemma 2. For a pair of start and target configurations C s and C t of an n 1 × n 2 -rectangle, we can compute in polynomial time w.r.t. n 1 and n 2 a sequence of O(n 1 + n 2 ) steps transforming C s into C t .  The high-level idea of the algorithm of Lemma 2 is the following. We apply a sorting algorithm called RotateSort [43] that computes a corresponding permutation of an n 1 × n 2 (orthogonal) grid within O(n 1 + n 2 ) parallel steps. Each parallel step is made up of a set of pairwise disjoint swaps, each of which causes two neighbouring robots to exchange their positions. Because in our model direct swaps are not allowed, we simulate one parallel step by a sequence of O(1) transformation steps. This still results in a sequence of O(n 1 + n 2 ) transformation steps. A detailed description of the algorithm used in the proof of Lemma 2 is given in Section 6.2.
Based on the algorithm of Lemma 2, we can give a constant-factor approximation algorithm.  At a high level, the algorithm of Theorem 3 first computes the maximal Manhattan distance d between a robot's start and target position. Then we partition P into a set T of pairwise disjoint rectangular tiles, where each tile t ∈ T is an n 1 × n 2 -rectangle for n 1 , n 2 ≤ 24d. We then use an algorithm based on flows to compute a sequence of O(d) transformation steps, ensuring that all robots are in their target tile. Once all robots are in the correct tile, we use Lemma 2 simultaneously on all tiles to move each robot to the correct position within its target tile. The details of the algorithm of Theorem 3 are given further down in this section.
The above mentioned tiling construction ensures that each square of P belongs to one unambiguously defined tile and each robot has a start and target tile.
Based on the approach of Theorem 3 we give a O(1)-approximation algorithm for the makespan with a running time polynomial w.r.t. the number N of robots to be moved.

Theorem 4.
There is an algorithm with running time O(N 5 ) that, given an arbitrary pair of start and target configurations of a rectangle P with N robots to be moved and maximum distance d between any start and target position, computes a schedule of makespan O(d), i.e., an approximation algorithm with constant stretch.
Intuitively speaking, the approach of Theorem 4 distinguishes two cases.
(1) Both n1 4 and the maximum distance d between the robots' start and target positions, are lowerbounded by the number N of input robots.

6
(2) N > n1 4 or N > d. In case (1), the grid is populated sparsely enough such that the robots' trajectories in northern, eastern, southern, and western direction can be done sequentially by four individual transformation sequences, see Figure 12.
In order to ensure that each robot has locally enough space, we consider a preprocessed start configuration C o in which the robots have odd coordinates. We ensure that C s can be transformed into C o within O(d) steps. Analogously, we ensure that the outcome of the northern, eastern, southern, and western trajectories is a configuration C e with even coordinates, such that C e can be transformed into C t within O(d) transformation steps.
In the second case, we apply the approach of Theorem 3 as a subroutine to a union of smallest rectangles that contain the robots' start and target configurations, see Figure 13.
The full detailed version of the proof of Theorem 4 can be found in Section 6.3.
In the rest of Section 3, we give the proof of Theorem 3, i.e. we give an algorithm that computes a schedule with makespan linear in the maximum distance between robots' start and target positions. The remainder of the proof of Theorem 3 is structured as follows. In Section 3.1 we give an outline of our flow algorithm that ensures that each robot reaches its target tile in O(d) transformation steps. Section 3.2 gives the full intuition of this algorithm and its subroutines. (For full details, we refer to Section 6).

Outline of the Approximation Algorithm of Theorem 3
We model the trajectories of robots between tiles as a flow f T , using the weighted directed graph G T = (T, E T , f T ), which is dual to the tiling T defined in the previous section. In G T , we have an edge (v, w) ∈ E T if there is at least one robot that has to move from v into w. Furthermore, we define the weight f T ((v, w)) of an edge as the integer number of robots that move from v to w. As P is fully occupied, f T is a circulation, i.e., a flow with no sources or sinks, in which flow conservation has to hold at all vertices. Because the side lengths of the tiles are greater than d, G T is a grid graph with additional diagonal edges and thus has degree at most 8.
The maximum edge value of f T is Θ(d 2 ), but only O(d) robots can possibly leave a tile within a single transformation step. Therefore, we decompose the flow f T of robots into a partition consisting of O(d) subflows, where each individual robot's motion is modeled by exactly one subflow and each edge in the subflow has value at most d. Thus we are able to realize each subflow in a single transformation step by placing the corresponding robots adjacent to the boundaries of its corresponding tiles before we realize the subflow. To facilitate the decomposition into subflows, we first preprocess G T . In total, the algorithm consists of the following subroutines, elaborated in detail in Section 3.2.
• Step 1: Compute d, the tiling T and the corresponding flow G T .
• Step 2: Preprocess G T in order to remove intersecting and bidirectional edges. • Step 5: Simultaneously apply Lemma 2 to all tiles, moving each robot to its target position.

Details of the Approximation Algorithm of Theorem 3
In this section we only give more detailed descriptions of Steps 1-4 because Step 5 is a trivial application of Lemma 2 to all tiles in parallel.

3.2.1
Step 1: Compute d, the Tiling T , and the corresponding Flow G T The maximal distance between robots' start and target positions can be computed in a straightforward manner.
For the tiling, we assume that the rectangle P is axis aligned and that its bottom-left corner is (0, 0). We consider k v := n1 12d vertical lines v 1 , . . . , v kv with x-coordinate modulo 12d equal to 0. Analogously, we consider k h := n2 12d horizontal lines h 1 , . . . , h k h with y-coordinate modulo 12d to 0. Finally, we consider the tiling of P that is induced by the arrangement induced by v 1 , . . . , v kv−1 , h 1 , . . . , h hv−1 and the boundary of P , see Figure 11. This implies that the side length of a tile is upper-bounded by 24d − 1.
Finally, computing the flow G T is straightforward by considering the tiling T and the robots' start and target positions.

Step 2: Ensuring Planarity and Unidirectionality
After initialization, we preprocess G T , removing edge intersections and bidirectional edges by transforming the start configuration C s into an intermediate start configuration C s , obtaining a planar flow without bidirectional edges. This transformation consists of two steps: (1) ensuring planarity and (2) ensuring unidirectionality. Step (1): We observe that edge crossings only occur between two diagonal edges with adjacent source tiles, as illustrated in Figure 2(a)+(b). To remove a crossing, it suffices to eliminate one of the diagonal edges by exchange robots between the source tiles. To eliminate all crossings, each robot is moved at most once, because after moving, the robot does no longer participate in a diagonal edge. Thus, all necessary exchanges can be done in O(d) steps by Lemma 2, covering the tiling T by constantly many layers, similar to the proof of Lemma 2.
Step (2): We delete a bidirectional edge (v, w), (w, v) by moving min{f T ((v, w)), f T ((w, v))} robots with target tile w from v to w and vice versa which achieves that min{f T ((v, w)), f T ((w, v))} robots achieve their target tile w and min{f T ((v, w)), f T ((w, v))} robots achieve their target tile v, thus eliminating the edge with lower flow value. This process is depicted in Figure 2(c)+(d). Like step (1), this can be done in O(d) parallel steps by Lemma 2. As we do not add any edges, we maintain planarity during step (2). Observe that during the preprocessing, we do not destroy the grid structure of G T .
Step (1) and step (2) maintain the flow property of f T without any other manipulations to the flow f T , because both preprocessing steps can be represented by local circulations.

Step 3: Computing a Flow Partition
After preprocessing, we partition the flow G T into d-subflows.
for all e ∈ E . If f T (e) ≤ z for all e ∈ E and some z ∈ N, we call G T a z-flow.

8
The flow partition relies on an upper bound on the maximal edge weight in G T . By construction, tiles have side length at most 24d; therefore, each tile consists of at most 576d 2 unit squares. This yields the following upper bound; a tighter constant factor can be achieved using a more sophisticated argument.
Proof sketch. In a slight abuse of notation, throughout this proof, the elements in sets of cycles are not necessarily unique. A (d, O(d))-partition can be constructed using the following steps.
• We start by computing a (1, h)-partition C of G T consisting of h ≤ n 1 n 2 cycles. This is possible because G T is a circulation. If a cycle C intersects itself, we subdivide C into smaller cycles that are intersection-free. Furthermore, h is clearly upper bounded by the number of robots n 1 n 2 , because every robot can contribute only 1 to the sum of all edges in G T . As the cycles do not self-intersect, we can partition the cycles C by their orientation, obtaining the set C of clockwise and the set C of counterclockwise cycles.
• We use C and C to compute a (1, h )-partition C 1 ∪ C 2 ∪ C 1 ∪ C 2 with h ≤ n 1 n 2 , such that two cycles from the same subset C 1 , C 2 , C 1 , or C 2 share a common orientation. Furthermore, we guarantee that two cycles from the same subset are either edge-disjoint or one lies nested in the other.
A partition such as this can be constructed by applying a recursive peeling algorithm to C and C as depicted in Figure 3, yielding a decomposition of the flow induced by C into two cycle sets C 1 and C 2 , where C 1 consists of clockwise cycles and C 2 consists of counterclockwise cycles, and a similar partition of C , see the appendix for details. Figure 3: Recursive peeling of the area bounded by the cycles from C , resulting in clockwise cycles (thick black cycles). Cycles constituting the boundary of holes are counterclockwise (thick red cycles). Note that an edge e vanishes when f T (e) cycles containing that edge are removed by the peeling algorithm described above.
• Afterwards, we partition each set C 1 , C 2 , C 1 , and C 2 into O(d) subsets, each inducing a d-subflow of G T , see the appendix for details.

A Subroutine of Step 4: Realizing a Single Subflow
In this section, we present a procedure for realizing a single d-subflow G T of G T .
Definition 9. A schedule t := C 1 → · · · → C k+1 realizes a subflow G T = (T, E , f T ) if, for each pair v, w of tiles, the number of robots moved by t from their start tile v to their target tile w is There is a polynomial-time algorithm that computes a schedule C 1 → · · · → C k+1 realizing G T for a constant k ∈ O(1).
Proof sketch. We give a high-level description of the proof and refer to Section 6.5 for details.    Our algorithm uses k = O(d) preprocessing steps C 1 → · · · → C k , as depicted in Figure 4(a)+(b), and one final realization step C k → C k+1 , shown in Figure 4(c), pushing the robots from their start tiles into their target tiles. The preprocessing eliminates diagonal edges and places the moving robots next to the border of their target tiles. For the final realization step we compute a pairwise disjoint matching between incoming and outgoing robots, such that each pair is connected by a tunnel inside the corresponding tile in which these tunnels do not intersect, see Figure 4(a). The final realization step is given via the robots' motion induced by pushing each robot into the interior of the tile and by pushing this one-step motion through the corresponding tunnel into the direction of the corresponding outgoing robot.

Step 4: Realizing All Subflows
Next we extend the idea of Lemma 10 to ≤ d subflows instead of one and demonstrate how this can be leveraged to move all robots to their target tile using O(d) transformation steps.
Proof sketch. We give a high level description of the proof and refer for details to Section 6.6.
Let t be an arbitrary tile. Similar to the approach of Lemma 10, we first apply a preprocessing step guaranteeing that the robots to be moved into or out of t are in the right position close to the boundary of t, see Figure 5. Thereafter we move the robots into their target tiles, using applications of the algorithm from Lemma 10 without the preprocessing phase. In particular, we realize a sequence of d-subflows by applying times the single realization step of Lemma 10. For the proof of Theorem 3, we still need to analyze the time complexity of our approach, for which we refer to Section 6.7.

Variants on Labeling
A different version is the unlabeled variant, in which all robots are the same. A generalization of both this and the labeled version arises when robots belong to one of k color classes, with robots from the same color class being identical.
We formalize this problem variant by using a coloring c : {1, . . . , n 1 n 2 } → {1, . . . , k} for grouping the robots. By populating unoccupied cells with robots carrying color k + 1, we may assume that each unit square in the environment P is occupied. The robots draw an image I = I 1 , . . . , I k , where I i is the set of cells occupied by a robot with color i. We say that two images I s and I t are compatible if in I s and I t the number of cells colored with color i are equal for each color i = 1, . . . , k. By moving the robots, we want to transform a start image I s into a compatible target image I t , minimizing the makespan. The basic idea is to transform the given unlabeled problem setting into a labeled problem setting by solving a geometric bottleneck matching problem, see the appendix for details.

Continuous Motion
The continuous case considers N unit disks that have to move into a target configuration; the velocity of each robot is bounded by 1, and we want to minimize the makespan. For arrangements of disks that are not well separated, we show that constant stretch is impossible.

Details for Labeled Grid Permutation
In this section, we give the details omitted in the high-level description of our results for the problem variant of labeled grid permutations that we considered in Section 3.

Details for the NP-Completeness of the Grid Case
Theorem 1. The minimum makespan parallel motion planning problem on a grid is strongly NP-hard.
Proof. The proof is based on a reduction from the NP-hard problem Monotone 3-Sat, which asks to decide whether a Boolean 3-CNF formula ϕ is satisfiable, where in each clause the literals are either all positive or all negative. All coordinates and the makespan are constructed to be polynomial in the input size, implying strong NP-hardness. Thus, there is no FPTAS for the problem, unless P = NP.
For the remainder of the proof, let ϕ have n variables {x 0 , . . . , x n−1 } and m clauses {C 1 , . . . , C m }. From ϕ, we construct an instance of the minimum makespan parallel motion planning problem that has optimum makespan M if ϕ is satisfiable and M + 1 otherwise. During the description of the construction, we keep M variable, fixing its value once the construction is complete. The structure of the resulting instance is sketched in Figure 6.
Each variable x j is represented by a variable robot. Additionally, for each variable there are two auxiliary robots that force the variable robot to take one of two different paths to its goal in any solution with makespan M , see Figure 6. The left auxiliary robots start at positions (1, 6j + 1) and move down towards their target positions (1, 6j + 1 − M ) in each time step. The right auxiliary robots start at positions (M − 3, −M + 6j + 1) and have to move up towards their target positions (M − 3, 6j + 1). The variable robot for variable x j starts at position (0, 6j) and has to travel M − 2 units to the right towards its goal position (M − 2, 6j). In the first time step, each variable robot can either wait or move upwards. Afterwards, it must move to the right in every time step until passing the right group of auxiliary robots at x = M − 3. It cannot wait or move down before this point, as this would lead to a collision with the corresponding right auxiliary robot. Therefore in any schedule with makespan M , after the kth time step, each variable robot has x-coordinate k − 1 for any For each clause C i = {x j1 , x j2 , x j3 } with j 1 < j 2 < j 3 , we have three checker robots c 1 i , c 2 i , c 3 i checking whether their corresponding literal satisfies the clause. The checkers for clause C i start at positions α 1 i := 6(ni + j 1 ), −6ni − f i , . . . , α 3 i := 6(ni + j 3 ), −6ni − f i , where f i = 1 iff C i is negative and f i = 0 otherwise. As depicted in Figure 7, a checker has to wait one time step for the corresponding variable iff the checked literal is not true.
. . . . Each checker travels a total distance of M − 1; thus they are allowed to wait for one time step, but have to move on an xy-monotone path towards their target position.
Because moves to the right do not change the position of a checker relative to the variables, we may assume the checkers to move to the right from their initial position before moving up. In fact, we enforce this behavior using auxiliary robots as depicted in Figure 8. Moreover, each clause C i also has a clause robot ensuring that there is at least one satisfied literal. The clause robots start to the right of the checkers and above the variables and have to move M − 2 units to the left and two units downwards, and therefore have to move towards their target in every round without waiting for the checkers. The clause robot of each clause is placed such that checkers for other clauses cannot interfere with its path, see Figure 6. To be more precise, as shown in Figure 8, the clause robot stops at position t 1 i − (3, 3) and starts at position t 1 i + (−1, M − 5). The vertical offset between the checkers introduced by the side steps that c 1 i and c 2 i perform is chosen such that the clause robot can pass through the checkers without waiting iff one of the checkers did not wait. This is the case iff at least one literal of the clause is satisfied.
It remains to determine the critical makespan M . This critical makespan M must be large enough to allow the checkers of the last clause C m to pass through the variable robots and their clause robot. Moreover, it must also allow the variable robots to cross paths with all checkers. The checkers of the last variable travel left of the line x = 6n(m + 1) − 6. Therefore, a makespan M ≥ 6n(m + 1) suffices for the variable robots. Regarding the clauses, if the last clause is negative, the starting points of its checkers are located on the line y = −6nm − 1. The topmost variable robot travels below the line y = 6(n − 1) + 1. To keep our argument Figure 8: Left: A group of auxiliary robots is used to force the first two checkers of each clause to perform their side steps before moving up. Each auxiliary robot has to move downwards M units. Right: A clause robot (orange) meeting the corresponding checkers (black for satisfied checkers, red for non-satisfied checkers).
simple, we want to make sure that the clauses stay strictly above all variables. Due to the position of the clauses, this means that we have to ensure that the checker for the first literal of the last clause has target position above the line y = 6(n − 1) + 5. Therefore, not accounting for the side steps of the checkers, we have to set M ≥ (6nm + 1) + (6(n − 1) + 5) = 6n(m + 1). Clearly, the number of side steps performed by each checker is less than 6n. Therefore, in total, a critical makespan of M := 6n(m + 2) is sufficient. In our construction, a makespan of M is feasible iff for every clause robot there is one checker that does not wait, which implies that each clause has a satisfied literal under the assignment induced by the variable robots. Therefore, a makespan of M is feasible iff ϕ is satisfiable.
Finally, observe that even though our reduction uses individually labeled robots, three colors are already sufficient. One can use color 1 for variables, color 2 for checkers and color 3 for clauses and all auxiliaries.

Details on Computing a Schedule With a Makespan of O(n 1 + n 2 )
Next, we give the details of an algorithm that computes a sequence of O(n 1 + n 2 ) steps transforming an arbitrary start configuration C s into an arbitrary target configuration C t of an n 1 × n 2 rectangle, see Lemma 2. This algorithm is based on a sorting algorithm, called RotateSort that uses swap operations, in which two robots exchanging their positions within one single step, as elementary operations. As our model does not allow swap operations, we first have to show how to simulate swap operations at the expense of increasing the makespan by a factor upper-bounded by some constant.
In order to simulate swap operations, we first observe that LaValle and Yu [74] proved that for a 3 × 3square, each start configuration can be transformed into an arbitrary target configuration. This result is easily established for 2 × 3-rectangles; see Figure 9 for how to realize a transposition. Lemma 16. For a pair of start and target configurations C s and C t of a 2 × 3-rectangle, we can compute a sequence of at most seven steps transforming C s into C t .
Lemma 16 is the building block for permuting n 1 × n 2 rectangles within makespan O(n 1 + n 2 ).
Lemma 2. For a pair of start and target configuration C s and C t of an n 1 × n 2 -rectangle, we can compute in polynomial time a sequence of O(n 1 + n 2 ) steps transforming C s into C t .
Proof. The straightforward proof relies on covering the rectangle by a set of disjoint 2 × 3-and 3 × 2-rectangles, on which swap operations are performed in parallel, with each swap operation exchanging the position of two adjacent robots. We say that two swap operations are disjoint if all four positions of the two swaps are distinct. Although direct swap operations of adjacent robots are not possible, Lemma 16 allows us to perform an arbitrary number of pairwise disjoint swap operations within each 2 × 3-rectangle with O(1) transformation steps. As illustrated in Figure 10, we cover P by twelve different layers of rectangles, such that each pair of adjacent unit squares from P lies in one of the 2 × 3-rectangles or in one of the 3 × 2-rectangles.
In particular, we distinguish between 2 × 3-and 3 × 2-rectangles inside the n 1 × n 2 -rectangle. Furthermore Given a set S of pairwise disjoint swap operations, we subdivide S into these twelve layers, such that the two robots of each swap operation lie in the same small rectangle of the corresponding layer. Lemma 16 implies that all swap operations of one layer can be done in parallel with O(1) transformation steps. Therefore, all swap operations in S can be done in O(1) transformation steps.
This allows us to apply a sorting algorithm for n 1 × n 2 -meshes, called Rotatesort [43], whose only elementary steps are swap operations of adjacent cells. We employ Rotatesort by labeling the robots in the target configuration based on the snake-like ordering guaranteed by Rotatesort. Applying Rotatesort to the start configuration with the robots labeled in this way, we obtain the required target configuration. Marberg and Gafni [43] show that Rotatesort needs O(n 1 + n 2 ) phases, where each phase consists of pairwise disjoint swap operations. This leads to O(n 1 + n 2 ) transformation steps in our model.   or N > d separately as follows: In case (1), we apply the following approach whose steps, described next, are all realizable because N ≤ n1 4 , d. We assume w.l.o.g. that n 1 and n 2 are even. Otherwise, starting from the start configuration, we move all robots from the last line into the second-to-last line and all robots from the last column into the second-to-last column within O(d) transformation steps. The reversed argument implies that there is a sequence of O(d) transformation steps leading from an even-sized configuration to the target configuration. Thus, from now on, we restrict our considerations to even-sized rectangles.
For each pair of start and target configurations C s and C t of P , there are two configurations C o and C e , such that the two following conditions are fulfilled: (1) The coordinates of the robots in C o are odd and the coordinates of the robots in C e are even and (2) C s and C e can be transformed into C o and C t within O(d) transformation steps. Thus, we still have to give an approach for how C o can be transformed into C e within O(d) transformation steps.
First of all, we ensure in parallel for all robots that they achieve the position that is induced by the x-coordinate of their position in C e and the y-coordinate of their position in C o . We call the corresponding configuration intermediate configuration C i with intermediate positions and coordinates. In order to obtain the intermediate configuration, starting from C o , we first push in parallel all robots, that have to move to the right, one position upwards, then move them simultaneously to the right until they achieve their intermediate x-coordinate, and, push a robot immediately one position downwards when it reaches its intermediate x-coordinate, see Figure 12.
After that, we apply the analogous approach for robots that have to move to the left, resulting in the intermediate configuration.
This results in a set of rectangles that are pairwise intersection free, allowing us to apply the approach of Theorem 3 to each resulting rectangle in parallel, while ensuring that each robot is involved in at most one application of the approach of Theorem 3.
As the side lengths of the initial rectangles are upper-bounded by d, we conclude that the sum of the lengths of the finally computed rectangles is upper bounded by N · d, which in turn is upper bounded by N 2 in that case. This implies a running time of O(d · N 2 ) ≤ O(N 3 ).  Proof. In a slight abuse of notation, throughout this proof, the elements in sets of cycles are not necessarily unique. A (d, O(d))-partition can be constructed using the following steps.

Details on Step 3: Computing a Flow Partition
• We start by computing a (1, h)-partition C of G T consisting of h ≤ n 1 n 2 cycles. This is possible because G T is a circulation. If a cycle C intersects itself, we subdivide C into smaller cycles that are intersection-free. Furthermore, h is clearly upper bounded by the number of robots n 1 n 2 , because every robot can contribute only 1 to the sum of all edges in G T . As the cycles do not self-intersect, we can partition the cycles C by their orientation, obtaining the set C of clockwise and the set C of counterclockwise cycles.
• We use C and C to compute a (1, h )-partition C 1 ∪ C 2 ∪ C 1 ∪ C 2 with h ≤ n 1 n 2 , such that two cycles from the same subset C 1 , C 2 , C 1 , or C 2 share a common orientation. Furthermore, we guarantee that two cycles from the same subset are either edge-disjoint or one lies nested in the other. A partition such as this can be constructed by applying a recursive peeling algorithm to C and C as depicted in Figure 3, yielding a decomposition of the flow induced by C into two cycle sets C 1 and C 2 , where C 1 consists of clockwise cycles and C 2 consists of counterclockwise cycles, and a similar partition of C . In particular, we apply the following approach iteratively to C : We consider the union A of the area bounded by the cycles from C . We remove a flow value of 1 from all edges of the outer boundary component of A. In particular, we add the corresponding 1-subflow G 1 to C 1 and remove G 1 from C . Analogously, we remove 1-subflows from C that are induced by inner boundary components and add these 1-subflows to C 2 .
• Afterwards, we partition each set C 1 , C 2 , C 1 , and C 2 into O(d) subsets, each inducing a d-subflow of G T . This can be done as follows. Let C ∈ {C 1 , C 2 , C 1 , C 2 }. Recall that every pair of cycles from C either consists of one cycle nested inside the other or of edge-disjoint cycles. The cycles induce a dual forest D = (C, E D ), where a cycle v has a child w iff w lies inside v and there is no other cycle lying in v that w lies in. We label the cycles by their depth in D modulo 576d and let G i be the flow induced by all cycles carrying label i, thus obtaining O(d) subflows G i .
Finally, we show that each subflow G i obtained in this way is a d-subflow of G T . To this end, we observe the following. Let e ∈ E T be an arbitrarily chosen edge and let v, w ∈ C be two cycles sharing e. This implies Figure 14: Recursive peeling of the area bounded by the cycles from C , resulting in clockwise cycles (thick black cycles). Cycles constituting the boundary of holes are counterclockwise (thick red cycles). Note that an edge e vanishes when f T (e) cycles containing that edge are removed by the peeling algorithm described above.
that v and w lie nested inside of each other; w.l.o.g., assume that w lies inside v. Thus, in D, v lies on the path from w to its root, and e is contained in all cycles on the path between v and w. On the other hand, due to Observation 6, all cycles containing e lie on a path of length at most 576d 2 in D. Therefore, e has a weight of at most 576d 2 576d = d in each G i , and G i is a d-subflow.

Details on a Subroutine of Step 4: Realizing a Single Subflow
By Lemma 10, we give an approach that computes a schedule of constant length for a given d-subflow.
There is a polynomial-time algorithm that computes a schedule C 1 → · · · → C k+1 realizing G T for a constant k ∈ O(1).
Proof. Our algorithm uses k = O(d) preprocessing steps C 1 → · · · → C k , as depicted in Figure 4(a)+(b), and one final realization step C k → C k+1 , shown in Figure 4(c), moving the robots from their start tiles into their target tiles. The preprocessing replaces diagonal edges by pairs of orthogonal edges, see the red arrows in Figure 4(a), and places the moving robots next to the border of their target tiles. Note that the replacements of the diagonal edges cannot be done as part of the preprocessing of Step 2. This is because the replaced diagonal edges may be part of circular flows that cannot be realized locally, as it is done for crossing or bidirectional edges in Step 2 of our algorithm. For the final realization step we compute a pairwise disjoint matching between incoming and outgoing robots, such that each pair is connected by a tunnel inside the corresponding tile in which these tunnels do not intersect each other, see Figure 4(a). The final realization step is given via the robots' motion induced by moving each robot into the interior of the tile and by moving this one-step motion through the corresponding tunnel into the direction of the corresponding outgoing robot.
The preprocessing steps C 1 → · · · → C k : Let v be an arbitrary tile. We place all robots corresponding to horizontal and vertical edges (v, w) of G T in a row adjacent to the side shared by v and w. We can do this for all tiles using O(d) parallel steps by applying Lemma 2.
Next, we eliminate diagonal edges (w, v) ∈ E T as follows. There are two tiles sharing a side with both w and v; let u be one of them. First we place the f T ((w, v)) robots with start tile w and target tile v in a row next to the side between w and u. Then, we move them to u by exchanging them with f T ((w, v)) robots with start and target tile u that lie next to the side between u and v, as shown in Figure 4(a). In the resulting flow, the diagonal edge (w, v) with weight f T ((w, v)) is replaced by adding a flow of value f T ((w, v)) on the edges (w, u), (u, v).
We process all tiles as described above in two parallel phases by applying Lemma 2 twice: first on all rows with even index and then on all rows with odd index, thus ensuring that parallel applications of Lemma 2 do not interfere with each other.
The realization step C k → C k+1 : Let t be an arbitrary tile. For the transformation step C k → C k+1 , we need a matching between incoming and outgoing robots of t, such that there is a set of non-intersecting paths in t connecting each incoming robot with its corresponding outgoing robot. As illustrated in Figure 4(c), these paths induce the required transformation C k → C k+1 .
We compute this matching by selecting an incoming robot r in and matching it to a robot r out , such that there is a path p ⊆ ∂t between r in and r out that does not touch another incoming or outgoing robot. We remove the matched robots from consideration and repeat the matching procedure until no further unmatched robots exist.
The non-intersecting paths between the positions of the matched robots are constructed as follows. For i ≥ 1, the ith hull of t is the union of all squares on the boundary of the rectangle remaining after the hulls 1, . . . , i − 1 are removed. The path between r in and r out consists of three pieces, as shown in Figure 4(c). For the ith matched pair of robots, the initial and the last part of the path are straight line segments orthogonal to ∂t, from the position of r in to the d + ith hull and from the d + ith hull towards the position of r out . The main part of the path lies on the d + ith hull, connecting the end of the initial part to the beginning of the last part.

Details on Step 4: Realizing All Subflows
There is a polynomial-time algorithm computing O(d) + transformation steps C 1 → · · · → C k+ realizing S.
Proof. Let t be an arbitrary tile. Similar to the approach of Lemma 10, we first apply a preprocessing step guaranteeing that the robots to be moved into or out of t are in the right position close to the boundary of t. Thereafter we move the robots into their target tiles, using applications of the algorithm from Lemma 10 without the preprocessing phase. In particular, we realize a sequence of d-subflows by applying times the single realization step of the algorithm from Lemma 10.
In order to ensure that a sequence of realization steps from Lemma 10 without intermediate preprocessing steps realizes a sequence of d-subflows, we apply the following O(d) preprocessing steps for all realization steps in advance: For each side of the tile t, we place all leaving or entering robots that belong to the same subflow in a common row and stack these rows in the order which is induced by the sequence of the subflows to be realized, see Figure 15(a). Finally, pushing all stacked robots downwards into the direction of the boundary ∂t of the tile ensures, that processing one realization step implies that all robots involved in the following realization step lie in a row adjacent to ∂t, see Figure 15.
In the following we describe how we place the robots in their start tiles as a preprocessing step. First, we use the same preprocessing step as in Lemma 10 to eliminate diagonal edges. For a simplified illustration, we describe the remainder of the preprocessing in two steps that can be realized by just one application of Lemma 2. After elimination of diagonal edges, we proceed by stacking the rows of robots moving out of t in the order in which the subflows are to be processed, see Figure 15(a). Then we push the robots towards the boundary ∂t of their start tile until they meet either ∂t or another moving robot. See Figure 15(a), image 2 for an example.
This preprocessing ensures that, after each application of the algorithm of Lemma 10, all robots moving out of t in the next transformation step lie in a row adjacent to ∂t. Therefore this preprocessing can be used to replace the preprocessing done in Lemma 10. For an example, see Figure 15(a), images 3-9.
As ≤ d, the stacked rows have a height of at most d. Thus, they are contained in hulls 1 to d. Therefore, and because the flows are unidirectional and diagonals are eliminated, the structure of the stacks is not damaged by the applications of Lemma 10, allowing us to realize ≤ d subflows in O(d) transformation steps instead of one.   Proof of Theorem 3. The steps of our algorithm have the following time complexity: Initialization step 1: Computing d, T and G T is possible in O(n 1 n 2 ) time.
Step 2 & 5: The application of Lemma 2 requires O(d 3 ) time for each tile, so these steps can be done in O(dn 1 n 2 ) time.
Step 3: All subroutines of Step 3 can be done in an overall time of O(n 1 n 2 ). In particular, the (1, h)-partition C of G T can be computed in e∈E T f T (e) ∈ O(n 1 n 2 ) time by a simple greedy algorithm. The number of edges in all cycles from C combined is at most n 1 n 2 , which is the number of robots in P . Thus, resolving self-intersections of cycles in C can be done in O(n 1 n 2 ) time. As |C | ∈ O(n 1 n 2 ), the partition of C into C 1 , C 2 , C 1 , and C 2 takes time O(n 1 n 2 ). Furthermore, the partitioning of C 1 , C 2 , C 1 , and C 2 into O(d) d-subflows can be done in time O(n 1 n 2 ).
Step 4: The parallel applications of Lemma 2 to disjoint rectangles can be computed in O(dn 1 n 2 ). Furthermore, the construction of all connecting paths between incoming and outgoing robots for all tiles needs O(dn 1 n 2 ) time per application of the algorithm of Lemma 11. By applying Lemma 11 constantly many times, Step 4 needs O(dn 1 n 2 ) time.

Details for Variants on Labeling
In this section we show how to extend our approach of Section 3 to the unlabeled and, more generally, the colored variant of the parallel robot motion-planning problem.
Theorem 13. There is an algorithm with running time O(k(N ) 1.5 log(N ) + N 5 ) for computing, given start and target images I s , I t with maximum distance d between start and target positions, an O(1)-approximation of the optimal makespan M and a corresponding schedule.
Proof. We transform the input into an instance of the labeled variant, such that an O(1)-approximation for the labeled instance provides an O(1)-approximation for the colored instance. For each color i, we consider the two point sets A i , B i ⊂ R 2 , where A i contains the center points a i v of all unit squares v ∈ I i s and B i contains the center points b i v of all v ∈ I i t . A bottleneck matching between A i and B i is a perfect matching between A i and B i that minimizes the maximal distance. The cost of an optimal bottleneck matching between A i and B i is in O(M ), because a transformation sequence induces a bottleneck matching on all color classes. Efrat et al. [20] show that the geometric bottleneck matching problem can be solved in O(|A + B| 1.5 log |A + B|) time.
A set of k bottleneck matchings between the sets A i and B i induces labeled start and target configurations C s , C t . Applying the algorithm from Section 3 to these yields a sequence of transformation steps of length O(M ).

Details for Continuous Motion
In this section, we consider the continuous geometric case in which the robots are identical geometric objects that have to move into a target configuration in the plane without overlapping at any point in time. We want to minimize the makespan under these conditions, where the velocity of each robot is bounded by 1.

A Lower Bound for Unbounded Environments
In this section we give a worst-case lower bound of Ω(N 1/4 d) for the continuous makespan where N is the number of robots. To be more precise, we construct a pair of start and target configurations of N robots as illustrated in Figure 16(a). In this instance, we have d = 2. In Theorem 14, we show that the optimal continuous makespan of this instance is in Ω(N 1/4 ), yielding the worst-case lower bound stated above.  , because asymptotically, the area gained during the movement is bounded by the product of makespan and circumference. This contradicts the lower bound stated above.
A key ingredient for the construction of the time point t ∈ [0, M ] is the fact that the distance between the centers of two robots change continuously. In fact, we know that the Euclidean distance between two centers is 2-Lipschitz, because the velocity of the robots is bounded by 1. In the following, we show that there is a time interval I = [t , t + 1 20 ] such that the area of V (p(t )) is lower bounded by 3.479 for all t ∈ I, see Lemma 21. This is larger than the area of V (p(0)) and V (p(M )) by a constant factor. Based on that, we construct the time point t ∈ [0, M ] such that the area of Conv(m 1 (t), . . . , m N (t)) is lower bounded by cN + Ω(N 3/4 ), see Lemma 22. To this end, we need to relate the area of a Voronoi region to the length of the corresponding Delaunay edges.
We find a lower bound on V (p) by lower bounding the intersections of V (p) with the Delaunay triangles that are adjacent to p, i.e., with the triangles built by the edges p 1 p 2 , . . . , p 5 p 6 and p 6 p 1 with p, see Figure 17. The area of the two triangles 1 and 6 built by p 1 p 2 and p 6 p 1 with p are minimized by assuming the configuration of Figure 17(a)+(b), i.e., for |p 1 p 2 | = |p 1 p 6 | = 2.
In the configuration of Figure 17(a)+(b), we lower bound the area of 6 ∩ V (p) as follows: We subdivide the area of 6 ∩ V (p) into three subsets A, B, and C, and lower bound the area of 6 ∩ V (p) by the sum of lower bounds for |A|, |B|, and |C|.
Let u, v, and c be the mid points of pp 6 , pp 1 , and 6 , see Figure 17(b). Furthermore, let h be the vertical side length of A and be the vertical side length of B. The interior angle of 6 at p is arccos( λ 4 ). Thus, we obtain h = sin arccos λ 4 , which implies |A| = 1 2 · λ 4 sin arccos λ Furthermore, the area of ( 2 ∪ · · · ∪ 5 ) ∩ V (p) is minimized by the configuration, implying the highest possible packing density, illustrated in Figure 17(c)+(d) for |p 2 p| = · · · = |p 6 p| = |p 2 p 3 | = · · · = |p 5 p 6 | = 2. Therefore, this area is at least 4· 1 All in all, we obtain |V (p)| ≥ 2(|A|+|B|+|C|)+ 4 Next, we prove that there is a time t with an interval I := [t , t + 1 20 ] during which the area of Conv(p, p 1 , . . . , p 6 ) is greater by a constant factor than the area of Conv(p, p 1 , . . . , p 6 ) in the target configuration. To this end, we use the following observation that is an immediate consequence of the intermediate value theorem.

Lemma 22.
There is a time t ∈ [0, M ] for which the area of Conv(m 1 (t), . . . , m N (t)) is lower-bounded by Proof. By Lemma 21, for each robot i that does not lie on the boundary of the start configuration, there is a point in time t ∈ [0, M ] such that the area of V (p(t )) is at least 3.479 for all t ∈ [t , t + 1 20 ]. The continuous pigeonhole principle yields a time point t ∈ [0, M ] such that the area of k := N 20M ∈ Θ( N M ) Voronoi regions V (q 1 (t)), . . . , V (q k (t)) is at least 3.479. For all the remaining Voronoi regions, the area is at least 2 √ 3 corresponding to the largest possible packing density as achieved in the start and target configurations.
We give an upper bound N ≤ √ 2π(2 √ N + M ) on the number of robots whose Voronoi regions are not contained in Conv(m 1 (t), . . . , m N (t)). W.l.o.g., we assume that all these regions are Voronoi regions whose area we lower bounded by 3.479. Moreover, we can assume all of these regions have zero area, i.e., ignoring when lower bounding the area of Conv(m 1 (t), . . . , m N (t)). Thus, the area of Conv(m 1 (t), . . . , m N (t)) is at least . Figure 18: An upper-bound construction for the number of robots whose Voronoi regions may intersect the boundary of smallest enclosing ball for {m 1 (t), . . . , m N (t)}. The radius of the smallest enclosing ball is upper bounded by the distance from the center to the boundary in the start configuration plus the considered makespan illustrated by the dashed circles.
It still remains to prove the upper bound N ≤ √ 2π(2 √ N + M ) on the number of robots whose Voronoi regions are not contained in Conv(m 1 (t), . . . , m N (t)). First, we observe that the length of the boundary of Conv(m 1 (t), . . . , m N (t)) is at most B := 2π(2 √ N + M ), because 2 √ N + M is an upper bound on the radius of the smallest ball containing m 1 (t), . . . , m N (t). In order to estimate N , we consider the maximal number of points from [0, B] × R ≥0 whose Voronoi regions intersect the x-axis. This number is achieved for the configuration as illustrated in Figure 18, implying N ≤ B √ 2 = √ 2π(2 √ N + M ), thus concluding the proof.
Proof. In the start configuration, the intersection of each Voronoi cell with Conv(m 1 (0), . . . , m N (0)) has an area of 2 √ 3. Thus, the convex hull of the start configuration has an area of at most 2 √ 3N . We give an upper bound on the area A gained during the motion, i.e., the area of Conv(m 1 (t), . . . , m N (t)) \ Conv(m 1 (0), . . . , m N (0)), corresponding to the gray region in Figure 18. The length of the boundary ∂Conv(m 1 (t), . . . , m N (t)) is at most 2π( √ N + M ), implying A ≤ 2π( √ N + M )M , thus concluding the proof.
Proof. Combining the bounds from Lemma 22 and Lemma 23 yields black the following , and thus M ∈ Ω(N 1/4 ), concluding the proof.

An Upper Bound for Unbounded Environments
Next we give upper bounds on the stretch and makespan for moving disks in unbounded environments. First, we show that we can achieve constant stretch for well-separated robots.
Theorem 24. If the distance between the centers of two robots of radius 1 is at least 4 in the start and target configurations, we can achieve a makespan in O(d), i.e., constant stretch.  Figure 19: A mesh size of 2 √ 2 avoids robot collisions, and the cell diagonals have length 4. Note that robots may have arbitrary shape, as the separation argument applies to their circumcircles.
Proof. We consider a grid D with mesh size 2 √ 2. In this way, as shown in Figure 19, two robots starting simultaneously from different cells and traveling along two incident edges can touch when they reach the midpoints, but do not collide. Moreover, the diagonals have length 4. By choosing a grid that has no robot center on a grid line, every cell of D contains at most one start and one target position of a robot. Additionally, we can move each robot in the start and target configuration to the center of its own cell, allowing us to to use our algorithm from Section 3. Overall, we achieve a set of trajectories with makespan in O(d).
In the remainder of this section we give an O( √ N )-approximation algorithm for the continuous makespan for these kind of well-separated arrangements, by extending the approach for discrete grids.
Again, we make use of an underlying grid with mesh size 2 √ 2. Our algorithm proceeds in three phases. (1) Moving the robots to vertices of the grid, (2) applying our O(1)-approximation for the discrete case, and (3) moving the robots from the vertices of the grid to their target positions. To ensure a O( √ N )-approximation, we move each robot center to a grid vertex within a distance of O( √ N ). Phases (1) and (3) are symmetric in the following sense. By applying the steps of phase (1) to the target configuration in reverse, we compute a grid configuration that serves as target configuration for phase (2).
Phase (1) works as follows. (1.1) We begin by sorting the N robots according to the (x, y)-lexicographical order. Then we subdivide them into √ N vertical slices, each containing at most √ N robots. To the right of every slice, we add a vertical buffer slice of width 4 √ 2 by moving all robots not yet considered by 4 √ 2 units to the right. These trajectories are used in parallel, the distance covered by each robot is in O( √ N ). The buffer slices guarantee that in all following steps, the robots in each vertical slice are independent of each other.
For (1.2), we continue by sorting the robots within the vertical slices according to the (y, x)-lexicographical order. We separate the robots by ensuring vertical distance at least 4 √ 2 between every pair of robots. This can be done by moving the robots upwards, starting from the second-to-lowest one. These trajectories can be done in parallel and the distance covered by each robot is in O( √ N ). For (1.3), we finally move each robot to the bottom-left vertex of the grid cell containing its center. Proof. Phase (1) guarantees that either the horizontal or the vertical distance between each pair of robots is at least 4 √ 2. Therefore, in each grid cell, there is at most one robot and each robot is moved to its own grid vertex. In phase (2), each robot is moved by O(d ) units, where d is the maximal distance between a robot's start and target position in the grid. As the distance each robot covers in phases (1) and (3) is O( √ N ), the distance traveled in phase (2) is in O(d + √ N ). Therefore, the trajectory set computed by the algorithm has continuous makespan in O(d + √ N ). The running time as described above is pseudopolynomial; it becomes polynomial by using standard compression techniques, e.g., by compressing large empty rectangles.

Colored and Unlabeled Disks
We can combine the positive results of the previous section with the technique of Theorem 24 to achieve the same result for colored (and in particular, unlabeled) disks.

Corollary 25.
There is an algorithm with running time O(k(mn) 1.5 log(mn) + dmn) that computes, given start and target images I s , I t , an O(1)-approximation of the optimal makespan M and a corresponding set of trajectories.
Proof. The proof proceeds analogous to Theorem 13: After computing an optimal bottleneck matching, apply Theorem 3 in the setting of Theorem 24.

Geometric Difficulties
From a practical point of view, it is also desirable to compute provably optimal trajectories for specific instances of moderate size, instead of solutions for large instances that are within a provable constant factor of the optimum. This also plays a role as a building block for other purposes, e.g., providing a formal proof of NP-hardness for parallel geometric motion planning for disks: We need to be able to establish the shape of optimal trajectories when building gadgets for a hardness construction.
However, the geometry involved in this goal is far from easy, even for an example as small as the one shown in Figure 20. This is closely related to recent work by Kirkpatrick and Liu [37], who devote a whole paper to computing optimal trajectories for two disks in an arbitrary initial and target configuration, with the objective of minimizing the total distance travelled instead of the makespan. A key insight is that optimal trajectories consist of a limited number of circular arcs. This is not necessarily the case for trajectories that minimize the makespan. Even for the seeming simplicity of our example, we do not have a proof of optimality of the trajectory T 3 shown on the right. This illustrates the difficulty of characterizing and establishing optimal trajectories that minimize the total duration of a parallel schedule, highlighting the special role of geometry for the problem.

Conclusion
We have presented progress on a number of algorithmic problems of parallel motion planning, also shedding light on a wide range of interesting open problems described in the following.
The first set of problems consider complexity. The labeled problem of Section 3 is known to be NP-complete for planar graphs. It is natural to conjecture that the geometric version is also hard. It seems tougher to characterize the family of optimal trajectories: As shown above, their nature is unclear, so membership in NP is doubtful.
A second set of questions considers the relationship between stretch factor and disk separability in the continuous setting. We believe that the upper bound of O( √ N ) on the worst-case stretch factor for dense arrangements is tight. What is the critical separability of disks for which constant stretch can be achieved? How does the stretch factor increase as a function of N below this threshold? For sparse arrangements of disks, simple greedy, straight-line trajectories between the origins and destinations of disks encounter only isolated conflicts, resulting in small stretch factors close to 1, i.e., 1 + o(1). What is the relationship between (local) density and the achievable stretch factor along the whole density spectrum?
Finally, practical motion planning requires a better handle on characterizing and computing optimal solutions for specific instances, along with lower bounds, possibly based on numerical methods and tools. Moreover, there is a wide range of additional objectives and requirements, such as accounting for acceleration or deceleration of disks, turn cost, or multi-stop tour planning. All these are left for future work.