Hardness and Approximation Algorithms for Balanced Districting Problems
Abstract
We introduce and study the problem of balanced districting, where given an undirected graph with vertices carrying two types of weights (different population, resource types, etc) the goal is to maximize the total weights covered in vertex disjoint districts such that each district is a star or (in general) a connected induced subgraph with the two weights to be balanced. This problem is strongly motivated by political redistricting, where contiguity, population balance, and compactness are essential. We provide hardness and approximation algorithms for this problem. In particular, we show NP-hardness for an approximation better than for any constant in general graphs even when the districts are star graphs, as well as NP-hardness on complete graphs, tree graphs, planar graphs and other restricted settings. On the other hand, we develop an algorithm for balanced star districting that gives an -approximation on any graph (which is basically tight considering matching hardness of approximation results), an approximation on planar graphs with extensions to minor-free graphs. Our algorithm uses a modified Whack-a-Mole algorithm [Bhattacharya, Kiss, and Saranurak, SODA 2023] to find a sparse solution of a fractional packing linear program (despite exponentially many variables) which requires a new design of a separation oracle specific for our balanced districting problem. To turn the fractional solution to a feasible integer solution, we adopt the randomized rounding algorithm by [Chan and Har-Peled, SoCG 2009]. To get a good approximation ratio of the rounding procedure, a crucial element in the analysis is the balanced scattering separators for planar graphs and minor-free graphs – separators that can be partitioned into a small number of -hop independent sets for some constant – which may find independent interest in solving other packing style problems. Further, our algorithm is versatile – the very same algorithm can be analyzed in different ways on various graph classes, which leads to class-dependent approximation ratios. We also provide a FPTAS algorithm for complete graphs and tree graphs, as well as greedy algorithms and approximation ratios when the district cardinality is bounded, the graph has bounded degree or the weights are binary. We refer the readers to the full version of the paper for complete set of results and proofs.
Keywords and phrases:
Approximation algorithms, algorithmic fairnessFunding:
Prathamesh Dharangutte: Supported by NSF IIS-2229876, DMS-2220271, DMS-2311064, CCF-2208663, CCF-2118953.Copyright and License:
2012 ACM Subject Classification:
Theory of computation Approximation algorithms analysis ; Theory of computation Graph algorithms analysisEditors:
Mark BunSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
In this paper we study the problem of balanced districting, where we are given an undirected graph where vertices carry two types of weights (population, resource types, etc) and the goal is to find vertex disjoint districts such that each district is a connected induced subgraph with the total weights to be -balanced – each type of weight is at least of total weight of the district. We aim to maximize the total weights of the vertex disjoint balanced districts.
This problem is an abstraction of many real world scenarios of districting where contiguity/connectivity, population/resource type balance and compactness are desirable properties. For example, in political redistricting, several towns are grouped into a state legislative district or a congressional district. Balancedness requires that each district maintains a sufficient fraction of each (political or demographic) group, which is essential for several reasons. First, voter turnout rates sharply increase if the anticipated election outcome is expected to be a close tie. [14] Thus a balanced district would motivate and raise voter turnout rates. Additionally, balancedness ensures that each political group has the opportunity to elect a candidate of their choice, in compliance with the Voting Rights Act of 1965 [41] and other amendments [25]. This principle also helps to prevent the tipping point in racial segregation, where residents of one demographic group start to leave a district once their population falls below a certain threshold. [56, 52] Connectivity or contiguity, on the other hand, demands that each district be geographically contiguous, – in the language of graph theory that the vertices corresponding to the towns form an induced connected subgraph. This requirement is enforced by most state laws and is a standard practice in general. Many states also have a compactness rule [2], which refers to the principle that the constituents residing within an electoral district should live as near to one another as possible. It often manifests into a preference for regular geometric shapes or high roundness (small ratio of circumference and total area).
The problem of redistricting also appears in many other scenarios such as districting for public schools, sales and services, healthcare, police and emergency services [13], and logistics operations [46]. In addition, the balanced districting problem is of interest in a broader range of applications for resource allocation. For example modern computing infrastructure such as cloud computing provides services to a dynamic set of customers with diverse demands. Customer applications may have a variety of requirements on the combination of different resources (such as CPU cycles, memory, storage, or access of special hardware) that can be summarized by the balanced requirement.
Due to the importance of the problem, redistricting has been studied in a computational sense for schools and elections, that dates back to the 1960s. [42] Since then, an extensive line of work (see [8] for a survey) has formulated the redistricting task as an optimization problem with a certain set of objectives. A lot of existing work considers the geographical map as input and comes up with practical methods and software implementations that generate feasible districting plans. We will survey such work in Section 1.2. However, most previous works focus on optimizing a single desirable property alone (e.g., connectivity [1], or balance [38]), or optimizing average aggregated scores combining multiple objectives of the districts [31]. In contrast, our problem formulation takes these objectives as hard constraints and optimizes the total population that satisfies them. There are several merits of this formulation. First, it offers interpretable, fair, worst-case guarantees for the identified districts. Districting problem is a multi-faceted one. With multiple criteria taken into consideration, it feels ill-fit if only one criterion is singled out as the optimization objective. Furthermore, an average quality guarantee does not provide meaningful utility at the individual district level, and aggregated scores offer limited insight into each objective. Second, a districting solution has a consequential nature and should be taken with a dynamic and time-evolving perspective. Once a new districting plan is in place, residents naturally respond to the algorithm output, resulting in changes in the population distributions. One prominent example is the tipping point theory in racial segregation mentioned above. Optimizing a single balancedness score can still lead to many districts falling below such a tipping point, exacerbating segregation. With this in consideration it is important to keep balancedness as a hard constraint, which hopefully facilitates district stability and integration. With -balanced property as a requirement, a graph partitioning into vertex disjoint -balanced districts is not always possible. For example, if the total weight is not -balanced, some districts have to be unbalanced no matter how the districts are defined. Therefore, we aim to maximize the total weights in balanced districts.
In this paper we focus on the graph theoretic perspective of the redistricting problem. We abstract the input as a graph where vertices represent natural geographical entities/blocks (e.g., townships) and edges of two vertices represent geographical adjacency/contiguity. We focus on two important quality considerations namely connectivity and balanceness, and we maximize coverage, i.e., the total weight (population) covered by balanced districts. In addition, we also consider compactness, which in our setting leads to preference of districts as low-diameter subgraphs. An important case studied in this paper is to consider a balanced star district, which consists of a center vertex as well as a set of neighboring blocks all adjacent to . We also consider districts of bounded rank for a constant – where a district has at most vertices.
1.1 Our Results and Technical Overview
We report a systematic study of the balanced districting problem on both hardness results and approximation algorithms. Our goal is to dissect the problem along different types of input graph topologies (general graphs, planar graphs, bounded degree graphs, complete graphs, tree graphs, etc), district types (e.g., arbitrarily connected districts, star districts, or bounded rank-), and weight assumptions (arbitrary weights, binary weights). A brief summary of our results can be found in Table 1.
|
|
Result | |||
|---|---|---|---|---|---|
| General | arbitrary/star | -hard for -approx | |||
| Max degree | arbitrary/star |
|
|||
| Planar with | star with rank-3 | -hard | |||
| Complete or Tree | arbitrary/star | -hard | |||
| Planar | star | -approx | |||
| -Minor-Free | star | -approx, | |||
| Outerplanar | star | -approx | |||
| General | star | -approx | |||
| Complete graph | arbitrary/star | FPTAS -approx | |||
| Tree | arbitrary/star | FPTAS -approx | |||
| General | rank-2 | polynomially solvable | |||
| General | rank-, | -approx | |||
| Bounded degree | star | -approx | |||
| General (binary weights) | star | -approx |
Complexity and Challenges
There are three elements in the balanced districting problem that make it challenging and interesting, from a technical perspective: 1) connectivity – the induced subgraph of a district is connected 2) packing and coverage maximization – no vertex belongs to two districts and we maximize the total weights of included vertices; 3) balancedness – the two types of weights in a district need to be roughly balanced. These elements are shared with a number of well known hard problems, suggesting that our problem is also computationally challenging. For example, the exact set cover problem asks if there is a perfect coverage and packing in a set cover instance. The packing element is shared with maximum independent set problem – a vertex included will forbid its neighbors to be included. And the balancedness is shared with subset sum problem. Therefore by using the hardness of these problems we can show hardness and hardness of approximation of the balanced districting problem for a variety of graph classes. The hardness of the balanced districting problem is immediately shown by a reduction from exact set cover problem. By a reduction from maximum independent set problem, we can show that the balanced districting problem does not have an approximation of for any constant in a general graph of vertices unless and is -hard for bounded degree graphs. From a reduction from the planar 1-in-3SAT problem, the balanced districting problem is -hard for a planar graph, and even if each district has at most three vertices – a crucial condition since the problem can be solved exactly by maximum weighted matching if each district is only allowed to have two vertices. If the input graph is a tree or a complete graph – extremely simple topologies for which many -hard problems can be solved in polynomial time, the balanced districting problem is still -hard due to the balancedness requirement by a reduction from subset sum. All of the hardness proofs hold even if we limit the output districts to be only of a star topology. This set of hardness results can be seen from the Top section of Table 1.
Greedy Methods on Special Cases
On the positive side, it is natural to ask if existing techniques for solving or approximating these related problems can be borrowed for the balanced districting problem. The answer turns out to be often “not really”, even if we only look for balanced star districts. The additional requirements in our problem often break some crucial steps. For example, the problem of maximum independent set has an easy lower bound if the input graph is sparse (or planar) – thus a simple random greedy algorithm with conflict checking gives an easy constant approximation algorithm. But such lower bound no longer holds true for our problem if the weights are not balanced. Even for star districts, when the maximum degree is not bounded by a constant, the number of potential balanced districts can be exponential in , the size of the network.
We show in Section 6 of the full version that ideas using a greedy approach and local search method give us approximation algorithms, but only for very special cases. Namely, if the districts have rank-, we can try the greedy maximum hypergraph matching to have a -approximation to the optimal solution. When all weights are binary ( or ), we can use a greedy algorithm with local search to get a approximation to the optimal -balanced star districting solution. Similarly, if the graph has maximum degree , we can get a -approximation for -balanced star districting.
LP Framework and Rounding
To really tackle the problem with arbitrary weights, for districts that are not limited by rank and graphs beyond constant bounded degree, we first examine what we can do with complete graphs or tree graphs – here the topology is made some of the simplest possible, and we would like to address the challenge from packing and balancedness. In this setting we can obtain FPTAS for both complete graphs and tree graphs (Section 5)– although the algorithm is much more involved due to the additional requirements of packing and connectivity (for tree graphs, as connectivity is trivial for complete graphs). Our FPTAS uses a dynamic programming technique that maintains one district’s possible weights and introduces a new prioritized trimming method to approximate weights while ensuring that the resulting district satisfies the -balanced constraint and approximates the optimal weight. The FPTAS for the complete graph is later used as a subroutine for solving the relaxed LP formulation for other graph settings.
Beyond complete graphs and tree graphs, we develop a general framework (Section 4) that produce approximation algorithms for star districts on different types of graphs. All these algorithms start from a relaxed linear program where we formulate a variable for each potential balanced star district , which can take non-integer values and for all districts that share the same vertex, the sum of their variables is at most . Despite potentially exponentially many variables (and constraints in the dual program) that preclude standard solutions, we adapt the whack-a-mole framework [9], which can be seen as a lazy multiplicative weight update algorithm, on dual variables, and we design a “separation oracle” that selects a violating constraint in the dual program in time polynomial in and that can significantly improve the solution. Consequentially, the linear program can be solved in time polynomial in and up to any precision and the number of non-zero primal variables (i.e., the candidate balanced star districts) is also polynomial. Intriguingly, our separation oracle is based on our FPTAS on compute graphs for balanced districting.
Now we will round the fractional solution to an integer solution and in the process we may lose an approximation factor. We use a simple randomized rounding method where we sort the districts with non-zero values in decreasing order of total weight, and flip a coin with probability proportional to to include a potential district , if does not overlap with any districts already included. In order to bound the loss of quality in the rounding process, we need to upper bound the correlation of the variables, namely, sum of for all pairs of overlapping districts . These are the (fractional) districts that have to be dropped due to conflict. To establish the approximation factor, we wish to bound the total sum of correlation by a factor multiplied with the total sum over all possible districts – exactly the optimal LP solution. We show that this ratio is , which immediately gives an -approximate solution for star districts on a general graph. Notice that this is tight due to the hardness of approximation results.
Due to the strong motivation from political redistricting and resource allocation considering geographical proximity/constraints, the planar graph is of particular interest to us. One of the main technical contributions is a polylogarithmic-approximation algorithm for balanced star districting on planar graphs and related algorithms for minor-free graphs and outer planar graphs. For a planar graph we now adopt a balanced planar separator and use a divide-and-conquer analysis. Namely, we only need to analyze the overlapping districts with at least one of them including vertices in the separator. Now a crucial observation is, if we can partition the planar separator into 5-hop independent sets, then we can decompose the total sum of correlation by the independent sets – fix an 5-hop independent set , two star districts that touch different vertices in are disjoint and star districts that share the same vertex in have their total district value bounded by due to the primal constraint. This allows us to upper bound the correlation term for the star districts touching the separator by a factor of of the sum of with districts on the separator. Recursively, this gives an factor loss in the final approximation.
We remark that the above analysis asks for a new property of a balanced separator – one that can be decomposed into a small number of 5-hop independent sets (called a “scattering” separator) – and we do not care about the size of the separator. This is possible for a planar graph if we use the fundamental cycle separator, which is composed of two shortest paths, and thus at most -hop independent sets. For a -minor-free graph with as a graph of vertices, we show the existence of a similar separator, which can be decomposed into 5-hop independent sets. Thus the final approximation ratio for -minor-free graphs is . For outer planar graphs, we can skip the recursive step and work with graph partitions with -hop independent sets and get -approximation. We believe that this technique of using balanced scattering separators is interesting in its own and may find additional applications in other problems with some packing (non-overlapping) requirement on the solution.
On general graphs, the formulated linear program could have an integrality gap as large as . Since our rounding algorithm turns an fractional solution into an integral one, this barrier unavoidably blends into our analysis to the proposed rounding algorithm, producing an provable bound. However, by thinking about this argument contrapositively, an upper bound to our rounding algorithm leads to the integrality gap of the formulated LP, which could be an interesting takeaway. On the other hand, we also show that there are planar graphs (specifically grid graphs) such that our rounding algorithm produces a constant approximation ratio. This observation suggests that we cannot hope for a PTAS using this approach, even on planar graphs.
1.2 Related work
To the best of our knowledge, this paper is the first to study the balanced districting problem. Below, we briefly survey related problems and explore their potential connections to ours.
Districting
Our problem is connected to computational (re)districting for schools and elections, which dates back to the 1960s. [42] Since then, an extensive line of work (see [8] for a survey) has formulated the redistricting task as an optimization problem with a certain objective and constraints, e.g,. balancedness, contingency, or compactness. Our redistricting problem focuses on optimizing the population in balanced and contiguous districts. One concept related to our notion of balance is competitiveness. Recent work introduces vote-band metrics [30], which require a certain fraction of votes to fall within a specified range (e.g., 45-55%) for competitive elections. Subsequently, [22] also adopt similar notions called -Vote-Band Competitive which is equivalent to our -balancedness by setting . While related, our work diverges technically, offering both hardness and algorithmic results for several common graphs. [30] empirically evaluates ensemble methods for district distributions. [22] explored the hardness and heuristic algorithms for maximizing the number of districts meeting the target competitiveness constraints, with additional requirements that all districts have roughly the same population limited compactness consideration.
One approach treats contiguity as a transportation cost and designs linear programming models to minimize this total cost [1, 34, 24]. Interestingly, the fair clustering problem can also be viewed as optimizing contingency [12, 19, 44, 18]. Other research focuses on optimizing compactness scores [6, 45, 48, 43] or using Voronoi or power diagrams with some variant of -means [57, 26, 27, 35]. Finally, another line of work optimizes balance scores [38]. These approaches differ from ours in that they treat specific aspects of districting (contiguity, compactness, balance, etc.) as objectives, rather than maximizing the population that meets these criteria.
Besides the optimization approach, another popular approach uses sampling to generate a distribution over districts and create a collection of district plans for selection. [3, 23] One widely used method is ReCom [31], an MCMC algorithm. However, these approaches may suffer from slow mixing times and lack formal guarantees. [51] Finally, several papers take a fair division approach [49, 53, 29]. The problem is quite different, however, as fairness is defined concerning parties (types) and the number of seats they would win (i.e., the number of districts where they would have a majority) compared to other districts.
Algorithms
Beyond districting problems, as outlined in the technical overview, our problem connects to several classical algorithm problems. If we only want to maximize the population of a single connected and balanced district, the problem becomes a balanced connected subgraph problem [10, 11, 50]. However, the previous work in this area typically considers unit weights for either type, which does not adequately represent the districting problem that operates in an aggregated block-level setting. Our problem can be seen as packing subgraphs on graph [28], e.g., edges (maximum matching), triangles [47], circles [32]. Finally, we note a line of work on a balanced, connected graph partition [17, 21], and balanced bin-packing problem [33], which, however, aim to generate a partition where each component has similar weights.
1.3 Open Problems
As the first work to formally study the balanced districting problem in this formulation, our work leaves a number of interesting open problems for future work. Obviously it is good to close the gap of approximation and hardness for different families of graphs. Our results are tight for general connected districts on complete graphs and tree graphs, as well as star districts on general graphs, but leave gaps for other settings. We conjecture that there exists an algorithm with a constant approximation factor for -balanced star districting on planar graphs. It would also be interesting to develop algorithms to go beyond star districts, i.e., -hop graphs for a constant or the more general setting of connected districts. We remark that the scattering separator can be modified to handle -hop graphs but we need an efficient separation oracle. We consider two types of weights/populations and generalizing the problem and solutions to three or more colors would be interesting and is currently widely open. We remark that the PTAS algorithm for complete graphs is specific for two weights/colors. Finally, an interesting future direction would be to develop algorithms that also demand approximate population equality among districts.
2 Preliminaries
Let be an undirected graph where we call the vertices blocks. A district is a subset of blocks where the induced subgraph is connected. If there exists a block that is a neighbor of every other block in , then we say is a star district and is a center of . The rank of a district is the number of blocks in . A (partial) districting is a collection of disjoint districts. That is, where whenever . Notice that a districting is not necessarily a partitioning of the graph, i.e., not all blocks are included in the districts.
-Balanced Districting Problems
There are two communities or commodities of interest. Let the functions represent the population of each community or weight of each commodity on each vertex. The weight of a block is defined to be . Let be a district. By a natural extension we define for and accordingly as the weight of the district . Finally, given a districting , we define to be the total weight.
Given a constant , we say that a district is -balanced if
| (1) |
is a -balanced districting if all districts are -balanced. Notice that if the total weights in the graph are not -balanced, we cannot hope to include all blocks in -balanced districts. Given , a graph , and functions and , the problem -Balanced-Districting is subjected to find any -balanced districting that maximizes . That is, we wish to maximize population covered in the -balanced districts.
We will investigate several variants to the problem. A lot of our results concern restricting the output districting to be star shaped, respecting the need for compactness of the districts. We also consider districts of bounded rank if every district has at most vertices with assumed to be a constant. In general we consider the weights of the vertices to be arbitrary integer values. A special case is when all weights are uniform (binary) – each vertex has only one non-zero weight type, either , or , .
Let be any variant to the -balanced districting problem. A districting is said to be feasible on if satisfies all districting type constraints, but not necessarily to have its weight maximized. Any districting with the maximum possible total weight is said to be optimal. We say that a feasible districting is an -approximated solution if , where is any optimal districting.
Graph Types
A graph is said to be planar if there exists an embedding of all vertices to the Euclidean plane such that all edges can be drawn without intersections other than the endpoints. A face of an planar embedding is a connected region separated by the embedded edges. is said to be outerplanar if there exists an embedding of such that there is a face containing all vertices. Often this face is assumed to be the outer face. A graph is said to be a minor of if is isomorphic to the graph obtained by a sequence of vertex deletions, edge contractions, and edge deletions from . We say that is -minor-free if does not have as its minor.
3 Hardness Results
We first present hardness results for a variety of -balanced districting problems with increasing restrictions on the parameters. The proof of Theorem 4 is deferred to Appendix A while rest of the proofs appear in Appendix A of the full version.
Theorem 1.
The -balanced districting problem is -hard, for both the case when the districts are connected subgraphs and when the districts are required to be stars.
Theorem 1 uses a reduction from the ExactSetCover problem. The ExactSetCover problem remains -hard even when each set has exactly three elements and no element appears in more than three sets [37], or when each element appears in exactly three sets [39]. Therefore, if we limit that each district has at most four blocks, the problem remains -hard. In the following we show that the problem remains hard if each district has at most three blocks and the graph is planar. Notice that if each district has at most two blocks, the problem can be solved by maximum matching in polynomial time.
Theorem 2.
The -balanced districting problem is -hard, when is a planar graph with maximum degree , each district has rank- (i.e., with at most three blocks), and the districts must be stars.
The proof of the above claim uses reduction from planar 1-in-3SAT. We show next that the problem on a complete graph or a tree remains hard. This reduction uses the problem of subset sum.
Theorem 3.
The -balanced districting problem is -hard for any , when is a complete graph or a tree. This holds for both the case when the the districts are connected subgraphs and when the districts are required to be stars.
Last, we show hardness of approximation by a reduction from the maximum independent set problem.
Theorem 4.
The -balanced districting problem does not have an -approximation (for any constant ) in a general graph unless . On a graph with maximum degree , one cannot approximate the -balanced districting problem within a factor of assuming , and assuming the Unique Games Conjecture (UGC). Even if is a constant, the problem is -hard. These statements hold when the districts must be stars and when the centers of the stars are limited to a subset of vertices.
4 An Algorithm for -Balanced Star Districting
In this section, we give an approximation algorithm to the -balanced star districting problem. The algorithm is based on a multiplicative weights update approach of solving packing-covering linear programs [54, 4, 9] and then apply a randomized rounding procedure [15]. Interestingly, the same algorithm achieves different approximation guarantees on different classes of the graphs, summarized in the following theorem.
Theorem 5.
Let be a graph with weight functions and . There exists a polynomial time algorithm that computes a -balanced star districting , with the following guarantee:
-
(1)
For any general graph , is an -approximate solution.
-
(2)
If is planar, then is an -approximate solution.
-
(3)
If is an -minor-free graph with , then is an -approximate solution.
-
(4)
If is a tree or an outerplanar graph, then is an -approximate solution.
In Section 4.1, we formulate the problem as a linear program. In Section 4.2 of the full version, we describe how to apply the Whack-a-Mole algorithm [9] (with our own separation oracle) that obtains an -approximate solution to the linear program in polynomial time. In Section 4.2, we apply Chan and Har-Peled’s randomized rounding technique [15], showing that bounding the pairwise product terms leads to the desired approximation factors. We show that there is a bound of on any graph in Section 4.3, To bound the pairwise product terms, we introduce the scattering separators in Section 4.4. These scattering separators are useful for analyzing the approximation ratios for planar graphs and for minor-free graphs. For the graph classes that is a subclass to the planar graphs, we provide tailored-but-better analysis for outerplanar graphs and for trees , which conclude the proof of Theorem 5. We refer readers to the full version for complete details of these cases.
4.1 LP Formulation
We formulate the -balanced districting problem as a linear program. For each -balanced star district , we define a variable indicating whether or not this district is chosen. Thus, the integer linear program for can be defined as:
| maximize | (2) | |||
| subject to | ||||
To give an approximate solution to the above integer linear program, we follow the standard recipe that solves the relaxed linear program first and then apply a randomized rounding algorithm.
Equivalent Relaxed Linear Program
In order to solve the relaxed linear program of Equation 2, we use weighted variables: for each district , we define variable . Hence, the equivalent linear program (and its dual linear program) we will be solving can be described as follows.
We note that the total number of primal variables (i.e., the number of potential -balanced star districts) could be exponentially many in terms of the graph size. However, due to the special structure of this problem, seeking for an approximate solution does not require the participation of every variable. We summarize the result of solving the relaxed linear program as Theorem 6 below.
Theorem 6.
Given a graph and a precision parameter , there exists an algorithm that returns an -approximate solution to the above linear program in time. Moreover, there are at most non-zero terms among the returned primal variables .
Implementing the Separation Oracle
In order to have a polynomial time algorithm, we want an efficient separation oracle. The following lemma (Lemma 7) reduces the task of finding a violating district to solving the -balanced star districting problem in a complete graph.
Lemma 7.
Given an input instance , re-weighted values , and dual variables for each vertex , there exists an algorithm that either reports that , or returns a -balanced district such that and , where is a violating -balanced district with the maximum value. The algorithm runs in time.
4.2 The Randomized Rounding Algorithm
We use a randomized rounding technique modified from [15, Section 4.3]. Intuitively, the algorithm maintains a set of non-overlapping districts, which is initially empty, and keeps adding districts into .
The rounding algorithm is described as follows. Let be the output of an approximate solution to LP. Let be the support of the solution. The algorithm first sorts all non-zero valued districts according to their weights , from the largest to the smallest. Let be a parameter to be decided later. For each district , with probability , the algorithm adds into as long as there is no district in overlapping with . 111The randomization step appears to be necessary since there is an example where deterministic rounding incurs a large approximation factor (examples and details in the full version). The algorithm outputs after all districts in are considered. The necessity of scaling the non-zero variables by comes from the analysis of expected total weight in . In an actual implementation of the algorithm, one can make the algorithm oblivious of , by iteratively testing on different values of for and then picking the largest weighted districting among the returned ones.
Analysis
The output of the algorithm can be seen as a random variable. Let be the total weight of the districts within . A straightforward analysis (see Lemma 13) shows that
The right hand side of the above expression contains a weighted correlation term. The technique by Chan and Har-Peled [15] transforms the above weighted correlation terms into unweighted ones. They mentioned that, a desired -approximate solution can be achieved, as long as for any -thresholded subset , the total unweighted correlation terms between overlapped districts can be upper bounded by the sum over all primal variables within the subset:
| (3) |
The above condition implies the following (see Lemma 14):
where is the optimal value of the LP problem. Thus, is an -approximate solution in expectation. We remark that due to the factor appearing in the denominator, this linear program with randomized rounding approach (although not contradicting) is unlikely to achieve a PTAS.
The above analysis to the rounding algorithm enables the approach of seeking a suitable value, such that Equation 3 holds. The rest of the section focuses on providing upper bounds of on various classes of input graphs.
4.3 An -Approximation Analysis for General Graphs
In this section, we show that the algorithm achieves an -approximate ratio on any graph, by giving an upper bound for the randomized rounding algorithm (with proof in the full version).
Lemma 8.
Let be any graph. Let be any feasible solution to the linear program. Then,
Remarks: Integrality Gap
We remark that this algorithm is achieving nearly the best approximation factor since it is -hard to have approximation factor of for any constant . Related to this, we would like to examine the potential loss in different steps of our algorithm. In the first step, we relax the integer linear program to a linear program with variables taking real numbers, the integrality gap refers to the ratio of the optimal fractional solution to the optimal integer solution (since we consider maximization problem the optimal fractional solution is no smaller than the optimal integer solution). In the second step of our algorithm, we use randomized rounding to turn the fractional solution back to a feasible integer solution. We call the ratio between the sum of products between overlapping districts’ primal variables and the sum of all variables to be the rounding gap, i.e., . The rounding gap can be used to upper bound the loss of solution quality when we turn the fractional solution to a feasible integer solution using the randomized rounding algorithm.
Necessarily, a large integrality gap implies a large rounding gap for sure. Specifically, let be the rounding gap. The analysis to our rounding algorithm guarantees the existence of an integral solution within a factor of from the optimal fractional solution, which implies an integrality gap of at most when setting . Thus if the integrality gap is large, we cannot have a small rounding gap. Interestingly, the above discussion, combined with Lemma 8 implies that the integrality gap of the natural LP formulation for the star districting problem is at most .
Next we show that our LP relaxation could have a large integrality gap of for a general graph. We use a reduction from -uniform hypergraph matching problem to our -balanced star districting problem. Let be the given -uniform hypergraph – a hypergraph such that all its hyperedges have size . We construct a graph by creating additional vertices for each hyperedge. These vertices have heavy weights, say and vertices from have weights . For each hypergraph (which is a subset of vertices), we connect all vertices to the corresponding vertex in . It ensures that there is an one-to-one correspondence between hyperedges of to -balanced star districts on . Now, the relaxed linear program for will be equivalent (with an extra -factor in the objective function) to a fractional hypergraph matching. Thus, the integrality gap of -uniform hypergraph matching [36, 16] can be transferred to our LP formulation – specifically, the construction in [16] via projective planes leads to an integrality gap.
On the other hand, we do observe planar graph instances with a constant rounding gap with an at most integrality gap (refer the full version). Again, this does not eliminate the possibility of achieving PTAS, but it suggests a conjecture that we are unlikely to obtain a PTAS for planar graphs using the current analysis.
4.4 Scattering Separators
Let us now introduce the scattering separators, which is useful for the divide and conquer framework for upper bounding the approximation ratio of the randomized rounding procedure.
Definition 9.
Let be a graph and let be any subset. We say that is:
-
-scattered, if can be partitioned into at most subsets with each being a -hop independent set222We say that is a -hop independent set (with respect to the graph ) if for all pairs of distinct vertices and , the shortest distance between and is at least on .;
-
-orderly-scattered, if there exists a way to partition into a sequence of at most subsets , where each is a -hop independent set after the removal of all previous subsets .
Definition 10.
Let be a graph, , and . A -scattering separator is a subset of vertices such that (1) is -orderly-scattered, and (2) is a balanced separator of , that is, the largest connected component of has at most vertices.
We remark that a -scattered set is also -orderly-scattered. This orderly-scattered definition are useful when we remove subsets of vertices sequentially – they are used in the analysis of, for example, planar graphs and minor-free graphs. On the other hand, for some graph class such as outerplanar graphs it suffices to use -scattered sets within the analysis.
The scattering separators are useful in the -balanced districting problem for . To justify this, suppose that we have a -hop independent set . Any star district contains at most one vertex in . If two star districts contain different vertices of , the two districts must be disjoint. Thus we partition the pairs of overlapping districts by whether they overlap with or not, and if so, which vertex of . The following fact can be easily verified.
Fact 10.
Let be a -hop independent set. Consider a fixed district . Assume there is a district that overlaps with and touches , i.e., and . Since the diameter of is at most , we know that . Further, if overlaps with two other star districts with both centers of in , then , have the same center.
Fix a district . Since all other districts that overlap with contains (at most) the same vertex in , these primal variable values add up to at most . This implies that, removing from charges at most one copy of . If we are able to show that the entire vertex set is a -orderly-scattered, then we obtain a desired value for Equation 3. However, we do not know if such a constant can be achieved for planar graphs. Fortunately, using the idea of balanced separators, we are able to achieve a polylogarithmic approximate solution.
Lemma 11.
If and all its subgraphs have a -scattering separator, then the districting obtained from executing the algorithm on is a -approximated solution.
Proof.
Let be a -scattering separator of . Let be the set of all districts. Then, all summands of the form where and can be also split into three parts:
-
(1)
: one of the centers or is in .
-
(2)
but : one of their common vertices is in .
-
(3)
None of the above.
For , we denote the sum of products of those overlapping districts of case . For case (1), using the given constraint that is -orderly-scattered, we consider removing each set one at a time from the graph in the increasing order of . For each , without loss of generality, we may swap the role of and such that for each summand we have . By applying Section 4.4 (with ), we know that for each district , all districts that overlap with with are actually centered at the same vertex. This implies that the sum of all such values will be at most by the primal constraint. Hence, the contribution of any district under case (1) for in the graph is at most
By summing over all and over all the sets , we have .
For case (2), the terms can be partitioned according to the common vertex :
Again here we use the property that for any fixed vertex , the sum of the primal variables for star districts containing sum up to be at most , i.e., . Further, fix an , any star district includes at most one vertex from .
For case (3) we can delegate the cost to the recursion. Notice that, all districts whose centers are in will not participate in case (3). Hence, when considering each of the connected component in , all the districts (after chopping off vertices in ) are still connected and are star-shaped.
Since is a balanced separator, the divide and conquer analysis has at most layers. Thus, the sum over all products of overlapping districts is bounded by at most times the sum .
5 FPTAS for General Districting on Complete Graphs and Trees
In this section, we present FPTAS for complete graphs and trees with weighted blocks. The algorithms here find -balanced, connected districts that can be more than a star graph. Further, for complete graphs and trees, the LP-based algorithm in the previous section achieves -approximation ratio while the algorithms in this section achieves a ratio of .
5.1 Complete Graph
Let be a complete graph with functions of weights and . Because we can merge two adjacent -balanced districts on into a single -balanced district as shown in Section 5.1, the -balanced districting problem on complete graphs can be reduced to obtaining one -balanced district, described as the following:
Fact 11 (Mergeable Property).
Assume and are disjoint districts and is connected. If and are both -balanced, then is also a -balanced district.
Complete-Graph--Balanced-Districting
Input: Let be a complete graph of blocks and function of weights and .
Goal: Obtaining a subset such that the total weight is maximized subjected to the -balanced condition:
| and | (4) |
The following theorem gives an FPTAS using dynamic programming (Algorithm 1).
Theorem 12.
There exists an FPTAS algorithm solving Complete-Graph--Balanced-Districting so that for all , , and complete graph of nodes with functions of weights , the algorithm outputs an -approximation in time where .
We defer a detailed proof of Theorem 12 to Appendix C and here give a high level idea. One naive approach involves creating a complete list of potential subset sum values, denoted as and outputting the largest -balanced one. While this approach finds an optimal solution, it is not necessarily efficient, as can be exponentially large. Similar to the knapsack problem or subset sum problem, one may use a bucketing idea to trim the list, keeping only one value when several are close to each other. However, the -balanced constraint posts a challenge for the algorithm – for example, if an trimming algorithm keeps partial districts during the iterations, these partial districts may not always remain -balanced resulting in a poor approximation ratio. To address this, we design a prioritized trimming process that prioritizes subsets satisfying the -balanced condition in Equation 4 such that any -balanced district in would have an approximated district in our trimmed list.
Specifically, given , we say is an -approximate of if where . Let and be two linear functions on . We say is -dominated by if for , and , and is a -trimmed of if and for each there exists which -approximates and -dominates . The key observation is that if is -balanced satisfying Equation 4 with and -dominates , is also -balanced. A similar argument holds for . This observation suggests that when trimming multiple nearby values, we keep the one that optimizes (and ) that ensures the existence of -balanced approximated values. Therefore, we can find an -approximated solution if we can compute -trimmed of all possible subset sum values . Moreover, because and are linear, we can use dynamic programming to sequentially and efficiently compute and that is -trimmed and -trimmed of all possible subset sum values on the first blocks respectively. While Algorithm 1 only returns the size of our approximated solution , we can use an additional factor to store the set for each in to recover our approximated optimal districting.
Finally, we note that our prioritized trimming that ensures both inequalities in Equation 4: one through prioritized the other through exhausting cases of or . However, we cannot extend this approach to non-binary color settings. Instead, if we allow relaxing -balanced constraint to -balanced district with slightly larger than , the standard bucketing algorithm mentioned above can directly work even for the non-binary color setting.
Adapting to the Star Districting Setting
We note that this dynamic programming approach also works for star districts on tree graph and yields an FPTAS. Similar to the arbitrary districts setting, we consider three cases for each : the absent case where is not included in any district; the consolidating where is in a star district that is contained in its descendants; the incomplete case where is the center of a star district that is incomplete.
References
- [1] Maxwell Allman, Itai Ashlagi, Irene Lo, Juliette Love, Katherine Mentzer, Lulabel Ruiz-Setz, and Henry O’Connell. Designing school choice for diversity in the San Francisco unified school district. In Proceedings of the 23rd ACM Conference on Economics and Computation, EC ’22, pages 290–291, New York, NY, USA, 2022. Association for Computing Machinery. doi:10.1145/3490486.3538271.
- [2] Micah Altman. Traditional districting principles: judicial myths vs. reality. Social Science History, 22(2):159–200, 1998.
- [3] Micah Altman and Michael P McDonald. Bard: Better automated redistricting. Journal of Statistical Software, 42:1–28, 2011.
- [4] Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory of Computing, 8(6):121–164, 2012. doi:10.4086/toc.2012.v008a006.
- [5] Per Austrin, Subhash Khot, and Muli Safra. Inapproximability of vertex cover and independent set in bounded degree graphs. Theory Comput., 7(1):27–43, 2011. doi:10.4086/TOC.2011.V007A003.
- [6] Assaf Bar-Natan, Lorenzo Najt, and Zachary Schutzman. The gerrymandering jumble: map projections permute districts’ compactness scores. Cartography and Geographic Information Science, 47(4):321–335, 2020.
- [7] Cristina Bazgan, Bruno Escoffier, and Vangelis Th. Paschos. Poly-APX- and PTAS-Completeness in standard and differential approximation. In Rudolf Fleischer and Gerhard Trippen, editors, Algorithms and Computation, pages 124–136, Berlin, Heidelberg, 2005. Springer Berlin Heidelberg.
- [8] Amariah Becker and Justin Solomon. Redistricting algorithms. In Moon Duchin and Olivia Walch, editors, Political Geometry: Rethinking Redistricting in the US with Math, Law, and Everything In Between, pages 303–340. Springer International Publishing, Cham, 2022. doi:10.1007/978-3-319-69161-9_16.
- [9] Sayan Bhattacharya, Peter Kiss, and Thatchaphol Saranurak. Dynamic algorithms for packing-covering LPs via multiplicative weight updates. In Nikhil Bansal and Viswanath Nagarajan, editors, Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms, SODA 2023, Florence, Italy, January 22-25, 2023, pages 1–47. SIAM, 2023. doi:10.1137/1.9781611977554.CH1.
- [10] Sujoy Bhore, Sourav Chakraborty, Satyabrata Jana, Joseph S B Mitchell, Supantha Pandit, and Sasanka Roy. The balanced connected subgraph problem. Discrete Appl. Math., 319:111–120, October 2022. doi:10.1016/J.DAM.2020.12.030.
- [11] Sujoy Bhore, Satyabrata Jana, Supantha Pandit, and Sasanka Roy. The balanced connected subgraph problem for geometric intersection graphs. Theoretical Computer Science, 929:69–80, 2022. doi:10.1016/J.TCS.2022.06.030.
- [12] Matteo Böhm, Adriano Fazzone, Stefano Leonardi, and Chris Schwiegelshohn. Fair clustering with multiple colors. arXiv 2002.07892, February 2020.
- [13] Victor Bucarey, Fernando Ordóñez, and Enrique Bassaletti. Shape and balance in police districting. In Applications of location analysis, pages 329–347. Springer, 2015.
- [14] Leonardo Bursztyn, Davide Cantoni, Patricia Funk, Felix Schönenberger, and Noam Yuchtman. Identifying the effect of election closeness on voter turnout: Evidence from Swiss referenda. J. Eur. Econ. Assoc., 22(2):876–914, June 2023.
- [15] Timothy M. Chan and Sariel Har-Peled. Approximation algorithms for maximum independent set of pseudo-disks. Discret. Comput. Geom., 48(2):373–392, 2012. doi:10.1007/S00454-012-9417-5.
- [16] Yuk Hei Chan and Lap Chi Lau. On linear and semidefinite programming relaxations for hypergraph matching. Math. Program., 135(1-2):123–148, 2012. doi:10.1007/S10107-011-0451-5.
- [17] Yong Chen, Zhi-Zhong Chen, Guohui Lin, Yao Xu, and An Zhang. Approximation algorithms for maximally balanced connected graph partition. Algorithmica, 83(12):3715–3740, 2021. doi:10.1007/S00453-021-00870-3.
- [18] Anshuman Chhabra, Karina Masalkovaitė, and Prasant Mohapatra. An overview of fairness in clustering. IEEE Access, 9:130698–130720, 2021. doi:10.1109/ACCESS.2021.3114099.
- [19] Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii. Fair clustering through fairlets. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 5036–5044, 2017.
- [20] Miroslav Chlebík and Janka Chlebíková. Approximation hardness for small occurrence instances of NP-Hard problems. In Algorithms and Complexity, pages 152–164. Springer Berlin Heidelberg, 2003. doi:10.1007/3-540-44849-7_21.
- [21] Janka Chlebíková. Approximating the maximally balanced connected partition problem in graphs. Information Processing Letters, 60(5):225–230, 1996.
- [22] Gabriel Chuang, Oussama Hanguir, and Clifford Stein. Drawing Competitive Districts in Redistricting. In Guy N. Rothblum, editor, 5th Symposium on Foundations of Responsible Computing (FORC 2024), volume 295 of Leibniz International Proceedings in Informatics (LIPIcs), pages 7:1–7:22, Dagstuhl, Germany, 2024. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.FORC.2024.7.
- [23] Carmen Cirincione, Thomas A Darling, and Timothy G O’Rourke. Assessing South Carolina’s 1990s congressional districting. Political Geography, 19(2):189–211, 2000.
- [24] S Clarke and J Surkis. An operations research approach to racial desegregation of school systems. Socio-Economic Planning Sciences, 1(3):259–272, 1968.
- [25] Jeanne Clelland, Haley Colgate, Daryl DeFord, Beth Malmskog, and Flavia Sancier-Barbosa. Colorado in context: Congressional redistricting and competing fairness criteria in Colorado. Journal of Computational Social Science, 5(1):189–226, 2022. doi:10.1007/S42001-021-00119-7.
- [26] Vincent Cohen-Addad, Philip N Klein, and Neal E Young. Balanced power diagrams for redistricting. arXiv preprint arXiv:1710.03358, 2017. arXiv:1710.03358.
- [27] Vincent Cohen-Addad, Philip N. Klein, and Neal E. Young. Balanced centroidal power diagrams for redistricting. In Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL ’18, pages 389–396, New York, NY, USA, 2018. Association for Computing Machinery. doi:10.1145/3274895.3274979.
- [28] Gerard Cornuéjols, David Hartvigsen, and William Pulleyblank. Packing subgraphs in a graph. Operations Research Letters, 1(4):139–143, 1982. doi:10.1016/0167-6377(82)90016-5.
- [29] Jessica De Silva, Brady Gales, Bryson Kagy, and David Offner. An analysis of a fair division protocol for drawing legislative districts. arXiv preprint arXiv:1811.05705, 2018.
- [30] Daryl DeFord, Moon Duchin, and Justin Solomon. A computational approach to measuring vote elasticity and competitiveness. Statistics and Public Policy, 7(1):69–86, 2020.
- [31] Daryl DeFord, Moon Duchin, and Justin Solomon. Recombination: A family of Markov chains for redistricting. Harvard Data Science Review, 3(1):3, 2021.
- [32] Guoli Ding and Wenan Zang. Packing cycles in graphs. Journal of Combinatorial Theory, Series B, 86(2):381–407, 2002. doi:10.1006/JCTB.2002.2134.
- [33] E. Falkenauer and A. Delchambre. A genetic algorithm for bin packing and line balancing. In Proceedings 1992 IEEE International Conference on Robotics and Automation, pages 1186–1192 vol.2, 1992.
- [34] Allen D Franklin and Ernest Koenigsberg. Computed school assignments in a large district. Operations Research, 21(2):413–426, 1973. doi:10.1287/OPRE.21.2.413.
- [35] Roland G Fryer Jr and Richard Holden. Measuring the compactness of political districting plans. The Journal of Law and Economics, 54(3):493–535, 2011.
- [36] Zoltán Füredi, Jeff Kahn, and Paul D. Seymour. On the fractional matching polytope of a hypergraph. Comb., 13(2):167–180, 1993. doi:10.1007/BF01303202.
- [37] Michael R Gary and David S Johnson. Computers and intractability: A guide to the theory of NP-completeness, 1979.
- [38] Nabeel Gillani, Doug Beeferman, Christine Vega-Pourheydarian, Cassandra Overney, Pascal Van Hentenryck, and Deb Roy. Redrawing attendance boundaries to promote racial and ethnic diversity in elementary schools. Educational Researcher, 52(6):348–364, 2023.
- [39] Teofilo F Gonzalez. Clustering to minimize the maximum intercluster distance. Theor. Comput. Sci., 38:293–306, January 1985. doi:10.1016/0304-3975(85)90224-5.
- [40] Magnús Halldórsson and Jaikumar Radhakrishnan. Greed is good: approximating independent sets in sparse and bounded-degree graphs. In Proceedings of the twenty-sixth annual ACM symposium on Theory of Computing, STOC ’94, pages 439–448, New York, NY, USA, May 1994. Association for Computing Machinery. doi:10.1145/195058.195221.
- [41] J Gerald Hebert. The Realist’s Guide to Redistricting: Avoiding the Legal Pitfalls. American Bar Association, 2010.
- [42] Sidney Wayne Hess, JB Weaver, HJ Siegfeldt, JN Whelan, and PA Zitlau. Nonpartisan political redistricting by computer. Operations Research, 13(6):998–1006, 1965.
- [43] Matt Jacobs and Olivia Walch. A partial differential equations approach to defeating partisan gerrymandering. arXiv preprint arXiv:1806.07725, 2018.
- [44] Xinrui Jia, Kshiteej Sheth, and Ola Svensson. Fair colorful -center clustering. In Integer Programming and Combinatorial Optimization, pages 209–222. Springer International Publishing, 2020. doi:10.1007/978-3-030-45771-6_17.
- [45] Hai Jin. Spatial optimization methods and system for redistricting problems. PhD thesis, University of South Carolina, 2017.
- [46] Jörg Kalcsics, Stefan Nickel, and Michael Schröder. Towards a unified territorial design approach—applications, algorithms and GIS integration. Top, 13:1–56, 2005.
- [47] Peter Keevash and Benny Sudakov. Packing triangles in a graph and its complement. Journal of Graph theory, 47(3):203–216, 2004. doi:10.1002/JGT.20031.
- [48] Myung Jin Kim. Optimization approaches to political redistricting problems. PhD thesis, The Ohio State University, 2011.
- [49] Zeph Landau, Oneil Reid, and Ilona Yershov. A fair division solution to the problem of redistricting. Social Choice and Welfare, 32(3):479–492, 2009. doi:10.1007/S00355-008-0336-6.
- [50] Timothée Martinod, Valentin Pollet, Benoît Darties, Rodolphe Giroudeau, and J-C König. Complexity and inapproximability results for balanced connected subgraph problem. Theoretical Computer Science, 886:69–83, 2021. doi:10.1016/J.TCS.2021.07.010.
- [51] Elle Najt, Daryl DeFord, and Justin Solomon. Complexity and geometry of sampling connected graph partitions. arXiv preprint arXiv:1908.08881, 2019.
- [52] Gary Orfield and Danielle Jarvie. Black segregation matters: School resegregation and black educational opportunity. Civil Rights Project-Proyecto Derechos Civiles, 2020.
- [53] Wesley Pegden, Ariel D Procaccia, and Dingli Yu. A partisan districting protocol with provably nonpartisan outcomes. arXiv preprint arXiv:1710.08781, 2017. arXiv:1710.08781.
- [54] Serge A. Plotkin, David B. Shmoys, and Éva Tardos. Fast approximation algorithms for fractional packing and covering problems. In 32nd Annual Symposium on Foundations of Computer Science, San Juan, Puerto Rico, 1-4 October 1991, pages 495–504. IEEE Computer Society, 1991. doi:10.1109/SFCS.1991.185411.
- [55] Alex Samorodnitsky and Luca Trevisan. A PCP characterization of NP with optimal amortized query complexity. In Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, STOC ’00, pages 191–199, New York, NY, USA, 2000. Association for Computing Machinery. doi:10.1145/335305.335329.
- [56] Thomas C Schelling. Dynamic models of segregation. Journal of mathematical sociology, 1(2):143–186, 1971.
- [57] James B Weaver and Sidney W Hess. A procedure for nonpartisan districting: Development of computer techiques. Yale Law Journal, 73(2):288–308, 1963.
- [58] David Zuckerman. Linear degree extractors and the inapproximability of max clique and chromatic number. In Jon M. Kleinberg, editor, Proceedings of the 38th Annual ACM Symposium on Theory of Computing, Seattle, WA, USA, May 21-23, 2006, pages 681–690. ACM, 2006. doi:10.1145/1132516.1132612.
Appendix A Omitted Proofs from Section 3
Throughout, we refer type-1 vertices as vertices that have non-zero weight and zero weight of and type-2 vertices as vertices with non-zero weight and zero weight of .
See 4
Proof.
We take an instance of maximum independent set problem and turn it into an instance of -balanced districting problem. Given a graph , and . Denote by the maximum degree of . For each vertex , we create a vertex in graph with type-1 weight of and type-2 weight of . There are such “type-1 vertices”. We also have a set of “type-2 vertices” with type-2 weight of and type-1 weight of . Each vertex has exactly type-2 neighbors. If has an edge in , in , and share one type-2 vertex, which corresponds to the edge between in . See Figure 1. If the degree of is less than , the corresponding vertex in may have some dangling (degree-1) type-2 vertices. The total number of type-2 vertices is . Thus the total number of vertices in is and the number of edges in is . In order for a type-1 vertex to be covered, all its type-2 neighbors must be used. Thus a maximum independent set in means we can cover all corresponding vertices of in as well as all their type-2 neighbors, leading to a total coverage population of . Similarly, if we can find a -balanced districting problem in , the type-1 vertices that are covered in -balanced districts cannot share any common type-2 neighbors, and therefore the corresponding vertices in must be independent. This reduction works when the district must be a star.
This reduction shows hardness of approximation, as an -approximation for maximum independent set means an -approximation for -balanced districting problem, for any . The maximum independent set cannot be approximated by a factor of for any constant on general graph [7, 58]. If we have an approximation algorithm for the districting problem with approximation factor with as the number of vertices in the districting graph , by the reduction and this gives an for the maximum independent set problem on , since , which is impossible unless .
As the maximum degree in both and is , approximating the balanced districting problem in with some factor depending on gives the same approximation factor for the maximum independent set in . For bounded degree graphs, the maximum independent set has a constant approximation [40], but is APX-complete [20] and cannot expect an approximation ratio better than unless [55]. Further, assuming the Unique Games Conjecture, one cannot approximate the maximum independent set problem within a factor of [5]. These (conditional) hardness results extend to balanced star districting problem on graphs with maximum degree . This finishes the argument.
Appendix B Separation Oracle and Randomized Rounding
B.1 Separation Oracle: Proof of Lemma 7
See 7
Proof.
We generalize Algorithm 1 to accommodate dual variables. In particular, we will maintain a candidate list with the following property: For any -balanced district , there exists a district such that (1) is an -approximate of , and (2) . Recall that from the proof of Theorem 12 we defined that a pair of numbers is an -approximate to if both . The above property suggests that we add a third dimension for to the list, and modify the trimming algorithm slightly – we will not trim the solution if their values are too far from each other.
We now prove that the final list contains a -balanced district that is a weakly violating constraint. Consider the population of commodities of . Without loss of generality, assume that . Then, by the property we stated above, there exists a district that satisfies:
| (maintained by Algorithm 1) | ||||
| (whenever ) |
The above inequality shows that is indeed -balanced. Furthermore, we have . Thus, we are able to show that is a weakly violating constraint:
| ( is strongly violating) | ||||
| () | ||||
| On the other hand, we have | ||||
certifying that the output satisfies all the constraints from the lemma statement.
Let us now analyze the runtime of the algorithm. It suffices to analyze the number of scales at the new dimension. Since each value is at least and is at most , the number of scales in the third dimension can be bounded by
Together with the analysis in Theorem 12, the runtime of this generalized algorithm for the complete graph, including maintaining a solution, is .
B.2 Proof of Randomized Rounding
Lemma 13.
The randomized rounding algorithm from the fractional LP solution produces an expected weight for output districting as
Proof.
Consider a district with non-zero value , it is selected into only if two events happen: (1) the coin flip with probability turns out to be true; and (2) all the districts with value at least are not included in – their coin flips are false. The probability of both events happening is
Now, by linearity of expectation, we have
For any , let be the set of all districts whose weight is at least . The following lemma connects the unweighted correlation between overlapped districts and the expected approximation ratio to the randomized rounding algorithm.
Lemma 14.
Let be a fixed value. Suppose that for all ,
| (5) |
then .
Proof.
Appendix C Omitted Proofs from Section 5
See 12
Proof of Theorem 12.
Given an arbitrary ordering on blocks, let be the set of all values that can be obtained by selecting some subset of the first blocks ,
We use induction to show that in Algorithm 1 is an -trimmed of for all . The base case is trivially holds as . Suppose is an -trimmed of . For any with , by the induction hypothesis, there exists which -approximates and -dominates . Because ,
On the other hand, by the definition of , for any there exists so that
Combining these two proves that -approximates and -dominates . The identical argument holds for all . Thus, we show is an -trimmed of for all . Similar argument applies to .
Let be the optimal -balanced value in . Suppose . Since is -trimmed to , there exists that -approximates and -dominates . Because -approximates , the approximation guarantee holds, . Now we show is also -balanced. Because is -balanced and -dominates ,
Moreover, because and -approximates , we have
where the last inequality holds because . Combining these two, we have and completing the proof. Similarly, if , there exists an -approximation and -balanced solution in .
The running time of -th iteration is which can be bounded as the following. Consider a geometric grid with vertices in . Because and no two points in can be in a same rectangle after trimming, the size of is bounded by the size of grid . Therefore, the running time of Algorithm 1 is . The additional in the theorem statement is to reconstruct the set.
