Network Flow-Based Refinement for Multilevel Hypergraph Partitioning

We present a refinement framework for multilevel hypergraph partitioning that uses max-flow computations on pairs of blocks to improve the solution quality of a $k$-way partition. The framework generalizes the flow-based improvement algorithm of KaFFPa from graphs to hypergraphs and is integrated into the hypergraph partitioner KaHyPar. By reducing the size of hypergraph flow networks, improving the flow model used in KaFFPa, and developing techniques to improve the running time of our algorithm, we obtain a partitioner that computes the best solutions for a wide range of benchmark hypergraphs from different application areas while still having a running time comparable to that of hMetis.


Introduction
Given an undirected hypergraph H = (V, E), the k-way hypergraph partitioning problem is to partition the vertex set into k disjoint blocks of bounded size (at most 1 + ε times the average block size) such that an objective function involving the cut hyperedges is minimized. Hypergraph partitioning (HGP) has many important applications in practice such as scientific computing [12] or VLSI design [43]. VLSI design in particular is a field where small improvements can lead to significant savings [56]. It is well known that HGP is NP-hard [38], which is why practical applications mostly use heuristic multilevel algorithms [11,13,25,26]. These algorithms successively contract the hypergraph to obtain a hierarchy of smaller, structurally similar hypergraphs. After applying an initial partitioning algorithm to the smallest hypergraph, contraction is undone and, at each level, a local search method is used to improve the partitioning induced by the coarser level. All state-of-the-art HGP algorithms [2,4,7,16,28,31,32,33,48,51,52,54] either use variations of the Kernighan-Lin (KL) [34,49] or the Fiduccia-Mattheyses (FM) heuristic [19,46], or simpler greedy algorithms [32,33] for local search. These heuristics move vertices between blocks in descending order of improvements in the optimization objective (gain) and are known to be prone to getting stuck in local optima when used directly on the input hypergraph [33]. The multilevel paradigm helps to some extent, since it allows a more global view on the problem on the coarse levels and a very fine-grained view on the fine levels of the multilevel hierarchy. However, the performance of move-based approaches degrades for hypergraphs with large nets.


Preliminaries

An undirected hypergraph H = (V, E, c, ω) consists of a set of vertices V, a set of nets E (each net being a subset of V), a vertex weight function c, and a net weight function ω. The vertices of a net are called its pins, I(v) denotes the set of nets incident to a vertex v, and d(v) := |I(v)| is the degree of v. A k-way partition Π = {V_1, ..., V_k} divides V into k disjoint, non-empty blocks; it is ε-balanced if each block V_i satisfies the balance constraint c(V_i) ≤ (1 + ε)⌈c(V)/k⌉. For a net e, Λ(e) := {V_i | V_i ∩ e ≠ ∅} denotes its connectivity set and λ(e) := |Λ(e)| its connectivity. A net is called a cut net if λ(e) > 1. Given a k-way partition Π of H, the quotient graph Q := (Π, {(V_i, V_j) | ∃e ∈ E : {V_i, V_j} ⊆ Λ(e)}) contains an edge between each pair of adjacent blocks.
The k-way hypergraph partitioning problem is to find an ε-balanced k-way partition Π of a hypergraph H that minimizes an objective function over the cut nets for some ε. Several objective functions exist in the literature [5,38]. The most commonly used cost functions are the cut-net metric cut(Π) := Σ_{e ∈ E'} ω(e) and the connectivity metric (λ − 1)(Π) := Σ_{e ∈ E'} (λ(e) − 1) ω(e) [1,17], where E' is the set of all cut nets. In this paper, we use the (λ − 1)-metric. Optimizing both objective functions is known to be NP-hard [38]. Hypergraphs can be represented as bipartite graphs [29]. In the following, we use nodes and edges when referring to graphs and vertices and nets when referring to hypergraphs. In the bipartite graph G* = (V ∪ E, F), the vertices and nets of H form the node set, and for each vertex v and each net e ∈ I(v) we add an edge (e, v) to G*. The edge set F is thus defined as F := {(e, v) | e ∈ E, v ∈ e}. Each net in E therefore corresponds to a star in G*.
Let G = (V, E, c, ω) be a weighted directed graph. We use the same notation as for hypergraphs to refer to node weights c, edge weights ω, and node degrees d(v). Furthermore, Γ(u) := {v : (u, v) ∈ E} denotes the neighbors of node u. A path P = ⟨v_1, ..., v_k⟩ is a sequence of nodes such that each pair of consecutive nodes is connected by an edge. A strongly connected component C ⊆ V is a maximal set of nodes such that for each u, v ∈ C there exists a path from u to v. A topological ordering is a linear ordering ≺ of V such that every directed edge (u, v) ∈ E implies u ≺ v. A set of nodes B ⊆ V is called a closed set iff there are no outgoing edges leaving B, i.e., if the conditions u ∈ B and (u, v) ∈ E imply v ∈ B. A subset S ⊂ V is called a node separator if its removal divides G into two disconnected components.
A flow network N = (V, E, c) is a directed graph with two distinguished nodes s and t and an edge capacity function c. An (s, t)-flow is a function f that satisfies the capacity constraint f(e) ≤ c(e) for all e ∈ E and the conservation constraint that, at every node other than s and t, incoming and outgoing flow are equal. The value |f| of a flow is the total flow leaving s. The residual capacity of an edge is its capacity minus the flow it carries (plus any flow on the reverse edge), and the graph N_f containing all edges with positive residual capacity is the residual network. An (s, t)-cut (or cut) is a bipartition (S, V \ S) of a flow network N with s ∈ S ⊂ V and t ∈ V \ S. The capacity of an (s, t)-cut is defined as Σ_{e ∈ E'} c(e), where E' := {(u, v) ∈ E : u ∈ S, v ∈ V \ S}. The max-flow min-cut theorem states that the value |f| of a maximum flow is equal to the capacity of a minimum cut separating s and t [21].

Flows on Hypergraphs.
While flow-based approaches have not yet been considered as refinement algorithms for multilevel HGP, several works deal with flow-based hypergraph min-cut computation. The problem of finding minimum (s, t )-cuts in hypergraphs was first considered by Lawler [36], who showed that it can be reduced to computing maximum flows in directed graphs. Hu and Moerder [29] present an augmenting path algorithm to compute a minimum-weight vertex separator on the star-expansion of the hypergraph. Their vertex-capacitated network can also be transformed into an edge-capacitated network using a transformation due to Lawler [37]. Yang and Wong [57] use repeated, incremental max-flow min-cut computations on the Lawler network [36] to find ε-balanced hypergraph bipartitions. Solution quality and running time of this algorithm are improved by Lillis and Cheng [39] by introducing advanced heuristics to select source and sink nodes. Furthermore, they present a preflow-based [22] min-cut algorithm that implicitly operates on the star-expanded hypergraph. Pistorius and Minoux [45] generalize the algorithm of Edmonds and Karp [18] to hypergraphs by labeling both vertices and nets. Liu and Wong [40] simplify Lawler's hypergraph flow network [36] by explicitly distinguishing between graph edges and hyperedges with three or more pins. This approach significantly reduces the size of flow networks derived from VLSI hypergraphs, since most of the nets in a circuit are graph edges. Note that the above-mentioned approaches to model hypergraphs as flow networks for max-flow min-cut computations do not contradict the negative results of Ihler et al. [30], who show that, in general, there does not exist an edge-weighted graph G = (V, E) that correctly represents the min-cut properties of the corresponding hypergraph H = (V, E).
Flow-Based Graph Partitioning. Flow-based refinement algorithms for graph partitioning include Improve [6] and MQI [35], which improve the expansion or conductance of bipartitions. MQI also yields a small improvement when used as a postprocessing technique on hypergraph bipartitions initially computed by hMetis [35]. FlowCutter [24] uses an approach similar to that of Yang and Wong [57] to compute graph bisections that are Pareto-optimal with regard to cut size and balance. Sanders and Schulz [47] present a flow-based refinement framework for their direct k-way graph partitioner KaFFPa. The algorithm works on pairs of adjacent blocks and constructs flow problems such that each min-cut in the flow network is a feasible solution with regard to the original partitioning problem.
KaHyPar. Since our algorithm is integrated into the KaHyPar framework, we briefly review its core components. While traditional multilevel HGP algorithms contract matchings or clusterings and therefore work with a coarsening hierarchy of O(log n) levels, KaHyPar instantiates the multilevel paradigm in the extreme n-level version, removing only a single vertex between two levels. After coarsening, a portfolio of simple algorithms is used to create an initial partition of the coarsest hypergraph. During uncoarsening, strong localized local search heuristics based on the FM algorithm [19,46] are used to refine the solution. Our work builds on KaHyPar-CA [28], which is a direct k-way partitioning algorithm for optimizing the (λ − 1)-metric. It uses an improved coarsening scheme that incorporates global information about the community structure of the hypergraph into the coarsening process.

The Flow-Based Improvement Framework of KaFFPa
We discuss the framework of Sanders and Schulz [47] in greater detail, since our work makes use of the techniques proposed by the authors. For simplicity, we assume k = 2. The techniques can be applied on a k-way partition by repeatedly executing the algorithm on pairs of adjacent blocks. To schedule these refinements, the authors propose an active block scheduling algorithm, which schedules blocks as long as their participation in a pairwise refinement step results in some changes in the k-way partition.
An ε-balanced bipartition of a graph G = (V, E, c, ω) is improved with flow computations as follows. The basic idea is to construct a flow network N based on the induced subgraph G[B], where B ⊆ V is a set of nodes around the cut of G. The size of B is controlled by an imbalance factor ε' := αε, where α is a scaling parameter that is chosen adaptively depending on the result of the min-cut computation. If the heuristic found an ε-balanced partition using ε', the cut is accepted and α is increased to min(2α, α'), where α' is a predefined upper bound. Otherwise it is decreased to max(α/2, 1). This scheme continues until a maximal number of rounds is reached or a feasible partition that did not improve the cut is found. The flow network N is derived from G[B] by connecting all border nodes δB ∩ V_1 to the source s and all border nodes δB ∩ V_2 to the sink t using directed edges with an edge weight of ∞. By connecting s and t to the respective border nodes, it is ensured that edges incident to border nodes, but not contained in G[B], cannot become cut edges. For α = 1, the size of B thus ensures that the flow network N has the cut property, i.e., each (s, t)-min-cut in N yields an ε-balanced partition of G with a possibly smaller cut. For larger values of α, this does not have to be the case.
After computing a max-flow in N, the algorithm tries to find a min-cut with better balance. This is done by exploiting the fact that one (s, t)-max-flow contains information about all (s, t)-min-cuts [44]. More precisely, the algorithm uses the 1-1 correspondence between (s, t)-min-cuts and closed sets containing s in the Picard-Queyranne-DAG D_{s,t} of the residual graph N_f [44]. First, D_{s,t} is constructed by contracting each strongly connected component of the residual graph. Then the following heuristic (called most balanced minimum cuts) is repeated several times using different random seeds. Closed node sets containing s are computed by sweeping through the nodes of D_{s,t} in reverse topological order (e.g., computed using a randomized DFS). Each closed set induces a differently balanced min-cut, and the one with the best balance (with respect to the original balance constraint) is used as the resulting bipartition.

Hypergraph Max-Flow Min-Cut Refinement
In the following, we generalize the flow-based refinement algorithm of KaFFPa to hypergraph partitioning.

Hypergraph Flow Networks
The Liu-Wong Network [40]. Given a hypergraph H = (V, E, c, ω) and two distinct vertices s and t, an (s, t)-min-cut can be computed by finding a minimum-capacity cut in the following flow network N: For each multi-pin net e ∈ E with |e| ≥ 3, add two bridging nodes e′ and e″ as well as a bridging edge (e′, e″) with capacity ω(e) to N. For each pin p ∈ e, add two edges (p, e′) and (e″, p) with capacity ∞. For each two-pin net e = (u, v) ∈ E, add two bridging edges (u, v) and (v, u) with capacity ω(e). The flow network of Lawler [36] does not distinguish between two-pin and multi-pin nets, which increases the size of the network by two nodes and three edges per two-pin net. Figure 1 shows an example of the Lawler and Liu-Wong hypergraph flow networks as well as of our network described in the following paragraph.
Removing Low Degree Hypernodes. We further decrease the size of the network by using the observation that the problem of finding an (s, t)-min-cut of H can be reduced to finding a minimum-weight (s, t)-vertex-separator in the star-expansion, where the capacity of each star-node is the weight of the corresponding net and all other nodes (corresponding to vertices in H) have infinite capacity [29]. Since the separator has to be a subset of the star-nodes, it is possible to replace any infinite-capacity node by adding a clique between all adjacent star-nodes without affecting the separator. The key observation now is that an infinite-capacity node v with degree d(v) induces 2d(v) infinite-capacity edges in the Lawler network [36], while a directed clique between the d(v) adjacent star-nodes induces d(v)(d(v) − 1) infinite-capacity edges, which is at most 2d(v) for d(v) ≤ 3. Thus we can reduce the number of nodes and edges of the Liu-Wong network as follows. Before applying the transformation on the star-expansion of H, we remove all infinite-capacity nodes v corresponding to hypernodes with d(v) ≤ 3 that are not incident to any two-pin nets and add a clique between all star-nodes adjacent to v. In case v was a source or sink node, we create a multi-source multi-sink problem by adding all adjacent star-nodes to the set of sources resp. sinks [20].
Reconstructing Min-Cuts. After computing an (s, t)-max-flow in the Lawler or Liu-Wong network, an (s, t)-min-cut of H can be computed by a BFS in the residual graph starting from s. Let S be the set of nodes corresponding to vertices of H reached by the BFS. Then (S, V \ S) is an (s, t)-min-cut. Since our network does not contain low degree hypernodes, we use the following lemma to compute an (s, t)-min-cut of H:

Lemma. Let f be a maximum (s, t)-flow and let S be the set of nodes reachable from s in the residual graph N_f. If a hypernode v is contained in S, then N_f also contains a path from s to a bridging node e″ of a net e ∈ I(v).

Proof.
Since v ∈ S, there has to be some path s ⤳ v in N_f. By definition of the flow network, this path can either be of the form P_1 = ⟨s, ..., e″, v⟩ or P_2 = ⟨s, ..., e′, v⟩ for some bridging nodes e″, e′ corresponding to nets e ∈ I(v). In the former case we are done, since e″ ∈ P_1. In the latter case, the existence of the edge (e′, v) ∈ E_f implies that there is a positive flow f(v, e′) > 0 over the edge (v, e′) ∈ E. Due to flow conservation at v, there then exists at least one net ê ∈ I(v) with positive incoming flow f(ê″, v) > 0. Hence the reverse residual edge (v, ê″) is contained in E_f, and extending the path s ⤳ v by this edge yields a path from s to the bridging node ê″. Furthermore, this allows us to search for more balanced min-cuts using the Picard-Queyranne-DAG of N_f as described in Section 2.3. By the definition of closed sets it follows that if a bridging node e″ is contained in a closed set C, then all nodes v ∈ Γ(e″) (which correspond to vertices of H) are also contained in C. Thus we can use the respective bridging nodes e″ as representatives of removed low degree hypernodes.

Constructing the Hypergraph Flow Problem
In the following, we distinguish between the set of internal border nodes −→B ⊆ B, i.e., nodes of the B-corridor that are contained in border nets, and the set of external border nodes ←−B ⊆ V \ B, i.e., nodes outside the corridor that are contained in border nets. A net is internal if all of its pins are contained in the subhypergraph H_B induced by B, external if none of them are, and a border net otherwise. A flow problem on H_B has the cut property if each of its min-cuts induces a k-way partition Π_f of H whose cut is at most the cut of the current partition Π; thus it has to hold that cut(Π_f) ≤ cut(Π). While external nets are not affected by a max-flow computation, the max-flow min-cut theorem [21] ensures the cut property for all internal nets. Border nets, however, require special attention. Since a border net e is only partially contained in H_B, it will remain connected to the blocks of its external border nodes in H. In case external border nodes connect e to both V_i and V_j, it will remain a cut net in H even if it is removed from the cut-set in Π_f. It is therefore necessary to "encode" information about external border nodes into the flow problem.

The KaFFPa Model and its Limitations. In KaFFPa, this is done by directly connecting the internal border nodes −→B to s and t. This approach can also be used for hypergraphs: in the hypergraph flow problem F_G, the source s is connected to all internal border nodes of block V_i and the sink t to all internal border nodes of V_j using infinite-capacity edges. However, this locks all internal border nodes in their respective blocks, since none of them can change its side in any (s, t)-min-cut, which restricts the space of feasible solutions. This limitation becomes increasingly relevant for hypergraphs with large nets as well as for partitioning problems with small imbalance ε, since large nets are likely to be only partially contained in H_B and tight balance constraints enforce small B-corridors. While the former is a problem only for HGP, the latter also applies to GP.
A More Flexible Model. We propose a more general model that allows an (s, t)-max-flow computation to also cut through border nets by exploiting the structure of hypergraph flow networks. Instead of directly connecting s and t to the internal border nodes −→B and thus preventing all min-cuts in which these nodes switch blocks, we conceptually extend H_B to a subhypergraph ←−H_B that contains all external border nodes ←−B and all border nets. The key insight now is that by using the flow network of ←−H_B and connecting s resp. t to the external border nodes, we get a flow problem F_H that does not lock any node v ∈ V_B in its block, since none of these nodes is directly connected to either s or t. Due to the max-flow min-cut theorem [21], this flow problem furthermore has the cut property, since all border nets of H_B are internal nets of ←−H_B.

Implementation Details
Since KaHyPar is an n-level partitioner, its FM-based local search algorithms are executed each time a vertex is uncontracted. To prevent expensive recalculations, it therefore uses a cache to maintain the gain values of FM moves throughout the n-level hierarchy [2]. In order to combine our flow-based refinement with FM local search, we not only perform the moves induced by the max-flow min-cut computation but also update the FM gain cache accordingly.
Since it is not feasible to execute our algorithm on every level of the n-level hierarchy, we use an exponentially spaced approach that performs flow-based refinements after uncontracting i = 2^j vertices for j ∈ ℕ⁺. This way, the algorithm is executed more often on smaller flow problems than on larger ones. To further improve the running time, we introduce the following speedup techniques:

S1: We modify active block scheduling such that after the first round the algorithm is only executed on a pair of blocks if at least one execution using these blocks improved the connectivity or imbalance of the partition on previous levels.

S2: For all levels except the finest level: skip flow-based refinement if the cut between two adjacent blocks is less than ten.

S3: Stop resizing the corridor B if the current (s, t)-cut did not improve the previously best solution.

Experimental Evaluation
We implemented the max-flow min-cut refinement algorithm in the n-level hypergraph partitioning framework KaHyPar (Karlsruhe Hypergraph Partitioning). The code is written in C++ and compiled using g++-5.2 with flags -O3 -march=native. The latest version of the framework is called KaHyPar-CA [28]. We refer to our new algorithm as KaHyPar-MF. Both versions use the default configuration for community-aware direct k-way partitioning.

Instances.
All experiments use hypergraphs from the benchmark set of Heuer and Schlag [28], which contains 488 hypergraphs derived from four benchmark sets: the ISPD98 VLSI Circuit Benchmark Suite [3], the DAC 2012 Routability-Driven Placement Contest [55], the University of Florida Sparse Matrix Collection [15], and the international SAT Competition 2014 [9]. Sparse matrices are translated into hypergraphs using the row-net model [12], i.e., each row is treated as a net and each column as a vertex. SAT instances are converted into three different representations: for literal hypergraphs, each boolean literal is mapped to one vertex and each clause constitutes a net [43], while in the primal model each variable is represented by a vertex and each clause is represented by a net. In the dual model the opposite is the case [41]. All hypergraphs have unit vertex and net weights. Table 1 gives an overview of the different benchmark sets used in the experiments. The full benchmark set is referred to as set A. We furthermore use the representative subset of 165 hypergraphs proposed in [28] (set B) and a smaller subset consisting of 25 hypergraphs (set C), which is used to devise the final configuration of KaHyPar-MF. Basic properties of set C can be found in Table 10 in Appendix C. Unless mentioned otherwise, all hypergraphs are partitioned into k ∈ {2, 4, 8, 16, 32, 64, 128} blocks with ε = 0.03. For each value of k, a k-way partition is considered to be one test instance, resulting in a total of 175 instances for set C, 1155 instances for set B, and 3416 instances for set A. Furthermore we use 15 graphs from [42] to compare our flow model F_H to the KaFFPa [47] model F_G. Table 11 in Appendix C summarizes the basic properties of these graphs, which constitute set D. We compare KaHyPar-MF to both the recursive bisection variant (hMetis-R) and the direct k-way version (hMetis-K) of hMetis [32,33], as well as to PaToH 3.2 [12]. These HGP libraries were chosen because they provide the best solution quality [2,28].
The partitioning results of these tools are already available from http://algo2.iti.kit.edu/schlag/sea2017/. For each partitioner except PaToH, the results summarize ten repetitions with different seeds for each test instance and report the arithmetic mean of the computed cut and running time as well as the best cut found. Since PaToH ignores the random seed if configured to use the quality preset, the results contain both the result of a single run of the quality preset (PaToH-Q) and the average over ten repetitions using the default configuration (PaToH-D). Each partitioner had a time limit of eight hours per test instance. We use the same number of repetitions and the same time limit for our experiments with KaHyPar-MF.
In the following, we use the geometric mean when averaging over different instances in order to give every instance a comparable influence on the final result. In order to compare the algorithms in terms of solution quality, we perform a more detailed analysis using improvement plots. For each algorithm, these plots relate the minimum connectivity of KaHyPar-MF to the minimum connectivity produced by the corresponding algorithm on a per-instance basis. For each algorithm, these ratios are sorted in decreasing order. The plots use a cube root scale for the y-axis to reduce right skewness [14] and show the improvement of KaHyPar-MF in percent (i.e., 1 − (KaHyPar-MF/algorithm)) on the y-axis. A value below zero indicates that the partition of KaHyPar-MF was worse than the partition produced by the corresponding algorithm, while a value above zero indicates that KaHyPar-MF performed better than the algorithm in question. A value of zero implies that the partitions of both algorithms had the same solution quality. Values above one correspond to infeasible solutions that violated the balance constraint. In order to include instances with a cut of zero in the results, we set the corresponding cut values to one for ratio computations.

Evaluating Flow Networks, Models, and Algorithms
Flow Networks and Algorithms. To analyze the effects of the different hypergraph flow networks, we compute five bipartitions for each hypergraph of set B with KaHyPar-CA using different seeds. Statistics of the hypergraphs are shown in Table 2. The bipartitions are then used to generate hypergraph flow networks for a corridor of size |B| = 25 000 hypernodes around the cut. Primal and literal SAT instances are the largest in terms of both numbers of nodes and edges. A high average vertex degree combined with low average net sizes leads to subhypergraphs H_B containing many small nets, which then induce many nodes and (infinite-capacity) edges in N_L. Dual instances with low average degree and large average net size, on the other hand, lead to smaller flow networks. For VLSI instances (DAC, ISPD) both average degrees and average net sizes are low, while for SPM hypergraphs the opposite is the case. This explains why SPM flow networks have significantly more edges, despite the number of nodes being comparable in both classes.
As expected, the Lawler network N_L induces the biggest flow problems. Looking at the Liu-Wong network N_W, we can see that distinguishing between graph edges and nets with |e| ≥ 3 pins has an effect for all hypergraphs with many small nets (i.e., DAC, ISPD, Primal, Literal). While this technique alone does not improve dual SAT instances, we see that the combination of the Liu-Wong approach and our removal of low degree hypernodes in N_Our reduces the size of the networks for all instance classes except SPM. Both techniques only have a limited effect on these instances, since both hypernode degrees and net sizes are large on average. Since our flow problems are based on B-corridor induced subhypergraphs, N¹_Our additionally models single-pin border nets more efficiently as described in Section 3.2. This further reduces the network sizes significantly. As expected, the reduction in the numbers of nodes and edges is most pronounced for hypergraphs with low average net sizes, because these instances are likely to contain many single-pin border nets.
To see how these reductions in network size translate into improved running times of max-flow algorithms, we use these networks to create flow problems using our flow model F_H and compute min-cuts using two highly tuned max-flow algorithms, namely the BK algorithm [10] and the incremental breadth-first search (IBFS) algorithm [23]. These algorithms were chosen because they performed best in preliminary experiments [27]. We then compare the speedups of these algorithms when executed on N_W, N_Our, and N¹_Our to the execution on the Lawler network N_L. As can be seen in Figure 3 (bottom), both algorithms benefit from the improved network models, and the speedups directly correlate with the reductions in network size. While N_W significantly reduces the running times for Primal and Literal instances, N_Our additionally leads to a speedup for Dual instances. By additionally considering single-pin border nets, N¹_Our results in an average speedup between 1.52 and 2.21 (except for SPM instances). Since IBFS outperformed the BK algorithm in [27], we use N¹_Our and IBFS in all following experiments.
Flow Models. We now compare the flow model F_G of KaFFPa to our advanced model F_H described in Section 3.2. The experiments summarized in Table 3 were performed using sets C and D. To focus on the impact of the models on solution quality, we deactivated KaHyPar's FM local search algorithms and only used flow-based refinement without the most balanced minimum cut heuristic. The results confirm our hypothesis that F_G restricts the space of possible solutions. For all flow problem sizes and all imbalances tested, F_H yields better solution quality. As expected, the effects are most pronounced for small flow problems and small imbalances, where many vertices are likely to be border nodes. Since these nodes are locked inside their respective block in F_G, they prevent all non-cut border nets from becoming part of the cut-set. Our model, on the other hand, allows all min-cuts that yield a feasible solution for the original partitioning problem. The fact that this effect also occurs for the graphs of set D indicates that our model can also be effective for traditional graph partitioning. All following experiments are performed using F_H.

Configuring the Algorithm
We now evaluate different configurations of the max-flow min-cut based refinement framework on set C. In the following, KaHyPar-CA [28] is used as a reference. Since it neither uses (F)lows nor the (M)ost balanced minimum cut heuristic and only relies on the (FM) algorithm for local search, it is referred to as (-F,-M,+FM). This basic configuration is then successively extended with specific components. The results of our experiments are summarized in Table 4 for increasing values of the scaling parameter bound α′. The table furthermore includes a configuration Constant128, in which all components are enabled (+F,+M,+FM) and flow-based refinements are performed every 128 uncontractions. While this configuration is slow, it is used as a reference point for the quality achievable using flow-based refinement. The results indicate that only using flows (+F,-M,-FM) as refinement technique is inferior to localized FM local search in regard to both running time and solution quality. Although the quality improves with increasing flow problem size (i.e., increasing α′), the average connectivity is still worse than that of the reference configuration. Enabling the most balanced minimum cut heuristic improves partitioning quality. Configuration (+F,+M,-FM) performs better than the basic configuration for α′ ≥ 8. By combining flows with the FM algorithm (+F,-M,+FM) we get a configuration that improves upon the baseline configuration even for small flow problems. However, comparing this variant with (+F,+M,-FM) for α′ = 16, we see that using large flow problems together with the most balanced minimum cut heuristic yields solutions of comparable quality. Enabling all components (+F,+M,+FM) and using large flow problems performs best. Furthermore we see that enabling FM local search slightly improves the running time for α′ ≥ 8.
This can be explained by the fact that the FM algorithm already produces good cuts between the blocks, such that fewer rounds of pairwise flow refinements are necessary to further improve the solution. Comparing configuration (+F,+M,+FM) with Constant128 shows that performing flows more often further improves solution quality at the cost of slowing down the algorithm by more than an order of magnitude. In all further experiments, we therefore use configuration (+F,+M,+FM) with α′ = 16 for KaHyPar-MF. This configuration also performed best in the effectiveness tests presented in Appendix A. While this configuration performs better than KaHyPar-CA, its running time is still higher by more than a factor of 3.
We therefore perform additional experiments on set B and successively enable the speedup heuristics described in Section 3.3. The results are summarized in Table 5. Only executing pairwise flow refinements on blocks that lead to an improvement on previous levels (S1) reduces the running time of flow-based refinement by a factor of 1.27, while skipping flows in case of small cuts (S2) results in a further speedup of 1.19. By additionally stopping the resizing of the flow problem as early as possible (S3), we decrease the running time of flow-based improvement by a factor of 2 in total, while still computing solutions of comparable quality. Thus in the comparisons with other systems, all heuristics are enabled.

Comparison with other Systems
Finally, we compare KaHyPar-MF to different state-of-the-art hypergraph partitioners on the full benchmark set. We exclude the same 194 out of 3416 instances as in [28], because either PaToH-Q could not allocate enough memory or other partitioners did not finish in time. The excluded instances are shown in Table 12 in Appendix D. Note that KaHyPar-MF did not lead to any further exclusions. The following comparison is therefore based on the remaining 3222 instances. Comparing the best solutions of all systems simultaneously, KaHyPar-MF produced the best partitions for 2427 of the 3222 instances. It is followed by hMetis-R (678), KaHyPar-CA (388), hMetis-K (352), PaToH-D (154), and PaToH-Q (146). Note that for some instances multiple partitioners computed the same best solution and that we disqualified infeasible solutions that violated the balance constraint. Figure 5 shows that KaHyPar-MF also performs best for different values of k and that pairwise flow refinements are an effective strategy to improve k-way partitions. As can be seen in Table 6, the improvement over KaHyPar-CA is most pronounced for hypergraphs derived from matrices of web graphs and social networks and for dual SAT instances. While the former are difficult to partition due to skewed degree and net size distributions, the latter are difficult because they contain many large nets.
Finally, Table 9 compares the running times of all partitioners. By using simplified flow networks, highly tuned flow algorithms and several techniques to speed up the flow-based refinement framework, KaHyPar-MF is less than a factor of two slower than KaHyPar-CA and still achieves a running time comparable to that of hMetis.

Conclusion
We generalize the flow-based refinement framework of KaFFPa [47] from graph to hypergraph partitioning. We reduce the size of Liu and Wong's hypergraph flow network [40] by removing low degree hypernodes and exploiting the fact that our flow problems are built on subhypergraphs of the input hypergraph. Furthermore we identify shortcomings of the KaFFPa [47] approach that restrict the search space of feasible solutions significantly and introduce an advanced model that overcomes these limitations by exploiting the structure of hypergraph flow networks. Lastly, we present techniques to improve the running time of the flow-based refinement framework by a factor of 2 without affecting solution quality. The resulting hypergraph partitioner KaHyPar-MF performs better than all competing algorithms on all instance classes of a large benchmark set and still has a running time comparable to that of hMetis. Since our flow problem formulation yields significantly better solutions for both hypergraphs and graphs than the KaFFPa [47] approach, future work includes the integration of our flow model into KaFFPa and the evaluation in the context of a high quality graph partitioner. Furthermore an approach similar to Yang and Wong [57] could be used as an alternative to the most balanced minimum cut heuristic and adaptive B-corridor resizing. We also plan to extend our framework to optimize other objective functions such as cut or sum of external degrees.

A Effectiveness Tests
To evaluate the effectiveness of our configurations presented in Section 4.2, we give each configuration the same time to compute a partition. For each instance (hypergraph, k), we execute each configuration once and note the largest running time t_{H,k}. Then each configuration gets time 3 t_{H,k} to compute a partition (i.e., we take the best partition out of several repeated runs). Whenever a new run would exceed this time limit, we perform it with a certain probability such that the expected running time equals 3 t_{H,k}. The results of this procedure, which was initially proposed in [47], are presented in Table 8. We see that the combinations of flow-based refinement and FM local search perform better than repeated executions of the baseline configuration (-F,-M,+FM). The most effective configuration is (+F,+M,+FM) with α′ = 16, which was chosen as the default configuration for KaHyPar-MF.