Empirical Evaluation of Approximation Algorithms for Generalized Graph Coloring and Uniform Quasi-Wideness

The notions of bounded expansion and nowhere denseness not only offer robust and general definitions of uniform sparseness of graphs, they also describe the tractability boundary for several important algorithmic questions. In this paper we study two structural properties of these graph classes that are of particular importance in this context, namely the property of having bounded generalized coloring numbers and the property of being uniformly quasi-wide. We provide experimental evaluations of several algorithms that approximate these parameters on real-world graphs. On the theoretical side, we provide a new algorithm for uniform quasi-wideness with polynomial size guarantees in graph classes of bounded expansion and show a lower bound indicating that the guarantees of this algorithm are close to optimal in graph classes with fixed excluded minor.


Introduction
The exploitation of structural properties found in sparse graphs has a long and fruitful history in the design of efficient algorithms. Besides the long list of results on planar graphs and graphs of bounded degree (which are too numerous to fairly represent here), the celebrated structure theory of graphs with excluded minors, developed by Robertson and Seymour [60], falls into this category. It not only had an immense influence on the design of efficient algorithms (see e.g. [19,20]), it further introduced the now widely used notion of treewidth (see e.g. [9]) and gave rise to the field of parameterized complexity: "In the beginning, all we did was graph minors" (M. Fellows, pers. comm.). As such, the impact of the theory of sparse graphs on algorithmic research cannot be overstated.
Many of the algorithmic results concerning classes excluding a minor or a topological minor are in some way based on topological arguments, depending on the structure theorems (e.g. decompositions) for the class under consideration. A complete paradigm shift was initiated by Nešetřil and Ossona de Mendez with their foundational work and introduction of the notions of bounded expansion [45,46,47] and nowhere denseness [49]. These graph classes extend and properly contain all the aforementioned sparse classes and many arguments based on topology can be replaced by more general, and surprisingly often much simpler, arguments based on density. We refer to the textbook [50] for extensive background on the theory of sparse graph classes.
The rich structural theory for bounded expansion and nowhere dense graph classes has been successfully applied to design efficient algorithms for hard computational problems on specific sparse classes of graphs, see e.g. [6,17,22,23,24,25,26,30,32,66]. On the other hand, several results indicate that nowhere dense graph classes form a natural limit for algorithmic methods based on sparseness arguments, see e.g. [22,24].
One core strength of the bounded expansion/nowhere dense framework is that there exists a multitude of equivalent definitions that provide complementing perspectives. Here, we study two structural properties of these classes that are of particular importance in the algorithmic context, namely the property of having bounded generalized coloring numbers and the property of being uniformly quasi-wide. The generalized coloring numbers intuitively measure reachability properties in a linear vertex ordering of a given graph. Such an ordering yields a very weak and local form of a graph decomposition which can be exploited combinatorially [25,57] and algorithmically [6,22,23,32]. Uniform quasi-wideness was originally introduced in finite model theory [16], and soon found combinatorial and algorithmic applications on nowhere dense classes [17,25,30,37,48,55,63].
Even though the above results render many problems tractable in theory, many of the known algorithms have worst-case running times that involve huge constant factors and combinatorial explosions with respect to the discussed parameters. The central question of our work is how the generalized coloring numbers and uniform quasi-wideness behave on real-world graphs, an endeavor which so far has only been conducted for a single notion of bounded expansion and on a smaller scale [21]. Controllable numbers would be a prerequisite for practical implementations of algorithms based on such structural approaches. We provide an experimental evaluation of several algorithms that approximate these parameters on real-world graphs.
On the theoretical side, we provide a new algorithm for uniform quasi-wideness with polynomial size guarantees in graph classes of bounded expansion and show a lower bound indicating that the guarantees of this algorithm are close to optimal in graph classes with fixed excluded minor.
Organization. We give background on the theory of bounded expansion and nowhere dense graphs in Section 2. In Section 3 and Section 4 we describe our approaches to compute the weak coloring numbers and uniform quasi-wideness. Our experimental setup is described in Section 5 and our results are presented in Section 6 and Section 7.

Preliminaries
Graphs. All graphs in this paper are finite, undirected and simple, that is, they do not have loops or multiple edges between the same pair of vertices. For a graph $G$, we denote by $V(G)$ the vertex set of $G$ and by $E(G)$ its edge set. The distance between a vertex $v$ and a vertex $w$ is the length (that is, the number of edges) of a shortest path between $v$ and $w$. For a vertex $v$ of $G$, we write $N^G(v)$ for the set of all neighbors of $v$, $N^G(v) = \{u \in V(G) \mid \{u, v\} \in E(G)\}$, and for $r \in \mathbb{N}$ we denote by $N^G_r[v]$ the closed $r$-neighborhood of $v$, that is, the set of vertices of $G$ at distance at most $r$ from $v$. Note that we always have $v \in N^G_r[v]$. When no confusion can arise regarding the graph $G$ we are considering, we usually omit the superscript $G$. The radius of a connected graph $G$ is the minimum integer $r$ such that there exists $v \in V(G)$ with the property that all vertices of $G$ have distance at most $r$ to $v$. A set $A$ is $r$-independent if all distinct vertices of $A$ have pairwise distance greater than $r$.
Bounded expansion and nowhere denseness. A minor model of a graph $H$ in a graph $G$ is a family $(I_u)_{u \in V(H)}$ of pairwise vertex-disjoint connected subgraphs of $G$, called branch sets, such that whenever $uv$ is an edge in $H$, there are $u' \in V(I_u)$ and $v' \in V(I_v)$ for which $u'v'$ is an edge in $G$. The graph $H$ is a depth-$r$ minor of $G$, denoted $H \preccurlyeq_r G$, if there is a minor model $(I_u)_{u \in V(H)}$ of $H$ in $G$ such that each $I_u$ has radius at most $r$. A class $\mathcal{C}$ of graphs is nowhere dense if there is a function $t : \mathbb{N} \to \mathbb{N}$ such that for all $r \in \mathbb{N}$ it holds that $K_{t(r)} \not\preccurlyeq_r G$ for all $G \in \mathcal{C}$, where $K_{t(r)}$ denotes the clique on $t(r)$ vertices. The class $\mathcal{C}$ has bounded expansion if there is a function $d : \mathbb{N} \to \mathbb{N}$ such that for all $r \in \mathbb{N}$ and all $H \preccurlyeq_r G$ with $G \in \mathcal{C}$, the edge density of $H$, i.e. $|E(H)|/|V(H)|$, is bounded by $d(r)$. Note that every class of bounded expansion is nowhere dense. The converse does not hold in general [50].

The weak coloring numbers
The coloring number $\mathrm{col}(G)$ of a graph $G$ is the minimum integer $k$ such that there is a linear order $L$ of the vertices of $G$ for which each vertex $v$ has back-degree at most $k - 1$, i.e., at most $k - 1$ neighbors $u$ with $u <_L v$. It is well-known that for any graph $G$, the chromatic number $\chi(G)$ satisfies $\chi(G) \le \mathrm{col}(G)$, which possibly explains the name "coloring number".
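The definition above can be made concrete with a minimal Python sketch. It computes a degeneracy order by repeatedly removing a vertex of minimum remaining degree and then reports the maximum back-degree of the resulting order, so that the coloring number is that value plus one. The adjacency-dict representation and the names `degeneracy_order` and `max_back_degree` are assumptions made for this illustration, not the implementation evaluated in the paper.

```python
def degeneracy_order(adj):
    """Return a linear order (smallest vertex first) with minimum
    maximum back-degree. `adj` maps each vertex to its neighbor set."""
    remaining = {v: set(ns) for v, ns in adj.items()}
    removal = []
    while remaining:
        # Remove a vertex of minimum degree in the remaining graph
        # (ties broken by vertex label to keep the result deterministic).
        v = min(remaining, key=lambda x: (len(remaining[x]), x))
        for u in remaining[v]:
            remaining[u].discard(v)
        del remaining[v]
        removal.append(v)
    # The vertex removed first becomes the largest vertex of the order.
    removal.reverse()
    return removal


def max_back_degree(adj, order):
    """Largest number of neighbors that precede a vertex in `order`."""
    pos = {v: i for i, v in enumerate(order)}
    return max(sum(1 for u in adj[v] if pos[u] < pos[v]) for v in adj)
```

This quadratic-time version favors readability; a bucket queue would give linear time.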
We study a generalization of the coloring number that was introduced by Kierstead and Yang [33] in the context of coloring games and marking games on graphs. The weak coloring numbers wcol r are a series of numbers, parameterized by a positive integer r, which denotes the radius of the considered ordering.
The invariants $\mathrm{wcol}_r$ are defined in a way similar to the definition of the coloring number. Let $\Pi(G)$ be the set of all linear orders of the vertices of the graph $G$, and let $L \in \Pi(G)$. Let $u, v \in V(G)$. For a positive integer $r$, we say that $u$ is weakly $r$-reachable from $v$ with respect to $L$ if there exists a path $P$ of length $\ell$, $0 \le \ell \le r$, between $u$ and $v$ such that $u$ is minimum among the vertices of $P$ (with respect to $L$). Let $\mathrm{WReach}_r[G, L, v]$ be the set of vertices that are weakly $r$-reachable from $v$ with respect to $L$. Note that $v \in \mathrm{WReach}_r[G, L, v]$. The weak $r$-coloring number $\mathrm{wcol}_r(G)$ of $G$ is defined as $\mathrm{wcol}_r(G) := \min_{L \in \Pi(G)} \max_{v \in V(G)} |\mathrm{WReach}_r[G, L, v]|$.
As proved by Zhu [70], the weak coloring numbers can be used to characterize bounded expansion and nowhere dense classes of graphs: A class $\mathcal{C}$ of graphs has bounded expansion if and only if there exists a function $f : \mathbb{N} \to \mathbb{N}$ such that $\mathrm{wcol}_r(G) \le f(r)$ for all $r \in \mathbb{N}$ and all $G \in \mathcal{C}$. A class $\mathcal{C}$ is nowhere dense if and only if for every real $\varepsilon > 0$ there exists an integer $n_0$ such that for all $n$-vertex graphs $H$ with $n \ge n_0$ which are subgraphs of some $G \in \mathcal{C}$ we have $\mathrm{wcol}_r(H) \le n^{\varepsilon}$.
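To illustrate weak reachability, the following Python sketch computes all weakly $r$-reachable sets for a fixed order by running, from each vertex $u$, a depth-bounded breadth-first search that only visits vertices larger than $u$. The function names and the adjacency-dict input are assumptions chosen for this example.

```python
from collections import deque


def wreach_sets(adj, order, r):
    """Return {v: WReach_r[G, L, v]} where L is `order` (smallest first)."""
    pos = {v: i for i, v in enumerate(order)}
    wreach = {v: {v} for v in adj}
    for u in order:
        # Depth-r BFS from u through vertices larger than u. Every vertex
        # reached this way has a path of length <= r to u on which u is
        # the minimum, so u is weakly r-reachable from it.
        dist = {u: 0}
        queue = deque([u])
        while queue:
            x = queue.popleft()
            if dist[x] == r:
                continue
            for y in adj[x]:
                if y not in dist and pos[y] > pos[u]:
                    dist[y] = dist[x] + 1
                    queue.append(y)
                    wreach[y].add(u)
    return wreach


def wcol_of_order(adj, order, r):
    """Upper bound on wcol_r(G) certified by the given order."""
    return max(len(s) for s in wreach_sets(adj, order, r).values())
```

Any order evaluated this way yields an upper bound on the weak $r$-coloring number; the minimum over all orders is the parameter itself.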
An interesting aspect of the weak coloring numbers is that these invariants can also be seen as gradations between the coloring number $\mathrm{col}(G)$ and the treedepth $\mathrm{td}(G)$ (which is the minimum height of a depth-first search tree for a supergraph of $G$ [44]). More explicitly, for every graph $G$ we have (see [50, Lemma 6.5]) $\mathrm{col}(G) = \mathrm{wcol}_1(G) \le \mathrm{wcol}_2(G) \le \dots \le \mathrm{wcol}_\infty(G) = \mathrm{td}(G)$. Consequently, we also consider an algorithm for computing treedepth in our empirical evaluation.
A notion related to the weak coloring numbers is that of the strong coloring numbers, which were also introduced in [33]. Let $L \in \Pi(G)$, let $r$ be a positive integer and let $v \in V(G)$. We say that a vertex $u$ is strongly $r$-reachable from $v$ if there is a path $P$ of length $\ell$, $0 \le \ell \le r$, between $u$ and $v$ such that $u = v$ or $u$ is the only vertex of $P$ smaller than $v$ (with respect to $L$). Let $\mathrm{SReach}_r[G, L, v]$ be the set of vertices that are strongly $r$-reachable from $v$ with respect to $L$. Again, $v \in \mathrm{SReach}_r[G, L, v]$. The strong $r$-coloring number $\mathrm{col}_r(G)$ is defined as $\mathrm{col}_r(G) := \min_{L \in \Pi(G)} \max_{v \in V(G)} |\mathrm{SReach}_r[G, L, v]|$. As weak coloring numbers converge to treedepth with growing $r$, strong coloring numbers converge to treewidth [29]: $\mathrm{col}(G) = \mathrm{col}_1(G) \le \mathrm{col}_2(G) \le \dots \le \mathrm{col}_\infty(G) = \mathrm{tw}(G) + 1$. The reason is that the treewidth of $G$ can be characterized by the minimal width of an elimination ordering of $G$, defined exactly as $\mathrm{col}_\infty(G)$.
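For comparison with weak reachability, here is a small Python sketch of strong reachability under a fixed order. From $v$ the search only continues through vertices larger than $v$; smaller vertices are recorded as reachable but never expanded, which enforces that the endpoint is the only vertex of the path smaller than $v$. The function names and input format are assumptions for illustration.

```python
from collections import deque


def sreach_set(adj, order, v, r):
    """Return SReach_r[G, L, v] where L is `order` (smallest first)."""
    pos = {x: i for i, x in enumerate(order)}
    reach = {v}
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        if dist[x] == r:
            continue
        for y in adj[x]:
            if y in dist:
                continue
            if pos[y] < pos[v]:
                # y ends a path whose inner vertices are all larger than v.
                reach.add(y)
            else:
                dist[y] = dist[x] + 1
                queue.append(y)
    return reach


def col_of_order(adj, order, r):
    """Upper bound on the strong r-coloring number certified by `order`."""
    return max(len(sreach_set(adj, order, v, r)) for v in adj)
```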
Clearly, for all $r \in \mathbb{N}$, $\mathrm{col}_r(G) \le \mathrm{wcol}_r(G)$ (and thus $\mathrm{tw}(G) \le \mathrm{td}(G)$). Moreover, for all $r$ we have $\mathrm{wcol}_r(G) \le (\mathrm{col}_r(G))^r$ [33]. It follows that for every graph $G$ there is some (possibly large) integer $r$ such that $\mathrm{wcol}_{r-1}(G) \le \mathrm{tw}(G) \le \mathrm{wcol}_r(G)$. This gives hope that an elimination ordering computed for treewidth gives a good upper bound for $\mathrm{wcol}_{r'}(G)$ where $r' \le r - 1$. We will evaluate orders produced by an algorithm for treewidth approximation, but interpreted as orders for the weak coloring numbers.
Concrete bounds for the weak coloring numbers on restricted graph classes are given in [29,36,45,56,65,70]. Our approximation algorithms are based on the approaches described in [45,56,65], which we describe in more detail in the following subsections.

Distance-constrained Transitive Fraternal Augmentations
Given a graph $G$ and a linear order $L$ of its vertices, observe that we have the following properties:
We can approximate the weak coloring numbers by orienting the input graph $G$ and iteratively inserting arcs so that the above reachability properties are satisfied. Introducing an arc with the aim of satisfying property 1 above is called a fraternal augmentation, while introducing an arc with the aim of satisfying property 2 is called a transitive augmentation. These operations were studied first in [46]. We are going to work with an optimized version, called distance-constrained transitive-fraternal augmentations, dtf-augmentations for short, which was introduced in [56] as a more practical variant of transitive-fraternal augmentations. Let $G$ be an undirected graph and let $\vec{G}_1$ be any orientation of $G$. Then a dtf-augmentation of $G$ is a sequence $\vec{G}_1 \subseteq \vec{G}_2 \subseteq \dots$ of directed graphs which satisfy the following two constraints:
1. Let $u, v, w \in V(G)$ be such that $uv \in E(\vec{G}_i)$ and $uw \in E(\vec{G}_j)$ are arcs of $\vec{G}_i$ and $\vec{G}_j$, respectively. Then it follows that either $vw \in \vec{G}_{i+j}$ or $wv \in \vec{G}_{i+j}$.
2. Let $u, v, w \in V(G)$ be such that $vu \in E(\vec{G}_i)$ and $wv \in E(\vec{G}_j)$ are arcs of $\vec{G}_i$ and $\vec{G}_j$, respectively. Then it follows that $wu \in \vec{G}_{i+j}$.
Just as above, arcs added because of the first item are called fraternal and arcs added because of the second item are called transitive. To simplify notation we associate a weight function $\omega_r$ with each augmentation $\vec{G}_r$: if the arc $uv$ is present in $\vec{G}_i$ but not in $\vec{G}_{i-1}$, then we have $\omega_r(uv) = i$ for all $r \ge i$ and $\omega_r(uv) = \infty$ for all $r < i$. It can be shown that the arcs of weight $d$ appear exactly in augmentation $\vec{G}_d$. These augmentations behave similarly to graph powers in the following sense: consider two vertices $u, v$ that are at distance $d$ in $G$. Then in every augmentation $\vec{G}_r$ for $r \ge d$ we either find the arc $uv \in \vec{G}_d$ with $\omega_d(uv) = d$, or the arc $vu \in \vec{G}_d$ with $\omega_d(vu) = d$, or we find a common out-neighbor $w$ of $u$ and $v$ in $\vec{G}_d$ such that $\omega_r(wu) + \omega_r(wv) = d$.
Importantly, graph classes of bounded expansion admit dtf-augmentations in which the maximum out-degree $\Delta^+(\vec{G}_r)$ is bounded by a function of the depth $r$ and the graph class in question [56]. (We remark that commonly in the literature one orients the graphs $\vec{G}_i$ to minimize in-degrees instead of out-degrees; however, for consistency with the weak coloring numbers we orient so that an arc $uv \in E(\vec{G}_i)$ corresponds to $u \in \mathrm{WReach}_i[G, L, v]$.)
The algorithm to compute such augmentations closely follows the original algorithm for tf-augmentations (described in [46,50]). First, the orientation $\vec{G}_1$ is chosen to be the acyclic orientation derived from the degeneracy ordering of $G$; this orientation minimizes $\Delta^+(\vec{G}_1)$. Second, we orient the fraternal arcs added in step $r$ by first collecting all potential fraternal edges in an auxiliary graph $G^f_r$ and then again computing an acyclic orientation $\vec{G}^f_r$ which minimizes the out-degree. We then insert the arcs into $\vec{G}_r$ according to their orientation in $\vec{G}^f_r$. Instead of computing fraternal edges at step $r$ by searching for fraternal configurations in all pairs $\vec{G}_i, \vec{G}_j$ with $i + j = r$, it suffices to consider the pair $\vec{G}_{r-1}, \vec{G}_1$. The same optimization does not hold for transitive arcs, however.
The precise connection between dtf-augmentations and wcol-orderings is presented in the following lemma.
Lemma 3.1 ([6,30]). Let $\vec{G}_r$ be the $r$-th dtf-augmentation of a graph $G$ and let $G_r$ be the underlying undirected graph. Let $L$ be an ordering of $V(G)$ such that every vertex has at most $c$ smaller neighbours with respect to $L$. Then $|\mathrm{WReach}_r[G, L, v]| \le (\Delta^+(\vec{G}_r) + 1)c + 1$ for all $v \in V(G)$.
Therefore we can obtain a $\mathrm{wcol}_r$-ordering from the $r$-th dtf-augmentation $\vec{G}_r$ by simply computing a degeneracy ordering of $G_r$.

Flat decompositions
The following approach for approximating the weak coloring numbers was introduced in [65] and provably yields good results on graphs that exclude a fixed minor.
A decomposition of a graph $G$ is a sequence $\mathcal{H} = (H_1, \dots, H_\ell)$ of non-empty subgraphs of $G$ such that the vertex sets $V(H_1), \dots, V(H_\ell)$ partition $V(G)$. A decomposition yields a good order for the weak coloring numbers if, among other requirements, we can ensure that the vertices inside each $H_i$ can be ordered so that we have good weak reachability properties.
We call such a decomposition flat. The following procedure was proposed in [65] to compute a decomposition of a graph $G$. If $G$ excludes the complete graph $K_t$ as a minor, the resulting decomposition is flat. For a decomposition $(H_1, \dots, H_\ell)$ of a graph $G$ and $1 \le i \le \ell$, we denote by $G[H_{\ge i}]$ the subgraph of $G$ induced by $\bigcup_{i \le j \le \ell} V(H_j)$.
Without loss of generality we may assume that $G$ is connected. We iteratively construct a connected decomposition $H_1, \dots, H_\ell$ of $G$. To start, we choose an arbitrary vertex $v \in V(G)$ and let $H_1$ be the connected subgraph $G[\{v\}]$. Now assume that for some $q$, $1 \le q \le \ell - 1$, the sequence $H_1, \dots, H_q$ has already been constructed. Fix some component $C$ of $G[H_{\ge (q+1)}]$ and denote by $Q_1, \dots, Q_s \in \{H_1, \dots, H_q\}$ the subgraphs that have a connection to $C$. Using that $K_t$ is excluded as a minor, one may argue that $s \le t - 2$. Because $G$ is connected, we have $s \ge 1$. Let $v$ be a vertex of $C$ and let $T$ be a breadth-first search tree in $G[C]$ with root $v$. We choose $H_{q+1}$ to be a minimal connected subgraph of $T$ that contains $v$ and that contains for each $i$, $1 \le i \le s$, at least one neighbor of $Q_i$. As shown in [65], if $K_t \not\preccurlyeq G$, then the above procedure produces a linear order $L$ that certifies that $\mathrm{wcol}_r(G) \in O(r^{t-1})$.
Observe that this procedure leaves some freedom on how to pick the vertex v of C from which we start the breadth-first search and in which order to insert the vertices of H i . We evaluate several options. For the choice of the root vertex, the following choices seem reasonable.
1. Choose a vertex that is maximizing the number of neighbors in some Q i , to possibly obtain a set H q+1 that is smaller than when we choose a vertex far from all Q i .
2. Choose a vertex that has maximum degree in C, following the intuition that high-degree vertices should be low in the order.
3. Choose a vertex that has maximum degree in C, but only among those that are adjacent to some Q i .
For the order of the vertices of H i , we check the following options.
1. The breadth-first search and the depth-first search order from the root.
3. Each of the above, but reversed.

Treedepth heuristic
Since the 'limit' of weak-coloring numbers is exactly the treedepth of a graph, i.e. wcol ∞ (G) = td(G), we consider simply computing a treedepth decomposition and using an ordering derived from the decomposition. Our algorithm of choice, developed by Sanchez [62] and implemented by Oelschlägel [54], recursively extracts separators from the graph. To minimize the search space, only close separators are considered, that is, separators S that lie in the closed neighborhood of some vertex. Furthermore, the algorithm makes use of the following proposition.
Let N S (G) be the set of minimal separators that can be constructed from a minimal separator S by applying the above proposition, where S is an arbitrary minimal close separator. The algorithm then finds the separator S 0 ∈ N S (G) which minimizes the size of the largest connected component in G − S 0 (the implementation supports other heuristics, but this heuristic turned out to have an acceptable running time for the large instances).

Treewidth heuristic
A well-known approach to compute a tree decomposition of a graph is to find a linear order of the vertices, an elimination order, of possibly small maximum back-degree. From such an order it is easy to construct a tree decomposition of width equal to the back-degree (see, e.g. [10]). Let L ∈ Π(G) and There are a number of heuristics to produce good elimination orders. We chose one that is simple, fast and that gives rather good results for treewidth: the so-called minimum-degree heuristic [10]. The minimum-degree algorithm orders the vertices of the graph starting from the largest vertex, which is chosen to be one of minimum degree. Assuming that we have already ordered the vertices with indices greater than i, we put at position i a vertex with the least back-degree.
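The minimum-degree heuristic can be sketched in a few lines of Python. The version below is a textbook formulation under assumed names (`min_degree_elimination_order`, adjacency dicts): it repeatedly eliminates a vertex of minimum degree in the current fill-in graph, turns its neighborhood into a clique, and places the eliminated vertex at the largest free position of the order. It is not claimed to match the implementation used in the experiments.

```python
def min_degree_elimination_order(adj):
    """Return (order, width): an elimination order (smallest vertex first)
    and the largest back-degree met while eliminating."""
    g = {v: set(ns) for v, ns in adj.items()}
    eliminated = []
    width = 0
    while g:
        # Pick a vertex of minimum degree in the current fill-in graph.
        v = min(g, key=lambda x: (len(g[x]), x))
        nbrs = g.pop(v)
        width = max(width, len(nbrs))
        for a in nbrs:
            g[a].discard(v)
            # Make the neighborhood of v a clique (fill-in edges).
            g[a] |= nbrs - {a}
        eliminated.append(v)
    # The first eliminated vertex is the largest vertex of the order.
    eliminated.reverse()
    return eliminated, width
```

The returned order can be passed unchanged to any routine that evaluates weak reachability for a given order.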

Sorting by degrees and other heuristics
Apart from algorithms with theoretical guarantees we also compared several naive heuristics.
• For r = 1 an optimal order is a degeneracy order, which can be easily computed. We can check if this order produces reasonable results for higher values of r as well.
• Intuitively, it makes sense to sort vertices by descending degree (ties are broken arbitrarily) because from vertices of high degree more vertices can be reached in one step. This intuition is further supported by one popular network model, the Chung-Lu random graphs, which sample graphs with a fixed degree distribution and successfully replicate several statistics exhibited by real-world networks [13,14]. In this model, vertices are assigned weights (corresponding to their expected degree) and edges are sampled independently but biased according to the endpoints' weights. Under this model, vertices of the same degree are exchangeable and the one ordering we can choose to minimize the number of r-reachable vertices is simply the descending degree ordering.
• A simple idea of generalizing the above heuristics to bigger values of $r$ is to apply them to the $r$-th power $G^r$ of $G$, i.e. $G^r$ is defined as the graph with $V(G^r) = V(G)$ and $uv \in E(G^r) \Leftrightarrow \mathrm{dist}_G(u, v) \le r$.
• As a baseline we also included random ordering of vertices.
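The degree-based heuristics in the list above are simple enough to sketch directly. The following Python functions, whose names and input format are assumptions made for this illustration, build the $r$-th power of a graph by depth-bounded breadth-first search and sort vertices by descending degree with ties broken by label.

```python
from collections import deque


def graph_power(adj, r):
    """Return the r-th power of the graph: u and v are adjacent
    if and only if their distance in the input graph is between 1 and r."""
    power = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            if dist[x] == r:
                continue
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        power[s] = set(dist) - {s}
    return power


def descending_degree_order(adj):
    """Order with high-degree vertices first (i.e. smallest in the order)."""
    return sorted(adj, key=lambda v: (-len(adj[v]), v))
```

Calling `descending_degree_order(graph_power(adj, r))` gives the variant that sorts by degree in the power graph.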

Local search
In addition to all these approaches we can try to improve their results by local search, a technique where we make small changes to a candidate solution. We applied the following local changes and tested whether they caused improvements to the current order L.
• Take any vertex $v$ with the largest $|\mathrm{WReach}_r[G, L, v]|$ and swap it with a random vertex that is smaller with respect to $L$.
• Take any vertex $v$ with the largest $|\mathrm{WReach}_r[G, L, v]|$ and swap it with its direct predecessor $u$ in $L$.
Both rules try to move a vertex with many weakly reachable vertices to the left of some of them, so that these are no longer weakly reachable from it. The advantage of the second rule is that the only possible changes are that $\mathrm{WReach}_r[G, L, v]$ loses $u$ (if $u$ was there) and that $\mathrm{WReach}_r[G, L, u]$ may obtain $v$. So $\mathrm{WReach}_r[G, L, v]$ is trivial to recompute and the only computationally heavy update is for the new $\mathrm{WReach}_r[G, L, u]$. For the first rule, recomputing the WReach sets is more expensive. However, the disadvantage of the second rule is that it does not lead to further improvements quickly; hence applying only the first rule gives better results than applying only the second rule. In our implementation we made a few optimizations in order to improve the results of the second rule, but we refrain from describing them in detail. The final local search algorithm first performs a round of applications of the first rule and, when they no longer improve the result, performs a round of applications of the second rule. This combination turned out to be empirically the most effective.
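A minimal Python sketch of the second rule (swap with the direct predecessor) is given below. For brevity it recomputes all weak reachability sizes from scratch after each swap instead of applying the incremental update described above, accepts a swap only if the maximum strictly decreases, and stops at the first non-improving swap. These choices and all names are assumptions of this sketch, not the optimized implementation.

```python
from collections import deque


def wreach_sizes(adj, order, r):
    """Return {v: |WReach_r[G, L, v]|} for the order L (smallest first)."""
    pos = {v: i for i, v in enumerate(order)}
    size = {v: 1 for v in adj}
    for u in order:
        dist = {u: 0}
        queue = deque([u])
        while queue:
            x = queue.popleft()
            if dist[x] == r:
                continue
            for y in adj[x]:
                if y not in dist and pos[y] > pos[u]:
                    dist[y] = dist[x] + 1
                    queue.append(y)
                    size[y] += 1
    return size


def local_search_predecessor_swaps(adj, order, r, max_steps=1000):
    """Repeatedly swap a worst vertex with its predecessor while this helps."""
    order = list(order)
    sizes = wreach_sizes(adj, order, r)
    best = max(sizes.values())
    for _ in range(max_steps):
        worst = max(order, key=lambda v: sizes[v])
        i = order.index(worst)
        if i == 0:
            break
        order[i - 1], order[i] = order[i], order[i - 1]
        new_sizes = wreach_sizes(adj, order, r)
        if max(new_sizes.values()) < best:
            sizes, best = new_sizes, max(new_sizes.values())
        else:
            # Undo the swap and stop: no strict improvement.
            order[i - 1], order[i] = order[i], order[i - 1]
            break
    return order, best
```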

Uniform quasi-wideness
Intuitively, a class of graphs is wide if for every graph $G$ from the class, every radius $r \in \mathbb{N}$ and every large subset $A \subseteq V(G)$ of vertices one can find a large subset $B \subseteq A$ of vertices which are pairwise at distance greater than $r$ (recall that such a subset is called $r$-independent). The notion of uniform quasi-wideness additionally allows deleting a small number of vertices to make $B$ $r$-independent. The following definition formalises the meaning of "large" and "small".
Definition 4.1. A class $\mathcal{C}$ of graphs is uniformly quasi-wide if for every $m \in \mathbb{N}$ and every $r \in \mathbb{N}$ there exist numbers $N(m, r)$ and $s(r)$ such that the following holds. Let $G \in \mathcal{C}$ and let $A \subseteq V(G)$ with $|A| \ge N(m, r)$. Then there exists a set $S \subseteq V(G)$ with $|S| \le s(r)$ and a set $B \subseteq A \setminus S$ of size at least $m$ such that for all distinct $u, v \in B$ we have $\mathrm{dist}_{G-S}(u, v) > r$.
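The condition in the definition is easy to verify for a concrete pair $(B, S)$. The Python sketch below, with an assumed function name and adjacency-dict input, checks that $B$ avoids $S$ and that no two vertices of $B$ are within distance $r$ in $G - S$.

```python
from collections import deque


def is_r_independent(adj, B, S, r):
    """True iff B is disjoint from S and r-independent in G - S."""
    removed = set(S)
    members = set(B)
    if members & removed:
        return False
    for s in members:
        # Depth-r BFS from s in G - S; meeting another member is a violation.
        dist = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            if dist[x] == r:
                continue
            for y in adj[x]:
                if y in removed or y in dist:
                    continue
                if y in members:
                    return False
                dist[y] = dist[x] + 1
                queue.append(y)
    return True
```

On a star, the leaves are 1-independent but not 2-independent, and they become 2-independent once the center is deleted.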
Uniform quasi-wideness was introduced by Dawar in [16] and it was proved by Nešetřil and Ossona de Mendez in [48] that uniform quasi-wideness is equivalent to nowhere denseness. Very recently, it was shown that the function N in the above definition can be chosen to be polynomial in m [37,55]. A single exponential dependency was earlier established for classes of bounded expansion [36]. We are going to evaluate the algorithms derived from the proofs in [36,55], as well as a new algorithm that is streamlined for bounded expansion classes and also achieves polynomial bounds in m. We discuss these algorithms in more detail next. We will prove in Section 8 that the bounds of our new algorithm are close to optimal.

Distance trees
We first describe the algorithm that was introduced in [55]. For simplicity, we focus on the case $r = 2$. First, observe that every graph from a nowhere dense class contains large independent sets. By definition of a nowhere dense class, some complete graph $K_t$ is excluded as a depth-0 minor, that is, simply as a subgraph. Hence, Ramsey's Theorem immediately implies that if we consider any set $A \subseteq V(G)$ of size at least $\binom{t+m-2}{m-1}$, then there exists a set $B \subseteq A$ of size $m$ which is independent (without deleting any elements). Furthermore, the proof of Ramsey's Theorem yielding this bound is constructive and can easily be implemented. The difficult part is now to find in a large independent set a large 2-independent set, possibly after deleting a few elements (consider a star to see that deletion may be necessary).
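The constructive Ramsey argument can be sketched as follows in Python. The sketch follows the standard inductive proof of the bound $R(s, k) \le \binom{s+k-2}{k-1}$: pick a vertex, and continue either among its neighbors (growing a clique) or among its non-neighbors (growing an independent set), depending on which side is still large enough. The function name and interface are assumptions for this illustration.

```python
from math import comb


def ramsey_extract(adj, A, t, m):
    """Return (clique, indep). If |A| >= comb(t + m - 2, m - 1), then the
    clique has size t or the independent set has size m."""
    X = list(A)
    clique, indep = [], []
    while X and len(clique) < t and len(indep) < m:
        s, k = t - len(clique), m - len(indep)  # sizes still needed
        v = X.pop()
        nbrs = [x for x in X if x in adj[v]]
        non_nbrs = [x for x in X if x not in adj[v]]
        # comb(s + k - 3, k - 1) bounds the Ramsey number R(s - 1, k): if the
        # neighborhood is smaller, the non-neighbors number at least
        # comb(s + k - 3, k - 2), the bound for R(s, k - 1).
        if k == 1 or (s > 1 and len(nbrs) < comb(s + k - 3, k - 1)):
            indep.append(v)
            X = non_nbrs
        else:
            clique.append(v)
            X = nbrs
    return clique, indep
```

If the graph contains no $K_t$, the clique branch can never complete, so a sufficiently large input set always yields an independent set of size $m$.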
Assume now that A is a large independent set. The idea is to arrange the elements of A in a binary tree T , which we call a distance tree, and prove that this tree contains a long path. From this path the set B is extracted.
We identify the nodes of $T$ with words over the alphabet $\{0, 1\}$, where the empty word $\varepsilon$ corresponds to the root, and where for a word $w$ the word $w0$ is its left and the word $w1$ is its right successor. Fix some enumeration of the set $A$. We define $T$ by processing the elements of $A$ sequentially according to the enumeration. We start with the tree that has its root labeled with the first element of $A$. For each remaining element $a \in A$ we execute the following procedure which results in adding a node with label $a$ to $T$.
When processing the vertex a, do the following. Start with w being the empty word. While w is a node of T , repeat the following step: if the distance from a to the vertex b which is at the position corresponding to w in T is at most 2, replace w by w0, otherwise, replace w by w1. Once w does not correspond to a node of T , extend T by adding the node corresponding to w and label it with a. In this way, we have processed the element a, and now proceed to the next element of A until all elements are processed. This completes the construction of T . Thus, T is a tree labeled with vertices of A, and every vertex of A appears exactly once in T . Now, based on the fact that some complete graph K t is excluded as a depth-2 minor of G, it is shown that T contains a long path. This path either has many left branches or many right branches. Take a subpath that has only left branches or only right branches. Such a path corresponds to a set X such that all elements have pairwise distance 2, or all elements have pairwise distance greater than 2, that is, to a 2-independent set. In the second case, we have found the set B that we are looking for. In the other case, we proceed to show that there must exist an element w ∈ V (G) that is adjacent to many elements of X, i.e., N (w) ∩ X is large. We add the vertex w to the set S of elements to delete and repeat the above tree-classification procedure with the set A = N (w) ∩ X. It is shown that this process must stop after at most t steps and yields a set B which is 2-independent in G − S.
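The construction of the distance tree translates almost literally into code. In the Python sketch below, nodes are stored in a dictionary keyed by words over "0" and "1", with the empty string as the root; all names are assumptions of this sketch. The helper `right_spine` is a simplification that only reads off the all-right path starting at the root, whose labels are pairwise at distance greater than $r$ by construction, whereas the procedure described above works with subpaths of an arbitrary long path.

```python
from collections import deque


def ball(adj, s, r):
    """Vertices at distance at most r from s."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        x = queue.popleft()
        if dist[x] == r:
            continue
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return set(dist)


def build_distance_tree(adj, A, r=2):
    """Insert the elements of A one by one: go left ("0") at a node whose
    label is within distance r, and right ("1") otherwise."""
    tree = {}
    for a in A:
        near = ball(adj, a, r)
        word = ""
        while word in tree:
            word += "0" if tree[word] in near else "1"
        tree[word] = a
    return tree


def right_spine(tree):
    """Labels on the path "", "1", "11", ...; pairwise at distance > r."""
    labels, word = [], ""
    while word in tree:
        labels.append(tree[word])
        word += "1"
    return labels
```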
The general case reduces to the case r = 1 or r = 2 if instead of starting with an independent set A we start with an i-independent set A i and contract the disjoint i/2 or (i + 1)/2-neighborhoods of the elements of A i , respectively, to single vertices. Then one iteratively finds i-independent sets A 1 , A 2 , . . . , A r for larger and larger radii.

Implementation details
We have implemented three variants of the above method, which we denote tree1, tree2 and ld it. In all variants, we get a graph G, a vertex subset A ⊆ V (G) and r ∈ N as input. We do not have the number m as input but we aim to find an r-independent subset B ⊆ A which is as large as possible while deleting as few elements as possible.
For the odd cases (which reduce to r = 1 in the description above), in each variant we use a simple heuristic for finding independent sets described in Section 4.4.
For the more interesting even cases (which reduce to r = 2 in the description above), tree2 computes a set of candidate solutions (C, S). Here, C is a set which corresponds to a long path in the distance tree and S is the set of vertices removed so far (for this set C). At every step we compute one candidate solution (C, S), then move to S a vertex w with the largest intersection |N(w) ∩ A|, and continue the process with N(w) ∩ A until A becomes too small. In the end, we output the best solution from the pool of collected solutions.
In the version denoted by tree1, we modify tree2 as follows. We let C be a candidate for a large 2-independent set, which however, we do not choose as a subset of the currently handled set A, but of the original input set A. That is, we re-classify all distances of elements of the initial set A in a distance tree with vertices S that were deleted in later steps, to draw the candidate 2-independent set from a larger pool of vertices.
Finally, in the ld it version (least degree iterated) we do not find 2-independent sets based on the distance tree, but rather in a simple greedy manner as an independent set in the graph $(G - S)^2[A]$.
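A greedy search for an independent set in the power graph restricted to $A$ might look as follows in Python. The sketch assumes that "least degree" means repeatedly picking a vertex of minimum degree in the remaining conflict graph and discarding its neighbors; this reading, the generalization from radius 2 to arbitrary $r$, and all names are assumptions, since the exact heuristic is specified elsewhere in the paper.

```python
from collections import deque


def greedy_r_independent(adj, A, S, r):
    """Greedy independent set in the r-th power of G - S restricted to A."""
    removed = set(S)
    cand = [a for a in A if a not in removed]
    cand_set = set(cand)
    # conflict[a] = other candidates within distance r of a in G - S.
    conflict = {a: set() for a in cand}
    for a in cand:
        dist = {a: 0}
        queue = deque([a])
        while queue:
            x = queue.popleft()
            if dist[x] == r:
                continue
            for y in adj[x]:
                if y in removed or y in dist:
                    continue
                dist[y] = dist[x] + 1
                queue.append(y)
                if y in cand_set:
                    conflict[a].add(y)
    chosen = []
    while conflict:
        # Take a vertex of least degree, then drop it and its neighbors.
        v = min(conflict, key=lambda x: (len(conflict[x]), x))
        chosen.append(v)
        for u in conflict.pop(v):
            for w in conflict.pop(u):
                if w in conflict:
                    conflict[w].discard(u)
    return chosen
```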

Weak coloring numbers and uniform quasi-wideness
We now describe the approach of [36] which is designed for classes of bounded expansion and combines the weak coloring numbers with uniform quasi-wideness.
Let $G$ be a graph, $A \subseteq V(G)$ and $m, r \in \mathbb{N}$ be given. First, fix some order $L \in \Pi(G)$ such that $|\mathrm{WReach}_r[G, L, v]| \le c$ for every $v \in V(G)$ (for some constant $c$). Let $H$ be the graph with vertex set $V(G)$, where we put an edge $uv \in E(H)$ if and only if $u \in \mathrm{WReach}_r[G, L, v]$ or $v \in \mathrm{WReach}_r[G, L, u]$. Then $L$ certifies that $H$ is $c$-degenerate, and hence, assuming that $|A| \ge (c+1) \cdot 2^m$, we can greedily find an independent set $I \subseteq A$ of size $2^m$ in $H$. By the definition of the graph $H$, we have that $\mathrm{WReach}_r[G, L, v] \cap I = \{v\}$ for each $v \in I$. Now observe that for $v \in I$, deleting $\mathrm{WReach}_r[G, L, v] \setminus \{v\}$ from $G$ leaves $v$ at a distance greater than $r$ (in $G - (\mathrm{WReach}_r[G, L, v] \setminus \{v\})$) from all the other vertices of $I$.
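One way to realize the greedy step is to color the vertices of $A$ along $L$ and return a largest color class. The Python sketch below does this given precomputed weak reachability sets; it is an assumed realization of the "greedy" step with hypothetical names, not necessarily the one used in the mfcs implementation. Since each vertex has fewer than $c$ smaller neighbors in $H$, at most $c$ colors are used, so the returned set has size at least $|A|/c$.

```python
def independent_set_in_wreach_graph(order, wreach, A):
    """Largest color class of a greedy coloring of H[A] along `order`.

    `order` lists the vertices from smallest to largest with respect to L,
    and `wreach[v]` is the set WReach_r[G, L, v] (which contains v)."""
    in_A = set(A)
    color = {}
    classes = []
    for v in order:
        if v not in in_A:
            continue
        # The smaller H-neighbors of v are exactly WReach_r[G, L, v] \ {v}.
        used = {color[u] for u in wreach[v] if u != v and u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
        if c == len(classes):
            classes.append([])
        classes[c].append(v)
    return max(classes, key=len) if classes else []
```

For every vertex $v$ in the returned set $I$, the intersection of its weak reachability set with $I$ is $\{v\}$, which is the property used in the argument above.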
Based on this observation, one follows the simple approach also used to prove Ramsey's Theorem with exponential bounds. For each vertex $v$ of $I$ (in decreasing order, starting with the largest vertex with respect to $L$), we test whether $v$ is connected by a path of length at most $r$ to more than half of the remaining vertices of $I$. If this is the case, we delete the set $\mathrm{WReach}_r[G, L, v]$ from $G$ (i.e. add it to $S$) and add the vertex $v$ to the set $B$. We continue with the subset of $I$ that had such a connection to $v$ (which is now separated by the deletion of $S$ though). Otherwise, $v$ is not connected to more than half of the remaining vertices of $I$, in which case we simply add $v$ to $B$ and do not delete anything. In this case, we continue the construction with those vertices of $I$ that are not connected to $v$. It is proved that the first case can happen at most $\mathrm{wcol}_r(G) \le c$ many times, hence, in total we delete at most $c^2$ vertices and arrive at a set $B$ with $m$ vertices that are pairwise at distance greater than $r$ in $G - S$.
We have implemented exactly the algorithm outlined above. We denote it by mfcs.

A new algorithm
Motivated by the rather conservative character of the algorithm of [36] described above, we propose here a new algorithm (albeit inspired by [36]). Furthermore, in Section 8 we show an almost tight lower bound for the guarantees of this algorithm in graphs excluding a fixed minor. More formally, we show the following theorem.
Proof. The algorithm iteratively constructs sets $A = A_0 \supseteq A_1 \supseteq A_2 \supseteq \dots$, $\emptyset = S_0 \subseteq S_1 \subseteq S_2 \subseteq \dots$, and $\emptyset = B_0 \subseteq B_1 \subseteq B_2 \subseteq \dots$, maintaining the following invariants in every step $i$: $B_i \subseteq A \setminus S_i$, the set $B_i$ is an $r$-independent set in $G - S_i$, and every vertex of $A_i$ is at distance greater than $r$ from every vertex in $B_i$ in the graph $G - S_i$. At step $i$, given $A_i$, $S_i$, and $B_i$, the algorithm proceeds as follows.
and delete the conflicting vertices from A i , that is set (deletion step) Otherwise, pick a vertex z ∈ V (G) \ S i that appears in a maximum number of weakly reachable sets of vertices of A i . That is, pick z ∈ V (G) \ S i maximizing the quantity Insert z into S i+1 and restrict A i to vertices containing z in their weak reachable sets. More formally, The algorithm stops when A i becomes empty, and then returns S = S i and B = B i . Let us now analyze the algorithm. The fact that in the growth step we remove from A i+1 the vertices of A i that are within distance at most r from v preserves the invariant that the distance between A i and B i in G − S i is greater than r. This invariant, in turn, proves that B i is an r-independent set in G − S i . It remains to show the bounds on the sizes of S and B. To this end, we show the following two claims. Proof. The claim follows directly from the fact that in the deletion step, we restrict A i+1 to be the set of those vertices of A i that have z in their weak reachability set.
Proof. Let $v \in A_i$ be the least vertex of $A_i$ in the ordering $L$. Since the growth step is not applicable, the set $X := N_r^{G-S_i}[v] \cap A_i$ is of size at least $|A_i|/m$. For every $x \in X$, fix a path $P_x$ of length at most $r$ between $x$ and $v$ in $G - S_i$, and let $z_x$ be the $L$-minimal vertex on this path. The subpath of $P_x$ from $z_x$ to $v$ shows that $z_x \in \mathrm{WReach}_r[G, L, v]$, and the subpath of $P_x$ from $z_x$ to $x$ shows that $z_x \in \mathrm{WReach}_r[G - S_i, L, x]$. Since $|\mathrm{WReach}_r[G, L, v]| \le c$, some vertex $z$ equals $z_x$ for at least $|X|/c \ge |A_i|/(cm)$ vertices $x \in X$. This finishes the proof of the claim.
Consequently, when the algorithm executes the deletion step, we have $|A_{i+1}| \ge |A_i|/(cm) - 1$ (the $-1$ comes from the case $z \in A_i$).
In particular, the last step of the algorithm is a growth step: the deletion step executes only if $|A_i| > 2cm$, and then $|A_{i+1}| \ge |A_i|/(cm) - 1 > 1$. Let $v$ be the vertex added to $B_{i+1}$ in this last growth step. Then we have that $S = S_{i+1} = S_i \subseteq \mathrm{WReach}_r[G, L, v]$. Consequently, the algorithm executed at most $c$ deletion steps and $|S| \le c$.
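For the bound on the size of the set $B$, let $i$ be the minimum index with $|A_i| \le 2cm$. For every $0 \le j < i$ that executed a deletion step, we have
$$|A_{j+1}| \ge \frac{|A_j|}{cm} - 1 \ge \frac{|A_j|}{2cm}.$$
For every $0 \le j < i$ that executed a growth step, we have
$$|A_{j+1}| \ge \left(1 - \frac{1}{m}\right) |A_j|.$$
In particular, we have $A_i \ne \emptyset$ due to $m \ge 2$. Consequently, since the algorithm executed $|S_i|$ deletion steps and $|B_i|$ growth steps, we have
$$|A_i| \ge |A| \cdot (2cm)^{-|S_i|} \cdot \left(1 - \frac{1}{m}\right)^{|B_i|}.$$
Hence, since $(1 - 1/m)^m \ge 1/4$ for every $m \ge 2$, if $|A| \ge 4 \cdot (2cm)^c$, then we have $|B_i| \ge m$. This finishes the proof.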
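The two steps of the algorithm can be illustrated with a small Python sketch. It is not the authors' implementation: it assumes that the growth step applies when $|A_i| \le 2cm$ or some vertex of $A_i$ conflicts with fewer than $|A_i|/m$ vertices of $A_i$ (as the analysis suggests), that weak reachability in the deletion step is taken in $G - S_i$, and that ties are broken by position in $L$. The names `ball`, `weak_reach_sets` and `uqw_sketch` are hypothetical.

```python
from collections import Counter, deque

def ball(adj, source, r, removed):
    """Vertices within distance <= r of `source` in G - removed (plain BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        if dist[x] == r:
            continue
        for w in adj[x]:
            if w not in removed and w not in dist:
                dist[w] = dist[x] + 1
                queue.append(w)
    return set(dist)

def weak_reach_sets(adj, pos, r, removed=frozenset()):
    """WReach_r[G - removed, L, v] for every remaining v; pos[v] is the rank of v in L."""
    wreach = {v: {v} for v in adj if v not in removed}
    for u in wreach:
        # BFS from u through vertices larger than u only: u is then the
        # L-minimum of the path to every vertex reached within r steps.
        dist = {u: 0}
        queue = deque([u])
        while queue:
            x = queue.popleft()
            if dist[x] == r:
                continue
            for w in adj[x]:
                if w in wreach and w not in dist and pos[w] > pos[u]:
                    dist[w] = dist[x] + 1
                    wreach[w].add(u)
                    queue.append(w)
    return wreach

def uqw_sketch(adj, order, A, r, m):
    """Return (S, B) with B r-independent in G - S (assumed reading of the steps)."""
    pos = {v: i for i, v in enumerate(order)}
    c = max(len(s) for s in weak_reach_sets(adj, pos, r).values())
    A_i, S, B = set(A), set(), []
    while A_i:
        conflicts = {v: ball(adj, v, r, S) & A_i for v in A_i}
        sparse = [v for v in A_i if len(conflicts[v]) * m < len(A_i)]
        if sparse or len(A_i) <= 2 * c * m:
            # Growth step: move v to B and drop everything within distance r of v.
            v = min(sparse or A_i, key=pos.get)
            B.append(v)
            A_i -= conflicts[v]
        else:
            # Deletion step: delete the vertex lying in the most weak reachability sets.
            wreach = weak_reach_sets(adj, pos, r, S)
            counts = Counter(z for v in A_i for z in wreach[v])
            z = max(counts, key=lambda u: (counts[u], -pos[u]))
            S.add(z)
            A_i = {v for v in A_i if v != z and z in wreach[v]}
    return S, B
```

On a star whose center comes first in the ordering, this sketch deletes the center in a single deletion step and then collects all leaves through growth steps.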
The actual implementation of the above algorithm differs in a number of aspects. First, we found the threshold $|A_i|/m$ for the distinction between the growth step and the deletion step too small in practice, despite working well in the proof above. Moreover, experiments with this algorithm showed that it is unstable in the sense that small changes in this threshold can trigger large changes in the produced result, which are, a priori, hard to predict. Because of that, our implementation has a fixed constant $k$, executes the above algorithm with thresholds $\frac{1}{k+1}, \frac{2}{k+1}, \ldots, \frac{k}{k+1}$, and chooses the best result (we will address comparing different results later).
Second, the above algorithm can be modified so that the growth step is applied only in cases where the least vertex of $A_i$ with respect to $L$ has only a small number of conflicts, in which case we use that first vertex to enlarge $B$. Note that such an algorithm also satisfies the theorem, because in the analysis of the algorithm we used only the fact that if the growth step is not applicable, then this condition is not satisfied for the first vertex of $A_i$. Such a variant is present in our implementation.
Third, in the proof above, the algorithm always applies the growth step when the size of A i drops below the threshold 2cm. This is a minor technical detail, and can be omitted at the cost of some more hassle in the proof (in the analysis of the last steps of the algorithm) and somewhat worse bounds for |S| and |A|. In the implementation, we do not have this threshold, but instead we roll back the unnecessary deletion steps that were performed by the algorithm near the end of the execution. It is straightforward (but a bit more tedious) to adapt the above analysis to this variant.

Implementation details
We have implemented three variants of the method described above, which we denote new1, new2 and new ld. In the outlined algorithm, when we consider a vertex $v$, we compute the set of vertices from $A$ conflicting with $v$. In new1, we consider two vertices to be conflicting if their $\mathrm{WReach}_r$ sets intersect. In new2 and new ld, two vertices are considered to be conflicting if the distance between them in the remaining part of the graph is at most $r$. Moreover, after every step new ld tries to fill its partial solution using the heuristic described in Section 4.4 to find an independent set in $(G - S)^r \cap A$, where $S$ is the set of already removed vertices.

Other naive approaches and heuristic optimizations
Since uniform quasi-wideness for r = 1 is exactly finding independent sets, it makes sense to include heuristics for finding independent sets as a baseline. Moreover the problem of finding independent sets is also used as a subroutine in the approach based on distance trees. We used the following simple greedy algorithm to find independent sets. As long as our graph is nonempty, take any vertex that has the smallest degree, add it to the independent set and remove it and its neighbors from the graph.
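The greedy procedure can be sketched in a few lines of Python. This is only an illustration of the rule stated above, not the code used in the experiments; the function name and the tie-breaking by vertex label are assumptions, and the graph is assumed to be given as a dictionary mapping each vertex to the set of its neighbors.

```python
def greedy_independent_set(adj):
    """Min-degree greedy: repeatedly take a vertex of smallest current degree."""
    # Work on a private copy so that the caller's graph is not modified.
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    independent = []
    while remaining:
        # Any minimum-degree vertex works; ties are broken by vertex label here.
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        independent.append(v)
        # Remove v together with its neighbors from the graph.
        removed = remaining[v] | {v}
        for u in removed:
            for w in remaining[u]:
                if w not in removed:
                    remaining[w].discard(u)
        for u in removed:
            del remaining[u]
    return independent
```

On a path with five vertices the sketch picks every other vertex, starting from an endpoint.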
The following algorithm is what we came up with as a naive but reasonable heuristic for larger values of $r$. For every number $k \in \{0, 1, \ldots, K\}$ (where $K$ is some hardcoded constant), we compute an independent set in the graph $(G - S_k)^r[A]$ using the greedy procedure described above, where $S_k$ is the set of the $k$ vertices of largest degree. This heuristic is based on the fact that independent sets in $G^r$ correspond to $r$-independent sets in $G$. Without any other knowledge about the graph, vertices with the largest degree seem to be the best candidates to be removed. In the end, we output the best solution obtained in this manner. In the following, we abbreviate this approach as ld (least degree on power graph).
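A minimal Python sketch of this heuristic follows. It is an illustration under stated assumptions rather than the implementation evaluated here: solutions are compared simply by the size of the independent set (the scoring actually used is discussed in the next subsection), degrees are taken in the original graph, and all function names are hypothetical.

```python
from collections import deque

def ball(adj, source, r, removed):
    """Vertices within distance <= r of `source` in G - removed (plain BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        if dist[x] == r:
            continue
        for w in adj[x]:
            if w not in removed and w not in dist:
                dist[w] = dist[x] + 1
                queue.append(w)
    return set(dist)

def greedy_independent_set(adj):
    """Min-degree greedy independent set, as described in the text."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    independent = []
    while remaining:
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        independent.append(v)
        removed = remaining[v] | {v}
        for u in removed:
            for w in remaining[u]:
                if w not in removed:
                    remaining[w].discard(u)
        for u in removed:
            del remaining[u]
    return independent

def ld_heuristic(adj, A, r, K):
    """Try S_k = the k highest-degree vertices for k = 0..K; keep the largest B."""
    by_degree = sorted(adj, key=lambda v: (-len(adj[v]), v))
    best_S, best_B = set(), []
    for k in range(min(K, len(by_degree)) + 1):
        S = set(by_degree[:k])
        candidates = set(A) - S
        # Power graph (G - S)^r restricted to A: join vertices at distance <= r.
        power = {v: (ball(adj, v, r, S) & candidates) - {v} for v in candidates}
        B = greedy_independent_set(power)
        if len(B) > len(best_B):  # assumed comparison: size of B only
            best_S, best_B = S, B
    return best_S, best_B
```

On a star with r = 2, removing nothing yields a single vertex, whereas removing the center (k = 1) makes all leaves pairwise far apart.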

Comparing different results
Uniform quasi-wideness is a two-dimensional measure: we have to measure both the size $m$ of the $r$-independent set $B$ which we desire to find, as well as the size $s(r)$ of the set of vertices to be deleted. In order to compare the performance of our studied methods, we propose the following approach, which arises from applications of uniform quasi-wideness in several algorithms [17,22,55,63].
Let $G$, $A \subseteq V(G)$, $r \in \mathbb{N}$ be an input to any of our algorithms (note that none of our algorithms takes the target size of the $r$-independent set as input) and let $S \subseteq V(G)$ and $B \subseteq A \setminus S$ such that $B$ is $r$-independent in $G - S$ be its output. Let us define $\pi_r[v, S]$, the $r$-distance profile of $v$ on $S$, as the function from $S$ to $\{0, 1, \ldots, r, \infty\}$ such that $\pi_r[v, S](a) = \mathrm{dist}_G(v, a)$ if this distance is at most $r$, and $\pi_r[v, S](a) = \infty$ otherwise. The performance of the algorithms [17,22,55,63] strongly depends on the size of the largest equivalence class on $B$ defined by $u \sim v$ if $\pi_r[u, S] = \pi_r[v, S]$ for $u, v \in B$.
We hence decided to use the size of the largest equivalence class in the above relation as the scoring function to measure the performance of our algorithms. Note that the number of different $r$-distance profiles is bounded by $(r + 2)^{|S|}$, so if $r$ is fixed and $|S|$ is bounded, then the number of different $r$-distance profiles is also bounded. Hence, having a big $r$-independent set implies having a big subset of this set with equal $r$-distance profiles on $S$.
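The scoring function can be sketched as follows. This is an illustrative Python version, not the exchangeable implementation from our code base; the function names are hypothetical, distances are measured in $G$ as in the definition of the profile, and the sketch does not verify that $B$ is $r$-independent in $G - S$.

```python
import math
from collections import Counter, deque

def distance_profile(adj, v, S_order, r):
    """r-distance profile of v on S: distances in G, capped to infinity above r."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        if dist[x] == r:
            continue
        for w in adj[x]:
            if w not in dist:
                dist[w] = dist[x] + 1
                queue.append(w)
    # Vertices of S not reached within r steps get the value infinity.
    return tuple(dist.get(a, math.inf) for a in S_order)

def largest_profile_class(adj, S, B, r):
    """Score of an output (S, B): size of the largest class of equal profiles in B."""
    S_order = sorted(S)  # fix one order of S so that profiles are comparable
    classes = Counter(distance_profile(adj, v, S_order, r) for v in B)
    return max(classes.values(), default=0)
```

For example, on the path 0, 1, 2, 3, 4 with S = {1}, B = {0, 2, 3, 4} and r = 2, the vertices 0 and 2 share the profile (1), so the score is 2.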
This well defined scoring function makes it possible to compare the results of the algorithms. Furthermore, in our code the implementation of the scoring function can be easily exchanged, so if different scoring functions are preferred, re-computation and re-evaluation is easily possible.

Hard- and Software
The experiments on generalized coloring numbers have been performed on an Asus K53SC laptop with an Intel Core i3-2330M CPU @ 2.20GHz x 2 processor and 7.7GiB of RAM. Weak coloring numbers of a larger number of graphs for the statistics in Section 6.4 (presented without running times) were produced on a cluster at the Logic and Semantics Research Group, Technische Universität Berlin. The experiments on uniform quasi-wideness have been performed on a cluster of 16 computers at the Institute of Informatics, University of Warsaw. Each machine was equipped with an Intel Xeon E3-1240v6 3.70GHz processor and 16 GB of RAM. All machines shared the same NFS drive. Since the size of the inputs and outputs of the programs is relatively small, the network communication was negligible for tests with substantial running times. The dtf implementation has been done in Python, while all other code is in C++ or C. The code is available at [43,3].

Test data
Our dataset consists of a number of graphs from different sources.

PACE 2016 Feedback Vertex Set The Parameterized Algorithms and Computational Experiments Challenge is an annual programming challenge started in 2016 that aims to investigate the applicability of algorithmic ideas studied and developed in the subfields of multivariate, fine-grained, parameterized, or fixed-parameter tractable algorithms (from the PACE webpage). In the first edition, one of the tracks focused on the Feedback Vertex Set problem [18], providing 230 instances from various sources and of different sizes. We have chosen a number of instances with small feedback vertex set number, guaranteeing their very strong sparsity properties (in particular, low treewidth). In our result tables, they are named fvs???, where ??? is the number in the PACE 2016 dataset.
Random planar graphs In their seminal paper, Alber, Fellows, and Niedermeier [5] initiated the very fruitful direction of developing polynomial kernels (preprocessing routines rigorously analyzed through the framework of parameterized complexity) in sparse graph classes by providing a linear kernel for Dominating Set in planar graphs. Dominating Set soon turned out to be the pacemaker of the development of fixed-parameter and kernelization algorithms in bounded expansion and nowhere dense graph classes [6,17,22,23]. In [5], an experimental evaluation is conducted on random planar graphs generated by the LEDA library [2]. We followed their setup and included a number of random planar graphs of various sizes and average degrees. In our result tables, they are named planarN, where N stands for the number of vertices.
Random graphs with bounded expansion A number of random graph models have been shown to produce almost surely graphs of bounded expansion [21]. We include a number of graphs generated by O'Brien and Sullivan [53] using the following models: the stochastic block model (sb-? in our dataset) [31] and the Chung-Lu model with households (clh-?) and without households (cl-?) [15]. We refer to [21,53] for more discussion on these sources.
The graphs have been partitioned into four groups, depending on their size: the small group gathers graphs up to 1 000 edges, medium between 1 000 and 10 000 edges, big between 10 000 and 48 000 edges, and huge above 48 000 edges. The random planar graphs in every test group have respectively 900, 3 900, 21 000, and 150 000 edges. The whole dataset is available for download at [3].
Table 1 (caption, partially recovered): ... as previous, but only among neighbors of already processed vertices. The value is the average of the approximation ratios to the best generalized coloring numbers found by all versions of this algorithm.

As discussed in Section 3.2, we have experimented with a number of variants of the flat decompositions approach, with regards to the choice of the next root vertex and the internal order of the vertices of the next $B_i$. The results for the big dataset are presented in Table 1. They clearly indicate that (a) all reversed orders performed much worse, and (b) among other options, the best is to sort the vertices of a new $B_i$ nonincreasingly by degree and choose as the next root the vertex of maximum degree. In the subsequent tests, we use this best configuration for comparison with other approaches.

Out of all simple heuristics (cf. Section 3.5), degree sorting performed best, and we skip the results of inferior heuristics (see [43,3] for full data). Interestingly, this heuristic also outperformed all other (much more involved) approaches on larger graphs. On small graphs, the treewidth heuristic takes the lead. An explanation why the treewidth heuristic is better on smaller graphs $G$ might be that $\mathrm{tw}(G) = \mathrm{col}_\infty(G)$ and on small graphs the difference between $\mathrm{col}_\infty(G)$ and $\mathrm{col}_r(G)$ for the considered $r$ is not that big. However, this does not explain why treedepth does not perform better than treewidth. (Recall that $\mathrm{td}(G) = \mathrm{wcol}_\infty(G)$.) It is worth observing that on larger graphs (the big group) the performance of the flat decomposition matches or outperforms that of the treewidth heuristic for radii $r = 2, 3, 4$. However, the treewidth heuristic outperforms all approaches with proved guarantees for $r = 5$ on test sets up to the big group.

Table 2 gathers the total running time of our programs on the discussed data sets. These results clearly indicate a large discrepancy between the resources consumed by the different approaches.
Out of the approaches with provable guarantees on the output coloring number, the flat decompositions approach is clearly the most efficient.

Comparison of all approaches
Note that we applied different timeout policies for generating different data. For measuring the time of execution and for applying local search we set the timeout to 1 minute, whereas for generating orders and wcol numbers we set the timeout to 5 minutes; for the sake of completeness, we sometimes allowed some programs to run longer.
In summary, on our data sets the simple heuristic is consistently the fastest and produces the best results, save for the smallest graphs, on which the treewidth heuristic won. We remark here that it is simple to "fool" the degree-sorting heuristic by adding multiple pendant vertices of degree one and thus forcing it to produce an arbitrarily bad ordering, but such adversarial obstacles seem to be absent in real-world graphs. If one is to choose an algorithm with provable guarantees, the discussed variant of the flat decompositions approach appears to be the best choice.

Local search
In a second round of experiments we applied a simple local-search routine that, given an ordering output by one of the approaches, tries to improve it by moving vertices with the largest weakly reachable sets earlier in the ordering. The white columns in Table 3 show how local search improved the orderings output by the discussed approaches, and the gray columns show the average approximation ratios of the orderings improved by local search. Two remarks are in order. First, regardless of how the ordering was computed, a local search step always significantly improves the ordering (we have no good explanation of why local search is significantly less effective on the orderings output by the treewidth heuristic for bigger radii). Second, the local search step does not improve the orderings enough to change the relative order of the performance of the base approaches, except for one remarkable case. On the medium group, the treewidth heuristic gave the best results for $r = 5$; however, degree sort regained the lead after the application of local search, because local search is less effective on the orderings of the treewidth heuristic for larger radii. We therefore recommend local search as a relatively cheap post-processing improvement to any existing algorithm.
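To make the idea concrete, the following Python sketch computes the weakly reachable sets of an ordering and applies one possible local-search rule. The exact move rule and stopping criterion of our routine are not specified above, so the rule used here (swap a vertex with the largest weakly reachable set with its predecessor, and stop as soon as this no longer strictly improves the result) is an assumption made for illustration; all function names are hypothetical.

```python
from collections import deque

def weak_reach_sets(adj, order, r):
    """WReach_r[G, L, v] for every v, where the list `order` encodes L."""
    pos = {v: i for i, v in enumerate(order)}
    wreach = {v: {v} for v in order}
    for u in order:
        # BFS from u through vertices larger than u only: u is then the
        # L-minimum of the path to every vertex reached within r steps.
        dist = {u: 0}
        queue = deque([u])
        while queue:
            x = queue.popleft()
            if dist[x] == r:
                continue
            for w in adj[x]:
                if w not in dist and pos[w] > pos[u]:
                    dist[w] = dist[x] + 1
                    wreach[w].add(u)
                    queue.append(w)
    return wreach

def wcol(adj, order, r):
    """Size of the largest weakly r-reachable set under the given ordering."""
    return max(len(s) for s in weak_reach_sets(adj, order, r).values())

def local_search(adj, order, r, max_rounds=1000):
    """Assumed rule: move a worst vertex one position earlier while this helps."""
    order = list(order)
    best = wcol(adj, order, r)
    for _ in range(max_rounds):
        wreach = weak_reach_sets(adj, order, r)
        worst = max(order, key=lambda v: len(wreach[v]))
        i = order.index(worst)
        if i == 0:
            break
        candidate = order[:i - 1] + [worst, order[i - 1]] + order[i + 1:]
        value = wcol(adj, candidate, r)
        if value >= best:
            break  # no strict improvement: stop
        order, best = candidate, value
    return order, best
```

On a star whose center is placed last, every leaf is weakly reachable from the center; the sketch moves the center forward step by step until the largest weakly reachable set has size 2.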

Correlation of weak coloring numbers with other parameters
While it is undeniable that weak coloring numbers have immense algorithmic power from a theoretical perspective, the efficient computation of such weak coloring orders is only one component to leverage them in practice: we also need these numbers to be reasonably low. So far, this has only been established on a smaller scale [21,56] for a related measure. Here, we computed the weak coloring number for $r \in \{1, \ldots, 5\}$ for 1675 real-world networks from various sources [38,41,61,7,1]. Figure 1 summarizes our findings: for $r \in \{1, \ldots, 3\}$ we find a modest correlation with $n$ and a significant correlation with $m$. The correlation with $n$ becomes quite pronounced for $r = 5$; the probable reason being that for all networks involved $\log n \le 10$. Still, even in the worst examples $\mathrm{wcol}_5$ is at least one order of magnitude smaller than $n$ or $m$. We further see a high correlation between $\mathrm{wcol}_1$ and the average degree $\bar{d}$, which vanishes for larger radii. It is no big surprise that $\bar{d}$ and the degeneracy $\mathrm{wcol}_1$ are highly correlated, since these values are only far apart in graphs with highly inhomogeneous densities. The low dependence on the maximum degree confirms the findings of [21]: the exact shape of the degree distribution's tail is much more relevant than the singular value of the maximum degree. Finally, note that in our graphs the degeneracy $\mathrm{wcol}_1$ practically does not grow with $n$.

Uniform quasi wideness: results
Table 4 gathers aggregated data from our experiments on the medium dataset. (Full data can be downloaded from [43,3].) Every tested algorithm has been run on every test with a timeout of 10 minutes, with radii $r \in \{2, 3, 4, 5\}$, and with the starting set either $A = V(G)$ or a random subset of 20% of the vertices of $V(G)$.
The data indicate the simple heuristic, ld, as the best choice in most scenarios, as it always has the best or nearly-best total score and runs relatively quickly. The third variant of the new algorithm, new ld, has comparable results, but is inefficient and does not finish within the timeout. The other variants new1 and new2 as well as mfcs are significantly outperformed by other approaches. Out of other approaches with provable guarantees, the variants tree1, tree2, and ld it provide results in most cases less than 10% worse than the heuristic ld, with tree2 being consistently worse.
To sum up, our experiments show that the simple heuristic ld gives the best results, but if one is interested in an algorithm with provable guarantees, one should choose the variant tree1 over mfcs or new1/new2.

A lower bound to the TGV algorithm
In this section we observe that the construction of [29] shows also that the bounds of our new uniform quasi wideness algorithm of Section 4.3 are close to optimal. More precisely, we show the following corollary of the construction of [29].
Theorem 8.1. For all integers $k, r \ge 1$ and every integer $m' > c$, where $c = \binom{k+r}{r}$, there exists a graph $G_{k,r,m'}$ with the following properties:
• the treewidth of $G_{k,r,m'}$ is at most $k$;
• $\mathrm{wcol}_r(G_{k,r,m'}) = c$;
• $|V(G_{k,r,m'})| \ge (m' - 1)^c$.

[Table 4, column headers: r; algorithm; start with whole V(G): deleted, independent, score, time; start with 20% of V(G): deleted, independent, score, time.]

Apart from this slackness, the bounds in Theorem 8.1 are very similar to the ones of Theorem 4.2: to get an independent set of size $m := cm' + 1$ in a graph with $\mathrm{wcol}_r = c$ one needs a vertex set of size $(m' - 1)^c \sim (m/c)^c$ and the deletion of $c$ vertices.
Proof of Theorem 8.1. We start by recalling the construction of [29]. Fix a branching degree $d$. For every $k, r \ge 1$ let $T(k, r)$ be a rooted tree of depth $c = \binom{k+r}{r}$ and branching degree $d$. We define graphs $G(k, r)$ inductively as follows.
First, we start with T (k, r) being a spanning tree of G(k, r). We will maintain the invariant that every edge of G(k, r) connects an ancestor and a descendant in T (k, r) (i.e., G(k, r) is a subgraph of ancestor-descendant closure of T (k, r)).
For k = 1, we take G(k, r) = T (k, r). For r = 1, we take G(k, r) to be the whole ancestor-descendant closure of T (k, r), that is, we add uv to E(G(k, r)) whenever u is an ancestor of v in T (k, r). For k, r ≥ 2, note that one can equivalently construct T (k, r) as follows: start with T (k, r − 1) and for every leaf v of T (k, r − 1), create d copies of T (k − 1, r) and connect their roots to v. To define G(k, r), we proceed as follows: we start with G(k, r − 1) and for every leaf v of the spanning tree T (k, r − 1) of G(k, r − 1), we create d copies of G(k − 1, r) and make all of them fully adjacent to v.
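The inductive construction can be sketched in Python as follows. This is an illustration only, not code from [29] or from our experiments. It assumes that the depth of $T(k, r)$ counts the vertices on a root-to-leaf path, which is the reading under which the recursive description agrees with $\binom{k+r}{r} = \binom{k+r-1}{r-1} + \binom{k+r-1}{r}$; the function name and the representation (a parent array plus a set of ancestor-descendant pairs) are hypothetical.

```python
import math

def build_lower_bound_graph(k, r, d):
    """Return (parent, edges, leaves) for G(k, r) with branching degree d.

    Vertex 0 is the root of the spanning tree T(k, r), parent[v] is the tree
    parent of v, and every edge is stored as a pair (ancestor, descendant).
    """
    if k == 1 or r == 1:
        # Base cases: a complete d-ary tree with binom(k + r, r) levels
        # (assumption: "depth" counts vertices on a root-to-leaf path).
        levels = math.comb(k + r, r)
        parent, level = [None], [0]
        for _ in range(levels - 1):
            next_level = []
            for p in level:
                for _child in range(d):
                    parent.append(p)
                    next_level.append(len(parent) - 1)
            level = next_level
        edges = set()
        for v in range(1, len(parent)):
            a = parent[v]
            edges.add((a, v))
            # For r = 1 the graph is the whole ancestor-descendant closure.
            while r == 1 and parent[a] is not None:
                a = parent[a]
                edges.add((a, v))
        return parent, edges, level

    # Inductive case: start from G(k, r - 1) and attach d copies of
    # G(k - 1, r) to every leaf v, each copy fully adjacent to v.
    parent, edges, old_leaves = build_lower_bound_graph(k, r - 1, d)
    sub_parent, sub_edges, sub_leaves = build_lower_bound_graph(k - 1, r, d)
    leaves = []
    for v in old_leaves:
        for _copy in range(d):
            off = len(parent)
            parent.append(v)  # the root of the copy hangs below v in the tree
            parent.extend(p + off for p in sub_parent[1:])
            edges.update((a + off, b + off) for a, b in sub_edges)
            edges.update((v, u + off) for u in range(len(sub_parent)))
            leaves.extend(u + off for u in sub_leaves)
    return parent, edges, leaves
```

For $k = r = 2$ and $d = 2$, the sketch yields a graph on 63 vertices whose spanning tree has $\binom{4}{2} = 6$ levels.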
In [29], it is shown that the treewidth of $G(k, r)$ is $k$, and that as long as $d \ge c = \binom{k+r}{r}$, in every ordering $L$ of $V(G(k, r))$ there exists a leaf $v$ of $T(k, r)$ with every ancestor of $v$ belonging to $\mathrm{WReach}_r[G(k, r), L, v]$ (in particular, $\mathrm{wcol}_r(G(k, r)) \ge c$). We take $G(k, r, m') = G(k, r)$ for branching degree $d = m' - 1$; recall that $m' > c = \binom{k+r}{r}$. The bound on the number of vertices of $G(k, r, m')$ is straightforward. It remains to show the last property of $G(k, r, m')$.
We start by observing the following.
Claim 8.2. For every v ∈ V (G(k, r)) and its ancestor u in T (k, r), there exists a path from v to u of length at most r that traverses only vertices on the unique path from v to u in T (k, r).
Proof. We proceed by induction on k + r. For k = 1 or r = 1 the statement is straightforward. Assume then k, r ≥ 2, and recall that G(k, r) consists of G(k, r − 1) and d copies of G(k − 1, r) attached to every leaf of T (k, r − 1). If u and v both belong to G(k, r − 1) or to the same copy of G(k − 1, r), then we are done by inductive hypothesis. Otherwise, v belongs to a copy of G(k − 1, r) attached to a leaf w of T (k, r − 1), and u belongs to G(k, r − 1). By induction hypothesis, there exists a path of length at most r − 1 from w to u that uses only vertices on the path from w to u in T (k, r − 1). Together with the edge vw, this path forms the desired path from v to u.
Consequently, for every subtree T of T (k, r), every vertex of T is within distance at most r from the topmost vertex of T in G(k, r), and, consequently, the vertex set of T induces a graph of diameter at most 2r in G(k, r).
Consider now a pair of disjoint sets $B, Z \subseteq V(G)$ such that $B$ is $2r$-independent in $G - Z$. The observation from the preceding paragraph implies that every connected component of $T(k, r) - Z$ contains at most one vertex of $B$. On the other hand, the maximum degree of $T(k, r)$ is $d + 1 = m'$. Consequently, $|B| \le |Z| m' + 1$. This finishes the proof.