
Max-Distance Sparsification for Diversification and Clustering

Soh Kumabe, CyberAgent, Tokyo, Japan
Abstract

Let 𝒟 be a set family that is the solution domain of some combinatorial problem. The max-min diversification problem on 𝒟 is the problem of selecting k sets from 𝒟 such that the Hamming distance between any two selected sets is at least d. FPT algorithms parameterized by k+ℓ, where ℓ = max_{D∈𝒟} |D|, and by k+d have been actively studied recently for several specific domains.

This paper provides unified algorithmic frameworks to solve this problem. Specifically, for each parameterization k+ℓ and k+d, we provide an FPT oracle algorithm for the max-min diversification problem using oracles related to 𝒟. We then demonstrate that our frameworks provide the first FPT algorithms on several new domains 𝒟, including the domain of t-linear matroid intersection, almost 2-SAT, minimum edge s,t-flows, vertex sets of s,t-mincut, vertex sets of edge bipartization, and Steiner trees. We also demonstrate that our frameworks generalize most of the existing domain-specific tractability results.

Our main technical breakthrough is introducing the notion of a max-distance sparsifier of 𝒟: a domain on which the max-min diversification problem is equivalent to the same problem on the original domain 𝒟. The core of our framework is to design FPT oracle algorithms that construct a constant-size max-distance sparsifier of 𝒟. Using max-distance sparsifiers, we provide FPT algorithms for the max-min and max-sum diversification problems on 𝒟, as well as the k-center and k-sum-of-radii clustering problems on 𝒟, which are also natural problems in the context of diversification and are of independent interest.

Keywords and phrases:
Fixed-Parameter Tractability, Diversification, Clustering
Copyright and License:
© Soh Kumabe; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Fixed parameter tractability
Related Version:
Full Version: https://arxiv.org/abs/2411.02845
Editors:
Anne Benoit, Haim Kaplan, Sebastian Wild, and Grzegorz Herman

1 Introduction

1.1 Background and Motivation

The procedure for approaching real-world problems with optimization algorithms involves formulating the real-world motivations as mathematical problems and then solving them. However, real-world problems are complex, and the idea of a “good” solution cannot always be correctly formulated. The paradigm of diversification, introduced by Baste et al. [8] and Baste et al. [7], is a “formulation of the unformulatable problems”, which formulates diversity measures for a set of multiple solutions, rather than attempting to formulate the “goodness” of a single solution. By computing a set of solutions that maximize this measure, the algorithm provides effective options to evaluators who have the correct criteria for judging the “goodness” of a solution.

Let U be a finite set, k≥1, and d≥0. Let 𝒟⊆2^U be the feasible domain of some combinatorial problem. The following problem frameworks, defined by two types of diversity measures, have been studied extensively.

Max-Min Diversification Problem on 𝒟: Does there exist a k-tuple (D_1,…,D_k)∈𝒟^k of sets in 𝒟 such that min_{1≤i<j≤k} |D_i △ D_j| ≥ d?¹

¹ We define Z_1 △ Z_2 := (Z_1 ∖ Z_2) ∪ (Z_2 ∖ Z_1).

Max-Sum Diversification Problem on 𝒟: Does there exist a k-tuple (D_1,…,D_k)∈𝒟^k of sets in 𝒟 such that ∑_{1≤i<j≤k} |D_i △ D_j| ≥ d?

These problems ensure diversity by aiming to output solutions that are as dissimilar as possible in terms of Hamming distance.
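As a concrete reference point, both decision problems can be checked by brute force when 𝒟 is small and given explicitly. The following minimal Python sketch (our illustration; function names are not from the paper) encodes sets as frozensets, so the Hamming distance is the size of the symmetric difference.

```python
from itertools import combinations

def hamming(a, b):
    # Hamming distance of indicator vectors = size of the symmetric difference
    return len(a ^ b)

def max_min_diverse(domain, k, d):
    """Return k sets with pairwise distance >= d, or None (assuming d >= 1,
    any feasible k-tuple consists of distinct sets)."""
    for cand in combinations(domain, k):
        if all(hamming(a, b) >= d for a, b in combinations(cand, 2)):
            return cand
    return None

def max_sum_diverse(domain, k, d):
    """Return k sets whose summed pairwise distance is >= d, or None."""
    for cand in combinations(domain, k):
        if sum(hamming(a, b) for a, b in combinations(cand, 2)) >= d:
            return cand
    return None
```

Both searches enumerate all of 𝒟^k, which is exactly the exponential behavior the frameworks in this paper avoid.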

Parameterized algorithms for diversification problems have been actively studied. In particular, FPT algorithms for the max-min diversification problem parameterized by k+ℓ, where ℓ = max_{D∈𝒟} |D| [7, 8, 22, 26, 31], as well as by k+d [17, 21, 22, 23, 24, 29], have been the focus of research. Since assuming d ≤ 2ℓ does not lose generality in the max-min diversification problem, the latter addresses a more general situation than the former. Since the max-sum diversification problem is empirically more tractable than the max-min diversification problem, we restrict our discussion to the max-min diversification problem for the time being.

This research provides general algorithmic frameworks for FPT algorithms solving max-min (and max-sum) diversification problems for both parameterizations k+ℓ and k+d. Our frameworks are very general and can be applied to all domains [5, 6, 7, 8, 17, 21, 22, 23, 25, 26, 31] for which FPT algorithms parameterized by k+ℓ or k+d are currently known in the case where the diversity measure is defined using an unweighted Hamming distance. Moreover, our frameworks provide the first algorithms for several domains where such algorithms were previously unknown.

Our main technical breakthrough is introducing the notion of a max-distance sparsifier as an intermediate object, which, for the max-min diversification problem, essentially works as a coreset [2]. The formal definition is given in Section 1.3. The critical fact is that, when 𝒦 is a max-distance sparsifier of 𝒟, the max-min diversification problem on 𝒟 is equivalent to the same problem on 𝒦. Our framework constructs a constant-size max-distance sparsifier 𝒦 of 𝒟 using the oracles on 𝒟, enabling us to solve the max-min diversification problem on 𝒟 by brute-force search on 𝒦.

The power of max-distance sparsification is not limited to solving diversification problems. Specifically, the following k-center [27] and k-sum-of-radii [11] clustering problems on 𝒟 can also be solved via max-distance sparsification.

𝒌-Center Clustering Problem on 𝒟: Does there exist a k-tuple of subsets (D_1,…,D_k)∈𝒟^k such that for all D∈𝒟, there exists an i∈{1,…,k} satisfying |D_i △ D| ≤ d?

𝒌-Sum-of-Radii Clustering Problem on 𝒟: Does there exist a k-tuple of subsets (D_1,…,D_k)∈𝒟^k and a k-tuple of non-negative integers (d_1,…,d_k)∈ℤ_{≥0}^k with ∑_{i∈{1,…,k}} d_i ≤ d such that for all D∈𝒟, there exists an i∈{1,…,k} satisfying |D_i △ D| ≤ d_i?
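For intuition, both clustering questions can likewise be decided by exhaustive search when 𝒟 is explicit and tiny. The sketch below is our illustration, not the paper's algorithm: it brute-forces the centers, and for k-sum-of-radii also brute-forces the assignment of domain sets to centers.

```python
from itertools import combinations, product

def hamming(a, b):
    return len(a ^ b)

def solve_k_center(domain, k, d):
    """Find k centers from the domain covering every set within radius d."""
    for centers in combinations(domain, k):
        if all(min(hamming(c, D) for c in centers) <= d for D in domain):
            return centers
    return None

def solve_k_sum_of_radii(domain, k, d):
    """Find k centers and radii summing to at most d that cover the domain."""
    for centers in combinations(domain, k):
        # try every assignment of domain sets to the k centers
        for assign in product(range(k), repeat=len(domain)):
            radii = [0] * k
            for D, i in zip(domain, assign):
                radii[i] = max(radii[i], hamming(centers[i], D))
            if sum(radii) <= d:
                return centers, radii
    return None
```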

When 𝒟 is an explicitly given set of points, parameterized algorithms for these problems have been extensively studied in the area of clustering [3, 4, 12, 15, 16, 20, 28]. Furthermore, approximation algorithms have been investigated for the relational k-means [13, 19, 30] and relational k-center [1] problems, which are the k-means and k-center clustering problems defined on a point set represented as a join of given relational databases. Their setting is similar to ours in that the point set is given implicitly and its size can be exponential. This research adds several new combinatorial domains to the literature on clustering problems on implicitly given domains, and also introduces a parameterized view. These problems are also natural in the context of diversification: in real situations, the concept of diversity often means that the extracted elements cover the entire space comprehensively rather than being mutually dissimilar. This motivation is formulated by the clustering problems, which extract a list of sets in 𝒟 such that every set in 𝒟 has an extracted set near it.

1.2 Our Results

This paper consists of two parts. In the first part, we design general frameworks for solving diversification and clustering problems. In the second part, we apply our frameworks to several specific domains 𝒟. Table 1 shows a list of domains on which our frameworks provide the first or an improved algorithm for the max-min diversification problem.

1.2.1 The Frameworks

We define the following (−1,1)-optimization oracle on 𝒟 and exact extension oracle on 𝒟.

(−𝟏,𝟏)-Optimization Oracle on 𝒟: Let U be a finite set, 𝒟⊆2^U, and w∈{−1,1}^U be a weight vector. Return a set D∈𝒟 that maximizes ∑_{e∈D} w_e.

Exact Extension Oracle on 𝒟: Let U be a finite set, r≥0, 𝒟⊆2^U, and C∈𝒟. Let X,Y⊆U be two disjoint subsets of U. If there exists a set D∈𝒟 such that |D △ C| = r, X⊆D, and Y∩D=∅, return one such set. If no such set exists, return ⊥.

When C=∅ and X=∅, we specifically call the exact extension oracle on 𝒟 the exact empty extension oracle on 𝒟.

Exact Empty Extension Oracle on 𝒟: Let U be a finite set, r≥0, 𝒟⊆2^U, and Y⊆U. If there exists a set D∈𝒟 such that |D|=r and Y∩D=∅, return one such set. If no such set exists, return ⊥.
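For an explicitly given (small) 𝒟, both oracles admit a one-loop brute-force realization. The sketch below is our illustration of the interface only, with ⊥ rendered as None; the point of the frameworks is precisely that on interesting domains these oracles can be implemented without enumerating 𝒟.

```python
def exact_extension(domain, r, C, X, Y):
    """Exact extension oracle: return some D in the domain with
    |D symdiff C| = r, X a subset of D, and D disjoint from Y;
    None plays the role of the answer "⊥"."""
    for D in domain:
        if len(D ^ C) == r and X <= D and not (D & Y):
            return D
    return None

def exact_empty_extension(domain, r, Y):
    """Special case C = X = empty set: some D with |D| = r avoiding Y."""
    return exact_extension(domain, r, frozenset(), frozenset(), Y)
```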

Let 𝒫_𝒟 be the max-min/max-sum diversification problem or the k-center/k-sum-of-radii clustering problem on 𝒟. Our main result is FPT algorithms for solving 𝒫_𝒟 using these oracles. We construct frameworks for both types of parameterizations, k+ℓ and k+d. The result for the parameterization by k+ℓ is as follows.

Theorem 1.

There exists an oracle algorithm solving 𝒫_𝒟 using the exact empty extension oracle on 𝒟, where the number of oracle calls and the time complexity are both FPT parameterized by k+ℓ, and for each call of the oracle, r and |Y| are bounded by constants that depend only on k+ℓ.

The result for the parameterization by k+d is as follows.

Theorem 2.

There exists a randomized oracle algorithm solving 𝒫_𝒟 using the (−1,1)-optimization oracle on 𝒟 and the exact extension oracle on 𝒟, where the number of oracle calls and the time complexity are both FPT parameterized by k+d, and for each call of the exact extension oracle, r+|X|+|Y| is bounded by a constant that depends only on k+d.

Table 1: List of new results for the max-min diversification problem obtained by our frameworks. The first column represents the domain 𝒟; the second column represents the parameterization. For the formal definition of each domain, see the full version.

Domain | Parameter
t-Linear Matroid Intersection | k+ℓ+t
Almost 2-SAT | k+ℓ
Independent Set on Certain Graphs | k+ℓ
Min Edge s,t-Flow | k+d
Steiner Tree | k+d+|T|
Vertex Set of Min s,t-Cut | k+d
Vertex Set of Edge Bipartization | k+d+s

1.2.2 Applications of Theorem 1

On most domains 𝒟, the exact empty extension oracle can be designed in almost the same way as an algorithm that extracts a single solution from 𝒟. For example, consider the case where 𝒟 is the ℓ-path domain, i.e., the domain of edge sets of paths of length ℓ. In this case, the exact empty extension oracle on 𝒟 is equivalent to the problem of finding an ℓ-path in the graph obtained by removing all edges in Y from the input. Combining Theorem 1 with this empirical fact, we can claim that, for most domains 𝒟, the diversification and clustering problems on 𝒟 parameterized by k+ℓ are as easy as determining the non-emptiness of 𝒟.
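For instance, a brute-force version of this oracle for the ℓ-path domain can be sketched as follows (our illustration; a real FPT implementation would use a technique such as color coding rather than plain DFS, whose worst case is exponential). Edges are modeled as 2-element frozensets and Y as a set of forbidden edges.

```python
def find_l_path(adj, l, forbidden):
    """Return the edge set of a simple path with exactly l edges that
    avoids the forbidden edges, or None if no such path exists."""
    def dfs(v, visited, path_edges):
        if len(path_edges) == l:
            return set(path_edges)
        for u in adj.get(v, ()):
            e = frozenset((v, u))
            if u in visited or e in forbidden:
                continue  # keep the path simple and respect Y
            found = dfs(u, visited | {u}, path_edges + [e])
            if found is not None:
                return found
        return None
    for start in adj:
        found = dfs(start, {start}, [])
        if found is not None:
            return found
    return None
```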

To demonstrate that Theorem 1 yields existing tractability results, we design the oracles for the domains of the vertex cover [7, 8], t-hitting set [7], feedback vertex set [7], and common independent set of two matroids [17, 22]. We also apply our framework on new domains: t-represented linear matroid intersection, almost 2-SAT, and independent set on subgraph-closed IS-FPT graph classes. Here, a graph class is subgraph-closed IS-FPT if it is closed under taking subgraphs and the problem of finding an independent set of size ℓ is FPT parameterized by ℓ. The following theorem summarizes our results, where the precise definitions of each domain are given in the full version.

Theorem 3.

Let 𝒟 be the domain of vertex covers, t-hitting sets, feedback vertex sets, t-represented linear matroid intersections, almost 2-SATs, or independent sets on subgraph-closed IS-FPT graph classes. Then, the max-min and max-sum diversification problems and the k-center and k-sum-of-radii clustering problems admit FPT algorithms, where the parameterization is k+ℓ except for t-hitting set and t-represented linear matroid intersection, which are parameterized by k+ℓ+t.

Theorem 1 also generalizes existing frameworks for diversification. Baste et al. [8] provided an algorithmic framework for diversification using a loss-less kernel [9, 10], which, roughly speaking, is a kernel that completely preserves the information of the solution space. Since loss-less kernels are known for very limited domains, their framework requires very strong assumptions. Our framework has broader applicability than theirs because it relies on a weaker oracle, as the exact empty extension oracle can be constructed using a loss-less kernel. Hanaka et al. [26] developed a color-coding-based framework for diversification. The oracle they use can be regarded as the exact empty extension oracle with an additional colorfulness constraint. Our framework again has broader applicability than theirs because our oracle can be constructed using theirs. Moreover, our framework also treats clustering problems, which these two do not.

1.2.3 Applications of Theorem 2

FPT algorithms for the max-min diversification problem on 𝒟 parameterized by k+d are known for the cases where 𝒟 is the family of matroid bases [17, 22], perfect matchings [22], and shortest paths [23]. The result for perfect matchings was later extended to matchings of specified size, not necessarily perfect [17]. Additionally, for the cases where 𝒟 is the family of interval schedulings [26] and the longest common subsequences of an absolute constant number of strings [31], FPT algorithms parameterized by k+ℓ are known. Both of these can be generalized to the domain of dynamic programming problems, which we define in the full version. Furthermore, the problem of finding a pair of a branching and an in-branching whose Hamming distance is at least d has been investigated under the name of the d-distinct branchings problem [5, 6, 25], which admits an FPT algorithm parameterized by d. This problem can naturally be extended to the case of selecting k_1 branchings and k_2 in-branchings, rather than one each. We give FPT algorithms parameterized by k+d for all those problems, where k=k_1+k_2 for the extended version of the d-distinct branchings problem. We also give FPT algorithms on the domains of minimum edge s,t-flows, Steiner trees, vertex sets of s,t-mincuts, and vertex sets of edge bipartizations, domains for which no FPT algorithm for the max-min diversification problem was previously known. Note that the domain of shortest paths [23] is a special case of both the minimum edge s,t-flow domain and the minimum Steiner tree domain. The following theorem summarizes our results, where the precise definitions of each domain are given in the full version.

Theorem 4.

Let 𝒟 be the domain of matroid bases, branchings, matchings of specified size, minimum edge s,t-flows, minimum Steiner trees, vertex sets of s,t-mincuts, vertex sets of edge bipartizations, or dynamic programming problems. Then, the max-min and max-sum diversification problems and the k-center and k-sum-of-radii clustering problems admit FPT algorithms, where the parameterization is k+d except for Steiner trees, which are parameterized by k+d+|T| for the terminal set T, and edge bipartization, which is parameterized by k+d+s, where s is the minimum number of edges to be removed to make the given graph bipartite. Furthermore, the extended version of the d-distinct branchings problem also admits an FPT algorithm parameterized by k+d.

Eiben et al. [17] provided a technique called determinantal sieving, a general tool to obtain and speed up parameterized algorithms, including those for diversification problems. In particular, they provided a framework to solve the diversification problem using an oracle that, roughly speaking, counts the number of solutions modulo 2. Using their framework, they improved the running times of FPT algorithms for max-min diversification problems on matchings and matroid bases, as well as the d-distinct branchings problem. Although not stated explicitly, their framework seems to yield FPT algorithms parameterized by k+d when 𝒟 is the dynamic programming domain, thereby improving the parameterizations in the results of [26] and [31], respectively, as well as for the extended version of the d-distinct branchings problem. We are not sure whether our framework generalizes theirs, that is, whether our oracle can be constructed using their oracle. However, we strongly believe that our framework has broader applicability because their framework assumes counting oracles, and counting is often hard even modulo 2. In contrast, our framework uses optimization-type oracles, which are generally more tractable than counting. Indeed, our framework provides an FPT algorithm with the same parameterization for every domain they explicitly considered. Moreover, we do not think their framework can give an FPT algorithm for the max-min diversification problem on the domains of minimum edge s,t-flows, vertex sets of minimum s,t-cuts, and vertex sets of edge bipartizations. Furthermore, our framework can be applied not only to diversification problems but also to clustering problems.

1.3 Framework Overview

In this section, we provide an overview of the entire flow of our frameworks. Our frameworks first construct max-distance sparsifiers using the corresponding oracles and then solve diversification and clustering problems using them.

1.3.1 From 𝒅-Limited 𝒌-Max-Distance Sparsifier to Diversification and Clustering

We begin by defining the max-distance sparsifiers. The key for Theorem 1 is designing the following k-max-distance sparsifier of 𝒟.

Definition 5 (k-max-distance sparsifier).

Let k≥1. Let U be a finite set and 𝒟,ℱ⊆2^U. We say that 𝒦⊆𝒟 is a k-max-distance sparsifier of 𝒟 with respect to ℱ if for any (F_1,…,F_k)∈ℱ^k and (z_1,…,z_k)∈ℤ_{≥0}^k, the two conditions

  • There exists D∈𝒟 such that for each i∈{1,…,k}, |F_i △ D| ≥ z_i.

  • There exists K∈𝒦 such that for each i∈{1,…,k}, |F_i △ K| ≥ z_i.

are equivalent. Unless specifically noted, when we write k-max-distance sparsifier of 𝒟, we mean the case where ℱ=𝒟.

Similarly, the key for Theorem 2 is designing the following d-limited k-max-distance sparsifier of 𝒟.

Definition 6 (d-limited k-max-distance sparsifier).

Let k≥1 and d≥0. Let U be a finite set and 𝒟,ℱ⊆2^U. We say that 𝒦⊆𝒟 is a d-limited k-max-distance sparsifier of 𝒟 with respect to ℱ if for any (F_1,…,F_k)∈ℱ^k and (z_1,…,z_k)∈{0,…,d}^k, the two conditions

  • There exists D∈𝒟 such that for each i∈{1,…,k}, |F_i △ D| ≥ z_i.

  • There exists K∈𝒦 such that for each i∈{1,…,k}, |F_i △ K| ≥ z_i.

are equivalent. Unless specifically noted, when we write d-limited k-max-distance sparsifier of 𝒟, we mean the case where ℱ=𝒟.

The difference between the two sparsifiers is that the domain of (z_1,…,z_k) is ℤ_{≥0}^k in the former case, while it is {0,…,d}^k in the latter. By definition, any k-max-distance sparsifier is also a d-limited k-max-distance sparsifier for any d≥0. We can prove that, given a d-limited (k−1)-max-distance sparsifier of 𝒟 with size bounded by a constant that depends only on k+d, we can construct FPT algorithms parameterized by k+d for the max-min/max-sum diversification problems on 𝒟. Similarly, we can prove that, given a (d+1)-limited k-max-distance sparsifier of 𝒟 with size bounded by a constant that depends only on k+d, we can construct FPT algorithms parameterized by k+d for the k-center/k-sum-of-radii clustering problems on 𝒟. Therefore, to prove Theorems 1 and 2, it suffices to construct FPT oracle algorithms for designing k-max-distance sparsifiers and d-limited k-max-distance sparsifiers, respectively.
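Definition 6 can be verified mechanically on toy instances. The following brute-force checker (our illustration; only feasible for tiny families) tests whether a candidate 𝒦 is a d-limited k-max-distance sparsifier of 𝒟 with respect to ℱ by enumerating all reference tuples and thresholds.

```python
from itertools import product

def is_d_limited_sparsifier(K, domain, F, k, d):
    """Check Definition 6 by brute force: for every k-tuple of reference
    sets from F and every threshold vector in {0,...,d}^k, the domain and
    the candidate sparsifier must agree on satisfiability."""
    def satisfiable(family, Fs, zs):
        return any(all(len(Fi ^ X) >= z for Fi, z in zip(Fs, zs))
                   for X in family)
    for Fs in product(F, repeat=k):
        for zs in product(range(d + 1), repeat=k):
            if satisfiable(domain, Fs, zs) != satisfiable(K, Fs, zs):
                return False
    return True
```

For example, for 𝒟 = {{1},{2}} with k=1 and d=1, neither singleton alone is a sparsifier, since each fails the threshold z=1 against itself.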

1.3.2 Computing 𝒌-Max-Distance Sparsifier

The remaining task towards Theorem 1 is to provide an FPT algorithm parameterized by k+ℓ that constructs a k-max-distance sparsifier of 𝒟 with size bounded by a constant that depends only on k+ℓ. The key lemma toward this is that, if 𝒦 contains a sufficiently large sunflower (see Section 3 for the definition) consisting of sets of the same size, then we can safely remove one of them from 𝒦 while preserving the property that 𝒦 is a k-max-distance sparsifier (actually, for the sake of simplifying the framework, we prove a slightly stronger statement). Starting with 𝒦=𝒟 and exhaustively removing such sets leads to a 𝒦 that is still a k-max-distance sparsifier of 𝒟, and by the well-known sunflower lemma (Lemma 13), its size is bounded by a constant.

However, this observation is still not sufficient to obtain an FPT algorithm. The reason is that 𝒟 generally has exponential size, and removing sets one by one would require an exponential number of steps. Instead, our algorithm starts with 𝒦=∅ and exhaustively adds sets of 𝒟 to 𝒦 until 𝒦 becomes a k-max-distance sparsifier. In this way, the number of steps is bounded by a constant. The remaining task is to choose a set to be added at each step. For this task, we design an FPT algorithm using a constant number of calls of the exact empty extension oracle.
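On an explicit domain, this add-until-saturated loop can be mimicked directly. The sketch below is our illustration only: the actual framework finds each new set via the exact empty extension oracle rather than by scanning 𝒟, and works with a slightly stronger removal condition.

```python
from itertools import combinations

def has_Z_sunflower(family, Z, t):
    """Is there a (Z,t)-sunflower in the family: t sets of size |Z| whose
    pairwise intersections all equal a common core contained in Z?"""
    same_size = [set(S) for S in family if len(S) == len(Z)]
    for group in combinations(same_size, t):
        core = set.intersection(*group)
        if core <= Z and all(a & b == core
                             for a, b in combinations(group, 2)):
            return True
    return False

def sparsify(domain, k, r):
    """Greedily add sets of the domain that do not yet close a
    (Z, k*r+1)-sunflower in the current family."""
    K = []
    changed = True
    while changed:
        changed = False
        for Z in domain:
            if Z in K:
                continue
            if not has_Z_sunflower(K, set(Z), k * r + 1):
                K.append(Z)
                changed = True
    return K
```

For example, over the five singletons {0},…,{4} with k=r=1, any two already-added singletons form a 2-sunflower with empty core, so the loop keeps only two sets.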

1.3.3 Computing 𝒅-Limited 𝒌-Max-Distance Sparsifier

The algorithm in the previous section alone is insufficient to prove Theorem 2, since ℓ is unbounded and the sunflower-lemma-based bound on the number of steps cannot be used. Our algorithm divides 𝒟 into at most k clusters, computes a d-limited k-max-distance sparsifier for each cluster, and outputs their union. Let p>2d be a constant that depends only on k+d. We first find 𝒞⊆𝒟 satisfying the following properties: (i) |𝒞|≤k, (ii) for all distinct C,C′∈𝒞, |C △ C′|>2d, and (iii) for all D∈𝒟, there exists C∈𝒞 such that |D △ C|≤p. If such a family does not exist, a trivial d-limited k-max-distance sparsifier of 𝒟 will be found, and we output it and terminate.

We provide an algorithm for computing 𝒞. Our algorithm starts with 𝒞=∅ and exhaustively adds sets in 𝒟 to 𝒞 until 𝒞 satisfies the above conditions or its size exceeds k. To choose the elements to be added, we randomly sample w∈{−1,1}^U and call the (−1,1)-optimization oracle. We can prove, for a sufficiently large constant p, that if there exists D∈𝒟 with |D △ C|>p for all C∈𝒞, then with constant probability the (−1,1)-optimization oracle finds a D′∈𝒟 such that |D′ △ C|>2d for all C∈𝒞. Thus, if 𝒞 does not meet the conditions, by calling the (−1,1)-optimization oracle a sufficient number of times, we can find a set to add to 𝒞 with high probability.
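The random-weight step can be sketched as follows (our illustration: a brute-force maximizer stands in for the (−1,1)-optimization oracle, and the trial count is a placeholder, not the constant from the analysis).

```python
import random

def brute_force_optimize(domain, w):
    """Stand-in for the (-1,1)-optimization oracle on an explicit domain."""
    return max(domain, key=lambda D: sum(w[e] for e in D))

def find_far_set(domain, universe, C_list, d, trials=100, seed=0):
    """Sample random {-1,+1} weights and keep any maximizer that is more
    than 2d away from every set in C_list; None plays the role of "⊥"."""
    rng = random.Random(seed)
    for _ in range(trials):
        w = {e: rng.choice((-1, 1)) for e in universe}
        D = brute_force_optimize(domain, w)
        if all(len(D ^ C) > 2 * d for C in C_list):
            return D
    return None
```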

Here, we provide an algorithm for computing a d-limited k-max-distance sparsifier of 𝒟 using 𝒞. For each C∈𝒞, define the cluster 𝒟_C := {D∈𝒟 : |D △ C| ≤ p} and let 𝒟′_C := {D △ C : D∈𝒟_C}. The algorithm computes a k-max-distance sparsifier of each 𝒟′_C and outputs their union. For technical reasons, we actually compute a slightly more general object, but we will not delve into the details here. Since each 𝒟′_C consists only of sets of size at most p, the k-max-distance sparsifier of 𝒟′_C can be constructed using the algorithm of Section 1.3.2. The exact empty extension oracle on 𝒟′_C corresponds to the exact extension oracle on 𝒟.

Here, we note the difference between our framework and those used by Fomin et al. [22] and Funayama et al. [23] to provide FPT algorithms for the max-min diversification problem on 𝒟 when 𝒟 is the family of perfect matchings and shortest paths, respectively. Their algorithms also start by dividing 𝒟 into clusters. However, their algorithms perform stricter clustering than ours. Specifically, in their clustering, the clusters 𝒟_C corresponding to different C∈𝒞 must be well-separated. In contrast, we allow clusters to overlap. This simplifies the clustering step compared to their approach, at the cost of a more challenging task afterward. We resolve this more challenging task by introducing and designing the d-limited k-max-distance sparsifier.

1.4 Organization

The rest of this paper is organized as follows. In Section 2, we provide FPT algorithms for solving diversification and clustering problems on 𝒟 using a constant-size d-limited k-max-distance sparsifier of 𝒟. In Section 3, we prove Theorem 1 by providing an FPT oracle algorithm parameterized by k+ that computes a constant-size k-max-distance sparsifier of 𝒟. In Section 4, we prove Theorem 2 by providing an FPT oracle algorithm parameterized by k+d that computes a constant-size d-limited k-max-distance sparsifier of 𝒟. The discussion in Section 4 internally uses the results from Section 3. In the full version, we apply the results of Theorems 1 and 2 to several domains 𝒟 to obtain FPT algorithms for diversification and clustering problems. The full version also contains further related work and the proofs of the lemmas marked with an asterisk.

2 From Sparsifier to Diversification and Clustering

In this section, we provide FPT algorithms for diversification and clustering problems using a d-limited k-max-distance sparsifier of constant size.

2.1 Diversification

Let U be a finite set, d≥0, k≥1, and 𝒟⊆2^U. For diversification problems, we have the following.

Lemma 7.

Let 𝒦⊆𝒟 be a d-limited (k−1)-max-distance sparsifier of 𝒟 and (D_1,…,D_k)∈𝒟^k. Then, there is a k-tuple (K_1,…,K_k)∈𝒦^k such that min(d,|D_i △ D_j|) ≤ min(d,|K_i △ K_j|) holds for all 1≤i<j≤k. In particular, if there exists a solution to the max-min/max-sum diversification problem on 𝒟, then there exists a solution consisting only of sets in 𝒦.

Proof.

Assume (D_1,…,D_k)∈𝒟^k∖𝒦^k and let i∈{1,…,k} be an index such that D_i∉𝒦. It is sufficient to prove that there is a set K_i∈𝒦 such that min(d,|D_i △ D_j|) ≤ min(d,|K_i △ D_j|) holds for all j∈{1,…,k}∖{i}. For j∈{1,…,k}∖{i}, let z_j := min(d,|D_i △ D_j|). Since 𝒦 is a d-limited (k−1)-max-distance sparsifier of 𝒟, there exists K_i∈𝒦 such that min(d,|K_i △ D_j|) ≥ min(d,z_j) = min(d,|D_i △ D_j|) holds for all j∈{1,…,k}∖{i}. ◻

Considering an algorithm that exhaustively searches over subfamilies of 𝒦 of size k, we can state the following.

Lemma 8.

Assume there exists an FPT algorithm parameterized by k+d to compute a d-limited (k1)-max-distance sparsifier of 𝒟 with size bounded by a constant that depends only on k+d. Then, there exists an FPT algorithm parameterized by k+d for the max-min/max-sum diversification problem on 𝒟.

We note that, under a slightly stronger assumption, the discussion in this section directly extends to the case where the sets D_1,…,D_k are taken from different domains. Specifically, let 𝒟_1,…,𝒟_k⊆2^U and assume that (k−1)-max-distance sparsifiers 𝒦_1,…,𝒦_k of these domains with respect to 2^U have been computed. Then, we can determine whether there exists a k-tuple (D_1,…,D_k)∈𝒟_1×⋯×𝒟_k such that min_{1≤i<j≤k}|D_i △ D_j| ≥ d (or ∑_{1≤i<j≤k}|D_i △ D_j| ≥ d) by exhaustive search on 𝒦_1×⋯×𝒦_k.

2.2 Clustering

Let U be a finite set, d≥0, k≥1, and 𝒟⊆2^U. Here, we provide an FPT algorithm for the k-center and k-sum-of-radii clustering problems using a (d+1)-limited k-max-distance sparsifier 𝒦 of 𝒟. For Z⊆U and r≥0, the ball of radius r centered at Z is defined as ℬ(Z,r) := {Z′⊆U : |Z △ Z′| ≤ r}. The algorithm first guesses a partition of 𝒦 into k clusters 𝒦_1,…,𝒦_k. Since |𝒦| is constant, the cost of this guess is constant. Then, for each i∈{1,…,k}, the algorithm computes the minimum radius r_i such that there is a set D_i∈𝒟 satisfying 𝒦_i ⊆ ℬ(D_i,r_i). If r_i>d, the algorithm asserts this instead of computing the specific value of r_i. The k-center and k-sum-of-radii clustering problems on 𝒟 are then solved by checking whether the maximum and the sum, respectively, of the r_i's is at most d. We show the correctness of this algorithm by proving the following.

Lemma 9.

Let (D_1,…,D_k)∈𝒟^k and (r_1,…,r_k)∈{0,…,d}^k. Assume 𝒦_i ⊆ ℬ(D_i,r_i) holds for all i∈{1,…,k}. Then, for all D∈𝒟, there is an index i∈{1,…,k} such that D∈ℬ(D_i,r_i).

Proof.

Assume the contrary. Then, there is a set D∈𝒟 such that for all i∈{1,…,k}, |D_i △ D| ≥ r_i+1. Since 𝒦 is a (d+1)-limited k-max-distance sparsifier of 𝒟, there is a set K∈𝒦 such that for all i∈{1,…,k}, |D_i △ K| ≥ r_i+1. Hence, K∉𝒦_i for all i∈{1,…,k}, contradicting the fact that (𝒦_1,…,𝒦_k) is a partition of 𝒦. ◻

We now provide an algorithm to decide, for each i∈{1,…,k} and r_i∈{0,…,d}, whether there exists D∈𝒟 with 𝒦_i ⊆ ℬ(D,r_i). If the domain 𝒟 is 2^U, this problem is equivalent to the closest string problem on binary strings, for which a textbook FPT algorithm parameterized by d+|𝒦_i| is known [14]. Our algorithm is a modified version of this. An element e∈U is bad if there exist both K∈𝒦_i with e∈K and K′∈𝒦_i with e∉K′. The following lemma is fundamental.

Lemma 10 ([14]).

If there are more than d·|𝒦_i| bad elements, then no D∈𝒟 satisfies 𝒦_i ⊆ ℬ(D,d).

Let B be the set of bad elements, and assume |B| ≤ d·|𝒦_i|. The algorithm first guesses a subset B′⊆B; the cost of this guess is 2^{d·|𝒦_i|}. Then, it determines whether there exists D∈𝒟 such that D∩B=B′ and 𝒦_i ⊆ ℬ(D,r_i). Let K* := argmax_{K∈𝒦_i} |(K∩B) △ B′|. Then, we can claim the following.

Lemma 11.

For every D∈𝒟 such that D∩B=B′, we have max_{K∈𝒦_i} |K △ D| = |K* △ D|.

Proof.

Let K∈𝒦_i. Then, |K △ D| = |(K∩B) △ (D∩B)| + |(K∖B) △ (D∖B)|. From the definition of B, the value of |(K∖B) △ (D∖B)| is equal among all K∈𝒦_i. Thus, the maximum value of |K △ D| over K∈𝒦_i is achieved by the set K* that maximizes |(K∩B) △ (D∩B)| = |(K∩B) △ B′|. ◻

Now, it is sufficient to solve the problem of determining whether there exists D∈𝒟 such that D∩B=B′ and |K* △ D| ≤ r_i. This corresponds to the exact extension oracle on 𝒟 with r∈{0,…,r_i}, X=B′, Y=B∖B′, and C=K*. Therefore, we can claim the following:

Lemma 12.

Assume there exist an FPT algorithm parameterized by k+d that computes a (d+1)-limited k-max-distance sparsifier of 𝒟 with size bounded by a constant that depends only on k+d, and an exact extension oracle on 𝒟 whose time complexity is FPT parameterized by r+|X|+|Y|. Then, there exists an FPT algorithm parameterized by k+d for the k-center/k-sum-of-radii clustering problem on 𝒟.
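The closest-string-style subroutine behind Lemmas 10 and 11 boils down to two small computations, sketched here for an explicit family (our illustration; the function names are ours):

```python
def bad_elements(Ki):
    """Elements on which the sets of Ki disagree (Lemma 10's bad elements):
    contained in some set of Ki but not in all of them."""
    sets = [set(K) for K in Ki]
    union = set().union(*sets)
    inter = set.intersection(*sets)
    return union - inter

def best_reference(Ki, B, B_prime):
    """K* = argmax over K in Ki of |(K ∩ B) symdiff B'| (Lemma 11): for any
    D with D ∩ B = B', this K* attains max over K of |K symdiff D|."""
    return max(Ki, key=lambda K: len((set(K) & B) ^ B_prime))
```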

3 Framework for 𝒌-Max-Distance Sparsification

In this section, we complete the proof of Theorem 1 by providing an FPT algorithm that uses the exact empty extension oracle on 𝒟 to obtain a k-max-distance sparsifier of 𝒟. For further use, we show a slightly more extended result. Let r≥0. We construct a k-max-distance sparsifier of 𝒟 with respect to ℬ(∅,r) for r≥ℓ (recall that ℬ(Z,r) = {Z′⊆U : |Z △ Z′| ≤ r}). Since 𝒟 ⊆ ℬ(∅,ℓ) ⊆ ℬ(∅,r) for r≥ℓ, this is also a k-max-distance sparsifier of 𝒟 (with respect to 𝒟). A set family 𝒮:={S_1,…,S_t} is called a sunflower if there exists a set C, called the core, such that S_i∩S_j=C for all 1≤i<j≤t. The following is well-known.

Lemma 13 (Sunflower Lemma [14, 18]).

Let U be a finite set, ℓ,t≥0, and 𝒦⊆2^U be a family consisting only of sets of size at most ℓ. If |𝒦| > ℓ!(t−1)^ℓ, then 𝒦 contains a sunflower of size t.
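The sunflower lemma's proof is constructive, and the construction can be sketched as follows (our illustration; below the lemma's size threshold it may legitimately return None even when a sunflower exists):

```python
def find_sunflower(family, t):
    """Either return t sets forming a sunflower, or None. Follows the
    classic proof: a maximal pairwise-disjoint subfamily of size >= t is
    a sunflower with empty core; otherwise some element is frequent, and
    we recurse on the sets containing it (with that element removed)."""
    family = list({frozenset(S) for S in family})
    disjoint = []
    for S in family:
        if all(not (S & T) for T in disjoint):
            disjoint.append(S)
    if len(disjoint) >= t:
        return disjoint[:t]  # petals with empty core
    counts = {}
    for S in family:
        for e in S:
            counts[e] = counts.get(e, 0) + 1
    if not counts:
        return None
    e = max(counts, key=counts.get)  # a most frequent element
    rest = find_sunflower([S - {e} for S in family if e in S], t)
    if rest is None:
        return None
    return [S | {e} for S in rest]  # e joins the common core
```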

For t≥0, 𝒯⊆2^U, and Z∈2^U∖𝒯, a sunflower 𝒮⊆𝒯 is a (Z,t)-sunflower of 𝒯 if it satisfies the following three conditions.

  • |𝒮|=t,

  • for each S∈𝒮, |S|=|Z|, and

  • the core of 𝒮 is a subset of Z.

The following lemma is the core of our framework.

Lemma 14 (*).

Let U be a finite set, 𝒟⊆2^U, and 𝒦⊆𝒟 be a k-max-distance sparsifier of 𝒟 with respect to ℬ(∅,r). Let Z∈𝒦 and assume there is a (Z,kr+1)-sunflower 𝒮 of 𝒦∖{Z}. Then, 𝒦∖{Z} is also a k-max-distance sparsifier of 𝒟 with respect to ℬ(∅,r).

Algorithm 1 k-max-distance sparsification of 𝒟.

Our algorithm is given in Algorithm 1, where ExactEmptyExtension(r,Y) represents the exact empty extension oracle on 𝒟 with arguments r and Y. The algorithm starts with 𝒦:=∅ and repeatedly adds to 𝒦 a set Z∈𝒟∖𝒦 such that there is no (Z,kr+1)-sunflower of 𝒦. The following lemma shows that this algorithm stops after a constant number of iterations.

Lemma 15 (*).

The number of iterations of the loop starting from line 3 of Algorithm 1, as well as the size of the output family, is at most (ℓ+1)!(kr+1)^ℓ.

In particular, at each step of the algorithm, since |R| ≤ ℓ|𝒦| ≤ ℓ(ℓ+1)!(kr+1)^ℓ, the size of the set Y chosen in line 7 is bounded by a constant. The time complexity is bounded as follows.

Lemma 16 (*).

Algorithm 1 makes at most 2^{2^{O(ℓ log(kℓr))}} calls of ExactEmptyExtension and has a time complexity of 2^{2^{O(ℓ log(kℓr))}}.

The correctness of the algorithm is shown as follows.

Lemma 17 (*).

Algorithm 1 outputs a k-max-distance sparsifier of 𝒟 with respect to ℬ(∅,r).

4 Framework for 𝒅-Limited 𝒌-Max-Distance Sparsification

4.1 Overall Flow

In this section, we complete the proof of Theorem 2 by providing an FPT algorithm that uses the (−1,1)-optimization oracle and the exact extension oracle on 𝒟 to obtain a d-limited k-max-distance sparsifier of 𝒟. Actually, for further applications, we construct the slightly more general object of a d-limited k-max-distance sparsifier of 𝒟 with respect to 2^U, not with respect to 𝒟 itself. Our framework consists of two steps. Let p≥0 be an integer with 2d<p. The first step achieves one of the following.

  • Find a set 𝒞⊆𝒟 of size at most k such that 𝒟 ⊆ ⋃_{C∈𝒞} ℬ(C,p).

  • Find a set 𝒞⊆𝒟 of size k+1 such that |C △ C′| > 2d holds for any distinct C,C′∈𝒞.

We do this by using the following approximate far set oracle, which will be designed in Section 4.2.

Approximate Far Set Oracle: Let U be a finite set, d0, 𝒟2U, and 𝒞𝒟. The approximate far set oracle returns one of the following.

  • A set D ∈ 𝒟 that does not belong to ⋃_{C∈𝒞} ℬ(C, 2d).

  • ⊥. This option can be chosen only when 𝒟 ⊆ ⋃_{C∈𝒞} ℬ(C, p).

Starting with 𝒞 := ∅, we repeat the following steps. If the approximate far set oracle returns ⊥, terminate the loop; otherwise, add the set found by the oracle to 𝒞. If the oracle returns ⊥ within k iterations, the first condition is achieved. If not, the set 𝒞 obtained after k+1 iterations satisfies the second condition. In the latter case, the following lemma shows that 𝒞 is a d-limited k-max-distance sparsifier.
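The loop above can be sketched directly. In the sketch below, the toy oracle uses exact symmetric-difference distances over an explicitly listed domain as a stand-in for the approximate far set oracle of Section 4.2; all function names are illustrative.

```python
def build_far_family(k, far_oracle):
    """First step of the framework: repeatedly ask the far set oracle for a
    solution far from everything collected so far; stop on "bottom" or after
    collecting k + 1 pairwise-far solutions."""
    C = []
    for _ in range(k + 1):
        D = far_oracle(C)
        if D is None:            # oracle returned bottom: first condition holds
            return "cover", C    # |C| <= k, and balls around C cover the domain
        C.append(D)
    return "sparsifier", C       # second condition holds; Lemma 18 applies

def exact_far_oracle(domain, d):
    """Toy stand-in for the approximate far set oracle: scan an explicitly
    listed domain for a set at symmetric-difference distance > 2d from every
    collected solution."""
    def oracle(C):
        for D in domain:
            if all(len(D ^ E) > 2 * d for E in C):
                return D
        return None
    return oracle

domain = [frozenset({1, 2}), frozenset({5, 6}), frozenset({1, 2, 3})]
kind, C = build_far_family(1, exact_far_oracle(domain, 1))
# with k = 1 and d = 1, {1,2} and {5,6} are at distance 4 > 2d, so the loop
# runs k + 1 times and returns two pairwise-far sets (the second condition)
```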

Lemma 18 (*).

Let r ≥ 0. Let 𝒞 be a subset of 𝒟 of size k+1 such that |C △ C′| > 2d for any distinct C, C′ ∈ 𝒞. Then, 𝒞 is a d-limited k-max-distance sparsifier of 𝒟 with respect to 2^U.

Now, we assume the first condition. Let 𝒞 be a subset of 𝒟 of size at most k such that 𝒟 ⊆ ⋃_{C∈𝒞} ℬ(C, p). The second step constructs a d-limited k-max-distance sparsifier of 𝒟_C := 𝒟 ∩ ℬ(C, p) with respect to ℬ(C, p+d) for each C ∈ 𝒞. We prove that the union of all d-limited k-max-distance sparsifiers obtained in this manner is a d-limited k-max-distance sparsifier of 𝒟 with respect to 2^U.

Lemma 19 (*).

Assume 𝒟 = ⋃_{C∈𝒞} 𝒟_C. For each C ∈ 𝒞, let 𝒦_C ⊆ 𝒟_C be a d-limited k-max-distance sparsifier of 𝒟_C with respect to ℬ(C, p+d). Then, 𝒦 := ⋃_{C∈𝒞} 𝒦_C is a d-limited k-max-distance sparsifier of 𝒟 with respect to 2^U.

Next, we reduce the computation of d-limited k-max-distance sparsifiers to the computation of k-max-distance sparsifiers of families consisting of constant-size sets, which was discussed in Section 3. For C ∈ 𝒞, let 𝒟′_C := {D △ C : D ∈ 𝒟_C}. By the definition of 𝒟_C, we have 𝒟′_C ⊆ ℬ(∅, p). The following holds.

Lemma 20 (*).

Let C ∈ 𝒞. A subset 𝒦_C ⊆ 𝒟_C is a d-limited k-max-distance sparsifier of 𝒟_C with respect to ℬ(C, p+d) if and only if 𝒦′_C := {K △ C : K ∈ 𝒦_C} is a d-limited k-max-distance sparsifier of 𝒟′_C with respect to ℬ(∅, p+d).

If 𝒦′_C is a k-max-distance sparsifier of 𝒟′_C with respect to ℬ(∅, p+d), then it is also a d-limited k-max-distance sparsifier of 𝒟′_C with respect to ℬ(∅, p+d) for any d ≥ 0. Therefore, a d-limited k-max-distance sparsifier 𝒦 of 𝒟 with respect to 2^U can be computed as

𝒦 := ⋃_{C∈𝒞} {K △ C : K ∈ 𝒦′_C}.

From the discussion in Section 3, 𝒦′_C can be obtained by calling the exact empty extension oracle on 𝒟′_C a constant number of times, where the constant depends on k, ℓ = p, and r = p+d. The exact empty extension oracle for 𝒟′_C is equivalent to the exact extension oracle on 𝒟_C when the inputs C, X, Y are taken to be C, Y ∩ C, Y ∖ C, respectively. Therefore, 𝒦′_C can be obtained by calling the exact extension oracle on 𝒟_C a constant number of times, where the constant depends only on k and p.
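The recentring 𝒟_C ↦ 𝒟′_C can be illustrated concretely: translating every set by C (symmetric difference) turns the Hamming distance to C into plain set size, so ℬ(C, p) becomes ℬ(∅, p), and applying the same translation again recovers the original family, as in the formula for 𝒦 above. A small sketch, with illustrative names:

```python
def recentre(family, C):
    """Map D_C to D'_C = {D ^ C : D in D_C}: after this translation the
    Hamming distance |D ^ C| to the centre C becomes the plain size |D'|.
    Applying recentre twice with the same C is the identity."""
    C = frozenset(C)
    return [frozenset(D) ^ C for D in family]

C = frozenset({1, 2, 3})
family = [frozenset({1, 2, 3, 4}), frozenset({2, 3})]
shifted = recentre(family, C)
# the distances to C (both equal to 1) reappear as set sizes: [{4}, {1}]
```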

4.2 Designing the Approximate Far Set Oracle

Here, we design a randomized algorithm, parameterized by |𝒞| and d, for the approximate far set oracle. The algorithm repeats the following a sufficient number of times: it selects a weight vector w ∈ {−1, 1}^U uniformly at random and finds a set D ∈ 𝒟 that maximizes w(D) := Σ_{e∈D} w_e. If the found D does not belong to ⋃_{C∈𝒞} ℬ(C, 2d), it outputs this D and terminates. If no such D is found after a sufficient number of iterations, it returns ⊥. We now prove the correctness of the algorithm, starting with the following claim.
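A sketch of this randomized oracle in Python. The call to `max` over an explicitly listed domain is a brute-force stand-in for the (−1,1)-optimization oracle, the trial count is left as a parameter (Lemma 23 makes it 2^{2^{O(k)}} log ε^{−1}), and all names are illustrative.

```python
import random

def approx_far_set_oracle(U, domain, C, d, trials, rng):
    """Randomized approximate far set oracle (sketch).  Each trial draws a
    uniform sign vector w in {-1, +1}^U, maximizes w(D) over the domain, and
    accepts D if it is 2d-far (in symmetric difference) from every set in C."""
    for _ in range(trials):
        w = {e: rng.choice((-1, 1)) for e in U}
        # brute-force stand-in for the (-1,1)-optimization oracle on the domain
        D = max(domain, key=lambda S: sum(w[e] for e in S))
        if all(len(D ^ E) > 2 * d for E in C):
            return D
    return None  # "bottom"; safe only when the balls B(C, p) cover the domain

U = range(6)
domain = [frozenset({0, 1, 2}), frozenset({3, 4, 5})]
D = approx_far_set_oracle(U, domain, [frozenset({0, 1, 2})], 1, 200,
                          random.Random(0))
```

With these toy parameters, a random sign vector favors the far set {3, 4, 5} in a constant fraction of trials, so over 200 trials the oracle finds it except with negligible probability.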

Lemma 21 (*).

Assume max_{D∈𝒟} w(D) > max_{C∈𝒞} w(C) + 2d. Then, any D attaining the maximum on the left-hand side does not belong to ⋃_{C∈𝒞} ℬ(C, 2d).

The following lemma is the core of the analysis.

Lemma 22 (*).

Assume p ≥ (4d+2)·2^{2k} − 1 and |𝒞| ≤ k. Let D ∈ 𝒟 and assume D ∉ ⋃_{C∈𝒞} ℬ(C, p). Then,

Pr[w(D) > max_{C∈𝒞} w(C) + 2d] ≥ 2^{−2^{O(k)}}.

By repeating the sampling of w a sufficient number of times, we can state the following.
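The "sufficient number of times" can be made explicit: if a single trial succeeds with probability at least q (here q = 2^{−2^{O(k)}} by Lemma 22), then t ≥ q^{−1} ln(1/ε) independent trials all fail with probability at most (1−q)^t ≤ e^{−qt} ≤ ε. A small helper computing this count; the name `num_trials` is ours.

```python
import math

def num_trials(q, eps):
    """Trials needed so that independent attempts, each succeeding with
    probability >= q, all fail with probability at most eps:
    (1 - q)^t <= exp(-q * t) <= eps  once  t >= ln(1/eps) / q."""
    return math.ceil(math.log(1 / eps) / q)

# e.g. success probability 1/2 per trial and failure budget eps = 1%:
print(num_trials(0.5, 0.01))  # 10
```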

Lemma 23.

Let ε > 0, 𝒟, 𝒞 ⊆ 2^U, k ≥ 1, and d ≥ 0. Assume |𝒞| ≤ k. Then, there exists a randomized algorithm that runs in time 2^{2^{O(k)}} log ε^{−1} and satisfies the following.

  • If there exists D ∈ 𝒟 such that D ∉ ⋃_{C∈𝒞} ℬ(C, (4d+2)·2^{2k}), the algorithm returns a set D′ ∈ 𝒟 satisfying D′ ∉ ⋃_{C∈𝒞} ℬ(C, 2d) with probability at least 1 − ε.

  • If not, the algorithm returns either ⊥ or a set D ∈ 𝒟 satisfying D ∉ ⋃_{C∈𝒞} ℬ(C, 2d).

Combining Lemma 23 with the results from Sections 3 and 4.1, we have the following.

Lemma 24 (*).

Let ε > 0, 𝒟 ⊆ 2^U, k ≥ 1, and d ≥ 0. Then, there exists a randomized algorithm that, with probability at least 1 − ε, computes a d-limited k-max-distance sparsifier of 𝒟 with respect to 2^U of size at most 2^{2^{O(k + log d)}} in time (2^{2^{2^{O(k + log d)}}} + 2^{2^{O(k)}} log ε^{−1}) · poly(|U|). It uses at most 2^{2^{O(k)}} log ε^{−1} calls to the (−1,1)-optimization oracle on 𝒟 and at most 2^{2^{2^{O(k + log d)}}} calls to the exact extension oracle on 𝒟, each with r ≤ (4d+2)·2^{2k} − 1 and |X|, |Y| ≤ 2^{2^{O(k + log d)}}.
