Max-Distance Sparsification for Diversification and Clustering
Abstract
Let be a set family that is the solution domain of some combinatorial problem. The max-min diversification problem on is the problem to select sets from such that the Hamming distance between any two selected sets is at least . FPT algorithms parameterized by , where , and have been actively studied recently for several specific domains.
This paper provides unified algorithmic frameworks to solve this problem. Specifically, for each parameterization and , we provide an FPT oracle algorithm for the max-min diversification problem using oracles related to . We then demonstrate that our frameworks provide the first FPT algorithms on several new domains , including the domain of -linear matroid intersection, almost -SAT, minimum edge -flows, vertex sets of -mincut, vertex sets of edge bipartization, and Steiner trees. We also demonstrate that our frameworks generalize most of the existing domain-specific tractability results.
Our main technical breakthrough is introducing the notion of max-distance sparsifier of , a domain on which the max-min diversification problem is equivalent to the same problem on the original domain . The core of our framework is to design FPT oracle algorithms that construct a constant-size max-distance sparsifier of . Using max-distance sparsifiers, we provide FPT algorithms for the max-min and max-sum diversification problems on , as well as -center and -sum-of-radii clustering problems on , which are also natural problems in the context of diversification and have their own interests.
Keywords and phrases:
Fixed-Parameter Tractability, Diversification, Clustering2012 ACM Subject Classification:
Theory of computation Fixed parameter tractabilityEditors:
Anne Benoit, Haim Kaplan, Sebastian Wild, and Grzegorz HermanSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
1.1 Background and Motivation
The procedure for approaching real-world problems with optimization algorithms involves formulating the real-world motivations as mathematical problems and then solving them. However, real-world problems are complex, and the idea of a “good” solution cannot always be correctly formulated. The paradigm of diversification, introduced by Baste et al. [8] and Baste et al. [7], is a “formulation of the unformulatable problems”, which formulates diversity measures for a set of multiple solutions, rather than attempting to formulate the “goodness” of a single solution. By computing a set of solutions that maximize this measure, the algorithm provides effective options to evaluators who have the correct criteria for judging the “goodness” of a solution.
Let be a finite set, and . Let be the feasible domain of some combinatorial problem. The following problem frameworks, defined by two types of diversity measures, have been studied extensively.
Max-Min Diversification Problem on : Does there exist a -tuple of sets in such that ?111We define .
Max-Sum Diversification Problem on : Does there exist a -tuple of sets in such that ?
These problems ensure diversity by aiming to output solutions that are as dissimilar as possible in terms of Hamming distance.
Parameterized algorithms for diversification problems have been actively studied. Particularly, FPT algorithms for the max-min diversification problems parameterized by , where [7, 8, 22, 26, 31], as well as by [17, 21, 22, 23, 24, 29], have been the focus of research. Since assuming does not lose generality in the max-min diversification problem, the latter addresses a more general situation than the former. Since the max-sum diversification problem is empirically more tractable than the max-min diversification problem, for some time hereafter, we will restrict our discussion to the max-min diversification problem.
This research provides general algorithmic frameworks for FPT algorithms solving max-min (and max-sum) diversification problems for both parameterizations and . Our frameworks are very general and can be applied to all domains [5, 6, 7, 8, 17, 21, 22, 23, 25, 26, 31] for which FPT algorithms parameterized by and are currently known for the case that diversity measure is defined using an unweighted Hamming distance. Moreover, our frameworks further provide the first algorithms for several domains where such algorithms were previously unknown.
Our main technical breakthrough is introducing a notion of max-distance sparsifier as an intermediate step, which, for the max-min diversification problem, essentially works as a core-set [2]. The formal definition is given in Section 1.3. The critical fact is that, when is a max-distance sparsifier of , the max-min diversification problem on is equivalent to the same problem on . Our framework constructs a constant-size max-distance sparsifier of using the oracles on , enabling us to solve the max-min diversification problems on by brute-force search on .
The power of max-distance sparsification is not limited to solving diversification problems. Specifically, the following -center [27] and -sum-of-radii clustering problems on [11] can also be solved via max-distance sparsification.
-Center Clustering Problem on : Does there exist a -tuple of subsets such that for all , there exists an satisfying ?
-Sum-of-Radii Clustering Problem on : Does there exist a -tuple of subsets and a -tuple of non-negative integers with such that for all , there exists an satisfying ?
When is an explicitly given set of points, parameterized algorithms for these problems have been extensively studied in the area of clustering [3, 4, 12, 15, 16, 20, 28]. Furthermore, approximation algorithms for the relational -means [13, 19, 30] and relational -center [1] problems are investigated, which are the -means and -center clustering problem defined on a point set represented as a join of given relational databases. Their setting is similar to ours as the point set are implicitly given and its size can be exponential. This research adds several new combinatorial domains to the literature on clustering problems on implicitly given domains, and also introduces a parameterized view. These problems are also natural in the context of diversification, since in real situations, the concept of diversity often means that the extracted elements cover the entire space comprehensively rather than being mutually dissimilar. This motivation is formulated by clustering problems, which extract a list of sets in such that for each set in , there is an extracted set near to it.
1.2 Our Results
This paper consists of two parts. In the first part, we design general frameworks for solving diversification and clustering problems. In the second part, we apply our frameworks to several specific domains . Table 1 shows a list of domains on which our frameworks provide the first or an improved algorithm for the max-min diversification problem.
1.2.1 The Frameworks
We define the following -optimization oracle on and the exact extension oracle on .
-Optimization Oracle on : Let be a finite set, , and be a weight vector. Return a set that maximizes .
Exact Extension Oracle on : Let be a finite set, , , and . Let be two disjoint subsets of . If there exists a set such that , , and , return one such set. If no such set exists, return .
When and , we specifically call the exact extension oracle on the exact empty extension oracle on .
Exact Empty Extension Oracle on : Let be a finite set, , , and . If there exists a set such that and , return one such set. If no such set exists, return .
Let be the max-min/max-sum diversification problem or the -center/-sum-of-radii clustering problem on . Our main result is FPT algorithms for solving using these oracles. We construct frameworks for both types of parameterizations, and . The result for the parameterization by is as follows.
Theorem 1.
There exists an oracle algorithm solving using the exact empty extension oracle on , where the number of oracle calls and time complexity are both FPT parameterized by and for each call of an oracle, and are bounded by constants that depend only on .
The result for the parameterization by is as follows.
Theorem 2.
There exists a randomized oracle algorithm solving using the -optimization oracle on and the exact extension oracle on , where the number of oracle calls and time complexity are both FPT parameterized by and for each call of the exact extension oracle, are bounded by constants that depend only on .
| Domain | Parameter |
|---|---|
| -Linear Matroid Intersection | |
| Almost -SAT | |
| Independent Set on Certain Graphs | |
| Min Edge -Flow | |
| Steiner Tree | |
| Vertex Set of Min -Cut | |
| Vertex Set of Edge Bipartization |
1.2.2 Applications of Theorem 1
On most domains , the exact empty extension oracle can be designed by almost the same way as an algorithm to extract a single solution from . For example, consider the case where is the -path domain, i.e., a domain of sets of edges on paths of length . In this case, the exact empty extension oracle on is equivalent to the problem of finding an -path in the graph obtained by removing all edges in from the input. Combining Theorem 1 with this empirical fact, we can claim that, for most domains , the diversification and clustering problems parameterized by on are as easy as determining the non-emptiness of .
To demonstrate that Theorem 1 yields existing tractability results, we design the oracles for the domains of the vertex cover [7, 8], -hitting set [7], feedback vertex set [7], and common independent set of two matroids [17, 22]. We also apply our framework on new domains, -represented linear matroid intersection, almost -SAT, and independent set on subgraph-closed IS-FPT graph classes. Here, a graph class is subgraph-closed IS-FPT if it is closed under taking subgraphs and the problem of finding independent set of size is FPT parameterized by . The following theorem summarizes our results, where the precise definitions of each domain are given in the full version.
Theorem 3.
Let be the domain of vertex covers, -hitting sets, feedback vertex sets, -represented linear matroid intersections, almost -SATs, or independent sets on subgraph-closed IS-FPT graph classes. Then, max-min and max-sum diversification problems and -center and -sum-of-radii clustering problems admit an FPT algorithm, where the parameterization is except for the -hitting set and -represented linear matroid intersection, which are parameterized by .
Theorem 1 also generalizes existing frameworks for diversification. Baste et al. [8] provided an algorithmic framework for diversification using a loss-less kernel [9, 10], which, roughly speaking, is a kernel that completely preserves the information of the solution space. Since loss-less kernels are known for very limited domains, their framework requires very strong assumptions. Our framework has broader applicability than theirs because it relies on a weaker oracle, as the exact empty extension oracle can be constructed using a loss-less kernel. Hanaka et al. [26] developed a color-coding-based framework for diversification. The oracle they use can be regarded as the exact empty extension oracle with an additional colorfulness constraint. Our framework again has broader applicability than theirs because our oracle can be constructed using theirs. Moreover, our framework also treats clustering problems, which these two do not.
1.2.3 Applications of Theorem 2
FPT algorithms for the max-min diversification problems on parameterized by are known for the cases where is the family of matroid bases [17, 22], perfect matchings [22], and shortest paths [23]. The result for the perfect matchings is later extended to the matchings of specified size, not necessarily perfect [17]. Additionally, for the cases where is the family of interval schedulings [26] and the longest common subsequences of an absolute constant number of strings [31], FPT algorithms parameterized by are known. Both of these two can be generalized to the domain of dynamic programming problems, which we define in the full version. Furthermore, the problem of finding a pair of a branching and an in-branching such that the Hamming distance between them is at least is investigated as the name of -distinct branchings problem [5, 6, 25], which admits FPT algorithm parameterized by . This problem can naturally be extended to the case that selects branchings and in-branching, rather than one each. We give FPT algorithms parameterized by for all those problems, where for the extended version of -distinct branchings problem. We also give FPT algorithms on domains of minimum edge -flows, Steiner trees, vertex sets of -mincut, and vertex sets of edge bipartization, which are domains where no FPT algorithm for the max-min diversification problem is previously known. Remark that the domain of shortest paths [23] is the special case of the minimum edge -flow domain and the minimum Steiner tree domain. The following theorem summarizes our results, where the precise definitions of each domain are given in the full version.
Theorem 4.
Let be the domain of matroid bases, branchings, matchings of specified size, minimum edge -flows, minimum Steiner trees, vertex sets of -mincut, vertex sets of edge bipartization, and dynamic programming problems. Then, max-min and max-sum diversification problems and -center and -sum-of-radii clustering problems admit an FPT algorithm, where the parameterization is except for the Steiner tree, which is parameterized by for the terminal set , and edge bipartization, which is parameterized by , where is the minimum number of edges to be removed to make the given graph bipartite. Furthermore, the extended version of -distinct branching problem also admits FPT algorithm parameterized by .
Eiben et al. [17] provided a technique called determinantal sieving, which is a general tool to give and speed up parameterized algorithms, including that for diversification problems. Particularly, they provided a framework to solve the diversification problem by using an oracle that, roughly speaking, counts the number of solutions modulo . Using their framework, they improved the running times of FPT algorithms for max-min diversification problems on matchings and matroid bases, as well as the -distinct branchings problem. Although not stated explicitly, their framework seems to yield FPT algorithms parameterized by when is the dynamic programming domain, thereby improving the parameterizations in the results of [26] and [31], respectively, as well as the extended version of -distinct branchings problems. We are not sure whether our framework generalizes theirs, that is, whether our oracle can be constructed using their oracle. However, we strongly believe that our framework has broader applicability because their framework assumes counting oracles, which is often hard even modulo . In contrast, our framework uses optimization-type oracles, which are generally more tractable than counting. Indeed, our framework provides an FPT algorithm with the same parameterization for every domain which they explicitly considered. Moreover, we do not think their framework can give an FPT algorithm for the max-min diversification problem on the domains of minimum edge -flows, vertex sets of minimum -cut, and vertex sets of edge bipartization. Furthermore, our framework can be applied not only to diversification problems but also to clustering problems.
1.3 Framework Overview
In this section, we provide an overview of the entire flow of our frameworks. Our frameworks first construct max-distance sparsifiers using the corresponding oracles and then solve diversification and clustering problems using them.
1.3.1 From -Limited -Max-Distance Sparsifier to Diversification and Clustering
We begin by defining the max-distance sparsifiers. The key for Theorem 1 is designing the following -max-distance sparsifier of .
Definition 5 (-max-distance sparsifier).
Let . Let be a finite set and . We say that is a -max-distance sparsifier of with respect to if for any and , the two conditions
-
There exists such that for each , .
-
There exists such that for each , .
are equivalent. Unless specifically noted, when we write -max-distance sparsifier of , we mean the case where .
Similarly, the key for Theorem 2 is designing the following -limited -max-distance sparsifier of .
Definition 6 (-limited -max-distance sparsifier).
Let and . Let be a finite set and . We say that is a -limited -max-distance sparsifier of with respect to if for any and , the two conditions
-
There exists such that for each , .
-
There exists such that for each , .
are equivalent. Unless specifically noted, when we write -limited -max-distance sparsifier of , we mean the case where .
The difference between the two sparsifiers is that the domain of is in the former case, while it is in the latter. By definition, any -max-distance sparsifier is also a -limited -max-distance sparsifier for any . We can prove that given a -limited -max-distance sparsifier of with size bounded by a constant that depends only on , we can construct FPT algorithms parameterized by for the max-min/max-sum diversification problems on . Similarly, we can prove that given a -limited -max-distance sparsifier of with size bounded by a constant that depends only on , we can construct FPT algorithms parameterized by for the -center/-sum-of-radii clustering problems on . Therefore, to prove Theorems 1 and 2, it suffices to construct FPT oracle algorithms for designing -max-distance sparsifiers and -limited -max-distance sparsifiers, respectively.
1.3.2 Computing -Max-Distance Sparsifier
The remaining task towards Theorem 1 is to provide an FPT algorithm parameterized by that constructs a -max-distance sparsifier of with size bounded by a constant that depends only on . The key lemma toward this is that, if contains a sufficiently large sunflower (see Section 3 for the definition) consisting of sets of the same size, then we can safely remove one of them from while preserving the property that is a -max-distance sparsifier (actually, for the sake of simplifying the framework, we prove a slightly stronger statement). Starting with and exhaustively removing such sets leads to a that is still a -max-distance sparsifier of , and by using the well-known sunflower lemma (Lemma 13), its size is bounded by a constant.
However, this observation is still not sufficient to obtain an FPT algorithm. The reason is that generally has exponential size, and removing sets one by one would require an exponential number of steps. Instead, our algorithm starts with and exhaustively adds sets of to until becomes a -max-distance sparsifier. In this way, the number of steps is bounded by a constant. The remaining task is to choose a set to be added at each step. For this task, we design an FPT algorithm using constant number of calls of the exact empty extension oracle.
1.3.3 Computing -Limited -Max-Distance Sparsifier
The algorithm in the previous section alone is insufficient to prove Theorem 2 since is unbounded and the sunflower-lemma-based bound for the number of steps cannot be used. Our algorithm divides into at most clusters, computes a -limited -max-distance sparsifier for each cluster, and outputs their union. Let be a constant that depends only on . We first find satisfying the following properties: (i) , (ii) for all distinct , , and (iii) for all , there exists such that . If such a family does not exist, a trivial -limited -max-distance sparsifier of will be found, and we output it and terminate.
We provide an algorithm for computing . Our algorithm starts with and exhaustively adds sets in to until satisfies the above conditions or its size exceeds . To choose the elements to be added, we randomly sample and call a -optimization oracle. We can prove for sufficiently large constant that if there exists such that for any , with a constant probability, the -optimization oracle will find a such that for any . Thus, if does not meet the conditions, by calling the -optimization oracle a sufficient number of times, we can find a set to add to with high probability.
Here, we provide an algorithm for computing a -limited -max-distance sparsifier of using . For each cluster , let . The algorithm computes a -max-distance sparsifier of each and outputs their union. For technical reasons, we actually compute a slightly more general object, but we will not delve into the details here. Since each consists only of sets whose size is at most , the -max-distance sparsifier of can be constructed using the algorithm in Section 1.3.2. The exact empty extension oracle on corresponds to the exact extension oracle on .
Here, we note the difference between our framework and that used by Fomin et al. [22] and Funayama et al. [23] to provide FPT algorithms for the max-min diversification problem on when is the family of perfect matchings and shortest paths, respectively. Their algorithms also start by dividing into clusters. However, their algorithms perform stricter clustering than ours. Specifically, in their clustering, clusters corresponding to different must be well-separated. In contrast, we allow clusters to overlap. This simplifies the clustering step compared to their approach at the cost of a more challenging task afterward. We resolve this more challenging task by introducing and designing the -limited -max-distance sparsifier.
1.4 Organization
The rest of this paper is organized as follows. In Section 2, we provide FPT algorithms for solving diversification and clustering problems on using a constant-size -limited -max-distance sparsifier of . In Section 3, we prove Theorem 1 by providing an FPT oracle algorithm parameterized by that computes a constant-size -max-distance sparsifier of . In Section 4, we prove Theorem 2 by providing an FPT oracle algorithm parameterized by that computes a constant-size -limited -max-distance sparsifier of . The discussion in Section 4 internally uses the results from Section 3. In the full version, we apply the results of Theorems 1 and 2 to several domains to obtain FPT algorithms for diversification and clustering problems. The full version also contains further related work and the proofs of the lemmas marked with an asterisk.
2 From Sparsifier to Diversification and Clustering
In this section, we provide FPT algorithms for diversification and clustering problems using a -limited -max-distance sparsifier of constant size.
2.1 Diversification
Let be a finite set, , , and . For diversification problems, we have the following.
Lemma 7.
Let be a -limited -max-distance sparsifier of and . Then, there is a -tuple such that holds for all . Particularly, if there exists a solution to the max-min/max-sum diversification problem on , then there exists a solution consisting only of sets in .
Proof.
Assume and let be an index such that . It is sufficient to prove that there is a set such that holds for all . For , let . Since is a -limited -max-distance sparsifier of , there exists such that holds for all . Considering an algorithm that exhaustively searches for a subfamily of of size , we can state the following.
Lemma 8.
Assume there exists an FPT algorithm parameterized by to compute a -limited -max-distance sparsifier of with size bounded by a constant that depends only on . Then, there exists an FPT algorithm parameterized by for the max-min/max-sum diversification problem on .
We note that, under the slightly stronger assumption, the discussion in this section can directly be extended to the case where the sets are taken from different domains. Specifically, let and assume -max-distance sparsifiers of these domains with respect to are computed. Then, we can determine whether there exists a -tuple such that (or ) by exhaustive search on .
2.2 Clustering
Let be a finite set, , , and . Here, we provide an FPT algorithm for the -center and -sum-of-radii clustering problems using a -limited -max-distance sparsifier of . For and , the ball of radius centered at is defined as . The algorithm first guesses a partition of into clusters . Since is constant, the cost of this guess is constant. Then, for each , the algorithm computes the minimum radius such that there is a set satisfying . If , the algorithm asserts it instead of computing the specific value of . The -center clustering problem and -sum-of-radii clustering problem on are solved by checking whether the maximum and sum, respectively, of the s is at most . We show the correctness of this algorithm by proving the following.
Lemma 9.
Let and . Assume holds for all . Then, for all , there is an index such that .
Proof.
Assume the contrary. Then, there is a set such that for all , . Since is a -limited -max-distance sparsifier of , there is a set such that for all , . Hence, for all , contradicting the fact that is a partition of .
We now provide an algorithm to decide whether there exists with for each and . If the domain is , this problem is equivalent to the closest string problem on binary strings, for which a textbook FPT algorithm parameterized by is known [14]. Our algorithm is a modified version of this. An element is bad if there exist both with and with . The following lemma is fundamental.
Lemma 10 ([14]).
If there are more than bad elements, no satisfies .
Let be the set of bad elements, and assume . The algorithm first guesses . The cost of this guess is . Then, it determines whether there exists such that and . Let . Then, we can claim the following.
Lemma 11.
For such that , .
Proof.
Let . Then, . From the definition of , the value of is equal among all . Thus, the maximum value of for is achieved by the set that maximizes . Now, it is sufficient to solve the problem of determining whether there exists such that and . This corresponds to the exact extension oracle on with , , , and . Therefore, we can claim the following:
Lemma 12.
Assume there exists an FPT algorithm parameterized by that computes a -limited -max-distance sparsifier of with size bounded by a constant that depends only on , and the exact extension oracle on whose time complexity is FPT parameterized by . Then, there exists an FPT algorithm parameterized by for the -center/-sum-of-radii clustering problem on .
3 Framework for -Max-Distance Sparification
In this section, we complete the proof of Theorem 1 by providing an FPT algorithm that uses the exact empty extension oracle on to obtain a -max-distance sparsifier of . For further use, we show a slightly more extended result. Let . We construct a -max-distance sparsifier of with respect to for . Since for , this is also a -max-distance sparsifier of (with respect to ). A set family is called a sunflower if there exists a set called core such that for any , . The following is well-known.
Lemma 13 (Sunflower Lemma [14, 18]).
Let be a finite set, , and be a family consisting only of sets of size at most . If , then contains a sunflower of size .
For , , and , a sunflower is a -sunflower of if it satisfies the following three conditions.
-
,
-
For each , , and
-
The core of is a subset of .
The following lemma is the core of our framework.
Lemma 14 (*).
Let be a finite set, , and be a -max-distance sparsifier of with respect to . Let and assume there is a -sunflower of . Then, is also a -max-distance sparsifier of with respect to .
Our algorithm is given in Algorithm 1, where ExactEmptyExtension() represents the exact empty extension oracle on with arguments and . The algorithm starts with and repeatedly adds such that there is no -sunflower of to . The following lemma shows this algorithm stops after a constant number of iterations.
Lemma 15 (*).
In particular, at each step of the algorithm, since , the size of chosen in line 7 is bounded by a constant. The time complexity is bounded as follows.
Lemma 16 (*).
Algorithm 1 makes at most calls of
ExactEmptyExtension() and has a time complexity of .
The correctness of the algorithm is shown as follows.
Lemma 17 (*).
Algorithm 1 outputs a -max-distance sparsifier of with respect to .
4 Framework for -Limited -Max-Distance Sparsification
4.1 Overall Flow
In this section, we complete the proof of Theorem 2 by providing FPT algorithm that uses the -optimization oracle and the exact extension oracle on to obtain a -limited -max-distance sparsifier of . Actually, for further applications, we construct the slightly more general object of -limited -max-distance sparsifier of with respect to , not with respect to itself. Our framework consists of two steps. Let be an integer with . The first step achieves one of the following.
-
Find a set with size at most such that .
-
Find a set of size such that holds for any distinct .
We do this by using the following approximate far set oracle, which will be designed in Section 4.2.
Approximate Far Set Oracle: Let be a finite set, , , and . The approximate far set oracle returns one of the following.
A set of that does not belong to .
. This option can be chosen only when .
Starting with , we repeat the following steps. If the approximate far set oracle returns , terminate the loop. Otherwise, add the element found by the oracle to . If the oracle returns within iterations, the first condition is achieved. If not, the set after iterations satisfies the second condition. In the latter case, the following lemma shows that is a -limited -max-distance sparsifier.
Lemma 18 (*).
Let . Let be a subset of of size such that for any distinct , . Then, is a -limited -max-distance sparsifier of with respect to .
Now, we assume the first condition. Let be a subset of of size at most such that . The second step involves constructing a -limited -max-distance sparsifier of with respect to for each . We prove that the union of all such -limited -max-distance sparsifiers obtained in this manner is a -limited -max-distance sparsifier of with respect to .
Lemma 19 (*).
Assume . For each , let be a -limited -max-distance sparsifier of with respect to . Then, is a -limited -max-distance sparsifier of with respect to .
Next, we reduce the computation of -limited -max-distance sparsifiers to the computation of -max-distance sparsifiers of families consisting of constant-size sets, which was discussed in Section 3. For , let . By the definition of , we have . The following holds.
Lemma 20 (*).
Let . A subset is a -limited -max-distance sparsifier of with respect to if and only if is a -limited -max-distance sparsifier of with respect to .
If is a -max-distance sparsifier of with respect to , then it is also a -limited -max-distance sparsifier of with respect to for any . Therefore, a -limited -max-distance sparsifier of with respect to can be computed as
From the discussion in Section 3, can be obtained by calling the exact empty extension oracle on a constant number of times that depends on , , and . The exact empty extension oracle for is equivalent to the exact extension oracle on when the inputs are taken to be , respectively. Therefore, can be obtained by calling the exact extension oracle on a constant number of times that depends only on and .
4.2 Designing the Approximate Far Set Oracle
Here, we design a randomized algorithm parameterized by and for the approximate far set oracle. Our algorithm repeats the following sufficient number of times: It selects a weight vector uniformly at random and finds a set that maximizes . If the found does not belong to , it outputs this and terminates. If no such is found after a sufficient number of iterations, it returns . We now prove the correctness of the algorithm. We can claim the following.
Lemma 21 (*).
Assume . Then, the that attains the maximum on the left-hand side does not belong to .
The following lemma is the core of the analysis.
Lemma 22 (*).
Assume and . Let and assume . Then,
By repeating the sampling of a sufficient number of times, we can state the following.
Lemma 23.
Let , , , and . Assume . Then, there exists a randomized algorithm that runs in time and satisfies the following.
-
If there exists such that , the algorithm returns a set satisfying with probability at least .
-
If not, the algorithm returns either or a set satisfying .
Lemma 24 (*).
Let , , , and . Then, there exists a randomized algorithm that, with probability , computes a -limited -max-distance sparsifier of with respect to with size at most in time . It uses at most calls to the -optimization oracle on and at most calls to the exact extension oracle on such that and .
References
- [1] Pankaj Agarwal, Aryan Esmailpour, Xiao Hu, Stavros Sintos, and Jun Yang. Computing a well-representative summary of conjunctive query results. Proceedings of the ACM on Management of Data, 2(5):1–27, 2024. doi:10.1145/3695835.
- [2] Pankaj Agarwal, Sariel Har-Peled, and Kasturi Varadarajan. Approximating extent measures of points. Journal of the ACM (JACM), 51(4):606–635, 2004. doi:10.1145/1008731.1008736.
- [3] Amihood Amir, Jessica Ficler, Liam Roditty, and Oren Sar Shalom. On the efficiency of the hamming -centerstring problems. In Combinatorial Pattern Matching (CPM), pages 1–10. Springer, 2014. doi:10.1007/978-3-319-07566-2_1.
- [4] Sayan Bandyapadhyay, William Lochet, and Saket Saurabh. FPT constant-approximations for capacitated clustering to minimize the sum of cluster radii. In International Symposium on Computational Geometry (SoCG), volume 258, pages 12:1–12:14, 2023. doi:10.4230/LIPICS.SOCG.2023.12.
- [5] Jørgen Bang-Jensen, Kristine Vitting Klinkby, and Saket Saurabh. -distinct branchings admits a polynomial kernel. In European Symposium on Algorithms (ESA), pages 11:1–11:15, 2021. doi:10.4230/LIPICS.ESA.2021.11.
- [6] Jørgen Bang-Jensen, Saket Saurabh, and Sven Simonsen. Parameterized algorithms for non-separating trees and branchings in digraphs. Algorithmica, 76:279–296, 2016. doi:10.1007/S00453-015-0037-3.
- [7] Julien Baste, Michael Fellows, Lars Jaffke, Tomáš Masařík, Mateus de Oliveira Oliveira, Geevarghese Philip, and Frances Rosamond. Diversity of solutions: An exploration through the lens of fixed-parameter tractability theory. Artificial Intelligence, 303:103644, 2022. doi:10.1016/J.ARTINT.2021.103644.
- [8] Julien Baste, Lars Jaffke, Tomáš Masařík, Geevarghese Philip, and Günter Rote. FPT algorithms for diverse collections of hitting sets. Algorithms, 12(12):254, 2019. doi:10.3390/A12120254.
- [9] Clément Carbonnel and Emmanuel Hebrard. Propagation via kernelization: The vertex cover constraint. In International Conference on Principles and Practice of Constraint Programming (CP), pages 147–156. Springer, 2016. doi:10.1007/978-3-319-44953-1_10.
- [10] Clément Carbonnel and Emmanuel Hebrard. On the kernelization of global constraints. In International Joint Conference on Artificial Intelligence (IJCAI), pages 578–584, 2017. doi:10.24963/IJCAI.2017/81.
- [11] Moses Charikar and Rina Panigrahy. Clustering to minimize the sum of cluster diameters. In ACM Symposium on Theory of Computing (STOC), pages 1–10, 2001. doi:10.1145/380752.380753.
- [12] Xianrun Chen, Dachuan Xu, Yicheng Xu, and Yong Zhang. Parameterized approximation algorithms for sum of radii clustering and variants. In AAAI Conference on Artificial Intelligence, volume 38, pages 20666–20673, 2024. doi:10.1609/AAAI.V38I18.30053.
- [13] Ryan Curtin, Benjamin Moseley, Hung Ngo, XuanLong Nguyen, Dan Olteanu, and Maximilian Schleich. RK-means: Fast clustering for relational data. In International Conference on Artificial Intelligence and Statistics (AISTATS), pages 2742–2752. PMLR, 2020. URL: http://proceedings.mlr.press/v108/curtin20a.html.
- [14] Marek Cygan, Fedor Fomin, Łukasz Kowalik, Daniel Lokshtanov, Dániel Marx, Marcin Pilipczuk, Michał Pilipczuk, and Saket Saurabh. Parameterized algorithms. Springer, 2015. doi:10.1007/978-3-319-21275-3.
- [15] Erik Demaine, Fedor Fomin, MohammadTaghi Hajiaghayi, and Dimitrios Thilikos. Fixed-parameter algorithms for -center in planar graphs and map graphs. ACM Transactions on Algorithms (TALG), 1(1):33–47, 2005. doi:10.1145/1077464.1077468.
- [16] Eduard Eiben, Robert Ganian, Iyad Kanj, Sebastian Ordyniak, and Stefan Szeider. On the parameterized complexity of clustering problems for incomplete data. Journal of Computer and System Sciences, 134:1–19, 2023. doi:10.1016/J.JCSS.2022.12.001.
- [17] Eduard Eiben, Tomohiro Koana, and Magnus Wahlström. Determinantal sieving. In ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 377–423. SIAM, 2024. doi:10.1137/1.9781611977912.16.
- [18] Paul Erdös and Richard Rado. Intersection theorems for systems of sets. Journal of the London Mathematical Society, 1(1):85–90, 1960.
- [19] Aryan Esmailpour and Stavros Sintos. Improved approximation algorithms for relational clustering. Proceedings of the ACM on Management of Data, 2(5):213:1–213:27, 2024. doi:10.1145/3695831.
- [20] Andreas Emil Feldmann and Dániel Marx. The parameterized hardness of the -center problem in transportation networks. Algorithmica, 82:1989–2005, 2020. doi:10.1007/S00453-020-00683-W.
- [21] Fedor Fomin, Petr Golovach, Lars Jaffke, Geevarghese Philip, and Danil Sagunov. Diverse pairs of matchings. Algorithmica, 86(6):2026–2040, 2024. doi:10.1007/S00453-024-01214-7.
- [22] Fedor Fomin, Petr Golovach, Fahad Panolan, Geevarghese Philip, and Saket Saurabh. Diverse collections in matroids and graphs. Mathematical Programming, 204(1):415–447, 2024. doi:10.1007/S10107-023-01959-Z.
- [23] Ryo Funayama, Yasuaki Kobayashi, and Takeaki Uno. Parameterized complexity of finding dissimilar shortest paths. arXiv preprint arXiv:2402.14376, 2024. doi:10.48550/arXiv.2402.14376.
- [24] Tatsuya Gima, Yuni Iwamasa, Yasuaki Kobayashi, Kazuhiro Kurita, Yota Otachi, and Rin Saito. Computing diverse pair of solutions for tractable sat. arXiv preprint arXiv:2412.04016, 2024. doi:10.48550/arXiv.2412.04016.
- [25] Gregory Gutin, Felix Reidl, and Magnus Wahlström. -distinct in-and out-branchings in digraphs. Journal of Computer and System Sciences, 95:86–97, 2018. doi:10.1016/J.JCSS.2018.01.003.
- [26] Tesshu Hanaka, Yasuaki Kobayashi, Kazuhiro Kurita, and Yota Otachi. Finding diverse trees, paths, and more. In AAAI Conference on Artificial Intelligence, volume 35, pages 3778–3786, 2021. doi:10.1609/AAAI.V35I5.16495.
- [27] Wen-Lian Hsu and George Nemhauser. Easy and hard bottleneck location problems. Discrete Applied Mathematics, 1(3):209–215, 1979. doi:10.1016/0166-218X(79)90044-1.
- [28] Tanmay Inamdar and Kasturi Varadarajan. Capacitated sum-of-radii clustering: An FPT approximation. In European Symposium on Algorithms (ESA), 2020.
- [29] Neeldhara Misra, Harshil Mittal, and Ashutosh Rai. On the parameterized complexity of diverse SAT. In International Symposium on Algorithms and Computation (ISAAC), volume 322, pages 50:1–50:18, 2024. doi:10.4230/LIPICS.ISAAC.2024.50.
- [30] Benjamin Moseley, Kirk Pruhs, Alireza Samadian, and Yuyan Wang. Relational algorithms for -means clustering. In International Colloquium on Automata, Languages, and Programming (ICALP), volume 198, pages 97:1–97:21, 2021. doi:10.4230/LIPICS.ICALP.2021.97.
- [31] Yuto Shida, Giulia Punzi, Yasuaki Kobayashi, Takeaki Uno, and Hiroki Arimura. Finding diverse strings and longest common subsequences in a graph. In Symposium on Combinatorial Pattern Matching (CPM), volume 296, pages 27:1–27:19, 2024. doi:10.4230/LIPICS.CPM.2024.27.
