Hardness of Median and Center in the Ulam Metric
Abstract
The classical rank aggregation problem seeks to combine a set of permutations into a single representative “consensus” permutation. In this paper, we investigate two fundamental rank aggregation tasks under the well-studied Ulam metric: computing a median permutation (which minimizes the sum of Ulam distances to ) and computing a center permutation (which minimizes the maximum Ulam distance to ) in two settings.
- Continuous Setting:
-
In the continuous setting, the median/center is allowed to be any permutation. It is known that computing a center in the Ulam metric is NP-hard and we add to this by showing that computing a median is NP-hard as well via a simple reduction from the Max-Cut problem. While this result may not be unexpected, it had remained elusive until now and confirms a speculation by Chakraborty, Das, and Krauthgamer [SODA ’21].
- Discrete Setting:
-
In the discrete setting, the median/center must be a permutation from the input set. We fully resolve the fine-grained complexity of the discrete median and discrete center problems under the Ulam metric, proving that the naive -time algorithm (where is the length of the permutation) is conditionally optimal. This resolves an open problem raised by Abboud, Bateni, Cohen-Addad, Karthik C. S., and Seddighin [APPROX ’23]. Our reductions are inspired by the known fine-grained lower bounds for similarity measures, but we face and overcome several new highly technical challenges.
Keywords and phrases:
Ulam distance, median, center, rank aggregation, fine-grained complexityFunding:
Mursalin Habib: Supported by the National Science Foundation under Grants CCF-2313372, CCF-2422558, and CCF-2443697.Copyright and License:
2012 ACM Subject Classification:
Mathematics of computing Permutations and combinations ; Mathematics of computing Combinatorial optimization ; Theory of computation Problems, reductions and completenessAcknowledgements:
The authors would like to thank the DIMACS Workshop on Efficient Algorithms for High Dimensional Metrics: New Tools for a special collaboration opportunity as well as Siam Habib and Antti Roeyskoe for helpful discussions.Funding:
Part of this work was done while Mursalin Habib and Karthik C. S. were visiting INSAIT, Sofia University “St. Kliment Ohridski”. Nick Fischer, Mursalin Habib, and Karthik C. S. were partially funded by the Ministry of Education and Science of Bulgaria’s support for INSAIT as part of the Bulgarian National Roadmap for Research Infrastructure.Editors:
Anne Benoit, Haim Kaplan, Sebastian Wild, and Grzegorz HermanSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
Suppose that judges each rank the performances of competitors. Given these rankings, how can the judges agree on a single consensus ranking? This fundamental question lies at the heart of a class of tasks known as rank aggregation, which has applications across various fields, including social choice theory [11], bioinformatics [37], information retrieval [29], machine learning [38], and recommendation systems [41], among others. Formally, the judges’ rankings can be represented as a set of permutations . Then, for an appropriate metric on the space of permutations , the two most prominent rank aggregation tasks are to compute a median permutation which minimizes [33, 49, 50, 23], or a center permutation which minimizes [8, 10, 43].
Among the metrics studied in this context, two stand out. The first one is the classic Kendall’s tau distance which measures the number of disagreeing pairs between two permutations, i.e., the number of pairs for which one ranking orders before while the other orders before . Kendall’s tau distance is well-motivated as it satisfies several desirable properties beyond the scope of this paper (e.g., neutrality, consistency, and the extended Condorcet property [33, 47]), and is also well-understood from a computational point-of-view [23, 24]. For example, it is known that computing the median or center of just four permutations is already NP-hard [23, 10]. Several approximation algorithms have also been proposed for this metric [5, 46], culminating in a PTAS [34, 42] for approximating the median under Kendall’s tau metric.
The other key metric is the Ulam distance which measures the minimum number of relocation operations required to turn one permutation into another permutation – i.e., the minimum number of competitors whose ranks have to be adjusted in so that it agrees with . This metric offers a simpler and more practical alternative to Kendall’s tau metric for rank aggregation tasks [20, 16, 19, 18]. Perhaps more importantly, the Ulam metric is intimately linked to the more general edit metric on arbitrary strings, which enjoys countless applications in computational biology [28, 42], specifically in the context of DNA storage systems [27, 44], and beyond [35, 39]. Despite the significance of Ulam rank aggregation problems and the extensive research dedicated to them [8, 16, 19, 18], some basic questions remain unanswered. This is the starting point of our paper.
1.1 Question 1: Polynomial-Time Algorithms for Ulam Median?
The first basic question is whether polynomial-time algorithms exist for exactly computing the center and median permutations under the Ulam metric. For almost all string metrics, including the aforementioned Kendall’s tau metric but also metrics beyond permutations such as the Hamming metric [26, 36, 4] or the edit metric [22, 40], median and center problems are well-known and easily-proven to be NP-hard.111A notable exception is the Hamming median problem that can trivially be solved in polynomial time by a coordinate-wise plurality vote. Quite surprisingly, while it is known that computing an Ulam center is NP-hard [8], the complexity of computing an Ulam median has remained an open question. This is not due to a lack of interest – despite the absence of an NP-hardness proof, Chakraborty, Das, and Krauthgamer have already initiated the study of approximation algorithms for the Ulam median problem [16, 17], achieving a 1.999-factor approximation in polynomial time. Our first contribution is that we finally provide this missing hardness proof:
Theorem 1.
The median problem is NP-hard in the Ulam metric.
1.2 Question 2: Fine-Grained Complexity of Discrete Ulam Center and Median?
How can we circumvent this new lower bound? There are two typical approaches. The first is to resort to approximation algorithms as was done in [16, 17]. But there is a commonly-studied second option for aggregation and clustering type problems: to restrict the solution space to only the input set of permutations and compute the best median or center from it.222Or alternatively, the best median/center among the permutations in another given set ; this is typically referred to as the bichromatic discrete center/median problem, or as center/median problem with facilities in the theory of clustering. All of our results also apply to these bichromatic variants. The best median or center from the input set is typically referred to as the discrete median or the discrete center, respectively, of . (In the same spirit, we will occasionally refer to the unrestricted median and center, discussed in the previous subsection, as the continuous median and continuous center, respectively.) Besides being a natural polynomial-time rank aggregation task, computing discrete medians or centers has two other motivations. First, it is easy to see that computing a discrete median/center yields a 2-approximation of the (continuous) median/center problems. In fact, a key observation in [16] is that a discrete median often provides a -approximation for the (continuous) median, particularly in practical DNA storage system instances where distances tend to be large. Second, the discrete median and center problems have gained significant attraction for the easier Hamming metric [2, 4] and harder edit metric [2], often leading to matching upper and lower bounds. Studying the Ulam metric therefore serves as an interesting intermediate problem capturing some – but not all – of the hardness of the edit metric.
Driven by these motivations, we study the fine-grained complexity of the discrete median and discrete center problems with respect to the Ulam distance. That is, we aim to pinpoint their precise polynomial run times.
Discrete Ulam Center
The trivial algorithm for computing a discrete center is to explicitly compute the Ulam distance for all pairs of permutations . We can then easily select the permutation minimizing . As the Ulam distance between two length- permutations can be computed in near-linear time333We write to suppress polylogarithmic factors. using the well-known patience sorting algorithm [6], the total time is .
We prove that this simple algorithm is optimal, up to subpolynomial factors and conditioned on a plausible assumption from fine-grained complexity:
Theorem 2.
Let and . There is no algorithm running in time that solves the discrete center problem in the Ulam metric for permutations of length , unless the Quantified Strong Exponential Time Hypothesis fails.
The Quantified Strong Exponential Time Hypothesis (QSETH) is a plausible generalization of the by-now well-established Strong Exponential Time Hypothesis (SETH), postulating that the CNF-SAT problem cannot be solved faster than brute-force search even when only some variables are existentially quantified and others are universally quantified (see Section 3.1 for the formal treatment) [12]. This hypothesis has already proven useful for conditional lower bounds for a wide array of problems [12, 3, 2]. Besides, we remark that it is impossible to obtain SETH-based lower bounds for the discrete center problem unless the Nondeterministic Strong Exponential Time Hypothesis [15] is false (see Section 5.1).
We emphasize that these lower bounds applies to the full range of versus , as long as is at least polynomial in . In the case when is very small, , previous work by Abboud, Bateni, Cohen-Addad, Karthik C. S. and Seddighin already established a matching conditional lower bound of [2].
It is interesting to compare Theorem 2 with the state of the art for discrete center problems in the Hamming metric (say, over a constant-size alphabet) and the edit metric. For concreteness, consider the case (i.e., the input consists of strings/permutations of length roughly ). Then, on the one hand, the discrete center problem in the Hamming metric can be solved in time [2, 4], where is the exponent of matrix multiplication [7]. On the other hand, the discrete center problem in the edit metric cannot be solved in time unless QSETH fails [2]. Therefore, remarkably, Theorem 2 indeed places the discrete center problem in the Ulam metric as a problem of intermediate complexity . This answers an explicit open question posed by Abboud, Bateni, Cohen-Addad, Karthik C. S., and Seddighin [2].
Discrete Ulam Median
The trivial algorithm for the discrete median problem is exactly the same as for the center problem: First compute all pairwise distances , then select the permutation minimizing . It also runs in time . So perhaps one could also hope that a matching lower bound follows from our Theorem 2. Unfortunately, this turns out to be true only for a restricted subproblem.444Namely, the bichromatic discrete median problem mentioned in Footnote 2. Nevertheless, with considerable technical overhead we manage to prove essentially the same matching lower bound:
Theorem 3.
Let and . There is no algorithm running in time that solves the discrete median in the Ulam metric for permutations of length , unless the SETH fails.
2 Proof Overview
In this section, we provide the proof overviews of Theorems 1, 2, and 3. We avoid some technicalities to highlight the core ideas – for instance, some subpolynomial factors have been dropped. Complete proofs can be found in Sections 4, 5, and 6, respectively.
2.1 NP-Hardness for Continuous Median in the Ulam Metric
In this section, we provide a high-level sketch of our NP-hardness proof for the continuous median problem in the Ulam metric. Our starting point is the Max-Cut problem555Recall that the Max-Cut problem is, given an undirected graph to compute a vertex partition maximizing the number of edges from to .. Given a Max-Cut instance where , our goal is to construct a set of permutations of length such that the median of these permutations encodes the cut in of maximum size.
To achieve this, we first set up a natural correspondence between cuts in and permutations of length . However, since there are many more permutations than cuts, not all permutations will represent valid cuts. To ensure that only relevant permutations are considered, we construct two special permutations:
for some fixed long permutations . The simple key insight is that any median of and has the following form: it starts with some subset of the symbols in increasing order, followed by , followed by the symbols in in decreasing order, finally followed by . Thus, medians of will naturally encode cuts of the form in .
To further enforce that the median represents a maximum cut, we include additional permutations in our instance. Specifically, for each edge , we include two permutations, and , which reward picking solutions that correspond to partitions that cut . This ensures that the final median permutation encodes a maximum cut of . The precise construction and formal analysis of these edge-cutting permutations is detailed in Section 4.
2.2 Fine-Grained Lower Bound for Discrete Center in the Ulam Metric
Our proof of the lower bound for the discrete center problem in the Ulam metric relies on two key components. The first is a reduction from the Orthogonal Vectors (OV) problem to the problem of computing the Ulam distance between two permutations. Specifically, we seek a pair of functions that, given two sets of binary vectors as inputs, independently output two permutations whose Ulam distance is small if and only if there exists an orthogonal pair of vectors in the input sets. This falls into a well-established framework in the fine-grained complexity literature [9, 13, 1]: Given two sets of roughly binary vectors, one constructs “coordinate gadgets”, “vector gadgets”, and “OR-gadgets” to produce a pair of length- strings whose edit distance encodes the existence of an orthogonal pair.
Clearly such a reduction cannot exist for the Ulam distance under SETH. The Ulam distance between two permutations can be computed in near-linear time by a simple reduction to the longest increasing subsequence problem [45], and thus, if there were a way to transform sets of many vectors into permutations of length such that the existence of an orthogonal pair in the sets could be determined via an Ulam distance computation of these permutations, then that would imply a near-linear time algorithm for OV, falsifying SETH! In light of this observation, we start with vectors in the OV instance. The constructions of the coordinate and vector gadgets are similar to the edit distance reduction. However, during the construction of the OR-gadgets, there is a quadratic blowup resulting in length permutations with the desired properties. A detailed construction of these gadgets can be found in Section 5.
The second component in our proof is a reduction from a problem called -Orthogonal Vectors. In this problem, we are given four sets of binary vectors and we have to decide if there exists , such that for all , there exist such that are orthogonal666We say that vectors are orthogonal if . If and , then this problem has a lower bound under the Quantified Strong Exponential Time Hypothesis. Given such sets , we proceed as follows. First, for each , we construct the set of vectors by taking the pointwise product of with all vectors in . There will be such sets , one for each choice of . Similarly, for each , we construct the set of vectors by taking the pointwise product of with all vectors in . We then run the OV to Ulam distance reduction from before on these sets to obtain two sets of -many permutations of length . Finally, we show that there exists a permutation in the first set with small Ulam distance to every permutation in the second set if and only if the starting -OV instance is a YES-instance.
To go from these two sets to the final discrete center instance, we append additional symbols to each permutation and introduce a new permutation that is far from every permutation in the second set. This ensures that the center indeed comes from the first set completing the reduction. We defer the details to Section 5.
2.3 Fine-Grained Lower Bound for Discrete Median in the Ulam Metric
Our lower bound proof for the discrete median problem follows a similar initial approach as our proof for the center lower bound, with only one difference: instead of starting with an -OV instance, we begin with a -OV (also known as 4-OV) instance. Given this 4-OV instance, we retrace the same steps to construct two sets, and , each containing permutations of length . As in the center proof, we show that there exists a permutation in whose total Ulam distance to all elements in is small if and only if the original 4-OV instance is a YES-instance.
However, going from these two sets to the standard single-set version of the discrete median problem is technically very challenging. In fact, such challenges were addressed in the past in the context of the closest pair problem [21, 32], and more generally identified as the task of reversing color coding [14], typically requiring extremal combinatorial objects which are then composed with the input in a black-box manner.
In this paper, we transform the bichromatic instance to a monochromatic one, in multiple steps but in a white-box manner using the structure of the input instance. The first key observation is that all pairwise Ulam distances within can be computed much faster than the naive time bound, specifically, in time. This speedup is possible because the permutations in are not arbitrary but outputs formed during our OV to Ulam distance reduction. Thus, in time, we can compute the total Ulam distance of each to all other elements in .
Once these distance sums are computed, we initiate a balancing procedure. This procedure iteratively appends additional symbols to each permutation in such that:
-
For every permutation in , the sum of its Ulam distances to all other elements in becomes equal.
-
The relative Ulam distances between permutations across the sets remain unchanged.
-
For every permutation in , the sum of its Ulam distances to all other elements in becomes very large.
We show that this balancing procedure can be performed efficiently without significantly increasing the permutation lengths. Finally, we include all modified permutations into a single set, and output that as our final discrete median instance. The details turn out to be quite involved, and we direct the reader to Section 6 for further details.
3 Preliminaries
Sets, Strings and Permutations
For a positive integer , let denote the set . We denote by the set of all permutations over . Throughout the paper, we treat any permutation as a string over the alphabet with no repeating symbols, and we write . Given strings over some alphabet, we denote by the concatenated string . We will often require strings that can be obtained by adding some fixed “offset” to each symbol of some other canonical string777Here we are treating the symbols of a string as integers themselves.. For every nonnegative integer and string of length , we let . In other words, is the string obtained by adding to each symbol of . We will also often require strings that are obtained by restricting some string to some sub-alphabet. Given any string over the alphabet and any sub-alphabet , we denote by to be string obtained from by deleting all symbols that are not in . Given , we write for the inner product of and , i.e., . We further write for the pointwise product and , i.e., is the vector satisfying for all .
Ulam Distances and Common Subsequences
The Hamming distance between two equal-length strings and , denoted by , is the number of locations where and have different symbols. Let be a permutation and be distinct positions. A symbol relocation operation from position to position in constitutes of deleting the symbol of and reinserting it at position . More formally, if is the permutation obtained after applying a symbol relocation from position to position in , then:
Given , the Ulam distance between and , denoted by , is the minimum number of symbol relocation operations required to transform into . We will further denote by the length of a longest common subsequence of two strings and . We will frequently use the following fact throughout the paper, which relates the Ulam distance between two permutations to the length of their longest common subsequence.
Fact 4 ([6]).
For every , we have .
3.1 Hardness Assumptions
Our results are conditional on several hardness assumptions and hypotheses, which we list in this section. The first one is the Strong Exponential Time Hypothesis, which is a standard assumption in the theory of fine-grained complexity.
Hypothesis 5 (Strong Exponential Time Hypothesis (SETH)).
For all , there exists such that no algorithm running in time can solve the -SAT problem on variables.
More specifically, for one of our lower bounds, we will need the following corollary of SETH, which we dub Unbalanced -OVH.
Hypothesis 6 (Unbalanced -OVH).
For all , no algorithm can, given sets with , and , determine if there exists , , , such that in time .
It is well-known that SETH in conjunction with the sparsification lemma [30] implies Unbalanced -OVH [48]. We will also make use of the following strengthening of SETH.
Hypothesis 7 (SETH).
For all and , there exists , such that given a -CNF formula over the variables , no algorithm running in time can determine if the following is true:
We note that SETH is a special case of the Quantified SETH proposed by Bringmann and Chaudhury [12] – a hypothesis about the complexity of deciding quantified -CNF formulas with a constant number of quantifier blocks where each block contains some constant fraction of the variables. We do not formally define Quantified SETH in all of its generality since we only require three quantfier alternations. In fact, the specific hardness assumption we need is the following which is implied by SETH.
Hypothesis 8 (Unbalanced OVH).
For all , no algorithm can, given sets with , and , determine if there exists such that for all , there exist such that in time .
4 NP-Hardness of Continuous Median in the Ulam Metric
In this section, we prove Theorem 1. Before we do so, we first formally define the continuous median problem in the Ulam metric.
| Continuous Ulam Median |
| Input: | A set of permutations and an integer . |
| Question: | Is there a permutation such that ? |
The main result of this section is the following. All missing proofs in this section can be found in the full version of this paper [25].
Theorem 9.
Continuous Ulam Median is NP-hard.
Proof.
We will reduce from the Max Cut problem, which is NP-hard [31]. Let be a Max Cut instance with . From , we will construct a Continuous Ulam Median instance consisting of permutations of length . On a high level, our construction will work as follows. We will first construct many copies of two special permutations that will force every median of to take on a very specific structure. All permutations of this structure will naturally encode cuts of the vertex set . Then for each edge in , we will construct permutations that reward choosing a median that “cuts” . Thus, we will end up with a set of permutations whose overall median will encode the maximum cut of . Details follow.
To describe our construction, it will be convenient to define the following two strings, both of length :
Next, we define the two special permutations alluded to earlier:
We make the observation that every median of the set naturally encodes a cut of the vertex set .
Definition 10.
For a nonnegative integer , let and be sets such that , and . Define as:
Furthermore, define as:
Clearly, permutations in naturally encode cuts of . We will first show that every permutation in has the same sum of Ulam distances to and .
Lemma 11.
For every , .
Next, we show that any permutation that is not in has strictly larger sum of Ulam distances to and .
Lemma 12.
Let . Then .
The final pieces in our construction are the “edge gadgets”, which we now define. For each edge , where , define the following two strings:
The role of the edge gadgets associated with an edge is to reward choosing partitions of the vertices that cut . This is formalized in the following lemma.
Lemma 13.
Let such that with and . Then we have the following:
Now let and be the set consisting of copies of and . Our final Continuous Ulam Median instance will be the multiset
We now show that has a cut of size at least if and only if the median of has cost at most , where . For the completeness case, assume there exist sets with such that the partition cuts at least edges in . Denote by the set of all edges cut by the partition . Now consider the permutation and note that:
For the soundness case, assume that there exists whose median cost to is at most . We can further assume that since otherwise, every will have a median cost that is at most that of . Indeed, assume that and fix any . We have:
Thus, the assumption that is without loss of generality. Then, we have for sets with . We claim that the partition cuts at least edges in . Indeed, since , we have . Then, by Lemma 13 the partition cuts at least edges.
Remark 14.
Although our NP-hardness reduction produces multisets instead of sets, it is not to hard to turn them into sets by appending a unique permutation to each permutation without affecting the structure of the solution. See [25, Appendix B] for details.
5 Fine-Grained Complexity of Discrete Center in the Ulam Metric
In this section, we prove Theorem 2. We first formally define the discrete center problem in the Ulam metric.
| Discrete Ulam Center |
| Input: | A set of permutations such that and an integer . |
| Question: | Is there a permutation such that ? |
The main result of this section is the following. All missing proofs in this section can be found in the full version of the paper [25].
Theorem 15.
Let and . There is no algorithm running in time that solves the Discrete Ulam Center problem for permutations of length , unless SETH fails.
The key step in the proof of Theorem 15 is a reduction from Orthogonal Vectors to Ulam Distance, i.e., to construct a pair of functions that, given a set of binary vectors of length each as input, outputs, independently of each other, a pair of permutations whose Ulam distance is small if and only if there exists an orthogonal pair of vectors in the input sets.
Theorem 16.
There exists a pair of functions and such that for all sets with , the following holds.
-
, i.e., both and output permutations of length .
-
If there exists such that , then the Ulam distance between and is at most . Otherwise, the Ulam distance between and is exactly .
-
Both and are computable in time .
Equipped with Theorem 16, Theorem 15 follows in a straightforward manner. For details, the reader is referred to [25].
5.1 The Need for Quantifiers
Our lower bound for Discrete Ulam Center is based on a plausible generalization of the Strong Exponential Time Hypothesis, namely SETH. One could ask if we could get a similar lower bound under the weaker but more standard SETH instead. We remark that this is impossible unless the Nondeterministic Strong Exponential Time Hypothesis (NSETH) [15] is false. This is because Discrete Ulam Center can be solved in (co-)nondeterministic time with the following simple algorithm. We first guess the center and compute . Then for each permutation , we guess the furthest permutation and verify that , thereby certifying that our guess is optimal. In light of this algorithm, a fine-grained lower bound based on SETH would contradict NSETH.
6 Fine-Grained Complexity of Discrete Median in the Ulam Metric
In this section, we prove a tight fine-grained lower bound for the Discrete Ulam Median problem conditioned on SETH. We first give a formal statement of the problem.
| Discrete Ulam Median |
| Input: | A set of permutations such that and an integer . |
| Question: | Is there a permutation such that ? |
This section is organized as follows. In Section 6.1, we start with a simple lower bound for the bichromatic version of the problem that essentially follows from the proof for the Discrete Ulam Center problem from before. Going from the bichromatic version to the standard monochromatic version defined above is presented in Section 6.2. This step is technically very involved: we show that any set of permutations can be balanced in such a way that the sum of Ulam distances from each permutation to all the others is (approximately) equal (see Theorem 19).
6.1 Hardness for Bichromatic Instances
In this subsection, we give a fine-grained lower bound for the Bichromatic Discrete Ulam Median problem. In this problem, we given two sets of permutations and an integer , and the goal is to determine if there exists such that .
Theorem 17.
Let and . There is no algorithm running in time that solves the Bichromatic Discrete Ulam Median problem for permutations of length , unless the SETH fails.
Proof.
The proof is almost identical to the first half of the proof of Theorem 15. The only difference is the starting problem, which is 4-OV (i.e., ) instead of . Given a 4-OV instance where , and , we retrace the proof of Theorem 15 and construct the sets and . Finally, we set . Once again, by Theorem 16, it is not hard to see that there exists such that if and only if the starting 4-OV instance is a YES-instance. The conclusion then follows from Unbalanced -OVH.
6.2 Hardness for Monochromatic Instances
Finally, in this subsection, we provide a fine-grained lower bound for the Monochromatic Discrete Ulam Median problem. To do so, we need the following technical results (which is one of our main contributions), whose proofs can be found in the full version of the paper [25].
Lemma 18 (Embedding the Hamming Metric on Small Alphabets).
Let . In time we can construct permutations such that for all .
Theorem 19 (Full Balancing).
Let and be integers such that is divisible by and let . Given , in time we can construct permutations and an integer with the following two properties:
-
Writing , it holds that for all .
-
It holds that .
Using the above theorem, we now prove the following:
Theorem 20.
Let and . There is no algorithm running in time that solves the (Monochromatic) Discrete Ulam Median problem for permutations of length , unless the SETH fails.
Proof.
We follow the same reduction as in Theorem 17, which, given an initial -OV instance produces an instance of Bichromatic Discrete Ulam Median on permutations of length . During the reduction, we can set everything up so that . Without loss of generality, we may also assume that is divisible by by adding some dummy vectors in the initial -OV instance if necessary. Let denote the permutations in and let denote the permutations in . As a first preprocessing step, concatenate each permutation (or ) two times to itself (using fresh symbols when necessary) so that each permutation becomes of length and the median distance becomes a multiple of .
We first make the observation that the Ulam distance between any two permutations in can be computed very quickly – in time that is proportional to the square root of the length of the permutations.
Observation 21.
Let . Then can be computed in time .
Proof.
Since , there exist such that , where is the function from Theorem 16 and each concatenation is done using a fresh set of symbols. Similarly, there exist such that , where again each concatenation is with a fresh set of symbols.
Therefore, we have . Thus, can be found by computing the Hamming distance between two bit strings of length . Further recall that . Therefore, and the conclusion follows.
By Observation 21, we can compute in time all pairwise distances . Let ; clearly we have that . Thus, we may apply Theorem 19 to obtain permutations , where , with
for some integer and for all , and with .
Additionally, let . Compute some length- permutations such that and such that . For instance, viewing as --strings under the Hamming distance to be embedded by Lemma 18, take to be the all- string of length , let half of the strings be the string and let the other half of the strings be the string .
We are now ready to construct the Monochromatic Discrete Ulam Median instance . We include into the following permutations:888As always, we use fresh symbols when necessary to ensure that the resulting strings are permutations.
-
(for ), and
-
(for ).
We claim that this construction is correct in the following sense: (1) All discrete medians of are strings . (2) Whenever is a discrete median in , then is a discrete median in . (3) There is some discrete median in such that is a discrete median in . The proofs of all three claims easily follow from the following calculations. On the one hand, the median distance for each is
| On the other hand, the median distance for each is | ||||
Comparing these two terms, and recalling that , it is clear that the median distance of any is always significantly larger that the median distance of any proving (1). Recalling further that all median distances in the original instance are multiples of , the term in the first computation becomes irrelevant, completing the proofs of claims (2) and (3).
Finally we comment on the running time. The original reduction, along with the computation of the values takes time . Running Theorem 19 takes time , and the final instance can be implemented in negligible overhead.
References
- [1] Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In Venkatesan Guruswami, editor, IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 59–78. IEEE Computer Society, 2015. doi:10.1109/FOCS.2015.14.
- [2] Amir Abboud, MohammadHossein Bateni, Vincent Cohen-Addad, Karthik C. S., and Saeed Seddighin. On complexity of 1-center in various metrics. In Nicole Megow and Adam D. Smith, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2023, September 11-13, 2023, Atlanta, Georgia, USA, volume 275 of LIPIcs, pages 1:1–1:19. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPICS.APPROX/RANDOM.2023.1.
- [3] Amir Abboud, Karl Bringmann, Danny Hermelin, and Dvir Shabtay. Scheduling lower bounds via AND subset sum. J. Comput. Syst. Sci., 127:29–40, 2022. doi:10.1016/J.JCSS.2022.01.005.
- [4] Amir Abboud, Nick Fischer, Elazar Goldenberg, Karthik C. S., and Ron Safier. Can you solve closest string faster than exhaustive search? In Inge Li Gørtz, Martin Farach-Colton, Simon J. Puglisi, and Grzegorz Herman, editors, 31st Annual European Symposium on Algorithms, ESA 2023, September 4-6, 2023, Amsterdam, The Netherlands, volume 274 of LIPIcs, pages 3:1–3:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPICS.ESA.2023.3.
- [5] Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: Ranking and clustering. J. ACM, 55(5):23:1–23:27, 2008. doi:10.1145/1411509.1411513.
- [6] David Aldous and Persi Diaconis. Longest increasing subsequences: from patience sorting to the Baik-Deift-Johansson theorem. Bulletin of the American Mathematical Society, 36(4):413–432, 1999.
- [7] Josh Alman, Ran Duan, Virginia Vassilevska Williams, Yinzhan Xu, Zixuan Xu, and Renfei Zhou. More asymmetry yields faster matrix multiplication. In Yossi Azar and Debmalya Panigrahi, editors, Proceedings of the 2025 Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2025, New Orleans, LA, USA, January 12-15, 2025, pages 2005–2039. SIAM, 2025. doi:10.1137/1.9781611978322.63.
- [8] Christian Bachmaier, Franz J. Brandenburg, Andreas Gleißner, and Andreas Hofmeier. On the hardness of maximum rank aggregation problems. J. Discrete Algorithms, 31:2–13, 2015. doi:10.1016/J.JDA.2014.10.002.
- [9] Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). SIAM J. Comput., 47(3):1087–1097, 2018. doi:10.1137/15M1053128.
- [10] Therese Biedl, Franz-Josef Brandenburg, and Xiaotie Deng. On the complexity of crossings in permutations. Discret. Math., 309(7):1813–1823, 2009. doi:10.1016/J.DISC.2007.12.088.
- [11] Felix Brandt, Vincent Conitzer, Ulle Endriss, Jérôme Lang, and Ariel D. Procaccia, editors. Handbook of Computational Social Choice. Cambridge University Press, 2016. doi:10.1017/CBO9781107446984.
- [12] Karl Bringmann and Bhaskar Ray Chaudhury. Polyline simplification has cubic complexity. J. Comput. Geom., 11(2):94–130, 2020. doi:10.20382/JOCG.V11I2A5.
- [13] Karl Bringmann and Marvin Künnemann. Quadratic conditional lower bounds for string problems and dynamic time warping. In Venkatesan Guruswami, editor, IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 79–97. IEEE Computer Society, 2015. doi:10.1109/FOCS.2015.15.
- [14] Boris Bukh, Karthik C. S., and Bhargav Narayanan. Applications of random algebraic constructions to hardness of approximation. In 62nd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2021, Denver, CO, USA, February 7-10, 2022, pages 237–244. IEEE, 2021. doi:10.1109/FOCS52979.2021.00032.
- [15] Marco L. Carmosino, Jiawei Gao, Russell Impagliazzo, Ivan Mihajlin, Ramamohan Paturi, and Stefan Schneider. Nondeterministic extensions of the strong exponential time hypothesis and consequences for non-reducibility. In Madhu Sudan, editor, Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science, Cambridge, MA, USA, January 14-16, 2016, pages 261–270. ACM, 2016. doi:10.1145/2840728.2840746.
- [16] Diptarka Chakraborty, Debarati Das, and Robert Krauthgamer. Approximating the median under the ulam metric. In Dániel Marx, editor, Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, Virtual Conference, January 10 - 13, 2021, pages 761–775. SIAM, 2021. doi:10.1137/1.9781611976465.48.
- [17] Diptarka Chakraborty, Debarati Das, and Robert Krauthgamer. Clustering permutations: New techniques with streaming applications. In Yael Tauman Kalai, editor, 14th Innovations in Theoretical Computer Science Conference, ITCS 2023, January 10-13, 2023, MIT, Cambridge, Massachusetts, USA, volume 251 of LIPIcs, pages 31:1–31:24. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPICS.ITCS.2023.31.
- [18] Diptarka Chakraborty, Syamantak Das, Arindam Khan, and Aditya Subramanian. Fair rank aggregation. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, 2022. URL: http://papers.nips.cc/paper_files/paper/2022/hash/974309ef51ebd89034adc64a57e304f2-Abstract-Conference.html.
- [19] Diptarka Chakraborty, Kshitij Gajjar, and Agastya Vibhuti Jha. Approximating the center ranking under ulam. In Mikolaj Bojanczyk and Chandra Chekuri, editors, 41st IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2021, December 15-17, 2021, Virtual Conference, volume 213 of LIPIcs, pages 12:1–12:21. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPICS.FSTTCS.2021.12.
- [20] Graham Cormode, S. Muthukrishnan, and Süleyman Cenk Sahinalp. Permutation editing and matching via embeddings. In Fernando Orejas, Paul G. Spirakis, and Jan van Leeuwen, editors, Automata, Languages and Programming, 28th International Colloquium, ICALP 2001, Crete, Greece, July 8-12, 2001, Proceedings, volume 2076 of Lecture Notes in Computer Science, pages 481–492. Springer, 2001. doi:10.1007/3-540-48224-5_40.
- [21] Roee David, Karthik C. S., and Bundit Laekhanukit. On the complexity of closest pair via polar-pair of point-sets. SIAM J. Discret. Math., 33(1):509–527, 2019. doi:10.1137/17M1128733.
- [22] Colin de la Higuera and Francisco Casacuberta. Topology of strings: Median string is np-complete. Theor. Comput. Sci., 230(1-2):39–48, 2000. doi:10.1016/S0304-3975(97)00240-5.
- [23] Cynthia Dwork, Ravi Kumar, Moni Naor, and D. Sivakumar. Rank aggregation methods for the web. In Vincent Y. Shen, Nobuo Saito, Michael R. Lyu, and Mary Ellen Zurko, editors, Proceedings of the Tenth International World Wide Web Conference, WWW 10, Hong Kong, China, May 1-5, 2001, pages 613–622. ACM, 2001. doi:10.1145/371920.372165.
- [24] Ronald Fagin, Ravi Kumar, and D. Sivakumar. Efficient similarity search and classification via rank aggregation. In Alon Y. Halevy, Zachary G. Ives, and AnHai Doan, editors, Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, San Diego, California, USA, June 9-12, 2003, pages 301–312. ACM, 2003. doi:10.1145/872757.872795.
- [25] Nick Fischer, Elazar Goldenberg, Mursalin Habib, and Karthik C. S. Hardness of median and center in the ulam metric. CoRR, abs/2504.16437, 2025. doi:10.48550/arXiv.2504.16437.
- [26] Moti Frances and Ami Litman. On covering problems of codes. Theory Comput. Syst., 30(2):113–119, 1997. doi:10.1007/S002240000044.
- [27] Nick Goldman, Paul Bertone, Siyuan Chen, Christophe Dessimoz, Emily M. LeProust, Botond Sipos, and Ewan Birney. Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nat., 494(7435):77–80, 2013. doi:10.1038/NATURE11875.
- [28] Dan Gusfield. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology. Cambridge University Press, 1997. doi:10.1017/CBO9780511574931.
- [29] Donna Harman. Ranking algorithms. In William B. Frakes and Ricardo A. Baeza-Yates, editors, Information Retrieval: Data Structures & Algorithms, pages 363–392. Prentice-Hall, 1992.
- [30] Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems have strongly exponential complexity? Journal of Computer and System Sciences, 63(4):512–530, 2001. doi:10.1006/JCSS.2001.1774.
- [31] Richard M. Karp. Reducibility among combinatorial problems. In Raymond E. Miller and James W. Thatcher, editors, Proceedings of a symposium on the Complexity of Computer Computations, held March 20-22, 1972, at the IBM Thomas J. Watson Research Center, Yorktown Heights, New York, USA, The IBM Research Symposia Series, pages 85–103. Plenum Press, New York, 1972. doi:10.1007/978-1-4684-2001-2_9.
- [32] Karthik C. S. and Pasin Manurangsi. On closest pair in euclidean metric: Monochromatic is as hard as bichromatic. Comb., 40(4):539–573, 2020. doi:10.1007/S00493-019-4113-1.
- [33] John G. Kemeny. Mathematics without numbers. Daedalus, 88(4):577–591, 1959.
- [34] Claire Kenyon-Mathieu and Warren Schudy. How to rank with few errors. In David S. Johnson and Uriel Feige, editors, Proceedings of the 39th Annual ACM Symposium on Theory of Computing, San Diego, California, USA, June 11-13, 2007, pages 95–103. ACM, 2007. doi:10.1145/1250790.1250806.
- [35] Teuvo Kohonen. Median strings. Pattern Recognit. Lett., 3(5):309–313, 1985. doi:10.1016/0167-8655(85)90061-3.
- [36] J. Kevin Lanctôt, Ming Li, Bin Ma, Shaojiu Wang, and Louxin Zhang. Distinguishing string selection problems. Inf. Comput., 185(1):41–55, 2003. doi:10.1016/S0890-5401(03)00057-9.
- [37] Xue Li, Xinlei Wang, and Guanghua Xiao. A comparative study of rank aggregation methods for partial and top ranked lists in genomic applications. Briefings in bioinformatics, 20(1):178–189, 2019. doi:10.1093/BIB/BBX101.
- [38] Yuting Liu, Tie-Yan Liu, Tao Qin, Zhiming Ma, and Hang Li. Supervised rank aggregation. In Carey L. Williamson, Mary Ellen Zurko, Peter F. Patel-Schneider, and Prashant J. Shenoy, editors, Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, Alberta, Canada, May 8-12, 2007, pages 481–490. ACM, 2007. doi:10.1145/1242572.1242638.
- [39] Carlos D. Martínez-Hinarejos, Alfons Juan, and Francisco Casacuberta. Use of median string for classification. In 15th International Conference on Pattern Recognition, ICPR’00, Barcelona, Spain, September 3-8, 2000, pages 2903–2906. IEEE Computer Society, 2000. doi:10.1109/ICPR.2000.906220.
- [40] François Nicolas and Eric Rivals. Complexities of the centre and median string problems. In Ricardo A. Baeza-Yates, Edgar Chávez, and Maxime Crochemore, editors, Combinatorial Pattern Matching, 14th Annual Symposium, CPM 2003, Morelia, Michocán, Mexico, June 25-27, 2003, Proceedings, volume 2676 of Lecture Notes in Computer Science, pages 315–327. Springer, 2003. doi:10.1007/3-540-44888-8_23.
- [41] Samuel E. L. Oliveira, Victor Diniz, Anísio Lacerda, Luiz H. C. Merschmann, and Gisele L. Pappa. Is rank aggregation effective in recommender systems? an experimental analysis. ACM Trans. Intell. Syst. Technol., 11(2):16:1–16:26, 2020. doi:10.1145/3365375.
- [42] Pavel A. Pevzner. Computational molecular biology - an algorithmic approach. MIT Press, 2000.
- [43] V. Y. Popov. Multiple genome rearrangement by swaps and by element duplications. Theor. Comput. Sci., 385(1-3):115–126, 2007. doi:10.1016/J.TCS.2007.05.029.
- [44] Cyrus Rashtchian, Konstantin Makarychev, Miklós Z. Rácz, Siena Ang, Djordje Jevdjic, Sergey Yekhanin, Luis Ceze, and Karin Strauss. Clustering billions of reads for DNA data storage. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 3360–3371, 2017. URL: https://proceedings.neurips.cc/paper/2017/hash/ab7314887865c4265e896c6e209d1cd6-Abstract.html.
- [45] Craige Schensted. Longest increasing and decreasing subsequences. Canadian Journal of mathematics, 13:179–191, 1961.
- [46] Anke van Zuylen and David P. Williamson. Deterministic algorithms for rank aggregation and other ranking and clustering problems. In Christos Kaklamanis and Martin Skutella, editors, Approximation and Online Algorithms, 5th International Workshop, WAOA 2007, Eilat, Israel, October 11-12, 2007. Revised Papers, volume 4927 of Lecture Notes in Computer Science, pages 260–273. Springer, 2007. doi:10.1007/978-3-540-77918-6_21.
- [47] Tiance Wang, John Sturm, Paul W. Cuff, and Sanjeev R. Kulkarni. Condorcet voting methods avoid the paradoxes of voting theory. In 50th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2012, Allerton Park & Retreat Center, Monticello, IL, USA, October 1-5, 2012, pages 201–203. IEEE, 2012. doi:10.1109/ALLERTON.2012.6483218.
- [48] Ryan Williams. A new algorithm for optimal 2-constraint satisfaction and its implications. Theor. Comput. Sci., 348(2-3):357–365, 2005. doi:10.1016/J.TCS.2005.09.023.
- [49] H. Peyton Young. Condorcet’s theory of voting. American Political science review, 82(4):1231–1244, 1988.
- [50] H. Peyton Young and Arthur Levenglick. A consistent extension of condorcet’s election principle. SIAM Journal on applied Mathematics, 35(2):285–300, 1978.
