
Even Faster Algorithm for the Chamfer Distance

Ying Feng MIT, Cambridge, MA, USA Piotr Indyk MIT, Cambridge, MA, USA
Abstract

For two $d$-dimensional point sets $A,B$ of size up to $n$, the Chamfer distance from $A$ to $B$ is defined as $\mathsf{CH}(A,B)=\sum_{a\in A}\min_{b\in B}\|a-b\|$. The Chamfer distance is a widely used measure for quantifying dissimilarity between sets of points, used in many machine learning and computer vision applications. A recent work of Bakshi et al., NeurIPS'23, gave the first near-linear time $(1+\varepsilon)$-approximate algorithm, with a running time of $\mathcal{O}(nd\log(n)/\varepsilon^2)$. In this paper we improve the running time further, to $\mathcal{O}\big(nd(\log\log n+\log\tfrac{1}{\varepsilon})/\varepsilon^2\big)$. When $\varepsilon$ is a constant, this reduces the gap between the upper bound and the trivial $\Omega(dn)$ lower bound significantly, from $\mathcal{O}(\log n)$ to $\mathcal{O}(\log\log n)$.

Keywords and phrases:
Chamfer distance
Category:
Track A: Algorithms, Complexity and Games
Funding:
Ying Feng: Supported by an MIT Akamai Presidential Fellowship.
Piotr Indyk: Supported in part by the NSF TRIPODS program (award DMS-2022448).
Copyright and License:
© Ying Feng and Piotr Indyk; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation
Editors:
Keren Censor-Hillel, Fabrizio Grandoni, Joël Ouaknine, and Gabriele Puppis

1 Introduction

For any two d-dimensional point sets A,B of sizes up to n, the Chamfer distance from A to B is defined as

$$\mathsf{CH}(A,B)=\sum_{a\in A}\min_{b\in B}\|a-b\|$$

where $\|\cdot\|$ is the underlying norm defining the distance between the points. The Chamfer distance and its variant, the Relaxed Earth Mover Distance [17, 6], are widely used metrics for quantifying the distance between two sets of points. These measures are especially popular in fields such as machine learning (e.g., [17, 19]) and computer vision (e.g., [7, 18, 11, 14]). A closely related notion of "the sum of maximum similarities", where $\min_{b\in B}\|a-b\|$ is replaced by $\max_{b\in B}\langle a,b\rangle$, has been recently popularized by the ColBERT system [15]. Efficient subroutines for computing Chamfer distances are provided in prominent libraries including PyTorch [2], PDAL [1] and TensorFlow [3]. In many applications (e.g., see [17]), the Chamfer distance is favored as a faster alternative to the more computationally intensive Earth-Mover Distance or Wasserstein Distance.

Despite the popularity of the Chamfer distance, efficient algorithms for computing it haven't attracted as much attention as algorithms for, say, the Earth-Mover Distance. The first improvement to the naive $\mathcal{O}(dn^2)$-time algorithm was obtained in [18], who utilized the fact that $\mathsf{CH}(A,B)$ can be computed by performing $|A|$ nearest neighbor queries in a data structure storing $B$. However, even when the state-of-the-art approximate nearest neighbor algorithms are used, this leads to a $(1+\epsilon)$-approximate estimator with an only slightly sub-quadratic running time of $\mathcal{O}\big(dn^{1+\frac{1}{2(1+\epsilon)^2-1}}\big)$ in high dimensions [5]. (All algorithms considered in this paper are randomized, and return $(1+\varepsilon)$-approximate answers with a constant probability.) The first near-linear-time algorithm for any dimension was proposed only recently in [8], who gave a $(1+\epsilon)$-approximation algorithm with a running time of $\mathcal{O}(dn\log(n)/\varepsilon^2)$, for the $\ell_1$ and $\ell_2$ norms. Since any algorithm for approximating the distance must run in at least $\Omega(dn)$ time (the Chamfer distance could be dominated by the distance from a single point $a\in A$ to $B$), the upper and lower running time bounds differed by a factor of $\log(n)/\varepsilon^2$.

Our result.

In this paper we make substantial progress towards reducing the gap between the upper and lower bounds for this problem. In particular, we show the following theorem. Assume a Word RAM model where both the input coordinates and the memory/processor words have $\mathcal{O}(\log n)$ bits. (In the Appendix, we adopt the reduction of [8] to extend the result to coordinates of arbitrary finite precision.) Then:

Theorem 1.

There is an algorithm that, given two sets $A,B$ of $d$-dimensional points with coordinates in $\{1,\dots,\mathrm{poly}(n)\}$ and a parameter $\varepsilon>0$, computes a $(1+\varepsilon)$-approximation to the Chamfer distance from $A$ to $B$ under the $\ell_1$ metric, in time

$$\mathcal{O}\big(nd(\log\log n+\log\tfrac{1}{\varepsilon})/\varepsilon^2\big).$$

The algorithm is randomized and is correct with a constant probability.

Thus, we reduce the gap between the upper and lower bounds from $\mathcal{O}(\log(n)/\varepsilon^2)$ to $\mathcal{O}\big((\log\log n+\log\tfrac{1}{\varepsilon})/\varepsilon^2\big)$.

1.1 Our techniques

Our result is obtained by identifying and overcoming the bottlenecks in the previous algorithm [8]. On a high level, that algorithm consists of two steps, described below. For the sake of exposition, in what follows we assume that the target approximation factor 1+ε is some constant.

Outline of the prior algorithm.

In the first step, for each point $a\in A$, the algorithm computes an estimate $\mathcal{D}_a$ of the distance $\mathsf{opt}_a$ from $a$ to its nearest neighbor in $B$. The estimate is $\mathcal{O}(\log n)$-approximate, meaning that $\mathsf{opt}_a\le\mathcal{D}_a\le\mathcal{O}(\log n)\cdot\mathsf{opt}_a$. This is achieved as follows. First, the algorithm imposes $\mathcal{O}(\log n)$ grids of side length $1,2,4,\dots$, and maps each point in $B$ to the corresponding cells. Then, for each $a$, it identifies the finest grid cell containing both $a$ and some point $b\in B$. Finally, it uses the distance between $a$ and $b$ as the estimate $\mathcal{D}_a$. To ensure that this process yields an $\mathcal{O}(\log n)$-approximation, each grid needs to be independently shifted at random. We emphasize that this independence between the shifts of different grids is crucial to ensure the $\mathcal{O}(\log n)$-approximation guarantee; the more natural approach of using "nested grids" does not work. The whole process takes $\mathcal{O}(nd)$ time per grid, or $\mathcal{O}(nd\log n)$ time overall.

In the second step, the algorithm estimates the Chamfer distance via importance sampling. Specifically, the algorithm samples $T$ points from $A$, such that the probability of sampling $a$ is proportional to the estimate $\mathcal{D}_a$. For each sampled point $a$, the distance $\mathsf{opt}_a$ from $a$ to its nearest neighbor in $B$ is computed directly in $\mathcal{O}(nd)$ time. The final estimate of the Chamfer distance is a weighted average of the $T$ values $\mathsf{opt}_a$. It can be shown that if the number of samples $T$ is equal to the distortion $\mathcal{O}(\log n)$ of the estimates $\mathcal{D}_a$, this yields a constant factor approximation to the Chamfer distance from $A$ to $B$. The overall cost of the second step is $\mathcal{O}(Tnd)=\mathcal{O}(nd\log n)$, i.e., asymptotically the same as the cost of the first step.
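To make the importance-sampling step concrete, the following is a minimal Python sketch of the estimator just described (our illustration, not the code of [8]); the crude estimates D and the exact nearest-neighbor subroutine nearest_dist are assumed to be given, and A is a list of points.

import random

def importance_sample_estimate(A, D, nearest_dist, T):
    """Estimate CH(A, B) from crude per-point estimates D[a] >= opt_a (sketch).

    Samples T points with probability proportional to D[a]; averaging
    opt_a / Pr[a] over the samples gives an unbiased estimate of sum_a opt_a.
    """
    total = sum(D[a] for a in A)
    samples = random.choices(A, weights=[D[a] for a in A], k=T)
    est = 0.0
    for a in samples:
        p_a = D[a] / total            # probability of sampling a
        est += nearest_dist(a) / p_a  # opt_a is computed exactly, in O(nd) time
    return est / T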

Intuitions behind the new algorithm.

To improve the running time, we need to reduce the cost of each of the two steps. In what follows we outline the obstacles to this task and how they can be overcome.

Step 1.

The main difficulty in reducing the cost of the first step is that, for each grid, the point-to-cell assignment takes $\mathcal{O}(nd)$ time to compute, so computing the assignments separately for all $\mathcal{O}(\log n)$ grids takes $\mathcal{O}(nd\log n)$ time. And, since each grid is independently translated by a different random vector, the grids are not nested, i.e., a (smaller) cell of side length $2^i$ might contain points from many (larger) cells of side length $2^{i+1}$. As a result, it is unclear how to reuse the point-to-cell assignment in one grid to speed up the assignment in another grid, so one seemingly has to compute them separately, at a total cost of $\mathcal{O}(nd\log n)$.

To overcome this difficulty, we abandon independent shifts and resort to $\mathcal{O}(\log n)$ nested grids. Such grids can be viewed as forming a quadtree with $\mathcal{O}(\log n)$ levels, where any cell $C$ at level $i+1$ (i.e., of side length $2^{i+1}$) is connected to the $2^d$ cells at level $i$ contained in $C$. (Note that the root node of the quadtree has the highest level $\mathcal{O}(\log n)$.) Although using a single quadtree increases the approximation error, we show that using two independently shifted quadtrees retains the $\mathcal{O}(\log n)$ approximation factor. That is, we repeat the process of finding the finest grid cell containing both $a$ and some point from $B$ twice, and return the point in $B$ that is closer to $a$. This amplifies the probability of finding a point from $B$ that is "close" to $a$, which translates into a better approximation factor compared to using a single quadtree.

We still need to show that the point-to-cell assignments can be computed efficiently. To this end, we observe that for each point $a$, its assignment to all $\mathcal{O}(\log n)$ nested grids can be encoded as $d$ words of length $\mathcal{O}(\log n)$, or a $d\times\mathcal{O}(\log n)$ bit matrix $M$. Each row corresponds to one of the $d$ coordinates, and the most significant bit of a row indicates the assignment to cells at the highest level (i.e., cells with the largest side length) with respect to that coordinate. In other words, the most significant bits of all coordinates are packed into the first column, etc. We observe that two points $a$ and $b$ lie in the same cell of side length $2^i$ if and only if their matrices agree in all but the last $i$ columns. If we transpose $M$ and read the resulting matrix in row-major order, then finding a point $b\in B$ in the finest grid cell containing $a$ is equivalent to finding the $b$ that shares the longest common prefix with $a$. We show that this transposition can be done using $\mathcal{O}(d\log\log n)$ simple operations on words per point, yielding $\mathcal{O}(nd\log\log n)$ time overall.
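As a concrete illustration of this encoding (our own sketch, not part of the paper), the following Python snippet builds the transposed, row-major bit string of a point and uses longest common prefixes to locate the finest common grid cell; the shift vector z and the parameter t correspond to the construction of Section 3, and we assume every shifted coordinate fits in t bits.

def cell_key(x, z, t):
    """Encode the nested-grid cells of point x (after shift z) as a bit string.

    Column j of the d-by-t bit matrix holds the (j+1)-th most significant bits
    of all d coordinates of floor(x + z); reading the transposed matrix in
    row-major order gives a string such that two points lie in the same cell
    of side length 2^k iff their strings share a prefix of length >= d*(t-k).
    """
    coords = [int(xi + zi) for xi, zi in zip(x, z)]       # floor(x_i + z_i)
    bits = [format(c, f"0{t}b") for c in coords]          # one t-bit row per coordinate
    return "".join(bits[i][j] for j in range(t) for i in range(len(coords)))

def finest_common_cell(q, P, z, t):
    """Return the point p in P whose key shares the longest common prefix with q's key."""
    kq = cell_key(q, z, t)
    def lcp(s1, s2):
        n = 0
        while n < len(s1) and s1[n] == s2[n]:
            n += 1
        return n
    return max(P, key=lambda p: lcp(kq, cell_key(p, z, t)))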

As an aside, we note that quadtree computation is a common task in many geometric algorithms [12]. Although an $\mathcal{O}(n)$ algorithm for this task was known for constant dimension $d$ [10] (assuming that each coordinate can be represented using $\log n$ bits), to the best of our knowledge our algorithm is the first to achieve $\mathcal{O}(nd\log\log n)$ time for arbitrary dimension.

Step 2.

At this point we have computed estimates $\mathcal{D}_a$ such that $\mathsf{opt}_a\le\mathcal{D}_a\le\mathcal{O}(\log n)\cdot\mathsf{opt}_a$. Given these estimates, importance sampling still requires sampling $\Omega(\log n)$ points. Therefore, we improve the running time by approximating (up to a constant factor) the values $\mathsf{opt}_a$, as opposed to computing them exactly. This is achieved by computing $\mathcal{O}(\log\log n)$ random projections of the input points, which ensures that the distance between any fixed pair of points is well-approximated with probability $1-1/\mathrm{poly}(\log n)$. We then employ these projections in a variant of the tournament algorithm of [16], which computes $\mathcal{O}(1)$-approximate estimates of $\mathsf{opt}_a$ for the $\mathcal{O}(\log n)$ sampled points $a$ in $\mathcal{O}(nd\log\log n)$ time. Since the algorithm of [16] works for the $\ell_2$ metric as opposed to the $\ell_1$ metric, we replace Gaussian random projections with Cauchy random projections, and re-analyze the algorithm.

This completes the overview of an $\mathcal{O}(nd\log\log n)$-time algorithm for estimating the Chamfer distance up to a constant factor. To achieve a $(1+\varepsilon)$-approximation guarantee for any $\varepsilon>0$, we proceed as follows. First, instead of sampling $\mathcal{O}(\log n)$ points as before, we sample $\mathcal{O}(\log(n)/\varepsilon^2)$ points $a$. Then, we use the tournament algorithm to compute $\mathcal{O}(1)$-approximations to $\mathsf{opt}_a$, as before. (Note that we could use the tournament algorithm to report $(1+\varepsilon)$-approximate answers, but then the dependence of the running time on $1/\varepsilon$ would become quartic, as the $1/\varepsilon^2$ term in the sample size would be multiplied by another $1/\varepsilon^2$ term in the bound for the number of projections needed to guarantee that the tournament algorithm returns $(1+\varepsilon)$-approximate answers.) Then we use a technique called rejection sampling to simulate the process of sampling $\mathcal{O}(1/\varepsilon^2)$ points $a$ with probability proportional to $\Theta(\mathsf{opt}_a)$. For each such point, we compute $\mathsf{opt}_a$ exactly in $\mathcal{O}(nd)$ time. Finally, we use the $\mathcal{O}(1/\varepsilon^2)$ sampled points $a$ and the exact values of $\mathsf{opt}_a$ in importance sampling to estimate the Chamfer distance up to a factor of $1+\varepsilon$.

This concludes the overview of our algorithm for the Chamfer distance under the $\ell_1$ metric. We remark that [8] also extends their result from the $\ell_1$ metric to the $\ell_2$ metric by first embedding points from $\ell_2$ to $\ell_1$ using random projections. This takes $\mathcal{O}(nd\log n)$ time, which exceeds the runtime of our algorithm, eliminating our improvement. However, a faster embedding method would yield an improved runtime for the Chamfer distance under the $\ell_2$ metric. We leave finding a faster embedding algorithm as an open problem.

2 Preliminaries

In this paper, we consider the regime where the approximation parameter satisfies $\varepsilon\ge\frac{\log^2 n}{\sqrt{n}}$. Note that otherwise, an $\mathcal{O}(nd/\varepsilon^2)$ time bound would be close to the runtime of a naive exact computation.

In the proof of Theorem 1, we assume a Word RAM model where both the input coordinates and the memory/processor words have $\mathcal{O}(\log n)$ bits. This model is particularly important in the procedures Concatenate and Transpose, where we rely on the fact that we can shift bits and perform bit-wise AND, ADD and OR operations on words in constant time.

Notation.

For any integer $n\ge 1$, we use $[n]$ to denote the set of all integers from 1 to $n$. For any two real numbers $a,b$ such that $a\le b$, we use $[a,b]$ to denote the set of all reals from $a$ to $b$. Let $d$ be the dimension of the points.

For any $q\in\mathbb{R}^d$ and any subset $P\subseteq\mathbb{R}^d$, define $\mathsf{opt}_q^P:=\min_{p\in P}\|q-p\|_1$. We will omit the superscript $P$ when it is clear from the context.

3 Quadtree

In Figure 1, we show an algorithm QuadTree that outputs crude estimates of the nearest-neighbor distances simultaneously for a set of points. The estimation guarantee is the same as that of the CrudeNN algorithm in [8]. While [8] achieves this using a quadtree with $\log n$ independent levels, which naturally introduces a $\log n$ runtime overhead, we show that two compressed quadtrees with dependent levels suffice. Our construction of compressed quadtrees is a generalization of [10] to high dimensions.

Input: Two size-$n$ subsets $Q:=\{q_i\}_{i\in[n]}$ and $P:=\{p_i\}_{i\in[n]}$ of the metric space $(\mathbb{R}^d,\ell_1)$, such that $Q,P\subseteq[0,\alpha]^d$ for some bound $\alpha=\mathrm{poly}(n)$.

Output: A set of $n$ values $\{\mathcal{D}_i\}_{i\in[n]}$, such that every $\mathcal{D}_i$ satisfies $\mathcal{D}_i\ge\mathsf{opt}_{q_i}^P$.

  1. Let $t=\lceil\log\alpha\rceil+1$. Sample two uniformly random points $z,z'\in[0,2^{t-1}]^d$. For any point $x\in[0,\alpha]^d$, define

    $h(x):=(\lfloor x_1+z_1\rfloor,\lfloor x_2+z_2\rfloor,\dots,\lfloor x_d+z_d\rfloor),$
    $h'(x):=(\lfloor x_1+z'_1\rfloor,\lfloor x_2+z'_2\rfloor,\dots,\lfloor x_d+z'_d\rfloor),$

    where $x_i,z_i,z'_i$ are the $i$-th coordinates of $x,z,z'$, respectively.

  2. For each $x\in Q\cup P$:

    • Compute $h(x)$ and write each element of $h(x)$ as a $t$-bit binary string. Then $h(x)$ can be viewed as a $d$-by-$t$ binary matrix stored in row-major order, whose $(i,j)$-th entry is the $j$-th most significant bit of the $i$-th element of $h(x)$. Transpose this matrix and concatenate the rows of the transpose. Denote the resulting binary string as $\hat{h}(x)$.

    • Similarly, compute $\hat{h}'(x)$ from $h'(x)$.

  3. Use $\hat{h}(x)$ as keys to sort all $x\in Q\cup P$. Also, use $\hat{h}'(x)$ as keys to sort all $x\in Q\cup P$.

  4. For each $q_i\in Q$:

    • Use the sort to find a $p\in P$ that maximizes the length $l$ of the longest common prefix of $\hat{h}(q_i)$ and $\hat{h}(p)$. Similarly, find a $p'\in P$ that maximizes the length $l'$ of the longest common prefix of $\hat{h}'(q_i)$ and $\hat{h}'(p')$.

    • If $l\ge l'$ then output $\mathcal{D}_i:=\|q_i-p\|_1$; otherwise, output $\mathcal{D}_i:=\|q_i-p'\|_1$.

Figure 1: The QuadTree Algorithm.

Correctness.

For any $x\in[0,\alpha]^d$ and any integer $k$ such that $0\le k\le t$, let $h_k(x):=\big(\big\lfloor\tfrac{x_1+z_1}{2^k}\big\rfloor,\big\lfloor\tfrac{x_2+z_2}{2^k}\big\rfloor,\dots,\big\lfloor\tfrac{x_d+z_d}{2^k}\big\rfloor\big)$, where $z$ is the random point drawn on Line 1 in Figure 1. Observe that $h_k(x)$ is related to the prefixes of $\hat{h}(x)$.

Claim 2.

Let $q,p\in[0,\alpha]^d$ be arbitrary. For any integer $k$ such that $0\le k\le t$, $h_k(q)=h_k(p)$ if and only if $\hat{h}(q)$ and $\hat{h}(p)$ share a common prefix of length at least $d(t-k)$.

Proof.

If $\hat{h}(q)$ and $\hat{h}(p)$ share a common prefix of length at least $d(t-k)$, then in the hashes $h(q)$ and $h(p)$, the first $(t-k)$ bits of all $d$ coordinates are the same. $h_k(q)$ and $h_k(p)$ consist of exactly these bits, thus $h_k(q)=h_k(p)$. The reverse direction holds symmetrically.

Claim 2 justifies using the $h_k(\cdot)$'s as an alternative representation of the binary string $\hat{h}(\cdot)$. [8] shows that $h_k$ has a locality-sensitive property, which will help us bound the distances between points.

Claim 3 (Lemma A.4 of [8]).

For any fixed integer $k$ such that $0\le k\le t$ and any two points $q,p\in[0,\alpha]^d$,

$$\Pr[h_k(q)\ne h_k(p)]\le\frac{\|q-p\|_1}{2^k},$$
$$\Pr[h_k(q)=h_k(p)]\le\exp\Big(-\frac{\|q-p\|_1}{2^k}\Big),$$

where the probabilities are over the random choice of z.

We now show that if two points have the same hash $h_k$, then their distance is likely not too much greater than $2^k$. A straightforward bound follows from the diameter of the $d$-dimensional cube.

Lemma 4.

For all $q\in Q$, $p\in P$, and $0\le k\le t$, the following always holds: if $h_k(q)=h_k(p)$ then $\|q-p\|_1\le 2^k d$.

Proof.

Observe that $h_k(q)=h_k(p)$ only if $q+z$ and $p+z$ are in the same $d$-dimensional cube of side length $2^k$. The diameter of such a cube under the $\ell_1$ norm is $2^k d$. Therefore, for any $q,p$ and $0\le k\le t$, $\|q-p\|_1\le 2^k d$ is a necessary condition for $h_k(q)=h_k(p)$ to hold.

Moreover, using Claim 3, we can bound this ratio with respect to n.

Lemma 5.

With probability at least $1-\mathcal{O}(1/n)$, the following holds simultaneously for all $q\in Q$, $p\in P$, and $0\le k\le t$: if $h_k(q)=h_k(p)$ then $\|q-p\|_1\le 2^k\cdot 3\log n$.

Proof.

We show the contrapositive: with probability $1-\mathcal{O}(1/n)$, $k<\log\big(\frac{\|q-p\|_1}{3\log n}\big)$ implies $h_k(q)\ne h_k(p)$ simultaneously for all $q\in Q$ and $p\in P$. It suffices to argue that for any fixed pair of points $q\in Q$ and $p\in P$, this holds with probability at least $1-\mathcal{O}(1/n^3)$. The lemma then follows by a union bound over the $n^2$ pairs.

Let $k_0$ denote the largest integer $k$ that satisfies $k<\log\big(\frac{\|q-p\|_1}{3\log n}\big)$. Then we have

$$\Pr[h_{k_0}(q)=h_{k_0}(p)]\le\exp\Big(-\frac{\|q-p\|_1}{2^{k_0}}\Big)\le\exp(-3\log n),$$

i.e., with probability at least $1-\mathcal{O}(1/n^3)$, $h_{k_0}(q)\ne h_{k_0}(p)$. Also, it is easy to see that if $h_{k_0}(q)\ne h_{k_0}(p)$, then for all $k\le k_0$, $h_k(q)\ne h_k(p)$, concluding the claim.

Symmetrically, if we define $h'_k(x):=\big(\big\lfloor\tfrac{x_1+z'_1}{2^k}\big\rfloor,\big\lfloor\tfrac{x_2+z'_2}{2^k}\big\rfloor,\dots,\big\lfloor\tfrac{x_d+z'_d}{2^k}\big\rfloor\big)$, the claims and lemmas above also hold for $h'_k$. Using these, we show that the expected outputs of the QuadTree algorithm are (crude) estimates of the nearest neighbor distances.

Theorem 6.

With probability at least $1-\mathcal{O}(1/n)$, it holds for all $q_i\in Q$ that $\mathbb{E}[\mathcal{D}_i]\le 5\min(d,3\log n)\cdot\mathsf{opt}_{q_i}^P$.

Proof.

We assume the success case of Lemma 5 for both $h_k$ and $h'_k$. Fix an arbitrary $q_i\in Q$. Recall that the QuadTree algorithm finds $p,p'\in P$ for $q_i$, which are associated with longest common prefixes of lengths $l,l'$, respectively. For each integer $k$ with $0\le k\le t$, let $\mathcal{E}_k$ denote the event $d(t-k)\le\max(l,l')<d(t-k+1)$. Observe from Claim 2 that when $\mathcal{E}_k$ happens,

  • either $l\ge l'$ and $h_k(q_i)=h_k(p)$,

  • or $l'>l$ and $h'_k(q_i)=h'_k(p')$.

In both cases, we know from Lemmas 4 and 5 that $\mathcal{D}_i\le 2^k\min(d,3\log n)$.

Let $D:=\min(d,3\log n)$, $p^*:=\arg\min_{p\in P}\|q_i-p\|_1$, and $k^*:=\log(\mathsf{opt}_{q_i})$. We have

$$\mathbb{E}[\mathcal{D}_i]\ \le\ \sum_{0\le k\le t}\Pr[\mathcal{E}_k]\,(2^kD)\ \le\ D\Bigg(\sum_{0\le k\le k^*}\Pr[\mathcal{E}_k]\,\mathsf{opt}_{q_i}\ +\ \sum_{k^*<k\le t}\Pr\Big[h_{k-1}(q_i)\ne h_{k-1}(p^*)\ \wedge\ h'_{k-1}(q_i)\ne h'_{k-1}(p^*)\Big]\,2^k\Bigg)$$

where the second inequality holds because $\mathcal{E}_k$ implies that neither the pair $\{\hat{h}(q_i),\hat{h}(p^*)\}$ nor the pair $\{\hat{h}'(q_i),\hat{h}'(p^*)\}$ shares a common prefix of length $d(t-k+1)$. Thus $h_{k-1}(q_i)\ne h_{k-1}(p^*)$ and $h'_{k-1}(q_i)\ne h'_{k-1}(p^*)$ by Claim 2.

Moreover, the events $\mathcal{E}_k$ for all $k$ form a partition of a sample space, so $\sum_k\Pr[\mathcal{E}_k]\le 1$. Applying this and the locality-sensitive properties of $h_{k-1}$ and $h'_{k-1}$, we get

$$\mathbb{E}[\mathcal{D}_i]\ \le\ D\Bigg(\mathsf{opt}_{q_i}+\sum_{k^*<k\le t}\Big(\frac{\mathsf{opt}_{q_i}}{2^{k-1}}\Big)^2 2^k\Bigg)\ \le\ D\Bigg(\mathsf{opt}_{q_i}+2\,\mathsf{opt}_{q_i}\sum_{k^*<k\le t}\frac{\mathsf{opt}_{q_i}}{2^{k-1}}\Bigg)\ \le\ 5D\,\mathsf{opt}_{q_i}$$

Runtime analysis.

Lemma 7 (Line 2).

For any $x\in Q\cup P$, $\hat{h}(x)$ (and $\hat{h}'(x)$) can be computed in $\mathcal{O}(d\log\log n)$ time.

Proof.

We assume without loss of generality that both $d$ and $t$ are powers of 2. Computing the binary matrix representation of $h(x)$ can be done in $\mathcal{O}(d)$ time since $t=\mathcal{O}(\log n)$. Given this, we compute $\hat{h}(x)$ as follows.

Case 1: $d\ge t$.

We partition the matrix into $t$-by-$t$ square submatrices, denoted by

$$\mathsf{Matrix}(h(x)):=\begin{bmatrix}M_1\\ M_2\\ \vdots\\ M_{d/t}\end{bmatrix},$$

where each $M_i$ has $t$ rows and $t$ columns, and the full matrix has $d$ rows.

For each $i\in[d/t]$, we use a recursive subroutine Transpose$(M_i,t)$ to compute $M_i^\top$. See Figure 2 for a pictorial illustration of the Transpose algorithm.

Figure 2: A pictorial example for an 8-by-8 square matrix and $w=4$. The left figure (a) shows how the Transpose algorithm handles the rows of $\tilde{M}$ in Lines 2a and 3a. The right figure (b) illustrates the transpose outcome.

Input: An $I$-by-$J$ bit matrix $M$ where $I\le J$ are powers of 2. An integer $w$ that is a power of 2 and $2\le w\le I$.

Output: An $I$-by-$J$ matrix $M'$ such that if it is partitioned into $w$-by-$w$ square submatrices, then each submatrix is the transpose of the corresponding submatrix of $M$ at the same coordinates.

  1. Let

    $$\tilde{M}=\begin{cases}M & \text{if }w=2\\ \text{Transpose}(M,w/2) & \text{otherwise}\end{cases}$$

    be zero-indexed, and let $\tilde{M}[i,j]$ denote its $(i,j)$-th entry.

  2. For each integer $i$ such that $0\le i<I$:

    1. (a)

      Compute a $J$-bit binary string $b_i$ such that for $j:0\le j<J$, its $j$-th bit is $b_i[j]=\begin{cases}\tilde{M}[i,j] & \text{if }(j\bmod w)<w/2\\ 0 & \text{otherwise.}\end{cases}$

      Also, compute a string $\overline{b_i}$ with $\overline{b_i}[j]=\begin{cases}0 & \text{if }(j\bmod w)<w/2\\ \tilde{M}[i,j] & \text{otherwise.}\end{cases}$

  3. Define an $I$-by-$J$ matrix $M'$, such that for each integer $0\le i<I$:

    1. (a)

      Let the $i$-th row of $M'$ be $\begin{cases}b_i+\big(b_{i+(w/2)}\gg(w/2)\big) & \text{if }(i\bmod w)<w/2\\ \overline{b_i}+\big(\overline{b_{i-(w/2)}}\ll(w/2)\big) & \text{if }(i\bmod w)\ge w/2,\end{cases}$

      where $\gg(w/2)$ (resp. $\ll(w/2)$) denotes the operation of shifting a string to the right (resp. left) by $w/2$ bits.

  4. Output $M'$.

The correctness of the Transpose algorithm can be shown by induction on (the base-2 logarithm of) $w$. When $I=J=t=\mathcal{O}(\log n)$, Lines 2a and 3a can be done using a constant number of operations on words. Thus we get the following runtime.

Claim 8.

Assuming $t=\mathcal{O}(\log n)$, the procedure Transpose$(M_i,t)$ runs in $\mathcal{O}(t\log t)$ time.

We execute the Transpose algorithm for all $i$, which takes $\mathcal{O}((d/t)\cdot t\log t)=\mathcal{O}(d\log\log n)$ time. Then we can write down $\hat{h}(x)$ by concatenating the rows of the $M_i^\top$'s, which takes $\mathcal{O}(t\cdot(d/t))$ time.
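For concreteness, the following Python sketch (ours, not the paper's implementation) mirrors the recursive structure of Transpose: each row is stored as an arbitrary-precision integer, so a mask-and-shift on a row stands in for the constant number of word operations used in the Word RAM analysis.

def transpose_blocks(rows, J, w):
    """Transpose every w-by-w block of an I-by-J bit matrix.

    rows[i] is an integer whose bit (J-1-j) is the matrix entry (i, j).
    """
    if w == 1:
        return list(rows)                      # 1-by-1 blocks are already transposed
    half = w // 2
    rows = transpose_blocks(rows, J, half)     # recurse: all (w/2)-blocks transposed
    # Mask selecting, in every block of w columns, the left half of the columns.
    left_block = ((1 << half) - 1) << half
    left_mask = 0
    for start in range(0, J, w):
        left_mask |= left_block << (J - w - start)
    right_mask = ((1 << J) - 1) ^ left_mask
    out = [0] * len(rows)
    for i in range(len(rows)):
        if (i % w) < half:                     # top half: pull left halves of row i + w/2
            out[i] = (rows[i] & left_mask) | ((rows[i + half] & left_mask) >> half)
        else:                                  # bottom half: pull right halves of row i - w/2
            out[i] = (rows[i] & right_mask) | ((rows[i - half] & right_mask) << half)
    return out

# Example: transpose a single 4-by-4 block.
M = [0b1000, 0b1100, 0b1110, 0b1111]
print([format(r, "04b") for r in transpose_blocks(M, 4, 4)])   # ['1111', '0111', '0011', '0001']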

Case 2: 𝒕>𝒅.

We again partition the matrix into $d$-by-$d$ square submatrices. In this case, we obtain $\mathsf{Matrix}(h(x))=M:=[M_1\ M_2\ \cdots\ M_{t/d}]$, a matrix with $d$ rows and $t$ columns whose submatrices are arranged horizontally.

Claim 9.

Given $t=\mathcal{O}(\log n)$, Transpose$(M,d)$ runs in $\mathcal{O}(d\log d)\le\mathcal{O}(d\log\log n)$ time.

We execute Transpose$(M,d)$ and obtain $M'=[M_1^\top\ M_2^\top\ \cdots\ M_{t/d}^\top]$. In principle, to obtain $\hat{h}(x)$, we just concatenate the $d\cdot(t/d)$ rows of all the $M_i^\top$'s. However, when $t\gg d$, this takes longer than $\mathcal{O}(d\log\log n)$ time. We instead use another recursive subroutine Concatenate$(M',d)$. An example of the Concatenate algorithm is given in Figure 3.

Figure 3: A pictorial example that shows the behavior of Lines 2a and 3a of the Concatenate algorithm on a 4-by-$4w$ matrix.

Input: An $I$-by-$J$ bit matrix $M$ where $I\le J$ are powers of 2. An integer $w$ that is a power of 2 and $I\le w\le J$.

Output: An $IJ$-bit string $B$ such that if it is partitioned into $w$-bit blocks, then the $u$-th block (zero-indexed from left to right) consists of the bits on the $(u\bmod I)$-th row of $M$ from column $w\lfloor u/I\rfloor$ to column $w\lfloor u/I\rfloor+w$.

  1. If $I=1$ then output $B:=M$ (the single row of $M$, read as a bit string).

  2. For each integer $i$ such that $0\le i<I$:

    1. (a)

      Partition the $i$-th row of $M$ into $w$-bit blocks, denoted as $[b_{i,1}\ b_{i,2}\ \cdots\ b_{i,J/w}]$. Compute a $2J$-bit string $b'_i=[b_{i,1}\ 0^w\ b_{i,2}\ 0^w\ \cdots\ b_{i,J/w}\ 0^w]$, where $0^w$ is a $w$-bit all-zero string.

  3. Define an $I/2$-by-$2J$ matrix $M'$, such that for each integer $0\le i<I/2$:

    1. (a)

      Let the $i$-th row of $M'$ be $b'_{2i}+\big(b'_{2i+1}\gg w\big)$, where $\gg w$ is the operation of shifting a string to the right by $w$ bits.

  4. Output Concatenate$(M',2w)$.

The correctness of the Concatenate algorithm can again be observed by inducting on the logarithm of $w$. Lines 2a and 3a can be done using $\mathcal{O}\big((J/w)\cdot\lceil w/\log n\rceil\big)$ operations on words, and both lines are repeated $I$ times in each recursive call. Therefore, the total runtime is

$$\mathcal{O}\left(\sum_{s=1}^{\log d}2^s\cdot\frac{2^{\log(d)-s}\,t}{2^{\log(d)-s}\,d}\cdot\left\lceil\frac{2^{\log(d)-s}\,d}{\log n}\right\rceil\right)=\mathcal{O}\left(\sum_{s=1}^{\log d}2^s\cdot\max\left(\frac{t}{d},\ \frac{t}{d}\cdot\frac{2^{\log(d)-s}\,d}{\log n}\right)\right)=\mathcal{O}\left(\log d\cdot\max\left(t,\ \frac{td}{\log n}\right)\right)=\mathcal{O}(d\log\log n).$$

Theorem 10.

The QuadTree algorithm runs in 𝒪(ndloglogn) time.

Proof.

Computing $h(x)$ for all $x$ takes $\mathcal{O}(nd)$ time. Then computing the $\hat{h}(x)$'s takes $\mathcal{O}(nd\log\log n)$ time. After that, sorting $\mathcal{O}(n)$-many $\mathcal{O}(d\log n)$-bit strings can be done in $\mathcal{O}(nd)$ time using radix sort. Finally, to find the $p\in P$ with the longest common prefix for every $q\in Q$, we go through the sorted list and link each $q\in Q$ with its adjacent $p\in P$, which takes $\mathcal{O}(n)$ total time. The above time bounds also hold for the $\hat{h}'(x)$'s, resulting in $\mathcal{O}(nd\log\log n)$ time in total.

4 Tournament

In this section, we compute 2-approximations of the nearest neighbor distances for logarithmically many queries. We do so using a depth-2 tournament. In one branch of the tournament tree, we sample a small set $\bar{S}$ of input points uniformly at random. In the other branch, we partition the input points into random groups, project them to a lower-dimensional space, and then collect the nearest neighbor in the projected space in every group as a set $\tilde{S}$. The final output of the tournament is the nearest neighbor among the points in $\bar{S}\cup\tilde{S}$ in the original space. Intuitively, if there are many (2-approximate) near neighbors, then a random subset $\bar{S}$ should contain one of them. And when there are few, these neighbors are likely to be assigned to different random groups, in which case the true nearest neighbor should be collected into $\tilde{S}$.

Notation.

We use the same notation $D:=\min(d,3\log n)$ as in the previous section. For any finite subset $T\subset\mathbb{R}$, let $\mathsf{med}\,T$ denote the median of $T$.

When working under the $\ell_1$ norm, we use Cauchy random variables to project points. We first recall a standard bound on the median of projections, which will be useful for our analysis. (The following lemma essentially follows from Claim 2 and Lemma 2 in [13]; we reprove it in the appendix for completeness.)

Lemma 11.

Let $x,y\in\mathbb{R}^d$ and $0<c<1/2$. Sample $r$ random vectors $v_1,v_2,\dots,v_r\sim(\mathsf{Cauchy}(0,1))^d$. With probability at least $1-2e^{-rc^2/50}$, $\mathsf{med}\{|v_i\cdot(x-y)|:i\in[r]\}\in(1\pm c)\|x-y\|_1$.
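As a quick numerical illustration of Lemma 11 (our own snippet, not part of the paper), the median of $|v_i\cdot(x-y)|$ over i.i.d. Cauchy projection vectors concentrates around the $\ell_1$ distance:

import numpy as np

rng = np.random.default_rng(0)
d, r = 200, 800
x, y = rng.uniform(0, 10, d), rng.uniform(0, 10, d)
V = rng.standard_cauchy((r, d))          # r projection vectors with i.i.d. Cauchy(0,1) entries
est = np.median(np.abs(V @ (x - y)))     # median of projected distances
print(est, np.abs(x - y).sum())          # estimate vs. the true l1 distance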

In Figure 4, we describe how to construct a data structure to find 2-approximate nearest neighbors. The construction borrows ideas from the second algorithm of [16], but uses a tournament of depth 2 instead of $\mathcal{O}(\log n)$.

Input: A set of $t$ queries $\{q_i\}_{i\in[t]}$ and a set of $n$ points $P$, both subsets of the metric space $(\mathbb{R}^d,\ell_1)$.

Output: A set of $t$ values $\{\mathcal{D}_i\}_{i\in[t]}$, such that every $\mathcal{D}_i$ satisfies $\mathcal{D}_i\ge\mathsf{opt}_{q_i}^P$.

Building the Data Structure.

  1. Let $r\ge 800(2\log t+\log\log n)$.

  2. For each $j\in[r]$, draw $v_j\sim(\mathsf{Cauchy}(0,1))^d$, compute $v_j\cdot p$ for all points $p\in P$, and store all $v_j$ and $v_j\cdot p$.

  3. Randomly partition $P$ into $n/\log n$ subsets $P_1,P_2,\dots,P_{n/\log n}$, each of size $\log n$.

Processing the Queries.

For each query $q:=q_i$ for $i\in[t]$:

  1. Compute $v_j\cdot q$ for all $j\in[r]$.

  2. Let $\tilde{S}$ be an empty set. For each $k\in[n/\log n]$:

    • Compute $\mathsf{med}_p:=\mathsf{med}\{|v_j\cdot(q-p)|:j\in[r]\}$ for every $p\in P_k$.

    • Find $p^*:=\arg\min_{p\in P_k}\{\mathsf{med}_p\}$ and add it into $\tilde{S}$.

  3. Find $\tilde{p}=\arg\min_{p\in\tilde{S}}\|q-p\|_1$ by computing and comparing all exact distances $\|q-p\|_1$ for $p\in\tilde{S}$.

  4. Let $\bar{S}$ be a set of $w\ge 90t\log t\log n$ samples drawn uniformly at random from $P$. Find a point $\bar{p}=\arg\min_{p\in\bar{S}}\|q-p\|_1$ by computing and comparing all exact distances $\|q-p\|_1$ for $p\in\bar{S}$.

  5. Output $\mathcal{D}_i:=\min(\|q-\tilde{p}\|_1,\|q-\bar{p}\|_1)$.

Figure 4: The Tournament Algorithm.
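The query phase of Figure 4 can be summarized by the following Python sketch (our illustration, with minor simplifications); here V is the r-by-d matrix of Cauchy projection vectors, VP the stored projections of P, groups the random partition from the building phase, and num_uniform plays the role of w.

import numpy as np

def tournament_query(q, P, V, VP, groups, num_uniform, rng):
    """Return a 2-approximation of the l1 nearest-neighbor distance of q in P (sketch)."""
    vq = V @ q                                   # project the query, O(dr) time
    l1 = lambda p: np.abs(q - p).sum()

    # Branch 1: in each group, keep the point minimizing the median projected distance.
    S_tilde = []
    for idx in groups:
        meds = np.median(np.abs(VP[:, idx] - vq[:, None]), axis=0)
        S_tilde.append(idx[np.argmin(meds)])
    p_tilde = min((P[i] for i in S_tilde), key=l1)

    # Branch 2: uniformly sampled candidates, compared by exact l1 distance.
    S_bar = rng.integers(0, len(P), size=num_uniform)
    p_bar = min((P[i] for i in S_bar), key=l1)

    return min(l1(p_tilde), l1(p_bar))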

Correctness.

Fix a query $q:=q_i$. Let $S$ denote the set of all 2-approximate nearest neighbors of $q$, i.e., $S:=\{p\in P:\|q-p\|_1\le 2\,\mathsf{opt}_q\}$. We prove the correctness of the algorithm by casing on the size of $S$.

Lemma 12 (Case 1).

If $|S|\ge\frac{n}{30t\log n}$ then with probability at least $1-\frac{1}{10t}$ we have $\bar{p}\in S$.

Proof.

With probability at least $\frac{1}{30t\log n}$, a random sample from $P$ is in $S$. Therefore, for $\bar{S}$ containing $w$ independent samples, at least one of them is in $S$ with probability at least $1-\big(1-\frac{1}{30t\log n}\big)^w\ge 1-e^{-w/(30t\log n)}$. Moreover, when this happens, it must be that $\bar{p}\in S$. Setting $w\ge 90t\log t\log n$ gives the desired probability.

Lemma 13 (Case 2).

If $|S|<\frac{n}{30t\log n}$ then with probability at least $1-\frac{1}{10t}$, $\tilde{p}\in S$.

Let $p^*\in P$ denote a nearest neighbor of $q$, i.e., $\|q-p^*\|_1=\mathsf{opt}_q$. To prove Lemma 13, we first make the following observation:

Lemma 14.

Let $P'$ be an arbitrary subset of $P\setminus S$. The probability that there exists $p\in P'$ such that $\mathsf{med}_p\le\mathsf{med}_{p^*}$, where $\mathsf{med}_p:=\mathsf{med}\{|v_j\cdot(q-p)|:j\in[r]\}$ and $\mathsf{med}_{p^*}:=\mathsf{med}\{|v_j\cdot(q-p^*)|:j\in[r]\}$, is at most $\frac{2(|P'|+1)}{t^2\log n}$.

Proof.

From $P'\subseteq(P\setminus S)$ we know that $\|q-p\|_1>2\,\|q-p^*\|_1$ for any $p\in P'$. Therefore, if $\mathsf{med}_p\le\mathsf{med}_{p^*}$ then either $\mathsf{med}_p\notin(1\pm\frac{1}{4})\|q-p\|_1$ or $\mathsf{med}_{p^*}\notin(1\pm\frac{1}{4})\|q-p^*\|_1$. Applying Lemma 11 with $c=\frac{1}{4}$ and $r\ge 800(2\log t+\log\log n)$, and a union bound, we get that

$$\Pr[\exists p\in P':\mathsf{med}_p\le\mathsf{med}_{p^*}]\ \le\ \Pr\big[\exists p\in P':\mathsf{med}_p\notin(1\pm\tfrac{1}{4})\|q-p\|_1\big]+\Pr\big[\mathsf{med}_{p^*}\notin(1\pm\tfrac{1}{4})\|q-p^*\|_1\big]\ \le\ (|P'|+1)\cdot 2e^{-(2\log t+\log\log n)}\ =\ \frac{2(|P'|+1)}{t^2\log n}.$$

In Line 3 of the data structure building procedure, the point $p^*$ is assigned to one of the subsets $P'\in\{P_1,P_2,\dots,P_{n/\log n}\}$. If $|S|<\frac{n}{30t\log n}$, then one can show that $p^*$ is likely the only 2-approximate nearest neighbor in $P'$. Conditioned on this, we can use Lemma 14 to show that $p^*$ is added into $\tilde{S}$ with high probability.

Lemma 15.

If $|S|<\frac{n}{30t\log n}$, then with probability at least $1-\frac{1}{10t}$, $p^*$ is added into $\tilde{S}$ on Line 2 when processing the query $q$.

Proof.

The set $P'$ contains $\log n$ points. Since we randomly partition $P$ into $P_1,\dots,P_{n/\log n}$, $P'\setminus\{p^*\}$ is a uniformly random subset of $P$. When $|S|<\frac{n}{30t\log n}$, $\Pr[(P'\setminus\{p^*\})\cap S=\emptyset]\ge 1-\frac{1}{20t}$. Conditioned on this event, we have

$$\Pr\big[\exists p\in(P'\setminus\{p^*\}):\mathsf{med}_p\le\mathsf{med}_{p^*}\big]\le\frac{2|P'|}{t^2\log n}\le\frac{1}{20t}$$

by Lemma 14, as long as $t\ge 40$. Therefore, with probability at least $1-\frac{1}{10t}$, $p^*$ has the smallest median of projected distances to $q$ within its group, and thus must be added to $\tilde{S}$.

Lemma 13 is a direct corollary of Lemma 15.

Proof (of Lemma 13).

$p^*\in\tilde{S}$ implies that if we compute $\tilde{p}=\arg\min_{p\in\tilde{S}}\|q-p\|_1$, we are guaranteed to find a nearest neighbor of $q$, which is clearly in $S$.

Combining Lemma 12 and 13 we get the following correctness guarantee:

Theorem 16.

Given $t$ queries $\{q_i\}_{i\in[t]}$, with probability at least $9/10$, the Tournament algorithm outputs 2-approximate nearest neighbor distances simultaneously for all $t$ queries.

Finally, we state the runtime guarantee as follows:

Theorem 17.

The Tournament algorithm runs in $\mathcal{O}\big(n(d+t)(\log t+\log\log n)+dt^2\log t\log n\big)$ time.

Proof.

For preprocessing, the algorithm projects all points in $P$ using $r$ projections, which takes $\mathcal{O}(ndr)$ time. To process a query $q$, we first take $\mathcal{O}(dr)$ time to project $q$. We then count the number of comparisons we make to find the minimums of the medians, which is $\mathcal{O}((n/\log n)\cdot\log n\cdot r)$ using a linear-time median selection algorithm [9]. Each comparison can be done in $\mathcal{O}(1)$ time given that $v_j\cdot p$ and $v_j\cdot q$ for all $j\in[r]$ and $p\in P$ are stored. Finally, we do a linear scan over $\tilde{S}$ and $\bar{S}$, which takes $\mathcal{O}(d(n/\log n+w))$ time.

We plug in r=𝒪(logt+loglogn) and w=𝒪(tlogtlogn). For t queries, the total runtime is 𝒪(n(d+t)(logt+loglogn)+dt2logtlogn).

For our purpose of estimating the Chamfer distance, we will apply the Tournament algorithm with a number of queries $t=\Theta(D/\varepsilon^2)$ for $D=\min(d,3\log n)$ and some $\varepsilon>0$ satisfying $\varepsilon^{-2}=\mathcal{O}(n/\log^4 n)$. Under this setting, the runtime is dominated by the first additive term of Theorem 17, which is at most $\mathcal{O}\big(nd(\log\log n+\log\tfrac{1}{\varepsilon})/\varepsilon^2\big)$.

5 Rejection Sampling

Notation.

All occurrences of $\mathsf{opt}$ in this section are with respect to the set $B$. Let $\varepsilon>0$ be our target approximation factor. We call a distribution $\mathcal{P}$ an $f$-Chamfer distribution for some $f=f(n,d,\varepsilon)$ if it is supported on $A$ and for every $a\in A$,

$$\mathcal{P}(a)\ge\frac{f\cdot\mathsf{opt}_a}{\mathsf{CH}(A,B)},\qquad\text{where we denote }\mathcal{P}(a):=\Pr_{x\sim\mathcal{P}}[x=a].$$

We first show a general bound for estimating the Chamfer distance using samples from a Chamfer distribution. This follows from a standard analysis of importance sampling.

Lemma 18.

Let $X:=\{x_i\}_{i\in[t]}$ be a set of $t$ samples drawn from an $f$-Chamfer distribution $\mathcal{P}$. Fix $h=h(n,d,\varepsilon)\ge 1$. Given an arbitrary $\widetilde{\mathsf{opt}}_{x_i}$ for every $x_i$ that satisfies $\mathsf{opt}_{x_i}\le\widetilde{\mathsf{opt}}_{x_i}\le h\cdot\mathsf{opt}_{x_i}$, then for any $0<\kappa<1$,

$$\Pr\big[\widetilde{\mathsf{CH}}(A,B)\le(1-\kappa)\,\mathsf{CH}(A,B)\big]+\Pr\big[\widetilde{\mathsf{CH}}(A,B)\ge(1+\kappa)h\,\mathsf{CH}(A,B)\big]\le\frac{h^2f^{-1}-1}{t\kappa^2},$$

where $\widetilde{\mathsf{CH}}(A,B):=\frac{1}{t}\sum_{i\in[t]}\widetilde{\mathsf{opt}}_{x_i}/\mathcal{P}(x_i)$.

Proof.

For the purpose of analysis, assume that we additionally have arbitrary $\widetilde{\mathsf{opt}}_a$ for $a\in(A\setminus X)$ that also satisfy $\mathsf{opt}_a\le\widetilde{\mathsf{opt}}_a\le h\,\mathsf{opt}_a$. By linearity,

$$\mathbb{E}[\widetilde{\mathsf{CH}}(A,B)]=\frac{1}{t}\sum_{i\in[t]}\mathbb{E}\big[\widetilde{\mathsf{opt}}_{x_i}/\mathcal{P}(x_i)\big]=\sum_{a\in A}\mathcal{P}(a)\cdot\frac{\widetilde{\mathsf{opt}}_a}{\mathcal{P}(a)}\in\big[\mathsf{CH}(A,B),\ h\,\mathsf{CH}(A,B)\big].$$

We also bound the variance

$$\mathrm{Var}[\widetilde{\mathsf{CH}}(A,B)]\ \le\ \frac{\mathbb{E}\big[\widetilde{\mathsf{opt}}_{x_1}^2/\mathcal{P}(x_1)^2\big]-\mathsf{CH}(A,B)^2}{t}\ \le\ \frac{1}{t}\Big(\sum_{a\in A}\frac{\widetilde{\mathsf{opt}}_a^2}{\mathcal{P}(a)}-\mathsf{CH}(A,B)^2\Big)\ \le\ \frac{1}{t}\Big(\frac{h}{f}\,\mathsf{CH}(A,B)\sum_{a\in A}\widetilde{\mathsf{opt}}_a-\mathsf{CH}(A,B)^2\Big)\ \le\ \frac{1}{t}\,\mathsf{CH}(A,B)^2\big(h^2f^{-1}-1\big)$$

where the third inequality follows from $\frac{1}{\mathcal{P}(a)}\le\frac{\mathsf{CH}(A,B)}{f\,\mathsf{opt}_a}$ and $\widetilde{\mathsf{opt}}_a\le h\,\mathsf{opt}_a$. Finally, by Chebyshev's Inequality, we have

$$\Pr\big[\big|\widetilde{\mathsf{CH}}(A,B)-\mathbb{E}[\widetilde{\mathsf{CH}}(A,B)]\big|\ge\kappa\,\mathsf{CH}(A,B)\big]\le\frac{1}{t}\cdot\frac{h^2f^{-1}-1}{\kappa^2}.$$

In this section, we aim to construct a set of samples S={sj}j[s] for some large enough s, such that each sj is drawn from a fixed 𝒪(1)-Chamfer distribution. Once we have S, we can compute a weighted sum of the nearest neighbor distances for sjS, and invoke Lemma 18 to show that it is likely an (1+ε)-estimation of 𝖢𝖧(A,B).

We will construct such S via a two-step sampling procedure: in the first step, we sample Θ(D/ε2) points from A using a distribution defined by the estimations from the QuadTree algorithm. In the second step, we subsample these Θ(D/ε2) points, using an acceptance probability defined by the estimations from the Tournament algorithm. We describe our Chamfer-Estimate algorithm in Figure 5.

Input: Two subsets $A,B$ of a metric space $(\mathbb{R}^d,\ell_1)$, each of size $n$, a parameter $\varepsilon>0$, and a parameter $q$.

Output: An estimated value $\widetilde{\mathsf{CH}}(A,B)$.

  1. Execute the algorithm QuadTree$(A,B)$, and let the output be a set of values $\{\mathcal{D}_a\}_{a\in A}$, which always satisfy $\mathcal{D}_a\ge\mathsf{opt}_a$. Let $\mathcal{D}:=\sum_{a\in A}\mathcal{D}_a$.

  2. Construct a probability distribution $\mathcal{P}$ supported on $A$ such that for every $a\in A$, $\mathcal{P}(a)=\frac{\mathcal{D}_a}{\mathcal{D}}$. For $i\in[q]$, sample $x_i\sim\mathcal{P}$.

  3. Execute the algorithm Tournament$(\{x_i\}_{i\in[q]},B)$, and let the output be a set of values $\{\mathcal{D}'_{x_i}\}_{i\in[q]}$, which always satisfy $\mathcal{D}'_{x_i}\ge\mathsf{opt}_{x_i}$. Let $\mathcal{D}':=\sum_{i\in[q]}\frac{\mathcal{D}'_{x_i}}{\mathcal{P}(x_i)}\big/q$ and denote $\mathcal{P}'(a):=\frac{\mathcal{D}'_a}{\mathcal{D}'}$ (which is well-defined only if $a=x_i$ for some $i\in[q]$).

  4. Define

    $$M:=\max_{i\in[q]}\frac{\mathcal{P}'(x_i)}{\mathcal{P}(x_i)}.$$

    For each $i\in[q]$, mark $x_i$ as accepted with probability $\frac{\mathcal{P}'(x_i)}{M\,\mathcal{P}(x_i)}$.

    If the number of accepted $x_i$ is less than $s=10/\varepsilon^2$ then output Fail and exit the algorithm. Otherwise, collect the first $s$ accepted $x_i$ as a set $S:=\{s_j\}_{j\in[s]}$.

  5. Compute $\mathsf{opt}_{s_j}$ for each $j$. Output

    $$\widetilde{\mathsf{CH}}(A,B):=\frac{1}{s}\sum_{j\in[s]}\frac{\mathsf{opt}_{s_j}}{\mathcal{P}'(s_j)}.$$
Figure 5: The Chamfer-Estimate Algorithm.
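The two-step sampling of Figure 5 can be summarized by the following Python sketch (our illustration; quadtree_estimates, tournament_estimates, and exact_nn_dist are hypothetical stand-ins for the QuadTree, Tournament, and brute-force subroutines):

import random

def chamfer_estimate(A, B, eps, q, quadtree_estimates, tournament_estimates, exact_nn_dist):
    """(1 +- eps)-estimate of CH(A, B) via importance sampling with rejection (sketch)."""
    D = quadtree_estimates(A, B)                  # D[a] >= opt_a, crude estimates
    D_sum = sum(D[a] for a in A)
    P = {a: D[a] / D_sum for a in A}              # first-stage sampling distribution

    xs = random.choices(list(A), weights=[P[a] for a in A], k=q)
    Dp = tournament_estimates(xs, B)              # Dp[x] in [opt_x, 2 * opt_x]
    Dp_norm = sum(Dp[x] / P[x] for x in xs) / q   # the normalizer D'
    Pp = {x: Dp[x] / Dp_norm for x in xs}         # refined distribution P'

    M = max(Pp[x] / P[x] for x in xs)
    accepted = [x for x in xs if random.random() < Pp[x] / (M * P[x])]

    s = int(10 / eps ** 2)
    if len(accepted) < s:
        raise RuntimeError("Fail")                # low-probability failure event
    S = accepted[:s]
    return sum(exact_nn_dist(x, B) / Pp[x] for x in S) / s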

The Chamfer-Estimate algorithm applies the QuadTree algorithm and the Tournament algorithm as subroutines. If they are executed successfully, their outputs should satisfy the following conditions:

Condition 19.

We say the QuadTree algorithm succeeds if for every $a\in A$, $\mathbb{E}[\mathcal{D}_a]\le 5D\cdot\mathsf{opt}_a$.

Condition 20.

We say the Tournament algorithm succeeds if for every $x_i$, $i\in[q]$, $\mathcal{D}'_{x_i}\le 2\,\mathsf{opt}_{x_i}$.

That is, as described in the introduction, we need QuadTree to provide 𝒪(logn)-approximation (to ensure that the sample size q can be at most logarithmic in n), and that Tournament provide 𝒪(1)-approximation (to ensure that the final estimator using s samples has variance bounded by a constant).

We state some facts about the Chamfer-Estimate algorithm, which will be useful for our analysis.

Claim 21 (Line 2).

Under Condition 19, with probability at least $9/10$, $\mathcal{P}$ is a $\frac{1}{50D}$-Chamfer distribution.

Proof.

With probability at least $9/10$, $\mathcal{D}\le 50D\cdot\mathsf{CH}(A,B)$ by Markov's Inequality. Conditioned on this event, for any $a\in A$, $\mathcal{P}(a)=\frac{\mathcal{D}_a}{\mathcal{D}}\ge\frac{\mathsf{opt}_a}{50D\cdot\mathsf{CH}(A,B)}$.

Claim 22 (Line 3).

Let $q\ge 10^4 D$. Under Conditions 19 and 20, with probability at least $4/5$, $\mathcal{D}'\ge\mathsf{CH}(A,B)/2$.

Proof.

We apply the importance sampling analysis of Lemma 18. Assuming that Claim 21 holds and that $\mathsf{opt}_{x_i}\le\mathcal{D}'_{x_i}\le 2\,\mathsf{opt}_{x_i}$, we get

$$\Pr\Big[\mathcal{D}'\le\big(1-\tfrac{1}{2}\big)\mathsf{CH}(A,B)\Big]\le\frac{2^2\cdot 50D-1}{q}\cdot\Big(\frac{1}{2}\Big)^{-2}<\frac{1}{10}.$$

Analysis of 𝑺.

We now show that the set $S$ constructed on Line 4 collects enough samples (thus the algorithm does not fail) and is equivalent to sampling from an $\mathcal{O}(1)$-Chamfer distribution $\mathcal{Q}$. We note that the algorithm, in fact, only knows the $\frac{1}{50D}$-Chamfer distribution $\mathcal{P}$ and the probabilities $\mathcal{P}'(x_i)$ for $\{x_i\}_{i\in[q]}$, so it cannot explicitly sample from such a $\mathcal{Q}$. Nevertheless, by a standard analysis of rejection sampling, we show that $S$ "simulates" sampling from $\mathcal{Q}$.

Lemma 23.

Let $q\ge 10^4 D/\varepsilon^2$. Under Conditions 19 and 20, with probability at least $3/5$, the number of accepted $x_i$ is at least $s$, so the algorithm does not fail.

Proof.

We assume that Claims 21 and 22 hold. Then for any $x_i$, $\frac{1}{\mathcal{P}(x_i)}\le\frac{50D\cdot\mathsf{CH}(A,B)}{\mathsf{opt}_{x_i}}$ and $\mathcal{P}'(x_i)=\frac{\mathcal{D}'_{x_i}}{\mathcal{D}'}\le\frac{2\,\mathsf{opt}_{x_i}}{\mathsf{CH}(A,B)/2}$. Thus $M\le 200D$. The expectation is

$$\mathbb{E}\big[|\{\text{accepted }x_i\}|\big]=\sum_{i\in[q]}\frac{\mathcal{P}'(x_i)}{M\,\mathcal{P}(x_i)}\ge\frac{1}{200D}\sum_{i\in[q]}\frac{\mathcal{D}'_{x_i}}{\mathcal{P}(x_i)}\cdot\frac{1}{\mathcal{D}'}=\frac{1}{200D}\cdot q\,\mathcal{D}'\cdot\frac{1}{\mathcal{D}'}=\frac{q}{200D},$$

where the second-to-last equality is due to the definition of $\mathcal{D}':=\sum_{i\in[q]}\frac{\mathcal{D}'_{x_i}}{\mathcal{P}(x_i)}\big/q$. The final bound holds by Markov's Inequality and our setting of $q$.

Lemma 24.

Each $s_j$ is independently and identically distributed, and under Condition 20, $\Pr[s_j=a]\ge\frac{\mathsf{opt}_a}{2\,\mathsf{CH}(A,B)}$ for any $a\in A$.

Proof.

The independence and identical distribution follow directly from our sampling procedure. For the probability statement, we assume (without loss of generality) that during the rejection sampling on Line 4, a sample $x_i$ is accepted and renamed $s_j$. Then for any $a\in A$,

$$\Pr[s_j=a]=\Pr[x_i=a\mid x_i\text{ accepted}]=\frac{\mathcal{P}(a)\Pr[x_i\text{ accepted}\mid x_i=a]}{\Pr[x_i\text{ accepted}]}=\frac{\mathcal{P}(a)\Pr[x_i\text{ accepted}\mid x_i=a]}{\sum_{a_0\in A}\mathcal{P}(a_0)\Pr[x_i\text{ accepted}\mid x_i=a_0]}=\frac{\mathcal{P}(a)\cdot\frac{\mathcal{P}'(a)}{M\,\mathcal{P}(a)}}{\sum_{a_0\in A}\mathcal{P}(a_0)\cdot\frac{\mathcal{P}'(a_0)}{M\,\mathcal{P}(a_0)}}$$

In the final equality, because we conditioned on $x_i=a$ (resp. $x_i=a_0$) on the left-hand side, we know that on the right-hand side, $\mathcal{P}'(a)=\frac{\mathcal{D}'(a)}{\mathcal{D}'}$ is well-defined and satisfies $\mathsf{opt}_a\le\mathcal{D}'(a)\le 2\,\mathsf{opt}_a$ (resp. $\mathsf{opt}_{a_0}\le\mathcal{D}'(a_0)\le 2\,\mathsf{opt}_{a_0}$), given Condition 20. Therefore, we have

$$\Pr[s_j=a]=\frac{\mathcal{P}'(a)}{\sum_{a_0\in A}\mathcal{P}'(a_0)}=\frac{\mathcal{D}'(a)}{\sum_{a_0\in A}\mathcal{D}'(a_0)}\ge\frac{\mathsf{opt}_a}{2\,\mathsf{CH}(A,B)}.$$

Lemmas 23 and 24 together say that $S$ can be viewed as a set of $s$ samples from a $\frac{1}{2}$-Chamfer distribution; thus we can invoke another importance sampling analysis. In the final step of the algorithm, we compute the exact nearest neighbor distance for each $s_j$ and then compute a weighted sum over them. With high probability, this gives a $(1\pm\varepsilon)$-estimation of $\mathsf{CH}(A,B)$.

Theorem 25.

Under Conditions 19 and 20, Chamfer-Estimate$(A,B,\varepsilon,q)$ with $q\ge 10^4D/\varepsilon^2$ outputs $\widetilde{\mathsf{CH}}(A,B)$ satisfying $(1-\varepsilon)\,\mathsf{CH}(A,B)\le\widetilde{\mathsf{CH}}(A,B)\le(1+\varepsilon)\,\mathsf{CH}(A,B)$ with probability at least $1/2$.

Proof.

In the success case of Lemma 23, we can apply Lemma 18 with f=1/2, h=1, t=s, and κ=ε. Then

$$\Pr\big[|\widetilde{\mathsf{CH}}(A,B)-\mathsf{CH}(A,B)|\ge\varepsilon\,\mathsf{CH}(A,B)\big]\le\frac{1}{s\,\varepsilon^2}.$$

Theorem 26.

Chamfer-Estimate$(A,B,\varepsilon,q=10^4D/\varepsilon^2)$ runs in time $\mathcal{O}\big(nd(\log\log n+\log\tfrac{1}{\varepsilon})/\varepsilon^2\big)$.

Proof.

This is dominated by the runtime of QuadTree, Tournament, and the time of computing 𝗈𝗉𝗍sj on Line 5. QuadTree(A,B) runs in 𝒪(ndloglogn) time and Tournament ({xi}i[q],B) runs in 𝒪(nd(loglogn+log1ε)/ε2) time. Finally, the brute-force search for 𝗈𝗉𝗍sj for j[10/ε2] takes 𝒪(nd/ε2) time.

References

  • [1] Pdal: Chamfer. https://pdal.io/en/2.4.3/apps/chamfer.html, 2023. Accessed: 2023-05-12.
  • [2] Pytorch3d: Loss functions. https://pytorch3d.readthedocs.io/en/latest/modules/loss.html, 2023. Accessed: 2023-05-12.
  • [3] Tensorflow graphics: Chamfer distance. https://www.tensorflow.org/graphics/api_docs/python/tfg/nn/loss/chamfer_distance/evaluate, 2023. Accessed: 2023-05-12.
  • [4] Arne Andersson, Torben Hagerup, Stefan Nilsson, and Rajeev Raman. Sorting in linear time? In Proceedings of the Twenty-Seventh Annual ACM Symposium on Theory of Computing, STOC ’95, pages 427–436, New York, NY, USA, 1995. Association for Computing Machinery. doi:10.1145/225058.225173.
  • [5] Alexandr Andoni and Ilya Razenshteyn. Optimal data-dependent hashing for approximate near neighbors. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 793–801, 2015. doi:10.1145/2746539.2746553.
  • [6] Kubilay Atasu and Thomas Mittelholzer. Linear-complexity data-parallel earth mover’s distance approximations. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 364–373. PMLR, 09–15 June 2019. URL: https://proceedings.mlr.press/v97/atasu19a.html.
  • [7] Vassilis Athitsos and Stan Sclaroff. Estimating 3d hand pose from a cluttered image. In 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings., volume 2, pages II–432. IEEE, 2003.
  • [8] Ainesh Bakshi, Piotr Indyk, Rajesh Jayaram, Sandeep Silwal, and Erik Waingarten. A near-linear time algorithm for the chamfer distance, 2023. doi:10.48550/arXiv.2307.03043.
  • [9] Manuel Blum, Robert W. Floyd, Vaughan Pratt, Ronald L. Rivest, and Robert E. Tarjan. Time bounds for selection. J. Comput. Syst. Sci., 7(4):448–461, August 1973. doi:10.1016/S0022-0000(73)80033-9.
  • [10] Timothy M. Chan. Well-separated pair decomposition in linear time? Information Processing Letters, 107(5):138–141, 2008. doi:10.1016/j.ipl.2008.02.008.
  • [11] Haoqiang Fan, Hao Su, and Leonidas J Guibas. A point set generation network for 3d object reconstruction from a single image. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 605–613, 2017.
  • [12] Sariel Har-Peled. Geometric approximation algorithms. Number 173. American Mathematical Soc., 2011.
  • [13] Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. Journal of the ACM (JACM), 53(3):307–323, 2006. doi:10.1145/1147954.1147955.
  • [14] Li Jiang, Shaoshuai Shi, Xiaojuan Qi, and Jiaya Jia. Gal: Geometric adversarial loss for single-view 3d-object reconstruction. In Proceedings of the European conference on computer vision (ECCV), pages 802–816, 2018.
  • [15] Omar Khattab and Matei Zaharia. Colbert: Efficient and effective passage search via contextualized late interaction over bert. In Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, pages 39–48, 2020. doi:10.1145/3397271.3401075.
  • [16] Jon M. Kleinberg. Two algorithms for nearest-neighbor search in high dimensions. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, STOC ’97, pages 599–608, New York, NY, USA, 1997. Association for Computing Machinery. doi:10.1145/258533.258653.
  • [17] Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. From word embeddings to document distances. In International conference on machine learning, pages 957–966. PMLR, 2015. URL: http://proceedings.mlr.press/v37/kusnerb15.html.
  • [18] Erik B Sudderth, Michael I Mandel, William T Freeman, and Alan S Willsky. Visual hand tracking using nonparametric belief propagation. In 2004 Conference on Computer Vision and Pattern Recognition Workshop, pages 189–189. IEEE, 2004.
  • [19] Ziyu Wan, Dongdong Chen, Yan Li, Xingguang Yan, Junge Zhang, Yizhou Yu, and Jing Liao. Transductive zero-shot learning with visual structure constraint. Advances in neural information processing systems, 32, 2019.

Appendix A Reducing the Bit Precision of Inputs

In our algorithm, we assumed that all points in the input sets $A,B$ are integers in $\{1,2,\dots,\mathrm{poly}(n)\}^d$. Here, we show that this is without loss of generality, as long as all coordinates of the original input are $w$-bit integers for arbitrary $w\ge\log n$ in a unit-cost RAM with a word length of $w$ bits.

Section A.3 of [8] gives an efficient reduction from real inputs to the case that

$$1\le\min_{a\in A,\,b\in B}\|a-b\|_1\le\max_{a\in A,\,b\in B}\|a-b\|_1\le\mathrm{poly}(n),$$

i.e., the input has a poly(n)-bounded aspect ratio. Their reduction can be adapted to our case as follows:

Claim 27 (Lemma A.3 of [8]).

Given an $\mathsf{est}$ such that $\mathsf{CH}(A,B)\le\mathsf{est}\le\mathrm{poly}(n)\cdot\mathsf{CH}(A,B)$, if there exists an algorithm that computes a $(1+\varepsilon)$-approximation to $\mathsf{CH}(A,B)$ in $\mathcal{O}\big(nd(\log\log n+\log\tfrac{1}{\varepsilon})/\varepsilon^2\big)$ time under the assumption that $A,B$ contain points from $\{1,\dots,\mathrm{poly}(n)\}^d$, then there exists an algorithm that computes a $(1+\varepsilon)$-approximation to $\mathsf{CH}(A,B)$ for any integer-coordinate $A,B$ in asymptotically the same time.

It remains to show how to obtain a poly(n)-approximation.

Lemma 28.

There exists an $\mathcal{O}(nd+n\log\log n)$-time algorithm that computes an $\mathsf{est}$ which satisfies $\mathsf{CH}(A,B)\le\mathsf{est}\le\mathrm{poly}(n)\cdot\mathsf{CH}(A,B)$ with probability $1-\frac{1}{n}$.

Proof.

Similar to (the proof of Lemma A.3 in) [8], we sample a vector $v\sim(\mathsf{Cauchy}(0,1))^d$, which can be discretized to $\mathcal{O}(\log n)$-bit precision following [13]. We then compute the inner products $\{v\cdot a\}_{a\in A}$ and $\{v\cdot b\}_{b\in B}$. The random variable $v\cdot a-v\cdot b$ follows the distribution $\mathsf{Cauchy}(0,\|a-b\|_1)$ by the 1-stability property of Cauchy random variables. So we have that for every $a\in A$ and $b\in B$,

$$\frac{\|a-b\|_1}{\mathrm{poly}(n)}\le|v\cdot a-v\cdot b|\le\|a-b\|_1\cdot\mathrm{poly}(n),$$

with probability $1-1/\mathrm{poly}(n)$. Therefore, $\mathsf{est}:=\mathsf{CH}(\{v\cdot a\}_{a\in A},\{v\cdot b\}_{b\in B})$ is a $\mathrm{poly}(n)$-approximation to $\mathsf{CH}(A,B)$. We may assume by scaling that $\{v\cdot a\}_{a\in A},\{v\cdot b\}_{b\in B}$ contain $w$-bit integers, which can be sorted in $\mathcal{O}(n\log\log n)$ time [4]. Then, to compute $\mathsf{est}$, we find all one-dimensional nearest neighbors by going through the sorted list and linking each $v\cdot a$ for $a\in A$ with its adjacent $v\cdot b$ for $b\in B$, which takes $\mathcal{O}(n)$ time. Thus the total runtime is $\mathcal{O}(nd+n\log\log n)$ as claimed.
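A small Python sketch of this one-dimensional reduction (ours; the discretization of the Cauchy vector and the integer scaling are omitted, and a binary search replaces the sort-and-scan for brevity):

import numpy as np

def crude_estimate(A, B, rng):
    """poly(n)-approximation of CH(A, B) from a single Cauchy projection (sketch)."""
    v = rng.standard_cauchy(A.shape[1])          # 1-stable projection vector
    a_proj, b_proj = A @ v, np.sort(B @ v)
    # one-dimensional nearest neighbor of each projected a among the projected b's
    idx = np.searchsorted(b_proj, a_proj)
    lo = np.clip(idx - 1, 0, len(b_proj) - 1)
    hi = np.clip(idx, 0, len(b_proj) - 1)
    nearest = np.minimum(np.abs(a_proj - b_proj[lo]), np.abs(a_proj - b_proj[hi]))
    return nearest.sum()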

Appendix B Proof of Lemma 11

Proof.

We use the fact that for $v\sim(\mathsf{Cauchy}(0,1))^d$ and any $x\in\mathbb{R}^d$, $v\cdot x\sim\mathsf{Cauchy}(0,\|x\|_1)$. Also, for any $k>0$, if a random variable $z\sim\mathsf{Cauchy}(0,1)$ then $kz\sim\mathsf{Cauchy}(0,k)$. Therefore, for any $v_i$, $i\in[r]$, $\Pr\big[|v_i\cdot(x-y)|>(1+c)\|x-y\|_1\big]=\Pr[U>1+c]$ where $U\sim\mathsf{HalfCauchy}(0,1)$. The density of $U$ is $f_U(u)=\frac{2}{\pi}\cdot\frac{1}{1+u^2}$, thus $\Pr[U>1]=1/2$ and

$$\Pr[U>1+c]=\frac{1}{2}-\int_{1}^{1+c}f_U(u)\,du\le\frac{1}{2}-c\cdot f_U(3/2)\ \ \text{(for $0<c<1/2$)}<\frac{1}{2}-c/10.$$

Similarly, we can get $\Pr\big[|v_i\cdot(x-y)|<(1-c)\|x-y\|_1\big]<\frac{1}{2}-c/10$. For $i\in[r]$, let $\mathcal{E}_i$ be an indicator variable that equals 1 if $|v_i\cdot(x-y)|<(1-c)\|x-y\|_1$ and equals 0 otherwise. By Hoeffding's bound,

$$\Pr\Big[\sum_{i\in[r]}\mathcal{E}_i\ge\frac{r}{2}\Big]<e^{-2rc^2/100},$$

which upper bounds the failure probability that the median is too small. We symmetrically bound the probability that the median is too large. Then

$$\Pr\Big[\mathsf{med}\{|v_i\cdot(x-y)|:i\in[r]\}\in(1\pm c)\|x-y\|_1\Big]\ge 1-2e^{-rc^2/50}.$$